I have about 25 databases which I need to consolidate into one database. First I tried to build an SSIS package which would copy all data from each table into one place, but then I got this error:
Information: The buffer manager failed a memory allocation call for
10485760 bytes, but was unable to swap out any buffers to relieve
memory pressure. 1892 buffers were considered and 1892 were locked.
Either not enough memory is available to the pipeline because not
enough are installed, other processes were using it, or too many
buffers are locked.
Then I realized this was not a good idea and that I need to insert only new records and update existing ones. After that I tried this approach:
Get a list of all connection strings.
For each database, copy the new records and update the existing ones (rows which need to be updated are copied from the source to a temp table, deleted from the destination, and then copied from the temp table to the destination table); a rough T-SQL sketch of this pattern is shown below.
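Roughly, the per-table pattern looks like this in T-SQL; the database, table, and key names are just placeholders:

    -- Stage the source rows that already exist in the destination.
    SELECT s.*
    INTO #changed
    FROM SourceDb.dbo.SomeTable AS s
    WHERE EXISTS (SELECT 1 FROM dbo.SomeTable AS d WHERE d.Id = s.Id);

    -- Delete the stale copies from the destination, then put the refreshed rows back.
    DELETE d
    FROM dbo.SomeTable AS d
    WHERE EXISTS (SELECT 1 FROM #changed AS t WHERE t.Id = d.Id);

    INSERT INTO dbo.SomeTable SELECT * FROM #changed;

    -- Finally, copy over the rows that are new to the destination.
    INSERT INTO dbo.SomeTable
    SELECT s.*
    FROM SourceDb.dbo.SomeTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.SomeTable AS d WHERE d.Id = s.Id);

    DROP TABLE #changed;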
Here's what the data flow task looks like:
In some cases the data flow processes more than a million rows. BUT I still get the same error: out of memory.
In Task Manager the situation is as follows:
I have to note that there are 28 databases being replicated on this same server, and when this package is not running SQL Server still uses more than 1 GB of memory. I've read that this is normal, but now I'm not so sure...
I have installed the SQL Server hotfix I found in this article: http://support.microsoft.com/kb/977190
But it doesn't help...
Am I doing something wrong, or is this just the way things work and I am supposed to find a workaround?
Thanks,
Ile
You might run into memory issues if your Lookup transformation is set to full cache. From what I have seen, the Merge Join performs better than the Lookup transformation if the number of rows exceeds 10 million.
Have a look at the following, where I have explained the differences between the Merge Join and Lookup transformations.
What are the differences between Merge Join and Lookup transformations in SSIS?
I found a solution: the problem was in SQL Server itself - it was consuming too much memory. Max server memory was set to 2147483647 MB (the default value). Since my server has 4 GB of RAM, I limited it to 1100 MB. Since then there have been no memory problems, but my data flow tasks were still very slow. That problem was in the Lookup. By default, the Lookup selects everything from the lookup table; I changed this and selected only the columns I need for the lookup, which sped the process up several times.
Now the whole consolidation process takes about 1 hour 15 minutes.
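For reference, this is roughly the T-SQL behind that memory cap; the 1100 MB figure is simply what fit my 4 GB server, so adjust it for yours:

    -- Cap the memory SQL Server's buffer pool may take, leaving room for SSIS on the same box.
    EXEC sys.sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sys.sp_configure 'max server memory (MB)', 1100;
    RECONFIGURE;

For the Lookup itself, the change was switching from "use a table or view" to a SQL command that returns only the join and output columns, so the full cache holds only what the lookup actually needs.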
I have a PostgreSQL (10.0 on OS X) database with a single table for the moment. I have noticed something weird when importing a CSV file into that table.
When the import fails for various reasons (e.g. one extra row in the CSV file or too many characters in a column for a given row), no rows are added to the table, but PostgreSQL still claims that space on my hard disk.
Now, I have a very big CSV to import, and it failed several times because the CSV was not compliant to begin with - so I had tons of failed imports that I fixed and tried again. What I've realized now is that my computer's storage has shrunk by 30-50 GB or so because of this, and my database is still empty.
Is that normal?
I suspect this is somewhere in my database cache. Is there a way for me to clear that cache or do I have to fully reinstall my database?
Thanks!
Inserting rows into the database will increase the table size.
Even if the COPY statement fails, the rows that have been inserted so far remain in the table, but they are dead rows since the transaction that inserted them failed.
In PostgreSQL, the SQL statement VACUUM will free that space. That typically does not shrink the table, but it makes the space available for future inserts.
Normally, this is done automatically in the background by the autovacuum daemon.
There are several possibilities:
You disabled autovacuum.
Autovacuum is not cleaning up the table fast enough, so the next load cannot reuse the space yet.
What can you do:
Run VACUUM (VERBOSE) on the table to remove the dead rows manually.
If you want to reduce the table size, run VACUUM (FULL) on the table. That will lock the table for the duration of the operation.
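A minimal sketch of those commands, assuming your table is called my_table:

    -- Reclaim the dead rows left behind by the failed COPY runs.
    VACUUM (VERBOSE) my_table;

    -- Check how much space the table still occupies on disk.
    SELECT pg_size_pretty(pg_total_relation_size('my_table'));

    -- Only if you need the disk space returned to the operating system right away:
    -- this rewrites the table and holds an exclusive lock for the duration.
    VACUUM (FULL) my_table;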
I have a table called test and I inserted 40,000 records into it. I split my database file into two file groups like this:
The size of both files, based on the round-robin algorithm, increased to 160 MB, as you can see.
After this I deleted the data in my table, but the size of both files (file groups) remains at 160 MB. Why?
This is because SQL Server assumes that if your database got that large once, it is likely it will need to do so again. To avoid the overhead of requesting space from the operating system every time it wants more disk space, it simply holds on to what it has and fills it back up as required, unless you manually issue a DBCC SHRINKDATABASE (or DBCC SHRINKFILE) command.
Due to this, using shrink is generally considered a bad idea, unless you are very confident your database will not need that space at some point in the future.
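If you are confident you want the space back anyway, the shrink looks roughly like this; the database and logical file names are placeholders, so check yours in sys.database_files first:

    USE MyDatabase;
    -- See the current size of each file in this database, in MB.
    SELECT name, size * 8 / 1024 AS size_mb FROM sys.database_files;

    -- Shrink a single data file towards a 50 MB target...
    DBCC SHRINKFILE (N'MyDatabase_FG1_File1', 50);
    -- ...or shrink the whole database.
    -- DBCC SHRINKDATABASE (N'MyDatabase');

Be aware that shrinking data files can fragment indexes, and the files will simply grow again the next time you load data.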
I am encountering this weird problem and any help will be really appreciated.
In my SSIS package I have a single container with two data flow tasks, and the data transfer is very large. Here is the breakdown of the problem.
The first data flow task transfers around 130 million rows from Oracle to SQL Server. It ran just fine and moved the rows in about 40 to 60 minutes, which is quite acceptable.
Now comes the second part. Another data flow task transfers around 86 million rows from SQL Server to SQL Server (a single table). The transfer flies along until about 60-70 million rows, and after that it dies out or crawls; the next 10 million rows took 15 hours. I am not able to understand why this is happening.
The table gets truncated and then loaded. I have tried increasing the data buffer properties, etc., but to no avail.
Thanks in advance for any help.
You are creating a single transaction and the transaction log is filling up. You can get 10-100x faster speeds if you commit 10000 rows at a time. You may also try setting Maximum Insert Commit Size to 0, or try 5000 and go up to see the impact on performance; this is on the OLE DB Destination component. In my experience 10000 rows is the magic number that seems to be the sweet spot, but of course it depends heavily on how large the rows are, the version of SQL Server, and the hardware setup.
You should also check whether there are indexes on the target table; you can try dropping the indexes, loading the table, and recreating the indexes.
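If you go the index route, the pattern is roughly this; the index, table, and column names below are made up for illustration:

    -- Script out the real index definitions first, then drop them before the load.
    DROP INDEX IX_BigTable_SomeColumn ON dbo.BigTable;

    -- ... run the SSIS data flow / bulk load here ...

    -- Recreate the index once the load has finished.
    CREATE NONCLUSTERED INDEX IX_BigTable_SomeColumn
        ON dbo.BigTable (SomeColumn);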
What is your destination recovery model? Full/Simple, etc...
Are there any transformations between the source and destination? Try sending the source to a RowCount to determine the maximum speed your source can send data. You may be seeing a slowdown on the source side as well.
Is there any difference in content of the rows once you notice the slow down? For example, maybe the more recent rows have lots of text in a varchar(max) column that the early rows did not make use of.
Is your destination running on a VM? If yes, have you pre-allocated the CPU and RAM? SSIS is multi-threaded, but it won't necessarily use 100% of each core. VM hosts may share the resources with other VMs because the SSIS VM is not reporting full usage of all of the resources.
I need to perform a task in which we have a table that has 19 columns with the text data type. I want to remove these columns from the source table and move them to a new table with the varchar(max) data type. The source table currently has 30k rows (with text data), and this will grow as the client uses the database for record storage. For transferring this old data I tried an "insert into..select.." query, but it takes around 25-30 minutes to transfer that many rows (30k). The same is true of a "Select from..insert.." query. I have also tried creating an SSIS data flow task for the transfer, with OLE DB as both source and destination, but it still takes the same amount of time. I'm really confused, as posts all over the internet suggest that SSIS is the fastest way to transfer data. Can you please suggest a better way to improve the performance of this data transfer?
Thanks
SSIS probably isn't faster if the source and the destination are in the same database and the SSIS process is on the same box.
One approach might be to figure out where you are spending the time and optimise that. If you set Management Studio to "discard results after execution" and run just the select part of your query, how long does that take? If this is a substantial part of the 25-30 minutes then work on optimising that.
If the select statement turns out to be really fast, then all the time is being spent on the insert and you need to look at improving that part of the process instead. There are a couple of things you can try here before you go hardware shopping; are there any indexes or constraints (or triggers!) on the target table that you can drop for the duration of the insert and put back again at the end? Can you put the database into the simple recovery model?
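A rough way to split the measurement; the table, column, and database names here are placeholders for your own:

    -- Time the read side on its own (with "discard results after execution" enabled in SSMS).
    SET STATISTICS TIME ON;
    SELECT Id, BigTextColumn1, BigTextColumn2   -- stand-ins for your 19 text columns
    FROM dbo.SourceTable;
    SET STATISTICS TIME OFF;

    -- If the write side dominates, the simple recovery model for the duration of the load
    -- cuts logging overhead (switch it back and take a backup afterwards).
    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;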
In an SSIS data flow there is a Lookup component which looks up against a table with 18 million records. I have configured the lookup with full cache.
Default buffer size: 20485760
Default buffer max rows: 100000
The lookup join is based on an ID column of type varchar(13).
It gives the error shown below. What lookup configuration is suitable for caching this many records?
Error: The buffer manager cannot write 8 bytes to file "C:\Users\usrname\AppData\Local\Temp\16\DTS{B98CD347-1EF1-4BC1-9DD9-C1B3AB2B8D73}.tmp". There was insufficient disk space or quota.
What would be the difference in performance if I use a lookup with no cache?
I understand that in full cache mode the data is cached during the pre-execute phase and the component does not have to go back to the database. The full cache takes a large amount of memory and adds additional startup time to the data flow. My question is: what configuration do I have to set up in order to handle a large amount of data in full cache mode?
What's the solution if the lookup table has millions of records and they don't fit in a full cache?
Use a Merge Join component instead. Sort both inputs on the join key and specify an inner, left, or full join based on your requirements. Use the different outputs to get functionality similar to the Lookup component's match and no-match outputs.
Merge Join usually performs better on larger datasets.
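A sketch of what the two source queries might look like; the table and column names are made up, and the key point is the ORDER BY on the join column plus marking each source output as sorted (IsSorted = True, SortKeyPosition = 1 on the key) so SSIS does not insert an expensive Sort transformation:

    -- Left input of the Merge Join: the large data set.
    SELECT Id, Col1, Col2
    FROM dbo.BigSourceTable
    ORDER BY Id;

    -- Right input: the reference table you would otherwise have used in the Lookup.
    SELECT Id, LookupValue
    FROM dbo.BigReferenceTable
    ORDER BY Id;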
You can set the BufferTempStoragePath property in SSIS to one of your fast drives. By default, BLOBTempStoragePath and BufferTempStoragePath fall back to the TEMP and TMP system variables, so if the drive behind TMP cannot hold the large dataset that your Lookup transformation spills, point these properties at a drive with enough space and the data flow will use that drive to do the job for you.