SSIS data transfer slows down after inserting a few million rows - sql-server

I am encountering this weird problem and any help will be really appreciated.
I have a single container in which I have two data flow tasks in my SSIS package, and the data transfer is very large. Breakdown of the problem:
The first data flow task transfers around 130 million rows from Oracle to SQL Server. It ran just fine and transferred the rows in about 40 to 60 minutes, which is very much acceptable.
Now comes the second part: another data flow task transfers around 86 million rows from SQL Server to SQL Server (one table only). The data transfer flies along until about 60-70 million rows, and after that it just dies out or crawls; the next 10 million rows took 15 hours. I am not able to understand why this is happening.
The table gets truncated and then loaded. I have tried increasing the data buffer properties, etc., but to no avail.
Thanks in advance for any help.

You are creating a single transaction and the transaction log is filling up. You can get 10-100x faster speeds if you move 10000 rows at a time. You may also try setting Maximum Insert Commit Size to 0, or try 5000 and go up from there to see the impact on performance. This is on the OLE DB Destination component. In my experience 10000 rows is the current magic number that seems to be the sweet spot, but of course it is very dependent on how large the rows are, the version of SQL Server, and the hardware setup.
You should also check whether there are indexes on the target table; you can try dropping the indexes, loading the table, and recreating the indexes afterwards.
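For example, the index part could look something like this around the data flow (table and index names here are made up; adjust them to your own schema):
-- Drop nonclustered indexes before the big load...
DROP INDEX IX_Target_CustomerId ON dbo.TargetTable;
DROP INDEX IX_Target_OrderDate  ON dbo.TargetTable;
-- ...run the SSIS data flow, then put the indexes back afterwards.
CREATE NONCLUSTERED INDEX IX_Target_CustomerId ON dbo.TargetTable (CustomerId);
CREATE NONCLUSTERED INDEX IX_Target_OrderDate  ON dbo.TargetTable (OrderDate);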

What is your destination recovery model? Full/Simple, etc...
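If you are not sure, a quick way to check it (the database name below is just an example):
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'TargetDb';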
Are there any transformations between the source and destination? Try sending the source to a RowCount to determine the maximum speed your source can send data. You may be seeing a slowdown on the source side as well.
Is there any difference in content of the rows once you notice the slow down? For example, maybe the more recent rows have lots of text in a varchar(max) column that the early rows did not make use of.
Is your destination running on a VM? If yes, have you pre-allocated the CPU and RAM? SSIS is multi-threaded, but it won't necessarily use 100% of each core. Because the SSIS VM is not reporting full usage of all of its resources, the VM host may share those resources with other VMs.

Related

Why does adding another LOOKUP transformation slow down performance significantly in SSIS?

I have a simple SSIS package that transfers data between source and destination from one server to another.
If it is a new record, it inserts it; otherwise it checks the HashByteValue column and, if it differs, updates the record.
The table contains approximately 1.5 million rows, and the update touches around 50 columns.
When I start debugging the package, nothing happens for around 2 minutes; I can't even see the green check-mark. After that I can see data start flowing through, but sometimes it stops, then flows again, then stops again, and so on.
The whole package looks like this:
But if I do just the INSERT part (without the update), it works perfectly: 1 minute and all 1.5 million records are in the destination table.
So why does adding another LOOKUP transformation to the package, the one that updates records, slow down performance so significantly?
Is it something to do with memory? I am using the FULL CACHE option in both lookups.
What would be the way to increase performance?
Could the reason be the Auto Growth file size?
Besides changing the AutoGrowth size to 100MB: your database log file is 29GB, which means you most likely are not doing transaction log backups.
If you're not, and you only do full backups nightly or periodically, change the recovery model of your database from Full to Simple.
Database Properties > Options > Recovery Model
Then Shrink your Log file down to 100MB using:
DBCC SHRINKFILE(Catalytic_Log, 100)
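If you prefer to script the whole change, it looks roughly like this (assuming the database is named Catalytic to match the log file name above; substitute your own names):
ALTER DATABASE Catalytic SET RECOVERY SIMPLE;   -- switch from Full to Simple
DBCC SHRINKFILE(Catalytic_Log, 100);            -- shrink the log file to 100 MB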
I don't think that your problem is in the lookup. The OLE DB Command is really slow in SSIS, and I don't think it is meant for massive updates of rows. Look at this answer on MSDN: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/4f1a62e2-50c7-4d22-9ce9-a9b3d12fd7ce/improve-data-load-perfomance-in-oledb-command?forum=sqlintegrationservices
To verify that the problem is not the lookup, try disabling the "OLE DB Command", rerun the process, and see how long it takes.
In my personal experience it is always better to create a stored procedure to do the whole "dataflow" when you have to update or insert based on certain conditions. To do that you need a staging table and a destination table (where you are going to load the transformed data).
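As a rough sketch of that pattern (the table and column names are invented for illustration; only HashByteValue comes from the question), the procedure you run after the data flow has bulk-loaded the staging table could look like this:
CREATE PROCEDURE dbo.ApplyStagedChanges
AS
BEGIN
    SET NOCOUNT ON;

    -- Update rows whose hash has changed.
    UPDATE d
    SET    d.Col1 = s.Col1,
           d.Col2 = s.Col2,
           d.HashByteValue = s.HashByteValue
    FROM   dbo.Destination AS d
    JOIN   dbo.Staging     AS s ON s.BusinessKey = d.BusinessKey
    WHERE  d.HashByteValue <> s.HashByteValue;

    -- Insert rows that do not exist yet.
    INSERT INTO dbo.Destination (BusinessKey, Col1, Col2, HashByteValue)
    SELECT s.BusinessKey, s.Col1, s.Col2, s.HashByteValue
    FROM   dbo.Staging AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.Destination AS d WHERE d.BusinessKey = s.BusinessKey);
END;
Running the update and insert set-based like this avoids the row-by-row behaviour of the OLE DB Command.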
Hope it helps.

How to copy 19 million rows with text data type columns to another table faster in SQL Server 2012

I need to perform a task in which we have a table that has 19 columns with the text data type. I want to remove these columns from this source table and move them to a new table with the data type varchar(max). The source table currently has 30k rows (with text data type data). This will increase eventually as the client uses the database for record storage. For transferring this old data I tried an "insert into .. select .." query, but it is taking around 25-30 minutes to transfer that many rows (30k). The same is the case with the "select from .. insert .." form. I have also tried creating an SSIS data flow task for the transfer, with OLE DB as both source and destination, but it still takes the same amount of time. I'm really confused, as all the posts on the internet suggest that SSIS is the fastest way to transfer data. Can you please suggest a better way to improve the performance of the data transfer?
Thanks
SSIS probably isn't faster if the source and the destination are in the same database and the SSIS process is on the same box.
One approach might be to figure out where you are spending the time and optimise that. If you set Management Studio to "discard results after execution" and run just the select part of your query, how long does that take? If this is a substantial part of the 25-30 minutes then work on optimising that.
If the select statement turns out to be really fast, then all the time is being spent on the insert, and you need to look at improving that part of the process instead. There are a couple of things you can try here before you go hardware shopping: are there any indexes or constraints (or triggers!) on the target table that you can drop for the duration of the insert and put back again at the end? Can you put the database in simple recovery mode?
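If the insert is the slow part, one thing worth trying is a TABLOCK hint on the INSERT ... SELECT; with the database in SIMPLE or BULK_LOGGED recovery and an empty heap destination this can be minimally logged (table and column names below are placeholders):
INSERT INTO dbo.NewTable WITH (TABLOCK) (Id, Col1, Col2)
SELECT Id,
       CAST(Col1 AS varchar(max)),   -- old text columns converted on the way in
       CAST(Col2 AS varchar(max))
FROM dbo.SourceTable;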

What is the fastest approach to populate MS SQL Database with large amount of data

Dilemma:
I am about to populate data on MS SQL Server (2012 Dev Edition). The data is based on production data; the amount is around 4TB (around 250 million items).
Purpose:
To test performance of full-text search and of a regular index as well. The target number should be around 300 million items, around 500K each.
Question:
What should I do beforehand to speed up the process, and what consequences should I worry about?
Ex.
Switching off statistics?
Should I do a bulk insert of 1k items per transaction instead of a single transaction?
Simple recovery model?
Log truncation?
Important:
I will use a sample of 2k production items to create every random item that will be inserted into the database. I will use near-unique samples generated in C#. It will be one table:
table
(
    long          [id],
    nvarchar(50)  [index],
    nvarchar(50)  [index],
    int           [index],
    float,
    nvarchar(50)  [index],
    text          [full text search index]
)
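A concrete rendering of that sketch in T-SQL might look like the following (column names are invented for illustration, and nvarchar(max) is used instead of the deprecated text type):
CREATE TABLE dbo.SampleItems
(
    Id    bigint IDENTITY(1,1) NOT NULL CONSTRAINT PK_SampleItems PRIMARY KEY,
    ColA  nvarchar(50)  NOT NULL,
    ColB  nvarchar(50)  NOT NULL,
    ColC  int           NOT NULL,
    ColD  float         NULL,
    ColE  nvarchar(50)  NOT NULL,
    Body  nvarchar(max) NULL          -- the sketch above uses text, which is deprecated
);

CREATE INDEX IX_SampleItems_ColA ON dbo.SampleItems (ColA);
CREATE INDEX IX_SampleItems_ColB ON dbo.SampleItems (ColB);
CREATE INDEX IX_SampleItems_ColC ON dbo.SampleItems (ColC);
CREATE INDEX IX_SampleItems_ColE ON dbo.SampleItems (ColE);

CREATE FULLTEXT CATALOG SampleCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.SampleItems (Body) KEY INDEX PK_SampleItems ON SampleCatalog;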
Almost invariably, in a situation like this (and I've had several of them), I've used SSIS. SSIS is the fastest way I know to import large amounts of data into a SQL Server database. You have complete control over the batch (transaction) size and it will perform bulk inserts. In addition, if you have transformation requirements, SSIS will handle them with ease.
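If you end up loading from a staging table in plain T-SQL instead, the same batching idea looks roughly like this (table names follow the earlier sketch, and it assumes the staging table has a roughly contiguous numeric Id column; gaps just make some batches smaller):
DECLARE @BatchSize int    = 50000;
DECLARE @LastId    bigint = 0;
DECLARE @MaxId     bigint = (SELECT MAX(Id) FROM dbo.GeneratedStage);

WHILE @LastId < @MaxId
BEGIN
    -- Each iteration is its own small transaction, so the log never balloons.
    INSERT INTO dbo.SampleItems (ColA, ColB, ColC, ColD, ColE, Body)
    SELECT ColA, ColB, ColC, ColD, ColE, Body
    FROM   dbo.GeneratedStage
    WHERE  Id > @LastId AND Id <= @LastId + @BatchSize;

    SET @LastId = @LastId + @BatchSize;
END;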

SSIS - out of memory error again

I have about 25 databases which I need to consolidate into one database. First I tried to build an SSIS package which would copy all data from each table into one place, but then I got this error:
Information: The buffer manager failed a memory allocation call for
10485760 bytes, but was unable to swap out any buffers to relieve
memory pressure. 1892 buffers were considered and 1892 were locked.
Either not enough memory is available to the pipeline because not
enough are installed, other processes were using it, or too many
buffers are locked.
Then I realized this was not a good idea and that I need to insert only new records and update existing ones. After that I tried this approach:
Get a list of all connection strings
For each database, copy new records and update existing ones (for those which need updating: copy from the source to a temp table, delete them from the destination, and copy from the temp table to the destination table)
Here's what the data flow task looks like:
In some cases the data flow processes more than a million rows. BUT I still get the same error: ran out of memory.
In Task Manager the situation is as follows:
I have to note that there are 28 databases being replicated on this same server, and when this package is not running SQL Server still uses more than 1GB of memory. I've read that this is normal, but now I'm not so sure...
I have installed a hotfix for SQL Server that I found in this article: http://support.microsoft.com/kb/977190
But it doesn't help...
Am I doing something wrong, or is this just the way things work and I'm supposed to find a workaround?
Thanks,
Ile
You might run into memory issues if your Lookup transformation is set to Full cache. From what I have seen, the Merge Join performs better than the Lookup transformation when the number of rows exceeds 10 million.
Have a look at the following, where I have explained the differences between the Merge Join and Lookup transformations.
What are the differences between Merge Join and Lookup transformations in SSIS?
I found a solution, and the problem was in SQL Server: it was consuming too much memory. By default, max server memory was set to 2147483647 (the default value). Since my server has 4GB of RAM, I limited this number to 1100 MB. Since then there have been no memory problems, but my data flow tasks were still very slow. The problem was in using the Lookup. By default, the Lookup selects everything from the lookup table; I changed this and selected only the columns I need for the lookup, which sped up the process several times.
Now the whole process of consolidation takes about 1:15h.
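For reference, capping the instance memory is done with sp_configure (1100 MB here only because that is the value mentioned above; pick what fits your box):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 1100;
RECONFIGURE;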

SQL Server table fast load isn't

I've inherited an SSIS package which loads 500K rows (about 30 columns) into a staging table.
It's been cooking now for about 120 minutes and it's not done --- this suggests it's running at less than 70 rows per second. I know that everybody's environment is different but I think this is a couple orders of magnitude off from "typical".
Oddly enough the staging table has a PK constraint on an INT (identity) column -- and now I'm thinking that it may be hampering the load performance. There are no other constraints, indexes, or triggers on the staging table.
Any suggestions?
---- Additional information ------
The source is a tab-delimited file which connects to two separate Data Flow components that add some static data (the run date and batch ID) to the stream, which then connects to an OLE DB Destination adapter.
Access mode is OpenRowset using FastLoad
FastLoadOptions are TABLOCK,CHECK_CONSTRAINTS
Maximum insert commit size: 0
I’m not sure about the etiquette of answering my own question -- so sorry in advance if this is better suited for a comment.
The issue was the datatype of the input columns from the text file: they were all declared as "text stream [DT_TEXT]", and when I changed that to "String [DT_STR]", 2 million rows loaded in 58 seconds, which is now in the realm of "typical". I'm not sure what the Text file source is doing when columns are declared that way, but it's behind me now!
I'd say there is a problem of some sort. I bulk insert into a staging table from a file with 20 million records, more columns, and an identity field in far less time than that, and SSIS is supposed to be faster than SQL Server 2000 bulk insert.
Have you checked for blocking issues?
If it is running in one big transaction, that may explain things. Make sure that a commit is done every now and then.
You may also want to check processor load, memory and IO to rule out resource issues.
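A quick way to check for blocking while the load is running is to look at sys.dm_exec_requests:
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM   sys.dm_exec_requests
WHERE  blocking_session_id <> 0;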
This is hard to say.
If there was complex ETL, I would check the max number of threads allowed in the data flows and see if some things can run in parallel.
But it sounds like it's a simple transfer.
With 500,000 rows, batching is an option, but I wouldn't think it necessary for that few rows.
The PK identity should not be an issue. Do you have any complex constraints or persisted computed columns on the destination?
Is this pulling or pushing over a slow network link? Is it pulling or pushing from a complex SP or view? What is the data source?
