We have the following requirement:
A large text file (about 44 GB) containing INSERT scripts for a table is given. We need to execute these scripts against a target SQL Server 2008 R2 database. We followed a two-step process to execute the scripts:
1. Bulk inserted all the INSERT statements into an intermediate table, one statement per row (approx. 22 million records).
2. Executed the statements in the intermediate table using a cursor.
The first step succeeds, but the second step is not effective: it is slow, and a few INSERT statements fail partway through execution. We are unable to locate the exact point of failure. Could you please suggest a more effective way of accomplishing the task?
Using a cursor is generally not recommended here; cursors are slow and memory hogs. Try a WHILE loop instead, and log each failing statement so you can find the exact point of failure (a sketch follows the reference link below).
Reference example:
SQL Server stored procedure avoid cursor
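Here is a minimal sketch of that approach, assuming the intermediate table is something like dbo.InsertStatements with an identity column Id and the statement text in a SqlText column (those names are placeholders, not your actual schema). It runs each statement inside TRY/CATCH and records failures, so you can see exactly where and why an insert breaks:
-- Hypothetical failure log; adjust names to your schema
CREATE TABLE dbo.InsertFailures (Id INT, ErrorMessage NVARCHAR(4000));

DECLARE @Id INT, @Sql NVARCHAR(MAX);
SELECT @Id = MIN(Id) FROM dbo.InsertStatements;

WHILE @Id IS NOT NULL
BEGIN
    SELECT @Sql = SqlText FROM dbo.InsertStatements WHERE Id = @Id;

    BEGIN TRY
        EXEC (@Sql);                              -- run one INSERT statement
    END TRY
    BEGIN CATCH
        INSERT INTO dbo.InsertFailures (Id, ErrorMessage)
        VALUES (@Id, ERROR_MESSAGE());            -- record which statement failed and why
    END CATCH;

    SELECT @Id = MIN(Id) FROM dbo.InsertStatements WHERE Id > @Id;
END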
I have a sample transformation setup for the purpose of this question:
Table Input step -> Table output step.
When I run the transformation and watch the live step metrics, the Table Output step loads roughly 11 rows per second, which is extremely slow. The commit size in the Table Output step is set to 1000. The SQL input returns 40k rows in about 10 seconds when run by itself, without the Table Output step attached. The input and output tables are located in the same database.
System Info:
pdi 8.0.0.0
Windows 10
SQL Server 2017
Table output is in general very slow.
If I'm not entirely mistaken, it does an insert for each incoming row, which takes a lot of time.
A much faster approach is a bulk loader. (The MySQL bulk loader, for example, streams data from inside Kettle to a named pipe and runs "LOAD DATA INFILE 'FIFO File' INTO TABLE ...."; for SQL Server the equivalent mechanism is BULK INSERT.)
You can read more about how MySQL's bulk loading works here: https://dev.mysql.com/doc/refman/8.0/en/load-data.html
Anyway: since you are reading from one table and writing to another table in the same database, I would instead add an 'Execute SQL script' step and do the whole copy with a single query (see the sketch after the link below).
If you take a look at this post, you can learn more about updating a table from another table in a single SQL-query:
SQL update from one Table to another based on a ID match
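For example (a sketch only, with made-up table and column names; adjust to your schema), the 'Execute SQL script' step could run a single set-based statement like this instead of pushing rows through Table Output:
-- One set-based copy inside SQL Server 2017 instead of 40k row-by-row inserts from PDI
INSERT INTO dbo.target_table (id, name)
SELECT s.id, s.name
FROM dbo.source_table AS s;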
I am working in SQL Server 2008 and BIDS. Due to some performance problems, I am re-designing my current architecture. Currently, I have a stored procedure with many INSERT INTO ... SELECT statements inside it. In the new architecture, I am trying to get the performance of SSIS for the inserts (instead of INSERT INTO run from SSMS). So, my new stored proc will still have all of the SELECT statements (just no INSERT INTO before each of them). I will call this stored proc from SSIS (supplying the few parameters the SELECTs need). My goal is to have each SELECT write to a separate flat file. (Actually, certain groups of SELECTs will write to separate flat files, so that I have just a few flat file connection managers instead of a billion.) I know how to execute a stored proc in SSIS and have it write a single multiple-row result set to a flat file. But is it possible for the execution of one stored proc in SSIS to write several multiple-row result sets to several flat files? If so, how can it be done?
You can have one stored proc write to as many files as you want. Please look at this article by Phil Factor, https://www.simple-talk.com/sql/t-sql-programming/reading-and-writing-files-in-sql-server-using-t-sql/
However, you lose all the power of SSIS, such as redirection of error rows, logging, and parallel processing. What you need to do sounds like a perfect SSIS task (or series of tasks).
Using a Data Flow for dynamic export is not possible because of SSIS's strict metadata architecture, but you can do it with Control Flow tasks: write a BCP command in an Execute Process Task and call it for each table you want to export.
Steps:
Call SELECT * FROM INFORMATION_SCHEMA.TABLES and capture the result set in a variable.
Use a Foreach Loop container to loop through the tables.
Use an Execute Process Task inside the loop to call BCP (see the sketch below).
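The command built inside the loop would look roughly like this (server, database, table, and output path are placeholders for illustration):
bcp MyDatabase.dbo.MyTable out C:\export\MyTable.txt -S MyServer -T -c
In the Execute Process Task you would set bcp.exe as the executable and build the argument string from the table-name variable supplied by the Foreach Loop.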
We have a datastore (the PowerBuilder DataWindow's twin sister) that contains over 40,000 rows, and it takes more than 30 minutes to insert them into a Microsoft SQL Server table.
Currently, I am using a script generator that generates the SQL table definition and an INSERT command for each row. At the end, the full script is sent to SQL Server for execution.
I have already found that the script generation process consumes more than 97% of the whole task.
Could you please help me find a more efficient way of copying my client's data to a SQL Server table?
Edit1 (after NoazDad's comments):
Before answering, please bear in mind that:
The table structure is dynamic;
I am trying to avoid using the datastore.Update() method;
Not sure it would be faster, but you could save the data from the datastore to a tab-delimited file and then do a BULK INSERT via SQL. Something like:
BULK INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
)
GO
You can try saving the datastore contents into a string variable via the ds.object.datawindow.data syntax, write that string to a file, and then execute the BULK INSERT.
The way I read this, you're saying that the table that the data is being inserted into doesn't even exist in the schema until the user presses "GO" and initiates the script? And then you create embedded SQL statements that create the table, and insert rows 1 by 1 in a loop?
That's... Well, let's just say I wouldn't do it this way.
Do you not have any idea what the schema will look like ahead of time? If you do, then paint the datastore against that table, and use ds_1.Update() to generate the INSERT statements. Use the datawindow for what it's good for.
If that's not possible, and you must use embedded SQL, then at least perform a COMMIT every 1000 rows or so. Otherwise, SQL Server keeps accumulating transaction log for the entire operation, in case something goes wrong and it all has to be rolled back.
Other ideas...
Disable triggers on the updated table while it is being updated (if possible)
Use the PB Pipeline object; it has settings for commit size. It might be faster, but not by much.
Best idea: do something on the server side. I'd generate the SQL statements for your 40K rows, send all of them to a stored procedure in one call, and let the stored procedure handle the inserts/updates.
Create a dummy table with a few columns, one being a long text column; populate it with a block of SQL statements as in the previous idea, and have a server-side process that splits and executes those statements.
Some variant of the above, but using bulk insert as mentioned by Matt; bulk insert is the fastest way to insert many rows.
Maybe turn autocommit off so that you commit only at the end, or every 10K rows as someone already mentioned.
PB has an async option in the transaction object (the connection); maybe you could let the update run in the background and let the user continue. This doesn't work with all databases and may not work in your situation; I haven't had much luck with the async option.
The reason your process is so slow is that PB does each update separately, so you are hitting the network and database constantly. There may be triggers on the target table, and those are getting hammered too. Running the statements on the server eliminates the network lag and is much faster. Using bulk load is faster still, because it doesn't fire triggers and eliminates a lot of database management overhead.
Expanding on the idea of sending SQL statements to a procedure: you can create the SQL very easily by doing a dw_1.saveas( SQL! ) (the syntax is not exactly right) and send it to the server all at once. Let the server parse it and run the SQL.
Send something like this to the server via a procedure call; it should run pretty fast, as it is only one round trip:
INSERT INTO mytable (col1, col2) VALUES ('a', 'b')|INSERT INTO mytable (col1, col2) VALUES ('c', 'd')|INSERT INTO mytable (col1, col2) VALUES ('e', 'f')
In the procedure:
Parse the SQL statements and run them (a sketch follows below). Easy peasy.
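A minimal sketch of such a procedure, assuming the statements arrive as one pipe-delimited NVARCHAR(MAX) parameter (the procedure and parameter names here are made up for illustration):
CREATE PROCEDURE dbo.usp_run_statement_batch
    @Batch NVARCHAR(MAX)           -- pipe-delimited list of SQL statements
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Pos INT, @Stmt NVARCHAR(MAX);

    WHILE LEN(@Batch) > 0
    BEGIN
        SET @Pos = CHARINDEX('|', @Batch);

        IF @Pos = 0
        BEGIN
            SET @Stmt  = @Batch;   -- last (or only) statement
            SET @Batch = '';
        END
        ELSE
        BEGIN
            SET @Stmt  = LEFT(@Batch, @Pos - 1);
            SET @Batch = SUBSTRING(@Batch, @Pos + 1, LEN(@Batch));
        END

        IF LEN(@Stmt) > 0
            EXEC (@Stmt);          -- run one INSERT/UPDATE statement
    END
END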
While Matt's answer is probably best, I have another option. (Options are good, right?)
I'm not sure why you're avoiding the datastore.Update() method. I'm assuming it's because the schema doesn't exist at the time of the update. If that's the only reason, it can still be used, thus eliminating 40,000 instances of string manipulation to generate valid SQL.
To do it, you would first create the table. Then, you would use SyntaxFromSQL() on the transaction object to generate DataWindow syntax and create a datastore that's bound to the table. It might take a couple of Modify() statements to make the datastore updateable. Then you'd move the data from your original datastore to the updateable, bound datastore. (Look at RowsMove() or dot notation.) After that, a single Update() call generates all of your SQL without the overhead of string building and looping.
I need to copy a large number (~200,000) of records between two tables inside the same SQL Server 2000 database.
I can't change the original table to include the columns I would need, so the copy is the only solution.
I wrote a script with an INSERT ... SELECT statement. It works, but sometimes the .NET form that triggers the stored procedure catches an exception with a "timeout expired" error.
Is there a more effective way to copy this many records around?
Any tips about how to check where the timeout occurred in the database?
INSERT INTO your_target_table (id, name)
SELECT id, name FROM your_table WHERE your_condition
I'd also suggest running the query on a separate thread so your form won't freeze. You can also increase the command timeout; note that the Connect Timeout in the connection string only covers opening the connection, not running the command.
If you can't avoid the multiple inserts, you can try splitting them into smaller batches, for instance sending only 50 queries at a time.
Are you trying to build an application to copy data between tables, or is this a one-off job? If you only need to do this once, you should create a script and execute it on the database server itself to copy the data between the tables (a sketch follows below).
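A minimal sketch of such a server-side script, copying in batches so no single statement runs long (table and column names are placeholders; SET ROWCOUNT is used because this is SQL Server 2000):
SET ROWCOUNT 50000                 -- limit each INSERT to 50,000 rows (SQL Server 2000 style)

WHILE 1 = 1
BEGIN
    INSERT INTO target_table (id, name)
    SELECT s.id, s.name
    FROM source_table s
    WHERE NOT EXISTS (SELECT 1 FROM target_table t WHERE t.id = s.id)

    IF @@ROWCOUNT = 0 BREAK        -- nothing left to copy
END

SET ROWCOUNT 0                     -- reset to default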
Are you using a SqlCommand to execute the stored procedure?
If so, set the CommandTimeout:
myCmd.CommandTimeout = 360; //value is in seconds.
1. Compare the two databases with Redgate SQL Data Compare; since the other table is empty, the script generated by the comparison will be all inserts. Select the inserts for that table only.
2. Use SQL Multi Script from Redgate: add those scripts to Multi Script and execute them against the target database. It will keep executing until complete, and afterwards you can compare again to check that all the data arrived correctly.
3. If you don't want to use Multi Script, create a command-line application to insert the data.
I have data coming in from DataStage that is being put into a table, stg_table_outside_data, in our SQL Server 2008 database. The outside source loads the data into that table every morning. I want to move the data from stg_table_outside_data to table_outside_data, where I keep multiple days' worth of data.
I created a stored procedure that inserts the data from stg_table_outside_Data into table_outside_data and then truncates stg_table_outside_Data. The outside DataStage process is outside of my control, so I have to do this all within SQL Server 2008. I had originally planned on using a simple AFTER INSERT trigger, but DataStage commits after every 100,000 rows. The trigger would run after the first commit and cause a deadlock error for the DataStage process.
Is there a way to set up an after insert to wait 30 minutes then make sure there wasn't a new commit within that time frame? Is there a better solution to my problem? The goal is to get the data out of the staging table and into the working table without duplications and then truncate the staging table for the next morning's load.
I appreciate your time and help.
One way you could do this is take advantage of the new MERGE statement in SQL Server 2008 (see the MSDN docs and this blog post) and just schedule that as a SQL job every 30 minutes or so.
The MERGE statement allows you to easily just define operations (INSERT, UPDATE, DELETE, or nothing at all) depending on whether the source data (your staging table) and the target data (your "real" table) match on some criteria, or not.
So in your case, it would be something like:
MERGE table_outside_data AS target
USING stg_table_outside_data AS source
    ON (target.ProductID = source.ProductID)  -- whatever join makes sense for you
WHEN NOT MATCHED BY TARGET THEN
    INSERT VALUES(.......);
-- rows that already match are left alone simply by omitting the WHEN MATCHED clause
-- (T-SQL does not allow a WHEN MATCHED branch with no action); note that MERGE must end with a semicolon
You shouldn't be using a trigger to do this, you should use a scheduled job.
Maybe build a stored procedure that moves all the data from stg_table_outside_Data to table_outside_data once a day, scheduled with SQL Server Agent (a sketch follows below).
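A minimal sketch of such a procedure, with made-up column names (ProductID from the MERGE example above, plus hypothetical LoadDate and SomeValue); it copies only rows that are not already present and then clears the staging table for the next morning's load:
CREATE PROCEDURE dbo.usp_move_outside_data    -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- Copy only rows not already in the working table, to avoid duplicates
    INSERT INTO table_outside_data (ProductID, LoadDate, SomeValue)
    SELECT s.ProductID, s.LoadDate, s.SomeValue
    FROM stg_table_outside_data AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM table_outside_data AS t
                      WHERE t.ProductID = s.ProductID);

    -- Clear the staging table for the next load
    TRUNCATE TABLE stg_table_outside_data;

    COMMIT TRANSACTION;
END
Schedule it with SQL Server Agent for a time safely after the DataStage load has finished.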
Do a row count in the trigger; if the count is less than 100,000, do nothing. Otherwise, run your process.