SQL - Compare speed of query execution - sql-server

I have to insert 100,000 records into a table at one time.
I wrote two methods for it.
One is to loop through the 100,000 values in VB.net and insert them one by one.
The other is to send a DataTable as a parameter from VB.net to a stored procedure in SQL Server.
I have to write a report about the difference in performance between the two. How do I get the exact time each one took to execute?
Any help would be appreciated.

You will need to print or insert into another table the current time at the start and at the end. For this you can use GETDATE().
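For example, a minimal timing sketch along those lines (dbo.InsertRecords and dbo.TimingLog are placeholder names, not objects from the question):

DECLARE @start DATETIME, @end DATETIME;

SET @start = GETDATE();

EXEC dbo.InsertRecords;   -- placeholder for whichever insert method is being measured

SET @end = GETDATE();

-- either print the elapsed milliseconds...
PRINT DATEDIFF(MILLISECOND, @start, @end);

-- ...or log them to a table for the report
INSERT INTO dbo.TimingLog (MethodName, StartTime, EndTime, ElapsedMs)
VALUES ('DataTable parameter', @start, @end, DATEDIFF(MILLISECOND, @start, @end));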
On a side note, if you need to copy rows from one table to another, I would consider using SqlBulkCopy. It allows you to efficiently bulk load a SQL Server table with data from another source.

Related

Pentaho Data Integration SQL Server Table Output step Performance Issues

I have a sample transformation setup for the purpose of this question:
Table Input step -> Table output step.
When running the transformation and looking at the live stats I see this:
The table output step loads ~11 rows per second which is extremely slow. My commit size in the Table Output step is set to 1000. The SQL input is returning 40k rows and returns in 10 seconds when run by itself without pointing to the table output. The input and output tables are located in the same database.
System Info:
pdi 8.0.0.0
Windows 10
SQL Server 2017
Table output is in general very slow.
If I'm not entirely mistaken, it does an insert for each incoming row, which takes a lot of time.
A much faster approach is using 'bulk load', which streams data from inside Kettle to a named pipe using "LOAD DATA INFILE 'FIFO File' INTO TABLE ....".
You can read more about how bulk loading works here: https://dev.mysql.com/doc/refman/8.0/en/load-data.html
Anyway: if you are moving data from one table to another in the same database, I would create an 'Execute SQL script' step and do the update with a single query.
If you take a look at this post, you can learn more about updating a table from another table in a single SQL query:
SQL update from one Table to another based on a ID match
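As an illustration of that single-query approach, a T-SQL sketch (target_table, source_table, and the column names are placeholders, not from the question):

UPDATE t
SET    t.name  = s.name,
       t.value = s.value
FROM   target_table AS t
INNER JOIN source_table AS s
        ON s.id = t.id;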

Call stored procedure from SSIS Dataflow

The question in short:
Can I call a stored procedure that has an output parameter in a data flow?
In long:
I have many tables to extract, transform, and load from one db to another one.
Almost all of the tables require one transformation which is fixing the country codes (from 3 letters to two). So my idea is as follows:
for each row: call the stored procedure, pass the wrong country code, replace the wrong code with the correct one (the output of the stored procedure)
There are at least two solutions for this:
Look-Up component: configure it in advanced mode and make sure the last statement of the SProc is the SELECT that returns the good country code (e.g. SELECT @good_country_code)
Using an OLEDB Command
The latter (OLEDB Command) is actually quite simple; you need to configure it with:
EXEC ? = dbo.StoredProc @param1 = ?, @param2 = ?
As a consequence, a @RETURN_VALUE will appear in the Available Destination Columns, which you can then map to an existing column in the pipeline. Remember to create a new pipeline field/column (e.g. Good_Country_Code) using a Derived Column component before the OLEDB Command; that way you can keep both values, or replace the wrong one using another Derived Column component after the OLEDB Command.
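For context, a hypothetical shape for such a stored procedure (dbo.CountryCodeMap, the parameter names, and the data types are assumptions); its last statement returns the good code, which is what the Look-Up variant in advanced mode consumes:

CREATE PROCEDURE dbo.StoredProc
    @param1 CHAR(3),            -- wrong (3-letter) country code
    @param2 CHAR(2) OUTPUT      -- good (2-letter) country code
AS
BEGIN
    SET NOCOUNT ON;

    SELECT @param2 = Country2
    FROM   dbo.CountryCodeMap   -- assumed mapping table
    WHERE  Country3 = @param1;

    -- last statement returns the good code, as described above
    SELECT @param2 AS good_country_code;
END;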
No, natively there isn't a component that is going to handle that. You can accomplish it with a Script Component but you don't want to.
What you're describing is a Lookup. The Data Flow Task has a Lookup Component, and especially for a finite list of values like countries, you'll be better served by pushing your query into the component.
SELECT T.Country3, T.Country2 FROM dbo.Table T;
Then you drag your SourceCountry column and match to Country3. Check Country2 and for all the rows that match, you'll get the 2 letter abbreviation.
A big disadvantage of trying to use your stored procedure is efficiency. The default Lookup is going to cache all those values. With the Script Component version, say you have 10k rows come through, all with CAN: that's 10k invocations of your stored procedure where the results never change.
You do pay a startup cost, as the default Lookup mode is Full Cache, which means it's going to run your query and keep all those values local. This is great with your data set: 1000 countries max, 5 or 10 bytes per row. That's nothing.
Yes, you can. You'll want to use a couple Execute SQL Tasks to do this.
Use an Execute SQL Task to gather a Result Set of Wrong_Country_Codes.
Add a ForEach Container as a successor to the previous Execute SQL Task. Pass the Result Set to this Container.
Inside that ForEach container, you will have another Execute SQL Task that will call your sproc, using each row (e.g. Wrong_Country_Code) as a variable parameter.
That should work. Only select the columns necessary to pass to your stored procedure.
Edit
In acknowledgement of the other answer, performance is going to be an issue. Perhaps rather than having the stored procedure produce an output, alter the sproc to do the updates for you.
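For example, the altered sproc could do the whole fix set-based in one pass (dbo.FixCountryCodes, dbo.DestinationTable, and dbo.CountryCodeMap are assumed names):

CREATE PROCEDURE dbo.FixCountryCodes
AS
BEGIN
    SET NOCOUNT ON;

    -- translate 3-letter codes to 2-letter codes in a single UPDATE
    UPDATE d
    SET    d.CountryCode = m.Country2
    FROM   dbo.DestinationTable AS d
    INNER JOIN dbo.CountryCodeMap AS m
            ON m.Country3 = d.CountryCode;
END;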

What is the fastest way of copying a data from a DataWindow/DataStore to SQL Server table using Powerbuilder

We have a datastore (PowerBuilder datawindow's twin sister) that contains over 40,000 rows, which take more than 30 minutes to insert into a Microsoft SQL Server table.
Currently, I am using a script generator that generates the SQL table definition and an insert command for each row. At the end, the full script is sent to SQL Server for execution.
I have already found that the script generation process consumes more than 97% of the whole task.
Could you please help me find a more efficient way of copying my client's data to a SQL Server table?
Edit1 (after NoazDad's comments):
Before answering, please bear in mind that:
Table structure is dynamic;
I am trying to avoid using datastore.Update() method;
Not sure it would be faster, but you could save the data from the datastore to a tab-delimited file and then do a BULK INSERT via SQL. Something like:
BULK INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
)
GO
You can try saving the datastore contents into a string variable via the ds.object.datawindow.data syntax, then save that to a file, and then execute the SQL.
The way I read this, you're saying that the table that the data is being inserted into doesn't even exist in the schema until the user presses "GO" and initiates the script? And then you create embedded SQL statements that create the table, and insert rows 1 by 1 in a loop?
That's... Well, let's just say I wouldn't do it this way.
Do you not have any idea what the schema will look like ahead of time? If you do, then paint the datastore against that table, and use ds_1.Update() to generate the INSERT statements. Use the datawindow for what it's good for.
If that's not possible, and you must use embedded SQL, then at least perform a COMMIT every 1000 rows or so. Otherwise, SQL Server keeps accumulating rollback information against the table in the transaction log, in case something goes wrong and the inserts have to be rolled back.
Other ideas...
Disable triggers on the updated table while it is being updated (if possible)
Use the PB Pipeline object; it has settings for commit. It might be faster, but not by much.
Best idea: do something on the server side. I'd create the SQL statements for your 40k inserts, send all 40k insert/update statements to a stored procedure in one call, and let the stored procedure handle the inserts/updates.
Create a dummy table with a few columns, one being a long text column; load it with a block of SQL statements as mentioned in the last idea, and have a process that splits the block on a delimiter and executes the SQL statements.
Some variant of the above, but using bulk insert as mentioned by Matt. Bulk insert is the fastest way to insert many rows.
Maybe try something with autocommit so that you commit only at the end, or every 10k rows as mentioned by someone already.
PB has an async option in the transaction object (connection); maybe you could let the update run in the background and let the user continue. This doesn't work with all databases and may not work in your situation. I haven't had much luck with the async option.
The reason your process is so slow is that PB does each update separately, so you are hitting the network and database constantly. There may be triggers on the updated table, and those are getting hammered too. Slamming the statements in on the server eliminates network lag and is much faster. Using bulk load is even faster yet, because it doesn't run triggers and eliminates a lot of the database management overhead.
Expanding on the idea of sending SQL statements to a procedure: you can create the SQL very easily by doing a dw_1.SaveAs( SQL! ) (syntax is not exact) and send it to the server all at once. Let the server parse it and run the SQL.
Send something like this to the server via the procedure; it should run pretty fast as it is only one round trip:
INSERT INTO mytable (col1, col2) VALUES ('a', 'b')|INSERT INTO mytable (col1, col2) VALUES ('a', 'b')|INSERT INTO mytable (col1, col2) VALUES ('a', 'b')
In the procedure:
Parse the SQL statements and run them. Easy peasy.
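A rough sketch of what that server-side procedure could look like (the procedure name, the '|' delimiter, and VARCHAR(MAX), which needs SQL Server 2005 or later, are assumptions):

CREATE PROCEDURE dbo.RunDelimitedSql
    @sql_block VARCHAR(MAX)   -- '|'-delimited block of statements
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @pos INT, @stmt VARCHAR(MAX);
    SET @pos = CHARINDEX('|', @sql_block);

    WHILE LEN(@sql_block) > 0
    BEGIN
        IF @pos > 0
        BEGIN
            SET @stmt      = LEFT(@sql_block, @pos - 1);
            SET @sql_block = SUBSTRING(@sql_block, @pos + 1, LEN(@sql_block));
        END
        ELSE
        BEGIN
            SET @stmt      = @sql_block;
            SET @sql_block = '';
        END

        IF LEN(@stmt) > 0
            EXEC (@stmt);   -- run one statement

        SET @pos = CHARINDEX('|', @sql_block);
    END
END;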
While Matt's answer is probably best, I have another option. (Options are good, right?)
I'm not sure why you're avoiding the datastore.Update() method. I'm assuming it's because the schema doesn't exist at the time of the update. If that's the only reason, it can still be used, thus eliminating 40,000 instances of string manipulation to generate valid SQL.
To do it, you would first create the table. Then, you would use datastore.SyntaxFromSQL() to create a datastore that's bound to the table. It might take a couple of Modify() statements to make the datastore update-able. Then you'd move the data from your original datastore to the update-able, bound datastore. (Look at RowsMove() or dot notation.) After that, an Update() statement generates all of your SQL without the overhead of string parsing and looping.

SQL Server - Copy Data Between Tables

I need to copy a large amount (~200,000) of records between two tables inside the same SQL Server 2000 database.
I can't change the original table to include the columns I would need, so the copy is the only solution.
I made a script with an INSERT ... SELECT statement. It works, but sometimes the .net form that triggers the stored procedure catches an exception with a timeout expired error.
Is there a more effective way to copy this many records around?
Any tips about how to check where the timeout occurred in the database?
INSERT INTO your_other_table (id, name)
SELECT id, name FROM your_table WHERE your_condition
And I'd suggest putting the call on a different thread so your form won't freeze. You can also increase the command timeout on the command object (the timeout in the connection string only covers opening the connection).
If you can't avoid the multiple inserts, you can try to split them into smaller batches, for instance sending only 50 queries at a time.
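Along the same lines, if you keep the single INSERT ... SELECT, you can chunk it server-side so that no single statement runs long enough to hit the client timeout. A sketch, using the same placeholder names as above:

WHILE 1 = 1
BEGIN
    INSERT INTO your_other_table (id, name)
    SELECT TOP 50000 s.id, s.name
    FROM   your_table AS s
    WHERE  NOT EXISTS (SELECT 1 FROM your_other_table AS d WHERE d.id = s.id);

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to copy
END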
Are you wanting to create an application to copy data between tables or is this just a one-off solution? If you only need to do this once, you should create a script to execute on the database server itself to copy the data you need to transfer between tables.
Are you using a SqlCommand to execute the stored procedure?
If so, set the CommandTimeout:
myCmd.CommandTimeout = 360; //value is in seconds.
1> Compare the two databases with Redgate Data Compare. Since the other table is empty, the script generated by the comparison will be all inserts. Select the inserts for that table only.
2> Use Multi Script from Redgate: add those scripts to it and execute them against that database. It will keep executing until complete, and then you can compare to confirm you have all the data correctly.
3> If you don't want to use Multi Script, create a command-line application to just insert the data.

After insert trigger - SQL Server 2008

I have data coming in from DataStage that is being put in our SQL Server 2008 database in a table: stg_table_outside_data. The outside source is putting the data into that table every morning. I want to move the data from stg_table_outside_data to table_outside_data, where I keep multiple days' worth of data.
I created a stored procedure that inserts the data from stg_table_outside_Data into table_outside_data and then truncates stg_table_outside_Data. The outside DataStage process is outside of my control, so I have to do this all within SQL Server 2008. I had originally planned on using a simple AFTER INSERT trigger, but DataStage is doing a commit after every 100,000 rows. The trigger would run after the first commit and cause a deadlock error for the DataStage process.
Is there a way to set up an after insert to wait 30 minutes then make sure there wasn't a new commit within that time frame? Is there a better solution to my problem? The goal is to get the data out of the staging table and into the working table without duplications and then truncate the staging table for the next morning's load.
I appreciate your time and help.
One way you could do this is to take advantage of the new MERGE statement in SQL Server 2008 (see the MSDN docs and this blog post) and just schedule it as a SQL job every 30 minutes or so.
The MERGE statement allows you to easily just define operations (INSERT, UPDATE, DELETE, or nothing at all) depending on whether the source data (your staging table) and the target data (your "real" table) match on some criteria, or not.
So in your case, it would be something like:
MERGE table_outside_data AS target
USING stg_table_outside_data AS source
ON (target.ProductID = source.ProductID) -- whatever join makes sense for you
WHEN NOT MATCHED THEN
    INSERT VALUES(.......);
-- no WHEN MATCHED clause is needed if matched rows should be left alone
You shouldn't be using a trigger to do this, you should use a scheduled job.
Maybe build a procedure that moves all data from stg_table_outside_Data to table_outside_data once a day, using the job scheduler.
Do a row count in the trigger; if the count is less than 100,000, do nothing. Otherwise, run your process.
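A sketch of that check (the trigger, table, and procedure names here are placeholders; an AFTER INSERT trigger fires once per INSERT statement, so the inserted pseudo-table holds the whole batch):

CREATE TRIGGER trg_stg_table_outside_data_ai
ON stg_table_outside_data
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    IF (SELECT COUNT(*) FROM inserted) < 100000
        RETURN;   -- smaller batch: do nothing

    -- full batch has arrived: run the move/truncate process
    EXEC dbo.usp_move_staging_data;   -- hypothetical procedure doing the copy and truncate
END;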
