I have a package with one For Each Loop container that contains a bunch of inserts and selects.
The list the loop iterates over contains a few million complex rows.
The package is deployed to the Integration Services catalog and is run by simply executing it from SSMS (no Agent job).
When I look at Resource Monitor, memory for ISServerExec.exe (the counterpart of dtexec.exe) grows every second (each iteration of the For Each loop takes about one second to complete).
After a while all the memory on the Windows server is used and the server ends up paging to disk. At that point the wait times for the loop's queries become huge, 20-30 seconds per query.
What am I doing wrong?
I would write the list to a SQL table, then loop using a For Loop container wrapped around your For Each container.
At the start of the For Loop container I would read a single record from the list table using SELECT TOP 1, and deliver it into the Recordset variable. The scope of that variable should be moved to the For Loop container.
At the end of the For Loop Container I would update a flag and/or a datetime column to indicate that the row has been processed and should not be included in the next iteration of the initial SELECT.
Along the way you can update the list table to indicate progress/status of each row.
This design is also useful for logging and restart requirements.
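A minimal sketch of the two queries this would involve, assuming a hypothetical list table dbo.WorkList with a Processed flag and a ProcessedAt column (all names are placeholders):

    -- Start of each For Loop iteration: pick one unprocessed row
    SELECT TOP (1) WorkListId, Payload
    FROM dbo.WorkList
    WHERE Processed = 0
    ORDER BY WorkListId;

    -- End of each For Loop iteration: mark the row done so the next SELECT skips it
    UPDATE dbo.WorkList
    SET Processed = 1, ProcessedAt = GETDATE()
    WHERE WorkListId = ?;   -- parameter mapped from the SSIS variable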
I had been running a query for 7 days and it was near its end when I got a network error:
Msg 121, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
The query is still running in its process on SQL Server and is not yet rolling back.
It was looping through 10 parameters and, for each parameter, carrying out however many updates were required to match up all the records (somewhere between 10 and 50 updates per parameter) until no rows were affected, then moving on to the next parameter.
After 7 days it had reached the point where only one row at a time was being updated on the last parameter, when I had a short network drop.
I have used a dirty read to copy the results out to a different table.
It still shows up in Activity Monitor (active expensive queries), in sp_who/sp_who2, and in sys.sysprocesses.
After the update statement it should go on to print out the number of iterations and then deallocate the parameters that were passed in through a cursor.
It is a WHILE @@ROWCOUNT > 0 loop inside a WHILE @@FETCH_STATUS = 0 loop, where the cursor is going through a comma-separated list of parameters.
Looking at sys.sysprocesses, the CPU count continues to increase and it shows open_tran = 2.
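For reference, this is roughly how I keep checking on it (session ID 53 is just an example standing in for the real SPID; sys.dm_exec_requests shows the same session that sp_who2 and sys.sysprocesses do):

    -- Example only: 53 stands in for the SPID from sp_who2 / Activity Monitor
    SELECT session_id, status, command, cpu_time, reads, writes,
           open_transaction_count, wait_type, percent_complete
    FROM sys.dm_exec_requests
    WHERE session_id = 53;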
Is it possible to connect to a specific process in SQL Server?
If so, what client can I use (SQL Server Management Studio, code, or mssql on Linux)?
If it's not possible to connect, can I monitor it to see whether it completes?
Any other suggestions?
I used a dirty read to copy all the data out of the table being updated.
The query did complete; it continued running for about an hour and then committed everything before the process ended.
I was then able to compare the dirty read copy against the final results.
Next time I go for updating in a loop until there are no more updates, I will put in periodic commits as it goes along (roughly as sketched below).
Roughly 1,000 ten-minute update queries were run. The previous set of update queries took about 4 minutes each, so I looped them up with parameters; one mistake, since it was easy to make, was to add in 10 parameters rather than 5, so 7 days instead of 3.5 days, against an initial estimate of 1.5 days.
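For what it's worth, the shape I have in mind for next time is roughly this (table and column names are placeholders; the point is only that each batch commits on its own, so a dropped connection can't strand days of work in one open transaction):

    DECLARE @rows int = 1;

    WHILE @rows > 0
    BEGIN
        BEGIN TRANSACTION;

        -- Placeholder update: match up another batch of records
        UPDATE TOP (10000) r
        SET    r.MatchedValue = s.MatchedValue
        FROM   dbo.Records AS r
        JOIN   dbo.SourceData AS s ON s.KeyValue = r.KeyValue
        WHERE  r.MatchedValue IS NULL;

        SET @rows = @@ROWCOUNT;

        COMMIT TRANSACTION;
    END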
We have a requirement where an SSIS job should trigger based on the availability of a value in a status table we maintain. The point to remember here is that we are not sure of the exact time when the status will become available, so my SSIS process must continuously look for the value in the status table; if a value (e.g. "success") is available in the status table, then the job should trigger. We have 20 different SSIS batch processes, and each should be invoked when its respective/related status value becomes available.
What you can do is:
Schedule the SSIS package to run frequently.
In that scheduled package, assign the value from the status table to a package variable (see the sketch after these steps).
Use either an expression to disable the task or a precedence constraint expression to let the package proceed.
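For the second step, the Execute SQL Task could run something like this, with the single-row result mapped to the package variable (table and column names are placeholders):

    -- Hypothetical status table; returns the latest status value for one batch
    SELECT TOP (1) StatusValue
    FROM dbo.ProcessStatus
    WHERE ProcessName = 'Batch01'
    ORDER BY StatusDate DESC;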
Starting an SSIS package takes some time, so I would recommend creating a package with the following structure:
A package variable Check_run of type Int, initial value 1440 (to stop the run after 24 hours if we run the check every minute). This is to avoid an infinite package run.
A For Loop container that checks whether Check_run is greater than zero and decrements it on each loop run.
Inside the For Loop, check your flag in an Execute SQL Task: select a single result value and assign it to a variable, say Flag.
Create conditional execution branches based on the Flag variable's value. If Flag is set to run, start the other packages. Otherwise, wait for a minute with the SQL command WAITFOR DELAY '00:01:00'.
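The wait branch of the Execute SQL Task then only needs the delay itself:

    -- Pause for one minute before the next For Loop iteration
    WAITFOR DELAY '00:01:00';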
You mentioned the word trigger. How about creating a trigger that fires when that status column meets the criteria to run the packages?
Also this is how to run a package from T-SQL:
https://www.timmitchell.net/post/2016/11/28/a-better-way-to-execute-ssis-packages-with-t-sql/
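For reference, a package deployed to the catalog can be started from T-SQL roughly like this (folder, project, and package names are placeholders; the SYNCHRONIZED parameter is optional and makes the call wait for the package to finish):

    DECLARE @execution_id bigint;

    -- Create an execution for the catalog-deployed package
    EXEC SSISDB.catalog.create_execution
         @folder_name      = N'MyFolder',
         @project_name     = N'MyProject',
         @package_name     = N'MyPackage.dtsx',
         @use32bitruntime  = 0,
         @execution_id     = @execution_id OUTPUT;

    -- Optional: run synchronously so the caller waits for completion
    EXEC SSISDB.catalog.set_execution_parameter_value
         @execution_id,
         @object_type      = 50,
         @parameter_name   = N'SYNCHRONIZED',
         @parameter_value  = 1;

    EXEC SSISDB.catalog.start_execution @execution_id;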
You might want to consider creating a master package that runs all the packages associated with this trigger.
I would take #Long's approach, but enhance it by doing the following:
1.) Use an Execute SQL Task to query the status table for all records that pertain to the specific job function and load the results into a recordset. Note: the variable that you are loading the recordset into must be of type Object. (A sketch of this query appears at the end of this answer.)
2.) Create a Foreach Loop enumerator of type ADO to loop over the recordset.
3.) Do stuff.
4.) When the job is complete, go back to the status table and mark the record complete so that it is not processed again.
5.) Set the job to run periodically (e.g., every minute, hourly, daily, etc.).
The enhancement here is that no flags are needed to govern the job. If a record exists, the Foreach Loop does its job. If no records exist in the recordset, the job exits successfully. This simplifies the design.
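For steps 1 and 4, the queries could look roughly like this (status table and column names are placeholders):

    -- Step 1: load all unprocessed records for this job function into the Object variable
    SELECT StatusId, BatchName, StatusValue
    FROM dbo.ProcessStatus
    WHERE JobFunction = 'MyJobFunction'
      AND IsComplete = 0;

    -- Step 4: mark a record complete once its work is done
    UPDATE dbo.ProcessStatus
    SET IsComplete = 1, CompletedAt = GETDATE()
    WHERE StatusId = ?;   -- parameter mapped from the Foreach Loop variable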
I have a SQL Server job that picks up a maximum of 1,000 items from a queue for processing at an interval of 1 minute.
In the job I use MERGE INTO the table I need, mark the status of these items as complete, and the job completes and processes the next batch in the next interval.
All good so far, except that recently there was an incident where one of the items had a problem, and since we process the batch in a single SQL statement, the whole batch failed because of that one item.
No big deal, as we later identified the faulty item, had it patched, and reprocessed the whole failed batch.
What I am interested to know is: what are some of the things I can do to avoid failing the entire batch?
This time I know the reason for the faulty item, so I can add a check to flush it out before the single MERGE INTO statement, but that does not cover other unknown errors.
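For context, the pre-flush check I have in mind is roughly this, run just before the MERGE so that items failing a known validation are pulled out of the batch (table names, column names, and the condition are placeholders; it only catches errors I can predict):

    -- Move items that would break the MERGE into an error state instead
    UPDATE q
    SET    q.Status = 'Error',
           q.ErrorMessage = 'Failed pre-merge validation'
    FROM   dbo.ProcessingQueue AS q
    WHERE  q.Status = 'Pending'
      AND  (q.Payload IS NULL OR LEN(q.Payload) = 0);   -- stand-in for the known faulty condition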
I'm copying 99 million rows from one SQL Server instance to another using the right-click "Tasks" > "Import Data" method. It's just a straight copy into a new, empty table on a new and empty NDF file. I'm using identity insert when doing the copy so that the IDs will stay intact. It was going very slowly (30 million records after 12 hours), so my boss told me to cancel it, remove all indexes from the new empty table, then run it again.
Will removing indexes on the new table really speed up the transfer of records, and why? I imagine I can create indexes after the table is filled.
What is the underlying process behind right-click "Import Data"? Is it using SqlBulkCopy? Is it logging tons of stuff? I know it's not in a transaction because cancelling it stopped it immediately and the already inserted rows were there.
The file growth on the NDF file that holds the table is set to 20MB. Will increasing this speed up the process when loading the 99 million records? It's just an idea I had.
Yes, it should. Each new row being inserted will cause each index to be updated with the new data. It's worth noting that if you remove the indexes, import, then re-add the indexes, those indexes will take a very long time to build anyway.
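For example (index, table, and column names are placeholders):

    -- Drop the nonclustered index before the load...
    DROP INDEX IX_BigTable_SomeColumn ON dbo.BigTable;

    -- ...run the import, then recreate the index afterwards
    CREATE NONCLUSTERED INDEX IX_BigTable_SomeColumn
        ON dbo.BigTable (SomeColumn);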
It essentially runs as a very simple SSIS package. It reads rows from the source and inserts in chunks as a transaction. If your recovery model is set to Full, you could switch it to Bulk Logged for the import. This should be done if you're bulk-moving data when other updates to the database won't be happening, though.
I would try to size the MDF/NDF close to what you'd expect the end result to be. The autogrowth can take time, especially if you have it set low.
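A rough sketch of both suggestions (database name, logical file name, and sizes are placeholders; size the file close to the expected final volume):

    -- Switch to bulk-logged recovery for the duration of the import
    ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

    -- Pre-grow the data file and use a bigger growth increment than 20MB
    ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDataFile_ndf, SIZE = 50GB, FILEGROWTH = 2GB);

    -- ...run the import, then switch back...
    ALTER DATABASE MyDatabase SET RECOVERY FULL;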
I have to make a C application with OCI that retrieves new rows from the database; I mean rows added between the last session and the current one. ora_rowscn is not a solution: this value is maintained per block, so several different rows can have the same SCN.
For example, I have a table with dates:
03.05.2015
05.05.2015
07.05.2015
I can make a structure:
    struct Bounds {
        Timestamp start, end;
    };
03.05.2015 is the start and 07.05.2015 is the end.
Checking rows after Bounds.end is simple. But there could be some delay or a transaction after my last query, so I can have new values:
03.05.2015
04.05.2015
05.05.2015
06.05.2015
07.05.2015
The count of these new rows can be detected by a query (START and END are the values from the structure):
    select count(*) from logs where log_time > START and log_time < END
Then I have 3 rows at first and 5 rows after it. My application has only read permission.
An Oracle database is a concurrent environment, so generally there is no way to tell which is the "last" inserted row, because technically there is no last inserted row.
AFAIK you have two options:
Use Continuous Query Notification. This bypasses the SQL query interface and uses a special API dedicated to this particular purpose.
The other option is to query the database's current SCN and start a transaction at this SCN. See OCIStmtExecute: this function has two parameters, snap_in/snap_out. Theoretically you can use them to track your view of the database's SCN, but I'm not sure; I have never used that.
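If it helps, the current SCN can also be read with plain SQL, assuming the account has the necessary privileges (with a read-only account it may not):

    -- Requires SELECT on V$DATABASE
    SELECT current_scn FROM v$database;

    -- Alternative: requires EXECUTE on DBMS_FLASHBACK
    SELECT dbms_flashback.get_system_change_number FROM dual;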
In Oracle, readers do not block writers and vice versa.
So a row inserted on 06.05.2015 (but committed on 08.05.2015) will be visible AFTER 07.05.2015. Oracle is a parallel database and it does not guarantee any serialization.
Maybe if you used row-level ora_rowscn it would work, but this requires redefining the source table.
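A minimal sketch of that redefinition (ROWDEPENDENCIES can only be specified when the table is created, so the data would have to be moved into a new table; names are placeholders):

    -- ROWDEPENDENCIES makes ORA_ROWSCN track the SCN per row instead of per block
    CREATE TABLE logs_new (
        log_id   NUMBER PRIMARY KEY,
        log_time TIMESTAMP
    ) ROWDEPENDENCIES;

    -- New rows since the last seen SCN can then be selected directly
    SELECT log_id, log_time, ORA_ROWSCN
    FROM   logs_new
    WHERE  ORA_ROWSCN > :last_seen_scn;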