Stored procedure 1 has a select query returning a record set, dates to be specific. I am using a cursor to go through that record set and for each row another stored procedure is called.
Stored procedure 2 inserts about 20K rows into a table for each value from the cursor.
Since there are about 100 records in the cursor, the total number of rows inserted comes to roughly 2 million (100 × 20K), which makes the query run for days in production until it is stopped.
The same query takes about 8 minutes in dev.
I tried using a Foreach Loop container in SSIS instead, and this now takes 5 minutes (in dev).
Is there a faster way of inserting these records?
I considered using a table-valued function, but the join between the two is difficult considering the first record set contains only dates.
Depending on what stored procedure 2 is doing, it's probably worthwhile to look at bulk insert.
See: https://www.simple-talk.com/sql/learn-sql-server/bulk-inserts-via-tsql-in-sql-server/
You may also want to review the indexes and the configuration of the prod environment to ensure optimal performance of the load.
The link above has some suggestions on how to improve insert performance, so it's definitely worth a read.
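For reference, a minimal sketch of the bulk insert approach described in that article, assuming the 20K rows per date could first be staged in a flat file (the table and file names here are hypothetical):

-- Hypothetical names; assumes the rows for all dates are staged in a flat file first.
BULK INSERT dbo.TargetTable
FROM 'C:\loads\rows_for_all_dates.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK,           -- a table lock allows minimally logged inserts under simple/bulk-logged recovery
    BATCHSIZE = 50000  -- commit in chunks to keep the transaction log manageable
);

Alternatively, if stored procedure 2's logic can be expressed as a query, replacing the cursor with a single set-based INSERT ... SELECT that joins to the date set is usually much faster than 100 separate procedure calls.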
I have a database table with more than 1 million records uniquely identified by a GUID column. I want to find out which of these rows were selected or retrieved in the last 5 years. The select can happen from multiple places: sometimes a row is returned on its own, sometimes as part of a set of rows. A select query fetches from the table via a JDBC connection in Java code, and a SQL procedure also fetches data from the table.
My intention is to clean up the table: I want to delete all rows that were never used (i.e. retrieved via a select query) in the last 5 years.
Does Oracle have any built-in metadata that can give me this information?
My alternative solution was to add a LAST_ACCESSED column and update it whenever I select a row from this table, but that is a costly operation in terms of the time taken for the whole process. At least 1,000 to 10,000 records are selected from the table in a single operation. Is there a more efficient way to do this than updating the table after reading from it? Mine is a multi-threaded application, so updating such a large data set may result in deadlocks or long waits for the next read query.
Any elegant solution to this problem?
Oracle Database 12c introduced a feature called Automatic Data Optimization, which brings Heat Maps to track table access (modifications as well as read operations). Careful: the feature currently has to be licensed under the Advanced Compression Option or the In-Memory Option.
Heat Maps track whenever a database block has been modified and whenever a segment, i.e. a table or table partition, has been accessed. They do not track select operations per individual row, nor per individual block, because the overhead would be too heavy (data is generally read often and concurrently; keeping a counter for each row would quickly become very costly). However, if you have your data partitioned by date, e.g. a new partition for every day, you can over time easily determine which days are still read and which ones can be archived or purged. Note that Partitioning is also an option that needs to be licensed.
Once you have reached that conclusion, you can either use In-Database Archiving to mark rows as archived or just go ahead and purge the rows. If you happen to have the data partitioned, you can do simple DROP PARTITION operations to purge one or many partitions rather than having to run conventional DELETE statements.
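A minimal sketch of what that looks like, assuming Oracle 12c with the required licensing in place (the table name is hypothetical):

-- Enable segment-level access tracking (subject to the licensing noted above).
ALTER SYSTEM SET HEAT_MAP = ON;

-- Later, check when each segment (table or partition) was last written or scanned.
SELECT object_name, subobject_name, segment_write_time, full_scan, lookup_scan
FROM   dba_heat_map_segment
WHERE  object_name = 'MY_BIG_TABLE';  -- hypothetical table name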
I couldn't use any built-in solution. I tried the solutions below:
1) The DB audit feature for select statements.
2) Adding a trigger to update a date column whenever a select query is executed on the table.
Both were discarded: audit uses up a lot of space and has a performance hit, and the trigger similarly hurt performance.
Finally I resolved the issue by maintaining a separate table where entries older than 5 years that are still used or selected in a query are inserted. When deleting, I cross-check this table and avoid deleting entries present in it.
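A sketch of that approach with hypothetical object names: the read paths record the GUIDs they touch in a keep-alive table, and the purge anti-joins against it:

-- Hypothetical names. Each read path records the GUIDs it touched.
INSERT INTO keep_alive_guids (guid, last_seen) VALUES (:guid, SYSDATE);

-- Purge: delete old rows that never appear in the keep-alive table.
DELETE FROM big_table b
WHERE  b.created_date < ADD_MONTHS(SYSDATE, -60)
AND    NOT EXISTS (SELECT 1 FROM keep_alive_guids k WHERE k.guid = b.guid);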
I have a merge query which, when a condition is satisfied, updates the table, and when it isn't, inserts the records into another table. The issue is that this particular insertion is taking a lot of time, around 25 minutes to insert 15,000 records. What I found out is that while inserting the records we're also inserting a sequence id, which is generated by a trigger associated with the table. The trigger selects the max id from 2 tables, adds 1 to the max, and returns it, which is then used by the insert query.
Is this the exact reason why inserts are slow in my stored procedure? This SP runs on DB2.
Triggers are not a good solution for performance. Use an identity (auto-increment) column on your table instead.
The bottleneck was indeed the trigger. I dropped the trigger and ran the SP; it took at most 2 minutes to insert around 15K records. I am now using a sequence instead of the trigger.
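For reference, a minimal sketch of the sequence-based approach on DB2 (object names are hypothetical):

-- Hypothetical names. A cached sequence avoids the MAX(id) + 1 trigger entirely.
CREATE SEQUENCE target_id_seq START WITH 1 INCREMENT BY 1 CACHE 100;

INSERT INTO target_table (id, col1, col2)
SELECT NEXT VALUE FOR target_id_seq, s.col1, s.col2
FROM   source_table s;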
I have a UNIX shell script.
In it we create a table and an index, and load data from a file into the table using SQL*Loader.
We then run nearly 70 direct updates (not using FORALL or BULK COLLECT) on this table.
Finally, we insert this new table's data into another table. The script processes 500,000 records per day, and all of these updates are very fast.
Inserting this data into the other table takes 20 minutes. How can this be improved?
The insert itself is not the problem: into the same target table we insert 500,000 records from another table and that works fine, completing in less than a minute.
Insert into tables () select () from tablex;
The above statement takes 20 minutes for 500,000 records.
tablex is created, loaded, and given the 70 direct updates in the same shell script.
I checked the explain plan cost for the select alone and for the insert with the select; both are the same.
Insert into tables () select () from tabley;
The above statement executes in less than a second.
I used a parallel hint; the cost is reduced, and CPU utilisation is zero.
Should I create one more table, tablez, and then load the data from tablez into my final table?
Is stats gathering required? This is a daily-run program.
When we do a direct path insert using SQL*Loader, the records are inserted above the high water mark. After the load is completed and the high water mark is moved up, there can be lots of empty blocks below the original/old high water mark position. If your SELECT does a full table scan, it will read all those empty blocks too. Check whether your table has accumulated lots of empty blocks over time; you can use the Segment Advisor for this. Based on the advisor's recommendations, shrink the table and free the unused space. This could speed up the execution. Hope this helps.
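A sketch of reclaiming that space, assuming the table lives in an ASSM tablespace:

-- Compact the segment and lower the high water mark (requires row movement).
ALTER TABLE tablex ENABLE ROW MOVEMENT;
ALTER TABLE tablex SHRINK SPACE;
-- Alternatively: ALTER TABLE tablex MOVE; (rebuilds the segment but leaves indexes
-- unusable until they are rebuilt).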
We have a lot of operations that time out in our site log.
After installing Redgate SQL Monitor on the server, we figured out that we have many blocked processes and sometimes deadlocks.
With Redgate we realized the problem is a stored procedure. It's a simple stored procedure that just increases the view count of a product (a simple update):
ALTER PROCEDURE [dbo].[SP_IncreaseProductView]
    @PID int
AS
BEGIN
    UPDATE dbo.stProduct
    SET ViewCount = ViewCount + 1
    WHERE ID = @PID
END
When that stored procedure is disabled everything is fine, but sometimes the blocked-process errors come back.
This table (product) doesn't have any triggers, but it's the heart of the system and has 12,000 records.
It has 3 indexes, 1 clustered and 2 non-clustered, and many statistics.
We don't use any explicit transactions; the blocking mostly happens on the update query.
How can I figure out where the problem is?
Sorry for my bad language.
Thanks
Edit:
I think the problem isn't the SP itself but the update on the product table (that's my opinion). It's a large table. I still get blocked processes when the SP is off, just fewer of them.
We have a lot of selects and updates on this table.
I also rewrote the increase-view-count logic with LINQ to SQL, but I still get blocked processes just like when the SP is on.
Edit 2:
I set up Profiler and captured all queries on the product table:
530 selects (most joining to 2 other tables) and 25 updates per minute (on the product table alone).
For now, [SP_IncreaseProductView] is off, because when it's on we get blocked processes and "operation timed out" errors about every 10 seconds and the web site stops.
With the SP off, blocked processes still occur, but only roughly 50 per day.
I would go with Ingaz' second solution, but further optimizations or simplifications can be made:
1) Store the view count in a 1:1 product side table. This is particularly useful when some queries do not need the view count.
2) Make the view count redundant:
- keep the view count in the product table and read it from there
- also define the view count in another table (just ProductID and ViewCount columns)
- the application updates the second table directly and asynchronously
- a job updates the product table based on the data from the second table
This ensures that locking affects the product table much less than independent updates would; a sketch follows.
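A sketch of that scheme, with hypothetical names for the side table and assuming the job copies the authoritative count back into the product table:

-- Hypothetical names. The hot path writes only to this small side table.
CREATE TABLE dbo.stProductViewCount
(
    ProductID int NOT NULL PRIMARY KEY,
    ViewCount int NOT NULL DEFAULT 0
);

-- The application increments asynchronously here instead of touching dbo.stProduct.
UPDATE dbo.stProductViewCount SET ViewCount = ViewCount + 1 WHERE ProductID = @PID;

-- A scheduled job periodically folds the counts back into the product table.
UPDATE p
SET    p.ViewCount = v.ViewCount
FROM   dbo.stProduct p
JOIN   dbo.stProductViewCount v ON v.ProductID = p.ID;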
This is to be expected: I suppose you have a lot of updates hitting a single small (12K-row) table.
You can:
Work around your problem
Put a ROWLOCK hint in your UPDATE
Change the database option to READ_COMMITTED_SNAPSHOT
Be warned though: both can create other problems (a sketch of these two work-arounds follows).
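A sketch of those two work-arounds (the database name is hypothetical):

-- Hint the update to take row locks only (lock escalation may still occur).
UPDATE dbo.stProduct WITH (ROWLOCK)
SET    ViewCount = ViewCount + 1
WHERE  ID = @PID;

-- Or switch the database to read committed snapshot so readers stop blocking writers.
ALTER DATABASE MyShopDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;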
More complex recipe to eliminate blocking completely
Not for the faint of heart.
Create a table dbo.stProduct_Increment.
Modify [SP_IncreaseProductView] to INSERT into the increment table.
Create a periodic task that UPDATEs dbo.stProduct and clears the increment table.
Create a view that combines stProduct and stProduct_Increment.
Modify all SELECT statements for stProduct to use the new view; a sketch of all five steps follows.
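A sketch of that recipe; the increment table and view follow the steps above, while the column names and the rollup logic are assumptions:

CREATE TABLE dbo.stProduct_Increment (ProductID int NOT NULL);
GO

-- The SP becomes a blind insert, which does not contend with readers of stProduct.
ALTER PROCEDURE [dbo].[SP_IncreaseProductView]
    @PID int
AS
BEGIN
    INSERT INTO dbo.stProduct_Increment (ProductID) VALUES (@PID);
END
GO

-- Periodic task: move the pending increments out, then roll them into stProduct.
BEGIN TRAN;
DECLARE @pending TABLE (ProductID int NOT NULL);

DELETE FROM dbo.stProduct_Increment
OUTPUT deleted.ProductID INTO @pending;

UPDATE p
SET    p.ViewCount = p.ViewCount + i.Cnt
FROM   dbo.stProduct p
JOIN   (SELECT ProductID, COUNT(*) AS Cnt FROM @pending GROUP BY ProductID) i
       ON i.ProductID = p.ID;
COMMIT;
GO

-- View that presents up-to-date counts by combining both tables.
CREATE VIEW dbo.vwProduct AS
SELECT p.ID,
       p.ViewCount + ISNULL(i.Cnt, 0) AS ViewCount  -- plus the other product columns as needed
FROM   dbo.stProduct p
LEFT JOIN (SELECT ProductID, COUNT(*) AS Cnt
           FROM dbo.stProduct_Increment GROUP BY ProductID) i
       ON i.ProductID = p.ID;
GO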
I'm wondering what the correct solution to the problem below is.
I have an UPDATE statement in T-SQL that needs to run as a daily task. The procedure updates one bit column in one table, and the number of rows affected is around 30,000.
A pseudo version of the T-SQL:
UPDATE TABLE_NAME
SET BIT_FIELD = [dbo].[FUNCTION](TABLE_NAME.ID)
WHERE -- THIS ISN'T RELEVANT
The function that determines true or false basically runs a few checks and hits around 3 other tables. Currently the procedure takes about 30 minutes to run and update the 30,000 rows in our development environment, and I was expecting this to double in production.
The problem I'm having is that intermittently the TABLE_NAME table locks up. If I run the update in batches of 1,000 it seems OK, but if I increase the batch size it appears to run fine at first and then eventually the table locks up. The only resolution is to cancel the query, which results in no rows being updated.
Please note that the procedure is not wrapped in a TRANSACTION.
If I ran each update as a separate UPDATE statement, would that fix it? What would be a good solution for updating quite a large number of records in a live environment?
Any help would be much appreciated.
Thanks!
In your case, the SQL Server optimizer has probably determined that a table lock is needed to perform the update of your table. You should rework your query so that this table lock either does not occur or has a smaller impact on your users. In practical terms this means: (a) speed up your query and (b) make sure the table will not lock.
Personally I would consider the following:
1. Create clustered and non-clustered indexes on your tables in order to improve the performance of the query.
2. See if it is possible to replace the function with joins; they are typically a lot faster.
3. Break the update up into multiple parts and perform them separately. An 'or' statement in your 'where' clause is a good splitting point, but you can also consider using a cursor to loop through the table and perform the update one record at a time (a batched sketch follows).
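A sketch of point 3 as a batched loop, so each chunk holds its locks only briefly (the names come from the pseudo code; the batch size and the "not yet processed" predicate are assumptions):

DECLARE @batch int = 1000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@batch) t
    SET    t.BIT_FIELD = [dbo].[FUNCTION](t.ID)
    FROM   TABLE_NAME AS t
    WHERE  t.BIT_FIELD IS NULL;       -- assumed marker for rows not yet processed

    IF @@ROWCOUNT < @batch BREAK;     -- the last, partial batch has been done
END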