I have one UNIX script.
In it we create a table and an index, and load data from a file into this table using SQL*Loader.
Then we do nearly 70 direct updates (not using FORALL or BULK COLLECT) on this table.
Finally we insert this new table's data into another table. Per day it processes 500 000 records, and all these updates are very fast.
But inserting the data into the other table takes 20 minutes. How can this be improved?
There is no problem with inserts in general: on the same target table we insert 500 000 records from another table and that works fine; that insert finishes in less than a minute.
Insert into tables () select () from tablex;
The above takes 20 minutes for 500 000 records.
Tablex is created, loaded, and the 70 direct updates are done in the same shell script.
I checked the explain plan cost for the SELECT alone and for the INSERT ... SELECT; both are the same.
Insert into tables () select () from tabley;
The above statement executes in less than a second.
I used a parallel hint. The cost is reduced, but CPU utilisation is zero.
Shall I create one more table, tablez, and then load the data from tablez into my final table?
Is stats gathering required? This is a daily-run program.
When we do a direct path insert using SQL*Loader, the records are inserted above the HighWaterMark. After the load is completed and the HighWaterMark is moved up, there could be lots of empty blocks below the original/old HighWaterMark position. If your SELECT is doing a Full Table Scan, it will be reading all those empty blocks too. Check whether your table has accumulated lots of empty blocks over time. You may use Segment Advisor for this. Based on the advisor recommendations, shrink the table and free the unused space. This could speed up the execution. Hope this helps.
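As a rough sketch of that clean-up (assumptions: the staging table is the question's tablex, it lives in an ASSM tablespace so SHRINK SPACE applies, and re-gathering stats afterwards is wanted for the daily run):

-- shrink requires row movement to be enabled on the table
ALTER TABLE tablex ENABLE ROW MOVEMENT;
-- compact the segment and move the high water mark down
ALTER TABLE tablex SHRINK SPACE;
-- refresh optimizer statistics after the reorganisation
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLEX');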
Stored procedure 1 has a select query returning a record set, dates to be specific. I am using a cursor to go through that record set and for each row another stored procedure is called.
Stored procedure 2 inserts about 20K rows into a table for each value from the cursor.
Since there are about 100 records in the cursor, the total number of rows inserted comes to about 2 million, which makes the query run for days until it is stopped in production.
The same query takes about 8 minutes in dev.
I tried using a Foreach Loop container in SSIS (dev) and this now takes 5 minutes (dev).
Is there a faster way of inserting these records?
I considered using a table-valued function, but the join between the two is difficult considering the first record set contains only dates.
Depending on what stored procedure 2 is doing, it's probably worthwhile to look at bulk insert.
See: https://www.simple-talk.com/sql/learn-sql-server/bulk-inserts-via-tsql-in-sql-server/
You may also want to review the indexes and the configuration of the prod environment to ensure optimal performance of the load.
The link above has some suggestions on how to improve insert performance.
So definitely worth a read.
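As a rough sketch of the bulk-insert route (the target table name, file path and file format below are assumptions, not taken from the question): stage the rows for each date in a flat file, then load them in one set-based statement instead of row-by-row inserts inside the cursor.

BULK INSERT dbo.TargetTable
FROM 'C:\staging\rows_for_one_date.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 20000);

If stored procedure 2's logic can be expressed set-based, another option that stays in T-SQL is to replace the cursor with a single INSERT ... SELECT joined to the list of dates.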
I have a Firebird table with 60 million rows and I need to delete about half of the table.
The rows hold GPS positions of cars, a record timestamp and other data. The table has a primary key on IdVehicle + TimeStamp and one foreign key (to the Vehicle table). There are no other keys, indexes or triggers. One vehicle has 100 000 - 500 000 records.
I need to delete the older data, e.g. for all vehicles delete the data older than 1 March 2015. I tried different approaches; the fastest so far uses EXECUTE BLOCK and the primary key. First I read the records older than 1 March 2015 for one vehicle, then I walk through them, build an EXECUTE BLOCK statement and send it to Firebird for every 50 entries.
EXECUTE BLOCK AS BEGIN
  DELETE FROM RIDE_POS WHERE IdVehicle = 1547 AND date = '4.5.2015 8:56:47';
  DELETE FROM RIDE_POS WHERE IdVehicle = 1547 AND date = '4.5.2015 8:56:59';
  DELETE FROM RIDE_POS WHERE IdVehicle = 1547 AND date = '4.5.2015 8:57:17';
  ...... a total of 50 such lines
END
This deletes about 1 million rows per 800 seconds (roughly 1 record per millisecond).
Is there another, quicker way to delete the records?
Additionally, this way I can only delete a few million rows before I have to restart Firebird, otherwise it starts to slow down and stall (on the test server there is no other database/application). At the beginning records are deleted quickly, but it gradually takes longer and longer.
For orientation: how quickly do you routinely delete records in large tables (not wiping the whole table, just a part of the records)?
If you want to delete all records older than a given date, no matter the vehicle, then there is no point including IdVehicle in the query; the date alone is enough. I.e. the following should do, just a straight query, no need for EXECUTE BLOCK either:
DELETE FROM RIDE_POS WHERE date < '2015-03-01'
If you have to delete many thousands (or millions) of records, do not do it in one single transaction. You are better off doing it in several steps: delete, for example, 1000 records and commit, then delete the next 1000 and commit; it should be faster than deleting a million records in one transaction. 1000 is not a rule, it depends on your particular situation (how large your records are, how much linked data they have via foreign keys with "on delete cascade"). Also check whether you have "on delete" triggers and whether it is possible to temporarily deactivate them.
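A minimal sketch of that batched pattern, assuming Firebird 2.0 or later (where DELETE accepts a ROWS clause); the loop itself lives in the calling script, which repeats the pair until the DELETE affects no rows:

DELETE FROM RIDE_POS WHERE date < '2015-03-01' ROWS 1000;
COMMIT;
-- repeat the two statements above from the client until 0 rows are affected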
Maybe a combined approach would help.
Add a temporary index on date:
CREATE INDEX IDX_RIDE_POS_date_ASC ON RIDE_POS (date)
Write an execute block:
EXECUTE BLOCK
AS
  DECLARE VARIABLE V_ID_VEHICLE INTEGER;
BEGIN
  FOR SELECT DISTINCT IdVehicle
      FROM RIDE_POS
      INTO :V_ID_VEHICLE
  DO BEGIN
    DELETE FROM RIDE_POS WHERE IdVehicle = :V_ID_VEHICLE AND date < '1.3.2015';
  END
END
Drop the index if you don't want to keep it:
DROP INDEX IDX_RIDE_POS_date_ASC
I think that even taking into account the time needed to create the index, you would still save some time on deleting the records.
Finally, I found where the problem was. The main issue was that I was running the queries from a classic WinForms application (or IBExpert), and that caused the stalls and slowed the query down. I used EXECUTE BLOCK and deleted the data in portions, which solved the stalling problem, but it was slow.
The solution was to create a simple console application and run the query from it. I kept the primary key and deleted through it (no adding or dropping indexes), and the deletion speed was about 65 records per millisecond (1 million rows per 16 seconds).
When I tried dropping the primary key and adding an index on the datetime column, deleting sped up only a little, about 5-10%.
We have a lot of operations that time out in our site log.
After installing Redgate SQL Monitor on the server, we figured out we have many blocked processes and sometimes deadlocks.
With Redgate we realized the problem is a stored procedure. It is a simple stored procedure that just increases the view count of a product (a simple update):
ALTER PROCEDURE [dbo].[SP_IncreaseProductView]
    @PID int
AS
BEGIN
    UPDATE dbo.stProduct
    SET ViewCount = ViewCount + 1
    WHERE ID = @PID
END
When that stored procedure is turned off, things are mostly fine, but sometimes the blocked-process errors come back.
This table (product) has no triggers, but it is the heart of the system and has 12,000 records.
It has 3 indexes (1 clustered and 2 non-clustered) and many statistics.
We don't use explicit transactions. The blocking mostly happens on the update query.
How can I figure out where the problem is?
Sorry for my bad English.
Thanks
Edit:
I think the problem isn't the SP itself, it's the updates on the product table (that is my opinion). It is a large table. I still get blocked processes when the SP is off, but fewer of them.
We have a lot of selects and updates on this table.
I also rewrote the view-count increase with LINQ to SQL, but I still get blocked processes just as when the SP is on.
Edit 2:
I set up Profiler and captured all queries on the product table.
There are 530 selects (most with joins to 2 other tables) and 25 updates per minute (on the product table alone).
For now, [SP_IncreaseProductView] is off, because when it is on we get blocked processes and "operation timed out" roughly every 10 seconds and the web site stops.
After setting the SP to off, blocked processes still occur, but only roughly 50 per day.
I would go with Ingaz' second solution, but further optimizations or simplifications can be made:
1) Store the view count in a 1:1 product table. This is particularly useful when some queries do not need the view count.
2) View count redundancy:
- keep the view count in the product table and read it from there
- also define the view count in another table (just ProductId and ViewCount columns)
- the application updates the second table directly and asynchronously
- a job updates the product table based on data from the second table
This ensures that locking affects the product table much less than independent updates would.
This is expected: I suppose you have a lot of updates hitting a single small (12K rows) table.
You can:
Work around your problem
Put a ROWLOCK hint in your UPDATE
Change the database option to READ_COMMITTED_SNAPSHOT
Be warned though: it can create other problems. A small sketch of both options follows.
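A minimal sketch of those two options (the database name YourShopDb is a placeholder of mine; the UPDATE is the body of the question's procedure):

-- Inside SP_IncreaseProductView: ask for row-level locking on the update
UPDATE dbo.stProduct WITH (ROWLOCK)
SET ViewCount = ViewCount + 1
WHERE ID = @PID;

-- Database-wide: readers use row versioning instead of blocking behind writers
ALTER DATABASE YourShopDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;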
A more complex recipe to eliminate the blocking completely
Not for the faint of heart. A rough sketch follows the steps below.
Create a table dbo.stProduct_Increment.
Modify your [SP_IncreaseProductView] to INSERT into the increment table.
Create a periodic task that UPDATEs your dbo.stProduct and clears the increment table.
Create a view that combines stProduct and stProduct_Increment.
Point all SELECT statements for stProduct at the created view.
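A minimal sketch of that recipe, with assumptions of mine: a single ProductID column in the increment table, the ID/ViewCount columns from the question, a view name of vw_stProduct, and the remaining product columns left out for brevity.

CREATE TABLE dbo.stProduct_Increment (ProductID int NOT NULL);
GO
ALTER PROCEDURE [dbo].[SP_IncreaseProductView]
    @PID int
AS
BEGIN
    -- append-only: readers of stProduct are never blocked by this
    INSERT INTO dbo.stProduct_Increment (ProductID) VALUES (@PID);
END
GO
-- periodic task: take the pending increments out and fold them into stProduct
BEGIN TRAN;
DECLARE @batch TABLE (ProductID int NOT NULL);
DELETE FROM dbo.stProduct_Increment
OUTPUT deleted.ProductID INTO @batch;
UPDATE p
SET p.ViewCount = p.ViewCount + b.Cnt
FROM dbo.stProduct AS p
JOIN (SELECT ProductID, COUNT(*) AS Cnt FROM @batch GROUP BY ProductID) AS b
    ON b.ProductID = p.ID;
COMMIT TRAN;
GO
-- view that exposes the up-to-date count (add the remaining product columns as needed)
CREATE VIEW dbo.vw_stProduct
AS
SELECT p.ID,
       p.ViewCount + (SELECT COUNT(*) FROM dbo.stProduct_Increment AS i
                      WHERE i.ProductID = p.ID) AS ViewCount
FROM dbo.stProduct AS p;
GO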
Is it more efficient and ultimately FASTER to delete rows from a DB in blocks of 1000 or 10000? I have to remove approx 3 million rows from many tables. I first did the deletes in blocks of 100K rows but the performance wasn't looking good. I changed to 10000 and they seem to be removed faster. I am wondering whether an even smaller batch, like 1K per DELETE statement, is better still.
Thoughts?
I am deleting like this:
DELETE TOP(10000)
FROM TABLE
WHERE Date < '1/1/2012'
Yes, it is. It all depends on your server though. I mean, the last time I did that I was using this approach to delete things in 64 million-row increments (on a table that at that point had around 14 billion rows, 80% of which ultimately got deleted). I got a delete through every 10 seconds or so.
It really depends on your hardware. Going more granular is more work, but it means less waiting on transaction logs for other things operating on the table. You have to try it out and find where you are comfortable; there is no ultimate answer because it is totally dependent on the usage of the table and the hardware.
We used table partitioning to remove 5 million rows in less than a second, but this was from just one table. It took some work up front but ultimately was the best way. This may not be the best way for you.
From our document about partitioning:
Let’s say you want to add 5 million rows to a table but don’t want to lock the table up while you do it. I ran into a case in an ordering system where I couldn’t insert the rows without stopping the system from taking orders. BAD! Partitioning is one way of doing it if you are adding rows that don’t overlap current data.
WHAT TO WATCH OUT FOR:
- Data CANNOT overlap current data. You have to partition the data on a value. The new data cannot be intertwined within the currently partitioned data.
- If removing data, you have to remove an entire partition or partitions. You will not have a WHERE clause.
If you are doing this on a production database and want to limit the locking on the table, create your indexes with “ONLINE = ON”.
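For illustration, an online index build looks like this (the index, table and column names are placeholders of mine; ONLINE = ON needs an edition that supports online index operations):

CREATE NONCLUSTERED INDEX IX_BigTable_Date
    ON dbo.BigTable (Date)
    WITH (ONLINE = ON);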
OVERVIEW OF STEPS:
FOR ADDING RECORDS
Partition the table you want to add records to (leave a blank partition for the new data). Do not forget to partition all of your indexes.
Create new table with the exact same structure (keys, data types, etc.).
Add a constraint to the new table to limit that data so that it would fit into the blank partition in the old table.
Insert new rows into new table.
Add indexes to match old table.
Swap the new table with the blank partition of the old table.
Un-partition the old table if you wish.
FOR DELETING RECORDS (a sketch of the partition switch follows these steps)
Partition the table into sets so that the data you want to delete is all on partitions by itself (this could be many different partitions).
Create a new table with the same partitions.
Swap the partitions with the data you want to delete to the new table.
Un-partition the old table if you wish.
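A minimal sketch of the switch-out, with assumptions of mine: dbo.BigTable is already partitioned on Date, partition 1 holds only the rows to remove, and the staging table matches the source's structure, clustered key and filegroup.

-- Staging table with the same columns and clustered key as the partitioned source
CREATE TABLE dbo.BigTable_Old
(
    ID   int      NOT NULL,
    Date datetime NOT NULL,
    CONSTRAINT PK_BigTable_Old PRIMARY KEY CLUSTERED (ID, Date)
) ON [PRIMARY];
GO
-- Metadata-only operation: the whole partition changes ownership, no per-row deletes
ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.BigTable_Old;
GO
-- The unwanted rows now live in the staging table and can be discarded cheaply
DROP TABLE dbo.BigTable_Old;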
Yes and no, it depends on the usage of the table due to locking. I would try to delete the records at a slower pace, so the opposite of the OP's question.
-- limit every DELETE to at most 10000 rows
set rowcount 10000
-- first batch, so that @@rowcount is primed before the loop test
delete
from table
where date < convert(datetime, '20120101', 112)
while @@rowcount > 0
begin
    waitfor delay '0:0:1'
    delete
    from table
    where date < convert(datetime, '20120101', 112)
end
set rowcount 0
I have two tables T_A and T_B.
Both are empty.
Both have a clustered index on them.
Recovery model is set to SIMPLE.
The INSERT ... SELECT meets the requirements for minimal logging. See
http://msdn.microsoft.com/en-us/library/ms191244.aspx
Both staging tables contain a large amount of data.
I need to import that data into T_A and T_B from the staging tables.
If I perform the following T-SQL blocks individually, each takes 2 to 3 minutes to finish. The total time is about 5 to 6 minutes.
BEGIN TRAN
INSERT INTO T_A WITH(TABLOCK) SELECT * FROM SRC_A WITH(NOLOCK);
COMMIT TRAN

BEGIN TRAN
INSERT INTO T_B WITH(TABLOCK) SELECT * FROM SRC_B WITH(NOLOCK);
COMMIT TRAN
To make it faster I opened two sessions in SSMS and executed the two blocks in parallel. To my surprise, each session took about 10 to 12 minutes to finish, so together the total time more than doubled. The wait type shown is PAGEIOLATCH_SH, which points to a disk I/O bottleneck. What I don't understand is that even if the two sessions have to wait on each other for I/O, they should not wait that long. Can anyone help explain this?
My story does not end there. I then removed the clustered index on both tables and ran the two blocks in parallel, each in a different session. This time each took about 1 minute to finish, and the total time was about 1 minute since they ran in parallel. Great! But the nightmare came when I tried to create the clustered indexes again.
If I create the clustered indexes individually, each takes 4 minutes to finish, so the total time is about 8 minutes. This defeats my purpose of improving performance.
I then tried to create the clustered indexes on the two tables in parallel, each in a different session. This time it was the worst: one took 12 minutes to finish and the other took 25 minutes.
From my test results, my best choice is back to square one: execute the two transactions sequentially with the clustered indexes already on the tables.
Has anyone experienced a similar situation, and what is the best practice to make this faster?
When you create the clustered index after inserting the records, SQL Server has to rebuild the table in the background anyway, so it is usually faster to insert the records directly into the table with the clustered index already present.
Also disable any non-clustered indexes while inserting and re-enable them afterwards; creating indexes on a filled table is faster than updating them for each insert. Remember to set the MAXDOP option to 0 when creating the indexes.
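A small sketch of that disable/rebuild pattern (the index name IX_T_A_SomeColumn and its key column are assumptions of mine; T_A and SRC_A come from the question):

-- Disable the non-clustered index before the load
ALTER INDEX IX_T_A_SomeColumn ON dbo.T_A DISABLE;
GO
INSERT INTO T_A WITH(TABLOCK) SELECT * FROM SRC_A WITH(NOLOCK);
GO
-- Rebuild afterwards, letting SQL Server use all available parallelism
ALTER INDEX IX_T_A_SomeColumn ON dbo.T_A REBUILD WITH (MAXDOP = 0);
GO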
Bulk inserts are also a lot faster than an INSERT INTO statement.
I use the SQL Server Import and Export Wizard for copying large amounts of data and it seems to be way faster (the wizard uses bulk statements). If necessary you can try to find the statement the wizard uses and run it yourself.