I am trying to insert thousands of rows into a table and performance is not acceptable. Rows on a particular table take 300ms per row to insert.
I know that tools exist to profile queries run against SQL Server (SQL Server Profile, Database Tuning Advisor), but how would I profile insert and update statements to determine slow running inserts? Am I forced to use perfmon while the queries run and deduce the issue with counters?
I would first check the query plan of a single insert to understand the costs associated to that operation - it is not known from the question whether the insert is selecting the data from elsewhere.
I would then check the table indexing for the following:
how many indexes are in place (apart from filtered indexes, each index will be inserted into as well)
whether a clustered index is present or are we inserting into a heap.
if the clustered index key means we will be getting a hotspot benefit on the end of the table or causing a large quantity of page splits.
This is all SQL schema based issues, assuming there is no problems within SQL, you can start checking disk IO counters to check for disk queue lengths and response time. Not forgetting the Log drive response time since each insert will be logged.
These kind of problems are very difficult to nail down as any 1 perscriptive thing / silver bullet you can give advice over, just a range of things you should be checking.
I'm betting that the problem is with the selects and not necessarily the updates. Have you tried profiling the select part of the update statement to make sure there isn't a problem there first?
Related
I have a long running stored procedure with lot of statements. After analyzing identified few statements which are taking most time. Those statements are all update statements.
Looking at the execution plan, the query scans the source table in parallel in few seconds, and then passed it to gather streams operation which then passes to
This is somewhat similar to below, and we see same behavior with the index creation statements too causing slowness.
https://brentozar.com/archive/2019/01/why-do-some-indexes-create-faster-than-others/
Table has 60 million records and is a heap as we do lot of data loads, updates and deletes.
Reading the source is not a problem as it completes in few seconds, but actual update which happens serially is taking most time.
A few suggestions to try:
if you have indexes on the target table, dropping them before and recreating after should improve insert performance.
Add insert into [Table] with (tablock) hint to the table you are inserting into, this will enable sql server to lock the table exclusively and will allow the insert to also run in parallel.
Alternatively if that doesn't yield an improvement try adding a maxdop 1 hint to the query.
How often do you UPDATE the rows in this heap?
Because, unlike clustered indexes, heaps will use a RID to find specific rows. But the thing is that (unless you specifically rebuild this) when you update a row, the last row will still remain where it was and now point to the new location instead, increasing the number of lookups that is needed for each time you perform an update on a row.
I don't really think that is something that will be affected here, but could you possible see what happens if you add a clustered index on the table and see how the update times are affected?
Also, I don't assume you got some heavy trigger on the table, doing a bunch of stuff as well, right?
Additionally, since you are referring to an article by Brent Ozar, he does advocate to break updates into batches of no more than 4000 rows a time, as that has both been proven to be the fastest and will be below the 5000 rows X-lock that will occur during updates.
I have 1.2 million rows in Azure data table. The following command:
DELETE FROM _PPL_DETAIL WHERE RunId <> 229
is painfully slow.
There is an index on RunId.
I am deleting most of the data.
229 is a small number of records.
It has been running for an hour now
Should it take this long?
I am pretty sure it will finish.
Is there anything I can do to make operations like this faster?
The database does have a PK, although it is a dummy PK (not used). I already saw that as an optimization need to help this problem, but it still takes way too long (SQL Server treats a table without a PK differently -- much less efficient). It is still taking 1+ hour.
How about trying something like below
BEGIN TRAN
SELECT * INTO #T FROM _PPL_DETAIL WHERE RunId = 229
TRUNCATE TABLE _PPL_DETAIL
INSERT INTO _PPL_DETAIL
SELECT * FROM #T
COMMIT TRAN
Without knowing what database tier is using the database where that statment runs it is not easy to help you. However, let us tell you how the system works so that you can make this determination with a bit more investigation by yourself.
Currently the log commit rate is limited by the tier the database has. Deletes are fundamentally limited on the ability to write out log records (and replicate them to multiple machines in case your main machine dies). When you select records, you don't have to go over the network to N machines and you may not even need to go to the local disk if the records are preserved in memory, so selects are generally expected to be faster than inserts/updates/deletes because of the need to harden log for you. You can read about the specific limits for different reservation sizes are here: DTU Limits and vCore Limits.
One common problem is to do individual operations in a loop (like a cursor or driven from the client). This implies that each statement has a single row updated and thus has to harden each log record serially because the app has to wait for the statement to return before submitting the next statement. You are not hitting that since you are running a big delete as a single statement. That could be slow for other reasons such as:
Locking - if you have other users doing operations on the table, it could block the progress of the delete statement. You can potentially see this by looking at sys.dm_exec_requests to see if your statement is blocking on other locks.
Query Plan choice. If you have to scan a lot of rows to delete a small fraction, you could be blocked on the IO to find them. Looking at the query plan shape will help here, as will set statistics time on (We suggest you change the query to do TOP 100 or similar to get a sense of whether you are doing lots of logical read IOs vs. actual logical writes). This could imply that your on-disk layout is suboptimal for this problem. The general solutions would be to either pick a better indexing strategy or to use partitioning to help you quickly drop groups of rows instead of having to delete all the rows explicitly.
An additional strategy to have better performance with deletes is to perform batching.
As I know SQL Server had a change and the default DOP is 1 on their servers, so if you run the query with OPTION(MAXDOP 0) could help.
Try this:
DELETE FROM _PPL_DETAIL
WHERE RunId <> 229
OPTION (MAXDOP 0);
I have reports that perform some time consuming data calculations for each user in my database, and the result is 10 to 20 calculated new records for each user. To improve report responsiveness, a nightly job was created to run the calculations and dump the results to a snapshot table in the database. It only runs for active users.
So with 50k users, 30k of which are active, the job "updates" 300k to 600k records in the large snapshot table. The method it currently uses is it deletes all previous records for a given user, then inserts the new set. There is no PK on the table, only a business key is used to group the sets of data.
So my question is, when removing and adding up to 600k records every night, are there techniques to optimize the table to handle this? For instance, since the data can be recreated on demand, is there a way to disable logging for the table as these changes are made?
UPDATE:
One issue is I cannot do this in batch because the way the script works, it's examining one user at a time, so it looks at a user, deletes the previous 10-20 records, and inserts a new set of 10-20 records. It does this over and over. I am worried that the transaction log will run out of space or other performance issues could occur. I would like to configure the table to now worry about data preservation or other items that could slow it down. I cannot drop the indexes and all that because people are accessing the table concurrently to it being updated.
It's also worth noting that indexing could potentially speed up this bulk update rather than slow it down, because UPDATE and DELETE statements still need to be able to locate the affected rows in the first place, and without appropriate indexes it will resort to table scans.
I would, at the very least, consider a non-clustered index on the column(s) that identify the user, and (assuming you are using 2008) consider the MERGE statement, which can definitely avoid the shortcomings of the mass DELETE/INSERT method currently employed.
According to The Data Loading Performance Guide (MSDN), MERGE is minimally logged for inserts with the use of a trace flag.
I won't say too much more until I know which version of SQL Server you are using.
This is called Bulk Insert, you have to drop all indexes in destination table and send insert commands in large packs (hundreds of insert statements) separated by ;
Another way is to use BULK INSERT statement http://msdn.microsoft.com/en-us/library/ms188365.aspx
but it involves dumping data to file.
See also: Bulk Insert Sql Server millions of record
It really depends upon many things
speed of your machine
size of the records being processed
network speed
etc.
Generally it is quicker to add records to a "heap" or an un-indexed table. So dropping all of your indexes and re-creating them after the load may improve your performance.
Partitioning the table may see performance benefits if you partition by active and inactive users (although the data set may be a little small for this)
Ensure you test how long each tweak adds or reduces your load and work from there.
I have a project that involves recording data from a device directly into a sql table.
I do very little processing in code before writing to sql server (2008 express by the way)
typically i use the sqlhelper class's ExecuteNonQuery method and pass in a stored proc name and list of parameters that the SP expects.
This is very convenient, but i need a much faster way of doing this.
Thanks.
ExecuteNonQuery with an INSERT statement, or even a stored procedure, will get you into thousands of inserts per second range on Express. 4000-5000/sec are easily achievable, I know this for a fact.
What usually slows down individual updates is the wait time for log flush and you need to account for that. The easiest solution is to simply batch commit. Eg. commit every 1000 inserts, or every second. This will fill up the log pages and will amortize the cost of log flush wait over all the inserts in a transaction.
With batch commits you'll probably bottleneck on disk log write performance, which there is nothing you can do about it short of changing the hardware (going raid 0 stripe on log).
If you hit earlier bottlenecks (unlikely) then you can look into batching statements, ie. send one single T-SQL batch with multiple inserts on it. But this seldom pays off.
Of course, you'll need to reduce the size of your writes to a minimum, meaning reduce the width of your table to the minimally needed columns, eliminate non-clustered indexes, eliminate unneeded constraints. If possible, use a Heap instead of a clustered index, since Heap inserts are significantly faster than clustered index ones.
There is little need to use the fast insert interface (ie. SqlBulkCopy). Using ordinary INSERTS and ExecuteNoQuery on batch commits you'll exhaust the drive sequential write throughput much faster than the need to deploy bulk insert. Bulk insert is needed on fast SAN connected machines, and you mention Express so it's probably not the case. There is a perception of the contrary out there, but is simply because people don't realize that bulk insert gives them batch commit, and its the batch commit that speeds thinks up, not the bulk insert.
As with any performance test, make sure you eliminate randomness, and preallocate the database and the log, you don't want to hit db or log growth event during test measurements or during production, that is sooo amateurish.
bulk insert would be the fastest since it is minimally logged
.NET also has the SqlBulkCopy Class
Here is a good way to insert a lot of records using table variables...
...but best to limit it to 1000 records at a time because table variables are "in Memory"
In this example I will insert 2 records into a table with 3 fields -
CustID, Firstname, Lastname
--first create an In-Memory table variable with same structure
--you could also use a temporary table, but it would be slower
declare #MyTblVar table (CustID int, FName nvarchar(50), LName nvarchar(50))
insert into #MyTblVar values (100,'Joe','Bloggs')
insert into #MyTblVar values (101,'Mary','Smith')
Insert into MyCustomerTable
Select * from #MyTblVar
This is typically done by way of a BULK INSERT. Basically, you prepare a file and then issue the BULK INSERT statement and SQL Server copies all the data from the file to the table with the fast method possible.
It does have some restrictions (for example, there's no way to do "update or insert" type of behaviour if you have possibly-existing rows to update), but if you can get around those, then you're unlikely to find anything much faster.
If you mean from .NET then use SqlBulkCopy
Things that can slow inserts include indexes and reads or updates (locks) on the same table. You can speed up situations like yours by avoiding both and inserting individual transactions to a separate holding table with no indexes or other activity. Then batch the holding table to the main table a little less frequently.
It can only really go as fast as your SP will run. Ensure that the table(s) are properly indexed and if you have a clustered index, ensure that it has a narrow, unique, increasing key. Ensure that the remaining indexes and constraints (if any) do not have a lot of overhead.
You shouldn't see much overhead in the ADO.NET layer (I wouldn't necessarily use any other .NET library above SQLCommand). You may be able to use ADO.NET Async methods in order to queue several calls to the stored proc without blocking a single thread in your application (this potentially could free up more throughput than anything else - just like having multiple machines inserting into the database).
Other than that, you really need to tell us more about your requirements.
I have a table with about 45 columns and as more data goes in, the longer it takes for the inserts to happen. I have increased the size of the data and log files, reduced the fill factor on all the indexes on that table, and still slower and slower insert times. Any ideas would be GREATLY appreciated.
For inserts, you want to DECREASE the fillfactor on the indexes on the table in order to reduce page splitting.
It is somewhat expected that it will take longer to insert as more data goes in, because your indexes just plain get bigger.
Try putting in data in batches instead of row-by-row. SQL Server is more efficient that way.
Make sure you don't have too many indexes on your tables.
Consider using SQL Server 2005's INCLUDE statement on your indexes if you are just including columns in your indexes because you want them covered in your queries.
How big is the table?
What is the context? Is this a batch of many new records?
Can you post the schema including index definition?
Can you SET STATISTICS IO ON, SET STATISTICS TIME ON, and post the display for one iteration?
Is there anything pathological about the data, or the context? Is this on a server or a laptop (testing)?
Why dont you drop index before inserting and recreate index back on to table so you no need to do update statistics
You could also ensure that the indexes on that table are defragmented