I have a table with about 45 columns, and the more data goes in, the longer the inserts take. I have increased the size of the data and log files and reduced the fill factor on all the indexes on that table, and the insert times still keep getting slower. Any ideas would be GREATLY appreciated.
For inserts, you want to DECREASE the fillfactor on the indexes on the table in order to reduce page splitting.
It is somewhat expected that it will take longer to insert as more data goes in, because your indexes just plain get bigger.
Try inserting data in batches instead of row by row. SQL Server is more efficient that way.
Make sure you don't have too many indexes on your tables.
Consider using SQL Server 2005's INCLUDE clause on your indexes if you are adding columns to your indexes only so that they cover your queries.
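For illustration only, a covering index that combines both suggestions might look like the sketch below; the table, column, and index names are made up:
-- Hypothetical example: FILLFACTOR = 80 leaves 20% free space per page to reduce page splits,
-- and INCLUDE covers extra columns without widening the index key.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalAmount)
    WITH (FILLFACTOR = 80);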
How big is the table?
What is the context? Is this a batch of many new records?
Can you post the schema including index definition?
Can you SET STATISTICS IO ON, SET STATISTICS TIME ON, and post the output for one iteration?
Is there anything pathological about the data, or the context? Is this on a server or a laptop (testing)?
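If it helps, the statistics output I am asking for can be captured by turning those options on in the same session before running one representative insert (the table and values below are placeholders):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run one of the slow inserts, e.g.:
INSERT INTO dbo.YourTable (Col1, Col2) VALUES ('value1', 'value2');
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;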
Why don't you drop the indexes before inserting and recreate them on the table afterwards, so you don't need to update statistics?
You could also ensure that the indexes on that table are defragmented
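To check whether defragmenting is even needed, something like the following (the table name is a placeholder) shows the fragmentation of each index, after which you can REORGANIZE or REBUILD the fragmented ones:
-- Fragmentation of all indexes on a hypothetical table
SELECT i.name, ps.avg_fragmentation_in_percent, ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id;
-- Then, per fragmented index:
-- ALTER INDEX IX_YourIndex ON dbo.YourTable REORGANIZE;  -- light fragmentation
-- ALTER INDEX IX_YourIndex ON dbo.YourTable REBUILD;     -- heavy fragmentation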
There is a table in a production database in which I changed one of the column lengths from Nvarchar(3000) to Nvarchar(4000).
After that the application team is getting a timeout error. Could any of you suggest how to troubleshoot this issue?
There was no blocking when the query was executed.
Most of the time, timeouts relate to statistics not being up to date...
Try rebuilding indexes. Since you have changed a column, it might cause some fragmentation.
ALTER INDEX all ON yourtable REBUILD
And try updating statistics as well, since an index rebuild updates only indexed column statistics.
UPDATE STATISTICS yourtable WITH FULLSCAN, COLUMNS
If you are still getting timeouts, try to fine-tune the queries involved.
After such a change the data could be heavily scattered across the storage, and that could cause enormously high I/O. Rebuilding indexes/statistics is certainly worth trying, but please also consider creating a new table and moving the data into it.
To check whether I/O is high, use SET STATISTICS IO ON.
Of course, it would be ideal if you could compare the output with the old version.
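If you do try the rebuild-the-table route, a rough sketch (the table names are invented, and indexes/constraints still have to be recreated by hand) would be:
-- Copy the data into a freshly built table, then swap the names
SELECT *
INTO dbo.YourTable_new
FROM dbo.YourTable;
-- Recreate indexes and constraints on dbo.YourTable_new here, then:
EXEC sp_rename 'dbo.YourTable', 'YourTable_old';
EXEC sp_rename 'dbo.YourTable_new', 'YourTable';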
I have an .exe that compares a vbTab delimited .txt file with an SQL table.
Updates to the table's existing records go very fast. Inserts into the table for new records are quite slow.
As I'm new to SQL, I'm wondering if my idea is crazy talk:
I thought that maybe a solution would be to "pre populate" the database with 10,000 empty rows (minus the primary key) and somehow have this speed up the process?
Any suggestions would be greatly appreciated.
There is no straightforward answer to your question, as many things are unknown to us (DB configuration, hardware, existing data, etc.).
But you can try the following:
Try using the DB's export/import functionality.
Instead of fetching records from the DB with an iterator, comparing each one with a record from the file, and then doing an insert or update, you can import those records directly into the DB using an upsert (update if present, insert if not) strategy; see the sketch after this list. Believe me, this works a lot faster.
If you have indexes on that table, drop them before the import or insert and re-apply them once the operation is done. Indexes slow down inserts.
If the import strategy does not suit you (for example, if you need to process the records before inserting them), then consider a stored procedure that updates existing rows and inserts new ones after dropping the indexes.
While doing this, check the DB configuration as well. Use proper tuning for buffers, paging, and locking.
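A minimal T-SQL sketch of that upsert, assuming the file has already been loaded into a staging table (all object and column names here are hypothetical):
-- Update rows that already exist in the target
UPDATE t
SET    t.Col1 = s.Col1,
       t.Col2 = s.Col2
FROM   dbo.TargetTable AS t
JOIN   dbo.StagingTable AS s
       ON s.BusinessKey = t.BusinessKey;
-- Insert rows that exist only in the staging table
INSERT INTO dbo.TargetTable (BusinessKey, Col1, Col2)
SELECT s.BusinessKey, s.Col1, s.Col2
FROM   dbo.StagingTable AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.TargetTable AS t
                   WHERE t.BusinessKey = s.BusinessKey);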
Hope this helps :)
To answer your question we may need more information.
How many rows does your table have?
I guess it may be a lack of indexes.
I have reports that perform some time consuming data calculations for each user in my database, and the result is 10 to 20 calculated new records for each user. To improve report responsiveness, a nightly job was created to run the calculations and dump the results to a snapshot table in the database. It only runs for active users.
So with 50k users, 30k of which are active, the job "updates" 300k to 600k records in the large snapshot table. The method it currently uses is to delete all previous records for a given user and then insert the new set. There is no PK on the table; only a business key is used to group the sets of data.
So my question is, when removing and adding up to 600k records every night, are there techniques to optimize the table to handle this? For instance, since the data can be recreated on demand, is there a way to disable logging for the table as these changes are made?
UPDATE:
One issue is that I cannot do this in a single batch because of the way the script works: it examines one user at a time, deletes that user's previous 10-20 records, inserts a new set of 10-20 records, and does this over and over. I am worried that the transaction log will run out of space or that other performance issues could occur. I would like to configure the table not to worry about data preservation or other things that could slow it down. I cannot drop the indexes and so on because people are accessing the table while it is being updated.
It's also worth noting that indexing could potentially speed up this bulk update rather than slow it down, because UPDATE and DELETE statements still need to locate the affected rows in the first place, and without appropriate indexes they will resort to table scans.
I would, at the very least, consider a non-clustered index on the column(s) that identify the user, and (assuming you are using 2008) consider the MERGE statement, which can definitely avoid the shortcomings of the mass DELETE/INSERT method currently employed.
According to The Data Loading Performance Guide (MSDN), MERGE is minimally logged for inserts with the use of a trace flag.
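To make that concrete, here is a hedged sketch of what the per-user MERGE could look like on 2008 or later; every table, column, and variable name below is invented for the example:
-- Supporting index so one user's snapshot rows can be located without a scan
CREATE NONCLUSTERED INDEX IX_UserSnapshot_UserId
    ON dbo.UserSnapshot (UserId);
-- Upsert one user's freshly calculated rows and remove that user's stale ones
MERGE dbo.UserSnapshot AS tgt
USING (SELECT UserId, MetricId, Value
       FROM #CalcResults
       WHERE UserId = @UserId) AS src
    ON  tgt.UserId   = src.UserId
    AND tgt.MetricId = src.MetricId
WHEN MATCHED THEN
    UPDATE SET tgt.Value = src.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (UserId, MetricId, Value)
    VALUES (src.UserId, src.MetricId, src.Value)
WHEN NOT MATCHED BY SOURCE AND tgt.UserId = @UserId THEN
    DELETE;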
I won't say too much more until I know which version of SQL Server you are using.
This is called a bulk insert; you have to drop all indexes on the destination table and send the insert commands in large packs (hundreds of INSERT statements) separated by ;
Another way is to use the BULK INSERT statement http://msdn.microsoft.com/en-us/library/ms188365.aspx
but it involves dumping the data to a file.
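A hedged example of that route, assuming the rows have been exported to a tab-delimited file (the path, table name, and options are placeholders):
BULK INSERT dbo.TargetTable
FROM 'C:\loads\data.txt'
WITH (
    FIELDTERMINATOR = '\t',   -- tab-delimited file
    ROWTERMINATOR   = '\n',
    TABLOCK,                  -- helps qualify for minimal logging
    BATCHSIZE       = 10000
);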
See also: Bulk Insert Sql Server millions of record
It really depends upon many things:
speed of your machine
size of the records being processed
network speed
etc.
Generally it is quicker to add records to a "heap" or an un-indexed table. So dropping all of your indexes and re-creating them after the load may improve your performance.
Partitioning the table may yield performance benefits if you partition by active and inactive users (although the data set may be a little small for this).
Make sure you measure how much time each tweak adds to or removes from your load, and work from there.
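If you try the drop-and-recreate route, a rough sketch (index and table names are invented) looks like this:
-- Disable the nonclustered indexes before the load
-- (the clustered index, if there is one, must stay enabled or the table becomes inaccessible)
ALTER INDEX IX_TargetTable_Col1 ON dbo.TargetTable DISABLE;
ALTER INDEX IX_TargetTable_Col2 ON dbo.TargetTable DISABLE;
-- ... perform the bulk load here ...
-- Rebuilding re-enables the disabled indexes and refreshes their statistics
ALTER INDEX ALL ON dbo.TargetTable REBUILD;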
I have a custom import tool which bulk-inserts the data into a table in tempdb (421,776 rows). After that the tool inserts unknown rows into the target table and updates existing rows based on a hash key (a combination of 2 columns). The target DB has nearly the same row count. The update query looks something like this (about 20 of the update columns are omitted):
UPDATE theDB.dbo.targetTable
SET    code = temp.code,
       name = temp.name
FROM   [tempDB].[dbo].[targettable] AS temp
WHERE  theDB.dbo.targetTable.hash = temp.hash COLLATE SQL_Latin1_General_CP1_CI_AS
I know that the nvarchar comparison with a COLLATE is a bit bad, but it is not easy to avoid. Still, the hash column has its own unique index. Locally it works well, but on this server of mine tempdb keeps growing to 21 gig. Reindexing and shrinking don't help at all.
Just a side note for others who face tempdb problems. A good read is http://bradmcgehee.com/wp-content/uploads/presentations/Optimizing_tempdb_Performance_chicago.pdf
It looks like you're using tempdb explicitly with data you've put there. Is there a reason to use tempdb as if it were your own database?
The reason tempdb is growing is that you're explicitly putting data there. 420k rows doesn't sound heavy, but it's best to keep them within your own user DB.
Suggest changing your business logic to move away from [tempDB].[dbo].[targettable] to something on your own user database.
You can temporarily change the database recovery model from Full or Bulk-logged down to Simple. That keeps the transaction log from growing indefinitely during the load, since it can be truncated at each checkpoint.
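For example (the database name is a placeholder; note that switching out of Full breaks the log backup chain until a new full or differential backup is taken):
ALTER DATABASE theDB SET RECOVERY SIMPLE;
-- ... run the heavy insert/update work here ...
ALTER DATABASE theDB SET RECOVERY FULL;
-- Take a full or differential backup afterwards to restart the log backup chain.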
Is this a cartesian product when there's no explicit join?
I am trying to insert thousands of rows into a table and performance is not acceptable. Rows on a particular table take 300ms per row to insert.
I know that tools exist to profile queries run against SQL Server (SQL Server Profiler, Database Engine Tuning Advisor), but how would I profile insert and update statements to determine slow-running inserts? Am I forced to use perfmon while the queries run and deduce the issue from counters?
I would first check the query plan of a single insert to understand the costs associated to that operation - it is not known from the question whether the insert is selecting the data from elsewhere.
I would then check the table indexing for the following (the query after this list is one quick way to see it):
how many indexes are in place (apart from filtered indexes, each index will be inserted into as well)
whether a clustered index is present or we are inserting into a heap.
whether the clustered index key gives us the benefit of a hotspot at the end of the table or instead causes a large number of page splits.
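A quick way to get that overview is to query the catalog views; the table name below is a placeholder:
-- List the indexes on the table, their type, and their key columns (hypothetical table name)
SELECT i.name,
       i.type_desc,          -- HEAP, CLUSTERED, NONCLUSTERED
       i.is_unique,
       i.has_filter,
       COL_NAME(ic.object_id, ic.column_id) AS key_column
FROM sys.indexes AS i
LEFT JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
   AND ic.is_included_column = 0
WHERE i.object_id = OBJECT_ID('dbo.YourTable')
ORDER BY i.index_id, ic.key_ordinal;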
These are all SQL schema-based issues. Assuming there are no problems within SQL, you can start checking the disk I/O counters for disk queue lengths and response times, not forgetting the log drive's response time, since each insert will be logged.
These kinds of problems are very difficult to nail down to any one prescriptive thing or silver bullet; there is just a range of things you should be checking.
I'm betting that the problem is with the selects and not necessarily the updates. Have you tried profiling the select part of the update statement to make sure there isn't a problem there first?