I have a strange problem with my SQL Server database.
I am writing bulk data (about 90,000 rows) using SqlBulkCopy.WriteToServer and I am also writing about 30,000 rows in batches of 1,000 using EF's AddRange.
This causes the indexes on these tables to go out of sync, and queries take many times longer than usual (a timeout after 10 minutes instead of a result after a few seconds).
After I manually rebuild the indexes, the queries are fast again until the next of these imports happens.
My understanding of bulk loading is that it should also update the index.
My question is: Is there a well-known reason for this behavior? If not, how can I go about troubleshooting this?
We had exactly the same issue some years ago, and as dfundako suggested, the answer is outdated statistics.
SQL Server by default updates statistics only after a certain percentage of the rows in a table has changed (roughly 20% plus 500 rows). This is a problem if your table has a huge number of records: 90,000 added rows may not reach the required percentage of changed rows.
So if you want to be sure, after inserting you either have to rebuild the indexes on your table (as you did) or update the statistics on your table:
update statistics <your table>
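If you want to verify that stale statistics really are the culprit after an import, a minimal sketch like the following can help, assuming a hypothetical target table named dbo.ImportTarget (sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 or later):
-- Hypothetical table name; shows when each statistics object was last updated
-- and how many rows have changed since then.
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.ImportTarget');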
Based on the comments and answers here, I tried to figure out if I can change that 20% threshold somehow.
And indeed, there is a way to do this, using trace flag 2371
You can enable it like this:
DBCC TRACEON(2371, -1)
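To confirm the flag is active instance-wide, you can check its status afterwards with the standard trace flag status command:
DBCC TRACESTATUS(2371, -1)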
I will now wait a few weeks to be sure that this fixed the problem, but I have good hopes about it.
I have a long-running stored procedure with a lot of statements. After analyzing it, I identified a few statements that take most of the time. Those statements are all UPDATE statements.
Looking at the execution plan, the query scans the source table in parallel in a few seconds, then passes the rows to a Gather Streams operator, which then passes them on to ...
This is somewhat similar to the article below, and we see the same behavior with index creation statements too, which are also slow.
https://brentozar.com/archive/2019/01/why-do-some-indexes-create-faster-than-others/
The table has 60 million records and is a heap, as we do a lot of data loads, updates and deletes.
Reading the source is not a problem, as it completes in a few seconds, but the actual update, which happens serially, takes most of the time.
A few suggestions to try:
If you have indexes on the target table, dropping them before the load and recreating them afterwards should improve insert performance.
Add a WITH (TABLOCK) hint to the table you are inserting into (INSERT INTO [Table] WITH (TABLOCK) ...); this will let SQL Server lock the table exclusively and allow the insert to also run in parallel (see the sketch after this list).
Alternatively, if that doesn't yield an improvement, try adding an OPTION (MAXDOP 1) hint to the query.
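For illustration, a rough sketch of the TABLOCK suggestion; dbo.TargetTable and dbo.SourceStaging are hypothetical names:
-- Hypothetical names; TABLOCK takes an exclusive table lock and, depending on
-- version and recovery model, also enables minimal logging and parallel insert.
INSERT INTO dbo.TargetTable WITH (TABLOCK)
SELECT s.Col1, s.Col2, s.Col3
FROM dbo.SourceStaging AS s;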
How often do you UPDATE the rows in this heap?
Because, unlike clustered indexes, heaps use a RID to find specific rows. The thing is that (unless you specifically rebuild the heap) when you update a row so that it no longer fits on its page, the old location is left behind as a forwarding pointer to the new location, increasing the number of lookups needed every time that row is touched.
I don't really think that is the issue here, but could you possibly see what happens if you add a clustered index to the table and check how the update times are affected?
Also, I assume you don't have some heavy trigger on the table doing a bunch of extra work as well, right?
Additionally, since you are referring to an article by Brent Ozar: he advocates breaking updates into batches of no more than 4,000 rows at a time, as that has proven to be the fastest and stays below the roughly 5,000-lock threshold at which lock escalation to an exclusive table lock can kick in during updates.
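A rough sketch of that batching pattern, assuming a hypothetical heap dbo.BigHeap with a Processed flag the WHERE clause can filter on:
-- Hypothetical table/column names; updates 4,000 rows per iteration so each
-- batch stays below the lock escalation threshold.
DECLARE @BatchSize int = 4000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) dbo.BigHeap
    SET    Processed = 1
    WHERE  Processed = 0;

    IF @@ROWCOUNT = 0
        BREAK;
END;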
Apologies if I am enraging the forum with a repetitive question. I couldn't find the right solution in the forum, hence posting it.
I need to fetch 129,991,763 rows into a cursor, temp table, or staging table quickly and process them into another table. This destination table is also a huge table.
Currently I am using an INSERT ... SELECT statement (the SELECT is nested 4 levels deep) with hints like OPTION (FAST 1000), MAXDOP 1, RECOMPILE, etc.
The procedure consumes a lot of time and shows no results, or does not complete at all.
Previously I used a cursor with the same hints, but as it was also running for more than 22 hours, I switched to INSERT ... SELECT.
Literally, I had to stop the execution of both of the above methods.
And to be honest, I am a beginner with SQL Server.
Even if I specifically filter out records in the SELECT based on criteria, the process still needs to be broken into 4 or 5 chunks, and these chunks also take more than 4-5 hours to complete.
Please help.
Thanks
Pradyumna
In the past I've used BULK INSERT with reasonable success, but I suspect the suggestion of breaking it into chunks and dropping indexes would still be wise. You can find some details on it here
https://msdn.microsoft.com/en-GB/library/ms188365.aspx
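As a rough sketch of what a chunked BULK INSERT can look like (file path, table name, delimiters, and batch size below are all hypothetical):
-- Hypothetical names; loads a flat file in 100,000-row batches under a table lock.
BULK INSERT dbo.StagingTable
FROM 'D:\loads\extract.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    BATCHSIZE = 100000,
    TABLOCK
);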
Hope it helps, good luck.
Apologies, you will probably be best off using an SSIS package to pull it across. With this you can also transform the data if needed. I would still recommend keeping indexes off the table you are inserting the data into where possible. You'll need to do a bit of reading, as it's hard to explain on here because of the GUI involved.
Good luck
I'm a newbie in SQL Server and have the following dilemma:
I have two tables with the same structure. Call them runningTbl and finalTbl.
runningTbl receives about 600,000 to 1 million rows every 15 minutes.
After doing some data cleanup in runningTbl I want to move all the records to finalTbl.
finalTbl currently has about 38 million rows.
The above process needs to be repeated every 15-20 minutes.
The problem is that moving the data from runningTbl to finalTbl is taking way longer than 20 minutes at times.
Initially when the tables were small it took anything from 10 seconds to 2 minutes to copy.
Now it just takes too long.
Is there anyone who can assist with this? SQL query to follow.
Thanks
There are a number of things that you will need to do in order to get the most efficient method of copying the data. So far you are on the right track, but you have a long way to go. I would suggest you first look at your indexes; there may be optimizations there that can help. Next, make sure you don't have triggers on this table that could cause a slowdown. Next, change the logging level (if that is permissible).
There is a bunch more help here (from Microsoft):
http://msdn.microsoft.com/en-us/library/ms190421(v=SQL.90).aspx
Basically you are on the right track using BCP. This is actually Microsoft's recommendation:
To bulk-copy data from one instance of SQL Server to another, use bcp to export the table data into a data file. Then use one of the bulk import methods to import the data from the file to a table. Perform both the bulk export and bulk import operations using either native or Unicode native format.
When you do this though, you need to also consider the possibility of dropping your indexes if there is too much data being brought in (based upon the type of index you use). If you use a clustered index, it may also be a good idea to order your data before import. Here is more information (including the source of the above quote):
http://msdn.microsoft.com/en-US/library/ms177445(v=SQL.90).aspx
For starters: one of the things I've learned over the years is that MSSQL does a great job at optimizing all kinds of operations, but to do so it relies heavily on the statistics for all tables involved. Hence, I would suggest running "UPDATE STATISTICS processed_logs" and "UPDATE STATISTICS unprocessed_logs" before running the actual inserts; even on a large table these don't take all that long.
Apart from that, based on the query above, a lot depends on the indexes of the target table. I'm assuming the target table has its clustered index (or PRIMARY KEY) on (at least) UnixTime; if not, you'll create major data fragmentation when you squeeze more and more data in between the already existing records. To work around this you could try defragmenting the target table once in a while (it can be done online, but takes a long time), but making the clustered index (or PK) such that data is always appended to the end of the table would be the better approach; well, at least in my opinion.
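For example, a sketch of that approach, assuming the target table is dbo.processed_logs (as the statistics suggestion above implies) and that UnixTime only ever increases; the index name is made up:
-- Made-up index name; with an ever-increasing key, new rows are appended at the
-- end of the clustered index instead of splitting existing pages.
CREATE CLUSTERED INDEX CIX_processed_logs_UnixTime
    ON dbo.processed_logs (UnixTime);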
I suggest that you use a Windows service with a timer and a boolean flag. Once your request is sent to the server, set the flag, and have the timer event skip executing the code until the flag is cleared again.
I have a query with about 6-7 joined tables and a FREETEXT() predicate on 6 columns of the base table in the WHERE clause.
Now, this query worked fine (in under 2 seconds) for the last year and has remained practically unchanged (I tried old versions and the problem persists).
So today, all of a sudden, the same query takes around 1-1.5 minutes.
After checking the execution plan in SQL Server 2005, rebuilding the FULLTEXT index of that table, reorganising the FULLTEXT index, creating the index from scratch, restarting the SQL Server service and restarting the whole server, I don't know what else to try.
I temporarily switched the query to use LIKE instead until I figure this out (it takes about 6 seconds now).
When I look at the query in the query performance analyser and compare the FREETEXT query with the LIKE query, the former has about 350 times as many reads (4,921,261 vs. 13,943) and about 20 times the CPU usage (38,937 vs. 1,938) of the latter.
So it really is the FREETEXT predicate that causes it to be so slow.
Has anyone got any ideas on what the reason might be? Or further tests I could do?
[Edit]
Well, I just ran the query again to get the execution plan, and now it takes 2-5 seconds again, without any changes made to it, though the problem still existed yesterday. It wasn't due to any external factors: I'd stopped all applications accessing the database when I first tested the issue last Thursday, so it wasn't caused by any other load.
Well, I'll still include the execution plan, though it might not help a lot now that everything is working again... And beware, it's a huge query against a legacy database that I can't change (i.e. normalize data or get rid of some unnecessary intermediate tables).
Query plan
OK, here's the full query.
I might have to explain what exactly it does. Basically it gets search results for job ads, where there are two types of ads: premium ones and normal ones. The results are paginated to 25 results per page, with 10 premium ones up top and 15 normal ones after that, if there are enough.
So there are two inner queries that select as many premium/normal ads as needed (e.g. on page 10 it fetches the top 100 premium ones and the top 150 normal ones); those two queries are then interleaved with a ROW_NUMBER() expression and some math. The combination is ordered by row number and returned; it's used in another place to get just the 25 ads needed for the current page.
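Purely for illustration, here is a rough sketch of that interleaving pattern, not the actual legacy query; all table, column, and variable names (dbo.JobAds, IsPremium, PostedDate, @PageNum) are made up:
-- Made-up names throughout; fetches enough premium and normal ads for the
-- requested page, numbers each set, and maps them to overall positions so each
-- page of 25 shows 10 premium ads (slots 1-10) and 15 normal ads (slots 11-25).
DECLARE @PageNum int = 10;

WITH premium AS (
    SELECT TOP (@PageNum * 10) AdId,
           ROW_NUMBER() OVER (ORDER BY PostedDate DESC) AS rn
    FROM dbo.JobAds
    WHERE IsPremium = 1
    ORDER BY PostedDate DESC
),
normal AS (
    SELECT TOP (@PageNum * 15) AdId,
           ROW_NUMBER() OVER (ORDER BY PostedDate DESC) AS rn
    FROM dbo.JobAds
    WHERE IsPremium = 0
    ORDER BY PostedDate DESC
)
SELECT AdId,
       ((rn - 1) / 10) * 25 + ((rn - 1) % 10) + 1 AS page_position
FROM premium
UNION ALL
SELECT AdId,
       ((rn - 1) / 15) * 25 + ((rn - 1) % 15) + 11
FROM normal
ORDER BY page_position;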
Oh, and this whole query is constructed in a HUGE legacy ColdFusion file, and as it's been working fine, I haven't dared touching/changing large portions of it so far... never touch a running system and so on ;) Just small stuff like changing bits of the central WHERE clause.
The file also generates other queries which do basically the same thing, but without the premium/non-premium distinction, plus a lot of other variations of this query, so I'm never quite sure how a change to one of them might affect the others...
OK, as the problem hasn't surfaced again, I gave Martin the bounty, as he's been the most helpful so far and I didn't want the bounty to expire needlessly. Thanks to everyone else for their efforts; I'll try your suggestions if it happens again :)
This issue might arise due to a poor cardinality estimate of the number of results that will be returned by the full text query, leading to a poor strategy for the JOIN operations.
How do you find performance if you break it into 2 steps?
One new step populates a table variable (or temporary table) with the results of the full text query, and the second changes your existing query to refer to that table instead.
(NB: you might want to try this JOIN with and without OPTION (RECOMPILE) whilst looking at the query plans for (A) a free text search term that returns many results and (B) one that returns only a handful of results.)
Edit: It's difficult to be exact in the absence of the offending query, but what I mean is, instead of doing
SELECT <col-list>
FROM --Some 6 table Join
WHERE FREETEXT(...);
How does this perform?
DECLARE @Table TABLE
(
    <pk-col-list>
);

-- Step 1: materialise just the primary keys matched by the full text predicate
INSERT INTO @Table
SELECT PK
FROM YourTable
WHERE FREETEXT(...);

-- Step 2: join the remaining tables onto the narrowed-down key list
SELECT <col-list>
FROM --Some 6 table Join including onto @Table
OPTION (RECOMPILE);
Usually when we have this issue, it is because of table fragmentation and stale statistics on the indexes in question.
Next time, try to EXEC sp_updatestats after a rebuild/reindex.
See Using Statistics to Improve Query Performance for more info.
I am trying to insert thousands of rows into a table and performance is not acceptable. Rows in one particular table take 300 ms each to insert.
I know that tools exist to profile queries run against SQL Server (SQL Server Profiler, Database Engine Tuning Advisor), but how would I profile insert and update statements to pinpoint slow-running inserts? Am I forced to use perfmon while the queries run and deduce the issue from counters?
I would first check the query plan of a single insert to understand the costs associated with that operation; it is not clear from the question whether the insert is selecting its data from elsewhere.
I would then check the table indexing for the following (a quick check is sketched after this list):
How many indexes are in place (apart from filtered indexes, every index will also be inserted into).
Whether a clustered index is present or whether we are inserting into a heap.
Whether the clustered index key means we will get the benefit of a hotspot at the end of the table, or will instead cause a large quantity of page splits.
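A quick way to see what the insert has to maintain, with dbo.TargetTable as a placeholder name:
-- Placeholder table name; lists every index the insert must maintain and shows
-- whether the table is a heap (index_id 0) or has a clustered index (index_id 1).
SELECT i.index_id,
       i.name,
       i.type_desc,
       i.has_filter
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.TargetTable');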
These are all SQL schema-based issues. Assuming there are no problems within SQL, you can then start checking disk I/O counters for disk queue lengths and response times, not forgetting the log drive response time, since each insert will be logged.
These kinds of problems are very difficult to nail down to any one prescriptive thing or silver bullet; there is just a range of things you should be checking.
I'm betting that the problem is with the selects and not necessarily the updates. Have you tried profiling the select part of the update statement to make sure there isn't a problem there first?