SQL Server 2012: quickly INSERT millions of rows from SELECT - sql-server

Apologies if I am enraging the forum with repetitive question. Couldn't find the right solution in the forum, hence posting it.
I need to fetch 129991763 rows into a cursor or temp table or a staging table quickly and process them into another table. And this destination table is also huge table.
Currently I am using INSERT using SELECT statement (the SELECT is nested 4 levels) used hints like Option (FAST 1000), MAXDOP 1, RECOMPILE ...etc...
The procedure is consuming lot of time and showing no results or not getting completed at all.
Previously I used a cursor with the same hints; but as it was also running more than 22 hours; I switched to INSERT using SELECT.
Literally, I need to stop the execution for above both methods.
And to be honest, I am beginner in SQL Server database.
Even if specifically filter out the records in SELECT based on criteria; still the process needs to broken 4 or 5 chunks and these chunks are also taking more than 4 - 5 hours to complete.
Please help.
Thanks
Pradyumna

In the past I've used BULK INSERT with reasonable success, but I suspect the suggestion of breaking it into chunks and dropping indexes would still be wise. You can find some details on it here
https://msdn.microsoft.com/en-GB/library/ms188365.aspx
Hope it helps, good luck.

Apologies, you will probably be best using an SSIS package to pull it across. With this you can also transform the data if needed. I would still recommend keeping indexes off the table you are inserting the data into where possible. You'll need to have a bit of a read but hard to explain on here due to the use of the GUI.
Good luck

Related

Slow insert: select from view

The following statement takes at least 4 seconds:
INSERT INTO [SomeSmallTable]
SELECT * FROM ComplexView
WHERE [Date] = convert(datetime, '23/09/2020',103)
However, if we only run the SELECT part without the INSERT INTO, it takes less than half a second:
SELECT *
FROM ComplexView
WHERE [Date] = convert(datetime, '23/09/2020',103)
The view selects less than 200 rows, and the table called "SomeSmallTable" holds only a few rows. I think this issue started when we updated the view called "ComplexView". ComplexView is based on other views (and some of these views are based on other views itself), as well as some tables.
I tried to refresh all views using sp_refreshview, but to no avail.
How can we determine the cause of this issue and hopefully solve it?
[EDIT]
My reply to some comments:
#Dale K: I can't post the execution plans, I think they way to complex, and not relevant as they are equal for both statements, with or without the INSERT part, except for the Table Insert part. But I did see that the INSERT costs 100%. For some reason SQL has trouble inserting the view results in the table.
#Panagiotis Kanavos: Nobody but me is using the database. It's a copy of our clients database and I'm working on it on my local machine.
#gotqn: SomeSmallTable is a table, so no table variable or temporary table. However, it is created when a user opens a specific form in our application, and deleted then the user closes this form.
#Arvo: SomeSmallTable has no keys and no triggers. The view returns less than 200 rows which are inserted in this table, and before these are inserted the table is empty.
I followed the steps in the accepted answer, and eventually compared the current "ComplexView" with the previous version, and found out what caused this issue.
Checking the execution plan is the first step, as others have said. Given that the INSERT (rather than the query) is causing the delay, you could troubleshoot that further. Here are some things you can try:
Try using Statistics IO to find out more, as answered here.
Attempt an INSERT using static data (e.g. INSERT INTO [SomeSmallTable] VALUES (1, 2, '...etc');). This will tell you if the issue is any INSERT statement, or when inserting from a view specifically.
Check how much data the view is returning. 4s may or may not be reasonable, depending on how many rows are being inserted.
Check the table design to see how it is using primary keys, foreign keys, composite keys, indexes, triggers, etc. Some of these features optimise a table's design for selecting, but make insertion slower as a trade-off. A good answer about this can be found here.
If you know it's not a load issue (because you're the only one using this database), check whether something else might be restricting resources on the machine you're using (other resource-intensive tasks, any other queries happening at the same time, scheduled jobs within SQL Server, etc.) You can use SQL Server Profiler to watch the queries in real time.
If slow performance is not limited to this particular query, then there are other general design considerations you can look into.

Index gets out of sync on bulk insert

I have a strange problem with my SQL Server database.
I am writing bulk data (about 90,000 rows) using SqlBulkCopy.WriteToServer and I am also writing about 30,000 rows in batches of 1,000 using EF's AddRange.
This causes the indexes on these tables to go out of sync and queries take many factors longer than usually (timeout after 10 minutes instead of a result after a few seconds).
After I manually rebuild the indexes, the queries are fast again until another of these imports is happening.
My understanding of bulk loading is that it should also update the index.
My question is: Is there a well-known reason for this behavior? If not, how can I go about trouble shooting this?
We have got exactly the same issue some years ago. And as dfundako suggested, the answer is the outdated statistics.
SQLServer by defaults updates the statistics if a certain percent of records was changed. This is a problem if your table has a huge number of records, so 90000 added records would not reach the required percentage of number of changed rows.
So if you want to be sure, after inserting you have either reindex you table (as you did) or update the statistics of your table
update statistics <your table>
Based on the comments and answers here, I tried to figure out if I can change that 20% threshold somehow.
And indeed, there is a way to do this, using trace flag 2371
You can enable it like this:
DBCC TRACEON(2371, -1)
I will now wait a few weeks to be sure that this fixed the problem, but I have good hopes about it.

SQL Server 2005 FREETEXT() Perfomance Issue

I have a query with about 6-7 joined tables and a FREETEXT() predicate on 6 columns of the base table in the where.
Now, this query worked fine (in under 2 seconds) for the last year and practically remained unchanged (i tried old versions and the problem persists)
So today, all of a sudden, the same query takes around 1-1.5 minutes.
After checking the Execution Plan in SQL Server 2005, rebuilding the FULLTEXT Index of that table, reorganising the FULLTEXT index, creating the index from scratch, restarting the SQL Server Service, restarting the whole server I don't know what else to try.
I temporarily switched the query to use LIKE instead until i figure this out (which takes about 6 seconds now).
When I look at the query in the query performance analyser, when I compare the ´FREETEXT´query with the ´LIKE´ query, the former has 350 times as many reads (4921261 vs. 13943) and 20 times (38937 vs. 1938) the CPU usage of the latter.
So it really is the ´FREETEXT´predicate that causes it to be so slow.
Has anyone got any ideas on what the reason might be? Or further tests I could do?
[Edit]
Well, I just ran the query again to get the execution plan and now it takes 2-5 seconds again, without any changes made to it, though the problem still existed yesterday. And it wasn't due to any external factors, as I'd stopped all applications accessing the database when I first tested the issue last thursday, so it wasn't due to any other loads.
Well, I'll still include the execution plan, though it might not help a lot now that everything is working again... And beware, it's a huge query to a legacy database that I can't change (i.e. normalize data or get rid of some unneccessary intermediate tables)
Query plan
ok here's the full query
I might have to explain what exactly it does. basically it gets search results for job ads, where there's two types of ads, premium ones and normal ones. the results are paginated to 25 results per page, 10 premium ones up top and 15 normal ones after that, if there are enough.
so there's the two inner queries that select as many premium/normal ones as needed (e.g. on page 10 it fetches the top 100 premium ones and top 150 normal ones), then those two queries are interleaved with a row_number() command and some math. then the combination is ordered by rownumber and the query is returned. well it's used at another place to just get the 25 ads needed for the current page.
Oh and this whole query is constructed in a HUGE legacy Coldfusion file and as it's been working fine, I haven't dared thouching/changing large portions so far... never touch a running system and so on ;) Just small stuff like changing bits of the central where clause.
The file also generates other queries which do basically the same, but without the premium/non premium distinction and a lot of other variations of this query, so I'm never quite sure how a change to one of them might change the others...
Ok as the problem hasn't surfaced again, I gave Martin the bounty as he's been the most helpful so far and I didn't want the bounty to expire needlessly. Thanks to everyone else for their efforts, I'll try your suggestions if it happens again :)
This issue might arise due to a poor cardinality estimate of the number of results that will be returned by the full text query leading to a poor strategy for the JOIN operations.
How do you find performance if you break it into 2 steps?
One new step that populates a temporary table or table variable with the results of the Full Text query and the second one changing your existing query to refer to the temp table instead.
(NB: You might want to try this JOIN with and without OPTION(RECOMPILE) whilst looking at query plans for (A) a free text search term that returns many results (B) One that returns only a handful of results.)
Edit It's difficult to clarify exactly in the absence of the offending query but what I mean is instead of doing
SELECT <col-list>
FROM --Some 6 table Join
WHERE FREETEXT(...);
How does this perform?
DECLARE #Table TABLE
(
<pk-col-list>
)
INSERT INTO #Table
SELECT PK
FROM YourTable
WHERE FREETEXT(...)
SELECT <col-list>
FROM --Some 6 table Join including onto #Table
OPTION(RECOMPILE)
Usually when we have this issue, it is because of table fragmentation and stale statistics on the indexes in question.
Next time, try to EXEC sp_updatestats after a rebuild/reindex.
See Using Statistics to Improve Query Performance for more info.

Query hangs with INNER JOIN on datetime field

We've got a weird problem with joining tables from SQL Server 2005 and MS Access 2003.
There's a big table on the server and a rather small table locally in Access. The tables are joined via 3 fields, one of them a datetime field (containing a day; idea is to fetch additional data (daily) from the big server table to add data to the local table).
Up until the weekend this ran fine every day. Since yesterday we experienced strange non-time-outs in Access with this query. Non-time-out means that the query runs forever with rather high network transfer, but no timeout occurs. Access doesn't even show the progress bar. Server trace tells us that the same query is exectuted over and over on the SQL server without error but without result either. We've narrowed it down to the problem seemingly being accessing server table with a big table and either JOIN or WHERE containing a date, but we're not really able to narrow it down. We rebuilt indices already and are currently restoring backup data, but maybe someone here has any pointers of things we could try.
Thanks, Mike.
If you join a local table in Access to a linked table in SQL Server, and the query isn't really trivial according to specific limitations of joins to linked data, it's very likely that Access will pull the whole table from SQL Server and perform the join locally against the entire set. It's a known problem.
This doesn't directly address the question you ask, but how far are you from having all the data in one place (SQL Server)? IMHO you can expect the same type of performance problems to haunt you as long as you have some data in each system.
If it were all in SQL Server a pass-through query would optimize and use available indexes, etc.
Thanks for your quick answer!
The actual query is really huge; you won't be happy with it :)
However, we've narrowed it down to a simple:
SELECT * FROM server_table INNER JOIN access_table ON server_table.date = local_table.date;
If the server_table is a big table (hard to say, we've got 1.5 million rows in it; test tables with 10 rows or so have worked) and the local_table is a table with a single cell containing a date. This runs forever. It's not only slow, It just does nothing besides - it seems - causing network traffic and no time out (this is what I find so strange; normally you get a timeout, but this just keeps on running).
We've just found KB article 828169; seems to be our problem, we'll look into that. Thanks for your help!
Use the DATEDIFF function to compare the two dates as follows:
' DATEDIFF returns 0 if dates are identical based on datepart parameter, in this case d
WHERE DATEDIFF(d,Column,OtherColumn) = 0
DATEDIFF is optimized for use with dates. Comparing the result of the CONVERT function on both sides of the equal (=) sign might result in a table scan if either of the dates is NULL.
Hope this helps,
Bill
Try another syntax ? Something like:
SELECT * FROM BigServerTable b WHERE b.DateFld in (SELECT DISTINCT s.DateFld FROM SmallLocalTable s)
The strange thing in your problem description is "Up until the weekend this ran fine every day".
That would mean the problem is really somewhere else.
Did you try creating a new blank Access db and importing everything from the old one ?
Or just refreshing all your links ?
Please post the query that is doing this, just because you have indexes doesn't mean that they will be used. If your WHERE or JOIN clause is not sargable then the index will not be used
take this for example
WHERE CONVERT(varchar(49),Column,113) = CONVERT(varchar(49),OtherColumn,113)
that will not use an index
or this
WHERE YEAR(Column) = 2008
Functions on the left side of the operator (meaning on the column itself) will make the optimizer do an index scan instead of a seek because it doesn't know the outcome of that function
We rebuilt indices already and are currently restoring backup data, but maybe someone here has any pointers of things we could try.
Access can kill many good things....have you looked into blocking at all
run
exec sp_who2
look at the BlkBy column and see who is blocking what
Just an idea, but in SQL Server you can attach your Access database and use the table there. You could then create a view on the server to do the join all in SQL Server. The solution proposed in the Knowledge Base article seems problematic to me, as it's a kludge (if LIKE works, then = ought to, also).
If my suggestion works, I'd say that it's a more robust solution in terms of maintainability.

Have you ever encountered a query that SQL Server could not execute because it referenced too many tables?

Have you ever seen any of there error messages?
-- SQL Server 2000
Could not allocate ancillary table for view or function resolution.
The maximum number of tables in a query (256) was exceeded.
-- SQL Server 2005
Too many table names in the query. The maximum allowable is 256.
If yes, what have you done?
Given up? Convinced the customer to simplify their demands? Denormalized the database?
#(everyone wanting me to post the query):
I'm not sure if I can paste 70 kilobytes of code in the answer editing window.
Even if I can this this won't help since this 70 kilobytes of code will reference 20 or 30 views that I would also have to post since otherwise the code will be meaningless.
I don't want to sound like I am boasting here but the problem is not in the queries. The queries are optimal (or at least almost optimal). I have spent countless hours optimizing them, looking for every single column and every single table that can be removed. Imagine a report that has 200 or 300 columns that has to be filled with a single SELECT statement (because that's how it was designed a few years ago when it was still a small report).
For SQL Server 2005, I'd recommend using table variables and partially building the data as you go.
To do this, create a table variable that represents your final result set you want to send to the user.
Then find your primary table (say the orders table in your example above) and pull that data, plus a bit of supplementary data that is only say one join away (customer name, product name). You can do a SELECT INTO to put this straight into your table variable.
From there, iterate through the table and for each row, do a bunch of small SELECT queries that retrieves all the supplemental data you need for your result set. Insert these into each column as you go.
Once complete, you can then do a simple SELECT * from your table variable and return this result set to the user.
I don't have any hard numbers for this, but there have been three distinct instances that I have worked on to date where doing these smaller queries has actually worked faster than doing one massive select query with a bunch of joins.
#chopeen You could change the way you're calculating these statistics, and instead keep a separate table of all per-product stats.. when an order is placed, loop through the products and update the appropriate records in the stats table. This would shift a lot of the calculation load to the checkout page rather than running everything in one huge query when running a report. Of course there are some stats that aren't going to work as well this way, e.g. tracking customers' next purchases after purchasing a particular product.
This would happen all the time when writing Reporting Services Reports for Dynamics CRM installations running on SQL Server 2000. CRM has a nicely normalised data schema which results in a lot of joins. There's actually a hotfix around that will up the limit from 256 to a whopping 260: http://support.microsoft.com/kb/818406 (we always thought this a great joke on the part of the SQL Server team).
The solution, as Dillie-O aludes to, is to identify appropriate "sub-joins" (preferably ones that are used multiple times) and factor them out into temp-table variables that you then use in your main joins. It's a major PIA and often kills performance. I'm sorry for you.
#Kevin, love that tee -- says it all :-).
I have never come across this kind of situation, and to be honest the idea of referencing > 256 tables in a query fills me with a mortal dread.
Your first question should probably by "Why so many?", closely followed by "what bits of information do I NOT need?" I'd be worried that the amount of data being returned from such a query would begin to impact performance of the application quite severely, too.
I'd like to see that query, but I imagine it's some problem with some sort of iterator, and while I can't think of any situations where its possible, I bet it's from a bad while/case/cursor or a ton of poorly implemented views.
Post the query :D
Also I feel like one of the possible problems could be having a ton (read 200+) of name/value tables which could condensed into a single lookup table.
I had this same problem... my development box runs SQL Server 2008 (the view worked fine) but on production (with SQL Server 2005) the view didn't. I ended up creating views to avoid this limitation, using the new views as part of the query in the view that threw the error.
Kind of silly considering the logical execution is the same...
Had the same issue in SQL Server 2005 (worked in 2008) when I wanted to create a view. I resolved the issue by creating a stored procedure instead of a view.

Resources