Why Would Remote Execution of a Query Cause it to be Suspended? - sql-server

I apologize in advance for not having all of the specifics available, but the machine is building an index probably for a good while still and is almost completely unresponsive.
I've got a table on SQL Server 2005 with a good number of columns, maybe 20, but a mammoth number of rows (tens, more likely hundreds of millions). In order to simplify the amount of JPA work I'd need to do to access it, I created a view that contained the bits I was interested in. The view was created as:
SELECT bigtable.ID, bigtable.external_identification, mediumtable.hostname,
CONVERT(VARCHAR, bigtable.datefield, 121) AS datefield
FROM schema.bigtable JOIN schema.mediumtable ON bigtable.joinID = mediumtable.ID;
When I want to select from the view, I do:
SELECT * FROM vwTable WHERE external_identification = 'some string';
This works just fine in SQL Management Studio. The external_identification column has a non-unique, non-clustered index in bigtable. This also worked just fine on our remotely executing Java program in our test environment. Now that we're a day or two away from production, the code has been changed a bit (although the fundamental JPA NamedQuery is still straightforward), but we have a new SQLServer installation on new hardware; the test version was on a 32-bit single core machine, the new hardware is 64-bit multi-core.
Whenever I try to run the code that uses this view on the new hardware, it either hangs indefinitely on the first call of this query or times out if I have a timeout specified. After doing some digging, something like:
SELECT status, command, wait_type, last_wait_type FROM sys.dm_exec_requests;
confirmed that the query was running, but showed it in the state:
suspended, SELECT, CXPACKET, CXPACKET
for as long as I cared to wait for it. Whenever I ran the exact same query from within the Management Studio, it completed immediately. So I did some research, and found out this is due to waiting on some kind of concurrent operation to start/finish. In an attempt to circumvent that, I set the server-wide MAXDOP to 1 (disabled concurrency). After that, the query still hangs, but the sys.dm_exec_requests would show:
suspended, SELECT, PAGEIOLATCH_SH, PAGEIOLATCH_SH
This indicates that it's some sort of HD/scanning issue. While certainly the machine is less responsive than I'd expect for newer hardware, I wouldn't expect this query (even over the view) to require much scanning, since the column I'm searching by is indexed in the underlying table and it works if I run it locally. But just because I'm out of ideas and under the gun, I'm adding indexes to the view; first I have to add the unique clustered index (over ID) before I can attempt to add the non-unique non-clustered index over external_identification.
I'm the only one using this database; when I select from sys.dm_exec_requests the only two results are the query I'm actively inspecting and the select from sys.dm_exec_requests query. So it's not like it's under legitimately heavy, or even at all concurrent, load.
But I suspect I'm grasping at straws. I'm no DBA, and every time I have to interact with SQL Server outside of querying it it baffles my intuitions. Does anyone have any ideas why a query executed remotely would immediately go into a suspended state while the same query locally would execute immediately?

Wow, this one caught me straight out of left field. It turns out that by default, the MSSQL JDBC driver sends its String datatypes as Unicode, which the table/view might not be prepared to handle specifically. In our case, the columns and indexes were not, so MSSQL would perform a full table scan for each lookup.
In our test environment, the table was small enough that this didn't matter, so I was tricked into thinking it worked fine. In retrospect, I'm glad it didn't -- I can't stand it when computers give the illusion of inconsistency.
When I added this little parameter to the end of my JDBC connection string:
jdbc:sqlserver://[IP]:1433;databaseName=[db];sendStringParametersAsUnicode=false
things immediately and magically started working. Sorry for the slightly misleading question (I barely even mentioned JPA), but I had no idea what the cause was and really did believe it was something SQL Server side. Task Manager didn't report heavy CPU/Memory usage while the query was suspended, so I just thought it was idling even though it was really under heavy disk usage.
More info about MSSQL JDBC and Unicode can be found where I stumbled across the solution, at http://server.pramati.com/blog/2010/06/02/perfissues-jdbcdrivers-mssqlserver/ . Thanks, Ed, for that detailed shot in the dark -- it may not have been the problem, but I certainly learned a lot (and fast!) about MSSQL's gritty parts!

It is likely that the query run in SSMS and by your application are using different query plans - from the wait types you're seeing in dm_exec_requests it sounds like the plan created for the application is doing a table scan where the plan for SSMS is using an index seek.
This is possible because the SSMS and application database connections likely use different connection options, some of which are used as a key to the database plan cache.
You can find out which options your application is using by running a default SQL server profiler trace against the server; the first command after the connection is created will be a number of SET... options:
SET DATEFORMAT dmy
SET ANSI_NULLS ON
...
I suspect this list will be different between your application and your SSMS connection - a common candidate is SET ARITHABORT {ON|OFF}, since that forms part of the key of the cached plan.
If you run the SET... commands in an SSMS window before executing the query, the same (bad) plan as is being used by the application should then be picked up.
Assuming this demonstrates the problem, the next step is to work out how to prevent the bad plan getting into cache. It's difficult to give generic instructions about how to do this, since there are a few possible causes.
It's a bit of a scattergun approach (there are other more targetted ways to attempt to resolve this problem but they require more detailed understanding of the issue that I have now), but one thing to try is to add OPTION (RECOMPILE) to the end of your query - this forces a new plan to be generated for every execution, and should prevent the bad plan being reused:
SELECT * FROM vwTable WHERE external_identification = 'some string' OPTION (RECOMPILE);
Assuming you can replicate the bad performance is SSMS using the steps above, you should be able to test this there.
Beware that this can have negative performance consequences if the query is executed very frequently (since each recompilation requires CPU) - this depends on the workload of your application and will need testing.
A couple of other thoughts:
Check the schemas between the test and production systems; this might be as simple as a missing index from one of the tables in the production database, although given that SSMS queries perform OK this is unlikely.
You should re-enable parallelism by taking the server-wide MAXDOP=1 off, since this will limit the performance of your system overall. The problem is almost certainly the query plan, not parallelism
You also need to beware of the consequences of adding indexes to the view - doing so effectively materialises the view, which will (given the size of the table) require a lot of storage overhead - the indexes will also need to be maintained when INSERT/UPDATE/DELETE statements take place on the base table. Indexing the view is probably unnecessary given that (from SSMS) you know it's possible for the query to perform.

Related

SQL Server CXPACKET timeout

We've got SQL Server 2016 (v13.0.4206.0), by default there is no restrictions for parallelism - any count SQL wants. And it didn't lead any problems... Till now.
For another feature there were written query that unexpectedly raised timeout exception in our application. I was deeply surprised when it was successfully executed with setting up maximum threads per query to 1. Yes, 6 seconds for query is not so good, even accounting to most of time was spent for fetching, but it's far away from 3 minutes timeout!
By the way, executing this query with SQL Server Management Studio works all the time despite of parallelism settings. It seems that something wrong with connection to database, but all other queries works fine, even which much harder then that one.
Our application is built on ASP.NET Core 3.0 (don't know if it matters), database connection is made using System.Data.SqlClient v4.8.0. All I could determine is that there are so much tasks created for this query:
I've tried to watch for execution in sys.dm_os_waiting_tasks (thanks google). I'm not sure I got it right, but it seems that tasks with context_id 0-8 is blocked with those who have context_id 9-16 and vise versa. Obvious example of deadlock, isn't it? But how can SQL Server manage threads to make it without my "help"? Or what am I doing wrong?
Just in case some inappropriate answers:
I won't turn parallelism off (set maximum threads per query to 1) as solution because of some heavy queries in our application;
I don't want to raise Cost Threshold for Parallelism setting because I'm afraid of same problem with another query (guess, a heavier one). So I just want to determine real cause;
Optimizing the query isn't considered (anymore), as according to actual execution plan I can't make it faster - there are enough indexes for it. But I'm ready to rethink after some really weighty arguments.
So, my question is: why does parallelism that I didn't ask for spoil the query execution? And how can I avoid that?
It's true sometimes the engine chooses to use parallel execution (or not to use) which leads to worse performance.
You do not want to control the server option and the cost as you are not sure how this will reflect to other queries, which is understandable.
If you are sure, your query will be execute better without being handle in parallel, you can specify the option just for it using query hints - MAXDOP like this:
SELECT ...
FROM ...
OPTION (MAXDOP 1);
It's easy and you can rollback if needed. Also, you are not affecting other queries.
You are saying that:
Optimizing query isn't considered (anymore), as according to actual execution plan...
The execution plan is sometimes misleading. As a start - you can save your execution plan and open it with SentryOne Plan Explorer - it's free and can give you a better look of what's going on.
Also, if a query is execute for either 3 seconds or 6 minutes, there must be something wrong with it or may be the activity of your database. If it is executed fast in the SSMS always, maybe the engine is using the correct cache plan. I thing it's better to share the query itself and to attach the two plans (serial and parallel) and spend more time tuning it.

Why would an ODBC query against MSSQL 2008SP2 take 100 times as long as the same query in Studio?

I have a really odd query involving a join to a complex view. I analyzed the heck out of the view, built some indexes, and got the query working in under a second when run from MSSQL Management Studio. However, when run from Perl via ODBC, the exact same query takes around 80 seconds to return.
I've dumped almost 8 hours into this and it continues to baffle me. In that time I've logged the query from Perl and copied it verbatim into Studio, I've wrapped it in a stored procedure (which makes it take a consistent 2.5 minutes from BOTH clients!), I've googled ODBC & MSSQL query caches, I've watched the query run via the Activity Monitor (it spends most of its time in the generic SLEEP_TASK wait state) and Profiler (the select statement gets one line which doesn't show up until it's done running), and I've started reading up on performance bottlenecks.
I haven't noticed this problem with any other queries from Perl and unfortunately we don't have a DBA on site. I'm a programmer who's done some DBA but I feel like I'm groping in the dark with this one. My best guess is that there is some sort of query cache available from Studio that the ODBC client can't access, but restarting Studio does not make the query's first execution take longer so it doesn't look like it's just because each new ODBC connection starts with an empty cache.
Without going into the view definitions, the base query is very simple:
SELECT * FROM VIEW1 LEFT OUTER JOIN VIEW2 WHERE SECTION = ? AND ID = ?
The delay goes away when I drop VIEW2, but I need the data from that view. I've already rewritten the view three times in attempts to simplify and improve efficiency but this feels kinda like a dead end since the query runs fine from Studio. The query only returns a single row but even dropping the ID criteria and selecting all 56k rows for an entire section only takes 40 seconds from Studio. Any other ideas?
Edit 2/8:
The article #Remus Rusanu linked was pretty clear, but I'm afraid it didn't quite apply. It now seems pretty clear that it's not ODBC at all, but that when I hard-code arguments vs. parametrize them, I get different execution plans. I can reproduce this in SSMS:
SELECT * FROM VIEW1 LEFT OUTER JOIN VIEW2 WHERE SECTION = 'a' AND ID = 'b'
is 100 times faster than
DECLARE #p1 VARCHAR(8), #p2 VARCHAR(3)
SET #p1 = 'a'
SET #p2 = 'b'
SELECT * FROM VIEW1 LEFT OUTER JOIN VIEW2 WHERE SECTION = #p1 AND ID = #p2
Unfortunately, I'm still at a loss to explain why the first should get an execution plan that takes two orders of magnitude less time than the parametrized version, for any values of SECTION & ID I can throw at it. It may be a deficiency in SQL Server, but it just seems stupid that the inputs are known in both places yet the one takes so much longer. If SQL server recomputed the parametrized plan from scratch every time, as it must be doing for the different constant values I am supplying, it would be 100 times faster. None of the RECOMPILE options suggested by the article seem to help either.
I think #GSerg called it below. I've yet to prove that it doesn't happen with the window function used externally to the view, but he describes the same thing and the timing discrepancy remains baffling.
If I don't run out of time to work on this, I'll try to adapt some of the article's advice and force the constant execution plan on the parametrized version, but it seems like an awful lot of what should be unnecessary trouble.
As explained by Martin Smith, it's an issue with predicate pushing in SQL Server which hasn't been fixed in full.
As suggested in answers and comments in the linked question, you can either
Wrap the query into something that appends option(recompile)
Convert it to a table-valued function.
Attach a plan guide with option (recompile)
Everything you ever wanted to know on the subject: Slow in the Application, Fast in SSMS? Understanding Performance Mysteries.
There is no 'cache' available in SSMS that ODBC cannot access. Is just that you're getting different execution plans in SSMS vs. ODBC, either because of parameter sniffing or because of data type precedence rules. Read the article linked, it has both the means to identify the problem and recommendations on how to fix it.
Compare the plans for the two queries and check the SET settings for each connection (you can do this by looking in sys.dm_exec_sessions). I bet you'll see a difference in quoted_identifier, ansi_nulls or arithabort (or possibly all three). This usually causes vast differences in the execution plan. You should be able to set these settings manually in your ODBC version in order to match the settings that are being used by Management Studio.
Some related questions - there could be other obscure circumstances at play that you'll want to check into:
SQL Query slow in .NET application but instantaneous in SQL Server Management Studio
SQL Server Query Slow from PHP, but FAST from SQL Mgt Studio - WHY?
Query times out from web app but runs fine from management studio
SQL Server 2005 stored procedure fast in SSMS slow from VBA

SQL Server Performance and Update Statistics

We have a site in development that when we deployed it to the client's production server, we started getting query timeouts after a couple of hours.
This was with a single user testing it and on our server (which is identical in terms of Sql Server version number - 2005 SP3) we have never had the same problem.
One of our senior developers had come across similar behaviour in a previous job and he ran a query to manually update the statistics and the problem magically went away - the query returned in a few miliseconds.
A couple of hours later, the same problem occurred.So we again manually updated the statistics and again, the problem went away. We've checked the database properties and sure enough, auto update statistics isTRUE.
As a temporary measure, we've set a task to update stats periodically, but clearly, this isn't a good solution.
The developer who experienced this problem before is certain it's an environment problem - when it occurred for him previously, it went away of its own accord after a few days.
We have examined the SQL server installation on their db server and it's not what I would regard as normal. Although they have SQL 2005 installed (and not 2008) there's an empty "100" folder in installation directory. There is also MSQL.1, MSQL.2, MSQL.3 and MSQL.4 (which is where the executables and data are actually stored).
If anybody has any ideas we'd be very grateful - I'm of the opinion that rather than the statistics failing to update, they are somehow becoming corrupt.
Many thanks
Tony
Disagreeing with Remus...
Parameter sniffing allows SQL Server to guess the optimal plan for a wide range of input values. Some times, it's wrong and the plan is bad because of an atypical value or a poorly chosen default.
I used to be able to demonstrate this on demand by changing a default between 0 and NULL: plan and performance changed dramatically.
A statistics update will invalidate the plan. The query will thus be compiled and cached when next used
The workarounds are one of these follows:
parameter masking
use OPTIMISE FOR UNKNOWN hint
duplicate "default"
See these SO questions
Why does the SqlServer optimizer get so confused with parameters?
At some point in your career with SQL Server does parameter sniffing just jump out and attack?
SQL poor stored procedure execution plan performance - parameter sniffing
Known issue?: SQL Server 2005 stored procedure fails to complete with a parameter
...and Google search on SO
Now, Remus works for the SQL Server development team. However, this phenomenon is well documented by Microsoft on their own website so blaming developers is unfair
How Data Access Code Affects Database Performance (MSDN mag)
Suboptimal index usage within stored procedure (MS Connect)
Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (an excellent white paper)
Is not that the statistics are outdated. What happens when you update statistics all plans get invalidated and some bad cached plan gets evicted. Things run smooth until a bad plan gets again cached and causes slow execution.
The real question is why do you get bad plans to start with? We can get into lengthy technical and philosophical arguments whether a query processor shoudl create a bad plan to start with, but the thing is that, when applications are written in a certain way, bad plans can happen. The typical example is having a where clause like (#somevaribale is null) or (somefield= #somevariable). Ultimately 99% of the bad plans can be traced to developers writing queries that have C style procedural expectation instead of sound, set based, relational processing.
What you need to do now is to identify the bad queries. Is really easy, just check sys.dm_exec_query_stats, the bad queries will stand out in terms of total_elapsed_time and total_logical_reads. Once you identified the bad plan, you can take corrective measures which depend from query to query.

What are these sys.sp_* Stored Procedures doing?

I'm tracking down an odd and massive performance problem in my SQL server installation. On my system, a particular stored procedure takes 2 minutes to execute; on a colleague's system it takes less than 1 second. We have similar databases/data and configurations, but there's obviously something very different.
I ran the SP in question through the Profiler on both systems and noticed something odd. On My system, I see 9 entries with the following properties:
The Duration is way high relative to other rows. I have values as high as 37,698 and as low as 1734. On the "fast" system the maximum duration (for the entire SP call) is 259.
They are executed for two databases related to the one that contains the SP I'm running. (This SP makes calls via Linked Servers to these two databases).
They are executions of one of the following system SPs:
sp_tables_info_90_rowset
sp_check_constbytable_rowset
sp_columns_90_rowset
sp_table_statistics2_rowset
sp_indexes_90_rowset
I can't find any Googleable documentation on what these are, why they would be so slow, or why they would run on one system but not the other. Does anyone know what they're all about?
Try manually updating statistics on that table.
UPDATE STATISTICS [TableName]
Then double check that the database option to AutoUpdateStatistics is TRUE. Even if it is, though, I've seen cases where adding large amounts of data to a table doesn't always cause the statistics to update in a timely way, and queries can be slow.
I don't know the answer to your question. But to try to fix the problem you're having (which, I assume, is what you're actually interested in), the first thing I'd do is run a re-index on the tables you're querying. This frequently will fix any kind of slowness when the conditions are as you described (same database structure, different data/database, same query).
These are the tables created when you have linked server calls. These are called work tables created in Tempdb. They are automatically created by the database engine for temporary operations like Spooling etc.
Those sp's mean your query is hitting linked servers by using synonyms. This should be avoided whenever possible.
I'm not familiar with those specific procedures, but you can try running:
SELECT object_definition(object_id('Procedure Name'))
To get a better idea of what's going on under the hood.
Last index rebuild? Last statistics update?
Otherwise, these stored procs are used by the SQL Server client too... no? And probably won't cause these errors

SQL Server Maintenance Suggestions?

I run an online photography community and it seems that the site draws to a crawl on database access, sometimes hitting timeouts.
I consider myself to be fairly compentent writing SQL queries and designing tables, but am by no means a DBA... hence the problem.
Some background:
My site and SQL server are running on a remote host. I update the ASP.NET code from Visual Studio and the SQL via SQL Server Mgmt. Studio Express. I do not have physical access to the server.
All my stored procs (I think I got them all) are wrapped in transactions.
The main table is only 9400 records at this time. I add 12 new records to this table nightly.
There is a view on this main table that brings together data from several other tables into a single view.
secondary tables are smaller records, but more of them. 70,000 in one, 115,000 in another. These are comments and ratings records for the items in #3.
Indexes are on the most needed fields. And I set them to Auto Recompute Statistics on the big tables.
When the site grinds to a halt, if I run code to clear the transaction log, update statistics, rebuild the main view, as well as rebuild the stored procedure to get the comments, the speed returns. I have to do this manually however.
Sadly, my users get frustrated at these issues and their participation dwindles.
So my question is... in a remote environment, what is the best way to setup and schedule a maintenance plan to keep my SQL db running at its peak???
My gut says you are doing something wrong. It sounds a bit like those stories you hear where some system cannot stay up unless you reboot the server nightly :-)
Something is wrong with your queries, the number of rows you have is almost always irrelevant to performance and your database is very small anyway. I'm not too familiar with SQL server, but I imagine it has some pretty sweet query analysis tools. I also imagine it has a way of logging slow queries.
I really sounds like you have a missing index. Sure you might think you've added the right indexes, but until you verify the are being used, it doesn't matter. Maybe you think you have the right ones, but your queries suggest otherwise.
First, figure out how to log your queries. Odds are very good you've got a killer in there doing some sequential scan that an index would fix.
Second, you might have a bunch of small queries that are killing it instead. For example, you might have some "User" object that hits the database every time you look up a username from a user_id. Look for spots where you are querying the database a hundred times and replace it with a cache--even if that "cache" is nothing more then a private variable that gets wiped at the end of a request.
Bottom line is, I really doubt it is something mis-configured in SQL Server. I mean, if you had to reboot your server every night because the system ground to a halt, would you blame the system or your code? Same deal here... learn the tools provided by SQL Server, I bet they are pretty slick :-)
That all said, once you accept you are doing something wrong, enjoy the process. Nothing, to me, is funner then optimizing slow database queries. It is simply amazing you can take a query with a 10 second runtime and turn it into one with a 50ms runtime with a single, well-placed index.
You do not need to set up your maintenance tasks as a maintenance plan.
Simply create a stored procedure that carries out the maintenance tasks you wish to perform, index rebuilds, statistics updates etc.
Then create a job that calls your stored procedure/s. The job can be configured to run on your desired schedule.
To create a job, use the procedure sp_add_job.
To create a schedule use the procedure sp_add_schedule.
I hope what I have detailed is clear and understandable but feel free to drop me a line if you need further assistance.
Cheers, John

Resources