Let's have:
$DB an SQL Server database
$DBSP1 an SQL Server database containing stored procedures referencing $DB
$DBSP2 is exactly like $DBSP1
$SP is a stored procedure
Running $SP on $DBSP1 from C# code takes around 1.5s.
Running $SP on $DBSP2 from C# code takes around 0.5s.
The C# code is very simple and use SqlClient with default parameters.
When I execute $SP in an SQL console on both $DBSP1 and $DBSP2, it takes 0.4s.
The only difference between the two code databases is $DBSP1 is in production and is a bit loaded, while $DBSP2 is idle. There is no data in code databases, only stored procedures and views over $DB.
Can someone suggest reasons why this happen? Since all the work happens in $DB which is equally loaded in both cases I would expect performances to be similar.
The difference in speed came from different query plans. $SP is a complicated stored procedure and its initial query plan may vary vastly depending on the first query hitting it. And at this point we were not controlling the first request since right after deploying the $SP, it was published in production.
We tried providing hints about $SP parameters both at $SP levels and query levels, without success. So instead, we warm up the $SP before deploying with well crafted queries known to generate the correct query plan.
I think you have your answer. $DBSP1 is busy. It still has to task switch and prioritize work that comes in even though most of the work is happening in $DB.
Further, depending on the amount of data being transferred, the network activity comes into play. If $DBSP1 is sending / receiving a fair amount of data, that lessens the amount of available bandwidth for it to communicate with $DB and will cause result sets to transfer slower.
1) Check that both servers have the same indexes defined on each table
2) Rebuild Indexes (or at a minimum, update Statistics)
Do you have regular index maintenance jobs scheduled? Sounds like out of date statistics or fragmented indexes are causing the discrepancy..
Suggest you run profiler to see what is happening, but often the workload on a production database makes everything run slower. Also statistics may be out of date. Performance is rarely the same between two servers in my experience.
Measure.
Add SET STATISTICS TIME ON and SET STATISTICS IO ON to your C# and on the sql console call as well. In C#, hook up the SqlConneciton.InfoMessage event to capture the informational messages and display them. also separate the time measured for the SqlConnection.Open() call and the SqlCommand.ExecuteNonQuery call.
Then you'll be able to tell the difference.
different conenction Open() times?
different compilation/execution times ?
different logical reads/physical reads ratio?
different logical reads count?
Related
I am using SQL Server 2008 R2.
The process is actually like this:
First, about 2 million records are pulled from a remote server,
then a join is done locally,
the final result is thousands of records.
The time cost varies from less one 1 min to 30 mins.
And after I experienced the 30 mins delay, it seems the following time costs are all only around 3 mins.
It is the same data, same SP.
What could cause this drastic difference?
Update
I delet the SP, re-start the SQL server service, and re-creat the SP. The execution took only 50 seconds!
What's wrong?
The behaviour you describe seems extreme - but (if you exclude the client), there are 3 logical places to look.
The first is the query execution on the database server. It's worth using the Query Analyzer tool to see if it's using any indices - by far the most common reason for variable performance of database queries is that the query is not using (the right) indices, and that therefore the impact of the query cache plays a big part. SQL Server will cache a lot of data, and the first run of your proc populates that cache; the second run is faster because it hits the cache. After a while, the cache goes stale, and running the proc slows down again.
The second possibility is that the database server is wobbly - it may just not be powerful enough to do all the work it's supposed to do. In that case, one moment you get lucky, have all the server resources to yourself; the next, someone else is running a query and yours slows down. That would make all queries slow, not just this one - so it doesn't sound likely.
Third possibility is networking weirdness - as Phil says, "thousands of records" is nothing too scary, but if they're big, and your network is saturated with pictures of kittens, it might have an impact. Again, that would manifest in general network slowness, and is unlikely to explain a delay of 30 minutes...
Fourth, is anything going on at the same time?
Fifth, does your SP use dynamically generated SQL statements? This would cause the SP not to become pre-compiled. If possible seperate such statements into child SPs.
I apologize in advance for not having all of the specifics available, but the machine is building an index probably for a good while still and is almost completely unresponsive.
I've got a table on SQL Server 2005 with a good number of columns, maybe 20, but a mammoth number of rows (tens, more likely hundreds of millions). In order to simplify the amount of JPA work I'd need to do to access it, I created a view that contained the bits I was interested in. The view was created as:
SELECT bigtable.ID, bigtable.external_identification, mediumtable.hostname,
CONVERT(VARCHAR, bigtable.datefield, 121) AS datefield
FROM schema.bigtable JOIN schema.mediumtable ON bigtable.joinID = mediumtable.ID;
When I want to select from the view, I do:
SELECT * FROM vwTable WHERE external_identification = 'some string';
This works just fine in SQL Management Studio. The external_identification column has a non-unique, non-clustered index in bigtable. This also worked just fine on our remotely executing Java program in our test environment. Now that we're a day or two away from production, the code has been changed a bit (although the fundamental JPA NamedQuery is still straightforward), but we have a new SQLServer installation on new hardware; the test version was on a 32-bit single core machine, the new hardware is 64-bit multi-core.
Whenever I try to run the code that uses this view on the new hardware, it either hangs indefinitely on the first call of this query or times out if I have a timeout specified. After doing some digging, something like:
SELECT status, command, wait_type, last_wait_type FROM sys.dm_exec_requests;
confirmed that the query was running, but showed it in the state:
suspended, SELECT, CXPACKET, CXPACKET
for as long as I cared to wait for it. Whenever I ran the exact same query from within the Management Studio, it completed immediately. So I did some research, and found out this is due to waiting on some kind of concurrent operation to start/finish. In an attempt to circumvent that, I set the server-wide MAXDOP to 1 (disabled concurrency). After that, the query still hangs, but the sys.dm_exec_requests would show:
suspended, SELECT, PAGEIOLATCH_SH, PAGEIOLATCH_SH
This indicates that it's some sort of HD/scanning issue. While certainly the machine is less responsive than I'd expect for newer hardware, I wouldn't expect this query (even over the view) to require much scanning, since the column I'm searching by is indexed in the underlying table and it works if I run it locally. But just because I'm out of ideas and under the gun, I'm adding indexes to the view; first I have to add the unique clustered index (over ID) before I can attempt to add the non-unique non-clustered index over external_identification.
I'm the only one using this database; when I select from sys.dm_exec_requests the only two results are the query I'm actively inspecting and the select from sys.dm_exec_requests query. So it's not like it's under legitimately heavy, or even at all concurrent, load.
But I suspect I'm grasping at straws. I'm no DBA, and every time I have to interact with SQL Server outside of querying it it baffles my intuitions. Does anyone have any ideas why a query executed remotely would immediately go into a suspended state while the same query locally would execute immediately?
Wow, this one caught me straight out of left field. It turns out that by default, the MSSQL JDBC driver sends its String datatypes as Unicode, which the table/view might not be prepared to handle specifically. In our case, the columns and indexes were not, so MSSQL would perform a full table scan for each lookup.
In our test environment, the table was small enough that this didn't matter, so I was tricked into thinking it worked fine. In retrospect, I'm glad it didn't -- I can't stand it when computers give the illusion of inconsistency.
When I added this little parameter to the end of my JDBC connection string:
jdbc:sqlserver://[IP]:1433;databaseName=[db];sendStringParametersAsUnicode=false
things immediately and magically started working. Sorry for the slightly misleading question (I barely even mentioned JPA), but I had no idea what the cause was and really did believe it was something SQL Server side. Task Manager didn't report heavy CPU/Memory usage while the query was suspended, so I just thought it was idling even though it was really under heavy disk usage.
More info about MSSQL JDBC and Unicode can be found where I stumbled across the solution, at http://server.pramati.com/blog/2010/06/02/perfissues-jdbcdrivers-mssqlserver/ . Thanks, Ed, for that detailed shot in the dark -- it may not have been the problem, but I certainly learned a lot (and fast!) about MSSQL's gritty parts!
It is likely that the query run in SSMS and by your application are using different query plans - from the wait types you're seeing in dm_exec_requests it sounds like the plan created for the application is doing a table scan where the plan for SSMS is using an index seek.
This is possible because the SSMS and application database connections likely use different connection options, some of which are used as a key to the database plan cache.
You can find out which options your application is using by running a default SQL server profiler trace against the server; the first command after the connection is created will be a number of SET... options:
SET DATEFORMAT dmy
SET ANSI_NULLS ON
...
I suspect this list will be different between your application and your SSMS connection - a common candidate is SET ARITHABORT {ON|OFF}, since that forms part of the key of the cached plan.
If you run the SET... commands in an SSMS window before executing the query, the same (bad) plan as is being used by the application should then be picked up.
Assuming this demonstrates the problem, the next step is to work out how to prevent the bad plan getting into cache. It's difficult to give generic instructions about how to do this, since there are a few possible causes.
It's a bit of a scattergun approach (there are other more targetted ways to attempt to resolve this problem but they require more detailed understanding of the issue that I have now), but one thing to try is to add OPTION (RECOMPILE) to the end of your query - this forces a new plan to be generated for every execution, and should prevent the bad plan being reused:
SELECT * FROM vwTable WHERE external_identification = 'some string' OPTION (RECOMPILE);
Assuming you can replicate the bad performance is SSMS using the steps above, you should be able to test this there.
Beware that this can have negative performance consequences if the query is executed very frequently (since each recompilation requires CPU) - this depends on the workload of your application and will need testing.
A couple of other thoughts:
Check the schemas between the test and production systems; this might be as simple as a missing index from one of the tables in the production database, although given that SSMS queries perform OK this is unlikely.
You should re-enable parallelism by taking the server-wide MAXDOP=1 off, since this will limit the performance of your system overall. The problem is almost certainly the query plan, not parallelism
You also need to beware of the consequences of adding indexes to the view - doing so effectively materialises the view, which will (given the size of the table) require a lot of storage overhead - the indexes will also need to be maintained when INSERT/UPDATE/DELETE statements take place on the base table. Indexing the view is probably unnecessary given that (from SSMS) you know it's possible for the query to perform.
For a while now we've been having anecdotal slowness on our newly-minted (VMWare-based) SQL Server 2005 database servers. Recently the problem has come to a head and I've started looking for the root cause of the issue.
Here's the weird part: on the stored procedure that I'm using as a performance test case, I get a 30x difference in the execution speed depending on which DB server I run it on. This is using the same database (mdf) and log (ldf) files, detached, copied, and reattached from the slow server to the fast one. This doesn't appear to be a (virtualized) hardware issue: he slow server has 4x the CPU capacity and 2x the memory as the fast one.
As best as I can tell, the problem lies in the environment/configuration of the servers (either operating system or SQL Server installation). However, I've checked a bunch of variables (SQL Server config options, running services, disk fragmentation) and found nothing that has made a difference in testing.
What things should I be looking at? What tools can I use to investigate why this is happening?
Blindly checking variables and settings won't get you very far. You need to approach this methodically.
Are the two procedures executed the same way? Namely, is the plan different? A quick check is to SET STATISTICS IO ON and run the two cases. Is the number of logical-reads the same? Is the number of physical-reads the same? Is the number of writes the same? Differences in logical-reads or writes would indicate a different plan. Differences in physical-reads (while logical-reads is similar) indicate cache and memory problems. If the plans are different, you need to further investigate what is different in the actual execution plan. Does one plan uses a different degree of parallelism? Does one use different join types? Different access paths?
If the plans are similar yet the execution is still different, and you cannot blame the IO subsystem, then you need to check contention. Use SET STATISTICS TIME ON and compare the elapsed time and worker time in the two cases. Similar worker time but different elapsed time indicate that there is more waiting in one case. Use the wait_type and wait_resource info in sys.dm_exec_requests to identify the cause of contention.
The methodology of investigation is discussed in more detail in the Waits and Queues whitepaper.
Run SQL Server Profiler to gather information about running processes within SQL Server. This is probably the best start. This will give you a good idea of the things that are consuming a lot of resources.
If you still have issues after Indexing / Rebuilding Indexes, or rewriting queries, then the next step would be to run PerfMon.
I have a huge disgusting stored procedure that wasn't slow a couple months ago, but now is. I barely know what this thing does and I am in no way interested in rewriting it.
I do know that if I take the body of the stored procedure and then declare/set the values of the parameters and run it in query analyzer that it runs more than 20x faster.
From the internet, I've read that this is probably due to a bad cached query plan. So, I've tried running the sp with "WITH RECOMPILE" after the EXEC and I've also tried putting the "WITH RECOMPLE" inside the sp, but neither of those helped even a little bit.
When I look at the execution plan of the sp vs the query, the biggest difference is that the sp has "Parallelism" operations all over the place and the query doesn't have any. Can this be the cause of the difference in speeds?
Thank you, any ideas would be great... I'm stuck.
If the only difference between the two query plans is parallelism, try putting OPTION (MAXDOP 1) at the end of the query to limit it to a serial plan.
As to why they are different, I'm not sure, but I remember the SQL Server 2000 optimizer as being, um, finicky. Similar to your case, what we usually saw was that ad-hoc query batches would be fast and the same query via sp_executesql would be slow. Never did fully figure out what was going on.
Serial v parallel can definitely explain the difference in speeds, though. On SQL Server 2000, parallel plans use all the processors on the machine, not just the ones it needs:
If SQL Server chooses to use parallelism, it must use all the configured processors (as determined by the MAXDOP query hint configuration) for the execution of a parallel plan. For example, if you use MAXDOP=0 on a 32-way server, SQL Server tries to use all 32 processors even if seven processors might perform the job more efficiently as compared to a serial plan that only uses one processor. Because of this all-or-nothing behavior, if SQL Server chooses the parallel plan and you do not restrict the MAXDOP query hint[...], the time that it takes SQL Server to coordinate all the processors on a high-end server outweighs the advantages of using a parallel plan.
By default, I believe the server-wide setting of MAXDOP is 0, meaning use as many as possible. If you recently upgraded your database server with more processors to help performance, that could ironically explain why your performance is suffering. If that's the case, you might try setting the MAXDOP hint to the number of processors you had before and see if that helps.
try adding SET ARITHABORT ON at the top of the procedure.
as seen here: https://stackoverflow.com/questions/2465887/why-would-set-arithabort-on-dramatically-speed-up-a-query
If you have made many changes to the table and not run a re-index or defragment on the tables in question you probably should. Check out this article. The reason i suggest this is because the procedure at one time was fast and now over time it has degraded performance. I don't think making changes to an already existing procedure that was tested and worked well at one time should change on account of degraded performance over time. This usually only treats the symptoms not the actual problem.
I do know that if I take the body of
the stored procedure and then
declare/set the values of the
parameters and run it in query
analyzer that it runs more than 20x
faster.
Are you sure that it is not the fetching of these params ahead of the SP's execution that's not causing your slowness? With bypassing the population of the params you could be oversimplifying your issue.
Where do these params come from? How are they populated? It seems from your question that you've isolated the stored proc and found out that it might not be the issue.
Could it be a problem with contention? Does this store procedure run at a particular time when other heavy lifting is also happening?
I'm tracking down an odd and massive performance problem in my SQL server installation. On my system, a particular stored procedure takes 2 minutes to execute; on a colleague's system it takes less than 1 second. We have similar databases/data and configurations, but there's obviously something very different.
I ran the SP in question through the Profiler on both systems and noticed something odd. On My system, I see 9 entries with the following properties:
The Duration is way high relative to other rows. I have values as high as 37,698 and as low as 1734. On the "fast" system the maximum duration (for the entire SP call) is 259.
They are executed for two databases related to the one that contains the SP I'm running. (This SP makes calls via Linked Servers to these two databases).
They are executions of one of the following system SPs:
sp_tables_info_90_rowset
sp_check_constbytable_rowset
sp_columns_90_rowset
sp_table_statistics2_rowset
sp_indexes_90_rowset
I can't find any Googleable documentation on what these are, why they would be so slow, or why they would run on one system but not the other. Does anyone know what they're all about?
Try manually updating statistics on that table.
UPDATE STATISTICS [TableName]
Then double check that the database option to AutoUpdateStatistics is TRUE. Even if it is, though, I've seen cases where adding large amounts of data to a table doesn't always cause the statistics to update in a timely way, and queries can be slow.
I don't know the answer to your question. But to try to fix the problem you're having (which, I assume, is what you're actually interested in), the first thing I'd do is run a re-index on the tables you're querying. This frequently will fix any kind of slowness when the conditions are as you described (same database structure, different data/database, same query).
These are the tables created when you have linked server calls. These are called work tables created in Tempdb. They are automatically created by the database engine for temporary operations like Spooling etc.
Those sp's mean your query is hitting linked servers by using synonyms. This should be avoided whenever possible.
I'm not familiar with those specific procedures, but you can try running:
SELECT object_definition(object_id('Procedure Name'))
To get a better idea of what's going on under the hood.
Last index rebuild? Last statistics update?
Otherwise, these stored procs are used by the SQL Server client too... no? And probably won't cause these errors