EXECUTE AT vs SELECT FROM linked server - sql-server

I've got an Oracle server integrated with MS SQL as a linked server. Currently I'm working on query optimization. I've found that queries written as follows:
SELECT colName1, colName2, ..
FROM ORACLE.TBL_TBLENAME
WHERE something = @something
run very slowly. On the other hand, the same query written as:
EXECUTE ('SELECT colName1, colName2, ..
FROM TBL_TBLENAME
WHERE something = ?', @something) AT ORACLE
runs much faster.
What I'm concerned about is the execution plan. For the first query the Estimated Subtree Cost is 0.16; for the second it is 3.36. The second query performs a 'Remote Scan', and I don't know whether that is good or not.
The query is supposed to run quite often (around 20 executions per minute).

Given your execution plans (and I'm an Oracle guy, not a SQL Server guy), it appears that the first one is doing a full table scan and filtering at the SQL Server end (the Compute Scalar?), whereas the second one is submitting the filter to Oracle and is therefore much quicker.
Are the stats up to date on the Oracle table (perhaps it thinks there are only a few rows in the table, so SQL Server is deciding it's better to just fetch the whole table over and do the processing locally)? And are there any histograms involved on "something"?
If the second one is performing well for you, though, is there really a problem?
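If you want the filter pushed to Oracle without the pass-through EXECUTE, the usual alternative is OPENQUERY. It only accepts a literal query string, so a variable filter means building the statement dynamically. A minimal sketch, assuming "something" is numeric (the linked server and table names are taken from the question; the value 42 is hypothetical):
-- OPENQUERY sends the inner query to Oracle as-is, so the filter runs remotely.
-- The remote string must be a literal, hence the dynamic SQL wrapper.
DECLARE @something int;
SET @something = 42;  -- hypothetical value and type

DECLARE @sql nvarchar(max);
SET @sql = N'SELECT colName1, colName2
FROM OPENQUERY(ORACLE, ''SELECT colName1, colName2
FROM TBL_TBLENAME
WHERE something = ' + CAST(@something AS nvarchar(20)) + N''')';

EXEC sys.sp_executesql @sql;
For string-valued filters you would also need to double up embedded quotes (e.g. REPLACE(@value, '''', '''''')) to keep the dynamic string safe.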

Related

Updating varchar column over linked server with parameterized query causes remote scan and cursorfetch

I'm issuing a fairly simple update of a single varchar column against a remote linked server - like this:
UPDATE Hydrogen.CRM.dbo.Customers
SET EyeColor = 'Blue'
WHERE CustomerID = 619
And that works fine when it is written as an ad-hoc query:
Parameterized queries bad
When we do what we're supposed to do, and have our SqlCommand issue it as a parameterized query, the SQL ends up being: (not strictly true, but close enough)
EXEC sp_executesql N'UPDATE [Hydrogen].[CRM].[dbo].[Customers]
SET [EyeColor] = @P1
WHERE [CustomerID] = @P5',
N'@P1 varchar(4),@P5 bigint',
'Blue',619
This parameterized form of the query ends up performing a remote scan against the linked server:
It creates a cursor on the linked server, and takes about 35 seconds to pull back 1.2M rows to the local server through a series of hundreds of sp_cursorfetch calls - each pulling down a few thousand rows.
Why, in the world, would the local SQL Server optimizer ever decide to pull back all 1.2M rows to the local server in order to update anything? And even if it was going to decide to pull back rows to the local server, why in the world would it do it using a cursor?
It only fails on varchar columns. If I try updating an INT column, it works fine. But this column is varchar - and it fails.
I also tried parameterizing the column as nvarchar, and it's still bad.
Every answer I've seen is actually just a question:
"is the collation the same?"
"What if you change the column type?"
"Have you tried OPENQUERY?"
"Does the login have sysadmin role on the linked server?"
I already have my workaround: parameterized queries bad - use ad-hoc queries.
I was hoping for an explanation of the thing that makes no sense. And hopefully if we have an explanation we can fix it - rather than workaround it.
Of course I can't reproduce it anywhere except the customer's live environment. So it is going to require knowledge of SQL Server to come up with an explanation of what's happening.
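For what it's worth, one middle ground that keeps the statement parameterized but forces the whole UPDATE to run remotely is the EXEC ... AT pass-through form. A sketch only, reusing the names from the question (it requires the RPC Out option to be enabled on the linked server):
-- The ? placeholders are bound on Hydrogen itself, so no rows
-- need to be pulled back to the local server for the update.
DECLARE @EyeColor varchar(4) = 'Blue';
DECLARE @CustomerID bigint = 619;

EXEC ('UPDATE CRM.dbo.Customers
       SET EyeColor = ?
       WHERE CustomerID = ?', @EyeColor, @CustomerID) AT Hydrogen;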
Bonus Reading
Stack Overflow: Remote Query is slow when using variables vs literal
Stack Overflow: Slow query when connecting to linked server
https://dba.stackexchange.com/q/36893/2758
Stack Overflow: Parameter in linked-server query is converted from varchar to nvarchar, causing index scan and bad performance
Performance Issues when Updating Data with a SQL Server Linked Server
Update statements causing lots of calls to sp_cursorfetch?
Remote Scan on Linked Server - Fast SELECT/Slow UPDATE

In SQL Server, how to allow for multiple execution plans for a single query in a SP without having to recompile every time?

In SQL Server, what is the best way to allow for multiple execution plans to exist for a query in a SP without having to recompile every time?
For example, I have a case where the query plan varies significantly depending on how many rows are in a temp table that the query uses. Since there was no "one size fits all" plan that was satisfactory, and since it was unacceptable to recompile every time, I ended up copy/pasting (ick) the main query in the SP multiple times within several IF statements, forcing the SQL engine to give each case its own optimal plan. It actually seemed to work beautifully performance-wise, but it feels a bit clunky. (I know I could similarly break this part out into multiple SPs to do the same thing.) Is there a better way to do this?
IF @RowCount < 1
    [paste query here]
ELSE IF @RowCount < 50
    [paste query here]
ELSE IF @RowCount < 200
    [paste query here]
ELSE
    [paste query here]
You can use OPTIMIZE FOR in certain situations to create a plan targeted at a particular parameter value (but not multiple plans per se). This lets you specify the parameter value you want SQL Server to assume when creating the execution plan. It is a SQL Server 2005 onwards hint.
Optimize Parameter Driven Queries with the OPTIMIZE FOR Hint in SQL Server
There is also OPTIMIZE FOR UNKNOWN – a SQL Server 2008 onwards feature (use judiciously):
This hint directs the query optimizer to use the standard algorithms it has always used if no parameter values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the query plan should be, instead of looking at the specific parameter values that were passed to the query by the application.
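As a sketch of both hints (the table, column, and variable names here are invented, not from the question; the SET form keeps the first variant valid on SQL Server 2005):
DECLARE @CustomerID int;
SET @CustomerID = 100;

-- Pin the plan to one representative value:
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR (@CustomerID = 100));

-- Or compile as if no value were known (SQL Server 2008+),
-- so the optimizer falls back to average-density statistics:
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR UNKNOWN);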
Perhaps also look into the optimize for ad hoc workloads option.
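Enabling it is an instance-wide setting (SQL Server 2008+):
-- With this on, an ad hoc batch gets only a small plan stub on first
-- execution; the full plan is cached if the same batch runs again.
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;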
SQL Server 2005+ has statement level recompilation and is better at dealing with this kind of branching. You have one plan still but the plan can be partially recompiled at the statement level.
But it is ugly.
I'd go with @Mitch Wheat's option personally, because you have recompilations anyway with the stored procedure using a temp table. See Temp table and stored proc compilation
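If the IF-branching still feels too clunky, the statement-level equivalent is OPTION (RECOMPILE) on just the main query, which buys a plan tailored to the current temp-table row count at the cost of compiling that one statement on every execution. A minimal sketch with invented names:
-- Only this statement is recompiled each execution, so the plan
-- always reflects how many rows are currently in the temp table.
SELECT w.ID, d.Detail
FROM #Work AS w
INNER JOIN dbo.Details AS d ON d.ID = w.ID
OPTION (RECOMPILE);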

Intermittent slow query on SQL Server 2008

I am developing a system which periodically (4-5 times daily) runs a select statement that normally takes less than 10 seconds but has occasionally taken up to 40 minutes.
The database is on Windows Server 2008 + SQL Server 2008 R2; both 64bit.
There is a service on the machine running the database which polls the database and generates values for records which require it. These records are then periodically queried using a multi-table join SELECT from a service on a second machine, written in C++ (VS 2010) using the MFC CRecordset class to extract the data. An example of the query causing the problem is shown below.
SELECT DISTINCT "JobKeysFrom"."Key" AS "KeyFrom","KeysFrom"."ID" AS "IDFrom",
"KeysFrom"."X" AS "XFrom","KeysFrom"."Y" AS "YFrom","JobKeysTo"."Key" AS "KeyTo",
"KeysTo"."ID" AS "IDTo","KeysTo"."X" AS "XTo","KeysTo"."Y" AS "YTo",
"Matrix"."TimeInSeconds","Matrix"."DistanceInMetres","Matrix"."Calculated"
FROM "JobKeys" AS "JobKeysFrom"
INNER JOIN "JobKeys" AS "JobKeysTo" ON
("JobKeysFrom"."Key"<>"JobKeysTo"."Key") AND
("JobKeysFrom"."JobID"=531) AND
("JobKeysTo"."JobID"=531)
INNER JOIN "Keys" AS "KeysFrom" ON
("JobKeysFrom"."Key"="KeysFrom"."Key") AND ("JobKeysFrom"."Status"=4)
INNER JOIN "Keys" AS "KeysTo" ON
("JobKeysTo"."Key"="KeysTo"."Key") AND ("JobKeysTo"."Status"=4)
INNER JOIN "Matrix" AS "Matrix" ON
("Matrix"."IDFrom"="KeysFrom"."ID") AND ("Matrix"."IDTo"="KeysTo"."ID")
ORDER BY "JobKeysFrom"."Key","JobKeysTo"."Key"
I have tried the following:
checked the indexes: all seem correct, they are active, and they are being used according to the query plan
the design advisor comes back with no suggestions
I have tried defragging the indexes and data
rebuilt the database from scratch by exporting the data and reimporting it into a new database
ran the profiler on it and found that when it goes wrong it seems to do many millions (up to 100 million) of reads rather than a few hundred thousand
ran the database on a different server
During the time it is running the query, I can run exactly the same query in a Management Studio window and it will be back to running in 10 seconds. The problem does not seem to be lock, deadlock, CPU, disk or memory related, as it has happened when the machine running the database was only running this one query. The server has 4 processors and 16 GB of memory. I have also tried upgrading the disks to much faster ones and this had no effect.
It seems to me that it is almost as though the database receives the query, starts to process it and then goes to sleep for 40 minutes or runs the query without using the indexes.
When it takes a long time it will eventually finish and send the query results (normally about 70,000-100,000 records) back to the calling application.
Any help or suggestions would be gratefully received, many thanks
This sounds very much like parameter sniffing.
When a stored procedure is invoked and there is no existing execution plan in the cache matching the set options for the connection a new execution plan will be compiled using the parameter values passed in on that invocation.
Sometimes this will happen when the parameters passed are atypical (e.g. have unusually high selectivity) so the generated plan will not be suitable for most other invocations with different parameters. For example it may choose a plan with index seeks and bookmark lookups which is fine for a highly selective case but poor if it needs to be done hundreds of thousands of times.
This would explain why the number of reads goes through the roof.
Your SSMS connection will likely have different SET ... options, so it will not get handed the same problematic plan from the cache when you execute the stored procedure inside SSMS.
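To confirm a SET-options mismatch, you can compare the plan-affecting options of the application's session and your SSMS session (a sketch; substitute the real session ids):
-- Plan-affecting SET options, one row per session.
SELECT session_id, ansi_nulls, ansi_padding, ansi_warnings,
       arithabort, concat_null_yields_null, quoted_identifier
FROM sys.dm_exec_sessions
WHERE session_id IN (53, 57);  -- hypothetical session ids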
You can use the following to get the plan for the slow session:
SELECT p.query_plan, *
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) p
WHERE r.session_id = <session_id>
Then compare with the plan for the good session.
If you do determine that parameter sniffing is at fault you can use OPTIMIZE FOR hints to avoid it choosing the bad plan.
Check that you don't have a maintenance task running that is rebuilding indexes, or that your database statistics are somehow invalid when the query is executed.
This is exactly the sort of thing one would expect to see if the query is not using your indexes. That is usually because either the indexes are not accessible to the query at the point it runs, or because the statistics are invalid and make the optimiser believe that your large table(s) only have a few rows in them, so that the query would run faster with a full table scan than with indexed access.

Full Text Query takes minutes instead of sub seconds after upgrade

We just upgraded our SQL Server 2005 to SQL Server 2008 R2 and noticed some performance problems.
The query below was already slow, but on 2008 it just times out. We rebuilt the catalog to make sure it was freshly built on 2008:
DECLARE @FREETEXT varchar(255) = 'TEN-T'

SELECT DISTINCT ...
FROM DOSSIER_VERSION
INNER JOIN DOSSIER_VERSION_LOCALISED ...
WHERE CONTAINS(DOSSIER_VERSION.*, @FREETEXT)
   OR CONTAINS(DOSSIER_VERSION_LOCALISED.*, @FREETEXT)
The query takes minutes if you have both conditions enabled.
If you just put the following in the WHERE clause:
CONTAINS(DOSSIER_VERSION.*, @FREETEXT)
it's super fast. The same goes for just
CONTAINS(DOSSIER_VERSION_LOCALISED.*, @FREETEXT)
Since we are OR'ing the results, I would expect the running time of this query to be less than the sum of the two, but as stated above it takes minutes or times out.
Can anyone tell me what is going on here? If I use a union (which is conceptually the same as the or) the performance problem is gone but I would like to know what issue I am running into here since I want to avoid rewriting queries.
Regards, Tom
See my answers to these very similar questions:
Adding more OR searches with CONTAINS Brings Query to Crawl
SQL Server full text query across multiple tables - why so slow?
The basic idea is that using LEFT JOINs to CONTAINSTABLE (or FREETEXTTABLE) performs significantly better than having multiple CONTAINS (or FREETEXT) ORed together in the WHERE clause.
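Applied to the query in the question, the rewrite would look roughly like this (a sketch only; the key columns used to join CONTAINSTABLE's [KEY] back to each table are assumptions, and the column list is elided as in the question):
DECLARE @FREETEXT varchar(255) = 'TEN-T'

SELECT DISTINCT ...
FROM DOSSIER_VERSION AS dv
INNER JOIN DOSSIER_VERSION_LOCALISED AS dvl
    ON dvl.DOSSIER_VERSION_ID = dv.ID              -- assumed join key
LEFT JOIN CONTAINSTABLE(DOSSIER_VERSION, *, @FREETEXT) AS ft1
    ON ft1.[KEY] = dv.ID                           -- assumed full-text key
LEFT JOIN CONTAINSTABLE(DOSSIER_VERSION_LOCALISED, *, @FREETEXT) AS ft2
    ON ft2.[KEY] = dvl.ID
WHERE ft1.[KEY] IS NOT NULL
   OR ft2.[KEY] IS NOT NULL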

how can I test performance in Sql Server Mgmt Studio without outputting data?

Using SQL Server Management Studio.
How can I test the performance of a large select (say 600k rows) without the results window impacting my test? All things being equal it doesn't really matter, since the two queries will both be outputting to the same place. But I'd like to speed up my testing cycles and I'm thinking that the output settings of SQL Server Management Studio are getting in my way. Output to text is what I'm using currently, but I'm hoping for a better alternative.
I think this is impacting my numbers because the database is on my local box.
Edit: Had a question about doing WHERE 1=0 here (thinking that the join would happen but no output), but I tested it and it didn't work -- not a valid indicator of query performance.
You could do SET ROWCOUNT 1 before your query. I'm not sure it's exactly what you want but it will avoid having to wait for lots of data to be returned and therefore give you accurate calculation costs.
However, if you add Client Statistics to your query, one of the numbers is Wait time on server replies which will give you the server calculation time not including the time it takes to transfer the data over the network.
You can SET STATISTICS TIME ON to get a measurement of the time on server. And you can use the Query/Include Client Statistics (Shift+Alt+S) on SSMS to get detail information about the client time usage. Note that SQL queries don't run and then return the result to the client when finished, but instead they run as they return results and even suspend execution if the communication channel is full.
The only context under which a query completely ignores sending the result packets back to the client is activation. But then the time to return the output to the client should be also considered when you measure your performance. Are you sure your own client will be any faster than SSMS?
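For example:
SET STATISTICS TIME ON;

SELECT COUNT(*) FROM master..spt_values;  -- any query under test

SET STATISTICS TIME OFF;
-- CPU time and elapsed time appear in the Messages tab, reported
-- separately for parse/compile and for execution.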
SET ROWCOUNT 1 will stop processing after the first row is returned, which means that unless the plan happens to have a blocking operator the results will be useless.
Taking a trivial example
SELECT * FROM TableX
The cost of this query in practice will heavily depend on the number of rows in TableX.
Using SET ROWCOUNT 1 won't show any of that. Irrespective of whether TableX has 1 row or 1 billion rows it will stop executing after the first row is returned.
I often assign the SELECT results to variables to be able to look at things like logical reads without being slowed down by SSMS displaying the results.
SET STATISTICS IO ON

DECLARE @name nvarchar(35),
        @type nchar(3)

SELECT @name = name,
       @type = type
FROM master..spt_values
There is a related Connect Item request Provide "Discard results at server" option in SSMS and/or TSQL
The best thing you can do is to check the Query Execution Plan (press Ctrl+L) for the actual query. That will give you the best guesstimate for performance available.
I'd think that the where clause of WHERE 1=0 is definitely happening on the SQL Server side, and not Management Studio. No results would be returned.
Is your DB engine on the same machine that you're running Mgmt Studio on?
You could:
Output to Text, or
Output to File, or
Close the Query Results pane.
That'd just move the cycles spent on drawing the grid in Mgmt Studio. Perhaps Results to Text would be more performant on the whole. Hiding the pane would save Mgmt Studio the cycles spent drawing the data, but the data is still being returned to Mgmt Studio, so it really isn't saving a lot of cycles.
How can you test performance of your query if you don't output the results? Speeding up the testing is pointless if the testing doesn't tell you anything about how the query is going to perform. Do you really want to find out this dog of a query takes ten minutes to return data after you push it to prod?
And of course it's going to take some time to return 600,000 records. It will in your user interface as well; it will probably take longer there than in your query window, because the info has to go across the network.
There are a lot of more correct answers above, but I assume the real question here is the one I asked myself when I stumbled upon this question:
I have a query A and a query B on the same test data. Which is faster? And I want to check quick and dirty. For me the answer is temp tables (the overhead of creating a temp table here is easy to ignore). This is to be done on a perf/testing/dev server only!
Query A:
DBCC FREEPROCCACHE       -- clear cached query plans
DBCC DROPCLEANBUFFERS    -- clear the buffer cache
SELECT * INTO #temp1 FROM ...
Query B:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SELECT * INTO #temp2 FROM ...
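To read a number off each run you can bracket the INTO with your own timer. A dev-server-only sketch (SYSDATETIME and datetime2 need SQL Server 2008+, the DBCC calls need elevated permissions, and master..spt_values stands in for your real query):
DBCC FREEPROCCACHE        -- flush cached plans
DBCC DROPCLEANBUFFERS     -- flush clean pages from the buffer pool

DECLARE @t0 datetime2 = SYSDATETIME();
SELECT * INTO #temp1 FROM master..spt_values;   -- query A goes here
SELECT DATEDIFF(millisecond, @t0, SYSDATETIME()) AS elapsed_ms;

DROP TABLE #temp1;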
