SQL Server - Multiple select queries hit performance

Recently I ran into an issue where multiple concurrent client requests were causing performance problems in the database. I set up a test scenario, and it turned out that when I run the same SELECT query 6 to 7 times concurrently (it gets worse with more), performance degrades and execution takes a long time. This is the query I tried:
SELECT TOP (100) COUNT(DISTINCT([Doc_Number])) AS "Expression"
FROM (
    SELECT *
    FROM "dbo"."Dummy_Table" "table_alias"
    WHERE ((CAST("table_alias"."ID" AS NVARCHAR)) NOT IN
        (
            SELECT "PrimaryKey" AS ExceptionKey
            FROM dbo.exceptions inner_exceptionStatus
            LEFT JOIN dbo.Workflow inner_workflowStates
                   ON (inner_exceptionStatus."Status" = inner_workflowStates."UUID"
                  AND inner_exceptionStatus."UUID" = 'CA1662D6-73A2-4692-A765-E7E3EDB66062')
            WHERE ("inner_workflowStates"."RemoveFromRecordSet" = 1
                  AND "inner_workflowStates"."IsDeleted" = 0)
              AND ("inner_exceptionStatus"."IsArchived" IS NULL
                   OR "inner_exceptionStatus"."IsArchived" = 0)
        ))
) wrapperQuery
The query takes around 1 second to execute when it runs alone. But if we run it in parallel, each query takes a weirdly long time or leads to a timeout.
The only thing that bothers me here is that SELECT queries should be non-blocking, and even with shared locks they should get along easily.
I am not sure if there is anything in the query itself that adds to the situation.
Any help is deeply appreciated!

Try this way
SELECT Count(DISTINCT([Doc_Number])) AS Expression
FROM dbo.Dummy_Table table_alias
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.exceptions inner_exceptionStatus
                  INNER JOIN dbo.Workflow inner_workflowStates
                          ON (inner_exceptionStatus.Status = inner_workflowStates.UUID
                         AND inner_exceptionStatus.UUID = 'CA1662D6-73A2-4692-A765-E7E3EDB66062')
                  WHERE inner_workflowStates.RemoveFromRecordSet = 1
                    AND inner_workflowStates.IsDeleted = 0
                    AND (inner_exceptionStatus.IsArchived IS NULL
                         OR inner_exceptionStatus.IsArchived = 0)
                    AND table_alias.ID = PrimaryKey)
I made a couple of changes:
Changed NOT IN to NOT EXISTS.
Removed the CAST on "table_alias"."ID", because the conversion prevents any index on that column from being used. If the conversion is really required, add it back.
Removed TOP (100): since there is no GROUP BY, the query returns a single record anyway.
If the query is still running slow, post the execution plan and make sure the statistics are up to date.
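To see why NOT EXISTS is the safer rewrite: if the NOT IN subquery ever returns a NULL PrimaryKey, the whole predicate evaluates to UNKNOWN and the outer query returns no rows at all. A minimal standalone illustration (no tables needed):
-- NOT IN: a single NULL in the list makes the predicate UNKNOWN for every row
SELECT 'kept' WHERE 1 NOT IN (SELECT CAST(NULL AS int));                -- returns nothing
-- NOT EXISTS: a NULL simply fails the correlation test, so the row survives
SELECT 'kept' WHERE NOT EXISTS (SELECT 1 WHERE 1 = CAST(NULL AS int)); -- returns 'kept'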

You can simplify your query like this:
SELECT COUNT(DISTINCT(Doc_Number)) AS Expression
FROM dbo.Dummy_Table dmy
WHERE NOT EXISTS
(
    SELECT *
    FROM dbo.exceptions ies
    INNER JOIN dbo.Workflow iws
            ON ies.Status = iws.UUID
           AND ies.UUID = 'CA1662D6-73A2-4692-A765-E7E3EDB66062'
    WHERE iws.RemoveFromRecordSet = 1
      AND iws.IsDeleted = 0
      AND (ies.IsArchived IS NULL OR ies.IsArchived = 0)
      AND dmy.ID = PrimaryKey
)
As prdp says:
Changed NOT IN to NOT EXISTS.
Removed the CAST on "table_alias"."ID", because the conversion prevents any index on that column from being used. If the conversion is really required, add it back.
Removed TOP (100): since there is no GROUP BY, the query returns a single record anyway.
I would add:
Removed your derived table wrapperQuery.
You can use INNER JOIN instead of LEFT JOIN: since the WHERE clause tests RemoveFromRecordSet = 1, the NULL rows from the outer join are filtered out anyway.
Removed the unnecessary quotes, brackets, and parentheses in the WHERE clause.

Streams + tasks missing inserts?

We've setup a stream on a table that is continuously loaded via snowpipe.
We're consuming this data with a task that runs every minute, where we merge into another table. There is a possibility of duplicate keys, so we use a ROW_NUMBER() window function ordered by the file-created timestamp descending and keep only row_num = 1. This way we always get the latest insert.
Initially we used a standard task with the merge statement, but we noticed that in some instances, since Snowpipe does not guarantee loading files in the order they were staged, we were updating rows with older data. As such, we added a condition to the WHEN MATCHED section so a row is only updated when the incoming file-created timestamp is greater than the existing one.
However, since we did that, reconciliation checks show that some new inserts are missing. I don't know for sure why changing the matched clause would interfere with the not matched clause.
My theory was that the extra clause added a bit of time to the task run, such that some runs were skipped or the next run happened almost immediately after the last one completed. The idea being that the missing rows arrived in the middle and the stream offset advanced before they could be consumed.
As such, we changed the task to call a stored procedure which uses an explicit transaction. We did this because the docs seem to suggest that using a transaction will lock the stream. However, even with this we can see that new inserts are still missing. We're talking very small numbers, e.g. 8 out of hundreds of thousands.
Any ideas what might be happening?
Example task code below (not the stored procedure version):
WAREHOUSE = TASK_WH
SCHEDULE = '1 minute'
WHEN SYSTEM$stream_has_data('my_stream')
AS
MERGE INTO processed_data pd USING (
select
ms.*,
CASE WHEN ms.status IS NULL THEN 1/mv.count ELSE NULL END as pending_count,
CASE WHEN ms.status='COMPLETE' THEN 1/mv.count ELSE NULL END as completed_count
from my_stream ms
JOIN my_view mv ON mv.id = ms.id
qualify
row_number() over (
partition by
id
order by
file_created DESC
) = 1
) ms ON ms.id = pd.id
WHEN NOT MATCHED THEN INSERT (col1, col2, col3,... )
VALUES (ms.col1, ms.col2, ms.col3,...)
WHEN MATCHED AND ms.file_created >= pd.file_created THEN UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2, pd.col3 = ms.col3, ....
;
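For anyone wanting to picture the stored-procedure variant mentioned above: a minimal sketch of what it might look like in Snowflake SQL scripting (the procedure name and the abbreviated column list are assumptions, not the OP's exact code):
CREATE OR REPLACE PROCEDURE merge_my_stream()  -- hypothetical name
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
    -- The explicit transaction pins the stream offset: it only advances
    -- if the DML inside the transaction commits.
    BEGIN TRANSACTION;
    MERGE INTO processed_data pd USING (
        SELECT ms.*
        FROM my_stream ms
        QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY file_created DESC) = 1
    ) ms ON ms.id = pd.id
    WHEN NOT MATCHED THEN INSERT (col1, col2) VALUES (ms.col1, ms.col2)
    WHEN MATCHED AND ms.file_created >= pd.file_created THEN
        UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2;
    COMMIT;
    RETURN 'done';
END;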
I am not fully sure what is going wrong here, but the recommendation about file-created time is given by Snowflake somewhere. It suggests that the file-created timestamp is calculated in the cloud services layer and may be a bit different from what you expect. There is another recommendation related to Snowpipe and data ingestion: the queue service takes about a minute to consume the data from the pipe, and if a lot of data flows in within that minute, you may end up with this issue. Look at your implementation and test whether pushing data at 1-minute intervals solves the issue, and don't rely on the file-created time.
The condition "AND ms.file_created >= pd.file_created" seems to be added as a mechanism to avoid updating the same row multiple times.
An alternative approach could be to use IS DISTINCT FROM to compare the source columns against the target columns (except id):
MERGE INTO processed_data pd USING (
select
ms.*,
CASE WHEN ms.status IS NULL THEN 1/mv.count ELSE NULL END as pending_count,
CASE WHEN ms.status='COMPLETE' THEN 1/mv.count ELSE NULL END as completed_count
from my_stream ms
JOIN my_view mv ON mv.id = ms.id
qualify
row_number() over (
partition by
id
order by
file_created DESC
) = 1
) ms ON ms.id = pd.id
WHEN NOT MATCHED THEN INSERT (col1, col2, col3,... )
VALUES (ms.col1, ms.col2, ms.col3,...)
WHEN MATCHED
AND (pd.col1, pd.col2,..., pd.coln) IS DISTINCT FROM (ms.col1, ms.col2,..., ms.coln)
THEN UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2, pd.col3 = ms.col3, ....;
This approach will also prevent updating a row when nothing has changed.
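The advantage of IS DISTINCT FROM over a plain <> comparison here is its NULL handling: it treats NULL as an ordinary comparable value, so rows with NULL columns are compared correctly. A quick illustration:
SELECT NULL IS DISTINCT FROM NULL;  -- FALSE: not distinct, so no update
SELECT 1    IS DISTINCT FROM NULL;  -- TRUE:  a value differs from NULL
SELECT 1    IS DISTINCT FROM 1;     -- FALSE: equal values are not distinct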

Stored procedure was fine but then became extremely slow [duplicate]

This question already has answers here:
SQL Server: Query fast, but slow from procedure
(12 answers)
Closed 2 years ago.
My stored procedure was taking around 10 seconds, but suddenly (for unknown reasons) it became very slow, taking 9 minutes.
I did not make any changes that might cause the delay.
I wonder if someone can tell me why it is so slow.
Here is my query
SELECT
    P.PtsID, P.PtsCode, P.PtsName,
    FORMAT(P.DOB, 'dd/MM/yyyy') AS DOB, P.Gender, V.VisitID, V.VisitType,
    FORMAT(V.VisitDate, 'dd/MM/yyyy') AS VisitDate,
    FORMAT(V.DischargeDate, 'dd/MM/yyyy') AS DischargeDate,
    R.RepID, R.RepDate, R.RepType, R.RepDesc
FROM
    Patients P
INNER JOIN
    Visits V ON P.PtsID = V.PtsID
INNER JOIN
    Reps R ON R.PtsID = P.PtsID AND R.VisitID = V.VisitID
WHERE
    (P.Deleted = 0 AND V.Deleted = 0 AND R.Deleted = 0)
    AND (P.PtsName LIKE '%' + TRIM(@PtsName) + '%' OR TRIM(@PtsName) = '')
    AND (P.PtsCode LIKE '%' + TRIM(@PtsNo) + '%' OR TRIM(@PtsNo) = '')
    AND (R.RepText LIKE '%' + TRIM(@RepText) + '%' OR TRIM(@RepText) = '')
    AND (TRIM(@RepCode) = '' OR R.RepID IN (SELECT RepID
                                            FROM tags
                                            WHERE tag = 'XXX'
                                              AND Deleted = 0
                                              AND code IN (SELECT value
                                                           FROM string_split(@RepCode, ','))))
and this is the execution plan (screenshot omitted).
When I execute the script as an ad-hoc query, not as a stored procedure, it is very fast.
Edit
Here is my actual execution plan:
https://www.brentozar.com/pastetheplan/?id=ByPLltcZD
Thanks
So, purely based on the execution plan:
Statistics
In your execution plan you can see that the actual row counts and the estimates are far apart.
This can have multiple causes:
Out-of-date statistics and indexes. Try rebuilding indexes and updating statistics.
Updating stats: EXEC sp_updatestats (this might take some time and resources, so if it's a production server, do it outside office hours / in a batch-job window).
Parameter sniffing can also cause wrong estimates. You can experiment with OPTION (RECOMPILE) or OPTION (OPTIMIZE FOR UNKNOWN) to test whether this is the case; see the sketch after this list.
201 Bucket Problem
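As a sketch of the parameter-sniffing test (using one of the OP's predicates; the test value is made up), append the hint to the slow statement inside the procedure:
DECLARE @PtsName nvarchar(50) = N'Smith';  -- hypothetical test value
SELECT P.PtsID, P.PtsCode, P.PtsName
FROM Patients P
WHERE P.PtsName LIKE '%' + TRIM(@PtsName) + '%' OR TRIM(@PtsName) = ''
OPTION (RECOMPILE);  -- compiles a fresh plan for the actual parameter values
-- OPTION (OPTIMIZE FOR UNKNOWN) instead builds one plan from average density
-- statistics rather than from the sniffed values.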
Query optimization
When you write IN (subquery) and you have a key lookup, like you do here, you are going to have a bad time. For every row returned from the index (NonClusteredIndex-code), the engine needs to access the clustered index to retrieve the RepID one by one, which is painfully slow across many rows (in your case: 430,474,500 rows).
You can change this by using EXISTS; in your example:
SELECT
    P.PtsID, P.PtsCode, P.PtsName,
    FORMAT(P.DOB, 'dd/MM/yyyy') AS DOB, P.Gender, V.VisitID, V.VisitType,
    FORMAT(V.VisitDate, 'dd/MM/yyyy') AS VisitDate,
    FORMAT(V.DischargeDate, 'dd/MM/yyyy') AS DischargeDate,
    R.RepID, R.RepDate, R.RepType, R.RepDesc
FROM
    Patients P
INNER JOIN
    Visits V ON P.PtsID = V.PtsID
INNER JOIN
    Reps R ON R.PtsID = P.PtsID AND R.VisitID = V.VisitID
WHERE
    (P.Deleted = 0 AND V.Deleted = 0 AND R.Deleted = 0)
    AND (P.PtsName LIKE '%' + TRIM(@PtsName) + '%' OR TRIM(@PtsName) = '')
    AND (P.PtsCode LIKE '%' + TRIM(@PtsNo) + '%' OR TRIM(@PtsNo) = '')
    AND (R.RepText LIKE '%' + TRIM(@RepText) + '%' OR TRIM(@RepText) = '')
    AND (TRIM(@RepCode) = '' OR EXISTS (SELECT 1
                                        FROM tags
                                        WHERE tag = 'XXX'
                                          AND Deleted = 0
                                          AND tags.RepId = R.RepId
                                          AND code IN (SELECT value
                                                       FROM string_split(@RepCode, ','))))
Index optimizations
If you are still suffering from the key lookup, you may want to change the index so it also has RepId, or INCLUDE it.
You still have 2 other key lookups. You could also solve these with an INCLUDE on the indexes, but only if it makes sense (can they be used for other queries, is the current query executed frequently, ...).
Doing the TRIMs and concatenations once and storing the results in separate variables may also give you some minor improvements, as sketched below.
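For instance, something like this (a sketch; the variable names are assumed, and @PtsName is the procedure's existing parameter) evaluates each TRIM and concatenation once instead of once per row:
-- inside the procedure, where @PtsName is a parameter:
DECLARE @PtsNameTrim    nvarchar(50) = TRIM(@PtsName);
DECLARE @PtsNamePattern nvarchar(52) = '%' + @PtsNameTrim + '%';
-- ...then in the WHERE clause:
-- AND (P.PtsName LIKE @PtsNamePattern OR @PtsNameTrim = '')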
Wait stats
The following wait stats were also included in the execution plan:
<WaitStats>
<Wait WaitType="RESERVED_MEMORY_ALLOCATION_EXT" WaitTimeMs="1018" WaitCount="2694600"/>
<Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="514" WaitCount="159327"/>
<Wait WaitType="ASYNC_NETWORK_IO" WaitTimeMs="63" WaitCount="5"/>
<Wait WaitType="MEMORY_ALLOCATION_EXT" WaitTimeMs="25" WaitCount="14639"/>
</WaitStats>
RESERVED_MEMORY_ALLOCATION_EXT and MEMORY_ALLOCATION_EXT stand out by count, but there is no real issue in the wait stats.
As mentioned in another post: SQL Server: Query fast, but slow from procedure
You can use the following workaround:
Slow: SET ANSI_NULLS OFF
Fast: SET ANSI_NULLS ON
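Note that the ANSI_NULLS setting in effect when a procedure is created is stored with the procedure, so applying the workaround means recreating the procedure with the setting ON. A minimal sketch (procedure name and body are hypothetical; CREATE OR ALTER needs SQL Server 2016 SP1 or later):
SET ANSI_NULLS ON;
GO
-- The session setting in effect here is captured and stored with the procedure
CREATE OR ALTER PROCEDURE dbo.usp_Demo
AS
    SELECT CASE WHEN NULL = NULL THEN 'ANSI_NULLS OFF behaviour'
                ELSE 'ANSI_NULLS ON behaviour' END AS Result;
GO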

SQL Server execution speed varies wildly depending on how parameters are provided to an inline table function

I am investigating a problem with the execution speed of an inline table function in SQL Server. Or that's where I thought the problem lay. I came across
T-SQL code is extremely slow when saved as an Inline Table-valued Function
which looked promising, since it described what I was seeing, but I seemed to have the opposite problem - when I passed variables to my function, it took 17 seconds, but when I ran the code of my function in a query window, using DECLARE statements for the variables (which I thought effectively made them literals), it ran in milliseconds. Same code, same parameters - just wrapping them up in an inline table function seemed to drag it way down.
I tried to reduce my query to the minimum possible code that still exhibited the behaviour. I am using numerous existing inline table functions (all of which have worked fine for years), and managed to strip my code down to a single call of one existing inline table function that still highlights the speed difference. But in doing so I noticed something very odd:
SELECT strStudentNumber
FROM dbo.udfNominalSnapshot('2019', 'REG')
takes 17 seconds whereas
DECLARE @strAcademicSessionStart varchar(4) = '2019'
DECLARE @strProgressCode varchar(12) = 'REG'
SELECT strStudentNumber
FROM dbo.udfNominalSnapshot(@strAcademicSessionStart, @strProgressCode)
takes milliseconds! So nothing to do with wrapping the code in an inline table function, but everything to do with how the parameters are passed to a nested function within it. Based on the cited article I'm guessing there are two different execution plans in play, but I have no idea why/how, and more importantly, what I can do to persuade SQL Server to use the efficient one?
P.S. Here is the code of the inner UDF, in response to a comment request:
ALTER FUNCTION [dbo].[udfNominalSnapshot]
(
    @strAcademicSessionStart varchar(4) = '%',
    @strProgressCode varchar(10) = '%'
)
RETURNS TABLE
AS
RETURN
(
    SELECT TOP 100 PERCENT S.strStudentNumber, S.strSurname, S.strForenames, S.strTitle, S.strPreviousSurname, S.dtmDoB, S.strGender, S.strMaritalStatus,
           S.strResidencyCode, S.strNationalityCode, S.strHESAnumber, S.strSLCnumber, S.strPreviousSchoolName, S.strPreviousSchoolCode,
           S.strPreviousSchoolType,
           COLLEGE_EMAIL.strEmailAddress AS strEmailAlias,
           PERSONAL_EMAIL.strEmailAddress AS strPersonalEmail,
           P.[str(Sub)Plan], P.intYearOfCourse, P.strProgressCode,
           P.strAcademicSessionStart, strC2Knumber AS C2K_ID, AcadPlan, strC2KmailAlias,
           ISNULL([strC2KmailAlias], [strC2Knumber]) + '@c2kni.net' AS strC2KmailAddress
    FROM dbo.tblStudents AS S
    LEFT JOIN dbo.udfMostRecentEmail('COLLEGE') AS COLLEGE_EMAIL
           ON S.strStudentNumber = COLLEGE_EMAIL.strStudentNumber
    LEFT JOIN dbo.udfMostRecentEmail('PERSONAL') AS PERSONAL_EMAIL
           ON S.strStudentNumber = PERSONAL_EMAIL.strStudentNumber
    INNER JOIN dbo.udfProgressHistory(@strAcademicSessionStart) AS P
           ON S.strStudentNumber = P.strStudentNumber
    WHERE (P.strProgressCode LIKE @strProgressCode
           OR (SUBSTRING(@strProgressCode, 1, 1) = '^'
               AND P.strProgressCode NOT LIKE SUBSTRING(@strProgressCode, 2, LEN(@strProgressCode))))
      AND (P.strStudentNumber NOT IN
           (SELECT strStudentNumber
            FROM dbo.tblPilgrims
            WHERE (strAcademicSessionStart = @strAcademicSessionStart) AND (strScheme = 'BEI')))
    ORDER BY P.[str(Sub)Plan], P.intYearOfCourse, S.strSurname
)
Expanding on @Ross Presser's comment: this might not really be an answer, but it demonstrates what is happening (a bit), with my understanding (which could be wrong!) of what is going on...
Run the setup code at the end and then....
Execute the following with the query plan turned on (Ctrl-M)... (note: depending on the random number generator you may or may not get any results; that does not affect the plan)
declare @one varchar(100) = '379', @two varchar(200) = '726'
select * from wibble(@one, @two) -- 1
select * from wibble('379', '726') -- 2
select * from wibble(@one, @two) OPTION (RECOMPILE) -- 3
select * from wibble(@one, @two) -- 4
Caveat: the following is what happens on MY system; your mileage may vary...
-- 1 (and -- 4) are the most expensive.
SQL Server creates a generic plan, as it does not know what the parameter values are (yes, they are defined, but the plan is for wibble(@one, @two) where, at that point, the parameter values are "unknown").
https://www.brentozar.com/pastetheplan/?id=rJtIRwx_r
-- 2 has a different plan.
Here, SQL Server knows what the parameter values are, so it can create a specific plan, which is quite different from --1.
https://www.brentozar.com/pastetheplan/?id=rJa9APldS
-- 3 has the same plan as --2.
Testing this further, adding OPTION (RECOMPILE) gets SQL Server to create a specific plan for that specific execution of wibble(@one, @two), so we get the same plan as --2.
--4 is there for completeness, to show that after all that mucking about the generic plan is still in place.
So, in this simple example we have a parameterised TVF being called with identical values, passed either as parameters or inline, producing different execution plans and different execution times, as per the OP.
Set up
use tempdb
GO
drop table if EXISTS Orders
GO
create table Orders (
    OrderID int primary key,
    UserName varchar(50),
    PhoneNumber1 varchar(50)
)
-- generate 300,000 rows with random "phone" numbers
;WITH TallyTable AS (
SELECT TOP 300000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS [N]
FROM dbo.syscolumns tb1,dbo.syscolumns tb2
)
insert into Orders
select n, 'user' + cast(n as varchar(10)), cast(CRYPT_GEN_RANDOM(3) as int)
FROM TallyTable;
GO
drop function if exists wibble
GO
create or alter function wibble (
    @one varchar(4) = '%'
    , @two varchar(4) = '%'
)
returns table
as
return select * from Orders
       where PhoneNumber1 like '%' + @one + '%'
         and PhoneNumber1 like '%' + @two + '%'
          or (SUBSTRING(@one, 1, 1) = '^' AND PhoneNumber1 NOT LIKE SUBSTRING(@two, 2, LEN(@two)))
         and (select 1) = 1
GO
The problem was overcome (I wouldn't say "fixed") by following up on Ross Presser's observation about the complexity of udfProgressHistory. This function sucked data from a table, tblProgressHistory, which was joined to itself. The table is added to annually. I think this year's additional 2K records must have caused the sudden cost hike when using a particular execution plan. I deleted >2K redundant records and we're back to sub-second execution.

Decrease execution time of SQL query

I've got a question about making a query more efficient whilst maintaining its accuracy. Before I display the query, I'd like to point out some basics of it.
I've got a CASE that manipulates the WHERE clause to get all children of the parent. Basically I've got two types of data that I need to display: a red and a green type. The red type has a column (TRK_TrackerGroup_LKID2) set to NULL by default, whereas the green data has a value in that column (ranging from 5 to 7).
My problem is that I need to extract both types of data to accurately get a count of outstanding issues in a view, but doing so (by adding the CASE) pushes the execution time from under 1 second to well over 15 seconds.
This is the query (with the mentioned CASE):
SELECT TS.id AS TrackerStartDateID,
TSM.mappingtypeid,
TSM.maptoid,
TFLK.trk_trackergroup_lkid,
Count(TF.id) AS Cnt
FROM [dbo].[trk_startdate] TS
INNER JOIN [dbo].[trk_startdatemap] TSM
ON TS.id = TSM.trk_startdateid
AND TSM.deletedflag = 0
INNER JOIN [dbo].[trk_trackerfeatures] TF
ON TF.trk_startdateid = TS.id
AND TF.deletedflag = 0
INNER JOIN [dbo].[trk_trackerfeatures_lk] TFLK
ON TFLK.id = TF.trk_feature_lkid
WHERE TS.deletedflag = 0
AND TF.applicabletoproject = 1
AND TF.readyforwork = CASE -- HERE IS THE PROBLEM
WHEN TF.trk_trackerstatus_lkid2 IS NULL THEN 0
ELSE 1
END
AND TF.datestamp = (SELECT Max(TF2.datestamp)
FROM [dbo].[trk_trackerfeatures] TF2
INNER JOIN [dbo].[trk_trackerfeatures_lk] TFLK2
ON TFLK2.id = TF2.trk_feature_lkid
WHERE TF.trk_startdateid = TF2.trk_startdateid
AND TFLK2.trk_trackergroup_lkid = TFLK.trk_trackergroup_lkid)
GROUP BY TS.id,
TSM.mappingtypeid,
TSM.maptoid,
TFLK.trk_trackergroup_lkid,
TF.datestamp
It functions as a 'parent' in the sense that it grabs the latest inserted data set (using DateStamp) from every single child group. This is necessary to produce a parent report in SSRS at a later time, but at the moment my problem (as mentioned above) is the execution time.
I'd like to hear if there are any suggestions on how to decrease the execution time whilst maintaining the accuracy of the query.
Expected output (screenshot omitted):
Without the case I get this (screenshot omitted):
Your problem is that this condition can't use an index:
AND TF.readyforwork = CASE -- HERE IS THE PROBLEM
WHEN TF.trk_trackerstatus_lkid2 IS NULL THEN 0
ELSE 1
END
Try to change it to
AND ( TF.readyforwork = 0 and TF.trk_trackerstatus_lkid2 IS NULL
OR TF.readyforwork = 1 and TF.trk_trackerstatus_lkid2 IS NOT NULL
)
But again, you should check the actual execution plan (SQL Server's equivalent of EXPLAIN ANALYZE) to verify whether your query is using an index or not.
The most problematic bit of your query seems to be the correlated subquery, because it must be evaluated for every candidate row.
You should optimize this first. To do so, you can add indexes that the engine can use to quickly calculate that value on each row.
Based on your query I would add these two composite indexes, as sketched below:
On table trk_trackerfeatures, index fields: trk_startdateid, datestamp
On table trk_trackerfeatures_lk, index fields: id, trk_trackergroup_lkid
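In T-SQL those could look like this (the index names are made up; table and column names are taken from the query above):
CREATE NONCLUSTERED INDEX IX_trk_trackerfeatures_startdate_datestamp
    ON dbo.trk_trackerfeatures (trk_startdateid, datestamp);
CREATE NONCLUSTERED INDEX IX_trk_trackerfeatures_lk_id_group
    ON dbo.trk_trackerfeatures_lk (id, trk_trackergroup_lkid);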

Space in SQL Server 2008 R2 slows down performance

I have run into a rather weird problem. I created the following query in SQL Server:
SELECT * FROM leads.BatchDetails T1
INNER JOIN leads.BatchHeader h ON T1.LeadBatchHeaderId = h.ID
WHERE
T1.LeadBatchHeaderId = 34
AND (T1.TypeRC = 'R' OR h.DefaultTypeRC = 'R')
AND EXISTS (SELECT ID FROM leads.BatchDetails T2 where
T1.FirstName = T2.FirstName AND
T1.LastName = T2.LastName AND
T1.Address1 = T2.Address1 AND
T1.City = T2.City AND
T1.[State] = T2.[State] AND
T1.Zip5 = T2.Zip5 AND
T1.LeadBatchHeaderId = T2.LeadBatchHeaderId
and t2.ID < t1.ID
AND (T2.TypeRC = 'R' OR h.DefaultTypeRC = 'R' )
)
It runs decently fast, in 2 seconds. When formatting the code I accidentally added an extra space between AND and EXISTS, so the query looks like this:
SELECT * FROM leads.BatchDetails T1
INNER JOIN leads.BatchHeader h ON T1.LeadBatchHeaderId = h.ID
WHERE
T1.LeadBatchHeaderId = 34
AND (T1.TypeRC = 'R' OR h.DefaultTypeRC = 'R')
AND EXISTS (SELECT ID FROM leads.BatchDetails T2 where
T1.FirstName = T2.FirstName AND
T1.LastName = T2.LastName AND
T1.Address1 = T2.Address1 AND
T1.City = T2.City AND
T1.[State] = T2.[State] AND
T1.Zip5 = T2.Zip5 AND
T1.LeadBatchHeaderId = T2.LeadBatchHeaderId
and t2.ID < t1.ID
AND (T2.TypeRC = 'R' OR h.DefaultTypeRC = 'R' )
)
All of a sudden the query takes 13 seconds to execute.
I am running SQL Server in an isolated sandbox environment, and I have even tested it on a different sandbox. I also checked the executed query in Profiler; the reads are virtually the same, but CPU time is way up.
If this is not weird enough, it gets weirder. When I change SELECT * FROM to SELECT Field1, ... FROM at the top of the query, execution takes over 3 minutes.
I have been working with SQL Server for 10 years and never seen anything like this.
Edit:
After following the suggestions below, it appears that queries are "whitespace-sensitive". However, I still have no idea why SELECT * FROM is so much faster than SELECT Field1, ... FROM.
I would guess that you're dealing with two different cached query plans:
You ran the query once, with a certain set of parameters. SQL Server determined an appropriate query plan and stored it "auto-parameterized", in other words replacing the values you provided with variables for the purposes of the plan.
You then ran the same query again, with different parameters. The query gets auto-parameterized and matches the existing cached query plan (even though that plan may not be optimal for the new parameters provided!).
You then run this second query again, with your extra space. This time, the auto-parameterized query does NOT match anything in the cache, and therefore gets its own plan based on THIS set of parameters (remember, the first plan was for a different set of parameters). That plan happens to end up faster (or slower).
If this is truly the explanation, you should be able to make the effect go away, by running DBCC FREEPROCCACHE: http://msdn.microsoft.com/en-us/library/ms174283.aspx
There's lots of stuff on auto-parameterization out there, I personally liked Gail Shaw's series:
http://sqlinthewild.co.za/index.php/2007/11/27/parameter-sniffing/
http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/
http://sqlinthewild.co.za/index.php/2008/05/22/parameter-sniffing-pt-3/
(for the record, I have no idea whether SQL Server eliminates/normalizes whitespace before storing an auto-parameterized query plan; I would have assumed so, but this entire answer assumes that it doesn't!)
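If you want to see the whitespace sensitivity for yourself, you can look for both variants of the statement in the plan cache; a sketch using the standard DMVs:
-- Each distinct query text (even one differing only in whitespace) gets its own entry
SELECT st.text, qs.creation_time, qs.execution_count, qs.total_worker_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%BatchDetails%'
  AND st.text NOT LIKE '%dm_exec_query_stats%';  -- exclude this probe itself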
This might very well be related to caching. When you change your query, even by as little as a space, the cached execution plan of your previous query will no longer be used. If my answer is correct, you should see the same (2-second) performance when you run the bottom query a second time...
Just my 2 cents
You could flush the cache with the following two statements:
DBCC FreeProcCache
DBCC DROPCLEANBUFFERS