I have a SQL query like this:
DECLARE @cdate1 date = '20200401 00:00:00'
DECLARE @cdate2 date = '20200630 23:59:59'
SELECT DISTINCT ([hf].[id])
FROM ((([hf]
JOIN [pw] AS [PP] ON [PP].[identity] = [hf].[id]
AND [PP].[type] = 67
AND [PP].[ideletestate] = 0
AND [PP].[datein] = (SELECT MAX([datein])
FROM [pw]
WHERE [pw].[identity] = [hf].[id]
AND [pw].[ideletestate] = 0
AND [pw].[type] = 67))
JOIN [px] ON [px].[idpaper] = [PP].[id]
AND [px].[ideletestate] = 0
AND [px].[type] = 30036
AND [px].[nazvanie] NOT LIKE '')
JOIN [pw] ON ([pw].[identity] = [hf].[id]
AND ([pw].[id] > 0)
AND ([pw].[ideletestate] = 0)
AND ([pw].[type] IN (16, 2, 3012, 19, 3013)))
LEFT JOIN [px] AS [px102] ON [px102].[idpaper] = [pw].[id]
AND [px102].[type] = 102
AND [px102].[ideletestate] = 0)
WHERE
(([pw].[idcompany] in (12461, 12466, 12467, 12462, 12463, 13258)) OR
([pw].[idcompany2] in (12461, 12466, 12467, 12462, 12463, 13258)) OR
([px102].[idcompany] in (12461, 12466, 12467, 12462, 12463, 13258)) ) AND
[pw].[datein] >= @cdate1 AND [pw].[datein] <= @cdate2
It works fine, but if I inline the dates as literals, like ...AND [pw].[datein] >= '20200401 00:00:00' AND [pw].[datein] <= '20200630 23:59:59', it runs very slowly: 10 minutes vs. 1 second.
One more strange thing: if I use '20200101 00:01:00' as the first date, it is fast too. If the date is later than 10 March 2020 it is very slow (when the date is a string literal in the query; with a variable it works fine).
Is my query bad? But then why does it work with variables? Or is this some issue with SQL Server?
This looks like a statistics problem.
SQL Server builds a histogram of the values in a table to give it some idea of what kind of query plan to create.
For example, if you have a table T with a million rows in it, but the value of column C is always 1, and then you run select * from T where C = 1, the engine will choose a plan that expects to get a lot of rows back, because the histogram says "it is statistically likely that this table contains a hell of a lot of rows where C = 1".
Alternatively, if you have a table T with a million rows in it, but the value of column C is never 1, then the histogram tells the engine "very few rows are likely to be returned for the query select * from T where C = 1 so pick a plan optimized for a small number of rows".
A problem can arise when the values in a column have changed significantly, but the histogram (statistics) has yet to be updated. Then SQL might pick a plan based on the histogram where a different plan would have been much better. In your case, the histogram may not indicate to the engine that there are any values greater than about the 10th of March 2020. Statistics issues are fairly common with dates, because you are often inserting getdate(), which means newer rows in the table will contain values that have never been seen before, and thus won't be covered by the histogram until it gets updated.
SQL Server will automatically update statistics based on a number of different triggers (the exact thresholds have changed slightly across engine versions), as long as you have auto update statistics enabled in the database settings.
You can find out whether this is the issue by forcing SQL to update statistics on your table. Statistics can be refreshed by either fully scanning the table, or sampling it. Sampling is much faster for large tables, but the result won't be as accurate.
To find out whether statistics is the problem in your case, do:
update statistics PW with fullscan
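If you want to see what the optimizer is working from, you can also inspect the histogram directly. A minimal sketch, assuming the relevant statistics live on an index over [datein] (the name IX_pw_datein is hypothetical):
-- Show the histogram for the (hypothetical) statistics object IX_pw_datein on pw.
-- If the highest RANGE_HI_KEY is around 2020-03-10, the optimizer doesn't know
-- that later dates exist, which matches the behaviour described above.
DBCC SHOW_STATISTICS ('pw', IX_pw_datein) WITH HISTOGRAM;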
I'm doing a select on a table with about 6 million records, selecting GETDATE():
select getdate() as date, [...] from MyTable
I verified that the performance issue is with GETDATE(): removing all the other fields, the query is still slow.
I thought that putting the value of GETDATE() in a separate variable would speed the query up:
declare @now datetime
set @now = GETDATE()
select @now as date, [...] from MyTable
It is slow as well. Why?
I'd never really noticed this before. But I am seeing the same thing.
Ran the following on a 10 million row table...
-- query #1
DECLARE @now AS DATETIME ;
SET @now = GETDATE() ;
SELECT @now AS [date], * FROM [MyTable] ;
-- cpu time = 2,563 ms
-- duration = 27,511 ms
-- query #2
SELECT GETDATE() AS [date], * FROM [MyTable] ;
-- cpu time = 2,421 ms
-- duration = 26,862 ms
-- query #3
SELECT * FROM [MyTable] ;
-- cpu time = 1,969 ms
-- duration = 23,149 ms
And the cpu times and durations are showing a difference.
All three query plans are more or less the same, with negligible difference between estimated costs for the queries.
The only differences I could see between the plans were the wait stats...
Query #1
WaitType = ASYNC_NETWORK_IO
WaitCount = 77,716
WaitTimeMs = 24,234
Query #2
WaitType = ASYNC_NETWORK_IO
WaitCount = 75,261
WaitTimeMs = 23,662
Query #3
WaitType = ASYNC_NETWORK_IO
WaitCount = 55,434
WaitTimeMs = 20,280
That's an extra 3-4 seconds, between including and not including the GETDATE() column in the result set, just waiting for whatever's running the query to acknowledge it has consumed the data and is ready for more.
In my case, I was using SSMS to execute the queries. So, I can only put it down to SSMS dragging its heels to render that extra column, which amounted to about 75 MB (10M x 8 bytes).
Having said that, the bulk of the time is obviously taken up with scanning all 10 million rows.
Unfortunately, I think the extra execution time to include your GETDATE() column is unavoidable.
Two points.
ASYNC_NETWORK_IO is SQL Server saying that it is waiting for network bandwidth to be available in order to send more data down the pipe.
SSMS stores the output of the Results window in a temp file on your C:\ drive, so it will be affected by disk I/O, AV scanning, other processes, etc. running on your machine. Same concept if you use a Linux OS.
I'd experiment with limiting the size of the data being returned (10M records can hardly be analysed by a human), and using a different tool to pull the records (if you really need 10M records) for starters.
Also, review the Execution Plan to find out where exactly the delay is. If it still points to the ASYNC_NETWORK_IO wait, then your problem could be one or more of the network components between yourself and the server. Try using a wired connection instead of WiFi. Do you have a VPN? Is there anything limiting data transfer rates? Or the reason might simply be that too much data is being pulled.
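For example, a quick way to check whether sheer data volume is the cause is to cap the result set; if the capped run is fast while the full query is not, the time is going into shipping and rendering rows rather than into GETDATE(). A minimal sketch using the table name from the question:
-- If this returns quickly, the wait is on transferring/consuming rows, not on GETDATE()
SELECT TOP (10000) GETDATE() AS [date], * FROM [MyTable];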
Consider table testTable, a table with six fields: one of them a UNIQUEIDENTIFIER, one a TIMESTAMP and four of them VARCHARs. Field Filename is VARCHAR.
This first query takes 1 minute 38 seconds:
Select top 1 * from testTable WHERE Filename = 'any.string.1512.b'
Either of these queries takes 1-3 seconds
Select top 1 * from testTable WHERE Filename = 'any.string.1512'
Select top 1 * from testTable WHERE Filename like 'cusip.realloc.1412.b%'
I have looked at the execution plan for all three, and the only difference is that the last query (the LIKE statement) used a 46% index seek / 54% key lookup vs a 50/50 index seek / key lookup for the first two. As far as I can tell, as soon as I no longer use the .b part of this search criterion, the queries go back to normal speed.
FileName has been indexed; the table has been removed and recreated just in case. We have added indexes, removed indexes, checked the table, checked the database, restarted services, restarted the server, and recreated the table. This field used to be VARCHAR(MAX) and I changed it to VARCHAR(100) to index it, but the problem was occurring before making this change.
Something else that I believe may be happening is that there might be something wrong with the end of the table. It will never complete a full:
Select * from testTable
I hoped it was a corrupted table, but that wasn't the case. However, when we attempt to generate a script in SSMS it fails to generate (no error given). I was able to recreate the table by generating the structure from SSMS and copying the data with another SQL client.
We are pretty stumped.
I have a simple SQL query to count the number of telemetry records by clients within the last 24 hours.
With an index on TimeStamp, the following query runs in less than 1 second for about 10k rows:
select MachineName,count(Message) from Telemetry where TimeStamp between DATEADD(HOUR,-24, getutcdate()) and getutcdate() group by MachineName
However, when I tried to make the hard-coded -24 configurable and added a variable, the query took more than 5 minutes to execute:
DECLARE @cutoff int; SET @cutoff = 24
select MachineName,count(Message) from Telemetry where TimeStamp between DATEADD(HOUR, -1*@cutoff, getutcdate()) and getutcdate() group by MachineName
Is there any specific reason for the significant decrease of performance? What's the best way of adding a variable without impacting performance?
My guess is that you also have an index on MachineName - or that SQL is deciding that since it needs to group by MachineName, that would be a better way to access the records.
Updating statistics as suggested by AngularRat is a good start - but SQL often maintains those automatically. (In fact, the good performance when SQL knows the 24 hour interval in advance is evidence that the statistics are good...but when SQL doesn't know the size of the BETWEEN in advance, then it thinks other approaches might be a better idea).
Given:
CREATE TABLE Telemetry ( machineName sysname, message varchar(88), [timestamp] timestamp)
CREATE INDEX Telemetry_TS ON Telemetry([timestamp]);
First, try the OPTION (OPTIMIZE FOR ( @cutoff = 24 )) clause to let SQL know how to approach the query, and if that is insufficient then try WITH (Index( Telemetry_TS)). Using the INDEX hint is less desirable.
DECLARE @cutoff int = 24;
select MachineName,count(Message)
from Telemetry -- WITH (Index( Telemetry_TS))
where TimeStamp between DATEADD(HOUR, -1*@cutoff, getutcdate()) and getutcdate()
group by MachineName
OPTION (OPTIMIZE FOR ( @cutoff = 24 ));
Your parameter should actually work, but you MIGHT be seeing an issue where the database is using out-of-date statistics for the query plan. I'd try updating statistics for the table you are querying. Something like:
UPDATE STATISTICS TableName;
Additionally, if your code is running from within a stored procedure, you might want to recompile the procedure. Something like:
EXEC sp_recompile N'ProcedureName';
A lot of times when I have a query that seems like it should run a lot faster but isn't, it's an out-of-date statistics/query plan issue.
References:
https://msdn.microsoft.com/en-us/library/ms187348.aspx
https://msdn.microsoft.com/en-us/library/ms190439.aspx
In my SQL Server query I try to get 2 seconds range of data:
DECLARE @runtime AS datetime
SELECT @runtime = '2014-02-15 03:34:17'
SELECT Application FROM commandcip
WHERE
commandname = 'RunTestCase' AND
(createdate BETWEEN DATEADD(s, -1, @runtime) AND DATEADD(s, 1, @runtime))
This command is extremely slow, it takes minutes and the Estimated Subtree Cost based on Performance analyzer is 2800.
On other hand if I compute the range manually, the query is perfectly fast (Estimated Subtree Cost = 0.5, query time < 1 second):
SELECT Application FROM commandcip
WHERE
commandname = 'RunTestCase' AND
createdate BETWEEN '2014-02-15 03:34:16' AND '2014-02-15 03:34:18'
I verified that both commands return correct data. I verified that my DATEADD calls return correct dates. I also tried computing the DATEADD results one step earlier (into separate variables @mindate, @maxdate), but it didn't help.
How should I speedup first query without manually computing the range?
For createdate BETWEEN '2014-02-15 03:34:16' AND '2014-02-15 03:34:18' the literal values can be looked up in the column statistics to estimate the number of rows that will match.
The values of variables are not sniffed unless you use OPTION (RECOMPILE), so SQL Server will just use heuristics to guess a number.
Presumably the plan that is derived from using the first number is different from that from using the second number.
e.g. One estimates fewer rows and uses a non covering index with lookups and the other a full scan as the estimated number of rows is above the tipping point where this option is considered cheaper.
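A minimal sketch of the OPTION (RECOMPILE) variant, reusing the variable and query from the question, in case you want the actual value of @runtime to be sniffed:
DECLARE @runtime AS datetime = '2014-02-15 03:34:17'
SELECT Application FROM commandcip
WHERE
    commandname = 'RunTestCase' AND
    (createdate BETWEEN DATEADD(s, -1, @runtime) AND DATEADD(s, 1, @runtime))
OPTION (RECOMPILE)  -- the current value of @runtime is used for cardinality estimation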
A function on the left side of the comparison is like a black box to SQL Server. You should always try to move the function to the right side of the comparison.
The BETWEEN keyword is provided for the developer's convenience; the query optimizer always rewrites it to a double comparison. BETWEEN isn't slower than a double comparison.
You can see this in action when you put SET STATISTICS PROFILE ON at the top of your query.
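To illustrate the point about keeping functions off the column side, a small sketch using the table and columns from the question (only the second form lets an index on createdate be seeked):
DECLARE @runtime datetime = '2014-02-15 03:34:17'
-- Non-sargable: the function wraps the column, so an index on createdate cannot be seeked
SELECT Application FROM commandcip
WHERE commandname = 'RunTestCase'
  AND DATEADD(s, 1, createdate) >= @runtime;
-- Sargable: the column stands alone on the left; the functions are applied to the variable
SELECT Application FROM commandcip
WHERE commandname = 'RunTestCase'
  AND createdate >= DATEADD(s, -1, @runtime)
  AND createdate <= DATEADD(s, 1, @runtime);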
A query's execution time depends on many factors.
Moreover, in this case, performing operations in the WHERE clause for each tuple will naturally be a little slow.
My suggestion is to try to improve your SELECT.
For example, add two variables, @start datetime = DATEADD(s, -1, @runtime) and @end datetime = DATEADD(s, 1, @runtime), and use them in place of DATEADD(s, -1, @runtime) and DATEADD(s, 1, @runtime), as in the sketch below.
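A minimal sketch of that suggestion, using the query from the question:
DECLARE @runtime AS datetime = '2014-02-15 03:34:17'
DECLARE @start datetime = DATEADD(s, -1, @runtime)
DECLARE @end datetime = DATEADD(s, 1, @runtime)
SELECT Application FROM commandcip
WHERE commandname = 'RunTestCase'
  AND createdate BETWEEN @start AND @end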
Also, sometimes BETWEEN is slower than a double comparison (>=, <=).
I have a view that returns 2 ints from a table using a CTE. If I query the view like this it runs in less than a second
SELECT * FROM view1 WHERE ID = 1
However if I query the view like this it takes 4 seconds.
DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id
I've checked the two query plans: the first query performs a clustered index seek on the main table, returning 1 record, and then applies the rest of the view query to that result set, whereas the second query performs an index scan that returns about 3000 records rather than just the one I'm interested in, and only filters the result set later.
Is there anything obvious that I'm missing to get the second query to use the index seek rather than an index scan? I'm using SQL 2008, but anything I do needs to also run on SQL 2005. At first I thought it was some sort of parameter sniffing problem, but I get the same results even if I clear the cache.
Probably it is because in the parameter case, the optimizer cannot know that the value is not null, so it needs to create a plan that returns correct results even when it is. If you have SQL Server 2008 SP1 you can try adding OPTION(RECOMPILE) to the query.
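A minimal sketch of that suggestion against the query from the question:
DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id OPTION (RECOMPILE)  -- the current value of @id is sniffed at compile time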
You could add an OPTIMIZE FOR hint to your query, e.g.
DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id OPTION (OPTIMIZE FOR (@id = 1))
In my case, the column type in the DB table was defined as VARCHAR, and the parameter type in the parameterized query was defined as NVARCHAR. This introduced a CONVERT_IMPLICIT in the actual execution plan to match the data types before comparing, and that was the culprit for the slow performance: 2 sec vs 11 sec. Just correcting the parameter type made the parameterized query as fast as the non-parameterized version.
One possible way to do that is to CAST the parameters, as such:
SELECT ...
FROM ...
WHERE name = CAST(:name AS varchar)
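In plain T-SQL the same idea is to declare the parameter with the column's exact type, for example via sp_executesql (the table and column names here are hypothetical):
-- Hypothetical table and column; the point is that @name is declared as varchar(100),
-- matching the column type, so no CONVERT_IMPLICIT is applied to the column side.
EXEC sp_executesql
    N'SELECT * FROM dbo.MyTable WHERE name = @name',
    N'@name varchar(100)',
    @name = 'some value';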
Hope this may help someone with similar issue.
I ran into this problem myself with a view that ran < 10 ms with a direct assignment (WHERE UtilAcctId = 12345), but took over 100 times as long with a variable assignment (WHERE UtilAcctId = @UtilAcctId).
The execution-plan for the latter was no different than if I had run the view on the entire table.
My solution didn't require tons of indexes, optimizer hints, or a long statistics update.
Instead, I converted the view into a user-defined table-valued function where the parameter was the value needed in the WHERE clause. In fact, this WHERE clause was nested 3 queries deep and it still worked, and it was back to the < 10 ms speed.
Eventually I changed the parameter to be a TYPE that is a table of UtilAcctIds (int). Then I can limit the WHERE clause to a list from the table.
WHERE UtilAcctId = [parameter-List].UtilAcctId.
This works even better. I think these table-valued functions are pre-compiled.
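A minimal sketch of the first step (turning the view into an inline table-valued function); apart from UtilAcctId, the object and column names are hypothetical:
-- Hypothetical inline table-valued function: the filter value arrives as a parameter,
-- and the function body is expanded into the calling query when it is used.
CREATE FUNCTION dbo.fn_View1 (@UtilAcctId int)
RETURNS TABLE
AS
RETURN
(
    SELECT T.UtilAcctId, T.SomeValue      -- hypothetical columns
    FROM dbo.SomeTable AS T               -- hypothetical source table
    WHERE T.UtilAcctId = @UtilAcctId
);
GO
-- Usage:
SELECT * FROM dbo.fn_View1(12345);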
When SQL starts to optimize the query plan for the query with the variable it will match the available index against the column. In this case there was an index so SQL figured it would just scan the index looking for the value. When SQL made the plan for the query with the column and a literal value it could look at the statistics and the value to decide if it should scan the index or if a seek would be correct.
Using the optimize hint and a value tells SQL that “this is the value which will be used most of the time so optimize for this value” and a plan is stored as if this literal value was used. Using the optimize hint and the sub-hint of UNKNOWN tells SQL you do not know what the value will be, so SQL looks at the statistics for the column and decides what, seek or scan, will be best and makes the plan accordingly.
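For reference, the UNKNOWN form applied to the query from the question looks like this:
DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id OPTION (OPTIMIZE FOR (@id UNKNOWN))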
I know this is long since answered, but I came across this same issue and have a fairly simple solution that doesn't require hints, statistics-updates, additional indexes, forcing plans etc.
Based on the comment above that "the optimizer cannot know that the value is not null", I decided to move the values from a variable into a table:
Original Code:
declare @StartTime datetime2(0) = '10/23/2020 00:00:00'
declare @EndTime datetime2(0) = '10/23/2020 01:00:00'
SELECT * FROM ...
WHERE
C.CreateDtTm >= @StartTime
AND C.CreateDtTm < @EndTime
New Code:
declare @StartTime datetime2(0) = '10/23/2020 00:00:00'
declare @EndTime datetime2(0) = '10/23/2020 01:00:00'
CREATE TABLE #Times (StartTime datetime2(0) NOT NULL, EndTime datetime2(0) NOT NULL)
INSERT INTO #Times(StartTime, EndTime) VALUES(@StartTime, @EndTime)
SELECT * FROM ...
WHERE
C.CreateDtTm >= (SELECT MAX(StartTime) FROM #Times)
AND C.CreateDtTm < (SELECT MAX(EndTime) FROM #Times)
This performed instantly, as opposed to several minutes for the original code (obviously your results may vary).
I assume if I changed my data type in my main table to be NOT NULL, it would work as well, but I was not able to test this at this time due to system constraints.
Came across this same issue myself and it turned out to be a missing index involving a (left) join on the result of a subquery.
select *
from foo A
left outer join (
select x, count(*) as cnt
from bar
group by x
) B on A.x = B.x
Added an index named bar_x for bar.x
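For reference, a sketch of that index (using the names from the query above):
-- Supports the GROUP BY x in the subquery and the join on B.x
CREATE INDEX bar_x ON bar (x);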
DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id
Do this
DECLARE @sql varchar(max)
SET @sql = 'SELECT * FROM View1 WHERE ID = ' + CAST(@id as varchar)
EXEC (@sql)
Solves your problem