In my SQL Server query I try to get a 2-second range of data:
DECLARE @runtime AS datetime
SELECT @runtime = '2014-02-15 03:34:17'
SELECT Application FROM commandcip
WHERE
commandname = 'RunTestCase' AND
(createdate BETWEEN DATEADD(s, -1, @runtime) AND DATEADD(s, 1, @runtime))
This query is extremely slow: it takes minutes, and the Estimated Subtree Cost reported by the performance analyzer is 2800.
On the other hand, if I compute the range manually, the query is perfectly fast (Estimated Subtree Cost = 0.5, query time < 1 second):
SELECT Application FROM commandcip
WHERE
commandname = 'RunTestCase' AND
createdate BETWEEN '2014-02-15 03:34:16' AND '2014-02-15 03:34:18'
I verified that both commands return correct data. I verified that my DATEADD commands return correct dates. I also tried computing the DATEADD results one step earlier (into separate variables @mindate and @maxdate), but it didn't help.
How can I speed up the first query without manually computing the range?
For createdate BETWEEN '2014-02-15 03:34:16' AND '2014-02-15 03:34:18' the literal values can be looked up in the column statistics to estimate the number of rows that will match.
The values of variables are not sniffed unless you use OPTION (RECOMPILE), so SQL Server just uses heuristics to guess a number.
Presumably the plan derived from the first estimate is different from the plan derived from the second.
e.g. one estimate is low enough to use a non-covering index with lookups, while the other leads to a full scan because the estimated number of rows is above the tipping point where that option is considered cheaper.
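As a minimal sketch (using the table and variable from the question), adding OPTION (RECOMPILE) lets the optimizer sniff the variable's actual value and use the column histogram, at the cost of recompiling the statement on every execution:
DECLARE @runtime AS datetime
SELECT @runtime = '2014-02-15 03:34:17'
SELECT Application FROM commandcip
WHERE
commandname = 'RunTestCase' AND
createdate BETWEEN DATEADD(s, -1, @runtime) AND DATEADD(s, 1, @runtime)
OPTION (RECOMPILE) -- the variable's value is now visible when the plan is built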
A function applied to the column on the left-hand side of the comparison is like a black box to SQL Server. You should always try to move the function to the right-hand side of the comparison.
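To illustrate that advice with a hypothetical predicate (not one from the question): wrapping the indexed column in a function prevents an index seek, while applying the equivalent function to the variable keeps the predicate SARGable.
DECLARE @runtime datetime = '2014-02-15 03:34:17'
-- Non-SARGable: the function wraps the column, so an index on createdate cannot be seeked
SELECT Application FROM commandcip WHERE DATEADD(s, 1, createdate) >= @runtime
-- SARGable: the equivalent function is applied to the variable, leaving the column bare
SELECT Application FROM commandcip WHERE createdate >= DATEADD(s, -1, @runtime)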
The "between" keyword is added for convenience for the developer. The query optimizer always rewrites this to double comparison. Between isn't slower than double comparison.
You can see this in action when you use: SET STATISTICS PROFILE ON at the top of your query
Query execution time depends on many factors.
Moreover, in this case, operations are performed in the WHERE clause for each row, so it's normal for it to be a little slower.
My suggestion is to try to improve your SELECT.
For example, add two variables, @start datetime = DATEADD(s, -1, @runtime) and @end datetime = DATEADD(s, 1, @runtime), and use them in place of DATEADD(s, -1, @runtime) and DATEADD(s, 1, @runtime), as sketched below.
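A minimal sketch of that suggestion, using the table and variable names from the question:
DECLARE @runtime datetime = '2014-02-15 03:34:17'
DECLARE @start datetime = DATEADD(s, -1, @runtime)
DECLARE @end datetime = DATEADD(s, 1, @runtime)
SELECT Application FROM commandcip
WHERE
commandname = 'RunTestCase' AND
createdate BETWEEN @start AND @end
(As the asker notes, moving the DATEADD calls into separate variables did not help on its own, because the variable values are still not sniffed.)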
Another thing: sometimes BETWEEN is slower than a double comparison (>=, <=).
I have SQL query like this:
DECLARE @cdate1 date = '20200401 00:00:00'
DECLARE @cdate2 date = '20200630 23:59:59'
SELECT DISTINCT ([hf].[id])
FROM ((([hf]
JOIN [pw] AS [PP] ON [PP].[identity] = [hf].[id]
AND [PP].[type] = 67
AND [PP].[ideletestate] = 0
AND [PP].[datein] = (SELECT MAX([datein])
FROM [pw]
WHERE [pw].[identity] = [hf].[id]
AND [pw].[ideletestate] = 0
AND [pw].[type] = 67))
JOIN [px] ON [px].[idpaper] = [PP].[id]
AND [px].[ideletestate] = 0
AND [px].[type] = 30036
AND [px].[nazvanie] NOT LIKE '')
JOIN [pw] ON ([pw].[identity] = [hf].[id]
AND ([pw].[id] > 0)
AND ([pw].[ideletestate] = 0)
AND ([pw].[type] IN (16, 2, 3012, 19, 3013)))
LEFT JOIN [px] AS [px102] ON [px102].[idpaper] = [pw].[id]
AND [px102].[type] = 102
AND [px102].[ideletestate] = 0)
WHERE
(([pw].[idcompany] in (12461, 12466, 12467, 12462, 12463, 13258)) OR
([pw].[idcompany2] in (12461, 12466, 12467, 12462, 12463, 13258)) OR
([px102].[idcompany] in (12461, 12466, 12467, 12462, 12463, 13258)) ) AND
[pw].[datein] >= @cdate1 AND [pw].[datein] <= @cdate2
It works fine, but if I inline the dates like this ...AND [pw].[datein] >= '20200401 00:00:00' AND [pw].[datein] <= '20200630 23:59:59', it works very slowly: 10 minutes vs. 1 sec.
One more strange thing: if I use '20200101 00:01:00' as the first date, it works fast too. If the date is later than 10 March 2020, it works very slowly (when the date is a string literal in the query; with a variable it works fine).
Do I have a bad query? But why does it work with a variable? Or is it some issue with SQL Server?
This looks like a statistics problem.
SQL Server builds a histogram of the values in a table to give it some idea of what kind of query plan to create.
For example, if you have a table T with a million rows in it, but the value of column C is always 1, and then you run select * from T where C = 1, the engine will choose a plan that expects a lot of rows to be returned, because the histogram says "it is statistically likely that this table contains a hell of a lot of rows where C = 1"
Alternatively, if you have a table T with a million rows in it, but the value of column C is never 1, then the histogram tells the engine "very few rows are likely to be returned for the query select * from T where C = 1 so pick a plan optimized for a small number of rows".
A problem can arise when the values in a column have significantly changed, but the histogram (statistics) have yet to be updated. Then SQL might pick a plan based on the histogram, where a different plan would have been much better. In your case, the histogram may not indicate to the engine that there are any values greater than about the 10th of March 2020. Statistics issues are fairly common with dates, because you are often inserting getdate(), which means newer rows in the table will contain values that have never been seen before, and thus won't be covered by the histogram until it gets updated.
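If you want to see what the optimizer sees, you can inspect the histogram yourself. A minimal sketch, assuming a statistics object or index on the date column named IX_pw_datein (a hypothetical name):
-- Shows the header, density vector and histogram for the statistics object;
-- compare RANGE_HI_KEY of the last histogram step with the dates you are querying
DBCC SHOW_STATISTICS ('pw', 'IX_pw_datein');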
SQL will automatically update statistics based on a number of different triggers (this is an old article, newer versions of the engine may have changed slightly), as long as you have auto update statistics enabled in the database settings.
You can find out whether this is the issue by forcing SQL to update statistics on your table. Statistics can be refreshed by either fully scanning the table, or sampling it. Sampling is much faster for large tables, but the result won't be as accurate.
To find out whether statistics is the problem in your case, do:
update statistics PW with fullscan
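If a full scan is too heavy for the table, a sampled update is the lighter-weight alternative mentioned above; the percentage here is just an example value:
-- Faster than FULLSCAN on big tables, at the cost of a less accurate histogram
UPDATE STATISTICS pw WITH SAMPLE 25 PERCENT;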
I'm using two databases and I've got a problem.
One stores the date with the datetime datatype, while the other one uses an integer (201901010000, for example).
I want to compare them and it needs to be fast. So far I've tried:
convert(date, 21) and other style numbers (my datetime is YYYY-MM-DD HH:MM:SS), but it doesn't work with the minutes and hours.
I also tried deleting all the '-' and ':' from the datetime and then converting to int.
So my question is simple: how do I compare them efficiently?
Thank you =)
You could use one of the below expressions to do the JOIN (not saying this is pretty at all). As per the comments I made under the question, however, neither is going to perform well.
SELECT *
FROM (VALUES(CONVERT(numeric,'201901010000'))) N(Dt)
JOIN (VALUES(CONVERT(datetime,'2019-01-01T00:00:00'))) D(dt) ON CONVERT(datetime,STUFF(STUFF(STUFF(STUFF(CONVERT(varchar(12),N.dt),11,0,':'),9,0,'T'),7,0,'-'),5,0,'-') + ':00.000') = D.Dt;
SELECT *
FROM (VALUES(CONVERT(numeric,'201901010000'))) N(Dt)
JOIN (VALUES(CONVERT(datetime,'2019-01-01T00:00:00'))) D(dt) ON N.Dt = CONVERT(numeric,LEFT(REPLACE(REPLACE(REPLACE(CONVERT(varchar(20),CONVERT(datetime,D.Dt),126),'-',''),'T',''),':',''),12));
Unless you can change the datatype of one of the columns (e.g. via a computed column) there's little you can do about the performance, as the query is non-SARGable.
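A minimal sketch of the computed-column idea, assuming a hypothetical table DateTbl with a datetime column dt: add a persisted computed column that rebuilds the YYYYMMDDhhmm number from the datetime, index it, and join the integer column against that instead.
-- Rebuild the 201901010000-style number from the datetime (deterministic, so it can be persisted and indexed)
ALTER TABLE DateTbl ADD dt_as_num AS
CAST(YEAR(dt) AS bigint) * 100000000
+ MONTH(dt) * 1000000
+ DAY(dt) * 10000
+ DATEPART(hour, dt) * 100
+ DATEPART(minute, dt)
PERSISTED;
CREATE INDEX IX_DateTbl_dt_as_num ON DateTbl (dt_as_num);
The join then becomes a plain, SARGable equality on dt_as_num (ideally with the integer column on the other side stored in a matching type).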
I have a simple SQL query to count the number of telemetry records by clients within the last 24 hours.
With an index on TimeStamp, the following query runs in less than 1 second for about 10k rows
select MachineName,count(Message) from Telemetry where TimeStamp between DATEADD(HOUR,-24, getutcdate()) and getutcdate() group by MachineName
However, when I tried making the hard-coded -24 configurable by adding a variable, it took more than 5 minutes for the query to execute.
DECLARE @cutoff int; SET @cutoff = 24
select MachineName,count(Message) from Telemetry where TimeStamp between DATEADD(HOUR, -1*@cutoff, getutcdate()) and getutcdate() group by MachineName
Is there any specific reason for the significant decrease of performance? What's the best way of adding a variable without impacting performance?
My guess is that you also have an index on MachineName - or that SQL is deciding that since it needs to group by MachineName, that would be a better way to access the records.
Updating statistics as suggested by AngularRat is a good start - but SQL often maintains those automatically. (In fact, the good performance when SQL knows the 24 hour interval in advance is evidence that the statistics are good...but when SQL doesn't know the size of the BETWEEN in advance, then it thinks other approaches might be a better idea).
Given:
CREATE TABLE Telemetry ( machineName sysname, message varchar(88), [timestamp] datetime ) -- a date/time type; the SQL Server timestamp (rowversion) type cannot hold dates
CREATE INDEX Telemetry_TS ON Telemetry([timestamp]);
First, try the OPTION (OPTIMIZE FOR ( @cutoff = 24 )); clause to let SQL know how to approach the query, and if that is insufficient then try WITH (Index( Telemetry_TS)). Using the INDEX hint is less desirable.
DECLARE @cutoff int = 24;
select MachineName,count(Message)
from Telemetry -- WITH (Index( Telemetry_TS))
where TimeStamp between DATEADD(HOUR, -1*@cutoff, getutcdate()) and getutcdate()
group by MachineName
OPTION (OPTIMIZE FOR ( @cutoff = 24 ));
Your parameter should actually work, but you MIGHT be seeing an issue where the database is using out-of-date statistics for the query plan. I'd try updating statistics for the table you are querying. Something like:
UPDATE STATISTICS TableName;
Additionally, if your code is running from within a stored procedure, you might want to recompile the procedure. Something like:
EXEC sp_recompile N'ProcedureName';
A lot of times when I have a query that seems like it should run a lot faster but isn't, it's a statistic/query plan out of date issue.
References:
https://msdn.microsoft.com/en-us/library/ms187348.aspx
https://msdn.microsoft.com/en-us/library/ms190439.aspx
I have a T-SQL statement that I am running against a table with many rows. I am seeing some strange behavior. Comparing a DateTime column against a precalculated value is slower than comparing each row against a calculation based on the GETDATE() function.
The following SQL takes 8 secs:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
DECLARE @TimeZoneOffset int = -(DATEPART("HH", GETUTCDATE() - GETDATE()))
DECLARE @LowerTime DATETIME = DATEADD("HH", ABS(@TimeZoneOffset), CONVERT(VARCHAR, GETDATE(), 101) + ' 17:00:00')
SELECT TOP 200 Id, EventDate, Message
FROM Events WITH (NOLOCK)
WHERE EventDate > @LowerTime
GO
This alternate strangely returns instantly:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
SELECT TOP 200 Id, EventDate, Message
FROM Events WITH (NOLOCK)
WHERE EventDate > GETDATE()-1
GO
Why is the second query so much faster?
EDITED: I updated the SQL to accurately reflect other settings I am using
After doing a lot of reading and researching, I've discovered the issue here is parameter sniffing. SQL Server attempts to determine how best to use indexes based on the WHERE clause, but in this case it isn't doing a very good job.
See the examples below :
Slow version:
declare @dNow DateTime
Select @dNow=GetDate()
Select *
From response_master_Incident rmi
Where rmi.response_date between DateAdd(hh,-2,@dNow) AND @dNow
Fast version:
Select *
From response_master_Incident rmi
Where rmi.response_date between DateAdd(hh,-2,GetDate()) AND GetDate()
The "Fast" version runs around 10x faster than the slow version. The Response_Date field is indexed and is a DateTime type.
The solution is to tell SQL Server how best to optimise the query. Modifying the example as follows to include the OPTIMIZE option resulted in it using the same execution plan as the "Fast version". The OPTIMIZE option here explicitly tells SQL Server to treat the local @dNow variable as a date (as if declaring it as DateTime wasn't enough :s )
Care should be taken when doing this however because in more complicated WHERE clauses you could end up making the query perform worse than Sql Server's own optimisations.
declare @dNow DateTime
SET @dNow=GetDate()
Select ID, response_date, call_back_phone
from response_master_Incident rmi
where rmi.response_date between DateAdd(hh,-2,@dNow) AND @dNow
-- The optimizer does not know much about the variable, so it assumes it should perform a clustered index scan (on the clustered index ID) - this is slow
-- This hint tells the optimizer that the variable is indeed a datetime in this format (why it does not know that already, who knows)
OPTION(OPTIMIZE FOR (@dNow = '99991231'));
The execution plans must be different, because SQL Server does not evaluate the value of the variable when it creates the execution plan. So it uses average statistics over all the different dates that can be stored in the table.
On the other hand, the GETDATE function is evaluated at execution time, so the execution plan is created using statistics for that specific date, which of course are more realistic than the previous ones.
If you create a stored procedure with @LowerTime as a parameter, you will get better results.
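A minimal sketch of that last suggestion (the procedure name is illustrative); unlike a local variable, a parameter value is sniffed when the plan is compiled:
CREATE PROCEDURE dbo.GetRecentEvents
@LowerTime datetime
AS
BEGIN
SET NOCOUNT ON;
-- The parameter value is visible to the optimizer, so the EventDate statistics can be used
SELECT TOP 200 Id, EventDate, Message
FROM Events
WHERE EventDate > @LowerTime;
END
Call it with EXEC dbo.GetRecentEvents @LowerTime = '2012-01-01 17:00:00'; bearing in mind that parameter sniffing cuts both ways if the first call's value is atypical.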
I'm working with an MSSQL 2000 database containing large amounts of Windows perfmon data collected for all servers in the environment. I'm using SSRS 2005 to build a custom report chart to visualize the metrics over time.
If I wanted to view, say, the last month, the sheer number of data points would create an ugly report with unreadable labels on the X axis. I would like to aggregate the data by time down to n data points, so that each point gives the average value over its grouped time span.
I've tried building a query with fancy GROUP BY clauses, but haven't been able to produce something that executes. I figured this ought to be a common task in SQL, but I haven't found any answers online.
The table structure basically looks like below. This is actually the MOM 2005 OnePoint database, but I think the application is irrelevant.
CREATE TABLE PerfTable (
[time] datetime,
value float,
Server nvarchar(356),
ObjectName nvarchar(225),
CounterName nvarchar(225),
InstanceName nvarchar(225),
Scale float
);
Might be worth building a View to look at a month's worth of data and working with the SQL behind that to reduce the amount of data.
Then you can run the report from that View.
Also, it might be worth giving us an idea of the table structure involved and the SQL you're currently using to get the results.
Do you really need to reduce the number of records returned from SQL, or just the data rendered by the chart?
It might be easier to get all the values from SQL, and then massage the data into something more usable later. Changing the query will reduce the network usage as less data will be sent, but if that's not a concern then maybe the query isn't the best place to do it.
Say we want to have 3 time spans and the average of "value" in each span.
First we determine the periods: start-end, start-end, start-end, etc.
You can do this in your own code, so I use parameters here.
In this example we also group by 'server' but you can add extra columns or remove it.
DECLARE @startdate1 as DateTime
DECLARE @enddate1 as DateTime
DECLARE @startdate2 as DateTime
DECLARE @enddate2 as DateTime
DECLARE @startdate3 as DateTime
DECLARE @enddate3 as DateTime
SELECT
CASE WHEN time >= @startdate1 AND time < @enddate1 THEN 'PERIOD1'
ELSE CASE WHEN time >= @startdate2 AND time < @enddate2 THEN 'PERIOD2'
ELSE CASE WHEN time >= @startdate3 AND time < @enddate3 THEN 'PERIOD3'
END
END
END as Period,
AVG(p.[value]),
p.[Server]
FROM PerfTable p
GROUP BY
CASE WHEN time >= @startdate1 AND time < @enddate1 THEN 'PERIOD1'
ELSE CASE WHEN time >= @startdate2 AND time < @enddate2 THEN 'PERIOD2'
ELSE CASE WHEN time >= @startdate3 AND time < @enddate3 THEN 'PERIOD3'
END
END
END,
p.[Server]
You can use the DATEPART function to get chunks of data grouped by a specific day, hour or minute (or several other date parts). You should be able to group by these and get the averages/aggregates you need.
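For instance, a minimal sketch that groups the PerfTable from the question into hourly buckets for a single day (the date range is an example value):
SELECT
DATEPART(hour, [time]) AS HourOfDay,
[Server],
AVG(value) AS AvgValue
FROM PerfTable
WHERE [time] >= '20081001' AND [time] < '20081002' -- example one-day window, so hour-of-day buckets don't mix days
GROUP BY DATEPART(hour, [time]), [Server]
ORDER BY HourOfDay, [Server];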
OK, here's the solution to get n aggregates (as long as there's data in each chunk of time):
declare @points as int
declare @start as float
declare @period as float
set @points = 20
select
@start=cast(min(time) as float),
@period=cast(max(time)-min(time) as float)
from perftable
select avg(value),
round((cast(time as float)-@start)/(@period/@points),0,1)
from perftable
group by
round((cast(time as float)-@start)/(@period/@points),0,1)
The @points variable is the number of aggregates you want to get.
@start is the time of the first record in the report, cast to float.
@period is the difference between the begin and end dates in the report.
The rest is pretty much linear scaling of the dates to the range [0;@points], truncating the results to integers and grouping by the truncated results.
I have a solution that comes close to what I was asking for. If I want to group by a time unit, it's pretty simple:
Group by hour:
select
dateadd(hh, datediff(hh, '1970-01-01', [time]), '1970-01-01'),
Server, ObjectName, CounterName, InstanceName, avg(value)
from PerfTable
group by
dateadd(hh, datediff(hh, '1970-01-01', [time]), '1970-01-01'),
Server, ObjectName, CounterName, InstanceName
order by
dateadd(hh, datediff(hh, '1970-01-01', [time]), '1970-01-01') desc,
ObjectName, CounterName, InstanceName, Server
This just doesn't address the need to scale down to n data points.