SQL Server: Order By DateDiff Performance issue - sql-server

I'm having a problem getting the top 100 rows from a table with 2M rows in a reasonable time.
The problem is the ORDER BY part: this query takes more than 50 minutes to return results.
What would be the best solution for this problem?
select top 100 * from THETABLE TT
Inner join SecondTable ST on TT.TypeID = ST.TypeID
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
Many thanks,
Bentzy
Edit:
* TheTable is the one with 2M rows.
* SomeParam has 15 distinct values (more or less)

There are two things that come to mind to speed up this fetch:
If you need to run this query often, you should index the column 'LastCheckDate'. Whichever SQL database you are using, a well-defined index on the column allows faster selects, especially in an ORDER BY clause.
Perform the date math before doing the select query. You are getting the difference in days between the row's LastCheckDate and the current date, times some parameter. Does the multiplication affect the ordering of the rows? Can this simply be ordered by 'LastCheckDate' (ascending, since the oldest dates give the largest difference)? Explore other sorting options that return the same result.

Two ideas come to mind:
a) If ST.param doesn't change often, perhaps you can cache the result of the multiplication somewhere. The numbers would be "off" after a day, but the relative values would be the same - i.e., the sort order wouldn't change.
b) Find a way to reduce the size of the input tables. There are probably some values of LastCheckDate &/or SomeParam that will never be in the top 100. For example,
Select *
into #tmp
from THETABLE
where LastCheckDate between '2012-06-01' and getdate()
select top 100 *
from #tmp join SecondTable ST on #tmp.TypeID = ST.TypeID
order by DateDiff(day, LastCheckDate, getdate()) * ST.SomeParam desc
It's a lot faster to search a small table than a big one.

DATEDIFF(Day, TT.LastCheckDate, GETDATE()) is the number of days since "last check".
If you just order by TT.LastCheckDate you get a similar order.
EDIT
Maybe you can work out which dates you don't expect to get back and filter on them. Of course you then also need an index on the LastCheckDate column. If everything works out, you can at least shorten the list of records to check from 2M to some manageable amount.

It is quite complicated. Do you seriously need all columns in the query? There is one thing you could try here: first get just the TypeID and the sort columns of the top 100 rows,
something like below:
select top 100 st.typeid
,tt.lastcheckdate,st.someparam --do not use these if the typeid is unique in both tables..
--or just the PK columns of both tables and typeid so that these can be joined on PK
into #temptable
from SecondTable st inner join THETABLE tt on st.typeid = tt.typeid
ORDER BY DATEDIFF(Day, tt.LastCheckDate, GETDATE()) * st.SomeParam DESC
The query above sorts a minimal amount of data and so should be faster. How much faster depends on how many columns and indexes your tables have: the gain is largest when both tables are wide, because this query uses just three columns. These columns (st.typeid, st.someparam, tt.typeid and tt.lastcheckdate) may also be covered by existing indexes, in which case the underlying tables need not be read at all, which reduces the IO as well. Then join this data back to both tables.
If that doesn't work the way you expect, you could create an indexed view based on the select above, adding the ORDER BY expression as a column, then use the indexed view to get the top 100 and join with the main tables. This should reduce the amount of work and so improve performance. Note, however, that an indexed view cannot contain nondeterministic functions such as GETDATE(), so the expression cannot be stored exactly as written. An indexed view also adds maintenance overhead that depends on how frequently the data in table TT changes.
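The "join this data back" step is not spelled out above, so here is a minimal sketch of one way it might look. It assumes THETABLE has a primary key column named ID (a hypothetical name) and that TypeID identifies a single row in SecondTable. The temp table name #top100 is illustrative only.

```sql
-- Step 1: sort only the narrow key columns and keep the 100 winners.
SELECT TOP (100)
       tt.ID AS TheTableID,       -- ID is an assumed primary key of THETABLE
       st.TypeID
INTO #top100
FROM THETABLE AS tt
INNER JOIN SecondTable AS st ON st.TypeID = tt.TypeID
ORDER BY DATEDIFF(DAY, tt.LastCheckDate, GETDATE()) * st.SomeParam DESC;

-- Step 2: fetch the wide rows for those 100 keys only.
SELECT tt.*, st.*
FROM #top100 AS k
INNER JOIN THETABLE AS tt ON tt.ID = k.TheTableID
INNER JOIN SecondTable AS st ON st.TypeID = k.TypeID
ORDER BY DATEDIFF(DAY, tt.LastCheckDate, GETDATE()) * st.SomeParam DESC;

DROP TABLE #top100;
```

Step 1 still has to evaluate the expression for every joined row, so the saving comes only from sorting narrower rows, not from reading fewer of them.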

To reduce the number of rows, you might retrieve the top (100) rows for each SecondTable record ordered by LastCheckDate, combine them with UNION ALL, and finally select the top (100) from the combined set, using either a temporary table or a dynamically generated SQL query.
This solution uses a cursor to fetch the top 100 records for each value in SecondTable. With an index on (TypeID, LastCheckDate) on TheTable it runs instantaneously (tested on my system with a table of 700,000 records and 50 SecondTable entries).
declare @SomeParam float -- matches the SomeParam column of @tbl; assumes SomeParam is numeric
declare @TypeID int
declare @tbl table (TheTableID int, LastCheckDate datetime, SomeParam float)
declare rstX cursor local fast_forward for
select TypeID, SomeParam
from SecondTable
open rstX
while 1 = 1
begin
fetch next from rstX into @TypeID, @SomeParam
if @@fetch_status <> 0
break
-- oldest 100 rows for this type: these have the largest DATEDIFF values
insert into @tbl
select top 100 ID, LastCheckDate, @SomeParam
from TheTable
where TypeID = @TypeID
order by LastCheckDate
end
close rstX
deallocate rstX
select top 100 *
from @tbl
order by DATEDIFF(Day, LastCheckDate, GETDATE()) * SomeParam desc -- DESC to match the original query
Obviously this solution fetches ID's only. You might want to expand temporary table with additional columns.
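The same per-type idea can also be written without a cursor. This is only a sketch, not something tested against the asker's data. It assumes TheTable has an ID column and an index on (TypeID, LastCheckDate). It also assumes SomeParam is positive, so that the oldest LastCheckDate within a type gives the largest product.

```sql
-- For each SecondTable row, take the 100 oldest rows of that type,
-- then rank only those candidates (at most 100 per type) by the full expression.
SELECT TOP (100)
       c.ID,
       c.LastCheckDate,
       ST.SomeParam
FROM SecondTable AS ST
CROSS APPLY (
    SELECT TOP (100) TT.ID, TT.LastCheckDate
    FROM TheTable AS TT
    WHERE TT.TypeID = ST.TypeID
    ORDER BY TT.LastCheckDate ASC   -- oldest first = largest DATEDIFF
) AS c
ORDER BY DATEDIFF(DAY, c.LastCheckDate, GETDATE()) * ST.SomeParam DESC;
```

The final sort then touches at most 100 rows per SecondTable entry instead of all 2M rows.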

Related

How to speed up denormalized table with indexes

I have created a denormalized table that has 35 columns and 360k records. The table consists of data from 8 other tables, and it has only one inner join, to one other table.
My main problem is performance: queries against this table run very slowly. I also have a full-text catalog on that table, and full-text searches are slow as well.
I am also getting deadlocks; in Activity Monitor I see LCK_IM_X and LCK_IM_S wait types.
I placed indexes like crazy and followed the query execution plans.
I know that a table scan is bad when there are too many records. I got past that a long time ago, but right now index scans account for 80% of the cost.
In the query below, the select list comes from another table as a string; this query runs about as fast as I can get it. I also build the WHERE clause dynamically, along with other conditions.
https://imgur.com/a/ijOUYeY
SET @Sql=';WITH TempResult AS(Select '+(SELECT STUFF((SELECT ',' +DBFieldName FROM TableFields where TableFields.TableID=3 FOR XML PATH('')), 1, 1, ''))+',
0 as [DeviceChange],
0 as [DeviceReturn],
0 as [SNORepeat]
from DenormalizedTable
inner join Companies On Companies.CompanyID=DenormalizedTable.CompanyID
WHERE (@state is null or DenormalizedTable.StateID=@state)
AND ((@status = -1 AND DenormalizedTable.Status IN(0,1,10,11,4)) OR (@status=1 AND DenormalizedTable.Status IN(1,4,10,11)) OR
(@status=2 AND DenormalizedTable.Status IN (0))) AND (@CategoryID is null or DenormalizedTable.CategoryID=@CategoryID)
AND (DenormalizedTable.CompanyID=@companyID OR Companies.SubCompanyOf=@companyID)
'+ ( CASE WHEN @CustomSearchParam='' THEN '' ELSE (Select dbo.[perf_whereBuilder](@CustomSearchParam)) END)+'
AND (@technicianID is null or DenormalizedTable.JobOrderID IN (Select AttendedStaff.JobOrderID from AttendedStaff Where AttendedStaff.StaffID=@technicianID))
AND (@FilterStartDate is null or convert(varchar, JobOrder_PerfTable.StartDate, 20) between @FilterStartDate and @FilterEndDate)
), TotalCount AS (Select COUNT(*) as TotalCount from TempResult)
Select * from TempResult, TotalCount
order by 1 desc
OFFSET @skip ROWS
FETCH NEXT @take ROWS ONLY;';
When I run the stored procedure it takes almost 5 seconds to load. If there is any search parameter it takes even longer.
I need to know what I should do to make the query run faster. Sorry if the question is broad.

Using a running total calculated column in SQL Server table variable

I have inherited a stored procedure that utilizes a table variable to store data, then updates each row with a running total calculation. The order of the records in the table variable is very important, as we want the volume to be ordered highest to lowest (i.e. the running total will get increasingly larger as you go down the table).
My problem is that, during the step where the table variable is updated, the running total is calculated, but not in the order in which the data in the table variable was previously sorted (descending by highest volume).
DECLARE @TableVariable TABLE ([ID], [Volume], [SortValue], [RunningTotal])
--Populate table variable and order by the sort value...
INSERT INTO @TableVariable (ID, Volume, SortValue)
SELECT
[ID], [Volume], ABS([Volume]) as SortValue
FROM
dbo.VolumeTable
ORDER BY
SortValue DESC
--Set TotalVolume variable...
SELECT @TotalVolume = ABS(sum([Volume]))
FROM @TableVariable
--Calculate running total, update rows in table variable...I believe this is where problem occurs?
SET @RunningTotal = 0
UPDATE @TableVariable
SET @RunningTotal = RunningTotal = @RunningTotal + [Volume]
FROM @TableVariable
--Output...
SELECT
ID, Volume, SortValue, RunningTotal
FROM
@TableVariable
ORDER BY
SortValue DESC
The result is that the record with the highest volume, which I would have expected the running total to be calculated on first (thus running total = [volume]), somehow ends up much further down the list. The running total seems to be calculated in a random order.
Here is what I would expect to get:
But here is what the code actually generates:
I'm not sure whether there is a way to get the UPDATE statement to be applied to the table variable in volume-descending order. From what I've read so far, it could be an issue with the sorting behavior of a table variable, but I'm not sure how to correct it. Can anyone help?
GarethD provided the definitive link to the multiple ways of calculating running totals and their performance. The correct one is both the simplest and the fastest, 300 times faster than the quirky update. That's because it can take advantage of any indexes that cover the sort column, and because it's a lot simpler.
I repeat it here to make clear how much simpler this is when the database provides the appropriate windowing functions:
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
The SUM line means: sum the ticket counts over all (UNBOUNDED) the rows that come before (PRECEDING) the current one, up to and including the current row, when the rows are ordered by date.
That ends up being 300 times faster than the quirky update.
The equivalent query for VolumeTable would be:
SELECT
ID,
Volume,
ABS(Volume) as SortValue,
SUM(Volume) OVER (ORDER BY ABS(Volume) DESC RANGE UNBOUNDED PRECEDING)
FROM
VolumeTable
ORDER BY ABS(Volume) DESC
Note that this will be a lot faster if there is an index on the sort column (Volume) and ABS isn't used. Applying a function to a column generally means that the optimizer can't use an index on that column to provide the sort order, because the actual sort value is different from the one stored in the index.
If the table is very large and performance suffers, you could create a computed column and create an index on it.
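A minimal sketch of that computed-column idea, assuming dbo.VolumeTable has the ID and Volume columns used above. The names AbsVolume and IX_VolumeTable_AbsVolume are hypothetical, and this has not been tested against the asker's schema.

```sql
-- Store ABS(Volume) as a persisted computed column so it can be indexed.
ALTER TABLE dbo.VolumeTable
    ADD AbsVolume AS ABS(Volume) PERSISTED;

-- Index the computed column in the order the running total needs.
CREATE NONCLUSTERED INDEX IX_VolumeTable_AbsVolume
    ON dbo.VolumeTable (AbsVolume DESC)
    INCLUDE (ID, Volume);

-- The window function can now sort on the indexed column directly.
SELECT ID,
       Volume,
       AbsVolume AS SortValue,
       SUM(Volume) OVER (ORDER BY AbsVolume DESC
                         RANGE UNBOUNDED PRECEDING) AS RunningTotal
FROM dbo.VolumeTable
ORDER BY AbsVolume DESC;
```

Whether the optimizer actually picks the index depends on the data and the rest of the query, so it is worth checking the execution plan.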
Take a peek at the Window functions offered in SQL
For example
Declare @YourTable table (ID int,Volume int)
Insert Into @YourTable values
(100,1306489),
(125,898426),
(150,907404)
Select ID
,Volume
,RunningTotal = sum(Volume) over (Order by Volume Desc)
From @YourTable
Order By Volume Desc
Returns
ID Volume RunningTotal
100 1306489 1306489
150 907404 2213893
125 898426 3112319
To be clear, @YourTable is for demonstration purposes only. There should be no need to INSERT your actual data into a table variable.
EDIT to Support 2008 (Good news is Row_Number() is supported in 2008)
Select ID
,Volume
,RowNr=Row_Number() over (Order by Volume Desc)
Into #Temp
From @YourTable
Select A.ID
,A.Volume
,RunningTotal = sum(B.Volume)
From #Temp A
Join #Temp B on (B.RowNr<=A.RowNr)
Group By A.ID,A.Volume
Order By A.Volume Desc

Large Table Select and Insert

I haven't been able to find anything that solves this really, though I have found many things that seem to point in the right direction.
I have a table with ~4.7 Million records in it. This table also has ~319 columns. Of all of these ~319 columns, there are 16 that I am interested in, and I want to put them into another table that is just 2 columns. Now basically how this is set is that column "A" is just an ID and columns 1-15 are codes. None of the columns are grouped either (not sure if that matters).
I have tried things like:
Insert Into NewTable(ID,Profession)
Select ID, ProCode1 From OriginalTable WHERE ProCode1 > ''
UNION
Select ID, ProCode2 From OriginalTable WHERE ProCode2 > ''
And so on. This didn't seem to do anything at all and I let it go for ~ 20 minutes.
Now I can get a small result doing the same but dropping the union and using a TOP (1000) statement, however even that will never work.
So the question is what can I do to take this:
ID|PID|blah|blah|blah|...|ProCode1|ProCode2|ProCode3|...|ProCode15|blah|...
into:
ID|PID|ProCode|
across all ~4.7 million rows without running:
Insert Into NewTable(PID, ProCode)
Select PID, ProCode1 FROM OriginalTable WHERE ProCode1 > ''
Insert Into NewTable(PID, ProCode)
Select PID, ProCode2 FROM OriginalTable WHERE ProCode2 > ''
Insert Into NewTable(PID, ProCode)
Select PID, ProCode3 FROM OriginalTable WHERE ProCode3 > ''
...
...
...
EDIT: I forgot that a majority of the columns for ProCodeX are blank. All ProCode1 rows are occupied, but that becomes exponentially less each increase (e.g. ProCode2 is <50% occupied, ProCode3 is <10% occupied)
Use CROSS APPLY with a table value constructor to unpivot the data, instead of a separate UNION ALL branch for each column:
Insert Into NewTable(PID,ProCode)
select PID, ProCode FROM OriginalTable
Cross apply
(
values(ProCode1),(ProCode2),(ProCode3),..(ProCode15)
)
cs (ProCode)
Where ProCode <> ''
This should be much faster than the UNION ALL query, since it reads the physical table only once.
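To see the unpivot on a small scale, here is a self-contained sketch with made-up data. It uses a table variable named @OriginalTable with only three ProCode columns, and the sample values are purely illustrative.

```sql
-- Illustrative data only: two source rows, three code columns.
DECLARE @OriginalTable TABLE
    (PID int, ProCode1 varchar(10), ProCode2 varchar(10), ProCode3 varchar(10));

INSERT INTO @OriginalTable (PID, ProCode1, ProCode2, ProCode3)
VALUES (1, 'A10', 'B20', ''),
       (2, 'C30', '',    '');

-- Each source row is expanded into one row per code column,
-- then blank codes are filtered out.
SELECT o.PID, cs.ProCode
FROM @OriginalTable AS o
CROSS APPLY (VALUES (o.ProCode1), (o.ProCode2), (o.ProCode3)) AS cs (ProCode)
WHERE cs.ProCode <> '';
```

With this data the query yields three rows (two for PID 1, one for PID 2). The `<> ''` filter also drops NULL codes, because a comparison with NULL is never true.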

Performing INSERT for each row in a select RESULT

First, a general description of the problem: I'm running a periodical process which updates total figures in a table. The issue is, that multiple updates may be required in each execution of the process, and each execution depends on the previous results.
My question is, can it be done in a single SQL Server SP?
My code (I altered it a little to simplify the sample):
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
CTS.previousTotalSessions
FROM (SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > @time
GROUP BY time, x) MS
CROSS APPLY TVF_GetPreviousCustomerTotalSessions(MS.customer) CTS
ORDER BY time
The previousTotalSessions column depends on other rows in UpdatedTable, and its value is retrieved by CROSS APPLYing TVF_GetPreviousCustomerTotalSessions, but if I execute the SP as-is, all the rows use the value retrieved by the function without taking the rows added during the execution of the SP.
For the sake of completeness, here's TVF_GetPreviousCustomerTotalSessions:
FUNCTION [dbo].[TVF_GetCustomerCurrentSessions]
(
@customerId int
)
RETURNS @result TABLE (PreviousNumberOfSessions int)
AS
BEGIN
INSERT INTO @result
SELECT TOP 1 (PreviousNumberOfSessions + Opened - Closed) AS PreviousNumberOfSessions
FROM CustomerMinuteSessions
WHERE CustomerId = @customerId
ORDER BY time DESC
IF @@rowcount = 0
INSERT INTO @result(PreviousNumberOfSessions) VALUES(0)
RETURN
END
What is the best way (i.e. without a for loop, I guess...) to take previous rows into account within the query for subsequent rows?
If you are using SQL Server 2005 or later, you can do it with a few CTEs in one shot. If you use SQL Server 2000, you can use an inline table-valued function.
Personally I like the CTE approach more, so I'm including a schematic translation of your code to CTE syntax. (Bear in mind that I didn't prepare a test set to check it.)
WITH LastSessionByCustomer AS
(
-- Latest recorded time per customer
SELECT CustomerID, MAX(Time) AS LastTime
FROM CustomerMinuteSessions
GROUP BY CustomerID
)
, GetPreviousCustomerTotalSessions AS
(
SELECT LastSession.CustomerID, LastSession.PreviousNumberOfSessions + LastSession.Opened - LastSession.Closed AS PreviousNumberOfSessions
FROM CustomerMinuteSessions LastSession
INNER JOIN LastSessionByCustomer ON LastSessionByCustomer.CustomerID = LastSession.CustomerID
AND LastSessionByCustomer.LastTime = LastSession.Time -- keep only the latest row per customer
)
, MS AS
(
SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > @time
GROUP BY time, x
)
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
ISNULL(GetPreviousCustomerTotalSessions.PreviousNumberOfSessions, 0)
FROM MS
-- LEFT JOIN so that customers with no earlier row still get inserted, with 0
LEFT JOIN GetPreviousCustomerTotalSessions ON MS.Customer = GetPreviousCustomerTotalSessions.CustomerID
Going a bit beyond your question, I think that your query with CROSS APPLY could hurt database performance badly once the CustomerMinuteSessions table grows.
I would add an index like this to improve your chances of getting an Index Seek:
CREATE INDEX IX_CustomerMinuteSessions_CustomerId
ON CustomerMinuteSessions (CustomerId, [time] DESC, PreviousNumberOfSessions, Opened, Closed );

What is the most efficient way to page large amounts of data in SQL Server 2000?

If I have a query with a lot of information (something like a couple of views that each hit a handful of tables, with many tables having tens of thousands of rows), and I just need to get 10 records from it to display to the user, what's the best way, performance-wise, to retrieve those records while still supporting SQL Server 2000? Once I can use SQL Server 2005, ROW_NUMBER seems like the obvious choice (correct me if I'm wrong), but what to do in 2000?
Greg Hamilton has an article which uses SET ROWCOUNT and SELECTing into a variable to avoid having to reference rows that aren't needed, with some pretty compelling performance results. However, MSDN says
If a variable is referenced in a select list, it should be assigned a scalar value or the SELECT statement should only return one row.
But then it goes on to say
Note that effects are only visible if there are references among the assignments.
If a SELECT statement returns more than one row and the variable references a nonscalar expression, the variable is set to the value returned for the expression in the last row of the result set.
Indicating that it's really okay in this instance (right?)
Greg ends up with this:
CREATE PROCEDURE [dbo].[usp_PageResults_NAI]
(
@startRowIndex int,
@maximumRows int
)
AS
DECLARE @first_id int, @startRow int
-- A check can be added to make sure @startRowIndex isn't > count(1)
-- from employees before doing any actual work unless it is guaranteed
-- the caller won't do that
-- Get the first employeeID for our page of records
SET ROWCOUNT @startRowIndex
SELECT @first_id = employeeID FROM employees ORDER BY employeeid
-- Now, set the row count to MaximumRows and get
-- all records >= @first_id
SET ROWCOUNT @maximumRows
SELECT e.*, d.name as DepartmentName
FROM employees e
INNER JOIN Departments D ON
e.DepartmentID = d.DepartmentID
WHERE employeeid >= @first_id
ORDER BY e.EmployeeID
SET ROWCOUNT 0
GO
This method assumes that you have a unique ID to order by. I don't think that you can use this method as-is when sorting on, say, a non-unique DateTime column.
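For comparison, here is a rough sketch of the ROW_NUMBER approach that becomes available from SQL Server 2005 onward. It reuses the employees and Departments tables from Greg's procedure. The page values are arbitrary examples, @startRowIndex is assumed to be 1-based, and the column alias RowNum is illustrative.

```sql
DECLARE @startRowIndex int, @maximumRows int;
SET @startRowIndex = 21;   -- 1-based position of the first row on the page
SET @maximumRows   = 10;   -- page size

WITH Numbered AS
(
    -- Number every row in the desired order.
    SELECT e.*,
           d.name AS DepartmentName,
           ROW_NUMBER() OVER (ORDER BY e.EmployeeID) AS RowNum
    FROM employees e
    INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID
)
SELECT *
FROM Numbered
WHERE RowNum >= @startRowIndex
  AND RowNum <  @startRowIndex + @maximumRows
ORDER BY RowNum;
```

To page on a non-unique column such as a DateTime, a common approach is to add the unique ID as a second sort key inside the OVER clause so the numbering is stable between requests.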
