SQL running sum for an MVC application - sql-server

I need a faster method to calculate and display a running sum.
It's an MVC telerik grid that queries a view that generates a running sum using a sub-query. The query takes 73 seconds to complete, which is unacceptable. (Every time the user hits "Refresh Forecast Sheet", it takes 73 seconds to re-populate the grid.)
The query looks like this:
SELECT outside.EffectiveDate
[omitted for clarity]
,(
SELECT SUM(b.Amount)
FROM vCI_UNIONALL inside
WHERE inside.EffectiveDate <= outside.EffectiveDate
) AS RunningBalance
[omitted for clarity]
FROM vCI_UNIONALL outside
"EffectiveDate" on certain items can change all the time... New items can get added, etc. I certainly need something that can calculate the running sum on the fly (when the Refresh button is hit). Stored proc or another View...? Please advise.
Solution: (one of many, this one is orders of magnitude faster than a sub-query)
Create a new table with all the columns in the view except for the RunningTotal col. Create a stored procedure that first truncates the table, then INSERT INTO the table using SELECT all columns, without the running sum column.
Use update local variable method:
DECLARE #Amount DECIMAL(18,4)
SET #Amount = 0
UPDATE TABLE_YOU_JUST_CREATED SET RunningTotal = #Amount, #Amount = #Amount + ISNULL(Amount,0)
Create a task agent that will run the stored procedure once a day. Use the TABLE_YOU_JUST_CREATED for all your reports.

Take a look at this post
Calculate a Running Total in SQL Server
If you have SQL Server Denali, you can use new windowed function.
In SQL Server 2008 R2 I suggest you to use recursive common table expression.
Small problem in CTE is that for fast query you have to have identity column without gaps (1, 2, 3,...) and if you don't have such a column you have to create a temporary or variable table with such a column and to move you your data there.
CTE approach will be something like this
declare #Temp_Numbers (RowNum int, Amount <your type>, EffectiveDate datetime)
insert into #Temp_Numbers (RowNum, Amount, EffectiveDate)
select row_number() over (order by EffectiveDate), Amount, EffectiveDate
from vCI_UNIONALL
-- you can also use identity
-- declare #Temp_Numbers (RowNum int identity(1, 1), Amount <your type>, EffectiveDate datetime)
-- insert into #Temp_Numbers (Amount, EffectiveDate)
-- select Amount, EffectiveDate
-- from vCI_UNIONALL
-- order by EffectiveDate
;with
CTE_RunningTotal
as
(
select T.RowNum, T.EffectiveDate, T.Amount as Total_Amount
from #Temp_Numbers as T
where T.RowNum = 1
union all
select T.RowNum, T.EffectiveDate, T.Amount + C.Total_Amount as Total_Amount
from CTE_RunningTotal as C
inner join #Temp_Numbers as T on T.RowNum = C.RowNum + 1
)
select C.RowNum, C.EffectiveDate, C.Total_Amount
from CTE_RunningTotal as C
option (maxrecursion 0)
There're may be some questions with duplicates EffectiveDate values, it depends on how you want to work with them - do you want to them to be ordered arbitrarily or do you want them to have equal Amount?

Related

SQl Server - Where clause uses maximum date in data

I'm struggling with something i thought would be easy.
I have a table that is updated via an append on most days and has a report date field that shows the date the rows were updated.
I want to join to this table but only pull back the records from the date the table was last updated
Most of the time I could get away just looking for yesterdays date as the table is updated most days
Where [reportdate] > DATEADD(DAY, -1, GETDATE())
But as its not always updated daily, I wanted to rule this issue out. Is there anyway of returning the max date?
I was trying to figure out max (date), but I can't figure out the grouping. I need to return all the fields. The below just seems to return the whole table
SELECT max ([ReportDate]) as reportdate
,[GUID]
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
FROM table
group by guid
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
I could get round it with a temp table that just has the max report date and then using this as the left part of a join
SELECT max ([ReportDate]) as reportdate
FROM [DOMCustomers].[dbo].[DCC_Device_Comms_Compiled]
But The SQL is triggered in Excel so temp tables are problematic (i think).
Is there anyway of returning the max date?
Like this:
SELECT *
FROM SomeTable
where ReportDate = (select max(ReportDate) from SomeTable)
Here is a conceptual example.
It will produce a latest row for each car make.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, make VARCHAR(20), ReportDate DATETIME);
INSERT INTO #tbl (make, ReportDate) VALUES
('Ford', '2020-12-31'),
('Ford', '2020-10-17'),
('Tesla', '2020-10-25'),
('Tesla', '2020-12-30');
-- DDL and sample data population, end
;WITH rs AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY make ORDER BY ReportDate DESC) AS seq
FROM #tbl
)
SELECT * FROM rs
WHERE seq = 1;
Seems like a DENSE_RANK and TOP would work (assuming ReportDate is a date):
SELECT TOP (1) WITH TIES
[ReportDate]
,[GUID]
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
FROM YourTable
ORDER BY DENSE_RANK() OVER (ORDER BY ReportDate DESC);
If ReportDate is a date and time value, and you want everything for the latest date (ignoring time), then replace ReportDate with CONVERT(date,ReportDate) in the ORDER BY.

Using a running total calculated column in SQL Server table variable

I have inherited a stored procedure that utilizes a table variable to store data, then updates each row with a running total calculation. The order of the records in the table variable is very important, as we want the volume to be ordered highest to lowest (i.e. the running total will get increasingly larger as you go down the table).
My problem is, during the step where the table variable is updated, the running total seems to be calculating , but not in a way that the data in the table variable was previously sorted by (descending by highest volume)
DECLARE #TableVariable TABLE ([ID], [Volume], [SortValue], [RunningTotal])
--Populate table variable and order by the sort value...
INSERT INTO #TableVariable (ID, Volume, SortValue)
SELECT
[ID], [Volume], ABS([Volume]) as SortValue
FROM
dbo.VolumeTable
ORDER BY
SortValue DESC
--Set TotalVolume variable...
SELECT#TotalVolume = ABS(sum([Volume]))
FROM #TableVariable
--Calculate running total, update rows in table variable...I believe this is where problem occurs?
SET #RunningTotal = 0
UPDATE #TableVariable
SET #RunningTotal = RunningTotal = #RunningTotal + [Volume]
FROM #TableVariable
--Output...
SELECT
ID, Volume, SortValue, RunningTotal
FROM
#TableVariable
ORDER BY
SortValue DESC
The result is, the record that had the highest volume, that I would have expected the running total to calculate on first (thus running total = [volume]), somehow ends up much further down in the list. The running total seems to calculate randomly
Here is what I would expect to get:
But here is what the code actually generates:
Not sure if there is a way to get the UPDATE statement to be enacted on the table variable in such a way that it is ordered by volume desc? From what Ive read so far, it could be an issue with the sorting behavior of a table variable but not sure how to correct? Can anyone help?
GarethD provided the definitive link to the multiple ways of calculating running totals and their performance. The correct one is both the simplest and fastest, 300 times faster that then quirky update. That's because it can take advantage of any indexes that cover the sort column, and because it's a lot simpler.
I repeat it here to make clear how much simpler this is when the database provided the appropriate windowing functions
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
The SUM line means: Sum all ticket counts over all (UNBOUNDED) the rows that came before (PRECEDING) the current one if they were ordered by date
That ends up being 300 times faster than the quirky update.
The equivalent query for VolumeTable would be:
SELECT
ID,
Volume,
ABS(Volume) as SortValue,
SUM(Volume) OVER (ORDER BY ABS(Volume) DESC RANGE UNBOUNDED PRECEDING)
FROM
VolumeTable
ORDER BY ABS(Volume) DESC
Note that this will be a lot faster if there is an index on the sort column (Volume), and ABS isn't used. Applying any function on a column means that the optimizer can't use any indexes that cover it, because the actual sort value is different than the one stored in the index.
If the table is very large and performance suffers, you could create a computed column and create an index on it
Take a peek at the Window functions offered in SQL
For example
Declare #YourTable table (ID int,Volume int)
Insert Into #YourTable values
(100,1306489),
(125,898426),
(150,907404)
Select ID
,Volume
,RunningTotal = sum(Volume) over (Order by Volume Desc)
From #YourTable
Order By Volume Desc
Returns
ID Volume RunningTotal
100 1306489 1306489
150 907404 2213893
125 898426 3112319
To be clear, The #YourTable is for demonstrative purposes only. There should be no need to INSERT your actual data into a table variable.
EDIT to Support 2008 (Good news is Row_Number() is supported in 2008)
Select ID
,Volume
,RowNr=Row_Number() over (Order by Volume Desc)
Into #Temp
From #YourTable
Select A.ID
,A.Volume
,RunningTotal = sum(B.Volume)
From #Temp A
Join #Temp B on (B.RowNr<=A.RowNr)
Group By A.ID,A.Volume
Order By A.Volume Desc

Performing INSERT for each row in a select RESULT

First, a general description of the problem: I'm running a periodical process which updates total figures in a table. The issue is, that multiple updates may be required in each execution of the process, and each execution depends on the previous results.
My question is, can it be done in a single SQL Server SP?
My code (I altered it a little to simply the sample):
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
CTS.previousTotalSessions
FROM (SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > #time
GROUP BY time, x) MS
CROSS APPLY TVF_GetPreviousCustomerTotalSessions(MS.customer) CTS
ORDER BY time
The previousTotalSessions column depends on other rows in UpdatedTable, and its value is retrieved by CROSS APPLYing TVF_GetPreviousCustomerTotalSessions, but if I execute the SP as-is, all the rows use the value retrieved by the function without taking the rows added during the execution of the SP.
For the sake of completeness, here's TVF_GetPreviousCustomerTotalSessions:
FUNCTION [dbo].[TVF_GetCustomerCurrentSessions]
(
#customerId int
)
RETURNS #result TABLE (PreviousNumberOfSessions int)
AS
BEGIN
INSERT INTO #result
SELECT TOP 1 (PreviousNumberOfSessions + Opened - Closed) AS PreviousNumberOfSessions
FROM CustomerMinuteSessions
WHERE CustomerId = #customerId
ORDER BY time DESC
IF ##rowcount = 0
INSERT INTO #result(PreviousNumberOfSessions) VALUES(0)
RETURN
END
What is the best (i.e. without for loop, I guess...) to take previous rows within the query for subsequent rows?
If you are using SQL-2005 and later, you can do it with few CTEs in one shot. If you use SQL-2000 you'll can use inline table-valued function.
Personally I like the CTE approach more, so I'm including a schematic translation of your code to CTEs syntax. (Bare in mind hat I didn't prepare a test set to check it).
WITH LastSessionByCustomer AS
(
SELECT CustomerID, MAX(Time)
FROM CustomerMinuteSessions
GROUP BY CustomerID
)
, GetPreviousCustomerTotalSessions AS
(
SELECT LastSession.CustomerID, LastSession.PreviousNumberOfSessions + LastSession.Opened - LastSession.Closed AS PreviousNumberOfSessions
FROM CustomerMinuteSessions LastSession
INNER JOIN LastSessionByCustomer ON LastSessionByCustomer.CustomerID = LastSession.CustomerID
)
, MS AS
(
SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > #time
GROUP BY time, x
)
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
ISNULL(GetPreviousCustomerTotalSessions.previousTotalSessions, 0)
FROM MS
RIGHT JOIN GetPreviousCustomerTotalSessions ON MS.Customer = GetPreviousCustomerTotalSessions.CustomerID
Going a bit beyond your question, I think that your query with cross apply could make big damage to the database once table CustomerMinuteSessions database grows
I would add an index like to improve your chances of getting Index-Seek:
CREATE INDEX IX_CustomerMinuteSessions_CustomerId
ON CustomerMinuteSessions (CustomerId, [time] DESC, PreviousNumberOfSessions, Opened, Closed );

SQL Server: Order By DateDiff Performance issue

I'm having a problem getting top 100 rows from a table with 2M rows in reasonable time.
The problem is the order by part, it takes more than 50 minutes to get results for this query..
What can be the best solution for this problem?
select top 100 * from THETABLE TT
Inner join SecondTable ST on TT.TypeID = ST.TypeID
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
Many thanks,
Bentzy
Edit:
* TheTable is the one with 2M rows.
* SomeParam has 15 distinct values (more or less)
There are two things that come to mind to speed up this fetch:
If you need to run this query often, you should index the column 'lastCheckDate'. No matter which sql db you are using, a well defined index on the column will allow for faster selects, especially in an orders by clause.
Perform the date math before doing the select query. You are getting the difference in days between the row's checkDate and the current date, times some parameter. Does the multiplication affect the ordering of the rows? Can this simply be ordered by the 'lastCheckDate desc'? Explore other sorting options that return the same result.
Two ideas come to mind:
a) If ST.param doesn't change often, perhaps you can cache the result of the multiplication somewhere. The numbers would be "off" after a day, but the relative values would be the same - i.e., the sort order wouldn't change.
b) Find a way to reduce the size of the input tables. There are probably some values of LastCheckDate &/or SomeParam that will never be in the top 100. For example,
Select *
into #tmp
from THETABLE
where LastCheckDate between '2012-06-01' and getdate()
select top 100 *
from #tmp join SecondTable ST on #tmp.TypeID = ST.TypeID
order by DateDiff(day, LastCheckDate, getdate()) * ST.SomeParam desc
It's a lot faster to search a small table than a big one.
DATEDIFF(Day, TT.LastCheckDate, GETDATE()) is the number of days since "last check".
If you just order by TT.LastCheckDate you get a similar order.
EDIT
Maybe you can work out what dates you don't expect to get back and filter on them. Of course you then also need an index on that LastDateCheck column. If everything works out, you can at least shorten the list of records to check from 2M to some managable amount.
It is quite complicated.Do you seriouslly need all columns in query?There is one thing which you could try here. First just get the top 100 rows typeid
something like below
select top 100 typeid
,TT.lastcheckdate,st.someparam --do not use these if the typeid is unqiue in both tables..
--or just the PK columns of both tables and typeid so that these can be joined on PK
into #temptable
from st inner join tt on st.typeid = tt.typeid
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
Above will sort very minimal data and thus should be faster.Based on how many columns you have in your table and indexes this should be way faster (it will be fast if you have many columns in both tables but this query will use just 3.Also, maybe these columns (st.typeid,st.someparam and tt.typeid and tt.lastcheckdate) are covered by some of indexes so no need to read underlying tables and thus reduce the IO as well) than actual one..Then join this data back to both tables.
If that doesnt work the way you expect.Then you can have indexed view using above select by adding the order by expression as column. Then use this indexed view to get top 100 and join with main tables.This will surely reduce the amount of work and thus improve perf.But Indexed view will have overhead which will depend on how frequently data changed in the table TT.
To lessen number of rows you might retrieve top (100) for each SecondTable record ordered by LastCheckDate, and then union all them and finally select top (100), by means of temporary table or dynamic sql generated query.
This solution uses cursor to fetch top 100 records for each value in SecondTable. With index on (TypeID, LastCheckDate) on TheTable it runs instantaneously (tested on my system with a table of 700,000 records and 50 SecondTable entries).
declare #SomeParam varchar(3)
declare #TypeID int
declare #tbl table (TheTableID int, LastCheckDate datetime, SomeParam float)
declare rstX cursor local fast_forward for
select TypeID, SomeParam
from SecondTable
open rstX
while 1 = 1
begin
fetch next from rstX into #TypeID, #SomeParam
if ##fetch_status <> 0
break
insert into #tbl
select top 100 ID, LastCheckDate, #SomeParam
from TheTable
where TypeID = #TypeID
order by LastCheckDate
end
close rstX
deallocate rstX
select top 100 *
from #tbl
order by DATEDIFF(Day, LastCheckDate, GETDATE()) * SomeParam
Obviously this solution fetches ID's only. You might want to expand temporary table with additional columns.

Create SQL job that verifies daily entry of data into a table?

Writing my first SQL query to run specifically as a SQL Job and I'm a little out of my depth. I have a table within a SQL Server 2005 Database which is populated each day with data from various buildings. To monitor the system better, I am attempting to write a SQL Job that will run a query (or stored procedure) to verify the following:
- At least one row of data appears each day per building
My question has two main parts;
How can I verify that data exists for each building? While there is a "building" column, I'm not sure how to verify each one. I need the query/sp to fail unless all locations have reported it. Do I need to create a control table for the query/sp to compare against? (as the number of building reporting in can change)
How do I make this query fail so that the SQL Job fails? Do I need to wrap it in some sort of error handling code?
Table:
Employee RawDate Building
Bob 2010-07-22 06:04:00.000 2
Sally 2010-07-22 01:00:00.000 9
Jane 2010-07-22 06:04:00.000 12
Alex 2010-07-22 05:54:00.000 EA
Vince 2010-07-22 07:59:00.000 30
Note that the building column has at least one non-numeric value. The range of buildings that report in changes over time, so I would prefer to avoid hard-coding of building values (a table that I can update would be fine).
Should I use a cursor or dynamic SQL to run a looping SELECT statement that checks for each building based on a control table listing each currently active building?
Any help would be appreciated.
Edit: spelling
You could create a stored procedure that checks for missing entries. The procedure could call raiserror to make the job fail. For example:
if OBJECT_ID('CheckBuildingEntries') is null
exec ('create procedure CheckBuildingEntries as select 1')
go
alter procedure CheckBuildingEntries(
#check_date datetime)
as
declare #missing_buildings int
select #missing_buildings = COUNT(*)
from Buildings as b
left join
YourTable as yt
on yt.Building = b.name
and dateadd(dd,0, datediff(dd,0,yt.RawDate)) =
dateadd(dd,0, datediff(dd,0,#check_date))
where yt.Building is null
if #missing_buildings > 0
begin
raiserror('OMG!', 16, 0)
end
go
An example scheduled job running at 4AM to check yesterday's entries:
declare #yesterday datetime
set #yesterday = dateadd(day, -1, GETDATE())
exec CheckBuildingEntries #yesterday
If an entry was missing, the job would fail. You could set it up to send you an email.
Test tables:
create table Buildings (id int identity, name varchar(50))
create table YourTable (Employee varchar(50), RawDate datetime,
Building varchar(50))
insert into Buildings (name)
select '2'
union all select '9'
union all select '12'
union all select 'EA'
union all select '30'
insert into YourTable (Employee, RawDate, Building)
select 'Bob', '2010-07-22 06:04:00.000', '2'
union all select 'Sally', '2010-07-22 01:00:00.000', '9'
union all select 'Jane', '2010-07-22 06:04:00.000', '12'
union all select 'Alex', '2010-07-22 05:54:00.000', 'EA'
union all select 'Vince', '2010-07-22 07:59:00.000', '30'
Recommendations:
Do use a control table for the buildings - you may find that one
already exists, if you use the Object
Explorer in SQL Server Management
Studio
Don't use a cursor or dynamic SQL to run a loop - use set based
commands instead, possibly something
like the following:
SELECT BCT.Building, COUNT(YDT.Building) Build
FROM dbo.BuildingControlTable BCT
LEFT JOIN dbo.YourDataTable YDT
ON BCT.Building = YDT.Building AND
CAST(FLOOR( CAST( GETDATE() AS FLOAT ) - 1 ) AS DATETIME ) =
CAST(FLOOR( CAST( YDT.RawDate AS FLOAT ) ) AS DATETIME )
GROUP BY BCT.Building

Resources