Using a running total calculated column in SQL Server table variable - sql-server

I have inherited a stored procedure that utilizes a table variable to store data, then updates each row with a running total calculation. The order of the records in the table variable is very important, as we want the volume to be ordered highest to lowest (i.e. the running total will get increasingly larger as you go down the table).
My problem is that, during the step where the table variable is updated, the running total is calculated, but not in the order in which the table variable was populated (descending by volume).
DECLARE @TableVariable TABLE ([ID], [Volume], [SortValue], [RunningTotal])
--Populate table variable and order by the sort value...
INSERT INTO @TableVariable (ID, Volume, SortValue)
SELECT
    [ID], [Volume], ABS([Volume]) as SortValue
FROM
    dbo.VolumeTable
ORDER BY
    SortValue DESC
--Set TotalVolume variable...
SELECT @TotalVolume = ABS(sum([Volume]))
FROM @TableVariable
--Calculate running total, update rows in table variable...I believe this is where problem occurs?
SET @RunningTotal = 0
UPDATE @TableVariable
SET @RunningTotal = RunningTotal = @RunningTotal + [Volume]
FROM @TableVariable
--Output...
SELECT
    ID, Volume, SortValue, RunningTotal
FROM
    @TableVariable
ORDER BY
    SortValue DESC
The result is that the record with the highest volume, which I expected the running total to process first (so that running total = [Volume]), ends up much further down the list. The running total seems to be calculated in a random order.
Here is what I would expect to get:
But here is what the code actually generates:
I'm not sure whether there is a way to make the UPDATE statement process the table variable in descending order of volume. From what I've read so far, it could be an issue with the sorting behavior of a table variable, but I'm not sure how to correct it. Can anyone help?

GarethD provided the definitive link to the multiple ways of calculating running totals and their performance. The correct one is both the simplest and the fastest, 300 times faster than the quirky update. That's because it can take advantage of any indexes that cover the sort column, and because it's a lot simpler.
I repeat it here to make clear how much simpler this is when the database provides the appropriate windowing functions:
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
The SUM line means: sum the ticket counts of the current row and all (UNBOUNDED) the rows that come before it (PRECEDING) when ordered by date.
That ends up being 300 times faster than the quirky update.
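One detail worth knowing about the frame clause: RANGE treats rows with equal ORDER BY values as peers, while ROWS accumulates one row at a time. The sketch below uses a hypothetical table variable @Demo (not part of the question) with two tied values to show the difference. It assumes SQL Server 2012 or later.

```sql
-- Hypothetical demo data: two rows share the same sort value (5).
DECLARE @Demo TABLE (ID int, Volume int);
INSERT INTO @Demo (ID, Volume) VALUES (1, 9), (2, 5), (3, 5), (4, 2);

SELECT ID,
       Volume,
       -- RANGE: rows with equal Volume are peers, so both Volume = 5 rows
       -- receive the same total (9 + 5 + 5 = 19).
       SUM(Volume) OVER (ORDER BY Volume DESC
                         RANGE UNBOUNDED PRECEDING) AS RangeTotal,
       -- ROWS: accumulates row by row; ID is added as a tie-breaker so the
       -- order, and therefore the totals (9, 14, 19, 21), are deterministic.
       SUM(Volume) OVER (ORDER BY Volume DESC, ID
                         ROWS UNBOUNDED PRECEDING) AS RowsTotal
FROM @Demo
ORDER BY Volume DESC, ID;
```

If the sort column can contain ties and a strictly increasing running total is wanted, ROWS with a unique tie-breaker is probably the safer choice.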
The equivalent query for VolumeTable would be:
SELECT
ID,
Volume,
ABS(Volume) as SortValue,
SUM(Volume) OVER (ORDER BY ABS(Volume) DESC RANGE UNBOUNDED PRECEDING)
FROM
VolumeTable
ORDER BY ABS(Volume) DESC
Note that this will be a lot faster if there is an index on the sort column (Volume), and ABS isn't used. Applying a function to a column means that the optimizer can't use the ordering of any index that covers it, because the actual sort value is different from the one stored in the index.
If the table is very large and performance suffers, you could create a computed column and create an index on it.
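A minimal sketch of that computed-column idea, assuming dbo.VolumeTable has the ID and Volume columns shown in the question. The column name AbsVolume and the index name are hypothetical.

```sql
-- Hypothetical names: AbsVolume, IX_VolumeTable_AbsVolume.
-- Persist ABS(Volume) so that it can be indexed.
ALTER TABLE dbo.VolumeTable
    ADD AbsVolume AS ABS(Volume) PERSISTED;
GO  -- new batch, so the statements below can see the new column

-- Index on the computed column; INCLUDE lets the query below be answered
-- from the index alone.
CREATE INDEX IX_VolumeTable_AbsVolume
    ON dbo.VolumeTable (AbsVolume DESC)
    INCLUDE (ID, Volume);
GO

-- The running total now sorts on the indexed column instead of ABS(Volume).
SELECT ID,
       Volume,
       AbsVolume AS SortValue,
       SUM(Volume) OVER (ORDER BY AbsVolume DESC
                         RANGE UNBOUNDED PRECEDING) AS RunningTotal
FROM dbo.VolumeTable
ORDER BY AbsVolume DESC;
```

Whether the optimizer actually picks the index depends on the table and its statistics, so it is worth checking the execution plan.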

Take a peek at the Window functions offered in SQL
For example
Declare @YourTable table (ID int,Volume int)
Insert Into @YourTable values
(100,1306489),
(125,898426),
(150,907404)
Select ID
    ,Volume
    ,RunningTotal = sum(Volume) over (Order by Volume Desc)
From @YourTable
Order By Volume Desc
Returns
ID Volume RunningTotal
100 1306489 1306489
150 907404 2213893
125 898426 3112319
To be clear, @YourTable is for demonstrative purposes only. There should be no need to INSERT your actual data into a table variable.
EDIT to Support 2008 (Good news is Row_Number() is supported in 2008)
Select ID
    ,Volume
    ,RowNr=Row_Number() over (Order by Volume Desc)
Into #Temp
From @YourTable
Select A.ID
    ,A.Volume
    ,RunningTotal = sum(B.Volume)
From #Temp A
Join #Temp B on (B.RowNr<=A.RowNr)
Group By A.ID,A.Volume
Order By A.Volume Desc

Related

Select latest row on duplicate values while transferring table?

I have a logging table that is live which saves my value to a table frequently.
My plan is to take those values and put them on a temporary table with
SELECT * INTO #temp from Block
From there I guess my block table is empty and the logger can keep on logging new values.
The next step is that I want to save them in an existing table. I wanted to use
INSERT INTO TABLENAME(COLUMN1,COLUMN2...) SELECT (COLUMN1,COLUMN2...) FROM #temp
The problem is that the #temp table has duplicate primary keys, and I only want to store the last ID.
I have tried DISTINCT but it didn't work, and I could not get ROW_Count to work. Are there any ideas on how I should do it? I wish to do it with as few reads as possible.
Also, in the future I plan to send them to another database. How do I do that on SQL Server? I guess it's something like FROM Table [in database]?
I couldn't get the blocks to copy. But here goes:
create TABLE Product_log (
Grade char(64),
block_ID char(64) PRIMARY KEY NOT NULL,
Density char(64),
BatchNumber char(64) NOT NULL,
BlockDateID Datetime
);
That is the table I want to store the data in. I do not want duplicate IDs there. The problem is that I get duplicates while logging, since I log on change. Let's say that the batch ID is 1 and it becomes 2 while logging: I will get the block ID twice, once with batch number 1 and once with 2. How do I pick the latter?
Hope I explained enough for guidance. While logging they look like this:
id SiemensTiaV15_s71200_BatchTester_NewBatchIDValue_VALUE SiemensTiaV15_s71200_BatchTester_TestWriteValue_VALUE SiemensTiaV15_s71200_BatchTester_TestWriteValue_TIMESTAMP SiemensTiaV15_s71200_MainTank_Density_VALUE SiemensTiaV15_s71200_MainTank_Grade_VALUE
1 00545 S0047782 2020-06-09 11:18:44.583 0 xxxxx
2 00545 S0047783 2020-06-09 11:18:45.800 0 xxxxx
Please use the query below:
select * from
(select id, SiemensTiaV15_s71200_BatchTester_NewBatchIDValue_VALUE,SiemensTiaV15_s71200_BatchTester_TestWriteValue_VALUE, SiemensTiaV15_s71200_BatchTester_TestWriteValue_TIMESTAMP, SiemensTiaV15_s71200_MainTank_Density_VALUE,SiemensTiaV15_s71200_MainTank_Grade_VALUE,
row_number() over (partition by SiemensTiaV15_s71200_BatchTester_NewBatchIDValue_VALUE order by SiemensTiaV15_s71200_BatchTester_TestWriteValue_TIMESTAMP desc) as rnk
from table_name) qry
where rnk=1;
SELECT * INTO #temp FROM Block;
INSERT INTO Product_log(Grade, block_ID, Density, BatchNumber, BlockDateID)
SELECT NewBatchIDValue_VALUE, TestWriteValue_VALUE, TestWriteValue_TIMESTAMP,
    Density_VALUE, Grade_VALUE
FROM
    (SELECT NewBatchIDValue_VALUE, TestWriteValue_VALUE,
        TestWriteValue_TIMESTAMP, Density_VALUE, Grade_VALUE,
        row_number() over (partition by BatchTester_NewBatchIDValue_VALUE
            order by BatchTester_TestWriteValue_VALUE) as rnk
    FROM #temp) qry
WHERE rnk = 1;

Reverse of each value in a column

Suppose I have a table with even number of rows. For eg- a table Employee with two columns Name and EmpCode. The table looks like
Name EmpCode
Ajay 7
Vikash 5
Shalu 4
Hari 8
Anu 1
Puja 9
Now, I want my output in reverse of EmpCode like:
Name EmpCode
Ajay 9
Vikash 1
Shalu 8
Hari 4
Anu 5
Puja 7
I need to run this query in SQL Server.
As the OP hasn't replied, I'll post a little explanation for them instead. As everyone has alluded to, tables in SQL Server have no built-in ordering. Your data is stored in what is known as a HEAP. This means that when you run a query without an ORDER BY, your data can be returned in any order that the server feels like. With small datasets this might be the order you inserted it in, but that's just it: it might.
When you get to larger datasets, and when multiple cores are working on the operation, the order of a SELECT * FROM [Table]; is less likely to be the order of insertion and more likely to vary with each run of the query. I have several tables where a SELECT TOP 1 *... will return a different row every time I run the query, even with the CLUSTERED INDEX.
The only, yes only, way to guarantee the order is by using ORDER BY. Now, you might have another column which you haven't shared that you can order by, but if not, perhaps this (very) simple example will at least assist you, if nothing else:
CREATE TABLE #Employee ([Name] varchar(10), EmpCode tinyint);
INSERT INTO #Employee
VALUES ('Ajay',7),
('Vikash',5),
('Shalu',4),
('Hari',8),
('Anu',1),
('Puja',9);
GO
--Just SELECT *. ORDER is NOT guaranteed, but, due to the low volume of data, will probably be in the order by insertion
SELECT *
FROM #Employee;
--But, we want to reverse the order, so, let's add an ORDER BY
SELECT *
FROM #Employee
ORDER BY [Name];
--Oh! That didn't work (duh). Let's try again
SELECT *
FROM #Employee
ORDER BY Empcode;
--Nope, this isn't working. That's because your data has nothing related to its insertion order. So, let's give it one:
GO
DROP TABLE #Employee;
CREATE TABLE #Employee (ID int IDENTITY(1,1), --Oooo, what is this?
[Name] varchar(10),
EmpCode tinyint);
INSERT INTO #Employee
VALUES ('Ajay',7),
('Vikash',5),
('Shalu',4),
('Hari',8),
('Anu',1),
('Puja',9);
GO
--Now look
SELECT *
FROM #Employee;
--So, we can use an ORDER BY, and get the correct order too
SELECT [Name],
Empcode
FROM #Employee
ORDER BY ID;
--So, we got the right ORDER using an ORDER BY. Now we can do something about the ordering:
--We'll need a CTE for this:
WITH RNs AS(
SELECT *,
ROW_NUMBER() OVER (ORDER BY ID ASC) AS RN1,
ROW_NUMBER() OVER (ORDER BY ID DESC) AS RN2
FROM #Employee)
SELECT R1.[Name],
R2.EmpCode
FROM RNs R1
JOIN RNs R2 ON R1.RN1 = R2.RN2;
GO
DROP TABLE #Employee;

Performing INSERT for each row in a select RESULT

First, a general description of the problem: I'm running a periodical process which updates total figures in a table. The issue is, that multiple updates may be required in each execution of the process, and each execution depends on the previous results.
My question is, can it be done in a single SQL Server SP?
My code (I altered it a little to simplify the sample):
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
CTS.previousTotalSessions
FROM (SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > @time
GROUP BY time, x) MS
CROSS APPLY TVF_GetPreviousCustomerTotalSessions(MS.customer) CTS
ORDER BY time
The previousTotalSessions column depends on other rows in UpdatedTable, and its value is retrieved by CROSS APPLYing TVF_GetPreviousCustomerTotalSessions, but if I execute the SP as-is, all the rows use the value retrieved by the function without taking the rows added during the execution of the SP.
For the sake of completeness, here's TVF_GetPreviousCustomerTotalSessions:
FUNCTION [dbo].[TVF_GetCustomerCurrentSessions]
(
    @customerId int
)
RETURNS @result TABLE (PreviousNumberOfSessions int)
AS
BEGIN
    INSERT INTO @result
    SELECT TOP 1 (PreviousNumberOfSessions + Opened - Closed) AS PreviousNumberOfSessions
    FROM CustomerMinuteSessions
    WHERE CustomerId = @customerId
    ORDER BY time DESC
    IF @@rowcount = 0
        INSERT INTO @result(PreviousNumberOfSessions) VALUES(0)
    RETURN
END
What is the best way (i.e. without a for loop, I guess...) to take previous rows into account within the query for subsequent rows?
If you are using SQL Server 2005 or later, you can do it with a few CTEs in one shot. If you use SQL Server 2000, you can use an inline table-valued function.
Personally I like the CTE approach more, so I'm including a schematic translation of your code to CTE syntax. (Bear in mind that I didn't prepare a test set to check it.)
WITH LastSessionByCustomer AS
(
    SELECT CustomerID, MAX(Time) AS LastTime  -- the aggregate needs a column name inside a CTE
    FROM CustomerMinuteSessions
    GROUP BY CustomerID
)
, GetPreviousCustomerTotalSessions AS
(
    SELECT LastSession.CustomerID, LastSession.PreviousNumberOfSessions + LastSession.Opened - LastSession.Closed AS PreviousNumberOfSessions
    FROM CustomerMinuteSessions LastSession
    INNER JOIN LastSessionByCustomer ON LastSessionByCustomer.CustomerID = LastSession.CustomerID
        AND LastSessionByCustomer.LastTime = LastSession.Time  -- keep only the latest row per customer
)
, MS AS
(
    SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
    FROM MinuteSessions
    WHERE time > @time
    GROUP BY time, x
)
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
    MS.customer,
    MS.totalSessions,
    MS.totalBytes,
    ISNULL(GetPreviousCustomerTotalSessions.PreviousNumberOfSessions, 0)
FROM MS
-- LEFT JOIN keeps every MS row; customers with no earlier session get 0 via ISNULL
LEFT JOIN GetPreviousCustomerTotalSessions ON MS.Customer = GetPreviousCustomerTotalSessions.CustomerID
Going a bit beyond your question, I think that your query with CROSS APPLY could do serious damage to the database once the CustomerMinuteSessions table grows.
I would add an index like the following to improve your chances of getting an Index Seek:
CREATE INDEX IX_CustomerMinuteSessions_CustomerId
ON CustomerMinuteSessions (CustomerId, [time] DESC, PreviousNumberOfSessions, Opened, Closed );

SQL running sum for an MVC application

I need a faster method to calculate and display a running sum.
It's an MVC telerik grid that queries a view that generates a running sum using a sub-query. The query takes 73 seconds to complete, which is unacceptable. (Every time the user hits "Refresh Forecast Sheet", it takes 73 seconds to re-populate the grid.)
The query looks like this:
SELECT outside.EffectiveDate
[omitted for clarity]
,(
SELECT SUM(inside.Amount)
FROM vCI_UNIONALL inside
WHERE inside.EffectiveDate <= outside.EffectiveDate
) AS RunningBalance
[omitted for clarity]
FROM vCI_UNIONALL outside
"EffectiveDate" on certain items can change all the time... New items can get added, etc. I certainly need something that can calculate the running sum on the fly (when the Refresh button is hit). Stored proc or another View...? Please advise.
Solution: (one of many, this one is orders of magnitude faster than a sub-query)
Create a new table with all the columns in the view except for the RunningTotal col. Create a stored procedure that first truncates the table, then INSERT INTO the table using SELECT all columns, without the running sum column.
Use update local variable method:
DECLARE @Amount DECIMAL(18,4)
SET @Amount = 0
UPDATE TABLE_YOU_JUST_CREATED SET RunningTotal = @Amount, @Amount = @Amount + ISNULL(Amount,0)
Create a task agent that will run the stored procedure once a day. Use the TABLE_YOU_JUST_CREATED for all your reports.
Take a look at this post
Calculate a Running Total in SQL Server
If you have SQL Server Denali (SQL Server 2012), you can use the new windowed functions.
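A minimal sketch of the windowed version, assuming the view exposes the EffectiveDate and Amount columns used in the question. RANGE is chosen here because it gives rows that share an EffectiveDate the same balance, which matches the <= comparison in the original sub-query.

```sql
-- Requires SQL Server 2012 ("Denali") or later.
-- Assumes vCI_UNIONALL exposes EffectiveDate and Amount, as in the question.
SELECT v.EffectiveDate,
       v.Amount,
       SUM(v.Amount) OVER (ORDER BY v.EffectiveDate
                           RANGE UNBOUNDED PRECEDING) AS RunningBalance
FROM vCI_UNIONALL AS v
ORDER BY v.EffectiveDate;
```

Because this is a plain SELECT, it could also live inside the view itself, so the grid would get a fresh running balance on every refresh.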
On SQL Server 2008 R2, I suggest using a recursive common table expression.
A small problem with the CTE is that, for the query to be fast, you need an identity column without gaps (1, 2, 3, ...). If you don't have such a column, you have to create a temporary table or table variable with one and move your data there.
The CTE approach will be something like this:
declare @Temp_Numbers table (RowNum int, Amount <your type>, EffectiveDate datetime)
insert into @Temp_Numbers (RowNum, Amount, EffectiveDate)
select row_number() over (order by EffectiveDate), Amount, EffectiveDate
from vCI_UNIONALL
-- you can also use identity
-- declare @Temp_Numbers table (RowNum int identity(1, 1), Amount <your type>, EffectiveDate datetime)
-- insert into @Temp_Numbers (Amount, EffectiveDate)
-- select Amount, EffectiveDate
-- from vCI_UNIONALL
-- order by EffectiveDate
;with
CTE_RunningTotal
as
(
    -- anchor: the first row starts the total
    select T.RowNum, T.EffectiveDate, T.Amount as Total_Amount
    from @Temp_Numbers as T
    where T.RowNum = 1
    union all
    -- recursive step: add the next row's amount to the previous total
    select T.RowNum, T.EffectiveDate, T.Amount + C.Total_Amount as Total_Amount
    from CTE_RunningTotal as C
    inner join @Temp_Numbers as T on T.RowNum = C.RowNum + 1
)
select C.RowNum, C.EffectiveDate, C.Total_Amount
from CTE_RunningTotal as C
option (maxrecursion 0)
There may be some questions about duplicate EffectiveDate values. It depends on how you want to handle them: do you want them to be ordered arbitrarily, or do you want them to have equal Amount?

SQL Server: Order By DateDiff Performance issue

I'm having a problem getting top 100 rows from a table with 2M rows in reasonable time.
The problem is the order by part, it takes more than 50 minutes to get results for this query..
What can be the best solution for this problem?
select top 100 * from THETABLE TT
Inner join SecondTable ST on TT.TypeID = ST.TypeID
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
Many thanks,
Bentzy
Edit:
* TheTable is the one with 2M rows.
* SomeParam has 15 distinct values (more or less)
There are two things that come to mind to speed up this fetch:
If you need to run this query often, you should index the column 'lastCheckDate'. No matter which SQL database you are using, a well-defined index on the column will allow for faster selects, especially in an ORDER BY clause.
Perform the date math before doing the select query. You are getting the difference in days between the row's checkDate and the current date, times some parameter. Does the multiplication affect the ordering of the rows? Can this simply be ordered by the 'lastCheckDate desc'? Explore other sorting options that return the same result.
Two ideas come to mind:
a) If ST.param doesn't change often, perhaps you can cache the result of the multiplication somewhere. The numbers would be "off" after a day, but the relative values would be the same - i.e., the sort order wouldn't change.
b) Find a way to reduce the size of the input tables. There are probably some values of LastCheckDate &/or SomeParam that will never be in the top 100. For example,
Select *
into #tmp
from THETABLE
where LastCheckDate between '2012-06-01' and getdate()
select top 100 *
from #tmp join SecondTable ST on #tmp.TypeID = ST.TypeID
order by DateDiff(day, LastCheckDate, getdate()) * ST.SomeParam desc
It's a lot faster to search a small table than a big one.
DATEDIFF(Day, TT.LastCheckDate, GETDATE()) is the number of days since "last check".
If you just order by TT.LastCheckDate you get a similar order.
EDIT
Maybe you can work out which dates you don't expect to get back and filter them out. Of course, you then also need an index on that LastCheckDate column. If everything works out, you can at least shorten the list of records to check from 2M to some manageable amount.
This is quite complicated. Do you seriously need all the columns in the query? There is one thing you could try here: first get just the TypeID of the top 100 rows,
something like the query below:
select top 100 typeid
,TT.lastcheckdate,st.someparam --do not use these if the typeid is unique in both tables..
--or just the PK columns of both tables and typeid so that these can be joined on PK
into #temptable
from st inner join tt on st.typeid = tt.typeid
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
The query above sorts very little data and thus should be faster. How much faster depends on how many columns and indexes your tables have: the gain is largest if both tables have many columns, because this query uses just three. These columns (st.typeid, st.someparam, tt.typeid and tt.lastcheckdate) may also be covered by existing indexes, in which case the underlying tables need not be read at all, which reduces the IO as well. Then join this data back to both tables.
If that doesn't work the way you expect, you can create an indexed view based on the select above, adding the ORDER BY expression as a column (note that an indexed view cannot reference GETDATE(), so the date part would have to be computed outside the view). Then use this indexed view to get the top 100 and join it with the main tables. This should reduce the amount of work and thus improve performance, but an indexed view has an overhead that depends on how frequently the data in table TT changes.
To lessen number of rows you might retrieve top (100) for each SecondTable record ordered by LastCheckDate, and then union all them and finally select top (100), by means of temporary table or dynamic sql generated query.
This solution uses cursor to fetch top 100 records for each value in SecondTable. With index on (TypeID, LastCheckDate) on TheTable it runs instantaneously (tested on my system with a table of 700,000 records and 50 SecondTable entries).
declare @SomeParam varchar(3)
declare @TypeID int
declare @tbl table (TheTableID int, LastCheckDate datetime, SomeParam float)
declare rstX cursor local fast_forward for
    select TypeID, SomeParam
    from SecondTable
open rstX
while 1 = 1
begin
    fetch next from rstX into @TypeID, @SomeParam
    if @@fetch_status <> 0
        break
    -- oldest 100 rows for this type, i.e. the largest DATEDIFF values
    insert into @tbl
    select top 100 ID, LastCheckDate, @SomeParam
    from TheTable
    where TypeID = @TypeID
    order by LastCheckDate
end
close rstX
deallocate rstX
-- desc matches the ordering of the original query
select top 100 *
from @tbl
order by DATEDIFF(Day, LastCheckDate, GETDATE()) * SomeParam desc
Obviously this solution fetches IDs only. You might want to expand the table variable with additional columns.
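The same per-type top 100 idea can probably be written without a cursor by using CROSS APPLY. This is only a sketch: it assumes TheTable has the ID column used above, that SomeParam is positive (so the oldest LastCheckDate values give the largest scores), and that an index on (TypeID, LastCheckDate) exists.

```sql
-- Assumptions: TheTable(ID, TypeID, LastCheckDate), SecondTable(TypeID, SomeParam),
-- SomeParam > 0, index on TheTable (TypeID, LastCheckDate).
SELECT TOP (100)
       x.ID,
       x.LastCheckDate,
       ST.SomeParam
FROM SecondTable AS ST
CROSS APPLY (
    -- For each type, only the 100 oldest rows can reach the overall top 100.
    SELECT TOP (100) TT.ID, TT.LastCheckDate
    FROM TheTable AS TT
    WHERE TT.TypeID = ST.TypeID
    ORDER BY TT.LastCheckDate
) AS x
ORDER BY DATEDIFF(DAY, x.LastCheckDate, GETDATE()) * ST.SomeParam DESC;
```

With roughly 15 distinct types, the final sort then works on at most about 1,500 candidate rows instead of 2M.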
