First, a general description of the problem: I'm running a periodical process which updates total figures in a table. The issue is, that multiple updates may be required in each execution of the process, and each execution depends on the previous results.
My question is, can it be done in a single SQL Server SP?
My code (I altered it a little to simply the sample):
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
CTS.previousTotalSessions
FROM (SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > #time
GROUP BY time, x) MS
CROSS APPLY TVF_GetPreviousCustomerTotalSessions(MS.customer) CTS
ORDER BY time
The previousTotalSessions column depends on other rows in UpdatedTable, and its value is retrieved by CROSS APPLYing TVF_GetPreviousCustomerTotalSessions, but if I execute the SP as-is, all the rows use the value retrieved by the function without taking the rows added during the execution of the SP.
For the sake of completeness, here's TVF_GetPreviousCustomerTotalSessions:
FUNCTION [dbo].[TVF_GetCustomerCurrentSessions]
(
#customerId int
)
RETURNS #result TABLE (PreviousNumberOfSessions int)
AS
BEGIN
INSERT INTO #result
SELECT TOP 1 (PreviousNumberOfSessions + Opened - Closed) AS PreviousNumberOfSessions
FROM CustomerMinuteSessions
WHERE CustomerId = #customerId
ORDER BY time DESC
IF ##rowcount = 0
INSERT INTO #result(PreviousNumberOfSessions) VALUES(0)
RETURN
END
What is the best (i.e. without for loop, I guess...) to take previous rows within the query for subsequent rows?
If you are using SQL-2005 and later, you can do it with few CTEs in one shot. If you use SQL-2000 you'll can use inline table-valued function.
Personally I like the CTE approach more, so I'm including a schematic translation of your code to CTEs syntax. (Bare in mind hat I didn't prepare a test set to check it).
WITH LastSessionByCustomer AS
(
SELECT CustomerID, MAX(Time)
FROM CustomerMinuteSessions
GROUP BY CustomerID
)
, GetPreviousCustomerTotalSessions AS
(
SELECT LastSession.CustomerID, LastSession.PreviousNumberOfSessions + LastSession.Opened - LastSession.Closed AS PreviousNumberOfSessions
FROM CustomerMinuteSessions LastSession
INNER JOIN LastSessionByCustomer ON LastSessionByCustomer.CustomerID = LastSession.CustomerID
)
, MS AS
(
SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > #time
GROUP BY time, x
)
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
ISNULL(GetPreviousCustomerTotalSessions.previousTotalSessions, 0)
FROM MS
RIGHT JOIN GetPreviousCustomerTotalSessions ON MS.Customer = GetPreviousCustomerTotalSessions.CustomerID
Going a bit beyond your question, I think that your query with cross apply could make big damage to the database once table CustomerMinuteSessions database grows
I would add an index like to improve your chances of getting Index-Seek:
CREATE INDEX IX_CustomerMinuteSessions_CustomerId
ON CustomerMinuteSessions (CustomerId, [time] DESC, PreviousNumberOfSessions, Opened, Closed );
Related
I have inherited a stored procedure that utilizes a table variable to store data, then updates each row with a running total calculation. The order of the records in the table variable is very important, as we want the volume to be ordered highest to lowest (i.e. the running total will get increasingly larger as you go down the table).
My problem is, during the step where the table variable is updated, the running total seems to be calculating , but not in a way that the data in the table variable was previously sorted by (descending by highest volume)
DECLARE #TableVariable TABLE ([ID], [Volume], [SortValue], [RunningTotal])
--Populate table variable and order by the sort value...
INSERT INTO #TableVariable (ID, Volume, SortValue)
SELECT
[ID], [Volume], ABS([Volume]) as SortValue
FROM
dbo.VolumeTable
ORDER BY
SortValue DESC
--Set TotalVolume variable...
SELECT#TotalVolume = ABS(sum([Volume]))
FROM #TableVariable
--Calculate running total, update rows in table variable...I believe this is where problem occurs?
SET #RunningTotal = 0
UPDATE #TableVariable
SET #RunningTotal = RunningTotal = #RunningTotal + [Volume]
FROM #TableVariable
--Output...
SELECT
ID, Volume, SortValue, RunningTotal
FROM
#TableVariable
ORDER BY
SortValue DESC
The result is, the record that had the highest volume, that I would have expected the running total to calculate on first (thus running total = [volume]), somehow ends up much further down in the list. The running total seems to calculate randomly
Here is what I would expect to get:
But here is what the code actually generates:
Not sure if there is a way to get the UPDATE statement to be enacted on the table variable in such a way that it is ordered by volume desc? From what Ive read so far, it could be an issue with the sorting behavior of a table variable but not sure how to correct? Can anyone help?
GarethD provided the definitive link to the multiple ways of calculating running totals and their performance. The correct one is both the simplest and fastest, 300 times faster that then quirky update. That's because it can take advantage of any indexes that cover the sort column, and because it's a lot simpler.
I repeat it here to make clear how much simpler this is when the database provided the appropriate windowing functions
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
The SUM line means: Sum all ticket counts over all (UNBOUNDED) the rows that came before (PRECEDING) the current one if they were ordered by date
That ends up being 300 times faster than the quirky update.
The equivalent query for VolumeTable would be:
SELECT
ID,
Volume,
ABS(Volume) as SortValue,
SUM(Volume) OVER (ORDER BY ABS(Volume) DESC RANGE UNBOUNDED PRECEDING)
FROM
VolumeTable
ORDER BY ABS(Volume) DESC
Note that this will be a lot faster if there is an index on the sort column (Volume), and ABS isn't used. Applying any function on a column means that the optimizer can't use any indexes that cover it, because the actual sort value is different than the one stored in the index.
If the table is very large and performance suffers, you could create a computed column and create an index on it
Take a peek at the Window functions offered in SQL
For example
Declare #YourTable table (ID int,Volume int)
Insert Into #YourTable values
(100,1306489),
(125,898426),
(150,907404)
Select ID
,Volume
,RunningTotal = sum(Volume) over (Order by Volume Desc)
From #YourTable
Order By Volume Desc
Returns
ID Volume RunningTotal
100 1306489 1306489
150 907404 2213893
125 898426 3112319
To be clear, The #YourTable is for demonstrative purposes only. There should be no need to INSERT your actual data into a table variable.
EDIT to Support 2008 (Good news is Row_Number() is supported in 2008)
Select ID
,Volume
,RowNr=Row_Number() over (Order by Volume Desc)
Into #Temp
From #YourTable
Select A.ID
,A.Volume
,RunningTotal = sum(B.Volume)
From #Temp A
Join #Temp B on (B.RowNr<=A.RowNr)
Group By A.ID,A.Volume
Order By A.Volume Desc
I have written a little CTE to get the total blocking time of a head blocker process, and I am unsure if I should first copy all of the processes that I want the CTE to run over into a temp table and then perform the query over this - i.e. I want to be sure that the data cannot change under my feet whilst the query runs and (worst case scenario), I end up with an infinite recursive loop!
This is my SQL including the temp table - I'd prefer not to have to use the table for performance reasons, and go directly to the sysprocesses dmv inside my CTE, but I'm not sure of the possible implications of this.
DECLARE #proc TABLE(
spid SMALLINT PRIMARY KEY,
blocked SMALLINT INDEX blocked_index,
waittime BIGINT)
INSERT INTO #proc
SELECT spid, blocked, waittime
FROM master..sysprocesses
;WITH block_cte AS
(
SELECT spid, CAST(blocked AS BIGINT) [wait_time], spid [root_spid]
FROM #proc
WHERE blocked = 0
UNION ALL
SELECT blocked.spid, blocked.waittime, block_cte.spid
FROM #proc AS blocked
INNER JOIN block_cte ON blocked.blocked = block_cte.spid
)
SELECT root_spid blocking_spid, SUM(wait_time) total_blocking_time
FROM block_cte
GROUP BY root_spid
This question is probably best transfered to Stack DBA. I'm sure those clever guys and girls can not only tell you the answer but also the reason behind it.
Not being sure myself I decided to test it...
My script captures the record count fromsysProcesses 1,000 times. Now to do this I had to circumnavigate several limits placed on CTEs. Among other restrictions; you cannot use aggregate functions. This makes counting records quite hard. So I created an inline table function to return the current row count from sysProcesses.
sysProcess Count Function
CREATE FUNCTION ProcessCount()
RETURNS TABLE
AS
RETURN
(
-- Return the current process count.
SELECT
COUNT(*) AS RecordCount
FROM
Master..sysProcesses
)
;
I wrapped this function in a CTE.
CTE
WITH RCTE AS
(
/* CTE to test if recursion is effected by updates to
* underlying data.
*/
-- Anchor part.
SELECT
1 AS ExecutionCount,
1 AS JoinField,
RecordCount
FROM
ProcessCount()
UNION ALL
-- Recursive part.
SELECT
r.ExecutionCount + 1 AS ExecutionCount,
1 AS JoinField,
pc.RecordCount
FROM
ProcessCount() AS pc
INNER JOIN RCTE AS r ON r.JoinField = 1
WHERE
r.ExecutionCount < 1000
)
SELECT
MIN(RecordCount) AS MinRecordCount,
MAX(RecordCount) AS MaxRecordCount
FROM
RCTE
OPTION
(MAXRECURSION 1000)
;
GO
If the min and max record counts are always equal this would suggest there is only one consistent view of sysProcesses, used throughout the query. Any difference proves this is not the case. Running on SQL Server 2008 R2 I did find differences:
Results
Run Min Max
1 113 254
2 107 108
3 86 108
Of course the inline function could be to blame here. It certainly changed my execution plan. This has taught me a lesson. I really need to better understand execution plans. I'm sure reading the OPs plan would provide a definitive answer.
I need a faster method to calculate and display a running sum.
It's an MVC telerik grid that queries a view that generates a running sum using a sub-query. The query takes 73 seconds to complete, which is unacceptable. (Every time the user hits "Refresh Forecast Sheet", it takes 73 seconds to re-populate the grid.)
The query looks like this:
SELECT outside.EffectiveDate
[omitted for clarity]
,(
SELECT SUM(b.Amount)
FROM vCI_UNIONALL inside
WHERE inside.EffectiveDate <= outside.EffectiveDate
) AS RunningBalance
[omitted for clarity]
FROM vCI_UNIONALL outside
"EffectiveDate" on certain items can change all the time... New items can get added, etc. I certainly need something that can calculate the running sum on the fly (when the Refresh button is hit). Stored proc or another View...? Please advise.
Solution: (one of many, this one is orders of magnitude faster than a sub-query)
Create a new table with all the columns in the view except for the RunningTotal col. Create a stored procedure that first truncates the table, then INSERT INTO the table using SELECT all columns, without the running sum column.
Use update local variable method:
DECLARE #Amount DECIMAL(18,4)
SET #Amount = 0
UPDATE TABLE_YOU_JUST_CREATED SET RunningTotal = #Amount, #Amount = #Amount + ISNULL(Amount,0)
Create a task agent that will run the stored procedure once a day. Use the TABLE_YOU_JUST_CREATED for all your reports.
Take a look at this post
Calculate a Running Total in SQL Server
If you have SQL Server Denali, you can use new windowed function.
In SQL Server 2008 R2 I suggest you to use recursive common table expression.
Small problem in CTE is that for fast query you have to have identity column without gaps (1, 2, 3,...) and if you don't have such a column you have to create a temporary or variable table with such a column and to move you your data there.
CTE approach will be something like this
declare #Temp_Numbers (RowNum int, Amount <your type>, EffectiveDate datetime)
insert into #Temp_Numbers (RowNum, Amount, EffectiveDate)
select row_number() over (order by EffectiveDate), Amount, EffectiveDate
from vCI_UNIONALL
-- you can also use identity
-- declare #Temp_Numbers (RowNum int identity(1, 1), Amount <your type>, EffectiveDate datetime)
-- insert into #Temp_Numbers (Amount, EffectiveDate)
-- select Amount, EffectiveDate
-- from vCI_UNIONALL
-- order by EffectiveDate
;with
CTE_RunningTotal
as
(
select T.RowNum, T.EffectiveDate, T.Amount as Total_Amount
from #Temp_Numbers as T
where T.RowNum = 1
union all
select T.RowNum, T.EffectiveDate, T.Amount + C.Total_Amount as Total_Amount
from CTE_RunningTotal as C
inner join #Temp_Numbers as T on T.RowNum = C.RowNum + 1
)
select C.RowNum, C.EffectiveDate, C.Total_Amount
from CTE_RunningTotal as C
option (maxrecursion 0)
There're may be some questions with duplicates EffectiveDate values, it depends on how you want to work with them - do you want to them to be ordered arbitrarily or do you want them to have equal Amount?
I'm having a problem getting top 100 rows from a table with 2M rows in reasonable time.
The problem is the order by part, it takes more than 50 minutes to get results for this query..
What can be the best solution for this problem?
select top 100 * from THETABLE TT
Inner join SecondTable ST on TT.TypeID = ST.TypeID
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
Many thanks,
Bentzy
Edit:
* TheTable is the one with 2M rows.
* SomeParam has 15 distinct values (more or less)
There are two things that come to mind to speed up this fetch:
If you need to run this query often, you should index the column 'lastCheckDate'. No matter which sql db you are using, a well defined index on the column will allow for faster selects, especially in an orders by clause.
Perform the date math before doing the select query. You are getting the difference in days between the row's checkDate and the current date, times some parameter. Does the multiplication affect the ordering of the rows? Can this simply be ordered by the 'lastCheckDate desc'? Explore other sorting options that return the same result.
Two ideas come to mind:
a) If ST.param doesn't change often, perhaps you can cache the result of the multiplication somewhere. The numbers would be "off" after a day, but the relative values would be the same - i.e., the sort order wouldn't change.
b) Find a way to reduce the size of the input tables. There are probably some values of LastCheckDate &/or SomeParam that will never be in the top 100. For example,
Select *
into #tmp
from THETABLE
where LastCheckDate between '2012-06-01' and getdate()
select top 100 *
from #tmp join SecondTable ST on #tmp.TypeID = ST.TypeID
order by DateDiff(day, LastCheckDate, getdate()) * ST.SomeParam desc
It's a lot faster to search a small table than a big one.
DATEDIFF(Day, TT.LastCheckDate, GETDATE()) is the number of days since "last check".
If you just order by TT.LastCheckDate you get a similar order.
EDIT
Maybe you can work out what dates you don't expect to get back and filter on them. Of course you then also need an index on that LastDateCheck column. If everything works out, you can at least shorten the list of records to check from 2M to some managable amount.
It is quite complicated.Do you seriouslly need all columns in query?There is one thing which you could try here. First just get the top 100 rows typeid
something like below
select top 100 typeid
,TT.lastcheckdate,st.someparam --do not use these if the typeid is unqiue in both tables..
--or just the PK columns of both tables and typeid so that these can be joined on PK
into #temptable
from st inner join tt on st.typeid = tt.typeid
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
Above will sort very minimal data and thus should be faster.Based on how many columns you have in your table and indexes this should be way faster (it will be fast if you have many columns in both tables but this query will use just 3.Also, maybe these columns (st.typeid,st.someparam and tt.typeid and tt.lastcheckdate) are covered by some of indexes so no need to read underlying tables and thus reduce the IO as well) than actual one..Then join this data back to both tables.
If that doesnt work the way you expect.Then you can have indexed view using above select by adding the order by expression as column. Then use this indexed view to get top 100 and join with main tables.This will surely reduce the amount of work and thus improve perf.But Indexed view will have overhead which will depend on how frequently data changed in the table TT.
To lessen number of rows you might retrieve top (100) for each SecondTable record ordered by LastCheckDate, and then union all them and finally select top (100), by means of temporary table or dynamic sql generated query.
This solution uses cursor to fetch top 100 records for each value in SecondTable. With index on (TypeID, LastCheckDate) on TheTable it runs instantaneously (tested on my system with a table of 700,000 records and 50 SecondTable entries).
declare #SomeParam varchar(3)
declare #TypeID int
declare #tbl table (TheTableID int, LastCheckDate datetime, SomeParam float)
declare rstX cursor local fast_forward for
select TypeID, SomeParam
from SecondTable
open rstX
while 1 = 1
begin
fetch next from rstX into #TypeID, #SomeParam
if ##fetch_status <> 0
break
insert into #tbl
select top 100 ID, LastCheckDate, #SomeParam
from TheTable
where TypeID = #TypeID
order by LastCheckDate
end
close rstX
deallocate rstX
select top 100 *
from #tbl
order by DATEDIFF(Day, LastCheckDate, GETDATE()) * SomeParam
Obviously this solution fetches ID's only. You might want to expand temporary table with additional columns.
I want to update a table with consecutive numbering starting with 1. The update has a where clause so only results that meet the clause will be renumbered. Can I accomplish this efficiently without using a temp table?
This probably depends on your database, but here is a solution for MySQL 5 that involves using a variable:
SET #a:=0;
UPDATE table SET field=#a:=#a+1 WHERE whatever='whatever' ORDER BY field2,field3
You should probably edit your question and indicate which database you're using however.
Edit: I found a solution utilizing T-SQL for SQL Server. It's very similar to the MySQL method:
DECLARE #myVar int
SET #myVar = 0
UPDATE
myTable
SET
#myvar = myField = #myVar + 1
For Microsoft SQL Server 2005/2008. ROW_NUMBER() function was added in 2005.
; with T as (select ROW_NUMBER() over (order by ColumnToOrderBy) as RN
, ColumnToHoldConsecutiveNumber from TableToUpdate
where ...)
update T
set ColumnToHoldConsecutiveNumber = RN
EDIT: For SQL Server 2000:
declare #RN int
set #RN = 0
Update T
set ColumnToHoldConsecutiveNubmer = #RN
, #RN = #RN + 1
where ...
NOTE: When I tested the increment of #RN appeared to happen prior to setting the the column to #RN, so the above gives numbers starting at 1.
EDIT: I just noticed that is appears you want to create multiple sequential numbers within the table. Depending on the requirements, you may be able to do this in a single pass with SQL Server 2005/2008, by adding partition by to the over clause:
; with T as (select ROW_NUMBER()
over (partition by Client, City order by ColumnToOrderBy) as RN
, ColumnToHoldConsecutiveNumber from TableToUpdate)
update T
set ColumnToHoldConsecutiveNumber = RN
If you want to create a new PrimaryKey column, use just this:
ALTER TABLE accounts ADD id INT IDENTITY(1,1)
As well as using a CTE or a WITH, it is also possible to use an update with a self-join to the same table:
UPDATE a
SET a.columnToBeSet = b.sequence
FROM tableXxx a
INNER JOIN
(
SELECT ROW_NUMBER() OVER ( ORDER BY columnX ) AS sequence, columnY, columnZ
FROM tableXxx
WHERE columnY = #groupId AND columnY = #lang2
) b ON b.columnY = a.columnY AND b.columnZ = a.columnZ
The derived table, alias b, is used to generated the sequence via the ROW_NUMBER() function together with some other columns which form a virtual primary key.
Typically, each row will require a unique sequence value.
The WHERE clause is optional and limits the update to those rows that satisfy the specified conditions.
The derived table is then joined to the same table, alias a, joining on the virtual primary key columns with the column to be updated set to the generated sequence.
In oracle this works:
update myTable set rowColum = rownum
where something = something else
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/pseudocolumns009.htm#i1006297
To get the example by Shannon fully working I had to edit his answer:
; WITH CTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY [NameOfField]) as RowNumber, t1.ID
FROM [ActualTableName] t1
)
UPDATE [ActualTableName]
SET Name = 'Depersonalised Name ' + CONVERT(varchar(255), RowNumber)
FROM CTE
WHERE CTE.Id = [ActualTableName].ID
as his answer was trying to update T, which in his case was the name of the Common Table Expression, and it throws an error.
UPDATE TableName
SET TableName.id = TableName.New_Id
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS New_Id
FROM TableName
) TableName
I've used this technique for years to populate ordinals and sequentially numbered columns. However I recently discovered an issue with it when running on SQL Server 2012. It would appear that internally the query engine is applying the update using multiple threads and the predicate portion of the UPDATE is not being handled in a thread-safe manner. To make it work again I had to reconfigure SQL Server's max degree of parallelism down to 1 core.
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
DECLARE #id int
SET #id = -1
UPDATE dbo.mytable
SET #id = Ordinal = #id + 1
Without this you'll find that most sequential numbers are duplicated throughout the table.
One more way to achieve the desired result
1. Create a sequence object - (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-ver16)
CREATE SEQUENCE dbo.mySeq
AS BIGINT
START WITH 1 -- up to you from what number you want to start cycling
INCREMENT BY 1 -- up to you how it will increment
MINVALUE 1
CYCLE
CACHE 100;
2. Update your records
UPDATE TableName
SET Col2 = NEXT VALUE FOR dbo.mySeq
WHERE ....some condition...
EDIT: To reset sequence to start from the 1 for the next time you use it
ALTER SEQUENCE dbo.mySeq RESTART WITH 1 -- or start with any value you need`
Join to a Numbers table? It involves an extra table, but it wouldn't be temporary -- you'd keep the numbers table around as a utility.
See http://web.archive.org/web/20150411042510/http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html
or
http://www.sqlservercentral.com/articles/Advanced+Querying/2547/
(the latter requires a free registration, but I find it to be a very good source of tips & techniques for MS SQL Server, and a lot is applicable to any SQL implementation).
It is possible, but only via some very complicated queries - basically you need a subquery that counts the number of records selected so far, and uses that as the sequence ID. I wrote something similar at one point - it worked, but it was a lot of pain.
To be honest, you'd be better off with a temporary table with an autoincrement field.