I'm running into a wall compiling row updates and new rows from a few tables to save off in another table for trending. I know a cursor could achieve this pretty easily, and I get a result set, but I'm struggling to figure out how to get these results into a table with the cursor (or whether I should approach it completely differently).
Background
I want to calculate and save off, daily, the number of new and edited rows from several tables of interest in a production database. These tables' rows are timestamped with the last edit.
My stats database contains a tablestats table that will house the information for each table across 6 columns. My goal is to run an Agent job daily to count the prior day's timestamps and the delta between today's rowcount and the prior day's rowcount, and then merge those into tablestats.
Something like this:
tablename   updyear   updmonth   updday   rowupdates   newrows
table_1     2023      2          5        2509         34
table_1     2023      2          6        3443         90
table_2     2023      2          5        834          255
table_2     2023      2          6        544          433
With that, I can trend/pivot the data as needed.
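In DDL terms, the stats table would be something like this (a sketch only; the exact types and the key are assumptions based on the sample above):

CREATE TABLE dbo.tablestats (
    tablename  sysname NOT NULL,
    updyear    int     NOT NULL,
    updmonth   int     NOT NULL,
    updday     int     NOT NULL,
    rowupdates int     NOT NULL,
    newrows    int     NOT NULL,
    CONSTRAINT PK_tablestats PRIMARY KEY (tablename, updyear, updmonth, updday)
);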
What I tried
I figured a cursor would in part be the best approach, since I was having trouble condensing the query's results together with the name of the table I'm pulling from. I adapted this question & its answers to get part of the way there, but I'm struggling with how to take the next step. I abbreviated the code below for legibility:
DECLARE @last_upd nvarchar(MAX) = '';
DECLARE @checkdate date = DATEADD(DAY, -1, GETDATE());

SELECT @last_upd = @last_upd + 'SELECT '''
    + QUOTENAME(name)
    + ''',YEAR(last_upd) as updyear /* month, etc. */,COUNT(last_upd) as rowupdates FROM '
    + QUOTENAME(name)
    + ' WHERE last_upd > @checkdate /* GROUP BY year/month/day*/; '
FROM sys.tables
WHERE (name IN ('table_1','table_2','table_3'))

IF @@ROWCOUNT > 0
    EXEC sp_executesql @last_upd
        , N'@checkdate date'
        , @checkdate
Which returns the following:
Query 1

(No column name)   updyear   updmonth   updday   rowupdates
table_1            2023      2          5
table_1            2023      2          6

Query 2

(No column name)   updyear   updmonth   updday   rowupdates
table_2            2023      2          5
table_2            2023      2          6

Query 3, etc.
Since it returns 3 separate result sets, I'm unsure how to get them into a MERGE statement, since I can't SELECT * INTO #temptable with these.
The reason I'm interested in MERGE, even though it's a daily run, is to accommodate any potential conflicts with existing data. I haven't gotten to the point of doing a rowcount, but I assume at worst I could do a second cursor with the rowcount prior to rolling it all up into a stored procedure.
What you really want is a UNION ALL to combine the results from the various queries into a single result set. If you change your dynamic SELECT to a UNION ALL SELECT, you are most of the way there. What's left is to strip the leading UNION ALL, using something like SET @last_upd = STUFF(@last_upd, 1, 10, ''), which replaces the first 10 characters with nothing.
If you include a newline immediately after the opening quote of your dynamic statement, the generated SQL will look a lot nicer when you print it out during debugging.
It is also common now to use STRING_AGG() to combine generated code snippets when generating dynamic SQL, but your approach works, so I'll leave it.
For the table name column in the result, you can use QUOTENAME(..., '''') to safely stringify the name inside single quotes instead of [].
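For reference, a minimal sketch of what that STRING_AGG() variant could look like (assuming SQL Server 2017 or later; the updated code below sticks with your original concatenation approach):

DECLARE @checkdate date = DATEADD(DAY, -1, GETDATE());
DECLARE @last_upd nvarchar(MAX);

SELECT @last_upd = STRING_AGG(CAST(
          'SELECT ' + QUOTENAME(name, '''')
        + ',YEAR(last_upd) as updyear /* month, etc. */,COUNT(last_upd) as rowupdates FROM '
        + QUOTENAME(name)
        + ' WHERE last_upd > @checkdate GROUP BY YEAR(last_upd)'
      AS nvarchar(MAX)), ' UNION ALL ')
FROM sys.tables
WHERE name IN ('table_1','table_2','table_3');

IF @last_upd IS NOT NULL
    EXEC sp_executesql @last_upd, N'@checkdate date', @checkdate;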
The updated code would be something like:
DECLARE @last_upd nvarchar(MAX) = '';
DECLARE @checkdate date = DATEADD(DAY, -1, GETDATE());

SELECT @last_upd = @last_upd + '
UNION ALL
SELECT '
    + QUOTENAME(name, '''')
    + ',YEAR(last_upd) as updyear /* month, etc. */,COUNT(last_upd) as rowupdates FROM '
    + QUOTENAME(name)
    + ' WHERE last_upd > @checkdate GROUP BY YEAR(last_upd) /* year/month/day*/ '
FROM sys.tables
WHERE (name IN ('table_1','table_2','table_3'))

SET @last_upd = STUFF(@last_upd, 1, 10, '')

SELECT @last_upd

IF @@ROWCOUNT > 0
    EXEC sp_executesql @last_upd
        , N'@checkdate date'
        , @checkdate
Generated SQL:
SELECT 'table_1',YEAR(last_upd) as updyear /* month, etc. */,COUNT(last_upd) as rowupdates FROM [table_1] WHERE last_upd > @checkdate GROUP BY YEAR(last_upd) /* year/month/day*/
UNION ALL
SELECT 'table_2',YEAR(last_upd) as updyear /* month, etc. */,COUNT(last_upd) as rowupdates FROM [table_2] WHERE last_upd > @checkdate GROUP BY YEAR(last_upd) /* year/month/day*/
UNION ALL
SELECT 'table_3',YEAR(last_upd) as updyear /* month, etc. */,COUNT(last_upd) as rowupdates FROM [table_3] WHERE last_upd > @checkdate GROUP BY YEAR(last_upd) /* year/month/day*/
Results:
(No column name)   updyear   rowupdates
[table_1]          2023      2
[table_2]          2023      1
[table_3]          2023      3
See this db<>fiddle
I'll leave it to you to finish up the details to get your complete desired result.
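If you then want to land this in your stats table, one sketch (the temp table and the tablestats column list here are assumptions, trimmed to the columns the dynamic SELECT currently returns) is to capture the combined result set with INSERT ... EXEC and then MERGE it:

CREATE TABLE #daily (tablename nvarchar(130), updyear int, rowupdates int);
-- (add updmonth/updday here once they are added to the dynamic SELECT)

INSERT INTO #daily (tablename, updyear, rowupdates)
EXEC sp_executesql @last_upd, N'@checkdate date', @checkdate;

MERGE dbo.tablestats AS tgt
USING #daily AS src
   ON tgt.tablename = src.tablename
  AND tgt.updyear   = src.updyear          -- plus updmonth/updday in the full version
WHEN MATCHED THEN
    UPDATE SET tgt.rowupdates = src.rowupdates
WHEN NOT MATCHED THEN
    INSERT (tablename, updyear, rowupdates)
    VALUES (src.tablename, src.updyear, src.rowupdates);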
Related
I'm trying to create a dynamic forecast for 18(!) months that depends on the previous columns (months), and I am stuck.
I have three columns:
Stock
SafetyStock
Need for production - another select with the clause WHERE date = getdate()
What I need to achieve:
Index, Stock - Current month, SafetyStock - Current month, Need for production (select * from Nfp where date = getdate()), Stock - Current month + 1, SafetyStock - Current month + 1, Need for production - Current month + 1 ... etc. until 18 months
Calculations:
Stock - Current month + 1 = Stock previous month + SafetyStock previous month - Needs for production of current month
Is there any possibility to create something like this? It has to be dynamic and calculate for the current date and the next 18 months. So right now I have to calculate from 2020-10 until, let's say, 2022-04.
What I have tried:
I prepared 18 CTEs and joined everything. Then I do the calculations - it works, but it is slow and I don't think it is professional.
I have tried to do dynamic SQL; below you can see my code, but I got stuck when I wanted to make a computed column depend on a previous computed column:
------------------- CODE -------------------------
if object_id('tempdb..#tmp') is not null
    drop table #tmp
if object_id('tempdb..#tmp2') is not null
    drop table #tmp2

declare @cols as int
declare @iteration as int
declare @Mth as nvarchar(30)
declare @data as date
declare @sql as nvarchar(max)
declare @sql2 as nvarchar(max)

set @cols = 18
set @iteration = 0
set @Mth = month(getdate())
set @data = cast(getdate() as date)

select
    10 as SS,
    12 as Stock
into #tmp

WHILE @iteration < @cols
begin
    set @iteration = @iteration + 1

    set @sql =
    '
    alter table #tmp
    add [StockUwzgledniajacSS - ' + cast(concat(year(DATEADD(Month, @Iteration, @data)),'-', month(DATEADD(Month, @Iteration, @data))) as nvarchar(max)) +'] as (Stock - SS)
    '
    exec (@sql)

    set @Mth = @Mth + 1

    set @sql2 =
    '
    alter table #tmp
    add [StockUwzgledniajacSS - ' + @Mth +'] as ([StockUwzgledniajacSS - ' + @Mth +'])
    '
end

select * from #tmp
thanks in advance!
Update 1 note: I wrote this before you posted your data. This still holds, I believe, but of course the stock levels are way different. Given that your NFP data is by day and your report is by month, I suggest adding something to preprocess that data into months, e.g., a sum of NFP values grouped by month.
Update 2 (next day) note: From the OP's comments below, I've tried to integrate this with what was written and answer the question more directly, e.g., by creating a reporting table #tmp.
Given that the OP also mentions millions of rows, I imagine each row represents a specific part/item - I've included this as a field called StockNum.
I have done something that probably doesn't do your calculations properly, but demonstrates the approach and should get you over your current hurdle. Indeed, if you haven't used these before, then updating this code with your own calculations will help you to understand how it works so you can maintain it.
I'm assuming the key issue here for calculation is that this month's stock is based on last month's stock and then new stock minus old stock for this month.
It is possible to calculate this in 18 separate statements (update table set col2 = some function of col1, then update table set col3 = some function of col2, etc). However, updating the same table multiple times is often an anti-pattern causing poor performance - especially if you need to read the base data again and again.
Instead, something like this is often best calculated using a Recursive CTE (here's an example description), where it 'builds' a set of data based on previous results.
The key difference in this approach is that it
Creates the reporting table (without any data/calculations going in)
Calculates the data as a separate step - but with columns/fields that can be used to link to the reporting table
Inserts the data from calculations into the reporting table as a single insert statement.
I have used temporary tables/etc liberally, to help demonstrate the process.
You haven't explained what safety stock is, nor how you measure what's coming in, so for the example below, I have assumed safety stock is the amount produced and is 5 per month. I've then assumed that NFP is amount going out each month (e.g., forward estimates of sales). The key result will be stock at the end of month (e.g., which you could then review whether it's too high or too low).
As you want to store it in a table that has each month as columns, the first step is to create a list with the relevant buckets (months). These include fields used for matching in later calculations/etc. Note I have included some date fields (startdate and enddate) which may be useful when you customise the code. This part of the SQL is designed to be as straightforward as possible.
We then create the scratch table that has our reference data for stock movements, replacing your SELECT * FROM NFP WHERE date = getdate()
/* SET UP BUCKET LIST TO HELP CALCULATION */
CREATE TABLE #RepBuckets (BucketNum int, BucketName nvarchar(30), BucketStartDate datetime, BucketEndDate datetime)
INSERT INTO #RepBuckets (BucketNum) VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
(11),(12),(13),(14),(15),(16),(17),(18)
DECLARE @CurrentBucketStart date
SET @CurrentBucketStart = DATEFROMPARTS(YEAR(getdate()), MONTH(getdate()), 1)

UPDATE #RepBuckets
SET BucketName = 'StockAtEnd_' + FORMAT(DATEADD(month, BucketNum, @CurrentBucketStart), 'MMM_yy'),
    BucketStartDate = DATEADD(month, BucketNum, @CurrentBucketStart),
    BucketEndDate = DATEADD(month, BucketNum + 1, @CurrentBucketStart)
/* CREATE BASE DATA */
-- Current stock
CREATE TABLE #Stock (StockNum int, MonthNum int, StockAtStart int, SafetyStock int, NFP int, StockAtEnd int, PRIMARY KEY(StockNum, MonthNum))
INSERT INTO #Stock (StockNum, MonthNum, StockAtStart, SafetyStock, NFP, StockAtEnd) VALUES
(12422, 0, NULL, NULL, NULL, 10)
-- Simulates SELECT * FROM NFP WHERE date = getdate()
CREATE TABLE #NFP_by_month (StockNum int, MonthNum int, StockNFP int, PRIMARY KEY(StockNum, MonthNum))
INSERT INTO #NFP_by_month (StockNum, MonthNum, StockNFP) VALUES
(12422, 1, 4), (12422, 7, 4), (12422, 13, 4),
(12422, 2, 5), (12422, 8, 5), (12422, 14, 5),
(12422, 3, 2), (12422, 9, 2), (12422, 15, 2),
(12422, 4, 7), (12422, 10, 7), (12422, 16, 7),
(12422, 5, 9), (12422, 11, 9), (12422, 17, 9),
(12422, 6, 3), (12422, 12, 3), (12422, 18, 3)
We then use the recursive CTE to calculate our data. It stores the results in the table #StockProjections.
What this does is
Start with your current stock (last row in the #Stock table). Note that the only value that matters in that is the stock at end of month.
Uses that stock level at the end of last month, as the stock level at the start of the new month
Adds the safety stock, subtracts the NFP, and calculates your stock at end.
Note that within the recursive part of the CTE, 'SBM' (StockByMonth) refers to last month's data. This is then used with whatever external data (e.g., #NFP_by_month) to calculate the new data.
These calculations create a table with
StockNum (the ID number of the relevant stock item - for this example, I've used one stock item 12422)
MonthNum (I've used integers rather than dates, for clarity/simplicity)
BucketName (an nvarchar representing the month, used for column names)
Stock at start of month
Safety stock (which I assume is incoming stock, 5 per month)
NFP (which I assume is outgoing stock, varies by month and comes from a scratch table here - you'll need to adjust this to your select)
Stock at end of month
/* CALCULATE PROJECTIONS */
CREATE TABLE #StockProjections (StockNum int, BucketName nvarchar(30), MonthNum int, StockAtStart int, SafetyStock int, NFP int, StockAtEnd int, PRIMARY KEY (StockNum, BucketName))
; WITH StockByMonth AS
(-- Anchor
SELECT TOP 1 StockNum, MonthNum, StockAtStart, SafetyStock, NFP, StockAtEnd
FROM #Stock S
ORDER BY MonthNum DESC
-- Recursion
UNION ALL
SELECT NFP.StockNum,
SBM.MonthNum + 1 AS MonthNum,
SBM.StockAtEnd AS NewStockAtStart,
5 AS Safety_Stock,
NFP.StockNFP,
SBM.StockAtEnd + 5 - NFP.StockNFP AS NewStockAtEnd
FROM StockByMonth SBM
INNER JOIN #NFP_by_month NFP ON NFP.MonthNum = SBM.MonthNum + 1
WHERE NFP.MonthNum <= 18
)
INSERT INTO #StockProjections (StockNum, BucketName, MonthNum, StockAtStart, SafetyStock, NFP, StockAtEnd)
SELECT StockNum, BucketName, MonthNum, StockAtStart, SafetyStock, NFP, StockAtEnd
FROM StockByMonth
INNER JOIN #RepBuckets ON StockByMonth.MonthNum = #RepBuckets.BucketNum
Now we have the data, we set up a table for reporting purposes. Note that this table has the month names embedded into the column names (e.g., StockAtEnd_Jun_21). It would be easier to use a generic name (e.g., StockAtEnd_Month4) but I've gone for the slightly more complex case here for demonstration.
/* SET UP TABLE FOR REPORTING */
DECLARE @cols int = 18
DECLARE @iteration int = 0
DECLARE @colname nvarchar(30)
DECLARE @sql2 as nvarchar(max)

CREATE TABLE #tmp (StockNum int PRIMARY KEY)

WHILE @iteration <= @cols
BEGIN
    SET @colname = (SELECT TOP 1 BucketName FROM #RepBuckets WHERE BucketNum = @iteration)
    SET @sql2 = 'ALTER TABLE #tmp ADD ' + QUOTENAME(@colname) + ' int'
    EXEC (@sql2)
    SET @iteration = @iteration + 1
END
The last step is to add the data to your reporting table. I've used a pivot here but feel free to use whatever you like.
/* POPULATE TABLE */
DECLARE @columnList nvarchar(max) = N'';
SELECT @columnList += QUOTENAME(BucketName) + N' ' FROM #RepBuckets
SET @columnList = REPLACE(RTRIM(@columnList), ' ', ', ')

DECLARE @sql3 nvarchar(max)
SET @sql3 = N'
;WITH StockPivotCTE AS
    (SELECT *
     FROM (SELECT StockNum, BucketName, StockAtEnd
           FROM #StockProjections
          ) StockSummary
     PIVOT
        (SUM(StockAtEnd)
         FOR [BucketName]
         IN (' + @columnList + N')
        ) AS StockPivot
    )
INSERT INTO #tmp (StockNum, ' + @columnList + N')
SELECT StockNum, ' + @columnList + N'
FROM StockPivotCTE'

EXEC (@sql3)
Here's a DB<>fiddle showing it running with results of each sub-step.
I am very new to dynamic SQL and am trying to construct a query to report on our workers, what certificates they have and what the expiry date is on their last certificate where one exists. My temp table holds the correct data and my dynamic column string seems to be correct. When everything is run, the column headings show as expected and the personnel names are grouped correctly but none of the dates are showing, the values are all NULL. I haven't put the specific code for the first 2 sections as the selection criteria is a bit long winded.
For the dynamic column variable, SELECT STUFF(@columns, 1, 1, '') returns the single value below.
[Air Supervisor - AODC], [Air Supervisor - IMCA], [ALST - Certificate of Achievement], [ALST - IMCA], [Competence - A1/A2 Competent Assessor], [Competence - Air Diver-Surface Supplied], etc...
The data itself is held in a temp table, SELECT * FROM #results gives the below (example) output. This is every cert matched to the relevant personnel ID with the most recent expiry date.
id cert date
3484 [ALST - Banksman and Slinging] 28/07/2029
3648 [ALST - Banksman and Slinging] 05/11/2099
3701 [ALST - Banksman and Slinging] 27/05/2029
3740 [ALST - Banksman and Slinging] 20/01/2055
1181 [ALST - Crane Operators] 31/12/2029
1137 [ALST - Crane Operators] 31/12/2029
1072 [ALST - Crane Operators] 31/12/2029
The below is the actual pivot query. I need the [cert] field from above to become the column headers, dates (where they exist) to become the values and the personnel ID matched to the correct name.
SET @dynamicpivot =
    N'SELECT pd.name, ' + STUFF(@columns, 1, 1, '') + '
      FROM #results
      PIVOT (MAX([date]) FOR [cert] IN (' + STUFF(@columns, 1, 1, '') + ')) as ce
      JOIN dbo.personnel as pd
        ON pd.person_id = ce.id
      ORDER BY pd.name'

EXEC sp_executesql @dynamicpivot
The query runs without error, the column names show correctly and the personnel names show in order and grouped but none of the dates show, it's all NULLs.
I've tried to keep this fairly succinct, let me know if you need more info.
You need to initialise @columns. The default value is NULL, which you cannot use in string concatenation, because NULL combined with any other value yields NULL. For example, NULL + 1 returns NULL and NULL + 'Some Text' also returns NULL. You can test this with the query below.
NULL Concatenation Example
-- NULL cannot be concatenated.
SELECT
NULL + 'ABC'
;
This returns NULL. In effect your query is doing the same thing.
In this second example I'm using an initialised variable. The lack of NULL allows me to concatenate values to the string.
Initialised Variable Example
DECLARE @columns VARCHAR(255) = '';
-- Will return: 'ABC'.
SELECT
@columns + 'ABC'
;
Alternatively, you could achieve the same result using ISNULL. This time the variable starts life as NULL, which is replaced with a blank string during concatenation.
ISNULL Example
DECLARE @columns VARCHAR(255) = NULL;
-- Using ISNULL returns: 'ABC'.
SELECT
ISNULL(@columns, '') + 'ABC'
;
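Applied to your pivot, a minimal sketch of building @columns from the distinct cert values in #results (I'm assuming that's where your column list comes from):

DECLARE @columns nvarchar(max) = N'';   -- initialised, so concatenation works

SELECT @columns += N',' + cert          -- cert values already appear bracketed in your sample
FROM (SELECT DISTINCT cert FROM #results) AS c;

-- Strip the leading comma, as in your STUFF(@columns, 1, 1, '') call
SET @columns = STUFF(@columns, 1, 1, N'');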
I need a query that COUNTs all rows (with a WHERE condition) in all tables in a database, where the table names are unknown.
The reason why is: I've got an "alarm_logging_system" that creates a new "archive" table every month or so, and automatically names it "AlarmLog(timestamp of last active alarm)".
So I need a query which dynamically searches through all tables that exist in the database, COUNTs all rows in a column with a WHERE condition, and returns one single value.
For example, I want to get all active alarms this month, last month, etc., or for a specific time range.
This is an example of a query I wrote to get the count for all active alarms last month
SELECT COUNT(ConditionActive)
FROM (whole database)???
WHERE (Acked=0 AND ConditionActive = 1)
AND (EventTime >= DATEADD(m,-1,DATEADD(mm, DATEDIFF(m,0,GETDATE()), 0))
AND [EventTime] <= DATEADD(d,-1,DATEADD(mm, DATEDIFF(m,0,GETDATE()),0))))
AS ACTIVE_LAST_MONTH
So what I need is a query, a stored procedure, or a dynamic SQL query?
All the tables have the same schema and columns.
Appreciate all help!
This should demonstrate why it is not generally considered good practice to make multiple copies of the same table and then aggregate data from the whole collection. This just isn't how relational databases are designed to work.
This is untested because I don't have your table anywhere to work with but this should get it.
declare @SQL nvarchar(max) = ''

--This gets a result set of the count for all tables.
select @SQL = @SQL + 'select count(ConditionActive) as MyCount
from [' + t.name + ']
where Acked = 0
    AND ConditionActive = 1
    AND EventTime >= DATEADD(month, -1, DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0))
    and [EventTime] <= DATEADD(day, -1, DATEADD(month, DATEDIFF(month, 0, GETDATE()),0)) UNION ALL'
from sys.tables t

--Now we need the sum of all counts
select @SQL = 'Select sum(MyCount) from (' + LEFT(@SQL, LEN(@SQL) - 10) + ') as x'

select @SQL
--uncomment the line below when you are confident that the dynamic sql is correct.
--exec sp_executesql @SQL
--EDIT--
I took the liberty of expanding the shortcuts in your DATEADD functions. The shortcuts are hard to remember, and you were using both mm and m, which both mean month. It is generally a better approach to just spell out the word to remove any ambiguity.
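Since the archive tables all seem to follow the AlarmLog naming pattern, you would probably also want to filter sys.tables so unrelated tables aren't swept up. A sketch of that tweak (the exact name pattern is an assumption) is to change the final from sys.tables t line of the first SELECT to:

from sys.tables t
where t.name like 'AlarmLog%'   -- only the archive tables, not everything in the database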
I am trying to deserialize xml in a SQL Server stored procedure before bringing it into c# win forms. My current result table looks as follows:
Edit: As the table above is too small to read, below is an example of what it contains:
Column 1 is just a time 07:00, 07:09, etc.
Column 2 contains the following data:
Row 1:
ABADILLA ARIEL<RegistrationId>29</RegistrationId>, BLAKE LORCAN<RegistrationId>30</RegistrationId>, CRONIN SHANE<RegistrationId>31</RegistrationId>
Row 2:
ADAMS NORMAN<RegistrationId>33</RegistrationId>, ADAMS WILLIAM<RegistrationId>34</RegistrationId>, AHEARNE PAUL<RegistrationId>35</RegistrationId>, LAWLOR DES<RegistrationId>32</RegistrationId>
So each row can have up to but no more than 4 entries.
End edit
I would like to be able to do a few things with this table; primarily I need to be able to modify it into the following format:
But ultimately I'd like to display the information with the name only, while maintaining the link to the registration number without actually displaying it. This part might be easier to do in C# win forms though, so I think I'd be happy enough if I could get it into the format shown above.
My SQL code to date to return the results table shown initially is as follows:
DECLARE @Registered TABLE
(CompetitionName VARCHAR(50),
CompetitionDate Date,
StartTime TIME,
RegistrationId INTEGER,
PlayersName Varchar(60)
)
INSERT INTO @Registered
SELECT MAX(c.CompetitionName) AS 'Competition Name', MAX(c.[Date]) AS 'Competition Date',
CONVERT(VARCHAR, r.PlayersStartTime, 108) AS 'Start Time', MAX(r.RegistrationId) AS RegistrationId,
CASE WHEN m.MemberId IS NOT NULL THEN (m.Surname + ' ' + m.FirstName) ELSE (nm.Surname + ' '+ nm.Firstname) END AS PlayersName
FROM dbo.Competitions c
LEFT JOIN [dbo].[Registration] r ON c.[CompetitionId] = r.[CompetitionId]
LEFT JOIN dbo.Members m ON r.MemberId = m.MemberId
LEFT JOIN dbo.NonMembers nm ON r.NonMemberId = nm.NonMemberId
WHERE [Date] = '20130104'
AND c.CompetitionId = 10
GROUP BY r.PlayersStartTime, m.MemberId, m.FirstName, m.Surname, nm.FirstName, nm.Surname
----
SELECT DISTINCT Main.StartTime,
    STUFF((SELECT ', ' + PlayersName, + RegistrationId
           FROM @Registered list
           WHERE list.StartTime = Main.StartTime
           FOR XML PATH ('')),1,2,''
    ) AS PlayerList
FROM @Registered Main;
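A possible tweak for the name-only output (a sketch; it simply drops RegistrationId from the inner SELECT so no XML element is emitted for it):

SELECT DISTINCT Main.StartTime,
    STUFF((SELECT ', ' + PlayersName
           FROM @Registered list
           WHERE list.StartTime = Main.StartTime
           FOR XML PATH ('')), 1, 2, ''
    ) AS PlayerList
FROM @Registered Main;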
I was looking at different ways of writing a stored procedure to return a "page" of data. This was for use with the ASP ObjectDataSource, but it could be considered a more general problem.
The requirement is to return a subset of the data based on the usual paging parameters; startPageIndex and maximumRows, but also a sortBy parameter to allow the data to be sorted. Also there are some parameters passed in to filter the data on various conditions.
One common way to do this seems to be something like this:
[Method 1]
;WITH stuff AS (
    SELECT
        CASE
            WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
            WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
            WHEN @SortBy = ...
            ELSE ROW_NUMBER() OVER (ORDER BY whatever)
        END AS Row,
        .,
        .,
        .,
    FROM Table1
    INNER JOIN Table2 ...
    LEFT JOIN Table3 ...
    WHERE ... (lots of things to check)
)
SELECT *
FROM stuff
WHERE (Row > @startRowIndex)
    AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row
One problem with this is that it doesn't give the total count and generally we need another stored procedure for that. This second stored procedure has to replicate the parameter list and the complex WHERE clause. Not nice.
One solution is to append an extra column to the final select list, (SELECT COUNT(*) FROM stuff) AS TotalRows. This gives us the total but repeats it for every row in the result set, which is not ideal.
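Roughly what that extra-column variant looks like (sketched against sys.objects as a stand-in for the real joins and filters):

;WITH stuff AS (
    SELECT ROW_NUMBER() OVER (ORDER BY name) AS Row, name
    FROM sys.objects
)
SELECT *,
       (SELECT COUNT(*) FROM stuff) AS TotalRows   -- repeated on every row
FROM stuff
WHERE Row > 0 AND Row <= 20
ORDER BY Row;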
[Method 2]
An interesting alternative is given here (https://web.archive.org/web/20211020111700/https://www.4guysfromrolla.com/articles/032206-1.aspx) using dynamic SQL. He reckons that the performance is better because the CASE statement in the first solution drags things down. Fair enough, and this solution makes it easy to get the totalRows and slap it into an output parameter. But I hate coding dynamic SQL. All that 'bit of SQL ' + STR(@parm1) + ' bit more SQL' gubbins.
[Method 3]
The only way I can find to get what I want, without repeating code which would have to be synchronized, and keeping things reasonably readable is to go back to the "old way" of using a table variable:
DECLARE @stuff TABLE (Row INT, ...)

INSERT INTO @stuff
SELECT
    CASE
        WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
        WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
        WHEN @SortBy = ...
        ELSE ROW_NUMBER() OVER (ORDER BY whatever)
    END AS Row,
    .,
    .,
    .,
FROM Table1
INNER JOIN Table2 ...
LEFT JOIN Table3 ...
WHERE ... (lots of things to check)

SELECT *
FROM @stuff
WHERE (Row > @startRowIndex)
    AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row
(Or a similar method using an IDENTITY column on the table variable).
Here I can just add a SELECT COUNT on the table variable to get the totalRows and put it into an output parameter.
I did some tests, and with a fairly simple version of the query (no sortBy and no filter), method 1 seems to come out on top (almost twice as quick as the other 2). Then I decided to test with something closer to the complexity I actually need, with the SQL in stored procedures. With this I get method 1 taking nearly twice as long as the other 2 methods. Which seems strange.
Is there any good reason why I shouldn't spurn CTEs and stick with method 3?
UPDATE - 15 March 2012
I tried adapting Method 1 to dump the page from the CTE into a temporary table so that I could extract the TotalRows and then select just the relevant columns for the resultset. This seemed to add significantly to the time (more than I expected). I should add that I'm running this on a laptop with SQL Server Express 2008 (all that I have available) but still the comparison should be valid.
I looked again at the dynamic SQL method. It turns out I wasn't really doing it properly (just concatenating strings together). I set it up as in the documentation for sp_executesql (with a parameter description string and parameter list) and it's much more readable. Also this method runs fastest in my environment. Why that should be still baffles me, but I guess the answer is hinted at in Hogan's comment.
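Roughly what I mean by the parameterised form, as a sketch (sys.objects stands in for the real tables; the sort column is still concatenated, but the numeric paging values are passed as parameters):

DECLARE @SortBy nvarchar(50) = N'name DESC',
        @startRowIndex int   = 0,
        @maximumRows int     = 10;

DECLARE @sql nvarchar(max) = N'
;WITH stuff AS (
    SELECT ROW_NUMBER() OVER (ORDER BY ' + @SortBy + N') AS Row, *
    FROM sys.objects
)
SELECT *
FROM stuff
WHERE Row > @startRowIndex
  AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row;';

EXEC sp_executesql @sql,
     N'@startRowIndex int, @maximumRows int',
     @startRowIndex = @startRowIndex,
     @maximumRows   = @maximumRows;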
I would most likely split the @SortBy argument into two, @SortColumn and @SortDirection, and use them like this:
…
ROW_NUMBER() OVER (
    ORDER BY CASE @SortColumn
                 WHEN 'Name' THEN Name
                 WHEN 'OtherName' THEN OtherName
                 …
             END *
             CASE @SortDirection
                 WHEN 'DESC' THEN -1
                 ELSE 1
             END
) AS Row
…
And this is how the TotalRows column could be defined (in the main select):
…
COUNT(*) OVER () AS TotalRows
…
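Put together, a runnable sketch of that approach against sys.objects (the sign trick assumes numeric sort keys, since it multiplies the key by -1; column choices here are purely illustrative):

DECLARE @SortColumn varchar(20) = 'object_id',
        @SortDirection varchar(4) = 'DESC';

SELECT name,
       ROW_NUMBER() OVER (
           ORDER BY CASE @SortColumn
                        WHEN 'object_id' THEN object_id
                        WHEN 'schema_id' THEN schema_id
                        ELSE object_id
                    END *
                    CASE @SortDirection WHEN 'DESC' THEN -1 ELSE 1 END
       ) AS Row,
       COUNT(*) OVER () AS TotalRows
FROM sys.objects;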
I would definitely want to do a combination of a temp table and NTILE for this sort of approach.
The temp table will allow you to do your complicated series of conditions just once. Because you're only storing the pieces you care about, it also means that when you start doing selects against it further in the procedure, it should have a smaller overall memory usage than if you ran the condition multiple times.
I like NTILE() for this better than ROW_NUMBER() because it's doing the work you're trying to accomplish for you, rather than having additional where conditions to worry about.
The example below is one based off a similar query I'm using as part of a research query; I have an ID I can use that I know will be unique in the results. Using an ID that was an identity column would also be appropriate here, though.
--DECLARES here would be stored procedure parameters
declare @pagesize int, @sortby varchar(25), @page int = 1;

--Create temp with all relevant columns; ID here could be an identity PK to help with paging query below
create table #temp (id int not null primary key clustered, status varchar(50), lastname varchar(100), startdate datetime);

--Insert into #temp based off of your complex conditions, but with no attempt at paging
insert into #temp
    (id, status, lastname, startdate)
select id, status, lastname, startdate
from Table1 ...etc.
where ...complicated conditions

SET @pagesize = 50;
SET @page = 5; --OR CAST(@startRowIndex/@pagesize as int)+1
SET @sortby = 'name';

--Only use the id and count to use NTILE
;with paging(id, pagenum, totalrows) as
(
    select id,
        NTILE((SELECT COUNT(*) cnt FROM #temp)/@pagesize) OVER(ORDER BY CASE WHEN @sortby = 'NAME' THEN lastname ELSE convert(varchar(10), startdate, 112) END),
        cnt
    FROM #temp
    cross apply (SELECT COUNT(*) cnt FROM #temp) total
)
--Use the id to join back to main select
SELECT *
FROM paging
JOIN #temp ON paging.id = #temp.id
WHERE paging.pagenum = @page

--Don't need the drop in the procedure, included here for rerunnability
drop table #temp;
I generally prefer temp tables over table variables in this scenario, largely so that there are definite statistics on the result set you have. (Search for temp table vs table variable and you'll find plenty of examples as to why)
Dynamic SQL would be most useful for handling the sorting method. Using my example, you could do the main query in dynamic SQL and only pull the sort method you want to pull into the OVER().
The example above also does the total in each row of the return set, which as you mentioned was not ideal. You could, instead, have a @totalrows output variable in your procedure and pull it as well as the result set. That would save you the CROSS APPLY that I'm doing above in the paging CTE.
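For example, a sketch of that output-parameter variant (the procedure and parameter names here are just placeholders):

CREATE PROCEDURE dbo.GetPagedResults     -- hypothetical name
    @pagesize  int,
    @page      int,
    @sortby    varchar(25),
    @totalrows int OUTPUT
AS
BEGIN
    -- ...create and populate #temp, then run the paging query from above...
    SELECT @totalrows = COUNT(*) FROM #temp;
END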
I would create one procedure to stage, sort, and paginate (using NTILE()) a staging table; and a second procedure to retrieve by page. This way you don't have to run the entire main query for each page.
This example queries AdventureWorks.HumanResources.Employee:
--------------------------------------------------------------------------
create procedure dbo.EmployeesByMartialStatus
    @MaritalStatus nchar(1)
    , @sort varchar(20)
as
-- Init staging table
if exists(
    select 1 from sys.objects o
    inner join sys.schemas s on s.schema_id=o.schema_id
        and s.name='Staging'
        and o.name='EmployeesByMartialStatus'
    where type='U'
)
    drop table Staging.EmployeesByMartialStatus;
-- Populate staging table with sort value
with s as (
    select *
        , sr=ROW_NUMBER()over(order by case @sort
            when 'NationalIDNumber' then NationalIDNumber
            when 'ManagerID' then ManagerID
            -- plus any other sort conditions
            else EmployeeID end)
    from AdventureWorks.HumanResources.Employee
    where MaritalStatus=@MaritalStatus
)
select *
into #temp
from s;
-- And now pages
declare @RowCount int; select @RowCount=COUNT(*) from #temp;
declare @PageCount int=ceiling(@RowCount/20); --assuming 20 lines/page
select *
    , Page=NTILE(@PageCount)over(order by sr)
into Staging.EmployeesByMartialStatus
from #temp;
go
--------------------------------------------------------------------------
-- procedure to retrieve selected pages
create procedure EmployeesByMartialStatus_GetPage
    @page int
as
declare @MaxPage int;
select @MaxPage=MAX(Page) from Staging.EmployeesByMartialStatus;
set @page=case when @page not between 1 and @MaxPage then 1 else @page end;
select EmployeeID,NationalIDNumber,ContactID,LoginID,ManagerID
    , Title,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours
    , CurrentFlag,rowguid,ModifiedDate
from Staging.EmployeesByMartialStatus
where Page=@page
GO
--------------------------------------------------------------------------
-- Usage
-- Load staging
exec dbo.EmployeesByMartialStatus 'M','NationalIDNumber';
-- Get pages 1 through n
exec dbo.EmployeesByMartialStatus_GetPage 1;
exec dbo.EmployeesByMartialStatus_GetPage 2;
-- ...etc (this would actually be a foreach loop, but that detail is omitted for brevity)
GO
I use this method of using EXEC():
-- SP parameters:
-- @query: Your query as an input parameter
-- @maximumRows: As number of rows per page
-- @startPageIndex: As number of page to filter
-- @sortBy: As a field name or field names with supporting DESC keyword
DECLARE @query nvarchar(max) = 'SELECT * FROM sys.Objects',
    @maximumRows int = 8,
    @startPageIndex int = 3,
    @sortBy as nvarchar(100) = 'name Desc'

SET @query = ';WITH CTE AS (' + @query + ')' +
    'SELECT *, (dt.pagingRowNo - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingPageNo' +
    ', pagingCountRow / ' + CAST(@maximumRows as nvarchar(10)) + ' As pagingCountPage ' +
    ', (dt.pagingRowNo - 1) % ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingRowInPage ' +
    'FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY ' + @sortBy + ') As pagingRowNo, COUNT(*) OVER () AS pagingCountRow ' +
    'FROM CTE) dt ' +
    'WHERE (dt.pagingRowNo - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 = ' + CAST(@startPageIndex as nvarchar(10))

EXEC(@query)
The result set then includes some extra columns appended after your query's own result columns.
Note:
I add some extra columns that you can remove:
pagingRowNo : The row number
pagingCountRow : The total number of rows
pagingPageNo : The current page number
pagingCountPage : The total number of pages
pagingRowInPage : The row number that started with 1 in this page