Execute multiple dynamic T-SQL statements and obtain a limited number of unique values while preserving order - sql-server

I have a SourceTable and a table variable @TQueries containing various T-SQL predicates that target SourceTable.
The expected result is to dynamically generate SELECT statements that return a list of Ids as specified by the predicates in @TQueries. Each dynamically generated SELECT statement also needs to execute in a particular order, and the final set of values needs to be unique with the original ordering preserved.
Fortunately, there's a limit to how many values need to be retrieved and how many dynamic queries need to be generated. The Id list should contain at most 10 Ids, and we don't expect more than 7 queries.
The following is a sample of this setup, not the actual data/database:
-- Set up some test data, this is quick and dirty just to provide some data to test against
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SourceTable]') AND type in (N'U'))
BEGIN
-- Create a numbers table, sorta
SELECT TOP 20
IDENTITY(INT,1,1) AS Id,
ABS(CHECKSUM(NewId())) % 100 AS [SomeValue]
INTO [SourceTable]
FROM sysobjects a
END
DECLARE @TQueries TABLE (
    [Ordinal] INT,
    [WherePredicate] NVARCHAR(MAX),
    [OrderByPredicate] NVARCHAR(MAX)
);
-- Simulate SELECTs with different order by that get different data due to varying WHERE clauses and ORDER conditions
INSERT INTO @TQueries VALUES ( 1, N'[Id] IN (6,11,13,7,10,3,15)', '[SomeValue] ASC' ) -- Sort Asc
INSERT INTO @TQueries VALUES ( 2, N'[Id] IN (9,15,14,20,17)', '[SomeValue] DESC' ) -- Sort Desc
INSERT INTO @TQueries VALUES ( 3, N'[Id] IN (20,10,1,16,11,19,9,15,17,6,2,3,13)', 'NEWID()' ) -- Sort Random
My main issue has been avoiding the use of a CURSOR or iterating through the rows one by one. The closest I've come to a set operation that meets these criteria is using a table variable to store the results of each query, or a massive CTE.
Suggestions and comments are welcome.

Here's a solution that builds a single statement both to run all the queries and to return the results.
It uses a similar approach to the one in your answer when iterating over the @TQueries table, i.e. it also uses {...} tokens where column values from @TQueries should go, and it puts the values there with nested REPLACE() calls.
Other than that, it heavily depends on ranking functions, and I'm not sure it doesn't abuse them. You'd need to test this method before deciding whether it's better or worse than the one you've got so far.
DECLARE @QueryTemplate nvarchar(max), @FinalSQL nvarchar(max);
SET @QueryTemplate =
N'SELECT
  [Id],
  QueryRank = {Ordinal},
  RowRank = ROW_NUMBER() OVER (ORDER BY {OrderByPredicate})
FROM [dbo].[SourceTable]
WHERE {WherePredicate}
';
SET @FinalSQL =
N'WITH AllData AS (
' +
SUBSTRING(
  (
    SELECT
      'UNION ALL ' +
      REPLACE(REPLACE(REPLACE(@QueryTemplate,
        '{Ordinal}'         , [Ordinal]         ),
        '{OrderByPredicate}', [OrderByPredicate]),
        '{WherePredicate}'  , [WherePredicate]  )
    FROM @TQueries
    ORDER BY [Ordinal]
    FOR XML PATH (''), TYPE
  ).value('.', 'nvarchar(max)'),
  11,                      -- starting just after the first 'UNION ALL '
  CAST(0x7FFFFFFF AS int)  -- max int; no need to specify the exact length
) +
'),
RankedData AS (
  SELECT
    [Id],
    QueryRank,
    RowRank,
    ValueRank = ROW_NUMBER() OVER (PARTITION BY [Id] ORDER BY QueryRank)
  FROM AllData
)
SELECT TOP (@top)
  [Id]
FROM RankedData
WHERE ValueRank = 1
ORDER BY
  QueryRank,
  RowRank
';
PRINT @FinalSQL;
EXECUTE sp_executesql @FinalSQL, N'@top int', 10;
Basically, every subquery gets these auxiliary columns:
QueryRank – a constant value (within the subquery's result set) derived from [Ordinal];
RowRank – a ranking assigned to a row based on the [OrderByPredicate].
The result sets are UNIONed and then every entry of every unique value is again ranked (ValueRank) based on the query ranking.
When pulling the final result set, duplicates are suppressed (by the condition ValueRank = 1), and QueryRank and RowRank are used in the ORDER BY clause to preserve the original row order.
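For illustration (reconstructed by hand from the template above, not copied from actual output), the first @TQueries row would contribute this subquery to AllData after token replacement:
SELECT
  [Id],
  QueryRank = 1,
  RowRank = ROW_NUMBER() OVER (ORDER BY [SomeValue] ASC)
FROM [dbo].[SourceTable]
WHERE [Id] IN (6,11,13,7,10,3,15)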
I used EXECUTE sp_executesql @query instead of EXECUTE (@query), because the former allows you to add parameters to the query. In particular, I parametrised the number of results to return (the argument of TOP). But you could certainly concatenate that value into the dynamic script directly, just like the other things, if you prefer EXECUTE () over EXECUTE sp_executesql.
If you like, you can try this query at SQL Fiddle. (Note: the SQL Fiddle version replaces the @TQueries table variable with the TQueries table.)

This is what I've managed to piece together, cobbled from my original response and improved by the comments from @AndriyM.
DECLARE @sql_prefix NVARCHAR(MAX);
SET @sql_prefix =
N'DECLARE @TResults TABLE (
    [Ordinal] INT IDENTITY(1,1),
    [ContentItemId] INT
);
DECLARE @max INT, @top INT;
SELECT @max = 10;';
DECLARE @sql_insert_template NVARCHAR(MAX), @sql_body NVARCHAR(MAX);
SET @sql_insert_template =
N'SELECT @top = @max - COUNT(*) FROM @TResults;
INSERT INTO @TResults
SELECT TOP (@top) [Id]
FROM [dbo].[SourceTable]
WHERE
    {WherePredicate}
    AND NOT EXISTS (
        SELECT 1
        FROM @TResults AS [tr]
        WHERE [tr].[ContentItemId] = [SourceTable].[Id]
    )
ORDER BY {OrderByPredicate};';
WITH Query ([Ordinal],[SqlCommand]) AS (
    SELECT
        [Ordinal],
        REPLACE(REPLACE(@sql_insert_template, '{WherePredicate}', [WherePredicate]), '{OrderByPredicate}', [OrderByPredicate])
    FROM @TQueries
)
SELECT
    @sql_body = @sql_prefix + (
        SELECT [SqlCommand]
        FROM Query
        ORDER BY [Ordinal] ASC
        FOR XML PATH(''),TYPE).value('.', 'varchar(max)') + CHAR(13)+CHAR(10)
    + N' SELECT * FROM @TResults ORDER BY [Ordinal]';
EXEC(@sql_body);
The basic idea is to use a table variable to hold the results of each query. I create a template for the SQL and replace the tokens in the template based on what is stored in @TQueries.
Once the entire script is built, I run it with EXEC.
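For illustration (again reconstructed by hand from the template, not actual output), the first @TQueries row would expand to roughly this block inside the generated script:
SELECT @top = @max - COUNT(*) FROM @TResults;
INSERT INTO @TResults
SELECT TOP (@top) [Id]
FROM [dbo].[SourceTable]
WHERE
    [Id] IN (6,11,13,7,10,3,15)
    AND NOT EXISTS (
        SELECT 1
        FROM @TResults AS [tr]
        WHERE [tr].[ContentItemId] = [SourceTable].[Id]
    )
ORDER BY [SomeValue] ASC;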

Related

Searching for multiple patterns in a string in T-SQL

In T-SQL, my dilemma is that I have to parse a potentially long string (up to 500 characters) for any of over 230 possible values and remove them from the string for reporting purposes. These values are a column in another table; they're all upper case and 4 characters long, with the exception of two that are 5 characters long.
Examples of these values are:
USFRI
PROME
AZCH
TXJS
NYDS
XVIV. . . . .
Example of string before:
"Offered to XVIV and USFRI as back ups. No response as of yet."
Example of string after:
"Offered to and as back ups. No response as of yet."
Pretty sure it will have to be a UDF, but I'm unable to come up with anything other than stripping ALL the upper-case characters out of the string with PATINDEX, which is not the objective.
This is unavoidably kludgy, but one way is to split your string into rows; once you have a set of words the rest is easy: simply re-aggregate while ignoring the matching values*:
with t as (
select 'Offered to XVIV and USFRI as back ups. No response as of yet.' s
union select 'Another row AZCH and TXJS words.'
), v as (
select * from (values('USFRI'),('PROME'),('AZCH'),('TXJS'),('NYDS'),('XVIV'))v(v)
)
select t.s OriginalString, s.Removed
from t
cross apply (
select String_Agg(j.[value], ' ') within group(order by Convert(tinyint,j.[key])) Removed
from OpenJson(Concat('["',replace(s, ' ', '","'),'"]')) j
where not exists (select * from v where v.v = j.[value])
)s;
* Requires a fully-supported version of SQL Server.
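As an aside (an illustrative variant, not part of the original answer): on SQL Server 2022 and Azure SQL, STRING_SPLIT accepts a third enable_ordinal argument, so the OPENJSON trick could be replaced with something like:
select t.s OriginalString, s.Removed
from (values('Offered to XVIV and USFRI as back ups. No response as of yet.')) t(s)
cross apply (
    select String_Agg(w.value, ' ') within group(order by w.ordinal) Removed
    from String_Split(t.s, ' ', 1) w   -- 1 = enable_ordinal (SQL Server 2022+)
    where not exists (select * from (values('USFRI'),('XVIV')) v(v) where v.v = w.value)
) s;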
Build a function to do the cleaning of one sentence, then call that function from your query, something like: SELECT Col1, dbo.fn_ReplaceValue(Col1) AS cleanValue, * FROM MySentencesTable. Your fn_ReplaceValue would be something like the code below. You could also create the table variable outside the function and pass it in as a parameter to speed up the process, but this way it is all self-contained.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION fn_ReplaceValue(@sentence VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
    DECLARE @ResultVar VARCHAR(500)
    DECLARE @allValues TABLE (rowID int, sValues VARCHAR(15))
    DECLARE @id INT = 0
    DECLARE @ReplaceVal VARCHAR(10)
    DECLARE @numberOfValues INT = (SELECT COUNT(*) FROM MyValuesTable)
    --Populate table variable with all values
    INSERT @allValues
    SELECT ROW_NUMBER() OVER(ORDER BY MyValuesCol) AS rowID, MyValuesCol
    FROM MyValuesTable
    SET @ResultVar = @sentence
    WHILE (@id <= @numberOfValues)
    BEGIN
        SET @id = @id + 1
        SET @ReplaceVal = (SELECT sValues FROM @allValues WHERE rowID = @id)
        SET @ResultVar = REPLACE(@ResultVar, @ReplaceVal, SPACE(0))
    END
    RETURN @ResultVar
END
GO
I suggest creating a table (either temporary or permanent), and loading these 230 string values into this table. Then use it in the following delete:
DELETE
FROM yourTable
WHERE col IN (SELECT col FROM tempTable);
If you just want to view your data sans these values, then use:
SELECT *
FROM yourTable
WHERE col NOT IN (SELECT col FROM tempTable);
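For completeness, a minimal sketch of staging those values first (table and column names here are placeholders, not from the question):
CREATE TABLE #tempTable (col varchar(5) NOT NULL PRIMARY KEY);
INSERT INTO #tempTable (col)
SELECT DISTINCT MyValuesCol
FROM MyValuesTable; -- the table holding the ~230 codes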

SQL - Add new column with outputs as values

Just wondering how I might go about adding the outputted results as a new column to an existing table.
What I'm trying to do is extract the date from a string which is in another column. I have the below code to do this:
Code
CREATE FUNCTION dbo.udf_GetNumeric
(
    @strAlphaNumeric VARCHAR(256)
)
RETURNS VARCHAR(256)
AS
BEGIN
    DECLARE @intAlpha INT
    SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
    BEGIN
        WHILE @intAlpha > 0
        BEGIN
            SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
            SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
        END
    END
    RETURN ISNULL(@strAlphaNumeric,0)
END
GO
Now use the function as
SELECT dbo.udf_GetNumeric(column_name)
from table_name
The issue is that I want the result to be placed in a new column in an existing table. I have tried the below code but no luck.
ALTER TABLE [Data_Cube_Data].[dbo].[DB_Test]
ADD reportDated nvarchar NULL;
insert into [DB].[dbo].[DB_Test](reportDate)
SELECT
(SELECT dbo.udf_GetNumeric(FileNamewithDate) from [DB].[dbo].[DB_Test])
The syntax should be an UPDATE, not an INSERT, because you want to update existing rows, not insert new ones:
UPDATE Data_Cube_Data.dbo.DB_Test -- you don't need square bracket noise
SET reportDate = dbo.udf_GetNumeric(FileNamewithDate);
But yeah, I agree with the others, the function looks like the result of a "how can I make this object the least efficient thing in my entire database?" contest. Here's a better alternative:
-- better, set-based TVF with no while loop
CREATE FUNCTION dbo.tvf_GetNumeric
(@strAlphaNumeric varchar(256))
RETURNS TABLE
AS
RETURN
(
    WITH cte(n) AS
    (
        SELECT TOP (256) n = ROW_NUMBER() OVER (ORDER BY @@SPID)
        FROM sys.all_objects
    )
    SELECT output = COALESCE(STRING_AGG(
        SUBSTRING(@strAlphaNumeric, n, 1), '')
        WITHIN GROUP (ORDER BY n), '')
    FROM cte
    WHERE SUBSTRING(@strAlphaNumeric, n, 1) LIKE '%[0-9]%'
);
Then the query is:
UPDATE t
SET t.reportDate = tvf.output
FROM dbo.DB_Test AS t
CROSS APPLY dbo.tvf_GetNumeric(t.FileNamewithDate) AS tvf;
Example db<>fiddle that shows this has the same behavior as your existing function.
The function
As I mentioned in the comments, I would strongly suggest rewriting the function; it'll perform terribly. Multi-line table value functions can perform poorly, and you also have a WHILE loop, which will perform awfully. SQL is a set-based language, and so you should be using set-based methods.
There are a couple of alternatives though:
Inlinable Scalar Function
SQL Server 2019 can inline functions, so you could inline the above. I do, however, assume that your value can only contain the characters A-z and 0-9. If it can contain other characters, such as periods (.), commas (,), quotes (") or even white space ( ), or you're not on 2019, then don't use this:
CREATE OR ALTER FUNCTION dbo.udf_GetNumeric (@strAlphaNumeric varchar(256))
RETURNS varchar(256) AS
BEGIN
    RETURN TRY_CONVERT(int,REPLACE(TRANSLATE(LOWER(@strAlphaNumeric),'abcdefghijklmnopqrstuvwxyz',REPLICATE('|',26)),'|',''));
END;
GO
SELECT dbo.udf_GetNumeric('abs132hjsdf');
The LOWER is there in case you are using a case sensitive collation.
Inline Table Value Function
This is the better solution in my mind, and doesn't have the caveats of the above.
It uses a Tally to split the data into individual characters, and then re-aggregates only the characters that are a digit. Note that I assume you are using SQL Server 2017+ here:
DROP FUNCTION udf_GetNumeric; --Need to drop as it's a scalar function at the moment
GO
CREATE OR ALTER FUNCTION dbo.udf_GetNumeric (@strAlphaNumeric varchar(256))
RETURNS table AS
RETURN
    WITH N AS (
        SELECT N
        FROM (VALUES(NULL),(NULL),(NULL),(NULL)) N(N)),
    Tally AS(
        SELECT TOP (LEN(@strAlphaNumeric))
            ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
        FROM N N1, N N2, N N3, N N4)
    SELECT STRING_AGG(CASE WHEN V.C LIKE '[0-9]' THEN V.C END,'') WITHIN GROUP (ORDER BY T.I) AS strNumeric
    FROM Tally T
    CROSS APPLY (VALUES(SUBSTRING(@strAlphaNumeric,T.I,1)))V(C);
GO
SELECT *
FROM dbo.udf_GetNumeric('abs132hjsdf');
Your table
You define reportDated as nvarchar; this means nvarchar(1). Your function, however, returns a varchar(256); this will rarely fit in an nvarchar(1).
Define the column properly:
ALTER TABLE [dbo].[DB_Test] ADD reportDated varchar(256) NULL;
If you've already created the column then do the following:
ALTER TABLE [dbo].[DB_Test] ALTER COLUMN reportDated varchar(256) NULL;
I note, however, that the column is called "dated", which implies a date value, but it's a (n)varchar; that sounds like a flaw.
Updating the column
Use an UPDATE statement. Depending on the solution, this would be one of the following:
--Scalar function
UPDATE [dbo].[DB_Test]
SET reportDated = dbo.udf_GetNumeric(FileNamewithDate);
--Table Value Function
UPDATE DBT
SET reportDated = GN.strNumeric
FROM [dbo].[DB_Test] DBT
CROSS APPLY dbo.udf_GetNumeric(FileNamewithDate) AS GN;
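Following on from the note above that "dated" implies a date: if the extracted digits really are in yyyyMMdd form (an assumption on my part, based on the column holding a date taken from a file name), the TVF version could populate a proper date column instead; a rough sketch:
ALTER TABLE [dbo].[DB_Test] ADD reportDated date NULL;
UPDATE DBT
SET reportDated = TRY_CONVERT(date, GN.strNumeric, 112) -- NULL when the digits aren't a valid yyyyMMdd date
FROM [dbo].[DB_Test] DBT
CROSS APPLY dbo.udf_GetNumeric(FileNamewithDate) AS GN;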

Dynamic SQL - How to use a value from a result as a column name?

I am working on a dynamic SQL query for a MsSQL stored procedure.
There is a table search.ProfileFields that contains the actual column names in a table I need to query.
My goal is to have the SQL select the specific column in the table, dynamically, from its parent query.
A little confusing, so here's an example:
DECLARE @sql nvarchar(max)
SELECT @sql = '
    SELECT
        pfs.SectionID,
        pfs.SectionName,
        pfs.Link,
        (
            SELECT
                pf.FieldID,
                pf.FieldTitle,
                pf.FieldSQL,
                pf.Restricted,
                pf.Optional,
                (
                    SELECT
                        pf.FieldSQL
                    FROM
                        Resources.emp.EmployeeComplete as e
                    WHERE
                        e.QID = @QID
                ) as Value
            FROM
                search.ProfileFields as pf
            WHERE
                pf.SectionID = pfs.SectionID
            ORDER BY
                pf.[Order]
            FOR XML PATH (''field''), ELEMENTS, TYPE, ROOT (''fields'')
        )
    FROM
        search.ProfileFieldSections as pfs
    WHERE
        pfs.Status = 1
    FOR XML PATH (''data''), ELEMENTS, TYPE, ROOT (''root'')'
PRINT @sql
EXECUTE sp_executesql @sql, N'@QID varchar(10)', @QID = @QID
In the innermost SELECT I am querying pf.FieldSQL, but I am looking for the actual value that was received by the parent SELECT.
search.ProfileFields has a column called FieldSQL with a few results such as Name, Age, Location.
That is what I am trying to get my innermost SELECT to do:
SELECT Name FROM ... - Name in this case comes from the value of pf.FieldSQL.
How can I go about querying a dynamic column name in this situation?
Have a look at this answer for a couple of suggestions. If your table definition is complex or changes occasionally, you probably should use PIVOT. Here's one approach that might work for you, so long as the column names in the FieldSQL column are well defined, there are not too many of them, and they don't ever change or get added to:
DECLARE @sql nvarchar(max)
SELECT @sql = '
    SELECT
        pfs.SectionID,
        pfs.SectionName,
        pfs.Link,
        (
            SELECT
                pf.FieldID,
                pf.FieldTitle,
                pf.FieldSQL,
                pf.Restricted,
                pf.Optional,
                (
                    SELECT case pf.FieldSQL
                        when ''Name'' then e.Name
                        when ''DOB''  then convert(nvarchar(10), e.DOB, 126)
                        -- ... etc.
                        -- NOTE: may need to be aggregated depending on uniqueness of QID:
                        -- when ''Name'' then min(e.Name)
                        -- when ''DOB''  then convert(nvarchar(10), min(e.DOB), 126)
                    end
                    FROM
                        Resources.emp.EmployeeComplete as e
                    WHERE
                        e.QID = @QID
                ) as Value
            FROM
                search.ProfileFields as pf
            WHERE
                pf.SectionID = pfs.SectionID
            ORDER BY
                pf.[Order]
            FOR XML PATH (''field''), ELEMENTS, TYPE, ROOT (''fields'')
        )
    FROM
        search.ProfileFieldSections as pfs
    WHERE
        pfs.Status = 1
    FOR XML PATH (''data''), ELEMENTS, TYPE, ROOT (''root'')'
PRINT @sql
EXECUTE sp_executesql @sql, N'@QID varchar(10)', @QID = @QID
Take a look at the "PIVOT" operator here: PIVOT.
This should give you some ideas on how to use it.
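For instance, a minimal, self-contained PIVOT illustration (made-up data, not the asker's schema), turning FieldSQL values into columns:
;WITH src AS (
    SELECT *
    FROM (VALUES
        (1, 'Name',     'Alice'),
        (1, 'Age',      '30'),
        (1, 'Location', 'Oslo')
    ) v (QID, FieldSQL, FieldValue)
)
SELECT QID, [Name], [Age], [Location]
FROM src
PIVOT (MAX(FieldValue) FOR FieldSQL IN ([Name], [Age], [Location])) p;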

SQL Server 2008 split string fails due to ampersand

I have created a function to attempt to replicate the STRING_SPLIT function that is now in SQL Server 2016.
So far I have got this:
CREATE FUNCTION MySplit
(@delimited NVARCHAR(MAX), @delimiter NVARCHAR(100))
RETURNS @t TABLE
(
    -- Id column can be commented out, not required for SQL splitting string
    id INT IDENTITY(1,1), -- I use this column for numbering split parts
    val NVARCHAR(MAX)
)
AS
BEGIN
    DECLARE @xml XML
    SET @xml = N'<root><r>' + REPLACE(@delimited,@delimiter,'</r><r>') + '</r></root>'
    INSERT INTO @t(val)
    SELECT
        r.value('.','varchar(max)') AS item
    FROM
        @xml.nodes('//root/r') AS records(r)
    RETURN
END
GO
And it does work, but it will not split the text string if any part of it contains an ampersand [ & ].
I have found hundreds of examples of splitting a string, but none seem to deal with special characters.
So using this:
select *
from MySplit('Test1,Test2,Test3', ',')
works ok, but
select *
from MySplit('Test1 & Test4,Test2,Test3', ',')
does not. It fails with
XML parsing: line 1, character 17, illegal name character.
What have I done wrong?
UPDATE
Firstly, thanks to @marcs for showing me the error of my ways in writing this question.
Secondly, thanks for all of the help below, especially @PanagiotisKanavos and @MatBailie.
As this is throwaway code for migrating data from an old to a new system, I have chosen to use @MatBailie's solution: quick and very dirty, but also perfect for this task.
In the future, though, I will be progressing down @PanagiotisKanavos's solution.
Edit your function and replace all & as &amp;.
This will remove the error. It happens because the XML parser cannot accept a bare &; it has to be encoded as &amp;.
Create FUNCTION [dbo].[split_stringss](
    @delimited NVARCHAR(MAX),
    @delimiter NVARCHAR(100)
) RETURNS @t TABLE (id INT IDENTITY(1,1), val NVARCHAR(MAX))
AS
BEGIN
    DECLARE @xml XML
    DECLARE @var NVARCHAR(MAX)
    DECLARE @var1 NVARCHAR(MAX)
    SET @var1 = REPLACE(@delimited,'&','&amp;')
    SET @xml = N'<t>' + REPLACE(@var1,@delimiter,'</t><t>') + '</t>'
    INSERT INTO @t(val)
    SELECT r.value('.','varchar(MAX)') as item
    FROM @xml.nodes('/t') as records(r)
    RETURN
END
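With that change, the input from the question that previously failed should split cleanly, e.g.:
select *
from dbo.split_stringss('Test1 & Test4,Test2,Test3', ',')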
First of all, SQL Server 2016 introduced a STRING_SPLIT TVF. You can write CROSS APPLY STRING_SPLIT(thatField,',') as items
In previous versions you still need to create a custom splitting function. There are various techniques. The fastest solution is to use a SQLCLR function.
In some cases, the second fastest is what you used -
convert the text to XML and select the nodes. A well known problem with this splitting technique is that illegal XML characters will break it, as you found out. That's why Aaron Bertrand doesn't consider this a generic splitter.
You can replace invalid characters with their encoded values, e.g. & with &amp;, but you have to be certain that your text will never already contain such encodings.
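Another option (a sketch, not from the original answer) is to let SQL Server perform the XML escaping for you instead of hand-rolling REPLACE calls:
DECLARE @delimited nvarchar(max) = N'Test1 & Test4,Test2,Test3';
DECLARE @escaped nvarchar(max) =
    (SELECT @delimited AS [text()] FOR XML PATH('')); -- '&' becomes '&amp;', '<' becomes '&lt;', etc.
SELECT @escaped;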
Perhaps you should investigate different techniques, like the Moden function, which can be faster in many situations :
CREATE FUNCTION dbo.SplitStrings_Moden
(
    @List NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
    WITH E1(N)        AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                           UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                           UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
         E2(N)        AS (SELECT 1 FROM E1 a, E1 b),
         E4(N)        AS (SELECT 1 FROM E2 a, E2 b),
         E42(N)       AS (SELECT 1 FROM E4 a, E2 b),
         cteTally(N)  AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1)))
                          ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
         cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
                          WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
    SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
    FROM cteStart s;
Personally I created and use a SQLCLR UDF.
Another option is to avoid splitting altogether and pass table-valued parameters from the client to the server. Or use a microORM like Dapper that can construct an IN (...) clause from a list of values, eg:
var products=connection.Query<Product>("select * from products where id in @ids",new {ids=myIdArray});
An ORM like EF that supports LINQ can also generate an IN clause :
var products = from product in dbContext.Products
where myIdArray.Contains(product.Id)
select product;

Paging, sorting and filtering in a stored procedure (SQL Server)

I was looking at different ways of writing a stored procedure to return a "page" of data. This was for use with the ASP ObjectDataSource, but it could be considered a more general problem.
The requirement is to return a subset of the data based on the usual paging parameters; startPageIndex and maximumRows, but also a sortBy parameter to allow the data to be sorted. Also there are some parameters passed in to filter the data on various conditions.
One common way to do this seems to be something like this:
[Method 1]
;WITH stuff AS (
    SELECT
        CASE
            WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
            WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
            WHEN @SortBy = ...
            ELSE ROW_NUMBER() OVER (ORDER BY whatever)
        END AS Row,
        .,
        .,
        .,
    FROM Table1
    INNER JOIN Table2 ...
    LEFT JOIN Table3 ...
    WHERE ... (lots of things to check)
)
SELECT *
FROM stuff
WHERE (Row > @startRowIndex)
  AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row
One problem with this is that it doesn't give the total count and generally we need another stored procedure for that. This second stored procedure has to replicate the parameter list and the complex WHERE clause. Not nice.
One solution is to append an extra column to the final select list, (SELECT COUNT(*) FROM stuff) AS TotalRows. This gives us the total but repeats it for every row in the result set, which is not ideal.
[Method 2]
An interesting alternative is given here (https://web.archive.org/web/20211020111700/https://www.4guysfromrolla.com/articles/032206-1.aspx) using dynamic SQL. He reckons that the performance is better because the CASE statement in the first solution drags things down. Fair enough, and this solution makes it easy to get the totalRows and slap it into an output parameter. But I hate coding dynamic SQL. All that 'bit of SQL ' + STR(@parm1) + ' bit more SQL' gubbins.
[Method 3]
The only way I can find to get what I want, without repeating code which would have to be synchronized, and keeping things reasonably readable is to go back to the "old way" of using a table variable:
DECLARE @stuff TABLE (Row INT, ...)
INSERT INTO @stuff
SELECT
    CASE
        WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
        WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
        WHEN @SortBy = ...
        ELSE ROW_NUMBER() OVER (ORDER BY whatever)
    END AS Row,
    .,
    .,
    .,
FROM Table1
INNER JOIN Table2 ...
LEFT JOIN Table3 ...
WHERE ... (lots of things to check)
SELECT *
FROM @stuff
WHERE (Row > @startRowIndex)
  AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row
(Or a similar method using an IDENTITY column on the table variable).
Here I can just add a SELECT COUNT on the table variable to get the totalRows and put it into an output parameter.
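That extra step is just one line, something like (assuming a @totalRows OUTPUT parameter on the procedure):
SELECT @totalRows = COUNT(*) FROM @stuff;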
I did some tests, and with a fairly simple version of the query (no sortBy and no filter), method 1 seems to come out on top (almost twice as quick as the other two). Then I decided to test with roughly the complexity I will actually need, and with the SQL in stored procedures. With this, method 1 takes nearly twice as long as the other two methods, which seems strange.
Is there any good reason why I shouldn't spurn CTEs and stick with method 3?
UPDATE - 15 March 2012
I tried adapting Method 1 to dump the page from the CTE into a temporary table so that I could extract the TotalRows and then select just the relevant columns for the resultset. This seemed to add significantly to the time (more than I expected). I should add that I'm running this on a laptop with SQL Server Express 2008 (all that I have available) but still the comparison should be valid.
I looked again at the dynamic SQL method. It turns out I wasn't really doing it properly (just concatenating strings together). I set it up as in the documentation for sp_executesql (with a parameter description string and parameter list) and it's much more readable. Also this method runs fastest in my environment. Why that should be still baffles me, but I guess the answer is hinted at in Hogan's comment.
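For reference, a minimal sketch of the sp_executesql pattern referred to above, using sys.objects as stand-in data and illustrative parameter names (not my actual procedure):
DECLARE @sql nvarchar(max) = N'
    ;WITH stuff AS (
        SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS Row
        FROM sys.objects
        WHERE type = @type
    )
    SELECT *
    FROM stuff
    WHERE Row > @startRowIndex
      AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
    ORDER BY Row;';
EXEC sp_executesql @sql,
    N'@type char(2), @startRowIndex int, @maximumRows int',
    @type = 'U', @startRowIndex = 0, @maximumRows = 20;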
I would most likely split the @SortBy argument into two, @SortColumn and @SortDirection, and use them like this:
…
ROW_NUMBER() OVER (
    ORDER BY CASE @SortColumn
                 WHEN 'Name' THEN Name
                 WHEN 'OtherName' THEN OtherName
                 …
             END *
             CASE @SortDirection
                 WHEN 'DESC' THEN -1
                 ELSE 1
             END
) AS Row
…
And this is how the TotalRows column could be defined (in the main select):
…
COUNT(*) OVER () AS TotalRows
…
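Put together against sample data (sys.objects here; note the CASE-times-sign trick only works when the chosen sort keys are numeric), it would look something like this sketch:
DECLARE @SortColumn varchar(20) = 'ObjectId', @SortDirection varchar(4) = 'DESC';
SELECT name,
    COUNT(*) OVER () AS TotalRows, -- total row count repeated on every row
    ROW_NUMBER() OVER (
        ORDER BY CASE @SortColumn
                     WHEN 'ObjectId' THEN object_id
                     ELSE schema_id -- fallback sort key
                 END *
                 CASE @SortDirection WHEN 'DESC' THEN -1 ELSE 1 END
    ) AS Row
FROM sys.objects;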
I would definitely want to do a combination of a temp table and NTILE for this sort of approach.
The temp table will allow you to do your complicated series of conditions just once. Because you're only storing the pieces you care about, it also means that when you start doing selects against it further in the procedure, it should have a smaller overall memory usage than if you ran the condition multiple times.
I like NTILE() for this better than ROW_NUMBER() because it's doing the work you're trying to accomplish for you, rather than having additional where conditions to worry about.
The example below is one based off a similar query I'm using as part of a research query; I have an ID I can use that I know will be unique in the results. Using an ID that was an identity column would also be appropriate here, though.
--DECLARES here would be stored procedure parameters
declare @pagesize int, @sortby varchar(25), @page int = 1;
--Create temp with all relevant columns; ID here could be an identity PK to help with paging query below
create table #temp (id int not null primary key clustered, status varchar(50), lastname varchar(100), startdate datetime);
--Insert into #temp based off of your complex conditions, but with no attempt at paging
insert into #temp
(id, status, lastname, startdate)
select id, status, lastname, startdate
from Table1 ...etc.
where ...complicated conditions
SET @pagesize = 50;
SET @page = 5;--OR CAST(@startRowIndex/@pagesize as int)+1
SET @sortby = 'name';
--Only use the id and count to use NTILE
;with paging(id, pagenum, totalrows) as
(
    select id,
        NTILE((SELECT COUNT(*) cnt FROM #temp)/@pagesize) OVER(ORDER BY CASE WHEN @sortby = 'NAME' THEN lastname ELSE convert(varchar(10), startdate, 112) END),
        cnt
    FROM #temp
    cross apply (SELECT COUNT(*) cnt FROM #temp) total
)
--Use the id to join back to main select
SELECT *
FROM paging
JOIN #temp ON paging.id = #temp.id
WHERE paging.pagenum = @page
--Don't need the drop in the procedure, included here for rerunnability
drop table #temp;
I generally prefer temp tables over table variables in this scenario, largely so that there are definite statistics on the result set you have. (Search for temp table vs table variable and you'll find plenty of examples as to why)
Dynamic SQL would be most useful for handling the sorting method. Using my example, you could do the main query in dynamic SQL and only pull the sort method you want to pull into the OVER().
The example above also does the total in each row of the return set, which as you mentioned is not ideal. You could, instead, have a @totalrows output variable in your procedure and pull it as well as the result set. That would save you the CROSS APPLY that I'm doing above in the paging CTE.
I would create one procedure to stage, sort, and paginate (using NTILE()) a staging table; and a second procedure to retrieve by page. This way you don't have to run the entire main query for each page.
This example queries AdventureWorks.HumanResources.Employee:
--------------------------------------------------------------------------
create procedure dbo.EmployeesByMartialStatus
    @MaritalStatus nchar(1)
    , @sort varchar(20)
as
-- Init staging table
if exists(
    select 1 from sys.objects o
    inner join sys.schemas s on s.schema_id=o.schema_id
        and s.name='Staging'
        and o.name='EmployeesByMartialStatus'
    where type='U'
)
    drop table Staging.EmployeesByMartialStatus;
-- Populate staging table with sort value
with s as (
    select *
        , sr=ROW_NUMBER()over(order by case @sort
            when 'NationalIDNumber' then NationalIDNumber
            when 'ManagerID' then ManagerID
            -- plus any other sort conditions
            else EmployeeID end)
    from AdventureWorks.HumanResources.Employee
    where MaritalStatus=@MaritalStatus
)
select *
into #temp
from s;
-- And now pages
declare @RowCount int; select @RowCount=COUNT(*) from #temp;
declare @PageCount int=ceiling(@RowCount/20); --assuming 20 lines/page
select *
    , Page=NTILE(@PageCount)over(order by sr)
into Staging.EmployeesByMartialStatus
from #temp;
go
--------------------------------------------------------------------------
-- procedure to retrieve selected pages
create procedure EmployeesByMartialStatus_GetPage
    @page int
as
declare @MaxPage int;
select @MaxPage=MAX(Page) from Staging.EmployeesByMartialStatus;
set @page=case when @page not between 1 and @MaxPage then 1 else @page end;
select EmployeeID,NationalIDNumber,ContactID,LoginID,ManagerID
    , Title,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours
    , CurrentFlag,rowguid,ModifiedDate
from Staging.EmployeesByMartialStatus
where Page=@page
GO
--------------------------------------------------------------------------
-- Usage
-- Load staging
exec dbo.EmployeesByMartialStatus 'M','NationalIDNumber';
-- Get pages 1 through n
exec dbo.EmployeesByMartialStatus_GetPage 1;
exec dbo.EmployeesByMartialStatus_GetPage 2;
-- ...etc (this would actually be a foreach loop, but that detail is omitted for brevity)
GO
I use this method of using EXEC():
-- SP parameters:
-- @query: Your query as an input parameter
-- @maximumRows: As number of rows per page
-- @startPageIndex: As number of page to filter
-- @sortBy: As a field name or field names with supporting DESC keyword
DECLARE @query nvarchar(max) = 'SELECT * FROM sys.Objects',
        @maximumRows int = 8,
        @startPageIndex int = 3,
        @sortBy as nvarchar(100) = 'name Desc'
SET @query = ';WITH CTE AS (' + @query + ')' +
    'SELECT *, (dt.pagingRowNo - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingPageNo' +
    ', pagingCountRow / ' + CAST(@maximumRows as nvarchar(10)) + ' As pagingCountPage ' +
    ', (dt.pagingRowNo - 1) % ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingRowInPage ' +
    'FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY ' + @sortBy + ') As pagingRowNo, COUNT(*) OVER () AS pagingCountRow ' +
    'FROM CTE) dt ' +
    'WHERE (dt.pagingRowNo - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 = ' + CAST(@startPageIndex as nvarchar(10))
EXEC(@query)
In the result set, after your query's own columns, I add some extra columns (which you can remove if you don't need them):
pagingRowNo : The row number
pagingCountRow : The total number of rows
pagingPageNo : The current page number
pagingCountPage : The total number of pages
pagingRowInPage : The row number that started with 1 in this page
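A quick worked example of that arithmetic with @maximumRows = 8: row 19 lands on page 3, as row 3 of that page:
DECLARE @maximumRows int = 8, @pagingRowNo int = 19;
SELECT (@pagingRowNo - 1) / @maximumRows + 1 AS pagingPageNo,   -- (19-1)/8 + 1 = 3
       (@pagingRowNo - 1) % @maximumRows + 1 AS pagingRowInPage; -- (19-1)%8 + 1 = 3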
