SQL Server: replace sequence of same characters inside Text Field (TSQL only) - sql-server

I have a text column varchar(4000) with text:
'aaabbaaacbaaaccc'
and I need to remove all duplicated chars, so only one character from each sequence is left:
'abacbac'
It must not be a function, procedure, or CLR/regex solution - only a true SQL SELECT.
Currently I'm thinking about a recursive WITH clause that replaces 'aa'->'a', 'bb'->'b', 'cc'->'c', cycling until all duplicated sequences of those chars have been replaced.
Do you have another solution, perhaps a more performant one?
PS: I searched through this site for different REPLACE examples - they didn't suit this case.
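For reference, a minimal sketch of the recursive-REPLACE idea described in the question, assuming the data only contains the characters a, b, and c as in the sample (each pass shrinks the string, so the shortest row is the fully deduplicated result):
;WITH dedup(s) AS
(
    SELECT CAST('aaabbaaacbaaaccc' AS varchar(4000))
    UNION ALL
    SELECT CAST(REPLACE(REPLACE(REPLACE(s,'aa','a'),'bb','b'),'cc','c') AS varchar(4000))
    FROM dedup
    WHERE s LIKE '%aa%' OR s LIKE '%bb%' OR s LIKE '%cc%'
)
SELECT TOP (1) s AS deduped
FROM dedup
ORDER BY LEN(s)
OPTION (MAXRECURSION 0);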

Assuming a table definition of
CREATE TABLE myTable(rowID INT IDENTITY(1,1), dupedchars NVARCHAR(4000))
and data..
INSERT INTO myTable
SELECT 'aaabbaaacbaaaccc'
UNION
SELECT 'abcdeeeeeffgghhaaabbbjdduuueueu999whwhwwwwwww'
this query meets your criteria
WITH Numbers(n) AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT (n + 1) AS n
    FROM Numbers
    WHERE n < 4000
)
SELECT rowid,
       ( SELECT CASE
                    WHEN SUBSTRING(dupedchars,n2.n,1) = SUBSTRING(dupedchars+' ',n2.n+1,1) THEN ''
                    ELSE SUBSTRING(dupedchars,n2.n,1)
                END AS [text()]
         FROM myTable t2, Numbers n2
         WHERE n2.n <= LEN(dupedchars)
           AND t.rowid = t2.rowid
         FOR XML PATH('')
       ) AS deduped
FROM myTable t
OPTION (MAXRECURSION 4000)
Output
rowid deduped
1 abacbac
2 abcdefghabjdueueu9whwhw
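The [text()] trick above relies on FOR XML PATH('') gluing the per-character CASE results back into a single string. A tiny standalone demo of just that concatenation step:
SELECT (SELECT v AS [text()]
        FROM (VALUES ('a'),('b'),('c')) t(v)
        FOR XML PATH('')) AS glued; -- returns 'abc'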

Related

Replace special chars with HTML entities

I have the following in table TABLE
id content
-------------------------------------
1 Hellö world, I äm text
2 ènd there äré many more chars
3 that are speçial in my dat£base
I now need to export these records into HTML files, using bcp:
set @command = 'bcp "select [content] from [TABLE] where [id] = ' +
@id + '" queryout "' + @filename + '.html" -S ' + @instance +
' -c -U ' + @username + ' -P ' + @password
exec xp_cmdshell @command, no_output
To make the output look correct, I need to first replace all special characters with their respective HTML entities (pseudo)
insert into [#temp_html] ..
replace(replace([content], 'ö', '&ouml;'), 'ä', '&auml;')
But by now, I have 30 nested replaces and it's starting to look insane.
After much searching, I found this post which uses an HTML conversion table, but it is too advanced for me to understand.
The table does not list the special chars themselves as they appear in my text (ö, à, etc.) but their Unicode hex values. Do I need to add them to the table to make the conversions that I need?
I am having trouble understanding how to update my script to replace all special chars. Can someone please show me a snippet of (pseudo) code?
One way to do that with a translation table is using a recursive cte to do the replaces, and one more cte to get only the last row of each translated value.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
id int,
content nvarchar(100)
)
INSERT INTO @T (id, content) VALUES
(1, 'Hellö world, I äm text'),
(2, 'ènd there äré many more chars'),
(3, 'that are speçial in my dat£base')
Then, create and populate the translation table (I don't know the HTML entities for these chars, so I've just used numbers [plus it's easier to see in the results]). Also, please note that this can be done using yet another cte in the chain.
DECLARE @Translations AS TABLE
(
str nchar(1),
replacement nvarchar(10)
)
INSERT INTO @Translations (str, replacement) VALUES
('ö', '-1-'),
('ä', '-2-'),
('è', '-3-'),
('ä', '-4-'),
('é', '-5-'),
('ç', '-6-'),
('£', '-7-')
Now, the first cte will do the replaces, and the second cte just adds a row_number so that for each id, the last value of lvl will get 1:
;WITH CTETranslations AS
(
SELECT id, content, 1 As lvl
FROM @T
UNION ALL
SELECT id, CAST(REPLACE(content, str, replacement) as nvarchar(100)), lvl+1
FROM CTETranslations
JOIN @Translations
ON content LIKE '%' + str + '%'
), cteNumberedTranslation AS
(
SELECT id, content, ROW_NUMBER() OVER(PARTITION BY Id ORDER BY lvl DESC) rn
FROM CTETranslations
)
Select from the second cte where rn = 1, I've joined the original table to show the source and translation side by side:
SELECT r.id, s.content, r.content
FROM @T s
JOIN cteNumberedTranslation r
ON s.Id = r.Id
WHERE rn = 1
ORDER BY Id
Results:
id content content
1 Hellö world, I äm text Hell-1- world, I -4-m text
2 ènd there äré many more chars -3-nd there -4-r-5- many more chars
3 that are speçial in my dat£base that are spe-6-ial in my dat-7-base
Please note that if your content has more than 100 special chars (i.e. needs more than 100 recursive replace steps), you will need to add the MAXRECURSION 0 hint to the final select:
SELECT r.id, s.content, r.content
FROM @T s
JOIN cteNumberedTranslation r
ON s.Id = r.Id
WHERE rn = 1
ORDER BY Id
OPTION ( MAXRECURSION 0 );
See a live demo on rextester.

Searching stored procedure and views but ignoring comments

I have a query as
select
definition
from
sys.objects so
join
sys.sql_modules ssmsp on so.[object_id] = ssmsp.[object_id]
where
so.type in ('v', 'p')
and
definition like '%exec%'
The results also pick up matches that occur inside comments. How can I avoid matching text that is commented out?
Is there any solution?
Thanks
I think this is going to be nearly impossible to achieve in a single query.
Bear in mind that [definition] has no formatting, no line breaks, etc.; the code is a single line (copy one and paste it into the editor).
If a comment starts with -- then where does it end? you have no way of knowing.
It is a little easier with /* because you can find the corresponding */ but there is still the added complication of multiple occurrences of the search string.
You might have a little more luck using PATINDEX and specifying a case-sensitive version of your collation (if you have a case-insensitive database), if, for example, you know you only want occurrences of EXEC and not "execute", e.g. WHERE PATINDEX('%EXEC%', definition COLLATE SQL_Latin1_General_CP1_CS_AS) > 0
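For example, a complete version of that filter against the catalog views from the question (the collation name assumes a SQL collation; adjust it to match your instance):
select so.[name], definition
from sys.objects so
join sys.sql_modules ssmsp on so.[object_id] = ssmsp.[object_id]
where so.type in ('v', 'p')
and patindex('%EXEC%', definition COLLATE SQL_Latin1_General_CP1_CS_AS) > 0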
First, a fast varchar(max) string "splitter": below is a hacked version of Jeff Moden's DelimitedSplit8K.
IF OBJECT_ID('dbo.DelimitedSplit2B','IF') IS NOT NULL DROP FUNCTION dbo.DelimitedSplit2B;
GO
CREATE FUNCTION dbo.DelimitedSplit2B
(
@pString varchar(max),
@pDelimiter char(1)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH L1(N) AS
(
SELECT N
FROM (VALUES
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(N)
), --216 values
cteTally(N) AS
(
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c CROSS JOIN L1 d
--216^4 = 2,176,782,336 rows: enough to handle 2,147,483,647 characters (the varchar(max) limit)
),
cteStart(N1) AS
(
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1)
OVER (ORDER BY s.N1) - 1),0)-s.N1,DATALENGTH(ISNULL(@pString,1))))
FROM cteStart s;
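A quick smoke test of the splitter, assuming the function above has been created (the sample string and delimiter are arbitrary):
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit2B('line one' + CHAR(10) + 'line two' + CHAR(10) + 'line three', CHAR(10));
-- returns three rows: 'line one', 'line two', 'line three'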
Next, the function to search your DDL:
create function dbo.SearchObjectDDLFor (@searchstring varchar(100), @maxLen int)
returns table as return
select objectName, lineNumber, lineText
from
(
select
objectName = ss.[name]+'.'+so.[name],
lineNumber = itemnumber,
lineText = substring(t.item, 1, isnull(nullif(charindex('--', t.item),0)-1, 8000)),
isLongComment =
sum
( -- running sum: +1 where a line opens a /* comment, -1 where it closes one
case when t.item like '/*%' then 1
when t.item like '*/%'then -1
else 0 end
) over (partition by so.[name] order by itemnumber)
from sys.objects so
join sys.sql_modules ssmsp on so.[object_id] = ssmsp.[object_id]
join sys.schemas ss on so.schema_id = ss.schema_id
cross apply dbo.delimitedSplit2B(definition, char(10))
cross apply (values (rtrim(ltrim(replace(item,char(13),''))))) t(item)
where so.type in ('v', 'p')
and len(definition) < isnull(@maxLen,100000) -- character limit is @maxLen (100K default)
) splitLines
where isLongComment = 0 and lineText not like '--%' and lineText <> '*/'
and lineText like '%'+@searchstring+'%';
This function:
Accepts an input string to search for (@searchstring)
Splits your objects into lines
Returns only the portions of the line not part of a comment
Filters the lines created in step 3 for ones that contain @searchstring, and returns the object name (schema.object), line number, and line text.
Caveats:
I just threw this together quickly, so forgive any errors.
A T-SQL splitter that accepts [n]varchar(max) will be slow. A CLR splitter would likely be faster, but we're not talking about millions of rows. That said, you can speed it up by filtering on definition size with @maxLen, which says "ignore any objects whose definition is longer than @maxLen characters." When NULL, it will search objects up to 100K characters long (but this can be adjusted).
This function addresses scenarios where a comment has "--" anywhere in a line, and scenarios where a comment is nested between "/*" and "*/" on separate lines.
A few scenarios which would require more coding to suppress the comments include:
select col1, /* skipping col2 for now */ col3, col4
and
/*********
comments here
*********/
Examples:
select * from dbo.SearchObjectDDLFor('nocount', NULL);
select * from dbo.SearchObjectDDLFor('nocount', 2000);
Results will look something like this (object name, line number and line text for each match).

Show/extract only the character(s) match in SQL Server query

How can I extract and show only the matching characters from a string in a column I am searching in SQL Server, if the position of the characters varies within the string?
Example input:
Mich%ael#
#Scott
Ran%dy
A#nder%son
Output:
%#
#
%
#%
I was only able to think of a query like
select
columnname
from
dbo.tablename with (noLock)
where
columnname like '%[%#]%'
but this would not strip out and show only the characters I want. I looked at the SUBSTRING() function, but it requires knowing the position of the character to be stripped.
If you don't want or can't use a UDF, consider the following:
Declare @YourTable table (SomeField varchar(50))
Insert Into @YourTable values
('Mich%ael#'),
('#Scott'),
('Ran%dy'),
('A#nder%son')
Select A.*
,Stripped = max(B.Value)
From @YourTable A
Cross Apply (
Select Value=Stuff((Select '' + String
From (
Select String= Substring(a.b, v.number+1, 1) From (select A.SomeField b) a
Join master..spt_values v on v.number < len(a.b)
Where v.type = 'P'
) A
Where String in ('%','#') --<<<< This is Your Key Filter
For XML Path ('')),1,0,'')
) B
Group By SomeField
Returns
SomeField Stripped
#Scott #
A#nder%son #%
Mich%ael# %#
Ran%dy %

With SQL Server, How can I query a table based on a delimited string as the criteria?

I have the following tables:
tbl_File:
FileID | Filename
-----------------
1 | test.jpg
and
tbl_Tag:
TagID | TagName
---------------
1 | Red
and
tbl_TagFile:
ID | TagID | FileID
-------------------
1 | 1 | 1
I need to pass a non-inclusive query against these tables. For example, imagine a list of checkboxes to select one or more tags, and then a search button. I need to pass the TagID's to the query as a PIPE delimited string, such as "1|2|5|"
The search results need to be non-inclusive, meaning a file must meet all the criteria: if 3 tags are selected, the results are to be files that have all 3 tags associated with them.
I think I've made this too complicated. I tried iterating over the tags using CHARINDEX and STUFF to work my way through the string, but it seems there must be an easier way.
I'd like to do this as a function... Such as
SELECT FileID, Filename
FROM tbl_Files
WHERE dbo.udf_FileExistswithTags(@Tags, FileID) = 1
Any efficient way to do this?
It doesn't sound from your example scenario that the actual "need" is to pass a pipe-delimited string. I would highly suggest abandoning that idea and using a Table Value Parameter in your stored procedure. This has numerous advantages in that you will not hit a datatype limit or a "number of parameters" limit that might occur with very large sets of criteria. Additionally it gets away from any need to run a (potentially very slow) UDF.
Split the string into tokens on the application side, and then insert each token as a row in the TVP. Example below:
Create the TVP type in your database:
CREATE TYPE [dbo].[FileNameType] AS TABLE
(
fileName varchar(1000)
)
On the application side, build your list of filename tokens into a recordset:
private static List<SqlDataRecord> BuildFileNameTokenRecords(IEnumerable<string> tokens)
{
var records = new List<SqlDataRecord>();
foreach (string token in tokens){
    var record = new SqlDataRecord(
        new SqlMetaData[]
        {
            // max length must match the TVP column definition (varchar(1000))
            new SqlMetaData("fileName", SqlDbType.VarChar, 1000),
        }
    );
    record.SetString(0, token); // set the row's value; without this every row would be empty
    records.Add(record);
}
return records;
}
Wherever you run your proc from (rough code here):
var records = BuildFileNameTokenRecords(listofstrings);
var sqlCmd = sqlDb.GetStoredProcCommand("FileExists");
sqlDb.AddInParameter(sqlCmd, "tvpFilenameTokens", SqlDbType.Structured, records);
ExecuteNonQuery(sqlCmd);
Filtering your select statement then simply becomes a matter of joining on the tokens in the table parameter. Something like this:
CREATE PROCEDURE dbo.FileExists
(
-- Put additional parameters here
@tvpFilenameTokens dbo.FileNameType READONLY
)
AS
BEGIN
SELECT FileID, Filename
FROM tbl_Files INNER JOIN @tvpFilenameTokens
ON tbl_Files.FileID = @tvpFilenameTokens.fileName
END
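If you want to exercise the proc without the application layer, a hypothetical T-SQL-side test could look like this (the token values are made up):
DECLARE @tokens dbo.FileNameType;
INSERT INTO @tokens (fileName) VALUES ('1'), ('2'), ('5');
EXEC dbo.FileExists @tvpFilenameTokens = @tokens;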
Here is an option that should scale. All of the functionality is available back to SQL Server 2005. It uses a CTE to separate the portion of the query that finds only the FileIDs that have all of the TagIDs passed in, and then that list of FileIDs is joined to the [File] table to get the details. It also uses an INNER JOIN instead of an IN list to match the TagID's.
Please note that the example below uses a SQLCLR splitter that is freely available in the SQL# library (which I wrote, but this function is in the Free version). The specific splitter used is not the important part; it should just be one that is either SQLCLR, an inline tally-table (like the one used in @wewesthemenace's answer), or the XML method. Just don't use a splitter based on a WHILE-loop or a recursive CTE.
---- TEST SETUP
DECLARE @File TABLE
(
FileID INT NOT NULL PRIMARY KEY,
[Filename] NVARCHAR(200) NOT NULL
);
DECLARE @TagFile TABLE
(
TagID INT NOT NULL,
FileID INT NOT NULL,
PRIMARY KEY (TagID, FileID)
);
INSERT INTO @File VALUES (1, 'File1.txt');
INSERT INTO @File VALUES (2, 'File2.txt');
INSERT INTO @File VALUES (3, 'File3.txt');
INSERT INTO @TagFile VALUES (1, 1);
INSERT INTO @TagFile VALUES (2, 1);
INSERT INTO @TagFile VALUES (5, 1);
INSERT INTO @TagFile VALUES (1, 2);
INSERT INTO @TagFile VALUES (2, 2);
INSERT INTO @TagFile VALUES (4, 2);
INSERT INTO @TagFile VALUES (1, 3);
INSERT INTO @TagFile VALUES (2, 3);
INSERT INTO @TagFile VALUES (5, 3);
INSERT INTO @TagFile VALUES (6, 3);
---- DONE WITH TEST SETUP
DECLARE @TagsToGet VARCHAR(100); -- this would be the proc input parameter
SET @TagsToGet = '1|2|5';
CREATE TABLE #Tags (TagID INT NOT NULL PRIMARY KEY);
DECLARE @NumTags INT;
INSERT INTO #Tags (TagID)
SELECT split.SplitVal
FROM SQL#.String_Split4k(@TagsToGet, '|', 1) split;
SET @NumTags = @@ROWCOUNT;
;WITH files AS
(
SELECT tf.FileID
FROM @TagFile tf
INNER JOIN #Tags tg
ON tg.TagID = tf.TagID
GROUP BY tf.FileID
HAVING COUNT(*) = @NumTags
)
SELECT fl.*
FROM @File fl
INNER JOIN files
ON files.FileID = fl.FileID
ORDER BY fl.[Filename] ASC;
DROP TABLE #Tags; -- don't need this if code above is placed in a proc
Results:
FileID Filename
1 File1.txt
3 File3.txt
Notes
As much as I love TVPs (and I do, when they are done correctly and used appropriately), I would say that they are a bit much for this type of small scale, single dimensional array scenario. There won't really be any performance gain over using a SQLCLR streaming TVF string splitter but it would require more app code and the additional User-Defined Table Type, which can't be updated without first dropping all procs that reference it. That doesn't happen all of the time, but needs to be considered in terms of long-term maintenance costs.
The JOIN between TagFile and the temporary table populated from the split operation should be much more efficient than using an IN list with a subquery for the split operation. An IN list is short-hand for all of the values in it to be their own OR conditions. Hence the JOIN is a fully set-based approach that lets the Query Optimizer do its thang.
The structure I used for the test @TagFile table only has the two relevant IDs in it: TagID and FileID. It does not have the ID field that I assume is an IDENTITY field on this table. Unless there is a very specific reason for needing that IDENTITY field, I would suggest removing it. It adds no inherent benefit, as the combination of TagID and FileID is a natural key (i.e. it is both NOT NULL and unique). And if the clustered PK of this table were simply those two fields, the JOIN to the temp table of those split-out TagIDs would be quite fast, even with millions of rows in TagFile.
One reason that this approach works so much better than trying to handle this via a function per FileID (outside of the obvious set-based is better than cursor-based reason) is that the list of TagIDs is the same for all files to be checked. So splitting that out more than one time is a waste of effort.
By not splitting the TagID list inline in the query I am able to capture the number of elements in that list with no additional effort. Hence this saves from needing to do a secondary calculation.
Here is a function called DelimitedSplit8K by Jeff Moden. This is used to split strings of length up to 8000. For more info, read this: http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATE FUNCTION [dbo].[DelimitedSplit8K](
@pString VARCHAR(8000), --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
@pDelimiter CHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (--10E+1 or 10 rows
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter
),
cteLen(N1, L1) AS(--==== Return start and length (for use in substring)
SELECT
s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
Your query would now be:
DECLARE @pString VARCHAR(8000) = '1|3|5'
SELECT
f.*
FROM tbl_File f
INNER JOIN tbl_TagFile tf ON tf.FileID = f.FileID
WHERE
tf.TagID IN(SELECT CAST(item AS INT) FROM dbo.DelimitedSplit8K(@pString, '|'))
GROUP BY f.FileID, f.FileName
HAVING COUNT(tf.ID) = (LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1)
The expression below counts the number of TagIDs in the parameter by counting the occurrences of the delimiter | and adding 1.
(LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1)
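A quick worked check of that formula, using the sample parameter '1|3|5':
DECLARE @pString VARCHAR(8000) = '1|3|5';
SELECT LEN(@pString) - LEN(REPLACE(@pString, '|', '')) + 1 AS TagCount; -- 5 - 3 + 1 = 3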
Here is an option that does not require UDF's.
It can be argued that this is also complicated.
DECLARE @TagList VARCHAR(50)
-- pass in this
SET @TagList = '1|3|6'
SELECT
FinalSet.FileID,
FinalSet.Tag,
FinalSet.TotalMatches
FROM
(
SELECT
tbl_TagFile.FileID,
tbl_TagFile.Tag,
COUNT(*) OVER(PARTITION BY tbl_TagFile.FileID) TotalMatches
FROM
(
SELECT 1 FileID, '1' Tag UNION ALL
SELECT 1 , '2' UNION ALL
SELECT 1 , '3' UNION ALL
SELECT 1 , '6' UNION ALL
SELECT 2 , '1' UNION ALL
SELECT 2 , '3'
) tbl_TagFile
INNER JOIN
(
SELECT tbl_Tag.Tag
FROM
(
SELECT '1' Tag UNION ALL
SELECT '2' UNION ALL
SELECT '3' UNION ALL
SELECT '4' UNION ALL
SELECT '5' UNION ALL
SELECT '6'
) tbl_Tag
WHERE '|' + @TagList + '|' LIKE '%|' + Tag + '|%'
) LimitedTagTable
ON LimitedTagTable.Tag = tbl_TagFile.Tag
) FinalSet
WHERE
FinalSet.TotalMatches = (LEN(@TagList) - LEN(REPLACE(@TagList,'|','')) + 1)
There's some complications in this around data types and indexes and stuff but you can see the concept - you are only getting the records that match your passed in string.
subtable LimitedTagTable is your tag list filtered by your input pipe delimited string
subtable FinalSet joins your limited tag list to your list of files
column TotalMatches works out how many tag matches your file had
Finally this line limits the output to those files that had enough matches:
FinalSet.TotalMatches = (LEN(@TagList) - LEN(REPLACE(@TagList,'|','')) + 1)
Please experiment with different inputs and datasets and see if it suits as I have made a number of assumptions.
I'm answering my own question, in hopes that someone can let me know if/how flawed it is. So far it seems to be working, but this is just early testing.
Function:
ALTER FUNCTION [dbo].[udf_FileExistsByTags]
(
    @FileID int
    ,@Tags nvarchar(max)
)
RETURNS bit
AS
BEGIN
    DECLARE @Exists bit = 0
    DECLARE @Count int = 0
    DECLARE @TagTable TABLE ( FileID int, TagID int )
    DECLARE @Tag int
    WHILE len(@Tags) > 0
    BEGIN
        SET @Tag = CAST(LEFT(@Tags, charindex('|', @Tags + '|') -1) as int)
        SET @Count = @Count + 1
        IF EXISTS (SELECT * FROM tbl_TagFile WHERE FileID = @FileID AND TagID = @Tag )
        BEGIN
            INSERT INTO @TagTable ( FileID, TagID ) VALUES ( @FileID, @Tag )
        END
        SET @Tags = STUFF(@Tags, 1, charindex('|', @Tags + '|'), '')
    END
    SET @Exists = CASE WHEN @Count = (SELECT COUNT(*) FROM @TagTable) THEN 1 ELSE 0 END
    RETURN @Exists
END
Then in the query:
SELECT * FROM tbl_File a WHERE dbo.udf_FileExistsByTags(a.FileID, @Tags) = 1
So now I'm looking for errors.
What do you think? Probably not very efficient; however, this search will be used only on a periodic basis.

Paging, sorting and filtering in a stored procedure (SQL Server)

I was looking at different ways of writing a stored procedure to return a "page" of data. This was for use with the ASP.NET ObjectDataSource, but it could be considered a more general problem.
The requirement is to return a subset of the data based on the usual paging parameters, startPageIndex and maximumRows, but also a sortBy parameter to allow the data to be sorted. Also there are some parameters passed in to filter the data on various conditions.
One common way to do this seems to be something like this:
[Method 1]
;WITH stuff AS (
SELECT
CASE
WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
WHEN @SortBy = ...
ELSE ROW_NUMBER() OVER (ORDER BY whatever)
END AS Row,
.,
.,
.,
FROM Table1
INNER JOIN Table2 ...
LEFT JOIN Table3 ...
WHERE ... (lots of things to check)
)
SELECT *
FROM stuff
WHERE (Row > @startRowIndex)
AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row
One problem with this is that it doesn't give the total count and generally we need another stored procedure for that. This second stored procedure has to replicate the parameter list and the complex WHERE clause. Not nice.
One solution is to append an extra column to the final select list, (SELECT COUNT(*) FROM stuff) AS TotalRows. This gives us the total but repeats it for every row in the result set, which is not ideal.
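As a sketch, that variant would change only the final SELECT of Method 1 (a CTE can be referenced both for the page and for the subquery total):
SELECT *,
(SELECT COUNT(*) FROM stuff) AS TotalRows -- total across the whole CTE, repeated on every row
FROM stuff
WHERE (Row > @startRowIndex)
AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row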
[Method 2]
An interesting alternative is given here (https://web.archive.org/web/20211020111700/https://www.4guysfromrolla.com/articles/032206-1.aspx) using dynamic SQL. He reckons that the performance is better because the CASE statement in the first solution drags things down. Fair enough, and this solution makes it easy to get the totalRows and slap it into an output parameter. But I hate coding dynamic SQL. All that 'bit of SQL ' + STR(@parm1) + ' bit more SQL' gubbins.
[Method 3]
The only way I can find to get what I want, without repeating code which would have to be synchronized, and keeping things reasonably readable is to go back to the "old way" of using a table variable:
DECLARE @stuff TABLE (Row INT, ...)
INSERT INTO @stuff
SELECT
CASE
WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
WHEN @SortBy = ...
ELSE ROW_NUMBER() OVER (ORDER BY whatever)
END AS Row,
.,
.,
.,
FROM Table1
INNER JOIN Table2 ...
LEFT JOIN Table3 ...
WHERE ... (lots of things to check)
SELECT *
FROM @stuff
WHERE (Row > @startRowIndex)
AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row
(Or a similar method using an IDENTITY column on the table variable).
Here I can just add a SELECT COUNT on the table variable to get the totalRows and put it into an output parameter.
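A minimal sketch of that count, assuming the procedure declares an OUTPUT parameter (here called @totalRows, a name of my choosing):
-- after the INSERT INTO @stuff completes:
SELECT @totalRows = COUNT(*) FROM @stuff; -- @totalRows int OUTPUT on the proc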
I did some tests, and with a fairly simple version of the query (no sortBy and no filter), method 1 seems to come out on top (almost twice as quick as the other 2). Then I tested with roughly the complexity I will actually need, and with the SQL in stored procedures. With this I get method 1 taking nearly twice as long as the other 2 methods. Which seems strange.
Is there any good reason why I shouldn't spurn CTEs and stick with method 3?
UPDATE - 15 March 2012
I tried adapting Method 1 to dump the page from the CTE into a temporary table so that I could extract the TotalRows and then select just the relevant columns for the resultset. This seemed to add significantly to the time (more than I expected). I should add that I'm running this on a laptop with SQL Server Express 2008 (all that I have available) but still the comparison should be valid.
I looked again at the dynamic SQL method. It turns out I wasn't really doing it properly (just concatenating strings together). I set it up as in the documentation for sp_executesql (with a parameter description string and parameter list) and it's much more readable. Also this method runs fastest in my environment. Why that should be so still baffles me, but I guess the answer is hinted at in Hogan's comment.
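For comparison, a minimal sp_executesql sketch of the properly parameterized form described above (the table, column, and parameter names are illustrative, not from the original procedure):
DECLARE @sql nvarchar(max) = N'
SELECT Name, ROW_NUMBER() OVER (ORDER BY Name) AS Row
FROM Table1
WHERE SomeColumn = @filter';
EXEC sp_executesql
@sql,
N'@filter int', -- parameter description string
@filter = 42; -- parameter list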
I would most likely split the @SortBy argument into two, @SortColumn and @SortDirection, and use them like this:
…
ROW_NUMBER() OVER (
ORDER BY CASE @SortColumn
WHEN 'Name' THEN Name
WHEN 'OtherName' THEN OtherName
…
END *
CASE @SortDirection
WHEN 'DESC' THEN -1
ELSE 1
END
) AS Row
…
And this is how the TotalRows column could be defined (in the main select):
…
COUNT(*) OVER () AS TotalRows
…
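Putting the two pieces together, a sketch of where that windowed total sits in Method 1 (one hard-coded sort column for brevity, with Table1 standing in for the question's joins and filters; the window function must sit inside the CTE, before the paging WHERE, or it would count only the page):
;WITH stuff AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY Name) AS Row,
        COUNT(*) OVER () AS TotalRows, -- counted before the paging filter
        Name
    FROM Table1
)
SELECT *
FROM stuff
WHERE (Row > @startRowIndex)
AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row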
I would definitely want to do a combination of a temp table and NTILE for this sort of approach.
The temp table will allow you to do your complicated series of conditions just once. Because you're only storing the pieces you care about, it also means that when you start doing selects against it further in the procedure, it should have a smaller overall memory usage than if you ran the condition multiple times.
I like NTILE() for this better than ROW_NUMBER() because it's doing the work you're trying to accomplish for you, rather than having additional where conditions to worry about.
The example below is one based off a similar query I'm using as part of a research query; I have an ID I can use that I know will be unique in the results. Using an ID that was an identity column would also be appropriate here, though.
--DECLARES here would be stored procedure parameters
declare @pagesize int, @sortby varchar(25), @page int = 1;
--Create temp with all relevant columns; ID here could be an identity PK to help with paging query below
create table #temp (id int not null primary key clustered, status varchar(50), lastname varchar(100), startdate datetime);
--Insert into #temp based off of your complex conditions, but with no attempt at paging
insert into #temp
(id, status, lastname, startdate)
select id, status, lastname, startdate
from Table1 ...etc.
where ...complicated conditions
SET @pagesize = 50;
SET @page = 5; --OR CAST(@startRowIndex/@pagesize as int)+1
SET @sortby = 'name';
--Only use the id and count to use NTILE
;with paging(id, pagenum, totalrows) as
(
select id,
NTILE((SELECT COUNT(*) cnt FROM #temp)/@pagesize) OVER(ORDER BY CASE WHEN @sortby = 'NAME' THEN lastname ELSE convert(varchar(10), startdate, 112) END),
cnt
FROM #temp
cross apply (SELECT COUNT(*) cnt FROM #temp) total
)
--Use the id to join back to main select
SELECT *
FROM paging
JOIN #temp ON paging.id = #temp.id
WHERE paging.pagenum = @page
--Don't need the drop in the procedure, included here for rerunnability
drop table #temp;
I generally prefer temp tables over table variables in this scenario, largely so that there are definite statistics on the result set you have. (Search for temp table vs table variable and you'll find plenty of examples as to why)
Dynamic SQL would be most useful for handling the sorting method. Using my example, you could do the main query in dynamic SQL and only pull the sort method you want to pull into the OVER().
The example above also does the total in each row of the return set, which as you mentioned was not ideal. You could, instead, have a @totalrows output variable in your procedure and pull it as well as the result set. That would save you the CROSS APPLY that I'm doing above in the paging CTE.
I would create one procedure to stage, sort, and paginate (using NTILE()) a staging table; and a second procedure to retrieve by page. This way you don't have to run the entire main query for each page.
This example queries AdventureWorks.HumanResources.Employee:
--------------------------------------------------------------------------
create procedure dbo.EmployeesByMartialStatus
@MaritalStatus nchar(1)
, @sort varchar(20)
as
-- Init staging table
if exists(
select 1 from sys.objects o
inner join sys.schemas s on s.schema_id=o.schema_id
and s.name='Staging'
and o.name='EmployeesByMartialStatus'
where type='U'
)
drop table Staging.EmployeesByMartialStatus;
-- Populate staging table with sort value
with s as (
select *
, sr=ROW_NUMBER()over(order by case @sort
when 'NationalIDNumber' then NationalIDNumber
when 'ManagerID' then ManagerID
-- plus any other sort conditions
else EmployeeID end)
from AdventureWorks.HumanResources.Employee
where MaritalStatus=@MaritalStatus
)
select *
into #temp
from s;
-- And now pages
declare @RowCount int; select @RowCount=COUNT(*) from #temp;
declare @PageCount int=ceiling(@RowCount/20.0); --assuming 20 lines/page
select *
, Page=NTILE(@PageCount)over(order by sr)
into Staging.EmployeesByMartialStatus
from #temp;
go
--------------------------------------------------------------------------
-- procedure to retrieve selected pages
create procedure EmployeesByMartialStatus_GetPage
@page int
as
declare @MaxPage int;
select @MaxPage=MAX(Page) from Staging.EmployeesByMartialStatus;
set @page=case when @page not between 1 and @MaxPage then 1 else @page end;
select EmployeeID,NationalIDNumber,ContactID,LoginID,ManagerID
, Title,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours
, CurrentFlag,rowguid,ModifiedDate
from Staging.EmployeesByMartialStatus
where Page=@page
GO
--------------------------------------------------------------------------
-- Usage
-- Load staging
exec dbo.EmployeesByMartialStatus 'M','NationalIDNumber';
-- Get pages 1 through n
exec dbo.EmployeesByMartialStatus_GetPage 1;
exec dbo.EmployeesByMartialStatus_GetPage 2;
-- ...etc (this would actually be a foreach loop, but that detail is omitted for brevity)
GO
I use this method, built around EXEC():
-- SP parameters:
-- @query: Your query as an input parameter
-- @maximumRows: As number of rows per page
-- @startPageIndex: As number of page to filter
-- @sortBy: As a field name or field names with supporting DESC keyword
DECLARE @query nvarchar(max) = 'SELECT * FROM sys.Objects',
@maximumRows int = 8,
@startPageIndex int = 3,
@sortBy as nvarchar(100) = 'name Desc'
SET @query = ';WITH CTE AS (' + @query + ')' +
'SELECT *, (dt.pagingRowNo - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingPageNo' +
', pagingCountRow / ' + CAST(@maximumRows as nvarchar(10)) + ' As pagingCountPage ' +
', (dt.pagingRowNo - 1) % ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingRowInPage ' +
'FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY ' + @sortBy + ') As pagingRowNo, COUNT(*) OVER () AS pagingCountRow ' +
'FROM CTE) dt ' +
'WHERE (dt.pagingRowNo - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 = ' + CAST(@startPageIndex as nvarchar(10))
EXEC(@query)
At the end of the result set, after your query's own columns, I add some extra paging columns that you can remove if you don't need them:
pagingRowNo : The row number
pagingCountRow : The total number of rows
pagingPageNo : The current page number
pagingCountPage : The total number of pages
pagingRowInPage : The row number that started with 1 in this page
