Searching stored procedures and views but ignoring comments - sql-server
I have a query as
select
definition
from
sys.objects so
join
sys.sql_modules ssmsp on so.[object_id] = ssmsp.[object_id]
where
so.type in ('v', 'p')
and
definition like '%exec%'
When this runs, it also returns matches that occur only inside comments. How can I avoid matching text that appears in comments?
Is there any solution?
Thanks
I think this is going to be nearly impossible to achieve in a single query.
Bear in mind that [definition] has no formatting, no line breaks, etc.; the code is effectively a single line (copy one and paste it into the editor).
If a comment starts with --, then where does it end? You have no way of knowing.
It is a little easier with /* because you can find the corresponding */, but there is still the added complication of multiple occurrences of the search string.
You might have a little more luck using PATINDEX and specifying a case-sensitive version of your collation (if you have a case-insensitive database), and, for example, if you know you only want occurrences of EXEC and not "execute": WHERE PATINDEX('%EXEC%', definition COLLATE SQL_Latin1_General_CP1_CS_AS) > 0
First, a fast varchar(max) string "splitter". Below is a hacked version of Jeff Moden's DelimitedSplit8K.
IF OBJECT_ID('dbo.DelimitedSplit2B','IF') IS NOT NULL DROP FUNCTION dbo.DelimitedSplit2B;
GO
CREATE FUNCTION dbo.DelimitedSplit2B
(
@pString varchar(max),
@pDelimiter char(1)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH L1(N) AS
(
SELECT N
FROM (VALUES
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(N)
), --216 values
cteTally(N) AS
(
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c CROSS JOIN L1 d
--2,176,782,336 rows: enough to handle 2,147,483,647 characters (the varchar(max) limit)
),
cteStart(N1) AS
(
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1)
OVER (ORDER BY s.N1) - 1),0)-s.N1,DATALENGTH(ISNULL(@pString,1))))
FROM cteStart s;
Next, the function to search your DDL:
create function dbo.SearchObjectDDLFor (@searchstring varchar(100), @maxLen int)
returns table as return
select objectName, lineNumber, lineText
from
(
select
objectName = ss.[name]+'.'+so.[name],
lineNumber = itemnumber,
lineText = substring(t.item, 1, isnull(nullif(charindex('--', t.item),0)-1, 8000)),
isLongComment =
sum
( -- running total: greater than 0 while inside a /* ... */ block
case when t.item like '/*%' then 1
when t.item like '*/%' then -1
else 0 end
) over (partition by so.[name] order by itemnumber)
from sys.objects so
join sys.sql_modules ssmsp on so.[object_id] = ssmsp.[object_id]
join sys.schemas ss on so.schema_id = ss.schema_id
cross apply dbo.delimitedSplit2B(definition, char(10))
cross apply (values (rtrim(ltrim(replace(item,char(13),''))))) t(item)
where so.type in ('v', 'p')
and len(definition) < isnull(@maxLen,100000) -- character limit is @maxLen (100K default)
) splitLines
where isLongComment = 0 and lineText not like '--%' and lineText <> '*/'
and lineText like '%'+@searchstring+'%';
This function:
Accepts an input string to search for (@searchstring)
Splits your objects into lines
Returns only the portions of the line not part of a comment
Filters the lines created in step 3 for ones that contain @searchstring, and returns the object name (schema.name), line number, and line text.
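The comment-handling logic in those steps is easier to see outside of SQL. This Python sketch (an illustration of the same running-sum and "--" trimming tricks, not the T-SQL itself) shows how lines inside block comments get flagged and how line-comment tails are dropped:

```python
def searchable_lines(definition: str):
    """Yield (line_number, text) for the portions of each line
    that are not inside a comment, mirroring the T-SQL function's logic."""
    depth = 0  # running sum: +1 on lines starting '/*', -1 on lines starting '*/'
    for n, raw in enumerate(definition.split("\n"), start=1):
        line = raw.replace("\r", "").strip()
        if line.startswith("/*"):
            depth += 1
        elif line.startswith("*/"):
            depth -= 1
        # keep only the part of the line before any '--' comment
        text = line.split("--", 1)[0]
        if depth == 0 and not line.startswith("--") and line != "*/" and text:
            yield n, text

ddl = "create proc p as\n/*\nexec inside comment\n*/\n-- exec in line comment\nexec other_proc"
hits = [(n, t) for n, t in searchable_lines(ddl) if "exec" in t]
```

Like the T-SQL, this only handles comments that start at the beginning of a line or run to the end of one; inline /* ... */ comments in the middle of a line slip through, which is the same caveat noted below.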
Caveats:
I just threw this together quickly, so forgive any errors.
A T-SQL splitter that accepts [n]varchar(max) will be slow. A CLR splitter would likely be faster, but we're not talking about millions of rows. That said, you can speed it up by limiting the size of the objects searched with @maxLen. @maxLen says "ignore any objects whose definition is longer than @maxLen characters." When NULL it will search objects up to 100K characters long (but this can be adjusted).
This function handles comments that begin with "--" anywhere in a line, and comments nested between "/*" and "*/" on separate lines.
A few scenarios which require more coding to suppress the comments include:
select col1, /* skipping col2 for now */ col3, col4
and
/*********
comments here
*********/
Examples:
select * from dbo.SearchObjectDDLFor('nocount', NULL);
select * from dbo.SearchObjectDDLFor('nocount', 2000);
Results will look something like this:
Related
Identify how many times a string exists on a stored procedure in SQL Server
I have to search for occurrences of a particular string inside all available stored procedures in SQL Server. I know that we can get this by using the query below.

SELECT OBJECT_NAME(OBJECT_ID) PrcName
FROM sys.sql_modules
WHERE DEFINITION LIKE '%SearchStr%'

But is there a way to find out how many times the particular string appears in each stored procedure? This is for estimating the effort of modifying the stored procedures. Any help will be much appreciated.
This will work as tested:

;WITH cte as
(
    SELECT OBJECT_NAME(OBJECT_ID) PrcName, OBJECT_ID
    FROM sys.sql_modules
    WHERE DEFINITION LIKE '%tblNDT%'
)
select t1.PrcName,
       (LEN(Definition) - LEN(replace(Definition,'tblNDT',''))) / LEN('tblNDT') Cnt
from cte t1
INNER JOIN sys.sql_modules t2 on t1.object_id = t2.object_id
An easy way of checking how many times something occurs is to take the initial length, replace your string with blanks, recheck the length, and divide by the length of your string:

DECLARE @sentence VARCHAR(100)
DECLARE @word VARCHAR(100)
SET @word = 'Cool'
SET @sentence = 'This cool sentence is really cool. Cool!'
DECLARE @wordlen INT = (SELECT LEN(@word))

--Original sentence and length
SELECT @sentence AS sentence
SELECT LEN(@sentence) AS origsentence

--With word removed
SELECT REPLACE(@sentence, 'cool', '') AS shortenedsentence
SELECT LEN(REPLACE(@sentence, 'cool', '')) AS shortenedlen
SELECT LEN(@sentence) - LEN(REPLACE(@sentence, 'cool', '')) AS diffinlength
SELECT (LEN(@sentence) - LEN(REPLACE(@sentence, 'cool', ''))) / @wordlen AS occurrences

I have seen this work in some cases and not in others. If you have a bunch of comments that contain the same string, it will count incorrectly.
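The length-difference arithmetic above is language-agnostic. As a sanity check, here is the same counting trick sketched in Python (an illustration only; lower-casing stands in for the case-insensitive REPLACE of a typical SQL Server collation):

```python
def count_occurrences(sentence: str, word: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of `word`
    via the length-difference trick: removing every occurrence shortens
    the string by len(word) per hit."""
    s, w = sentence.lower(), word.lower()
    return (len(s) - len(s.replace(w, ""))) // len(w)

count_occurrences("This cool sentence is really cool. Cool!", "Cool")  # 3 occurrences
```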
I have found a solution for this.

DECLARE @cnt AS INT = 1
DECLARE @SearchStr VARCHAR(MAX) = 'SearchText'

;WITH CTE_SearchStr1 AS
(
    SELECT @cnt Cnt, @SearchStr SearchStr
    UNION ALL
    SELECT Cnt + 1, @SearchStr+'%'+SearchStr FROM CTE_SearchStr1
),
CTE_SearchStr2 AS
(
    SELECT TOP 100 * FROM CTE_SearchStr1
)
SELECT OBJECT_NAME(OBJECT_ID) ObjectName, MAX(cnt) cnt
FROM sys.sql_modules a
INNER JOIN CTE_SearchStr2 b ON a.definition LIKE '%'+b.SearchStr+'%'
GROUP BY OBJECT_NAME(OBJECT_ID)
ORDER BY 2 DESC

The only problem with the above query is that I cannot search for more than 100 occurrences. It will throw the exception below:

Msg 530, Level 16, State 1, Line 3
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

In my scenario the number of occurrences is less than 100, but is there a way to overcome this error?
TSQL/SQL Server - table function to parse/split delimited string to multiple/separate columns
So, my first post is less a question and more a statement! Sorry. I needed to convert delimited strings stored in VarChar table columns to multiple/separate columns for the same record. (It's COTS software, so please don't bother telling me how the table is designed wrong.) After searching the internet ad nauseam for how to create a generic single-line call to do that, and finding lots of how not to do that, I created my own. (The name is not real creative.)

Returns: A table with sequentially numbered/named columns starting with [Col1]. If an input value is not provided, then an empty string is returned. If fewer than 32 values are provided, all past the last value are returned as null. If more than 32 values are provided, they are ignored.

Prerequisites: A number/tally table (luckily, our database already contained 'dbo.numbers').

Assumptions: Not more than 32 delimited values. (If you need more, change "WHERE tNumbers.Number BETWEEN 1 AND XXX", and add more prenamed columns ",[Col33]...,[ColXXX]".)

Issues: The very first column always gets populated, even if @InputString is NULL.

--======================================================================
--SMOZISEK 2017/09 CREATED
--======================================================================
CREATE FUNCTION dbo.fStringToPivotTable
(
     @InputString VARCHAR(8000)
    ,@Delimiter   VARCHAR(30) = ','
)
RETURNS TABLE AS RETURN
WITH cteElements AS
(
    SELECT ElementNumber = ROW_NUMBER() OVER(PARTITION BY @InputString ORDER BY (SELECT 0))
          ,ElementValue  = NodeList.NodeElement.value('.','VARCHAR(1022)')
    FROM (SELECT TRY_CONVERT(XML,CONCAT('<X>',REPLACE(@InputString,@Delimiter,'</X><X>'),'</X>')) AS InputXML) AS InputTable
    CROSS APPLY InputTable.InputXML.nodes('/X') AS NodeList(NodeElement)
)
SELECT PivotTable.*
FROM
(
    SELECT ColumnName  = CONCAT('Col',tNumbers.Number)
          ,ColumnValue = tElements.ElementValue
    FROM DBO.NUMBERS AS tNumbers --DEPENDENT ON ANY EXISTING NUMBER/TALLY TABLE!!!
    LEFT JOIN cteElements AS tElements ON tNumbers.Number = tElements.ElementNumber
    WHERE tNumbers.Number BETWEEN 1 AND 32
) AS XmlSource
PIVOT
(
    MAX(ColumnValue)
    FOR ColumnName IN ([Col1] ,[Col2] ,[Col3] ,[Col4] ,[Col5] ,[Col6] ,[Col7] ,[Col8]
                      ,[Col9] ,[Col10],[Col11],[Col12],[Col13],[Col14],[Col15],[Col16]
                      ,[Col17],[Col18],[Col19],[Col20],[Col21],[Col22],[Col23],[Col24]
                      ,[Col25],[Col26],[Col27],[Col28],[Col29],[Col30],[Col31],[Col32])
) AS PivotTable;
GO

Test:

SELECT * FROM dbo.fStringToPivotTable ('|Height|Weight||Length|Width||Color|Shade||Up|Down||Top|Bottom||Red|Blue|','|');

Usage:

SELECT 1 AS ID,'Title^FirstName^MiddleName^LastName^Suffix' AS Name INTO #TempTable
UNION SELECT 2,'Mr.^Scott^A.^Mozisek^Sr.'
UNION SELECT 3,'Ms.^Jane^Q.^Doe^'
UNION SELECT 5,NULL
UNION SELECT 7,'^Betsy^^Ross^';

SELECT SourceTable.*
      ,ChildTable.Col1 AS ColTitle
      ,ChildTable.Col2 AS ColFirst
      ,ChildTable.Col3 AS ColMiddle
      ,ChildTable.Col4 AS ColLast
      ,ChildTable.Col5 AS ColSuffix
FROM #TempTable AS SourceTable
OUTER APPLY dbo.fStringToPivotTable(SourceTable.Name,'^') AS ChildTable;

No, I have not tested any plan (I just needed it to work). Oh, yeah: SQL Server 2012 (12.0 SP2). Comments? Corrections? Enhancements?
Here is my TVF. Easy to expand up to the 32 (the pattern is pretty clear). This is straight XML without the cost of the PIVOT.

Example - notice the OUTER APPLY. Use CROSS APPLY to exclude NULLs.

Select A.ID
      ,B.*
From #TempTable A
Outer Apply [dbo].[tvf-Str-Parse-Row](A.Name,'^') B

The UDF, if interested:

CREATE FUNCTION [dbo].[tvf-Str-Parse-Row] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
    Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
          ,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
          ,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
          ,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
          ,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
          ,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
          ,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
          ,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
          ,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
    From (Select Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
    Where @String is not null
)
--Thanks Shnugo for making this XML safe
--Select * from [dbo].[tvf-Str-Parse-Row]('Dog,Cat,House,Car',',')
--Select * from [dbo].[tvf-Str-Parse-Row]('John <test> Cappelletti',' ')
With SQL Server, How can I query a table based on a delimited string as the criteria?
I have the following tables:

tbl_File:
FileID | Filename
-----------------
1      | test.jpg

tbl_Tag:
TagID | TagName
---------------
1     | Red

tbl_TagFile:
ID | TagID | FileID
-------------------
1  | 1     | 1

I need to pass a non-inclusive query against these tables. For example, imagine a list of checkboxes to select one or more tags, and then a search button. I need to pass the TagIDs to the query as a pipe-delimited string, such as "1|2|5|". The search results need to be non-inclusive, meaning they must meet all the criteria: if 3 tags are selected, the results are to be files that have all 3 tags associated with them. I think I've made this too complicated, but I tried iterating over the tags using CHARINDEX and such to work my way through the string, and it seems there must be an easier way. I'd like to do this as a function, such as:

SELECT FileID, Filename
FROM tbl_Files
WHERE dbo.udf_FileExistswithTags(@Tags, FileID) = 1

Any efficient way to do this?
It doesn't sound from your example scenario that the actual "need" is to pass a pipe-delimited string. I would highly suggest abandoning that idea and using a Table-Valued Parameter in your stored procedure. This has numerous advantages in that you will not hit a datatype limit or a "number of parameters" limit that might occur with very large sets of criteria. Additionally it gets away from any need to run a (potentially very slow) UDF.

Split the string into tokens on the application side, and then insert each token as a row in the TVP. Example below:

Create the TVP type in your database:

CREATE TYPE [dbo].[FileNameType] AS TABLE
(
    fileName varchar(1000)
)

On the application side, build your list of filename tokens into a recordset:

private static List<SqlDataRecord> BuildFileNameTokenRecords(IEnumerable<string> tokens)
{
    var records = new List<SqlDataRecord>();
    foreach (string token in tokens)
    {
        var record = new SqlDataRecord(
            new SqlMetaData[]
            {
                new SqlMetaData("fileName", SqlDbType.VarChar, 1000),
            }
        );
        record.SetString(0, token); // populate the fileName column
        records.Add(record);
    }
    return records;
}

Wherever you run your proc from (rough code here):

var records = BuildFileNameTokenRecords(listofstrings);
var sqlCmd = sqlDb.GetStoredProcCommand("FileExists");
sqlDb.AddInParameter(sqlCmd, "tvpFilenameTokens", SqlDbType.Structured, records);
ExecuteNonQuery(sqlCmd);

Filtering your select statement then simply becomes a matter of joining on the tokens in the table parameter. Something like this:

CREATE PROCEDURE dbo.FileExists
(
    -- Put additional parameters here
    @tvpFilenameTokens dbo.FileNameType READONLY
)
AS
BEGIN
    SELECT FileID, Filename
    FROM tbl_Files
    INNER JOIN @tvpFilenameTokens tokens
        ON tbl_Files.FileID = tokens.fileName
END
Here is an option that should scale. All of the functionality is available back to SQL Server 2005. It uses a CTE to separate the portion of the query that finds only the FileIDs that have all of the TagIDs passed in, and then that list of FileIDs is joined to the [File] table to get the details. It also uses an INNER JOIN instead of an IN list to match the TagIDs.

Please note that the example below uses a SQLCLR splitter that is freely available in the SQL# library (which I wrote, but this function is in the Free version). The specific splitter used is not the important part; it should just be one that is either SQLCLR, an inline tally table (like the one used in @wewesthemenace's answer), or the XML method. Just don't use a splitter based on a WHILE loop or a recursive CTE.

---- TEST SETUP
DECLARE @File TABLE
(
    FileID INT NOT NULL PRIMARY KEY,
    [Filename] NVARCHAR(200) NOT NULL
);
DECLARE @TagFile TABLE
(
    TagID INT NOT NULL,
    FileID INT NOT NULL,
    PRIMARY KEY (TagID, FileID)
);
INSERT INTO @File VALUES (1, 'File1.txt');
INSERT INTO @File VALUES (2, 'File2.txt');
INSERT INTO @File VALUES (3, 'File3.txt');
INSERT INTO @TagFile VALUES (1, 1);
INSERT INTO @TagFile VALUES (2, 1);
INSERT INTO @TagFile VALUES (5, 1);
INSERT INTO @TagFile VALUES (1, 2);
INSERT INTO @TagFile VALUES (2, 2);
INSERT INTO @TagFile VALUES (4, 2);
INSERT INTO @TagFile VALUES (1, 3);
INSERT INTO @TagFile VALUES (2, 3);
INSERT INTO @TagFile VALUES (5, 3);
INSERT INTO @TagFile VALUES (6, 3);
---- DONE WITH TEST SETUP

DECLARE @TagsToGet VARCHAR(100); -- this would be the proc input parameter
SET @TagsToGet = '1|2|5';

CREATE TABLE #Tags (TagID INT NOT NULL PRIMARY KEY);
DECLARE @NumTags INT;

INSERT INTO #Tags (TagID)
    SELECT split.SplitVal
    FROM SQL#.String_Split4k(@TagsToGet, '|', 1) split;
SET @NumTags = @@ROWCOUNT;

;WITH files AS
(
    SELECT tf.FileID
    FROM @TagFile tf
    INNER JOIN #Tags tg ON tg.TagID = tf.TagID
    GROUP BY tf.FileID
    HAVING COUNT(*) = @NumTags
)
SELECT fl.*
FROM @File fl
INNER JOIN files ON files.FileID = fl.FileID
ORDER BY fl.[Filename] ASC;

DROP TABLE #Tags; -- don't need this if code above is placed in a proc

Results:

FileID  Filename
1       File1.txt
3       File3.txt

Notes

As much as I love TVPs (and I do, when they are done correctly and used appropriately), I would say that they are a bit much for this type of small-scale, single-dimensional array scenario. There won't really be any performance gain over using a SQLCLR streaming TVF string splitter, but it would require more app code and an additional User-Defined Table Type, which can't be updated without first dropping all procs that reference it. That doesn't happen all of the time, but it needs to be considered in terms of long-term maintenance costs.

The JOIN between TagFile and the temporary table populated from the split operation should be much more efficient than using an IN list with a subquery for the split operation. An IN list is shorthand for all of the values in it to be their own OR conditions. Hence the JOIN is a fully set-based approach that lets the Query Optimizer do its thang.

The structure I used for the test @TagFile table only has the two relevant IDs in it: TagID and FileID. It does not have the ID field that I assume is an IDENTITY field on this table. Unless there is a very specific reason for needing that IDENTITY field, I would suggest removing it. It adds no inherent benefit, as the combination of TagID and FileID is a natural key (i.e. it is both NOT NULL and unique). And if the clustered PK of this table were simply those two fields, the JOIN to the temp table of the split-out TagIDs would be quite fast, even with millions of rows in TagFile.

One reason that this approach works so much better than trying to handle this via a function per FileID (outside of the obvious set-based-is-better-than-cursor-based reason) is that the list of TagIDs is the same for all files to be checked, so splitting that out more than one time is a waste of effort.
By not splitting the TagID list inline in the query I am able to capture the number of elements in that list with no additional effort. Hence this saves from needing to do a secondary calculation.
Here is a function called DelimitedSplit8K by Jeff Moden. It is used to split strings of length up to 8000. For more info, read this: http://www.sqlservercentral.com/articles/Tally+Table/72993/

CREATE FUNCTION [dbo].[DelimitedSplit8K](
    @pString VARCHAR(8000), --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
    @pDelimiter CHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (--10E+1 or 10 rows
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
    SELECT 1 UNION ALL
    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter
),
cteLen(N1, L1) AS (--==== Return start and length (for use in substring)
    SELECT s.N1,
           ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000)
    FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
       Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l

Your query would now be:

DECLARE @pString VARCHAR(8000) = '1|3|5'

SELECT f.*
FROM tbl_File f
INNER JOIN tbl_TagFile tf ON tf.FileID = f.FileID
WHERE tf.TagID IN (SELECT CAST(item AS INT) FROM dbo.DelimitedSplit8K(@pString, '|'))
GROUP BY f.FileID, f.FileName
HAVING COUNT(tf.ID) = (LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1)

The expression below counts the number of TagIDs in the parameter by counting the occurrences of the delimiter | and adding 1:

(LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1)
Here is an option that does not require UDFs. It can be argued that this is also complicated.

DECLARE @TagList VARCHAR(50) -- pass in this
SET @TagList = '1|3|6'

SELECT FinalSet.FileID, FinalSet.Tag, FinalSet.TotalMatches
FROM
(
    SELECT tbl_TagFile.FileID,
           tbl_TagFile.Tag,
           COUNT(*) OVER(PARTITION BY tbl_TagFile.FileID) TotalMatches
    FROM
    (
        SELECT 1 FileID, '1' Tag UNION ALL
        SELECT 1, '2' UNION ALL
        SELECT 1, '3' UNION ALL
        SELECT 1, '6' UNION ALL
        SELECT 2, '1' UNION ALL
        SELECT 2, '3'
    ) tbl_TagFile
    INNER JOIN
    (
        SELECT tbl_Tag.Tag
        FROM
        (
            SELECT '1' Tag UNION ALL
            SELECT '2' UNION ALL
            SELECT '3' UNION ALL
            SELECT '4' UNION ALL
            SELECT '5' UNION ALL
            SELECT '6'
        ) tbl_Tag
        WHERE '|' + @TagList + '|' LIKE '%|' + Tag + '|%'
    ) LimitedTagTable ON LimitedTagTable.Tag = tbl_TagFile.Tag
) FinalSet
WHERE FinalSet.TotalMatches = (LEN(@TagList) - LEN(REPLACE(@TagList,'|','')) + 1)

There are some complications in this around data types and indexes and such, but you can see the concept: you are only getting the records that match your passed-in string.

Subtable LimitedTagTable is your tag list filtered by your input pipe-delimited string.
Subtable FinalSet joins your limited tag list to your list of files.
Column TotalMatches works out how many tag matches your file had.
Finally, this line limits the output to those files that had enough matches:

FinalSet.TotalMatches = (LEN(@TagList) - LEN(REPLACE(@TagList,'|','')) + 1)

Please experiment with different inputs and datasets and see if it suits, as I have made a number of assumptions.
I'm answering my own question, in hopes that someone can let me know if/how flawed it is. So far it seems to be working, but this is just early testing.

Function:

ALTER FUNCTION [dbo].[udf_FileExistsByTags]
(
    @FileID int
    ,@Tags nvarchar(max)
)
RETURNS bit
AS
BEGIN
    DECLARE @Exists bit = 0
    DECLARE @Count int = 0
    DECLARE @TagTable TABLE ( FileID int, TagID int )
    DECLARE @Tag int

    WHILE len(@Tags) > 0
    BEGIN
        SET @Tag = CAST(LEFT(@Tags, charindex('|', @Tags + '|') - 1) as int)
        SET @Count = @Count + 1
        IF EXISTS (SELECT * FROM tbl_FileTag WHERE FileID = @FileID AND TagID = @Tag)
        BEGIN
            INSERT INTO @TagTable ( FileID, TagID ) VALUES ( @FileID, @Tag )
        END
        SET @Tags = STUFF(@Tags, 1, charindex('|', @Tags + '|'), '')
    END

    SET @Exists = CASE WHEN @Count = (SELECT COUNT(*) FROM @TagTable) THEN 1 ELSE 0 END

    RETURN @Exists
END

Then in the query:

SELECT *
FROM tbl_File a
WHERE dbo.udf_FileExistsByTags(a.FileID, @Tags) = 1

So now I'm looking for errors. What do you think? Probably not very efficient, but this search will only be used on a periodic basis.
get character only string from another string in sql server
I am looking for a solution to get a character-only string extracted from another string. I need only the first 4 "characters only" from the other string. The restriction here is that the source string may contain spaces, special characters, numbers, etc., and may be less than 4 characters. For example, I should get:

"NAGP" if the source string is "Nagpur District"
"ILLF" if the source string is "Ill Fated"
"RAJU" if the source string is "RA123 *JU23"
"MAC" if the source string is "MAC"

Any help is greatly appreciated. Thanks for sharing your time and wisdom.
You can use the answer to the question below and add a SUBSTRING call to get a value of the desired length: How to strip all non-alphabetic characters from string in SQL Server?

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
    Return @Temp
End

Use it like:

Select SUBSTRING(dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl'), 1, 4);

Here SUBSTRING is used to get a string of length 4 from the returned value.
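The strip-then-substring approach is easy to sanity-check against the asker's examples. Here is the same logic sketched in Python (an illustration only; the upper-casing is added to match the asker's expected output, which the T-SQL above does not do):

```python
def first_alpha_chars(source: str, length: int = 4) -> str:
    """Keep only alphabetic characters, then take the first `length` of them,
    upper-cased to match the expected results in the question."""
    letters = "".join(ch for ch in source if ch.isalpha())
    return letters[:length].upper()

for s in ("Nagpur District", "Ill Fated", "RA123 *JU23", "MAC"):
    print(first_alpha_chars(s))  # NAGP, ILLF, RAJU, MAC
```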
^([a-zA-Z])[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?

You can try this. Grab the captures or groups. See demo: http://regex101.com/r/rQ6mK9/42
A bit late to the party here, but as a general rule I despise all functions with BEGIN .. END; they almost never perform well, and this covers all scalar functions (until Microsoft implements inline scalar expressions). As such, whenever I see one I look for an alternative that offers similar reusability. In this case the query can be converted to an inline table-valued function:

CREATE FUNCTION dbo.RemoveNonAlphaCharactersTVF (@String NVARCHAR(1000), @Length INT)
RETURNS TABLE
AS
RETURN
(
    WITH E1 (N) AS
    (
        SELECT 1
        FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) n (N)
    ),
    E2 (N) AS (SELECT 1 FROM E1 CROSS JOIN E1 AS E2),
    N (Number) AS (SELECT TOP (LEN(@String)) ROW_NUMBER() OVER(ORDER BY E1.N) FROM E2 CROSS JOIN E1)
    SELECT Result =
    (
        SELECT TOP (ISNULL(@Length, 1000)) SUBSTRING(@String, n.Number, 1)
        FROM N
        WHERE SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
        ORDER BY Number
        FOR XML PATH('')
    )
);

All this does is use a list of numbers to expand the string out into rows of single characters, e.g. RA123 *JU23T becomes:

Letter
------
R
A
1
2
3
*
J
U
2
3
T

The rows that are not alphabetic are then removed by the where clause:

WHERE SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'

Leaving:

Letter
------
R
A
J
U
T

The @Length parameter then limits the characters (in your case this would be 4), and the string is rebuilt using XML concatenation. I would usually use FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') for XML concatenation to allow for XML characters, but since I know there are none I haven't bothered, as it is additional overhead.
Running some tests on this with a sample table of 1,000,000 rows:

CREATE TABLE dbo.T (String NVARCHAR(1000));

INSERT T (String)
SELECT TOP 1000000 t.String
FROM (VALUES ('Nagpur District'), ('Ill Fated'), ('RA123 *JU23'), ('MAC')) t (String)
CROSS JOIN sys.all_objects a
CROSS JOIN sys.all_objects b
ORDER BY a.object_id;

Then comparing the scalar and the inline UDFs (called as follows):

SELECT COUNT(SUBSTRING(dbo.RemoveNonAlphaCharacters(t.String), 1, 4)) FROM T t;

SELECT COUNT(tvf.Result)
FROM T t
CROSS APPLY dbo.RemoveNonAlphaCharactersTVF (t.String, 4) AS tvf;

Over 15 test runs (probably not enough for an accurate figure, but enough to paint the picture) the average execution time for the scalar UDF was 11.824s, and for the inline TVF 1.658s, so approximately 85% faster.
SQL Server: replace sequence of same characters inside Text Field (TSQL only)
I have a text column varchar(4000) with text 'aaabbaaacbaaaccc', and I need to remove all duplicated chars so that only one from each sequence is left: 'abacbac'. It should not be a function, procedure, or CLR/Regex solution; only a true SQL select. Currently I am thinking about using a recursive WITH clause that replaces 'aa'->'a', 'bb'->'b', 'cc'->'c', cycling until all duplicated sequences of those chars have been replaced. Do you have another solution, perhaps a more performant one? PS: I searched through this site for different replace examples; they didn't suit this case.
Assuming a table definition of:

CREATE TABLE myTable(rowID INT IDENTITY(1,1), dupedchars NVARCHAR(4000))

and data:

INSERT INTO myTable
SELECT 'aaabbaaacbaaaccc'
UNION
SELECT 'abcdeeeeeffgghhaaabbbjdduuueueu999whwhwwwwwww'

this query meets your criteria:

WITH Numbers(n) AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT (n + 1) AS n FROM Numbers WHERE n < 4000
)
SELECT rowid,
(
    SELECT CASE
               WHEN SUBSTRING(dupedchars,n2.n,1) = SUBSTRING(dupedchars+' ',n2.n+1,1) THEN ''
               ELSE SUBSTRING(dupedchars,n2.n,1)
           END AS [text()]
    FROM myTable t2, Numbers n2
    WHERE n2.n <= LEN(dupedchars) AND t.rowid = t2.rowid
    FOR XML path('')
) AS deduped
FROM myTable t
OPTION(MAXRECURSION 4000)

Output:

rowid  deduped
1      abacbac
2      abcdefghabjdueueu9whwhw
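The per-character comparison the FOR XML PATH trick performs (emit a character only when it differs from the next one, using a trailing space as a sentinel) can be sketched in Python to verify the expected output (an illustration of the logic only, not a substitute for the SQL):

```python
def collapse_runs(s: str) -> str:
    """Emit each character only when it differs from the next one,
    mirroring the SUBSTRING comparison in the query above."""
    padded = s + " "  # sentinel, like dupedchars + ' ' in the SQL
    return "".join(padded[i] for i in range(len(s)) if padded[i] != padded[i + 1])

collapse_runs("aaabbaaacbaaaccc")  # 'abacbac'
```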