I want to print the positions of the commas in a given comma-separated string, but I'm only getting zeros.
Here is the code I wrote:
declare @begin int=0
declare @temp int=1
declare @count int=0
declare @Name nvarchar(MAX)='siva,lahsh,dsjhdsd,hjdhjds,ddjhds,yrehrf'
declare @max nvarchar(20)
set @max=len(@Name)-len(replace(@Name,',',''))
create table #table(delimiter int)
while @count>=@max
begin
set @temp=CHARINDEX(',',@Name,@begin)
set @begin=@temp+1
insert into #table(delimiter) values(@temp)
set @count+=1
end
select delimiter from #table
Any help?
Well, your logic is all wrong in several places... here's a fixed version that works:
declare @begin int=0
declare @temp int=1
declare @count int=0
declare @Name nvarchar(MAX)='siva,lahsh,dsjhdsd,hjdhjds,ddjhds,yrehrf'
declare @max nvarchar(20)
set @max=len(@Name)
IF OBJECT_ID('tempdb..#table') IS NOT NULL DROP TABLE #table
create table #table(delimiter int)
while CHARINDEX(',',@Name,@begin) > 0
begin
set @temp=CHARINDEX(',',@Name,@begin)
set @begin=@temp+1
insert into #table(delimiter) values(@temp)
set @count+=1
end
select delimiter from #table
Basically, your loop control was completely off, as was your initialization of @max. You don't even really need @max, but I only made the minimal tweaks to your code so you can see what changed. I'll leave it as an exercise to optimize it further.
Of course, I'm not sure why you want to do this... nothing about this seems like a reasonable solution to any reasonable problem I can think of. Maybe you could provide more detail about what it is you're actually trying to do...
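For reference, here is a minimal set-based sketch of my own (assuming SQL Server 2005+ for the recursive CTE) that returns the comma positions with no loop and no temp table:
declare @Name nvarchar(MAX)='siva,lahsh,dsjhdsd,hjdhjds,ddjhds,yrehrf';
with pos(delimiter) as (
select charindex(',', @Name) -- first comma (0 if there is none)
union all
select charindex(',', @Name, delimiter + 1) -- next comma after the previous one
from pos
where delimiter > 0
)
select delimiter from pos
where delimiter > 0
option (maxrecursion 0); -- strings may contain more than 100 commas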
Using a CSV splitter table-valued function by Jeff Moden and sum() over() to sum the lengths of the parsed values + 1 to report the delimiter positions:
declare @Name nvarchar(MAX)='siva,lahsh,dsjhdsd,hjdhjds,ddjhds,yrehrf';
select s.*
, Delimiter = sum(len(Item)+1) over (order by ItemNumber)
from dbo.delimitedsplitN4K(@Name,',') s
rextester demo: http://rextester.com/BXD20065
returns:
+------------+---------+-----------+
| ItemNumber | Item | Delimiter |
+------------+---------+-----------+
| 1 | siva | 5 |
| 2 | lahsh | 11 |
| 3 | dsjhdsd | 19 |
| 4 | hjdhjds | 27 |
| 5 | ddjhds | 34 |
| 6 | yrehrf | 41 | <-- 41 is not a comma, but it is the end of the string+1
+------------+---------+-----------+
splitting strings reference:
Tally OH! An Improved SQL 8K “CSV Splitter” Function - Jeff Moden
Splitting Strings : A Follow-Up - Aaron Bertrand
Split strings the right way – or the next best way - Aaron Bertrand
string_split() in SQL Server 2016 : Follow-Up #1 - Aaron Bertrand
create function dbo.DelimitedSplitN4K (
@pString nvarchar(4000)
, @pDelimiter nchar(1)
)
returns table with schemabinding as
return
with e1(n) as (
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1
)
, e2(n) as (select 1 from e1 a, e1 b)
, e4(n) as (select 1 from e2 a, e2 b)
, cteTally(n) as (select top (isnull(datalength(@pString)/2,0))
row_number() over (order by (select null)) from e4)
, cteStart(n1) as (select 1 union all
select t.n+1 from cteTally t where substring(@pString,t.n,1) = @pDelimiter)
, cteLen(n1,l1) as(select s.n1
, isnull(nullif(charindex(@pDelimiter,@pString,s.n1),0)-s.n1,4000)
from cteStart s
)
select ItemNumber = row_number() over(order by l.n1)
, Item = substring(@pString, l.n1, l.l1)
from cteLen l;
go
A few things to consider...
First - if you don't need @Name to be nvarchar(max) you should change it to nvarchar(4000). nvarchar(max) will kill performance (Jeff actually includes this in the comment portion of his function). If you DO have strings longer than 4000 - then the solution SqlZim posted will be wrong.
Though I love Jeff Moden's splitter, it is not the right tool for this job. SqlZim's solution will be very inefficient and, as he pointed out, returns one bad row (a problem with no inexpensive way to resolve). Solving this problem using a loop and a temp table is even worse.
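(For what it's worth, the extra row itself is easy to filter out in an outer query, since it always lands at len(@Name)+1 - though that does nothing for the splitter's underlying cost. A sketch against the function above:)
declare @Name nvarchar(4000)='siva,lahsh,dsjhdsd,hjdhjds,ddjhds,yrehrf';
select d.Delimiter
from
(
select Delimiter = sum(len(Item)+1) over (order by ItemNumber)
from dbo.delimitedsplitN4K(@Name,',')
) d
where d.Delimiter <= len(@Name); -- drops the trailing non-comma row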
I have a function that is designed for EXACTLY this type of thing. I just created an nvarchar(4000) version tonight for this problem. You can read more about it here: Nasty Fast N-Grams (Part 1): Character-Level Unigrams.
Here's the nvarchar(4000) version I just finished developing:
CREATE FUNCTION dbo.NGramsN4K
(
@string nvarchar(4000), -- Input string
@N int -- requested token size
)
/****************************************************************************************
Purpose:
A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens
based on an input string (@string). Accepts strings up to 4000 nvarchar characters long.
For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram.
Compatibility:
SQL Server 2008+, Azure SQL Database
Syntax:
--===== Autonomous
SELECT position, token FROM dbo.NGramsN4K(@string,@N);
--===== Against a table using APPLY
SELECT s.SomeID, ng.position, ng.token
FROM dbo.SomeTable s
CROSS APPLY dbo.NGramsN4K(s.SomeValue,@N) ng;
Parameters:
@string = The input string to split into tokens.
@N = The size of each token returned.
Returns:
Position = bigint; the position of the token in the input string
token = nvarchar(4000); a @N-sized character-level N-Gram token
Developer Notes:
1. NGramsN4K is not case sensitive
2. Many functions that use NGramsN4K will see a huge performance gain when the optimizer
creates a parallel execution plan. One way to get a parallel query plan (if the
optimizer does not choose one) is to use make_parallel by Adam Machanic which can be
found here:
sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-forcing.aspx
3. When @N is less than 1 or greater than the datalength of the input string then no
tokens (rows) are returned. If either @string or @N are NULL no rows are returned.
This is a debatable topic but the thinking behind this decision is that: because you
can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you
can't turn anything into NULL-grams, no rows should be returned.
For people who would prefer that a NULL input forces the function to return a single
NULL output you could add this code to the end of the function:
UNION ALL
SELECT 1, NULL
WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL);
4. NGramsN4K is deterministic. For more about deterministic functions see:
https://msdn.microsoft.com/en-us/library/ms178091.aspx
Usage Examples:
--===== Turn the string, 'abcd' into unigrams, bigrams and trigrams
SELECT position, token FROM dbo.NGramsN4K('abcd',1); -- unigrams (@N=1)
SELECT position, token FROM dbo.NGramsN4K('abcd',2); -- bigrams (@N=2)
SELECT position, token FROM dbo.NGramsN4K('abcd',3); -- trigrams (@N=3)
--===== How many times the substring "AB" appears in each record
DECLARE @table TABLE(stringID int identity primary key, string nvarchar(100));
INSERT @table(string) VALUES ('AB123AB'),('123ABABAB'),('!AB!AB!'),('AB-AB-AB-AB-AB');
SELECT string, occurrences = COUNT(*)
FROM @table t
CROSS APPLY dbo.NGramsN4K(t.string,2) ng
WHERE ng.token = 'AB'
GROUP BY string;
----------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20170324 - Initial Development - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH
L1(N) AS
(
SELECT 1 FROM (VALUES -- 64 dummy values to CROSS join for 4096 rows
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($)) t(N)
),
iTally(N) AS
(
SELECT
TOP (ABS(CONVERT(BIGINT,((DATALENGTH(ISNULL(@string,''))/2)-(ISNULL(@N,1)-1)),0)))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -- Order by a constant to avoid a sort
FROM L1 a CROSS JOIN L1 b -- cartesian product for 4096 rows (64^2)
)
SELECT
position = N, -- position of the token in the string(s)
token = SUBSTRING(@string,CAST(N AS int),@N) -- the @N-sized token
FROM iTally
WHERE @N > 0 AND @N <= (DATALENGTH(@string)/2); -- Protection against bad parameter values
GO
... and now for a quick, 1-row test (with actual execution plan turned on):
SET NOCOUNT ON;
declare @Name nvarchar(MAX)='siva,lahsh,dsjhdsd,hjdhjds,ddjhds,yrehrf';
-- comparing IO:
SET STATISTICS IO ON;
--NGramsN4K
PRINT 'NGramsN4K:'+char(13)+char(10)+replicate('-',50)+char(13)+char(10)+char(13)+char(10);
SELECT position
FROM dbo.NGramsN4K(@Name,1)
WHERE token = ',';
--delimitedsplitN4K
PRINT 'delimitedsplitN4K-based solution'+char(13)+char(10)+replicate('-',50);
select Delimiter = sum(len(Item)+1) over (order by ItemNumber)
from dbo.delimitedsplitN4K(@Name,',') s;
SET STATISTICS IO OFF;
The results:
position
--------------------
5
11
19
27
34
Delimiter
-----------
5
11
19
27
34
41
... and the IO stats (the execution plan screenshots are omitted here):
NGramsN4K:
--------------------------------------------------
delimitedsplitN4K-based solution
--------------------------------------------------
Table 'Worktable'. Scan count 7, logical reads 37, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
37 reads for one row is very bad.
I am working with a Prefix Search engine, and I am trying to generate suffix keywords for my part numbers.
Example string: 123456-7890-A-BCDEF-GHIJ-KL
I am looking to split this string into chunks, like this:
123456
7890
A
BCDEF
GHIJ
KL
Then I need to generate the suffixes of each chunk that is 3 or more chars in length, into one comma-delimited list.
--For chunk 123456, I would get the suffixes 23456, 3456, 456, 56
--For chunk 7890, I would get the suffixes 890, 90
--For chunk A, it would be ignored as it is less than 3 chars in Length
--For chunk BCDEF, I would get the suffixes CDEF, DEF, EF
--For chunk GHIJ, I would get the suffixes HIJ, IJ
--For chunk KL, it would be ignored as it is less than 3 chars in Length
My string could have any amount of chars in each chunk, they are not always formatted like the example.
So the final result for string 123456-7890-A-BCDEF-GHIJ would look like this:
23456, 3456, 456, 56, 890, 90, CDEF, DEF, EF, HIJ, IJ
Some other example strings;
123-4567890-ABC-DEFGHIJ-K-L
--Result: 23, 567890, 67890, 7890, 890, 90, BC, EFGHIJ, FGHIJ, GHIJ, HIJ, IJ
123456-7-890AB-CDEFG-H-IJKL
--Result: 23456, 3456, 456, 56, 90AB, 0AB, AB, DEFG, EFG, FG, JKL, KL
I acknowledge this is not quite what you are looking for, but it is close. Perhaps this will give you or someone else an idea to get exactly what you want. This assumes SQL Server 2017 or higher.
So I am using STRING_SPLIT() to divide the string into one row per chunk, numbering those rows with ROW_NUMBER(). There is a problem with this in that STRING_SPLIT() does not guarantee order.
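(As an aside, on SQL Server 2022 and later STRING_SPLIT accepts a third enable_ordinal argument that returns a guaranteed ordinal, which removes that caveat; a quick sketch:)
SELECT value, ordinal
FROM STRING_SPLIT('123456-7890-A-BCDEF-GHIJ', '-', 1); -- 1 = enable_ordinal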
I then use a CTE to generate index values that step through each chunk from 2 to the chunk length - 1, putting those substrings back into a comma-separated list per chunk using STRING_AGG().
I insert those results into a temp table so that I can select them in order by row number and assemble the suffixes from each chunk into the final comma-separated list.
DECLARE @MyString VARCHAR(50);
SET @MyString = '123456-7890-A-BCDEF-GHIJ';
WITH cte
AS (SELECT
2 AS n -- anchor member
, value
, ROW_NUMBER() OVER (ORDER BY value) AS [rn]
FROM STRING_SPLIT(@MyString, '-')
WHERE LEN(value) >= 3
UNION ALL
SELECT
n + 1
, cte.value -- recursive member
, cte.rn
FROM cte
WHERE n < (LEN(value) - 1) -- terminator
)
SELECT STRING_AGG(SUBSTRING(value, n, LEN(value) - n + 1), ', ') as [Chunk]
INTO #Temp
FROM cte
GROUP BY rn
ORDER BY rn;
SELECT STRING_AGG(Chunk, ', ')
FROM #Temp
Here is the dbfiddle.
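As a side note: STRING_AGG supports WITHIN GROUP (ORDER BY ...), so the temp table step could likely be folded away. A sketch of that variant (my own rework of the query above, not tested against edge cases):
DECLARE @MyString VARCHAR(50) = '123456-7890-A-BCDEF-GHIJ';
WITH cte
AS (SELECT 2 AS n, value, ROW_NUMBER() OVER (ORDER BY value) AS [rn]
FROM STRING_SPLIT(@MyString, '-')
WHERE LEN(value) >= 3
UNION ALL
SELECT n + 1, cte.value, cte.rn
FROM cte
WHERE n < (LEN(value) - 1)
), perChunk
AS (SELECT rn, STRING_AGG(SUBSTRING(value, n, LEN(value) - n + 1), ', ')
WITHIN GROUP (ORDER BY n) AS [Chunk]
FROM cte
GROUP BY rn
)
SELECT STRING_AGG(Chunk, ', ') WITHIN GROUP (ORDER BY rn)
FROM perChunk;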
There is probably a more elegant way to do this, and likely better string manipulators than T-SQL, but this seems to work (requires the database to be at compatibility level SQL Server 2016 or above to use STRING_SPLIT):
DECLARE @STRING VARCHAR (100)
DECLARE @DIVIDEON CHAR (1)
DECLARE @FULLSTRING VARCHAR (100)
DECLARE @SUBSTR VARCHAR (100)
DECLARE @WRKSTRING VARCHAR (100)
DECLARE @LEN TINYINT
SELECT @STRING = '123456-7-890AB-CDEFG-H-IJKL'
SELECT @DIVIDEON = '-'
SELECT @FULLSTRING =''
IF OBJECT_ID('tempdb..#mytable') IS NOT NULL
DROP TABLE #mytable
select RIGHT(value,LEN(value)-1) as MyString
into #mytable
from STRING_SPLIT (@STRING, @DIVIDEON) -- only SQL 2016 and above
where len(value)>2 -- ignore subsets that do not have 3 or more chars
while exists (select MyString from #mytable)
begin
select @WRKSTRING = (select top 1 MyString from #mytable)
select @LEN = len(@WRKSTRING)
select @SUBSTR=@WRKSTRING
while @LEN>2
begin
select @SUBSTR=@SUBSTR+', '+RIGHT(@WRKSTRING,@LEN-1)
select @LEN=@LEN-1
end
delete from #mytable where MyString = @WRKSTRING
--select @SUBSTR as Fullstr
select @FULLSTRING=@FULLSTRING+@SUBSTR+','
end
select LEFT(@FULLSTRING,LEN(@FULLSTRING)-1)
drop table #mytable
--not performant...maybe good enough(?)
declare @t table
(
id int identity primary key clustered,
thecol varchar(40)
);
insert into @t(thecol)
values('1-2-3-4-5-6'), ('abcd') /*??*/;
insert into @t(thecol)
select top (10000) newid()
from master.dbo.spt_values as a
cross join master.dbo.spt_values as b;
select *,
thelist= replace(
cast('<!--'+replace(thecol, '-', '--><!--')+'-->' as xml).query('
let $seq := (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20) (: note: max 22 chars per string-part:)
for $a in comment()
let $str := string($a), $maxlen := string-length($str)-1
for $i in $seq[. <= $maxlen]
return concat(substring($str, $i, 100), "," )
').value('.', 'varchar(max)')+'_', ',_', '')
from @t;
Hi, I have a view which is used in lots of search queries in my application.
The issue is that the application queries which use this view are running very slowly. I am investigating this, and I found a particular portion of the view definition which is making it slow.
create view Demoview AS
Select
p.Id as Id,
----------,
STUFF((SELECT ',' + [dbo].[OnlyAlphaNum](colDesc)
FROM dbo.ContactInfoDetails cd
WHERE pp.FormId = f.Id AND ppc.PageId = pp.Id
FOR XML PATH('')), 1, 1, '') AS PhoneNumber,
p.FirstName as Fname,
From
---
This is one of the columns in the view.
The scalar function [OnlyAlphaNum] is making it slow, as it stops parallel execution of the query.
The function is as below:
CREATE FUNCTION [dbo].[OnlyAlphaNum]
(
@String VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
WHILE PATINDEX('%[^A-Z0-9]%', @String) > 0
SET @String = STUFF(@String, PATINDEX('%[^A-Z0-9]%', @String), 1, '')
RETURN @String
END
How can I convert it into an inline function?
I tried with CASE, but was not successful. I have read that a CTE is a good option.
Any idea how to tackle this problem?
I already did this; you can read more about it here.
The function:
CREATE FUNCTION dbo.alphaNumericOnly8K(@pString varchar(8000))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
/****************************************************************************************
Purpose:
Given a varchar(8000) string or smaller, this function strips all but the alphanumeric
characters that exist in @pString.
Compatibility:
SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse
Parameters:
@pString = varchar(8000); Input string to be cleaned
Returns:
AlphaNumericOnly - varchar(8000)
Syntax:
--===== Autonomous
SELECT ca.AlphaNumericOnly
FROM dbo.AlphaNumericOnly(@pString) ca;
--===== CROSS APPLY example
SELECT ca.AlphaNumericOnly
FROM dbo.SomeTable st
CROSS APPLY dbo.AlphaNumericOnly(st.SomeVarcharCol) ca;
Programmer's Notes:
1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see:
http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
2. This is an iTVF (Inline Table Valued Function) that performs the same task as a
scalar user defined function (UDF) except that it requires the APPLY table operator.
Note the usage examples below and see this article for more details:
http://www.sqlservercentral.com/articles/T-SQL/91724/
The function will be slightly more complicated to use than a scalar UDF but will yield
much better performance. For example - unlike a scalar UDF, this function does not
restrict the query optimizer's ability to generate a parallel query plan. Initial testing
showed that the function generally gets a parallel execution plan.
3. AlphaNumericOnly runs 2-4 times faster when using make_parallel() (provided that you
have two or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance).
4. This is an iTVF (Inline Table Valued Function) that will be used as an iSF (Inline
Scalar Function) in that it returns a single value in the returned table and should
normally be used in the FROM clause as with any other iTVF.
5. CHECKSUM returns an INT and will return the exact number given if given an INT to
begin with. It's also faster than a CAST or CONVERT and is used as a performance
enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT.
6. Another performance enhancement is using a WHERE clause calculation to prevent
the relatively expensive XML PATH concatenation of empty strings normally
determined by a CASE statement in the XML "loop".
7. Note that AlphaNumericOnly returns an nvarchar(max) value. If you are returning small
numbers consider casting or converting your values to a numeric data type if you are
inserting the return value into a new table or using it for joins or comparison
purposes.
8. AlphaNumericOnly is deterministic; for more about deterministic and nondeterministic
functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
Usage Examples:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM samd.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY samd.alphaNumericOnly8K(st.txt);
---------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20150526 - Initial Creation - Alan Burstein
Rev 00 - 20150526 - 3rd line in WHERE clause to correct something that was missed
- Eirikur Eiriksson
Rev 01 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
WITH
E1(N) AS
(
SELECT N
FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))x(N)
),
iTally(N) AS
(
SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d
)
SELECT AlphaNumericOnly =
(
SELECT SUBSTRING(@pString,CHECKSUM(N),1)
FROM iTally
WHERE
((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10
OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 65) & 0x7FFF) < 26
OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 97) & 0x7FFF) < 26
ORDER BY N
FOR XML PATH('')
);
Note the examples in the code comments:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM samd.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY samd.alphaNumericOnly8K(st.txt);
Returns:
AlphaNumericOnly
-------------------
xxx123abc999
txtID OldTxt AlphaNumericOnly
----------- ------------- -----------------
1 !!!A555A!!! A555A
2 NULL NULL
3 AAA.999 AAA999
It's the fastest of its kind. It runs extra fast with a parallel execution plan. To force a parallel execution plan, grab a copy of make_parallel by Adam Machanic. Then you would run it like this:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K('xxx123abc999!!!') ao
CROSS APPLY dbo.make_parallel();
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY dbo.alphaNumericOnly8K(st.txt)
CROSS APPLY dbo.make_parallel();
Surely there is scope to improve this; test it out.
;WITH CTE AS (
SELECT (CASE WHEN PATINDEX('%[^A-Z0-9]%', D.Name) > 0
THEN STUFF(D.Name, PATINDEX('%[^A-Z0-9]%', D.Name), 1, '')
ELSE D.NAME
END ) NameString
FROM #dept D
UNION ALL
SELECT STUFF(C.NameString, PATINDEX('%[^A-Z0-9]%', C.NameString), 1, '')
FROM CTE C
WHERE PATINDEX('%[^A-Z0-9]%', C.NameString) > 0
)
Select STUFF((SELECT ',' + E.NameString from CTE E
WHERE PATINDEX('%[^A-Z0-9]%', E.NameString) = 0
FOR XML PATH('')), 1, 1, '') AS NAME
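(The query above assumes a #dept table with a Name column; a minimal, purely hypothetical setup to test it with:)
IF OBJECT_ID('tempdb..#dept') IS NOT NULL DROP TABLE #dept;
CREATE TABLE #dept (Name varchar(100));
INSERT #dept (Name) VALUES ('A&B Corp.'), ('Sales-Dept 2'), ('R+D');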
I am trying to remove all the comments from an NVARCHAR value.
I don't know which value I will get in the NVARCHAR variable, and I need to remove all the comments that start with -- through to the end of the line.
For example:
-- Some Comments
SET NOCOUNT ON;
-- Some Comments
SELECT FirstName FROM dbo.Users WHERE Id = @Id;
After removing the comments it should look like this:
SET NOCOUNT ON;
SELECT FirstName FROM dbo.Users WHERE Id = @Id;
Is there any easy way of doing it in T-SQL?
Thanks in advance.
Using NGramsN4K:
(dbo.NGramsN4K is the same function posted in full earlier in this document; see above for its definition.)
You can solve it using the solution below. This will be limited to NVARCHAR(4000), but I can put together an NVARCHAR(MAX) version if you need one. Also note that my solution ignores lines that begin with "--" and, where the comment starts deeper into the line, grabs everything up to the "--". I'm not addressing the /* this comment style */ but the solution could be modified to do so.
Solution
-- sample stored proc
declare @storedproc varchar(8000) =
'-- Some Comments
SET NOCOUNT ON;
-- Some Comments
SELECT FirstName -- we only need the first name
FROM dbo.Users WHERE Id = @Id;';
--select @storedproc;
-- Solution
select cleanedProc =
(
select substring(item, 1, isnull(nullif(charindex('--', item),0)-1,nextPos))+br
from
(
select 0 union all
select position from dbo.ngramsN4k(@storedproc,1)
where token = char(10)
) d(position)
cross apply (values (char(10), d.position+1,
isnull(nullif(charindex(char(10), @storedproc, d.position+1),0),8000))
) p(br, startPos, nextPos)
cross apply (values (substring(@storedproc, startPos, nextPos-startPos))) split(item)
where item not like '--%'
order by position
for xml path(''), type
).value('(text())[1]', 'varchar(8000)');
before
-- Some Comments
SET NOCOUNT ON;
-- Some Comments
SELECT FirstName -- we only need the first name
FROM dbo.Users WHERE Id = @Id;
after
SET NOCOUNT ON;
SELECT FirstName
FROM dbo.Users WHERE Id = @Id;
I have a table in SQL Server 2012 BI:
CREATE TABLE CodeParts (
ID int identity(1,1) not null
,Line nvarchar(max) not null
)
loaded with parts of the very long T-SQL query stored in [Line] column. Example:
ID | Line
----------------------
1 | BEGIN TRAN MERGE someTableWithLotOfColumns dst USING (SELECT...
2 | WHEN MATCHED THEN CASE WHEN dst.someColumn != src.someColumn...
3 | WHEN NOT MATCHED...
4 | OUTPUT...
5 | ;MERGE... next table with lot of columns blah blah blah
...| ...
25 | ;MERGE... yet another table with lot of columns
60 | COMMIT
The code has 60 lines; each line may be up to 12,000 characters because of the number of columns and the length of their names.
I need to execute the entire code built from all those rows, and I don't know how to do that while avoiding truncation.
It can be very tricky to work with longer strings. Check this:
DECLARE @txt NVARCHAR(MAX)=(SELECT REPLICATE('x',12000));
SELECT LEN(@txt) AS CountCharacters
,DATALENGTH(@txt) AS UsedBytes;
Although one might think this is declared as NVARCHAR(MAX), the given 'x' isn't. This lets the string be a normal string with a smaller size limit. Now try this (the only difference is the CAST('x' AS NVARCHAR(MAX))):
DECLARE @txt2 NVARCHAR(MAX)=(SELECT REPLICATE(CAST('x' AS NVARCHAR(MAX)),12000));
SELECT LEN(@txt2) AS CountCharacters
,DATALENGTH(@txt2) AS UsedBytes;
To demonstrate this, I created a working example with a dummy table of 60 rows, each row consisting of 12,000 characters.
DECLARE @tbl TABLE(ID INT IDENTITY,CodeLine NVARCHAR(MAX));
WITH TallySixty AS (SELECT TOP 60 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Dummy FROM master..spt_values)
INSERT INTO @tbl
SELECT REPLICATE(CAST(RIGHT(Dummy,1) AS NVARCHAR(MAX)),12000)
FROM TallySixty;
SELECT CodeLine,LEN(CodeLine) AS CountCharacters
,DATALENGTH(CodeLine) AS UsedBytes FROM @tbl
DECLARE @concatString NVARCHAR(MAX)=
(
SELECT(SELECT CodeLine + ' ' FROM @tbl FOR XML PATH(''),TYPE).value('(text())[1]','nvarchar(max)')
);
SELECT @concatString
,LEN(@concatString) AS CountCharacters
,DATALENGTH(@concatString) AS UsedBytes
The final result shows clearly that the resulting string has a length of 60 times 12,000 (plus the added blanks) and takes twice this size in memory due to NVARCHAR. This concatenation should work up to ~2GB which, according to this, is pretty much enough :-)
I think that EXEC is able to deal with NVARCHAR(MAX) up to the full size.
DECLARE @sql NVARCHAR(max)
SELECT @sql=ISNULL(@sql+CHAR(13),'')+Line FROM CodeParts order by id
EXEC(@sql)
The @sql variable must be declared with MAX as the length.
If the string is longer than 4000 characters, PRINT may not show the whole string, but it can still be executed.
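If you do need to inspect the whole string, a common workaround (a rough sketch, continuing with the @sql variable from above) is to PRINT it in chunks:
DECLARE @i int = 1;
WHILE @i <= LEN(@sql)
BEGIN
PRINT SUBSTRING(@sql, @i, 4000); -- note: chunk boundaries may split lines
SET @i += 4000;
END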
I am being passed the following parameter to my stored procedure -
@AddOns = 23:2,33:1,13:5
I need to split the string by the commas using this -
SET @AddOns = @AddOns + ','
set @pos = 0
set @len = 0
While CHARINDEX(',', @AddOns, @pos+1)>0
Begin
SET @len = CHARINDEX(',', @AddOns, @pos+1) - @pos
SET @value = SUBSTRING(@AddOns, @pos, @len)
So now @value = 23:2, and I need to get 23, which is my ID, and 2, which is my quantity. Here is the rest of my code -
INSERT INTO TABLE(ID, Qty)
VALUES(@ID, @QTY)
set @pos = CHARINDEX(',', @AddOns, @pos+@len) + 1
END
So what is the best way to get the values of 23 and 2 in separate fields to use in the INSERT statement?
First you would split the sets of key-value pairs into rows (and it looks like you already got that far), and then you get the position of the colon and use that to do two SUBSTRING operations to split the key and value apart.
Also, this can be done much more efficiently than storing each row's key and value into separate variables just to get inserted into a table. If you INSERT from the SELECT that breaks this data apart, it will be a set-based operation instead of row-by-row.
For example:
DECLARE @AddOns VARCHAR(1000) = N'23:2,33:1,13:5,999:45';
;WITH pairs AS
(
SELECT [SplitVal] AS [Value], CHARINDEX(N':', [SplitVal]) AS [ColonIndex]
FROM SQL#.String_Split(@AddOns, N',', 1) -- https://SQLsharp.com/
)
)
SELECT *,
SUBSTRING(pairs.[Value], 1, pairs.[ColonIndex] - 1) AS [ID],
SUBSTRING(pairs.[Value], pairs.[ColonIndex] + 1, 1000) AS [QTY]
FROM pairs;
/*
Value ColonIndex ID QTY
23:2 3 23 2
33:1 3 33 1
13:5 3 13 5
999:45 4 999 45
*/
GO
For that example I am using a SQLCLR string splitter found in the SQL# library (that I am the author of), which is available in the Free version. You can use whatever splitter you like, including the built-in STRING_SPLIT that was introduced in SQL Server 2016.
It would be used as follows:
DECLARE @AddOns VARCHAR(1000) = N'23:2,33:1,13:5,999:45';
;WITH pairs AS
(
SELECT [value] AS [Value], CHARINDEX(N':', [value]) AS [ColonIndex]
FROM STRING_SPLIT(@AddOns, N',') -- built-in function starting in SQL Server 2016
)
)
INSERT INTO dbo.TableName (ID, QTY)
SELECT SUBSTRING(pairs.[Value], 1, pairs.[ColonIndex] - 1) AS [ID],
SUBSTRING(pairs.[Value], pairs.[ColonIndex] + 1, 1000) AS [QTY]
FROM pairs;
Of course, the Full (i.e. paid) version of SQL# includes an additional splitter designed to handle key-value pairs. It's called String_SplitKeyValuePairs and works as follows:
DECLARE @AddOns VARCHAR(1000) = N'23:2,33:1,13:5,999:45';
SELECT *
FROM SQL#.String_SplitKeyValuePairs(@AddOns, N',', N':', 1, NULL, NULL, NULL);
/*
KeyID Key Value
1 23 2
2 33 1
3 13 5
4 999 45
*/
GO
So, it would be used as follows:
DECLARE @AddOns VARCHAR(1000) = N'23:2,33:1,13:5,999:45';
INSERT INTO dbo.[TableName] ([Key], [Value])
SELECT kvp.[Key], kvp.[Value]
FROM SQL#.String_SplitKeyValuePairs(@AddOns, N',', N':', 1, NULL, NULL, NULL) kvp;
Check out this blog post...
http://www.sqlservercentral.com/blogs/querying-microsoft-sql-server/2013/09/19/how-to-split-a-string-by-delimited-char-in-sql-server/
Noel
I am going to make another attempt at this, inspired by the answer given by @gofr1 on this question...
How to insert bulk of column data to temp table?
That answer showed how to use an XML variable and the nodes method to split comma separated data and insert it into individual columns in a table. It seemed to me to be very similar to what you were trying to do here.
Check out this SQL. It certainly isn't as concise as just having a "split" function, but it seems better than chopping up the string based on the position of the colon.
Noel
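The SQL from that post is not reproduced here, but purely as an illustration of the same XML/nodes() idea (a hypothetical sketch of my own, not the referenced code), the key-value string can be converted to XML and shredded without any positional arithmetic:
DECLARE @AddOns varchar(1000) = '23:2,33:1,13:5';
-- turn '23:2,33:1,13:5' into '<p><k>23</k><v>2</v></p><p><k>33</k>...' and shred it
DECLARE @x xml = CAST('<p><k>'
+ REPLACE(REPLACE(@AddOns, ',', '</v></p><p><k>'), ':', '</k><v>')
+ '</v></p>' AS xml);
SELECT p.value('(k/text())[1]', 'int') AS ID
, p.value('(v/text())[1]', 'int') AS Qty
FROM @x.nodes('/p') t(p);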