I have an inline table-valued function that splits a string into rows of substrings based on a specified separator.
It is as follows:
ALTER FUNCTION [dbo].[SplitString]
(@List NVARCHAR(MAX),
@Delim VARCHAR(255))
RETURNS TABLE
AS
RETURN
(SELECT [Value], idx = RANK() OVER (ORDER BY n)
FROM
(SELECT
n = Number,
[Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
FROM
(SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(@List)
AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim) AS y
);
GO
Usage:
SELECT value
FROM dbo.SplitString('a|b|c', '|')
returns:
value
a
b
c
But when sending an empty value as the first argument, it doesn't return anything.
For example:
SELECT value FROM dbo.SplitString('','|')
This doesn't return anything.
What modification do I need to make to the dbo.SplitString function so that it returns an empty result set when an empty string is passed in as the first argument?
PS: I can't use the inbuilt STRING_SPLIT function because of compatibility issues.
DelimitedSplit8K_LEAD (above) will always return a row and will be faster.
That said, for learning purposes let's fix your function. If you replace a blank value with your delimiter, you will get the results you are looking for. You just need to replace every instance of @List with ISNULL(NULLIF(@List,''),@Delim). Now you have:
ALTER FUNCTION [dbo].[SplitString]
(@List NVARCHAR(MAX),
@Delim VARCHAR(255))
RETURNS TABLE
AS
RETURN
(SELECT [Value], idx = RANK() OVER (ORDER BY n)
FROM
(SELECT
n = Number,
[Value] = LTRIM(RTRIM(SUBSTRING(ISNULL(NULLIF(@List,''),@Delim), [Number],
CHARINDEX(@Delim, ISNULL(NULLIF(@List,''),@Delim) + @Delim, [Number]) - [Number])))
FROM
(SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(ISNULL(NULLIF(@List,''),@Delim))
AND SUBSTRING(@Delim + ISNULL(NULLIF(@List,''),@Delim),
[Number], LEN(@Delim)) = @Delim) AS y
);
Now, when you execute:
DECLARE @list VARCHAR(max) = '', @delim VARCHAR(255) = '|'
SELECT *
FROM dbo.SplitString(@list,@delim)
You get:
Value idx
------ ------
1
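To see why the substitution works, here is a minimal sketch of the two nested functions on their own, using the same variable names as the function above. NULLIF turns the empty string into NULL, and ISNULL then swaps that NULL for the delimiter, so the splitter sees a lone delimiter and emits one empty value.

```sql
DECLARE @List NVARCHAR(MAX) = N'', @Delim VARCHAR(255) = '|';

SELECT
    NULLIF(@List, '')                 AS AfterNullIf,  -- NULL, because @List equals ''
    ISNULL(NULLIF(@List, ''), @Delim) AS AfterIsNull;  -- '|', the delimiter stands in for the blank list
```

A NULL @List takes the same path, since ISNULL replaces it with the delimiter directly.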
Thanks @Larnu and @Bernie for all the suggestions.
After a lot of research, I started iterating and got the expected result.
I achieved this with a simple WHILE loop and SQL string functions.
CREATE FUNCTION [SplitString]
(
@ActualString VARCHAR(MAX),
@DelimiterCharacter VARCHAR(10)
)
RETURNS @TableRes TABLE (Id INT IDENTITY(1,1),Value VARCHAR(MAX))
AS
BEGIN
DECLARE @SubStr VARCHAR(MAX)
WHILE (CHARINDEX(@DelimiterCharacter ,@ActualString)<>0)
BEGIN
SET @SubStr=SUBSTRING(@ActualString,1,CHARINDEX(@DelimiterCharacter ,@ActualString)-1)
SET @ActualString= STUFF(@ActualString,1,CHARINDEX(@DelimiterCharacter,@ActualString),'')
INSERT INTO @TableRes
SELECT @SubStr
END
INSERT INTO @TableRes
SELECT @ActualString
RETURN
END
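This works for both of the following cases, provided the delimiter is a single character (STUFF removes the string only up to the first character of the delimiter):
1) When the actual string is empty, like select * from [dbo].[SplitString]('',',')
2) When the actual string ends with the delimiter, like select * from [dbo].[SplitString]('a,b,',',')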
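The two cases can be checked with a short sketch. It assumes the WHILE-loop function above was created in the dbo schema (its CREATE statement omits the schema, so that depends on the caller's default schema).

```sql
-- Case 1: empty input. The loop body never runs, and the final INSERT
-- adds a single row whose Value is the empty string.
SELECT Id, Value FROM dbo.SplitString('', ',');

-- Case 2: trailing delimiter. The loop emits 'a' and 'b', and the final
-- INSERT adds a third row whose Value is the empty string.
SELECT Id, Value FROM dbo.SplitString('a,b,', ',');
```

Because it is a multi-statement function with a loop, it is likely to be slower on large inputs than the inline versions shown earlier.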
What native function can I use to retrieve a specific substring in a comma-delimited string?
For example, I have the following:
declare @table table
(
mystring varchar(100)
)
Insert into @table
select 'a,ff,sddd,sds,qwq' union
select 'a,jgj,sddd,sasds,qwq' union
select 'ccc,g,rer,fd,vs' union
select 'sdsd,xxx,rerqq,fdf,vsw'
Using some native function (maybe a combination of PATINDEX, STRING_SPLIT, and CHARINDEX), I'd like to return the string between the second and third commas in every row.
The reason I'm asking about native functions is because I've seen several custom functions that work well with small batches of data (ie. GetSplitString_CTE) but they're extremely slow with large data sets.
I'm using SQL Server 2016 and the source table has 1.7 billion rows. I cannot change the column type or add columns.
A series of CROSS APPLYs feeding into each other should do the trick:
SELECT
SUBSTRING(
t.mystring,
v2.comma + 1,
ISNULL(v3.comma - v2.comma - 1, LEN(t.mystring))
)
FROM @table t
CROSS APPLY (VALUES( NULLIF(CHARINDEX(',', t.mystring ), 0) )) v1(comma)
CROSS APPLY (VALUES( NULLIF(CHARINDEX(',', t.mystring, v1.comma + 1), 0) )) v2(comma)
CROSS APPLY (VALUES( NULLIF(CHARINDEX(',', t.mystring, v2.comma + 1), 0) )) v3(comma)
Take the string a,jgj,sddd,sasds,qwq:
v1.comma returns 2
v2.comma returns 6
v3.comma returns 11
The substring starts at v2.comma + 1 = 7 and length of v3.comma - v2.comma - 1 = 4
NULLIF in case CHARINDEX does not find a comma
ISNULL/LEN in case there is no third comma
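The walkthrough above can be reproduced on a single value. This is a minimal sketch that swaps the table column for a variable (@s is a name chosen here for illustration) and exposes the three comma positions alongside the extracted item.

```sql
DECLARE @s VARCHAR(100) = 'a,jgj,sddd,sasds,qwq';

SELECT v1.comma AS FirstComma,   -- 2
       v2.comma AS SecondComma,  -- 6
       v3.comma AS ThirdComma,   -- 11
       -- Start one character after the second comma and take
       -- everything up to (not including) the third comma.
       SUBSTRING(@s, v2.comma + 1,
                 ISNULL(v3.comma - v2.comma - 1, LEN(@s))) AS ThirdItem  -- sddd
FROM (VALUES (NULLIF(CHARINDEX(',', @s), 0))) v1(comma)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',', @s, v1.comma + 1), 0))) v2(comma)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(',', @s, v2.comma + 1), 0))) v3(comma);
```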
I strongly suggest you reconsider the table design
With a bit of JSON
Example
Select A.*
,Pos3 = JSON_VALUE('["'+replace(replace(mystring,'"','\"'),',','","')+'"]','$[2]')
from @table A
Returns
mystring Pos3
a,ff,sddd,sds,qwq sddd
a,jgj,sddd,sasds,qwq sddd
ccc,g,rer,fd,vs rer
sdsd,xxx,rerqq,fdf,vsw rerqq
NOTE: If you can guarantee there are no double quotes in your strings, you can eliminate one replace()
Select A.*
,Pos3 = JSON_VALUE('["'+replace(mystring,',','","')+'"]','$[2]')
from @table A
EDIT: If you want to extract more than one value
Select A.*
,Pos1 = JSON_VALUE(S,'$[0]')
,Pos2 = JSON_VALUE(S,'$[1]')
,Pos3 = JSON_VALUE(S,'$[2]')
,Pos4 = JSON_VALUE(S,'$[3]')
,Pos5 = JSON_VALUE(S,'$[4]')
,Pos6 = JSON_VALUE(S,'$[5]')
From @table A
Cross Apply ( values ( '["'+replace(replace(A.mystring,'"','\"'),',','","')+'"]' ) ) B(S)
Returns
Since you asked for a built-in function, you can combine STRING_SPLIT with ROW_NUMBER().
One caveat: the documentation does not guarantee that STRING_SPLIT returns rows in input order, although in my experience it does; you could always replace it with your own splitting function.
declare @x varchar(100)='a,bb,c,ddd,e,fff,ggggg,h,i'
select value from (
select * , Row_Number() over (order by (select 1/0)) rn
from String_Split(@x,',')
)s
where rn=5
You can easily apply the same approach to a table:
select * from
MyTable t outer apply
(
select * , Row_Number() over (order by (select 1/0)) rn
from String_Split(t.columnname,',')
)s
where rn=5
I have the following given string to split into two columns with given From and To format.
Given string:
DECLARE @String VARCHAR(MAX) = 'A->B->C->D'
Expected Result:
From To
-----------
A B
B C
C D
Tried:
DECLARE @String VARCHAR(MAX) = 'A->B->C->D'
SELECT CASE WHEN item LIKE '%-' THEN REPLACE(item,'-','') END AS [From],
CASE WHEN item NOT LIKE '%-' THEN item END AS [To]
FROM dbo.f_Split(@String,'>')
Try this:
DECLARE @String VARCHAR(MAX) = 'A->B->C->D';
DECLARE @StringXML XML = CAST('<a>' + REPLACE(@String, '->', '</a><a>') + '</a>' AS XML);
WITH DataSource ([RowID], [RowValue]) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY T.c ASC)
,T.c.value('.', 'CHAR(1)')
FROM @StringXML.nodes('a') T(c)
)
SELECT DS1.[RowValue] AS [From]
,DS2.[RowValue] AS [TO]
FROM DataSource DS1
INNER JOIN DataSource DS2
ON DS1.[RowID] + 1 = DS2.[RowID];
The idea is to split the values and number them in order, then join the resulting row set to itself so that each row is paired with the next one.
You can REPLACE the delimiter before processing the string and then join the split result to itself to get the expected output. This assumes the dbo.f_Split function returns a column named item. Note that ROW_NUMBER() here orders by the item value, so it reproduces the original sequence only when the items already sort in that order, as they do in this sample.
DECLARE @String VARCHAR(MAX) = 'A->B->C->D->E->F->G';
SET @String = REPLACE(@String, '->', '>');
WITH CTE(RowNumber, RowData) AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY S1.item) AS RowNumber,
S1.item AS RowData
FROM dbo.f_Split(@String,'>') S1
)
SELECT
C1.RowData AS [From],
C2.RowData AS [To]
FROM CTE C1
INNER JOIN CTE C2 ON C1.RowNumber + 1 = C2.RowNumber
One more solution using the position and +1:
DECLARE @String VARCHAR(MAX) = 'A->B->C->D->E';
DECLARE @YourStringAsXml XML=CAST('<x>' + REPLACE(@String, '->', '</x><x>') + '</x>' AS XML);
--the query
WITH tally(nr) AS
(
SELECT TOP (@YourStringAsXml.value('count(/x)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values
)
SELECT @YourStringAsXml.value('/x[sql:column("nr")][1]','varchar(10)') AS FromNode
,@YourStringAsXml.value('/x[sql:column("nr")+1][1]','varchar(10)') AS ToNode
FROM tally;
The idea in short:
We transform the string to an XML
We use a tally-on-the-fly with a computed TOP() clause to get a list of running numbers (a physical numbers table would be better, and is very handy to have anyway).
Now we can pick the elements by their position (sql:column()) and the neighbour by simply adding +1 to this position
I have a field in a table containing different IDs for different programmes like this:
ProgrammeID
-----------
Prog201604L
Prog201503L
Pro2015N
Pro2014N
Programme2010
Programme2011
Each programme ID has its meaning. The number in the middle of the string indicates the time or month. It is obvious that Prog201604L and Prog201503L indicate the same programme but in different years (as do the rest). What I want to do is remove the numbers, so that after removal the programme IDs look like this:
ProgrammeID
-----------
ProgL
ProgL
ProN
ProN
Programme
Programme
Then later I can aggregate these programmes together.
I am currently using SSMS 2012 and am not sure whether there is a SQL statement like RegEx. I have been searching for a long time, but the solutions online are mainly about Oracle and MySQL. What I found is PATINDEX(), which seems to support regular expressions. Can anybody tell me how to create a pattern that suits my situation and what kind of statement I should use?
Thanks in advance
If the number part is always 6 characters long, the following can be used.
DECLARE @ProgrammeID VARCHAR(50) = 'Prog201604L'
SELECT STUFF(@ProgrammeID, PATINDEX( '%[0-9]%', @ProgrammeID), 6, '')
If the length of the number part is not fixed, the above can be extended as follows:
CREATE TABLE #Programme ( ProgrammeID VARCHAR(50) )
INSERT INTO #Programme
VALUES
('Prog201604L')
,('Pro2015N')
,('Programme2010')
,('Prog2016L')
,('Pro2N')
,('Prog')
,('2010')
SELECT ProgrammeID,
ISNULL(
STUFF(ProgrammeID,
PATINDEX( '%[0-9]%', ProgrammeID), -- get number start index
IIF(PATINDEX( '%[0-9][a-z]%',ProgrammeID)= 0, PATINDEX( '%[0-9]',ProgrammeID), PATINDEX( '%[0-9][a-z]%',ProgrammeID)) + 1 -- get the last number index
- PATINDEX( '%[0-9]%', ProgrammeID), -- get the number character length
'')
,ProgrammeID) -- Where there are no numbers in the string you will get Null, replace it with actual string
AS [Without Numbers]
FROM #Programme
This will handle numbers of varying length, and even strings without numbers.
Hope this helps
You can create a function and pass each row's value to it
(just run this query):
Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
Declare @KeepValues as varchar(50)
Set @KeepValues = '%[^a-z]%'
While PatIndex(@KeepValues, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
Return @Temp
End
---Call it like this:
Declare @tbl table (ProgrammeID varchar(20))
insert into @tbl values ('Prog201604L'),('Prog201503L'),('Pro2015N'),('Pro2014N'),('Programme2010'),('Programme2011')
select * from @tbl
Select dbo.RemoveNonAlphaCharacters(ProgrammeID) from @tbl
How to strip all non-alphabetic characters from string in SQL Server?
Remove numbers from string sql server
One clever option is to take the substring of the ProgrammeID column from the left, until hitting the first number, and concatenate that with the reverse of the substring from the right until hitting the first number:
SELECT
SUBSTRING(ProgrammeID,
1,
PATINDEX('%[0-9]%', ProgrammeID) - 1) +
REVERSE(SUBSTRING(REVERSE(ProgrammeID),
1,
PATINDEX('%[0-9]%', REVERSE(ProgrammeID)) - 1))
FROM yourTable
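The two halves of that expression can be inspected separately. This is a minimal sketch on one sample value from the question, with @id as an illustrative variable name.

```sql
DECLARE @id VARCHAR(50) = 'Prog201604L';

SELECT
    -- Everything before the first digit.
    SUBSTRING(@id, 1, PATINDEX('%[0-9]%', @id) - 1) AS Prefix,  -- Prog
    -- Everything after the last digit: reverse the string, take the
    -- characters before its first digit, then reverse them back.
    REVERSE(SUBSTRING(REVERSE(@id), 1,
                      PATINDEX('%[0-9]%', REVERSE(@id)) - 1)) AS Suffix;  -- L
```

Note that for a value containing no digits PATINDEX returns 0, so the length argument becomes -1 and SUBSTRING raises an error. Such rows would need a guard, for example a CASE on ProgrammeID LIKE '%[0-9]%'.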
I have created a user-defined function for SQL Server to remove non-numeric characters in a string expression
We can modify it to remove the opposite, numeric characters from the input string as follows
while patindex('%[0-9]%', @str) > 0
set @str = stuff(@str, patindex('%[0-9]%', @str), 1, '')
return @str
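The fragment above is only the body of the function. A complete version might look like the sketch below; the function name dbo.RemoveNumericCharacters and the VARCHAR(8000) parameter type are assumptions, since the original header is not shown.

```sql
-- Hypothetical name and signature; only the loop comes from the answer.
CREATE FUNCTION dbo.RemoveNumericCharacters (@str VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
    -- Delete one digit per iteration until no digits remain.
    WHILE PATINDEX('%[0-9]%', @str) > 0
        SET @str = STUFF(@str, PATINDEX('%[0-9]%', @str), 1, '');
    RETURN @str;
END;
GO

SELECT dbo.RemoveNumericCharacters('Prog201604L');  -- ProgL
```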
I hope it helps
Alan Burstein wrote an iTVF exactly for this. The function is called PatExclude8K. Here is the function definition (some comments removed):
CREATE FUNCTION dbo.PatExclude8K
(
@String VARCHAR(8000),
@Pattern VARCHAR(50)
)
/*******************************************************************************
Purpose:
Given a string (@String) and a pattern (@Pattern) of characters to remove,
remove the patterned characters from the string.
*******************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH
E1(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N)),
itally(N) AS
(
SELECT TOP(CONVERT(INT,LEN(@String),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 T1 CROSS JOIN E1 T2 CROSS JOIN E1 T3 CROSS JOIN E1 T4
)
SELECT NewString =
((
SELECT SUBSTRING(@String,N,1)
FROM iTally
WHERE 0 = PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1))
FOR XML PATH(''),TYPE
).value('.[1]','varchar(8000)'));
GO
And here is how you would use it:
SELECT *
FROM #Programme p
CROSS APPLY dbo.PatExclude8K(p.ProgrammeID, '[0-9]');
Using your sample data, here is the result:
ProgrammeID NewString
-------------------- -----------------
Prog201604L ProgL
Prog201503L ProgL
Pro2015N ProN
Pro2014N ProN
Programme2010 Programme
Programme2011 Programme
I created this solution building on a solution to extracting values from a comma separated list inside a string.
It seems to work fine and even to be a bit more efficient than using WHILE, though I would be happy for feedback on that assumption.
On a table with 461,358 rows it takes 3 minutes and 27 seconds to do this (0.44 ms per row) (I put it into a function).
select count(*)
from Mytable
where dbo.StripNumeric(inputFromUser) is null
Here are the solutions.
For stripping away numeric:
declare @input nvarchar(max) = null
select @input = '1a2 3b4' + char(13) + char(10) + '5(678)*90c'
DECLARE @output nvarchar(max) = '';
WITH cte AS
(
SELECT cast(1 as int) as [index]
UNION ALL
SELECT [index]+ 1 as [index]
from cte
where [index] < len(@input)
)
select @output = iif(PATINDEX('%[0-9]%', substring(@input, [index], 1))= 1, @output, @output + substring(@input, [index], 1))
from cte;
select iif(COALESCE( @output, '') = '', null, ltrim(rtrim(@output)))
For stripping away non-numeric:
declare @input nvarchar(max) = null
select @input = '1a2 3b4' + char(13) + char(10) + '5(678)*90c'
DECLARE @output nvarchar(max) = '';
WITH cte AS
(
SELECT cast(1 as int) as [index]
UNION ALL
SELECT [index]+ 1 as [index]
from cte
where [index] < len(@input) --len(substring(@input, index, 1)) >
)
select @output = iif(PATINDEX('%[0-9]%', substring(@input, [index], 1))= 1, @output + substring(@input, [index], 1), @output)
from cte;
select iif(COALESCE( @output, '') = '', null, ltrim(rtrim(@output)))
I have a varchar(max) field containing name/value pairs; every line has the form Name_Value (name, underscore, value).
I need a query against it that returns the names and values in two columns (that is, by parsing the text and removing the underscore and the newline character).
So from this
select NameValue from Table
where I get this text:
Name1_Value1
Name2_Value2
Name3_Value3
I would like to have this output
Names Values
===== ======
Name1 Value1
Name2 Value2
Name3 Value3
SELECT substring(NameValue, 1, charindex('_', NameValue)-1) AS Names,
substring(NameValue, charindex('_', NameValue)+1, LEN(NameValue)) AS [Values]
FROM [Table]
EDIT:
Something like this, put in a function or stored procedure and combined with a temp table, should work for more than one line. Depending on the line delimiter, you should also remove CHAR(13) before you start:
DECLARE @helper varchar(512)
DECLARE @current varchar(512)
SET @helper = NAMEVALUE -- placeholder for the NameValue text being parsed
WHILE CHARINDEX(CHAR(10), @helper) > 0 BEGIN
SET @current = SUBSTRING(@helper, 1, CHARINDEX(CHAR(10), @helper)-1)
SELECT SUBSTRING(@current, 1, CHARINDEX('_', @current)-1) AS Names,
SUBSTRING(@current, CHARINDEX('_', @current)+1, LEN(@current)) AS [Values]
SET @helper = SUBSTRING(@helper, CHARINDEX(CHAR(10), @helper)+1, LEN(@helper))
END
SELECT SUBSTRING(@helper, 1, CHARINDEX('_', @helper)-1) AS Names,
SUBSTRING(@helper, CHARINDEX('_', @helper)+1, LEN(@helper)) AS [Values]
DECLARE @TExt NVARCHAR(MAX)= '***[ddd]***
dfdf
fdfdfdfdfdf
***[fff]***
4545445
45454
***[ahaASSDAD]***
DFDFDF
***[SOME TEXT]***
'
DECLARE @Delimiter VARCHAR(1000)= CHAR(13) + CHAR(10) ;
WITH numbers
AS ( SELECT ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
FROM sys.objects o
CROSS JOIN sys.objects o2
),
c AS ( SELECT Number CHARBegin ,
ROW_NUMBER() OVER ( ORDER BY number ) RN
FROM numbers
WHERE SUBSTRING(@text, Number, LEN(@Delimiter)) = @Delimiter
),
res
AS ( SELECT CHARBegin ,
CAST(LEFT(@text, charbegin) AS NVARCHAR(MAX)) Res ,
RN
FROM c
WHERE rn = 1
UNION ALL
SELECT c.CHARBegin ,
CAST(SUBSTRING(@text, res.CHARBegin,
c.CHARBegin - res.CHARBegin) AS NVARCHAR(MAX)) ,
c.RN
FROM c
JOIN res ON c.RN = res.RN + 1
)
SELECT *
FROM res
Here is an example that you can use:
-- Creating table:
create table demo (dID int, dRec varchar(100));
-- Inserting records:
insert into demo (dID, dRec) values (1, 'BCQP1 Sam');
insert into demo (dID, dRec) values (2, 'BCQP2 LD');
-- Selecting fields to retrieve records:
select * from demo;
Then I want to combine both rows into one single row, displaying only the value on the left of each record and removing the name on the right side, after the space character.
/*
The STUFF() function puts a string in another string, from an initial position.
The LEFT() function returns the left part of a character string with the specified number of characters.
The CHARINDEX() string function returns the starting position of the specified expression in a character string.
*/
SELECT
DISTINCT
STUFF((SELECT ' ' + LEFT(dt1.dRec, charindex(' ', dt1.dRec) - 1)
FROM demo dt1
ORDER BY dRec
FOR XML PATH('')), 1, 1, '') [Combined values]
FROM demo dt2
--
GROUP BY dt2.dID, dt2.dRec
ORDER BY 1
When you run the query, the output will be:
BCQP1 BCQP2
At the top of the script I explained what each function is used for (STUFF(), LEFT(), and CHARINDEX()). I also used DISTINCT in order to eliminate duplicate values.
NOTE: dt stands for "demo table" (I used the same table under two aliases, dt1 and dt2), and dRec stands for "demo Record".
If you want to learn more about STUFF() Function here is a link:
https://www.mssqltips.com/sqlservertip/2914/rolling-up-multiple-rows-into-a-single-row-and-column-for-sql-server-data/
With a recursive CTE you will run into the recursion limit if there are more than 100 items:
Msg 530, Level 16, State 1, Line 20 The statement terminated. The
maximum recursion 100 has been exhausted before statement completion.
DECLARE @TExt NVARCHAR(MAX)
SET @TExt = '100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203'
DECLARE @Delimiter VARCHAR(1000)= ',';
WITH numbers
AS ( SELECT ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
FROM sys.objects o
CROSS JOIN sys.objects o2
),
c AS ( SELECT Number CHARBegin ,
ROW_NUMBER() OVER ( ORDER BY number ) RN
FROM numbers
WHERE SUBSTRING(@text, Number, LEN(@Delimiter)) = @Delimiter
),
res
AS ( SELECT CHARBegin ,
CAST(LEFT(@text, charbegin) AS NVARCHAR(MAX)) Res ,
RN
FROM c
WHERE rn = 1
UNION ALL
SELECT c.CHARBegin ,
CAST(SUBSTRING(@text, res.CHARBegin,
c.CHARBegin - res.CHARBegin) AS NVARCHAR(MAX)) ,
c.RN
FROM c
JOIN res ON c.RN = res.RN + 1
)
SELECT *
FROM res
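One way around the limit is the MAXRECURSION query hint, which raises the cap for a single statement. The sketch below is a self-contained illustration rather than a rewrite of the splitter above: the CTE name n and the 500-level depth are chosen only to exceed the default of 100.

```sql
-- A recursive CTE that needs 499 recursive steps to count to 500.
WITH n AS
(
    SELECT 1 AS i
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 500
)
SELECT COUNT(*) AS RowsProduced   -- 500
FROM n
OPTION (MAXRECURSION 1000);       -- default is 100; 0 removes the limit entirely
```

Appending the same OPTION clause to the final SELECT of the splitter query should let it process lists with more than 100 items, at the cost of one recursion level per item.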