SQL Server Split using XML - illegal name character - sql-server

I am using the following to spilt comma separated string into columns (SQL Server 2014):
function [dbo].[splitString](#input Varchar(max), #Splitter Varchar(99)) returns table as
Return
SELECT Split.a.value('.', 'NVARCHAR(max)') AS Data FROM
( SELECT CAST ('<M>' + REPLACE(#input, #Splitter, '</M><M>') + '</M>' AS XML) AS Data
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
When I try to split the following:
Front Office,Food & Beverage,Housekeeping,Human Resource & Training,Reservation,Other
I get the following error: XML parsing: line 1, character 82, illegal name character
Is there a way to include special characters in my function?

Please try the following solution.
Notable points:
CData section protects against XML entities like ampersand and the
like.
text() inside .nodes() method is for performance reasons.
TRY_CAST() will return NULL, but will not error out.
SQL
DECLARE #input Varchar(max) = 'Front Office,Food & Beverage,Housekeeping,Human Resource & Training,Reservation,Other'
, #Splitter Varchar(99) = ',';
SELECT Split.a.value('.', 'NVARCHAR(max)') AS Data
FROM ( SELECT TRY_CAST('<M><![CDATA[' + REPLACE(#input, #Splitter, ']]></M><M><![CDATA[') + ']]></M>' AS XML) AS Data
) AS A CROSS APPLY Data.nodes ('/M/text()') AS Split(a);

Related

SQL - String Manipulation

Context:
I have a view in SQL Server that tracks parameters a user inputs when they run an SSRS report (ReportServer.dbo.ExecutionLog). About 50 report parameters are saved as a string in a single column with ntext datatype. I would like to break this single column up into multiple columns for each parameter.
Details:
I query the report parameters like this:
SELECT ReportID, [Parameters]
FROM ReportServer.dbo.ExecutionLog
WHERE ReportID in (N'redacted')
and [Status] in (N'rsSuccess')
ORDER BY TimeEnd DESC
And here's a small subset of what the results look like:
alpha=123&bravo=9%2C33%2C76%2C23&charlie=91&delta=29&echo=11%2F2%2F2018%2012%3A00%3A00%20AM&foxtrot=11%2F1%2F2030%2012%3A00%3A00%20AM
Quesitons:
How can I get the results to look like this:
SQL Server 2017 is Python friendly. Is Python a better language to use in this scenario just for parsing purposes?
I've seen similar topics posted here, here & here. The parameters are dynamic so parsing via SQL string functions that involve counting characters doesn't apply. This question is relevant to more people than just me because there's a large population of people using SSRS. Tracking & formatting parameters in a more digestible way is valuable for all users of SSRS.
Here is a way using the built in STRING_SPLIT. I'm just not sure what the logic is for the stuff AFTER the date, so I would discarded it but I left it for you to decide.
DEMO
declare #table table (ReportID int identity(1,1), [Parameters] varchar(8000))
insert into #table
values
('alpha=123&bravo=9%2C33%2C76%2C23&charlie=91&delta=29&echo=11%2F2%2F2018%2012%3A00%3A00%20AM&foxtrot=11%2F1%2F2030%2012%3A00%3A00%20AM')
,('alpha=457893&bravo=9%2C33%2C76%2C23&charlie=91&delta=29&echo=11%2F2%2F2018%2012%3A00%3A00%20AM&foxtrot=11%2F1%2F2030%2012%3A00%3A00%20AM')
select
ReportID
,[Parameters]
,alpha = max(iif(value like 'alpha%',substring(value,charindex('=',value) + 1,99),null))
,bravo = max(iif(value like 'bravo%',substring(value,charindex('=',value) + 1,99),null))
,charlie = max(iif(value like 'charlie%',substring(value,charindex('=',value) + 1,99),null))
,delta = max(iif(value like 'delta%',substring(value,charindex('=',value) + 1,99),null))
,echo = max(iif(value like 'echo%',substring(value,charindex('=',value) + 1,99),null))
,foxtrot = max(iif(value like 'foxtrot%',substring(value,charindex('=',value) + 1,99),null))
from #table
cross apply string_split(replace(replace([Parameters],'%2C',','),'%2F','/'),'&')
group by ReportID, [Parameters]
Or, if they aren't static you can use a dynamic pivot. It'll take some massaging to get your columns in the correct order.
DEMO
DECLARE #cols AS NVARCHAR(MAX),
#query AS NVARCHAR(MAX);
SET #cols = STUFF((SELECT distinct ',' + QUOTENAME(substring([value],0,charindex('=',[value])))
from myTable
cross apply string_split(replace(replace([Parameters],'%2C',','),'%2F','/'),'&')
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
select #cols
set #query = 'SELECT ReportID, ' + #cols + ' from
(
select ReportID
, ColName = substring([value],0,charindex(''='',[value]))
, ColVal = substring([value],charindex(''='',[value]) + 1,99)
from myTable
cross apply string_split(replace(replace([Parameters],''%2C'','',''),''%2F'',''/''),''&'')
) x
pivot
(
max(ColVal)
for ColName in (' + #cols + ')
) p '
execute(#query)
Split the string on the ampersand character.
Further split each row into two columns on the equals character.
In the second column, replace %2C with the comma character, and %2F with the forward-slash character, and so on with any other replacements as needed.
Use a dynamic-pivot to query the above in the format that you want.
Here's a method that starts with a lot of replaces.
To url-decode the string and transform it into an XML type.
Then it uses the XML functions to get the values for the columns.
Example snippet:
declare #Table table ([Parameters] varchar(200));
insert into #Table ([Parameters]) values
('alpha=123&bravo=9%2C33%2C76%2C23&charlie=91&delta=29&echo=11%2F2%2F2018%2012%3A00%3A00%20AM&foxtrot=11%2F1%2F2030%2012%3A00%3A00%20AM');
select
x.query('/x[key="alpha"]/val').value('.', 'int') as alpha,
x.query('/x[key="bravo"]/val').value('.', 'varchar(30)') as bravo,
x.query('/x[key="charlie"]/val').value('.', 'varchar(30)') as charlie,
x.query('/x[key="delta"]/val').value('.', 'varchar(30)') as delta,
convert(date, x.query('/x[key="echo"]/val').value('.', 'varchar(30)'), 103)as echo,
convert(date, x.query('/x[key="foxtrot"]/val').value('.', 'varchar(30)'), 103) as foxtrot
from #Table
cross apply (select cast('<x><key>'+
replace(replace(replace(replace(replace(
replace([Parameters],
'%2C',','),
'%2F','/'),
'%20',' '),
'%3A',':'),
'=','</key><val>'),
'&','</val></x><x><key>')
+'</val></x>' as XML) as x) ca
Test on db<>fiddle here

How to remove a string that left of character `;` and the contained string `U` and then display it?

I have a table and the values like this
000001U;000002;000003U;000004;000005U;000006U
and I want display the field is like
000002;000004;
Try This
DECLARE #Table AS TABLE (Data nvarchar(1000))
INSERT INTO #Table
SELECT '000001U;000002;000003U;000004;000005U;000006U'
SELECT STUFF((SELECT '; '+Data
FROM
(
SELECT Split.a.value('.','nvarchar(1000)') AS Data
FROM
(
SELECT
CAST('<S>'+REPLACE(Data,';','</S><S>') +'</S>' AS XML ) AS Data
FROM #Table
)AS A
CROSS APPLY Data.nodes('S') AS Split(a)
)dt
WHERE CHARINDEX('U',Data)=0 FOR XML PATH('')),1,1,'') AS Data
Result
Data
---------
000002; 000004
As mentioned in the comments, SQL Server does not have any native regex replacement support. But, if you can get a dump of your entire table/column, then you can easily do a regex replacement in another tool, such as Notepad++.
Do a find on this pattern:
[0-9]+U;?
And then just replace with empty string. This should leave each row with the data you want to see. Here is a demo showing that this works in Java.
Demo
for SQL Server 2016 and later.
select stuff (
(select ',' + value
from STRING_SPLIT ('000001U;000002;000003U;000004;000005U;000006U', ';')
where right(value, 1) <> 'U'
for xml path('')),
1, 1, '')
for earlier version, you may use any CSV Spliter like this from Jeff Moden http://www.sqlservercentral.com/articles/Tally+Table/72993/
Simple way is to determine the value by IsNumeric function.
DECLARE #GIVEN VARCHAR(MAX)='000001U;000002;000003U;000004;000005U;000006U';
DECLARE #FINAL VARCHAR(MAX)='';
SELECT #FINAL =#FINAL+ case when ISNUMERIC(val)=1 then val+';' else '' end FROM (
SELECT split.x.value('.','varchar(max)') VAL FROM(
SELECT CAST('<M>'+REPLACE(#GIVEN,';','</M><M>')+'</M>' AS XML) AS VAL
)A
CROSS APPLY a.VAL.nodes('/M') as split(x)
)AA
PRINT #FINAL
Result: 000002;000004;

SQL Server Regular expression extract pattern from DB colomn

I have a question about SQL Server: I have a database column with a pattern which is like this:
up to 10 digits
then a comma
up to 10 digits
then a semicolon
e.g.
100000161, 100000031; 100000243, 100000021;
100000161, 100000031; 100000243, 100000021;
and I want to extract within the pattern the first digits (up to 10) (1.) and then a semicolon (4.)
(or, in other words, remove everything from the semicolon to the next semicolon)
100000161; 100000243; 100000161; 100000243;
Can you please advice me how to establish this in SQL Server? Im not very familiar with regex and therefore have no clue how to fix this.
Thanks,
Alex
Try this
Declare #Sql Table (SqlCol nvarchar(max))
INSERT INTO #Sql
SELECT'100000161,100000031;100000243,100000021;100000161,100000031;100000243,100000021;'
;WITH cte
AS (SELECT Row_number()
OVER(
ORDER BY (SELECT NULL)) AS Rno,
split.a.value('.', 'VARCHAR(1000)') AS Data
FROM (SELECT Cast('<S>'
+ Replace( Replace(sqlcol, ';', ','), ',',
'</S><S>')
+ '</S>'AS XML) AS Data
FROM #Sql)AS A
CROSS apply data.nodes('/S') AS Split(a))
SELECT Stuff((SELECT '; ' + data
FROM cte
WHERE rno%2 <> 0
AND data <> ''
FOR xml path ('')), 1, 2, '') AS ExpectedData
ExpectedData
-------------
100000161; 100000243; 100000161; 100000243
I believe this will get you what you are after as long as that pattern truly holds. If not it's fairly easy to ensure it does conform to that pattern and then apply this
Select Substring(TargetCol, 1, 10) + ';' From TargetTable
You can take advantage of SQL Server's XML support to convert the input string into an XML value and query it with XQuery and XPath expressions.
For example, the following query will replace each ; with </b><a> and each , to </a><b> to turn each string into <a>100000161</a><a>100000243</a><a />. After that, you can select individual <a> nodes with /a[1], /a[2] :
declare #table table (it nvarchar(200))
insert into #table values
('100000161, 100000031; 100000243, 100000021;'),
('100000161, 100000031; 100000243, 100000021;')
select
xCol.value('/a[1]','nvarchar(200)'),
xCol.value('/a[2]','nvarchar(200)')
from (
select convert(xml, '<a>'
+ replace(replace(replace(it,';','</b><a>'),',','</a><b>'),' ','')
+ '</a>')
.query('a') as xCol
from #table) as tmp
-------------------------
A1 A2
100000161 100000243
100000161 100000243
value extracts a single value from an XML field. nodes returns a table of nodes that match the XPath expression. The following query will return all "keys" :
select
a.value('.','nvarchar(200)')
from (
select convert(xml, '<a>'
+ replace(replace(replace(it,';','</b><a>'),',','</a><b>'),' ','')
+ '</a>')
.query('a') as xCol
from #table) as tmp
cross apply xCol.nodes('a') as y(a)
where a.value('.','nvarchar(200)')<>''
------------
100000161
100000243
100000161
100000243
With 200K rows of data though, I'd seriously consider transforming the data when loading it and storing it in indivisual, indexable columns, or add a separate, related table. Applying string manipulation functions on a column means that the server can't use any covering indexes to speed up queries.
If that's not possible (why?) I'd consider at least adding a separate XML-typed column that would contain the same data in XML form, to allow the creation of an XML index.

Split String using XML in SQL Server

Question: how to split below string using XML?
Input:
'7-VPN Connectivity 7.8 - Ready to Elixir Connector install 9-Unified Installation'
Expected output:
7-VPN Connectivity
7.8 - Ready to Elixir Connector install
9-Unified Installation
My code:
DECLARE #xml AS XML,
#str AS VARCHAR(100)
SET #str = '7-VPN Connectivity 7.8 - Ready to Elixir Connector install 9-Unified Installation'
SET #xml = CAST(('<X>'+replace(#str,' ','</X><X>')+'</X>') AS XML)
SELECT
N.value('.', 'VARCHAR(10)') AS value
FROM
#xml.nodes('X') AS T(N)
--Provide the comma From Where you wan't to split The Data
-- For Eg:
BEGIN TRAN
DECLARE #S varchar(max),
#Split char(1),
#X xml
SELECT #S = '7-VPN Connectivity ,7.8- Ready to Elixir Connector install, 9-Unified Installation',
#Split = ','
SELECT #X = CONVERT(xml,' <root> <s>' + REPLACE(#S,#Split,'</s> <s>') + '</s> </root> ')
SELECT [Value] = T.c.value('.','varchar(255)')
FROM #X.nodes('/root/s') T(c)
ROLLBACK TRAN
This is a horrible design! If there is the slightest chance to fix this you should change this the sooner the better...
You might try something like this, but use it only to clean up that mess!
DECLARE #YourString VARCHAR(100)='7-VPN Connectivity 7.8 - Ready to Elixir Connector install 9-Unified Installation';
WITH CutAtHyphen(Nr,part) AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
,LTRIM(RTRIM(A.part.value('text()[1]','nvarchar(max)')))
FROM
(
SELECT CAST('<x>' + REPLACE((SELECT #YourString AS [*] FOR XML PATH('')),'-','</x><x>') + '</x>' AS XML) AS Casted
) AS t
CROSS APPLY t.Casted.nodes('/x') AS A(part)
)
,CutOffFinal AS
(
SELECT Nr
,part
,LEFT(part,LEN(part)-PositionOf.LastBlank) AS Remainder
,CASE WHEN Nr>1 THEN RIGHT(part,PositionOf.LastBlank) ELSE part END AS Tail
FROM CutAtHyphen
OUTER APPLY (SELECT CHARINDEX(' ',REVERSE(part))) AS PositionOf(LastBlank)
)
,recCTE AS
(
SELECT Nr, CAST(N'' AS NVARCHAR(MAX)) AS String,Tail FROM CutOffFinal WHERE Nr=1
UNION ALL
SELECT cof.Nr
,r.Tail + '-' + cof.Remainder
,cof.Tail
FROM recCTE AS r
INNER JOIN CutOffFinal AS cof ON cof.Nr=r.Nr+1
)
SELECT String + CASE WHEN Nr=(SELECT MAX(Nr) FROM CutOffFinal) THEN Tail ELSE '' END AS FinalString
FROM recCTE
WHERE Nr>1;
This code will first of all cut the string at the hyphens and trim it. The it will search for the last blank and cut of the number, which belongs to the next row.
The recursive CTE will travel down the line and concatenate the tail of the previous row, with the remainder of the current.
The first and the last line need special treatment.

select multiple nodes xml

I'm retrieving xml formatted text from ntext fields (sample format of a row below):
<root>
<DocInfo>
<CompanyName>Some Company</CompanyName>
<WebsiteUrl>http://www.someurl.com</WebsiteUrl>
<PrimaryServices>Benefits Administration</PrimaryServices>
<PrimaryServices>Payroll Processing</PrimaryServices>
<SecondaryServices>Background Checking</SecondaryServices>
<SecondaryServices>HR Outsourcing</SecondaryServices>
<SecondaryServices>Comp & Benefits</SecondaryServices>
<SecondaryServices>Administration</SecondaryServices>
</DocInfo>
</root>
Using this sql I am retrieving the single node values:
select #xmlString = COALESCE(#xmlString + '', '') + cast(content_html as nvarchar(max)) FROM content where folder_id = 18
set #xmlString = replace(#xmlString,'<?xml version="1.0" encoding="UTF-16" standalone="yes"?>','')
set #XML = cast(#xmlString as xml)
Select
T.N.value('CompanyName[1]', 'varchar(250)') as CompanyName,
T.N.value('WebsiteUrl[1]', 'varchar(250)') as WebsiteUrl,
T.N.value('PrimaryServices[1]', 'varchar(250)') as PrimaryServices,
T.N.value('SecondaryServices[1]', 'varchar(250)') as SecondaryServices,
T.N.value('Description[1]', 'varchar(max)') as Description
from #XML.nodes('/root/DocInfo') as T(N)
This works fine for the single node values (CompanyName, WebsiteUrl). However, it isn't inserting the nodes with multiple values properly (like PrimaryServices and SecondaryServices - each of which may have zero to 16 nodes). How do I get these variable length multiple node values into these columns?
Thanks for any help
To get the multiple nodes as a comma separated value you can use a variant of the for xml path('') trick. Use the shredded XML (T.N) as a source in the sub-query to get the nodes you are interested in. The xQuery ... substring(text()[1]) ... part is just there to remove the extra comma and to get the comma separated value out of the XML that is created by for xml.
select
T.N.value('(CompanyName/text())[1]', 'varchar(250)') as CompanyName,
T.N.value('(WebsiteUrl/text())[1]', 'varchar(250)') as WebsiteUrl,
(
select ', '+P.N.value('text()[1]', 'varchar(max)')
from T.N.nodes('PrimaryServices') as P(N)
for xml path(''), type
).value('substring(text()[1], 2)', 'varchar(max)') as PrimaryServices,
(
select ', '+S.N.value('text()[1]', 'varchar(max)')
from T.N.nodes('SecondaryServices') as S(N)
for xml path(''), type
).value('substring(text()[1], 2)', 'varchar(max)') as SecondaryServices,
T.N.value('(Description/text())[1]', 'varchar(max)') as Description
from #XML.nodes('/root/DocInfo') as T(N)
If you want all the services in one column you can use a different xPath in the nodes part in the sub-query.
select
T.N.value('(CompanyName/text())[1]', 'varchar(250)') as CompanyName,
T.N.value('(WebsiteUrl/text())[1]', 'varchar(250)') as WebsiteUrl,
(
select ', '+P.N.value('text()[1]', 'varchar(max)')
from T.N.nodes('PrimaryServices,SecondaryServices') as P(N)
for xml path(''), type
).value('substring(text()[1], 2)', 'varchar(max)') as Services,
T.N.value('(Description/text())[1]', 'varchar(max)') as Description
from #XML.nodes('/root/DocInfo') as T(N)

Resources