Parse JSON Array in T-SQL

In our SQL Server table we have a JSON object stored with an array of strings. I want to programmatically split that string into several columns. However, I cannot get it to work and am not sure it is even possible.
Is it possible to create multiple columns within the WITH clause, or is it smarter to do it within the SELECT statement?
I trimmed down some of the code to give a simple idea of what is given.
The example JSON is similar to { "arr": ["str1 - str2"] }
SELECT b.* FROM [table] a
OUTER APPLY
OPENJSON(a.value, '$.arr')
WITH
(
strSplit1 VARCHAR(100) SPLIT('$.arr', '-',1),
strSplit2 VARCHAR(100) SPLIT('$.arr', '-',2)
) b

Due to the tag [tsql] and the usage of OPENJSON I assume this is SQL-Server. But might be wrong... Please always specify your RDBMS (with version).
Your JSON is rather weird... I think you've overdone it while trying to simplify this for brevity...
Try this:
DECLARE @tbl TABLE(ID INT IDENTITY,YourJSON NVARCHAR(MAX));
INSERT INTO @tbl VALUES(N'{ "arr": ["str1 - str2"] }') --weird example...
,(N'{ "arr": ["a","b","c"] }'); --array with three elements
SELECT t.ID
,B.[value] AS arr
FROM @tbl t
CROSS APPLY OPENJSON(YourJSON)
WITH(arr NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.arr) B;
A shorter approach (which fits this simple example only) is this:
SELECT t.ID
,A.*
FROM @tbl t
OUTER APPLY OPENJSON(JSON_QUERY(YourJSON,'$.arr')) A
Hint
JSON support was introduced with SQL-Server 2016
UPDATE: If the JSON's content is a weird CSV-string...
There's a trick to transform a CSV into a JSON-array. Try this
DECLARE @tbl TABLE(ID INT IDENTITY,YourJSON NVARCHAR(MAX));
INSERT INTO @tbl VALUES(N'{ "arr": ["str1 - str2"] }') --weird example...
,(N'{ "arr": ["a","b","c"] }') --array with three elements
,(N'{ "arr": ["x-y-z"] }'); --array with three elements in a weird CSV format
SELECT t.ID
,B.[value] AS arr
,C.[value]
FROM @tbl t
CROSS APPLY OPENJSON(YourJSON)
WITH(arr NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.arr) B
CROSS APPLY OPENJSON('["' + REPLACE(B.[value],'-','","') + '"]') C;
Some simple replacements in OPENJSON('["' + REPLACE(B.[value],'-','","') + '"]') will create a JSON array out of your CSV-string, which can be opened in OPENJSON.
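If the goal is separate columns rather than rows, the same CSV-to-array trick can be combined with JSON_VALUE. This is only a sketch: the column names strSplit1 and strSplit2 are borrowed from the question, it assumes SQL Server 2016 or later, and it assumes the fragments contain nothing that would need JSON escaping (double quotes, backslashes).

```sql
-- Sketch: turn "str1 - str2" into a JSON array and read elements by position.
DECLARE @tbl TABLE(ID INT IDENTITY, YourJSON NVARCHAR(MAX));
INSERT INTO @tbl VALUES (N'{ "arr": ["str1 - str2"] }');

SELECT t.ID
      ,LTRIM(RTRIM(JSON_VALUE(C.AsArray, '$[0]'))) AS strSplit1  -- first fragment
      ,LTRIM(RTRIM(JSON_VALUE(C.AsArray, '$[1]'))) AS strSplit2  -- second fragment
FROM @tbl t
CROSS APPLY OPENJSON(t.YourJSON, '$.arr') B
-- Assumption: B.[value] holds no characters that need JSON escaping.
CROSS APPLY (SELECT N'["' + REPLACE(B.[value], N'-', N'","') + N'"]') C(AsArray);
```

Fragments beyond the second are ignored here, and a missing second fragment comes back as NULL because JSON_VALUE uses lax path mode by default.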

I'm not aware of any way to split a string within JSON. I wonder if the issue is down to your JSON containing a single string rather than multiple values?
The below example shows how to extract each string from the array; and if you wish to go further and split those strings on the hyphen, shows how to do that using SQL's normal SUBSTRING and CHARINDEX functions.
create table [table]
(
value nvarchar(max)
)
insert [table](value)
values ('{ "arr": ["str1 - str2"] }'), ('{ "arr": ["1234 - 5678","abc - def"] }')
SELECT b.value
, rtrim(substring(b.value,1,charindex('-',b.value)-1))
, ltrim(substring(b.value,charindex('-',b.value)+1,len(b.value)))
FROM [table] a
OUTER APPLY OPENJSON(a.value, '$.arr') b
If you want all values in a single column, you can use the string_split function: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-2017
SELECT ltrim(rtrim(c.value))
FROM [table] a
OUTER APPLY OPENJSON(a.value, '$.arr') b
OUTER APPLY STRING_SPLIT(b.value, '-') c

Related

Split string to array using delimiter, getting second to last element in SELECT Statement

Heads!
In my database, I have a column that contains the following data (examples):
H-01-01-02-01
BLE-01-03-01
H-02-05-1.1-03
The task is to get the second to last element of the array if you would split that using the "-" character. The strings are of different length.
So this would be the result using the above mentioned data:
02
03
1.1
Basically I'm searching for an equivalent of the following ruby-statement for use in a Select-Statement in SQL-Server:
"BLE-01-03-01".split("-")[-2]
Is this possible in any way in SQL Server? After spending some time searching for a solution, I only found ones that work for the last or first element.
Thanks very much for any clues or solutions!
PS: Version of SQL Server is Microsoft SQL Server 2012
As an alternative you can try this:
--A mockup table with some test data to simulate your issue
DECLARE @mockupTable TABLE (ID INT IDENTITY, YourColumn VARCHAR(50));
INSERT INTO @mockupTable VALUES
('H-01-01-02-01')
,('BLE-01-03-01')
,('H-02-05-1.1-03');
--The query
SELECT CastedToXml.value('/x[sql:column("CountOfFragments")-1][1]','nvarchar(10)') AS TheWantedFragment
FROM @mockupTable t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.YourColumn,'-','</x><x>') + '</x>' AS XML))A(CastedToXml)
CROSS APPLY(SELECT CastedToXml.value('count(/x)','int')) B(CountOfFragments);
The idea in short:
The first APPLY will transform the string into an XML like this
<x>H</x>
<x>01</x>
<x>01</x>
<x>02</x>
<x>01</x>
The second APPLY will xquery into this XML to get the count of fragments. As APPLY will add this as a column to the result set, we can use the value using sql:column() to get the wanted fragment by its position.
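A possible shortening of the same idea, offered as a sketch rather than a tested replacement: SQL Server's XQuery dialect has a last() context function, so the fragment count may not need its own APPLY. The table and column names are the mockup names used above, and the input is assumed to contain no XML-reserved characters such as < or &.

```sql
-- Sketch: pick the second-to-last <x> element directly with last()-1.
DECLARE @mockupTable TABLE (ID INT IDENTITY, YourColumn VARCHAR(50));
INSERT INTO @mockupTable VALUES
 ('H-01-01-02-01')
,('BLE-01-03-01')
,('H-02-05-1.1-03');

SELECT t.YourColumn
      ,A.CastedToXml.value('(/x[last()-1])[1]','nvarchar(10)') AS TheWantedFragment
FROM @mockupTable t
-- Assumption: YourColumn holds no characters that are illegal in XML text.
CROSS APPLY (SELECT CAST('<x>' + REPLACE(t.YourColumn,'-','</x><x>') + '</x>' AS XML)) A(CastedToXml);
```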
As I wrote in my comment - using charindex with reverse.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
Col Varchar(100)
);
INSERT INTO @T (Col) VALUES
('H-01-01-02-01'),
('BLE-01-03-01'),
('H-02-05-1.1-03');
The query:
SELECT Col,
LEFT(RIGHT(Col, AlmostLastDelimiter-1), AlmostLastDelimiter - LastDelimiter - 1) As SecondToLast
FROM @T
CROSS APPLY (SELECT CharIndex('-', Reverse(Col)) As LastDelimiter) As A
CROSS APPLY (SELECT CharIndex('-', Reverse(Col), LastDelimiter+1) As AlmostLastDelimiter) As B
Results:
Col SecondToLast
H-01-01-02-01 02
BLE-01-03-01 03
H-02-05-1.1-03 1.1
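The query above assumes every value contains at least two delimiters. With fewer, AlmostLastDelimiter is 0 and RIGHT receives a negative length. A guarded variant could look like the following sketch; the two extra sample rows are made up for illustration, and it relies on CASE evaluating its branches in order for these scalar expressions.

```sql
-- Sketch: same reverse/CHARINDEX idea, with guards for short inputs.
DECLARE @T AS TABLE (Col VARCHAR(100));
INSERT INTO @T (Col) VALUES
('H-01-01-02-01'),  -- four delimiters  -> 02
('AB-CD'),          -- one delimiter    -> AB (the first element is second to last)
('ABC');            -- no delimiter     -> NULL (no second-to-last element)

SELECT Col,
       CASE
           WHEN B.AlmostLastDelimiter > 0   -- two or more delimiters
               THEN LEFT(RIGHT(Col, B.AlmostLastDelimiter - 1),
                         B.AlmostLastDelimiter - A.LastDelimiter - 1)
           WHEN A.LastDelimiter > 0         -- exactly one delimiter
               THEN LEFT(Col, LEN(Col) - A.LastDelimiter)
       END AS SecondToLast
FROM @T
CROSS APPLY (SELECT CHARINDEX('-', REVERSE(Col)) AS LastDelimiter) AS A
CROSS APPLY (SELECT CHARINDEX('-', REVERSE(Col), A.LastDelimiter + 1) AS AlmostLastDelimiter) AS B;
```

Note that LEN ignores trailing spaces, so values padded with trailing blanks would need extra care.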
Similar to Zohar's solution, but using CTEs instead of CROSS APPLY to prevent redundancy. I personally find this easier to follow, as you can see what happens in each step. Doesn't make it a better solution though ;)
DECLARE @strings TABLE (data VARCHAR(50));
INSERT INTO @strings VALUES ('H-01-01-02-01') , ('BLE-01-03-01'), ('H-02-05-1.1-03');
WITH rev AS (
SELECT
data,
REVERSE(data) AS reversed
FROM
@strings),
first_hyphen AS (
SELECT
data,
reversed,
CHARINDEX('-', reversed) + 1 AS first_pos
FROM
rev),
second_hyphen AS (
SELECT
data,
reversed,
first_pos,
CHARINDEX('-', reversed, first_pos) AS second_pos
FROM
first_hyphen)
SELECT
data,
REVERSE(SUBSTRING(reversed, first_pos, second_pos - first_pos)) AS result
FROM
second_hyphen;
Results:
data result
H-01-01-02-01 02
BLE-01-03-01 03
H-02-05-1.1-03 1.1
Try this
declare @input NVARCHAR(100)
declare @dlmt NVARCHAR(3);
declare @pos INT = 2 --position counted from the end, because the input is reversed
SET @input=REVERSE(N'H-02-05-1.1-03');
SET @dlmt=N'-';
SELECT REVERSE( --undo the initial REVERSE so the fragment comes back in its original character order
CAST(N'<x>'
+ REPLACE(
(SELECT REPLACE(@input,@dlmt,'#DLMT#') AS [*] FOR XML PATH(''))
,N'#DLMT#',N'</x><x>'
) + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]','nvarchar(max)'));

How to remove the `;`-separated values that contain the string `U` and then display the rest?

I have a table with values like this
000001U;000002;000003U;000004;000005U;000006U
and I want to display the field like this
000002;000004;
Try This
DECLARE @Table AS TABLE (Data nvarchar(1000))
INSERT INTO @Table
SELECT '000001U;000002;000003U;000004;000005U;000006U'
SELECT STUFF((SELECT '; '+Data
FROM
(
SELECT Split.a.value('.','nvarchar(1000)') AS Data
FROM
(
SELECT
CAST('<S>'+REPLACE(Data,';','</S><S>') +'</S>' AS XML ) AS Data
FROM @Table
)AS A
CROSS APPLY Data.nodes('S') AS Split(a)
)dt
WHERE CHARINDEX('U',Data)=0 FOR XML PATH('')),1,2,'') AS Data --strip the leading '; ' (two characters)
Result
Data
---------
000002; 000004
As mentioned in the comments, SQL Server does not have any native regex replacement support. But, if you can get a dump of your entire table/column, then you can easily do a regex replacement in another tool, such as Notepad++.
Do a find on this pattern:
[0-9]+U;?
And then just replace with empty string. This should leave each row with the data you want to see. Here is a demo showing that this works in Java.
Demo
For SQL Server 2016 and later:
select stuff (
(select ',' + value
from STRING_SPLIT ('000001U;000002;000003U;000004;000005U;000006U', ';')
where right(value, 1) <> 'U'
for xml path('')),
1, 1, '')
For earlier versions, you may use any CSV splitter, such as this one from Jeff Moden: http://www.sqlservercentral.com/articles/Tally+Table/72993/
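On SQL Server 2017 and later, STRING_AGG can replace the STUFF ... FOR XML PATH construction. The following is a sketch under that version assumption. It uses ';' as the output separator to match the question, and it cannot promise the original order, because STRING_SPLIT does not document the order of its output rows.

```sql
-- Sketch: split on ';', drop the values ending in 'U', and glue the rest back together.
DECLARE @given VARCHAR(MAX) = '000001U;000002;000003U;000004;000005U;000006U';

SELECT STRING_AGG(s.value, ';') AS Data
FROM STRING_SPLIT(@given, ';') AS s
WHERE RIGHT(s.value, 1) <> 'U';
```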
A simple way is to test each value with the ISNUMERIC function.
DECLARE @GIVEN VARCHAR(MAX)='000001U;000002;000003U;000004;000005U;000006U';
DECLARE @FINAL VARCHAR(MAX)='';
SELECT @FINAL =@FINAL+ case when ISNUMERIC(val)=1 then val+';' else '' end FROM (
SELECT split.x.value('.','varchar(max)') VAL FROM(
SELECT CAST('<M>'+REPLACE(@GIVEN,';','</M><M>')+'</M>' AS XML) AS VAL
)A
CROSS APPLY a.VAL.nodes('/M') as split(x)
)AA
PRINT @FINAL
Result: 000002;000004;

Display a String after specific character

Consider a string that contains multiple dots. I want to read and display everything from the 5th dot (.) to the end of the string. Can you suggest a single SELECT query for this?
Eg:
I/P
We.are.inserting.a.lot.of.records by using SSIS Package.even.the.records are.not committed.
o/P
of.records by using SSIS Package.even.the.records are.not committed.
Using CHARINDEX, one method:
DECLARE @String varchar(500);
SET @String = 'We.are.inserting.a.lot.of.records by using SSIS Package.even.the.records are.not committed.';
SELECT STUFF(@String, 1, CI5.CI,'')
FROM (VALUES(CHARINDEX('.',@String))) CI1(CI)
CROSS APPLY (VALUES(CHARINDEX('.',@String, CI1.CI+1))) CI2(CI)
CROSS APPLY (VALUES(CHARINDEX('.',@String, CI2.CI+1))) CI3(CI)
CROSS APPLY (VALUES(CHARINDEX('.',@String, CI3.CI+1))) CI4(CI)
CROSS APPLY (VALUES(CHARINDEX('.',@String, CI4.CI+1))) CI5(CI);
Returns: 'of.records by using SSIS Package.even.the.records are.not committed.'
Nesting the CHARINDEX:
SELECT
STUFF(val,1, charindex('.',val,charindex('.',val,charindex('.',val,charindex('.',
val,charindex('.',val)+1)+1)+1)+1), '')
FROM (values('.2.3.4.5.6.7'),('2.3.4.5.6.7'),('abc')) x(val)
This will return the whole string, when the string doesn't contain 5 dots.
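One caveat with chained CHARINDEX calls: a failed search returns 0, so the next call starts again at position 1 and may re-find an earlier dot when the string has between one and four dots. The sketch below stops the chain as soon as a search fails, so such strings are returned unchanged. The sample values are made up for illustration.

```sql
-- Sketch: chained CHARINDEX with a guard at every step.
SELECT x.val,
       CASE WHEN CI5.CI > 0 THEN STUFF(x.val, 1, CI5.CI, '')  -- cut up to and including the 5th dot
            ELSE x.val                                         -- fewer than 5 dots: keep as is
       END AS AfterFifthDot
FROM (VALUES ('We.are.inserting.a.lot.of.records'), ('a.b'), ('abc')) x(val)
CROSS APPLY (VALUES (CHARINDEX('.', x.val))) CI1(CI)
CROSS APPLY (VALUES (CASE WHEN CI1.CI > 0 THEN CHARINDEX('.', x.val, CI1.CI + 1) ELSE 0 END)) CI2(CI)
CROSS APPLY (VALUES (CASE WHEN CI2.CI > 0 THEN CHARINDEX('.', x.val, CI2.CI + 1) ELSE 0 END)) CI3(CI)
CROSS APPLY (VALUES (CASE WHEN CI3.CI > 0 THEN CHARINDEX('.', x.val, CI3.CI + 1) ELSE 0 END)) CI4(CI)
CROSS APPLY (VALUES (CASE WHEN CI4.CI > 0 THEN CHARINDEX('.', x.val, CI4.CI + 1) ELSE 0 END)) CI5(CI);
```

For the three sample rows this gives of.records, a.b and abc respectively.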
You can call a recursive CTE to the rescue:
DECLARE @yourString NVARCHAR(MAX)='We.are.inserting.a.lot.of.records by using SSIS Package.even.the.records are.not committed.';
DECLARE @CountDots INT=5;
WITH recCTE AS
(
SELECT @yourString AS Original
,CHARINDEX('.',@yourString) AS PosDot
,1 AS DotCount
UNION ALL
SELECT r.Original
,CHARINDEX('.',@yourString,r.PosDot+1)
,r.DotCount+1
FROM recCTE AS r
WHERE r.DotCount<@CountDots
)
SELECT SUBSTRING(@yourString,(SELECT MAX(PosDot) FROM recCTE)+1,LEN(@yourString))
One advantage is that you can define the count of dots dynamically. Another advantage is that you can fully inline this to any query, VIEW or iTVF.
UPDATE: Set-based approach
DECLARE @yourStringTable TABLE(ID INT IDENTITY,SomeString NVARCHAR(MAX));
INSERT INTO @yourStringTable VALUES
('We.are.inserting.a.lot.of.records by using SSIS Package.even.the.records are.not committed.')
,('1.2.3.4.5.6.7.8');
DECLARE @CountDots INT=5;
WITH recCTE AS
(
SELECT ID
,SomeString AS Original
,CHARINDEX('.',SomeString) AS PosDot
,1 AS DotCount
FROM @yourStringTable
UNION ALL
SELECT r.ID
,r.Original
,CHARINDEX('.',r.Original,r.PosDot+1)
,r.DotCount+1
FROM recCTE AS r
WHERE r.DotCount<@CountDots
)
SELECT ID
,Original
,SUBSTRING(Original,(SELECT MAX(x.posDot) FROM recCTE AS x WHERE x.ID=recCTE.ID)+1,LEN(Original))
FROM recCTE
WHERE PosDot=(SELECT MAX(x.posDot) FROM recCTE AS x WHERE x.ID=recCTE.ID)
For these particular string values, PATINDEX() together with SUBSTRING() is sufficient to read the string after a few dots. The pattern is tailored to this input, and [I/P] and [table] stand for your own column and table:
select
substring([I/P], patindex('%[.A-Z].[A-Z][.A-Z].[A-Z]%', [I/P])+2, LEN([I/P])) [O/P]
from [table]

SQL Server Regular expression extract pattern from DB column

I have a question about SQL Server: I have a database column with a pattern which is like this:
1. up to 10 digits
2. then a comma
3. up to 10 digits
4. then a semicolon
e.g.
100000161, 100000031; 100000243, 100000021;
100000161, 100000031; 100000243, 100000021;
and I want to extract within the pattern the first digits (up to 10) (1.) and then a semicolon (4.)
(or, in other words, remove everything from the semicolon to the next semicolon)
100000161; 100000243; 100000161; 100000243;
Can you please advise me how to do this in SQL Server? I'm not very familiar with regex and therefore have no clue how to approach this.
Thanks,
Alex
Try this
Declare @Sql Table (SqlCol nvarchar(max))
INSERT INTO @Sql
SELECT '100000161,100000031;100000243,100000021;100000161,100000031;100000243,100000021;'
;WITH cte
AS (SELECT Row_number()
OVER(
ORDER BY (SELECT NULL)) AS Rno,
split.a.value('.', 'VARCHAR(1000)') AS Data
FROM (SELECT Cast('<S>'
+ Replace( Replace(sqlcol, ';', ','), ',',
'</S><S>')
+ '</S>'AS XML) AS Data
FROM @Sql)AS A
CROSS apply data.nodes('/S') AS Split(a))
SELECT Stuff((SELECT '; ' + data
FROM cte
WHERE rno%2 <> 0
AND data <> ''
FOR xml path ('')), 1, 2, '') AS ExpectedData
ExpectedData
-------------
100000161; 100000243; 100000161; 100000243
I believe this will get you what you are after as long as that pattern truly holds. If not it's fairly easy to ensure it does conform to that pattern and then apply this
Select Substring(TargetCol, 1, 10) + ';' From TargetTable
You can take advantage of SQL Server's XML support to convert the input string into an XML value and query it with XQuery and XPath expressions.
For example, the following query will replace each ; with </b><a> and each , with </a><b> to turn each string into <a>100000161</a><a>100000243</a><a />. After that, you can select individual <a> nodes with /a[1], /a[2] :
declare @table table (it nvarchar(200))
insert into @table values
('100000161, 100000031; 100000243, 100000021;'),
('100000161, 100000031; 100000243, 100000021;')
select
xCol.value('/a[1]','nvarchar(200)'),
xCol.value('/a[2]','nvarchar(200)')
from (
select convert(xml, '<a>'
+ replace(replace(replace(it,';','</b><a>'),',','</a><b>'),' ','')
+ '</a>')
.query('a') as xCol
from @table) as tmp
-------------------------
A1 A2
100000161 100000243
100000161 100000243
value extracts a single value from an XML field. nodes returns a table of nodes that match the XPath expression. The following query will return all "keys" :
select
a.value('.','nvarchar(200)')
from (
select convert(xml, '<a>'
+ replace(replace(replace(it,';','</b><a>'),',','</a><b>'),' ','')
+ '</a>')
.query('a') as xCol
from @table) as tmp
cross apply xCol.nodes('a') as y(a)
where a.value('.','nvarchar(200)')<>''
------------
100000161
100000243
100000161
100000243
With 200K rows of data though, I'd seriously consider transforming the data when loading it and storing it in individual, indexable columns, or adding a separate, related table. Applying string manipulation functions to a column means that the server can't use any covering indexes to speed up queries.
If that's not possible (why?) I'd consider at least adding a separate XML-typed column that would contain the same data in XML form, to allow the creation of an XML index.
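For comparison, on SQL Server 2016 or later the same extraction can be sketched without XML by splitting on the semicolon and keeping the part before the comma. The variable name @it is hypothetical, and the output comes back as one row per pair, in no guaranteed order, because STRING_SPLIT does not document its row order.

```sql
-- Sketch: one row per "first number" of each "n1, n2;" pair.
DECLARE @it NVARCHAR(200) = N'100000161, 100000031; 100000243, 100000021;';

SELECT LTRIM(LEFT(p.value, NULLIF(CHARINDEX(',', p.value), 0) - 1)) + ';' AS FirstNumber
FROM STRING_SPLIT(@it, ';') AS p
WHERE CHARINDEX(',', p.value) > 0;  -- skips the empty piece after the final ';'
```

The NULLIF keeps LEFT from receiving a negative length if the engine evaluates the expression before applying the WHERE filter.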

Finding a string in XML column using sql server

I have a table with an XML column.
I need to search for a substring in that XML column, across all of its nodes and values. The search should be case-insensitive.
The structure of the XML is different in each row.
I used the query below to do that:
select * from TableName Where Cast(xmlcolumn as varchar(max) ) like '%searchString%'
This works for short XML values, but when a row's XML gets very long it cannot handle it: only part of the data is searched.
Please suggest some other ways to achieve this.
If this is a one-time task, I would use the exist() XML method like this:
DECLARE @Table1 TABLE (
ID INT IDENTITY PRIMARY KEY,
CommentAsXML XML
)
INSERT @Table1 (CommentAsXML)
VALUES (N'<root><item /><item type="Reg">0001</item><item type="Inv">B007</item><item type="Cus">A0001</item><item type="Br">F0001</item></root>')
INSERT @Table1 (CommentAsXML)
VALUES (N'<root><item /><item type="Reg">0005</item><parent><child>B007</child></parent><item type="Br">F0005</item></root>')
INSERT @Table1 (CommentAsXML)
VALUES (N'<root><item /><item type="Reg">0005</item></root>')
-- Following query is searching for B007 within InnerText of all XML elements:
SELECT *
FROM @Table1 t
WHERE t.CommentAsXML.exist('//*[lower-case(text()[1]) eq "b007"]') = 1
Results:
ID CommentAsXML
-- ------------------------------------------------------------------------------------------------------------------------------
1 <root><item type="Reg">0001</item><item type="Inv">B007</item><item type="Cus">A0001</item><item type="Br">F0001</item></root>
2 <root><item type="Reg">0005</item><parent><child>B007</child></parent><item type="Br">F0005</item></root>
Also, if you want to search for some text in XML attributes' values, then the following XQuery could be used:
SELECT *
FROM @Table1 t
WHERE t.CommentAsXML.exist('//@*[lower-case(.) eq "reg"]') = 1
Note: in both cases, the string constants (e.g. "reg") must be written in lowercase.
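The two queries above match whole values with eq. Since the question asks for a substring search, a variant using the XQuery contains() function might be closer to what is needed. This is a sketch only: the variable @search is hypothetical, the sample rows are shortened versions of the ones above, and only text nodes are searched (attribute values would need a second predicate on //@*).

```sql
-- Sketch: case-insensitive substring search over all text nodes.
DECLARE @Table1 TABLE (ID INT IDENTITY PRIMARY KEY, CommentAsXML XML);
INSERT @Table1 (CommentAsXML) VALUES
 (N'<root><item type="Inv">B007</item></root>'),
 (N'<root><item type="Reg">0005</item></root>');

-- Lower-case the search term once so both sides of the comparison agree.
DECLARE @search NVARCHAR(100) = LOWER(N'b00');

SELECT t.ID
FROM @Table1 t
WHERE t.CommentAsXML.exist('//text()[contains(lower-case(.), sql:variable("@search"))]') = 1;
```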
