Parsing an XML string without root elements using T-SQL - sql-server

Below is the XML string stored in a column in my database table. I need to parse it using T-SQL:
Installation type:Interior
Wiring Length (ft):4
Connector type:RJ45
Location description:basement
WirelessID:1
DevID:1234567
wontTurnOff:true
How do I parse this since they're missing the root and child tags?
Thanks.

It appears that the data in your table column is not in XML format. XML normally looks like the following, and the column type would be set to "XML". Note that there is a hierarchy, and each item associated with the book has a beginning and an end tag. There can be multiple books under catalog.
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
However, having pointed out the above, I'm assuming that you are probably dealing with something somewhat different that may initially have been extracted from an XML source and placed into the format that you've provided. If the format and the seven items to parse out are consistent (with different values from one record to the next in the table), then here is an alternative that may help. You can copy this example into a query, test it, and use it as a reference for the final SQL, which will probably need to include a conversion to integer for some of the numeric values parsed out. The example accommodates the differing lengths of the values associated with each item. However, for this to work, the items must appear in the same order as in your example.
DECLARE @string varchar(max)
SET @string = 'Installation type:Interior
Wiring Length (ft):4
Connector type:RJ45
Location description:basement
WirelessID:1
DevID:1234567
wontTurnOff:true'
-- Variables for the parsed values (the SELECTs below return the values directly)
DECLARE @Installation_type varchar(50)
,@Wiring_Length varchar(10)
,@Connector_type varchar(10)
,@Location_Description varchar(100)
,@WirelessID varchar(10)
,@DevID varchar(10)
,@wontTurnOff varchar(5)
-- Each value runs from the end of its label to 2 characters before the next label;
-- the -2 assumes that lines end in CR+LF
SELECT SUBSTRING(@string, charindex('Installation type:', @string)+18, (charindex('Wiring Length (ft):', @string)-2) - (charindex('Installation type:', @string)+18))
SELECT SUBSTRING(@string, charindex('Wiring Length (ft):', @string)+19, (charindex('Connector type:', @string)-2) - (charindex('Wiring Length (ft):', @string)+19))
SELECT SUBSTRING(@string, charindex('Connector type:', @string)+15, (charindex('Location description:', @string)-2) - (charindex('Connector type:', @string)+15))
SELECT SUBSTRING(@string, charindex('Location description:', @string)+21, (charindex('WirelessID:', @string)-2) - (charindex('Location description:', @string)+21))
SELECT SUBSTRING(@string, charindex('WirelessID:', @string)+11, (charindex('DevID:', @string)-2) - (charindex('WirelessID:', @string)+11))
SELECT SUBSTRING(@string, charindex('DevID:', @string)+6, (charindex('wontTurnOff:', @string)-2) - (charindex('DevID:', @string)+6))
-- The last value runs to the end of the string
SELECT SUBSTRING(@string, charindex('wontTurnOff:', @string)+12, (len(@string)+1) - (charindex('wontTurnOff:', @string)+12))
It is common for data to arrive in all kinds of strange formats, and sometimes it is fragmented. If your data is truly XML and needs to be parsed as such, you can assign the value to a temporary varchar(max) variable, perform find-and-replace operations within it for each item (for example, convert "WirelessID:1" to "<WirelessID>1</WirelessID>"), wrap the entire value in a topmost XML tag like the "<catalog>" used in the initial XML example, convert the entire varchar(max) value to XML, and then apply the XML parsing functions included with SQL Server.
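As a rough sketch of this second option, the snippet below rewrites a shortened sample into XML and then shreds it with nodes() and value(). The items/item element names and the key/value attribute names are made up for illustration. The approach assumes that lines end in CR+LF and that neither labels nor values contain a colon or XML special characters such as &, < or ".

```sql
-- Shortened sample; CHAR(13) + CHAR(10) stands in for the CR+LF line breaks
DECLARE @string varchar(max) =
    'Installation type:Interior' + CHAR(13) + CHAR(10) +
    'WirelessID:1' + CHAR(13) + CHAR(10) +
    'DevID:1234567';

-- Turn every "label:value" line into <item key="label" value="value"/>
-- and wrap the result in a root element so that it becomes valid XML
DECLARE @xml xml = CAST(
    '<items><item key="' +
    REPLACE(REPLACE(@string, ':', '" value="'),
            CHAR(13) + CHAR(10), '"/><item key="') +
    '"/></items>' AS xml);

-- One row per item, independent of the order of the items
SELECT i.value('@key',   'varchar(100)') AS ItemName,
       i.value('@value', 'varchar(100)') AS ItemValue
FROM @xml.nodes('/items/item') AS t(i);
```

Unlike the SUBSTRING approach, this does not depend on the items appearing in a fixed order, but it does break as soon as a value contains one of the characters listed above.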
Hope this helps. Without seeing your data I can only suggest the two options. I've been doing ETL development for many years and it is common to be required to use every trick in the book to resolve data formatting issues when there is no control over that source.

Related

How do I perform a string function whilst using OPENJSON WITH?

In this example, DT_RowId is a concatenated string. I need to extract out its values, and make them available in a WHERE clause (not shown).
Is there a way to perform string functions on a value as part of a FROM OPENJSON WITH?
Is there a proper way to extract concatenated strings from a value without using a clunky SELECT statement?
Side note: This example is REALLY part of an UPDATE statement, so I'd be using the extracted values in the WHERE clause (not shown here). Also, also: Split is a custom string function we have.
BTW: I have full control of that DT_RowId, and I could make it an array, for example, [42, 1, 1]
declare @jsonRequest nvarchar(max) = '{"DT_RowId":"42_1_14","Action":"edit","Schedule":"1","Slot":"1","Period":"9:00 to 9:30 UPDATED","AMOnly":"0","PMOnly":"0","AllDay":"1"}'
select
(select Item from master.dbo.Split(source.DT_RowId, '_', 0) where ItemIndex = 0) as ID
,source.Schedule
,source.Slot
,source.[Period]
,source.AllDay
,source.PMOnly
,source.AMOnly
from openjson(@jsonRequest, '$')
with
(
DT_RowId varchar(255) '$.DT_RowId' /*concatenated string of row being edited */
,Schedule tinyint '$.Schedule'
,Slot tinyint '$.Slot'
,[Period] varchar(20) '$.Period'
,AllDay bit '$.AllDay'
,PMOnly bit '$.PMOnly'
,AMOnly bit '$.AMOnly'
) as source
SQL Server 2016+ offers a nice trick to split a string quickly and in a position-aware way:
select
DTRow.AsJson as DTRow_All_Content
,JSON_VALUE(DTRow.AsJson,'$[0]') AS DTRow_FirstValue
,source.Schedule
,source.Slot
,source.[Period]
,source.AllDay
,source.PMOnly
,source.AMOnly
from openjson(@jsonRequest, '$')
with
(
DT_RowId varchar(255) '$.DT_RowId' /*concatenated string of row being edited */
,Schedule tinyint '$.Schedule'
,Slot tinyint '$.Slot'
,[Period] varchar(20) '$.Period'
,AllDay bit '$.AllDay'
,PMOnly bit '$.PMOnly'
,AMOnly bit '$.AMOnly'
) as source
OUTER APPLY(SELECT CONCAT('["',REPLACE([source].DT_RowId,'_','","'),'"]')) DTRow(AsJson);
The magic is the transformation of 42_1_14 to ["42","1","14"] with some simple string methods. With this you can use JSON_VALUE() to fetch an item by its position.
General hint: If you have full control of DT_RowId you should rather create this JSON array right from the start and avoid hacks while reading this...
update
Just to demonstrate how this would run, if the value was a JSON-array, check this out:
declare @jsonRequest nvarchar(max) = '{"DT_RowId":["42","1","14"]}'
select
source.DT_RowId as DTRow_All_Content
,JSON_VALUE(source.DT_RowId,'$[0]') AS DTRow_FirstValue
from openjson(@jsonRequest, '$')
with
(
DT_RowId NVARCHAR(MAX) AS JSON
) as source;
update 2
Just to add a little to your self-answer:
We must think of JSON as a special string. As there is no native JSON data type, the engine does not know when the string is a plain string and when it is JSON.
Using NVARCHAR(MAX) AS JSON in the WITH clause lets us handle the returned value with JSON methods again. For example, we could use CROSS APPLY OPENJSON(UseTheValueHere) to dive into nested lists and objects.
Actually there's no need to use this at all. If there are no repeating elements, one could just parse all the values directly:
SELECT JSON_VALUE(@jsonRequest,'$.DT_RowId[0]') AS DTRowId_1
,JSON_VALUE(@jsonRequest,'$.Action') AS [Action]
--and so on...
But this would mean parsing the JSON over and over, which is very expensive.
Using OPENJSON means reading the whole JSON in a single pass (on the current level) and returning the elements found (with or without a JSON path) in a derived set (one row per element).
The WITH clause is meant to perform a kind of PIVOT action and returns the elements as a multi-column set. An additional advantage is that you can specify the data type and, if necessary, a differing JSON path and the column's alias.
You can use any valid JSON path (in the WITH clause as well as in JSON_VALUE() and in many other places). That means there are several ways to get the same result. Understanding how the engine works will enable you to find the most performant approach.
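To illustrate the CROSS APPLY OPENJSON idea mentioned above, here is a minimal sketch, assuming DT_RowId is sent as a JSON array and the database runs at compatibility level 130 or higher. The shortened request and the column aliases ItemIndex and ItemValue are only for illustration.

```sql
DECLARE @jsonRequest nvarchar(max) =
    N'{"DT_RowId":["42","1","14"],"Action":"edit"}';

-- The outer OPENJSON keeps DT_RowId as a JSON fragment (AS JSON);
-- the inner OPENJSON, called without WITH, returns one row per
-- array element with the columns [key], [value] and [type]
SELECT source.[Action],
       item.[key]   AS ItemIndex,
       item.[value] AS ItemValue
FROM OPENJSON(@jsonRequest, '$')
     WITH (
         DT_RowId nvarchar(max) AS JSON,
         [Action] varchar(10) '$.Action'
     ) AS source
CROSS APPLY OPENJSON(source.DT_RowId) AS item;
```

This returns one row per array element, which is handy when the number of elements is not known in advance. For a fixed, small number of positions, JSON_VALUE with an index is probably simpler.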
OP here. Just expanding on the answer I accepted by Shnugo, with some details and notes... Hopefully all this might help somebody out there.
I am going to make DT_RowId an array
I will use AS JSON for DT_RowId in the OPENJSON WITH statement
I can then treat it as a json structure, and use JSON_VALUE to extract a value at a specific index
declare @jsonRequest nvarchar(max) = '{"DT_RowId":["42", "1", "14"],"Action":"edit","Schedule":"1","Slot":"1","Period":"9:00 to 9:30 UPDATED","AMOnly":"0","PMOnly":"0","AllDay":"1"}'
select
source.DT_RowId as DTRowId_FULL_JSON_Struct /*the full array*/
,JSON_VALUE(source.DT_RowId,'$[0]') AS JSON_VAL_0 /*extract value at index 0 from json structure*/
,JSON_VALUE(source.DT_RowId,'$[1]') AS JSON_VAL_1 /*extract value at index 1 from json structure*/
,JSON_VALUE(source.DT_RowId,'$[2]') AS JSON_VAL_2 /*extract value at index 2 from json structure*/
,source.DT_RowId_Index0 /*already extracted*/
,source.DT_RowId_Index1 /*already extracted*/
,source.DT_RowId_Index2 /*already extracted*/
,source.Schedule
,source.Slot
,source.Period
,source.AllDay
,source.PMOnly
,source.AMOnly
from openjson(@jsonRequest, '$')
with
(
DT_RowId nvarchar(max) as json /*format as json; do the rest in the SELECT statement*/
,DT_RowId_Index0 varchar(2) '$.DT_RowId[0]' /*When OPENJSON parses a JSON array, the function returns the indexes of the elements in the JSON text as keys.*/
,DT_RowId_Index1 varchar(2) '$.DT_RowId[1]' /*When OPENJSON parses a JSON array, the function returns the indexes of the elements in the JSON text as keys.*/
,DT_RowId_Index2 varchar(2) '$.DT_RowId[2]' /*When OPENJSON parses a JSON array, the function returns the indexes of the elements in the JSON text as keys.*/
,Schedule tinyint '$.Schedule'
,Slot tinyint '$.Slot'
,[Period] varchar(20) '$.Period'
,AllDay bit '$.AllDay'
,PMOnly bit '$.PMOnly'
,AMOnly bit '$.AMOnly'
) as source

Parse XML using SQL

I'm using MS SQL Server 2016 and I have an XML file that I need to parse to put various data elements into separate fields. For the most part everything works fine, except I need a little help to identify a particular node value. If I have (I put only a snippet of the XML here, but it does show the problem)
DECLARE @xmlString xml
SET @xmlString ='<PubmedArticle>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">25685064</PMID>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Electronic">1234-5678</ISSN>
<ISSN IssnType="Print">1475-2867</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>15</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2015</Year>
</PubDate>
</JournalIssue>
</Journal>
</Article>
</MedlineCitation>
</PubmedArticle>'
select
nref.value('Article[1]/Journal[1]/ISSN[1]','varchar(max)') ISSN
from @xmlString.nodes ('//MedlineCitation[1]') as R(nref)
I bypass the second ISSNType and read the first value available. I need to pull both values. What do I need to change? Thanks
You can read it as a second column:
SELECT
nref.value('Article[1]/Journal[1]/ISSN[1]','varchar(max)') ISSN,
nref.value('Article[1]/Journal[1]/ISSN[2]','varchar(max)') ISSN2
FROM @xmlString.nodes('//MedlineCitation[1]') as R(nref)
Or
SELECT
nref.value('ISSN[1]','varchar(max)') ISSN,
nref.value('ISSN[2]','varchar(max)') ISSN2
FROM @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]') as R(nref)
Or as a separate row:
SELECT nref.value('.','varchar(MAX)') ISSN
from @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]/ISSN') as R(nref)
Update
If the number of ISSNs may vary, I recommend normalizing your result set:
SELECT
nref.value('.','varchar(MAX)') Issn,
nref.value('@IssnType','varchar(MAX)') IssnType
FROM @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]/ISSN') as R(nref)
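If the two ISSN types should end up in fixed columns regardless of their order in the document, one option is to filter on the IssnType attribute instead of the position. This is a small sketch against a trimmed copy of the sample XML; the aliases ElectronicIssn and PrintIssn are just illustrative names.

```sql
DECLARE @xmlString xml = N'<PubmedArticle>
  <MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
    <Article PubModel="Electronic-eCollection">
      <Journal>
        <ISSN IssnType="Electronic">1234-5678</ISSN>
        <ISSN IssnType="Print">1475-2867</ISSN>
      </Journal>
    </Article>
  </MedlineCitation>
</PubmedArticle>';

-- Pick each ISSN by its IssnType attribute rather than by its position;
-- a missing type simply yields NULL in that column
SELECT j.value('(ISSN[@IssnType="Electronic"]/text())[1]', 'varchar(9)') AS ElectronicIssn,
       j.value('(ISSN[@IssnType="Print"]/text())[1]',      'varchar(9)') AS PrintIssn
FROM @xmlString.nodes('/PubmedArticle/MedlineCitation/Article/Journal') AS R(j);
```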

Trying to transfer xml data into SQL as elements (not attribute)

I am new at this so please bear with me. I am attempting to transfer some XML data into Microsoft SQL Server. I am assuming that this data needs to be transferred as elements and not attributes because the contents of the columns will not be static.
However for some reason when I attempt to transfer the data as elements I get NULL values. But when I try to transfer this same data as attributes it works and looks the way it is supposed to. I am tempted to shrug and just move on but I'm worried that things might go awry for me if I do that later on down the road.
I already have some attributes from this XML that I managed to transfer as attributes which I plan to combine with these elements that are masquerading as attributes into a single table. Will it work? And if it does will there be problems down the road?
Here is my SQL code when I attempt to transfer the elements as elements:
SELECT *
FROM OPENXML (@hdoc, '/roll/voter', 2)
WITH (
id int,
[value] char(50),
[state] char(2))
Here is my SQL code when I attempt to transfer the elements as attributes:
SELECT *
FROM OPENXML (@hdoc, '/roll/voter', 1)
WITH (
id int,
[value] char(50),
[state] char(2))
Here is a miniaturized version of the XML document:
<roll>
<voter id="400048" value="Yea" state="FL" />
<voter id="412516" value="Yea" state="CA" />
</roll>
Here is a link to the xml document via google drive (very small XML): https://drive.google.com/open?id=0B5VgOwWcGeLHaWctRU56Qlk3UWM
FROM OPENXML is outdated and should not be used anymore (rare exceptions exist)...
Try with the real XML methods:
DECLARE @xml XML=
N'<roll>
<voter id="400048" value="Yea" state="FL" />
<voter id="412516" value="Yea" state="CA" />
</roll>';
SELECT @xml.value(N'(/roll/voter/@id)[1]',N'int') AS voter_id
,@xml.value(N'(/roll/voter/@value)[1]',N'nvarchar(max)') AS voter_value
,@xml.value(N'(/roll/voter/@state)[1]',N'nvarchar(max)') AS voter_state
The result
voter_id voter_value voter_state
400048 Yea FL
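The query above reads only the first voter. To get one row per voter, the same XML methods can be combined with nodes(), as in the sketch below (the aliases r and v are arbitrary). It also hints at why the OPENXML call with flag 2 returned NULLs: flag 2 requests element-centric mapping, while id, value and state are attributes in this document, which is what flag 1 (attribute-centric mapping) reads.

```sql
DECLARE @xml xml = N'<roll>
  <voter id="400048" value="Yea" state="FL" />
  <voter id="412516" value="Yea" state="CA" />
</roll>';

-- nodes() returns one row per <voter> element;
-- value() then reads the attributes of that element
SELECT v.value('@id',    'int')         AS voter_id,
       v.value('@value', 'varchar(50)') AS voter_value,
       v.value('@state', 'char(2)')     AS voter_state
FROM @xml.nodes('/roll/voter') AS r(v);
```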

SQL Server XQuery - Selecting a Subset

Take for example the following XML:
Initial Data
<computer_book>
<title>Selecting XML Nodes the Fun and Easy Way</title>
<isbn>9999999999999</isbn>
<pages>500</pages>
<backing>paperback</backing>
</computer_book>
and:
<cooking_book>
<title>50 Quick and Easy XML Dishes</title>
<isbn>5555555555555</isbn>
<pages>275</pages>
<backing>paperback</backing>
</cooking_book>
I have something similar in a single xml-typed column of a SQL Server 2008 database. Using SQL Server XQuery, would it be possible to get results such as this:
Resulting Data
<computer_book>
<title>Selecting XML Nodes the Fun and Easy Way</title>
<pages>500</pages>
</computer_book>
and:
<cooking_book>
<title>50 Quick and Easy XML Dishes</title>
<isbn>5555555555555</isbn>
</cooking_book>
Please note that I am not referring to selecting both examples in one query; rather I am selecting each via its primary key (which is in another column). In each case, I am essentially trying to select the root and an arbitrary subset of children. The roots can be different, as seen above, so I do not believe I can hard-code the root node name into a "for xml" clause.
I have a feeling SQL Server's XQuery capabilities will not allow this, and that is fine if it is the case. If I can accomplish this, however, I would greatly appreciate an example.
Here is the test data I used in the queries below:
declare @T table (XMLCol xml)
insert into @T values
('<computer_book>
<title>Selecting XML Nodes the Fun and Easy Way</title>
<isbn>9999999999999</isbn>
<pages>500</pages>
<backing>paperback</backing>
</computer_book>'),
('<cooking_book>
<title>50 Quick and Easy XML Dishes</title>
<isbn>5555555555555</isbn>
<pages>275</pages>
<backing>paperback</backing>
</cooking_book>')
You can filter the nodes under the root node like this, using local-name() and a list of the node names you want:
select XMLCol.query('/*/*[local-name()=("isbn","pages")]')
from @T
Result:
<isbn>9999999999999</isbn><pages>500</pages>
<isbn>5555555555555</isbn><pages>275</pages>
If I understand you correctly the problem with this is that you don't get the root node back.
This query will give you an empty root node:
select cast('<'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'/>' as xml)
from @T
Result:
<computer_book />
<cooking_book />
From this I have found two solutions for you.
Solution 1
Get the nodes from your table to a table variable and then modify the XML to look like you want.
-- Table variable to hold the node(s) you want
declare @T2 table (RootNode xml, ChildNodes xml)
-- Fetch the xml from your table
insert into @T2
select cast('<'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'/>' as xml),
XMLCol.query('/*/*[local-name()=("isbn","pages")]')
from @T
-- Add the child nodes to the root node
update @T2 set
RootNode.modify('insert sql:column("ChildNodes") into (/*)[1]')
-- Fetch the modified XML
select RootNode
from @T2
Result:
RootNode
<computer_book><isbn>9999999999999</isbn><pages>500</pages></computer_book>
<cooking_book><isbn>5555555555555</isbn><pages>275</pages></cooking_book>
The sad part with this solution is that it does not work with SQL Server 2005.
Solution 2
Get the parts, build the XML as a string and cast it back to XML.
select cast('<'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'>'+
cast(XMLCol.query('/*/*[local-name()=("isbn","pages")]') as varchar(max))+
'</'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'>' as xml)
from @T
Result:
<computer_book><isbn>9999999999999</isbn><pages>500</pages></computer_book>
<cooking_book><isbn>5555555555555</isbn><pages>275</pages></cooking_book>
Making the nodes parameterized
In the queries above, the nodes you get as child nodes are hard-coded in the query. You can use sql:variable() to parameterize them instead. I have not found a way of making the number of nodes dynamic, but you can add as many as you think you need and use null as the value for the nodes you don't need.
declare @N1 varchar(10)
declare @N2 varchar(10)
declare @N3 varchar(10)
declare @N4 varchar(10)
set @N1 = 'isbn'
set @N2 = 'pages'
set @N3 = 'backing'
set @N4 = null
select cast('<'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'>'+
cast(XMLCol.query('/*/*[local-name()=(sql:variable("@N1"),
sql:variable("@N2"),
sql:variable("@N3"),
sql:variable("@N4"))]') as varchar(max))+
'</'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'>' as xml)
from @T
Result:
<computer_book><isbn>9999999999999</isbn><pages>500</pages><backing>paperback</backing></computer_book>
<cooking_book><isbn>5555555555555</isbn><pages>275</pages><backing>paperback</backing></cooking_book>

Using SQL Server 2005's XQuery select all nodes with a specific attribute value, or with that attribute missing

Update: giving a much more thorough example.
The first two solutions offered were right along the lines of what I was trying to say not to do. I can't know location, it needs to be able to look at the whole document tree. So a solution along these lines, with /Books/ specified as the context will not work:
SELECT x.query('.') FROM @xml.nodes('/Books/*[not(@ID) or @ID = 5]') x1(x)
Original question with better example:
Using SQL Server 2005's XQuery implementation I need to select all nodes in an XML document, just once each and keeping their original structure, but only if they are missing a particular attribute, or that attribute has a specific value (passed in by parameter). The query also has to work on the whole XML document (descendant-or-self axis) rather than selecting at a predefined depth.
That is to say, each individual node will appear in the resultant document only if it and every one of its ancestors are missing the attribute, or have the attribute with a single specific value.
For example:
If this were the XML:
DECLARE @Xml XML
SET @Xml =
N'
<Library>
<Novels>
<Novel category="1">Novel1</Novel>
<Novel category="2">Novel2</Novel>
<Novel>Novel3</Novel>
<Novel category="4">Novel4</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
<Volume category="2">G-L</Volume>
<Volume category="3">M-S</Volume>
<Volume category="4">T-Z</Volume>
</Encyclopedia>
</Encyclopedias>
<Dictionaries category="1">
<Dictionary>Webster</Dictionary>
<Dictionary>Oxford</Dictionary>
</Dictionaries>
</Library>
'
A parameter of 1 for category would result in this:
<Library>
<Novels>
<Novel category="1">Novel1</Novel>
<Novel>Novel3</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
</Encyclopedia>
</Encyclopedias>
<Dictionaries category="1">
<Dictionary>Webster</Dictionary>
<Dictionary>Oxford</Dictionary>
</Dictionaries>
</Library>
A parameter of 2 for category would result in this:
<Library>
<Novels>
<Novel category="2">Novel2</Novel>
<Novel>Novel3</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
<Volume category="2">G-L</Volume>
</Encyclopedia>
</Encyclopedias>
</Library>
I know XSLT is perfectly suited for this job, but it's not an option. We have to accomplish this entirely in SQL Server 2005. Any implementations not using XQuery are fine too, as long as it can be done entirely in T-SQL.
It's not clear to me from your example what you're actually trying to achieve. Do you want to return a new XML with all the nodes stripped out except those that fulfill the condition? If yes, then this looks like a job for an XSLT transform, which I don't think is built into MSSQL 2005 (it can be added as a UDF: http://www.topxml.com/rbnews/SQLXML/re-23872_Performing-XSLT-Transforms-on-XML-Data-Stored-in-SQL-Server-2005.aspx).
If you just need to return the list of nodes then you can use this expression:
//Book[not(@ID) or @ID = 5]
but I get the impression that it's not what you need. It would help if you can provide a clearer example.
Edit: This example is indeed more clear. The best that I could find is this:
SET @Xml.modify('delete(//*[@category!=1])')
SELECT @Xml
The idea is to delete from the XML all the nodes that you don't need, so you remain with the original structure and the needed nodes. I tested with your two examples and it produced the wanted result.
However modify has some restrictions - it seems you can't use it in a select statement, it has to modify data in place. If you need to return such data with a select you could use a temporary table in which to copy the original data and then update that table. Something like this:
INSERT INTO #temp VALUES(@Xml)
UPDATE #temp SET data.modify('delete(//*[@category!=2])')
Hope that helps.
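Since the question asks for the category to be passed in as a parameter, the literal in the delete expression can presumably be swapped for sql:variable(). The following is a sketch against a trimmed copy of the library XML; @Category is an assumed parameter name, and I have not verified this on SQL Server 2005 itself.

```sql
DECLARE @Xml xml = N'<Library>
  <Novels>
    <Novel category="1">Novel1</Novel>
    <Novel category="2">Novel2</Novel>
    <Novel>Novel3</Novel>
  </Novels>
  <Dictionaries category="1">
    <Dictionary>Webster</Dictionary>
  </Dictionaries>
</Library>';

DECLARE @Category int;
SET @Category = 2;

-- Delete every element whose category attribute differs from the parameter.
-- Elements without the attribute are kept, and deleting an element also
-- removes all of its descendants.
SET @Xml.modify('delete //*[@category != sql:variable("@Category")]');

SELECT @Xml;
```

With @Category set to 2, Novel1 and the whole Dictionaries branch should be removed, while Novel2 and Novel3 remain.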
The question is not really clear, but is this what you're looking for?
DECLARE @Xml AS XML
SET @Xml =
N'
<Books>
<Book ID="1">Book1</Book>
<Book ID="2">Book2</Book>
<Book ID="3">Book3</Book>
<Book>Book4</Book>
<Book ID="5">Book5</Book>
<Book ID="6">Book6</Book>
<Book>Book7</Book>
<Book ID="8">Book8</Book>
</Books>
'
DECLARE @BookID AS INT
SET @BookID = 5
DECLARE @Result AS XML
SET @Result = (SELECT @Xml.query('//Book[not(@ID) or @ID = sql:variable("@BookID")]'))
SELECT @Result

Resources