How do I select the value of an ID element in XPath? - sql-server

Given an XML with the following set up:
<eCrow.CrowGroup CorrelationID="ec367934-e7bd-4213-b0e5-d149c57eec61" >
<eCrow.01>fu</eCrow.01>
<eCrow.02>bar</eCrow.02>
<eCrow.03 CorrelationID="bfe7d35b-bbc1-4591-8d0d-9d42252039bc" >03003</eCrew.03>
</eCrow.CrowGroup>
How do I manage to get XPath to select the CorrelationID from within the node header: <eCrow.CrowGroup CorrelationID="ec367934-e7bd-4213-b0e5-d149c57eec61" >, NOT the CorrelationID from eCrow.03.
Regarding the link suggestion, I am probably doing something wrong, but //eCrew.CrewGroup/*@CorrelationID just selects the entire node.

Please try the following.
As already mentioned in the comments, I had to fix your XML to make it well-formed.
XQuery .value() method gives you the answer.
SQL
DECLARE @xml XML =
N'<eCrow.CrowGroup CorrelationID="ec367934-e7bd-4213-b0e5-d149c57eec61">
<eCrow.01>fu</eCrow.01>
<eCrow.02>bar</eCrow.02>
<eCrow.03 CorrelationID="bfe7d35b-bbc1-4591-8d0d-9d42252039bc">03003</eCrow.03>
</eCrow.CrowGroup>';
SELECT @xml.value('(/eCrow.CrowGroup/eCrow.03/@CorrelationID)[1]', 'VARCHAR(30)') AS CorrelationID
, @xml.value('(/eCrow.CrowGroup/eCrow.03/@CorrelationID)[1]', 'uniqueidentifier') AS CorrelationID2;
Output
+--------------------------------+--------------------------------------+
| CorrelationID | CorrelationID2 |
+--------------------------------+--------------------------------------+
| bfe7d35b-bbc1-4591-8d0d-9d4225 | BFE7D35B-BBC1-4591-8D0D-9D42252039BC |
+--------------------------------+--------------------------------------+
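Note that the query above reads the attribute on eCrow.03, and that VARCHAR(30) cuts the 36-character GUID short, as the first output column shows. A minimal sketch of the variant the question asks for, the CorrelationID on the eCrow.CrowGroup element itself, could look like this. The column aliases and the VARCHAR(36) length are choices made for this sketch, not part of the original answer.

```sql
DECLARE @xml XML =
N'<eCrow.CrowGroup CorrelationID="ec367934-e7bd-4213-b0e5-d149c57eec61">
<eCrow.01>fu</eCrow.01>
<eCrow.02>bar</eCrow.02>
<eCrow.03 CorrelationID="bfe7d35b-bbc1-4591-8d0d-9d42252039bc">03003</eCrow.03>
</eCrow.CrowGroup>';

-- Address the attribute directly on the root element; eCrow.03 is not in the path.
-- VARCHAR(36) is wide enough to hold the full GUID text.
SELECT @xml.value('(/eCrow.CrowGroup/@CorrelationID)[1]', 'VARCHAR(36)') AS GroupCorrelationID
     , @xml.value('(/eCrow.CrowGroup/@CorrelationID)[1]', 'uniqueidentifier') AS GroupCorrelationID2;
```

The @ in the path is what addresses an attribute rather than a child element; the [1] around the whole expression satisfies the singleton requirement of .value().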

Related

Extract data from XML document using t-sql

I have been trying to extract data from the following XML document using T-SQL on SQL Server 2019.
XML:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://www.URL1.com/1</loc>
<image:image>
<image:loc>https://www.URL1.com/11</image:loc>
</image:image>
<image:image>
<image:loc>https://www.URL1.com/12</image:loc>
</image:image>
<image:image>
<image:loc>https://www.URL1.com/13</image:loc>
</image:image>
</url>
<url>
<loc>https://www.URL1.com/2</loc>
<image:image>
<image:loc>https://www.URL1.com/21</image:loc>
</image:image>
<image:image>
<image:loc>https://www.URL1.com/22</image:loc>
</image:image>
</url>
<url>
<loc>https://www.URL1.com/3</loc>
<image:image>
<image:loc>https://www.URL1.com/32</image:loc>
</image:image>
</url>
</urlset>
I would like to extract data out of the XML document into a SQL Server table. My desired output is below.
Desired output:
+------------------------+-------------------------+
| Loc | ImageLoc |
+------------------------+-------------------------+
| https://www.URL1.com/1 | https://www.URL1.com/11 |
| https://www.URL1.com/1 | https://www.URL1.com/12 |
| https://www.URL1.com/1 | https://www.URL1.com/13 |
| https://www.URL1.com/2 | https://www.URL1.com/21 |
| https://www.URL1.com/2 | https://www.URL1.com/22 |
| https://www.URL1.com/3 | https://www.URL1.com/32 |
+------------------------+-------------------------+
My attempts have failed miserably so far. I have tried many things, but the only one that allowed me to get even the Loc element was the following. I have tried using OUTER APPLY/CROSS APPLY to get the ImageLoc, with no luck.
My Attempt:
DECLARE @xml XML
SELECT @xml = BulkColumn
FROM OPENROWSET(BULK 'M:\Files\MyXML.xml', SINGLE_BLOB) x
SELECT
t.c.value('(text())[1]', 'VARCHAR(max)') URLs
, t2.i.value('(text())[1]', 'VARCHAR(max)') URLs
FROM @xml.nodes('*:urlset/*:url/*:loc') t(c)
OUTER APPLY @xml.nodes('*:urlset/*:url/*:loc/*:image/*:loc') t2(i)
Could you please help? Thanks in advance
This answer was posted by lptr in the comments as just a link to a fiddle. As the OP has said that it answers their question, and lptr doesn't wish/respond to posting answers, I have migrated it to the answer section.
Here they use the * wildcard rather than defining the namespace to get the values from the XML:
dbfiddle.uk/...
SELECT
t.c.value('(*:loc/text())[1]', 'VARCHAR(max)') Loc
, t2.i.value('(text())[1]', 'VARCHAR(max)') ImageLoc
FROM @xml.nodes('*:urlset/*:url') t(c)
OUTER APPLY t.c.nodes('*:image/*:loc') t2(i);
You need to define your namespaces in your SQL as well. You can do this by putting WITH XMLNAMESPACES at the start of your query. With the image prefix declared there, you can use it in your node references and return the values from the nodes:
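-- Both namespace URIs are taken from the xmlns declarations on <urlset>.
WITH XMLNAMESPACES (DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9',
'http://www.google.com/schemas/sitemap-image/1.1' AS image)
SELECT u.i.value('(../loc/text())[1]','varchar(500)') AS loc,
u.i.value('(image:loc/text())[1]','varchar(500)') AS image_loc
FROM @xml.nodes('urlset/url/image:image') u(i);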
db<>fiddle
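Since the question loads the XML from a file, here is a self-contained sketch that can be run without it. It assumes a trimmed copy of the sitemap held in a variable, declares both namespaces from the sample document, and uses OUTER APPLY so that a url without any images still produces a row. The aliases u, i, Loc and ImageLoc are illustrative.

```sql
DECLARE @xml XML =
N'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://www.URL1.com/1</loc>
<image:image><image:loc>https://www.URL1.com/11</image:loc></image:image>
<image:image><image:loc>https://www.URL1.com/12</image:loc></image:image>
</url>
<url>
<loc>https://www.URL1.com/3</loc>
</url>
</urlset>';

-- DEFAULT covers the unprefixed urlset/url/loc elements;
-- the image prefix covers image:image and image:loc.
WITH XMLNAMESPACES (DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9',
                    'http://www.google.com/schemas/sitemap-image/1.1' AS image)
SELECT u.c.value('(loc/text())[1]', 'VARCHAR(500)') AS Loc
     , i.c.value('(image:loc/text())[1]', 'VARCHAR(500)') AS ImageLoc
FROM @xml.nodes('/urlset/url') AS u(c)            -- one row per <url>
OUTER APPLY u.c.nodes('image:image') AS i(c);     -- one row per image, NULL if none
```

Shredding at the url level and then applying .nodes() to each url keeps every image paired with its own parent, which is what the attempt in the question was missing.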

XPath 'contains()' requires a singleton (or empty sequence)

Given the XML:
<Dial>
<DialID>
24521
</DialID>
<DialName>
Base Price
</DialName>
</Dial>
<Dial>
<DialID>
24528
</DialID>
<DialName>
Rush Options
</DialName>
<DialValue>
1.5
</DialValue>
</Dial>
<Dial>
<DialID>
24530
</DialID>
<DialName>
Bill Rush Charges
</DialName>
<DialValue>
School
</DialValue>
</Dial>
I can use the contains() function in my xpath:
//Dial[DialName[contains(text(), 'Bill')]]/DialValue
To retrieve the values I'm after:
School
The above XML is stored in a field in my SQL database so I'm using the .value method to select from that field.
SELECT Dials.DialDetail.value('(//Dial[DialName[contains(text(), "Bill")]]/DialValue)[1]','VARCHAR(64)') AS BillTo
FROM CampaignDials Dials
I can't seem to get the syntax right though... the xpath works as expected (tested in Oxygen and elsewhere) but when I use it in the XQuery argument of the .value() method, I get an error:
Started executing query at Line 1
Msg 2389, Level 16, State 1, Line 36
XQuery [Dials.DialDetail.value()]: 'contains()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *'
Total execution time: 00:00:00.004
I've tried different variations of single and double quotes with no effect. The error refers to an XPath data type for attributes, but I'm not retrieving an attribute; I'm getting the text value. I receive the same error if I type the response with //Dial[DialName[contains(text(), 'Bill')]]/DialValue/text() instead.
What is the correct way to use contains() in an XQuery when it's used in the XML.value() method? Or is this the wrong approach to begin with?
You nearly have it right; you just need [1] on text() to guarantee a single value.
You should also use text() on the node you are actually pulling out, for performance reasons.
Also, // can be inefficient, so only use it if you really need recursive descent. If Dial sits at a known depth, a wildcard step such as /*/ matches one level of elements of any name instead.
SELECT
Dials.DialDetail.value(
'(//Dial[DialName[contains(text()[1], "Bill")]]/DialValue/text())[1]',
'VARCHAR(64)') AS BillTo
FROM CampaignDials Dials
As Yitzhak Kabinsky notes, this only gets you one value per row of the table; you need .nodes() if you want to shred the XML itself into rows.
The difference between your actual database case that fails and your reduced sample case that works is likely one of different data.
The error,
contains() requires a singleton (or empty sequence)
indicates that one of your DialName elements has multiple text node children rather than a single text node child as you're expecting.
You can abstract away such variations by testing the string-value of DialName rather than its text node children:
//Dial[contains(DialName, 'Bill')]/DialValue
See also
Testing text() nodes vs string values in XPath
Here is how to do XML shredding in MS SQL Server correctly.
You need to apply the filter in the XQuery .nodes() method.
The .value() method is just for the actual value retrieval.
It is also possible to pass a SQL Server variable as a parameter instead of hard-coding the "Bill" value.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, DialDetail XML);
INSERT INTO @tbl (DialDetail) VALUES
(N'<Dial>
<DialID>24521</DialID>
<DialName>Base Price</DialName>
</Dial>
<Dial>
<DialID>24528</DialID>
<DialName>Rush Options</DialName>
<DialValue>1.5</DialValue>
</Dial>
<Dial>
<DialID>24530</DialID>
<DialName>Bill Rush Charges</DialName>
<DialValue>School</DialValue>
</Dial>');
-- DDL and sample data population, end
SELECT ID
, c.value('(DialID/text())[1]', 'INT') AS DialID
, c.value('(DialName/text())[1]', 'VARCHAR(30)') AS DialName
, c.value('(DialValue/text())[1]', 'VARCHAR(30)') AS DialValue
FROM @tbl CROSS APPLY DialDetail.nodes('/Dial[contains((DialName/text())[1], "Bill")]') AS t(c);
Output
+----+--------+-------------------+-----------+
| ID | DialID | DialName | DialValue |
+----+--------+-------------------+-----------+
| 1 | 24530 | Bill Rush Charges | School |
+----+--------+-------------------+-----------+
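The remark about passing a variable is not demonstrated in the query above. A minimal sketch, assuming a variable named @search (a hypothetical name) and a trimmed copy of the sample data, uses the sql:variable() function inside the predicate:

```sql
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, DialDetail XML);
INSERT INTO @tbl (DialDetail) VALUES
(N'<Dial>
<DialID>24528</DialID>
<DialName>Rush Options</DialName>
<DialValue>1.5</DialValue>
</Dial>
<Dial>
<DialID>24530</DialID>
<DialName>Bill Rush Charges</DialName>
<DialValue>School</DialValue>
</Dial>');

-- @search is a hypothetical variable name standing in for the hard-coded "Bill".
DECLARE @search VARCHAR(30) = 'Bill';

SELECT ID
     , c.value('(DialID/text())[1]', 'INT') AS DialID
     , c.value('(DialValue/text())[1]', 'VARCHAR(30)') AS DialValue
FROM @tbl
CROSS APPLY DialDetail.nodes('/Dial[contains((DialName/text())[1], sql:variable("@search"))]') AS t(c);
```

The XQuery string itself must stay a literal, so sql:variable() is the way to bring a T-SQL value into it without building dynamic SQL.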

Selecting data from XML

I'm trying to insert a row based on data extracted from a chunk of XML. Some columns need to be initialized to node values a couple of nodes "deep" in the XML structure.
I can't seem to get the query right. Here's what I've got:
declare @xmlRaw xml = '
<LogEntry>
<SummaryMessage>Something bad happened</SummaryMessage>
<Exception>
<Type>System.ApplicationException</Type>
<Message>A test of the error handling</Message>
</Exception>
</LogEntry>'
select
LogEntryColumn.value('SummaryMessage[1]', 'varchar(10)') as SummaryMessage, -- works fine
LogEntryColumn.query('Exception[1]').value('Message[1]', 'varchar(10)') as ExMessage -- not working
from
@xmlRaw.nodes('LogEntry[1]') as LogEntryTable(LogEntryColumn)
This outputs:
SummaryMessage ExMessage
-------------- ----------
Something NULL
I've tried a raft of variations for the "ExMessage" column query but no joy.
Note that I'm using "LogEntryColumn.query(...).value(...)" because I want to check how that form performs versus something like:
select
LogEntryColumn.value('SummaryMessage[1]', 'varchar(10)') as SummaryMessage, -- works fine
ExceptionEntryColumn.value('Message[1]', 'varchar(10)') as ExMessage -- not working
from
@xmlRaw.nodes('LogEntry[1]') as LogEntryTable(LogEntryColumn)
outer apply @xmlRaw.nodes('LogEntry[1]/Exception') as ExceptionTable(ExceptionEntryColumn)
Basically I'm wondering if multiple "outer apply" from clauses is better/worse than multiple .query(...) invocations.
Here is what you need.
SQL
DECLARE @xmlRaw XML =
N'<LogEntry>
<SummaryMessage>Something bad happened</SummaryMessage>
<Exception>
<Type>System.ApplicationException</Type>
<Message>A test of the error handling</Message>
</Exception>
</LogEntry>';
SELECT c.value('(SummaryMessage/text())[1]', 'varchar(100)') AS SummaryMessage
, c.value('(Exception/Message/text())[1]', 'varchar(100)') AS ExMessage
FROM @xmlRaw.nodes('/LogEntry') AS t(c);
Output
+------------------------+------------------------------+
| SummaryMessage | ExMessage |
+------------------------+------------------------------+
| Something bad happened | A test of the error handling |
+------------------------+------------------------------+
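As for why the .query(...).value(...) form in the question returned NULL: .query('Exception[1]') returns a new XML instance whose top-level element is Exception, and the path given to .value() is evaluated from the root of that instance, so Message[1] looks for a top-level Message that does not exist. The varchar(10) type also explains the truncated "Something" in the question's output. A minimal sketch of the difference, with illustrative column aliases:

```sql
DECLARE @xmlRaw XML =
N'<LogEntry>
<SummaryMessage>Something bad happened</SummaryMessage>
<Exception>
<Type>System.ApplicationException</Type>
<Message>A test of the error handling</Message>
</Exception>
</LogEntry>';

SELECT
    -- Evaluated from the root of the fragment: there is no top-level Message, so NULL.
    c.query('Exception[1]').value('(Message/text())[1]', 'varchar(100)') AS RelativeToRoot
    -- Stepping through Exception first reaches the Message text.
  , c.query('Exception[1]').value('(/Exception/Message/text())[1]', 'varchar(100)') AS ViaException
FROM @xmlRaw.nodes('/LogEntry') AS t(c);
```

This only shows how to make the chained form return a value; it makes no claim about how it performs compared with a single .value() call or an additional APPLY.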

Multiple values from XML column

I am trying to figure out how to get multiple values from multiple nodes of an XML field in a table (actually it's XML stored as text).
I've seen several methods that involve declaring the XML as a variable and using it as a table but I don't see how that would work for me. How to Extract data from xml column in sql 2008
I am currently using .value to get some fields but I don't see how to make it work since there can be multiple LX01_AssignedNumber and I need to get all of the ProcedureModifier from each.
SELECT CAST(xmldata as xml).value('declare namespace ns1="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/EnrichedMessageXML";declare namespace ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006";
(/ns1:X12EnrichedMessage/TransactionSet/ns0:X12_00501_837_P/ns0:TS837_2000A_Loop/ns0:TS837_2000B_Loop/ns0:TS837_2300_Loop/ns0:TS837_2400_Loop/ns0:SV1_ProfessionalService/ns0:C003_CompositeMedicalProcedureIdentifier/C00303_ProcedureModifier) [1]', 'varchar(20)') AS RendAttendNPI
FROM EDI_DATA
How do I get all the Line Numbers and all of the Procedure Modifiers from each record?
XML:
<ns1:X12EnrichedMessage xmlns:ns1="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/EnrichedMessageXML">
...
<TransactionSet>
<!-- ProcessLogID=PLG0005169955 ;ProcessLogDetailID=PLG0005173285 ;EnvID=1;RetryCount=1 -->
<ns0:X12_00501_837_P xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2000A_Loop xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2000B_Loop xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2300_Loop xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2400_Loop>
<ns0:LX_ServiceLineNumber>
<LX01_AssignedNumber>1</LX01_AssignedNumber>
</ns0:LX_ServiceLineNumber>
<ns0:SV1_ProfessionalService>
<ns0:C003_CompositeMedicalProcedureIdentifier>
<C00301_ProductorServiceIDQualifier>HC</C00301_ProductorServiceIDQualifier>
<C00302_ProcedureCode>26340</C00302_ProcedureCode>
<C00303_ProcedureModifier>AG</C00303_ProcedureModifier>
<C00304_ProcedureModifier>58</C00304_ProcedureModifier>
<C00305_ProcedureModifier>51</C00305_ProcedureModifier>
<C00306_ProcedureModifier>XS</C00306_ProcedureModifier>
</ns0:C003_CompositeMedicalProcedureIdentifier>
<SV102_LineItemChargeAmount>8918</SV102_LineItemChargeAmount>
<SV103_UnitorBasisforMeasurementCode>UN</SV103_UnitorBasisforMeasurementCode>
<SV104_ServiceUnitCount>13</SV104_ServiceUnitCount>
<ns0:C004_CompositeDiagnosisCodePointer>
<C00401_DiagnosisCodePointer>1</C00401_DiagnosisCodePointer>
<C00402_DiagnosisCodePointer>2</C00402_DiagnosisCodePointer>
</ns0:C004_CompositeDiagnosisCodePointer>
</ns0:SV1_ProfessionalService>
<ns0:DTP_SubLoop_2>
<ns0:DTP_Date_ServiceDate>
<DTP01_DateTimeQualifier>472</DTP01_DateTimeQualifier>
<DTP02_DateTimePeriodFormatQualifier>D8</DTP02_DateTimePeriodFormatQualifier>
<DTP03_ServiceDate>20160104</DTP03_ServiceDate>
</ns0:DTP_Date_ServiceDate>
</ns0:DTP_SubLoop_2>
<ns0:REF_SubLoop_7>
<ns0:REF_LineItemControlNumber>
<REF01_ReferenceIdentificationQualifier>6R</REF01_ReferenceIdentificationQualifier>
<REF02_LineItemControlNumber>11453481</REF02_LineItemControlNumber>
</ns0:REF_LineItemControlNumber>
</ns0:REF_SubLoop_7>
</ns0:TS837_2400_Loop>
<ns0:TS837_2400_Loop>
<ns0:LX_ServiceLineNumber>
<LX01_AssignedNumber>2</LX01_AssignedNumber>
</ns0:LX_ServiceLineNumber>
<ns0:SV1_ProfessionalService>
<ns0:C003_CompositeMedicalProcedureIdentifier>
<C00301_ProductorServiceIDQualifier>HC</C00301_ProductorServiceIDQualifier>
<C00302_ProcedureCode>20680</C00302_ProcedureCode>
<C00303_ProcedureModifier>58</C00303_ProcedureModifier>
</ns0:C003_CompositeMedicalProcedureIdentifier>
<SV102_LineItemChargeAmount>1277</SV102_LineItemChargeAmount>
<SV103_UnitorBasisforMeasurementCode>UN</SV103_UnitorBasisforMeasurementCode>
<SV104_ServiceUnitCount>1</SV104_ServiceUnitCount>
<ns0:C004_CompositeDiagnosisCodePointer>
<C00401_DiagnosisCodePointer>3</C00401_DiagnosisCodePointer>
</ns0:C004_CompositeDiagnosisCodePointer>
</ns0:SV1_ProfessionalService>
</ns0:TS837_2400_Loop>
</ns0:TS837_2300_Loop>
</ns0:TS837_2000B_Loop>
</ns0:TS837_2000A_Loop>
</ns0:X12_00501_837_P>
</TransactionSet>
</ns1:X12EnrichedMessage>
Look into SQL Server's CROSS APPLY, which you can use to shred a single XML value into multiple rows, for example:
;WITH XMLNAMESPACES ('http://schemas.microsoft.com/BizTalk/EDI/X12/2006' as ns0
,'http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/EnrichedMessageXML' as ns1)
SELECT
TS837_2400_Loop.value('(.//LX01_AssignedNumber)[1]', 'int') 'line_number'
,C00303_ProcedureModifier.value('.', 'varchar(100)') 'procedure_modifier'
FROM EDI_DATA
CROSS APPLY (select CONVERT(XML, xmldata)) as P(X)
CROSS APPLY X.nodes('.//ns0:TS837_2400_Loop') AS Q(TS837_2400_Loop)
CROSS APPLY TS837_2400_Loop.nodes('.//C00303_ProcedureModifier') AS R(C00303_ProcedureModifier)
sqlfiddle demo
output :
| line_number | procedure_modifier |
|-------------|--------------------|
| 1 | AG |
| 2 | 58 |
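The query above returns only C00303_ProcedureModifier, while the sample also carries C00304 to C00306. One possible way to pick up all of them is to match on the element name. The following is a sketch on a trimmed single-loop sample; it assumes every modifier element has "ProcedureModifier" in its name, and the aliases L, M and modifier_element are illustrative.

```sql
DECLARE @x XML =
N'<ns0:TS837_2400_Loop xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:LX_ServiceLineNumber>
<LX01_AssignedNumber>1</LX01_AssignedNumber>
</ns0:LX_ServiceLineNumber>
<ns0:SV1_ProfessionalService>
<ns0:C003_CompositeMedicalProcedureIdentifier>
<C00302_ProcedureCode>26340</C00302_ProcedureCode>
<C00303_ProcedureModifier>AG</C00303_ProcedureModifier>
<C00304_ProcedureModifier>58</C00304_ProcedureModifier>
</ns0:C003_CompositeMedicalProcedureIdentifier>
</ns0:SV1_ProfessionalService>
</ns0:TS837_2400_Loop>';

WITH XMLNAMESPACES ('http://schemas.microsoft.com/BizTalk/EDI/X12/2006' AS ns0)
SELECT L.n.value('(ns0:LX_ServiceLineNumber/LX01_AssignedNumber/text())[1]', 'int') AS line_number
     , M.n.value('local-name(.)', 'varchar(100)') AS modifier_element
     , M.n.value('(./text())[1]', 'varchar(100)') AS procedure_modifier
FROM @x.nodes('/ns0:TS837_2400_Loop') AS L(n)
-- Keep any child whose element name contains "ProcedureModifier".
CROSS APPLY L.n.nodes('ns0:SV1_ProfessionalService/ns0:C003_CompositeMedicalProcedureIdentifier/*[contains(local-name(.), "ProcedureModifier")]') AS M(n);
```

Returning local-name(.) alongside the value keeps track of which modifier position each row came from.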

XQuery vs OpenXML in SQL Server

I have this XML in a SQL Server table:
<root>
<meetings>
<meeting>
<id>111</id>
<participants>
<participant><name>Smith</name></participant>
<participant><name>Jones</name></participant>
<participant><name>Brown</name></participant>
</participants>
</meeting>
<meeting>
<id>222</id>
<participants>
<participant><name>White</name></participant>
<participant><name>Bloggs</name></participant>
<participant><name>McDonald</name></participant>
</participants>
</meeting>
</meetings>
</root>
And want a result set like this:
MeetingID Name
111 Smith
111 Jones
111 Brown
222 White
222 Bloggs
222 McDonald
This is easy using select from openxml but I failed using XQuery. Can someone help me there, and maybe also give pros and cons for either method?
Once you've fixed your invalid XML (the <name> elements need to be ended with a </name> end tag), you should be able to use this:
SELECT
Meetings.List.value('(id)[1]', 'int') AS 'Meeting ID',
Meeting.Participant.value('(name)[1]', 'varchar(50)') AS 'Name'
FROM
@input.nodes('/root/meetings/meeting') AS Meetings(List)
CROSS APPLY
Meetings.List.nodes('participants/participant') AS Meeting(Participant)
Basically, the first call to .nodes() gives you a pseudo-table of all <meeting> nodes, from which I extract the meeting ID.
The second .nodes() call on that <meeting> tag digs deeper into the <participants>/<participant> list of subnodes and extracts the name from those nodes.
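For comparison, the OPENXML version the question mentions is not shown in the thread. A sketch of what it might look like, assuming a shortened copy of the XML in an @input variable and a handle variable named @hdoc (both names chosen here), is:

```sql
DECLARE @input XML =
N'<root>
<meetings>
<meeting>
<id>111</id>
<participants>
<participant><name>Smith</name></participant>
<participant><name>Jones</name></participant>
</participants>
</meeting>
<meeting>
<id>222</id>
<participants>
<participant><name>White</name></participant>
</participants>
</meeting>
</meetings>
</root>';
DECLARE @hdoc INT;

-- Parse the document once and obtain a handle for OPENXML.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @input;

-- Flag 2 requests element-centric mapping; each column pattern is
-- relative to the <participant> row pattern.
SELECT MeetingID, Name
FROM OPENXML(@hdoc, '/root/meetings/meeting/participants/participant', 2)
     WITH (MeetingID INT         '../../id',
           Name      VARCHAR(50) 'name');

-- Release the memory held by the parsed document.
EXEC sp_xml_removedocument @hdoc;
```

OPENXML needs the explicit prepare and remove calls and works on one document per handle, whereas .nodes() can be applied directly to a variable or, via CROSS APPLY, to an XML column across many rows.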
This may give you the result as an HTML table built with XQuery.
(: $x is assumed to hold your XML content; here it is supplied by the caller. :)
declare variable $x external;
let $d1 :=
<table border="1"><tr><td>Meeting</td><td>Participant</td></tr>
{ for $p in $x//meeting/participants/participant
return element {'tr'} {
element {'td'} {$p/parent::*/parent::*/id/text()},
element {'td'} {data($p)}
}
}
</table>
return $d1
