T-SQL, how to parse this XML? - sql-server

I've spent hours trying to parse this XML (a bus stop schedule) and produce a recordset from it. Is there a way to convert XML to JSON, which I find easier to handle?
Anyone willing to help? (Azure SQL Server)
<?xml version="1.0" encoding="UTF-8"?>
<Trias xmlns="http://www.vdv.de/trias" version="1.1">
<ServiceDelivery>
<ResponseTimestamp xmlns="http://www.siri.org.uk/siri">2021-11-25T17:52:12Z</ResponseTimestamp>
<DeliveryPayload>
<StopEventResponse>
<StopEventResult>
<StopEvent>
<ThisCall>
<CallAtStop>
<ServiceDeparture>
<TimetabledTime>2021-11-25T17:53:00Z</TimetabledTime>
<EstimatedTime>2021-11-25T17:53:00Z</EstimatedTime>
</ServiceDeparture>
</CallAtStop>
</ThisCall>
<Service>
<PublishedLineName>
<Text>58</Text>
<Language>de</Language>
</PublishedLineName>
</Service>
</StopEvent>
</StopEventResult>
<StopEventResult>
<StopEvent>
<ThisCall>
<CallAtStop>
<ServiceDeparture>
<TimetabledTime>2021-11-25T17:58:00Z</TimetabledTime>
<EstimatedTime>2021-11-25T17:58:00Z</EstimatedTime>
</ServiceDeparture>
</CallAtStop>
</ThisCall>
<Service>
<PublishedLineName>
<Text>60</Text>
<Language>de</Language>
</PublishedLineName>
</Service>
</StopEvent>
</StopEventResult>
</StopEventResponse>
</DeliveryPayload>
</ServiceDelivery>
</Trias>

A minimal reproducible example was not provided.
So shooting from the hip.
There is no need for any manual XML parsing: SQL Server has built-in XQuery support for the XML data type.
The only nuance is that the input XML uses namespaces.
The default namespace is declared with the WITH XMLNAMESPACES clause.
Two XQuery-based methods of the XML data type do the work: .nodes() and .value().
SQL
DECLARE @xml XML =
N'<Trias xmlns="http://www.vdv.de/trias" version="1.1">
<ServiceDelivery>
<ResponseTimestamp xmlns="http://www.siri.org.uk/siri">2021-11-25T17:52:12Z</ResponseTimestamp>
<DeliveryPayload>
<StopEventResponse>
<StopEventResult>
<StopEvent>
<ThisCall>
<CallAtStop>
<ServiceDeparture>
<TimetabledTime>2021-11-25T17:53:00Z</TimetabledTime>
<EstimatedTime>2021-11-25T17:53:00Z</EstimatedTime>
</ServiceDeparture>
</CallAtStop>
</ThisCall>
<Service>
<PublishedLineName>
<Text>58</Text>
<Language>de</Language>
</PublishedLineName>
</Service>
</StopEvent>
</StopEventResult>
<StopEventResult>
<StopEvent>
<ThisCall>
<CallAtStop>
<ServiceDeparture>
<TimetabledTime>2021-11-25T17:58:00Z</TimetabledTime>
<EstimatedTime>2021-11-25T17:58:00Z</EstimatedTime>
</ServiceDeparture>
</CallAtStop>
</ThisCall>
<Service>
<PublishedLineName>
<Text>60</Text>
<Language>de</Language>
</PublishedLineName>
</Service>
</StopEvent>
</StopEventResult>
</StopEventResponse>
</DeliveryPayload>
</ServiceDelivery>
</Trias>';
-- .nodes() returns one row per StopEvent; .value() extracts a typed scalar from each row
;WITH XMLNAMESPACES(DEFAULT 'http://www.vdv.de/trias')
SELECT c.value('(ThisCall/CallAtStop/ServiceDeparture/TimetabledTime/text())[1]', 'DATETIMEOFFSET(0)') AS TimetabledTime
, c.value('(ThisCall/CallAtStop/ServiceDeparture/EstimatedTime/text())[1]', 'DATETIMEOFFSET(0)') AS EstimatedTime
, c.value('(Service/PublishedLineName/Text/text())[1]', 'VARCHAR(100)') AS [Text]
, c.value('(Service/PublishedLineName/Language/text())[1]', 'CHAR(2)') AS [Language]
FROM @xml.nodes('/Trias/ServiceDelivery/DeliveryPayload/StopEventResponse/StopEventResult/StopEvent') AS t(c);
Output
+----------------------------+----------------------------+------+----------+
| TimetabledTime | EstimatedTime | Text | Language |
+----------------------------+----------------------------+------+----------+
| 2021-11-25 17:53:00 +00:00 | 2021-11-25 17:53:00 +00:00 | 58 | de |
| 2021-11-25 17:58:00 +00:00 | 2021-11-25 17:58:00 +00:00 | 60 | de |
+----------------------------+----------------------------+------+----------+
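Because the question also asks about JSON, here is a minimal Python sketch (standard library only, not part of the original answer) that shreds the same Trias sample with the same namespace-aware paths and dumps the rows as JSON. The `t` prefix, the trimmed `SAMPLE` string and the `stop_events` name are illustrative assumptions; as in the T-SQL, the namespace must be supplied or no element matches.

```python
import json
import xml.etree.ElementTree as ET

# "t" is an arbitrary local alias for the Trias default namespace.
NS = {"t": "http://www.vdv.de/trias"}

# Trimmed copy of the sample above (XML declaration and ResponseTimestamp omitted).
SAMPLE = """<Trias xmlns="http://www.vdv.de/trias" version="1.1"><ServiceDelivery>
<DeliveryPayload><StopEventResponse>
<StopEventResult><StopEvent>
<ThisCall><CallAtStop><ServiceDeparture>
<TimetabledTime>2021-11-25T17:53:00Z</TimetabledTime>
<EstimatedTime>2021-11-25T17:53:00Z</EstimatedTime>
</ServiceDeparture></CallAtStop></ThisCall>
<Service><PublishedLineName><Text>58</Text><Language>de</Language></PublishedLineName></Service>
</StopEvent></StopEventResult>
<StopEventResult><StopEvent>
<ThisCall><CallAtStop><ServiceDeparture>
<TimetabledTime>2021-11-25T17:58:00Z</TimetabledTime>
<EstimatedTime>2021-11-25T17:58:00Z</EstimatedTime>
</ServiceDeparture></CallAtStop></ThisCall>
<Service><PublishedLineName><Text>60</Text><Language>de</Language></PublishedLineName></Service>
</StopEvent></StopEventResult>
</StopEventResponse></DeliveryPayload>
</ServiceDelivery></Trias>"""

EVENT_PATH = ("t:ServiceDelivery/t:DeliveryPayload/t:StopEventResponse/"
              "t:StopEventResult/t:StopEvent")
DEPARTURE = "t:ThisCall/t:CallAtStop/t:ServiceDeparture/"
LINE_NAME = "t:Service/t:PublishedLineName/"


def stop_events(xml_text):
    """One dict per StopEvent, mirroring .nodes() + .value() in the T-SQL."""
    root = ET.fromstring(xml_text)  # root element is <Trias>
    rows = []
    for event in root.iterfind(EVENT_PATH, NS):
        rows.append({
            "TimetabledTime": event.findtext(DEPARTURE + "t:TimetabledTime", namespaces=NS),
            "EstimatedTime": event.findtext(DEPARTURE + "t:EstimatedTime", namespaces=NS),
            "Text": event.findtext(LINE_NAME + "t:Text", namespaces=NS),
            "Language": event.findtext(LINE_NAME + "t:Language", namespaces=NS),
        })
    return rows


if __name__ == "__main__":
    print(json.dumps(stop_events(SAMPLE), indent=2))
```

Here the timestamps stay as ISO-8601 strings; the T-SQL version converts them to DATETIMEOFFSET(0).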

Related

Extract data from XML document using t-sql

I have been trying to extract data from the following XML document using T-SQL on SQL Server 2019.
XML:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://www.URL1.com/1</loc>
<image:image>
<image:loc>https://www.URL1.com/11</image:loc>
</image:image>
<image:image>
<image:loc>https://www.URL1.com/12</image:loc>
</image:image>
<image:image>
<image:loc>https://www.URL1.com/13</image:loc>
</image:image>
</url>
<url>
<loc>https://www.URL1.com/2</loc>
<image:image>
<image:loc>https://www.URL1.com/21</image:loc>
</image:image>
<image:image>
<image:loc>https://www.URL1.com/22</image:loc>
</image:image>
</url>
<url>
<loc>https://www.URL1.com/3</loc>
<image:image>
<image:loc>https://www.URL1.com/32</image:loc>
</image:image>
</url>
</urlset>
I would like to extract the data from the XML document into a SQL Server table. My desired output is below.
Desired output:
+------------------------+-------------------------+
| Loc | ImageLoc |
+------------------------+-------------------------+
| https://www.URL1.com/1 | https://www.URL1.com/11 |
| https://www.URL1.com/1 | https://www.URL1.com/12 |
| https://www.URL1.com/1 | https://www.URL1.com/13 |
| https://www.URL1.com/2 | https://www.URL1.com/21 |
| https://www.URL1.com/2 | https://www.URL1.com/22 |
| https://www.URL1.com/3 | https://www.URL1.com/32 |
+------------------------+-------------------------+
My attempts have failed miserably so far. I have tried many things, but the only one that got me even the Loc element was the following. I have tried using OUTER APPLY/CROSS APPLY to get the ImageLoc, with no luck.
My Attempt:
DECLARE @xml XML
SELECT @xml = BulkColumn
FROM OPENROWSET(BULK 'M:\Files\MyXML.xml', SINGLE_BLOB) x
SELECT
t.c.value('(text())[1]', 'VARCHAR(max)') URLs
, t2.i.value('(text())[1]', 'VARCHAR(max)') URLs
FROM @xml.nodes('*:urlset/*:url/*:loc') t(c)
OUTER APPLY @xml.nodes('*:urlset/*:url/*:loc/*:image/*:loc') t2(i)
Could you please help? Thanks in advance
This answer was posted by lptr in the comments as just a link to a fiddle. As the OP has said that it answers their question, and lptr does not appear to want to post answers (or respond to requests to do so), I have migrated it to the answer section.
Here, the *: namespace wildcard is used instead of declaring the namespaces to get the values from the XML:
dbfiddle.uk/...
SELECT
t.c.value('(*:loc/text())[1]', 'VARCHAR(max)') AS Loc
, t2.i.value('(text())[1]', 'VARCHAR(max)') AS ImageLoc
FROM @xml.nodes('*:urlset/*:url') t(c)
OUTER APPLY t.c.nodes('*:image/*:loc') t2(i);
You need to declare the namespaces in your SQL as well. This can be done by putting WITH XMLNAMESPACES at the start of your query. Declare the sitemap namespace as the default and the image namespace with a prefix, use that prefix in your paths, and return the values from the nodes:
WITH XMLNAMESPACES (DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9',
                    'http://www.google.com/schemas/sitemap-image/1.1' AS image)
SELECT u.i.value('(../loc/text())[1]','varchar(500)') AS Loc,
u.i.value('(image:loc/text())[1]','varchar(500)') AS ImageLoc
FROM @xml.nodes('urlset/url/image:image') u(i);
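A minimal Python sketch (standard library, Python 3.8+ for the `{*}` wildcard) of the same two ideas: either declare both namespaces, or match any namespace with a wildcard (the counterpart of `*:` in XQuery), then loop over each `url` and its nested `image:loc` nodes, which is what `OUTER APPLY t.c.nodes(...)` does. The `sm` prefix and the function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Prefixes are local aliases; only the URIs must match the document.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url><loc>https://www.URL1.com/1</loc>
<image:image><image:loc>https://www.URL1.com/11</image:loc></image:image>
<image:image><image:loc>https://www.URL1.com/12</image:loc></image:image>
<image:image><image:loc>https://www.URL1.com/13</image:loc></image:image></url>
<url><loc>https://www.URL1.com/2</loc>
<image:image><image:loc>https://www.URL1.com/21</image:loc></image:image>
<image:image><image:loc>https://www.URL1.com/22</image:loc></image:image></url>
<url><loc>https://www.URL1.com/3</loc>
<image:image><image:loc>https://www.URL1.com/32</image:loc></image:image></url>
</urlset>"""


def loc_pairs(xml_text):
    """(Loc, ImageLoc) rows using declared namespaces."""
    root = ET.fromstring(xml_text)
    pairs = []
    for url in root.iterfind("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        images = [i.text for i in url.iterfind("image:image/image:loc", NS)]
        # "or [None]" keeps a url that has no images, like OUTER APPLY.
        pairs.extend((loc, img) for img in images or [None])
    return pairs


def loc_pairs_wildcard(xml_text):
    """Same rows, matching any namespace with {*} (like *: in XQuery)."""
    root = ET.fromstring(xml_text)
    pairs = []
    for url in root.iterfind("{*}url"):
        loc = url.findtext("{*}loc")
        images = [i.text for i in url.iterfind("{*}image/{*}loc")]
        pairs.extend((loc, img) for img in images or [None])
    return pairs
```

The declared-namespace version is stricter: a same-named element from an unrelated namespace would not match, whereas the wildcard version would pick it up.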

Processing XML prolog by SQL Server XML functions

I have a large database table with an XML column. The XML content is a document like the one below:
<?int-dov version="1.0" encoding="UTF-8" standalone="no"?>
<ds:datastoreItem ds:itemID="{F8484AF4-73BF-45CA-A524-0D796F244F37}" xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml"><ds:schemaRefs><ds:schemaRef ds:uri="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"/></ds:schemaRefs></ds:datastoreItem>
I'm looking for a function or a fast way to fetch the standalone attribute value in a T-SQL query. When I run the query below:
select XmlContent.query('@standalone') from XmlDocuments
I get this error message:
Msg 2390, Level 16, State 1, Line 4
XQuery [XmlDocuments.XmlContent.query()]: Top-level attribute nodes are not supported
I would appreciate it if anybody could suggest a solution to this problem.
You can use the processing-instruction() node test to get that.
SELECT @xml.value('./processing-instruction("int-dov")[1]','nvarchar(max)')
Result
version="1.0" encoding="UTF-8" standalone="no"
If you want to get just the standalone part, the only way I've found is to construct an XML node from it:
SELECT CAST(
N'<x ' +
@xml.value('./processing-instruction("int-dov")[1]','nvarchar(max)') +
N' />' AS xml).value('x[1]/@standalone','nvarchar(10)');
Result
no
Just to complement @Charlieface's answer. All credit goes to him.
SQL
DECLARE @xml XML =
N'<?int-dov version="1.0" encoding="UTF-8" standalone="no"?>
<ds:datastoreItem ds:itemID="{F8484AF4-73BF-45CA-A524-0D796F244F37}"
xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml">
<ds:schemaRefs>
<ds:schemaRef ds:uri="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"/>
</ds:schemaRefs>
</ds:datastoreItem>';
SELECT col.value('x[1]/@standalone','nvarchar(10)') AS [standalone]
, col.value('x[1]/@encoding','nvarchar(10)') AS [encoding]
, col.value('x[1]/@version','nvarchar(10)') AS [version]
FROM (VALUES(CAST(N'<x ' +
@xml.value('./processing-instruction("int-dov")[1]','nvarchar(max)') +
N' />' AS xml))
) AS tab(col);
Output
+------------+----------+---------+
| standalone | encoding | version |
+------------+----------+---------+
| no | UTF-8 | 1.0 |
+------------+----------+---------+
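The same "wrap the processing-instruction data in a dummy element" trick, sketched in Python for illustration (standard library; the function name and sample are assumptions, not part of the answers). `xml.dom.minidom` is used here because it exposes top-level processing instructions as children of the document node, with `.target` and `.data`.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SAMPLE = """<?int-dov version="1.0" encoding="UTF-8" standalone="no"?>
<ds:datastoreItem ds:itemID="{F8484AF4-73BF-45CA-A524-0D796F244F37}"
    xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml">
  <ds:schemaRefs>
    <ds:schemaRef ds:uri="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"/>
  </ds:schemaRefs>
</ds:datastoreItem>"""


def pi_pseudo_attributes(xml_text, target):
    """Return the pseudo-attributes of the first top-level PI named `target`."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:  # prolog PIs are siblings of the root element
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE and node.target == target:
            # Same idea as CAST(N'<x ' + ... + N' />' AS xml) in the T-SQL answers.
            return dict(ET.fromstring(f"<x {node.data} />").attrib)
    return {}


if __name__ == "__main__":
    print(pi_pseudo_attributes(SAMPLE, "int-dov")["standalone"])
```

Like the T-SQL version, this only works while the PI data happens to be well-formed `name="value"` pairs; arbitrary PI content would not parse as attributes.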

Convert Excel Exponential Format back to its text in SQL Server 2008 R2

First off, I have a constraint: my solution must work on SQL Server 2008 R2.
The problem that I'm trying to solve is that Excel converts the text value '002E9' to 2.00E+09. The task is to pass the original value '002E9' as text into a CSV file.
A developer has passed me an SSIS solution that does the conversion in a SQL function. They used:
SELECT FORMAT(CAST(2.00E+09 AS FLOAT),'0E0');
This is fine in 2012 and above but does not work in SQL Server 2008 R2.
Is there a simple alternative? I'm happy to abandon SQL for an SSIS script if that's the best advice.
FORMAT doesn't exist in SQL Server 2008, and its use is best avoided anyway; it's an awfully slow function.
You can use CONVERT and the style 0 though:
SELECT REPLACE(CONVERT(varchar(10),CAST(2.00E+09 AS float),0),'+','');
This won't, however, give exactly the same format, and would return '2e009'. Based on the fact that you use the value '0E0' for the FORMAT function though (which would return '2E9' for your example value), I assume this is permissible.
Based on Larnu's answer, I arrived at this (note the REPLICATE function, which pads the stripped-down string back to the correct format):
DECLARE @INPUTS AS table
(input_val varchar(100))
INSERT INTO @INPUTS
VALUES
('00923'),('00234'),('00568'),('00123'),('2.00E+09' ),('2.00E+34' ),('00RT1'),('001TL')
SELECT input_val
,REPLACE(REPLACE(REPLACE(input_val,'+',''),'0',''),'.','') paired_value
,REPLICATE('0',5-LEN(REPLACE(REPLACE(REPLACE(input_val,'+',''),'0',''),'.','')))
+REPLACE(REPLACE(REPLACE(input_val,'+',''),'0',''),'.','')+';' Converted_value
FROM @INPUTS
The results:
+-----------+--------------+-----------------+
| input_val | paired_value | Converted_value |
+-----------+--------------+-----------------+
| 00923 | 923 | 00923; |
| 00234 | 234 | 00234; |
| 00568 | 568 | 00568; |
| 00123 | 123 | 00123; |
| 2.00E+09 | 2E9 | 002E9; |
| 2.00E+34 | 2E34 | 02E34; |
| 00RT1 | RT1 | 00RT1; |
| 001TL | 1TL | 001TL; |
+-----------+--------------+-----------------+
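This confirms the approach for these sample values. One caveat: REPLACE(input_val,'0','') removes every zero, not just the leading ones, so an input such as '00903' would come back as '00093;'.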
Thanks Larnu.
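Because REPLACE removes every zero, the SQL above depends on the codes having no zeros after their first significant character. As a language-neutral illustration (Python, so not a drop-in for SQL Server 2008 R2), here is a sketch that parses Excel's mantissa and exponent separately instead. The 5-character, zero-padded width is an assumption inferred from sample codes such as '002E9', and codes whose digits before the 'E' end in zero cannot be recovered unambiguously.

```python
import re

# Assumed shape of Excel's scientific display: one digit, optional decimals, E, signed exponent.
_SCIENTIFIC = re.compile(r"^(\d)(?:\.(\d+))?E([+-]?)(\d+)$", re.IGNORECASE)


def restore_code(value, width=5):
    """Rebuild a fixed-width code such as '002E9' from Excel's '2.00E+09'.

    Values that are not in scientific notation are returned unchanged.
    """
    match = _SCIENTIFIC.match(value.strip())
    if match is None:
        return value
    lead, fraction, sign, exp_digits = match.groups()
    fraction = (fraction or "").rstrip("0")  # "00" -> "", "23" -> "23"
    exponent = int(exp_digits) * (-1 if sign == "-" else 1)
    # Moving the decimal point right by len(fraction) lowers the exponent by the same amount.
    code = f"{lead}{fraction}E{exponent - len(fraction)}"
    return code.zfill(width)  # left-pads with zeros up to `width`
```

For example, `restore_code("2.00E+09")` gives `'002E9'`, while `'00903'` and `'00RT1'` pass through untouched.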

Multiple values from XML column

I am trying to figure out how to get multiple values from multiple nodes of an XML field in a table (actually it's XML stored as text).
I've seen several methods that involve declaring the XML as a variable and using it as a table, but I don't see how that would work for me (see How to Extract data from xml column in sql 2008).
I am currently using .value to get some fields, but I don't see how to make it work, since there can be multiple LX01_AssignedNumber elements and I need all of the ProcedureModifier values from each.
SELECT CAST(xmldata as xml).value('declare namespace ns1="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/EnrichedMessageXML";declare namespace ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006";
(/ns1:X12EnrichedMessage/TransactionSet/ns0:X12_00501_837_P/ns0:TS837_2000A_Loop/ns0:TS837_2000B_Loop/ns0:TS837_2300_Loop/ns0:TS837_2400_Loop/ns0:SV1_ProfessionalService/ns0:C003_CompositeMedicalProcedureIdentifier/C00303_ProcedureModifier) [1]', 'varchar(20)') AS RendAttendNPI
FROM EDI_DATA
How do I get all the Line Numbers and all of the Procedure Modifiers from each record?
XML:
<ns1:X12EnrichedMessage xmlns:ns1="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/EnrichedMessageXML">
...
<TransactionSet>
<!-- ProcessLogID=PLG0005169955 ;ProcessLogDetailID=PLG0005173285 ;EnvID=1;RetryCount=1 -->
<ns0:X12_00501_837_P xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2000A_Loop xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2000B_Loop xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2300_Loop xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2400_Loop>
<ns0:LX_ServiceLineNumber>
<LX01_AssignedNumber>1</LX01_AssignedNumber>
</ns0:LX_ServiceLineNumber>
<ns0:SV1_ProfessionalService>
<ns0:C003_CompositeMedicalProcedureIdentifier>
<C00301_ProductorServiceIDQualifier>HC</C00301_ProductorServiceIDQualifier>
<C00302_ProcedureCode>26340</C00302_ProcedureCode>
<C00303_ProcedureModifier>AG</C00303_ProcedureModifier>
<C00304_ProcedureModifier>58</C00304_ProcedureModifier>
<C00305_ProcedureModifier>51</C00305_ProcedureModifier>
<C00306_ProcedureModifier>XS</C00306_ProcedureModifier>
</ns0:C003_CompositeMedicalProcedureIdentifier>
<SV102_LineItemChargeAmount>8918</SV102_LineItemChargeAmount>
<SV103_UnitorBasisforMeasurementCode>UN</SV103_UnitorBasisforMeasurementCode>
<SV104_ServiceUnitCount>13</SV104_ServiceUnitCount>
<ns0:C004_CompositeDiagnosisCodePointer>
<C00401_DiagnosisCodePointer>1</C00401_DiagnosisCodePointer>
<C00402_DiagnosisCodePointer>2</C00402_DiagnosisCodePointer>
</ns0:C004_CompositeDiagnosisCodePointer>
</ns0:SV1_ProfessionalService>
<ns0:DTP_SubLoop_2>
<ns0:DTP_Date_ServiceDate>
<DTP01_DateTimeQualifier>472</DTP01_DateTimeQualifier>
<DTP02_DateTimePeriodFormatQualifier>D8</DTP02_DateTimePeriodFormatQualifier>
<DTP03_ServiceDate>20160104</DTP03_ServiceDate>
</ns0:DTP_Date_ServiceDate>
</ns0:DTP_SubLoop_2>
<ns0:REF_SubLoop_7>
<ns0:REF_LineItemControlNumber>
<REF01_ReferenceIdentificationQualifier>6R</REF01_ReferenceIdentificationQualifier>
<REF02_LineItemControlNumber>11453481</REF02_LineItemControlNumber>
</ns0:REF_LineItemControlNumber>
</ns0:REF_SubLoop_7>
</ns0:TS837_2400_Loop>
<ns0:TS837_2400_Loop>
<ns0:LX_ServiceLineNumber>
<LX01_AssignedNumber>2</LX01_AssignedNumber>
</ns0:LX_ServiceLineNumber>
<ns0:SV1_ProfessionalService>
<ns0:C003_CompositeMedicalProcedureIdentifier>
<C00301_ProductorServiceIDQualifier>HC</C00301_ProductorServiceIDQualifier>
<C00302_ProcedureCode>20680</C00302_ProcedureCode>
<C00303_ProcedureModifier>58</C00303_ProcedureModifier>
</ns0:C003_CompositeMedicalProcedureIdentifier>
<SV102_LineItemChargeAmount>1277</SV102_LineItemChargeAmount>
<SV103_UnitorBasisforMeasurementCode>UN</SV103_UnitorBasisforMeasurementCode>
<SV104_ServiceUnitCount>1</SV104_ServiceUnitCount>
<ns0:C004_CompositeDiagnosisCodePointer>
<C00401_DiagnosisCodePointer>3</C00401_DiagnosisCodePointer>
</ns0:C004_CompositeDiagnosisCodePointer>
</ns0:SV1_ProfessionalService>
</ns0:TS837_2400_Loop>
</ns0:TS837_2300_Loop>
</ns0:TS837_2000B_Loop>
</ns0:TS837_2000A_Loop>
</ns0:X12_00501_837_P>
</TransactionSet>
</ns1:X12EnrichedMessage>
Look into SQL Server's CROSS APPLY, which you can combine with the nodes() method to shred a single XML value into multiple rows. For example:
;WITH XMLNAMESPACES ('http://schemas.microsoft.com/BizTalk/EDI/X12/2006' as ns0
,'http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/EnrichedMessageXML' as ns1)
SELECT
TS837_2400_Loop.value('(.//LX01_AssignedNumber)[1]', 'int') 'line_number'
,C00303_ProcedureModifier.value('.', 'varchar(100)') 'procedure_modifier'
FROM EDI_DATA
CROSS APPLY (select CONVERT(XML, xmldata)) as P(X)
CROSS APPLY X.nodes('.//ns0:TS837_2400_Loop') AS Q(TS837_2400_Loop)
CROSS APPLY TS837_2400_Loop.nodes('.//C00303_ProcedureModifier') AS R(C00303_ProcedureModifier)
Output:
| line_number | procedure_modifier |
|-------------|--------------------|
| 1 | AG |
| 2 | 58 |
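The query above returns only C00303_ProcedureModifier. As an illustration of the same nested shredding (one row per TS837_2400_Loop, then one row per modifier), here is a Python sketch that picks up every child whose name ends in `_ProcedureModifier`. The trimmed sample and the function name are assumptions; note that the line-number and modifier elements carry no namespace, while the loop and segment elements are in ns0.

```python
import xml.etree.ElementTree as ET

# Only ns0 is needed for the paths below.
NS = {"ns0": "http://schemas.microsoft.com/BizTalk/EDI/X12/2006"}

# Trimmed from the question's sample: intermediate loops and unrelated segments removed.
SAMPLE = """<ns1:X12EnrichedMessage xmlns:ns1="http://schemas.microsoft.com/BizTalk/EDI/EDIFACT/2006/EnrichedMessageXML">
<TransactionSet><ns0:X12_00501_837_P xmlns:ns0="http://schemas.microsoft.com/BizTalk/EDI/X12/2006">
<ns0:TS837_2400_Loop>
<ns0:LX_ServiceLineNumber><LX01_AssignedNumber>1</LX01_AssignedNumber></ns0:LX_ServiceLineNumber>
<ns0:SV1_ProfessionalService><ns0:C003_CompositeMedicalProcedureIdentifier>
<C00301_ProductorServiceIDQualifier>HC</C00301_ProductorServiceIDQualifier>
<C00302_ProcedureCode>26340</C00302_ProcedureCode>
<C00303_ProcedureModifier>AG</C00303_ProcedureModifier>
<C00304_ProcedureModifier>58</C00304_ProcedureModifier>
<C00305_ProcedureModifier>51</C00305_ProcedureModifier>
<C00306_ProcedureModifier>XS</C00306_ProcedureModifier>
</ns0:C003_CompositeMedicalProcedureIdentifier></ns0:SV1_ProfessionalService>
</ns0:TS837_2400_Loop>
<ns0:TS837_2400_Loop>
<ns0:LX_ServiceLineNumber><LX01_AssignedNumber>2</LX01_AssignedNumber></ns0:LX_ServiceLineNumber>
<ns0:SV1_ProfessionalService><ns0:C003_CompositeMedicalProcedureIdentifier>
<C00301_ProductorServiceIDQualifier>HC</C00301_ProductorServiceIDQualifier>
<C00302_ProcedureCode>20680</C00302_ProcedureCode>
<C00303_ProcedureModifier>58</C00303_ProcedureModifier>
</ns0:C003_CompositeMedicalProcedureIdentifier></ns0:SV1_ProfessionalService>
</ns0:TS837_2400_Loop>
</ns0:X12_00501_837_P></TransactionSet>
</ns1:X12EnrichedMessage>"""

IDENT_PATH = "ns0:SV1_ProfessionalService/ns0:C003_CompositeMedicalProcedureIdentifier"


def procedure_modifiers(xml_text):
    """(line_number, procedure_modifier) for every *_ProcedureModifier in every 2400 loop."""
    root = ET.fromstring(xml_text)
    rows = []
    for loop in root.iterfind(".//ns0:TS837_2400_Loop", NS):  # like X.nodes('.//ns0:TS837_2400_Loop')
        # LX01_AssignedNumber has no namespace, so it is written without a prefix.
        line = loop.findtext("ns0:LX_ServiceLineNumber/LX01_AssignedNumber", namespaces=NS)
        ident = loop.find(IDENT_PATH, NS)
        if ident is None:
            continue
        for child in ident:  # like the second CROSS APPLY ... nodes(): one row per modifier
            if child.tag.endswith("_ProcedureModifier"):
                rows.append((int(line) if line else None, child.text))
    return rows
```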

XML query against SQL Server SSIS DataProfiler xml file in Powershell does not work.

I am trying to query the attribute values from the XML file below. Specifically, I am trying to get the Name, SqlDbType, etc. attribute values of the "Column" element under the "ColumnNullRatioProfile" node. The XML output file is produced by the SQL Server 2008 SSIS Data Profiling Task. My goal is to use PowerShell to create a CSV file with selected attributes that can be loaded into an Excel workbook.
However, although I have tried several approaches (see Method 1 and Method 2 below), I cannot make it work. Any suggestions?
#Save as t.xml on C:\
#-----------------------
<?xml version="1.0" ?>
<DataProfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
<DataSources>
<DtsDataSource ID="{45277997-59B4-4A95-909E-7804F0761FA1}" Name="DatabaseConn">
<DtsConnectionManagerID>DatabaseConn</DtsConnectionManagerID>
</DtsDataSource>
<DtsDataSource ID="{BAEE1FCA-E5A2-4C3C-A1B6-100B3B681397}" Name="Table.xml">
<DtsConnectionManagerID>Table.xml</DtsConnectionManagerID>
</DtsDataSource>
</DataSources>
<DataProfileOutput>
<Profiles>
<ColumnNullRatioProfile ProfileRequestID="NullRatioReq" IsExact="true">
<DataSourceID>{45877997-59B4-4A95-909E-7804F0761FA1}</DataSourceID>
<Table DataSource="XVRTFD0585\SQL905" Database="BusinessData" Schema="General" Table="Email_Notifications_Lookup" RowCount="-1" />
<Column Name="EmailURL_ID" SqlDbType="Int" MaxLength="0" Precision="10" Scale="0" LCID="-1" CodePage="0" IsNullable="false" StringCompareOptions="0" />
<NullCount>0</NullCount>
</ColumnNullRatioProfile>
<ColumnNullRatioProfile ProfileRequestID="NullRatioReq1" IsExact="true">
<DataSourceID>{45CC99B2-E396-4CFA-A1F5-4E703F04E9E7}</DataSourceID>
<Table DataSource="XVRTFD0585\SQL905" Database="BusinessData" Schema="General" Table="LOOKUP_CODES" RowCount="5979114" />
<Column Name="TRANS_ID" SqlDbType="Decimal" MaxLength="0" Precision="9" Scale="0" LCID="-1" CodePage="0" IsNullable="true" StringCompareOptions="0" />
<NullCount>5979114</NullCount>
</ColumnNullRatioProfile>
</Profiles>
</DataProfileOutput>
</DataProfile>
#Method 1
#--------
$uri="C:\t.xml"
$xDoc = [xml](Get-Content $uri )
$XDoc.DataProfile.DataProfileOutput.Profiles.ColumnNullRatioProfile.Column | select Name
#Method 2 (Using LINQ)
#--------
$uri="C:\t.xml"
[Reflection.Assembly]::LoadWithPartialName("System.Xml.Linq") | Out-Null
$XDoc = [System.Xml.Linq.XDocument]::Load($uri)
$XDoc.Descendants("ColumnNullRatioProfile") | ForEach {$_.Element("Column").GetAttribute("Name").Value}
$XDoc.DataProfile.DataProfileOutput.Profiles.ColumnNullRatioProfile |
% {$_.column} |
select name
$file = [xml](gc "C:\t.xml")
$columns = $file.DataProfile.DataProfileOutput.Profiles.ColumnNullRatioProfile.Column.Name
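One likely reason Method 2 returns nothing is that the profile file declares a default namespace, so an unqualified name such as "ColumnNullRatioProfile" matches no element. For illustration only (Python rather than PowerShell), here is a sketch of a namespace-aware lookup of the Column attributes, written out as CSV. The `dp` prefix, the trimmed sample, the chosen fields and the function name are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The profile file uses a default namespace; "dp" is just a local alias for it.
NS = {"dp": "http://schemas.microsoft.com/sqlserver/2008/DataDebugger/"}

SAMPLE = """<?xml version="1.0" ?>
<DataProfile xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
<DataProfileOutput><Profiles>
<ColumnNullRatioProfile ProfileRequestID="NullRatioReq" IsExact="true">
<Column Name="EmailURL_ID" SqlDbType="Int" IsNullable="false" />
<NullCount>0</NullCount>
</ColumnNullRatioProfile>
<ColumnNullRatioProfile ProfileRequestID="NullRatioReq1" IsExact="true">
<Column Name="TRANS_ID" SqlDbType="Decimal" IsNullable="true" />
<NullCount>5979114</NullCount>
</ColumnNullRatioProfile>
</Profiles></DataProfileOutput>
</DataProfile>"""

COLUMN_PATH = "dp:DataProfileOutput/dp:Profiles/dp:ColumnNullRatioProfile/dp:Column"


def null_ratio_columns_csv(xml_text, fields=("Name", "SqlDbType", "IsNullable")):
    """Return CSV text with one row per Column element under ColumnNullRatioProfile."""
    root = ET.fromstring(xml_text)  # root is <DataProfile>
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for column in root.iterfind(COLUMN_PATH, NS):
        writer.writerow([column.get(name, "") for name in fields])  # missing attribute -> ""
    return out.getvalue()


if __name__ == "__main__":
    print(null_ratio_columns_csv(SAMPLE))
```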
