I have researched extensively the best way to shred this XML file and have come close, but not all the way, to what I want.
I am using SQL Server 2012 and have Visual Studio 2012 as well, though I prefer to use SQL Server.
Here is a snippet of the type of XML data I am working with. I cannot control how the XML is built, as it comes from a third party. In reality, below the <Response> node there are about 450 child elements such as ResponseID, Name, Status, etc.; I show only about ten.
<xml>
<Response>
<ResponseID>R_a4yThVvKXzVyftz</ResponseID>
<ResponseSet>Default Response Set</ResponseSet>
<Name>Doe, John</Name>
<ExternalDataReference>0</ExternalDataReference>
<EmailAddress>jdoe#gmail.com</EmailAddress>
<IPAddress>140.123.12.123</IPAddress>
<Status>0</Status>
<StartDate>2014-09-18 09:21:11</StartDate>
<EndDate>2014-09-23 16:09:58</EndDate>
<Finished>1</Finished>
</Response>
</xml>
I've tried the OPENROWSET method shown on this site:
http://blogs.msdn.com/b/simonince/archive/2009/04/24/flattening-xml-data-in-sql-server.aspx
Using a query like this:
SELECT
a1.value('(RESPONSEID/text())[1]', 'varchar(50)') as RESPONSEID,
a2.value('(RESPONSESET/text())[1]', 'varchar(50)') as RESPONSESET,
a3.value('(NAME/text())[1]', 'varchar(50)') as NAME
FROM XmlSourceTable
CROSS APPLY XmlData.nodes('//Response') AS RESPONSEID(a1)
CROSS APPLY XmlData.nodes('//Response') AS RESPONSESET(a2)
CROSS APPLY XmlData.nodes('//Response') AS NAME(a3)
I got this to work once, but the shredded output repeated values and did not appear in the table form I want, which is like the output below (note that in reality the table is very wide, at least 450 columns in all). Another issue is that, because the width is greater than 255 columns, I can't convert this to .txt and import it. In any case, I'd strongly prefer to consume and shred the native XML so this process can be automated:
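+-------------------+-------------------------+------------+-----------------------+-----------------+------------+--------+-----------------+-----------------+
| RESPONSEID        | RESPONSESET             | NAME       | EXTERNALDATAREFERENCE | EMAILADDRESS    | IPADDRESS  | STATUS | STARTDATE       | ENDDATE         |
+-------------------+-------------------------+------------+-----------------------+-----------------+------------+--------+-----------------+-----------------+
| R_a4yThVvKXzVyftz | Default Response Set    | Doe, John  | 1/1/2014              | doej#gmail.com  | 123.12.123 | 0      | 9/18/2014 9:21  | 9/23/2014 16:09 |
| R_06znwEis73yLsnX | NonDefault Response Set | Doe, Jane  | 1/1/2014              | doeja#gmail.com | 123.12.123 | 0      | 9/18/2014 5:29  | 9/29/2014 9:42  |
| R_50HuB0jDFfI6hmZ | Response Set 1          | Doe, Cindy | 1/1/2014              | doec#gmail.com  | 123.12.123 | 0      | 9/18/2014 17:21 | 10/1/2014 11:45 |
+-------------------+-------------------------+------------+-----------------------+-----------------+------------+--------+-----------------+-----------------+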
I did find this application to shred XML files:
https://www.novixys.com/ExultSQLServer/
It created a single table for the <Response> node; however, in addition to the response table, it creates a table for each and every response node, which results in about 500 additional tables. Also, the application costs $250.
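You don't need a separate CROSS APPLY for each value you want to extract; one is enough. Each additional CROSS APPLY over the same //Response nodes multiplies the row count (with three of them, n responses become n × n × n rows), which is where the repeated values come from. Note also that XML is case-sensitive, so the paths must say ResponseID, not RESPONSEID.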
SELECT
R.X.value('(ResponseID/text())[1]', 'varchar(50)') as RESPONSEID,
R.X.value('(ResponseSet/text())[1]', 'varchar(50)') as RESPONSESET,
R.X.value('(Name/text())[1]', 'varchar(50)') as NAME
FROM XmlSourceTable
CROSS APPLY XmlData.nodes('//Response') AS R(X)
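A minimal, self-contained sketch of the single-CROSS APPLY pattern. It assumes a table variable @XmlSourceTable standing in for the real XmlSourceTable and two trimmed-down <Response> records; each <Response> node becomes exactly one row, and every .value() call reads relative to that same context node.

```sql
-- Hypothetical stand-in for XmlSourceTable, holding one document with two <Response> nodes.
DECLARE @XmlSourceTable TABLE (XmlData XML);

INSERT INTO @XmlSourceTable (XmlData) VALUES
(N'<xml>
  <Response>
    <ResponseID>R_a4yThVvKXzVyftz</ResponseID>
    <ResponseSet>Default Response Set</ResponseSet>
    <Name>Doe, John</Name>
  </Response>
  <Response>
    <ResponseID>R_06znwEis73yLsnX</ResponseID>
    <ResponseSet>NonDefault Response Set</ResponseSet>
    <Name>Doe, Jane</Name>
  </Response>
</xml>');

-- One CROSS APPLY: one output row per <Response>; every column is read from R.X.
SELECT
    R.X.value('(ResponseID/text())[1]',  'varchar(50)') AS RESPONSEID,
    R.X.value('(ResponseSet/text())[1]', 'varchar(50)') AS RESPONSESET,
    R.X.value('(Name/text())[1]',        'varchar(50)') AS NAME
FROM @XmlSourceTable
CROSS APPLY XmlData.nodes('/xml/Response') AS R(X);  -- two rows, one per <Response>
```

For the real data, the same pattern simply repeats one .value() line per child element; the target SQL type for each column is your choice.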
My source is XML with a huge number of records. As a sample, I have pasted one record below:
<?xml version='1.0' encoding='UTF-8'?><wd:Report_Data xmlns:wd="urn:com.workday.report/BCF-Termination-Details">
<wd:Report_Entry>
<wd:Worker>
<wd:Associate_ID>997215</wd:Associate_ID>
<wd:Total_Base_Pay_Amount>13</wd:Total_Base_Pay_Amount>
<wd:Total_Base_Pay_Currency wd:Descriptor="USD"><wd:ID wd:type="WID">9e996ffdd3e14da0ba7275d5400bafd4</wd:ID><wd:ID wd:type="Currency_ID">USD</wd:ID><wd:ID wd:type="Currency_Numeric_Code">840</wd:ID></wd:Total_Base_Pay_Currency>
<wd:Length_of_Service_-_Position>0 year(s), 4 month(s), 7 day(s)</wd:Length_of_Service_-_Position>
</wd:Worker>
<wd:Time_Type wd:Descriptor="Part time"><wd:ID wd:type="WID">3baf0a7f595210daec53e26fa7476d5b</wd:ID><wd:ID wd:type="Position_Time_Type_ID">Part_time</wd:ID></wd:Time_Type>
<wd:Hire_Date>2022-05-25-07:00</wd:Hire_Date>
<wd:Termination_Date>2022-10-02-07:00</wd:Termination_Date>
<wd:Date_Initiated>2022-10-28T17:39:53.943-07:00</wd:Date_Initiated>
<wd:Termination_Category>Voluntary</wd:Termination_Category>
<wd:Termination_Reason>Job Abandonment</wd:Termination_Reason>
<wd:Length_of_Service_in_Days>130</wd:Length_of_Service_in_Days>
<wd:workdayID>f415ada264f1100211408522a0e00000</wd:workdayID>
</wd:Report_Entry></wd:Report_Data>
I need to implement this in an ETL process: using the XML as the source, I need a few of its columns extracted and loaded into the database. I am new to XQuery, so I need to know how to get started. I am doing a POC on this.
If you want to extract values from the XML source, you can try the XML functions in the SQLEXT Toolkit package, which can be installed on top of Netezza.
Here is an example of fetching the Associate_ID from the XML source; you can then insert the extracted value into a table.
select xmlextractvalue(xmlparse(ele_string), '/Report_Entry/Worker/Associate_ID') as associate_id
from (
    -- strip the wd: prefix so the unprefixed XPath above matches the element names
    select replace(element, 'wd:', '') as ele_string
    from t1
) as foo;
ASSOCIATE_ID
--------------
997215
(1 row)
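If the target database is SQL Server rather than Netezza, the same extraction can be sketched with native XQuery instead of stripping the wd: prefix: bind the namespace with WITH XMLNAMESPACES and shred each wd:Report_Entry with .nodes(). This is a swapped-in technique, not the SQLEXT approach above; the variable @x, the trimmed sample record, and the chosen column types are assumptions.

```sql
-- Trimmed copy of the sample record. The <?xml ... encoding='UTF-8'?> declaration is omitted:
-- SQL Server typically rejects a UTF-8 declaration inside an N'...' (UTF-16) literal.
DECLARE @x XML = N'<wd:Report_Data xmlns:wd="urn:com.workday.report/BCF-Termination-Details">
  <wd:Report_Entry>
    <wd:Worker>
      <wd:Associate_ID>997215</wd:Associate_ID>
      <wd:Total_Base_Pay_Amount>13</wd:Total_Base_Pay_Amount>
    </wd:Worker>
    <wd:Termination_Category>Voluntary</wd:Termination_Category>
    <wd:Termination_Reason>Job Abandonment</wd:Termination_Reason>
    <wd:Length_of_Service_in_Days>130</wd:Length_of_Service_in_Days>
  </wd:Report_Entry>
</wd:Report_Data>';

-- Bind the wd prefix to the report's namespace URI so the prefixed paths below match.
WITH XMLNAMESPACES ('urn:com.workday.report/BCF-Termination-Details' AS wd)
SELECT
    e.value('(wd:Worker/wd:Associate_ID/text())[1]',          'int')           AS Associate_ID,
    e.value('(wd:Worker/wd:Total_Base_Pay_Amount/text())[1]', 'decimal(18,2)') AS Total_Base_Pay_Amount,
    e.value('(wd:Termination_Category/text())[1]',            'varchar(50)')   AS Termination_Category,
    e.value('(wd:Termination_Reason/text())[1]',              'varchar(100)')  AS Termination_Reason,
    e.value('(wd:Length_of_Service_in_Days/text())[1]',       'int')           AS Length_of_Service_in_Days
FROM @x.nodes('/wd:Report_Data/wd:Report_Entry') AS t(e);  -- one row per Report_Entry
```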
Given the XML:
<Dial>
<DialID>
24521
</DialID>
<DialName>
Base Price
</DialName>
</Dial>
<Dial>
<DialID>
24528
</DialID>
<DialName>
Rush Options
</DialName>
<DialValue>
1.5
</DialValue>
</Dial>
<Dial>
<DialID>
24530
</DialID>
<DialName>
Bill Rush Charges
</DialName>
<DialValue>
School
</DialValue>
</Dial>
I can use the contains() function in my XPath:
//Dial[DialName[contains(text(), 'Bill')]]/DialValue
To retrieve the values I'm after:
School
The above XML is stored in a column of my SQL Server database, so I'm using the .value() method to select from that column.
SELECT Dials.DialDetail.value('(//Dial[DialName[contains(text(), "Bill")]]/DialValue)[1]','VARCHAR(64)') AS BillTo
FROM CampaignDials Dials
I can't seem to get the syntax right, though. The XPath works as expected (tested in Oxygen and elsewhere), but when I use it in the XQuery argument of the .value() method, I get an error:
Started executing query at Line 1
Msg 2389, Level 16, State 1, Line 36
XQuery [Dials.DialDetail.value()]: 'contains()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *'
Total execution time: 00:00:00.004
I've tried different variations of single and double quotes with no effect. The error refers to an XPath data type for attributes, but I'm not retrieving an attribute; I'm getting the text value. I receive the same error if I write the expression as //Dial[DialName[contains(text(), 'Bill')]]/DialValue/text() instead.
What is the correct way to use contains() in an XQuery when it's used in the XML.value() method? Or is this the wrong approach to begin with?
You nearly have it right; you just need [1] on the text() inside contains() to guarantee a single value.
You should also use text() on the actual node you are pulling out, for performance reasons.
Also, // can be inefficient, so only use it if you really need recursive descent. Instead, you can spell out the path, or use /* to match a top-level element whatever its name.
SELECT
Dials.DialDetail.value(
'(//Dial[DialName[contains(text()[1], "Bill")]]/DialValue/text())[1]',
'VARCHAR(64)') AS BillTo
FROM CampaignDials Dials
As Yitzhak Kabinsky notes, this only gets you one value per row of the table; you need .nodes() if you want to shred the XML itself into rows.
The difference between your actual database case, which fails, and your reduced sample case, which works, is most likely a difference in the data.
The error,
contains() requires a singleton (or empty sequence)
indicates that one of your DialName elements has multiple text node children rather than a single text node child as you're expecting.
You can abstract away such variations by testing the string-value of DialName rather than its text node children:
//Dial[contains(DialName, 'Bill')]/DialValue
See also
Testing text() nodes vs string values in XPath
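In SQL Server's statically typed XQuery, the first argument of contains() still has to be a singleton (or empty), so a SQL Server version of the string-value test wraps the element as (DialName)[1]. The sketch below uses hypothetical data, not the poster's: a comment inside DialName splits its content into two text nodes, "Bill " and " Rush Charges". Testing only the first text node misses "Charges", while string() on the element sees the whole concatenated value.

```sql
-- Hypothetical data: the comment splits DialName into two text nodes.
DECLARE @dials XML = N'<Dial>
  <DialID>24530</DialID>
  <DialName>Bill <!-- split --> Rush Charges</DialName>
  <DialValue>School</DialValue>
</Dial>';

SELECT
    -- Only the first text node ("Bill ") is tested, so no Dial matches: NULL
    @dials.value('(/Dial[contains((DialName/text())[1], "Charges")]/DialValue/text())[1]',
                 'varchar(64)') AS ByFirstTextNode,
    -- string() concatenates all descendant text of DialName, so this returns School
    @dials.value('(/Dial[contains(string((DialName)[1]), "Charges")]/DialValue/text())[1]',
                 'varchar(64)') AS ByStringValue;
```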
Here is how to do XML shredding in MS SQL Server correctly.
You need to apply the filter in the XQuery .nodes() method.
The .value() method is just for the actual value retrieval.
It is possible to pass a SQL Server variable as a parameter instead of hard-coding the "Bill" value.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, DialDetail XML);
INSERT INTO @tbl (DialDetail) VALUES
(N'<Dial>
<DialID>24521</DialID>
<DialName>Base Price</DialName>
</Dial>
<Dial>
<DialID>24528</DialID>
<DialName>Rush Options</DialName>
<DialValue>1.5</DialValue>
</Dial>
<Dial>
<DialID>24530</DialID>
<DialName>Bill Rush Charges</DialName>
<DialValue>School</DialValue>
</Dial>');
-- DDL and sample data population, end
SELECT ID
, c.value('(DialID/text())[1]', 'INT') AS DialID
, c.value('(DialName/text())[1]', 'VARCHAR(30)') AS DialName
, c.value('(DialValue/text())[1]', 'VARCHAR(30)') AS DialValue
FROM @tbl CROSS APPLY DialDetail.nodes('/Dial[contains((DialName/text())[1], "Bill")]') AS t(c);
Output
+----+--------+-------------------+-----------+
| ID | DialID | DialName | DialValue |
+----+--------+-------------------+-----------+
| 1 | 24530 | Bill Rush Charges | School |
+----+--------+-------------------+-----------+
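The answer mentions passing a SQL Server variable instead of hard-coding "Bill". A minimal sketch of that uses the sql:variable() XQuery extension function inside the .nodes() filter; the variable name @search and the trimmed sample rows are illustrative assumptions.

```sql
-- Hypothetical sample rows, trimmed from the answer's data.
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, DialDetail XML);
INSERT INTO @tbl (DialDetail) VALUES
(N'<Dial><DialID>24528</DialID><DialName>Rush Options</DialName><DialValue>1.5</DialValue></Dial>
<Dial><DialID>24530</DialID><DialName>Bill Rush Charges</DialName><DialValue>School</DialValue></Dial>');

-- The search term lives in an ordinary T-SQL variable (name is illustrative).
DECLARE @search VARCHAR(30) = 'Bill';

SELECT ID
     , c.value('(DialID/text())[1]', 'INT') AS DialID
     , c.value('(DialName/text())[1]', 'VARCHAR(30)') AS DialName
     , c.value('(DialValue/text())[1]', 'VARCHAR(30)') AS DialValue
FROM @tbl
-- sql:variable() reads @search from inside the XQuery filter.
CROSS APPLY DialDetail.nodes('/Dial[contains((DialName/text())[1], sql:variable("@search"))]') AS t(c);
```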
I'm using SQL Server 2016 and I have an XML file that I need to parse to put various data elements into separate fields. For the most part everything works fine, except that I need a little help identifying a particular node value. Here is what I have (I put only a snippet of the XML here, but it does show the problem):
DECLARE @xmlString xml
SET @xmlString ='<PubmedArticle>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">25685064</PMID>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Electronic">1234-5678</ISSN>
<ISSN IssnType="Print">1475-2867</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>15</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2015</Year>
</PubDate>
</JournalIssue>
</Journal>
</Article>
</MedlineCitation>
</PubmedArticle>'
select
nref.value('Article[1]/Journal[1]/ISSN[1]','varchar(max)') ISSN
from @xmlString.nodes('//MedlineCitation[1]') as R(nref)
This skips the second ISSN element and reads only the first value available. I need to pull both values. What do I need to change? Thanks
You can read it as a second column:
SELECT
nref.value('Article[1]/Journal[1]/ISSN[1]','varchar(max)') ISSN,
nref.value('Article[1]/Journal[1]/ISSN[2]','varchar(max)') ISSN2
FROM @xmlString.nodes('//MedlineCitation[1]') as R(nref)
Or
SELECT
nref.value('ISSN[1]','varchar(max)') ISSN,
nref.value('ISSN[2]','varchar(max)') ISSN2
FROM @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]') as R(nref)
Or as a separate row:
SELECT nref.value('.','varchar(MAX)') ISSN
FROM @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]/ISSN') as R(nref)
Update
If the number of ISSNs may vary, I recommend normalizing your result set:
SELECT
nref.value('.','varchar(MAX)') Issn,
nref.value('@IssnType','varchar(MAX)') IssnType
FROM @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]/ISSN') as R(nref)
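When you know which ISSN types to expect, another option is to select each ISSN by its IssnType attribute rather than by position, so the result does not depend on element order. A sketch, assuming a trimmed copy of the question's sample and at most one ISSN of each type per journal:

```sql
-- Trimmed copy of the question's sample.
DECLARE @xmlString XML = N'<PubmedArticle>
  <MedlineCitation>
    <Article>
      <Journal>
        <ISSN IssnType="Electronic">1234-5678</ISSN>
        <ISSN IssnType="Print">1475-2867</ISSN>
      </Journal>
    </Article>
  </MedlineCitation>
</PubmedArticle>';

-- Pick each ISSN by its IssnType attribute instead of by position.
SELECT
    j.value('(ISSN[@IssnType="Print"]/text())[1]',      'varchar(9)') AS PrintISSN,
    j.value('(ISSN[@IssnType="Electronic"]/text())[1]', 'varchar(9)') AS ElectronicISSN
FROM @xmlString.nodes('/PubmedArticle/MedlineCitation/Article/Journal') AS R(j);
-- PrintISSN = 1475-2867, ElectronicISSN = 1234-5678
```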
I have an XML document that I'm working to build a schema for in order to bulk load these documents into a SQL Server table. The XML I'm focusing on looks like this:
<Coverage>
<CoverageCd>BI</CoverageCd>
<CoverageDesc>BI</CoverageDesc>
<Limit>
<FormatCurrencyAmt>
<Amt>30000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>PerPerson</LimitAppliesToCd>
</Limit>
<Limit>
<FormatCurrencyAmt>
<Amt>85000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>PerAcc</LimitAppliesToCd>
</Limit>
</Coverage>
<Coverage>
<CoverageCd>PD</CoverageCd>
<CoverageDesc>PD</CoverageDesc>
<Limit>
<FormatCurrencyAmt>
<Amt>50000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>Coverage</LimitAppliesToCd>
</Limit>
</Coverage>
Inside the Limit element, there's a child LimitAppliesToCd that I need to use to determine where the Amt element's value actually gets stored inside my table. Is this possible to do using the standard XML Bulk Load feature of SQL Server? Normally in XML I'd expect that the element would have an attribute containing the "PerPerson" or "PerAcc" information, but this standard we're using does not call for that.
If anyone has worked with the ACORD standard before, you might know what I'm working with here. Any help is greatly appreciated.
I don't know exactly what you're referring to, but here is a way to get the information out of your XML.
Assumption: your XML is already bulk-loaded into a declared variable @xml of type XML, with the Coverage elements wrapped in a <root> element (the query reads /root/Coverage).
A CTE pulls the information out of your XML. The final query then uses PIVOT to put each Amt into the column named by its LimitAppliesToCd.
With a suitable table structure, the actual insert should be simple...
WITH DerivedTable AS
(
SELECT cov.value('CoverageCd[1]','varchar(max)') AS CoverageCd
,cov.value('CoverageDesc[1]','varchar(max)') AS CoverageDesc
,lim.value('(FormatCurrencyAmt/Amt)[1]','decimal(14,4)') AS Amt
,lim.value('LimitAppliesToCd[1]','varchar(max)') AS LimitAppliesToCd
FROM @xml.nodes('/root/Coverage') AS A(cov)
CROSS APPLY cov.nodes('Limit') AS B(lim)
)
SELECT p.*
FROM
(SELECT * FROM DerivedTable) AS tbl
PIVOT
(
MIN(Amt) FOR LimitAppliesToCd IN(PerPerson,PerAcc,Coverage)
) AS p
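PIVOT is not the only way to spread the amounts into columns. A commonly used alternative is conditional aggregation (MAX(CASE ...)); this is a swapped-in technique, not what the answer uses. It is sketched below against the question's sample data wrapped in the <root> element that the /root/Coverage path expects; the decimal(14,2) type and the output column name CoverageLimit are assumptions.

```sql
-- Sample data from the question, wrapped in a <root> element.
DECLARE @xml XML = N'<root>
<Coverage><CoverageCd>BI</CoverageCd><CoverageDesc>BI</CoverageDesc>
  <Limit><FormatCurrencyAmt><Amt>30000.00</Amt></FormatCurrencyAmt><LimitAppliesToCd>PerPerson</LimitAppliesToCd></Limit>
  <Limit><FormatCurrencyAmt><Amt>85000.00</Amt></FormatCurrencyAmt><LimitAppliesToCd>PerAcc</LimitAppliesToCd></Limit>
</Coverage>
<Coverage><CoverageCd>PD</CoverageCd><CoverageDesc>PD</CoverageDesc>
  <Limit><FormatCurrencyAmt><Amt>50000.00</Amt></FormatCurrencyAmt><LimitAppliesToCd>Coverage</LimitAppliesToCd></Limit>
</Coverage>
</root>';

-- Shred to one row per <Limit> first, so the aggregation below works on plain columns.
WITH Limits AS
(
    SELECT cov.value('(CoverageCd/text())[1]', 'varchar(10)')              AS CoverageCd,
           lim.value('(LimitAppliesToCd/text())[1]', 'varchar(20)')        AS AppliesTo,
           lim.value('(FormatCurrencyAmt/Amt/text())[1]', 'decimal(14,2)') AS Amt
    FROM @xml.nodes('/root/Coverage') AS A(cov)
    CROSS APPLY cov.nodes('Limit') AS B(lim)
)
-- Each CASE keeps only the amount whose LimitAppliesToCd matches that output column.
SELECT CoverageCd,
       MAX(CASE WHEN AppliesTo = 'PerPerson' THEN Amt END) AS PerPerson,
       MAX(CASE WHEN AppliesTo = 'PerAcc'    THEN Amt END) AS PerAcc,
       MAX(CASE WHEN AppliesTo = 'Coverage'  THEN Amt END) AS CoverageLimit
FROM Limits
GROUP BY CoverageCd;
```

With either form, the list of LimitAppliesToCd values has to be known when the query is written; the CASE form just makes it easy to rename or retype each output column independently.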
I'm writing an SQL select statement which returns XML. I wanted to put in some comments and found a post asking how to do this. The answer seemed to be the "comment()" function/keyword. So, my code looks broadly like this:
select ' extracted on tuesday ' as 'comment()',
(select top 5 id from MyTable for xml path(''),type)
for xml path('stuff')
...which returns XML as follows:
<stuff>
<!-- extracted on tuesday -->
<id>0DAD4B42-CED6-4A68-AB7D-0003E4C127CC</id>
<id>24BD0E5F-8B76-43FF-AEEA-0008AA911ADD</id>
<id>AAFF5BB0-BFFB-4584-BACC-0009684A1593</id>
<id>0581AF24-8C30-408C-9A48-000A488133AC</id>
<id>01E2306D-296A-4FF7-9263-000EEFF42230</id>
</stuff>
In the process of trying to find out more about "comment()", I discovered "data()" as well.
select top 5 id as 'data()' from MyTable for xml path('')
Unfortunately, the names make searching for information on these functions very difficult.
Can someone point me to the documentation on their usage, as well as on any other similar functions?
Thanks,
Edit:
Another would appear to be "processing-instruction(blah)".
Example:
select 'type="text/css" href="style.css"' as 'processing-instruction(xml-stylesheet)',
(select top 5 id from MyTable for xml path(''),type)
for xml path('stuff')
Results:
<stuff>
<?xml-stylesheet type="text/css" href="style.css"?>
<id>0DAD4B42-CED6-4A68-AB7D-0003E4C127CC</id>
<id>24BD0E5F-8B76-43FF-AEEA-0008AA911ADD</id>
<id>AAFF5BB0-BFFB-4584-BACC-0009684A1593</id>
<id>0581AF24-8C30-408C-9A48-000A488133AC</id>
<id>01E2306D-296A-4FF7-9263-000EEFF42230</id>
</stuff>
Here is the link to the Books Online (BOL) topic: Columns with the Name of an XPath Node Test.
This details the functionality you are interested in. (It can indeed be a pain to find.)
Also you can find quick functional examples here
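As a quick illustration of two of those column names, the sketch below aggregates three hypothetical inline values (standing in for MyTable) with FOR XML PATH(''): data() treats each value as an atomic value and separates adjacent ones with a space, while text() emits them as text nodes with no separator.

```sql
-- data(): adjacent atomic values are separated by a single space
SELECT v AS 'data()'
FROM (VALUES ('a'), ('b'), ('c')) AS t(v)
ORDER BY v
FOR XML PATH('');   -- a b c

-- text(): values become text nodes, concatenated without a separator
SELECT v AS 'text()'
FROM (VALUES ('a'), ('b'), ('c')) AS t(v)
ORDER BY v
FOR XML PATH('');   -- abc
```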