Best ETL tool for converting XML to a table - sql-server

I need to convert more than 500 XML files into tables that I can query. I have the XSD that I use to validate their structure. I was considering using Notepad++ to restructure the files. Is that a good idea, and if not, what is better? The end goal is either flat files with the same columns or loading the data directly into SQL Server.
Example #1 XML
(...)
<Customer>
<CustomerID>1</CustomerID>
<Address>
<Street>John Street</Street>
<Number>6</Number>
<Apartment>68</Apartment>
<City>New York</City>
<Zip>10068</Zip>
</Address>
<Firstname>John</Firstname>
<LastName>Doe</LastName>
</Customer>
(...)
Example #2 XML
(...)
<Customer>
<CustomerID>2</CustomerID>
<Address>
<Street>Wall Street</Street>
<City>New York</City>
</Address>
<Firstname>James Smith</Firstname>
</Customer>
(...)
Example #3 XML
(...)
<n1:Customer>
<n1:CustomerID>3</n1:CustomerID>
<n1:Address>
<n1:Apartment>32</n1:Apartment>
<n1:City>Chicago</n1:City>
</n1:Address>
</n1:Customer>
(...)
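Rather than editing 500+ files by hand in Notepad++, one option is a small script that parses each file and writes one row per Customer with a fixed column list, leaving missing elements empty. The sketch below assumes Python with only the standard library (xml.etree.ElementTree and csv). The column list is taken from the three examples, the function names are hypothetical, and namespace prefixes such as n1: are handled by stripping the namespace URI, which assumes no two namespaces reuse the same local element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Fixed output columns, taken from the example files; the nested
# Address fields are flattened into the same row as the customer.
COLUMNS = ["CustomerID", "Street", "Number", "Apartment", "City", "Zip",
           "Firstname", "LastName"]


def local_name(tag):
    """Drop a '{namespace-uri}' prefix so n1:Customer and Customer match."""
    return tag.rsplit("}", 1)[-1]


def customers_to_rows(xml_text):
    """Yield one dict per Customer element; absent elements become ''."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) != "Customer":
            continue
        row = dict.fromkeys(COLUMNS, "")
        # iter() walks the Customer element and all of its descendants,
        # so Address/City ends up in the same row as CustomerID.
        for child in elem.iter():
            name = local_name(child.tag)
            if name in row and child.text is not None:
                row[name] = child.text.strip()
        yield row


def write_csv(xml_texts, out):
    """Write every Customer from every document to one CSV stream."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for text in xml_texts:
        writer.writerows(customers_to_rows(text))
```

Every file produces the same columns, so the resulting CSV can be bulk-loaded into SQL Server. ElementTree does not validate against an XSD, so the schema check would remain a separate step.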

Related

SQL Server reduce recurring XML nodes to JSON array

I have some XML in which every entry can contain recurring elements. I'm trying to query it with the OPENXML function, and I want to reduce those elements to JSON arrays.
My SQL looks like this:
declare @idoc int,
@xml xml = '
<?xml version="1.0" encoding="UTF-8"?>
<collection>
<individual>
<id>1</id>
<address>
<coutry>Country1</coutry>
<zip>ZIP1</zip>
<city>City1</city>
</address>
<address>
<coutry>Country2</coutry>
<zip>ZIP2</zip>
<city>City2</city>
</address>
<document>
<num>101</num>
<issued>2020-01-01</issued>
<description>desc1</description>
</document>
<document>
<num>102</num>
<issued>2020-01-01</issued>
<description>desc2</description>
</document>
</individual>
<individual>
<id>2</id>
<address>
<coutry>Country3</coutry>
<zip>ZIP3</zip>
<city>City3</city>
</address>
<address>
<coutry>Country4</coutry>
<zip>ZIP4</zip>
<city>City4</city>
</address>
<document>
<num>103</num>
<issued>2020-01-03</issued>
<description>desc3</description>
</document>
<document>
<num>104</num>
<issued>2020-01-04</issued>
<description>desc4</description>
</document>
</individual>
</collection>';
exec sp_xml_preparedocument @idoc out, @xml;
select
id as ID
, address as AddressesJson
, document as DocumentsJson
from openxml(@idoc, '//individual', 2) with (
id int
, address nvarchar(max)
, document nvarchar(max)
);
exec sp_xml_removedocument @idoc;
The result I'm getting is:
|ID |AddressesJson |DocumentsJson |
|---|-------------------|-------------------|
|1 |Country1ZIP1City1 |1012020-01-01desc1 |
|2 |Country3ZIP3City3 |1032020-01-03desc3 |
What I would like to get is
|ID |AddressesJson |DocumentsJson |
|---|-------------------|-------------------|
|1 |[{"coutry":"Country1","zip":"ZIP1","city":"City1"},{"coutry":"Country2","zip":"ZIP2","city":"City2"}] |[{"num":"101","issued":"2020-01-01","description":"desc1"},{"num":"102","issued":"2020-01-02","description":"desc2"}] |
|2 |[{"coutry":"Country3","zip":"ZIP3","city":"City3"},{"coutry":"Country4","zip":"ZIP4","city":"City4"}] |[{"num":"103","issued":"2020-01-03","description":"desc3"},{"num":"104","issued":"2020-01-04","description":"desc4"}] |
How can I achieve this?
P.S. I'm using OPENXML because it seems to be faster, but I would also appreciate a solution with xml.nodes()/XQuery.
It seems a couple of subqueries with FOR JSON PATH are what you want here. Note also that I had to amend your XML to remove the leading line break, as it makes the value invalid XML (an XML declaration must come at the very start of the document):
DECLARE @idoc int,
@xml xml = '<?xml version="1.0" encoding="UTF-8"?>
<collection>
<individual>
<id>1</id>
<address>
<coutry>Country1</coutry>
<zip>ZIP1</zip>
<city>City1</city>
</address>
<address>
<coutry>Country2</coutry>
<zip>ZIP2</zip>
<city>City2</city>
</address>
<document>
<num>101</num>
<issued>2020-01-01</issued>
<description>desc1</description>
</document>
<document>
<num>102</num>
<issued>2020-01-01</issued>
<description>desc2</description>
</document>
</individual>
<individual>
<id>2</id>
<address>
<coutry>Country3</coutry>
<zip>ZIP3</zip>
<city>City3</city>
</address>
<address>
<coutry>Country4</coutry>
<zip>ZIP4</zip>
<city>City4</city>
</address>
<document>
<num>103</num>
<issued>2020-01-03</issued>
<description>desc3</description>
</document>
<document>
<num>104</num>
<issued>2020-01-04</issued>
<description>desc4</description>
</document>
</individual>
</collection>';
SELECT c.i.value('(id/text())[1]','int') AS id,
(SELECT i.a.value('(coutry/text())[1]','varchar(30)') AS country, --It's spelt "country"; I suggest fixing this at your source, as fundamental typographical errors like this can be a real problem later down the line
i.a.value('(zip/text())[1]','varchar(30)') AS zip,
i.a.value('(city/text())[1]','varchar(30)') AS city
FROM c.i.nodes('address')i(a)
FOR JSON PATH) AS AddressJson,
(SELECT i.d.value('(num/text())[1]','int') AS num,
i.d.value('(issued/text())[1]','date') AS issued,
i.d.value('(description/text())[1]','varchar(30)') AS description
FROM c.i.nodes('document')i(d)
FOR JSON PATH) AS DocumentJson
FROM @xml.nodes('collection/individual') c(i);
db<>fiddle
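Outside SQL Server, the same reshaping (one row per individual, with each repeating child element collapsed into a JSON array) can be sketched in Python with ElementTree and json. This is only an illustration of the idea, assuming the child elements are flat as in the sample; the function names are hypothetical, and unlike the T-SQL answer it keeps the original coutry key and leaves every value as a string.

```python
import json
import xml.etree.ElementTree as ET


def child_array(parent, tag):
    """Turn every <tag> child of parent into a dict of its sub-elements,
    roughly what one correlated subquery with FOR JSON PATH produces."""
    return [
        {field.tag: field.text for field in item}
        for item in parent.findall(tag)
    ]


def individuals_to_rows(xml_text):
    """Return (id, addresses_json, documents_json) per <individual>."""
    root = ET.fromstring(xml_text)
    rows = []
    for ind in root.findall("individual"):
        rows.append((
            int(ind.findtext("id")),
            # Compact separators give the same shape as FOR JSON output.
            json.dumps(child_array(ind, "address"), separators=(",", ":")),
            json.dumps(child_array(ind, "document"), separators=(",", ":")),
        ))
    return rows
```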

SQL Server 2019 FOR XML nested nodes preserving CDATA

I have to build this payload
<?xml version="1.0" encoding="utf-8"?>
<shipment>
<software>
<application>MYRTL</application>
<version>1.0</version>
</software>
<security>
<customer>X00000</customer>
<user>X00000</user>
<password>password1</password>
<langid>IT</langid>
</security>
<consignment action="I" cashondeliver="N" international="N" insurance="N">
<labelType>T</labelType>
<senderAccId>200200</senderAccId>
<consignmenttype>T</consignmenttype>
<actualweight>00008000</actualweight>
<actualvolume>0000018</actualvolume>
<totalpackages>2</totalpackages>
<packagetype>C</packagetype>
<division>D</division>
<product>N</product>
<insurancevalue>0000000000000</insurancevalue>
<insurancecurrency>EUR</insurancecurrency>
<reference><![CDATA[22X000223]]></reference>
<collectiondate>20220818</collectiondate>
<termsofpayment>S</termsofpayment>
<systemcode>RL</systemcode>
<systemversion>1.0</systemversion>
<codfvalue>0000000000000</codfvalue>
<codfcurrency>EUR</codfcurrency>
<goodsdesc><![CDATA[Bread, Butter & Puré]]></goodsdesc>
<addresses>
<address>
<addressType>S</addressType>
<vatno>123456789123</vatno>
<addrline1><![CDATA[Via Mondovì, n° 23]]></addrline1>
<postcode><![CDATA[20125]]></postcode>
<phone1><![CDATA[345]]></phone1>
<phone2><![CDATA[3456345]]></phone2>
<name><![CDATA[Jack & Joe srl]]></name>
<country><![CDATA[IT]]></country>
<town><![CDATA[Arquà Polesine]]></town>
<province><![CDATA[RO]]></province>
<email><![CDATA[mail@jack_and_joe.it]]></email>
</address>
<address>
<addressType>C</addressType>
<addrline1><![CDATA[12° Reggimento Granatieri, 14]]></addrline1>
<postcode><![CDATA[00195]]></postcode>
<phone1><![CDATA[321]]></phone1>
<phone2><![CDATA[3214321]]></phone2>
<name><![CDATA[Giosuè Rossë]]></name>
<country><![CDATA[IT]]></country>
<town><![CDATA[Gambolo']]></town>
<province><![CDATA[TV]]></province>
<email><![CDATA[mario@rossi.it]]></email>
</address>
<address>
<addressType>R</addressType>
<addrline1><![CDATA[Hauptstraße 13]]></addrline1>
<postcode><![CDATA[34100]]></postcode>
<phone1><![CDATA[333]]></phone1>
<phone2><![CDATA[333444555]]></phone2>
<name><![CDATA[Noè Giassù]]></name>
<country><![CDATA[IT]]></country>
<town><![CDATA[Völs am Schlern]]></town>
<province><![CDATA[BZ]]></province>
<email><![CDATA[mail@noe.it]]></email>
</address>
</addresses>
<collectiontrg>
<priopntime>0900</priopntime>
<priclotime>1200</priclotime>
<secopntime>1400</secopntime>
<secclotime>1800</secclotime>
<availabilitytime>1600</availabilitytime>
<pickupdate>18.08.2022</pickupdate>
<pickuptime>1600</pickuptime>
<pickupdays>1</pickupdays>
<pickupinstr><![CDATA[Test Shipment ===> DO NOT COLLECT <===]]></pickupinstr>
</collectiontrg>
<dimensions itemaction="I">
<itemsequenceno>1</itemsequenceno>
<itemtype>C</itemtype>
<itemreference><![CDATA[22X0002223_1]]></itemreference>
<volume>0000009</volume>
<weight>00003000</weight>
<length>030000</length>
<heigh>010000</heigh>
<width>030000</width>
<quantity>1</quantity>
</dimensions>
<dimensions itemaction="I">
<itemsequenceno>2</itemsequenceno>
<itemtype>C</itemtype>
<itemreference><![CDATA[22X0002223_2]]></itemreference>
<volume>0000009</volume>
<weight>00005000</weight>
<length>030000</length>
<heigh>010000</heigh>
<width>030000</width>
<quantity>1</quantity>
</dimensions>
</consignment>
</shipment>
I had the bad idea of using T-SQL, since all the data is in a SQL Server database.
I thought it would be quite easy, and at first it was: I only needed to nest some FOR XML PATH, TYPE subqueries.
Problems arose when I considered that some fields could contain non-standard characters, so it seemed better to use CDATA sections.
I ran into several problems, because it appears that the only way to preserve CDATA is FOR XML EXPLICIT, which seems to be deprecated.
It was also very difficult to find documentation.
Fortunately, I found this post, which helped me work out the reverse path.
So I built a stored procedure using FOR XML EXPLICIT:
SELECT 1 AS Tag,
NULL AS Parent,
'MYRTL' AS 'software!1!application!element',
'1.0' AS 'software!1!version!element',
NULL AS 'security!2!customer!element',
...
NULL AS 'security!2!langid!element',
NULL AS 'consignment!3!action',
...
NULL AS 'consignment!3!goodsdesc!CDATA',
NULL AS 'addresses!4!address',
NULL AS 'address!5!addressType!element',
...
NULL AS 'address!5!town!CDATA',
...
NULL AS 'collectiontrg!9!priopntime!element',
...
NULL AS 'collectiontrg!9!pickupdate!element'
UNION ALL
SELECT 2 AS Tag,
NULL AS Parent,
...
UNION ALL
SELECT 3 AS Tag,
NULL AS Parent,
...
UNION ALL
SELECT 9 AS Tag,
3 AS Parent,
...
FOR XML EXPLICIT, ROOT('shipment')
It seems to work well, although I think there must be a better way to build it.
Now I have a further issue that I don't know how to solve; or rather, I could solve it with a dynamic query, but I would like to avoid that:
The node shipment.consignment.addresses.address where addressType=='C'
has to be omitted if it contains the same values as shipment.consignment.addresses.address where addressType=='S'.
Furthermore, the node shipment.consignment.collectiontrg has to appear only if the variable pickupDate is not null.
Is there a way to avoid the dynamic query?
Is there a better way to build this query?
Thanks
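For comparison, both rules (drop the C address when it duplicates the S address, emit collectiontrg only when a pickup date exists) become plain conditionals if the payload is built in application code instead of T-SQL. The sketch below swaps in Python's xml.dom.minidom, which can emit CDATA sections through createCDATASection; it models only a hypothetical subset of the fields, and the function and parameter names are illustrative, not part of any existing API.

```python
from xml.dom.minidom import Document

# Hypothetical subset of the address fields written as CDATA.
ADDRESS_FIELDS = ["addrline1", "postcode", "name", "town"]


def text_el(doc, tag, value, cdata=False):
    """Create <tag>value</tag>, optionally wrapping the value in CDATA."""
    el = doc.createElement(tag)
    node = doc.createCDATASection(value) if cdata else doc.createTextNode(value)
    el.appendChild(node)
    return el


def build_consignment(addresses, pickup_date=None):
    """addresses maps addressType ('S', 'C', 'R') to a dict of fields.
    Returns the UTF-8 encoded document as bytes."""
    doc = Document()
    shipment = doc.appendChild(doc.createElement("shipment"))
    consignment = shipment.appendChild(doc.createElement("consignment"))
    addrs = consignment.appendChild(doc.createElement("addresses"))
    for addr_type in ("S", "C", "R"):
        fields = addresses.get(addr_type)
        if fields is None:
            continue
        # Rule 1: skip the 'C' address when it repeats the 'S' address.
        if addr_type == "C" and fields == addresses.get("S"):
            continue
        addr = addrs.appendChild(doc.createElement("address"))
        addr.appendChild(text_el(doc, "addressType", addr_type))
        for name in ADDRESS_FIELDS:
            if name in fields:
                addr.appendChild(text_el(doc, name, fields[name], cdata=True))
    # Rule 2: emit collectiontrg only when a pickup date is given.
    if pickup_date is not None:
        trg = consignment.appendChild(doc.createElement("collectiontrg"))
        trg.appendChild(text_el(doc, "pickupdate", pickup_date))
    return doc.toxml(encoding="utf-8")
```

Within the existing FOR XML EXPLICIT query, a similar effect could likely be reached without dynamic SQL by adding a WHERE condition to the UNION ALL branch that produces each optional node (for example, the Tag 9 branch), since a branch that returns no rows emits no element.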

How to handle apostrophe ( ' ) in XMl and SQL Server

I have an XML file with employee details. The problem I'm trying to figure out is that the XML values contain apostrophes (') in random places throughout the content, which makes it hard to insert them into SQL Server tables.
I will be sending this entire XML content from an MVC C# application to a SQL Server stored procedure, which will insert the data into various tables, but whenever the XML content contains an apostrophe, an error occurs. So these apostrophes need to be either handled, replaced, or removed. How can I do this?
This is some sample XML content:
<xml>
<Channel>
<Program id="1" category="A">
<name>Pra'Matino</name>
<Bin>
<Date>1/1/2020</Date>
<Date>1/1/2020</Date>
</Bin>
<Player>
<Pla>S'Rajesh</Pla>
<Pla>Su'man</Pla>
</Player>
<Television>
<HostDeails>2/9/2020</HostDeails>
<HostDeails>MALE</HostDeails>
<HostDeails>Colour</HostDeails>
</Television>
<addresses>
<address>
<address1>No 10</address1>
<city>Chennai</city>
<country>IN's</country>
<ProductName>Lavender's</ProductName>
</address>
<address>
<address1>N0 72</address1>
<city>Sanagoor Road</city>
<postalCode>641006</postalCode>
</address>
<address>
<address1>Old No 10/ New No 3</address1>
<city>Madurai</city>
<country>IN</country>
<ProductName>Lavender</ProductName>
</address>
<address>
<address1>N0 98</address1>
<city>BridhSanagoor Road</city>
<country>SriLanka</country>
<postalCode>641006</postalCode>
</address>
</addresses>
</Program>
<Program id="25" category="B">
<name>Rahman'G</name>
<Bin>
<Date>10/1/2020</Date>
<Date>1/12/1989</Date>
</Bin>
<Player>
<Pla>Paul'D</Pla>
<Pla>Right'F</Pla>
</Player>
<Television>
<HostDeails>5/7/2021</HostDeails>
<HostDeails>MALE</HostDeails>
<HostDeails>C'olour</HostDeails>
</Television>
<addresses>
<address>
<address1>S7</address1>
<city>Coimbatire</city>
<country>IN</country>
<ProductName>Lavender</ProductName>
</address>
<address>
<address1>Sai Akshya Appartment</address1>
<city>Sanagoor Road</city>
<postalCode>631009</postalCode>
</address>
<address>
<address1> No 3</address1>
<city>Thenkaasi</city>
<ProductName>Lavender</ProductName>
</address>
<address>
<address1>N0 98</address1>
<city>Bridh'Sanagoor Road</city>
<country>SriLanka</country>
<postalCode>641006</postalCode>
</address>
</addresses>
</Program>
</Channel>
</xml>
Thank you all.
Copied from comment:
I'm doing it like this:
SqlCommand.Parameters.Add("@XMLValue", SqlDbType.Xml).Value = xmlDetails.ToString();
If you use parameterized queries (take a look at SqlParameter), you won't have these issues.
SqlCommand cmd = new SqlCommand(query, connection);
SqlParameter param = new SqlParameter();
param.ParameterName = "@ParamName";
param.Value = valueVariable;
cmd.Parameters.Add(param);
Try not to build SQL with string concatenation, and be aware of the risks (such as SQL injection) and their implications.
EDIT:
Don't convert xmlDetails to a string with xmlDetails.ToString(); instead, pass it as parsed XML (SqlXml):
// From a file:
SqlXml newXml = new SqlXml(new XmlTextReader("MyTestStoreData.xml"));
// Or from xmlDetails, which must be either a Stream or an XmlReader:
// SqlXml newXml = new SqlXml(xmlDetails);
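The principle behind parameterized queries can be shown outside .NET as well. The sketch below swaps in Python's built-in sqlite3 module in place of SQL Server and SqlParameter (an assumption made to keep the example self-contained), parses a trimmed copy of the sample with ElementTree, and inserts values containing apostrophes through `?` placeholders without any escaping. The table and column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two <Program> entries from the sample, trimmed to the name and players.
SAMPLE = """<xml><Channel>
<Program id="1" category="A"><name>Pra'Matino</name>
  <Player><Pla>S'Rajesh</Pla><Pla>Su'man</Pla></Player></Program>
<Program id="25" category="B"><name>Rahman'G</name>
  <Player><Pla>Paul'D</Pla><Pla>Right'F</Pla></Player></Program>
</Channel></xml>"""


def load_programs(conn, xml_text):
    """Shred Program/Pla elements into two tables using placeholders."""
    conn.execute("CREATE TABLE program (id INTEGER, category TEXT, name TEXT)")
    conn.execute("CREATE TABLE player (program_id INTEGER, name TEXT)")
    root = ET.fromstring(xml_text)
    for prog in root.iter("Program"):
        pid = int(prog.get("id"))
        # '?' placeholders: values travel separately from the SQL text,
        # so an apostrophe is just data and needs no escaping.
        conn.execute("INSERT INTO program VALUES (?, ?, ?)",
                     (pid, prog.get("category"), prog.findtext("name")))
        conn.executemany("INSERT INTO player VALUES (?, ?)",
                         [(pid, p.text) for p in prog.iter("Pla")])
    conn.commit()


conn = sqlite3.connect(":memory:")
load_programs(conn, SAMPLE)
```

If the error persists even with an Xml-typed parameter, it is worth checking whether the stored procedure itself builds dynamic SQL by concatenating the extracted values, since that is where an unescaped apostrophe would break the statement.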

Insert XML child node to SQL table

I've got an XML file like this, and I'm working with SQL Server 2014 SP2:
<?xml version='1.0' encoding='UTF-8'?>
<gwl>
<version>123456789</version>
<entities>
<entity id="1" version="123456789">
<name>xxxxx</name>
<listId>0</listId>
<listCode>Oxxx</listCode>
<entityType>08</entityType>
<createdDate>03/03/1993</createdDate>
<lastUpdateDate>05/06/2011</lastUpdateDate>
<source>src</source>
<OriginalSource>o_src</OriginalSource>
<aliases>
<alias category="STRONG" type="Alias">USCJSC</alias>
<alias category="WEAK" type="Alias">'OSKOAO'</alias>
</aliases>
<programs>
<program type="21">prog</program>
</programs>
<sdfs>
<sdf name="OriginalID">9876</sdf>
</sdfs>
<addresses>
<address>
<address1>1141, SYA-KAYA STR.</address1>
<country>RU</country>
<postalCode>1234</postalCode>
</address>
<address>
<address1>90, MARATA UL.</address1>
<country>RU</country>
<postalCode>1919</postalCode>
</address>
</addresses>
<otherIds>
<childId>737606</childId>
<childId>737607</childId>
</otherIds>
</entity>
</entities>
</gwl>
I made a script to insert data from the XML into a SQL table. How can I insert the child nodes into a table? I think I should replicate the row for each new child node, but I don't know the best way to proceed.
Here is my SQL code:
DECLARE @InputXML XML
SELECT @InputXML = CAST(x AS XML)
FROM OPENROWSET(BULK 'C:\MyFiles\sample.XML', SINGLE_BLOB) AS T(x)
SELECT
product.value('(@id)[1]', 'NVARCHAR(10)') id,
product.value('(@version)[1]', 'NVARCHAR(14)') [version],
product.value('(name[1])', 'NVARCHAR(255)') name,
product.value('(listId[1])', 'NVARCHAR(9)')listId,
product.value('(listCode[1])', 'NVARCHAR(10)')listCode,
product.value('(entityType[1])', 'NVARCHAR(2)')entityType,
product.value('(createdDate[1])', 'NVARCHAR(10)')createdDate,
product.value('(lastUpdateDate[1])', 'NVARCHAR(10)')lastUpdateDate,
product.value('(source[1])', 'NVARCHAR(15)')source,
product.value('(OriginalSource[1])', 'NVARCHAR(50)')OriginalSource,
product.value('(aliases[1])', 'NVARCHAR(50)')aliases,
product.value('(programs[1])', 'NVARCHAR(50)')programs,
product.value('(sdfs[1])', 'NVARCHAR(500)')sdfs,
product.value('(addresses[1])', 'NVARCHAR(50)')addresses,
product.value('(otherIds[1])', 'NVARCHAR(50)')otherIds
FROM @InputXML.nodes('gwl/entities/entity') AS X(product)
You have a lot of different children here...
Just to show the principles:
DECLARE @xml XML=
N'<gwl>
<version>123456789</version>
<entities>
<entity id="1" version="123456789">
<name>xxxxx</name>
<listId>0</listId>
<listCode>Oxxx</listCode>
<entityType>08</entityType>
<createdDate>03/03/1993</createdDate>
<lastUpdateDate>05/06/2011</lastUpdateDate>
<source>src</source>
<OriginalSource>o_src</OriginalSource>
<aliases>
<alias category="STRONG" type="Alias">USCJSC</alias>
<alias category="WEAK" type="Alias">''OSKOAO''</alias>
</aliases>
<programs>
<program type="21">prog</program>
</programs>
<sdfs>
<sdf name="OriginalID">9876</sdf>
</sdfs>
<addresses>
<address>
<address1>1141, SYA-KAYA STR.</address1>
<country>RU</country>
<postalCode>1234</postalCode>
</address>
<address>
<address1>90, MARATA UL.</address1>
<country>RU</country>
<postalCode>1919</postalCode>
</address>
</addresses>
<otherIds>
<childId>737606</childId>
<childId>737607</childId>
</otherIds>
</entity>
</entities>
</gwl>';
--The query will fetch some values from several places.
--It should be easy to get the rest yourself...
SELECT @xml.value('(/gwl/version/text())[1]','bigint') AS [version]
,A.ent.value('(name/text())[1]','nvarchar(max)') AS [Entity_Name]
,A.ent.value('(listId/text())[1]','int') AS Entity_ListId
--more columns taken from A.ent
,B.als.value('@category','nvarchar(max)') AS Alias_Category
,B.als.value('text()[1]','nvarchar(max)') AS Alias_Content
--similar for programs and sdfs
,E.addr.value('(address1/text())[1]','nvarchar(max)') AS Address_Address1
,E.addr.value('(country/text())[1]','nvarchar(max)') AS Address_Country
--and so on
FROM @xml.nodes('/gwl/entities/entity') A(ent)
OUTER APPLY A.ent.nodes('aliases/alias') B(als)
OUTER APPLY A.ent.nodes('programs/program') C(prg)
OUTER APPLY A.ent.nodes('sdfs/sdf') D(sdfs)
OUTER APPLY A.ent.nodes('addresses/address') E(addr)
OUTER APPLY A.ent.nodes('otherIds/childId') F(ids);
The idea in short:
We read non-repeating values (e.g. version) directly from the XML variable.
We use .nodes() to return repeating elements as derived sets.
We can use a cascade of .nodes() to dive deeper into repeating child elements by using a relative XPath (no / at the beginning).
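One consequence of chaining several OUTER APPLY lines is that each one multiplies the rows: the query returns one row per combination of child elements for every entity. A small Python emulation (ElementTree plus itertools.product standing in for the T-SQL; the function name is hypothetical) makes the row count visible. With the sample's 2 aliases, 1 program, 1 sdf, 2 addresses and 2 childIds, one entity becomes 2 × 1 × 1 × 2 × 2 = 8 rows.

```python
from itertools import product
import xml.etree.ElementTree as ET

# Relative paths used by the OUTER APPLY lines in the query above.
CHILD_PATHS = ["aliases/alias", "programs/program", "sdfs/sdf",
               "addresses/address", "otherIds/childId"]


def outer_apply_rows(entity):
    """Emulate chained OUTER APPLY: one row per combination of children.
    An empty child set contributes a single None, like OUTER APPLY's
    NULL row, so the entity itself is never dropped."""
    sets = [entity.findall(path) or [None] for path in CHILD_PATHS]
    return list(product(*sets))
```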
You have two approaches:
Read the XML as above into a staging table (simply add INTO #tmpTable before FROM) and proceed from there (this needs one SELECT ... GROUP BY for each type of child).
Create one SELECT per type of child, using only one of the APPLY lines, and move the data into specific child tables.
I would tend towards the first one.
It allows you to do some cleaning, generate IDs, and check business rules before you move the data into the target tables.
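The second approach (one pass per child type, each writing to its own child table keyed by the entity id) can be sketched in Python with ElementTree. The table names and column choices here are hypothetical; in SQL Server each entry of CHILD_TABLES would correspond to one INSERT ... SELECT that uses a single APPLY line.

```python
import xml.etree.ElementTree as ET

# Hypothetical target layout: one child table per repeating element type,
# each row carrying the parent entity's id as a foreign key.
CHILD_TABLES = {
    "entity_alias": ("aliases/alias", lambda e: (e.get("category"), e.text)),
    "entity_address": ("addresses/address",
                       lambda e: (e.findtext("address1"), e.findtext("country"),
                                  e.findtext("postalCode"))),
    "entity_child_id": ("otherIds/childId", lambda e: (int(e.text),)),
}


def shred(xml_text):
    """Return parent rows plus a dict of child rows per target table."""
    root = ET.fromstring(xml_text)
    parents, children = [], {name: [] for name in CHILD_TABLES}
    for ent in root.findall("entities/entity"):
        ent_id = int(ent.get("id"))
        parents.append((ent_id, ent.findtext("name"), ent.findtext("listCode")))
        # One pass per child type, like one SELECT with a single APPLY each,
        # so child sets are never multiplied against each other.
        for table, (path, to_row) in CHILD_TABLES.items():
            children[table].extend((ent_id, *to_row(c)) for c in ent.findall(path))
    return parents, children
```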

Convert XML from one format to another

I have the XML data below, which is stored in a table.
The XML Structure I have
<Response>
<Question ID="1">
<Value ID="1">I want a completely natural childbirth - no medical interventions for me</Value>
<Value ID="2">no medical interventions for me</Value>
</Question>
</Response>
I need to convert this XML to a slightly different format, like the below one.
The XML Structure I need
<Response>
<Question ID="1">
<SelectedChoices>
<Choice>
<ID>1</ID>
</Choice>
<Choice>
<ID>2</ID>
</Choice>
</SelectedChoices>
</Question>
</Response>
Here the "Value" is changed to "Choice" and "ID" attribute of "Value" element is changed to an element.
I know this can be done in other ways, such as with XSLT, but it would be much more helpful if I could accomplish it with SQL itself.
Can someone help me to convert this using SQL?
Use this variable to test the statements:
DECLARE @xml XML=
N'<Response>
<Question ID="1">
<Value ID="1">I want a completely natural childbirth - no medical interventions for me</Value>
<Value ID="2">no medical interventions for me</Value>
</Question>
</Response>';
This can be done with a FLWOR XQuery expression.
The query rebuilds the XML out of itself, very much like XSLT:
SELECT @xml.query(
N'
<Response>
{
for $q in /Response/Question
return
<Question ID="{$q/@ID}">
<SelectedChoices>
{
for $v in $q/Value
return <Choice><ID>{string($v/@ID)}</ID></Choice>
}
</SelectedChoices>
</Question>
}
</Response>
'
);
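For comparison, the same nested-loop rebuild can be sketched outside SQL Server, for example in Python with ElementTree. The function name is hypothetical, and the comments map each loop to the corresponding `for` clause of the FLWOR query above.

```python
import xml.etree.ElementTree as ET


def values_to_choices(xml_text):
    """Rebuild <Response>: each <Value ID="n"> becomes
    <Choice><ID>n</ID></Choice> under <SelectedChoices>."""
    src = ET.fromstring(xml_text)
    out = ET.Element("Response")
    for q in src.findall("Question"):           # for $q in /Response/Question
        new_q = ET.SubElement(out, "Question", ID=q.get("ID"))
        selected = ET.SubElement(new_q, "SelectedChoices")
        for v in q.findall("Value"):            # for $v in $q/Value
            choice = ET.SubElement(selected, "Choice")
            ET.SubElement(choice, "ID").text = v.get("ID")
    return ET.tostring(out, encoding="unicode")
```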
Another approach: shredding and rebuilding
You'd get the same result with this, but I'd prefer the first...
WITH Shredded AS
(
SELECT q.value('@ID','int') AS qID
,v.value('@ID','int') AS vID
FROM @xml.nodes('/Response/Question') AS A(q)
OUTER APPLY q.nodes('Value') AS B(v)
)
SELECT t1.qID AS [@ID]
,(
SELECT t2.vID AS ID
FROM Shredded AS t2
WHERE t1.qID=t2.qID
FOR XML PATH('Choice'),ROOT('SelectedChoices'),TYPE
) AS [node()]
FROM Shredded AS t1
GROUP BY t1.qID
FOR XML PATH('Question'),ROOT('Response')
