Parsing XML with T-SQL in two separate tables - sql-server

I'm using SQL Server 2012 and trying to parse an XML file into 2 separate tables in my database. Normally 1 table would be enough, but not in this instance. My XML is structured as follows (I can't change its structure; I receive it like that):
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<podjetje id="" storitev="" uporabnik="" ts="" opis_storitve="">
<izdelki>
<izdelek st="1">
<izdelekID>ID</izdelekID>
<ean>EAN CODE</ean>
<izdelekIme>PRODUCT NAME</izdelekIme>
<url>WEBSITE</url>
<kratkiopis>SHORT DESCRIPTION</kratkiopis>
<opis>DESCRIPTION</opis>
<dodatneLastnosti>ATTRIBUTES</dodatneLastnosti>
<slikaVelika>BIG PICTURE URL</slikaVelika>
<dodatneSlike>
<dodatnaSlika1>EXTRA IMAGE URL</dodatnaSlika1>
<dodatnaSlika2>EXTRA IMAGE URL2</dodatnaSlika2>
<dodatnaSlika3>EXTRA IMAGE URL3</dodatnaSlika3>
</dodatneSlike>
</izdelek>
</izdelki>
</podjetje>
To insert this XML into a table I use a SQL bulk insert:
SET @SQLString = 'INSERT INTO tmpImport(XmlCol)
SELECT *
FROM OPENROWSET(BULK ''' + @ImportFileName + ''', SINGLE_BLOB, ERRORFILE = ''' + @BulkLoadFilePath + ''') AS x '
EXECUTE (@SQLString)
I can handle most of the data without any problems, but I ran into trouble with the node "dodatneSlike". The idea is that each article has some pictures. The main picture is in the node "slikaVelika" and I can insert it into my table. There are extra pictures in the child nodes of "dodatneSlike". This is causing me problems, because I have to insert these extra pictures into a separate table (inserting the picture from node "slikaVelika" there as well would also help, but I think I can get around it if that's not possible). The table is nothing special, just the article ID from node "izdelekID" and the pictures from "dodatneSlike".
The problem is that I never know how many nodes ("dodatnaSlika1", "dodatnaSlika2", ...) there will be. There might be 1, 10, 0...
So my question is: how do I get the values from the "dodatnaSlika" nodes?

Try to use the native XQuery support in SQL Server! Much easier than the clunky old OPENROWSET stuff....
You can try something like this:
DECLARE @input XML = '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<podjetje id="" storitev="" uporabnik="" ts="" opis_storitve="">
<izdelki>
<izdelek st="1">
<izdelekID>ID</izdelekID>
<ean>EAN CODE</ean>
<izdelekIme>PRODUCT NAME</izdelekIme>
<url>WEBSITE</url>
<kratkiopis>SHORT DESCRIPTION</kratkiopis>
<opis>DESCRIPTION</opis>
<dodatneLastnosti>ATTRIBUTES</dodatneLastnosti>
<slikaVelika>BIG PICTURE URL</slikaVelika>
<dodatneSlike>
<dodatnaSlika1>EXTRA IMAGE URL</dodatnaSlika1>
<dodatnaSlika2>EXTRA IMAGE URL2</dodatnaSlika2>
<dodatnaSlika3>EXTRA IMAGE URL3</dodatnaSlika3>
</dodatneSlike>
</izdelek>
</izdelki>
</podjetje>'
SELECT
izdelek_st = @input.value('(/podjetje/izdelki/izdelek/@st)[1]', 'int'),
izdelekID = @input.value('(/podjetje/izdelki/izdelek/izdelekID)[1]', 'varchar(50)'),
ean = @input.value('(/podjetje/izdelki/izdelek/ean)[1]', 'varchar(50)'),
XC.value('local-name(.)', 'varchar(50)'),
XC.value('(.)[1]', 'varchar(50)')
FROM
@input.nodes('/podjetje/izdelki/izdelek/dodatneSlike/*') AS XT(XC)
This will give you all the subnodes under <dodatneSlike> - no matter how many there are - and it gives you both the node name, as well as the node value.
Update: assuming you have multiple <izdelek> nodes, you could use this query instead. Every per-product value is read relative to the current <izdelek> node (XC1), so each row carries the values of its own product:
SELECT
izdelek_st = XC1.value('@st', 'int'),
izdelekID = XC1.value('(izdelekID)[1]', 'varchar(50)'),
ean = XC1.value('(ean)[1]', 'varchar(50)'),
XC2.value('local-name(.)', 'varchar(50)'),
XC2.value('(.)[1]', 'varchar(50)')
FROM
@input.nodes('/podjetje/izdelki/izdelek') AS XT1(XC1)
CROSS APPLY
XC1.nodes('dodatneSlike/*') AS XT2(XC2)
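Since the question asks for two target tables, here is a minimal sketch of how the same nodes() / CROSS APPLY pattern could feed both. The table variables @Izdelek and @IzdelekSlika, their column names and sizes, and the shortened sample XML are assumptions made for illustration; they are not taken from the question.

```sql
-- Hypothetical target tables (names and column sizes are assumptions).
DECLARE @Izdelek TABLE (izdelekID VARCHAR(50), ean VARCHAR(50), slikaVelika VARCHAR(500));
DECLARE @IzdelekSlika TABLE (izdelekID VARCHAR(50), slika VARCHAR(500));

-- Shortened sample document with two products, one without extra images.
DECLARE @input XML = N'<podjetje><izdelki>
  <izdelek st="1">
    <izdelekID>A1</izdelekID><ean>111</ean>
    <slikaVelika>big1.jpg</slikaVelika>
    <dodatneSlike>
      <dodatnaSlika1>a.jpg</dodatnaSlika1>
      <dodatnaSlika2>b.jpg</dodatnaSlika2>
    </dodatneSlike>
  </izdelek>
  <izdelek st="2">
    <izdelekID>B2</izdelekID><ean>222</ean>
    <slikaVelika>big2.jpg</slikaVelika>
    <dodatneSlike />
  </izdelek>
</izdelki></podjetje>';

-- Table 1: one row per <izdelek>, including products with no extra images.
INSERT INTO @Izdelek (izdelekID, ean, slikaVelika)
SELECT XC1.value('(izdelekID/text())[1]', 'varchar(50)'),
       XC1.value('(ean/text())[1]', 'varchar(50)'),
       XC1.value('(slikaVelika/text())[1]', 'varchar(500)')
FROM @input.nodes('/podjetje/izdelki/izdelek') AS XT1(XC1);

-- Table 2: one row per child of <dodatneSlike>, however many there are.
-- CROSS APPLY produces no rows for a product without extra images.
INSERT INTO @IzdelekSlika (izdelekID, slika)
SELECT XC1.value('(izdelekID/text())[1]', 'varchar(50)'),
       XC2.value('(text())[1]', 'varchar(500)')
FROM @input.nodes('/podjetje/izdelki/izdelek') AS XT1(XC1)
CROSS APPLY XC1.nodes('dodatneSlike/*') AS XT2(XC2);

SELECT * FROM @Izdelek;
SELECT * FROM @IzdelekSlika;
```

If the main picture should also land in the image table, one option would be a further INSERT that selects izdelekID and slikaVelika from the first nodes() call only.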

Related

Slow XML import with SQL server

I have an XML file with a size of 1 GB.
I use the following code to load the data into SQL Server.
DECLARE @xmlvar XML
SELECT @xmlvar = BulkColumn
FROM OPENROWSET(BULK 'C:\Data\demo.xml', SINGLE_BLOB) x;
WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
'ux:no:ehe:v5:move' AS ns4,
'ux:no:ehe:v5:cat:fill' as ns3,
'ux:no:ehe:v5:centre' as ns2)
SELECT
zs.value(N'(../@versionCode)', 'VARCHAR(100)') as versionCode,
zs.value(N'(@Start)', 'VARCHAR(50)') as Start_date,
zs.value(N'(@End)', 'VARCHAR(50)') as End_date
into testtbl
FROM @xmlvar.nodes('/ns4:Dataview1/ns4:Content/ns4:gen') A(zs);
The query has now been running for more than 2 hours and has not finished.
I have tested the query with a smaller version of the XML file and that works.
Any tips on improving the loading speed?
Thank you.
Update XML file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns4:Dataview1 xmlns="ux:no::ehe:v5:actual:aver" xmlns:ns4="ux:no:ehe:v5:move">
<ns4:Content versionCode="16000">
<ns4:gen start="1961-07-01" end="1961-07-01">
</ns4:gen>
<ns4:gen start="2017-09-19">
</ns4:gen>
<ns4:gen start="1961-07-02" end="2016-09-30">
</ns4:gen>
<ns4:gen start="2016-10-01" end="2017-09-18">
</ns4:gen>
</ns4:Content>
</ns4:Dataview1>
(1) As @Stu already pointed out, loading the XML file into a single-row table first will speed up the loading process significantly.
(2) It is not a good idea to traverse up the XML tree in XPath expressions, like here:
c.value('../@versionCode', 'VARCHAR(100)') as versionCode
But the XML structure was not shared in the question. So, it is impossible to suggest anything concrete.
The 2nd CROSS APPLY simulates the 1-to-many relationship in the XML hierarchy.
Check it out below.
SQL
CREATE TABLE tbl (
ID INT IDENTITY(1, 1) PRIMARY KEY,
XmlColumn XML
);
INSERT INTO tbl(XmlColumn)
SELECT * FROM OPENROWSET(BULK N'C:\Data\demo.xml', SINGLE_BLOB) AS x;
WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
'ux:no:ehe:v5:move' AS ns4,
'ux:no:ehe:v5:cat:fill' as ns3,
'ux:no:ehe:v5:centre' as ns2)
SELECT c.value('@versionCode', 'VARCHAR(100)') as versionCode,
x.value('@start', 'DATE') as Start_date,
x.value('@end', 'DATE') as End_date
INTO dbo.testtbl
FROM tbl
CROSS APPLY XmlColumn.nodes('/ns4:Dataview1/ns4:Content') AS t1(c)
CROSS APPLY t1.c.nodes('ns4:gen') AS t2(x);
In my opinion it's better to use an SSIS Package for importing XML files.
It has a component named "XML Source" for loading XML file.
There is a useful article at : https://www.sqlshack.com/import-xml-documents-into-sql-server-tables-using-ssis-packages/

Trying to query XML Data - node has a space in it

I am trying to learn how to work with xml files and data in SQL Server and I'm trying to query an xml file but nothing is returned.
Here is the xml data:
<?xml version="1.0" encoding="UTF-8"?>
<Report xmlns="AdmissionsByPCP" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="AdmissionsByPCP" xsi:schemaLocation="AdmissionsByPCP http://10.xxx.x.xx/ReportServer_NameofReportServer?%2FHl%20C%20Syst%20Reports%2health%2FAdmissBy&rs%3ACommand=Render&rs%3AFormat=XML&rs%3ASessionID=h0iz5ijxgt2vdl45g3pjfs45&rc%3ASchema=True">
<Tablix2>
<Details_Collection>
<Details PCPCarrier="DoctorsName">
<Subreport1>
<Report Name="PCPAdmitSubReport">
<Tablix5 Textbox5="79">
<Details_Collection>
<Details Textbox37="Discharge Dx Code: ICDCode" Textbox89="Admit Dx Code: ICDCode" LOS="4" DischargeDate="07/10/2017" AdmitDate="07/06/2017" Hospital="Hospital Name" MemberName="Name" DOB="1/1/2019" AdmissionType="Inpatient" MemberNo="12345" Auth="321*I" Status="Close" AdmissionID="00001" LobName="Medicare" CarrierName="CarrierName"/>
</Details></Details_Collection></Tablix5></Report></Subreport1></Details></Details_Collection></Tablix2></Report>
Here is the query I'm using:
Declare @XMLData as XML
Set @XMLData=(
Select bulkcolumn
FROM OPENROWSET (Bulk '\Directory\AdmissionsByPCP.xml',
Single_Blob) a)
Select
@XMLData.value('(/Root/Report/Tablix2/Detail_Collections/DetailsPCPCarrier) [1]', 'varchar(max)') PCP
The query returns null and I don't know why. Is it because there is a space in the node (<Details PCPCarrier>) and if so how do I work around that?
You have misunderstood how XML works. This is the node you are looking for:
<Details PCPCarrier="DoctorsName">
This is not a node called Details PCPCarrier; it is a node called Details with an attribute called PCPCarrier.
So the XPath to select it would be:
/Root/Report/Tablix2/Detail_Collections/Details
Or, if you want to specifically filter by the PCPCarrier attribute existing:
/Root/Report/Tablix2/Detail_Collections/Details[@PCPCarrier]
Or, to get the value of the attribute itself:
/Root/Report/Tablix2/Detail_Collections/Details/@PCPCarrier
IMSoP pointed me in the right direction and I figured out the rest myself.
I also needed to add this:
With XMLNAMESPACES (Default 'AdmissionsByPCP')
So the query looks like this:
Declare @XMLData as XML
Set @XMLData=(
Select *
FROM OPENROWSET (Bulk '\\Directory\AdmissionsByPCP.xml',
Single_Clob) a );
With XMLNAMESPACES (Default 'AdmissionsByPCP')
Select
@XMLData.value('(/Report/Tablix2/Details_Collection/Details/@PCPCarrier)[1]', 'varchar(max)')
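To see why the namespace declaration matters, the following is a small self-contained sketch that runs the same path with and without WITH XMLNAMESPACES. The inline XML is a trimmed, hypothetical version of the report above, and the column aliases are made up for illustration.

```sql
-- Trimmed sample: the root element declares a default namespace.
DECLARE @x XML = N'<Report xmlns="AdmissionsByPCP" Name="AdmissionsByPCP">
  <Tablix2><Details_Collection>
    <Details PCPCarrier="DoctorsName" />
  </Details_Collection></Tablix2>
</Report>';

-- No namespace declared: unprefixed names refer to "no namespace",
-- so the path matches nothing and value() returns NULL.
SELECT @x.value('(/Report/Tablix2/Details_Collection/Details/@PCPCarrier)[1]',
                'varchar(100)') AS without_ns;

-- Default namespace declared: the same path now matches the elements.
WITH XMLNAMESPACES (DEFAULT 'AdmissionsByPCP')
SELECT @x.value('(/Report/Tablix2/Details_Collection/Details/@PCPCarrier)[1]',
                'varchar(100)') AS with_ns;
```

Note that the attribute itself needs no prefix: a default namespace applies to element names only, not to unprefixed attributes.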

SQL Server 2008: Null Return in Dynamic XML Query

I have a set of dynamic queries which return XML as varchars, see below.
Example query:
set @sqlstr = 'Select ''<?xml version="1.0" encoding="windows-1252" ?>'' + ''<File_Name><Location>'' + (Select a,b,c from table for xml path(''Row'')) + ''</Location></File_Name>'''
exec(@sqlstr)
This works a treat until the select a,b,c ... query is NULL. Then I don't receive the outside elements as you'd expect, like:
<?xml version="1.0" encoding="windows-1252" ?><File_Name><Location><Row></Row></Location></File_Name>
All I receive is NULL.
After a bit of Googling I found that the issue is that concatenating a NULL with anything yields NULL. However, I cannot find a solution that gives me the result I expect.
I've tried (not to say I have tried correctly)
IsNull(Exec(@sqlstring),'*blank elements*')
XSINIL (doesn't seem to work in dynamic queries)
@result = exec(@sqlstring) then isnull and select
Anyone have a better solution? (preferably small due to multiple such queries)
I think you need something like this (note that quotes inside the dynamic SQL string have to be doubled):
set @sqlstr = 'Select ''<?xml version="1.0" encoding="windows-1252" ?><File_Name><Location>'' + (Select IsNull(a, '''') as a, IsNull(b, '''') as b, IsNull(c, '''') as c from table for xml path(''Row'')) + ''</Location></File_Name>'''
exec(@sqlstr)
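Wrapping the individual columns only helps when a row exists but holds NULL values. If the inner SELECT returns no rows at all, the FOR XML subquery itself evaluates to NULL and the whole concatenation becomes NULL. A possible workaround, sketched here outside of dynamic SQL for readability, is to wrap the entire subquery in ISNULL. The table variable @t and the fallback text '<Row></Row>' are assumptions for illustration.

```sql
-- Hypothetical source table, left empty on purpose.
DECLARE @t TABLE (a INT, b INT, c INT);

-- ISNULL around the whole FOR XML subquery supplies a fallback
-- when no rows come back, so the outer elements survive.
SELECT '<File_Name><Location>'
     + ISNULL((SELECT a, b, c FROM @t FOR XML PATH('Row')), N'<Row></Row>')
     + '</Location></File_Name>' AS xml_text;
```

Inside a dynamic SQL string the same expression would need every single quote doubled.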

SQL Server FOR XML PATH: Set xml-declaration or processing instruction "xml-stylesheet" on top

I want to set a processing instruction to include a stylesheet on top of an XML:
The same issue was with the xml-declaration (e.g. <?xml version="1.0" encoding="utf-8"?>)
Desired result:
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
<TestPath>
<Test>Test</Test>
<SomeMore>SomeMore</SomeMore>
</TestPath>
My research brought me to node test syntax and processing-instruction().
This
SELECT 'type="text/xsl" href="stylesheet.xsl"' AS [processing-instruction(xml-stylesheet)]
,'Test' AS Test
,'SomeMore' AS SomeMore
FOR XML PATH('TestPath')
produces this:
<TestPath>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
<Test>Test</Test>
<SomeMore>SomeMore</SomeMore>
</TestPath>
All hints I found tell me to convert the XML to VARCHAR, concatenate it "manually" and convert it back to XML. But this is - how to say - ugly?
This works obviously:
SELECT CAST(
'<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
<TestPath>
<Test>Test</Test>
<SomeMore>SomeMore</SomeMore>
</TestPath>' AS XML);
Is there a chance to solve this?
There is another way, which needs two steps but doesn't require you to treat the XML as a string anywhere in the process:
declare @result XML =
(
SELECT
'Test' AS Test,
'SomeMore' AS SomeMore
FOR XML PATH('TestPath')
)
set @result.modify('
insert <?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
before /*[1]
')
The XQuery expression passed to modify() function tells SQL Server to insert the processing instruction node before the root element of the XML.
UPDATE :
Found another alternative based on the following thread: "Merge the two xml fragments into one?". I personally prefer this way:
SELECT CONVERT(XML, '<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>'),
(
SELECT
'Test' AS Test,
'SomeMore' AS SomeMore
FOR XML PATH('TestPath')
)
FOR XML PATH('')
As it turned out, har07's great answer does not work with an XML declaration. The only way I could find was this:
DECLARE @ExistingXML XML=
(
SELECT
'Test' AS Test,
'SomeMore' AS SomeMore
FOR XML PATH('TestPath'),TYPE
);
DECLARE @XmlWithDeclaration NVARCHAR(MAX)=
(
SELECT N'<?xml version="1.0" encoding="UTF-8"?>'
+
CAST(@ExistingXML AS NVARCHAR(MAX))
);
SELECT @XmlWithDeclaration;
You must keep the result as a string after this step. Any conversion to real XML will either give an error (when the encoding is other than UTF-16) or will omit the XML declaration.

Bulk Import of XML Into Existing Tables

I am new to XML and SQL Server and am trying to import an XML file into SQL Server 2010. I have 14 tables that I would like to parse the data into. All 14 table names are listed in the XML as nodes (I think). I found some example code that worked with a simple example XML, but my XML seems a little more complicated and may not be structured optimally; unfortunately, I can't change that. As a basic attempt, I tried to insert the data into just one field of one existing table (SILVX_SN16000), but the Message pane shows "(0 row(s) affected)". Thanks in advance for looking at this.
USE TEST
Declare @xml XML
Select @xml =
CONVERT(XML,bulkcolumn,2) FROM OPENROWSET(BULK 'C:\Users\Kevin_S\Documents \SilvxInSightImport.xml',SINGLE_BLOB) AS X
SET ARITHABORT ON
Insert into [SILVX_SN16000]
(
md_group
)
Select
P.value('MD_GROUP[1]','NVARCHAR(255)') AS md_group
From @xml.nodes('/TableData/Row') PropertyFeed(P)
Here is a much-shortened (rows removed) version of my XML:
<?xml version="1.0" ?>
<SilvxInSightImport Version="1.0" Host="uslsss17" Date="14-09-14_20-40-02">
<Tables Count="14">
<Table Name="SN16000">
<TableSchema>
<Column><COLUMN_NAME>PARENT_HPKEY</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>MD_GROUP</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>PKEY</COLUMN_NAME><DATA_TYPE>NUMBER</DATA_TYPE></Column>
<Column><COLUMN_NAME>S_STATE</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>NAME</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>ROUTER_ID</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>IP_ADDR</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
</TableSchema>
<TableData>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>161888</PKEY><NAME>UODEDTM010</NAME><ROUTER_ID>10.41.32.129</ROUTER_ID> <IP_ADDR>10.41.32.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>278599</PKEY><NAME>UODEETM010</NAME><ROUTER_ID>10.41.4.129</ROUTER_ID> <IP_ADDR>10.41.4.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>183583</PKEY><NAME>UODEGRM010</NAME><ROUTER_ID>10.41.76.129</ROUTER_ID> <IP_ADDR>10.41.76.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
NT_HPKEY>100</PARENT_HPKEY><PKEY>811003</PKEY><NAME>UODWTIN010</NAME> <ROUTER_ID>10.27.36.130</ROUTER_ID><IP_ADDR>10.27.36.130</IP_ADDR><S_STATE>IS-NR</S_STATE> </Row>
</TableData>
</Table>
</Tables>
</SilvxInSightImport>
The XPath in .nodes() must specify the whole path to the Row nodes, so you should start with SilvxInSightImport and work your way down to Row.
/SilvxInSightImport/Tables/Table/TableData/Row
In your case you have multiple Table nodes, one for each table, and I assume you only need one table at a time. You can use a predicate on the table name in the .nodes() XPath expression.
/SilvxInSightImport/Tables/Table[@Name = "SN16000"]/TableData/Row
Your whole query for SN16000 should look something like this.
select T.X.value('(MD_GROUP/text())[1]', 'varchar(20)') as MD_GROUP,
T.X.value('(PARENT_HPKEY/text())[1]', 'int') as PARENT_HPKEY,
T.X.value('(PKEY/text())[1]', 'int') as PKEY,
T.X.value('(NAME/text())[1]', 'varchar(20)') as NAME,
T.X.value('(ROUTER_ID/text())[1]', 'varchar(20)') as ROUTER_ID,
T.X.value('(IP_ADDR/text())[1]', 'varchar(20)') as IP_ADDR,
T.X.value('(S_STATE/text())[1]', 'varchar(20)') as S_STATE
from @xml.nodes('/SilvxInSightImport/Tables/Table[@Name = "SN16000"]/TableData/Row') as T(X)
You have to sort out the data types used for each column.
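With 14 tables to load, hard-coding the table name into each path may get repetitive. One possible variation, sketched below, passes the name in through the XQuery function sql:variable(). The variable @TableName, the second table named "OTHER", and the shortened sample XML are assumptions for illustration only.

```sql
-- Shortened sample with two <Table> nodes; "OTHER" is a made-up name.
DECLARE @xml XML = N'<SilvxInSightImport><Tables>
  <Table Name="SN16000"><TableData>
    <Row><MD_GROUP>100.120.25162</MD_GROUP><PKEY>161888</PKEY></Row>
  </TableData></Table>
  <Table Name="OTHER"><TableData>
    <Row><MD_GROUP>ignored</MD_GROUP><PKEY>1</PKEY></Row>
  </TableData></Table>
</Tables></SilvxInSightImport>';

-- Hypothetical parameter holding the table to extract.
DECLARE @TableName VARCHAR(50) = 'SN16000';

-- sql:variable() exposes the T-SQL variable inside the XQuery predicate,
-- so only rows under the matching <Table> node are returned.
SELECT T.X.value('(MD_GROUP/text())[1]', 'varchar(20)') AS MD_GROUP,
       T.X.value('(PKEY/text())[1]', 'int') AS PKEY
FROM @xml.nodes('/SilvxInSightImport/Tables/Table[@Name = sql:variable("@TableName")]/TableData/Row') AS T(X);
```

The column list and data types still have to be written per table, since value() requires literal paths and types.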

Resources