Parse, filter nested XML in TSQL


I have a table (table1) which has a column containing XML data.
I need to parse that XML and create rows of data from the child elements of the element -
The output needs to be something like
TestID Sequence ParentSequence ExtID ExtName
-1 1 -1 1 ABC
-1 2 -1 1 DEF
-1 2 -1 1 GHI
But I am getting an empty result set with every other method I tried.
I have focused on accessing Sequence, as the rest will follow the same process.
I am not sure why this does not work. Any help is appreciated. Thank you. The SQL I have tried follows the XML (the commented text shows the options I have tried).
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xmlfield NVARCHAR(MAX));
INSERT INTO @tbl (xmlfield) VALUES
(N'<OBJECT CLASS="Test1" ID="-1" FULL="FULL" VERSION="1">
<SUBTYPE NAME="SubType1">
<OBJECT NAME="SubType111" ID="-1">
<FIELD NAME="TestID">-1</FIELD>
<FIELD NAME="Sequence">1</FIELD>
<FIELD NAME="ParentSequence">-1</FIELD>
<FIELD NAME="ExtID">-1</FIELD>
<FIELD NAME="ExtName">ABC</FIELD>
</OBJECT>
<OBJECT NAME="SubType111" ID="-1">
<FIELD NAME="TestID">-1</FIELD>
<FIELD NAME="Sequence">2</FIELD>
<FIELD NAME="ParentSequence">1</FIELD>
<FIELD NAME="ExtID">-1</FIELD>
<FIELD NAME="ExtName">DEF</FIELD>
<FIELD NAME="__ExtendedData"><OBJECT
CLASS="Meet123" ID="-1" FULL="FULL"
VERSION="1"><FIELD
NAME="OrderDetailID">-1</FIELD><FIELD
NAME="OrderID">-1</FIELD><FIELD
NAME="Sequence">0</FIELD><FIELD
NAME="AttendeeID">123</FIELD><FIELD NAME="
AttendeeID _Name">Test, Mark/I H 6</FIELD><FIELD
NAME="ShowList">1</FIELD><FIELD
NAME="BdgeName">Mark</FIELD><FIELD
NAME="BadgeCompanyName">I H 6</FIELD>
</OBJECT></FIELD>
</OBJECT>
</OBJECT>
<OBJECT NAME="SubType111" ID="-1">
<FIELD NAME="TestID">-1</FIELD>
<FIELD NAME="Sequence">3</FIELD>
<FIELD NAME="ParentSequence">1</FIELD>
<FIELD NAME="ExtID">-1</FIELD>
<FIELD NAME="ExtName">GHI</FIELD>
</OBJECT>
</SUBTYPE>
<SUBTYPE NAME="SubType2"/>
<SUBTYPE NAME="SubType3"/>
</OBJECT>');
-- DDL and sample data population, end
;WITH rs AS
(
SELECT ID, TRY_CAST(xmlfield AS XML) AS cartxml
FROM @tbl
)
SELECT ID
, c.value('(FIELD[@NAME="TestID"]/text())[1]', 'INT') AS TestID
, c.value('(FIELD[@NAME="Sequence"]/text())[1]', 'INT') AS [Sequence]
, c.value('(FIELD[@NAME="ParentSequence"]/text())[1]', 'INT') AS ParentSequence
, c.value('(FIELD[@NAME="ExtID"]/text())[1]', 'INT') AS ExtID
, c.value('(FIELD[@NAME="ExtName"]/text())[1]', 'VARCHAR(20)') AS ExtName
, c1.value('(FIELD[@NAME="AttendeeID"]/text())[1]', 'VARCHAR(20)') AS AttendeeId
, c1.value('(FIELD[@NAME="AttendeeID_Name"]/text())[1]', 'VARCHAR(20)') AS AttendeeName
FROM rs AS T
CROSS APPLY cartxml.nodes('/OBJECT/SUBTYPE/OBJECT[@ID="-1"]') AS t2(c)
OUTER APPLY cartxml.nodes('/OBJECT/SUBTYPE/OBJECT[@ID="-1"]/FIELD[@NAME="__ExtendedData"]') AS t3(c1)

Please try the following solution.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xmlfield NVARCHAR(MAX));
INSERT INTO @tbl (xmlfield) VALUES
(N'<OBJECT CLASS="Test1" ID="-1" FULL="FULL" VERSION="1">
<SUBTYPE NAME="SubType1">
<OBJECT NAME="SubType111" ID="-1">
<FIELD NAME="TestID">-1</FIELD>
<FIELD NAME="Sequence">1</FIELD>
<FIELD NAME="ParentSequence">-1</FIELD>
<FIELD NAME="ExtID">-1</FIELD>
<FIELD NAME="ExtName">ABC</FIELD>
</OBJECT>
<OBJECT NAME="SubType111" ID="-1">
<FIELD NAME="TestID">-1</FIELD>
<FIELD NAME="Sequence">2</FIELD>
<FIELD NAME="ParentSequence">1</FIELD>
<FIELD NAME="ExtID">-1</FIELD>
<FIELD NAME="ExtName">DEF</FIELD>
<FIELD NAME="__ExtendedData">&lt;OBJECT
CLASS="Meet123" ID="-1" FULL="FULL"
VERSION="1"&gt;&lt;FIELD
NAME="OrderDetailID"&gt;-1&lt;/FIELD&gt;&lt;FIELD
NAME="OrderID"&gt;-1&lt;/FIELD&gt;&lt;FIELD
NAME="Sequence"&gt;0&lt;/FIELD&gt;&lt;FIELD
NAME="AttendeeID"&gt;123&lt;/FIELD&gt;&lt;FIELD NAME="AttendeeID_Name"&gt;Test, Mark/I H 6&lt;/FIELD&gt;&lt;FIELD
NAME="ShowList"&gt;1&lt;/FIELD&gt;&lt;FIELD
NAME="BdgeName"&gt;Mark&lt;/FIELD&gt;&lt;FIELD
NAME="BadgeCompanyName"&gt;I H 6&lt;/FIELD&gt;
&lt;/OBJECT&gt;</FIELD>
</OBJECT>
<OBJECT NAME="SubType111" ID="-1">
<FIELD NAME="TestID">-1</FIELD>
<FIELD NAME="Sequence">3</FIELD>
<FIELD NAME="ParentSequence">1</FIELD>
<FIELD NAME="ExtID">-1</FIELD>
<FIELD NAME="ExtName">GHI</FIELD>
</OBJECT>
</SUBTYPE>
<SUBTYPE NAME="SubType2"/>
<SUBTYPE NAME="SubType3"/>
</OBJECT>');
-- DDL and sample data population, end
;WITH rs AS
(
SELECT ID, TRY_CAST(xmlfield AS XML) AS cartxml
FROM @tbl
)
SELECT ID
, c.value('(FIELD[@NAME="TestID"]/text())[1]', 'INT') AS TestID
, c.value('(FIELD[@NAME="Sequence"]/text())[1]', 'INT') AS [Sequence]
, c.value('(FIELD[@NAME="ParentSequence"]/text())[1]', 'INT') AS ParentSequence
, c.value('(FIELD[@NAME="ExtID"]/text())[1]', 'INT') AS ExtID
, c.value('(FIELD[@NAME="ExtName"]/text())[1]', 'VARCHAR(20)') AS ExtName
, w.value('(OBJECT/FIELD[@NAME="AttendeeID"]/text())[1]', 'VARCHAR(20)') AS AttendeeID
, w.value('(OBJECT/FIELD[@NAME="AttendeeID_Name"]/text())[1]', 'VARCHAR(20)') AS AttendeeID_Name
FROM rs AS t
CROSS APPLY cartxml.nodes('/OBJECT/SUBTYPE/OBJECT[@ID="-1"]') AS t1(c)
CROSS APPLY (VALUES(TRY_CAST(c.query('FIELD[@NAME="__ExtendedData"]').value('.','NVARCHAR(MAX)') AS XML))) AS t2(w)
WHERE w.exist('/OBJECT[@CLASS="Meet123"]') = 1;
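Output
+----+--------+----------+----------------+-------+---------+------------+------------------+
| ID | TestID | Sequence | ParentSequence | ExtID | ExtName | AttendeeID | AttendeeID_Name  |
+----+--------+----------+----------------+-------+---------+------------+------------------+
| 1  | -1     | 2        | 1              | -1    | DEF     | 123        | Test, Mark/I H 6 |
+----+--------+----------+----------------+-------+---------+------------+------------------+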
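The second CROSS APPLY in the query above reads the __ExtendedData field as a string and then casts that string to XML. That approach fits the case where the field stores its nested document as entity-escaped text rather than as real child elements, which is an assumption about the data here. A minimal sketch of the two steps on a cut-down, hypothetical document (the variable names @x, @payload and @inner are made up for illustration):

```sql
-- Hypothetical, cut-down document: the payload of __ExtendedData is escaped text.
DECLARE @x XML = N'<OBJECT ID="-1">
  <FIELD NAME="Sequence">2</FIELD>
  <FIELD NAME="__ExtendedData">&lt;OBJECT CLASS="Meet123"&gt;&lt;FIELD NAME="AttendeeID"&gt;123&lt;/FIELD&gt;&lt;/OBJECT&gt;</FIELD>
</OBJECT>';

-- Step 1: read the field as a string; the entities come back as plain < and >.
DECLARE @payload NVARCHAR(MAX) =
    @x.value('(/OBJECT/FIELD[@NAME="__ExtendedData"]/text())[1]', 'NVARCHAR(MAX)');

-- Step 2: parse that string as its own XML document (NULL if it is not well-formed).
DECLARE @inner XML = TRY_CAST(@payload AS XML);

SELECT @payload AS PayloadText,
       @inner.value('(/OBJECT/FIELD[@NAME="AttendeeID"]/text())[1]', 'INT') AS AttendeeID;
```

If the nested OBJECT were stored as real elements instead, step 1 would return only concatenated text and the cast would not be needed; the inner fields could be addressed directly with a longer path.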

Related

Reading XML in SQL Server of the specified XML node's "name" and get its "value"

How can I read the following XML in a SQL Server database:
<Row>
<Keys>
<Key>
<Name>NAME_2</Name>
<Data>22</Data>
</Key>
<Key>
<Name>NAME_3</Name>
<Data>33</Data>
</Key>
<Key>
<Name>NAME_1</Name>
<Data>98</Data>
</Key>
</Keys>
</Row>
I want to select from that XML and get only one row with columns:
NAME_1, NAME_2, NAME_3.
That's why I need something that lets me find the Keys/Key/Name with the value NAME_1 and return its Keys/Key/Data, and so on.
Expected resultset (1 ROW):
NAME_1 NAME_2 NAME_3
-----------------------
98 22 33
One more important thing: the names NAME_1, NAME_2, NAME_3 are known in advance. That's why I need to query for them and return their values as one row.

This was my approach with expected names:
DECLARE @xml XML=
'<Row>
<Keys>
<Key>
<Name>NAME_2</Name>
<Data>22</Data>
</Key>
<Key>
<Name>NAME_3</Name>
<Data>33</Data>
</Key>
<Key>
<Name>NAME_1</Name>
<Data>98</Data>
</Key>
</Keys>
</Row>';
--Most explicit (recommended) version:
SELECT @xml.value('(/Row/Keys/Key[(Name/text())[1]="NAME_1"]/Data/text())[1]','int') AS NAME_1
,@xml.value('(/Row/Keys/Key[(Name/text())[1]="NAME_2"]/Data/text())[1]','int') AS NAME_2
,@xml.value('(/Row/Keys/Key[(Name/text())[1]="NAME_3"]/Data/text())[1]','int') AS NAME_3;
--The same, but less explicit (and therefore not recommended)
SELECT @xml.value('(//Key[Name="NAME_1"]/Data)[1]','int') AS NAME_1
,@xml.value('(//Key[Name="NAME_2"]/Data)[1]','int') AS NAME_2
,@xml.value('(//Key[Name="NAME_3"]/Data)[1]','int') AS NAME_3;
The idea in short:
We fetch each value directly from the XML.
We pick the <Key> with an XQuery predicate asking for the element whose <Name> has the given string as content (= text() node).
We dive into the <Data> node below that <Key>, fetch its content, and return it as int.
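The same predicate can also take the key name from a T-SQL variable through the XQuery function sql:variable(), which avoids hard-coding the name into the path. A minimal sketch, where @name is a hypothetical variable introduced only for illustration:

```sql
DECLARE @xml XML = N'<Row><Keys>
  <Key><Name>NAME_2</Name><Data>22</Data></Key>
  <Key><Name>NAME_3</Name><Data>33</Data></Key>
  <Key><Name>NAME_1</Name><Data>98</Data></Key>
</Keys></Row>';

-- Hypothetical variable holding the key name to look up.
DECLARE @name VARCHAR(50) = 'NAME_3';

-- Pick the <Key> whose <Name> text equals @name, then return its <Data> as int.
SELECT @xml.value(
    '(/Row/Keys/Key[(Name/text())[1] = sql:variable("@name")]/Data/text())[1]',
    'int') AS Data;
```

With the sample document above this looks up the Data value stored under NAME_3. A name that does not occur in the XML yields NULL rather than an error.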

Assuming you know the column names will always be NAME_1, NAME_2, NAME_3, etc, you can do the following without having to resort to dynamic SQL.
DECLARE @xml xml =
'<Row>
<Keys>
<Key>
<Name>NAME_2</Name>
<Data>22</Data>
</Key>
<Key>
<Name>NAME_3</Name>
<Data>33</Data>
</Key>
<Key>
<Name>NAME_1</Name>
<Data>98</Data>
</Key>
</Keys>
</Row>';
SELECT
*
FROM (
SELECT
x.f.value( 'Data[1]', 'int' ) AS [Data],
x.f.value( 'Name[1]', 'varchar(50)' ) AS [Name]
FROM @xml.nodes( '//Row/Keys/Key' ) x( f )
) AS d
PIVOT (
MAX( [Data] )
FOR [Name] IN ( [NAME_1], [NAME_2], [NAME_3] )
) AS x;
Returns
+--------+--------+--------+
| NAME_1 | NAME_2 | NAME_3 |
+--------+--------+--------+
| 98 | 22 | 33 |
+--------+--------+--------+
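As a variation on the PIVOT above, the same one-row result can be built with conditional aggregation, which some people find easier to read and extend. This is a sketch under the same assumption that the key names are known in advance; the alias k is made up for illustration:

```sql
DECLARE @xml XML = N'<Row><Keys>
  <Key><Name>NAME_2</Name><Data>22</Data></Key>
  <Key><Name>NAME_3</Name><Data>33</Data></Key>
  <Key><Name>NAME_1</Name><Data>98</Data></Key>
</Keys></Row>';

-- Shred every <Key> into a (Name, Data) row, then fold the rows into one
-- row with one MAX(CASE ...) column per expected name.
SELECT
    MAX(CASE WHEN k.[Name] = 'NAME_1' THEN k.[Data] END) AS NAME_1,
    MAX(CASE WHEN k.[Name] = 'NAME_2' THEN k.[Data] END) AS NAME_2,
    MAX(CASE WHEN k.[Name] = 'NAME_3' THEN k.[Data] END) AS NAME_3
FROM (
    SELECT
        x.f.value('(Data/text())[1]', 'int')         AS [Data],
        x.f.value('(Name/text())[1]', 'varchar(50)') AS [Name]
    FROM @xml.nodes('/Row/Keys/Key') AS x(f)
) AS k;
```

A name that is missing from the XML simply produces NULL in its column.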

SQL Server query on xml datatype for the value present in the tag

I have a SQL Server database with a table FormTable that has a column DataXML in xml format.
Data in that XML column looks like:
<Form>
<Section name="metric1">
<InputNumber type="int" min="0" max="100" title="Length of the box.">
<Value>50</Value>
</InputNumber>
<PickList title="Is the shape square?">
<Option selected="true">Yes</Option>
<Option>No</Option>
<Option>No standard shape</Option>
</PickList>
</Section>
</Form>
The XML contains many more InputNumber and PickList tags. I have written this query to fetch the required data. It returns:
category -- row number -- title
metric1 -- 1 -- Length of the box
metric1 -- 2 -- Is the shape square?
SELECT
tbl.FormId,
category = c.value('../../@name', 'VARCHAR(max)'),
ROW_NUMBER() OVER (order by (select 1)) as RowNumber,
c.value('../@title','VARCHAR(max)') AS [title]
FROM
FormTable AS tbl
CROSS APPLY
tbl.DataXML.nodes('/Form[1]/Section/*[local-name() = ("PickList", "InputNumber")]/*[local-name() = ("Value", "Option")]') AS t(c)
WHERE
Formid = '10';
Now I want to return one more column called value, and the result should look like this:
category -- row number -- title -- value
metric1 -- 1 -- Length of the box -- 50
metric1 -- 2 -- Is the shape square? -- Yes
How to fetch the value column?

If you're wanting the value (inner text) of the current element just use c.value('.', 'VARCHAR(max)') as [value], for example:
create table FormTable (
FormID int not null,
DataXML xml
);
insert FormTable (FormID, DataXML) values
(10, N'<Form>
<Section name="metric1">
<InputNumber type="int" min="0" max="100" title="Length of the box.">
<Value>50</Value>
</InputNumber>
<PickList title="Is the shape square?">
<Option selected="true">Yes</Option>
<Option>No</Option>
<Option>No standard shape</Option>
</PickList>
</Section>
</Form>');
SELECT
tbl.FormId,
category = c.value('../../@name', 'VARCHAR(max)'),
ROW_NUMBER() OVER (order by t.c) as RowNumber,
c.value('../@title', 'VARCHAR(max)') AS [title],
c.value('.', 'VARCHAR(max)') as [value]
FROM
FormTable AS tbl
CROSS APPLY
tbl.DataXML.nodes('/Form[1]/Section/*[local-name() = ("PickList", "InputNumber")]/*[local-name() = ("Value", "Option")]') AS t(c)
WHERE
FormID = 10;
Which yields the result:
FormId category RowNumber title value
10 metric1 1 Length of the box. 50
10 metric1 2 Is the shape square? Yes
10 metric1 3 Is the shape square? No
10 metric1 4 Is the shape square? No standard shape
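The result above lists every Option of the PickList, while the expected output in the question shows only the chosen one. If that is what is wanted, one possible refinement is to keep Value elements as they are and accept an Option only when it carries selected="true". A sketch, using a table variable (@FormTable, a stand-in for the real table) so that it runs on its own:

```sql
DECLARE @FormTable TABLE (FormID INT NOT NULL, DataXML XML);
INSERT INTO @FormTable (FormID, DataXML) VALUES
(10, N'<Form>
  <Section name="metric1">
    <InputNumber type="int" min="0" max="100" title="Length of the box.">
      <Value>50</Value>
    </InputNumber>
    <PickList title="Is the shape square?">
      <Option selected="true">Yes</Option>
      <Option>No</Option>
      <Option>No standard shape</Option>
    </PickList>
  </Section>
</Form>');

-- Same shredding as before, but the last step only accepts <Value> elements
-- and <Option> elements whose selected attribute is "true".
SELECT
    tbl.FormID,
    c.value('../../@name', 'VARCHAR(max)') AS category,
    c.value('../@title', 'VARCHAR(max)')   AS [title],
    c.value('.', 'VARCHAR(max)')           AS [value]
FROM @FormTable AS tbl
CROSS APPLY tbl.DataXML.nodes('/Form[1]/Section/*[local-name() = ("PickList", "InputNumber")]/*[local-name() = "Value" or (local-name() = "Option" and @selected = "true")]') AS t(c)
WHERE tbl.FormID = 10;
```

For the sample form this should leave one row for the InputNumber and one for the selected Option.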

Reading xml data in SQL

I am reading data from xml in Sql.
Here is my xml :
Declare @MainXml XML =
'<?xml version="1.0" encoding="utf-8"?>
<result>
<details>
<admin>
<code>555</code>
</admin>
<claimhistory>
<claim id="1" number="100">
<account>Closed</account>
</claim>
<claim id="2" number="200">
<account>Closed</account>
</claim>
</claimhistory>
</details>
</result>'
Reading data like this:
select
C.X.value('(admin/code)[1]', 'varchar(max)') as Code,
A.X.value('@id', 'varchar(max)') as Id,
A.X.value('@number', 'varchar(max)') as No,
A.X.value('(account)[1]', 'varchar(max)') as Status
from
@MainXml.nodes('result/details') as C(X)
cross apply
C.X.nodes('claimhistory/claim') as A(X)
This is returning:
Code Id No Status
---------------------
555 1 100 Closed
555 2 200 Closed
A stored procedure contains the above code.
A table-valued variable is passed to the stored procedure as input. It contains an id and a name.
Declare @dtValue As [dbo].[DataTableDetails]
Insert Into @dtValue(Requested_Id, Name) Values(1, 'Tim');
Insert Into @dtValue(Requested_Id, Name) Values(2, 'Joe');
I want to add these names to the select query by matching the id in the XML to the id in the input.
Expected output -
Code Id No Status Name
----------------------------
555 1 100 Closed Tim
555 2 200 Closed Joe
Currently, after inserting the selected records from the XML, I am using an update query. But the table contains over a million records, so it is affecting performance now.
Please suggest a better approach.
Edited:
Tried with a join [added the lines below to the select query]:
Select
C.X.value('(admin/code)[1]', 'varchar(max)') as Code,
A.X.value('@id', 'varchar(max)') as Id,
A.X.value('@number', 'varchar(max)') as No,
A.X.value('(account)[1]', 'varchar(max)') as Status,
CA.Name
from
@MainXml.nodes('result/details') as C(X)
cross apply
C.X.nodes('claimhistory/claim') as A(X)
join
@dtValue CA on CA.Requested_Id = A.X.value('@id', 'varchar(max)')

I'd recommend refactoring the way you're selecting from the XML like so:
select
C.X.value('(../../admin/code)[1]', 'varchar(max)') as Code,
C.X.value('@id', 'varchar(max)') as Id,
C.X.value('@number', 'varchar(max)') as No,
C.X.value('(account)[1]', 'varchar(max)') as Status,
dt.Name
from
@MainXml.nodes('result/details/claimhistory/claim') as C(X)
INNER JOIN @dtValue dt
ON dt.Requested_Id = C.X.value('(@id)[1]', 'int')
You don't actually want to CROSS APPLY the child nodes, you want them to be the primary part you're selecting from (i.e. one row per claim element) - it's then easy enough to select based on the grandparent node to get the Code value, and then you can properly INNER JOIN your table variable.
Full sample:
Declare @MainXml XML =
'<?xml version="1.0" encoding="utf-8"?>
<result>
<details>
<admin>
<code>555</code>
</admin>
<claimhistory>
<claim id="1" number="100">
<account>Closed</account>
</claim>
<claim id="2" number="200">
<account>Closed</account>
</claim>
</claimhistory>
</details>
</result>'
DECLARE @dtValue TABLE (Requested_Id int, Name varchar(10))
Insert Into @dtValue(Requested_Id, Name) Values(1, 'Tim'), (2, 'Joe');
select
C.X.value('(../../admin/code)[1]', 'varchar(max)') as Code,
C.X.value('@id', 'varchar(max)') as Id,
C.X.value('@number', 'varchar(max)') as No,
C.X.value('(account)[1]', 'varchar(max)') as Status,
dt.Name
from
@MainXml.nodes('result/details/claimhistory/claim') as C(X)
INNER JOIN @dtValue dt
ON dt.Requested_Id = C.X.value('(@id)[1]', 'int')
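Since the question mentions a large volume of rows, another option worth measuring is to shred the XML once into a keyed temp table and join on a plain int column, so the join condition no longer calls value() for every comparison. This is only a sketch: #claims is a hypothetical (genuine) temp table name, the column sizes are guesses, and the primary key assumes claim ids are unique within the document.

```sql
DECLARE @MainXml XML = N'<result>
  <details>
    <admin><code>555</code></admin>
    <claimhistory>
      <claim id="1" number="100"><account>Closed</account></claim>
      <claim id="2" number="200"><account>Closed</account></claim>
    </claimhistory>
  </details>
</result>';

DECLARE @dtValue TABLE (Requested_Id INT PRIMARY KEY, Name VARCHAR(10));
INSERT INTO @dtValue (Requested_Id, Name) VALUES (1, 'Tim'), (2, 'Joe');

-- Shred once into a typed, keyed temp table (assumes unique claim ids).
CREATE TABLE #claims (
    Id       INT         NOT NULL PRIMARY KEY,
    Code     VARCHAR(20) NULL,
    [No]     VARCHAR(20) NULL,
    [Status] VARCHAR(20) NULL
);

INSERT INTO #claims (Id, Code, [No], [Status])
SELECT
    C.X.value('(@id)[1]', 'int'),
    C.X.value('(../../admin/code/text())[1]', 'varchar(20)'),
    C.X.value('(@number)[1]', 'varchar(20)'),
    C.X.value('(account/text())[1]', 'varchar(20)')
FROM @MainXml.nodes('result/details/claimhistory/claim') AS C(X);

-- Ordinary relational join on int keys.
SELECT c.Code, c.Id, c.[No], c.[Status], dt.Name
FROM #claims AS c
INNER JOIN @dtValue AS dt
    ON dt.Requested_Id = c.Id;

DROP TABLE #claims;
```

Whether this beats the single-statement join depends on data size and indexing, so it is best compared against the query above on realistic data.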

How do I get the child node values AND parent node values in SQL XPath

My data (passed into a parameter (@Data XML) in a Stored Procedure) looks like this:
<Records>
<Record id="1">
<Data>
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Data>
<Result>
<StatusId>3</StatusId>
<ErrorCodes>
<Item>4</Item>
<Item>23</Item>
<Item>19</Item>
</ErrorCodes>
</Result>
</Record>
<Record id="2">
<Data>
<FirstName>Fred</FirstName>
<LastName>Blog</LastName>
</Data>
<Result>
<StatusId>2</StatusId>
<ErrorCodes>
<Item>1</Item>
<Item>3</Item>
</ErrorCodes>
</Result>
</Record>
</Records>
I want to select the Record id and the Error Codes, like this:
id Item
----------
1 4
1 23
1 19
2 1
2 3
The order of data doesn't matter.
The following gets me the Error Codes, but not the Record id:
SELECT Data.value('.', 'int') as ErrorCode
FROM @Data.nodes('/Records/Record/Result/ErrorCodes/*') AS data(Data)

This expression should get you parent-of-parent-of-parent element:
Data.query('../../..')
...so try something like this (untested)...
SELECT
id = Data.value('../../../@id', 'int'),
item = Data.value('.', 'int')
FROM @Data.nodes('/Records/Record/Result/ErrorCodes/*') AS data(Data)
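An alternative that avoids walking back up with the parent axis is to shred in two levels: first one row per Record, then CROSS APPLY to that record's Item elements. A minimal sketch on a trimmed copy of the sample data (the aliases r, rec, i and item are made up for illustration):

```sql
DECLARE @Data XML = N'<Records>
  <Record id="1">
    <Result><ErrorCodes><Item>4</Item><Item>23</Item><Item>19</Item></ErrorCodes></Result>
  </Record>
  <Record id="2">
    <Result><ErrorCodes><Item>1</Item><Item>3</Item></ErrorCodes></Result>
  </Record>
</Records>';

-- Outer nodes() yields one row per <Record>; the inner nodes() is evaluated
-- relative to that record and yields one row per <Item> below it.
SELECT
    r.rec.value('@id', 'int')          AS id,
    i.item.value('(text())[1]', 'int') AS Item
FROM @Data.nodes('/Records/Record') AS r(rec)
CROSS APPLY r.rec.nodes('Result/ErrorCodes/Item') AS i(item);
```

Records without any Item produce no rows here; OUTER APPLY would keep them with a NULL Item instead.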

SQL Server BULK INSERT fixed length char data

I use SQL Server 2008 and have a table with 5 char typed columns.
CREATE TABLE [dbo].[deviceDataBulk](
[f1] [char](9) NULL,
[f2] [char](5) NULL,
[f3] [char](7) NULL,
[f4] [char](7) NULL,
[f5] [char](6) NULL)
I also have a bcp format file ;
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="9" COLLATION="Turkish_CI_AS"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="5" COLLATION="Turkish_CI_AS"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="7" COLLATION="Turkish_CI_AS"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="7" COLLATION="Turkish_CI_AS"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="6" COLLATION="Turkish_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="f1" NULLABLE="YES" xsi:type="SQLCHAR"/>
<COLUMN SOURCE="2" NAME="f2" NULLABLE="YES" xsi:type="SQLCHAR"/>
<COLUMN SOURCE="3" NAME="f3" NULLABLE="YES" xsi:type="SQLCHAR"/>
<COLUMN SOURCE="4" NAME="f4" NULLABLE="YES" xsi:type="SQLCHAR"/>
<COLUMN SOURCE="5" NAME="f5" NULLABLE="YES" xsi:type="SQLCHAR"/>
</ROW>
My data file contains fixed length char data with no field terminators in each line. So, a full line will be 34 characters long.
My problem is field 4 and field 5 may not be present for each row. I may have 21 characters long line or 28 characters long line in that file.
There is no case that field 5 exists and field 4 not.
Possible scenarios for text file are ;
f1 f2 f3 f4 f5
f1 f2 f3 f4
f1 f2 f3
I couldn't insert this file with BULK INSERT. I want BULK INSERT to insert nulls when it doesn't have those fields, if the tool reaches end of line, just insert nulls for the rest of the fields.

How about a 2-step approach? First load the data into a staging table as 'big rows', then use a second query to split the raw lines into their corresponding fields and handle the "missing f5 and/or f4 columns" situation accordingly.
It would look (more or less) like this (untested!):
CREATE TABLE [dbo].[deviceDataBulk_staging](
[rowid] int IDENTITY(1 , 1) PRIMARY KEY,
[raw] [varchar](34) NOT NULL)
GO
BULK INSERT [deviceDataBulk_staging]
FROM '<your file>'
-- not sure if you really need a format-file here,
-- simply make sure to pass the correct line-separator if it is 'exotic'.
GO
INSERT [deviceDataBulk] (f1, f2, f3, f4, f5)
SELECT f1 = SubString([raw], 1 , 9),
f2 = SubString([raw], 10 , 5),
f3 = SubString([raw], 15 , 7),
f4 = (CASE WHEN Len([raw]) < 22 THEN NULL ELSE SubString([raw], 22 , 7) END),
f5 = (CASE WHEN Len([raw]) < 29 THEN NULL ELSE SubString([raw], 29 , 6) END)
FROM [deviceDataBulk_staging]
ORDER BY [rowid]
The Staging file would then look like :
The [rowid] is there to keep the order identical to the order originally in the file, you might not need it but IMHO the overhead is minimal and MSSQL isn't too keen on HEAP tables anyway so having it there is "A good thing [Tm]"
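The splitting step can be tried without any file by feeding a table variable with lines of 34, 28 and 21 characters. The sketch below uses NULLIF instead of the CASE/LEN test: SUBSTRING returns an empty string when the start position lies past the end of the line, and NULLIF turns that into NULL. The table variable @staging and the sample lines are made up for illustration.

```sql
DECLARE @staging TABLE (
    [rowid] INT IDENTITY(1, 1) PRIMARY KEY,
    [raw]   VARCHAR(34) NOT NULL
);

INSERT INTO @staging ([raw]) VALUES
    ('AAAAAAAAABBBBBCCCCCCCDDDDDDDEEEEEE'),  -- 34 chars: f1..f5
    ('AAAAAAAAABBBBBCCCCCCCDDDDDDD'),        -- 28 chars: f1..f4
    ('AAAAAAAAABBBBBCCCCCCC');               -- 21 chars: f1..f3

SELECT
    [rowid],
    f1 = SUBSTRING([raw], 1, 9),
    f2 = SUBSTRING([raw], 10, 5),
    f3 = SUBSTRING([raw], 15, 7),
    f4 = NULLIF(SUBSTRING([raw], 22, 7), ''),  -- NULL when the line ends before column 22
    f5 = NULLIF(SUBSTRING([raw], 29, 6), '')   -- NULL when the line ends before column 29
FROM @staging
ORDER BY [rowid];
```

One difference from the CASE version: because SQL Server ignores trailing spaces in string comparisons, a field that is present but consists only of spaces also becomes NULL here.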
