I'm having some issues trying to get some XML data (stored as text in an old MS SQL Server 2012 database) parsed into a usable format.
The XML data is a string, but when I convert it to XML, it looks like this:
<?xml version="1.0" encoding="utf-8"?>
<header1>
<header2>
<OrderFormHeader>
<AccountNum>123456</AccountNum>
<OrderNum>000123987</OrderNum>
<OrderDetails>
<CompanyName>Biznez1</CompanyName>
<CompAddressInfo>
<City>Phoenix</City>
<State>AZ</State>
</CompAddressInfo>
<ShipTo>TRUE</ShipTo>
<BillTo>FALSE</BillTo>
</OrderDetails>
</OrderFormHeader>
<OrderFormDetails>
<OrderFormLines>
<ItemNum>000001</ItemNum>
<InventoryNum>INV-001-000001</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>Bandaids</ItemDesc>
<UnitofMeasure>Box</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456789123</CodeID>
</ItemCode>
</OtherDetails>
</OrderFormLines>
</OrderFormDetails>
<OrderFormLines>
<ItemNum>000002</ItemNum>
<InventoryNum>INV-001-000002</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>QTips</ItemDesc>
<UnitofMeasure>Box</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456789987</CodeID>
</ItemCode>
</OtherDetails>
</OrderFormLines>
<OrderFormLines>
<ItemNum>000003</ItemNum>
<InventoryNum>INV-003-000001</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>Scissors</ItemDesc>
<UnitofMeasure>Each</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456987321</CodeID>
</ItemCode>
</OtherDetails>
</OrderFormLines>
</header2>
</header1>
Needless to say, it's a crazy XML (at least to me).
(Note: There are multiple sets of OrderFormDetails nested within the object and parsing them via my code seems to fan out on the ItemNum and InventoryNum. I've removed the UPC code stuff as that was causing additional fan out, but wouldn't mind bringing that back into my code)
With that said, my current SQL code uses a table variable to take the data from the table, correct the UTF-8 and put it into an XML format. From there, I use the CROSS APPLY functions to get the data out, but it has severe fan-out issues where it will show the data multiple times rather than just 1 row each:
DECLARE @xml TABLE (IMPORTED_XML xml)
INSERT INTO @xml
SELECT
CAST(REPLACE(mxt.XML_FIELD,'encoding="UTF-8"','encoding="UTF-16"') AS XML) AS IMPORTED_XML
FROM MyXMLTable as mxt;
with temp1 AS (
SELECT DISTINCT
sales_order.value('(./AccountNum/text())[1]','nvarchar(max)') AS ACCOUNT_NUM
, sales_order.value('(./OrderNum/text())[1]','nvarchar(max)') AS ORDER_NUM
, extra_so.value('(./CompanyName/text())[1]','nvarchar(max)') AS COMPANY_NAME
, base.value('(./ItemNum/text())[1]','nvarchar(max)') AS ITEM_ID
, base.value('(./InventoryNum/text())[1]','nvarchar(max)') AS INVENTORY_NUM
, sales.value('(./QtyOrdered/text())[1]','nvarchar(max)') AS QTY_ORDERED
, sales.value('(./UnitofMeasure/text())[1]','nvarchar(max)') AS ITEM_UOM
, sales.value('(./ItemDesc/text())[1]','nvarchar(max)') AS ITEM_DESC
FROM @xml
CROSS APPLY IMPORTED_XML.nodes('/header1/header2') AS core(core)
CROSS APPLY core.nodes('//OrderFormDetails/OrderFormLines') as base(base)
CROSS APPLY core.nodes('//OrderFormHeader') AS sales_order(sales_order)
CROSS APPLY base.nodes('//OtherDetails') as sales(sales)
CROSS APPLY sales_order.nodes('//OrderDetails') AS extra_so(extra_so)
CROSS APPLY sales.nodes('//ItemCode') as itmcode(itmcode)
)
select * from temp1 order by item_desc asc
This seems to mostly work, but it ends up with multiple rows of data for the same stuff... I'm used to using the lateral flatten function in Snowflake, but not this XML parsing in SQL Server 2012. Any insight into this? Thank you in advance for your help
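Your issue is that every .nodes() path starts with //, which searches from the document root rather than from the current node, so each nested level is effectively cross-joined against every matching node in the whole document.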
There are other points to note:
You don't need temporary tables; you can CROSS APPLY everything together in one query.
You don't need REPLACE if the column is already varchar, only if it's nvarchar.
You don't need to use .nodes on every level of nesting; you only need it if you want multiple items from a single level.
Pick your data types carefully: does everything have to be nvarchar(max)?
SELECT
sales_order.value('(AccountNum/text())[1]','varchar(50)') AS ACCOUNT_NUM
, sales_order.value('(OrderNum/text())[1]','varchar(50)') AS ORDER_NUM
, sales_order.value('(OrderDetails/CompanyName/text())[1]','nvarchar(200)') AS COMPANY_NAME
, base.value('(ItemNum/text())[1]','varchar(50)') AS ITEM_ID
, base.value('(InventoryNum/text())[1]','varchar(50)') AS INVENTORY_NUM
, sales.value('(QtyOrdered/text())[1]','int') AS QTY_ORDERED
, sales.value('(UnitofMeasure/text())[1]','varchar(20)') AS ITEM_UOM
, sales.value('(ItemDesc/text())[1]','nvarchar(max)') AS ITEM_DESC
, itmcode.value('(CodeType/text())[1]','varchar(20)') AS itemcodetype
, itmcode.value('(CodeID/text())[1]','varchar(50)') AS itemcodeID
FROM MyXMLTable as mxt
CROSS APPLY (VALUES( CAST(REPLACE(mxt.XML_FIELD,'encoding="UTF-8"','encoding="UTF-16"') AS xml) )) v(IMPORTED_XML)
CROSS APPLY IMPORTED_XML.nodes('/header1/header2') AS core(core)
CROSS APPLY core.nodes('OrderFormHeader') AS sales_order(sales_order)
CROSS APPLY core.nodes('OrderFormDetails/OrderFormLines') as base(base)
CROSS APPLY base.nodes('OtherDetails') as sales(sales)
CROSS APPLY sales.nodes('ItemCode') as itmcode(itmcode);
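To see the effect of the leading // in isolation, here is a minimal sketch. The variable @doc and its two-element document are made up for illustration and are not part of the question's data.

```sql
-- Hypothetical two-row document, used only to illustrate the path behaviour.
DECLARE @doc xml = N'<root><a><b>1</b></a><a><b>2</b></a></root>';

-- A leading // is evaluated from the document root, so every <a> row
-- is paired with every <b> in the document (the fan-out).
SELECT a.n.value('(b/text())[1]', 'int') AS own_b
     , b.n.value('(text())[1]', 'int')   AS joined_b
FROM @doc.nodes('/root/a') AS a(n)
CROSS APPLY a.n.nodes('//b') AS b(n);

-- A relative path is evaluated from the current <a> node, so each row
-- only sees its own <b>.
SELECT a.n.value('(b/text())[1]', 'int') AS own_b
     , b.n.value('(text())[1]', 'int')   AS joined_b
FROM @doc.nodes('/root/a') AS a(n)
CROSS APPLY a.n.nodes('b') AS b(n);
```

If a descendant search from the current node is genuinely needed, writing the path as .//b should keep it relative to that node.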
In a T-SQL stored procedure, when supplied with two tables each of which has the same number of rows, how can I pair-wise match the rows based on row order rather than a join criteria?
Basically, an equivalent of .NET's IEnumerable.Zip() method?
I'm using SQL Server 2016.
Background
The purpose of the stored procedure is to act as an integration adapter between two other applications. I do not control the source code for either application.
The "client" application contains extensibility objects which can be configured to invoke a stored procedure in an SQL Server database. The configuration options for the extensibility point allow me to name a stored procedure which will be invoked, and provide a statically configured list of named parameters and their associated values, which will be passed to the stored procedure. Only scalar parameters are supported, not table-valued parameters.
The stored procedure needs to collect data from the "server" application (which is exposed through an OLE-DB provider) and transform it into a suitable result set for consumption by the client application.
For maintenance reasons, I want to avoid storing any configuration in the adapter database. I want to write generic, flexible logic in the stored procedure, and pass all necessary configuration information as parameters to that stored procedure.
The configuration information that's needed for the stored procedure is, essentially, equivalent to the following table variable schema:
DECLARE @TableOfServerQueryParameterValues AS TABLE (
tag NVARCHAR(50),
filterexpr NVARCHAR(500)
)
This table can then be used as the left-hand side of JOIN and CROSS APPLY queries in the stored proc which are run against the "server" application interfaces.
The problem I encountered is that I did not know of any way of passing a table of parameter info from the client application, because its extensibility points only include scalar parameter support.
So, I thought I would pass two scalar parameters. One would be a comma-separated list of tag values. The other would be a comma-separated list of filterexpr values.
Inside the stored proc, it's easy to use STRING_SPLIT to convert each of those parameters into a single-column table. But then I needed to match the two columns together into a two-column table, which I could then use as the basis for INNER JOIN or CROSS APPLY to query the server application.
The best solution I've come up with so far is to select each table into a table variable, use the ROW_NUMBER() function to assign a row number, and then join the two tables by matching on the extra ROW_NUMBER column. Is there an easier way to do it than that? It would be nice not to have to declare all the columns in the table variables.
Your suggestion of using row_number seems sound.
Instead of table variables you can use subqueries or CTEs; there should be little difference overall, though avoiding the table variable reduces the number of passes you need to make & avoids the additional code to maintain.
select a.*, b.* --specify whatever columns you want to return
from (
select *
, row_number() over (order by someArbitraryColumnPreferablyYourClusteredIndex) r
from TableA
) a
full outer join --use a full outer join if you have different numbers of rows in the tables & want
--results from the larger table with nulls from the smaller for the extra rows
--otherwise use an inner join to only get matches for both tables
(
select *
, row_number() over (order by someArbitraryColumnPreferablyYourClusteredIndex) r
from TableB
) b
on b.r = a.r
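Since the underlying goal is to pair two comma-separated parameters by position, one caveat applies: as far as I know, STRING_SPLIT on SQL Server 2016 neither returns an ordinal nor documents an output order, so a ROW_NUMBER over its output is not guaranteed to reflect list position. One alternative, swapped in here rather than taken from the approach above, is OPENJSON (available from SQL Server 2016 at compatibility level 130), which exposes the array index in its [key] column. The parameter names and values below are hypothetical.

```sql
-- Hypothetical scalar parameters as they might arrive from the client application.
DECLARE @tags    nvarchar(max) = N'tagA,tagB,tagC';
DECLARE @filters nvarchar(max) = N'x > 1,y = 2,z < 3';

-- Turn each comma-separated list into a JSON array of strings.
-- STRING_ESCAPE protects quotes and backslashes inside the values.
DECLARE @tagsJson    nvarchar(max) = N'["' + REPLACE(STRING_ESCAPE(@tags, 'json'), N',', N'","') + N'"]';
DECLARE @filtersJson nvarchar(max) = N'["' + REPLACE(STRING_ESCAPE(@filters, 'json'), N',', N'","') + N'"]';

-- For a JSON array, OPENJSON returns the element index in [key],
-- so joining on [key] pairs the two lists by position.
SELECT t.[value] AS tag
     , f.[value] AS filterexpr
FROM OPENJSON(@tagsJson) AS t
JOIN OPENJSON(@filtersJson) AS f
  ON f.[key] = t.[key]
ORDER BY CAST(t.[key] AS int);
```

This sketch assumes that no individual value contains a comma; a different delimiter or a structured parameter format would be needed otherwise.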
Update
Regarding @PanagiotisKanavos's comment on passing structured data, here's a simple example of how you could convert a value passed as an xml type to table data:
declare @tableA xml = '<TableA>
<row><col1>x</col1><col2>Anne</col2><col3>Droid</col3></row>
<row><col1>y</col1><col2>Si</col2><col3>Borg</col3></row>
<row><col1>z</col1><col2>Roe</col2><col3>Bott</col3></row>
</TableA>'
select row_number() over (order by aRow) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)
You may get a performance boost over the above by using the following. This creates a dummy column allowing us to do an order by where we don't care about the order. This should be faster than the above as ordering by 1 will be simpler than sorting based on the xml type.
select row_number() over (order by ignoreMe) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)
cross join (select 1) a(ignoreMe)
If you do care about the order, you can order by the data's fields, as such:
select row_number() over (order by x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') ) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)
I have the query below and I am trying to return distinct values from the second .value method.
Here is what I have tried. I added 'distinct-values(.)' to return only distinct values, but it still returns the same results as a plain '.'. How can I select distinct values from just one column?
;WITH XMLNAMESPACES (default 'http://www.w3.org/2001/XMLSchema')
SELECT
a.value('.', 'NVARCHAR(50)') AS Visitor
, b.value('distinct-values(.)', 'NVARCHAR(50)') AS Sender
FROM XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Visitors/Visitor') AS aa(a)
CROSS APPLY xmlDocument.nodes('Root/Senders/Sender') AS bb(b)
Here is the normal result
Here is what I am trying to get
The XML looks like this:
<upx:Root xmlns:upx="http://www.w3.org/2001/XMLSchema">
<upx:Visitors>
<upx:Visitor>Visitor1</upx:Visitor>
<upx:Visitor>Visitor2</upx:Visitor>
</upx:Visitors>
<upx:Senders>
<upx:Sender>Sender1</upx:Sender>
</upx:Senders>
</upx:Root>
The duplication comes from using CROSS APPLY with .nodes() twice. Each .nodes() call returns one row per matching node, so the first says, in effect, "there are two values at this level of the hierarchy, here they are", and the second CROSS APPLY on a different node then repeats its result for each of those rows. You have to be careful about exactly what two CROSS APPLYs are doing. Instead of relying on .nodes() with CROSS APPLY, you could use the .query() method followed by .value() to read the values directly from the XML. The remaining problem is that you have not shown where the Id comes from: are you deriving it at run time from the XML itself, joining to another table, or is there another part of the XML that is not shown? I can show the difference, but without knowing how you relate rows back to 1 (are you just picking out the integer after "Visitor"?) I don't know how to represent exactly what you want.
EDIT: Okay it is what I thought then. Now my code may be longer than some and I will admit there may be an easier way to do this however I would do three things:
Keep your cross apply with nodes because nodes is useful in that it will repeat rows you need to count on. However I would add an artificial flag for the name you use for the node. Then I would union together two select statements using the nodes.
I would then use a nested select as a from statement and then determine row number with a windowed function based on the flags I just set.
I would then nest that again and then use the very same row number as the Id of the row number and then I would do some syntactic pivoting based on a max(case when) based on the flags I arbitrarily set.
I usually prefer CTEs, but since the XML namespace declaration begins with WITH and the first CTE does as well, I forgot the syntax for combining them. Nested selects can, IMHO, get hairy when there are several, so I usually choose CTEs, but in this case I used a nested select inside another nested select. I hope this helps:
declare @xml xml = '<upx:Root xmlns:upx="http://www.w3.org/2001/XMLSchema">
<upx:Visitors>
<upx:Visitor>Visitor1</upx:Visitor>
<upx:Visitor>Visitor2</upx:Visitor>
</upx:Visitors>
<upx:Senders>
<upx:Sender>Sender1</upx:Sender>
</upx:Senders>
</upx:Root>'
;
declare @XmlTable table ( xmlDocument xml);
insert into @XmlTable values (@xml);
WITH XMLNAMESPACES (default 'http://www.w3.org/2001/XMLSchema')
select
pos as Id
, max(case when Listing = 'Visitors' then Value end) as Visitors
, max(case when Listing = 'Senders' then Value end) as Senders
from
(
select
*
, row_number() over(partition by Listing order by Value) as pos
from
(
SELECT
'Visitors' as Listing
, a.value('.', 'NVARCHAR(50)') AS Value
FROM @XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Visitors/Visitor') AS aa(a)
union
SELECT
'Senders'
, b.value('distinct-values(.)', 'NVARCHAR(50)') AS Sender
FROM @XmlTable AS X
CROSS APPLY xmlDocument.nodes('Root/Senders/Sender') AS bb(b)
) as u
) as listing
group by pos
The source data comes from the following freely available XML files describing Major League Baseball games.
http://gd2.mlb.com/components/game/mlb/year_2013/month_04/day_09/gid_2013_04_09_atlmlb_miamlb_1/inning/
I have created a SQL Server table that contains a row for every GamePK/inning, with an XML column named PBP. Each file in the folder above becomes a row in this table. The query below is my attempt to parse the XML into a record set. It works but is very slow for a large number of rows, and very repetitive - seems like there should be a better way to do this without the UNION clause. Any help in improving/optimizing is appreciated
select
i.GamePK
,inn.value('@num', 'int') as inning
,itop.value('1', 'int') as IsTop
,itop.value('@num', 'int') as abNum
,itop.value('@batter', 'int') as batter
-- clip
,itoppit.value('@des', 'varchar(32)') as pitdesc
,itoppit.value('@id', 'int') as seq
,itoppit.value('@type', 'varchar(8)') as pittype
-- clip
from tblInnings i
cross apply PBP.nodes('/inning') as inn(inn)
cross apply inn.nodes('top/atbat') as itop(itop)
cross apply itop.nodes('pitch') as itoppit(itoppit)
union
select
i.GamePK
,inn.value('@num', 'int') as inning
,ibot.value('0', 'int') as IsTop
,ibot.value('@num', 'int') as abNum
,ibot.value('@batter', 'int') as batter
-- clip
,ibotpit.value('@des', 'varchar(32)') as pitdesc
,ibotpit.value('@id', 'int') as seq
,ibotpit.value('@type', 'varchar(8)') as pittype
--clip
from tblInnings i
cross apply PBP.nodes('/inning') as inn(inn)
cross apply inn.nodes('bottom/atbat') as ibot(ibot)
cross apply ibot.nodes('pitch') as ibotpit(ibotpit)
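One way to avoid the repeated UNION branch might be to shred both halves of the inning in a single pass and derive IsTop from the element name. This is only a sketch: the sample document is made up to match the paths and attributes used in the query above, the aliases are hypothetical, and whether it is actually faster on real data would need testing.

```sql
-- Made-up sample shaped like the paths used above (inning / top or bottom / atbat / pitch).
DECLARE @pbp xml = N'
<inning num="1">
  <top><atbat num="1" batter="100"><pitch id="3" type="B" des="Ball"/></atbat></top>
  <bottom><atbat num="2" batter="200"><pitch id="8" type="S" des="Called Strike"/></atbat></bottom>
</inning>';

SELECT
 inn.n.value('@num', 'int') AS inning
 -- The half-inning element's own name decides the flag.
,CASE half.n.value('local-name(.)', 'varchar(10)') WHEN 'top' THEN 1 ELSE 0 END AS IsTop
,ab.n.value('@num', 'int') AS abNum
,ab.n.value('@batter', 'int') AS batter
,p.n.value('@des', 'varchar(32)') AS pitdesc
,p.n.value('@id', 'int') AS seq
,p.n.value('@type', 'varchar(8)') AS pittype
FROM @pbp.nodes('/inning') AS inn(n)
-- Match both <top> and <bottom> in one step instead of one query per half.
CROSS APPLY inn.n.nodes('*[local-name(.) = "top" or local-name(.) = "bottom"]') AS half(n)
CROSS APPLY half.n.nodes('atbat') AS ab(n)
CROSS APPLY ab.n.nodes('pitch') AS p(n);
```

Against the real table, the first line of the FROM clause would presumably become tblInnings i cross apply i.PBP.nodes('/inning'), with i.GamePK added back to the select list.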
If you're using a recent version of SQL Server, there's a new column data type (XML).
You can apply xpath to it, making querying the column much easier.
Instead of trying to store the XML as a string in your DB, I'd suggest you actually store it as XML, and treat it as XML.
There is a learning curve. You'll need to be familiar with XPATH, but it's not rocket science.
An example:
SELECT Id, PartitionMonth, EmailAddress, AcquisitionCodeId, FieldValues.value('
declare namespace s="http://domain.com/FieldValues.xsd";
data(/s:FieldValues/s:item/@value)[1]', 'varchar(200)')
FROM Leads.Leads WITH (NOLOCK)
WHERE Id = 190708
Another example retrieving values by key:
SELECT r.EmailAddress, ar.Ip, ar.DateLog,
ar.FieldValues.value('
declare namespace s="http://domain.com/FieldValues.xsd";
data(/s:FieldValues/s:item[@key="First Name"]/@value)[1]', 'varchar(20)') FirstName,
ar.FieldValues.value('
declare namespace s="http://domain.com/FieldValues.xsd";
data(/s:FieldValues/s:item[@key="Last Name"]/@value)[1]', 'varchar(20)') LastName
FROM Records.Records r WITH (NOLOCK)
JOIN Records.AcquisitionRecords ar WITH (NOLOCK) ON r.Id = ar.Id
WHERE ar.AcquisitionCodeId IN (19, 21, 30, 34, 36)
AND ar.DateLog BETWEEN '1-mar-09' AND '31-mar-09'
A good place to get started on XML in SQL Server
http://msdn.microsoft.com/en-US/library/ms189887(v=sql.90).aspx
Is the result of a SELECT like the one below using XQuery in SQL Server guaranteed to be in document order, that is in the original order the nodes are listed in the xml string?
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
SELECT t.i.value('.', 'int')
FROM @x.nodes('/hey/@i') t(i)
I ask because in general SELECT statements do not guarantee an order unless one is provided.
Also, if this order is (or is not) guaranteed, is that documented somewhere officially, maybe on Microsoft's web site?
Last, is it possible to sort opposite of document order or do other strange sorts and queries based on the original document order from within the SELECT statement?
The
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
is an example of an XML "sequence". By definition, without a sort order, the select should always come back in the document's original order.
As already mentioned, you can change the sort order. Here is one example:
SELECT t.i.value('.', 'int') as x, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as rn
FROM @x.nodes('/hey/@i') t(i)
order by rn desc
Here is some information about sequences on the Microsoft site:
http://msdn.microsoft.com/en-us/library/ms179215(v=sql.105).aspx
Here is a more general discussion of a sequence in Xquery.
http://en.wikibooks.org/wiki/XQuery/Sequences
I realize that my original answer above is incorrect after reading the page on the Microsoft site I referred to above. That page says that you need a comma between elements to construct a sequence. The example given is not a "sequence". However, my original info about changing the sort order stands :)
I think that the same rules of SELECT apply here, no matter whether you're selecting from an ordinary table or from XML. There's a selection part and a projection part, and the engine can take different paths to retrieve your data (from the middle sideways, for example). Unfortunately I can't find any official document to support that.
And for sure, there's no intrinsic table/document order that you can access and manipulate.
You can add order by clause to the select statement.
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>'
SELECT [column1] = t.i.value('.', 'int')
FROM @x.nodes('/hey/@i') t(i)
order by column1 desc
I know what you mean about this. I suspect the order would be document order, but the documentation does not make it clear, and relying on it implicitly just isn't nice.
One way to be confident about the order would be:
DECLARE @x XML = '<hey i="3"/><hey i="4"/><hey i="0"/>';
select @x.value('(/hey[sql:column("id")]/@i)[1]', 'int'), id
from (
select row_number() over (order by (select 0)) as id
from @x.nodes('/hey') t(i)
) t
order by id
This would then give you a way to answer your other question, i.e. getting the values in reverse, or some other, order.
N.B. This is going to be much slower than just using nodes as the size of your XML increases.
Not sure if this question makes for some poor performance down the track, but seems to at least feel "a better way" right now..
What I am trying to do is this:
I have a table called CONTACTS which amongst other things has a primary key field called memberID
I also have an XML field which contains the ID's of your friends (for example).. like:
<root><id>2</id><id>6</id><id>14</id></root>
So what I am trying to do via a stored proc is pass in say your member ID, and return all of your friends info, for example:
select name, address, age, dob from contacts
where id... xml join stuff...
The previous way I had it working (well sort of!) selected all the XML nodes (/root/id) into a temp table, and then did a join from that temp table to the contact table to get the contact fields...
Any help much appreciated.. just a bit overloaded from the .query .nodes examples, and of course which is maybe a better way of doing this...
THANKS IN ADVANCE!
<-- EDIT -->
I did get something working, but looks like a SQL frankenstein statement!
Basically I needed to get the friends contact ID's from the XML field, and populate into a temp table like so:
Declare @contactIDtable TABLE (ID int)
INSERT INTO @contactIDtable (ID)
SELECT CONVERT(INT,CAST(T2.memID.query('.') AS varchar(100))) AS friendsID
FROM dbo.members
CROSS APPLY memberContacts.nodes('/root/id/text()') AS T2(memID)
But crikey! the convert/cast thing looks serious.. as I need to get an INT for the next bit which is the actual join to return the contact data as follows:
SELECT memberID, memberName, memberAddress1
FROM members
INNER JOIN @contactIDtable cid
ON members.memberID = cid.ID
ORDER BY memberName
RESULT...
Well it works.. in my case, my memberContacts XML field had 3 nodes (id's in this case), and the above query returned 3 rows of data (memberID, memberName, memberAddress1)...
The whole point of this of course was to try to save creating a many join table i.e. list of all my friends ID's... just not sure if the above actually makes this quicker and easier...
Anymore ideas / more efficient ways of trying to do this???
SQL Server's syntax for reading XML is one of the least intuitive around. Ideally, you'd want to:
select f.name
from friends f
join @xml x
on x.id = f.id
Instead, SQL Server requires you to spell out everything. To turn an XML variable or column into a "rowset", you have to spell out the exact path and think up two aliases:
@xml.nodes('/root/id') as table_alias(column_alias)
Now you have to explain to SQL Server how to turn <id>1</id> into an int:
table_alias.column_alias.value('.', 'int')
So you can see why most people prefer to decode XML on the client side :)
A full example:
declare @friends table (id int, name varchar(50))
insert @friends (id, name)
select 2, 'Locke Lamorra'
union all select 6, 'Calo Sanzo'
union all select 10, 'Galdo Sanzo'
union all select 14, 'Jean Tannen'
declare @xml xml
set @xml = ' <root><id>2</id><id>6</id><id>14</id></root>'
select f.name
from @xml.nodes('/root/id') as table_alias(column_alias)
join @friends f
on table_alias.column_alias.value('.', 'int') = f.id
In order to get your XML contents as rows from a "pseudo-table", you need to use the .nodes() on the XML column - something like:
DECLARE @xmlfield XML
SET @xmlfield = '<root><id>2</id><id>6</id><id>14</id></root>'
SELECT
ROOT.ID.value('(.)[1]', 'int')
FROM
@xmlfield.nodes('/root/id') AS ROOT(ID)
SELECT
(list of fields)
FROM
dbo.Contacts c
INNER JOIN
@xmlfield.nodes('/root/id') AS ROOT(ID) ON c.ID = ROOT.ID.value('(.)[1]', 'INT')
Basically, the .nodes() defines a pseudo-table ROOT with a single column ID, that will contain one row for each node in the XPath expression, and the .value() selects a specific value out of that XML fragment.