Matching one attribute to another using XPath/XQuery in SQL Server 2008 - sql-server

Consider the XML and SQL:
declare @xml xml = '
<root>
<person id="11272">
<notes for="107">Some notes!</notes>
<item id="107" selected="1" />
</person>
<person id="77812">
<notes for="107"></notes>
<notes for="119">Hello</notes>
<item id="107" selected="0" />
<item id="119" selected="1" />
</person>
</root>'
select Row.Person.value('data(../@id)', 'int') as person_id,
Row.Person.value('data(@id)', 'int') as item_id,
Row.Person.value('data(../notes[@for=data(@id)][1])', 'varchar(max)') as notes
from @xml.nodes('/root/person/item') as Row(Person)
I end up with:
person_id item_id notes
----------- ----------- -------
77812 107 NULL
77812 119 NULL
11272 107 NULL
What I want is the 'notes' column to be pulled based on the @id attribute of the current item. If I replace [@for=data(@id)] in the selector with [@for=107], of course I get the value Some notes! in the last record. Is it possible to do this with XPath/XQuery, or am I barking up the wrong tree here? I think the problem is that
The XML is a bit awkward, yes, but I can't really change it I'm afraid.
I found one solution that works, but it feels awfully heavy for something like this.
select Item.Person.value('data(../@id)', 'int') as person_id,
Item.Person.value('data(@id)', 'int') as item_id,
Notes.Person.value('text()[1]', 'varchar(max)') as notes
from @xml.nodes('/root/person/item') as Item(Person)
inner join @xml.nodes('/root/person/notes') as Notes(Person) on
Notes.Person.value('data(@for)', 'int') = Item.Person.value('data(@id)', 'int')
and
Notes.Person.value('data(../@id)', 'int') = Item.Person.value('data(../@id)', 'int')
Update!
I figured it out! I'm new at XQuery but this works, so I'm calling it job done :) I changed the query for the notes to:
Item.Person.value('
let $id := data(@id)
return data(../notes[@for=$id])[1]
', 'varchar(max)') as notes

I would suggest that you do a cross apply instead of using ../ to find a parent node. According to the query plan, it is a lot faster.
select P.X.value('data(@id)', 'int') as person_id,
I.X.value('data(@id)', 'int') as item_id,
I.X.value('let $id := data(@id)
return data(../notes[@for=$id])[1]', 'varchar(max)') as notes
from @xml.nodes('/root/person') as P(X)
cross apply P.X.nodes('item') as I(X)
You can even remove the ../ in the FLWOR expression with one extra cross apply, gaining a bit more.
select P.X.value('@id', 'int') as person_id,
TI.id as item_id,
P.X.value('(notes[@for = sql:column("TI.id")])[1]', 'varchar(max)') as notes
from @xml.nodes('/root/person') as P(X)
cross apply P.X.nodes('item') as I(X)
cross apply (select I.X.value('@id', 'int')) as TI(id)
Comparing the queries against each other, I got 67% for your query, 17% for my first and 16% for my second. Note: these figures only give you a hint about which query will actually be faster. Test them against your data to know for sure.
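As for why the original predicate [@for=data(@id)] produced NULL: inside a predicate the context node is the node being filtered, so there @id is looked up on the notes element, which has no id attribute. Binding the item's id to a variable first avoids that. A minimal, self-contained sketch of the idea, reusing the sample document from the question (the aliases P, I and X are just illustrative, and let needs SQL Server 2008 or later):

```sql
-- Sample document copied from the question.
DECLARE @xml xml = N'
<root>
  <person id="11272">
    <notes for="107">Some notes!</notes>
    <item id="107" selected="1" />
  </person>
  <person id="77812">
    <notes for="107"></notes>
    <notes for="119">Hello</notes>
    <item id="107" selected="0" />
    <item id="119" selected="1" />
  </person>
</root>';

SELECT P.X.value('@id', 'int') AS person_id,
       I.X.value('@id', 'int') AS item_id,
       -- $id is bound while the context node is still the item,
       -- so the predicate compares notes/@for with item/@id.
       I.X.value('let $id := data(@id)
                  return data(../notes[@for = $id])[1]', 'varchar(max)') AS notes
FROM @xml.nodes('/root/person') AS P(X)
CROSS APPLY P.X.nodes('item') AS I(X);
```

This should return one row per item, with the note text taken from the sibling notes element of the same person.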

Related

Unable to save data from xml format in SQL table

SELECT *
FROM @myHierarchy
FOR XML AUTO
Data is
<_x0040_myHierarchy element_id="1" parent_ID="1" NAME="itemCode" StringValue="Simmi" ValueType="string" />
I'm unable to load data in this query
SELECT @xml = dbo.ToXML(@myHierarchy);
SELECT
a.b.value('@ItemCode', 'varchar(20)') AS ItemCode
FROM
@xml.nodes('/root/_x0040_myHierarchy') a(b)
In this query, itemcode is blank. How can I load data using this query?
Your sample XML does not contain any attribute ItemCode - it has these attributes:
element_id
parent_ID
NAME
StringValue
ValueType
So which value do you really want to read out from the XML element?
Update: to retrieve the StringValue attribute, use this code:
SELECT
XC.value('@StringValue', 'varchar(50)')
FROM
@xml.nodes('/_x0040_myHierarchy') AS XT(XC)
If your XML contains a <root> ..... </root> root element with multiple <_x0040_myHierarchy> elements inside, and you want to extract the one with @NAME = 'itemCode', then you need to use this SELECT:
SELECT
XC.value('@StringValue', 'varchar(50)')
FROM
@xml.nodes('/root/_x0040_myHierarchy') AS XT(XC)
WHERE
XC.value('@NAME', 'varchar(50)') = 'itemCode'
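The same filter can also be pushed into the XPath as a predicate, so that nodes() only returns the matching elements. A small sketch, assuming a <root> wrapper and a second, made-up row added purely for illustration:

```sql
-- The second element is hypothetical sample data, not from the question.
DECLARE @xml xml = N'
<root>
  <_x0040_myHierarchy element_id="1" parent_ID="1" NAME="itemCode" StringValue="Simmi" ValueType="string" />
  <_x0040_myHierarchy element_id="2" parent_ID="1" NAME="other" StringValue="ignored" ValueType="string" />
</root>';

SELECT XC.value('@StringValue', 'varchar(50)') AS ItemCode
FROM @xml.nodes('/root/_x0040_myHierarchy[@NAME = "itemCode"]') AS XT(XC);
```

Whether this is faster than the WHERE clause depends on the data, so it is worth comparing both on a realistic document.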

Trouble searching for specific values using XMLNAMESPACES

I've taken a look and wasn't able to find an answer that would help me with my issue. (Most probably due to my poor skills)
However was hoping that someone would be able to point me in the right direction.
The issue:
I have an XML column in the table that I am querying, and I need the query to return all rows with a specific value.
An example from the xml column
<EventD xmlns="http://example1" xmlns:e3q1="http://example2" xmlns:xsi="http://example3" xsi:type="e3q1:Valuechange">
<e3q1:NewValue>Running</e3q1:NewValue>
<e3q1:OldValue>Stopped</e3q1:OldValue>
</EventD>
What I would need to do is to return all rows that have "NewValue" as "Running"
;WITH XMLNAMESPACES ('example2' as e3q1)
select top 100
Xml.value('(EventD/NewValue)[1]', 'varchar(100)'),
* from Table1
and Xml.value('(EventD/NewValue)[1]', 'varchar(100)') like 'Running'
Yet this does not seem to return any rows at all. I would be really grateful if someone could point out what I am doing wrong here.
Thanks in advance,
You do declare the namespace e3q1 (although it's missing the http:// and you don't use it later...), but you did not declare the default namespace:
DECLARE @tbl TABLE([Xml] XML);
INSERT INTO @tbl VALUES
(
N'<EventD xmlns="http://example1" xmlns:e3q1="http://example2" xmlns:xsi="http://example3" xsi:type="e3q1:Valuechange">
<e3q1:NewValue>Running</e3q1:NewValue>
<e3q1:OldValue>Stopped</e3q1:OldValue>
</EventD>'
);
;WITH XMLNAMESPACES (DEFAULT 'http://example1', 'http://example2' as e3q1)
SELECT [Xml].value('(EventD/e3q1:NewValue)[1]', 'varchar(100)')
from @tbl AS Table1
WHERE Xml.value('(EventD/e3q1:NewValue)[1]', 'varchar(100)') like 'Running';
But this approach is, at least I think so, not what you really want. I think you are looking for .nodes(). The next lines show an alternative approach that replaces the namespaces with a wildcard, but I would recommend being as specific as possible.
SELECT Only.Running.value('text()[1]', 'varchar(100)')
from @tbl AS Table1
CROSS APPLY Xml.nodes('*:EventD/*:NewValue[text()="Running"]') AS Only(Running)
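If the goal is only to filter rows (rather than to shred the XML into one row per match), the exist() method is another option. A minimal sketch under the same namespace assumptions as above; the table variable and the trimmed sample document are illustrative only:

```sql
DECLARE @tbl TABLE ([Xml] xml);
INSERT INTO @tbl VALUES
(N'<EventD xmlns="http://example1" xmlns:e3q1="http://example2">
     <e3q1:NewValue>Running</e3q1:NewValue>
     <e3q1:OldValue>Stopped</e3q1:OldValue>
   </EventD>'),
(N'<EventD xmlns="http://example1" xmlns:e3q1="http://example2">
     <e3q1:NewValue>Stopped</e3q1:NewValue>
     <e3q1:OldValue>Running</e3q1:OldValue>
   </EventD>');

WITH XMLNAMESPACES (DEFAULT 'http://example1', 'http://example2' AS e3q1)
SELECT t.[Xml]
FROM @tbl AS t
-- exist() returns 1 when the path selects at least one node.
WHERE t.[Xml].exist('/EventD/e3q1:NewValue[text() = "Running"]') = 1;
```

Only the first row should come back, because exist() tests the path per row without extracting a value.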
It looks like Jeroen Mostert has already said everything necessary.
I can only add that the namespace prefix is not important, only the URI:
declare @xml xml='<EventD xmlns="http://example1" xmlns:e3q1="http://example2" xmlns:xsi="http://example3" xsi:type="e3q1:Valuechange">
<e3q1:NewValue>Running</e3q1:NewValue>
<e3q1:OldValue>Stopped</e3q1:OldValue>
<xsi:test>test</xsi:test>
<test1>test1</test1>
</EventD>'
;WITH XMLNAMESPACES ('http://example2' as test)
select
@xml.query('(*/test:*)')
Compare with the result of:
select
@xml.query('(*/*)')

Retrieving xml attribute using Xquery

I am using the query below to select values from the attributes and elements of an XML file, but I am not able to read the seq, id and ReportedDate attributes.
Can anyone please suggest how to get the values of attributes using this query?
select a_node.value('(./text())[1]', 'varchar(50)') AS c_val,
c1_node.value('(./text())[1]', 'varchar(50)') AS c_val2,
ca_node.value('(./text())[1]', 'varchar(50)') AS c_val3,
d_node.value('(./text())[1]', 'varchar(50)') ,
e_node.value('(./text())[1]', 'varchar(50)') ,
f_node.value('(./text())[1]', 'varchar(50)')
FROM @xmlData.nodes('/Reports/x:InquiryResponse/x:ReportData/x:AccountDetails/x:Account') AS b(b_node)
outer APPLY b.b_node.nodes('./x:primarykey') AS pK_InquiryResponse (a_node)
outer APPLY b.b_node.nodes('./x:seq') AS CustomerCode (c1_node)
outer APPLY b.b_node.nodes('./x:id') AS amount (ca_node)
outer APPLY b.b_node.nodes('./x:ReportedDate') AS CustRefField (d_node)
outer APPLY b.b_node.nodes('./x:AccountNumber') AS ReportOrderNO (e_node)
outer apply b.b_node.nodes('./x:CurrentBalance') as additional_id (f_node);
Edit: Xml Snippets Provided in Comments
<sch:Account seq="2" id="345778174" ReportedDate="2014-01-01">
<sch:AccountNumber>TSTC1595</sch:AccountNumber>
<sch:CurrentBalance>0</sch:CurrentBalance>
<sch:Institution>Muthoot Fincorp Limited</sch:Institution>
<sch:PastDueAmount>0</sch:PastDueAmount>
<sch:DisbursedAmount>12000</sch:DisbursedAmount>
<sch:LoanCategory>JOG Group</sch:LoanCategory>
</sch:Account>
<sch:Account seq="2" id="345778174" ReportedDate="2014-01-01">
<sch:BranchIDMFI>THRISSUR ROAD</sch:BranchIDMFI>
<sch:KendraIDMFI>COSTCO/RECENT-107</sch:KendraIDMFI>
</sch:Account>
Parsing XQuery with an Xml Loose @Variable
Assuming an Xml document similar to this (viz with all the attributes on one element):
DECLARE @xmlData XML =
N'<Reports xmlns:x="http://foo">
<x:InquiryResponse>
<x:ReportData>
<x:AccountDetails>
<x:Account x:primarykey="pk" x:seq="sq" x:id="id"
x:ReportedDate="2014-01-01T00:00:00" />
</x:AccountDetails>
</x:ReportData>
</x:InquiryResponse>
</Reports>';
You can scrape the attributes out as follows:
WITH XMLNAMESPACES('http://foo' AS x)
select
Nodes.node.value('(@x:primarykey)[1]', 'varchar(50)') AS c_val,
Nodes.node.value('(@x:seq)[1]', 'varchar(50)') AS c_val2,
Nodes.node.value('(@x:id)[1]', 'varchar(50)') AS c_val3,
Nodes.node.value('(@x:ReportedDate)[1]', 'DATETIME') as someDateTime
FROM
@xmlData.nodes('/Reports/x:InquiryResponse/x:ReportData/x:AccountDetails/x:Account')
AS Nodes(node);
Attributes don't need text(), as they are automatically strings.
It is fairly unusual to have attributes in a namespace; drop the xmlns alias prefix if they aren't.
SqlFiddle here
Edit - Parsing Xml Column
Namespace dropped from the attributes.
Assumed that you have the data in a table, not a variable, hence the APPLY requirement. Note that OUTER APPLY will return nulls, so it is useful only if you have rows with empty Xml or missing Xml elements. CROSS APPLY is the norm (viz. applying the xpath to each row selected on the LHS table).
Elements are accessed similarly to attributes, just without the @.
WITH XMLNAMESPACES('http://foo' AS x)
select
Nodes.node.value('(@seq)[1]', 'varchar(50)') AS c_val2,
Nodes.node.value('(@id)[1]', 'varchar(50)') AS c_val3,
Nodes.node.value('(@ReportedDate)[1]', 'DATETIME') as someDateTime,
Nodes.node.value('(x:AccountNumber)[1]', 'VARCHAR(50)') as accountNumber
FROM
MyXmlData z
CROSS APPLY
z.XmlColumn.nodes('/Reports/x:InquiryResponse/x:ReportData/x:AccountDetails/x:Account')
AS Nodes(node);
Updated Fiddle
Edit Xml File off Disk
Here's the same thing for an xml file read from disk. Note that once you have the data in an XML variable (@MyXmlData), you don't need to CROSS APPLY to anything; just supply the xpath to select the appropriate node, and then scrape out the elements and attributes.
DECLARE @MyXmlData XML;
SET @MyXmlData =
( SELECT * FROM OPENROWSET ( BULK N'c:\temp\file3098.xml', SINGLE_CLOB ) AS MyXmlData );
-- Assuming all on the one element, no need for all the applies
-- attributes don't have a text axis (they are automatically strings)
WITH XMLNAMESPACES('http://foo' AS x)
select
Nodes.node.value('(@seq)[1]', 'varchar(50)') AS c_val2,
Nodes.node.value('(@id)[1]', 'varchar(50)') AS c_val3,
Nodes.node.value('(@ReportedDate)[1]', 'DATETIME') as someDateTime,
Nodes.node.value('(x:AccountNumber)[1]', 'VARCHAR(50)') as accountNumber
FROM
@MyXmlData.nodes('/Reports/x:InquiryResponse/x:ReportData/x:AccountDetails/x:Account')
AS Nodes(node);
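For the snippet posted in the comments, which uses the sch: prefix, a sketch along the same lines might look like the following. The namespace URI http://example.com/sch is a placeholder (the real one was not posted), and the wrapping elements are assumed from the path in the question. Note that the prefix used in the query (x) does not have to match the prefix in the document; only the URI has to match.

```sql
-- The namespace URI is a placeholder; use the one declared in the real file.
DECLARE @xmlData xml = N'
<Reports xmlns:sch="http://example.com/sch">
  <sch:InquiryResponse><sch:ReportData><sch:AccountDetails>
    <sch:Account seq="2" id="345778174" ReportedDate="2014-01-01">
      <sch:AccountNumber>TSTC1595</sch:AccountNumber>
      <sch:CurrentBalance>0</sch:CurrentBalance>
    </sch:Account>
  </sch:AccountDetails></sch:ReportData></sch:InquiryResponse>
</Reports>';

WITH XMLNAMESPACES ('http://example.com/sch' AS x)
SELECT A.n.value('@seq', 'int')                                   AS seq,
       A.n.value('@id', 'bigint')                                 AS id,
       A.n.value('@ReportedDate', 'date')                         AS ReportedDate,
       A.n.value('(x:AccountNumber/text())[1]', 'varchar(50)')    AS AccountNumber,
       A.n.value('(x:CurrentBalance/text())[1]', 'decimal(18,2)') AS CurrentBalance
FROM @xmlData.nodes('/Reports/x:InquiryResponse/x:ReportData/x:AccountDetails/x:Account') AS A(n);
```

The attributes are unprefixed here because, in the posted snippet, they are not in a namespace.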

Baseball XML to SQL query - optimize

The source data comes from the following freely available XML files describing major league baseball games.
http://gd2.mlb.com/components/game/mlb/year_2013/month_04/day_09/gid_2013_04_09_atlmlb_miamlb_1/inning/
I have created a SQL Server table that contains a row for every GamePK/inning, with an XML column named PBP. Each file in the folder above becomes a row in this table. The query below is my attempt to parse the XML into a record set. It works but is very slow for a large number of rows, and very repetitive - seems like there should be a better way to do this without the UNION clause. Any help in improving/optimizing is appreciated
select
i.GamePK
,inn.value('@num', 'int') as inning
,itop.value('1', 'int') as IsTop
,itop.value('@num', 'int') as abNum
,itop.value('@batter', 'int') as batter
-- clip
,itoppit.value('@des', 'varchar(32)') as pitdesc
,itoppit.value('@id', 'int') as seq
,itoppit.value('@type', 'varchar(8)') as pittype
-- clip
from tblInnings i
cross apply PBP.nodes('/inning') as inn(inn)
cross apply inn.nodes('top/atbat') as itop(itop)
cross apply itop.nodes('pitch') as itoppit(itoppit)
union
select
i.GamePK
,inn.value('@num', 'int') as inning
,ibot.value('0', 'int') as IsTop
,ibot.value('@num', 'int') as abNum
,ibot.value('@batter', 'int') as batter
-- clip
,ibotpit.value('@des', 'varchar(32)') as pitdesc
,ibotpit.value('@id', 'int') as seq
,ibotpit.value('@type', 'varchar(8)') as pittype
--clip
from tblInnings i
cross apply PBP.nodes('/inning') as inn(inn)
cross apply inn.nodes('bottom/atbat') as ibot(ibot)
cross apply ibot.nodes('pitch') as ibotpit(ibotpit)
If you're using a recent version of SQL Server, there's a new column data type (XML).
You can apply xpath to it, making querying the column much easier.
Instead of trying to store the XML as a string in your DB, I'd suggest you actually store it as XML, and treat it as XML.
There is a learning curve. You'll need to be familiar with XPATH, but it's not rocket science.
An example:
SELECT Id, PartitionMonth, EmailAddress, AcquisitionCodeId, FieldValues.value('
declare namespace s="http://domain.com/FieldValues.xsd";
data(/s:FieldValues/s:item/@value)[1]', 'varchar(200)')
FROM Leads.Leads WITH (NOLOCK)
WHERE Id = 190708
Another example retrieving values by key:
SELECT r.EmailAddress, ar.Ip, ar.DateLog,
ar.FieldValues.value('
declare namespace s="http://domain.com/FieldValues.xsd";
data(/s:FieldValues/s:item[@key="First Name"]/@value)[1]', 'varchar(20)') FirstName,
ar.FieldValues.value('
declare namespace s="http://domain.com/FieldValues.xsd";
data(/s:FieldValues/s:item[@key="Last Name"]/@value)[1]', 'varchar(20)') LastName
FROM Records.Records r WITH (NOLOCK)
JOIN Records.AcquisitionRecords ar WITH (NOLOCK) ON r.Id = ar.Id
WHERE ar.AcquisitionCodeId IN (19, 21, 30, 34, 36)
AND ar.DateLog BETWEEN '1-mar-09' AND '31-mar-09'
A good place to get started on XML in SQL Server
http://msdn.microsoft.com/en-US/library/ms189887(v=sql.90).aspx
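On the UNION itself: one possible way to avoid repeating the query is to shred the top and bottom halves in a single pass and derive IsTop from the element name. The sketch below uses a table variable with made-up sample data in place of tblInnings; the attribute names are taken from the question, and the aliases half, ab and pit are illustrative. Treat it as something to test against the real data, not as a measured improvement.

```sql
-- Hypothetical sample row standing in for tblInnings.
DECLARE @Innings TABLE (GamePK int, PBP xml);
INSERT INTO @Innings (GamePK, PBP) VALUES
(1, N'<inning num="1">
        <top><atbat num="1" batter="100"><pitch des="Ball" id="3" type="B" /></atbat></top>
        <bottom><atbat num="2" batter="200"><pitch des="Called Strike" id="9" type="S" /></atbat></bottom>
      </inning>');

SELECT i.GamePK,
       inn.n.value('@num', 'int')          AS inning,
       -- The half-inning element name tells us whether this is the top.
       CASE half.n.value('local-name(.)', 'varchar(6)')
            WHEN 'top' THEN 1 ELSE 0 END   AS IsTop,
       ab.n.value('@num', 'int')           AS abNum,
       ab.n.value('@batter', 'int')        AS batter,
       pit.n.value('@des', 'varchar(32)')  AS pitdesc,
       pit.n.value('@id', 'int')           AS seq,
       pit.n.value('@type', 'varchar(8)')  AS pittype
FROM @Innings AS i
CROSS APPLY i.PBP.nodes('/inning') AS inn(n)
CROSS APPLY inn.n.nodes('*[local-name() = "top" or local-name() = "bottom"]') AS half(n)
CROSS APPLY half.n.nodes('atbat') AS ab(n)
CROSS APPLY ab.n.nodes('pitch') AS pit(n);
```

Besides shredding each document only once, this drops the duplicate elimination that UNION (as opposed to UNION ALL) performs across the two result sets.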

XPath in T-SQL query

I have two tables, XMLtable and filterTable.
I need all the XMLtable.ID values from XMLtable where the data in Col_X contains MyElement, the contents of which matches filterColumn in filterTable.
The XML for each row in Col_X may contain multiple MyElement's, and I want that ID in case ANY of those elements match ANY of the values in filterColumn.
The problem is that those columns are actually of varchar(max) datatype, and the table itself is huge (like 50GB huge). So this query needs to be as optimized as possible.
Here's an example for where I am now, which merely returns the row where the first matching element equals one of the ones I'm looking for. Due to a plethora of different error messages I can't seem to be able to change this to compare to all of the same named elements as I want to.
SELECT ID,
CAST(Col_X AS XML).value('(//*[local-name()=''MyElement''])[1]', N'varchar(25)')
FROM XMLtable
...and then compare the results to filterTable. This already takes 5+ minutes.
What I'm trying to achieve is something like:
SELECT ID
FROM XMLtable
WHERE CAST(Col_X AS XML).query('(//*[local-name()=''MyElement''])')
IN (SELECT filterColumn FROM filterTable)
The only way I can currently achieve this is to use the LIKE operator, which takes like a thousand times longer.
Now, obviously it's not an option to start changing the datatypes of the columns or anything else. This is what I have to work with. :)
Try this:
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'VARCHAR(25)') AS 'MyElementValue'
FROM XMLTable
CROSS APPLY (SELECT CAST(Col_X AS XML)) as X(Col_X)
CROSS APPLY X.Col_X.nodes('(//*[local-name()="MyElement"])') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
and this:
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'VARCHAR(25)') AS 'MyElementValue'
FROM XMLTable
CROSS APPLY (SELECT CAST(Col_X AS XML)) as X(Col_X)
CROSS APPLY X.Col_X.nodes('//MyElement') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
Update
I think that you are experiencing what is described in "Compute Scalars, Expressions and Execution Plan Performance": the cast to XML is deferred to each call to the value function. The test you should make is to change the datatype of Col_X to XML.
If that is not an option you could query the rows you need from XMLTable into a temporary table that has an XML column and then do the query above against the temporary table without the need to cast to XML.
CREATE TABLE #XMLTable
(
ID int,
Col_X xml
)
INSERT INTO #XMLTable(ID, Col_X)
SELECT ID, Col_X
FROM XMLTable
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'varchar(25)') AS 'MyElementValue'
FROM #XMLTable
CROSS APPLY Col_X.nodes('//MyElement') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
DROP TABLE #XMLTable
You could try something like this. It does at least functionally do what you want, I believe. You'll have to explore its performance with your data set empirically.
SELECT ID
FROM
(
SELECT xt.ID, CAST(xt.Col_X AS XML) [content] FROM XMLTable AS xt
) AS src
INNER JOIN FilterTable AS f
ON f.filterColumn IN
(
SELECT
elt.value('.', 'varchar(25)')
FROM src.content.nodes('//MyElement') AS T(elt)
)
I finally got this working, and with far better performance than I expected. Below is the script that finally produced the correct result in 5 - 6 minutes.
SELECT ID, myE.value('.', N'VARCHAR(25)') AS 'MyElementValue'
FROM (SELECT ID, CAST(Col_X AS XML) AS Col_X
FROM XMLTable) T1
CROSS APPLY Col_X.nodes('(//*[local-name()=''MyElement''])') AS T2(myE)
WHERE myE.value('.', N'varchar(25)') IN (SELECT filterColumn FROM filterTable)
Thanks for the help tho people!
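Since only the IDs are needed, another variant worth trying is a semi-join with exist() and sql:column(), which checks for a match without producing one row per MyElement. This is only a sketch with made-up sample data in table variables, and it assumes the XML has already been cast or copied into an xml column as suggested above. How it performs on a 50GB table would have to be measured.

```sql
-- Hypothetical stand-ins for XMLtable (already converted to xml) and filterTable.
DECLARE @XMLTable TABLE (ID int, Col_X xml);
DECLARE @filterTable TABLE (filterColumn varchar(25));

INSERT INTO @XMLTable (ID, Col_X) VALUES
(1, N'<doc><MyElement>A</MyElement><MyElement>B</MyElement></doc>'),
(2, N'<doc><MyElement>C</MyElement></doc>');
INSERT INTO @filterTable (filterColumn) VALUES ('B'), ('Z');

SELECT x.ID
FROM @XMLTable AS x
WHERE EXISTS
(
    SELECT 1
    FROM @filterTable AS f
    -- True when ANY MyElement in the row equals this filter value.
    WHERE x.Col_X.exist('//MyElement[. = sql:column("f.filterColumn")]') = 1
);
```

With this sample data only ID 1 should be returned, since one of its MyElement values is B.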
