SQL Server XQuery - Selecting a Subset - sql-server

Take for example the following XML:
Initial Data
<computer_book>
<title>Selecting XML Nodes the Fun and Easy Way</title>
<isbn>9999999999999</isbn>
<pages>500</pages>
<backing>paperback</backing>
</computer_book>
and:
<cooking_book>
<title>50 Quick and Easy XML Dishes</title>
<isbn>5555555555555</isbn>
<pages>275</pages>
<backing>paperback</backing>
</cooking_book>
I have something similar in a single xml-typed column of a SQL Server 2008 database. Using SQL Server XQuery, would it be possible to get results such as this:
Resulting Data
<computer_book>
<title>Selecting XML Nodes the Fun and Easy Way</title>
<pages>500</pages>
</computer_book>
and:
<cooking_book>
<title>50 Quick and Easy XML Dishes</title>
<isbn>5555555555555</isbn>
</cooking_book>
Please note that I am not referring to selecting both examples in one query; rather I am selecting each via its primary key (which is in another column). In each case, I am essentially trying to select the root and an arbitrary subset of children. The roots can be different, as seen above, so I do not believe I can hard-code the root node name into a "for xml" clause.
I have a feeling SQL Server's XQuery capabilities will not allow this, and that is fine if it is the case. If I can accomplish this, however, I would greatly appreciate an example.

Here is the test data I used in the queries below:
declare @T table (XMLCol xml)
insert into @T values
('<computer_book>
<title>Selecting XML Nodes the Fun and Easy Way</title>
<isbn>9999999999999</isbn>
<pages>500</pages>
<backing>paperback</backing>
</computer_book>'),
('<cooking_book>
<title>50 Quick and Easy XML Dishes</title>
<isbn>5555555555555</isbn>
<pages>275</pages>
<backing>paperback</backing>
</cooking_book>')
You can filter the nodes under the root node using local-name() and a list of the node names you want:
select XMLCol.query('/*/*[local-name()=("isbn","pages")]')
from @T
Result:
<isbn>9999999999999</isbn><pages>500</pages>
<isbn>5555555555555</isbn><pages>275</pages>
If I understand you correctly, the problem with this is that you don't get the root node back.
This query will give you an empty root node:
select cast('<'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'/>' as xml)
from @T
Result:
<computer_book />
<cooking_book />
From this I have found two solutions for you.
Solution 1
Get the nodes from your table into a table variable and then modify the XML to look the way you want.
-- Table variable to hold the node(s) you want
declare @T2 table (RootNode xml, ChildNodes xml)
-- Fetch the xml from your table
insert into @T2
select cast('<'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'/>' as xml),
XMLCol.query('/*/*[local-name()=("isbn","pages")]')
from @T
-- Add the child nodes to the root node
update @T2 set
RootNode.modify('insert sql:column("ChildNodes") into (/*)[1]')
-- Fetch the modified XML
select RootNode
from @T2
Result:
RootNode
<computer_book><isbn>9999999999999</isbn><pages>500</pages></computer_book>
<cooking_book><isbn>5555555555555</isbn><pages>275</pages></cooking_book>
The sad part with this solution is that it does not work with SQL Server 2005.
Solution 2
Get the parts, build the XML as a string and cast it back to XML.
select cast('<'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'>'+
cast(XMLCol.query('/*/*[local-name()=("isbn","pages")]') as varchar(max))+
'</'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'>' as xml)
from @T
Result:
<computer_book><isbn>9999999999999</isbn><pages>500</pages></computer_book>
<cooking_book><isbn>5555555555555</isbn><pages>275</pages></cooking_book>
Making the nodes parameterized
In the queries above, the child nodes you select are hard-coded in the query. You can use sql:variable() to supply them instead. I have not found a way of making the number of nodes dynamic, but you can declare as many variables as you think you need and set the ones you don't need to null.
declare @N1 varchar(10)
declare @N2 varchar(10)
declare @N3 varchar(10)
declare @N4 varchar(10)
set @N1 = 'isbn'
set @N2 = 'pages'
set @N3 = 'backing'
set @N4 = null
select cast('<'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'>'+
cast(XMLCol.query('/*/*[local-name()=(sql:variable("@N1"),
sql:variable("@N2"),
sql:variable("@N3"),
sql:variable("@N4"))]') as varchar(max))+
'</'+XMLCol.value('local-name(/*[1])', 'varchar(100)')+'>' as xml)
from @T
Result:
<computer_book><isbn>9999999999999</isbn><pages>500</pages><backing>paperback</backing></computer_book>
<cooking_book><isbn>5555555555555</isbn><pages>275</pages><backing>paperback</backing></cooking_book>
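Another option, offered as an untested sketch and not as part of the answer above, is to turn the problem around: copy the row into a variable and delete the children you do not want. The root element stays in place whatever its name is, so no string concatenation is needed. The key column ID and the variable @X are assumptions made for illustration.

```sql
-- Hypothetical test table with a key column (ID is an assumption, not from the question)
declare @T table (ID int primary key, XMLCol xml);
insert into @T values
(1, '<computer_book><title>Selecting XML Nodes the Fun and Easy Way</title><isbn>9999999999999</isbn><pages>500</pages><backing>paperback</backing></computer_book>'),
(2, '<cooking_book><title>50 Quick and Easy XML Dishes</title><isbn>5555555555555</isbn><pages>275</pages><backing>paperback</backing></cooking_book>');

-- Copy the wanted row into a variable so that modify() can change it
declare @X xml;
select @X = XMLCol from @T where ID = 1;

-- Delete every child of the root whose name is not in the list
set @X.modify('delete /*/*[not(local-name()=("title","pages"))]');

-- Should leave the root element with only its title and pages children
select @X;
```

The stored row is not touched because modify() runs against the copy in @X.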

Related

Can't figure out how to search XML column in my table

I have a table called v_EpisodeAudit, with a column called EventData that contains XML data. The XML data differs from row to row, so one record could have XML data in this column that looks like this:
<AddMDMDocument>
<EpisodeMDMId>282521</EpisodeMDMId>
<OncologyReferral>0</OncologyReferral>
<SpecialPalliativeReferral>0</SpecialPalliativeReferral>
<SurgeonReferral>0</SurgeonReferral>
<MDMReport>0</MDMReport>
<GPReferral>0</GPReferral>
<GPReferralApproval>0</GPReferralApproval>
<GeneralPalliativeCare>0</GeneralPalliativeCare>
<AuditLogin>mkell010</AuditLogin>
<AuditTrust>4</AuditTrust>
<Error />
</AddMDMDocument>
while another row might contain the following XML data:
<CloseEpisode>
<EpisodeId>652503</EpisodeId>
<TrackingStatusId>9</TrackingStatusId>
<TrackingClosureReason>100</TrackingClosureReason>
<DateOfTrackingClosure>Sep 25 2017 12:37PM</DateOfTrackingClosure>
<AuditLogin>ccass001</AuditLogin>
<AuditTrust>1</AuditTrust>
<Error />
</CloseEpisode>
And there are further differing types/configurations of XML data. I've read about 20 different sources this morning trying to work out how to search against the XML data in this column to get a specific EpisodeId in the CloseEpisode XMLs, and I can't for the life of me figure it out. Can anyone help me with a query that will find a specified EpisodeId in this column?
XML can be queried very generically. Some approaches:
DECLARE @v_EpisodeAudit TABLE(ID INT IDENTITY, [EventData] XML);
INSERT INTO @v_EpisodeAudit VALUES
(N'<AddMDMDocument>
<EpisodeMDMId>282521</EpisodeMDMId>
<OncologyReferral>0</OncologyReferral>
<SpecialPalliativeReferral>0</SpecialPalliativeReferral>
<SurgeonReferral>0</SurgeonReferral>
<MDMReport>0</MDMReport>
<GPReferral>0</GPReferral>
<GPReferralApproval>0</GPReferralApproval>
<GeneralPalliativeCare>0</GeneralPalliativeCare>
<AuditLogin>mkell010</AuditLogin>
<AuditTrust>4</AuditTrust>
<Error />
</AddMDMDocument>')
,(N'<CloseEpisode>
<EpisodeId>652503</EpisodeId>
<TrackingStatusId>9</TrackingStatusId>
<TrackingClosureReason>100</TrackingClosureReason>
<DateOfTrackingClosure>Sep 25 2017 12:37PM</DateOfTrackingClosure>
<AuditLogin>ccass001</AuditLogin>
<AuditTrust>1</AuditTrust>
<Error />
</CloseEpisode>');
--This will return the very first node on the second level
SELECT ID
,vEA.[EventData].value(N'local-name(/*[1]/*[1])',N'nvarchar(max)') AS NodeName
,vEA.[EventData].value(N'/*[1]/*[1]/text()[1]',N'nvarchar(max)') AS NodeValue
FROM @v_EpisodeAudit AS vEA
--This will return all nodes of the second level and use WHERE with LIKE to find the Episode..Id elements
SELECT ID
,SecondLevelNode.Nd.value(N'local-name(.)',N'nvarchar(max)') AS NodeName
,SecondLevelNode.Nd.value(N'text()[1]',N'nvarchar(max)') AS NodeValue
FROM @v_EpisodeAudit AS vEA
OUTER APPLY vEA.[EventData].nodes(N'/*/*') AS SecondLevelNode(Nd)
WHERE SecondLevelNode.Nd.value(N'local-name(.)',N'nvarchar(max)') LIKE 'Episode%' --or LIKE 'Episode%Id'
--Similar but filtering on XQuery level
SELECT ID
,SecondLevelNode.Nd.value(N'local-name(.)',N'nvarchar(max)') AS NodeName
,SecondLevelNode.Nd.value(N'text()[1]',N'nvarchar(max)') AS NodeValue
FROM @v_EpisodeAudit AS vEA
OUTER APPLY vEA.[EventData].nodes(N'/*/*[substring(local-name(),1,7)="Episode"]') AS SecondLevelNode(Nd)
Use the xml querying functions
select EventData.value('(/CloseEpisode/EpisodeId)[1]','int')
from v_EpisodeAudit
where EventData.value('local-name(/*[1])','varchar(100)')='CloseEpisode'
or perhaps
select EventData
from v_EpisodeAudit
where EventData.value('(/CloseEpisode/EpisodeId)[1]','int')=652503
depending on what you're trying to do.
If you don't know the root node name, you could use
select EventData.value('(//EpisodeId)[1]','int')
from v_EpisodeAudit
where EventData.exist('//EpisodeId')=1
See https://learn.microsoft.com/en-us/sql/t-sql/xml/value-method-xml-data-type
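If the EpisodeId you are looking for arrives as a parameter, the filter can be pushed into the XQuery predicate with sql:variable(). The following is a minimal sketch under that assumption; the table variable is shortened test data and @EpisodeId is a hypothetical parameter name.

```sql
-- Shortened test data; only the elements needed for the search are kept
DECLARE @v_EpisodeAudit TABLE (ID INT IDENTITY, EventData XML);
INSERT INTO @v_EpisodeAudit VALUES
 (N'<AddMDMDocument><EpisodeMDMId>282521</EpisodeMDMId></AddMDMDocument>')
,(N'<CloseEpisode><EpisodeId>652503</EpisodeId><AuditLogin>ccass001</AuditLogin></CloseEpisode>');

-- Hypothetical parameter holding the id to search for
DECLARE @EpisodeId INT;
SET @EpisodeId = 652503;

-- exist() returns 1 for rows that contain a matching CloseEpisode/EpisodeId
SELECT ID, EventData
FROM @v_EpisodeAudit
WHERE EventData.exist(N'/CloseEpisode/EpisodeId[. = sql:variable("@EpisodeId")]') = 1;
```

Using exist() in the WHERE clause avoids converting every row's value with value() just to compare it.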

Getting multiple values from same xml column in SQL Server

I want to get the values from same xml node under same element.
Sample data:
I have to select all <award_number> values.
This is my SQL code:
DECLARE @xml XML;
DECLARE @filePath varchar(max);
SET @filePath = '<workFlowMeta><fundgroup><funder><award_number>0710564</award_number><award_number>1106058</award_number><award_number>1304977</award_number><award_number>1407404</award_number></funder></fundgroup></workFlowMeta>'
SET @xml = CAST(@filePath AS XML);
SELECT
REPLACE(Element.value('award_number','NVARCHAR(255)'), CHAR(10), '') AS award_num
FROM
@xml.nodes('workFlowMeta/fundgroup/funder') Datalist(Element);
I can't change the @xml.nodes('workFlowMeta/fundgroup/funder') call, because I'm getting multiple node values inside the funder node.
Can anyone please help me?
Since those <award_number> nodes are inside the <funder> nodes, and there could be several <funder> nodes (if I understood your question correctly), you need to use two .nodes() calls like this:
SELECT
XC.value('.', 'int')
FROM
@xml.nodes('/workFlowMeta/fundgroup/funder') Datalist(Element)
CROSS APPLY
Element.nodes('award_number') AS XT(XC)
The first .nodes() call gets all <funder> elements, and then the second call goes into each <funder> element to get all <award_number> nodes inside that element and outputs the value of each <award_number> element as an INT. (I couldn't quite understand what you're trying to do to the <award_number> value in your code sample.)
Your own code was very close, but
You are not diving deep enough: your .nodes() path stops at <funder>, one level above <award_number>.
You need a singleton XPath for .value(). In most cases this means a [1] at the end.
As you want to read many <award_number> elements, this is the level you have to step down to in .nodes(). Reading these elements' values is easy once you have your hands on them:
SELECT
REPLACE(Element.value('text()[1]','NVARCHAR(255)'), CHAR(10), '') AS award_num
FROM
@xml.nodes('/workFlowMeta/fundgroup/funder/award_number') Datalist(Element);
What are you trying to do with the REPLACE()?
If all <award_number> elements contain valid numbers, you should use int or bigint as target type and there shouldn't be any need to replace non-numeric characters. Try it like this:
SELECT Element.value('text()[1]','int') AS award_num
FROM @xml.nodes('/workFlowMeta/fundgroup/funder/award_number') Datalist(Element);
If marc_s is correct...
... and you have to deal with several <funder> groups, each of which contains several <award_number> nodes, go with his approach (two calls to .nodes())
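To make the several-<funder> case concrete, here is a small self-contained sketch with two <funder> groups. The second group is invented test data. It reads the values as nvarchar instead of int, because an award number such as 0710564 loses its leading zero once it is converted to a number.

```sql
-- Test data with two funder groups (the second group is made up for illustration)
DECLARE @xml XML;
SET @xml = N'<workFlowMeta><fundgroup>
<funder><award_number>0710564</award_number><award_number>1106058</award_number></funder>
<funder><award_number>1304977</award_number></funder>
</fundgroup></workFlowMeta>';

-- First .nodes(): one row per funder; second .nodes(): one row per award_number in it
SELECT Award.Nd.value('text()[1]', 'nvarchar(255)') AS award_num
FROM @xml.nodes('/workFlowMeta/fundgroup/funder') AS Funder(Nd)
CROSS APPLY Funder.Nd.nodes('award_number') AS Award(Nd);
```

Whether the leading zero matters depends on whether the award number is an identifier or a quantity; for identifiers a string type is the safer choice.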

Trying to transfer xml data into SQL as elements (not attribute)

I am new at this so please bear with me. I am attempting to transfer some XML data into Microsoft SQL Server. I am assuming that this data needs to be transferred as elements and not attributes because the contents of the columns will not be static.
However for some reason when I attempt to transfer the data as elements I get NULL values. But when I try to transfer this same data as attributes it works and looks the way it is supposed to. I am tempted to shrug and just move on but I'm worried that things might go awry for me if I do that later on down the road.
I already have some attributes from this XML that I managed to transfer as attributes which I plan to combine with these elements that are masquerading as attributes into a single table. Will it work? And if it does will there be problems down the road?
Here is my SQL code when I attempt to transfer the elements as elements:
SELECT *
FROM OPENXML (@hdoc, '/roll/voter', 2)
WITH (
id int,
[value] char(50),
[state] char(2))
Here is my SQL code when I attempt to transfer the elements as attributes:
SELECT *
FROM OPENXML (@hdoc, '/roll/voter', 1)
WITH (
id int,
[value] char(50),
[state] char(2))
Here is a miniaturized version of the XML document:
<roll>
<voter id="400048" value="Yea" state="FL" />
<voter id="412516" value="Yea" state="CA" />
</roll>
Here is a link to the xml document via google drive (very small XML): https://drive.google.com/open?id=0B5VgOwWcGeLHaWctRU56Qlk3UWM
A screenshot of my SQL query, the table results, and the XML
FROM OPENXML is outdated and should not be used anymore (rare exceptions exist)...
Try with the real XML methods:
DECLARE @xml XML=
N'<roll>
<voter id="400048" value="Yea" state="FL" />
<voter id="412516" value="Yea" state="CA" />
</roll>';
SELECT @xml.value(N'(/roll/voter/@id)[1]',N'int') AS voter_id
,@xml.value(N'(/roll/voter/@value)[1]',N'nvarchar(max)') AS voter_value
,@xml.value(N'(/roll/voter/@state)[1]',N'nvarchar(max)') AS voter_state
The result
voter_id voter_value voter_state
400048 Yea FL
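The query above reads only the first <voter>. To get one row per voter, .nodes() can shred the document first. This is a minimal sketch using the sample XML from the question; the aliases A and v are arbitrary. It also shows why the element-centric OPENXML call (flag 2) returned NULLs: id, value and state are attributes of <voter>, not child elements, so they are addressed with @.

```sql
DECLARE @xml XML;
SET @xml = N'<roll>
<voter id="400048" value="Yea" state="FL" />
<voter id="412516" value="Yea" state="CA" />
</roll>';

-- .nodes() returns one row per <voter>; each attribute is read relative to that row
SELECT v.value(N'@id', N'int') AS voter_id
      ,v.value(N'@value', N'nvarchar(50)') AS voter_value
      ,v.value(N'@state', N'char(2)') AS voter_state
FROM @xml.nodes(N'/roll/voter') AS A(v);
```

Reading the data as attributes is not a workaround here; it simply matches how the document is structured.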

In SQL Server can I insert multiple nodes into XML from a table?

I want to generate some XML in a stored procedure based on data in a table.
The following insert allows me to add many nodes but they have to be hard-coded or use variables (sql:variable):
SET @MyXml.modify('
insert
<myNode>
{sql:variable("@MyVariable")}
</myNode>
into (/root[1]) ')
So I could loop through each record in my table, put the values I need into variables and execute the above statement.
But is there a way I can do this by just combining with a select statement and avoiding the loop?
Edit I have used SELECT FOR XML to do similar stuff before but I always find it hard to read when working with a hierarchy of data from multiple tables. I was hoping there would be something using the modify where the XML generated is more explicit and more controllable.
Have you tried nesting FOR XML PATH scalar valued functions?
With the nesting technique, you can break your SQL into very manageable, readable elemental pieces.
Disclaimer: the following, while adapted from a working example, has not itself been literally tested
Some reference links for the general audience
http://msdn2.microsoft.com/en-us/library/ms178107(SQL.90).aspx
http://msdn2.microsoft.com/en-us/library/ms189885(SQL.90).aspx
The simplest, lowest level nested node example
Consider the following invocation
DECLARE @NestedInput_SpecificDogNameId int
SET @NestedInput_SpecificDogNameId = 99
SELECT [dbo].[udfGetLowestLevelNestedNode_SpecificDogName]
(@NestedInput_SpecificDogNameId)
Let's say udfGetLowestLevelNestedNode_SpecificDogName had been written without the FOR XML PATH clause, and for @NestedInput_SpecificDogNameId = 99 it returned a rowset with a single record:
@SpecificDogNameId DogName
99 Astro
But with the FOR XML PATH clause,
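CREATE FUNCTION dbo.udfGetLowestLevelNestedNode_SpecificDogName
(
@NestedInput_SpecificDogNameId int
)
RETURNS XML
AS
BEGIN
-- Declare the return variable here
DECLARE @ResultVar XML
-- Add the T-SQL statements to compute the return value here
SET @ResultVar =
(
SELECT
-- The "@" alias turns the column into an attribute of <Dog>
t.SpecificDogNameId as "@SpecificDogNameId",
-- The "text()" alias puts the name directly inside <Dog> as text
t.DogName as "text()"
FROM tblDogs t
-- Return only the requested dog
WHERE t.SpecificDogNameId = @NestedInput_SpecificDogNameId
FOR XML PATH('Dog'), TYPE
)
-- Return the result of the function
RETURN @ResultVar
END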
the user-defined function produces the following XML (the @ sign causes the SpecificDogNameId field to be returned as an attribute)
<Dog SpecificDogNameId="99">Astro</Dog>
Nesting User-defined Functions of XML Type
User-defined functions such as the above udfGetLowestLevelNestedNode_SpecificDogName can be nested to provide a powerful method to produce complex XML.
For example, the function
CREATE FUNCTION [dbo].[udfGetDogCollectionNode]()
RETURNS XML
AS
BEGIN
-- Declare the return variable here
DECLARE @ResultVar XML
-- Add the T-SQL statements to compute the return value here
SET @ResultVar =
(
SELECT
[dbo].[udfGetLowestLevelNestedNode_SpecificDogName]
(t.SpecificDogNameId)
FROM tblDogs t
-- PATH('') adds no per-row wrapper; ROOT wraps all rows in one <DogCollection>
FOR XML PATH(''), ROOT('DogCollection'), TYPE
)
-- Return the result of the function
RETURN @ResultVar
END
when invoked as
SELECT [dbo].[udfGetDogCollectionNode]()
might produce the complex XML node (given the appropriate underlying data)
<DogCollection>
<Dog SpecificDogNameId="88">Dino</Dog>
<Dog SpecificDogNameId="99">Astro</Dog>
</DogCollection>
From here, you could keep working upwards in the nested tree to build as complex an XML structure as you please
CREATE FUNCTION [dbo].[udfGetAnimalCollectionNode]()
RETURNS XML
AS
BEGIN
DECLARE @ResultVar XML
SET @ResultVar =
(
SELECT
dbo.udfGetDogCollectionNode(),
dbo.udfGetCatCollectionNode()
FOR XML PATH('AnimalCollection'), ELEMENTS XSINIL
)
RETURN @ResultVar
END
when invoked as
SELECT [dbo].[udfGetAnimalCollectionNode]()
the udf might produce the more complex XML node (given the appropriate underlying data)
<AnimalCollection>
<DogCollection>
<Dog SpecificDogNameId="88">Dino</Dog>
<Dog SpecificDogNameId="99">Astro</Dog>
</DogCollection>
<CatCollection>
<Cat SpecificCatNameId="11">Sylvester</Cat>
<Cat SpecificCatNameId="22">Tom</Cat>
<Cat SpecificCatNameId="33">Felix</Cat>
</CatCollection>
</AnimalCollection>
Use sql:column instead of sql:variable. You can find detailed info here: http://msdn.microsoft.com/en-us/library/ms191214.aspx
Can you tell us a bit more about what exactly you are planning to do?
Is it simply generating XML data based on the content of the table,
or adding some data from the table to an existing XML structure?
There is a great series of articles on XML in SQL Server written by Jacob Sebastian; it starts with the basics of generating XML from the data in a table.
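One way to avoid the loop while still using modify() is to build all the new nodes with a single FOR XML query and insert them in one statement. This is a sketch under stated assumptions: the table variable @Items and its Name column are hypothetical stand-ins for the real table, and inserting an xml-typed variable through sql:variable() requires SQL Server 2008 or later.

```sql
-- Hypothetical source table standing in for the real one
DECLARE @Items TABLE (Name nvarchar(50));
INSERT INTO @Items VALUES (N'first');
INSERT INTO @Items VALUES (N'second');

-- Existing document that should receive the new nodes
DECLARE @MyXml xml;
SET @MyXml = N'<root />';

-- Build one <myNode> per row in a single set-based query
DECLARE @NewNodes xml;
SET @NewNodes = (SELECT Name AS [text()] FROM @Items FOR XML PATH('myNode'), TYPE);

-- Insert all generated nodes at once (xml-typed sql:variable needs SQL Server 2008+)
SET @MyXml.modify('insert sql:variable("@NewNodes") into (/root)[1]');

SELECT @MyXml;
```

The FOR XML part stays small and flat, while the placement of the nodes remains explicit in the modify() call.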

Using SQL Server 2005's XQuery select all nodes with a specific attribute value, or with that attribute missing

Update: giving a much more thorough example.
The first two solutions offered were right along the lines of what I was trying to say not to do. I can't know location, it needs to be able to look at the whole document tree. So a solution along these lines, with /Books/ specified as the context will not work:
SELECT x.query('.') FROM @xml.nodes('/Books/*[not(@ID) or @ID = 5]') x1(x)
Original question with better example:
Using SQL Server 2005's XQuery implementation I need to select all nodes in an XML document, just once each and keeping their original structure, but only if they are missing a particular attribute, or that attribute has a specific value (passed in by parameter). The query also has to work on the whole XML document (descendant-or-self axis) rather than selecting at a predefined depth.
That is to say, each individual node will appear in the resultant document only if it and every one of its ancestors are missing the attribute, or have the attribute with a single specific value.
For example:
If this were the XML:
DECLARE @Xml XML
SET @Xml =
N'
<Library>
<Novels>
<Novel category="1">Novel1</Novel>
<Novel category="2">Novel2</Novel>
<Novel>Novel3</Novel>
<Novel category="4">Novel4</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
<Volume category="2">G-L</Volume>
<Volume category="3">M-S</Volume>
<Volume category="4">T-Z</Volume>
</Encyclopedia>
</Encyclopedias>
<Dictionaries category="1">
<Dictionary>Webster</Dictionary>
<Dictionary>Oxford</Dictionary>
</Dictionaries>
</Library>
'
A parameter of 1 for category would result in this:
<Library>
<Novels>
<Novel category="1">Novel1</Novel>
<Novel>Novel3</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
</Encyclopedia>
</Encyclopedias>
<Dictionaries category="1">
<Dictionary>Webster</Dictionary>
<Dictionary>Oxford</Dictionary>
</Dictionaries>
</Library>
A parameter of 2 for category would result in this:
<Library>
<Novels>
<Novel category="2">Novel2</Novel>
<Novel>Novel3</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
<Volume category="2">G-L</Volume>
</Encyclopedia>
</Encyclopedias>
</Library>
I know XSLT is perfectly suited for this job, but it's not an option. We have to accomplish this entirely in SQL Server 2005. Any implementations not using XQuery are fine too, as long as it can be done entirely in T-SQL.
It's not clear to me from your example what you're actually trying to achieve. Do you want to return a new XML with all the nodes stripped out except those that fulfill the condition? If yes, then this looks like a job for an XSLT transform, which I don't think is built into MSSQL 2005 (it can be added as a UDF: http://www.topxml.com/rbnews/SQLXML/re-23872_Performing-XSLT-Transforms-on-XML-Data-Stored-in-SQL-Server-2005.aspx).
If you just need to return the list of nodes then you can use this expression:
//Book[not(@ID) or @ID = 5]
but I get the impression that it's not what you need. It would help if you can provide a clearer example.
Edit: This example is indeed clearer. The best that I could find is this:
SET @Xml.modify('delete(//*[@category!=1])')
SELECT @Xml
The idea is to delete from the XML all the nodes that you don't need, so that you are left with the original structure and the needed nodes. I tested with your two examples and it produced the wanted result.
However, modify has some restrictions: it seems you can't use it in a select statement, because it has to modify data in place. If you need to return such data with a select, you could copy the original data into a temporary table and then update that table. Something like this:
INSERT INTO #temp VALUES(@Xml)
UPDATE #temp SET data.modify('delete(//*[@category!=2])')
Hope that helps.
The question is not really clear, but is this what you're looking for?
DECLARE @Xml AS XML
SET @Xml =
N'
<Books>
<Book ID="1">Book1</Book>
<Book ID="2">Book2</Book>
<Book ID="3">Book3</Book>
<Book>Book4</Book>
<Book ID="5">Book5</Book>
<Book ID="6">Book6</Book>
<Book>Book7</Book>
<Book ID="8">Book8</Book>
</Books>
'
DECLARE @BookID AS INT
SET @BookID = 5
DECLARE @Result AS XML
SET @Result = (SELECT @Xml.query('//Book[not(@ID) or @ID = sql:variable("@BookID")]'))
SELECT @Result
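Since the question asks for the category to be passed in as a parameter, the delete approach from the earlier answer can be combined with sql:variable(). The following is a sketch, not a tested solution: it uses a shortened version of the Library document, and @Category is a hypothetical parameter name.

```sql
-- Shortened version of the Library document from the question
DECLARE @Xml XML;
SET @Xml = N'<Library>
<Novels>
<Novel category="1">Novel1</Novel>
<Novel category="2">Novel2</Novel>
<Novel>Novel3</Novel>
</Novels>
<Dictionaries category="1">
<Dictionary>Webster</Dictionary>
</Dictionaries>
</Library>';

-- Hypothetical parameter: the category to keep
DECLARE @Category INT;
SET @Category = 2;

-- Remove every element, at any depth, that has a category other than @Category;
-- elements without the attribute do not match the predicate and are kept
SET @Xml.modify('delete //*[@category != sql:variable("@Category")]');

SELECT @Xml;
```

Deleting an element also removes its descendants, which matches the requirement that a node survives only if all of its ancestors do.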
