I have an xml-file with products.
I have split it up into a table with one row for each product with product number and xml
SKU | xml
----|-------
1111|<product><price>123</price....</product>
1112|<product><price>345</price....</product>
The attributes are stored like this:
<attribute-list>
<attribute name="tax_id" attribute-type="integer"><value default="1">2</value></attribute>
<attribute name="weight" attribute-type="integer"><value default="1">258</value></attribute>
<attribute name="length" attribute-type="integer"><value default="1">180</value></attribute>
<attribute name="width" attribute-type="integer"><value default="1">115</value></attribute>
<attribute name="height" attribute-type="integer"><value default="1">15</value></attribute>
<attribute name="series_name" attribute-type="string"><value language-id="DE" default="1"><![CDATA[CSV]]></value></attribute>
<attribute name="country_of_origin_code" attribute-type="string"><value default="1">LT</value></attribute>
<attribute name="number_of_pages" attribute-type="string"><value default="1">288</value></attribute>
...
</attribute-list>
Different products may have different attributes, for instance shoe-size is not relevant for a book :-)
I'd like to select all possible attribute-names.
attr
----
weight
length
number_of_pages
shoe_size
I can get all the possible values for a given attribute-name
select distinct xml.value('(/product/attribute-list/attribute[@name="color"])[1]',
'varchar(100)') as colors from product_xml
I'm getting close with
SELECT distinct cast(T2.attr.query('.') as nvarchar(max))
FROM product_xml
CROSS APPLY xml.nodes('/product/attribute-list/attribute') as T2(attr)
Here I get a record for each possible attribute-name and value
So I'm just missing the last step of only getting the name.
EDIT: The quick-and-dirty version is here:
;with p as (SELECT distinct cast(T2.attr.query('.') as nvarchar(max)) at
FROM product_xml
CROSS APPLY xml.nodes('/product/attribute-list/attribute') as T2(attr))
select distinct left(at,CHARINDEX('>',at)) from p
This produces each attribute in a record by itself, which I can then manipulate in the application (PHP). It is not as clean as getting the name alone, but it is easily parsed and will only be used very rarely.
<attribute name="age_rating" attribute-type="string">
<attribute name="aroma" attribute-type="string">
<attribute name="barcode" attribute-type="string">
<attribute name="barcode_type" attribute-type="string">
Is this what you're looking for? This statement lists the @name attribute of each <attribute> element, and also grabs the actual value as well as the @default attribute from the <value> subnode:
SELECT DISTINCT
AttrName = XC.value('@name', 'varchar(50)'),
DefaultValue=XC.value('(value/@default)[1]', 'varchar(50)'),
Value=XC.value('(value)[1]', 'varchar(50)')
FROM
product_xml
CROSS APPLY
xml.nodes('/product/attribute-list/attribute') AS XT(XC)
Solution based on @mark_s
SELECT distinct AttrName = attr.value('@name', 'varchar(50)')
FROM product_xml
CROSS APPLY xml.nodes('/product/attribute-list/attribute') as T2(attr)
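A self-contained way to try this out is sketched below. The table variable @product_xml and its two sample rows are made up for illustration (the second product and its shoe_size attribute are hypothetical data); only the final SELECT mirrors the query above.

```sql
-- Hypothetical sample data standing in for the real product_xml table.
DECLARE @product_xml TABLE (SKU int, [xml] xml);

INSERT INTO @product_xml (SKU, [xml]) VALUES
(1111, N'<product><attribute-list>
  <attribute name="weight" attribute-type="integer"><value default="1">258</value></attribute>
  <attribute name="number_of_pages" attribute-type="string"><value default="1">288</value></attribute>
</attribute-list></product>'),
(1112, N'<product><attribute-list>
  <attribute name="weight" attribute-type="integer"><value default="1">900</value></attribute>
  <attribute name="shoe_size" attribute-type="string"><value default="1">42</value></attribute>
</attribute-list></product>');

-- One row per <attribute> element, reduced to the distinct @name values.
SELECT DISTINCT a.attr.value('@name', 'varchar(50)') AS AttrName
FROM @product_xml AS p
CROSS APPLY p.[xml].nodes('/product/attribute-list/attribute') AS a(attr);
```

With this sample data the query should return three rows: number_of_pages, shoe_size and weight (row order is not guaranteed without an ORDER BY).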
This problem keeps messing around with my Friday afternoon:
I have this XML:
declare @xml as XML
set @xml =
'<fields>
<field>
<id>1</id>
<items>
<item>
<name>name1_1</name>
<value>value1_1</value>
</item>
<item>
<name>name1_2</name>
<value>value1_2</value>
</item>
</items>
</field>
<field>
<id>2</id>
<items>
<item>
<name>name2_1</name>
<value>value2_1</value>
</item>
<item>
<name>name2_2</name>
<value>value2_2</value>
</item>
</items>
</field>
</fields>'
Using T-SQL and XPath, I need a query to get this result:
id name value
1 name1_1 value1_1
1 name1_2 value1_2
2 name2_1 value2_1
2 name2_2 value2_2
I'm getting name and value with:
SELECT c.value('name[1]', 'nvarchar(255)') name,
c.value('value[1]', 'nvarchar(255)') value
FROM @xml.nodes('fields/field/items/item') t(c)
...but how to insert the parent column "id"?
Your own code uses .nodes() to get a derived table from repeating elements. In your case there are two levels of repeating elements:
many fields and within each field
many items
You have to use .nodes() twice:
SELECT fld.value(N'(id/text())[1]',N'int') AS FieldID
,itm.value(N'(name/text())[1]',N'nvarchar(max)') AS ItemName
,itm.value(N'(value/text())[1]',N'nvarchar(max)') AS ItemValue
FROM @xml.nodes(N'/fields/field') AS A(fld)
OUTER APPLY A.fld.nodes(N'items/item') AS B(itm);
The first .nodes() comes back with XML fragments, one for each field. The second .nodes() is called for each of these field fragments to pick out its items.
Use OUTER APPLY if there might be fields without <item> nodes, and CROSS APPLY when you do not want to see fields without <item> nodes (similar to LEFT JOIN vs INNER JOIN).
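The difference between the two APPLY variants can be seen with a small sketch. The XML below is invented for illustration: field 2 deliberately has no <item> children.

```sql
-- Hypothetical data: field 2 has an empty <items> element.
DECLARE @xml xml = N'<fields>
  <field><id>1</id><items><item><name>n1</name><value>v1</value></item></items></field>
  <field><id>2</id><items /></field>
</fields>';

-- OUTER APPLY keeps field 2; its item columns come back as NULL.
SELECT fld.value(N'(id/text())[1]', N'int')             AS FieldID
      ,itm.value(N'(name/text())[1]', N'nvarchar(max)') AS ItemName
FROM @xml.nodes(N'/fields/field') AS A(fld)
OUTER APPLY A.fld.nodes(N'items/item') AS B(itm);

-- CROSS APPLY drops field 2 because it produces no item rows.
SELECT fld.value(N'(id/text())[1]', N'int')             AS FieldID
      ,itm.value(N'(name/text())[1]', N'nvarchar(max)') AS ItemName
FROM @xml.nodes(N'/fields/field') AS A(fld)
CROSS APPLY A.fld.nodes(N'items/item') AS B(itm);
```

The first query should return two rows (1, n1) and (2, NULL); the second only (1, n1).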
Assumption: there is only one id element per field.
SELECT c.value('../../id[1]', 'int') id,
c.value('name[1]', 'nvarchar(255)') name,
c.value('value[1]', 'nvarchar(255)') value
FROM @xml.nodes('fields/field/items/item') t(c)
The .. operator means "select the parent of the node" in XPath. So the query selects the parent of item (items), then the parent of items (field), and then that field's first id child.
I have been given an XML document that I want to generate via a SQL script. I've not done something like this before and haven't been able to find any examples that lead me to the final XML I need. I'm also not sure which of the available methods, EXPLICIT or PATH, is better suited to what I need, or whether it's even possible.
I'm hoping somebody with some experience in generating XML from SQL will be able to point me in the right direction (or tell me what I'm trying to do is impossible and that I need to do it with sub-queries).
The scenario is I'm returning product details from a single table (I would prefer to not have to do sub-queries for each of the values I need).
The xml I'm hoping to be able to generate looks like (I have no control over this format):
<records>
<record>
<fields>
<field name="id">
<values>
<value>666111</value>
</values>
</field>
<field name="name">
<values>
<value>
<![CDATA[My Product Title]]>
</value>
</values>
</field>
</fields>
</record>
<record>
...
</record>
</records>
The first method I've looked at is using FOR XML PATH
SELECT TOP 2
'id' AS "@name",
p.product_id as [value],
p.title
FROM products p
ORDER BY p.product_id DESC
FOR XML PATH ('field'), ROOT ('fields'), ELEMENTS;
and this gives me the XML:
<fields>
<field name="id">
<value>20624</value>
<title>test154</title>
</field>
<field name="id">
<value>20623</value>
<title>test153</title>
</field>
</fields>
Which gives me the <field name="id"> that I need, but I can't then specify the layout I need for the next elements.
I also looked into FOR XML EXPLICIT
SELECT TOP 2
1 AS Tag, NULL AS Parent,
p.product_id AS [record!1!product_id!ELEMENT],
NULL AS [values!2!value!ELEMENT]
FROM products p
UNION ALL
SELECT TOP 2
2, 1,
p.product_id,
p.title
FROM products p
ORDER BY [record!1!product_id!ELEMENT] DESC
FOR XML EXPLICIT;
Which gave me the following XML:
<record>
<product_id>20624</product_id>
<values>
<value>test154</value>
</values>
</record>
<record>
<product_id>20623</product_id>
<values>
<value>test153</value>
</values>
</record>
I'm a bit lost as to how to build up the query or get something along the right lines (I think I'm trying to do too much in a single lookup, and that is the cause of my problem). Any help is appreciated, even if it's just pointing me at a good guide (the only ones I've found have been very poor when it comes to examples; they don't show the subtleties of how you can build or change them).
This is the query you might be looking for
The ,'' in the middle is a trick which allows you to create several elements with the same name one below the other...
DECLARE @tbl TABLE(id INT,name VARCHAR(100));
INSERT INTO @tbl VALUES
(666111,'My Product Title 111')
,(666222,'My Product Title 222');
SELECT
(
SELECT 'id' AS [field/@name]
,id AS [field/values/value]
,''
,'name' AS [field/@name]
,name AS [field/values/value]
FOR XML PATH('fields'),TYPE
)
FROM @tbl AS tbl
FOR XML PATH('record'),ROOT('records')
The result
<records>
<record>
<fields>
<field name="id">
<values>
<value>666111</value>
</values>
</field>
<field name="name">
<values>
<value>My Product Title 111</value>
</values>
</field>
</fields>
</record>
<record>
<fields>
<field name="id">
<values>
<value>666222</value>
</values>
</field>
<field name="name">
<values>
<value>My Product Title 222</value>
</values>
</field>
</fields>
</record>
</records>
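The effect of the ,'' separator can be seen in isolation with two tiny queries. This is only a sketch with made-up literal values; the element name row is arbitrary.

```sql
-- Adjacent columns with the same element name are merged into one element.
SELECT 'a' AS [value],
       'b' AS [value]
FOR XML PATH('row');

-- An unnamed empty string between them closes the first element,
-- so a second <value> element is started.
SELECT 'a' AS [value],
       '',
       'b' AS [value]
FOR XML PATH('row');
```

The first query should give <row><value>ab</value></row>, the second <row><value>a</value><value>b</value></row>.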
UPDATE: As far as I know, there is no clean way to add CDATA sections.
For some reason, people at Microsoft think that CDATA sections are not necessary. Well, they aren't, but sometimes they are demanded anyway...
The only clean way to add CDATA sections was to use FOR XML EXPLICIT. Another workaround is to wrap the value in markers, something like '|' + name + '#' (use two characters which will never occur in your actual data).
Then you can cast the result to NVARCHAR(MAX) and replace these characters at string level.
This would return your XML as a string:
SELECT
REPLACE(REPLACE(CAST(
(
SELECT
(
SELECT 'id' AS [field/@name]
,id AS [field/values/value]
,''
,'name' AS [field/@name]
,'|' + name + '#' AS [field/values/value]
FOR XML PATH('fields'),TYPE
)
FROM @tbl AS tbl
FOR XML PATH('record'),ROOT('records')
) AS NVARCHAR(MAX)),'|','<![CDATA['),'#',']]>')
The moment you cast this back to XML, the CDATA is gone :-(
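For reference, the FOR XML EXPLICIT route relies on the cdata directive in the column name. A minimal sketch, assuming a @tbl table variable like the one used earlier (redeclared here so the batch runs on its own); it only emits flat <value> elements, not the full nested records layout:

```sql
DECLARE @tbl TABLE (id INT, name VARCHAR(100));
INSERT INTO @tbl VALUES (666111, 'My Product Title 111');

-- Column name format: ElementName!TagNumber!AttributeName!Directive.
-- For the cdata directive the attribute name is left empty.
SELECT 1    AS Tag,
       NULL AS Parent,
       name AS [value!1!!cdata]
FROM @tbl
FOR XML EXPLICIT;
```

This should produce <value><![CDATA[My Product Title 111]]></value>. The result is only useful as a string; converting it to the XML type drops the CDATA wrapper again.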
Something like this:
declare @t table (id varchar(10))
insert into @t values ('1')
insert into @t values ('2')
select (
select
t.id 'fields/field/@id'
, t.id 'fields/field/name'
from @t t
for xml path(''), type
) 'records/record'
for xml path('')
The final SQL I used is:
SELECT TOP 2
(
SELECT
(SELECT 'id' AS [field/@id],
product_id [field/values/value]
FOR XML PATH(''), TYPE),
(SELECT 'title' AS [field/@id],
title [field/values/value]
FOR XML PATH(''), TYPE)
FOR XML PATH('fields'), TYPE
)
FROM products
FOR XML PATH('record'), ROOT('records')
As this allows me to manipulate the output a little easier.
Thank you to both @xdd and especially @Shnugo for your answers! The end solution is based on @Shnugo's suggestion, just with avoiding the trick of putting extra blank rows in.
I need to pull values from an XML column. The table contains 3 fields with one being an XML column like below:
TransID int,
Place varchar(20),
Custom XML
The XML column is structured as following:
<Fields>
<Field>
<Id>9346-00155D1C204E</Id>
<TransactionCode>0710</TransactionCode>
<Amount>5.0000</Amount>
</Field>
<Field>
<Id>A6F0-BA07EF3A7D43</Id>
<TransactionCode>0885</TransactionCode>
<Amount>57.9000</Amount>
</Field>
<Field>
<Id>9BDA-7858FD182Z3C</Id>
<TransactionCode>0935</TransactionCode>
<Amount>25.85000</Amount>
</Field>
</Fields>
I need to be able to query the XML column and return only the value of <Amount> where <TransactionCode> = 0935. Note: there are records where this transaction code isn’t present, but it won't exist in the same record twice.
This is probably simple, but I’m having a problem returning just the <Amount> value where <TransactionCode> = 0935.
You can try this way :
DECLARE @transCode VARCHAR(10) = '0935'
SELECT field.value('Amount[1]', 'decimal(18,5)') as Amount
FROM yourTable t
OUTER APPLY t.Custom.nodes('/Fields/Field[TransactionCode=sql:variable("@transCode")]') as x(field)
Alternatively, you can put logic for filtering Field by TransactionCode in SQL WHERE clause instead of in XPath expression, like so :
DECLARE @transCode VARCHAR(10) = '0935'
SELECT field.value('Amount[1]', 'decimal(18,5)') as Amount
FROM yourTable t
OUTER APPLY t.Custom.nodes('/Fields/Field') as x(field)
WHERE field.value('TransactionCode[1]', 'varchar(10)') = @transCode
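To try the XPath-predicate variant without a real table, here is a sketch using a table variable. The name @yourTable, the Place values and the two sample rows are assumptions for illustration; the XML fragments are taken from the question.

```sql
-- Hypothetical stand-in for the real table; row 2 has no 0935 transaction.
DECLARE @yourTable TABLE (TransID int, Place varchar(20), Custom xml);

INSERT INTO @yourTable (TransID, Place, Custom) VALUES
(1, 'A', N'<Fields>
  <Field><Id>A6F0-BA07EF3A7D43</Id><TransactionCode>0885</TransactionCode><Amount>57.9000</Amount></Field>
  <Field><Id>9BDA-7858FD182Z3C</Id><TransactionCode>0935</TransactionCode><Amount>25.85000</Amount></Field>
</Fields>'),
(2, 'B', N'<Fields>
  <Field><Id>9346-00155D1C204E</Id><TransactionCode>0710</TransactionCode><Amount>5.0000</Amount></Field>
</Fields>');

DECLARE @transCode varchar(10) = '0935';

-- The predicate keeps only the <Field> whose TransactionCode matches the variable.
SELECT t.TransID,
       x.field.value('(Amount/text())[1]', 'decimal(18,5)') AS Amount
FROM @yourTable AS t
OUTER APPLY t.Custom.nodes('/Fields/Field[TransactionCode = sql:variable("@transCode")]') AS x(field);
```

With OUTER APPLY, TransID 1 should come back with 25.85000 and TransID 2 with NULL; switching to CROSS APPLY would leave out TransID 2 entirely.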
You can use an XPath like this in your TSQL:
SELECT
*,
Custom.value('(/Fields/Field[@Name="Id"]/@Value)[1]', 'varchar(50)')
FROM YourTable
WHERE Custom.value('(/Fields/Field[@Name="Id"]/@Value)[1]', 'varchar(50)') = '0655'
I have two different pieces of XML to put together.
For example, SQL for the first piece looks like this:
SELECT
*
FROM
(
SELECT 1 AS OrdNum, 'Abc' AS Name
) a
FOR XML
AUTO,
TYPE
Once executed, you'll get this:
<a OrdNum="1" Name="Abc" />
The second one is here:
SELECT
*
FROM
(
SELECT 4 AS Age, 'M' AS Sex, 'John' AS FirstName
) b
FOR XML
AUTO,
TYPE
You'll get this:
<b Age="4" Sex="M" FirstName="John" />
Now I'll put the two pieces together:
SELECT
*
FROM
(
SELECT
(
SELECT
*
FROM
(
SELECT 1 AS OrdNum, 'Abc' AS Name
) a
FOR XML
AUTO,
TYPE
) AS aa
,
(
SELECT
*
FROM
(
SELECT 4 AS Age, 'M' AS Sex, 'John' AS FirstName
) b
FOR XML
AUTO,
TYPE
) AS bb
) Data
FOR XML
AUTO,
ELEMENTS
The result is as follows:
<Data>
<aa>
<a OrdNum="1" Name="Abc" />
</aa>
<bb>
<b Age="4" Sex="M" FirstName="John" />
</bb>
</Data>
But I do not want to have the elements "aa" and "bb" there. I'd love to get this:
<Data>
<a OrdNum="1" Name="Abc" />
<b Age="4" Sex="M" FirstName="John" />
</Data>
But I have no idea how to achieve that.
Any hints?
There is no "simple" way to do it. FOR XML PATH|EXPLICIT|AUTO will all require each top-level, output element to have the same name. And you can't UNION multiple FOR XML queries together (Sql Server 2012).
The direction you went in is the most reliable and flexible. Essentially, you have to add a separate column for each different element type you want to include. You could simplify your final attempt to this to get what you wanted:
SELECT
(
SELECT 1 AS [@OrdNum], 'Abc' AS [@Name]
WHERE 1=1
FOR XML PATH ('a'), TYPE
)
,
(
SELECT 4 AS [@Age], 'M' AS [@Sex], 'John' AS [@FirstName]
WHERE 1=1
FOR XML PATH ('b'), TYPE
)
FOR XML PATH ('Data'), TYPE;
The above query outputs:
<Data>
<a OrdNum="1" Name="Abc" />
<b Age="4" Sex="M" FirstName="John" />
</Data>
When you use FOR XML PATH, the column aliases are XPaths. So to make it an attribute name, you have to prefix it with '@', which then requires you to escape the alias (hence the []). The parameter on PATH dictates the name of each row's XML element. The TYPE option says to keep the output as type xml instead of nvarchar(max), which means that the outer query can merge it better. And the outer query just has 2 columns to stuff into the single element it represents. Finally, I like the WHERE 1=1, but it's not syntactically required.
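As a small illustration of how the alias decides between attribute and element, here is a sketch with made-up literals. The Detail element and its kind attribute are hypothetical names, not part of the question.

```sql
SELECT 1     AS [@OrdNum],       -- '@' prefix: attribute on the row element <a>
       'Abc' AS [Name],          -- plain name: child element <Name>
       'x'   AS [Detail/@kind]   -- path with '@': attribute on a nested <Detail>
FOR XML PATH('a'), TYPE;
```

This should yield <a OrdNum="1"><Name>Abc</Name><Detail kind="x" /></a>. Attribute columns for the row element have to be listed before any of its child elements.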
A tangent: I know your example is simplified, so you may wish to know that Xml data types can have "methods" applied to them. For example, say you wanted the above, but an outer query only needed the "b" elements. You could use the query() method to select only parts of the Xml to merge into some outer query.
SELECT
(
SELECT
(
SELECT 1 AS [@OrdNum], 'Abc' AS [@Name]
WHERE 1=1
FOR XML PATH('a'), TYPE
)
,
(
SELECT 4 AS [@Age], 'M' AS [@Sex], 'John' AS [@FirstName]
WHERE 1=1
FOR XML PATH('b'), TYPE
)
FOR XML PATH('Data'), TYPE
).query('Data/b');
Which produces this:
<b Age="4" Sex="M" FirstName="John" />
You need to look at the FOR XML PATH option that SQL Server 2005 introduced - see the What's New in FOR XML in Microsoft SQL Server 2005 document for more information.
Basically, with FOR XML PATH, you can define the shape of your XML very easily. You can define certain structures, you can define certain columns to be output as attributes, and others as elements - totally under your control.
Consider the XML and SQL:
declare @xml xml = '
<root>
<person id="11272">
<notes for="107">Some notes!</notes>
<item id="107" selected="1" />
</person>
<person id="77812">
<notes for="107"></notes>
<notes for="119">Hello</notes>
<item id="107" selected="0" />
<item id="119" selected="1" />
</person>
</root>'
select Row.Person.value('data(../@id)', 'int') as person_id,
Row.Person.value('data(@id)', 'int') as item_id,
Row.Person.value('data(../notes[@for=data(@id)][1])', 'varchar(max)') as notes
from @xml.nodes('/root/person/item') as Row(Person)
I end up with:
person_id item_id notes
----------- ----------- -------
77812 107 NULL
77812 119 NULL
11272 107 NULL
What I want is the 'notes' column to be pulled based on the @id attribute of the current item. If I replace [@for=data(@id)] in the selector with [@for=107] of course I get the value Some notes! in the last record. Is it possible to do this with XPath/XQuery, or am I barking up the wrong tree here? I think the problem is that
The XML is a bit awkward, yes, but I can't really change it I'm afraid.
I found one solution that works, but it feels awfully heavy for something like this.
select Item.Person.value('data(../@id)', 'int') as person_id,
Item.Person.value('data(@id)', 'int') as item_id,
Notes.Person.value('text()[1]', 'varchar(max)') as notes
from @xml.nodes('/root/person/item') as Item(Person)
inner join @xml.nodes('/root/person/notes') as Notes(Person) on
Notes.Person.value('data(@for)', 'int') = Item.Person.value('data(@id)', 'int')
and
Notes.Person.value('data(../@id)', 'int') = Item.Person.value('data(../@id)', 'int')
Update!
I figured it out! I'm new at XQuery but this works, so I'm calling it job done :) I changed the query for the notes to:
Item.Person.value('
let $id := data(@id)
return data(../notes[@for=$id])[1]
', 'varchar(max)') as notes
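The reason the let binding helps is that inside the predicate [...] the context node is the <notes> element being tested, so a bare @id there refers to an id attribute on <notes> (which does not exist) rather than to the item's id. A reduced sketch with a single person, using trimmed-down sample XML, shows both forms side by side:

```sql
DECLARE @xml xml = N'<root>
  <person id="11272">
    <notes for="107">Some notes!</notes>
    <item id="107" selected="1" />
  </person>
</root>';

SELECT i.x.value('data(@id)', 'int') AS item_id,
       -- @id is evaluated against <notes>, so nothing matches
       i.x.value('data(../notes[@for=data(@id)][1])', 'varchar(max)') AS notes_naive,
       -- $id is captured from the <item> before stepping into the predicate
       i.x.value('let $id := data(@id)
                  return data(../notes[@for=$id])[1]', 'varchar(max)') AS notes_bound
FROM @xml.nodes('/root/person/item') AS i(x);
```

Here notes_naive should be NULL while notes_bound returns Some notes!.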
I would suggest that you do a cross apply instead of using ../ to find a parent node. According to the query plan it is a lot faster.
select P.X.value('data(@id)', 'int') as person_id,
I.X.value('data(@id)', 'int') as item_id,
I.X.value('let $id := data(@id)
return data(../notes[@for=$id])[1]', 'varchar(max)') as notes
from @xml.nodes('/root/person') as P(X)
cross apply P.X.nodes('item') as I(X)
You can even remove the ../ in the FLWOR expression with one extra cross apply, gaining a bit more.
select P.X.value('@id', 'int') as person_id,
TI.id as item_id,
P.X.value('(notes[@for = sql:column("TI.id")])[1]', 'varchar(max)') as notes
from @xml.nodes('/root/person') as P(X)
cross apply P.X.nodes('item') as I(X)
cross apply (select I.X.value('@id', 'int')) as TI(id)
Comparing the queries against each other, I got 67% for your query, 17% for my first and 16% for the second. Note: these figures only give you a hint about which query will actually be faster in reality. Test them against your data to know for sure.