I have a table that contains lots of integers. This table gets queried and the results end up being turned into XML. For example, if I run the following query:
SELECT itemId FROM items WHERE enabled = true
the results would be:
1
2
3
5
My final XML output after some processing would then be:
<item id="1" />
<item id="2" />
<item id="3" />
<item id="5" />
The XML ends up being fairly large and a lot of the items are actually ranges. What I would like to do is update my query to combine ranges (a lot of these items are 'neighbours', so the generated XML would be quite a bit smaller). I'm trying to get the procedure's results to be more like this:
1-3
5
So that the final XML looks something like this (if I can just change the procedure, the XML processing can stay the same):
<item id="1-3"/>
<item id="5"/>
I was thinking my best route may be to use a self join where table1.itemId = table2.itemId - 1 but I haven't been able to get it working. Does anyone have any suggestions on how I can go about this?
Would this help?
SELECT
    MIN(ItemID)
   ,MAX(ItemID)
FROM
(
    SELECT ItemID, RANK() OVER (ORDER BY ItemID) AS R FROM Items
) Tmp
GROUP BY
    ItemID - R
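Not part of the original answer, but if the stored procedure should return the ranges already formatted as strings ('1-3', '5'), a sketch building on the query above could look like this (table and column names as in the question; the enabled filter is assumed to be a bit column):
SELECT CASE WHEN MIN(ItemID) = MAX(ItemID)
            THEN CAST(MIN(ItemID) AS varchar(20))
            ELSE CAST(MIN(ItemID) AS varchar(20)) + '-' + CAST(MAX(ItemID) AS varchar(20))
       END AS ItemRange
FROM
(
    SELECT ItemID, RANK() OVER (ORDER BY ItemID) AS R
    FROM Items
    WHERE enabled = 1   -- mirrors the question's enabled filter
) Tmp
GROUP BY ItemID - R
ORDER BY MIN(ItemID);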
I'd think this should do the trick:
1) order by ItemID
2) use OVER...PARTITION to get a row number
3) use it in a recursive Common Table Expression that joins a number to all others where anchor + row number equals the ItemID, thereby finding all sequential numbers
4) group by the anchor in an outer query and then use MIN and MAX to get the range.
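The answer above gives no code, so purely as an illustration: a runnable sketch of the recursive-CTE idea, using a NOT EXISTS anchor instead of the ROW_NUMBER bookkeeping described, and assuming ItemID values are distinct:
;WITH Islands AS
(
    -- anchor: items whose immediate predecessor value is missing (start of a run)
    SELECT i.ItemID AS AnchorID, i.ItemID
    FROM Items i
    WHERE NOT EXISTS (SELECT 1 FROM Items p WHERE p.ItemID = i.ItemID - 1)
    UNION ALL
    -- recursive step: extend each run while the next consecutive value exists
    SELECT s.AnchorID, i.ItemID
    FROM Islands s
    JOIN Items i ON i.ItemID = s.ItemID + 1
)
SELECT MIN(ItemID) AS RangeStart, MAX(ItemID) AS RangeEnd
FROM Islands
GROUP BY AnchorID
OPTION (MAXRECURSION 0);   -- runs longer than 100 consecutive values need this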
Related
I have a table of people who have bought tickets for a charity evening event. The table contains details of the event registration, and the XML shows the guests they are bringing with them, along with details of any dietary requirements and the occasional person who might be disabled. This is supposed to be pushed to our CRM system, but that is not currently working.
I'm trying to extract some values out of some XML which is in a column in our import table.
I've seen plenty of examples of querying ordinary chunks of XML, but not when the XML is inside a table with other normal INT and VARCHAR values.
We are using SQL Server 2014. I've spent hours googling but haven't the faintest idea how to make a query that combines the two, or even whether I'm supposed to push the XML stuff into a temp table that I could then join to.
DECLARE @xmlstring xml = '<field_import_admin_event_tickets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<und is_array="true">
<item>
<value>8463</value>
<revision_id>4763</revision_id>
</item>
</und>
</field_import_admin_event_tickets>'
select
MainDataCenter.Col.value('(value)[1]', 'varchar(max)') as Name,
MainDataCenter.Col.value('(revision_id)[1]', 'varchar(max)') as Value
from
@xmlstring.nodes('/field_import_admin_event_tickets/und/item') as MainDataCenter(Col)
^ this will work
but I need to query it along with this:-
SELECT *
FROM [importtickets].[bcc].[entityform]
WHERE type LIKE '%show%'
AND createdDATETIME > '2019-03-14'
AND LEN(CAST(field_import_admin_event_tickets AS VARCHAR(MAX)) ) >1
-- bodging a way of seeing if XML code exists or not, doesn't seem to work with IS NOT NULL
AND Jobstatus = 'completed'
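Perhaps something like the exist() method could replace that LEN(CAST(...)) bodge; a sketch of what I have in mind:
SELECT *
FROM [importtickets].[bcc].[entityform]
WHERE type LIKE '%show%'
AND createdDATETIME > '2019-03-14'
-- exist() returns 1 only when the column holds at least one <item> node
AND field_import_admin_event_tickets.exist('/field_import_admin_event_tickets/und/item') = 1
AND Jobstatus = 'completed'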
The only way I can crudely get values out of the XML is to CAST it to VARCHAR and use lots of REPLACE commands to strip out the XML tags and get it down to the values. There may be 2 to 18 numeric values in each lump of XML.
This is my first post on StackOverflow and I've spent days searching on this, so please be gentle with me. Thanks.
Edit 2019-07-10: Hey, so I didn't make this fully clear.
Each XML column value (a few are NULLs) contains 2 to 34 separate numbers. I did some crude manipulation of the data by CASTing it to VARCHAR and running lots of REPLACE commands to understand it better.
The largest example of the XML here has 34 integer values: 17 are 'value' and 17 are 'revision_id'.
So I then pushed this all into a new table using lots of SUBSTRING calls. This is crude but effective; it assumes each value is five digits long (it is, so far), but my boss is not keen on this solution.
crudely shredded XML using CAST to VARCHAR and tags manually stripped out
I just need each set of values extracted from each row so I can then do a JOIN or subquery against them, with something identifiable per row. The numbers refer to guests coming to charity events, each with attributes such as dietary requirements or disability.
I don't know if this is the very best approach for your issue, but I hope I got your question right: you want to combine the working query against an isolated XML with the tabular query, where the XML is the content of a column.
First of all, I create a mockup table with two rows:
DECLARE #mockupTable TABLE(ID INT IDENTITY,SomeOtherValue VARCHAR(100),YourXml XML);
INSERT INTO #mockupTable(SomeOtherValue,YourXml) VALUES
('This is some value in row 1'
,'<field_import_admin_event_tickets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<und is_array="true">
<item>
<value>8463</value>
<revision_id>4763</revision_id>
</item>
</und>
</field_import_admin_event_tickets>')
,('This is some value in row 2'
,'<field_import_admin_event_tickets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<und is_array="true">
<item>
<value>999</value>
<revision_id>888</revision_id>
</item>
</und>
</field_import_admin_event_tickets>');
--The query
SELECT t.ID
,t.SomeOtherValue
,MainDataCenter.Col.value('(value)[1]', 'varchar(max)') as Name
,MainDataCenter.Col.value('(revision_id)[1]', 'varchar(max)') as Value
FROM #mockupTable t
CROSS APPLY t.YourXml.nodes('/field_import_admin_event_tickets/und/item') as MainDataCenter(Col);
The result
ID SomeOtherValue Name Value
1 This is some value in row 1 8463 4763
2 This is some value in row 2 999 888
The idea in short:
APPLY allows you to call a table-valued function row-wise. In this case we hand the content of a column (in your case the XML) into the built-in function .nodes().
Similar to a JOIN, we get a joined set, which adds columns (and rows) to the final set. We can use the .value() method to retrieve the actual values from the XML.
Is this the best approach? I don't know...
Your sample above shows just one single <item>. .nodes() would be needed to return several <item> elements in a derived set. With just one <item> this could be done more easily using .value() directly...
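For completeness, a sketch of that direct approach against the same mockup, valid only while each document really contains a single <item>:
SELECT t.ID
      ,t.SomeOtherValue
      -- read the single <item> directly; no .nodes() needed
      ,t.YourXml.value('(/field_import_admin_event_tickets/und/item/value)[1]', 'varchar(max)') AS Name
      ,t.YourXml.value('(/field_import_admin_event_tickets/und/item/revision_id)[1]', 'varchar(max)') AS Value
FROM @mockupTable t;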
What I Have
I have a variable size XML document that needs to be parsed on MSSQL 2008 R2 that looks like this:
<data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
<item name="1">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
</item>
<item name="2">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
</item>
</data>
What I Want
I need to transform it into a regular table type dataset that looks like this:
item_name field_id field_type field_value
--------- ------------------------------------ ----------- ---------------
1 EA032B25-19F1-4C1B-BDDE-3113542D13A5 2 0.5065430097062
1 71014ACB-571B-4C72-9C9B-05458B11335F 2 -0.795004023461
1 740C36E9-1988-413E-A1D5-B3E5B4405B45 2 0.0152649050024
2 EA032B25-19F1-4C1B-BDDE-3113542D13A5 2 0.3660968028040
2 71014ACB-571B-4C72-9C9B-05458B11335F 2 -0.386642801354
2 740C36E9-1988-413E-A1D5-B3E5B4405B45 2 0.0316711741841
3 EA032B25-19F1-4C1B-BDDE-3113542D13A5 2 0.8839620369590
3 71014ACB-571B-4C72-9C9B-05458B11335F 2 -0.781459993268
3 740C36E9-1988-413E-A1D5-B3E5B4405B45 2 0.2284423515729
What Works
This cross apply query creates the desired output:
create table #temp (x xml)
insert into #temp (x)
values ('
<data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
<item name="1">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
</item>
<item name="2">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
</item>
<item name="3">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.883962036959074</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.781459993268713</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.228442351572923</field>
</item>
</data>
')
select c.value('(../@name)','varchar(5)') as item_name
,c.value('(@id)','uniqueidentifier') as field_id
,c.value('(@type)','int') as field_type
,c.value('(.)','nvarchar(15)') as field_value
from #temp cross apply
#temp.x.nodes('/data/item/field') as y(c)
drop table #temp
Problem
When there are a few hundred (or fewer) <item> elements in the XML, the query performs just fine. However, when there are 1,000 <item> elements, it takes 24 seconds to finish returning the rows in SSMS. When there are 6,500 <item> elements, it takes about 20 minutes to run the cross apply query. We could have 10-20,000 <item> elements.
Questions
What makes the cross apply query perform so poorly on this simple XML document, and perform exponentially slower as the dataset grows?
Is there a more efficient way to transform the XML document into the tabular dataset (in SQL)?
What makes the cross apply query perform so poorly on this simple XML
document, and perform exponentially slower as the dataset grows?
It is the use of the parent axis (../@name) to get the name attribute from the item node.
It is this part of the query plan that is problematic.
Notice the 423 rows coming out of the lower Table-valued function.
Adding just one more item node with three field nodes gives you this.
732 rows returned.
What if we double the nodes from the first query to a total of 6 item nodes?
We are up to a whopping 1,602 rows returned.
The figure 18 in the top function is the number of field nodes in your XML: we have 6 items with three fields in each item. Those 18 nodes are used in a nested loops join against the other function, so 18 executions returning 1,602 rows means it returns about 89 rows per execution. That just happens to be the exact number of nodes in the entire XML (well, actually one more than all the visible nodes; I don't know why). You can use this query to check the total number of nodes in your XML:
select count(*)
from @XML.nodes('//*, //@*, //*/text()') as T(X)
So the algorithm SQL Server uses to get the value when you use the parent axis .. in a .value() function is: it first finds all the nodes you are shredding on (18 in the last case); for each of those nodes it shreds and returns the entire XML document and checks in the Filter operator for the node you actually want. There you have your exponential growth.
Instead of using the parent axis you should use one extra cross apply. First shred on item and then on field.
select I.X.value('@name', 'varchar(5)') as item_name,
F.X.value('@id', 'uniqueidentifier') as field_id,
F.X.value('@type', 'int') as field_type,
F.X.value('text()[1]', 'nvarchar(15)') as field_value
from #temp as T
cross apply T.x.nodes('/data/item') as I(X)
cross apply I.X.nodes('field') as F(X)
I also changed how you access the text value of the field. Using . makes SQL Server go looking for child nodes of field and concatenate their values into the result. You have no child elements, so the result is the same, but it is good to avoid having that extra part (the UDX operator) in the query plan.
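A tiny illustration of that difference (my own example, not taken from your XML), with a child element present:
DECLARE @x xml = '<field>abc<sub>def</sub></field>';
SELECT @x.value('(/field)[1]', 'varchar(20)')        AS dot_value,   -- 'abcdef': '.' concatenates descendant text
       @x.value('(/field/text())[1]', 'varchar(20)') AS text_value;  -- 'abc': only the node's own first text node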
The query plan does not have the issue with the parent axis if you are using an XML index but you will still benefit from changing how you fetch the field value.
Adding an XML index did the trick. Now the 6,500 records that took 20 minutes to run takes < 4 seconds.
create table #temp (id int primary key, x xml)
create primary xml index idx_x on #temp (x)
insert into #temp (id, x)
values (1, '
<data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
<item name="1">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
</item>
<item name="2">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
</item>
<item name="3">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.883962036959074</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.781459993268713</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.228442351572923</field>
</item>
</data>
')
select c.value('(../@name)','varchar(5)') as item_name
,c.value('(@id)','uniqueidentifier') as field_id
,c.value('(@type)','int') as field_type
,c.value('(.)','nvarchar(15)') as field_value
from #temp cross apply
#temp.x.nodes('/data/item/field') as y(c)
drop table #temp
I am doing some crude benchmarks with the xml datatype of SQL Server 2008. I've seen many places where .exist is used in where clauses. I recently compared two queries though and got odd results.
select count(testxmlrid) from testxml
where Attributes.exist('(form/fields/field)[@id="1"]')=1
This query takes about 1.5 seconds to run, with no indexes on anything but the primary key(testxmlrid)
select count(testxmlrid) from testxml
where Attributes.value('(/form/fields/field/@id)[1]','integer')=1
This query, on the other hand, takes about .75 seconds to run.
I'm using untyped XML and my benchmarking is taking place on a SQL Server 2008 Express instance. There are about 15,000 rows in the dataset and each XML string is about 25 lines long.
Are these results I'm getting correct? If so, why does everyone use .exist? Am I doing something wrong and .exist could be faster?
You are not counting the same things. Your .exist query (form/fields/field)[@id="1"] checks all occurrences of @id in the XML until it finds one with the value 1, and your .value query (/form/fields/field/@id)[1] only fetches the first occurrence of @id.
Test this:
declare #T table
(
testxmlrid int identity primary key,
Attributes xml
)
insert into #T values
('<form>
<fields>
<field id="2"/>
<field id="1"/>
</fields>
</form>')
select count(testxmlrid) from #T
where Attributes.exist('(form/fields/field)[@id="1"]')=1
select count(testxmlrid) from #T
where Attributes.value('(/form/fields/field/@id)[1]','integer')=1
The .exist query count is 1 because it finds @id=1 in the second field node, and the .value query count is 0 because it only checks the value of the first occurrence of @id.
An .exist query that, like your .value query, only checks the value of the first occurrence of @id would look like this:
select count(testxmlrid) from #T
where Attributes.exist('(/form/fields/field/@id)[1][.="1"]')=1
The difference could come from your indexes.
A PATH index will boost performance of the exist() predicate on the WHERE clause, whereas a PROPERTY index will boost performance of the value() function.
Read:
http://msdn.microsoft.com/en-us/library/bb522562.aspx
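For reference, a sketch of creating those indexes on the table from the question (the index names here are made up; a primary XML index must exist before any secondary one can be added):
-- primary XML index (requires a clustered primary key, which testxmlrid provides)
CREATE PRIMARY XML INDEX ix_xml_attributes ON testxml (Attributes);

-- PATH secondary index: helps path-based exist() predicates
CREATE XML INDEX ix_xml_attributes_path ON testxml (Attributes)
    USING XML INDEX ix_xml_attributes FOR PATH;

-- PROPERTY secondary index: helps per-row value() retrieval
CREATE XML INDEX ix_xml_attributes_property ON testxml (Attributes)
    USING XML INDEX ix_xml_attributes FOR PROPERTY;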
I have an automated process that inserts an XML document into a SQL Server 2008 table; the column is of type XML. There is a lot of duplicated data, and I wonder if anyone can recommend a good way to delete non-distinct values based on the XML column. The table has thousands of rows and each XML document is about 70k.
Each XML document looks the same except for one element value, for example:
Row 1 , Column C:
<?xml version="1.0"?><a><b/><c>2010.09.28T10:10:00</c></a>
Row 2, Column C:
<?xml version="1.0"?><a><b/><c>2010.09.29T10:10:00</c></a>
I want to pretend that the value of <c> is ignored when it comes to the diff. If everything else is equal, then I want to consider the documents to be the same. If any other element is different, then the documents would be considered different.
Thanks for all ideas.
Can you qualify what 'distinct XML' means for you? For example what is the difference between:
<a><b/></a>
<?xml version="1.0"?><a><b/></a>
<a xmlns:xhtml="http://www.w3.org/1999/xhtml"><b/></a>
<a><b xsi:nil="true" /></a>
<a><b></b></a>
<?xml version="1.0" encoding="UTF-8"?><a><b/></a>
<?xml version="1.0" encoding="UTF-16"?><a><b></b></a>
In your opinion, how many 'distinct' XMLs are there?
Updated
If your XML looks like <?xml version="1.0"?><a><b/><c>2010.09.29T10:10:00</c></a>, then you can project the element that distinguishes the documents and query on this projection:
with cte_x as (
select xmlcolumn.value(N'(//a/c)[1]', N'DATETIME') as xml_date_a_c,
...
from table
),
cte_rank as (
select row_number() over (partition by xml_date_a_c order by ...) as rn
from cte_x)
delete from cte_rank
where rn > 1;
I'm trying to query a particular value in an XML field. I've seen lots of examples, but they don't seem to be what I'm looking for.
Supposing my XML field is called XMLAttributes and the table TableName, and the complete XML value is like the one below:
<Attribute name="First2Digits" value="12" />
<Attribute name="PurchaseXXXUniqueID" value="U4RV123456762MBE79" />
(although the xml field will frequently have other attributes, not just PurchaseXXXUniqueID)
If I'm looking for a specific value in the PurchaseXXXUniqueID attribute name - say U4RV123456762MBE79 - how would I write the query? I believe it would be something like:
select *
from TableName
where XMLAttributes.value('(/path/to/tag)[1]', 'varchar(100)') = '5FTZP2QT8Z3E2MAV2D'
... but it's the path/to/tag that I need to figure out.
Or probably there's other ways of getting the values I want.
To summarize - I need to get all the records in a table where the value of a particular attribute in the xml field matches a value I'll pass to the query.
thanks for the help!
Sylvia
edit: I was trying to make this simpler, but in case it makes a difference - ultimately I'll have a temporary table of 50 or so potential values for the PurchaseXXXUniqueID field. For these, I want to get all the matching records from the table with the XML field.
This ought to work:
SELECT
(fields from base table),
Nodes.Attr.value('(@name)[1]', 'varchar(100)'),
Nodes.Attr.value('(@value)[1]', 'varchar(100)')
FROM
dbo.TableName
CROSS APPLY
XMLAttributes.nodes('/Attribute') AS Nodes(Attr)
WHERE
Nodes.Attr.value('(@name)[1]', 'varchar(100)') = 'PurchaseXXXUniqueID'
AND Nodes.Attr.value('(@value)[1]', 'varchar(100)') = 'U4RV123456762MBE79'
You basically need to join the base table's row against one "pseudo-row" for each of the <Attribute> nodes inside the XML column, and then pick out the individual attribute values from each <Attribute> node to select what you're looking for.
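For the edit about the temporary table of 50 or so candidate values, the same shredding can simply be joined against that table; a sketch, where #CandidateIDs and its column are hypothetical names:
-- hypothetical staging table holding the candidate PurchaseXXXUniqueID values
CREATE TABLE #CandidateIDs (PurchaseID varchar(100) PRIMARY KEY);

SELECT t.*   -- plus any shredded attribute values you need
FROM dbo.TableName t
CROSS APPLY t.XMLAttributes.nodes('/Attribute') AS Nodes(Attr)
JOIN #CandidateIDs c
    ON c.PurchaseID = Nodes.Attr.value('(@value)[1]', 'varchar(100)')
WHERE Nodes.Attr.value('(@name)[1]', 'varchar(100)') = 'PurchaseXXXUniqueID';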
Something like this?
declare @PurchaseXXXUniqueID varchar(max)
set @PurchaseXXXUniqueID = 'U4RV123456762MBE79';
select * from TableName t
where XMLAttributes.exist('//Attribute[@value = sql:variable("@PurchaseXXXUniqueID")]') = 1