Finding Duplicate Values in XML Column

In my SQL Server DB I have a table with an XML column. The XML that goes in it looks like the sample below:
<Rows>
<Row>
<Name>John</Name>
</Row>
<Row>
<Name>Debbie</Name>
</Row>
<Row>
<Name>Annie</Name>
</Row>
<Row>
<Name>John</Name>
</Row>
</Rows>
I have a requirement that I need to find the occurrence of all rows where the XML data has duplicate entries of <Name>. For example, above we have 'John' twice in the XML.
I can use the exist XML statement to find 1 occurrence, but how can I find if it's more than 1? Thanks.

To identify any table row that has duplicate <Name> values in its XML, you can use exist as well:
exist('//Name[. = preceding::Name]')
To identify which names are the duplicates, you need nodes and CROSS APPLY:
SELECT
t.id,
x.Name.value('.', 'varchar(100)') AS DuplicateName
FROM
MyTable t
CROSS APPLY t.MyXmlColumn.nodes('//Name[. = preceding::Name]') AS x(Name)
WHERE
t.MyXmlColumn.exist('//Name[. = preceding::Name]')

Try this:
;with cte as
(SELECT tbl.col.value('.[1]', 'varchar(100)') as name
FROM yourtable
CROSS APPLY xmlcol.nodes('/Rows/Row/Name') as tbl(col))
select name
from cte
group by name
having count(name) > 1
We first use the nodes function to convert the XML to relational data, then use value to get the text inside each Name node. We put that result into a CTE and use a simple GROUP BY to get the values with multiple occurrences.
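The CTE above counts names across the whole table. A minimal sketch of a per-row variant follows, which may be closer to what the question asks (which table rows contain a repeated name). The table variable @t, its id column, and the sample data are assumptions made up for illustration. The XML method call stays inside the CTE because SQL Server does not accept XML methods directly in a GROUP BY clause.

```sql
-- Hypothetical test data: @t and its columns are not from the original question.
DECLARE @t TABLE (id INT IDENTITY, xmlcol XML);

INSERT INTO @t (xmlcol) VALUES
(N'<Rows><Row><Name>John</Name></Row><Row><Name>Debbie</Name></Row><Row><Name>Annie</Name></Row><Row><Name>John</Name></Row></Rows>'),
(N'<Rows><Row><Name>Annie</Name></Row></Rows>');

-- Shred every Name node into one relational row, keeping the owning row's id.
WITH names AS
(
    SELECT t.id,
           n.col.value('(./text())[1]', 'varchar(100)') AS name
    FROM @t AS t
    CROSS APPLY t.xmlcol.nodes('/Rows/Row/Name') AS n(col)
)
-- Group per table row and per name; keep only names that repeat within one row.
SELECT id, name, COUNT(*) AS occurrences
FROM names
GROUP BY id, name
HAVING COUNT(*) > 1;
```

With this sample data the query should report John for the first row only, since the second row has no repeated name.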

Related

get xml value from an NVARCHAR(MAX) column

I read many related posts and tried to get the xml value from a column with type NVARCHAR(MAX) for any specific tags.
CREATE TABLE dataTable (RECID NVARCHAR(MAX),XMLRECORD NVARCHAR(MAX));
My XMLRECORD column contains data with tags like:
<row id='1'>
<c2>Account-sample</c2>
</row>
Below attached is a select query that I created, which yielded a CLOB instead of the actual value. Any idea on how to get the actual value? (i.e. Account-sample)
select b.x.value('data(/row/c2)[1]', 'NVARCHAR(max)')
from dataTable a
cross apply(select cast(cast(XMLRECORD as VARCHAR(max)) as XML) x) b;

Use the query below:
select *, try_cast(xmlrecord as xml).value('(row/c2)[1]', 'nvarchar(500)') as c2
from dataTable
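As a rough, self-contained sketch of the same idea: the table variable @dataTable and both sample rows are made up for illustration, and TRY_CAST needs SQL Server 2012 or later. TRY_CAST is documented to return NULL when a cast fails, so the deliberately malformed second row is expected to produce NULL instead of aborting the query.

```sql
-- Hypothetical stand-in for the dataTable from the question.
DECLARE @dataTable TABLE (RECID NVARCHAR(MAX), XMLRECORD NVARCHAR(MAX));

INSERT INTO @dataTable (RECID, XMLRECORD) VALUES
(N'1', N'<row id="1"><c2>Account-sample</c2></row>'),
(N'2', N'<row><c2>broken');   -- not well-formed on purpose

-- Convert the string to XML on the fly, then read the text of the first c2 element.
SELECT RECID,
       TRY_CAST(XMLRECORD AS XML).value('(/row/c2/text())[1]', 'nvarchar(500)') AS c2
FROM @dataTable;
```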

SQL Server 2014 - FOR XML AUTO avoid automatic node nesting

I'm trying to build some query to export data in XML and I build this query:
select
[invoice].*,
[rows].*,
[payments].payerID,
[items].picture
from InvoicesHeader [invoice]
join InvoicesRows [rows] on [rows].invoiceID=[invoice].invoiceID
join Payments [payments] on [payments].paymentID=[invoice].paymentID
join Items [items] on [items].itemID=[rows].itemID
FOR XML Auto, ROOT ('invoices'), ELEMENTS
and I got something like this as result
<invoices>
<invoice>
<ID>82</ID>
<DocType>R</DocType>
<DocYear>2017</DocYear>
<DocNumber>71</DocNumber>
<IssueDate>2017-07-17T15:17:30.237</IssueDate>
<OrderID>235489738019</OrderID>
...
<payments>
<payerID>3234423f33</payerID>
<rows>
<ID>163</ID>
<ItemID>235489738019</ItemID>
<Quantity>2</Quantity>
<Price>1</Price>
<VATCode>22</VATCode>
<Color>-</Color>
<Size></Size>
<SerialNumber></SerialNumber>
<items>
<picture>http://nl.imgbb.com/AAOSwOdpXyB4I.JPG</picture>
</items>
</rows>
....
</payments>
</invoice>
</invoices>
while I would like to have something like this, where [rows] is a child node of invoice and not of payments:
<invoices>
<invoice>
<ID>82</ID>
<DocType>R</DocType>
<DocYear>2017</DocYear>
<DocNumber>71</DocNumber>
<IssueDate>2017-07-17T15:17:30.237</IssueDate>
<OrderID>235489738019</OrderID>
...
<payments>
<payerID>3234423f33</payerID>
</payments>
<rows>
<ID>163</ID>
<ItemID>235489738019</ItemID>
<Quantity>2</Quantity>
<Price>1</Price>
<VATCode>22</VATCode>
<Color>-</Color>
<Size></Size>
<SerialNumber></SerialNumber>
<items>
<picture>http://nl.imgbb.com/AAOSwOdpXyB4I.JPG</picture>
</items>
</rows>
....
</invoice>
</invoices>
I have seen some solutions that combine several FOR XML AUTO queries, but the data here comes from joined tables, and it would be a pity to query the same values 2-3 times. How can I achieve this?
Thanks

Try changing the select order around to this:
select
[invoice].*,
[payments].payerID,
[items].picture,
[rows].*
from InvoicesHeader [invoice]
join InvoicesRows [rows] on [rows].invoiceID=[invoice].invoiceID
join Payments [payments] on [payments].paymentID=[invoice].paymentID
join Items [items] on [items].itemID=[rows].itemID
FOR XML Auto, ROOT ('invoices'), ELEMENTS

Well, I found that I have to use FOR XML PATH instead and add the other table as a subquery with its own FOR XML PATH, as follows:
select
i.*,
p.payerID,
(select r.* from InvoicesRows r where r.invoiceID=i.invoiceID for XML PATH ('rows'), type)
from InvoicesHeader i
join Payments p on i.paymentID=p.paymentID
FOR XML PATH('invoice'), ROOT ('invoices'), ELEMENTS
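A self-contained sketch of the same nesting technique follows. The table variables, their columns, and the sample values are simplified assumptions, not the real schema. Each child level is a correlated subquery with its own FOR XML PATH and TYPE, so payments and rows both end up as direct children of invoice.

```sql
-- Hypothetical, simplified stand-ins for the tables in the question.
DECLARE @InvoicesHeader TABLE (invoiceID INT, paymentID INT, DocNumber INT);
DECLARE @InvoicesRows   TABLE (rowID INT, invoiceID INT, itemID INT, Quantity INT);
DECLARE @Payments       TABLE (paymentID INT, payerID VARCHAR(20));
DECLARE @Items          TABLE (itemID INT, picture VARCHAR(200));

INSERT INTO @InvoicesHeader VALUES (82, 1, 71);
INSERT INTO @Payments       VALUES (1, '3234423f33');
INSERT INTO @Items          VALUES (10, 'picture-10.jpg'), (11, 'picture-11.jpg');
INSERT INTO @InvoicesRows   VALUES (163, 82, 10, 2), (164, 82, 11, 1);

SELECT i.invoiceID,
       i.DocNumber,
       -- <payments> becomes a direct child of <invoice>
       (SELECT p.payerID
        FROM @Payments AS p
        WHERE p.paymentID = i.paymentID
        FOR XML PATH('payments'), TYPE),
       -- <rows> is a sibling of <payments>; <items> nests inside each row
       (SELECT r.rowID,
               r.Quantity,
               (SELECT it.picture
                FROM @Items AS it
                WHERE it.itemID = r.itemID
                FOR XML PATH('items'), TYPE)
        FROM @InvoicesRows AS r
        WHERE r.invoiceID = i.invoiceID
        FOR XML PATH('rows'), TYPE)
FROM @InvoicesHeader AS i
FOR XML PATH('invoice'), ROOT('invoices');
```

The TYPE directive matters here: without it the inner result is returned as a string and would be escaped rather than embedded as nested XML.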

Search XML files stored in a MS SQL database

I have over 500,000 XML files stored in a MS SQL database, such as the one below (which has been edited to save space in the question).
<?xml version="1.0"?>
<PROJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<APPLICATION_ID>7000518</APPLICATION_ID>
<ACTIVITY>C06</ACTIVITY>
<ADMINISTERING_IC>RR</ADMINISTERING_IC>
<APPLICATION_TYPE>1</APPLICATION_TYPE>
<BUDGET_START>09/01/2009</BUDGET_START>
<BUDGET_END>09/30/2013</BUDGET_END>
<FULL_PROJECT_NUM>1C06RR020539-01A1</FULL_PROJECT_NUM>
<FY>2009</FY>
<ORG_STATE>CA</ORG_STATE>
<ORG_ZIPCODE>900952000</ORG_ZIPCODE>
<PIS>
<PI>
<PI_NAME>JONES,MARY</PI_NAME>
<PI_ID>9876543</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN</PI_NAME>
<PI_ID>1234567</PI_ID>
</PI>
</PIS>
<PROJECT_TERMSX>
<TERM>Extramural Activities</TERM>
<TERM>Extramural Research Facilities Construction Project</TERM>
</PROJECT_TERMSX>
<PROJECT_TITLE>The Center for Oral/Research</PROJECT_TITLE>
<SUPPORT_YEAR>1</SUPPORT_YEAR>
</row>
</PROJECTS>
I can search for any of the single nodes using something like:
SELECT nref.value('(APPLICATION_ID)[1]', 'Int') APPLICATION_ID,
nref.value('(ACTIVITY)[1]', 'varchar(3)') ACTIVITY
FROM [XML_2010] cross apply XMLData.nodes('//PROJECTS/row') as R(nref)
WHERE nref.value('(CORE_PROJECT_NUM)[1]', 'varchar(25)') LIKE '%CA187342%'
But how can I find the data associated with all XML files that have DOE, JOHN as a PI which is a sub node to PIS? Such as the APPLICATION_ID and BUDGET_START etc?
Thanks for the help

XML is great for archives and data exchange, but it is the wrong container for data that is actively used, filtered, or searched. Therefore I'd strongly suggest transferring all your data into classical, indexed tables like this:
Note: I reduced your XML to a few examples per level; the rest follows the same approach and is up to you. The declared table variable mocks up a test scenario:
DECLARE @YourTable TABLE(ID INT IDENTITY,YourXml XML);
INSERT INTO @YourTable VALUES
('<PROJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<APPLICATION_ID>7000518</APPLICATION_ID>
<ACTIVITY>C06</ACTIVITY>
<!-- more first level elements like above -->
<!-- Here there are multiple PIs -->
<PIS>
<PI>
<PI_NAME>JONES,MARY</PI_NAME>
<PI_ID>9876543</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN</PI_NAME>
<PI_ID>1234567</PI_ID>
</PI>
</PIS>
<!-- Here there are multiple PROJECT_TERMS -->
<PROJECT_TERMSX>
<TERM>Extramural Activities</TERM>
<TERM>Extramural Research Facilities Construction Project</TERM>
</PROJECT_TERMSX>
<!-- These are normal first level elements again -->
<PROJECT_TITLE>The Center for Oral/Research</PROJECT_TITLE>
<SUPPORT_YEAR>1</SUPPORT_YEAR>
</row>
</PROJECTS>');
--This SELECT reads all first-level-data together with the partial XMLs into a temp table #Projects:
SELECT r.value('(APPLICATION_ID/text())[1]','bigint') AS APPLICATION_ID
,r.value('(ACTIVITY/text())[1]','nvarchar(max)') AS ACTIVITY
--more columns like above
,r.query('PIS') AS AllPis
,r.query('PROJECT_TERMSX') AS AllProjectTerms
--more first level columns
INTO #Projects
FROM @YourTable AS t
OUTER APPLY t.YourXml.nodes('/PROJECTS/row') AS A(r);
--This SELECT reads from #Projects and stores all related PI-data in another temp table #PIs
SELECT APPLICATION_ID
,p.value('(PI_ID/text())[1]','bigint') AS PI_ID
,p.value('(PI_NAME/text())[1]','nvarchar(max)') AS PI_NAME
INTO #PIs
FROM #Projects AS p
OUTER APPLY p.AllPis.nodes('PIS/PI') AS A(p);
--Same with #Terms
SELECT APPLICATION_ID
,t.value('(./text())[1]','nvarchar(max)') AS TERM
INTO #Terms
FROM #Projects AS p
OUTER APPLY p.AllProjectTerms.nodes('PROJECT_TERMSX/TERM') AS A(t);
--This is now the content of your temp tables
SELECT * FROM #Projects;
SELECT * FROM #PIs;
SELECT * FROM #Terms;
--Clean up
GO
DROP TABLE #Projects;
DROP TABLE #PIs;
DROP TABLE #Terms;
Before the clean-up you will add some code that writes your data out of these staging tables into real tables. The IDs that define the relations are stored together with the data, so this should be easy. You will need INSERT INTO or MERGE, depending on whether you have to deal with already existing data.
Hint
You might think about an m:n relation between projects and PIs, and between projects and terms. For this you'd write a separate PI table and a separate Term table with a mapping table in between (holding the application_id and the second id, both as foreign keys).
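To connect this back to the original question, here is a sketch of how the DOE, JOHN lookup might look. The first query assumes the staging tables #Projects and #PIs from above still exist, so it has to run before the clean-up. The second is an alternative that filters directly in XQuery against the [XML_2010] table and XMLData column named in the question; it avoids staging but is likely to be slow over 500,000 documents without XML indexes.

```sql
-- Variant 1: relational lookup against the staging tables built above.
SELECT pr.APPLICATION_ID,
       pr.ACTIVITY
FROM #Projects AS pr
INNER JOIN #PIs AS pis
        ON pis.APPLICATION_ID = pr.APPLICATION_ID
WHERE pis.PI_NAME = N'DOE, JOHN';

-- Variant 2: filter inside the XPath; only <row> elements that have a matching PI are returned.
SELECT nref.value('(APPLICATION_ID/text())[1]', 'int')       AS APPLICATION_ID,
       nref.value('(BUDGET_START/text())[1]', 'varchar(10)') AS BUDGET_START
FROM [XML_2010]
CROSS APPLY XMLData.nodes('/PROJECTS/row[PIS/PI/PI_NAME = "DOE, JOHN"]') AS R(nref);
```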

SQL Server query xml column

I need to pull values from an XML column. The table contains 3 fields with one being an XML column like below:
TransID int,
Place varchar(20),
Custom XML
The XML column is structured as following:
<Fields>
<Field>
<Id>9346-00155D1C204E</Id>
<TransactionCode>0710</TransactionCode>
<Amount>5.0000</Amount>
</Field>
<Field>
<Id>A6F0-BA07EF3A7D43</Id>
<TransactionCode>0885</TransactionCode>
<Amount>57.9000</Amount>
</Field>
<Field>
<Id>9BDA-7858FD182Z3C</Id>
<TransactionCode>0935</TransactionCode>
<Amount>25.85000</Amount>
</Field>
</Fields>
I need to be able to query the XML column and return only the value of <Amount> if there is a <TransactionCode> = 0935. Note: there are records where this transaction code isn't present, but it won't exist in the same record twice.
This is probably simple, but I'm having a problem returning just the <Amount> value where the <TransactionCode> = 0935.

You can try it this way:
DECLARE @transCode VARCHAR(10) = '0935'
SELECT field.value('Amount[1]', 'decimal(18,5)') as Amount
FROM yourTable t
OUTER APPLY t.Custom.nodes('/Fields/Field[TransactionCode=sql:variable("@transCode")]') as x(field)
Alternatively, you can put the logic for filtering Field by TransactionCode in the SQL WHERE clause instead of in the XPath expression, like so:
DECLARE @transCode VARCHAR(10) = '0935'
SELECT field.value('Amount[1]', 'decimal(18,5)') as Amount
FROM yourTable t
OUTER APPLY t.Custom.nodes('/Fields/Field') as x(field)
WHERE field.value('TransactionCode[1]', 'varchar(10)') = @transCode
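For reference, a self-contained sketch that can be pasted into a query window. The table variable @t and its two sample rows are invented for illustration. It reads Amount straight from the column with value() and uses exist() to skip records that lack the transaction code, which is a slightly different route than the nodes() approach above.

```sql
-- Hypothetical test table mirroring the three columns described in the question.
DECLARE @t TABLE (TransID INT, Place VARCHAR(20), Custom XML);

INSERT INTO @t (TransID, Place, Custom) VALUES
(1, 'A', N'<Fields>
  <Field><Id>9346-00155D1C204E</Id><TransactionCode>0710</TransactionCode><Amount>5.0000</Amount></Field>
  <Field><Id>9BDA-7858FD182Z3C</Id><TransactionCode>0935</TransactionCode><Amount>25.85000</Amount></Field>
</Fields>'),
(2, 'B', N'<Fields>
  <Field><Id>A6F0-BA07EF3A7D43</Id><TransactionCode>0885</TransactionCode><Amount>57.9000</Amount></Field>
</Fields>');

DECLARE @transCode VARCHAR(10) = '0935';

-- exist() keeps only records containing the code; value() then reads that Field's Amount.
SELECT t.TransID,
       t.Custom.value('(/Fields/Field[TransactionCode = sql:variable("@transCode")]/Amount/text())[1]',
                      'decimal(18,5)') AS Amount
FROM @t AS t
WHERE t.Custom.exist('/Fields/Field[TransactionCode = sql:variable("@transCode")]') = 1;
```

With this sample data only TransID 1 should come back, because the second record has no Field with code 0935.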

You can use an XPath like this in your TSQL:
SELECT
*,
Custom.value('(/Fields/Field[@Name="Id"]/@Value)[1]', 'varchar(50)')
FROM YourTable
WHERE Custom.value('(/Fields/Field[@Name="Id"]/@Value)[1]', 'varchar(50)') = '0655'

SQL 2008: getting rows with a join to an XML field?

Not sure if this question makes for some poor performance down the track, but seems to at least feel "a better way" right now..
What I am trying to do is this:
I have a table called CONTACTS which amongst other things has a primary key field called memberID
I also have an XML field which contains the ID's of your friends (for example).. like:
<root><id>2</id><id>6</id><id>14</id></root>
So what I am trying to do via a stored proc is pass in say your member ID, and return all of your friends info, for example:
select name, address, age, dob from contacts
where id... xml join stuff...
The previous way I had it working (well sort of!) selected all the XML nodes (/root/id) into a temp table, and then did a join from that temp table to the contact table to get the contact fields...
Any help much appreciated.. just a bit overloaded from the .query .nodes examples, and of course which is maybe a better way of doing this...
THANKS IN ADVANCE!
<-- EDIT -->
I did get something working, but looks like a SQL frankenstein statement!
Basically I needed to get the friends contact ID's from the XML field, and populate into a temp table like so:
Declare @contactIDtable TABLE (ID int)
INSERT INTO @contactIDtable (ID)
SELECT CONVERT(INT,CAST(T2.memID.query('.') AS varchar(100))) AS friendsID
FROM dbo.members
CROSS APPLY memberContacts.nodes('/root/id/text()') AS T2(memID)
But crikey! The convert/cast thing looks serious, as I need to get an INT for the next bit, which is the actual join to return the contact data as follows:
SELECT memberID, memberName, memberAddress1
FROM members
INNER JOIN @contactIDtable cid
ON members.memberID = cid.ID
ORDER BY memberName
RESULT...
Well it works.. in my case, my memberContacts XML field had 3 nodes (id's in this case), and the above query returned 3 rows of data (memberID, memberName, memberAddress1)...
The whole point of this of course was to try to save creating a many join table i.e. list of all my friends ID's... just not sure if the above actually makes this quicker and easier...
Anymore ideas / more efficient ways of trying to do this???

SQL Server's syntax for reading XML is one of the least intuitive around. Ideally, you'd want to:
select f.name
from friends f
join @xml x
on x.id = f.id
Instead, SQL Server requires you to spell out everything. To turn an XML variable or column into a "rowset", you have to spell out the exact path and think up two aliases:
@xml.nodes('/root/id') as table_alias(column_alias)
Now you have to explain to SQL Server how to turn <id>1</id> into an int:
table_alias.column_alias.value('.', 'int')
So you can see why most people prefer to decode XML on the client side :)
A full example:
declare @friends table (id int, name varchar(50))
insert @friends (id, name)
select 2, 'Locke Lamorra'
union all select 6, 'Calo Sanzo'
union all select 10, 'Galdo Sanzo'
union all select 14, 'Jean Tannen'
declare @xml xml
set @xml = '<root><id>2</id><id>6</id><id>14</id></root>'
select f.name
from @xml.nodes('/root/id') as table_alias(column_alias)
join @friends f
on table_alias.column_alias.value('.', 'int') = f.id

In order to get your XML contents as rows from a "pseudo-table", you need to use the .nodes() on the XML column - something like:
DECLARE @xmlfield XML
SET @xmlfield = '<root><id>2</id><id>6</id><id>14</id></root>'
SELECT
ROOT.ID.value('(.)[1]', 'int')
FROM
@xmlfield.nodes('/root/id') AS ROOT(ID)
SELECT
(list of fields)
FROM
dbo.Contacts c
INNER JOIN
@xmlfield.nodes('/root/id') AS ROOT(ID) ON c.ID = ROOT.ID.value('(.)[1]', 'INT')
Basically, the .nodes() defines a pseudo-table ROOT with a single column ID, that will contain one row for each node in the XPath expression, and the .value() selects a specific value out of that XML fragment.
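Both answers read the ids from an XML variable. Since the question keeps the friend list in an XML column of the same table, a sketch of that case might look like the following. The table variable @members, its sample rows, and the @memberID parameter are assumptions for illustration; the column names are borrowed from the question's edit.

```sql
-- Hypothetical stand-in for the members table described in the question.
DECLARE @members TABLE (memberID INT, memberName VARCHAR(50), memberContacts XML);

INSERT INTO @members (memberID, memberName, memberContacts) VALUES
(1,  'Member One',      N'<root><id>2</id><id>6</id><id>14</id></root>'),
(2,  'Member Two',      NULL),
(6,  'Member Six',      NULL),
(14, 'Member Fourteen', NULL);

DECLARE @memberID INT = 1;   -- the member whose friends we want

-- Shred the member's own XML column, then join back to the same table to fetch each friend.
SELECT f.memberID, f.memberName
FROM @members AS m
CROSS APPLY m.memberContacts.nodes('/root/id') AS c(id)
INNER JOIN @members AS f
        ON f.memberID = c.id.value('(./text())[1]', 'int')
WHERE m.memberID = @memberID
ORDER BY f.memberName;
```

This removes the intermediate temp table and the CONVERT/CAST pair, because value() returns the id as an int directly.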
