Pull all of the child XML nodes in SQL Server - sql-server

I have a SQL Server 2014 database that stores 2 million XML files. The XML file looks like:
<?xml version='1.0' encoding='UTF-16'?>
<PROJECTS xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<row>
<APPLICATION_ID>7975883</APPLICATION_ID>
<ACTIVITY>N01</ACTIVITY>
<ADMINISTERING_IC>HL</ADMINISTERING_IC>
<APPLICATION_TYPE xsi:nil="true"/>
<ARRA_FUNDED>N</ARRA_FUNDED>
<PIS>
<PI>
<PI_NAME>MICHEL, MARY Q</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
<PI>
<PI_NAME>SMITH, ROBERT B</PI_NAME>
<PI_ID>3704354</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN A</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
</PIS>
<ORG_DUNS>600044978</ORG_DUNS>
<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
<ORG_DISTRICT>08</ORG_DISTRICT>
<ORG_ZIPCODE>208523003</ORG_ZIPCODE>
</row>
</PROJECTS>
My problem is that I want to pull all of the PI values based upon the ORG_DUNS numbers in a stored procedure. So the code that I have is:
SELECT
APPLICATION_ID,
nref.value('.','varchar(max)') TERM
INTO
ADMIN_MUSC_RePORTER_TERMS
FROM
[ADMIN_Grant_Exporter_Files_XML]
CROSS APPLY
XMLData.nodes('//PIS/PI') AS R(nref)
WHERE
RECID = 1
That works fine when I use a WHERE clause based on another column in the table, but I run into trouble when I need to reference a node in the XML. I need to pull all of the XML files that have ORG_DUNS equal to 600044978, and I know that nref.value('ORG_DUNS[1]', 'varchar(max)') finds nothing, because after the CROSS APPLY nref points at a PI node, which has no ORG_DUNS child:
SELECT
APPLICATION_ID,
nref.value('.','varchar(max)') TERM
INTO
ADMIN_MUSC_RePORTER_TERMS
FROM
[ADMIN_Grant_Exporter_Files_XML]
CROSS APPLY
XMLData.nodes('//PIS/PI') as R(nref)
WHERE
nref.value('ORG_DUNS[1]', 'varchar(max)') = '600044978'
So how can I get all of the PI nodes using the ORG_DUNS as my WHERE? Thanks

Change your Cross Apply statement to include the filter logic in the XPath:
CROSS APPLY XMLData.nodes('//PIS[../ORG_DUNS/text() = ''600044978'']/PI') AS R(nref)
To explain, //PIS[../ORG_DUNS/text() = ''600044978'']/PI says:
//PIS - find all elements called PIS
[...] - filter the returned elements for those matching this condition
../ORG_DUNS/text() = ''600044978'' - Condition: the PIS's parent element's ORG_DUNS's text value equals 600044978.
Then return the child PI elements of any matching PIS elements.
Update per comments
Full SQL, including PI and PI_ID as separate values:
SELECT
APPLICATION_ID
, nref.value('(./PI_NAME/text())[1]','varchar(max)') PI
, nref.value('(./PI_ID/text())[1]','varchar(max)') PI_ID
INTO
ADMIN_MUSC_RePORTER_TERMS
FROM
[ADMIN_Grant_Exporter_Files_XML]
CROSS APPLY
XMLData.nodes('//PIS[../ORG_DUNS/text() = ''600044978'']/PI') AS R(nref)
WHERE
RECID = 1
Notes:
./PI_NAME - from the currently selected element (i.e. the one referred to by the nref column, which points at a PI element), take its child element, PI_NAME.
/text() - from the PI_NAME element, take its child text node (strictly this is not required, since pulling back the value and converting it to a varchar gives the same result; but I like to be explicit).
(...)[1] - return a singleton, i.e. we only want one value back, even if there were multiple PI_NAME elements under the current PI. By putting brackets around our expression we're saying "for all values returned by this expression", and [1] says take the first result (XPath uses one-based indexes rather than zero-based ones as most other languages do).
Filtering with a variable
Anticipating your next issue; i.e. "How do you change the number at runtime without building dynamic SQL?", the answer's to use the sql:variable function:
declare @DunningNumber int = 600044978 --9 digit code http://www.dnb.com/duns-number.html; so can easily hold in an int: https://learn.microsoft.com/en-us/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql
SELECT
APPLICATION_ID
, nref.value('(./PI_NAME/text())[1]','varchar(max)') PI
, nref.value('(./PI_ID/text())[1]','varchar(max)') PI_ID
INTO
ADMIN_MUSC_RePORTER_TERMS
FROM
[ADMIN_Grant_Exporter_Files_XML]
CROSS APPLY
XMLData.nodes('//PIS[../ORG_DUNS/text() = sql:variable("@DunningNumber")]/PI') AS R(nref)
WHERE
RECID = 1
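To try the predicate and sql:variable together without touching the real table, a minimal self-contained sketch might look like the following. The table variable @files, its two sample rows and the variable name @Duns are assumptions made up for illustration; the DUNS value is held as a string here so that the XQuery comparison is string to string.

```sql
-- Hypothetical stand-in for [ADMIN_Grant_Exporter_Files_XML]; not the real table.
DECLARE @files TABLE (RECID int, XMLData xml);

INSERT INTO @files (RECID, XMLData) VALUES
(1, N'<PROJECTS><row>
        <PIS><PI><PI_NAME>MICHEL, MARY Q</PI_NAME><PI_ID>3704353</PI_ID></PI></PIS>
        <ORG_DUNS>600044978</ORG_DUNS>
      </row></PROJECTS>'),
(2, N'<PROJECTS><row>
        <PIS><PI><PI_NAME>SMITH, ROBERT B</PI_NAME><PI_ID>3704354</PI_ID></PI></PIS>
        <ORG_DUNS>600044979</ORG_DUNS>
      </row></PROJECTS>');

-- Assumption: the value to filter on is passed as a string.
DECLARE @Duns varchar(9) = '600044978';

-- Only PI elements whose sibling ORG_DUNS matches @Duns are shredded into rows.
SELECT
    f.RECID,
    R.nref.value('(./PI_NAME/text())[1]', 'varchar(200)') AS PI_NAME,
    R.nref.value('(./PI_ID/text())[1]', 'varchar(20)')    AS PI_ID
FROM @files AS f
CROSS APPLY f.XMLData.nodes('//PIS[../ORG_DUNS/text() = sql:variable("@Duns")]/PI') AS R(nref);
```

With this sample data only the PI from the first row should come back, because the second row's ORG_DUNS differs.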

The trick here is to first grab multiple levels in the document using multiple CROSS APPLY clauses. So first, starting from the root grab all the '/PROJECTS/row' and then use a relative path from each of those 'PIS/PI'.
Like this:
declare @t table(id int identity, APPLICATION_ID int default (2), XmlData xml)
insert into @t(XmlData) values (
'<PROJECTS xmlns:xsi=''http://www.w3.org/2001/XMLSchema-instance''>
<row>
<APPLICATION_ID>7975883</APPLICATION_ID>
<ACTIVITY>N01</ACTIVITY>
<ADMINISTERING_IC>HL</ADMINISTERING_IC>
<APPLICATION_TYPE xsi:nil="true"/>
<ARRA_FUNDED>N</ARRA_FUNDED>
<PIS>
<PI>
<PI_NAME>MICHEL, MARY Q</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
<PI>
<PI_NAME>SMITH, ROBERT B</PI_NAME>
<PI_ID>3704354</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN A</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
</PIS>
<ORG_DUNS>600044978</ORG_DUNS>
<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
<ORG_DISTRICT>08</ORG_DISTRICT>
<ORG_ZIPCODE>208523003</ORG_ZIPCODE>
</row>
</PROJECTS> '),(
'<PROJECTS xmlns:xsi=''http://www.w3.org/2001/XMLSchema-instance''>
<row>
<APPLICATION_ID>7975883</APPLICATION_ID>
<ACTIVITY>N01</ACTIVITY>
<ADMINISTERING_IC>HL</ADMINISTERING_IC>
<APPLICATION_TYPE xsi:nil="true"/>
<ARRA_FUNDED>N</ARRA_FUNDED>
<PIS>
<PI>
<PI_NAME>MICHEL, MARY Q</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
<PI>
<PI_NAME>SMITH, ROBERT B</PI_NAME>
<PI_ID>3704354</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN A</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
</PIS>
<ORG_DUNS>600044979</ORG_DUNS>
<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
<ORG_DISTRICT>08</ORG_DISTRICT>
<ORG_ZIPCODE>208523003</ORG_ZIPCODE>
</row>
</PROJECTS> ')
select APPLICATION_ID,
pinode.value('PI_NAME[1]','varchar(max)') PI_NAME,
pinode.value('PI_ID[1]','varchar(max)') PI_ID
FROM @t
cross apply XMLData.nodes('/PROJECTS/row') as r(rownode)
cross apply rownode.nodes('PIS/PI') as p(pinode)
where rownode.value('ORG_DUNS[1]','varchar(max)') = '600044978'

Related

Retrieve associated value from next node for each tag

I have the following XML:
<Envelope format="ProceedoOrderTransaction2.1">
<Sender>SENDER</Sender>
<Receiver>RECEIVER</Receiver>
<EnvelopeID>xxxxx</EnvelopeID>
<Date>2021-05-06</Date>
<Time>11:59:46</Time>
<NumberOfOrder>1</NumberOfOrder>
<Order>
<Header>
<OrderNumber>POXXXXX</OrderNumber>
</Header>
<Lines>
<Line>
<LineNumber>1</LineNumber>
<ItemName>Ipsum Lorum</ItemName>
<SupplierArticleNumber>999999</SupplierArticleNumber>
<UnitPrice vatRate="25.0">50</UnitPrice>
<UnitPriceBasis>1</UnitPriceBasis>
<OrderedQuantity unit="Styck">200</OrderedQuantity>
<AdditionalItemProperty Key="ARTIKELNUMMER" Description="Unik ordermärkning (artikelnummer):" />
<Value>999999</Value>
<AdditionalItemProperty Key="BESKRIVNING" Description="Kort beskrivning:" />
<Value>Ipsum Lorum</Value>
<AdditionalItemProperty Key="BSKRIVNING" Description="Beskrivning:" />
<Value>Ipsum Lorum</Value>
<AdditionalItemProperty Key="ENHET" Description="Enhet:" />
<Value>Styck</Value>
<AdditionalItemProperty Key="KVANTITET" Description="Kvantitet:" />
<Value>200</Value>
<AdditionalItemProperty Key="PRIS" Description="Pris/Enhet (ex. moms):" />
<Value>50</Value>
<AdditionalItemProperty Key="VALUTA" Description="Valuta:" />
<Value>SEK</Value>
<Accounting>
<AccountingLine amount="10000">
<AccountingValue dimensionPosition="001" dimensionExternalID="ACCOUNT">xxx</AccountingValue>
<AccountingValue dimensionPosition="002" dimensionExternalID="F1">Ipsum Lorum</AccountingValue>
<AccountingValue dimensionPosition="005" dimensionExternalID="F3">1</AccountingValue>
<AccountingValue dimensionPosition="010" dimensionExternalID="F2">9999</AccountingValue>
</AccountingLine>
</Accounting>
</Line>
</Lines>
</Order>
</Envelope>
I am able to parse all values into a table structure except for one, which I can't tie reliably to its tag. I correctly get one row per AdditionalItemProperty, and I can read its Key and Description attributes (for example BESKRIVNING and Kort beskrivning:), but I can't find a reasonable way to get the text inside the <Value> element that belongs to each property. For the key ARTIKELNUMMER, for instance, the associated value is 999999, and that <Value> element sits on the same hierarchy level (insane, I know) as the AdditionalItemProperty it belongs to. The format seems to rely on the rule that the value for an AdditionalItemProperty is the <Value> element immediately following it.
I am using SQL Server 2019. I have gotten this far:
-- Purchaseorderrowattributes
select top(10)
i.value(N'(./Header/OrderNumber/text())[1]', 'nvarchar(30)') as OrderNumber,
ap.value(N'(../LineNumber/text())[1]', 'nvarchar(30)') as LineNumber,
ap.value(N'(@Description)', 'nvarchar(50)') property_description
from
load.proceedo_orders t
outer apply
t.xml_data.nodes('Envelope/Order') AS i(i)
outer apply
i.nodes('Lines/Line/AdditionalItemProperty') as ap(ap)
where
file_name = @filename
Which produces the following output:
OrderNumber LineNumber property_description
--------------------------------------------
PO170006416 1 Antal timmar
PO170006416 1 Beskrivning
PO170006416 1 Kompetensområde
PO170006416 1 Roll
PO170006416 1 Ordernummer
PO170006416 1 Timpris
I can't find a way to add the value for each property correctly. Since the values always appear in the same order as the AdditionalItemProperty elements, I found a way to number the AdditionalItemProperty rows, and I was hoping to plug that row number into the positional predicate in
ap.value(N'(../Value/text())[1]', 'nvarchar(50)') property_description
but SQL Server throws an error saying that the argument must be a string literal.
To be clear, this is the kind of thing I tried:
ap.value(CONCAT( N'(../Value/text())[', CAST(ROWNUMBER as varchar) ,']'), 'nvarchar(50)') property_description
SQL Server uses XQuery 1.0.
You can make use of the >> Node Comparison operator to find the Value sibling node with code similar to the following:
-- Purchaseorderrowattributes
select top(10)
i.value(N'(./Header/OrderNumber/text())[1]', 'nvarchar(30)') as OrderNumber
,ap.value(N'(../LineNumber/text())[1]', 'nvarchar(30)') as LineNumber
,ap.value(N'(@Description)', 'nvarchar(50)') property_description
,ap.query('
let $property := .
return (../Value[. >> $property][1])/text()
').value('.[1]', 'varchar(20)') as property_value
from load.proceedo_orders t
outer apply t.xml_data.nodes('Envelope/Order') AS i(i)
outer apply i.nodes('Lines/Line/AdditionalItemProperty') as ap(ap)
where file_name = @filename
So what's going on here?
let $property := . creates a reference to the current AdditionalItemProperty node.
../Value[. >> $property] ascends to the parent node, Line, and descends again to find the Value nodes that come after the referenced AdditionalItemProperty node in document order; [1] then selects the first of those nodes.
See section 3.5.3, Node Comparisons, of the XQuery 1.0 specification for a little more detail.
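The same idea can be tried in isolation on a small XML variable. The following is only a sketch: the variable @x and the trimmed-down Line fragment are made up for illustration, and the pattern mirrors the query above rather than adding anything new.

```sql
-- Hypothetical, trimmed-down Line element with two property/value pairs.
DECLARE @x xml = N'<Line>
  <LineNumber>1</LineNumber>
  <AdditionalItemProperty Key="ENHET" Description="Enhet:" />
  <Value>Styck</Value>
  <AdditionalItemProperty Key="KVANTITET" Description="Kvantitet:" />
  <Value>200</Value>
</Line>';

SELECT
    ap.value('@Key', 'nvarchar(50)') AS property_key,
    -- Bind the current property node, then take the first Value sibling
    -- that follows it in document order.
    ap.query('
        let $property := .
        return (../Value[. >> $property][1])/text()
    ').value('.', 'nvarchar(50)') AS property_value
FROM @x.nodes('/Line/AdditionalItemProperty') AS t(ap);
```

Each property row should be paired with the Value element that directly follows it (Styck for ENHET, 200 for KVANTITET in this sample), which is the pairing the question asks for.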

Querying Non-XML compliant structured data

As a data analyst, I am constantly running across files with structured data that are in some proprietary format and resist normal XML parsing.
For example, I have an archive of about a hundred documents that all begin with this:
<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
I have included an abridged example of the document below, don't read it if you're offended by cloning.
At any rate, is there a way to query this without having DTD or namespace or URI or whatever it is I need? I'm ok using SQL Server 2012+ or xquery or, I dunno, php or vba.
<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
<document synfileid="MCIESS0044">
<galedata><project>
<projectname>
<title>Opposing Viewpoints Resource Center</title>
</projectname>
</project></galedata>
<doc.head>
<title>Cloning</title>
</doc.head>
<doc.body>
<para>A clone is an identical copy of a plant or animal, produced from the genetic material of a single organism. In 1996 scientists in Britain created a sheep named Dolly, the first successful clone of an adult mammal. Since then, scientists have successfully cloned other animals, such as goats, mice, pigs, and rabbits. People began wondering if human beings would be next. The question of whether human cloning should be allowed, and under what conditions, raises a number of challenging scientific, legal, and ethical issues—including what it means to be human.</para>
<head n="1">Scientific Background</head>
<para>People have been cloning plants for thousands of years. Some plants produce offspring without any genetic material from another organism. In these cases, cloning simply requires cutting pieces of the stems, roots, or leaves of the plants and then planting the cuttings. The cuttings will grow into identical copies of the originals. Many common fruits, vegetables, and ornamental plants are produced in this way from parent plants with especially desirable characteristics.</para>
<para>[lots of excluded text] Perhaps the most perplexing question of all: How would clones feel about their status? As a copy, would they lack the sense of uniqueness that is part of the human condition? As yet, such questions have no answers—perhaps they never will. The debate about cloning, both animal and human, however, will certainly continue. The technology exists to create clones. How will society use this technology?</para>
</doc.body>
</document>
Your SGML input data is almost XML, up to the fact that, unlike full SGML, XML always requires a system identifier (a file name or URL) for the external DTD subset, and not just a public identifier such as -//Gale Research//DTD Document V2.0//EN. So the DOCTYPE declaration must take this form in XML
<!DOCTYPE DOCUMENT
PUBLIC "-//Gale Research//DTD Document V2.0//EN"
"test.dtd">
where I've added "test.dtd" as system identifier/file name of the external subset. Of course, now a test.dtd file must exist. It is sufficient that an empty test.dtd file is created in the working directory, or test.dtd could contain some meaningful declarations for your markup at hand such as the following:
<!ELEMENT document ANY>
<!ELEMENT galedata ANY>
<!ELEMENT project ANY>
<!ELEMENT projectname ANY>
<!ELEMENT title ANY>
<!ELEMENT doc.head ANY>
<!ELEMENT doc.body ANY>
<!ELEMENT para ANY>
<!ELEMENT head ANY>
<!ATTLIST document synfileid CDATA #IMPLIED>
<!ATTLIST head n NUMBER #IMPLIED>
But as you've found out, you can also make XML tools happy by just removing the DOCTYPE line.
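If the files end up in SQL Server, removing the DOCTYPE line can also be done in T-SQL before the text is cast to xml. The following is a rough sketch under stated assumptions: the variable names and the shortened sample document are invented for illustration, and the code assumes the DOCTYPE declaration has no internal subset, so the first > after <!DOCTYPE closes it.

```sql
-- Hypothetical, shortened sample of one raw file loaded as text.
DECLARE @raw nvarchar(max) = N'<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
<document synfileid="MCIESS0044">
  <doc.head><title>Cloning</title></doc.head>
  <doc.body><head n="1">Scientific Background</head></doc.body>
</document>';

DECLARE @start int = CHARINDEX(N'<!DOCTYPE', @raw);
DECLARE @clean nvarchar(max) = @raw;

IF @start > 0
BEGIN
    -- Assumption: no internal subset, so the first '>' ends the declaration.
    DECLARE @end int = CHARINDEX(N'>', @raw, @start);
    SET @clean = STUFF(@raw, @start, @end - @start + 1, N'');
END;

-- With the DOCTYPE gone, the remaining text is treated as ordinary XML.
DECLARE @doc xml = CAST(@clean AS xml);

SELECT
    @doc.value('(/document/@synfileid)[1]', 'varchar(50)')              AS synfileid,
    @doc.value('(/document/doc.head/title/text())[1]', 'nvarchar(200)') AS title;
```

The rest of the document can then be queried with the usual nodes() and value() methods.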
Now if you wanted to process your file(s) into XML without manual editing, you could use SGML to pre-process the file(s) into compliant XML, and then use XML query tools against the produced XML.
To do so using the OpenSP/OpenJade SGML package (it can be installed on Ubuntu with, e.g., sudo apt-get install opensp), you'd place a catalog file in the directory, containing the following line telling SGML to resolve the public identifier -//Gale Research//DTD Document V2.0//EN to test.dtd:
PUBLIC "-//Gale Research//DTD Document V2.0//EN" "test.dtd"
You could edit test.dtd from the XML version I gave above to contain tag omission indicators, which classic SGML requires by default. But your test data uses another feature that isn't enabled by default in the OpenSP tools, namely hexadecimal character references such as &#x2014; in your example data. Hence, we're going to need a custom SGML declaration (a somewhat archaic piece of plain text specifying the SGML features your document uses) for your data anyway. So while we're at it, we're also going to declare FEATURES MINIMIZE OMITTAG NO in the SGML declaration, which makes SGML accept element declarations without tag omission indicators, so we don't have to change test.dtd from the XML version.
We could place the SGML declaration right into the document itself, but to avoid manual editing, we're going to use catalog resolution for the SGML declaration. If you add the following line to your catalog file
SGMLDECL "xml10-sgmldecl.dcl"
then OpenSP will use whatever is stored in xml10-sgmldecl.dcl as the SGML declaration. The actual SGML declaration we're going to use is the official SGML declaration for XML 1.0 (which already has all the features we want). You don't need to understand the meaning of an SGML declaration in detail. Just paste the text attached below into a file named xml10-sgmldecl.dcl; if you're interested in the details, see my description at http://sgmljs.net/docs/sgmlrefman.html#sgml-declaration.
Now you'll be able to invoke
osx <your-file>
to produce XML from your SGML without errors.
<!SGML "ISO 8879:1986 (WWW)"
-- SGML Declaration for XML 1.0 --
-- from:
Final text of revised Web SGML Adaptations Annex (TC2) to ISO 8879:1986
ISO/IEC JTC1/SC34 N0029: 1998-12-06
Annex L.2 (informative): SGML Declaration for XML
changes made to accommodate validation are noted with 'VALID:'
--
CHARSET
BASESET "ISO Registration Number 177//CHARSET
ISO/IEC 10646-1:1993 UCS-4 with implementation
level 3//ESC 2/5 2/15 4/6"
DESCSET
0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
128 32 UNUSED
160 55136 160
55296 2048 UNUSED -- surrogates --
57344 8190 57344
65534 2 UNUSED -- FFFE and FFFF --
65536 1048576 65536
CAPACITY NONE -- Capacities are not restricted in XML --
SCOPE DOCUMENT
SYNTAX
SHUNCHAR NONE
BASESET "ISO Registration Number 177//CHARSET
ISO/IEC 10646-1:1993 UCS-4 with implementation
level 3//ESC 2/5 2/15 4/6"
DESCSET
0 1114112 0
FUNCTION
RE 13
RS 10
SPACE 32
TAB SEPCHAR 9
NAMING
LCNMSTRT ""
UCNMSTRT ""
NAMESTRT
58 95 192-214 216-246 248-305 308-318 321-328
330-382 384-451 461-496 500-501 506-535 592-680
699-705 902 904-906 908 910-929 931-974 976-982
986 988 990 992 994-1011 1025-1036 1038-1103
1105-1116 1118-1153 1168-1220 1223-1224
1227-1228 1232-1259 1262-1269 1272-1273
1329-1366 1369 1377-1414 1488-1514 1520-1522
1569-1594 1601-1610 1649-1719 1722-1726
1728-1742 1744-1747 1749 1765-1766 2309-2361
2365 2392-2401 2437-2444 2447-2448 2451-2472
2474-2480 2482 2486-2489 2524-2525 2527-2529
2544-2545 2565-2570 2575-2576 2579-2600
2602-2608 2610-2611 2613-2614 2616-2617
2649-2652 2654 2674-2676 2693-2699 2701
2703-2705 2707-2728 2730-2736 2738-2739
2741-2745 2749 2784 2821-2828 2831-2832
2835-2856 2858-2864 2866-2867 2870-2873 2877
2908-2909 2911-2913 2949-2954 2958-2960
2962-2965 2969-2970 2972 2974-2975 2979-2980
2984-2986 2990-2997 2999-3001 3077-3084
3086-3088 3090-3112 3114-3123 3125-3129
3168-3169 3205-3212 3214-3216 3218-3240
3242-3251 3253-3257 3294 3296-3297 3333-3340
3342-3344 3346-3368 3370-3385 3424-3425
3585-3630 3632 3634-3635 3648-3653 3713-3714
3716 3719-3720 3722 3725 3732-3735 3737-3743
3745-3747 3749 3751 3754-3755 3757-3758 3760
3762-3763 3773 3776-3780 3904-3911 3913-3945
4256-4293 4304-4342 4352 4354-4355 4357-4359
4361 4363-4364 4366-4370 4412 4414 4416 4428
4430 4432 4436-4437 4441 4447-4449 4451 4453
4455 4457 4461-4462 4466-4467 4469 4510 4520
4523 4526-4527 4535-4536 4538 4540-4546 4587
4592 4601 7680-7835 7840-7929 7936-7957
7960-7965 7968-8005 8008-8013 8016-8023 8025
8027 8029 8031-8061 8064-8116 8118-8124 8126
8130-8132 8134-8140 8144-8147 8150-8155
8160-8172 8178-8180 8182-8188 8486 8490-8491
8494 8576-8578 12295 12321-12329 12353-12436
12449-12538 12549-12588 19968-40869 44032-55203
LCNMCHAR ""
UCNMCHAR ""
NAMECHAR
45-46 183 720-721 768-837 864-865 903 1155-1158
1425-1441 1443-1465 1467-1469 1471 1473-1474
1476 1600 1611-1618 1632-1641 1648 1750-1764
1767-1768 1770-1773 1776-1785 2305-2307 2364
2366-2381 2385-2388 2402-2403 2406-2415
2433-2435 2492 2494-2500 2503-2504 2507-2509
2519 2530-2531 2534-2543 2562 2620 2622-2626
2631-2632 2635-2637 2662-2673 2689-2691 2748
2750-2757 2759-2761 2763-2765 2790-2799
2817-2819 2876 2878-2883 2887-2888 2891-2893
2902-2903 2918-2927 2946-2947 3006-3010
3014-3016 3018-3021 3031 3047-3055 3073-3075
3134-3140 3142-3144 3146-3149 3157-3158
3174-3183 3202-3203 3262-3268 3270-3272
3274-3277 3285-3286 3302-3311 3330-3331
3390-3395 3398-3400 3402-3405 3415 3430-3439
3633 3636-3642 3654-3662 3664-3673 3761
3764-3769 3771-3772 3782 3784-3789 3792-3801
3864-3865 3872-3881 3893 3895 3897 3902-3903
3953-3972 3974-3979 3984-3989 3991 3993-4013
4017-4023 4025 8400-8412 8417 12293 12330-12335
12337-12341 12441-12442 12445-12446 12540-12542
NAMECASE
GENERAL NO
ENTITY NO
DELIM
GENERAL SGMLREF
HCRO "&#x"
-- Ampersand followed by "#x" (without quotes) --
NESTC "/"
NET ">"
PIC "?>"
SHORTREF NONE
NAMES
SGMLREF
QUANTITY
NONE -- Quantities are not restricted in XML --
ENTITIES
"amp" 38
"lt" 60
"gt" 62
"quot" 34
"apos" 39
FEATURES
MINIMIZE
DATATAG NO
OMITTAG NO
RANK NO
SHORTTAG
STARTTAG
EMPTY NO
UNCLOSED NO
NETENABL IMMEDNET
ENDTAG
EMPTY NO
UNCLOSED NO
ATTRIB
DEFAULT YES
OMITNAME NO
VALUE NO
EMPTYNRM YES
IMPLYDEF
ATTLIST YES
DOCTYPE NO
ELEMENT YES
ENTITY NO
NOTATION YES
LINK
SIMPLE NO
IMPLICIT NO
EXPLICIT NO
OTHER
CONCUR NO
SUBDOC NO
FORMAL NO
URN NO
KEEPRSRE YES
VALIDITY NOASSERT
ENTITIES
REF ANY
INTEGRAL YES
APPINFO NONE
SEEALSO "ISO 8879//NOTATION Extensible Markup Language (XML)
1.0//EN">
If you are looking for a helper function which will parse virtually any XML, consider the following.
The results are probably more than necessary, but it is easily reduced. I'll often use this in my discovery phase.
Example
Declare @XML xml ='
<document synfileid="MCIESS0044">
<galedata>
<project>
<projectname>
<title>Opposing Viewpoints Resource Center</title>
</projectname>
</project>
</galedata>
<doc.head>
<title>Cloning</title>
</doc.head>
<doc.body>
<para>A clone is an identical copy of a plant or animal, produced from the genetic material of a single organism. In 1996 scientists in Britain created a sheep named Dolly, the first successful clone of an adult mammal. Since then, scientists have successfully cloned other animals, such as goats, mice, pigs, and rabbits. People began wondering if human beings would be next. The question of whether human cloning should be allowed, and under what conditions, raises a number of challenging scientific, legal, and ethical issues—including what it means to be human.</para>
<head n="1">Scientific Background</head>
<para>People have been cloning plants for thousands of years. Some plants produce offspring without any genetic material from another organism. In these cases, cloning simply requires cutting pieces of the stems, roots, or leaves of the plants and then planting the cuttings. The cuttings will grow into identical copies of the originals. Many common fruits, vegetables, and ornamental plants are produced in this way from parent plants with especially desirable characteristics.</para>
<para>[lots of excluded text] Perhaps the most perplexing question of all: How would clones feel about their status? As a copy, would they lack the sense of uniqueness that is part of the human condition? As yet, such questions have no answers—perhaps they never will. The debate about cloning, both animal and human, however, will certainly continue. The technology exists to create clones. How will society use this technology?</para>
</doc.body>
</document>
'
Select *
From [dbo].[tvf-XML-Hier](@XML)
Order By R1
Returns
The TVF if Interested
Full Disclosure: the original source was
http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx ... just made a few tweaks
CREATE FUNCTION [dbo].[tvf-XML-Hier](@XML xml)
Returns Table
As Return
with cte0 as (
Select Lvl = 1
,ID = Cast(1 as int)
,Pt = Cast(NULL as int)
,Element = x.value('local-name(.)','varchar(150)')
,Attribute = cast('' as varchar(150))
,Value = x.value('text()[1]','varchar(max)')
,XPath = cast(concat(x.value('local-name(.)','varchar(max)'),'[' ,cast(Row_Number() Over(Order By (Select 1)) as int),']') as varchar(max))
,Seq = cast(1000000+Row_Number() over(Order By (Select 1)) as varchar(max))
,AttData = x.query('.')
,XMLData = x.query('*')
From @XML.nodes('/*') a(x)
Union All
Select Lvl = p.Lvl + 1
,ID = Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10
,Pt = p.ID
,Element = c.value('local-name(.)','varchar(150)')
,Attribute = cast('' as varchar(150))
,Value = cast( c.value('text()[1]','varchar(max)') as varchar(max) )
,XPath = cast(concat(p.XPath,'/',c.value('local-name(.)','varchar(max)'),'[',cast(Row_Number() Over(PARTITION BY c.value('local-name(.)','varchar(max)') Order By (Select 1)) as int),']') as varchar(max) )
,Seq = cast(concat(p.Seq,' ',10000000+Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10) as varchar(max))
,AttData = c.query('.')
,XMLData = c.query('*')
From cte0 p
Cross Apply p.XMLData.nodes('*') b(c)
)
, cte1 as (
Select R1 = Row_Number() over (Order By Seq),A.*
From (
Select Lvl,ID,Pt,Element,Attribute,Value,XPath,Seq From cte0
Union All
Select Lvl = p.Lvl+1
,ID = p.ID + Row_Number() over (Order By (Select NULL))
,Pt = p.ID
,Element = p.Element
,Attribute = x.value('local-name(.)','varchar(150)')
,Value = x.value('.','varchar(max)')
,XPath = p.XPath + '/@' + x.value('local-name(.)','varchar(max)')
,Seq = cast(concat(p.Seq,' ',10000000+p.ID + Row_Number() over (Order By (Select NULL)) ) as varchar(max))
From cte0 p
Cross Apply AttData.nodes('/*/@*') a(x)
) A
)
Select A.R1
,R2 = IsNull((Select max(R1) From cte1 Where Seq Like A.Seq+'%'),A.R1)
,A.Lvl
,A.ID
,A.Pt
,A.Element
,A.Attribute
,A.XPath
,Title = Replicate('|---',Lvl-1)+Element+IIF(Attribute='','','@'+Attribute)
,A.Value
From cte1 A
/*
Source: http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx
Declare @XML xml='<person><firstname preferred="Annie" nickname="BeBe">Annabelle</firstname><lastname>Smith</lastname></person>'
Select * from [dbo].[tvf-XML-Hier](@XML) Order by R1
*/

Select value from XML node where conditions are met

I have the following xml which I want to query to turn into a flat table structure.
<RiskBoundReportProcess>
<RiskBoundReport>
<Policy key="Pol126446">
<LineOfBusinessCode>ISR</LineOfBusinessCode>
<AssignedIdentifier>
<RoleCode>Insured</RoleCode>
<Id>BP-1438</Id>
</AssignedIdentifier>
<RiskParticipationPercentIndicator>true</RiskParticipationPercentIndicator>
<PolicySection>
<PolicyProducer>
<RoleCode>Coverholder</RoleCode>
<Producer>
<OrganizationReferences organizationReference="Coverholder"/>
</Producer>
</PolicyProducer>
<PolicyProducer>
<RoleCode>Underwriter</RoleCode>
<Producer>
<Contact>
<PersonReferences>
<PersonName>
<FullName>Name</FullName>
</PersonName>
</PersonReferences>
</Contact>
</Producer>
</PolicyProducer>
<PolicyProducer>
<RoleCode>Broker</RoleCode>
<AssignedIdentifier>
<RoleCode>Broker</RoleCode>
<Id>112</Id>
</AssignedIdentifier>
<Producer>
<OrganizationReferences organizationReference="BrokerOrg112"/>
</Producer>
</PolicyProducer>
<PolicyProducer>
<RoleCode>LondonBroker</RoleCode>
<ProducerReferences producerReference="Binder1">
<ExternalIdentifier>
<TypeCode>UniqueMarketReference</TypeCode>
<Id>UMRGoesHere</Id>
</ExternalIdentifier>
<OrganizationReferences>
<OrganizationName>
<FullName>Partners Ltd</FullName>
</OrganizationName>
</OrganizationReferences>
<BindingAuthorityPeriod>
<StartDate>2014-05-01</StartDate>
<EndDate>2015-04-30</EndDate>
</BindingAuthorityPeriod>
</ProducerReferences>
<RiskParticipationPercent>1.0000</RiskParticipationPercent>
</PolicyProducer>
</PolicySection>
</Policy>
</RiskBoundReport>
</RiskBoundReportProcess>
So far I have two columns. I now want to create a column for UniqueMarketReference which is found in the following node
RiskBoundReportProcess/RiskBoundReport/Policy/PolicySection/PolicyProducer[RoleCode="LondonBroker"]/ProducerReferences/ExternalIdentifier/[TypeCode="UniqueMarketReference"]/Id
How do I modify the SQL below to include this?
SELECT
Policy.value('@key', 'VARCHAR(50)') AS PolicyId,
Policy.value('LineOfBusinessCode[1]', 'VARCHAR(50)') AS LineOfBusinessCode
FROM @ACORDXML.nodes('/RiskBoundReportProcess/RiskBoundReport') AS RiskBoundReport(Nodes)
CROSS APPLY RiskBoundReport.Nodes.nodes('Policy') AS Policies(Policy)
I will eventually return the Insured, Coverholder and Underwriter as columns too.
You were very close... The XPath you provided has one / too many, right before [TypeCode ...].
Try it like this:
SELECT
Policy.value(N'@key', N'VARCHAR(50)') AS PolicyId
,Policy.value(N'LineOfBusinessCode[1]', N'VARCHAR(50)') AS LineOfBusinessCode
,Policy.value(N'(/RiskBoundReportProcess/RiskBoundReport/Policy/PolicySection
/PolicyProducer[RoleCode="LondonBroker"]
/ProducerReferences/ExternalIdentifier[TypeCode="UniqueMarketReference"]/Id/text())[1]','VARCHAR(50)') AS LondonBroker
,Policy.value(N'(/RiskBoundReportProcess/RiskBoundReport/Policy/PolicySection
/PolicyProducer[RoleCode="Underwriter"]
/Producer/Contact/PersonReferences/PersonName/FullName/text())[1]','VARCHAR(50)') AS Underwriter
FROM @ACORDXML.nodes('/RiskBoundReportProcess/RiskBoundReport') AS RiskBoundReport(Nodes)
CROSS APPLY RiskBoundReport.Nodes.nodes('Policy') AS Policies(Policy);
Some of your nodes look like 1:n-related (<PersonReferences><PersonName>...). This might need some extra effort (.nodes())...
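If a document can hold more than one Policy, a path that starts at the root returns the first match in the whole document for every row. A variant with paths relative to the current Policy node could look like the sketch below. It is an illustration only: the shortened @ACORDXML sample and the alias P are assumptions, and the column name UniqueMarketReference is just a suggestion.

```sql
-- Hypothetical, shortened sample; only the LondonBroker producer is kept.
DECLARE @ACORDXML xml = N'<RiskBoundReportProcess><RiskBoundReport>
  <Policy key="Pol126446">
    <LineOfBusinessCode>ISR</LineOfBusinessCode>
    <PolicySection>
      <PolicyProducer>
        <RoleCode>LondonBroker</RoleCode>
        <ProducerReferences producerReference="Binder1">
          <ExternalIdentifier>
            <TypeCode>UniqueMarketReference</TypeCode>
            <Id>UMRGoesHere</Id>
          </ExternalIdentifier>
        </ProducerReferences>
      </PolicyProducer>
    </PolicySection>
  </Policy>
</RiskBoundReport></RiskBoundReportProcess>';

SELECT
    P.Policy.value('@key', 'varchar(50)') AS PolicyId,
    P.Policy.value('(LineOfBusinessCode/text())[1]', 'varchar(50)') AS LineOfBusinessCode,
    -- The path has no leading slash, so it is evaluated from the current Policy.
    P.Policy.value('(PolicySection/PolicyProducer[RoleCode="LondonBroker"]
                     /ProducerReferences/ExternalIdentifier[TypeCode="UniqueMarketReference"]
                     /Id/text())[1]', 'varchar(50)') AS UniqueMarketReference
FROM @ACORDXML.nodes('/RiskBoundReportProcess/RiskBoundReport/Policy') AS P(Policy);
```

Because every path is relative to P.Policy, each output row only sees the producers of its own policy.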

Getting XML node value and all nested column nodes

I have here a small extract of the XML files I will have to handle:
<garRoot fileMaster="9371034.0582.30582">
<garTransactions>
<garTransaction InnerTransId="89274503">
<garSection>
<garSectionCounterFName />
<garColumns />
<garSection>
<garSectionName>Header Section</garSectionName>
<garSectionCounterFName />
<garColumns />
<garSection>
<garSectionName>Startup</garSectionName>
<garSectionCounterFName />
<garColumns>
<garColumn>
<garColText>Idea Date:</garColText>
<garColVal>2017-03-22</garColVal>
</garColumn>
<garColumn>
<garColText>Idea Name:</garColText>
<garColVal>The Invisible Cloak</garColVal>
</garColumn>
</garColumns>
</garSection>
Here is what I have tried so far.
To start, I get the InnerTransId value for each garTransaction:
SELECT
T.value('./@InnerTransId','varchar(50)') As InnerTransID
FROM @XML.nodes('//garTransaction') AS GarT(T)
Because in reality garSection elements can be nested inside other garSection elements, I've tried to get all garColText and garColVal values via garColumn:
SELECT
C.query('./garColText') As cText
, C.query('./garColVal') As cVal
FROM @XML.nodes('//garColumn') as garC(C)
Where I'm having trouble is linking the data together. As an example, I know I have 145 columns for each transaction id, but I cannot seem to tie them to it. I would need this returned:
InnerTransId cText cVal
--------------- ----------- -------------------
89274503 Idea Date: 2017-03-22
89274503 Idea Name: The Invisible Cloak
You can use CROSS APPLY or OUTER APPLY for that purpose:
SELECT
T.value('@InnerTransId','varchar(50)') As InnerTransID
, C.value('garColText[1]','varchar(max)') As cText
, C.value('garColVal[1]','varchar(max)') As cVal
FROM @XML.nodes('//garTransaction') AS GarT(T)
OUTER APPLY T.nodes('.//garColumn') as garC(C)
Notice how nodes('.//garColumn') is called on T, so that the result, garC(C), only contains the garColumn elements that belong to the current garTransaction. The attribute name is case-sensitive, so it is written @InnerTransId, exactly as it appears in the XML.
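For a quick test, the query can be run against a small well-formed sample. The sketch below is illustrative only: the XML is a shortened version of the extract from the question with its closing tags filled in, which is an assumption about what the real files look like.

```sql
-- Hypothetical, shortened sample with nested garSection elements.
DECLARE @XML xml = N'<garRoot fileMaster="9371034.0582.30582">
  <garTransactions>
    <garTransaction InnerTransId="89274503">
      <garSection>
        <garSectionName>Header Section</garSectionName>
        <garColumns />
        <garSection>
          <garSectionName>Startup</garSectionName>
          <garColumns>
            <garColumn><garColText>Idea Date:</garColText><garColVal>2017-03-22</garColVal></garColumn>
            <garColumn><garColText>Idea Name:</garColText><garColVal>The Invisible Cloak</garColVal></garColumn>
          </garColumns>
        </garSection>
      </garSection>
    </garTransaction>
  </garTransactions>
</garRoot>';

SELECT
    GarT.T.value('@InnerTransId', 'varchar(50)')          AS InnerTransId,
    garC.C.value('(garColText/text())[1]', 'varchar(max)') AS cText,
    garC.C.value('(garColVal/text())[1]', 'varchar(max)')  AS cVal
FROM @XML.nodes('//garTransaction') AS GarT(T)
-- .//garColumn searches below the current transaction only, at any nesting depth.
OUTER APPLY GarT.T.nodes('.//garColumn') AS garC(C);
```

OUTER APPLY keeps a transaction in the output even when it has no garColumn at all; CROSS APPLY would drop it.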

Querying (a lot of) XML data for specific text with in xml Elements (Tsql)

Sorry: this post got a little long (wanted to make sure everything relevant was in)
Struggling to wrap my brain around this a little.
I have a table that has 5 columns
ID (varchar)
Record value(XML)
date (datetime)
applicationtypeid (int)
applicationstatusid(int)
The XML contains a lot of data, but the part I'm interested in looks like this:
<GoodsAndServicesItemSummaryViewModelItems>
<GoodsAndServicesMasterViewModel>
<Id>1</Id>
<ClassId>1</ClassId>
<TermsText>text</TermsText>
<TermsCreationType>ManuallyEntered</TermsCreationType>
<Terms />
Specifically the "TermsCreationType" element. In it can be one of the following 3 strings:
ManuallyEntered
CopyFromExistingMarks
CopyFromPreapprovedTermsDatabase
Depending on what the customer files there could also be a mixture of all three and they could contain as many of these mixtures as they theoretically wanted i.e.:
</PriorityClaimAdditionalDetailWizardStepViewModel>
<GoodsAndServicesMasterViewModel>
<Id>9</Id>
<ClassId>2</ClassId>
<TermsText>texty</TermsText>
<TermsCreationType>ManuallyEntered</TermsCreationType>
<Terms />
</GoodsAndServicesMasterViewModel>
<GoodsAndServicesMasterViewModel>
<Id>10</Id>
<ClassId>1</ClassId>
<TermsText>text</TermsText>
<TermsCreationType>ManuallyEntered</TermsCreationType>
<Terms />
</GoodsAndServicesMasterViewModel>
</GoodsAndServicesItemSummaryViewModelItems>
<GoodsAndServicesItemSummaryViewModelItemsUnMerged>
<GoodsAndServicesMasterViewModel>
<Id>9</Id>
<ClassId>9</ClassId>
<TermsText>test</TermsText>
<TermsCreationType>CopyFromExistingMarks</TermsCreationType>
<Terms />
</GoodsAndServicesMasterViewModel>
I'm trying to count how many records from a certain date range:
contain ONLY one of the aforementioned values (whether it appears once or 50 times, as long as it is the only value in that element)
contain any mixture of the three.
My attempt so far, for the "one value only" case, is this:
SELECT Count (id) [pre-approved only]
FROM [TMWebForms].[dbo].[webformapplication]
WHERE trademarkid NOT IN (SELECT id
FROM [TMWebForms].[dbo].[webformapplication]
WHERE applicationtypeid = '5'
AND createddate BETWEEN '2016-08-01' AND '2016-08-31'
AND RECORDDATA.value('contains((//GoodsAndServicesWizardStepViewModel/GoodsAndServicesItemSummaryViewModelItems/GoodsAndServicesMasterViewModel/TermsCreationType/text())[1], "ManuallyEntered")', 'bit') = 1
AND applicationstatusid = 50
AND applicationtypeid = 5)
AND id NOT IN (SELECT id
FROM [TMWebForms].[dbo].[webformapplication]
WHERE applicationtypeid = '5'
AND createddate BETWEEN '2016-08-01' AND '2016-08-31'
AND RECORDDATA.value('contains((//GoodsAndServicesWizardStepViewModel/GoodsAndServicesItemSummaryViewModelItems/GoodsAndServicesMasterViewModel/TermsCreationType/text())[1], "CopyFromExistingMark")', 'bit') = 1
AND applicationstatusid = 50
AND applicationtypeid = 5)
AND createddate BETWEEN '2016-08-01' AND '2016-08-31'
AND RECORDDATA.value('contains((//GoodsAndServicesWizardStepViewModel/GoodsAndServicesItemSummaryViewModelItems/GoodsAndServicesMasterViewModel/TermsCreationType/text())[1], "CopyFromPreapprovedTermsDatabase")', 'bit') = 1
AND applicationstatusid = 50
AND applicationtypeid = 5
This is long-winded and far from the best performance-wise! (I thought about converting it to a string and then using "like", but that is probably just as bad, if not worse.)
It also doesn't return exactly what I wanted. As I read it, it searches only for "CopyFromPreapprovedTermsDatabase", but when I inspect the data there are a few cases where this is the first value and "ManuallyEntered" also exists further down.
Any help appreciated here!
Try it like this (I had to add a root element and some opening and closing tags):
DECLARE @xml XML=
N'<SomeRoot>
<GoodsAndServicesItemSummaryViewModelItems>
<GoodsAndServicesMasterViewModel>
<Id>9</Id>
<ClassId>2</ClassId>
<TermsText>texty</TermsText>
<TermsCreationType>ManuallyEntered</TermsCreationType>
<Terms />
</GoodsAndServicesMasterViewModel>
<GoodsAndServicesMasterViewModel>
<Id>10</Id>
<ClassId>1</ClassId>
<TermsText>text</TermsText>
<TermsCreationType>ManuallyEntered</TermsCreationType>
<Terms />
</GoodsAndServicesMasterViewModel>
</GoodsAndServicesItemSummaryViewModelItems>
<GoodsAndServicesItemSummaryViewModelItemsUnMerged>
<GoodsAndServicesMasterViewModel>
<Id>9</Id>
<ClassId>9</ClassId>
<TermsText>test</TermsText>
<TermsCreationType>CopyFromExistingMarks</TermsCreationType>
<Terms />
</GoodsAndServicesMasterViewModel>
</GoodsAndServicesItemSummaryViewModelItemsUnMerged>
</SomeRoot>';
DECLARE @tbl TABLE(ID VARCHAR(10),Record_Value XML,[date] DATETIME,applicationtypeid INT,applicationstatusid INT);
INSERT INTO @tbl VALUES('SomeTest',@xml,GETDATE(),11,22);
--The CTE will select all columns and add the actual count of the TermsCreationType with the given text()
WITH CTE AS
(
SELECT *
,Record_Value.value('count(//TermsCreationType[text()="ManuallyEntered"])','int') AS CountManuallyEntered
,Record_Value.value('count(//TermsCreationType[text()="CopyFromExistingMarks"])','int') AS CountCopyFromExistingMarks
,Record_Value.value('count(//TermsCreationType[text()="CopyFromPreapprovedTermsDatabase"])','int') AS CountCopyFromPreapprovedTermsDatabase
FROM @tbl
)
--The final SELECT uses a big CASE WHEN hierarchy to analyse the counts
SELECT *
,CASE WHEN CTE.CountManuallyEntered=0 AND CTE.CountCopyFromExistingMarks=0 AND CTE.CountCopyFromPreapprovedTermsDatabase=0 THEN 'None'
ELSE
CASE WHEN CTE.CountManuallyEntered>0 AND CTE.CountCopyFromExistingMarks=0 AND CTE.CountCopyFromPreapprovedTermsDatabase=0 THEN 'Manually'
ELSE
CASE WHEN CTE.CountManuallyEntered=0 AND CTE.CountCopyFromExistingMarks>0 AND CTE.CountCopyFromPreapprovedTermsDatabase=0 THEN 'Existing'
ELSE
CASE WHEN CTE.CountManuallyEntered=0 AND CTE.CountCopyFromExistingMarks=0 AND CTE.CountCopyFromPreapprovedTermsDatabase>0 THEN 'Preapproved'
ELSE
'Mixed'
END
END
END
END AS CountAnalysis
FROM CTE;
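The attempt in the question checks only the first TermsCreationType because of the [1] predicate, which is why mixed records slip through. To get from the per-row analysis to the counts the question asks for, one possible sketch is below. It swaps the count() calls for exist(), since only presence matters, and then groups by category. The table variable, its sample rows, and the aliases HasManual, HasExisting, HasPreapproved, Flags and Classified are hypothetical. The column names RECORDDATA and createddate are taken from the question's query, and the filter values are assumed from it as well:

```sql
-- Hypothetical stand-in for the real table; the sample rows are invented.
DECLARE @webformapplication TABLE
(
    id                  VARCHAR(10),
    RECORDDATA          XML,
    createddate         DATETIME,
    applicationtypeid   INT,
    applicationstatusid INT
);

INSERT INTO @webformapplication VALUES
 ('A1', N'<r><TermsCreationType>ManuallyEntered</TermsCreationType><TermsCreationType>ManuallyEntered</TermsCreationType></r>', '20160805', 5, 50)
,('A2', N'<r><TermsCreationType>ManuallyEntered</TermsCreationType><TermsCreationType>CopyFromExistingMarks</TermsCreationType></r>', '20160810', 5, 50)
,('A3', N'<r><TermsCreationType>CopyFromPreapprovedTermsDatabase</TermsCreationType></r>', '20160820', 5, 50);

WITH Flags AS
(
    -- exist() returns a bit; cast to INT so the flags can be summed.
    SELECT id
         , CAST(RECORDDATA.exist('//TermsCreationType[text()="ManuallyEntered"]') AS INT)                  AS HasManual
         , CAST(RECORDDATA.exist('//TermsCreationType[text()="CopyFromExistingMarks"]') AS INT)            AS HasExisting
         , CAST(RECORDDATA.exist('//TermsCreationType[text()="CopyFromPreapprovedTermsDatabase"]') AS INT) AS HasPreapproved
    FROM @webformapplication
    WHERE applicationtypeid = 5
      AND applicationstatusid = 50
      -- Half-open range so rows late on 31 August are not missed.
      AND createddate >= '20160801' AND createddate < '20160901'
)
, Classified AS
(
    SELECT id
         , CASE
               WHEN HasManual + HasExisting + HasPreapproved = 0 THEN 'None'
               WHEN HasManual + HasExisting + HasPreapproved > 1 THEN 'Mixed'
               WHEN HasManual = 1   THEN 'Manually'
               WHEN HasExisting = 1 THEN 'Existing'
               ELSE 'Preapproved'
           END AS CountAnalysis
    FROM Flags
)
SELECT CountAnalysis, COUNT(*) AS Records
FROM Classified
GROUP BY CountAnalysis;
```

On the three sample rows, A1 is classified as 'Manually', A2 as 'Mixed' and A3 as 'Preapproved', so each category is returned with a count of 1. Whether exist() is faster than count() on the real 2-million-row table is not something this sketch establishes; it would need to be measured.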

Resources