Querying non-XML-compliant structured data - sql-server

As a data analyst, I constantly run across files of structured data in some proprietary format that resists normal XML parsing.
For example, I have an archive of about a hundred documents that all begin with this:
<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
I have included an abridged example of the document below; don't read it if you're offended by cloning.
At any rate, is there a way to query this without having a DTD or namespace or URI or whatever it is I need? I'm OK using SQL Server 2012+ or XQuery or, I dunno, PHP or VBA.
<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
<document synfileid="MCIESS0044">
<galedata><project>
<projectname>
<title>Opposing Viewpoints Resource Center</title>
</projectname>
</project></galedata>
<doc.head>
<title>Cloning</title>
</doc.head>
<doc.body>
<para>A clone is an identical copy of a plant or animal, produced from the genetic material of a single organism. In 1996 scientists in Britain created a sheep named Dolly, the first successful clone of an adult mammal. Since then, scientists have successfully cloned other animals, such as goats, mice, pigs, and rabbits. People began wondering if human beings would be next. The question of whether human cloning should be allowed, and under what conditions, raises a number of challenging scientific, legal, and ethical issues—including what it means to be human.</para>
<head n="1">Scientific Background</head>
<para>People have been cloning plants for thousands of years. Some plants produce offspring without any genetic material from another organism. In these cases, cloning simply requires cutting pieces of the stems, roots, or leaves of the plants and then planting the cuttings. The cuttings will grow into identical copies of the originals. Many common fruits, vegetables, and ornamental plants are produced in this way from parent plants with especially desirable characteristics.</para>
<para>[lots of excluded text] Perhaps the most perplexing question of all: How would clones feel about their status? As a copy, would they lack the sense of uniqueness that is part of the human condition? As yet, such questions have no answers—perhaps they never will. The debate about cloning, both animal and human, however, will certainly continue. The technology exists to create clones. How will society use this technology?</para>
</doc.body>
</document>

Your SGML input data is almost XML, except that, unlike full SGML, XML always requires a system identifier (a file name or URL) for the external DTD subset, not just a public identifier such as -//Gale Research//DTD Document V2.0//EN. So in XML the DOCTYPE declaration must take this form:
<!DOCTYPE DOCUMENT
PUBLIC "-//Gale Research//DTD Document V2.0//EN"
"test.dtd">
where I've added "test.dtd" as the system identifier/file name of the external subset. Of course, a test.dtd file must now exist. An empty test.dtd file in the working directory is sufficient, or test.dtd could contain some meaningful declarations for the markup at hand, such as the following:
<!ELEMENT document ANY>
<!ELEMENT galedata ANY>
<!ELEMENT project ANY>
<!ELEMENT projectname ANY>
<!ELEMENT title ANY>
<!ELEMENT doc.head ANY>
<!ELEMENT doc.body ANY>
<!ELEMENT para ANY>
<!ELEMENT head ANY>
<!ATTLIST document synfileid CDATA #IMPLIED>
<!ATTLIST head n NUMBER #IMPLIED>
But as you've found out, you can also make XML tools happy by just removing the DOCTYPE line.
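If all you want is to get one of these files into SQL Server for ad-hoc XQuery, here is a minimal sketch along those lines (the file path is hypothetical, and it assumes you can simply drop the DOCTYPE line before casting the rest to xml, and that the file's encoding survives a SINGLE_CLOB read):
DECLARE @raw varchar(max), @doc xml;
-- read the whole file as text (requires permission to use OPENROWSET ... BULK)
SELECT @raw = BulkColumn
FROM OPENROWSET(BULK 'C:\data\MCIESS0044.xml', SINGLE_CLOB) AS f;
-- drop the DOCTYPE line, then cast what's left to xml
SET @doc = CAST(REPLACE(@raw,
    '<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">', '') AS xml);
-- simple XQuery against the result
SELECT @doc.value('(/document/doc.head/title/text())[1]', 'varchar(200)') AS DocTitle,
       @doc.value('(/document/@synfileid)[1]', 'varchar(50)') AS SynFileId;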
Now if you wanted to process your file(s) into XML without manual editing, you could use an SGML tool to pre-process them into compliant XML, and then use XML query tools against the produced XML.
To do so using the OpenSP/OpenJade SGML package (installable on Ubuntu with e.g. sudo apt-get install opensp), you'd place a catalog file in the directory containing the following line, which tells SGML to resolve the public identifier -//Gale Research//DTD Document V2.0//EN to test.dtd:
PUBLIC "-//Gale Research//DTD Document V2.0//EN" "test.dtd"
You could edit test.dtd from the XML version I gave above to add the tag omission indicators that classic SGML requires by default. But there's another feature your test data is using which isn't enabled by default in the OpenSP tools, namely support for hexadecimal character entity references such as &#x2014; in your example data. Hence, we're going to need a custom SGML declaration (a somewhat archaic piece of plain text specifying the SGML features your document is using) for your data anyway. So while we're at it, we're also going to declare FEATURES MINIMIZE OMITTAG NO in the SGML declaration, which makes SGML accept element declarations without tag omission indicators, so we don't have to change test.dtd from the XML version.
We could place the SGML declaration right into the document itself, but to avoid manual editing, we're going to use catalog resolution for the SGML declaration as well. If you add the following line to your catalog file
SGMLDECL "xml10-sgmldecl.dcl"
then OpenSP will use whatever is stored in xml10-sgmldecl.dcl as the SGML declaration. The actual SGML declaration we're going to use is the official SGML declaration for XML 1.0 (which already has all the features we want). You don't need to understand the meaning of an SGML declaration in detail; just paste the text attached below into a file named xml10-sgmldecl.dcl. If you're interested in the details, see my description at http://sgmljs.net/docs/sgmlrefman.html#sgml-declaration.
Now you'll be able to invoke
osx <your-file>
to produce XML from your SGML without errors.
<!SGML "ISO 8879:1986 (WWW)"
-- SGML Declaration for XML 1.0 --
-- from:
Final text of revised Web SGML Adaptations Annex (TC2) to ISO 8879:1986
ISO/IEC JTC1/SC34 N0029: 1998-12-06
Annex L.2 (informative): SGML Declaration for XML
changes made to accommodate validation are noted with 'VALID:'
--
CHARSET
BASESET "ISO Registration Number 177//CHARSET
ISO/IEC 10646-1:1993 UCS-4 with implementation
level 3//ESC 2/5 2/15 4/6"
DESCSET
0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
128 32 UNUSED
160 55136 160
55296 2048 UNUSED -- surrogates --
57344 8190 57344
65534 2 UNUSED -- FFFE and FFFF --
65536 1048576 65536
CAPACITY NONE -- Capacities are not restricted in XML --
SCOPE DOCUMENT
SYNTAX
SHUNCHAR NONE
BASESET "ISO Registration Number 177//CHARSET
ISO/IEC 10646-1:1993 UCS-4 with implementation
level 3//ESC 2/5 2/15 4/6"
DESCSET
0 1114112 0
FUNCTION
RE 13
RS 10
SPACE 32
TAB SEPCHAR 9
NAMING
LCNMSTRT ""
UCNMSTRT ""
NAMESTRT
58 95 192-214 216-246 248-305 308-318 321-328
330-382 384-451 461-496 500-501 506-535 592-680
699-705 902 904-906 908 910-929 931-974 976-982
986 988 990 992 994-1011 1025-1036 1038-1103
1105-1116 1118-1153 1168-1220 1223-1224
1227-1228 1232-1259 1262-1269 1272-1273
1329-1366 1369 1377-1414 1488-1514 1520-1522
1569-1594 1601-1610 1649-1719 1722-1726
1728-1742 1744-1747 1749 1765-1766 2309-2361
2365 2392-2401 2437-2444 2447-2448 2451-2472
2474-2480 2482 2486-2489 2524-2525 2527-2529
2544-2545 2565-2570 2575-2576 2579-2600
2602-2608 2610-2611 2613-2614 2616-2617
2649-2652 2654 2674-2676 2693-2699 2701
2703-2705 2707-2728 2730-2736 2738-2739
2741-2745 2749 2784 2821-2828 2831-2832
2835-2856 2858-2864 2866-2867 2870-2873 2877
2908-2909 2911-2913 2949-2954 2958-2960
2962-2965 2969-2970 2972 2974-2975 2979-2980
2984-2986 2990-2997 2999-3001 3077-3084
3086-3088 3090-3112 3114-3123 3125-3129
3168-3169 3205-3212 3214-3216 3218-3240
3242-3251 3253-3257 3294 3296-3297 3333-3340
3342-3344 3346-3368 3370-3385 3424-3425
3585-3630 3632 3634-3635 3648-3653 3713-3714
3716 3719-3720 3722 3725 3732-3735 3737-3743
3745-3747 3749 3751 3754-3755 3757-3758 3760
3762-3763 3773 3776-3780 3904-3911 3913-3945
4256-4293 4304-4342 4352 4354-4355 4357-4359
4361 4363-4364 4366-4370 4412 4414 4416 4428
4430 4432 4436-4437 4441 4447-4449 4451 4453
4455 4457 4461-4462 4466-4467 4469 4510 4520
4523 4526-4527 4535-4536 4538 4540-4546 4587
4592 4601 7680-7835 7840-7929 7936-7957
7960-7965 7968-8005 8008-8013 8016-8023 8025
8027 8029 8031-8061 8064-8116 8118-8124 8126
8130-8132 8134-8140 8144-8147 8150-8155
8160-8172 8178-8180 8182-8188 8486 8490-8491
8494 8576-8578 12295 12321-12329 12353-12436
12449-12538 12549-12588 19968-40869 44032-55203
LCNMCHAR ""
UCNMCHAR ""
NAMECHAR
45-46 183 720-721 768-837 864-865 903 1155-1158
1425-1441 1443-1465 1467-1469 1471 1473-1474
1476 1600 1611-1618 1632-1641 1648 1750-1764
1767-1768 1770-1773 1776-1785 2305-2307 2364
2366-2381 2385-2388 2402-2403 2406-2415
2433-2435 2492 2494-2500 2503-2504 2507-2509
2519 2530-2531 2534-2543 2562 2620 2622-2626
2631-2632 2635-2637 2662-2673 2689-2691 2748
2750-2757 2759-2761 2763-2765 2790-2799
2817-2819 2876 2878-2883 2887-2888 2891-2893
2902-2903 2918-2927 2946-2947 3006-3010
3014-3016 3018-3021 3031 3047-3055 3073-3075
3134-3140 3142-3144 3146-3149 3157-3158
3174-3183 3202-3203 3262-3268 3270-3272
3274-3277 3285-3286 3302-3311 3330-3331
3390-3395 3398-3400 3402-3405 3415 3430-3439
3633 3636-3642 3654-3662 3664-3673 3761
3764-3769 3771-3772 3782 3784-3789 3792-3801
3864-3865 3872-3881 3893 3895 3897 3902-3903
3953-3972 3974-3979 3984-3989 3991 3993-4013
4017-4023 4025 8400-8412 8417 12293 12330-12335
12337-12341 12441-12442 12445-12446 12540-12542
NAMECASE
GENERAL NO
ENTITY NO
DELIM
GENERAL SGMLREF
HCRO "&#x"
-- Ampersand followed by "#x" (without quotes) --
NESTC "/"
NET ">"
PIC "?>"
SHORTREF NONE
NAMES
SGMLREF
QUANTITY
NONE -- Quantities are not restricted in XML --
ENTITIES
"amp" 38
"lt" 60
"gt" 62
"quot" 34
"apos" 39
FEATURES
MINIMIZE
DATATAG NO
OMITTAG NO
RANK NO
SHORTTAG
STARTTAG
EMPTY NO
UNCLOSED NO
NETENABL IMMEDNET
ENDTAG
EMPTY NO
UNCLOSED NO
ATTRIB
DEFAULT YES
OMITNAME NO
VALUE NO
EMPTYNRM YES
IMPLYDEF
ATTLIST YES
DOCTYPE NO
ELEMENT YES
ENTITY NO
NOTATION YES
LINK
SIMPLE NO
IMPLICIT NO
EXPLICIT NO
OTHER
CONCUR NO
SUBDOC NO
FORMAL NO
URN NO
KEEPRSRE YES
VALIDITY NOASSERT
ENTITIES
REF ANY
INTEGRAL YES
APPINFO NONE
SEEALSO "ISO 8879//NOTATION Extensible Markup Language (XML)
1.0//EN">

If you are looking for a helper function which will parse virtually any XML, consider the following.
The results are probably more than you need, but they are easily reduced; I'll often use this in my discovery phase.
Example
Declare @XML xml ='
<document synfileid="MCIESS0044">
<galedata>
<project>
<projectname>
<title>Opposing Viewpoints Resource Center</title>
</projectname>
</project>
</galedata>
<doc.head>
<title>Cloning</title>
</doc.head>
<doc.body>
<para>A clone is an identical copy of a plant or animal, produced from the genetic material of a single organism. In 1996 scientists in Britain created a sheep named Dolly, the first successful clone of an adult mammal. Since then, scientists have successfully cloned other animals, such as goats, mice, pigs, and rabbits. People began wondering if human beings would be next. The question of whether human cloning should be allowed, and under what conditions, raises a number of challenging scientific, legal, and ethical issues—including what it means to be human.</para>
<head n="1">Scientific Background</head>
<para>People have been cloning plants for thousands of years. Some plants produce offspring without any genetic material from another organism. In these cases, cloning simply requires cutting pieces of the stems, roots, or leaves of the plants and then planting the cuttings. The cuttings will grow into identical copies of the originals. Many common fruits, vegetables, and ornamental plants are produced in this way from parent plants with especially desirable characteristics.</para>
<para>[lots of excluded text] Perhaps the most perplexing question of all: How would clones feel about their status? As a copy, would they lack the sense of uniqueness that is part of the human condition? As yet, such questions have no answers—perhaps they never will. The debate about cloning, both animal and human, however, will certainly continue. The technology exists to create clones. How will society use this technology?</para>
</doc.body>
</document>
'
Select *
From [dbo].[tvf-XML-Hier](@XML)
Order By R1
Returns
The TVF if Interested
Full Disclosure: the original source was
http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx ... just made a few tweaks
CREATE FUNCTION [dbo].[tvf-XML-Hier](@XML xml)
Returns Table
As Return
with cte0 as (
Select Lvl = 1
,ID = Cast(1 as int)
,Pt = Cast(NULL as int)
,Element = x.value('local-name(.)','varchar(150)')
,Attribute = cast('' as varchar(150))
,Value = x.value('text()[1]','varchar(max)')
,XPath = cast(concat(x.value('local-name(.)','varchar(max)'),'[' ,cast(Row_Number() Over(Order By (Select 1)) as int),']') as varchar(max))
,Seq = cast(1000000+Row_Number() over(Order By (Select 1)) as varchar(max))
,AttData = x.query('.')
,XMLData = x.query('*')
From @XML.nodes('/*') a(x)
Union All
Select Lvl = p.Lvl + 1
,ID = Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10
,Pt = p.ID
,Element = c.value('local-name(.)','varchar(150)')
,Attribute = cast('' as varchar(150))
,Value = cast( c.value('text()[1]','varchar(max)') as varchar(max) )
,XPath = cast(concat(p.XPath,'/',c.value('local-name(.)','varchar(max)'),'[',cast(Row_Number() Over(PARTITION BY c.value('local-name(.)','varchar(max)') Order By (Select 1)) as int),']') as varchar(max) )
,Seq = cast(concat(p.Seq,' ',10000000+Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10) as varchar(max))
,AttData = c.query('.')
,XMLData = c.query('*')
From cte0 p
Cross Apply p.XMLData.nodes('*') b(c)
)
, cte1 as (
Select R1 = Row_Number() over (Order By Seq),A.*
From (
Select Lvl,ID,Pt,Element,Attribute,Value,XPath,Seq From cte0
Union All
Select Lvl = p.Lvl+1
,ID = p.ID + Row_Number() over (Order By (Select NULL))
,Pt = p.ID
,Element = p.Element
,Attribute = x.value('local-name(.)','varchar(150)')
,Value = x.value('.','varchar(max)')
,XPath = p.XPath + '/@' + x.value('local-name(.)','varchar(max)')
,Seq = cast(concat(p.Seq,' ',10000000+p.ID + Row_Number() over (Order By (Select NULL)) ) as varchar(max))
From cte0 p
Cross Apply AttData.nodes('/*/@*') a(x)
) A
)
Select A.R1
,R2 = IsNull((Select max(R1) From cte1 Where Seq Like A.Seq+'%'),A.R1)
,A.Lvl
,A.ID
,A.Pt
,A.Element
,A.Attribute
,A.XPath
,Title = Replicate('|---',Lvl-1)+Element+IIF(Attribute='','','@'+Attribute)
,A.Value
From cte1 A
/*
Source: http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx
Declare @XML xml='<person><firstname preferred="Annie" nickname="BeBe">Annabelle</firstname><lastname>Smith</lastname></person>'
Select * from [dbo].[tvf-XML-Hier](@XML) Order by R1
*/
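Since the output is deliberately verbose, trimming it during discovery is just a matter of filtering on the returned columns; for example (reusing the @XML variable declared above), something like:
Select XPath, Value
From [dbo].[tvf-XML-Hier](@XML)
Where Element = 'title'
Order By R1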

Related

How do I extract XML data stored as data in a table column with a complex formed SQL

I have XML data that comes in from a third party (so I cannot change the XML format) and gets stored in a table.
An example XML message format
<MedicalAidMessage xmlns="test.co.za/messaging" version="6.0">
<BenefitCheckResponseMessage>
<FinancialResponseDetails>
<FinancialResponseLines>
<FinancialResponseLine>
<LineIdentifier>1</LineIdentifier>
<RequestedAmount>1000</RequestedAmount>
<AmountToProvider>800</AmountToProvider>
<AmountToMember>0</AmountToMember>
<MemberLiableAmount>200</MemberLiableAmount>
<MemberNotLiableAmount>0</MemberNotLiableAmount>
<TariffCode>12345</TariffCode>
<LineResponseCodes>
<LineResponseCode>
<Sequence>1</Sequence>
<Code>274</Code>
<Description>We have not paid the amount claimed because the funds in the Medical Savings are used up.</Description>
<Type>Info</Type>
</LineResponseCode>
<LineResponseCode>
<Sequence>2</Sequence>
<Code>1239</Code>
<Description>We have applied a co-payment on this claim, in line with the member’s plan benefit for MRI and CT scans.</Description>
<Type>Info</Type>
</LineResponseCode>
</LineResponseCodes>
</FinancialResponseLine>
</FinancialResponseLines>
</FinancialResponseDetails>
</BenefitCheckResponseMessage>
</MedicalAidMessage>
I have tried the following code to get the Tariffcode, RequestedAmount, AmountToProvider and AmountToMember
DECLARE @XMLData XML = (select top 1 replace(br.xmlmessage,'<?xml version="1.0" encoding="UTF-8"?>','')
from benefitcheckresponse br
inner join benefitcheck bc on bc.id = br.BenefitCheckId
where bc.id =1562
order by bc.id desc)
SELECT
[TariffCode] = Node.Data.value('TariffCode', 'vacrhar(50)'),
[RequestedAmount] = Node.Data.value('RequestedAmount', 'float)'),
[AmountToProvider] = Node.Data.value('AmountToProvider', 'float)'),
[AmountToMember] = Node.Data.value('AmountToMember', 'float)')
FROM @XMLData.nodes('/*/FinancialResponseLine/') Node(Data)
The problem I am getting is that it is giving me the following error message
Msg 9341, Level 16, State 1, Line 12 XQuery [nodes()]: Syntax error
near '', expected a step expression.
How do I resolve that error?
How would I include the results for multiple lines when there multiple line responses?
How would I include the values from the sub nodes to the line response nodes?
You would be better off with syntax like this, if I am honest. I wasn't able to complete the ORDER BY clause for you, as I don't know what column to order by, but as you have a TOP (1) you need one. You also need to define your XML namespace, which you have not done; that alone would have resulted in no rows being returned:
WITH XMLNAMESPACES(DEFAULT 'test.co.za/messaging')
SELECT TOP (1)
FRL.FRL.value('(TariffCode/text())[1]','int') AS TariffCode,
FRL.FRL.value('(RequestedAmount/text())[1]','int') AS RequestedAmount, --I doubt float is the correct data type here
FRL.FRL.value('(AmountToProvider/text())[1]','int') AS AmountToProvider, --I doubt float is the correct data type here
FRL.FRL.value('(AmountToMember/text())[1]','int') AS AmountToMember --I doubt float is the correct data type here
FROM dbo.benefitcheckresponse br
INNER JOIN benefitcheck bc ON bc.id = br.BenefitCheckId
CROSS APPLY br.xmlmessage.nodes('MedicalAidMessage/BenefitCheckResponseMessage/FinancialResponseDetails/FinancialResponseLines/FinancialResponseLine') FRL(FRL)
WHERE bc.id = 1562
ORDER BY {Column Name(s)};
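As for the follow-up questions: dropping the TOP (1) gives you one row per FinancialResponseLine, and the sub nodes can be reached with a second nodes() call applied to each line fragment. A rough sketch along the same lines (the data types are guesses, as above):
WITH XMLNAMESPACES(DEFAULT 'test.co.za/messaging')
SELECT
    FRL.FRL.value('(TariffCode/text())[1]','int') AS TariffCode,
    FRL.FRL.value('(RequestedAmount/text())[1]','int') AS RequestedAmount,
    LRC.LRC.value('(Sequence/text())[1]','int') AS ResponseSequence,
    LRC.LRC.value('(Code/text())[1]','int') AS ResponseCode,
    LRC.LRC.value('(Description/text())[1]','nvarchar(max)') AS ResponseDescription,
    LRC.LRC.value('(Type/text())[1]','nvarchar(20)') AS ResponseType
FROM dbo.benefitcheckresponse br
INNER JOIN benefitcheck bc ON bc.id = br.BenefitCheckId
CROSS APPLY br.xmlmessage.nodes('MedicalAidMessage/BenefitCheckResponseMessage/FinancialResponseDetails/FinancialResponseLines/FinancialResponseLine') FRL(FRL)
OUTER APPLY FRL.FRL.nodes('LineResponseCodes/LineResponseCode') LRC(LRC)
WHERE bc.id = 1562;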

SQL Server geography: Reduce size (decimal precision) of WKT text

For my farming app, a stored proc retrieves paddock/field boundaries that are stored as SQL Server geography data type, for display on the user's mobile device.
SQL Server's .ToString() and .STAsText() functions render each vertex as a lat/long pair, to 15 decimal places. From this answer, 15 decimals defines a location to within the width of an atom! To the nearest metre would be good enough for me.
The resulting overly-precise payload is extremely large and too slow for use on larger farms.
From my SQL Server geography data, I would like to produce WKT that is formatted to 4 or 5 decimals. I could find no built-in methods, but my best leads are:
Postgis and Google Cloud "BigQuery" have the ST_SNAPTOGRID function, which would be perfect, and
Regex might be useful, e.g. this answer, but SQL Server doesn't seem to have a regex replace.
I would think this is a common problem: is there a simple solution?
By stepping through @iamdave's excellent answer, and using that same approach, it looks like we need to split only on periods... I think we can ignore all parentheses and commas, and ignore the POLYGON prefix (which means it'll work across other GEOGRAPHY types, such as MULTIPOLYGON).
i.e. Each time we find a period, grab only the next 4 characters after it, and throw away any numbers after that (until we hit a non-number.)
This works for me (using @iamdave's test data):
DECLARE @wkt NVARCHAR(MAX), @wktShort NVARCHAR(MAX);
DECLARE @decimalPlaces int = 4;
SET @wkt = 'POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 37.368753,-121.971868 37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))';
-- Split on '.', then get the next N decimals, and find the index of the first non-number.
-- Then recombine the fragments, skipping the unwanted numbers.
WITH points AS (
SELECT value, LEFT(value, @decimalPlaces) AS decimals, PATINDEX('%[^0-9]%', value) AS indx
FROM STRING_SPLIT(@wkt, '.')
)
SELECT @wktShort = STRING_AGG(IIF(indx < @decimalPlaces, '', decimals) + SUBSTRING(value, indx, LEN(value)), '.')
FROM points;
Comparing original vs shortened, we can see that each number is truncated to 4dp:
SELECT @wkt AS Text UNION ALL SELECT @wktShort;
Edit
I believe I may have misunderstood your question and you are looking to transmit the WKT rather than the binary representations of the polygons? If that is the case, my answer below still shows you how to chop off some decimal places (without rounding them). Just don't wrap the stuff(...) FOR XML in a STGeomFromText and you have the amended WKT.
When working with geography data types, it can be handy to maintain a very detailed 'master' version, from which you generate and persist less detailed versions based on your requirements.
An easy way to generate these reduced-complexity polygons is to use the helpfully named Reduce function, which I think will actually help you in this situation.
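For example, a minimal sketch (the Paddocks/Boundary names here are made up; the tolerance is something you would tune against your data, and for the geography type I believe it is expressed in metres):
-- Hypothetical table/column names; Reduce() simplifies the shape by dropping vertices
-- within the tolerance, rather than trimming decimal places.
SELECT PaddockId,
       Boundary.Reduce(1.0).STAsText() AS BoundaryWkt
FROM dbo.Paddocks;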
If you would prefer to go down the route of reducing the number of decimal places you will either have to write a custom CLR function or enter the wonderful world of SQL Server string manipulation!
SQL Query
declare @DecimalPlaces int = 4; -- Specify the desired number of lat/long decimals
with g as(
select p.g -- Original polygon, for comparison purposes
,geography::STGeomFromText('POLYGON((' -- stripped apart and then recreated polygon from text, using a custom string split function. You won't be able to use the built in STRING_SPLIT here as it doesn't guarantee sort order.
+ stuff((select ', ' + left(s.item,charindex('.',s.item,0) + @DecimalPlaces) + substring(s.item,charindex(' ',s.item,0),charindex('.',s.item,charindex(' ',s.item,0)) - charindex(' ',s.item,0) + 1 + @DecimalPlaces)
from dbo.fn_StringSplitMax(replace(replace(p.g.STAsText(),'POLYGON ((',''),'))',''),', ',null) as s
for xml path(''), type).value('.', 'NVARCHAR(MAX)') -- STUFF and FOR XML mimics GROUP_CONCAT functionality seen in other SQL languages, to recombine shortened Points back into a Polygon string
,1,2,''
)
+ '))', 4326).MakeValid() as x -- Remember to make the polygon valid again, as you have been messing with the Point data
from(values(geography::STGeomFromText('POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 37.368753,-121.971868 37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))', 4326))) as p(g)
)
-- select various versions of the polygons into the same column for overlay comparison in SSMS
select 'Original' as l
,g
from g
union all
select 'Short' as l
,x
from g
union all
select 'Original Reduced' as l
,g.Reduce(10)
from g
union all
select 'Short Reduced' as l
,x.Reduce(10)
from g;
Output
What is interesting to note here is the difference in length of the geog binary representation (simple count of characters as displayed). As I mentioned above, simply using the Reduce function may do what you need, so you will want to test various approaches to see how best you cut down on your data transfer.
+------------------+--------------------+------+
| l | g | Len |
+------------------+--------------------+------+
| Original | 0xE6100000010484...| 4290 |
| Short | 0xE6100000010471...| 3840 |
| Original Reduced | 0xE6100000010418...| 834 |
| Short Reduced | 0xE610000001041E...| 1184 |
+------------------+--------------------+------+
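For reference, one way to produce a character count like the Len column above (an assumption on my part about how it was measured; DATALENGTH would give you bytes instead), shown as a standalone sketch using a small triangle from the same data:
declare @g geography = geography::STGeomFromText(
    'POLYGON((-121.973669 37.365336, -121.971068 37.368248, -121.970987 37.368866, -121.973669 37.365336))', 4326);
select datalength(@g) as Bytes
      ,len(convert(varchar(max), convert(varbinary(max), @g), 1)) as DisplayedChars; -- includes the leading 0x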
Visual Comparison
String Split Function
As polygon data can be bloody huge, you will need a string splitter that can handle more than 4k or 8k characters. In my case I tend to opt for an XML-based approach:
create function [dbo].[fn_StringSplitMax]
(
@str nvarchar(max) = ' ' -- String to split.
,@delimiter as nvarchar(max) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return.
)
returns table
as
return
with s as
( -- Convert the string to an XML value, replacing the delimiter with XML tags
select convert(xml,'<x>' + replace((select @str for xml path('')),@delimiter,'</x><x>') + '</x>').query('.') as s
)
select rn
,item -- Select the values from the generated XML value by CROSS APPLYing to the XML nodes
from(select row_number() over (order by (select null)) as rn
,n.x.value('.','nvarchar(max)') as item
from s
cross apply s.nodes('x') as n(x)
) a
where rn = @num
or @num is null;

Coldfusion - How to parse and segment out data from an email file

I am trying to parse email files that will arrive periodically for the data contained within. We plan to set up cfmail in CF Admin to pull the email from the mailbox, running every minute.
The data within the email consists of name, code name, address, description, etc., and will have consistent labels, so we are thinking of performing a loop or find function for each field of data. Would that be a good start?
Here is an example of email data:
INCIDENT # 12345
LONG TERM SYS# C12345
REPORTED: 08:39:34 05/20/19 Nature: FD NEED Address: 12345 N TEST LN
City: Testville
Responding Units: T12
Cross Streets: Intersection of: N Test LN & W TEST LN
Lat= 39.587453 Lon= -86.485021
Comments: This is a test post. Please disregard
Here's a picture of what the data actually looks like:
So we would like to extract the following:
INCIDENT
LONG TERM SYS#
REPORTED
Nature
Address
City
Responding Units
Cross Streets
Comments
Any feedback or suggestions would be greatly appreciated!
Someone posted this but it was apparently deleted. Whoever it was I want to thank you VERY MUCH as it worked perfectly!!!!
Here is the function:
CREATE FUNCTION [tvf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table
As
Return (
with cte1(N) as (Select 1 From (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) as (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A),
cte3(N) as (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
cte4(N,L) as (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By N)
      ,RetPos = N
      ,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1)
From ( Select *,RetVal = Substring(@String, N, L) From cte4 ) A
Where charindex(@Delimiter2,RetVal)>1
)
And here is the CF code that worked:
<cfquery name="body" datasource="#Application.dsn#">
Declare @S varchar(max) ='
INCIDENT 12345
LONG TERM SYS C12345
REPORTED: 08:39:34 05/20/19 Nature: FD NEED Address: 12345 N TEST
LN City: Testville
Responding Units: T12
Cross Streets: Intersection of: N Test LN & W TEST LN
Lat= 39.587453 Lon= -86.485021
Comments: This is a test post. Please disregard
'
Select Incident = ltrim(rtrim(B.RetVal))
,LongTerm = ltrim(rtrim(C.RetVal))
,Reported = ltrim(rtrim(D.RetVal))
,Nature = ltrim(rtrim(E.RetVal))
,Address = ltrim(rtrim(F.RetVal))
,City = ltrim(rtrim(G.RetVal))
,RespUnit = ltrim(rtrim(H.RetVal))
,CrossStr = ltrim(rtrim(I.RetVal))
,Comments = ltrim(rtrim(J.RetVal))
From (values (replace(replace(@S,char(10),''),char(13),' ')) )A(S)
Outer Apply [dbo].[tvf-Str-Extract](S,'INCIDENT','LONG TERM') B
Outer Apply [dbo].[tvf-Str-Extract](S,'LONG TERM SYS','REPORTED') C
Outer Apply [dbo].[tvf-Str-Extract](S,'REPORTED:','Nature') D
Outer Apply [dbo].[tvf-Str-Extract](S,'Nature:','Address') E
Outer Apply [dbo].[tvf-Str-Extract](S,'Address:','City') F
Outer Apply [dbo].[tvf-Str-Extract](S,'City:','Responding ') G
Outer Apply [dbo].[tvf-Str-Extract](S,'Responding Units:','Cross') H
Outer Apply [dbo].[tvf-Str-Extract](S,'Cross Streets:','Lat') I
Outer Apply [dbo].[tvf-Str-Extract](S+'|||','Comments:','|||') J
</cfquery>
<cfoutput>
B. #body.Incident#<br>
C. #body.LongTerm#<br>
D. #body.Reported#<br>
SQL tends to have limited string functions, so it isn't the best tool for parsing. If the email content is always in that exact format, you could use either plain string functions or regular expressions to parse it. However, the latter is more flexible.
I suspect the content actually does contain new lines, which would make for simpler parsing. However, if you prefer searching for content in between two labels, regular expressions would do the trick.
Build an array of the label names (only). Loop through the array, grabbing a pair of labels: "current" and "next". Use the two values in a regular expression to extract the text in between them:
label &"\s*[##:=](.*?)"& nextLabel
/* Explanation: */
label - First label name (example: "Incident")
\s* - Zero or more spaces
[##:=] - Any of these characters: pound sign, colon or equal sign
(.*?) - Group of zero or more characters (non-greedy)
nextLabel - Next label (example: "Long Term Sys")
Use reFindNoCase() to get details about the position and length of matched text. Then use those values in conjunction with mid() to extract the text.
Note, newer versions like ColdFusion 2016+ automagically extract the text under a key named MATCH
The newer CF2016+ syntax is slicker, but something along these lines works under CF10:
emailBody = "INCIDENT # 12345 ... etc.... ";
labelArray = ["Incident", "Long Term Sys", "Reported", ..., "Comments" ];
for (pos = 1; pos <= arrayLen(labelArray); pos++) {
// get current and next label
hasNext = pos < arrayLen(labelArray);
currLabel = labelArray[ pos ];
nextLabel = (hasNext ? labelArray[ pos+1 ] : "$");
// extract label and value
matches = reFindNoCase( currLabel &"\s*[##:=](.*?)"& nextLabel, emailBody, 1, true);
if (arrayLen(matches.len) >= 2) {
results[ currLabel ] = mid( emailBody, matches.pos[2], matches.len[2]);
}
}
writeDump( results );
Results:

Pull all of the child XML nodes in SQL Server

I have a SQL Server 2014 database that stores 2 million XML files. The XML file looks like:
<?xml version='1.0' encoding='UTF-16'?>
<PROJECTS xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<row>
<APPLICATION_ID>7975883</APPLICATION_ID>
<ACTIVITY>N01</ACTIVITY>
<ADMINISTERING_IC>HL</ADMINISTERING_IC>
<APPLICATION_TYPE xsi:nil="true"/>
<ARRA_FUNDED>N</ARRA_FUNDED>
<PIS>
<PI>
<PI_NAME>MICHEL, MARY Q</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
<PI>
<PI_NAME>SMITH, ROBERT B</PI_NAME>
<PI_ID>3704354</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN A</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
</PIS>
<ORG_DUNS>600044978</ORG_DUNS>
<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
<ORG_DISTRICT>08</ORG_DISTRICT>
<ORG_ZIPCODE>208523003</ORG_ZIPCODE>
</row>
</PROJECTS>
My problem is that I want to pull all of the PI values based upon the ORG_DUNS numbers in a stored procedure. So the code that I have is:
SELECT
APPLICATION_ID,
nref.value('.','varchar(max)') TERM
INTO
ADMIN_MUSC_RePORTER_TERMS
FROM
[ADMIN_Grant_Exporter_Files_XML]
CROSS APPLY
XMLData.nodes('//PIS/PI') AS R(nref)
WHERE
RECID = 1
And that works fine when I use a WHERE clause based upon another field in the database, but if I need to reference a node in the XML file, that's where I'm having a problem. I need to pull all the XML files that have ORG_DUNS equal to 600044978, and I know that nref.value('ORG_DUNS[1]', 'varchar(max)') does not exist because of the cross apply.
SELECT
APPLICATION_ID,
nref.value('.','varchar(max)') TERM
INTO
ADMIN_MUSC_RePORTER_TERMS
FROM
[ADMIN_Grant_Exporter_Files_XML]
CROSS APPLY
XMLData.nodes('//PIS/PI') as R(nref)
WHERE
nref.value('ORG_DUNS[1]', 'varchar(max)') = '600044978'
So how can I get all of the PI nodes using the ORG_DUNS as my WHERE? Thanks
Change your Cross Apply statement to include the filter logic in the XPath:
CROSS APPLY XMLData.nodes('//PIS[../ORG_DUNS/text() = ''600044978'']/PI') AS R(nref)
To explain, //PIS[../ORG_DUNS/text() = ''600044978'']/PI says:
//PIS - find all elements called PIS
[...] - filter the returned elements for those matching this condition
../ORG_DUNS/text() = ''600044978'' - Condition: the PIS's parent element's ORG_DUNS's text value equals 600044978.
Then return the child PI elements of any matching PIS elements.
Update per comments
Full SQL, including PI and PI_ID as separate values:
SELECT
APPLICATION_ID
, nref.value('(./PI_NAME/text())[1]','varchar(max)') PI
, nref.value('(./PI_ID/text())[1]','varchar(max)') PI_ID
INTO
ADMIN_MUSC_RePORTER_TERMS
FROM
[ADMIN_Grant_Exporter_Files_XML]
CROSS APPLY
XMLData.nodes('//PIS[../ORG_DUNS/text() = ''600044978'']/PI') AS R(nref)
WHERE
RECID = 1
Notes:
./PI_NAME - from the currently selected element (i.e. the one referred to by the nref column, which is pointing at a PI element), take its child element, PI_NAME.
/text() - from the PI_NAME element, take its child text element (strictly this is not required, since when pulling back the value & converting to an varchar we'd get the same result; but I like to be explicit).
(...)[1] - return a singleton. i.e. we only want 1 value back, even if there were multiple PI_NAME elements under the current PI. By putting brackets around our expression we're saying "for all values returned by this expression"; and [1] says take the first result (since XPATH uses one-based indexes rather than zero-based as most other languages would).
Filtering with a variable
Anticipating your next issue, i.e. "How do you change the number at runtime without building dynamic SQL?", the answer is to use the sql:variable function:
declare @DunningNumber int = 600044978 --9 digit code http://www.dnb.com/duns-number.html; so can easily hold in an int: https://learn.microsoft.com/en-us/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql
SELECT
APPLICATION_ID
, nref.value('(./PI_NAME/text())[1]','varchar(max)') PI
, nref.value('(./PI_ID/text())[1]','varchar(max)') PI_ID
INTO
ADMIN_MUSC_RePORTER_TERMS
FROM
[ADMIN_Grant_Exporter_Files_XML]
CROSS APPLY
XMLData.nodes('//PIS[../ORG_DUNS/text() = sql:variable("@DunningNumber")]/PI') AS R(nref)
WHERE
RECID = 1
The trick here is to first grab multiple levels in the document using multiple CROSS APPLY clauses. So first, starting from the root, grab all the '/PROJECTS/row' nodes, and then use a relative path from each of those, 'PIS/PI'.
Like this:
declare @t table(id int identity, APPLICATION_ID int default (2), XmlData xml)
insert into @t(XmlData) values (
'<PROJECTS xmlns:xsi=''http://www.w3.org/2001/XMLSchema-instance''>
<row>
<APPLICATION_ID>7975883</APPLICATION_ID>
<ACTIVITY>N01</ACTIVITY>
<ADMINISTERING_IC>HL</ADMINISTERING_IC>
<APPLICATION_TYPE xsi:nil="true"/>
<ARRA_FUNDED>N</ARRA_FUNDED>
<PIS>
<PI>
<PI_NAME>MICHEL, MARY Q</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
<PI>
<PI_NAME>SMITH, ROBERT B</PI_NAME>
<PI_ID>3704354</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN A</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
</PIS>
<ORG_DUNS>600044978</ORG_DUNS>
<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
<ORG_DISTRICT>08</ORG_DISTRICT>
<ORG_ZIPCODE>208523003</ORG_ZIPCODE>
</row>
</PROJECTS> '),(
'<PROJECTS xmlns:xsi=''http://www.w3.org/2001/XMLSchema-instance''>
<row>
<APPLICATION_ID>7975883</APPLICATION_ID>
<ACTIVITY>N01</ACTIVITY>
<ADMINISTERING_IC>HL</ADMINISTERING_IC>
<APPLICATION_TYPE xsi:nil="true"/>
<ARRA_FUNDED>N</ARRA_FUNDED>
<PIS>
<PI>
<PI_NAME>MICHEL, MARY Q</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
<PI>
<PI_NAME>SMITH, ROBERT B</PI_NAME>
<PI_ID>3704354</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN A</PI_NAME>
<PI_ID>3704353</PI_ID>
</PI>
</PIS>
<ORG_DUNS>600044979</ORG_DUNS>
<ORG_COUNTRY>UNITED STATES</ORG_COUNTRY>
<ORG_DISTRICT>08</ORG_DISTRICT>
<ORG_ZIPCODE>208523003</ORG_ZIPCODE>
</row>
</PROJECTS> ')
select APPLICATION_ID,
pinode.value('PI_NAME[1]','varchar(max)') PI_NAME,
pinode.value('PI_ID[1]','varchar(max)') PI_ID
FROM @t
cross apply XMLData.nodes('/PROJECTS/row') as r(rownode)
cross apply rownode.nodes('PIS/PI') as p(pinode)
where rownode.value('ORG_DUNS[1]','varchar(max)') = '600044978'
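For reference, with the sample data above this returns only the PIs from the first document (the one whose ORG_DUNS is 600044978), each paired with the table's APPLICATION_ID column (which defaults to 2 here):
APPLICATION_ID  PI_NAME           PI_ID
2               MICHEL, MARY Q    3704353
2               SMITH, ROBERT B   3704354
2               DOE, JOHN A       3704353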

SQL Query Parent Child Full Path from table

I have a table listing the parent child relationship for each element like this:
ParentID ParentTitle ChildId ChildTitle
----------------------------------------------
843 Documents 38737 Jobs
843 Documents 52537 Tools
843 Documents 5763 SecondOps
843 Documents 4651 Materials
38737 Jobs 16619 Job001
38737 Jobs 16620 Job002
38737 Jobs 16621 Job003
38737 Jobs 16622 Job004
38737 Jobs 16623 Job005
52537 Tools 1952 HandTools
52537 Tools 1953 Automated
52537 Tools 1957 Custom
1952 HandTools 12 Cordless10mm
1952 HandTools 13 Cordless8mm
1952 HandTools 14 CableCrimp
1952 HandTools 15 Cutter
1952 HandTools 16 EdgePlane
5763 SecondOps 101 Procedure001
5763 SecondOps 102 Procedure002
5763 SecondOps 103 Procedure003
4651 Materials 33576 Raw
4651 Materials 33577 Mixed
4651 Materials 33578 Hybrid
4651 Materials 33579 Custom
16622 Job004 101 Procedure001
16622 Job004 14 CableCrimp
16622 Job004 15 Cutter
16622 Job004 4651 Mixed
16623 Job005 102 Procedure002
16623 Job005 103 Procedure003
16623 Job005 16619 Job001
16623 Job005 1953 Automated
16623 Job005 33579 Custom
16623 Job005 33576 Raw
I would like to get the full path of each Combination using the IDs, for example
Documents\Jobs\Job003 = 843\38737\16621
Another example would be "Procedure001" which is listed in 2 places
Documents\SecondOps\Procedure001 = 843\5763\101
The same document is also referenced here:
Documents\Jobs\Job004\Procedure001 = 843\38737\16622\101
I'd like to take this table and build a TreeView in .NET. So having the full path for each item would make it a cake walk.
Otherwise, I was thinking that I could start at the Root page and keep recursing through the parents, building a child list, then recursing those, etc.
Is there a better way to query this to build those paths? This list has 400,000 records so if there is a more efficient way it would save time
This was all originally in an AS400 system DB until 2000ish then made into a MediaWiki site. I am pulling the data via the api with the intent of building an interface for a SQL Server database.
I can do basic SQL queries, joins, unions, etc.
Let me know what other info I can provide if this isn't clear
You could use INNER JOIN and LEFT JOIN if you are using MS SQL Server, and here is how the query would look, which will give you the full result (combination) based on your requirement:
SELECT A.ParentTitle + '\'+B.ParentTitle+
CASE WHEN C.ParentTitle IS NOT NULL THEN '\' +C.ParentTitle
ELSE ''
END
+
' =' + A.ParentID + '\'+B.ParentID+
CASE WHEN C.ParentID IS NOT NULL THEN '\' +C.ParentID
ELSE ''
END
FROM TABLE AS A
INNER JOIN TABLE AS B
ON B.ParentID = A.ChildId
LEFT JOIN TABLE AS C
ON C.ParentID = B.ChildId
Not 100% sure whether it will work as I expected or not, please give it a try xD
A tree structure means recursion for a generic solution.
Please don't try this in SQL. Just pull the data rows from SQL into a list (or something like it) and populate the tree with recursion in a programming language.
Your tree class will look like this:
public class MyObj {
    public int Id { get; set; }
    public string Title { get; set; }
    public List<MyObj> Children { get; set; } = null;
}
0. Your table is pretty wrong. The correct way would be:
CREATE TABLE Jobs(
Id int not null primary key,
Title nvarchar(255) not null,
StartTime datetime,--optional maybe will help
ParentId int null --can be null root will have no parent
)
But I will try to explain how it's done with your table.
I will suppose that you have some kind of data context (DBML, EDMX, etc.).
1. Find the root or roots. In your case the roots are those IDs that appear as a ParentID but never as a ChildId.
Query that will list your roots:
SELECT DISTINCT a.ParentId FROM
YourTable a LEFT JOIN
YourTable b ON a.ParentId=b.ChildId
WHERE b.ParentId is null
2. Make a recursive procedure that will retrieve your data in the class structure above (MyObj).
public MyObj GetTree(int id, YourDataContext db) {
    if (db.YourTable.Any(r => r.ParentId == id)) {
        // all rows whose parent is the current node
        var q = db.YourTable.Where(r => r.ParentId == id).ToList();
        var result = new MyObj {
            Id = id,
            Title = q.First().ParentTitle,
            Children = new List<MyObj>()
        };
        foreach (var el in q) {
            if (db.YourTable.Any(r => r.ParentId == el.ChildId))
                result.Children.Add(GetTree(el.ChildId, db));   // the child is itself a parent
            else
                result.Children.Add(new MyObj {
                    Id = el.ChildId,
                    Title = el.ChildTitle,
                    Children = null
                });
        }
        return result;
    }
    return null;
}
3. Build the trees from the root IDs found in point 1. With those IDs stored in a list, say ListIds, you would do something like this:
List<MyObj> finalTrees = new List<MyObj>();
ListIds.ForEach(id => finalTrees.Add(GetTree(id, dbcontext)));
Now you have a tree structure in finalTrees.
PS:
I wrote the code directly in the browser (C#); there may be some typos.
So to elaborate on what I am trying to do, I'm working with a wiki version that doesn't use namespaces to establish document paths.
For example if a page is 3 levels deep on a document tree like this
RootPage
  Page01
    Page02
      Page03
      Page04
Using the namespace approach, Page03's Name (Path) is "RootPage:Page01:Page02:Page03"
I would Like to do the same thing with the PageIDs
So given this example you would have
PageTitle PageId Path
RootPage 001 001
Page01 101 001:101
Page02 201 001:101:201
Page03 301 001:101:201:301
Page04 302 001:101:201:302
So now All I have to do is Put the PagePath together.
There are several challenges to consider with this wiki:
- No 2 documents can have the same TITLE.
- Document IDs are basically irrelevant, but handy in this case (at least in the version I am working on).
- Thankfully there is a list of pages and their "Links" or child pages. I believe you would call it a MANY to MANY.
The key point to remember is that even if a page is listed as a child of many other pages, only one really exists, and I only need one of them in the results.
So, using LONG's example, here is where I've gotten to.
Using this table:
CREATE Table [dbo].[ExampleTable](
[RecordID] Int IDENTITY (1, 1) Not NULL,
[ParentID] Int Not NULL,
[ParentTitle] VARCHAR(800) NULL,
[ChildID] Int Not NULL,
[ChildTitle] VARCHAR(800) NULL,
PRIMARY KEY CLUSTERED ([RecordID] ASC));
This Data:
INSERT INTO [dbo].[ExampleTable]
([ParentID]
,[ParentTitle]
,[ChildID]
,[ChildTitle])
VALUES
(843,'Documents',38737,'Jobs'),
(843,'Documents',52537,'Tools'),
(843,'Documents',5763,'SecondOps'),
(843,'Documents',4651,'Materials'),
(38737,'Jobs',16619,'Job001'),
(38737,'Jobs',16620,'Job002'),
(38737,'Jobs',16621,'Job003'),
(38737,'Jobs',16622,'Job004'),
(38737,'Jobs',16623,'Job005'),
(52537,'Tools',1952,'HandTools'),
(52537,'Tools',1953,'Automated'),
(52537,'Tools',1957,'Custom'),
(1952,'HandTools',12,'Cordless10mm'),
(1952,'HandTools',13,'Cordless8mm'),
(1952,'HandTools',14,'CableCrimp'),
(1952,'HandTools',15,'Cutter'),
(1952,'HandTools',16,'EdgePlane'),
(5763,'SecondOps',101,'Procedure001'),
(5763,'SecondOps',102,'Procedure002'),
(5763,'SecondOps',103,'Procedure003'),
(4651,'Materials',33576,'Raw'),
(4651,'Materials',33577,'Mixed'),
(4651,'Materials',33578,'Hybrid'),
(4651,'Materials',33579,'Custom'),
(16622,'Job004',101,'Procedure001'),
(16622,'Job004',14,'CableCrimp'),
(16622,'Job004',15,'Cutter'),
(16622,'Job004',4651,'Mixed'),
(16623,'Job005',102,'Procedure002'),
(16623,'Job005',103,'Procedure003'),
(16623,'Job005',16619,'Job001'),
(16623,'Job005',1953,'Automated'),
(16623,'Job005',33579,'Custom'),
(16623,'Job005',33576,'Raw')
GO
And this query, which I modified from LONG's example:
SELECT DISTINCT C.ChildTitle as PageTitle, convert(varchar(20),A.ParentID) + ':' + convert(varchar(20),B.ParentID) +
CASE WHEN C.ParentID IS NOT NULL THEN ':' + convert(varchar(20),C.ParentID)
ELSE ''
END
+
CASE WHEN C.ChildID IS NOT NULL THEN ':' + convert(varchar(20),C.ChildID)
ELSE ''
END
FROM ExampleTable AS A
INNER JOIN ExampleTable AS B
ON B.ParentID = A.ChildId
LEFT JOIN ExampleTable AS C
ON C.ParentID = B.ChildId
ORDER By PageTitle
I get These Results:
PageTitle UnNamed
NULL 16622:4651
NULL 38737:16622
NULL 38737:16623
NULL 52537:1952
NULL 843:38737
NULL 843:4651
NULL 843:52537
NULL 843:5763
Automated 843:38737:16623:1953
CableCrimp 843:38737:16622:14
CableCrimp 843:52537:1952:14
Cordless10mm 843:52537:1952:12
Cordless8mm 843:52537:1952:13
Custom 38737:16622:4651:33579
Custom 843:38737:16623:33579
Cutter 843:38737:16622:15
Cutter 843:52537:1952:15
EdgePlane 843:52537:1952:16
Hybrid 38737:16622:4651:33578
Job001 843:38737:16623:16619
Mixed 38737:16622:4651:33577
Mixed 843:38737:16622:4651
Procedure001 843:38737:16622:101
Procedure002 843:38737:16623:102
Procedure003 843:38737:16623:103
Raw 38737:16622:4651:33576
Raw 843:38737:16623:33576
What I'd like to get is a SINGLE occurrence of each page, regardless of which parent it happens to be found under.
Then I can use these Paths to turn the Virtual Tree Structure into an actual Tree Structure.
The Last Issue is that the actual Link List is VERY similar to the example I created, except that it has 400,000 records.
When I run this query against the actual "Link List" it runs for about 17 minutes and runs out of memory.
I've been researching the MAXRECURSION option, but I am still working on it and don't know if that is the problem or not.
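For what it's worth, since MAXRECURSION came up: a recursive CTE against the ExampleTable above is the usual single-query way to build these paths. This is only a sketch, untested at 400,000 rows; it assumes the link data contains no cycles, and the MIN(Path) grouping is just one way to keep a single occurrence per page:
WITH Roots AS (
    -- pages that appear as a parent but never as a child
    SELECT DISTINCT ParentID, ParentTitle
    FROM dbo.ExampleTable e
    WHERE NOT EXISTS (SELECT 1 FROM dbo.ExampleTable x WHERE x.ChildID = e.ParentID)
),
Paths AS (
    -- anchor: every root starts its own path
    SELECT r.ParentID AS PageId,
           r.ParentTitle AS PageTitle,
           CONVERT(varchar(max), r.ParentID) AS Path
    FROM Roots r
    UNION ALL
    -- recursive step: extend each path with the child's ID
    SELECT e.ChildID,
           e.ChildTitle,
           p.Path + ':' + CONVERT(varchar(20), e.ChildID)
    FROM Paths p
    JOIN dbo.ExampleTable e ON e.ParentID = p.PageId
)
SELECT PageTitle, PageId, MIN(Path) AS Path -- keep a single occurrence per page
FROM Paths
GROUP BY PageTitle, PageId
ORDER BY PageTitle
OPTION (MAXRECURSION 0);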