SSIS: Getting parent row field in sub-row output in XML Source - sql-server

I have an SSIS Package which reads XML file using XML Source Component.
This XML File has two outputs. One is for "Invoice" and other is for "InvoiceDetail"
The structure of the XML File is like this.
<my:myFields>
<my:group1>
<my:Invoice>
<my:field1>1</my:field1>
<my:field2>2014-11-11</my:field2>
<my:field3>33370</my:field3>
<my:Group2>
<my:InvoiceDetail>
<my:Sub6 xsi:nil="true">100</my:Sub6>
<my:Sub7 xsi:nil="true">Charges</my:Sub7>
<my:Sub8>140</my:Sub8>
<my:Sub9 xsi:nil="true">78</my:Sub9>
<my:Sub10 xsi:nil="true">0</my:Sub10>
<my:Sub12>0</my:Sub12>
</my:InvoiceDetail>
</my:Group2>
<my:field18></my:field18>
</my:Invoice>
</my:group1>
</my:myFields>
I can get all fields of Invoice and InvoiceDetail in separate outputs.
But I cannot join these rows, since InvoiceDetail doesn't have the ID (field1) that links it to the Invoice.
Is there any way to get the InvoiceID field in the InvoiceDetail output as well?

This is possible with an XSLT transformation.
Create an XSLT stylesheet, then pass the XML file and the output flat file as parameters to a C# script that performs the XML transformation.
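To illustrate what such a transformation has to do, here is a minimal sketch in Python rather than XSLT or C#. It walks each Invoice, reads field1, and stamps it onto every InvoiceDetail row as InvoiceID. The namespace URI urn:example:my, the second detail row, and the function name detail_rows are assumptions made for the example; the original sample does not show its xmlns declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the original sample omits its xmlns declarations.
NS = {"my": "urn:example:my"}

SAMPLE = """<my:myFields xmlns:my="urn:example:my">
  <my:group1>
    <my:Invoice>
      <my:field1>1</my:field1>
      <my:field2>2014-11-11</my:field2>
      <my:Group2>
        <my:InvoiceDetail>
          <my:Sub6>100</my:Sub6>
          <my:Sub8>140</my:Sub8>
        </my:InvoiceDetail>
        <my:InvoiceDetail>
          <my:Sub6>200</my:Sub6>
          <my:Sub8>150</my:Sub8>
        </my:InvoiceDetail>
      </my:Group2>
    </my:Invoice>
  </my:group1>
</my:myFields>"""


def detail_rows(xml_text):
    """Return one dict per InvoiceDetail, each carrying its parent's field1."""
    root = ET.fromstring(xml_text)
    rows = []
    for invoice in root.iterfind(".//my:Invoice", NS):
        invoice_id = invoice.findtext("my:field1", namespaces=NS)
        for detail in invoice.iterfind("my:Group2/my:InvoiceDetail", NS):
            row = {"InvoiceID": invoice_id}
            for child in detail:
                # ElementTree stores tags as "{namespace}name"; keep only the name.
                row[child.tag.split("}", 1)[1]] = child.text
            rows.append(row)
    return rows


rows = detail_rows(SAMPLE)
```

An XSLT stylesheet would do the same thing declaratively: match each InvoiceDetail and emit the ancestor Invoice's field1 alongside the detail's own fields.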

Related

Loading the data into Netezza database using xquery from XML source

I have a source as XML and has a huge number of records. just for the sample I have pasted 1 record below :
<?xml version='1.0' encoding='UTF-8'?><wd:Report_Data xmlns:wd="urn:com.workday.report/BCF-Termination-Details">
<wd:Report_Entry>
<wd:Worker>
<wd:Associate_ID>997215</wd:Associate_ID>
<wd:Total_Base_Pay_Amount>13</wd:Total_Base_Pay_Amount>
<wd:Total_Base_Pay_Currency wd:Descriptor="USD"><wd:ID wd:type="WID">9e996ffdd3e14da0ba7275d5400bafd4</wd:ID><wd:ID wd:type="Currency_ID">USD</wd:ID><wd:ID wd:type="Currency_Numeric_Code">840</wd:ID></wd:Total_Base_Pay_Currency>
<wd:Length_of_Service_-_Position>0 year(s), 4 month(s), 7 day(s)</wd:Length_of_Service_-_Position>
</wd:Worker>
<wd:Time_Type wd:Descriptor="Part time"><wd:ID wd:type="WID">3baf0a7f595210daec53e26fa7476d5b</wd:ID><wd:ID wd:type="Position_Time_Type_ID">Part_time</wd:ID></wd:Time_Type>
<wd:Hire_Date>2022-05-25-07:00</wd:Hire_Date>
<wd:Termination_Date>2022-10-02-07:00</wd:Termination_Date>
<wd:Date_Initiated>2022-10-28T17:39:53.943-07:00</wd:Date_Initiated>
<wd:Termination_Category>Voluntary</wd:Termination_Category>
<wd:Termination_Reason>Job Abandonment</wd:Termination_Reason>
<wd:Length_of_Service_in_Days>130</wd:Length_of_Service_in_Days>
<wd:workdayID>f415ada264f1100211408522a0e00000</wd:workdayID>
</wd:Report_Entry></wd:Report_Data>
I need to implement this in ETL: using the XML as the source, I need a few of the columns from the XML loaded into the database. I am new to XQuery, so I need to know how to get started. I am doing a POC on this.
If you want to extract the values from xml source you can try using the XML functions from SQLEXT Toolkit package which can be installed on top of Netezza.
Here is an example of fetching the associate_id from the xml source. You can enter the extracted value in a table.
select xmlextractvalue(xmlparse(ele_string),'/Report_Entry/Worker/Associate_ID') as associate_id from (select replace(element,'wd:','') as ele_string from t1) as foo;
ASSOCIATE_ID
--------------
997215
(1 row)
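If the extraction can happen before the load instead of inside Netezza, the same values can be pulled out with a namespace-aware parser, which avoids stripping the wd: prefix by string replacement. The following is only a sketch under that assumption, using Python's standard library; the function name extract_entries and the choice of columns are made up for the example, and the sample is a trimmed copy of the record above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:wd declaration in the sample record.
NS = {"wd": "urn:com.workday.report/BCF-Termination-Details"}

SAMPLE = """<wd:Report_Data xmlns:wd="urn:com.workday.report/BCF-Termination-Details">
<wd:Report_Entry>
<wd:Worker>
<wd:Associate_ID>997215</wd:Associate_ID>
<wd:Total_Base_Pay_Amount>13</wd:Total_Base_Pay_Amount>
</wd:Worker>
<wd:Termination_Category>Voluntary</wd:Termination_Category>
<wd:Termination_Reason>Job Abandonment</wd:Termination_Reason>
<wd:Length_of_Service_in_Days>130</wd:Length_of_Service_in_Days>
</wd:Report_Entry></wd:Report_Data>"""


def extract_entries(xml_text):
    """Return one dict per Report_Entry with a few selected columns."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iterfind("wd:Report_Entry", NS):
        entries.append({
            "associate_id": entry.findtext("wd:Worker/wd:Associate_ID", namespaces=NS),
            "termination_category": entry.findtext("wd:Termination_Category", namespaces=NS),
            "termination_reason": entry.findtext("wd:Termination_Reason", namespaces=NS),
            "length_of_service_in_days": entry.findtext("wd:Length_of_Service_in_Days", namespaces=NS),
        })
    return entries


records = extract_entries(SAMPLE)
```

The resulting rows could then be written to a delimited file and loaded with whatever bulk-load path the ETL already uses.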

Extracting XML in a column from a SQL Server database

I have read dozens of posts and have tried numerous SQL queries to try and get this figured out. Sadly, I'm not a SQL expert (not even a novice) nor am I an XML expert. I understand basic queries from SQL, and understand XML tags, mostly.
I'm trying to query a database table, and have the data show a list of values from a column that contains XML. I'll give you an example of the data. I won't burden you with everything I have tried.
Here is an example of field inside of the column I need. So this is just one row, I would need to query the whole table to get all of the data I need.
When I select * from [table name] it returns hundreds of rows and when I double click in the column name of 'Document' on one row, I get the information I need.
It looks like this:
<code_set xmlns="">
<name>ExampleCodeTable</name>
<last_updated>2010-08-30T17:49:58.7919453Z</last_updated>
<code id="1" last_updated="2010-01-20T17:46:35.1658253-07:00"
start_date="1998-12-31T17:00:00-07:00"
end_date="9999-12-31T16:59:59.9999999-07:00">
<entry locale="en-US" name="T" description="Test1" />
</code>
<code id="2" last_updated="2010-01-20T17:46:35.1658253-07:00"
start_date="1998-12-31T17:00:00-07:00"
end_date="9999-12-31T16:59:59.9999999-07:00">
<entry locale="en-US" name="Z" description="Test2" />
</code>
<displayExpression>[Code] + ' - ' + [Description]</displayExpression>
<sortColumn>[Description]</sortColumn>
</code_set>
Ideally I would write it so it runs the query on the table and produces results like this:
Code Description
--------------------
(Data) (Data)
Any ideas? Is it even possible? The dozens of things I have tried that are always posted in stack, either return Nulls or fail.
Thanks for your help
Try something like this:
SELECT
CodeSetId = xc.value('@id', 'int'),
Description = xc.value('(entry/@description)[1]', 'varchar(50)')
FROM
dbo.YourTableNameHere
CROSS APPLY
YourXmlColumn.nodes('/code_set/code') AS XT(XC)
This basically uses the built-in XQuery to get an "in-memory" table (XT) with a single column (XC), each containing an XML fragment that represents each <code> node inside your <code_set> root node.
Once you have each of these XML fragments, you can use the .value() XQuery operator to "reach in" and grab some pieces of information from it, e.g. its @id (the attribute named id), or the @description attribute on the contained <entry> subelement.
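For readers who find the nodes()/value() pattern easier to follow outside SQL, here is a rough Python equivalent of the same shredding logic: iterate over every <code> element, read its id attribute, then read the description attribute of its first <entry> child. This is only an illustration of the idea, not something that runs inside SQL Server; the function name shred and the trimmed sample document are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question.
DOC = """<code_set>
  <name>ExampleCodeTable</name>
  <code id="1" start_date="1998-12-31T17:00:00-07:00">
    <entry locale="en-US" name="T" description="Test1" />
  </code>
  <code id="2" start_date="1998-12-31T17:00:00-07:00">
    <entry locale="en-US" name="Z" description="Test2" />
  </code>
</code_set>"""


def shred(doc):
    """Return (id, description) pairs, one per <code> element."""
    root = ET.fromstring(doc)
    result = []
    for code in root.iterfind("code"):       # like .nodes('/code_set/code')
        entry = code.find("entry")           # first <entry> child, or None
        description = entry.get("description") if entry is not None else None
        result.append((int(code.get("id")), description))
    return result


pairs = shred(DOC)
```

A missing <entry> yields None here, which parallels the NULL that .value() returns when its path matches nothing.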
The following query will read the xml field in every row, then shred certain values into a tabular result set.
SELECT
-- get attribute [attribute name] from the parent node
parent.value('./@attribute name','varchar(max)') as ParentAttributeValue,
-- get the text value of the first child node (value() requires a singleton, hence the [1])
child.value('(./text())[1]', 'varchar(max)') as ChildNodeValueFromFirstChild,
-- get attribute [attribute name] from the first child node
child.value('./@attribute name', 'varchar(max)') as ChildAttributeValueFromFirstChild
FROM
[table name]
CROSS APPLY
-- create a handle named parent that references that <parent node> in each row
[xml field name].nodes('//xpath to parent name') AS ParentName(parent)
CROSS APPLY
-- create a handle named child that references first <child node> in each row
parent.nodes('(xpath from parent/to child)[1]') AS FirstChildNode(child)
GO
Please provide the exact values you want to shred from the XML for a more precise answer.

Creating XML Schema for Bulk Load to SQL Server - Child Element Describes Parent

I have an XML document that I'm working to build a schema for in order to bulk load these documents into a SQL Server table. The XML I'm focusing on looks like this:
<Coverage>
<CoverageCd>BI</CoverageCd>
<CoverageDesc>BI</CoverageDesc>
<Limit>
<FormatCurrencyAmt>
<Amt>30000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>PerPerson</LimitAppliesToCd>
</Limit>
<Limit>
<FormatCurrencyAmt>
<Amt>85000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>PerAcc</LimitAppliesToCd>
</Limit>
</Coverage>
<Coverage>
<CoverageCd>PD</CoverageCd>
<CoverageDesc>PD</CoverageDesc>
<Limit>
<FormatCurrencyAmt>
<Amt>50000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>Coverage</LimitAppliesToCd>
</Limit>
</Coverage>
Inside the Limit element, there's a child LimitAppliesToCd that I need to use to determine where the Amt element's value actually gets stored inside my table. Is this possible to do using the standard XML Bulk Load feature of SQL Server? Normally in XML I'd expect that the element would have an attribute containing the "PerPerson" or "PerAcc" information, but this standard we're using does not call for that.
If anyone has worked with the ACORD standard before, you might know what I'm working with here. Any help is greatly appreciated.
Don't know exactly what you are talking about, but this is a solution to get the information out of your XML.
Assumption: your XML is already bulk-loaded into a declared variable @xml of type XML.
A CTE will pull the information out of your XML. The final query will then use PIVOT to put your data into the right column.
With a fitting table's structure the actual insert should be simple...
WITH DerivedTable AS
(
SELECT cov.value('CoverageCd[1]','varchar(max)') AS CoverageCd
,cov.value('CoverageDesc[1]','varchar(max)') AS CoverageDesc
,lim.value('(FormatCurrencyAmt/Amt)[1]','decimal(14,4)') AS Amt
,lim.value('LimitAppliesToCd[1]','varchar(max)') AS LimitAppliesToCd
FROM @xml.nodes('/root/Coverage') AS A(cov)
CROSS APPLY cov.nodes('Limit') AS B(lim)
)
SELECT p.*
FROM
(SELECT * FROM DerivedTable) AS tbl
PIVOT
(
MIN(Amt) FOR LimitAppliesToCD IN(PerPerson,PerAcc,Coverage)
) AS p
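To make the pivot step concrete, here is a small Python sketch of the same logic: one output row per Coverage, with each Limit's Amt placed in the column named by its LimitAppliesToCd. It assumes, like the query, that the Coverage elements sit under a wrapper element called root; the function name pivot_limits is made up for the example. When a coverage repeats a limit type, the smallest amount is kept, mirroring MIN(Amt).

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Sample from the question, wrapped in an assumed <root> element.
DOC = """<root>
<Coverage>
  <CoverageCd>BI</CoverageCd>
  <CoverageDesc>BI</CoverageDesc>
  <Limit>
    <FormatCurrencyAmt><Amt>30000.00</Amt></FormatCurrencyAmt>
    <LimitAppliesToCd>PerPerson</LimitAppliesToCd>
  </Limit>
  <Limit>
    <FormatCurrencyAmt><Amt>85000.00</Amt></FormatCurrencyAmt>
    <LimitAppliesToCd>PerAcc</LimitAppliesToCd>
  </Limit>
</Coverage>
<Coverage>
  <CoverageCd>PD</CoverageCd>
  <CoverageDesc>PD</CoverageDesc>
  <Limit>
    <FormatCurrencyAmt><Amt>50000.00</Amt></FormatCurrencyAmt>
    <LimitAppliesToCd>Coverage</LimitAppliesToCd>
  </Limit>
</Coverage>
</root>"""

PIVOT_COLUMNS = ("PerPerson", "PerAcc", "Coverage")


def pivot_limits(doc):
    """Return one dict per Coverage with a column for each limit type."""
    rows = []
    for cov in ET.fromstring(doc).iterfind("Coverage"):
        row = {
            "CoverageCd": cov.findtext("CoverageCd"),
            "CoverageDesc": cov.findtext("CoverageDesc"),
        }
        row.update({name: None for name in PIVOT_COLUMNS})
        for lim in cov.iterfind("Limit"):
            key = lim.findtext("LimitAppliesToCd")
            amt = lim.findtext("FormatCurrencyAmt/Amt")
            if key in PIVOT_COLUMNS and amt is not None:
                value = Decimal(amt)
                # Keep the smallest amount per limit type, like MIN(Amt).
                row[key] = value if row[key] is None else min(row[key], value)
        rows.append(row)
    return rows


pivoted = pivot_limits(DOC)
```

Limit types with no matching Limit element stay None, just as the PIVOT leaves those columns NULL.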

Backup table to xml or csv file, content is long text

I have a table that contains content (like blog posts, so fairly long text) that I want to export to a xml file.
So I want it like:
<table>
<column1>1231</column1>
<column2>January 1, 2001</column2>
<column3>some very long text will all types of characters in it</column3>
</table>
Is there a built in way to do this?
Basically each column will have its own element.
The content should ideally be CDATA since the content can contain any type of character potentially.
I have sql server 2008 express.
From SQL Server 2005, the FOR XML clause provides a way to convert the results of an SQL query to XML.
E.g.
Consider a table building with Bldg, SUit, SQFT, PDate columns.
SELECT * FROM building FOR XML AUTO
will convert the contents of table to the following XML:
<building Bldg="1" SUit="1" SQFT="1000" PDate="2012-09-24T00:00:00" />
<building Bldg="1" SUit="1" SQFT="1500" PDate="2011-12-31T00:00:00" />
If you want the columns to be elements, then
SELECT * FROM building FOR XML AUTO, ELEMENTS
would convert the contents to following XML:
<building>
<Bldg>1</Bldg>
<SUit>1</SUit>
<SQFT>1000</SQFT>
<PDate>2012-09-24T00:00:00</PDate>
</building>
<building>
<Bldg>1</Bldg>
<SUit>1</SUit>
<SQFT>1500</SQFT>
<PDate>2011-12-31T00:00:00</PDate>
</building>
If you want to model your text fields as CDATA sections, then you should use the FOR XML EXPLICIT clause and define your XML schema as per the guidelines here.
If the above Building table has a text_col column of type TEXT that should be modeled as CDATA section in the generated XML, then the SELECT query would be as follows:
SELECT
1 as Tag,
NULL as Parent,
Bldg AS [Building!1!Bldg!ELEMENT],
text_col AS [Building!1!!CDATA]
FROM Building
WHERE text_col IS NOT NULL
FOR XML EXPLICIT
The results would be as follows:
<Building><Bldg>1</Bldg><![CDATA[From SQL Server 2005, the FOR XML clause provides a way to convert the results of an SQL query to XML.
E.g. Consider a table building with Blgd, Suit, SQFT, PDate columns.
SELECT * FROM building FOR XML AUTO
will convert the contents of table to the following XML:
<building Bldg="1" SUit="1" SQFT="1000" PDate="2012-09-24T00:00:00" />
<building Bldg="1" SUit="1" SQFT="1500" PDate="2011-12-31T00:00:00" />
If you want the columns to be elements, then
SELECT * FROM building FOR XML AUTO, ELEMENTS
would convert the contents to following XML:
<building>
<Bldg>1</Bldg>
<SUit>1</SUit>
<SQFT>1000</SQFT>
<PDate>2012-09-24T00:00:00</PDate>
</building>
<building>
<Bldg>1</Bldg>
<SUit>1</SUit>
<SQFT>1500</SQFT>
<PDate>2011-12-31T00:00:00</PDate>
</building>]]></Building>
You can use the FOR XML SQL construct to do this. Please read here
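If the export can be done from a script instead of from T-SQL, the CDATA requirement is also easy to meet there. The sketch below uses Python's standard xml.dom.minidom and assumes the rows have already been fetched as dictionaries. The wrapper element export, the function names, and the choice of column3 as the long-text column are all hypothetical. One detail worth handling: a CDATA section cannot contain the sequence ]]>, so the helper splits the text at that point and continues in a new section.

```python
from xml.dom.minidom import Document


def add_cdata(doc, parent, text):
    """Append text to parent as CDATA, splitting around any "]]>" sequence."""
    parts = text.split("]]>")
    for i, part in enumerate(parts):
        # Re-attach the pieces of "]]>" on either side of the split.
        chunk = (">" if i > 0 else "") + part
        if i < len(parts) - 1:
            chunk += "]]"
        parent.appendChild(doc.createCDATASection(chunk))


def rows_to_xml(rows, cdata_columns=("column3",)):
    """Serialize rows as <export><table>...</table></export>."""
    doc = Document()
    root = doc.createElement("export")  # hypothetical wrapper element
    doc.appendChild(root)
    for row in rows:
        row_el = doc.createElement("table")
        root.appendChild(row_el)
        for name, value in row.items():
            col = doc.createElement(name)
            row_el.appendChild(col)
            if name in cdata_columns:
                add_cdata(doc, col, str(value))
            else:
                col.appendChild(doc.createTextNode(str(value)))
    return root.toxml()


xml_out = rows_to_xml([
    {
        "column1": 1231,
        "column2": "January 1, 2001",
        "column3": "text with <b>markup</b> & symbols",
    }
])
```

Columns not listed in cdata_columns are written as ordinary text nodes, so special characters there are escaped by the serializer instead.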

Generate XML in proper syntax from SQL Server table

How do I write a SQL statement to generate XML like this?
<ROOT>
<Production.Product>
<ProductID>1 </ProductID>
<Name>Adjustable Race</Name>
........
</Production.Product>
</ROOT>
Currently I am getting this with
SELECT * FROM Production.Product
FOR XML auto
Result is:
<ROOT>
<Production.Product ProductID="1" Name="Adjustable Race"
ProductNumber="AR-5381" MakeFlag="0" FinishedGoodsFlag="0"
SafetyStockLevel="1000" ReorderPoint="750" StandardCost="0.0000"
ListPrice="0.0000" DaysToManufacture="0" SellStartDate="1998-06-01T00:00:00"
rowguid="694215B7-08F7-4C0D-ACB1-D734BA44C0C8"
ModifiedDate="2004-03-11T10:01:36.827" />
One simple way would be to use:
SELECT *
FROM Production.Product
FOR XML AUTO, ELEMENTS
Then, your data should be stored in XML elements inside the <Production.Product> node.
If you need even more control, then you should look at the FOR XML PATH syntax - check out this MSDN article on What's new in FOR XML in SQL Server 2005 which explains the FOR XML PATH (among other new features).
Basically, with FOR XML PATH, you can control very easily how things are rendered - as elements or as attributes - something like:
SELECT
ProductID AS '@ProductID', -- rendered as attribute on XML node
Name, ProductNumber, -- all rendered as elements inside XML node
.....
FROM Production.Product
FOR XML PATH('NewProductNode') -- define a new name for the XML node
This would give you something like:
<NewProductNode ProductID="1">
<Name>Adjustable Race</Name>
<ProductNumber>AR-5381</ProductNumber>
.....
</NewProductNode>
