Loading data into a Netezza database using XQuery from an XML source

My source is an XML file with a huge number of records. As a sample, I have pasted one record below:
<?xml version='1.0' encoding='UTF-8'?><wd:Report_Data xmlns:wd="urn:com.workday.report/BCF-Termination-Details">
<wd:Report_Entry>
<wd:Worker>
<wd:Associate_ID>997215</wd:Associate_ID>
<wd:Total_Base_Pay_Amount>13</wd:Total_Base_Pay_Amount>
<wd:Total_Base_Pay_Currency wd:Descriptor="USD"><wd:ID wd:type="WID">9e996ffdd3e14da0ba7275d5400bafd4</wd:ID><wd:ID wd:type="Currency_ID">USD</wd:ID><wd:ID wd:type="Currency_Numeric_Code">840</wd:ID></wd:Total_Base_Pay_Currency>
<wd:Length_of_Service_-_Position>0 year(s), 4 month(s), 7 day(s)</wd:Length_of_Service_-_Position>
</wd:Worker>
<wd:Time_Type wd:Descriptor="Part time"><wd:ID wd:type="WID">3baf0a7f595210daec53e26fa7476d5b</wd:ID><wd:ID wd:type="Position_Time_Type_ID">Part_time</wd:ID></wd:Time_Type>
<wd:Hire_Date>2022-05-25-07:00</wd:Hire_Date>
<wd:Termination_Date>2022-10-02-07:00</wd:Termination_Date>
<wd:Date_Initiated>2022-10-28T17:39:53.943-07:00</wd:Date_Initiated>
<wd:Termination_Category>Voluntary</wd:Termination_Category>
<wd:Termination_Reason>Job Abandonment</wd:Termination_Reason>
<wd:Length_of_Service_in_Days>130</wd:Length_of_Service_in_Days>
<wd:workdayID>f415ada264f1100211408522a0e00000</wd:workdayID>
</wd:Report_Entry></wd:Report_Data>
I need to implement this in ETL: with the XML as the source, I need a few of the columns from the XML loaded into the database. I am new to XQuery, so I need to know how to get started. I am doing a POC on this.

If you want to extract the values from xml source you can try using the XML functions from SQLEXT Toolkit package which can be installed on top of Netezza.
Here is an example of fetching the associate_id from the xml source. You can enter the extracted value in a table.
select xmlextractvalue(xmlparse(ele_string),'/Report_Entry/Worker/Associate_ID') as associate_id from (select replace(element,'wd:','') as ele_string from t1) as foo;
ASSOCIATE_ID
--------------
997215
(1 row)
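If the toolkit is not an option, one alternative is to flatten the XML outside the database and load the result as a delimited file. The following is only a minimal sketch in Python, assuming the ETL host can run a script. The helper name `extract_rows`, the choice of columns, and the shortened sample are illustrative and not part of the original question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:wd declaration in the sample record.
NS = {"wd": "urn:com.workday.report/BCF-Termination-Details"}

# Shortened copy of the sample record (only a few of its elements).
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<wd:Report_Data xmlns:wd="urn:com.workday.report/BCF-Termination-Details">
<wd:Report_Entry>
<wd:Worker>
<wd:Associate_ID>997215</wd:Associate_ID>
<wd:Total_Base_Pay_Amount>13</wd:Total_Base_Pay_Amount>
</wd:Worker>
<wd:Hire_Date>2022-05-25-07:00</wd:Hire_Date>
<wd:Termination_Date>2022-10-02-07:00</wd:Termination_Date>
<wd:Termination_Reason>Job Abandonment</wd:Termination_Reason>
</wd:Report_Entry>
</wd:Report_Data>"""


def extract_rows(xml_bytes):
    """Return one dict per Report_Entry holding a few selected columns."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for entry in root.findall("wd:Report_Entry", NS):
        rows.append({
            # findtext returns None when the element is missing.
            "associate_id": entry.findtext("wd:Worker/wd:Associate_ID", namespaces=NS),
            "hire_date": entry.findtext("wd:Hire_Date", namespaces=NS),
            "termination_date": entry.findtext("wd:Termination_Date", namespaces=NS),
            "termination_reason": entry.findtext("wd:Termination_Reason", namespaces=NS),
        })
    return rows


rows = extract_rows(SAMPLE.encode("utf-8"))

# Write the rows as CSV text; a real job would write to a file for the loader.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

For very large files, `xml.etree.ElementTree.iterparse` could replace `fromstring` so that the whole document is not held in memory at once.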

Related

SQL Server FOR XML PATH carriage return after each root node

I am using FOR XML PATH in SQL Server 2014 to generate an XML file to send to one of our vendors. Their system requires that each root node be separated by a carriage return / line break. Here is the T-SQL code I'm using to generate it:
Declare @xmldata xml
set @xmldata =
(SELECT a.StatementDate AS [stmt_date]
,a.CustomerID AS [student_id]
,'Upon Receipt' AS [due_date]
,a.TotalDue AS [curr_bal]
,a.TotalDue AS [total_due]
,a.AlternateID AS [alternate_id]
,a.FullName AS [student_name]
,a.Email AS [student_email]
,a.Addr1
,a.Addr2
,a.Msg AS [message]
,(
SELECT b.StatementDate AS [activity_date]
,b.ActivityDesc AS [activity_desc]
,b.TermBalance AS [charge]
FROM #ActivityXML AS b
WHERE a.CustomerID = b.CustomerID
ORDER BY a.StatementDate
FOR XML PATH('activity'),TYPE
)
FROM #BillingStatement AS a
FOR XML PATH('Billing'))
select @xmldata as returnXml
This works great, but returns one long string with no separation between nodes at all. (I would post an example but it would just look like a jumbled up mess in here.)
Anyhow, what we need is to generate a file where each <Billing> tag and contents within is placed on a new line after a closing </Billing> tag. I would guess there's a simple solution, such as inserting char(13)+char(10) somewhere in the code, but I've been unable to get that working. Is it possible or will I need to do it in another system?
Based on responses here and research elsewhere, this is not possible using just T-SQL. We would need to either copy / paste the output, or use another program to take the data and insert line breaks.
From @Shnugo - "The pretty print of XML is not supported natively within T-SQL. You might use a CLR method, a service or any kind of post processing with a physically stored file. You might open the XML from grid-results' xml viewer and copy-paste the output to a text editor. Don't forget to set the XML size for grid result to unlimited, if your XML is big."
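As one possible form of that post-processing step, the line breaks could be added by a small script after the XML has been written to a file. This is only a sketch, assuming the output really is one long string and that no nested element is also named Billing. The function name `break_after_billing` and the sample string are made up for illustration.

```python
def break_after_billing(xml_text, tag="Billing"):
    """Append CR+LF after every closing </tag> in the given XML text."""
    closing = "</%s>" % tag
    return xml_text.replace(closing, closing + "\r\n")


# Hypothetical output of the FOR XML PATH('Billing') query, as one long string.
raw = (
    "<Billing><stmt_date>2015-01-01</stmt_date></Billing>"
    "<Billing><stmt_date>2015-02-01</stmt_date></Billing>"
)

separated = break_after_billing(raw)
print(separated)
```

A plain string replacement is used rather than an XML parser so that the element content is passed through untouched; only the separators between the root-level nodes change.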

Creating XML Schema for Bulk Load to SQL Server - Child Element Describes Parent

I have an XML document that I'm working to build a schema for in order to bulk load these documents into a SQL Server table. The XML I'm focusing on looks like this:
<Coverage>
<CoverageCd>BI</CoverageCd>
<CoverageDesc>BI</CoverageDesc>
<Limit>
<FormatCurrencyAmt>
<Amt>30000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>PerPerson</LimitAppliesToCd>
</Limit>
<Limit>
<FormatCurrencyAmt>
<Amt>85000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>PerAcc</LimitAppliesToCd>
</Limit>
</Coverage>
<Coverage>
<CoverageCd>PD</CoverageCd>
<CoverageDesc>PD</CoverageDesc>
<Limit>
<FormatCurrencyAmt>
<Amt>50000.00</Amt>
</FormatCurrencyAmt>
<LimitAppliesToCd>Coverage</LimitAppliesToCd>
</Limit>
</Coverage>
Inside the Limit element, there's a child LimitAppliesToCd that I need to use to determine where the Amt element's value actually gets stored inside my table. Is this possible to do using the standard XML Bulk Load feature of SQL Server? Normally in XML I'd expect that the element would have an attribute containing the "PerPerson" or "PerAcc" information, but this standard we're using does not call for that.
If anyone has worked with the ACORD standard before, you might know what I'm working with here. Any help is greatly appreciated.
Don't know exactly what you are talking about, but this is a solution to get the information out of your XML.
Assumption: Your XML is already bulk-loaded into a declared variable @xml of type XML:
A CTE will pull the information out of your XML. The final query will then use PIVOT to put your data into the right column.
With a fitting table's structure the actual insert should be simple...
WITH DerivedTable AS
(
SELECT cov.value('CoverageCd[1]','varchar(max)') AS CoverageCd
,cov.value('CoverageDesc[1]','varchar(max)') AS CoverageDesc
,lim.value('(FormatCurrencyAmt/Amt)[1]','decimal(14,4)') AS Amt
,lim.value('LimitAppliesToCd[1]','varchar(max)') AS LimitAppliesToCd
FROM @xml.nodes('/root/Coverage') AS A(cov)
CROSS APPLY cov.nodes('Limit') AS B(lim)
)
SELECT p.*
FROM
(SELECT * FROM DerivedTable) AS tbl
PIVOT
(
MIN(Amt) FOR LimitAppliesToCD IN(PerPerson,PerAcc,Coverage)
) AS p
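To make the effect of the pivot easier to see, here is a rough Python sketch of the same idea: each Limit's Amt is placed in the column named by its LimitAppliesToCd. It is an illustration, not the SQL Server implementation. The `root` wrapper mirrors the '/root/Coverage' path used in the query, and `pivot_limits` is a made-up name. Unlike the PIVOT, which groups by CoverageCd and CoverageDesc, this sketch emits one row per Coverage element.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

SAMPLE = """<root>
<Coverage>
<CoverageCd>BI</CoverageCd>
<CoverageDesc>BI</CoverageDesc>
<Limit>
<FormatCurrencyAmt><Amt>30000.00</Amt></FormatCurrencyAmt>
<LimitAppliesToCd>PerPerson</LimitAppliesToCd>
</Limit>
<Limit>
<FormatCurrencyAmt><Amt>85000.00</Amt></FormatCurrencyAmt>
<LimitAppliesToCd>PerAcc</LimitAppliesToCd>
</Limit>
</Coverage>
<Coverage>
<CoverageCd>PD</CoverageCd>
<CoverageDesc>PD</CoverageDesc>
<Limit>
<FormatCurrencyAmt><Amt>50000.00</Amt></FormatCurrencyAmt>
<LimitAppliesToCd>Coverage</LimitAppliesToCd>
</Limit>
</Coverage>
</root>"""

# Same column list as the IN (...) clause of the PIVOT.
LIMIT_COLUMNS = ("PerPerson", "PerAcc", "Coverage")


def pivot_limits(xml_text):
    """Return one row per Coverage with a column per LimitAppliesToCd value."""
    rows = []
    for cov in ET.fromstring(xml_text).findall("Coverage"):
        row = {
            "CoverageCd": cov.findtext("CoverageCd"),
            "CoverageDesc": cov.findtext("CoverageDesc"),
        }
        row.update({name: None for name in LIMIT_COLUMNS})
        for lim in cov.findall("Limit"):
            target = lim.findtext("LimitAppliesToCd")
            amt = lim.findtext("FormatCurrencyAmt/Amt")
            if target in LIMIT_COLUMNS and amt is not None:
                value = Decimal(amt)
                # Keep the smallest amount per column, like MIN(Amt).
                current = row[target]
                row[target] = value if current is None else min(current, value)
        rows.append(row)
    return rows


rows = pivot_limits(SAMPLE)
for row in rows:
    print(row)
```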

SSIS: Getting parent row field in sub-row output in XML Source

I have an SSIS Package which reads XML file using XML Source Component.
This XML File has two outputs. One is for "Invoice" and other is for "InvoiceDetail"
The structure of the XML File is like this.
<my:myFields>
<my:group1>
<my:Invoice>
<my:field1>1</my:field1>
<my:field2>2014-11-11</my:field2>
<my:field3>33370</my:field3>
<my:Group2>
<my:InvoiceDetail>
<my:Sub6 xsi:nil="true">100</my:Sub6>
<my:Sub7 xsi:nil="true">Charges</my:Sub7>
<my:Sub8>140</my:Sub8>
<my:Sub9 xsi:nil="true">78</my:Sub9>
<my:Sub10 xsi:nil="true">0</my:Sub10>
<my:Sub12>0</my:Sub12>
</my:InvoiceDetail>
</my:Group2>
<my:field18></my:field18>
</my:Invoice>
</my:group1>
</my:myFields>
I can get all fields of Invoice and InvoiceDetail in separate outputs.
But I cannot join these rows, since InvoiceDetail doesn't have the ID (field1) that links it to the Invoice.
Is there any way to get the InvoiceID field in the InvoiceDetail output as well?
This is possible with an XSLT transformation.
Create an XSLT stylesheet, then pass the XML and flat file as parameters to a C# script that applies the XML transformation.
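The same goal, copying the parent's field1 into each InvoiceDetail before SSIS reads the file, can also be sketched without XSLT. The Python below uses plain ElementTree pre-processing rather than the XSLT approach suggested above. The namespace URI `urn:example:my` is a placeholder, because the sample in the question omits its namespace declarations, and the element name `InvoiceID` and function name `add_invoice_id` are likewise made up for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace: the real URI for the "my" prefix is not shown in the question.
MY_NS = "urn:example:my"
NS = {"my": MY_NS}

# Shortened version of the sample, with the placeholder namespace declared.
SAMPLE = """<my:myFields xmlns:my="urn:example:my">
<my:group1>
<my:Invoice>
<my:field1>1</my:field1>
<my:field2>2014-11-11</my:field2>
<my:Group2>
<my:InvoiceDetail>
<my:Sub8>140</my:Sub8>
<my:Sub12>0</my:Sub12>
</my:InvoiceDetail>
</my:Group2>
</my:Invoice>
</my:group1>
</my:myFields>"""


def add_invoice_id(xml_text):
    """Copy each Invoice's field1 into its InvoiceDetail children as InvoiceID."""
    root = ET.fromstring(xml_text)
    for invoice in root.iter("{%s}Invoice" % MY_NS):
        invoice_id = invoice.findtext("my:field1", namespaces=NS)
        for detail in invoice.iter("{%s}InvoiceDetail" % MY_NS):
            id_elem = ET.SubElement(detail, "{%s}InvoiceID" % MY_NS)
            id_elem.text = invoice_id
    return root


# Keep the "my" prefix when serializing instead of an auto-generated one.
ET.register_namespace("my", MY_NS)
root = add_invoice_id(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

The schema used by the XML Source would presumably need to include the new element before the InvoiceDetail output exposes it as a column.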

Backup table to xml or csv file, content is long text

I have a table that contains content (like blog posts, so fairly long text) that I want to export to a xml file.
So I want it like:
<table>
<column1>1231</column1>
<column2>January 1, 2001</column2>
<column3>some very long text with all types of characters in it</column3>
</table>
Is there a built in way to do this?
Basically each column will have its own element.
The content should ideally be CDATA since the content can contain any type of character potentially.
I have sql server 2008 express.
From SQL Server 2005, the FOR XML clause provides a way to convert the results of an SQL query to XML.
E.g.
Consider a table building with Bldg, SUit, SQFT, PDate columns.
SELECT * FROM building FOR XML AUTO
will convert the contents of table to the following XML:
<building Bldg="1" SUit="1" SQFT="1000" PDate="2012-09-24T00:00:00" />
<building Bldg="1" SUit="1" SQFT="1500" PDate="2011-12-31T00:00:00" />
If you want the columns to be elements, then
SELECT * FROM building FOR XML AUTO, ELEMENTS
would convert the contents to following XML:
<building>
<Bldg>1</Bldg>
<SUit>1</SUit>
<SQFT>1000</SQFT>
<PDate>2012-09-24T00:00:00</PDate>
</building>
<building>
<Bldg>1</Bldg>
<SUit>1</SUit>
<SQFT>1500</SQFT>
<PDate>2011-12-31T00:00:00</PDate>
</building>
If you want to model your text fields as CDATA sections, then you should use the FOR XML EXPLICIT clause and define your XML schema as per the guidelines here.
If the above Building table has a text_col column of type TEXT that should be modeled as CDATA section in the generated XML, then the SELECT query would be as follows:
SELECT
1 as Tag,
NULL as Parent,
Bldg AS [Building!1!Bldg!ELEMENT],
text_col AS [Building!1!!CDATA]
FROM Building
WHERE text_col IS NOT NULL
FOR XML EXPLICIT
The results would be as follows:
<Building><Bldg>1</Bldg><![CDATA[From SQL Server 2005, the FOR XML clause provides a way to convert the results of an SQL query to XML.
E.g. Consider a table building with Blgd, Suit, SQFT, PDate columns.
SELECT * FROM building FOR XML AUTO
will convert the contents of table to the following XML:
<building Bldg="1" SUit="1" SQFT="1000" PDate="2012-09-24T00:00:00" />
<building Bldg="1" SUit="1" SQFT="1500" PDate="2011-12-31T00:00:00" />
If you want the columns to be elements, then
SELECT * FROM building FOR XML AUTO, ELEMENTS
would convert the contents to following XML:
<building>
<Bldg>1</Bldg>
<SUit>1</SUit>
<SQFT>1000</SQFT>
<PDate>2012-09-24T00:00:00</PDate>
</building>
<building>
<Bldg>1</Bldg>
<SUit>1</SUit>
<SQFT>1500</SQFT>
<PDate>2011-12-31T00:00:00</PDate>
</building>]]></Building>
You can use the FOR XML SQL construct to do this. Please read here
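If the export ends up being done outside the database, the same element-per-column layout with a CDATA section can be built with a few lines of script. The following is a minimal sketch using Python's `xml.dom.minidom`. The wrapper element `rows`, the function name `rows_to_xml`, and the hard-coded sample row are assumptions for illustration; fetching the rows from SQL Server is left out.

```python
from xml.dom.minidom import Document


def rows_to_xml(rows, cdata_columns=("column3",)):
    """Build <rows><table>...</table></rows> with one element per column.

    Columns listed in cdata_columns are written as CDATA sections.
    """
    doc = Document()
    root = doc.createElement("rows")
    doc.appendChild(root)
    for row in rows:
        table = doc.createElement("table")
        root.appendChild(table)
        for name, value in row.items():
            col = doc.createElement(name)
            if name in cdata_columns:
                col.appendChild(doc.createCDATASection(str(value)))
            else:
                col.appendChild(doc.createTextNode(str(value)))
            table.appendChild(col)
    return root.toxml()


# Hypothetical row standing in for a record read from the table.
sample_rows = [
    {
        "column1": 1231,
        "column2": "January 1, 2001",
        "column3": "some <b>very</b> long text & more",
    }
]

xml_out = rows_to_xml(sample_rows)
print(xml_out)
```

Text that itself contains the sequence `]]>` cannot be placed in a single CDATA section and would have to be split or escaped first.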

Generate XML in proper syntax from SQL Server table

How do I write a SQL statement to generate XML like this?
<ROOT>
<Production.Product>
<ProductID>1 </ProductID>
<Name>Adjustable Race</Name>
........
</Production.Product>
</ROOT>
Currently I am getting this with
SELECT * FROM Production.Product
FOR XML auto
Result is:
<ROOT>
<Production.Product ProductID="1" Name="Adjustable Race"
ProductNumber="AR-5381" MakeFlag="0" FinishedGoodsFlag="0"
SafetyStockLevel="1000" ReorderPoint="750" StandardCost="0.0000"
ListPrice="0.0000" DaysToManufacture="0" SellStartDate="1998-06-01T00:00:00"
rowguid="694215B7-08F7-4C0D-ACB1-D734BA44C0C8"
ModifiedDate="2004-03-11T10:01:36.827" />
One simple way would be to use:
SELECT *
FROM Production.Product
FOR XML AUTO, ELEMENTS
Then, your data should be stored in XML elements inside the <Production.Product> node.
If you need even more control, then you should look at the FOR XML PATH syntax - check out this MSDN article on What's new in FOR XML in SQL Server 2005 which explains the FOR XML PATH (among other new features).
Basically, with FOR XML PATH, you can control very easily how things are rendered - as elements or as attributes - something like:
SELECT
ProductID AS '@ProductID', -- rendered as attribute on XML node
Name, ProductNumber, -- all rendered as elements inside XML node
.....
FROM Production.Product
FOR XML PATH('NewProductNode') -- define a new name for the XML node
This would give you something like:
<NewProductNode ProductID="1">
<Name>Adjustable Race</Name>
<ProductNumber>AR-5381</ProductNumber>
.....
</NewProductNode>
