Using SQL to transpose/flatten XML structure to columns - sql-server

I am using SQL Server (2008/2012) and I know there are similar answers from lots of searching, however I can't seem to find the appropriate example/pointers for my case.
I have an XML column in a SQL Server table holding this data:
<Items>
<Item>
<FormItem>
<Text>FirstName</Text>
<Value>My First Name</Value>
</FormItem>
<FormItem>
<Text>LastName</Text>
<Value>My Last Name</Value>
</FormItem>
<FormItem>
<Text>Age</Text>
<Value>39</Value>
</FormItem>
</Item>
<Item>
<FormItem>
<Text>FirstName</Text>
<Value>My First Name 2</Value>
</FormItem>
<FormItem>
<Text>LastName</Text>
<Value>My Last Name 2</Value>
</FormItem>
<FormItem>
<Text>Age</Text>
<Value>40</Value>
</FormItem>
</Item>
</Items>
So even though the structure of <FormItem> is going to be the same, I can have multiple (most commonly no more than 20-30) sets of form items.
I am essentially trying to return a query from SQL in the format below, i.e. dynamic columns based on /FormItem/Text:
FirstName        LastName        Age   ---> More columns as new `<FormItem>` are returned
My First Name    My Last Name    39    Whatever value etc.
My First Name 2  My Last Name 2  40
So, at the moment I had the following:
select
Tab.Col.value('Text[1]','nvarchar(100)') as Question,
Tab.Col.value('Value[1]','nvarchar(100)') as Answer
from
@Questions.nodes('/Items/Item/FormItem') Tab(Col)
Of course that hasn't transposed my XML rows into columns, and the column list is fixed anyway. I have been trying various "dynamic SQL" approaches where the SQL performs a distinct selection of (in my case) the <Text> node and then uses some sort of PIVOT, but I couldn't find the combination that returns the results I need as a dynamic set of columns for each row (<Item> within the collection of <Items>).
I'm sure it can be done having seen so many very similar examples, however again the solution eludes me!
Any help gratefully received!!

Parsing the XML is fairly expensive so instead of parsing once to build a dynamic query and once to get the data you can create a temporary table with a Name-Value list and then use that as the source for a dynamic pivot query.
dense_rank is there to create the ID to pivot around.
To build the column list in the dynamic query it uses the for xml path('') trick.
This solution requires that your table has a primary key (ID). If you have the XML in a variable it can be somewhat simplified.
select dense_rank() over(order by ID, I.N) as ID,
F.N.value('(Text/text())[1]', 'varchar(max)') as Name,
F.N.value('(Value/text())[1]', 'varchar(max)') as Value
into #T
from YourTable as T
cross apply T.XMLCol.nodes('/Items/Item') as I(N)
cross apply I.N.nodes('FormItem') as F(N)
declare @SQL nvarchar(max)
declare @Col nvarchar(max)
select @Col =
(
select distinct ','+quotename(Name)
from #T
for xml path(''), type
).value('substring(text()[1], 2)', 'nvarchar(max)')
set @SQL = 'select '+@Col+'
from #T
pivot (max(Value) for Name in ('+@Col+')) as P'
exec (@SQL)
drop table #T
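For the case where the XML is held in a variable rather than a table column, the same approach might look like the sketch below. The variable name @xml and the shortened sample document are assumptions for illustration. Ordering dense_rank by the node column I.N follows the query above, and the column list is built with stuff() instead of substring(), which is only a stylistic swap.

```sql
-- Sketch: same name-value + dynamic pivot idea, XML in a variable (@xml is a made-up name)
declare @xml xml = N'
<Items>
  <Item>
    <FormItem><Text>FirstName</Text><Value>My First Name</Value></FormItem>
    <FormItem><Text>Age</Text><Value>39</Value></FormItem>
  </Item>
  <Item>
    <FormItem><Text>FirstName</Text><Value>My First Name 2</Value></FormItem>
    <FormItem><Text>Age</Text><Value>40</Value></FormItem>
  </Item>
</Items>';

-- Shred once into a name-value list; ID identifies the <Item> each pair belongs to
select dense_rank() over(order by I.N) as ID,
       F.N.value('(Text/text())[1]', 'nvarchar(100)') as Name,
       F.N.value('(Value/text())[1]', 'nvarchar(max)') as Value
into #T
from @xml.nodes('/Items/Item') as I(N)
cross apply I.N.nodes('FormItem') as F(N);

-- Build the quoted column list, e.g. [Age],[FirstName]
declare @Col nvarchar(max), @SQL nvarchar(max);
select @Col = stuff((select distinct ',' + quotename(Name)
                     from #T
                     for xml path(''), type).value('.', 'nvarchar(max)'), 1, 1, '');

-- Pivot the name-value list; rows are grouped by the remaining column (ID)
set @SQL = N'select ' + @Col + N'
from #T
pivot (max(Value) for Name in (' + @Col + N')) as P;';
exec sp_executesql @SQL;

drop table #T;
```

No primary key is needed here because there is only one XML document, so the rank over the <Item> nodes alone is enough to separate the output rows.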

select Tab.Col.value('(FormItem[Text = "FirstName"]/Value)[1]', 'varchar(32)') as FirstName,
Tab.Col.value('(FormItem[Text = "LastName"]/Value)[1]', 'varchar(32)') as LastName,
Tab.Col.value('(FormItem[Text = "Age"]/Value)[1]', 'int') as Age
from @Questions.nodes('/Items/Item') Tab(Col)

I wanted to add my "own answer" really just for completeness, to possibly help others. It is most definitely based on the great help from @Mikael above, so all kudos to @Mikael.
Basically I ended up with the following proc. I needed to select and filter some data, get some joined data too, and allow optional filtering on some of the input params. The next section creates a temp table of my relational data and the required XML nodes via the cross apply. The final step pivots the results, dynamically creating the columns from the selected XML node.
CREATE PROCEDURE [dbo].[usp_RPT_ExtractFlattenentries]
@CompanyID int,
@MainSelector nvarchar(50) = null,
@SecondarySelector nvarchar(255) = null,
@DateFrom datetime = '01-jan-2012',
@DateTo datetime = '31-dec-2100',
@SysReference nvarchar(20) = null
AS
BEGIN
SET NOCOUNT ON;
-- Create the table var to hold the XML form data from the entries
declare @FeedbackXml table (
ID int identity primary key,
XMLCol xml,
CompanyName nvarchar(20),
SysReference nvarchar(20),
RecordDate datetime,
EntryName nvarchar(255),
MainSelector nvarchar(50)
)
-- STEP 1: Get the raw submission data based on the params passed in
-- *Note: The double casting is necessary as the "form" field is nvarchar (not varchar) and we need xml in UTF-8 format
begin
insert into @FeedbackXml
(XMLCol, CompanyName, SysReference, RecordDate, EntryName, MainSelector)
select cast(cast(e.form as nvarchar(max)) as xml), c.name, e.SysReference, e.RecordDate, e.name, e.wizard
from
entries e
left join
companies c on e.companies = c.ID
where
(@CompanyID = -1 or @CompanyID = e.companies)
and
(@MainSelector is null or @MainSelector = e.wizard)
and
(@SecondarySelector is null or @SecondarySelector = e.name)
and
(@SysReference is null or @SysReference = e.SysReference)
and
(e.RecordDate >= @DateFrom and e.RecordDate <= @DateTo)
end
-- STEP 2: Flatten the required XML structure to provide a base for the pivot, and include other fields we wish to output
select dense_rank() over(order by ID) as ID,
T.RecordDate, T.CompanyName, T.SysReference, T.EntryName, T.MainSelector,
F.N.value('(FieldNameNode/text())[1]', 'nvarchar(max)') as FieldName,
F.N.value('(FieldNameValue/text())[1]', 'nvarchar(max)') as FieldValue
into #TempData
from @FeedbackXml as T
cross apply T.XMLCol.nodes('/root/companies') as I(N) -- XPath to the desired node start point (no trailing slash)
cross apply I.N.nodes('company') as F(N) -- The actual node collection that forms the "field name" and "field value" data
-- STEP 3: Pivot the #TempData table creating a dynamic column structure based on the selected XML nodes in step 2
declare @SQL nvarchar(max)
declare @Col nvarchar(max)
select @Col =
(
select distinct ','+quotename(FieldName)
from #TempData
for xml path(''), type
).value('substring(text()[1], 2)', 'nvarchar(max)')
set @SQL = 'select CompanyName, SysReference, EntryName, MainSelector, RecordDate, '+@Col+'
from #TempData
pivot (max(FieldValue) for FieldName in ('+@Col+')) as P'
exec (@SQL)
drop table #TempData
END
Again, really only added this answer to provide a complete picture from my perspective, and may help others.

Related

How to extract schema from XML variable using XQuery

Technologies: T-SQL, XML, XQuery
I have an XML @variable in a database table which has a schema section and a data section. I would like to extract only the schema section and create an XML Schema Collection from it. It appears XQuery would be the quickest way. How do I specify the starting tag and ending tag in the following file (I only want to extract everything between <xs:schema xmlns and </xs:schema>)?
CREATE FUNCTION [etl].[ufn_GetXmlSchema]
(
@DataLakeBlobId uniqueidentifier
)
RETURNS xml
AS
BEGIN
DECLARE @XmlSchema xml
,@XmlData xml
SET @XmlSchema = ( SELECT [XmlData]
FROM [landing].[v_tbForm] WITH (NOLOCK)
WHERE [DataLakeBlobId] = @DataLakeBlobId
)
--RETURN @XmlSchema.query('</xs:schema>')-- missing matching begin tag
--RETURN @XmlSchema.query('<xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="NewDataSet">')-- Expected end tag 'xs:schema'
RETURN @XmlSchema.query('<xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="NewDataSet"></xs:schema>')-- nothing in between was returned
END
GO
SELECT [etl].[ufn_GetXmlSchema]('A257667D-C3AA-471C-9F82-91FA35181833')
Any help is appreciated.
While waiting for a real scenario, here is a jump start for you. As its end result, it creates an XML Schema Collection named dbo.StateAndCities.
USE tempdb;
GO
-- DDL and sample data population, start
IF EXISTS (SELECT * FROM sys.xml_schema_collections
WHERE name = N'StateAndCities'
AND schema_id = SCHEMA_ID(N'dbo'))
DROP XML SCHEMA COLLECTION dbo.StateAndCities;
DECLARE @tbl TABLE (
ID INT IDENTITY PRIMARY KEY
, state CHAR(2)
, city VARCHAR(30)
);
INSERT INTO @tbl (state, city)
VALUES
('FL', 'Miami')
, ('CA', 'Los Angeles')
, ('TX', 'Austin');
-- DDL and sample data population, end
DECLARE @xml XML
, @XSD XML;
-- Generate XML plus embedded XSD schema
SET @xml = (SELECT NULL,
(
SELECT *
FROM @tbl AS [row]
FOR XML AUTO, ELEMENTS, TYPE, XMLSCHEMA('MyURI'))
FOR XML PATH(''), TYPE, ROOT('root')
);
-- just to see, XML plus embedded XSD schema
SELECT @xml;
-- retrieve just XSD
;WITH xmlnamespaces ('http://www.w3.org/2001/XMLSchema' AS xsd)
SELECT @xsd = (SELECT @xml.query('/root/xsd:schema'));
-- just to see, XSD schema
SELECT @xsd AS xsd;
-- create schema collection
CREATE XML SCHEMA COLLECTION dbo.StateAndCities AS @xsd;
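The key point for the original question is that .query() takes an XQuery path expression, not a pair of literal start and end tags, and the xs prefix has to be declared. A minimal sketch, assuming the schema element sits somewhere below the document root (the sample document and the variable name @doc are made up for illustration):

```sql
-- Sketch: pull the embedded xs:schema element out of an XML value
DECLARE @doc xml = N'
<root>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" id="NewDataSet">
    <xs:element name="row" type="xs:string" />
  </xs:schema>
  <row>some data</row>
</root>';

-- The namespace is declared inside the XQuery prolog; (//xs:schema)[1] selects
-- the first schema element wherever it occurs in the document
SELECT @doc.query('
    declare namespace xs="http://www.w3.org/2001/XMLSchema";
    (//xs:schema)[1]') AS XmlSchema;
```

Inside a scalar function such as the one in the question, the same expression could be used directly in the RETURN statement, since the namespace is declared in the XQuery itself rather than with WITH XMLNAMESPACES.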

What is easiest and optimize way to find specific value from database tables?

I need to find which tables and columns contain a value such as xyz@test.com. The database is very large, with more than 2,500 tables.
Can anyone suggest an efficient way to find this kind of value in the database? I wrote a looping query, which took more than 9 hours to run.
9 hours is clearly a long time. Furthermore, 2,500 tables seems close to insanity for me.
Here is one approach that will run 1 query per table, not one per column. Now I have no idea how this will perform against 2,500 tables. I suspect it may be horrible. That said I would strongly suggest a test filter first like Table_Name like 'OD%'
Example
Declare @Search varchar(max) = 'cappelletti' -- Exact match '"cappelletti"'
Create Table #Temp (TableName varchar(500),RecordData xml)
Declare @SQL varchar(max) = ''
Select @SQL = @SQL+ ';Insert Into #Temp Select TableName='''+concat(quotename(Table_Schema),'.',quotename(table_name))+''',RecordData = (Select A.* for XML RAW) From '+concat(quotename(Table_Schema),'.',quotename(table_name))+' A Where (Select A.* for XML RAW) like ''%'+@Search+'%'''+char(10)
From INFORMATION_SCHEMA.Tables
Where Table_Type ='BASE TABLE'
and Table_Name like 'OD%' -- **** Would REALLY Recommend a REASONABLE Filter *** --
Exec(@SQL)
Select A.TableName
,B.*
,A.RecordData
From #Temp A
Cross Apply (
Select ColumnName = a.value('local-name(.)','varchar(100)')
,Value = a.value('.','varchar(max)')
From A.RecordData.nodes('/row') as C1(n)
Cross Apply C1.n.nodes('./@*') as C2(a)
Where a.value('.','varchar(max)') Like '%'+@Search+'%'
) B
Drop Table #Temp
If it Helps, the individual queries would look like this
Select TableName='[dbo].[OD]'
,RecordData= (Select A.* for XML RAW)
From [dbo].[OD] A
Where (Select A.* for XML RAW) like '%cappelletti%'
On a side-note, you can search numeric data and even dates.
Create a procedure that reads the table and column names (for the VARCHAR columns) from the system tables and stores them in a temp table.
Then loop over those records and build a dynamic query for each one that compares the column to the input parameter (the email address) with an = condition.
Wrap each comparison in IF EXISTS; when it matches, store that table name and column name in a second temp table, and return the contents of that table at the end of the execution.
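A rough sketch of that per-column IF EXISTS idea follows. It builds all the checks as one dynamic batch instead of an explicit loop, which is a substitution on my part; the #Hits table name, the nvarchar(200) parameter size and the 'OD%' test filter are assumptions for illustration, not part of the answer.

```sql
-- Sketch: one IF EXISTS check per character column, exact match on the search value
DECLARE @Search nvarchar(200) = N'xyz@test.com';
DECLARE @SQL nvarchar(max) = N'';

CREATE TABLE #Hits (TableName nvarchar(300), ColumnName nvarchar(128));

-- Generate statements of the form:
--   IF EXISTS (SELECT 1 FROM [dbo].[T] WHERE [Col] = @Search)
--       INSERT INTO #Hits (TableName, ColumnName) VALUES (N'[dbo].[T]', N'Col');
SELECT @SQL = @SQL
    + N'IF EXISTS (SELECT 1 FROM ' + QUOTENAME(c.TABLE_SCHEMA) + N'.' + QUOTENAME(c.TABLE_NAME)
    + N' WHERE ' + QUOTENAME(c.COLUMN_NAME) + N' = @Search)'
    + N' INSERT INTO #Hits (TableName, ColumnName) VALUES (N'''
    + REPLACE(QUOTENAME(c.TABLE_SCHEMA) + N'.' + QUOTENAME(c.TABLE_NAME), N'''', N'''''')
    + N''', N''' + REPLACE(c.COLUMN_NAME, N'''', N'''''') + N''');' + NCHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS AS c
JOIN INFORMATION_SCHEMA.TABLES AS t
  ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
 AND t.TABLE_NAME = c.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND c.DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
  AND c.TABLE_NAME LIKE 'OD%';  -- test filter; widen once timings look acceptable

-- The search value is passed as a parameter rather than concatenated into the SQL
EXEC sp_executesql @SQL, N'@Search nvarchar(200)', @Search = @Search;

SELECT TableName, ColumnName FROM #Hits;
DROP TABLE #Hits;
```

An equality test lets SQL Server use an index on the column where one exists and stops at the first match, whereas a LIKE '%...%' search has to scan every row.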

Including a sometimes missing column in TSQL if it's there (or alternative if it's not)

I have a weird table that I have to deal with that sometimes has a disappearing column. If the column is there, I need to use it. But if not, I need to account for that and use an alternative. But when I try this code when the column is missing, SSMS throws an error (Invalid column name 'DOB'). Shouldn't this short circuit if the column isn't there and never get to the part where it calls the column? So why the error message? Any solutions? Thanks in advance for any help!
SELECT
SalesClients.ClientName,
(CASE
WHEN (COL_LENGTH('dbo.SalesClients', 'DOB') IS NULL)
THEN DATEADD(month, -SalesClients.AgeInMonth, GETDATE())
WHEN SalesClients.DOB IS NULL
THEN DATEADD(month, -SalesClients.AgeInMonth, GETDATE())
ELSE SalesClients.DOB
END) AS DOB
FROM
dbo.SalesClients AS SalesClients
If performance matters (especially with many rows and large columns) you probably have to use dynamic SQL, but there is another approach using XML's generic abilities.
This won't be fast, but it can be fully inlined (in a VIEW or iTVF).
SELECT * FOR XML RAW
will generate an XML, where every row is one element <row>, while the columns are attributes. This allows a generic approach like here:
DECLARE @tbl TABLE(ID INT IDENTITY,SomeString VARCHAR(100));
INSERT INTO @tbl VALUES('test1'),('test2');
DECLARE @tbl2 TABLE(ID INT IDENTITY,SomeString VARCHAR(100),DOB DATE);
INSERT INTO @tbl2 VALUES('test1','20180101'),('test2','20180202');
--try this with @tbl and with @tbl2, it works in both cases
SELECT r.value('@ID','int') AS ID
,r.value('@DOB','date') AS DOB
FROM
(SELECT * FROM @tbl2 FOR XML RAW, TYPE) A(x)
CROSS APPLY x.nodes('/row') B(r);
The FOR XML RAW will generate something like this
<row ID="1" SomeString="test1" DOB="2018-01-01" />
<row ID="2" SomeString="test2" DOB="2018-02-02" />
... and .nodes('/row') will return each <row> as a table's row (a derived table).
The .value() method will return just a NULL if an attribute is not found.
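Applied to the table from the question, this could look roughly like the sketch below. The column types are assumptions (ClientName as a string, AgeInMonth as an int, DOB as a date when present), since the question does not show the table definition.

```sql
-- Sketch: DOB is read if the attribute exists, otherwise derived from AgeInMonth
SELECT r.value('@ClientName', 'nvarchar(100)') AS ClientName,
       COALESCE(
           r.value('@DOB', 'date'),                         -- NULL if column missing or value NULL
           DATEADD(month,
                   -1 * r.value('@AgeInMonth', 'int'),
                   CAST(GETDATE() AS date))
       ) AS DOB
FROM (SELECT * FROM dbo.SalesClients FOR XML RAW, TYPE) AS A(x)
CROSS APPLY x.nodes('/row') AS B(r);
```

Because FOR XML RAW omits attributes for NULL values, the COALESCE covers both cases from the original CASE expression: the column being absent and the column being present but NULL.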
Try this
SELECT
SalesClients.ClientName,
CASE WHEN EXISTS(
SELECT
COLUMN_NAME
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = 'SalesClients'
AND COLUMN_NAME = 'DOB'
)
THEN
SalesClients.DOB
ELSE
DATEADD(month, -SalesClients.AgeInMonth, GETDATE())
END AS DOB
FROM
dbo.SalesClients AS SalesClients

How to check if a value is included by a list in an effective way in sql 2008?

I would like to use something like an .Include function in SQL Server 2008, but I could not find the correct syntax for it. I have a sql query like below:
--@values has to be varchar list and start & end with comma
declare @values varchar(max) = ',7,34,37,74,85,'
select (case when @values like '%,' + m.Id + ',%' then m.Name else null end)
from @myTable m
So the logic is, if ID of a record matches with one of the numbers in @values list, I would like to see its name in the output list. This query is working fine, but I would like to find a more professional way to handle it, maybe like:
case when @values.Include(m.Id) then m.Name else null end
Any advice would be appreciated. Thanks.
The fastest method to split a delimited string is using xquery in my experience.
Ex:
DECLARE @values VARCHAR(50), @XML XML
SET @values = ',7,34,37,74,85,'
SET @XML = cast(('<X>'+replace(@values,',' ,'</X><X>')+'</X>') as xml)
SELECT N.value('.', 'VARCHAR(255)') as value FROM @XML.nodes('X') as T(N)
declare @table table (id varchar(5))
insert into @table(id)
values ('7')
select *
from @table y
where exists (SELECT 1 FROM @XML.nodes('X') as T(N) where N.value('.', 'VARCHAR(255)') = y.id)
If you are calling this code from an application, you might want to consider using Table-Valued Parameters and a stored procedure to do this.
First, you would need to create a table type to use with the procedure:
create type dbo.Ids_udt as table (Id int not null);
go
Then, create the procedure:
create procedure dbo.get_names_from_list (
@Ids as dbo.Ids_udt readonly
) as
begin;
set nocount, xact_abort on;
select t.Name
from t
inner join @Ids i
on t.Id = i.Id
end;
go
Then, assemble and pass the list of Ids to the stored procedure using a DataTable added as a SqlParameter using SqlDbType.Structured.
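For testing from T-SQL rather than from an application, the procedure can be called with a variable of the table type. A minimal sketch, assuming the dbo.Ids_udt type and the procedure have been created as above (the variable name @list is made up):

```sql
-- Sketch: fill a table-valued parameter and pass it to the procedure
declare @list dbo.Ids_udt;

insert into @list (Id)
values (7), (34), (37), (74), (85);

exec dbo.get_names_from_list @Ids = @list;
```

The Ids arrive as typed rows, so no string splitting or LIKE matching is needed inside the procedure.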
Table Valued Parameter Reference:
SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven! - Leonard Lobel
Table Value Parameter Use With C# - Jignesh Trivedi
Using Table-Valued Parameters in SQL Server and .NET - Erland Sommarskog
Maximizing Performance with Table-Valued Parameters - Dan Guzman
Maximizing throughput with tvp - sqlcat
How to use TVPs with Entity Framework 4.1 and CodeFirst
Assuming the list is not required to be structured as a comma-separated string, you could use IN, EXISTS or SOME / ANY.
If the comma-separated string is unavoidable you could use JiggsJedi's approach, but since you asked for a fast way, you should try to store the data in a form that can be processed quickly and needs no additional work to be queried.
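A small sketch of the IN and EXISTS variants, using a made-up @myTable with sample rows and a table variable named @values that holds the ids as rows instead of as a string (both are assumptions for illustration):

```sql
-- Sample data standing in for the table from the question
declare @myTable table (Id int, Name varchar(50));
insert into @myTable (Id, Name)
values (7, 'Seven'), (8, 'Eight'), (34, 'Thirty-four');

-- IN with a literal list
select m.Id,
       case when m.Id in (7, 34, 37, 74, 85) then m.Name end as Name
from @myTable as m;

-- EXISTS against the ids stored as rows
declare @values table (Id int primary key);
insert into @values (Id)
values (7), (34), (37), (74), (85);

select m.Id,
       case when exists (select 1 from @values as v where v.Id = m.Id)
            then m.Name end as Name
from @myTable as m;
```

Both queries return the name for ids 7 and 34 and NULL for id 8, which matches the behaviour of the LIKE-based CASE expression in the question without any string handling.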
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
Drop table #Temp
Create table #Temp (ID INt ,Name varchar(5))
INSERT into #Temp
SELECT 7,'AA' Union all
SELECT 34,'BA' Union all
SELECT 37,'CA' Union all
SELECT 74,'DA' Union all
SELECT 85,'TA'
DECLARE @values varchar(max) = ',,,,,,7,,34,,,74,85,,,,' --If extra commas are added in starting or end or in between of string it could handle
SET @values=','+@values+','
SELECT @values= LEFT(STUFF(@values,1,1,''),LEN(@values)-2)
DECLARE @SelectValuesIn TABLE(Value INT)
INSERT INTO @SelectValuesIn
SELECT Split.a.value('.', 'VARCHAR(100)') AS Data
FROM
(
SELECT
CAST ('<M>' + REPLACE(@values, ',', '</M><M>') + '</M>' AS XML) AS Data
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
SELECT * FROM #Temp WHERE ID IN(SELECT Value from @SelectValuesIn)

SQL Server: Output an XML field as tabular data using a stored procedure

I am using a table with an XML data field to store the audit trails of all other tables in the database.
That means the same XML field has various XML information. For example my table has two records with XML data like this:
1st record:
<client>
<name>xyz</name>
<ssn>432-54-4231</ssn>
</client>
2nd record:
<emp>
<name>abc</name>
<sal>5000</sal>
</emp>
These are the two sample formats and just two records. The table actually has many more XML formats in the same field and many records in each format.
Now my problem is that upon query I need these XML formats to be converted into tabular result sets.
What are the options for me? It would be a regular task to query this table and generate reports from it. I want to create a stored procedure to which I can pass that I need to query "<emp>" or "<client>", then my stored procedure should return tabular data.
Does this help?
-- table variable holding the sample XML (declaration assumed; it is not shown in the original)
DECLARE @t TABLE (data XML)
INSERT INTO @t (data) SELECT '
<client>
<name>xyz</name>
<ssn>432-54-4231</ssn>
</client>'
INSERT INTO @t (data) SELECT '
<emp>
<name>abc</name>
<sal>5000</sal>
</emp>'
DECLARE @el VARCHAR(20)
SELECT @el = 'client'
SELECT
x.value('local-name(.)', 'VARCHAR(20)') AS ColumnName,
x.value('.','VARCHAR(20)') AS ColumnValue
FROM @t
CROSS APPLY data.nodes('/*[local-name(.)=sql:variable("@el")]') a (x)
/*
ColumnName ColumnValue
-------------------- --------------------
client xyz432-54-4231
*/
SELECT @el = 'emp'
SELECT
x.value('local-name(.)', 'VARCHAR(20)') AS ColumnName,
x.value('.','VARCHAR(20)') AS ColumnValue
FROM @t
CROSS APPLY data.nodes('/*[local-name(.)=sql:variable("@el")]') a (x)
/*
ColumnName ColumnValue
-------------------- --------------------
emp abc5000
*/
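The queries above return the root element itself, so the child values come back concatenated. Stepping one level down in the path gives one row per child element instead. A sketch, where the table variable @audit, its id column and the column sizes are assumptions for illustration:

```sql
-- Sketch: one name/value row per child element of the requested root element
DECLARE @audit TABLE (id INT IDENTITY PRIMARY KEY, data XML);
INSERT INTO @audit (data)
VALUES (N'<client><name>xyz</name><ssn>432-54-4231</ssn></client>'),
       (N'<emp><name>abc</name><sal>5000</sal></emp>');

DECLARE @el VARCHAR(20) = 'client';

SELECT a.id,
       c.value('local-name(.)', 'VARCHAR(50)')  AS ColumnName,
       c.value('(./text())[1]', 'VARCHAR(100)') AS ColumnValue
FROM @audit AS a
CROSS APPLY a.data.nodes('/*[local-name(.)=sql:variable("@el")]/*') AS t(c);
```

With @el set to 'client' this yields two rows for the first record (name and ssn), which is a name-value shape that could then be pivoted into columns.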
Neither xyz432-54-4231 nor abc5000 is valid XML.
You can try to select only one particular format with a LIKE pattern, for example:
select *
from YourTable
where YourColumn like '[a-z][a-z][a-z][0-9][0-9][0-9][0-9]'
This would match 3 letters followed by 4 numbers.
A better option is probably to add an extra column to the table, where you save the type of the logging. Then you can use that column to select all "emp" or "client" rows.
An option would be to create a series of views over the audit table, one per type, each presenting the columns you're expecting.
For example:
select
c.value('(name/text())[1]','nvarchar(50)') as name, -- value() needs a singleton, hence (...)[1]
c.value('(ssn/text())[1]', 'nvarchar(20)') as ssn
from yourtable
cross apply yourxmlcolumn.nodes('/client') as t(c)
You could then follow the same pattern for the emp type.
You could also create a view (or computed column) to identify each XML type, like this:
select yourxmlcolumn.value('local-name(/*[1])', 'varchar(100)') as objectType
from yourtable
Use the OPENXML method:
DECLARE @idoc int
EXEC sp_xml_preparedocument @idoc OUTPUT, @xmldoc
SELECT * into #test
FROM OPENXML (@idoc, 'xmlfilepath',2)
WITH (Name varchar(50),ssn varchar(20)
)
EXEC sp_xml_removedocument @idoc
Once the data is in #test you can manipulate it as needed.
You may also want to keep the different kinds of data in different XML files.
