SQL Server split SELECT XML column as arbitrary individual columns - sql-server

In my application, I have a few pre-defined fields for an object, and users can define custom fields. I am using the XML data type to store the custom fields in a name/value format.
e.g. I have an Employees table that has FN, LN, Email as pre-defined columns and CustomFields as an XML column to hold the user-defined fields.
Different rows can contain different custom fields.
e.g. Row 1 -> John, Smith, jsmith@example.com,
<root>
<phone>123-123-1234</phone>
<country>USA</country>
</root>
and then Row 2 -> Smith, John, sjohn@example.com,
<root>
<age>50</age>
<sex>Male</sex>
</root>
And there can be any number of such custom fields defined for different employee records. The format will always be the same
<root><field>value</field></root>
How can I return Phone and Country as columns while selecting Row1 and return Age and Sex as columns while selecting Row2?

Take this temp table for all examples
CREATE TABLE #tbl (ID INT IDENTITY, FirstName VARCHAR(100),LastName VARCHAR(100),eMail VARCHAR(100),CustomFields XML);
INSERT INTO #tbl VALUES
('John','Smith','john.smith@test.com'
,'<root>
<phone>123-123-1234</phone>
<country>USA</country>
</root>')
, ('Jane','Miller','jane.miller@test.com'
,'<root>
<age>50</age>
<sex>Male</sex>
</root>');
Option 1
This assumes a fixed, known set of custom fields.
It allows type-safe reading (e.g. age as INT).
All possible columns are returned; unused ones are NULL.
Try this code:
SELECT tbl.ID
,tbl.FirstName
,tbl.LastName
,tbl.eMail
,tbl.CustomFields.value('(/root/phone)[1]','nvarchar(max)') AS phone
,tbl.CustomFields.value('(/root/country)[1]','nvarchar(max)') AS country
,tbl.CustomFields.value('(/root/age)[1]','int') AS age
,tbl.CustomFields.value('(/root/sex)[1]','nvarchar(max)') AS sex
FROM #tbl AS tbl
This is the result:
+----+-----------+----------+----------------------+--------------+---------+------+------+
| ID | FirstName | LastName | eMail | phone | country | age | sex |
+----+-----------+----------+----------------------+--------------+---------+------+------+
| 1 | John | Smith | john.smith@test.com | 123-123-1234 | USA | NULL | NULL |
+----+-----------+----------+----------------------+--------------+---------+------+------+
| 2 | Jane | Miller | jane.miller@test.com | NULL | NULL | 50 | Male |
+----+-----------+----------+----------------------+--------------+---------+------+------+
Option 2
This assumes you do not know the field names in advance, so you cannot name the output columns directly.
But you can use generic names, read the data row-wise and PIVOT it.
Try this:
SELECT p.*
FROM
(
SELECT tbl.FirstName
,tbl.LastName
,tbl.eMail
,N'Col_' + CAST(ROW_NUMBER() OVER(PARTITION BY tbl.ID ORDER BY (SELECT NULL)) AS NVARCHAR(max)) AS ColumnName
,A.cf.value('local-name(.)','nvarchar(max)') + ':' + A.cf.value('.','nvarchar(max)') AS cf
FROM #tbl AS tbl
CROSS APPLY tbl.CustomFields.nodes('/root/*') AS A(cf)
) AS x
PIVOT
(
MAX(cf) FOR ColumnName IN(Col_1,Col_2,Col_3,Col_4 /*add as many as you need*/)
) AS p
This is the result
+-----------+----------+----------------------+--------------------+-------------+-------+-------+
| FirstName | LastName | eMail | Col_1 | Col_2 | Col_3 | Col_4 |
+-----------+----------+----------------------+--------------------+-------------+-------+-------+
| Jane | Miller | jane.miller@test.com | age:50 | sex:Male | NULL | NULL |
+-----------+----------+----------------------+--------------------+-------------+-------+-------+
| John | Smith | john.smith@test.com | phone:123-123-1234 | country:USA | NULL | NULL |
+-----------+----------+----------------------+--------------------+-------------+-------+-------+
Option 3
This assumes you do not know the columns, but you need the output columns correctly named.
Caution: this approach needs dynamic SQL, which is never allowed in a VIEW or an inline TVF. That can be a serious drawback...
The statement has to be created dynamically. I will build the statement of Option 1 but replace the fixed column list with a dynamically created one:
DECLARE @DynamicColumns NVARCHAR(MAX)=
(
SELECT ',tbl.CustomFields.value(''(/root/' + A.cf.value('local-name(.)','nvarchar(max)') + ')[1]'',''nvarchar(max)'') AS ' + A.cf.value('local-name(.)','nvarchar(max)')
FROM #tbl AS tbl
CROSS APPLY tbl.CustomFields.nodes('/root/*') AS A(cf)
FOR XML PATH('')
);
DECLARE @DynamicSQL NVARCHAR(MAX)=
' SELECT tbl.ID
,tbl.FirstName
,tbl.LastName
,tbl.eMail'
+ @DynamicColumns +
' FROM #tbl AS tbl;'
EXEC(@DynamicSQL);
The result would be the same as in Option 1, but with a completely dynamic approach.
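A slightly hardened variant of the dynamic statement, as a sketch only. It de-duplicates the field names (so a field that appears in several rows yields one column), wraps each alias in QUOTENAME so that a name like my-field stays a valid identifier, and runs the batch through sp_executesql. It assumes the #tbl temp table from above still exists; the names @cols, @sql and FieldName are made up for this example.

```sql
-- Sketch: assumes #tbl (created above) is still available in this session.
DECLARE @cols NVARCHAR(MAX) =
(
    SELECT N',tbl.CustomFields.value(''(/root/' + f.FieldName
         + N')[1]'',''nvarchar(max)'') AS ' + QUOTENAME(f.FieldName)
    FROM (
        -- one row per distinct custom field name found in any row
        SELECT DISTINCT A.cf.value('local-name(.)','nvarchar(128)') AS FieldName
        FROM #tbl AS tbl
        CROSS APPLY tbl.CustomFields.nodes('/root/*') AS A(cf)
    ) AS f
    FOR XML PATH('')
);

DECLARE @sql NVARCHAR(MAX) =
      N'SELECT tbl.ID, tbl.FirstName, tbl.LastName, tbl.eMail'
    + ISNULL(@cols, N'')      -- no custom fields at all: fixed columns only
    + N' FROM #tbl AS tbl;';

EXEC sys.sp_executesql @sql;
```

All dynamic columns are still read as nvarchar(max) here; type-safe reading as in Option 1 would need a type lookup per field name, which this sketch does not attempt.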
Cleanup
DROP TABLE #tbl;

Related

SQL Server 2017 - get column name, datatype and value of table

I thought it was a simple task, but I have been struggling with it for a couple of hours :-(
I want to have the list of column names of a table, together with its datatype and the value contained in the columns, but have no idea how to bind the table itself to get the current value:
DECLARE @TTab TABLE
(
fieldName nvarchar(128),
dataType nvarchar(64),
currentValue nvarchar(128)
)
INSERT INTO @TTab (fieldName,dataType)
SELECT
i.COLUMN_NAME,
i.DATA_TYPE
FROM
INFORMATION_SCHEMA.COLUMNS i
WHERE
i.TABLE_NAME = 'Users'
Expected result:
+------------+----------+---------------+
| fieldName | dataType | currentValue |
+------------+----------+---------------+
| userName | nvarchar | John |
| active | bit | true |
| age | int | 43 |
| balance | money | 25.20 |
+------------+----------+---------------+
In general the answer is: No, this is impossible. But there is a hack using text-based containers like XML or JSON (v2016+):
--Let's create a test table with some rows
CREATE TABLE dbo.TestGetMetaData(ID INT IDENTITY,PreName VARCHAR(100),LastName NVARCHAR(MAX),DOB DATE);
INSERT INTO dbo.TestGetMetaData(PreName,LastName,DOB) VALUES
('Tim','Smith','20000101')
,('Tom','Blake','20000202')
,('Kim','Black','20000303')
GO
--Here's the query
SELECT C.colName
,C.colValue
,D.*
FROM
(
SELECT t.* FROM dbo.TestGetMetaData t
WHERE t.Id=2
FOR XML PATH(''),TYPE
) A(rowSet)
CROSS APPLY A.rowSet.nodes('*') B(col)
CROSS APPLY(VALUES(B.col.value('local-name(.)','nvarchar(500)')
,B.col.value('text()[1]', 'nvarchar(max)'))) C(colName,colValue)
LEFT JOIN INFORMATION_SCHEMA.COLUMNS D ON D.TABLE_SCHEMA='dbo'
AND D.TABLE_NAME='TestGetMetaData'
AND D.COLUMN_NAME=C.colName;
GO
--Clean-Up (careful with real data)
DROP TABLE dbo.TestGetMetaData;
GO
Part of the result
+----------+------------+-----------+--------------------------+-------------+
| colName | colValue | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | IS_NULLABLE |
+----------+------------+-----------+--------------------------+-------------+
| ID | 2 | int | NULL | NO |
+----------+------------+-----------+--------------------------+-------------+
| PreName | Tom | varchar | 100 | YES |
+----------+------------+-----------+--------------------------+-------------+
| LastName | Blake | nvarchar | -1 | YES |
+----------+------------+-----------+--------------------------+-------------+
| DOB | 2000-02-02 | date | NULL | YES |
+----------+------------+-----------+--------------------------+-------------+
The idea in short:
Using FOR XML PATH(''),TYPE will create an XML representing your SELECT's result set.
The big advantage of this: the XML's elements carry the columns' names.
We can use CROSS APPLY to get each column's name and value.
Now we can JOIN the metadata from INFORMATION_SCHEMA.COLUMNS.
One hint: all values will actually be of type nvarchar(max).
Since the value is a string type, this might lead to unexpected results due to implicit conversions, or cause trouble with BLOBs.
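The JSON route mentioned above might look roughly like this. It is a sketch that assumes SQL Server 2016+ (database compatibility level 130 or higher) and that dbo.TestGetMetaData still exists, i.e. it is run before the clean-up. The explicit COLLATE is there because the key column returned by OPENJSON uses a binary collation, which can otherwise clash with the catalog collation in the JOIN.

```sql
-- Sketch: JSON instead of XML as the text-based container (SQL Server 2016+).
SELECT J.[key]   AS colName
      ,J.[value] AS colValue          -- always a string, as with the XML approach
      ,D.DATA_TYPE
      ,D.CHARACTER_MAXIMUM_LENGTH
      ,D.IS_NULLABLE
FROM OPENJSON(
        (SELECT t.*
         FROM dbo.TestGetMetaData t
         WHERE t.ID = 2
         -- INCLUDE_NULL_VALUES keeps columns whose value is NULL in the output
         FOR JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES)
     ) AS J
LEFT JOIN INFORMATION_SCHEMA.COLUMNS D
       ON D.TABLE_SCHEMA = 'dbo'
      AND D.TABLE_NAME   = 'TestGetMetaData'
      AND D.COLUMN_NAME  = J.[key] COLLATE DATABASE_DEFAULT;
```

The same caveat applies: the values arrive as strings, so dates and numbers have to be converted back if their type matters.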
UPDATE
The following query wouldn't even need to specify the table's name in the JOIN:
SELECT C.colName
,C.colValue
,D.DATA_TYPE,D.CHARACTER_MAXIMUM_LENGTH,D.IS_NULLABLE
FROM
(
SELECT * FROM dbo.TestGetMetaData
WHERE Id=2
FOR XML AUTO,TYPE
) A(rowSet)
CROSS APPLY A.rowSet.nodes('/*/@*') B(attr)
CROSS APPLY(VALUES(A.rowSet.value('local-name(/*[1])','nvarchar(500)')
,B.attr.value('local-name(.)','nvarchar(500)')
,B.attr.value('.', 'nvarchar(max)'))) C(tblName,colName,colValue)
LEFT JOIN INFORMATION_SCHEMA.COLUMNS D ON CONCAT(D.TABLE_SCHEMA,'.',D.TABLE_NAME)=C.tblName
AND D.COLUMN_NAME=C.colName;
Why?
FOR XML AUTO produces attribute-centric XML. The element's name will be the table's name, while the values reside in attributes.
UPDATE 2
Fully generic function:
CREATE FUNCTION dbo.GetRowWithMetaData(@input XML)
RETURNS TABLE
AS
RETURN
SELECT C.colName
,C.colValue
,D.*
FROM @input.nodes('/*/@*') B(attr)
CROSS APPLY(VALUES(@input.value('local-name(/*[1])','nvarchar(500)')
,B.attr.value('local-name(.)','nvarchar(500)')
,B.attr.value('.', 'nvarchar(max)'))) C(tblName,colName,colValue)
LEFT JOIN INFORMATION_SCHEMA.COLUMNS D ON CONCAT(D.TABLE_SCHEMA,'.',D.TABLE_NAME)=C.tblName
AND D.COLUMN_NAME=C.colName;
--You call it like this (note the extra parentheses!)
SELECT * FROM dbo.GetRowWithMetaData((SELECT * FROM dbo.TestGetMetaData WHERE ID=2 FOR XML AUTO));
As you see, the function does not even have to know anything in advance...

How to INSERT rows based on other rows?

I need to run a query that will INSERT new rows into a SQL Server join table.
Suppose I have the following tables to describe which products a store sells and in which states:
products:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 1 | Laptop |
| 2 | Aspirin |
| 3 | Mattress |
+------------+--------------+
stores:
+----------+------------+
| store_id | store_name |
+----------+------------+
| 1 | Walmart |
| 2 | Best Buy |
| 3 | Sam's Club |
+----------+------------+
products_stores_states:
+------------+----------+-------+
| product_id | store_id | state |
+------------+----------+-------+
| 1 | 2 | AL |
| 1 | 2 | AR |
| 2 | 2 | AL |
| 2 | 2 | AR |
| 3 | 2 | AL |
| 3 | 2 | AR |
+------------+----------+-------+
So here we see that Best Buy sells all 3 products in AL and AR.
What I need to do is somehow insert rows into the products_stores_states table to add AZ for all products it currently sells.
With a small dataset, I could do this manually, row by row:
INSERT INTO products_stores_states (product_id, store_id, state) VALUES
(1,2,'AZ'),
(2,2,'AZ'),
(3,2,'AZ');
Since this is a large dataset, this is not really an option.
How would I go about inserting a new state for Best Buy for every product_id that the products_stores_states table already contains for Best Buy?
Bonus: If a query could be made to do this for multiple states that the same time, that would be even better.
Right now, I cannot wrap my head around how to do this, but I assume there would need to be a subquery to get the list of matching product_id values I need to use.
The following query will do what you want to do
DECLARE @temp TABLE (
state VARCHAR(20)
)
-- we are inserting state names into a table variable to use it further
INSERT INTO @temp (state)
VALUES
('AZ'),
('MA'),
('TX');
INSERT INTO products_stores_states(product_id, store_id, state )
SELECT
T.product_id,
T.store_id,
temp.state
FROM(
SELECT DISTINCT
Product_id, store_id
FROM
products_stores_states
WHERE store_id = 2 -- the store_id for which you want to make changes
) AS T
CROSS JOIN
@temp AS temp
First, we store the state names in a table variable. Then we select only the distinct store_id and product_id combinations for the specific store.
Finally, we CROSS JOIN those distinct combinations with the table variable holding the state names and insert the result.
Hope, this helps! Thanks.
If you are positive the inserted state is new to the table, something like this would work; change the state variable to whatever state you want to insert.
DECLARE @State CHAR(2), @StoreId INT;
SET @State = 'AZ';
SET @StoreId = 2;
INSERT INTO products_stores_states (product_id, store_id, state)
SELECT DISTINCT product_id, @StoreId, @State
FROM dbo.products_stores_states
WHERE store_id = @StoreId;
You could first see what this statement would add using this:
DECLARE @State CHAR(2), @StoreId INT;
SET @State = 'AZ';
SET @StoreId = 2;
SELECT DISTINCT product_id, @StoreId, @State
FROM dbo.products_stores_states
WHERE store_id = @StoreId;
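If you are not sure the state is new for every product, a NOT EXISTS guard avoids inserting rows that are already there. This is only a sketch; it assumes the table lives in the dbo schema and that a row is identified by (product_id, store_id, state).

```sql
DECLARE @State CHAR(2) = 'AZ', @StoreId INT = 2;

INSERT INTO dbo.products_stores_states (product_id, store_id, state)
SELECT DISTINCT p.product_id, p.store_id, @State
FROM dbo.products_stores_states AS p
WHERE p.store_id = @StoreId
  -- skip products that already have a row for this store and state
  AND NOT EXISTS (
        SELECT 1
        FROM dbo.products_stores_states AS x
        WHERE x.product_id = p.product_id
          AND x.store_id   = p.store_id
          AND x.state      = @State
      );
```

Running it a second time inserts nothing, so it is safe to repeat.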

How to generate XML path within CASE

Edit: I am trying to build an XML PATH list from rows that have been narrowed down by a CASE statement. The column 'DisplayName' contains over 700 unique values throughout the database. However, based on other criteria, including the AccountID and whether RenderedValue = '', the remaining results will most likely number fewer than 5. The constraint on my query is that I cannot explicitly specify an AccountID or DisplayName.
I have a successful CASE statement on its own. But when I add the XML PATH statement, it pulls all the data from the table and comma-separates it, instead of just the results of the CASE statement. I can't figure out how to nest them together. Besides the GUID in column 1, the values are nvarchar.
Query w/o CASE
select tb1.AccountID,
tb3.DisplayName,
tb4.RenderedValue
from Accounts tb1
join Display tb2 on tb2.AccountID = tb1.AccountID
inner join ExtractDetail tb3 on tb3.ExtractID = tb2.ExtractID
left join ExtractDetailData tb4 on tb4.ExtractDetailID = tb3.ExtractDetailID
result:
+-----------+---------------+-----------------------+
| AccountID | DisplayName | RenderedValue |
+-----------+---------------+-----------------------+
| E8175 | FirstName | John |
| E8175 | LastName | Smith |
| E8175 | StreetAddress | 123 Washington Street |
| E8175 | City | |
| E8175 | State | NY |
| E8175 | ZipCode | |
| E8175 | PhoneNumber | 555-555-5555 |
| E8175 | Email | JohnSmith@aol.com |
+-----------+---------------+-----------------------+
Query w/ CASE
select tb1.AccountID,
CASE When tb4.RenderedValue = ''
Then tb3.DisplayName
Else ''
End As MissingField
from Accounts tb1
join Display tb2 on tb2.AccountID = tb1.AccountID
inner join ExtractDetail tb3 on tb3.ExtractID = tb2.ExtractID
left join ExtractDetailData tb4 on tb4.ExtractDetailID = tb3.ExtractDetailID
Where tb4.RenderedValue =''
Result:
+-----------+--------------+
| AccountID | MissingField |
+-----------+--------------+
| E8175 | City |
| E8175 | ZipCode |
+-----------+--------------+
Expected Output:
+-----------+--------------+
| AccountID | MissingField |
+-----------+--------------+
| E8175 | City,ZipCode |
+-----------+--------------+
I think this code will help you:
create table #temp (AccountID varchar(20),DisplayName varchar(20),RenderedValue varchar(255))
insert into #temp (AccountID,DisplayName,RenderedValue) values
('E8175','FirstName','John')
insert into #temp (AccountID,DisplayName,RenderedValue) values
('E8175','LastName','Smith')
insert into #temp (AccountID,DisplayName,RenderedValue) values
('E8175','StreetAddress','123 Washington Street')
insert into #temp (AccountID,DisplayName,RenderedValue) values ('E8175','City','')
insert into #temp (AccountID,DisplayName,RenderedValue) values
('E8175','State','NY')
insert into #temp (AccountID,DisplayName,RenderedValue) values
('E8175','ZipCode','')
insert into #temp (AccountID,DisplayName,RenderedValue) values
('E8175','PhoneNumber','555-555-5555 ')
insert into #temp (AccountID,DisplayName,RenderedValue) values
('E8175','Email','JohnSmith@aol.com')
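SELECT distinct
P.AccountID,
STUFF
(
(
-- correlate on AccountID and keep only the empty values, so the list holds just the missing fields
SELECT ',' + M.DisplayName
FROM #temp M
WHERE M.AccountID = P.AccountID
AND M.RenderedValue = ''
FOR XML PATH(''), type
).value('.', 'varchar(max)'), 1, 1, ''
) AS MissingField
FROM
#temp P
WHERE P.RenderedValue = ''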
Drop table #temp
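On SQL Server 2017 or later, STRING_AGG can replace the STUFF / FOR XML PATH construct. The question does not state a version, so treat this as a sketch under that assumption; the table variable @demo and its rows are made up to keep the example self-contained.

```sql
-- Sketch: requires SQL Server 2017+ for STRING_AGG.
DECLARE @demo TABLE (AccountID VARCHAR(20), DisplayName VARCHAR(20), RenderedValue VARCHAR(255));

INSERT INTO @demo (AccountID, DisplayName, RenderedValue) VALUES
 ('E8175','FirstName','John')
,('E8175','City','')
,('E8175','State','NY')
,('E8175','ZipCode','');

-- Filter to the empty values first, then aggregate the names per account
SELECT d.AccountID,
       STRING_AGG(d.DisplayName, ',') AS MissingField
FROM @demo AS d
WHERE d.RenderedValue = ''
GROUP BY d.AccountID;
```

The order of the names inside the list is not guaranteed unless you add WITHIN GROUP (ORDER BY ...).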

Need help in a reverse pivot - Column names become data and then the values in that column

I am looking to pull data from a table and insert the results into a #temp table where the column name is part of the result set. I know I can get the column names from the schema information table, but I also need the data in each of the columns. There will be only 1 row from the original table, so I am basically doing a reverse STUFF command or reverse pivot. The result set would be columnName and Value, but with multiple rows: as many rows as there are columns.
So basically the result set or table will have just 2 columns, one for the column name and one for the value in that column. That is my goal. I know a pivot does this in reverse, but I can't seem to find a "reverse pivot". I am using SQL Server 2008.
Any help would be appreciated. Thanks!
Are you able to give a better description of what you're after? For example, more information on the table structures, etc.
Regardless, please see below an example of using a CROSS APPLY statement to transform a 'pivot table' into a flat table.
Data within the pivot table
+----+-----------+----------+----------------+
| Id | FirstName | LastName | Company |
+----+-----------+----------+----------------+
| 1 | Joe | Bloggs | A Company |
| 2 | Jane | Doe | Lost and Found |
+----+-----------+----------+----------------+
SQL statement to turn pivot table to flat table
IF OBJECT_ID('PivotedTable', 'U') IS NOT NULL
DROP TABLE PivotedTable
GO
CREATE TABLE PivotedTable (
Id INT IDENTITY,
FirstName VARCHAR(255),
LastName VARCHAR(255),
Company VARCHAR(255)
)
INSERT PivotedTable (FirstName, LastName, Company)
VALUES ('Joe', 'Bloggs', 'A Company'), ('Jane', 'Doe', 'Lost and Found')
SELECT
FlatTable.ColumnName,
FlatTable.Value
FROM PivotedTable t
CROSS APPLY (
VALUES
('FirstName', FirstName),
('LastName', LastName),
('Company', Company)
) FlatTable (ColumnName, Value)
Output of the query after turning into a flat table
+------------+----------------+
| ColumnName | Value |
+------------+----------------+
| FirstName | Joe |
| LastName | Bloggs |
| Company | A Company |
| FirstName | Jane |
| LastName | Doe |
| Company | Lost and Found |
+------------+----------------+
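The built-in UNPIVOT operator is another way to write the same thing and is also available on SQL Server 2008. This is a sketch against the PivotedTable from above. Two differences to be aware of: UNPIVOT needs all unpivoted columns to have the same data type and length (they are all VARCHAR(255) here), and it drops rows whose value is NULL, whereas the CROSS APPLY version keeps them.

```sql
-- Sketch: same result shape as the CROSS APPLY version, using UNPIVOT.
SELECT u.Id,
       u.ColumnName,
       u.[Value]
FROM (
        SELECT Id, FirstName, LastName, Company
        FROM PivotedTable
     ) AS src
UNPIVOT (
        [Value] FOR ColumnName IN (FirstName, LastName, Company)
     ) AS u;
```

Keeping Id in the output makes it possible to tell which source row each name/value pair came from.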

Parse non Xml string into Xml during Sql query

Say I have this subset of data. All I need to do is have John | John | 20 as my output. The main issue I am having is that my XmlData is stored in an nvarchar(max) field, and changing the column type to fix this would break an unknown number of other applications (we are talking about a massive scale, so I cannot simply modify the table design).
Name nvarchar(23) | XmlData nvarchar(max) |
John |<Personal><name>John</name><age>20</age></Personal> |
Suzy |<Personal><name>Suzanne</name><age>24</age></Personal> |
etc...
What I have tried so far is similar to the following, but it fails.
SELECT Name,
[myTable].Value('(Personal[name ="name"]/value/text())[1]', 'nvarchar(100)') as 'XmlName',
[myTable].Value('(Personal[name ="age"]/value/text())[1]', 'nvarchar(100)') as 'XmlAge'
FROM [MyTable]
How can I achieve my goal of the following output?
Name | XmlName | XmlAge |
John | John | 20 |
Suzy | Suzanne | 24 |
etc...
First cast the field to the XML type, then use the value() method:
DECLARE @T TABLE (Name nvarchar(23), XmlData nvarchar(max));
INSERT @T VALUES ('John', '<Personal><name>John</name><age>20</age></Personal>');
SELECT Name,
CAST(XmlData AS XML).value('(Personal/name)[1]', 'nvarchar(100)') AS 'XmlName',
CAST(XmlData AS XML).value('(Personal/age)[1]', 'nvarchar(100)') AS 'XmlAge'
FROM @T;
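Since the column is plain nvarchar(max), nothing guarantees that every row holds well-formed XML, and a single bad row makes CAST fail for the whole query. A possible variation, sketched here under the assumption of SQL Server 2012+ (for TRY_CAST), converts once per row in a CROSS APPLY and yields NULL for rows that cannot be converted. The second sample row is made up to show the malformed case.

```sql
DECLARE @T TABLE (Name nvarchar(23), XmlData nvarchar(max));
INSERT @T VALUES
 (N'John',   N'<Personal><name>John</name><age>20</age></Personal>')
,(N'Broken', N'<Personal><name>oops');   -- deliberately malformed (hypothetical row)

SELECT t.Name,
       x.doc.value('(/Personal/name/text())[1]', 'nvarchar(100)') AS XmlName,
       x.doc.value('(/Personal/age/text())[1]',  'int')           AS XmlAge
FROM @T AS t
-- convert once per row; TRY_CAST gives NULL instead of raising an error
CROSS APPLY (SELECT TRY_CAST(t.XmlData AS XML)) AS x(doc);
```

Reading age as int is a choice made for this sketch; keep nvarchar(100) as in the answer above if the values are not reliably numeric.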

Resources