Convert a table with unknown structure into Key/Value - sql-server

The report data we receive from analysts comes in table format with an arbitrary structure. All we know is that each row has a CustomerId column; the other columns are unknown and can vary every time.
The destination system that receives this data only accepts Key/Value format, so we have to convert the report tables into Key/Value.
So if, for instance, the source report table has the following structure:
CREATE TABLE [dbo].[SampleSourceTable](
[CustomerId] [bigint] NULL,
[Column1] [nchar](10) NULL,
[Column2] [int] NULL,
[Column3] [datetime] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[SampleSourceTable] ([CustomerId], [Column1], [Column2], [Column3]) VALUES (1, N'aaa', 123, CAST(N'2019-01-01T00:00:00.000' AS DateTime))
GO
INSERT [dbo].[SampleSourceTable] ([CustomerId], [Column1], [Column2], [Column3]) VALUES (2, N'bbb', 456, CAST(N'2018-01-01T00:00:00.000' AS DateTime))
GO
We would like this data to be converted into the following structure:
CREATE TABLE [dbo].[SampleDestinationTable](
[CustomerId] [bigint] NULL,
[Attribute] [nvarchar](255) NULL,
[Value] [nvarchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column1', N'aaa')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column2', N'123')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column3', N'2019-01-01 00:00:00.000')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column1', N'bbb')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column2', N'456')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column3', N'2018-01-01 00:00:00.000')
GO
The challenge here, however, is that the source report table does not have a fixed structure.
At first, I thought about going through every row using a cursor and then using a nested cursor to go through all the columns in that row. But apparently there is no way of processing a row with an unknown structure using cursors. So for now, I am wondering whether this is possible using PIVOT/UNPIVOT. But then again, I think they also require the column list.
I am running SQL Server 2017.
How do I transform the data with an unknown structure?

One possible approach is to generate a dynamic statement using information from INFORMATION_SCHEMA.COLUMNS:
-- Declarations
DECLARE @stm nvarchar(max)
-- Dynamic part: one "UNION ALL SELECT ..." branch per non-key column
SELECT
    @stm = STUFF((
        SELECT CONCAT(
            N' UNION ALL SELECT CustomerID, ''',
            [COLUMN_NAME],
            N''' AS [Attribute], CONVERT(nvarchar(max), ',
            QUOTENAME([COLUMN_NAME]),
            CASE
                WHEN DATA_TYPE = 'datetime' THEN N', 121'
                -- Add additional conversion rules for other data types
                ELSE N''
            END,
            N') AS [Value]',
            N' FROM [SampleSourceTable]'
        )
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE (TABLE_NAME = 'SampleSourceTable') AND (COLUMN_NAME <> 'CustomerId')
        FOR XML PATH('')
    ), 1, 11, N'')  -- strip the leading ' UNION ALL ' (11 characters)
-- Whole statement and execution
SET @stm = @stm + N' ORDER BY CustomerID'
PRINT @stm
EXEC (@stm)
Output:
CustomerID  Attribute  Value
1           Column1    aaa
1           Column2    123
1           Column3    2019-01-01 00:00:00.000
2           Column3    2018-01-01 00:00:00.000
2           Column2    456
2           Column1    bbb
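For the sample table, the statement that PRINT shows should look roughly like the sketch below. Line breaks are added for readability, and the order of the branches is an assumption, since it follows whatever order INFORMATION_SCHEMA.COLUMNS returns. The setup at the top only recreates the sample table from the question if it is missing.

```sql
-- Setup: the sample table from the question (skipped if it already exists).
IF OBJECT_ID(N'dbo.SampleSourceTable', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.SampleSourceTable (
        CustomerId bigint NULL,
        Column1 nchar(10) NULL,
        Column2 int NULL,
        Column3 datetime NULL
    );
    INSERT INTO dbo.SampleSourceTable (CustomerId, Column1, Column2, Column3)
    VALUES (1, N'aaa', 123, '2019-01-01T00:00:00'),
           (2, N'bbb', 456, '2018-01-01T00:00:00');
END;

-- The generated statement: one branch per non-key column.
-- Only the datetime column gets the extra style argument 121.
SELECT CustomerID, 'Column1' AS [Attribute], CONVERT(nvarchar(max), [Column1]) AS [Value] FROM [SampleSourceTable]
UNION ALL
SELECT CustomerID, 'Column2' AS [Attribute], CONVERT(nvarchar(max), [Column2]) AS [Value] FROM [SampleSourceTable]
UNION ALL
SELECT CustomerID, 'Column3' AS [Attribute], CONVERT(nvarchar(max), [Column3], 121) AS [Value] FROM [SampleSourceTable]
ORDER BY CustomerID;
```

Because the statement orders by CustomerID only, the order of the attributes within one customer is not guaranteed, which is why the rows for customer 2 can come back in a different order than those for customer 1.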

Ah, you opened a second question, I just placed an answer at your first...
So I will use this place to provide the same technique as my other answer, but without any need of dynamically created SQL. Try this out:
DECLARE @xml XML = (SELECT TOP 10 o.object_id, o.* FROM sys.objects o FOR XML RAW, ELEMENTS XSINIL);
SELECT r.value('*[1]/text()[1]', 'nvarchar(max)') AS RowID
      ,c.value('local-name(.)', 'nvarchar(max)') AS ColumnKey
      ,c.value('text()[1]', 'nvarchar(max)') AS ColumnValue
FROM @xml.nodes('/row') A(r)
CROSS APPLY A.r.nodes('*[position()>1]') B(c);
The first column of the set will be returned as RowID. If your ID is not the first column, you can force it into first place the same way I did above with o.object_id. All columns of your result will be returned as EAV (entity/attribute/value) rows.
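Applied to the sample data from the question, the same technique might look like the sketch below. The table variable @src, the alias RowKey and the extra local-name filter are assumptions added for illustration; they are not part of the answer above. The filter drops the CustomerId element so the key is not repeated as an attribute.

```sql
-- Hypothetical stand-in for the report table from the question.
DECLARE @src TABLE (CustomerId bigint, Column1 nchar(10), Column2 int, Column3 datetime);
INSERT INTO @src (CustomerId, Column1, Column2, Column3)
VALUES (1, N'aaa', 123, '2019-01-01T00:00:00'),
       (2, N'bbb', 456, '2018-01-01T00:00:00');

-- Force the key into first place under the (hypothetical) alias RowKey,
-- then serialise every row with all of its columns as child elements.
DECLARE @xml xml = (
    SELECT s.CustomerId AS RowKey, s.*
    FROM @src AS s
    FOR XML RAW, ELEMENTS XSINIL
);

-- One output row per remaining column element of each <row>.
SELECT r.value('*[1]/text()[1]', 'bigint')        AS CustomerId
      ,c.value('local-name(.)', 'nvarchar(255)')  AS [Attribute]
      ,c.value('text()[1]', 'nvarchar(max)')      AS [Value]
FROM @xml.nodes('/row') AS A(r)
CROSS APPLY A.r.nodes('*[position() > 1 and local-name(.) != "CustomerId"]') AS B(c);
```

Values come back in their XML text form, so a datetime appears in ISO 8601 format (with a T between date and time) and nchar values keep their padding. Convert or trim them if the destination expects something else.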
Part of the result
+-------+---------------------+-------------------------+
| RowID | ColumnKey | ColumnValue |
+-------+---------------------+-------------------------+
| 3 | name | sysrscols |
+-------+---------------------+-------------------------+
| 3 | object_id | 3 |
+-------+---------------------+-------------------------+
| 3 | principal_id | NULL |
+-------+---------------------+-------------------------+
| 3 | schema_id | 4 |
+-------+---------------------+-------------------------+
| 3 | parent_object_id | 0 |
+-------+---------------------+-------------------------+
| 3 | type | S |
+-------+---------------------+-------------------------+
| 3 | type_desc | SYSTEM_TABLE |
+-------+---------------------+-------------------------+
| 3 | create_date | 2017-08-22T19:38:02.860 |
+-------+---------------------+-------------------------+
| 3 | modify_date | 2017-08-22T19:38:02.867 |
+-------+---------------------+-------------------------+
| 3 | is_ms_shipped | 1 |
+-------+---------------------+-------------------------+
| 3 | is_published | 0 |
+-------+---------------------+-------------------------+
| 3 | is_schema_published | 0 |
+-------+---------------------+-------------------------+
| 5 | name | sysrowsets |
+-------+---------------------+-------------------------+
| ... more rows ...

Related

How to SQL PIVOT on two columns and with dynamic column names?

I have a key value pair set of rows to associate to a unique identifier (ApplicationId).
The data would look something like this:
| ApplicationId | Key | Value | Date |
| 123 | A | abc | 2020-3-1 14:00:01.000 |
| 123 | B | abd | 2020-3-1 14:00:02.000 |
| 123 | C | abe | 2020-3-1 14:00:03.000 |
| 124 | A | abf | 2020-3-1 14:01:00.000 |
| 124 | D | abg | 2020-3-1 14:01:01.000 |
The end result I'm looking for would be this:
| ApplicationId | A | A_Date | B | B_Date | C | C_Date | D | D_Date |
| 123 | abc | 2020-3-1 14:00:01.000 | abd | 2020-3-1 14:00:02.000 | abe | 2020-3-1 14:00:03.000 | NULL | NULL |
| 124 | abf | 2020-3-1 14:01:00.000 | NULL | NULL | NULL | NULL | abg | 2020-3-1 14:01:01.000 |
The Keys A,B,C,D are unknown so hard coding the column names isn't possible.
Here is something that works with one PIVOT
IF OBJECT_ID('tempdb.dbo.#_BLAH') IS NOT NULL DROP TABLE #_BLAH
SELECT et.[ApplicationId], et.[Key], et.[Value], et.[Date]
INTO #_BLAH
FROM ExampleTbl et
WHERE et.[Date] > DATEADD(dd, -1, GetDate())
DECLARE @_cols AS NVARCHAR(MAX)
DECLARE @_sql AS NVARCHAR(MAX)
SELECT
    @_cols += QUOTENAME([Key]) + ','
FROM
    #_BLAH
GROUP BY
    [Key];
SET @_cols = STUFF((SELECT ',' + QUOTENAME(T.[Key])
    FROM #_BLAH AS T
    GROUP BY T.[Key]
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),1,1,'')
set @_sql = 'SELECT [ApplicationId], ' + @_cols + '
FROM ( SELECT * FROM #_BLAH) AS SRC
PIVOT ( MAX([Value]) FOR [Key] IN (' + @_cols + ') ) AS p';
EXEC(@_sql)
So far I have been unable to find an example or article that adds a second dynamic column holding the date that relates to each specific Key in my example.
My SQL above creates the row I want, except for the per-key _Date columns (A_Date, B_Date, and so on) that I need.
Try this:
DROP TABLE IF EXISTS #DataSource;
DROP TABLE IF EXISTS #DataSourcePrepared;
CREATE TABLE #DataSource
(
[ApplicationId] INT
,[Key] CHAR(1)
,[Value] VARCHAR(12)
,[Date] DATETIME2(0)
);
INSERT INTO #DataSource ([ApplicationId], [Key], [Value], [Date])
VALUES (123, 'A', 'abc', '2020-3-1 14:00:01.000')
,(123, 'B', 'abd', '2020-3-1 14:00:02.000')
,(123, 'C', 'abe', '2020-3-1 14:00:03.000')
,(124, 'A', 'abf', '2020-3-1 14:01:00.000')
,(124, 'D', 'abg', '2020-3-1 14:01:01.000');
CREATE TABLE #DataSourcePrepared
(
[ApplicationId] INT
,[ColumnName] VARCHAR(32)
,[Value] VARCHAR(32)
)
INSERT INTO #DataSourcePrepared ([ApplicationId], [ColumnName], [Value])
SELECT [ApplicationId]
,[Key]
,[value]
FROM #DataSource
UNION ALL
SELECT [ApplicationId]
,[Key] + '_Date'
,CONVERT(VARCHAR(19), [Date], 121)
FROM #DataSource;
DECLARE @DynamicTSQLStatement NVARCHAR(MAX)
       ,@DynamicColumns NVARCHAR(MAX);
SET @DynamicColumns = STUFF
(
    (
        SELECT ',' + QUOTENAME([ColumnName])
        FROM #DataSourcePrepared
        GROUP BY [ColumnName]
        ORDER BY [ColumnName]
        FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)')
    ,1
    ,1
    ,''
);
SET @DynamicTSQLStatement = N'
SELECT *
FROM #DataSourcePrepared
PIVOT
(
MAX([value]) FOR [ColumnName] IN (' + @DynamicColumns +')
) PVT;';
EXECUTE sp_executesql @DynamicTSQLStatement;
You just need to prepare the data before the actual PIVOT. Also, note that I order the columns by name when building the dynamic part. In your real case, you may want to replace this with a more elaborate ordering rule.
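As an illustration of a different ordering rule, the sketch below orders the keys by the earliest date on which they appear and keeps each key next to its _Date column. The temp table #KeyDates and its rows are hypothetical; in the answer above the same idea would be applied to #DataSource.

```sql
-- Hypothetical sample: key D appears first in time, then A, then B.
DROP TABLE IF EXISTS #KeyDates;
CREATE TABLE #KeyDates ([Key] CHAR(1), [Date] DATETIME2(0));
INSERT INTO #KeyDates ([Key], [Date])
VALUES ('A', '2020-03-01 14:00:01')
      ,('B', '2020-03-01 14:00:02')
      ,('D', '2020-03-01 13:00:00')
      ,('A', '2020-03-01 15:00:00');

DECLARE @DynamicColumns NVARCHAR(MAX);

-- Emit "[Key],[Key_Date]" pairs, ordered by first appearance instead of by name.
SET @DynamicColumns = STUFF
(
    (
        SELECT ',' + QUOTENAME([Key]) + ',' + QUOTENAME([Key] + '_Date')
        FROM #KeyDates
        GROUP BY [Key]
        ORDER BY MIN([Date]), [Key]
        FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)')
    ,1
    ,1
    ,''
);

SELECT @DynamicColumns AS DynamicColumns;  -- [D],[D_Date],[A],[A_Date],[B],[B_Date]

DROP TABLE #KeyDates;
```

The resulting string can be used in the IN (...) list of the PIVOT in place of the alphabetically ordered one.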
You can try this:
DECLARE @_cols AS NVARCHAR(MAX) = ''
DECLARE @_sql AS NVARCHAR(MAX)
SELECT
    @_cols += ',' + QUOTENAME([Key]) + ',' + QUOTENAME([Key] + '_Date')
FROM
    (SELECT DISTINCT [Key] FROM ExampleTbl) T
SET @_cols = STUFF(@_cols,1,1,'')
set @_sql = 'SELECT * FROM (
SELECT ApplicationId, [Key], Value FROM ExampleTbl
UNION ALL
SELECT ApplicationId, [Key] + ''_Date'' AS [Key], CONVERT(VARCHAR(30), [Date],121 ) AS Value FROM ExampleTbl
) SRC
PIVOT (MAX(Value) FOR [Key] IN (' + @_cols + ' )) AS PVT';
EXEC(@_sql)
Result:
| ApplicationId | A | A_Date | B | B_Date | C | C_Date | D | D_Date |
| 123 | abc | 2020-03-01 14:00:01.000 | abd | 2020-03-01 14:00:02.000 | abe | 2020-03-01 14:00:03.000 | NULL | NULL |
| 124 | abf | 2020-03-01 14:01:00.000 | NULL | NULL | NULL | NULL | abg | 2020-03-01 14:01:01.000 |

Convert one column into one row in T-SQL

I have a select statement which yields a single large column. What I would like to do is convert the one column into one row so that I can feed it into another stored procedure. How could I transpose the table so that it is one single row? So far, I have tried UNPIVOT but have not been able to get it working.
Here is the format of the current table (in reality, it's much longer with a variable amount of rows):
+------+
| Col1 |
+------+
| 1 |
| 56 |
| 83 |
| 345 |
| 4322 |
| 4456 |
+------+
which is stored in a local table #localtable
I would like to turn the above table into the below table:
+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | col5 | col6 |
+------+------+------+------+------+------+
| 1 | 56 | 83 | 345 | 4322 | 4456 |
+------+------+------+------+------+------+
as an intermediate step in converting it into the following comma-delimited string:
'1, 56, 83, 345, 4322, 4456'
With the goal of feeding it into an exec like:
exec myfunction '1, 56, 83, 345, 4322, 4456'
I was able to solve this with the following query, creating my desired comma-delimited string:
SELECT STUFF((SELECT ',' + CAST(Col1 AS VARCHAR(50))
FROM #localtable
FOR XML PATH('')), 1, 1, '') AS listStr
You can use STRING_AGG() in SQL Server 2017 and later:
select string_agg(Col1, ', ')
from #localtable;
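STRING_AGG on its own does not promise any particular order. A minimal sketch that fixes the order with WITHIN GROUP and stores the list in a variable for the procedure call could look like this. The temp table #localtable_demo is a hypothetical stand-in for the table in the question, and the final call is commented out because myfunction only exists in the asker's database.

```sql
-- Hypothetical stand-in for the single-column table from the question.
DROP TABLE IF EXISTS #localtable_demo;
CREATE TABLE #localtable_demo (Col1 int);
INSERT INTO #localtable_demo (Col1)
VALUES (1), (56), (83), (345), (4322), (4456);

DECLARE @list varchar(max);

-- Cast to varchar(max) so the result is not limited to 8000 bytes;
-- WITHIN GROUP makes the order of the items deterministic.
SELECT @list = STRING_AGG(CAST(Col1 AS varchar(max)), ', ') WITHIN GROUP (ORDER BY Col1)
FROM #localtable_demo;

SELECT @list AS listStr;  -- 1, 56, 83, 345, 4322, 4456

-- EXEC myfunction @list;  -- the procedure from the question, not defined here

DROP TABLE #localtable_demo;
```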
Here is another solution that will work with older SQL Server version (where no string_agg is available):
--create test table
declare @tmp table (Col1 int)
--populate test table
insert into @tmp
values
( 1)
,( 56)
,( 83)
,( 345)
,( 4322)
,( 4456)
--declare a variable that will contain the final result
declare @result varchar(4000) = ''
--populate final result from table
select @result = @result + ', ' + cast(Col1 as varchar) from @tmp
--remove the leading separator (comma and space)
set @result = STUFF(@result, 1, 2, '')
--print result
select @result as result
Result:
1, 56, 83, 345, 4322, 4456

How to copy a tree view's rows within the same table with new IDs, a different type, and remapped parent_id

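I want to copy the rows of the table, set the type column to B on the copies, let the Id auto-increment, and make each copied parent_id point to the copied parent. The rows with type A below are the existing data, and the rows with type B are the copies I want to end up with: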
Table = Menu
Id | parent_id | order | section | name | url | type
100 | NULL | 7 | web | Tasks | ~/en/Tasks | A
102 | 100 | 1 | web | Pages | ~/en/Pages | A
103 | 100 | 4 | web | Category | ~/en/Category | A
104 | NULL | 3 | web | DLM | ~/en/DLM | A
105 | 104 | 6 | web | ONS | ~/en/ONS | A
106 | 104 | 2 | web | HBO | ~/en/HBO | A
107 | NULL | 7 | web | Tasks | ~/en/Tasks | B
108 | 107 | 1 | web | Pages | ~/en/Pages | B
109 | 107 | 4 | web | Category | ~/en/Category | B
110 | NULL | 3 | web | DLM | ~/en/DLM | B
111 | 110 | 6 | web | ONS | ~/en/ONS | B
112 | 110 | 2 | web | HBO | ~/en/HBO | B
This probably isn't the most efficient, but it gets the job done. It assumes that name is unique. I left out columns unnecessary to the example. Also, you can't put a variable into the IDENTITY clause, so the CREATE TABLE needs to be built as dynamic SQL and executed:
IF OBJECT_ID (N'paths', N'U') IS NOT NULL
DROP TABLE paths
IF OBJECT_ID (N'new_paths', N'U') IS NOT NULL
DROP TABLE new_paths
CREATE TABLE paths (
id INT,
parent_id INT,
name NVARCHAR(20)
)
INSERT INTO dbo.paths
(id,parent_id,name)
VALUES
(100, NULL, 'Tasks'),
(102, 100, 'Pages'),
(103, 100, 'Category'),
(104, NULL, 'DLM'),
(105, 104, 'ONS'),
(106, 104, 'HBO')
DECLARE @start_value INT
SET @start_value = (SELECT MAX(id) FROM paths) + 1
DECLARE @sql nvarchar(1000)
SET @sql = N'
CREATE TABLE new_paths (
id INT IDENTITY(' + CAST(@start_value AS nvarchar) + ',1),
parent_id INT,
name NVARCHAR(20)
)
'
EXEC sp_executesql @stmt = @sql
INSERT INTO new_paths (parent_id,name)
SELECT Parent_id, name FROM dbo.paths
;WITH mappings AS (
SELECT n.*, p.id AS old_id
FROM new_paths n
INNER JOIN paths p
ON p.name = n.name
)
UPDATE n
SET n.parent_id = m.id
FROM new_paths n
INNER JOIN mappings m
ON m.old_id = n.parent_id
--SELECT * FROM new_paths
Please see the approach below. I have added some explanation in the code comments; ask in the comments if something is unclear.
EDITED to handle GUID ids (as per the comment).
-- declare table var that holds the copies (type forced to 'B')
declare @table table ([Increment] int identity(1,1), Id uniqueidentifier, [parent_id] nvarchar(50), [order] int, [section] nvarchar(50), [name] nvarchar(50), [url] nvarchar(50), [type] nvarchar(50))
-- table var that captures the Id generated for a newly inserted parent row
-- (assumes your_table.Id has a default such as NEWID())
declare @new_parent table (Id uniqueidentifier)
-- insert values into this table
-- (assumes each parent row is read before its children and the tree is two levels deep)
insert into @table
select [Id],
[parent_id],
[order],
[section],
[name],
[url],
'B'
from your_table
where [type] = 'A'
-- loop over the table var
declare @max_temp int = (select max(Increment) from @table)
declare @curr int = 1
declare @parent_value uniqueidentifier = null
while (@curr <= @max_temp)
begin
    -- do different inserts depending on the parent_id value of the current row
    if (select parent_id from @table where Increment = @curr) is null
    begin
        -- root row: insert the copy and capture its generated Id
        delete from @new_parent
        insert into your_table ([parent_id], [order], [section], [name], [url], [type])
        output inserted.Id into @new_parent (Id)
        select
        [parent_id],
        [order],
        [section],
        [name],
        [url],
        [type]
        from @table
        where Increment = @curr
        -- set below var, it will be used in next insert where parent_id is not null
        set @parent_value = (select Id from @new_parent)
    end
    else
    begin
        -- child row: point the copy at the most recently copied parent
        insert into your_table ([parent_id], [order], [section], [name], [url], [type])
        select
        isnull(@parent_value, [parent_id]),
        [order],
        [section],
        [name],
        [url],
        [type]
        from @table
        where Increment = @curr
    end
    -- update current
    set @curr = @curr + 1
end
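A set-based alternative, which is not taken from either answer above, is to let MERGE capture the mapping between old and new ids while it inserts the copies, and then remap parent_id in one UPDATE. The sketch below assumes an integer IDENTITY id; the temp table #MenuDemo and the table variable @map are hypothetical names, and only a few of the question's columns are included.

```sql
-- Hypothetical cut-down version of the Menu table from the question.
CREATE TABLE #MenuDemo (
    id        INT IDENTITY(100, 1) PRIMARY KEY,
    parent_id INT NULL,
    name      NVARCHAR(20),
    [type]    CHAR(1)
);

SET IDENTITY_INSERT #MenuDemo ON;
INSERT INTO #MenuDemo (id, parent_id, name, [type])
VALUES (100, NULL, N'Tasks', 'A'),
       (102, 100,  N'Pages', 'A'),
       (103, 100,  N'Category', 'A'),
       (104, NULL, N'DLM', 'A'),
       (105, 104,  N'ONS', 'A'),
       (106, 104,  N'HBO', 'A');
SET IDENTITY_INSERT #MenuDemo OFF;

DECLARE @map TABLE (old_id INT, new_id INT);

-- ON 1 = 0 never matches, so every source row is inserted as a copy.
-- Unlike INSERT ... OUTPUT, MERGE can return source columns, which gives
-- the old id next to the newly generated id.
MERGE INTO #MenuDemo AS tgt
USING (SELECT id, parent_id, name FROM #MenuDemo WHERE [type] = 'A') AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (parent_id, name, [type])
    VALUES (src.parent_id, src.name, 'B')
OUTPUT src.id, inserted.id INTO @map (old_id, new_id);

-- The copies still point at the old parents; remap them to the copied parents.
UPDATE m
SET m.parent_id = map.new_id
FROM #MenuDemo AS m
INNER JOIN @map AS map
    ON map.old_id = m.parent_id
WHERE m.[type] = 'B';

SELECT id, parent_id, name, [type] FROM #MenuDemo ORDER BY id;

DROP TABLE #MenuDemo;
```

This does not rely on name being unique and works for trees of any depth, because the mapping is keyed on the old id. The new ids are not guaranteed to follow the order of the original ids.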

Compare two tables using field map

I need to compare the values between tables in two SQL Server databases. The fieldnames in the tables in one database don't match the fieldnames in the tables in the second database. I have a link table that has the matching table names and the matching field names mapped.
Table1:
| Tab1_ID | Field1 | Field2 | Field3 |
|---------|--------|--------|--------|
| 1 | One | Two | Three |
| 2 | Two | Two | One |
| 3 | Three | Two | Two |
| 4 | Two | One | One |
Table2:
| Tab2_ID | Field_1 | Field_2 | Field_3 |
|---------|---------|---------|---------|
| 1 | One | Two | Three |
| 2 | Two | Five | One |
| 3 | Three | Two | Two |
| 4 | Two | One | Six |
Link Table:
| LinkTab_ID | Tab1 | Tab2 | Tab1Fld | Tab2Fld |
|------------|--------|--------|---------|--------------|
| 100 | Table1 | Table2 | Field1 | Field_1 |
| 105 | Table1 | Table2 | Field2 | Field_2 |
| 110 | Table1 | Table2 | Field3 | Field_3 |
| 124 | Table1 | Table4 | Field1 | Fieldname_01 |
| 166 | Table3 | Table5 | F3 | FN_3 |
Is it possible to use the Link Table to somehow specify the field names to compare between the two tables?
Typically I'd do something like:
SELECT
*
FROM
Table1 INNER JOIN Table2 ON Tab1_ID = Tab2_ID
WHERE
Table1.Field1 != Table2.Field_1
OR Table1.Field2 != Table2.Field_2
However I have many tables and many fields and the fieldnames change (i.e. new fields). My one constant is that the two are mapped in the link table.
The tables are one-to-one and fields are one-to-one.
This is only an approach; you will need to expand it to suit. In particular, you need a method for handling the join predicate(s).
Also see this SQL Fiddle:
CREATE TABLE LinkTable
([LinkTab_ID] int, [Tab1] varchar(6), [Tab2] varchar(6), [Tab1Fld] varchar(6), [Tab2Fld] varchar(12))
;
INSERT INTO LinkTable
([LinkTab_ID], [Tab1], [Tab2], [Tab1Fld], [Tab2Fld])
VALUES
(100, 'Table1', 'Table2', 'Field1', 'Field_1'),
(105, 'Table1', 'Table2', 'Field2', 'Field_2'),
(110, 'Table1', 'Table2', 'Field3', 'Field_3'),
(124, 'Table1', 'Table4', 'Field1', 'Fieldname_01'),
(166, 'Table3', 'Table5', 'F3', 'FN_3')
;
CREATE TABLE Table1
([Tab1_ID] int, [Field1] varchar(5), [Field2] varchar(3), [Field3] varchar(5))
;
INSERT INTO Table1
([Tab1_ID], [Field1], [Field2], [Field3])
VALUES
(1, 'One', 'Two', 'Three'),
(2, 'Two', 'Two', 'One'),
(3, 'Three', 'Two', 'Two'),
(4, 'Two', 'One', 'One')
;
CREATE TABLE Table2
([Tab2_ID] int, [Field_1] varchar(9), [Field_2] varchar(9), [Field_3] varchar(9))
;
INSERT INTO Table2
([Tab2_ID], [Field_1], [Field_2], [Field_3])
VALUES
('1', 'One', 'Two', 'Three'),
('2', 'Two', 'Five', 'One'),
('3', 'Three', 'Two', 'Two'),
('4', 'Two', 'One', 'Six')
;
Query 1:
DECLARE @t1 AS NVARCHAR(30) = 'Table1'
DECLARE @t2 AS NVARCHAR(30) = 'Table2'
DECLARE @filter AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)
SET @filter = STUFF((SELECT ' OR ' + concat(Tab1, '.', Tab1Fld, ' <> ', Tab2, '.', Tab2Fld)
    FROM LinkTable
    WHERE Tab1 = @t1 AND Tab2 = @t2
    FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)')
    ,1,4,'')
SET @query = 'SELECT * FROM '
    + @t1
    + ' INNER JOIN '
    + @t2
    + ' ON '
    + @t1
    + '.Tab1_ID = '
    + @t2
    + '.Tab2_ID'
    + ' WHERE ' + @filter
select @query
--execute(@query)
Results:
| SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Tab1_ID = Table2.Tab2_ID WHERE Table1.Field1 <> Table2.Field_1 OR Table1.Field2 <> Table2.Field_2 OR Table1.Field3 <> Table2.Field_3 |
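Two refinements may be worth considering when expanding this, neither of which is part of the answer above: wrapping the names from the link table in QUOTENAME so unusual names cannot break the generated statement, and using a NULL-safe comparison, since a <> b is not true when either side is NULL. The sketch below builds only the filter; the temp table #LinkTable is a hypothetical copy of the relevant link rows.

```sql
-- Hypothetical temp copy of the relevant rows of the link table.
CREATE TABLE #LinkTable (Tab1 sysname, Tab2 sysname, Tab1Fld sysname, Tab2Fld sysname);
INSERT INTO #LinkTable (Tab1, Tab2, Tab1Fld, Tab2Fld)
VALUES (N'Table1', N'Table2', N'Field1', N'Field_1'),
       (N'Table1', N'Table2', N'Field2', N'Field_2'),
       (N'Table1', N'Table2', N'Field3', N'Field_3');

DECLARE @t1 sysname = N'Table1';
DECLARE @t2 sysname = N'Table2';
DECLARE @filter nvarchar(max);

-- NOT EXISTS (SELECT a INTERSECT SELECT b) is true when a and b differ
-- and treats two NULLs as equal, unlike a <> b.
SET @filter = STUFF((
    SELECT N' OR NOT EXISTS (SELECT ' + QUOTENAME(Tab1) + N'.' + QUOTENAME(Tab1Fld)
         + N' INTERSECT SELECT ' + QUOTENAME(Tab2) + N'.' + QUOTENAME(Tab2Fld) + N')'
    FROM #LinkTable
    WHERE Tab1 = @t1 AND Tab2 = @t2
    FOR XML PATH(''), TYPE
    ).value('.', 'nvarchar(max)')
    , 1, 4, N'');  -- strip the leading ' OR '

SELECT @filter AS filter_predicate;

DROP TABLE #LinkTable;
```

The resulting predicate can be appended after WHERE in the generated query in the same way as the original filter.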

Need help pivoting some data

I'm hoping someone can help me. I'm trying to pivot some data on SQL Server 2005 and can't quite get the results I'm looking for.
This is my current table schema:
| ProductCode | AttributeName | AttributeValue |
| 1 | AttributeA | 10 |
| 1 | AttributeB | 20 |
| 2 | AttributeA | 30 |
| 2 | AttributeB | 40 |
| 3 | AttributeA | 50 |
These are the results I'm trying to achieve:
| ProductCode | AttributeA | AttributeB |
| 1 | 10 | 20 |
| 2 | 30 | 40 |
| 3 | 50 | NULL |
I know that I can achieve this result with the following SQL:
SELECT DISTINCT ProductCode,
    (SELECT AttributeValue
     FROM attributes
     WHERE AttributeName = 'AttributeA' AND ProductCode=a.ProductCode) AttributeA,
    (SELECT AttributeValue
     FROM attributes
     WHERE AttributeName = 'AttributeB' AND ProductCode=a.ProductCode) AttributeB
FROM attributes a
Although that SQL does produce the result I'm looking for, it's obviously not dynamic (in reality, I not only have more Attribute Types, but different products have different sets of attributes) and it also scans the table 3 times. It's also a maintenance nightmare.
I tried using the PIVOT functionality of SQL Server, but with no luck.
Can anyone help?
create table #attributes (ProductCode int,
AttributeName varchar(20),
AttributeValue int)
insert into #attributes values (1, 'AttributeA', 10)
insert into #attributes values (1, 'AttributeB', 20)
insert into #attributes values (2, 'AttributeA', 30)
insert into #attributes values (2, 'AttributeB', 40)
insert into #attributes values (3, 'AttributeA', 50)
declare @attributes_columns nvarchar(max)
set @attributes_columns
= (
    select ', [' + AttributeName + ']'
    from
    (
        select distinct AttributeName as AttributeName
        from #attributes
    ) t
    order by t.AttributeName
    for xml path('')
)
set @attributes_columns = stuff(@attributes_columns,1,2,'')
declare @sql nvarchar(max)
set @sql = N'
select ProductCode, <attributes_columns>
from
(select ProductCode, AttributeName, AttributeValue
from #attributes )p
pivot
(
sum(AttributeValue) for AttributeName in (<attributes_columns>)
) as pvt
'
set @sql = replace(@sql, '<attributes_columns>', @attributes_columns)
print @sql
exec sp_executesql @sql
drop table #attributes
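For the sample data, the placeholder is replaced with [AttributeA], [AttributeB], so the statement that gets printed and executed is equivalent to the static PIVOT sketched below. The temp table name #attributes_demo and the ORDER BY are additions for this illustration.

```sql
-- Hypothetical copy of the sample data so the sketch runs on its own.
create table #attributes_demo (ProductCode int,
                               AttributeName varchar(20),
                               AttributeValue int)
insert into #attributes_demo values (1, 'AttributeA', 10)
insert into #attributes_demo values (1, 'AttributeB', 20)
insert into #attributes_demo values (2, 'AttributeA', 30)
insert into #attributes_demo values (2, 'AttributeB', 40)
insert into #attributes_demo values (3, 'AttributeA', 50)

-- Static form of the generated statement: the column list appears twice,
-- once in the select list and once in the IN (...) list of the pivot.
select ProductCode, [AttributeA], [AttributeB]
from
(select ProductCode, AttributeName, AttributeValue
 from #attributes_demo) p
pivot
(
    sum(AttributeValue) for AttributeName in ([AttributeA], [AttributeB])
) as pvt
order by ProductCode

drop table #attributes_demo
```

Product 3 has no AttributeB row, so its AttributeB column comes back as NULL, matching the result table in the question.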

Resources