I have some genomic data in SQL Server 2016 Express. It is currently in a long format, with a reference genome and test genomes split by SubjectID, gene, and codon (i.e. a 3-tuple of nucleotides).
What I really need is to reshape the data so that the tuples are concatenated together, but ONLY when there is a mutation (as compared to the reference genome) in the tuple. This will be a more usable format for everyone.
My data looks like this:
DECLARE @myTable TABLE
(
    SubjectID VARCHAR(MAX),
    country VARCHAR(MAX),
    gene VARCHAR(MAX),
    position INT,
    ReferenceNucleotide VARCHAR(1),
    TestNucleotide VARCHAR(1),
    codon INT,
    nucleotide_order INT
)
INSERT INTO @myTable
VALUES
('1-0002','India','gyrA', 65,'A','x', 92,1),
('1-0002','India','gyrA', 66,'T','x', 92,2),
('1-0002','India','gyrA', 67,'C','C', 92,3),
('1-0002','India','gyrA', 68,'T','T', 93,1),
('1-0002','India','gyrA', 69,'A','A', 93,2),
('1-0002','India','gyrA', 70,'C','C', 93,3),
('1-0002','India','gyrA', 71,'G','G', 94,1),
('1-0002','India','gyrA', 72,'A','A', 94,2),
('1-0002','India','gyrA', 73,'C','C', 94,3),
('1-0002','India','gyrA', 74,'A','A', 95,1),
('1-0002','India','gyrA', 75,'G','C', 95,2),
('1-0002','India','gyrA', 76,'C','C', 95,3),
('1-0002','India','gyrA', 77,'C','C', 96,1),
('1-0002','India','gyrA', 78,'T','T', 96,2),
('1-0002','India','gyrA', 79,'G','N', 96,3)
However, there are a few conditions:
If all three nucleotides are the same for the reference and test genome, I want 'WT'.
If there is any difference in nucleotides, I want the 3-tuple from the test genome (in nucleotide order).
I need to group by SubjectID and gene because I have lots of subjects and genes.
My result would look like this:
1-0002 India gyrA 92 xxC
1-0002 India gyrA 93 WT
1-0002 India gyrA 94 WT
1-0002 India gyrA 95 ACC
1-0002 India gyrA 96 CTN
I can identify the codons that need the 3-tuple, but I am struggling with how to concatenate them:
DECLARE @myCodons TABLE (SubjectID varchar(max), country varchar(max), gene varchar(max), codon int, WT int)
INSERT INTO @myCodons
SELECT
    SubjectID, country, gene, codon,
    SUM(CASE WHEN ReferenceNucleotide = TestNucleotide THEN 0 ELSE 1 END) AS WT
FROM
    @myTable
GROUP BY
    SubjectID, country, gene, codon
SELECT *
FROM @myCodons
ORDER BY codon
Start with something like:
select T1.SubjectID, T1.country, T1.gene, T1.codon,
    T1.ReferenceNucleotide + T2.ReferenceNucleotide + T3.ReferenceNucleotide RefGenome,
    T1.TestNucleotide + T2.TestNucleotide + T3.TestNucleotide TestGenome
from @myTable T1
inner join @myTable T2 on T1.SubjectID = T2.SubjectID and T1.country = T2.country
    and T1.gene = T2.gene and T1.codon = T2.codon and T2.nucleotide_order = 2
inner join @myTable T3 on T1.SubjectID = T3.SubjectID and T1.country = T3.country
    and T1.gene = T3.gene and T1.codon = T3.codon and T3.nucleotide_order = 3
where T1.nucleotide_order = 1
You can then build on this with a CASE expression to work out whether to display the test genome or 'WT'.
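As a rough sketch of that last step (an illustration, not part of the original answer), the CASE expression can compare all three positions and fall back to the concatenated test nucleotides. It assumes the column names from the question's table definition (ReferenceNucleotide, TestNucleotide) and uses a shortened two-codon sample; the output column name codon_result is made up.

```sql
-- Shortened sample (codons 92 and 93) using the question's column names
DECLARE @myTable TABLE
(
    SubjectID           VARCHAR(20),
    country             VARCHAR(20),
    gene                VARCHAR(20),
    ReferenceNucleotide VARCHAR(1),
    TestNucleotide      VARCHAR(1),
    codon               INT,
    nucleotide_order    INT
);

INSERT INTO @myTable VALUES
('1-0002','India','gyrA','A','x',92,1),
('1-0002','India','gyrA','T','x',92,2),
('1-0002','India','gyrA','C','C',92,3),
('1-0002','India','gyrA','T','T',93,1),
('1-0002','India','gyrA','A','A',93,2),
('1-0002','India','gyrA','C','C',93,3);

-- One row per codon: 'WT' when every position matches the reference,
-- otherwise the three test nucleotides in order
SELECT T1.SubjectID, T1.country, T1.gene, T1.codon,
       CASE
           WHEN T1.ReferenceNucleotide = T1.TestNucleotide
            AND T2.ReferenceNucleotide = T2.TestNucleotide
            AND T3.ReferenceNucleotide = T3.TestNucleotide
           THEN 'WT'
           ELSE T1.TestNucleotide + T2.TestNucleotide + T3.TestNucleotide
       END AS codon_result
FROM @myTable T1
INNER JOIN @myTable T2
        ON T1.SubjectID = T2.SubjectID AND T1.country = T2.country
       AND T1.gene = T2.gene AND T1.codon = T2.codon
       AND T2.nucleotide_order = 2
INNER JOIN @myTable T3
        ON T1.SubjectID = T3.SubjectID AND T1.country = T3.country
       AND T1.gene = T3.gene AND T1.codon = T3.codon
       AND T3.nucleotide_order = 3
WHERE T1.nucleotide_order = 1
ORDER BY T1.codon;
-- codon 92 gives 'xxC', codon 93 gives 'WT'
```

Note that the inner joins drop any codon that does not have all three positions present, which may or may not be what is wanted for incomplete data.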
An alternative approach which might be of use:
select t1.SubjectID, t1.country, t1.gene, t1.codon,
    case when ca.RefGenomeStr = ca.TestGenomeStr then 'WT' else ca.TestGenomeStr end wanted_string
from @myTable t1
cross apply (
    select
        stuff((
            select ', ' + t2.ReferenceNucleotide
            from @myTable t2
            where t2.SubjectID = t1.SubjectID and t2.country = t1.country and t2.gene = t1.gene and t2.codon = t1.codon
            order by t2.nucleotide_order
            for xml path('')
        ), 1, 2, ''),
        stuff((
            select ', ' + t2.TestNucleotide
            from @myTable t2
            where t2.SubjectID = t1.SubjectID and t2.country = t1.country and t2.gene = t1.gene and t2.codon = t1.codon
            order by t2.nucleotide_order
            for xml path('')
        ), 1, 2, '')
) ca (RefGenomeStr, TestGenomeStr)
where t1.nucleotide_order = 1
Result:
+----+-----------+---------+------+-------+---------------+
| | SubjectID | country | gene | codon | wanted_string |
+----+-----------+---------+------+-------+---------------+
| 1 | 1-0002 | India | gyrA | 92 | x, x, C |
| 2 | 1-0002 | India | gyrA | 93 | WT |
| 3 | 1-0002 | India | gyrA | 94 | WT |
| 4 | 1-0002 | India | gyrA | 95 | A, C, C |
| 5 | 1-0002 | India | gyrA | 96 | C, T, N |
+----+-----------+---------+------+-------+---------------+
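If the exact format from the question is wanted (xxC rather than x, x, C), one possible variation is to concatenate with an empty separator and choose between 'WT' and the tuple from a per-codon mutation count, as the question already computes. This is only a sketch: it assumes the question's column names and a trimmed-down sample, the names mutations and codon_result are made up, and it uses FOR XML PATH because STRING_AGG is not available before SQL Server 2017.

```sql
-- Trimmed sample (codons 94 and 95) using the question's column names
DECLARE @myTable TABLE
(
    SubjectID           VARCHAR(20),
    country             VARCHAR(20),
    gene                VARCHAR(20),
    ReferenceNucleotide VARCHAR(1),
    TestNucleotide      VARCHAR(1),
    codon               INT,
    nucleotide_order    INT
);

INSERT INTO @myTable VALUES
('1-0002','India','gyrA','G','G',94,1),
('1-0002','India','gyrA','A','A',94,2),
('1-0002','India','gyrA','C','C',94,3),
('1-0002','India','gyrA','A','A',95,1),
('1-0002','India','gyrA','G','C',95,2),
('1-0002','India','gyrA','C','C',95,3);

SELECT g.SubjectID, g.country, g.gene, g.codon,
       CASE
           WHEN g.mutations = 0 THEN 'WT'
           ELSE (
               -- Unnamed column + PATH('') yields plain concatenated text
               SELECT '' + t.TestNucleotide
               FROM @myTable t
               WHERE t.SubjectID = g.SubjectID
                 AND t.gene = g.gene
                 AND t.codon = g.codon
               ORDER BY t.nucleotide_order
               FOR XML PATH(''), TYPE
           ).value('.', 'VARCHAR(3)')
       END AS codon_result
FROM (
    -- Count positions that differ from the reference, per codon
    SELECT SubjectID, country, gene, codon,
           SUM(CASE WHEN ReferenceNucleotide = TestNucleotide THEN 0 ELSE 1 END) AS mutations
    FROM @myTable
    GROUP BY SubjectID, country, gene, codon
) g
ORDER BY g.codon;
-- codon 94 gives 'WT', codon 95 gives 'ACC'
```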
I have text that includes the # (sharp) character. The string contains parameters, and each parameter begins and ends with #.
declare @TEXT varchar(200) = 'Dear #NAMEOFGUEST# , we glad to see you SOMEHOTEL tomorrow.'
declare @scanChar char(1)='#'
select
SUBSTRING(@TEXT, CHARINDEX(@scanChar, @TEXT) + 1, (((LEN(@TEXT)) - CHARINDEX(@scanChar, REVERSE(@TEXT))) - CHARINDEX(@scanChar, @TEXT)))
Return:
NAMEOFGUEST
It's the correct result.
This works when the string contains only one parameter, #NAMEOFGUEST#. If we also wrap SOMEHOTEL in # characters, as #SOMEHOTEL#, the result is not what we want.
declare @TEXT varchar(200) = 'Dear #NAMEOFGUEST# , we glad to see you #SOMEHOTEL# tomorrow.'
declare @scanChar char(1)='#'
Returns:
NAMEOFGUEST# , we glad to see you #SOMEHOTEL
I want the same result as before: NAMEOFGUEST only.
Using CHARINDEX(@FindString, @PrintData, CHARINDEX(@FindString, @PrintData) + 1) you can find the second occurrence of the #; the rest of the calculation can then be based on that position.
The following query will work.
DECLARE @PrintData AS VARCHAR (200) = 'Dear #NAMEOFGUEST# , we glad to see you #SOMEHOTEL# tomorrow.';
DECLARE @FindString AS CHAR (1) = '#';
DECLARE @LenFindString AS INT = LEN(@FindString);
SELECT SUBSTRING(@PrintData,
    CHARINDEX(@FindString, @PrintData) + @LenFindString,
    CHARINDEX(@FindString, @PrintData, CHARINDEX(@FindString, @PrintData) + 1) - (CHARINDEX(@FindString, @PrintData) + @LenFindString)
);
Demo on db<>fiddle
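One caveat with this calculation: if the text has no second delimiter, the inner CHARINDEX returns 0 and the computed length becomes negative, which makes SUBSTRING raise an error. A minimal guarded sketch follows; the variable names @FirstPos and @SecondPos, the column name FirstParameter, and the sample string are made up for illustration.

```sql
-- Sample with an opening '#' but no closing one
DECLARE @PrintData AS VARCHAR(200) = 'Dear #NAMEOFGUEST , we glad to see you tomorrow.';
DECLARE @FindString AS CHAR(1) = '#';

-- Position of the first delimiter (0 when absent)
DECLARE @FirstPos AS INT = CHARINDEX(@FindString, @PrintData);

-- Position of the second delimiter, searched only after the first one
DECLARE @SecondPos AS INT =
    CASE WHEN @FirstPos > 0
         THEN CHARINDEX(@FindString, @PrintData, @FirstPos + 1)
         ELSE 0
    END;

-- Extract only when both delimiters exist; otherwise the CASE yields NULL
SELECT CASE WHEN @FirstPos > 0 AND @SecondPos > 0
            THEN SUBSTRING(@PrintData, @FirstPos + 1, @SecondPos - @FirstPos - 1)
       END AS FirstParameter;
```

With the sample above the result is NULL; with a properly delimited string such as 'Dear #NAMEOFGUEST# ...' it returns NAMEOFGUEST.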
You can use a recursive approach like this:
A mockup-table to simulate a set oriented scenario
declare @tbl TABLE(ID INT IDENTITY, SomeComment VARCHAR(100),SomeString varchar(200));
INSERT INTO @tbl VALUES('3 Terms','Dear #NAMEOFGUEST# , we glad to see you #SOMEHOTEL# tomorrow. And even #AThirdOne# is here.')
,('1 Term','Dear #NAMEOFGUEST# , we glad to see you soon.')
,('No Term','Dear Guest, nice to see you.')
,('invalid 1','Dear Guest, nice to #see you.')
,('invalid ?','Dear #Guest, nice# to see you.');
declare @scanChar char(1)='#';
--the query
WITH recCTE AS
(
    SELECT t.ID
          ,t.SomeComment
          ,t.SomeString AS TextToWork
          ,1 AS TermIndex
          ,D.*
    FROM @tbl t
    OUTER APPLY(SELECT CHARINDEX(@scanChar,t.SomeString)) A(StartingAt)
    OUTER APPLY(SELECT CHARINDEX(@scanChar,t.SomeString,A.StartingAt+1)) B(EndingAt)
    OUTER APPLY(SELECT CASE WHEN A.StartingAt>0 AND B.EndingAt >0 THEN SUBSTRING(t.SomeString,A.StartingAt+1,B.EndingAt- A.StartingAt-1) END) C(TermCandidate)
    OUTER APPLY(SELECT A.StartingAt,B.EndingAt,C.TermCandidate,SUBSTRING(t.SomeString,B.EndingAt+1,1000) AS RestString) D
    UNION ALL
    SELECT t.ID
          ,t.SomeComment
          ,t.RestString
          ,t.TermIndex+1
          ,D.*
    FROM recCTE t
    OUTER APPLY(SELECT CHARINDEX(@scanChar,t.RestString)) A(StartingAt)
    OUTER APPLY(SELECT CHARINDEX(@scanChar,t.RestString,A.StartingAt+1)) B(EndingAt)
    OUTER APPLY(SELECT CASE WHEN A.StartingAt>0 AND B.EndingAt >0 THEN SUBSTRING(t.RestString,A.StartingAt+1,B.EndingAt- A.StartingAt-1) END) C(TermCandidate)
    OUTER APPLY(SELECT A.StartingAt,B.EndingAt,C.TermCandidate,SUBSTRING(t.RestString,B.EndingAt+1,1000) AS RestString) D
    WHERE (LEN(t.RestString) - LEN(REPLACE(t.RestString,@scanChar,'')))%2=0 AND CHARINDEX(@scanChar,t.RestString)>0
)
SELECT ID
      ,SomeComment
      ,TermIndex
      --this will exclude "Guest, nice" due to the blank
      ,CASE WHEN CHARINDEX(' ',TermCandidate)>0 THEN NULL ELSE TermCandidate END AS Term
FROM recCTE
ORDER BY ID,TermIndex;
The result
+----+-------------+-----------+-------------+
| ID | SomeComment | TermIndex | Term |
+----+-------------+-----------+-------------+
| 1 | 3 Terms | 1 | NAMEOFGUEST |
+----+-------------+-----------+-------------+
| 1 | 3 Terms | 2 | SOMEHOTEL |
+----+-------------+-----------+-------------+
| 1 | 3 Terms | 3 | AThirdOne |
+----+-------------+-----------+-------------+
| 2 | 1 Term | 1 | NAMEOFGUEST |
+----+-------------+-----------+-------------+
| 3 | No Term | 1 | NULL |
+----+-------------+-----------+-------------+
| 4 | invalid 1 | 1 | NULL |
+----+-------------+-----------+-------------+
| 5 | invalid ? | 1 | NULL |
+----+-------------+-----------+-------------+
Data Set 1:
Cust_Ref | ACC1 | ACC2 | ACC3
------------+-----------+-----------+---------
1000001 | ALPHA | BRAVO | CHARLIE
1000002 | ALPHA | BRAVO | CHARLIE
1000003 | ALPHA | BRAVO | CHARLIE
1000004 | DELTA | ECHO |
1000005 | DELTA | ECHO |
1000006 | FOXTROT |
1000007 | FOXTROT |
Data Set 2:
Cust_Ref | ACC
------------+--------
1000001 | ALPHA
1000001 | BRAVO
1000001 | DELTA
1000004 | DELTA
1000004 | ECHO
1000006 | FOXTROT
Data Set 1 shows the customer references and the accounts they should have. For example, customer 1000001 must have the accounts ALPHA, BRAVO, and CHARLIE; customer 1000004 must have DELTA and ECHO, and so on.
Data Set 2 shows which accounts are actually associated with a customer reference.
Is there a way I can return instances of missing accounts with T-SQL?
Example:
In the dataset I have provided customer 1000001 should have ALPHA, BRAVO, CHARLIE but Data Set 2 shows that the customer does not have CHARLIE.
Considering this DDL and sample data:
DECLARE @Table1 TABLE (
    Cust_Ref VARCHAR(10) PRIMARY KEY,
    ACC1 VARCHAR(10) NULL,
    ACC2 VARCHAR(10) NULL,
    ACC3 VARCHAR(10) NULL
)
INSERT INTO @Table1 VALUES
('1000001','ALPHA','BRAVO','CHARLIE'),
('1000002','ALPHA','BRAVO','CHARLIE'),
('1000003','ALPHA','BRAVO','CHARLIE'),
('1000004','DELTA','ECHO',NULL),
('1000005','DELTA','ECHO',NULL),
('1000006','FOXTROT',NULL,NULL),
('1000007','FOXTROT',NULL,NULL)
DECLARE @Table2 TABLE (
    Cust_Ref VARCHAR(10) NOT NULL,
    ACC VARCHAR(10) NOT NULL
)
INSERT INTO @Table2 VALUES
('1000001','ALPHA'),
('1000001','BRAVO'),
('1000001','DELTA'),
('1000004','DELTA'),
('1000004','ECHO'),
('1000006','FOXTROT')
You could use UNPIVOT and EXCEPT, this way:
SELECT Cust_Ref, ACC
FROM @Table1 UNPIVOT (ACC FOR COL IN (ACC1, ACC2, ACC3)) U
EXCEPT
SELECT Cust_Ref, ACC
FROM @Table2
Select z.cust_ref, z.account
from (Select cust_ref, acc1 account
      from dataSet1
      union
      Select cust_ref, acc2 account
      from dataSet1
      union
      Select cust_ref, acc3 account
      from dataSet1) z
Where z.account is not null -- skip unused account slots
and Not exists (Select * from dataSet2
                where cust_ref = z.cust_ref
                and acc = z.account)
WITH DataSet1 AS ( -- Define DataSet1
SELECT *
FROM (VALUES
(1000001, 'ALPHA', 'BRAVO', 'CHARLIE')
, (1000002, 'ALPHA', 'BRAVO', 'CHARLIE')
, (1000003, 'ALPHA', 'BRAVO', 'CHARLIE')
, (1000004, 'DELTA', 'ECHO', NULL)
, (1000005, 'DELTA', 'ECHO', NULL)
, (1000006, 'FOXTROT', NULL, NULL)
, (1000007, 'FOXTROT', NULL, NULL)
) A (Cust_Ref, ACC1, ACC2, ACC3)
), DataSet2 AS ( -- Define DataSet2
SELECT *
FROM (VALUES
(1000001, 'ALPHA')
, (1000001, 'BRAVO')
, (1000004, 'DELTA')
, (1000004, 'ECHO')
, (1000006, 'FOXTROT')
) A (Cust_Ref, ACC)
)
SELECT A.Cust_Ref
, B.ACC
FROM DataSet1 A
CROSS APPLY(VALUES -- Unpivot ACC1, ACC2, and ACC3 into ACC
(A.ACC1)
, (A.ACC2)
, (A.ACC3)
) B (ACC)
WHERE B.ACC IS NOT NULL -- Remove NULL ACCs
AND NOT EXISTS ( -- Remove ACCs that exist in DataSet2
SELECT 1
FROM DataSet2
WHERE A.Cust_Ref = Cust_Ref
AND B.ACC = ACC
);
I know there are several topics on this, but none of them suited my case, so I am asking again.
I have a table with the columns UserID, FirstName, and Lastname.
I need to insert 300,000 rows, and the values in each column have to be unique, for example:
UserID0001, John00001, Doe00001
UserID0002, John00002, Doe00002
UserID0003, John00003, Doe00003
I hope there is an easy way :)
Thank you in advance.
Best,
Lyubo
;with sequence as (
select N = row_number() over (order by @@spid)
from sys.all_columns c1, sys.all_columns c2
)
insert into [Table] (UserID, FirstName, Lastname)
select
'UserID' + right('000000' + cast(N as varchar(10)), 6),
'John' + right('000000' + cast(N as varchar(10)), 6),
'Doe' + right('000000' + cast(N as varchar(10)), 6)
from sequence where N <= 300000
You could use the ROW_NUMBER function to generate different numbers like this:
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE dbo.users(
Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
user_id VARCHAR(20),
first_name VARCHAR(20),
last_name VARCHAR(20)
);
GO
DECLARE @NoOfRows INT = 7;
INSERT INTO dbo.users(user_id, first_name, last_name)
SELECT 'User_'+n, 'John_'+n, 'Doe_'+n
FROM(
    SELECT REPLACE(STR(ROW_NUMBER()OVER(ORDER BY (SELECT NULL))),' ','0') n FROM(
        select TOP(@NoOfRows) 1 x from sys.objects A,sys.objects B,sys.objects C,sys.objects D,sys.objects E,sys.objects F,sys.objects G
    )X
)N
Query 1:
SELECT * FROM dbo.users
Results:
| ID | USER_ID | FIRST_NAME | LAST_NAME |
-----------------------------------------------------------
| 1 | User_0000000001 | John_0000000001 | Doe_0000000001 |
| 2 | User_0000000002 | John_0000000002 | Doe_0000000002 |
| 3 | User_0000000003 | John_0000000003 | Doe_0000000003 |
| 4 | User_0000000004 | John_0000000004 | Doe_0000000004 |
| 5 | User_0000000005 | John_0000000005 | Doe_0000000005 |
| 6 | User_0000000006 | John_0000000006 | Doe_0000000006 |
| 7 | User_0000000007 | John_0000000007 | Doe_0000000007 |
Just change @NoOfRows to 300000 to get the number of rows you are looking for.
I've adapted a script found in this article:
DECLARE @RowCount INT
DECLARE @RowString VARCHAR(14)
DECLARE @First VARCHAR(14)
DECLARE @Last VARCHAR(14)
DECLARE @ID VARCHAR(14)
SET @ID = 'UserID'
SET @First = 'John'
SET @Last = 'Doe'
SET @RowCount = 1
WHILE @RowCount < 300001
BEGIN
    SET @RowString = CAST(@RowCount AS VARCHAR(10))
    SET @RowString = REPLICATE('0', 6 - DATALENGTH(@RowString)) + @RowString
    INSERT INTO TestTableSize (
        UserID
        ,FirstName
        ,LastName
    )
    VALUES
        (@ID + @RowString
        , @First + @RowString
        , @Last + @RowString)
    SET @RowCount = @RowCount + 1
END
I have two tables, tempUsers and tempItems. These two tables have a one to many relationship.
When I use an inner join on these two tables the result looks like this:
**user | Category**
Jack | Shoes
Jack | Tie
Jack | Glass
Peggy | Shoe
Peggy | Skirt
Peggy | Bat
Peggy | Cat
Bruce | Laptop
Bruce | Beer
Chuck | Cell Phone
I would instead like a result that looks like this:
**User | Category1 | Category2 | Category3 | Category4**
Jack | Shoes | Tie | Glass | .....
Peggy | Shoe | Skirt | Bat | Cat
Bruce | Laptop | Beer |..... |......
Chuck | Cell Phone | ..... |....... |
The number of distinct categories is dynamic: there can be any number of them for a given user.
How can I produce this result?
There are a few ways that you can transform the data from rows into columns.
Since you are using SQL Server 2008, you can use the PIVOT function.
I would suggest using the row_number() function to assist in pivoting the data. If you have a known number of values, then you could hard-code the query:
select [user], category1, category2, category3, category4
from
(
select [user], category,
'Category'+cast(row_number() over(partition by [user]
order by [user]) as varchar(3)) rn
from yt
) d
pivot
(
max(category)
for rn in (category1, category2, category3, category4)
) piv;
See SQL Fiddle with Demo.
For your situation you stated that you will have an unknown number of values that need to be columns. In that case, you will want to use dynamic SQL to generate the query string to execute:
DECLARE @cols AS NVARCHAR(MAX),
    @query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME('Category'+cast(row_number() over(partition by [user]
order by [user]) as varchar(3)))
from yt
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT [user],' + @cols + '
from
(
select [user], category,
''Category''+cast(row_number() over(partition by [user]
order by [user]) as varchar(3)) rn
from yt
) d
pivot
(
max(category)
for rn in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo. Both give a result:
| USER | CATEGORY1 | CATEGORY2 | CATEGORY3 | CATEGORY4 |
----------------------------------------------------------
| Bruce | Laptop | Beer | (null) | (null) |
| Chuck | Cell Phone | (null) | (null) | (null) |
| Jack | Shoes | Tie | Glass | (null) |
| Peggy | Shoe | Skirt | Bat | Cat |
SQL Server does allow you to pivot data. However, like other relational databases, it still requires that you know at the outset of a query how many columns (and of what type) the results will have, even with PIVOT. The best you can hope for here is to use queries combined with dynamic SQL (building the query string in code at runtime) to first find out who has the most categories, and then build a query that PIVOTs your data to look for that many items.
The normal solution to pivoting with an unknown number of columns is to do the pivot client side, from the code that calls into the server.
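If the server-side route is taken anyway, the first step described above, finding the largest number of categories any one user has, might look like the sketch below. The table variable @yt is a hypothetical stand-in for the joined users/items result from the question, and the column names category_count and max_categories are made up.

```sql
-- Hypothetical stand-in for the joined result shown in the question
DECLARE @yt TABLE ([user] VARCHAR(20), category VARCHAR(20));

INSERT INTO @yt VALUES
('Jack','Shoes'), ('Jack','Tie'), ('Jack','Glass'),
('Peggy','Shoe'), ('Peggy','Skirt'), ('Peggy','Bat'), ('Peggy','Cat'),
('Bruce','Laptop'), ('Bruce','Beer'),
('Chuck','Cell Phone');

-- Count categories per user, then take the largest count;
-- that number is how many CategoryN columns the pivot needs
SELECT MAX(c.category_count) AS max_categories
FROM (
    SELECT [user], COUNT(*) AS category_count
    FROM @yt
    GROUP BY [user]
) AS c;
-- returns 4 for this sample (Peggy has four categories)
```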
Here is the solution using multiple tables. This solution is entirely based on bluefeet's solution. I have just added user id.
create table #tmpUsers
(user_id int, user_name varchar(255));
insert into #tmpUsers values (1,'Jack');
insert into #tmpUsers values (2,'Peggy');
insert into #tmpUsers values (3,'Bruce');
insert into #tmpUsers values (4,'Chuck');
create table #tmpItems
(user_id int, category varchar(255));
insert into #tmpItems values(1,'Shoes');
insert into #tmpItems values(1,'Tie');
insert into #tmpItems values(1,'Glass');
insert into #tmpItems values(2,'Shoe');
insert into #tmpItems values(2,'Skirt');
insert into #tmpItems values(2,'Bat');
insert into #tmpItems values(2,'Cat');
insert into #tmpItems values(3,'Laptop');
insert into #tmpItems values(3,'Beer');
insert into #tmpItems values(4,'Cell Phone');
select TU.user_name,TI.category from #tmpUsers TU inner join #tmpItems TI on TU.user_id=TI.user_id
DECLARE @cols AS NVARCHAR(MAX),
    @query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME('Category'+cast(row_number() over(partition by TU.[user_id]
order by TU.[user_id]) as varchar(3)))
from #tmpUsers TU inner join #tmpItems TI on TU.user_id=TI.user_id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT [user_name],' + @cols + '
from
(
select TU.[user_name], TI.category,
''Category''+cast(row_number() over(partition by TU.[user_id]
order by TU.[user_id] ) as varchar(3)) rn
from #tmpUsers TU inner join #tmpItems TI on TU.user_id=TI.user_id
) d
pivot
(
max(category)
for rn in (' + @cols + ')
) p '
execute(@query)
drop table #tmpUsers
drop table #tmpItems
Applies to Microsoft SQL Server 2008 R2.
The problem is
With a few dozen OUTER APPLY calls (about 30), the query becomes quite slow. Inside each OUTER APPLY I have something more complicated than a simple select: a view.
Details
I'm writing a kind of attribute system for tables in the database. Several tables hold a reference to a table of attributes (key, value).
Pseudo structure looks like this:
DECLARE @Lot TABLE (
    LotId INT PRIMARY KEY IDENTITY,
    SomeText VARCHAR(8))
INSERT INTO @Lot
OUTPUT INSERTED.*
VALUES ('Hello'), ('World')
DECLARE @Attribute TABLE(
    AttributeId INT PRIMARY KEY IDENTITY,
    LotId INT,
    Val VARCHAR(8),
    Kind VARCHAR(8))
INSERT INTO @Attribute
OUTPUT INSERTED.* VALUES
(1, 'Foo1', 'Kind1'), (1, 'Foo2', 'Kind2'),
(2, 'Bar1', 'Kind1'), (2, 'Bar2', 'Kind2'), (2, 'Bar3', 'Kind3')
LotId SomeText
----------- --------
1 Hello
2 World
AttributeId LotId Val Kind
----------- ----------- -------- --------
1 1 Foo1 Kind1
2 1 Foo2 Kind2
3 2 Bar1 Kind1
4 2 Bar2 Kind2
5 2 Bar3 Kind3
I can now run a query such as:
SELECT
[l].[LotId]
, [SomeText]
, [Oa1].[AttributeId]
, [Oa1].[LotId]
, 'Kind1Val' = [Oa1].[Val]
, [Oa1].[Kind]
, [Oa2].[AttributeId]
, [Oa2].[LotId]
, 'Kind2Val' = [Oa2].[Val]
, [Oa2].[Kind]
, [Oa3].[AttributeId]
, [Oa3].[LotId]
, 'Kind3Val' = [Oa3].[Val]
, [Oa3].[Kind]
FROM @Lot AS l
OUTER APPLY(SELECT * FROM @Attribute AS la WHERE la.[LotId] = l.[LotId] AND la.[Kind] = 'Kind1') AS Oa1
OUTER APPLY(SELECT * FROM @Attribute AS la WHERE la.[LotId] = l.[LotId] AND la.[Kind] = 'Kind2') AS Oa2
OUTER APPLY(SELECT * FROM @Attribute AS la WHERE la.[LotId] = l.[LotId] AND la.[Kind] = 'Kind3') AS Oa3
LotId SomeText AttributeId LotId Kind1Val Kind AttributeId LotId Kind2Val Kind AttributeId LotId Kind3Val Kind
----------- -------- ----------- ----------- -------- -------- ----------- ----------- -------- -------- ----------- ----------- -------- --------
1 Hello 1 1 Foo1 Kind1 2 1 Foo2 Kind2 NULL NULL NULL NULL
2 World 3 2 Bar1 Kind1 4 2 Bar2 Kind2 5 2 Bar3 Kind3
This is a simple way to get a pivoted table of attribute values, including results for Lot rows that lack an attribute such as Kind3.
I know about Microsoft's PIVOT, but it is not simple and does not fit here.
So, what would be faster and give the same results?
In order to get the result you can unpivot and then pivot the data.
There are two ways that you can perform this. First, you can use the UNPIVOT and the PIVOT function:
select *
from
(
select LotId,
SomeText,
col+'_'+CAST(rn as varchar(10)) col,
value
from
(
select l.LotId,
l.SomeText,
cast(a.AttributeId as varchar(8)) attributeid,
cast(a.LotId as varchar(8)) a_LotId,
a.Val,
a.Kind,
ROW_NUMBER() over(partition by l.lotid order by a.attributeid) rn
from @Lot l
left join @Attribute a
on l.LotId = a.LotId
) src
unpivot
(
value
for col in (attributeid, a_Lotid, val, kind)
) unpiv
) d
pivot
(
max(value)
for col in (attributeid_1, a_LotId_1, Val_1, Kind_1,
attributeid_2, a_LotId_2, Val_2, Kind_2,
attributeid_3, a_LotId_3, Val_3, Kind_3)
) piv
See SQL Fiddle with Demo.
Or, starting in SQL Server 2008, you can use CROSS APPLY with a VALUES clause to unpivot the data:
select *
from
(
select LotId,
SomeText,
col+'_'+CAST(rn as varchar(10)) col,
value
from
(
select l.LotId,
l.SomeText,
cast(a.AttributeId as varchar(8)) attributeid,
cast(a.LotId as varchar(8)) a_LotId,
a.Val,
a.Kind,
ROW_NUMBER() over(partition by l.lotid order by a.attributeid) rn
from @Lot l
left join @Attribute a
on l.LotId = a.LotId
) src
cross apply
(
values ('attributeid', attributeid),('LotId', a_LotId), ('Value', Val), ('Kind', Kind)
) c (col, value)
) d
pivot
(
max(value)
for col in (attributeid_1, LotId_1, Value_1, Kind_1,
attributeid_2, LotId_2, Value_2, Kind_2,
attributeid_3, LotId_3, Value_3, Kind_3)
) piv
See SQL Fiddle with Demo.
The unpivot process takes the multiple columns for each LotID and SomeText and converts them into rows, giving the result:
| LOTID | SOMETEXT | COL | VALUE |
--------------------------------------------
| 1 | Hello | attributeid_1 | 1 |
| 1 | Hello | LotId_1 | 1 |
| 1 | Hello | Value_1 | Foo1 |
| 1 | Hello | Kind_1 | Kind1 |
| 1 | Hello | attributeid_2 | 2 |
I added a row_number() to the inner subquery to be used to create the new column names to pivot. Once the names are created the pivot can be applied to the new columns giving the final result
This could also be done using dynamic SQL:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT ',' + QUOTENAME(col+'_'+rn)
from
(
select
cast(ROW_NUMBER() over(partition by l.lotid order by a.attributeid) as varchar(10)) rn
from Lot l
left join Attribute a
on l.LotId = a.LotId
) t
cross apply (values ('attributeid', 1),
('LotId', 2),
('Value', 3),
('Kind', 4)) c (col, so)
group by col, rn, so
order by rn, so
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT LotId,
SomeText,' + @cols + '
from
(
select LotId,
SomeText,
col+''_''+CAST(rn as varchar(10)) col,
value
from
(
select l.LotId,
l.SomeText,
cast(a.AttributeId as varchar(8)) attributeid,
cast(a.LotId as varchar(8)) a_LotId,
a.Val,
a.Kind,
ROW_NUMBER() over(partition by l.lotid order by a.attributeid) rn
from Lot l
left join Attribute a
on l.LotId = a.LotId
) src
cross apply
(
values (''attributeid'', attributeid),(''LotId'', a_LotId), (''Value'', Val), (''Kind'', Kind)
) c (col, value)
) x
pivot
(
max(value)
for col in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo
All three versions will give the same result:
| LOTID | SOMETEXT | ATTRIBUTEID_1 | LOTID_1 | VALUE_1 | KIND_1 | ATTRIBUTEID_2 | LOTID_2 | VALUE_2 | KIND_2 | ATTRIBUTEID_3 | LOTID_3 | VALUE_3 | KIND_3 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | Hello | 1 | 1 | Foo1 | Kind1 | 2 | 1 | Foo2 | Kind2 | (null) | (null) | (null) | (null) |
| 2 | World | 3 | 2 | Bar1 | Kind1 | 4 | 2 | Bar2 | Kind2 | 5 | 2 | Bar3 | Kind3 |
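When the list of Kind values is known in advance, as in the question's three OUTER APPLY calls, conditional aggregation is another option that may be worth benchmarking against the versions above. This is a sketch rather than a measured recommendation: it assumes at most one attribute per lot and kind, returns only the Val columns, and reuses the question's sample data.

```sql
-- Sample data from the question
DECLARE @Lot TABLE (
    LotId    INT PRIMARY KEY IDENTITY,
    SomeText VARCHAR(8));
INSERT INTO @Lot VALUES ('Hello'), ('World');

DECLARE @Attribute TABLE (
    AttributeId INT PRIMARY KEY IDENTITY,
    LotId       INT,
    Val         VARCHAR(8),
    Kind        VARCHAR(8));
INSERT INTO @Attribute VALUES
(1, 'Foo1', 'Kind1'), (1, 'Foo2', 'Kind2'),
(2, 'Bar1', 'Kind1'), (2, 'Bar2', 'Kind2'), (2, 'Bar3', 'Kind3');

-- One join, one GROUP BY: each MAX(CASE ...) picks the value for one kind,
-- and stays NULL when the lot has no attribute of that kind
SELECT l.LotId,
       l.SomeText,
       MAX(CASE WHEN a.Kind = 'Kind1' THEN a.Val END) AS Kind1Val,
       MAX(CASE WHEN a.Kind = 'Kind2' THEN a.Val END) AS Kind2Val,
       MAX(CASE WHEN a.Kind = 'Kind3' THEN a.Val END) AS Kind3Val
FROM @Lot AS l
LEFT JOIN @Attribute AS a
       ON a.LotId = l.LotId
GROUP BY l.LotId, l.SomeText
ORDER BY l.LotId;
-- LotId 1: Foo1, Foo2, NULL
-- LotId 2: Bar1, Bar2, Bar3
```

Unlike the OUTER APPLY query, this collapses duplicates: if a lot had two attributes of the same kind, only the larger Val would be kept.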