SQL Server: show rows as columns / Pivot [duplicate]

This question already has answers here:
SQL Server Pivot Table with multiple column aggregates
(3 answers)
Closed 6 years ago.
I have a SQL Server 2008 Express table like this:
rno  gid  uid  dat         origamt  disamt
-----------------------------------------------
1    AA   a    12-05-2016  200      210
2    AA   b    12-05-2016  300      305
3    AA   c    12-05-2016  150      116
4    BB   a    12-05-2016  120      125
5    BB   c    12-05-2016  130      136
6    CC   a    12-05-2016  112      115
7    CC   b    12-05-2016  135      136
and so on for different dates
I want to show it like this:
sno  dat         gid  a_orig  a_dis  b_orig  b_dis  c_orig  c_dis  .....
1    12-05-2016  AA   200     210    300     305    150     116
2    12-05-2016  BB   120     125    0       0      130     136
3    12-05-2016  CC   112     115    135     136    0       0
NOTE: the values of uid are not fixed, they may vary dynamically, so, a_orig, a_dis, b_orig, b_dis, etc cannot be hardcoded into SQL.
NOTE: Around 300 rows are expected for each date because of the cartesian product of gid and uid. I will search by date using a LIKE clause, since the dat column is varchar(50).
Note: I would prefer origamt and disamt to be varchar(50) instead of Decimal(18, 0), but this is not compulsory.
I have tried to use PIVOT, following several articles here on Stack Overflow and on other websites, but couldn't get it to work completely.
Here is what I tried and got almost fine results with fixed uid and only fetched origamt:
select *
from
(
select gid, uid, dat, origamt
from vamounts
) as src
pivot
(
sum(origamt)
for uid IN ( a, b )
) as piv;
Kindly help me with the least bulky possible solution for this problem. I will prefer least lines of code and least complexity.

Errr, no. You can't generate your desired table with a single plain PIVOT query. This isn't a valid pivot table.
"the values of uid are not fixed, they may vary dynamically, so,
a_orig, a_dis, b_orig, b_dis, etc cannot be hardcoded into SQL."
Sorry, this is also not possible in a plain (static) SELECT. You must specify the exact values to be used as the column headers: whenever you write a SELECT statement, you must name the columns (fields) it returns. The only way around this is to build the statement text at run time (dynamic SQL).
However, below are the steps required to create a "valid" SQL Server pivot table from your data:
I've got to admit, when I recently had to write my first PIVOT in SQL Server, I also Googled like mad, but didn't understand how to write it.
However, I eventually worked out what you need to do, so here's the step-by-step guide that you won't find anywhere else..!
(Readers can easily adapt these instructions to use with their own data!)
1. Create your sample data
If you expect readers to reply to your Question, you should at least give them the SQL to create your sample data, so they have something to work off.
So, here's how I would create the data shown in your question:
CREATE TABLE tblSomething
(
[gid] nvarchar(100),
[uid] nvarchar(100),
[dat] datetime,
[origamt] int,
[disamt] int
)
GO
INSERT INTO tblSomething VALUES ('AA', 'a', '2016-05-12', 200, 210)
INSERT INTO tblSomething VALUES ('AA', 'b', '2016-05-12', 300, 305)
INSERT INTO tblSomething VALUES ('AA', 'c', '2016-05-12', 150, 116)
INSERT INTO tblSomething VALUES ('BB', 'a', '2016-05-12', 120, 125)
INSERT INTO tblSomething VALUES ('BB', 'c', '2016-05-12', 130, 136)
INSERT INTO tblSomething VALUES ('CC', 'a', '2016-05-12', 112, 115)
INSERT INTO tblSomething VALUES ('CC', 'b', '2016-05-12', 135, 136)
GO
2. Write a SQL Query which returns exactly three columns
The first column will contain the values which will appear in your PIVOT table's left-hand column.
The second column will contain the list of values which will appear on the top row.
The values in the third column will be positioned within your PIVOT table, based on the row/column headers.
Okay, here's the SQL to do this:
SELECT [gid], [uid], [origamt]
FROM tblSomething
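This is the key to using a PIVOT. Your database structure can be as horribly complicated as you like, but the simplest PIVOT works from three values: a row header, a column header, and the value to aggregate. (Any extra column you select in the source query becomes an additional grouping column, so leave out everything you don't need.)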
So, here's what that SQL will return. Our aim is to create a PIVOT table containing (just) these values:
3. Find a list of distinct values for the header row
Notice how, in the pivot table I'm aiming to create, I have three columns (fields) called a, b and c. These are the three unique values in your [uid] column.
So, to get a comma-concatenated list of these unique values, I can use this SQL:
DECLARE @LongString nvarchar(4000)
SELECT @LongString = COALESCE(@LongString + ', ', '') + '[' + [uid] + ']'
FROM [tblSomething]
GROUP BY [uid]
SELECT @LongString AS 'Subquery'
When I run this against your data, here's what I get:
[a], [b], [c]
Now, cut'n'paste this value: we'll need to place it twice in our overall SQL SELECT command to create the pivot table.
4. Put it all together
This is the tricky bit.
You need to combine your SQL command from Step 2 and the result from Step 3, into a single SELECT command.
Here's what your SQL would look like:
SELECT [gid],
-- Here's the "Subquery" from part 3
[a], [b], [c]
FROM (
-- Here's the original SQL "SELECT" statement from part 2
SELECT [gid], [uid], [origamt]
FROM tblSomething
) tmp ([gid], [uid], [origamt])
pivot (
MAX([origamt]) for [uid] in (
-- Here's the "Subquery" from part 3 again
[a], [b], [c]
)
) p
... and here's a confusing image, which shows where the components come from, and the results of running this command.
As you can see, the key to this is that SELECT statement in Step 2, and putting your three chosen fields in the correct place in this command.
And, as I said earlier, the columns (fields) in your pivot table come from the values obtained in step 3:
[a], [b], [c]
You could, of course, use a subset of these values. Perhaps you just want to see the PIVOT values for [a], [b] and ignore [c].
Phew !
So, that's how to create a pivot table out of your data.
I will prefer least lines of code and least complexity.
Yeah, good luck on that one..!!!
5. Merging two pivot tables
If you really wanted to, you could merge the contents of two such PIVOT tables to get the exact results you're looking for.
This is easy enough SQL for Shobhit to write himself.

You need dynamic SQL for this stuff.
First, create a table with your data:
CREATE TABLE #temp (
rno int,
gid nvarchar(10),
[uid] nvarchar(10),
dat date,
origamt int,
disamt int
)
INSERT INTO #temp VALUES
(1, 'AA', 'a', '12-05-2016', 200, 210),
(2, 'AA', 'b', '12-05-2016', 300, 305),
(3, 'AA', 'c', '12-05-2016', 150, 116),
(4, 'BB', 'a', '12-05-2016', 120, 125),
(5, 'BB', 'c', '12-05-2016', 130, 136),
(6, 'CC', 'a', '12-05-2016', 112, 115),
(7, 'CC', 'b', '12-05-2016', 135, 136)
And then declare variables with columns:
DECLARE @columns nvarchar(max), @sql nvarchar(max), @columns1 nvarchar(max), @columnsN nvarchar(max)
--Here simple columns like [a],[b],[c] etc
SELECT @columns =STUFF((SELECT DISTINCT ','+QUOTENAME([uid]) FROM #temp FOR XML PATH('')),1,1,'')
--Here with ISNULL operation ISNULL([a],0) as [a],ISNULL([b],0) as [b],ISNULL([c],0) as [c]
SELECT @columnsN = STUFF((SELECT DISTINCT ',ISNULL('+QUOTENAME([uid])+',0) as '+QUOTENAME([uid]) FROM #temp FOR XML PATH('')),1,1,'')
--Here columns for final table orig.[a] as [a_orig], dis.[a] as [a_dis], orig.[b] as [b_orig], dis.[b] as [b_dis], orig.[c] as [c_orig], dis.[c] as [c_dis]
--QUOTENAME keeps uid values that are not plain identifiers from breaking the generated SQL
SELECT @columns1 = STUFF((SELECT DISTINCT ',orig.'+QUOTENAME([uid]) + ' as ' +QUOTENAME([uid]+'_orig')+ ', dis.'+QUOTENAME([uid]) + ' as ' +QUOTENAME([uid]+'_dis') FROM #temp FOR XML PATH('')),1,1,'')
And main query:
SELECT @sql = '
SELECT orig.gid,
orig.dat,
'+@columns1+'
FROM (
SELECT gid, dat, '+@columnsN+'
FROM (
SELECT gid, [uid], LEFT(dat,10) as dat, origamt
FROM #temp
) as p
PIVOT (
SUM(origamt) FOR [uid] in ('+@columns+')
) as pvt
) as orig
LEFT JOIN (
SELECT gid, dat, '+@columnsN+'
FROM (
SELECT gid, [uid], LEFT(dat,10) as dat, disamt
FROM #temp
) as p
PIVOT (
SUM(disamt) FOR [uid] in ('+@columns+')
) as pvt
) as dis
ON dis.gid = orig.gid and dis.dat = orig.dat'
EXEC(@sql)
Output:
gid dat a_orig a_dis b_orig b_dis c_orig c_dis
AA 2016-12-05 200 210 300 305 150 116
BB 2016-12-05 120 125 0 0 130 136
CC 2016-12-05 112 115 135 136 0 0
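To make the shape of the generated query easier to follow, here is a minimal Python sketch of the same idea outside SQL Server: discover the distinct uid values first, then build one output row per (dat, gid) with an `_orig` and a `_dis` column for each uid, using 0 where a combination is missing. The function name `pivot_amounts` and the tuple layout are assumptions made for illustration; this is not a replacement for the dynamic SQL above.

```python
from collections import defaultdict

# Sample rows from the question: (gid, uid, dat, origamt, disamt)
rows = [
    ("AA", "a", "12-05-2016", 200, 210),
    ("AA", "b", "12-05-2016", 300, 305),
    ("AA", "c", "12-05-2016", 150, 116),
    ("BB", "a", "12-05-2016", 120, 125),
    ("BB", "c", "12-05-2016", 130, 136),
    ("CC", "a", "12-05-2016", 112, 115),
    ("CC", "b", "12-05-2016", 135, 136),
]


def pivot_amounts(rows):
    """Return (header, data) with one output row per (dat, gid)."""
    # Step 1: discover the distinct uid values (the column-list step in the SQL).
    uids = sorted({uid for _, uid, _, _, _ in rows})
    header = ["dat", "gid"]
    for uid in uids:
        header += [f"{uid}_orig", f"{uid}_dis"]

    # Step 2: sum both amounts per (dat, gid, uid), like SUM(...) in each PIVOT.
    totals = defaultdict(lambda: [0, 0])
    for gid, uid, dat, origamt, disamt in rows:
        cell = totals[(dat, gid, uid)]
        cell[0] += origamt
        cell[1] += disamt

    # Step 3: one row per (dat, gid); missing combinations become 0 (the ISNULL step).
    data = []
    for dat, gid in sorted({(d, g) for d, g, _ in totals}):
        line = [dat, gid]
        for uid in uids:
            line += totals.get((dat, gid, uid), [0, 0])
        data.append(line)
    return header, data


header, data = pivot_amounts(rows)
print(header)
for line in data:
    print(line)
```

Because the column list is computed from the data, a new uid value adds two columns without any change to the code, which is the property the dynamic SQL is after.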

A join might help
declare @t table (rno int, gid varchar(2), uid varchar(1), dat varchar(10), origamt int, disamt int)
insert into @t
values
(1, 'AA', 'a', '12-05-2016', 200, 210),
(2 , 'AA', 'b', '12-05-2016', 300, 305),
(3 , 'AA', 'c', '12-05-2016', 150, 116),
(4 , 'BB', 'a', '12-05-2016', 120, 125),
(5 , 'BB', 'c', '12-05-2016', 130, 136),
(6 , 'CC', 'a', '12-05-2016', 112, 115),
(7 , 'CC', 'b', '12-05-2016', 135, 136)
select -- piv.*,piv2.*
piv.gid,piv.dat
,piv.a as a_org
,piv2.a as a_dis
,piv.b as b_org
,piv2.b as b_dis
,piv.c as c_org
,piv2.c as c_dis
from
(
select gid, uid, dat, origamt
from @t
) as src
pivot
(
sum(origamt)
for uid IN ([a],[b],[c] )
) as piv
join
(select piv2.*
from
(
select gid, uid, dat, disamt
from @t
) as src
pivot
(
sum(disamt)
for uid IN ([a],[b],[c] )
) as piv2
) piv2
on piv2.gid = piv.gid and piv2.dat = piv.dat
This is a proof of concept; you would have to use dynamic SQL to deal with the variable number of uids. Let me know if you don't know how to use dynamic SQL and I'll work up an example for you.

Related

How would one concatenate one column from multiple rows and then insert that value into a single column of another row?

So let's say we have this table:
CAR_PARTS
ID  CAR_ID  PART_NAME
1   11      Steering Wheel
2   22      Steering Wheel
3   22      Headlights
And we also have this table:
CARS
ID  CAR_MODEL  PART_NAME_LIST
11  Mustang
22  Camaro
33  F-150
How do I add data to the PART_NAME_LIST column like this:
CARS
ID  CAR_MODEL  PART_NAME_LIST
11  Mustang    Steering Wheel
22  Camaro     Steering Wheel, Headlights
33  F-150      (No parts were found)
Ok. Try this for versions of SQL Server prior to SQL Server 2017:
declare @CAR_PARTS table(ID int, CAR_ID int, PART_NAME nvarchar(128));
insert @CAR_PARTS values
(1, 11, 'Steering Wheel'),
(2, 22, 'Steering Wheel'),
(3, 22, 'Headlights');
declare @CARS table(
ID int,
CAR_MODEL nvarchar(30),
PART_NAME_LIST nvarchar(max) null);
insert @CARS(ID, CAR_MODEL) values
(11, 'Mustang'),
(22, 'Camaro'),
(33, 'F-150');
select
c.ID,
c.CAR_MODEL,
stuff(cast((
select
',' + cp.PART_NAME [text()]
from @CAR_PARTS cp
where cp.CAR_ID = c.ID
for xml path('')
)as nvarchar(max)), 1, 1, '') PART_NAME_LIST
from @CARS c;
In SQL Server 2017 it is possible to use the string_agg function:
select
c.ID,
c.CAR_MODEL,
string_agg(cp.PART_NAME,',') PART_NAME_LIST
from @CARS c left join @CAR_PARTS cp
on c.ID = cp.CAR_ID
group by c.ID, c.CAR_MODEL;
Check on this online here.
UPD: possible UPDATE statements were added to the demo.
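For readers who want to check the expected result independently of the SQL Server version, the grouping can be sketched in a few lines of Python. This is only an illustration of the logic under stated assumptions: `part_lists` is a made-up helper name, and the placeholder text for cars without parts is taken from the question (both queries above return NULL for such a car rather than a placeholder).

```python
from collections import defaultdict

# (ID, CAR_ID, PART_NAME) and (ID, CAR_MODEL), as in the question
car_parts = [
    (1, 11, "Steering Wheel"),
    (2, 22, "Steering Wheel"),
    (3, 22, "Headlights"),
]
cars = [(11, "Mustang"), (22, "Camaro"), (33, "F-150")]


def part_lists(cars, car_parts, empty="(No parts were found)"):
    """Return (car id, model, comma-separated part names) for every car."""
    by_car = defaultdict(list)
    # Sorting by part ID keeps the order of names deterministic.
    for _, car_id, part_name in sorted(car_parts):
        by_car[car_id].append(part_name)
    # An empty join result is falsy, so cars without parts get the placeholder.
    return [
        (car_id, model, ", ".join(by_car[car_id]) or empty)
        for car_id, model in cars
    ]


result = part_lists(cars, car_parts)
for row in result:
    print(row)
```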

In T-SQL is there a built-in command to determine if a number is in a range from another table

This is not a homework question.
I'm trying to take the count of t-shirts in an order and see which price range the shirts fall into, depending on how many have been ordered.
My initial thought (I am brand new at this) was to ask another table if count > 1st price range's maximum, and if so, keep looking until it's not.
printing_range_max  printing_price_by_range
15                  4
24                  3
33                  2
So for example here, if the order count is 30 shirts they would be $2 each.
When I'm looking into how to do that, it looks like most people are using BETWEEN or IF and hard-coding the ranges instead of looking in another table. I imagine in a business setting it's best to be able to leave the range in its own table so it can be changed more easily. Is there a good/built-in way to do this or should I just write it in with a BETWEEN command or IF statements?
EDIT:
SQL Server 2014
Let's say we have this table:
DECLARE @priceRanges TABLE(printing_range_max tinyint, printing_price_by_range tinyint);
INSERT @priceRanges VALUES (15, 4), (24, 3), (33, 2);
You can derive a table of ranges, each mapped to its price. Below is how you would do this on SQL Server 2012 and later, and on earlier versions:
DECLARE @priceRanges TABLE(printing_range_max tinyint, printing_price_by_range tinyint);
INSERT @priceRanges VALUES (15, 4), (24, 3), (33, 2);
-- SQL Server 2012 and later, using LAG
WITH pricerange AS
(
SELECT
printing_range_min = LAG(printing_range_max, 1, 0) OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
)
SELECT * FROM pricerange;
-- before SQL Server 2012, using ROW_NUMBER and a self-join
WITH prices AS
(
SELECT
rn = ROW_NUMBER() OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
),
pricerange As
(
SELECT
printing_range_min = ISNULL(p2.printing_range_max, 0),
printing_range_max = p1.printing_range_max,
p1.printing_price_by_range
FROM prices p1
LEFT JOIN prices p2 ON p1.rn = p2.rn+1
)
SELECT * FROM pricerange;
Both queries return:
printing_range_min printing_range_max printing_price_by_range
------------------ ------------------ -----------------------
0 15 4
15 24 3
24 33 2
Now that you have that, you can join each order to its range. Here's the full solution:
-- Sample data
DECLARE @priceRanges TABLE
(
printing_range_max tinyint,
printing_price_by_range tinyint
-- if you're on 2014+
,INDEX ix_xxx NONCLUSTERED(printing_range_max, printing_price_by_range)
-- note: second column should be an INCLUDE but not supported in table variables
);
DECLARE @orders TABLE
(
orderid int identity,
ordercount int
-- if you're on 2014+
,INDEX ix_xxy NONCLUSTERED(orderid, ordercount)
-- note: second column should be an INCLUDE but not supported in table variables
);
INSERT @priceRanges VALUES (15, 4), (24, 3), (33, 2);
INSERT @orders(ordercount) VALUES (10), (20), (25), (30);
-- Solution:
WITH pricerange AS
(
SELECT
printing_range_min = LAG(printing_range_max, 1, 0) OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
)
SELECT
o.orderid,
o.ordercount,
--p.printing_range_min,
--p.printing_range_max
p.printing_price_by_range
FROM pricerange p
-- min is exclusive and max is inclusive, so a boundary count such as 15 matches exactly one range
-- (BETWEEN is inclusive on both ends and would match two)
JOIN @orders o ON o.ordercount > p.printing_range_min AND o.ordercount <= p.printing_range_max
Results:
orderid ordercount printing_price_by_range
----------- ----------- -----------------------
1 10 4
2 20 3
3 25 2
4 30 2
Now that we have that we can
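As a cross-check on the lookup logic, here is a minimal Python sketch of the same range search, assuming the price table has been loaded into a list of (maximum, price) pairs. The name `price_for` is hypothetical. The sketch treats each range as (previous maximum, maximum], so a boundary count such as 15 gets exactly one price, and counts above the largest maximum return None.

```python
from bisect import bisect_left

# (printing_range_max, printing_price_by_range), as in the sample table
PRICE_RANGES = [(15, 4), (24, 3), (33, 2)]


def price_for(count, ranges=PRICE_RANGES):
    """Return the unit price for an order of `count` shirts, or None if out of range."""
    ordered = sorted(ranges)
    maxima = [range_max for range_max, _ in ordered]
    # Index of the first range whose maximum is >= count.
    i = bisect_left(maxima, count)
    if count <= 0 or i == len(ordered):
        return None
    return ordered[i][1]


for count in (10, 20, 25, 30):
    print(count, price_for(count))
```

Keeping the ranges in data rather than in IF or CASE branches means a price change is an edit to the table, not to the code, which is the design the question is asking about.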

apply query to each part of table individually

I need to run some query against each rowset in a table (Azure SQL):
ID CustomerID MsgTimestamp Msg
-------------------------------------------------
1 123 2017-01-01 10:00:00 Hello
2 123 2017-01-01 10:01:00 Hello again
3 123 2017-01-01 10:02:00 Can you help me with my order
4 123 2017-01-01 11:00:00 Are you still there
5 456 2017-01-01 10:07:00 Hey I'm a new customer
What I want to do is to extract "chat session" for every customer from message records, that is, if the gap between someone's two consecutive messages is less than 30 minutes, they belong to the same session. I need to record the start and end time of each session in a new table. In the example above, start and end time of the first session for customer 123 are 10:00 and 10:02.
I know I can always use cursor and temp table to achieve that goal, but I'm thinking about utilizing any pre-built mechanism to reach better performance. Please kindly give me some input.
You can use window functions instead of a cursor. Something like this should work:
declare @t table (ID int, CustomerID int, MsgTimestamp datetime2(0), Msg nvarchar(100))
insert @t values
(1, 123, '2017-01-01 10:00:00', 'Hello'),
(2, 123, '2017-01-01 10:01:00', 'Hello again'),
(3, 123, '2017-01-01 10:02:00', 'Can you help me with my order'),
(4, 123, '2017-01-01 11:00:00', 'Are you still there'),
(5, 456, '2017-01-01 10:07:00', 'Hey I''m a new customer')
;with x as (
-- g = 1 marks the first message of a session: no previous message, or a gap of 30 minutes or more
select *, case when datediff(minute, lag(msgtimestamp, 1, '19000101') over(partition by customerid order by msgtimestamp), msgtimestamp) >= 30 then 1 else 0 end as g
from @t
),
y as (
-- the running total must be per customer; otherwise another customer's messages split a session
select *, sum(g) over(partition by customerid order by msgtimestamp) as gg
from x
)
select customerid, min(msgtimestamp) as session_start, max(msgtimestamp) as session_end
from y
group by customerid, gg
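The same "flag the session starts, then group" idea can be sketched in Python to verify the expected sessions for the sample data. This is an illustration only: the function name `sessions` is an assumption, and it follows the question's wording that a gap of less than 30 minutes keeps a message in the current session.

```python
from datetime import datetime, timedelta
from itertools import groupby

# (ID, CustomerID, MsgTimestamp) from the question; the message text is not needed
messages = [
    (1, 123, "2017-01-01 10:00:00"),
    (2, 123, "2017-01-01 10:01:00"),
    (3, 123, "2017-01-01 10:02:00"),
    (4, 123, "2017-01-01 11:00:00"),
    (5, 456, "2017-01-01 10:07:00"),
]


def sessions(messages, gap=timedelta(minutes=30)):
    """Return (customer_id, session_start, session_end) tuples."""
    parsed = sorted(
        (customer, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
        for _, customer, ts in messages
    )
    result = []
    # Process each customer's messages separately, in time order.
    for customer, group in groupby(parsed, key=lambda m: m[0]):
        start = end = None
        for _, ts in group:
            if start is None:
                start = end = ts            # first message opens a session
            elif ts - end < gap:
                end = ts                    # close enough: extend the session
            else:
                result.append((customer, start, end))
                start = end = ts            # gap too large: open a new session
        result.append((customer, start, end))
    return result


result = sessions(messages)
for customer, start, end in result:
    print(customer, start, end)
```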

query to pull one row per id when multiple rows of each id exist in table

I have a table holding email addresses where some people have more than one email address listed. I want to query the table to only pull a single email address per Individual.
Columns are:
ID
IndividualID
Email
Example data:
1 34 dave@gmail.com
2 65 bob@gmail.com
3 34 david@gmail.com
What I want as the result set is (Only pull one row per IndividualID):
1 34 dave@gmail.com
2 65 bob@gmail.com
Use ROW_NUMBER()
DECLARE @sample TABLE (
ID int,
IndividualID int,
Email varchar(128)
)
INSERT INTO @sample
VALUES
(1, 34, 'dave@gmail.com'),
(2, 65, 'bob@gmail.com'),
(3, 34, 'david@gmail.com')
SELECT *
FROM (
SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY IndividualId ORDER BY ID)
FROM @sample
) AS data
WHERE RN = 1
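What the ROW_NUMBER() filter does can be shown with a short Python sketch: walk the rows in ID order and keep the first one seen for each IndividualID. The helper name `first_per_individual` is made up for this illustration.

```python
# (ID, IndividualID, Email) rows from the question
rows = [
    (1, 34, "dave@gmail.com"),
    (2, 65, "bob@gmail.com"),
    (3, 34, "david@gmail.com"),
]


def first_per_individual(rows):
    """Keep the row with the lowest ID for each IndividualID."""
    kept = {}
    for row in sorted(rows):           # ascending ID, like ORDER BY ID
        kept.setdefault(row[1], row)   # only the first row per individual is stored
    return sorted(kept.values())


result = first_per_individual(rows)
print(result)
```

Changing the sort key changes which address wins, just as changing the ORDER BY inside OVER() does in the query.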

T-SQL - Filling in the gaps in running balance

I am working on a Data Warehouse project and the client provides daily sales data. On-hand quantities are provided in most lines but are often missing. I need help on how to fill those missing values based on prior OH and sales information.
Here's a sample data:
Line# Store Item OnHand SalesUnits DateKey
-----------------------------------------------
1 001 A 100 20 1
2 001 A 80 10 2
3 001 A null 30 3 --[OH updated with 70 (80-10)]
4 001 A null 5 4 --[OH updated with 40 (70-30)]
5 001 A 150 10 5 --[OH untouched]
6 001 B null 4 1 --[OH untouched - new item]
7 001 B 80 12 2
8 001 B null 10 3 --[OH updated with 68 (80-12)]
Lines 1 and 2 are not to be updated because OnHand quantities exist.
Lines 3 and 4 are to be updated based on their preceding rows.
Line 5 is to be left untouched because OnHand is provided.
Line 6 is to be left untouched because it is the first row for Item B
Is there a way I can do this in a set operation? I know I can do it easily using a fast_forward cursor but it will take a long time (15M+ rows).
Thanks for your help!
Test data:
declare @t table(
Line# int, Store char(3), Item char, OnHand int, SalesUnits int, DateKey int
)
insert @t values
(1, '001', 'A', 100, 20, 1),
(2, '001', 'A', 80 , 10, 2),
(3, '001', 'A', null, 30, 3),
(4, '001', 'A', null, 5, 4),
(5, '001', 'A', 150, 10, 5),
(6, '001', 'B', null, 4, 1),
(7, '001', 'B', null, 4, 2),
(8, '001', 'B', 80, 12, 3),
(9, '001', 'B', null, 10, 4)
Script to populate without a cursor:
;with a as
(
-- anchor: the first day of every item; correctdata = 1 marks rows that are never updated
select Line#, Store, Item, OnHand, SalesUnits, DateKey, 1 correctdata from @t where DateKey = 1
union all
-- next day: keep the supplied OnHand, otherwise previous OnHand minus previous SalesUnits
select t.Line#, t.Store, t.Item, coalesce(t.OnHand, a.onhand - a.salesunits), t.SalesUnits, t.DateKey, t.OnHand from @t t
join a on a.DateKey = t.datekey - 1 and a.item = t.item and a.store = t.store
)
update t
set OnHand = a.onhand
from @t t join a on a.line# = t.line#
where a.correctdata is null
Script to populate using a cursor:
-- @store and @laststore match the char(3) Store column, so no implicit conversion is needed
declare @datekey int, @store char(3), @item char, @Onhand int,
@calculatedonhand int, @salesunits int, @laststore char(3), @lastitem char
DECLARE sales_cursor
CURSOR FOR
SELECT datekey+1, store, item, OnHand -SalesUnits, salesunits
FROM @t sales
order by store, item, datekey
OPEN sales_cursor;
FETCH NEXT FROM sales_cursor
INTO @datekey, @store, @item, @Onhand, @salesunits
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @calculatedonhand = case when @laststore = @store and @lastitem = @item
then coalesce(@onhand, @calculatedonhand - @salesunits) else null end
,@laststore = @store, @lastitem = @item
UPDATE s
SET onhand=@calculatedonhand
FROM @t s
WHERE datekey = @datekey and @store = store and @item = item
and onhand is null and @calculatedonhand is not null
FETCH NEXT FROM sales_cursor
INTO @datekey, @store, @item, @Onhand, @salesunits
END
CLOSE sales_cursor;
DEALLOCATE sales_cursor;
I recommend the cursor version; I doubt you can get decent performance from the recursive query. I know people here hate cursors, but with a table of that size, a cursor may be the only workable solution.
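The rule both scripts implement is a single ordered pass per store and item: a missing OnHand becomes the previous row's OnHand minus the previous row's SalesUnits. The Python sketch below applies that rule to the question's sample rows so the expected values (70, 40, 68) can be checked. It is an illustration of the rule under assumptions (the name `fill_on_hand` and the tuple layout are made up), not a statement about how either script performs on 15M+ rows.

```python
from itertools import groupby

# (line, store, item, on_hand, sales_units, date_key) from the question's sample
rows = [
    (1, "001", "A", 100, 20, 1),
    (2, "001", "A", 80, 10, 2),
    (3, "001", "A", None, 30, 3),
    (4, "001", "A", None, 5, 4),
    (5, "001", "A", 150, 10, 5),
    (6, "001", "B", None, 4, 1),
    (7, "001", "B", 80, 12, 2),
    (8, "001", "B", None, 10, 3),
]


def fill_on_hand(rows):
    """Fill missing on-hand values from the preceding row of the same store and item."""
    filled = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[5]))
    for _, group in groupby(ordered, key=lambda r: (r[1], r[2])):
        prev_on_hand = prev_sales = None
        for line, store, item, on_hand, sales, date_key in group:
            # Only fill when the value is missing and a previous value is known.
            if on_hand is None and prev_on_hand is not None:
                on_hand = prev_on_hand - prev_sales
            filled.append((line, store, item, on_hand, sales, date_key))
            # A filled value carries forward, so runs of missing rows chain correctly.
            prev_on_hand, prev_sales = on_hand, sales
    return filled


filled = fill_on_hand(rows)
for row in filled:
    print(row)
```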
