SQL server: convert rows to columns - sql-server

I have a table with columns sales(int), month(int). I want to retrieve sum of sales corresponding to every month. I need ouput in form of 12 columns corresponding to each month in which there will be a single record containing sales for for each column(month).

You should take a look at PIVOT for switching rows with columns. This prevents a select statement for each month. Something like this:
DECLARE #salesTable TABLE
(
[month] INT,
sales INT
)
-- Note that I use SQL Server 2008 INSERT syntax here for inserting
-- multiple rows in one statement!
INSERT INTO #salesTable
VALUES (0, 2) ,(0, 2) ,(1, 2) ,(1, 2) ,(2, 2)
,(3, 2) ,(3, 2) ,(4, 2) ,(4, 2) ,(5, 2)
,(6, 2) ,(6, 2) ,(7, 2) ,(8, 2) ,(8, 2)
,(9, 2) ,(10, 2) ,(10, 2) ,(11, 2) ,(11, 2)
SELECT [0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]
FROM
(
SELECT [month], sales
FROM #salesTable
) AS SourceTable
PIVOT
(
SUM(sales)
FOR [month] IN ([0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11])
) AS PivotTable

Not pretty... but this works well
SELECT
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 1) [Sales1],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 2) [Sales2],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 3) [Sales3],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 4) [Sales4],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 5) [Sales5],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 6) [Sales6],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 7) [Sales7],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 8) [Sales8],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 9) [Sales9],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 10) [Sales10],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 11) [Sales11],
(SELECT SUM(Sales) FROM SalesTable WHERE [Month] = 12) [Sales12]

Here's an alternate way to write the pivot that gives you a little more control (especially over the column names). It's also a little easier to generate dynamic SQL for.
It's similar to Robin's answer, but has the advantage of only hitting the table once:
select
Sales1 = sum( case when Month = 1 then Sales end )
, Sales2 = sum( case when Month = 2 then Sales end )
, Sales3 = sum( case when Month = 3 then Sales end )
-- etc..
from SalesTable;
I did some investigation, and it seems like the new pivot operator is just syntax sugar for this type of query. The query plans end up looking identical.
As an interesting aside, the unpivot operator seems to also just be syntax sugar. For example:
If you have a table like:
Create Table Sales ( JanSales int, FebSales int, MarchSales int...)
You can write:
select unpivoted.monthName, unpivoted.sales
from Sales s
outer apply (
select 'Jan', JanSales union all
select 'Feb', FebSales union all
select 'March', MarchSales
) unpivoted( monthName, sales );
And get the unpivoted data...

You can do it with OLAP. Here is another link to MSDN documentation on the topic.
With OLAP, you can create a cube with the information you have, with the layout you need.
If you do not want to go that way, you will have to create summary tables with .NET, Java, TransacSQL, or your preferred language to manipulate SQLServer data.

To easily transpose columns into rows with its names you should use XML. In my blog I was described this with example: Link

Related

CTE - LEFT OUTER JOIN Performance Problem

Using SQL Server 2017.
SQL FIDDLE: LINK
CREATE TABLE [TABLE_1]
(
PLAN_NR decimal(28,6) NULL,
START_DATE datetime NULL,
);
CREATE TABLE [TABLE_2]
(
PLAN_NR decimal(28,6) NULL,
PERIOD_NR decimal(28,6) NULL,
);
INSERT INTO TABLE_1 (PLAN_NR, START_DATE)
VALUES (1, '2020-05-01'), (2, '2020-08-05');
INSERT INTO TABLE_2 (PLAN_NR, PERIOD_NR)
VALUES (1, 1), (1, 2), (1, 5), (1, 6), (1, 5), (1, 6), (1, 17),
(2, 2), (2, 3), (2, 5), (2, 2), (2, 17), (2, 28);
CREATE VIEW ALL_PERIODS
AS
WITH rec_cte AS
(
SELECT
PLAN_NR, START_DATE,
1 period_nr, DATEADD(day, 7, START_DATE) next_date
FROM
TABLE_1
UNION ALL
SELECT
PLAN_NR, next_date,
period_nr + 1, DATEADD(day, 7, next_date)
FROM
rec_cte
WHERE
period_nr < 100
),
cte1 AS
(
SELECT
PLAN_NR, period_nr, START_DATE
FROM
rec_cte
UNION ALL
SELECT
PLAN_NR, period_nr, DATEADD(DAY, 1, EOMONTH(next_date, -1))
FROM
rec_cte
WHERE
MONTH(START_DATE) <> MONTH(next_date)
),
cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY PLAN_NR ORDER BY START_DATE) rn
FROM cte1
)
SELECT PLAN_NR, rn PERIOD_NR, START_DATE
FROM cte2
WHERE rn <= 100
Table_1 lists plans (PLAN_NR) and their start date (START_DATE).
Table_2 lists plan numbers (PLAN_NR) and periods (1 - X). Per plan number periods can appear several times but can also be missing.
A period lasts seven days, unless the period includes a change of month. Then the period is divided into a part before the end of the month and a part after the end of the month.
The view ALL_PERIODS lists 100 periods per plan according to this system.
My problem is the performance of the following select which I would like to use in a view:
SELECT
t2.PLAN_NR
, t2.PERIOD_NR
, a_p.START_DATE
from TABLE_2 as t2
left outer join ALL_PERIODS a_p on t2.PERIOD_NR = a_p.PERIOD_NR and t2.PLAN_NR = a_p.PLAN_NR
From about 4000 entries in TABLE_2 the select becomes incredibly slow.
The join itself does not yet slow down the query. Only with the additional select a_p.START_DATE everything becomes incredibly slow.
I read the view into a temporary table and did the join over that and got no performance issues. (2 seconds for the 4000 entries).
So I assume that the CTE used in the view is the reason for the slow performance.
Unfortunately I can't use temporary tables in views and I would hate to write the data to a normal table.
Is there a way in SQL Server to improve the CTE lag?
Instead of a recusive CTE, generate ALL_PERIODS with a CROSS join between the Plan table and a "number table" either persisted, or as a non-recursive CTE.
EG
WITH N As
(
select top 100 row_number() over (order by (select null)) i
from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10) ) v1(i),
(values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10) ) v2(i)
),
plan_period AS
(
SELECT
PLAN_NR, START_DATE,
N.i period_nr, DATEADD(day, 7*N.i, START_DATE) next_date
FROM TABLE_1 CROSS JOIN N
),
if you are able to modify the view I would recommend to do this :
add a table containing numbers starting from 0 to whatever you think you will need in database, you can use below command :
create table numbers ( id int)
go
;with cte (
select 0 num
union all
select num + 1
where num < 2000 -- change this
)
insert into number
from num from cte
change the first cte in the view to this :
WITH rec_cte AS
(
SELECT
PLAN_NR
, DATEADD(DAY, 7* id, START_DATE) START_DATE
, id +1 period_nr
, DATEADD(DAY, 7*( id+1), START_DATE) next_date
FROM
TABLE_1 t
CROSS apply intenum i
WHERE i.id <100
),...
Also consider using temp table instead of cte it might be helpful

Count the number of times a date is contained between 2 date columns

I have a table that looks like this
ID start_dt end_dt
--------------------------
1 1951-12-05 1951-12-21
2 1951-12-19 1951-12-31
3 1957-12-05 1957-12-19
4 1995-12-06 1995-12-20
5 1996-06-24 1996-07-08
6 1997-05-12 1997-05-26
7 1997-10-07 1997-10-21
8 1997-12-25 1998-01-08
9 1998-01-19 1998-02-02
10 1998-08-05 1998-08-19
I'd like to know how many times each individual date is contained between start_dt and end_dt.
From my example, the result set should look something like this
date count
------------------
1951-12-05 1
1951-12-06 1
...
1951-12-19 2
1951-12-20 2
1951-12-21 2
...
1998-08-19 1
What would be the best way to do this?
EDIT: To clarify, I need each date that appears at least once in a date range (between start_dt and end_dt) to get a row in my result set and I want the number of ranges that this date fits in next to it
hope this helps
When you need to turn 2 values (a range) into a series of rows you can use a number table (see Aaron Bertrand's The SQL Server Numbers Table article if you aren't familiar with the idea).
I've used shorter and simpler data but you should get the idea.
declare #dates table (id int not null, start_dt date not null, end_dt date not null)
insert #dates values (1, '20160601', '20160603'),
(2, '20160603', '20160605'),
(3, '20160610', '20160612')
;with cte as (
select
row_number() over (order by so1.object_id) - 1 as n
from
sys.objects so1
cross join sys.objects so2
)
select
dateadd(d, c.n, d.start_dt) as [date],
count(*)
from
#dates d
join cte c on dateadd(d, c.n, d.start_dt) <= d.end_dt
group by
dateadd(d, c.n, d.start_dt)
order by
dateadd(d, c.n, d.start_dt)
If there are no more than a few days (< 80 or so, depending in your sys.objects table) between start_dt and end_dt, you can use this approach (inspired on Rhys').
DECLARE #dates TABLE (id int not null, start_dt date not null, end_dt date not null)
INSERT #dates VALUES
(1, '1951-12-05', '1951-12-21'),
(2, '1951-12-19', '1951-12-31'),
(3, '1957-12-05', '1957-12-19'),
(4, '1995-12-06', '1995-12-20'),
(5, '1996-06-24', '1996-07-08'),
(6, '1997-05-12', '1997-05-26'),
(7, '1997-10-07', '1997-10-21'),
(8, '1997-12-25', '1998-01-08'),
(9, '1998-01-19', '1998-02-02'),
(10, '1998-08-05', '1998-08-19');
WITH RawData AS (
SELECT
DATEADD(d, n.n, d.start_dt) AS [date]
FROM #dates d
INNER JOIN (
SELECT ROW_NUMBER() OVER (ORDER BY object_id) - 1 AS n FROM sys.objects
) n ON DATEADD(d, n.n, d.start_dt) <= d.end_dt
)
SELECT [date], COUNT(*) [count]
FROM RawData
GROUP BY [date]
ORDER BY [date]
I don't think this could take long even with 1000 date ranges. Perhaps you are using a table with more fields and even missing some index?
You could use a CTE
WITH CTE AS(SELECT start_dt AS dates FROM Table
UNION ALL
SELECT end_dt AS dates FROM Table)
SELECT CAST(dates as DATE) as Date, COUNT(dates) AS Count
FROM CTE c
GROUP BY c.dates
order by Count desc
Or perhaps you need something broader if your columns are of DATETIME data type. This way will GROUP BY the whole day:
WITH CTE AS(SELECT CAST(start_dt AS DATE) AS dates FROM Table
UNION ALL
SELECT CAST(end_dt AS DATE) AS dates FROM Table)
SELECT Dates as Date, COUNT(Dates) AS Count
FROM CTE c
GROUP BY c.dates
order by Count desc

SQL incorrect syntax in tableau

I have this error message when I insert my SQL query in Tableau. How to solve this issue?
Is it SQL Server currently does not allow CTE's inside a subquery??
Below is the Tableau output when I insert query in
[Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near the keyword 'WITH'.
[Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near the keyword 'with'. If this statement is a common table
expression, an xmlnamespaces clause or a change tracking context
clause, the previous statement must be terminated with a semicolon.
[Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ')'.
Below is my current CTE Query (recursive)
WITH shiftHours AS (
SELECT RowID,
y.EMPLOYEENAME AS EMPLOYEENAME,
-- flatten the first hour to remove the minutes and get the initial current hour
DATEADD(hour, DATEDIFF(hour, 0, ShiftA_Start), 0) AS currentHour,
ShiftA_Start,
ShiftA_End,
DATEPART(hour, ShiftA_Start) AS hourOrdinal,
-- determine how much of the first hour is applicable. if it is minute 0 then the whole hour counts
CAST(CASE
WHEN DATEADD(hour, DATEDIFF(hour, 0, ShiftA_Start), 0) = DATEADD(hour, DATEDIFF(hour, 0, ShiftA_End), 0) THEN DATEDIFF(minute, ShiftA_Start, ShiftA_End) / 60.0
WHEN DATEPART(minute, ShiftA_Start) = 0 THEN 1.0
ELSE (60 - DATEPART(minute, ShiftA_Start)) / 60.0
END AS DECIMAL(5,3)) AS hourValue
FROM (
-- use a ROW_NUMBER() to generate row IDs for the shifts to ensure each row is unique once it gets to the pivot
SELECT ROW_NUMBER() OVER(ORDER BY ShiftA_Start, ShiftA_End) AS RowID,
EMPLOYEENAME,
ShiftA_Start,
ShiftA_End
FROM (
-- this is where the data gets pulled from the source table and where the data types are converted from string to DATETIME
SELECT
EMPLOYEENAME,
CONVERT(DATETIME, LEFT(SHIFTA_start, 17), 103) AS ShiftA_Start,
CONVERT(DATETIME, LEFT(SHIFTA_end, 17), 103) AS ShiftA_End
from [TableName].[dbo].[TMS_People]
where
CONVERT(DATETIME, LEFT(SHIFTA_START, 17), 103) IS NOT NULL AND CONVERT(DATETIME, LEFT(SHIFTA_END, 17), 103) IS NOT NULL
AND CONVERT(DATETIME, LEFT(SHIFTA_START, 17), 103) != CONVERT(DATETIME, LEFT(SHIFTA_END, 17), 103)
-- this is also where you would add any filtering from the source table such as date ranges
) x
) AS y
UNION ALL
SELECT RowID,
EMPLOYEENAME,
-- add an hour to the currentHour each time the recursive CTE is called
DATEADD(hour, 1, currentHour) AS currentHour,
ShiftA_Start,
ShiftA_End,
DATEPART(hour, DATEADD(hour, 1, currentHour)) AS hourOrdinal,
CAST(CASE
-- when this is the last time period determine the amount of the hour that is applicable
WHEN DATEADD(hour, 2, currentHour) > ShiftA_End THEN DATEPART(minute, ShiftA_End) / 60.0
ELSE 1
END AS DECIMAL(5,3)) AS hourValue
from shiftHours
-- contine recursion until the next hour is after the ShiftEnd
WHERE DATEADD(hour, 1, currentHour) < ShiftA_End
)
SELECT *
FROM (
SELECT RowID,
EMPLOYEENAME,
ShiftA_Start,
ShiftA_End,
hourValue,
hourOrdinal
from shiftHours
) AS t
PIVOT (
SUM(hourValue)
FOR hourOrdinal IN ([0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23])
) AS pvt
OPTION (MAXRECURSION 0);
Tableau do not support SQL server CTE feature .
You should try to implement same logic using view or sub query or Stored proc/table valued function .
Please go throw below thread CTE (SQL Server) are not sported in Tableau.
https://community.tableau.com/thread/105965 .
View Sample code. for CTE
If we have user table (userId, userName, managerId)
CREATE VIEW vUSER AS
WITH UserCTE AS (
SELECT userId, userName, managerId,0 AS steps
FROM dbo.Users
WHERE userId = null
UNION ALL
SELECT mgr.userId, mgr.userName, mgr.managerId, usr.steps +1 AS steps
FROM UserCTE AS usr
INNER JOIN dbo.Users AS mgr
ON usr.managerId = mgr.userId
)
SELECT * FROM UserCTE ;
Create an sp ie MyProc and put all your query in it
ie
CREATE procedure dbo.MyProc AS
WITH UserCTE AS (
SELECT userId, userName, managerId,0 AS steps
FROM dbo.Users
WHERE userId = null
UNION ALL
SELECT mgr.userId, mgr.userName, mgr.managerId, usr.steps +1 AS steps
FROM UserCTE AS usr
INNER JOIN dbo.Users AS mgr
ON usr.managerId = mgr.userId
)
SELECT * FROM UserCTE ;
2 Create a link server using sp_addlinkedserver ie i am telling is Local
3 Call that sp in your view using openquery.
CREATE VIEW dbo.MyView
AS(
SELECT *
FROM openquery(Local,'exec mydb.dbo.MyProc'))
http://capnjosh.com/blog/how-to-call-a-stored-procedure-from-a-view-in-sql-server/

TSQL Finding maximum values in multiple rows

In my query I need to find the supplier with the highest costs for every single year.
SELECT YEAR(ORDERS.OrderDate),
MAX(ORDERS.Freight) AS [Greatest cost]
FROM ORDERS
GROUP BY YEAR(ORDERS.OrderDate)
ORDER BY YEAR(ORDERS.OrderDate) ASC
This code does give me the maximum cost per year, it doesn't give me the name of the supplier.
SELECT YEAR(ORDERS.OrderDate),
SHIPPERS.ShipperID,
SHIPPERS.CompanyName,
MAX(ORDERS.Freight) AS [Greatest cost]
FROM ORDERS, SHIPPERS
WHERE SHIPPERS.ShipperID = ORDERS.ShipVia
GROUP BY YEAR(ORDERS.OrderDate),
SHIPPERS.ShipperID,
SHIPPERS.CompanyName
ORDER BY YEAR(ORDERS.OrderDate) ASC
This code then gave me too much, as in, it gave me all the suppliers (with their highest numbers) for every single year, while I need the highest supplier per year.
Thanks in advance!
There are likely several ways to do this. Here's one: http://sqlfiddle.com/#!6/47d38/3/0
Test data:
create table ORDERS
(
OrderDate datetime,
ShipVia int,
Freight int
);
create table SHIPPERS
(
ShipperID int,
CompanyName nvarchar(100)
);
insert SHIPPERS values (1, 'Shipper1'), (2, 'Shipper2'), (3, 'Shipper3');
insert ORDERS values
('2011-2-1', 1, 10),
('2011-3-1', 1, 20),
('2011-2-2', 2, 5),
('2011-3-2', 2, 10),
('2011-2-3', 3, 18),
('2012-2-1', 1, 10),
('2012-3-1', 1, 20),
('2012-2-2', 2, 25),
('2012-3-2', 2, 40),
('2012-2-3', 3, 18);
Query:
with A as
(
select
YEAR(O.OrderDate) as [year],
S.ShipperID,
SUM(O.Freight) as [totalFreight]
from ORDERS as O
join SHIPPERS as S on O.ShipVia = S.ShipperId
group by YEAR(O.OrderDate), S.ShipperId
)
select A.*, S.CompanyName
from A
join SHIPPERS as S on A.ShipperID = S.ShipperID
where A.totalFreight >=ALL
(select totalFreight from A as Ainner where A.[year] = Ainner.[year]);
Results:
year ShipperID totalFreight CompanyName
2011 1 30 Shipper1
2012 2 65 Shipper2
select * from
(
SELECT YEAR(ORDERS.OrderDate),
SHIPPERS.ShipperID,
SHIPPERS.CompanyName,
MAX(ORDERS.Freight) over (partition by YEAR(ORDERS.OrderDate), SHIPPERS.ShipperID) as max,
row_number() over (partition by YEAR(ORDERS.OrderDate),
order by ORDERS.Freight desc) as rn
FROM ORDERS, SHIPPERS
WHERE SHIPPERS.ShipperID = ORDERS.ShipVia
) tt
where tt.rn = 1

Multiple Column Pivot in T-SQL

I am working with a table where there are multiple rows that I need pivoted into columns. So the pivot is the perfect solution for this, and works well when all I need is one field. I am needing to return several fields based upon the pivot. Here is the pseudo code with specifics stripped out:
SELECT
field1,
[1], [2], [3], [4]
FROM
(
SELECT
field1,
field2,
(ROW_NUMBER() OVER(PARTITION BY field1 ORDER BY field2)) RowID
FROM tblname
) AS SourceTable
PIVOT
(
MAX(field2)
FOR RowID IN ([1], [2], [3], [4])
) AS PivotTable;
The above syntax works brilliantly, but what do I do when I need to get additional information found in field3, field4....?
Rewrite using MAX(CASE...) and GROUP BY:
select
field1
, [1] = max(case when RowID = 1 then field2 end)
, [2] = max(case when RowID = 2 then field2 end)
, [3] = max(case when RowID = 3 then field2 end)
, [4] = max(case when RowID = 4 then field2 end)
from (
select
field1
, field2
, RowID = row_number() over (partition by field1 order by field2)
from tblname
) SourceTable
group by
field1
From there you can add in field3, field4, etc.
The trick to doing multiple pivots over a row_number is to modify that row number sequence to store both the sequence and the field number. Here's an example that does what you want with multiple PIVOT statements.
-- populate some test data
if object_id('tempdb..#tmp') is not null drop table #tmp
create table #tmp (
ID int identity(1,1) not null,
MainField varchar(100),
ThatField int,
ThatOtherField datetime
)
insert into #tmp (MainField, ThatField, ThatOtherField)
select 'A', 10, '1/1/2000' union all
select 'A', 20, '2/1/2000' union all
select 'A', 30, '3/1/2000' union all
select 'B', 10, '1/1/2001' union all
select 'B', 20, '2/1/2001' union all
select 'B', 30, '3/1/2001' union all
select 'B', 40, '4/1/2001' union all
select 'C', 10, '1/1/2002' union all
select 'D', 10, '1/1/2000' union all
select 'D', 20, '2/1/2000' --union all
-- pivot over multiple columns using the 1.1, 1.2, 2.1, 2.2 sequence trick
select
MainField,
max([1.1]) as ThatField1,
max([1.2]) as ThatOtherField1,
max([2.1]) as ThatField2,
max([2.2]) as ThatOtherField2,
max([3.1]) as ThatField3,
max([3.2]) as ThatOtherField3,
max([4.1]) as ThatField4,
max([4.2]) as ThatOtherField4
from
(
select x.*,
cast(row_number() over (partition by MainField order by ThatField) as varchar(2)) + '.1' as ThatFieldSequence,
cast(row_number() over (partition by MainField order by ThatField) as varchar(2)) + '.2' as ThatOtherFieldSequence
from #tmp x
) a
pivot (
max(ThatField) for ThatFieldSequence in ([1.1], [2.1], [3.1], [4.1])
) p1
pivot (
max(ThatOtherField) for ThatOtherFieldSequence in ([1.2], [2.2], [3.2], [4.2])
) p2
group by
MainField
I am unsure if you are using MS SQL Server, but if you are... You may want to take a look at the CROSS APPLY functionality of the engine. Basically it will allow you to apply the results of a table-valued UDF to a result set. This would require you to put your pivot query into a table-valued result set.
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
wrap your sql statement with something like:
select a.segment, sum(field2), sum(field3)
from (original select with case arguments) a
group by a.segment
It should collapse your results into one row, grouped on field1.
It is possible to pivot on multiple columns, but you need to be careful about reusing the pivot column across multiple pivots. Here is a good blog post on the subject:
http://pratchev.blogspot.com/2009/01/pivoting-on-multiple-columns.html

Resources