Creating blocks within a CTE - SQL Server

I am trying to work out how to tag unique blocks (or segments, if you will), where a block is a run of consecutive rows for the same 'trip', ordered by 'epoch', that share the same 'code'. A plain GROUP BY on 'trip', 'code' will not work here, because I need to measure how long the 'code' remains constant during the trip. I've tried to use a CTE, but I have been unable to partition the data in a way that gives the desired result shown below. The block number could be any value, so long as it is unique and tags the consecutive occurrences of the same 'code' on the trip in order of 'epoch'.
Any ideas?
declare @data table (id int, trip int, code int NULL, epoch int, value1 int, value2 int);
insert into @data (id, trip, code, epoch, value1, value2)
values
(1, 1, null, 31631613, 0, 0),
(2, 2, 1, 31631614, 10, 40),
(3, 1, 1, 31631616, 10, 60),
(4, 1, 1, 31631617, 40, 60),
(5, 2, 1, 31631617, 23, 40),
(6, 2, 2, 31631620, 27, 40),
(7, 2, 2, 31631629, 23, 40),
(9, 1, 1, 31631618, 39, 60),
(10, 1, null, 31631621, 38, 60),
(12, 1, null, 31631625, 37, 60),
(15, 1, null, 31631627, 35, 60),
(19, 1, 1, 31631630, 39, 60),
(20, 1, 1, 31631632, 40, 60),
(21, 2, 1, 31631629, 23, 40);
Desired result:
block id trip code epoch value1 value2
1 1 1 NULL 31631613 0 0
2 2 2 1 31631614 10 40
2 5 2 1 31631617 23 40
3 3 1 1 31631616 10 60
3 4 1 1 31631617 40 60
3 9 1 1 31631618 39 60
4 6 2 2 31631620 27 40
4 7 2 2 31631629 23 40
5 10 1 NULL 31631621 38 60
5 12 1 NULL 31631625 37 60
5 15 1 NULL 31631627 35 60
6 19 1 1 31631630 39 60
6 20 1 1 31631632 40 60
7 21 2 1 31631629 23 40

You didn't update your expected output so I'm still not 100% sure this is what you want, but give it a try...
SELECT
DENSE_RANK() OVER (ORDER BY trip, code),
*
FROM
@data
ORDER BY
trip, code, epoch

OK, it's far from perfect, but it is a start that at least identifies the start and end of each contiguous block where the 'code' has remained the same for the trip. For the sake of contributing something, I'll post what I jury-rigged. If I ever get time to do a proper job, I'll post it.
declare @minint int; set @minint = -2147483648;
declare @maxint int; set @maxint = 2147483647;
declare @id_data table (pk int IDENTITY(1,1), id int, trip int, code int NULL, epoch int, value1 int, value2 int);
insert into @id_data VALUES(@minint, @minint, @minint, @minint, @minint, @minint);
insert into @id_data
SELECT id, trip, coalesce(code,0), epoch, value1, value2
FROM @data
order by trip, epoch, code;
insert into @id_data VALUES(@maxint, @maxint, @maxint, @maxint, @maxint, @maxint);
WITH CTE as
(
SELECT pk, id, trip, code, epoch, value1, value2, ROW_NUMBER() OVER (PARTITION BY trip ORDER BY epoch) as row_num
FROM @id_data
)
SELECT B.*, A.code, C.min_next_code
FROM CTE A
INNER JOIN CTE B ON (B.pk = A.pk + 1) AND (A.code != B.code) -- SELECTS THE RECORDS THAT START A NEW GROUP
OUTER APPLY (
SELECT min_next_code = MIN(pk) - 1 -- LOCATION OF NEXT GROUP
FROM CTE
WHERE pk > B.pk AND (trip = B.trip) AND (code != B.code)
) C
WHERE B.id < @maxint
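For comparison, a common set-based way to tag such blocks is the "gaps and islands" trick: the difference between a row number taken per trip and a row number taken per trip and code stays constant within each run of consecutive rows. The sketch below is not taken from either answer above. It runs the idea on the question's sample data through Python's built-in sqlite3 module, assuming a SQLite build with window functions (3.25 or later), and it breaks the epoch tie between ids 7 and 21 by id, which is an assumption.

```python
import sqlite3

# Sample rows from the question: (id, trip, code, epoch, value1, value2)
ROWS = [
    (1, 1, None, 31631613, 0, 0), (2, 2, 1, 31631614, 10, 40),
    (3, 1, 1, 31631616, 10, 60), (4, 1, 1, 31631617, 40, 60),
    (5, 2, 1, 31631617, 23, 40), (6, 2, 2, 31631620, 27, 40),
    (7, 2, 2, 31631629, 23, 40), (9, 1, 1, 31631618, 39, 60),
    (10, 1, None, 31631621, 38, 60), (12, 1, None, 31631625, 37, 60),
    (15, 1, None, 31631627, 35, 60), (19, 1, 1, 31631630, 39, 60),
    (20, 1, 1, 31631632, 40, 60), (21, 2, 1, 31631629, 23, 40),
]

# grp is constant inside each run of consecutive rows with the same code,
# so (trip, code, grp) identifies one block.
ISLANDS_SQL = """
WITH numbered AS (
    SELECT id, trip, code, epoch,
           ROW_NUMBER() OVER (PARTITION BY trip ORDER BY epoch, id)
         - ROW_NUMBER() OVER (PARTITION BY trip, code ORDER BY epoch, id) AS grp
    FROM data
)
SELECT trip, code, MIN(epoch) AS start_epoch, MAX(epoch) AS end_epoch,
       COUNT(*) AS row_count
FROM numbered
GROUP BY trip, code, grp
ORDER BY trip, start_epoch
"""


def find_blocks(rows):
    """Return one (trip, code, start_epoch, end_epoch, row_count) per block."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INT, trip INT, code INT, epoch INT,"
        " value1 INT, value2 INT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)", rows)
    blocks = conn.execute(ISLANDS_SQL).fetchall()
    conn.close()
    return blocks


blocks = find_blocks(ROWS)
for number, block in enumerate(blocks, start=1):
    print(number, block)
```

The duration of each block is end_epoch minus start_epoch. ROW_NUMBER() and the CTE are also available in SQL Server, so the SELECT should port with little change, but that port is untested here.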


Setting variables in snowflake

I want to define variables before a CTE table and after a CTE table because some variables are dependent on the result of the CTE table. For example
SET(K,B) = (5,2);
with my_data(Key,Index,Value) as (
-- data table as cte
select * from values
(1, 3, 10),
(1, 5, 18),
(1, 14, 4),
(2, 2, 11),
(2, 13, 24),
(2, 29, 40)
)
SELECT VALUE + $K
FROM my_data
This example works perfectly. But this code:
SET(K,B) = (5,2);
with my_data(Key,Index,Value) as (
-- data table as cte
select * from values
(1, 3, 10 ),
(1, 5, 18 ),
(1, 14, 4 ),
(2, 2, 11 ),
(2, 13, 24),
(2, 29, 40)
)
SET AVG_VAL = (SELECT AVG(VALUE) FROM my_data);
SELECT VALUE + $AVG_VAL
FROM my_data
doesn't, because Snowflake gives me this error:
"SQL compilation error: syntax error line 34 at position 0 unexpected 'SET'."
Should I create a temporary table to store the result of this query (SELECT AVG(VALUE) FROM my_data) in it and then include/use this temporary table for future queries instead of a variable?
Your "CTE" is not a standalone "thing"; it only exists in the context of a SELECT.
Thus
WITH cte_x AS (...)
SELECT * FROM cte_x
is one SELECT which has a CTE attached to it.
Thus for your variable assignment, the CTE has to be inside the parentheses. Start with the complete query:
with my_data(Key,Index,Value) as (
select * from values
(1, 3, 10 ),
(1, 5, 18 ),
(1, 14, 4 ),
(2, 2, 11 ),
(2, 13, 24),
(2, 29, 40)
)
SELECT AVG(VALUE) FROM my_data;
AVG(VALUE)
17.833333
Given that this is a discrete chunk of SQL, it can be captured into the variable:
set AVG_VAL = (
with my_data(Key,Index,Value) as (
select * from values
(1, 3, 10 ),
(1, 5, 18 ),
(1, 14, 4 ),
(2, 2, 11 ),
(2, 13, 24),
(2, 29, 40)
)
SELECT AVG(VALUE) FROM my_data
);
status
Statement executed successfully.
now we can use that value:
select $AVG_VAL * 2;
$AVG_VAL * 2
35.666666
But the next query:
SELECT VALUE + $AVG_VAL
FROM my_data
002003 (42S02): SQL compilation error:
Object 'MY_DATA' does not exist or not authorized.
has no CTE called my_data, so that needs to be included:
with my_data(Key,Index,Value) as (
select * from values
(1, 3, 10 ),
(1, 5, 18 ),
(1, 14, 4 ),
(2, 2, 11 ),
(2, 13, 24),
(2, 29, 40)
)
SELECT VALUE + $AVG_VAL
FROM my_data
If you want a table that can be "used twice" you will need an actual table, at which point I would suggest a temporary table so that it only exists in the context of this session.
That is the nature of Pankaj's answer (either via a permanent or a temp table).
This can be done as in -
select * from d2;
+-----+-----+
| ID1 | ID2 |
|-----+-----|
| 1 | 2 |
| 100 | 2 |
| 3 | 4 |
| 300 | 4 |
+-----+-----+
Setting variable -
set (var1) = (select sum(id2) from d2);
+----------------------------------+
| status |
|----------------------------------|
| Statement executed successfully. |
+----------------------------------+
Using variable -
select id1+$var1 from d2;
+-----------+
| ID1+$VAR1 |
|-----------|
| 13 |
| 112 |
| 15 |
| 312 |
+-----------+
An alternative approach is to simply use the windowed AVG function:
with my_data(Key,Index,Value) as (
-- data table as cte
select * from values
(1, 3, 10),
(1, 5, 18),
(1, 14, 4),
(2, 2, 11),
(2, 13, 24),
(2, 29, 40)
)
SELECT VALUE, AVG(VALUE) OVER(),
VALUE + AVG(VALUE) OVER()
FROM my_data;
OVER() means that the window used to compute the average spans all rows.
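To see the effect of the empty OVER() without a Snowflake session, here is a small sketch that runs an equivalent query through Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). The column names k and idx are stand-ins for Key and Index, which are keywords in SQLite; everything else mirrors the query above.

```python
import sqlite3

# Same six rows as the my_data CTE above; k and idx replace Key and Index.
WINDOW_SQL = """
WITH my_data(k, idx, value) AS (
    VALUES (1, 3, 10), (1, 5, 18), (1, 14, 4),
           (2, 2, 11), (2, 13, 24), (2, 29, 40)
)
SELECT value,
       AVG(value) OVER () AS avg_val,
       value + AVG(value) OVER () AS shifted
FROM my_data
ORDER BY k, idx
"""


def value_plus_average():
    """Return (value, overall average, value + overall average) per row."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(WINDOW_SQL).fetchall()
    conn.close()
    return rows


for row in value_plus_average():
    print(row)
```

Every row carries the same average (107 / 6), because the empty window covers the whole result set, so no variable and no second pass over the CTE is needed.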

Grouped result in recursive query (SQL Server)

I have a recursive query that works as intended for calculating the weighted average cost of inventory. My problem is that I need multiple weighted averages from the same query, grouped by different columns. I know I can solve the issue by calculating it multiple times, once for each key column. But because of query performance considerations, I want the data to be traversed only once. Sometimes I have 1M+ rows.
I have simplified the data and replaced the weighted average with a simple sum to make my problem easier to follow.
How can I get the result below using a recursive CTE? Remember that I have to use a recursive query to calculate the weighted average cost. I am on SQL Server 2016.
Example data (Id is also the sort order. The Id and Key are unique together.)
Id Key1 Key2 Key3 Value
1 1 1 1 10
2 1 1 1 10
3 1 2 1 10
4 2 2 1 10
5 1 2 1 10
6 1 1 2 10
7 1 1 1 10
8 3 3 1 10
Expected result
Id Key1 Key2 Key3 Value Key1Sum Key2Sum Key3Sum
1 1 1 1 10 10 10 10
2 1 1 1 10 20 20 20
3 1 2 1 10 30 10 30
4 2 2 1 10 10 20 40
5 1 2 1 10 40 30 50
6 1 1 2 10 50 30 10
7 1 1 1 10 60 40 60
8 3 3 1 10 10 10 70
EDIT
After some well-deserved criticism, I have to get much better at asking questions.
Here is an example that shows why I need a recursive query. In the example I get the result for Key1, but I need it for Key2 and Key3 as well in the same query. I know that I can repeat the same query three times, but that is not preferable.
DECLARE @InventoryItem AS TABLE (
IntentoryItemId INT NULL,
InventoryOrder INT,
Key1 INT NULL,
Key2 INT NULL,
Key3 INT NULL,
Quantity NUMERIC(22,9) NOT NULL,
Price NUMERIC(16,9) NOT NULL
);
INSERT INTO @InventoryItem (
IntentoryItemId,
InventoryOrder,
Key1,
Key2,
Key3,
Quantity,
Price
)
VALUES
(1, NULL, 1, 1, 1, 10, 1),
(2, NULL, 1, 1, 1, 10, 2),
(3, NULL, 1, 2, 1, 10, 2),
(4, NULL, 2, 2, 1, 10, 1),
(5, NULL, 1, 2, 1, 10, 5),
(6, NULL, 1, 1, 2, 10, 3),
(7, NULL, 1, 1, 1, 10, 3),
(8, NULL, 3, 3, 1, 10, 1);
--The steps below will give me the cost "grouped" by Key1
WITH Key1RowNumber AS (
SELECT
IntentoryItemId,
ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY IntentoryItemId) AS RowNumber
FROM @InventoryItem
)
UPDATE @InventoryItem
SET InventoryOrder = Key1RowNumber.RowNumber
FROM @InventoryItem InventoryItem
INNER JOIN Key1RowNumber
ON Key1RowNumber.IntentoryItemId = InventoryItem.IntentoryItemId;
WITH cte AS (
SELECT
IntentoryItemId,
InventoryOrder,
Key1,
Quantity,
Price,
CONVERT(NUMERIC(22,9), InventoryItem.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price) / NULLIF(InventoryItem.Quantity, 0)) AS AvgPrice
FROM @InventoryItem InventoryItem
WHERE InventoryItem.InventoryOrder = 1
UNION ALL
SELECT
Sub.IntentoryItemId,
Sub.InventoryOrder,
Sub.Key1,
Sub.Quantity,
Sub.Price,
CONVERT(NUMERIC(22,9), Main.CurrentQuantity + Sub.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9),
((Main.CurrentQuantity) * Main.AvgPrice + Sub.Quantity * Sub.price)
/
NULLIF((Main.CurrentQuantity) + Sub.Quantity, 0)
) AS AvgPrice
FROM CTE Main
INNER JOIN @InventoryItem Sub
ON Main.Key1 = Sub.Key1
AND Sub.InventoryOrder = main.InventoryOrder + 1
)
SELECT cte.IntentoryItemId, cte.AvgPrice
FROM cte
ORDER BY IntentoryItemId
Why do you want to calculate over 1M+ rows?
Secondly, I think your database design is wrong: Key1, Key2 and Key3 should have been unpivoted into one column called Keys, with one more column to identify each key group.
This will become clear in the example below.
If I can optimize my query, I would consider calculating over many rows; otherwise I would try to limit the number of rows.
Also, if possible, consider storing the average price in a calculated column, i.e. calculate and store it when the table is populated.
First, let us know whether the output is correct.
DECLARE @InventoryItem AS TABLE (
IntentoryItemId INT NULL,
InventoryOrder INT,
Key1 INT NULL,
Key2 INT NULL,
Key3 INT NULL,
Quantity NUMERIC(22,9) NOT NULL,
Price NUMERIC(16,9) NOT NULL
);
INSERT INTO @InventoryItem (
IntentoryItemId,
InventoryOrder,
Key1,
Key2,
Key3,
Quantity,
Price
)
VALUES
(1, NULL, 1, 1, 1, 10, 1),
(2, NULL, 1, 1, 1, 10, 2),
(3, NULL, 1, 2, 1, 10, 2),
(4, NULL, 2, 2, 1, 10, 1),
(5, NULL, 1, 2, 1, 10, 5),
(6, NULL, 1, 1, 2, 10, 3),
(7, NULL, 1, 1, 1, 10, 3),
(8, NULL, 3, 3, 1, 10, 1);
--select * from @InventoryItem
--return
;with cte as
(
select *
, ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY IntentoryItemId) AS rn1
, ROW_NUMBER() OVER (PARTITION BY Key2 ORDER BY IntentoryItemId) AS rn2
, ROW_NUMBER() OVER (PARTITION BY Key3 ORDER BY IntentoryItemId) AS rn3
from @InventoryItem
)
,cte1 AS (
SELECT
IntentoryItemId,
Key1 keys,
Quantity,
Price
,rn1
,rn1 rn
,1 pk
FROM cte c
union ALL
SELECT
IntentoryItemId,
Key2 keys,
Quantity,
Price
,rn1
,rn2 rn
,2 pk
FROM cte c
union ALL
SELECT
IntentoryItemId,
Key3 keys,
Quantity,
Price
,rn1
,rn3 rn
,3 pk
FROM cte c
)
, cte2 AS (
SELECT
IntentoryItemId,
rn,
Keys,
Quantity,
Price,
CONVERT(NUMERIC(22,9), InventoryItem.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price)) a,
CONVERT(NUMERIC(22,9), InventoryItem.Price) b,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price) / NULLIF(InventoryItem.Quantity, 0)) AS AvgPrice
,pk
FROM cte1 InventoryItem
WHERE InventoryItem.rn = 1
UNION ALL
SELECT
Sub.IntentoryItemId,
sub.rn,
Sub.Keys,
Sub.Quantity,
Sub.Price,
CONVERT(NUMERIC(22,9), Main.CurrentQuantity + Sub.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9),Main.CurrentQuantity * Main.AvgPrice),
CONVERT(NUMERIC(22,9),Sub.Quantity * Sub.price),
CONVERT(NUMERIC(22,9),
((Main.CurrentQuantity * Main.AvgPrice) + (Sub.Quantity * Sub.price))
/
NULLIF(((Main.CurrentQuantity) + Sub.Quantity), 0)
) AS AvgPrice
,sub.pk
FROM CTE2 Main
INNER JOIN cte1 Sub
ON Main.Keys = Sub.Keys and main.pk=sub.pk
AND Sub.rn = main.rn + 1
--and Sub.InventoryOrder<=2
)
select *
,(select AvgPrice from cte2 c1 where pk=2 and c1.IntentoryItemId=c.IntentoryItemId ) AvgPrice2
,(select AvgPrice from cte2 c1 where pk=3 and c1.IntentoryItemId=c.IntentoryItemId ) AvgPrice3
from cte2 c
where pk=1
ORDER BY pk,rn
Alternate solution (for SQL Server 2012+), with many thanks to Jason:
SELECT *
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key1 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey1Price
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key2 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey2Price
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key3 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey3Price
from @InventoryItem
order by IntentoryItemId
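One caveat on the windowed version: AVG((Quantity * Price) / Quantity) is the running mean of Price, which matches the recursive result only while all quantities are equal, as they are in the sample. The recursion in the question telescopes to running SUM(Quantity * Price) divided by running SUM(Quantity), so a quantity-weighted variant can be written with two windowed sums per key. The sketch below is an assumption-laden illustration, not the answerer's code: it runs in SQLite through Python's sqlite3 module (3.28 or later for the WINDOW clause), uses the question's sample rows, and the table and column names are made up for the example.

```python
import sqlite3

# (item_id, key1, key2, key3, quantity, price) from the question's sample
ITEMS = [
    (1, 1, 1, 1, 10, 1), (2, 1, 1, 1, 10, 2), (3, 1, 2, 1, 10, 2),
    (4, 2, 2, 1, 10, 1), (5, 1, 2, 1, 10, 5), (6, 1, 1, 2, 10, 3),
    (7, 1, 1, 1, 10, 3), (8, 3, 3, 1, 10, 1),
]

# Running weighted average = running SUM(q * p) / running SUM(q), per key.
WEIGHTED_SQL = """
SELECT item_id,
       SUM(quantity * price) OVER w1 / NULLIF(SUM(quantity) OVER w1, 0) AS avg_key1,
       SUM(quantity * price) OVER w2 / NULLIF(SUM(quantity) OVER w2, 0) AS avg_key2,
       SUM(quantity * price) OVER w3 / NULLIF(SUM(quantity) OVER w3, 0) AS avg_key3
FROM inventory
WINDOW w1 AS (PARTITION BY key1 ORDER BY item_id
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
       w2 AS (PARTITION BY key2 ORDER BY item_id
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
       w3 AS (PARTITION BY key3 ORDER BY item_id
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY item_id
"""


def running_weighted_averages(items):
    """Return (item_id, avg_key1, avg_key2, avg_key3) for every item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (item_id INT, key1 INT, key2 INT, key3 INT,"
        " quantity REAL, price REAL)"
    )
    conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?, ?)", items)
    rows = conn.execute(WEIGHTED_SQL).fetchall()
    conn.close()
    return rows


for row in running_weighted_averages(ITEMS):
    print(row)
```

This only holds while quantities are purely additive, as in the question's recursion. On SQL Server 2016 the named windows would have to be written out inline in each OVER clause, since the WINDOW clause is, as far as I know, only available in newer versions.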
Here's how to do it in SQL Server 2012 & later...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Id INT,
Key1 INT,
Key2 INT,
Key3 INT,
[Value] INT
);
INSERT #TestData(Id, Key1, Key2, Key3, Value) VALUES
(1, 1, 1, 1, 10),
(2, 1, 1, 1, 10),
(3, 1, 2, 1, 10),
(4, 2, 2, 1, 10),
(5, 1, 2, 1, 10),
(6, 1, 1, 2, 10),
(7, 1, 1, 1, 10),
(8, 3, 3, 1, 10);
--=============================================================
SELECT
td.Id, td.Key1, td.Key2, td.Key3, td.Value,
Key1Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key1 ORDER BY td.Id ROWS UNBOUNDED PRECEDING),
Key2Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key2 ORDER BY td.Id ROWS UNBOUNDED PRECEDING),
Key3Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key3 ORDER BY td.Id ROWS UNBOUNDED PRECEDING)
FROM
#TestData td
ORDER BY
td.Id;
results...
Id Key1 Key2 Key3 Value Key1Sum Key2Sum Key3Sum
----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
1 1 1 1 10 10 10 10
2 1 1 1 10 20 20 20
3 1 2 1 10 30 10 30
4 2 2 1 10 10 20 40
5 1 2 1 10 40 30 50
6 1 1 2 10 50 30 10
7 1 1 1 10 60 40 60
8 3 3 1 10 10 10 70

How do you validate that range doesn't overlap in a list of data?

I have a list of data :
Id StartAge EndAge Amount
1 0 2 50
2 2 5 100
3 5 10 150
4 6 9 160
I have to set an Amount for various age groups.
The age group >0 and <=2 needs to pay 50
The age group >2 and <=5 needs to pay 100
The age group >5 and <=10 needs to pay 150
But
the age group >6 and <=9 paying 160 is an invalid input, because >6 and <=9 already falls within the range that pays 150.
I have to validate this kind of invalid input before inserting my data in bulk. Once the 5-10 range has been inserted, anything that falls within this range should not be accepted by the system. For example, given the list above, the user should be allowed to insert the 10-15 age group, but each of the following should be flagged as invalid:
6-9
6-11
3-5
5-7
If any invalid input exists in my list, the list should not be inserted at all.
You could try inserting your data into a temporary table variable first.
DECLARE @TempData TABLE
(
[Id] TINYINT
,[StartAge] TINYINT
,[EndAge] TINYINT
,[Amount] TINYINT
);
INSERT INTO @TempData ([Id], [StartAge], [EndAge], [Amount])
VALUES (1, 0, 2, 50)
,(2, 2, 5, 100)
,(3, 5, 10, 150)
,(4, 6, 9, 160);
Then this data is transferred to your target table using an INSERT INTO ... SELECT ... statement, skipping every row that overlaps an earlier row.
INSERT INTO <your target table>
SELECT * FROM @TempData s
WHERE
NOT EXISTS (
SELECT 1
FROM @TempData t
WHERE
t.[Id] < s.[Id]
AND s.[StartAge] < t.[EndAge]
AND s.[EndAge] > t.[StartAge]
);
I've created a demo here
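The overlap predicate in that NOT EXISTS is worth spelling out: two ranges overlap when each one starts before the other ends. A small Python sketch of the same check follows. It mirrors the SQL by comparing each row with all earlier rows in the batch, and the helper names are made up for illustration. Since the question wants the whole list rejected if any row is invalid, it also exposes an all-or-nothing check.

```python
def overlaps(a, b):
    """True when two (start, end] age ranges share at least one age."""
    return a[0] < b[1] and b[0] < a[1]


def find_invalid(rows):
    """Return the ids of rows that overlap any earlier row in the batch.

    rows is a list of (id, start_age, end_age) in insertion order.
    """
    invalid = []
    for position, (row_id, start, end) in enumerate(rows):
        earlier = rows[:position]
        if any(overlaps((start, end), (s, e)) for _, s, e in earlier):
            invalid.append(row_id)
    return invalid


def batch_is_valid(rows):
    """All-or-nothing rule from the question: one bad row rejects the list."""
    return not find_invalid(rows)


batch = [(1, 0, 2), (2, 2, 5), (3, 5, 10), (4, 6, 9)]
print(find_invalid(batch))    # ids that would be rejected
print(batch_is_valid(batch))  # whether the whole list may be inserted
```

Because the ranges are treated as open at the start and closed at the end, 2-5 followed by 5-10 is accepted, while 5-7 after 5-10 is not.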
We can use a recursive CTE to find how records are chained by end age and start age pairs:
DECLARE @DataSource TABLE
(
[Id] TINYINT
,[StartAge] TINYINT
,[EndAge] TINYINT
,[Amount] TINYINT
);
INSERT INTO @DataSource ([Id], [StartAge], [EndAge], [Amount])
VALUES (1, 0, 2, 50)
,(2, 2, 5, 100)
,(3, 5, 10, 150)
,(4, 6, 9, 160)
,(5, 6, 11, 160)
,(6, 3, 5, 160)
,(7, 5, 7, 160)
,(9, 10, 15, 20)
,(8, 7, 15, 20);
WITH PreDataSource AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY [StartAge] ORDER BY [id]) as [pos]
FROM @DataSource
), DataSource AS
(
SELECT [Id], [StartAge], [EndAge], [Amount], [pos]
FROM PreDataSource
WHERE [id] = 1
UNION ALL
SELECT R.[Id], R.[StartAge], R.[EndAge], R.[Amount], R.[pos]
FROM DataSource A
INNER JOIN PreDataSource R
ON A.[Id] < R.[Id]
AND A.[EndAge] = R.[StartAge]
AND R.[pos] =1
)
SELECT [Id], [StartAge], [EndAge], [Amount]
FROM DataSource;
This returns the rows that chain together by end age and start age.
Note that before this, we use the following statement to prepare the data:
SELECT *, ROW_NUMBER() OVER (PARTITION BY [StartAge] ORDER BY [id]) as [pos]
FROM @DataSource;
The idea is to find records with the same start age and to calculate which one was inserted first. Then, in the CTE, we take only the first.
Assuming you are bulk inserting the mentioned data into a temp table (#tmp) or table variable (@tmp).
If you are working on SQL Server 2012 or later, try the below.
select *
from(select *,lag(endage,1,0)over(order by endage) as [col1]
from #tmp)tmp
where startage>=col1 and endage>col1
The result of this query should be inserted into your main table.

SQL - Count and group records by month and field value from the last year

I need to count the total number of records in a table, 'a', where a field in 'a', say 'type', has a certain value, 'v'. I need to group all the records where a.type = 'v' twice: first by field 'b_id', and again by month. The date range for these records must be restricted to the last year from the current date.
I already have the totals for the 'b_id' field with ISNULL() as follows:
SELECT ISNULL((
SELECT COUNT(*)
FROM a
WHERE a.type = 'v'
AND b.b_id = a.b_id
), 0) AS b_totals
The data lies in table a, which is joined to table b. 'b_id' is the primary key of table b, and is also found in table a (though it is not part of a's key). The key for a is irrelevant to the data I need to pull, but can be stated as "a_id" for simplicity.
How do I:
Restrict these records to the past twelve months from the current date.
Take the total for any and all values of b.id, and categorize them by month. This is in addition to the totals of b.id by year. The date is stored in field "date_occurred" in table 'a' as a standard date/time type.
The schema at the end should look something like this, assuming that the current month is October and the year is 2016:
b.id | b_totals | Nov. 2015 | Dec. 2015 | Jan. 2016 .... Oct. 2016
__________________________________________________________________
ID_1 1 0 0 0 1
ID_2 3 2 0 1 0
ID_3 5 1 1 3 0
EDIT: I should probably clarify that I'm counting the records in table 'a' where field 'f' has a certain value 'v'. From these records, I need to group them by building, then by month/date. I updated my ISNULL query to make this clearer, as well as the keys for a and b. "date_occurred" should be in table a, not b; that was a typo on my end.
If it helps, the best way I can describe the data from a high level without giving away any sensitive data:
'b' is a table of locations, and 'b.b_id' is the ID for each location
'a' is a table of events. The location for these events is found in 'a.b_id' and joined on 'b.b_id'. The date that each event occurred is in 'a.date_occurred'
I need to restrict the type of events to a certain value. In this case, the type is field 'type.' This is the "where" clause in my ISNULL SQL query that gets the totals by location.
From all the events of this particular type, I need to count how many times this event occurred in the past year for each location. Once I have these totals from the past year, I need to count them by month.
Table structure:
The table structure of a is something like
a.a_id | a.b_id | a.type | a.date_occurred
Again, I do not need the ID's from a: just a series of counts based on type, b_id, and date_occurred.
EDIT 2: I restricted the totals of b_id to the past year with the following query:
SELECT ISNULL((
SELECT COUNT(*)
FROM a
WHERE a.type = 'v'
AND b.b_id = a.b_id
AND a.date_occurred BETWEEN DATEADD(yyyy, -1, GETDATE()) AND GETDATE()
), 0) AS b_totals
Now I need to do this with a PIVOT and the months.
In an attempt to make this sufficiently detailed from the absolute minimum of detail provided in the question I have created these 2 example tables with some data:
CREATE TABLE Bexample
([ID] int)
;
INSERT INTO Bexample
([ID])
VALUES
(1),
(2),
(3),
(4),
(5),
(6),
(7),
(8),
(9)
;
CREATE TABLE Aexample
([ID] int, [B_PK] int, [SOME_DT] datetime)
;
INSERT INTO Aexample
([ID], [B_PK], [SOME_DT])
VALUES
(1, 1, '2015-01-01 00:00:00'),
(2, 2, '2015-02-01 00:00:00'),
(3, 3, '2015-03-01 00:00:00'),
(4, 4, '2015-04-01 00:00:00'),
(5, 5, '2015-05-01 00:00:00'),
(6, 6, '2015-06-01 00:00:00'),
(7, 7, '2015-07-01 00:00:00'),
(8, 8, '2015-08-01 00:00:00'),
(9, 9, '2015-09-01 00:00:00'),
(10, 1, '2015-10-01 00:00:00'),
(11, 2, '2015-11-01 00:00:00'),
(12, 3, '2015-12-01 00:00:00'),
(13, 1, '2016-01-01 00:00:00'),
(14, 2, '2016-02-01 00:00:00'),
(15, 3, '2016-03-01 00:00:00'),
(16, 4, '2016-04-01 00:00:00'),
(17, 5, '2016-05-01 00:00:00'),
(18, 6, '2016-06-01 00:00:00'),
(19, 7, '2016-07-01 00:00:00'),
(20, 8, '2016-08-01 00:00:00'),
(21, 9, '2016-09-01 00:00:00'),
(22, 1, '2016-10-01 00:00:00'),
(23, 2, '2016-11-01 00:00:00'),
(24, 3, '2016-12-01 00:00:00')
;
Now, using those tables and data I can generate a result table like this:
id Nov 2015 Dec 2015 Jan 2016 Feb 2016 Mar 2016 Apr 2016 May 2016 Jun 2016 Jul 2016 Aug 2016 Sep 2016 Oct 2016
1 0 0 1 0 0 0 0 0 0 0 0 1
2 1 0 0 1 0 0 0 0 0 0 0 0
3 0 1 0 0 1 0 0 0 0 0 0 0
4 0 0 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 0 0 1 0 0 0
8 0 0 0 0 0 0 0 0 0 1 0 0
9 0 0 0 0 0 0 0 0 0 0 1 0
Using a query that needs both a "common table expression" (CTE) and "dynamic SQL" to produce that result:
"Dynamic SQL" is a query that generates SQL which is then executed. This is needed because the column names change from month to month. So, for the dynamic SQL we declare 2 variables that will hold the generated SQL. One of these stores the column names, which get used in 2 places, and the other holds the completed query. Note that instead of executing this you may display the generated SQL as you develop your solution (see the comments near execute at the end of the query).
In addition to the example tables and data, we also have a "time series" of 12 months to consider. This is "dynamic" as it is calculated from today's date and I have assumed that if today is any day within November 2016, that "the last 12 months" starts at 1 Nov 2015, and concludes at 31 Oct 2016 (i.e. 12 full months, no partial months).
The core of calculating this is here:
DATEADD(month,-12, DATEADD(month, DATEDIFF(month,0,GETDATE()), 0) )
which first locates the first day of the current month with DATEADD(month, DATEDIFF(month,0,GETDATE()), 0) and then deducts a further 12 months from that date. With that as a start date, a "recursive CTE" is used to generate 12 rows, one for each of the past 12 full months.
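The month arithmetic is easy to check outside the database. The Python sketch below reproduces the same 12-row series for a given date; the function names and the hard-coded month abbreviations are illustrative choices, not part of the answer's SQL.

```python
from datetime import date

MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def last_12_full_months(today):
    """First day of each of the 12 complete months before today's month."""
    # Number months consecutively so that stepping back 12 is integer math.
    current = today.year * 12 + (today.month - 1)
    months = []
    for index in range(current - 12, current):
        year, month0 = divmod(index, 12)
        months.append(date(year, month0 + 1, 1))
    return months


def column_label(month_start):
    """Label in the same 'MMM yyyy' shape used for the pivot columns."""
    return f"{MONTH_ABBR[month_start.month - 1]} {month_start.year}"


months = last_12_full_months(date(2016, 11, 15))
labels = [column_label(m) for m in months]
print(labels)
```

For any day in November 2016 the series runs from Nov 2015 to Oct 2016, which matches the column headings in the result table above.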
The purpose of these 12 rows is to ensure that when we consider the actual table data there will be no gaps in the 12 columns. This is achieved by using the generated 12 rows as the "from table" in our query, and the "A" table is LEFT JOINED based on the year/month of a date column [some_dt] to the 12 monthly rows.
So, we generate 12 rows and join the sample data to these, and the result is used to generate the SQL necessary for a "PIVOT" of the data. Here it is useful to see an example of that generated SQL, which looks like this:
SELECT id, [Nov 2015],[Dec 2015],[Jan 2016],[Feb 2016],[Mar 2016],[Apr 2016],[May 2016],[Jun 2016],[Jul 2016],[Aug 2016],[Sep 2016],[Oct 2016] FROM
(
select
format([mnth],'MMM yyyy') colname
, b.id
, a.b_pk
from #mylist
cross join bexample b
left join aexample a on #mylist.mnth = DATEADD(month, DATEDIFF(month,0,a.some_dt), 0)
and b.id = a.b_pk
) sourcedata
pivot
(
count([b_pk])
FOR [colname] IN ([Nov 2015],[Dec 2015],[Jan 2016],[Feb 2016],[Mar 2016],[Apr 2016],[May 2016],[Jun 2016],[Jul 2016],[Aug 2016],[Sep 2016],[Oct 2016])
) p
So, hopefully you can see in that generated SQL code that the dynamically created 12 rows become 12 columns. Note that because we are executing "dynamic sql" the 12 rows we generated as a CTE need to be stored as a "temporary table" (#mylist).
The query to generate AND execute that SQL is this.
DECLARE @cols AS VARCHAR(MAX)
DECLARE @query AS VARCHAR(MAX)
;with mylist as (
select DATEADD(month,-12, DATEADD(month, DATEDIFF(month,0,GETDATE()), 0) ) as [mnth]
union all
select DATEADD(month,1,[mnth])
from mylist
where [mnth] < DATEADD(month,-1, DATEADD(month, DATEDIFF(month,0,GETDATE()), 0) )
)
select [mnth]
into #mylist
from mylist
SELECT @cols = STUFF((SELECT ',' + QUOTENAME(format([mnth],'MMM yyyy'))
FROM #mylist
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
SET @query = 'SELECT id, ' + @cols + ' FROM
(
select
format([mnth],''MMM yyyy'') colname
, b.id
, a.b_pk
from #mylist
cross join bexample b
left join aexample a on #mylist.mnth = DATEADD(month, DATEDIFF(month,0,a.some_dt), 0)
and b.id = a.b_pk
) sourcedata
pivot
(
count([b_pk])
FOR [colname] IN (' + @cols + ')
) p '
--select @query -- use select to inspect the generated sql
execute(@query) -- once satisfied that sql is OK, use execute
drop table #mylist
You can see this working at: http://rextester.com/VVGZ39193
I want to share another attempt at explaining the issues faced by your requirements.
To follow this you MUST understand this sample data. I have 2 tables, @Events (a) and @Locations (b). The column names should be easy to follow, I hope.
declare @Events table
( [id] int IDENTITY(1007,2)
, [b_id] int
, [date_occurred] datetime
, [type] varchar(20)
)
;
INSERT INTO @Events
([b_id], [date_occurred],[type])
VALUES
(1, '2015-01-11 00:00:00','v'),
(2, '2015-02-21 00:00:00','v'),
(3, '2015-03-11 00:00:00','v'),
(4, '2015-04-21 00:00:00','v'),
(5, '2015-05-11 00:00:00','v'),
(6, '2015-06-21 00:00:00','v'),
(1, '2015-07-11 00:00:00','v'),
(2, '2015-08-11 00:00:00','v'),
(3, '2015-09-11 00:00:00','v'),
(5, '2015-10-11 00:00:00','v'),
(5, '2015-11-21 00:00:00','v'),
(6, '2015-12-21 00:00:00','v'),
(1, '2016-01-21 00:00:00','v'),
(2, '2016-02-21 00:00:00','v'),
(3, '2016-03-21 00:00:00','v'),
(4, '2016-04-21 00:00:00','v'),
(5, '2016-05-21 00:00:00','v'),
(6, '2016-06-21 00:00:00','v'),
(1, '2016-07-11 00:00:00','v'),
(2, '2016-08-21 00:00:00','v'),
(3, '2016-09-21 00:00:00','v'),
(4, '2016-10-11 00:00:00','v'),
(5, '2016-11-11 00:00:00','v'),
(6, '2016-12-11 00:00:00','v');
declare @Locations table
([id] int, [name] varchar(13))
;
INSERT INTO @Locations
([id], [name])
VALUES
(1, 'Atlantic City'),
(2, 'Boston'),
(3, 'Chicago'),
(4, 'Denver'),
(5, 'Edgbaston'),
(6, 'Melbourne')
;
OK. So with that data we can easily create a set of counts using this query:
select
b.id
, b.name
, format(a.date_occurred,'yyyy MMM') mnth
, count(*)
FROM @Events a
inner join @Locations b ON b.id = a.b_id
WHERE a.type = 'v'
and a.date_occurred >= DATEADD(month,-12, DATEADD(month, DATEDIFF(month,0,GETDATE()), 0) )
group by
b.id
, b.name
, format(a.date_occurred,'yyyy MMM')
And that output looks like this:
id name mnth
-- ------------- -------- -
1 Atlantic City 2016 Jan 1
1 Atlantic City 2016 Jul 1
2 Boston 2016 Aug 1
2 Boston 2016 Feb 1
3 Chicago 2016 Mar 1
3 Chicago 2016 Sep 1
4 Denver 2016 Apr 1
4 Denver 2016 Oct 1
5 Edgbaston 2015 Nov 1
5 Edgbaston 2016 May 1
5 Edgbaston 2016 Nov 1
6 Melbourne 2015 Dec 1
6 Melbourne 2016 Dec 1
6 Melbourne 2016 Jun 1
So, with a "simple" query that is easy to pass parameters into, the output is BY ROWS
and the column headings are FIXED.
NOW do you understand why transposing those rows into columns, with VARIABLE COLUMN HEADINGS, forces the use of dynamic SQL?
Your requirements, no matter how many words you throw at them, lead to complexity in the SQL.
You can run the above data/query here: https://data.stackexchange.com/stackoverflow/query/574718/count-and-group-records-by-month-and-field-value-from-the-last-year
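One way to see why the column list has to be generated is to do the transposition outside SQL, where the list of months is just a runtime value. The Python sketch below pivots row-wise counts into one row per location, with one zero-filled column per month. The function name and the handful of events are illustrative; the events are a subset of the sample data above.

```python
from collections import Counter
from datetime import date


def pivot_counts(events, ids, months):
    """Turn (b_id, date) events into {b_id: [count per month column]}.

    months is the ordered list of (year, month) column keys; events that
    fall outside those months are ignored, missing cells become 0.
    """
    counts = Counter((b_id, (d.year, d.month)) for b_id, d in events)
    return {b_id: [counts[(b_id, m)] for m in months] for b_id in ids}


# Column keys for Nov 2015 .. Oct 2016, decided at run time.
months = [(2015, 11), (2015, 12)] + [(2016, m) for m in range(1, 11)]

# A few (b_id, date_occurred) pairs from the sample events.
events = [
    (5, date(2015, 11, 21)), (6, date(2015, 12, 21)),
    (1, date(2016, 1, 21)), (4, date(2016, 10, 11)),
    (5, date(2016, 11, 11)),  # outside the 12-month window
]

table = pivot_counts(events, ids=range(1, 7), months=months)
for b_id, row in table.items():
    print(b_id, row)
```

In a procedural language the headings are ordinary data, so nothing special is needed. A static SQL statement must name its output columns up front, which is why the T-SQL version builds the statement as a string first.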

T-SQL - Filling in the gaps in running balance

I am working on a Data Warehouse project and the client provides daily sales data. On-hand quantities are provided in most lines but are often missing. I need help on how to fill those missing values based on prior OH and sales information.
Here's some sample data:
Line# Store Item OnHand SalesUnits DateKey
-----------------------------------------------
1 001 A 100 20 1
2 001 A 80 10 2
3 001 A null 30 3 --[OH updated with 70 (80-10)]
4 001 A null 5 4 --[OH updated with 40 (70-30)]
5 001 A 150 10 5 --[OH untouched]
6 001 B null 4 1 --[OH untouched - new item]
7 001 B 80 12 2
8 001 B null 10 3 --[OH updated with 68 (80-12)]
Lines 1 and 2 are not to be updated because OnHand quantities exist.
Lines 3 and 4 are to be updated based on their preceding rows.
Line 5 is to be left untouched because OnHand is provided.
Line 6 is to be left untouched because it is the first row for Item B
Is there a way I can do this in a set operation? I know I can do it easily using a fast_forward cursor but it will take a long time (15M+ rows).
Thanks for your help!
Test data:
declare @t table(
Line# int, Store char(3), Item char, OnHand int, SalesUnits int, DateKey int
)
insert @t values
(1, '001', 'A', 100, 20, 1),
(2, '001', 'A', 80 , 10, 2),
(3, '001', 'A', null, 30, 3),
(4, '001', 'A', null, 5, 4),
(5, '001', 'A', 150, 10, 5),
(6, '001', 'B', null, 4, 1),
(7, '001', 'B', null, 4, 2),
(8, '001', 'B', 80, 12, 3),
(9, '001', 'B', null, 10, 4)
Script to populate without using a cursor:
;with a as
(
select Line#, Store, Item, OnHand, SalesUnits, DateKey, 1 correctdata from @t where DateKey = 1
union all
select t.Line#, t.Store, t.Item, coalesce(t.OnHand, a.onhand - a.salesunits), t.SalesUnits, t.DateKey, t.OnHand from @t t
join a on a.DateKey = t.datekey - 1 and a.item = t.item and a.store = t.store
)
update t
set OnHand = a.onhand
from @t t join a on a.line# = t.line#
where a.correctdata is null
Script to populate using cursor:
declare @datekey int, @store int, @item char, @Onhand int,
@calculatedonhand int, @salesunits int, @laststore int, @lastitem char
DECLARE sales_cursor
CURSOR FOR
SELECT datekey+1, store, item, OnHand -SalesUnits, salesunits
FROM @t sales
order by store, item, datekey
OPEN sales_cursor;
FETCH NEXT FROM sales_cursor
INTO @datekey, @store, @item, @Onhand, @salesunits
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @calculatedonhand = case when @laststore = @store and @lastitem = @item
then coalesce(@onhand, @calculatedonhand - @salesunits) else null end
,@laststore = @store, @lastitem = @item
UPDATE s
SET onhand=@calculatedonhand
FROM @t s
WHERE datekey = @datekey and @store = store and @item = item
and onhand is null and @calculatedonhand is not null
FETCH NEXT FROM sales_cursor
INTO @datekey, @store, @item, @Onhand, @salesunits
END
END
CLOSE sales_cursor;
DEALLOCATE sales_cursor;
I recommend you use the cursor version; I doubt you can get decent performance using the recursive query. I know people here hate cursors, but when your table is that size, it can be the only solution.
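Both scripts apply the same rule: within one store and item, in DateKey order, a missing OnHand becomes the previous row's OnHand minus the previous row's SalesUnits. A minimal Python sketch of that single ordered pass is shown below, using the sample rows from the question. The function and field names are illustrative, and this is a model of the logic rather than a replacement for either script.

```python
def fill_on_hand(rows):
    """Fill missing on_hand values from the previous row of the same series.

    rows is a list of dicts with store, item, on_hand, sales_units, date_key.
    The first row of a store/item series is left untouched.
    """
    filled = []
    previous = None  # last row emitted, possibly from another series
    ordered = sorted(rows, key=lambda r: (r["store"], r["item"], r["date_key"]))
    for row in ordered:
        row = dict(row)  # do not modify the caller's data
        same_series = (
            previous is not None
            and (previous["store"], previous["item"]) == (row["store"], row["item"])
        )
        if row["on_hand"] is None and same_series and previous["on_hand"] is not None:
            row["on_hand"] = previous["on_hand"] - previous["sales_units"]
        filled.append(row)
        previous = row
    return filled


def make_row(store, item, on_hand, sales_units, date_key):
    return {"store": store, "item": item, "on_hand": on_hand,
            "sales_units": sales_units, "date_key": date_key}


sample = [
    make_row("001", "A", 100, 20, 1), make_row("001", "A", 80, 10, 2),
    make_row("001", "A", None, 30, 3), make_row("001", "A", None, 5, 4),
    make_row("001", "A", 150, 10, 5), make_row("001", "B", None, 4, 1),
    make_row("001", "B", 80, 12, 2), make_row("001", "B", None, 10, 3),
]

result = [row["on_hand"] for row in fill_on_hand(sample)]
print(result)
```

Filled values feed the next row, which is why line 4 is computed from the value just written to line 3. That dependency on the previous computed row is what makes the problem awkward for a plain window function.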
