I am trying to assign a group number to distinct groups of rows in a dataset whose data changes over time. The changing fields are tran_seq, prog_id, deg_id, cur_id, and enroll_status in my example. When any of those fields differs from the previous row, I need a new grouping number. When the fields are the same as in the prior row, the grouping number should stay the same. When I try ROW_NUMBER(), RANK(), or DENSE_RANK(), I get increasing values within the same group (e.g. the first 2 rows in the example). I believe I need to ORDER BY start_date, as it is temporal data.
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
| | tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date | desired |
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
| 1 | 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2004-12-11 | 1 |
| 2 | 1 | 6 | 9 | 3 | ENRL | 2006-01-10 | 2006-05-06 | 1 |
| 3 | 1 | 6 | 9 | 59 | ENRL | 2006-08-29 | 2006-12-16 | 2 |
| 4 | 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-05-16 | 3 |
| 5 | 2 | 12 | 23 | 45 | ENRL | 2014-08-18 | 2014-12-05 | 3 |
| 6 | 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 | 4 |
| 7 | 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 | 5 |
| 8 | 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 | 6 |
| 9 | 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 | 7 |
| 10 | 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 | 8 |
| 11 | 2 | 12 | 23 | 45 | ENRL | 2017-01-18 | 2017-05-05 | 9 |
| 12 | 2 | 12 | 23 | 45 | ENRL | 2018-01-17 | 2018-05-11 | 9 |
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
Once I have grouping numbers, I think I can group by those to get what I'm ultimately after: a timeline of different statuses with start dates and end dates. For the example data above, that would be:
+---+----------+---------+--------+--------+---------------+------------+------------+
| | tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date |
+---+----------+---------+--------+--------+---------------+------------+------------+
| 1 | 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2006-05-06 |
| 2 | 1 | 6 | 9 | 59 | ENRL | 2006-08-29 | 2006-12-16 |
| 3 | 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-12-05 |
| 4 | 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 |
| 5 | 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 |
| 6 | 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 |
| 7 | 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 |
| 8 | 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 |
| 9 | 2 | 12 | 23 | 45 | ENRL | 2017-01-18 | 2018-05-11 |
+---+----------+---------+--------+--------+---------------+------------+------------+
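This is a classic XY problem: you are asking about an intermediate step rather than about the result you actually want.
Since you did include your end goal as an addendum, here is how you can reach it without the intermediate step. Within each consecutive run of identical rows, the difference between the overall row_number() and the per-combination row_number() stays constant, so grouping on that difference (together with the columns themselves) collapses each run into one row: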
create table #t (tran_seq int, prog_id int, deg_id int, cur_id int, enroll_status varchar(4), start_date date, end_date date, desired int);
insert into #t values
(1,6,9,3 ,'ENRL','2004-08-22','2004-12-11',1)
,(1,6,9,3 ,'ENRL','2006-01-10','2006-05-06',1)
,(1,6,9,59 ,'ENRL','2006-08-29','2006-12-16',2)
,(2,12,23,45,'ENRL','2014-01-21','2014-05-16',3)
,(2,12,23,45,'ENRL','2014-08-18','2014-12-05',3)
,(2,12,23,45,'LOAP','2015-01-20','2015-05-15',4)
,(2,12,23,45,'ENRL','2015-08-25','2015-12-11',5)
,(2,12,23,45,'LOAP','2016-01-12','2016-05-06',6)
,(2,12,23,45,'ENRL','2016-05-16','2016-08-05',7)
,(2,12,23,45,'LOAJ','2016-08-23','2016-12-02',8)
,(2,12,23,45,'ENRL','2017-01-18','2017-05-05',9)
,(2,12,23,45,'ENRL','2018-01-17','2018-05-11',9)
;
select tran_seq
,prog_id
,deg_id
,cur_id
,enroll_status
,min(start_date) as start_date
,max(end_date) as end_date
from(select *
,row_number() over (order by end_date) - row_number() over (partition by tran_seq,prog_id,deg_id,cur_id,enroll_status order by end_date) as grp
from #t
) AS g
group by tran_seq
,prog_id
,deg_id
,cur_id
,enroll_status
,grp
order by start_date;
Output
+----------+---------+--------+--------+---------------+------------+------------+
| tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date |
+----------+---------+--------+--------+---------------+------------+------------+
| 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2006-05-06 |
| 1 | 6 | 9 | 59 | ENRL | 2006-08-29 | 2006-12-16 |
| 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-12-05 |
| 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 |
| 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 |
| 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 |
| 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 |
| 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 |
| 2 | 12 | 23 | 45 | ENRL | 2017-01-18 | 2018-05-11 |
+----------+---------+--------+--------+---------------+------------+------------+
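If you do still want the group number from the desired column, one common approach is to flag every row that differs from its predecessor with LAG() and then take a running SUM() of the flags. The sketch below is only an illustration under some assumptions: it runs the SQL through Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server, the table name t and the function name group_numbers are made up, and it assumes none of the compared columns contain NULLs.

```python
import sqlite3

# Sample rows from the question:
# (tran_seq, prog_id, deg_id, cur_id, enroll_status, start_date, end_date)
ROWS = [
    (1, 6, 9, 3, "ENRL", "2004-08-22", "2004-12-11"),
    (1, 6, 9, 3, "ENRL", "2006-01-10", "2006-05-06"),
    (1, 6, 9, 59, "ENRL", "2006-08-29", "2006-12-16"),
    (2, 12, 23, 45, "ENRL", "2014-01-21", "2014-05-16"),
    (2, 12, 23, 45, "ENRL", "2014-08-18", "2014-12-05"),
    (2, 12, 23, 45, "LOAP", "2015-01-20", "2015-05-15"),
    (2, 12, 23, 45, "ENRL", "2015-08-25", "2015-12-11"),
    (2, 12, 23, 45, "LOAP", "2016-01-12", "2016-05-06"),
    (2, 12, 23, 45, "ENRL", "2016-05-16", "2016-08-05"),
    (2, 12, 23, 45, "LOAJ", "2016-08-23", "2016-12-02"),
    (2, 12, 23, 45, "ENRL", "2017-01-18", "2017-05-05"),
    (2, 12, 23, 45, "ENRL", "2018-01-17", "2018-05-11"),
]

GROUP_QUERY = """
with flagged as (
    -- is_new = 1 when any tracked column differs from the previous row.
    -- On the first row every lag() is NULL, so the comparison is not true
    -- and the row falls into the ELSE branch, starting group 1.
    select t.*,
           case when tran_seq      = lag(tran_seq)      over w
                 and prog_id       = lag(prog_id)       over w
                 and deg_id        = lag(deg_id)        over w
                 and cur_id        = lag(cur_id)        over w
                 and enroll_status = lag(enroll_status) over w
                then 0 else 1 end as is_new
    from t
    window w as (order by start_date)
)
-- A running total of the flags is the group number.
select start_date,
       sum(is_new) over (order by start_date
                         rows between unbounded preceding and current row) as grp
from flagged
order by start_date
"""


def group_numbers():
    """Return the group number of each sample row, in start_date order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (tran_seq int, prog_id int, deg_id int, cur_id int, "
        "enroll_status text, start_date text, end_date text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?, ?, ?)", ROWS)
    try:
        return [grp for _start_date, grp in conn.execute(GROUP_QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(group_numbers())  # [1, 1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 9]
```

The result matches the desired column in the question. On SQL Server the same idea should carry over, although the named WINDOW clause is only available in recent versions, so on older ones each lag() would need its own over (order by start_date).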
I have a table like this:
| Date | Week | Name | No | Count |
|-----------|------|--------|----|-------|
| 2019/4/1 | 14 | John | 1 | 1 |
| 2019/4/1 | 14 | Mary | 2 | 1 |
| 2019/4/9 | 15 | Kevin | 3 | 2 |
| 2019/4/9 | 15 | John | 4 | 1 |
| 2019/4/9 | 15 | Jessie | 5 | 1 |
| 2019/4/18 | 16 | Kevin | 6 | 1 |
| 2019/4/18 | 16 | John | 7 | 1 |
| 2019/4/18 | 16 | Jessie | 8 | 2 |
| 2019/4/18 | 16 | Mary | 9 | 3 |
| 2019/4/18 | 16 | Mary | 10 | 1 |
| 2019/4/18 | 16 | Jessie | 11 | 1 |
| 2019/4/24 | 17 | Mary | 12 | 1 |
| 2019/4/24 | 17 | Jessie | 13 | 1 |
What I want to do is calculate each person's total count per week, and sort the result by their overall total.
I know GROUP BY can make this happen and I've tried it, but I just can't figure it out.
This is what I expect:
| Name | 14 | 15 | 16 | 17 | Total |
|--------|----|----|----|----|-------|
| Mary | 1 | 0 | 4 | 1 | 6 |
| Jessie | 0 | 1 | 3 | 1 | 5 |
| John | 1 | 1 | 1 | 0 | 3 |
| Kevin | 0 | 2 | 1 | 0 | 3 |
| Total | 2 | 4 | 9 | 2 | 17 |
How can I do this?
Select [Name]
,sum(case when [Week] = 14 then [Count] else 0 end) as Week14
,sum(case when [Week] = 15 then [Count] else 0 end) as Week15
,sum(case when [Week] = 16 then [Count] else 0 end) as Week16
,sum(case when [Week] = 17 then [Count] else 0 end) as Week17
,sum([Count]) as Total
from [table]
group by [Name]
order by Total desc
I'm not sure which version of DB2 you're using (LUW/zOS/i), so this is a general answer. The week numbers could be made more flexible, but the set of week columns has to be hard-coded in the query. Note that this returns one row per name; it does not produce the Total row from your expected output.
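To also get the Total row from the expected output, one option is to UNION ALL a second aggregate over the whole table and sort it last. This is only a sketch under assumptions: it runs through Python's sqlite3 module rather than your actual database, the table name weekly_counts, the column name cnt (standing in for Count), and the helper names are invented for the example, and only the three columns the query needs are loaded. The week list is still fixed, but it lives in one place and the CASE expressions are generated from it.

```python
import sqlite3

# (week, name, cnt) taken from the question's table; Date and No are not needed here.
ROWS = [
    (14, "John", 1), (14, "Mary", 1),
    (15, "Kevin", 2), (15, "John", 1), (15, "Jessie", 1),
    (16, "Kevin", 1), (16, "John", 1), (16, "Jessie", 2),
    (16, "Mary", 3), (16, "Mary", 1), (16, "Jessie", 1),
    (17, "Mary", 1), (17, "Jessie", 1),
]

WEEKS = (14, 15, 16, 17)  # the hard-coded part: one output column per week

# One conditional sum per week, e.g. "sum(case when week = 14 ...) as week14".
week_sums = ", ".join(
    f"sum(case when week = {w} then cnt else 0 end) as week{w}" for w in WEEKS
)
week_names = ", ".join(f"week{w}" for w in WEEKS)

PIVOT_QUERY = f"""
select name, {week_names}, total
from (
    select name, {week_sums}, sum(cnt) as total, 0 as is_total_row
    from weekly_counts
    group by name
    union all
    select 'Total', {week_sums}, sum(cnt), 1
    from weekly_counts
) as p
-- per-name rows first (largest total first, name breaks ties), Total row last
order by is_total_row, total desc, name
"""


def weekly_pivot():
    """Return the pivoted rows, including the trailing Total row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table weekly_counts (week int, name text, cnt int)")
    conn.executemany("insert into weekly_counts values (?, ?, ?)", ROWS)
    try:
        return conn.execute(PIVOT_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in weekly_pivot():
        print(row)
```

With the sample data this gives Mary (6), Jessie (5), John (3), Kevin (3) and then the Total row (2, 4, 9, 2, 17). Databases that support GROUP BY ROLLUP or GROUPING SETS can produce the total row without the UNION ALL; SQLite does not, which is why the sketch avoids it.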
I am trying to build a simple model with tables containing forecast data.
The first I will call [Year Forecast] and the second [Month Forecast]. Like so:
| Year | Forecast |
-------------------
| 2018 | 144000 |
| 2019 | 180000 |
| 2020 | 220000 |
| .... | ...... |
I want the DB to allow manual input in the [Year Forecast] for [Year] > year(getdate())+2. So in the example the forecast number of 2020 would have been manually entered as a whole number. (Note that Year would be a unique identifier)
For [Year] < year(getdate())+2 the table [Year Forecast] should take the sum of [Month Forecast]. This would be for 2018 and 2019 in this example.
| ID | Year | Month | Forecast |
--------------------------------
| 1 | 2018 | 1 | 12000 |
| 2 | 2018 | 2 | 12000 |
| 3 | 2018 | 3 | 12000 |
| 4 | 2018 | 4 | 12000 |
| 5 | 2018 | 5 | 12000 |
| 6 | 2018 | 6 | 12000 |
| 7 | 2018 | 7 | 12000 |
| 8 | 2018 | 8 | 12000 |
| 9 | 2018 | 9 | 12000 |
| 10 | 2018 | 10 | 12000 |
| 11 | 2018 | 11 | 12000 |
| 12 | 2018 | 12 | 12000 |
| 13 | 2019 | 1 | 15000 |
| 14 | 2019 | 2 | 15000 |
| .. | .... | ..... | ........ |
The relationship would be straightforward, but I want to define a procedure that takes the sum of Forecast for the related year in [Month Forecast] and prohibits manual data input for [Year] < year(getdate())+2.
I've come quite far in my SQL journey and I know this should be possible, but it is still a bit above my skill level. How should I go about this?
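One possible way to get this behaviour, sketched below, is to never store the derived years at all: keep manual entries in their own table and read the yearly forecast through a query (or a view) that sums [Month Forecast] for years below the cutoff and only takes manual values at or above it. Everything named here is an assumption for illustration: the table names month_forecast and year_forecast_manual, the function year_forecast, the choice to treat the boundary year itself as manual, and the use of Python's sqlite3 module with the current year passed in as a parameter instead of year(getdate()). Only the month rows actually shown in the question are loaded, so 2019 sums to just its two listed months.

```python
import sqlite3

# Month rows shown in the question: all of 2018, plus the first two months of 2019.
MONTH_ROWS = [(2018, m, 12000) for m in range(1, 13)] + [
    (2019, 1, 15000),
    (2019, 2, 15000),
]

# Manually entered yearly figure from the question.
MANUAL_ROWS = [(2020, 220000)]

YEAR_FORECAST_QUERY = """
-- Years before the cutoff: always derived from the monthly table.
select year, sum(forecast) as forecast
from month_forecast
where year < :cutoff
group by year
union all
-- Years at or after the cutoff: taken from the manual table.
select year, forecast
from year_forecast_manual
where year >= :cutoff
order by year
"""


def year_forecast(current_year):
    """Return (year, forecast) rows, with cutoff = current_year + 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table month_forecast (year int, month int, forecast int)")
    conn.execute(
        "create table year_forecast_manual (year int primary key, forecast int)"
    )
    conn.executemany("insert into month_forecast values (?, ?, ?)", MONTH_ROWS)
    conn.executemany("insert into year_forecast_manual values (?, ?)", MANUAL_ROWS)
    try:
        return conn.execute(
            YEAR_FORECAST_QUERY, {"cutoff": current_year + 2}
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(year_forecast(2018))  # [(2018, 144000), (2019, 30000), (2020, 220000)]
```

Because manual values for years below the cutoff are simply ignored by the query, nothing has to be blocked at insert time. If you need inserts to be rejected outright, a trigger or check on the manual table would be a separate piece on top of this.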
I have two tables, stock and sales, as below:
Stock Table :
+--------+----------+---------+
| Stk_ID | Stk_Name | Stk_Qty |
+--------+----------+---------+
| 1001 | A | 20 |
| 1002 | B | 50 |
+--------+----------+---------+
Sales Table :
+----------+------------+------------+-----------+
| Sales_ID | Sales_Date | Sales_Item | Sales_Qty |
+----------+------------+------------+-----------+
| 2001 | 2016-07-15 | A | 5 |
| 2002 | 2016-07-20 | B | 7 |
| 2003 | 2016-07-23 | A | 4 |
| 2004 | 2016-07-29 | A | 2 |
| 2005 | 2016-08-03 | B | 15 |
| 2006 | 2016-08-07 | B | 10 |
| 2007 | 2016-08-10 | A | 5 |
+----------+------------+------------+-----------+
With the tables above, how can I find the available stock Ava_Stk for each item after every sale?
Ava_Stk should be Stk_Qty minus the Sales_Qty of every sale of that item so far.
+----------+------------+------------+-----------+---------+
| Sales_ID | Sales_Date | Sales_Item | Sales_Qty | Ava_Stk |
+----------+------------+------------+-----------+---------+
| 2001 | 2016-07-15 | A | 5 | 15 |
| 2002 | 2016-07-20 | B | 7 | 43 |
| 2003 | 2016-07-23 | A | 4 | 11 |
| 2004 | 2016-07-29 | A | 2 | 9 |
| 2005 | 2016-08-03 | B | 15 | 28 |
| 2006 | 2016-08-07 | B | 10 | 18 |
| 2007 | 2016-08-10 | A | 5 | 4 |
+----------+------------+------------+-----------+---------+
Thank you!
You want a cumulative sum and to subtract it from the stock table. In SQL Server 2012+:
select s.*,
(st.stk_qty -
sum(s.sales_qty) over (partition by s.sales_item order by sales_date)
) as ava_stk
from sales s join
stock st
on s.sales_item = st.stk_name;
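One thing to watch with a running sum like this: when ORDER BY is given without an explicit frame, the default frame is RANGE based, so two sales of the same item on the same date are treated as peers and both get the same cumulative total. The sketch below adds Sales_ID as a tie-breaker and an explicit ROWS frame to avoid that. It is an illustration only, run through Python's sqlite3 module (SQLite 3.25 or later) rather than SQL Server, and the function name available_stock is made up.

```python
import sqlite3

STOCK = [(1001, "A", 20), (1002, "B", 50)]

SALES = [
    (2001, "2016-07-15", "A", 5),
    (2002, "2016-07-20", "B", 7),
    (2003, "2016-07-23", "A", 4),
    (2004, "2016-07-29", "A", 2),
    (2005, "2016-08-03", "B", 15),
    (2006, "2016-08-07", "B", 10),
    (2007, "2016-08-10", "A", 5),
]

AVAILABLE_QUERY = """
select s.sales_id, s.sales_date, s.sales_item, s.sales_qty,
       st.stk_qty - sum(s.sales_qty) over (
           partition by s.sales_item
           -- sales_id breaks ties between sales on the same date
           order by s.sales_date, s.sales_id
           -- ROWS frame: count exactly the rows up to and including this one
           rows between unbounded preceding and current row
       ) as ava_stk
from sales s
join stock st on s.sales_item = st.stk_name
order by s.sales_id
"""


def available_stock():
    """Return the sales rows with the remaining stock after each sale."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table stock (stk_id int, stk_name text, stk_qty int)")
    conn.execute(
        "create table sales (sales_id int, sales_date text, "
        "sales_item text, sales_qty int)"
    )
    conn.executemany("insert into stock values (?, ?, ?)", STOCK)
    conn.executemany("insert into sales values (?, ?, ?, ?)", SALES)
    try:
        return conn.execute(AVAILABLE_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in available_stock():
        print(row)
```

For the sample data the ava_stk column comes out as 15, 43, 11, 9, 28, 18, 4, which matches the expected table in the question. The final order by is there only so the rows come back in the same order as that table.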