I am having trouble narrowing down sales in top regions that occurred in consecutive months. I know I need to use some form of window function with Row_Number or Dense_Rank, but I am having trouble getting the final output.
Here is my source data:
+--------+-----------+------------+
| Fruit | SaleDate | Top_Region |
+--------+-----------+------------+
| Apple | 1/1/2017 | 1 |
| Apple | 2/1/2017 | 1 |
| Apple | 3/1/2017 | 1 |
| Apple | 4/1/2017 | 0 |
| Apple | 5/1/2017 | 0 |
| Apple | 6/1/2017 | 0 |
| Apple | 7/1/2017 | 1 |
| Apple | 8/1/2017 | 1 |
| Apple | 9/1/2017 | 1 |
| Apple | 10/1/2017 | 1 |
| Apple | 11/1/2017 | 0 |
| Apple | 12/1/2017 | 0 |
| Banana | 1/1/2017 | 0 |
| Banana | 2/1/2017 | 0 |
| Banana | 3/1/2017 | 1 |
| Banana | 4/1/2017 | 1 |
| Banana | 5/1/2017 | 1 |
| Banana | 6/1/2017 | 1 |
| Banana | 7/1/2017 | 1 |
| Banana | 8/1/2017 | 1 |
| Banana | 9/1/2017 | 0 |
| Banana | 10/1/2017 | 1 |
| Banana | 11/1/2017 | 1 |
| Banana | 12/1/2017 | 0 |
+--------+-----------+------------+
This is the expected output:
+--------+-----------+-----------+-------+
| Fruit | Start | End | Total |
+--------+-----------+-----------+-------+
| Apple | 1/1/2017 | 3/1/2017 | 3 |
| Apple | 7/1/2017 | 10/1/2017 | 4 |
| Banana | 3/1/2017 | 8/1/2017 | 6 |
| Banana | 10/1/2017 | 11/1/2017 | 2 |
+--------+-----------+-----------+-------+
The goal is to find each run of consecutive months with top region sales. A month without a top region sale ends the run.
So far I have tried a few different combinations, with this being the closest.
SELECT fruit,
MIN(saledate) AS spanStart ,
MAX(saledate) AS spanEnd,
COUNT(*) AS spanLength
FROM ( SELECT s.* ,
( ROW_NUMBER() OVER ( ORDER BY month )
- ROW_NUMBER() OVER ( PARTITION BY fruit, topregion ORDER BY month ) ) AS fruits
FROM #salesdata s
) s
GROUP BY fruit,fruits ,
topregion
HAVING topregion = 1
ORDER BY COUNT(*) DESC;
Any help would be greatly appreciated
This is a typical gaps-and-islands problem. One strategy is to identify the groups of adjacent rows by computing the difference between two row_number()s: the difference stays constant for as long as top_region does not change. We can then filter on groups having top_region = 1 and use aggregation to get the start date, end date and number of records per group.
Your query is really close, but the first row_number() is missing a partition by fruit in its over() clause. I also find that aliasing that column fruits when another column is called fruit is error-prone.
select
fruit,
min(sale_date) start_date,
max(sale_date) end_date,
count(*) total
from (
select
t.*,
row_number() over(partition by fruit order by sale_date) rn1,
row_number() over(partition by fruit, top_region order by sale_date) rn2
from mytable t
) t
where top_region = 1
group by fruit, rn1 - rn2
order by fruit, start_date
You can run the inner query separately to see the result it produces.
Demo on DB Fiddle:
fruit | start_date | end_date | total
:----- | :--------- | :--------- | ----:
Apple | 2017-01-01 | 2017-01-03 | 3
Apple | 2017-01-07 | 2017-01-10 | 4
Banana | 2017-01-03 | 2017-01-08 | 6
Banana | 2017-01-10 | 2017-01-11 | 2
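To see why the difference of the two row numbers marks each island, here is a minimal Python sketch that mimics the inner query for one fruit at a time. It is only an illustration: the `islands` helper and the month-number encoding of the dates are assumptions made for this example, not part of the query above.

```python
from collections import defaultdict
from itertools import count

# Sample rows from the question as (month number in 2017, top_region),
# already ordered by date. Each list plays the role of one "partition by fruit".
apple = [(1, 1), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0),
         (7, 1), (8, 1), (9, 1), (10, 1), (11, 0), (12, 0)]
banana = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1), (6, 1),
          (7, 1), (8, 1), (9, 0), (10, 1), (11, 1), (12, 0)]


def islands(rows):
    """Return (start, end, total) for each run of top_region = 1."""
    # rn2 restarts per top_region value, like
    # row_number() over(partition by fruit, top_region order by sale_date)
    rn2_counters = defaultdict(lambda: count(1))
    groups = defaultdict(list)
    # rn1 numbers every row of the fruit, like
    # row_number() over(partition by fruit order by sale_date)
    for rn1, (month, top_region) in enumerate(rows, start=1):
        rn2 = next(rn2_counters[top_region])
        if top_region == 1:  # where top_region = 1
            groups[rn1 - rn2].append(month)  # group by rn1 - rn2
    return [(min(m), max(m), len(m)) for m in groups.values()]


print(islands(apple))   # [(1, 3, 3), (7, 10, 4)]
print(islands(banana))  # [(3, 8, 6), (10, 11, 2)]
```

Within a run both counters advance together, so rn1 - rn2 stays the same. Every row with the other top_region value advances only rn1, which shifts the difference for the next run.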
I'm trying to calculate how long a machine was in a specific state then sum by hour. The state is only recorded on change so we can assume it was in the same state until changed.
I was trying to use partition, but I don't think that is the correct approach.
My table structure ordered desc:
+----------+-------------------------+
| state_id | t_stamp |
+----------+-------------------------+
| 0 | 2020-06-01 10:44:06.663 |
| 2 | 2020-06-01 10:43:56.660 |
| 0 | 2020-06-01 10:43:06.653 |
| 2 | 2020-06-01 10:42:56.653 |
| 0 | 2020-06-01 10:41:36.643 |
| 3 | 2020-06-01 10:41:26.640 |
| 0 | 2020-06-01 10:41:16.640 |
| 2 | 2020-06-01 10:40:56.637 |
| 0 | 2020-06-01 10:40:06.630 |
| 3 | 2020-06-01 10:39:56.630 |
+----------+-------------------------+
What I'm trying to get to:
+----------+------------------+
| state_id | duration_seconds |
+----------+------------------+
| 2 | 10 |
| 0 | 50 |
+----------+------------------+
You can use window functions, then aggregation:
select
state_id,
sum(datediff(second, t_stamp, lead_t_stamp)) duration_second
from (
select
t.*,
lead(t_stamp) over(order by t_stamp) lead_t_stamp
from mytable t
) t
where lead_t_stamp is not null
group by state_id
order by state_id
This demo on DB Fiddle with your sample data returns:
state_id | duration_second
-------: | --------------:
0 | 190
2 | 40
3 | 20
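The same lead-then-aggregate idea can be checked outside the database. The Python sketch below is an assumption-laden stand-in, not the query itself: the `durations` helper is hypothetical, and fractions of a second are dropped before subtracting because DATEDIFF(second, ...) counts second boundaries rather than elapsed time.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question as (state_id, t_stamp)
rows = [
    (0, "2020-06-01 10:44:06.663"), (2, "2020-06-01 10:43:56.660"),
    (0, "2020-06-01 10:43:06.653"), (2, "2020-06-01 10:42:56.653"),
    (0, "2020-06-01 10:41:36.643"), (3, "2020-06-01 10:41:26.640"),
    (0, "2020-06-01 10:41:16.640"), (2, "2020-06-01 10:40:56.637"),
    (0, "2020-06-01 10:40:06.630"), (3, "2020-06-01 10:39:56.630"),
]


def durations(rows):
    """Sum, per state, the whole seconds until the next recorded change."""
    # order by t_stamp
    parsed = sorted(
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), state)
        for state, stamp in rows
    )
    totals = defaultdict(int)
    # Pair each row with the next one, like lead(t_stamp). The last row has
    # no successor and is skipped, like "where lead_t_stamp is not null".
    for (start, state), (end, _) in zip(parsed, parsed[1:]):
        delta = end.replace(microsecond=0) - start.replace(microsecond=0)
        totals[state] += int(delta.total_seconds())
    return dict(totals)


print(durations(rows))  # {3: 20, 0: 190, 2: 40}
```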
I have the following SQL Server table ITEM:
+------------+-----------+------+--------+-----------+------------+
| Date | item_code | name | in/out | total_qty | SortNumber |
+------------+-----------+------+--------+-----------+------------+
| 08/07/2019 | 001 | A | -50 | 100 | 8 |
| 07/07/2019 | 001 | A | 50 | 100 | 7 |
| 06/07/2019 | 003 | C | 25 | 25 | 6 |
| 05/07/2019 | 001 | A | 50 | 50 | 5 |
| 04/07/2019 | 002 | B | 100 | 200 | 4 |
| 03/07/2019 | 003 | C | -25 | 0 | 3 |
| 02/07/2019 | 003 | C | 25 | 25 | 2 |
| 01/07/2019 | 002 | B | 100 | 100 | 1 |
+------------+-----------+------+--------+-----------+------------+
I've tried:
select itemcode, max(Sort_Number)
from ITEM
group by item_code
order by item_code asc
but I want result:
+---------------------+-----------+------------------+
| Distinct(item_code) | Total_qty | Max(Sort_Number) |
+---------------------+-----------+------------------+
| 001 | 100 | 8 |
| 002 | 200 | 4 |
| 003 | 25 | 6 |
+---------------------+-----------+------------------+
Can anyone help me?
The query below gives you the desired result:
With cteItem as
(
select item_code, total_qty, SortNumber,
Row_Number() over (partition by item_code order by SortNumber desc) maxSortNumber
from ITEM
)
select item_code, total_qty, SortNumber from cteItem where maxSortNumber = 1
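You just need to add max(total_qty) to your query. Note that this returns the largest total_qty per item, which equals the value on the row with the highest SortNumber only when that row also holds the largest total_qty (true for the sample data):
select item_code, max(total_qty), max(SortNumber)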
from ITEM
group by item_code
order by item_code asc
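The two answers agree on the sample data but not in general. The sketch below runs both in SQLite through Python's sqlite3 module, assuming a SQLite build with window function support (3.25 or later). The row with SortNumber 9 is hypothetical, added only to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (item_code TEXT, total_qty INT, SortNumber INT)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    ("001", 100, 8), ("001", 100, 7), ("003", 25, 6), ("001", 50, 5),
    ("002", 200, 4), ("003", 0, 3), ("003", 25, 2), ("002", 100, 1),
    ("003", 0, 9),  # hypothetical extra row, NOT in the question's data
])

# row_number approach: take the whole row with the highest SortNumber per item
latest = conn.execute("""
    SELECT item_code, total_qty, SortNumber
    FROM (SELECT item_code, total_qty, SortNumber,
                 ROW_NUMBER() OVER (PARTITION BY item_code
                                    ORDER BY SortNumber DESC) AS rn
          FROM item) AS ranked
    WHERE rn = 1
    ORDER BY item_code
""").fetchall()

# aggregate approach: the two maxima are computed independently of each other
maxima = conn.execute("""
    SELECT item_code, MAX(total_qty), MAX(SortNumber)
    FROM item
    GROUP BY item_code
    ORDER BY item_code
""").fetchall()

print(latest)  # [('001', 100, 8), ('002', 200, 4), ('003', 0, 9)]
print(maxima)  # [('001', 100, 8), ('002', 200, 4), ('003', 25, 9)]
```

For item 003 the aggregate version pairs a quantity of 25 with SortNumber 9, a combination that exists on no single row, while the row_number version returns the latest row as stored.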
Let's say I have the following table (data is completely fictional):
ID | MonthDate | PersonID | Name | Status | MonthsAgoSinceLastCheck
1 | 2017-12 | 900 | Jack | Ill | -
2 | 2018-01 | 900 | Jack | Ill | 1
3 | 2018-02 | 900 | Jack | Ill | 2
4 | 2018-03 | 900 | Jack | Healthy | 1
5 | 2017-02 | 901 | Bill | Ill | -
6 | 2017-03 | 901 | Bill | Ill | 1
7 | 2017-05 | 901 | Bill | Healthy | 1
For each record, I would like to see the previous status that person had X months ago since last check (column MonthsAgoSinceLastCheck). Notice that MonthDate can skip months.
So in this case, the result would be
ID | MonthDate | PersonID | Name | Status | MonthsAgoSinceLastCheck | PreviousStatus
1 | 2017-12 | 900 | Jack | Ill | - | -
2 | 2018-01 | 900 | Jack | Ill | 1 | Ill
3 | 2018-02 | 900 | Jack | Ill | 2 | Ill
4 | 2018-03 | 900 | Jack | Healthy | 1 | Ill
5 | 2017-02 | 901 | Bill | Healthy | - | -
6 | 2017-03 | 901 | Bill | Healthy | 1 | Healthy
7 | 2017-05 | 901 | Bill | Ill | 2 | Healthy
Any suggestions/tips? I tried to do this with CTEs and self-joins but failed on both.
It's way easier to use full dates than year and months separately. The first thing you should do is generate a full date from your year + month. Then just self join with previous month, depending on the last check.
;WITH DataWithDates AS
(
SELECT
T.ID,
MonthDate = CONVERT(DATE, T.MonthDate + '-01'),
T.PersonID,
T.Name,
T.Status,
T.MonthsAgoSinceLastCheck
FROM
YourTable AS T
)
SELECT
D.ID,
D.MonthDate,
D.PersonID,
D.Name,
D.Status,
D.MonthsAgoSinceLastCheck,
PreviousStatus = N.Status
FROM
DataWithDates AS D
LEFT JOIN DataWithDates AS N ON
D.PersonID = N.PersonID AND
N.MonthDate = DATEADD(MONTH, -1 * D.MonthsAgoSinceLastCheck, D.MonthDate)
I'm assuming your MonthDate has values for all rows, otherwise the conversion will fail. I'm also assuming that your - values for MonthsAgoSinceLastCheck are actually NULL.
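As a rough way to try the self-join without SQL Server, the sketch below ports it to SQLite via Python's sqlite3 module. The table name `checks` is made up for this example, SQLite's `date(..., '-N months')` modifier stands in for DATEADD, and the "-" values are loaded as NULL, matching the assumption above. The rows are the input table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE checks (
    ID INTEGER, MonthDate TEXT, PersonID INTEGER, Name TEXT,
    Status TEXT, MonthsAgoSinceLastCheck INTEGER)""")
conn.executemany("INSERT INTO checks VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "2017-12", 900, "Jack", "Ill", None),
    (2, "2018-01", 900, "Jack", "Ill", 1),
    (3, "2018-02", 900, "Jack", "Ill", 2),
    (4, "2018-03", 900, "Jack", "Healthy", 1),
    (5, "2017-02", 901, "Bill", "Ill", None),
    (6, "2017-03", 901, "Bill", "Ill", 1),
    (7, "2017-05", 901, "Bill", "Healthy", 1),
])

rows = conn.execute("""
    WITH d AS (
        -- build a full date from the year-month string
        SELECT ID, PersonID, Status, MonthsAgoSinceLastCheck,
               MonthDate || '-01' AS MonthStart
        FROM checks
    )
    SELECT cur.ID, prev.Status AS PreviousStatus
    FROM d AS cur
    LEFT JOIN d AS prev
           ON prev.PersonID = cur.PersonID
          -- go back MonthsAgoSinceLastCheck months (NULL never matches)
          AND prev.MonthStart = date(cur.MonthStart,
                  '-' || cur.MonthsAgoSinceLastCheck || ' months')
    ORDER BY cur.ID
""").fetchall()

print(rows)
# [(1, None), (2, 'Ill'), (3, 'Ill'), (4, 'Ill'), (5, None), (6, 'Ill'), (7, None)]
```

Row 7 gets no previous status because the table has no 2017-04 row for that person. The join matches an exact month, so a skipped month yields NULL rather than the nearest earlier row.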
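Try this. Note that LAG returns the status from the person's previous row, so it does not take MonthsAgoSinceLastCheck into account:
select *,LAG(Status) OVER(Partition by Name Order by MonthDate,Id) AS PreviousStatus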
from tab1
order by id
SQL Fiddle: http://sqlfiddle.com/#!18/04407/4
I have a table with data as below:
+--------+----------+-------+------------+--------------+
| month | code | type | date | PersonID |
+--------+----------+-------+------------+--------------+
| 201501 | 178954 | 3 | 2014-12-3 | 10 |
| 201501 | 178954 | 3 | 2014-12-3 | 10 |
| 201501 | 178955 | 2 | 2014-12-13 | 10 |
| 201501 | 178955 | 2 | 2014-12-13 | 10 |
| 201501 | 178956 | 2 | 2014-12-11 | 10 |
| 201501 | 178958 | 1 | 2014-12-10 | 10 |
| 201501 | 178959 | 2 | 2014-12-12 | 15 |
| 201501 | 178959 | 2 | 2014-12-12 | 15 |
| 201501 | 178954 | 1 | 2014-12-11 | 13 |
| 201501 | 178954 | 1 | 2014-12-11 | 13 |
+--------+----------+-------+------------+--------------+
The first 6 rows have the same PersonID in the same month. When the same PersonID appears more than once in the same month, I want to select the row whose type is 2 and which has the most recent date. In my case the output would be as below:
+--------+--------+------+------------+----------+
| month | code | type| date | PersonID |
+--------+--------+------+------------+----------+
| 201501 | 178955 | 2 | 2014-12-13 | 10 |
| 201501 | 178959 | 2 | 2014-12-12 | 15 |
| 201501 | 178954 | 2 | 2014-12-11 | 13 |
+--------+--------+------+------------+----------+
Also, if there are duplicate rows, I don't want to display them.
Is there any solution to that?
Simply use GROUP BY:
https://msdn.microsoft.com/de-de/library/ms177673(v=sql.120).aspx
SELECT month, code, ... FROM tablename GROUP BY PersonID, date, ...
Note that you have to specify every selected, non-aggregated column in the GROUP BY.
SELECT DISTINCT A.month, A.code, A.type, B.date, B.PersonID FROM YourTable A
INNER JOIN (SELECT PersonID, MAX(date) as date FROM YourTable
GROUP BY PersonID) B
ON (A.PersonID = B.PersonID
AND A.date = B.date)
WHERE A.type = 2 ORDER BY B.date DESC, A.PersonID
Just in case you/others are still wondering.
I have a table that holds values for particular months:
| MFG | DATE | FACTOR |
-----------------------------
| 1 | 2013-01-01 | 1 |
| 2 | 2013-01-01 | 0.8 |
| 2 | 2013-02-01 | 1 |
| 2 | 2013-12-01 | 1.55 |
| 3 | 2013-01-01 | 1 |
| 3 | 2013-04-01 | 1.3 |
| 3 | 2013-05-01 | 1.2 |
| 3 | 2013-06-01 | 1.1 |
| 3 | 2013-07-01 | 1 |
| 4 | 2013-01-01 | 0.9 |
| 4 | 2013-02-01 | 1 |
| 4 | 2013-12-01 | 1.8 |
| 5 | 2013-01-01 | 1.4 |
| 5 | 2013-02-01 | 1 |
| 5 | 2013-10-01 | 1.3 |
| 5 | 2013-11-01 | 1.2 |
| 5 | 2013-12-01 | 1.5 |
What I would like to do is pivot these using a calendar table (already defined):
And finally, cascade the NULL columns to use the previous value.
What I've got so far is a query that will populate the NULLs with the last value for mfg = 3. Each mfg will always have a value for the first of the year. My question is: how do I pivot this and extend it to all mfg?
SELECT c.[date],
f.[factor],
Isnull(f.[factor], (SELECT TOP 1 factor
FROM factors
WHERE [date] < c.[date]
AND [factor] IS NOT NULL
AND mfg = 3
ORDER BY [date] DESC)) AS xFactor
FROM (SELECT [date]
FROM calendar
WHERE Datepart(yy, [date]) = 2013
AND Datepart(d, [date]) = 1) c
LEFT JOIN (SELECT [date],
[factor]
FROM factors
WHERE mfg = 3) f
ON f.[date] = c.[date]
Result
| DATE | FACTOR | XFACTOR |
---------------------------------
| 2013-01-01 | 1 | 1 |
| 2013-02-01 | (null) | 1 |
| 2013-03-01 | (null) | 1 |
| 2013-04-01 | 1.3 | 1.3 |
| 2013-05-01 | 1.2 | 1.2 |
| 2013-06-01 | 1.1 | 1.1 |
| 2013-07-01 | 1 | 1 |
| 2013-08-01 | (null) | 1 |
| 2013-09-01 | (null) | 1 |
| 2013-10-01 | (null) | 1 |
| 2013-11-01 | (null) | 1 |
| 2013-12-01 | (null) | 1 |
I don't know if you need the dates to be dynamic from the calendar table or if mfg can be more than 5, but this should give you some ideas.
select *
from (
select c.date,
t.mfg,
(
select top 1 f.factor
from factors as f
where f.date <= c.date and
f.mfg = t.mfg and
f.factor is not null
order by f.date desc
) as factor
from calendar as c
cross apply(values(1),(2),(3),(4),(5)) as t(mfg)
) as t
pivot (
max(t.factor) for t.date in ([20130101], [20130201], [20130301],
[20130401], [20130501], [20130601],
[20130701], [20130801], [20130901],
[20131001], [20131101], [20131201])
) as P
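To check the fill-forward logic on the sample data, here is a hedged sketch that runs the same correlated "latest factor on or before this date" subquery in SQLite through Python's sqlite3 module. It departs from the query above in two labeled ways: the mfg list comes from SELECT DISTINCT instead of a hard-coded values list, and the pivot is done in a Python dict because SQLite has no PIVOT operator. The calendar table here is a stand-in holding only the first of each month in 2013.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE factors (mfg INTEGER, date TEXT, factor REAL)")
conn.execute("CREATE TABLE calendar (date TEXT)")
conn.executemany("INSERT INTO factors VALUES (?, ?, ?)", [
    (1, "2013-01-01", 1), (2, "2013-01-01", 0.8), (2, "2013-02-01", 1),
    (2, "2013-12-01", 1.55), (3, "2013-01-01", 1), (3, "2013-04-01", 1.3),
    (3, "2013-05-01", 1.2), (3, "2013-06-01", 1.1), (3, "2013-07-01", 1),
    (4, "2013-01-01", 0.9), (4, "2013-02-01", 1), (4, "2013-12-01", 1.8),
    (5, "2013-01-01", 1.4), (5, "2013-02-01", 1), (5, "2013-10-01", 1.3),
    (5, "2013-11-01", 1.2), (5, "2013-12-01", 1.5),
])
# Assumed calendar: one row per month start in 2013
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [(f"2013-{m:02d}-01",) for m in range(1, 13)])

filled = conn.execute("""
    SELECT m.mfg, c.date,
           -- most recent factor on or before the calendar date
           (SELECT f.factor
            FROM factors AS f
            WHERE f.mfg = m.mfg AND f.date <= c.date
            ORDER BY f.date DESC
            LIMIT 1) AS factor
    FROM calendar AS c
    CROSS JOIN (SELECT DISTINCT mfg FROM factors) AS m
    ORDER BY m.mfg, c.date
""").fetchall()

# Pivot in Python: one list of 12 monthly factors per mfg
pivot = {}
for mfg, date, factor in filled:
    pivot.setdefault(mfg, []).append(factor)

print(pivot[3])
# [1.0, 1.0, 1.0, 1.3, 1.2, 1.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

The list for mfg 3 matches the XFACTOR column from the question, and the other manufacturers are filled the same way.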