SQL Server Rank on Value Range - sql-server

I have a table with three columns: ID, Date, Value. I want to rank the rows such that, within an ID, the ranking goes up with each date where Value is at least X; otherwise, the ranking stays the same.
Given ID, Date, and Value rows like these:
1, 6/1, 8
1, 6/2, 12
1, 6/3, 14
1, 6/4, 9
1, 6/5, 11
I would like to return a ranking based on values of at least 10, such that I would have ID, Date, Value, and Rank like this:
1, 6/1, 8, 0
1, 6/2, 12, 1
1, 6/3, 14, 2
1, 6/4, 9, 2
1, 6/5, 11, 3
In other words, the ranking increases each time the value reaches the threshold; otherwise it stays the same.
What I have tried is:
SELECT T1.*, X.Ranking FROM TABLE T1
LEFT JOIN ( SELECT *, DENSE_RANK( ) OVER ( PARTITION BY T2.ID ORDER BY T2.DATE ) Ranking
FROM TABLE T2 WHERE T2.VALUE >= 10 ) X
ON T1.ID = X.ID AND T1.Date = X.Date
This almost works. It gets me output like:
1, 6/1, 8, NULL
1, 6/2, 12, 1
1, 6/3, 14, 2
1, 6/4, 9, NULL
1, 6/5, 11, 3
Then, I want to turn the first NULL into a 0, and the second into a 2.
I turned the above query into a CTE and tried:
SELECT T1.*, CASE WHEN T1.Ranking IS NULL THEN ISNULL( (
SELECT MAX( T2.Ranking )
FROM cte T2 WHERE T1.ID = T2.ID AND T1.Date > T2.Date ), 0 )
ELSE T1.Ranking END NewRanking
FROM cte T1
This looks like it would work, but my table has 200,000 rows and the query ran for 25 minutes... So, I'm looking for something a little more out of the box than the SELECT MAX.

If you are using SQL Server 2012 or later, you can do a cumulative sum:
select t.*,
sum(case when value >= 10 then 1 else 0 end) over
(partition by id order by date) as ranking
from table t;
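A minimal, runnable sketch of the same running-sum idea, using Python's built-in sqlite3 module instead of SQL Server (assumption: the bundled SQLite is 3.25 or newer, which added window functions). The table name t and the month-day strings used for Date are stand-ins for the question's data; the SUM ... OVER (PARTITION BY ... ORDER BY ...) expression is the same one the answer uses.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions). X from the question is 10.
THRESHOLD = 10

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, value INTEGER)")
# Dates kept as 'MM-DD' strings so they sort in calendar order; the year is unknown.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "06-01", 8), (1, "06-02", 12), (1, "06-03", 14), (1, "06-04", 9), (1, "06-05", 11)],
)

rows = conn.execute(
    """
    SELECT id, date, value,
           -- running count of rows at or above the threshold, per id, in date order
           SUM(CASE WHEN value >= :x THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY date) AS ranking
    FROM t
    ORDER BY id, date
    """,
    {"x": THRESHOLD},
).fetchall()

for row in rows:
    print(row)
```

With only ORDER BY in the OVER clause, the default frame is RANGE UNBOUNDED PRECEDING, so rows sharing the same ID and Date would get the same running total. In SQL Server, spelling out ROWS UNBOUNDED PRECEDING is a common choice when dates are unique per ID.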

EDIT: This actually does not work. In spirit it fetches the previous LAG value and increments it, but that is not how LAG works: it would be 'recursive' in essence, and referencing my_rank inside its own definition results in an error because my_rank is not defined at that point. The better solution is the accepted answer based on a cumulative sum.
If you have SQL Server 2012 (you didn't tag your question), you can do something like:
SELECT
LAG(my_rank, 1, 0) OVER (ORDER BY DATE)
+ CASE WHEN VALUE >= 10 THEN 1 ELSE 0 END AS my_rank
FROM T1

Related

Insert dummy rows to fill missing values into a SQL Table

I have this SQL Server table table1 which I want to fill with dummy rows per acct up to the latest previous month-end date; e.g. right now that would be 2021-06-30.
In this example, acct 1 has n rows ending at 2020-05-31, and I want to insert dummy rows with the same acct and amt values, with begin_date and end_date incrementing by 1 month up to 2021-06-30.
Let's assume acct 2 already ends at 2021-06-30, so it doesn't need dummy rows.
acct,amt,begin_date,end_date
1 , 10, 2020-04-01, 2020-04-30
1 , 10, 2020-05-01, 2020-05-31
2 , 50, 2021-05-01, 2021-05-31
2 , 50, 2021-06-01, 2021-06-30
So for acct 1, I want rows to be inserted from the last period ending 2020-05-31 up to the previous month end, which is now 2021-06-30, with amt and acct unchanged. It would look like this:
acct,amt,begin_date,end_date
1 , 10, 2020-04-01, 2020-04-30
1 , 10, 2020-05-01, 2020-05-31
1 , 10, 2020-06-01, 2020-06-30
1 , 10, 2020-07-01, 2020-07-31
.............................
.............................
1 , 10, 2021-06-01, 2021-06-30
Based on some data anomalies, I realized the solution needs another condition. Suppose another column, type, is added to table1, so that acct and type form the composite key that identifies related rows; hence acct 2 type A and acct 2 type B are not related. The updated table is:
acct,type,amt,begin_date,end_date
1, A, 10, 2020-04-01, 2020-04-30
1, A, 10, 2020-05-01, 2020-05-31
2, A, 50, 2021-05-01, 2021-05-31
2, A, 50, 2021-06-01, 2021-06-30
2, B, 50, 2021-01-01, 2021-01-31
2, B, 50, 2021-02-01, 2021-02-28
I would now need dummy rows created for acct 2 type B up to 2021-06-30. Acct 2 type A is already fine, since it has rows up to 2021-06-30.
You can generate the rows using a recursive CTE:
with cte as (
select acct, type, amt,
dateadd(day, 1, end_date) as begin_date,
eomonth(dateadd(day, 1, end_date)) as end_date
from (select t.*,
row_number() over (partition by acct, type order by end_date desc) as seqnum
from table1 t
) t
where seqnum = 1 and end_date < '2021-06-30'
union all
select acct, type, amt, dateadd(month, 1, begin_date),
eomonth(dateadd(month, 1, begin_date))
from cte
where begin_date < '2021-06-01'
)
select *
from cte;
You can then use INSERT to add these rows to table1, or UNION ALL them with table1 if you simply want a result set with all the rows.
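A runnable sketch of the same recursive-CTE fill, partitioned by (acct, type) as the updated question requires, written for SQLite through Python's sqlite3 (assumption: SQLite 3.25+). SQLite has no DATEADD or EOMONTH, so date(x, '+1 day') and date(x, 'start of month', '+1 month', '-1 day') stand in for them. The fixed CUTOFF constant is a hypothetical stand-in for "previous month end".

```python
import sqlite3

# Hypothetical fixed cutoff standing in for "previous month end" in the question.
CUTOFF = "2021-06-30"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (acct INTEGER, type TEXT, amt INTEGER, begin_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", 10, "2020-04-01", "2020-04-30"),
        (1, "A", 10, "2020-05-01", "2020-05-31"),
        (2, "A", 50, "2021-05-01", "2021-05-31"),
        (2, "A", 50, "2021-06-01", "2021-06-30"),
        (2, "B", 50, "2021-01-01", "2021-01-31"),
        (2, "B", 50, "2021-02-01", "2021-02-28"),
    ],
)

FILL_SQL = """
WITH RECURSIVE cte(acct, type, amt, begin_date, end_date) AS (
    -- anchor: the month after the latest period of each (acct, type) that stops early
    SELECT acct, type, amt,
           date(end_date, '+1 day'),
           date(end_date, '+1 day', 'start of month', '+1 month', '-1 day')
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY acct, type
                                       ORDER BY end_date DESC) AS seqnum
          FROM table1)
    WHERE seqnum = 1 AND end_date < :cutoff
    UNION ALL
    -- recursive step: add one month until the cutoff month has been generated
    SELECT acct, type, amt,
           date(begin_date, '+1 month'),
           date(begin_date, '+1 month', 'start of month', '+1 month', '-1 day')
    FROM cte
    WHERE begin_date < date(:cutoff, 'start of month')
)
SELECT * FROM cte ORDER BY acct, type, begin_date
"""

rows = conn.execute(FILL_SQL, {"cutoff": CUTOFF}).fetchall()
for row in rows:
    print(row)
```

Acct 2 type A produces no anchor row because its latest end_date is not before the cutoff, so it gets no dummy rows. In SQL Server, a long gap may also need OPTION (MAXRECURSION 0), because the default limit is 100 recursion levels.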

MSSQL Join Calculation

I am looking for a solution to a join problem in MSSQL.
I have two tables.
The first one is the Basic table:
ID, Name, Key
1, Test1, 1x11
2, Test2, 2x22
3, Test3, 3x33
The second is the table which I want to join to the Basic table:
Key, Action, create, close, duration
1x11, 1, 01/01/2021 06:00,01/01/2021 07:00, 1
1x11, 5, 01/01/2021 07:00,01/01/2021 10:00, 1
1x11, 10, 01/01/2021 10:00,0, 0
2x22, 1, 01/01/2021 10:00,01/01/2021 11:00, 1
2x22, 5, 01/01/2021 11:00,01/01/2021 12:00, 1
2x22, 7, 01/01/2021 12:00,01/01/2021 13:00, 1
2x22, 5, 01/01/2021 13:00,01/01/2021 14:00, 1
2x22, 10, 01/01/2021 14:00,0, 0
3x33, 1, 01/01/2021 10:00,01/01/2021 12:00, 2
3x33, 10, 01/01/2021 12:00,0, 0
In this table the close date was not given, so I had to use the following expression to derive it (the close date is the next create date):
LEAD([create], 1) OVER (PARTITION BY [Key] ORDER BY [create]) AS [close]
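To check the LEAD step, here is a small SQLite sketch (via Python's sqlite3, assuming SQLite 3.25+) that derives each close time from the next create time and converts the gap to whole hours. It uses only the 2x22 rows, with the dates rewritten in ISO format because SQLite's date functions do not parse 01/01/2021. The names second_table and close_ts are hypothetical. The last row of each Key has no next row, so LEAD returns NULL there instead of the 0 shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "key" and "create" are quoted because they are SQL keywords (T-SQL would use [brackets]).
conn.execute('CREATE TABLE second_table ("key" TEXT, action INTEGER, "create" TEXT)')
conn.executemany(
    "INSERT INTO second_table VALUES (?, ?, ?)",
    [
        ("2x22", 1, "2021-01-01 10:00"),
        ("2x22", 5, "2021-01-01 11:00"),
        ("2x22", 7, "2021-01-01 12:00"),
        ("2x22", 5, "2021-01-01 13:00"),
        ("2x22", 10, "2021-01-01 14:00"),
    ],
)

rows = conn.execute(
    """
    SELECT action, "create", close_ts,
           -- whole hours between create and the derived close (NULL on the last row)
           (CAST(strftime('%s', close_ts) AS INTEGER)
            - CAST(strftime('%s', "create") AS INTEGER)) / 3600 AS duration_h
    FROM (SELECT *,
                 LEAD("create", 1) OVER (PARTITION BY "key" ORDER BY "create") AS close_ts
          FROM second_table)
    ORDER BY "create"
    """
).fetchall()

for row in rows:
    print(row)
```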
Now my goal is to join the sum of the Action 5 durations per Key to the Basic table.
Can someone tell me how to do that? I am really frustrated.
Final Table:
ID, Name, Key, join(sum of 5)
1, Test1, 1x11,1
2, Test2, 2x22,2 (because there are two times one hour that means 2h)
3, Test3, 3x33,0
Thanks for helping. Christian
If the two tables exist then this should be a simple aggregation.
SELECT
B.ID,
B.Name,
B.[Key],
Action5Duration = SUM(CASE WHEN S.Action = 5 THEN S.Duration ELSE 0 END)
FROM
BasicTable B
INNER JOIN SecondTable S ON S.[Key] = B.[Key]
GROUP BY
B.ID,
B.Name,
B.[Key]
This is simple; all you need is conditional aggregation:
SELECT [key], SUM(CASE WHEN Action = 5 THEN duration ELSE 0 END) AS sum_of_5
FROM t
GROUP BY [key]
where t is the second table.
Output:
key sum_of_5
-------------
1x11 1
2x22 2
3x33 0
To join back to the original table, use a derived table:
SELECT t1.ID, t1.[key], t1.name, t2.sum_of_5
FROM t1
JOIN (
SELECT
[key]
, SUM(CASE WHEN Action = 5 THEN duration ELSE 0 END) AS sum_of_5
FROM t
GROUP BY [key]
) t2 ON t1.[key] = t2.[key]
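A runnable SQLite sketch of the conditional-aggregation join (Python's sqlite3), using the Key, Action and duration values from the question. The create/close columns are left out because the sum only needs duration, and the table names basic_table and second_table are hypothetical. It uses a LEFT JOIN so that a Basic row with no actions at all would still appear with 0. With this sample data, an INNER JOIN returns the same three rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "key" is quoted because it is an SQL keyword.
conn.execute('CREATE TABLE basic_table (id INTEGER, name TEXT, "key" TEXT)')
conn.execute('CREATE TABLE second_table ("key" TEXT, action INTEGER, duration INTEGER)')
conn.executemany(
    "INSERT INTO basic_table VALUES (?, ?, ?)",
    [(1, "Test1", "1x11"), (2, "Test2", "2x22"), (3, "Test3", "3x33")],
)
conn.executemany(
    "INSERT INTO second_table VALUES (?, ?, ?)",
    [
        ("1x11", 1, 1), ("1x11", 5, 1), ("1x11", 10, 0),
        ("2x22", 1, 1), ("2x22", 5, 1), ("2x22", 7, 1), ("2x22", 5, 1), ("2x22", 10, 0),
        ("3x33", 1, 2), ("3x33", 10, 0),
    ],
)

rows = conn.execute(
    '''
    SELECT b.id, b.name, b."key",
           -- only Action 5 rows contribute; everything else (including no match) adds 0
           SUM(CASE WHEN s.action = 5 THEN s.duration ELSE 0 END) AS sum_of_5
    FROM basic_table AS b
    LEFT JOIN second_table AS s ON s."key" = b."key"
    GROUP BY b.id, b.name, b."key"
    ORDER BY b.id
    '''
).fetchall()

for row in rows:
    print(row)
```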

SQL Server - Behaviour of ROW_NUMBER Partition by Null Value

I find this behaviour very strange and counterintuitive (even for SQL).
set ansi_nulls off
go
;with sampledata(Value, CanBeNull) as
(
select 1, 1
union
select 2, 2
union
select 3, null
union
select 4, null
union
select 5, null
union
select 6, null
)
select ROW_NUMBER() over(partition by CanBeNull order by value) 'RowNumber',* from sampledata
This returns:
1 3 NULL
2 4 NULL
3 5 NULL
4 6 NULL
1 1 1
1 2 2
This means that all of the NULLs are being treated as part of the same group for the purpose of calculating the row number. It doesn't matter whether SET ANSI_NULLS is on or off.
But since by definition a NULL is totally unknown, how can the NULLs be grouped together like this? It is saying that, for the purposes of placing things in a rank order, apples and oranges and the square root of minus 1 and quantum black holes or whatever can be meaningfully ordered. A little experimentation suggests that the first column is being used to generate the rank order, as
select 1, '1'
union
select 2, '2'
union
select 5, null
union
select 6, null
union
select 3, null
union
select 4, null
generates the same values. This has significant implications that have caused problems in the legacy code I am dealing with. Is this the expected behaviour, and is there any way of mitigating it other than replacing the NULL in the select query with a unique value?
The results I would have expected are:
1 3 NULL
1 4 NULL
1 5 NULL
1 6 NULL
1 1 1
1 2 2
Using Dense_Rank() makes no difference.
So the deal is this: when T-SQL evaluates NULLs in predicates, it uses three-valued logic (TRUE, FALSE or UNKNOWN), which gives the behavior you expected from your query. However, when it comes to grouping values (GROUP BY, DISTINCT and PARTITION BY), T-SQL treats all NULLs as one group. SET ANSI_NULLS only changes how = and <> compare against NULL, so it has no effect on this. Your query therefore puts the NULLs together in one partition and numbers the rows within that window.
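The grouping rule is easy to see in a quick SQLite sketch (Python's sqlite3, SQLite 3.25+ assumed). SQLite, like SQL Server, puts all NULLs into a single partition or group. The table and column names mirror the question's sampledata CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sampledata (value INTEGER, can_be_null INTEGER)")
conn.executemany(
    "INSERT INTO sampledata VALUES (?, ?)",
    [(1, 1), (2, 2), (3, None), (4, None), (5, None), (6, None)],
)

# PARTITION BY puts every NULL into one partition, so ROW_NUMBER runs 1..4 across them.
row_numbers = dict(
    conn.execute(
        """
        SELECT value, ROW_NUMBER() OVER (PARTITION BY can_be_null ORDER BY value)
        FROM sampledata
        """
    ).fetchall()
)

# GROUP BY behaves the same way: the four NULL rows form a single group.
groups = conn.execute(
    "SELECT can_be_null, COUNT(*) FROM sampledata GROUP BY can_be_null ORDER BY can_be_null"
).fetchall()

print(row_numbers)
print(groups)
```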
For the results you say you would like to see, this query should work:
WITH sampledata (Value, CanBeNull)
AS
(
SELECT 1, 1
UNION
SELECT 2, 2
UNION
SELECT 3, NULL
UNION
SELECT 4, NULL
UNION
SELECT 5, NULL
UNION
SELECT 6, NULL
)
SELECT
DENSE_RANK() OVER (PARTITION BY CanBeNull ORDER BY CASE WHEN CanBeNull IS NOT NULL THEN value END ASC) as RowNumber
,Value
,CanBeNull
FROM sampledata
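Why this works: the CASE expression returns NULL for every row in the NULL partition, so all of those rows tie on the ORDER BY key and DENSE_RANK gives each of them 1. Each non-NULL partition here has a single row, so it also gets 1. Here is a quick check in SQLite through Python's sqlite3, assuming SQLite 3.25+; the alias rn is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sampledata (value INTEGER, can_be_null INTEGER)")
conn.executemany(
    "INSERT INTO sampledata VALUES (?, ?)",
    [(1, 1), (2, 2), (3, None), (4, None), (5, None), (6, None)],
)

rows = conn.execute(
    """
    SELECT DENSE_RANK() OVER (
               PARTITION BY can_be_null
               -- NULL for every row in the NULL partition, so those rows all tie
               ORDER BY CASE WHEN can_be_null IS NOT NULL THEN value END
           ) AS rn,
           value, can_be_null
    FROM sampledata
    ORDER BY value
    """
).fetchall()

for row in rows:
    print(row)
```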

Add new rows to resultset in MSSQL

I am running a SQL query in MSSQL 2008 R2 which should always return a consistent result set, meaning that all dates within a selected date range should be shown, even if there are no rows/values in the database for a particular date within the range. For example, for the dates 2013-07-03 to 2013-07-04, when there are values for id 1 and 2, it should look like this:
Scenario 1
Date-hour, value, id
2013-07-03-1, 10, 1
2013-07-03-2, 12, 1
2013-07-03-...
2013-07-03-24, 9, 1
2013-07-04-1, 10, 1
2013-07-04-2, 10, 1
2013-07-04-...
2013-07-04-24, 10, 1
2013-07-03-1, 11, 2
2013-07-03-2, 12, 2
2013-07-03-...
2013-07-03-24, 9, 2
2013-07-04-1, 10, 2
2013-07-04-2, 12, 2
2013-07-04-...
2013-07-04-24, 10, 2
However, if id 2 is missing values for 2013-07-04, I will normally only get a resultset which looks like this:
Scenario 2
Date-hour, value, id
2013-07-03-1, 10, 1
2013-07-03-2, 12, 1
2013-07-03-...
2013-07-03-24, 9, 1
2013-07-04-1, 10, 1
2013-07-04-2, 10, 1
2013-07-04-...
2013-07-04-24, 10, 1
2013-07-03-1, 11, 2
2013-07-03-2, 12, 2
2013-07-03-...
2013-07-03-24, 9, 2
Scenario 2 creates an inconsistent result set, which affects the output. Is there any way to make the SQL query always return Scenario 1 even when values are missing, at least returning NULL when there are no values for a specific date within the date range? If the result set returns id 1 and 2, then all dates for id 1 and 2 should be covered. If id 1, 2 and 3 are returned, then all dates for id 1, 2 and 3 should be covered.
I have two tables which look like this:
tbl_measurement
id, date, hour1, hour2, ..., hour24
tbl_plane
planeId, id, maxSpeed
The SQL query I am running looks like this:
SELECT DISTINCT hour00_01, hour01_02, mr.date, mr.id, maxSpeed
FROM tbl_measurement as mr, tbl_plane as p
WHERE (date >= '2013-07-03' AND date <= '2013-07-04') AND p.id = mr.id
GROUP BY mr.id, mr.date, hour00_01, hour01_02, p.maxSpeed
ORDER BY mr.id, mr.date
I have been looking around quite a bit; perhaps PIVOT is the way to solve this? I would appreciate help with how to write the SQL query for this purpose.
You can use a recursive CTE to generate a list of dates. If you cross join that with the planes, you get one row per date per plane. With a left join, you can link in measurements if they exist; a left join keeps the row even if no measurement is found.
For example:
declare @startDt date = '2013-01-01'
declare @endDt date = '2013-06-30'
; with AllDates as
(
select @startDt as dt
union all
select dateadd(day, 1, dt)
from AllDates
where dateadd(day, 1, dt) <= @endDt
)
select *
from AllDates ad
cross join
tbl_plane p
left join
(
select row_number() over (partition by Id, cast([date] as date) order by id) rn
, *
from tbl_measurement
where inputType = 'forecast' -- assumes an inputType column, not in the schema above
) m
on p.Id = m.Id
and cast(m.[date] as date) = ad.dt
and m.rn = 1 -- Only one per day
where p.planeType = 3 -- assumes a planeType column, not in the schema above
option (maxrecursion 0)
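Below is a cut-down SQLite version of the same pattern (recursive date CTE, CROSS JOIN to the planes, LEFT JOIN to the measurements), run through Python's sqlite3. Assumptions: only the hour1 column is kept, the planeId and maxSpeed values are hypothetical, and the measurement values come from Scenario 2 (id 2 has no row for 2013-07-04). SQLite's date(dt, '+1 day') stands in for DATEADD, and SQLite needs no MAXRECURSION option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_plane (planeId INTEGER, id INTEGER, maxSpeed INTEGER)")
conn.execute("CREATE TABLE tbl_measurement (id INTEGER, date TEXT, hour1 INTEGER)")
# Hypothetical plane rows; measurement values follow Scenario 2 (id 2 lacks 2013-07-04).
conn.executemany("INSERT INTO tbl_plane VALUES (?, ?, ?)", [(100, 1, 900), (200, 2, 850)])
conn.executemany(
    "INSERT INTO tbl_measurement VALUES (?, ?, ?)",
    [(1, "2013-07-03", 10), (1, "2013-07-04", 10), (2, "2013-07-03", 11)],
)

rows = conn.execute(
    """
    WITH RECURSIVE all_dates(dt) AS (
        SELECT :start_dt
        UNION ALL
        SELECT date(dt, '+1 day') FROM all_dates WHERE date(dt, '+1 day') <= :end_dt
    )
    SELECT ad.dt, p.id, m.hour1          -- m.hour1 is NULL when no measurement exists
    FROM all_dates AS ad
    CROSS JOIN tbl_plane AS p
    LEFT JOIN tbl_measurement AS m ON m.id = p.id AND m.date = ad.dt
    ORDER BY p.id, ad.dt
    """,
    {"start_dt": "2013-07-03", "end_dt": "2013-07-04"},
).fetchall()

for row in rows:
    print(row)
```

The missing measurement comes back as a row with NULL instead of disappearing, which is the Scenario 1 shape the question asks for.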

T-SQL GROUP BY clause with exceptions

I have a problem with a query.
This is the data (ordered by Timestamp):
Data
ID Value Timestamp
1 0 2001-1-1
2 0 2002-1-1
3 1 2003-1-1
4 1 2004-1-1
5 0 2005-1-1
6 2 2006-1-1
7 2 2007-1-1
8 2 2008-1-1
I need to extract the distinct values and the date of their first occurrence. The exception is that rows should be grouped only if the run is not interrupted by a different value in that timeframe.
So the data I need is:
ID Value Timestamp
1 0 2001-1-1
3 1 2003-1-1
5 0 2005-1-1
6 2 2006-1-1
I've made this work with a complicated query, but I am sure there is an easier way to do it; I just can't think of it. Could anyone help?
This is what I started with; it could probably be made to work. This query should locate where a value changes:
SELECT * FROM Data d1 JOIN Data d2 ON d1.Timestamp < d2.Timestamp AND
d1.Value <> d2.Value
It could probably be done with good use of ROW_NUMBER, but I can't manage it.
Sample data:
declare @T table (ID int, Value int, Timestamp date)
insert into @T(ID, Value, Timestamp) values
(1, 0, '20010101'),
(2, 0, '20020101'),
(3, 1, '20030101'),
(4, 1, '20040101'),
(5, 0, '20050101'),
(6, 2, '20060101'),
(7, 2, '20070101'),
(8, 2, '20080101')
Query:
;With OrderedValues as (
select *,ROW_NUMBER() OVER (ORDER By TimeStamp) as rn --TODO - specific columns better than *
from @T
), Firsts as (
select
ov1.* --TODO - specific columns better than *
from
OrderedValues ov1
left join
OrderedValues ov2
on
ov1.Value = ov2.Value and
ov1.rn = ov2.rn + 1
where
ov2.ID is null
)
select * --TODO - specific columns better than *
from Firsts
I didn't rely on the ID values being sequential and without gaps. If they are, you can omit OrderedValues (using the table and ID in place of OrderedValues and rn). The second CTE, Firsts, simply finds rows where there isn't an immediately preceding row with the same Value.
Result:
ID Value Timestamp rn
----------- ----------- ---------- --------------------
1 0 2001-01-01 1
3 1 2003-01-01 3
5 0 2005-01-01 5
6 2 2006-01-01 6
You can order by rn if you need the results in this specific order.
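The answer's self-join on consecutive row numbers can be run as-is in SQLite through Python's sqlite3 (SQLite 3.25+ assumed for ROW_NUMBER). Here the Timestamp column is renamed ts, and the rows are the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 0, "2001-01-01"), (2, 0, "2002-01-01"), (3, 1, "2003-01-01"), (4, 1, "2004-01-01"),
        (5, 0, "2005-01-01"), (6, 2, "2006-01-01"), (7, 2, "2007-01-01"), (8, 2, "2008-01-01"),
    ],
)

firsts = conn.execute(
    """
    WITH ordered_values AS (
        SELECT id, value, ts, ROW_NUMBER() OVER (ORDER BY ts) AS rn
        FROM t
    )
    -- keep a row only if the row immediately before it (rn - 1) has a different value
    SELECT ov1.id, ov1.value, ov1.ts, ov1.rn
    FROM ordered_values AS ov1
    LEFT JOIN ordered_values AS ov2
      ON ov1.value = ov2.value AND ov1.rn = ov2.rn + 1
    WHERE ov2.id IS NULL
    ORDER BY ov1.rn
    """
).fetchall()

for row in firsts:
    print(row)
```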
