Insert dummy rows to fill missing values into a SQL Table - sql-server

I have this SQL Server table table1 which I want to fill with dummy rows per acct up to latest previous month end date period e.g now would be up to 2021-06-30.
In this example, acct 1 has n number of rows which ends at 2020-05-31, and I want to insert dummy rows with same values for acct and amt with begin_date and end_date incrementing by 1 month up to 06-30-2021.
Let's assume acct 2 already ends at 06-30-2021 so this doesn't need dummy rows to be inserted.
acct,amt,begin_date,end_date
1 , 10, 2020-04-01, 2020-04-30
1 , 10, 2020-05-01, 2020-05-31
2 , 50, 2021-05-01, 2021-05-31
2 , 50, 2021-06-01, 2021-06-30
So for acct 1, I want n number of rows to be inserted from last period of 2020-05-31 up to previous month end which is now 06-30-2021 and I want the amt and acct to remain same. So it would look like this below:
acct,amt,begin_date,end_date
1 , 10, 2020-04-01, 2020-04-30
1 , 10, 2020-05-01, 2020-05-31
1 , 10, 2020-06-01, 2020-06-30
1 , 10, 2020-07-01, 2020-07-31
.............................
.............................
1 , 10, 2021-06-01, 2021-06-30
Based on some data anamolies, I realize I need another condition to the solution. Suppose another column type was added to the table1. So acct and type would be the composite key that identifies each related row hence acct 2 type A and acct 2 type B are not related. So we have the updated table:
acct,type,amt,begin_date,end_date
1, A, 10, 2020-04-01, 2020-04-30
1, A, 10, 2020-05-01, 2020-05-31
2, A, 50, 2021-05-01, 2021-05-31
2, A, 50, 2021-06-01, 2021-06-30
2, B, 50, 2021-01-01, 2021-01-31
2, B, 50, 2021-02-01, 2021-02-28
I would now need dummy rows to be created for acct 2 type B up to 2021-06-30. We already know acct 2 type A would be ok since it already has rows up to 2021-06-30

You can generate the rows using a recursive CTE:
with cte as (
select acct, amt,
dateadd(day, 1, end_date) as begin_date,
eomonth(dateadd(day, 1, end_date)) as end_date
from (select t.*,
row_number() over (partition by acct order by end_date desc) as seqnum
from t
) t
where seqnum = 1 and end_date < '2021-06-30'
union all
select acct, amt, dateadd(month, 1, begin_date),
eomonth(dateadd(month, 1, begin_date))
from cte
where begin_date < '2021-06-01'
)
select *
from cte;
You can then use insert to insert these rows into a table. Or use union all if you simply want a result set with all the rows.
Here is a db<>fiddle.

Related

Filter table to show only most recent values [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 11 months ago.
I have a table that looks like this.
Category
Type
fromDate
Value
1
1
1/1/2022
5
1
2
1/1/2022
10
2
1
1/1/2022
7.5
2
2
1/1/2022
15
3
1
1/1/2022
3.5
3
2
1/1/2022
5
3
1
4/1/2022
5
3
2
4/1/2022
10
I'm trying to filter this table down to filter down and keep the most recent grouping of Category/Type. IE rows 5 and 6 would be removed in the query since they are older records.
So far I have the below query but I am getting an aggregate error due to not aggregating the "Value" column. My question is how do I get around this without aggregating? I want to keep the actual value that is in the column.
SELECT T1.Category, T1.Type, T2.maxDate, T1.Value
FROM (SELECT Category, Type, MAX(fromDate) AS maxDate
FROM Table GROUP BY Category,Type) T2
INNER JOIN Table T1 ON T1.Category=T2.Category
GROUP BY T1.Category, T1.Type, T2.MaxDate
This has been asked and answered dozens and dozens of times. But it was quick and painless to type up an answer. This should work for you.
declare #MyTable table
(
Category int
, Type int
, fromDate date
, Value decimal(5,2)
)
insert #MyTable
select 1, 1, '1/1/2022', 5 union all
select 1, 2, '1/1/2022', 10 union all
select 2, 1, '1/1/2022', 7.5 union all
select 2, 2, '1/1/2022', 15 union all
select 3, 1, '1/1/2022', 3.5 union all
select 3, 2, '1/1/2022', 5 union all
select 3, 1, '4/1/2022', 5 union all
select 3, 2, '4/1/2022', 10
select Category
, Type
, fromDate
, Value
from
(
select *
, RowNum = ROW_NUMBER() over(partition by Category, Type order by fromDate desc)
from #MyTable
) x
where x.RowNum = 1
order by x.Category
, x.Type

Group records based on time interval starting from timestamp of first record in each group

Struggling with this; need to group records within a specific time interval starting from the first timestamp (FREEZE_TIME) - but the first record outside the first group is the starting point for the time interval for the next group and so on. Expected result, THAW_COUNT, is the count of all groups for a PARENT_SAMPLE_ID. So for table:
SAMPLE_ID
FREEZE_TIME
PARENT_SAMPLE_ID
1
null
null
2
2015-11-27 10:23:10
1
3
2015-11-27 10:59:23
1
4
2015-11-27 11:05:43
1
5
2015-11-27 12:53:48
1
6
2015-11-27 13:42:25
1
I would like to get a result of:
PARENT_SAMPLE_ID
THAW_COUNT
1
2
So sample_id:s 2,3 and 4 should be in the same group and sample id:s 5 and 6 are in the next group.
I have tried something like:
with SampleList as
(
select PARENT_SAMPLE_ID, FREEZE_TIME,
ROW_NUMBER() OVER (partition by PARENT_SAMPLE_ID order by FREEZE_TIME asc) RN
from
SAMPLE
)
,
FirstSample as
(
select PARENT_SAMPLE_ID, FREEZE_TIME
from SampleList
where RN = 1
)
,
SelectedSample as
(
select
s.PARENT_SAMPLE_ID,
ABS(DATEDIFF(MINUTE, s.FREEZE_TIME, sFirst.FREEZE_TIME))/60 DiffToFirst
from SampleList s
inner join FirstSample sFirst ON s.PARENT_SAMPLE_ID = sFirst.PARENT_SAMPLE_ID
group by s.PARENT_SAMPLE_ID, ABS(DATEDIFF(MINUTE, s.FREEZE_TIME, sFirst.FREEZE_TIME))/60
)
select PARENT_SAMPLE_ID, count(*) THAW_COUNT
from SelectedSample
group by PARENT_SAMPLE_ID
But this will return a THAW_COUNT of 3 as sampleId:s 5 and 6 will be in different groups because the grouping is based on hour intervals from freeze time of sampleId 2 only. How do I get the grouping for group 2 to start from the first record outside the first group (sampleId 5) and so on?
This can be treated as a gaps and islands problem. Using some windows functions to check counts and using LAG to look at the "previous" row we can solve this. If you have multiple values for SAMPLE_ID you will want to add some partitioning.
create table #Something
(
SAMPLE_ID int
, FREEZE_TIME datetime
, PARENT_SAMPLE_ID int
)
insert #Something
select 1, null, null union all
select 2, '2015-11-27 10:23:10', 1 union all
select 3, '2015-11-27 10:59:23', 1 union all
select 4, '2015-11-27 11:05:43', 1 union all
select 5, '2015-11-27 12:53:48', 1 union all
select 6, '2015-11-27 13:42:25', 1;
with MyGroups as
(
select *
, GroupNum = count(IsNewGroup) over (order by FREEZE_TIME rows unbounded preceding)
from
(
select *
, IsNewGroup = case when LAG(FREEZE_TIME, 1, '') over(order by FREEZE_TIME) < dateadd(hour, -1, FREEZE_TIME) then 1 end
from #Something
) x
)
select coalesce(PARENT_SAMPLE_ID, SAMPLE_ID)
, count(distinct GroupNum)
from MyGroups
group by coalesce(PARENT_SAMPLE_ID, SAMPLE_ID)
drop table #Something

Finding the Datediff between Records in same Table

IP QID ScanDate Rank
101.110.32.80 6 2016-09-28 18:33:21.000 3
101.110.32.80 6 2016-08-28 18:33:21.000 2
101.110.32.80 6 2016-05-30 00:30:33.000 1
I have a Table with certain records, grouped by Ipaddress and QID.. My requirement is to find out which record missed the sequence in the date column or other words the date difference is more than 30 days. In the above table date diff between rank 1 and rank 2 is more than 30 days.So, i should flag the rank 2 record.
You can use LAG in Sql 2012+
declare #Tbl Table (Ip VARCHAR(50), QID INT, ScanDate DATETIME,[Rank] INT)
INSERT INTO #Tbl
VALUES
('101.110.32.80', 6, '2016-09-28 18:33:21.000', 3),
('101.110.32.80', 6, '2016-08-28 18:33:21.000', 2),
('101.110.32.80', 6, '2016-05-30 00:30:33.000', 1)
;WITH Result
AS
(
SELECT
T.Ip ,
T.QID ,
T.ScanDate ,
T.[Rank],
LAG(T.[Rank]) OVER (ORDER BY T.[Rank]) PrivSRank,
LAG(T.ScanDate) OVER (ORDER BY T.[Rank]) PrivScanDate
FROM
#Tbl T
)
SELECT
R.Ip ,
R.QID ,
R.ScanDate ,
R.Rank ,
R.PrivScanDate,
IIF(DATEDIFF(DAY, R.PrivScanDate, R.ScanDate) > 30, 'This is greater than 30 day. Rank ' + CAST(R.PrivSRank AS VARCHAR(10)), '') CFlag
FROM
Result R
Result:
Ip QID ScanDate Rank CFlag
------------------------ ----------- ----------------------- ----------- --------------------------------------------
101.110.32.80 6 2016-05-30 00:30:33.000 1
101.110.32.80 6 2016-08-28 18:33:21.000 2 This is greater than 30 day. Rank 1
101.110.32.80 6 2016-09-28 18:33:21.000 3 This is greater than 30 day. Rank 2
While Window Functions could be used here, I think a self join might be more straight forward and easier to understand:
SELECT
t1.IP,
t1.QID,
t1.Rank,
t1.ScanDate as endScanDate,
t2.ScanDate as beginScanDate,
datediff(day, t2.scandate, t1.scandate) as scanDateDays
FROM
table as t1
INNER JOIN table as t2 ON
t1.ip = t2.ip
t1.rank - 1 = t2.rank --get the record from t2 and is one less in rank
WHERE datediff(day, t2.scandate, t1.scandate) > 30 --only records greater than 30 days
It's pretty self-explanatory. We are joining the table to itself and joining the ranks together where rank 2 gets joined to rank 1, rank 3 gets joined to rank 2, and so on. Then we just test for records that are greater than 30 days using the datediff function.
I would use windowed function to avoid self join which in many case will perform better.
WITH cte
AS (
SELECT
t.IP
, t.QID
, LAG(t.ScanDate) OVER (PARTITION BY t.IP ORDER BY T.ScanDate) AS beginScanDate
, t.ScanDate AS endScanDate
, DATEDIFF(DAY,
LAG(t.ScanDate) OVER (PARTITION BY t.IP ORDER BY t.ScanDate),
t.ScanDate) AS Diff
FROM
MyTable AS t
)
SELECT
*
FROM
cte c
WHERE
Diff > 30;

Sql Server Rank on Value Range

I have a table with three columns, ID, Date, Value. I want to rank the rows such that, within an ID, the Ranking goes up with each date where Value is at least X, otherwise, Ranking stays the same.
Given ID, Date, and Values like these
1, 6/1, 8
1, 6/2, 12
1, 6/3, 14
1, 6/4, 9
1, 6/5, 11
I would like to return a ranking based on values of at least 10, such that I would have ID, Date, Value, and Rank like this:
1, 6/1, 8, 0
1, 6/2, 12, 1
1, 6/3, 14, 2
1, 6/4, 9, 2
1, 6/5, 11, 3
In other words, the ranking increases each time the value exceeds a threshhold, otherwise it stays the same.
What I have tried is
SELECT T1.*, X.Ranking FROM TABLE T1
LEFT JOIN ( SELECT *, DENSE_RANK( ) OVER ( PARTITION BY T2.ID ORDER BY T2.DATE ) Ranking
FROM TABLE T2 WHERE T2.VALUE >= 10 ) X
ON T1.ID = T2.ID AND T1.Date = T2.Date
This almost works. It gets me output like
1, 6/1, 8, NULL
1, 6/2, 12, 1
1, 6/3, 14, 2
1, 6/4, 9, NULL
1, 6/5, 11, 3
Then, I want to turn the first NULL into a 0, and the second into a 2.
I turned the above query into a cte and tried
SELECT T1.*, CASE WHEN T1.Ranking IS NULL THEN ISNULL( (
SELECT MAX( T2.Ranking )
FROM cte T2 WHERE T1.ID = T2.ID AND T1.Date > T2.Date, 0 )
ELSE T1.Ranking END NewRanking
FROM cte T1
This looks like it would work, but my table has 200,000 rows and the query ran for 25 minutes... So, I'm looking for something a little more out of the box than the SELECT MAX.
You are using SQL Server 2012, so you can do a cumulative sum:
select t.*,
sum(case when value >= 10 then 1 else 0 end) over
(partition by id order by date) as ranking
from table t;
EDIT: This actually does not work. In spirit it fetches the previous LAG value and increment it, but this is not how LAG works... it would be 'recursive' in essence which results in a 'my_rank' is undefined syntax error. Better solution is the accepted answer based on a cumulative sum.
If you have SQL Server 2012 (you didn't tag your question), you can do something like:
SELECT
LAG(my_rank, 1, 0) OVER (ORDER BY DATE)
+ CASE WHEN VALUE >= 10 THEN 1 ELSE 0 END AS my_rank
FROM T1

Add new rows to resultset in MSSQL

I am running a SQL query in MSSQL 2008 R2 which should always return a consistent resultset, meaning that all dates within a selected date range should be shown, although there are no rows/values in the database for a particular date within the date range. It should for example look like this for the dates 2013-07-03 - 2013-07-04 when there are values for id 1 and 2.
Scenario 1
Date-hour, value, id
2013-07-03-1, 10, 1
2013-07-03-2, 12, 1
2013-07-03-...
2013-07-03-24, 9, 1
2013-07-04-1, 10, 1
2013-07-04-2, 10, 1
2013-07-04-...
2013-07-04-24, 10, 1
2013-07-03-1, 11, 2
2013-07-03-2, 12, 2
2013-07-03-...
2013-07-03-24, 9, 2
2013-07-04-1, 10, 2
2013-07-04-2, 12, 2
2013-07-04-...
2013-07-04-24, 10, 2
However, if id 2 is missing values for 2013-07-04, I will normally only get a resultset which looks like this:
Scenario 2
Date-hour, value, id
2013-07-03-1, 10, 1
2013-07-03-2, 12, 1
2013-07-03-...
2013-07-03-24, 9, 1
2013-07-04-1, 10, 1
2013-07-04-2, 10, 1
2013-07-04-...
2013-07-04-24, 10, 1
2013-07-03-1, 11, 2
2013-07-03-2, 12, 2
2013-07-03-...
2013-07-03-24, 9, 2
Scenario 2 will create an inconsistent resultset which will affect the output. Is there any way to make the SQL query always return as scenario 1 even when there are missing values, so at least to return NULL if there are no values for a specific date within the date range. If the resultset returns id 1 and 2 then all dates for id 1 and 2 should be covered. If id 1, 2 and 3 are returned then all dates for id 1, 2 and 3 should be covered.
I have two tables which look like this:
tbl_measurement
id, date, hour1, hour2, ..., hour24
tbl_plane
planeId, id, maxSpeed
The SQL query I am running look like this:
SELECT DISTINCT hour00_01, hour01_02, mr.date, mr.id, maxSpeed
FROM tbl_measurement as mr, tbl_plane as p
WHERE (date >= '2013-07-03' AND date <= '2013-07-04') AND p.id = mr.id
GROUP BY mr.id, mr.date, hour00_01, hour01_02, p.maxSpeed
ORDER BY mr.id, mr.date
I have been looking around quite a bit, and perhaps PIVOT tables are the way to solve this? Could you please help me out? I would appreciate if you can help me out with how to write the SQL query for this purpose.
You can use a recursive CTE to generate a list of dates. If you cross join that with planes, you get one row per date per plane. With a left join, you can link in measurements if they exist. A left join will leave the row even if no measurement is found.
For example:
declare #startDt date = '2013-01-01'
declare #endDt date = '2013-06-30'
; with AllDates as
(
select #startDt as dt
union all
select dateadd(day, 1, dt)
from AllDates
where dateadd(day, 1, dt) <= #endDt
)
select *
from AllDates ad
cross join
tbl_plane p
left join
(
select row_number() over (partition by Id, cast([date] as date) order by id) rn
, *
from tbl_measurement
where m.inputType = 'forecast'
) m
on p.Id = m.Id
and m.date = ad.dt
and m.rn = 1 -- Only one per day
where p.planeType = 3
option (maxrecursion 0)

Resources