Add new rows to resultset in MSSQL - sql-server

I am running a SQL query in MSSQL 2008 R2 that should always return a consistent resultset: every date within the selected date range should be shown, even if there are no rows/values in the database for a particular date in that range. For example, for the dates 2013-07-03 to 2013-07-04, when there are values for id 1 and 2, it should look like this:
Scenario 1
Date-hour, value, id
2013-07-03-1, 10, 1
2013-07-03-2, 12, 1
2013-07-03-...
2013-07-03-24, 9, 1
2013-07-04-1, 10, 1
2013-07-04-2, 10, 1
2013-07-04-...
2013-07-04-24, 10, 1
2013-07-03-1, 11, 2
2013-07-03-2, 12, 2
2013-07-03-...
2013-07-03-24, 9, 2
2013-07-04-1, 10, 2
2013-07-04-2, 12, 2
2013-07-04-...
2013-07-04-24, 10, 2
However, if id 2 is missing values for 2013-07-04, I will normally only get a resultset which looks like this:
Scenario 2
Date-hour, value, id
2013-07-03-1, 10, 1
2013-07-03-2, 12, 1
2013-07-03-...
2013-07-03-24, 9, 1
2013-07-04-1, 10, 1
2013-07-04-2, 10, 1
2013-07-04-...
2013-07-04-24, 10, 1
2013-07-03-1, 11, 2
2013-07-03-2, 12, 2
2013-07-03-...
2013-07-03-24, 9, 2
Scenario 2 creates an inconsistent resultset, which affects the output. Is there any way to make the SQL query always return scenario 1 even when values are missing, or at least return NULL when there are no values for a specific date within the date range? If the resultset returns id 1 and 2, then all dates for id 1 and 2 should be covered. If id 1, 2 and 3 are returned, then all dates for id 1, 2 and 3 should be covered.
I have two tables which look like this:
tbl_measurement
id, date, hour1, hour2, ..., hour24
tbl_plane
planeId, id, maxSpeed
The SQL query I am running looks like this:
SELECT DISTINCT hour00_01, hour01_02, mr.date, mr.id, maxSpeed
FROM tbl_measurement as mr, tbl_plane as p
WHERE (date >= '2013-07-03' AND date <= '2013-07-04') AND p.id = mr.id
GROUP BY mr.id, mr.date, hour00_01, hour01_02, p.maxSpeed
ORDER BY mr.id, mr.date
I have been looking around quite a bit, and perhaps PIVOT tables are the way to solve this? Could you please help me out? I would appreciate if you can help me out with how to write the SQL query for this purpose.

You can use a recursive CTE to generate a list of dates. If you cross join that with planes, you get one row per date per plane. With a left join, you can link in measurements if they exist. A left join will leave the row even if no measurement is found.
For example:
declare @startDt date = '2013-01-01'
declare @endDt date = '2013-06-30'
; with AllDates as
(
    -- anchor: first date of the range
    select @startDt as dt
    union all
    -- recursive step: add one day until the end date is reached
    select dateadd(day, 1, dt)
    from AllDates
    where dateadd(day, 1, dt) <= @endDt
)
select *
from AllDates ad
cross join
    tbl_plane p
left join
    (
    select row_number() over (partition by Id, cast([date] as date) order by id) rn
    , *
    from tbl_measurement
    where inputType = 'forecast'
    ) m
    on p.Id = m.Id
    and m.date = ad.dt
    and m.rn = 1 -- Only one per day
where p.planeType = 3
option (maxrecursion 0)
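To see the effect on a small data set, here is a minimal sketch of the same pattern. The table variables `@plane` and `@measurement`, the single `hour1` column and the sample values are simplified stand-ins invented for illustration, not the real schema.

```sql
-- Hypothetical sample data: id 2 has no measurement row for 2013-07-04.
declare @startDt date = '2013-07-03';
declare @endDt   date = '2013-07-04';

declare @plane table (id int, maxSpeed int);
declare @measurement table (id int, [date] date, hour1 int);

insert into @plane values (1, 900), (2, 850);
insert into @measurement values
    (1, '2013-07-03', 10),
    (1, '2013-07-04', 10),
    (2, '2013-07-03', 11);

with AllDates as
(
    select @startDt as dt
    union all
    select dateadd(day, 1, dt)
    from AllDates
    where dateadd(day, 1, dt) <= @endDt
)
select ad.dt, p.id, p.maxSpeed, m.hour1
from AllDates ad
cross join @plane p              -- one row per date per plane
left join @measurement m         -- measurement columns stay NULL when missing
    on m.id = p.id
   and m.[date] = ad.dt
order by p.id, ad.dt
option (maxrecursion 0);
```

With this data the query returns four rows; the row for id 2 on 2013-07-04 is still present, with `hour1` set to NULL.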

Related

Table not sorting correctly as date field is a string, using an over function and need to know to convert Order by to Datetime

I am working with a dashboard software that has limited features, i.e. there is no way to set the y-axis, so this has to be worked into the code. This means the datetime field has to be alphanumeric because of the y-axis row. However, the table now sorts incorrectly: rather than 1, 2, 3 it sorts as 1, 10, 11, 12. I've looked on here for the answer, but there's nothing that works with an OVER function, which is necessary as it's for a line graph showing cumulative sales figures over a month.
It is MSSQL that I am using:
SELECT DATENAME(day, DATEADD(day,day(oh_datetime),-1)),
SUM(SUM((CASE oh_sot_id WHEN 1 THEN 1 WHEN 4 THEN -1 WHEN 2 THEN 0 WHEN 3 THEN 0 WHEN 6 THEN 0 WHEN 11 THEN 0 END) * oht_net)) over (ORDER BY day(oh_datetime)) AS 'Orders In($)',
SUM((CASE oh_cd_id WHEN 11728 THEN 1 END) * oht_net) AS 'Target($)'
FROM order_header_total
JOIN order_header ON oht_oh_id = oh_id
WHERE year(oh_datetime) = year(GETDATE()) AND month(oh_datetime) = month(GETDATE())
GROUP BY day(oh_datetime)
UNION SELECT 'YAxis','0','0'

Insert dummy rows to fill missing values into a SQL Table

I have this SQL Server table table1 which I want to fill with dummy rows per acct up to latest previous month end date period e.g now would be up to 2021-06-30.
In this example, acct 1 has n number of rows which ends at 2020-05-31, and I want to insert dummy rows with same values for acct and amt with begin_date and end_date incrementing by 1 month up to 06-30-2021.
Let's assume acct 2 already ends at 06-30-2021 so this doesn't need dummy rows to be inserted.
acct,amt,begin_date,end_date
1 , 10, 2020-04-01, 2020-04-30
1 , 10, 2020-05-01, 2020-05-31
2 , 50, 2021-05-01, 2021-05-31
2 , 50, 2021-06-01, 2021-06-30
So for acct 1, I want n number of rows to be inserted from last period of 2020-05-31 up to previous month end which is now 06-30-2021 and I want the amt and acct to remain same. So it would look like this below:
acct,amt,begin_date,end_date
1 , 10, 2020-04-01, 2020-04-30
1 , 10, 2020-05-01, 2020-05-31
1 , 10, 2020-06-01, 2020-06-30
1 , 10, 2020-07-01, 2020-07-31
.............................
.............................
1 , 10, 2021-06-01, 2021-06-30
Based on some data anomalies, I realize I need to add another condition to the solution. Suppose another column, type, was added to table1, so that acct and type together form the composite key that identifies each related row; hence acct 2 type A and acct 2 type B are not related. So we have the updated table:
acct,type,amt,begin_date,end_date
1, A, 10, 2020-04-01, 2020-04-30
1, A, 10, 2020-05-01, 2020-05-31
2, A, 50, 2021-05-01, 2021-05-31
2, A, 50, 2021-06-01, 2021-06-30
2, B, 50, 2021-01-01, 2021-01-31
2, B, 50, 2021-02-01, 2021-02-28
I would now need dummy rows to be created for acct 2 type B up to 2021-06-30. We already know acct 2 type A would be ok since it already has rows up to 2021-06-30.
You can generate the rows using a recursive CTE:
with cte as (
select acct, amt,
dateadd(day, 1, end_date) as begin_date,
eomonth(dateadd(day, 1, end_date)) as end_date
from (select t.*,
row_number() over (partition by acct order by end_date desc) as seqnum
from t
) t
where seqnum = 1 and end_date < '2021-06-30'
union all
select acct, amt, dateadd(month, 1, begin_date),
eomonth(dateadd(month, 1, begin_date))
from cte
where begin_date < '2021-06-01'
)
select *
from cte;
You can then use insert to insert these rows into a table. Or use union all if you simply want a result set with all the rows.
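The query above partitions by acct only and hard-codes the end date. As a hedged sketch of how it might be adapted to the updated requirement, the variant below partitions by acct and type and derives the previous month end from the current date. It assumes the table is named table1, that begin_date and end_date are of type date, and that every existing period ends on a month end; `@lastMonthEnd` is a name introduced here for illustration. `eomonth` requires SQL Server 2012 or later.

```sql
-- Previous month end relative to today (assumption: this is the desired cutoff).
declare @lastMonthEnd date = eomonth(dateadd(month, -1, getdate()));

with cte as (
    -- anchor: the month following the latest existing period of each acct/type
    select acct, [type], amt,
           dateadd(day, 1, end_date) as begin_date,
           eomonth(dateadd(day, 1, end_date)) as end_date
    from (select t.*,
                 row_number() over (partition by acct, [type]
                                    order by end_date desc) as seqnum
          from table1 t
         ) t
    where seqnum = 1 and end_date < @lastMonthEnd
    union all
    -- recursive step: advance one month while still inside the cutoff month
    select acct, [type], amt,
           dateadd(month, 1, begin_date),
           eomonth(dateadd(month, 1, begin_date))
    from cte
    where dateadd(month, 1, begin_date) < @lastMonthEnd
)
insert into table1 (acct, [type], amt, begin_date, end_date)
select acct, [type], amt, begin_date, end_date
from cte
option (maxrecursion 0);
```

The `maxrecursion 0` hint lifts the default limit of 100 recursion levels, which only matters if an account is more than 100 months behind.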
Here is a db<>fiddle.

Finding the Datediff between Records in same Table

IP QID ScanDate Rank
101.110.32.80 6 2016-09-28 18:33:21.000 3
101.110.32.80 6 2016-08-28 18:33:21.000 2
101.110.32.80 6 2016-05-30 00:30:33.000 1
I have a table with certain records, grouped by IP address and QID. My requirement is to find out which record broke the sequence in the date column, in other words where the date difference from the previous record is more than 30 days. In the above table, the date difference between rank 1 and rank 2 is more than 30 days, so I should flag the rank 2 record.
You can use LAG in Sql 2012+
declare @Tbl Table (Ip VARCHAR(50), QID INT, ScanDate DATETIME,[Rank] INT)
INSERT INTO @Tbl
VALUES
('101.110.32.80', 6, '2016-09-28 18:33:21.000', 3),
('101.110.32.80', 6, '2016-08-28 18:33:21.000', 2),
('101.110.32.80', 6, '2016-05-30 00:30:33.000', 1)
;WITH Result
AS
(
    SELECT
        T.Ip ,
        T.QID ,
        T.ScanDate ,
        T.[Rank],
        LAG(T.[Rank]) OVER (ORDER BY T.[Rank]) PrivSRank,
        LAG(T.ScanDate) OVER (ORDER BY T.[Rank]) PrivScanDate
    FROM
        @Tbl T
)
SELECT
    R.Ip ,
    R.QID ,
    R.ScanDate ,
    R.Rank ,
    R.PrivScanDate,
    IIF(DATEDIFF(DAY, R.PrivScanDate, R.ScanDate) > 30, 'This is greater than 30 day. Rank ' + CAST(R.PrivSRank AS VARCHAR(10)), '') CFlag
FROM
    Result R
Result:
Ip QID ScanDate Rank CFlag
------------------------ ----------- ----------------------- ----------- --------------------------------------------
101.110.32.80 6 2016-05-30 00:30:33.000 1
101.110.32.80 6 2016-08-28 18:33:21.000 2 This is greater than 30 day. Rank 1
101.110.32.80 6 2016-09-28 18:33:21.000 3 This is greater than 30 day. Rank 2
While Window Functions could be used here, I think a self join might be more straight forward and easier to understand:
SELECT
t1.IP,
t1.QID,
t1.Rank,
t1.ScanDate as endScanDate,
t2.ScanDate as beginScanDate,
datediff(day, t2.scandate, t1.scandate) as scanDateDays
FROM
MyTable as t1
INNER JOIN MyTable as t2 ON
t1.ip = t2.ip
AND t1.rank - 1 = t2.rank --get the record from t2 that is one less in rank
WHERE datediff(day, t2.scandate, t1.scandate) > 30 --only records greater than 30 days
It's pretty self-explanatory. We are joining the table to itself and joining the ranks together where rank 2 gets joined to rank 1, rank 3 gets joined to rank 2, and so on. Then we just test for records that are greater than 30 days using the datediff function.
I would use a window function to avoid the self join, which in many cases will perform better.
WITH cte
AS (
SELECT
t.IP
, t.QID
, LAG(t.ScanDate) OVER (PARTITION BY t.IP ORDER BY T.ScanDate) AS beginScanDate
, t.ScanDate AS endScanDate
, DATEDIFF(DAY,
LAG(t.ScanDate) OVER (PARTITION BY t.IP ORDER BY t.ScanDate),
t.ScanDate) AS Diff
FROM
MyTable AS t
)
SELECT
*
FROM
cte c
WHERE
Diff > 30;

Sql Server Rank on Value Range

I have a table with three columns, ID, Date, Value. I want to rank the rows such that, within an ID, the Ranking goes up with each date where Value is at least X, otherwise, Ranking stays the same.
Given ID, Date, and Values like these
1, 6/1, 8
1, 6/2, 12
1, 6/3, 14
1, 6/4, 9
1, 6/5, 11
I would like to return a ranking based on values of at least 10, such that I would have ID, Date, Value, and Rank like this:
1, 6/1, 8, 0
1, 6/2, 12, 1
1, 6/3, 14, 2
1, 6/4, 9, 2
1, 6/5, 11, 3
In other words, the ranking increases each time the value reaches the threshold; otherwise it stays the same.
What I have tried is
SELECT T1.*, X.Ranking FROM TABLE T1
LEFT JOIN ( SELECT *, DENSE_RANK( ) OVER ( PARTITION BY T2.ID ORDER BY T2.DATE ) Ranking
FROM TABLE T2 WHERE T2.VALUE >= 10 ) X
ON T1.ID = X.ID AND T1.Date = X.Date
This almost works. It gets me output like
1, 6/1, 8, NULL
1, 6/2, 12, 1
1, 6/3, 14, 2
1, 6/4, 9, NULL
1, 6/5, 11, 3
Then, I want to turn the first NULL into a 0, and the second into a 2.
I turned the above query into a cte and tried
SELECT T1.*, CASE WHEN T1.Ranking IS NULL THEN ISNULL( (
SELECT MAX( T2.Ranking )
FROM cte T2 WHERE T1.ID = T2.ID AND T1.Date > T2.Date ), 0 )
ELSE T1.Ranking END NewRanking
FROM cte T1
This looks like it would work, but my table has 200,000 rows and the query ran for 25 minutes... So, I'm looking for something a little more out of the box than the SELECT MAX.
You are using SQL Server 2012, so you can do a cumulative sum:
select t.*,
sum(case when value >= 10 then 1 else 0 end) over
(partition by id order by date) as ranking
from table t;
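As a quick check, the cumulative sum can be run against the sample rows from the question. This is a minimal sketch: the table variable `@t` and the year 2015 are assumptions made only so the dates are valid, and the explicit `rows unbounded preceding` frame is an optional addition that is not part of the query above.

```sql
declare @t table (id int, [date] date, value int);

insert into @t values
    (1, '2015-06-01', 8),
    (1, '2015-06-02', 12),
    (1, '2015-06-03', 14),
    (1, '2015-06-04', 9),
    (1, '2015-06-05', 11);

-- Each row adds 1 to the running total when value >= 10, otherwise 0,
-- so the ranking only increases on qualifying rows.
select t.*,
       sum(case when t.value >= 10 then 1 else 0 end)
           over (partition by t.id
                 order by t.[date]
                 rows unbounded preceding) as ranking
from @t t
order by t.id, t.[date];
-- ranking column: 0, 1, 2, 2, 3
```

This needs a single pass over each id partition, unlike the correlated SELECT MAX subquery, which rescans earlier rows for every row.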
EDIT: This actually does not work. In spirit, it fetches the previous LAG value and increments it, but this is not how LAG works: it would be 'recursive' in essence, which results in a 'my_rank' is undefined error. The better solution is the accepted answer based on a cumulative sum.
If you have SQL Server 2012 (you didn't tag your question), you can do something like:
SELECT
LAG(my_rank, 1, 0) OVER (ORDER BY DATE)
+ CASE WHEN VALUE >= 10 THEN 1 ELSE 0 END AS my_rank
FROM T1

How to calculate overlapping subscription days from orders with sql-server

I have an order table with orders. I want to calculate the number of subscription days for each user (preferably in a set-based way) for a specific day.
create table #orders (orderid int, userid int, subscriptiondays int, orderdate date)
insert into #orders
select 1, 2, 10, '2011-01-01'
union
select 2, 1, 10, '2011-01-10'
union
select 3, 1, 10, '2011-01-15'
union
select 4, 2, 10, '2011-01-15'
declare @currentdate date = '2011-01-20'
--userid 1 is expected to have 10 subscriptiondays left
--(since there are 5 left when the second order is placed)
--userid 2 is expected to have 5 subscriptiondays left
I'm sure this has been done before, I just don't know what to search for.
Pretty much like a running total?
So when I set @currentdate to '2011-01-20' I want this result:
userid subscriptiondays
1 10
2 5
When I set @currentdate to '2011-01-25':
userid subscriptiondays
1 5
2 0
When I set @currentdate to '2011-01-11':
userid subscriptiondays
1 9
2 0
Thanks!
I think you would need to use a recursive common table expression.
EDIT: I've also added a procedural implementation further below instead of using a recursive common table expression. I recommend using that procedural approach, as I think there may be a number of data scenarios that the recursive CTE query that I've included probably doesn't handle.
The query below gives the correct answers for the scenarios that you've provided, but you would probably want to think up some additional complex scenarios and see whether there are any bugs.
For instance, I have a feeling that this query may break down if you have multiple previous orders overlapping with a later order.
with CurrentOrders (UserId, SubscriptionDays, StartDate, EndDate) as
(
    select
        userid,
        sum(subscriptiondays),
        min(orderdate),
        dateadd(day, sum(subscriptiondays), min(orderdate))
    from #orders
    where
        #orders.orderdate <= @currentdate
        -- start with the latest order(s)
        and not exists (
            select 1
            from #orders o2
            where
                o2.userid = #orders.userid
                and o2.orderdate <= @currentdate
                and o2.orderdate > #orders.orderdate
        )
    group by
        userid
    union all
    select
        #orders.userid,
        #orders.subscriptiondays,
        #orders.orderdate,
        dateadd(day, #orders.subscriptiondays, #orders.orderdate)
    from #orders
    -- join any overlapping orders
    inner join CurrentOrders on
        #orders.userid = CurrentOrders.UserId
        and #orders.orderdate < CurrentOrders.StartDate
        and dateadd(day, #orders.subscriptiondays, #orders.orderdate) > CurrentOrders.StartDate
)
select
    UserId,
    sum(SubscriptionDays) as TotalSubscriptionDays,
    min(StartDate),
    sum(SubscriptionDays) - datediff(day, min(StartDate), @currentdate) as RemainingSubscriptionDays
from CurrentOrders
group by
    UserId
;
Philip mentioned a concern about the recursion limit on common table expressions. Below is a procedural alternative using a table variable and a while loop, which I believe accomplishes the same thing.
While I've verified that this alternative code does work, at least for the sample data provided, I'd be glad to hear anyone's comments on this approach. Good idea? Bad idea? Any concerns to be aware of?
declare @ModifiedRows int
declare @CurrentOrders table
(
    UserId int not null,
    SubscriptionDays int not null,
    StartDate date not null,
    EndDate date not null
)
insert into @CurrentOrders
select
    userid,
    sum(subscriptiondays),
    min(orderdate),
    min(dateadd(day, subscriptiondays, orderdate))
from #orders
where
    #orders.orderdate <= @currentdate
    -- start with the latest order(s)
    and not exists (
        select 1
        from #orders o2
        where
            o2.userid = #orders.userid
            and o2.orderdate <= @currentdate
            -- there does not exist any other order that surpasses it
            and dateadd(day, o2.subscriptiondays, o2.orderdate) > dateadd(day, #orders.subscriptiondays, #orders.orderdate)
    )
group by
    userid
set @ModifiedRows = @@ROWCOUNT
-- perform an extra update here in case there are any additional orders that were made after the start date but before the specified @currentdate
update co set
    co.SubscriptionDays = co.SubscriptionDays + #orders.subscriptiondays
from @CurrentOrders co
inner join #orders on
    #orders.userid = co.UserId
    and #orders.orderdate <= @currentdate
    and #orders.orderdate >= co.StartDate
    and dateadd(day, #orders.subscriptiondays, #orders.orderdate) < co.EndDate
-- Keep attempting to update rows as long as rows were updated on the previous attempt
while(@ModifiedRows > 0)
begin
    update co set
        SubscriptionDays = co.SubscriptionDays + overlap.subscriptiondays,
        StartDate = overlap.orderdate
    from @CurrentOrders co
    -- join any overlapping orders
    inner join (
        select
            #orders.userid,
            sum(#orders.subscriptiondays) as subscriptiondays,
            min(orderdate) as orderdate
        from #orders
        inner join @CurrentOrders co2 on
            #orders.userid = co2.UserId
            and #orders.orderdate < co2.StartDate
            and dateadd(day, #orders.subscriptiondays, #orders.orderdate) > co2.StartDate
        group by
            #orders.userid
    ) overlap on
        overlap.userid = co.UserId
    set @ModifiedRows = @@ROWCOUNT
end
select
    UserId,
    sum(SubscriptionDays) as TotalSubscriptionDays,
    min(StartDate),
    sum(SubscriptionDays) - datediff(day, min(StartDate), @currentdate) as RemainingSubscriptionDays
from @CurrentOrders
group by
    UserId
EDIT2: I've made some adjustments to the code above to address various special cases, such as if there just happen to be two orders for a user that both end on the same date.
For instance, changing the setup data to the following caused issues with the original code, which I've now corrected:
insert into #orders
select 1, 2, 10, '2011-01-01'
union
select 2, 1, 10, '2011-01-10'
union
select 3, 1, 10, '2011-01-15'
union
select 4, 2, 6, '2011-01-15'
union
select 5, 2, 4, '2011-01-17'
EDIT3: I've made some additional adjustments to address other special cases. In particular, the previous code ran into issues with the following setup data, which I've now corrected:
insert into #orders
select 1, 2, 10, '2011-01-01'
union
select 2, 1, 6, '2011-01-10'
union
select 3, 1, 10, '2011-01-15'
union
select 4, 2, 10, '2011-01-15'
union
select 5, 1, 4, '2011-01-12'
If my clarifying comment/question is correct, then you want to use DATEDIFF:
DATEDIFF(dd, orderdate, @currentdate)
My interpretation of the problem:
On day X, customer buys a “span” of subscription days (i.e. good for N days)
The span starts on the day of purchase and is good for X through day X + (N - 1)... but see below
If customer purchases a second span after the first expires (or any new span after all existing spans expire), repeat process. (A single 10-day purchase 30 days ago has no impact on a second purchase made today.)
If customer purchases a span while existing span(s) are still in effect, the new span applies to day immediately after end of current span(s) through that date + (N – 1)
This is iterative. If customer buys 10-day spans on Jan 1st, Jan 2nd, and Jan 3rd, it would look something like:
As of 1st: Jan 1 – Jan 10
As of 2nd: Jan 1 – Jan 10, Jan 11 – Jan 20 (in effect, Jan 1 to Jan 20)
As of 3rd: Jan 1 – Jan 10, Jan 11 – Jan 20, Jan 21 – Jan 30 (in effect, Jan 1 to Jan 30)
If this is indeed the problem, then it is a horrible problem to solve in T-SQL. To determine the “effective span” of a given purchase, you have to calculate the effective span of all prior purchases in the order that they were purchased, because of that overall cumulative effect. This is a trivial problem with 1 user and 3 rows, but non-trivial with thousands of users with dozens of purchases (which, presumably, is what you want).
I would solve it like so:
Add column EffectiveDate of datatype date to the table
Build a one-time process to walk through every row user-by-user and orderdate by orderdate, and calculate the EffectiveDate as discussed above
Modify the process used to insert the data to calculate the EffectiveDate at the time a new entry is made. Done this way, you’d only ever have to reference the most recent purchase made by that user.
Wrangle out subsequent issues regarding deleting (cancelled?) or updating (mis-set?) orders
I may be wrong, but I don't see any way to address this using set-based tactics. (Recursive CTEs and the like would work, but they can only recurse to so many levels, and we don't know the limit for this problem -- let alone how often you'll need to run it, or how well it must perform.) I'll watch and upvote anyone who solves this without recursion!
And of course this only applies if my understanding of the problem is correct. If not, please disregard.
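Under the interpretation described above (a new span starts at the later of its order date and the end of the span already in effect), the row-by-row walk can be sketched with a cursor. This is an illustration of that reading only, not code from the answer: the table variables `@orders` and `@effective` and the cursor name `c` are hypothetical, and `@orders` stands in for the question's `#orders` so the block is self-contained.

```sql
declare @orders table (orderid int, userid int, subscriptiondays int, orderdate date);
insert into @orders values
    (1, 2, 10, '2011-01-01'),
    (2, 1, 10, '2011-01-10'),
    (3, 1, 10, '2011-01-15'),
    (4, 2, 10, '2011-01-15');

declare @currentdate date = '2011-01-20';

-- One row per user: the date on which the accumulated subscription runs out.
declare @effective table (userid int primary key, enddate date);
declare @userid int, @days int, @orderdate date;

declare c cursor local fast_forward for
    select userid, subscriptiondays, orderdate
    from @orders
    where orderdate <= @currentdate
    order by userid, orderdate, orderid;

open c;
fetch next from c into @userid, @days, @orderdate;
while @@fetch_status = 0
begin
    -- Extend from the current end date if it is still in the future at order
    -- time; otherwise the new span starts on the order date.
    update @effective
    set enddate = dateadd(day, @days,
                          case when enddate > @orderdate then enddate else @orderdate end)
    where userid = @userid;

    if @@rowcount = 0
        insert into @effective values (@userid, dateadd(day, @days, @orderdate));

    fetch next from c into @userid, @days, @orderdate;
end
close c;
deallocate c;

select userid,
       case when enddate > @currentdate
            then datediff(day, @currentdate, enddate)
            else 0
       end as subscriptiondays
from @effective
order by userid;
-- for @currentdate = '2011-01-20': userid 1 -> 10, userid 2 -> 5
```

Setting `@currentdate` to '2011-01-25' or '2011-01-11' reproduces the other two expected results from the question (5 and 0, then 9 and 0). A cursor scales linearly with the number of orders but is not set-based, which is the trade-off the answer already points out.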
In fact, we need to calculate the sum of subscriptiondays minus the days between the first subscription date and @currentdate, like this:
select o.userid,
sum(o.subscriptiondays) -
datediff(dd,
(select min(a.orderdate)
from #orders as a
where a.userid = o.userid), @currentdate)
from #orders as o
where o.orderdate <= @currentdate
group by o.userid
