Tsql getting depending data over multiple rows in one query - sql-server

I would like to calculate an average of a value in one year. I have a historical data table that saves the changes of the value in time.
I know how to do this with a (sub)query for each individual month, but I'm hopeful that there is a simple way to do it in one query.
Example:
ID, Value, DateUntilActivity
1, 10.00, 2014-03-01
2, 5.00, 2014-05-01
3, 3.00, 2014-07-01
4, 12.00, 2014-10-01
So - the correct calculation here is:
(2x10.00 + 2x5.00 + 2x3.00 + 3x12.00 + 3x<current_value_in_a_different_table>)/12
The calculation includes the number of months each value was active for. The first value, 10.00, was valid for 2 months: January and February.
And consider the value current_value_in_a_different_table a fixed value.
Also, it needs to work on MSSQL server 2005.
Thank you in advance!

;with cte as
(
select value, DateUntilActivity from yourtable
union
select 100 as currentvalue, '2015-1-1' from yourothertable
)
select avg(value)
from
(
select (select top 1 value from cte where DateUntilActivity>DATEADD(MONTH,number, '2014-1-1') order by DateUntilActivity ) as value
from master..spt_values
where type='p' and number <=11
) v
If my memory is wrong and you can't use a CTE, this is equivalent to
select avg(value)
from
(
select
(select top 1 value
from
(
select value, DateUntilActivity from yourtable
union
select 100 as currentvalue, '2015-1-1' from yourothertable
) v
where DateUntilActivity>DATEADD(MONTH,number, '2014-1-1') order by DateUntilActivity ) as value
from master..spt_values
where type='p' and number <=11
) v
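The month-by-month lookup that both queries perform can be checked with a small Python sketch. It assumes the question's four history rows and, like the answer, a placeholder current value of 100 that stays valid until 2015-01-01; the names `history`, `CURRENT_VALUE` and `yearly_average` are illustrative only.

```python
from datetime import date

# History rows from the question: (value, date_until_activity).
history = [
    (10.00, date(2014, 3, 1)),
    (5.00, date(2014, 5, 1)),
    (3.00, date(2014, 7, 1)),
    (12.00, date(2014, 10, 1)),
]
CURRENT_VALUE = 100.00  # assumed placeholder, same as the answer's literal 100


def yearly_average(history, current_value, year):
    # Append the current value as a row valid until the start of next year,
    # which is what the UNION in the query does.
    rows = sorted(history + [(current_value, date(year + 1, 1, 1))],
                  key=lambda r: r[1])
    monthly = []
    for month in range(1, 13):
        month_start = date(year, month, 1)
        # Equivalent of: top 1 ... where DateUntilActivity > month_start
        # order by DateUntilActivity
        monthly.append(next(v for v, until in rows if until > month_start))
    return sum(monthly) / len(monthly)


average_2014 = yearly_average(history, CURRENT_VALUE, 2014)
print(average_2014)  # (2*10 + 2*5 + 2*3 + 3*12 + 3*100) / 12
```

Each month picks the first row whose end date lies after the month's first day, so a value is counted once per month it was active.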

Related

Creating rows in a table based on min and max date in Snowflake SQL

Is there a relatively simple way to create rows in a table based on a range of dates?
For example; given:
ID, Date_min, Date_max
1, 2022-02-01, 2022-20-05
2, 2022-02-09, 2022-02-12
I want to output:
ID, Date_in_Range
1, 2022-02-01
1, 2022-02-02
1, 2022-02-03
1, 2022-02-04
1, 2022-02-05
2, 2022-02-09
2, 2022-02-10
2, 2022-02-11
2, 2022-02-12
I saw a solution when the range is integer based (How to create rows based on the range of all values between min and max in Snowflake (SQL)?)
But in order to use that approach GENERATOR(ROWCOUNT => 1000) I have to convert my dates to integers and back, and it just gets very messy very quick, especially since I need to apply this to millions of rows.
So, I was wondering if there is a simpler way to do it when dealing with dates instead of integers? Any hints anyone can provide?
Another one without using generator -
with data (ID,Date_min,Date_max) as (
select * from values
(1,to_date('2022-02-01','YYYY-DD-MM'),to_date('2022-20-05','YYYY-DD-MM')),
(2,to_date('2022-02-09','YYYY-DD-MM'),to_date('2022-02-12','YYYY-DD-MM'))
)
select id,
Date_min,
Date_max,
dateadd(day, index, Date_min) day_slots from data,
table(split_to_table(repeat(',',datediff(day, Date_min, Date_max)-1),','));
SQL with first date -
with data (ID,Date_min,Date_max) as (
select * from values
(1,to_date('2022-02-01','YYYY-DD-MM'),to_date('2022-20-05','YYYY-DD-MM')),
(2,to_date('2022-02-09','YYYY-DD-MM'),to_date('2022-02-12','YYYY-DD-MM'))
)
select id,
dateadd(month, index-1, Date_min) day_slots from data,
table(split_to_table(repeat(',',datediff(month, Date_min, Date_max)),','));
Quoting the question: "But in order to use that approach GENERATOR(ROWCOUNT => 1000) I have to convert my dates to integers and back, and it just gets very messy very quick, especially since I need to apply this to millions of rows."
There is no need to convert date to int back and forth, just simple DATEADD('day', num, start_date)
Pseudocode:
WITH sample_data(id, date_min, date_max) AS (
SELECT 1, '2022-02-01'::DATE, '2022-02-05'::DATE
UNION
SELECT 2, '2022-02-09'::DATE, '2022-02-12'::DATE
) , numbers AS (
SELECT ROW_NUMBER() OVER(ORDER BY SEQ4())-1 AS num -- 0 based
FROM TABLE(GENERATOR(ROWCOUNT => 1000)) -- should match max anticipated span
)
SELECT s.id, DATEADD(DAY, n.num, s.date_min) AS calculated_date
FROM sample_data AS s
JOIN numbers AS n
ON DATEADD('DAY', n.num, s.date_min) BETWEEN s.date_min AND s.date_max
ORDER BY s.id, calculated_date;
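The idea of adding a 0-based offset to the start date, without any date-to-integer conversion, can be sketched in Python as follows. The sample ranges match the CTE above (so the first range is assumed to end on 2022-02-05), and `expand_range` is a hypothetical helper name.

```python
from datetime import date, timedelta


def expand_range(row_id, date_min, date_max):
    """Return one (id, date) pair per day from date_min to date_max inclusive."""
    span = (date_max - date_min).days
    # Offsets 0..span play the role of n.num in DATEADD(DAY, n.num, date_min).
    return [(row_id, date_min + timedelta(days=n)) for n in range(span + 1)]


ranges = [
    (1, date(2022, 2, 1), date(2022, 2, 5)),
    (2, date(2022, 2, 9), date(2022, 2, 12)),
]
expanded = [row for r in ranges for row in expand_range(*r)]

for row_id, day in expanded:
    print(row_id, day.isoformat())
```

Unlike the SQL version, the sketch sizes the offset list per row, so there is no fixed upper bound comparable to ROWCOUNT => 1000.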

SQL Query to list all hours of the day in datetime format in one column

I need a query that returns all the hours of the day in 12 hour format
ex: 12:00 am, 1:00am, 2:00am etc. This is going to be used in SSRS as a selection field for a parameter for time. I need to select records within a date range and then from a time range in that date range. I have this query which returns the time in 24 hour format but it is not working properly in SSRS:
With CTE(N)
AS
(
SELECT 0
UNION ALL
SELECT N+30
FROM CTE
WHERE N+5<24*60
)
SELECT CONVERT(TIME,DATEADD(minute,N,0) ,108)
FROM CTE
OPTION (MAXRECURSION 0)
This is how I would do it:
DECLARE @t time(1) = '00:00'; --I use 1 as when I use REPLACE later it means that I can "identify" the correct :00 to remove
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
SELECT TOP 24 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1, N N2),
Times AS(
SELECT DATEADD(HOUR, I,@t) AS [Time]
FROM Tally)
SELECT T.[Time],
REPLACE(CONVERT(varchar(12),T.Time,9),':00.0',' ') AS TimeString
FROM Times T
ORDER BY T.[Time] ASC;
Note that I return both a time and varchar datatype; both are important as the ordering of the data for a varchar would be quite different to start with and if you are using SSRS, I suspect you want the value of TimeString as a presentation thing and not the actual value.

SQL Server : find Cust with Continuous Enrollment

My task is a well-known problem in the industry: identify the CustIDs that have continuous activity over a given period of time, where short breaks between contracts are allowed.
I did the first part by populating a matrix table, as in the snippet below, covering the whole period and setting a flag if the customer is active on each date. I think this is the only reliable way to do it, as contracts can overlap, etc.
Now I need to derive a 1/0 flag for continuous activity per CustID, and I'm stuck on how to track it. In my example there is a 3-day break, which is OK, but I need to make sure that the break days are consecutive.
Do you have any good ideas for doing this cleanly? I'd appreciate your help and leads. I saw some examples, but they were done in SAS, so they are hard to understand.
declare @maxBreak int = 3 -- 3 days max allowed for continuous contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
;with matrix_dd as
(
select *
from
(select 111 CustID, '2015-1-11' dd, 1 Active union
select 111 CustID, '2015-1-12' dd, 0 Active union
select 111 CustID, '2015-1-13' dd, 0 Active union
select 111 CustID, '2015-1-14' dd, 0 Active union
select 111 CustID, '2015-1-15' dd, 1 Active union
select 111 CustID, '2015-1-16' dd, 1 Active union
select 111 CustID, '2015-1-17' dd, 1 Active union
select 111 CustID, '2015-1-18' dd, 1 Active union
select 111 CustID, '2015-1-19' dd, 0 Active union
select 111 CustID, '2015-1-20' dd, 0 Active) a
)
select *
from matrix_dd
Best
M
This solution calculates the active ranges and how long of a break it's been since the last interval ended:
declare @maxBreak int = 3 -- 3 days max allowed for continuous contract
declare @PeriodStart date = '2015-1-11', @PeriodEnd date = '2015-1-19';
with matrix_dd as
(
select * from ( values
(111, '2015-1-11', 1 ),
(111, '2015-1-12', 0 ),
(111, '2015-1-13', 0 ),
(111, '2015-1-14', 0 ),
(111, '2015-1-15', 1 ),
(111, '2015-1-16', 1 ),
(111, '2015-1-17', 1 ),
(111, '2015-1-18', 1 ),
(111, '2015-1-19', 0 ),
(111, '2015-1-20', 0 )
) as x(CustID, dd, Active)
), active_with_groups as (
select *,
row_number() over (partition by CustID order by dd) -
datediff(day, '2000-01-01', dd) as gid
from matrix_dd
where active = 1
and dd between @PeriodStart and @PeriodEnd
), islands as (
select CustId, min(dd) as islandStart, max(dd) as islandEnd
from active_with_groups
group by CustID, gid
), islands_with_gaps as (
select *,
datediff(
day,
lag(islandEnd, 1, islandStart)
over (partition by CustID order by islandStart),
islandStart
) - 1 as [break]
from islands
)
select *
from islands_with_gaps
where [break] >= @maxBreak
order by islandStart
Let's break it down. In the "active_with_groups" common table expression (CTE), all I'm doing is converting the dates into integers that have the same relationship by using datediff(). Why? Integers are easier to work with for this problem. Note that I'm also using row_number() to get a contiguous sequence and then getting the difference between that and the datediff() value. The key observation is that if the days also don't go up contiguously, that difference will be, well, different. Likewise, if the dates do go up contiguously, then the difference will be the same. Therefore, we can use this value as a group identifier for values that are in a contiguous range.
Next, we use that group identifier to group by (bet you didn't see that coming!). This gives us the start and end of each interval. Nothing very clever is going on here.
The next step is to calculate the amount of time that's passed between when the last interval ended and the current one began. For this, we use a simple call to the lag() function. The only thing to note here is that I've chosen to have the lag() function emit a default value of islandStart in the case of the first interval. It could have just as easily been no default (which would have then caused it to emit a NULL value).
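Lastly, we look for intervals whose gap reaches the specified threshold. Note that the query compares with >=, so a break of exactly the maximum length is also returned; use > if a break of that length is acceptable.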
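The grouping trick described above (row number minus day number is constant within a run of consecutive days) can be traced with a short Python sketch. It uses the ten sample rows for CustID 111 and the same period bounds; all variable names are illustrative, and the first island gets a break of -1 just as the lag() default produces in the query.

```python
from datetime import date
from itertools import groupby

# Sample matrix for one customer: (day, active flag), 2015-01-11 .. 2015-01-20.
flags = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
rows = [(date(2015, 1, d), a) for d, a in zip(range(11, 21), flags)]
period_start, period_end = date(2015, 1, 11), date(2015, 1, 19)
epoch = date(2000, 1, 1)  # arbitrary fixed date, as in datediff(day, '2000-01-01', dd)

active_days = sorted(d for d, a in rows
                     if a == 1 and period_start <= d <= period_end)

# row_number() - datediff(...) stays the same while the days are consecutive.
keyed = [(i - (d - epoch).days, d) for i, d in enumerate(active_days, start=1)]

islands = []
for _, grp in groupby(keyed, key=lambda kv: kv[0]):
    days = [d for _, d in grp]
    islands.append((days[0], days[-1]))

# Break before each island: days between the previous island's end and this start.
breaks = []
prev_end = None
for start, end in islands:
    ref = start if prev_end is None else prev_end  # lag(islandEnd, 1, islandStart)
    breaks.append((start - ref).days - 1)
    prev_end = end

print(islands)
print(breaks)
```

For the sample data this yields two islands (2015-01-11 alone, then 2015-01-15 to 2015-01-18) separated by a 3-day break.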
Similar to Ben's answer. I'm assuming that all your dates are represented in the data. So really we just need to make sure there isn't a run of zeroes longer than 3.
with inactive_runs as (
select
CustID,
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where 3 >= all (
select count(*) from inactive_runs ir
where ir.CustID = m.CustID
group by grp
);
http://rextester.com/AHI22250
Using all isn't particularly common. Here's an alternative:
...
with inactive_runs as (
select
CustID, dd, /* <-- had to add dd */
row_number() over (partition by CustID order by dd)
- datediff(day, min(dd) over (partition by CustID), dd) as grp
from matrix_dd
where Active = 0
)
select distinct CustID from matrix_dd m
where not exists (
select 1 from inactive_runs ir
where ir.CustID = m.CustID
group by grp
having datediff(day, min(dd), max(dd)) > 2
);
I glanced at your comment above. I think it confirms my suspicion that you've got a single row for every date. If you've got a new version of SQL Server you can just sum over the previous three rows. Unfortunately you wouldn't be able to use a variable for the window size if the length is variable:
with cust as (
select
CustID,
case when
sum(case when Active = 0 then 1 end) over (
partition by CustID
order by dd
rows between 3 preceding and current row
) = 4 then 1
end as isBrk
from matrix_dd
)
select CustID
from cust
group by CustID
having count(isBrk) = 0;
Edit:
Based on your comment with the data in a "pre-matrix" format, yes, that's a simpler query. At that point you're just looking at the previous end date and the current row's start date.
with data as (
select * from (
values (111, 1230, '2014-12-11', '2015-01-11'),
(111, 1231, '2015-01-15', '2015-01-18'),
(111, 1232, '2015-03-22', '2015-04-01')
) as t (CustID, ContractID, StartDD, EndDD)
), gaps as (
select
CustID,
datediff(day,
lag(EndDD, 1, StartDD) over (partition by CustID order by StartDD),
StartDD
) as days
from data
)
select CustID
from gaps
group by CustID
having max(days) <= 3;

How to auto-add dates in column, SQL Server 2014

I'm very new to SQL Server and I want to have the dates from today up to 30 days ahead of today's date in one column. Which way is considered the most efficient and "correct"? (I'm not asking for code.)
I read that loops should preferably be avoided in SQL Server; is that correct? Also, I thought of solving the date issue with a logon trigger (adding 30 days ahead of today whenever a logon happens). Does anyone know a more efficient and "correct" way?
Thanks
You can use a recursive CTE to get sequential dates for the next 30 days.
CREATE TABLE Dates
(
allDates DATE
)
;WITH MyCTE
AS (SELECT getdate() AS ddate,
dateadd(day, 30, getdate()) AS lastDate
UNION ALL
SELECT dateadd(day, 1, ddate),
lastDate
FROM MyCTE
WHERE dateadd(day, 1, ddate) <= lastDate)
INSERT INTO Dates(allDates)
SELECT ddate FROM MyCTE
SELECT * FROM Dates
The most efficient way to do this would be a Job. SQL Server Agent provides the ability to run any script you want on any interval you choose. A very simplistic approach would be to create a job which runs nightly and inserts a row for [Today + 30 Days].
I believe you are seeking 30 rows from a query with each row representing a date starting at today, and finishing 30 days after today.
There are many potential solutions for this that don't use a cursor/loop, for example
select
dateadd(day,nums.number,nums.today) as a_date
from (
select
number
, cast(getdate() as date) as today
FROM master.dbo.spt_values as sv
WHERE sv.type = 'P'
AND sv.number BETWEEN 0 and 29
) nums
Note that this query uses master.dbo.spt_values, which some prefer not to use. Instead, you could use a small union all with a cross join to generate the rows, or a recursive "common table expression" (CTE) as an alternative.
;WITH
Digits AS (
SELECT 0 AS digit UNION ALL
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
, Tally AS (
SELECT [tens].digit * 10 + [ones].digit AS number
FROM Digits [ones]
CROSS JOIN Digits [tens]
)
select
dateadd(day,nums.number,nums.today) as a_date
from (
select
number
, cast(getdate() as date) as today
FROM tally
WHERE number BETWEEN 0 and 29
) nums
To get today's date + 30 days do this:
select dateadd(dd,30,getdate())

Group data rows by near time

Here is the problem I am facing:
I have a large table, and I want to group its rows by nearness in time; more specifically, rows whose time difference is less than 2 minutes should end up in the same group.
With the following input data:
A 16:01:01
B 16:01:20
C 16:14:02
D 16:15:01
E 16:20:02
the expected result is
16:01:01 2
16:14:02 2
16:20:02 1
If you're using SQL Server 2012, you're in luck: you can use the lag function and a running sum:
with cte as (
select
case
when datediff(mi, lag(data) over (order by data), data) <= 1 then 0
else 1
end as ch,
data
from test
), cte2 as (
select
data, sum(ch) over (order by data) as grp
from cte
)
select
min(data) as data, count(*) as cn
from cte2
group by grp
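To see what the lag-plus-running-sum query does, here is a rough Python trace over the question's five sample times. It mirrors datediff(mi, ...) by counting minute boundaries rather than elapsed seconds; `minute_of_day` and the other names are illustrative.

```python
from datetime import time

data = [time(16, 1, 1), time(16, 1, 20), time(16, 14, 2),
        time(16, 15, 1), time(16, 20, 2)]


def minute_of_day(t):
    # datediff(mi, a, b) counts crossed minute boundaries, so seconds are ignored.
    return t.hour * 60 + t.minute


groups = []  # each entry: [first time in the group, row count]
prev = None
for t in sorted(data):
    gap = None if prev is None else minute_of_day(t) - minute_of_day(prev)
    if gap is None or gap > 1:
        groups.append([t, 1])   # ch = 1: a new group starts here
    else:
        groups[-1][1] += 1      # ch = 0: same group as the previous row
    prev = t

result = [(first.strftime("%H:%M:%S"), count) for first, count in groups]
print(result)
```

Because each row is compared only with the row just before it, a long chain of closely spaced rows ends up in one group even if its first and last rows are far apart.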
SELECT CONVERT(VARCHAR(8),
DATEADD(minute, (DATEDIFF(n, 0, time) / 2) * 2, 0),
108),
COUNT(*)
FROM times
GROUP BY DATEDIFF(n, 0, time) / 2
Explanation:
CONVERT displays a DateTime in hh:mm:ss format (= 108).
DATEDIFF converts to minutes, and the integer division by two truncates, so each 2-minute GROUP resolves to the same integer.
DATEADD is used to convert this number of minutes back to a DateTime, having multiplied by 2 to get back to the correct (rounded) time.
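The bucket arithmetic explained above can be sketched in Python on the question's sample times; `bucket_label` is a hypothetical helper, and the 2-minute width is taken from the query.

```python
from collections import Counter
from datetime import time

times = [time(16, 1, 1), time(16, 1, 20), time(16, 14, 2),
         time(16, 15, 1), time(16, 20, 2)]


def bucket_label(t, width=2):
    minutes = t.hour * 60 + t.minute      # like DATEDIFF(n, 0, time)
    start = (minutes // width) * width    # truncate, then scale back up
    return f"{start // 60:02d}:{start % 60:02d}:00"


counts = Counter(bucket_label(t) for t in times)
for label in sorted(counts):
    print(label, counts[label])
```

The labels are the bucket start times (16:00:00, 16:14:00, 16:20:00) rather than the first timestamp in each group, and because the buckets are fixed, two rows only seconds apart can still land in different buckets when they straddle a boundary such as 16:15:59 and 16:16:01.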
Declare @m_TestTable table
(
DateRecorded datetime
)
Insert into @m_TestTable Values ('16:01:01' )
Insert into @m_TestTable Values ('16:01:20' )
Insert into @m_TestTable Values ('16:14:02' )
Insert into @m_TestTable Values ('16:15:01' )
Insert into @m_TestTable Values ('16:20:01' );
With tblDifference as
(
Select Row_Number() OVER (Order by DateRecorded) as RowNumber,DateRecorded from @m_TestTable
)
select cur.DateRecorded as prvD, prv.DateRecorded as prvC, dateDiff(n, cur.DateRecorded,prv.DateRecorded) as Diff from tblDifference cur LEFT OUTER JOIN tblDifference prv
ON cur.RowNumber = prv.RowNumber + 1
This gives you the time difference in minutes between two consecutive rows. You can select any row that has a time difference of less than 2 minutes. It also gives you the upper and lower values.
It should be useful for finding any values closer than 2 minutes apart.
prvD prvC Diff
1900-01-01 16:01:01.000 NULL NULL
1900-01-01 16:01:20.000 1900-01-01 16:01:01.000 0
1900-01-01 16:14:02.000 1900-01-01 16:01:20.000 -13
1900-01-01 16:15:01.000 1900-01-01 16:14:02.000 -1
1900-01-01 16:20:01.000 1900-01-01 16:15:01.000 -5
