SAS PROC SQL WHERE between 4 Mondays ago and last Monday - sql-server

I am trying to sum up the past 4 weeks of forecast vs. sales data, with each week starting on a Monday.
For reference, today is 8/6, so I want to start collecting weekly sales and forecast from the Monday 4 weeks back, 7/5, up through the previous week's Monday, 7/26. I eventually plan to sum this 4-week forecast into one row using a DO UNTIL(last.) loop, grouping by Store and SKU, so I can use a PUT statement to make a new column that signals whether they sold more than they forecasted or less.
For instance, let's pretend they had the data below:
|Mon_DT |STORE |SKU |wk_FCST|wk_Sales|
|05July21:00:00:00 | 5 | abc | 10 | 12 |
|12July21:00:00:00 | 5 | abc | 10 | 16 |
|19July21:00:00:00 | 5 | abc | 10 | 7 |
|26July21:00:00:00 | 5 | abc | 10 | 12 |
With the DO UNTIL(last.), the forecast will read as 40 and sales as 47, and I'll say if sales < forecast then LOWforecast = 'Y';
However, I am having trouble just getting the BETWEEN condition to work to pull only the last 4 weeks (starting on Monday).
DATA Getweeks;
StartOversell = intnx('week.2',today(),-4);
endoversell = intnx('week.2',today(),-1);
format StartOversell yymmdd10.;
format endoversell yymmdd10.;
Run;
Proc sql;
connect to odbc (dsn='****' uid='****' pwd='***');
create table work.Forecast1 as select distinct * from connection to odbc
(select MON_DT as DATE, Store_Number as Store, PROD_PKG_ID as SKU, FCST AS WK_FCST, SALES AS WK_sales, DIFF_QTY
From FCST
where Mon_DT >= 'StartOversell'd and Mon_DT <= 'endoversell'd );
disconnect from odbc;
quit;
I tried to use a macro variable as well but no luck.

Use macro variables, since the code uses explicit pass-through and the database is expecting a database-compliant date literal. All of your SQL must be DB-compliant in explicit pass-through - not SAS SQL.
Say you're using MS SQL Server and it needs date literals like "MM/DD/YYYY". The MMDDYYS10. format creates exactly that text, so you can convert the values with the put() function when building the macro variable.
It's also a good idea to include the quotes in the macro variable in this case, because Oracle needs single quotes, not double - not sure about MS SQL. The quote() function can be used to add the quotes without issue.
DATA Getweeks;
StartOversell = intnx('week.2',today(),-4);
call symputx('startOversell', quote(put(startOverSell, MMDDYYS10.), "'"));
/* same pattern for the end of the window */
endOversell = intnx('week.2',today(),-1);
call symputx('endOversell', quote(put(endOversell, MMDDYYS10.), "'"));
Run;
%put &startOversell;
Proc sql;
connect to odbc (dsn='****' uid='****' pwd='***');
create table work.Forecast1 as select distinct * from connection to odbc
(select MON_DT as DATE, Store_Number as Store, PROD_PKG_ID as SKU, FCST AS WK_FCST, SALES AS WK_sales, DIFF_QTY
From FCST
where Mon_DT >= &startOverSell and Mon_DT <= &endOverSell );
disconnect from odbc;
quit;
Edit: you may want to consider what happens if you run it on a Monday and check if your dates align as expected.
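For reference, if today were 8/6/2021 as in the example, the macro variables should resolve so that the pass-through WHERE clause reads like this (a sketch - the exact literals depend on the run date):
where Mon_DT >= '07/05/2021' and Mon_DT <= '07/26/2021'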

Related

How to compare Time value in table with Current Time in SQL?

I have a table named TimeList:
| Slot |
==============
| 10:00 |
| 11:00 |
| 12:00 |
| 13:00 | and so on
That saves the Times in Varchar(5)
The desired result should be showing the rows with time that is more than the current time, for example if the current time is 11:12 A.M. the result should return:
| Slot |
==============
| 12:00 |
| 13:00 |
I tried to convert the two values into time and compare them with:
SELECT *
FROM TimeList
WHERE Convert(Time, Slot) > Convert(Time, GETDATE())
But it didn't work; it said that Time is not a recognizable format in SQL.
Is there any way I could compare the two time slots?
Depends on the version of SQL Server you're running, I think. There is a CAST(.. AS time) in SQL Server 2008 or later, but that's a fairly recent development. So... to compare the current date/time with the TimeList where the times are converted to "time, if it were today," something like this should work:
SELECT *
FROM TimeList
WHERE Convert(Datetime, FORMAT (GETDATE(), 'd') + ' ' + Slot) > GETDATE()
Conversely, if you want to compare the times to the current time as text (note the capital HH for a 24-hour clock, so afternoon slots like 13:00 compare correctly):
SELECT *
FROM TimeList
WHERE Slot > FORMAT(GETDATE(), N'HH\:mm')
Try This.....
SELECT *
FROM TimeList
WHERE Slot > CONVERT(time,GETDATE())
Thank you very much for all the answers; fortunately, I found the answer to my question inspired by them.
The solution is:
SELECT *
FROM TimeList
WHERE Slot > CONVERT(varchar(5),GETDATE(), 108)
It seems that 108 is the style for time converted to char/varchar as hh:mm:ss, which matches the format Slot is stored in.
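For what it's worth, style 108 produces a zero-padded 24-hour hh:mm:ss string, which is why taking the first five characters compares correctly as text. A quick check (output values are illustrative):
SELECT CONVERT(varchar(8), GETDATE(), 108) AS full_time -- e.g. '11:12:45'
     , CONVERT(varchar(5), GETDATE(), 108) AS hour_min  -- e.g. '11:12'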

Convert integer value to DateTime in SQL Server 2012

I have a table called "EventLog" which has a column called nDateTime of type int.
This is the table "EventLog" with some values:
-----------------
| nDateTime |
-----------------
| 978307200 |
-----------------
| 978307219 |
-----------------
| 978513562 |
-----------------
| 978516233 |
-----------------
| 978544196 |
-----------------
| 1450379547 |
-----------------
| 1472299563 |
-----------------
| 1472299581 |
-----------------
| 1472300635 |
-----------------
| 1472300644 |
-----------------
| 1472300673 |
-----------------
I need to get the DateTime value, and I tried the following statements, but I receive these errors:
Test #1:
SELECT CONVERT(DATETIME, CONVERT(CHAR(8), nDateTime), 103) AS 'Formatted date'
FROM EventLog
The error says:
Conversion failed when converting date and/or time from character string.
Test #2: modified from here:
SELECT CONVERT(DATETIME, nDateTime, 103) AS 'Formatted date'
FROM EventLog
And Test #3 goes:
SELECT CAST(nDateTime AS datetime) AS 'Formatted date'
FROM EventLog
The duplicate question doesn't answer my question, because both Test #2 and Test #3 generate this error:
Arithmetic overflow error converting expression to data type datetime.
I admit that I never saw such value as a Date, and for that, I'm kind of confused in how to proceed.
My question is: How can get the valid DateTime value from the sample data?
Almost every time you see a date/time represented as an integer, that number represents the passage of time since a known epoch. This is the basis of Unix time, which is, put simply, the number of seconds which have elapsed since 1st January 1970 00:00:00.
Using this, we can check some of the values you have provided:
declare @dt DATETIME = '1970-01-01' -- epoch start
print dateadd(second, 978307200, @dt)  -- Jan  1 2001 12:00AM
print dateadd(second, 1472300673, @dt) -- Aug 27 2016 12:24PM
Seems possible, but who knows?!
You can check every date in your table simply using
declare @dt DATETIME = '1970-01-01' -- epoch start
SELECT
nDateTime AS OriginalData,
DATEADD(second, nDateTime, @dt) AS ActualDateTime
FROM EventLog
Just for giggles, I took a stab at having the base date of 1970-01-01, but without KNOWING the base, it is just a guess
Declare @Log table (DateInt int)
Insert Into @Log values
(978307200),
(978307219),
(978513562),
(978516233),
(978544196),
(1450379547),
(1472299563),
(1472299581),
(1472300635),
(1472300644),
(1472300673)
Select DateInt, Converted = DateAdd(SECOND, DateInt, '1970-01-01') From @Log
Returns
DateInt Converted
978307200 2001-01-01 00:00:00.000
978307219 2001-01-01 00:00:19.000
978513562 2001-01-03 09:19:22.000
978516233 2001-01-03 10:03:53.000
978544196 2001-01-03 17:49:56.000
1450379547 2015-12-17 19:12:27.000
1472299563 2016-08-27 12:06:03.000
1472299581 2016-08-27 12:06:21.000
1472300635 2016-08-27 12:23:55.000
1472300644 2016-08-27 12:24:04.000
1472300673 2016-08-27 12:24:33.000
The "2038" Problem with Unix Timestamps
There's a serious issue with writing code to convert UNIX Timestamps that are based on seconds... DATEADD can only handle INTs, and that brings us to the "2038/Y2K38/Friday the 13th" problem (the day of the week when the "wraparound" to the most negative number an INT can have happens, after the date below).
That means that the largest positive value it can handle is 2147483647. If we use DATEADD to add that number of seconds to the UNIX Epoch of the first instant of the year 1970, we end up with a DATETIME that clearly explains what they mean by the "2038" issue.
SELECT DATEADD(ss,2147483647,'1970');
The Standard Fix for the "2038" Problem
The standard way to get around that is to first store the UNIX Timestamp as a BIGINT and do two date adds... one for seconds and one for days.
There are 86400 seconds in a day. If we do Integer Division and...
Use the Quotient to derive the number of days to add to '1970'...
And use the Remainder to derive the number of seconds to add to '1970'...
... we'll get the correct date not only for the MAX INT value...
DECLARE @SomeUnixTS BIGINT = 2147483647
       ,@SecsPerDay BIGINT = 86400
;
SELECT DATEADD(ss, @SomeUnixTS % @SecsPerDay, DATEADD(dd, @SomeUnixTS / @SecsPerDay, '1970'))
;
... but also for the last possible date in seconds for SQL Server. If we calculate the UNIX Timestamp (in seconds) for the last possible second that's available in SQL Server...
SELECT DATEDIFF_BIG(ss,'1970','9999-12-31 23:59:59');
... it still works with lots of room to spare and no "2038" problem.
DECLARE @SomeUnixTS BIGINT = 253402300799
       ,@SecsPerDay BIGINT = 86400
;
SELECT DATEADD(ss, @SomeUnixTS % @SecsPerDay, DATEADD(dd, @SomeUnixTS / @SecsPerDay, '1970'))
;
UNIX Timestamps Based on Milliseconds
Working with UNIX Timestamps that are based on milliseconds is only slightly different, but they must be handled the same way...
DECLARE @SomeUnixTS BIGINT = DATEDIFF_BIG(ms,'1970','9999-12-31 23:59:59.999')
       ,@msUnixEpoch DATETIME2(3) = '1970'
       ,@msPerDay BIGINT = 86400000
;
SELECT  SomeUnixTS  = @SomeUnixTS
       ,msUnixEpoch = @msUnixEpoch
       ,Converted   = DATEADD(ms, @SomeUnixTS % @msPerDay, DATEADD(dd, @SomeUnixTS / @msPerDay, @msUnixEpoch))
;
As a bit of a sidebar, you have to wonder what Microsoft was or was not thinking when they created DATEDIFF_BIG() but didn't create a DATEADD_BIG(). Even more amazing is that they have a SQL Server that will work in a UNIX environment and still no CONVERT(ts) functionality.
Here's what's new in 2022 in the area I'm talking about...
https://learn.microsoft.com/en-us/sql/sql-server/what-s-new-in-sql-server-2022?view=sql-server-ver16#language
And, last but not least, do not convert UNIX Timestamps that are based on milliseconds directly to DATETIME because the rounding in DATETIME can take you to the next day, week, month, and even year. You must do a "units place" detection for "9" and "-1" and make the appropriate substitution of "7" and "-3" respectively.
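A quick illustration of that rounding hazard - DATETIME has roughly 1/300-second precision, so a .999 value rounds up, and at the last millisecond of the year that lands in the next year:
SELECT CONVERT(datetime, '2016-12-31 23:59:59.999') -- rounds up to 2017-01-01 00:00:00.000
SELECT CONVERT(datetime, '2016-12-31 23:59:59.997') -- stays at 2016-12-31 23:59:59.997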
Your input is more than 8 digits, hence it is throwing an arithmetic overflow error. If it is 8 digits you will get converted data:
For Example:
DECLARE @ndatetime int = 978307200
SELECT CONVERT(datetime, convert(varchar(10), @ndatetime, 112))
-- this throws an arithmetic overflow error
DECLARE @ndatetime int = 97830720 -- with 8 digits only
SELECT CONVERT(datetime, convert(varchar(10), @ndatetime, 112))
This returns a converted date.
You can try TRY_CONVERT, which will return NULL if it is a wrong date:
DECLARE @ndatetime int = 978307200
SELECT TRY_CONVERT(datetime, convert(varchar(10), @ndatetime, 112))

Show averages of a dataset, different date/time ranges

DB: MS SQL Server 11.0.3156.
I have a table where I record periodic data values. The key columns are:
fldObjectGUID (varchar), fldDataTimestamp (datetime), fldConfigItem (varchar), fldConfigItemValue (numeric)
I want to retrieve data for different time frames (day, week, month). But to keep the number of returned data points to a manageable number (e.g. fewer than 350), I'd like to get averages.
For example:
Day - Return all Data (already got this!)
Week - Return the data in hourly average values (e.g. there would be 24 * 1 Hour Averages, * 7 days)
Month - Return the data in 3-hourly average values (e.g. 8 * average over 3 hours, * 30)
Yearly - Return the data in daily average values (e.g. 1 * average over 24 hours, * 365)
A small example of the data set is shown here:
+--------------------------------------------------------------------------------+
+ fldObjectGUID | fldRecordUpdatedTimestamp | fldConfigItem | fldConfigItemValue |
+ 40010000 | 2015-06-16 18:20:48.000 | ICMPResponseTime | 4.00 |
+ 40010000 | 2015-06-16 19:22:00.000 | ICMPResponseTime | 15.00 |
+ 40010000 | 2015-06-16 20:22:14.000 | ICMPResponseTime | 4.00 |
+ 40010000 | 2015-06-17 17:35:19.000 | ICMPResponseTime | 6.00 |
+ 40010000 | 2015-06-17 18:36:26.000 | ICMPResponseTime | 4.00 |
+ 40010000 | 2015-06-28 02:18:31.000 | ICMPResponseTime | 19.00 |
+ 40010000 | 2015-06-28 03:18:54.000 | ICMPResponseTime | 9.00 |
+ 40010000 | 2015-06-02 17:25:16.000 | ICMPResponseTime | 3.00 |
+------------------------------------------------------------------------------------+
Data is added for an object (fldObjectGUID) at different rates. This could be one row every 5 minutes or 1 one row every hour. There can be gaps in the data (hours or even days). I want to graph the fldConfigItemValue data for each object over different time frames; Day (last 24 hours), Week, Month and Year. The periods of the returned data don't need to be exact. So, a month could just be the last 30 days, or just 1 calendar month back from today's date.
The SQL only needs to return data for a single fldObjectGUID and fldConfigItem combination - I'll then amend the SQL at run-time to get the data for the required object/configitem.
There may be gaps in the data, so no data points within a given period. So, the return value can be zero.
I'm retrieving data using Classic ASP, creating the SQL statement and parsing the results. I could achieve the result programmatically in my ASP code. So for the 'Week' required set, I could make repeated calls to the DB, using the AVERAGE function, and a WHERE clause to retrieve a subset of records (NOW to NOW - 1 hour). Store the value, then repeat using a WHERE clause for (NOW - 1 hour to NOW - 2 hours). And just step back in time until I've got all the values for a week. The 'Month' and 'Yearly' routines would be the same, just different timeframes in the WHERE clauses.
However, even to me, this seems a clumsy way of doing it and just one SQL routine (or a different SQL routine for Week, Month and Year) must be quicker and / or more elegant.
At the moment, I have some SQL (from StackOverflow?) that I thought might work and I have my code build up the SQL for the 'Month' view like this (I've hard-coded the fldObjectGUID and fldConfigItem in the example, to make the example clearer):
SELECT top 30 convert(date, l.fldDataTimestamp) as 'fldDataTimestamp_result', l.fldConfigItemValue, l.fldConfigItemValue
FROM tblObjectHealthCheckData_Historic l
INNER JOIN (
SELECT MIN(fldDataTimestamp) first_timestamp
FROM tblObjectHealthCheckData_Historic
where fldObjectGUID = '10050400' and fldConfigItem = 'AvailableRAM'
group by Convert(Date, fldDataTimestamp)
) sub_l ON (sub_l.first_timestamp = l.fldDataTimestamp)
where fldObjectGUID = '10050400' and l.fldConfigItem = 'AvailableRAM'
order by fldDataTimestamp desc
But this gets just the first data point for each day (as you can guess, whilst I do understand SQL and programming, they are a hobby, not something I do for a living) and so I'm struggling to fix this code.
I'm assuming that people agree it's more efficient doing this in one SQL routine than making many separate SQL calls - but can anyone help?
I would try the DATEPART function; this way you can get different parts of fldRecordUpdatedTimestamp and then AVG the fldConfigItemValue field.
This goes down to a single hour of your timestamp (it could be minutes - check MSDN for DATEPART in T-SQL), so if you wish to get daily averages per week then you need to include:
day_fldRecordUpdatedTimestamp
week_fldRecordUpdatedTimestamp
This will average for each day inside each week.
The example below shows the average per month - mind the year: if you have more than a year's worth of data, make sure you include year_fldRecordUpdatedTimestamp etc.
WITH PartsTable As
(
SELECT
fldObjectGUID
, fldRecordUpdatedTimestamp
, fldConfigItem
, fldConfigItemValue
, DATEPART(HOUR, fldRecordUpdatedTimestamp) As hour_fldRecordUpdatedTimestamp
, DATEPART(DAY, fldRecordUpdatedTimestamp) As day_fldRecordUpdatedTimestamp
, DATEPART(WEEK, fldRecordUpdatedTimestamp) As week_fldRecordUpdatedTimestamp
, DATEPART(MONTH, fldRecordUpdatedTimestamp) As month_fldRecordUpdatedTimestamp
, DATEPART(YEAR, fldRecordUpdatedTimestamp) As year_fldRecordUpdatedTimestamp
FROM
YourLogTable
--WHERE
-- Perhaps set a limit here to not get a huge set in the first step.
)
SELECT
COUNT(1) As setcount /* Shows how many rows are in each AVG calculation. */
, fldObjectGUID
, fldConfigItem
, month_fldRecordUpdatedTimestamp /* Change this column for the specific span you're interested in. */
, AVG(fldConfigItemValue) As avg_fldConfigItemValue
FROM
PartsTable
GROUP BY
fldObjectGUID
, fldConfigItem
, month_fldRecordUpdatedTimestamp /* Change this column for the specific span you're interested in. */
;
One final note: make sure you include the month_, week_, etc. columns in both SELECT and GROUP BY.
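If you specifically want the 3-hourly buckets for the 'Month' view, one variation on the same idea is to group on the hour divided by 3. This is only a sketch - YourLogTable is the placeholder above, and the filter values are the samples from the question:
SELECT
fldObjectGUID
, fldConfigItem
, CONVERT(date, fldRecordUpdatedTimestamp) As bucket_day
, DATEPART(HOUR, fldRecordUpdatedTimestamp) / 3 As bucket_3h /* 0-7: eight 3-hour buckets per day */
, AVG(fldConfigItemValue) As avg_fldConfigItemValue
FROM
YourLogTable
WHERE
fldObjectGUID = '40010000'
AND fldConfigItem = 'ICMPResponseTime'
AND fldRecordUpdatedTimestamp >= DATEADD(DAY, -30, GETDATE())
GROUP BY
fldObjectGUID
, fldConfigItem
, CONVERT(date, fldRecordUpdatedTimestamp)
, DATEPART(HOUR, fldRecordUpdatedTimestamp) / 3
;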

SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

In our company, our clients perform various activities that we log in different tables - Interview attendance, Course Attendance, and other general activities.
I have a database view that unions data from all of these tables giving us the ActivityView that looks like this.
As you can see some activities overlap - for example while attending an interview, a client may have been performing a CV update activity.
+----------------------+---------------+---------------------+-------------------+
| activity_client_id | activity_type | activity_start_date | activity_end_date |
+----------------------+---------------+---------------------+-------------------+
| 112 | Interview | 2015-06-01 09:00 | 2015-06-01 11:00 |
| 112 | CV updating | 2015-06-01 09:30 | 2015-06-01 11:30 |
| 112 | Course | 2015-06-02 09:00 | 2015-06-02 16:00 |
| 112 | Interview | 2015-06-03 09:00 | 2015-06-03 10:00 |
+----------------------+---------------+---------------------+-------------------+
Each client has a "Sign Up Date", recorded on the client table, which is when they joined our programme. Here it is for our sample client:
+-----------+---------------------+
| client_id | client_sign_up_date |
+-----------+---------------------+
| 112 | 2015-05-20 |
+-----------+---------------------+
I need to create a report that will show the following columns:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
We need this report in order to see how effective our programme is. An important aim of the programme is that we get every client to complete at least 5 hours of activity as quickly as possible.
So this report will tell us how long from sign up does it take each client to achieve this figure.
What makes this even trickier is that when we calculate 5 hours of total activity, we must discount overlapping activities:
In the sample data above the client attended an interview between 09:00 and 11:00.
On the same day they also performed CV updating activity from 09:30 to 11:30.
For our calculation, this would give them total activity for the day of 2.5 hours (150 minutes) - we would only count 30 minutes of the CV updating as the Interview overlaps it up to 11:00.
So the report for our sample client would give the following result:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
| 112 | 2015-05-20 | 2015-06-02 |
+-----------+---------------------+--------------------------------------------+
So my question is: how can I create the report using a select statement?
I can work out how to do this by writing a stored procedure that will loop through the view and write the result to a report table.
But I would much prefer to avoid a stored procedure and have a select statement that will give me the report on the fly.
I am using SQL Server 2005.
See SQL Fiddle here.
with tbl as (
-- this will generate daily merged overlapping time
select distinct
a.id
,(
select min(x.starttime)
from act x
where x.id=a.id and ( x.starttime between a.starttime and a.endtime
or a.starttime between x.starttime and x.endtime )
) start1
,(
select max(x.endtime)
from act x
where x.id=a.id and ( x.endtime between a.starttime and a.endtime
or a.endtime between x.starttime and x.endtime )
) end1
from act a
), tbl2 as
(
-- this will add minute and total-minute columns
select
*
,datediff(mi,t.start1,t.end1) mi
,(select sum(datediff(mi,x.start1,x.end1)) from tbl x where x.id=t.id and x.end1<=t.end1) totalmi
from tbl t
), tbl3 as
(
-- now the final query, showing starttime and endtime once 5 hours (300 minutes) are reached, otherwise null if not completed
select
t.id
,min(t.start1) starttime
,min(case when t.totalmi>300 then t.end1 else null end) endtime
from tbl2 t
group by t.id
)
-- final result
select *
from tbl3
where endtime is not null
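If you want to run the above, here is a minimal assumed setup for the act table it reads from (column names taken from the query, data from the question; single-row inserts to stay SQL Server 2005 compatible):
CREATE TABLE act (id int, starttime datetime, endtime datetime)
INSERT INTO act VALUES (112, '2015-06-01 09:00', '2015-06-01 11:00')
INSERT INTO act VALUES (112, '2015-06-01 09:30', '2015-06-01 11:30')
INSERT INTO act VALUES (112, '2015-06-02 09:00', '2015-06-02 16:00')
INSERT INTO act VALUES (112, '2015-06-03 09:00', '2015-06-03 10:00')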
This is one way to do it:
;WITH CTErn AS (
SELECT activity_client_id, activity_type,
activity_start_date, activity_end_date,
ROW_NUMBER() OVER (PARTITION BY activity_client_id
ORDER BY activity_start_date) AS rn
FROM activities
),
CTEdiff AS (
SELECT c1.activity_client_id, c1.activity_type,
x.activity_start_date, c1.activity_end_date,
DATEDIFF(mi, x.activity_start_date, c1.activity_end_date) AS diff,
ROW_NUMBER() OVER (PARTITION BY c1.activity_client_id
ORDER BY x.activity_start_date) AS seq
FROM CTErn AS c1
LEFT JOIN CTErn AS c2 ON c1.rn = c2.rn + 1
CROSS APPLY (SELECT CASE
WHEN c1.activity_start_date < c2.activity_end_date
THEN c2.activity_end_date
ELSE c1.activity_start_date
END) x(activity_start_date)
)
SELECT TOP 1 client_id, client_sign_up_date, activity_start_date,
hoursOfActivity
FROM CTEdiff AS c1
INNER JOIN clients AS c2 ON c1.activity_client_id = c2.client_id
CROSS APPLY (SELECT SUM(diff) / 60.0
FROM CTEdiff AS c3
WHERE c3.seq <= c1.seq) x(hoursOfActivity)
WHERE hoursOfActivity >= 5
ORDER BY seq
Common Table Expressions and ROW_NUMBER() were introduced with SQL Server 2005, so the above query should work for that version.
Demo here
The first CTE, i.e. CTErn, produces the following output:
client_id activity_type start_date end_date rn
112 Interview 2015-06-01 09:00 2015-06-01 11:00 1
112 CV updating 2015-06-01 09:30 2015-06-01 11:30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 4
The second CTE, i.e. CTEdiff, uses the above table expression in order to calculate the time difference for each record, taking into consideration any overlaps with the previous record:
client_id activity_type start_date end_date diff seq
112 Interview 2015-06-01 09:00 2015-06-01 11:00 120 1
112 CV updating 2015-06-01 11:00 2015-06-01 11:30 30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 420 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 60 4
The final query calculates the cumulative sum of time difference and selects the first record that exceeds 5 hours of activity.
The above query will work for simple interval overlaps, i.e. when just the end date of an activity overlaps the start date of the next activity.
A Geometric Approach
For another issue, I've taken a geometric approach to date packing. Namely, I convert dates and times to a sql geometry type and utilize geometry::UnionAggregate to merge the ranges.
I don't believe this will work in sql-server 2005. But your problem was such an interesting puzzle that I wanted to see whether the geometric approach would work. So any future users running into this problem who have access to a later version can consider it.
Code Description
In 'numbers':
I build a table representing a sequence. Swap it out with your favorite way to make a numbers table. For a union operation, you won't ever need more rows than in your original table, so I just use it as the base to build it.
In 'mergeLines':
I convert the dates to floats and use those floats to create geometrical points. I then connect these points via STUnion and STEnvelope. Finally, I merge all these lines via UnionAggregate. The resulting 'lines' geometry object might contain multiple lines, but if they overlap, they turn into one line.
In 'redate':
I use the numbers CTE to extract the individual lines inside 'lines'. I envelope the lines, which here ensures that the lines are stored only as their two endpoints. I read the endpoint x values and convert them back to their time representations (this is usually the end goal, but you need more). I calculate the difference in minutes between activity start and end dates (I do this first in seconds, then divide by 60, for the sake of a precision issue). I calculate the cumulative sum of these minutes for each row.
In the outer query:
I align the previous cumulative minutes sum with each current row. I filter for the row where the 5hr goal was met but where the previous minutes show that the 5hr goal for the previous row was not met. I then calculate where in the current row's range the user met the 5 hours, to arrive not only at the date the five-hour goal was met, but the exact time.
The Code
with
numbers as (
select row_number() over (order by (select null)) i
from #activities -- where I put your data
),
mergeLines as (
select activity_client_id,
lines = geometry::UnionAggregate(line)
from #activities
cross apply (select
startP = geometry::Point(convert(float,activity_start_date), 0, 0),
stopP = geometry::Point(convert(float,activity_end_date), 0, 0)
) pointify
cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
group by activity_client_id
),
redate as (
select client_id = activity_client_id,
activities_start_date,
activities_end_date,
minutes,
rollingMinutes = sum(minutes) over(
partition by activity_client_id
order by activities_start_date
rows between unbounded preceding and current row
)
from mergeLines ml
join numbers n on n.i between 1 and ml.lines.STNumGeometries()
cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
cross apply (select
activities_start_date = convert(datetime, l.line.STPointN(1).STX),
activities_end_date = convert(datetime, l.line.STPointN(3).STX)
) unprepare
cross apply (select minutes =
round(datediff(s, activities_start_date, activities_end_date) / 60.0,0)
) duration
)
select client_id,
activities_start_date,
activities_end_date,
met_5hr_goal = dateadd(minute, (60 * 5) - prevRoll, activities_start_date)
from (
select *,
prevRoll = lag(rollingMinutes) over (
partition by client_id
order by rollingMinutes
)
from redate
) ranker
where rollingMinutes >= 60 * 5
and prevRoll < 60 * 5;
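To try this sketch, here is a minimal assumed setup for the #activities temp table it reads from (columns inferred from the query, data from the question; this answer already requires a version later than 2005 anyway):
CREATE TABLE #activities (
activity_client_id int,
activity_start_date datetime,
activity_end_date datetime
)
INSERT INTO #activities VALUES
(112, '2015-06-01 09:00', '2015-06-01 11:00'),
(112, '2015-06-01 09:30', '2015-06-01 11:30'),
(112, '2015-06-02 09:00', '2015-06-02 16:00'),
(112, '2015-06-03 09:00', '2015-06-03 10:00')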

Need help with a SQL query selecting date ranges from a table of period quarters

I have a table called Periods that looks like this
PeriodID | PeriodYear | PeriodQuarter
7 | 2009 | 1
8 | 2009 | 2
9 | 2009 | 3
10 | 2009 | 4
11 | 2010 | 1
12 | 2010 | 2
Each row in the table represents 1 of the 4 quarters of the year (like 3-monthly school terms). E.g. the first row represents Period 1 of 2009 (i.e. the date range 1 Jan 2009 - 31 March 2009).
Now I need to write a query that selects rows/periods from the above table where the period falls between two dates, as per the following pseudocode.
select *
from Periods
where Period is between @startDate and @endDate
The query will be used inside a table-valued function called dbo.GetPeriodsFromDateRange, and @startDate and @endDate are parameters to the function.
I'm stuck and can't figure out how to do it. Please help. This applies to T-SQL (MS SQL Server 2000/2005).
Try
select *
from Periods
where dateadd(qq,PeriodQuarter-1,dateadd(yy,PeriodYear -1900,0))
between @startDate and @endDate
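As a quick sanity check of what that expression builds (using 2009 quarter 2 as an assumed example - it yields the first day of the quarter):
SELECT DATEADD(qq, 2 - 1, DATEADD(yy, 2009 - 1900, 0)) -- 2009-04-01 00:00:00.000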
A seek instead of a scan is possible:
SELECT *
FROM Periods
WHERE
PeriodYear BETWEEN Year(@startdate) AND Year(@enddate)
AND PeriodYear * 4 + PeriodQuarter
BETWEEN Year(@startdate) * 4 + DATEPART(Quarter, @startdate)
AND Year(@enddate) * 4 + DATEPART(Quarter, @enddate)
Explanation:
I'm composing a new, scaled integer from two component pieces, the year and the quarter, treating each combination of year and quarter as a single number.
Imagine instead that I had done it this way:
AND PeriodYear + (PeriodQuarter - 1) / 4.0
BETWEEN Year(@startdate) + (DATEPART(Quarter, @startdate) - 1) / 4.0
AND Year(@enddate) + (DATEPART(Quarter, @enddate) - 1) / 4.0
Calling my original expression "Mult" and this new one "Div", here are some years and quarters and what those expressions will evaluate to:
Year Qtr Div Mult
2009 1 2009.00 8037
2009 2 2009.25 8038
2009 3 2009.50 8039
2009 4 2009.75 8040
2010 1 2010.00 8041
2010 2 2010.25 8042
2010 3 2010.50 8043
So now if we run a WHERE clause against these rows:
WHERE Div BETWEEN 2009.25 AND 2010.00
You can see how it will return the correct rows. The Mult version really does exactly the same, just scaling the year up instead of the quarter down. The reason I used it is because integer math and multiplication are faster than fractional math and division.
The reason that I use two conditions starting with just the year is to make the query sargable. We want to do the seek based on just the year, which isn't possible if we're multiplying it by 4 or doing other math on it. So we first narrow to just the right years, then fine-tune to eliminate any quarters that shouldn't be in the result.
Another option is to add a calculated column and put an index on it. This wouldn't require any changes to code inserting or updating (as long as they properly use column lists), but would let you do regular range math as you desire.
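A minimal sketch of that computed-column idea (the column and index names here are my own inventions):
ALTER TABLE Periods ADD PeriodSeq AS (PeriodYear * 4 + PeriodQuarter)
CREATE INDEX IX_Periods_PeriodSeq ON Periods (PeriodSeq)
-- then seek directly:
-- WHERE PeriodSeq BETWEEN Year(@startdate) * 4 + DATEPART(Quarter, @startdate)
--                     AND Year(@enddate) * 4 + DATEPART(Quarter, @enddate)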
I would be tempted to add 2 further columns to the table...
StartDate and EndDate - these will store the date that each period starts and ends (i.e. in your example StartDate=1st Jan 2009 and EndDate=31st March 2009)
This will give you more flexibility if the quarters are defined differently than you have suggested.
If you do this, then the query become fairly simple...
select *
from Periods
where @startDate < Periods.StartDate and @endDate > Periods.EndDate
This is assuming you only want to include Periods which are completely encapsulated between @StartDate and @EndDate. If you want Periods that overlap, then try something like...
select *
from Periods
where @EndDate > Periods.StartDate and @StartDate < Periods.EndDate
