Show averages of a dataset, different date/time ranges - sql-server

DB: MS SQL Server 11.0.3156.
I have a table where I record periodic data values. The key columns are:
fldObjectGUID (varchar), fldDataTimestamp (datetime), fldConfigItem (varchar), fldConfigItemValue (numeric)
I want to retrieve data for different time frames (day, week, month). To keep the number of returned data points manageable (e.g. fewer than 350), I'd like to get averages.
For example:
Day - Return all Data (already got this!)
Week - Return the data in hourly average values (e.g. there would be 24 * 1 Hour Averages, * 7 days)
Month - Return the data in 3-hourly average values (e.g. 8 * Average over 3 hours, * 30)
Yearly - Return the data in daily average values (e.g. 1 * Average over 24 hours, * 365)
A small example of the data set is shown here:
+---------------+---------------------------+------------------+--------------------+
| fldObjectGUID | fldRecordUpdatedTimestamp | fldConfigItem    | fldConfigItemValue |
+---------------+---------------------------+------------------+--------------------+
| 40010000      | 2015-06-16 18:20:48.000   | ICMPResponseTime |               4.00 |
| 40010000      | 2015-06-16 19:22:00.000   | ICMPResponseTime |              15.00 |
| 40010000      | 2015-06-16 20:22:14.000   | ICMPResponseTime |               4.00 |
| 40010000      | 2015-06-17 17:35:19.000   | ICMPResponseTime |               6.00 |
| 40010000      | 2015-06-17 18:36:26.000   | ICMPResponseTime |               4.00 |
| 40010000      | 2015-06-28 02:18:31.000   | ICMPResponseTime |              19.00 |
| 40010000      | 2015-06-28 03:18:54.000   | ICMPResponseTime |               9.00 |
| 40010000      | 2015-06-02 17:25:16.000   | ICMPResponseTime |               3.00 |
+---------------+---------------------------+------------------+--------------------+
Data is added for an object (fldObjectGUID) at different rates. This could be one row every 5 minutes or one row every hour. There can be gaps in the data (hours or even days). I want to graph the fldConfigItemValue data for each object over different time frames: Day (last 24 hours), Week, Month and Year. The periods of the returned data don't need to be exact. So, a month could just be the last 30 days, or just 1 calendar month back from today's date.
The SQL only needs to return data for a single fldObjectGUID and fldConfigItem combination - I'll then amend the SQL at run-time to get the data for the required object/configitem.
There may be gaps in the data, so no data points within a given period. So, the return value can be zero.
I'm retrieving data using Classic ASP, creating the SQL statement and parsing the results. I could achieve the result programmatically in my ASP code. So for the 'Week' required set, I could make repeated calls to the DB, using the AVG function and a WHERE clause to retrieve a subset of records (NOW to NOW - 1 hour). Store the value, then repeat using a WHERE clause for (NOW - 1 hour to NOW - 2 hours). And just step back in time until I've got all the values for a week. The 'Month' and 'Yearly' routines would be the same, just with different time frames in the WHERE clauses.
However, even to me, this seems a clumsy way of doing it and just one SQL routine (or a different SQL routine for Week, Month and Year) must be quicker and / or more elegant.
At the moment, I have some SQL (from StackOverflow?) that I thought might work and I have my code build up the SQL for the 'Month' view like this (I've hard-coded the fldObjectGUID and fldConfigItem in the example, to make the example clearer):
SELECT top 30 convert(date, l.fldDataTimestamp) as 'fldDataTimestamp_result', l.fldConfigItemValue, l.fldConfigItemValue
FROM tblObjectHealthCheckData_Historic l
INNER JOIN (
SELECT MIN(fldDataTimestamp) first_timestamp
FROM tblObjectHealthCheckData_Historic
where fldObjectGUID = '10050400' and fldConfigItem = 'AvailableRAM'
group by Convert(Date, fldDataTimestamp)
) sub_l ON (sub_l.first_timestamp = l.fldDataTimestamp)
where fldObjectGUID = '10050400' and l.fldConfigItem = 'AvailableRAM'
order by fldDataTimestamp desc
But this gets just the first data point for each day, and I'm struggling to fix this code (as you can guess, whilst I do understand SQL and programming, they are a hobby, not something I do for a living).
I'm assuming that people agree it's more efficient to do this in SQL than to make many separate SQL calls from code. Can anyone help?

I would try the DATEPART function: it lets you extract different parts of fldRecordUpdatedTimestamp and then take the AVG of fldConfigItemValue per part.
The query below breaks the timestamp down as far as the hour (it could go down to minutes; check MSDN for DATEPART in T-SQL). So if you want daily averages per week, you need to include both of these columns:
day_fldRecordUpdatedTimestamp
week_fldRecordUpdatedTimestamp
This will average each day inside each week.
The example below shows the average per month. Mind the year: if you have more than a year's worth of data, make sure you also include year_fldRecordUpdatedTimestamp, and so on.
WITH PartsTable As
(
SELECT
fldObjectGUID
, fldRecordUpdatedTimestamp
, fldConfigItem
, fldConfigItemValue
, DATEPART(HOUR, fldRecordUpdatedTimestamp) As hour_fldRecordUpdatedTimestamp
, DATEPART(DAY, fldRecordUpdatedTimestamp) As day_fldRecordUpdatedTimestamp
, DATEPART(WEEK, fldRecordUpdatedTimestamp) As week_fldRecordUpdatedTimestamp
, DATEPART(MONTH, fldRecordUpdatedTimestamp) As month_fldRecordUpdatedTimestamp
, DATEPART(YEAR, fldRecordUpdatedTimestamp) As year_fldRecordUpdatedTimestamp
FROM
YourLogTable
--WHERE
-- Perhaps set a limit here to not get a huge set in the first step.
)
SELECT
COUNT(1) As setcount /* Shows how many rows are in each AVG calculation. */
, fldObjectGUID
, fldConfigItem
, month_fldRecordUpdatedTimestamp /* Change this column for the specific span you're interested in. */
, AVG(fldConfigItemValue) As avg_fldConfigItemValue
FROM
PartsTable
GROUP BY
fldObjectGUID
, fldConfigItem
, month_fldRecordUpdatedTimestamp /* Change this column for the specific span you're interested in. */
;
One final note: make sure you include month_, week_ etc. column in both SELECT and GROUP BY.
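To make the grouping idea concrete outside of T-SQL, here is a minimal Python sketch; it is not part of the answer above. It floors each timestamp to the start of an N-hour bucket and averages per bucket, which is what grouping by year, month, day and hour together achieves. The names `bucket_start` and `bucket_averages` and the epoch anchor are my own assumptions, and the three rows are taken from the sample data in the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # arbitrary midnight anchor so buckets align to 00:00


def bucket_start(ts, bucket_hours):
    """Floor a timestamp to the start of its N-hour bucket."""
    width = timedelta(hours=bucket_hours)
    return EPOCH + ((ts - EPOCH) // width) * width


def bucket_averages(rows, bucket_hours):
    """Average the values of (timestamp, value) rows per N-hour bucket."""
    groups = defaultdict(list)
    for ts, value in rows:
        groups[bucket_start(ts, bucket_hours)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}


# Three rows taken from the sample data in the question.
rows = [
    (datetime(2015, 6, 16, 18, 20, 48), 4.0),
    (datetime(2015, 6, 16, 19, 22, 0), 15.0),
    (datetime(2015, 6, 16, 20, 22, 14), 4.0),
]

hourly = bucket_averages(rows, 1)        # "Week" view: one bucket per hour
three_hourly = bucket_averages(rows, 3)  # "Month" view: one bucket per 3 hours
```

Buckets with no rows are simply absent from the result, so the caller would have to fill those gaps with zero if the graph needs them.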

Related

Summing over a numeric column with moving window of varying size in Snowflake

I have a sample dataset given as follows;
time | time_diff | amount
time1 | time1-time2 | 1000
time2 | time2-time3 | 2000
time3 | time3-time4 | 3000
time4 | time4-time5 | 4500
time5 | NULL | 1000
Quick explanation: the first column gives the time of the transaction, the second column gives the difference with the next row to get the transaction interval (in hours), and the third column gives the money made in a particular transaction. We have sorted the data in ascending order using the time column.
Some values are given as;
time | time_diff | amount
time1 | 2. | 1000
time2 | 3. | 2000
time3 | 1. | 3000
time4 | 19. | 4500
time5 | NULL | 1000
The goal is to find, for a given transaction, the total of the transactions that occurred within 24 hours of it. For example, the output for time1 should be 1000+2000+3000=6000, because if we add the value at time4, the total time interval becomes 25, hence we omit the value of 4500 from the sum.
Example output:
time | amount
time1 | 6000
time2 | 9500
time3 | 7500
time4 | 4500
time5 | 1000
The concept of a moving window sum should work, as far as I know, but here the width of the window is variable. That's the challenge I am facing. Can I kindly get some help here?
You could ignore the time_diff column and use a theta self-join based on a timestamp range, like this:
WITH srctab AS ( SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:00:00') AS "time", 1000::INT AS "amount"
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:02:00'), 2000::INT
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:05:00'), 3000::INT
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:06:00'), 4500::INT
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-16 00:01:00'), 1000::INT
)
SELECT t1."time", SUM(t2."amount") AS tot
FROM srctab t1
JOIN srctab t2 ON t2."time" BETWEEN t1."time" AND TIMESTAMPADD(HOUR, +24, t1."time")
GROUP BY t1."time"
ORDER BY t1."time";
Minor detail: if your second column gives the time difference with the next row then I'd say the first value should be 10500 (not 6000) because it's only your 5th transaction of 1000 which is more than 24 hours ahead... I'm guessing your actual timestamps are at 0, 2, 5, 6 and 25 hours?
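As a cross-check of the join logic, here is a small Python sketch (my own illustration, not Snowflake code) that applies the same inclusive "between t and t + 24 hours" rule. It assumes the timestamps guessed above, at 0, 2, 5, 6 and 25 hours; `forward_window_sums` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def forward_window_sums(transactions, hours=24):
    """For each transaction, sum all amounts from its time up to `hours` later (inclusive)."""
    window = timedelta(hours=hours)
    return {
        start: sum(amount for ts, amount in transactions if start <= ts <= start + window)
        for start, _ in transactions
    }


base = datetime(2020, 4, 15)
# Assumed timestamps: 0, 2, 5, 6 and 25 hours after the first transaction.
transactions = [
    (base + timedelta(hours=0), 1000),
    (base + timedelta(hours=2), 2000),
    (base + timedelta(hours=5), 3000),
    (base + timedelta(hours=6), 4500),
    (base + timedelta(hours=25), 1000),
]

totals = forward_window_sums(transactions)
```

Under that assumption the first transaction totals 10500, which matches the remark above that 6000 looks too low.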
Another option might be to use the sliding WINDOW function by tweaking your transactional data to include each hour.
It's perhaps an overkill but might be a useful technique.
Firstly generate a placeholder for each hour using the timestamps. I utilised time_slice to map each timestamp into nice hour blocks and generator with dateadd to back fill each hour putting a zero in where no transactions took place.
So now I can use the sliding window function knowing that I can safely choose the 23 preceding hours.
Copy|Paste|Run
WITH SRCTAB AS (
SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:00:00') AS TRANS_TS, 1000::INT AS AMOUNT
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 02:00:00'), 2000::INT
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 05:00:00'), 3000::INT
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 06:00:00'), 4500::INT
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-16 01:00:00'), 1000::INT
)
SELECT
TRANS_TIME_HOUR
,SUM(AMOUNT) OVER ( ORDER BY TRANS_TIME_HOUR ROWS BETWEEN 23 PRECEDING AND 0 PRECEDING ) OVERKILL FROM (
SELECT
TRANS_TIME_HOUR,
SUM(AMOUNT) AMOUNT
FROM
(
SELECT
DATEADD(HOUR, NUMBER, TRANS_TS) TRANS_TIME_HOUR,
DECODE( DATEADD(HOUR, NUMBER, TRANS_TS), TIME_SLICE(TRANS_TS, 1, 'HOUR', 'START'), AMOUNT,0) AMOUNT
FROM
SRCTAB,
(SELECT SEQ4() NUMBER FROM TABLE(GENERATOR(ROWCOUNT => 24)) ) G
)
GROUP BY
TRANS_TIME_HOUR
)
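The backfill-then-slide idea can also be sketched in Python. This is a simplified illustration under my own assumptions, not a translation of the query above: it fills every hour between the first and last transaction with zero, then sums each hour with the 23 hours before it. `trailing_24h_sums` is a hypothetical name.

```python
from datetime import datetime, timedelta


def trailing_24h_sums(transactions):
    """Sum amounts per hour slot, backfill empty hours with 0, then take a 24-row trailing sum."""
    hourly = {}
    for ts, amount in transactions:
        slot = ts.replace(minute=0, second=0, microsecond=0)
        hourly[slot] = hourly.get(slot, 0) + amount

    # Build a gap-free list of hour slots so "23 preceding rows" means "23 preceding hours".
    slots = []
    current = min(hourly)
    while current <= max(hourly):
        slots.append(current)
        current += timedelta(hours=1)
    amounts = [hourly.get(slot, 0) for slot in slots]

    return {
        slot: sum(amounts[max(0, i - 23): i + 1])
        for i, slot in enumerate(slots)
    }


transactions = [
    (datetime(2020, 4, 15, 0), 1000),
    (datetime(2020, 4, 15, 2), 2000),
    (datetime(2020, 4, 15, 5), 3000),
    (datetime(2020, 4, 15, 6), 4500),
    (datetime(2020, 4, 16, 1), 1000),
]

sums = trailing_24h_sums(transactions)
```

Note that this window looks backwards from each hour, like the `23 PRECEDING` frame in the query, rather than forwards from each transaction.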

SAS PROC SQL where between 4 mondays ago and last monday

I am trying to sum up the past 4 weeks of forecast vs. sales data, with each week starting on a Monday.
For reference, today is 8/6, so I want to start collecting weekly sales and forecast from 7/5 up to the previous week's Monday, 7/26. I eventually plan to sum this 4-week forecast into one row using a do until last. statement, grouping by Store and SKU, so that I can use a put statement to make a new column signalling whether they sold more or less than they forecasted.
For instance, let's pretend they had the data below:
|Mon_DT |STORE |SKU |wk_FCST|wk_Sales|
|05July21:00:00:00 | 5 | abc | 10 | 12 |
|12July21:00:00:00 | 5 | abc | 10 | 16 |
|19July21:00:00:00 | 5 | abc | 10 | 7 |
|26July21:00:00:00 | 5 | abc | 10 | 12 |
With the do until last., the forecast will read as 40 and sales as 47, and I'll say if sales < forecast then LOWforecast = 'Y';
However, I am having trouble just getting the date filter to work so that it pulls only the last 4 weeks (starting on a Monday).
DATA Getweeks;
StartOversell = intnx('week.2',today(),-4);
endoversell = intnx('week.2',today(),-1);
format StartOversell yymmdd10.;
format endoversell yymmdd10.;
Run;
Proc sql;
connect to odbc (dsn='****' uid='****' pwd='***');
create table work.Forecast1 as select distinct * from connection to odbc
(select MON_DT as DATE, Store_Number as Store, PROD_PKG_ID as SKU, FCST AS WK_FCST, SALES AS WK_sales, DIFF_QTY
From FCST
where Mon_DT >= 'StartOversell'd and Mon_DT <= 'endoversell'd );
disconnect from odbc;
quit;
I tried to use a macro variable as well but no luck.
Use macro variables, since the code uses explicit pass-through and the database expects a database-compliant date literal. In explicit pass-through, all of your SQL must be valid for the database, not SAS SQL.
Suppose you use MS SQL and it needs date literals as MM/DD/YYYY. I will use the mmddyyS10. format, which produces a value that looks like that; you can convert the date with the put() function and store the result in a macro variable.
It's also a good idea to include the quotes in the macro variable in this case, because Oracle needs single quotes, not double (I'm not sure about MS SQL). The quote() function can be used to add the quotes without issue.
DATA Getweeks;
StartOversell = intnx('week.2',today(),-4);
call symputx('startOversell', quote(put(startOverSell, MMDDYYS10.), "'"));
...
Run;
%put &startOversell;
Proc sql;
connect to odbc (dsn='****' uid='****' pwd='***');
create table work.Forecast1 as select distinct * from connection to odbc
(select MON_DT as DATE, Store_Number as Store, PROD_PKG_ID as SKU, FCST AS WK_FCST, SALES AS WK_sales, DIFF_QTY
From FCST
where Mon_DT >= &startOverSell and Mon_DT <= &endOverSell );
disconnect from odbc;
quit;
Edit: you may want to consider what happens if you run it on a Monday and check if your dates align as expected.
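To check the date arithmetic independently of SAS, here is a short Python sketch. It mirrors what I understand intnx('week.2', today(), -4) and intnx('week.2', today(), -1) to return, namely the Monday four weeks before the current week's Monday and last week's Monday. The helper names `monday_window` and `sql_literal` are my own, and the MM/DD/YYYY literal format is the same assumption made above.

```python
from datetime import date, timedelta


def monday_window(today, weeks_back=4):
    """Return (Monday `weeks_back` weeks before this week's Monday, last week's Monday)."""
    this_monday = today - timedelta(days=today.weekday())  # weekday() is 0 on Mondays
    start = this_monday - timedelta(weeks=weeks_back)
    end = this_monday - timedelta(weeks=1)
    return start, end


def sql_literal(d):
    """Format a date as a single-quoted MM/DD/YYYY literal (assumed database format)."""
    return "'" + d.strftime("%m/%d/%Y") + "'"


start, end = monday_window(date(2021, 8, 6))           # run on Friday 8/6, as in the question
monday_start, monday_end = monday_window(date(2021, 8, 2))  # run on a Monday instead
```

Run on Friday 8/6 this gives 7/5 to 7/26, and running it on Monday 8/2 gives the same pair, which is one way to check the Monday case mentioned in the edit.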

How to compare Time value in table with Current Time in SQL?

I have a table named TimeList:
| Slot |
==============
| 10:00 |
| 11:00 |
| 12:00 |
| 13:00 | and so on
The times are stored as Varchar(5).
The desired result is the rows whose time is later than the current time. For example, if the current time is 11:12 A.M., the result should be:
| Slot |
==============
| 12:00 |
| 13:00 |
I tried converting the two values to time and comparing them with:
SELECT *
FROM TimeList
WHERE Convert(Time, Slot) > Convert(Time, GETDATE())
But it didn't work, saying that Time is not a recognizable format in SQL.
Is there any way I could compare the two time slots?
That depends on the version of SQL Server you're running. The time data type, and therefore CAST(... AS time), is available from SQL Server 2008 onward; earlier versions don't have it. To compare the current date/time with the TimeList values converted to "that time, if it were today", something like this should work (note that FORMAT itself requires SQL Server 2012 or later):
SELECT *
FROM TimeList
WHERE Convert(Datetime, FORMAT (GETDATE(), 'd') + ' ' + Slot) > GETDATE()
Conversely, if you want to compare the times to the current time, as text:
SELECT *
FROM TimeList
WHERE Slot > FORMAT(GETDATE(), N'HH\:mm')
Try this:
SELECT *
FROM TimeList
WHERE Slot > CONVERT(time,GETDATE())
Thank you very much for all the answers, fortunately I found the answer to my question inspired by your answers.
The solution is:
SELECT *
FROM TimeList
WHERE Slot > CONVERT(varchar(5),GETDATE(), 108)
Style 108 formats the time as hh:mi:ss on a 24-hour clock, and converting to varchar(5) keeps only the hh:mi part, which is the same format Slot is stored in.
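The text comparison works because zero-padded 24-hour HH:MM strings sort in the same order as the times they represent. A minimal Python sketch of the same idea follows; `later_slots` is a hypothetical helper, and the dates are arbitrary.

```python
from datetime import datetime


def later_slots(slots, now):
    """Return the slots (zero-padded 24-hour 'HH:MM' strings) later than `now`."""
    current = now.strftime("%H:%M")  # 24-hour clock, same shape as the Slot column
    return [slot for slot in slots if slot > current]


slots = ["10:00", "11:00", "12:00", "13:00"]
morning = later_slots(slots, datetime(2024, 1, 1, 11, 12))
afternoon = later_slots(slots, datetime(2024, 1, 1, 12, 30))
```

A 12-hour clock would break this: 12:30 would still be "12:30", but 13:30 would become "01:30" and every slot would compare as later.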

SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

In our company, our clients perform various activities that we log in different tables - Interview attendance, Course Attendance, and other general activities.
I have a database view that unions data from all of these tables giving us the ActivityView that looks like this.
As you can see some activities overlap - for example while attending an interview, a client may have been performing a CV update activity.
+----------------------+---------------+---------------------+-------------------+
| activity_client_id | activity_type | activity_start_date | activity_end_date |
+----------------------+---------------+---------------------+-------------------+
| 112 | Interview | 2015-06-01 09:00 | 2015-06-01 11:00 |
| 112 | CV updating | 2015-06-01 09:30 | 2015-06-01 11:30 |
| 112 | Course | 2015-06-02 09:00 | 2015-06-02 16:00 |
| 112 | Interview | 2015-06-03 09:00 | 2015-06-03 10:00 |
+----------------------+---------------+---------------------+-------------------+
Each client has a "Sign Up Date", recorded on the client table, which is when they joined our programme. Here it is for our sample client:
+-----------+---------------------+
| client_id | client_sign_up_date |
+-----------+---------------------+
| 112 | 2015-05-20 |
+-----------+---------------------+
I need to create a report that will show the following columns:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
We need this report in order to see how effective our programme is. An important aim of the programme is that we get every client to complete at least 5 hours of activity as quickly as possible.
So this report will tell us how long from sign up does it take each client to achieve this figure.
What makes this even trickier is that when we calculate 5 hours of total activity, we must discount overlapping activities:
In the sample data above the client attended an interview between 09:00 and 11:00.
On the same day they also performed CV updating activity from 09:30 to 11:30.
For our calculation, this would give them total activity for the day of 2.5 hours (150 minutes) - we would only count 30 minutes of the CV updating as the Interview overlaps it up to 11:00.
So the report for our sample client would give the following result:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
| 112 | 2015-05-20 | 2015-06-02 |
+-----------+---------------------+--------------------------------------------+
So my question is how can I create the report using a select statement ?
I can work out how to do this by writing a stored procedure that will loop through the view and write the result to a report table.
But I would much prefer to avoid a stored procedure and have a select statement that will give me the report on the fly.
I am using SQL Server 2005.
See SQL Fiddle here.
with tbl as (
-- this will generate daily merged overlapping time
select distinct
a.id
,(
select min(x.starttime)
from act x
where x.id=a.id and ( x.starttime between a.starttime and a.endtime
or a.starttime between x.starttime and x.endtime )
) start1
,(
select max(x.endtime)
from act x
where x.id=a.id and ( x.endtime between a.starttime and a.endtime
or a.endtime between x.starttime and x.endtime )
) end1
from act a
), tbl2 as
(
-- this will add minute and total minute column
select
*
,datediff(mi,t.start1,t.end1) mi
,(select sum(datediff(mi,x.start1,x.end1)) from tbl x where x.id=t.id and x.end1<=t.end1) totalmi
from tbl t
), tbl3 as
(
-- final query: start time, plus the end time at which 5 hours (300 minutes) is reached; NULL if never reached
select
t.id
,min(t.start1) starttime
,min(case when t.totalmi>=300 then t.end1 else null end) endtime
from tbl2 t
group by t.id
)
-- final result
select *
from tbl3
where endtime is not null
This is one way to do it:
;WITH CTErn AS (
SELECT activity_client_id, activity_type,
activity_start_date, activity_end_date,
ROW_NUMBER() OVER (PARTITION BY activity_client_id
ORDER BY activity_start_date) AS rn
FROM activities
),
CTEdiff AS (
SELECT c1.activity_client_id, c1.activity_type,
x.activity_start_date, c1.activity_end_date,
DATEDIFF(mi, x.activity_start_date, c1.activity_end_date) AS diff,
ROW_NUMBER() OVER (PARTITION BY c1.activity_client_id
ORDER BY x.activity_start_date) AS seq
FROM CTErn AS c1
LEFT JOIN CTErn AS c2 ON c1.rn = c2.rn + 1
CROSS APPLY (SELECT CASE
WHEN c1.activity_start_date < c2.activity_end_date
THEN c2.activity_end_date
ELSE c1.activity_start_date
END) x(activity_start_date)
)
SELECT TOP 1 client_id, client_sign_up_date, activity_start_date,
hoursOfActivicty
FROM CTEdiff AS c1
INNER JOIN clients AS c2 ON c1.activity_client_id = c2.client_id
CROSS APPLY (SELECT SUM(diff) / 60.0
FROM CTEdiff AS c3
WHERE c3.seq <= c1.seq) x(hoursOfActivicty)
WHERE hoursOfActivicty >= 5
ORDER BY seq
Common Table Expressions and ROW_NUMBER() were introduced with SQL Server 2005, so the above query should work for that version.
Demo here
The first CTE, i.e. CTErn, produces the following output:
client_id activity_type start_date end_date rn
112 Interview 2015-06-01 09:00 2015-06-01 11:00 1
112 CV updating 2015-06-01 09:30 2015-06-01 11:30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 4
The second CTE, i.e. CTEdiff, uses the above table expression in order to calculate the time difference for each record, taking into consideration any overlaps with the previous record:
client_id activity_type start_date end_date diff seq
112 Interview 2015-06-01 09:00 2015-06-01 11:00 120 1
112 CV updating 2015-06-01 11:00 2015-06-01 11:30 30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 420 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 60 4
The final query calculates the cumulative sum of time difference and selects the first record that exceeds 5 hours of activity.
The above query will work for simple interval overlaps, i.e. when just the end date of an activity overlaps the start date of the next activity.
A Geometric Approach
For another issue, I've taken a geometric approach to date
packing. Namely, I convert dates and times to a sql geometry
type and utilize geometry::UnionAggregate to merge the ranges.
I don't believe this will work in sql-server 2005. But your
problem was such an interesting puzzle that I wanted to see
whether the geometrical approach would work. So any future
users running into this problem that have access to a later
version can consider it.
Code Description
In 'numbers':
I build a table representing a sequence
Swap it out with your favorite way to make a numbers table.
For a union operation, you won't ever need more rows than in
your original table, so I just use it as the base to build it.
In 'mergeLines':
I convert the dates to floats and use those floats
to create geometrical points.
I then connect these points via STUnion and STEnvelope.
Finally, I merge all these lines via UnionAggregate. The resulting
'lines' geometry object might contain multiple lines, but if they
overlap, they turn into one line.
In 'redate':
I use the numbers CTE to extract the individual lines inside 'lines'.
I envelope the lines which here ensures that the lines are stored
only as its two endpoints.
I read the endpoint x values and convert them back to their time
representations (This is usually the end goal, but you need more).
I calculate the difference in minutes between activity start and
end dates (I do this first in seconds then divide by 60 for the
sake of a precision issue).
I calculate the cumulative sum of these minutes for each row.
In the outer query:
I align the previous cumulative minutes sum with each current row
I filter for the row where the 5hr goal was met but where the
previous minutes shows that the 5hr goal for the previous row
was not met.
I then calculate where in the current row's range the user has
met the 5 hours, to not only arrive at the date the five hour
goal was met, but the exact time.
The Code
with
numbers as (
select row_number() over (order by (select null)) i
from #activities -- where I put your data
),
mergeLines as (
select activity_client_id,
lines = geometry::UnionAggregate(line)
from #activities
cross apply (select
startP = geometry::Point(convert(float,activity_start_date), 0, 0),
stopP = geometry::Point(convert(float,activity_end_date), 0, 0)
) pointify
cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
group by activity_client_id
),
redate as (
select client_id = activity_client_id,
activities_start_date,
activities_end_date,
minutes,
rollingMinutes = sum(minutes) over(
partition by activity_client_id
order by activities_start_date
rows between unbounded preceding and current row
)
from mergeLines ml
join numbers n on n.i between 1 and ml.lines.STNumGeometries()
cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
cross apply (select
activities_start_date = convert(datetime, l.line.STPointN(1).STX),
activities_end_date = convert(datetime, l.line.STPointN(3).STX)
) unprepare
cross apply (select minutes =
round(datediff(s, activities_start_date, activities_end_date) / 60.0,0)
) duration
)
select client_id,
activities_start_date,
activities_end_date,
met_5hr_goal = dateadd(minute, (60 * 5) - prevRoll, activities_start_date)
from (
select *,
prevRoll = lag(rollingMinutes) over (
partition by client_id
order by rollingMinutes
)
from redate
) ranker
where rollingMinutes >= 60 * 5
and prevRoll < 60 * 5;
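Whichever SQL approach is used, the underlying steps are the same: merge overlapping ranges, accumulate their lengths, and find the point where the running total reaches 5 hours. Here is a minimal Python sketch of those steps using the sample rows for client 112. It is an illustration only, not a translation of either query, and `merge_intervals` and `time_goal_reached` are hypothetical names.

```python
from datetime import datetime, timedelta


def merge_intervals(intervals):
    """Merge overlapping (start, end) ranges into non-overlapping ones, sorted by start."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current range
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def time_goal_reached(intervals, goal=timedelta(hours=5)):
    """Return the moment the cumulative non-overlapping activity reaches `goal`, or None."""
    done = timedelta(0)
    for start, end in merge_intervals(intervals):
        length = end - start
        if done + length >= goal:
            return start + (goal - done)
        done += length
    return None


# Sample activities for client 112 from the question.
activities = [
    (datetime(2015, 6, 1, 9, 0), datetime(2015, 6, 1, 11, 0)),   # Interview
    (datetime(2015, 6, 1, 9, 30), datetime(2015, 6, 1, 11, 30)), # CV updating
    (datetime(2015, 6, 2, 9, 0), datetime(2015, 6, 2, 16, 0)),   # Course
    (datetime(2015, 6, 3, 9, 0), datetime(2015, 6, 3, 10, 0)),   # Interview
]

reached = time_goal_reached(activities)
```

For the sample data the first day contributes 150 minutes, so the goal falls 150 minutes into the course on 2015-06-02, which agrees with the expected report date.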

Need help with a SQL query selecting date ranges from a table of period quarters

I have a table called Periods that looks like this
PeriodID | PeriodYear | PeriodQuarter
7 | 2009 | 1
8 | 2009 | 2
9 | 2009 | 3
10 | 2009 | 4
11 | 2010 | 1
12 | 2010 | 2
Each row in the table represents 1 of the 4 quarters of the year (like 3-monthly school terms). E.g. the first row represents Period 1 of 2009 (i.e. the date range 1 Jan 2009 - 31 March 2009).
Now I need to write a query that selects rows/periods from the above table, where the period occurs between two dates, as per the following pseudocode.
select *
from Periods
where Period is between @startDate and @endDate
The query will be used inside a table-valued function called dbo.GetPeriodsFromDateRange, and @startDate and @endDate are parameters to the function.
I'm stuck and can't figure out how to do it. Please help. This applies to T-SQL (MS SQL Server 2000/2005)
Try
select *
from Periods
where dateadd(qq,PeriodQuarter-1,dateadd(yy,PeriodYear -1900,0))
between @startDate and @endDate
A seek instead of a scan is possible:
SELECT *
FROM Periods
WHERE
PeriodYear BETWEEN Year(@startdate) AND Year(@enddate)
AND PeriodYear * 4 + PeriodQuarter
BETWEEN Year(@startdate) * 4 + DATEPART(Quarter, @startdate)
AND Year(@enddate) * 4 + DATEPART(Quarter, @enddate)
Explanation:
I'm composing a new, scaled integer from two component pieces, the year and the quarter, treating each combination of year and quarter as a single number.
Imagine instead that I had done it this way:
AND PeriodYear + (PeriodQuarter - 1) / 4.0
BETWEEN Year(@startdate) + (DATEPART(Quarter, @startdate) - 1) / 4.0
AND Year(@enddate) + (DATEPART(Quarter, @enddate) - 1) / 4.0
Calling my original expression "Mult" and this new one "Div", here are some years and quarters and what those expressions will evaluate to:
Year Qtr Div Mult
2009 1 2009.00 8037
2009 2 2009.25 8038
2009 3 2009.50 8039
2009 4 2009.75 8040
2010 1 2010.00 8041
2010 2 2010.25 8042
2010 3 2010.50 8043
So now if we run a WHERE clause against these rows:
WHERE Div BETWEEN 2009.25 AND 2010.00
You can see how it will return the correct rows. The Mult version really does exactly the same, just scaling the year up instead of the quarter down. The reason I used it is because integer math and multiplication are faster than fractional math and division.
The reason that I use two conditions starting with just the year is to make the query sargable. We want to do the seek based on just year, which isn't possible if we're multiplying it by 4 or doing other math on it. So we get the scan into only the right years first, then fine tune it to eliminate any quarters that shouldn't be in the result.
Another option is to add a calculated column and put an index on it. This wouldn't require any changes to code inserting or updating (as long as they properly use column lists), but would let you do regular range math as you desire.
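The scaled-integer comparison is easy to check by hand. Here is a minimal Python sketch of the "Mult" key applied to the sample Periods rows; the helper names and the two example dates are my own assumptions.

```python
from datetime import date


def quarter_key(year, quarter):
    """Combine year and quarter into one sortable integer (the "Mult" expression)."""
    return year * 4 + quarter


def date_key(d):
    """Quarter key of the quarter a date falls in."""
    return quarter_key(d.year, (d.month - 1) // 3 + 1)


def periods_in_range(periods, start, end):
    """Return the (PeriodID, PeriodYear, PeriodQuarter) rows whose quarter lies in [start, end]."""
    low, high = date_key(start), date_key(end)
    return [p for p in periods if low <= quarter_key(p[1], p[2]) <= high]


periods = [
    (7, 2009, 1), (8, 2009, 2), (9, 2009, 3),
    (10, 2009, 4), (11, 2010, 1), (12, 2010, 2),
]

# Hypothetical range: 1 May 2009 (Q2 2009) to 15 Jan 2010 (Q1 2010).
selected = periods_in_range(periods, date(2009, 5, 1), date(2010, 1, 15))
```

Note that the upper bound has to be built from the end date's own year; mixing the start year with the end quarter would drop Q1 2010 here.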
I would be tempted to add 2 further columns to the table...
StartDate and EndDate - these will store the date that each period starts and ends (i.e. in your example StartDate=1st Jan 2009 and EndDate=31st March 2009)
This will give you more flexibility if the quarters are defined differently than you have suggested.
If you do this, then the query become fairly simple...
select *
from Periods
where @startDate<Periods.StartDate and @endDate>Periods.EndDate
This is assuming you only want to include Periods which are completely encapsulated between @startDate and @endDate. If you want Periods that overlap then try something like...
select *
from Periods
where @endDate>Periods.StartDate and @startDate<Periods.EndDate
