Get most recent date IF that date is before given date - sql-server

I have a table called 'Days' set up like this (the Day column is of type DateTime):
User_ID | Day
----------------------------------
39 | 2010-05-16 00:00:00.000
39 | 2009-05-16 00:00:00.000
40 | 2008-06-12 00:00:00.000
40 | 2008-10-07 00:00:00.000
41 | 2010-04-10 00:00:00.000
41 | 2010-03-02 00:00:00.000
42 | 2010-08-12 00:00:00.000
42 | 2011-09-15 00:00:00.000
What I'm trying to do is, when given a User_ID and a certain date, get the most recent "Day" IF and only IF that user does not have any "Day" equal to or after the given date.
For example, given User_ID = 39 and date = 2010-05-15, nothing should be returned since one of the "Day" dates listed for User_ID = 39 is after the given date 2010-05-15. However, if User_ID = 40, it would return 2008-10-07, since that is the most recent date that is NOT equal to OR after the given date 2010-05-15.
I have this so far, but I have no idea how to add the "if" condition. The query below just returns the most recent date for the given user:
SELECT MAX(Day) FROM Days WHERE User_ID = 39
After that it would be something like: if MAX(Day) < 2010-05-15
What's the correct way to do this?

Try this:
SELECT MAX([Day]) as TheDay FROM Days
WHERE User_ID = 39
AND [Day] < '20100515'
AND NOT EXISTS (SELECT 1 FROM Days WHERE User_ID = 39 AND [Day] >= '20100515')
With a CTE:
;WITH CTE(User_ID, [Day]) AS
(
SELECT User_ID, [Day] FROM Days WHERE User_ID = 39
)
SELECT MAX([Day]) AS TheDay FROM CTE
WHERE [Day] < '20100515'
AND NOT EXISTS (SELECT 1 FROM CTE WHERE [Day] >= '20100515')
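To try the NOT EXISTS logic without a SQL Server instance, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module against an in-memory SQLite database. SQLite is only a stand-in: dates are stored as ISO-8601 text (so string comparison orders them correctly), and most_recent_before is a hypothetical helper. It also shows that MAX() over zero rows returns one row containing NULL rather than no row at all.

```python
import sqlite3

# In-memory SQLite stand-in for the 'Days' table; dates as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Days (User_ID INTEGER, Day TEXT)")
conn.executemany(
    "INSERT INTO Days VALUES (?, ?)",
    [(39, "2010-05-16"), (39, "2009-05-16"),
     (40, "2008-06-12"), (40, "2008-10-07"),
     (41, "2010-04-10"), (41, "2010-03-02"),
     (42, "2010-08-12"), (42, "2011-09-15")],
)

# Same shape as the answer: latest earlier Day, but only when the user has
# no Day on or after the cutoff.
QUERY = """
SELECT MAX(Day) AS TheDay
FROM Days
WHERE User_ID = :uid
  AND Day < :cutoff
  AND NOT EXISTS (SELECT 1 FROM Days WHERE User_ID = :uid AND Day >= :cutoff)
"""

def most_recent_before(user_id, cutoff):
    """Hypothetical helper: the qualifying date, or None."""
    # MAX() over zero rows still returns one row, holding NULL (None in Python).
    return conn.execute(QUERY, {"uid": user_id, "cutoff": cutoff}).fetchone()[0]

print(most_recent_before(39, "2010-05-15"))  # None
print(most_recent_before(40, "2010-05-15"))  # 2008-10-07
```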

Untested:
General logic: generate a set containing the max date for each user, then limit that set to the desired user and to a max date earlier than the desired date.
One approach is to use an inline view and a self join. In my example I called the inline view B; in it I get the max date and user from days.
Then I join back to days on user and day so that only each user's max day is returned, and limit to the desired user and date. If the max date is earlier than the desired date, a record is returned; if the max date is >= the date provided, no record is returned for the user.
SELECT d.day
FROM days d
INNER JOIN (SELECT user_ID, max(day) mdate
FROM days
GROUP BY user_ID) B
ON d.user_ID = B.user_ID
AND B.mdate = d.day
WHERE d.day < @InputDate -- hypothetical parameter, e.g. '20100515'
AND d.user_ID = @UserID -- hypothetical parameter, e.g. 39
or perhaps more simply...
Since you want each user's max date, but only if it is earlier than the desired date, simply return the max date for every user, then keep the desired user and eliminate a max date on or after the desired date.
SELECT mdate
FROM (SELECT user_ID, max(day) mdate
FROM days
GROUP BY user_ID) B
WHERE mdate < @InputDate -- strictly earlier; hypothetical parameter
AND user_ID = @UserID -- hypothetical parameter
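The derived-table filter on mdate does the same job as a HAVING clause on the aggregate, which is also the most direct SQL form of the question's "if MAX(Day) < 2010-05-15" idea. A minimal sketch with SQLite (via Python's sqlite3) standing in for SQL Server and a hypothetical helper named latest_if_all_before: a user whose latest Day is on or after the cutoff produces an empty result instead of a NULL row.

```python
import sqlite3

# In-memory SQLite stand-in with part of the sample 'Days' data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Days (User_ID INTEGER, Day TEXT)")
conn.executemany(
    "INSERT INTO Days VALUES (?, ?)",
    [(39, "2010-05-16"), (39, "2009-05-16"),
     (40, "2008-06-12"), (40, "2008-10-07")],
)

# Aggregate first, then keep the group only if its latest Day is strictly
# before the cutoff; a failing group yields no row at all.
QUERY = """
SELECT MAX(Day) AS mdate
FROM Days
WHERE User_ID = ?
GROUP BY User_ID
HAVING MAX(Day) < ?
"""

def latest_if_all_before(user_id, cutoff):
    """Hypothetical helper: [] or a single-row list holding the date."""
    return conn.execute(QUERY, (user_id, cutoff)).fetchall()

print(latest_if_all_before(39, "2010-05-15"))  # []
print(latest_if_all_before(40, "2010-05-15"))  # [('2008-10-07',)]
```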

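You need to add the day condition to the WHERE clause:
SELECT MAX(Day) FROM Days WHERE User_ID = 39 AND Day < yourInputDate
On its own this still returns the latest earlier date when the user also has later dates (2009-05-16 for User_ID = 39 and 2010-05-15), so it needs an extra check such as NOT EXISTS to meet the "if and only if" requirement.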

Related

get unique records existing only once per day in a date range

I have a case where I want to extract the device IDs (DIDs) that exist once and only once on each day of a certain period. I have tried different methods and partitions, but I only seem to be able to get that data for one day at a time (where date = X), whereas I need a query where I can use where date between X and Y.
Example, this is the data:
DID date
A 2019-01-01
A 2019-01-01
A 2019-01-02
A 2019-01-03
B 2019-01-01
B 2019-01-02
B 2019-01-03
C 2019-01-01
C 2019-01-02
C 2019-01-02
C 2019-01-03
D 2019-01-01
D 2019-01-02
D 2019-01-03
The query should return only B and D (because B and D exist exactly once on each day from 01 to 03).
I also wish to get the count, which would be 2 in this case
thanks!
You want the devices to exist only once on each day of the period, so if you group by did, you need to return the dids whose count(date) and count(distinct date) are both equal to the number of days in that period:
select did
from tablename
where date between cast('2019-01-01' as date) and cast('2019-01-03' as date)
group by did
having
count(distinct date) = cast('2019-01-03' as date) - cast('2019-01-01' as date) + 1
and
count(date) = cast('2019-01-03' as date) - cast('2019-01-01' as date) + 1
See the demo.
Or:
select t.did
from (
select did, date
from tablename
where date between cast('2019-01-01' as date) and cast('2019-01-03' as date)
group by did, date
having count(*) = 1
)t
group by t.did
having count(*) = cast('2019-01-03' as date) - cast('2019-01-01' as date) + 1
See the demo.
Result:
| did |
| --- |
| B |
| D |
One option would be to aggregate by DID and assert that the total count is equal to the count of distinct dates. If this assertion passes, it means that a given DID has only distinct dates present.
SELECT DID
FROM yourTable
GROUP BY DID
HAVING COUNT(date) = COUNT(DISTINCT date);
Demo
If you want the total count of matching DIDs, you could wrap the above in a subquery and take COUNT(*). Or, to get it from the same query, you might try:
SELECT DID, COUNT(*) OVER () AS total_cnt
FROM yourTable
GROUP BY DID
HAVING COUNT(date) = COUNT(DISTINCT date);
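Both answers can be checked against the sample data with a short sketch that uses Python's sqlite3 module, with SQLite standing in for the original database (window functions need SQLite 3.25 or later). The date subtraction in the first answer is replaced by SQLite's julianday() arithmetic; table and column names follow the answers. The second query only checks for duplicate dates and does not check that every day of the period is present, which is what the range-based query adds.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (did TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [("A", "2019-01-01"), ("A", "2019-01-01"), ("A", "2019-01-02"), ("A", "2019-01-03"),
     ("B", "2019-01-01"), ("B", "2019-01-02"), ("B", "2019-01-03"),
     ("C", "2019-01-01"), ("C", "2019-01-02"), ("C", "2019-01-02"), ("C", "2019-01-03"),
     ("D", "2019-01-01"), ("D", "2019-01-02"), ("D", "2019-01-03")],
)

# First answer: exactly one row per day across the whole [start, end] range.
# julianday() stands in for the date subtraction used in the answer.
RANGE_QUERY = """
SELECT did
FROM tablename
WHERE date BETWEEN :start AND :end
GROUP BY did
HAVING COUNT(DISTINCT date) = CAST(julianday(:end) - julianday(:start) AS INTEGER) + 1
   AND COUNT(date)          = CAST(julianday(:end) - julianday(:start) AS INTEGER) + 1
ORDER BY did
"""

# Second answer: no duplicate dates, plus the number of matching DIDs on each row.
# Window functions run after GROUP BY/HAVING, so COUNT(*) OVER () counts the groups.
DUP_QUERY = """
SELECT did, COUNT(*) OVER () AS total_cnt
FROM tablename
GROUP BY did
HAVING COUNT(date) = COUNT(DISTINCT date)
ORDER BY did
"""

params = {"start": "2019-01-01", "end": "2019-01-03"}
print(conn.execute(RANGE_QUERY, params).fetchall())  # [('B',), ('D',)]
print(conn.execute(DUP_QUERY).fetchall())            # [('B', 2), ('D', 2)]
```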

how to get record for which given date falls between two dates of same column in PostgreSql

My table has data such as EmpCode, DesignationCode and PromotionDate. I want to get an employee's designation as of a given date. For example:
EmpCode DesignationCode PromotionDate
101 50 2010-01-25
101 10 2014-01-01
101 11 2015-01-01
102 10 2009-10-01
103 15 2015-01-01
Now, if I check the designation as of 2014-02-01, it should give the following result:
EmpCode DesignationCode PromotionDate
101 10 2014-01-01
102 10 2009-10-01
Can anyone please tell me what query I should write?
Thanks in advance.
You can try:
SELECT DISTINCT ON (EmpCode) EmpCode, DesignationCode, PromotionDate
FROM mytable
WHERE PromotionDate <= '2014-02-01'
ORDER BY EmpCode, PromotionDate DESC
The query first filters out any records whose PromotionDate is after the given date, i.e. '2014-02-01'.
Using DISTINCT ON (EmpCode), we get one row per EmpCode: the first row of each EmpCode in ORDER BY order, which is the one with the most recent PromotionDate because PromotionDate DESC is placed in the ORDER BY clause.
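SQLite and SQL Server have no DISTINCT ON, so here is a sketch of the usual portable substitute, ROW_NUMBER() partitioned by EmpCode, run through Python's sqlite3 against the sample data (SQLite 3.25+ is assumed for window functions; the mytable layout is taken from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (EmpCode INTEGER, DesignationCode INTEGER, PromotionDate TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(101, 50, "2010-01-25"), (101, 10, "2014-01-01"), (101, 11, "2015-01-01"),
     (102, 10, "2009-10-01"), (103, 15, "2015-01-01")],
)

# Same idea as DISTINCT ON (EmpCode) ... ORDER BY EmpCode, PromotionDate DESC:
# drop promotions after the given date, number each employee's remaining rows
# from newest to oldest, and keep row 1.
QUERY = """
SELECT EmpCode, DesignationCode, PromotionDate
FROM (
    SELECT EmpCode, DesignationCode, PromotionDate,
           ROW_NUMBER() OVER (PARTITION BY EmpCode ORDER BY PromotionDate DESC) AS rn
    FROM mytable
    WHERE PromotionDate <= ?
) AS ranked
WHERE rn = 1
ORDER BY EmpCode
"""

print(conn.execute(QUERY, ("2014-02-01",)).fetchall())
# [(101, 10, '2014-01-01'), (102, 10, '2009-10-01')]
```

Employee 103 drops out because its only promotion is after the given date.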
Demo here

SQL Server: How to get a rolling sum over 3 days for different customers within same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
The table holds data for multiple customers, and I want a rolling sum over a 3-day window for each customer.
The expected result is:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, which is why the ROW_NUMBER()-based partition doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions every day, which suited the ROW_NUMBER()-based solutions.
The ROW_NUMBER() query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" by "BETWEEN [DATE - 2] and [DATE]"
One option would be to use a calendar table (or something similar) to get the complete range of dates, left join your table to it, and then use the ROW_NUMBER()-based solution.
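A minimal sketch of the calendar-table idea, assuming SQLite 3.25+ via Python's sqlite3 as a stand-in for SQL Server: a recursive CTE generates every day between each customer's first and last transaction, missing days contribute 0, and the ROWS BETWEEN 2 PRECEDING AND CURRENT ROW frame then really covers three calendar days. The table name tmp replaces #tmp_taml9 and the dates are rewritten as ISO-8601 text; both are assumptions for the example.

```python
import sqlite3

# In-memory SQLite stand-in for #tmp_taml9 (hypothetical name 'tmp', ISO dates).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (Customer_ID INTEGER, Date TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO tmp VALUES (?, ?, ?)",
    [(1, "2014-04-11", 20), (1, "2014-04-13", 10), (1, "2014-04-14", 30), (1, "2014-04-18", 25),
     (2, "2014-05-15", 15), (2, "2014-06-21", 25), (2, "2014-06-22", 35), (2, "2014-06-23", 10)],
)

QUERY = """
WITH RECURSIVE calendar(Customer_ID, Date, LastDate) AS (
    -- every calendar day from each customer's first to last transaction
    SELECT Customer_ID, MIN(Date), MAX(Date) FROM tmp GROUP BY Customer_ID
    UNION ALL
    SELECT Customer_ID, date(Date, '+1 day'), LastDate
    FROM calendar
    WHERE Date < LastDate
),
filled AS (
    -- days without a transaction count as 0, so 2 PRECEDING rows = 2 earlier days
    SELECT c.Customer_ID, c.Date, t.Amount,
           SUM(COALESCE(t.Amount, 0)) OVER (
               PARTITION BY c.Customer_ID ORDER BY c.Date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS Rolling_3_Day_Sum
    FROM calendar AS c
    LEFT JOIN tmp AS t ON t.Customer_ID = c.Customer_ID AND t.Date = c.Date
)
SELECT Customer_ID, Date, Amount, Rolling_3_Day_Sum
FROM filled
WHERE Amount IS NOT NULL  -- keep only days that had a real transaction
ORDER BY Customer_ID, Date
"""

for row in conn.execute(QUERY):
    print(row)
```

The filter on Amount is applied after the window sum is computed, so the zero-filled days still count towards the window but do not appear in the output.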
Another option that might work (I'm not sure about performance) would be to use an APPLY query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
and datediff(day, date, t1.date) <= 2 -- the current day plus the 2 days before it
and t1.Date >= date
) o
order by customer_id, Date;
I suspect performance might not be great though.
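The APPLY query is a correlated subquery in disguise, so its window logic can be checked in SQLite through Python's sqlite3 (a stand-in: julianday() replaces DATEDIFF, and the table name tmp and helper rolling_sums are hypothetical). The day-difference bound is a parameter here to show that a bound of 2 covers the current day plus the two days before it and matches the expected output, while a bound of 3 widens the window to four days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (Customer_ID INTEGER, Date TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO tmp VALUES (?, ?, ?)",
    [(1, "2014-04-11", 20), (1, "2014-04-13", 10), (1, "2014-04-14", 30), (1, "2014-04-18", 25),
     (2, "2014-05-15", 15), (2, "2014-06-21", 25), (2, "2014-06-22", 35), (2, "2014-06-23", 10)],
)

# Correlated-subquery equivalent of the CROSS APPLY: for each row t1, add up the
# same customer's amounts dated at most :days days before t1.Date (inclusive).
QUERY = """
SELECT t1.Customer_ID, t1.Date, t1.Amount,
       (SELECT SUM(t2.Amount)
        FROM tmp AS t2
        WHERE t2.Customer_ID = t1.Customer_ID
          AND t2.Date <= t1.Date
          AND julianday(t1.Date) - julianday(t2.Date) <= :days) AS Rolling_Sum
FROM tmp AS t1
ORDER BY t1.Customer_ID, t1.Date
"""

def rolling_sums(days):
    """Hypothetical helper: the rolling sums in output order."""
    return [row[3] for row in conn.execute(QUERY, {"days": days})]

print(rolling_sums(2))  # [20, 30, 40, 25, 15, 25, 60, 70]  (3-day window)
print(rolling_sums(3))  # [20, 30, 60, 25, 15, 25, 60, 70]  (4-day window)
```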

SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

In our company, our clients perform various activities that we log in different tables - Interview attendance, Course Attendance, and other general activities.
I have a database view that unions data from all of these tables giving us the ActivityView that looks like this.
As you can see some activities overlap - for example while attending an interview, a client may have been performing a CV update activity.
+----------------------+---------------+---------------------+-------------------+
| activity_client_id | activity_type | activity_start_date | activity_end_date |
+----------------------+---------------+---------------------+-------------------+
| 112 | Interview | 2015-06-01 09:00 | 2015-06-01 11:00 |
| 112 | CV updating | 2015-06-01 09:30 | 2015-06-01 11:30 |
| 112 | Course | 2015-06-02 09:00 | 2015-06-02 16:00 |
| 112 | Interview | 2015-06-03 09:00 | 2015-06-03 10:00 |
+----------------------+---------------+---------------------+-------------------+
Each client has a "Sign Up Date", recorded on the client table, which is when they joined our programme. Here it is for our sample client:
+-----------+---------------------+
| client_id | client_sign_up_date |
+-----------+---------------------+
| 112 | 2015-05-20 |
+-----------+---------------------+
I need to create a report that will show the following columns:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
We need this report in order to see how effective our programme is. An important aim of the programme is that we get every client to complete at least 5 hours of activity as quickly as possible.
So this report will tell us how long from sign up does it take each client to achieve this figure.
What makes this even trickier is that when we calculate 5 hours of total activity, we must discount overlapping activities:
In the sample data above the client attended an interview between 09:00 and 11:00.
On the same day they also performed CV updating activity from 09:30 to 11:30.
For our calculation, this would give them total activity for the day of 2.5 hours (150 minutes) - we would only count 30 minutes of the CV updating as the Interview overlaps it up to 11:00.
So the report for our sample client would give the following result:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
| 112 | 2015-05-20 | 2015-06-02 |
+-----------+---------------------+--------------------------------------------+
So my question is: how can I create the report using a SELECT statement?
I can work out how to do this by writing a stored procedure that will loop through the view and write the result to a report table.
But I would much prefer to avoid a stored procedure and have a select statement that will give me the report on the fly.
I am using SQL Server 2005.
See SQL Fiddle here.
with tbl as (
-- this will generate the merged overlapping time ranges
select distinct
a.id
,(
select min(x.starttime)
from act x
where x.id=a.id and ( x.starttime between a.starttime and a.endtime
or a.starttime between x.starttime and x.endtime )
) start1
,(
select max(x.endtime)
from act x
where x.id=a.id and ( x.endtime between a.starttime and a.endtime
or a.endtime between x.starttime and x.endtime )
) end1
from act a
), tbl2 as
(
-- this will add minute and total minute column
select
*
,datediff(mi,t.start1,t.end1) mi
,(select sum(datediff(mi,x.start1,x.end1)) from tbl x where x.id=t.id and x.end1<=t.end1) totalmi
from tbl t
), tbl3 as
(
-- final step: first start time, and the end of the range in which the running total reached 5 hours (300 minutes); NULL if not yet reached
select
t.id
,min(t.start1) starttime
,min(case when t.totalmi>=300 then t.end1 else null end) endtime
from tbl2 t
group by t.id
)
-- final result
select *
from tbl3
where endtime is not null
This is one way to do it:
;WITH CTErn AS (
SELECT activity_client_id, activity_type,
activity_start_date, activity_end_date,
ROW_NUMBER() OVER (PARTITION BY activity_client_id
ORDER BY activity_start_date) AS rn
FROM activities
),
CTEdiff AS (
SELECT c1.activity_client_id, c1.activity_type,
x.activity_start_date, c1.activity_end_date,
DATEDIFF(mi, x.activity_start_date, c1.activity_end_date) AS diff,
ROW_NUMBER() OVER (PARTITION BY c1.activity_client_id
ORDER BY x.activity_start_date) AS seq
FROM CTErn AS c1
-- pair each activity with the previous activity of the same client
LEFT JOIN CTErn AS c2 ON c1.activity_client_id = c2.activity_client_id
AND c1.rn = c2.rn + 1
-- if it overlaps the previous activity, count it only from the previous end
CROSS APPLY (SELECT CASE
WHEN c1.activity_start_date < c2.activity_end_date
THEN c2.activity_end_date
ELSE c1.activity_start_date
END) x(activity_start_date)
)
-- TOP 1 returns a single row overall; filter on one client (or rank per client) for a multi-client report
SELECT TOP 1 client_id, client_sign_up_date, activity_start_date,
hoursOfActivity
FROM CTEdiff AS c1
INNER JOIN clients AS c2 ON c1.activity_client_id = c2.client_id
CROSS APPLY (SELECT SUM(diff) / 60.0
FROM CTEdiff AS c3
WHERE c3.activity_client_id = c1.activity_client_id
AND c3.seq <= c1.seq) x(hoursOfActivity)
WHERE hoursOfActivity >= 5
ORDER BY seq
Common Table Expressions and ROW_NUMBER() were introduced with SQL Server 2005, so the above query should work for that version.
Demo here
The first CTE, i.e. CTErn, produces the following output:
client_id activity_type start_date end_date rn
112 Interview 2015-06-01 09:00 2015-06-01 11:00 1
112 CV updating 2015-06-01 09:30 2015-06-01 11:30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 4
The second CTE, i.e. CTEdiff, uses the above table expression to calculate the time difference for each record, taking into consideration any overlap with the previous record:
client_id activity_type start_date end_date diff seq
112 Interview 2015-06-01 09:00 2015-06-01 11:00 120 1
112 CV updating 2015-06-01 11:00 2015-06-01 11:30 30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 420 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 60 4
The final query calculates the cumulative sum of the time differences and selects the first record at which it reaches 5 hours of activity.
The above query works only for simple interval overlaps, i.e. when just the end of an activity overlaps the start of the next activity. An activity that lies entirely inside the previous one would get a negative diff.
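For overlaps of any shape, here is a minimal sketch of a different technique (interval packing, a gaps-and-islands pattern, not the query above): an activity starts a new island only if it begins after every earlier activity of the same client has ended, so activities nested inside an earlier one are absorbed too. Islands are merged, their minutes summed cumulatively, and the first island that reaches 300 minutes gives the date and, by adding the remaining minutes to its start, the exact time (the completed_at column is a hypothetical extra). It runs in SQLite 3.25+ through Python's sqlite3 as a stand-in; a direct T-SQL translation would need SQL Server 2012 or later for the framed window aggregates, so it is not an option on SQL Server 2005.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activities (
    activity_client_id INTEGER, activity_type TEXT,
    activity_start_date TEXT, activity_end_date TEXT)""")
conn.execute("CREATE TABLE clients (client_id INTEGER, client_sign_up_date TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?, ?)",
    [(112, "Interview",   "2015-06-01 09:00:00", "2015-06-01 11:00:00"),
     (112, "CV updating", "2015-06-01 09:30:00", "2015-06-01 11:30:00"),
     (112, "Course",      "2015-06-02 09:00:00", "2015-06-02 16:00:00"),
     (112, "Interview",   "2015-06-03 09:00:00", "2015-06-03 10:00:00")],
)
conn.execute("INSERT INTO clients VALUES (112, '2015-05-20')")

QUERY = """
WITH flagged AS (
    -- 1 when an activity starts after every earlier activity of the client has ended
    SELECT activity_client_id AS client_id,
           activity_start_date AS s, activity_end_date AS e,
           CASE WHEN activity_start_date <= MAX(activity_end_date) OVER (
                     PARTITION BY activity_client_id
                     ORDER BY activity_start_date, activity_end_date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS new_island
    FROM activities
),
islands AS (
    -- running count of island starts gives each overlapping group an id
    SELECT client_id, s, e,
           SUM(new_island) OVER (
               PARTITION BY client_id ORDER BY s, e
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS island
    FROM flagged
),
packed AS (
    -- one merged range per island, with its length in whole minutes
    SELECT client_id, MIN(s) AS s, MAX(e) AS e,
           CAST(ROUND((julianday(MAX(e)) - julianday(MIN(s))) * 1440) AS INTEGER) AS minutes
    FROM islands
    GROUP BY client_id, island
),
running AS (
    SELECT client_id, s, minutes,
           SUM(minutes) OVER (
               PARTITION BY client_id ORDER BY s
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM packed
),
reached AS (
    -- merged ranges during or before which the running total hits 300 minutes
    SELECT client_id, s, running_total - minutes AS prev_total,
           ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY s) AS rn
    FROM running
    WHERE running_total >= 300
)
SELECT c.client_id, c.client_sign_up_date,
       date(r.s) AS date_client_completed_5_hours_of_activity,
       datetime(r.s, '+' || (300 - r.prev_total) || ' minutes') AS completed_at
FROM reached AS r
JOIN clients AS c ON c.client_id = r.client_id
WHERE r.rn = 1
"""

print(conn.execute(QUERY).fetchall())
# [(112, '2015-05-20', '2015-06-02', '2015-06-02 11:30:00')]
```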
A Geometric Approach
For another problem, I've taken a geometric approach to date packing: I convert dates and times to the SQL Server geometry type and use geometry::UnionAggregate to merge the ranges. This won't work in SQL Server 2005 (the geometry type was added in SQL Server 2008, and UnionAggregate, LAG and SUM() OVER with ORDER BY in SQL Server 2012). But your problem was such an interesting puzzle that I wanted to see whether the geometric approach would work, so future readers who run into this problem on a later version can consider it.
Code Description
In 'numbers': I build a table representing a sequence. Swap it out with your favorite way to make a numbers table. For a union operation you will never need more rows than are in your original table, so I just use that table as the base.
In 'mergeLines': I convert the dates to floats and use those floats to create geometric points. I then connect these points via STUnion and STEnvelope. Finally, I merge all these lines via UnionAggregate. The resulting 'lines' geometry object might contain multiple lines, but lines that overlap turn into one line.
In 'redate': I use the numbers CTE to extract the individual lines inside 'lines'. I envelope the lines, which here ensures that each line is stored only by its endpoints. I read the endpoint x values and convert them back to their time representations (this is usually the end goal, but you need more). I calculate the difference in minutes between activity start and end dates (first in seconds, then dividing by 60, for the sake of precision). I calculate the cumulative sum of these minutes for each row.
In the outer query: I align the previous cumulative minutes sum with each current row. I filter for the row where the 5-hour goal was met but where the previous cumulative sum shows that it had not yet been met on the previous row. I then calculate where in the current row's range the user met the 5 hours, to arrive not only at the date the goal was met but at the exact time.
The Code
with
numbers as (
select row_number() over (order by (select null)) i
from #activities -- where I put your data
),
mergeLines as (
select activity_client_id,
lines = geometry::UnionAggregate(line)
from #activities
cross apply (select
startP = geometry::Point(convert(float,activity_start_date), 0, 0),
stopP = geometry::Point(convert(float,activity_end_date), 0, 0)
) pointify
cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
group by activity_client_id
),
redate as (
select client_id = activity_client_id,
activities_start_date,
activities_end_date,
minutes,
rollingMinutes = sum(minutes) over(
partition by activity_client_id
order by activities_start_date
rows between unbounded preceding and current row
)
from mergeLines ml
join numbers n on n.i between 1 and ml.lines.STNumGeometries()
cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
cross apply (select
activities_start_date = convert(datetime, l.line.STPointN(1).STX),
activities_end_date = convert(datetime, l.line.STPointN(3).STX)
) unprepare
cross apply (select minutes =
round(datediff(s, activities_start_date, activities_end_date) / 60.0,0)
) duration
)
select client_id,
activities_start_date,
activities_end_date,
met_5hr_goal = dateadd(minute, (60 * 5) - prevRoll, activities_start_date)
from (
select *,
-- default of 0 so that a first range reaching 5 hours on its own is not filtered out
prevRoll = lag(rollingMinutes, 1, 0) over (
partition by client_id
order by rollingMinutes
)
from redate
) ranker
where rollingMinutes >= 60 * 5
and prevRoll < 60 * 5;

Selecting rows with the nearest date using SQL

I have a SQL statement.
SELECT
ID, LOCATION, CODE,MAX(DATE),FLAG
FROM
TABLE1
WHERE
DATE <= CONVERT(DATETIME,'11-11-2012')
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY
ID, LOCATION, CODE
I need the rows with the date nearest to 11-11-2012, but the query returns all the rows. What am I doing wrong? Thanks
ID LOCATION CODE DATE FLAG
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-08 00:00:00.000 0
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000 1
14 CAR STREET,UDUPI 234 2012-08-14 00:00:00.000 0
279 MADHUGIRI 234 2012-08-08 00:00:00.000 1
279 MADHUGIRI 234 2012-08-11 00:00:00.000 0
I want to show, for each ID, only the row with the nearest date on or before the given date. The required result is:
ID LOCATION CODE DATE FLAG
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000 1
279 MADHUGIRI 234 2012-08-11 00:00:00.000 0
;WITH x AS
(
SELECT ID, Location, Code, Date, Flag,
rn = ROW_NUMBER() OVER
(PARTITION BY ID, Location, Code ORDER BY [Date] DESC)
FROM dbo.TABLE1 AS t1
WHERE [Date] <= '20121111'
AND ID IN (14, 279) -- restrict to the requested IDs
AND EXISTS (SELECT 1 FROM #TEMP_CODE WHERE CODE = t1.CODE)
)
SELECT ID, Location, Code, Date, Flag
FROM x WHERE rn = 1;
This yields:
ID LOCATION CODE [Date] FLAG
--- ---------------- ---- ---------- ----
14 CAR STREET,UDUPI 234 2012-08-14 0
279 MADHUGIRI 234 2012-08-11 0
This disagrees with your required results, but I think those are wrong; please check them (for ID 14, 2012-08-14 is later than 2012-08-10 and still on or before 11-11-2012).
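The ROW_NUMBER() query can be checked against the sample rows with a short sketch using Python's sqlite3 (SQLite 3.25+ as a stand-in for SQL Server; TEMP_CODE replaces the #TEMP_CODE temp table and is assumed to hold code 234). It confirms that, for ID 14, the latest row on or before 2012-11-11 is dated 2012-08-14.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (ID INTEGER, LOCATION TEXT, CODE INTEGER, DATE TEXT, FLAG INTEGER)")
conn.execute("CREATE TABLE TEMP_CODE (CODE INTEGER)")  # stand-in for #TEMP_CODE
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?, ?, ?, ?)",
    [(14, "CAR STREET,UDUPI", 234, "2012-08-08", 0),
     (14, "CAR STREET,UDUPI", 234, "2012-08-10", 1),
     (14, "CAR STREET,UDUPI", 234, "2012-08-14", 0),
     (279, "MADHUGIRI", 234, "2012-08-08", 1),
     (279, "MADHUGIRI", 234, "2012-08-11", 0)],
)
conn.execute("INSERT INTO TEMP_CODE VALUES (234)")  # assumed contents

# Number each (ID, LOCATION, CODE) group's rows from latest to earliest,
# considering only dates on or before the cutoff, and keep the latest one.
QUERY = """
WITH x AS (
    SELECT ID, LOCATION, CODE, DATE, FLAG,
           ROW_NUMBER() OVER (PARTITION BY ID, LOCATION, CODE ORDER BY DATE DESC) AS rn
    FROM TABLE1 AS t1
    WHERE DATE <= '2012-11-11'
      AND ID IN (14, 279)
      AND EXISTS (SELECT 1 FROM TEMP_CODE WHERE TEMP_CODE.CODE = t1.CODE)
)
SELECT ID, LOCATION, CODE, DATE, FLAG FROM x WHERE rn = 1 ORDER BY ID
"""

print(conn.execute(QUERY).fetchall())
# [(14, 'CAR STREET,UDUPI', 234, '2012-08-14', 0), (279, 'MADHUGIRI', 234, '2012-08-11', 0)]
```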
Use a subquery to get the max date for each ID, and then join that to your table:
SELECT
ID, LOCATION, CODE, DATE, FLAG
FROM
TABLE1
JOIN (
SELECT ID AS SubID, MAX(DATE) AS SubDATE
FROM TABLE1
WHERE DATE <= '20121111'
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY ID
) AS SUB ON ID = SubID AND DATE = SubDATE
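In MySQL you would add ORDER BY DATE DESC LIMIT 0,2. SQL Server has no LIMIT, but SET ROWCOUNT 2 (or TOP 2) does the same job. Ordering by date descending puts the dates closest to the cutoff in the WHERE clause first, and the row limit returns only the top 2 rows. Note that this limits the whole result to 2 rows, not 2 per ID.
SET ROWCOUNT 2
SELECT
ID, LOCATION, CODE, MAX(DATE) AS DATE, FLAG
FROM
TABLE1
WHERE
DATE <= CONVERT(DATETIME,'11-11-2012')
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY
ID, LOCATION, CODE, FLAG
ORDER BY MAX(DATE) DESC
SET ROWCOUNT 0 -- reset the row limit for the session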
