Given a table which consists of:
ID_User, Date
I'd like to find the ratio between every two consecutive days, that is, the share of the people who attended on day x who also attended on day x+1.
For example, let's say the table contains (name, ID_User, Date):
Bill 12155 2018-05-01
Jim 52135 2018-05-01
Homer 52135 2018-05-01
Jecki 56135 2018-05-01
Michael 45644 2018-05-02
Jim 52135 2018-05-02
Jessy 45645 2018-05-02
Homer 52135 2018-05-02
So the ratio for 2018-05-01 would be 2/4 = 0.5 (Jim and Homer attended both days).
I tried to solve it myself for the last day but got stuck.
I started by grouping by date:
SELECT Date, ID_USER
FROM YourTable -- table name not given in the question
GROUP BY Date, ID_USER
ORDER BY Date, ID_USER
Can someone please give me some pointers?
Thank you all!
Try this:
SELECT t1.[Date],
( CONVERT(decimal, SUM(CASE WHEN t2.[ID] IS NOT NULL THEN 1 ELSE 0 END) ) / COUNT(t1.[ID]) ) AS [Ratio]
FROM #YourTbl t1
LEFT OUTER JOIN #YourTbl t2 ON t2.[ID] = t1.[ID] AND t2.[Date] = DATEADD(DAY, 1, t1.[Date])
GROUP BY t1.[Date]
Group your data by the first date (in your sample, 2018-05-01).
Then, self-join the table with a LEFT OUTER JOIN, so that you keep the full list of rows in t1 and, in t2, only the rows where the same user (based on ID) appears again on the next day (DATEADD(DAY, 1, ...)).
You can then tell whether a user attended two days in a row, starting from a given date, by checking whether the t2 columns are NULL.
To get the ratio of users who attended on t1.[Date] and also on the next day, count the t2 rows whose ID is NOT NULL and divide by the total count of users for that day in t1. Because SUM returns an INT here and integer division would truncate the result, CONVERT the SUM to DECIMAL to get a decimal ratio.
Here are the results for your sample data (note: after changing the ID of either Jim or Homer, since they originally had the same ID):
Date Ratio
2018-05-01 0.50000000000
2018-05-02 0.00000000000
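A minimal sketch of the self-join above, run through Python's built-in sqlite3 module so it can be tried without SQL Server (an assumption; the answer itself is T-SQL). In SQLite, date(d, '+1 day') stands in for DATEADD(DAY, 1, d), and multiplying by 1.0 stands in for the CONVERT to DECIMAL. The table name visits and the function name are made up for illustration. As in the note above, Homer's ID is changed (here to 52136) so that Jim and Homer are two different users.

```python
import sqlite3


def consecutive_day_ratios(rows):
    """rows: iterable of (user_id, 'YYYY-MM-DD'); returns {date: ratio}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    query = """
        SELECT t1.date,
               -- users seen again on the next day / all users seen on t1.date
               SUM(CASE WHEN t2.id IS NOT NULL THEN 1 ELSE 0 END) * 1.0
                   / COUNT(t1.id) AS ratio
        FROM visits t1
        LEFT OUTER JOIN visits t2
               ON t2.id = t1.id
              AND t2.date = date(t1.date, '+1 day')  -- DATEADD(DAY, 1, ...)
        GROUP BY t1.date
        ORDER BY t1.date
    """
    ratios = dict(conn.execute(query).fetchall())
    conn.close()
    return ratios


sample = [
    (12155, "2018-05-01"), (52135, "2018-05-01"),  # Bill, Jim
    (52136, "2018-05-01"), (56135, "2018-05-01"),  # Homer (ID changed), Jecki
    (45644, "2018-05-02"), (52135, "2018-05-02"),  # Michael, Jim
    (45645, "2018-05-02"), (52136, "2018-05-02"),  # Jessy, Homer
]
print(consecutive_day_ratios(sample))  # {'2018-05-01': 0.5, '2018-05-02': 0.0}
```

The last day always comes out as 0.0 here too, because nobody can appear on a day that isn't in the data yet.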
The self-join solution is valid. You might try this approach as well:
with data as (
select "date",
case when dateadd(day, 1, "date") =
lead("date") over (partition by id order by "date")
then 1 end as returned
from T
)
select "date", count(returned) * 1. / count(*) as ratio
from data
group by "date";
If you want to eliminate the final date, since its ratio is always zero, you can add case when "date" <> max("date") over () then 1 end as notfinal to the CTE and filter on it.
https://rextester.com/HHL82126
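The same LEAD() idea can be sketched in SQLite through Python's sqlite3 module (SQLite 3.25 or later is assumed, since that is where window functions arrived). This sketch also wires in the optional notfinal filter described above; the drop_final_day parameter and the function name are hypothetical, and date("date", '+1 day') replaces dateadd(day, 1, "date").

```python
import sqlite3


def lead_ratios(rows, drop_final_day=False):
    """rows: iterable of (id, 'YYYY-MM-DD'); returns [(date, ratio), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE T (id INTEGER, "date" TEXT)')
    conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
    query = """
        WITH data AS (
            SELECT "date",
                   -- 1 when this user's next visit is exactly the following day
                   CASE WHEN date("date", '+1 day') =
                             LEAD("date") OVER (PARTITION BY id ORDER BY "date")
                        THEN 1 END AS returned,
                   -- 1 for every day except the last one in the table
                   CASE WHEN "date" <> MAX("date") OVER () THEN 1 END AS notfinal
            FROM T
        )
        SELECT "date", COUNT(returned) * 1.0 / COUNT(*) AS ratio
        FROM data
        WHERE ? = 0 OR notfinal = 1      -- apply the notfinal filter only on request
        GROUP BY "date"
        ORDER BY "date"
    """
    result = conn.execute(query, (1 if drop_final_day else 0,)).fetchall()
    conn.close()
    return result


# Same sample as the question, with Homer's ID changed to 52136.
sample = [(12155, "2018-05-01"), (52135, "2018-05-01"), (52136, "2018-05-01"),
          (56135, "2018-05-01"), (45644, "2018-05-02"), (52135, "2018-05-02"),
          (45645, "2018-05-02"), (52136, "2018-05-02")]
print(lead_ratios(sample))
print(lead_ratios(sample, drop_final_day=True))
```

COUNT(returned) skips the NULLs produced by the CASE without an ELSE, which is why no SUM is needed in this version.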
Related
I have an Invoice table with 3 columns:
InvoicePeriod
InvoiceType
Fees
I have data like this:
InvoicePeriod InvoiceType Fees
2020-06-30 ABC 10.0
2020-06-30 ABC 40.0
2020-06-30 ABC 32.0
2020-09-30 ABC 5.0
2020-09-30 XYZ 30.0
2020-12-31 ABC 20.0
2020-12-31 ABC 10.0
2021-01-31 XYZ 60.0
2021-02-01 DEF 36.0
Now, for each invoice type, I want the last (max) InvoicePeriod and the sum of the fees from the earlier periods.
Output:
InvoicePeriod InvoiceType Fees
2020-12-31 ABC 87.0
2021-01-31 XYZ 30.0
2021-02-01 DEF 0.0
How can I achieve this?
Thanks,
Ankit
You want to group by InvoiceType (since you want one row per type) and you want the aggregate functions max and sum to combine values within those groups.
So
SELECT MAX(InvoicePeriod), InvoiceType, SUM(Fees)
FROM mytable
GROUP BY InvoiceType
Edited to exclude the fees that match the max date, now that I understand the problem better:
SELECT t2.MaxPeriod, t2.InvoiceType, SUM(CASE WHEN t1.InvoicePeriod < t2.MaxPeriod THEN t1.Fees ELSE 0 END)
FROM test t1 INNER JOIN
(
SELECT MAX(InvoicePeriod) MaxPeriod, InvoiceType
FROM test
GROUP BY InvoiceType
) t2 ON t1.InvoiceType = t2.InvoiceType
GROUP BY t2.MaxPeriod, t2.InvoiceType
There are different ways of doing this, but I think the above does what you want so you could build off of it. The inner query gets the max InvoicePeriod for each InvoiceType. The outer query uses that and also sums the Fees when the date is less than the max for that group.
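A runnable sketch of the join-on-max-period query, assuming SQLite through Python's sqlite3 instead of SQL Server and using the question's sample rows. Periods are stored as ISO yyyy-mm-dd text, an assumption that makes the < comparison work on plain strings; the function name is made up.

```python
import sqlite3

ROWS = [
    ("2020-06-30", "ABC", 10.0), ("2020-06-30", "ABC", 40.0),
    ("2020-06-30", "ABC", 32.0), ("2020-09-30", "ABC", 5.0),
    ("2020-09-30", "XYZ", 30.0), ("2020-12-31", "ABC", 20.0),
    ("2020-12-31", "ABC", 10.0), ("2021-01-31", "XYZ", 60.0),
    ("2021-02-01", "DEF", 36.0),
]


def fees_before_last_period(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (InvoicePeriod TEXT, InvoiceType TEXT, Fees REAL)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    query = """
        SELECT t2.MaxPeriod AS InvoicePeriod, t2.InvoiceType,
               -- only fees dated before the type's last period are summed
               SUM(CASE WHEN t1.InvoicePeriod < t2.MaxPeriod
                        THEN t1.Fees ELSE 0.0 END) AS Fees
        FROM test t1
        INNER JOIN (
            SELECT MAX(InvoicePeriod) AS MaxPeriod, InvoiceType
            FROM test
            GROUP BY InvoiceType
        ) t2 ON t1.InvoiceType = t2.InvoiceType
        GROUP BY t2.MaxPeriod, t2.InvoiceType
        ORDER BY t2.MaxPeriod
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(fees_before_last_period(ROWS))
# [('2020-12-31', 'ABC', 87.0), ('2021-01-31', 'XYZ', 30.0), ('2021-02-01', 'DEF', 0.0)]
```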
I think this is what you're looking for.
SELECT
MAX(in_main.InvoicePeriod) AS InvoicePeriod
, InvoiceType
/* Subtract out fees on last invoice date*/
, SUM(Fees) - (
SELECT COALESCE(SUM(Fees), 0)
FROM Invoice in_sub
WHERE (
in_sub.InvoiceType = in_main.InvoiceType
AND
in_sub.InvoicePeriod = MAX(in_main.InvoicePeriod)
)
) AS Fees
FROM Invoice in_main
GROUP BY InvoiceType
http://sqlfiddle.com/#!18/288adf/2/0
Steps:
Aggregate per period and type.
Get the sum of a type's previous periods.
Use TOP (1) WITH TIES together with ROW_NUMBER to keep the last period of every type.
The query:
select top(1) with ties
invoiceperiod,
invoicetype,
coalesce(sum(sum(fees)) over (
partition by invoicetype
order by invoiceperiod
rows between unbounded preceding and 1 preceding
), 0.0) as sum_fees
from invoice
group by invoiceperiod, invoicetype
order by row_number() over (partition by invoicetype order by invoiceperiod desc);
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=6a651f85961b27be687026c2ce73c8f9
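A sketch of the same three steps in SQLite through Python's sqlite3. SQLite has no TOP (1) WITH TIES, so this swaps in a plain filter on ROW_NUMBER() = 1 in an outer query, and it aggregates per period and type in a CTE before applying the window sum instead of nesting SUM(SUM(fees)). The table and function names are illustrative assumptions.

```python
import sqlite3

ROWS = [
    ("2020-06-30", "ABC", 10.0), ("2020-06-30", "ABC", 40.0),
    ("2020-06-30", "ABC", 32.0), ("2020-09-30", "ABC", 5.0),
    ("2020-09-30", "XYZ", 30.0), ("2020-12-31", "ABC", 20.0),
    ("2020-12-31", "ABC", 10.0), ("2021-01-31", "XYZ", 60.0),
    ("2021-02-01", "DEF", 36.0),
]


def last_period_with_previous_fees(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoice (invoiceperiod TEXT, invoicetype TEXT, fees REAL)")
    conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", rows)
    query = """
        WITH per_period AS (            -- step 1: aggregate per period and type
            SELECT invoiceperiod, invoicetype, SUM(fees) AS period_fees
            FROM invoice
            GROUP BY invoiceperiod, invoicetype
        ), ranked AS (
            SELECT invoiceperiod, invoicetype,
                   -- step 2: sum of the type's earlier periods (NULL -> 0.0)
                   COALESCE(SUM(period_fees) OVER (
                       PARTITION BY invoicetype
                       ORDER BY invoiceperiod
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                   ), 0.0) AS sum_fees,
                   -- step 3: rn = 1 marks each type's last period
                   ROW_NUMBER() OVER (PARTITION BY invoicetype
                                      ORDER BY invoiceperiod DESC) AS rn
            FROM per_period
        )
        SELECT invoiceperiod, invoicetype, sum_fees
        FROM ranked
        WHERE rn = 1
        ORDER BY invoiceperiod
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(last_period_with_previous_fees(ROWS))
```

The frame ending at 1 PRECEDING is what excludes the last period's own fees; for a type with a single period the frame is empty, SUM yields NULL, and COALESCE turns it into 0.0.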
I have a table with columns for ID, StartDate, EndDate, and whether or not there was a gap between the EndDate of that row and the next StartDate. If each ID had only one continuous block of time, I know I could just do:
SELECT min(startdate),max(enddate)
FROM table
GROUP BY ID
However, each ID can have several non-contiguous timespans, so doing that would give me the very first start date and the very last end date across different blocks of time for that person. How can I get the min and max dates for each specific block of time?
I thought about creating a new column with a number for each block of time: the first block without gaps would get 1, and whenever a row follows a gap the number would increase by 1 to start a new block. But I am not really sure how to go about that. Here is some sample data to illustrate what I am working with:
ID StartDate EndDate NextDate Gap_ind
001 1/1/2018 1/31/2018 2/1/2018 N
001 2/1/2018 2/28/2018 3/1/2018 N
001 3/1/2018 3/31/2018 5/1/2018 Y
001 5/1/2018 5/31/2018 6/1/2018 N
001 6/1/2018 6/30/2018 6/30/2018 N
This is a classic "gaps and islands" problem: you are trying to define the boundaries of your islands, and you can solve it with some window functions.
Your initial effort is on track. Rather than getting the next start date, though, I used the previous end date to calculate the groupings.
The innermost subquery below gets the previous end date for each of your date ranges, and also assigns a row number that we use later to keep our groupings in order.
The next subquery out uses the previous end date to identify which date ranges belong to the same group (they overlap or follow on the next day).
The outermost query is the end result you're looking for.
SELECT
Grp.ID,
MIN(Grp.StartDate) AS GroupingStartDate,
MAX(Grp.EndDate) AS GroupingEndDate
FROM
(
SELECT
PrevDt.ID,
PrevDt.StartDate,
PrevDt.EndDate,
SUM(CASE WHEN DATEADD(DAY,1,PrevDt.PreviousEndDate) >= PrevDt.StartDate THEN 0 ELSE 1 END)
OVER (PARTITION BY PrevDt.ID ORDER BY PrevDt.RN) AS GrpNum
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StartDate, EndDate) as RN,
ID,
StartDate,
EndDate,
LAG(EndDate,1) OVER (PARTITION BY ID ORDER BY StartDate) AS PreviousEndDate
FROM
tbl
) AS PrevDt
) AS Grp
GROUP BY
Grp.ID,
Grp.GrpNum;
Results:
+-----+-------------------+-----------------+
| ID  | GroupingStartDate | GroupingEndDate |
+-----+-------------------+-----------------+
| 001 | 2018-01-01        | 2018-03-31      |
| 001 | 2018-05-01        | 2018-06-30      |
+-----+-------------------+-----------------+
SQL Fiddle demo.
Further reading:
The SQL of Gaps and Islands in Sequences
Gaps and Islands Across Date Ranges
This is an example of a gaps-and-islands problem. A simple solution is to use lag() to determine if there are overlaps. When there is none, you have the start of a group. A cumulative sum defines the group -- and you aggregate on that.
select t.id, min(startdate), max(enddate)
from (select t.*,
sum(case when prev_enddate >= dateadd(day, -1, startdate)
then 0 else 1
end) over (partition by id order by startdate) as grp
from (select t.*, lag(enddate) over (partition by id order by startdate) as prev_enddate
from t
) t
) t
group by id, grp;
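A minimal sketch of this lag-plus-cumulative-sum approach in SQLite through Python's sqlite3 (SQLite 3.25 or later assumed for LAG and the windowed SUM), with date(startdate, '-1 day') standing in for dateadd(day, -1, startdate). The sample rows are the question's, written as ISO dates; because 2/30/2018 is not a valid date, February's end date is assumed to be 2018-02-28. The subquery aliases lagged and flagged are just renamings of the repeated t.

```python
import sqlite3

# 2/30/2018 in the question is not a valid date, so 2018-02-28 is assumed here.
RANGES = [
    ("001", "2018-01-01", "2018-01-31"),
    ("001", "2018-02-01", "2018-02-28"),
    ("001", "2018-03-01", "2018-03-31"),
    ("001", "2018-05-01", "2018-05-31"),
    ("001", "2018-06-01", "2018-06-30"),
]


def islands(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id TEXT, startdate TEXT, enddate TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, MIN(startdate), MAX(enddate)
        FROM (SELECT lagged.*,
                     -- 1 starts a new island; the running sum numbers the islands
                     SUM(CASE WHEN prev_enddate >= date(startdate, '-1 day')
                              THEN 0 ELSE 1 END)
                         OVER (PARTITION BY id ORDER BY startdate) AS grp
              FROM (SELECT t.*,
                           LAG(enddate) OVER (PARTITION BY id
                                              ORDER BY startdate) AS prev_enddate
                    FROM t) lagged) flagged
        GROUP BY id, grp
        ORDER BY id, MIN(startdate)
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in islands(RANGES):
    print(row)
```

For the first row of each ID, prev_enddate is NULL, the comparison is not true, and the CASE falls through to 1, so every ID starts at island 1.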
I have a table that holds tasks. Each task has an allotted number of hours that it's supposed to take to complete the task.
I'm storing the data in a table, like so:
declare @fromtable table (recordid int identity(1,1), orderdate date, deptid int, task varchar(500), estimatedhours int);
I also have a function that calculates the completion date of the task, based on the start date, estimated hours, and department, and some other math that computes headcount, hours available to work, etc.
dbo.fn_getCapEndDate(aStartDate,estimatedHours,deptID)
I need to generate the start and end date for each record in @fromtable. The first record uses its orderdate as the start date for the computation; each subsequent record uses the previous record's computed end date as its start date.
What I'm trying to achieve:
Here's what I have started with:
with MyCTE as
(
select mt.recordID, mt.deptID, mt.estimatedhours, mt.JobNumber, ROW_NUMBER() over (order by recordID) as RowNum,
convert(date,mt.orderdate) as computedStart,
case when mt.recordID = 1 then convert(date,dbo.fn_getCapEndDate(mt.orderdate,mt.estimatedhours,mt.deptid)) end as computedEnd
from @fromtable mt
)
select c1.*, c2.recordID,
case when c2.recordid is null then c1.computedStart else c2.computedEnd end as StartDate,
case when c2.recordid is null then c1.computedEnd else dbo.fn_getCapEndDate(c2.computedEnd,c1.estimatedhours,c1.deptid) end as computedEnd
from MyCTE c1
left join MyCTE c2 on c1.RowNum = c2.RowNum + 1;
With this, the first two rows have the correct start/end dates. Every row after that gets NULL for its start and end values; it "loses" the value of the previous row's computed end date.
What can I do to fix the issue and return the values as needed?
EDIT: Sample data in text format:
estimatedhours OrderDate
0 1/1/2017
0 1/1/2017
0 1/1/2017
0 1/1/2017
500 1/1/2017
32 1/1/2017
0 1/1/2017
0 1/1/2017
320 1/1/2017
0 1/1/2017
5 1/1/2017
0 1/1/2017
4 1/1/2017
You can use lead as below:
select RecordId, EstimatedHours, StartDate,
ComputedEnd = LEAD(StartDate) over (order by RecordId)
From yourTable
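The LEAD() query above reads the next row's StartDate; it does not pass each computed end date into the next call of dbo.fn_getCapEndDate, which is the chaining the question describes. A recursive CTE is a common way to express that row-to-row dependency, so here is a sketch of that swapped-in technique in SQLite through Python's sqlite3. get_cap_end_date is a hypothetical stand-in for dbo.fn_getCapEndDate (the real logic isn't shown), registered with sqlite3's create_function and assuming 8 work hours per calendar day; deptid 1 is made up, and the hours 0, 500, 32 and 5 are estimatedhours values taken from the sample. If you port this back to SQL Server, keep in mind that recursive CTEs stop at 100 levels by default unless the query adds OPTION (MAXRECURSION ...).

```python
import sqlite3
from datetime import date, timedelta


def get_cap_end_date(start, hours, dept_id):
    """HYPOTHETICAL stand-in for dbo.fn_getCapEndDate (real logic not shown).

    Assumes 8 work hours per calendar day and ignores dept_id.
    """
    days = -(-hours // 8)  # ceiling division
    return (date.fromisoformat(start) + timedelta(days=days)).isoformat()


def chain_dates(rows):
    """rows: (recordid, orderdate, deptid, estimatedhours) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("get_cap_end_date", 3, get_cap_end_date)
    conn.execute("CREATE TABLE fromtable (recordid INTEGER, orderdate TEXT, "
                 "deptid INTEGER, estimatedhours INTEGER)")
    conn.executemany("INSERT INTO fromtable VALUES (?, ?, ?, ?)", rows)
    # Number the rows once, so the recursive step can simply follow rn + 1.
    conn.execute("""
        CREATE TABLE ordered AS
        SELECT ROW_NUMBER() OVER (ORDER BY recordid) AS rn,
               recordid, orderdate, deptid, estimatedhours
        FROM fromtable
    """)
    query = """
        WITH RECURSIVE chain(rn, recordid, startdate, enddate) AS (
            -- anchor: the first record starts on its own orderdate
            SELECT rn, recordid, orderdate,
                   get_cap_end_date(orderdate, estimatedhours, deptid)
            FROM ordered
            WHERE rn = 1
            UNION ALL
            -- recursive step: each record starts where the previous one ended
            SELECT o.rn, o.recordid, c.enddate,
                   get_cap_end_date(c.enddate, o.estimatedhours, o.deptid)
            FROM chain c
            JOIN ordered o ON o.rn = c.rn + 1
        )
        SELECT recordid, startdate, enddate FROM chain ORDER BY rn
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


tasks = [(1, "2017-01-01", 1, 0), (2, "2017-01-01", 1, 500),
         (3, "2017-01-01", 1, 32), (4, "2017-01-01", 1, 5)]
for row in chain_dates(tasks):
    print(row)
```

Unlike the non-recursive CTE in the question, each recursive step sees the previous step's computed end date, which is why later rows no longer come out NULL.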
I have a list of accounts and their cost which changes every few days.
In this list I only have the start date every time the cost updates to a new one, but no column for the end date.
That is, I need to generate a list of dates where the end date of a given account's cost is deduced from the start date of that account's next cost.
More or less like that:
Account start date cost
one 1/1/2016 100$
two 1/1/2016 150$
one 4/1/2016 200$
two 3/1/2016 200$
And the result I need would be:
Account date cost
one 1/1/2016 100$
one 2/1/2016 100$
one 3/1/2016 100$
one 4/1/2016 200$
two 1/1/2016 150$
two 2/1/2016 150$
two 3/1/2016 200$
For example, if the cost changed in the middle of the month, then the sample data would hold only two records (one per unique combination of account, start date and cost), while the results would hold 30 records with the cost for each day of the month (15 for the first cost and 15 for the second). The costs are given and don't need to be calculated (they are inserted manually).
Note that the result contains more records because the sample data shows only a start date and the updated cost for that account as of that date, while the results show the cost for every day of the month.
Any ideas?
Solution is a bit long.
I added an extra date for test purposes:
DECLARE @t table(account varchar(10), startdate date, cost int)
INSERT @t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500) -- extra row
;WITH CTE as
( SELECT
row_number() over (partition by account order by startdate) rn,
*
FROM @t
),N(N)AS
(
SELECT 1 FROM(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))M(N)
),
tally(N) AS -- tally is limited to 1000 days
(
SELECT ROW_NUMBER()OVER(ORDER BY N.N) - 1 FROM N,N a,N b
),GROUPED as
(
SELECT
cte.account, cte.startdate, cte.cost, cte2.cost cost2, cte2.startdate enddate
FROM CTE
JOIN CTE CTE2
ON CTE.account = CTE2.account
and CTE.rn = CTE2.rn - 1
)
-- used DISTINCT to avoid overlapping dates
SELECT DISTINCT
CASE WHEN datediff(d, startdate,enddate) = N THEN cost2 ELSE cost END cost,
dateadd(d, N, startdate) startdate,
account
FROM grouped
JOIN tally
ON datediff(d, startdate,enddate) >= N
Result:
cost startdate account
100 2016-01-01 one
100 2016-01-02 one
100 2016-01-03 one
150 2016-01-01 two
150 2016-01-02 two
200 2016-01-03 two
200 2016-01-04 one
200 2016-01-04 two
200 2016-01-05 two
500 2016-01-06 two
Thank you @t-clausen.dk!
It didn't solve the problem completely, but it pointed me in the right direction.
Eventually I used the LEAD function to generate an end date for every cost per account, and then I was able to populate a list of dates based on that idea.
Here's how I generate the end dates:
DECLARE @t table(account varchar(10), startdate date, cost int)
INSERT @t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500)
select account
,[startdate]
,DATEADD(DAY, -1, LEAD([Startdate], 1,'2100-01-01') OVER (PARTITION BY account ORDER BY [Startdate] ASC)) AS enddate
,cost
from @t
It returned the expected result:
account startdate enddate cost
one 2016-01-01 2016-01-03 100
one 2016-01-04 2099-12-31 200
two 2016-01-01 2016-01-02 150
two 2016-01-03 2016-01-05 200
two 2016-01-06 2099-12-31 500
Please note that I set the end date of current costs to be some date in the far future which means (for me) that they are currently active.
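The answer stops at the end dates; one way to then populate one row per day is to join those ranges to a calendar of dates, much like the tally table in the previous answer. Below is a minimal sketch in SQLite through Python's sqlite3. It assumes a hypothetical report end date (until) in place of the far-future 2100-01-01 sentinel, so that open-ended costs don't expand into decades of rows; the recursive days CTE plays the role of the tally table, and date(..., '-1 day') replaces DATEADD(DAY, -1, ...).

```python
import sqlite3

COSTS = [("one", "2016-01-01", 100), ("two", "2016-01-01", 150),
         ("one", "2016-01-04", 200), ("two", "2016-01-03", 200),
         ("two", "2016-01-06", 500)]


def daily_costs(rows, until):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (account TEXT, startdate TEXT, cost INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE days(d) AS (        -- calendar from the first start date
            SELECT MIN(startdate) FROM t
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < :until
        ), ranges AS (                     -- end date = next start date - 1 day
            SELECT account, startdate, cost,
                   COALESCE(date(LEAD(startdate) OVER (PARTITION BY account
                                                       ORDER BY startdate),
                                 '-1 day'),
                            :until) AS enddate
            FROM t
        )
        SELECT r.account, days.d, r.cost
        FROM ranges r
        JOIN days ON days.d BETWEEN r.startdate AND r.enddate
        ORDER BY r.account, days.d
    """
    result = conn.execute(query, {"until": until}).fetchall()
    conn.close()
    return result


for row in daily_costs(COSTS, "2016-01-06"):
    print(row)
```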
I have a table in MSSQL with the following structure:
PersonId
StartDate
EndDate
I need to be able to show the number of distinct people in the table within a date range or at a given date.
For example, I need to show the totals per day: if we have 2 entries starting on 1st June, 3 on 2nd June and 1 on 3rd June, the system should show the following result:
1st June: 2
2nd June: 5
3rd June: 6
If, however, one of the entries on 2nd June also has an end date of 2nd June, then the 3rd June result would show just 5.
Would someone be able to assist with this?
Thanks
UPDATE
This is what I have so far, which seems to work. Is there a better solution, though? My solution only gets me the employed figures. I also need the unemployed figures in another column, where unemployed means either no entry in the table, or the date is not between the start and end dates and there is no other entry as employed.
CREATE TABLE #Temp(CountTotal int NOT NULL, CountDate datetime NOT NULL);
DECLARE @StartDT DATETIME
SET @StartDT = '2015-01-01 00:00:00'
WHILE @StartDT < '2015-08-31 00:00:00'
BEGIN
INSERT INTO #Temp(CountTotal, CountDate)
SELECT COUNT(DISTINCT PERSON.Id) AS CountTotal, @StartDT AS CountDate FROM PERSON
INNER JOIN DATA_INPUT_CHANGE_LOG ON PERSON.DataInputTypeId = DATA_INPUT_CHANGE_LOG.DataInputTypeId AND PERSON.Id = DATA_INPUT_CHANGE_LOG.DataItemId
LEFT OUTER JOIN PERSON_EMPLOYMENT ON PERSON.Id = PERSON_EMPLOYMENT.PersonId
WHERE PERSON.Id > 0 AND DATA_INPUT_CHANGE_LOG.Hidden = '0' AND DATA_INPUT_CHANGE_LOG.Approved = '1'
AND ((PERSON_EMPLOYMENT.StartDate <= DATEADD(MONTH,1,@StartDT) AND PERSON_EMPLOYMENT.EndDate IS NULL)
OR (@StartDT BETWEEN PERSON_EMPLOYMENT.StartDate AND PERSON_EMPLOYMENT.EndDate) AND PERSON_EMPLOYMENT.EndDate IS NOT NULL)
SET @StartDT = DATEADD(MONTH,1,@StartDT)
END
select * from #Temp
drop TABLE #Temp
You can use the following query. The CTE generates a series of consecutive dates between the start date and the end date.
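DECLARE @ViewStartDate DATETIME
DECLARE @ViewEndDate DATETIME
SET @ViewStartDate = '2015-01-01 00:00:00.000';
SET @ViewEndDate = '2015-02-25 00:00:00.000';
;WITH Dates([Date])
AS
(
SELECT @ViewStartDate
UNION ALL
SELECT DATEADD(DAY, 1, [Date])
FROM Dates
WHERE DATEADD(DAY, 1, [Date]) <= @ViewEndDate
)
-- COUNT(DISTINCT ...) returns 0 for days with no matching person;
-- COUNT(*) would return 1 for those days because of the LEFT JOIN
SELECT [Date], COUNT(DISTINCT PersonData.PersonId) AS PersonCount
FROM Dates
LEFT JOIN PersonData ON Dates.[Date] >= PersonData.StartDate
AND Dates.[Date] <= PersonData.EndDate
GROUP BY [Date]
OPTION (MAXRECURSION 0); -- lifts the default limit of 100 recursion levels for long date ranges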
Replace PersonData with your table name.
If the StartDate and EndDate columns can be NULL, you need to add additional conditions to the join.
It assumes one person has only one record in the same date range.
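A sketch of the calendar-and-join approach in SQLite through Python's sqlite3, with two assumptions beyond the answer: open-ended rows (EndDate IS NULL) are treated as still active, which is one way to add the NULL-handling condition mentioned above, and people are counted with COUNT(DISTINCT ...) so that a day with nobody active yields 0. The sample rows are hypothetical, built to match the question's example (2 starts on 1st June, 3 on 2nd June with one of them ending that day, 1 on 3rd June); the year 2015 is made up.

```python
import sqlite3

# Hypothetical sample matching the question's example.
PEOPLE = [(1, "2015-06-01", None), (2, "2015-06-01", None),
          (3, "2015-06-02", None), (4, "2015-06-02", None),
          (5, "2015-06-02", "2015-06-02"), (6, "2015-06-03", None)]


def people_per_day(rows, start, end):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PersonData (PersonId INTEGER, StartDate TEXT, EndDate TEXT)")
    conn.executemany("INSERT INTO PersonData VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE Dates(d) AS (       -- one row per day from :start to :end
            SELECT :start
            UNION ALL
            SELECT date(d, '+1 day') FROM Dates WHERE d < :end
        )
        SELECT d, COUNT(DISTINCT p.PersonId) AS people
        FROM Dates
        LEFT JOIN PersonData p
               ON d >= p.StartDate
              AND (p.EndDate IS NULL OR d <= p.EndDate)  -- open-ended rows count
        GROUP BY d
        ORDER BY d
    """
    result = conn.execute(query, {"start": start, "end": end}).fetchall()
    conn.close()
    return result


print(people_per_day(PEOPLE, "2015-05-31", "2015-06-03"))
# [('2015-05-31', 0), ('2015-06-01', 2), ('2015-06-02', 5), ('2015-06-03', 5)]
```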
You could do this by turning every start date into a +1 event and every end date into a -1 event, and then calculating a running total over those events.
For example if your data is something like this
PersonId StartDate EndDate
1 20150101 20150201
2 20150102 20150115
3 20150101
You first create a data set that looks like this:
EventDate ChangeValue
20150101 +2
20150102 +1
20150115 -1
20150201 -1
And if you use running total, you'll get this:
EventDate Total
2015-01-01 2
2015-01-02 3
2015-01-15 2
2015-02-01 1
You can get it with something like this:
select
p.eventdate,
sum(p.changevalue) over (order by p.eventdate asc) as total
from
(
select startdate as eventdate, sum(1) as changevalue from personnel group by startdate
union all
select enddate, sum(-1) from personnel where enddate is not null group by enddate
) p
order by p.eventdate asc
Using SUM() as a window function with ORDER BY (a running total) requires SQL Server 2012 or later. If you're using an older version, you can check other options for running totals.
My example in SQL Fiddle
If you have dates that don't have any events and you need to show those too, then the best option is probably to create a separate table of dates for the whole range you'll ever need, for example 1.1.2000 - 31.12.2099.
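A sketch of the +1/-1 event idea in SQLite through Python's sqlite3, using this answer's three example rows. One small variation from the query above: the start and end events are combined first and then summed per date, so a date that has both a start and an end yields a single row. The table and function names are illustrative.

```python
import sqlite3

# This answer's example data; person 3 has no end date yet.
PERSONNEL = [(1, "2015-01-01", "2015-02-01"),
             (2, "2015-01-02", "2015-01-15"),
             (3, "2015-01-01", None)]


def running_totals(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE personnel (personid INTEGER, startdate TEXT, enddate TEXT)")
    conn.executemany("INSERT INTO personnel VALUES (?, ?, ?)", rows)
    query = """
        SELECT eventdate,
               SUM(changevalue) OVER (ORDER BY eventdate) AS total
        FROM (
            -- merge +1 and -1 events that fall on the same date
            SELECT eventdate, SUM(changevalue) AS changevalue
            FROM (
                SELECT startdate AS eventdate, 1 AS changevalue FROM personnel
                UNION ALL
                SELECT enddate, -1 FROM personnel WHERE enddate IS NOT NULL
            ) events
            GROUP BY eventdate
        ) p
        ORDER BY eventdate
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in running_totals(PERSONNEL):
    print(row)
```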
-- Edit --
To get the count for a specific day, you can use the same logic, but just sum everything up to that day:
declare @eventdate date
set @eventdate = '20150117'
select
sum(p.changevalue)
from
(
select startdate as eventdate, 1 as changevalue from personnel
where startdate <= @eventdate
union all
select enddate, -1 from personnel
where enddate < @eventdate
) p
) p
Hopefully this is ok, can't test since SQL Fiddle seems to be unavailable.
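A sketch of the single-day version in SQLite through Python's sqlite3, reusing the example rows from this answer (the function name active_on is made up). Note that the enddate < day filter keeps a person counted on their end date, while the running-total query subtracts them on that date: for 2015-02-01 this version returns 2, whereas the running total shows 1.

```python
import sqlite3

PERSONNEL = [(1, "2015-01-01", "2015-02-01"),
             (2, "2015-01-02", "2015-01-15"),
             (3, "2015-01-01", None)]


def active_on(rows, day):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE personnel (personid INTEGER, startdate TEXT, enddate TEXT)")
    conn.executemany("INSERT INTO personnel VALUES (?, ?, ?)", rows)
    query = """
        SELECT COALESCE(SUM(changevalue), 0)   -- 0 when no events precede :day
        FROM (
            SELECT startdate AS eventdate, 1 AS changevalue FROM personnel
            WHERE startdate <= :day
            UNION ALL
            SELECT enddate, -1 FROM personnel
            WHERE enddate < :day               -- NULL end dates never match
        ) p
    """
    (active,) = conn.execute(query, {"day": day}).fetchone()
    conn.close()
    return active


print(active_on(PERSONNEL, "2015-01-17"))
```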