Calculate Time and Cost Using Values From Next Row - sql-server

Consider the following data showing the time it takes for engineers to travel to a job. The ChargeBand column shows the various rates charged at different hours of the day. Many engineers can attend one job.
I want to be able to cap the travel time for each engineer to one hour. So even if it takes longer than an hour to travel, the maximum to be paid is one hour travel cost, no more. This is fine and I can do this when the travel time is contained within one charge band using this CASE statement:
CASE
WHEN NumberOfHours > 1.0 AND CallIDStartDate >= '2019-11-01' THEN cast((1 * isnull(x.Rate,1)) as decimal(20,7))
ELSE cast((NumberOfHours * isnull(x.Rate,1)) as decimal(20,7))
END as LabourCost
However, the problem arises when an hour of travel straddles two charge bands. This is the issue I am unable to resolve and would like help with. In the first two rows, for example, on Monday the engineer travels from 07.46 until 09.41, but his travel cost should be capped at 08.46. So the first 14 minutes are charged at 46.62 and the remaining 46 minutes at 37.67. How do I do this?
The multiple Tuesday entries signify multiple engineers attending. The challenge is also to identify the two rows that belong to each engineer (the 07.46/44 and 08:00 StartTimes) and cap the travel charge at one hour, as in the Monday example.
I thought to partition the table by day so it was apparent that any StartTime less than the value in the previous row belongs to a different engineer attending the same job but this doesn't help with the calculation itself.
I also thought to use the LEAD() and LAG() functions to calculate the time or charge from the following row value, and perhaps the answer is with them, but I don't know how to apply in the code.

You can try a CTE with the LEAD function. This example works if you have two records per employee for each workday:
with cte as(
select employeeid, weekday, starttime, finishtime, chargeband, rate firstrate,
lead(starttime,1) over(partition by employeeid, weekday order by starttime) nextstarttime,
lead(finishtime,1) over(partition by employeeid, weekday order by starttime) nextfinishtime,
lead(rate,1) over(partition by employeeid, weekday order by starttime) nextrate
from ratetable)
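select employeeid, weekday,
-- the first band alone fills the one-hour cap
case when firsttime >= 1 then firstrate
-- otherwise charge the first band in full, then only what is left of the hour in the second band
else firsttime*firstrate + (case when firsttime + secondtime > 1 then 1 - firsttime else secondtime end)*nextrate end as cappedcost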
from(
select employeeid, weekday, DATEDIFF(second, starttime, finishtime) / 3600.0 firsttime ,firstrate,
DATEDIFF(second, nextstarttime, nextfinishtime) / 3600.0 secondtime,nextrate
from cte where nextstarttime is not null) x
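To check the arithmetic independently of the SQL, here is a minimal Python sketch of the same capping rule. The function name `capped_travel_cost`, the tuple layout, and the calendar date are assumptions made for illustration; only the times and rates come from the Monday example in the question.

```python
from datetime import datetime, timedelta

def capped_travel_cost(segments, cap=timedelta(hours=1)):
    """Charge each segment at its own rate until `cap` of travel time is used.

    `segments` is a list of (start, finish, hourly_rate) tuples for one
    engineer, ordered by start time (a hypothetical stand-in for that
    engineer's rows on one day).
    """
    remaining = cap
    cost = 0.0
    for start, finish, rate in segments:
        if remaining <= timedelta(0):
            break  # the cap is already used up
        billable = min(finish - start, remaining)
        cost += billable.total_seconds() / 3600.0 * rate
        remaining -= billable
    return cost

# Monday example: 07:46-08:00 at 46.62, then 08:00-09:41 at 37.67.
# The date itself is an assumption; only the times matter here.
monday = [
    (datetime(2019, 11, 4, 7, 46), datetime(2019, 11, 4, 8, 0), 46.62),
    (datetime(2019, 11, 4, 8, 0), datetime(2019, 11, 4, 9, 41), 37.67),
]
print(round(capped_travel_cost(monday), 2))  # 39.76
```

The loop form also covers an engineer whose travel spans more than two bands, which the two-row LEAD query does not.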

Related

How to find difference between dates and find first purchase in an eCommerce database

I am using Microsoft SQL Server Management Studio. I am trying to measure the customer retention rate of an eCommerce site.
For this, I need four values:
customer_id
order_purchase_timestamp
age_by_month
first_purchase
The values of age_by_month and first_purchase are not in my database. I want to calculate them.
In my database, I have customer_id and order_purchase_timestamp.
The first_purchase should be the earliest instance of order_purchase_timestamp. I only want the month and year.
The age_by_month should be the difference of months from first_purchase to order_purchase_timestamp.
I only want to measure the retention of the customer for each month so if two purchases are made in the same month it shouldn't be shown.
The dates are between 2016-10-01 and 2018-09-30. The result should be ordered by order_purchase_timestamp.
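An example:
customer_id | order_purchase_timestamp
--------------------------------------
1 | 2016-09-04
2 | 2016-09-05
3 | 2016-09-05
3 | 2016-09-15
1 | 2016-10-04
to
customer_id | first_purchase | age_by_month | order_purchase_timestamp
----------------------------------------------------------------------
1 | 2016-09 | 0 | 2016-09-04
2 | 2016-09 | 0 | 2016-09-05
3 | 2016-09 | 0 | 2016-09-05
1 | 2016-09 | 1 | 2016-10-04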
What I have done
SELECT
customer_id, order_purchase_timestamp
FROM
orders
WHERE
(order_purchase_timestamp BETWEEN '2016-10-01' AND '2016-12-31')
OR (order_purchase_timestamp BETWEEN '2017-01-01' AND '2017-03-31')
OR (order_purchase_timestamp BETWEEN '2017-04-01' AND '2017-06-30')
OR (order_purchase_timestamp BETWEEN '2017-07-01' AND '2017-09-30')
OR (order_purchase_timestamp BETWEEN '2017-10-01' AND '2017-12-31')
OR (order_purchase_timestamp BETWEEN '2018-01-01' AND '2018-03-31')
OR (order_purchase_timestamp BETWEEN '2018-04-01' AND '2018-06-30')
OR (order_purchase_timestamp BETWEEN '2018-07-01' AND '2018-09-30')
ORDER BY
order_purchase_timestamp
Originally I was going to do it by quarters but I want to do it in months now.
The following approach is designed to be relatively easy to understand. There are other ways (e.g., windowed functions) that may be marginally more efficient; but this makes it easy to maintain at your current SQL skill level.
Note that the SQL commands below build on one another (so the answer is at the end). To follow along, here is a db<>fiddle with the working.
It's based around a simple query (which we'll use as a sub-query) that finds the first order_purchase_timestamp for each customer.
SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders
GROUP BY customer_id
The next thing is DATEDIFF to find the difference between 2 dates.
Then, you can use the above as a subquery to get the first date onto each row - then find the date difference e.g.,
SELECT orders.customer_id,
orders.order_purchase_timestamp,
first_purchases.first_purchase_date,
DATEDIFF(month, first_purchases.first_purchase_date, orders.order_purchase_timestamp) AS age_by_month
FROM orders
INNER JOIN
(SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders
GROUP BY customer_id
) AS first_purchases ON orders.customer_id = first_purchases.customer_id
Note - DATEDIFF has a 'gotcha' that gets most people but is good for you - when comparing months, it ignores the day component, e.g., if finding the difference in months, there is 0 difference in months between 1 Jan and 31 Jan. On the other hand, there will be a difference of 1 month between 31 Jan and 1 Feb. However, I think this is actually what you want!
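The month-boundary behaviour described above can be modelled in a few lines of Python. This is only a sketch of the counting rule, and the function name `month_boundaries_crossed` is made up for illustration.

```python
from datetime import date

def month_boundaries_crossed(start, end):
    """Count month boundaries between two dates, ignoring the day component."""
    return (end.year - start.year) * 12 + (end.month - start.month)

print(month_boundaries_crossed(date(2016, 1, 1), date(2016, 1, 31)))  # 0
print(month_boundaries_crossed(date(2016, 1, 31), date(2016, 2, 1)))  # 1
```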
The above, however, repeats when a customer has multiple purchases within the month (it has one row per purchase). Instead, we can GROUP BY to group by the month it's in, then only take the first purchase for that month.
A 'direct' approach to this would be to group on YEAR(orders.order_purchase_timestamp) AND MONTH(orders.order_purchase_timestamp). However, I use a little trick below - using EOMONTH which finds the last day of the month. EOMONTH returns the same date for any date in that month; therefore, we can group by that.
Finally, you can add the WHERE expression and ORDER BY to get the results you asked for (between the two dates)
SELECT orders.customer_id,
MIN(orders.order_purchase_timestamp) AS order_purchase_timestamp,
first_purchases.first_purchase_date,
DATEDIFF(month, first_purchases.first_purchase_date, EOMONTH(orders.order_purchase_timestamp)) AS age_by_month
FROM orders
INNER JOIN
(SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders AS orders_ref
GROUP BY customer_id
) AS first_purchases ON orders.customer_id = first_purchases.customer_id
WHERE orders.order_purchase_timestamp BETWEEN '20161001' AND '20180930'
GROUP BY orders.customer_id, first_purchases.first_purchase_date, EOMONTH(orders.order_purchase_timestamp)
ORDER BY order_purchase_timestamp;
Results - note they are different from yours because you wanted the earliest date to be 1/10/2016.
customer_id order_purchase_timestamp first_purchase_date age_by_month
1 2016-10-04 00:00:00.000 2016-09-04 00:00:00.000 1
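As a cross-check of the grouped query, the same steps (first purchase per customer, earliest purchase per customer and month inside the date filter, then the month difference) can be sketched in Python against the sample rows from the question. The function name `retention_rows` and the tuple layout are assumptions for illustration only.

```python
from datetime import date

# Sample rows from the question: (customer_id, order_purchase_timestamp)
orders = [
    (1, date(2016, 9, 4)),
    (2, date(2016, 9, 5)),
    (3, date(2016, 9, 5)),
    (3, date(2016, 9, 15)),
    (1, date(2016, 10, 4)),
]

def retention_rows(orders, range_start, range_end):
    # First purchase per customer, taken over all orders (like the subquery).
    first_purchase = {}
    for customer_id, purchased in orders:
        if customer_id not in first_purchase or purchased < first_purchase[customer_id]:
            first_purchase[customer_id] = purchased

    # Earliest purchase per customer and month, inside the date filter only.
    earliest_in_month = {}
    for customer_id, purchased in orders:
        if not (range_start <= purchased <= range_end):
            continue
        key = (customer_id, purchased.year, purchased.month)
        if key not in earliest_in_month or purchased < earliest_in_month[key]:
            earliest_in_month[key] = purchased

    rows = []
    for (customer_id, year, month), purchased in earliest_in_month.items():
        first = first_purchase[customer_id]
        age_by_month = (year - first.year) * 12 + (month - first.month)
        rows.append((customer_id, purchased, first, age_by_month))
    return sorted(rows, key=lambda r: (r[1], r[0]))

rows = retention_rows(orders, date(2016, 10, 1), date(2018, 9, 30))
print(rows)  # one row: customer 1, 2016-10-04, first purchase 2016-09-04, age 1
```

Widening the filter so that it starts before September 2016 gives the four rows the question lists as its expected output.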
Edit: Because someone else will do it like this otherwise!
You can do this with a single read-through that will potentially run a little faster. It is also a bit shorter - but harder to understand imo.
The below uses window functions to calculate both the customer's earliest purchase, and the earliest purchase for each month (and uses DISTINCT rather than a GROUP BY). With that, it just does the DATEDIFF to calculate the difference.
WITH monthly_orders AS
(SELECT DISTINCT orders.customer_id,
MIN(orders.order_purchase_timestamp) OVER (PARTITION BY orders.customer_id, EOMONTH(orders.order_purchase_timestamp)) AS order_purchase_timestamp,
MIN(orders.order_purchase_timestamp) OVER (PARTITION BY orders.customer_id) AS first_purchase_date
FROM orders)
SELECT *, DATEDIFF(month, first_purchase_date, order_purchase_timestamp) AS age_by_month
FROM monthly_orders
WHERE order_purchase_timestamp BETWEEN '20161001' AND '20180930';
Note, however, that this has one difference in the results. If you have 2 orders in a month, and your lowest date filter falls between the two (e.g., orders on 15/10 and 20/10, and your minimum date is 16/10), then the row won't be included, as the earliest purchase in the month is outside the filter range.
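Also beware, with both of these, of what type of date or datetime field you are using - if you have datetimes rather than just dates, BETWEEN '20161001' AND '20180930' is not the same as >= '20161001' AND < '20181001': the BETWEEN form stops at midnight at the start of 30 September, so orders placed later that day are excluded.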
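The difference between the two filters is easy to see with one timestamp on the last day of the range. The Python sketch below only models the comparison logic; the helper names are hypothetical.

```python
from datetime import datetime

range_start = datetime(2016, 10, 1)
closed_end = datetime(2018, 9, 30)     # '20180930' as a datetime is midnight
exclusive_end = datetime(2018, 10, 1)

def in_between(ts):
    """Models: ts BETWEEN '20161001' AND '20180930'."""
    return range_start <= ts <= closed_end

def in_half_open(ts):
    """Models: ts >= '20161001' AND ts < '20181001'."""
    return range_start <= ts < exclusive_end

afternoon_order = datetime(2018, 9, 30, 14, 30)
print(in_between(afternoon_order), in_half_open(afternoon_order))  # False True
```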
Here is a short query that achieves what you want (descriptions of the methods used are inline):
declare @test table (
customer_id int,
order_purchase_timestamp date
)
-- some test data
insert into @test values
(1, '2016-09-04'),
(2, '2016-09-05'),
(3, '2016-09-05'),
(3, '2016-09-15'),
(1, '2016-10-04');
select
customer_id,
-- takes care of correct display of first_purchase
format(first_purchase, 'yyyy-MM') first_purchase,
-- used to get the difference in months
datediff(m, first_purchase, order_purchase_timestamp) age_by_month,
order_purchase_timestamp
from (
select
*,
-- window function used to find min value for given column within group
-- for each row
min(order_purchase_timestamp) over (partition by customer_id) first_purchase
from @test
) a

Microsoft SQL Server Time and Date query

I am working with a large data set that records all jobs that have run for an ETL tool. There are about 1500 jobs that run per day and a lot of them can run for less than 10 seconds.
Data
What I am trying to do is get some statistics on the run times of these jobs.
Ideally I wanted a run time for all jobs per day and was running this query:
SELECT
RUNDATE = CAST(runstart AS date),
TOTALRUNTIME = CAST(DATEADD(ms, SUM(CAST(DATEDIFF(ms, '00:00:00', CAST(RUNTIME AS time)) AS BIGINT)), '00:00:00') AS Time)
FROM
"CADIS"."vwSolutionHistory"
GROUP BY
CAST(runstart AS date)
However this returned some weird results.
For example one day showed a total run time of 23:56:27.4833333, and on average we only have jobs running about 11 - 12 hours a day so a total run time of 23 hours didn't make sense. So if anyone has ideas on how to make that work that would be fantastic
Anyway I moved on and grouped the results by name as well so am getting a sum of the total time per job per day
SELECT
RUNDATE = CAST(runstart AS date),
NAME,
TOTALRUNTIME = CONVERT(varchar(50), DATEADD(ms, SUM(DATEDIFF(ms, '00:00:00.000', RUNTIME)), '00:00:00.000'), 108)
FROM
"CADIS"."vwSolutionHistory"
WHERE
TIMESINCELASTENTRY IS NULL
GROUP BY
name, CAST(runstart AS date)
ORDER BY
2, 1
What I would like to do in addition to this is to (1) get the total time of each job per day (per the above query), (2) get the maximum time a job took per day, (3) get the median time a job took per day, and (4) get the minimum time a job took per day.
However, I'm not sure how to get all 4 of those things in one query.

T-SQL recursion, date shifting based on previous iteration

I have a data set that includes a customer, payment date, and the number of days they have paid for. I need to calculate the coverage start/end dates that each payment is covering. This is difficult when a payment is made before the current coverage period ends.
The best way I've come up with to think about this would be a month to month cell phone plan where the customer may pay for a specified number of days at any point during a given month. The next covered period should always start the day after the previous covered period expires.
Here is the code sample using a temp table.
CREATE TABLE #Payments
(Customer_ID INTEGER,
Payment_Date DATE,
Days_Paid INTEGER);
INSERT INTO #Payments
VALUES (1,'2018-01-01',30);
INSERT INTO #Payments
VALUES (1,'2018-01-29',20);
INSERT INTO #Payments
VALUES (1,'2018-02-15',30);
INSERT INTO #Payments
VALUES (1,'2018-04-01',30);
I need to get the coverage start/end dates back.
The initial payment is made on 2018-01-01 and they paid for 30 days. That means they are covered until 2018-01-30 (Payment_Date + Days_Paid - 1, since the payment date is included as a covered day). However, they made their next payment on 2018-01-29, so I need to calculate the start date of the next coverage window, which in this case would be the previous Payment_Date + previous Days_Paid. Coverage window 2 therefore starts on 2018-01-31 and extends through 2018-02-19, since they only paid for 20 days on Payment_Date 2018-01-29.
The expected output is:
Customer_ID | Payment_Date | Days_Paid | Coverage_Start_Date | Coverage_End_Date
--------------------------------------------------------------------------------
1 | '2018-01-01'| 30 | '2018-01-01'| '2018-01-30'
1 | '2018-01-29'| 20 | '2018-01-31'| '2018-02-19'
1 | '2018-02-15'| 30 | '2018-02-20'| '2018-03-21'
1 | '2018-04-01'| 30 | '2018-04-01'| '2018-04-30'
Because the current record's coverage start date will depend on the previous record's coverage end date, I feel like this would be a good candidate for recursion, but I can't figure out how to do it.
I have a way to do this in a while loop, but I would like to complete it using a recursive CTE. I have also thought about simply adding up the Days_Paid and adding that to the first payment's start date, however this only works if a payment is made before the previous coverage has expired. In addition, I need to calculate the coverage start/end dates for each Payment_Date.
Finally, using LAG/LEAD functions doesn't appear to work because it does not consider the result of the previous iteration, only the current value of the previous record. Using LAG/LEAD, you get the correct answer for the 2nd payment record, but not the third.
Is there a way to do this with a recursive CTE?
NOTE: This is not a recursive solution, but it is set-based vs. your loop solution.
While trying to solve this recursively it hit me that this is essentially a "running totals" problem, and can be easily solved with window functions.
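WITH runningTotal AS
(
SELECT p.*, SUM(Days_Paid) OVER(ORDER BY p.Payment_Date) AS runningTotalDays, MIN(Payment_Date) OVER(ORDER BY p.Payment_Date) startDate
FROM #Payments p
)
-- start = first payment date + days paid before this row; end = first payment date + running total - 1
SELECT r.Customer_Id, r.Payment_Date, Days_Paid, COALESCE(DATEADD(DAY, LAG(runningTotalDays) OVER(ORDER BY r.Payment_Date), startDate), startDate) AS Coverage_Start_Date, DATEADD(DAY, runningTotalDays - 1, startDate) AS Coverage_End_Date
FROM runningTotal r
Each end date is the first payment date plus the "running total" of Days_Paid up to and including that row, minus one because the payment date itself counts as a covered day. Using LAG to get the previous record's end date + 1 gets you the start date. The COALESCE is to handle the first record. For more than a single customer, you can PARTITION BY Customer_Id. Note that this assumes every payment arrives before the previous coverage expires; a gap such as the one before the 2018-04-01 payment is not handled.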
So of course, right after posting this I came across a similar question that was already answered.
Here's the link: Recursively retrieve LAG() value of previous record
Based on that solution, I was able to construct the following solution to my own question.
The key here was adding the "prep_data" CTE which made the recursion problem much easier.
;WITH prep_data AS
(SELECT Customer_ID,
ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY Payment_Date) AS payment_seq_num,
Payment_Date,
Days_Paid,
Payment_Date as Coverage_Start_Date,
DATEADD(DAY,Days_Paid-1,Payment_Date) AS Coverage_End_Date
FROM #Payments),
recursion AS
(SELECT Customer_ID,
payment_seq_num,
Payment_Date,
Days_Paid,
Coverage_Start_Date,
Coverage_End_Date
FROM prep_data
WHERE payment_seq_num = 1
UNION ALL
SELECT r.Customer_ID,
p.payment_seq_num,
p.Payment_Date,
p.Days_Paid,
CASE WHEN r.Coverage_End_Date >= p.Payment_Date THEN DATEADD(DAY,1,r.Coverage_End_Date) ELSE p.Payment_Date END AS Coverage_Start_Date,
DATEADD(DAY,p.Days_Paid-1,CASE WHEN r.Coverage_End_Date >= p.Payment_Date THEN DATEADD(DAY,1,r.Coverage_End_Date) ELSE p.Payment_Date END) AS Coverage_End_Date
FROM recursion r
JOIN prep_data p ON r.Customer_ID = p.Customer_ID AND r.payment_seq_num + 1 = p.payment_seq_num
)
SELECT Customer_ID,
Payment_Date,
Days_Paid,
Coverage_Start_Date,
Coverage_End_Date
FROM recursion
ORDER BY payment_seq_num;
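The rule that the recursive member applies at each step can be checked outside SQL with a short loop. The Python sketch below is an illustration only, for a single customer; the function name `coverage_windows` is made up, and the data is the sample from the question.

```python
from datetime import date, timedelta

def coverage_windows(payments):
    """payments: list of (payment_date, days_paid) for one customer."""
    windows = []
    previous_end = None
    for payment_date, days_paid in sorted(payments):
        if previous_end is not None and previous_end >= payment_date:
            # Still covered when the payment arrives: start the day after.
            start = previous_end + timedelta(days=1)
        else:
            # First payment, or coverage had lapsed: start on the payment date.
            start = payment_date
        end = start + timedelta(days=days_paid - 1)
        windows.append((payment_date, days_paid, start, end))
        previous_end = end
    return windows

payments = [
    (date(2018, 1, 1), 30),
    (date(2018, 1, 29), 20),
    (date(2018, 2, 15), 30),
    (date(2018, 4, 1), 30),
]
for row in coverage_windows(payments):
    print(*row)
```

Each window depends on the computed end of the previous one, not on the previous row's raw values, which is why plain LAG/LEAD is not enough here.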

Most efficient way to get booking data for time period?

I have a booking system that allows a user to book places for 30 min timeslots (e.g. 1pm, 1:30pm, 2pm etc...)
In the sql database I may have one booking for 10am, a booking for 1pm and two for 2pm. I am trying to display a view of all 30 min booking slots in between a date time range displaying number of current bookings for each slot.
I am not storing each slot explicitly as it's not very efficient. Is there a way to make sql return 'empty' timeslots in a single query? I don't want to create a timeslot array then query each timeslot individually for the total count of bookings.
I am using sql server and asp.net mvc6 as my technology base. Some suggestions on technique would be appreciated.
Thanks.
You need to build a table of 30-minute intervals and LEFT JOIN it to your bookings table to get all time slots.
This query generates times at 30-minute intervals starting from startDate. A total of 12 time slots are generated; you can modify it accordingly.
declare @startDate datetime ='2014-01-12 12:00:00'
;with cte(value,nextval,n)
as
(
select CONVERT(VARCHAR(5),@startDate,108) as value,
dateadd(minute, datediff(minute, 0, @startDate)+30, 0) as nextval, 1 as n
union all
select CONVERT(VARCHAR(5),cte.nextval,108) as value,
dateadd(minute, datediff(minute, 0, cte.nextval)+30, 0) as nextval, n+1
from cte
-- the anchor row is n = 1, so stopping at n < 12 yields 12 rows in total
where n < 12
)
select * from cte
left join Table1
on cte.nextval = Table1.timeslotvalue
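The idea of generating every slot first and then attaching bookings to it can be sketched in Python as follows. This is an illustration under assumptions: bookings are taken to be stored as exact slot start times, and the name `slot_counts` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def slot_counts(bookings, start, slots=12, step=timedelta(minutes=30)):
    """Return (slot_start, booking_count) for every slot, including empty ones."""
    counts = Counter(bookings)  # a missing key counts as 0
    return [(start + i * step, counts[start + i * step]) for i in range(slots)]

bookings = [
    datetime(2014, 1, 12, 13, 0),
    datetime(2014, 1, 12, 14, 0),
    datetime(2014, 1, 12, 14, 0),
]
for slot, count in slot_counts(bookings, datetime(2014, 1, 12, 12, 0)):
    print(slot.strftime("%H:%M"), count)
```

In SQL the equivalent of the count is a GROUP BY on the generated slot after the LEFT JOIN, so empty slots still appear with a count of 0.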

How to get analysis result faster using Partition?

I have a table in SQL Server 2012, which has these 2 columns:
Date, Amount
I want to get the summary of a month, i.e. for May 2013. And I also want to get the summary of last month, the same month last year, and average of past 12 month. I know I can use GROUP BY to get the data for each month, then get all the data I need. However, the table has so many rows, I want to make it faster.
One possibility is to use Partition By
SELECT DISTINCT YEAR(Date), MONTH(Date), SUM(Amount) OVER (PARTITION BY YEAR(Date), MONTH(Date))
FROM myTable
However, how can I use this to get data like: last month, same month last year, and average of past 12 month?
Or do I need to use PARTITION BY to get the monthly data, and then use ROWS to get the rest?
Any ideas?
Thanks
The key idea is to first aggregate the data in a subquery or CTE. Then you can express the conditions you want using window functions:
SELECT yr, mon, amount,
LAG(Amount) OVER (ORDER BY yr*100+mon) as LastMonth,
LAG(Amount, 12) OVER (ORDER BY yr*100+mon) as LastYearMonth,
AVG(Amount) OVER (ORDER BY yr*100 + mon ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) as AvgLast12Months -- ROWS counts rows, so this assumes no month is missing
FROM (SELECT YEAR(Date) as yr, MONTH(Date) as mon, SUM(Amount) as Amount
FROM myTable
GROUP BY YEAR(Date), MONTH(Date)
) ym;
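To see what the window functions return once the data has been aggregated per month, here is a small Python sketch of the same lookups. It assumes the monthly totals are sorted and have no missing months; the function name `monthly_comparisons` and the sample amounts are made up for illustration.

```python
def monthly_comparisons(monthly_totals):
    """monthly_totals: list of ((year, month), amount), sorted, with no gaps."""
    rows = []
    for i, (period, amount) in enumerate(monthly_totals):
        last_month = monthly_totals[i - 1][1] if i >= 1 else None   # like LAG(Amount)
        last_year = monthly_totals[i - 12][1] if i >= 12 else None  # like LAG(Amount, 12)
        # Current row plus up to 11 preceding rows.
        window = [a for _, a in monthly_totals[max(0, i - 11): i + 1]]
        rows.append((period, amount, last_month, last_year, sum(window) / len(window)))
    return rows

# Hypothetical totals for May 2012 through May 2013: 100, 110, ..., 220.
totals = [((2012 + (4 + i) // 12, (4 + i) % 12 + 1), 100 + 10 * i) for i in range(13)]
print(monthly_comparisons(totals)[-1])  # ((2013, 5), 220, 210, 100, 165.0)
```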
