Factoring public holidays into a SQL query - sql-server

Apologies if this is a simple one. I'm looking for some help with the following:
SELECT *
FROM (
SELECT TOP 7
RIGHT (CONVERT (VARCHAR, CompletedDate, 108), 8) AS Time,
WorkType
FROM Table
WHERE WorkType = 'WorkType1'
OR DATEPART (DW, CompletedDate) IN ('7','1')
AND WorkType = 'WorkType2'
ORDER BY CompletedDate DESC) Table
ORDER BY CompletedDate ASC
Multiple events run every day, and the above searches for the last one scheduled to run each day, and pulls the time from it for the past 7 days. This time marks the end of the day's events, and is the value I'm after.
Events run in a different order on weekends, so I search for a different WorkType. WorkType1 is unique to weekdays. WorkType2 runs on both weekdays and weekends; however, it is not the final event on a weekday, so I don't search for it then.
However, this kind of falls apart when public holidays such as bank holidays come into play, as they use the weekend timings. I still need to capture these times, but the above skips over them. If I were to remove or expand the DATEPART, I would end up with duplicate values for each day that don't mark the final job of the day.
What changes can I make to capture these lost holiday timings without manually going through and checking every holiday? Is there a way to return a value for WorkType2 if WorkType1 does not appear on a given day?

I suggest a materialized calendar table with one row per date along with the desired WorkType for that day. That will allow you to simply join on to the calendar table to determine the proper WorkType value without embedding the logic in the query itself.
With this table loaded with all dates for your reporting domain:
CREATE TABLE dbo.WorkTypeCalendar(
CalendarDate date NOT NULL
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED
, WorkType varchar(10) NOT NULL
);
GO
The query can be refactored as below:
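SELECT Time, WorkType
FROM ( SELECT TOP 7
t.CompletedDate -- kept so the outer query can sort on it
, CONVERT(varchar(8), t.CompletedDate, 108) AS Time -- style 108 is hh:mi:ss
, t.WorkType
FROM Table1 AS t
-- the calendar row for the event's date decides which WorkType ends that day
JOIN dbo.WorkTypeCalendar AS c ON t.WorkType = c.WorkType
AND t.CompletedDate >= c.CalendarDate
AND t.CompletedDate < DATEADD(DAY, 1, c.CalendarDate)
ORDER BY t.CompletedDate DESC
) AS LastEvents
ORDER BY CompletedDate ASC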
You might also consider making this a generalized utility calendar table. See http://www.dbdelta.com/calendar-table-and-datetime-functions/ for a complete example of such a table and a script to load US holidays that you can adjust for your needs and locale.
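As a rough illustration of how such a calendar table might be populated, the Python sketch below builds one row per date and assigns the weekend WorkType to Saturdays, Sundays and any listed holiday. The function name `build_work_type_calendar` and the sample holiday are assumptions made for illustration; only the WorkType values come from the question.

```python
from datetime import date, timedelta

def build_work_type_calendar(start, end, holidays):
    """Return (calendar_date, work_type) rows for every date from start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        # weekday(): Monday is 0, Saturday is 5, Sunday is 6
        is_weekend = current.weekday() >= 5
        work_type = "WorkType2" if is_weekend or current in holidays else "WorkType1"
        rows.append((current, work_type))
        current += timedelta(days=1)
    return rows

# Hypothetical holiday list; 2023-05-01 fell on a Monday
holidays = {date(2023, 5, 1)}
calendar_rows = build_work_type_calendar(date(2023, 4, 28), date(2023, 5, 2), holidays)
for calendar_date, work_type in calendar_rows:
    print(calendar_date.isoformat(), work_type)
```

Rows produced this way could then be loaded into dbo.WorkTypeCalendar with whatever bulk-insert method you already use.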

Related

How to find difference between dates and find first purchase in an eCommerce database

I am using Microsoft SQL Server Management Studio. I am trying to measure the customer retention rate of an eCommerce site.
For this, I need four values:
customer_id
order_purchase_timestamp
age_by_month
first_purchase
The values of age_by_month and first_purchase are not in my database. I want to calculate them.
In my database, I have customer_id and order_purchase_timestamp.
The first_purchase should be the earliest instance of order_purchase_timestamp. I only want the month and year.
The age_by_month should be the difference of months from first_purchase to order_purchase_timestamp.
I only want to measure the retention of the customer for each month so if two purchases are made in the same month it shouldn't be shown.
The dates are between 2016-10-01 and 2018-09-30. The results should be ordered by order_purchase_timestamp.
An example
customer_id order_purchase_timestamp
1 2016-09-04
2 2016-09-05
3 2016-09-05
3 2016-09-15
1 2016-10-04
to
customer_id first_purchase age_by_month order_purchase_timestamp
1 2016-09 0 2016-09-04
2 2016-09 0 2016-09-05
3 2016-09 0 2016-09-05
1 2016-09 1 2016-10-04
What I have done
SELECT
customer_id, order_purchase_timestamp
FROM
orders
WHERE
(order_purchase_timestamp BETWEEN '2016-10-01' AND '2016-12-31')
OR (order_purchase_timestamp BETWEEN '2017-01-01' AND '2017-03-31')
OR (order_purchase_timestamp BETWEEN '2017-04-01' AND '2017-06-30')
OR (order_purchase_timestamp BETWEEN '2017-07-01' AND '2017-09-30')
OR (order_purchase_timestamp BETWEEN '2017-10-01' AND '2017-12-31')
OR (order_purchase_timestamp BETWEEN '2018-01-01' AND '2018-03-31')
OR (order_purchase_timestamp BETWEEN '2018-04-01' AND '2018-06-30')
OR (order_purchase_timestamp BETWEEN '2018-07-01' AND '2018-09-30')
ORDER BY
order_purchase_timestamp
Originally I was going to do it by quarters but I want to do it in months now.
The following approach is designed to be relatively easy to understand. There are other ways (e.g., windowed functions) that may be marginally more efficient; but this makes it easy to maintain at your current SQL skill level.
Note that the SQL commands below build on one another (so the answer is at the end). To follow along, here is a db<>fiddle with the working.
It's based around a simple query (which we'll use as a sub-query) that finds the first order_purchase_timestamp for each customer.
SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders
GROUP BY customer_id
The next thing is DATEDIFF to find the difference between 2 dates.
Then, you can use the above as a subquery to get the first date onto each row - then find the date difference e.g.,
SELECT orders.customer_id,
orders.order_purchase_timestamp,
first_purchases.first_purchase_date,
DATEDIFF(month, first_purchases.first_purchase_date, orders.order_purchase_timestamp) AS age_by_month
FROM orders
INNER JOIN
(SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders
GROUP BY customer_id
) AS first_purchases ON orders.customer_id = first_purchases.customer_id
Note - DATEDIFF has a 'gotcha' that catches most people but works in your favour here: when comparing months, it ignores the day component. For example, there is 0 difference in months between 1 Jan and 31 Jan. On the other hand, there is a difference of 1 month between 31 Jan and 1 Feb. However, I think this is actually what you want!
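To make that boundary-counting behaviour concrete, here is a small Python sketch that mimics what DATEDIFF(month, ...) returns, on the understanding that it counts month boundaries crossed rather than elapsed time. The helper `month_boundaries_crossed` is hypothetical and is not part of SQL Server.

```python
from datetime import date

def month_boundaries_crossed(start, end):
    """Count month boundaries between two dates, ignoring the day component."""
    return (end.year - start.year) * 12 + (end.month - start.month)

same_month = month_boundaries_crossed(date(2016, 1, 1), date(2016, 1, 31))
next_day = month_boundaries_crossed(date(2016, 1, 31), date(2016, 2, 1))
print(same_month, next_day)  # prints "0 1"
```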
The above, however, repeats when a customer has multiple purchases within the month (it has one row per purchase). Instead, we can GROUP BY to group by the month it's in, then only take the first purchase for that month.
A 'direct' approach to this would be to group on YEAR(orders.order_purchase_timestamp) AND MONTH(orders.order_purchase_timestamp). However, I use a little trick below - using EOMONTH which finds the last day of the month. EOMONTH returns the same date for any date in that month; therefore, we can group by that.
Finally, you can add the WHERE expression and ORDER BY to get the results you asked for (between the two dates)
SELECT orders.customer_id,
MIN(orders.order_purchase_timestamp) AS order_purchase_timestamp,
first_purchases.first_purchase_date,
DATEDIFF(month, first_purchases.first_purchase_date, EOMONTH(orders.order_purchase_timestamp)) AS age_by_month
FROM orders
INNER JOIN
(SELECT customer_id, MIN(order_purchase_timestamp) AS first_purchase_date
FROM orders AS orders_ref
GROUP BY customer_id
) AS first_purchases ON orders.customer_id = first_purchases.customer_id
WHERE orders.order_purchase_timestamp BETWEEN '20161001' AND '20180930'
GROUP BY orders.customer_id, first_purchases.first_purchase_date, EOMONTH(orders.order_purchase_timestamp)
ORDER BY order_purchase_timestamp;
Results - note they are different from yours because you wanted the earliest date to be 1/10/2016.
customer_id order_purchase_timestamp first_purchase_date age_by_month
1 2016-10-04 00:00:00.000 2016-09-04 00:00:00.000 1
Edit: Because someone else will do it like this otherwise!
You can do this with a single read-through that will potentially run a little faster. It is also a bit shorter - but harder to understand imo.
The query below uses window functions to calculate both the customer's earliest purchase and the earliest purchase for each month (and uses DISTINCT rather than a GROUP BY). With that, it just does the DATEDIFF to calculate the difference.
WITH monthly_orders AS
(SELECT DISTINCT orders.customer_id,
MIN(orders.order_purchase_timestamp) OVER (PARTITION BY orders.customer_id, EOMONTH(orders.order_purchase_timestamp)) AS order_purchase_timestamp,
MIN(orders.order_purchase_timestamp) OVER (PARTITION BY orders.customer_id) AS first_purchase_date
FROM orders)
SELECT *, DATEDIFF(month, first_purchase_date, order_purchase_timestamp) AS age_by_month
FROM monthly_orders
WHERE order_purchase_timestamp BETWEEN '20161001' AND '20180930';
Note, however, that this has one difference in the results. If you have 2 orders in a month and your lowest date filter falls between the two (e.g., orders on 15/10 and 20/10, and your minimum date is 16/10), then the row won't be included, as the earliest purchase in the month is outside the filter range.
Also beware, with both of these, of what type of date or datetime field you are using: if you have datetimes rather than just dates, BETWEEN '20161001' AND '20180930' is not the same as >= '20161001' AND < '20181001'.
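The difference is easy to see with one order placed late on the last day. This Python sketch compares the two filters on a single datetime value; the sample timestamp is made up, and the literal '20180930' is assumed to be read as midnight at the start of 30 September, which is how a date-only literal behaves when compared with a datetime.

```python
from datetime import datetime

order_time = datetime(2018, 9, 30, 14, 30)  # made-up order on the last afternoon

# BETWEEN '20161001' AND '20180930': upper bound is 2018-09-30 00:00:00, inclusive
in_between = datetime(2016, 10, 1) <= order_time <= datetime(2018, 9, 30)

# >= '20161001' AND < '20181001': half-open range covering the whole last day
in_half_open = datetime(2016, 10, 1) <= order_time < datetime(2018, 10, 1)

print(in_between, in_half_open)  # prints "False True"
```

With a plain date column the two filters select the same rows, so the half-open form is the safer habit either way.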
Here is a short query that achieves all you want (descriptions of the methods used are inline):
declare @test table (
customer_id int,
order_purchase_timestamp date
)
-- some test data
insert into @test values
(1, '2016-09-04'),
(2, '2016-09-05'),
(3, '2016-09-05'),
(3, '2016-09-15'),
(1, '2016-10-04');
select
customer_id,
-- takes care of correct display of first_purchase
format(first_purchase, 'yyyy-MM') first_purchase,
-- used to get the difference in months
datediff(m, first_purchase, order_purchase_timestamp) age_by_month,
order_purchase_timestamp
from (
select
*,
-- window function used to find min value for given column within group
-- for each row
min(order_purchase_timestamp) over (partition by customer_id) first_purchase
from @test
) a

How do I dynamically generate dates between two dates in Snowflake?

I've been searching for a good generate_series analog in Snowflake but what I've found so far is a bit limiting in scope. Most of the examples I've seen use rowcount but I need something more dynamic than that.
I have these columns:
location_id, subscription_id, start_date, end_date
The datediff of the date columns is usually a year but there are many instances where it isn't so I need to account for that.
How do I generate a gapless date range between my start and end dates?
Thank you!
There are several ways to approach this, but here's the way I do it with SQL Generator function Datespine_Groups.
The reason I like to do it this way is that it's flexible enough that I can add weekly, hourly, or monthly intervals between the dates and reuse the code.
The parameter group bounds changes the way the join happens in a subtle way that allows you to control how the dates get filtered out:
global - every location_id, subscription_id combination will start on the same start_date
local - every location_id, subscription_id has their own start/end dates based on the first and last values in the date column
mixed - every location_id, subscription_id has their own start/end dates, but they all share the same end date
Rather than try and make it perfect in 1 query, I think it's probably easier to generate it with mixed and then filter out where the group_start_date occurs after the end_date of your original data.
Here's the SQL. At the very beginning you can either (1) find a way to dynamically generate the 3 parameters, or (2) hard code a ridiculous range that'll last your career and let the rest of the query filter them out :)
You can change month to another datepart, I only assumed you were looking for monthly.
WITH GLOBAL_SPINE AS (
SELECT
ROW_NUMBER() OVER (
ORDER BY
NULL
) as INTERVAL_ID,
DATEADD(
'month',
(INTERVAL_ID - 1),
'2018-01-01T00:00' :: timestamp_ntz
) as SPINE_START,
DATEADD(
'month', INTERVAL_ID, '2018-01-01T00:00' :: timestamp_ntz
) as SPINE_END
FROM
TABLE (
GENERATOR(ROWCOUNT => 2192)
)
),
GROUPS AS (
SELECT
location_id,
subscription_id,
MIN(start_date) AS LOCAL_START,
MAX(start_date) AS LOCAL_END
FROM
My_First_Table
GROUP BY
location_id,
subscription_id
),
GROUP_SPINE AS (
SELECT
location_id,
subscription_id,
SPINE_START AS GROUP_START,
SPINE_END AS GROUP_END
FROM
GROUPS G CROSS
JOIN LATERAL (
SELECT
SPINE_START,
SPINE_END
FROM
GLOBAL_SPINE S
WHERE
S.SPINE_START >= G.LOCAL_START
)
)
SELECT
G.location_id AS GROUP_BY_location_id,
G.subscription_id AS GROUP_BY_subscription_id,
GROUP_START,
GROUP_END,
T.*
FROM
GROUP_SPINE G
LEFT JOIN My_First_Table T ON start_date >= G.GROUP_START
AND start_date < G.GROUP_END
AND G.location_id = T.location_id
AND G.subscription_id = T.subscription_id
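For comparison, the per-row logic of a gapless daily range is short when written outside SQL. The Python sketch below expands each subscription into one row per day from start_date to end_date inclusive. The function name `expand_subscription_days` and the sample rows are assumptions; the column names are taken from the question.

```python
from datetime import date, timedelta

def expand_subscription_days(subscriptions):
    """Yield (location_id, subscription_id, day) for every day in each subscription."""
    for location_id, subscription_id, start_date, end_date in subscriptions:
        day_count = (end_date - start_date).days + 1  # inclusive of both ends
        for offset in range(day_count):
            yield location_id, subscription_id, start_date + timedelta(days=offset)

# Made-up sample rows
sample = [
    (10, 1, date(2020, 2, 27), date(2020, 3, 1)),  # spans the leap day
    (10, 2, date(2021, 1, 1), date(2021, 1, 1)),   # single-day subscription
]
expanded = list(expand_subscription_days(sample))
for row in expanded:
    print(row[0], row[1], row[2].isoformat())
```

Because the length of each range comes from the row's own dates, subscriptions that are shorter or longer than a year need no special handling.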

Partition and group by query

I have a table containing the columns:
1. ClockifyId
2. StartTime and EndTime of every task
3. Date
4. Duration
The image is attached below
My goal is to write a query that calculates the total duration for every user (identified by ClockifyId) on every date.
As one user can have multiple tasks in one day, I want to sum the durations of all those tasks. In short,
I want the total task duration of every user (ClockifyId) for every date.
There are a couple of details missing here, but this should get you close enough.
The first thing you need to do is convert the StartTime and EndTime to datetime fields if they aren't already. Doing a DATEDIFF on them allows you to figure out per record what the difference in minutes is. You can change the unit of measure as needed.
Once you do that, you use SUM(), which is an aggregate function. This makes it necessary to use GROUP BY. You then group by the relevant fields, in this case the ClockifyId and the StartTime cast to a date. You have to group on the date without the time portion, or you will get multiple rows back for a single ClockifyId in a day.
SELECT
ClockifyId
, SUM(DATEDIFF(mi, CAST(StartTime AS datetime), CAST(EndTime AS datetime))) AS DurationInMinutes
, CAST(StartTime AS date)
FROM TableName
GROUP BY
ClockifyId
, CAST(StartTime AS date)
It's worth noting that this assumes there is always a valid StartTime and EndTime. Values that cannot be converted to datetime will raise conversion errors, and rows with a NULL in either column contribute nothing to the sum.
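The same aggregation can be checked by hand on a few rows. This Python sketch sums task minutes per (ClockifyId, date) pair, which mirrors the GROUP BY in the query; the sample rows and the function name `total_minutes_per_user_day` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

def total_minutes_per_user_day(tasks):
    """Sum task durations in minutes, keyed by (clockify_id, start date)."""
    totals = defaultdict(int)
    for clockify_id, start_time, end_time in tasks:
        # Elapsed whole minutes. DATEDIFF(mi, ...) counts minute boundaries instead,
        # so the two can differ by one when the values carry seconds.
        minutes = int((end_time - start_time).total_seconds() // 60)
        totals[(clockify_id, start_time.date())] += minutes
    return dict(totals)

# Made-up sample tasks
tasks = [
    ("u1", datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 10, 30)),
    ("u1", datetime(2021, 3, 1, 13, 0), datetime(2021, 3, 1, 13, 45)),
    ("u1", datetime(2021, 3, 2, 9, 0), datetime(2021, 3, 2, 9, 20)),
    ("u2", datetime(2021, 3, 1, 8, 0), datetime(2021, 3, 1, 9, 0)),
]
totals = total_minutes_per_user_day(tasks)
for (clockify_id, day), minutes in sorted(totals.items()):
    print(clockify_id, day.isoformat(), minutes)
```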

SQL Delete Rows with Duplicate Key Keeping Most Recent

Dearest Professionals,
I have a table that sometimes has rows created with duplicate invoice numbers (EMP_ID). In these rows, there are separate date (FILE_DATE) and time (FILE_TIME) columns (genius database design there). I need to remove the older rows for any duplicated EMP_ID in this table, keeping the row with the most recent date (from FILE_DATE) + time (from FILE_TIME).
Both FILE_DATE and FILE_TIME are date/time fields in the database. The software we use writes to this table, adding the date of the invoice to the FILE_DATE column as YYYY-MM-DD 00:00:00.000 (the zeros all hard coded). The FILE_TIME field then holds 1900-01-01 HH:mm:ss.SSS, with the 1900-01-01 hard coded (the timestamp comes from the time the row was written to the database).
So, long story short, I need to marry these two together, taking the date portion of FILE_DATE and the time portion of FILE_TIME, to find the most recent row (if duplicates of EMP_ID exist) and delete all duplicates that are not the most recent by the married FILE_DATE & FILE_TIME.
Here is a sample of what a Before & After situation would look like.
BEFORE:
AFTER:
Any and all help would be insanely appreciated.
Using some good old CTE "magic":
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY EMP_ID
ORDER BY FILE_DATE DESC, FILE_TIME DESC) AS RN
FROM YourTable)
DELETE FROM CTE
WHERE RN > 1;
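As a sketch of the 'marrying' step and the keep-the-newest rule, here is the same logic in Python so it can be tried without a database. The function name `keep_most_recent` and the sample rows are assumptions; the column layout (a date at midnight plus a time on 1900-01-01) follows the description in the question.

```python
from datetime import datetime

def keep_most_recent(rows):
    """Return one row per EMP_ID: the one with the latest combined date and time."""
    latest = {}
    for emp_id, file_date, file_time in rows:
        # Date portion of FILE_DATE married to the time portion of FILE_TIME
        stamp = datetime.combine(file_date.date(), file_time.time())
        if emp_id not in latest or stamp > latest[emp_id][0]:
            latest[emp_id] = (stamp, (emp_id, file_date, file_time))
    return [row for _, row in latest.values()]

# Made-up sample rows
rows = [
    (100, datetime(2019, 3, 1), datetime(1900, 1, 1, 9, 15)),
    (100, datetime(2019, 3, 1), datetime(1900, 1, 1, 16, 40)),
    (100, datetime(2019, 2, 27), datetime(1900, 1, 1, 23, 59)),
    (200, datetime(2019, 3, 2), datetime(1900, 1, 1, 8, 0)),
]
kept = keep_most_recent(rows)
```

Ordering by FILE_DATE DESC and then FILE_TIME DESC, as the CTE does, gives the same winner as comparing the combined value, because every FILE_TIME shares the same 1900-01-01 date part.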
I think this can be accomplished using MAX and GROUP BY:
select B.EMP_ID
, B.File_date
, Max(B.File_Time) as MaxFileTime
, B.DESC_TEXT_1
from Before B
group
by B.EMP_ID
, B.File_date
, B.DESC_TEXT_1

SQL Server Retrieving Recurring Appointments By Date

I'm working on a system to store appointments and recurring appointments. My schema looks like this
Appointment
-----------
ID
Start
End
Title
RecurringType
RecurringEnd
RecurringTypes
---------------
Id
Name
I've kept the Recurring Types simple and only support
Week Days,
Weekly,
4 Weekly,
52 Weekly
If RecurringType is null then that appointment does not recur. RecurringEnd is also nullable; if it's null but RecurringType has a value, the appointment recurs indefinitely. I'm trying to write a stored procedure to return all appointments and their dates for a given date range.
I've got the stored procedure working for non-recurring meetings but am struggling to work out the best way to return the recurrences. This is what I have so far:
ALTER PROCEDURE GetAppointments
(
@StartDate DATETIME,
@EndDate DATETIME
)
AS
SELECT
appointment.id,
appointment.title,
appointment.recurringType,
appointment.recurringEnd,
appointment.start,
appointment.[end]
FROM
mrm_booking AS appointment
WHERE
(
Start >= @StartDate AND
[End] <= @EndDate
)
I now need to add in the where clauses to also pick up the recurrences and alter what is returned in the select to return the Start and End Dates for normal meetings and the calculated start/end dates for the recurrences.
Any pointers on the best way to handle this would be great. I'm using SQL Server 2005
You need to store the recurring dates as individual rows in the schedule. That is, you need to expand the recurring dates on the initial save. Without doing this it is impossible (or extremely difficult) to expand them on the fly when you need to see them, check for conflicts, etc. This makes all appointments work the same way, since they will all actually have a row in the table to load. I would suggest that when a user specifies a recurring date, you make them pick an actual number of occurrences. When you go to save that recurring appointment, expand them all out as individual rows in the table. You could use a FK to a parent appointment row and link them like a linked list:
Appointment
-----------
ID
Start
End
Title
RecurringParentID FK to ID
sample data:
ID .... RecurringParentID
1 .... null
2 .... 1
3 .... 2
4 .... 3
5 .... 4
If, in the middle of a recurring appointment's run, say at ID=3, they decide to cancel, you can follow the chain and delete the remaining rows ID=3,4,5.
As for expanding the dates, you could use a CTE, a numbers table, a while loop, etc. If you need help doing that, just ask. The key is to save them as regular rows in the table so you don't need to expand them on the fly every time you need to display or evaluate them.
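A minimal sketch of the expansion step, written in Python rather than T-SQL, might look like the following. The recurrence names come from the question, while the function `expand_occurrences`, the `STEP_DAYS` mapping and the idea of passing an explicit occurrence count are assumptions for illustration.

```python
from datetime import date, timedelta

# Hypothetical mapping of recurrence names to a step in days
STEP_DAYS = {"Weekly": 7, "4 Weekly": 28, "52 Weekly": 364}

def expand_occurrences(start, recurring_type, count):
    """Return the first `count` occurrence dates of an appointment starting on `start`."""
    if recurring_type is None:
        return [start]  # non-recurring appointment: a single row
    if recurring_type == "Week Days":
        dates, current = [], start
        while len(dates) < count:
            if current.weekday() < 5:  # Monday to Friday
                dates.append(current)
            current += timedelta(days=1)
        return dates
    step = timedelta(days=STEP_DAYS[recurring_type])
    return [start + step * i for i in range(count)]

# 2023-05-04 fell on a Thursday
weekday_dates = expand_occurrences(date(2023, 5, 4), "Week Days", 4)
weekly_dates = expand_occurrences(date(2023, 5, 4), "Weekly", 3)
```

Each returned date would become its own appointment row, with RecurringParentID pointing at the previous row in the chain.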
I ended up doing this by creating a temp table of every day between the start and end dates, along with each day's day of the week. I limited the recurrence intervals to weekdays and a set number of weeks, and added where clauses like this:
--Check Week Days Reoccurrence
(
mrm_booking.repeat_type_id = 1 AND
#ValidWeeklyDayOfWeeks.dow IN (1,2,3,4,5)
) OR
--Check Weekly Reoccurrence
(
mrm_booking.repeat_type_id = 2 AND
DATEPART(WEEKDAY, mrm_booking.start_date) = #ValidWeeklyDayOfWeeks.dow
) OR
--Check 4 Weekly Reoccurences
(
mrm_booking.repeat_type_id = 3 AND
DATEDIFF(d,#ValidWeeklyDayOfWeeks.[Date],mrm_booking.start_date) % (7*4) = 0
) OR
--Check 52 Weekly Reoccurences
(
mrm_booking.repeat_type_id = 4 AND
DATEDIFF(d,#ValidWeeklyDayOfWeeks.[Date],mrm_booking.start_date) % (7*52) = 0
)
In case you're interested, I built up a table of the days between the start and end dates using this:
INSERT INTO #ValidWeeklyDayOfWeeks
--Get Valid Reoccurence Dates For Week Day Reoccurences
SELECT
DATEADD(d, offset - 1, @StartDate) AS [Date],
DATEPART(WEEKDAY,DATEADD(d, offset - 1, @StartDate)) AS Dow
FROM
(
SELECT ROW_NUMBER() OVER(ORDER BY s1.id) AS offset
FROM syscolumns s1, syscolumns s2
) a WHERE offset <= DATEDIFF(d, @StartDate, DATEADD(d,1,@EndDate))
It's not very elegant and probably very specific to my needs, but it does the job I needed it to do.

Resources