How do I dynamically generate dates between two dates in Snowflake? - snowflake-cloud-data-platform

I've been searching for a good generate_series analog in Snowflake, but what I've found so far is a bit limited in scope. Most of the examples I've seen use a fixed ROWCOUNT, but I need something more dynamic than that.
I have these columns:
location_id, subscription_id, start_date, end_date
The DATEDIFF between the two date columns is usually a year, but there are many rows where it isn't, so I need to account for that.
How do I generate a gapless date range between my start and end dates?
Thank you!

There are several ways to approach this, but here's how I do it with the SQL Generator function Datespine_Groups.
I like this approach because it's flexible enough that I can switch to weekly, hourly, or monthly intervals between the dates and reuse the code.
The group bounds parameter subtly changes how the join happens, which lets you control how the dates get filtered out:
global - every location_id, subscription_id combination will start on the same start_date
local - every location_id, subscription_id combination has its own start/end dates, based on the first and last values in the date column
mixed - every location_id, subscription_id combination has its own start date, but they all share the same end date
Rather than trying to make it perfect in one query, I think it's easier to generate it with mixed and then filter out the rows where the group start date falls after the end_date in your original data.
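To make the three group bounds options concrete, here is a small Python sketch of the idea on a daily spine. It is not the SQL generator's actual implementation. The names day_spine and group_spines, and the (group_key, day) row shape, are hypothetical.

```python
from datetime import date, timedelta


def day_spine(start: date, end: date) -> list[date]:
    """Every day from start to end, inclusive (a gapless spine)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def group_spines(rows, bounds="mixed"):
    """rows: iterable of (group_key, day). Returns {group_key: [days]}.

    global: every group spans the overall min..max day
    local:  every group spans its own min..max day
    mixed:  every group starts at its own min day but ends at the overall max
    """
    rows = list(rows)
    global_start = min(d for _, d in rows)
    global_end = max(d for _, d in rows)

    # First and last day seen for each group (like MIN/MAX ... GROUP BY).
    per_group = {}
    for key, d in rows:
        lo, hi = per_group.get(key, (d, d))
        per_group[key] = (min(lo, d), max(hi, d))

    spines = {}
    for key, (lo, hi) in per_group.items():
        if bounds == "global":
            spines[key] = day_spine(global_start, global_end)
        elif bounds == "local":
            spines[key] = day_spine(lo, hi)
        elif bounds == "mixed":
            spines[key] = day_spine(lo, global_end)
        else:
            raise ValueError(f"unknown bounds: {bounds}")
    return spines
```

With mixed bounds, you would then drop the days after each group's own end_date (for example `[d for d in spine if d <= end_date]`), which is the filtering step described above.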
Here's the SQL. At the very beginning you can either (1) find a way to generate the 3 parameters dynamically, or (2) hard-code a ridiculous range that'll last your career and let the rest of the query filter it down :)
You can change month to another date part; I only assumed you were looking for monthly intervals.
WITH GLOBAL_SPINE AS (
SELECT
ROW_NUMBER() OVER (
ORDER BY
NULL
) as INTERVAL_ID,
DATEADD(
'month',
(INTERVAL_ID - 1),
'2018-01-01T00:00' :: timestamp_ntz
) as SPINE_START,
DATEADD(
'month', INTERVAL_ID, '2018-01-01T00:00' :: timestamp_ntz
) as SPINE_END
FROM
TABLE (
GENERATOR(ROWCOUNT => 2192)
)
),
GROUPS AS (
SELECT
location_id,
subscription_id,
MIN(start_date) AS LOCAL_START,
MAX(start_date) AS LOCAL_END
FROM
My_First_Table
GROUP BY
location_id,
subscription_id
),
GROUP_SPINE AS (
SELECT
location_id,
subscription_id,
SPINE_START AS GROUP_START,
SPINE_END AS GROUP_END
FROM
GROUPS G CROSS
JOIN LATERAL (
SELECT
SPINE_START,
SPINE_END
FROM
GLOBAL_SPINE S
WHERE
S.SPINE_START >= G.LOCAL_START
)
)
SELECT
G.location_id AS GROUP_BY_location_id,
G.subscription_id AS GROUP_BY_subscription_id,
GROUP_START,
GROUP_END,
T.*
FROM
GROUP_SPINE G
LEFT JOIN My_First_Table T ON T.start_date >= G.GROUP_START
AND T.start_date < G.GROUP_END
AND G.location_id = T.location_id
AND G.subscription_id = T.subscription_id
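The SQL above buckets rows into monthly intervals. You may actually need every day between each row's own start_date and end_date (the gapless range asked about). If so, the core logic is just an inclusive day loop per row.

Here is a Python sketch of that logic, assuming rows shaped like the question's four columns. The name expand_rows is hypothetical. In Snowflake you would express roughly the same thing by joining a day-level GENERATOR spine on a condition like spine_day BETWEEN start_date AND end_date, where spine_day is a hypothetical column name.

```python
from datetime import date, timedelta


def expand_rows(rows):
    """Yield one (location_id, subscription_id, day) tuple for every day in each
    row's inclusive [start_date, end_date] range, whatever the range's length."""
    for location_id, subscription_id, start_date, end_date in rows:
        # .days + 1 makes the range include end_date itself.
        for offset in range((end_date - start_date).days + 1):
            yield location_id, subscription_id, start_date + timedelta(days=offset)
```

Because each row drives its own loop, a 3-day subscription and a 365-day subscription are handled the same way. No fixed ROWCOUNT is involved.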

Related

Binding tables with fulltext search query?

I've run into a problem. I have three tables: Product, which stores its price and name; Query, with attributes such as the description of the searched words and their frequency (so there are no duplicates); and UsersQuery, which stores each of a user's searches.
PRODUCT
id
price
name
QUERY
id
description_query
number_of_freq
USERSQUERY
id
query_id FK
user_id FK
timestamp
For each month of a given year and the following years (January 2018, February 2018, …), I have to calculate the ratio between the search queries that contain the product name and those that do not. If the ratio is undefined for a given month, the output should be NULL.
Does anyone know how this could be done?
So far I only have this:
select q.description_query,
to_char(uq.timestamp, 'YYYY-MM') as year_month
from usersquery as uq
join query as q ON q.id = uq.query_id;
But I don't really know how to join this to the product table using only its name attribute. Should I use some sort of full-text search with tsvector?
Unquoted table names are case-insensitive, so I use product, query, and user_query (see section 4.1, Lexical Structure, in the PostgreSQL manual).
I hope I understand correctly: number_of_freq is the number of times the query contains the product name, so number_of_freq = 0 means the query doesn't contain the product keyword.
Basically, use generate_series to generate the month series (to outer-join against later), and count(...) FILTER to count the rows where the frequency is 0.
Final code:
WITH cte AS (
SELECT
to_char(querytimestamp, 'YYYY-MM') AS tochar1,
count(number_of_freq) AS count_all,
count(number_of_freq) FILTER (WHERE number_of_freq = 0) AS count_0
FROM
query
JOIN user_query uq ON query.query_id = uq.query_id
WHERE
querytimestamp >= '2021-01-01 00:00' at time zone 'UTC'
AND querytimestamp <= '2022-12-31 23:59' at time zone 'UTC'
GROUP BY
1
),
cte2 (
yearmonth
) AS (
SELECT
to_char(g, 'YYYY-MM')
FROM
generate_series('2021-01-01', '2022-12-31', interval '1 month') g
)
SELECT
yearmonth,
cte.*,
round(cte.count_0::numeric / count_all, 2)
FROM
cte
RIGHT JOIN cte2 ON cte.tochar1 = yearmonth;
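To show what the query computes, here is the same logic as a Python sketch:

- a month list standing in for generate_series;
- per-month counts of all searches and of searches with number_of_freq = 0 (the FILTER clause);
- None (NULL) for months with no searches, which is what the RIGHT JOIN yields.

It mirrors the answer's count_0 / count_all ratio. If you want "contains" versus "doesn't contain" instead, that expression is the one to change. The function names and the (date, number_of_freq) input shape are hypothetical.

```python
from collections import Counter
from datetime import date


def month_keys(start: date, end: date) -> list[str]:
    """'YYYY-MM' for every month from start to end inclusive (like generate_series by month)."""
    keys = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        keys.append(f"{y:04d}-{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return keys


def monthly_ratio(searches, start: date, end: date) -> dict:
    """searches: iterable of (search_date, number_of_freq).
    Returns {month: ratio}, with None for months that have no searches."""
    total = Counter()  # count(number_of_freq)
    zero = Counter()   # count(number_of_freq) FILTER (WHERE number_of_freq = 0)
    for ts, freq in searches:
        key = f"{ts.year:04d}-{ts.month:02d}"
        total[key] += 1
        if freq == 0:
            zero[key] += 1
    # Iterating over the month series plays the role of the outer join.
    return {k: round(zero[k] / total[k], 2) if total[k] else None
            for k in month_keys(start, end)}
```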
As for counting word frequency, full-text search won't help, since it parses 'product.id' as a single token, 'product.id'. You may need regexp string-splitting functions instead.
Refer to the count-frequency demo for solving the word-count frequency issue.
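Here is a rough Python illustration of the regexp-split idea; in PostgreSQL the counterpart would be a function such as regexp_split_to_table. Each description is split on runs of non-word characters, so 'product.id' becomes 'product' and 'id'. Matching product names are then counted.

The function name and the assumption that product names are single words are hypothetical. Multi-word names would need a different match.

```python
import re
from collections import Counter


def product_mentions(description: str, product_names: list[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each product name.
    Assumes words are separated by anything that isn't a letter, digit, or underscore."""
    words = [w.lower() for w in re.split(r"\W+", description) if w]
    counts = Counter(words)
    # Keep only product names that actually occur.
    return Counter({name: counts[name.lower()] for name in product_names if counts[name.lower()]})
```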

Snowflake: Dateadd only weekdays

Is it possible to add only weekdays in a date function?
dateadd(day, 10, business_date)
Instead of returning next 10 days, is it possible to retrieve next 10 weekdays?
Regards,
Sridar
There are some semi-complicated functions for this in other languages that could be converted, but without knowing more, I'd generally recommend creating a calendar table.
In that table, you can label dates as weekdays and then join to and filter on that table.
This also lets you extend it to holidays with an additional IsHoliday flag.
Then you can get lists of dates with queries like this:
SELECT DateColumnValue, RANK() OVER(ORDER BY DATEKEY) RNK
FROM DIMDATE
WHERE DateColumnValue >= CURRENT_DATE
AND isWeekday = 1
AND isHoliday = 0
QUALIFY RNK <= 10
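For comparison, here is the counting rule the calendar table encodes, as a Python sketch. It skips Saturdays, Sundays, and any date in a holiday set, the equivalent of filtering on isWeekday and isHoliday.

Note one difference from the query above: this returns the single date that is n working days after start, excluding start itself. The query instead lists the next 10 weekdays starting from today, inclusive. The name add_business_days and the holiday set are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, n: int, holidays: frozenset = frozenset()) -> date:
    """Return the date n working days after start, skipping weekends and holidays."""
    if n < 0:
        raise ValueError("n must be non-negative")
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        # date.weekday(): Monday is 0 ... Friday is 4, Saturday 5, Sunday 6.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current
```

A calendar table remains the more maintainable choice once holidays vary by year or region. This loop only shows what the join and filter compute.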

How do I query most recent date in a table that's been converted from BIGINT format?

I have a table that contains one or more entries per user, by date. The date field is stored as a BIGINT. I’m able to convert it into a readable format using “DATEADD(SS, CONVERT(BIGINT, Create_Date__c), '19700101')”; however, I also need to retrieve only the most recent date for each user. Everything I’ve found indicates you can’t use the MAX function with a DATEADD function. Is there another command? I’m using SQL Server 17.9.1.
Those are Unix timestamps, encoded as the number of seconds since 1/1/1970. You don't need to convert them to dates to figure out which are the most recent, because a larger number is always a later time. You can select the most recent date per user as the keys of a derived table and then join that back to the original table:
select
dateadd( ss, orig.Create_Date__c, '19700101' ) as realDate,
--> other stuff you need here...
from
(
select
[user],
max( Create_Date__c ) [date]
from
someTable
group by
[user]
) as recent
inner join
someTable orig
on
recent.[user] = orig.[user]
and
recent.[date] = orig.Create_Date__c
BTW, in case you're wondering, I put the [user] and [date] column names in brackets because they're reserved words.
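The same "take the max first, convert afterwards" idea can be shown as a short Python sketch. The raw epoch integers are compared directly, and only the winning value per user is converted to a UTC datetime. The function name and the (user, epoch_seconds) row shape are hypothetical.

```python
from datetime import datetime, timezone


def latest_per_user(rows):
    """rows: iterable of (user, epoch_seconds).
    Returns {user: latest UTC datetime}, converting only each user's max value."""
    latest = {}
    for user, epoch in rows:
        # Plain integer comparison: a larger epoch is always a later instant.
        if user not in latest or epoch > latest[user]:
            latest[user] = epoch
    return {user: datetime.fromtimestamp(epoch, tz=timezone.utc) for user, epoch in latest.items()}
```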

Factoring public holidays in to a SQL code

Apologies if this is a simple one. I'm looking for some help with the following:
SELECT *
FROM (
SELECT TOP 7
RIGHT (CONVERT (VARCHAR, CompletedDate, 108), 8) AS Time,
WorkType
FROM Table
WHERE WorkType = 'WorkType1'
OR DATEPART (DW, CompletedDate) IN ('7','1')
AND WorkType = 'WorkType2'
ORDER BY CompletedDate DESC) Table
ORDER BY CompletedDate ASC
Multiple events run every day; the query above finds the last one scheduled to run each day and pulls its time, for the past 7 days. That time marks the end of the day's events and is the value I'm after.
Events run in a different order on weekends, so I search for a different WorkType. WorkType1 is unique to weekdays. WorkType2 runs on both weekdays and weekends, but it is not the final event on a weekday, so I don't search for it then.
However, this kind of falls apart when public holidays such as bank holidays come into play, as they use the weekend timings. I still need to capture these times, but the above skips over them. If I were to remove or expand the DATEPART, I would end up with duplicate values for each day that don't mark the final job of the day.
What changes can I make to capture these lost holiday timings without manually going through and checking every holiday? Is there a way to return a value for WorkType2 if WorkType1 does not appear on a day?
I suggest a materialized calendar table with one row per date along with the desired WorkType for that day. That lets you simply join to the calendar table to determine the proper WorkType value without embedding the logic in the query itself.
With this table loaded with all dates for your reporting domain:
CREATE TABLE dbo.WorkTypeCalendar(
CalendarDate date NOT NULL
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED
, WorkType varchar(10) NOT NULL
);
GO
The query can be refactored as below:
SELECT Time, WorkType
FROM ( SELECT TOP 7
RIGHT(CONVERT (varchar, t.CompletedDate, 108), 8) AS Time
, t.WorkType
, t.CompletedDate -- projected so the outer ORDER BY can reference it
FROM Table1 AS t
JOIN WorkTypeCalendar AS c ON t.WorkType = c.WorkType
AND t.CompletedDate >= c.CalendarDate
AND t.CompletedDate < DATEADD(DAY,
1,
c.CalendarDate)
ORDER BY t.CompletedDate DESC
) Table1
ORDER BY CompletedDate ASC
You might also consider making this a general-purpose utility calendar table. See http://www.dbdelta.com/calendar-table-and-datetime-functions/ for a complete example of such a table, plus a script to load US holidays that you can adjust for your needs and locale.
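Here is a Python sketch of what the calendar table does for this query:

- each date maps to the WorkType that marks its last event;
- weekends and listed holidays get the weekend WorkType;
- events are kept only when their WorkType matches their day's calendar entry, and the latest time per day is returned.

Reading 'WorkType2' as the weekend and holiday type follows the question. The function names and the holiday set are hypothetical.

```python
from datetime import date, datetime, timedelta


def build_worktype_calendar(start: date, end: date, holidays: set) -> dict:
    """One entry per date: weekends and holidays use the weekend schedule ('WorkType2'),
    all other days use 'WorkType1'."""
    cal = {}
    d = start
    while d <= end:
        weekend_like = d.weekday() >= 5 or d in holidays  # Saturday=5, Sunday=6
        cal[d] = "WorkType2" if weekend_like else "WorkType1"
        d += timedelta(days=1)
    return cal


def last_event_times(events, cal: dict) -> dict:
    """events: iterable of (completed_at: datetime, work_type: str).
    Keeps events whose work type matches the calendar entry for their day
    (the JOIN), then returns the latest matching time per day as HH:MM:SS."""
    latest = {}
    for completed_at, work_type in events:
        day = completed_at.date()
        if cal.get(day) != work_type:
            continue
        if day not in latest or completed_at > latest[day]:
            latest[day] = completed_at
    return {day: ts.strftime("%H:%M:%S") for day, ts in sorted(latest.items())}
```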

SQL Server Retrieving Recurring Appointments By Date

I'm working on a system to store appointments and recurring appointments. My schema looks like this:
Appointment
-----------
ID
Start
End
Title
RecurringType
RecurringEnd
RecurringTypes
---------------
Id
Name
I've kept the recurring types simple and only support:
Week Days,
Weekly,
4 Weekly,
52 Weekly
If RecurringType is null, the appointment does not recur. RecurringEnd is also nullable; if it's null but RecurringType has a value, the appointment recurs indefinitely. I'm trying to write a stored procedure to return all appointments and their dates for a given date range.
I've got the stored procedure working for non-recurring meetings, but I'm struggling to work out the best way to return the recurrences. This is what I have so far:
ALTER PROCEDURE GetAppointments
(
@StartDate DATETIME,
@EndDate DATETIME
)
AS
SELECT
appointment.id,
appointment.title,
appointment.recurringType,
appointment.recurringEnd,
appointment.start,
appointment.[end]
FROM
mrm_booking AS appointment
WHERE
(
Start >= @StartDate AND
[End] <= @EndDate
)
I now need to add in the where clauses to also pick up the recurrences and alter what is returned in the select to return the Start and End Dates for normal meetings and the calculated start/end dates for the recurrences.
Any pointers on the best way to handle this would be great. I'm using SQL Server 2005.
You need to store each recurring date as an individual row in the schedule; that is, you need to expand the recurring dates on the initial save. Without doing this, it is extremely difficult (if not impossible) to expand them on the fly when you need to see them, check for conflicts, etc. This makes all appointments work the same, since they will all actually have a row in the table to load. I would suggest that when a user specifies a recurring appointment, you make them pick an actual number of occurrences. When you save that recurring appointment, expand all the occurrences out as individual rows in the table. You could use a FK to a parent appointment row and link them like a linked list:
Appointment
-----------
ID
Start
End
Title
RecurringParentID FK to ID
sample data:
ID .... RecurringParentID
1 .... null
2 .... 1
3 .... 2
4 .... 3
5 .... 4
If, partway through the run of recurring appointments (say at ID=3), they decide to cancel the rest, you can follow the chain and delete the remaining rows, ID=3, 4, 5.
As for expanding the dates, you could use a CTE, a numbers table, a while loop, etc. If you need help doing that, just ask. The key is to save them as regular rows in the table so you don't need to expand them on the fly every time you need to display or evaluate them.
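Here is a minimal Python sketch of the save-time expansion and the linked-list cancel described above. The Appointment dataclass, expand_recurrence, cancel_from, and the fixed repeat interval are hypothetical. In SQL Server you would insert these as rows rather than build objects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Appointment:
    id: int
    start: datetime
    end: datetime
    title: str
    recurring_parent_id: Optional[int]  # FK to the previous occurrence, None for the first


def expand_recurrence(first_id, start, end, title, every: timedelta, occurrences: int):
    """Materialize a recurring appointment as individual rows, each linked to the previous one."""
    rows = []
    for i in range(occurrences):
        parent = rows[-1].id if rows else None
        offset = every * i
        rows.append(Appointment(first_id + i, start + offset, end + offset, title, parent))
    return rows


def cancel_from(rows, appointment_id):
    """Follow the chain forward from appointment_id, removing it and every later occurrence."""
    child_of = {r.recurring_parent_id: r.id for r in rows if r.recurring_parent_id is not None}
    to_delete = set()
    current = appointment_id
    while current is not None:
        to_delete.add(current)
        current = child_of.get(current)
    return [r for r in rows if r.id not in to_delete]
```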
I ended up doing this by creating a temp table of every day between the start and end dates, along with each day's day of the week. I limited the recurrence intervals to weekdays and a set number of weeks, and added WHERE clauses like this:
--Check Week Days Recurrence
--(dow values depend on SET DATEFIRST: 1-5 means Monday-Friday only when DATEFIRST is 1;
-- under the us_english default of 7, Monday-Friday are 2-6)
(
mrm_booking.repeat_type_id = 1 AND
#ValidWeeklyDayOfWeeks.dow IN (1,2,3,4,5)
) OR
--Check Weekly Recurrence
(
mrm_booking.repeat_type_id = 2 AND
DATEPART(WEEKDAY, mrm_booking.start_date) = #ValidWeeklyDayOfWeeks.dow
) OR
--Check 4 Weekly Recurrences
(
mrm_booking.repeat_type_id = 3 AND
DATEDIFF(d,#ValidWeeklyDayOfWeeks.[Date],mrm_booking.start_date) % (7*4) = 0
) OR
--Check 52 Weekly Recurrences
(
mrm_booking.repeat_type_id = 4 AND
DATEDIFF(d,#ValidWeeklyDayOfWeeks.[Date],mrm_booking.start_date) % (7*52) = 0
)
In case you're interested, I built up the table of days between the start and end dates using this:
INSERT INTO #ValidWeeklyDayOfWeeks
--Get Valid Recurrence Dates For Week Day Recurrences
SELECT
DATEADD(d, offset - 1, @StartDate) AS [Date],
DATEPART(WEEKDAY,DATEADD(d, offset - 1, @StartDate)) AS Dow
FROM
(
--cross join of syscolumns is just a cheap source of enough rows to number
SELECT ROW_NUMBER() OVER(ORDER BY s1.id) AS offset
FROM syscolumns s1, syscolumns s2
) a WHERE offset <= DATEDIFF(d, @StartDate, DATEADD(d,1,@EndDate))
It's not very elegant and probably very specific to my needs, but it does the job I needed it to do.
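The four WHERE-clause rules can be restated as a Python sketch over a day range. Python's date.weekday() does not depend on any DATEFIRST-style setting (Monday is always 0).

The repeat-type ids 1 to 4 follow the clauses above. Unlike that SQL, this sketch also skips days before the first occurrence, which is an added assumption. The function names are hypothetical.

```python
from datetime import date, timedelta


def occurs_on(repeat_type: int, first_start: date, day: date) -> bool:
    """1 = week days, 2 = weekly, 3 = every 4 weeks, 4 = every 52 weeks."""
    if day < first_start:  # assumption: nothing before the first occurrence
        return False
    gap = (day - first_start).days
    if repeat_type == 1:
        return day.weekday() < 5  # Monday..Friday
    if repeat_type == 2:
        return gap % 7 == 0
    if repeat_type == 3:
        return gap % (7 * 4) == 0
    if repeat_type == 4:
        return gap % (7 * 52) == 0
    raise ValueError(f"unknown repeat_type {repeat_type}")


def occurrences(repeat_type: int, first_start: date, range_start: date, range_end: date) -> list[date]:
    """Dates in the inclusive [range_start, range_end] on which the appointment occurs."""
    days = (range_end - range_start).days + 1
    candidates = (range_start + timedelta(days=i) for i in range(days))
    return [d for d in candidates if occurs_on(repeat_type, first_start, d)]
```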
