Generate a date range in Snowflake within a tabular UDF

I'm trying to replace a PostgreSQL function that generates a list of timestamps relative to the current timestamp: the series starts [X] [start datepart]s ago, ends at the current timestamp, and steps in intervals of one [end datepart]. For example, a user could ask for a series that starts five days ago and returns a timestamp for every hour.
I've found plenty of examples that do something like this using the GENERATOR function. This is how it is done now:
CREATE OR REPLACE FUNCTION fn_get_timestamps_in_range(grain VARCHAR, start_tsmp TIMESTAMP_TZ, maxrange NUMERIC(38,0))
RETURNS TABLE(out_tsmp TIMESTAMP_TZ)
LANGUAGE SQL
STRICT
AS
$$
SELECT CONVERT_TIMEZONE('UTC', CASE LOWER(grain)
WHEN 'min' THEN DATEADD('min', ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
WHEN 'hour' THEN DATEADD('hour', ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
WHEN 'day' THEN DATEADD('day', ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
WHEN 'month' THEN DATEADD('month',ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
WHEN 'year' THEN DATEADD('year', ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
END) FROM TABLE(GENERATOR(ROWCOUNT=>maxrange)) ORDER BY 1
$$
;
However, to use that function I have to provide maxrange -- i.e. know how many rows are going to be returned -- so I have to do a DATEDIFF([datepart], start, end). I have this in a function:
CREATE OR REPLACE FUNCTION fn_get_datediff(grain VARCHAR, start_tsmp TIMESTAMP_TZ, end_tsmp TIMESTAMP_TZ)
RETURNS NUMBER(18,0)
LANGUAGE SQL
STRICT
AS
$$
CASE LOWER(grain)
WHEN 'min' THEN DATEDIFF('min', start_tsmp, end_tsmp)
WHEN 'hour' THEN DATEDIFF('hour', start_tsmp, end_tsmp)
WHEN 'day' THEN DATEDIFF('day', start_tsmp, end_tsmp)
WHEN 'month' THEN DATEDIFF('month',start_tsmp, end_tsmp)
WHEN 'year' THEN DATEDIFF('year', start_tsmp, end_tsmp)
ELSE -1
END + 1::NUMERIC(18,0)
$$
;
Try as I might, GENERATOR will not accept that result: I keep getting the "argument 1 to function GENERATOR needs to be constant" error. Even if the calculation is performed in a calling UDF, the error still occurs.
For example:
SELECT * FROM TABLE(fn_get_timestamps_in_range(
$end_grain
,fn_get_offset_start_tsmp($start_grain, 1 - $start_range, $time_zone)
,fn_get_datediff($end_grain
,etl.fn_get_offset_start_tsmp($start_grain, 1 - $start_range, $time_zone)
,etl.fn_normalize_date($end_grain, CURRENT_TIMESTAMP, $time_zone))
)
)
;
...returns the error:
SQL compilation error: argument 1 to function GENERATOR needs to be constant, found '(CAST(DATE_DIFFTIMESTAMPINMINUTES(DATE_ADDMINUTESTOTIMESTAMP(-19, ENSURE_NULLABLE('2022-01-05 21:01:00.000000000Ztz=1080')), ENSURE_NULLABLE('2022-01-05 21:01:00.000000000Ztz=1080')) AS NUMBER(18,0))) + 1'
If I replace the datediff call with a number, it works fine.
How can I get the GENERATOR function to accept the values I need?

Found the solution: I set a static ROWCOUNT for the GENERATOR and then added a QUALIFY clause to limit the output to the first maxrange rows. Note that the static value (10000 here) also caps the length of the series. Sorry if I wasted anyone's time.
The resulting code looked like:
CREATE OR REPLACE FUNCTION fn_get_timestamps_in_range(grain VARCHAR, start_tsmp TIMESTAMP_TZ, end_tsmp TIMESTAMP_TZ)
RETURNS TABLE(out_tsmp TIMESTAMP_TZ)
LANGUAGE SQL
STRICT
AS
$$
SELECT CONVERT_TIMEZONE('UTC', CASE LOWER(grain)
WHEN 'min' THEN DATEADD('min', ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
WHEN 'hour' THEN DATEADD('hour', ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
WHEN 'day' THEN DATEADD('day', ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
WHEN 'month' THEN DATEADD('month',ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
WHEN 'year' THEN DATEADD('year', ROW_NUMBER() OVER(ORDER BY NULL) - 1, start_tsmp)
END) AS tsmp
FROM TABLE(GENERATOR(ROWCOUNT=>10000))
QUALIFY ROW_NUMBER() OVER(ORDER BY NULL) < fn_get_datediff(grain, start_tsmp, end_tsmp) + 1
ORDER BY 1
$$
;
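
The same idea (always generate a constant number of candidate rows, then keep only as many as the range needs) can be sketched outside Snowflake. This is a minimal Python illustration, not the UDF itself: the names `STEP`, `MAX_ROWS` and `timestamps_in_range` are hypothetical, only fixed-length grains are covered, and the row count uses elapsed whole steps, which matches DATEDIFF only when both timestamps are aligned to the grain.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical names; month and year grains are left out of this sketch
# because they are not fixed-length steps.
STEP = {
    "min": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}
MAX_ROWS = 10_000  # plays the role of the constant ROWCOUNT


def timestamps_in_range(grain, start, end):
    step = STEP[grain.lower()]
    # Rows wanted: whole steps between start and end, plus the start itself.
    wanted = (end - start) // step + 1
    # Always walk the same fixed number of candidate rows, then keep only
    # the first `wanted` of them (the QUALIFY-style filter).
    return [start + i * step for i in range(MAX_ROWS) if i < wanted]


start = datetime(2022, 1, 5, 0, 0, tzinfo=timezone.utc)
end = datetime(2022, 1, 5, 5, 0, tzinfo=timezone.utc)
series = timestamps_in_range("hour", start, end)
print(len(series), series[0], series[-1])
```

As with the static ROWCOUNT, a range that needs more than `MAX_ROWS` steps is silently cut off, so the constant has to be chosen larger than any range callers are expected to request.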

Related

Invalid length parameter passed to the LEFT or SUBSTRING function while trying to sort by alphanumeric value

I'm trying to sort a column that has alpha numeric values, I found this code on another post and when I execute it I'm getting an error.
This is my query:
SELECT
pmno, enrollno, membername, addr, photo,
CAST(insdate AS DATE) AS reg_date
FROM
dbo.Member
WHERE
CAST(insdate as DATE) < '2020-01-20'
AND court_name = 'City Court Unit'
ORDER BY
LEFT(pmno, PATINDEX('%[0-9]%', pmno) - 1), -- alphabetical sort
CONVERT(INT, SUBSTRING(pmno, PATINDEX('%[0-9]%', pmno), LEN(pmno))) -- numerical

The problem is that PATINDEX returns 0 for values that contain no digit, so LEFT is called with a length of -1.
You can use a combination of NULLIF and ISNULL to work around that.
SELECT
pmno, enrollno, membername, addr, photo,
CAST(insdate AS DATE) AS reg_date
FROM
dbo.Member
WHERE
insdate < '2020-01-20'
AND court_name = 'City Court Unit'
ORDER BY
ISNULL(LEFT(pmno, NULLIF(PATINDEX('%[0-9]%', pmno), 0) - 1), pmno), -- alphabetical sort
CONVERT(INT, SUBSTRING(pmno, NULLIF(PATINDEX('%[0-9]%', pmno), 0), LEN(pmno)))
Side note: I strongly suggest you do not use CAST(insdate as DATE) < '2020-01-20' as a filter, as it may not use indexes properly. Instead, use insdate < '2020-01-20'.
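
To make the two-part sort key easier to follow, here is a small Python sketch of the same logic: split each value at its first digit, sort by the text before it, then by the number after it, and fall back to the whole value when there is no digit. The function name `pmno_sort_key` and the sample values are made up, and the sketch assumes that everything from the first digit onward is numeric (the CONVERT(INT, ...) in the query makes the same assumption).

```python
import re


def pmno_sort_key(pmno):
    match = re.search(r"[0-9]", pmno)
    if match is None:
        # No digit at all: the whole value is the alphabetical part and the
        # numeric part is missing. The 0 flag sorts it before numbered rows
        # with the same prefix, similar to NULL in an ascending ORDER BY.
        return (pmno, 0, 0)
    prefix = pmno[:match.start()]
    number = int(pmno[match.start():])  # assumes only digits follow
    return (prefix, 1, number)


values = ["B2", "A10", "ABC", "A2"]
ordered = sorted(values, key=pmno_sort_key)
print(ordered)
```

Sorting on the integer rather than the string is what puts A2 ahead of A10.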

Can't Seem to Insert CTE Into Table

My CTE runs fine and gives me the numbers I expect, but I can't seem to insert the results into a table. I did some research online before posting, and the setup seems correct based on what I've seen, but I must be missing a step somewhere, because this script doesn't work. Can someone see something that I can't? I am on SQL Server 2008.
with cte as
(
select
*, rn = row_number() over (partition by Credit_Line_NO order by REVIEW_FREQUENCY)
from TBL_FBNK_LIMIT_HIST
)
(select CREDIT_LINE_NO
,LIMIT_CURRENCY
,(CAST(AVAIL_AMT AS DECIMAL(30,15)) * (CAST(SUBSTRING(CUSIP_NO,1,CHARINDEX('%',CUSIP_NO)-1) AS DECIMAL(30,15))/100))/(12/CAST(LEFT(SUBSTRING(REVIEW_FREQUENCY, CHARINDEX('M',review_frequency)+1,LEN(REVIEW_FREQUENCY)),2) AS DECIMAL)) AS AMOUNT
,REVIEW_FREQUENCY
,CAST(LEFT(REVIEW_FREQUENCY, 8) AS DATE) AS STARTDATE
,CAST(EXPIRY_DATE AS DATE) AS EXPIRY_DATE
,CAST(round((DATEDIFF(MONTH,cast(LEFT(REVIEW_FREQUENCY,8) as DATE),CAST(EXPIRY_DATE AS DATE)))/cast(LEFT(SUBSTRING (REVIEW_FREQUENCY, CHARINDEX('M',review_frequency)+1,LEN(REVIEW_FREQUENCY)),2) as decimal)+0.4,0) AS INTEGER) AS FREQUENCY
,CAST(DATEADD(MONTH, (rn-1)* LEFT((SUBSTRING(REVIEW_FREQUENCY, CHARINDEX('M',review_frequency)+1,LEN(REVIEW_FREQUENCY))),2),LEFT(REVIEW_FREQUENCY, 8)) AS DATE) AS EFFECTIVESTARTDATE
FROM cte
WHERE AVAIL_AMT NOT LIKE '%]%'
AND CUSIP_NO IS NOT NULL
AND CUSIP_NO <> '0'
AND AVAIL_AMT <> '0'
AND AVAIL_AMT IS NOT NULL)
INSERT TBL_FBNK_LIMIT_HIST_TRANS_SPLIT (CREDIT_LINE_NO,LIMIT_CURRENCY,AMOUNT,REVIEW_FREQUENCY,START_DATE,EXPIRY_DATE,FREQUENCY,AsOfDate,EFFECTIVESTARTDATE)
Select CREDIT_LINE_NO,LIMIT_CURRENCY,AMOUNT,REVIEW_FREQUENCY,START_DATE,EXPIRY_DATE,FREQUENCY,AsOfDate,EFFECTIVESTARTDATE
From cte
Thanks!

You are not really using the CTE in the insert. Try it like this:
with cte as
(
select
*, rn = row_number() over (partition by Credit_Line_NO order by REVIEW_FREQUENCY)
from TBL_FBNK_LIMIT_HIST
),
cte2 as
(select CREDIT_LINE_NO
,LIMIT_CURRENCY
,(CAST(AVAIL_AMT AS DECIMAL(30,15)) * (CAST(SUBSTRING(CUSIP_NO,1,CHARINDEX('%',CUSIP_NO)-1) AS DECIMAL(30,15))/100))/(12/CAST(LEFT(SUBSTRING(REVIEW_FREQUENCY, CHARINDEX('M',review_frequency)+1,LEN(REVIEW_FREQUENCY)),2) AS DECIMAL)) AS AMOUNT
,REVIEW_FREQUENCY
,CAST(LEFT(REVIEW_FREQUENCY, 8) AS DATE) AS STARTDATE
,CAST(EXPIRY_DATE AS DATE) AS EXPIRY_DATE
,CAST(round((DATEDIFF(MONTH,cast(LEFT(REVIEW_FREQUENCY,8) as DATE),CAST(EXPIRY_DATE AS DATE)))/cast(LEFT(SUBSTRING (REVIEW_FREQUENCY, CHARINDEX('M',review_frequency)+1,LEN(REVIEW_FREQUENCY)),2) as decimal)+0.4,0) AS INTEGER) AS FREQUENCY
,CAST(DATEADD(MONTH, (rn-1)* LEFT((SUBSTRING(REVIEW_FREQUENCY, CHARINDEX('M',review_frequency)+1,LEN(REVIEW_FREQUENCY))),2),LEFT(REVIEW_FREQUENCY, 8)) AS DATE) AS EFFECTIVESTARTDATE
FROM cte
WHERE AVAIL_AMT NOT LIKE '%]%'
AND CUSIP_NO IS NOT NULL
AND CUSIP_NO <> '0'
AND AVAIL_AMT <> '0'
AND AVAIL_AMT IS NOT NULL)
INSERT TBL_FBNK_LIMIT_HIST_TRANS_SPLIT (CREDIT_LINE_NO,LIMIT_CURRENCY,AMOUNT,REVIEW_FREQUENCY,START_DATE,EXPIRY_DATE,FREQUENCY,AsOfDate,EFFECTIVESTARTDATE)
Select CREDIT_LINE_NO,LIMIT_CURRENCY,AMOUNT,REVIEW_FREQUENCY,START_DATE,EXPIRY_DATE,FREQUENCY,AsOfDate,EFFECTIVESTARTDATE
From cte2;
The code looks redundant, but at least it would work the way you think it should.

You can only use the CTE in the statement for which it is defined. In your case, you have the CTE definition, and a SELECT statement that reads the CTE.
Then, you have a totally separate statement which attempts to read the CTE again for the INSERT. This is not permitted, because the CTE does not exist in the second query's context. So, from the perspective of your INSERT statement, the CTE does not exist. I'm sure you're getting this message:
Msg 208, Level 16, State 1, Line [x] Invalid object name 'cte'.
Get rid of the SELECT statement and replace it with your INSERT.
Alternatively, if you must have the SELECT statement used in both the SELECT and INSERT statements, a CTE may not be appropriate for the use case, or you will need to include the CTE definition for both the SELECT and INSERT.
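
The scoping rule can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine, so the error text differs from SQL Server's Msg 208, and the table name `target` is invented for the demo. The behaviour it shows is the same: a CTE is visible only to the single statement it is attached to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (x INTEGER)")

cte_sql = "WITH cte AS (SELECT 1 AS x UNION ALL SELECT 2) "

# Statement 1: the CTE is defined for, and consumed by, this SELECT only.
conn.execute(cte_sql + "SELECT x FROM cte").fetchall()

# Statement 2: a separate INSERT cannot see the CTE from statement 1.
try:
    conn.execute("INSERT INTO target (x) SELECT x FROM cte")
    second_statement_error = None
except sqlite3.OperationalError as exc:
    second_statement_error = str(exc)

# Attaching the CTE definition directly to the INSERT works.
conn.execute(cte_sql + "INSERT INTO target (x) SELECT x FROM cte")
rows = conn.execute("SELECT x FROM target ORDER BY x").fetchall()
print(second_statement_error, rows)
```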

It should be working fine unless those columns are not in TBL_FBNK_LIMIT_HIST_TRANS_SPLIT or the data types do not match.
That select in the middle is not part of the insert.
INSERT TBL_FBNK_LIMIT_HIST_TRANS_SPLIT
(CREDIT_LINE_NO, LIMIT_CURRENCY, AMOUNT, REVIEW_FREQUENCY, START_DATE, EXPIRY_DATE, FREQUENCY, AsOfDate, EFFECTIVESTARTDATE)
Select CREDIT_LINE_NO, LIMIT_CURRENCY, AMOUNT, REVIEW_FREQUENCY, START_DATE, EXPIRY_DATE, FREQUENCY, AsOfDate, EFFECTIVESTARTDATE
From cte

SQL Server contiguous dates - summarizing multiple rows into contiguous start and end date rows without CTE's, loops,...s

Is it possible to write a SQL query that will summarize rows with start and end dates into rows that have contiguous start and end dates?
The constraint is that it has to be regular SQL, i.e. no CTEs, loops and the like, as a third-party tool is used that only allows a SQL statement to start with SELECT.
e.g.:
ID StartDate EndDate
1001, Jan-1-2018, Jan-04-2018
1002, Jan-5-2018, Jan-13-2018
1003, Jan-14-2018, Jan-18-2018
1004, Jan-25-2018, Feb-05-2018
The required output needs to be:
Jan-1-2018, Jan-18-2018
Jan-25-2018, Feb-05-2018
Thank you

You can use window functions together with a concept called gaps-and-islands. In your case, each run of contiguous dates is an island, and the gaps are self-explanatory.
I wrote the answer below in a verbose way to make it clear what the query is doing; it could most likely be written more concisely. Please see the comments in the query explaining what each step (sub-query) does.
--Determine Final output
select min(c.StartDate) as StartDate
, max(c.EndDate) as EndDate
from (
--Assign a number to each group of Contiguous Records
select b.ID
, b.StartDate
, b.EndDate
, b.EndDatePrev
, b.IslandBegin
, sum(b.IslandBegin) over (order by b.ID asc) as IslandNbr
from (
--Determine if its Contiguous (IslandBegin = 1, means its not Contiguous with previous record)
select a.ID
, a.StartDate
, a.EndDate
, a.EndDatePrev
, case when a.EndDatePrev is NULL then 1
when datediff(d, a.EndDatePrev, a.StartDate) > 1 then 1
else 0
end as IslandBegin
from (
--Determine Prev End Date
select tt.ID
, tt.StartDate
, tt.EndDate
, lag(tt.EndDate, 1, NULL) over (order by tt.ID asc) as EndDatePrev
from dbo.Table_Name as tt
) as a
) as b
) as c
group by c.IslandNbr
order by c.IslandNbr
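
For readers who find the nested sub-queries hard to trace, the same three steps (previous end date, island-begin flag, running sum as island number) can be walked through procedurally. This is only an illustrative Python sketch using the sample rows from the question; the function name `contiguous_ranges` is made up, and like the query it assumes that ordering by ID matches ordering by date.

```python
from datetime import date

rows = [
    (1001, date(2018, 1, 1), date(2018, 1, 4)),
    (1002, date(2018, 1, 5), date(2018, 1, 13)),
    (1003, date(2018, 1, 14), date(2018, 1, 18)),
    (1004, date(2018, 1, 25), date(2018, 2, 5)),
]


def contiguous_ranges(rows):
    islands = {}      # island number -> (min start, max end)
    island_nbr = 0    # running sum of the island-begin flags
    prev_end = None   # what LAG(EndDate) would return
    for _id, start, end in sorted(rows, key=lambda r: r[0]):
        # A row begins a new island when there is no previous row or when
        # it starts more than one day after the previous row ended.
        island_begin = 1 if prev_end is None or (start - prev_end).days > 1 else 0
        island_nbr += island_begin
        lo, hi = islands.get(island_nbr, (start, end))
        islands[island_nbr] = (min(lo, start), max(hi, end))
        prev_end = end
    return [islands[k] for k in sorted(islands)]


result = contiguous_ranges(rows)
print(result)
```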

I hope the following SQL query helps you identify the gaps and the covered dates for the given case.
I did not use a CTE, a dates table function, etc.
Instead, I used master..spt_values as a numbers table to generate the dates table that serves as the main table.
You can create your own numbers table or dates table if this does not fit your requirements.
To catch the changes at the borders, I used the SQL LAG() function, which lets me compare each row with the previous value of a column in a sorted list.
select
max(startdate) as startdate,
max(enddate) as enddate
from (
select
date,
case when exist = 1 then date else null end as startdate,
case when exist = 0 then dateadd(d,-1,date) else null end as enddate,
( row_number() over (order by date) + 1) / 2 as rn
from (
select date, exist, case when exist <> (lag(exist,1,'') over (order by date)) then 1 else 0 end as changed
from (
select
d.date,
case when exists (select * from Periods where d.date between startdate and enddate) then 1 else 0 end as exist
from (
SELECT dateadd(dd,number,'20180101') date
FROM master..spt_values
WHERE Type = 'P' and dateadd(dd,number,'20180101') <= '20180228'
) d
) cte
) tbl
where changed = 1
) dates
group by rn

Calculate same day start/end dates as 0 days if another occurrence already exists

I have a query where there are instances where a "phase" starts and ends on the same day - this is calculated as 1 day. If, however, another "phase" starts and ends on the same day against the same ref. no. and period no., then I'd like to calculate this as 0 days.
Example:
**Ref. Period. Phase StDt EndDt**
013 3 KAA 01/01/16 01/01/16 - This is one day
013 3 TAA 02/01/16 03/01/16 - this is 2 days
013 3 KAT 01/01/16 01/01/16 - **would like this to be counted as 0 day**
013 3 TTA 04/04/16 04/04/16 - this is one day
I would like this unique calculation to be done in the data grouped by Ref. And Period numbers. This is a tricky one....
Thanks

Try this.
I am assuming that you are using T-SQL (not sure, as you have also tagged SQL).
;WITH cte_result(ID,Ref, Period,Phase,StDt,EndDt) AS
(
SELECT 1,'013' ,3,'KAA',CAST('01/01/16'AS DATETIME),CAST('01/01/16'AS DATETIME) UNION ALL
SELECT 2,'013' ,3,'TAA','01/02/16','01/03/16' UNION ALL
SELECT 3,'013' ,3,'KAT','01/01/16','01/01/16' UNION ALL
SELECT 4,'013' ,3,'TTA','04/04/16','04/04/16')
,cte_PreResult AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY CAST(StDt AS DATE), CAST(EndDt AS DATE) ORDER BY ID) AS [Order],
Ref,
Period,
Phase,
StDt,
EndDt
FROM cte_result
)
SELECT Ref,
Period,
Phase,
StDt,
EndDt,
CASE
WHEN [Order] <> 1
THEN '0 Day(s)'
ELSE CAST(DATEDIFF(dd, StDt, EndDt) + 1 AS VARCHAR(10)) + ' Day(s)'
END AS Comment
FROM cte_PreResult
If there is no ID column, pick another column to order by (probably Phase) and use it in place of ID in ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ID) AS [Order]. If there is no candidate column to order by, try this:
;WITH cte_result(ID,Ref, Period,Phase,StDt,EndDt) AS
(
SELECT 1,'013' ,3,'KAA',CAST('01/01/16'AS DATETIME),CAST('01/01/16'AS DATETIME) UNION ALL
SELECT 2,'013' ,3,'TAA','01/02/16','01/03/16' UNION ALL
SELECT 3,'013' ,3,'KAT','01/01/16','01/01/16' UNION ALL
SELECT 4,'013' ,3,'TTA','04/04/16','04/04/16')
,cte_PreResult AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY CAST(StDt AS DATE), CAST(EndDt AS DATE) ORDER BY (SELECT NULL)) AS [Order],
Ref,
Period,
Phase,
StDt,
EndDt
FROM cte_result
)
SELECT Ref,
Period,
Phase,
StDt,
EndDt,
CASE
WHEN [Order] <> 1
THEN '0 Day(s)'
ELSE CAST(DATEDIFF(dd, StDt, EndDt) + 1 AS VARCHAR(10)) + ' Day(s)'
END AS Comment
FROM cte_PreResult
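
The rule behind the ROW_NUMBER() trick is: the first phase seen for a given date range counts normally, and any later phase with the same range counts as 0. A rough Python sketch of that rule follows, using the same sample rows as the query. The function name `phase_days` is hypothetical, and, as an assumption taken from the question rather than from the query above, the grouping key here also includes Ref and Period.

```python
from datetime import date

rows = [
    ("013", 3, "KAA", date(2016, 1, 1), date(2016, 1, 1)),
    ("013", 3, "TAA", date(2016, 1, 2), date(2016, 1, 3)),
    ("013", 3, "KAT", date(2016, 1, 1), date(2016, 1, 1)),
    ("013", 3, "TTA", date(2016, 4, 4), date(2016, 4, 4)),
]


def phase_days(rows):
    seen = set()   # (ref, period, start, end) combinations already counted
    result = []
    for ref, period, phase, start, end in rows:
        key = (ref, period, start, end)
        if key in seen:
            days = 0  # a repeat of an already counted range
        else:
            seen.add(key)
            days = (end - start).days + 1  # inclusive day count
        result.append((phase, days))
    return result


counted = phase_days(rows)
print(counted)
```

Which row counts as the first one depends on the input order, just as it depends on the ORDER BY inside ROW_NUMBER() in the query.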

This expression should work on the SSRS side:
=IIF(Fields!StartDate.Value=Fields!EndDate.Value AND Fields!Phase.Value <> LOOKUPSET(Fields!StartDate.Value &"_" & Fields!EndDate.Value,Fields!StartDate.Value & "_" & Fields!EndDate.Value,Fields!Phase.Value,"DatasetName").GetValue(0),0,DATEDIFF("D",Fields!StartDate.Value,Fields!EndDate.Value)+1)
It will return a value of 1 for the first phase returned by the dataset. If the phase-date range combinations are not unique within the grouping, this will not work as written, but you should be able to modify accordingly.
Also, if the rows are sorted differently between SSRS and the dataset, it may not be the first row that appears that gets the 1.

The below did the trick! Basically, I'm using a COUNT aggregate to count the number of instances where phases start and end on the same day per Ref and Period. Then, wherever there is more than one, I use simple CASE statements to count the first one as 1 and any subsequent ones as 0. I'm adding the below as a subquery in the joins, as a left outer join:
LEFT OUTER JOIN
(SELECT TOP (100) PERCENT Period, Ref,
CONVERT(date, PhaseStartDate) AS stdt, CONVERT(date, PhaseEndDate) AS enddt,
COUNT(*)
AS NoOfSameDayPhases,
MIN(PhaseSequence) AS FirstPhSeq
FROM Phases AS Phases_1
WHERE (CONVERT(date, PhaseStartDate) =
CONVERT(date, PhaseEndDate))
GROUP BY VoidPeriod, Ref, CONVERT(date,
PhaseStartDate), CONVERT(date, PhaseEndDate)) AS SameDayPH ON CONVERT(date,
PhaseEndDate) = SameDayPH.enddt AND CONVERT(date,
PhaseStartDate) = SameDayPH.stdt AND
VoidPeriod = SameDayPH.VoidPeriod AND SameDayPH.Ref =
VoidPhases.Ref

How can I order by count with pagination?

I have to migrate some SQL from PostgreSQL to SQL Server (2005+). On PostgreSQL I had:
select count(id) as count, date
from table
group by date
order by count
limit 10 offset 25
Now I need the same SQL but for SQL Server. I did it as below, but I get the error: Invalid column name 'count'. How can I solve it?
select * from (
select row_number() over (order by count) as row, count(id) as count, date
from table
group by date
) a where a.row >= 25 and a.row < 35

You can't reference an alias by name, at the same scope, except in an ending ORDER BY (it is an invalid reference inside of a windowing function at the same scope).
To get the exact same results, it may need to be extended to (nesting scope for clarity):
SELECT c, d FROM
(
SELECT c, d, ROW_NUMBER() OVER (ORDER BY c) AS row FROM
(
SELECT d = [date], c = COUNT(id) FROM dbo.table GROUP BY [date]
) AS x
) AS y WHERE row > 25 AND row <= 35;
This can be shortened a little bit as per mohan's answer.
SELECT c, d FROM
(
SELECT COUNT(id), [date], ROW_NUMBER() OVER (ORDER BY COUNT(id))
FROM dbo.table GROUP BY [date]
) AS y(c, d, row)
WHERE row > 25 AND row <= 35;
In SQL Server 2012, it's much easier with OFFSET / FETCH - closer to the syntax you're used to, but actually using ANSI-compatible syntax rather than proprietary voodoo.
SELECT c = COUNT(id), d = [date]
FROM dbo.table GROUP BY [date]
ORDER BY COUNT(id)
OFFSET 25 ROWS FETCH NEXT 10 ROWS ONLY;
I blogged about this functionality in 2010 (lots of good comments there too) and should probably invest some time doing some serious performance tests.
And I agree with @ajon - I hope your real tables, columns and queries don't abuse reserved words like this.
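
One detail that is easy to get wrong when translating LIMIT 10 OFFSET 25 into a ROW_NUMBER() filter is the boundary: the offset skips 25 rows, so the page covers row numbers 26 through 35. The Python sketch below mimics the group, order, number and filter steps to make that explicit. The name `page_by_count` and the sample data are made up, and the tie-break on the date value is an addition for determinism that the SQL does not have.

```python
from collections import Counter


def page_by_count(dates, offset, limit):
    counts = Counter(dates)  # GROUP BY date, COUNT(id)
    # ORDER BY count, with the date as an extra tie-break.
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    numbered = enumerate(ordered, start=1)  # ROW_NUMBER() starts at 1
    # Skipping `offset` rows means keeping offset < row <= offset + limit.
    return [(d, c) for row, (d, c) in numbered if offset < row <= offset + limit]


# 40 distinct dates; date d00 occurs once, d01 twice, and so on.
dates = [f"d{i:02d}" for i in range(40) for _ in range(i + 1)]
page = page_by_count(dates, offset=25, limit=10)
print(page[0], page[-1], len(page))
```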

It works
DECLARE @startrow int=0,@endrow int=0
;with CTE AS (
select row_number() over ( order by count(id)) as row,count(id) AS count, date
from table
group by date
)
SELECT * FROM CTE
WHERE row between @startrow and @endrow

I think this will do it
select * from (
select row_number() over (order by count(id)) as row, count(id) as count, date
from table
group by date
) a where a.row > 25 and a.row <= 35
Also, I don't know what version of SQL Server you are using, but SQL Server 2012 has a new paging feature (OFFSET / FETCH).
