How to use the datebucket filter - snowflake-cloud-data-platform

I'm trying to use the :datebucket filter, but it doesn't seem to work.
select date, address from database.table where address = 'xyz' group by :datebucket(date)
This returns an error saying that date isn't in the GROUP BY clause, but it is. If I add it separately to the GROUP BY clause, the query just groups by the individual date instead of respecting the date bucket selection.
I can't find anything in the Snowflake documentation about how this filter is supposed to work, just that it exists.

This site, https://www.webagesolutions.com/blog/querying-data-in-snowflake, has the following example of the :datebucket filter:
SELECT COUNT(ORDER_DATE) as COUNT_ORDER_DATE, ORDER_DATE
FROM ORDERS
GROUP BY :datebucket(ORDER_DATE), ORDER_DATE
ORDER BY COUNT_ORDER_DATE DESC;
So could your query work if it was modified like this:
SELECT
date,
address
FROM
database.table
WHERE
address = 'xyz'
GROUP BY :datebucket(date), date

Datebucket truncates the date into buckets, but you have selected the raw date.
This is like grouping years by decade ('60s, '70s, '80s) but still wanting the actual year in the output.
SELECT column1 as year,
truncate(year,-1) as decade
FROM VALUES (1),(2),(3),(14),(15),(16),(27),(28),(29);
gives:
YEAR DECADE
1 0
2 0
3 0
14 10
15 10
16 10
27 20
28 20
29 20
So if I try to select only the year while grouping by the decade:
SELECT column1 as year
FROM VALUES (1),(2),(3),(14),(15),(16),(27),(28),(29)
GROUP BY truncate(year,-1)
ORDER BY 1;
gives the error
Error: 'VALUES.COLUMN1' in select clause is neither an aggregate nor in the group by clause. (line 15)
So if we move the decade into the selection, it makes sense:
SELECT truncate(column1,-1) as decade
FROM VALUES (1),(2),(3),(14),(15),(16),(27),(28),(29)
GROUP BY decade
ORDER BY 1;
and we get:
DECADE
0
10
20
So the problem is not :datebucket(date) itself, but the fact that while :datebucket(date) and date are related, from the perspective of GROUP BY they are unrelated expressions.
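To see the same point with dates rather than decades, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for Snowflake. The events table, its rows, and the use of strftime('%Y-%m', ...) in place of :datebucket(date) are all assumptions made for illustration. The only idea carried over is that the bucket expression appears in both SELECT and GROUP BY, while the raw date appears only inside aggregates.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2023-01-05", "xyz"),
        ("2023-01-20", "xyz"),
        ("2023-02-03", "xyz"),
        ("2023-02-14", "abc"),
    ],
)

# strftime('%Y-%m', ...) plays the role of :datebucket(date) with a monthly bucket.
# The bucket is both selected and grouped on; the raw date is only aggregated.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m', event_date) AS bucket,
           COUNT(*)                      AS n_rows,
           MIN(event_date)               AS first_date
    FROM events
    WHERE address = 'xyz'
    GROUP BY bucket
    ORDER BY bucket
    """
).fetchall()

for row in rows:
    print(row)
```

Selecting event_date directly here would bring back exactly the mismatch described above: one output row per bucket, but many raw dates to choose from.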

I've been trying to use :datebucket(date) and :daterange, and I also needed the results in a Snowflake chart.
It was a bit tricky, because the value returned by :datebucket(date) is actually a date truncated to the selected date part. So I had to convert it to a char, and it worked!
select
to_char(:datebucket(start_time), 'YYYY.MM.DD # HH24') as start_time_bucket,
sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history wmh
where
start_time = :daterange
group by :datebucket(start_time)
And if you're an ACCOUNTADMIN, you can now use the query to get the total credits usage by date :)
Finally, to answer Tony's original question, the query should be:
select date, address
from database.table
where address = 'xyz'
group by :datebucket(date), date, address
// or
select :datebucket(date), address
from database.table
where address = 'xyz'
group by :datebucket(date), address

Try adding :datebucket(date) to the SELECT list as well (not only to GROUP BY). Also, you will probably need an aggregate function for the address field, for example any_value(address):
select :datebucket(date), any_value(address)
from database.table
where address = 'xyz'
group by :datebucket(date)

Related

trying to break down results of SQL query to show data for each month

I'm very new to SQL and have a problem I can't figure out.
I'm trying to replace an Excel spreadsheet with a Power BI report. Currently our team runs the following query each month to get the number of active users and types the result into an Excel sheet, which then graphs the number of users per month to show the increase. Since I don't want to enter the data manually each month, my goal is to break this query down so that it gives the cumulative number of users as of each month.
Desired result would look something like this
dateCreated # of Users
----------------------
2008-10 295
2008-11 355
2008-12 470
2009-01 522
I was able to break it down enough to give me the amount created each month, but that doesn't give me the total amount each month. This is the query that I used and a sample of the results I got.
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') AS 'dateCreated',
COUNT(s.UserId) AS "# of Users"
FROM
ER.dbo.ssUser s,
ER.dbo.ssUserAddress USERADDR,
ER.dbo.ssAddress ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'
GROUP BY
FORMAT(USERADDR.DateCreated, 'yyyy-MM')
result sample:
dateCreated # of Users
2008-10 295
2008-11 41
2008-12 22
2009-01 19
This is almost there, but I need a running total. I've tried a lot of different things including SUM, SUM OVER, COUNT OVER etc. My boss suggested a while loop. I can't get that to work either and everything I've read says that should be the last resort. Here is one example of my failed attempts
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') as 'dateCreated',
COUNT(s.UserId)
OVER(
PARTITION BY Month(USERADDR.DateCreated)
GROUP BY FORMAT(USERADDR.DateCreated, 'yyyy-MM')
)
AS "# of Users"
FROM
ER.dbo.User s,
ER.dbo.UserAddress USERADDR,
ER.dbo.Address ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'
--original query which gives total number of users right now.
SELECT
count(s.UserId) AS "# of Users"
FROM
ER.dbo.User s,
ER.dbo.UserAddress USERADDR,
ER.dbo.Address ADDRESS
WHERE
s.UserId = USERADDR.UserId
AND USERADDR.AddressId = ADDRESS.AddressId
AND Isdefault = 1
AND Type = 'soldto'
You can do a window sum() on the aggregated count of users per month, like so:
SELECT
FORMAT(USERADDR.DateCreated, 'yyyy-MM') [dateCreated],
SUM(COUNT(s.UserId)) OVER(ORDER BY FORMAT(USERADDR.DateCreated, 'yyyy-MM')) [# of Users]
FROM
ER.dbo.ssUser s
INNER JOIN ER.dbo.ssUserAddress USERADDR
ON s.UserId = USERADDR.UserId
INNER JOIN ER.dbo.ssAddress ADDRESS
ON USERADDR.AddressId = ADDRESS.AddressId
WHERE Isdefault = 1 AND Type = 'soldto'
group by FORMAT(USERADDR.DateCreated, 'yyyy-MM')
Notes:
always prefer proper, explicit join syntax (with the ON keyword) over implicit, old-school joins, which were superseded by the SQL-92 join syntax long ago - I modified your query accordingly
SQL Server uses square brackets for identifiers - you should avoid single quotes, as they are generally used for literal strings
you have unqualified column names in the WHERE clause: always qualify column names in your query, so it is easy to understand which table they belong to
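As a rough, runnable illustration of the running-total idea, here is a sketch in Python with the standard-library sqlite3 module standing in for SQL Server. The user_address table and its rows are made up, strftime replaces FORMAT, and the monthly counts are computed in a CTE before the window SUM is applied (a slightly different shape from the single-statement query above, chosen only to keep the sketch simple). Window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database; table and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_address (user_id INTEGER, date_created TEXT)")
conn.executemany(
    "INSERT INTO user_address VALUES (?, ?)",
    [
        (1, "2008-10-03"), (2, "2008-10-17"), (3, "2008-10-29"),
        (4, "2008-11-02"), (5, "2008-11-21"),
        (6, "2008-12-09"),
    ],
)

rows = conn.execute(
    """
    WITH monthly AS (
        -- Step 1: users created per month
        SELECT strftime('%Y-%m', date_created) AS month,
               COUNT(user_id)                  AS new_users
        FROM user_address
        GROUP BY month
    )
    -- Step 2: running total of the monthly counts
    SELECT month,
           new_users,
           SUM(new_users) OVER (ORDER BY month) AS total_users
    FROM monthly
    ORDER BY month
    """
).fetchall()

for month, new_users, total_users in rows:
    print(month, new_users, total_users)
```

The 'yyyy-MM' style key sorts correctly as text, which is why ordering the window by the formatted month works.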

T-SQL find all records in a group with a max last modified date beyond a specific threshold

I have a Database table that has all the information I need arranged like so:
Inventory_ID | Dealer_ID | LastModifiedDate
Each Dealer_ID is attached to multiple Inventory_ID's. What I need is a query that calculates the Max Value LastModifiedDate for each dealer ID and then gives me a list of all the Dealer_ID's that have a last modified date beyond the last 30 days.
Getting The max last modified date for each Dealer_ID is simple, of course:
Select Dealer_ID, Max(LastModifiedDate)as MostRecentUpdate
from Inventory group by Dealer_ID order by MAX(LastModifiedDate)
The condition for records older than 30 day is also fairly simple:
LastModifiedDate < getdate() - 30
Somehow, I just can't figure out a way to combine the two that works properly.
Use HAVING:
Select Dealer_ID, Max(LastModifiedDate)as MostRecentUpdate
from Inventory
group by Dealer_ID
having Max(LastModifiedDate) < getdate() - 30
order by MAX(LastModifiedDate)
Check this query:
Select DT.Dealer_ID, DT.MostRecentUpdate
From (Select Dealer_ID, Max(LastModifiedDate) as MostRecentUpdate
From YourTable
Group BY Dealer_ID) DT
where DT.MostRecentUpdate < GETDATE() - 30
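For anyone who wants to try the filter-on-the-aggregate idea without a SQL Server instance, here is a small sketch in Python using the standard-library sqlite3 module. The inventory rows are invented, date(?, '-30 days') stands in for getdate() - 30, and the reference date is fixed so the result is reproducible. Treat it as an illustration of HAVING on MAX(...), not as T-SQL.

```python
import sqlite3

# In-memory SQLite database; rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (inventory_id INTEGER, dealer_id INTEGER, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [
        (1, 10, "2024-01-05"),
        (2, 10, "2024-03-20"),  # dealer 10 has a recent update
        (3, 20, "2024-01-10"),
        (4, 20, "2024-02-01"),  # dealer 20's newest row is still old
        (5, 30, "2023-12-15"),
    ],
)

# Fixed reference date (assumption); a live query would use the current date.
today = "2024-03-31"

stale = conn.execute(
    """
    SELECT dealer_id, MAX(last_modified) AS most_recent_update
    FROM inventory
    GROUP BY dealer_id
    -- The condition is on the aggregate, so it belongs in HAVING, not WHERE
    HAVING MAX(last_modified) < date(?, '-30 days')
    ORDER BY most_recent_update
    """,
    (today,),
).fetchall()

print(stale)
```

Putting the same condition in WHERE would filter individual rows before grouping, so a dealer with one old row and one recent row would wrongly show up as stale.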

PostgreSQL Crosstab - variable number of columns

A common beef I get when trying to evangelize the benefits of learning freehand SQL to MS Access users is the complexity of creating the effects of a crosstab query in the manner Access does it. I realize that, strictly speaking, in SQL it doesn't work that way -- the reason it's possible in Access is that it's handling the rendering of the data.
Specifically, when I have a table with entities, dates and quantities, it's frequent that we want to see a single entity on one line with the dates represented as columns:
This:
entity date qty
------ -------- ---
278700-002 1/1/2016 5
278700-002 2/1/2016 3
278700-002 2/1/2016 8
278700-002 3/1/2016 1
278700-003 2/1/2016 12
Becomes this:
Entity 1/1/16 2/1/16 3/1/16
---------- ------ ------ ------
278700-002 5 11 1
278700-003 12
That said, the common way we've approached this is something similar to this:
with vals as (
select
entity,
case when order_date = '2016-01-01' then qty else 0 end as q16_01,
case when order_date = '2016-02-01' then qty else 0 end as q16_02,
case when order_date = '2016-03-01' then qty else 0 end as q16_03
from mydata
)
select
entity, sum (q16_01) as q16_01, sum (q16_02) as q16_02, sum (q16_03) as q16_03
from vals
group by entity
This is radically oversimplified, but I believe most people will get my meaning.
The main problem with this is not the limit on the number of columns -- the data is typically bounded, and I can make do with a fixed number of date columns -- 36 months, or whatever, depending on the context of the data. My issue is the fact that I have to change the dates every month to make this work.
I had an idea that I could leverage arrays to dynamically assign the quantity to the index of the array, based on the month away from the current date. In this manner, my data would end up looking like this:
Entity Values
---------- ------
278700-002 {5,11,1}
278700-003 {0,12,0}
This would be quite acceptable, as I could manage the rendering of the actual columns within whatever rendering tool I was using (Excel, for example).
The problem is I'm stuck... how do I get from my data to this. If this were Perl, I would loop through the data and do something like this:
foreach my $ref (@data) {
my ($entity, $month_offset, $qty) = @$ref;
$values{$entity}->[$month_offset] += $qty;
}
But this isn't Perl... so far, this is what I have, and now I'm at a mental impasse.
with offset as (
select
entity, order_date, qty,
(extract (year from order_date ) - 2015) * 12 +
extract (month from order_date ) - 9 as month_offset,
array[]::integer[] as values
from mydata
)
select
prod_id, playgrd_dte, -- oh my... how do I load into my array?
from fcst
The "2015" and the "9" are not really hard-coded -- I put them there for simplicity's sake in this example.
Also, if my approach or my assumptions are totally off, I trust someone will set me straight.
As with all things imaginable and unimaginable, there is a way to do this with PostgreSQL. It looks like this:
WITH cte AS (
WITH minmax AS (
SELECT min(extract(month from order_date))::int,
max(extract(month from order_date))::int
FROM mytable
)
SELECT entity, mon, 0 AS qty
FROM (SELECT DISTINCT entity FROM mytable) entities,
(SELECT generate_series(min, max) AS mon FROM minmax) allmonths
UNION ALL -- keep duplicate rows, so that identical quantities in the same month are all summed
SELECT entity, extract(month from order_date)::int, qty FROM mytable
)
SELECT entity, array_agg(sum ORDER BY mon) AS values
FROM (
SELECT entity, mon, sum(qty) FROM cte
GROUP BY 1, 2) sub
GROUP BY 1
ORDER BY 1;
A few words of explanation:
The standard way to produce an array inside a SQL statement is to use the array_agg() function. Your problem is that you have months without data and then array_agg() happily produces nothing, leaving you with arrays of unequal length and no information on where in the time period the data comes from. You can solve this by adding 0's for every combination of 'entity' and the months in the period of interest. That is what this snippet of code does:
SELECT entity, mon, 0 AS qty
FROM (SELECT DISTINCT entity FROM mytable) entities,
(SELECT generate_series(min, max) AS mon FROM minmax) allmonths
All those 0's are UNIONed to the actual data from 'mytable' and then (in the main query) you can first sum up the quantities by entity and month and subsequently aggregate those sums into an array for each entity. Since it is a double aggregation you need the sub-query. (You could also sum the quantities in the UNION but then you would also need a sub-query because UNIONs don't allow aggregation.)
The minmax CTE can be adjusted to include the year as well (your sample data doesn't need it). Do note that the actual min and max values are immaterial to the index in the array: if min is 743 it will still occupy the first position in the array; those values are only used for GROUPing, not indexing.
For ease of use you could wrap this query up in a SQL language function with parameters for the starting and ending month. Adjust the minmax CTE to produce appropriate min and max values for the generate_series() call and in the UNION filter the rows from 'mytable' to be considered.
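If the rendering tool sits behind a script anyway, the same zero-fill-then-aggregate idea can be sketched client-side. The Python below is not the PostgreSQL approach above; it is a hypothetical equivalent of the Perl loop from the question, using the question's sample rows. The month_index helper is an assumption introduced here to turn dates into array offsets.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (entity, order_date, qty).
rows = [
    ("278700-002", date(2016, 1, 1), 5),
    ("278700-002", date(2016, 2, 1), 3),
    ("278700-002", date(2016, 2, 1), 8),
    ("278700-002", date(2016, 3, 1), 1),
    ("278700-003", date(2016, 2, 1), 12),
]


def month_index(d: date) -> int:
    """Count months on a single axis so that differences are month offsets."""
    return d.year * 12 + d.month - 1


first = min(month_index(d) for _, d, _ in rows)
last = max(month_index(d) for _, d, _ in rows)
width = last - first + 1

# Every entity starts with a zero-filled list, so months without data
# still occupy their position and all lists have the same length.
values = defaultdict(lambda: [0] * width)
for entity, order_date, qty in rows:
    values[entity][month_index(order_date) - first] += qty

result = dict(sorted(values.items()))
for entity, quantities in result.items():
    print(entity, quantities)
```

Because the offsets are derived from the data's own minimum month, nothing needs to be edited when a new month arrives.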

multiple count rows based on date ranges (access db)

I can get a single count row from a specified date range like this:
SELECT table.[EVENT NAME], Count(*) AS [Count]
FROM table
WHERE [EVENT]='alphabetical' And table.DATE>=#11/20/2010# And (table.DATE)<=#11/26/2010#
GROUP BY table.[EVENT NAME];
but how could I add multiple rows with different date ranges?
[EVENT NAME],[DATE 11/20-11/26],[DATE 11/27-12/3], etc...
EDIT
the data would look something like this
event1;1/11/2010
event1;1/11/2010
event2;1/11/2010
event2;1/11/2010
event2;1/11/2010
event3;1/11/2010
event1;1/12/2010
event1;1/12/2010
event2;1/12/2010
event2;1/12/2010
event4;1/12/2010
event4;1/12/2010
etc.
and would like something like this (preferably with more columns) :
event1;2;2
event2;3;2
event3;1;0
event4;0;2
You'd use a group by clause and group by the date.
You didn't provide example records with expected results; doing so helps us help you :).
In other words, post more information.
But from what I can tell you want a count based on a date range.
So if you had 1/1/2010 with 10 rows
and 1/2/2010 with 20 referenced rows
and 1/3/2010 with 6 reference rows...you'd want output like this:
1/1/2010 10
1/2/2010 20
1/3/2010 6
So SELECT COUNT(*), MyDate FROM MyTable GROUP BY MyDate
To answer your question about a date range, think about how GROUP BY works: it combines all rows that match on the grouping criteria. So when you group by date, it groups by a single date. You want a date range, so each row needs to know which range (start to end) it belongs to. You therefore need to add these columns to each row by generating them in SQL.
Edit
For instance
SELECT Count(*), DATEADD(day, -10, GetDate()) AS StartDate, DATEADD(day, 10, GetDate()) AS EndDate
FROM MyTable GROUP BY StartDate, EndDate
Access has similar functions for adding days to dates, so look those up for MS Access. Then just generate a start and end date for each column.
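One way to get one count column per date range is conditional aggregation. The sketch below uses Python's standard-library sqlite3 module rather than Access, with the sample data from the question read as 11 and 12 January 2010 (an assumption, since the date format is ambiguous) and two illustrative one-day ranges. In Access the same pattern is usually written with Sum(IIf(...)) instead of SUM(CASE ...).

```python
import sqlite3

# In-memory SQLite database loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_name TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("event1", "2010-01-11")] * 2
    + [("event2", "2010-01-11")] * 3
    + [("event3", "2010-01-11")]
    + [("event1", "2010-01-12")] * 2
    + [("event2", "2010-01-12")] * 2
    + [("event4", "2010-01-12")] * 2,
)

# Each (start, end) pair becomes one count column in the output.
ranges = [("2010-01-11", "2010-01-11"), ("2010-01-12", "2010-01-12")]

# Build one conditional SUM per range; the bounds are passed as parameters.
columns = ", ".join(
    "SUM(CASE WHEN event_date BETWEEN ? AND ? THEN 1 ELSE 0 END)" for _ in ranges
)
params = [bound for r in ranges for bound in r]

rows = conn.execute(
    f"SELECT event_name, {columns} FROM events GROUP BY event_name ORDER BY event_name",
    params,
).fetchall()

for row in rows:
    print(";".join(str(v) for v in row))
```

Adding more columns is then just a matter of appending more (start, end) pairs to ranges.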

How do i find the total number of records created on a given day using T-SQL?

I need to find out the total number of records that were created on a given day.
e.g.
ID CreatedDate
1 17/07/2009
2 12/07/2009
3 17/07/2009
4 05/07/2009
5 12/07/2009
6 17/07/2009
Output:
3 Records were created on 17/07/2009
2 Records were created on 12/07/2009
1 Record was created on 05/07/2009
EDIT
Upon testing Chris Roberts's second suggestion, which includes the formatting in the SQL, I get this error:
Syntax error converting the varchar value ' Records were created on ' to a column of data type int.
Does anyone know how to rework the solution so that it works?
You should be able to get the data you're after with the following SQL...
SELECT COUNT(ID), CreatedDate
FROM MyTable
GROUP BY CreatedDate
Or - if you want to do the formatting in the SQL, too...
SELECT CONVERT(varchar, COUNT(ID)) + ' Records were created on ' + CONVERT(varchar, CreatedDate)
FROM MyTable
GROUP BY CreatedDate
Good luck!
Is the column actually a timestamp? In which case you will need to apply a function to remove the time component, e.g.:
SELECT COUNT(*), date(CreatedDate) FROM MyTable GROUP BY date(CreatedDate)
I don't know what the function is in T-SQL, it's date() in MySQL and trunc() in Oracle. You may even have to do a to_string and remove the end of the string and group by that if you lack this.
Hope this helps.
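Here is a small sketch of the strip-the-time-then-group idea, written in Python with the standard-library sqlite3 module because its date() function behaves like the MySQL one mentioned above. The table and timestamps are invented to match the question's sample. In SQL Server 2008 and later, CAST(CreatedDate AS date) plays the same role.

```python
import sqlite3

# In-memory SQLite database; timestamps are hypothetical but follow the question's dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (1, "2009-07-17 09:15:00"),
        (2, "2009-07-12 11:30:00"),
        (3, "2009-07-17 14:02:00"),
        (4, "2009-07-05 08:45:00"),
        (5, "2009-07-12 16:20:00"),
        (6, "2009-07-17 23:59:00"),
    ],
)

# date(created) drops the time component, so rows from the same day share a group.
rows = conn.execute(
    """
    SELECT date(created) AS created_day, COUNT(*) AS n
    FROM my_table
    GROUP BY created_day
    ORDER BY n DESC
    """
).fetchall()

# Formatting is done in the client, which sidesteps varchar/int conversion errors.
for day, n in rows:
    noun = "Record was" if n == 1 else "Records were"
    print(f"{n} {noun} created on {day}")
```

Grouping on the raw timestamp instead would produce one group per distinct second, which is why every count would come back as 1.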
select count(*), CreatedDate from table group by CreatedDate order by count(*) DESC
