Timeseries chart with aggregation over overlapping intervals

My dataset is of the following form:
Start Date End Date Count Contact
2019-01-20 2019-05-10 50 A
2019-03-05 2019-06-07 20 A
2019-03-05 2019-06-07 20 B
....
I want a timeseries chart where the X axis is months, and the Y axis is the total count.
E.g.
The entries would be
Month TotalCount Contact
Jan 50 A
Jan 0 B
Feb 50 A
Feb 0 B
Mar 70 A
Mar 20 B
Apr 70 A
Apr 20 B
May 70 A
May 20 B
Jun 20 A
Jun 20 B
Jul 0 A
Jul 0 B
...
How can I achieve this in Data Studio? The data is coming from bigquery.

You wouldn't be able to do that in Data Studio visually alone without manipulating your dataset first.
You could instead use a custom query to generate a date range of month starts (https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#generate_date_array) and then join all rows from your dataset where the month start falls between your start date and end date.
Example code for your sample data:
WITH CTE AS (
SELECT date('2019-01-20') start_date, date('2019-05-10') end_date, 50 count, 'A' contact
UNION ALL SELECT date('2019-03-05'), date('2019-06-07'), 20, 'A'
UNION ALL SELECT date('2019-03-05'), date('2019-06-07'), 20, 'B'
), monthDates AS (
SELECT months FROM
UNNEST(GENERATE_DATE_ARRAY('2019-01-01', '2019-07-01', INTERVAL 1 MONTH)) months
), monthContact AS (
SELECT months, contact
FROM monthDates m
CROSS JOIN CTE
GROUP BY 1,2
)
SELECT months, ifnull(sum(count),0) count, m.contact
FROM monthContact m
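-- Truncate start_date to its month so a range that starts mid-month (e.g. 2019-01-20) still counts for that month
LEFT JOIN CTE c ON m.months BETWEEN DATE_TRUNC(start_date, MONTH) AND end_date AND m.contact = c.contact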
GROUP BY 1,3
ORDER BY 1,3
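To sanity-check the month-expansion idea outside BigQuery, here is a minimal Python sketch of the same logic. It assumes an interval should count for every month it touches (so a range starting on 2019-01-20 counts for January, as in the expected output in the question). The helper names month_starts and monthly_totals are made up for illustration.

```python
from datetime import date

# Sample rows from the question: (start_date, end_date, count, contact).
rows = [
    (date(2019, 1, 20), date(2019, 5, 10), 50, "A"),
    (date(2019, 3, 5), date(2019, 6, 7), 20, "A"),
    (date(2019, 3, 5), date(2019, 6, 7), 20, "B"),
]

def month_starts(first, last):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

def monthly_totals(rows, first, last):
    """Total count per (month start, contact); 0 where no interval is active."""
    contacts = sorted({contact for _, _, _, contact in rows})
    totals = {}
    for m in month_starts(first, last):
        for contact in contacts:
            totals[(m, contact)] = sum(
                count
                for start, end, count, c in rows
                # Active if the interval starts in or before this month
                # and ends on or after the month start.
                if c == contact and start.replace(day=1) <= m <= end
            )
    return totals

totals = monthly_totals(rows, date(2019, 1, 1), date(2019, 7, 1))
for (m, contact), total in sorted(totals.items()):
    print(m.strftime("%b"), total, contact)
```

Every (month, contact) pair is generated first and only then filled in, which is what the monthContact cross join does in the query: months with no active interval still appear, with a total of 0.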

Related

Return Maximum Value along with day + time value grouped by month

Using MS-SQL 2012. Having a real puzzle trying to retrieve specific datafields from a large climatology dataset.
I have stripped this large raw data file down to a temp table called #max_temp which correctly pulls back the max value for each day along with the time it occurred and day/month value for reference:
monthid month day time current_temp
1 12 24 12:45 9.1
1 12 25 12:25 8.3
1 12 26 23:55 8.6
1 12 27 00:00 8.6
1 12 28 13:15 5.9
1 12 29 12:50 5
1 12 30 13:32 6.3
1 12 31 12:49 6.9
2 1 1 23:59 12
2 1 2 01:12 12.7
2 1 3 03:55 6.2
What I want to retrieve is an output grouped by monthID, so returning:
monthid month day time current_temp
1 12 24 12:45 9.1
2 1 9 20:04 15.1 <<*not shown in above sample*>>
From looking at other similar questions I have tried the following code, but either it does not get me to the end solution or the query fails.
select *
from (select t.*, ROW_NUMBER () over (partition by t.monthid, t.time order by t.current_temp desc) as rn
from #max_temp t) x
where rn=1
order by monthid asc
or
select monthid, day, time, current_temp
from #max_temp
where current_temp= (select max(current_temp) from #max_temp group by MonthID, day, time)
Thanks in advance for your help,
Elliot.

Remove t.time from the partition by like so:
select *
from (
select t.*, ROW_NUMBER () over (partition by t.monthid order by t.current_temp desc) as rn
from #max_temp t
) x
where rn=1
order by monthid asc
Having time in the partition would give you the greatest value for current_temp for each monthid and time, but since you just want the greatest current_temp for each monthid, remove time from that expression.
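The effect of partitioning by monthid only can be illustrated with a small Python sketch: keep one row per monthid, the one with the highest current_temp. This is only an analogy to ROW_NUMBER() with rn = 1, using the sample rows shown in the question; the function name hottest_row_per_month is hypothetical, and ties are resolved by keeping the first row seen, whereas ROW_NUMBER() picks among ties arbitrarily.

```python
# Sample rows from the question: (monthid, month, day, time, current_temp).
daily_max = [
    (1, 12, 24, "12:45", 9.1),
    (1, 12, 25, "12:25", 8.3),
    (1, 12, 26, "23:55", 8.6),
    (1, 12, 27, "00:00", 8.6),
    (1, 12, 28, "13:15", 5.9),
    (1, 12, 29, "12:50", 5),
    (1, 12, 30, "13:32", 6.3),
    (1, 12, 31, "12:49", 6.9),
    (2, 1, 1, "23:59", 12),
    (2, 1, 2, "01:12", 12.7),
    (2, 1, 3, "03:55", 6.2),
]

def hottest_row_per_month(rows):
    """Return one full row per monthid: the row with the highest current_temp."""
    best = {}
    for row in rows:
        monthid, temp = row[0], row[4]
        # The key is monthid alone, mirroring PARTITION BY t.monthid.
        if monthid not in best or temp > best[monthid][4]:
            best[monthid] = row
    return [best[monthid] for monthid in sorted(best)]

result = hottest_row_per_month(daily_max)
for row in result:
    print(row)
```

If the key were (monthid, time) instead, nearly every row would be its own group and would come back as a "maximum", which is what the original partition did.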

SQL: GROUP BY where a column is unique

So I have this one giant table e.g.
PROD_ID GEOG_ID TIME_ID VALUE1
1 MT JAN 100
1 MT FEB 100
2 MT JAN 100
2 MT FEB 100
3 TT MARCH 100
And I want to receive Jan and Feb data only in the geography MT. Then sum the Value1's together where PROD_ID matches.
So the end result is:
PROD_ID GEOG_ID VALUE1
1 MT 200
2 MT 200
I have managed to get the data down to TIME_ID only using:
SELECT PROD_ID, GEOG_ID, TIME_ID, VALUE1 FROM database WHERE GEOG_ID = 'MT' AND TIME_ID IN ('JAN', 'FEB')
so I have :
PROD_ID GEOG_ID TIME_ID VALUE1
1 MT JAN 100
1 MT FEB 100
2 MT JAN 100
2 MT FEB 100
but now I am unsure how to use the group by function on PROD_ID since TIME_ID is unique.
Any thoughts?
Many thanks!

You can filter on TIME_ID in the WHERE clause without selecting it, which means it doesn't have to appear in the GROUP BY clause.
SELECT PROD_ID, GEOG_ID, SUM(VALUE1) AS TOTAL
FROM database
WHERE GEOG_ID = 'MT'
AND TIME_ID IN ('JAN', 'FEB')
GROUP BY PROD_ID, GEOG_ID
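A quick way to try this is with Python's built-in sqlite3 module and the sample rows from the question. This is a sketch under assumptions: the table is named sales here (a made-up name standing in for the question's table), and SQLite stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (PROD_ID INTEGER, GEOG_ID TEXT, TIME_ID TEXT, VALUE1 INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "MT", "JAN", 100),
        (1, "MT", "FEB", 100),
        (2, "MT", "JAN", 100),
        (2, "MT", "FEB", 100),
        (3, "TT", "MARCH", 100),
    ],
)

# TIME_ID is used only as a filter, so it appears neither in the
# SELECT list nor in GROUP BY.
result = conn.execute(
    """
    SELECT PROD_ID, GEOG_ID, SUM(VALUE1) AS TOTAL
    FROM sales
    WHERE GEOG_ID = 'MT'
      AND TIME_ID IN ('JAN', 'FEB')
    GROUP BY PROD_ID, GEOG_ID
    ORDER BY PROD_ID
    """
).fetchall()

for row in result:
    print(row)
```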

Using SQL join to retrieve set of data for each different value in a column

I am retrieving data from two tables on SQL Server. I am trying to get all rows from the right table for each distinct value in a column of the left table, whether or not there is a match.
For example, I have two tables named Products and Deals with the following data:
Products table
Id Product
1 ABC
2 XYZ
3 PQR
Deals Table
Id TradeDate Product Volume Price Delivery
56 2014-12-08 ABC 2500 -3.25 2015-01-01
57 2014-12-08 ABC 2500 -3.4 2015-02-01
63 2014-12-08 PQR 2500 -7 2015-01-01
64 2014-12-08 PQR 2500 -7 2015-01-01
I applied following query to the above tables
SELECT
FORMAT(a.Delivery,'MMMM yyyy') AS Delivery,
b.Product,COUNT(a.Id) AS Trades,
ROUND(((6.2898*SUM(a.Volume ))/DAY(EOMONTH(DATEADD(MONTH, DATEDIFF(MONTH, 0,a.Delivery), 0))))*0.001,4) AS BBLperDay,
SUM(a.Volume) AS M3,
ROUND(SUM(a.Volume*a.Price)/Sum(a.Volume),4) AS WeightedAverage
FROM Deals AS a right outer join Products AS b
ON a.Product=b.Product
AND CAST(a.TradeDate as date)='2014-12-08'
GROUP BY b.Product,CAST(a.TradeDate as date),
DATEADD(MONTH, DATEDIFF(MONTH, 0,a.Delivery),0), FORMAT(a.Delivery,'MMMM yyyy')
And I got following results
Delivery Product Trades BBLperDay M3 WeightedAverage
January 2015 ABC 1 0.5072 2500 -3.25
February 2015 ABC 1 0.5616 2500 -3.4
January 2015 PQR 2 1.0145 5000 -7
(null) XYZ 0 (null) (null) (null)
These results are what the query is expected to return, but I want a result that contains every row of the Products table for each distinct Delivery value. Where a product has no deals for a delivery month, the row should carry that Delivery value and leave the other fields NULL, as follows:
Delivery Product Trades BBLperDay M3 WeightedAverage
January 2015 ABC 1 0.5072 2500 -3.25
January 2015 PQR 2 1.0145 5000 -7
January 2015 XYZ 0 (null) (null) (null)
February 2015 ABC 1 0.5616 2500 -3.4
February 2015 XYZ 0 (null) (null) (null)
February 2015 PQR 0 (null) (null) (null)
To explain: the actual results have January 2015 Delivery for products ABC and PQR, but the Products table has one more product, XYZ, which is missing for January 2015, so I added XYZ with Delivery set to January 2015 and the remaining fields NULL.
Same is the case with February 2015 Delivery it has only ABC so I added products XYZ and PQR to the results.
Please refer to http://sqlfiddle.com/#!6/db1508/3
May I know a good way to get this data?

I'd just build out that 'backbone' of delivery products first and put your original query against it with a left join. Shown here with CTEs.
WITH deliveryProducts AS
(
SELECT DISTINCT FORMAT(a.Delivery,'MMMM yyyy') AS Delivery, b.Product
FROM DEALS as a, PRODUCTS as b Where CAST(a.TradeDate as date)='2014-12-08'
)
, deliveryActuals AS
(
SELECT
FORMAT(a.Delivery,'MMMM yyyy') AS Delivery,
b.Product,ISNULL(COUNT(a.Id),0) AS Trades,
ROUND(((6.2898*SUM(a.Volume ))/DAY(EOMONTH(DATEADD(MONTH, DATEDIFF(MONTH, 0,a.Delivery), 0))))*0.001,4) AS BBLperDay,
SUM(a.Volume) AS M3,
ROUND(SUM(a.Volume*a.Price)/Sum(a.Volume),4) AS WeightedAverage
FROM
Deals AS a right outer join Products AS b
ON a.Product=b.Product
AND CAST(a.TradeDate as date)='2014-12-08'
GROUP BY
b.Product,CAST(a.TradeDate as date),
DATEADD(MONTH, DATEDIFF(MONTH, 0,a.Delivery),0), FORMAT(a.Delivery,'MMMM yyyy')
)
SELECT
dp.Delivery, dp.Product, ISNULL(da.Trades, 0) AS Trades, BBLperDay, M3, WeightedAverage
FROM
deliveryProducts dp
LEFT JOIN deliveryActuals da
on dp.Delivery = da.Delivery
and dp.product = da.Product
ORDER BY dp.Delivery
Here it is in your SQLFiddle
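The "backbone" idea (every delivery month crossed with every product, then left-joined to the actual figures) can be sketched in plain Python as well. This is a simplified illustration, not the query above: it keeps only the trade count and the summed volume, the deals are written with the delivery month already formatted, and the function name trades_per_delivery_product is made up.

```python
from itertools import product

products = ["ABC", "XYZ", "PQR"]

# (delivery month, product, volume) for the deals traded on 2014-12-08.
deals = [
    ("January 2015", "ABC", 2500),
    ("February 2015", "ABC", 2500),
    ("January 2015", "PQR", 2500),
    ("January 2015", "PQR", 2500),
]

def trades_per_delivery_product(deals, products):
    """Return (delivery, product, trades, m3) for every delivery/product pair."""
    # Actuals: aggregate the deals that really exist.
    actuals = {}
    for delivery, prod, volume in deals:
        trades, m3 = actuals.get((delivery, prod), (0, 0))
        actuals[(delivery, prod)] = (trades + 1, m3 + volume)

    # Backbone: distinct delivery months crossed with all products.
    deliveries = list(dict.fromkeys(delivery for delivery, _, _ in deals))
    rows = []
    for delivery, prod in product(deliveries, products):
        # Left join: pairs without deals get 0 trades and no volume.
        trades, m3 = actuals.get((delivery, prod), (0, None))
        rows.append((delivery, prod, trades, m3))
    return rows

result = trades_per_delivery_product(deals, products)
for row in result:
    print(row)
```

Building the backbone separately from the aggregation keeps the two concerns apart: the aggregation never has to produce rows for combinations that have no deals.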

PIVOT Table Date & Year wise data display sql server 2005

This is my query:
SELECT *
FROM (SELECT CurDate, YEAR(CurDate) AS orderyear, Warranty_Info
FROM eod_main where year(CurDate)>=2009 and year(CurDate)<=2011) AS D
PIVOT(SUM(Warranty_Info) FOR orderyear IN([2009],[2010],[2011])) AS P
The above query returns data, but because CurDate is a full date, it returns multiple rows for the same month.
I want SUM(Warranty_Info) to be returned only once for every month and year.
The output should look like this:
Month 2009 2010 2011 2012 2013
----- ---- ---- ---- ---- -----
1 10 0 11 32 98
2 20 10 21 11 44
3 0 224 33 77 31
There is some problem in my query, which is why it returns multiple rows for the same month.
Please help me write the right query. Thanks.

Instead of using CurDate as the full date with year, month and day, I would suggest using the Month() function to return the numeric value of each month. This will allow you to group by month instead of the full date:
SELECT dateMonth, [2009],[2010],[2011]
FROM
(
SELECT month(CurDate) dateMonth,
YEAR(CurDate) AS orderyear, Warranty_Info
FROM eod_main
where year(CurDate)>=2009
and year(CurDate)<=2011
) AS D
PIVOT
(
SUM(Warranty_Info)
FOR orderyear IN([2009],[2010],[2011])
) AS P;
See SQL Fiddle with Demo
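To see why reducing the date to its month collapses the rows, here is a small Python sketch of the same pivot. The input rows are invented for illustration (the question does not show any source data), and pivot_by_month is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (CurDate, Warranty_Info) rows; not taken from the question.
rows = [
    (date(2009, 1, 5), 4),
    (date(2009, 1, 20), 6),
    (date(2010, 2, 3), 10),
    (date(2011, 1, 9), 11),
]

def pivot_by_month(rows, years):
    """Return one tuple per month: (month, sum for years[0], sum for years[1], ...)."""
    sums = defaultdict(int)
    for cur_date, warranty_info in rows:
        if cur_date.year in years:
            # Keying on (month, year) instead of the full date is what
            # merges several dates of the same month into one cell.
            sums[(cur_date.month, cur_date.year)] += warranty_info
    months = sorted({month for month, _ in sums})
    # Cells with no data stay None, like the NULLs PIVOT produces.
    return [(month, *[sums.get((month, year)) for year in years]) for month in months]

result = pivot_by_month(rows, [2009, 2010, 2011])
for row in result:
    print(row)
```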

Find the min and max dates between multiple sets of dates

Given the following set of data, I'm trying to determine how I can select the start and end dates of the combined date ranges, when they intersect with each other.
For instance, for PartNum 115678, I would want my final result set to display the date ranges 2012/01/01 - 2012/01/19 (rows 1, 2 and 4 combined, since the date ranges intersect) and 2012/02/01 - 2012/03/28 (row 3, since this one does not intersect with the range found previously).
For PartNum 213275, I would want to select the only row for that part, 2012/12/01 - 2013/01/01.
Edit:
I'm currently playing around with the following SQL statement, but it's not giving me exactly what I need.
with DistinctRanges as (
select distinct
ha1.PartNum "PartNum",
ha1.StartDt "StartDt",
ha2.EndDt "EndDt"
from dbo.HoldsAll ha1
inner join dbo.HoldsAll ha2
on ha1.PartNum = ha2.PartNum
where
ha1.StartDt <= ha2.EndDt
and ha2.StartDt <= ha1.EndDt
)
select
PartNum,
StartDt,
EndDt
from DistinctRanges
Here are the results of the query shown in the edit:

You're better off having a persisted Calendar table, but if you don't, the CTE below will create it ad-hoc. The TOP(36600) part is enough to give you about 100 years' worth of dates counting forward from the pivot date ('20100101') on the same line.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table data (
partnum int,
startdt datetime,
enddt datetime,
age int
);
insert data select
12345, '20120101', '20120116', 15 union all select
12345, '20120115', '20120116', 1 union all select
12345, '20120201', '20120328', 56 union all select
12345, '20120113', '20120119', 6 union all select
88872, '20120201', '20130113', 43;
Query 1:
with Calendar(thedate) as (
select TOP(36600) dateadd(d,row_number() over (order by 1/0),'20100101')
from sys.columns a
cross join sys.columns b
cross join sys.columns c
), tmp as (
select partnum, thedate,
grouper = datediff(d, dense_rank() over (partition by partnum order by thedate), thedate)
from Calendar c
join data d on d.startdt <= c.thedate and c.thedate <= d.enddt
)
select partnum, min(thedate) startdt, max(thedate) enddt
from tmp
group by partnum, grouper
order by partnum, startdt
Results:
| PARTNUM | STARTDT | ENDDT |
------------------------------------------------------------------------------
| 12345 | January, 01 2012 00:00:00+0000 | January, 19 2012 00:00:00+0000 |
| 12345 | February, 01 2012 00:00:00+0000 | March, 28 2012 00:00:00+0000 |
| 88872 | February, 01 2012 00:00:00+0000 | January, 13 2013 00:00:00+0000 |
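The grouper expression is the core of the query: within one run of consecutive dates, the date minus its rank is constant, so grouping on that difference yields one group per merged range. Below is a minimal Python sketch of the same trick on the sample data. It is an illustration only; merge_ranges is a made-up name, and a set of covered days plays the role of the calendar join with dense_rank.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the schema setup: (partnum, startdt, enddt).
data = [
    (12345, date(2012, 1, 1), date(2012, 1, 16)),
    (12345, date(2012, 1, 15), date(2012, 1, 16)),
    (12345, date(2012, 2, 1), date(2012, 3, 28)),
    (12345, date(2012, 1, 13), date(2012, 1, 19)),
    (88872, date(2012, 2, 1), date(2013, 1, 13)),
]

def merge_ranges(data):
    """Merge overlapping date ranges per partnum via 'date minus rank' grouping."""
    # Expand every range into the distinct days it covers.
    covered = set()
    for partnum, start, end in data:
        for offset in range((end - start).days + 1):
            covered.add((partnum, start + timedelta(days=offset)))

    merged = []
    for partnum, days in groupby(sorted(covered), key=lambda item: item[0]):
        islands = {}
        for rank, (_, day) in enumerate(days, start=1):
            # Consecutive days share the same (day number - rank) value.
            grouper = day.toordinal() - rank
            islands.setdefault(grouper, []).append(day)
        for island_days in islands.values():
            # Days were added in ascending order: first is min, last is max.
            merged.append((partnum, island_days[0], island_days[-1]))
    return sorted(merged)

result = merge_ranges(data)
for row in result:
    print(row)
```

Because the comparison happens at day level, two ranges that merely touch (one ends the day before the next starts) end up in the same group; the same holds for the day-based calendar in the query.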