How can I implement both hourly and weekly seasonality in Fable - R Package using season() function? - forecasting

I have to create hourly forecasts for 2000 different time series, and the series have strong hourly and weekly seasonality. To deal with the hourly seasonality I used the season("day") option. However, I suppose season("week") would create 168 weekly dummies, and that would be a computational problem.
Do you know a quick way to create day-of-week dummies using the tsibble or fabletools packages?
ts_forecast1 <- train %>%
  filter(store_number == 288) %>%
  collect() %>%
  mutate(store_number = factor(store_number)) %>%
  group_by(store_number) %>%
  filter(sales != 0) %>%
  tsibble::fill_gaps(sales = 100) %>%
  fabletools::model(Arima = ARIMA(log(sales) ~ season("day") + fourier("week", K = 8)))

Your code already contains the answer.
season("day") will create 23 dummy variables since there are 24 hours in a day. season("week") will create 167 dummy variables for the 168 hours in a week. To use fewer coefficients, replace season() with fourier() and use K to control the number of coefficients (equal to twice K).
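To make the coefficient counts concrete, here is a minimal base R sketch of what K Fourier pairs look like for the weekly period of 168 hours. The helper name fourier_terms is hypothetical and is not fable's implementation; it only illustrates why K = 8 gives 16 regressors instead of 167 dummies.

```r
# Hypothetical helper (not part of fable): K sine/cosine pairs for period m.
fourier_terms <- function(t, m, K) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste0(rep(c("S", "C"), times = K),
                          rep(seq_len(K), each = 2), "_", m)
  out
}

t <- seq_len(24 * 7 * 4)                    # four weeks of hourly time indices
weekly <- fourier_terms(t, m = 168, K = 8)  # weekly period = 168 hours

n_weekly_dummies <- 168 - 1   # what season("week") would need
n_fourier_terms  <- ncol(weekly)  # 2 * K = 16
```

A larger K lets the seasonal pattern be more flexible at the cost of more coefficients, so K is usually chosen by comparing model fit (for example by AICc) across a few values.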

Related

Sheets header row ARRAYFORMULA to look up rate based on the job's turnaround AND date received within a range of dates

I've got a Google Sheets workbook with two sheets:
Jobs
  | A          | B          | C     | D     | E
1 | Turnaround | Received   | Rate  | Pages | Total
2 | Standard   | 12/2/2021  | $0.40 | 204   | $81.60
3 | Rush       | 12/9/2021  | $0.60 | 79    | $47.40
4 | Rush       | 12/29/2021 | $0.60 | 24    | $14.40
5 | Standard   | 1/1/2022   | $0.45 | 81    | $36.45
6 | Standard   | 1/2/2022   | $0.45 | 137   | $61.65
7 | Standard   | 1/5/2022   | $0.45 | 95    | $42.75
8 | Standard   | 1/15/2022  | $0.45 | 162   | $72.90
Rates
  | A          | B         | C          | D
1 | Turnaround | Base Rate | Start Date | End Date
2 | Standard   | $0.40     | 9/1/2021   | 12/31/2021
3 | Rush       | $0.60     | 8/17/2018  | 6/10/2022
4 | Expedited  | $0.80     | 8/17/2018  | 6/10/2022
5 | Daily      | $1.00     | 8/17/2018  | 6/10/2022
6 | Standard   | $0.45     | 1/1/2022   | 6/10/2022
I'm trying to use an ARRAYFORMULA in Jobs!C1 to look up the value in Rates!B:B where the Turnaround in Jobs!A:A matches the Turnaround in Rates!A:A and the Date Received in Jobs!B:B falls on or between the Start Date in Rates!C:C and End Date in Rates!D:D.
The idea is that rates may change over time, but the job totals will still calculate using the correct rate at the time each job came in.
I know I can't use SUMIFS with ARRAYFORMULA, so I tried using QUERY, but this only populates the rate for the first job.
={"Rate";
ARRAYFORMULA(QUERY(Rates!A:D,
"select B where A contains '"&Jobs!A2:A
&"' and C < date'"&TEXT(Jobs!B2:B, "YYYY-MM-DD")
&"' and D > date'"&TEXT(Jobs!B2:B, "YYYY-MM-DD")&"'",0))}
I'm okay with adding helper columns if needed. I'm trying to avoid having to manually fill the formula down the column as jobs are added.
Here is a link to the workbook:
Job Rate Lookup By Turnaround + Date Range
I appreciate any help on this.
try:
={"Rate"; ARRAYFORMULA(IFNA(VLOOKUP(A2:A&B2:B, SORT({
FILTER(Rates!A2:A, Rates!A2:A<>"")&Rates!C2:C, Rates!B2:B}, Rates!C2:C, 1, Rates!A2:A, 1), 2, 1)))}
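As a way to sanity-check the rates any such formula should return, the lookup rule from the question (same turnaround, received date on or between start and end date) can be sketched outside Sheets. This is only an illustration in R using the sample rows from the question; the function name lookup_rate and the column names are assumptions made for the sketch.

```r
rates <- data.frame(
  turnaround = c("Standard", "Rush", "Expedited", "Daily", "Standard"),
  base_rate  = c(0.40, 0.60, 0.80, 1.00, 0.45),
  start_date = as.Date(c("2021-09-01", "2018-08-17", "2018-08-17",
                         "2018-08-17", "2022-01-01")),
  end_date   = as.Date(c("2021-12-31", "2022-06-10", "2022-06-10",
                         "2022-06-10", "2022-06-10")),
  stringsAsFactors = FALSE
)

jobs <- data.frame(
  turnaround = c("Standard", "Rush", "Rush", "Standard",
                 "Standard", "Standard", "Standard"),
  received   = as.Date(c("2021-12-02", "2021-12-09", "2021-12-29", "2022-01-01",
                         "2022-01-02", "2022-01-05", "2022-01-15")),
  pages      = c(204, 79, 24, 81, 137, 95, 162),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first rate whose turnaround matches and whose
# date range contains the received date (bounds inclusive); NA if none.
lookup_rate <- function(turnaround, received) {
  hit <- rates$turnaround == turnaround &
    rates$start_date <= received &
    rates$end_date >= received
  if (any(hit)) rates$base_rate[which(hit)[1]] else NA_real_
}

jobs$rate <- vapply(
  seq_len(nrow(jobs)),
  function(i) lookup_rate(jobs$turnaround[i], jobs$received[i]),
  numeric(1)
)
jobs$total <- jobs$rate * jobs$pages
```

With the sample data, the computed rate and total columns reproduce columns C and E of the Jobs sheet shown in the question.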
QUERY does not expand under ARRAYFORMULA: it is evaluated once and only returns the result for the first value it finds, so it cannot fill the whole column.
I created a formula that matches the value using VLOOKUP; however, I had to modify the name in Jobs from Standard to Standard 2.
This is the formula:
=IFERROR(ARRAYFORMULA(VLOOKUP(A2:A,Rates!A2:D6,2,0)))

Monthly trend comparison using tables in Google Data Studio

I'm quite new to GDS and I've been experimenting with the comparison date range to see the percentage increase between the current and previous month.
I've managed to get a result, but it doesn't match the increase I calculated manually to confirm it.
The values are
No of Reviews (calculation)
July - 379 reviews / 314 positive reviews = 82.85%
August - 480 reviews / 458 positive reviews = 95.42%
Manually Calculated Difference = 12.57%
GDS Comparison Difference = 15.2%
The date column itself is formatted as "YYYYMMDD", and I've tried the comparison and calculated field options on the metric, but to no avail.
It feels like I am getting a relative percentage change rather than the direct difference between the two percentages.
Any help or guidance would be greatly appreciated, as I have tried the GDS forum several times but there is very little activity there.
Thanks so much
Dan
It's not ideal, but to do this in GDS you need to create new columns in Google Sheets that hold the values you need from the previous month, i.e.
date |reviews|pos reviews|prev month reviews|prev month pos reviews|
2019-08-01|480 |458 |379 | 314 |
You can then create 2 calculated fields for this month % positive and last month % positive and a 3rd for the difference.
There are other ways you can try to do this, but they get a bit messy and manual, so that's probably a good starting point. GDS is great in some ways (cost being one!) but a little limited in others!
In your table's style settings, under Show compare, check Show Absolute Change to get the exact difference.
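The two figures in the question appear to be consistent with each other: 12.57 is the absolute change in percentage points, while 15.2% matches the relative change of the August rate over the July rate. A quick check of the arithmetic, sketched in R with the numbers from the question:

```r
july_rate   <- 314 / 379   # positive reviews / all reviews in July
august_rate <- 458 / 480   # positive reviews / all reviews in August

# Absolute change, in percentage points
absolute_change <- (august_rate - july_rate) * 100

# Relative change, as a percent of July's rate
relative_change <- (august_rate - july_rate) / july_rate * 100

abs_rounded <- round(absolute_change, 2)  # 12.57
rel_rounded <- round(relative_change, 1)  # 15.2
```

So a comparison that reports 15.2% is describing the relative change, and the absolute-change option is the one that should give the 12.57 figure.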

Calculating all possible "time difference" combinations in R

I'm using camera trapping data which contains two columns:
"datetime" of when the photo was taken
"species" species that appears in the photo
I want to calculate time difference between all possible pairs of species in R.
I have used difftime and diff functions in R but the result obtained isn't what I aim, as R is calculating time between "datetime2-datetime1", "datetime3-datetime2", "datetime4-datetime3", etc.
An example of my data is:
datetime (POSIXct format): "2018-10-06 08:39:00", "2018-10-07 04:09:00", "2018-10-14 00:47:00"
species: "deer", "horse", "fox"
If I use diff function:
diff(datetime)
Time differences in hours
[1] 19.5000 164.6333
# this shows the time between the first and second, and between the second and third, datetimes.
I have also tried:
base_time <- datetime[1]
later_times <- datetime[2:3]
later_times - base_time
diff(later_times)
This option combines all possible datetimes, but it doesn't make sense if my data set has more than 3 rows...
As I need to calculate the time difference between all photos, this should be:
"datetime2-datetime1", "datetime3-datetime1", "datetime4-datetime1",
"datetime3-datetime2", "datetime4-datetime2", "datetime4-datetime3", etc.
I'm still learning R, so any help would be greatly appreciated!
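One possible approach, sketched in base R with the three example rows from the question: enumerate every pair of row indices with combn() and take the difference for each pair. The time zone "UTC" and the output column names are assumptions made for the sketch.

```r
datetime <- as.POSIXct(
  c("2018-10-06 08:39:00", "2018-10-07 04:09:00", "2018-10-14 00:47:00"),
  tz = "UTC"  # assumed time zone, fixed so the differences are reproducible
)
species <- c("deer", "horse", "fox")

# Each column of `pairs` is one combination (i, j) with i < j
pairs <- combn(seq_along(datetime), 2)

result <- data.frame(
  from  = species[pairs[1, ]],
  to    = species[pairs[2, ]],
  hours = as.numeric(difftime(datetime[pairs[2, ]],
                              datetime[pairs[1, ]],
                              units = "hours")),
  stringsAsFactors = FALSE
)
```

For n photos this produces n * (n - 1) / 2 rows, one per pair, so it works for any number of rows; for very large data sets the number of pairs grows quickly.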

How to modify and manipulate a dataset in R? (new to R user here)

I have a dataset already imported into R that has 12 variables but can't seem to find much information about how to filter my dataset by each variable.
For example, one of these variables is "Sex", which has two values, "M" and "F". I'm interested in sub-datasets that filter the original dataset, which contains both sexes, down to only males or only females.
Another example is Birth Year: Birth years in the data will range from 1800 to 2007, but I'm interested in birth years that are more recent so (Birth Year > 1990).
What's a simple and easy way to do this? Is it similar to SAS (which is just a few if statements)?
I have received the solution to my problem from a professor. Here's the code that helps with this; you need to install the 'dplyr' package in R.
install.packages("dplyr")
library(dplyr)
# tbl_df() is deprecated in current dplyr; as_tibble() is its replacement
modified_dataset <- as_tibble(dataset)
Example of "filtering": this one returns only the male rows instead of the entire dataset.
filter(modified_dataset, Sex == 'M')
select(filter(modified_dataset, Sex == 'M'), Name, etc)
only_Male <- modified_dataset %>% filter(Sex == 'M') %>% select(Name, Fed)
This format gets you a new dataset based on the conditions you ask for.
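The birth-year condition from the question works the same way. As a minimal sketch that runs without any extra packages, here is the base R equivalent on a small made-up data frame; the data and the column name BirthYear are hypothetical and only stand in for the real dataset.

```r
# Hypothetical sample data standing in for the real 12-variable dataset
dataset <- data.frame(
  Name      = c("Ann", "Bob", "Cara", "Dev"),
  Sex       = c("F", "M", "F", "M"),
  BirthYear = c(1985, 1992, 2001, 1970),
  stringsAsFactors = FALSE
)

# One sub-dataset per sex
males   <- dataset[dataset$Sex == "M", ]
females <- dataset[dataset$Sex == "F", ]

# Rows with a recent birth year
recent <- subset(dataset, BirthYear > 1990)

# Conditions can be combined, and columns chosen at the same time
recent_males <- subset(dataset, Sex == "M" & BirthYear > 1990,
                       select = c(Name, BirthYear))
```

With dplyr loaded, the same conditions can be passed to filter(), for example filter(dataset, Sex == "M", BirthYear > 1990).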

Cumulative Sum - Choosing Portions of Hierarchy

I have a bit of an interesting problem.
I need the cumulative sum over a set that is built from pieces of a Time dimension. The time dimension is based on hours and minutes: it begins at hour 0, minute 0 and ends at hour 23, minute 59.
What I need to do is slice out portions such as 09:30 AM - 04:00 PM or 04:30 PM - 09:30 AM, and I need these values in order to perform my cumulative sums. I'm hoping that someone can suggest a way of doing this with standard MDX. If not, is my only alternative to write my own stored procedure that builds my periods-to-date set using the logic described above?
Thanks in advance!
You can create a secondary hierarchy in your time dimension with only the hour level and filter the query with it.
[Time].[Calendar] -> the hierarchy with year, month, day and hour levels
[Time].[Hour] -> the 'new' hierarchy with only the hour level (e.g. 09:30 AM).
Then you can write an MDX query that adds your criteria as a filter:
SELECT
my axis...
FROM ( SELECT { [Time].[Hour].[09:30 AM]:[Time].[Hour].[04:00 PM] } ON 0 FROM [MyCube] )
You can also create a new dimension instead of a hierarchy; the difference is in the autoexists behaviour and the performance.
