Forecasting 5 Day Sales

I have daily sales data from 2013-02-18 to 2017-02-12 with only 4 days of data missing (Christmas Day, the 25th of December, in each year). Sales volume on these holidays is zero. My goal is to decide how to staff my store for the upcoming week by forecasting my sales for the next 5-7 days.
I start by setting up this data as a time series:
ts <- ts(mydata, frequency = 365)
and then an initial analysis through a decomposition:
This seems to show a declining sales trend with some seasonality, if I'm not mistaken. So, to start my forecast implementation, I fit an ARIMA model to the first two years' worth of data:
fit <- auto.arima(ts[1:730], stepwise = FALSE, approximation = FALSE)
Series: ts[1:730]
ARIMA(4,1,1)
Coefficients:
         ar1      ar2      ar3      ar4      ma1
      0.3638  -0.2290  -0.1451  -0.2075  -0.8958
s.e.  0.0413   0.0388   0.0388   0.0398   0.0241
sigma^2 estimated as 15424930: log likelihood=-7068.67
AIC=14149.33 AICc=14149.45 BIC=14176.88
This model doesn't seem right to me, because it does not include any seasonality. I know I have enough data. Rob Hyndman's blog said to try using ets which also showed no seasonality. What am I not understanding about this data series or the forecasting methodology?

I've re-asked this question more appropriately in the stats exchange forums. Could someone please close this question here in stackexchange for me?
The question is now here:
https://stats.stackexchange.com/questions/295012/forecast-5-7-day-sales


How to skip rows with the same values

I have the following problem: I have a dataset with over 1 million entries (shown below) that includes the variables company (the name of the company, a string), reviews (the number of reviews a company received) and company1 (a numeric code assigned to each company name). Now I want to calculate the average number of reviews a company in the dataset receives. But if I just do sum reviews, it will count the reviews of company 3 twice, the reviews of company 5 23 times, etc. (as often as they are listed in the data). How do I avoid this and count each company only once?
Your image is not readable (by me on a laptop). The Stata tag wiki gives detailed advice on how to give data examples and the command dataex bundled with recent versions of Stata is easily used for SE.
The flavour of your request is easier to follow. Here is an analogue. With the Grunfeld data we can calculate a mean investment for each year.
webuse grunfeld, clear
egen mean = mean(invest), by(year)
Now we might want to know how many years had mean invest above 200 (in the units used)?
su mean if mean > 200
or
count if mean > 200
returns the number of observations (not years). If you try it, the result is 30. In the Grunfeld data, there are 10 companies each measured for each year, so dividing by 10 is an easy answer. For more complicated datasets, it would be better to tag each year just once, and then look only at tagged observations:
egen tag = tag(year)
count if tag & mean > 200
It would be more common to tag panels, not years, but the principle is the same. See the help for egen.
collapse and contract offer other routes, with or without using frames.
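The same tag-once idea can be sketched outside Stata in plain Python. The company names and review counts below are made-up illustration data, not taken from the question; the point is only that each company's review count is kept once before averaging.

```python
# Hypothetical rows: (company, reviews). A company's review count is
# repeated on every row for that company, as described in the question.
rows = [
    ("A", 10),
    ("B", 4),
    ("C", 7), ("C", 7),
    ("D", 1),
    ("E", 3), ("E", 3), ("E", 3),
]

# Naive mean over all rows counts C twice and E three times.
naive_mean = sum(reviews for _, reviews in rows) / len(rows)

# Keep one entry per company (the analogue of tagging each company once).
reviews_by_company = {}
for company, reviews in rows:
    reviews_by_company.setdefault(company, reviews)

per_company_mean = sum(reviews_by_company.values()) / len(reviews_by_company)

print("mean over rows:     ", naive_mean)
print("mean over companies:", per_company_mean)
```

The two means differ because the row-level mean weights each company by how often it is listed.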

Monthly trend comparison using tables in Google Data Studio

I'm quite new to GDS and I've been experimenting with the comparison date range to see the % increase between the current and previous month.
I've managed to get a result, but it's not showing the correct % increase, which I have manually calculated to confirm.
The values are
No of Reviews (calculation)
July - 314 positive reviews / 379 reviews = 82.85%
August - 458 positive reviews / 480 reviews = 95.42%
Manually Calculated Difference = 12.57%
GDS Comparison Difference = 15.2%
The date column itself is formatted as "YYYYMMDD" and I've tried the comparison and calculated field options on the metric, but to no avail.
It feels like I am getting a comparative % rather than a direct increase.
Any help/guidance would be greatly appreciated, as I have tried the GDS forum several times but there is very little activity on there.
Thanks so much
Dan
It's not ideal but to do this in GDS you need to create a new column in Google Sheets which is literally the values you need from the previous month. I.e.
date       | reviews | pos reviews | prev month reviews | prev month pos reviews
2019-08-01 | 480     | 458         | 379                | 314
You can then create 2 calculated fields for this month % positive and last month % positive and a 3rd for the difference.
There are other ways you can try to do this but they get a bit messy and manual, so that's probably a good starting point. GDS is great in some ways (cost being one!) but a little limited in others!
In your table's Style settings, under Show compare, check Show Absolute Change to get the exact difference.
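The gap between 12.57% and 15.2% looks consistent with the comparison reporting a relative change rather than a difference in percentage points. A quick check in Python using the figures from the question (the variable names are only for illustration):

```python
# Figures from the question: positive reviews / total reviews per month.
july_rate = 314 / 379
august_rate = 458 / 480

# Absolute change: difference between the two rates, in percentage points.
absolute_change = (august_rate - july_rate) * 100

# Relative change: how much the rate grew compared with July's rate.
relative_change = (august_rate / july_rate - 1) * 100

print(f"absolute change: {absolute_change:.2f} percentage points")
print(f"relative change: {relative_change:.1f} %")
```

The first figure matches the manual calculation and the second matches what GDS displayed, which is why the absolute-change option is the one to look for.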

MS SQL - Calculating plan payments for a month

I need to calculate how much a plan has cost the customer in a specific month.
Plans have floating billing cycles of a month's length - for example a billing cycle can run from '2014-04-16' to '2014-05-16'.
I know the start date of a plan, and the end date can either be a specific date or NULL if the plan is still running.
If the end date is not null, the customer is still charged for a whole month, not pro rata. Example: if the billing cycle runs from the 4th to the 4th of each month and the customer ends his plan on the 10th, he will still be charged until the 4th of the next month.
Can anyone help me? I feel like I've been going over this a million times, and just can't figure it out.
Variables I have:
@planStartDate [Plan's start date]
@planEndDate [Plan's end date - can be null]
@billStartDate [The bill's start date - example: 2015-02-01]
@billEndDate [One month after bill's start date - 2015-03-01]
@price [the plan's price per billing cycle]
Here's the best answer I can give based on the very limited information you have given so far (by the way, in the future it would really help people answer your question faster and more efficiently if you specified a lot more info: tables involved, all columns, etc.):
"I need to calculate how much a plan has cost the customer in a specific month."
-- customerID is assumed: some column in this table that distinguishes customers
SELECT customerID, SUM(price)
FROM table_foo
WHERE planStartDate BETWEEN 'start date you specify' AND 'end date you specify'
GROUP BY customerID
It's a bit rough as a query, but that's the best I can give until you specify your variables more clearly (i.e. tables involved, ALL columns in the table, etc.).
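One possible reading of the billing rules in the question can be sketched in Python rather than SQL. The assumptions here are not confirmed by the question: a cycle is charged in full in the bill month in which the cycle starts, cycles start on the plan's start date and every month after, and no new cycle is charged once the plan's end date has been reached. The function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift d by whole months, clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def plan_cost_in_month(plan_start, plan_end, bill_start, bill_end, price):
    """Sum the price of every billing cycle that starts in [bill_start, bill_end).

    plan_end may be None for a plan that is still running. A cycle that has
    started before plan_end is charged in full (no pro rata).
    """
    total = 0
    k = 0
    while True:
        # Always offset from plan_start so a clamped day (e.g. the 31st) does not drift.
        cycle_start = add_months(plan_start, k)
        if cycle_start >= bill_end:
            break
        if plan_end is not None and cycle_start >= plan_end:
            break
        if cycle_start >= bill_start:
            total += price
        k += 1
    return total


# Cycle runs from the 4th to the 4th; the customer ends the plan on the 10th of February.
print(plan_cost_in_month(date(2015, 1, 4), date(2015, 2, 10),
                         date(2015, 2, 1), date(2015, 3, 1), 100))
print(plan_cost_in_month(date(2015, 1, 4), date(2015, 2, 10),
                         date(2015, 3, 1), date(2015, 4, 1), 100))
```

Under these assumptions the February cycle is still charged in full and nothing is charged in March. If a cycle should instead be attributed to the month in which it ends, only the comparison against the bill window changes.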

How to keep track changing items in a stock portfolio?

I have a system where people can pick some stocks and it values their portfolios, but I'm having trouble doing this in an efficient way on a daily basis because I'm creating entries for days that don't have any changes (think of it as measuring the values with version control, so I can track changes to the way the portfolio is designed).
Here's an example (each day's portfolio with stock name and weight):
Day1:
ibm = 10%
microsoft = 50%
google = 40%
day5:
ibm = 20%
microsoft = 20%
google = 40%
cisco = 20%
I can measure the value of the portfolio on day 1 and understand I need to measure it again on day 5 (when it changed), but how do I measure days 2-4 without recreating day 1's entry in the database?
My approach right now (which I don't like) is to create a temp entry in my database when someone changes the portfolio. At the end of the day, when I calculate the values, I use the temp entry if there is one; otherwise I create a new entry (for days 2-4) using the previous day's data. The issue is that, as the data often doesn't change, I'm creating entries that are basically duplicates. The catch is that my stock data is all daily. I also thought of taking the portfolio and, if it hasn't been updated in 3 days, finding the returns of the last 3 days for each stock, but I wasn't sure if there was a better solution.
Any ideas? I think this is a straightforward problem but I just can't see an efficient way of doing it.
Note: in finance terms, it's called creating a NAV, and most firms do it the inefficient way I'm doing it, but that's because the process was created like 50 years ago and hasn't changed. I think this problem is very similar to version control but I can't seem to come up with a solution.
In storage terms it makes the most sense to just store:
UserId - StockId1 - 23% - 2012-06-25
UserId - StockId2 - 11% - 2012-06-26
UserId - StockId1 - 20% - 2012-06-30
So you see that stock 1 went down on the 30th. Now if you want to know the StockId1 percentage on the 28th you just select:
SELECT *
FROM stocks
WHERE stockid = 1                  -- assumed column name; filter on the user as well
  AND datecolumn <= '2012-06-28'   -- the date literal must be quoted
ORDER BY datecolumn DESC
LIMIT 1
If it gives nothing back you did not have it, otherwise you get the last position back.
BTW. if you need for example a graph of stock 1 you could left join against a table full of dates. Then you can fill in the gaps easily.
Found this post here for example:
UPDATE mytable
SET number = (@n := COALESCE(number, @n))
ORDER BY date;
SQL QUERY replace NULL value in a row with a value from the previous known value
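A runnable sketch of the last-known-position lookup, using Python's built-in sqlite3 module. The table and column names (positions, user_id, stock_id, weight, changed_on) are assumptions for illustration, and the rows mirror the three stored changes above. Dates are kept as ISO strings so that string comparison orders them correctly.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (user_id INTEGER, stock_id INTEGER, "
    "weight REAL, changed_on TEXT)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        (1, 1, 23.0, "2012-06-25"),
        (1, 2, 11.0, "2012-06-26"),
        (1, 1, 20.0, "2012-06-30"),
    ],
)


def weight_on(user_id, stock_id, day):
    """Return the most recent weight stored on or before day, or None."""
    row = conn.execute(
        "SELECT weight FROM positions "
        "WHERE user_id = ? AND stock_id = ? AND changed_on <= ? "
        "ORDER BY changed_on DESC LIMIT 1",
        (user_id, stock_id, day.isoformat()),
    ).fetchone()
    return row[0] if row else None


# Build a daily series for stock 1 without storing a row for unchanged days.
first_day = date(2012, 6, 24)
for offset in range(7):
    day = first_day + timedelta(days=offset)
    print(day, weight_on(1, 1, day))
```

Only the days on which the portfolio changed are stored; every other day is answered by the latest row at or before that date.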

Booking system dates in database

I need some help with the following:
I am setting up a booking system (a kind of hotel booking) and I have inserted a check-in date and a check-out date into the database. How do I check whether a room is already booked?
I have no clue how to manage the already booked days in the database. Is there anyone who can give me a clue how this works? And maybe where I can find more information?
Well, I didn't fully understand your question, but my suggestion is to add a state field with which you can track the current state of the "booked" item, something like:
Available
Under Maintenance
Occupied
Or whatever bunch of states that work for you.
[EDIT]
The analysis I usually do for this case is as follows:
Say, for instance, that your room is currently booked for this date range:
Init Date: Feb 8
End Date: Feb 14
Successful booking examples:
Init Date: Feb 2
End Date: Feb 6
Init Date: Feb 15
End Date: Feb 24
You should check that the booking attempt satisfies these conditions:
Neither "Booking Init Date" nor "Booking End Date" can fall inside the already booked date range.
Example:
Init Date: Feb 2
End Date: Feb 10 (Inside the current range (Feb 8 to 14))
Init Date: Feb 12 (Inside the current range (Feb 8 to 14))
End Date: Feb 27
If "Booking Init Date" is earlier than the current init date, "Booking End Date" must also be earlier than the current init date. Example:
Init Date: Feb 2.
End Date: Feb 27 (Init date before, but end date later)
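The conditions above collapse into a single overlap test, sketched here in Python. The year is arbitrary (the examples give none), the function name is hypothetical, and both ends of each range are treated as occupied days; if a check-out day may double as the next guest's check-in day, the comparisons would become strict.

```python
from datetime import date


def overlaps(new_start, new_end, booked_start, booked_end):
    """True if the two inclusive date ranges share at least one day."""
    return new_start <= booked_end and new_end >= booked_start


# Existing booking from the example: Feb 8 to Feb 14 (year chosen arbitrarily).
booked = (date(2024, 2, 8), date(2024, 2, 14))

attempts = [
    (date(2024, 2, 2), date(2024, 2, 6)),    # entirely before
    (date(2024, 2, 15), date(2024, 2, 24)),  # entirely after
    (date(2024, 2, 2), date(2024, 2, 10)),   # end date inside the range
    (date(2024, 2, 12), date(2024, 2, 27)),  # init date inside the range
    (date(2024, 2, 2), date(2024, 2, 27)),   # surrounds the range
]

for start, end in attempts:
    status = "rejected" if overlaps(start, end, *booked) else "accepted"
    print(start, end, status)
```

A new booking is acceptable only if it overlaps none of the room's existing bookings.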
This is an interesting question - not least because I don't believe that there is a single ideal answer as it will depend to some extent on the nature of the "Hotel" and the way in which people are placed in rooms and to a further extent on the scale of the question.
At the most basic level you have two ways that you can track occupancy of rooms:
You can have a table with one row per room per day that defines the state of that room on that date (and possibly the related booking if occupied)
For each room you maintain a list of "bookings" (which as already suggested will have to include states for when a room is unavailable for maintenance).
The reason that it's an interesting question is that both immediately present you with certain challenges when maintaining the data and searching for occupancy. In the case of the former, you're holding a lot of data that may not be needed; in the case of the latter, finding gaps in the occupancy for new bookings is perhaps more interesting.
You then (in both cases) have a further problem of resource allocation: if your bookings tend to be for two days and your system results in 1 day gaps between bookings, you're going to need to rearrange the occupancy to optimise usage... but then you have to be careful with bookings that need to be explicitly tied to specific rooms.
Ideally you would defer assigning a booking to a room for as long as possible (which is why it depends on the hotel: fine for 400 modular rooms, rather less so for a dozen unique ones). So long as there are sufficient rooms of the necessary standard available during a target period (or, inverting the test, so long as fewer rooms are booked than actually exist), you can take the booking. Of course you've still got to represent the state of the rooms, so this is in addition to the data you've got to manage.
All of which is what makes it interesting: there is considerable depth to the problem, and you need a fairly decent understanding of the specific problem to produce an appropriate solution.
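The deferred-assignment idea (accept a booking while fewer rooms are booked than exist on every day of the stay) can be sketched as a per-day count. This is a minimal sketch assuming interchangeable rooms of a single standard and inclusive date ranges; the function names and the sample bookings are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def days_in(first_day, last_day):
    """Yield every date from first_day to last_day, inclusive."""
    for offset in range((last_day - first_day).days + 1):
        yield first_day + timedelta(days=offset)


def can_accept(existing, total_rooms, first_day, last_day):
    """True if, on every requested day, fewer than total_rooms are booked."""
    booked = Counter()
    for start, end in existing:
        for day in days_in(start, end):
            booked[day] += 1
    return all(booked[day] < total_rooms for day in days_in(first_day, last_day))


existing = [
    (date(2024, 2, 8), date(2024, 2, 14)),
    (date(2024, 2, 10), date(2024, 2, 12)),
]

print(can_accept(existing, 2, date(2024, 2, 2), date(2024, 2, 9)))
print(can_accept(existing, 2, date(2024, 2, 9), date(2024, 2, 11)))
```

No room numbers appear anywhere in the check, which is the point: the booking is taken against capacity, and a specific room is assigned later.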
I came to the booking problem from the perspective of avoiding a highly populated table, given that the inventory is in the thousands rather than the hundreds.
Solution one - intervals
Solution two - populate slots only when they are occupied, using the smallest unit (1 day)
Solution three - generate slots in advance for each resource and manage the status
Solution one has the smallest footprint, but since you cannot tell in advance whether the range you are searching for overlaps an existing interval, you have to read and compare the whole table.
Solution two solves this problem, and you can search for a specific time frame only, but the table contains more lines. However, since the empty slots are not written anywhere, a high vacancy rate will reduce the size of the table.
Another advantage is that old bookings can be transferred to a separate table.
Solution three increases the size of the table to a maximum of minSlot * resource * time, and the lines are generated in advance. The only advantage that I can think of is the low cost of finding empty slots with a simple select.
However generating the slots in advance looks like a terrible idea to me.
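Solution two can be sketched with one row per occupied room-day and a uniqueness constraint that rejects double bookings. This uses Python's built-in sqlite3 module; the table name, column names and the book function are assumptions for illustration, not part of any existing schema.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occupied (room_id INTEGER, day TEXT, booking_id INTEGER, "
    "PRIMARY KEY (room_id, day))"
)


def book(room_id, booking_id, first_day, last_day):
    """Insert one row per occupied day; return False if any day is taken."""
    n_days = (last_day - first_day).days + 1
    rows = [
        (room_id, (first_day + timedelta(days=i)).isoformat(), booking_id)
        for i in range(n_days)
    ]
    try:
        with conn:  # commits on success, rolls back the whole attempt on error
            conn.executemany("INSERT INTO occupied VALUES (?, ?, ?)", rows)
    except sqlite3.IntegrityError:
        return False
    return True


print(book(1, 100, date(2024, 2, 8), date(2024, 2, 14)))
print(book(1, 101, date(2024, 2, 2), date(2024, 2, 10)))
```

Vacant days never produce a row, so the table grows with occupancy rather than with the size of the inventory, and a search for a given time frame only touches the rows for those days.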
