Store outputs of forecast - forecasting

I'm calling R's forecast() function many times in a for loop (once per month, 12 months), and I want to use accuracy() to compare a horizon h = 12 forecast against one-step-ahead forecasts. My problem is how to store the results of the 12 iterations so I can pass them to accuracy().
Thank you
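
For what it's worth, a minimal sketch of one way to store the 12 one-step forecasts in a pre-allocated vector (assuming a monthly ts object y, the forecast package, and auto.arima as a stand-in for whatever model is being refit; adapt the names to your data):

library(forecast)

n     <- length(y)                         # assume y is a monthly ts
train <- window(y, end = time(y)[n - 12])  # hold out the last 12 months
test  <- window(y, start = time(y)[n - 11])

fit  <- auto.arima(train)
fc12 <- forecast(fit, h = 12)              # one forecast at horizon 12

one_step <- numeric(12)                    # storage for the 12 results
for (i in 1:12) {
  current     <- window(y, end = time(y)[n - 13 + i])
  refit       <- Arima(current, model = fit)     # reuse the coefficients
  one_step[i] <- forecast(refit, h = 1)$mean[1]  # store step i
}

accuracy(fc12, test)       # accuracy at horizon 12
accuracy(one_step, test)   # accuracy of the 12 one-step forecasts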

Related

Method to Correlate Time Series Arrays of Differing Lengths

I am attempting to correlate the time series from 4 separate tilt monitors that sample every 5 minutes. The time series have slightly different start and end times, and the resulting arrays are slightly different lengths, though they span almost the same period of time (differing by ~3 mins). My goal is to correlate each of these time series with a single "wind speed" time series that covers the same period as the tilt monitors, also sampled every 5 minutes, but again with a slightly different array length, origin time and end time.
The different array lengths in the tilt measurements are due to instrument error. There are some times within each of the arrays where the instrument missed a measurement, so the sample interval there is 10 minutes.
My array sizes look something like this:
Tilt_a = 6236x2
Tilt_b = 6310x2
Tilt_c = 6304x2
Tilt_d = 6309x2
Wind_speed = 6283x2
I am using MATLAB to do the correlation. I imagine that I will need to re-sample the data using something like interp1, but I do not know how to reconcile the origin and end times. Is there a method that comes to mind for handling a situation such as this one, or a function that allows correlating arrays of differing lengths?
For the different time windows you're analyzing, you could either trim them all so that they start and end at the same time, or review them over their raw intervals and make your comparisons over the windows that overlap.
As for the sampling interval, you can use the resample command to make your comparisons more uniform.
https://www.mathworks.com/help/signal/ref/resample.html
Extending the first concept, you could use resample to define new vectors whose start time, end time and interval are all synchronized, then continue with your analysis.
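
A minimal sketch of that idea using interp1 (which the question already mentions); the variable names follow the question, and column 1 is assumed to hold the time stamps, column 2 the measurements:

% Common overlap window across all series.
t0 = max([Tilt_a(1,1)   Tilt_b(1,1)   Tilt_c(1,1)   Tilt_d(1,1)   Wind_speed(1,1)]);
t1 = min([Tilt_a(end,1) Tilt_b(end,1) Tilt_c(end,1) Tilt_d(end,1) Wind_speed(end,1)]);

dt = 5/(24*60);       % 5 minutes, in days (datenum-style time stamps)
tq = (t0:dt:t1)';     % one uniform grid shared by every series

% Interpolate each series onto the common grid; linear interpolation
% also bridges the occasional 10-minute gaps.
a = interp1(Tilt_a(:,1),     Tilt_a(:,2),     tq);
w = interp1(Wind_speed(:,1), Wind_speed(:,2), tq);

R = corrcoef(a, w);   % correlation over the overlapping window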

Find all key-unique permutations of an array

I have an array that looks something like this:
[["Sunday", [user1, user2]], ["Sunday", [user1, user4]], ["Monday", [user3, user2]]]
The array essentially has all permutations of a given day with a unique pair of users. I obtained it by running
%w[Su Mo Tu We Th Fr Sa].product(User.all_pairs)
where User.all_pairs is every unique pair of users.
My goal now is to compose this set of nested arrays into schedules, meaning I want to find every permutation of length 7 with unique days. In other words, I want every potential week. I already have every potential day, and I have every potential pair of users, now I just need to compose them.
I have a hunch that the Array.permutation method is what I need, but I'm not sure how I'd use it in this case. Or perhaps I should use Array.product?
If I understand you correctly, you want all possible weeks where there is one pair of users assigned to each day. You can do it like this:
User.all_pairs.combination(7)
This will give you all possible ways to pick 7 pairs and assign them to the days of the week. But if you are asking for every possible week, then it also matters which pair is assigned to which day, so you also have to take every permutation of those 7 pairs:
User.all_pairs.combination(7).flat_map { |week| week.permutation.to_a }
This will give you all possible weeks, where every week is represented as an array containing 7 pairs. For example, one of the weeks may look like this:
[(user1, user2), (user1, user3), (user2, user3), (user3, user4), (user1, user4), (user2, user4), (user3, user4)]
However, the number of weeks will be huge! If you have n users, you have k = n(n-1)/2 pairs, there are p = k! / (7! * (k - 7)!) ways of selecting 7 of them, and p * 7! possible weeks. With just 5 users that is already 10 pairs and 604,800 possible weeks, and with 10 users about 2.3 * 10^11. Whatever you are planning to do with the full list, enumerating it quickly becomes infeasible.
If you are trying to find the best schedule for a week, you could try a greedy algorithm instead, as sketched below.
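
For instance, a hypothetical greedy sketch (greedy_week is an illustrative name, and all_pairs stands in for User.all_pairs) that gives each day the pair that has been used least often so far:

# Build one week greedily: each day gets the least-used pair so far.
def greedy_week(all_pairs)
  counts = Hash.new(0)
  %w[Su Mo Tu We Th Fr Sa].map do |day|
    pair = all_pairs.min_by { |p| counts[p] }
    counts[pair] += 1
    [day, pair]
  end
end

week = greedy_week(User.all_pairs)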

Is there an easy way to get the percentage of successful reads of last x minutes?

I have a setup with a BeagleBone Black which communicates over I²C with its slaves every second and reads data from them. Sometimes the I²C readout fails, though, and I want to gather statistics about these failures.
I would like to implement an algorithm which displays the percentage of successful communications over the last 5 minutes (up to 24 hours) and updates that value constantly. If I implemented that naively, with an array storing success/failure for every second, that would mean a lot of wasted RAM/CPU load for a minor feature (especially if I want to see the statistics of the last 24 hours).
Does someone know a good way to do that, or can anyone point me in the right direction?
Why don't you just implement a low-pass filter? For every successful transfer, you push in a 1, for every failed one a 0; the result is a number between 0 and 1. Assuming that your transfers happen periodically, this works well -- you just have to adjust the cutoff frequency of that filter to your desired "averaging duration".
However, I can't follow your RAM argument: assuming you store one byte representing success or failure per transfer, which you say happens every second, you end up with 86,400 bytes per day -- about 85 KB/day, which is really negligible.
EDIT: Cutoff frequency is a term from signal theory; it describes the highest (or lowest) frequency that passes a low-pass (or high-pass) filter.
Implementing a low-pass filter is trivial; something like this (Python):

rate = 1.0     # filter state; start with no failed transfers
alpha = 0.001  # smoothing factor

while True:
    # returns 1 on success, 0 on failure
    success = do_transfer_and_return_1_on_success_or_0_on_failure()
    rate = alpha * success + (1 - alpha) * rate
That's a single-tap IIR (infinite impulse response) filter; single tap because there's only one alpha and thus only one number stored as state.
EDIT2: the value of alpha defines the behaviour of this filter: the smaller it is, the longer the effective averaging window, and the slower the value reacts to change.
EDIT3: you can use a filter design tool to give you the right alpha; just set your low-pass filter's cutoff frequency to something like 0.5/integrationLengthInSamples, select an order of 0 for the IIR, and use an elliptic design method (most tools default to Butterworth, but 0-order Butterworths don't do a thing).
I'd use scipy and convert the resulting (b, a) tuple (a will be 1 here) to the correct form for this feedback form.
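
As a rough rule of thumb (my own shortcut, not the filter-design route above): for a sample period T and a desired averaging time constant tau, the discrete equivalent of a first-order RC low-pass gives alpha directly:

import math

T = 1.0                          # one transfer per second
tau = 300.0                      # ~5-minute averaging window
alpha = 1 - math.exp(-T / tau)   # ~0.0033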
UPDATE: In light of the OP's comment ('determine a trend of which devices are failing'), I would recommend the geometric average that Marcus Müller ꕺꕺ put forward.
ACCURATE METHOD
The method below is aimed at obtaining 'well defined' statistics for performance over time that are also useful for 'after the fact' analysis.
Notice that the geometric average 'looks back' over a number of recent messages rather than over a fixed time period.
Maintain a rolling array of 24*60/5 = 288 'prior success rates' (SR[i] with i=-1, -2,...,-288) each representing a 5 minute interval in the preceding 24 hours.
That will consume about 2.3 KB if the elements are 64-bit doubles.
To 'effect' constant updating use an Estimated 'Current' Success Rate as follows:
ECSR = (t*S/M+(300-t)*SR[-1])/300
where S and M are the counts of successes and messages in the current (partially complete) period, and SR[-1] is the previous (now complete) bucket.
t is the number of seconds elapsed in the current bucket.
NB: When you start up, before the first bucket completes, just use S/M.
In essence the approximation assumes the error rate was steady over the preceding 5 - 10 minutes.
To 'effect' a 24-hour look-back you can either shuffle the data down (by copy or memcpy()) at the end of each 5 minute interval, or implement a circular array by keeping track of the current bucket index.
NB: For many management/diagnostic purposes intervals of 15 minutes are often entirely adequate. You might want to make the 'grain' configurable.
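
A minimal Python sketch of the circular-array variant (record and current_rate are illustrative names; one record() call per transfer is assumed):

import time

BUCKET_SECONDS = 300                        # 5-minute buckets
NUM_BUCKETS = 24 * 3600 // BUCKET_SECONDS   # 288 buckets = 24 hours

sr = [None] * NUM_BUCKETS   # circular array of prior success rates
idx = 0                     # index of the bucket being filled
succ = msgs = 0             # counts for the partially complete bucket
bucket_start = time.time()

def record(success):
    # Call once per transfer with True/False.
    global succ, msgs, idx, bucket_start
    if time.time() - bucket_start >= BUCKET_SECONDS:
        sr[idx] = succ / msgs if msgs else None   # close the bucket
        idx = (idx + 1) % NUM_BUCKETS
        succ = msgs = 0
        bucket_start = time.time()
    msgs += 1
    succ += int(success)

def current_rate():
    # ECSR: blend the partial bucket with the last complete one.
    t = time.time() - bucket_start
    cur = succ / msgs if msgs else 0.0
    prev = sr[(idx - 1) % NUM_BUCKETS]
    if prev is None:            # start-up: no completed bucket yet
        return cur
    return (t * cur + (BUCKET_SECONDS - t) * prev) / BUCKET_SECONDS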

Can we solve this using a greedy strategy? If not how do we solve this using dynamic programming?

Problem:
The city of Siruseri is impeccably planned. The city is divided into a rectangular array of cells with M rows and N columns. Each cell has a metro station. There is one train running left to right and back along each row, and one running top to bottom and back along each column. Each train starts at some time T and goes back and forth along its route (a row or a column) forever.
Ordinary trains take two units of time to go from one station to the next. There are some fast trains that take only one unit of time to go from one station to the next. Finally, there are some slow trains that take three units of time to go from one station to the next. You may assume that the halting time at any station is negligible.
Here is a description of a metro system with 3 rows and 4 columns:
S(1) F(2) O(2) F(4)
F(3) . . . .
S(2) . . . .
O(2) . . . .
The label at the beginning of each row/column indicates the type of train (F for fast, O for ordinary, S for slow) and its starting time. Thus, the train that travels along row 1 is a fast train and it starts at time 3. It starts at station (1,1) and moves right, visiting the stations along this row at times 3, 4, 5 and 6 respectively. It then returns, visiting the stations from right to left at times 6, 7, 8 and 9. It again moves right, now visiting the stations at times 9, 10, 11 and 12, and so on. Similarly, the train along column 3 is an ordinary train starting at time 2. So, starting at the station (1,3), it visits the three stations on column 3 at times 2, 4 and 6, returns back to the top of the column visiting them at times 6, 8 and 10, and so on.
Given a starting station, the starting time and a destination station, your task is to determine the earliest time at which one can reach the destination using these trains.
For example suppose we start at station (2,3) at time 8 and our aim is to reach the station (1,1). We may take the slow train of the second row at time 8 and reach (2,4) at time 11. It so happens that at time 11, the fast train on column 4 is at (2,4) travelling upwards, so we can take this fast train and reach (1,4) at time 12. Once again we are lucky and at time 12 the fast train on row 1 is at (1,4), so we can take this fast train and reach (1,1) at time 15. An alternative route would be to take the ordinary train on column 3 from (2,3) at time 8 and reach (1,3) at time 10. We then wait there till time 13 and take the fast train on row 1 going left, reaching (1,1) at time 15. You can verify that there is no way of reaching (1,1) earlier than that.
Test Data: You may assume that M, N ≤ 50.
Time Limit: 3 seconds
As the sizes of N and M are very small, we can try to solve it by recursion.
At every station, we take the two trains which can take us nearer to our destination. E.g., if we want to go to (1,1) from (2,3), we take the trains which bring us nearer to (1,1) and get off at the station nearest to our destination, keeping track of the time taken; whenever we reach the destination, if the time taken is less than the minimum so far, we update the minimum.
We can determine which station a train is at for a particular time using this method:

/* S is the starting time of the train and N is the number of stations it
   visits; T is the time at which we want to locate the train.
   T will always be greater than S. */
T = T - S + 1
Station(T) = T % N; if T % N == 0, then Station(T) = N
Here is my question:
How do we determine the earliest time when a particular train reaches the station we want in the direction we want?
As my above algorithm uses a greedy strategy, will it give a correct answer? If not, how do I approach this problem?
P.S.: This is not homework; it is an online judge problem.
I believe a greedy solution will fail here, though it would be a bit hard to construct a counter-example.
This problem is meant to be solved using Dijkstra's algorithm. Edges are the connections between adjacent stations, and their weights depend on the type of train and its starting time. You also don't need to compute the whole graph -- only compute the edges for the node you are currently considering. I have solved numerous similar problems and this is the way to solve them; I also tried greedy several times before I learnt it never passes.
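
A minimal sketch of that shape of Dijkstra, with all the timetable logic hidden behind a neighbors(node, t) callback (an illustrative name; it must yield (next_station, arrival_time) pairs for every train boardable at the station at or after time t):

import heapq

def earliest_arrival(start, start_time, goal, neighbors):
    # Dijkstra ordered by arrival time; stations can be (row, col)
    # tuples. Waiting at a station never hurts, so the first time we
    # pop the goal its arrival time is optimal.
    best = {start: start_time}
    heap = [(start_time, start)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == goal:
            return t
        if t > best.get(node, float("inf")):
            continue                      # stale queue entry
        for nxt, arrival in neighbors(node, t):
            if arrival < best.get(nxt, float("inf")):
                best[nxt] = arrival
                heapq.heappush(heap, (arrival, nxt))
    return None                           # goal unreachable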
Hope this helps.

What's the best way to store elapsed times in a database

I'm working on a horse racing application and need to store elapsed times from races in a table. I will be importing data from a comma-delimited file that provides the final time in one format and the interior elapsed times in another. The following is an example:
Final Time: 109.39 (1 minute, 9 seconds and 39/100th seconds)
Quarter Time: 2260 (22 seconds and 60/100th seconds)
Half Time: 4524 (45 seconds and 24/100th seconds)
Three Quarters: 5993 (59 seconds and 93/100th seconds)
I'll want the flexibility to easily do things like feet-per-second calculations and to convert elapsed times to splits. I'll also want to be able to easily display the times (elapsed or splits) in fifths of seconds or in hundredths.
Times in fifths: :223 :451 :564 1:091 (note the last digits are superscripts)
Times in hundredths: 22.60 :45.24 :56.93 1:09.39
Thanks in advance for your input.
Generally timespans are stored either as (1) seconds elapsed or (2) start/end datetimes. Seconds elapsed can be an integer, or a float/double if you require it. You could be creative/crazy and store all times as milliseconds, in which case you'd only need an integer.
If you are using PostgreSQL, you can use the interval datatype. Otherwise, any integer (int4, int8) or numeric type your database supports is OK. Of course, store values in a single unit of measure: seconds, minutes or milliseconds.
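
For example, a hypothetical table sketch showing both options (table and column names are mine):

-- Option 1: PostgreSQL interval columns.
CREATE TABLE race_times (
    race_id integer,
    half    interval,   -- e.g. INTERVAL '45.24 seconds'
    final   interval
);

-- Option 2: portable integers in a single unit (hundredths of a second).
CREATE TABLE race_times_hundredths (
    race_id integer,
    half    integer,    -- 4524 = 45.24 s
    final   integer     -- 6939 = 1:09.39
);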
It all depends on how you intend to use it, but the number of elapsed seconds (perhaps as a float if necessary) is certainly a favorite.
I think having 109.39 represent 1 min 9.39 sec is pretty silly. Unambiguous, sure, historical tradition maybe, but it's miserable to do computations with that format. (Not impossible, but fixing it during import sounds easy.)
I'd store time in a decimal format of some sort -- either an integer representing hundredths of a second, as all your other times are given, or a database-specific decimal-aware format.
Standard floating point representations might eventually lead you to wonder why a horse that ran two laps in 20.1 seconds each took 40.200035 seconds to run both laps combined.
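
For illustration, a small Python sketch of that import-time fix (function names and field handling are my own assumptions about the feed):

def final_time_to_hundredths(raw):
    # "109.39" means 1 minute, 9 seconds and 39/100 -> 6939 hundredths.
    minutes, rest = divmod(float(raw), 100)
    return round(minutes * 6000 + rest * 100)

def split_to_hundredths(raw):
    # "4524" means 45 seconds and 24/100 -> already in hundredths.
    return int(raw)

print(final_time_to_hundredths("109.39"))  # 6939
print(split_to_hundredths("4524"))         # 4524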
