Exporting ARIMA-regression forecasts - export

I'll be grateful if you can answer this. I fit a regression-ARIMA model as
'm1<-arima(y1,order=c(0,1,1),xreg=x1)'
Standard command for ARIMA with regressors.
Then I use it to predict ahead:
'f1<-predict(m1,newxreg=x2,n.ahead=5)'
standard stuff
The problem appears next, where I get this output (focusing on the part that matters):
[126] 0.160095040 0.350650026 0.281973859 -0.161599717 0.305046264
$se
Time Series:
Start = 131
End = 135
Frequency = 1
[1] 0.2268524 0.2283935 0.2299241 0.2314447 0.2329553
I'd overlook that it repeats the entire series, which I'm not interested in; my problem is that I can't export the predictions! When I submit
'write.csv(f1,"out.csv")'
I get
Error in data.frame(pred = c(0.303546011854172, 0.240057038087077, 0.125706531559436, : arguments imply differing number of rows: 130, 5
As for the forecasts, the vector f1 appears to have length 2, with f1[1] being the 130 original observations and f1[2] being the 5 forecasts! (Instead of f1[1] being the forecast for the 1st period ahead, f1[2] the forecast for the 2nd period, etc.)
Any ideas about what's going on and what I can do about it?
And to intrigue you even more, the problem only appears when I use regressors! Everything works well if I only have ARIMA!
Any help is much appreciated!

What am I doing wrong with this AI?

I am creating a very naive AI (it maybe shouldn't even be called an AI, as it just tests out a lot of possibilities and picks the best one) for a board game I am making. This is to reduce the amount of manual testing I will need to do to balance the game.
The AI is playing alone, doing the following things: in each turn, the AI, playing with one of the heroes, attacks one of the max 9 monsters on the battlefield. His goal is to finish the battle as fast as possible (in the fewest turns) and with the fewest monster activations.
To achieve this, I've implemented a think-ahead algorithm for the AI, where instead of performing the best possible move at the moment, he selects a move based on the possible outcome of future moves of other heroes. This is the code snippet where he does this; it is written in PHP:
/** Perform think ahead moves
 *
 * @param int $thinkAheadLeft (the number of think ahead moves left)
 * @param int $innerIterator (the iterator for the move)
 * @param array $performedMoves (the moves performed so far)
 * @param Battlefield $originalBattlefield (the previous state of the Battlefield)
 */
public function performThinkAheadMoves($thinkAheadLeft, $innerIterator, $performedMoves, $originalBattlefield, $tabs) {
    if ($thinkAheadLeft == 0) return $this->quantify($originalBattlefield);
    $nextThinkAhead = $thinkAheadLeft-1;
    $moves = $this->getPossibleHeroMoves($innerIterator, $performedMoves);
    $Hero = $this->getHero($innerIterator);
    $innerIterator++;
    $nextInnerIterator = $innerIterator;
    foreach ($moves as $moveid => $move) {
        $performedUpFar = $performedMoves;
        $performedUpFar[] = $move;
        $attack = $Hero->getAttack($move['attackid']);
        $monsters = array();
        foreach ($move['targets'] as $monsterid) $monsters[] = $originalBattlefield->getMonster($monsterid)->getName();
        if (self::$debug) echo $tabs . "Testing sub move of " . $Hero->Name. ": $moveid of " . count($moves) . " (Think Ahead: $thinkAheadLeft | InnerIterator: $innerIterator)\n";
        $moves[$moveid]['battlefield']['after']->performMove($move);
        if (!$moves[$moveid]['battlefield']['after']->isBattleFinished()) {
            if ($innerIterator == count($this->Heroes)) {
                $moves[$moveid]['battlefield']['after']->performCleanup();
                $nextInnerIterator = 0;
            }
            $moves[$moveid]['quantify'] = $moves[$moveid]['battlefield']['after']->performThinkAheadMoves($nextThinkAhead, $nextInnerIterator, $performedUpFar, $originalBattlefield, $tabs."\t", $numberOfCombinations);
        } else $moves[$moveid]['quantify'] = $moves[$moveid]['battlefield']['after']->quantify($originalBattlefield);
    }
    usort($moves, function($a, $b) {
        if ($a['quantify'] === $b['quantify']) return 0;
        else return ($a['quantify'] > $b['quantify']) ? -1 : 1;
    });
    return $moves[0]['quantify'];
}
What this does is recursively check future moves until $thinkAheadLeft reaches 0, OR until a solution is found (i.e., all monsters are defeated). When it reaches its exit condition, it scores the state of the battlefield compared to the $originalBattlefield (the battlefield state before the first move). The calculation is made in the following way:
/** Quantify the current state of the battlefield
 *
 * @param Battlefield $originalBattlefield (the original battlefield)
 *
 * @return int (returns an integer with the battlefield quantification)
 */
public function quantify(Battlefield $originalBattlefield) {
    $points = 0;
    foreach ($originalBattlefield->Monsters as $originalMonsterId => $OriginalMonster) {
        $CurrentMonster = $this->getMonster($originalMonsterId);
        $monsterActivated = $CurrentMonster->getActivations() - $OriginalMonster->getActivations();
        $points+=$monsterActivated*($this->quantifications['activations'] + $this->quantifications['activationsPenalty']);
        if ($CurrentMonster->isDead()) $points+=$this->quantifications['monsterKilled']*$CurrentMonster->Priority;
        else {
            $enragePenalty = floor($this->quantifications['activations'] * (($CurrentMonster->Enrage['max'] - $CurrentMonster->Enrage['left'])/$CurrentMonster->Enrage['max']));
            $points+=($OriginalMonster->Health['left'] - $CurrentMonster->Health['left']) * $this->quantifications['health'];
            $points+=(($CurrentMonster->Enrage['max'] - $CurrentMonster->Enrage['left']))*$enragePenalty;
        }
    }
    return $points;
}
When quantifying, some things add points to the state and some subtract points. Instead of using the points calculated after his current move to decide which move to take, the AI uses the points calculated after the think-ahead portion, and so selects a move based on the possible moves of the other heroes.
Basically, the AI is saying that attacking Monster 1 isn't the best option at the moment, but IF the other heroes perform this-and-this action, it will be the best outcome in the long run.
After selecting a move, the AI performs a single move with the hero, and then repeats the process for the next hero, calculating with +1 moves.
ISSUE: I was presuming that an AI that 'thinks ahead' 3-4 moves should find a better solution than an AI that only performs the best possible move at the moment. But my test cases show otherwise: in some cases, an AI that is not using the think-ahead option (i.e., it only plays the best possible move at the moment) beats an AI that thinks ahead a single move. Sometimes an AI that thinks ahead only 3 moves beats an AI that thinks ahead 4 or 5 moves. Why is this happening? Is my presumption incorrect? If so, why? Am I using the wrong numbers for weights? I investigated this and ran a test to calculate the weights automatically, by testing an interval of possible weights and keeping the ones with the best outcome (i.e., the ones that yield the fewest turns and/or the fewest activations), yet the problem described above persists with those weights as well.
I am limited to a 5 move think ahead with the current version of my script, as with any larger think ahead number, the script gets REALLY slow (with 5 think ahead, it finds a solution in roughly 4 minutes, but with 6 think ahead, it didn't even find the first possible move in 6 hours)
HOW THE FIGHT WORKS: A number of heroes (2-4) controlled by the AI, each having a number of different attacks (1-x) that can be used once or multiple times in a combat, attack a number of monsters (1-9). Based on the values of the attack, the monsters lose health until they die. After each attack, the attacked monster gets enraged if he didn't die, and after every hero has performed a move, all monsters get enraged. When the monsters reach their enrage limit, they activate.
DISCLAIMER: I know that PHP is not the language to use for this kind of operation, but as this is only an in-house project, I've preferred to sacrifice speed, to be able to code this as fast as possible, in my native programming language.
UPDATE: The quantifications that we currently use look something like this:
$Battlefield->setQuantification(array(
    'health' => 16,
    'monsterKilled' => 86,
    'activations' => -46,
    'activationsPenalty' => -10
));
If there is randomness in your game, then anything can happen. Pointing that out since it's just not clear from the materials you have posted here.
If there is no randomness and the actors can see the full state of the game, then a longer look-ahead absolutely should perform better. When it does not, it is a clear indication that your evaluation function is providing incorrect estimates of the value of a state.
In looking at your code, the values of your quantifications are not listed and in your simulation it looks like you just have the same player make moves repeatedly without considering the possible actions of the other actors. You need to run a full simulation, step by step in order to produce accurate future states and you need to look at the value estimates of the varying states to see if you agree with them, and make adjustments to your quantifications accordingly.
An alternative way to frame the problem of estimating value is to explicitly predict your chances of winning the round as a percentage on a scale of 0.0 to 1.0 and then choose the move that gives you the highest chance of winning. Calculating the damage done and number of monsters killed so far doesn't tell you much about how much you have left to do in order to win the game.
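To make the point about step-by-step simulation concrete, here is a minimal sketch of a depth-limited look-ahead in Python. It is not a port of the PHP above: the toy state (a tuple of monster health values), the fixed DAMAGE value and every function name are assumptions made up for illustration. The part that matters is the shape: each simulated move produces a new state, and the evaluation function is only consulted at the leaves, so a deeper search is only as good as that evaluation.

```python
from typing import List, Tuple

State = Tuple[int, ...]  # toy stand-in for a battlefield: remaining health per monster
DAMAGE = 2               # hypothetical fixed damage dealt by one attack


def legal_moves(state: State) -> List[int]:
    """Indices of monsters that are still alive and can be attacked."""
    return [i for i, hp in enumerate(state) if hp > 0]


def apply_move(state: State, target: int) -> State:
    """Return a NEW state with the target damaged; the input is not mutated."""
    return tuple(max(hp - DAMAGE, 0) if i == target else hp
                 for i, hp in enumerate(state))


def is_finished(state: State) -> bool:
    """The battle ends when every monster is at zero health."""
    return all(hp == 0 for hp in state)


def evaluate(state: State) -> int:
    """Toy evaluation: less total remaining health is better."""
    return -sum(state)


def lookahead_value(state: State, depth: int) -> int:
    """Best evaluation reachable from `state` within `depth` further moves."""
    if depth == 0 or is_finished(state):
        return evaluate(state)
    return max(lookahead_value(apply_move(state, m), depth - 1)
               for m in legal_moves(state))


def best_move(state: State, depth: int) -> int:
    """Pick the move whose subtree has the best look-ahead value."""
    return max(legal_moves(state),
               key=lambda m: lookahead_value(apply_move(state, m), depth - 1))


if __name__ == "__main__":
    start = (3, 1)
    print(best_move(start, 1))        # greedy choice
    print(lookahead_value(start, 3))  # value after looking three moves ahead
```

Printing the value of each candidate state at each depth, as suggested above, is then a matter of logging inside lookahead_value and checking whether you agree with the numbers.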

(C) - How would one compare 2 txt files REQUESTS.txt and AVAILABLE.txt, separating each str read into a (STR6, STR3, STR3, INT) formatted Structure?

I have been working on this program for over a week with no breakthrough. The question states the following:
A disc file ‘REQUESTS.TXT’ contains airline flight data formatted
(STR6, STR3, STR3, INT).
Example:
AA1011SFxLAx34 (American Airlines 1010, SF to LA, 34 seats)
W0924DNVDFW101 (Western 0924, DNV to DFW, 101 seats)
Another file ‘AVAILABL.TXT’ contains an unspecified number of reservation request records formatted identically as described above except the Seats Available field is a Seats Requested field.
Guidelines:
Read reservation flights and process requests. If the request can be fulfilled (i.e., it is in AVAILABL and REQUESTS) then print "Reservation Processed", otherwise print "Reservation Denied".
Print out flight data file before and after reservations are processed, ordered by flight ID in a four(4) column format.
Print an overall outcome report for all processed. (Present totals for the number of requests satisfied and denied)
I have tried a few different approaches. I tried to split up the first STR6 by isalpha/isdigit and combine the parts to make the FlightID (AA + 1011). I then tried to split up the remaining characters between STR3 and STR3 via isalpha and a for loop. Lastly, I tried to take the last 3+ digits for the number of seats during each for loop iteration and multiply the first digit by 100 (for a 3-digit value) or 10 (for a 2-digit value), adding it to a running total for availSeats (INT). This, at least I thought so, would produce a
AA+1011 = AA1011(STR6) // W+0924 = W0924(STR6)
SFx(STR3) // DNV(STR3)
LAx(STR3) // DFW(STR3)
(3*10)+(4*1) = 34(INT) // (1*100)+(0*10)+(1*1) = 101(INT)
All of this stored within a Struct Array.
i.e...
FlightData Flight; ............................................FlightData Flight;
Flight[0].flightID = AA1011; .........................Flight[1].flightID = W0924;
Flight[0].fromCity = SFx; ...............................Flight[1].fromCity = DNV;
Flight[0].toCity = LAx; ..................................Flight[1].toCity = DFW;
Flight[0].seatsAvail = 34; .............................Flight[1].seatsAvail = 101;
I am really at a loss right now and have no other way to progress other than searching for different techniques/methods to make this work. I am clearly a beginner and will continue to practice and progress in C, but if anyone could give me a push in the right direction on how to read this from a .txt file into a Struct, that would be amazing. Also, if anyone has another method they used to solve this problem, I would love to analyze it. Thanks!
(This is my first post. I spent a lot of time formatting it to be clear on Stackoverflow, so if I messed up in some areas, constructive criticism would be useful! This applies to my posting and my coding practices. Thanks again!)
EDIT: The question I am asking here is how to successfully take a string such as AA1011SFxLAx34 and turn it into a Structure like the above diagram. It must also work for the second string W0924DNVDFW101, which has only 1 char in its ID (rather than two as in AA1011). I'm not sure what else I am supposed to edit after reading the guidelines.
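One way to look at the two example strings is to split them from the right instead of the left, since the flight ID is the only part whose length changes. The sketch below is in Python rather than C and only illustrates the idea; FlightRecord and parse_record are made-up names, and it assumes the two city codes are always exactly 3 characters and never end in a digit. The same index arithmetic carries over to C with strlen and character indexing.

```python
from typing import NamedTuple


class FlightRecord(NamedTuple):
    """Hypothetical record mirroring the (STR6, STR3, STR3, INT) layout."""
    flight_id: str
    from_city: str
    to_city: str
    seats: int


def parse_record(line: str) -> FlightRecord:
    """Split one record by working backwards from the end of the line."""
    text = line.strip()

    # 1. The seat count is the run of digits at the very end.
    split_at = len(text)
    while split_at > 0 and text[split_at - 1].isdigit():
        split_at -= 1
    body, seats_text = text[:split_at], text[split_at:]

    # 2. Need at least one ID character plus two 3-character city codes.
    if not seats_text or len(body) < 7:
        raise ValueError(f"malformed record: {line!r}")

    # 3. The last 6 characters of the body are the two city codes;
    #    whatever is left in front is the flight ID (5 or 6 characters here).
    return FlightRecord(
        flight_id=body[:-6],
        from_city=body[-6:-3],
        to_city=body[-3:],
        seats=int(seats_text),
    )


if __name__ == "__main__":
    for raw in ("AA1011SFxLAx34", "W0924DNVDFW101"):
        print(parse_record(raw))
```

Working from the right avoids having to decide where the letters of the flight ID stop and its digits start.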
I consider this a homework question, so I answer according to
How do I ask and answer homework questions?
Find a tutorial on C, work through it.
Then take a HelloWorld, modify it in small steps to approach your goal in steps from working program to working program. This way you should at least get to being able to read text from a file and print it.
Then learn to store parts of what you print into basic variables.
Then learn about structures.
And so on.
This way you will get quite close to the solution.
If it is not completely what you need, show the code you have at that point here and ask a specific question about the first problem, explaining what you suspect the problem to be. Show code which has exactly that one problem, makes it visible, and has no other warnings (using at least e.g. gcc -Wall mycode).
Fix with the help of the comments/answers you receive, repeat.

TensorFlow learn.Estimator : is it naive to call fit() many times? Because I get ResourceExhaustedError

I am learning machine learning using TensorFlow. I have been through a couple of tutorials but I still have a hard time finding good ways of training a model. Recently I implemented a CNN model I found in the literature. The model must take a crop of a certain size centered on a given pixel and predict the label of this pixel. It does that for each pixel of the image. I used:
classifier = tf.learn.Estimator(model_fn=cnn_model_fn, model_dir="./cnn")
with cnn_model_fn being a function I implemented.
For each training image, we take 3000 crops randomly, so I can't load all these images and their crops into memory. The way I found is to load one image at a time, extract the 3000 crops, and then call classifier.fit() to train on the 3000 crops. Then loop over each image in my dataset.
for i in range(len(filenames)):
    ...
    image = misc.imread(filenames[i])
    labels = misc.imread(groundTruth[i])  # labels for each pixel
    input_classifier = preprocess(image,...)  # crops 3000 images in image and do other things
    input_labels = preprocess_labels(labels, ...)  # take the corresponding 3000 labels
    classifier.fit(x = input_classifier,
                   y = input_labels,
                   batch_size = 30,
                   steps = 100)
It worked fine for 100 images, but if I try on the whole dataset (2000 images), it always stops and gives a ResourceExhausted error.
...
[everything goes well]
...
iteration :227/2000
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating
TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus
id: 0000:01:00.0)
INFO:tensorflow:Create CheckpointSaverHook.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating
TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus
id: 0000:01:00.0)
Traceback (most recent call last):
File "train-cnn.py", line 78, in <module>
classifier.fit(x= input_classifier, y=input_labels,batch_size=30, steps=100)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
...
...
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: cnn/graph.pbtxt.tmp32bcc6311c164c29b91177d17d05d669
I don't see why it gets OOM... I suspect that it is because of the way I call fit() in a loop. After each fit(), a ckpt is saved and it must be restored right after to train on the next image. So is it a bad way to train a model?
Running estimator.fit in a loop with a small number of steps is not a good idea. I would put all the input logic into an input_fn, then run estimator.fit only once with more steps.
An example of reading data from different files can be found here: tf.contrib.learn.read_batch_examples
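As a rough illustration of what "putting the input logic in one place" can look like, here is a framework-free Python sketch of a generator that walks over all images and yields one batch of crops at a time, so only one image's crops are in memory at once. The names crop_batches and load_crops are hypothetical, and how such a generator is wired into an input_fn depends on the TensorFlow version, so that part is deliberately left out.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Batch = Tuple[List, List]


def crop_batches(
    filenames: Sequence[str],
    load_crops: Callable[[str], Tuple[List, List]],
    batch_size: int,
) -> Iterator[Batch]:
    """Yield (crops, labels) batches, loading a single image at a time."""
    for name in filenames:
        # load_crops stands in for reading the image plus preprocess()
        # and preprocess_labels() from the question.
        crops, labels = load_crops(name)
        for start in range(0, len(crops), batch_size):
            yield crops[start:start + batch_size], labels[start:start + batch_size]


if __name__ == "__main__":
    def fake_loader(name: str) -> Tuple[List, List]:
        # Stand-in data: 7 "crops" per image, each labelled with its index.
        return [f"{name}-crop{i}" for i in range(7)], list(range(7))

    for crops, labels in crop_batches(["a.png", "b.png"], fake_loader, batch_size=3):
        print(len(crops), labels)
```

With the data exposed as one stream like this, the training call is made once over the whole dataset instead of once per image.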

Increment by 5.005, sometimes 207, sometimes 196?

This problem deals with HLS playlists, and it may help to understand how HLS works before diving in.
HLS (HTTP Live Streaming) is built around a playlist, similar to an iTunes .m3u playlist. HLS takes a video file (such as an .mpg) and splits it into multiple, equal-length segment files (.ts, which stands for Transport Stream). For simplicity, you can think of these segment files as chunks of the original .mpg file, which are to be played consecutively.
There are many ways to name these segment files. Sometimes you have…
file_segment_0.ts
file_segment_1.ts
file_segment_2.ts
But sometimes you have something like,
22/51/04.ts
22/51/09.ts
22/51/14.ts
(H/M/S)
The client (such as VLC) knows how to handle these files. It’s up to the producer to decide how they want to name their files.
An HLS playlist can also be “VOD” (Video On-Demand) or “Live”. If the playlist is “Live”, the client will jump to the current live time. Inside the playlist, a header will define the program’s (in terms of the streamed event) start datetime, like so:
#EXT-X-PROGRAM-DATE-TIME:2016-09-16T21:59:09+00:00
The playlist will also tell the client how far apart the segmented files are, in terms of seconds.
My issue is with the H/M/S format. You can find an example playlist here: http://pastebin.com/raw/rS84YJwN
The segments are 5.005s apart, as defined by #EXTINF:5.005.
At first glance, it doesn’t look so bad. Start at the EXT-X-PROGRAM-DATE-TIME, increment by 5.005, round accordingly, and format the date as H/M/S.ts
But there’s a bigger question: why is the gap between two irregular steps (segments whose file name is not exactly 5 seconds after the previous one) sometimes 207 segments and sometimes 196 segments?
My math tells me that I will increment by 6 (instead of 5) every 200 segments, which I can confirm with some quick and dirty ruby code[0], which produces this output:
139 22/14/40.ts
339 22/31/21.ts
539 22/48/02.ts
746 23/05/18.ts
946 23/21/59.ts
1146 23/38/40.ts
1346 23/55/21.ts
1402 00/00/01.ts
1542 00/11/42.ts
1742 00/28/23.ts
1942 00/45/04.ts
2142 01/01/45.ts
2342 01/18/26.ts
2542 01/35/07.ts
2742 01/51/48.ts
2942 02/08/29.ts
3142 02/25/10.ts
3342 02/41/51.ts
3542 02/58/32.ts
3749 03/15/48.ts
3949 03/32/29.ts
4149 03/49/10.ts
Where 139 is a line number, and 22/14/40.ts is the segment file.
My question is this: What’s going on here, and how can I reproduce it accurately? I obviously don’t/won’t have access to the actual input video file, and I need to rebuild these playlist files.
[0]
require 'date'
file = `curl 'http://pastebin.com/raw/rS84YJwN'`
start_date = DateTime.parse(file.scan(%r{#EXT-X-PROGRAM-DATE-TIME:(.*)$}).to_a.first.first)
lines = file.split("\n").select { |line| !line.index('.ts').nil? }
date_lines = []
lines.each_with_index do |line, i|
  str = line.gsub("/", ':').split(".ts")[0]
  date = "#{start_date.to_date} #{str}"
  date_obj = DateTime.parse(date)
  date_lines << date_obj
  next unless i > 0
  diff = date_obj.to_time - date_lines[i - 1].to_time
  if diff != 5.0
    puts "#{i} #{line}"
  end
end
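The "6 instead of 5 every 200 segments" arithmetic can be checked in isolation with a few lines of Python. This sketch assumes the simplest possible model: the stream starts exactly on a whole second, every segment is exactly 5.005 s long, and file names use the truncated whole second. The function name irregular_steps is made up. Durations are kept in integer milliseconds so floating-point error cannot move a boundary.

```python
from typing import List


def irregular_steps(segment_ms: int, count: int) -> List[int]:
    """Return the 1-based segment numbers whose whole-second timestamp
    advances by something other than the usual whole-second step."""
    usual_step = segment_ms // 1000  # 5 for a 5.005 s segment
    irregular = []
    for n in range(1, count + 1):
        # Whole seconds elapsed at the start of segment n and of segment n-1.
        current = (n * segment_ms) // 1000
        previous = ((n - 1) * segment_ms) // 1000
        if current - previous != usual_step:
            irregular.append(n)
    return irregular


if __name__ == "__main__":
    print(irregular_steps(5005, 1000))  # [200, 400, 600, 800, 1000]
```

Under these assumptions the extra second always lands exactly 200 segments apart, which matches most of the listing above but not the 207 and 196 gaps, so this model alone does not reproduce the playlist.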

FSharpChart with Windows.Forms very slow for many points

I use code like the example below to do basic plotting of a list of values from F# Interactive. When plotting more points, the time taken to display increases dramatically. In the examples below, 10^4 points display in 4 seconds whereas 4×10^4 points take a patience-testing 53 seconds to display. Overall it's roughly as if the time to plot N points grows as N^2.
The result is that I'll probably add an interpolation layer in front of this code, but
1) I wonder if someone who knows the workings of FSharpChart and Windows.Forms could explain what is causing this behaviour? (The data is bounded so one thing that seems to rule out is the display needing to adjust scale.)
2) Is there a simple remedy other than interpolating the data myself?
let plotl (f:float list) =
    let chart = FSharpChart.Line(f, Name = "")
                |> FSharpChart.WithSeries.Style(Color = System.Drawing.Color.Red, BorderWidth = 2)
    let form = new Form(Visible = true, TopMost = true, Width = 700, Height = 500)
    let ctl = new ChartControl(chart, Dock = DockStyle.Fill)
    form.Controls.Add(ctl)
let z1 = [for i in 1 .. 10000 do yield sin(float(i * i))]
let z2 = [for i in 1 .. 20000 do yield sin(float(i * i))]
plotl z1
plotl z2
First of all, FSharpChart is a name used in an older version of the library. The latest version is called F# Charting, comes with new documentation and uses just Chart.
To answer your question, Chart.Line and Chart.Points are quite slow for a large number of points. The library also has Chart.FastLine and Chart.FastPoints (which do not support as many features, but are faster). So, try getting the latest version of F# Charting and using the "Fast" version of the method.
