MDP - techniques for generating transition probabilities - artificial-intelligence

I am working on an MDP car supply and demand problem, as follows, and was wondering whether there are any techniques to generate the transition probability matrix automatically rather than manually.
Assume the demand is as follows:
time, station1, station2
1000, 3, 1
1030, 3, 1
1100, 2, 3
Assume that for a car from station1, there is a 60% chance that the car will be dropped off at station1 and a 40% chance it will be dropped off at station2.
Assume that for a car from station2, there is an 80% chance that the car will be dropped off at station1 and a 20% chance it will be dropped off at station2.
I have manually calculated the following.
At time step 1,
P(car at station1 = 2,car at station2 = 8) = 0.0432
P(car at station1 = 3,car at station2 = 7) = 0.2016
P(car at station1 = 4,car at station2 = 6) = 0.1344
P(car at station1 = 5,car at station2 = 5) = 0.0896
P(car at station1 = 6,car at station2 = 4) = 0.0512
Hence, I would like to check whether anyone could provide insights on how to calculate the probabilities at time step 2 automatically, rather than computing them by hand.
Any advice is appreciated.

I'm not sure I understand your question.
In a stationary Markov process, the distribution of the state variable x_t (here, the station at which the car is located) at a given time t is a function only of the transition matrix P and the state at time t-1.
You can write
x_t = x_{t-1} * P for any t, which means that x_t = x_0 * P^t.
Knowing x_0 (the distribution of cars at the start, e.g. if the cars are evenly distributed between the two stations, x_0 = [0.5 0.5]) and using P = [ 0.6 0.4 ; 0.8 0.2 ], you then get the distribution of cars at any time t > 0 as x_t = x_0 * P^t.
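For example, a minimal MATLAB sketch of this calculation (the initial distribution x0 below is just an assumption for illustration):
P = [0.6 0.4; 0.8 0.2]; % transition matrix from the question
x0 = [0.5 0.5];         % assumed initial distribution of cars over the two stations
t = 2;                  % time step of interest
xt = x0 * P^t           % distribution of cars over the stations at time t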

Related

matlab complex for-loop correlation calculation

This is the script that I have. It works up to the ------ separation. Below that I do not get any error from Matlab, but I also do not get a return of bestDx or bestDy. Please help. (The first part is given just to put you in context.)
%%
% Variables after running script Read_eA3_file.m
%date_time_UTC
%reflectivity
%clutter_mask
%Convert units
dBZ = reflectivity * 0.375 - 30;
dBZ_Mask = clutter_mask * 0.375 - 30;
%Replace clutter values with NaN
weather = NaN(size(dBZ)); %initialise to constant
weather(dBZ>=dBZ_Mask) = dBZ(dBZ>=dBZ_Mask); %copy values when A >= B
%Reduce to range -- those are 384x384 arrays
dBZ_range = dBZ(:,:,1:16); %16:18 to 16:23 included
weather_range = weather(:,:,1:16); %16:18 to 16:23 included
weather1618 = weather(:,:,1); %16:18 map only
weather1623 = weather(:,:,16); %16:23 map only
% Plot maps
image(imrotate(-weather1618,90)); %of 16:18
image(imrotate(-weather1623,90)); %of 16:23
%Find x,y of strongest dBZ
%Since the values are all negative, I look for their minimum
[M,I] = min(weather1618(:)); %for 16:18
[I_row, I_col] = ind2sub(size(weather1618),I); %values are 255 and 143
[M2,I2] = min(weather1623(:)); %for 16:23
[I2_row, I2_col] = ind2sub(size(weather1623),I2); %values are 223 and 7
%Calc displacement
%I get a value of 139.7140
max_displ=sqrt((I2_row-I_row)^2+(I2_col-I_col)^2); %between 1618 and 1623
%%
% -----Section below does not work; ONLY RUN the section ABOVE---------
%% Find Dx Dy for max_corr between two maps
maxCoeff=0;
weather1618Modified = zeros(384,384); %create weather array for time range
%weather1618Modified(:) = {NaN}; % Matlab cannot mix cell & double
%%
for x = 1:384
    for y = 1:384
        %30 pixels approx.
        for Dx = -max_displ:30:max_displ
            for Dy = -max_displ:30:max_displ
                %Limit range of x+Dx and y+Dy to 1:384
                if x+Dx<1 | y+Dy<1 | x+Dx>384 | y+Dy>384
                    continue
                    %weather1618Modified is the forecasted weather1623
                    weather1618Modified(x+Dx,y+Dy) = weather1618(x,y)
                    %Find the best correlation; Is corrcoef the right formula?
                    newCoeff=corrcoef(weather1623,weather1618Modified);
                    if newCoeff>maxCoeff
                        maxCoeff=newCoeff;
                        bestDx=Dx;
                        bestDy=Dy;
                    end
                end
            end
        end
    end
end
%% Calc displacement
bestDispl = sqrt(bestDx^2+bestDy^2); %bestDispl for a 5 min frame
%Calc speed
speed = bestDispl/time;
You have to delete the continue statement after the first if (or place it somewhere else).
The continue statement makes the program skip the remaining part of the for-loop and go directly to the next iteration. Therefore bestDx and bestDy will never be set.
Documentation: https://se.mathworks.com/help/matlab/ref/continue.html
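For reference, a minimal sketch of one way to restructure the inner loop body so that the assignments become reachable (keeping the original variable names; note also that corrcoef returns a 2x2 matrix, so a single off-diagonal element is compared here):
if x+Dx<1 || y+Dy<1 || x+Dx>384 || y+Dy>384
    continue %skip out-of-range offsets only
end
weather1618Modified(x+Dx,y+Dy) = weather1618(x,y);
newCoeff = corrcoef(weather1623,weather1618Modified);
if newCoeff(1,2) > maxCoeff
    maxCoeff = newCoeff(1,2);
    bestDx = Dx;
    bestDy = Dy;
end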

Using an anonymous function rather than looping to sum up values in a cell array

I have a matrix called data. It contains 3 columns: company name, company value and company currency, a bit like below.
Name Value Currency
ABC 10 USD
MNO 5 JPY
PLM 3 USD
NJK 7 EUR
I need to sum the total value for each currency so my answer would look like below,
Currency Value
EUR 7
JPY 5
USD 13
I know I can do this using a loop, but is it possible using an anonymous function and, if so, how?
Update - extra information, as the original post lacked detail
Below is my solution, which works. However, I have seen people use cellfun or anonymous functions and feel like there is a more efficient way (and would like to see an alternative way) for problems of this nature:
val = cell2mat(data(:, 2)); % double - value
sedols = data(:, [1 3]); % cell - name (1st column) and currency (2nd column)
ccy = unique(sedols(:, 2));
fx_exp = zeros(length(ccy(:, 1)), 1);
for n = 1 : length(ccy(:, 1))
    index = strmatch(ccy(n, 1), sedols(:, 2));
    fx_exp(n, 1) = sum(val(index));
end
Using cellfun or arrayfun is not more efficient than a simple loop. To take advantage of vectorization you need to work with pure double arrays.
Assuming your data is stored in a cell array, unique combined with accumarray is the way to go:
data = {
'ABC' 10 'USD'
'MNO' 5 'JPY'
'PLM' 3 'USD'
'NJK' 7 'EUR' };
[a,b,subs] = unique(data(:,3))
vals = [data{:,2}];
currsum = accumarray(subs,vals)
out = [data(b,3) num2cell(currsum)]
out =
'EUR' [ 7]
'JPY' [ 5]
'USD' [13]
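For completeness, a sketch of the cellfun-with-anonymous-function route the question asks about; as noted above it is not faster than the loop, since it still iterates over the unique currencies internally:
vals = cell2mat(data(:,2)); % numeric values
ccy = unique(data(:,3)); % unique currencies, sorted
fx_exp = cellfun(@(c) sum(vals(strcmp(data(:,3), c))), ccy) % per-currency totals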

Select max value from a huge matrix of 30 years of daily data

Let's say I have daily data for a 30-year period in a matrix. To make it simple, just assume it has only 1 column and 10957 rows, indicating the days of the 30 years. The years start in 2010. I want to find the max value for every year, so that the output will be 1 column and 30 rows. Is there any automated way to program this in Matlab? Currently I am doing it manually, like this:
%for the first year
max(RAINFALL(1:365));
.
.
%for the 30th year
max(RAINFALL(10593:10957));
It is exhausting to do it manually, and I have quite a few similar data sets. I used the code below to calculate the mean and standard deviation for the 30 years. I tried modifying the code to work for my task above, but I couldn't succeed. I hope someone can modify the code or suggest a new way.
data = rand(32872,100); % replace with your data matrix
[nDays,nData] = size(data);
% let MATLAB construct the vector of dates and worry about things like leap
% year.
dayFirst = datenum(2010,1,1);
dayStamp = dayFirst:(dayFirst + nDays - 1);
dayVec = datevec(dayStamp);
year = dayVec(:,1);
uniqueYear = unique(year);
K = length(uniqueYear);
a = nan(1,K);
b = nan(1,K);
for k = 1:K
    % use logical indexing to pick out the year
    currentYear = year == uniqueYear(k);
    a(k) = mean2(data(currentYear,:));
    b(k) = std2(data(currentYear,:));
end
One possible approach:
Create a column containing the year of each data value, using datenum and datevec to take care of leap years.
Find the maximum for each year, with accumarray.
Code:
%// Example data:
RAINFALL = rand(10957,1); %// one column
start_year = 2010; %// data starts on January 1st of this year
%// Computations:
[year, ~] = datevec(datenum(start_year,1,1) + (0:size(RAINFALL,1)-1)); %// step 1
result = accumarray(year-start_year+1, RAINFALL, [], @max); %// step 2
As a bonus: if you change @max in step 2 to either @mean or @std, guess what you get... much simpler than your code.
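For instance, a minimal sketch of that bonus remark, reusing the year vector and single-column RAINFALL defined above:
subs = year - start_year + 1;                        %// same index vector as in step 2
yearly_mean = accumarray(subs, RAINFALL, [], @mean); %// per-year mean
yearly_std = accumarray(subs, RAINFALL, [], @std);   %// per-year standard deviation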
This may help you:
RAINFALL = rand(1,10957); % - Your data here
firstYear = 2010;
numberOfYears = 4;
cum = 0; % - cumulative factor
yearlyData = zeros(1,numberOfYears); % - this isn't really necessary
for i = 1 : numberOfYears
    yearLength = datenum(firstYear+i,1,1) - datenum(firstYear + i - 1,1,1);
    yearlyData(i) = max(RAINFALL(1 + cum : yearLength + cum));
    cum = cum + yearLength;
end

Saving mixed data cell array to ascii file in MATLAB

I get some data from an instrument that is formatted in a specific way. I need to load the data into MATLAB, manipulate some values, then save it back in the same format to load back into the instrument software for further analysis...
The issue I am having is that the data is of mixed value types and the values are kind of all over the place.
The file is tab delimited; I have added arrows, e.g. -->, to show the location of the tabs (like Notepad++ does):
Scan-42/01
Temperature [K] :--> 295.00
Time [s] :--> 60
"Linspace"
0.01--> 0.96
0.02--> 0.95
0.03--> 0.95
"Logspace"
0.01--> 0.96
0.02--> 0.95
0.04--> 0.94
The data keeps going down but I have cut it off after 3 rows.
The data I need to manipulate will be the Temperature, and some of the values under Linspace and Logspace.
I am currently importing the data like this:
filename = 'test.asc';
delimiter = '\t';
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
The data in MATLAB looks like this (screenshot not included).
Even setting up some kind of template in MATLAB where I could get the necessary values and then save them in exactly this format would work fine. The file must be saved as .asc, or the instrument will reject it.
Help is greatly appreciated.
Thanks
Hope this works for you.
Code
%%// Note: file1 is your input .asc filename and file2 is the output .asc.
%%// Please specify their names before running this.
%%// **** Read in file data ****
fid = fopen(file1,'r');
A = importdata(file1,'\n')
%%// Delimiters (mind these assumptions)
linlog_delim1 = '--> ';
temperature_delim1 = 'Temperature [K] :--> ';
sep1 = cellfun(@(x) isequal(x,''),A)
sep1 = [sep1 ;1]
sep_ind = find(sep1)
full_data = regexp(A,linlog_delim1,'split')
%%// Temperature value
temp_ind = find(~cellfun(@isempty,strfind(A,'Temperature [K] :-->')))
temp_val = str2num(cell2mat(full_data{temp_ind,:}(1,2)))
%%// Linspace values
sep_linspace = cellfun(@(x) isequal(x,'"Linspace"'),A)
lin_start_ind = find(sep_linspace)+1
lin_stop_ind = sep_ind(find(sep_ind>lin_start_ind,1,'first'))-1
linspace_data = vertcat(full_data{lin_start_ind:lin_stop_ind})
linspace_valid_ind = cellfun(@str2num,linspace_data(:,1))
linspace_valid_val = cellfun(@str2num,linspace_data(:,2))
%%// Logspace values
sep_logspace = cellfun(@(x) isequal(x,'"Logspace"'),A)
log_start_ind = find(sep_logspace)+1
log_stop_ind = sep_ind(find(sep_ind>log_start_ind,1,'first'))-1
logspace_data = vertcat(full_data{log_start_ind:log_stop_ind})
logspace_valid_ind = cellfun(@str2num,logspace_data(:,1))
logspace_valid_val = cellfun(@str2num,logspace_data(:,2))
%%// **** Let us modify some data ****
temp_val = temp_val + 10;
linspace_valid_val_mod1 = linspace_valid_val+[1 2 3]'; %%//'
logspace_valid_val_mod1 = logspace_valid_val+[1 20 300]'; %%//'
%%// **** Write back file data ****
%%// Write back temperature data
A(temp_ind) = {[temperature_delim1,num2str(temp_val)]}
%%// Write back linspace data
mod_lin_val = cellfun(@strtrim,cellstr(num2str(linspace_valid_val_mod1)),'uni',0)
mod_lin_ind = cellstr(num2str(linspace_valid_ind))
sep_lin = repmat({linlog_delim1},numel(mod_lin_val),1)
A(lin_start_ind:lin_stop_ind)=cellfun(@horzcat,mod_lin_ind,sep_lin,mod_lin_val,'uni',0)
%%// Write back logspace data
mod_log_val = cellfun(@strtrim,cellstr(num2str(logspace_valid_val_mod1)),'uni',0)
mod_log_ind = cellstr(num2str(logspace_valid_ind))
sep_log = repmat({linlog_delim1},numel(mod_log_val),1)
A(log_start_ind:log_stop_ind)=cellfun(@horzcat,mod_log_ind,sep_log,mod_log_val,'uni',0)
%%// Remove leading whitespaces
A = strtrim(A)
%%// Write the modified data
fid2 = fopen(file2,'w');
for row = 1:numel(A)
fprintf(fid2,'%s\n',A{row,:});
end
fclose(fid);
fclose(fid2);
Changes for the demo:
Temperature has 10 added.
"Linspace" has 1 2 and 3 added to it's elements respectively.
"Logspace" has 1 20 and 300 added to it's elements respectively.
Results
Before -
Scan-42/01
Temperature [K] :--> 295.00
Time [s] :--> 60
"Linspace"
0.01--> 0.96
0.02--> 0.95
0.103--> 0.95
"Logspace"
0.01--> 0.96
0.02--> 0.95
0.04--> 0.94
After -
Scan-42/01
Temperature [K] :--> 305
Time [s] :--> 60
"Linspace"
0.01--> 1.96
0.02--> 2.95
0.103--> 3.95
"Logspace"
0.01--> 1.96
0.02--> 20.95
0.04--> 300.94
Edit 1:
Code
%%// I-O filenames
input_filename = 'gistfile1.txt';
output_file = 'gistfile1_out.txt';
%%// Get data from input filename
delimiter = '\t';
formatSpec = '%s%s%[^\n\r]';
fid = fopen(input_filename,'r');
dataArray = textscan(fid, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
%%// Get data into A
A(:,1) = dataArray{1,1}
A(:,2) = dataArray{1,2}
%%// Find separator indices
ind1 = find([cellfun(@(x) isequal(x,''),A(:,2));1])
temperature_ind = find(~cellfun(@isempty,strfind(A,'Temperature')))
temperature_val = str2num(cell2mat(A(temperature_ind,2)))
%%// Linspace values
sep_linspace = cellfun(@(x) isequal(x,'"Linspace"'),A(:,1))
lin_start_ind = find(sep_linspace)+1
lin_stop_ind = ind1(find(ind1>lin_start_ind,1,'first'))-1
linspace_valid_ind = cellfun(@str2num,A(lin_start_ind:lin_stop_ind,1))
linspace_valid_val = cellfun(@str2num,A(lin_start_ind:lin_stop_ind,2))
%%// Logspace values
sep_logspace = cellfun(@(x) isequal(x,'"Logspace"'),A(:,1))
log_start_ind = find(sep_logspace)+1
log_stop_ind = ind1(find(ind1>log_start_ind,1,'first'))-1
logspace_valid_ind = cellfun(@str2num,A(log_start_ind:log_stop_ind,1))
logspace_valid_val = cellfun(@str2num,A(log_start_ind:log_stop_ind,2))
%%// **** Let us modify some data ****
temp_val_mod1 = temperature_val + 10;
linspace_valid_val_mod1 = linspace_valid_val+[1:numel(linspace_valid_val)]';
logspace_valid_val_mod1 = logspace_valid_val+10.*[1:numel(logspace_valid_val)]';
%%// **** Write back file data into A ****
A(temperature_ind,2) = cellstr(num2str(temp_val_mod1))
A(lin_start_ind:lin_stop_ind,2) = cellstr(num2str(linspace_valid_val_mod1))
A(log_start_ind:log_stop_ind,2) = cellstr(num2str(logspace_valid_val_mod1))
%%// Write the modified data
fid2 = fopen(output_file,'w');
for row = 1:size(A,1)
fprintf(fid2,'%s\t%s\n',A{row,1},A{row,2});
end
%%// Close files
fclose(fid);
fclose(fid2);
Results
Before -
Scan-42/01
Temperature [K] : 295.00
Time [s] : 60
"Linspace"
0.01 0.96
0.02 0.95
0.03 0.95
"Logspace"
0.01 0.96
0.02 0.95
0.04 0.94
After -
Scan-42/01
Temperature [K] : 305
Time [s] : 60
"Linspace"
0.01 1.96
0.02 2.95
0.03 3.95
"Logspace"
0.01 10.96
0.02 20.95
0.04 30.94
Please note that the only formatting difference between the input and output files is that there is no blank row between "Linspace" and the previous row in the output file, as there was in the input file. The same applies to "Logspace".
I've solved a nearly identical problem once before. The solution goes something like this:
First, you're already splitting your data up into chunks, so that's good. Judging by your comment, it seems that the data is consistently formatted from file to file, but inconsistently formatted in each individual file. That's fine.
What you need to do is iterate through dataArray and find each unique label (such as "Linspace"), tracking that label's index. What you'll end up with is a vector of indices that tells you exactly where in dataArray these labels appear. Once you have all of the labels' indices, you need to look at dataArray and see how the data between each label is formatted. Then you'll write some code to break dataArray into sub-arrays. You'll need to write a different sub-array parser for each format.
I know that's a little abstract, so let me try to give you an example.
timeIndex = find(strcmp(dataArray, 'Time'), 1);
linspaceIndex = find(strcmp(dataArray, '"linSpace"'), 1);
logspaceIndex = find(strcmp(dataArray, '"logSpace"'), 1);
linSpaceData = dataArray(linspaceIndex+3:logspaceIndex-1); % This is the "sub-array" I was referring to. It's a little piece of dataArray that contains only the linspace data values.
This is just an example and will probably not plug-and-play; it's just meant to be a thought-provoker. Note the +3 and -1: those were just guessed. You'll have to determine them empirically for each range, as things like tabs, colons, and spaces can get in the way. That should be enough to get you started on your problem. Let me know if you need clarification, or if this isn't helpful. Good luck!
-Fletch
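To make the idea above a little more concrete, here is a rough sketch along the same lines. It assumes the two textscan columns end up in dataArray{1} and dataArray{2}, that the block labels appear literally as "Linspace" and "Logspace" in the first column, and that the +0.5 manipulation is purely illustrative:
col1 = dataArray{1};                                % first tab-delimited column (labels / x values)
col2 = dataArray{2};                                % second tab-delimited column (y values)
linIdx = find(strcmp(col1, '"Linspace"'), 1);       % row of the "Linspace" label
logIdx = find(strcmp(col1, '"Logspace"'), 1);       % row of the "Logspace" label
linRows = (linIdx+1 : logIdx-1).';                  % rows between the two labels
linVals = str2double(col2(linRows));                % numeric y values of the Linspace block
keep = ~isnan(linVals);                             % drop blank or non-numeric rows, if any
linRows = linRows(keep);
linVals = linVals(keep) + 0.5;                      % example manipulation of the values
col2(linRows) = strtrim(cellstr(num2str(linVals))); % write the modified values back as strings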

how to calculate rolling volatility

I am trying to design a function that will calculate 30 day rolling volatility.
I have a file with 3 columns: date, and daily returns for 2 stocks.
How can I do this? I am having a problem summing the first 30 entries to get my vol.
Edit:
So it will read an excel file, with 3 columns: a date, and daily returns.
daily.ret = read.csv("abc.csv")
e.g. date stock1 stock2
01/01/2000 0.01 0.02
etc etc, with years of data. I want to calculate rolling 30 day annualised vol.
This is my function:
calc_30day_vol = function()
{
    stock1 = abc$stock1^2
    stock2 = abc$stock1^2
    j = 30
    approx_days_in_year = length(abc$stock1)/10
    vol_1 = 1:length(a1)
    vol_2 = 1:length(a2)
    for (i in 1:length(a1))
    {
        vol_1[j] = sqrt((approx_days_in_year / 30) * rowSums(a1[i:j]))
        vol_2[j] = sqrt((approx_days_in_year / 30) * rowSums(a2[i:j]))
        j = j + 1
    }
}
So stock1 and stock2 are the squared daily returns from the excel file, needed to calculate vol. Entries 1-30 for vol_1 and vol_2 are empty since we are calculating 30 day vol. I am trying to use the rowSums function to sum the squared daily returns for the first 30 entries, and then move the index down by one on each iteration.
So day 1-30, day 2-31, day 3-32, etc., which is why I have defined "j".
I'm new at R, so apologies if this sounds rather silly.
This should get you started.
First, I have to create some data that looks like what you describe:
library(quantmod)
getSymbols(c("SPY", "DIA"), src='yahoo')
m <- merge(ROC(Ad(SPY)), ROC(Ad(DIA)), all=FALSE)[-1, ]
dat <- data.frame(date=format(index(m), "%m/%d/%Y"), coredata(m))
tmpfile <- tempfile()
write.csv(dat, file=tmpfile, row.names=FALSE)
Now I have a csv with data in your very specific format.
Use read.zoo to read the csv and then convert to an xts object (there are lots of ways to read data into R; see R Data Import/Export):
r <- as.xts(read.zoo(tmpfile, sep=",", header=TRUE, format="%m/%d/%Y"))
# each column of r has daily log returns for a stock price series
# use `apply` to apply a function to each column.
vols.mat <- apply(r, 2, function(x) {
  #use rolling 30 day window to calculate standard deviation.
  #annualize by multiplying by square root of time
  runSD(x, n=30) * sqrt(252)
})
#`apply` returns a `matrix`; `reclass` to `xts`
vols.xts <- reclass(vols.mat, r) #class as `xts` using attributes of `r`
tail(vols.xts)
# SPY.Adjusted DIA.Adjusted
#2012-06-22 0.1775730 0.1608266
#2012-06-25 0.1832145 0.1640912
#2012-06-26 0.1813581 0.1621459
#2012-06-27 0.1825636 0.1629997
#2012-06-28 0.1824120 0.1630481
#2012-06-29 0.1898351 0.1689990
#Clean-up
unlink(tmpfile)
