Solving multi-armed bandit problems with continuous action space

My problem has a single state and an infinite number of actions on the interval (0, 1). After quite some time of googling I found a few papers about an algorithm called the zooming algorithm, which can solve problems with a continuous action space. However, my implementation is bad at exploiting, so I'm thinking about adding epsilon-greedy-style behavior.
Is it reasonable to combine different methods?
Do you know other approaches to my problem?
Code samples:
import math

import numpy as np
import portion as P

def choose_action(self, i_ph):
    # Activation rule: find the part of [0, 1] not covered by any arm's confidence ball
    not_covered = P.closed(lower=0, upper=1)
    for arm in self.active_arms:
        confidence_radius = calc_confidence_radius(i_ph, arm)
        confidence_interval = P.closed(arm.norm_value - confidence_radius, arm.norm_value + confidence_radius)
        not_covered = not_covered - confidence_interval
    if not_covered != P.empty():
        # Draw a point uniformly from the uncovered region: one candidate per
        # interval, then pick an interval with probability proportional to its length
        rans = []
        height = 0
        heights = []
        for i in not_covered:
            rans.append(np.random.uniform(i.lower, i.upper))
            height += i.upper - i.lower
            heights.append(i.upper - i.lower)
        ran_n = np.random.uniform(0, height)
        j = 0
        ran = 0
        for i in range(len(heights)):
            if j < ran_n < j + heights[i]:
                ran = rans[i]
            j += heights[i]
        self.active_arms.append(Arm(len(self.active_arms), ran * (self.sigma_square - lower) + lower, ran))
    # Selection rule
    max_index = float('-inf')
    max_index_arm = None
    for arm in self.active_arms:
        confidence_radius = calc_confidence_radius(i_ph, arm)
        # index function from zooming algorithm
        index = arm.avg_learning_reward + 2 * confidence_radius
        if index > max_index:
            max_index = index
            max_index_arm = arm
    action = max_index_arm.value
    self.current_arm = max_index_arm
    return action

def learn(self, action, reward):
    arm = self.current_arm
    arm.avg_reward = (arm.pulled * arm.avg_reward + reward) / (arm.pulled + 1)
    if reward > self.max_profit:
        self.max_profit = reward
    elif reward < self.min_profit:
        self.min_profit = reward
    # normalize reward to [0, 1]: clip to [low, high], then scale linearly
    high = 100
    low = -75
    if reward >= high:
        reward = 1
        self.high_count += 1
    elif reward <= low:
        reward = 0
        self.low_count += 1
    else:
        reward = (reward - low) / (high - low)
    arm.avg_learning_reward = (arm.pulled * arm.avg_learning_reward + reward) / (arm.pulled + 1)
    arm.pulled += 1

# zooming algorithm confidence radius
def calc_confidence_radius(i_ph, arm: Arm):
    return math.sqrt((8 * i_ph) / (1 + arm.pulled))
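One way to combine the two methods, sketched here as a hypothetical hybrid (it is not part of the zooming algorithm and would likely void its regret bounds): with probability 1 - epsilon play the arm with the best empirical mean, otherwise fall back to the zooming index avg + 2 * radius, which explores. Arm, zooming_index, select_arm and update are illustrative names, and the reward is assumed to be already normalized to [0, 1] as in learn().

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Arm:
    x: float            # action in (0, 1)
    pulled: int = 0
    avg: float = 0.0    # running mean of the normalized reward


def confidence_radius(i_ph: int, arm: Arm) -> float:
    # Same form as calc_confidence_radius in the question.
    return math.sqrt(8 * i_ph / (1 + arm.pulled))


def zooming_index(arm: Arm, i_ph: int) -> float:
    # Optimistic index used by the selection rule in the question.
    return arm.avg + 2 * confidence_radius(i_ph, arm)


def select_arm(arms, i_ph, epsilon, rng=random):
    """Hypothetical hybrid: with probability epsilon (or if no arm has been
    tried yet) use the zooming index; otherwise exploit the best mean."""
    tried = [a for a in arms if a.pulled > 0]
    if rng.random() < epsilon or not tried:
        return max(arms, key=lambda a: zooming_index(a, i_ph))
    return max(tried, key=lambda a: a.avg)


def update(arm: Arm, reward: float) -> None:
    # Incremental mean; gives the same result as the update in learn().
    arm.avg += (reward - arm.avg) / (arm.pulled + 1)
    arm.pulled += 1
```

With epsilon = 1 this reduces to the plain zooming selection rule; decaying epsilon over time shifts the balance further toward exploitation as estimates improve.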

You may find this useful; the full algorithm description is here. They grid out the probes uniformly. Informing this choice (e.g. with a normal distribution centered on an arm reputed to give high reward) is also possible, but I am not sure whether this would invalidate some of the bounds.
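The suggestion of informing the probe placement could look like this: instead of a uniform draw, sample the new arm from a normal distribution centered on the best arm found so far, truncated to the action interval by rejection sampling. sample_probe, sigma and max_tries are hypothetical names and values; as noted above, this heuristic may invalidate the zooming algorithm's bounds.

```python
import numpy as np


def sample_probe(best_x, sigma=0.1, rng=None, low=0.0, high=1.0, max_tries=100):
    """Draw a new probe from N(best_x, sigma**2), restricted to (low, high).

    Rejection sampling yields a normal distribution truncated to the interval;
    if nothing lands inside after max_tries draws, fall back to a uniform draw.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        x = rng.normal(best_x, sigma)
        if low < x < high:
            return float(x)
    return float(rng.uniform(low, high))
```

A smaller sigma concentrates new arms near the current best action (more exploitation); a larger one approaches the uniform placement.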

Related

Trying to find the mean and variance of a certain section of an array for a given number

I have about 30,000 data points and want to find the mean and variance for each angle of attack, storing each in its own array, but it just won't work.
clear all
clc
load('windflies.mat')
angleofattacks = [0 1.008 2.016 3.024 4.032 5.04 6.048 7.056 8.064 9.072 10.08 11.088 12.096 13.104 14.112 15.12 16.128 17.136 18.144 19.152 20.16];
WOPFast;
[rows,cols] = size(WOPFast);
[r1,c2] = size(angleofattacks);
count = 0;
xc=1;
forcexn = zeros(21,1);
total = 0;
x=[];
for c = 1:21
    for r = 1:rows
        if WOPFast(r,2) == angleofattacks(1,c)
            x(r,xc)= WOPFast(r,3);
        end
    end
    count = count + 1;
    forcexn(count,xc)= mean(x);
    xc =xc +1;
end
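A NumPy sketch of the grouping, assuming the layout implied by the MATLAB code: column 2 holds the angle of attack and column 3 the value to average (0-based indices 1 and 2 in NumPy). In the MATLAB loop, x(r, xc) pads non-matching rows with zeros, which then enter mean(x); the sketch selects only the matching rows instead. It also compares angles with a tolerance rather than ==, on the assumption that stored values such as 1.008 may not match the typed literal exactly. stats_by_angle is a hypothetical name.

```python
import numpy as np


def stats_by_angle(data, angles, tol=1e-6):
    """Mean and variance of column 3 for rows whose column 2 matches each angle.

    Returns two arrays aligned with `angles`; NaN where no row matches.
    Variance uses ddof=1, i.e. MATLAB's default N - 1 normalization.
    """
    data = np.asarray(data, dtype=float)
    means = np.full(len(angles), np.nan)
    variances = np.full(len(angles), np.nan)
    for k, angle in enumerate(angles):
        mask = np.isclose(data[:, 1], angle, atol=tol)  # tolerant match, not ==
        values = data[mask, 2]
        if values.size:
            means[k] = values.mean()
            variances[k] = values.var(ddof=1) if values.size > 1 else 0.0
    return means, variances
```

Calling stats_by_angle(WOPFast_as_array, angleofattacks) would give one mean and one variance per angle, in the same order as the angle list.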

Restricted boltzmann machine - array

I'm doing a college assignment on restricted Boltzmann machines (RBMs), but this code raises an error and I'm not sure how to get it working. Can anybody help me with this error?
import sys

import numpy as np

def reconstruct(self, v):
    h = sigmoid(np.dot(v, self.W) + self.hbias)
    reconstructed_v = sigmoid(np.dot(h, self.W.T) + self.vbias)
    return reconstructed_v

def test_rbm(learning_rate=0.1, k=1, training_epochs=10):
    data = datainput
    rng = np.random.RandomState(123)
    # construct RBM
    rbm = RBM(input=data, n_visible=40, n_hidden=20, np_rng=rng)
    # train
    for epoch in range(training_epochs):
        rbm.contrastive_divergence(lr=learning_rate, k=k)
        cost = rbm.get_reconstruction_cross_entropy()
        print('Training epoch %d, cost is ' % epoch, cost, file=sys.stderr)
    # test
    v = datatarget
    print(rbm.reconstruct(v))

if __name__ == "__main__":
    test_rbm()
ValueError: shapes (1979,1) and (40,20) not aligned: 1 (dim 1) != 40 (dim 0)
At run time, print(rbm.reconstruct(v)) fails; the error is raised in the line h = sigmoid(np.dot(v, self.W) + self.hbias).
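The message says v has shape (1979, 1), i.e. 1979 rows with a single column, while W is (40, 20), so np.dot(v, self.W) needs v to have 40 columns, one per visible unit. Whatever datatarget holds (presumably a column of targets rather than 40-feature samples, which is an assumption), it has to be arranged as (n_samples, 40) before it can be reconstructed. Below is a standalone sketch of reconstruct with an explicit shape check; sigmoid, W, hbias and vbias are set up here only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reconstruct(v, W, hbias, vbias):
    """Mean-field reconstruction of visible vectors v, shape (n_samples, n_visible)."""
    v = np.atleast_2d(v)  # a single 1-D sample becomes shape (1, n_visible)
    if v.shape[1] != W.shape[0]:
        raise ValueError(
            f"input has {v.shape[1]} columns but the RBM has {W.shape[0]} visible units"
        )
    h = sigmoid(v @ W + hbias)       # (n_samples, n_hidden)
    return sigmoid(h @ W.T + vbias)  # (n_samples, n_visible)


rng = np.random.RandomState(123)
n_visible, n_hidden = 40, 20
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
hbias = np.zeros(n_hidden)
vbias = np.zeros(n_visible)

ok = reconstruct(rng.randint(0, 2, size=(5, n_visible)), W, hbias, vbias)
print(ok.shape)  # (5, 40)
```

Passing an array shaped like the one in the traceback, np.zeros((1979, 1)), raises the explicit ValueError from the check instead of the np.dot alignment error.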

Echo state prediction laziness

I am trying to use PyBrain for time series prediction by implementing this solution (the other solutions I tried produce large offsets). The problem is that, although I have tried changing the learning rate, momentum, maximum training epochs, continue epochs, number of neurons (1 to 500) and activation function, the prediction is always flat. What might the solution be?
(Plot not included. Blue: original series; green: the network's prediction.)
from pybrain.datasets import SequentialDataSet
from pybrain.structure.modules import LSTMLayer, LinearLayer
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork

INPUTS = 60
HIDDEN = 60
OUTPUTS = 1

def build_network():
    net = buildNetwork(INPUTS, HIDDEN, OUTPUTS,
                       hiddenclass=LSTMLayer,
                       outclass=LinearLayer,
                       recurrent=True,
                       bias=True, outputbias=False)
    net.sortModules()
    return net

def prepare_datasets(data, training_data_ratio):
    # split_list is a helper from the original code, not shown here
    training_data, validation_data = split_list(data, training_data_ratio)
    training_set = SequentialDataSet(INPUTS, OUTPUTS)
    for i in range(len(training_data) - INPUTS - 1):
        training_set.newSequence()
        tr_inputs = training_data[i:i + INPUTS]
        tr_output = training_data[i + INPUTS]
        training_set.addSample(tr_inputs, tr_output)
    validation_set = []
    for i in range(len(validation_data) - INPUTS - 1):
        validation_set.append(validation_data[i:i + INPUTS])
    return training_set, validation_set

def train_network(net, data, max_iterations):
    net.randomize()
    learning_rate = 0.1
    trainer = BackpropTrainer(net, data, verbose=True,
                              momentum=0.8,
                              learningrate=learning_rate)
    errors = trainer.trainUntilConvergence(maxEpochs=max_iterations, continueEpochs=10)
    return errors

def try_network(net, data):
    outputs = []
    for item in data:
        output = net.activate(item)[0]
        outputs.append(output)
    return outputs

Normalize the data before training:
data = data / max(data)
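A short NumPy version of that normalization, with one deliberate change that is an assumption rather than part of the answer: it divides by the largest absolute value, so a series containing negative values also lands in [-1, 1]. The scale factor is kept so the network's predictions can be mapped back to the original units. A commonly cited reason for flat outputs is unscaled inputs saturating the sigmoid/tanh units inside the LSTM layer, though the question does not confirm that this is the cause here. normalize and denormalize are hypothetical names.

```python
import numpy as np


def normalize(data):
    """Scale a 1-D series by its largest absolute value so it lies in [-1, 1].

    Returns the scaled series and the factor needed to undo the scaling.
    """
    data = np.asarray(data, dtype=float)
    scale = float(np.max(np.abs(data)))
    if scale == 0.0:
        scale = 1.0  # all-zero series: leave it unchanged
    return data / scale, scale


def denormalize(scaled, scale):
    # Map network outputs back to the original units.
    return np.asarray(scaled, dtype=float) * scale


series = [120.0, 150.0, 90.0, 200.0]
scaled, scale = normalize(series)
restored = denormalize(scaled, scale)
```

The same scale should be applied to the validation inputs, and denormalize applied to the outputs of try_network before plotting them against the original series.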

DICOM dimensions in matlab array (all frames end up in last dimension of array)

In one of my GUIs I load DICOM images. Sometimes they contain just a volume plus one other dimension, and when I load them in MATLAB everything ends up where I want it.
handles.inf = dicominfo([filepath filename]);
handles.dat = dicomread(handles.inf);
size(handles.dat)
ans = 128 128 128 512
This is, for example, a 128 by 128 by 128 volume at 512 time points (actually the third dimension would not even be 128; the third dimension is stacks, and I don't know what those are). However, sometimes there are more dimensions in the DICOM, but the reader just puts all of them in the fourth dimension.
handles.inf = dicominfo([filepath filename]);
handles.dat = dicomread(handles.inf);
size(handles.dat)
ans = 128 128 1 4082
This is, for example, a single 128 by 128 slice with 512 time points, two echoes, and magnitude, phase, real and imaginary data.
It is then very hard to unscramble them. I can do this manually for every DICOM I load, but in a GUI I want a general approach that simply creates one array dimension for each dimension in the DICOM.
This is especially important not just for data analysis, but also for transforming coordinates from image space to patient space. My own approach was to look at the header, but there's no guarantee that particular entries will be reliable, and I can't find the order in which they are applied. The header entries I have found so far:
inf.Rows;%inf.width;%inf.MRAcquisitionFrequencyEncodingSteps;%inf.MRAcquisitionPhaseEncodingStepsInPlane
inf.Columns;% inf.height; % inf.NumberOfKSpaceTrajectories;
inf.MRSeriesNrOfSlices
inf.MRSeriesNrOfEchoes
inf.MRSeriesNrOfDynamicScans
inf.MRSeriesNrOfPhases
inf.MRSeriesReconstructionNumber % not sure about this one
inf.MRSeriesNrOfDiffBValues
inf.MRSeriesNrOfDiffGradOrients
inf.MRSeriesNrOfLabelTypes
reshapeddat = reshape(dat, [all dimension sizes from header here]);
I'm not sure how to check whether I've got all the variables, or what the right order for the reshape is. Does anybody know of a sure-fire way to get all dimension sizes from the DICOM header, and the order in which they are stacked?
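One way to sidestep the stacking order entirely is to place every frame by its own per-frame indices (such as the per-frame echo number or temporal position identifier) rather than inferring the order from the header counts. A minimal NumPy sketch, assuming the frames are already stacked as an (n_frames, X, Y) array and the 1-based per-frame indices have been extracted into a dict; for slices, which have no per-frame number, an index would first have to be derived from the frame position. place_frames is a hypothetical name.

```python
import numpy as np


def place_frames(frames, indices):
    """Place each (X, Y) frame at the position given by its own per-frame indices.

    frames:  array of shape (n_frames, X, Y)
    indices: dict mapping a dimension name to a sequence of 1-based indices,
             one per frame (e.g. echo number, dynamic number)
    Returns an array of shape (X, Y, size_1, size_2, ...) in the dict's order.
    """
    names = list(indices)
    idx = [np.asarray(indices[n], dtype=int) - 1 for n in names]  # to 0-based
    sizes = tuple(int(i.max()) + 1 for i in idx)
    out = np.zeros(frames.shape[1:] + sizes, dtype=frames.dtype)
    filled = np.zeros(sizes, dtype=bool)
    for f in range(frames.shape[0]):
        pos = tuple(int(i[f]) for i in idx)
        if filled[pos]:
            raise ValueError(f"frame {f} duplicates position {pos}")
        filled[pos] = True
        out[(slice(None), slice(None)) + pos] = frames[f]
    if not filled.all():
        raise ValueError("some index combinations have no frame")
    return out
```

The duplicate and missing-position checks make it obvious when a dimension has been overlooked (two frames land on the same position) or when extra frames, such as reconstructed data, do not fit the grid.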

OK, so I now go through all possible dimensions manually. When a stack also contains reconstructed data with fewer dimensions than the rest, remove those frames first.
This is how I check the dimensions:
info = dicominfo(filename);
datorig = dicomread(filename);
%dimension sizes per frame
nrX = double(info.Rows); %similar nX;% info.width;% info.MRAcquisitionFrequencyEncodingSteps;% info.MRAcquisitionPhaseEncodingStepsInPlane
nrY = double(info.Columns); %similar nY;% info.height;% info.NumberOfKSpaceTrajectories;
%dimensions between frames
nrEcho = double(info.MRSeriesNrOfEchoes);
nrDyn = double(info.MRSeriesNrOfDynamicScans);
nrPhase = double(info.MRSeriesNrOfPhases);
nrSlice = double(info.MRSeriesNrOfSlices); %no per frame struct entry, calculate from offset.
%nr of frames
nrFrame = double(info.NumberOfFrames);
nrSeq = 1; % nSeq not sure how to interpret this, where's the per frame struct entry?
nrBval = double(info.MRSeriesNrOfDiffBValues); % nB
nrGrad = double(info.MRSeriesNrOfDiffGradOrients); % info.MRSeriesNrOfPhaseContrastDirctns;%similar nGrad?
nrASL = 1; % info.MRSeriesNrOfLabelTypes;%per frame struct entry?
imtype = cell(1, nrFrame);
for ii = 1:nrFrame
    %imtype(ii) = eval(sprintf('info.PerFrameFunctionalGroupsSequence.Item_%d.PrivatePerFrameSq.Item_1.MRImageTypeMR', ii));
    imtype{ii} = num2str(eval(sprintf('info.PerFrameFunctionalGroupsSequence.Item_%d.PrivatePerFrameSq.Item_1.MRImageTypeMR', ii)));
end
imType = unique(imtype, 'stable');
nrType = length(imType);
This is how I reformat the dimensions:
%% count length of same dimension positions from start
if nrEcho > 1
    for ii = 1:nrFrame
        imecno(ii) = eval(sprintf('info.PerFrameFunctionalGroupsSequence.Item_%d.PrivatePerFrameSq.Item_1.EchoNumber', ii));
    end
    lenEcho = find(imecno ~= imecno(1), 1, 'first') - 1;
else
    lenEcho = nrFrame;
end
if nrDyn > 1
    for ii = 1:nrFrame
        imdynno(ii) = eval(sprintf('info.PerFrameFunctionalGroupsSequence.Item_%d.PrivatePerFrameSq.Item_1.TemporalPositionIdentifier', ii));
    end
    lenDyn = find(imdynno ~= imdynno(1), 1, 'first') - 1;
else
    lenDyn = nrFrame;
end
if nrPhase > 1
    for ii = 1:nrFrame
        imphno(ii) = eval(sprintf('info.PerFrameFunctionalGroupsSequence.Item_%d.PrivatePerFrameSq.Item_1.MRImagePhaseNumber', ii));
    end
    lenPhase = find(imphno~=imphno(1), 1, 'first') - 1;
else
    lenPhase = nrFrame;
end
if nrType > 1
    q = 1;
    imtyno(1, 1) = q;
    for ii = 2:nrFrame
        % strcmp avoids the element-wise ~= error when the type strings differ in length
        if ~strcmp(imtype{ii-1}, imtype{ii})
            q = q+1;
        end
        imtyno(1, ii) = q;
        %for jj = 1:nrType
        %if imtype{:,ii} == imType{:,jj}
        % imtyno(1, ii) = jj;
        %end
        %end
    end
    if q ~= nrType
        nrType = q;
    end
    lenType = find(imtyno ~= imtyno(1), 1, 'first') - 1;
else
    lenType = nrFrame;
end
% slices are not numbered per frame, so get this indirectly from location
% currently not sure if working for all angulations
for ii = 1:nrFrame
    imslice(:,ii) = -eval(['info.PerFrameFunctionalGroupsSequence.Item_',sprintf('%d', ii),'.PlanePositionSequence.Item_1.ImagePositionPatient']);
end
% stdsl = std(imslice,[],2); --> Assumption
% dirsl = max(find(stdsl == max(stdsl)));
imslices = unique(imslice', 'rows')';
if nrSlice > 1
    for ii = 1:nrFrame
        for jj = 1:nrSlice
            if imslice(:,ii) == imslices(:,nrSlice - (jj-1)); %dirsl or :?
                imslno(1, ii) = jj;
            end
        end
    end
    lenSlice = find(imslno~=imslno(1), 1, 'first')-1;
else
    lenSlice = nrFrame;
end
if nrBval > 1
    for ii = 1:nrFrame
        imbno(ii) = eval(sprintf('info.PerFrameFunctionalGroupsSequence.Item_%d.PrivatePerFrameSq.Item_1.MRImageDiffBValueNumber', ii));
    end
    lenBval = find(imbno~=imbno(1), 1, 'first') - 1;
else
    lenBval = nrFrame;
end
if nrGrad > 1
    for ii = 1:nrFrame
        imgradno(ii) = eval(sprintf('info.PerFrameFunctionalGroupsSequence.Item_%d.PrivatePerFrameSq.Item_1.MRImageGradientOrientationNumber', ii));
    end
    lenGrad = find(imgradno~=imgradno(1), 1, 'first')-1;
else
    lenGrad = nrFrame;
end
lenSeq = nrFrame; % don't know how to get this information per frame, in my case always one
lenASL = nrFrame; % don't know how to get this information per frame, in my case always one
%this would have been the goal format
goaldim = [nrSlice nrEcho nrDyn nrPhase nrType nrSeq nrBval nrGrad nrASL]; % what we want
goallen = [lenSlice lenEcho lenDyn lenPhase lenType lenSeq lenBval lenGrad lenASL]; % what they are
[~, permIX] = sort(goallen);
dicomdim = zeros(1, 9);
for ii = 1:9
    dicomdim(1, ii) = goaldim(permIX(ii));
end
dicomdim = [nrX nrY dicomdim];
%for any possible zero dimensions from header use a 1 instead
dicomdim(find(dicomdim == 0)) = 1;
newdat = reshape(datorig, dicomdim);
newdim = size(newdat);
newnonzero = length(newdim(3:end));
goalnonzero = permIX(1:newnonzero);
[dummyy, goalIX] = sort(goalnonzero);
goalIX = [1 2 goalIX+2];
newdat = permute(newdat, goalIX);
% max(goaldim, 1) again replaces zero sizes from the header with 1
newdat = reshape(newdat, [nrX nrY max(goaldim, 1)]);
Once I've used this as a function for a longer period and debugged it a bit, I might post it on the MathWorks File Exchange.
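The core of the reshaping step above (use run lengths to infer which frame dimension varies fastest, reshape column-major, then permute into the goal order) can be sketched in NumPy. Assumptions: frames are already stacked as an (n_frames, X, Y) array, the per-frame labels and expected sizes are supplied as dicts, and every dimension cycles regularly through the frame list, which is the same assumption the MATLAB code makes. run_length and unscramble are hypothetical names.

```python
import numpy as np


def run_length(labels):
    """Frames at the start that share the first frame's label.

    Mirrors find(lab ~= lab(1), 1, 'first') - 1; if the label never changes,
    the whole frame count is returned.
    """
    labels = np.asarray(labels)
    changes = np.flatnonzero(labels != labels[0])
    return int(changes[0]) if changes.size else len(labels)


def unscramble(frames, labels, sizes):
    """Rearrange (n_frames, X, Y) into (X, Y, *sizes) in the dict order of `sizes`."""
    n_frames = frames.shape[0]
    names = list(sizes)
    # A label that changes sooner belongs to a faster-varying dimension.
    lens = [run_length(labels[n]) if sizes[n] > 1 else n_frames for n in names]
    order = np.argsort(lens, kind="stable")  # fastest-varying first
    stacked = [sizes[names[i]] for i in order]
    if int(np.prod(stacked)) != n_frames:
        raise ValueError("dimension sizes do not multiply to the number of frames")
    arr = np.moveaxis(frames, 0, -1)  # (X, Y, n_frames), frames last as in dicomread
    # Column-major reshape: the first stacked dimension varies fastest, as in MATLAB.
    arr = arr.reshape(arr.shape[:2] + tuple(stacked), order="F")
    inverse = np.argsort(order)  # current axis position of each goal dimension
    return np.transpose(arr, (0, 1) + tuple(2 + int(i) for i in inverse))
```

Sizes of 1 are sorted to the end (their run length is the full frame count), which matches how the MATLAB code treats dimensions that are absent.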

Appending Int to an Array Matlab

I am using an API to get real-time data for trains, and I am trying to find the train time closest to a user-entered time, then display that train time and the next four, provided the trains are running. I read in the information and the code goes through what it's supposed to do, but when I look at the array it's just empty [] in 7 cells instead of the calculated numbers. Any suggestions? The code, including the API call, is below.
TEST VALUES:
requestStationSelected = 'University%20City' and requestEndStation = 'Roslyn'
%this is the API link for the live data from Septa this will get 30
%results and see which time is closer to the user entered time
requestInfoSeptaLive = ['http://www3.septa.org/hackathon/NextToArrive/' requestStationSelected '/' requestEndStation '/30'];
%Again tries to get the information and if there is a failure it will give
%a probable cause and terminate the program
try
    getInfoSeptaLive = urlread(requestInfoSeptaLive);
catch
    if getInfoSeptaLive ~= '[]'
        disp...
            ('Either the arrival/depart stations dont quite match up or theres a server error. Try again.');
        return;
    else
        disp('Unable to fetch the information from Septa, please try again')
        return;
    end
end
%parses the information returned from the Live API
dataReturnedFromLiveAPI = parse_json(getInfoSeptaLive);
dataReturnedFromLiveAPI = dataReturnedFromLiveAPI{1};
%gets the size of the API in case there are no trains running
sizeOfDataNoTrains = size(dataReturnedFromLiveAPI, 1);
sizeOfData = size(dataReturnedFromLiveAPI, 2);
counter = 0;
for i = 1:sizeOfData
    scanForClosestTime = dataReturnedFromLiveAPI{1,i}.orig_departure_time;
    trainTimeGivenH = sscanf(scanForClosestTime, '%i');
    findColonTrain = strfind(scanForClosestTime, ':');
    trainTimeGivenMStr = scanForClosestTime(findColonTrain+1:4);
    trainTimeGivenM = int32(str2num(trainTimeGivenMStr));
    trainDepartTimeM = (trainTimeGivenH(1,1) * 60) + (trainTimeGivenM);
    differenceBetweenTimes = trainDepartTimeM - userEnteredMins;
    if trainDepartTimeM < userEnteredMins
        differenceBetweenTimes = userEnteredMins - trainDepartTimeM;
    end
    stopAtEndOfData = sizeOfData;
    goodTimeFrame = 60;
    closestTime = cell(1, stopAtEndOfData);
    storeTheDifference = cell(1, stopAtEndOfData);
    if(differenceBetweenTimes < 60)
        if (counter < 5)
            closestTime{i} = scanForClosestTime;
            storeTheDifference{i} = differenceBetweenTimes;
            counter = counter + 1;
        end
    end
end

You assign your cell arrays inside the for loop:
for i = 1:sizeOfData
    ...
    closestTime = cell(1, stopAtEndOfData);
    storeTheDifference = cell(1, stopAtEndOfData);
    ...
end
This means that you turn both of them into {[],[],[],[],[]...} on every iteration of the loop. So unless the last iteration has a valid "closest time" in it, your cell arrays will contain nothing but empty arrays, and even if it does, all but the last element will still be [].
To fix this, move the two lines to before the start of the for loop.
The second problem seems to be the indexing of the arrays where you store the results. If you only want five results, I am assuming you want to store them in elements 1 - 5 of your array, and not in "just any" locations. I would change the code to
if (counter < 5)
    counter = counter + 1;
    closestTime{counter} = scanForClosestTime;
    storeTheDifference{counter} = differenceBetweenTimes;
end
But maybe I misinterpreted how you want to handle that?
Unrelated to your question, you might want to take a look at the line
trainTimeGivenMStr = scanForClosestTime(findColonTrain+1:4);
It is quite possible that this is not what you intended to do - looking at an example of the response, I found the string "orig_departure_time":"11:57PM". I expect that findColonTrain == 3, so that the above line becomes
trainTimeGivenMStr = scanForClosestTime(4:4);
just a single character. Perhaps you meant
trainTimeGivenMStr = scanForClosestTime(findColonTrain+(1:4));
which would turn into
trainTimeGivenMStr = scanForClosestTime(4:7);
so that
trainTimeGivenMStr = '57PM';
I hope these three things help you get it all working!
EDIT: had a chance to run your code this morning - discovered a number of other problems. I include below an annotated "working" code: the biggest problem was most likely that you were not handling AM/PM in your code. Note that I used a different json parser - this changed a couple of lines very slightly. I'm sure you can put it back together to work the way you want. This returned valid data in all cells.
dataReturnedFromLiveAPI = loadjson(getInfoSeptaLive);
% next line not needed - loadjson returns struct array, not cell array
%dataReturnedFromLiveAPI = dataReturnedFromLiveAPI{1};
%gets the size of the API in case there are no trains running
sizeOfDataNoTrains = size(dataReturnedFromLiveAPI, 1);
sizeOfData = size(dataReturnedFromLiveAPI, 2);
counter = 0;
stopAtEndOfData = sizeOfData;
closestTime = cell(1, stopAtEndOfData);
storeTheDifference = cell(1, stopAtEndOfData);
userEnteredMins = 12*60+30; % looking for a train around 12:30 pm
for ii = 1:sizeOfData
    scanForClosestTime = dataReturnedFromLiveAPI(ii).orig_departure_time;
    trainTimeGivenH = sscanf(scanForClosestTime, '%i');
    % since we'll be considering AM/PM, have to set 12 = 0:
    if (trainTimeGivenH == 12), trainTimeGivenH = 0; end
    findColonTrain = strfind(scanForClosestTime, ':');
    % change next line to get minutes plus AM/PM:
    trainTimeGivenMStr = scanForClosestTime(findColonTrain+(1:4));
    % look at just minutes:
    trainTimeGivenM = int32(str2num(trainTimeGivenMStr(1:2)));
    % adjust for AM/PM:
    if(trainTimeGivenMStr(3:4)=='PM'), trainTimeGivenH = trainTimeGivenH+12; end;
    % compute time in minutes:
    trainDepartTimeM = (trainTimeGivenH * 60) + (trainTimeGivenM);
    differenceBetweenTimes = trainDepartTimeM - userEnteredMins;
    if trainDepartTimeM < userEnteredMins
        differenceBetweenTimes = userEnteredMins - trainDepartTimeM;
    end
    % added a couple of lines to see what is happening:
    fprintf(1, 'train %d: depart %s - in minutes this is %d vs user entered %d\n', ...
        ii, scanForClosestTime, trainDepartTimeM, userEnteredMins);
    goodTimeFrame = 60;
    if(differenceBetweenTimes < 600)
        if (counter < 10)
            counter = counter + 1;
            closestTime{counter} = scanForClosestTime;
            storeTheDifference{counter} = differenceBetweenTimes;
        end
    end
end
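For comparison, the same two fixes (AM/PM handling and filling result slots consecutively) in a short Python sketch. It assumes departure times look like the "11:57PM" example quoted above and does not handle searches that wrap past midnight; to_minutes and closest_trains are hypothetical names, and window=60 and limit=5 follow the values in the question.

```python
def to_minutes(clock):
    """Convert a time such as '11:57PM' to minutes after midnight."""
    hours, rest = clock.strip().split(":")
    minutes = int(rest[:2])
    suffix = rest[2:].strip().upper()
    h = int(hours) % 12  # 12 AM -> 0; 12 PM becomes 12 after the PM shift below
    if suffix == "PM":
        h += 12
    return h * 60 + minutes


def closest_trains(departures, user_minutes, window=60, limit=5):
    """Keep up to `limit` departures within `window` minutes of user_minutes.

    Matches are appended in order, so they occupy positions 1..limit
    consecutively, like the counter-based indexing in the answer.
    """
    picked = []
    for t in departures:
        diff = abs(to_minutes(t) - user_minutes)
        if diff < window and len(picked) < limit:
            picked.append((t, diff))
    return picked
```

Because the minutes are sliced relative to the colon, single-digit hours such as "1:05PM" parse the same way as two-digit ones, which is also why findColonTrain+(1:4) works in the MATLAB version.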
