How to loop through table based on unique date in MATLAB

I have this table named BondData which contains the following:
Settlement Maturity Price Coupon
8/27/2016 1/12/2017 106.901 9.250
8/27/2019 1/27/2017 104.79 7.000
8/28/2016 3/30/2017 106.144 7.500
8/28/2016 4/27/2017 105.847 7.000
8/29/2016 9/4/2017 110.779 9.125
For each day in this table, I want to perform a certain task: assign several values to variables and perform the necessary computations. The logic is like:
do while Settlement is the same
m_settle=current_row_settlement_value
m_maturity=current_row_maturity_value
and so on...
my_computation_here...
end
It's like I want to loop through my settlement dates and perform the task for as long as the date is the same.
EDIT: Just to clarify my issue, I am implementing yield curve fitting using the Nelson-Siegel and Svensson models. Here is my code so far:
function NS_SV_Models()
load bondsdata
BondData = table(Settlement,Maturity,Price,Coupon);
BondData.Settlement = categorical(BondData.Settlement);
Settlements = categories(BondData.Settlement); % get all unique Settlement
for k = 1:numel(Settlements)
    rows = BondData.Settlement==Settlements(k);
    Bonds.Settle = Settlements(k); % current_row_settlement_value
    Bonds.Maturity = BondData.Maturity(rows); % current_row_maturity_value
    Bonds.Prices = BondData.Price(rows);
    Bonds.Coupon = BondData.Coupon(rows);
    Settle = Bonds.Settle;
    Maturity = Bonds.Maturity;
    CleanPrice = Bonds.Prices;
    CouponRate = Bonds.Coupon;
    Instruments = [Settle Maturity CleanPrice CouponRate];
    Yield = bndyield(CleanPrice,CouponRate,Settle,Maturity);
    NSModel = IRFunctionCurve.fitNelsonSiegel('Zero',Settlements(k),Instruments);
    SVModel = IRFunctionCurve.fitSvensson('Zero',Settlements(k),Instruments);
    NSModel.Parameters
    SVModel.Parameters
end
end
Again, my main objective is to get each model's parameters (beta0, beta1, beta2, etc.) on a per-day basis. I am getting an error in Instruments = [Settle Maturity CleanPrice CouponRate]; because Settle contains only one record (8/27/2016); it's supposed to have two, since there are two rows for this date. Also, I noticed that Maturity, CleanPrice and CouponRate contain all records. They should only contain the respective data for each day.
I hope I have made my issue clearer now. By the way, I am using MATLAB R2015a.

Use a categorical array. Here is your function (without its header line; all rows I can't run here are commented out):
BondData = table(datetime(Settlement),datetime(Maturity),Price,Coupon,...
    'VariableNames',{'Settlement','Maturity','Price','Coupon'});
BondData.Settlement = categorical(BondData.Settlement);
Settlements = categories(BondData.Settlement); % get all unique Settlement
for k = 1:numel(Settlements)
    rows = BondData.Settlement==Settlements(k);
    Settle = BondData.Settlement(rows); % current_row_settlement_value
    Mature = BondData.Maturity(rows);   % current_row_maturity_value
    CleanPrice = BondData.Price(rows);
    CouponRate = BondData.Coupon(rows);
    Instruments = [datenum(char(Settle)) datenum(char(Mature)) ...
        CleanPrice CouponRate];
    % Yield = bndyield(CleanPrice,CouponRate,Settle,Mature);
    %
    % NSModel = IRFunctionCurve.fitNelsonSiegel('Zero',Settlements(k),Instruments);
    % SVModel = IRFunctionCurve.fitSvensson('Zero',Settlements(k),Instruments);
    %
    % NSModel.Parameters
    % SVModel.Parameters
end
Keep in mind the following:
You cannot concatenate variables of different types, as you try to do in Instruments = [Settle Maturity CleanPrice CouponRate];
There is no need for the structure Bonds; you only copy values out of it (e.g. Settle = Bonds.Settle;).
Use the relevant functions to convert between a datetime object and strings or numbers; for instance, in the code above, datenum(char(Settle)). A small sketch of these conversions follows below. I don't know what kind of input you need to pass to the commented-out functions.
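As a rough illustration of those conversions (a sketch with a made-up date, not part of the original function):
d = datetime('8/27/2016','InputFormat','M/d/yyyy'); % datetime object
c = categorical({'8/27/2016'});                     % categorical, as in the loop
n = datenum(char(c));                               % categorical -> char -> serial date number
isequal(n, datenum(d))                              % true: both describe the same day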

Related

How do I delete entries from two two-column tables such that their second columns match within a certain error

Say I have two arrays in MATLAB; let's call them locations1 and locations2:
locations1
1123.44977625437 890.824688325172
1290.31273560851 5065.65794385883
1718.10632735926 2563.44895531365
1734.55379433782 4408.20631924691
2050.70084480064 1214.45353443990
2299.46239346717 3781.34694047196
4186.02801290113 4386.67818566045
5676.10649593031 4529.23023993815
locations2
7474.22619378039 3166.41503120846
8604.40241305284 5069.40744277799
9048.25231808890 2563.58997620248
9059.71923042408 4381.75034710351
9643.05902166767 3796.42822996919
11460.8617087264 4392.85930695209
And I want to make it so that any two entries of the second columns that match each other within 100.0 remain, while any entry that has no match gets removed. So I want the output to look like
locations1
1290.31273560851 5065.65794385883
1718.10632735926 2563.44895531365
1734.55379433782 4408.20631924691
2299.46239346717 3781.34694047196
4186.02801290113 4386.67818566045
locations2
8604.40241305284 5069.40744277799
9048.25231808890 2563.58997620248
9059.71923042408 4381.75034710351
9643.05902166767 3796.42822996919
11460.8617087264 4392.85930695209
How would I do this, preferably without loops? Here is what I've done, but it has loops:
locround1 = round(locations1/50)*50;
locround2 = round(locations2/50)*50;
for i = 1:size(locations1,1)
    nodel1(i) = sum(locround1(i,2) == locround2(:,2));
end
nodel1 = repmat(nodel1>0,[2,1]);
nodel1 = nodel1';
locations1 = nodel1.*locations1;
locations1(~any(locations1,2),:) = [];
for i = 1:size(locations2,1)
    nodel2(i) = sum(locround2(i,2) == locround1(:,2));
end
nodel2 = repmat(nodel2>0,[2,1]);
nodel2 = nodel2';
locations2 = nodel2.*locations2;
locations2(~any(locations2,2),:) = [];
This is what I got: if your MATLAB version has ismembertol (introduced in R2015a), you can do it with the following code:
Li1 = ismembertol(locations1(:,2),locations2(:,2),100,'DataScale',1);
locations1_new = locations1(Li1,:);
Li2 = ismembertol(locations2(:,2),locations1(:,2),100,'DataScale',1);
locations2_new = locations2(Li2,:);
I tested it, it works.
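With the data from the question, you can sanity-check the result against the desired output (rows 2, 3, 4, 6 and 7 of locations1):
isequal(locations1_new, locations1([2 3 4 6 7],:)) % expected: true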
Let the data be defined as
locations1 = [
1123.44977625437 890.824688325172
1290.31273560851 5065.65794385883
1718.10632735926 2563.44895531365
1734.55379433782 4408.20631924691
2050.70084480064 1214.45353443990
2299.46239346717 3781.34694047196
4186.02801290113 4386.67818566045
5676.10649593031 4529.23023993815
];
locations2 = [
7474.22619378039 3166.41503120846
8604.40241305284 5069.40744277799
9048.25231808890 2563.58997620248
9059.71923042408 4381.75034710351
9643.05902166767 3796.42822996919
11460.8617087264 4392.85930695209
];
threshold = 100;
Then:
m = abs(locations1(:,2)-locations2(:,2).')<=threshold;
result1 = locations1(any(m,2),:);
result2 = locations2(any(m,1),:);
How this works:
The first line computes a matrix with the distance between each value from the second column of locations1 and each value from the second column of locations2. The distances are then compared with threshold, so that the matrix entries become true or false.
This makes use of implicit expansion, introduced in R2016b. For MATLAB versions before that, use bsxfun as follows:
m = abs(bsxfun(@minus, locations1(:,2), locations2(:,2).'))<=threshold;
Each row of the computed matrix, m, corresponds to a value from locations1; and each column corresponds to a value from locations2.
The second line uses logical indexing to select the rows of locations1 that satisfy the criterion for some value of locations2.
Similarly, the third line selects the rows of locations2 that satisfy the criterion for some value of locations1.
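As a toy check of how m drives the indexing (made-up numbers, threshold 20):
A = [1 10; 2 95; 3 300];
B = [4 100; 5 310];
m = abs(A(:,2)-B(:,2).') <= 20 % 3x2 logical: [0 0; 1 0; 0 1]
A(any(m,2),:)                  % keeps rows 2 and 3 of A
B(any(m,1),:)                  % keeps both rows of B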

How to speed up iteration through array in ruby

I have multiple CSV files that contain the name and the price of products. A product may appear in both files or in only one. I have to find the highest and the lowest price across these files for each product.
I joined products from both files into one array:
Dir["./*.csv"].each do |file|
CSV.foreach(file, headers:true) do |row|
tmpRow = row.to_s.chomp + "," + file #saving name of the input file
list.push(tmpRow.chomp.split(","))
end
end
The array list looks like this:
[["5893105","2.38", "weightOrSomethingIrrelevant", "./FIAT_2.csv"]]
This is the main algorithm:
while list[0] do
  if list[0] != nil
    tmpPart = list[0][0]
    tmpParts = list.select { |part, price| part == tmpPart }
    tmpParts.each do |tp|
      tmpPrices.push(tp[1])
    end
    list[0][2].to_f != 0.0 ? tmpWeight = list[0][2].to_s : tmpWeight = "Undefined"
    tmpMaxPrice = tmpParts.select { |part, price| part == tmpPart && price == tmpPrices.max }
    tmpMinPrice = tmpParts.select { |part, price| part == tmpPart && price == tmpPrices.min }
    result.push([tmpPart, tmpWeight, tmpPrices.max, tmpMaxPrice[0].last, tmpPrices.min, tmpMinPrice[0].last])
    tmpPart = ""
    list = list - tmpParts
    tmpParts = []
    tmpPrices = []
    tmpMaxPrice = []
    tmpMinPrice = []
    tmpWeight = ""
  end
end
The input files are huge (over 200,000 rows), so I am having problems with the efficiency of my algorithm (it processes one row in about half a second).
I am wondering if there is any better way to write this app.
I would split this into several parts:
1) A table which represents files (the file name, location, line number, etc.) and, connected to that, a product table (the row data from that file).
2) A script / function to ingest files and store rows as DB records.
3) A script / function to analyse rows and find products by name, using the DB and pulling the price info out with min/max.
This could later be improved to deal with naming inconsistencies, products vs. product occurrences, etc.
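If a full database is overkill for a first pass, here is a minimal in-memory sketch of the same idea in Ruby: group rows by product id in a hash while reading, then take min/max once per group. It assumes, as in the question's sample row, that the id is in the first CSV column and the price in the second:
require 'csv'

prices = Hash.new { |h, k| h[k] = [] } # product id => [[price, file], ...]
Dir["./*.csv"].each do |file|
  CSV.foreach(file, headers: true) do |row|
    prices[row[0]] << [row[1].to_f, file]
  end
end

result = prices.map do |id, entries|
  max = entries.max_by(&:first) # [price, file] with the highest price
  min = entries.min_by(&:first) # [price, file] with the lowest price
  [id, max[0], max[1], min[0], min[1]]
end
This makes a single pass over all rows plus one min/max pass per product, avoiding the quadratic cost of the repeated list.select and list = list - tmpParts in the original loop.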

Matlab string manipulation

I need help with MATLAB: using strtok to find an ID in a text file and then reading in or manipulating the rest of the row where that ID occurs. I also need this function to find (using strtok preferably) all occurrences of that same ID and group them in some way, so that I can compute averages. On to the sample code:
ID list being input:
(This is the KOIName variable)
010447529
010468501
010481335
010529637
010603247......etc.
File with data format:
(This is the StarData variable)
ID>>>>Values
002141865 3.867144e-03 742.000000 0.001121 16.155089 6.297494 0.001677
002141865 5.429278e-03 1940.000000 0.000477 16.583748 11.945627 0.001622
002141865 4.360715e-03 1897.000000 0.000667 16.863406 13.438383 0.001460
002141865 3.972467e-03 2127.000000 0.000459 16.103060 21.966853 0.001196
002141865 8.542932e-03 2094.000000 0.000421 17.452007 18.067214 0.002490
Do not be misled by the examples I posted: the first number is repeated for about 15 lines, then the ID changes, and that goes for an entire set of different IDs; then they are repeated as a whole group again, think [1,2,3],[1,2,3]. The main difference is the values trailing the ID, which I need to average out in MATLAB.
My current code is:
function Avg_Koi
N = evalin('base', 'KOIName');
file_1 = evalin('base', 'StarData');
global result;
for i = 1:size(N)
    [id, values] = strtok(file_1);
    result = result(id);
    result = result(values)
end
end
Thanks for any assistance.
You let us guess a lot, so I guess you want something like this:
load StarData.txt
IDs = {010447529;
       010468501;
       010481335;
       010529637;
       010603247;
       002141865};
L = numel(IDs);
values = cell(L,1);
% Iterate through all IDs and build a cell array holding a matrix for every ID
for ii = 1:L
    ID = IDs{ii};
    ID_first = find(StarData(:,1) == ID, 1, 'first');
    ID_last  = find(StarData(:,1) == ID, 1, 'last');
    values{ii} = StarData(ID_first:ID_last, 2:end);
end
When you now access index ii = 6, addressing the ID 002141865:
MatrixOfCertainID6 = values{6};
you get:
0.0038671440 742 0.001121 16.155089 6.2974940 0.001677
0.0054292780 1940 0.000477 16.583748 11.945627 0.001622
0.0043607150 1897 0.000667 16.863406 13.438383 0.001460
0.0039724670 2127 0.000459 16.103060 21.966853 0.001196
0.0085429320 2094 0.000421 17.452007 18.067214 0.002490
... for further calculations.
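Since the stated goal is averages per ID, a small follow-up sketch building on the values cell array above (row ii of avgValues holds the column-wise means for IDs{ii}):
avgValues = zeros(L, size(StarData,2)-1);
for ii = 1:L
    avgValues(ii,:) = mean(values{ii}, 1); % column-wise mean over that ID's rows
end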

How to improve ANN results by reducing error through hidden layer size, through MSE, or by using while loop?

This is my source code and I want to reduce the possible errors. When running this code there is a big difference between the trained output and the target. I have tried different ways but they didn't work, so please help me reduce it.
a=[31 9333 2000;31 9500 1500;31 9700 2300;31 9700 2320;31 9120 2230;31 9830 2420;31 9300 2900;31 9400 2500]'
g=[35000;23000;3443;2343;1244;9483;4638;4739]'
h=[31 9333 2000]'
inputs =(a);
targets =[g];
% Create a Fitting Network
hiddenLayerSize = 1;
net = fitnet(hiddenLayerSize);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'dividerand'; % Divide data randomly
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% For help on training function 'trainlm' type: help trainlm
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainlm'; % Levenberg-Marquardt
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean squared error
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression','plotconfusion' 'plotfit','plotroc'};
% Train the Network
[net,tr] = train(net,inputs,targets);
plottrainstate(tr)
% Test the Network
outputs = net(inputs)
errors = gsubtract(targets,outputs)
fprintf('errors = %4.3f\t',errors);
performance = perform(net,targets,outputs);
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs);
valPerformance = perform(net,valTargets,outputs);
testPerformance = perform(net,testTargets,outputs);
% View the Network
view(net);
sc=sim(net,h)
I think you need to be more specific.
What is the performance like on your training set and on your test set?
Have you tried doing any regularization?
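For what it's worth, one rough sketch (not a tuned solution) is to sweep the hidden layer size and add MSE regularization, keeping whichever network scores best; the range 1:10 and the ratio 0.1 below are arbitrary assumptions, and with only 8 samples larger layers will overfit:
bestPerf = Inf;
for hiddenLayerSize = 1:10
    net = fitnet(hiddenLayerSize);
    net.performParam.regularization = 0.1; % penalize large weights (assumed ratio)
    net = train(net,inputs,targets);
    perf = perform(net,targets,net(inputs));
    if perf < bestPerf
        bestPerf = perf;
        bestNet = net;
    end
end
sc = sim(bestNet,h) % predict for the new sample, as in the original script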

How to update a Mnesia table in Erlang

I have a little problem with my code. I have a table containing car details: name, price and quantity. I am trying to create a function called buy which will be used to buy a specific car. When a user buys e.g. 5 BMW cars, they will call buy_car(bmw,5). After this I want to update the new value of quantity for BMW cars.
My attempt is below, but I can't seem to get it working; I am new to Erlang.
buy_car(X,Ncars) ->
    F = fun() ->
        %% ----first i find the number of car X available in the shop
        [Xcars] = mnesia:read({car,X}),
        Nc = Xcars#car.quantity,
        Leftcars = Xcars#car{quantity = Nc - Ncars},
        %% ---now we update the database
        mnesia:write(Leftcars),
    end,
    mnesia:transaction(F).
Please help me with how I can write a function that buys cars from the shop.
Your implementation works fine, except that you added an illegal comma after mnesia:write(Leftcars).
Here is code that works (I kept your implementation as buy_car2):
-module(q).
-export([setup/0, buy_car/2, buy_car2/2]).

-record(car, {brand, quantity}).

setup() ->
    mnesia:start(),
    mnesia:create_table(car, [{attributes, record_info(fields, car)}]),
    mnesia:transaction(fun() -> mnesia:write(#car{brand=bmw, quantity=1000}) end).

buy_car(Brand, Ncars) ->
    F = fun() ->
            [Car] = mnesia:read(car, Brand), % crash if the car is missing
            mnesia:write(Car#car{quantity = Car#car.quantity - Ncars})
        end,
    mnesia:transaction(F).

buy_car2(X, Ncars) ->
    F = fun() ->
            %% ----first I find the number of car X available in the shop
            [Xcars] = mnesia:read({car,X}),
            Nc = Xcars#car.quantity,
            Leftcars = Xcars#car{quantity = Nc - Ncars},
            %% ---now we update the database
            mnesia:write(Leftcars)
        end,
    mnesia:transaction(F).
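A quick shell session to try it, with the results expected under the setup above (starting quantity 1000):
1> q:setup().
{atomic,ok}
2> q:buy_car(bmw, 5).
{atomic,ok}
3> mnesia:transaction(fun() -> mnesia:read(car, bmw) end).
{atomic,[{car,bmw,995}]}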
I would do something like below:
Considering the record is defined as:
-record(car_record, {car, quantity}).
The following function will update the data:
buy_car(X, NCars) ->
    Row = #car_record{car = X, quantity = NCars},
    mnesia:ets(fun() -> mnesia:dirty_write(Row) end),
    mnesia:change_table_copy_type(guiding_data, node(), disc_copies).
To use the above method, the mnesia table must be created as "ram_copies" and with no replication nodes. Also, if there are a lot of updates happening, you might not want to copy the ram_copies to disk on every update, due to performance concerns; rather, you would do it in a time-triggered manner.
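The time-triggered dump could be as simple as the sketch below (assuming the table is named car_record and kept as ram_copies; mnesia:dump_tables/1 dumps ram_copies tables to disc):
% Dump the ram table to disk every 60 seconds instead of on every write
{ok, _TRef} = timer:apply_interval(60000, mnesia, dump_tables, [[car_record]]).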
