Go slice not updating values outside for loop - arrays

I'm trying to code an Adaline neurone in Go and I'm having a problem with the scope of a slice. I update its values inside a for loop and they look like they are being updated, but when I access the new values from outside the loop they are always the same, as if they had only been updated in the first iteration. Here is the code:
// This function creates a []float64 and fills it with random numbers
weights := initWeights(inputLength)
// data is a [][]float64 and expectedY a []float64
for i := 0; i < 10; i++ {
    for j := range data {
        // Calculate estimate
        var estimate float64 = 0
        for x := range data[j] {
            estimate += data[j][x] * weights[x]
        }
        // Update weights (range passes values as a copy)
        for x := 0; x < len(weights); x++ {
            weights[x] = learningRate * (expectedY[j] - estimate) * data[j][x]
        }
        // PRINT #1
    }
    // PRINT #2
    //
    // Some more stuff
    //
}
If I print weights before the loop it looks like this:
[-0.6046602879796196 0.6645600532184904 -0.4246374970712657 0.06563701921747622 0.09696951891448456 -0.5152126285020654 0.21426387258237492 0.31805817433032985 0.28303415118044517]
So it was created correctly. Then I start the loops that adjust the neurone weights, and this is where the weird thing happens.
If I print at #1 I can see that the slice is being updated in each iteration, but when I print at #2 the value is always the same: the one that was calculated in the first iteration of the weights loop.
PRINT #1
[0.06725611377611064 0 0 0.03490734755724929 0.014819026508554914 0.023919277971577904 0.021858582731470875 0.0051309928461725374 0.06915084698345737]
[0.030417970260300468 0.0274737201080031 0 0.02479712906046004 0.01662460439529523 0.014007493148808682 0.029246218179487176 0.004413401238393224 0.05947980105651245]
[0.008861875440076036 0 0.01792998206766924 0.017854161778140868 0.004333887749441702 0.020137868898735412 0.0125224790185058 0.008249247500686795 0.030328115811348512]
PRINT #2
[0.007796061340871362 0 0.011035383661848988 0.01289960904315235 0.003797667051516503 0.009918694200771232 0.015234505189042204 0.0008236738380263619 0.023072096303259435]
[0.007796061340871362 0 0.011035383661848988 0.01289960904315235 0.003797667051516503 0.009918694200771232 0.015234505189042204 0.0008236738380263619 0.023072096303259435]
[0.007796061340871362 0 0.011035383661848988 0.01289960904315235 0.003797667051516503 0.009918694200771232 0.015234505189042204 0.0008236738380263619 0.023072096303259435]
I've been struggling with this for the last two days and I couldn't figure out what's happening, I hope you guys can help me.
-- UPDATE --
Here is a more complete and runnable version https://play.golang.org/p/qyZGSJSKcs
In the playground the code seems to work fine, but the exact same code on my computer outputs the exact same slice on every iteration.
The only difference is that instead of fixed slices I'm creating them from two CSV files with several hundred rows, so I'm guessing the problem comes from there. I'll continue investigating.
Here is the raw data, in case it's helpful:
Train data: https://pastebin.com/H3YgFF0a
Validate data: https://pastebin.com/aeK6krxD

Don't provide partial code with bits removed, just provide a runnable example - the process of doing this may well help you find the problem. Your initWeights function isn't really required for this purpose - better to use known start data.
Here is the incomplete code you have with data added. The algorithm seems to tend towards a certain set of results (presumably, with different data it might just get there quicker within 10 runs, I've upped the runs to 100).
https://play.golang.org/p/IqfCjNtd8a
Are you sure this is not working as intended? Do you have test data with expected results to test it with? Given the code you posted, I'd expect print 2 to always match the last print 1, but the code was obviously incomplete.
[EDIT]
It's not clear that this is a Go code problem, as opposed to results from your algorithm/data that surprise you.
You need to:
Provide code (you've now done this, but not code that shows the problem)
Reduce the code/data to the minimum which shows the error
Produce a test which demonstrates the surprising result with minimum data
If you can't reproduce with static data, show us how you load the data into the variables, because that is probably where your problem lies. Are you sure you are loading the data you expect and not loading lots of copies of one row for example? Are you sure the algorithm isn't working as intended (if so how)? Your descriptions of results don't match what you have shown us so far.

FOUND IT! It's such a silly thing: the weights update process is cumulative,
w(i+1) = w(i) + learningRate * (expected - estimated) * data[j][i]
so I had simply forgotten the + in the weights assignment:
weights[x] += learningRate * (expectedY[j] - estimate) * data[j][x]
Here is the complete snippet working properly:
for i := 0; i < cylces; i++ {
    for j := range data {
        // Calculate estimate
        estimate = 0
        for x := range data[j] {
            estimate += data[j][x] * weights[x]
        }
        // Update weights (range passes values as a copy)
        for x := 0; x < len(weights); x++ {
            weights[x] += learningRate * (expectedY[j] - estimate) * data[j][x]
        }
    }
    // Mean squared error over the whole data set for this cycle
    errorData = 0
    for j := range data {
        estimate = 0
        for x := range data[j] {
            estimate += data[j][x] * weights[x]
        }
        errorData += (expectedY[j] - estimate) * (expectedY[j] - estimate)
    }
    errorsCyles = append(errorsCyles, errorData / float64(len(data)))
}

Related

Initiate for loop

I have the following question:
I am building a model where I first test for stationarity. Then I have an if statement, saying:
if p>0.05:
    x=y['boxcox']
else:
    x=y['Normal']
If the pvalue is bigger than 0.05, then I do the boxcox transformation, if not, then I use my original values. This works.
I then have a large code, that is working.
However, in the end, I want to transform my values back.
Again with the if loop.
But how do I get the if loop started?
I first wanted to do:
if any (x==y['BoxCox']):
    .....transform back
This works if I originally transformed my values, but not if I didn't, which makes sense, because the code does not know y['BoxCox'].
But how do I get the if loop initiated?
Thanks a lot!
If I understand your question correctly, you do not transform anything back, rather you remember the initial state. "Transforming back" sounds like a potential source of bugs. What if you alter your transformation algorithm and forget to update the transforming back part?
Here is a simplified example, to illustrate my understanding of your problem:
x = 4
if x > 2:
    x = x + 1
else:
    x = x + 100
print("Result = ", x)
print("Initial value was ???")  # you cannot tell what the initial x was
You can simply leave the initial values untouched, so that they remain accessible at any time:
x = 4
if x > 2:
    result = x + 1
else:
    result = x + 100
print("Result = ", result)
print("Initial value = ", x)
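If a back-transformation really is needed at the end (for example on forecasts rather than on the original values), one option is to remember the decision in a flag instead of comparing arrays. The following is a minimal sketch under assumptions: the helper names (boxcox, inv_boxcox, prepare, restore) and the fixed lambda of 0.5 are hypothetical, and the transform is written out by hand rather than taken from any particular library.

```python
import math


def boxcox(values, lam):
    """Box-Cox transform of positive values for a given (assumed) lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def inv_boxcox(values, lam):
    """Inverse of boxcox() for the same lambda."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(v * lam + 1) ** (1 / lam) for v in values]


def prepare(series, p_value, lam=0.5, alpha=0.05):
    """Return the working series and a flag saying whether it was transformed."""
    use_boxcox = p_value > alpha
    x = boxcox(series, lam) if use_boxcox else list(series)
    return x, use_boxcox


def restore(x, use_boxcox, lam=0.5):
    """Undo the transformation only if it was applied in prepare()."""
    return inv_boxcox(x, lam) if use_boxcox else list(x)


series = [1.0, 4.0, 9.0]
x, use_boxcox = prepare(series, p_value=0.2)   # transformed, flag is True
# ... the large block of modelling code would go here ...
print(restore(x, use_boxcox))
```

The flag is set once, where the decision is made, so the final step never has to look up a column that may not exist.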

Define a vector with random steps

I want to create an array that has incremental random steps, I've used this simple code.
t_inici=(0:10*rand:100);
The problem is that the random number stays the same between steps. Is there a simple way to draw a new random number for each step?
If you have a set number of points, say nPts, then you could do the following
nPts = 10; % Could use 'randi' here for random number of points
lims = [0, 10] % Start and end points
x = rand(1, nPts); % Create random numbers
% Sort and scale x to fit your limits and be ordered
x = diff(lims) * ( sort(x) - min(x) ) / diff(minmax(x)) + lims(1)
This approach always includes your end point, which a 0:dx:10 approach would not necessarily.
If you had some maximum number of points, say nPtsMax, then you could do the following
nPtsMax = 1000; % Max number of points
lims = [0,10]; % Start and end points
% Could do 10* or any other multiplier as in your example in front of 'rand'
x = lims(1) + [0 cumsum(rand(1, nPtsMax))];
x(x > lims(2)) = []; % remove values above maximum limit
This approach may be slower, but is still fairly quick and better represents the behaviour in your question.
My first approach to this would be to generate N-2 samples, where N is the desired amount of samples randomly, sort them, and add the extrema:
N=50;
endpoint=100;
initpoint=0;
randsamples=sort(rand(1, N-2)*(endpoint-initpoint)+initpoint);
t_inici=[initpoint randsamples endpoint];
However, I'm not sure how "uniformly random" this is, as you are "faking" the last 2 data points in order to include the extrema. This will somewhat distort pure randomness (I think). If you are not necessarily interested in including the extrema, then just remove the last line and generate N points. That will make sure that they are indeed random (or as random as MATLAB can create them).
Here is an alternative solution with "uniformly random"
[initpoint,endpoint,coef]=deal(0,100,10);
t_inici(1)=initpoint;
while(t_inici(end)<endpoint)
t_inici(end+1)=t_inici(end)+rand()*coef;
end
t_inici(end)=[];
In my view, this matches your attempt well: the steps are random, the vector starts at 0, but it does not necessarily end at 100.
From your code it seems you want a uniformly random step that varies between each two entries. This implies that the number of entries that the vector will have is unknown in advance.
A way to do that is as follows. This is similar to Hunter Jiang's answer but adds entries in batches instead of one by one, in order to reduce the number of loop iterations.
1. Guess a number of required entries, n. Any value will do, but a large value will result in fewer iterations and will probably be more efficient.
2. Initialize result to the first value.
3. Generate n entries and concatenate them to the (temporary) result.
4. See if the current entries are already too many.
5. If they are, cut as needed and output (final) result. Else go back to step 3.
Code:
lower_value = 0;
upper_value = 100;
step_scale = 10;
n = 5*(upper_value-lower_value)/step_scale*2; % STEP 1. The number 5 here is arbitrary.
% It's probably more efficient to err with too many than with too few
result = lower_value; % STEP 2
done = false;
while ~done
    result = [result result(end)+cumsum(step_scale*rand(1,n))]; % STEP 3. Include
                                                                % n new entries
    ind_final = find(result>upper_value,1); % STEP 4. Index of first entry exceeding
                                            % upper_value, if any
    if ind_final % STEP 5. If non-empty, we're done
        result = result(1:ind_final-1); % keep only the entries before that index
        done = true;
    end
end
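For comparison, the same batch idea can be sketched in Python using only the standard library. This is an illustration, not a translation of any answer above: the function name random_steps, the batch size, and the rng parameter are assumptions, and lower <= upper with step_scale > 0 is taken for granted.

```python
import random
from itertools import accumulate


def random_steps(lower, upper, step_scale, batch=64, rng=random):
    """Increasing values from `lower`, each step uniform in [0, step_scale).

    Entries are generated in batches via a cumulative sum; everything
    above `upper` is trimmed at the end.
    """
    result = [lower]
    while result[-1] <= upper:
        last = result[-1]  # capture before extending the list
        steps = [step_scale * rng.random() for _ in range(batch)]
        result.extend(last + c for c in accumulate(steps))
    while result[-1] > upper:  # result[0] == lower <= upper, so this stops
        result.pop()
    return result


t_inici = random_steps(0, 100, 10, rng=random.Random(0))
print(len(t_inici), t_inici[:3], t_inici[-1])
```

Passing a seeded random.Random instance makes the vector reproducible; leaving rng at its default uses the module-level generator.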

Store values from a time series function in an array using a for loop in R

I am working with Bank of America time series data for stock prices. I am trying to store the forecasted value for a specific step ahead (in this case 1:20 steps) in an array. I then need to subtract each value of the array from each value of the test array. Then I have to square each value of the array, sum all the squared values of the array, then divide by N (N = number of steps forecasted ahead).
I have the following so far. Also, the quantmod and fpp libraries are needed for this.
# --------- Bank of America ----------
library(quantmod)
library(fpp)
BAC = getSymbols('BAC',from='2009-01-02',to='2014-10-15',auto.assign=FALSE)
BAC.adj = BAC$BAC.Adjusted
BAC.daily=dailyReturn(BAC.adj,type='log')
test = tail(BAC.daily, n = 20)
train = head(BAC.daily, n = 1437)
Trying to write a function to forecast, extract requisite value (point forecast for time i), then store it in an array where I can perform operations on that array (i.e. - add, multiply, exponentiate, sum the values of the array)
MSE = function(N){
  for(i in 1:(N)){
    x = forecast(model1, h = i)
    y = x$mean
    w = as.matrix(as.double(as.matrix(unclass(y))))
    p = array(test[i,]-w[i,])
  }
}
and we also have:
model1 = Arima(train, order = c(0,2,0))
MSE = function(N){
  result = vector("list", length = (N))
  for(i in 1:(N)){
    x = forecast(model1, h = i)
    point_forecast = as.double(as.matrix(unclass(x$mean)))
    result[i] = point_forecast
  }
  result = as.matrix(do.call(cbind, result))
}
Neither of these functions has worked so far. When I run the MSE function, I get the following warnings:
> MSE(20)
There were 19 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
2: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
3: In result[i] = point_forecast :
number of items to replace is not a multiple of replacement length
4: In result[i] = point_forecast :
When I run the MSE2 function, I get the following output:
MSE2(20)
[1] -0.15824
When I put a print statement inside, it printed 'p' as a single number, just like above (even though it had been run for i = 20). The x, y, and w variables in the MSE2 function act as vectors as far as storing the output, so I do not understand why p does not as well.
I appreciate any help in this matter, thank you.
Sincerely,
Mitchell Healy
Your question has two MSE functions: one in the first code block and one in the second code block.
Also, library(forecast) is needed to run Arima and forecast.
My understanding of what you are trying to do in the first paragraph is to compute the 20-step ahead forecast error. That is, what is the error in forecasts from model1 20 days ahead, based on your test data. This can be done in the code below:
model1 <- Arima(train, order = c(0,2,0))
y_fcst<-forecast(model1,h=20)$mean
errors<-as.vector(y_fcst)-as.vector(test)
MSE.fcst<-mean(errors^2)
However, I'm not sure what you're trying to do here: an ARIMA(0,2,0) model is simply modelling the differences in returns as a random walk. That is, this model just differences the returns twice and assumes this twice-differenced data is white noise. There are no parameters other than $\sigma^2$ being estimated.
Rob Hyndman has a blog post covering computing errors from rolling forecasts.
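The arithmetic described in the question (subtract, square, sum, divide by N) is just the mean squared error. As a language-neutral illustration, here is a small Python sketch; the function name and the numbers are made up and are not Bank of America data.

```python
def mean_squared_error(forecast, actual):
    """Average of the squared differences between forecast and actual values."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    if not forecast:
        raise ValueError("need at least one value")
    errors = [f - a for f, a in zip(forecast, actual)]
    return sum(e * e for e in errors) / len(errors)


# Hypothetical 3-step forecast against 3 held-out observations
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 1.0, 5.0]))  # (0 + 1 + 4) / 3
```

Computing all N forecasts in one call and subtracting the vectors avoids the element-by-element loop that was overwriting p on each iteration.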
My solution to finding the MSE is below. I used log adjusted daily return data from Bank of America gathered through quantmod. Then I subsetted the data (which had length 1457) into training[1:1437] and testing[1438:1457].
The solution is:
forc = function(N){
  forecast = matrix(data = NA, nrow = (N) )
  for(i in 1:N){
    fit = Arima(BAC.adj[(1+(i-1)):(1437+(i-1))], order = c(0,0,4))
    x = forecast(fit, h = 1)
    forecast[i,] = as.numeric(x$mean)
  }
  error = test - forecast
  error_squared = error^2
  sum_error_squared = sum(error_squared)
  MSE = sum_error_squared/N
  MSE
}

Effective way of calculating Correlation/Covariance Matrix row-wise

I have around 3000 files. Each file has around 55000 rows (identifiers) and around 100 columns. I need to calculate the row-wise correlation or weighted covariance for each file (depending on the number of columns in the file). The number of rows is the same in all the files. I would like to know the most efficient way to calculate the correlation matrix for each file. I have tried Perl and C++, but processing a file takes a long time: Perl takes 6 days and C takes more than a day. Ideally, I don't want to take more than 15-20 minutes per file.
Now, I would like to know if I could process it faster using some trick or something. Here is my pseudo code:
while (using the file handler)
reading the file line by line
Storing the column values in hash1 where the key is the identifier
Storing the mean and ssxx (Sum of Squared Deviations of x to the mean) to the hash2 and hash3 respectively (I used hash of hashed in Perl) by calling the mean and ssxx function
end
close file handler
for loop traversing the hash (this is nested for loop as I need values of 2 different identifiers to calculate correlation coefficient)
calculate ssxxy by calling the ssxy function i.e. Sum of Squared Deviations of x and y to their mean
calculate correlation coefficient.
end
Now, I am calculating the correlation coefficient for a pair only once and I am not calculating the correlation coefficient for the same identifier. I have taken care of that using my nested for loop. Do you think if there is a way to calculate the correlation coefficient faster ? Any hints/advice would be great. Thanks!
EDIT1:
My Input File looks like this -- for the first 10 identifiers:
"Ident_01" 6453.07 8895.79 8145.31 6388.25 6779.12
"Ident_02" 449.803 367.757 302.633 318.037 331.55
"Ident_03" 16.4878 198.937 220.376 91.352 237.983
"Ident_04" 26.4878 398.937 130.376 92.352 177.983
"Ident_05" 36.4878 298.937 430.376 93.352 167.983
"Ident_06" 46.4878 498.937 560.376 94.352 157.983
"Ident_07" 56.4878 598.937 700.376 95.352 147.983
"Ident_08" 66.4878 698.937 990.376 96.352 137.983
"Ident_09" 76.4878 798.937 120.376 97.352 117.983
"Ident_10" 86.4878 898.937 450.376 98.352 127.983
EDIT2: here is snippet/subroutines or functions that I wrote in perl
## Pearson Correlation Coefficient
sub correlation {
    my( $arr1, $arr2) = @_;
    my $ssxy = ssxy( $arr1->{string}, $arr2->{string}, $arr1->{mean}, $arr2->{mean} );
    my $cor = $ssxy / sqrt( $arr1->{ssxx} * $arr2->{ssxx} );
    return $cor ;
}
## Mean
sub mean {
    my $arr1 = shift;
    my $mu_x = sum( @$arr1) /scalar(@$arr1);
    return($mu_x);
}
## Sum of Squared Deviations of x to the mean i.e. ssxx
sub ssxx {
    my ( $arr1, $mean_x ) = @_;
    my $ssxx = 0;
    ## looping over all the samples
    for( my $i = 0; $i < @$arr1; $i++ ){
        $ssxx = $ssxx + ( $arr1->[$i] - $mean_x )**2;
    }
    return($ssxx);
}
## Sum of Squared Deviations of xy to the mean i.e. ssxy
sub ssxy {
    my( $arr1, $arr2, $mean_x, $mean_y ) = @_;
    my $ssxy = 0;
    ## looping over all the samples
    for( my $i = 0; $i < @$arr1; $i++ ){
        $ssxy = $ssxy + ( $arr1->[$i] - $mean_x ) * ( $arr2->[$i] - $mean_y );
    }
    return ($ssxy);
}
Have you searched CPAN? There is a method, gsl_stats_correlation, for computing Pearson's correlation. It is in Math::GSL::Statistics, a module that binds to the GNU Scientific Library.
gsl_stats_correlation($data1, $stride1, $data2, $stride2, $n) - This function efficiently computes the Pearson correlation coefficient between the array references $data1 and $data2, which must both be of the same length $n.
$$r = \frac{\operatorname{cov}(x, y)}{\hat\sigma_x \hat\sigma_y} = \frac{\frac{1}{n-1} \sum (x_i - \hat{x})(y_i - \hat{y})}{\sqrt{\frac{1}{n-1} \sum (x_i - \hat{x})^2}\,\sqrt{\frac{1}{n-1} \sum (y_i - \hat{y})^2}}$$
While minor improvements might be possible, I would suggest investing in learning PDL. The documentation on matrix operations may be useful.
@Sinan and @Praveen have the right idea for how to do this within Perl. I would suggest that the overhead inherent in Perl means you will never get the efficiency that you are looking for. I would suggest that you work on optimizing your C code.
The first step would be to set the -O3 flag for maximum code optimization.
From there, I would change your ssxx code so that it subtracts the mean from each data point in place: x[i] -= mean. This means that you no longer need to subtract the mean in your ssxy code, so you do the subtraction once instead of 55001 times.
I would check the disassembly to guarantee that the (x-mean)**2 is compiled to a multiplication, instead of 2^(2 * log(x - mean)), or just write it that way instead.
What sort of data structure are you using for your data? A double** with memory allocated for each row will lead to extra calls to (the slow function) malloc. Also, it is more likely to lead to memory thrashing with the allocated memory being located in different places. Ideally, you should have as few calls to malloc for as large as possible blocks of memory, and using pointer arithmetic to traverse the data.
More optimizations should be possible. If you post your code, I can make some suggestions.
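The "subtract the mean once" suggestion can be taken one step further: if each row is centered and scaled to unit length up front, the correlation of any pair is a plain dot product. Below is a minimal pure-Python sketch of that idea; the function names and the three tiny rows are hypothetical, and a real run over 55000 rows would want a compiled or vectorized implementation of the same scheme.

```python
import math


def standardize(row):
    """Center a row on its mean and scale it to unit Euclidean length."""
    mean = sum(row) / len(row)
    centered = [v - mean for v in row]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("row has zero variance; correlation is undefined")
    return [c / norm for c in centered]


def correlation_pairs(rows):
    """Pearson correlation for every unordered pair of identifiers.

    `rows` maps identifier -> list of values (all lists the same length).
    Each row is standardized exactly once, so each pair costs one dot product.
    """
    ids = list(rows)
    z = {name: standardize(values) for name, values in rows.items()}
    result = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:  # each pair once, never a row with itself
            result[(a, b)] = sum(p * q for p, q in zip(z[a], z[b]))
    return result


rows = {
    "Ident_A": [1.0, 2.0, 3.0],
    "Ident_B": [2.0, 4.0, 6.0],
    "Ident_C": [3.0, 2.0, 1.0],
}
for pair, r in correlation_pairs(rows).items():
    print(pair, round(r, 6))
```

Written this way, the whole computation is a product of the standardized matrix with its own transpose, which is the form that optimized linear algebra libraries handle well.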

How to change the values and do the functions Newton-Raphson

I need to iterate Newton-Raphson in MATLAB. It seems easy but I cannot figure out where I am wrong. The problem is:
For mmm=1:
1) If m=1 take c1=c1b and c2=1-c1 and do the loop for u1,2(i) and p1,2(i)
2)If m=2 take c1=c1+dc and c2=1-c1, and this time do the loop with new c1 and c2 for u1,2(i) and p1,2(i)
3) If m=3 take c1=(c1*st(1)-(c1-dc)*st(2))/(st(1)-st(2)) and do the loop for new c1 and c2.
Then increase the iteration number: mmm=2 ;
mmm keeps count of the number of N-R iterations. The first iteration has mmm=1, the second mmm=2, etc. (This particular run only does 2 iterations.)
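Read on their own, the three m steps above amount to a single secant update: evaluate the residual at c1b, evaluate it again at c1b+dc, and then apply the m=3 formula. A minimal Python sketch of just that update follows; the residual function here is a made-up stand-in for the pressure difference stored in st, and the name secant_step is hypothetical.

```python
def secant_step(residual, c1_base, dc):
    """One update of c1 following the m = 1, 2, 3 steps described above."""
    st1 = residual(c1_base)   # m = 1: residual at c1 = c1b
    c1 = c1_base + dc
    st2 = residual(c1)        # m = 2: residual at c1 = c1b + dc
    # m = 3: the update formula from the question
    return (c1 * st1 - (c1 - dc) * st2) / (st1 - st2)


# Hypothetical linear residual with its root at c1 = 0.5;
# for a linear function one secant step lands on the root.
new_c1 = secant_step(lambda c: 2.0 * c - 1.0, c1_base=0.2, dc=0.1)
print(new_c1, 1.0 - new_c1)   # new c1 and the matching c2 = 1 - c1
```

Repeating this step with the new c1 as the base value gives the outer iteration that mmm counts.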
sumint holds the integrands.
I need to plot these quantities, but MATLAB gives the errors below. Please help me.
Relevant part of the code:
ii=101;
u = cell(2, 1);
ini_cond = [0,0];
for i = 1:2;
u{i} = zeros(1,ii);
u{i}(:, ii) = ini_cond(i) * rand(1, 1);
end
for i=1:ii;
fikness=fik*sin(pi.*x);
u{1}(i)=(c1-H1D*(x-0.5)+AD/2.*(x-0.5).^2)./(H1-0.5*fikness-A*(x-0.5));
u{2}(i)=(c2+H1D*(x-0.5)-AD/2.*(x-0.5).^2)./(1.-H1+0.5*fikness+A*(x-0.5));
end
p = cell(2, 1);
q = cell(2, 1);
for i = 1:2;
p{i} = zeros(1,ii);
q{i} = zeros(1,ii);
end
p{1}(1)=0.5*(1.-u{1}(1).^2);
q{1}(1)=0;
p{2}(1)=0.5*(1.-u{2}(1).^2);
q{2}(1)=0;
for i=2:101
q{1}(i)=q{1}(i-1)-dx*(u{1}(i-1)-ub{1}(i-1))./dt;
p{1}(i)=0.5*(1.-u{1}(i).^2)+q{1}(i);
q{2}(i)=q{2}(i-1)-dx*(u{2}(i-1)-ub{2}(i-1))./dt;
p{2}(i)=0.5*(1.-u{2}(i).^2)+q{2}(i);
end
st = zeros(2, length(t));
st(1,:)=p{1}(100)-p{2}(100);
m=m+1;
if m==3;
c1=(c1*st(1)-(c1-dc)*st(2))/(st(1)-st(2));
c2=1-c1;
end
for i = 1:2;
sumint{i} = zeros(1,length(t));
end
sumint = cell(2, 1);
sumint{1}(1)=0.5*(p{2}(1)-p{1}(1));
sumint{2}(1)=0.5*(p{2}(1)-p{1}(1)).*(-1/2);
for i=2:ii-1;
x=(i-1)*dx;
sumint{1}(i)=sumint{1}(i-1)+(p{2}(i)-p{1}(i));
sumint{2}(i)=sumint{2}(i-1)+(p{2}(i)-p{1}(i))*(x-1/2);
end
H1DDOT=-sumint{1}.*dx./rmass;
H1D=H1D+dt*H1DDOT;
H1=H1+dt*H1D;
ADDOT=sumint{2}*dx./rmomi;
AD=AD+dt*ADDOT;
A=A+dt*AD;
H1L=H1+A.*0.5;
H1R=H1-A.*0.5;
H2=1.-H1;
rat1=AD./ADinit;
rat2=ADDOT./AD;
u are the velocities, p are the pressures, and c1, c2 are the camber effects. H1DDOT and ADDOT are the second derivatives of H1 and A. sum1 and sum2 are the integrands used to define the values of H1DDOT and ADDOT. H1DDOT and ADDOT are functions of time.
As you can see from the message, the error is with this line:
sumint{2}(i)=sumint{2}(i-1)+(p{2}(i)-p{1}(i)).*(x-1/2);
Now, let's find out why:
sumint{2}(i) = ...
This part means you want to insert whatever is on the right side to the ith position of the array in cell sumint{2}. That means, the right side must be a scalar.
Is it?
Well, sumint{2}(i-1)+(p{2}(i)-p{1}(i)) is certainly a scalar, since you use a single value as index to all the vectors/arrays. The problem is the multiplication .*(x-1/2);.
From the code above, it is clear that x is a vector / array, (since you use length(x) etc). Multiplying the scalar sumint{2}(i-1)+(p{2}(i)-p{1}(i)), by the vector x, will give you back a vector, which as mentioned, will not work.
Maybe you want the ith value of x?
There are several other strange things going on in your code, for instance:
for i=1:101;
fikness=fik*sin(pi.*x);
u{1}=(c1-H1D*(x-0.5)+AD/2.*(x-0.5).^2)./(H1-0.5*fikness-A*(x-0.5));
u{2}=(c2+H1D*(x-0.5)-AD/2.*(x-0.5).^2)./(1.-H1+0.5*fikness+A*(x-0.5));
end
Why do you have a loop here? You are not looping over anything; you are doing the same calculation 101 times. Again, I guess it should be x(i).
Update:
Your new error is introduced because you changed x into a scalar. Then u = zeros(1,length(x)) will only be a scalar, meaning u{1}(i) will fail for i ~= 1.

Resources