Neural network for linear regression in C

I found this great source that matched the exact model I needed: http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/
The important bits go like this.
You have a plot x->y. Each x-value is the sum of "features", which I'll denote z.
So the regression line for the x->y plot would be h(SUM(z_i)), where h(x) is the regression line (function).
In this NN the idea is that each z-value gets assigned a weight in a way that minimizes the least-squared error.
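Written out, the idea is roughly x = SUM_i( w_i * z_i ), and the quantity to minimize is (1/2) * (h(x) - y)^2 summed over the data, with h the fixed regression line and the weights w_i the values being adjusted.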
The gradient is used to update the weights to minimize the error. I believe I may be back-propagating incorrectly, at the point where I update the weights.
So I wrote some code, but my weights aren't being correctly updated.
I may have simply misunderstood a spec from that Stanford post, so that's where I need your help. Can anyone verify I have correctly implemented this NN?
My h(x) function was a simple linear regression on the initial data. In other words, the idea is that the NN will adjust weights so that all data points shift closer to this linear regression.
for (epoch = 0; epoch < 10000; epoch++){
    //loop over number of games
    for (game = 1; game < 39; game++){
        sum = 0;
        int temp1 = 0;
        int temp2 = 0;
        //loop over number of inputs
        for (i = 0; i < 10; i++){
            //compute sum = x
            temp1 += inputs[game][i] * weights[i];
        }
        for (i = 10; i < 20; i++){
            temp2 += inputs[game][i] * weights[i];
        }
        sum = temp1 - temp2;
        //compute error
        error += .5 * (5.1136 * (sum) + 1.7238 - targets[game]) * (5.1136 * (sum) + 1.7238 - targets[game]);
        printf("error = %G\n", error);
        //backpropagate
        for (i = 0; i < 20; i++){
            weights[i] = sum * (5.1136 * (sum) + 1.7238 - targets[game]); //POSSIBLE ERROR HERE
        }
    }
    printf("Epoch = %d\n", epoch);
    printf("Error = %G\n", error);
}

Please check out Andrew Ng's Coursera course. He is a professor of Machine Learning at Stanford and can explain the concept of linear regression better than pretty much anyone else. You can learn the essentials of linear regression in the first lesson.
For linear regression, you are trying to minimize the cost function, which in this case is the sum of squared errors, (predicted value - actual value)^2, and the minimization is achieved by gradient descent. Solving a problem like this does not require a neural network, and using one would be rather inefficient.
For this problem, only two values are needed. If you think back to the equation for a line, y = mx + b, there are really only two aspects of a line that you need: The slope and the y-intercept. In linear regression you are looking for the slope and y-intercept that best fits the data.
In this problem, the two values can be represented by theta0 and theta1. theta0 is the y-intercept and theta1 is the slope.
This is the update function for linear regression (with x_0 = 1 so that theta0 gets updated too):

    theta_j := theta_j - alpha * (1/m) * SUM_{i=1..m} (h(x^(i)) - y^(i)) * x_j^(i)

Here, theta is a 2 x 1 dimensional vector with theta0 and theta1 inside of it. What you are doing is taking theta and subtracting the mean of the sum of errors multiplied by a learning rate alpha (usually small, like 0.1).
Let's say the real perfect fit for the line is y = 2x + 3, but our current slope and y-intercept are both 0. Then the sum of errors will be negative, and when that negative quantity (times alpha) is subtracted from theta, theta will increase, moving your prediction closer to the correct value. And vice versa for positive errors. This is a basic example of gradient descent, where you are descending down a slope to minimize the cost (or error) of the model.
This is the type of model you should be trying to implement instead of a neural network, which is more complex. Try to gain an understanding of linear and logistic regression with gradient descent before moving on to neural networks.
Implementing a linear regression algorithm in C can be rather challenging, especially without vectorization. If you are looking to learn about how a linear regression algorithm works and aren't specifically looking to use C to make it, I recommend using something like MatLab or Octave (a free alternative) to implement it instead. After all, the examples from the post you found use the same format.
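That said, if you do want to stay in C, a minimal batch gradient descent sketch for fitting y = theta0 + theta1*x could look like the following. The data, learning rate, and iteration count are illustrative values, not taken from the question:

#include <stdio.h>

int main(void)
{
    double x[] = {1, 2, 3, 4, 5};
    double y[] = {5, 7, 9, 11, 13};   /* generated from y = 2x + 3 */
    int m = 5;                        /* number of training examples */
    double theta0 = 0.0, theta1 = 0.0;
    double alpha = 0.01;              /* learning rate */

    for (int iter = 0; iter < 10000; iter++) {
        double grad0 = 0.0, grad1 = 0.0;
        for (int i = 0; i < m; i++) {
            double err = theta0 + theta1 * x[i] - y[i];   /* prediction - actual */
            grad0 += err;
            grad1 += err * x[i];
        }
        /* simultaneous update with the mean gradient */
        theta0 -= alpha * grad0 / m;
        theta1 -= alpha * grad1 / m;
    }
    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}

For this sample data the printed values should end up close to theta0 = 3 and theta1 = 2.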

Related

Integrating cos(x)/sqrt(x) between 0 and infinity to a user defined precision

So I'm trying to do what I've said above. The user will enter a precision, such as 3 decimal places, and then using the trapezium rule, the program will keep adding strips on until the 3rd decimal place is no longer changing, and then stop and print the answer.
I'm not sure of the best way to approach this. Because the function is sinusoidal, the integral over any one period of 2*PI will be almost 0, so I feel like working period by period would be the best way of approaching the problem, but I have no idea how to go about it. At the moment I'm checking the y value at each x value to see when it becomes less than the required precision, however it never really gets low enough. At x = 10 million, for example, y = -0.0002, which is still relatively large for such a large x value.
for (int i = 1; i < 1000000000; i++)
{
    sumFirstAndLast += func(z);
    z += stripSize;
    count++;
    printf("%lf\n", func(z));
    if (fabs(func(z)) < lowestAddition / stripSize){
        break;
    }
}
So this above is what I'm trying to do currently, where func is the function. The stripSize is set to 0.01, just something relatively small to make the areas of the trapeziums more accurate. sumFirstAndLast is the sum of the first and last values, with the endpoints set at 0.001 and 1000000, just a small value and a large value.
As I mentioned, I "think" the best way to do this would be to check the value of the integral over every 2*PI, but once again I'm not sure how to go about this. My current method gives me the correct answer if I take the precision part out, but as soon as I try to put a precision in, it gives a completely wrong answer.
For a non-periodic function that converges to zero you can (sort of) check the function's value against a minimum error value, but this doesn't work for a periodic function, as you get an early exit before the integrand sum converges (as you've found out). For a non-periodic function you could also simply check the change in the integrand sum on each iteration against a minimum error, but that won't work here either.
Instead, you'll have to do like a few comments suggest to check for convergence relative to the period of the function, PI in this case (I found it works better than using 2*PI). To implement this do something like the following code (note I changed your sum to be the actual area instead of doing it at the end):
sumFirstAndLast = (0.5*func(a) + 0.5*func(b)) * stripSize;
double z = a + stripSize;
double CHECK_RANGE = 3.14159265359;
double NextCheck = CHECK_RANGE;
double LastCheckSum = 0;
double MinError = 0.0001;
for (int i = 1; i < 1000000000; i++)
{
    sumFirstAndLast += func(z) * stripSize;
    if (z >= NextCheck)
    {
        if (fabs(LastCheckSum - sumFirstAndLast) < MinError) break;
        NextCheck += CHECK_RANGE;
        LastCheckSum = sumFirstAndLast;
    }
    z += stripSize;
    count++;
}
This seems to work and give the result to the specified accuracy according to the value of MinError. There are probably other (better) ways to check for convergence when numerically integrating a periodic function. A quick Google search reveals this paper for example.
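For completeness, here is a self-contained sketch of the same idea, using the question's integrand and strip size and starting just above zero as the question does. Because the small interval next to zero is skipped, the result will sit a little below the analytic value sqrt(pi/2):

#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979323846

/* integrand from the question: cos(x)/sqrt(x) */
static double func(double x) { return cos(x) / sqrt(x); }

int main(void)
{
    const double a = 0.001;          /* lower limit, just above the singularity at 0 */
    const double stripSize = 0.01;   /* strip width, as in the question */
    const double MinError = 0.0001;  /* stop when the change per period is below this */
    const double CHECK_RANGE = PI;   /* check convergence once per period */

    double sum = 0.5 * func(a) * stripSize;   /* half weight for the first ordinate */
    double z = a + stripSize;
    double NextCheck = CHECK_RANGE;
    double LastCheckSum = 0.0;

    for (long i = 1; i < 1000000000L; i++) {
        sum += func(z) * stripSize;
        if (z >= NextCheck) {
            if (fabs(LastCheckSum - sum) < MinError)
                break;
            NextCheck += CHECK_RANGE;
            LastCheckSum = sum;
        }
        z += stripSize;
    }
    printf("integral ~ %f, sqrt(pi/2) ~ %f\n", sum, sqrt(PI / 2.0));
    return 0;
}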
The integral from 0 to infinity of cos(x)/sqrt(x), or of sin(x)/sqrt(x), is well known to be sqrt(pi/2). So evaluating pi to any number of digits is the easier problem; Newton did it by integrating a quarter circle to get the area pi/4. The integrals themselves are evaluated by the methods of complex analysis. They are done in many textbooks on the subject, and were on one of my final exams in graduate school.

What sort of indexing method can I use to store the distances between X^2 vectors in an array without redundancy?

I'm working on a demo that requires a lot of vector math, and in profiling, I've found that it spends the most time finding the distances between given vectors.
Right now, it loops through an array of X^2 vectors and finds the distance between each pair of them, meaning it runs the distance function X^4 times, even though (I think) there are only (X^2)/2 unique distances.
It works something like this: (pseudo c)
#define MATRIX_WIDTH 8
typedef float vec2_t[2];
vec2_t matrix[MATRIX_WIDTH * MATRIX_WIDTH];
...
for(int i = 0; i < MATRIX_WIDTH; i++)
{
    for(int j = 0; j < MATRIX_WIDTH; j++)
    {
        float xd, yd;
        float distance;
        for(int k = 0; k < MATRIX_WIDTH; k++)
        {
            for(int l = 0; l < MATRIX_WIDTH; l++)
            {
                int index_a = (i * MATRIX_WIDTH) + j;
                int index_b = (k * MATRIX_WIDTH) + l;
                xd = matrix[index_a][0] - matrix[index_b][0];
                yd = matrix[index_a][1] - matrix[index_b][1];
                distance = sqrtf(powf(xd, 2) + powf(yd, 2));
            }
        }
        // More code that uses the distances between each vector
    }
}
What I'd like to do is create and populate an array of (X^2) / 2 distances without redundancy, then reference that array when I finally need it. However, I'm drawing a blank on how to index this array in a way that would work. A hash table would do it, but I think it's much too complicated and slow for a problem that seems like it could be solved by a clever indexing method.
EDIT: This is for a flocking simulation.
performance ideas:
a) if possible work with the squared distance, to avoid root calculation
b) never use pow for constant, integer powers - instead use xd*xd
I would consider changing your algorithm - O(n^4) is really bad. When dealing with interactions in physics (also O(n^4) for distances in 2d field) one would implement b-trees etc and neglect particle interactions with a low impact. But it will depend on what "more code that uses the distance..." really does.
I just did some considerations: the number of unique distances is 0.5*n*(n+1), with n = w*h.
If you write down when unique distances occur, you will see that both inner loops can be reduced by starting at i and j.
Additionally, if you only need to access those distances via the matrix index, you can set up a 4D distance matrix.
If memory is limited we can save nearly 50%, as mentioned above, with a lookup function that accesses a triangular matrix, as Code-Guru said. We would probably precalculate the line indices to avoid summing up on access:
float distanceArray[(H*W+1)*H*W/2];
int lineIndices[H*W];

float searchDistance(int i, int j)
{
    return i < j ? distanceArray[i + lineIndices[j]] : distanceArray[j + lineIndices[i]];
}
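To make the indexing concrete, here is a small sketch of how the triangular array and the line indices could be precomputed. The precomputation loop is my reading of what was intended, not code from the answer:

#include <math.h>

#define W 8
#define H 8
#define N (W * H)                        /* number of vectors in the grid */

typedef float vec2_t[2];

vec2_t matrix[N];
float distanceArray[(N + 1) * N / 2];    /* lower triangle, diagonal included */
int lineIndices[N];                      /* start offset of row j within distanceArray */

void precomputeDistances(void)
{
    /* row j of the triangle starts after 0 + 1 + ... + j entries */
    for (int j = 0; j < N; j++)
        lineIndices[j] = j * (j + 1) / 2;

    /* fill only the unique pairs i <= j */
    for (int j = 0; j < N; j++) {
        for (int i = 0; i <= j; i++) {
            float xd = matrix[i][0] - matrix[j][0];
            float yd = matrix[i][1] - matrix[j][1];
            distanceArray[lineIndices[j] + i] = sqrtf(xd * xd + yd * yd);
        }
    }
}

With lineIndices[j] = j*(j+1)/2, the pair (i, j) with i <= j lands at lineIndices[j] + i, which is exactly what searchDistance above looks up.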

Weight Initialisation

I plan to use the Nguyen-Widrow Algorithm for an NN with multiple hidden layers. While researching, I found a lot of ambiguities and I wish to clarify them.
The following is pseudo code for the Nguyen-Widrow Algorithm
Initialize all weights of the hidden layers with random values
For each hidden layer{
    beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / numberOfInputs);
    For each synapse{
        For each weight{
            Adjust the weight by dividing it by the norm of the weights for the neuron and multiplying by the beta value
        }
    }
}
Just wanted to clarify whether the value of hiddenNeurons is the size of the particular hidden layer, or the size of all the hidden layers within the network. I got mixed up by viewing various sources.
In other words, if I have a network (3-2-2-2-3) (index 0 is input layer, index 4 is output layer), would the value hiddenNeurons be:
NumberOfNeuronsInLayer(1) + NumberOfNeuronsInLayer(2) + NumberOfNeuronsInLayer(3)
Or just
NumberOfNeuronsInLayer(i) , where i is the current Layer I am at
EDIT:
So, the hiddenNeurons value would be the size of the current hidden layer, and the input value would be the size of the previous hidden layer?
The Nguyen-Widrow initialization algorithm is the following:
1. Initialize all weights of the hidden layers with (ranged) random values.
2. For each hidden layer:
   2.1 Calculate the beta value: 0.7 times the Nth root of the number of neurons of the current layer, where N is the number of neurons of the input layer.
   2.2 For each synapse:
       2.2.1 For each weight:
             Adjust the weight by dividing it by the norm of the weights for the neuron and multiplying by the beta value.
(Source: Encog Java Framework)
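As a concrete reading of step 2.1 (taking the input count to be the number of neurons feeding the layer, which is one interpretation; sources differ, as the question notes): for the 3-2-2-2-3 network, the first hidden layer has 2 neurons fed by 3 inputs, so beta = 0.7 * 2^(1/3), roughly 0.88, and the randomly initialized weights of that layer are then rescaled so that each neuron's weight vector has norm beta.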
Sounds to me like you want more precise code. Here are some actual code lines from a project I'm participating in. Hope you can read C. It's a bit abstracted and simplified. There is a struct nn that holds the neural net data. You probably have your own abstract data type.
Code lines from my project (somewhat simplified):
float *w = nn->the_weight_array;
float factor = 0.7f * powf( (float) nn->n_hidden, 1.0f / nn->n_input);
for( w in all weights )
    *w++ = random_range( -factor, factor );

/* Nguyen/Widrow */
w = nn->the_weight_array;
for( i = nn->n_input; i; i-- ){
    _scale_nguyen_widrow( factor, w, nn->n_hidden );
    w += nn->n_hidden;
}
Functions called:
static void _scale_nguyen_widrow( float factor, float *vec, unsigned int size )
{
    unsigned int i;
    float magnitude = 0.0f;

    for ( i = 0; i < size; i++ )
        magnitude += vec[i] * vec[i];
    magnitude = sqrtf( magnitude );

    for ( i = 0; i < size; i++ )
        vec[i] *= factor / magnitude;
}

static inline float random_range( float min, float max )
{
    float range = fabs(max - min);
    return ((float)rand()/(float)RAND_MAX) * range + min;
}
Tip:
After you've implemented the Nguyen/Widrow weight initialization, you can actually add a little code line in the forward calculation that dumps each activation to a file. Then you can check how well the set of neurons hits the activation function. Find the mean and standard deviation. You can even plot it with a plotting tool, e.g. gnuplot. (You need a plotting tool like gnuplot anyway for plotting error rates etc.) I did that for my implementation. The plots came out nice, and the initial learning became much faster using Nguyen/Widrow for my project.
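As an illustration, the dump can be as small as a single fprintf in the forward pass; act_file and activation below are placeholder names, not identifiers from the project above:

/* right after a neuron's activation is computed in the forward pass */
fprintf(act_file, "%f\n", activation);   /* act_file: a FILE* opened once before training */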
PS: I'm not sure my implementation is correct according to Nguyen and Widrow's intentions. I don't even think I care, as long as it does improve the initial learning.
Good luck,
-Øystein

Manhattan distance is over estimating and making me crazy

I'm implementing the A* algorithm with the Manhattan distance heuristic to solve the 8-puzzle (in C). It seems to work very well and passes a lot of unit tests, but it fails to find the shortest path in one case (it finds 27 steps instead of 25).
When I change the heuristic function to the Hamming distance, it finds the solution in 25 steps.
It also finds it in 25 steps when I make the Manhattan distance function return half of the actual cost.
That's why I believe the problem lies somewhere in the Manhattan distance function and that it is overestimating the cost (hence inadmissible). I thought maybe something else was going wrong in the C program, so I wrote a little Python script to test and verify the output of the Manhattan distance function alone, and both produce the exact same result.
I'm really confused because the heuristic function seems to be the only point of failure and it seems to be correct at the same time.
You can try this solver and put the tile order like "2,6,1,0,7,8,3,5,4"
Choose the algorithm Manhattan distance and it finds in 25 steps.
Now change it to Manhattan distance + linear conflict and it finds 27 steps.
But my Manhattan distance (without linear conflict) finds in 27 steps.
Here's my general algorithm:
manhattan_distance = 0
iterate over all tiles:
    if the tile is not the blank tile:
        find the coordinates of this tile on the goal board
        manhattan_distance += abs(x - goal_x) + abs(y - goal_y)
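For instance, if tile 5 currently sits at row 0, column 2 but belongs at row 1, column 1 in the goal state, it contributes |0 - 1| + |2 - 1| = 2 to the total.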
I think if there was something very badly wrong with some important part it wouldn't pass all 25+ previous tests so this might be some sort of edge case.
Here's commented Manhattan distance function in C:
int ManhattanDistance(Puzzle p, State b){
    State goal = getFinalState(p);
    int size = getSize(b);
    int distance = 0;
    if (getSize(goal) == size){ // both states are the same size
        int i, j;
        for(i=0; i<size; i++){
            for(j=0; j<size; j++){ // iterate over all tiles
                int a = getStateValue(b, i, j); // what is the number on this tile?
                if (a != 'B'){ // if it's not the blank tile
                    int final_cordinates[2];
                    getTileCoords(goal, a, final_cordinates); // find the coordinates on the other board
                    int final_i = final_cordinates[0];
                    int final_j = final_cordinates[1];
                    distance += abs(i - final_i) + abs(j - final_j);
                }
            }
        }
    }
    return distance;
}
Please help me.
EDIT: As discussed in comments, the code provided for opening nodes can be found here
The problem seems to be not in your heuristic function, but in the algorithm itself. From your description of the problem, and the fact that it occurs only in some specific cases, I believe it has to do with re-opening a closed vertex once you find a better path to it.
While reading the code you provided [in comments], I think I understood where the problem lies, in line 20:
if(getG(current) + 1 < getG(children[i])){
This is wrong! You are checking whether g(current) + 1 < g(children[i]), but you actually want to check whether f(current) + 1 + h(children[i]) < g(children[i]), since you want to compare against the heuristic value of children[i], not of current!
Note that this is identical to setting f(children[i]) = min{f(children[i]), f(current)+1}, and then adding h(children[i]) to get the g value.
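For reference, in the conventional notation where g is the accumulated path cost and f = g + h, the relaxation step when expanding current looks roughly like the sketch below. This is a generic illustration with made-up field names, not the poster's code:

typedef struct Node {
    int g;               /* cost of the best known path to this node */
    int f;               /* g plus the heuristic estimate to the goal */
    struct Node *parent;
} Node;

void relax(Node *current, Node *child, int step_cost, int h_child)
{
    int tentative_g = current->g + step_cost;
    if (tentative_g < child->g) {          /* found a cheaper path to child */
        child->g = tentative_g;
        child->f = tentative_g + h_child;  /* re-evaluate f with child's own heuristic */
        child->parent = current;
        /* child must then be (re)inserted into the open list,
           even if it was already expanded */
    }
}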

Eigenvector computation using OpenCV

I have this matrix A, representing similarities of pixel intensities of an image. For example: Consider a 10 x 10 image. Matrix A in this case would be of dimension 100 x 100, and element A(i,j) would have a value in the range 0 to 1, representing the similarity of pixel i to j in terms of intensity.
I am using OpenCV for image processing and the development environment is C on Linux.
Objective is to compute the Eigenvectors of matrix A and I have used the following approach:
static CvMat mat, *eigenVec, *eigenVal;
static double A[100][100]={}, Ain1D[10000]={};
int cnt=0;

//Converting matrix A into a one dimensional array
//Reason: That is how cvMat requires it
for(i = 0; i < affnDim; i++){
    for(j = 0; j < affnDim; j++){
        Ain1D[cnt++] = A[i][j];
    }
}

mat = cvMat(100, 100, CV_32FC1, Ain1D);

cvEigenVV(&mat, eigenVec, eigenVal, 1e-300);

for(i=0; i < 100; i++){
    val1 = cvmGet(eigenVal, i, 0); //Fetching Eigen Value
    for(j=0; j < 100; j++){
        matX[i][j] = cvmGet(eigenVec, i, j); //Fetching each component of Eigenvector i
    }
}
Problem: After execution I get nearly all components of all the Eigenvectors to be zero. I tried different images and also tried populating A with random values between 0 and 1, but the same result.
Few of the top eigenvalues returned look like the following:
9805401476911479666115491135488.000000
-9805401476911479666115491135488.000000
-89222871725331592641813413888.000000
89222862280598626902522986496.000000
5255391142666987110400.000000
I am now thinking along the lines of using cvSVD(), which performs singular value decomposition of a real floating-point matrix and might yield the eigenvectors. But before that I thought of asking here. Is there anything absurd in my current approach? Am I using the right API, i.e. cvEigenVV(), for the right input matrix (my matrix A is a floating-point matrix)?
cheers
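One detail that may be worth double-checking in the snippet above (a hedged observation, not a verified fix): eigenVec and eigenVal are declared as pointers but never allocated before cvEigenVV is called, and Ain1D holds doubles while the matrix header is created as CV_32FC1, which expects float data. A minimal allocation sketch, assuming the old OpenCV C API and keeping the question's 4-argument cvEigenVV call:

#include <opencv/cv.h>   /* old C API header; the exact path may differ by version */

#define DIM 100

static float Ain1D[DIM * DIM];   /* float data, so it actually matches CV_32FC1 */

void eigen_sketch(void)
{
    /* ... fill Ain1D, e.g. Ain1D[cnt++] = (float) A[i][j]; ... */

    CvMat mat = cvMat(DIM, DIM, CV_32FC1, Ain1D);

    /* allocate the outputs before handing them to cvEigenVV */
    CvMat *eigenVec = cvCreateMat(DIM, DIM, CV_32FC1);
    CvMat *eigenVal = cvCreateMat(DIM, 1, CV_32FC1);

    /* note: the input matrix is modified by the computation */
    cvEigenVV(&mat, eigenVec, eigenVal, 1e-15);

    /* ... read results with cvmGet(eigenVal, i, 0) and cvmGet(eigenVec, i, j) ... */

    cvReleaseMat(&eigenVec);
    cvReleaseMat(&eigenVal);
}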
Note to readers: This post at first may seem unrelated to the topic, but please refer to the discussion in the comments above.
The following is my attempt at implementing the Spectral Clustering algorithm applied to image pixels in MATLAB. I followed exactly the paper mentioned by @Andriyev:
Andrew Ng, Michael Jordan, and Yair Weiss (2002).
On spectral clustering: analysis and an algorithm.
In T. Dietterich, S. Becker, and Z. Ghahramani (Eds.),
Advances in Neural Information Processing Systems 14. MIT Press
The code:
%# parameters to tune
SIGMA = 2e-3; %# controls Gaussian kernel width
NUM_CLUSTERS = 4; %# specify number of clusters
%% Loading and preparing a sample image
%# read RGB image, and make it smaller for fast processing
I0 = im2double(imread('house.png'));
I0 = imresize(I0, 0.1);
[r,c,~] = size(I0);
%# reshape into one row per-pixel: r*c-by-3
%# (with pixels traversed in column-wise order)
I = reshape(I0, [r*c 3]);
%% 1) Compute affinity matrix
%# for each pair of pixels, apply a Gaussian kernel
%# to obtain a measure of similarity
A = exp(-SIGMA * squareform(pdist(I,'euclidean')).^2);
%# and we plot the matrix obtained
imagesc(A)
axis xy; colorbar; colormap(hot)
%% 2) Compute the Laplacian matrix L
D = diag( 1 ./ sqrt(sum(A,2)) );
L = D*A*D;
%% 3) perform an eigen decomposition of the Laplacian matrix L
[V,d] = eig(L);
%# Sort the eigenvalues and the eigenvectors in descending order.
[d,order] = sort(real(diag(d)), 'descend');
V = V(:,order);
%# keep only the largest k eigenvectors
%# In this case 4 vectors are enough to explain 99.999% of the variance
NUM_VECTORS = sum(cumsum(d)./sum(d) < 0.99999) + 1;
V = V(:, 1:NUM_VECTORS);
%% 4) renormalize rows of V to unit length
VV = bsxfun(@rdivide, V, sqrt(sum(V.^2,2)));
%% 5) cluster rows of VV using K-Means
opts = statset('MaxIter',100, 'Display','iter');
[clustIDX,clusters] = kmeans(VV, NUM_CLUSTERS, 'options',opts, ...
'distance','sqEuclidean', 'EmptyAction','singleton');
%% 6) assign pixels to cluster and show the results
%# assign for each pixel the color of the cluster it belongs to
clr = lines(NUM_CLUSTERS);
J = reshape(clr(clustIDX,:), [r c 3]);
%# show results
figure('Name',sprintf('Clustering into K=%d clusters',NUM_CLUSTERS))
subplot(121), imshow(I0), title('original image')
subplot(122), imshow(J), title({'clustered pixels' '(color-coded classes)'})
... and using a simple house image I drew in Paint, the results were:
and by the way, the first 4 eigenvalues used were:
1.0000
0.0014
0.0004
0.0002
and the corresponding eigenvectors [columns of length r*c=400]:
-0.0500 0.0572 -0.0112 -0.0200
-0.0500 0.0553 0.0275 0.0135
-0.0500 0.0560 0.0130 0.0009
-0.0500 0.0572 -0.0122 -0.0209
-0.0500 0.0570 -0.0101 -0.0191
-0.0500 0.0562 -0.0094 -0.0184
......
Note that there are steps performed above which you didn't mention in your question (the Laplacian matrix, and normalizing its rows).
I would recommend this article. The author implements Eigenfaces for face recognition. On page 4 you can see that he uses cvCalcEigenObjects to generate the eigenvectors from an image. The article shows the whole preprocessing pipeline necessary for these computations.
Here's a not very helpful answer:
What does theory (or maths scribbled on a piece of paper) tell you the eigenvectors ought to be? Approximately.
What does another library tell you the eigenvectors ought to be? Ideally, what does a system such as Mathematica or Maple (which can be persuaded to compute to arbitrary precision) tell you the eigenvectors ought to be? If not for a production-sized problem, at least for a test-sized problem.
I'm not an expert with image processing so I can't be much more helpful, but I spend a lot of time with scientists and experience has taught me that a lot of tears and anger can be avoided by doing some maths first and forming an expectation of what results you ought to get before wondering why you got 0s all over the place. Sure it might be an error in the implementation of an algorithm, it might be loss of precision or some other numerical problem. But you don't know and shouldn't follow up those lines of inquiry yet.
Regards
Mark
