Learning algorithm for sigmoid Perceptron failing - Handwriting recognition - artificial-intelligence

I'm trying to do handwriting recognition with a single layer of perceptrons, using the MNIST database in Java. Each perceptron is assigned to recognise one digit (so 10 perceptrons). Each is trained on all 50,000 samples, presented in random order, for a number of epochs. When testing, the perceptron with the highest output is selected (i.e. the one that is most confident it is correct).
Using this method I consistently get 93.5% accuracy. However, I think that to improve I'll need to add hidden layers and implement backpropagation. The sigmoid (squashing) function works wonderfully on the forward pass of my single-layer network. However, when I change my backward-pass (learning) function to match backpropagation, accuracy drops to ~70%. Can someone check over my algorithms to make sure this is correct?
I got the algorithm from a few places, and I think it is the same. For example this one: http://www.webpages.ttu.edu/dleverin/neural_network/neural_networks.html
Note that I am treating the last weight as a bias. The 'error' fed into the learn function is just the desired result minus the actual result. example is the same array as input: the greyscale value of each pixel in the digit.
Forward pass:
public final double apply( double[] input ) {
    double ret = 0;
    for (int i = 0; i < wei.length - 1; i++) {
        ret = ret + input[i] * wei[i];
    }
    ret += wei[wei.length - 1];
    //apply squashing function
    ret = 1 / (1 + Math.exp(-ret));
    return ret;
}
Learning function (backward pass):
public final void learn( double error, double result, double[] example, double alpha ) {
    for (int i = 0; i < wei.length - 1; i++) {
        //wei[i] = wei[i] + alpha*error*example[i]; //This was the original learning function - it gives 93.5% accuracy
        wei[i] = wei[i] + alpha * error * example[i] * result * (1 - result); //This line gives ~70% accuracy
    }
    //wei[wei.length-1] += alpha*error; //this line works for the bias
    wei[wei.length - 1] = wei[wei.length - 1] + alpha * error * 1 * result * (1 - result); //again, this makes results inaccurate
}
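For comparison, here is the same pair of functions as a minimal C sketch; the 784-pixel input layout, the global weight array, and the remark about the effective learning rate are assumptions, not from the post:

#include <math.h>

#define N_IN 784                 /* MNIST pixels; the last weight is the bias */
double wei[N_IN + 1];

/* Forward pass: weighted sum plus bias, squashed by the sigmoid. */
double apply(const double *input) {
    double z = wei[N_IN];                        /* bias term */
    for (int i = 0; i < N_IN; i++)
        z += input[i] * wei[i];
    return 1.0 / (1.0 + exp(-z));
}

/* Delta rule for a sigmoid unit with squared error:
   dE/dw_i = -(target - out) * out * (1 - out) * x_i.
   The factor out*(1-out) is at most 0.25, so the same alpha takes far
   smaller steps than the plain perceptron rule; the ~70% result may
   partly be an effectively tiny learning rate (an assumption). */
void learn(double error, double out, const double *input, double alpha) {
    double delta = error * out * (1.0 - out);
    for (int i = 0; i < N_IN; i++)
        wei[i] += alpha * delta * input[i];
    wei[N_IN] += alpha * delta;                  /* bias input is 1 */
}

One design note: with a cross-entropy loss instead of squared error, the out*(1-out) factor cancels out of the gradient and the update reduces to the original alpha*error*example[i] form, which may be why the simpler rule trains better here.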

I found the best explanation of the algorithm and its implementation here this morning: http://visualstudiomagazine.com/articles/2013/03/01/pattern-recognition-with-perceptrons.aspx It walks through perceptron source code for character recognition in C#, step by step.

Related

Low Pass Filter in OpenCL

I am trying to implement a low pass filter in OpenCL and the theory behind all this has me confused a bit. I have attached my code at the bottom after my explanation of the scenario.
First off, let me try to explain the whole scenario in point form.
For the input, we have a cosine signal with a sample size, a signal frequency (the sampling frequency is obtained by multiplying the sample size by the signal frequency), and a step size.
The value of the function at each step is stored in an array, with the frequency and the step folded into the argument of the cosine.
This array is then passed into the kernel, which then will execute the low pass filter function.
Kernel returns an output array with the new filtered values.
The cos function always returns a value in (-1, 1); the only thing that modifies this value is the frequency. So it may repeat faster or slower depending on the frequency, BUT it is always between -1 and 1.
This is where I am confused: I am not sure how to apply a low-pass filter to these values. Let's say the cutoff for the filter is 100 Hz. I can't just write:
if (array[i] > 100) { //delete or ignore this value, else store it in an array }
The reason this won't work is that the value of array[i] ranges over (-1, 1). So how would I apply this filter? What values am I going to compare?
From a physical perspective, I can see how it works: a capacitor and a resistor set the cut-off frequency, and the input is sent through the circuit. But programmatically, I do not see how to implement this. I have seen many implementations online, but the code wasn't documented well enough to get a good understanding of what was going on.
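For reference, the discrete-time counterpart of that RC circuit does not compare samples to the cutoff at all; it blends each new sample with the running output so that fast oscillations cancel. A minimal C sketch, assuming an illustrative cutoff and sample rate:

#include <stdio.h>
#include <math.h>

/* One-pole low-pass: y[i] = y[i-1] + alpha * (x[i] - y[i-1]),
   the discrete analogue of the RC circuit mentioned above.
   alpha = dt / (RC + dt), with RC = 1 / (2*pi*cutoff). */
void rc_lowpass(const float *x, float *y, int n, float cutoff, float dt) {
    const float PI = 3.14159265f;
    float rc = 1.0f / (2.0f * PI * cutoff);
    float alpha = dt / (rc + dt);
    y[0] = x[0];
    for (int i = 1; i < n; i++)
        y[i] = y[i - 1] + alpha * (x[i] - y[i - 1]);
}

int main(void) {
    float x[100], y[100];
    for (int i = 0; i < 100; i++)   /* 10 Hz cosine sampled at 1 kHz */
        x[i] = (float)cos(2.0 * 3.14159265 * 10.0 * i / 1000.0);
    rc_lowpass(x, y, 100, 100.0f, 0.001f);  /* 100 Hz cutoff, 1 ms step */
    for (int i = 0; i < 100; i++)
        printf("%f -> %f\n", x[i], y[i]);
    return 0;
}

With a 100 Hz cutoff and a 10 Hz input, the output tracks the input almost unchanged; a signal well above 100 Hz would instead shrink toward zero.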
Here is the code on my host side:
//Array to hold the information of the signal
float *Array;
//Number of sampling points
int sampleSize = 100;
float h = 0;
//Signal frequency in Hz
float signalFreq = 10;
//Number of points between 0 and max val (T_Sample)
float freqSample = sampleSize*signalFreq;
//Step = max value or T_Sample
float stepSize = 1.0 / freqSample;
//Allocate enough memory for the array
Array = (float*)malloc(sampleSize*sizeof(float));
//Populate the array with the modified cosine
for (int i = 0; i < sampleSize; i++) {
    Array[i] = cos(2*CL_M_PI*signalFreq*h); //was Array[0], which overwrote element 0 on every pass
    h = h + stepSize;
    printf("Value of current sample for cos is: %f \n", Array[i]);
}
My kernel is just the following (obviously this is not the filter code; this is where I am confused):
__kernel void lowpass(__global float *Array, __local float *cutOffValue, __global float *Output) {
    //Note: the host buffers are float, so these pointers should be float as well (they were int)
    int idx = get_global_id(0);
    Output[idx] = Array[idx];
};
I found this PDF that implements a lot of filters. Near the end of the document you can find a float implementation of the Low Pass Filter.
http://scholar.uwindsor.ca/cgi/viewcontent.cgi?article=6242&context=etd
In the filter implementation in that PDF, they compare data[j] to a value. Also, I have no idea what numItems or workItems are.
If someone could provide some insight on this, that would be great. I have looked at a lot of other examples of low-pass filters, but I just can't wrap my head around the implementation. I hope I made this question clear. Again, I know what a low-pass filter does; I just have no idea what values I need to compare for the filtering to take place.
Found this question as well:
Low Pass filter in C
I have a possible solution. What I am attempting is a moving-average FIR filter (which I am told is the easiest form of low-pass filter one can implement).
What is required:
FIFO buffer
Coefficient values (I generated and obtained mine from matlab for a specific cut-off frequency)
Input and Output arrays for the program
I have not implemented this in code yet, but I do understand how to use it on a theoretical level. The process works like this: values from the input array are passed into the FIFO buffer one at a time. Every time a value is passed in, the kernel does a multiplication across the FIFO buffer, which has 'n' taps, each with an associated coefficient value. The input at each tap gets multiplied by that tap's coefficient, and the products are accumulated and stored in one element of the output buffer.
Note that the coefficients were generated in Matlab; I didn't know how else to obtain these values. At first I was going to just use a coefficient of 1/n for every tap, but I am pretty sure that would distort the signal.
And that should do the trick, I am going to implement this in the code now, but if there is anything wrong with this theory feel free to correct it.
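As a rough illustration of that multiply-accumulate step, here is a plain C sketch outside OpenCL (the tap count, the equal 1/n coefficients, and the test signal are illustrative; real coefficients would come from a design tool such as Matlab, as above):

#include <stdio.h>
#include <math.h>

#define NUM_TAPS 5   /* illustrative tap count */

/* n-tap FIR: each output sample is the dot product of the most recent
   NUM_TAPS input samples with the coefficient array (the FIFO described
   above, expressed as indexing instead of an explicit buffer). */
void fir_filter(const float *x, float *y, int n, const float *coeffs) {
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int t = 0; t < NUM_TAPS; t++)
            if (i - t >= 0)                /* samples before the start count as 0 */
                acc += coeffs[t] * x[i - t];
        y[i] = acc;
    }
}

int main(void) {
    const double PI = 3.14159265358979323846;
    /* Equal taps of 1/NUM_TAPS give a plain moving average. */
    float coeffs[NUM_TAPS] = {0.2f, 0.2f, 0.2f, 0.2f, 0.2f};
    float in[100], out[100];
    for (int i = 0; i < 100; i++)          /* 10 Hz cosine sampled at 1 kHz */
        in[i] = (float)cos(2.0 * PI * 10.0 * i / 1000.0);
    fir_filter(in, out, 100, coeffs);
    for (int i = 0; i < 100; i++)
        printf("%f -> %f\n", in[i], out[i]);
    return 0;
}

Equal taps of 1/n are exactly the plain moving average; coefficients from a design tool mainly buy a sharper cutoff, not a different mechanism.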

Neural network for linear regression

I found this great source that matched the exact model I needed: http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/
The important bits go like this.
You have a plot x -> y. Each x-value is the sum of "features", which I'll denote z.
So a regression line for the x -> y plot would be h(SUM(z_i)), where h(x) is the regression line (function).
In this NN the idea is that each z-value gets assigned a weight in a way that minimizes the least squared error.
The gradient function is used to update the weights to minimize error. I believe I may be backpropagating incorrectly, at the point where I update the weights.
So I wrote some code, but my weights aren't being correctly updated.
I may have simply misunderstood a spec from that Stanford post, so that's where I need your help. Can anyone verify I have correctly implemented this NN?
My h(x) function was a simple linear regression on the initial data. In other words, the idea is that the NN will adjust weights so that all data points shift closer to this linear regression.
for (epoch = 0; epoch < 10000; epoch++){
    //loop over the number of games
    for (game = 1; game < 39; game++){
        sum = 0;
        double temp1 = 0; //was int: integer temporaries truncate the weighted sums
        double temp2 = 0;
        //loop over the number of inputs
        for (i = 0; i < 10; i++){
            //compute sum = x
            temp1 += inputs[game][i] * weights[i];
        }
        for (i = 10; i < 20; i++){
            temp2 += inputs[game][i] * weights[i];
        }
        sum = temp1 - temp2;
        //compute error
        error += .5 * (5.1136 * sum + 1.7238 - targets[game]) * (5.1136 * sum + 1.7238 - targets[game]);
        printf("error = %G\n", error);
        //backpropagate
        for (i = 0; i < 20; i++){
            weights[i] = sum * (5.1136 * sum + 1.7238 - targets[game]); //POSSIBLE ERROR HERE
        }
    }
    printf("Epoch = %d\n", epoch);
    printf("Error = %G\n", error);
}
Please check out Andrew Ng's Coursera course. He is a professor of Machine Learning at Stanford and can explain the concept of linear regression better than pretty much anyone else. You can learn the essentials of linear regression in the first lesson.
For linear regression, you are trying to minimize the cost function, which in this case is the sum of squared errors, (predicted value - actual value)^2, and the minimization is achieved by gradient descent. Solving a problem like this does not require a neural network, and using one would be rather inefficient.
For this problem, only two values are needed. If you think back to the equation of a line, y = mx + b, there are really only two aspects of a line that you need: the slope and the y-intercept. In linear regression you are looking for the slope and y-intercept that best fit the data.
In this problem, the two values can be represented by theta0 and theta1. theta0 is the y-intercept and theta1 is the slope.
This is the update function for linear regression:
theta_j := theta_j - alpha * (1/m) * SUM_i (h(x_i) - y_i) * x_i_j
(where x_i_0 is taken to be 1 for the intercept term)
Here, theta is a 2 x 1 dimensional vector with theta0 and theta1 inside of it. What you are doing is taking theta and subtracting the mean of the sum of errors multiplied by a learning rate alpha (usually small, like 0.1).
Let's say the real perfect fit for the line is y = 2x + 3, but our current slope and y-intercept are both 0. Then the errors (predicted minus actual) are negative, so their sum is negative, and subtracting a negative number from theta increases theta, moving your prediction closer to the correct value. And vice versa for positive numbers. This is a basic example of gradient descent, where you descend down a slope to minimize the cost (or error) of the model.
This is the type of model you should be implementing instead of a neural network, which is more complex. Try to gain an understanding of linear and logistic regression with gradient descent before moving on to neural networks.
Implementing a linear regression algorithm in C can be rather challenging, especially without vectorization. If you want to learn how a linear regression algorithm works and aren't specifically required to use C, I recommend something like MATLAB or Octave (a free alternative) instead. After all, the examples from the post you found use the same format.
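That said, if you do stay in C, a minimal batch gradient descent sketch for a line y = theta0 + theta1*x could look like the following (the data, alpha, and epoch count are illustrative):

#include <stdio.h>

int main(void) {
    /* Data generated from y = 2x + 3; alpha and epochs are illustrative. */
    double x[] = {1, 2, 3, 4};
    double y[] = {5, 7, 9, 11};
    int m = 4;
    double theta0 = 0, theta1 = 0, alpha = 0.05;

    for (int epoch = 0; epoch < 20000; epoch++) {
        double grad0 = 0, grad1 = 0;
        for (int i = 0; i < m; i++) {
            double err = theta0 + theta1 * x[i] - y[i]; /* predicted - actual */
            grad0 += err;          /* gradient w.r.t. the intercept */
            grad1 += err * x[i];   /* gradient w.r.t. the slope */
        }
        theta0 -= alpha * grad0 / m;   /* step against the mean gradient */
        theta1 -= alpha * grad1 / m;
    }
    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}

After enough epochs theta0 and theta1 approach 3 and 2, the line the data was generated from.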

Integrating cos(x)/sqrt(x) between 0 and infinity to a user defined precision

So I'm trying to do what I've said above. The user will enter a precision, such as 3 decimal places, and then using the trapezium rule, the program will keep adding strips on until the 3rd decimal place is no longer changing, and then stop and print the answer.
I'm not sure of the best way to approach this. Because the integrand oscillates, its integral over each period of 2*PI is almost zero, and I feel like exploiting that would be the best way to approach the problem, but I have no idea how to go about it. At the moment I'm checking the y value at each x value to see when it becomes less than the required precision, but it never really gets low enough: at x = 10 million, for example, y = -0.0002, which is still relatively large for such a large x value.
for (int i = 1; i < 1000000000; i++)
{
    sumFirstAndLast += func(z);
    z += stripSize;
    count++;
    printf("%lf\n", func(z));
    if (fabs(func(z)) < lowestAddition/stripSize){
        break;
    }
}
This is what I'm currently trying, where func is the integrand. stripSize is set to 0.01, something relatively small to make the areas of the trapeziums more accurate. sumFirstAndLast starts as the sum of the function at the first and last x values, which are set to 0.001 and 1000000 (just a small value and a large value).
As I mentioned, I "think" the best way to do this would be to check the value of the integral over every 2*PI, but once again I'm not sure how to go about it. My current method gives me the correct answer if I take the precision part out, but as soon as I put a precision in, it gives a completely wrong answer.
For a non-periodic function that converges to zero you can (sort of) do a check of the function's value and compare to a minimum error value, but this doesn't work for a periodic function as you get an early exit before the integrand sum converges (as you've found out). For a non-periodic function you can simply check the change in the integrand sum on each iteration to a minimum error but that won't work here either.
Instead, you'll have to do as a few comments suggest and check for convergence relative to the period of the function, PI in this case (I found it works better than 2*PI). To implement this, do something like the following code (note I changed your sum to accumulate the actual area instead of multiplying by the strip size at the end):
sumFirstAndLast = (0.5*func(a) + 0.5*func(b)) * stripSize;
double z = a + stripSize;
double CHECK_RANGE = 3.14159265359;
double NextCheck = CHECK_RANGE;
double LastCheckSum = 0;
double MinError = 0.0001;
for (int i = 1; i < 1000000000; i++)
{
    sumFirstAndLast += func(z) * stripSize;
    if (z >= NextCheck)
    {
        //compare the running sum against its value one period earlier
        if (fabs(LastCheckSum - sumFirstAndLast) < MinError) break;
        NextCheck += CHECK_RANGE; //was CheckRange, an undeclared name
        LastCheckSum = sumFirstAndLast;
    }
    z += stripSize;
    count++;
}
This seems to work and give the result to the specified accuracy according to the value of MinError. There are probably other (better) ways to check for convergence when numerically integrating a periodic function. A quick Google search reveals this paper for example.
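For reference, a self-contained rendering of the same idea might look like this sketch (the step size, the start point, and the crude handling of the integrable singularity at x = 0 are assumptions, not part of the answer above):

#include <math.h>
#include <stdio.h>

/* Integrand: cos(x)/sqrt(x). */
static double func(double x) { return cos(x) / sqrt(x); }

int main(void) {
    const double PI = 3.14159265358979323846;
    const double stripSize = 0.001;
    const double minError = 0.0001;

    /* Near 0, cos(x) ~ 1, so the piece on [0, stripSize] is roughly
       the integral of 1/sqrt(x) there, i.e. 2*sqrt(stripSize). */
    double sum = 2.0 * sqrt(stripSize);
    double z = stripSize;
    double nextCheck = PI;        /* compare the running sum once per period */
    double lastCheckSum = 0.0;

    for (long i = 1; i < 1000000000L; i++) {
        sum += func(z) * stripSize;   /* simple left-endpoint strip */
        if (z >= nextCheck) {
            if (fabs(lastCheckSum - sum) < minError)
                break;                /* sum stopped changing over a period */
            nextCheck += PI;
            lastCheckSum = sum;
        }
        z += stripSize;
    }
    printf("integral ~ %f (closed form sqrt(pi/2) ~ %f)\n", sum, sqrt(PI / 2.0));
    return 0;
}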
The integral from 0 to infinity of cos(x)/sqrt(x), or of sin(x)/sqrt(x), is well known to be sqrt(pi/2), so this amounts to a roundabout way of evaluating pi, which is an easier problem; Newton did it by integrating a quarter circle to get the area pi/4. Such integrals are evaluated by the methods of complex analysis. They are done in many textbooks on the subject, and appeared on one of my final exams in graduate school.

Trying to brute-force roots of a Hermite polynomial

I've got a program that is supposed to find the roots of the 37th Hermite polynomial using Newton's method, but it's taking a long time to run. I'm pretty new to C, so I can't figure out where my bug is, or whether this is just the nature of brute-forcing this problem. I'm also having issues getting accurate roots, but so far it's hard to find that bug because I can only run a test case every 5-10 minutes.
CODE REMOVED
I'm 100% sure there's no good reason for Newton-Raphson to take so much time. In some cases it may be problematic, because the method isn't guaranteed to converge, but in your specific case there should be no problem.
One thing that is clear is that you monstrously overuse recursion. Just evaluating your Hermite polynomial with n=37 recursively has roughly the cost of computing the 37th Fibonacci number by naive recursion, which is about 40 million calls.
Now consider that your Newton method has to invoke the Hermite function repeatedly, as well as h_deriv (which has the same order of magnitude of recursion), until it converges to 10^-12. That sounds like tens of iterations.
And, as if all this weren't enough, you also manage to implement Newton recursively! There is really no reason in the world to do this. (Was Lisp/Scheme your first programming language?)
This is what you should do to improve the performance:
Fix your Hermite evaluation. You should calculate the 37 coefficients once; this may be done recursively. Once that is done, use them to evaluate the polynomial in linear time.
Do the same for the derivative: just calculate its 36 coefficients.
Optionally, fix your Newton. As far as I can see you won't gain much performance (your "recursion" is an awkward loop anyway), but it will look better and consume much less stack.
Edit:
After reading the comments I took the time to build and run this. And I must admit, I underestimated the complexity of the problem.
As it turns out, the coefficients produced by the recurrence grow rapidly, and round-off error seems to dominate, so brute-forcing this problem has unavoidable numerical pitfalls, and it is not obvious that using the pre-calculated coefficients (and summing the terms in the straightforward order) yields the same result.
Nevertheless, there is a way to get rid of the ridiculous recursion without changing the calculation logic:
#include <cmath> // for fabs; plain abs would truncate the argument to an int

const int N = 37;
double g_pHermiteValues[N+1];

// Fill g_pHermiteValues[0..N] with H_0(x)..H_N(x) via the recurrence
// H_n(x) = 2x*H_{n-1}(x) - 2(n-1)*H_{n-2}(x).
void CalcHermiteAt(double x)
{
    double x2 = x*2;
    g_pHermiteValues[0] = 1.;
    g_pHermiteValues[1] = x2;
    for (int n = 2; n <= N; n++)
        g_pHermiteValues[n] =
            g_pHermiteValues[n - 1] * x2 -
            g_pHermiteValues[n - 2] * 2*(n - 1);
}

// H_N'(x) = 2N*H_{N-1}(x), read from the values CalcHermiteAt stored.
double CalcHermiteDerivAt()
{
    return g_pHermiteValues[N - 1] * 2*N;
}

double newton(double x_0)
{
    const double tolerance = 1E-12;
    while (true) // no iteration cap: this loops forever if Newton fails to converge (see the P.S. below)
    {
        CalcHermiteAt(x_0);
        if (fabs(g_pHermiteValues[N]) < tolerance) // was abs, which rounds the argument to an int
            return x_0;
        x_0 -= g_pHermiteValues[N] / CalcHermiteDerivAt();
    }
}
That is, we use the same recurrence relation, but to evaluate the Hermite polynomial at a given point we compute all the polynomials up to n=37 iteratively and store the results in a global array. Its top element then holds the needed value, and the derivative is deduced from the second-to-last element.
Since each step of the Newton-Raphson algorithm needs both the value and the derivative at the same point, this is efficient.
P.S. So far, however, I have not been able to reach a solution: Newton-Raphson simply doesn't converge from the points I've tried starting from.
I believe a more robust root-finding method, such as bisection, should be used for a question like this.

Manhattan distance is overestimating and making me crazy

I'm implementing the A* algorithm with Manhattan distance to solve the 8-puzzle (in C). It seems to work very well and passes a lot of unit tests, but in one case it fails to find the shortest path (it finds 27 steps instead of 25).
When I change the heuristic function to Hamming distance, it finds the solution in 25 steps.
It also finds 25 steps when I make the Manhattan distance function return half of the actual cost.
That's why I believe the problem lies somewhere in the Manhattan distance function: it is overestimating the cost (hence inadmissible). I thought maybe something else was going wrong in the C program, so I wrote a little Python script to test and verify the output of the Manhattan distance function alone, and both produce exactly the same result.
I'm really confused because the heuristic function seems to be the only point of failure, and it seems to be correct at the same time.
You can try this solver, entering the tile order as "2,6,1,0,7,8,3,5,4".
Choose the algorithm "Manhattan distance" and it finds the solution in 25 steps.
Now change it to "Manhattan distance + linear conflict" and it finds 27 steps.
But my Manhattan distance (without linear conflict) finds 27 steps.
Here's my general algorithm:
manhattan_distance = 0
iterate over all tiles
    if the tile is not the blank tile:
        find the coordinates of this tile on the goal board
        manhattan_distance += abs(x - goal_x) + abs(y - goal_y)
I think if something were badly wrong with an important part it wouldn't pass all 25+ previous tests, so this might be some sort of edge case.
Here's the commented Manhattan distance function in C:
int ManhattanDistance(Puzzle p, State b){
    State goal = getFinalState(p);
    int size = getSize(b);
    int distance = 0;
    if (getSize(goal) == size){ // both states are the same size
        int i, j;
        for (i = 0; i < size; i++){
            for (j = 0; j < size; j++){ // iterate over all tiles
                int a = getStateValue(b, i, j); // what is the number on this tile?
                if (a != 'B'){ // if it's not the blank tile
                    int final_coordinates[2];
                    getTileCoords(goal, a, final_coordinates); // find its coordinates on the goal board
                    int final_i = final_coordinates[0];
                    int final_j = final_coordinates[1];
                    distance += abs(i - final_i) + abs(j - final_j);
                }
            }
        }
    }
    return distance;
}
Please help me.
EDIT: As discussed in comments, the code provided for opening nodes can be found here
The problem seems to be not in your heuristic function but in the algorithm itself. From your description of the problem, and the fact that it occurs only in some specific cases, I believe it has to do with the re-opening of a closed vertex once you find a better path to it.
While reading the code you provided [in the comments], I think I understood where the problem lies, in line 20:
if(getG(current) + 1 < getG(children[i])){
This is wrong! You are checking whether g(current) + 1 < g(children[i]); you actually want to check f(current) + 1 + h(children[i]) < g(children[i]), since the comparison should involve the heuristic value of children[i], not of current! (Note the naming here: f denotes the cost from the start and g the stored total including the heuristic, the reverse of the usual A* convention.)
Note that this is identical to setting f(children[i]) = min{f(children[i]), f(current)+1} and then adding h(children[i]) to get the g value.
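For reference, in the more common convention (g = cost from the start, h = the heuristic, f = g + h), that relaxation step can be sketched as below; the Node type and the accessors are hypothetical stand-ins for the linked code, not taken from it:

/* Hypothetical node type and accessors standing in for the linked code. */
typedef struct Node Node;
int  getG(Node *n);                    /* cost from the start */
void setG(Node *n, int g);
void setF(Node *n, int f);             /* priority-queue key: f = g + h */
void setParent(Node *n, Node *p);
int  heuristic(Node *n);               /* e.g. the ManhattanDistance above */
void pushOpen(void *openList, Node *n);

/* Relaxation: if the path through current is shorter, update the child's
   g, re-key it by f = g + h using the child's own heuristic, and
   re-insert it into the open list (re-opening it if it was closed). */
void relax(void *openList, Node *current, Node *child) {
    int tentative_g = getG(current) + 1;     /* every move costs 1 */
    if (tentative_g < getG(child)) {
        setG(child, tentative_g);
        setParent(child, current);
        setF(child, tentative_g + heuristic(child));
        pushOpen(openList, child);
    }
}

Either way, the key point is the same: the priority a child is (re-)inserted with must use the child's own heuristic value, and a closed node must be re-opened when a cheaper path to it appears.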
