Weight Initialisation - artificial-intelligence


I plan to use the Nguyen-Widrow Algorithm for an NN with multiple hidden layers. While researching, I found a lot of ambiguities and I wish to clarify them.
The following is pseudocode for the Nguyen-Widrow algorithm:
Initialize all weights of the hidden layers with random values
For each hidden layer {
    beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / numberOfInputs);
    For each synapse {
        For each weight {
            Adjust the weight by dividing it by the norm of the neuron's weights and multiplying it by beta
        }
    }
}
I just want to clarify whether the value of hiddenNeurons is the size of the particular hidden layer, or the total size of all the hidden layers in the network. I got mixed up after reading various sources.
In other words, if I have a network (3-2-2-2-3) (index 0 is the input layer, index 4 is the output layer), would the value of hiddenNeurons be:
NumberOfNeuronsInLayer(1) + NumberOfNeuronsInLayer(2) + NumberOfNeuronsInLayer(3)
Or just
NumberOfNeuronsInLayer(i), where i is the current layer?
EDIT:
So the hiddenNeurons value would be the size of the current hidden layer, and the number of inputs would be the size of the previous layer?

The Nguyen-Widrow initialization algorithm is the following:
1. Initialize all weights of the hidden layers with (ranged) random values
2. For each hidden layer
   2.1 Calculate the beta value: 0.7 * the Nth root of (#neurons of the current layer), where N is the #neurons of the input layer
   2.2 For each synapse
      2.2.1 For each weight
      2.2.2 Adjust the weight by dividing it by the norm of the neuron's weights and multiplying it by beta
Source: Encog Java Framework

Sounds to me like you want more precise code. Here are some actual code lines from a project I'm participating in. I hope you read C. It's a bit abstracted and simplified. There is a struct nn that holds the neural net data. You probably have your own abstract data type.
Code lines from my project (somewhat simplified):
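float *w = nn->the_weight_array;
float factor = 0.7f * powf( (float) nn->n_hidden, 1.0f / nn->n_input);
unsigned int i;
/* Fill every input-to-hidden weight with a random value in [-factor, factor] */
for( i = 0; i < nn->n_input * nn->n_hidden; i++ )
    *w++ = random_range( -factor, factor );
/* Nguyen/Widrow: rescale each block of n_hidden consecutive weights to length factor */
w = nn->the_weight_array;
for( i = nn->n_input; i; i-- ){
    _scale_nguyen_widrow( factor, w, nn->n_hidden );
    w += nn->n_hidden;
}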
Functions called:
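#include <math.h>
#include <stdlib.h>

/* Scale vec (of length size) so that its Euclidean norm equals factor */
static void _scale_nguyen_widrow( float factor, float *vec, unsigned int size )
{
    unsigned int i;
    float magnitude = 0.0f;
    for ( i = 0; i < size; i++ )
        magnitude += vec[i] * vec[i];
    magnitude = sqrtf( magnitude );
    if ( magnitude == 0.0f )    /* all-zero vector: nothing to rescale */
        return;
    for ( i = 0; i < size; i++ )
        vec[i] *= factor / magnitude;
}

/* Uniform random float in [min, max] */
static inline float random_range( float min, float max )
{
    float range = fabsf( max - min );
    return ((float)rand() / (float)RAND_MAX) * range + min;
}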
Tip:
After you've implemented the Nguyen/Widrow weight initialization, you can add a line of code to the forward calculation that dumps each activation to a file. Then you can check how well the activations are spread over the useful range of the activation function: find their mean and standard deviation, or even plot them with a plotting tool such as gnuplot. (You need a plotting tool like gnuplot anyway for plotting error rates etc.) I did that for my implementation. The plots came out nice, and the initial learning became much faster using Nguyen/Widrow for my project.
PS: I'm not sure my implementation is correct according to Nguyen and Widrow's intentions. I don't think I even care, as long as it improves the initial learning.
Good luck,
-Øystein

Related

Improving the performance of nested loops in C

Given a list of spheres described by $(x_i, y_i, r_i)$, meaning the center of sphere $i$ is at the point $(x_i, y_i, 0)$ in three-dimensional space and its radius is $r_i$, I want to compute all $z_i$, where $z_i = \max\{ z \mid (x_i, y_i, z) \text{ is a point on any sphere} \}$. In other words, $z_i$ is the height of the highest point directly above the center of sphere $i$ that lies in any of the spheres.
I have two arrays
int **vs = (int **)malloc(num * sizeof(int *));
double **vh = (double **)malloc(num * sizeof(double *));
for (int i = 0; i < num; i++){
vs[i] = (int *)malloc(2 * sizeof(int)); // x,y
vh[i] = (double *)malloc(2 * sizeof(double)); // r,z
}
The objective is to calculate the maximum z for each point. Thus, we should check if there are larger spheres over each x,y point.
Initially we set vh[i][1] = vh[i][0] for all points, which means that z is the r of each sphere. Then we check whether these z values lie inside larger spheres, to maximize the z value.
for (int i = 0; i < v; i++) {
double a = vh[i][0] * vh[i][0]; // square of the radius of sphere #1
for (int j = 0; j < v; j++) {
if (vh[i][0] > vh[j][1]) { // check only if r of sphere #1 is larger than the current z of #2
double b = a - (vs[j][0] - vs[i][0]) * (vs[j][0] - vs[i][0])
- (vs[j][1] - vs[i][1]) * (vs[j][1] - vs[i][1]);
// calculating the maximum z value of sphere #2 crossing sphere #1
// (r of sphere #1)**2 = (z of x_j,y_j)**2 + (distance of two centers)**2
if (b > vh[j][1] * vh[j][1]) {
vh[j][1] = sqrt(b);// update the z value if it is larger than the current value
}
}
}
}
It works correctly, but the nested loop becomes very slow as the number of spheres grows. I am looking for a way to speed it up.
[Figure: an illustration clarifying the task]

When you say
The objective is to calculate the maximum z for each point.
I take you to mean, for the center C of each sphere, the maximum z coordinate among all the points lying directly above C (along the z axis) on any of the spheres. This is fundamentally an O(n^2) problem -- there is nothing you can do to prevent the computational expense from scaling with the square of the number of spheres.
But there may be some things you can do to reduce the scaling coefficient. Here are some possibilities:
Use bona fide 2D arrays (== arrays of arrays) instead of arrays of pointers. It's easier to implement, more memory-efficient, and better for locality of reference:
int (*vs)[2] = malloc(num * sizeof(*vs));
double (*vh)[2] = malloc(num * sizeof(*vh));
// no other allocations needed
Alternatively, it may help to use an array of structures, one per sphere, instead of two 2D arrays of numbers. It would certainly make your code clearer, but it might also help give a slight speed boost by improving locality of reference:
struct sphere {
int x, y;
double r, z;
};
struct sphere *spheres = malloc(num * sizeof(*spheres));
Store z^2 instead of z, at least for the duration of the computation. This will reduce the number of somewhat-expensive sqrt calls from O(v^2) to O(v), supposing you make a single pass at the end to convert all the results to z values, and it will save you O(v^2) multiplications, too. (More if you could get away without ever converting from z^2 to z.)
Pre-initialize each vh[i][1] value to the radius of sphere i (or the square of the radius if you are exercising the previous option, too), and add j != i to the condition around the inner-loop body.
Sorting the spheres in decreasing order by radius may help you find larger provisional z values earlier, and therefore to make the radius test in the inner loop more effective at culling unnecessary computations.
You might get some improvement by checking each distinct pair only once. That is, for each unordered pair i, j, you can compute the inter-center distance once only, determine from the relative radii which height to check for a possible update, and go from there. The extra logic involved might or might not pay off through a reduction in other computations.
Additionally, if you are doing this for large enough inputs, then you might be able to reduce the wall time consumed by parallelizing the computation.
Note, by the way, that this comment is incorrect:
// (r of sphere #1)**2 = (r of sphere #2)**2 + (distance of two centers)**2
However, it is also not what you are relying upon. What you are relying upon is that if sphere 1 covers the center of sphere 2 at all, then its height $z$ above the center of sphere 2 satisfies the relationship
$$r_1^2 = z^2 + d_{1,2}^2$$
where $d_{1,2}$ is the distance between the two centers. That is, where you wrote r of sphere #2 in the comment, you appear to have meant z.

Neural network for linear regression

I found this great source that matched the exact model I needed: http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/
The important bits go like this.
You have a plot x->y. Each x-value is the sum of "features", which I'll denote z.
So the regression line for the x->y plot would be $h\left(\sum_i z_i\right)$, where $h(x)$ is the regression line (function).
In this NN, the idea is that each z-value gets assigned a weight in a way that minimizes the least-squares error.
The gradient function is used to update weights to minimize error. I believe I may be back propagating incorrectly -- where I update the weights.
So I wrote some code, but my weights aren't being correctly updated.
I may have simply misunderstood a spec from that Stanford post, so that's where I need your help. Can anyone verify I have correctly implemented this NN?
My h(x) function was a simple linear regression on the initial data. In other words, the idea is that the NN will adjust weights so that all data points shift closer to this linear regression.
for (epoch = 0; epoch < 10000; epoch++){
//loop number of games
for (game = 1; game < 39; game++){
sum = 0;
int temp1 = 0;
int temp2 = 0;
//loop number of inputs
for (i = 0; i < 10; i++){
//compute sum = x
temp1 += inputs[game][i] * weights[i];
}
for (i = 10; i < 20; i++){
temp2 += inputs[game][i] * weights[i];
}
sum = temp1 - temp2;
//compute error
error += .5 * (5.1136 * (sum) + 1.7238 - targets[game]) * (5.1136 * (sum) + 1.7238 - targets[game]);
printf("error = %G\n", error);
//backpropagate
for (i = 0; i < 20; i++){
weights[i] = sum * (5.1136 * (sum) + 1.7238 - targets[game]); //POSSIBLE ERROR HERE
}
}
printf("Epoch = %d\n", epoch);
printf("Error = %G\n", error);
}

Please check out Andrew Ng's Coursera course. He is a professor of Machine Learning at Stanford and can explain linear regression to you better than pretty much anyone else. You can learn the essentials of linear regression in the first lesson.
For linear regression, you are trying to minimize the cost function, which in this case is the sum of squared errors, (predicted value - actual value)^2, and you minimize it with gradient descent. Solving a problem like this does not require a neural network, and using one would be rather inefficient.
For this problem, only two values are needed. If you think back to the equation for a line, y = mx + b, there are really only two aspects of a line that you need: The slope and the y-intercept. In linear regression you are looking for the slope and y-intercept that best fits the data.
In this problem, the two values can be represented by theta0 and theta1. theta0 is the y-intercept and theta1 is the slope.
This is the update function for linear regression:
$$\theta := \theta - \frac{\alpha}{m} X^{T}(X\theta - y)$$
Here, theta is a 2 x 1 dimensional vector with theta0 and theta1 inside of it. What you are doing is taking theta and subtracting the mean of the sum of errors multiplied by a learning rate alpha (usually small, like 0.1).
Let's say the true best-fit line is y = 2x + 3, but the current slope and y-intercept are both 0. Then the predictions are too low, so the errors (predicted - actual) are negative; subtracting a negative number from theta increases theta, moving the prediction closer to the correct value. The reverse happens when the errors are positive. This is a basic example of gradient descent, where you descend the slope of the cost function to minimize the cost (or error) of the model.
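To make the update rule concrete, here is a minimal batch gradient descent sketch in C for h(x) = theta0 + theta1 * x, using the cost (1/2m) * sum (h(x) - y)^2. The function name linreg_gd, and the learning rate and iteration count in the usage note, are illustrative assumptions rather than values from the course.

```c
#include <stddef.h>

/* Batch gradient descent for h(x) = theta0 + theta1 * x (hypothetical helper).
   Minimizes J = (1/2m) * sum_i (h(x_i) - y_i)^2 over m samples. */
void linreg_gd(const double *x, const double *y, size_t m,
               double alpha, int iterations,
               double *theta0, double *theta1)
{
    double t0 = 0.0, t1 = 0.0;    /* start from intercept = slope = 0 */

    for (int it = 0; it < iterations; it++) {
        double g0 = 0.0, g1 = 0.0;
        for (size_t i = 0; i < m; i++) {
            double err = t0 + t1 * x[i] - y[i];   /* predicted - actual */
            g0 += err;                            /* dJ/dtheta0 * m */
            g1 += err * x[i];                     /* dJ/dtheta1 * m */
        }
        /* Simultaneous update: both gradients were computed from the old theta. */
        t0 -= alpha * g0 / (double)m;
        t1 -= alpha * g1 / (double)m;
    }
    *theta0 = t0;
    *theta1 = t1;
}
```

For example, with x = {0, 1, 2, 3, 4}, y = 2x + 3, alpha = 0.05 and 10000 iterations, theta0 and theta1 converge to 3 and 2. Too large an alpha makes the iteration diverge instead, so it is worth printing the cost every few hundred iterations while tuning.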
This is the type of model you should be trying to implement in your model instead of a Neural Network, which is more complex. Try to gain an understanding of linear and logistic regression with gradient descent before moving on to Neural Networks.
Implementing a linear regression algorithm in C can be rather challenging, especially without vectorization. If you want to learn how a linear regression algorithm works and aren't specifically set on using C, I recommend implementing it in something like MATLAB or Octave (a free alternative) instead. After all, the examples from the post you found use the same format.

Integrating cos(x)/sqrt(x) between 0 and infinity to a user defined precision

So I'm trying to do what I've said above. The user will enter a precision, such as 3 decimal places, and then using the trapezium rule, the program will keep adding strips on until the 3rd decimal place is no longer changing, and then stop and print the answer.
I'm not sure of the best way to approach this. Because the function oscillates, its integral over one period of 2*PI will be almost 0. I feel this would be the best way to approach the problem, but I have no idea how to go about it. At the moment I'm checking the y value at each x value to see when it drops below the required precision, but it never gets low enough. At x = 10 million, for example, y = -0.0002, which is still relatively large for such a large x value.
for (int i = 1; i < 1000000000; i++)
{
sumFirstAndLast += func(z);
z += stripSize;
count++;
printf("%lf\n", func(z));
if(fabs(func(z))<lowestAddition/stripSize){
break;
}
}
So this above is what I'm trying to do currently. Where func is the function. The stripSize is set to 0.01, just something relatively small to make the areas of the trapeziums more accurate. sumFirstAndLast is the sum of the first and last values, set at 0.001 and 1000000. Just a small value and a large value.
As I mentioned, I "think" the best way to do this, would be to check the value of the integral over every 2PI, but once again not sure how to go about this. My current method gives me the correct answer if I take the precision part out, but as soon as I try to put a precision in, it gives a completely wrong answer.

For a non-periodic function that converges to zero you can (sort of) check the function's value against a minimum error value, but this doesn't work for a periodic function, because you get an early exit before the integral sum converges (as you've found out). For a non-periodic function you could also simply compare the change in the integral sum on each iteration against a minimum error, but that won't work here either.
Instead, you'll have to do what a few comments suggest and check for convergence at intervals tied to the period of the function. I found that checking every PI (half the period of cos) works better than checking every 2*PI. To implement this, do something like the following code (note that I changed your sum to be the actual area instead of scaling it at the end):
sumFirstAndLast = (0.5*func(a) + 0.5*func(b)) * stripSize;  // end points, already scaled to an area
double z = a + stripSize;
double CHECK_RANGE = 3.14159265359;   // check convergence every PI
double NextCheck = CHECK_RANGE;
double LastCheckSum = 0;
double MinError = 0.0001;             // required accuracy
for (int i = 1; i < 1000000000; i++)
{
    sumFirstAndLast += func(z) * stripSize;   // add one interior strip
    if (z >= NextCheck)
    {
        // stop once the area changed by less than MinError since the last check
        if (fabs(LastCheckSum - sumFirstAndLast) < MinError) break;
        NextCheck += CHECK_RANGE;
        LastCheckSum = sumFirstAndLast;
    }
    z += stripSize;
    count++;
}
This seems to work and give the result to the specified accuracy according to the value of MinError. There are probably other (better) ways to check for convergence when numerically integrating a periodic function. A quick Google search reveals this paper for example.

The integral from 0 to infinity of cos(x)/sqrt(x), or of sin(x)/sqrt(x), is well known to be sqrt(pi/2). So computing it to any number of digits reduces to evaluating pi, which is an easier problem. Newton did that by integrating a quarter circle to get the area pi/4. These integrals are evaluated with the methods of complex analysis; they are worked out in many textbooks on the subject, and were on one of my final exams in graduate school.

DFT function implementation in C

I am implementing BFSK on a DSP processor and am currently simulating it on a Linux machine using C. I am working on the demodulation function, which involves taking an FFT of the incoming data. For simulation purposes, I have a predefined DFT function:
void dft(complex_float* in, complex_float* out, int N, int inv)
{
    int i, j;
    float a, f;
    complex_float s, w;
    f = inv ? 1.0/N : 1.0;              /* the inverse transform is scaled by 1/N */
    for (i = 0; i < N; i++) {           /* output bin i */
        s.re = 0;
        s.im = 0;
        for (j = 0; j < N; j++) {
            a = -2*PI*i*j/N;
            if (inv) a = -a;
            w.re = cos(a);              /* twiddle factor e^(i*a) */
            w.im = sin(a);
            s.re += in[j].re * w.re - in[j].im * w.im;
            s.im += in[j].im * w.re + in[j].re * w.im;
        }
        out[i].re = s.re*f;
        out[i].im = s.im*f;
    }
}
Here the complex_float is a struct defined as follows:
typedef struct {
float re;
float im;
} complex_float;
In the dft() function, the parameter N denotes the number of DFT points.
My question is this: since the algorithm also involves a frequency-hopping sequence, I need to check the amplitude of the signal's DFT at different frequency components while demodulating.
In MATLAB this was quite simple as the FFT function there involves the sampling frequency as well and I could find the power at any frequency point as
powerat_at_freq = floor((freq * fftLength) / Sampling_freq)
But the C function does not involve any frequencies, so how can I determine the magnitude of the DFT at any particular frequency?

The index in the FFT table for a particular frequency is calculated as follows:
int i = (int)round(f / fT * N);
where f is the wanted frequency, fT is the sampling frequency and N is the number of FFT points. Compute this in floating point: with integer f and fT, f / fT would truncate to 0.
The FFT should be fine-grained enough (i.e. N should be large enough) to resolve all the frequencies of interest, since adjacent bins are fT / N apart.
If the exact frequency isn't present in the FFT, the nearest bin is used. More info about FFT indexes versus frequencies:
How do I obtain the frequencies of each value in an FFT?

The frequency represented by each bin depends on the sample rate of the data fed to it (divided by the length of the FFT). Thus any DFT or FFT can represent any frequency you want, just by feeding it the right amount of data at the right sample rate.

You can refer to the FFTW library, which is well known and widely used for FFT work.
The official website is: http://www.fftw.org/
By the way, MATLAB's fft function is also implemented using the FFTW library.

What is wrong with my low pass filter?

I have an array of int samples ranging from 32766 to -32767. As part of trying to create an envelope detector I've written a low-pass filter, but it doesn't seem to be doing the job. Please keep in mind I'm trying to filter an entire array in one shot (no buffers).
This is not streamed, but applied to recorded audio for later playback. It is written in C. An example cutoff argument would be 0.5.
void lopass(int *input, float cutoff, int *output)
{
float sample = 0;
for (int i=1 ; i < (1430529-10); i++) // we will go through all except the last 10 samples
{
for (int j = i; j < (i+10); j++) { // only do this for a WINDOW of a hundred samples
float _in = (float)input[j];
float _out = (float)output[j-1];
sample = (cutoff * _in) + (32766 - (32766*cutoff)) * _out;
}
output[i] = (int)sample;
}
}
I thought that I would run my filtering statement on a window of 10 samples. Not only is it super slow, but it doesn't really do much except seemingly lower the overall amplitude.
If you have any advice, or suggestions (or code!) on how to do this properly, that would be great!

A low-pass filter is basically some variant of averaging a number of values together. That means at least in the normal case your inner loop will accumulate a value. It's hard to guess the exact intent from your code, but you end up with something on the extremely general order of:
sample = 0;
for (int j=i; j<i+10; j++)
sample += input[j];
output[i] = sample / 10;
As it stands, this just does averaging, with no cutoff specified -- that means it has a fixed (and fairly slow) cutoff curve. The cutoff is governed only by the number of samples in the window.
To control the cutoff, you do not (at least normally) multiply all the input values by the same amount -- that would basically just modify the scale factor. Instead, you take a set of samples (10 of them, in your case) of the cutoff curve you want to apply, run them through an inverse FFT, and get a set of 10 coefficients. You then apply those coefficients in your loop:
sample = 0;
for (j=0; j<10; j++)
sample += input[i+j] * coefficients[j];
output[i] = sample;
The number of samples in your window isn't normally an input to the design process -- rather, it's an output. You start by specifying the cutoff frequency (as a fraction of the sampling frequency) and the cutoff width, and based on those you compute the necessary window size.
There are quite a few different techniques for computing your coefficients. Regardless of how you compute them, however, you normally end up with something on this general order -- accumulate the sum of the samples in the window, each multiplied by its respective coefficient.
EE Times had a pretty good article on filter design a few years ago.

Don't know if it's relevant, but in your code the inner loop is effectively doing nothing: each pass overwrites sample, so only the last iteration matters.
for (j=???; j<???; j++) {
sample = ???;
}
is the same as
// for (j=???; j<???; j++) {
sample = ???; // for last j
// }

The arithmetic in the filter looks wrong, and as @pmg already pointed out, you are not storing output values correctly. It should probably be:
void lopass(int *input, float cutoff, int *output)
{
    /* Single-pole recursive low-pass: y[i] = cutoff * x[i] + (1 - cutoff) * y[i-1] */
    float sample = (float)input[0];
    output[0] = input[0];
    for (int i = 1; i < 1430529; i++)   /* hard-coded length from the question */
    {
        float _in = (float)input[i];
        sample = (cutoff * _in) + (1.0f - cutoff) * sample;
        output[i] = (int)sample;
    }
}
There may still be minor issues to address (for example, the hard-coded array length), but this should at least work as a fairly crude single-pole recursive (IIR) filter.

It's a broken moving-window filter of 10 samples: in the inner loop you actually use only the last of the 10 samples, while your comments say you want 100 samples in your rectangular filter window.
The window-size mismatch alone will give you a filter transition frequency 10x too high.
