I found this great source that matched the exact model I needed: http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/
The important bits go like this.
You have a plot x->y. Each x-value is the sum of "features", which I'll denote z.
So a regression line for the x->y plot would be h(SUM(z_i)), where h(x) is the regression function.
In this NN the idea is that each z-value gets assigned a weight in a way that minimizes the least squared error.
The gradient function is used to update weights to minimize error. I believe I may be back propagating incorrectly -- where I update the weights.
So I wrote some code, but my weights aren't being correctly updated.
I may have simply misunderstood a spec from that Stanford post, so that's where I need your help. Can anyone verify I have correctly implemented this NN?
My h(x) function was a simple linear regression on the initial data. In other words, the idea is that the NN will adjust weights so that all data points shift closer to this linear regression.
for (epoch = 0; epoch < 10000; epoch++){
//loop number of games
for (game = 1; game < 39; game++){
sum = 0;
int temp1 = 0;
int temp2 = 0;
//loop number of inputs
for (i = 0; i < 10; i++){
//compute sum = x
temp1 += inputs[game][i] * weights[i];
}
for (i = 10; i < 20; i++){
temp2 += inputs[game][i] * weights[i];
}
sum = temp1 - temp2;
//compute error
error += .5 * (5.1136 * (sum) + 1.7238 - targets[game]) * (5.1136 * (sum) + 1.7238 - targets[game]);
printf("error = %G\n", error);
//backpropagate
for (i = 0; i < 20; i++){
weights[i] = sum * (5.1136 * (sum) + 1.7238 - targets[game]); //POSSIBLE ERROR HERE
}
}
printf("Epoch = %d\n", epoch);
printf("Error = %G\n", error);
}
Please check out Andrew Ng's Coursera course. He is the professor of Machine Learning at Stanford and can explain the concept of Linear Regression to you better than pretty much anyone else. You can learn the essentials of linear regression in the first lesson.
For linear regression, you are trying to minimize the cost function, which in this case is the sum of squared errors, (predicted value - actual value)^2, and you minimize it with gradient descent. Solving a problem like this does not require a Neural Network, and using one would be rather inefficient.
For this problem, only two values are needed. If you think back to the equation for a line, y = mx + b, there are really only two aspects of a line that you need: The slope and the y-intercept. In linear regression you are looking for the slope and y-intercept that best fits the data.
In this problem, the two values can be represented by theta0 and theta1. theta0 is the y-intercept and theta1 is the slope.
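This is the update function for Linear Regression:
theta := theta - alpha * (1/m) * SUM_i (h(x_i) - y_i) * x_i
Here, theta is a 2 x 1 vector with theta0 and theta1 inside of it, m is the number of data points, h(x_i) = theta0 + theta1 * x_i is the prediction, and each input is treated as the vector (1, x_i) so that the same rule updates both entries. What you are doing is taking theta and subtracting the mean of the errors (each multiplied by its input), scaled by a learning rate alpha (usually small, like 0.1).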
Let's say the real perfect fit for the line is at y = 2x + 3, but our current slope and y-intercept are both at 0. Then, for positive x, every error (predicted value - actual value) is negative, and when a negative number is subtracted from theta, theta increases, moving your prediction closer to the correct value. The reverse happens for positive errors. This is a basic example of gradient descent, where you are descending down a slope to minimize the cost (or error) of the model.
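As a rough illustration of the update described above, here is a minimal C sketch of batch gradient descent for a single-feature line y = theta0 + theta1 * x. The function names gd_step and fit_line, and the choice to pass the data as two plain arrays, are my own assumptions for the example. It does not model the 20-weight setup from the question.

```c
#include <stddef.h>

/* One batch gradient-descent step for the line y = theta[0] + theta[1] * x.
 * theta[0] is the y-intercept, theta[1] is the slope (names are illustrative). */
void gd_step(double theta[2], const double *x, const double *y,
             size_t m, double alpha)
{
    double grad0 = 0.0, grad1 = 0.0;
    for (size_t i = 0; i < m; i++) {
        /* error = predicted value - actual value */
        double err = theta[0] + theta[1] * x[i] - y[i];
        grad0 += err;          /* the input for theta0 is the constant 1 */
        grad1 += err * x[i];   /* the input for theta1 is x itself */
    }
    /* subtract the mean gradient, scaled by the learning rate */
    theta[0] -= alpha * grad0 / (double)m;
    theta[1] -= alpha * grad1 / (double)m;
}

/* Repeats the step a fixed number of times; no convergence test is done. */
void fit_line(double theta[2], const double *x, const double *y,
              size_t m, double alpha, int epochs)
{
    for (int e = 0; e < epochs; e++)
        gd_step(theta, x, y, m, alpha);
}
```

Note that the weights are adjusted by subtracting a small multiple of the gradient, rather than being overwritten with the gradient itself.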
This is the type of model you should be trying to implement instead of a Neural Network, which is more complex. Try to gain an understanding of linear and logistic regression with gradient descent before moving on to Neural Networks.
Implementing a linear regression algorithm in C can be rather challenging, especially without vectorization. If you are looking to learn how a linear regression algorithm works and aren't specifically looking to use C to make it, I recommend using something like MATLAB or Octave (a free alternative) to implement it instead. After all, the examples from the post you found use the same format.
I am looking for a programming language and a way to automate the following problem.
Given a formula connecting different variables, say g=GM/r^2, and values for all but one of the variables, (g=9.8,M=5E25,G=6.7E-11), how can I program a routine which:
a) Identifies the unknown variable
b) symbolically, solves the formula
c) finally, substitutes values of known variables and solves the equation for the unknown.
I am far from an expert in programming, and the only thing that came to my mind was a slow process in which one checks, variable after variable, which one has not been set to a value, and then uses the appropriate rearrangement of the formula to calculate the unknown.
(e.g. in our case, the program checks variable after variable until it finds that r is the unknown. Then it uses the same formula rearranged to calculate r, i.e. r=sqrt(GM/g))
I am sure there is a fast and elegant language to do this but I cannot figure it out.
Thanks in advance for your help.
Well, here is one way to do it, using Maxima.
eq : g = G * M / r^2;
known_values : [g = 9.8, M = 5e25, G = 6.7e-11];
eq1 : subst (known_values, eq);
remaining_var : listofvars (eq1);
solve (eq1, remaining_var);
=> [r = -5000000*sqrt(670)/7, r = 5000000*sqrt(670)/7]
You can use the function float to get a floating point value from that.
You can probably also do it with Sympy or something else.
For such a simple case, the approach that you suggest is quite appropriate.
The "slow" process might take on the order of 10 nanoseconds to find the unknown variable (using a compiled language), so I wouldn't worry so much.
Indeed, symbolic computation programs are able to derive the explicit formulas, which you can then transcribe into most programming languages:
g=GM/r²
G=gr²/M
M=gr²/G
r=√(GM/g)
// C code (needs #include <math.h>); a value of 0 marks the unknown variable
if (g == 0) g = G * M / (r * r);
else if (G == 0) G = g * r * r / M;
else if (M == 0) M = g * r * r / G;
else r = sqrt(G * M / g);
For instance, the free Microsoft Mathematics can do it. But in this particular case, just do it by hand.
For a completely integrated solution with built-in scripting, think of Mathematica, Mathcad, Maple and the like.
I plan to use the Nguyen-Widrow Algorithm for an NN with multiple hidden layers. While researching, I found a lot of ambiguities and I wish to clarify them.
The following is pseudo code for the Nguyen-Widrow Algorithm
Initialize all weight of hidden layers with random values
For each hidden layer{
beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / number of inputs);
For each synapse{
For each weight{
Adjust weight by dividing by norm of weight for neuron and multiplying by beta value
}
}
}
Just wanted to clarify whether the value of hiddenNeurons is the size of the particular hidden layer, or the size of all the hidden layers within the network. I got mixed up by viewing various sources.
In other words, if I have a network (3-2-2-2-3) (index 0 is input layer, index 4 is output layer), would the value hiddenNeurons be:
NumberOfNeuronsInLayer(1) + NumberOfNeuronsInLayer(2) + NumberOfNeuronsInLayer(3)
Or just
NumberOfNeuronsInLayer(i) , where i is the current Layer I am at
EDIT:
So, the hiddenNeurons value would be the size of the current hidden layer, and the input value would be the size of the previous hidden layer?
The Nguyen-Widrow initialization algorithm is the following:
1. Initialize all weights of hidden layers with (ranged) random values
2. For each hidden layer
2.1 calculate beta value: 0.7 * Nth root of #neurons of current layer, where N = #neurons of input layer
2.2 for each synapse
2.2.1 for each weight
2.2.1.1 adjust weight by dividing by norm of weight for neuron and multiplying by beta value
Source: Encog Java Framework
Sounds to me like you want more precise code. Here are some actual code lines from a project I'm participating in. I hope you read C. It's a bit abstracted and simplified. There is a struct nn that holds the neural net data. You probably have your own abstract data type.
Code lines from my project (somewhat simplified):
float *w = nn->the_weight_array;
float factor = 0.7f * powf( (float) nn->n_hidden, 1.0f / nn->n_input);
unsigned int i;
/* random start values; assumes the array holds n_input * n_hidden weights,
   as the scaling loop below implies */
for( i = nn->n_input * nn->n_hidden; i; i-- )
    *w++ = random_range( -factor, factor );
/* Nguyen/Widrow */
w = nn->the_weight_array;
for( i = nn->n_input; i; i-- ){
    _scale_nguyen_widrow( factor, w, nn->n_hidden );
    w += nn->n_hidden;
}
Functions called:
static void _scale_nguyen_widrow( float factor, float *vec, unsigned int size )
{
unsigned int i;
float magnitude = 0.0f;
for ( i = 0; i < size; i++ )
magnitude += vec[i] * vec[i];
magnitude = sqrtf( magnitude );
for ( i = 0; i < size; i++ )
vec[i] *= factor / magnitude;
}
static inline float random_range( float min, float max)
{
float range = fabsf(max - min);
return ((float)rand()/(float)RAND_MAX) * range + min;
}
Tip:
After you've implemented the Nguyen/Widrow weight initialization, you can add a line of code in the forward calculation that dumps each activation to a file. Then you can check how well the set of neurons hits the activation function. Find the mean and standard deviation. You can even plot it with a plotting tool, e.g. gnuplot. (You need a plotting tool like gnuplot anyway for plotting error rates etc.) I did that for my implementation. The plots came out nice, and the initial learning became much faster using Nguyen/Widrow for my project.
PS: I'm not sure my implementation is correct according to Nguyen and Widrow's intentions. I don't even think I care, as long as it does improve the initial learning.
Good luck,
-Øystein
I am trying to find a way to extend a line segment by a specific distance. For example, if I have a line segment starting at 10,10 and extending to 20,13, and I want to extend the length by 3, how do I compute the new endpoint? I can get the length by sqrt(a^2 + b^2), in this example 10.44, so if I wanted to know the new endpoint from 10,10 with a length of 13.44, what would be computationally the fastest way? I also know the slope but don't know if that helps me any in this case.
You can do it by finding unit vector of your line segment and scale it to your desired length, then translating end-point of your line segment with this vector. Assume your line segment end points are A and B and you want to extend after end-point B (and lenAB is length of line segment).
#include <math.h> // Needed for pow and sqrt.
struct Point
{
    double x;
    double y;
};
...
struct Point A, B, C;
double lenAB;
double length; // Distance by which to extend the segment beyond B.
...
lenAB = sqrt(pow(A.x - B.x, 2.0) + pow(A.y - B.y, 2.0));
C.x = B.x + (B.x - A.x) / lenAB * length;
C.y = B.y + (B.y - A.y) / lenAB * length;
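If you already have the slope, you can compute the new point from the angle alpha = atan(slope), where (old_x, old_y) is the endpoint you are extending from and length is the extra distance (if the segment runs toward decreasing x, add pi to alpha):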
x = old_x + length * cos(alpha);
y = old_y + length * sin(alpha);
I haven't done this in a while so take it with a grain of salt.
I just stumbled upon this after searching for this myself, and to give you an out-of-the-box solution, you can have a look at the code inside a standard Vector class (in any language) and cherry pick what parts you need, but I ended up using one and the code looks like this :
vector.set(x,y);
vector.normalize();
vector.multiply(10000);// scale it by the amount that you want
Good luck !
The algorithm I'm talking about would take x items, each with a value in the range a to b, and a target sum y. Given those values, it would output the probability of that sum occurring.
For example, take two dice. I already know the answers for them (since the number of possible results is so low), so it would be able to tell you each of the possibilities.
The setup would be something like x=2, a=1, b=6. If you wanted to know the chance of the result being 2, it would simply spit out 1/36 (or its float value). If you put in 7 as the total sum, it would tell you 6 (that is, 6 combinations out of 36).
So my question is, is there a simple way to implement such a thing via an algorithm that is already written. Or does one have to go through every single iteration of each and every item to get the total number of combinations for each value.
The exact formula would also, give you the combinations to make each of the values from 1-12.
So it'd give you a distribution array with each one's combinations at each of the indexes. If it does 0-12. Then 0 would have 0, 1 would have 0, and 2 would have 1.
I feel like this is the type of problem that someone else has had and wanted to work with and has the algorithm already done. If anyone has an easy way to do this beyond simply just looping through every possible value would be awesome.
I have no idea why I want to have this problem solved, but for some reason today I just had this feeling of wanting to solve it. And since I've been googling, and using wolfram alpha, along with trying it myself. I think it's time to concede defeat and ask the community.
I'd like the algorithm to be in c, or maybe PHP(even though I'd rather it not be since it's a lot slower). The reason for c is simply because I want raw speed, and I don't want to have to deal with classes or objects.
Pseudo code or C are the best ways to show your algorithm.
Edit:
Also, if I offended the person with a 'b' in his name due to the thing about mathematics I'm sorry. Since I didn't mean to offend, but I wanted to just state that I didn't understand it. But the answer could've stayed on there since I'm sure there are people who might come to this question and understand the mathematics behind it.
Also I cannot decide which way that I want to code this up. I think I'll try using both and then decide which one I like more to see/use inside of my little library.
The final thing that I forgot to say is that, calculus is about four going on five years ago. My understanding of probability, statistics, and randomness come from my own learning via looking at code/reading wikipedia/reading books.
If anyone is curious what sparked this question: I had a book that I was putting off reading called The Drunkard's Walk, and once I saw XKCD 904, I decided it was time to finally get around to reading it. Then two nights ago, whilst I was going to sleep, I pondered how to solve this question via a simple algorithm and was able to think of one.
My understanding of code comes from tinkering with other programs, seeing what happened when I broke something, and then trying my own things whilst looking over the documentation for the built-in functions. I do understand big O notation from reading over Wikipedia (as much as one can from that), and I understand pseudo code because it's so similar to Python. I myself cannot write pseudo code (or so say the teachers in college). I kept getting notes like "make it less like real code, make it more like pseudo code." That hasn't changed.
Edit 2: In case anyone searching for this question just quickly wanted the code, I've included it below. It is licensed under the LGPLv3 since I'm sure that there exist closed-source equivalents of this code.
It should be fairly portable since it is written entirely in c. If one was wanting to make it into an extension in any of the various languages that are written in c, it should take very little effort to do so. I chose to 'mark' the first one that linked to "Ask Dr. Math" as the answer since it was the implementation that I have used for this question.
The first file's name is "sum_probability.c"
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
/*!
* file_name: sum_probability.c
*
* Set of functions to calculate the probability of n number of items adding up to s
* with sides x. The question that this program relates to can be found at the url of
* http://stackoverflow.com/questions/6394120/
*
* Copyright 2011-2019, Macarthur Inbody
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the Lesser GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the Lesser GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/lgpl-3.0.html>.
*
* 2011-06-20 06:03:57 PM -0400
*
* These functions work by any input that is provided. For a function demonstrating it.
* Please look at the second source file at the post of the question on stack overflow.
* It also includes an answer for implementing it using recursion if that is your favored
* way of doing it. I personally do not feel comfortable working with recursion so that is
* why I went with the implementation that I have included.
*
*/
/*
* The following functions implement falling factorials so that we can
* do binomial coefficients more quickly.
* Via the following formula.
*
* K
* PROD (n-(k-i))/i
* i=1;
*
*/
//unsigned int return; assumes n >= k >= 0
unsigned int m_product_c( int k, int n){
    int i=1;
    unsigned int result=1;
    for(i=1;i<=k;++i){
        //multiply before dividing so the integer division stays exact
        result=result*(n-(k-i))/i;
    }
    return result;
}
//float return
float m_product_cf(float n, float k){
int i=1;
float result=1;
for(i=1;i<=k;++i){
result=((n-(k-i))/i)*result;
}
return result;
}
/*
* The following functions calculates the probability of n items with x sides
* that add up to a value of s. The formula for this is included below.
*
* The formula comes from. http://mathforum.org/library/drmath/view/52207.html
*
*s=sum
*n=number of items
*x=sides
*(s-n)/x
* SUM (-1)^k * C(n,k) * C(s-x*k-1,n-1)
* k=0
*
*/
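//returns the number of combinations; divide by pow(range,amount) for the probability
float chance_calc_single(float min, float max, float amount, float desired_result){
    float range=(max-min)+1;
    //shift the sum so that each item runs from 1 to range, as the formula expects
    float sum=desired_result-amount*(min-1);
    //the upper limit of the series is rounded down
    float series=floor((sum-amount)/range);
    float i;
    float chances=0.0;
    for(i=0;i<=series;++i){
        //(-1)^k * C(n,k) * C(s-x*k-1,n-1)
        chances=pow((-1),i)*m_product_cf(amount,i)*m_product_cf(sum-(range*i)-1,amount-1)+chances;
    }
    return chances;
}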
And here is the file that shows the implementation as I said in the previous file.
#include "sum_probability.c"
/*
*
* file_name:test.c
*
* Function showing off the algorithms working. User provides input via a cli
* And it will give you the final result.
*
*/
int main(void){
int amount,min,max,desired_results;
printf("%s","Please enter the amount of items.\n");
scanf("%i",&amount);
printf("%s","Please enter the minimum value allowed.\n");
scanf("%i",&min);
printf("%s","Please enter the maximum value allowed.\n");
scanf("%i",&max);
printf("%s","Please enter the value you wish to have them add up to. \n");
scanf("%i",&desired_results);
printf("The total chances for %i is %f.\n", desired_results, chance_calc_single(min, max, amount, desired_results));
}
First of all, you do not need to worry about the range being from a to b. You can just subtract a*x from y and pretend the range goes from 0 to b-a. (Because each item contributes at least a to the sum... So you can subtract off that a once for each of your x items.)
Second, note that what you are really trying to do is count the number of ways of achieving a particular sum. The probability is just that count divided by a simple exponential (b-a+1)^x.
This problem was covered by "Ask Dr. Math" around a decade ago:
http://mathforum.org/library/drmath/view/52207.html
His formulation is assuming dice numbered from 1 to X, so to use his answer, you probably want to shift your range by a-1 (rather than a) to convert it into that form.
His derivation uses generating functions which I feel deserve a little explanation. The idea is to define a polynomial f(z) such that the coefficient on z^n is the number of ways of rolling n. For a single 6-sided die, for example, this is the generating function:
z + z^2 + z^3 + z^4 + z^5 + z^6
...because there is one way of rolling each number from 1 to 6, and zero ways of rolling anything else.
Now, if you have two generating functions g(z) and h(z) for two sets of dice, it turns out the generating function for the union of those sets is just the product of g and h. (Stare at the "multiply two polynomials" operation for a while to convince yourself this is true.) For example, for two dice, we can just square the above expression to get:
z^2 + 2z^3 + 3z^4 +4z^5 + 5z^6 + 6z^7 + 5z^8 + 4z^9 + 3z^10 + 2z^11 + z^12
Notice how we can read the number of combinations directly off of the coefficients: 1 way to get a 2 (1*z^2), 6 ways to get a 7 (6*z^7), etc.
The cube of the expression would give us the generating function for three dice; the fourth power, four dice; and so on.
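Before going to the closed form, the repeated polynomial multiplication can be done directly. Here is a minimal C sketch that multiplies by z + z^2 + ... + z^sides once per die and returns the coefficient array. The function name dice_sum_counts and the decision to return a malloc'd array are assumptions made for this example. It assumes n >= 0 and sides >= 1, and that the counts fit in an unsigned long long.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd array c of length n*sides + 1 where c[s] is the number
 * of ways n dice with faces 1..sides can sum to s. The caller frees it.
 * Returns NULL if allocation fails. */
unsigned long long *dice_sum_counts(int n, int sides)
{
    size_t len = (size_t)n * (size_t)sides + 1;
    unsigned long long *cur = calloc(len, sizeof *cur);
    unsigned long long *next = calloc(len, sizeof *next);
    if (!cur || !next) {
        free(cur);
        free(next);
        return NULL;
    }

    cur[0] = 1;  /* the polynomial "1": zero dice sum to 0 in exactly one way */
    for (int d = 0; d < n; d++) {
        memset(next, 0, len * sizeof *next);
        /* multiply cur by z + z^2 + ... + z^sides */
        for (size_t s = 0; s < len; s++) {
            if (cur[s] == 0)
                continue;
            for (int f = 1; f <= sides && s + (size_t)f < len; f++)
                next[s + (size_t)f] += cur[s];
        }
        unsigned long long *tmp = cur;
        cur = next;
        next = tmp;
    }
    free(next);
    return cur;
}
```

For two six-sided dice the returned array holds 1 at index 2, 6 at index 7 and 1 at index 12, the same coefficients as the squared polynomial above. Dividing an entry by sides^n gives the probability.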
The power of this formulation comes when you write the generating functions in closed form, multiply, and then expand them again using the Binomial Theorem. I defer to Dr. Math's explanation for the details.
Let's say that f(a, b, n, x) represents the number of ways you can select n numbers between a and b, which sum up to x.
Then notice that:
f(a, b, n, x) = f(0, b-a, n, x-n*a)
Indeed, just take one way to achieve the sum of x and from each of the n numbers subtract a, then the total sum will become x - n*a and each of them will be between 0 and b-a.
Thus it's enough to write code to find f(0, m, n, x).
Now note that, all the ways to achieve the goal, such that the last number is c is:
f(0, m, n-1, x-c)
Indeed, we have n-1 numbers left and want the total sum to be x-c.
Then we have a recursive formula:
f(0,m,n,x) = f(0,m,n-1,x) + f(0,m,n-1,x-1) + ... + f(0,m,n-1,x-m)
where the summands on the right correspond to the last number being equal to 0, 1, ..., m
Now you can implement that using recursion, but this will be too slow.
However, there is a trick called memoized recursion, i.e. you save the result of the function, so that you don't have to compute it again (for the same arguments).
The memoized recursion has to compute and save a result for O(n * x) different (n, x) argument pairs, and each one sums m+1 terms, so the total complexity is O(m * n * x).
Once you have computed the count, you need to divide by the total number of possibilities, which is (m+1)^n, to get the final probability.
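A minimal C sketch of the memoized recursion might look like the following. The names count_memo and count_ways, the flat memo table indexed by (n, x), and the use of -1 as the "not computed yet" marker are all choices made for this example. It returns the count only, and it assumes the count fits in a long long.

```c
#include <stdlib.h>

/* f(0, m, n, x): ways to pick n numbers in 0..m that sum to x.
 * memo[n * width + x] caches results; -1 means "not computed yet". */
static long long count_memo(int m, int n, int x, long long *memo, int width)
{
    if (x < 0)
        return 0;                  /* overshot the target */
    if (n == 0)
        return x == 0 ? 1 : 0;     /* no numbers left: only sum 0 works */

    long long *slot = &memo[(size_t)n * (size_t)width + (size_t)x];
    if (*slot < 0) {
        long long total = 0;
        /* the last number is c = 0, 1, ..., m */
        for (int c = 0; c <= m; c++)
            total += count_memo(m, n - 1, x - c, memo, width);
        *slot = total;
    }
    return *slot;
}

/* f(a, b, n, x) via the shift f(a, b, n, x) = f(0, b-a, n, x - n*a).
 * Returns -1 if the memo table cannot be allocated. */
long long count_ways(int a, int b, int n, int x)
{
    int m = b - a;
    int target = x - n * a;
    if (n < 0 || m < 0 || target < 0)
        return 0;

    int width = target + 1;
    size_t cells = (size_t)(n + 1) * (size_t)width;
    long long *memo = malloc(cells * sizeof *memo);
    if (!memo)
        return -1;
    for (size_t i = 0; i < cells; i++)
        memo[i] = -1;

    long long result = count_memo(m, n, target, memo, width);
    free(memo);
    return result;
}
```

For example, count_ways(1, 6, 2, 7) gives 6, and dividing by 6^2 = 36 gives the probability of rolling a 7 with two dice.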
To arrive at a numerical value for the probability of an event, you have to know 2 things:
1. the total number of possible outcomes
2. how many outcomes within that total equal the outcome 'y' whose probability you seek.
In pseudocode:
numPossibleOutcomes = calcNumOutcomes(x, a, b);
numSpecificOutcomes = calcSpecificOutcome(x, a, b, y);
probabilityOfOutcome = numSpecificOutcomes / numPossibleOutcomes;
Then just code up the 2 functions above which should be easy.
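One possible way to fill in those two functions is sketched below in C. It simply enumerates every combination, so it only suits small inputs. The names are hypothetical, and it assumes x >= 1 and a <= b. The total is (b - a + 1)^x, and the specific count comes from stepping through all combinations like an odometer.

```c
#include <stdlib.h>

/* Total number of outcomes: (b - a + 1)^x. */
double calc_num_outcomes(int x, int a, int b)
{
    double total = 1.0;
    for (int i = 0; i < x; i++)
        total *= (double)(b - a + 1);
    return total;
}

/* Brute force: counts the combinations of x items in a..b that sum to y.
 * Returns -1.0 if memory cannot be allocated. */
double calc_specific_outcomes(int x, int a, int b, int y)
{
    int *item = malloc((size_t)x * sizeof *item);
    if (!item)
        return -1.0;
    for (int i = 0; i < x; i++)
        item[i] = a;

    double hits = 0.0;
    for (;;) {
        int sum = 0;
        for (int i = 0; i < x; i++)
            sum += item[i];
        if (sum == y)
            hits += 1.0;

        /* advance like an odometer: roll over every item that is at b */
        int pos = 0;
        while (pos < x && item[pos] == b)
            item[pos++] = a;
        if (pos == x)
            break;                 /* every combination has been visited */
        item[pos]++;
    }
    free(item);
    return hits;
}

double probability_of_sum(int x, int a, int b, int y)
{
    return calc_specific_outcomes(x, a, b, y) / calc_num_outcomes(x, a, b);
}
```

The running time grows as (b - a + 1)^x, which is exactly the looping through every value that the question hoped to avoid, so treat this as a reference to check faster methods against.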
To get all possibilities, you could make a map of values:
for (i=a to b) {
for (j=a to b) {
map.put(i+j, 1+map.get(i+j))
}
}
For a more efficient way to count sums with two dice, you could use the pattern
six 7's, five 6's, four 5's, three 4's, two 3's, one 2.
The pattern holds for any n x n grid: there will be n ways to sum to n+1, with one fewer way for each step the sum moves above or below that.
The function below counts the possibilities; for example, Count(6, s) for s = 2 to 12 gives the number of ways two six-sided dice sum to s.
import math
def Count(poss,sumto):
    return poss - math.fabs(sumto-(poss+1))
Edit: In C this would be:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int count(int poss, int sumto)
{
return poss - abs(sumto-(poss+1));
}
int main(int argc, char** argv) {
printf("With two dice,\n");
int i;
for (i=1; i<= 13; i++)
{
printf("%d ways to sum to %d\n",count(6,i),i);
}
return (EXIT_SUCCESS);
}
gives:
With two dice,
0 ways to sum to 1
1 ways to sum to 2
2 ways to sum to 3
3 ways to sum to 4
4 ways to sum to 5
5 ways to sum to 6
6 ways to sum to 7
5 ways to sum to 8
4 ways to sum to 9
3 ways to sum to 10
2 ways to sum to 11
1 ways to sum to 12
0 ways to sum to 13