Back propagation through time, understanding - artificial-intelligence

Back_Propagation_Through_Time(a, y)   // a[t] is the input at time t. y[t] is the output
    Unfold the network to contain k instances of f
    do until stopping criteria is met:
        x = the zero-magnitude vector   // x is the current context
        for t from 0 to n - 1           // t is time. n is the length of the training sequence
            Set the network inputs to x, a[t], a[t+1], ..., a[t+k-1]
            p = forward-propagate the inputs over the whole unfolded network
            e = y[t+k] - p;             // error = target - prediction
            Back-propagate the error, e, back across the whole unfolded network
            Update all the weights in the network
            Average the weights in each instance of f together, so that each f is identical
            x = f(x);                   // compute the context for the next time-step
Hey,
I don't understand the algorithm above. Are we creating k instances of a neural network f (k copies), and then passing a[t] and x as inputs? And what is x = f(x)?
Thanks for your help

are we creating k instances of a neural network
Kind of. In a recurrent neural network the output of the network for some input x_i depends on each of the inputs x_(i-j) that came before it. So when you call the network with an input of length k, the network can effectively be "unfolded" into k networks, each feeding into the next sequentially.
After unfolding, the recurrent neural network looks a lot like a traditional neural network with hidden layers. We can use the backpropagation algorithm to then assign error and update our weights.
what is x = f(x)?
The 'context', x, is the 'memory' of the neural network. It is a time-varying value that depends on which iteration of the recurrent network you're in. It's initialized to all zeros at the start because there is no memory yet. We compute it with x = f(x) because the output of the previous instance of f forms part of the input for the next instance (the other part being the external input a[t+i]).
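To make the unfolding concrete, here is a minimal sketch in C of the forward pass through k copies of the same cell f (the tanh cell, the sizes, and the weight names are illustrative assumptions, not the pseudocode's actual network): the context produced by one copy is fed, along with the next external input, into the next copy.

#include <math.h>

#define STATE_SIZE 4   /* size of the context vector x (illustrative) */
#define K          3   /* number of unfolded copies of f (illustrative) */

/* One shared cell f: x_next = tanh(Wx*x + Wa*a + b).
   Every unfolded instance of f uses the SAME weights. */
static void f_step(const double Wx[STATE_SIZE][STATE_SIZE],
                   const double Wa[STATE_SIZE],
                   const double b[STATE_SIZE],
                   const double x[STATE_SIZE],
                   double a,
                   double x_next[STATE_SIZE])
{
    for (int i = 0; i < STATE_SIZE; ++i) {
        double s = b[i] + Wa[i] * a;
        for (int j = 0; j < STATE_SIZE; ++j)
            s += Wx[i][j] * x[j];
        x_next[i] = tanh(s);
    }
}

/* Forward pass over the unfolded network: the context produced by one copy of f
   is fed, together with the next external input a[step], into the next copy;
   this is the "x = f(x)" idea from the pseudocode. */
static void forward_unfolded(const double Wx[STATE_SIZE][STATE_SIZE],
                             const double Wa[STATE_SIZE],
                             const double b[STATE_SIZE],
                             const double a[K],
                             double x[STATE_SIZE])
{
    double x_next[STATE_SIZE];
    for (int step = 0; step < K; ++step) {          /* k unfolded instances */
        f_step(Wx, Wa, b, x, a[step], x_next);
        for (int i = 0; i < STATE_SIZE; ++i)
            x[i] = x_next[i];                       /* carry the context forward */
    }
}

During backpropagation through time, the error flows back through all k copies; because every copy shares the same weights, the per-copy updates are averaged so that each f stays identical, which is exactly the averaging step in the pseudocode.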

Related

Network Formation and Large Arrays in Matlab Optimization

I am getting an error using repmat. My Matlab version is 2017a. "Requested 3711450x2726 (75.4GB) array exceeds maximum array size..." First, some context.
I have an adjacency matrix of social network data, call it D. D is 2725x2725, with 1s denoting a link between agents i and j and 0s otherwise. I have been provided a function and sub-functions for a network formation model. There are K regressors (x variables). The model requires forming a dyad-specific regressor matrix W that is 0.5*N*(N-1) x K. In my data, this is 3711450 x K. For a start, I select only one x variable, so K=1.
In the main function, there are two steps. The first step calculates the joint MLE from a logit. My problem is with array size in the second step, the computation of the variance-covariance matrix. Inside this step, there is a calculation that creates a 3711450 x n (2725) matrix using repmat.
INFO = ((repmat((exp_Xbeta ./ (1+exp_Xbeta).^2),1,K) .* X)'*X);
exp_Xbeta is 3711450 x K and X is a sparse 3711450 x 2725 matrix with Bytes = 178171416 of class double. The error occurs at INFO.
I've tried converting X to a tall matrix but thus far no joy. I've tried adding sparse to the INFO line but again no joy. Anyone have any ideas short of going to a cluster or getting more RAM? Could I somehow convert X from a sparse matrix to a full matrix inside a datastore and then call the datastore using tall? I have not been able to figure out how to do that, if it is possible.
Once INFO is constructed as an array it will be used later in one of the sub-functions. So, it needs to be callable. In case you're curious, INFO is the second derivative matrix.
I have found that producing the INFO matrix all at once was too much for my memory constraints. I split up the steps, but still, repmat and subsequent steps were a problem. Now, I've turned to building up the INFO matrix one step at a time, while never holding more than exp_Xbeta, X, and two vectors in memory. Replacing the construction of INFO with
for i = 1:d
    s1_i = step1(:,1).*X(:,i);
    s1_i = s1_i';
    for j = 1:d
        INFO(i,j) = s1_i*X(:,j);
    end
    clear s1_i;
end
has dropped the memory requirement, though it's slow, and things seem to be working. For anyone interested, below is a little example illustrating the point.
clear all
N = 20
n = 0.5*N*(N-1)
exp_Xbeta = rand(n,1);
X = rand(n,N);
step1 = (exp_Xbeta ./ (1+exp_Xbeta).^2);
[c,d] = size(X);
INFO = zeros(d,d);
for i = 1:d
    s1_i = step1(:,1).*X(:,i);
    s1_i = s1_i';
    for j = 1:d
        INFO(i,j) = s1_i*X(:,j);
    end
    clear s1_i
end
K = 1
INFO2 = ((repmat((exp_Xbeta ./ (1+exp_Xbeta).^2),1,K) .* X)'*X);
% Methods produce equivalent matrices
INFO
INFO2

Neural network for linear regression

I found this great source that matched the exact model I needed: http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/
The important bits go like this.
You have a plot x->y. Each x-value is the sum of "features", or as I'll denote them, z.
So the regression line for the x->y plot would be h(SUM(z_i)), where h(x) is the regression line (function).
In this NN the idea is that each z-value gets assigned a weight in a way that minimizes the least squared error.
The gradient function is used to update weights to minimize error. I believe I may be back propagating incorrectly -- where I update the weights.
So I wrote some code, but my weights aren't being correctly updated.
I may have simply misunderstood a spec from that Stanford post, so that's where I need your help. Can anyone verify I have correctly implemented this NN?
My h(x) function was a simple linear regression on the initial data. In other words, the idea is that the NN will adjust weights so that all data points shift closer to this linear regression.
for (epoch = 0; epoch < 10000; epoch++){
    //loop number of games
    for (game = 1; game < 39; game++){
        sum = 0;
        int temp1 = 0;
        int temp2 = 0;
        //loop number of inputs
        for (i = 0; i < 10; i++){
            //compute sum = x
            temp1 += inputs[game][i] * weights[i];
        }
        for (i = 10; i < 20; i++){
            temp2 += inputs[game][i] * weights[i];
        }
        sum = temp1 - temp2;
        //compute error
        error += .5 * (5.1136 * (sum) + 1.7238 - targets[game]) * (5.1136 * (sum) + 1.7238 - targets[game]);
        printf("error = %G\n", error);
        //backpropagate
        for (i = 0; i < 20; i++){
            weights[i] = sum * (5.1136 * (sum) + 1.7238 - targets[game]); //POSSIBLE ERROR HERE
        }
    }
    printf("Epoch = %d\n", epoch);
    printf("Error = %G\n", error);
}
Please check out Andrew Ng's Machine Learning course on Coursera. He teaches Machine Learning at Stanford and can explain the concept of linear regression better than pretty much anyone else. You can learn the essentials of linear regression in the first lesson.
For linear regression, you are trying to minimize the cost function, which in this case is the sum of squared errors (predicted value - actual value)^2 and is achieved by gradient descent. Solving a problem like this does not require a Neural Network and using one would be rather inefficient.
For this problem, only two values are needed. If you think back to the equation for a line, y = mx + b, there are really only two aspects of a line that you need: The slope and the y-intercept. In linear regression you are looking for the slope and y-intercept that best fits the data.
In this problem, the two values can be represented by theta0 and theta1. theta0 is the y-intercept and theta1 is the slope.
This is the update function for linear regression (gradient descent on the squared-error cost), applied to both parameters simultaneously:
theta_j := theta_j - alpha * (1/m) * SUM_i ( h(x_i) - y_i ) * x_i,j
where m is the number of training examples (for theta0, the multiplier x_i,0 is just 1). Here, theta is a 2 x 1 dimensional vector with theta0 and theta1 inside of it. What you are doing is taking theta and subtracting from it the mean of the summed errors, multiplied by a learning rate alpha (usually small, like 0.1).
Let's say the real perfect fit for the line is y = 2x + 3, but our current slope and y-intercept are both 0. Then the predictions are all too low, the errors (predicted - actual) are negative, and subtracting a negative quantity from theta makes theta increase, moving your prediction closer to the correct value. And vice versa for positive errors. This is a basic example of gradient descent, where you are descending down a slope to minimize the cost (or error) of the model.
This is the type of model you should be trying to implement instead of a Neural Network, which is more complex. Try to gain an understanding of linear and logistic regression with gradient descent before moving on to Neural Networks.
Implementing a linear regression algorithm in C can be rather challenging, especially without vectorization. If you are looking to learn about how a linear regression algorithm works and aren't specifically looking to use C to make it, I recommend using something like MatLab or Octave (a free alternative) to implement it instead. After all, the examples from the post you found use the same format.
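That said, if you do want to stay in C, here is a minimal sketch of the two-parameter gradient descent described above (the toy data, learning rate, and iteration count are made up purely for illustration):

#include <stdio.h>

int main(void)
{
    /* toy data roughly following y = 2x + 3 (made up for illustration) */
    double x[] = {0, 1, 2, 3, 4};
    double y[] = {3, 5, 7, 9, 11};
    int m = 5;                          /* number of training examples */

    double theta0 = 0.0, theta1 = 0.0;  /* y-intercept and slope */
    double alpha = 0.05;                /* learning rate */

    for (int epoch = 0; epoch < 5000; ++epoch) {
        double grad0 = 0.0, grad1 = 0.0;
        for (int i = 0; i < m; ++i) {
            double err = (theta0 + theta1 * x[i]) - y[i];  /* prediction - target */
            grad0 += err;
            grad1 += err * x[i];
        }
        /* simultaneous update: theta_j := theta_j - alpha * (1/m) * sum */
        theta0 -= alpha * grad0 / m;
        theta1 -= alpha * grad1 / m;
    }

    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}

Running this drives theta0 toward 3 and theta1 toward 2, the line the toy data was generated from.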

Matlab: how to use an array in dsolve function?

I have an ODE system of two equations, but I want to reduce it to just one equation, using the result of the other.
1)
t=linspace(0,2,3);
syms x(t) y(t);
inits='x(0)=2,y(0)=0';
[x,y]=dsolve('Dx=y','Dy=(y*2)-x', inits)
x = 2*exp(t) - 2*t*exp(t);
y = -2*t*exp(t)
xx=eval(vectorize(x));
xx = 2.0000; 0; -14.7781
yy=eval(vectorize(y));
yy = 0; -5.4366; -29.5562
After I got the results, I tried to solve it with just one equation, using the xx array in the Dy equation.
2)
inits='y(0)=0';
[y]=dsolve('Dy=(y*2)-xx', inits);
y = xx/2 - (xx*exp(2*t))/2
yy=eval(vectorize(y));
yy = 0; 0; 396.0397
The values are not the same as in the first example. How can I get the same result using the array?
One problem seems to be that variable xx is not symbolic, so the symbolic solver appears to be considering it as a constant.
A bigger problem is that you really haven't identified how exactly you want matlab to treat the xx values as a continuous function, when it's merely a vector of three points! The fact that you are even expecting the output to be the same for the second case indicates some kind of misunderstanding to me.
But to make this definite, let's assume that you want it to treat xx as a ZOH (zero-order-held) continuous signal. To handle this symbolically I believe you would need to construct the ZOH signal explicitly using Heaviside functions.
Alternatively, you could solve it numerically using ode45, for example:
t = [0, 1, 2];
xx = [2, 0, -14.7781];
dydt = @(t,y) 2*y - xx(1 + floor(t));  % hold each xx sample over its unit interval
[tout, yout] = ode45(dydt, t, 0);
This will return yout values of [0, -6.39, -47.21] at the t values of [0, 1, 2] respectively.
This corresponds well with the theoretical values (calculated by hand) of [0, 1-e^2, e^2-e^4] for the ZOH system.
As you can see the above answer is much more in line with your original solution of yy = [0, -5.4366, -29.5562]. Naturally however the two systems differ, as the first one was fed with a continuous time exponential signal whereas the second system was fed with a very coarsely sampled approximation!
You can make the two more similar by sampling at a faster rate (a finer time interval), and also by interpolating the inter-sample points with something better than a ZOH.
Update:
Thank you. Maybe can you help me with creating ZOH continuous signal? How to do that?
In the above example I created a ZOH in my derivative function (dydt) by using the three given points in the xx vector and accessing them with "xx(1+floor(t))". This uses floor to explicitly hold the input constant during the inter-sample (non-integer) times.
Seeing as your ODE is linear, you could also use the matlab function "lsim()" which allows you to directly specify the time vector and input vector, and also to directly specify the type of input interpolation (including ZOH, which is actually the default).
For example:
t=[0,1,2]
x=[2,0,-2*exp(2)]
num=-1
den=[1,-2]
mytf = tf(num,den)
y = lsim(mytf,x,t,0,'zoh');
As with my previous ode45 numerical solution, this gives the identical solution of,
y = [0.00000; -6.38906; -47.20909]
Update (again)
Re the symbolic solver: I don't have access to the Matlab symbolic library, but if you really want to use the symbolic solver, then as I explained previously, you can construct a continuous-time ZOH signal using the heaviside (unit step) function. Something like the following should do it:
syms xzoh(t)
xzoh = xx(1)*heaviside(t) + (xx(2)-xx(1))*heaviside(t-1) + (xx(3)-xx(2))*heaviside(t-2)

How do I implement a bandpass filter given by this equation?

I'm messing around with some audio stuff and the algorithm I'm trying to implement calls for a band-pass second-order FIR filter given by the equation
H(z) = z - z^(-1)
How do I implement such a bandpass filter in C?
I have raw audio data as well as an FFT of that audio data available to me, but I'm still not sure how to implement this filter, nor am I sure exactly what the equation means.
In the image below, I am trying to implement HF3:
z^-1 is a unit (one sample) delay, and z is one sample into the future. So your filter output at sample i depends on the input samples at i-1 and i+1. (In general you can think of z^-n as an n sample delay.)
If you have time domain samples in an input buffer x[], and you want to filter these samples into an output buffer y[], then you would implement the given transfer function like this:
y[i] = x[i+1] - x[i-1]
E.g. in C you might process a buffer of N samples like this:
for (i = 1; i < N - 1; ++i)
{
    y[i] = x[i + 1] - x[i - 1];
}
This is a very simple non-recursive (FIR) filter - it has zeroes at z = +1 and z = -1, so the magnitude response is zero at DC (0 Hz) and at Nyquist (Fs / 2), and it peaks at Fs / 4. So it's a very broad bandpass filter.
A FIR filter multiplies a bunch of adjacent input data samples by coefficients and accumulates them for every output data sample. The number of coefficients will be the same as the number of z terms on the right side of your Z transform.
Note that a bandpass FIR filter usually requires a lot more terms or coefficients, roughly proportional to the steepness of the bandpass transitions desired, so 2 taps is probably too short for any useful bandpass filtering.
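To make the multiply-and-accumulate idea concrete, here is a generic direct-form FIR filter sketch in C (the function name and the choice to treat samples before the start of the buffer as zero are just illustrative):

/* Generic direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].
   Samples before the start of the buffer are treated as zero. */
void fir_filter(const float *x, float *y, int n_samples,
                const float *h, int n_taps)
{
    for (int n = 0; n < n_samples; ++n) {
        float acc = 0.0f;
        for (int k = 0; k < n_taps; ++k) {
            if (n - k >= 0)
                acc += h[k] * x[n - k];   /* multiply-accumulate over past inputs */
        }
        y[n] = acc;
    }
}

With coefficients h = {1, 0, -1} this computes y[n] = x[n] - x[n-2], which is just the causal, one-sample-delayed version of the z - z^(-1) filter above; a sharper bandpass simply uses many more taps.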

Genetic Programming with the Mandelbrot Set

I'm reading a chapter in this fascinating book about using genetic programming to interactively evolve images. Most of the function set consists of simple arithmetic and trig functions (which really operate on and return images). These functions make up the internal nodes of the parse trees that encode our images. The leaves of the tree, or the terminal values, are random numbers and x,y coordinates.
There's a section about adding iterative functions of the complex plane to the function set:
Say the genetics inserts a particular Mandelbrot set as a node somewhere in a bushy tree. The function expects two arguments: mandel(cReal, cImag), treating them as real and imaginary coordinates in the complex plane. If the genome just happened to supply the pixel coordinates (x,y), and mandel() were the root node, you would get the familiar Mset. But chances are that cReal and cImag are themselves the results of whole branches of functions, with many instances of coordinates x,y scattered out among the leaves. Enter the iteration loop, orbit around for a while, and finally escape with some measure of distance to the Mset attractor, such as the number of iterations.
My question is how would you make a Mandelbrot set renderer as a function that takes the real and imaginary coordinates of a point on the complex plane as arguments and returns a rendering of the Mandelbrot set?
I'm not sure if this actually answers your question, but my understanding of the text you quoted is simply that the mandel function is just another function (like multiplication, min, max, addition, etc.) that can appear in your genetic program.
The mandel function, like the multiplication function, takes two arguments (in_1 and in_2) and returns a single value. Whereas the multiplication function just returns in_1 * in_2, the mandel function might do something like this:
int mandel(double in_1, double in_2) {
    double x = 0.0, y = 0.0, xtemp;
    int iteration = 0;
    const int max_iteration = 1000;
    /* escape-time loop: z -> z^2 + c, with c = in_1 + i*in_2 */
    while (x*x + y*y <= 2*2 && iteration < max_iteration) {
        xtemp = x*x - y*y + in_1;
        y = 2*x*y + in_2;
        x = xtemp;
        ++iteration;
    }
    if (iteration == max_iteration) return 0;   /* point is (probably) inside the set */
    else return iteration;                      /* escape time */
}
If your whole genetic program tree consists of nothing but the mandel function with one input as x and the other input as y, then repeatedly evaluating your program for a bunch of different (x,y) values and saving the result will give you a nice picture of the Mandelbrot set.
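For example, a minimal sketch in C of that evaluation loop (the image size, the complex-plane window, and the crude character shading are all just illustrative choices; it assumes the mandel() function above, taking doubles) could look like this:

#include <stdio.h>

int mandel(double in_1, double in_2);   /* the escape-time function shown above */

int main(void)
{
    const int width = 80, height = 40;  /* illustrative "image" size */
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            /* map pixel coordinates into the usual viewing window */
            double cr = -2.5 + 3.5 * col / (width - 1);
            double ci = -1.25 + 2.5 * row / (height - 1);
            int n = mandel(cr, ci);
            putchar(n == 0 ? '#' : (n > 20 ? '+' : '.'));  /* crude shading by escape time */
        }
        putchar('\n');
    }
    return 0;
}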
Of course, the neat thing about genetic programming is that the inputs can be fancier than just x and y. For example, what would the result look like if one input was x and the other input was x + 2*y? Or if one input was x and the other was mandel(x,y)?
