Initial Hidden Markov Model for the Baum-Welch algorithm - C

While trying to write a program for hidden Markov models, I made the simplest possible assumption for the initial HMM of the Baum-Welch algorithm: set everything to a uniform distribution. That is,
A[i][j] = 1/statenumber;
B[i][j] = 1/observationnumber;
P[i] = 1/statenumber;
up to a logarithm to avoid underflow. This has the benefit of not requiring a normalization check.
But so far the algorithm does not actually do much. The emission matrix changes on the first iteration, but not after that, and the transition matrix and initialization vector do not evolve at all. It seems that the gamma matrix never changes.
At first I thought my implementation was at fault, but after trying the same setup on some other HMM libraries, I get the same kind of result.
Is it impossible to converge to the correct HMM using such an initialization, and what is the ideal method to initialize those arrays?

The Baum-Welch algorithm won't work with a uniform initial distribution: a fully uniform HMM is a fixed point of the EM updates. With uniform parameters, the gamma values are identical for every state, so the re-estimated transition matrix and initial vector stay uniform, and every row of the emission matrix collapses to the empirical symbol frequencies after one step, which is exactly the behaviour you observed. Randomize the initial parameters instead, keeping each row normalized.
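For illustration, a minimal sketch of such a randomized initialization in C (the array shapes and the names statenumber and observationnumber follow the question; take logs afterwards if you work in the log domain as described):

#include <stdlib.h>

/* Fill one row with random positive weights, then normalize so the
   entries sum to 1 (row-stochastic). Keeping entries away from 0
   avoids starting with impossible transitions or emissions. */
static void random_stochastic_row(double *row, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        row[i] = 1.0 + (double)rand() / RAND_MAX;
        total += row[i];
    }
    for (int i = 0; i < n; i++)
        row[i] /= total;
}

void init_hmm(double **A, double **B, double *P,
              int statenumber, int observationnumber)
{
    for (int i = 0; i < statenumber; i++) {
        random_stochastic_row(A[i], statenumber);       /* transition row */
        random_stochastic_row(B[i], observationnumber); /* emission row */
    }
    random_stochastic_row(P, statenumber);              /* initial distribution */
}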


Why are the inputs to my guess_nonlinear() all 1s?

The N2 diagram for my full problem is below.
The N2 diagram for the coupled portion of the problem is below.
I have a DirectSolver handling the coupling between LLTForces and ImplicitLiftingLine, and an LNBGS solver handling the coupling between LiftingLineGroup and TestCL.
The gist for the problem is here: https://gist.github.com/eufren/31c0e569ed703b2aea3e2ef5360610f7
I have implemented guess_nonlinear() on ImplicitLiftingLine, which should use various outputs from LLTGeometry to give a good initial guess for the vortex strengths based on a linearised form of the governing equations.
def guess_nonlinear(self, inputs, outputs, resids):
    # Linearised estimate of the vortex strengths, used as the initial
    # guess for the nonlinear solve.
    freestream_unit_vector = inputs['freestream_unit_vector']
    freestream_velocity = inputs['freestream_velocity']
    n = inputs['normal_vectors']
    A = inputs['surface_areas']
    l = inputs['bound_vortices']
    ic_tot = inputs['influence_coefficients_total']
    v_inf = freestream_velocity
    v_inf_vec = v_inf * freestream_unit_vector
    lin_numerator = np.pi * v_inf * A * np.sum(n * v_inf_vec, axis=1)
    lin_denominator = (np.linalg.norm(np.cross(v_inf_vec, l), axis=1)
                       - np.pi * v_inf * A * np.sum(np.sum(n * ic_tot, axis=2), axis=1))
    lin_vtx_str = lin_numerator / lin_denominator
    outputs['vortex_strengths'] = lin_vtx_str
However, when the problem is run for the first time, any inputs not explicitly set with p.set_val() are all 1s. This causes guess_nonlinear() to produce a bad output, and so the system fails to converge.
As far as I can tell, the execution order for the LLT group is correct, and the geometry components should execute before the implicit component. I'm confused as to why this doesn't seem to actually happen when the code is run, and why these inputs instead take their default values.
What do I need to change to get this to work properly? Additionally, I've had difficulty getting LNBGS to converge during optimisation (hence adding guess_nonlinear()); only DirectSolver gets all the way through the optimisation without issues, but it is very slow for large numbers of LLT nodes. How can I improve the linear and nonlinear solver selection, and improve the reliability of the iterative solver?
Note: Thanks for providing a testable example. It made figuring out the answer to your question a lot simpler; your problem was a bit subtle, and I would not have been able to give a good answer without runnable code.
Your first question: "Why are all the inputs 1"
"Short" Answer
You have put the nonlinear solver too high in the model hierarchy, so its scope included a key precursor component that computes your input values. By moving the solver down to a lower level of the model, I was able to ensure that the precursor component (LLTGeometry) ran and had valid outputs before you got to the guess_nonlinear of the implicit component.
Here is what you had (notice that the solver's scope included LLTGeometry even though the data cycle does not require that component).
I moved both the nonlinear solver and the linear solver down into the LLTCycle group, which allows the LLTGeometry component to execute before the nonlinear solver and its guess_nonlinear step are reached.
My fix is only partially correct, since there is a secondary cycle through the TestCL component that also needs a solver and does not have one. However, that cycle still does not involve the LLTGeometry group, so the fully correct fix is to restructure your model to run geometry first, and then put the LLTCycle and TestCL groups together so you can run a solver over just them. That was a bit more hacking than I wanted to do on your test problem, but you can see the general idea from the adjusted N2 above.
Long Answer
The guess_nonlinear sequence in OpenMDAO does NOT run the compute method of explicit components or of groups. It follows the execution hierarchy and calls any guess_nonlinear method it finds. That means any explicit components in your model will NOT be executed, their outputs will not be updated with computed values, and those computed values will not be passed to the inputs of downstream components.
Things get a little tricky when you have deep model hierarchies. The guess_nonlinear method is called as the first step in the nonlinear solver process. If you have a NonlinearRunOnce solver at the top level, it will follow the compute chain down the line, calling compute or solve_nonlinear on each child and doing a data transfer after each one. If one of those children happens to be a group with a nonlinear solver, then that solver will call guess_nonlinear on its children (grandchildren of the top group with the NonlinearRunOnce solver) as its first step. So any outputs that were computed by the siblings of this group will be valid, but none of the outputs at the grandchild level will have been computed yet.
You may be wondering why guess_nonlinear doesn't just call compute for any explicit components. There is a difficult-to-balance trade-off here. If you assume that all explicit components are very cheap to run, then it might make sense to run the compute methods --- or it might not. A lot depends on the structure of the data cycle: if an early component in the group needs guesses from a later one, then running its compute isn't going to help you much at all. Perhaps more importantly, not all explicit components are cheap to run. You might have a very expensive computation, and calling compute as part of the guess process would be far too costly.
The compromise here, if you need some kind of top level guess process, is that you can implement guess_nonlinear at the group level. It's less common to do, but it gives you total control over what happens. You can call whatever you need to call in whatever sequence.
So the absolute key thing to remember is that the only data available to you when guess_nonlinear is called is data that was computed before your containing solver executed. That means anything computed before the model reached the scope of the containing solver (not the scope of the component with the guess_nonlinear method itself).
Your second question: "How can I speed this up when the number of nodes gets large?"
This one is not possible to answer generically at all. I noticed that you have already specified sparse partial derivatives. That is a great start, but if it's still not fast enough, it means you're reaching the limits of what you can do with a DirectSolver. You note that this solver is the only one that gets you through the optimization without issues, which I take to mean that ScipyKrylov and PetscKrylov are not converging the linear system well for you --- at least not by themselves. That's not surprising, as Krylov solvers almost always require some kind of preconditioner... and this is why I can't offer a generic answer. Setting up efficient linear solvers for larger-scale computations is a tricky subject. If you look into the literature, you'll find some good suggestions. You can also study open-source implementations like VSPAero for some tips.
Effectively, you've reached the limit of what simple linear solvers can offer you. From this point forward, OpenMDAO can help a bit by making it easier to implement some preconditioning, but you'll have to tackle the math side yourself.

3D interpolation methods in C (or Fortran), and comparison to Shepard's Method

I would like to interpolate a 3D scalar function f(x, y, z). I have coded up a 3D linear interpolation algorithm (http://en.wikipedia.org/wiki/Trilinear_interpolation). This was not so bad.
However, I would like something more sophisticated, e.g. 3D cubic splines. Is there any open-source, easy-to-use, publicly available code for interpolating a 3D scalar field? I would prefer to use C, but Fortran would be OK as well. I would like to stay away from Matlab.
I have seen similar questions asked here:
Interpolating a scalar field in a 3D space
and
What are some good libraries for 3D interpolation?
The second one was OK with Matlab, which I am not.
As for the first one, the main suggestion was Shepard's method. I am curious how accurate Shepard's method is. For instance, on a uniform grid one can apply Shepard's method only to nearby grid points; in that case, does it tend to be more accurate than linear interpolation or cubic splines? I imagine not, but I wasn't 100% sure, and if it is in fact not better, then I would prefer to find code using something like splines, if any such code is available.
Take a look at Geometric Tools for Interpolation: templated C++ for tricubic interpolation, uniform B-splines, and much more. (einspline, a C library for B-splines in 1d, 2d and 3d, seems to be dormant as of 2013; the author doesn't answer emails. Also, it's C; C++ templates would reduce code bloat when interpolating floats, colors, vecs ...) I haven't used either of these.
On inverse distance weighting, a.k.a. Shepard's method: you can take any number of neighbors (in 3d, 2^3 or 3^3 or 4^3 ...). A general problem is "sagging"; see the plot in the link. The "accuracy" of any interpolation method is really hard to measure: what's "golden", for what class of data, with what noise? And you have two measures to trade off, error at the data and smoothness; for photo enlargement, three: aliasing, blurring and edge halos. There's some theory on spline interpolation of band-limited functions, but afaik none at all for IDW.
Added: what about the bullseye effect?
IDW is a terrible choice in almost every case. It assumes that all of your input data points are local minima or maxima!
Well, IDW can have peaks above nearby data points if there are high peaks far away. For example, in 1d, IDW( [0 0] [1 0] [2 y] ) = y/7 at x = 1/2. But IDW weights ~ 1/distance may be too spiky, and fall off too fast, for some tasks. Interpolation methods and kernels have to be chosen to fit specific data and noise: an art.
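For concreteness, a small sketch that reproduces that 1d example (plain Shepard weights 1/d; the function and variable names are just for illustration):

#include <math.h>
#include <stdio.h>

/* Plain inverse-distance weighting in 1d with weights 1/d. */
double idw1d(const double *xs, const double *vals, int n, double x)
{
    double wsum = 0.0, vsum = 0.0;
    for (int i = 0; i < n; i++) {
        double d = fabs(x - xs[i]);
        if (d == 0.0)
            return vals[i];        /* exact hit on a data point */
        double w = 1.0 / d;
        wsum += w;
        vsum += w * vals[i];
    }
    return vsum / wsum;
}

int main(void)
{
    double xs[]   = {0.0, 1.0, 2.0};
    double vals[] = {0.0, 0.0, 5.0};   /* y = 5 */
    /* expect y/7 = 5/7 ~ 0.714 at x = 1/2, matching the example above */
    printf("%f\n", idw1d(xs, vals, 3, 0.5));
    return 0;
}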
The bspline-fortran library does 2d-6d b-spline interpolation for data on a regular grid. It is written in modern Fortran (there is a basic subroutine interface and also an object-oriented interface).
vspline is a FOSS C++ template library for b-spline processing. It's dimension-agnostic, so you can use it for 3D data. Its focus is on efficiently processing large raster data sets with multithreaded SIMD code. If you're concerned about precision, it can use long doubles for calculations and has extremely precise precomputed constants for maximum fidelity.

Understanding this C function

I'm trying to understand how this function works. I have studied several algorithms that generate sudoku puzzles and found this one.
I tested the function, and it does generate a valid 9x9 Latin square (sudoku) grid.
My problem is that I can't understand how the function works. I do know that the struct is formed by two ints, p and b; p holds the number for the cell in the table. But after that I don't understand why it creates more arrays (tab and tab2), how it checks for a Latin square, etc. In short, I'm completely lost.
I'm not asking for a line-by-line explanation; the general concept behind this function would help me a lot!
Thanks again <3
#include <stdlib.h>

struct sudoku { int p, b; };   /* p holds the cell's value; b is unused by this function */

int sudoku(struct sudoku tabla[9][9], int x, int y)
{
    int tab[9] = {1,1,1,1,1,1,1,1,1};   /* tab[v-1] == 1 iff value v is still legal at (x,y) */
    int i, j;
    /* rule out values already used earlier in row x */
    for (i = 0; i < y; ++i)
    {
        tab[tabla[x][i].p - 1] = 0;
    }
    /* rule out values already used earlier in column y */
    for (i = 0; i < x; ++i)
    {
        tab[tabla[i][y].p - 1] = 0;
    }
    /* rule out values already used in the 3x3 box containing (x,y) */
    for (i = (3*(x/3)); i < (3*(x/3)+3); ++i)
    {
        for (j = (3*(y/3)); j < y; ++j)
        {
            tab[tabla[i][j].p - 1] = 0;
        }
    }
    /* count the legal values... */
    int n = 0;
    for (i = 0; i < 9; ++i)
    {
        n = n + tab[i];
    }
    /* ...and collect them into tab2 */
    int *tab2 = (int*)malloc(sizeof(int) * n);
    j = 0;
    for (i = 0; i < 9; ++i)
    {
        if (tab[i] == 1)
        {
            tab2[j] = i + 1;
            j++;
        }
    }
    /* coordinates of the next cell in the traversal order */
    int ny, nx;
    if (x == 8)
    {
        ny = y + 1;
        nx = 0;
    }
    else
    {
        ny = y;
        nx = x + 1;
    }
    /* try the legal values in random order, recursing on the rest of the board */
    while (n > 0)
    {
        int los = rand() % n;
        tabla[x][y].p = tab2[los];
        tab2[los] = tab2[n-1];   /* remove the tried value from the pool */
        n--;
        if (x == 8 && y == 8)
        {
            return 1;            /* last cell filled: a full grid exists */
        }
        if (sudoku(tabla, nx, ny) == 1)
        {
            return 1;
        }
    }
    return 0;   /* no legal value leads to a solution: backtrack */
}
EDIT
Great, I now understand the structure, thanks to lijie's answer. What I still don't understand is the part that tries out the values in random order. I don't understand how it checks whether the random value placement is valid without calling the part of the code that checks whether the move is legal. Also, after placing the random numbers, is it necessary to check whether the grid is valid again?
Basically, an invocation of the function fills in the positions at and "after" (x, y) in the table tabla; it assumes that the positions "prior" to (x, y) are already filled, and it returns whether a legal "filling in" of the values is possible.
The board is linearized via increasing x, then y.
The first part of the function finds the values that are legal at (x, y); the second part tries those values in a random order and attempts to fill out the rest of the board via a recursive call.
There isn't actually any point in having tab2, because tab can be reused for that purpose, and the function leaks memory (tab2 is never freed). But aside from these issues, it works.
Does this make sense to you?
EDIT
The only tricky area in the part that checks for legal numbers is the third loop (the 3x3 box check). The condition for j is j < y because the cells where j == y are already checked by the second loop.
EDIT2
I nitpick, but the part that counts n and collects the legal values should really be
int n = 0;
for (i = 0; i < 9; ++i) if (tab[i]) tab[n++] = i+1;
hence omitting the need for tab2 (the later code can just use tab and n instead of tab2). The memory leak is thus eliminated.
EDIT
Note that the randomness is only applied to valid values (the order of trying the values is randomized, not the values themselves).
The code follows a standard exhaustive search pattern: try each possible candidate value, immediately returning if the search succeeds, and backtracking with failure if all the candidate values fail.
Try to solve a sudoku yourself, and you'll see that there is inherent recursion in finding a solution. So you have a function that calls itself until the whole board is solved.
As for code, it can be significantly simplified, but it will be for the best if you try to write one yourself.
EDIT:
Here is one in Java; maybe it will be similar to what you are trying to do.
A quick description of the principles - ignoring the example you posted. Hopefully with the idea, you can tie it to the example yourself.
The basic approach is something that was the basis of a lot of "Artificial Intelligence", at least as it was seen until about the end of the 80s. The most general solution to many puzzles is basically to try all possible solutions.
So, first you try all possible solutions with a 1 in the top-left corner, then all possible solutions with a 2 in the top-left corner and so on. You recurse to try the options for the second position, third position and so on. This is called exhaustive search - or "brute force".
The trouble is that this takes pretty much forever, but you can short-cut a lot of pointless searching.
For example, having placed a 1 in the top-left corner, you recurse. You place a 1 in the next position and recurse again, but now you detect that you've violated two rules (two 1s in a row, two 1s in a 3x3 block) even without filling in the rest of the board. So you "backtrack", i.e. exit the recursion to the previous level and advance to putting a 2 in that second position.
This avoids a lot of searching and makes things practical. There are further optimisations as well, if you keep track of the digits still unused in each row, column and block: think about the intersection of those sets.
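For instance, a sketch of that bookkeeping with bitmasks (the types and helper names here are illustrative, not from the code above):

#include <stdint.h>

/* One 9-bit mask per row, column and 3x3 block: bit v-1 is set
   when digit v is still unused there. */
typedef struct {
    uint16_t row[9], col[9], block[9];
} masks;

static void masks_init(masks *m)
{
    for (int i = 0; i < 9; i++)
        m->row[i] = m->col[i] = m->block[i] = 0x1FF;   /* all nine digits available */
}

/* Digits still legal at (r, c): the intersection of the three sets. */
static uint16_t candidates(const masks *m, int r, int c)
{
    return m->row[r] & m->col[c] & m->block[3 * (r / 3) + c / 3];
}

/* Place digit v (1..9) at (r, c), marking it used in all three sets. */
static void place(masks *m, int r, int c, int v)
{
    uint16_t bit = (uint16_t)(1u << (v - 1));
    m->row[r] &= (uint16_t)~bit;
    m->col[c] &= (uint16_t)~bit;
    m->block[3 * (r / 3) + c / 3] &= (uint16_t)~bit;
}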
What I described is actually a solution algorithm (allowing for some cells already being filled in). Generating a random solved sudoku is the same thing, except that for each position you try the digits in random order. This still leaves the problem of deciding which cells to leave blank while ensuring the puzzle can be solved, and (much harder) designing puzzles with a level-of-difficulty setting. But in a way, the basic approach to those problems is already here: you can test whether a particular set of left-blank cells is valid by running the solution algorithm and finding whether (and how many) solutions you get, so you can design a search for a valid set of cells to leave blank.
The level-of-difficulty thing is difficult because it depends on a human perception of difficulty. Hmmm - can I fit "difficult" in there again somewhere...
One approach - design a more sophisticated search algorithm which uses typical human rules-of-thumb in preference to recursive searching, and which judges difficulty as the deepest level of recursion needed. Some rules of thumb might also be judged more advanced than others, so that using them more counts towards difficulty. Obviously difficulty is subjective, so there's no one right answer to how precisely the scoring should be done.
That gives you a measure of difficulty for a particular puzzle. Designing a puzzle directly for a level of difficulty will be hard - but when trying different selections of cells to leave blank, you can try multiple options, keep track of all the difficulty scores, and at the end select the one that was nearest to your target difficulty level.

How do I implement a bandpass filter in C (Purpose: pitch detection)?

I recently asked this question:
I am looking for an algorithm to detect pitch. One of the answers suggested that I use an initial FFT to get the basic frequency response, figure out which frequencies are being voiced, and follow it up with a band-pass filter in each area of interest:
A slightly advanced algorithm could do something like this:
Roughly detect pitch frequency (could be done with DFT).
Bandpass the signal to isolate the pitch frequency.
Count the number of samples between two peaks in the filtered signal.
Now I can do the first step okay (I am coding for iOS, and Apple's Accelerate framework provides FFTs, etc.).
I have made a start here, but I can see the problem: an FFT that would differentiate all of the possible notes one could sing would require a lot of samples, and I don't want to perform too much unnecessary computation as I'm targeting a mobile device.
So I'm trying to get my head around the answer above, but I don't understand how to apply the concept of a band-pass filter in code.
Can anyone help?
Filter design is pretty complex, and there are many techniques. First you have to decide what kind of filter you want to create: finite impulse response (FIR) or infinite impulse response (IIR)? Then you select an algorithm for designing a filter of that type. The Remez algorithm is often used for FIR filter design; go here to see the complexity I was referring to: http://en.wikipedia.org/wiki/Remez_algorithm
Your best bet for creating a filter is to use an existing signal processing library. A quick Google search led me here: http://spuc.sourceforge.net/
Given what your application is, you may want to read about matched filters. I am not sure if they are relevant here, but they might be. http://en.wikipedia.org/wiki/Matched_filter
Look up low-pass and high-pass filters on Wikipedia, then chain them to make a band-pass filter. Wikipedia has code implementations for both:
http://en.wikipedia.org/wiki/Low-pass_filter
http://en.wikipedia.org/wiki/High-pass_filter
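As a rough sketch of that idea (first-order versions of the two Wikipedia pseudocode filters, cascaded; the cutoff parameters are illustrative):

/* First-order low-pass followed by a first-order high-pass: a crude
   band-pass between f_hp and f_lp (choose f_hp < f_lp). */
void bandpass(const float *in, float *out, int count,
              float fs, float f_hp, float f_lp)
{
    float dt = 1.0f / fs;
    float rc_lp = 1.0f / (2.0f * 3.14159265f * f_lp);
    float rc_hp = 1.0f / (2.0f * 3.14159265f * f_hp);
    float a_lp = dt / (rc_lp + dt);      /* low-pass smoothing factor */
    float a_hp = rc_hp / (rc_hp + dt);   /* high-pass retention factor */

    float lp = in[0];       /* low-pass state */
    float prev_lp = lp;
    float hp = 0.0f;        /* high-pass state */
    out[0] = 0.0f;
    for (int i = 1; i < count; i++) {
        lp += a_lp * (in[i] - lp);        /* y[i] = y[i-1] + a*(x[i] - y[i-1]) */
        hp = a_hp * (hp + lp - prev_lp);  /* y[i] = a*(y[i-1] + x[i] - x[i-1]) */
        prev_lp = lp;
        out[i] = hp;
    }
}

A single first-order stage rolls off very gently (6 dB/octave), so for real pitch work you would cascade several stages or use a proper IIR design, but the structure is the same.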
Since you only want to detect a single frequency, it would be overkill to perform a full DFT and then use only one of the output bins.
You could implement the Goertzel algorithm instead. Here is a C implementation used to detect DTMF tones over a phone line, from the FreePBX source code:
/* x: input samples; nmax: number of samples;
   coeff = 2 * cos(2 * pi * f_target / f_sample) */
float goertzel(short x[], int nmax, float coeff) {
    float s, power;
    float sprev, sprev2;
    int n;
    sprev = 0;
    sprev2 = 0;
    for (n = 0; n < nmax; n++) {
        s = x[n] + coeff * sprev - sprev2;   /* second-order resonator */
        sprev2 = sprev;
        sprev = s;
    }
    /* squared magnitude of the target-frequency component */
    power = sprev2 * sprev2 + sprev * sprev - coeff * sprev * sprev2;
    return power;
}
As you can see, the implementation is fairly trivial and quite effective for single frequencies. Check the link for different versions with and without floating point, and how to use it.
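For example (a hedged sketch: the sample rate, target frequency and buffer contents below are made up for demonstration), you might call it like this:

#include <math.h>
#include <stdio.h>

float goertzel(short x[], int nmax, float coeff);   /* from above */

int main(void)
{
    const float fs = 8000.0f;   /* sample rate, Hz */
    const float f0 = 440.0f;    /* frequency to test for, Hz */
    float coeff = 2.0f * cosf(2.0f * 3.14159265f * f0 / fs);

    /* fill a buffer with a 440 Hz tone for demonstration */
    short buf[256];
    for (int n = 0; n < 256; n++)
        buf[n] = (short)(1000.0f * sinf(2.0f * 3.14159265f * f0 * n / fs));

    printf("power at %.0f Hz: %f\n", f0, goertzel(buf, 256, coeff));
    return 0;
}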

mean and variance of image in single pass

I am trying to calculate the mean and variance over a 3x3 window of an image (h x w) in OpenCV. Here is my code. Are there any accuracy issues with it? And is there another, more efficient method to do it in one pass?
int i, j, a, b, pi;
for (i = 1; i < h-1; i++)
{
    for (j = 1; j < w-1; j++)
    {
        int sq = 0, sum = 0;
        double mean = 0;
        for (a = -1; a <= 1; a++)
        {
            for (b = -1; b <= 1; b++)
            {
                pi = data[(i+a)*step + (j+b)];   /* pixel value */
                sq = pi * pi;
                sum = sum + sq;    /* accumulate squares */
                mean = mean + pi;  /* accumulate values (divided by 9 below) */
            }
        }
        mean = mean / 9;
        double soa = mean * mean;   /* square of average */
        double aos = sum / 9.0;     /* average of squares (9.0 avoids integer division) */
        double var = aos - soa;     /* variance = E[X^2] - (E[X])^2 */
        /* ... store mean and var for pixel (i, j) here ... */
    }
}
With respect to computational efficiency, I would recommend doing this in the Fourier domain instead of the spatial (image) domain, using convolutions: a convolution is a simple multiplication in the Fourier domain. Just as in time series, where the spectral density function is the variance decomposed as a function of frequency, one can extend this into two dimensions for an image. It should be much better than nested for-loops.
I don't have the code to hand at the moment, but this technique has been used in algorithms like "fast template matching" for object detection and image registration.
That is a pretty well-researched topic; see e.g. this Wikipedia article on variance calculations.
One issue that sometimes gets mentioned is accumulated numerical error; you need to decide whether that matters for your case. If the values you compute over are similar in range, it is less of an issue.
You should be fine even with floats over such a small number of pixels; typically you need doubles if you're doing this kind of thing over an entire image.
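For reference, a sketch of the one-pass (Welford-style) update described in that article, which avoids the cancellation that E[X^2] - (E[X])^2 can suffer when the mean is large relative to the spread (the names here are illustrative):

/* Welford's one-pass mean/variance accumulator. */
typedef struct {
    long   n;
    double mean;
    double m2;    /* sum of squared deviations from the running mean */
} running_stats;

static void stats_push(running_stats *s, double x)
{
    double delta = x - s->mean;
    s->n++;
    s->mean += delta / s->n;
    s->m2 += delta * (x - s->mean);   /* second factor uses the updated mean */
}

static double stats_variance(const running_stats *s)
{
    return s->n > 0 ? s->m2 / s->n : 0.0;   /* population variance */
}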
You would do better to use integral images for fast local mean and standard deviation calculation!
All you need in that case is to calculate the boundaries of the mask window correctly at each position of the image. It will be much faster.
If you need sample code, just ask; a sketch of the idea is below.
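A rough sketch of that approach (hypothetical names; one integral image of values and one of squared values give the mean and variance of any window in constant time; windows are assumed to lie fully inside the image):

/* Build (h+1) x (w+1) integral images of values and squared values,
   with a zero first row and column. */
void build_integrals(const unsigned char *img, int w, int h, int step,
                     double *ii, double *ii2)
{
    for (int y = 0; y <= h; y++)
        for (int x = 0; x <= w; x++) {
            if (x == 0 || y == 0) {
                ii[y*(w+1) + x] = ii2[y*(w+1) + x] = 0.0;
                continue;
            }
            double v = img[(y-1)*step + (x-1)];
            ii [y*(w+1) + x] = v   + ii [y*(w+1) + x-1] + ii [(y-1)*(w+1) + x] - ii [(y-1)*(w+1) + x-1];
            ii2[y*(w+1) + x] = v*v + ii2[y*(w+1) + x-1] + ii2[(y-1)*(w+1) + x] - ii2[(y-1)*(w+1) + x-1];
        }
}

/* Sum over the half-open window [x0, x1) x [y0, y1) in O(1). */
static double box_sum(const double *ii, int w, int x0, int y0, int x1, int y1)
{
    return ii[y1*(w+1) + x1] - ii[y0*(w+1) + x1]
         - ii[y1*(w+1) + x0] + ii[y0*(w+1) + x0];
}

/* Local mean and variance of a k x k window centred at (x, y). */
void local_stats(const double *ii, const double *ii2, int w,
                 int x, int y, int k, double *mean, double *var)
{
    int r = k / 2;
    double cnt = (double)k * k;
    double s  = box_sum(ii,  w, x - r, y - r, x + r + 1, y + r + 1);
    double s2 = box_sum(ii2, w, x - r, y - r, x + r + 1, y + r + 1);
    *mean = s / cnt;
    *var  = s2 / cnt - (*mean) * (*mean);
}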
