How to train a Support Vector Machine (SVM) classifier with OpenCV using facial features?

I want to use an SVM classifier for facial expression detection. I know OpenCV has an SVM API, but I have no clue what the input to train the classifier should be. I have read many papers so far, and all of them say to train the classifier after facial feature detection.
So far, what I have done:
Face detection,
16 facial points calculated in every frame; below is an output of the facial feature detection (screenshot omitted),
A vector which holds the feature points' pixel addresses.
Note: I know how to train the SVM with only positive and negative images; I saw that code here. But I don't know how to combine the facial feature information with it.
Can anybody please help me get started on the classification with SVM?
a. What should the sample input to train the classifier be?
b. How do I train the classifier with these facial feature points?
Regards,

the machine learning algorithms in opencv all come with a similar interface. to train one, you pass an NxM Mat of features (N rows, one feature per row, each of length M) and an Nx1 Mat with the class labels, like this:
// traindata      // trainlabels
f e a t u r e      1
f e a t u r e     -1
f e a t u r e      1
f e a t u r e      1
f e a t u r e     -1
for the prediction, you fill a Mat with 1 row in the same way, and predict() will return the predicted label.
so, let's say your 16 facial points are stored in a vector, you would do something like:
Mat trainData; // start empty
Mat labels;
for all facial_point_vecs:
{
    for( size_t i=0; i<16; i++ )
    {
        trainData.push_back(point[i]); // each Point appends one 2-channel element
    }
    labels.push_back(label); // 1 or -1
}
// now here comes the magic:
// reshape it, so it has N rows, each a flat float array: x,y,x,y,... 32 elements
trainData = trainData.reshape(1, labels.rows); // one row per sample, numpoints*2 cols for x,y
// we have to convert to float:
trainData.convertTo(trainData, CV_32F);
SVM svm; // params omitted for simplicity (but that's where the *real* work starts..)
svm.train( trainData, labels );
// later, predict:
vector<Point> points;
Mat testData = Mat(points).reshape(1, 1); // flattened to 1 row of 32 floats
testData.convertTo(testData, CV_32F);
float p = svm.predict( testData );

Face gesture recognition is a widely researched problem, and the appropriate features you need to use can be found by a very thorough study of the existing literature. Once you have the feature descriptor you believe to be good, you go on to train the SVM with those. Once you have trained the SVM with optimal parameters (found through cross-validation), you start testing the SVM model on unseen data, and you report the accuracy. That, in general, is the pipeline.
Now the part about SVMs:
An SVM is a binary classifier: it can differentiate between two classes (though it can be extended to multiple classes as well). OpenCV has an inbuilt SVM module in the ML library. The SVM class has two functions to begin with: train(..) and predict(..). To train the classifier, you give as input a large number of sample feature descriptors, along with their class labels (usually -1 and +1). Remember the format OpenCV expects: every training sample has to be a row vector, and each row has one corresponding class label in the labels vector. So if you have a descriptor of length n, and m such sample descriptors, your training matrix would be m x n (m rows, each of length n), and the labels vector would be of length m. There is also an SVMParams object that contains properties like the SVM type and values for parameters like C that you'll have to specify.
Once trained, you extract features from an image, convert them into a single-row format, and pass them to predict(), which will tell you which class it belongs to (+1 or -1).
There's also a train_auto() with similar arguments and a similar format that finds the optimum values of the SVM parameters for you.
Also check this detailed SO answer to see an example.
EDIT:
Assuming you have a Feature Descriptor that returns a vector of features, the algorithm would be something like:
Mat trainingMat, labelsMat;
for each image in training database:
    feature = extractFeatures( image[i] );
    Mat feature_row = alignAsRow( feature );
    trainingMat.push_back( feature_row );
    labelsMat.push_back( -1 or 1 ); // depending upon class
mySvmObject.train( trainingMat, labelsMat, Mat(), Mat(), mySvmParams );
Note that extractFeatures() and alignAsRow() are not existing functions; you might need to write them yourself.
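For what it's worth, here is a minimal sketch of how mySvmParams could be filled in against the OpenCV 2.4.x C++ API, reusing trainingMat and labelsMat from above; the concrete values for C and gamma are illustrative, not tuned (cross-validation, e.g. via train_auto(), is the usual way to pick them):

CvSVMParams params;
params.svm_type    = CvSVM::C_SVC;   // classification with a soft-margin penalty
params.kernel_type = CvSVM::RBF;     // radial basis function kernel
params.C           = 1.0;            // penalty term (tune via cross-validation)
params.gamma       = 0.5;            // RBF kernel width (tune this too)
params.term_crit   = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);

CvSVM svm;
svm.train( trainingMat, labelsMat, Mat(), Mat(), params );
// or let OpenCV cross-validate C and gamma itself:
// svm.train_auto( trainingMat, labelsMat, Mat(), Mat(), params );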

Related

What exactly is the output quaternion of slerp?

I'm trying to implement SLERP (described by Ken Shoemake in "Animating Rotation with Quaternion Curves").
I've read up on the topic on Wikipedia (topics: quaternions, 1 and 2) and other sites, and also searched Stack Overflow for this problem. It seems like I understand the theory behind it, but overlook one small detail. I will use w for the scalar value of the quaternion.
So initially I have two 3D vectors. Each vector has a representation in two coordinate systems (C and C'). My goal is to find a third representation of these vectors, in a system "halfway" between the initial two.
So what I do is find the rotation matrix which transforms the vectors from C to C', which seems to work out quite fine.
My next step is to transform this matrix into a quaternion, which also works.
Now my issue is with the formula of slerp, which is:
slerp(q1, q2; u) = (sin((1-u)t) / sin(t)) * q1 + (sin(ut) / sin(t)) * q2
(sorry, I can't upload images yet for a better representation; see source 1)
so I guess here u = 0.5, q1 is the vector I would like to rotate (with w = 0), and q2 is the quaternion I calculated previously. Theta is calculated from the dot product of the normalized vector and the (already) normalized quaternion.
So what I expect is that I get back a vector, rotated either from C to the third coordinate system or from C' to the third coordinate system.
My issue now is that I don't see how I will get a vector and not a quaternion. Meaning, how is it possible that I will get back a quaternion with w = 0, since simply scaling q2 by this factor won't set w to 0? Or is it something else I will get from this function?
What am I overlooking here?
Thanks for your help!
Seems like I figured it out. For someone with the same understanding problem:
slerp simply interpolates between two orientations, meaning between two actual rotations. So in my case, q1 is the quaternion corresponding to the identity matrix (that is, [1, 0, 0, 0]), q2 is the rotation, and u is still 0.5.
With the quaternion I get from this, I have to calculate the rotation as q^-1 v q, where v is the vector I want to rotate (written as a pure quaternion with w = 0). This can be calculated using the Hamilton product.
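To make that concrete, here is a minimal C++ sketch of slerp and of rotating a vector via the Hamilton product; the Quat struct and function names are illustrative (not from the original post), and unit quaternions are assumed throughout:

#include <cmath>

struct Quat { double w, x, y, z; };

// Hamilton product a*b
Quat mul(Quat a, Quat b) {
    return { a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
             a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
             a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
             a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w };
}

// slerp(q1, q2; u) for unit quaternions, exactly as in the formula above
Quat slerp(Quat q1, Quat q2, double u) {
    double d = q1.w*q2.w + q1.x*q2.x + q1.y*q2.y + q1.z*q2.z; // cos(t)
    double t = std::acos(d);
    double s = std::sin(t); // caller must handle t ~ 0 (nearly equal quaternions)
    double a = std::sin((1.0 - u) * t) / s;
    double b = std::sin(u * t) / s;
    return { a*q1.w + b*q2.w, a*q1.x + b*q2.x,
             a*q1.y + b*q2.y, a*q1.z + b*q2.z };
}

// rotate vector v (packed as a pure quaternion, w = 0) using q^-1 * v * q
Quat rotate(Quat q, Quat v) {
    Quat qinv = { q.w, -q.x, -q.y, -q.z }; // conjugate = inverse for unit q
    return mul(mul(qinv, v), q);
}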

Uniformly sampling on hyperplanes

Given the vector size N, I want to generate a vector <s1, s2, ..., sn> such that s1 + s2 + ... + sn = S.
It is known that 0 < S < 1 and each si < S. The generated vectors should also be uniformly distributed.
Any code in C that helps explain would be great!
The code here seems to do the trick, though it's rather complex.
I would probably settle for a simpler rejection-based algorithm, namely: pick an orthonormal basis in n-dimensional space, starting with the hyperplane's normal vector. Transform each of the points (S,0,0,...,0), (0,S,0,...,0), etc. into that basis and store the minimum and maximum along each of the basis vectors. Sample each component uniformly in the new basis, except for the first one (along the normal vector), which is always S; then transform back to the original space and check whether the constraints are satisfied. If they are not, sample again.
P.S. I think this is actually more of a maths question; it could be a good idea to ask at http://maths.stackexchange.com or http://stats.stackexchange.com
[I'll skip the "hyper-" prefix for simplicity]
One possible idea: generate many uniformly distributed points in some enclosing volume and project them onto the target part of the plane.
To get a uniform distribution, the volume must be shaped like the part of the plane, but with added margins along the plane normal.
To uniformly generate points in such a volume, we can enclose it in a cube and reject everything outside of the volume:
1. select a margin M; let's take M = S for simplicity (as long as the margin is positive, it affects only performance)
2. generate a point in the cube [-M,S+M] x [-M,S+M] x [-M,S+M]
3. if the distance to the plane is more than M, reject the point and go to #2
4. project the point onto the plane
5. check that the projection falls into [0,S] x [0,S] x [0,S]; if not, reject it and go to #2
6. add this point to the resulting set, and go to #2 if you need more points
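A minimal C++ sketch of this loop, hard-coded to 3D with the plane x + y + z = S (the struct, the helper names, and the rand()-based generator are just for illustration):

#include <cstdlib>
#include <cmath>

struct Vec3 { double x, y, z; };

// uniform double in [lo, hi] (rand() used only to keep the sketch short)
double urand(double lo, double hi) {
    return lo + (hi - lo) * std::rand() / (double)RAND_MAX;
}

// returns a point uniformly distributed on { x+y+z = S, 0 <= x,y,z <= S }
Vec3 sampleOnPlane(double S)
{
    const double M = S;                        // step 1: margin
    const double inv = 1.0 / std::sqrt(3.0);   // 1/|n| for normal n = (1,1,1)
    for (;;) {
        // step 2: a point in the cube [-M, S+M]^3
        Vec3 p = { urand(-M, S + M), urand(-M, S + M), urand(-M, S + M) };
        // step 3: signed distance to the plane x+y+z = S
        double d = (p.x + p.y + p.z - S) * inv;
        if (std::fabs(d) > M) continue;        // too far, reject
        // step 4: project onto the plane along the unit normal
        p.x -= d * inv; p.y -= d * inv; p.z -= d * inv;
        // step 5: keep only points inside the constrained region
        if (p.x < 0 || p.x > S || p.y < 0 || p.y > S || p.z < 0 || p.z > S)
            continue;
        return p;                              // step 6: accept
    }
}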
The problem can be mapped to that of sampling on linear polytopes, for which the common approaches are Monte Carlo methods, random walks, and hit-and-run methods (see https://www.jmlr.org/papers/volume19/18-158/18-158.pdf for examples and a short comparison). It is related to linear programming, and can be extended to manifolds.
There is also the analysis of polytopes in compositional data analysis, e.g. https://link.springer.com/content/pdf/10.1023/A:1023818214614.pdf, which provides an invertible transformation between the plane and the polytope that can be used for sampling.
If you are working in low dimensions, you can also use rejection sampling. This means you first sample on the plane containing the polytope (defined by your inequalities). This latter method is easy to implement (and wasteful, of course); the GNU Octave code below (I leave re-implementing it in C to the author of the question) is an example.
The first requirement is to get a vector orthogonal to the hyperplane. For a sum of N variables, this is n = (1, ..., 1). The second requirement is a point on the plane. For your example, that could be p = (S, ..., S)/N.
Any point x on the plane then satisfies n^T * (x - p) = 0.
We also assume that x_i >= 0.
With these given, you compute an orthonormal basis on the plane (the null space of the vector n) and then create random combinations on that basis. Finally, you map back to the original space and apply your constraints to the generated samples.
# Example in 3D
dim = 3;
S = 1;
n = ones(dim, 1); # perpendicular vector
p = S * ones(dim, 1) / dim;
# null-space of the perpendicular vector (transposed, i.e. row vector)
# this generates a basis in the plane
V = null (n.');
# These steps are just to reduce the amount of samples that are rejected
# we build a tight bounding box
bb = S * eye(dim); # each column is a corner of the constrained region
# project on the null-space
w_bb = V \ (bb - repmat(p, 1, dim));
wmin = min (w_bb(:));
wmax = max (w_bb(:));
# random combinations and map back
nsamples = 1e3;
w = wmin + (wmax - wmin) * rand(dim - 1, nsamples);
x = V * w + p;
# mask the points inside the polytope
msk = true(1, nsamples);
for i = 1:dim
msk &= (x(i,:) >= 0);
endfor
x_in = x(:, msk); # inside the polytope (your samples)
x_out = x(:, !msk); # outside the polytope
# plot the results
scatter3 (x(1,:), x(2,:), x(3,:), 8, double(msk), 'filled');
hold on
plot3(bb(1,:), bb(2,:), bb(3,:), 'xr')
axis image

I need to translate 3D points relative to a triangle as if the triangle were somewhere else

I posted this on Twitter a while ago, but seeing how none of my followers appears to be a math/programming genius, I'll try my luck here as well. I got here because I found this, which might contain part of my solution.
I described my problem in the following pdf document, containing a picture of what I'm trying to achieve.
To give some more details: I divided the pentagons of a dodecahedron (12 pentagons) into triangles (5 per pentagon, 60 triangles in total), then collected a set of data points relative to each of these triangles.
The idea is to generate terrain meshes for each individual triangle.
To do so, the data must be represented flat, in a 32K x 32K square (an idTech4 megatexture).
I have vaguely heard of transformation matrices, which, when set up properly, could do the trick: passing all the data points through them to have them show up in the right place.
I looked at this source code here but I don't understand how I'm supposed to get the points in and/or out of there, not to mention how to do the setup so I can present each point in turn and get the result point back.
I got as far as identifying the point that belongs in the back right corner. All my 3D points are originally stored as latitude/longitude pairs. I retrieve the 3D vectors this way:
// convert a latitude/longitude pair (in degrees) to a unit vector
coord getcoord(point* p)
{
    coord c;
    c.x = cos(p->lat*pi/180.l) * cos(p->lon*pi/180.l);
    c.y = cos(p->lat*pi/180.l) * sin(p->lon*pi/180.l);
    c.z = sin(p->lat*pi/180.l);
    return c;
};
My thought is that if I can find the center of my triangle, and discover how to offset my angles so that the vector from the center of my sphere to the middle of the triangle moves to 90N, then my points would already be in the right plane if I rotated them all along the same angles. If I then convert them all to 3D and subtract the radius from y, they'll be at the correct y position as well.
Then all I'd need to do is the rotation, the scaling, and the moving to the final position.
There are several kinds of 'centers' for a triangle; I think the one I need is the one that is equidistant to the corners of the triangle (the circumcenter?).
But then there might be an easier approach to the whole problem, so while I continue my own research, perhaps some of you can help point me in the right direction.
It appears as if some sample data is in order; here are a few of these triangles in OBJ file format:
v 0.000000 0.000000 3396.000000
v 2061.582356 0.000000 2698.646733
v 637.063983 1960.681333 2698.646733
f 1 2 3
And another:
v -938.631230 2888.810129 1518.737455
v 637.063983 1960.681333 2698.646733
v 1030.791271 3172.449325 637.064076
f 1 2 3
You will notice that each point is at a distance of 3396 from (0,0,0).
I mentioned 'on the sphere', meaning that the face pointing away from the center of the sphere is the face that needs to become the 'top' when translated into the square.
Theoretically all these triangles should in fact have identical sizes, but due to rounding errors in the math that generated them, this might not be entirely true.
If I'm not mistaken, I already took measures to ensure that the first point you see here is always the one opposite the longest border, so it's the one that should go in the far left corner (testing the above 2 samples confirms this, but I'm measuring anyway just to be sure).
Both legs leading away from this point should theoretically have the same length as well, but again rounding errors might slightly offset that.
If I've done it correctly, then the longer side is 1.113587 times the length of the two shorter sides. Assuming those are identical, then doing some goal seeking in Excel, I can deduce that the final points, assuming I was just translating this triangle, should look like:
v 16384.000000 0.000000 16384.000000
v -16384.000000 0.000000 9916.165306
v 9916.165306 0.000000 -16384.000000
f 1 2 3
So I need to set up the matrix to do this transformation, preferably using the 4x4 matrix as explained below.
I would recommend using transform matrices. The 3D transform matrix is a 4x4 data structure which describes a translation and rotation (and possibly a scale). Once you have a matrix, you can transform a point like so:
result.x = (tmp->pt.x * m->element[0][0]) +
           (tmp->pt.y * m->element[1][0]) +
           (tmp->pt.z * m->element[2][0]) +
           m->element[3][0];
result.y = (tmp->pt.x * m->element[0][1]) +
           (tmp->pt.y * m->element[1][1]) +
           (tmp->pt.z * m->element[2][1]) +
           m->element[3][1];
result.z = (tmp->pt.x * m->element[0][2]) +
           (tmp->pt.y * m->element[1][2]) +
           (tmp->pt.z * m->element[2][2]) +
           m->element[3][2];
double w = (tmp->pt.x * m->element[0][3]) + (tmp->pt.y * m->element[1][3])
         + (tmp->pt.z * m->element[2][3]) + m->element[3][3];
if (w != 0 && w != 1)
{
    result.x /= w; result.y /= w; result.z /= w;
}
This will transform the 3D point pt by the matrix m. If you know a little matrix math, you'll see I'm just multiplying my origin point as a vector against the matrix (and doing a homogeneous divide when the matrix has a projective component). Matrices can be multiplied together to form complicated transformations, so they are very useful.
For details on constructing matrices, I suggest reading this link:
http://en.wikipedia.org/wiki/Transformation_matrix
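As a small illustration of composing transforms, here is a sketch using the same row-vector convention as the code above (translation in the bottom row); the struct and function names are mine, not from the answer:

typedef struct { double element[4][4]; } matrix;

// 4x4 identity
matrix identity(void) {
    matrix m;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            m.element[i][j] = (i == j) ? 1.0 : 0.0;
    return m;
}

// pure translation: with row vectors, the offsets live in row 3,
// matching m->element[3][0..2] in the code above
matrix translation(double tx, double ty, double tz) {
    matrix m = identity();
    m.element[3][0] = tx; m.element[3][1] = ty; m.element[3][2] = tz;
    return m;
}

// c = a * b: transforming a point by c is the same as transforming it
// by a and then by b (row-vector convention)
matrix multiply(const matrix* a, const matrix* b) {
    matrix c;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            c.element[i][j] = 0.0;
            for (int k = 0; k < 4; k++)
                c.element[i][j] += a->element[i][k] * b->element[k][j];
        }
    return c;
}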

Checking Triangle Similarity in C

The problem set asks me to create two triangles, defining them using points, and then checking if they're similar.
I did the first part: created a struct point and a struct triangle, as the professor told us to. To solve the problem of checking similarity, I thought I could use the points to define vectors, and then use the law of cosines to calculate the angles, together with some if statements to check if the triangles are similar.
What code could help me achieve that? I could not find anything that I'd be able to turn into a partial solution.
What you said does the trick!
For the first triangle, take some measures, like you said: an angle (or its cosine, easy to calculate with a dot product) at any vertex, and the lengths of the sides next to it.
For the other triangle, use if-conditions to see if the angle (or its cosine) is the same, and if the ratios of the lengths are also the same. You'd have to do this check from all 3 vertices this way (if at least one fits, the triangles are similar).
A faster way would be to always start with (for instance) the vertex with the smallest angle; then you'd only need to compare once.
Now go code it! :-)
You are given the coordinates of all three points of each triangle. Let us consider two triangles, T1 with vertices A(a1,a2), B(b1,b2), C(c1,c2), and T2 with vertices P(p1,p2), Q(q1,q2), R(r1,r2). Let
a = length of the side opposite vertex A
b = length of the side opposite vertex B
c = length of the side opposite vertex C
and similarly p, q, r for triangle T2.
So, for the two triangles to be similar, the following conditions have to hold:
1. AB/PQ = BC/QR = CA/RP (corresponding sides in the same ratio;
we don't need their directions, so I am considering only magnitudes)
2. angle(A) = angle(P), i.e. angle(BAC) = angle(QPR);
angle(B) = angle(Q), i.e. angle(CBA) = angle(RQP); and
angle(C) = angle(R), i.e. angle(ACB) = angle(PRQ).
Now, you have to use a bit of coordinate geometry/trigonometry here. By the law of cosines:
cos(A) = (b^2 + c^2 - a^2) / (2*b*c)
cos(B) = (c^2 + a^2 - b^2) / (2*c*a)
cos(C) = (a^2 + b^2 - c^2) / (2*a*b)
Note: as cosine is periodic with period 2*pi, make sure you recover the correct angle. Consider using the inverse cosine function (acos), which returns the principal angle in [0, pi]; since a triangle's interior angles all lie in (0, pi), that is exactly what you want here.
(Similarly for P, Q, R of triangle T2.)
Actually, there is another rule which makes this easy to do:
the law of sines: a/sin(A) = b/sin(B) = c/sin(C).
I think you'll want to go through some basic trigonometry for this.
I hope this helps you to do the program.
How to do the program:
Actually, it's fine if you want to use structures. Create a structure with fields for the 3 sides and 3 angles. Then take two variables of that structure type and compare the quantities mentioned above.
If they satisfy the conditions, they are similar triangles.
I hope this helps you.
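As a minimal sketch of the comparison described above (the struct layout and the tolerance are my assumptions, not from the problem set): compute each triangle's three angle cosines via the law of cosines, sort them, and compare. Since cosine is monotone on (0, pi), equal sorted cosines mean equal angles (AAA similarity), regardless of which vertex is which.

#include <math.h>
#include <stdlib.h>

struct point { double x, y; };
struct triangle { struct point v[3]; };

static double dist(struct point a, struct point b) {
    return hypot(b.x - a.x, b.y - a.y);
}

static int cmp_dbl(const void* pa, const void* pb) {
    double d = *(const double*)pa - *(const double*)pb;
    return (d > 0) - (d < 0);
}

// fills cs[3] with cos(A), cos(B), cos(C) via the law of cosines
static void cosines(struct triangle t, double cs[3]) {
    double a = dist(t.v[1], t.v[2]);   // side opposite vertex 0
    double b = dist(t.v[2], t.v[0]);   // side opposite vertex 1
    double c = dist(t.v[0], t.v[1]);   // side opposite vertex 2
    cs[0] = (b*b + c*c - a*a) / (2*b*c);
    cs[1] = (c*c + a*a - b*b) / (2*c*a);
    cs[2] = (a*a + b*b - c*c) / (2*a*b);
}

int similar(struct triangle t1, struct triangle t2) {
    double c1[3], c2[3];
    cosines(t1, c1);
    cosines(t2, c2);
    qsort(c1, 3, sizeof(double), cmp_dbl);
    qsort(c2, 3, sizeof(double), cmp_dbl);
    return fabs(c1[0]-c2[0]) < 1e-9 && fabs(c1[1]-c2[1]) < 1e-9
        && fabs(c1[2]-c2[2]) < 1e-9;
}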

Resampling a sound sample, what filter do I use?

I am trying to resample a signal (a sound sample) from one sampling rate to a higher sampling rate.
Unfortunately it needs some kind of filter, as some 'aliasing' appears to occur, and I'm not familiar with filters. Here is what I came up with:
int i, j, a, b, z;
a = 44100;
b = 8363;
// upsample by a
for(i = z = 0; i < samplen; i++)
for(j = 0; j < a; j++)
cbuf[z++] = sampdata[i];
// some filter goes here???
// downsample by b
for(j = i = 0; i < z; i += b)
buf[j++] = cbuf[i];
The new sample is very similar to the original, but it has some kind of noise.
Can you please tell me what filter I need to add, and preferably some code related to that filter?
Original sound: http://www.mediafire.com/?9gnga1in52d6t4x
Resampled sound: http://www.mediafire.com/?x34h7ggk8n9k8z1
Don't use linear interpolation unless both sample rates (source and destination) are well above the highest frequency in your data. It's a very poor low-pass filter.
What you want is an interpolating low pass filter with a stop-band starting below half the lower of the two sample rates you are dealing with. Common methods of implementing this are upsampling/downsampling using IIR filters, and using poly-phase FIR filters. A windowed Sinc interpolator also works well for this if you don't need real-time performance, and don't want to upsample/downsample. Here's a Windowed Sinc interpolating low-pass filter in Basic, that should be trivial to convert into C.
If you want to use IIR filtering, here's the canonical Cookbook for biquad IIR filters.
If you want the best explanation of audio resampling theory, here's Stanford CCRMA's Resampling page.
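For reference, here is a minimal sketch of the windowed-sinc idea mentioned above, generating the taps of a low-pass FIR filter; the Hamming window and the function name are my choices, and cutoff is normalized to the sample rate. Convolving the upsampled signal with these taps implements the low-pass step the question's code is missing:

#include <vector>
#include <cmath>

// Generate the taps of a windowed-sinc low-pass FIR filter.
// cutoff is a fraction of the sample rate (0 < cutoff < 0.5).
std::vector<double> windowedSincLowpass(double cutoff, int taps)
{
    std::vector<double> h(taps);
    const double PI = 3.14159265358979323846;
    int mid = taps / 2;            // center tap (use an odd tap count)
    double sum = 0.0;
    for (int n = 0; n < taps; n++) {
        int k = n - mid;
        // ideal low-pass impulse response: 2*fc*sinc(2*fc*k)
        double s = (k == 0) ? 2.0 * cutoff
                            : std::sin(2.0 * PI * cutoff * k) / (PI * k);
        // Hamming window to tame the truncation ripple
        double w = 0.54 - 0.46 * std::cos(2.0 * PI * n / (taps - 1));
        h[n] = s * w;
        sum += h[n];
    }
    for (int n = 0; n < taps; n++)
        h[n] /= sum;               // normalize for unity gain at DC
    return h;
}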
Have you considered using a specialised library for this, such as libsamplerate?
It is quite portable and it is developed by people who know how to do things like this correctly. Even if you do not use it directly, you might find the algorithms it implements quite interesting.
A few comments, although I'm only guessing at your actual intent:
You are up-sampling at a rate 44100 times the original sample rate. For example, if your input was at 10kHz, your intermediate cbuf[] would be at 441MHz, which is a tad high for most audio analysis. Assuming you want cbuf[] to be at 44100Hz, you only need to create 44100/OrigSampleRate samples in cbuf[] per sample in sampdata[].
You are incrementing z twice in the up-sampling loop. This causes all odd elements of cbuf[] to keep their original values. I believe this ultimately results in the final buf[] having invalid odd elements, which may be the source of your noise. There is also a potential buffer overflow in cbuf if you didn't create it with at least twice the required number of elements.
As mentioned by Steve, linear interpolation is generally the simplest approach that creates a good result when up-sampling. More complicated up-sampling can be done if desired (polynomials, splines, etc.). Similarly, when down-sampling you may wish to average samples instead of just truncating.
The best resampling code I ever came across: http://shibatch.sourceforge.net/
Take the source and try to learn something from it. It is in nasty condition, but the results of that resampler are far above everything else.
Use FFmpeg and avcodec directly. Here's a good example showing how to do this:
http://tdistler.com/projects/audio-resampling-with-ffmpeg
Before you resample to a lower sample rate, you MUST low-pass filter the original at less than 1/2 the new sample rate, or you will introduce aliasing artifacts. The spectrum folds back upon itself for frequencies above 1/2 the sample rate. So if you want to resample from 44100 to 11025, you must low-pass filter the 44100 signal at 1/2 of 11025, about 5512 Hz. Since faithfulness of reproduction decreases with lower bandwidths, it's best to do this at close to maximum amplitude, say -10 dB; for 16-bit signed samples that is about 10^(-10/20) * 2^(16-1), i.e. roughly +/-10362. The exact algorithms can be found online, since there should be no intellectual-property rights on these old and basic ideas. Do all calculations in double-precision floating point with no rounding, then round the results to their proper integer values and interpolate on the time scale exactly where one sample grid intersects the other. It requires quite some imagination, memory, and previous experience, which then puts you in the realm of the mathematician-physicist programmer. :-O :-)
Linear interpolation works quite well here. The issue is with the author's code: it's not linear interpolation, it's just taking the nearest value without any interpolation at all.
Here is an example of linear interpolation with source sample rate = 5 and destination sample rate = 6:
src val:  5 10  5  0  5      (this is our audio data, 5 samples)
src idx:  0  1  2  3  4      (source positions for 5 samples)
dst idx:  0  1  2  3  4  5   (destination positions for 6 samples)
dst val:  ?  ?  ?  ?  ?  ?
First, let's calculate the scale factor:
scaleCF = srcSampleRate / dstSampleRate = 5 / 6 = 0.8333333
Let's look at dst[2]. For destination index 2, we need to take part of src[1] and part of src[2].
Let's find the nearest source indices and their contribution coefficients:
idxD = (double)idx * scaleCF = 2 * 0.8333333 = 1.6666667
a    = (int)idxD = (int)(1.6666667) = 1
b    = a + 1 = 2
bCF  = idxD - a = 1.6666667 - 1 = 0.6666667
aCF  = 1.0 - bCF = 1.0 - 0.6666667 = 0.3333333
res  = (float)(aCF * Data[a] + bCF * Data[b])
     = 0.3333333 * 10 + 0.6666667 * 5 = 6.6666667
So our destination value at position 2 will be 6.6666667.
The algorithm can be used for downsampling or upsampling. It's probably not the most efficient or the most accurate solution, but it's easy to implement and works pretty well.
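A direct translation of the worked example above into code could look like this sketch; the buffer types and the function name are assumptions:

#include <cstddef>
#include <vector>

// Resample src from srcRate to dstRate by linear interpolation,
// following the worked example above.
std::vector<float> resampleLinear(const std::vector<float>& src,
                                  double srcRate, double dstRate)
{
    double scaleCF = srcRate / dstRate;
    std::size_t dstLen = (std::size_t)(src.size() / scaleCF);
    std::vector<float> dst(dstLen);
    for (std::size_t idx = 0; idx < dstLen; idx++) {
        double idxD = idx * scaleCF;
        std::size_t a = (std::size_t)idxD;                // lower source index
        std::size_t b = (a + 1 < src.size()) ? a + 1 : a; // clamp at the end
        double bCF = idxD - a;      // contribution of src[b]
        double aCF = 1.0 - bCF;     // contribution of src[a]
        dst[idx] = (float)(aCF * src[a] + bCF * src[b]);
    }
    return dst;
}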
