Viola-Jones weak classifier explanation

I have been trying to understand the paper by Viola and Jones on face detection. I am not sure what the parameters of this equation from section 3 mean:
h(x, f, p, theta) = 1 if p f(x) < p theta
What I understood is that the feature value f(x) is obtained by evaluating any of the 5 basic features explained at the beginning of the paper over the integral image of x.
What I can't understand properly is the threshold 'theta' and the polarity 'p'. Does p refer to positive and negative images, and can it take the value +1 or -1? And how do I calculate theta? This equation is vital to the boosting section, so I can't go further. Please help, and tell me if I am not making myself clear enough.

You must understand that the weak classifier h uses a Haar-like feature f to classify an image subwindow x. The parameter p is the polarity: if it equals -1, it simply inverts the direction of the comparison in the condition p f(x) < p theta.
The parameter theta is simply a threshold. Say, for instance, that p = +1. If f(x) < theta, then h(x, f, p, theta) = +1, i.e., the weak classifier considers x a face. With p = -1 the condition becomes f(x) > theta instead.
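As for how theta is obtained: in the boosting step the threshold and polarity are chosen per feature so that the weighted classification error on the training set is as small as possible. The following is only a minimal Python sketch of that idea, not the paper's implementation; the function names `weak_classify` and `train_stump`, the brute-force search over midpoints, and the toy data are all assumptions made for illustration.

```python
def weak_classify(feature_value, polarity, theta):
    """Return 1 (face) if polarity * f(x) < polarity * theta, else 0."""
    return 1 if polarity * feature_value < polarity * theta else 0


def train_stump(feature_values, labels, weights):
    """Pick (theta, polarity) with the lowest weighted error for one feature."""
    values = sorted(set(feature_values))
    # Candidate thresholds: one below all values, the midpoints, one above all values.
    candidates = [values[0] - 1.0]
    candidates += [(lo + hi) / 2.0 for lo, hi in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    best = None
    for theta in candidates:
        for polarity in (+1, -1):
            # Weighted error: total weight of the misclassified examples.
            error = sum(
                w
                for v, label, w in zip(feature_values, labels, weights)
                if weak_classify(v, polarity, theta) != label
            )
            if best is None or error < best[2]:
                best = (theta, polarity, error)
    return best


# Toy data (hypothetical): faces (label 1) happen to have small feature values.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [1, 1, 1, 0, 0, 0]
weights = [1.0 / 6] * 6
theta, polarity, error = train_stump(values, labels, weights)
print(theta, polarity, error)
```

If the labels are flipped so that faces have the large feature values, the same search returns polarity -1, which is exactly the "inverted comparison" case described above.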

Related

Solving a second-order differential equation with both Dirichlet and Neumann boundary conditions

I want to solve Fourier's law for the heat equation
of an isolated electrically heated rod:
diff(T, x, 2) + Q/k = 0
with a Dirichlet boundary condition of
T = TL at x = L
and a Neumann boundary condition of
diff(T, x) = -q/k at x = 0
where
x is the length coordinate
L is the length of the rod
k is the thermal conductivity of the material (assumed constant)
Q is the internal heat generation per unit length
q is the heat load from the left side
TL is the ambient temperature on the right side
To solve the differential equation I used ode2:
eqn : 'diff(T, x, 2) + Q / k = 0;
sol : ode2(eqn, T, x);
giving the correct general form of
however when applying the boundary conditions using:
bc2(sol, x=0, 'diff(T, x)=-q/k, x=L, T=TL);
I get the wrong answer of
while what I expected to see was
I would appreciate it if you could help me know what is the problem and how I can resolve it.

In this specific case, because the Neumann boundary condition is applied at x = 0, I could use ic2 with both conditions given at a single point:
ic2(sol, x=L, T=TL, 'diff(T, x)=-q/k);
to get the correct result:
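As a cross-check that does not depend on Maxima: integrating T'' = -Q/k twice and applying T'(0) = -q/k and T(L) = TL by hand gives T(x) = TL + q (L - x)/k + Q (L^2 - x^2)/(2 k). The Python sketch below only verifies that closed form numerically with central differences; the function names and the sample parameter values are assumptions chosen for illustration.

```python
def temperature(x, L, k, Q, q, TL):
    """Closed-form T(x) for T'' + Q/k = 0, T'(0) = -q/k, T(L) = TL."""
    return TL + q * (L - x) / k + Q * (L**2 - x**2) / (2.0 * k)


def check_solution(L=2.0, k=5.0, Q=3.0, q=4.0, TL=300.0, h=1e-2):
    """Return (T'(0), T''(L/2), T(L)) estimated with central differences."""
    def T(x):
        return temperature(x, L, k, Q, q, TL)

    slope_at_0 = (T(h) - T(-h)) / (2.0 * h)  # should equal -q/k
    x_mid = L / 2.0
    curvature = (T(x_mid + h) - 2.0 * T(x_mid) + T(x_mid - h)) / h**2  # should equal -Q/k
    return slope_at_0, curvature, T(L)


slope, curvature, t_end = check_solution()
print(slope, curvature, t_end)
```

Because T is a quadratic in x, the central differences are exact up to rounding, so the printed values should match -q/k, -Q/k and TL.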

Understanding "well founded" proofs in Coq

I'm writing a fixpoint that requires an integer to be incremented "towards" zero at every iteration. This is too complicated for Coq to recognize as a decreasing argument automatically, and I'm trying to prove that my fixpoint will terminate.
I have been copying (what I believe is) an example of a well-foundedness proof for a step function on Z from the standard library. (Here)
Require Import ZArith.Zwf.
Section wf_proof_wf_inc.
Variable c : Z.
Let Z_increment (z:Z) := (z + ((Z.sgn c) * (-1)))%Z.
Lemma Zwf_wf_inc : well_founded (Zwf c).
Proof.
unfold well_founded.
intros a.
Qed.
End wf_proof_wf_inc.
which creates the following context:
c : Z
wf_inc := fun z : Z => (z + Z.sgn c * -1)%Z : Z -> Z
a : Z
============================
Acc (Zwf c) a
My question is what does this goal actually mean?
I thought that the goal I'd have to prove for this would at least involve the step function that I want to show has the "well founded" property, "Z_increment".
The most useful explanation I have looked at is this but I've never worked with the list type that it uses and it doesn't explain what is meant by terms like "accessible".

Basically, you don't need to do a well founded proof, you just need to prove that your function decreases the (natural number) abs(z). More concretely, you can implement abs (z:Z) : nat := z_to_nat (z * Z.sgn z) (with some appropriate conversion to nat) and then use this as a measure with Function, something like Function foo z {measure abs z} := ....
The well founded business is for showing relations are well-founded: the idea is that you can prove your function terminates by showing it "decreases" some well-founded relation R (think of it as <); that is, the definition of f x makes recursive subcalls f y only when R y x. For this to work R has to be well-founded, which intuitively means it has no infinitely descending chains. CPDT's general recursion chapter has a really good explanation of how this really works.
How does this relate to what you're doing? The standard library proves that, for all lower bounds c, x < y is a well-founded relation in Z if additionally it's only applied to y >= c. I don't think this applies to you - instead you move towards zero, so you can just decrease abs z with the usual < relation on nats. The standard library already has a proof that this relation is well founded, and that's what Function ... {measure ...} uses.
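To make the "measure" idea concrete outside of Coq, here is a small Python sketch. It is not a proof, only an illustration: `step_toward_zero` is a hypothetical stand-in for the question's step function, and `measure` plays the role of abs z. The assertion inside the loop is the obligation that `Function ... {measure ...}` would ask you to prove: every recursive call is made on an argument with a strictly smaller natural-number measure.

```python
def sgn(z):
    """Sign of an integer: -1, 0 or 1."""
    return (z > 0) - (z < 0)


def step_toward_zero(z):
    """Hypothetical step function: move z one unit closer to 0."""
    return z - sgn(z)


def measure(z):
    """Natural-number measure that must shrink on every step."""
    return abs(z)


def run_until_zero(z):
    """Iterate the step and check the measure strictly decreases each time."""
    steps = 0
    while z != 0:
        nxt = step_toward_zero(z)
        # The termination argument: 0 <= measure(next) < measure(current).
        assert 0 <= measure(nxt) < measure(z)
        z = nxt
        steps += 1
    return steps


print(run_until_zero(-5), run_until_zero(7))
```

Since a natural number can only decrease finitely many times, the loop cannot run forever, which is the intuition behind "no infinitely descending chains".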

C - generate random numbers within an interval with respect to a mean

I need to generate a set of random numbers within an interval which also happens to have a mean value. For instance min = 1000, max = 10000 and a mean of 7000. I know how to create numbers within a range but I am struggling with the mean value thing. Is there a function that I can use?

What you're looking for is done most easily with the so-called acceptance-rejection method.
Split your interval into smaller intervals.
Specify a probability density function (PDF); it can be a very simple one too, like a step function. For a Gaussian distribution you would have left and right steps lower than your middle step (see the image below that has a more general distribution).
Generate a random number x in the whole interval and a second random number y between 0 and the maximum of your PDF. If y is greater than the value of your PDF at x, reject the generated number x.
Repeat the steps until you get the desired number of points.
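The steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under assumed choices, not the code of this answer: the helper names `sample_by_rejection` and `bump`, the bump's center (7000) and width (1500), and the seed are all made up for the example.

```python
import math
import random


def sample_by_rejection(pdf, lo, hi, pdf_max, rng):
    """Draw one x in [lo, hi] distributed according to pdf (need not be normalized)."""
    while True:
        x = rng.uniform(lo, hi)        # candidate point in the interval
        y = rng.uniform(0.0, pdf_max)  # random height inside the bounding box
        if y <= pdf(x):                # the point lies under the curve: accept x
            return x


def bump(x, center=7000.0, width=1500.0):
    """Unnormalized Gaussian bump; its maximum value is 1.0 at the center."""
    return math.exp(-((x - center) ** 2) / (2.0 * width**2))


rng = random.Random(12345)
samples = [sample_by_rejection(bump, 1000.0, 10000.0, 1.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(min(samples), max(samples), sample_mean)
```

Note that the sample mean comes out somewhat below 7000 here: the interval [1000, 10000] cuts off more of the bump's right tail than of its left tail, so centering the PDF on the desired mean is only approximate when the interval is not symmetric around it.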
EDIT 1
Proof of concept on a Gaussian PDF.
Ok, so the basic idea is shown in graph (a).
1. Define/pick your probability density function (PDF). A PDF is a function of, statistically speaking, a random variable and describes the probability of finding the value x in a measurement/experiment. A function can be a PDF of a random variable x if it satisfies: 1) f(x) >= 0 and 2) it's normalized (meaning it sums, or integrates, up to the value 1).
2. Get the maximum (max) and "zero points" (z1 < z2) of the PDF. Some PDFs have their zero points at infinity. In that case, determine cutoff points (z1, z2) such that PDF(x) < eta for x < z1 or x > z2, where you pick eta yourself. Basically, set some small-ish value eta and then say your zero points are those values beyond which PDF(x) is smaller than eta.
3. Define the interval Ch(z1, z2, max) of your random generator. This is the interval in which you generate your random variables.
4. Generate a random variable x such that z1<x<z2.
5. Generate a second unrelated random variable y in the range (0, max). If the value of y is larger than PDF(x), reject both randomly generated values (x,y) and go back to step 4. If the generated value y is smaller than PDF(x), accept the value x as the randomly generated point on a distribution and return it.
Here's the code that reproduces similar behavior for a Gaussian PDF.
#include "Random.h"
#include <cmath>
#include <fstream>
#include <utility>
using namespace std;
//Unnormalized Gaussian with height a, center b and width c.
double gaus(double a, double b, double c, double x)
{
    return a*exp( -((x-b)*(x-b)/(2*c*c)) );
}
pair<double, double> random_on_a_gaus_distribution(double inter_a, double inter_b)
{
    double a = 1.0; //currently parameters for the Gaussian
    double b = 2.0; //are defined here to avoid having
    double c = 3.0; //a long function declaration line.
    double x = kiss::Ran(inter_a, inter_b);
    double y = kiss::Ran(0.0, 1.0);
    while (y>gaus(a,b,c,x)) //keep creating values until step 5. is satisfied.
    {
        x = kiss::Ran(inter_a, inter_b); //this is interval (z1, z2)
        y = kiss::Ran(0.0, 1.0); //this is the interval (0, max)
    }
    //I return (x,y) by value for plot reasons, only x is the randomly
    //generated value you're looking for.
    return make_pair(x, y);
}
int main()
{
    ofstream f;
    f.open("test.txt");
    for(int i=0; i<100000; i++)
    {
        //see below how I got -5 and 10 to be my interval (z1, z2)
        pair<double, double> p = random_on_a_gaus_distribution(-5.0, 10.0);
        f << p.first<<","<<p.second<<endl;
    }
    f.close();
    return 0;
}
Step 1
So first we define a general look of a Gaussian PDF in a function called gaus. Simple.
Then we define a function random_on_a_gaus_distribution which uses a well defined Gaussian function. In an experiment/measurement we would get the coefficients a, b, c by fitting our function. I picked some arbitrary ones (1, 2, 3) for this example; you can pick the ones that satisfy your HW assignment (that is: coefficients that make a Gaussian that has a mean of 7000).
Step 2 and 3
I used Wolfram Mathematica to plot gaus with parameters 1, 2, 3 to see what would be the most appropriate values for max and (z1, z2). You can see the graph yourself. The maximum of the function is 1.0, and via the ancient method of science called eyeballin' I estimated that the cutoff points are -5.0 and 10.0.
To make random_on_a_gaus_distribution more general you could follow step 2) more rigorously: define eta and then evaluate your function at successive points until the PDF gets smaller than eta. The danger with this is that your cutoff points can be very far apart, and this could take long for very slowly varying functions. Additionally you have to find the maximum yourself. This is generally tricky; however, it is the same problem as minimizing the negative of the function. This can also be tricky in the general case but not "undoable". The easiest way is to cheat a bit like I did and just hard-code this for a couple of functions only.
Step 4 and 5
And then you bash away. Just keep creating new points until you get a satisfactory hit. DO NOTICE that the returned number x is a random number. You wouldn't be able to find a logical link between two successively created x values, or between the first created x and the millionth.
However, the number of accepted x values in an interval around the x_max of our distribution is greater than the number of x values accepted in intervals for which PDF(x) < PDF(x_max).
This just means that your random numbers will be weighted within the chosen interval in such a manner that a larger PDF value at x corresponds to more random points accepted in a small interval around x than around any other value xi for which PDF(xi)<PDF(x).
I returned both x and y to be able to plot the graph below; however, what you're looking to return is actually just the x. I did the plots with matplotlib.
It's probably better to show just a histogram of the randomly created variable on a distribution. This shows that the x values that are around the mean value of your PDF are the most likely ones to get accepted, and therefore more randomly created variables with those approximate values will be created.
Additionally I assume you would be interested in the implementation of the kiss random number generator. IT IS VERY IMPORTANT THAT YOU HAVE A VERY GOOD GENERATOR. I dare say that, to an extent, kiss probably doesn't cut it (the Mersenne Twister is used often).
Random.h
#pragma once
#include <stdlib.h>
const unsigned RNG_MAX=4294967295;
namespace kiss{
    // unsigned int kiss_z, kiss_w, kiss_jsr, kiss_jcong;
    unsigned int RanUns();
    void RunGen();
    double Ran0(int upper_border);
    double Ran(double bottom_border, double upper_border);
}
namespace Crand{
    double Ran0(int upper_border);
    double Ran(double bottom_border, double upper_border);
}
Kiss.cpp
#include "Random.h"
unsigned int kiss_z = 123456789; //from 1 to a billion
unsigned int kiss_w = 378295763; //from 1 to a billion
unsigned int kiss_jsr = 294827495; //from 1 to RNG_MAX
unsigned int kiss_jcong = 495749385; //from 0 to RNG_MAX
//KISS99*
//Author: George Marsaglia
unsigned int kiss::RanUns()
{
    kiss_z=36969*(kiss_z&65535)+(kiss_z>>16);
    kiss_w=18000*(kiss_w&65535)+(kiss_w>>16);
    kiss_jsr^=(kiss_jsr<<13);
    kiss_jsr^=(kiss_jsr>>17);
    kiss_jsr^=(kiss_jsr<<5);
    kiss_jcong=69069*kiss_jcong+1234567;
    return (((kiss_z<<16)+kiss_w)^kiss_jcong)+kiss_jsr;
}
//Warm up the generator by discarding the first 2000 values.
void kiss::RunGen()
{
    for (int i=0; i<2000; i++)
        kiss::RanUns();
}
//Uniform integer in [0, upper_border), returned as a double.
double kiss::Ran0(int upper_border)
{
    unsigned velicinaIntervala = RNG_MAX / upper_border; //interval size
    unsigned granicaIzbora= velicinaIntervala*upper_border; //selection limit
    unsigned slucajniBroj = kiss::RanUns(); //random number
    while(slucajniBroj>=granicaIzbora) //reject values that would bias the result
        slucajniBroj = kiss::RanUns();
    return slucajniBroj/velicinaIntervala;
}
double kiss::Ran (double bottom_border, double upper_border)
{
    return bottom_border+(upper_border-bottom_border)*kiss::Ran0(100000)/(100001.0);
}
Additionally there are the standard C random generators:
CRands.cpp
#include "Random.h"
//standard pseudo-random generators from C
double Crand::Ran0(int upper_border)
{
    return rand()%upper_border;
}
double Crand::Ran (double bottom_border, double upper_border)
{
    //scale rand() to [0, 1) and shift it into [bottom_border, upper_border)
    return bottom_border+(upper_border-bottom_border)*rand()/((double)RAND_MAX+1);
}
It's also worth commenting on the (b) graph above. When you have a very badly behaved PDF, PDF(x) will vary significantly between large values and very small ones.
The issue with that is that the box Ch(x) will match the extreme values of the PDF well, but since we also create a random variable y where PDF(x) is small, the chances of accepting x there are minute! It is far more likely that the generated y value will be larger than PDF(x) at that point. This means that you'll spend a lot of cycles creating numbers that won't get chosen and that almost all your chosen random numbers will be very locally bound to the max of your PDF.
That's why it's often useful not to have the same Ch(x) intervals everywhere, but to define a parametrized set of intervals. However this adds a fair bit of complexity to the code.
Where do you set your limits? How to deal with borderline cases? When and how to determine that you indeed need to suddenly use this approach? Calculating max might not be as simple now, depending on the method you originally envisioned would be doing this.
Additionally now you have to correct for the fact that a lot more numbers get accepted more easily in the areas where your Ch(x) box height is lower which skews the original PDF.
This can be corrected by weighting numbers created in the lowered boundary by the ratio of the heights of the lower and higher boundaries; basically you repeat the y step one more time. Create a random number z from 0 to 1 and compare it to the ratio lower_height/higher_height, which is guaranteed to be <1. If z is smaller than the ratio, accept x; if it's larger, reject.
Generalizations of the code presented are also possible by writing a function that takes in an object pointer instead. By defining your own class, e.g. function, which would generally describe functions, have an eval method at a point, be able to store your parameters, and calculate and store its own max/min values and zero/cutoff points, you wouldn't have to pass them in, or define them inside a function like I did.
Good Luck have fun!

tl;dr: Raise a uniform 0 to 1 distribution to the power (1 - m) / m where m is the desired mean (between 0 and 1). Shift/scale as desired.
I was curious about how to implement this. I figured a trapezoid would be the easiest method, but then you're limited in that the most extreme mean you can get is with a triangle, which isn't that extreme. The math started getting hard, so I reverted to a purely empirical method that seems to work pretty well.
Anyways, for a distribution, how about starting with the uniform [0, 1) distribution and raising the values to some arbitrary power. Square them and the values get pushed toward 0 (the mean drops to 1/3). Square root them and they get pushed toward 1 (the mean rises to 2/3). You can go to whatever extreme you want and shove the distribution as hard as you want.
import random
def randompow(p):
    return random.random() ** p
(Everything's written in Python, but should be easy enough to translate. If something's unclear, just ask. random.random() returns floats from 0 to 1)
So, how do we adjust that power? Well, how's the mean seem to shift with varying powers?
Looks like some sort of sigmoid curve. There are lots of sigmoid functions, but hyperbolic tangent seems to work pretty well.
Not 100% there, let's try to scale it in the X direction...
import numpy as np
import scipy.optimize
# x are the values from -3 to 3 (log transformed from the powers used)
# y are the empirically-determined means given all those powers
def fitter(tanscale):
    xsc = tanscale * x
    sigtan = np.tanh(xsc)
    sigtan = (1 - sigtan) / 2
    resid = sigtan - y
    return sum(resid**2)
fit = scipy.optimize.minimize(fitter, 1)
The fitter says the best scaling factor is 1.1514088816214016. The residuals are actually pretty low, so sounds good.
Implementing the inverse of all the math I didn't talk about looks like:
def distpow(mean):
    p = 1 - (mean * 2)
    p = np.arctanh(p) / 1.1514088816214016
    return 10**p
That gives us the power to use in the first function to get whatever mean to the distribution. A factory function can return a method to churn out a bunch of numbers from the distribution with the desired mean
def randommean(mean):
    p = distpow(mean)
    def f():
        return random.random() ** p
    return f
How's it do? Reasonably well out to 3-4 decimals:
for x in [0.01, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9, 0.99]:
    f = randommean(x)
    # sample the distribution 10 million times
    mean = np.mean([f() for _ in range(10000000)])
    print('Target mean: {:0.6f}, actual: {:0.6f}'.format(x, mean))
Target mean: 0.010000, actual: 0.010030
Target mean: 0.100000, actual: 0.100122
Target mean: 0.200000, actual: 0.199990
Target mean: 0.400000, actual: 0.400051
Target mean: 0.500000, actual: 0.499905
Target mean: 0.600000, actual: 0.599997
Target mean: 0.800000, actual: 0.799999
Target mean: 0.900000, actual: 0.899972
Target mean: 0.990000, actual: 0.989996
A more succinct function that just gives you a value given a mean (not a factory function):
def randommean(m):
    p = np.arctanh(1 - (2 * m)) / 1.1514088816214016
    return random.random() ** (10 ** p)
Edit: fitting against the natural log of the mean instead of log10 gave a residual suspiciously close to 0.5. Doing some math to simplify out the arctanh gives:
def randommean(m):
    '''Return a value from the distribution 0 to 1 with average *m*'''
    return random.random() ** ((1 - m) / m)
From here it should be fairly easy to shift, rescale, and round off the distribution. The truncating-to-integer might end up shifting the mean by 1 (or half a unit?), so that's an unsolved problem (if it matters).
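The final formula can also be checked directly: for U uniform on [0, 1), the mean of U**p is the integral of u**p from 0 to 1, which is 1/(p + 1), and setting that equal to m gives p = (1 - m)/m. Below is a minimal sketch of the shift/scale step for the numbers in the question (min 1000, max 10000, mean 7000); the helper names `power_for_mean` and `random_with_mean` and the seed are assumptions for illustration.

```python
import random


def power_for_mean(m):
    """Exponent p with E[U**p] = 1/(p + 1) = m for U uniform on [0, 1)."""
    if not 0.0 < m < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    return (1.0 - m) / m


def random_with_mean(lo, hi, mean, rng):
    """One value between lo and hi whose distribution has the requested mean."""
    m = (mean - lo) / (hi - lo)  # target mean rescaled to (0, 1)
    return lo + (hi - lo) * rng.random() ** power_for_mean(m)


rng = random.Random(2024)
draws = [random_with_mean(1000.0, 10000.0, 7000.0, rng) for _ in range(200000)]
print(power_for_mean(2.0 / 3.0), sum(draws) / len(draws))
```

For this interval the rescaled mean is 2/3, so the exponent is 0.5, i.e. each draw is 1000 + 9000 * sqrt(U).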

You simply define 2 distributions dist1 operating in [1000, 7000] and dist2 operating in [7000, 10000].
Let's call m1 the mean of dist1 and m2 the mean of dist2.
You are looking for a mixture of dist1 and dist2 whose mean is 7000.
You must adjust the weights (w1, w2 = 1-w1) such that:
7000 = w1 * m1 + w2 * m2
which leads to:
w1 = (m2 - 7000) / (m2 - m1)
Using the OpenTURNS library, the code will look as follows:
import openturns as ot
dist1 = ot.Uniform(1000, 7000)
dist2 = ot.Uniform(7000, 10000)
m1 = dist1.getMean()[0]
m2 = dist2.getMean()[0]
w = (m2 - 7000) / (m2 - m1)
dist = ot.Mixture([dist1, dist2], [w, 1 - w])
print ("Mean of dist = ", dist.getMean())
>>> Mean of dist = [7000]
Now you can draw a sample of size N by calling dist.getSample(N). For instance:
print(dist.getSample(10))
>>> [ X0 ]
0 : [ 3019.97 ]
1 : [ 7682.17 ]
2 : [ 9035.1 ]
3 : [ 8873.59 ]
4 : [ 5217.08 ]
5 : [ 6329.67 ]
6 : [ 9791.22 ]
7 : [ 7786.76 ]
8 : [ 7046.59 ]
9 : [ 7088.48 ]
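If OpenTURNS is not available, the same two-component mixture can be sketched with only the Python standard library. This is an illustrative sketch rather than an equivalent of the library's `Mixture` class; the function names, default arguments and seed are assumptions.

```python
import random


def mixture_weight(m1, m2, target):
    """Weight w1 of the first component so that w1*m1 + (1 - w1)*m2 == target."""
    return (m2 - target) / (m2 - m1)


def sample_mixture(rng, lo=1000.0, split=7000.0, hi=10000.0, target=7000.0):
    """Draw from a mixture of Uniform(lo, split) and Uniform(split, hi)."""
    m1 = (lo + split) / 2.0  # mean of the first uniform component
    m2 = (split + hi) / 2.0  # mean of the second uniform component
    w1 = mixture_weight(m1, m2, target)
    if rng.random() < w1:    # pick the first component with probability w1
        return rng.uniform(lo, split)
    return rng.uniform(split, hi)


rng = random.Random(7)
draws = [sample_mixture(rng) for _ in range(200000)]
print(mixture_weight(4000.0, 8500.0, 7000.0), sum(draws) / len(draws))
```

With these numbers m1 = 4000 and m2 = 8500, so w1 = 1500/4500 = 1/3: one third of the draws come from [1000, 7000] and two thirds from [7000, 10000].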

How can I prove if this language is regular or not?

How can I prove if this language is regular or not?
L = {a^n b^n : n≥1} union {a^n b^(n+2) : n≥1}

I'll give an approach and a sketch of a proof; there might be some holes in it that I believe you can fill yourself.
The idea is to use the Myhill-Nerode theorem: show that there is an infinite number of equivalence classes for R_L, and from the theorem you can derive that the language is not regular.
Define two types of sets:
G_j = {a^n b^k | n-k = j, k≥1} for each j in
[-2,-1,0,1,...]
H_j = {a^j} for each j in
[0,1,...]
G_illegal = {a,b}* \ (G_j U H_j) [for each j in the specified range]
It is easy to see that for each x in G_illegal, and for each z in {a,b}*: xz is not in L.
So, for every x,y in G_illegal and for each z in {a,b}*: xz in L <-> yz in L.
Also, for each z in {a,b}* - and for each x,y in some G_j [same j for both]:
if z contains a, both xz and yz are not in L
if z = b^j, then xz = a^n b^k b^j, and since k+j = n - xz is in L. Same applies for y, so yz is in L.
if z = b^(j+2), then xz = a^n b^k b^(j+2), and since k+j+2 = n+2 - xz is in L. Same applies for y, so yz is in L.
otherwise, z is b^i such that i≠j and i≠j+2, and you get that both xz and yz are not in L.
So, for every j and for every x,y in G_j and for each z in {a,b}*: xz in L <-> yz in L.
Prove the same for every H_j using the same approach.
Also, it is easy to show that for each x in G_j U H_j, and for each y in G_illegal - for z = b^j, xz is in L and yz is not in L.
For x in G_j, and y in H_i, for z = a b^(j+1) - it is easy to see that xz is not in L and yz is in L.
It is easy to see that for x,y in G_j and G_i respectively or x, y in H_j, H_i - for z = b^j: xz is in L while yz is not.
We just proved that the sets we created are actually the equivalence classes of R_L from the Myhill-Nerode theorem, and since we have an infinite number of these sets, each being an equivalence class of R_L [we have H_j and G_j for every j] - we can derive from the Myhill-Nerode theorem that the language L is not regular.
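The core of the argument, that the prefixes a^i and a^j with i ≠ j can always be told apart by some suffix, can be checked by brute force for small values. The Python sketch below is only a sanity check for finitely many cases, not a proof; the function names `in_language` and `distinguishing_suffix` are assumptions for illustration.

```python
import re


def in_language(s):
    """Membership test for L = {a^n b^n} union {a^n b^(n+2)}, n >= 1."""
    match = re.fullmatch(r"(a+)(b+)", s)
    if match is None:
        return False
    n, k = len(match.group(1)), len(match.group(2))
    return k == n or k == n + 2


def distinguishing_suffix(i, j):
    """Find z = b^m such that exactly one of a^i z and a^j z is in L."""
    for m in (i, j, i + 2, j + 2):
        z = "b" * m
        if in_language("a" * i + z) != in_language("a" * j + z):
            return z
    return None


# Every pair of distinct prefixes a^i, a^j (1 <= i < j <= 30) is separated.
pairs = [(i, j) for i in range(1, 31) for j in range(i + 1, 31)]
print(all(distinguishing_suffix(i, j) is not None for i, j in pairs))
```

Since no two of the prefixes a, aa, aaa, ... are equivalent, there are infinitely many equivalence classes, which is what the theorem needs.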

You could just use the pumping lemma for regular languages. It basically says that for a regular language there is an integer n such that every string in the language of length at least n can be partitioned into xyz with |xy| <= n and |y| > 0, so that you can pump the y part of the string and it has to stay in the language. That means, if you can find a string such that for every such partition xy^iz is not in the language for some i, then the language is not regular.
The proof goes like this, kind of an adversary proof. Suppose someone tells you that this language is regular. Then ask him for a number n > 0. You build a convenient string of length greater than n, and you give it to the adversary. He partitions the string into x, y, z in any way he wants, as long as |xy| <= n and |y| > 0. Then you have to pump y (repeat it i times) until you find a string that is not in that same language.
In this case, I tell you: give me n. You fix n. Then I tell you: take the string "a^n b^{n+2}", and split it. In any way that you can split this string, you will always have to make y = a^k, with k > 0, since you are forced to make |xy| <= n, and my string begins with a^n. Here is the trick: you give the adversary a string such that, any way he splits it, he gives you a part that you can pump. So now we pump y, let's say, 0 times, and you get "a^m b^{n+2}" with m < n, which is not in your language. Done. In general you can pump y once more, n times, n! times, whatever you need to make the string leave the language; here pumping 0 times always works.
The proof of this theorem goes around saying that if you have a regular language then you have an automaton with n states for some fixed n. If a string has more than n characters, then it must go through some cycle in your automaton. If we name x the part of the string before entering the cycle, and y the part in the cycle, it's clear that we can pump y as many times as we want, because we can keep running around the cycle as many times as we want, and the resulting string has to be in the language, because it will be recognized by that automaton. To use the theorem to prove non-regularity, since we don't know what the supposed automaton looks like, we have to leave to the adversary the choice of n and of the position of the cycle inside the automaton (there will be no automaton, but you say to the adversary something like: dare to give me an automaton and I will show you it cannot exist.)
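The adversary game for the string a^n b^(n+2) can be played out exhaustively for small n. This Python sketch is a finite sanity check under assumed helper names (`in_language`, `every_split_pumps_out`), not a replacement for the proof: for each allowed split it removes y (pumping 0 times) and checks that the result has left the language.

```python
import re


def in_language(s):
    """Membership test for L = {a^n b^n} union {a^n b^(n+2)}, n >= 1."""
    match = re.fullmatch(r"(a+)(b+)", s)
    if match is None:
        return False
    n, k = len(match.group(1)), len(match.group(2))
    return k == n or k == n + 2


def every_split_pumps_out(n):
    """For w = a^n b^(n+2), try every split xyz with |xy| <= n and |y| > 0."""
    w = "a" * n + "b" * (n + 2)
    for xy_len in range(1, n + 1):
        for y_len in range(1, xy_len + 1):
            x = w[: xy_len - y_len]  # part before the pumped block
            z = w[xy_len:]           # everything after xy
            if in_language(x + z):   # x y^0 z must NOT be in L
                return False
    return True


print(all(every_split_pumps_out(n) for n in range(1, 26)))
```

Every split leaves y inside the leading block of a's, so removing it yields a^m b^(n+2) with m < n, which matches neither a^m b^m nor a^m b^(m+2).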

What does 'zero of the function will be found within the precision limit ϵ = 10^-3' mean in this?

Well, the question is; "Write a C code that finds zero of a function y = ax + b, without solving the equation. The zero will be found within the precision limit ϵ = 10^-3. You'll start at x=0, and move x in the proper direction until |y|< ϵ."
I'm a newbie, to programming, and don't know anything about this ϵ thing either.
Help me out!!

It means you have to solve the inequality |ax+b| < 10^-3 by trying different values for x.
Since this is a linear function it's easy. Start with some value of x and then increase or decrease it depending on the result of ax+b. I.e., if you move in one direction and |ax+b| gets further away from zero, then you should go in the opposite direction.
You will have to develop an algorithm that decides the increments/decrements of x.

|y| < 10⁻³, or well, -0.001 < y < 0.001.
You must increase or decrease x (starting from 0, as you've said) in order to make y take a value between -0.001 and 0.001.
As for ϵ, a.k.a. epsilon: it is used to denote a very small value. For this problem, ϵ denotes a tolerance, since y is not required to be exactly 0.
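One possible way to decide the increments is sketched below in Python (the assignment asks for C, but the loop translates almost line by line). It is only one of many valid strategies and rests on assumptions of its own: the function name `find_zero`, the starting step of 1.0 and the halving rule are illustrative choices. The idea is to try a step in each direction, keep the one that makes |y| smaller, and halve the step when neither direction helps.

```python
def find_zero(a, b, eps=1e-3, step=1.0):
    """Move x from 0 until |a*x + b| < eps, without solving the equation."""
    if a == 0:
        raise ValueError("y = b is constant, so there is no zero to search for")
    x = 0.0
    y = a * x + b
    while abs(y) >= eps:
        for candidate in (x + step, x - step):
            if abs(a * candidate + b) < abs(y):  # this direction brings y closer to 0
                x = candidate
                y = a * x + b
                break
        else:
            step /= 2.0  # both directions overshoot: take smaller steps
    return x


x = find_zero(2.0, -3.0)
print(x, 2.0 * x - 3.0)
```

With a fixed starting step the loop needs many iterations when the zero is far from 0, so a larger initial step (or doubling it while progress is being made) would be a natural refinement.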