How to recognize money bills in images?

I have some images of euro bills. The bills are completely within the image
and are mostly flat (i.e. little deformation), and perspective skew is small (i.e. the image is taken from almost directly above the bill).
Now I'm no expert in image recognition. I'd like to achieve the following:
Find the bounding box of the bill (so I can "cut out" the bill from the noise in the rest of the image).
Figure out the orientation.
I think of these two steps as pre-processing, but maybe one can do the following steps without the above two. So with that I want to read:
The bill's serial number.
The bill's face value.
I assume this should be quite possible to do with OpenCV. I'm just not sure how to approach it. Should I use a face-detector-like approach, a Hough transform, or a contour detector on top of an edge detector?
I'd be thankful for any further hints for reading material as well.

Hough is great, but it can be a little expensive.
This may work:
-Use Threshold or Canny to find the edges of the image.
-Then use cvFindContours to identify the contours, and try to detect rectangles among them.
Check the squares.c example in the OpenCV distribution. It basically checks that the polygon approximation of a contour has 4 points and that the angles between adjacent edges are all close to 90 degrees.
Here is a code snippet from the squares.py example (the same thing, but in Python):
# ... some pre-processing
cvThreshold( tgray, gray, (l+1)*255/N, 255, CV_THRESH_BINARY )
# find contours and store them all as a list
count, contours = cvFindContours(gray, storage)
if not contours:
    continue
# test each contour
for contour in contours.hrange():
    # approximate contour with accuracy proportional
    # to the contour perimeter
    result = cvApproxPoly( contour, sizeof(CvContour), storage,
                           CV_POLY_APPROX_DP, cvContourPerimeter(contour)*0.02, 0 )
    res_arr = result.asarray(CvPoint)
    # square contours should have 4 vertices after approximation,
    # relatively large area (to filter out noisy contours)
    # and be convex.
    # Note: absolute value of an area is used because
    # area may be positive or negative - in accordance with the
    # contour orientation
    if( result.total == 4 and
        abs(cvContourArea(result)) > 1000 and
        cvCheckContourConvexity(result) ):
        s = 0
        for i in range(4):
            # find minimum angle between joint
            # edges (maximum of cosine)
            t = abs(angle( res_arr[i], res_arr[i-2], res_arr[i-1]))
            if s<t:
                s=t
        # if cosines of all angles are small
        # (all angles are ~90 degree) then write quadrangle
        # vertices to resultant sequence
        if( s < 0.3 ):
            for i in range(4):
                squares.append( res_arr[i] )
-Use MinAreaRect2 (which finds the circumscribed rectangle of minimal area for a given 2D point set) to get the bounding box of each rectangle. From the bounding box points you can easily calculate the angle.
You can also find the C version, squares.c, under samples/c/ in your OpenCV directory.
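The snippet above calls an angle() helper that is not shown in the excerpt. As a rough, library-free sketch of what that check does (the function names below are my own, not OpenCV API), here is the cosine test for a 4-point polygon, plus one way to read an orientation off the four corner points by taking the direction of the longest edge:

```python
import math

def angle_cosine(pt1, pt2, pt0):
    """Cosine of the angle at pt0 between the edges pt0->pt1 and pt0->pt2."""
    dx1, dy1 = pt1[0] - pt0[0], pt1[1] - pt0[1]
    dx2, dy2 = pt2[0] - pt0[0], pt2[1] - pt0[1]
    denom = math.sqrt((dx1 * dx1 + dy1 * dy1) * (dx2 * dx2 + dy2 * dy2))
    return (dx1 * dx2 + dy1 * dy2) / (denom + 1e-10)

def is_rectangle(quad, max_cosine=0.3):
    """True if all four corner angles are close to 90 degrees."""
    if len(quad) != 4:
        return False
    worst = 0.0
    for i in range(4):
        # Largest |cosine| corresponds to the corner farthest from 90 degrees.
        worst = max(worst, abs(angle_cosine(quad[i], quad[i - 2], quad[i - 1])))
    return worst < max_cosine

def long_edge_angle_degrees(quad):
    """Direction of the longest edge relative to the x axis, in [0, 180)."""
    best_length, best_angle = -1.0, 0.0
    for i in range(4):
        (x0, y0), (x1, y1) = quad[i - 1], quad[i]
        length = math.hypot(x1 - x0, y1 - y0)
        if length > best_length:
            best_length = length
            best_angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
    return best_angle
```

The 0.3 cutoff is the one used in the snippet; the longest-edge rule is only one possible convention, and it cannot tell a bill from the same bill rotated by 180 degrees.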

There is a good book on OpenCV.
Using a Hough transform to find the rectangular bill shape (and its angle), and then finding rectangles/circles within it, should be quick and easy.
For more complex searching, for example if you need to find odd corners of bills in an image, consider something like a Haar classifier.

You can also take a look at the Template Matching methods in OpenCV; another option would be to use SURF features. They let you search for symbols and numbers in a way that is invariant to size, angle, etc.

Related

Uniform random sampling of CIELUV for RGB colors

Selecting a random color on a computer is a touch harder than I thought it would be.
The naive way of uniform random sampling of 0..255 for R,G,B will tend to draw lots of similar greens. It would make sense to sample from a perceptually uniform space like CIELUV.
A simple way to do this is to sample L,u,v on a regular mesh and ensure the color solid is contained in the bounds (I've seen different bounds for this). If the sample falls outside the embedded RGB solid (tested by mapping it to XYZ and then to RGB), reject it and sample again. You can settle for a kludgy-but-guaranteed-to-terminate "bailout" selection (like the naive procedure) if you reject more than some arbitrary threshold number of times.
I believe the test for whether the sample lies within RGB must handle the special case of black (some implementations end up being silent on the divide by zero). If L=0 and either u!=0 or v!=0, then the sample needs to be rejected, or else you would end up oversampling the L=0 plane in Luv space.
Does this procedure have an obvious flaw? It seems to work but I did notice that I was rolling black more often than I thought made sense until I thought about what was happening in that case. Can anyone point me to the right bounds on the CIELUV grid to ensure that I am enclosing the RGB solid?
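For concreteness, here is a minimal sketch of the rejection procedure described above. It assumes a D65 white point, the usual rounded XYZ-to-linear-sRGB matrix, and a sampling box of L in [0, 100], u in [-84, 176], v in [-135, 108]; those bounds are approximate assumptions rather than authoritative values, and the function names are made up for illustration. It draws uniformly at random inside the box instead of walking a regular mesh.

```python
import random

# Assumed D65 reference white and its (u', v') chromaticity.
XN, YN, ZN = 0.95047, 1.0, 1.08883
_D = XN + 15 * YN + 3 * ZN
UN, VN = 4 * XN / _D, 9 * YN / _D

def luv_to_linear_rgb(L, u, v):
    """Linear sRGB triple for a CIELUV color, or None if it is undefined."""
    if L <= 0:
        # Black: only u = v = 0 is accepted, so the L = 0 plane is not oversampled.
        return (0.0, 0.0, 0.0) if (L == 0 and u == 0 and v == 0) else None
    up = u / (13 * L) + UN
    vp = v / (13 * L) + VN
    if vp <= 0:
        return None
    Y = YN * ((L + 16) / 116) ** 3 if L > 8 else YN * L * (3 / 29) ** 3
    X = Y * 9 * up / (4 * vp)
    Z = Y * (12 - 3 * up - 20 * vp) / (4 * vp)
    return (3.2406 * X - 1.5372 * Y - 0.4986 * Z,
            -0.9689 * X + 1.8758 * Y + 0.0415 * Z,
            0.0557 * X - 0.2040 * Y + 1.0570 * Z)

def in_gamut(rgb, eps=1e-3):
    """True if every linear channel lies in [0, 1], with a small tolerance."""
    return rgb is not None and all(-eps <= c <= 1 + eps for c in rgb)

def sample_luv(rng, max_tries=1000):
    """Uniform sample of (L, u, v) inside the sRGB solid, or None on bailout."""
    for _ in range(max_tries):
        L = rng.uniform(0.0, 100.0)
        u = rng.uniform(-84.0, 176.0)
        v = rng.uniform(-135.0, 108.0)
        if in_gamut(luv_to_linear_rgb(L, u, v)):
            return (L, u, v)
    return None  # caller falls back to the naive RGB procedure
```

The gamut test works on linear RGB because the sRGB transfer curve maps [0, 1] onto [0, 1], so gamma encoding is only needed when you actually output the color.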
A useful reference for those who don't know it:
https://www.easyrgb.com/en/math.php
The key problem with this is that you need bounds to reject samples that fall outside of RGB. I was able to find it worked out here (nice demo on page, API provides convenient functions):
https://www.hsluv.org/
A few things I noticed with uniform sampling of CIELUV in RGB:
most colors are green and purple (this is true independent of RGB bounds)
you have a hard time sampling what we think of as yellow (very small volume of high lightness, high chroma space)
I implemented various strategies that focus on sampling hues (which is really what we want when we think of "sampling colors") by weighting according to the maximum chroma at that lightness. This makes colors like chromatic light yellows easier to catch and avoids oversampling greens and purples. You can see these methods in action here (select "randomize colors"):
https://www.mysticsymbolic.art/
Source for color randomizers here:
https://github.com/mittimithai/mystic-symbolic/blob/chromacorners/lib/random-colors.ts
Okay, while you don't show the code you are using to generate the random numbers and then apply them to the CIELUV color space, I'm going to guess that you are creating a random number 0.0-100.0 from a random number generator, and then just assigning it to L*.
That will most likely give you a lot of black or very dark results.
Let Me Explain
L* of L*u*v* is not linear with respect to light. Y of CIEXYZ is linear with respect to light. L* is perceptual lightness, so a power curve is applied to Y to make it linear with respect to perception, which makes it non-linear with respect to light.
TRY THIS
To get L* with a random value from 0 to 100:
Generate a random number between 0.0 and 1.0
Then apply an exponent of 0.42
Then multiply by 100 to get L*
Lstar = Math.pow(Math.random(), 0.42) * 100;
This takes your random number that represents light, and applies a power curve that emulates human lightness perception.
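The 0.42 exponent is a single-power approximation. For comparison, here is a small sketch of the piecewise CIE definition of L* (a cube root with a linear segment near black), assuming Y is relative luminance in 0..1 with a white of Yn = 1; the function names are just for illustration:

```python
import random

def y_to_lstar(y):
    """CIE L* (0..100) from relative luminance Y (0..1), assuming Yn = 1."""
    delta = 6.0 / 29.0
    if y > delta ** 3:
        f = y ** (1.0 / 3.0)
    else:
        # Linear segment near black avoids the infinite slope of the cube root.
        f = y / (3.0 * delta ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0

def random_lstar(rng):
    """L* for a luminance drawn uniformly from 0..1."""
    return y_to_lstar(rng.random())
```

Either version pushes a uniformly drawn luminance toward higher L* values, which is the point of the suggestion above.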
UV Color
As for the u and v values, you can probably just leave them as linear random numbers. Constrain u to between about -84 and +176, and v to between about -132.5 and +107.5:
Urnd = (Math.random() - 0.3231) * 260;
Vrnd = (Math.random() - 0.5521) * 240;
Polar Color
It might be interesting to convert uv to LChLUV or LshLUV.
For hue, it's probably as simple as H = Math.random() * 360
For chroma constrained to 0 to 178: C = Math.random() * 178
The next question is, should you generate chroma or saturation? CIELUV can provide either, but for directly generating random colors, chroma seems to be a bit better.
And of course these simple examples do not prevent overruns, so the color values need to be tested to see if they are legal sRGB or not. There are a few things that can be done to constrain the generated values to legal colors, but the object here was to get you to a better distribution without excess black/dark results.
Please let me know of any questions.

Do depth values in AVDepthData (from TrueDepth camera) indicate distance from camera or camera plane?

Do depth values in AVDepthData (from TrueDepth camera) indicate distance in meters from the camera, or perpendicular distance from the plane of the camera (i.e. z-value in camera space)?
My goal is to get an accurate 3D point from the depth data, and this distinction is important for accuracy. I've found lots online regarding OpenGL or Kinect, but not for TrueDepth camera.
FWIW, this is the algorithm I use. I find the value of the depth buffer at a pixel located using some OpenCV feature detection. Below is the code I use to find the real-world 3D point at a given pixel cgPt (a CGPoint). This algorithm seems to work quite well, but I'm not sure whether a small error is introduced by the assumption that depth is the distance to the camera plane.
let depth = 1/disparity
let vScreen = sceneView.projectPoint(SCNVector3Make(0, 0, -depth))
// cgPt is the 2D coordinates at which I sample the depth
let worldPoint = sceneView.unprojectPoint(SCNVector3Make(cgPt.x, cgPt.y, vScreen.z))
I'm not sure of authoritative info either way, but it's worth noticing that capture in a disparity (not depth) format uses distances based on a pinhole camera model, as explained in the WWDC17 session on depth photography. That session is primarily about disparity-based depth capture with back-facing dual cameras, but a lot of the lessons in it are also valid for the TrueDepth camera.
That is, disparity is 1/depth, where depth is distance from subject to imaging plane along the focal axis (perpendicular to imaging plane). Not, say, distance from subject to the focal point, or straight-line distance to the subject's image on the imaging plane.
IIRC the default formats for TrueDepth camera capture are depth, not disparity (that is, depth map "pixel" values are meters, not 1/meters), but lacking a statement from Apple it's probably safe to assume the model is otherwise the same.
It looks like it measures distance from the camera's plane rather than a straight line from the pinhole. You can test this by downloading the Streaming Depth Data from the TrueDepth Camera sample code.
Place the phone vertically 10 feet away from a wall, and you should expect to see one of the following:
1. If it measures from the focal point to the wall as a straight line, you should expect to see a radial pattern (i.e. the point closest to the camera is straight in front of it; the points farthest from the camera are those closer to the floor and ceiling).
2. If it measures distance from the camera's plane, you should expect the wall color to be nearly uniform (as long as you're holding the phone parallel to the wall).
After downloading the sample code and trying it out, you will notice that it behaves like #2, meaning it's the distance from the camera's plane, not from the camera itself.
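If the value is indeed the z-distance along the optical axis, a 3D point follows from the pinhole model. The sketch below is plain Python rather than Swift and uses hypothetical intrinsics (fx, fy, cx, cy in pixels); on a device you would take those from the camera calibration data instead. It also shows how the straight-line distance to the pinhole differs from the plane depth for off-center pixels.

```python
import math

def unproject(u, v, depth_z, fx, fy, cx, cy):
    """Camera-space point for pixel (u, v) whose depth is measured along the optical axis."""
    x = (u - cx) * depth_z / fx
    y = (v - cy) * depth_z / fy
    return (x, y, depth_z)

def radial_distance(point):
    """Straight-line distance from the pinhole to the point."""
    return math.sqrt(sum(c * c for c in point))

# Made-up intrinsics purely for illustration.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
center = unproject(320.0, 240.0, 2.0, FX, FY, CX, CY)
edge = unproject(820.0, 240.0, 2.0, FX, FY, CX, CY)
```

With these numbers the center pixel sits at the same distance in both conventions, while the off-axis pixel has plane depth 2 but a straight-line distance of about 2.83, which is the kind of error you would introduce by mixing the two up.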

Source engine styled rope rendering

I am creating a 3D graphics engine and one of the requirements is ropes that behave like the ones in Valve's Source engine.
In the Source engine, a section of rope is a quad that rotates around its direction axis to face the camera. So if the section of rope points in the +Z direction, it rotates around the Z axis so that its face is towards the camera's centre position.
At the moment, I have the sections of rope defined, so I can have a nice curved rope, but now I'm trying to construct the matrix that will rotate each section around its direction vector.
I already have a matrix for rendering billboard sprites based on this billboarding technique:
Constructing a Billboard Matrix
At the moment I've been trying to retool it so that the Right, Up and Forward vectors match the rope segment's direction vector.
My rope is made up of multiple sections, and each section is a rectangle made up of two triangles. As I said above, I can get the positions and sections perfect; it's the rotating to face the camera that's causing me a lot of problems.
This is in OpenGL ES2 and written in C.
I have studied Doom 3's beam rendering code in Model_beam.cpp. The method used there is to calculate the offset based on normals rather than using matrices, so I have created a similar technique in my C code and it sort of works; at least, it works as much as I need it to right now.
So for those who are also trying to figure this one out: take the cross product of the mid-point of the rope against the camera position, normalise it, and multiply it by how wide you want the rope to be. Then, when constructing the vertices, offset each vertex in either the + or - direction of the resulting vector.
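Here is a small sketch of that offset idea, written in Python for brevity rather than C. One assumption on my part: the cross product is taken between the segment's direction vector and the vector from the segment's mid-point to the camera, which is one common reading of the description above. The helper names are made up.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rope_segment_quad(p0, p1, camera_pos, width):
    """Four corners of a camera-facing quad for the rope segment p0 -> p1."""
    direction = sub(p1, p0)
    mid = tuple((a + b) / 2.0 for a, b in zip(p0, p1))
    to_camera = sub(camera_pos, mid)
    # Perpendicular to both the rope direction and the view vector.
    side = cross(direction, to_camera)
    length = math.sqrt(sum(c * c for c in side))
    if length < 1e-9:
        return None  # camera looks straight down the segment; no stable side vector
    offset = tuple(c / length * width / 2.0 for c in side)
    return [sub(p0, offset), add(p0, offset), add(p1, offset), sub(p1, offset)]
```

Because the offset is perpendicular to the view vector, the quad only rotates around the rope's own axis. The degenerate case (camera on the rope's axis) needs some fallback, for example reusing the previous segment's side vector.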
Further help would be great though as this is not perfect!
Thank you
Check out this related Stack Overflow post on billboards in OpenGL. It cites a lighthouse3d tutorial that is a pretty good read. Here are the salient points of the technique:
void billboardCylindricalBegin(
        float camX, float camY, float camZ,
        float objPosX, float objPosY, float objPosZ) {
    float lookAt[3],objToCamProj[3],upAux[3];
    float modelview[16],angleCosine;
    glPushMatrix();
    // objToCamProj is the vector in world coordinates from the
    // local origin to the camera projected in the XZ plane
    objToCamProj[0] = camX - objPosX ;
    objToCamProj[1] = 0;
    objToCamProj[2] = camZ - objPosZ ;
    // This is the original lookAt vector for the object
    // in world coordinates
    lookAt[0] = 0;
    lookAt[1] = 0;
    lookAt[2] = 1;
    // normalize objToCamProj (lookAt is already unit length)
    // to get the cosine directly afterwards
    mathsNormalize(objToCamProj);
    // easy fix to determine whether the angle is negative or positive
    // for positive angles upAux will be a vector pointing in the
    // positive y direction, otherwise upAux will point downwards
    // effectively reversing the rotation.
    mathsCrossProduct(upAux,lookAt,objToCamProj);
    // compute the angle
    angleCosine = mathsInnerProduct(lookAt,objToCamProj);
    // perform the rotation. The if statement is used for stability reasons
    // if the lookAt and objToCamProj vectors are too close together then
    // |angleCosine| could be bigger than 1 due to lack of precision
    if ((angleCosine < 0.99990) && (angleCosine > -0.9999))
        glRotatef(acos(angleCosine)*180/3.14,upAux[0], upAux[1], upAux[2]);
}

How to identify optimal parameters for cvCanny for polygon approximation

This is my source image (ignore the points, they were added manually later):
My goal is to get a rough polygon approximation of the two hands. Something like this:
I have a general idea on how to do this; I want to use cvCanny to find edges, cvFindContours to find contours, and then cvApproxPoly.
The problem I'm facing is that I have no idea on how to properly use cvCanny, particularly, what should I use for the last 3 parameters (threshold1&2, apertureSize)? I tried doing:
cvCanny(source, cannyProcessedImage, 20, 40, 3);
but the result is not ideal. The left hand looks relatively fine but for the right hand it detected very little:
In general it's not as reliable as I'd like. Is there a way to guess the "best" parameters for Canny, or at least a detailed explanation (understandable by a beginner) of what they do so I can make educated guesses? Or perhaps there's a better way to do this altogether?
It seems you have to lower your thresholds.
The Canny algorithm works with a hysteresis threshold: it selects a contour if at least one of its pixels is as bright as the upper threshold, and then takes all the connected contour pixels as long as they are above the lower threshold.
Papers recommend choosing the two thresholds in a ratio of 2:1 or 3:1 (for example 10 and 30, or 20 and 60, etc.). For some applications, a threshold determined manually and hardcoded is enough. That may be your case, too. I suspect that if you lower your thresholds, you will get good results, because the images are not that complicated.
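To make the hysteresis rule concrete, here is a deliberately simplified sketch on a single row of gradient magnitudes. Real Canny works in 2D on the non-maximum-suppressed gradient image and follows 8-connected neighbours; this 1D version (with a made-up function name) only illustrates how weak pixels survive when they are connected to a strong one.

```python
def hysteresis_1d(magnitudes, low, high):
    """Mark pixels >= high, plus pixels >= low that are connected to them."""
    n = len(magnitudes)
    keep = [False] * n
    # Strong pixels seed the edges.
    stack = [i for i, m in enumerate(magnitudes) if m >= high]
    for i in stack:
        keep[i] = True
    # Grow each edge through neighbouring weak pixels.
    while stack:
        i = stack.pop()
        for j in (i - 1, i + 1):
            if 0 <= j < n and not keep[j] and magnitudes[j] >= low:
                keep[j] = True
                stack.append(j)
    return keep

row = [5, 25, 45, 30, 10, 25, 22, 5]
edges = hysteresis_1d(row, low=20, high=40)
```

In this row, the values 25 and 30 next to the 45 are kept, while the 25 and 22 further right are dropped even though they exceed the lower threshold, because nothing connects them to a strong pixel. Lowering the upper threshold is what lets faint contours get started at all.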
A number of methods to automatically determine the best canny thresholds have been proposed. Most of them rely on gradient magnitudes to estimate the best thresholds.
Steps:
Extract the gradients (Sobel is a good option).
You can convert the result to uchar. Gradients can theoretically have numerical values greater than 255, but that's OK: with an 8-bit output, OpenCV's Sobel saturates them.
Make a histogram of the resulting image.
Take the upper threshold at the 95th percentile of your histogram, and the lower threshold as high/3.
You should probably adjust the percentile value depending on your application, but the results will be much more robust than hardcoded high and low values.
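The steps above can be sketched roughly as follows. This assumes you already have the gradient magnitudes as a flat list of numbers (for instance from a Sobel pass); the function name and defaults are illustrative only, and sorting stands in for building a histogram.

```python
def canny_thresholds(gradient_magnitudes, percentile=95.0, ratio=3.0):
    """Return (low, high) thresholds from the distribution of gradient magnitudes."""
    values = sorted(gradient_magnitudes)
    if not values:
        raise ValueError("no gradient magnitudes given")
    # Index of the requested percentile, clamped to the last element.
    idx = min(len(values) - 1, int(len(values) * percentile / 100.0))
    high = values[idx]
    low = high / ratio
    return low, high

low, high = canny_thresholds(range(100))
```

The two values would then be passed as threshold1 and threshold2 to the Canny call. The percentile is the knob to tune: a lower percentile lets more, and fainter, edges through.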
Note: An excellent threshold detection algorithm is implemented in Matlab. It is based on the idea above, but is a bit more sophisticated.
Note 2: These methods will work if the contours and illumination do not vary a lot between image areas. If the contours are crisper in one part of the image, then you need locally adaptive thresholds, and that's another story. But looking at your pictures, that should not be the case.
Maybe one of the easiest solutions is to apply Otsu thresholding to the grayscale image, find the contours in the binary image, and then approximate them. Here's the code:
Mat img = imread("test.png"), gray;
vector<Vec4i> hierarchy;
vector<vector<Point2i> > contours;
cvtColor(img, gray, CV_BGR2GRAY);
threshold(gray, gray, 0, 255, THRESH_OTSU);
findContours(gray, contours, hierarchy, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
for(size_t i=0; i<contours.size(); i++)
{
    approxPolyDP(contours[i], contours[i], 5, false);
    drawContours(img, contours, i, Scalar(0,0,255));
}
imshow("result", img);
waitKey();
And this is the result:

Similarity between an image and its rotated version using SIFT

I have implemented SIFT in OpenCV for comparing images. I have not yet written the program for comparing them; I'm thinking of using FLANN for that. My problem is that, looking at the 128 elements of the descriptor, I cannot really understand the similarity between an image and its rotated version.
From reading Lowe's paper, I understand that the descriptor coordinates are all rotated relative to the keypoint orientation, but how exactly is the similarity obtained? Can we understand the similarity just by viewing the 128 values?
Please help me; this is for my project presentation.
You can first use Lowe's metric to compute some putative matches between the two images. The metric is: for any given descriptor de in image 1, find the distances to all descriptors de' in image 2. If the ratio of the closest distance to the second-closest distance is below a threshold, accept the match.
After this, you can use RANSAC, another form of robust estimation, or a Hough transform to check the geometric consistency (position, orientation, and scale) of the keypoints that you accepted as putative matches.
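As a rough illustration of the ratio test described above, here is a brute-force sketch in plain Python. Descriptors are just sequences of numbers (128 values for SIFT, shortened to 2 here), the function name is my own, and the 0.8 default is the value suggested in Lowe's paper. In practice you would replace the linear search with FLANN.

```python
import math

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Pairs (i, j) where desc2[j] is a clearly better match for desc1[i] than any other."""
    matches = []
    for i, d in enumerate(desc1):
        # Euclidean distance to every descriptor in the second image, nearest first.
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc2))
        if len(dists) < 2:
            continue
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

image1 = [(0.0, 0.0), (5.0, 5.0)]
image2 = [(0.1, 0.0), (10.0, 10.0), (5.2, 5.0), (4.8, 5.0)]
putative = ratio_test_matches(image1, image2)
```

The first descriptor has one close neighbour and is accepted; the second has two nearly equidistant candidates, so it is ambiguous and gets rejected. That is why you compare distances between descriptors rather than trying to read similarity off the 128 raw values.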
If I recall correctly, SIFT will give you a set of 128-value descriptors that describe each of the interest points. You also have the location of each point in each of the images, as well as its "direction" (I forget what the "direction" is called in the paper) and scale in each image.
Once you've found two points that have matching descriptors, you can calculate the transformation from the interest point in one image to the same point in the other image by comparing coordinates and directions.
If you have enough matches, you can check whether all (or a majority of) the interest points have the same transformation. If they do, the images are similar; if they don't, the images are different.
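One simple way to run that consistency check for a rotated image is to vote on the orientation difference of each matched pair. The sketch below assumes you have, for every match, the keypoint orientation in degrees in both images; the function name and the 10-degree bin width are arbitrary choices for illustration, and a full check would also look at scale and position.

```python
def dominant_rotation(angle_pairs, bin_width=10.0):
    """Return (bin-center rotation in degrees, fraction of matches voting for it)."""
    if not angle_pairs:
        raise ValueError("no matches given")
    bins = {}
    for a1, a2 in angle_pairs:
        diff = (a2 - a1) % 360.0          # rotation implied by this match
        b = int(diff // bin_width)
        bins[b] = bins.get(b, 0) + 1
    best_bin, votes = max(bins.items(), key=lambda kv: kv[1])
    return (best_bin + 0.5) * bin_width, votes / len(angle_pairs)

# (orientation in image 1, orientation in image 2) for five matches;
# the last one is an outlier.
pairs = [(10, 102), (200, 294), (350, 85), (45, 138), (0, 180)]
rotation, agreement = dominant_rotation(pairs)
```

If most matches agree on one rotation, the second image is plausibly a rotated copy of the first; a flat vote distribution suggests the matches are accidental. Fixed bins can split votes that fall near a bin edge, so treat this as a coarse first check.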
Hope this helps...
What you are looking for is basically ASIFT
You can find the code here and some overview

Resources