selecting SIFT descriptors from a lot such that chosen are invariant to rotation - sift

I have a collection of images (size 512x512) of different objects.
I get approx 6k SIFT descriptors in each image.
My aim is to choose few (say n) among them.
If I rotate the image by any angle then I want a method so that I am able to single out those 'n' descriptors.
If I replace part of an image with blank then I want the method to give me some out of those 'n' descriptors.
Please suggest.
Also, is it a good idea to sort descriptors by their response value and choose the top few .
Thanks.

Sift is not rotation invariant, but you will not find a correlation between a descriptor(s) and an angle.
Descriptors has location and a radius, you can just check for each of them if it within required rectangle
Its definitely a good idea from performance perspective to chose descriptors with best response.

Related

Images and Filters in OpenCL

Lets say I have an image called Test.jpg.
I just figured out how to bring an image into the project by the following line:
FILE *infile = fopen("Stonehenge.jpg", "rb");
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
I have never worked with images before, let alone OpenCl so there is a lot that is going over my head.
I need further clarification on this part for my own understanding
Does this bmp image also need to be stored in an array in order to have a filter applied to it? I have seen a sliding window technique be used a couple of times in other examples. Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
I know this may seem like a basic question to most but I do not have a mentor on this subject in my workplace.
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
Not exactly. bmp is a very specific image serialization format and actually a quite complicated one (implementing a BMP file parser that deals with all the corner cases correctly is actually rather difficult).
However what you have there so far is not even file content data. What you have there is a C stdio FILE handle and that's it. So far you did not even check if the file could be opened. That's not really useful.
JPEG is a lossy compressed image format. What you need to be able to "work" with it is a pixel value array. Either an array of component tuples, or a number of arrays, one for each component (depending on your application either format may perform better).
Now implementing image format decoders becomes tedious. It's not exactly difficult but also not something you can write down on a single evening. Of course the devil is in the details and writing an implementation that is high quality, covers all corner cases and is fast is a major effort. That's why for every image (and video and audio) format out there you usually can find only a small number of encoder and decoder implementations. The de-facto standard codec library for JPEG are libjpeg and libjpeg-turbo. If your aim is to read just JPEG files, then these libraries would be the go-to implementation. However you also may want to support PNG files, and then maybe EXR and so on and then things become tedious again. So there are meta-libraries which wrap all those format specific libraries and offer them through a universal API.
In the OpenGL wiki there's a dedicated page on the current state of image loader libraries: https://www.opengl.org/wiki/Image_Libraries
Does this bmp image also need to be stored in an array in order to have a filter applied to it?
That actually depends on the kind of filter you want to apply. A simple threshold filter for example does not take a pixel's surroundings into account. If you were to perform scanline signal processing (e.g. when processing old analogue television signals) you may require only a single row of pixels at a time.
The universal solution of course to keep the whole image in memory, but then some pictures are so HUGE that no average computer's RAM can hold them. There are image processing libraries like VIPS that implement processing graphs that can operate on small subregions of an image at a time and can be executed independently.
Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
In case you mean "pixel array" instead of BMP (remember, BMP is a specific data structure), then no. Pixel component values may be of any scalar type and value range. And there are in fact colour spaces in which there are value regions which are mathematically necessary but do not denote actually sensible colours.
When it comes down to pixel data, an image is just a n-dimensional array of scalar component tuples where each component's value lies in a given range of values. It doesn't get more specific for that. Only when you introduce colour spaces (RGB, CMYK, YUV, CIE-Lab, CIE-XYZ, etc.) you give those values specific colour-meaning. And the choice of data type is more or less arbitrary. You can either use 8 bits per component RGB (0..255), 10 bits (0..1024) or floating point (0.0 .. 1.0); the choice is yours.

Simple Multi-Blob Detection of a Binary Image?

If there is a given 2d array of an image, where threshold has been done and now is in binary information.
Is there any particular way to process this image to that I get multiple blob's coordinates on the image?
I can't use openCV because this process needs to run simultaneously on 10+ simulated robots on a custom simulator in C.
I need the blobs xy coordinates, but first I need to find those multiple blobs first.
Simplest criteria of pixel group size should be enough. But I don't have any clue how to start the coding.
PS: Single blob should be no problem. Problem is multiple blobs.
Just a head start ?
Have a look at QuickBlob which is a small, standalone C library that sounds perfectly suited for your needs.
QuickBlob comes with a small command-line tool (csv-blobs) that outputs the position and size of each blob found within the input image:
./csv-blobs white image.png
X,Y,size,color
28.37,10.90,41,white
51.64,10.36,42,white
...
Here's an example (output image is produced thanks to the show-blobs.py tiny Python utility that comes with QuickBlob):
You can go through the binary image labeling the connected parts with an algorithm like the following:
Create a 2D array of ints, labelArray, that will hold the labels of the connected regions and initiate it to all zeros.
Iterate over each binary pixel, p, row by row
A. If p is true and the corresponding value for this position in the labelArray is 0 (unlabeled), assign it to a new label and do a breadth-first search that will add all surrounding binary pixels that are also true to that same label.
The only issue now is if you have multiple blobs that are touching each other. Because you know the size of the blobs, you should be able to figure out how many blobs are in a given connected region. This is the tricky part. You can try doing a k-means clustering at this point. You can also try other methods like using binary dilation.
I know that I am very late to the party, but I am just adding this for the benefipeople who are researching this problem.
Here is a nice description that might fit your needs.
http://www.mcs.csueastbay.edu/~grewe/CS6825/Mat/BinaryImageProcessing/BlobDetection.htm

Detect signs on roads

I have a video which has got turn left,turn right etc marks on the roads.
I have to detect those signs.I am going ahead with template matching in which I am matching the edge detected outputs,But I am not getting satisfactory results,Is there any other way to detect it? Please help.
If you want a solution that is not too complicated but more robust than template matching, I suggest you'd go for Hough voting on SIFT descriptors. This method is provides some degree of robustness to various problems, including partial occlusion of the sign, illumination variations and deformations of the sign. In particular, the method is completely invariant to rotation and uniform scaling of the template object.
The basic idea of the algorithm is as follows:
a) extract SIFT features from the template and query images.
b) set an arbitrary reference point in the template image and calculate, for each keypoint in the template image, the vector from the keypoint to the reference point.
c) match keypoints from the template image to the query image.
d) cast a vote for each matched keypoint for all object locations in the query image that this keypoint agrees with. You do that using the vectors calculated in step (b) and the location, scale and orientation of the matched keypoints in the query image.
e) If the object is indeed located in the image, the votes map should have a strong local maximum at it's location.
f) Optionally, you can verify the detection by using template matching.
You can read more about that method on Wikipedia here or in the original paper (by D. Lowe) here.
Using SIFT or SURF. You can get the invariable descriptor with training you can determine if the vector that represent the road marks (turn left, right or stop) match with the new in the video.
You might try extracting features and training a classifier (linear discriminant, neural network, naive Bayes, etc.). There are many candidate features you might try, but I'd think that you wouldn't need anything too complicated, even if the edge detection is poor, assuming that isolation of the sign is good. Some features to consider are: horizontal and vertical projections (row and column totals) and simple statistics of edge pixels (mean, standard deviation, skewness, etc. For more feature ideas, see any of these books:
"Shape Classification and Analysis: Theory and Practice", by Costa and Cesar
"Algorithms for Image Processing and Computer Vision", by J. R. Parker
"Digital Image Processing", by Gonzalez and Woods

similarity between an image and its rotated version using SIFT

I have implemented SIFT in opencv for comparing images... i have not yet written the program for comparing.Thinking of using FLANN for the same.But,my problem is that,looking into the 128 elements of the descriptor,cannot really understand the similarity of an image and its rotated version.
By reading Lowe's paper,i do understand that the descriptor co-ordinates are all rotated in terms of the keypoint orientation...but,how exactly is the similarity obtained.Can we undertstand the similarity by just viewing the 128 values.
pls,help me...this is for my project presentation.
You can first use Lowe's metric to compute some putative matches between the two images. The metric is that for any given descriptor de in image 1, find the distance to all descriptors de' in image 2. If the ratio of the closest distance to the second closest distance is below a threshold, then accept it.
After this, you can do RANSAC or other form of robust estimation or Hough Transform to check geometric consistency in terms of position, orientation, and scale of the keypoints that you accepted as putative matches.
If I recall correctly, SIFT will give you a set of 128-value descriptors that describe each of the interest points. You also have the location of each point in each of the images, as well as its "direction" (I forget what the "direction" is called in the paper) and scale in each image.
Once you've found two points that have matching descriptors, you can calculate the transformation from the interest point in one image to the same point in the other image by comparing coordinates and directions.
If you have enough matches, you see if all (or a majority of) the interest points have the same transformation. If they do, the images are similar, if they don't, the images are different.
Hope this helps...
What you are looking for is basically ASIFT
You can find the code here and some overview

Recognizing tetris pieces in C

I have to make an application that recognizes inside an black and white image a piece of tetris given by the user. I read the image to analyze into an array.
How can I do something like this using C?
Assuming that you already loaded the images into arrays, what about using regular expressions?
You don't need exact shape matching but approximately, so why not give it a try!
Edit: I downloaded your doc file. You must identify a random pattern among random figures on a 2D array so regex isn't suitable for this problem, lets say that's the bad news. The good news is that your homework is not exactly image processing, and it's much easier.
It's your homework so I won't create the code for you but I can give you directions.
You need a routine that can create a new piece from the original pattern/piece rotated. (note: with piece I mean the 4x4 square - all the cells of it)
You need a routine that checks if a piece matches an area from the 2D image at position x,y - the matching area would have corners (x-2, y-2, x+1, y+1).
You search by checking every image position (x,y) for a match.
Since you must use parallelism you can create 4 threads and assign to each thread a different rotation to search.
You might not want to implement that from scratch (unless required, of course) ... I'd recommend looking for a suitable library. I've heard that OpenCV is good, but never done any work with machine vision myself so I haven't tested it.
Search for connected components (i.e. using depth-first search; you might want to avoid recursion if efficiency is an issue; use your own stack instead). The largest connected component should be your tetris piece. You can then further analyze it (using the shape, the size or some kind of border description)
Looking at the shapes given for tetris pieces in Wikipedia, called "I,J,L,O,S,T,Z", it seems that the ratios of the sides of the bounding box (easy to find given a binary image and C) reveal whether you have I (4:1) or O (1:1); the other shapes are 2:3.
To detect which of the remaining shapes you have (J,L,S,T, or Z), it looks like you could collect the length and position of the shape's edges that fall on the bounding box's edges. Thus, T would show 3 and 1 along the 3-sides, and 1 and 1 along the 2 sides. Keeping track of the positions helps distinguish J from L, S from Z.

Resources