How to train a custom Object detector from scratch in tensorflow.js? - tensorflow.js

I followed multiple example, to train a custom object detector in TensorflowJS . The main problem I am facing every where it is using pretrained model.
Pretrained models are fine for general use cases, but custom scenario it fails. For example, take this this is example form official Tensorflowjs examples, here it is using mobilenet, and mobilenet and mobilenet has image size restriction 224x224 which defeats all the purpose, because my images are big and also not of same ratio so resizing is not an option.
I have tried multiple example, all follows same path oneway or another.
What I want ?
Any example by which I can train a custom objector from scratch in Tensorflow.js.
Although the answer sounds simple but trust me I searching for this for multiple days. Any help will be greatly appreciated. Thanks

Currently it is not yet possible to use tensorflow object detection api in nodejs. But the image size should not be a restriction. Instead of resizing, you can crop your image and keep only the part that contain your object to be detected.
One approach will be like partition the image in 224x224 and run for all partitions but what if the object is between two partitions
The image does not need to be partitioned for it. When labelling the image, you will need to know the x, y coordinates (from the top left) and the w, h of the detected box. You only need to crop a part of the image that will contain the box. Cropping at the coordinates x - (224-w)/2, y- (224-h)/2 can be a good start. There are two issues with these coordinates:
the detected boxes will always be in the center, so the training will be biaised. To prevent it, a randomn factor can be used. x - (224-w)/r , y- (224-h)/r where r can be randomly taken from [1-10] for instance
if the detected boxes are bigger than 224 * 224 maybe you might first choose to resize the video keeping it ratio before cropping. In this case the boxe size (w, h) will need to be readjusted according to the scale used for the resizing

Related

How to detect text region in image?

Given an image (i.e. newspaper, scanned newspaper, magazine etc), how do I detect the region containing text? I only need to know the region and remove it, don't need to do text recognition.
The purpose is I want to remove these text areas so that it will speed up my feature extraction procedure as these text areas are meaningless for my application. Anyone know how to do this?
BTW, it will be good if this can be done in Matlab!
Best!
You can use Stroke Width Transform (SWT) to highlight text regions.
Using my mex implementation posted here, you can
img = imread('http://i.stack.imgur.com/Eyepc.jpg');
[swt swtcc] = SWT( img, 0, 10 );
Playing with internal parameters of the edge-map extraction and image filtering in SWT.m can help you tweak the resulting mask to your needs.
To get this result:
I used these parameters for the edge map computation in SWT.m:
edgeMap = single( edge( img, 'canny', [0.05 0.25] ) );
Text detection in natural images is an active area of research in computer vision community. U can refer to ICDAR papers. But in your case I think it should be simple enough. As you have text from newspaper or magazines, it should be of fixed size and horizontally oriented.
So, you can apply scanning window of a fixed size, say 32x32. Train it on ICDAR 2003 training dataset for positive windows having text in it. U can use a small feature set of color and gradients and train an SVM which would give a positive or negative result for a window having text or not.
For reference go to http://crypto.stanford.edu/~dwu4/ICDAR2011.pdf . For code, you can try their homepages
This example in the Computer Vision System Toolbox in Matlab shows how to detect text using MSER regions.
If your image is well binarized and you know the usual size of the text you could use the HorizontalRunLengthSmoothing and VerticalRunLengthSmoothing algorithms. They are implemented in the open source library Aforge.Net but it should be easy to reimplement them in Matlab.
The intersection of the result image from these algorithm will give you a good indication that the region contains text, it is not perfect but it is fast.

Simple Multi-Blob Detection of a Binary Image?

If there is a given 2d array of an image, where threshold has been done and now is in binary information.
Is there any particular way to process this image to that I get multiple blob's coordinates on the image?
I can't use openCV because this process needs to run simultaneously on 10+ simulated robots on a custom simulator in C.
I need the blobs xy coordinates, but first I need to find those multiple blobs first.
Simplest criteria of pixel group size should be enough. But I don't have any clue how to start the coding.
PS: Single blob should be no problem. Problem is multiple blobs.
Just a head start ?
Have a look at QuickBlob which is a small, standalone C library that sounds perfectly suited for your needs.
QuickBlob comes with a small command-line tool (csv-blobs) that outputs the position and size of each blob found within the input image:
./csv-blobs white image.png
X,Y,size,color
28.37,10.90,41,white
51.64,10.36,42,white
...
Here's an example (output image is produced thanks to the show-blobs.py tiny Python utility that comes with QuickBlob):
You can go through the binary image labeling the connected parts with an algorithm like the following:
Create a 2D array of ints, labelArray, that will hold the labels of the connected regions and initiate it to all zeros.
Iterate over each binary pixel, p, row by row
A. If p is true and the corresponding value for this position in the labelArray is 0 (unlabeled), assign it to a new label and do a breadth-first search that will add all surrounding binary pixels that are also true to that same label.
The only issue now is if you have multiple blobs that are touching each other. Because you know the size of the blobs, you should be able to figure out how many blobs are in a given connected region. This is the tricky part. You can try doing a k-means clustering at this point. You can also try other methods like using binary dilation.
I know that I am very late to the party, but I am just adding this for the benefipeople who are researching this problem.
Here is a nice description that might fit your needs.
http://www.mcs.csueastbay.edu/~grewe/CS6825/Mat/BinaryImageProcessing/BlobDetection.htm

About finding pupil in a video

I am now working on an eye tracking project. In this project I am tracking eyes in a webcam video (resolution if 640X480).
I can locate and track the eye in every frame, but I need to locate the pupil. I read a lot of papers and most of them refer to Alan Yuille's deformable template method to extract and track the eye features. Can anyone help me with the code of this method in any languages (matlab/OpenCV)?
I have tried with different thresholds, but due to the low resolution in the eye regions, it does not work very well. I will really appreciate any kind of help regarding finding pupil or even iris in the video.
What you need to do is to convert your webcam to a Near-Infrared Cam. There are plenty of tutorials online for that. Try this.
A Image taken from an NIR cam will look something like this -
You can use OpenCV then to threshold.
Then use the Erode function.
After this fill the image with some color takeing a corner as the seed point.
Eliminate the holes and invert the image.
Use the distance transform to the nearest non-zero value.
Find the max-value's coordinate and draw a circle.
If you're still working on this, check out my OptimEyes project: https://github.com/LukeAllen/optimeyes
It uses Python with OpenCV, and works fairly well with images from a 640x480 webcam. You can check out the "Theory Paper" and demo video on that page also. (It was a class project at Stanford earlier this year; it's not very polished but we made some attempts to comment the code.)
Depending on the application for tracking the pupil I would find a bounding box for the eyes and then find the darkest pixel within that box.
Some psuedocode:
box left_location = findlefteye()
box right_location = findrighteye()
image_matrix left = image[left_location]
image_matrix right = image[right_location]
image_matrix average = left + right
pixel min = min(average)
pixel left_pupil = left_location.corner + min
pixel right_pupil = right_location.corner + min
In the first answer suggested by Anirudth...
Just apply the HoughCirles function after thresholding function (2nd step).
Then you can directly draw the circles around the pupil and using radius(r) and center of eye(x,y) you can easily find out the Center of Eye..

Detect signs on roads

I have a video which has got turn left,turn right etc marks on the roads.
I have to detect those signs.I am going ahead with template matching in which I am matching the edge detected outputs,But I am not getting satisfactory results,Is there any other way to detect it? Please help.
If you want a solution that is not too complicated but more robust than template matching, I suggest you'd go for Hough voting on SIFT descriptors. This method is provides some degree of robustness to various problems, including partial occlusion of the sign, illumination variations and deformations of the sign. In particular, the method is completely invariant to rotation and uniform scaling of the template object.
The basic idea of the algorithm is as follows:
a) extract SIFT features from the template and query images.
b) set an arbitrary reference point in the template image and calculate, for each keypoint in the template image, the vector from the keypoint to the reference point.
c) match keypoints from the template image to the query image.
d) cast a vote for each matched keypoint for all object locations in the query image that this keypoint agrees with. You do that using the vectors calculated in step (b) and the location, scale and orientation of the matched keypoints in the query image.
e) If the object is indeed located in the image, the votes map should have a strong local maximum at it's location.
f) Optionally, you can verify the detection by using template matching.
You can read more about that method on Wikipedia here or in the original paper (by D. Lowe) here.
Using SIFT or SURF. You can get the invariable descriptor with training you can determine if the vector that represent the road marks (turn left, right or stop) match with the new in the video.
You might try extracting features and training a classifier (linear discriminant, neural network, naive Bayes, etc.). There are many candidate features you might try, but I'd think that you wouldn't need anything too complicated, even if the edge detection is poor, assuming that isolation of the sign is good. Some features to consider are: horizontal and vertical projections (row and column totals) and simple statistics of edge pixels (mean, standard deviation, skewness, etc. For more feature ideas, see any of these books:
"Shape Classification and Analysis: Theory and Practice", by Costa and Cesar
"Algorithms for Image Processing and Computer Vision", by J. R. Parker
"Digital Image Processing", by Gonzalez and Woods

Recognizing tetris pieces in C

I have to make an application that recognizes inside an black and white image a piece of tetris given by the user. I read the image to analyze into an array.
How can I do something like this using C?
Assuming that you already loaded the images into arrays, what about using regular expressions?
You don't need exact shape matching but approximately, so why not give it a try!
Edit: I downloaded your doc file. You must identify a random pattern among random figures on a 2D array so regex isn't suitable for this problem, lets say that's the bad news. The good news is that your homework is not exactly image processing, and it's much easier.
It's your homework so I won't create the code for you but I can give you directions.
You need a routine that can create a new piece from the original pattern/piece rotated. (note: with piece I mean the 4x4 square - all the cells of it)
You need a routine that checks if a piece matches an area from the 2D image at position x,y - the matching area would have corners (x-2, y-2, x+1, y+1).
You search by checking every image position (x,y) for a match.
Since you must use parallelism you can create 4 threads and assign to each thread a different rotation to search.
You might not want to implement that from scratch (unless required, of course) ... I'd recommend looking for a suitable library. I've heard that OpenCV is good, but never done any work with machine vision myself so I haven't tested it.
Search for connected components (i.e. using depth-first search; you might want to avoid recursion if efficiency is an issue; use your own stack instead). The largest connected component should be your tetris piece. You can then further analyze it (using the shape, the size or some kind of border description)
Looking at the shapes given for tetris pieces in Wikipedia, called "I,J,L,O,S,T,Z", it seems that the ratios of the sides of the bounding box (easy to find given a binary image and C) reveal whether you have I (4:1) or O (1:1); the other shapes are 2:3.
To detect which of the remaining shapes you have (J,L,S,T, or Z), it looks like you could collect the length and position of the shape's edges that fall on the bounding box's edges. Thus, T would show 3 and 1 along the 3-sides, and 1 and 1 along the 2 sides. Keeping track of the positions helps distinguish J from L, S from Z.

Resources