Does tensorflow.js mobilenet model.classify work better with square images? - tensorflow.js

In Python code I see that images given to MobileNet are 224x224, while the TensorFlow.js version seems to work with any size or aspect ratio. For non-square images, does it stretch them, or pad them with white or transparent pixels so the input becomes square with the aspect ratio of the image maintained? If it stretches them to become square, should one do some image manipulation before using model.classify?
https://github.com/tensorflow/tfjs-models/tree/master/mobilenet#making-a-classification doesn't say anything about this.

There is no requirement for images to be square; using non-square images will achieve the same result. One reason some neural networks such as MobileNet use square inputs may be that operations such as convolution usually use square kernels.
To use MobileNet for classification, the image needs to be resized to a shape of [224, 224, 3], which is the input size of the network. Methods such as tf.image.resizeBilinear and tf.image.resizeNearestNeighbor serve exactly that purpose. Obviously, transforming a non-square image into a square one distorts it, but those algorithms use anti-aliasing to reduce the resampling artifacts.
That distortion of the input image, though, is the least thing one needs to be concerned with. A good model's predictions should be invariant to such distortion, because the training data were heavily distorted and augmented with noise precisely so that the model generalizes well.
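For illustration, here is a minimal Python sketch (using the Python tf.image API; in tfjs the equivalent calls would be tf.image.resizeBilinear and tf.pad) comparing the plain stretch with an aspect-preserving letterbox pad. The file name is a placeholder.

```python
import tensorflow as tf

# Load an arbitrary non-square image (the file name is a placeholder).
img = tf.io.decode_jpeg(tf.io.read_file("photo.jpg"), channels=3)  # [H, W, 3]

# Option 1: plain bilinear resize -- stretches, so the aspect ratio is lost.
stretched = tf.image.resize(img, [224, 224])

# Option 2: aspect-preserving resize plus zero padding (letterboxing).
padded = tf.image.resize_with_pad(img, 224, 224)

print(stretched.shape, padded.shape)  # both (224, 224, 3)
```

Whether the letterboxed variant actually classifies better depends on how the model was trained; in practice the stretch performed internally is usually good enough.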

Related

Determine chessboard dimensions in pixels

Similar to calibrating a single camera 2D image with a chessboard, I wish to determine the width/height of the chessboard (or of a single square) in pixels.
I have a camera aimed vertically at the ground, ensured to be perfectly level with the surface below. I am using the camera to determine the translation between consecutive frames (successfully achieved using Fourier phase correlation). At the moment my result returns the translation in pixels, but I would like to use techniques similar to calibration, where I move the camera over the chessboard (which is flat on the ground) to automatically determine the size of the chessboard in pixels, relative to my image height and width.
Knowing the size of the chessboard in millimetres, I can then convert a pixel unit to a real-world unit in millimetres, i.e., 1 pixel will represent a distance proportional to the height of the camera above the ground. This will allow me to convert a translation in pixels to a translation in millimetres, recalibrating every time I change the height of the camera.
What would be the recommended way of achieving this? Surely it must be simpler than single camera 2D calibration.
OpenCV can give you the position of the chessboard's corners with cv::findChessboardCorners().
I'm not sure if the perspective distortion will affect your calculations, but if the chessboard is perfectly aligned beneath the camera, it should work.
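A minimal Python sketch of that approach, assuming OpenCV's Python bindings; the file name and the 7x7 inner-corner pattern are placeholder assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("board.png")                  # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

pattern = (7, 7)   # inner corners, e.g. 7x7 for a standard 8x8 board
found, corners = cv2.findChessboardCorners(gray, pattern)

if found:
    pts = corners.reshape(pattern[1], pattern[0], 2)   # rows x cols of corners
    # The mean distance between horizontally adjacent inner corners
    # approximates the side of one square in pixels.
    square_px = np.linalg.norm(np.diff(pts, axis=1), axis=2).mean()
    # With the real-world square size known (say 25 mm), the conversion
    # factor is simply mm_per_px = 25.0 / square_px.
    print("square side: %.1f px" % square_px)
```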
This is just an idea, so don't hit me... but maybe use the natural contrast of the chessboard? "At some point it will switch from bright to dark pixels, and that should happen (can't remember the number of columns on a chessboard) times" should be a doable algorithm.

Efficient image translation by (x,y) pixels?

Looking to see if anyone can recommend a computationally efficient method for translating/shifting an image by (x,y) pixels.
Reason being, I have been partly successful in implementing the Fourier-Mellin transform to determine the rotation and translation between image frames. Once the image is unrotated, I would like to untranslate it by the calculated pixel offset (x,y), allowing me to test the image correlation after rotation and translation.
I would think that an efficient method (sketched below) would be to:
Make a border with cv::copyMakeBorder().
Use an ROI, e.g. make a new matrix header without copying data.
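A minimal Python sketch of those two steps, assuming OpenCV's Python bindings; the file name, the offsets, and the sign convention are placeholder assumptions.

```python
import cv2

img = cv2.imread("frame.png")      # placeholder
dx, dy = 15, -7                    # offset from phase correlation (assumed)

h, w = img.shape[:2]
pad = max(abs(dx), abs(dy))

# 1. Make a border so the shifted window never leaves the image.
bordered = cv2.copyMakeBorder(img, pad, pad, pad, pad,
                              cv2.BORDER_CONSTANT, value=0)

# 2. Take an ROI: in Python this is a numpy view, so no pixels are copied.
#    The sign convention depends on how the offset was measured.
shifted = bordered[pad - dy:pad - dy + h, pad - dx:pad - dx + w]
```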
Good luck

Blob detection in C (not with OPENCV)

I am trying to do my own blob detection that will receive real-time video and try to detect a white paper sheet.
Even if something is written on the paper, I need to detect the paper and its corners, because what I really want is to draw an OpenGL polygon over the paper, with each corner of the polygon on a corner of the paper. Then I need the coordinates of the paper to do other things.
So I need to:
- detect a square white blob,
- get the coordinates of the corners,
- draw a polygon over the white sheet.
Any ideas how I can do that?
Much depends on context. For example, suppose that you:
know that the paper is always roughly centered (i.e. (W/2, H/2) is always inside the blob), and no more rotated than 45 degrees (30 would be better)
have a suitable border around the sheet so that the corners never touch the edges of the FOV
are able (through analysis of local variance or, if you're lucky, a check of background color or luminance) to say whether a point is inside or outside the blob
the inside/outside function never fails (except possibly in the close vicinity of a border)
then you could walk a line between a point on the border (surely outside) and the center (surely inside), e.g. by bisection, and find a point (an areal) on the edge.
Two edge points give a line (two areals give a beam), two lines give an intersection (two beams give a larger areal), and there's your corner. You should carry along the detection uncertainty (areal radius) in order to validate corners (another, less elegant approach is to roughly calculate where the corner is and pinpoint it with a spiral search or drunkard's walk).
This algorithm is amenable to parallelization and, as long as the hypotheses hold, should be really fast.
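A minimal Python sketch of the bisection step, assuming an inside(point) predicate like the one hypothesized in the list above:

```python
import numpy as np

def edge_point(inside, p_out, p_in, tol=1.0):
    """Bisect between a point known to be outside the blob and one known
    to be inside until the bracket is shorter than tol pixels."""
    p_out = np.asarray(p_out, dtype=float)
    p_in = np.asarray(p_in, dtype=float)
    while np.linalg.norm(p_in - p_out) > tol:
        mid = (p_out + p_in) / 2.0
        if inside(mid):          # the inside/outside test from the answer
            p_in = mid
        else:
            p_out = mid
    return (p_out + p_in) / 2.0  # a point on (or within tol of) the edge
```

Running this from several border points yields edge points; two edge points on the same side define that side's line, and intersecting two adjacent lines gives a corner.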
All that said, it remains a hack -- I agree with unwind, why reinvent the wheel? If you have memory or CPU constraints (embedded systems, etc.), I believe there ought to be OpenCV and e-Vision "lite" ports also for ARM and embedded platforms.
(Sorry for my terminology - I'm monkey-translating from Italian. "Areal" is likely to correspond to your "blob"; a beam is the family of lines joining all pairs of points in two different blobs, line intensity being the product of the distance of a point from its areal's center.)
I am trying to do my own blob detection that will receive real-time video and try to detect a white paper sheet.
Your first shot could be a simple flood fill. That is, select a good threshold to binarize the image and apply the algorithm. The threshold can be fixed if you know the paper is always brighter than X and the background always darker than that, or it can be adaptive, for example Otsu's method. OpenCV offers this for free.
If you need to speed it up, you could use a union-find data structure.
Finally, you'd need to come up with some heuristic for identifying the corners (e.g. the four extreme values in the x/y directions).
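The question rules OpenCV out, but for reference, here is a minimal Python/OpenCV sketch of the pipeline this answer describes (threshold, connected region, extreme-point corners); each step can be hand-rolled in C the same way. The file name is a placeholder.

```python
import cv2
import numpy as np

gray = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY)  # placeholder

# Otsu picks the threshold automatically; the paper is assumed to be the
# brightest large region in the frame.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Connected components does the flood-fill/union-find work in one call.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
biggest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # label 0 is background
ys, xs = np.nonzero(labels == biggest)

# Crude corner heuristic from the answer: extreme values in x/y direction.
pts = np.column_stack([xs, ys])
corners = [pts[np.argmin(xs)], pts[np.argmax(xs)],
           pts[np.argmin(ys)], pts[np.argmax(ys)]]
```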
Then i need [...] the coordinates of the cornes [...]
Then you don't need blob detection, but corner detection or contour detection in the first place. OpenCV has some nice functionality for exactly this.
If you can't use it, I would suggest binarizing the image as above and using a Harris detector to find the corners of the object.
OpenCV's TBB support could also come in quite handy if you use it and have trouble meeting your real-time requirements.
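A minimal Python/OpenCV sketch of the Harris step; the parameter values are common defaults, not anything prescribed by this answer:

```python
import cv2
import numpy as np

gray = np.float32(cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY))

# Harris corner response; blockSize, Sobel aperture and k are typical values.
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Keep only strong responses; the 0.01 factor is a tuning knob.
corners = np.argwhere(response > 0.01 * response.max())   # (row, col) pairs
```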

ImageProcessing in WPF (Fant BitmapScalingMode)

My application presents an image that can be scaled to a certain size. I'm using the WPF Image control with the Fant scaling mode (BitmapScalingMode.Fant).
However, there is no documentation on how this scaling algorithm works.
Can anyone reference me to the relevant link for this algorithm description?
Nir
Avery Lee of VirtualDub states that it's a box filter for downscaling and linear for upscaling. If I'm not mistaken, "box filter" here means basically that each output pixel is a "flat" average of several input pixels.
In practice, it's a lot more blurry for downscaling than GDI's cubic downscaling, so the theory about averaging sounds about right.
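To make the "flat average" description concrete, here is a toy numpy sketch of a box-filter downscale; this illustrates the general technique, not Fant's actual implementation:

```python
import numpy as np

def box_downscale(img, factor):
    """Downscale by an integer factor: every output pixel is the flat
    (unweighted) mean of a factor-by-factor block of input pixels."""
    h, w = img.shape[:2]
    h, w = h - h % factor, w - w % factor            # crop to a multiple
    blocks = img[:h, :w].reshape(h // factor, factor,
                                 w // factor, factor, -1)
    return blocks.mean(axis=(1, 3))
```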
I know what it is, but I couldn't find much on Google either :(
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4056711 is the appropriate paper I think; behind a pay-wall.
You don't need to understand the algorithm to use it. You should explicitly choose, each time you create a scaled bitmap control, whether you want high-quality or low-quality scaling.

What is dataset Bounding Box?

I am a bit new to imaging and want to understand the points below:
What is the bounding box of a dataset, and why is it needed? Does it represent a real-world measurement, or is it just for the computer screen where it is displayed? How is it related to the image size specified in pixels?
What do WMTS layers' zoom levels and matrix sets mean? I understand that WMTS works by fetching tiles of the dataset. Also, I see that GetCapabilities for a specific WMTS dataset returns matrix sets in the XML, which I don't understand.
What do the matrix sets and zoom levels signify, and how can I understand them as a layman?
I have tried googling a bit, but the articles seem to assume some technical knowledge around this which I am still trying to gather.
A bounding box is the (imaginary) rectangle that you can draw around a dataset (or feature) that touches its maxima and minima in both the X and Y directions. It is measured in the same units as the geometry. It is related to the image size in pixels through the resolution or scale, which is bbox.width/image.width (or its inverse), in units of metres per pixel or pixels per metre (or degrees, or feet).
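With made-up numbers, the relationship looks like this:

```python
# Toy numbers: a bounding box 3000 m wide rendered into a 1500 px wide image.
bbox_width_m = 3000.0
image_width_px = 1500

resolution = bbox_width_m / image_width_px   # 2.0 metres per pixel
inverse = image_width_px / bbox_width_m      # 0.5 pixels per metre
print(resolution, inverse)
```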
A WMTS layer is a set of pre-rendered tiles that have been produced at a fixed set of scales over a fixed area. These scales are described in the matrix sets of a WMTS layer; the zoom level is how far down that set of scales you have traversed, with 0 being the top and an arbitrary number (usually between 15 and 20 for global data sets) being the lowest (most detailed) level.
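As an illustration, in the widely used web-mercator tile matrix set each zoom level halves the resolution of the level above it (other matrix sets may use different ratios):

```python
# Metres per pixel at zoom 0 in the web-mercator tile matrix set
# (one 256 px tile covering the equator: 2 * pi * 6378137 / 256).
top_resolution = 156543.03392804097

for zoom in range(6):
    print(zoom, top_resolution / 2 ** zoom, "m/px")
```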
See point 2: you should not really need to understand them in detail, as your client library will handle all of that for you.
