I am a newbie to CNNs and I want to ask what the channels do, in SSD for example. Why do they exist? For example, in 18X18X1024, what is the third number?
Thanks for any answer.
The dimensions of an image can be represented using 3 numbers. For example, a color image in CIFAR-10 dataset has a height of 32 pixels, width of 32 pixels and is represented as 32 x 32 x 3. Here 3 represents the number of channels in your image. Color images have a channel size of 3 (usually RGB), while a grayscale image will have a channel size of 1.
A CNN learns features of the images you feed it, with increasing levels of complexity. These features are represented by the channels: each channel is the output of one learned filter. The deeper you go into the network, the more channels you typically have, and the more complex the features they represent. The network then uses these features to perform object detection.
In your example, 18X18X1024 means that at this layer your input image is represented as an 18 x 18 feature map with 1024 channels, where each channel holds some complex feature/information about the image.
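To make the height x width x channels shape concrete, here is a minimal sketch in C of how such an array could be laid out in memory. The function names are hypothetical, and the channels-last (HWC) ordering is an assumption; frameworks differ on this.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (row, col, channel) in a height x width x channels
   array stored row by row with the channels of each position kept
   together (HWC layout, assumed here for illustration). */
size_t hwc_index(size_t row, size_t col, size_t channel,
                 size_t width, size_t channels)
{
    assert(col < width && channel < channels);
    return (row * width + col) * channels + channel;
}

/* Total number of values in a feature map of the given shape. */
size_t feature_map_size(size_t height, size_t width, size_t channels)
{
    return height * width * channels;
}
```

With this layout, an 18 x 18 x 1024 feature map holds 331776 values, and the 1024 channel values of one spatial position sit next to each other.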
Since you are a beginner, I suggest you look into how CNNs work in general, before diving into object detection. A good start would be image classification using CNNs. I hope this answers your question. Happy learning!! :)
Let's say I have an image called Test.jpg.
I just figured out how to bring an image into the project with the following line:
FILE *infile = fopen("Stonehenge.jpg", "rb");
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
I have never worked with images before, let alone OpenCL, so there is a lot that is going over my head.
I need further clarification on this part for my own understanding
Does this bmp image also need to be stored in an array in order to have a filter applied to it? I have seen a sliding window technique be used a couple of times in other examples. Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
I know this may seem like a basic question to most but I do not have a mentor on this subject in my workplace.
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
Not exactly. bmp is a very specific image serialization format and actually a quite complicated one (implementing a BMP file parser that deals with all the corner cases correctly is actually rather difficult).
However what you have there so far is not even file content data. What you have there is a C stdio FILE handle and that's it. So far you did not even check if the file could be opened. That's not really useful.
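A minimal sketch of the missing check could look like this. The helper name `open_image_file` is hypothetical; only `fopen` and `perror` are standard C.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: opens a file for binary reading and reports
   the reason on stderr if that fails. Returns NULL on failure. */
FILE *open_image_file(const char *path)
{
    assert(path != NULL);

    FILE *infile = fopen(path, "rb");
    if (infile == NULL) {
        perror(path);   /* prints the system's error message for this path */
        return NULL;
    }
    return infile;      /* caller must fclose() this handle when done */
}
```

Even when this succeeds, the handle only gives access to the compressed bytes of the file; they still have to be decoded before any filter can be applied.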
JPEG is a lossy compressed image format. What you need to be able to "work" with it is a pixel value array. Either an array of component tuples, or a number of arrays, one for each component (depending on your application either format may perform better).
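The two layouts mentioned above can be sketched as follows. This assumes 8-bit RGB data, and the function name is made up for illustration: the input is one array of component tuples (RGBRGB...), the output is one array per component.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splits interleaved RGBRGB... pixel data into three separate planes,
   one array per colour component. */
void rgb_interleaved_to_planar(const uint8_t *rgb, size_t pixel_count,
                               uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(rgb && r && g && b);
    for (size_t i = 0; i < pixel_count; ++i) {
        r[i] = rgb[3 * i + 0];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
}
```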
Now implementing image format decoders becomes tedious. It's not exactly difficult, but also not something you can write down in a single evening. Of course the devil is in the details, and writing an implementation that is high quality, covers all corner cases and is fast is a major effort. That's why for every image (and video and audio) format out there you can usually find only a small number of encoder and decoder implementations. The de-facto standard codec libraries for JPEG are libjpeg and libjpeg-turbo. If your aim is to read just JPEG files, then these libraries would be the go-to implementation. However, you may also want to support PNG files, and then maybe EXR and so on, and then things become tedious again. So there are meta-libraries which wrap all those format-specific libraries and offer them through a universal API.
In the OpenGL wiki there's a dedicated page on the current state of image loader libraries: https://www.opengl.org/wiki/Image_Libraries
Does this bmp image also need to be stored in an array in order to have a filter applied to it?
That actually depends on the kind of filter you want to apply. A simple threshold filter for example does not take a pixel's surroundings into account. If you were to perform scanline signal processing (e.g. when processing old analogue television signals) you may require only a single row of pixels at a time.
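As an illustration of a filter that looks at each pixel in isolation, here is a sketch of a simple threshold filter. It assumes 8-bit grayscale data in a flat array; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simple threshold filter: every pixel is decided on its own value,
   so the pixels can be processed in any order or in small chunks. */
void threshold_inplace(uint8_t *gray, size_t pixel_count, uint8_t threshold)
{
    assert(gray != NULL);
    for (size_t i = 0; i < pixel_count; ++i) {
        gray[i] = (gray[i] >= threshold) ? 255 : 0;
    }
}
```

Because no pixel depends on its neighbours, this kind of filter does not need the whole image in memory at once.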
The universal solution is of course to keep the whole image in memory, but then some pictures are so HUGE that no average computer's RAM can hold them. There are image processing libraries like VIPS that implement processing graphs which operate on small subregions of an image at a time and can be executed independently.
Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
In case you mean "pixel array" instead of BMP (remember, BMP is a specific data structure), then no. Pixel component values may be of any scalar type and value range. And there are in fact colour spaces in which there are value regions which are mathematically necessary but do not denote actually sensible colours.
When it comes down to pixel data, an image is just an n-dimensional array of scalar component tuples where each component's value lies in a given range of values. It doesn't get more specific than that. Only when you introduce colour spaces (RGB, CMYK, YUV, CIE-Lab, CIE-XYZ, etc.) do you give those values a specific colour meaning. And the choice of data type is more or less arbitrary. You can use 8 bits per component RGB (0..255), 10 bits (0..1023) or floating point (0.0 .. 1.0); the choice is yours.
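To show that the value range is just a convention, here is a small sketch that converts one component between the 8-bit (0..255) and floating point (0.0 .. 1.0) representations. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Maps an 8-bit component (0..255) to the floating point range 0.0 .. 1.0. */
float u8_to_unit_float(uint8_t value)
{
    return (float)value / 255.0f;
}

/* Maps a floating point component back to 8 bits, clamping values
   outside 0.0 .. 1.0 and rounding to the nearest integer. */
uint8_t unit_float_to_u8(float value)
{
    if (value < 0.0f) value = 0.0f;
    if (value > 1.0f) value = 1.0f;
    return (uint8_t)(value * 255.0f + 0.5f);
}
```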
I've been using 24bit .png with Alpha, from Photoshop, and just tried a .psd which worked fine with OpenGL ES, but Metal didn't see the Alpha channel.
What's the absolutely most performant texture format for particles within SceneKit?
Here's a sheet to test on, if needs be.
It looks white... right click and save as in the blank space. It's an alpha heavy set of rings. You can probably barely make them out if you squint at the screen:
exaggerated example use case:
https://www.dropbox.com/s/vu4dvfl0aj3f50o/circless.mov?dl=0
// Additional points for anyone who can guess the difference between the left and right rings in the video.
Use a grayscale/alpha PNG, not an RGBA one. Since it uses 16 bits per pixel (8+8) instead of 32 (8+8+8+8), the initial texture load will be faster and it may (depending on the GPU) use less memory as well. At render time, though, you’re not going to see much of a speed difference, since whatever the texture format is it’s still being drawn to a full RGB(A) render buffer.
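The bits-per-pixel arithmetic can be sketched like this. It is a rough estimate of the decoded, uncompressed size only; the function name is hypothetical, and a GPU or driver may store the texture differently internally.

```c
#include <assert.h>
#include <stddef.h>

/* Uncompressed size in bytes of a width x height texture at the given
   number of bits per pixel, rounded up to a whole byte. */
size_t texture_bytes(size_t width, size_t height, unsigned bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return (width * height * bits_per_pixel + 7) / 8;
}
```

For a 1024 x 1024 texture this gives 2097152 bytes at 16 bits per pixel (grayscale+alpha) versus 4194304 bytes at 32 bits per pixel (RGBA).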
There’s also PVRTC, which can get you down as low as 2–4 bits per pixel, but I tried Imagine’s tool out on your image and even the highest quality settings caused a bunch of artifacts like the below:
Long story short: go with a grayscale+alpha PNG, which you can easily export from Photoshop. If your particle system is hurting your frame rate, reduce the number and/or size of the particles—in this case you might be able to get away with layering a couple of your particle images on top of each other in the source texture atlas, which may not be too noticeable if you pick ones that differ in size enough.
Suppose I have a 2D array of an image that has already been thresholded, so it now holds binary information.
Is there any particular way to process this image so that I get the coordinates of multiple blobs in it?
I can't use OpenCV because this process needs to run simultaneously on 10+ simulated robots in a custom simulator written in C.
I need the blobs' xy coordinates, but first I need to find the multiple blobs.
The simplest criterion, pixel group size, should be enough. But I don't have any clue how to start coding it.
PS: A single blob should be no problem. The problem is multiple blobs.
Just a head start?
Have a look at QuickBlob which is a small, standalone C library that sounds perfectly suited for your needs.
QuickBlob comes with a small command-line tool (csv-blobs) that outputs the position and size of each blob found within the input image:
./csv-blobs white image.png
X,Y,size,color
28.37,10.90,41,white
51.64,10.36,42,white
...
Here's an example (output image is produced thanks to the show-blobs.py tiny Python utility that comes with QuickBlob):
You can go through the binary image labeling the connected parts with an algorithm like the following:
1. Create a 2D array of ints, labelArray, that will hold the labels of the connected regions, and initialize it to all zeros.
2. Iterate over each binary pixel, p, row by row.
   A. If p is true and the corresponding value for this position in labelArray is 0 (unlabeled), assign it a new label and do a breadth-first search that adds all surrounding binary pixels that are also true to that same label.
The only issue now is if you have multiple blobs that are touching each other. Because you know the size of the blobs, you should be able to figure out how many blobs are in a given connected region. This is the tricky part. You can try doing a k-means clustering at this point. You can also try other methods like using binary dilation.
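The labeling steps can be sketched in plain C as follows. This is a minimal sketch under some assumptions: the image is a flat row-major array where nonzero means "true", neighbours are 4-connected, and the function names are made up. It does not attempt to split touching blobs.

```c
#include <assert.h>
#include <stdlib.h>

/* Labels 4-connected regions of nonzero pixels with a breadth-first search.
   binary: width*height values, nonzero = foreground.
   labels: width*height ints, set to 0 (background) or 1..N (blob id).
   Returns the number of blobs, or -1 if the queue cannot be allocated. */
int label_blobs(const unsigned char *binary, int width, int height, int *labels)
{
    assert(binary && labels && width > 0 && height > 0);
    static const int dx[4] = {1, -1, 0, 0};
    static const int dy[4] = {0, 0, 1, -1};
    int total = width * height;
    int next_label = 0;

    /* Each pixel is queued at most once, so `total` slots are enough. */
    int *queue = malloc((size_t)total * sizeof *queue);
    if (queue == NULL) return -1;
    for (int i = 0; i < total; ++i) labels[i] = 0;

    for (int start = 0; start < total; ++start) {
        if (!binary[start] || labels[start] != 0) continue;
        int head = 0, tail = 0;
        labels[start] = ++next_label;       /* new blob found */
        queue[tail++] = start;
        while (head < tail) {
            int p = queue[head++];
            int px = p % width, py = p / width;
            for (int k = 0; k < 4; ++k) {
                int nx = px + dx[k], ny = py + dy[k];
                if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                int n = ny * width + nx;
                if (binary[n] && labels[n] == 0) {
                    labels[n] = next_label; /* mark before queueing */
                    queue[tail++] = n;
                }
            }
        }
    }
    free(queue);
    return next_label;
}

/* Centroid of one blob; returns its size in pixels (0 if the label is absent,
   in which case cx and cy are left untouched). */
int blob_centroid(const int *labels, int width, int height, int label,
                  double *cx, double *cy)
{
    long sum_x = 0, sum_y = 0;
    int count = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (labels[y * width + x] == label) { sum_x += x; sum_y += y; ++count; }
    if (count > 0) { *cx = (double)sum_x / count; *cy = (double)sum_y / count; }
    return count;
}
```

Calling `label_blobs` once and then `blob_centroid` for each label from 1 to the returned count gives an xy coordinate and a pixel count per blob, which is enough to filter by group size.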
I know that I am very late to the party, but I am just adding this for the benefit of people who are researching this problem.
Here is a nice description that might fit your needs.
http://www.mcs.csueastbay.edu/~grewe/CS6825/Mat/BinaryImageProcessing/BlobDetection.htm
I'm writing an application that displays different color swatches to help people with color coordination. How can I find the RGB values of real world objects?
For example, one of the colors is Red Apple but obviously a red apple isn't just red. It has hints of other colors in it.
Well, it's not an easy task to be honest, but a good place to start would be with a digital camera and/or a flatbed scanner.
Once you have an image in the computer, the task is somewhat easier, because all you need is a picture / photo editing package such as Photoshop or the GIMP to sample a selection of colours before using them in your application.
Once you have a few different samples, you need to average them, and that's quite easy to do. Let's say you took 5 samples of RGB values:
255,50,10
250,40,11
253,51,15
248,60,13
254,45,20
You simply need to add up each component and divide by how many samples you took so:
Red = (255 + 250 + 253 + 248 + 254) / 5 = 252
Green = (50 + 40 + 51 + 60 + 45) / 5 = 49.2
Blue = (10 + 11 + 15 + 13 + 20) / 5 = 13.8
Now, if what you're asking is how to do this automatically in program code, that's a whole different kettle of fish. First you'll need something like a webcam, then you'll need to write code to capture images from the webcam. Once you have your image, you'll need not just the ability to pick a colour, but also to figure out where in the image the object you want to pick the colour from actually is.
For now, I'd look at using the first method, it's a bit manual I agree, but far easier and will get you started.
The image processing required for the second method has given software engineers & comp scientists headaches for years and is still not a perfect science... and that's before we even start thinking about the maths.
For each object, I would do it this way:
1. Use Google Images to search for pictures of the object you want.
2. Select the one that has the most accurate color according to your idea of, say, a "red apple".
(You can skip 1 and 2 if you already have a digital picture of the object.)
3. Open that image in Paint: press the "Print Screen" key ("Impr Pant" on a Spanish keyboard), open Paint, and press "ctrl+v" to paste the screenshot.
4. Select the color picker tool in Paint (the one that looks like a dropper) and click on the image, right at the spot with the color you want.
5. Select "Colors -> Edit colors" from the menu, and in the color palette that opens, click on "Define Custom Colors".
6. That's it: the RGB values are shown on the right.
There must be an easier way, but this will work.
If you're looking for a programmatic solution, then look into bitwise operations. The general idea is to read the image's raw binary data and then use bit shifts and masks to convert the bits into RGB values. There are several methods for doing this depending on the programming language. Here is a method for ActionScript 3.
http://www.flashandmath.com/intermediate/rgbs/explanations.html
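For comparison, the same bitwise idea can be sketched in C. This assumes the colour is packed into one integer as 0xRRGGBB, which is a common but not universal convention; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts the components from a colour packed as 0xRRGGBB
   by shifting the wanted byte down and masking off the rest. */
void unpack_rgb(uint32_t packed, uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(r && g && b);
    *r = (uint8_t)((packed >> 16) & 0xFF);
    *g = (uint8_t)((packed >> 8) & 0xFF);
    *b = (uint8_t)(packed & 0xFF);
}

/* Packs three 8-bit components back into 0xRRGGBB. */
uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}
```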
Also, if you're looking for the average color, look here (for AS3):
http://blog.soulwire.co.uk/code/actionscript-3/extract-average-colours-from-bitmapdata
a related method and explanation for Java
Bitwise version of finding RGB in java
I am designing a JPEG to BMP decoder which scales the image. I have been supplied with the source code for the decoder, so my actual work is to design a scaler. I do not know where to begin. I have scoured the internet for the various scaling algorithms but am not sure where to introduce the scaling. Should I do the scaling after the image is converted into BMP, or should I do it during decoding at the MCU level? I am confused :(
If you guys have some information to help me out, it's appreciated. Any material to read, source code to analyse etc....
Oh, I forgot to mention one more thing: this is a porting project from the PC platform to an FPGA, so not all the library files are available on the target platform.
There are many ways to scale an image.
The easiest way is to decode the image and then scale using a naive scaling algorithm, something like:
dest_pixel [x,y] = src_pixel [x * x_scale_factor, y * y_scale_factor]
where x/y_scale_factor is
src_size / dest_size
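The naive scaling formula can be sketched in C like this. It assumes a single-channel 8-bit image in a flat row-major array, and it uses integer arithmetic only, which may suit a target without floating point; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest-neighbour scaling: each destination pixel copies the source
   pixel at (x * src_w / dst_w, y * src_h / dst_h). */
void scale_nearest(const uint8_t *src, int src_w, int src_h,
                   uint8_t *dst, int dst_w, int dst_h)
{
    assert(src && dst && src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    for (int y = 0; y < dst_h; ++y) {
        int sy = (int)((long)y * src_h / dst_h);      /* source row */
        for (int x = 0; x < dst_w; ++x) {
            int sx = (int)((long)x * src_w / dst_w);  /* source column */
            dst[y * dst_w + x] = src[sy * src_w + sx];
        }
    }
}
```

Multiplying before dividing avoids needing a fractional scale factor, and the computed source index always stays inside the source image.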
Once you have that working, you can look into more complex scaling methods, such as bilinear filtering. For example, the destination pixel is the average of several source pixels when reducing the size, and an interpolation of several source pixels when increasing the size.
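As a sketch of the interpolation case, the function below samples a single-channel 8-bit image at a fractional position by blending the four surrounding source pixels. It uses floating point for readability; the name is hypothetical, and on a target without floating point a fixed-point variant would be needed.

```c
#include <assert.h>
#include <stdint.h>

/* Bilinear sample at fractional source position (fx, fy): blends the
   four neighbouring pixels, weighted by distance. Positions outside
   the image are clamped to the border. */
uint8_t sample_bilinear(const uint8_t *src, int src_w, int src_h,
                        float fx, float fy)
{
    assert(src && src_w > 0 && src_h > 0);
    if (fx < 0.0f) fx = 0.0f;
    if (fy < 0.0f) fy = 0.0f;
    if (fx > (float)(src_w - 1)) fx = (float)(src_w - 1);
    if (fy > (float)(src_h - 1)) fy = (float)(src_h - 1);

    int x0 = (int)fx, y0 = (int)fy;
    int x1 = (x0 + 1 < src_w) ? x0 + 1 : x0;   /* clamp at right edge */
    int y1 = (y0 + 1 < src_h) ? y0 + 1 : y0;   /* clamp at bottom edge */
    float tx = fx - (float)x0, ty = fy - (float)y0;

    float top    = src[y0 * src_w + x0] * (1.0f - tx) + src[y0 * src_w + x1] * tx;
    float bottom = src[y1 * src_w + x0] * (1.0f - tx) + src[y1 * src_w + x1] * tx;
    return (uint8_t)(top * (1.0f - ty) + bottom * ty + 0.5f);
}
```

To enlarge an image, each destination pixel (x, y) would call this with the position (x * x_scale_factor, y * y_scale_factor) instead of truncating it to an integer.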