Image scaling in C

I am designing a JPEG to BMP decoder which scales the image. I have been supplied with the source code for the decoder, so my actual work is to design the scaler, but I do not know where to begin. I have scouted the internet for the various scaling algorithms, but I am not sure where to introduce the scaling. Should I do the scaling after the image is converted to BMP, or should I do it during the decoding, at the MCU level? I am confused :(
If you have some information to help me out, it's appreciated: any material to read, source code to analyse, etc.
Oh, I forgot to mention one more thing: this is a porting project from the PC platform to an FPGA, so not all the library files are available on the target platform.

There are many ways to scale an image.
The easiest way is to decode the image and then scale using a naive scaling algorithm, something like:
dest_pixel[x, y] = src_pixel[x * x_scale_factor, y * y_scale_factor]
where x_scale_factor and y_scale_factor are
src_size / dest_size
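A minimal sketch of that in C, for an 8-bit grayscale image (function and parameter names are mine; integer-only math, which also suits an FPGA target without floating-point support):

/* Nearest-neighbour scaling of an 8-bit grayscale image.
   src is src_w x src_h, dst is dst_w x dst_h, both tightly packed. */
void scale_nearest(const unsigned char *src, int src_w, int src_h,
                   unsigned char *dst, int dst_w, int dst_h)
{
    for (int y = 0; y < dst_h; ++y) {
        int sy = y * src_h / dst_h;        /* y * y_scale_factor */
        for (int x = 0; x < dst_w; ++x) {
            int sx = x * src_w / dst_w;    /* x * x_scale_factor */
            dst[y * dst_w + x] = src[sy * src_w + sx];
        }
    }
}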
Once you have that working, you can look into more complex scaling schemes, such as a bilinear filter: the destination pixel becomes an average of several source pixels when reducing the size and an interpolation of several source pixels when increasing it.
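A fixed-point bilinear sketch under the same assumptions (grayscale, integer-only; the destination is assumed to be at least 2x2):

/* Bilinear scaling of an 8-bit grayscale image using 24.8 fixed-point
   weights, so no floating point is needed. */
void scale_bilinear(const unsigned char *src, int src_w, int src_h,
                    unsigned char *dst, int dst_w, int dst_h)
{
    for (int y = 0; y < dst_h; ++y) {
        int fy = (int)((long long)y * (src_h - 1) * 256 / (dst_h - 1));
        int sy = fy >> 8, wy = fy & 255;              /* source row, weight */
        for (int x = 0; x < dst_w; ++x) {
            int fx = (int)((long long)x * (src_w - 1) * 256 / (dst_w - 1));
            int sx = fx >> 8, wx = fx & 255;          /* source column, weight */
            const unsigned char *p = src + sy * src_w + sx;
            int right = (sx + 1 < src_w) ? 1 : 0;     /* clamp at the right edge */
            int down  = (sy + 1 < src_h) ? src_w : 0; /* clamp at the bottom edge */
            int top = p[0]    * (256 - wx) + p[right]        * wx;
            int bot = p[down] * (256 - wx) + p[down + right] * wx;
            dst[y * dst_w + x] = (unsigned char)((top * (256 - wy) + bot * wy) >> 16);
        }
    }
}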

Related

Seeking some guidance on webcam picture display using GTK+ and Cairo in C

In this question I'm mostly seeking advice and guidance on the overall understanding of some concepts of drawing with GTK+ and Cairo in the C language (IMO the information on this topic is rather scarce, and my experience is really modest).
I'm coding some pet application which captures frames from webcam and displays them on a GTK window.
My app is working, but there are some points which I don't feel like grasped.
Overall process:
I've got a webcam frame as an array of bytes mmapped from the webcam device into my app's process memory. So when another frame is captured, what I have is a 640*480*3 bytes long array which is denoted as being in RGB24 format. After some searching, it looks like for the purpose of displaying it in a GTK window I need to create an object called a drawing area using gtk_drawing_area_new(), add a "draw" callback and do the "drawing" there in the designated callback. According to Cairo, "drawing" is a process of applying a "source" to a "destination". I assume that I already have a source (my webcam's mmapped pixels), but it looks like I need to use some "source" that Cairo is able to understand. I found a candidate:
cairo_surface_t* surface = cairo_image_surface_create(CAIRO_FORMAT_RGB24, 640, 480);
As I see it, this call creates some Cairo-acceptable object, which along the way allocates a buffer in my app's memory, which I can get using:
unsigned char* surface_data = cairo_image_surface_get_data(surface);
According to the docs this is a 640x480x4 bytes long buffer which, on little-endian archs, should be filled with BGRA-formatted pixel data.
Then I should rearrange my original webcam pixels for EVERY frame captured, using this:
for (size_t idx_src = 0, idx_dst = 0; idx_src < 640*480*3; idx_dst += 4, idx_src += 3) {
    surface_data[idx_dst]     = image[idx_src + 2]; // B [3rd pos -> 1st pos]
    surface_data[idx_dst + 1] = image[idx_src + 1]; // G [no change]
    surface_data[idx_dst + 2] = image[idx_src];     // R [1st pos -> 3rd pos]
}
After this I should do "drawing" with:
cairo_set_source_surface(cr, surface, 0, 0);
cairo_paint(cr);
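One detail I'm not sure about: the Cairo docs seem to require bracketing direct writes to the surface buffer like this (a sketch):

cairo_surface_flush(surface);       // let Cairo finish any pending drawing first
/* ... rearrange pixels into surface_data as above ... */
cairo_surface_mark_dirty(surface);  // tell Cairo the pixel data has changed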
So, questions:

1. Is this what is supposed to be done for the task at hand, or am I missing something completely here?
2. What confuses me is that I should rearrange my original webcam pixels for EVERY frame captured (this presumably consumes some CPU time and could be a limiting factor for capturing in HD resolution at high frame rates). Is there some other way?
3. Let's suppose I somehow acquire pixels from the webcam in a Cairo-conforming format, e.g. 640x480x4 BGRA-formatted bytes. Is there a way to "wrap" this data in some Cairo-acceptable object to exclude the pixel-rearranging part?
4. Any other thoughts I should consider?

Thanks for your attention.
For most of your questions: Cairo only supports some image formats. Since your data comes in another format, you will have to convert it. All this copying around will likely be too slow. To make this work with an acceptable speed, you would need some other approach. No, I do not have any helpful suggestions here.
An unhelpful one would be: Is there some example for this webcam that you could look at?
Let's suppose I somehow acquire pixels from the webcam in a Cairo-conforming format, e.g. 640x480x4 BGRA-formatted bytes. Is there a way to "wrap" this data in some Cairo-acceptable object to exclude the pixel-rearranging part?
Yup. cairo_image_surface_create_for_data.
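A minimal sketch, assuming a 640x480 frame whose rows already use the stride Cairo expects (the wrap_frame name is mine):

#include <cairo.h>

cairo_surface_t *wrap_frame(unsigned char *bgra_pixels)
{
    /* The buffer must remain valid for as long as the surface is used. */
    int stride = cairo_format_stride_for_width(CAIRO_FORMAT_RGB24, 640);
    return cairo_image_surface_create_for_data(bgra_pixels,
                                               CAIRO_FORMAT_RGB24,
                                               640, 480, stride);
}

This avoids the per-frame copy entirely, as long as the capture buffer's layout really matches.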

Measure difference between two files

I have a question that's very specific, yet very general at the same time. (Also, I don't know if this is quite the right site for this.)
The Scenario
Let's say I have an uncompressed video vid.avi. It is then run through [Some compression algorithm], which is lossy. I want to compare vid.avi and the new, compressed file to determine just how much data was lost in the compression. How can I compare the files and how can I measure the difference between the two, using the original as the reference point? Is it possible at all? I would prefer a generic answer that will work with any language, but I would also gladly accept an answer that's specific to a language.
EDIT: Let me be more specific. I want something that compares two video files in a similar way that the Notepad++ Compare plugin compares text files. I just want to find out how close each individual pixel's colour is to the original file's colour for that pixel.
Thanks in advance, and thank you for taking the time to read this question.
It is generally the change in video quality that people want to measure when comparing compression methods, rather than a loss of data.
If you did want to measure the data loss somehow, you would have to define what you mean by 'data' and how you wanted to measure it. Video compression is quite complex and the approach may even differ frame by frame within a video. Data could mean the colour depth for each pixel, the number of frames per second, whether a frame is encoded as a delta relative to other frames, etc.
Video quality is subjective, so the reduction in quality after compression will not be an absolute value. The usual way to measure the quality is similar to the technique used for audio - Mean Opinion Score: https://en.wikipedia.org/wiki/Mean_opinion_score. It essentially uses a well-defined process to try to apply some objectivity to a test audience's subjective experience.
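That said, if you do want the crude per-pixel comparison from your edit, a common objective stand-in is PSNR, derived from the mean squared error between the decoded frames. A minimal sketch, assuming both videos have been decoded to raw 8-bit buffers of the same size (function names are mine):

#include <math.h>
#include <stddef.h>

/* Mean squared error over two equally sized raw frames (e.g. 8-bit RGB). */
double frame_mse(const unsigned char *ref, const unsigned char *cmp, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = (double)ref[i] - (double)cmp[i];
        sum += d * d;
    }
    return sum / (double)n;
}

/* PSNR in decibels for 8-bit samples; higher means closer to the original.
   Identical frames give mse == 0, i.e. infinite PSNR, so guard for that. */
double frame_psnr(double mse)
{
    return 10.0 * log10((255.0 * 255.0) / mse);
}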

Images and Filters in OpenCL

Let's say I have an image called Test.jpg.
I just figured out how to bring an image into the project by the following line:
FILE *infile = fopen("Stonehenge.jpg", "rb");
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
I have never worked with images before, let alone OpenCL, so there is a lot that is going over my head.
I need further clarification on this part for my own understanding
Does this bmp image also need to be stored in an array in order to have a filter applied to it? I have seen a sliding window technique used a couple of times in other examples. Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this, it should help me understand a lot better.
I know this may seem like a basic question to most but I do not have a mentor on this subject in my workplace.
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
Not exactly. BMP is a very specific image serialization format, and actually quite a complicated one (implementing a BMP file parser that deals with all the corner cases correctly is rather difficult).
However, what you have there so far is not even file content data. What you have is a C stdio FILE handle, and that's it. So far you have not even checked whether the file could be opened. That's not really useful.
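At the very least, check the result of fopen before doing anything else; a sketch:

#include <stdio.h>

FILE *infile = fopen("Stonehenge.jpg", "rb");
if (!infile) {
    perror("fopen");   /* report why the file could not be opened */
    /* bail out here */
}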
JPEG is a lossy compressed image format. What you need to be able to "work" with it is a pixel value array. Either an array of component tuples, or a number of arrays, one for each component (depending on your application either format may perform better).
Now, implementing image format decoders is tedious. It's not exactly difficult, but it is also not something you can write down in a single evening. Of course the devil is in the details, and writing an implementation that is high quality, covers all corner cases and is fast is a major effort. That's why for every image (and video and audio) format out there you can usually find only a small number of encoder and decoder implementations. The de-facto standard codec libraries for JPEG are libjpeg and libjpeg-turbo. If your aim is to read just JPEG files, then these libraries would be the go-to implementation. However, you may also want to support PNG files, and then maybe EXR and so on, and then things become tedious again. So there are meta-libraries which wrap all those format-specific libraries and offer them through a universal API.
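For illustration, a minimal libjpeg decoding sketch (error handling stripped to the bare minimum, so this is not production quality; assumes 8-bit output):

#include <stdio.h>
#include <stdlib.h>
#include <jpeglib.h>

/* Decode an already opened JPEG file into a tightly packed pixel array
   (RGB for colour images, one byte per pixel for grayscale). */
unsigned char *decode_jpeg(FILE *infile, int *width, int *height, int *channels)
{
    struct jpeg_decompress_struct cinfo;
    struct jpeg_error_mgr jerr;

    cinfo.err = jpeg_std_error(&jerr);   /* default handler exits on error */
    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, infile);
    jpeg_read_header(&cinfo, TRUE);
    jpeg_start_decompress(&cinfo);

    *width    = cinfo.output_width;
    *height   = cinfo.output_height;
    *channels = cinfo.output_components;
    size_t row_stride = (size_t)cinfo.output_width * cinfo.output_components;
    unsigned char *pixels = malloc(row_stride * cinfo.output_height);

    while (cinfo.output_scanline < cinfo.output_height) {
        unsigned char *row = pixels + cinfo.output_scanline * row_stride;
        jpeg_read_scanlines(&cinfo, &row, 1);
    }

    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
    return pixels;
}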
In the OpenGL wiki there's a dedicated page on the current state of image loader libraries: https://www.opengl.org/wiki/Image_Libraries
Does this bmp image also need to be stored in an array in order to have a filter applied to it?
That actually depends on the kind of filter you want to apply. A simple threshold filter for example does not take a pixel's surroundings into account. If you were to perform scanline signal processing (e.g. when processing old analogue television signals) you may require only a single row of pixels at a time.
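For instance, a threshold filter needs nothing but the pixel itself; a sketch for an 8-bit grayscale buffer (names mine):

#include <stddef.h>

/* Per-pixel threshold: no neighbouring pixels are needed, so this can run
   on any slice of the image independently. */
void threshold(unsigned char *gray, size_t n_pixels, unsigned char cutoff)
{
    for (size_t i = 0; i < n_pixels; ++i)
        gray[i] = (gray[i] >= cutoff) ? 255 : 0;
}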
The universal solution of course is to keep the whole image in memory, but some pictures are so HUGE that no average computer's RAM can hold them. There are image processing libraries like VIPS that implement processing graphs which can operate on small subregions of an image at a time and can be executed independently.
Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
In case you mean "pixel array" instead of BMP (remember, BMP is a specific data structure), then no. Pixel component values may be of any scalar type and value range. And there are in fact colour spaces containing value regions that are mathematically necessary but do not denote actually sensible colours.
When it comes down to pixel data, an image is just an n-dimensional array of scalar component tuples where each component's value lies in a given range. It doesn't get more specific than that. Only when you introduce colour spaces (RGB, CMYK, YUV, CIE-Lab, CIE-XYZ, etc.) do you give those values specific colour meaning. And the choice of data type is more or less arbitrary. You can use 8 bits per component RGB (0..255), 10 bits (0..1023) or floating point (0.0 .. 1.0); the choice is yours.
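To illustrate (the type names are mine):

/* Two equally valid in-memory representations of an RGB pixel; the colour
   meaning comes from the colour space, not from the data type. */
typedef struct { unsigned char r, g, b; } rgb8_pixel;  /* components in 0..255 */
typedef struct { float r, g, b; } rgbf_pixel;          /* components in 0.0..1.0 */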

How to detect text region in image?

Given an image (i.e. newspaper, scanned newspaper, magazine etc), how do I detect the region containing text? I only need to know the region and remove it, don't need to do text recognition.
The purpose is that I want to remove these text areas to speed up my feature extraction procedure, as these text areas are meaningless for my application. Does anyone know how to do this?
BTW, it would be good if this could be done in Matlab!
Best!
You can use Stroke Width Transform (SWT) to highlight text regions.
Using my mex implementation posted here, you can:
img = imread('http://i.stack.imgur.com/Eyepc.jpg');
[swt swtcc] = SWT( img, 0, 10 );
Playing with internal parameters of the edge-map extraction and image filtering in SWT.m can help you tweak the resulting mask to your needs.
To get this result, I used these parameters for the edge map computation in SWT.m:
edgeMap = single( edge( img, 'canny', [0.05 0.25] ) );
Text detection in natural images is an active area of research in the computer vision community; you can refer to the ICDAR papers. But in your case I think it should be simple enough: as you have text from newspapers or magazines, it should be of a fixed size and horizontally oriented.
So you can apply a scanning window of a fixed size, say 32x32. Train it on the ICDAR 2003 training dataset for positive windows containing text. You can use a small feature set of colour and gradients and train an SVM which would give a positive or negative result for a window having text or not.
For reference, go to http://crypto.stanford.edu/~dwu4/ICDAR2011.pdf. For code, you can try the authors' homepages.
This example in the Computer Vision System Toolbox in Matlab shows how to detect text using MSER regions.
If your image is well binarized and you know the usual size of the text, you could use the HorizontalRunLengthSmoothing and VerticalRunLengthSmoothing algorithms. They are implemented in the open source library AForge.NET, but it should be easy to reimplement them in Matlab.
The intersection of the result images from these algorithms will give you a good indication that a region contains text; it is not perfect, but it is fast.
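If you do reimplement them, the horizontal pass is only a few lines. A sketch in C (a Matlab version would be analogous), assuming 0 = ink and 255 = background:

#include <stddef.h>

/* Horizontal run-length smoothing: fill white gaps shorter than `threshold`
   pixels between black runs, so letters on a line merge into solid blocks. */
void hrls(unsigned char *bin, int width, int height, int threshold)
{
    for (int y = 0; y < height; ++y) {
        unsigned char *row = bin + (size_t)y * width;
        int last_black = -(threshold + 1);   /* far enough left not to fill the edge */
        for (int x = 0; x < width; ++x) {
            if (row[x] == 0) {
                if (x - last_black <= threshold)
                    for (int i = last_black + 1; i < x; ++i)
                        row[i] = 0;          /* close the short white gap */
                last_black = x;
            }
        }
    }
}

The vertical pass is the same loop walking columns instead of rows.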

Advice for Object Detection on Embedded System with no non-standard libraries

I am looking for some advice on a good way to detect either square or circular objects in an image. I currently have a Canny edge algorithm running on the original greyscale image, and I can produce this output:
http://imgur.com/FAwowr1
Now I can see that there is a CubeSat in this picture, but what is a good, computationally efficient way for the program to see that as well? I have looked at the Hough transform, but that seems very computation-heavy. I have also looked at Harris corner detection, but I feel I would get too many false positives, for I am essentially looking to isolate pictures that contain said cube satellite.
Does anyone have any thoughts on some good algorithms to pursue? I am very limited on space, so I cannot use any large external libraries like OpenCV. (This is all in C, btw.)
Many Thanks!
I would look into what is called mathematical morphology.
Basically, you operate on binary images, so you must first find a clever way to threshold them; then you do operations such as erosion and dilation with a well-selected structuring element to extract areas of interest in your image.
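A sketch of the erosion step with a 3x3 structuring element on an already thresholded image (library-free C; names mine):

/* 3x3 binary erosion: a destination pixel stays set (255) only if every
   pixel in its 3x3 neighbourhood is set; dilation is the same test with
   "any" instead of "all". Out-of-bounds neighbours count as unset. */
void erode3x3(const unsigned char *src, unsigned char *dst, int w, int h)
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int keep = 1;
            for (int dy = -1; dy <= 1 && keep; ++dy)
                for (int dx = -1; dx <= 1 && keep; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h || src[ny * w + nx] == 0)
                        keep = 0;
                }
            dst[y * w + x] = keep ? 255 : 0;
        }
    }
}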
