How is frame data stored in libav? - c

I am trying to learn to use libav. I have followed the very first tutorial on dranger.com, but I got a little confused at one point.
// Write pixel data
for (y = 0; y < height; y++)
    fwrite(pFrame->data[0] + y * pFrame->linesize[0], 1, width * 3, pFile);
This code clearly works, but I don't quite understand why. In particular, I don't understand how the frame data in pFrame->data is stored, whether or not it depends on the format/codec in use, why pFrame->data and pFrame->linesize are always referenced at index 0, and why we are adding y to pFrame->data[0].
In the tutorial it says
We're going to be kind of sketchy on the PPM format itself; trust us, it works.
I am not sure if writing it to the ppm format is what is causing this process to seem so strange to me. Any clarification on why this code is the way it is and how libav stores frame data would be very helpful. I am not very familiar with media encoding/decoding in general, thus why I am trying to learn.

particularly I don't understand how the frame data in pFrame->data is stored, whether or not it depends on the format/codec in use
Yes, it depends on the pix_fmt value. Some formats are planar and others are packed.
why pFrame->data and pFrame->linesize are always referenced at index 0,
If you look at the struct, you will see that data is an array of pointers (a pointer to a pointer). So pFrame->data[0] is a pointer to the data in the first "plane". Some formats, like packed RGB, have a single plane, where all data is stored in one buffer. Other formats, like planar YUV, use a separate buffer for each plane, e.g. Y = pFrame->data[0], U = pFrame->data[1], V = pFrame->data[2]. Planar audio may use one plane per channel, etc.
and why we are adding y to pFrame->data[0].
Because the example is looping over an image line by line, top to bottom.
To get the pointer to the first pixel of any line, you multiply the linesize by the line number and add it to the plane's base pointer.
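To make the packed vs. planar distinction concrete, here is a minimal sketch, assuming a fully decoded AVFrame and an already-opened FILE * (the two pixel formats shown are just examples, not an exhaustive list):

#include <stdio.h>
#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>

/* Dump a decoded frame's raw pixel data, honoring linesize padding. */
static void dump_frame(const AVFrame *frame, FILE *out)
{
    if (frame->format == AV_PIX_FMT_RGB24) {
        /* Packed: one plane, 3 bytes per pixel, each row padded to linesize[0]. */
        for (int y = 0; y < frame->height; y++)
            fwrite(frame->data[0] + y * frame->linesize[0], 1, frame->width * 3, out);
    } else if (frame->format == AV_PIX_FMT_YUV420P) {
        /* Planar: Y in data[0], U in data[1], V in data[2];
         * the chroma planes are half width and half height. */
        for (int y = 0; y < frame->height; y++)
            fwrite(frame->data[0] + y * frame->linesize[0], 1, frame->width, out);
        for (int y = 0; y < frame->height / 2; y++)
            fwrite(frame->data[1] + y * frame->linesize[1], 1, frame->width / 2, out);
        for (int y = 0; y < frame->height / 2; y++)
            fwrite(frame->data[2] + y * frame->linesize[2], 1, frame->width / 2, out);
    }
}

The y * linesize term is what skips over any per-row padding the decoder may have added, which is why the tutorial's loop cannot simply fwrite() width*3*height bytes in one call.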


what is AV_SAMPLE_FMT_FLT

SwrContext *swr_ctx = swr_alloc_set_opts(NULL,
                                         AV_CH_LAYOUT_STEREO,
                                         AV_SAMPLE_FMT_FLT,
                                         sample_rate,
                                         pCodecParameters->channel_layout,
                                         pCodecParameters->format,
                                         pCodecParameters->sample_rate,
                                         0,
                                         NULL);
What exactly is AV_SAMPLE_FMT_FLT? I already read the docs, but I want to know what a float layout means in the context of audio, and how the binary audio data actually looks in that format.
It means the samples are interleaved in a single buffer, and each sample is a 32-bit float.
Buffer AFAIK structured like this:
[SAMPLE_CH0][SAMPLE_CH1]...[SAMPLE_CHn]
[SAMPLE_CH0][SAMPLE_CH1]...[SAMPLE_CHn]
And so on; the pattern repeats once for every sample.
I may be wrong though, you need to check it yourself.
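As a rough sketch of how you would walk such an interleaved buffer (assuming buf holds nb_samples frames of nb_channels 32-bit floats, as a conversion to AV_SAMPLE_FMT_FLT would produce; the planar variant AV_SAMPLE_FMT_FLTP would instead put each channel in its own buffer):

/* Peak absolute level per channel of an interleaved float buffer.
 * The sample for channel c of frame i sits at buf[i * nb_channels + c]. */
static void peak_levels(const float *buf, int nb_samples, int nb_channels, float *peaks)
{
    for (int c = 0; c < nb_channels; c++)
        peaks[c] = 0.0f;
    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < nb_channels; c++) {
            float v = buf[i * nb_channels + c];
            if (v < 0.0f)
                v = -v;
            if (v > peaks[c])
                peaks[c] = v;
        }
}

Float PCM values are normally in the -1.0 .. 1.0 range, with 0.0 meaning silence.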

Seeking some guidance on webcam picture display using GTK+ and Cairo in C

In this question I'm mostly seeking advice and guidance on the overall understanding of some concepts of drawing with GTK+ and Cairo in the C language (IMO the information on this topic is rather scarce, and my experience is really modest).
I'm coding some pet application which captures frames from webcam and displays them on a GTK window.
My app is working, but there are some points which I don't feel like grasped.
Overall process:
I've got a webcam frame as an array of bytes mmap()ed from the webcam device into my app's process memory. So when another frame is captured, what I have is a 640*480*3-byte array which is denoted as being in RGB24 format. After some searching, it looks like for the purpose of displaying it in a GTK window I need to create an object called a drawing area using gtk_drawing_area_new(), add a "draw" callback and do the "drawing" there, in the designated callback. According to Cairo, "drawing" is the process of applying a "source" to a "destination". I assume that I already have a source (my webcam's mmapped pixels), but it looks like I need to use some "source" that Cairo is able to understand. I found a candidate:
cairo_surface_t* surface = cairo_image_surface_create(CAIRO_FORMAT_RGB24, 640, 480);
As I see this call creates some Cairo acceptable object, which along the way allocates a buffer in my app's memory which I can get, using:
unsigned char* surface_data = cairo_image_surface_get_data(surface);
According to the docs this is a 640x480x4-byte buffer which, on little-endian archs, should be filled with BGRX-formatted pixel data (the fourth byte is unused for CAIRO_FORMAT_RGB24).
Then I should rearrange my original webcam pixels for EVERY frame captured, using this:
for (size_t idx_src = 0, idx_dst = 0; idx_src < 640*480*3; idx_dst += 4, idx_src += 3) {
    surface_data[idx_dst]     = image[idx_src + 2]; // B [3rd pos -> 1st pos]
    surface_data[idx_dst + 1] = image[idx_src + 1]; // G [no change]
    surface_data[idx_dst + 2] = image[idx_src];     // R [1st pos -> 3rd pos]
}
After this I should do "drawing" with:
cairo_set_source_surface(cr, surface, 0, 0);
cairo_paint(cr);
So questions:
Is this what is supposed to be done for the task at hand, or am I missing something completely here?
What confuses me is that I have to rearrange my original webcam pixels for EVERY frame captured (this presumably consumes some CPU time and could be a limiting factor for capturing in HD resolution at high frame rates). Is there some other way?
Let's suppose I somehow acquire pixels from the webcam in a Cairo-conforming format, e.g. 640x480x4 BGRA-formatted bytes. Is there a way to "wrap" this data in some Cairo-acceptable object to skip the pixel-rearranging part?
Any other thoughts I should consider?
Thanks for your attention.
For most of your questions: Cairo only supports some image formats. Since your data comes in another format, you will have to convert it. All this copying around will likely be too slow. To make this work with an acceptable speed, you would need some other approach. No, I do not have any helpful suggestions here.
An unhelpful one would be: Is there some example for this webcam that you could look at?
Let's suppose I somehow acquire pixels from the webcam in a Cairo-conforming format, e.g. 640x480x4 BGRA-formatted bytes. Is there a way to "wrap" this data in some Cairo-acceptable object to skip the pixel-rearranging part?
Yup. cairo_image_surface_create_for_data.
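A minimal sketch of that approach (wrap_frame is just an illustrative helper name; it assumes the capture buffer already matches the CAIRO_FORMAT_RGB24 memory layout, that each row is stride bytes long, and that the buffer outlives the surface):

#include <cairo.h>

/* Wrap an existing pixel buffer in a Cairo image surface without copying. */
cairo_surface_t *wrap_frame(unsigned char *pixels, int width, int height)
{
    int stride = cairo_format_stride_for_width(CAIRO_FORMAT_RGB24, width);
    return cairo_image_surface_create_for_data(pixels, CAIRO_FORMAT_RGB24,
                                               width, height, stride);
}

If you keep writing new frames into the same buffer, call cairo_surface_mark_dirty() on the surface before drawing it again, so Cairo drops any cached copy of the pixels.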

C create a txt.ppm file from a safed textfile

[picture 1: the target PPM image]
[picture 3: the input text file]
I tried to write a C program that can create a PPM file like the one in picture 1 from a text file like the one in picture 3. It would be great if someone could help. I am a new programmer and have been trying to write this code for six hours. I tried to scan in the data from the text file, put it into an array, and make a PPM out of that, but my code is unusable :/.
The path forward is to split the task into smaller sub-tasks, solve and test each one of them separately, and only after they all work, combine them into a single program.
Because the OP has not posted any code, I will not post any directly useful code either. If OP is truly blocked due to not getting any forward progress even after trying, this should actually be of practical use. If OP is just looking for someone to do their homework, this should annoy them immensely. Both work for me. :)
The first sub-task is to read the input into an array. There are several examples on the web, and related questions here. You'll want to put this in a separate function, so that merging it into the complete project later on is easier. Since you are a beginner programmer, you could go for a function like
int read_numbers(double data[], int max, FILE *in);
so that the caller declares the maximum number of data points (and the stream to read from), and the function returns the number of data points read, or negative if an error occurs. Your main() for testing that function should be trivial, say
#include <stdio.h>
#include <stdlib.h>

#define MAX_NUMBERS 500

int read_numbers(double data[], int max, FILE *in);

int main(void)
{
    double x[MAX_NUMBERS];
    int i, n;

    n = read_numbers(x, MAX_NUMBERS, stdin);
    if (n <= 0) {
        fprintf(stderr, "Error reading numbers from standard input.\n");
        return EXIT_FAILURE;
    }

    printf("Read %d numbers:\n", n);
    for (i = 0; i < n; i++)
        printf("%.6f\n", x[i]);

    return EXIT_SUCCESS;
}
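One possible shape for read_numbers(), as a sketch only (it assumes whitespace-separated numbers and leaves all refinements to you):

/* Read up to 'max' whitespace-separated numbers from 'in' into data[].
 * Returns the count read, or -1 if no numbers could be read at all. */
int read_numbers(double data[], int max, FILE *in)
{
    int n = 0;
    while (n < max && fscanf(in, "%lf", &data[n]) == 1)
        n++;
    return (n > 0) ? n : -1;
}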
The second sub-task is to generate a PPM image. PPM is actually one of a group of closely related image formats, also called the Netpbm formats. The example image is a bitmap image -- black and white only; no colors, no shades of gray --, so the PBM format (a sibling of PPM in the same family) is suitable for this.
I suspect it is easiest to attack this sub-task by using a two-dimensional array, sized for the largest image you can generate (i.e. unsigned char bitmap[HEIGHT_MAX][WIDTH_MAX];), but note that you can also just use a part of it. (You could also generate the image on the fly, without any array, but that is much more error prone, and not as universally applicable as using an array to store the image is.)
You'll probably need to decide the width of the bitmap based on the maximum data value, and the height of the bitmap based on the number of data points.
For testing, just fill the array with some simple patterns, or maybe just a diagonal line from top left corner to bottom right corner.
Then, consider writing a function that sets a rectangular area to a given value (0 or 1). Based on the image, you'll also need a function that draws vertical dotted lines, changing (exclusive-OR'ing) the state of each bit. For example,
#define WIDTH_MAX 1024
#define HEIGHT_MAX 768
unsigned char bitmap[HEIGHT_MAX][WIDTH_MAX];
int width = 0;
int height = 0;
void write_pbm(FILE *out); /* Or perhaps (const char *filename)? */
void fill_rect(int x, int y, int w, int h, unsigned char v);
void vline_xor(int x, int y, int h);
At this point, you should have realized that the write_pbm() function, the one that saves the PBM image, should be written and tested first. Then, you can use the fill_rect() function to not just draw filled rectangles, but also to initialize the image -- the portion of the array you are going to use -- to a background color (0 or 1). All of the three functions above you can, and should, do in separate sub-steps, so that at any point you can rely on that the code you've written earlier is correct and tested. That way, you only need to look at bugs in the code you have written since the last successful testing! It might sound like a slow way to progress, but it actually turns out to be the fastest way to get code working. You very quickly start to love the confidence the testing gives you, and the fact that you only need to focus and worry about one thing at a time.
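For the saving part, the plain (ASCII) PBM format is easy to emit by hand: a "P1" magic line, the width and height, then one '0' or '1' per pixel, where 1 means black. A minimal sketch of write_pbm(), using the globals declared above (one possible shape, not the only one):

void write_pbm(FILE *out)
{
    fprintf(out, "P1\n%d %d\n", width, height);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            fputc(bitmap[y][x] ? '1' : '0', out);
            /* Keep the output lines short, as the plain PBM format suggests. */
            fputc((x % 32 == 31 || x == width - 1) ? '\n' : ' ', out);
        }
}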
The third sub-task is to find out a way to draw the rectangles and vertical dotted lines, for various inputs.
(I cheated a bit, above, and included the fill_rect() and vline_xor() functions in the previous sub-task, because I could tell those are needed to draw the example picture.)
The vertical dotted lines are easiest to draw afterwards, using a function that draws a vertical line, leaving every other pixel untouched, but exclusive-ors every other pixel. (Hint: for (y = y0; y < y0 + height; y += 2) bitmap[y][x] ^= 1;)
That leaves the filled rectangles. Their height is obviously constant, and they have a bit of vertical space in between, and they start at the left edge; so, the only thing, is to calculate how wide each rectangle needs to be. (And, how wide the entire bitmap should be, and how tall, as previously mentioned; the largest data value, and the number of data values, dictates those.)
Rather than writing one C source file, and adding to it every step, I recommend you write a separate program for each of the sub-steps. That is, after every time you get a sub-part working, you save that as a separate file, and keep it for reference -- a back-up, if you will. If you lose your way completely, or decide on another path for solving some problem, you only need to revert back to your last reference version, rather than start from scratch.
That kind of work-flow is known to be reliable, efficient, and scalable: it does not matter how large the project is, the above approach works. Of course, there are tools to help us do this in an easy, structured manner (with comments on each commit -- completed unit of code, if you will); and git is a popular, free, but very powerful one. Just do a web search for git for beginners if you are interested in it.
If you bothered to read all through the above wall of text, and grasped the workflow it describes, you won't have much trouble learning how to use tools like git to help you with it. You'll also love how much typing tools like make (and the Makefiles containing the make recipes) save you, and how easy it is to create and maintain projects that not only work, but also look professional. Yet, don't try to grasp all of it at once: take it one small step at a time, and verify you have a solid footing before going onwards. That way, when you stumble, you won't hurt yourself; you'll just learn.
Have fun!

Images and Filters in OpenCL

Let's say I have an image called Test.jpg.
I just figured out how to bring an image into the project by the following line:
FILE *infile = fopen("Stonehenge.jpg", "rb");
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
I have never worked with images before, let alone OpenCL, so there is a lot that is going over my head.
I need further clarification on this part for my own understanding
Does this bmp image also need to be stored in an array in order to have a filter applied to it? I have seen a sliding-window technique used a couple of times in other examples. Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this topic, that should help me understand it a lot better.
I know this may seem like a basic question to most but I do not have a mentor on this subject in my workplace.
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
Not exactly. BMP is a very specific image serialization format, and actually quite a complicated one (implementing a BMP file parser that deals with all the corner cases correctly is rather difficult).
However, what you have there so far is not even file content data. What you have is a C stdio FILE handle, and that's it. So far you did not even check whether the file could be opened. That's not really useful.
JPEG is a lossy, compressed image format. What you need in order to be able to "work" with it is a pixel value array: either an array of component tuples, or a number of arrays, one for each component (depending on your application, either layout may perform better).
Now, implementing image format decoders is tedious. It's not exactly difficult, but also not something you can write down in a single evening. Of course the devil is in the details, and writing an implementation that is high quality, covers all the corner cases and is fast is a major effort. That's why for every image (and video and audio) format out there you can usually find only a small number of encoder and decoder implementations. The de-facto standard codec libraries for JPEG are libjpeg and libjpeg-turbo. If your aim is to read just JPEG files, then these libraries would be the go-to implementation. However, you may also want to support PNG files, and then maybe EXR, and so on, and then things become tedious again. So there are meta-libraries which wrap all those format-specific libraries and offer them through a universal API.
In the OpenGL wiki there's a dedicated page on the current state of image loader libraries: https://www.opengl.org/wiki/Image_Libraries
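As a rough illustration of what "decoding a JPEG into a pixel value array" looks like with libjpeg (a sketch only; real code should install a setjmp-based error handler and check the malloc, both omitted here for brevity):

#include <stdio.h>
#include <stdlib.h>
#include <jpeglib.h>

/* Decode a JPEG stream into a tightly packed RGB (or grayscale) byte array. */
unsigned char *decode_jpeg(FILE *infile, int *width, int *height, int *components)
{
    struct jpeg_decompress_struct cinfo;
    struct jpeg_error_mgr jerr;
    unsigned char *pixels;
    size_t row_stride;

    cinfo.err = jpeg_std_error(&jerr);   /* default handler exits on error */
    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, infile);
    jpeg_read_header(&cinfo, TRUE);
    jpeg_start_decompress(&cinfo);

    *width = cinfo.output_width;
    *height = cinfo.output_height;
    *components = cinfo.output_components;   /* 3 for RGB, 1 for grayscale */
    row_stride = (size_t)cinfo.output_width * cinfo.output_components;
    pixels = malloc(row_stride * cinfo.output_height);

    while (cinfo.output_scanline < cinfo.output_height) {
        JSAMPROW row = pixels + cinfo.output_scanline * row_stride;
        jpeg_read_scanlines(&cinfo, &row, 1);
    }

    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
    return pixels;
}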
Does this bmp image also need to be stored in an array in order to have a filter applied to it?
That actually depends on the kind of filter you want to apply. A simple threshold filter for example does not take a pixel's surroundings into account. If you were to perform scanline signal processing (e.g. when processing old analogue television signals) you may require only a single row of pixels at a time.
The universal solution of course is to keep the whole image in memory, but some pictures are so HUGE that no average computer's RAM can hold them. There are image-processing libraries like VIPS that implement processing graphs which can operate on small subregions of an image at a time and be executed independently.
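To make the "no surroundings needed" point concrete, a simple threshold filter over a single-component (grayscale, 0..255) pixel array is just this kind of sketch:

#include <stddef.h>

/* Force each 8-bit grayscale pixel to pure white or pure black;
 * no neighbouring pixels are ever consulted. */
void threshold(unsigned char *pixels, size_t count, unsigned char cutoff)
{
    for (size_t i = 0; i < count; i++)
        pixels[i] = (pixels[i] >= cutoff) ? 255 : 0;
}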
Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
In case you mean "pixel array" instead of BMP (remember, BMP is a specific data structure), then no. Pixel component values may be of any scalar type and value range. And there are in fact colour spaces in which there are value regions which are mathematically necessary but do not denote actually sensible colours.
When it comes down to pixel data, an image is just an n-dimensional array of scalar component tuples, where each component's value lies in a given range. It doesn't get more specific than that. Only when you introduce colour spaces (RGB, CMYK, YUV, CIE-Lab, CIE-XYZ, etc.) do you give those values a specific colour meaning. And the choice of data type is more or less arbitrary: you can use 8 bits per component RGB (0..255), 10 bits (0..1023) or floating point (0.0 .. 1.0); the choice is yours.
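For example, the very same colour can be carried by different component types; a trivial conversion sketch between two common choices:

/* The same RGB colour expressed with two different component types. */
struct rgb8 { unsigned char r, g, b; };  /* 8 bits per component, 0..255 */
struct rgbf { float r, g, b; };          /* floating point, 0.0 .. 1.0   */

static struct rgbf rgb8_to_float(struct rgb8 p)
{
    struct rgbf q = { p.r / 255.0f, p.g / 255.0f, p.b / 255.0f };
    return q;
}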

Image-processing basics

I have to do some image processing, but I don't know where to start. My problem is as follows:
I have a 2D fiber image (attached with this post), in which the fiber edges are denoted by white and the inside of the fiber is black. I want to choose any black pixel inside the fiber and travel from it along the length of the fiber. This will involve comparing the contrast with the surrounding pixels and then travelling in the desired direction. My main aim is to find the length of the fiber.
So can someone please tell me at least where to start? I have a rough algorithm in my mind for how to approach the problem, but I don't even know which software/library to use.
Regards
Adi
EDIT1 - Instead of OpenCV, I started using MATLAB since I found it much easier. I applied the Hough transform and then the houghpeaks function with the maximum number of peaks set to 100, so that all fibers are included. After that I got the following image. How do I find the length now?
EDIT2 - I found a research article on how to calculate the length using the Hough transform, but I'm not able to implement it in MATLAB. Someone please help.
If your images are all as clean as the one you posted, it's quite an easy problem.
The very first technique I'd try is using a Hough Transform to estimate the line parameters, and there is a good implementation of the algorithm in OpenCV. After you have them, you can estimate their length any way you want, based on whatever other constraints you have.
The problem is two-fold as I see it:
1) Locate the start and end points from your starting position.
2) Determine the length between the start and end points.
Since I don't know your input data, I assume it's pixel data with a 0..1 value on each pixel representing its "whiteness".
In order to find the end points, I would use some kind of walker/AI that tries to walk in different directions, knowing its original position and the last traversed direction, then continuing along that route until the "forward arc" is all white. This assumes the fiber is somewhat straight (is it?).
Once you have the start and end points, you can feed them into an A* pathfinding algorithm, giving black pixels a low cost and white pixels a very high one. Then find the shortest path between the start and end points; its length is the length of the fiber.
It's kind of hard to give more detail since I have no idea what techniques you are going to use, and I don't have any example input data.
Assumptions:
- This image can be considered a binary image where there are only 0s (black) and 1s (white).
- All the fibers are straight and their starting and ending points are on the borders.
- We can come up with a limit for the thickness of a fiber (the thickness of the white lines).
Under these assumptions:
Start scanning the image border (start from wherever you want, in whichever direction you want... just be consistent) until you encounter the first white pixel. At this point your program will know that this is definitely a starting point. Knowing this, you gather all the white pixels until you reach a certain limit (a threshold). The idea here is that, if there is a fiber, you will get the angle between the fiber and the border the starting point is on; of course, the more pixels you gather (the further in you go), the surer you will be in the end. This is the trickiest part. After somehow ending up with a line, you need to calculate the angle (basic trigonometry). Since you know the starting point, the width/height of the image and the angle (or its cos/sin values), you will have the exact coordinates of the end point. Be advised: the exactness here is not quite what you might expect, because we may (in fact, will) have calculation errors in the cos/sin values. So you need to hold the threshold as long as possible. Your end point will therefore not really be a point, but rather an area indicating where the end point probably lies. The rest is just simple maths.
Obviously you can add more detail to this method, like checking both of the white lines that make up the fiber and deciding which one is longer, or allowing some margin of error since those lines will not be perfectly straight; this is where a conceptual thickness comes into play, etc.
Programming:
C# has some nice stuff and is easy for you to use... I'll put some code here...
newBitmap = new Bitmap(openFileDialog1.FileName);
for (int x = 0; x < newBitmap.Width; x++)
{
    for (int y = 0; y < newBitmap.Height; y++)
    {
        Color originalColor = newBitmap.GetPixel(x, y); // gets the pixel value...
        // things go here...
    }
}
You get the image path from an OpenFileDialog and load it into a Bitmap. Inside the nested for loop this code scans the image left to right, column by column; however, you can change this...
Since you know C++ and C, I would recommend OpenCV. It is open source, so if you don't trust anyone (like me), you won't have a problem ;). Also, if you want to use C# like @VictorS mentioned, I would use EmguCV, which is the C# equivalent of OpenCV. Tutorials for OpenCV are included, and tutorials for EmguCV can be found on their website. Hope this helps!
Download and install the latest version of 3D Slicer.
Load your data and go to the package > EM Segmenter without Atlas.
Define your anatomical tree with 2 different labels: the black area, which is what you are after, and the white edges.
Then choose the whole 2D image as your ROI and click on segment.
Here is the result; I labeled the edges in green and the black area in white.
You can modify your tree and change the structures you define.
You can give more samples to your segmentation to make it more accurate.
