Getting RGB values for each pixel from a raw image in C

I want to read the RGB values for each pixel from a raw image. Can someone tell me how to achieve this? Thanks for the help!
The format of my raw image is .CR2, which comes from a camera.

Assuming the image is w * h pixels, and stored in true "packed" RGB format with no alpha component, each pixel will require three bytes.
In memory, the first line of the image might be represented in awesome ASCII graphics like this:
R0 G0 B0 R1 G1 B1 R2 G2 B2 ... R(w-1) G(w-1) B(w-1)
Here, each Rn Gn and Bn represents a single byte, giving the red, green or blue component of pixel n of that scanline. Note that the order of the bytes might be different for different "raw" formats; there's no agreed-upon world standard. Different environments (graphics cards, cameras, ...) do it differently for whatever reason, you simply have to know the layout.
Reading out a pixel can then be done by this function:
typedef unsigned char byte;

void get_pixel(const byte *image, unsigned int w,
               unsigned int x,
               unsigned int y,
               byte *red, byte *green, byte *blue)
{
    /* Compute pointer to first (red) byte of the desired pixel. */
    const byte *pixel = image + w * y * 3 + 3 * x;

    /* Copy R, G and B to outputs. */
    *red   = pixel[0];
    *green = pixel[1];
    *blue  = pixel[2];
}
Notice how the height of the image is not needed for this to work, and how the function is free from bounds-checking. A production-quality function might be more armor-plated.
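For illustration only, a slightly more armor-plated variant might look like the sketch below; the get_pixel_checked name, the extra h (height) parameter, and the int return value are my additions, not part of the function above:

/* Sketch: a more defensive variant that refuses to read out of bounds.
   Returns 1 on success, 0 on failure. Same packed RGB layout as above. */
int get_pixel_checked(const byte *image, unsigned int w, unsigned int h,
                      unsigned int x, unsigned int y,
                      byte *red, byte *green, byte *blue)
{
    if (image == 0 || x >= w || y >= h)
        return 0;

    const byte *pixel = image + w * y * 3 + 3 * x;
    *red   = pixel[0];
    *green = pixel[1];
    *blue  = pixel[2];
    return 1;
}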
Update: If you're worried this approach will be too slow, you can of course just loop over the pixels instead:
unsigned int x, y;
const byte *pixel = /* ... assumed to be pointing at the data as per above */

for(y = 0; y < h; ++y)
{
    for(x = 0; x < w; ++x, pixel += 3)
    {
        const byte red = pixel[0], green = pixel[1], blue = pixel[2];
        /* Do something with the current pixel. */
    }
}

None of the methods posted so far are likely to work with a camera "raw" file. The file formats for raw files are proprietary to each manufacturer, and may contain exposure data, calibration constants, and white balance information, in addition to the pixel data, which will likely be in a packed format where each pixel can take up more than one byte, but less than two.
I'm sure there are open-source raw file converter programs out there that you could consult to find out the algorithms to use, but I don't know of any off the top of my head.
Just thought of an additional complication. The raw file does not store RGB values for each pixel. Each pixel records only one color; the other two colors have to be interpolated from neighboring pixels. You'll definitely be better off finding a program or library that works with your camera.
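Just to give a feel for what that interpolation means, here is a minimal sketch (my own illustration, not what any real raw converter does) that estimates the missing green value at a red or blue site of an RGGB Bayer mosaic by averaging its four green neighbours:

/* Sketch only: 'raw' holds one sample per pixel (the Bayer mosaic).
   Estimate green at a non-green site by averaging the four green
   neighbours. Border pixels are ignored for brevity. */
unsigned int green_at(const unsigned short *raw, unsigned int w,
                      unsigned int x, unsigned int y)
{
    return (raw[(y - 1) * w + x] + raw[(y + 1) * w + x] +
            raw[y * w + (x - 1)] + raw[y * w + (x + 1)]) / 4;
}

Real demosaicing algorithms (edge-aware interpolation and so on) are considerably more elaborate, which is another reason to use an existing library.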

A RAW image is an uncompressed format, so you just have to point to where your pixel is (skipping any possible header, then adding the size of a pixel times the number of columns times the row number, plus the column number), and then read that binary data, giving a meaningful interpretation to the layout of the data (with masks and shifts, you know).
That's the general procedure; for your particular format you'll have to check the details.
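As a rough sketch of that addressing arithmetic (the header size and bytes-per-pixel here are placeholders; a real .CR2 is considerably more involved, as the previous answer notes):

/* Sketch only: byte offset of pixel (x, y) in a simple headered raw dump,
   assuming a fixed bytes_per_pixel and no row padding. */
size_t pixel_offset(size_t header_size, size_t bytes_per_pixel,
                    size_t width, size_t x, size_t y)
{
    return header_size + (y * width + x) * bytes_per_pixel;
}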

Related

Resizing single 1 pixel wide bitmap strip - faster than this example? (for Raycaster algorithm)

I am attaching the picture example and my current code.
My question is: can I make resizing/stretching/interpolating a single vertical bitmap strip faster than by using another for-loop?
The current code already looks fairly optimal: for the current strip size on the screen, iterate from the start height to the end height, get the corresponding pixel from the texture and add it to the output buffer, then add the step to get the next pixel.
here is an essential part of my code:
inline void RC_Raycast_Walls()
{
    // casting a ray for every width pixel
    for (u_int16 rx = 0; rx < RC_render_width_i; ++rx)
    {
        // ..
        // traversing through the grid map
        // finding the intersection point
        // calculating the height of the strip on the screen
        // ..

        // step size for the next pixel in the texture
        float32 tex_step_y = RC_texture_size_f / (float32)pp_wall_height;
        // starting texture coordinate
        float32 tex_y = (float32)(pp_wall_start - RC_player_pitch - player_z_div_wall_distance - RC_render_height_d2_i + pp_wall_height_d2) * tex_step_y;

        // drawing walls into buffer <- ENTERING ANOTHER LOOP only for SINGLE STRIP
        for (int16 ry = pp_wall_start; ry < pp_wall_end; ++ry)
        {
            // cast the texture coordinate to integer, and mask with (texHeight - 1) in case of overflow
            u_int16 tex_y_safe = (u_int16)tex_y & RC_texture_size_m1_i;
            tex_y += tex_step_y;

            u_int32 texture_current_pixel = texture_pixels[RC_texture_size_i * tex_y_safe + tex_x];
            u_int32 output_pixel_index = rx + ry * RC_render_width_i;

            output_buffer[output_pixel_index] =
                (((texture_current_pixel >> 16 & 0x0ff) * intensity_value) >> 8) << 16 |
                (((texture_current_pixel >>  8 & 0x0ff) * intensity_value) >> 8) <<  8 |
                (((texture_current_pixel        & 0x0ff) * intensity_value) >> 8);
        }
    }
}
Maybe with some bigger stepping, like 2 instead of 1, every second line would be left empty, but adding another line of code to fill that empty space results in the same performance. I would not like to have doubled pixels, and interpolating between two of them would, I think, take even longer.
Thank You in Advance!
P.S. It's based on the Lodev raycaster algorithm:
https://lodev.org/cgtutor/raycasting.html
You do not need floats at all
You can use DDA on integers without multiplication and division. These days floating point is not as slow as it used to be, but your conversions between float and int might be ... See these QAs (both use this kind of DDA; a sketch of the idea follows the links):
DDA line with subpixel
DDA based rendering routines
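For illustration, here is a sketch of the integer-DDA stepping applied to the vertical texture walk from the question (variable names are borrowed from the question; the initial pitch/height offset of tex_y is omitted for brevity):

// Sketch: replace the float tex_y accumulator with an integer DDA.
// No multiplication or division inside the per-pixel loop.
int32 tex_y_int = 0;              // current texture row (start offset omitted)
int32 acc       = 0;              // DDA error accumulator
for (int16 ry = pp_wall_start; ry < pp_wall_end; ++ry)
{
    // ... fetch texel at (tex_x, tex_y_int & RC_texture_size_m1_i) and shade ...
    acc += RC_texture_size_i;     // advance by texture height per screen pixel
    while (acc >= pp_wall_height) // step the texture row when the error overflows
    {
        acc -= pp_wall_height;
        ++tex_y_int;
    }
}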
use LUT for applying Intensity
Looks like each color channel c is 8 bit and intensity i is fixed point in range <0,1> so you can precompute every combination into something like this:
u_int8 LUT[256][256];
for (int c = 0; c < 256; c++)
    for (int i = 0; i < 256; i++)
        LUT[c][i] = (u_int8)((c * i) >> 8);
use pointers or union to access RGB channels instead of bit operations
My favorite is union:
union color
{
    u_int32 dd;    // 1x 32bit RGBA
    u_int16 dw[2]; // 2x 16bit
    u_int8  db[4]; // 4x 8bit (individual channels)
};
texture coordinates
Again it looks like you are doing too many operations. For example [RC_texture_size_i * tex_y_safe + tex_x]: if your texture size is 128, you can bit-shift left by 7 bits instead of multiplying. Yes, on modern CPUs this is not an issue, however the whole thing can be replaced by a simple LUT. You can remember a pointer to each horizontal scanline of the texture and rewrite the access as [tex_y_safe][tex_x], as sketched below.
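A minimal sketch of that scanline-pointer table, reusing the names from the question and assuming a 128x128 texture stored row by row:

// Sketch: build the row-pointer table once per texture.
u_int32 *tex_row[128];                          // 128 = assumed texture height
for (u_int16 y = 0; y < RC_texture_size_i; ++y)
    tex_row[y] = &texture_pixels[(u_int32)y * RC_texture_size_i];

// ... later, inside the strip loop:
u_int32 texture_current_pixel = tex_row[tex_y_safe][tex_x];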
So based on #2,#3 rewrite your color computation to this:
color c;
c.dd=texture_current_pixel;
c.db[0]=LUT[c.db[0]][intensity_value];
c.db[1]=LUT[c.db[1]][intensity_value];
c.db[2]=LUT[c.db[2]][intensity_value];
output_buffer[output_pixel_index]=c.dd;
As you can see, it's just a bunch of memory transfers instead of multiple bit-shift, bit-mask and bit-or operations. You can also use a pointer of type color instead of texture_current_pixel and output_buffer[output_pixel_index] to speed things up a little more.
And finally see this:
Ray Casting with different height size
Which is my version of the raycast using VCL.
Now, before changing anything, measure the performance you have now by timing how long it takes to render. Then after each change in the code, measure whether it actually improves performance or not. If it doesn't, keep the old version of the code, as predicting what is fast on today's platforms is sometimes hard.
Also, for resizing, much better visual results are obtained by using mipmaps ... that usually eliminates the weird noise while moving.

How to convert 8 bits Grayscale image to NV12 (limited range) color space using IPP

Video encoders like Intel® Media SDK do not accept 8 bits Grayscale image as input format.
8 bits Grayscale format applies one byte per pixel in range [0, 255].
8 bits YUV format in the context of the question applies YCbCr (BT.601 or BT.709).
Although there is a full range YUV standard, the commonly used format is "limited range" YUV, where range of Y is [16, 235] and range of U,V is [16, 240].
NV12 format is the common input format in this case.
NV12 format is YUV 4:2:0 format ordered in memory with a Y plane first, followed by packed chroma samples in interleaved UV plane:
YYYYYY
YYYYYY
UVUVUV
The Grayscale image will be referred to as the "I plane":
IIIIII
IIIIII
Setting the UV plane is simple: Set all U,V elements to 128 value.
But what about the Y plane?
In case of full range YUV, we can simply put "I plane" as Y plane (i.e Y = I).
In case of "limited" YUV format, a transformation is required:
Setting R=G=B in the conversion formula results in: Y = round(I*0.859 + 16).
What is the efficient way to do the above conversion using IPP?
I am adding an answer to my own question.
I hope to see a better answer...
I found a solution using two IPP functions:
ippsMulC_8u_Sfs - Multiplies each element of a vector by a constant value.
ippsAddC_8u_ISfs - Adds a constant value to each element of a vector.
I selected functions that use fixed-point math, for better performance.
Fixed point implementation of 0.859 scaling is performed by expanding, scaling and shifting. Example: b = (a*scale + (1<<7)) >> 8; [When scale = (0.859)*2^8].
val parameter to ippsMulC_8u_Sfs set to round(0.859*2^8) = 220.
scaleFactor parameter to ippsMulC_8u_Sfs set to 8 (divide the scaled result by 2^8).
Code sample:
void GrayscaleToNV12(const unsigned char I[],
                     int image_width,
                     int image_height,
                     unsigned char J[])
{
    IppStatus ipp_status;

    const int image_size = image_width*image_height;
    unsigned char *UV = &J[image_size]; //In NV12 format, UV plane starts below Y.

    const Ipp8u expanded_scaling = (Ipp8u)(0.859 * 256.0 + 0.5);

    //J[x] = (expanded_scaling * I[x] + 128u) >> 8u;
    ipp_status = ippsMulC_8u_Sfs(I,                //const Ipp8u* pSrc,
                                 expanded_scaling, //Ipp8u val,
                                 J,                //Ipp8u* pDst,
                                 image_size,       //int len,
                                 8);               //int scaleFactor);

    //Check ipp_status, and handle errors...

    //J[x] += 16;
    //ippsAddC_8u_ISfs is deprecated, I used it to keep the code simple.
    ipp_status = ippsAddC_8u_ISfs(16,         //Ipp8u val,
                                  J,          //Ipp8u* pSrcDst,
                                  image_size, //int len,
                                  0);         //int scaleFactor);

    //Check ipp_status, and handle errors...

    //2. Fill all UV plane with 128 value - "gray color".
    memset(UV, 128, image_width*image_height/2);
}
Out of topic note:
There is a way to mark a video stream as "full range" (where Y range is [0, 255] instead of [16, 235], and U,V range is also [0, 255]).
Using the "full range" standard allows placing I in place of Y (i.e Y = I).
Marking the stream as "full range" using Intel Media SDK, is possible (but not well documented).
Marking an H.264 stream as "full range" requires adding a pointer to the mfxExtBuffer **ExtParam list (in structure mfxVideoParam):
A pointer to structure of type mfxExtVideoSignalInfo should be added with the following values:
typedef struct {
    mfxExtBuffer Header;                   //MFX_EXTBUFF_VIDEO_SIGNAL_INFO and sizeof(mfxExtVideoSignalInfo)
    mfxU16 VideoFormat;                    //Most likely 5 ("Unspecified video format")
    mfxU16 VideoFullRange;                 //1 (video_full_range_flag is equal to 1)
    mfxU16 ColourDescriptionPresent;       //0 (description_present_flag equal to 0)
    mfxU16 ColourPrimaries;                //0 (no effect when ColourDescriptionPresent = 0)
    mfxU16 TransferCharacteristics;        //0 (no effect when ColourDescriptionPresent = 0)
    mfxU16 MatrixCoefficients;             //0 (no effect when ColourDescriptionPresent = 0)
} mfxExtVideoSignalInfo;
VideoFullRange = 1 is the only parameter relevant to setting "full range" video, but we must fill the entire structure.
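A hedged sketch of wiring this into the encoder initialization (structure and field names as they appear in the Media SDK headers; the rest of the mfxVideoParam setup is omitted):

// Sketch: attach mfxExtVideoSignalInfo to mfxVideoParam before MFXVideoENCODE_Init.
mfxExtVideoSignalInfo signal_info = { 0 };
signal_info.Header.BufferId = MFX_EXTBUFF_VIDEO_SIGNAL_INFO;
signal_info.Header.BufferSz = sizeof(mfxExtVideoSignalInfo);
signal_info.VideoFormat = 5;      // "Unspecified video format"
signal_info.VideoFullRange = 1;   // mark the stream as full range

mfxExtBuffer *ext_buffers[] = { (mfxExtBuffer *)&signal_info };
// video_param is the mfxVideoParam already filled with the usual encoding settings:
video_param.ExtParam = ext_buffers;
video_param.NumExtParam = 1;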

Extract luminance data using ffmpeg libavfilter, specifically PIX_FMT_YUV420P type

This pertains to ffmpeg 0.7 (yes I know it's old, but data access should be similar).
I am writing a libavfilter to extract the luminance data from each frame. In the draw_slice() function I have access to the AVFilterLink structure, which in turn gives me access to the AVFilterBufferRef structure that has the uint8_t *data[] pointers. With the PIX_FMT_YUV420P type, I think data[0], data[1], data[2] refer to the Y, U and V channels respectively.
My question is, with the pointer to data[0] (luminance plane), how do I interpret the data? The pixfmt.h header file states:
PIX_FMT_YUV420P, ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
does that mean I have to interpret the luminance plane data every 2 bytes? Also, what exactly is the datatype for the values pointed to by the pointer - int, float, etc?
Thanks in advance
Yes, data[0] is luminance. It is 8 bits (one byte) per pixel, but you must watch the line stride.
So to look at every pixel in a loop:
uint8_t pixval;

for(int y = 0 ; y < height; ++y )
{
    for(int x = 0 ; x < width; ++x )
    {
        pixval = data[0][x+(y*stride)];
    }
}
(obviously, you could optimize this)
The U and V planes are one quarter the resolution of the Y plane (half the height and half the width), so each chroma byte covers 4 pixels (2 wide by 2 tall).
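For example, a sketch of fetching the chroma samples that cover pixel (x, y), assuming data[1]/data[2] are the Cb/Cr planes and chroma_stride is their per-plane line stride (which is generally not the same as the luma stride):

/* Sketch: the chroma planes are subsampled by 2 in both directions,
   so the pixel coordinates are halved before indexing. */
uint8_t cb = data[1][(x / 2) + (y / 2) * chroma_stride];
uint8_t cr = data[2][(x / 2) + (y / 2) * chroma_stride];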

Encode rgb to yuv420p using libav

I'm trying to convert a vector of RGB image data (derived from a .png image) to YUV420p format using libav.
In the libav sample code the following is used to create a dummy image:
/* prepare a dummy image */
static void fill_yuv_image(AVFrame *pict, int frame_index, int width, int height)
{
    int x, y, i;

    i = frame_index;

    /* Y */
    for(y=0;y<height;y++) {
        for(x=0;x<width;x++) {
            pict->data[0][y * pict->linesize[0] + x] = x + y + i * 3;
        }
    }

    /* Cb and Cr */
    for(y=0;y<height/2;y++) {
        for(x=0;x<width/2;x++) {
            pict->data[1][y * pict->linesize[1] + x] = 128 + y + i * 2;
            pict->data[2][y * pict->linesize[2] + x] = 64 + x + i * 5;
        }
    }
}
I'm not clear about a few things here:
Firstly, do I need to rearrange the RGB data in the input vector so that it's suitable for encoding as YUV420p?
Secondly, I understand that there's a Y value for every pixel and that the Cb and Cr values are used for four (2x2) pixels. What I don't understand is how the RGB data gets "reduced" to the Cb and Cr values - is there an example of how to do this anywhere?
I'm not entirely sure what you're trying to achieve exactly, so I'll just directly answer your questions as best I can (feel free to follow up with clarifying comments):
1) You will be transforming the RGB data to YUV which will involve some rearrangement. The packed RGB data is fine where it is. You don't really need to adjust it. Actually, it would probably be better to leave it packed the way it is for cache locality reasons.
2) As you already understand, YUV 4:2:0 encodes a Y sample for each pixel but each 2x2 block shares a Cb and a Cr value. However, there is also YUV 4:4:4 data. This is where each pixel gets its own Y, Cb, and Cr sample. A simple strategy for converting RGB -> YUV 4:2:0 is to convert RGB -> YUV 4:4:4 and then average (arithmetic mean) each block of 2x2 Cb samples. There are other algorithms (like filters that involve more of the surrounding samples), but this should work if you're just experimenting with how this stuff works.
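A minimal sketch of that approach, assuming packed 24-bit RGB input, even width/height, and full-range BT.601-style constants for illustration (the function name rgb_to_yuv420p and its arguments are my own, not libav API; pict is an AVFrame laid out as in the sample above):

#include <stdint.h>

/* Sketch only: fill pict->data[0..2] from a packed RGB24 buffer.
   Chroma for each 2x2 block is the arithmetic mean of the four
   4:4:4 chroma samples, as described above. */
static void rgb_to_yuv420p(const uint8_t *rgb, AVFrame *pict,
                           int width, int height)
{
    int x, y, dx, dy;

    /* Y: one sample per pixel. */
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            const uint8_t *p = rgb + (y * width + x) * 3;
            float r = p[0], g = p[1], b = p[2];
            pict->data[0][y * pict->linesize[0] + x] =
                (uint8_t)(0.299f * r + 0.587f * g + 0.114f * b);
        }
    }

    /* Cb, Cr: one sample per 2x2 block, averaged from the block's pixels. */
    for (y = 0; y < height; y += 2) {
        for (x = 0; x < width; x += 2) {
            float cb = 0.0f, cr = 0.0f;
            for (dy = 0; dy < 2; dy++) {
                for (dx = 0; dx < 2; dx++) {
                    const uint8_t *p = rgb + ((y + dy) * width + (x + dx)) * 3;
                    float r = p[0], g = p[1], b = p[2];
                    cb += -0.169f * r - 0.331f * g + 0.500f * b + 128.0f;
                    cr +=  0.500f * r - 0.419f * g - 0.081f * b + 128.0f;
                }
            }
            pict->data[1][(y / 2) * pict->linesize[1] + x / 2] = (uint8_t)(cb / 4.0f);
            pict->data[2][(y / 2) * pict->linesize[2] + x / 2] = (uint8_t)(cr / 4.0f);
        }
    }
}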
Another strategy for experimentation (and speed) is to only compute the Y plane and hold the Cb and Cr planes constant at 128. That will result in a grayscale image.
For real work, you would probably want to leverage the built-in conversion facilities that libav has to offer.
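For example, a hedged sketch using libswscale (the pixel-format enum names carry an AV_ prefix in newer libav/ffmpeg versions; rgb, width, height and pict are the same hypothetical names as in the sketch above):

#include <libswscale/swscale.h>

/* Sketch: let libswscale convert packed RGB24 into the frame's YUV420P planes. */
struct SwsContext *sws = sws_getContext(width, height, PIX_FMT_RGB24,
                                        width, height, PIX_FMT_YUV420P,
                                        SWS_BILINEAR, NULL, NULL, NULL);
const uint8_t *src_data[1]     = { rgb };         /* packed RGB buffer         */
int            src_linesize[1] = { 3 * width };   /* bytes per RGB source row  */
sws_scale(sws, src_data, src_linesize, 0, height,
          pict->data, pict->linesize);
sws_freeContext(sws);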

FFT - Applying window on PCM data

I'm currently trying to reproduce the getSpectrum function of the FMOD audio library. This function reads the PCM data of the currently playing buffer, applies a window to this data and applies an FFT to get the spectrum.
It returns an array of floats where each float is between 0 and 1 dB (10.0f * (float)log10(val) * 2.0f).
I'm not sure that what I'm doing is what I should be doing, so I'll explain it:
First, I get the PCM data in a 4096-byte buffer. According to the documentation, PCM data is composed of samples, each of which is a left-right pair of data.
In my case I'm working with 16-bit samples, like in the image above. So, if I want to work only with the left channel, I save the left PCM data in a short array by doing:
short *data = malloc(4096);
FMOD_Sound_ReadData(sound, (void *)data, 4096, &read);
So if a sample is 4 bytes, I have 1024 samples, i.e. 1024 shorts representing the left channel and 1024 shorts representing the right channel.
In order to perform the FFT, I need to have a float array and apply a window (Hanning) on my data:
float hanningWindow(short in, size_t i, size_t s)
{
    return in*0.5f*(1.0f-cos(2.0f*M_PI*(float)(i)/(float)(s-1.0f)));
}
where in is the input, i is the position in the array and s is the size of the array (1024).
To get only the left channel:
float *input = malloc(1024*sizeof(float));
for (i = 0; i < 1024; i++)
input[i] = hanningWindow(data[i*2], i, 1024);
Then I perform the FFT thanks to kiss_fft (from real to complex). I get a kiss_fft_cpx *output (array of complex values) of size 1024/2+1 = 513.
I calculate the amplitude of each frequency with :
kiss_fft_cpx c = output[i];
float amp = sqrt(c.r*c.r + c.i*c.i);
and convert it to dB:
amp = 10.0f * (float)log10(amp) * 2.0f;
amp is not between 0 and 1. I don't know where I have to normalize my data (on the PCM data or at the end). Also I'm not sure of the way I am applying my window on the PCM data.
Here is the result I get from a 0 to 20kHz song compared to the result of the getSpectrum function. (for a rectangular window)
[Images: my result vs. the getSpectrum result]
How can I achieve the same result?
You're a little confused about log (dB) scales - you don't get a range of 0 - 1 dB, you get a range of typically 96 dB for 16 bit audio, where the upper and lower end are somewhat arbitrary, e.g. 0 to -96 dB, or 96 dB to 0 dB, or any other range you like, depending on various factors. You probably just need to shift and scale your spectrogram plotting by a suitable offset and factor to account for this.
(Note: the range of 96 dB comes from the formula 20 * log10(2^16), where 16 is the number of bits.)
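If you want the 0..1 range the question describes, one possibility (a sketch only; the exact reference level depends on how you scale the FFT output) is to map that dB range linearly:

/* Sketch: map a dB value into [0, 1] for plotting, assuming a useful range of
   -96 dB .. 0 dB for 16-bit audio and that amp has been normalized so that
   full scale corresponds to 1.0. */
float db = 20.0f * log10f(amp);            /* 0 dB at full scale          */
float normalized = (db + 96.0f) / 96.0f;   /* -96 dB -> 0.0, 0 dB -> 1.0  */
if (normalized < 0.0f) normalized = 0.0f;
if (normalized > 1.0f) normalized = 1.0f;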
