In OpenGL ES 2.0, I read the framebuffer and get RGBA frames, which I want to convert to YUV format.
I tried the conversion below, which ignores the alpha component, but the YUV frame it generates is distorted.
Can anyone help me, please?
yuvdata[i * j *1]= (0.257)*memory[i*j*1] + (0.504)*memory[i*j*2]+(0.098)*memory[i*j*3]+16;
yuvdata[i * j *3]= (0.439)*memory[i*j*1] - (0.368)*memory[i*j*2] -(0.071)*memory[i*j*3]+128;
yuvdata[i * j *2]= -(0.148)*memory[i*j*1] - (0.291)*memory[i*j*2] +(0.439)*memory[i*j*3]+128;
Normalization did not help.
memory stores the RGBA data,
and yuvdata is the free space that receives the YUV output.
I used an RGB-to-YUV conversion that ignores the alpha component.
Friends, this problem is resolved. This article is awesome: yuv2rgb. Thanks to Viktor Latypov and Mārtiņš Možeiko.
"YUV" is not a complete format.
From this wikipedia article you can get the conversions of YUV411,YUV422,YUV420p to YUV444. Combine the inverse of these transforms with your RGB conversion and you'll get the result.
The thing you are missing: one RGB triple may produce a number (not one) of YUV components this way.
YUV444 3 bytes per pixel
YUV422 4 bytes per 2 pixels
YUV411 6 bytes per 4 pixels
YUV420p 6 bytes per 4 pixels, reordered
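To make the table above concrete, here is a small sketch that turns those per-pixel figures into frame buffer sizes. The function names are made up for illustration, and it assumes the width and height are multiples of the subsampling block (2 for 4:2:2 and 4:2:0, 4 for 4:1:1).

```c
#include <stddef.h>

/* Bytes needed for one frame, derived from the figures listed above.
   Assumption: w and h are multiples of the subsampling block size. */
size_t yuv444_size(size_t w, size_t h)  { return w * h * 3; }     /* 3 bytes per pixel    */
size_t yuv422_size(size_t w, size_t h)  { return w * h * 2; }     /* 4 bytes per 2 pixels */
size_t yuv411_size(size_t w, size_t h)  { return w * h * 3 / 2; } /* 6 bytes per 4 pixels */
size_t yuv420p_size(size_t w, size_t h) { return w * h * 3 / 2; } /* 6 bytes per 4 pixels */
```

So the output buffer for a 4:2:0 frame is half the size of the 4:4:4 one, which is why a loop that writes three bytes per pixel cannot produce it directly.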
First, YUV422
Y'UV422 to RGB888 conversion
Input: Read 4 bytes of Y'UV (u, y1, v, y2 )
Output: Writes 6 bytes of RGB (R, G, B, R, G, B)
Then YUV411
Y'UV411 to RGB888 conversion
Input: Read 6 bytes of Y'UV
Output: Writes 12 bytes of RGB
// Extract YUV components
u = yuv[0];
y1 = yuv[1];
y2 = yuv[2];
v = yuv[3];
y3 = yuv[4];
y4 = yuv[5];
rgb1 = Y'UV444toRGB888(y1, u, v);
rgb2 = Y'UV444toRGB888(y2, u, v);
rgb3 = Y'UV444toRGB888(y3, u, v);
rgb4 = Y'UV444toRGB888(y4, u, v);
Similar with 420p, but the YUV values are distributed over the rectangle there - see the Wikipedia's diagram and image for that.
Basically, you should fetch 4 RGB pixels, convert each one of them to YUV (using your hopefully valid 444 converter) and then store the YUV[4] array in a tricky way shown at the wikipedia.
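For the simpler YUV422 case, the "inverse" direction could look roughly like the sketch below. It assumes the limited-range coefficients from the question, the (u, y1, v, y2) byte order listed above, and plain averaging of the two chroma values; the names to_byte, rgb_to_yuv444 and pack_yuv422 are made up for this example.

```c
#include <stdint.h>

/* Round to nearest and clamp to [0, 255]. */
static uint8_t to_byte(double v)
{
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* One RGB pixel to one Y'UV444 triple (coefficients taken from the question). */
static void rgb_to_yuv444(const uint8_t rgb[3], uint8_t *y, uint8_t *u, uint8_t *v)
{
    double r = rgb[0], g = rgb[1], b = rgb[2];
    *y = to_byte( 0.257 * r + 0.504 * g + 0.098 * b + 16.0);
    *u = to_byte(-0.148 * r - 0.291 * g + 0.439 * b + 128.0);
    *v = to_byte( 0.439 * r - 0.368 * g - 0.071 * b + 128.0);
}

/* Two neighbouring RGB pixels (6 bytes) to 4 bytes of Y'UV422: u, y1, v, y2.
   The chroma shared by both pixels is the rounded average of their U and V. */
void pack_yuv422(const uint8_t rgb[6], uint8_t out[4])
{
    uint8_t y1, u1, v1, y2, u2, v2;
    rgb_to_yuv444(rgb, &y1, &u1, &v1);
    rgb_to_yuv444(rgb + 3, &y2, &u2, &v2);
    out[0] = (uint8_t)((u1 + u2 + 1) / 2);
    out[1] = y1;
    out[2] = (uint8_t)((v1 + v2 + 1) / 2);
    out[3] = y2;
}
```

The 411 and 420p cases follow the same pattern with four pixels per chroma pair; only the order in which the bytes are stored changes.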
Are you sure you are accessing the memory array correctly? Shouldn't it be something like this:
R=memory[(i*width + j)*4+0]
G=memory[(i*width + j)*4+1]
B=memory[(i*width + j)*4+2]
or swap 0 with 2, if you have BGRA.
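Putting that indexing together with the coefficients from the question, a per-pixel loop could be sketched as follows. It assumes a tightly packed RGBA buffer (R, G, B at byte offsets 0, 1, 2 and alpha at offset 3) and writes packed YUV444, three bytes per pixel; the function name and the rounding helper are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Round to nearest and clamp to [0, 255]. */
static uint8_t round_clamp(double v)
{
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* memory:  width*height RGBA pixels, 4 bytes each (alpha is skipped).
   yuvdata: width*height*3 bytes, receives Y, U, V for every pixel. */
void rgba_to_yuv444(const uint8_t *memory, uint8_t *yuvdata,
                    size_t width, size_t height)
{
    for (size_t i = 0; i < height; ++i) {
        for (size_t j = 0; j < width; ++j) {
            const uint8_t *px = memory + (i * width + j) * 4;
            uint8_t *out = yuvdata + (i * width + j) * 3;
            double r = px[0], g = px[1], b = px[2];

            out[0] = round_clamp( 0.257 * r + 0.504 * g + 0.098 * b + 16.0);  /* Y */
            out[1] = round_clamp(-0.148 * r - 0.291 * g + 0.439 * b + 128.0); /* U */
            out[2] = round_clamp( 0.439 * r - 0.368 * g - 0.071 * b + 128.0); /* V */
        }
    }
}
```

Note that the index is (i*width + j) scaled by the pixel size, not i*j: with i*j every pixel in row 0 or column 0 maps to the same byte, which would explain a distorted frame.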
I've been reverse engineering a program and recently came across a function that is intended to create a sort of translucent-looking color to be used for text selections. It does this by converting RGB to YUV, alters the Y (luma?) component, then converts back to RGB.
uint32_t CalcSelectionColor(uint32_t bgr)
{
double r,g,b;
double y,u,v;
r = (bgr >> 0) & 0xFF;
g = (bgr >> 8) & 0xFF;
b = (bgr >> 16) & 0xFF;
/* RGB to YUV */
y = 0.299*r + 0.587*g + 0.114*b;
u = (b-y) * 0.565 * 0.5;
v = (r-y) * 0.713 * 0.5;
/* lower brightness? */
y = 255.0 - y;
/* YUV to RGB */
r = y + 1.403*v;
g = y - 0.344*u - 0.714*v;
b = y + 1.77*u;
return ((uint8_t)(b) << 16) | ((uint8_t)(g) << 8) | ((uint8_t)(r));
}
As someone with very limited knowledge of computer graphics, I'd just like a bit more detail of what it does between the conversions, and the actually intended effect in a broader sense. Is this a common approach of adjusting brightness of a color or something? If I pass in 0x00FF00, the result I get is 0x1E9D1E
The formulas used in this code are similar to Julien transformation from RGB to YUV and back:
Transformation from RGB to YUV:
Y = 0.299R + 0.587G + 0.114B
U'= (B-Y)*0.565
V'= (R-Y)*0.713
Transformation from YUV to RGB:
R = Y + 1.403V'
G = Y - 0.344U' - 0.714V'
B = Y + 1.770U'
However, the formulas in your code are a bit different. While the back transformation is the same, the forward transform has an additional multiplier 0.5 for both U and V components. There is also a trivial manipulation with the brightness component
y = 255.0 - y
which simply inverts the brightness. So, what happens here?
If you use the normal Julien RGB->YUV transform, you get a representation of your color as a combination of brightness Y and two color tone components U and V, which locate the color as a point on the UV plane.
However, in your code you also multiply both U and V components by 0.5. This means that on the UV plane you move from any given color two times closer to the origin (0, 0). For example, if the initial color was A with UV coordinates (-0.4, 0.3), you get a new color B with UV coordinates (-0.2, 0.15). Similarly, the color C (0.2, -0.3) becomes color D (0.1, -0.15).
After that you invert the brightness of the color, making dark colors bright and bright colors dark. This is the effect of your code.
It's not terribly common, but it's a very good approach. Commonly used models like HSL/HSV don't represent intensity correctly and have some weird piecewise-linear stuff with hue/color going on. YUV is a really good colorspace, representing intensity along one axis and chroma (hue/color) in a perpendicular plane.
Normally modifying Y without also adjusting (at least clamping) U and V is somewhat dubious, because near the extremes (Y=0 black, Y=full white) U and V have limited range (no range at all at the endpoints). Otherwise applying them will take you outside of the RGB cube and result in bogus clipped results when you go back to RGB. But here the trick is very clever. The code is inverting Y while keeping chroma fixed, so the incoming range limits on U and V near black will automatically ensure they're roughly correct in the output, and vice versa.
As Alex noted, the code here is also halving the chroma values, reducing color saturation. This was probably to avoid the above mentioned clipping issue, but it's not needed. But maybe it's part of the intended visual effect too.
So, TL;DR: the effect is inverting intensity/luma and halving saturation.
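To check this against the numbers in the question, here is a sketch of the same function with explicit clamping added. The clamping is my addition, not part of the reverse-engineered code: converting an out-of-range double straight to uint8_t is undefined behavior in C, so the helper below pins the value to [0, 255] first and then truncates like the original cast. The names are made up.

```c
#include <stdint.h>

/* Clamp to [0, 255], then truncate like the original (uint8_t) cast. */
static uint8_t clamp_to_byte(double v)
{
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)v;
}

uint32_t CalcSelectionColorClamped(uint32_t bgr)
{
    double r = (bgr >> 0) & 0xFF;
    double g = (bgr >> 8) & 0xFF;
    double b = (bgr >> 16) & 0xFF;

    /* RGB to YUV, with chroma halved */
    double y = 0.299 * r + 0.587 * g + 0.114 * b;
    double u = (b - y) * 0.565 * 0.5;
    double v = (r - y) * 0.713 * 0.5;

    /* invert luma */
    y = 255.0 - y;

    /* YUV to RGB */
    r = y + 1.403 * v;
    g = y - 0.344 * u - 0.714 * v;
    b = y + 1.77 * u;

    return ((uint32_t)clamp_to_byte(b) << 16) |
           ((uint32_t)clamp_to_byte(g) << 8) |
            (uint32_t)clamp_to_byte(r);
}
```

For 0x00FF00 the luma is about 149.7, which inverts to about 105.3, and the halved chroma pulls the result towards gray: R and B come out near 30.4 and G near 158.0, giving 0x1E9D1E as reported in the question. Black maps to white and white to black, since both have zero chroma.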
Video encoders such as Intel® Media SDK do not accept an 8-bit grayscale image as an input format.
The 8-bit grayscale format uses one byte per pixel, in the range [0, 255].
The 8-bit YUV format in the context of this question is YCbCr (BT.601 or BT.709).
Although there is a full-range YUV standard, the commonly used format is "limited range" YUV, where the range of Y is [16, 235] and the range of U and V is [16, 240].
NV12 format is the common input format in this case.
NV12 format is YUV 4:2:0 format ordered in memory with a Y plane first, followed by packed chroma samples in interleaved UV plane:
YYYYYY
YYYYYY
UVUVUV
The grayscale image will be referred to as the "I plane":
IIIIII
IIIIII
Setting the UV plane is simple: Set all U,V elements to 128 value.
But what about the Y plane?
In case of full range YUV, we can simply put "I plane" as Y plane (i.e Y = I).
In case of "limited" YUV format, a transformation is required:
Setting R=G=B in the conversion formula gives: Y = round(I*0.859 + 16).
What is the efficient way to do the above conversion using IPP?
I am adding an answer to my own question.
I hope to see a better answer...
I found a solution using two IPP functions:
ippsMulC_8u_Sfs - Multiplies each element of a vector by a constant value.
ippsAddC_8u_ISfs - Adds a constant value to each element of a vector.
I selected functions that use fixed-point math, for better performance.
Fixed point implementation of 0.859 scaling is performed by expanding, scaling and shifting. Example: b = (a*scale + (1<<7)) >> 8; [When scale = (0.859)*2^8].
val parameter to ippsMulC_8u_Sfs set to round(0.859*2^8) = 220.
scaleFactor parameter to ippsMulC_8u_Sfs set to 8 (divide the scaled result by 2^8).
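For reference, the same fixed-point arithmetic written in plain C for a single sample might look like the sketch below. It only mirrors the formula b = (a*scale + (1<<7)) >> 8 with scale = 220 followed by the +16 offset; I have not verified that the IPP functions round identically in every case, and the function name is made up.

```c
#include <stdint.h>

/* Limited-range luma from a full-range gray sample:
   Y = round(I * 0.859) + 16, computed as (I*220 + 128) >> 8, then + 16. */
uint8_t gray_to_limited_y(uint8_t i)
{
    const unsigned scale = 220u; /* round(0.859 * 2^8) */
    return (uint8_t)(((i * scale + 128u) >> 8) + 16u);
}
```

The end points land where they should: I = 0 gives 16 and I = 255 gives 235, so the result never leaves the limited range and the addition cannot overflow a byte.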
Code sample:
void GrayscaleToNV12(const unsigned char I[],
int image_width,
int image_height,
unsigned char J[])
{
IppStatus ipp_status;
const int image_size = image_width*image_height;
unsigned char *UV = &J[image_size]; //In NV12 format, UV plane starts below Y.
const Ipp8u expanded_scaling = (Ipp8u)(0.859 * 256.0 + 0.5);
//J[x] = (expanded_scaling * I[x] + 128u) >> 8u;
ipp_status = ippsMulC_8u_Sfs(I, //const Ipp8u* pSrc,
expanded_scaling, //Ipp8u val,
J, //Ipp8u* pDst,
image_size, //int len,
8); //int scaleFactor);
//Check ipp_status, and handle errors...
//J[x] += 16;
//ippsAddC_8u_ISfs is deprecated, I used it to keep the code simple.
ipp_status = ippsAddC_8u_ISfs(16, //Ipp8u val,
J, //Ipp8u* pSrcDst,
image_size, //int len,
0); //int scaleFactor);
//Check ipp_status, and handle errors...
//Fill the whole UV plane with the value 128 - "gray color".
memset(UV, 128, image_width*image_height/2);
}
Off-topic note:
There is a way to mark a video stream as "full range" (where the Y range is [0, 255] instead of [16, 235], and the U,V range is also [0, 255]).
Using the "full range" standard allows placing I in place of Y (i.e. Y = I).
Marking the stream as "full range" using Intel Media SDK is possible (but not well documented).
Marking an H.264 stream as "full range" requires adding a pointer to the mfxExtBuffer **ExtParam list (in the structure mfxVideoParam):
A pointer to a structure of type mfxExtVideoSignalInfo should be added with the following values:
typedef struct {
mfxExtBuffer Header; //MFX_EXTBUFF_VIDEO_SIGNAL_INFO and sizeof(mfxExtVideoSignalInfo)
mfxU16 VideoFormat; //Most likely 5 ("Unspecified video format")
mfxU16 VideoFullRange; //1 (video_full_range_flag is equal to 1)
mfxU16 ColourDescriptionPresent; //0 (colour_description_present_flag equal to 0)
mfxU16 ColourPrimaries; //0 (no effect when ColourDescriptionPresent = 0)
mfxU16 TransferCharacteristics; //0 (no effect when ColourDescriptionPresent = 0)
mfxU16 MatrixCoefficients; //0 (no effect when ColourDescriptionPresent = 0)
} mfxExtVideoSignalInfo;
VideoFullRange = 1 is the only relevant parameter of setting "full range" video, but we must fill the entire structure.
This pertains to ffmpeg 0.7 (yes I know it's old, but data access should be similar).
I am writing a libavfilter filter to extract the luminance data from each frame. In the draw_slice() function I have access to the AVFilterLink structure, which in turn gives me access to the AVFilterBufferRef structure, which has uint8_t *data[] pointers. With the PIX_FMT_YUV420P type, I think data[0], data[1], data[2] refer to the Y, U and V channels respectively.
My question is, with the pointer to data[0] (luminance plane), how do I interpret the data? The pixfmt.h header file states:
PIX_FMT_YUV420P, ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
Does that mean I have to interpret the luminance plane data every 2 bytes? Also, what exactly is the datatype of the values pointed to by the pointer: int, float, etc.?
Thanks in advance.
Yes, data[0] is luminance. It is 8 bits (one byte) per pixel, but you must watch the line stride.
So to look at every pixel in a loop:
uint8_t pixval;
for(int y = 0 ; y < height; ++y )
{
for(int x = 0 ; x < width; ++x )
{
pixval = data[0][x+(y*stride)];
}
}
(obviously, you could optimize this)
The U and V planes are one quarter the resolution of the Y plane (half the height and half the width). So each U or V byte covers 4 pixels (2 wide, 2 tall).
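Fetching all three components for one pixel could then be sketched like this. It takes the plane pointers and per-plane strides as plain arrays rather than any particular libav structure, and the type and function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical holder for the three components of one pixel. */
typedef struct {
    uint8_t y, u, v;
} yuv_sample;

/* data[0..2]:     Y, U, V planes of a planar 4:2:0 frame.
   linesize[0..2]: stride in bytes of each plane (may exceed the visible width).
   Each chroma sample is shared by a 2x2 block, hence the x/2 and y/2. */
yuv_sample yuv420p_at(const uint8_t *const data[3], const int linesize[3],
                      int x, int y)
{
    yuv_sample s;
    s.y = data[0][y * linesize[0] + x];
    s.u = data[1][(y / 2) * linesize[1] + x / 2];
    s.v = data[2][(y / 2) * linesize[2] + x / 2];
    return s;
}
```

Each plane has its own stride, so the chroma index must use the chroma stride and not the luma stride divided by two.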
I'm trying to convert a vector of RGB image data (derived from a .png image) to YUV420p format using libav.
In the libav sample code the following is used to create a dummy image:
/* prepare a dummy image */
static void fill_yuv_image(AVFrame *pict, int frame_index, int width, int height)
{
int x, y, i;
i = frame_index;
/* Y */
for(y=0;y<height;y++) {
for(x=0;x<width;x++) {
pict->data[0][y * pict->linesize[0] + x] = x + y + i * 3;
}
}
/* Cb and Cr */
for(y=0;y<height/2;y++) {
for(x=0;x<width/2;x++) {
pict->data[1][y * pict->linesize[1] + x] = 128 + y + i * 2;
pict->data[2][y * pict->linesize[2] + x] = 64 + x + i * 5;
}
}
}
I'm not clear about a few things here:
Firstly, do I need to rearrange the RGB data in the input vector so that it's suitable for encoding as YUV420p?
Secondly, I understand that there's a Y value for every pixel and that the Cb and Cr values are used for four (2x2) pixels. What I don't understand is how the RGB data gets "reduced" to the Cb and Cr values - is there an example of how to do this anywhere?
I'm not entirely sure what you're trying to achieve exactly, so I'll just directly answer your questions as best I can (feel free to follow up with clarifying comments):
1) You will be transforming the RGB data to YUV which will involve some rearrangement. The packed RGB data is fine where it is. You don't really need to adjust it. Actually, it would probably be better to leave it packed the way it is for cache locality reasons.
2) As you already understand, YUV 4:2:0 encodes a Y sample for each pixel, but each 2x2 block shares a Cb and a Cr value. However, there is also YUV 4:4:4 data, where each pixel gets its own Y, Cb, and Cr sample. A simple strategy for converting RGB -> YUV 4:2:0 is to convert RGB -> YUV 4:4:4 and then average (arithmetic mean) each 2x2 block of Cb samples, and likewise each 2x2 block of Cr samples. There are other algorithms (like filters that involve more of the surrounding samples), but this should work if you're just experimenting with how this stuff works.
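A minimal sketch of that averaging strategy is shown below. The assumptions are mine: the input is tightly packed RGB24, the width and height are even, the output planes are tightly packed (no stride padding, unlike a real AVFrame), and the coefficients are the common BT.601 limited-range ones. The function and helper names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Round to nearest and clamp to [0, 255]. */
static uint8_t round_u8(double v)
{
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* rgb:      width*height*3 bytes, packed R, G, B.
   y_plane:  width*height bytes.
   cb_plane, cr_plane: (width/2)*(height/2) bytes each. */
void rgb24_to_yuv420p(const uint8_t *rgb, size_t width, size_t height,
                      uint8_t *y_plane, uint8_t *cb_plane, uint8_t *cr_plane)
{
    for (size_t by = 0; by < height; by += 2) {
        for (size_t bx = 0; bx < width; bx += 2) {
            double cb_sum = 0.0, cr_sum = 0.0;

            /* Visit the four pixels of this 2x2 block. */
            for (size_t dy = 0; dy < 2; ++dy) {
                for (size_t dx = 0; dx < 2; ++dx) {
                    size_t x = bx + dx, y = by + dy;
                    const uint8_t *px = rgb + (y * width + x) * 3;
                    double r = px[0], g = px[1], b = px[2];

                    /* Every pixel keeps its own Y sample. */
                    y_plane[y * width + x] =
                        round_u8(0.257 * r + 0.504 * g + 0.098 * b + 16.0);
                    cb_sum += -0.148 * r - 0.291 * g + 0.439 * b + 128.0;
                    cr_sum +=  0.439 * r - 0.368 * g - 0.071 * b + 128.0;
                }
            }

            /* One averaged Cb and Cr sample per block. */
            size_t c = (by / 2) * (width / 2) + bx / 2;
            cb_plane[c] = round_u8(cb_sum / 4.0);
            cr_plane[c] = round_u8(cr_sum / 4.0);
        }
    }
}
```

With libav the three planes would be pict->data[0..2], indexed with pict->linesize[] as in fill_yuv_image above instead of the tight packing assumed here.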
Another strategy for experimentation (and speed) is to only compute the Y plane and hold the Cb and Cr planes constant at 128. That will result in a grayscale image.
For real work, you would probably want to leverage the built-in conversion facilities that libav has to offer.
I want to read the RGB values for each pixel from a raw image. Can someone tell me how to achieve this? Thanks for the help!
The format of my raw image is .CR2, which comes from a camera.
Assuming the image is w * h pixels, and stored in true "packed" RGB format with no alpha component, each pixel will require three bytes.
In memory, the first line of the image might be represented in awesome ASCII graphics like this:
R0 G0 B0 R1 G1 B1 R2 G2 B2 ... R(w-1) G(w-1) B(w-1)
Here, each Rn Gn and Bn represents a single byte, giving the red, green or blue component of pixel n of that scanline. Note that the order of the bytes might be different for different "raw" formats; there's no agreed-upon world standard. Different environments (graphics cards, cameras, ...) do it differently for whatever reason, you simply have to know the layout.
Reading out a pixel can then be done by this function:
typedef unsigned char byte;
void get_pixel(const byte *image, unsigned int w,
unsigned int x,
unsigned int y,
byte *red, byte *green, byte *blue)
{
/* Compute pointer to first (red) byte of the desired pixel. */
const byte * pixel = image + w * y * 3 + 3 * x;
/* Copy R, G and B to outputs. */
*red = pixel[0];
*green = pixel[1];
*blue = pixel[2];
}
Notice how the height of the image is not needed for this to work, and how the function is free from bounds-checking. A production-quality function might be more armor-plated.
Update: If you're worried this approach will be too slow, you can of course just loop over the pixels instead:
unsigned int x, y;
const byte *pixel = /* ... assumed to be pointing at the data as per above */
for(y = 0; y < h; ++y)
{
for(x = 0; x < w; ++x, pixel += 3)
{
const byte red = pixel[0], green = pixel[1], blue = pixel[2];
/* Do something with the current pixel. */
}
}
None of the methods posted so far are likely to work with a camera "raw" file. The file formats for raw files are proprietary to each manufacturer, and may contain exposure data, calibration constants, and white balance information in addition to the pixel data, which will likely be in a packed format where each pixel takes up more than one byte, but less than two.
I'm sure there are open-source raw file converter programs out there that you could consult to find out the algorithms to use, but I don't know of any off the top of my head.
Just thought of an additional complication. The raw file does not store RGB values for each pixel. Each pixel records only one color. The other two colors have to be interpolated from neighboring pixels. You'll definitely be better off finding a program or library that works with your camera.
A RAW image is an uncompressed format, so you just have to find where your pixel is: skip any header, then add the pixel size times (the row number times the number of columns, plus the column number). Then read the binary data there and interpret it according to the layout of the format (with masks and shifts, you know).
That's the general procedure, for your current format you'll have to check the details.