I am trying to use OpenGL to capture the back buffer of my computer's screen, and then H.264 encode the buffer using FFmpeg's libavcodec library. The issue I'm having is that I would like to encode the video in AV_PIX_FMT_YUV420P, but the back buffer capture function provided by OpenGL, glReadPixels(), only supports formats like GL_RGB. As you can see below, I try to use FFmpeg's sws_scale() function to convert from RGB to YUV, but the following code crashes at the sws_scale() line. Any ideas on how I can encode the OpenGL back buffer?
int width = 1280, height = 720;
BYTE* pixels = (BYTE *) malloc(sizeof(BYTE));
glReadPixels(0, 720, width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels);
AVCodec *codec;
AVCodecContext *context;
struct SwsContext *sws;
AVPacket packet;
AVFrame *frame;
codec = avcodec_find_encoder(AV_CODEC_ID_H264);
context = avcodec_alloc_context3(encoder->codec);
context->dct_algo = FF_DCT_FASTINT;
context->bit_rate = 400000;
context->width = width;
context->height = height;
context->time_base.num = 1;
context->time_base.den = 30;
context->gop_size = 1;
context->max_b_frames = 1;
context->pix_fmt = AV_PIX_FMT_YUV420P;
avcodec_open2(context, codec, NULL);
int frame_size = avpicture_get_size(AV_PIX_FMT_YUV420P, out_width, out_height);
encoder->frame_buffer = malloc(frame_size);
avpicture_fill((AVPicture *) encoder->frame, (uint8_t *) encoder->frame_buffer, AV_PIX_FMT_YUV420P, out_width, out_height);
sws = sws_getContext(in_width, in_height, AV_PIX_FMT_RGB32, out_width, out_height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, 0, 0, 0);
uint8_t *in_data[1] = {(uint8_t *) pixels};
int in_linesize[1] = {width * 4};
sws_scale(encoder->sws, in_data, in_linesize, 0, encoder->in_height, encoder->frame->data, encoder->frame->linesize);
int success;
avcodec_encode_video2(context, &packet, frame, &success);

Your pixels buffer is too small; you malloc only one BYTE instead of width*height*4 bytes:
BYTE* pixels = (BYTE *) malloc(width*height*4);
Your glReadPixels call is also incorrect:
Passing y=720 causes it to read outside the window. Remember that the OpenGL coordinate system has its origin at the bottom-left corner, with the y-axis pointing upwards.
AV_PIX_FMT_RGB32 expects four bytes per pixel, whereas GL_RGB writes three bytes per pixel, so you need GL_RGBA or GL_BGRA.
Of the two, I'm pretty sure that it should be GL_BGRA: AV_PIX_FMT_RGB32 treats pixels as 32-bit integers, so on little-endian machines blue comes first in memory. OpenGL treats each channel as a byte, so it should be GL_BGRA to match.
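The byte-order argument can be checked without any graphics library. The sketch below assumes that an AV_PIX_FMT_RGB32 pixel is a native 32-bit integer laid out as 0xAARRGGBB; the helper names (pack_argb, pixel_bytes, is_little_endian) are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack one pixel as a native 32-bit integer in 0xAARRGGBB order
   (the layout assumed here for AV_PIX_FMT_RGB32). */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Copy the pixel's four bytes out in memory order, i.e. the order in
   which a byte-oriented API would have to write them. */
void pixel_bytes(uint32_t pixel, uint8_t out[4])
{
    assert(out != NULL);
    memcpy(out, &pixel, 4);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int is_little_endian(void)
{
    uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host the bytes come out as blue, green, red, alpha, which is the order GL_BGRA with GL_UNSIGNED_BYTE produces.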
To summarize, try:
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, pixels);
Additionally, because the OpenGL y-axis points upwards while FFmpeg's y-axis points downwards, you may need to flip the image. This can be done with the following trick:
uint8_t *in_data[1] = {(uint8_t *) pixels + (height-1)*width*4}; // address of the last line
int in_linesize[1] = {- width * 4}; // negative stride
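To see why a negative stride flips the image, here is a minimal sketch in plain C. It is not FFmpeg's implementation; copy_rows and flip_vertically are hypothetical helpers that walk the rows the same way a stride-aware routine would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `row_bytes` bytes each. A stride may be negative,
   in which case the matching pointer must address the LAST row in memory. */
void copy_rows(uint8_t *dst, ptrdiff_t dst_stride,
               const uint8_t *src, ptrdiff_t src_stride,
               size_t row_bytes, size_t height)
{
    assert(dst != NULL && src != NULL);
    for (size_t y = 0; y < height; ++y) {
        memcpy(dst + (ptrdiff_t)y * dst_stride,
               src + (ptrdiff_t)y * src_stride,
               row_bytes);
    }
}

/* Flip a tightly packed image by reading it bottom-up: start at the
   last row and step backwards by one row per iteration. */
void flip_vertically(uint8_t *dst, const uint8_t *src,
                     size_t row_bytes, size_t height)
{
    if (height == 0)
        return;
    const uint8_t *last_row = src + (height - 1) * row_bytes;
    copy_rows(dst, (ptrdiff_t)row_bytes,
              last_row, -(ptrdiff_t)row_bytes,
              row_bytes, height);
}
```

The source pointer and stride passed to copy_rows here play the same role as in_data[0] and in_linesize[0] in the trick above.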


SDL Texture created from memory is rendering only in black and white

I am trying to port a game from DDraw to SDL2.
The original program loads the images and blits them to a backbuffer then flips it to a primary one.
I am thinking that I could technically shortcut part of the process and just grab the backbuffer in memory, turn it into a texture, and blit that to the screen. This kind of works already; the only problem is that the screen is black and white.
Here is some code. The variable that holds the backbuffer is destmemarea.
SDL_Log("Unable to initialize SDL: %s", SDL_GetError());
SDL_Window* window = NULL;
SDL_Texture *bitmapTex = NULL;
SDL_Surface *bitmapSurface = NULL;
SDL_Surface *MySurface = NULL;
SDL_DisplayMode DM;
SDL_GetCurrentDisplayMode(0, &DM);
auto Width = DM.w;
auto Height = DM.h;
window = SDL_CreateWindow("SDL Tutorial ", Width = DM.w - SCREEN_WIDTH, 32, SCREEN_WIDTH *4, SCREEN_HEIGHT, SDL_WINDOW_SHOWN);
if (window == NULL)
printf("Window could not be created! SDL_Error: %s\n", SDL_GetError());
SDL_Renderer * renderer = SDL_CreateRenderer(window, -1, 0);
int w, h;
SDL_GetRendererOutputSize(renderer, &w, &h);
SDL_Surface * image = SDL_CreateRGBSurfaceFrom( destmemarea, 640, 0, 32, 640, 0, 0, 0,0);
SDL_Texture * texture = SDL_CreateTextureFromSurface(renderer, image);
SDL_RenderCopy(renderer, texture, NULL, NULL);
Not sure if this helps, but this is what is being used for DDraw for the looks...
dd.dwWidth = 768;
dd.lPitch = 768;
dd.dwSize = 108;
dd.dwHeight = 656;
dd.ddpfPixelFormat.dwSize = 32;
So, I'm not 100% sure I understand what you are trying to do, but I have a few assumptions.
You said that you're porting your codebase from DDraw, so I assume that the backbuffer you mention is an internal backbuffer that you allocate yourself, and that the rest of your application renders to it.
If I am correct in this assumption, then your current approach is what you need to do, but you need to pass the correct parameters to SDL_CreateRGBSurfaceFrom:
width and height are... width and height in pixels
depth is the amount of bits in a single pixel. This depends on the rest of your rendering code that writes to your memory buffer. If we assume that you're doing a standard RGBA, where each channel is 8 bits, it would be 32.
pitch is the size in bytes for a single row in your surface - should be equal to width * (depth / 8).
the 4 masks, Rmask, Gmask, Bmask, and Amask describe how each of your depth sized pixels distributes channels. Again, depends on how you render to your memory buffer, and the endianness of your target platform. From the documentation, 2 possible standard layouts:
#if SDL_BYTEORDER == SDL_BIG_ENDIAN
rmask = 0xff000000;
gmask = 0x00ff0000;
bmask = 0x0000ff00;
amask = 0x000000ff;
#else
rmask = 0x000000ff;
gmask = 0x0000ff00;
bmask = 0x00ff0000;
amask = 0xff000000;
#endif
Be sure not to forget to free your surface by calling SDL_FreeSurface().
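The pitch formula and the role of the channel masks can be tried out without SDL. This is a rough sketch under the assumption of whole-byte pixel depths; packed_pitch and channel_from_mask are hypothetical helpers, not SDL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per row for a tightly packed surface: width * (depth / 8). */
int packed_pitch(int width, int depth_bits)
{
    assert(width >= 0 && depth_bits > 0 && depth_bits % 8 == 0);
    return width * (depth_bits / 8);
}

/* Extract the channel selected by `mask` from a packed pixel and
   shift it down so that it starts at bit 0. */
uint32_t channel_from_mask(uint32_t pixel, uint32_t mask)
{
    if (mask == 0)
        return 0;
    uint32_t value = pixel & mask;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        value >>= 1;
    }
    return value;
}
```

If the buffer really holds 32 bits per pixel and is 640 pixels wide, the pitch would be 2560 bytes rather than 640. With the wrong masks the same 32-bit value decodes to different channel values, which is one way colors end up looking wrong.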
With all that said... I think you are approaching your problem from the wrong angle.
As I stated in my comment, SDL handles double buffering for you. Instead of having custom code that renders to a buffer in memory, then creating a surface from that memory, rendering it to SDL's backbuffer, and calling present... you should skip the middle man and draw directly to SDL's back buffer.
This is done through the various SDL render functions, of which RenderCopy is a member.
Your render loop should basically do 3 things:
Call SDL_RenderClear()
Loop over every object that you want to present to the screen, and use one of the SDL render functions - in the most common case of an image, that would be SDL_RenderCopy. This would mean, throughout your codebase, load your images, create SDL_Surface and SDL_Texture for them, keep those, and on every frame call SDL_RenderCopy or SDL_RenderCopyEx
Finally, you call SDL_RenderPresent exactly once per frame. This will swap the buffers, and present your image to screen.

convert AVPicture to array<unsigned char>

I use FFmpeg to extract video frames in C++. I want to get an array<unsigned char> for each frame, but this line of code gives me an AVFrame:
avcodec_decode_video2(codecContext, DecodedFrame, &gotPicture, Packet);
So I use sws_scale to convert the AVFrame to an AVPicture, but I still cannot get an array<unsigned char> from the frame.
sws_scale(convertContext, DecodedFrame->data, DecodedFrame->linesize, 0, (codecContext)->height, convertedFrame->data, convertedFrame->linesize);
Can anyone help me convert an AVFrame or AVPicture to array<unsigned char>?
AVPicture is deprecated. Converting to it is pointless, since AVFrame is its replacement.
If I understand the question correctly, you're trying to get the raw pixel values of the picture into a std::array. If so, just copy the data fields of the AVFrame into it.
avcodec_decode_video2(codecContext, DecodedFrame, &gotPicture, Packet);
// If you need rgb, create a swscontext to convert from video pixel format
sws_ctx = sws_getContext(DecodedFrame->width, DecodedFrame->height, codecContext->pix_fmt, DecodedFrame->width, DecodedFrame->height, AV_PIX_FMT_RGB24, 0, 0, 0, 0);
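uint8_t* rgb_data[4]; int rgb_linesize[4];
// align = 1 keeps the rows tightly packed, so rgb_linesize[0] == width * 3
av_image_alloc(rgb_data, rgb_linesize, DecodedFrame->width, DecodedFrame->height, AV_PIX_FMT_RGB24, 1);
sws_scale(sws_ctx, DecodedFrame->data, DecodedFrame->linesize, 0, DecodedFrame->height, rgb_data, rgb_linesize);
// RGB24 is a packed format. It means there is only one plane and all data in it.
size_t rgb_size = (size_t)DecodedFrame->width * DecodedFrame->height * 3;
// std::array needs a compile-time size, so use std::vector for a size known only at run time
std::vector<uint8_t> rgb_arr(rgb_size);
std::copy_n(rgb_data[0], rgb_size, rgb_arr.begin());
av_freep(&rgb_data[0]); // release the buffer allocated by av_image_alloc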
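If the destination image is allocated with a row alignment greater than 1, each row may be followed by padding bytes, so the linesize can be larger than width * 3. In that case the rows have to be copied one at a time. A minimal sketch in plain C, where pack_plane is a hypothetical helper and not part of FFmpeg:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a plane whose rows start `linesize` bytes apart into a tightly
   packed buffer with `row_bytes` bytes per row.
   Returns the number of bytes written to dst. */
size_t pack_plane(uint8_t *dst, const uint8_t *src, int linesize,
                  size_t row_bytes, size_t height)
{
    assert(dst != NULL && src != NULL);
    assert(linesize >= 0 && (size_t)linesize >= row_bytes);
    for (size_t y = 0; y < height; ++y) {
        /* Only the first row_bytes of each source row are pixel data;
           the remainder of the row is padding and is skipped. */
        memcpy(dst + y * row_bytes, src + y * (size_t)linesize, row_bytes);
    }
    return row_bytes * height;
}
```

For an RGB24 frame, row_bytes would be width * 3 and linesize would be the first entry of the linesize array.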

OpenCV using cvImageCreate() with grayscale image fails, and resizing usually fails

I have the following code, which loads a grayscale image from a buffer holding an 8-bit (1 byte per pixel) bitmap and then resizes it.
int resizeBitmap(const unsigned char *inData, const size_t inDataLength, const size_t inWidth, const size_t inHeight,
const int bitDepth, const int noOfChannels, unsigned char **outData, size_t *outDataLength, const size_t outWidth, const size_t outHeight) {
// create input image
IplImage *inImage = cvCreateImage(cvSize(inWidth, inHeight), bitDepth, noOfChannels);
cvSetData(inImage, inData, inImage->widthStep);
// show input image
cvNamedWindow("OpenCV Input Image", CV_WINDOW_FREERATIO);
cvShowImage("OpenCV Input Image", inImage);
cvDestroyWindow("OpenCV Input Image");
/* */
// create output image
IplImage *outImage = cvCreateImage(cvSize(outWidth, outHeight), inImage->depth, inImage->nChannels);
// select interpolation type
double scaleFactor = (((double) outWidth)/inWidth + ((double) outHeight)/inHeight)/2;
int interpolation = (scaleFactor > 1.0) ? CV_INTER_LINEAR : CV_INTER_AREA;
// resize from input image to output image
cvResize(inImage, outImage, interpolation);
/* // show output image
cvNamedWindow("OpenCV Output Image", CV_WINDOW_FREERATIO);
cvShowImage("OpenCV Output Image", outImage);
cvDestroyWindow("OpenCV Output Image");
*/
// get raw data from output image
int step = 0;
CvSize size;
cvGetRawData(outImage, outData, &step, &size);
*outDataLength = step*size.height;
return 0;
}
I am using here bitDepth = 8 and noOfChannels = 1.
Loaded image is:
and the output is:
This output is not always written, as the program usually fails with the error:
OpenCV Error: Bad number of channels (Source image must have 1, 3 or 4 channels) in cvConvertImage, file /tmp/opencv-20160915-26910-go28a5/opencv-2.4.13/modules/highgui/src/utils.cpp, line 611
libc++abi.dylib: terminating with uncaught exception of type cv::Exception: /tmp/opencv-20160915-26910-go28a5/opencv-2.4.13/modules/highgui/src/utils.cpp:611: error: (-15) Source image must have 1, 3 or 4 channels in function cvConvertImage
I am attaching the debugger output because there is an interesting situation: I am passing a grayscale buffer of size 528480, which equals 1 byte * 1101 * 480, but after cvCreateImage the image reports an imageSize of 529920 and a widthStep of 1104! Maybe this is the problem with the image, but why does it happen?
This issue is related to the widthStep and width of IplImage. OpenCV pads each row of the image so that widthStep is a multiple of 4 bytes. Here OpenCV is using a width of 1101 and a widthStep of 1104, but the bitmap data is tightly packed, so when it is written into the IplImage a few extra pixels are consumed per row (note the diagonal line from top-left to bottom-right).
Note that the image is not tilted. It's just that every next row is shifted a little to the left (by 3 pixels), which gives the impression of a shearing transformation.
It is also possible that you are passing a smaller width than the bitmap actually has.
See docs here and search for padding. You can try copying the data row by row instead.
Why it crashes: sometimes OpenCV will end up reading beyond the bitmap buffer and may hit inaccessible memory addresses, causing an exception.
Note: the bitmap probably also has padding, which is where the black diagonal line comes from.
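The numbers from the question can be reproduced with a few lines of plain C. This is only an illustration of the arithmetic, assuming 1 byte per pixel and 4-byte row alignment; padded_row_bytes and pixel_offset are hypothetical helpers, not OpenCV functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round row_bytes up to the next multiple of `align` (align must be > 0). */
size_t padded_row_bytes(size_t row_bytes, size_t align)
{
    assert(align > 0);
    return (row_bytes + align - 1) / align * align;
}

/* Byte offset of pixel (x, y) in a 1-byte-per-pixel buffer whose rows
   start `step` bytes apart. */
size_t pixel_offset(size_t x, size_t y, size_t step)
{
    return y * step + x;
}
```

A width of 1101 bytes pads to 1104, and 1104 * 480 gives the 529920 bytes reported as imageSize. Row 1 of the tightly packed source starts at offset 1101, while the padded layout expects it at 1104, so every row is displaced by 3 more pixels than the one before it.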
Based on saurabheights' answer, I have written a procedure that pads each bitmap row to a given multiple of bytes.
int padBitmap(const unsigned char *data, const size_t dataLength, const size_t width, const size_t height,
const int bitDepth, const int noOfChannels, unsigned char **paddedData, size_t *paddedDataLength, const size_t row_multiple) {
    size_t row_length = (width*noOfChannels*bitDepth)/CHAR_BIT;
    // the outer modulo yields 0 (instead of row_multiple) when the row is already aligned
    size_t row_padding_size = (row_multiple - row_length % row_multiple) % row_multiple;
    if(row_padding_size == 0) return 0; // nothing to pad; *paddedData is left untouched
    size_t new_row_length = row_length + row_padding_size;
    size_t newDataLength = height * new_row_length;
    unsigned char *newData = malloc(sizeof(unsigned char) *newDataLength);
    for(size_t i=0; i<height; i++) {
        memcpy(newData + i*new_row_length, data + i*row_length, row_length);
        // zero the padding bytes; memset works for any row_multiple, not just 4
        memset(newData + i*new_row_length + row_length, 0, row_padding_size);
    }
    *paddedData = newData;
    *paddedDataLength = newDataLength;
    return row_padding_size;
}
Now before passing bitmap to resizeBitmap(), I am doing this padding:
unsigned char *paddedData = 0;
size_t paddedDataLength = 0;
int padding = padBitmap(gData, gDataLength, width, height, PNG_BIT_DEPTH_8, GRAYSCALE_COMPONENTS_PER_PIXEL, &paddedData, &paddedDataLength, 4);
width += padding;
I then use paddedData as the bitmap, and it seems to work correctly.

Correctly Allocate And Fill Frame In FFmpeg

I am filling a Frame with a BGR image for encoding, and I am getting a memory leak. I think I got to the source of the problem but it appears to be a library issue instead. Since FFmpeg is such a mature library, I think I am misusing it and I would like to be instructed on how to do it correctly.
I am allocating a Frame using:
AVFrame *bgrFrame = av_frame_alloc();
And later I allocate the image in the Frame using:
av_image_alloc(bgrFrame->data, bgrFrame->linesize, bgrFrame->width, bgrFrame->height, AV_PIX_FMT_BGR24, 32);
Then I fill the image allocated using:
av_image_fill_pointers(bgrFrame->data, AV_PIX_FMT_BGR24, bgrFrame->height, originalBGRImage.data, bgrFrame->linesize);
Where originalBGRImage is an OpenCV Mat.
This has a memory leak: apparently av_image_alloc() allocates memory, and av_image_fill_pointers() also allocates memory, on the same pointers (I can see bgrFrame->data[0] changing between calls).
If I call
After av_image_alloc(), it's fine, but if I call it after av_image_fill_pointers(), the program crashes, even though bgrFrame->data[0] is not NULL, which I find very curious.
Looking at FFmpeg's av_image_alloc() source code, I see it calls av_image_fill_pointers() twice, once after allocating a buffer buf... and later, in the av_image_fill_pointers() source code, data[0] is replaced by the image pointer, which is (I think) the source of the memory leak, since data[0] was holding buf from the previous av_image_alloc() call.
So this brings the final question: What's the correct way of filling a frame with an image?
You should allocate your frame once.
AVFrame* alloc_picture(enum PixelFormat pix_fmt, int width, int height)
{
    AVFrame *f = avcodec_alloc_frame();
    if (!f)
        return NULL;
    int size = avpicture_get_size(pix_fmt, width, height);
    uint8_t *buffer = (uint8_t *) av_malloc(size);
    if (!buffer) {
        av_free(f); // do not leak the frame when the buffer allocation fails
        return NULL;
    }
    avpicture_fill((AVPicture *)f, buffer, pix_fmt, width, height);
    return f;
}
Yes, the cast (AVPicture*) is allowed https://stackoverflow.com/a/20498359/2079934 .
In subsequent frames, you can write into this frame. Since your OpenCV raw data is BGR and you need RGB or YUV, you can use sws_scale. In my application, I mirror vertically:
struct SwsContext* convertCtx = sws_getContext(width, height, PIX_FMT_RGB24, c->width, c->height, c->pix_fmt, SWS_FAST_BILINEAR, NULL, NULL, NULL);
avpicture_fill(&pic_raw, (uint8_t*)pixelBuffer, PIX_FMT_RGB24, width, height);
// flip
pic_raw.data[0] += (height - 1) * pic_raw.linesize[0];
pic_raw.linesize[0] *= -1;
sws_scale(convertCtx, pic_raw.data, pic_raw.linesize, 0, height, f->data, f->linesize);
out_size = avcodec_encode_video(c, outputBuffer, outputBufferSize, f);
(You can adapt PIX_FMT_RGB24 to your needs and read from cv::Mat without copying data.)
av_image_fill_arrays() does the job. It fills the frame's data[] and linesize[] without allocating any memory.
This is a late answer, but after spending many hours on this, I want to share what I found.
From the documentation of AVFrame's buf field:
* AVBuffer references backing the data for this frame. All the pointers in
* data and extended_data must point inside one of the buffers in buf or
* extended_buf. This array must be filled contiguously -- if buf[i] is
* non-NULL then buf[j] must also be non-NULL for all j < i.
* There may be at most one AVBuffer per data plane, so for video this array
* always contains all the references. For planar audio with more than
* AV_NUM_DATA_POINTERS channels, there may be more buffers than can fit in
* this array. Then the extra AVBufferRef pointers are stored in the
* extended_buf array.
So buf acts as a "smart pointer" for data (and extended_buf for extended_data).
For example, I am using an image with only one plane (a single linesize):
int size = av_image_get_buffer_size(AVPixelFormat::AV_PIX_FMT_BGRA, width, height, 1);
AVBufferRef* dataref = av_buffer_alloc(size);//that for av_frame_unref
memcpy(dataref->data, your_buffer, size );
AVFrame* frame = av_frame_alloc();
av_image_fill_arrays(frame->data, frame->linesize, dataref->data, AVPixelFormat::AV_PIX_FMT_BGRA, width, height, 1);
frame->buf[0] = dataref;
av_frame_unref() will unreference frame->buf and free the underlying data once its reference count drops to zero.

Advice on an algorithm for rendering a bitmap pixel by pixel

I've been working on a bitmap loader, with the main goal to do nothing more than parse the data properly and render it in OpenGL. I'm at the point where I need to draw the pixels on an x/y (i.e., pixel by pixel) basis (at least, this is what I think I need to do as far as rendering is concerned). I've already bound the texture object and called glTexImage2D(...).
Currently, what I'm having trouble with is the pixel by pixel algorithm.
As far as I understand it, bitmap (aka DIB) files store color data in what is known as the pixel array. Each row of pixels consists of some number of bytes, with each pixel occupying 4 bytes (32 bits per pixel), 3 bytes (24 bits per pixel), 2 bytes (16 bits per pixel), or 1 byte (8 bits per pixel).
I think I need to loop through the pixels while at the same time calculating the right offset within the pixel array, which is relative to its pixel x/y coordinate. Is this true, though? If not, what should I do? I'm honestly slightly confused as to whether or not, despite doing what was directed to me in this question I asked some time ago, this approach is correct.
I assume that going about it on a pixel by pixel basis is the right approach, mainly because rendering a quad with glVertex* and glTexCoord* produced nothing more than a grayed out rectangle (at the time I thought OpenGL would handle this by itself, hence my attempting that in the first place).
I should also note that, while my question displays OpenGL 3.1 shaders, I moved to SDL 1.2 so I could just use immediate mode for the time being until I got the right algorithms implemented, and then switch back to modern GL.
The test image I'm parsing:
Its data output (pastebinned due to its very long length):
And the code:
void R_RenderTexture_PixByPix( texture_bmp_t* const data, const vec3 center )
glBindTexture( GL_TEXTURE_2D, data->texbuf_id );
glBegin( GL_POINTS );
const unsigned width = data->img_data->width + ( unsigned int ) center[ VEC_X ];
const unsigned height = data->img_data->height + ( unsigned int ) center[ VEC_Y ];
const unsigned bytecount = GetByteCount( data->img_data->bpp );
const unsigned char* pixels = data->img_data->pixels;
unsigned color_offset = 0;
unsigned x_pixel;
for ( x_pixel = center[ VEC_X ]; x_pixel < width; ++x_pixel )
unsigned y_pixel;
for ( y_pixel = center[ VEC_Y ]; y_pixel < height; ++y_pixel )
const bool do_color_update = true; //<--- replace true with a condition which checks to see if the color needs to be updated.
if ( do_color_update )
glColor3fv( pixels + color_offset );
color_offset += bytecount;
glBindTexture( GL_TEXTURE_2D, 0 );
You're completely missing the point of an OpenGL texture in your code. The texture holds the image for you and the rasterizer does all the iterations over the pixel data for you. No need to write a slow pixel-pusher loop yourself.
As your code stands right now that texture is completely bogus and does nothing. You could completely omit the calls to glBindTexture and it'd still work – or not, because you're not actually drawing anything, you just set the glColor state. To draw something you'd have to call glVertex.
So why not leverage the pixel-pushing performance of modern GPUs and actually use a texture? How about this:
void R_RenderTexture_PixByPix( texture_bmp_t* const data, const vec3 center )
{
    if( 0 == data->texbuf_id ) {
        glGenTextures(1, &(data->texbuf_id));
        glBindTexture( GL_TEXTURE_2D, data->texbuf_id );
        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
        // there are a few more, but the defaults are ok
        // if you didn't change them no need for further unpack settings
        // no mipmaps are uploaded, so the default mipmapping minification
        // filter would leave the texture incomplete
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        GLenum internal_format;
        GLenum format;
        GLenum type;
        switch(data->img_data->bpp) {
        case 8:
            // this could be a palette or grayscale
            internal_format = GL_LUMINANCE8;
            format = GL_LUMINANCE;
            type = GL_UNSIGNED_BYTE;
            break;
        case 15:
            // assumes X1R5G5B5; the packed 1-5-5-5 types need a four-component format
            internal_format = GL_RGB5;
            format = GL_BGRA;
            type = GL_UNSIGNED_SHORT_1_5_5_5_REV;
            break;
        case 16:
            // assumes R5G6B5; the packed 5-6-5 type is only valid with GL_RGB
            internal_format = GL_RGB8;
            format = GL_RGB;
            type = GL_UNSIGNED_SHORT_5_6_5;
            break;
        case 24:
            internal_format = GL_RGB8;
            format = GL_BGR; // BMP files have BGR pixel order
            type = GL_UNSIGNED_BYTE;
            break;
        case 32:
            internal_format = GL_RGB8;
            format = GL_BGRA; // BMP files have BGR pixel order, plus a fourth byte
            type = GL_UNSIGNED_BYTE;
            break;
        default:
            return; // unsupported bit depth
        }
        glTexImage2D( GL_TEXTURE_2D, 0, internal_format,
            data->img_data->width, data->img_data->height, 0,
            format, type, data->img_data->pixels );
    } else {
        glBindTexture( GL_TEXTURE_2D, data->texbuf_id );
    }
    static GLfloat verts[] = {
        0, 0,
        1, 0,
        1, 1,
        0, 1
    };
    // the following is to address texture image pixel centers
    // tex coordinates 0 and 1 are not on pixel centers!
    float const tex_width = data->img_data->width;
    float const tex_height = data->img_data->height;
    float const s0 = 1. / (2.*tex_width);
    float const s1 = ( 2.*(tex_width-1) + 1.) / (2.*tex_width);
    float const t0 = 1. / (2.*tex_height);
    float const t1 = ( 2.*(tex_height-1) + 1.) / (2.*tex_height);
    GLfloat texcoords[] = {
        s0, t0,
        s1, t0,
        s1, t1,
        s0, t1
    };
    glEnable(GL_TEXTURE_2D);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, verts);
    glTexCoordPointer(2, GL_FLOAT, 0, texcoords);
    glColor4f(1., 1., 1., 1.);
    glDrawArrays(GL_QUADS, 0, 4);
    glBindTexture( GL_TEXTURE_2D, 0 );
}
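The s0/s1 and t0/t1 expressions in the code above are two instances of one formula: the center of texel i in a texture n texels wide sits at (2i + 1) / (2n). A small sketch in plain C, where texel_center is a hypothetical helper name used only for this illustration:

```c
#include <assert.h>

/* Texture coordinate of the center of texel `i` (0-based) in a texture
   that is `n` texels wide. Coordinates 0 and 1 are the outer edges of
   the first and last texel, not their centers. */
double texel_center(int i, int n)
{
    assert(n > 0 && i >= 0 && i < n);
    return (2.0 * i + 1.0) / (2.0 * n);
}
```

For a 4-texel-wide texture, the first texel center is at 1/8 and the last one at 7/8, which is what s0 and s1 evaluate to for tex_width = 4.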
Your intuition is basically correct. The pixels are stored as an array of bytes, but the bytes are arranged into consecutive groups, with each group representing a single pixel. To address a single pixel, you'll need to do a calculation like this:
unsigned char* image_data = start_of_pixel_data;
unsigned char* pixel_addr = image_data + bytes_per_pixel * (y * width_in_pixels + x);
Be careful about the width in pixels, as sometimes there is padding at the end of the row to bring the total row width in bytes up to a multiple of 4/8/16/32/64/etc. I recommend looking at the actual bytes of the bitmap in hex first to get a sense of what is going on. It's a great learning exercise and will give you high confidence in your pixel-walking code, which is what you want. You might be able to use a debugger to do this, or else write a simple loop with printf over the image bytes.
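To account for the row padding mentioned above, the address calculation can use a row stride in bytes instead of the width in pixels. The sketch below assumes rows padded to a multiple of 4 bytes, as in uncompressed BMP files, and pixel sizes of whole bytes; bmp_row_stride and pixel_at are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per stored row when every row is padded to a multiple of 4 bytes. */
size_t bmp_row_stride(size_t width_px, size_t bits_per_pixel)
{
    return (width_px * bits_per_pixel + 31) / 32 * 4;
}

/* Address of pixel (x, y): skip y whole rows (including their padding),
   then x pixels within the row. */
const uint8_t *pixel_at(const uint8_t *image_data, size_t stride,
                        size_t bytes_per_pixel, size_t x, size_t y)
{
    assert(image_data != NULL);
    return image_data + y * stride + x * bytes_per_pixel;
}
```

A 3-pixel-wide, 24-bit image has 9 bytes of pixel data per row but a stride of 12, so the pixel at x = 2, y = 1 starts at byte offset 12 + 6 = 18 rather than 9 + 6 = 15.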
