2 layer buffer in STM32F4 - arm

I have run into a strange new problem. I have set up two framebuffer addresses in my SDRAM.
These two addresses are SDRAM_START_ADR and SDRAM_START_ADR2.
On the STM32F4 the two LTDC layers are independent. In an earlier project I displayed two images: the first at 800 x 480, my LCD resolution, and the second one smaller than the first. That worked fine when I ran it;
the LTDC read both images directly from flash.
Now I want to store the data in my external SDRAM and have the LTDC read it directly from there, so I have configured the two layers (as described above).
The strange part is that I want to display these two buffers at the same time. As a test I drew a filled rectangle in a different color into each layer's buffer in SDRAM. Each layer displays correctly on its own (layer 1 enabled and layer 2 disabled, or vice versa), but when I show them at the same time the rectangles come out as scrambled lines.
I want both layers at 800 x 480 resolution.
How can I solve this problem?
And the important question: how can I switch between these two layers without any visible error on the display?
MCU: STM32F429ZGT6
The LCD is configured for 16-bit color.
Thanks!
Here is the code:
LTDC:
// layer1
LTDC_Layer_InitStruct.LTDC_HorizontalStart = HSYNC_W + HBP;
LTDC_Layer_InitStruct.LTDC_HorizontalStop = 800 + HSYNC_W + HBP - 1;
LTDC_Layer_InitStruct.LTDC_VerticalStart = VSYNC_W + VBP;
LTDC_Layer_InitStruct.LTDC_VerticalStop = 480 + VSYNC_W + VBP - 1;
LTDC_Layer_InitStruct.LTDC_PixelFormat = LTDC_Pixelformat_RGB565;
LTDC_Layer_InitStruct.LTDC_ConstantAlpha = 255;
LTDC_Layer_InitStruct.LTDC_DefaultColorAlpha = 0;
LTDC_Layer_InitStruct.LTDC_DefaultColorBlue = 0;
LTDC_Layer_InitStruct.LTDC_DefaultColorGreen = 0;
LTDC_Layer_InitStruct.LTDC_DefaultColorRed = 0;
LTDC_Layer_InitStruct.LTDC_BlendingFactor_1 = LTDC_BlendingFactor1_CA;
LTDC_Layer_InitStruct.LTDC_BlendingFactor_2 = LTDC_BlendingFactor2_CA;
LTDC_Layer_InitStruct.LTDC_CFBLineLength = ((800 * 2) + 3);
LTDC_Layer_InitStruct.LTDC_CFBPitch = (800 * 2);
LTDC_Layer_InitStruct.LTDC_CFBLineNumber = 480;
LTDC_Layer_InitStruct.LTDC_CFBStartAdress = SDRAM_START_ADR;
LTDC_LayerInit(LTDC_Layer1, &LTDC_Layer_InitStruct);
// layer 2
LTDC_Layer_InitStruct.LTDC_HorizontalStart = HSYNC_W + HBP;
LTDC_Layer_InitStruct.LTDC_HorizontalStop = 800 + HSYNC_W + HBP - 1;
LTDC_Layer_InitStruct.LTDC_VerticalStart = VSYNC_W + VBP;
LTDC_Layer_InitStruct.LTDC_VerticalStop = 480 + VSYNC_W + VBP - 1;
LTDC_Layer_InitStruct.LTDC_PixelFormat = LTDC_Pixelformat_RGB565;
LTDC_Layer_InitStruct.LTDC_ConstantAlpha = 150;
LTDC_Layer_InitStruct.LTDC_DefaultColorAlpha = 0;
LTDC_Layer_InitStruct.LTDC_DefaultColorBlue = 0;
LTDC_Layer_InitStruct.LTDC_DefaultColorGreen = 0;
LTDC_Layer_InitStruct.LTDC_DefaultColorRed = 0;
LTDC_Layer_InitStruct.LTDC_BlendingFactor_1 = LTDC_BlendingFactor1_CA;
LTDC_Layer_InitStruct.LTDC_BlendingFactor_2 = LTDC_BlendingFactor2_CA;
LTDC_Layer_InitStruct.LTDC_CFBLineLength = ((200 * 2) + 3);
LTDC_Layer_InitStruct.LTDC_CFBPitch = (200 * 2);
LTDC_Layer_InitStruct.LTDC_CFBLineNumber = 100;
LTDC_Layer_InitStruct.LTDC_CFBStartAdress = SDRAM_START_ADR2;
LTDC_LayerInit(LTDC_Layer2, &LTDC_Layer_InitStruct);
In my main.c, here are the relevant calls:
LTDC_Cmd(ENABLE);
LTDC_LayerCmd(LTDC_Layer1, ENABLE);
LTDC_ReloadConfig(LTDC_IMReload);
LCD_DrawFullRect(200, 200, 200, 100, LCD_COLOR_BLUE);
LTDC_LayerCmd(LTDC_Layer2, ENABLE);
LTDC_ReloadConfig(LTDC_IMReload);
LCD_DrawFullRect2((uint32_t *)SDRAM_START_ADR2, 400, 240, 200, 100, LCD_COLOR_MAGENTA);
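Regarding the switching question, here is a minimal sketch (assuming the same SPL driver used above; LTDC_VBReload is the vertical-blanking variant of the LTDC_IMReload flag already used) of switching by toggling only the layer enables and latching the change during vertical blanking, so the controller never scans out a half-updated configuration:
#include "stm32f4xx_ltdc.h"
/* Show layer 1 and hide layer 2 (or vice versa); the shadow registers are
   reloaded during the next vertical blanking period, avoiding visible tearing. */
void show_layer1(void)
{
    LTDC_LayerCmd(LTDC_Layer1, ENABLE);
    LTDC_LayerCmd(LTDC_Layer2, DISABLE);
    LTDC_ReloadConfig(LTDC_VBReload);
}
void show_layer2(void)
{
    LTDC_LayerCmd(LTDC_Layer1, DISABLE);
    LTDC_LayerCmd(LTDC_Layer2, ENABLE);
    LTDC_ReloadConfig(LTDC_VBReload);
}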

Related

Is there any way for image thresholding without GDI+? [duplicate]

Is it possible to directly read/write to a WriteableBitmap's pixel data? I'm currently using WriteableBitmapEx's SetPixel() but it's slow and I want to access the pixels directly without any overhead.
I haven't used HTML5's canvas in a while, but if I recall correctly you could get its image data as a single array of numbers, and that's roughly what I'm looking for.
Thanks in advance
To answer your question, you can more directly access a writable bitmap's data by using the Lock, write, Unlock pattern, as demonstrated below, but it is typically not necessary unless you are basing your drawing upon the contents of the image. More typically, you can just create a new buffer and make it a bitmap, rather than the other way around.
That being said, there are many extensibility points in WPF for doing innovative drawing without resorting to pixel manipulation. For most controls, the existing WPF primitives (Border, Line, Rectangle, Image, etc.) are more than sufficient - don't be afraid to use many of them, they are quite cheap. For complex controls, you can use a DrawingContext to issue lower-level drawing primitives. For image effects, you can implement GPU-assisted shaders using the Effect class or use the built-in effects (blur and drop shadow).
But, if your situation requires direct pixel access, pick a pixel format and start writing. I suggest BGRA32 because it is easy to understand and is probably the most common one to be discussed.
BGRA32 means the pixel data is stored in memory as 4 bytes representing the blue, green, red, and alpha channels of an image, in that order. It is convenient because each pixel ends up on a 4-byte boundary, lending itself to storage in a 32-bit integer. When dealing with a 32-bit integer, keep in mind the byte order will be reversed on most platforms (check BitConverter.IsLittleEndian to determine the proper byte order at runtime if you need to support multiple platforms; x86 and x86_64 are both little-endian).
The image data is stored in horizontal strips, one stride wide, each of which makes up a single row of the image. The stride is always greater than or equal to the pixel width of the image multiplied by the number of bytes per pixel of the selected format. Certain architecture-specific situations can make the stride longer than width * bytesPerPixel, so you must use the stride to compute the start of a row rather than multiplying the width. Since we are using a 4-byte-per-pixel format, our stride does happen to be width * 4, but you should not rely on that.
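For illustration only, here is the offset arithmetic just described, written as a small C helper (the names are placeholders, not part of the WPF API):
#include <stddef.h>

/* stride is the byte width of one row (e.g. BackBufferStride), which may be
   larger than width * 4; the bytes within a BGRA32 pixel are B, G, R, A. */
static size_t bgra32_byte_offset(size_t row, size_t col, size_t stride)
{
    return row * stride + col * 4;
}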
As mentioned, the only case in which I would suggest using a WriteableBitmap is when you are accessing an existing image, so that is the example below:
Before / After:
// must be compiled with /UNSAFE
// get an image to draw on and convert it to our chosen format
BitmapSource srcImage = JpegBitmapDecoder.Create(File.Open("img13.jpg", FileMode.Open),
BitmapCreateOptions.None, BitmapCacheOption.OnLoad).Frames[0];
if (srcImage.Format != PixelFormats.Bgra32)
srcImage = new FormatConvertedBitmap(srcImage, PixelFormats.Bgra32, null, 0);
// get a writable bitmap of that image
var wbitmap = new WriteableBitmap(srcImage);
int width = wbitmap.PixelWidth;
int height = wbitmap.PixelHeight;
int stride = wbitmap.BackBufferStride;
int bytesPerPixel = (wbitmap.Format.BitsPerPixel + 7) / 8;
wbitmap.Lock();
byte* pImgData = (byte*)wbitmap.BackBuffer;
// make any pixel with red < 0x44 half-transparent and invert the others
int cRowStart = 0;
int cColStart = 0;
for (int row = 0; row < height; row++)
{
cColStart = cRowStart;
for (int col = 0; col < width; col++)
{
byte* bPixel = pImgData + cColStart;
UInt32* iPixel = (UInt32*)bPixel;
if (bPixel[2 /* bgRa */] < 0x44)
{
// set to 50% transparent
bPixel[3 /* bgrA */] = 0x7f;
}
else
{
// invert but maintain alpha
*iPixel = *iPixel ^ 0x00ffffff;
}
cColStart += bytesPerPixel;
}
cRowStart += stride;
}
wbitmap.AddDirtyRect(new Int32Rect(0, 0, width, height)); // mark the changed region so WPF re-renders it
wbitmap.Unlock();
// if you are going across threads, you will need to additionally freeze the source
wbitmap.Freeze();
However, it really isn't necessary if you are not modifying an existing image. For example, you can draw a checkerboard pattern using all safe code:
Output:
// draw rectangles
int width = 640, height = 480, bytesperpixel = 4;
int stride = width * bytesperpixel;
byte[] imgdata = new byte[width * height * bytesperpixel];
int rectDim = 40;
UInt32 darkcolorPixel = 0xffaaaaaa;
UInt32 lightColorPixel = 0xffeeeeee;
UInt32[] intPixelData = new UInt32[width * height];
for (int row = 0; row < height; row++)
{
for (int col = 0; col < width; col++)
{
intPixelData[row * width + col] = ((col / rectDim) % 2) != ((row / rectDim) % 2) ?
lightColorPixel : darkcolorPixel;
}
}
Buffer.BlockCopy(intPixelData, 0, imgdata, 0, imgdata.Length);
// compose the BitmapImage
var bsCheckerboard = BitmapSource.Create(width, height, 96, 96, PixelFormats.Bgra32, null, imgdata, stride);
And you don't really even need an Int32 intermediate, if you write to the byte array directly.
Output:
// draw using byte array
int width = 640, height = 480, bytesperpixel = 4;
int stride = width * bytesperpixel;
byte[] imgdata = new byte[width * height * bytesperpixel];
// draw a gradient from green to red from left to right (G ff -> 00; R 00 -> ff)
// draw a gradient of alpha from top to bottom (00 -> ff)
// blue constant at 00
for (int row = 0; row < height; row++)
{
for (int col = 0; col < width; col++)
{
// BGRA
imgdata[row * stride + col * 4 + 0] = 0;
imgdata[row * stride + col * 4 + 1] = Convert.ToByte((1 - (col / (float)width)) * 0xff);
imgdata[row * stride + col * 4 + 2] = Convert.ToByte((col / (float)width) * 0xff);
imgdata[row * stride + col * 4 + 3] = Convert.ToByte((row / (float)height) * 0xff);
}
}
var gradient = BitmapSource.Create(width, height, 96, 96, PixelFormats.Bgra32, null, imgdata, stride);
Edit: apparently, you are trying to use WPF to make some sort of image editor. I would still use WPF primitives for shapes and source bitmaps, implement translations, scaling, and rotation as RenderTransforms, bitmap effects as Effects, and keep everything within the WPF model. But if that does not work for you, there are many other options.
You could use WPF primitives to render to a RenderTargetBitmap with a chosen PixelFormat, and then use it with a WriteableBitmap, as below:
Canvas cvRoot = new Canvas();
// position primitives on canvas
var rtb = new RenderTargetBitmap(width, height, dpix, dpiy, PixelFormats.Bgra32);
var wb = new WriteableBitmap(rtb);
You could use a WPF DrawingVisual to issue GDI style commands then render to a bitmap as demonstrated on the sample on the RenderTargetBitmap page.
You could use GDI via an InteropBitmap created with System.Windows.Interop.Imaging.CreateBitmapSourceFromHBitmap from an HBITMAP retrieved with Bitmap.GetHbitmap(). Make sure you don't leak the HBITMAP, though.
After a nice long headache, I found this article that explains a way to do it without using bit arithmetic, and allows me to treat it as an array instead:
unsafe
{
IntPtr pBackBuffer = bitmap.BackBuffer;
byte* pBuff = (byte*)pBackBuffer.ToPointer();
pBuff[4 * x + (y * bitmap.BackBufferStride)] = 255;
pBuff[4 * x + (y * bitmap.BackBufferStride) + 1] = 255;
pBuff[4 * x + (y * bitmap.BackBufferStride) + 2] = 255;
pBuff[4 * x + (y * bitmap.BackBufferStride) + 3] = 255;
}
You can access the raw pixel data by calling the Lock() method and using the BackBuffer property afterwards. When you're finished, don't forget to call AddDirtyRect and Unlock.
For a simple example, you can take a look at this: http://cscore.codeplex.com/SourceControl/latest#CSCore.Visualization/WPF/Utils/PixelManipulationBitmap.cs

SIMD for alpha blending - how to operate on every Nth byte?

I am trying to optimize my alpha blending code with SIMD. SSE2, specifically.
At first I was hoping for SSE2, but at this point I would settle for SSE4.2 if it's easier. The reason is that if I use SSE4.2 instead of SSE2, I cut out a significant number of older processors that could otherwise run this code. But at this point I'd take the compromise.
I am blitting a sprite onto the screen. Everything is in full 32-bit color, ARGB or BGRA, depending on which direction you read it.
I have read every other seemingly related question on SO and everything I could find on the web but I still have not been able to completely wrap my brain around this one specific concept, and I would appreciate some help. I've been at this for days.
Below is my code. This code works, in that it produces the visual effect that I want. A bitmap is drawn onto the background buffer with alpha blending. Everything looks fine and as expected.
But you will see that even though it works, my code misses the point of SIMD entirely. It operates on each byte one at a time, as if it were completely serialized, and therefore sees no performance benefit over my more traditional code that operates on just one pixel at a time. With SIMD, I obviously want to work on 4 pixels (or every channel of one pixel - 128 bits) at a time, in parallel. (I am profiling by measuring frames rendered per second.)
I want to just run the formula once for each channel, i.e., blend all of the red channel at once, all of the green channel at once, all of the blue channel at once, and all of the alpha channel at once. Or alternatively, every channel (RGBA) of one of the pixels at once.
Then I should start to see the full benefit of SIMD.
I feel like I probably need to do some things with masks, but nothing I have tried gets me there.
I would be very grateful for some help.
(This is the inner loop. It only handles 4 pixels. I put it inside a loop where I iterate over 4 pixels at a time with XPixel += 4.)
__m128i BitmapQuadPixel = _mm_load_si128((const __m128i *)((uint32_t *)Bitmap->Memory + BitmapOffset));
__m128i BackgroundQuadPixel = _mm_load_si128((const __m128i *)((uint32_t *)gRenderSurface.Memory + MemoryOffset));
__m128i BlendedQuadPixel;
// 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// R G B A R G B A R G B A R G B A
// This is the red component of the first pixel.
BlendedQuadPixel.m128i_u8[0] = BitmapQuadPixel.m128i_u8[0] * BitmapQuadPixel.m128i_u8[3] / 255 + BackgroundQuadPixel.m128i_u8[0] * (255 - BitmapQuadPixel.m128i_u8[3]) / 255;
// This is the green component of the first pixel.
BlendedQuadPixel.m128i_u8[1] = BitmapQuadPixel.m128i_u8[1] * BitmapQuadPixel.m128i_u8[3] / 255 + BackgroundQuadPixel.m128i_u8[1] * (255 - BitmapQuadPixel.m128i_u8[3]) / 255;
// And so on...
BlendedQuadPixel.m128i_u8[2] = BitmapQuadPixel.m128i_u8[2] * BitmapQuadPixel.m128i_u8[3] / 255 + BackgroundQuadPixel.m128i_u8[2] * (255 - BitmapQuadPixel.m128i_u8[3]) / 255;
BlendedQuadPixel.m128i_u8[4] = BitmapQuadPixel.m128i_u8[4] * BitmapQuadPixel.m128i_u8[7] / 255 + BackgroundQuadPixel.m128i_u8[4] * (255 - BitmapQuadPixel.m128i_u8[7]) / 255;
BlendedQuadPixel.m128i_u8[5] = BitmapQuadPixel.m128i_u8[5] * BitmapQuadPixel.m128i_u8[7] / 255 + BackgroundQuadPixel.m128i_u8[5] * (255 - BitmapQuadPixel.m128i_u8[7]) / 255;
BlendedQuadPixel.m128i_u8[6] = BitmapQuadPixel.m128i_u8[6] * BitmapQuadPixel.m128i_u8[7] / 255 + BackgroundQuadPixel.m128i_u8[6] * (255 - BitmapQuadPixel.m128i_u8[7]) / 255;
BlendedQuadPixel.m128i_u8[8] = BitmapQuadPixel.m128i_u8[8] * BitmapQuadPixel.m128i_u8[11] / 255 + BackgroundQuadPixel.m128i_u8[8] * (255 - BitmapQuadPixel.m128i_u8[11]) / 255;
BlendedQuadPixel.m128i_u8[9] = BitmapQuadPixel.m128i_u8[9] * BitmapQuadPixel.m128i_u8[11] / 255 + BackgroundQuadPixel.m128i_u8[9] * (255 - BitmapQuadPixel.m128i_u8[11]) / 255;
BlendedQuadPixel.m128i_u8[10] = BitmapQuadPixel.m128i_u8[10] * BitmapQuadPixel.m128i_u8[11] / 255 + BackgroundQuadPixel.m128i_u8[10] * (255 - BitmapQuadPixel.m128i_u8[11]) / 255;
BlendedQuadPixel.m128i_u8[12] = BitmapQuadPixel.m128i_u8[12] * BitmapQuadPixel.m128i_u8[15] / 255 + BackgroundQuadPixel.m128i_u8[12] * (255 - BitmapQuadPixel.m128i_u8[15]) / 255;
BlendedQuadPixel.m128i_u8[13] = BitmapQuadPixel.m128i_u8[13] * BitmapQuadPixel.m128i_u8[15] / 255 + BackgroundQuadPixel.m128i_u8[13] * (255 - BitmapQuadPixel.m128i_u8[15]) / 255;
BlendedQuadPixel.m128i_u8[14] = BitmapQuadPixel.m128i_u8[14] * BitmapQuadPixel.m128i_u8[15] / 255 + BackgroundQuadPixel.m128i_u8[14] * (255 - BitmapQuadPixel.m128i_u8[15]) / 255;
_mm_store_si128((__m128i *)((uint32_t *)gRenderSurface.Memory + MemoryOffset), BlendedQuadPixel);
Looking at gRenderSurface, I wonder whether you should just blend the images on the GPU, e.g. with a GLSL shader; if not, reading memory back from the render surface can be very slow. Anyway, here's my take using SSE4.1, as I didn't find anything quite like it among the links in the comments.
This one shuffles the alpha bytes into all color channels using _aa and does the "one minus source alpha" blending via the final masking. With AVX2 it outperforms the scalar implementation by a factor of ~5.7x, while the SSE4.1 version with separate low- and high-quadword processing is ~3.14x faster than the scalar implementation (both measured using Intel Compiler 19.0).
The division by 255 comes from How to divide 16-bit integer by 255 with using SSE?
const __m128i _aa = _mm_set_epi8( 15,15,15,15, 11,11,11,11, 7,7,7,7, 3,3,3,3 );
const __m128i _mask1 = _mm_set_epi16(-1,0,0,0, -1,0,0,0);
const __m128i _mask2 = _mm_set_epi16(0,-1,-1,-1, 0,-1,-1,-1);
const __m128i _v255 = _mm_set1_epi8( -1 );
const __m128i _v1 = _mm_set1_epi16( 1 );
const int xmax = 4*source.cols-15;
for ( int y=0;y<source.rows;++y )
{
// OpenCV CV_8UC4 input
const unsigned char * pS = source.ptr<unsigned char>( y );
const unsigned char * pD = dest.ptr<unsigned char>( y );
unsigned char *pOut = out.ptr<unsigned char>( y );
for ( int x=0;x<xmax;x+=16 )
{
__m128i _src = _mm_loadu_si128( (__m128i*)( pS+x ) );
__m128i _src_a = _mm_shuffle_epi8( _src, _aa );
__m128i _dst = _mm_loadu_si128( (__m128i*)( pD+x ) );
__m128i _dst_a = _mm_shuffle_epi8( _dst, _aa );
__m128i _one_minus_src_a = _mm_subs_epu8( _v255, _src_a );
__m128i _s_a = _mm_cvtepu8_epi16( _src_a );
__m128i _s = _mm_cvtepu8_epi16( _src );
__m128i _d = _mm_cvtepu8_epi16( _dst );
__m128i _d_a = _mm_cvtepu8_epi16( _one_minus_src_a );
__m128i _out = _mm_adds_epu16( _mm_mullo_epi16( _s, _s_a ), _mm_mullo_epi16( _d, _d_a ) );
_out = _mm_srli_epi16( _mm_adds_epu16( _mm_adds_epu16( _v1, _out ), _mm_srli_epi16( _out, 8 ) ), 8 );
_out = _mm_or_si128( _mm_and_si128(_out,_mask2), _mm_and_si128( _mm_adds_epu16(_s_a, _mm_cvtepu8_epi16(_dst_a)),_mask1) );
__m128i _out2;
// compute _out2 using the high quadword of _src and _dst
//...
__m128i _ret = _mm_packus_epi16( _out, _out2 );
_mm_storeu_si128( (__m128i*)(pOut+x), _ret );
}
}
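For reference, here is a scalar sketch (not part of the answer's code) of the per-channel math the vector loop performs 16 bytes at a time, including the divide-by-255 approximation used above:
#include <stdint.h>

/* One channel, straight (non-premultiplied) alpha assumed. */
static inline uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t src_a)
{
    uint32_t x = (uint32_t)src * src_a + (uint32_t)dst * (255u - src_a); /* fits in 16 bits */
    /* approximate x / 255 without a division; the same trick as the
       _mm_adds_epu16 / _mm_srli_epi16 sequence in the loop above */
    return (uint8_t)((x + 1 + (x >> 8)) >> 8);
}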

Adding values to matrix from a single variable in matlab

I have a problem where I can't add values to my 1 x 250 matrix directly from a variable. This is the code.
COMPORT = 'COM4';
BAUDRATE = 115200;
s1 = serial(COMPORT, 'baudrate', BAUDRATE);
set(s1, 'Terminator', 10);
fopen(s1);
adc = 0;
N = 250;
values = zeros(1, N);
for n = 1:N
adc = fscanf(s1);
values(n) = adc;
flushinput(s1);
flushoutput(s1);
end
x = linspace(0, 250);
plot(x, n);
The values(n) = adc assignment does not seem to work and I don't know how to work around it.
This doesn't work because values(n) is a single element, while the output of fscanf(s1) is a whole line of text (a character array with several elements).
Maybe you want to use cells?
values{n} = adc;
Substitute the pre-allocation values = zeros(1, N) with values = cell(1, N);.
Notice that you will then need to make some changes later in your code. I'll leave that up to you.

ARM neon performance issue

Consider the following two pieces of code; the first is the C version:
void __attribute__((no_inline)) proj(uint8_t * line, uint16_t length)
{
uint16_t i;
int16_t tmp;
for(i=HPHD_MARGIN; i<length-HPHD_MARGIN; i++) {
tmp = line[i-3] - 4*line[i-2] + 5*line[i-1] - 5*line[i+1] + 4*line[i+2] - line[i+3];
hphd_temp[i]=ABS(tmp);
}
}
The second is the same function (except for the borders), using NEON intrinsics:
void __attribute__((no_inline)) proj_neon(uint8_t * line, uint16_t length)
{
int i;
uint8x8_t b0b7, b8b15, p1p8,p2p9,p4p11,p5p12,p6p13, m4, m5;
uint16x8_t result;
m4 = vdup_n_u8(4);
m5 = vdup_n_u8(5);
b0b7 = vld1_u8(line);
for(i = 0; i < length - 16; i+=8) {
b8b15 = vld1_u8(line + i + 8);
p1p8 = vext_u8(b0b7,b8b15, 1);
p2p9 = vext_u8(b0b7,b8b15, 2);
p4p11 = vext_u8(b0b7,b8b15, 4);
p5p12 = vext_u8(b0b7,b8b15, 5);
p6p13 = vext_u8(b0b7,b8b15, 6);
result = vsubl_u8(b0b7, p6p13); // p[-3] - p[3]
result = vmlal_u8(result, p2p9, m5); // +5 * p[-1];
result = vmlal_u8(result, p5p12, m4); // +4 * p[2];
result = vmlsl_u8(result, p1p8, m4); //-4 * p[-2];
result = vmlsl_u8(result, p4p11, m5);// -5 * p[1];
vst1q_s16(hphd_temp + i + 3, vabsq_s16(vreinterpretq_s16_u16(result)));
b0b7 = b8b15;
}
/* todo : remaining pixel */
}
I am disappointed by the performance gain: it is only around 10-15%. Looking at the generated assembly:
The C version compiles to a 108-instruction loop.
The NEON version compiles to a 72-instruction loop.
But one iteration of the NEON loop processes 8 times as much data as one iteration of the C loop, so a dramatic improvement should be seen.
Do you have any explanation for the small difference between the two versions?
Additional details :
Test data is a 10 Mpix image; computation time is around 2 seconds for the C version.
CPU: ARM Cortex-A8
I'm going to take a wild guess and say that data caching is the reason you don't see the big performance gain you are expecting. While I don't know whether your chipset supports caching or at what level, if the data spans cache lines, is poorly aligned, or is processed while the CPU is doing other things at the same time (interrupts, threads, etc.), that could also muddy your results.
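If data caching really is the limiting factor, one thing worth trying is a software prefetch ahead of each load. The sketch below only shows where the hint would go; the prefetch distance (64 bytes here) is a guess and must be tuned on the actual Cortex-A8, and the arithmetic elided in the loop body is the same as in proj_neon above:
#include <arm_neon.h>
#include <stdint.h>

extern int16_t hphd_temp[];  /* same output buffer as in the question */

void proj_neon_prefetch(uint8_t *line, uint16_t length)
{
    int i;
    uint8x8_t b0b7 = vld1_u8(line);
    for (i = 0; i < length - 16; i += 8) {
        __builtin_prefetch(line + i + 64);        /* pull a future cache line in early */
        uint8x8_t b8b15 = vld1_u8(line + i + 8);
        /* ... same vext/vmlal/vmlsl arithmetic and vst1q_s16 store as above ... */
        b0b7 = b8b15;
    }
    /* remaining pixels handled as in the original */
}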

ffmpeg (libavcodec) warning: encoded frame too large

I'm trying to use libavcodec (ffmpeg) to encode raw pixel data to mp4 format. Everything goes well and I'm getting an .avi file with decent quality, but sometimes the codec gives an "encoded frame too large" warning. Whenever it does, part of some frames (usually the bottom portion) looks garbled or all mixed up. Can anyone tell me when this warning is given? These are the settings I'm using for the encoder:
qmax = 6;
qmin = 2;
bit_rate = 200000; // if I increase this, I get more warnings.
width = 1360;
height = 768;
time_base.den = 15; // frames per second
time_base.num = 1;
gop_size = 48;
pix_fmt = PIX_FMT_YUV420P;
Regards,
From what I can gather, ffmpeg allocates a constant 2 MB buffer to hold a compressed frame. A 1080p frame is about 3 MB uncompressed, for example, and the codec can't always compress a large frame into less than 2 MB.
You can possibly fix this by increasing the buffer size and/or making it dynamic.
Very probably the codec's buffer is not big enough. Try changing rc_buffer_size. Alternatively, you can try these settings:
ctx->bit_rate = 500000;
ctx->bit_rate_tolerance = 0;
ctx->rc_max_rate = 0;
ctx->rc_buffer_size = 0;
ctx->gop_size = 40;
ctx->max_b_frames = 3;
ctx->b_frame_strategy = 1;
ctx->coder_type = 1;
ctx->me_cmp = 1;
ctx->me_range = 16;
ctx->qmin = 10;
ctx->qmax = 51;
ctx->scenechange_threshold = 40;
ctx->flags |= CODEC_FLAG_LOOP_FILTER;
ctx->me_method = ME_HEX;
ctx->me_subpel_quality = 5;
ctx->i_quant_factor = 0.71;
ctx->qcompress = 0.6;
ctx->max_qdiff = 4;
ctx->directpred = 1;
ctx->flags2 |= CODEC_FLAG2_FASTPSKIP;
In the example code I found something like:
outbuf_size = 100000;
outbuf = malloc(outbuf_size);
[...]
out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture);
Increasing outbuf_size resolved the issue.
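Along the same lines, here is a sketch (using the same legacy API generation as avcodec_encode_video; c and picture are assumed to be the configured AVCodecContext and input frame from the surrounding code) that sizes the output buffer from the uncompressed frame size, so even a poorly compressed frame is very likely to fit:
/* size the packet buffer from the uncompressed YUV420P frame size */
int outbuf_size = avpicture_get_size(PIX_FMT_YUV420P, c->width, c->height);
if (outbuf_size < FF_MIN_BUFFER_SIZE)
    outbuf_size = FF_MIN_BUFFER_SIZE;   /* libavcodec's minimum output buffer size */
uint8_t *outbuf = av_malloc(outbuf_size);
out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture);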
