In my application, we need to display video frames on the screen. I use libvpx to decode video from WebM, but each frame is decoded to a YUV format (VPX_IMG_FMT_I420 according to the documentation). I need the output in RGB, and the documentation says that an image can have an RGB format (VPX_IMG_FMT_RGB24). I have a formula for converting YUV to RGB:
R = Y + 1.13983 * (V - 128);
G = Y - 0.39465 * (U - 128) - 0.58060 * (V - 128);
B = Y + 2.03211 * (U - 128);
But I think VP8->YUV->RGB is too many conversions. Is there a way to set the output frame format for the conversion function?
If you can afford to use Intel's IPP library, here is a CPU-friendly piece of code that you can try in your project:
unsigned char* mpRGBBuffer;
void VPXImageToRGB24(vpx_image_t* pImage, bool isUsingBGR)
{
const unsigned int rgbBufferSize = pImage->d_w * pImage->d_h * 3;
// Allocate mpRGBBuffer (rgbBufferSize bytes) here before converting...
const IppiSize sz = { (int)pImage->d_w, (int)pImage->d_h };
const Ipp8u* src[3] = { pImage->planes[VPX_PLANE_Y], pImage->planes[VPX_PLANE_U], pImage->planes[VPX_PLANE_V] };
int srcStep[3] = { pImage->stride[VPX_PLANE_Y], pImage->stride[VPX_PLANE_U], pImage->stride[VPX_PLANE_V] };
// The destination step is the packed RGB row size in bytes (3 bytes per pixel)
if (isUsingBGR) ippiYCbCr420ToBGR_8u_P3C3R(src, srcStep, mpRGBBuffer, pImage->d_w * 3, sz);
else ippiYCbCr420ToRGB_8u_P3C3R(src, srcStep, mpRGBBuffer, pImage->d_w * 3, sz);
}
If you don't want to use IPP, here is a link to a working piece of code that could be really useful. I tested it and it works, but I am not sure about the CPU cost.
Here is the code from the link above (in case the link fails):
inline int clamp8(int v)
{
return std::min(std::max(v, 0), 255);
}
Image VP8Decoder::convertYV12toRGB(const vpx_image_t* img)
{
Image rgbImg(img->d_w, img->d_h);
std::vector<uint8_t>& data = rgbImg.data;
uint8_t *yPlane = img->planes[VPX_PLANE_Y];
uint8_t *uPlane = img->planes[VPX_PLANE_U];
uint8_t *vPlane = img->planes[VPX_PLANE_V];
int i = 0;
for (unsigned int imgY = 0; imgY < img->d_h; imgY++) {
for (unsigned int imgX = 0; imgX < img->d_w; imgX++) {
int y = yPlane[imgY * img->stride[VPX_PLANE_Y] + imgX];
int u = uPlane[(imgY / 2) * img->stride[VPX_PLANE_U] + (imgX / 2)];
int v = vPlane[(imgY / 2) * img->stride[VPX_PLANE_V] + (imgX / 2)];
int c = y - 16;
int d = (u - 128);
int e = (v - 128);
// TODO: adjust colors ?
int r = clamp8((298 * c + 409 * e + 128) >> 8);
int g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);
int b = clamp8((298 * c + 516 * d + 128) >> 8);
// TODO: cast instead of clamp8
data[i + 0] = static_cast<uint8_t>(r);
data[i + 1] = static_cast<uint8_t>(g);
data[i + 2] = static_cast<uint8_t>(b);
i += 3;
}
}
return rgbImg;
}
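The per-pixel arithmetic in the loop above can be isolated into a small helper, which makes it easier to check against known colors. This is only a minimal C sketch, assuming 8-bit limited-range Y, U and V samples as the coefficients above imply; the names `rgb8` and `yuv_to_rgb_pixel` are made up for illustration and are not part of libvpx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one packed RGB pixel. */
typedef struct { uint8_t r, g, b; } rgb8;

static int clamp8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Converts one YUV sample triple using the same fixed-point
 * coefficients as the loop above (all scaled by 256). */
static rgb8 yuv_to_rgb_pixel(int y, int u, int v)
{
    assert(y >= 0 && y <= 255 && u >= 0 && u <= 255 && v >= 0 && v <= 255);

    int c = y - 16;
    int d = u - 128;
    int e = v - 128;
    rgb8 out;

    /* Dividing by 256 instead of shifting avoids right-shifting negative
     * values; after clamping, the result is the same as with >> 8. */
    out.r = (uint8_t)clamp8((298 * c + 409 * e + 128) / 256);
    out.g = (uint8_t)clamp8((298 * c - 100 * d - 208 * e + 128) / 256);
    out.b = (uint8_t)clamp8((298 * c + 516 * d + 128) / 256);
    return out;
}
```

With these coefficients, the sample (235, 128, 128) maps to white and (16, 128, 128) maps to black, which is a quick sanity check before running the full image loop.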
Screenshot of the rendering system attempting to display "ABCD1234!\n":
(text size is very small, so I had to crop in quite a bit so it's visible)
The characters displayed are not in the font array, so it seems to me like I've indexed something out of bounds somewhere. Here's the relevant code:
// I've left out a bit of initialization code, the rest of the code is in the Github repository
// linked, at bootloader/include/bootloader_tty.h
size_t strlen(const char * _str)
{
size_t i = 0;
while(_str[i++]);
return i - 1;
}
// Prints character 'c' at X, Y
void terminal_putc(char c, unsigned int x, unsigned int y, uint32_t fgcolor)
{
for (unsigned int Y = 0; Y < 8; Y++)
{
for (unsigned int X = 0; X < 8; X++)
{
if ((font[(c * 8) + Y] & (1 << X)))
{
*((uint32_t*)(framebuffer_addr + 4 * pitch * Y + y + 4 * X + x)) = fgcolor;
}
}
}
}
// Writes the string `data` of length `length` to the "terminal"
void terminal_write(const char* data, size_t length)
{
for (size_t i = 0; i < length; i++)
{
char c = data[i];
if (c == '\n')
{
cursorY++;
break;
}
terminal_putc(c, cursorX * 8, cursorY * 8, 0xFFFFFFFF);
cursorX++;
if (cursorX > consoleWidth)
{
cursorX = 0;
cursorY++;
}
}
}
// Writes the string `data` to the "terminal"
void terminal_writestring(const char* data)
{
terminal_write(data, strlen(data));
}
The way I'm calling terminal_writestring() is:
terminal_writestring("ABCD1234!\n");
Github repository: link
The distortion exists for a couple of reasons:
The leftmost pixel for any character resides in bit 7 of the concerned font byte. Your code currently fetches it from bit 0, thereby mirroring the output of the character.
Change if ((font[(c * 8) + Y] & (1 << X))) into either
if ((font[(c * 8) + Y] & (1 << (7 - X))))
or
if ((font[(c * 8) + Y] & (0b10000000 >> X)))
The offset address calculation only scales the character-internal (X,Y) offsets, but forgets to also scale the (x,y) screen coordinates.
Change *((uint32_t*)(framebuffer_addr + 4 * pitch * Y + y + 4 * X + x)) = fgcolor; into
*((uint32_t*)(framebuffer_addr + 4 * pitch * (Y + y) + 4 * (X + x))) = fgcolor;
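Both fixes can be tried off-target by drawing into an ordinary array instead of the real framebuffer. The following is a minimal C sketch, not the code from the repository: `draw_glyph` and `pitch_pixels` are hypothetical names, and it assumes 32 bits per pixel with the pitch expressed in pixels (many framebuffers report it in bytes instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Draws one 8x8 glyph into a 32-bit-per-pixel buffer.
 * fb           : first pixel of the buffer
 * pitch_pixels : length of one buffer row, in pixels (assumption)
 * glyph        : 8 bytes, one per row, bit 7 = leftmost pixel
 * x, y         : top-left corner of the glyph, in pixels */
static void draw_glyph(uint32_t *fb, size_t pitch_pixels,
                       const uint8_t glyph[8],
                       unsigned int x, unsigned int y, uint32_t color)
{
    assert(fb != NULL && glyph != NULL);

    for (unsigned int row = 0; row < 8; row++) {
        for (unsigned int col = 0; col < 8; col++) {
            /* Test bits from the most significant end so the glyph
             * is not mirrored. */
            if (glyph[row] & (0x80u >> col)) {
                /* Screen and glyph offsets are added first and scaled
                 * together by indexing in whole pixels. */
                fb[(size_t)(y + row) * pitch_pixels + (x + col)] = color;
            }
        }
    }
}
```

Indexing a `uint32_t` pointer in whole pixels removes the separate factor of 4, so the screen coordinates cannot be left unscaled by accident.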
I have an array of length n filled with raw 16-bit (int16) PCM data. The data is sampled at 44100 Hz and is stereo, so the samples are interleaved: the first 2 bytes are the left channel, the next 2 bytes the right channel, and so on. I tried to implement a simple low-pass filter by converting the samples to floating point in the range -1 to 1. The low pass works, but rounding errors cause little pops in the sound.
Right now I simply do this:
INT32 left_id = 0;
INT32 right_id = 1;
DOUBLE filtered_l_db = 0.0;
DOUBLE filtered_r_db = 0.0;
DOUBLE last_filtered_left = 0;
DOUBLE last_filtered_right = 0;
DOUBLE l_db = 0.0;
DOUBLE r_db = 0.0;
DOUBLE low_filter = filter_freq(core->audio->low_pass_cut);
for(UINT32 a = 0; a < (buffer_size/2);++a)
{
l_db = ((DOUBLE)input_buffer[left_id]) / (DOUBLE)32768;
r_db = ((DOUBLE)input_buffer[right_id]) / (DOUBLE)32768;
///////////////LOW PASS
filtered_l_db = last_filtered_left +
(low_filter * (l_db -last_filtered_left ));
filtered_r_db = last_filtered_right +
(low_filter * (r_db - last_filtered_right));
last_filtered_left = filtered_l_db;
last_filtered_right = filtered_r_db;
INT16 l = (INT16)(filtered_l_db * (DOUBLE)32768);
INT16 r = (INT16)(filtered_r_db * (DOUBLE)32768);
output_buffer[left_id] = (output_buffer[left_id] + l);
output_buffer[right_id] = (output_buffer[right_id] + r);
left_id +=2;
right_id +=2;
}
PS: the input buffer is an int16 array with PCM data from -32767 to 32767.
I found this function in the question "Low Pass filter in C", and it was the only one that I could understand:
DOUBLE filter_freq(DOUBLE cut_freq)
{
DOUBLE a = 1.0/(cut_freq * 2 * PI);
DOUBLE b = 1.0/SAMPLE_RATE;
return b/(a+b);
}
My aim instead is to have exact precision on the waveform and to apply the low pass directly using only integers, at the cost of losing resolution in the filter (and I'm OK with that). I have seen a lot of examples, but I really didn't understand any of them. Would someone be kind enough to explain how this is done, as simply as possible (in code or pseudocode)? Thank you.
Assuming the result of function filter_freq can be written as a fraction m/n your filter calculation basically is
y_new = y_old + (m/n) * (x - y_old);
which can be transformed to
y_new = ((n * y_old) + m * (x - y_old)) / n;
The integer division / n truncates the result towards 0. If you want rounding instead of truncation you can implement it as
y_tmp = ((n * y_old) + m * (x - y_old));
if(y_tmp < 0) y_tmp -= (n / 2);
else y_tmp += (n / 2);
y_new = y_tmp / n;
In order to avoid losing precision from dividing the result by n in one step and multiplying it by n in the next step you can save the value y_tmp before the division and use it in the next cycle.
y_tmp = (y_tmp + m * (x - y_old));
if(y_tmp < 0) y_new = y_tmp - (n / 2);
else y_new = y_tmp + (n / 2);
y_new /= n;
If your input data is int16_t I suggest to implement the calculation using int32_t to avoid overflows.
I tried to convert the filter in your code without checking other parts for possible problems.
INT32 left_id = 0;
INT32 right_id = 1;
int32_t filtered_l_out = 0; // output value after division
int32_t filtered_r_out = 0;
int32_t filtered_l_tmp = 0; // used to keep the output value before division
int32_t filtered_r_tmp = 0;
int32_t l_in = 0; // input value
int32_t r_in = 0;
DOUBLE low_filter = filter_freq(core->audio->low_pass_cut);
// define denominator and calculate numerator
// use power of 2 to allow bit-shift instead of division
const uint32_t filter_shift = 16U;
const int32_t filter_n = 1U << filter_shift;
int32_t filter_m = (int32_t)(low_filter * filter_n);
for(UINT32 a = 0; a < (buffer_size/2);++a)
{
l_in = input_buffer[left_id];
r_in = input_buffer[right_id];
///////////////LOW PASS
// accumulate the undivided value (n * y) so no precision is lost between samples
filtered_l_tmp = filtered_l_tmp + filter_m * (l_in - filtered_l_out);
// add or subtract n/2 so that the following division rounds instead of truncating
if(filtered_l_tmp < 0) {
filtered_l_out = filtered_l_tmp - filter_n/2;
} else {
filtered_l_out = filtered_l_tmp + filter_n/2;
}
//filtered_l_out /= filter_n;
filtered_l_out >>= filter_shift;
/* same calculation for right */
INT16 l = (INT16)(filtered_l_out);
INT16 r = (INT16)(filtered_r_out);
output_buffer[left_id] = (output_buffer[left_id] + l);
output_buffer[right_id] = (output_buffer[right_id] + r);
left_id +=2;
right_id +=2;
}
As your filter is initialized with 0 it may need several samples to follow a possible step to the first input value. Depending on your data it might be better to initialize the filter based on the first input value.
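For experimenting with one channel in isolation, the scheme described above (keep the undivided value between samples, round when producing the output) can be written as a small self-contained helper. This is a minimal C sketch under a few assumptions of mine: the names `lp_state`, `lp_init` and `lp_step` are hypothetical, and the accumulator is widened to 64 bits, because with a 16-bit shift and full-scale 16-bit samples the product m * (x - y_old) can exceed the int32_t range.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_SHIFT 16
#define FILTER_N ((int64_t)1 << FILTER_SHIFT)   /* denominator n = 2^16 */

/* Hypothetical per-channel state: acc holds n * y (the value before
 * division), out holds the last rounded output sample. */
typedef struct {
    int64_t acc;
    int32_t out;
} lp_state;

/* Starts the filter at a given sample so it does not have to ramp up
 * from zero (pass the first input sample, or 0). */
static void lp_init(lp_state *s, int16_t first)
{
    s->acc = (int64_t)first * FILTER_N;
    s->out = first;
}

/* One filter step. m is the numerator of the coefficient m/n,
 * for example (int32_t)(filter_freq(cut) * FILTER_N). */
static int16_t lp_step(lp_state *s, int32_t m, int16_t x)
{
    assert(m >= 0 && m <= FILTER_N);

    s->acc += (int64_t)m * (x - s->out);

    /* Round to nearest: add or subtract n/2, then divide.
     * Division truncates toward zero for both signs. */
    int64_t rounded = s->acc + (s->acc < 0 ? -(FILTER_N / 2) : FILTER_N / 2);
    s->out = (int32_t)(rounded / FILTER_N);
    return (int16_t)s->out;
}
```

For interleaved stereo data, one `lp_state` per channel would be used, calling `lp_step` on `input_buffer[left_id]` and `input_buffer[right_id]` in turn. A plain division is used rather than a right shift so that negative values round the same way as positive ones.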
I am trying to perform the following calculation using an ATmega328P MCU.
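$$\mathit{position} = \frac{1000 \cdot \mathit{pos}_0 + 2000 \cdot \mathit{pos}_1 + \cdots + 8000 \cdot \mathit{pos}_7}{\mathit{pos}_0 + \mathit{pos}_1 + \cdots + \mathit{pos}_7}$$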
In the main routine (as shown here):
int main(void)
{
//variables
uint16_t raw_values[8];
uint16_t position = 0;
uint16_t positions[8];
char raw[] = " raw";
char space[] = ", ";
char channelString[] = "Channel#: ";
char positionString[] = "Position: ";
//initialize ADC (Analog)
initADC();
//initialize UART
initUART(BAUD, DOUBLE_SPEED);
//give time for ADC to perform & finish 1st conversion
//8us x 25 = 200us
delay_us(200);
while(1)
{
//get the raw values from the ADC for each channel
for(uint8_t channel = 0; channel < 8; channel++)
{
raw_values[channel] = analog(channel);
//invert the raw value
raw_values[channel] = DIVISOR - raw_values[channel];
}
for(uint8_t channel = 0; channel < 8; channel++)
{
//print the channel#
transmitString(channelString);
printDec16bit(channel);
transmitString(space);
//print the raw value from the ADC conversion
printDec16bit(raw_values[channel]);
transmitString(raw);
transmitString(space);
//calculate the position value at each sensor
transmitString(positionString);
positions[channel] = (uint16_t)((POSITION_REF/DIVISOR) * raw_values[channel]);
printDec16bit(positions[channel]);
printCR();
}
printCR();
//calculate and display 'position'
position = calculatePosition(positions);
printDec16bit(position);
printCR();
printCR();
//add a delay
delay_ms(2000);
}
}
I am calling the following function, but the return value I am getting is way off.
uint16_t calculatePosition(uint16_t* channel_positions)
{
uint32_t intermediates[8];
uint32_t temp_sum = 0;
uint16_t divisor = 0;
uint16_t value = 0;
for(uint8_t i = 0; i < 8; i++)
{
intermediates[i] = channel_positions[i] * ((i + 1) * 1000);
}
for(uint8_t j = 0; j < 8; j++)
{
temp_sum = temp_sum + intermediates[j];
}
for(uint8_t k = 0; k < 8; k++)
{
divisor = divisor + channel_positions[k];
}
value = temp_sum/divisor;
return value;
}
Alternatively, I have even tried this code, and get a result that is not what I expect.
uint16_t calculatePosition(uint16_t* channel_positions)
{
uint16_t position;
position = ((1000 * channel_positions[0]) +
(2000 * channel_positions[1]) +
(3000 * channel_positions[2]) +
(4000 * channel_positions[3]) +
(5000 * channel_positions[4]) +
(6000 * channel_positions[5]) +
(7000 * channel_positions[6]) +
(8000 * channel_positions[7])) /
(channel_positions[0] +
channel_positions[1] +
channel_positions[2] +
channel_positions[3] +
channel_positions[4] +
channel_positions[5] +
channel_positions[6] +
channel_positions[7]);
return position;
}
What could I be doing wrong? For an array of values such as {15, 12, 5, 16, 11, 35, 964, 76} I expect a result of 6504, but instead I get a value in the 200's (or some other weird value).
Look at your input array: {15, 12, 5, 16, 11, 35, 964, 76}
Specifically, look at the element that is 964. That element times 7000 is 6748000 which is greater than a uint16_t can handle.
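The wrap-around can be reproduced on a desktop compiler by forcing the product back into 16 bits. This is a small C sketch with hypothetical helper names; it assumes the target performs the multiplication in 16-bit unsigned arithmetic, as happens when `int` is 16 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates a multiplication carried out in 16-bit unsigned arithmetic:
 * the product is reduced modulo 65536. */
static uint16_t mul_wrapped_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* The same multiplication with a 32-bit result, which cannot wrap
 * for 16-bit operands. */
static uint32_t mul_wide_u32(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Self-check for the value discussed above. */
static void check_overflow_example(void)
{
    assert(mul_wide_u32(964, 7000) == 6748000UL);
    /* 6748000 - 102 * 65536 = 63328 */
    assert(mul_wrapped_u16(964, 7000) == 63328U);
}
```

The wrapped value has nothing to do with the true product, which is why the final position looks arbitrary.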
There are a number of solutions. One of them is changing to uint32_t. If this is not an option, you could extract a factor of 1000, like this:
position = 1000 *(
((1 * channel_positions[0]) +
(2 * channel_positions[1]) +
(3 * channel_positions[2]) +
(4 * channel_positions[3]) +
(5 * channel_positions[4]) +
(6 * channel_positions[5]) +
(7 * channel_positions[6]) +
(8 * channel_positions[7])) /
(channel_positions[0] +
channel_positions[1] +
channel_positions[2] +
channel_positions[3] +
channel_positions[4] +
channel_positions[5] +
channel_positions[6] +
channel_positions[7]));
Note that this will not eliminate the problem, but it could possibly reduce it so that the problem never occurs for reasonable input.
Taking the same idea to the loop version, we get:
uint16_t calculatePosition(uint16_t* channel_positions)
{
uint16_t temp_sum = 0;
uint16_t divisor = 0;
for(uint8_t i = 0; i < 8; i++) {
temp_sum += (channel_positions[i] * (i+1));
divisor += channel_positions[i];
}
return 1000*(temp_sum/divisor);
}
Note that you will lose some accuracy in the process, because the integer division truncates before the multiplication by 1000. Since you have been very careful with specifying the widths, I assume you're not willing to change the type of the input array. The following code should give you maximum accuracy with minimal extra memory usage. But if you're running this function often on a machine whose native word is narrower than 32 bits, it can impact performance quite a bit.
uint16_t calculatePosition(uint16_t* channel_positions)
{
// Use 32 bit for these
uint32_t temp_sum = 0;
uint32_t divisor = 0;
for(uint8_t i = 0; i < 8; i++) {
// Copy the value to a 32 bit number
uint32_t temp_pos = channel_positions[i];
temp_sum += temp_pos * (i+1);
divisor += temp_pos;
}
// Moved parenthesis for better accuracy
return (1000*temp_sum) / divisor;
}
Provided that the result can fit in a uint16_t there is absolutely zero chance that this version will fail, because the biggest possible value for 1000*temp_sum is 2,359,260,000 and the biggest value it can hold is 4,294,967,295.
Sidenote about MRE (minimal, reproducible example)
MREs are described here: https://stackoverflow.com/help/minimal-reproducible-example
In this example, a good main function to post in the question would be:
#include <stdint.h>
#include <stdio.h>
int main()
{
uint16_t positions[] = {15, 12, 5, 16, 11, 35, 964, 76};
uint16_t pos = calculatePosition(positions);
printf("%d\n", pos);
}
It's enough to demonstrate the problem you had and no more.
As was already said, the problem is integer overflow.
Be careful when moving the multiplier outside when using integer math! (A * 1000) / B is not equal to (A / B) * 1000.
The simplest solution is to convert the first operand of each operation to a wider type. The others will then be converted implicitly. E.g.
...
position = ((1000UL * channel_positions[0]) +
(2000UL * channel_positions[1]) +
(3000UL * channel_positions[2]) +
(4000UL * channel_positions[3]) +
(5000UL * channel_positions[4]) +
(6000UL * channel_positions[5]) +
(7000UL * channel_positions[6]) +
(8000UL * channel_positions[7])) /
((uint32_t)channel_positions[0] +
channel_positions[1] + // no need to convert, it will be converted implicitly
channel_positions[2] + // since previous operand is wider
channel_positions[3] +
channel_positions[4] +
channel_positions[5] +
channel_positions[6] +
channel_positions[7]);
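The difference between the two orders of operation is easy to see with the array from the question. The following is a minimal C sketch with hypothetical function names, using 32-bit intermediates throughout so that only the order of multiplication and division differs.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of pos[i] * (i + 1), accumulated in 32 bits. */
static uint32_t weighted_sum(const uint16_t pos[8])
{
    uint32_t sum = 0;
    for (uint8_t i = 0; i < 8; i++) {
        sum += (uint32_t)pos[i] * (uint32_t)(i + 1);
    }
    return sum;
}

/* Plain sum of the eight values, accumulated in 32 bits. */
static uint32_t plain_sum(const uint16_t pos[8])
{
    uint32_t sum = 0;
    for (uint8_t i = 0; i < 8; i++) {
        sum += pos[i];
    }
    return sum;
}

/* (A / B) * 1000: the fraction is discarded before scaling. */
static uint16_t position_divide_first(const uint16_t pos[8])
{
    uint32_t divisor = plain_sum(pos);
    assert(divisor != 0);
    return (uint16_t)(1000UL * (weighted_sum(pos) / divisor));
}

/* (A * 1000) / B: scaling happens first, so the fraction survives. */
static uint16_t position_multiply_first(const uint16_t pos[8])
{
    uint32_t divisor = plain_sum(pos);
    assert(divisor != 0);
    return (uint16_t)((1000UL * weighted_sum(pos)) / divisor);
}
```

For {15, 12, 5, 16, 11, 35, 964, 76} the weighted sum is 7739 and the plain sum is 1134, so dividing first gives 6000 while multiplying first gives 6824.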
I have written a block matching algorithm in C++ using OpenCV for my thesis.
It works on grayscale pictures and addresses the IplImage by absolute pixel address.
I have to divide the IplImage into blocks of the same size (8x8 pixels). In order to access the pixel values within the blocks, I compute the pixel address and access the pixel value in this way:
for (int yBlock = 0; yBlock < maxYBlocks; yBlock++){
for (int xBlock = 0; xBlock < maxXBlocks; xBlock++){
for (int yPixel = 0; yPixel < 8; yPixel++){
for (int xPixel = 0; xPixel < 8; xPixel++){
pixelAdress = yBlock*imageWidth*8 + xBlock*8 + yPixel*imageWidth + xPixel;
unsigned char* imagePointer = (unsigned char*)(img->imageData);
pixelValue = imagePointer[pixelAdress];
}
}
}
}
I do NOT really iterate over rows and columns, and it works great!
Now I have a color IplImage (not grayscale) and don't know how to access the r, g, b pixel values.
I found this on this forum:
for( row = 0; row < img->height; row++ ){
for ( col = 0; col < img->width; col++ ){
b = (int)img->imageData[img->widthStep * row + col * 3];
g = (int)img->imageData[img->widthStep * row + col * 3 + 1];
r = (int)img->imageData[img->widthStep * row + col * 3 + 2];
}
}
but I'm not sure how to use it with my computed pixelAdress. Is it correct to just multiply it by 3 (because I do not iterate over rows) and then add 0, 1 or 2? For example:
pixelValueR = imagePointer[pixelAdress*3 + 2];
pixelValueG = imagePointer[pixelAdress*3 + 1];
pixelValueB = imagePointer[pixelAdress*3 + 0];
or do I have to use widthStep where I used imageWidth before, like this:
pixelAdressR = pixelAdress = yBlock*img->widthStep*8 + xBlock*8*3 + yPixel*img->widthStep + xPixel*3 + 2;
pixelAdressG = pixelAdress = yBlock*img->widthStep*8 + xBlock*8*3 + yPixel*img->widthStep + xPixel*3 + 1;
pixelAdressB = pixelAdress = yBlock*img->widthStep*8 + xBlock*8*3 + yPixel*img->widthStep + xPixel*3;
and so access
pixelValueR = imagePointer[pixelAdressR];
pixelValueG = imagePointer[pixelAdressG];
pixelValueB = imagePointer[pixelAdressB];
In the case of a multi-channel Mat (BGR in this example), you can access a single pixel as described here:
Vec3b intensity = img.at<Vec3b>(y, x);
uchar blue = intensity.val[0];
uchar green = intensity.val[1];
uchar red = intensity.val[2];
I'm not sure about your whole algorithm and can't test it at the moment, but for IplImages the memory is laid out like this:
1. row
baseadress + 0 = b of [0]
baseadress + 1 = g of [0]
baseadress + 2 = r of [0]
baseadress + 3 = b of [1]
etc
2. row
baseadress + widthStep + 0 = b
baseadress + widthStep + 1 = g
baseadress + widthStep + 2 = r
So if you have n*m blocks of size 8x8 of unsigned char BGR data and you want to loop over the pixels [x_in_block, y_in_block] in block [bx, by], you can do it like this:
baseadress + (by*8 + y_in_block)*widthStep + (bx*8 + x_in_block)*3 + 0 = b
baseadress + (by*8 + y_in_block)*widthStep + (bx*8 + x_in_block)*3 + 1 = g
baseadress + (by*8 + y_in_block)*widthStep + (bx*8 + x_in_block)*3 + 2 = r
since row by*8 + y_in_block starts at address baseadress + (by*8 + y_in_block)*widthStep
and column bx*8 + x_in_block is at address offset (bx*8 + x_in_block)*3 within that row.
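The offset formula above can be wrapped in a helper so that the block loops only deal with block and in-block coordinates. This is a minimal C sketch with a hypothetical function name; it assumes 8x8 blocks and 3 interleaved 8-bit channels in B, G, R order, and it does not depend on OpenCV itself.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset (relative to imageData) of one channel of one pixel.
 * width_step : bytes per image row (IplImage::widthStep)
 * bx, by     : block coordinates
 * x_in_block, y_in_block : pixel coordinates inside the block (0..7)
 * channel    : 0 = blue, 1 = green, 2 = red */
static size_t bgr_block_offset(size_t width_step,
                               size_t bx, size_t by,
                               size_t x_in_block, size_t y_in_block,
                               size_t channel)
{
    assert(x_in_block < 8 && y_in_block < 8 && channel < 3);

    size_t row = by * 8 + y_in_block;   /* absolute image row */
    size_t col = bx * 8 + x_in_block;   /* absolute image column */
    return row * width_step + col * 3 + channel;
}
```

Inside the four nested loops, the red value would then be read as `imagePointer[bgr_block_offset(img->widthStep, xBlock, yBlock, xPixel, yPixel, 2)]`. Using `widthStep` rather than `width * 3` matters because rows may be padded.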
For Mat (e.g. Mat img)
Grayscale (8UC1):
uchar intensity = img.at<uchar>(y, x);
Color image (BGR color ordering, the default format returned by imread):
Vec3b intensity = img.at<Vec3b>(y, x);
uchar blue = intensity.val[0];
uchar green = intensity.val[1];
uchar red = intensity.val[2];
For IplImage (e.g. IplImage* img)
Grayscale:
uchar intensity = CV_IMAGE_ELEM(img, uchar, h, w);
Color image:
uchar blue = CV_IMAGE_ELEM(img, uchar, y, x*3);
uchar green = CV_IMAGE_ELEM(img, uchar, y, x*3+1);
uchar red = CV_IMAGE_ELEM(img, uchar, y, x*3+2);
I have images in bitmap and JPEG format. I need to retrieve the pixels from each image so that the RGB values of all pixels are obtained. Please suggest a method for retrieving RGB values from an image file. I would appreciate it if there are any functions available in C.
You can decode a JPEG into a bitmap using libjpeg; it is pretty simple.
Suppose you have an RGB bitmap in 'rgb'. The result will be placed in the 'yuv420p' vector.
void rgb2yuv420p(std::vector<BYTE>& rgb, std::vector<BYTE>& yuv420p)
{
unsigned int i = 0;
unsigned int numpixels = width * height;
unsigned int ui = numpixels;
unsigned int vi = numpixels + numpixels / 4;
unsigned int s = 0;
#define sR (BYTE)(rgb[s+2])
#define sG (BYTE)(rgb[s+1])
#define sB (BYTE)(rgb[s+0])
yuv420p.resize(numpixels * 3 / 2);
for (int j = 0; j < height; j++)
for (int k = 0; k < width; k++)
{
yuv420p[i] = (BYTE)( (66*sR + 129*sG + 25*sB + 128) >> 8) + 16;
if (0 == j%2 && 0 == k%2)
{
yuv420p[ui++] = (BYTE)( (-38*sR - 74*sG + 112*sB + 128) >> 8) + 128;
yuv420p[vi++] = (BYTE)( (112*sR - 94*sG - 18*sB + 128) >> 8) + 128;
}
i++;
s += colors;
}
}
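The per-pixel formulas in the function above can be checked on their own with a few known colors. This is a minimal C sketch with hypothetical names (`yuv8`, `rgb_to_yuv_pixel`); it uses the same coefficients, but adds the 128 offset before the shift so that no negative value is ever right-shifted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one Y, U, V sample triple. */
typedef struct { uint8_t y, u, v; } yuv8;

/* Converts one RGB pixel with the same fixed-point coefficients as
 * the function above (all scaled by 256). */
static yuv8 rgb_to_yuv_pixel(int32_t r, int32_t g, int32_t b)
{
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);

    yuv8 out;
    out.y = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);

    /* (128 << 8) is the +128 offset moved inside the shift; it keeps
     * the shifted value non-negative and gives the same result as an
     * arithmetic shift followed by + 128. */
    out.u = (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + (128 << 8)) >> 8);
    out.v = (uint8_t)((112 * r - 94 * g - 18 * b + 128 + (128 << 8)) >> 8);
    return out;
}
```

White (255, 255, 255) comes out as (235, 128, 128) and black as (16, 128, 128), the usual limited-range extremes.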
If you want to do this yourself, here's the Wikipedia article that I worked from when I did this at work, about a year back:
http://en.wikipedia.org/wiki/YUV
This is pretty good too:
http://www.fourcc.org/fccyvrgb.php
But MUCH easier is jpeglib - that wasn't an option in my case, because the data wasn't jpeg in the first place.