I am trying to use the V4L2 API to capture images and put them into an OpenCV Mat. The problem is that my webcam only captures in YUYV (YUY2), so I need to convert to RGB24 first. Here is the complete V4L2 code that I am using.
I was able to get objects in the picture to be recognizable, but the image is all pink and green, and it is stretched horizontally and distorted. I have tried many different conversion formulas and always get the same basic pink/green distorted image. The formula used for this picture is from http://paulbourke.net/dataformats/yuv/. I am using the Shotwell photo viewer on Linux to view the .raw image; I couldn't get GIMP to open it. I am not that knowledgeable about how to save image formats. I assume there has to be some kind of header, but the Shotwell photo viewer seemed to work. Could this possibly be the reason for the incorrect image?
I am not sure whether V4L2 is returning a signed or unsigned byte image in the buffer pointed to by p. But if this were the problem, wouldn't my image just be off-color? The geometry seems to be distorted too. I believe I took care of the casting to and from floating point properly.
Could someone help me understand:
1) how to find out the underlying type contained in the void *p variable
2) the proper formula for converting from YUYV to RGB24, including explanations of which types to use
3) whether saving the image with no format (headers) and viewing it with Shotwell could be the problem
4) whether there is an easy way to save an RGB24 image properly
5) general debugging tips
Thanks
static unsigned char *bgr_image;
static void process_image(void *p, int size)
{
frame_number++;
char filename[15];
sprintf(filename, "frame-%d.raw", frame_number);
FILE *fp=fopen(filename,"wb");
int i;
float y1, y2, u, v;
char * bgr_p = bgr_image;
unsigned char * p_tmp = (unsigned char *) p;
for (i=0; i < size; i+=4) {
y1 = p_tmp[i];
u = p_tmp[i+1];
y2 = p_tmp[i+2];
v = p_tmp[i+3];
bgr_p[0] = (y1 + 1.371*(u - 128.0));
bgr_p[1] = (y1 - 0.698*(u - 128.0) - 0.336*(v - 128.0));
bgr_p[2] = (y1 + 1.732*(v - 128.0));
bgr_p[3] = (y2 + 1.371*(v - 128.0));
bgr_p[4] = (y2 - 0.698*(v - 128.0) - 0.336*(u - 128.0));
bgr_p[5] = (y2 + 1.732*(u - 128.0));
bgr_p+=6;
}
fwrite(bgr_image, size, 1, fp);
fflush(fp);
fclose(fp);
}
First, you must understand which type of YUV422 you are working with.
PIX_FMT_YUYV422, ///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr
PIX_FMT_UYVY422, ///< packed YUV 4:2:2, 16bpp, Cb Y0 Cr Y1
Try swapping y1, u, y2, and v accordingly. You may not be dealing with YUV422 at all, though: the picture could be in a planar format rather than the packed format you are expecting.
I think it's better for you to download IrfanView, which can open raw YUV files, and try different settings until the image decodes correctly. That will tell you what type of data you are dealing with.
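For the packed Y0 Cb Y1 Cr ordering listed above, a conversion might look like the sketch below. It assumes the buffer really is YUYV with unsigned bytes, reuses the coefficients from the question, and writes blue, green, red in that order. The helper names clamp_u8 and yuyv_to_bgr24 are made up for illustration. Every 4 input bytes yield 6 output bytes, so the output buffer, and the number of bytes written to disk, must be 1.5 times the input size.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp a float to the 0..255 range of an unsigned byte. */
static unsigned char clamp_u8(float x)
{
    if (x < 0.0f) return 0;
    if (x > 255.0f) return 255;
    return (unsigned char)(x + 0.5f);
}

/* Convert packed YUYV (Y0 U Y1 V) to BGR24.
 * src_size is in bytes and must be a multiple of 4;
 * dst must hold src_size / 4 * 6 bytes. Returns the bytes written. */
static size_t yuyv_to_bgr24(const unsigned char *src, size_t src_size,
                            unsigned char *dst)
{
    size_t i, out = 0;

    assert(src_size % 4 == 0);
    for (i = 0; i < src_size; i += 4) {
        float y0 = src[i];
        float u  = src[i + 1] - 128.0f;
        float y1 = src[i + 2];
        float v  = src[i + 3] - 128.0f;

        /* Both pixels of the pair share the same U and V. */
        dst[out++] = clamp_u8(y0 + 1.732f * u);               /* B */
        dst[out++] = clamp_u8(y0 - 0.698f * v - 0.336f * u);  /* G */
        dst[out++] = clamp_u8(y0 + 1.371f * v);               /* R */
        dst[out++] = clamp_u8(y1 + 1.732f * u);               /* B */
        dst[out++] = clamp_u8(y1 - 0.698f * v - 0.336f * u);  /* G */
        dst[out++] = clamp_u8(y1 + 1.371f * v);               /* R */
    }
    return out;
}
```

The clamping matters: without it, values below 0 or above 255 wrap around when stored in a byte, which shows up as speckled false colors.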
Do not try to reinvent the wheel. Lots of people have written colorspace converters, and chances are high that your implementation (even if it works) is not the optimal one (e.g. it may be slower than necessary).
The canonical way to deal with V4L2 devices of any colorspace is to use the libv4l library, which will transparently convert the camera's native colorspace to one of BGR24, RGB24 and YUV420 (if you want that, which I think you do).
As for saving the image, again use what is already there. Personally, I would use ImageMagick to save a frame in a "proper" format that can be read by any image viewer (PNG or TIFF, if quality matters).
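If pulling in ImageMagick is more than you want for a quick debugging dump, a binary PPM (P6) file is one low-effort option: it is a short text header followed by raw RGB bytes, and GIMP and ImageMagick can read it. A minimal sketch, assuming the buffer is tightly packed RGB24 in R, G, B order (swap the channels first if you have BGR). write_ppm is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write a tightly packed RGB24 buffer as a binary PPM (P6) file.
 * Returns 0 on success, -1 on failure. */
static int write_ppm(const char *path, const unsigned char *rgb,
                     int width, int height)
{
    size_t count;
    FILE *fp;
    int ok;

    assert(width > 0 && height > 0);
    count = (size_t)width * (size_t)height * 3;

    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* Header: magic number, dimensions, maximum channel value. */
    ok = fprintf(fp, "P6\n%d %d\n255\n", width, height) > 0;
    ok = ok && fwrite(rgb, 1, count, fp) == count;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Because the header carries the width and height, a viewer no longer has to guess the geometry, which removes one possible source of a stretched picture.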
I am trying to crop an image captured by an ESP32-CAM. The image is in JPG format. As the image is stored as a single-dimensional array, I tried to rearrange the elements in the array, but nothing changed.
I have managed to crop the image in RGB565, but I am struggling to understand the single-dimensional array (image buffer).
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_RGB565;
config.frame_size = FRAMESIZE_SVGA;
// config.jpeg_quality = 10;
config.fb_count = 2;
esp_err_t result = esp_camera_init(&config);
if (result != ESP_OK) {
return false;
}
camera_fb_t * fb = NULL;
fb = esp_camera_fb_get();
if (!fb)
{
Serial.println("Camera capture failed");
}
The fb buffer is a single-dimensional array, and I want to extract each individual RGB value.
JPG is a compressed format, meaning that the bytes in the buffer do not correspond to rows and columns of pixels as you would see them on a 1:1 grid on the screen. You need to convert it to plain RGB (or an equivalent uncompressed format) and then copy the region you want.
JPG achieves compression by splitting the image into YCbCr components, applying a mathematical transformation and then filtering. For additional information, I refer to this page.
Luckily you can follow this tutorial to do the inverse JPEG transformation on an Arduino (tip: forget about doing this in real time, unless your time constraints are very relaxed).
The idea is to use a library that converts the JPEG image into an array of data:
Using the library is fairly simple: we give it the JPEG file, and the library will start generating arrays of pixels – so called Minimum Coded Units, or MCUs for short. The MCU is a block of 16 by 8 pixels. The functions in the library will return the color value for each pixel as 16-bit color value. The upper 5 bits are the red value, the middle 6 are green and the lower 5 are blue. Now we can send these values by any sort of communication channel we like.
For your use case you won't send the data through the communication channel, but rather store it in a local array by pushing the blocks into adjacent tiles, then do the crop.
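The 5-6-5 bit layout described in the quote can be unpacked with shifts and masks. A small sketch, assuming the 16-bit value is already in host byte order (some camera drivers deliver the two bytes swapped, in which case swap them first). The function name is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 16-bit RGB565 pixel into 8-bit red, green and blue. */
static void rgb565_to_rgb888(uint16_t pxl, uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t r5 = (pxl >> 11) & 0x1F;  /* upper 5 bits  */
    uint8_t g6 = (pxl >> 5) & 0x3F;   /* middle 6 bits */
    uint8_t b5 = pxl & 0x1F;          /* lower 5 bits  */

    assert(r && g && b);
    /* Expand to 8 bits by replicating the high bits into the low bits,
     * so that full scale maps to 255 rather than 248 or 252. */
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}
```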
That depends on what kind of hardware (camera and board) you are using.
I'm basing this on the OV2640 camera module because it's the one I've been working with. It delivers the image to the frame buffer already encoded, so I'm guessing this might be what you are facing.
Trying to crop the image after it has been encoded can be tricky, but you might be able to instruct the camera chip to only deliver a certain part of the sensor output in the first place using a window function.
The easiest way to use this setting is to wrap it in a function:
void setWindow(int resolution , int xOffset, int yOffset, int xLength, int yLength) {
sensor_t * s = esp_camera_sensor_get();
resolution = 0;
s->set_res_raw(s, resolution, 0, 0, 0, xOffset, yOffset, xLength, yLength, xLength, yLength, true, true);
}
/*
* resolution = 0 \\ 1600 x 1200
* resolution = 1 \\ 800 x 600
* resolution = 2 \\ 400 x 296
*/
where (xOffset,yOffset) is the origin of the window in pixels and (xLength,yLength) is the size of the window in pixels. Be aware that changing the resolution will effectively overwrite these settings. Otherwise this works great for me, although for some reason only if the aspect ratio of 4:3 is preserved in the window size.
Looking at the output format table for the ESP32 Camera Driver, one can see that most output formats are non-JPEG. If you can handle a raw format instead (it will be slower to save/transfer, and MUCH larger), you can crop the image much more easily by making a copy with a couple of loops. JPEG is compressed and not easily cropped. The page linked also mentions this:
Using YUV or RGB puts a lot of strain on the chip because writing to PSRAM is not particularly fast. The result is that image data might be missing. This is particularly true if WiFi is enabled. If you need RGB data, it is recommended that JPEG is captured and then turned into RGB using fmt2rgb888 or fmt2bmp/frame2bmp
If you are using PIXFORMAT_RGB565 (which means each pixel value will be kept in TWO bytes, and the image is not jpeg compressed) and FRAMESIZE_SVGA (800x600 pixels), you should be able to access the framebuffer as a two-dimensional array if you want:
uint16_t *buffer = (uint16_t *)fb->buf; // fb->buf is a uint8_t *, so a cast is needed
uint16_t pxl = buffer[row * 800 + column]; // 800 is the SVGA width
// pxl now contains 5 R-bits, 6 G-bits, 5 B-bits
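Cropping an uncompressed frame then comes down to copying a sub-rectangle row by row. A sketch under the assumption that the buffer is row-major with no padding between rows and two bytes per pixel. crop_rgb565 and its parameters are hypothetical names, not part of the camera driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a crop_w x crop_h window whose top-left corner is (x0, y0)
 * out of a src_w x src_h image into dst (crop_w * crop_h pixels). */
static void crop_rgb565(const uint16_t *src, int src_w, int src_h,
                        int x0, int y0, int crop_w, int crop_h,
                        uint16_t *dst)
{
    int row;

    assert(x0 >= 0 && y0 >= 0 && crop_w > 0 && crop_h > 0);
    assert(x0 + crop_w <= src_w && y0 + crop_h <= src_h);

    for (row = 0; row < crop_h; row++) {
        /* Pixel (x, y) lives at index y * src_w + x. */
        const uint16_t *from = src + (size_t)(y0 + row) * src_w + x0;
        memcpy(dst + (size_t)row * crop_w, from,
               (size_t)crop_w * sizeof *src);
    }
}
```

For an SVGA frame the source could be passed as (const uint16_t *)fb->buf with src_w = 800 and src_h = 600. Since the copy moves whole pixels, the byte order inside each pixel does not matter here.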
Even though a question of this nature sounds very similar, I am having problems in converting a jpg image to yuv in C (without using opencv).
This is how I currently understand the steps to solve the problem:
1) Identify the structure of the jpg and yuv file formats, i.e. what each byte in the file actually contains. This is what I think the jpg format looks like.
2) With the above structure, I tried to read a jpg file and decipher its 18th and 19th bytes. I cast them to both char and int, but I don't get any meaningful values for the width and height of the image.
3) Once I have read these values, I should be able to convert from jpg to yuv. I was looking at this resource.
4) Construct the yuv image appropriately and write it to a (.yuv) file.
Kindly help me by pointing me to appropriate resources. I will keep updating my progress on this post. Thanks in advance.
Usually the image is already stored in YUV (or, to be more precise: YCbCr).
When reading the file, the jpeg reader usually converts YUV to RGB. Converting back will reduce quality somewhat.
With libjpeg-turbo (http://libjpeg-turbo.virtualgl.org/) you can read the jpeg without color conversion. Check https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/turbojpeg.h: it has the tjDecompressToYUV function, which gives you the three color planes (Y, U and V) without converting to RGB. The related tjDecompressToYUVPlanes writes them to three separate output buffers.
Not sure what you have against OpenCV; maybe ImageMagick is acceptable to you? It is installed on most Linux distros and is available for OSX and Windows. It has C bindings, and also a command-line version, which I am showing here. You can create a test image like this:
# Create test image
convert -size 100x100 \
\( xc:red xc:lime xc:blue +append \) \
\( xc:cyan xc:magenta xc:yellow +append \) \
-append image.jpg
Now convert to YUV and write to 3 separate files:
convert image.jpg -colorspace yuv -separate bands.jpg
bands-0.jpg (Y)
bands-1.jpg (U)
bands-2.jpg (V)
Or, closer to what you ask, write all three bands YUV into a binary file:
convert image.jpg -colorspace yuv rgb:yuv.bin
Based on https://en.wikipedia.org/wiki/YUV#Y.27UV444_to_RGB888_conversion
Decoding a JPEG in pure C without libraries is possible; the following code is fairly straightforward:
https://bitbucket.org/Halicery/firerainbow-progressive-jpeg-decoder/src
Assuming you have the jpeg decoded to RGB using the above or a library (using a library is likely easier), the conversion looks like this:
int width = (width of the image);
int height = (height of the image);
unsigned char *mydata = (pointer to rgb pixels);
unsigned char *cursor;
size_t byte_count = (length of the pixels, i.e. width x height x 3);
size_t n;
for (cursor = mydata, n = 0; n < byte_count; cursor += 3, n += 3)
{
    int red = cursor[0], green = cursor[1], blue = cursor[2];
    int y = 0.299 * red + 0.587 * green + 0.114 * blue;
    /* u and v are centred on zero, so bias them by 128 to fit an unsigned byte */
    int u = -0.147 * red - 0.289 * green + 0.436 * blue + 128;
    int v = 0.615 * red - 0.515 * green - 0.100 * blue + 128;
    /* with these coefficients v can still leave 0..255, so clamp both */
    cursor[0] = y;
    cursor[1] = u < 0 ? 0 : (u > 255 ? 255 : u);
    cursor[2] = v < 0 ? 0 : (v > 255 ? 255 : v);
}
// At this point, the entire image has been converted to yuv ...
And write that to file ...
FILE* fout = fopen ("myfile.yuv", "wb");
if (fout) {
fwrite (mydata, 1, byte_count, fout);
fclose (fout);
}
I have the following setup https://sketchfab.com/show/7e2912f5f8794a7b96ef3ac5930e090a (It's a 3d viewer, use your mouse to view all angles)
The box has two non-directional electret microphones (black dots). On the ground there are some elements falling down, like water or similar (symbolized by the sphere), creating noise. On top, someone is speaking into the box. Distances are roughly accurate, so the mouth is pretty close.
Inside the box there are two different amplifiers (but the same electret microphones) with two different amplification circuits (the mouth one is louder in general and has some normalization circuitry integrated). Long story short, I can record this into a raw audio file at 44100 Hz, 16-bit, stereo, where the left channel is the upper and the right channel the lower microphone amplifier output.
The goal is to subtract the lower microphone (facing the ground) from the upper microphone (facing the speaker) to get noise cancellation, even though the electret microphones are not directional and the amplifiers differ.
This is what I tried (with Datei being the raw filename). It includes a high- or low-pass filter and a routine to put the final result back into a raw mono file (%s.neu.raw).
The problem is a distortion that is hard to pin down. I can hear my voice, but it's not bearable at all. If you need a sample I can upload one.
EDIT: New code.
static void *substractf( char *Datei)
{
char ergebnis[80];
sprintf(ergebnis,"%s.neu.raw",Datei);
FILE* ausgabe = fopen(ergebnis, "wb");
FILE* f = fopen(Datei, "rb");
if (f == NULL)
return;
double g = 0.1;
double RC = 1.0/(1215*2*3.14);
double dt = 1.0/44100;
double alpha = dt/(RC+dt);
double noise_gain = 18.0;
double voice_gain = 1.0;
struct {
uint8_t noise_lsb;
int8_t noise_msb;
uint8_t voice_lsb;
int8_t voice_msb;
} sample;
while (fread(&sample, sizeof sample, 1, f) == 1)
{
int16_t noise_source = sample.noise_msb * 256 + sample.noise_lsb;
int16_t voice_source = sample.voice_msb * 256 + sample.voice_lsb;
double signal, difference_voice_noise;
difference_voice_noise = voice_gain*voice_source - noise_gain*noise_source;
signal = (1.0 - alpha)*signal + alpha*difference_voice_noise;
putc((char) ( (signed)signal & 0xff),ausgabe);
putc((char) (((signed)signal >> 8) & 0xff),ausgabe);
}
fclose(f);
fclose(ausgabe);
char output[300];
sprintf(output,"rm -frv \"%s\"",Datei);
system(output);
}
Your code doesn't take differences of path length into consideration.
The path difference d2 – d1 between the sound source and the two mics corresponds to a time delay of (d2 – d1) / v, where v is the speed of sound (330 m/s).
Suppose d2 – d1 is equal to 10 cm. In this case, any sound wave whose frequency is a multiple of 3300 Hz (i.e., for which the delay of 0.10/330 seconds is a whole number of periods) will be at exactly the same phase at both microphones. This is how you want things to be at all frequencies.
However, a sound wave at an odd multiple of half that frequency (1650 Hz, 4950 Hz, 8250 Hz, etc.) will have changed in phase by 180° by the time it reaches the second mic. As a result, your subtraction operation will actually have the opposite effect — you'll be boosting these frequencies instead of making them quieter.
The end result will be similar to what you get if you push all the alternate sliders on a graphic equaliser in opposite directions. This is probably what you're experiencing now.
Try estimating the length of this path difference and delaying the samples in one channel by a corresponding amount. At a sampling rate of 44100 Hz, one sample corresponds to about 0.75 cm of path difference (so one centimetre is about 1.3 samples). If the sound source is moving around, then things get a bit complicated. You'll have to find a way of estimating the path difference dynamically from the audio signals themselves.
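The delay compensation suggested above might be sketched like this. It rounds nothing by itself but takes a whole-sample delay, which is a simplification (a fractional delay would need interpolation). The function names are made up for illustration, and which channel has to be delayed depends on which microphone is farther from the sound source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Path difference in metres -> delay in (fractional) samples. */
static double delay_in_samples(double path_diff_m, double sample_rate_hz,
                               double speed_of_sound_m_s)
{
    assert(speed_of_sound_m_s > 0.0);
    return path_diff_m / speed_of_sound_m_s * sample_rate_hz;
}

/* out[i] = a[i] - gain * b[i - delay]; samples of b before the start
 * are treated as silence. The result is saturated to the int16 range. */
static void subtract_delayed(const int16_t *a, const int16_t *b, size_t n,
                             size_t delay, double gain, int16_t *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        double bd = (i >= delay) ? (double)b[i - delay] : 0.0;
        double d = a[i] - gain * bd;

        if (d > 32767.0) d = 32767.0;
        if (d < -32768.0) d = -32768.0;
        out[i] = (int16_t)d;
    }
}
```

With the 10 cm example, delay_in_samples(0.10, 44100.0, 330.0) comes out at a little over 13 samples, so a delay of 13 would be a reasonable first guess to try.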
Ideas too big for a comment.
1) Looks like OP is filtering the l signal jetzt = vorher + (alpha*(l - vorher)) and then subtracting the r with dif = r - g*jetzt. It seems to make more sense to subtract l and r first and apply that difference to the filter.
float signal = 0.0; (outside loop)
...
float dif;
// Differential (with gain adjustments)
dif = gain_l*l - gain_r*r;
// Low pass filter (I may have this backwards)
signal = (1.0 - alpha)*signal + alpha*dif;
// I am not certain if diff or signal should be written
// but testing limit would be useful.
if ((dif > 32767) || (dif < -32767)) report();
int16_t sig = dif;
// I see no reason for the following test
// if (dif != 0)
putc((char) ( (unsigned)dif & 0xff),ausgabe);
putc((char) (((unsigned)dif >> 8) & 0xff),ausgabe);
2) The byte splicing may be off. Suggested simplification
// This assumes incoming data is little endian,
// Maybe data is in big endian and _that_ is OP problem?
struct {
uint8_t l_lsb;
int8_t l_msb;
uint8_t r_lsb;
int8_t r_msb;
} sample;
...
while (fread(&sample, sizeof sample, 1, f) == 1) {
int16_t left = sample.l_msb * 256 + sample.l_lsb;
int16_t right = sample.r_msb * 256 + sample.r_lsb;
3) Use of float vs. double. Usually the more limited float creates computational noise, but the magnitude of OP's complaint suggests that this is unlikely to be the problem. Still worth considering.
4) The endianness of the 16-bit samples may be backwards. Further, depending on the A/D encoding, the samples may be 16-bit unsigned rather than 16-bit signed.
5) The 2 signals could be 180° out of phase with each other due to wiring and mic pick-up. If so, try dif = gain_l*l + gain_r*r.
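Putting points 1, 2 and 4 together, the per-frame processing could be sketched as below. It assumes little-endian signed 16-bit stereo frames, keeps the filter state in a struct outside the loop so that it persists from sample to sample, and saturates the result. The struct and function names, and which channel counts as "voice", are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian signed 16-bit sample from two bytes. */
static int16_t le16(const uint8_t *p)
{
    int v = p[0] | (p[1] << 8);
    if (v >= 32768)
        v -= 65536;
    return (int16_t)v;
}

/* One-pole low-pass filter state; y must start at 0. */
struct lowpass {
    double alpha;
    double y;
};

/* Turn n interleaved stereo frames (4 bytes each) into n mono samples:
 * weighted difference first, then the low-pass filter. */
static void difference_lowpass(const uint8_t *frames, size_t n,
                               double voice_gain, double noise_gain,
                               struct lowpass *lp, int16_t *out)
{
    size_t i;

    assert(lp->alpha > 0.0 && lp->alpha <= 1.0);
    for (i = 0; i < n; i++) {
        int16_t voice = le16(frames + 4 * i);      /* first channel  */
        int16_t noise = le16(frames + 4 * i + 2);  /* second channel */
        double dif = voice_gain * voice - noise_gain * noise;
        double s;

        /* y[k] = (1 - alpha) * y[k-1] + alpha * dif */
        lp->y = (1.0 - lp->alpha) * lp->y + lp->alpha * dif;

        /* Saturate instead of letting the value wrap around. */
        s = lp->y;
        if (s > 32767.0) s = 32767.0;
        if (s < -32768.0) s = -32768.0;
        out[i] = (int16_t)s;
    }
}
```

Wrap-around on overflow sounds like harsh crackling, so checking how often the saturation branch is hit is a cheap way to see whether the gains are too high.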
I've been battling with an issue when playing certain sources of uncompressed YUV 4:2:0 planar video data with SDL_Overlay (SDL 1.2.5).
I have no problems playing, say, 640x480 video. But I have just attempted playing a video with the resolution 854x480, and I get a strange effect. The line wraps 1-2 pixels too late (causing a shear-like transformation) and the chroma disappears, to be replaced with alternating R, G or B on each line. See this screenshot
The YUV data itself is correct, as I can save it to a file and play it in another player. It is not padded at this point - the pitch matches the line length.
My suspicion is that some issue occurs when the resolution is not a multiple of 4. Perhaps SDL_Surface expects an SDL_Overlay to have a chroma resolution as a multiple of 2?
Adding to my suspicion, I note that the RGB SDL_Surface that I create at a size of 854*480 has a pitch of 2564, not the 3*854 = 2562 I would expect.
If I add 1 or 2 pixels to the width of the SDL_Surface (but keep the overlay and rectangle the same), it works fine, albeit with a black border to the right. Of course this then breaks videos whose width is a multiple of four.
Setup
screen = SDL_SetVideoMode(width, height, 24, SDL_SWSURFACE|SDL_ANYFORMAT|SDL_ASYNCBLIT);
if ( screen == NULL ) {
return 0;
}
YUVOverlay = SDL_CreateYUVOverlay(width, height, SDL_IYUV_OVERLAY, screen);
Ydata = new unsigned char[luma_size];
Udata = new unsigned char[chroma_size];
Vdata = new unsigned char[chroma_size];
YUVOverlay->pixels[0] = Ydata;
YUVOverlay->pixels[1] = Udata;
YUVOverlay->pixels[2] = Vdata;
SDL_DisplayYUVOverlay(YUVOverlay, dest);
Rendering loop:
SDL_LockYUVOverlay(YUVOverlay);
memcpy(Ydata, buffer, luma_size);
memcpy(Udata, buffer+luma_size, chroma_size);
memcpy(Vdata, buffer+luma_size+chroma_size, chroma_size);
int i = SDL_DisplayYUVOverlay(YUVOverlay, dest);
SDL_UnlockYUVOverlay(YUVOverlay);
The easiest fix for me to do is increase the RGB SDL_Surface size so that it is a multiple of 4 in each dimension. But then this adds a black border.
Is there a correct way of fixing this issue? Should I try playing with padding on my YUV data?
Each plane of your input data must start on an address divisible by 8, and the stride of each row must be divisible by 8. To be clear: your chroma planes need to obey this too.
This requirement seems to be from the SDL library's use of MMX multimedia instructions on an x86 cpu. See the comments in src/video/SDL_yuv_mmx.c in the distribution.
update: I looked at the actual assembly code, and there are additional assumptions not mentioned in the source code comments. This is for SDL 1.2.14. In addition to the modulo 8 assumption described above, the code assumes that both the input luma and input chroma planes are packed perfectly (i.e. width == stride).
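One way to satisfy both constraints at once is to pad the frame itself: round the luma width up to a multiple of 16 so that the half-width chroma planes come out as a multiple of 8, create the overlay at that padded width, and copy each plane row by row. The sketch below covers only the rounding and the copy; it makes no SDL calls, the helper names are hypothetical, and whether the padding columns end up visible depends on how the overlay is displayed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round v up to the next multiple of m (m > 0). */
static size_t round_up(size_t v, size_t m)
{
    assert(m > 0);
    return (v + m - 1) / m * m;
}

/* Copy a tightly packed plane (width bytes per row) into a plane whose
 * rows are dst_stride bytes long, filling the extra bytes with fill. */
static void copy_plane_padded(unsigned char *dst, size_t dst_stride,
                              const unsigned char *src, size_t width,
                              size_t rows, unsigned char fill)
{
    size_t r;

    assert(dst_stride >= width);
    for (r = 0; r < rows; r++) {
        memcpy(dst + r * dst_stride, src + r * width, width);
        memset(dst + r * dst_stride + width, fill, dst_stride - width);
    }
}
```

For the 854x480 case this gives a luma row length of 864 and a chroma row length of 432, both divisible by 8. A fill value of 128 is neutral for the chroma planes.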
I've recently started using Intel Performance Primitives (IPP) for image processing. For those who haven't heard of IPP, think of IPP as the analogue of MKL for image processing instead of linear algebra.
I've already implemented a somewhat complicated vision system in OpenCV, and I'd like to swap out some of the OpenCV routines (e.g. convolution and FFT) for faster IPP routines. My OpenCV code always uses the cv::Mat image data structure. However, based on the IPP code samples, it seems that IPP prefers the CIppiImage data structure.
My system does several image transformations in OpenCV, then I want to do a couple of things in IPP, then do more work in OpenCV. Here's a naive way to make OpenCV and IPP play nicely together:
cv::Mat = load original image
use OpenCV to do some work on cv::Mat
write cv::Mat to file
CIppiImage = read cv::Mat from file //for IPP
use IPP to do some work on CIppiImage
write CIppiImage to file
cv::Mat = read CIppiImage from file
use OpenCV to do more work on cv::Mat
write final image to file
However, this is kind of tedious, and reading/writing files probably adds to the overall execution time.
I'm trying to make it more seamless to alternate between OpenCV and IPP in an image processing program. Here are a couple of things that could solve the problem:
Is there a one-liner that would convert a cv::Mat to CIppiImage and vice versa?
I am pretty familiar with the cv::Mat implementation details, but I don't know much about CIppiImage. Do cv::Mat and CIppiImage have the same data layout? If so, could I do something similar to the following cast? CIppiImage cimg = (CIppiImage)(&myMat.data[0])?
There's a clean way to pass OpenCV data into an IPP function.
If we have an OpenCV Mat, we can cast &Mat.data[0] to a const Ipp<type>*. For example, if we're dealing with 8-bit unsigned char (8u) data, we can plug (const Ipp8u*)&img.data[0] into an IPP function. Here's an example using the ippiFilter function with the typical Lena image:
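Mat img = imread("./Lena.pgm", IMREAD_GRAYSCALE); //OpenCV 8U_C1 image (imread defaults to 3-channel BGR)
Mat outImg = img.clone(); //allocate space for convolution results
int step = (int)img.step; //pitch in bytes (equals img.cols only for a continuous 8U_C1 image)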
const Ipp32s kernel[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
IppiSize kernelSize = {3,3};
IppiSize dstRoiSize = {img.cols - kernelSize.width + 1, img.rows - kernelSize.height + 1};
IppiPoint anchor = {2,2};
int divisor = 1;
IppStatus status = ippiFilter_8u_C1R((const Ipp8u*)&img.data[0], step,
(Ipp8u*)&outImg.data[0], step, dstRoiSize,
kernel, kernelSize, anchor, divisor);
When I write outImg (from the above code) to a file, it gives the expected result:
This matches the result I got when I ran the Nvidia version, nppiFilter, with the same parameters:
I mentioned a structure called CIppiImage in the original question. CIppiImage is just a simple wrapper for an array.