simple blur of RGB raw image - c

I'm trying to make a simple blur effect in C. I have an image loaded into a 512*512 array of RGB pixels, and I'm using a 3x3 kernel to blur that image.
here is the kernel
float matrix[9] = {1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f, 1.0f/9.0f};
and here is the code that does the blurring
void useBlur(){
    for(int i = 0; i < ARRAY_SIZE; i++){
        float r = 0;
        float g = 0;
        float b = 0;
        int m, n;
        for(int y = -1, m = 0; y <= 1; y++, m++){
            for(int z = -1, n = 0; z <= 1; z++, n++){
                r += (float)orig_image[i + 512 * y + z].r * (float)matrix[m*3+n];
                g += (float)orig_image[i + 512 * y + z].g * (float)matrix[m*3+n];
                b += (float)orig_image[i + 512 * y + z].b * (float)matrix[m*3+n];
            }
        }
        image[i].r = r;
        image[i].g = g;
        image[i].b = b;
    }
}
I'm not sure what is wrong with that code, but it is producing this result:
Any ideas why the colors are wrong? And how to fix it?
EDIT:
Fixed matrix[7] from 9.0/9.0 to 1.0/9.0 and uploaded a new image.
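One thing worth checking beyond the kernel values: `orig_image[i + 512 * y + z]` reads outside the array on the first and last rows, and wraps across rows at the left and right edges. A common fix is to clamp the coordinates to the image before indexing. A minimal sketch, assuming a 512x512 image (the `pixel_index` helper and its usage are illustrative, not from the original code):

```c
#include <assert.h>

#define W 512
#define H 512

/* Clamp a coordinate into [0, max-1] so the 3x3 taps near the border
   re-read the edge pixel instead of running off the array. */
static int clampi(int v, int max)
{
    if (v < 0) return 0;
    if (v >= max) return max - 1;
    return v;
}

/* Linear index of pixel (x, y) with border clamping. Inside the kernel
   loop this could be used as orig_image[pixel_index(i % W + z, i / W + y)]. */
static int pixel_index(int x, int y)
{
    return clampi(y, H) * W + clampi(x, W);
}
```

This replicates the edge pixels ("clamp-to-edge"); mirroring or skipping the border are equally valid choices.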

I tried your code with some changes: the images I have are BGRA and I am using OpenCV as my image container, but the structure of the code is the same, and it works perfectly.
for (int i = 2*image.cols; i < image.rows*image.cols - 2*image.cols; i++)
{
    float r = 0;
    float g = 0;
    float b = 0;
    for (int y = -1; y <= 1; y++)
    {
        for (int z = -1; z <= 1; z++)
        {
            b += (float)image.at<cv::Vec4b>(i + image.cols*y + z)(0)*(1.0/9);
            g += (float)image.at<cv::Vec4b>(i + image.cols*y + z)(1)*(1.0/9);
            r += (float)image.at<cv::Vec4b>(i + image.cols*y + z)(2)*(1.0/9);
        }
    }
    image.at<cv::Vec4b>(i)(0) = b;
    image.at<cv::Vec4b>(i)(1) = g;
    image.at<cv::Vec4b>(i)(2) = r;
}
I would assume that means the problem is either not in the code you posted, or the OpenCV cv::Mat class is handling something your image container is not. The obvious candidate is that you seem to be implicitly converting from float to uchar at the end of your code. Could you test running this loop over your image without actually modifying it? It could look like this, or any number of other ways. This is ugly, but a quick conversion:
void useBlur(){
    for(int i = 0; i < ARRAY_SIZE; i++){
        float r = 0;
        float g = 0;
        float b = 0;
        int m, n;
        for(int y = -1, m = 0; y <= 1; y++, m++){
            for(int z = -1, n = 0; z <= 1; z++, n++){
                if(y == 0 && z == 0)
                {
                    r += (float)orig_image[i + 512 * y + z].r * (float)matrix[m*3+n]*9;
                    g += (float)orig_image[i + 512 * y + z].g * (float)matrix[m*3+n]*9;
                    b += (float)orig_image[i + 512 * y + z].b * (float)matrix[m*3+n]*9;
                }
            }
        }
        image[i].r = r;
        image[i].g = g;
        image[i].b = b;
    }
}
Theoretically nothing should change about the image when you do that, so if the image is changing, it is because of some conversion error. I admit the theory is unlikely, but the structure of your code seems sound, so I don't know what else to suggest.
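If the implicit float-to-uchar conversion does turn out to be the culprit, it can be made explicit with saturation and rounding. A minimal sketch (the helper name is illustrative):

```c
#include <assert.h>

/* Convert an accumulated float channel back to an 8-bit value,
   saturating and rounding instead of relying on implicit truncation. */
static unsigned char to_uchar(float v)
{
    if (v < 0.0f)   return 0;
    if (v > 255.0f) return 255;
    return (unsigned char)(v + 0.5f); /* round to nearest */
}
```

Usage would then be `image[i].r = to_uchar(r);` and likewise for g and b, making the write-back behavior independent of whatever the struct's field type happens to be.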


Resizing an image stored as a 'strided' array: can I make this bilinear interpolation faster?

I have a piece of C code that is part of a public repository (Darknet) which is supposed to resize an image using bilinear interpolation. Because of the way the rest of the code deals with images, the image is stored as a one-dimensional array where the pixel values from the original 3-channel image are read in strides. The value corresponding to pixel (x, y, k) (x: column, y: row, k: channel) is thus stored at location x + y*w + k*w*h in the 1D array.
The resize function that is actually part of Darknet takes a considerable amount of time in the pre-processing stage, possibly because of its nested for loops that iterate over the rows and columns and access corresponding values, as well as the type conversions; hence I am trying to create a more optimized version of it. The original resize code is as follows. im is the original image, so im.w and im.h are the original width and height; w and h are the target width and height.
image resize_image(image im, int w, int h)
{
    image resized = make_image(w, h, im.c);
    image part = make_image(w, im.h, im.c);
    int r, c, k;
    float w_scale = (float)(im.w - 1) / (w - 1);
    float h_scale = (float)(im.h - 1) / (h - 1);
    for(k = 0; k < im.c; ++k){
        for(r = 0; r < im.h; ++r){
            for(c = 0; c < w; ++c){
                float val = 0;
                if(c == w-1 || im.w == 1){
                    val = get_pixel(im, im.w-1, r, k);
                } else {
                    float sx = c*w_scale;
                    int ix = (int) sx;
                    float dx = sx - ix;
                    val = (1 - dx) * get_pixel(im, ix, r, k) + dx * get_pixel(im, ix+1, r, k);
                }
                set_pixel(part, c, r, k, val);
            }
        }
    }
    for(k = 0; k < im.c; ++k){
        for(r = 0; r < h; ++r){
            float sy = r*h_scale;
            int iy = (int) sy;
            float dy = sy - iy;
            for(c = 0; c < w; ++c){
                float val = (1-dy) * get_pixel(part, c, iy, k);
                set_pixel(resized, c, r, k, val);
            }
            if(r == h-1 || im.h == 1) continue;
            for(c = 0; c < w; ++c){
                float val = dy * get_pixel(part, c, iy+1, k);
                add_pixel(resized, c, r, k, val);
            }
        }
    }
    free_image(part);
    return resized;
}
Is there a way to make this function faster, for instance by using a more optimized way to access the pixels instead of this strided read? I also note that in my case:
The dimensions of the source and resized images will be fixed, so my 'custom' resize function does not have to be size-independent. I am going from 640x360 to the dimensions 626x352.
The target platform is an NVIDIA Jetson with an ARM CPU, so instructions like AVX2 are not applicable in my case. But I do have access to CUDA.
I have to make a note here that because of the requirements of my project, this resize function is actually part of a library (.so) that's being called from Python. So I cannot keep anything "in memory" per se, such as CUDA texture objects etc., so creating them again and again might actually create more overhead on the CUDA side.
Any suggestions in improving this routine would be very helpful.
[As Stargateur mentioned] get_pixel et al. are wasteful. Most pixel accesses can be handled with a pointer; this is a pretty standard thing to do when processing an image where speed is required.
Most accesses creep along the x dimension, so we can work with row pointers directly instead of recomputing the full offset per pixel.
From get_pixel, create this function:
static float *loc_pixel(image m, int x, int y, int c)
{
    return &m.data[(c * m.h * m.w) + (y * m.w) + x];
}
The if in resize_image can be moved out of the first inner for loop by some restructuring.
In all for loops, we can remove all *_pixel functions from the inner loop by using loc_pixel and pointers.
Here's a refactored version that uses only pointers that should be faster. Note that I've coded this but neither tested nor compiled it. I think it's pretty close, but you should double check to be sure.
One thing you could add that I didn't do is have loc_pixel take a pointer to the image (i.e. image *m) instead of passing the entire struct.
Also, you could experiment with replacing src[0] with src[c] and *dst with dst[c]. This would eliminate some ++src and ++dst and might be faster. It might also allow the compiler to understand the loops better so it could use any arm vector instructions and might make it more amenable to CUDA. YMMV.
image resize_image(image im, int w, int h)
{
    image resized = make_image(w, h, im.c);
    image part = make_image(w, im.h, im.c);
    int r, c, k;
    float w_scale = (float)(im.w - 1) / (w - 1);
    float h_scale = (float)(im.h - 1) / (h - 1);
    int wm1 = w - 1;
    float val;
    float marg;
    float *src;
    float *dst;

    for (k = 0; k < im.c; ++k) {
        for (r = 0; r < im.h; ++r) {
            src = loc_pixel(im, 0, r, k);
            dst = loc_pixel(part, 0, r, k);
            marg = get_pixel(im, im.w - 1, r, k);
            if (im.w == 1) {
                for (c = 0; c < w; ++c, ++dst)
                    *dst = marg;
                continue;
            }
            for (c = 0; c < wm1; ++c, ++dst) {
                float sx = c * w_scale;
                int ix = (int) sx;
                float dx = sx - ix;
                /* index by ix: the source column advances at w_scale per
                   output column, so a plain ++src would drift off target */
                val = (1 - dx) * src[ix] + dx * src[ix + 1];
                *dst = val;
            }
            /* handle c == w - 1 case */
            *dst = marg;
        }
    }
    for (k = 0; k < im.c; ++k) {
        for (r = 0; r < h; ++r) {
            float sy = r * h_scale;
            int iy = (int) sy;
            float dy = sy - iy;
            src = loc_pixel(part, 0, iy, k);
            dst = loc_pixel(resized, 0, r, k);
            for (c = 0; c < w; ++c, ++src, ++dst) {
                val = (1 - dy) * src[0];
                *dst = val;
            }
            if (r == h - 1 || im.h == 1)
                continue;
            src = loc_pixel(part, 0, iy + 1, k);
            dst = loc_pixel(resized, 0, r, k);
            for (c = 0; c < w; ++c, ++src, ++dst) {
                val = dy * src[0];
                *dst += val;
            }
        }
    }
    free_image(part);
    return resized;
}
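Since the question fixes the image sizes (640x360 in, 626x352 out), the per-column interpolation coefficients never change between frames and can be computed once, outside the row loop, and reused for every image. A sketch of such a table builder (the function name and signature are illustrative; it mirrors the horizontal-pass math in the original code, where the last output column is still handled separately as an edge case):

```c
#include <assert.h>

/* Precompute, for each output column c, the left source column ix[c]
   and the blend weight dx[c] used by the horizontal interpolation pass.
   With fixed sizes these tables can be built once and reused per frame. */
static void build_x_tables(int im_w, int w, int *ix, float *dx)
{
    float w_scale = (float)(im_w - 1) / (w - 1);
    for (int c = 0; c < w; ++c) {
        float sx = c * w_scale;
        ix[c] = (int)sx;
        dx[c] = sx - ix[c];
    }
}
```

The inner loop then becomes `val = (1 - dx[c]) * src[ix[c]] + dx[c] * src[ix[c] + 1];`, with no per-pixel float multiply for the coordinate and no int conversion, which also tends to be easier for the compiler to vectorize on ARM.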

15x15 gaussian mask in c

I came across a 5x5 Gaussian mask code in C while googling,
but I need a 15x15 Gaussian mask, so I modified it like this:
void createFilter(double gKernel[][15])
{
    // standard deviation (the 5x5 original used 1.0)
    double sigma = 7.0;
    double r, s = 2.0 * sigma * sigma;

    // sum is for normalization
    double sum = 0.0;

    // generate 15x15 kernel
    //for (int x = -2; x <= 2; x++)   // original 5x5 bounds
    for (int x = -7; x <= 7; x++)
    {
        //for (int y = -2; y <= 2; y++)
        for (int y = -7; y <= 7; y++)
        {
            r = sqrt(x*x + y*y);
            gKernel[x + 7][y + 7] = (exp(-(r*r) / s)) / (M_PI * s);
            sum += gKernel[x + 7][y + 7];
        }
    }

    // normalize the kernel
    for (int i = 0; i < 15; ++i)
        for (int j = 0; j < 15; ++j)
            gKernel[i][j] /= sum;
}
but the output differs from the correct 15x15 Gaussian mask coefficients.
What am I supposed to do to get the correct 15x15 mask coefficients?

how to produce -400,-200,-400 in sequential order

I am trying to write a for loop in the second version that produces the same result as the original code, but I am not sure how to get -400, -200, -400 in sequential order.
original code:
p->m_p[0] = randFloat(-400.0f, 400.0);
p->m_p[1] = randFloat(-200.0f, 200.0);
p->m_p[2] = randFloat(-400.0f, 400.0);
second version:
float x = -800;
float y = 800;
for(int i = 0; i < 4; i++)
{
    plNew->m_fPosition[i] = randFloat(x / 2, y / 2);
}
If you need it to work in C (or in C++ before C++11), this would work:
#define NUMBER_OF_VALUES 3
float bounds[NUMBER_OF_VALUES] = { 400.0f, 200.0f, 400.0f };
for (int i = 0; i < NUMBER_OF_VALUES; i++)
{
    plNew->m_fPosition[i] = randFloat(-bounds[i], bounds[i]);
}
You can extend this to make NUMBER_OF_VALUES be 4 or a larger number as long as you initialize all the members of bounds[NUMBER_OF_VALUES] with the desired constants.
A nice feature of this is that the sequence of constants can be anything you like,
not limited to alternating 400, 200, 400 or any other regular sequence.
Something like this?
for (int i = 0; i < 3; i++) {
    float x, y;
    if (i % 2 == 0) {
        x = -400.0f;
        y = 400.0f;
    } else {
        x = -200.0f;
        y = 200.0f;
    }
    p->m_p[i] = randFloat(x, y);
}
What about?
float x = -800;
float y = 800;
for(int i = 0; i < 3; i++)
{
    float z = 2.0f * (float)(i % 2 + 1);
    plNew->m_fPosition[i] = randFloat(x / z, y / z);
}
I would suggest keeping it simple, using something like the following:
const float arr3[] = {400.0f, 200.0f, 400.0f};
for(int i = 0; i < 3; i++)
{
    plNew->m_fPosition[i] = randFloat(-arr3[i], arr3[i]);
}
float x = -400;
float y = 400;
for(int i = 0; i < 3; i++)
{
    plNew->m_fPosition[i] = randFloat(x / (1 + (i & 1)), y / (1 + (i & 1)));
}
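All of the variants above reduce to picking a per-index bound and drawing from plus/minus that bound. A small self-contained check of the bounds-table approach (with a stand-in randFloat, since the real implementation is not shown in the question):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the unshown randFloat: uniform value in [lo, hi]. */
static float randFloat(float lo, float hi)
{
    return lo + (hi - lo) * ((float)rand() / (float)RAND_MAX);
}

/* The bounds-table approach: index i draws from [-bounds[i], bounds[i]],
   giving the -400, -200, -400 lower bounds in sequential order. */
static const float bounds[3] = { 400.0f, 200.0f, 400.0f };

static float draw(int i)
{
    return randFloat(-bounds[i], bounds[i]);
}
```

The table makes the intent explicit and extends to any irregular sequence of bounds, which the modulo-based variants cannot express.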

YUV420 to RGB color conversion Error

I am converting an image in YUV420 format to an RGB image in OpenCV, but I'm getting an orange-tinted image after conversion. I used the following code to do that. Is there a problem with my code?
int step = origImage->widthStep;
uchar *data = (uchar *)origImage->imageData;
int size = origImage->width * origImage->height;
IplImage* img1 = cvCreateImage(cvGetSize(origImage), IPL_DEPTH_8U, 3);
for (int i = 0; i < origImage->height; i++)
{
    for (int j = 0; j < origImage->width; j++)
    {
        float Y = data[i*step + j];
        float U = data[(int)(size + (i/2)*(step/2) + j/2)];
        float V = data[(int)(size*1.25 + (i/2)*(step/2) + j/2)];
        float R = Y + (int)(1.772f*V);
        float G = Y - (int)(0.344f*V + 0.714f*U);
        float B = Y + (int)(1.402f*U);
        if (R < 0){ R = 0; } if (G < 0){ G = 0; } if (B < 0){ B = 0; }
        if (R > 255){ R = 255; } if (G > 255){ G = 255; } if (B > 255){ B = 255; }
        cvSet2D(img1, i, j, cvScalar(B, G, R));
    }
}
origImage -> YUV image,
img1 -> RGB image,
http://upload.wikimedia.org/wikipedia/en/0/0d/Yuv420.svg
Is there any OpenCV function which can convert a single pixel in YUV420 format to the corresponding RGB pixel (not the entire image)?
EDIT:
I got the answer by modifying the formula for calculating the R, G, B values. This code works fine:
int step = origImage->widthStep;
uchar *data = (uchar *)origImage->imageData;
int size = origImage->width * origImage->height;
IplImage* img1 = cvCreateImage(cvGetSize(origImage), IPL_DEPTH_8U, 3);
for (int i = 0; i < origImage->height; i++)
{
    for (int j = 0; j < origImage->width; j++)
    {
        float Y = data[i*step + j];
        float U = data[(int)(size + (i/2)*(step/2) + j/2)];
        float V = data[(int)(size*1.25 + (i/2)*(step/2) + j/2)];
        float R = Y + 1.402 * (V - 128);
        float G = Y - 0.344 * (U - 128) - 0.714 * (V - 128);
        float B = Y + 1.772 * (U - 128);
        if (R < 0){ R = 0; } if (G < 0){ G = 0; } if (B < 0){ B = 0; }
        if (R > 255){ R = 255; } if (G > 255){ G = 255; } if (B > 255){ B = 255; }
        cvSet2D(img1, i, j, cvScalar(B, G, R));
    }
}
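For the single-pixel part of the question: the corrected formula can be wrapped in a standalone helper that converts one YUV triple at a time (BT.601-style coefficients, matching the working loop above; the function and helper names here are illustrative):

```c
#include <assert.h>

/* Saturate a float channel value to the 8-bit range. */
static unsigned char clamp255(float v)
{
    if (v < 0.0f)   return 0;
    if (v > 255.0f) return 255;
    return (unsigned char)v;
}

/* Convert one pixel's Y, U, V samples (as read from the I420 planes)
   to RGB, using the same formula as the corrected loop. */
static void yuv_to_rgb(unsigned char y, unsigned char u, unsigned char v,
                       unsigned char *r, unsigned char *g, unsigned char *b)
{
    float Y = y, U = u, V = v;
    *r = clamp255(Y + 1.402f * (V - 128.0f));
    *g = clamp255(Y - 0.344f * (U - 128.0f) - 0.714f * (V - 128.0f));
    *b = clamp255(Y + 1.772f * (U - 128.0f));
}
```

For whole images, OpenCV's cvtColor (as suggested below) is both faster and less error-prone than a hand-rolled loop.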
The 1st problem is using the outdated C API (it's dead and gone; please use the C++ API instead).
The 2nd problem is writing your own (slow and error-prone) pixel loops.
Why not use:
cvtColor(crs,dst, CV_YUV2BGR); // or CV_YUV2BGR_I420
instead ?

enlarge BMP uncompressed

I'm working on this assignment and for some reason it's not copying all the rows. It skips certain lines of the BMP, so it does not enlarge the picture entirely.
I would greatly appreciate some feedback as to why it's doing this.
I know it has to be related to the pointer arithmetic.
int enlarge(PIXEL* original, int rows, int cols, int scale,
            PIXEL** new, int* newrows, int* newcols)
{
    *newcols = cols * scale;
    *newrows = rows * scale;

    /* Allocate memory for enlarged bmp */
    *new = (PIXEL*)malloc((*newrows) * (*newcols) * sizeof(PIXEL));
    if(!*new)
    {
        fprintf(stderr, "Could not allocate memory.\n");
        return 1;
    }

    int i, j, k, l;
    int index = 0;
    int counter = scale * rows;
    PIXEL* o;
    PIXEL* n;
    for(i = 0; i < rows; i++)
    {
        for(j = 0; j < cols; j++)
        {
            for(k = 0; k < scale; k++)
            {
                o = original + (i*cols) + j;
                index++;
                for(l = 0; l < scale; l++)
                {
                    n = (*new) + (i*cols*scale) + index + (counter*l);
                    *n = *o;
                }
            }
        }
    }
    return 0;
}
You're copying a square block of pixels into your integer-scaled image. The outer two loops and your computation of the original pixel address are fine.
Now, looking at the inner two loops, you have this odd thing going on with counter and index which don't quite make sense. Let's rewrite that. All you need to do is copy a group of pixels for each row.
o = original + i*cols + j;
n = *new + (i*cols + j) * scale;
for(sy = 0; sy < scale; sy++)
{
    for(sx = 0; sx < scale; sx++)
    {
        n[sy*cols*scale + sx] = *o;
    }
}
I'm really not a fan of variables like i, j, k, and l. It's lazy and imparts no meaning. It's hard to see if k is a row or a column index. So I used sx and sy to mean "the scaled x- and y-coordinates" (I would recommend using x and y instead of j and i but I left them as is).
Now, here's a better way. Copy the pixels in scan lines, rather than jumping around writing a single scaled block. Here, you scale a single row, and do that multiple times. You can replace your four loops with this:
int srows = rows * scale;
int scols = cols * scale;
for( sy = 0; sy < srows; sy++ )
{
    y = sy / scale;
    o = original + y * cols;
    n = *new + sy * scols;
    for( x = 0; x < cols; x++ ) {
        for( i = 0; i < scale; i++ ) {
            *n++ = *o;
        }
        o++;
    }
}
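The scan-line version is easy to sanity-check on a tiny image. A self-contained sketch (using a one-channel PIXEL stand-in, since the real struct layout is not shown; the indexing is what matters):

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned char PIXEL; /* one channel is enough to test the indexing */

/* Scan-line pixel replication: each source row is emitted `scale` times,
   and each pixel within a row is repeated `scale` times. */
static void enlarge(const PIXEL *orig, int rows, int cols, int scale, PIXEL *out)
{
    int scols = cols * scale;
    for (int sy = 0; sy < rows * scale; sy++) {
        const PIXEL *o = orig + (sy / scale) * cols;   /* source row */
        PIXEL *n = out + sy * scols;                   /* output row */
        for (int x = 0; x < cols; x++) {
            for (int i = 0; i < scale; i++)
                *n++ = *o;
            o++;
        }
    }
}
```

Scaling the 2x2 image {1,2; 3,4} by 2 should give four 2x2 blocks: {1,1,2,2; 1,1,2,2; 3,3,4,4; 3,3,4,4}, which is a quick way to verify no rows are skipped.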
