I am trying to make a blur filter in C. It takes the neighboring pixels of the main pixel, averages their RGB values, stores the result in a temp array, and then changes the image using the temp array values. It seems correct, but it is not working as intended: the output is only very slightly blurred. I really don't see my mistake and would be very thankful if someone helped. Sorry if I did something horrible; I started learning C last week.
I checked this post:
Blurring an Image in c pixel by pixel - special cases
but I did not see where I went wrong.
I'm working with this data struct:
typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} RGBTRIPLE;
void blur(int height, int width, RGBTRIPLE image[height][width])
{
    // ints to use later
    int j;
    int p;
    RGBTRIPLE temp[height][width];
    for (int n = 0; n < height; n++) // loop to check every pixel
    {
        for (int k = 0; k < width; k++)
        {
            int widx = 3;
            int hghtx = 3;
            // conditionals for border cases
            int y = 0;
            if (n == 0)
            {
                p = 0;
                hghtx = 2;
            }
            if (n == height - 1)
            {
                p = -1;
                hghtx = 2;
            }
            if (k == 0)
            {
                j = 0;
                widx = 2;
            }
            if (k == width - 1)
            {
                j = -1;
                widx = 2;
            }
            for (int u = 0; u < hghtx; u++) // matrix of pixels around the main pixel using the conditionals gathered before
            {
                for (int i = 0; i < widx; i++)
                {
                    if (y == 1) // takes the average of color and stores it in the RGB temp
                    {
                        temp[n][k].rgbtGreen = temp[n][k].rgbtGreen + image[n + p + u][k + j + i].rgbtGreen / (hghtx * widx);
                        temp[n][k].rgbtRed = temp[n][k].rgbtRed + image[n + p + u][k + j + i].rgbtRed / (hghtx * widx);
                        temp[n][k].rgbtBlue = temp[n][k].rgbtBlue + image[n + p + u][k + j + i].rgbtBlue / (hghtx * widx);
                    }
                    else // get first value of temp
                    {
                        temp[n][k].rgbtGreen = (image[n + p + u][k + j + i].rgbtGreen) / (hghtx * widx);
                        temp[n][k].rgbtRed = (image[n + p + u][k + j + i].rgbtRed) / (hghtx * widx);
                        temp[n][k].rgbtBlue = (image[n + p + u][k + j + i].rgbtBlue) / (hghtx * widx);
                    }
                }
            }
        }
    }
    // changes the original image to the blurred one
    for (int n = 0; n < height; n++)
    {
        for (int k = 0; k < width; k++)
        {
            image[n][k] = temp[n][k];
        }
    }
}
I think it's a combination of things.
Even if the code worked the way you expect, you would still only be doing a 3x3 blur, which is hardly noticeable, especially on large images (I'm pretty sure it would be invisible on a 4000x3000-pixel image).
There are some problems with the code.
As @Fe2O3 says, at the end of the first line, widx will change to 2 and stay 2 for the rest of the image.
You are reading from temp[][] without initializing it. temp is a local (automatic) array, so its initial contents are indeterminate. Don't expect all zeros; in a release (optimized) build in particular, it will most likely contain random data (as @WeatherVane pointed out).
The way you calculate the average of the pixels is weird. With a 3x3 matrix of pixels, each pixel value should be divided by 9 in the final sum. Instead, you divide the first pixel by 2 nine times (in effect /512), the second one eight times (so it's pixel/256), and so on, until the last one is divided by 2 only once. So the result is basically the value of the bottom-right pixel.
Also, since your RGB values are just bytes, you may want to divide them first and only add them afterwards. Otherwise, the sums overflow and you get wild results.
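The sketch below shows the overflow point concretely. It assumes BYTE is an unsigned 8-bit type, which is how CS50's bmp.h defines it (uint8_t). The helper names sum_as_byte and sum_as_int are hypothetical. Storing the running sum in a BYTE wraps it modulo 256, while an int keeps the full total.

```c
#include <stdint.h>

typedef uint8_t BYTE;  /* assumption: BYTE is an unsigned 8-bit type */

/* Accumulating into a BYTE: every store truncates the sum to 8 bits. */
BYTE sum_as_byte(const BYTE *v, int n)
{
    BYTE sum = 0;
    for (int i = 0; i < n; i++)
        sum = (BYTE)(sum + v[i]);  /* computed as int, then wrapped modulo 256 */
    return sum;
}

/* Accumulating into an int: the full total survives, so divide once at the end. */
int sum_as_int(const BYTE *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

With the values 200, 100 and 50, the BYTE version returns 94 (350 - 256) and the int version returns 350.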
Try using a debugger to see the values you are actually calculating. It can be quite an eye opener :)
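For comparison, here is a minimal sketch of a 3x3 box blur that avoids the problems above. It assumes the question's RGBTRIPLE layout, with BYTE taken to be uint8_t. For each pixel it:

- computes the neighbourhood explicitly, so nothing carries over between pixels;
- skips neighbours outside the image;
- sums in int and divides once by the number of pixels actually used;
- writes into temp before copying back.

Note that this swaps in int accumulation instead of the divide-first approach mentioned above. That avoids both overflow and the precision lost by truncating each term.

```c
#include <stdint.h>

typedef uint8_t BYTE;  /* assumption: same definition as CS50's bmp.h */
typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} RGBTRIPLE;

/* Box blur: each output pixel is the rounded average of the part of its
   3x3 neighbourhood that lies inside the image. */
void blur(int height, int width, RGBTRIPLE image[height][width])
{
    RGBTRIPLE temp[height][width];  /* results go here, so reads never see blurred values */

    for (int n = 0; n < height; n++)
    {
        for (int k = 0; k < width; k++)
        {
            int sumRed = 0, sumGreen = 0, sumBlue = 0, count = 0;  /* int: no BYTE overflow */

            for (int dn = -1; dn <= 1; dn++)
            {
                for (int dk = -1; dk <= 1; dk++)
                {
                    int r = n + dn, c = k + dk;
                    if (r < 0 || r >= height || c < 0 || c >= width)
                        continue;  /* neighbour is outside the image */
                    sumRed += image[r][c].rgbtRed;
                    sumGreen += image[r][c].rgbtGreen;
                    sumBlue += image[r][c].rgbtBlue;
                    count++;
                }
            }

            /* divide once, rounding to the nearest integer */
            temp[n][k].rgbtRed = (BYTE)((sumRed + count / 2) / count);
            temp[n][k].rgbtGreen = (BYTE)((sumGreen + count / 2) / count);
            temp[n][k].rgbtBlue = (BYTE)((sumBlue + count / 2) / count);
        }
    }

    for (int n = 0; n < height; n++)
        for (int k = 0; k < width; k++)
            image[n][k] = temp[n][k];
}
```

On a 3x3 black image whose centre pixel has red = 90, the centre becomes 10 (90/9), a corner becomes 23 (92/4, rounded) and an edge pixel becomes 15 (93/6).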
I have a matrix multiplication problem. We have an image matrix, which can have a variable size. I need to calculate C = A*B for every possible n x n window A. C is added to the output image, as shown in the figure. The center point of matrix A is located in the lower triangle, and B is placed diagonally symmetric to A. The A windows can overlap, so the B windows can overlap too. The figures below give a more detailed picture:
The blue X points represent all possible center points of A. The algorithm should just multiply A by its diagonally mirrored version, called B. I did it with lots of for loops, and I need to reduce the number of loops I use. Could you help me, please?
What kind of algorithm can be used for this problem? There are some points I find confusing.
Could you please help me with your genius algorithm talents? Or could you direct me to an expert?
The original question is below:
#define SIZE_ARRAY 20
#define SIZE_WINDOW 5
#define INDEX_OFFSET 1
#define END_OFFSET_ROW 2

uint32_t lowerDiagonalIndexMinRow = GET_LOWER_DIAGONAL_INDEX_MIN_ROW;
uint32_t lowerDiagonalIndexMaxRow = GET_LOWER_DIAGONAL_INDEX_MAX_ROW;
uint32_t lowerDiagonalIndexMinCol = GET_LOWER_DIAGONAL_INDEX_MIN_COL;
uint32_t lowerDiagonalIndexMaxCol = GET_LOWER_DIAGONAL_INDEX_MAX_COL;

void parallelMultiplication_Stable_Master()
{
    startTimeStamp = omp_get_wtime();
    #pragma omp parallel for num_threads(8) private(outerIterRow, outerIterCol,rA,cA,rB,cB) shared(inputImage, outputImage)
    for(outerIterRow = lowerDiagonalIndexMinRow; outerIterRow < lowerDiagonalIndexMaxRow; outerIterRow++)
    {
        for(outerIterCol = lowerDiagonalIndexMinCol; outerIterCol < lowerDiagonalIndexMaxCol; outerIterCol++)
        {
            if(outerIterCol + 1 < outerIterRow)
            {
                rA = outerIterRow - WINDOW_OFFSET;
                cA = outerIterCol - WINDOW_OFFSET;
                rB = outerIterCol - WINDOW_OFFSET;
                cB = outerIterRow - WINDOW_OFFSET;
                for(i= outerIterRow - WINDOW_OFFSET; i <= outerIterRow + WINDOW_OFFSET; i++)
                {
                    for(j= outerIterCol - WINDOW_OFFSET; j <= outerIterCol + WINDOW_OFFSET; j++)
                    {
                        for(k=0; k < SIZE_WINDOW; k++)
                        {
                            #pragma omp critical
                            outputImage[i][j] += inputImage[rA][cA+k] * inputImage[rB+k][cB];
                        }
                    }
                }
            }
        }
        printf("Thread Number - %d",omp_get_thread_num());
    }
    stopTimeStamp = omp_get_wtime();
    printArray(outputImage,"Output Image");
    printConsoleNotification(100, startTimeStamp, stopTimeStamp);
}
I get a segmentation fault if I set the thread count to more than 1. What is the trick?
I'm not providing a solution, but some thoughts that may help the OP exploring a possible approach.
You can evaluate each element of the resulting C matrix directly, from the values of the original matrix in a way similar to a convolution operation.
Consider the following image (sorry if it's confusing):
Instead of computing each matrix product for every A submatrix, you can evaluate each $C_{i,j}$ from the values in the shaded areas.
Note that $C_{i,j}$ depends only on a small subset of row i. Also, the elements of the upper-right triangular submatrix (where the B submatrices are picked) could be copied, and maybe transposed, into a more cache-friendly arrangement.
Alternatively, it may be worth exploring an approach where, for every possible $B_{i,j}$, all the corresponding elements of C are evaluated.
Note that you can actually save a lot of calculations (and maybe cache misses) by grouping the terms, see e.g. the first two elements of row i in A:
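More formally:
$$
\begin{aligned}
C_{i,j} &= A_{i,j-4} \cdot (B_{j-4,i} + B_{j-4,i+1} + B_{j-4,i+2} + B_{j-4,i+3} + B_{j-4,i+4}) \\
C_{i,j} &\mathrel{+}= A_{i,j-3} \cdot \bigl(B_{j-3,i-1} + B_{j-3,i+4} + 2(B_{j-3,i} + B_{j-3,i+1} + B_{j-3,i+2} + B_{j-3,i+3})\bigr) \\
C_{i,j} &\mathrel{+}= A_{i,j-2} \cdot \bigl(B_{j-2,i-2} + B_{j-2,i+4} + 2(B_{j-2,i-1} + B_{j-2,i+3}) + 3(B_{j-2,i} + B_{j-2,i+1} + B_{j-2,i+2})\bigr) \\
C_{i,j} &\mathrel{+}= A_{i,j-1} \cdot \bigl(B_{j-1,i-3} + B_{j-1,i+4} + 2(B_{j-1,i-2} + B_{j-1,i+3}) + 3(B_{j-1,i-1} + B_{j-1,i+2}) + 4(B_{j-1,i} + B_{j-1,i+1})\bigr) \\
C_{i,j} &\mathrel{+}= A_{i,j} \cdot \bigl(B_{j,i-4} + B_{j,i+4} + 2(B_{j,i-3} + B_{j,i+3}) + 3(B_{j,i-2} + B_{j,i+2}) + 4(B_{j,i-1} + B_{j,i+1}) + 5B_{j,i}\bigr) \\
C_{i,j} &\mathrel{+}= A_{i,j+1} \cdot \bigl(B_{j+1,i-4} + B_{j+1,i+3} + 2(B_{j+1,i-3} + B_{j+1,i+2}) + 3(B_{j+1,i-2} + B_{j+1,i+1}) + 4(B_{j+1,i-1} + B_{j+1,i})\bigr) \\
C_{i,j} &\mathrel{+}= A_{i,j+2} \cdot \bigl(B_{j+2,i-4} + B_{j+2,i+2} + 2(B_{j+2,i-3} + B_{j+2,i+1}) + 3(B_{j+2,i-2} + B_{j+2,i-1} + B_{j+2,i})\bigr) \\
C_{i,j} &\mathrel{+}= A_{i,j+3} \cdot \bigl(B_{j+3,i-4} + B_{j+3,i+1} + 2(B_{j+3,i-3} + B_{j+3,i-2} + B_{j+3,i-1} + B_{j+3,i})\bigr) \\
C_{i,j} &\mathrel{+}= A_{i,j+4} \cdot (B_{j+4,i-4} + B_{j+4,i-3} + B_{j+4,i-2} + B_{j+4,i-1} + B_{j+4,i})
\end{aligned}
$$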
If I estimated correctly, this requires something like 60 additions and 25 (possibly fused) multiplications, compared to 125 operations like $C_{i,j} \mathrel{+}= A_{i,k} \cdot B_{k,i}$ spread all over the place.
I think that cache-locality may have a bigger impact on performance than the mere reduction of operations.
We could also precompute all the values
$$S_{i,j} = B_{i,j} + B_{i,j+1} + B_{i,j+2} + B_{i,j+3} + B_{i,j+4}$$
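Then the previous formulas become:
$$
\begin{aligned}
C_{i,j} &= A_{i,j-4} \cdot S_{j-4,i} \\
C_{i,j} &\mathrel{+}= A_{i,j-3} \cdot (S_{j-3,i-1} + S_{j-3,i}) \\
C_{i,j} &\mathrel{+}= A_{i,j-2} \cdot (S_{j-2,i-2} + S_{j-2,i-1} + S_{j-2,i}) \\
C_{i,j} &\mathrel{+}= A_{i,j-1} \cdot (S_{j-1,i-3} + S_{j-1,i-2} + S_{j-1,i-1} + S_{j-1,i}) \\
C_{i,j} &\mathrel{+}= A_{i,j} \cdot (S_{j,i-4} + S_{j,i-3} + S_{j,i-2} + S_{j,i-1} + S_{j,i}) \\
C_{i,j} &\mathrel{+}= A_{i,j+1} \cdot (S_{j+1,i-4} + S_{j+1,i-3} + S_{j+1,i-2} + S_{j+1,i-1}) \\
C_{i,j} &\mathrel{+}= A_{i,j+2} \cdot (S_{j+2,i-4} + S_{j+2,i-3} + S_{j+2,i-2}) \\
C_{i,j} &\mathrel{+}= A_{i,j+3} \cdot (S_{j+3,i-4} + S_{j+3,i-3}) \\
C_{i,j} &\mathrel{+}= A_{i,j+4} \cdot S_{j+4,i-4}
\end{aligned}
$$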
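As a rough check of the grouped formulas, here is a C sketch under a few assumptions:

- A and B are both read from one square matrix M, so $A_{i,k}$ is M[i][k] and $B_{x,y}$ is M[x][y].
- S is the 5-element window sum defined above.
- Like the formulas, it ignores the lower-triangle restriction and the image borders, so it only matches the brute-force sum at interior points.

N, S_window, c_direct and c_bruteforce are hypothetical names.

```c
#define N 12  /* hypothetical matrix size, large enough for interior test points */

/* S_{x,y} = B_{x,y} + ... + B_{x,y+4}; here B_{x,y} is simply M[x][y]. */
static double S_window(double M[N][N], int x, int y)
{
    double s = 0.0;
    for (int t = 0; t < 5; ++t)
        s += M[x][y + t];
    return s;
}

/* C_{i,j} from the grouped formulas: the row with column offset d multiplies
   A_{i,j+d} = M[i][j+d] by S_{j+d,y} for y = i-4+max(0,-d) .. i-max(0,d). */
double c_direct(double M[N][N], int i, int j)
{
    double c = 0.0;
    for (int d = -4; d <= 4; ++d) {
        int first = i - 4 + (d < 0 ? -d : 0);
        int last = i - (d > 0 ? d : 0);
        double sum = 0.0;
        for (int y = first; y <= last; ++y)
            sum += S_window(M, j + d, y);
        c += M[i][j + d] * sum;
    }
    return c;
}

/* Reference: add every 5x5 product A*B, with A centred at (r, col) and
   B centred at the mirrored point (col, r), keeping what lands on (i, j). */
double c_bruteforce(double M[N][N], int i, int j)
{
    double c = 0.0;
    for (int r = 2; r < N - 2; ++r)
        for (int col = 2; col < N - 2; ++col) {
            int a = i - r, b = j - col;  /* offset of (i, j) inside this output window */
            if (a < -2 || a > 2 || b < -2 || b > 2)
                continue;
            for (int k = -2; k <= 2; ++k)  /* (A*B)[a][b] = sum_k A[a][k] * B[k][b] */
                c += M[i][col + k] * M[col + k][r + b];
        }
    return c;
}
```

Each call to S_window re-adds five values. A real implementation would precompute all S values once, as suggested above.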
Here is my take. I wrote this before OP showed any code, so I'm not following any of their code patterns.
I start with a suitable image struct, just for my own sanity.
struct Image
{
    float* values;
    int rows, cols;
};

struct Image image_allocate(int rows, int cols)
{
    struct Image rtrn;
    rtrn.rows = rows;
    rtrn.cols = cols;
    rtrn.values = malloc(sizeof(float) * rows * cols);
    return rtrn;
}

void image_fill(struct Image* img)
{
    ptrdiff_t row, col;
    for(row = 0; row < img->rows; ++row)
        for(col = 0; col < img->cols; ++col)
            img->values[row * img->cols + col] = rand() * (1.f / RAND_MAX);
}

void image_print(const struct Image* img)
{
    ptrdiff_t row, col;
    for(row = 0; row < img->rows; ++row) {
        for(col = 0; col < img->cols; ++col)
            printf("%.3f ", img->values[row * img->cols + col]);
        putchar('\n'); /* one image row per output line */
    }
}
A 5x5 matrix multiplication is too small to reasonably dispatch to BLAS. So I write a simple version myself that can be loop-unrolled and / or inlined. This routine could use a couple of micro-optimizations but let's keep it simple for now.
/** out += left * right for 5x5 sub-matrices */
static void mat_mul_5x5(
      float* restrict out, const float* left, const float* right, int cols)
{
    ptrdiff_t row, col, inner;
    float sum;
    for(row = 0; row < 5; ++row) {
        for(col = 0; col < 5; ++col) {
            sum = out[row * cols + col];
            for(inner = 0; inner < 5; ++inner)
                sum += left[row * cols + inner] * right[inner * cols + col];
            out[row * cols + col] = sum;
        }
    }
}
Now for the single-threaded implementation of the main algorithm. Again, nothing fancy. We just iterate over the lower triangular matrix, excluding the diagonal. I keep track of the top-left corner instead of the center point. Makes index computation a bit simpler.
void compute_ltr(struct Image* restrict out, const struct Image* in)
{
    ptrdiff_t top, left, end;
    /* if image is not square, find square subset */
    end = out->rows < out->cols ? out->rows : out->cols;
    assert(in->rows == out->rows && in->cols == out->cols);
    memset(out->values, 0, sizeof(float) * out->rows * out->cols);
    for(top = 1; top <= end - 5; ++top)
        for(left = 0; left < top; ++left)
            mat_mul_5x5(out->values + top * out->cols + left,
                  in->values + top * in->cols + left,
                  in->values + left * in->cols + top,
                  in->cols);
}
The parallelization is a bit tricky because we have to make sure the threads don't overlap in their output matrices. A critical section, atomics or similar stuff would cost too much performance.
A simpler solution is a strided approach: If we always keep the threads 5 rows apart, they cannot interfere. So we simply compute every fifth row, synchronize all threads, then compute the next set of rows, five apart, and so on.
void compute_ltr_parallel(struct Image* restrict out, const struct Image* in)
{
    /* if image is not square, find square subset */
    const ptrdiff_t end = out->rows < out->cols ? out->rows : out->cols;
    assert(in->rows == out->rows && in->cols == out->cols);
    memset(out->values, 0, sizeof(float) * out->rows * out->cols);
    /*
     * Keep the parallel section open for multiple loops to reduce
     * overhead
     */
#   pragma omp parallel
    {
        ptrdiff_t top, left, offset;
        for(offset = 0; offset < 5; ++offset) {
            /* Use dynamic scheduling because the work per row varies */
#           pragma omp for schedule(dynamic)
            for(top = 1 + offset; top <= end - 5; top += 5)
                for(left = 0; left < top; ++left)
                    mat_mul_5x5(out->values + top * out->cols + left,
                          in->values + top * in->cols + left,
                          in->values + left * in->cols + top,
                          in->cols);
        }
    }
}
My benchmark with 1000 iterations on a 1000x1000 image shows 7 seconds for the serial version and 1.2 seconds for the parallelized version on my 8-core / 16-thread CPU.
EDIT for completeness: Here are the includes and the main for benchmarking.
#include <assert.h>
#include <stddef.h>  /* using ptrdiff_t */
#include <stdlib.h>  /* using malloc */
#include <stdio.h>   /* using printf */
#include <string.h>  /* using memset */

/* Insert code from above here */

int main()
{
    int rows = 1000, cols = 1000, rep = 1000;
    struct Image in, out;
    in = image_allocate(rows, cols);
    out = image_allocate(rows, cols);
    image_fill(&in);
    while(rep--) {
#   if 1
        compute_ltr_parallel(&out, &in);
#   else
        compute_ltr(&out, &in);
#   endif
    }
    free(in.values);
    free(out.values);
    return 0;
}
Compile with gcc -O3 -fopenmp.
Regarding the comment, and also your way of using OpenMP: Don't overcomplicate things with unnecessary directives. OpenMP can figure out how many threads are available itself. And private variables can easily be declared within the parallel section (usually).
If you want a specific number of threads, just call with the appropriate environment variable, e.g. on Linux call OMP_NUM_THREADS=8 ./executable
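Here is a minimal sketch of declaring private variables inside the parallel section. It uses a hypothetical 3-wide row-window sum rather than the matrix problem. The loop counters and the temporary are declared inside the loops, which makes them private to each thread without a private(...) clause.

This also matters for the question's code. There, i, j and k are declared outside the parallel region and are not in the private(...) list, so all threads share them. That is a likely source of the crash. Without -fopenmp the pragma below is ignored and the code simply runs serially. SIZE, in, out and window_row_sums are illustrative names.

```c
#define SIZE 64  /* hypothetical image size */

static float in[SIZE][SIZE], out[SIZE][SIZE];

/* Sum of each 3-wide horizontal window, parallelised over rows. */
void window_row_sums(void)
{
    #pragma omp parallel for
    for (int row = 0; row < SIZE; ++row) {            /* loop variable: private */
        for (int col = 0; col + 2 < SIZE; ++col) {    /* declared inside: private */
            float sum = 0.0f;                         /* declared inside: private */
            for (int k = 0; k < 3; ++k)
                sum += in[row][col + k];
            out[row][col] = sum;  /* each element is written by exactly one thread */
        }
    }
}
```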
Here is the code I am using. When I run it, it doesn't seem to change anything in the image except the last 1/4 of it. That part turns to a solid color.
void maxFilter(pixel * data, int w, int h)
{
    GLubyte tempRed;
    GLubyte tempGreen;
    GLubyte tempBlue;
    int i;
    int j;
    int k;
    int pnum = 0;
    int pnumWrite = 0;
    for(i = 0 ; i < (h - 2); i+=3) {
        for(j = 0 ; j < (w - 2); j+=3) {
            tempRed = 0;
            tempGreen = 0;
            tempBlue = 0;
            for (k = 0 ; k < 3 ; k++){
                if ((data[pnum].r) > tempRed){tempRed = (data[pnum + k].r);}
                if ((data[pnum].g) > tempGreen){tempGreen = (data[pnum + k].g);}
                if ((data[pnum].b) > tempBlue){tempBlue = (data[pnum + k].b);}
                if ((data[(pnum + w)].r) > tempRed){tempRed = (data[(pnum + w)].r);}
                if ((data[(pnum + w)].g) > tempGreen){tempGreen = (data[(pnum + w)].g);}
                if ((data[(pnum + w)].b) > tempBlue){tempBlue = (data[(pnum + w)].b);}
                if ((data[(pnum + 2 * w)].r) > tempRed){tempRed = (data[(pnum + 2 * w)].r);}
                if ((data[(pnum + 2 * w)].g) > tempGreen){tempGreen = (data[(pnum + 2 * w)].g);}
                if ((data[(pnum + 2 * w)].b) > tempBlue){tempBlue = (data[(pnum + 2 * w)].b);}
            }
            pnumWrite = pnum - 3;
            for (k = 0 ; k < 3 ; k++){
                ((data[pnumWrite].r) = tempRed);
                ((data[pnumWrite].g) = tempGreen);
                ((data[pnumWrite].b) = tempBlue);
                ((data[(pnumWrite + w)].r) = tempRed);
                ((data[(pnumWrite + w)].g) = tempGreen);
                ((data[(pnumWrite + w)].b) = tempBlue);
                ((data[(pnumWrite + 2 * w)].r) = tempRed);
                ((data[(pnumWrite + 2 * w)].g) = tempGreen);
                ((data[(pnumWrite + 2 * w)].b) = tempBlue);
            }
        }
    }
}
I can see several problems with that code, not least that it is difficult to follow!
I think your main problem is that the loop runs (as you probably intended) h/3 * w/3 times, once for each 3x3 block in the image. But the pnum index only increases by 3 for each block, so it reaches a maximum of about h*w/3 rather than the intended h*w. That means only the first third of your image is affected by your filter. (I suspect your painting is done bottom-up, which is why you see the lowest part change. I remember .bmp files being structured that way, but perhaps there are others as well.)
The 'cheap' fix would be to add 2*w at the right point, but nobody will ever understand that code again. I suggest you rewrite your indexing instead, and explicitly compute pnum from i and j in each turn through the loop. That can be improved on for readability, but is reasonably clear.
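Here is a minimal sketch of that rewrite. It keeps the question's block-wise behaviour but computes each index as row * w + col from the loop counters. The pixel struct layout is an assumption, since the question only shows .r, .g and .b of type GLubyte. blockMaxFilter is a hypothetical name. The maximum is taken over the whole block first and only then written back, so no pixel is read after it has been overwritten.

```c
typedef struct { unsigned char r, g, b; } pixel;  /* assumed layout */

/* Paint every complete 3x3 block with its per-channel maximum. */
void blockMaxFilter(pixel *data, int w, int h)
{
    for (int i = 0; i + 2 < h; i += 3) {
        for (int j = 0; j + 2 < w; j += 3) {
            unsigned char maxR = 0, maxG = 0, maxB = 0;
            for (int di = 0; di < 3; di++) {          /* find the block maximum */
                for (int dj = 0; dj < 3; dj++) {
                    const pixel *p = &data[(i + di) * w + (j + dj)];
                    if (p->r > maxR) maxR = p->r;
                    if (p->g > maxG) maxG = p->g;
                    if (p->b > maxB) maxB = p->b;
                }
            }
            for (int di = 0; di < 3; di++) {          /* write it to the whole block */
                for (int dj = 0; dj < 3; dj++) {
                    pixel *p = &data[(i + di) * w + (j + dj)];
                    p->r = maxR;
                    p->g = maxG;
                    p->b = maxB;
                }
            }
        }
    }
}
```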
There's another minor thing: you have code like
if ((data[pnum].r) > tempRed){tempRed = (data[pnum + k].r);}
where the indexing on the right and on the left differ: this is probably also giving you results different from what you intended.
As Jongware points out, writing to the input array is always dangerous. Your code is intended, I believe, to avoid that problem by looking only once into each 3x3 block. Still, his suggestion of a separate output array is very sensible: you probably don't want the blockiness your code gives anyway (you make each 3x3 block all one colour, don't you?), and his suggestion would let you avoid that.
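Here is a minimal sketch of the separate-output-array idea. It swaps the block-wise filter for a sliding 3x3 max filter, so every pixel gets the maximum of its own neighbourhood and there is no blockiness. It uses the same assumed pixel layout as the question's .r/.g/.b fields, and maxFilter3x3 is a hypothetical name. Border pixels use only the part of the neighbourhood that lies inside the image.

```c
typedef struct { unsigned char r, g, b; } pixel;  /* assumed layout */

/* out[p] = per-channel maximum of the 3x3 neighbourhood of p in `in`.
   Reading only from `in` and writing only to `out` keeps the two apart. */
void maxFilter3x3(const pixel *in, pixel *out, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            pixel m = {0, 0, 0};
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w)
                        continue;  /* neighbour is outside the image */
                    const pixel *p = &in[yy * w + xx];
                    if (p->r > m.r) m.r = p->r;
                    if (p->g > m.g) m.g = p->g;
                    if (p->b > m.b) m.b = p->b;
                }
            }
            out[y * w + x] = m;
        }
    }
}
```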
Following this previous question, Malloc Memory Corruption in C, I now have another problem.
I have the same code. Now I am trying to multiply the values contained in the arrays A * vc
and store them in res. Then A is set to zero, and I do a second multiplication with res and vc and store the values in A. (A and Q are square matrices, and mc and vc are N-row, two-column matrices, or arrays.)
Here is my code :
int jacobi_gpu(double A[], double Q[],
               double tol, long int dim){
    int nrot, p, q, k, tid;
    double c, s;
    double *mc, *vc, *res;
    int i,kc;
    double vc1, vc2;

    mc = (double *)malloc(2 * dim * sizeof(double));
    vc = (double *)malloc(2 * dim * sizeof(double));
    vc = (double *)malloc(dim * dim * sizeof(double));

    if( mc == NULL || vc == NULL){
        fprintf(stderr, "pb allocation matricre\n");
    }

    nrot = 0;
    for(k = 0; k < dim - 1; k++){
        eye(mc, dim);
        eye(vc, dim);
        for(tid = 0; tid < floor(dim /2); tid++){
            p = (tid + k)%(dim - 1);
            if(tid != 0)
                q = (dim - tid + k - 1)%(dim - 1);
            q = dim - 1;
            printf("p = %d | q = %d\n", p, q);
            if(fabs(A[p + q*dim]) > tol){
                symschur2(A, dim, p, q, &c, &s);
                mc[2*tid] = p; vc[2 * tid] = c;
                mc[2*tid + 1] = q; vc[2*tid + 1] = -s;
                mc[2*tid + 2*(dim - 2*tid) - 2] = p; vc[2*tid + 2*(dim - 2*tid) - 2 ] = s;
                mc[2*tid + 2*(dim - 2*tid) - 1] = q; vc[2 * tid + 2*(dim - 2*tid) - 1 ] = c;
            }
        }

        for( i = 0; i< dim; i++){
            for(kc=0; kc < dim; kc++){
                if( kc < floor(dim/2)) {
                    vc1 = vc[2*kc + i*dim];
                    vc2 = vc[2*kc + 2*(dim - 2*kc) - 2];
                }else {
                    vc1 = vc[2*kc+1 + i*dim];
                    vc2 = vc[2*kc - 2*(dim - 2*kc) - 1];
                }
                res[kc + i*dim] = A[mc[2*kc] + i*dim]*vc1 + A[mc[2*kc + 1] + i*dim]*vc2;
            }
        }
        zero(A, dim);
        for( i = 0; i< dim; i++){
            for(kc=0; kc < dim; k++){
                if( k < floor(dim/2)){
                    vc1 = vc[2*kc + i*dim];
                    vc2 = vc[2*kc + 2*(dim - 2*kc) - 2];
                }else {
                    vc1 = vc[2*kc+1 + i*dim];
                    vc2 = vc[2*kc - 2*(dim - 2*kc) - 1];
                }
                A[kc + i*dim] = res[mc[2*kc] + i*dim]*vc1 + res[mc[2*kc + 1] + i*dim]*vc2;
            }
        }
    }
    affiche(mc,dim,2,"Matrice creuse");
    affiche(vc,dim,2,"Valeur creuse");
    return nrot;
}
When I try to compile, I get this error:
jacobi_gpu.c: In function ‘jacobi_gpu’:
jacobi_gpu.c:103: error: array subscript is not an integer
jacobi_gpu.c:103: error: array subscript is not an integer
jacobi_gpu.c:118: error: array subscript is not an integer
jacobi_gpu.c:118: error: array subscript is not an integer
make: *** [jacobi_gpu.o] Erreur 1
The corresponding lines are where I store the results in res and A :
res[kc + i*dim] = A[mc[2*kc] + i*dim]*vc1 + A[mc[2*kc + 1] + i*dim]*vc2;
A[kc + i*dim] = res[mc[2*kc] + i*dim]*vc1 + res[mc[2*kc + 1] + i*dim]*vc2;
Can someone explain what this error means and how I can correct it?
Thanks for your help. ;)
The elements of mc are of type double, but an array subscript has to be of an integral type.
mc is a pointer to double.
A[mc[2*kc + 1]
In the expression above, you are indexing A with a value from mc (a double array), and there are other similar cases. If you are sure the values are whole numbers, cast them to int.
Your declaration of mc:
mc = (double *)malloc(2 * dim * sizeof(double));
And then you use mc multiple times in your array access. For example:
A[mc[2*kc + 1] ...]
Can you change mc to be an int array instead of a double?
Looks like you're using entries in mc, which are doubles, as a part of array subscripts, thus making the entire subscript a double.
If you meant to do this, try casting back to an integer. I don't know what the context of this problem is, but I'd take a real good look at what you're doing to ensure you really want to use the contents of mc as a subscript.
The compiler is complaining because the expression you use as an array index evaluates to type double.
In other words, the expression:
mc[2*kc] + i*dim
...will give you a result which is of type double. You may want to look into the rules for usual arithmetic type conversions in C if you don't understand why this expression evaluates to a double.
The problem is that array indices must be integral types, like int or long. This is because the array subscript operator in C is basically shorthand for pointer arithmetic. In other words, saying array[N] is the same as saying *(array + N). But you can't do pointer arithmetic with non-integral types like float or double, so of course the array subscript operator won't work that way either.
To fix this, you'll need to cast the result of your array-indexing expression to an integral type.
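Here is a minimal sketch of the cast. It assumes every value stored in mc is an exact, non-negative whole number that is a valid column index. gather_cast is a hypothetical helper mirroring one term of the question's loop. Casting the mc element itself is enough, because i * dim is already integral, so the whole subscript becomes an integer.

```c
/* mc stores indices as doubles, as in the question; convert before subscripting. */
double gather_cast(const double A[], const double mc[], long dim, long i, int kc)
{
    long col = (long)mc[2 * kc];  /* double -> long conversion truncates toward zero */
    return A[col + i * dim];      /* subscript is now of integral type */
}
```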
mc is an array of doubles, and floating point values cannot be used to index arrays. I notice that nowhere in your code do you assign anything other than integers to mc. You should consider changing mc's type to an array of integers.
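Here is a minimal sketch of that change. mc holds int indices, so mc[2*kc] + i*dim is already an integer and no cast is needed. row_combo is a hypothetical helper reproducing one line of the question's loop.

```c
/* One element of res, computed with an int index array mc. */
double row_combo(const double A[], const int mc[], long dim, long i, int kc,
                 double vc1, double vc2)
{
    return A[mc[2 * kc] + i * dim] * vc1 + A[mc[2 * kc + 1] + i * dim] * vc2;
}
```

mc would then be allocated as, for example, int *mc = malloc(2 * dim * sizeof *mc);. Any helper that fills it, such as eye in the question, would also need to take an int array.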