I want to implement a convolution function to use in mean filter and gaussian filter and I need to implement those 2 filters as well to apply to pgm files.
I have
typedef struct _PGM{
int row;
int col;
int max_value;
int **matrix;
}PGM;
struct and
int convolution(int ** kernel,int ksize, PGM * image, PGM * output){
int i, j, x, y;
int sum;
int data;
int scale =ksize*ksize;
int coeff;
for (x=ksize/2; x<image->row-ksize/2;++x) {
for (y=ksize/2; y<image->col-ksize/2; ++y){
sum = 0;
for (i=-ksize/2; i<=ksize/2; ++i){
for (j=-ksize/2; j<=ksize/2; ++j){
data = image->matrix[x +i][y +j];
coeff = kernel[i+ksize/2][j+ksize/2];
sum += data * coeff;
}
}
output->matrix[x][y] = sum / scale;
}
}
return sum/scale;
}
convolution function, but the program terminates inside the convolution function, so I could not proceed to the filters.
Can you help me with the implementation ?
Thank you.
In your convolution there are two things wrong that probably aren't causing the crash. The first is style: You're using x to iterate over the rows of an image, something I picture more as a y displacement, and vice-versa. The second is that when you're computing the sum, you're not resetting the variable sum = 0 prior to evaluating the kernel (the inner two loops) for each pixel. Instead you accumulate sum over all pixels, probably eventually causing integer overflow. While strictly speaking this is UB and could cause a crash, it's not the issue you're facing.
If you would kindly confirm that the crash occurs on the first pixel (x = ksize/2, y = ksize/2), then since the crash occurs at the first coefficient read from the kernel, I suspect you may have passed the "wrong thing" as the kernel. As presented, the kernel is an int**. For a kernel size of 3x3, this means that to call this function correctly, you must have allocated on the heap or stack an array of int*, where you stored 3 pointers to int arrays with 3 coefficients each. If you instead passed an int[3][3] array, the convolution function will attempt to interpret the first one or two ints in the array as a pointer to an int when it is not, and try to dereference it to pull in the coefficient. This will most likely cause a segfault.
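To make the int** point concrete, here is a minimal sketch of wrapping a true int[3][3] in an array of row pointers, so it can be passed to a function with the question's signature. The names make_row_pointers and kernel_at are hypothetical; kernel_at only stands in for the kernel[i][j] read inside convolution.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel read in the question's convolution():
 * it dereferences twice, exactly like kernel[i + ksize/2][j + ksize/2]. */
int kernel_at(int **kernel, int row, int col)
{
    return kernel[row][col];
}

/* Fill rows[] with pointers to the rows of a true 3x3 array, so that
 * rows can be passed where an int** is expected. */
void make_row_pointers(int k[3][3], int *rows[3])
{
    size_t r;
    for (r = 0; r < 3; ++r) {
        rows[r] = k[r];   /* k[r] decays to a pointer to its first int */
    }
}
```

With this in place, a call would look like make_row_pointers(k, rows); convolution(rows, 3, &image, &output);. Passing k itself, or casting it to int**, is the mistake described above.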
I also don't know why you are returning the accumulated sum. This isn't a "traditional" output of convolution, but I surmise you were interested in the average brightness of the output image, which is legitimate; In this case you should use a separate and wider integer accumulator (long or long long) and, at the end, divide it by the number of pixels in the output.
You probably found the PGM data structure from the internet, say, here. Allow me to part with this best-practice advice. In my field (computer vision), the computer vision library of choice, OpenCV, does not express a matrix as an array of row pointers to buffers of col elements. Instead, a large slab of memory is allocated, in this case of size image->row * image->col * sizeof(int) at a minimum, but often image->row * image->step * sizeof(int) where image->step is image->col rounded up to the next multiple of 4 or 16. Then, only a single pointer is kept, a pointer to the base of the entire image, although an extra field (the step) has to be kept if images aren't continuous.
I would therefore rework your code thus:
/* Includes */
#include <stdlib.h>
/* Defines */
#define min(a, b) (((a) < (b)) ? (a) : (b))
#define max(a, b) (((a) > (b)) ? (a) : (b))
/* Structure */
/**
* Mat structure.
*
* Stores the number of rows and columns in the matrix, the step size
* (number of elements to jump from one row to the next; must be larger than or
* equal to the number of columns), and a pointer to the first element.
*/
typedef struct Mat{
int rows;
int cols;
int step;
int* data;
} Mat;
/* Functions */
/**
* Allocation. Allocates a matrix big enough to hold rows * cols elements.
*
* If a custom step size is wanted, it can be given. Otherwise, an invalid one
* can be given (such as 0 or -1), and the step size will be chosen
* automatically.
*
* If a pointer to existing data is provided, don't bother allocating fresh
* memory. However, in that case, rows, cols and step must all be provided and
* must be correct.
*
* @param [in] rows The number of rows of the new Mat.
* @param [in] cols The number of columns of the new Mat.
* @param [in] step The step size of the new Mat. For newly-allocated
* images (existingData == NULL), can be <= 0, in
* which case a default step size is chosen; For
* pre-existing data (existingData != NULL), must be
* provided.
* @param [in] existingData A pointer to existing data. If NULL, a fresh buffer
* is allocated; Otherwise the given data is used as
* the base pointer.
* @return An allocated Mat structure.
*/
Mat allocMat(int rows, int cols, int step, int* existingData){
Mat M;
M.rows = max(rows, 0);
M.cols = max(cols, 0);
M.step = max(step, M.cols);
if(rows <= 0 || cols <= 0){
M.data = 0;
}else if(existingData == 0){
M.data = malloc(M.rows * M.step * sizeof(*M.data));
}else{
M.data = existingData;
}
return M;
}
/**
* Convolution. Convolves input by the given kernel (centered) and stores
* to output. Does not handle boundaries (i.e., in locations near the border,
* leaves output unchanged).
*
* @param [in] input The input image.
* @param [in] kern The kernel. Both width and height must be odd.
* @param [out] output The output image.
* @return Average brightness of output.
*
* Note: None of the image buffers may overlap with each other.
*/
int convolution(const Mat* input, const Mat* kern, Mat* output){
int i, j, x, y;
int coeff, data;
int sum;
int avg;
long long acc = 0;
/* Short forms of the image dimensions */
const int iw = input ->cols, ih = input ->rows, is = input ->step;
const int kw = kern ->cols, kh = kern ->rows, ks = kern ->step;
const int ow = output->cols, oh = output->rows, os = output->step;
/* Kernel half-sizes and number of elements */
const int kw2 = kw/2, kh2 = kh/2;
const int kelem = kw*kh;
/* Left, right, top and bottom limits */
const int l = kw2,
r = max(min(iw-kw2, ow-kw2), l),
t = kh2,
b = max(min(ih-kh2, oh-kh2), t);
/* Total number of pixels */
const int totalPixels = (r-l)*(b-t);
/* Input and output pointers start at row t (the first processed row), */
/* kernel pointer at the kernel's center element. */
const int* iPtr = input ->data + is*t;
const int* kPtr = kern ->data + kw2 + ks*kh2;
int* oPtr = output->data + os*t;
/* Iterate over pixels of image */
for(y=t; y<b; y++){
for(x=l; x<r; x++){
sum = 0;
/* Iterate over elements of kernel */
for(i=-kh2; i<=kh2; i++){
for(j=-kw2; j<=kw2; j++){
data = iPtr[j + is*i + x];
coeff = kPtr[j + ks*i ];
sum += data * coeff;
}
}
/* Compute average. Add to accumulator and store as output. */
avg = sum / kelem;
acc += avg;
oPtr[x] = avg;
}
/* Bump pointers by one row step. */
iPtr += is;
oPtr += os;
}
/* Compute average brightness over entire output */
if(totalPixels == 0){
avg = 0;
}else{
avg = acc/totalPixels;
}
/* Return average brightness */
return avg;
}
/**
* Main
*/
int main(int argc, char* argv[]){
/**
* Coefficients of K. Binomial 3x3, separable. Unnormalized (weight = 16).
* Step = 3.
*/
int Kcoeff[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
Mat I = allocMat(1080, 1920, 0, 0);/* FullHD 1080p: 1080 rows x 1920 columns */
Mat O = allocMat(1080, 1920, 0, 0);/* FullHD 1080p: 1080 rows x 1920 columns */
Mat K = allocMat( 3, 3, 3, &Kcoeff[0][0]);
/* Fill Mat I with something.... */
/* Convolve with K... */
int avg = convolution(&I, &K, &O);
/* Do something with O... */
/* Return */
return 0;
}
Reference: Years of experience in computer vision.
I have to write a function that receives an integer X (the value of the linearized position) and an array that contains the dimensions of a multidimensional array and it has to save in a second array the coordinates of the element with position X in the multidimensional reference system. For example:
X=2 and array[]={A,B} where the array contains the dimensions(A,B) of a 2D matrix in this example. So the position in 2D reference system is:
Knowing: X=x*B+y ----> x=X/B and y=X%B ----> position[]={x,y};
So it was simple to decipher X into x and y because it was the banal case of a 2D matrix but my program has to deal with N-dimensional matrix (So it has to decipher X into position x,y,......,n) .
My idea is to apply the algorithm that I've showed but even I can't find a C code that can deal with a generic N-dimensional matrix (I also tried to write a recursive function without success).
Can someone find a solution to this problem? (Thank you in advance!!!)
I'm a beginner!!!
If you have an array DIM2[X,Y] with dimensions Xn and Yn, you could represent this (as you said) as a one-dimensional array too.
A[x,y] would then be mapped to DIM1[x + y * Xn]
DIM1 must have size (Xn * Yn)
3 dimension array B[] with dimensions Xn,Yn,Zn could be mapped the same way:
B[x,y,z] would map to DIM1 [ x + y * Xn + z * Xn * Yn], DIM1 must be able to hold (Xn * Yn * Zn) items,
B[x,y,z,a] would map to DIM1 [ x + y * Xn + z * Xn * Yn + a * Xn * Yn *
Zn]
and so on
For a generic N-dimensional array, a recursion would be best, where an array with 100 dimensions is an array of 99-dimensional arrays. If all dimensions have the same size, that is relatively simple. (While writing it, I also noticed that the recursion can easily be unrolled into a simple for-loop; find it below.)
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#define max_depth 5 /* 5 dimensions */
#define size 10 /* array[10] of array */
// recursive part, do not use this one
int _getValue( int *base, int offset, int current, int *coords) {
if (--current)
return _getValue (base + *coords*offset, offset/size, current, coords+1);
return base[*coords];
}
// recursive part, do not use this one
void _setValue( int *base, int offset, int current, int *coords, int newVal) {
if (--current) {
_setValue (base + *coords*offset, offset/size, current, coords+1, newVal);
return; /* only the innermost level stores the value */
}
base[*coords]=newVal;
}
// getValue: read item
int getValue( int *base, int *coords) {
int offset=pow( size, max_depth-1); /* amount of ints to skip for first dimension */
return (_getValue (base, offset, max_depth, coords));
}
// setValue: set an item
void setValue( int *base, int *coords, int newVal) {
int offset=pow( size, max_depth-1);
_setValue (base, offset, max_depth, coords, newVal);
}
int main() {
int items_needed = pow( size, max_depth);
printf ("allocating room for %i items\n", items_needed);
int *dataholder = calloc(items_needed, sizeof *dataholder); /* zero-initialised, so reading an unset item is defined */
if (!dataholder) {
fprintf (stderr,"out of memory\n");
return 1;
}
int coords1[5] = { 3,1,2,1,1 }; // access member [3,1,2,1,1]
setValue(dataholder, coords1, 4711);
int coords2[5] = { 3,1,0,4,2 };
int x = getValue(dataholder, coords2);
int coords3[5] = { 9,7,5,3,9 };
/* or: access without recursion: */
int i, posX = 0; // position of the wanted integer
int skip = pow( size, max_depth-1); // amount of integers to be skipped for "pick"ing array
for (i=0;i<max_depth; i++) {
posX += coords3[i] * skip; // use array according to current coordinate
skip /= size; // calculate next dimension's size
}
x = dataholder[posX];
return x;
}
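The code above maps coordinates to a linear position. The question asks for the opposite direction, so here is a minimal sketch of that inverse for dimensions of arbitrary (possibly different) sizes. It assumes the last dimension varies fastest, which matches X = x*B + y from the question; the function names delinearize and linearize are hypothetical.

```c
#include <stddef.h>

/* Decode linear index X into n coordinates.
 * dims[k] is the extent of dimension k; the last dimension varies fastest.
 * Returns 0 on success, -1 if X is out of range or a dimension is zero. */
int delinearize(size_t X, const size_t *dims, size_t n, size_t *position)
{
    size_t k;
    for (k = n; k-- > 0; ) {          /* walk from the last dimension to the first */
        if (dims[k] == 0) {
            return -1;
        }
        position[k] = X % dims[k];    /* coordinate within this dimension */
        X /= dims[k];                 /* what is left belongs to the outer dimensions */
    }
    return (X == 0) ? 0 : -1;         /* leftover means X was too large */
}

/* Inverse operation: encode coordinates back into a linear index. */
size_t linearize(const size_t *position, const size_t *dims, size_t n)
{
    size_t k, X = 0;
    for (k = 0; k < n; ++k) {
        X = X * dims[k] + position[k];
    }
    return X;
}
```

No recursion is needed: each step peels off one coordinate with a modulo and a division, just like the 2D case x = X/B, y = X%B.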
If I have a GSL matrix that has already had its memory allocated is there a simple way to reallocate that memory to, e.g., add another row?
The two ways I can think of to do it are:
size_t n = 2;
gsl_matrix *invV = gsl_matrix_alloc(n, n);
// do something with matrix
...
// try and add another row (of length n) by reallocating data in the structure
invV->data = realloc(invV->data, sizeof(double)*(n*n + n));
invV->size1++;
or (using matrix views):
size_t n = 2;
double *invV = malloc(sizeof(double)*n*n);
gsl_matrix_view invVview = gsl_matrix_view_array(invV, n, n);
// do something with &invVview.matrix
...
// try adding another row of length n
invV = realloc(invV, sizeof(double)*(n*n + n));
invVview = gsl_matrix_view_array(invV, n+1, n);
I don't know if there are issues with the first method due to not changing the tda and block values in the gsl_matrix structure. Does anyone know if this would be a problem?
The second method works fine, but it's a pain having to switch back and forth between the double array and the matrix view.
Other suggestions are welcomed.
Update:
I have a simple test code using a version of my first option (called, e.g., testgsl.c):
#include <stdio.h>
#include <stdlib.h>
#include <gsl/gsl_matrix.h>
gsl_matrix *matrix_add_row( gsl_matrix *m);
gsl_matrix *matrix_add_row( gsl_matrix *m ){
if ( !m ){
fprintf(stderr, "gsl_matrix must have already been initialised before adding new rows" );
return NULL;
}
size_t n = m->tda; /* current number of columns in matrix */
/* reallocate the memory of the block */
m->block->data = (double *)realloc(m->block->data, sizeof(double)*(m->block->size + n));
if( !m->block->data ){
fprintf(stderr, "Could not reallocate memory for gsl_matrix!");
exit(1);
}
m->block->size += n; /* update block size (number of elements) */
m->size1++; /* update number of rows */
m->data = m->block->data; /* point data to block->data */
return m;
}
int main( int argc, char **argv){
size_t nrows = 4;
size_t ncols = 1000;
gsl_matrix *invV = gsl_matrix_alloc(nrows, ncols);
//gsl_matrix *testmatrix = gsl_matrix_alloc(1000, 4000);
/* set to zeros */
gsl_matrix_set_zero( invV );
/* try adding a row */
invV = matrix_add_row( invV );
fprintf(stderr, "nrows = %zu, ncols = %zu\n", invV->size1, invV->size2);
/* set some values */
gsl_matrix_set_zero( invV );
gsl_matrix_set( invV, 4, 0, 2.3 );
gsl_matrix_set( invV, 4, 1, 1.2 );
gsl_matrix_free( invV );
//gsl_matrix_free( testmatrix );
return 0;
}
This seems to work fine (although I think there are some underlying memory allocation issues that might arise).
Brian Gough, one of the authors of the gsl_matrix code, seems to suggest that my second option - reallocing arrays and just using vector/matrix views when necessary - is the way to go.
I'm working on a C implementation for Conway's game of life, I have been asked to use the following header:
#ifndef game_of_life_h
#define game_of_life_h
#include <stdio.h>
#include <stdlib.h>
// a structure containing a square board for the game and its size
typedef struct gol{
int **board;
size_t size;
} gol;
// dynamically creates a struct gol of size 20 and returns a pointer to it
gol* create_default_gol();
// creates dynamically a struct gol of a specified size and returns a pointer to it.
gol* create_gol(size_t size);
// destroy gol structures
void destroy_gol(gol* g);
// the board of 'g' is set to 'b'. You do not need to check if 'b' has a proper size and values
void set_pattern(gol* g, int** b);
// using rules of the game of life, the function sets next pattern to the g->board
void next_pattern(gol* g);
/* returns sum of all the neighbours of the cell g->board[i][j]. The function is an auxiliary
function and should be used in the following function. */
int neighbour_sum(gol* g, int i, int j);
// prints the current pattern of the g-board on the screen
void print(gol* g);
#endif
I have added the comments to help out with an explanation of what each bit is.
gol.board is a 2-level integer array, containing x and y coordinates, ie board[x][y], each coordinate can either be a 1 (alive) or 0 (dead).
This was all a bit of background information, I'm trying to write my first function create_default_gol() that will return a pointer to a gol instance, with a 20x20 board.
I then attempt to go through each coordinate through the 20x20 board and set it to 0, I am getting a Segmentation fault (core dumped) when running this program.
The below code is my c file containing the core code, and the main() function:
#include "game_of_life.h"
int main()
{
// Create a 20x20 game
gol* g_temp = create_default_gol();
int x,y;
for (x = 0; x < 20; x++)
{
for (y = 0; y < 20; y++)
{
g_temp->board[x][y] = 0;
}
}
free(g_temp);
}
// return a pointer to a 20x20 game of life
gol* create_default_gol()
{
gol* g_rtn = malloc(sizeof(*g_rtn) + (sizeof(int) * 20 * 20));
return g_rtn;
}
This is the first feature I'd like to implement, being able to generate a 20x20 board with 0's (dead) state for every coordinate.
Please feel free to criticise my code, I'm looking to determine why I'm getting the segmentation fault, and if I'm allocating memory properly in the create_default_gol() function.
Thanks!
The type int **board; means that board must contain an array of pointers, each of which points to the start of each row. Your existing allocation omits this, and just allocates *g_rtn plus the ints in the board.
The canonical way to allocate your board, supposing that you must stick to the type int **board;, is:
gol* g_rtn = malloc(sizeof *g_rtn);
g_rtn->size = size;
g_rtn->board = malloc(size * sizeof *g_rtn->board);
for (int i = 0; i < size; ++i)
g_rtn->board[i] = malloc(size * sizeof **g_rtn->board);
This code involves a lot of small malloc chunks. You could condense the board rows and columns into a single allocation, but then you also need to set up pointers to the start of each row, because board must be an array of pointers to int.
Another issue with this approach is alignment. It's guaranteed that a malloc result is aligned for any type; however it is possible that int has stricter alignment requirements than int *. My following code assumes that it doesn't; if you want to be portable then you could add in some compile-time checks (or run it and see if it aborts!).
The amount of memory required is the sum of the last two mallocs:
g_rtn->board = malloc( size * size * sizeof **g_rtn->board
+ size * sizeof *g_rtn->board );
Then the first row will start after the end of the row-pointers (a cast is necessary because we are converting int ** to int *, and using void * means we don't have to repeat the word int):
g_rtn->board[0] = (void *) (g_rtn->board + size);
And the other rows each have size ints in them:
for (int i = 1; i < size; ++i)
g_rtn->board[i] = g_rtn->board[i-1] + size;
Note that this is a whole lot more complicated than just using a 1-D array and doing arithmetic for the offsets, but it was stipulated that you must have two levels of indirection to access the board.
Also this is more complicated than the "canonical" version. In this version we are trading code complexity for the benefit of having a reduced number of mallocs. If your program typically only allocates one board, or a small number of boards, then perhaps this trade-off is not worth it and the canonical version would give you fewer headaches.
Finally - it would be possible to allocate both *g_rtn and the board in the single malloc, as you attempted to do in your question. However my advice (based on experience) is that it is simpler to keep the board separate. It makes your code clearer, and your object easier to use and make changes to, if the board is a separate allocation to the game object.
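Putting the pieces of the single-allocation variant together, a minimal sketch could look like this. It keeps the game object separate from the board, as recommended above, and relies on the same alignment assumption (int is not more strictly aligned than int *). The struct is copied from the question's header; the error handling and the assumption size > 0 are additions for illustration.

```c
#include <stdlib.h>

typedef struct gol {
    int **board;
    size_t size;
} gol;

/* One malloc for the game object, one for row pointers plus all cells. */
gol *create_gol(size_t size)
{
    size_t i;
    int *cells;
    gol *g = malloc(sizeof *g);
    if (g == NULL) {
        return NULL;
    }
    g->size = size;
    /* size row pointers, followed by size*size ints */
    g->board = malloc(size * sizeof *g->board
                      + size * size * sizeof **g->board);
    if (g->board == NULL) {
        free(g);
        return NULL;
    }
    cells = (void *)(g->board + size);   /* first int after the row pointers */
    for (i = 0; i < size; ++i) {
        g->board[i] = cells + i * size;  /* row i starts i*size ints in */
    }
    for (i = 0; i < size * size; ++i) {
        cells[i] = 0;                    /* every cell starts dead */
    }
    return g;
}

/* Two allocations were made, so two frees are needed. */
void destroy_gol(gol *g)
{
    if (g != NULL) {
        free(g->board);
        free(g);
    }
}
```

A default board is then just create_gol(20), and g->board[x][y] works with two levels of indirection as the header requires.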
create_default_gol() fails to initialise board, so applying the [] operator to it (in main()) makes the program access invalid memory, which provokes undefined behaviour.
Although enough memory is allocated, the code still needs to make board point to the memory by doing
g_rtn->board = (int **) (((char*) g_rtn) + sizeof(*g_rtn));
Update
As pointed out in Matt McNabb's comment, board points to an array of pointers to int, so initialisation is more complicated:
gol * g_rtn = malloc(sizeof(*g_rtn) + 20 * sizeof(*g_rtn->board));
g_rtn->board = (int **) (((char*) g_rtn) + sizeof(*g_rtn));
for (size_t i = 0; i<20; ++i)
{
g_rtn->board[i] = malloc(20 * sizeof(*g_rtn->board[i]));
}
Also the code fails to set gol's member size. From what you tell us it is not clear whether it shall hold the number of bytes, rows/columns or fields.
Also^2 coding "magic numbers" like 20 is a bad habit.
Also^3 create_default_gol does not specify any parameters, which explicitly allows any number, and not none as you might perhaps have expected.
All in all I'd code create_default_gol() like this:
gol * create_default_gol(const size_t rows, const size_t columns)
{
gol * g_rtn = NULL;
size_t size_rows = rows * sizeof(*g_rtn->board);
size_t size_columns = columns * sizeof(**g_rtn->board);
g_rtn = malloc(sizeof(*g_rtn) + size_rows);
if (NULL != g_rtn)
{
g_rtn->board = (int **) (((char*) g_rtn) + sizeof(*g_rtn)); /* row pointers live right after the struct */
for (size_t i = 0; i<rows; ++i)
{
g_rtn->board[i] = malloc(size_columns); /* TODO: Add error checking here. */
}
g_rtn->size = rows; /* Or what ever this attribute is meant for. */
}
return g_rtn;
}
gol* create_default_gol()
{
int **a,i;
a = (int**)malloc(20 * sizeof(int *));
for (i = 0; i < 20; i++)
a[i] = (int*)malloc(20 * sizeof(int));
gol* g_rtn = (gol*)malloc(sizeof(*g_rtn));
g_rtn->board = a;
return g_rtn;
}
int main()
{
// Create a 20x20 game
gol* g_temp = create_default_gol();
int x,y;
for (x = 0; x < 20; x++)
{
for (y = 0; y < 20; y++)
{
g_temp->board[x][y] = 10;
}
}
for(x=0;x<20;x++)
free(g_temp->board[x]);
free(g_temp->board);
free(g_temp);
}
int main (void)
{
gol* gameOfLife;
gameOfLife = create_default_gol();
free(gameOfLife);
}
gol* create_default_gol()
{
int size = 20;
gol* g_rtn = malloc(sizeof *g_rtn);
g_rtn->size = size;
g_rtn->board = malloc(size * sizeof *g_rtn->board);
int i, b;
for (i = 0; i < size; ++i){
g_rtn->board[i] = malloc(sizeof (int) * size);
for(b=0;b<size;b++){
g_rtn->board[i][b] = 0;
}
}
return g_rtn;
}
Alternatively, since you also need to add a create_gol(size_t new_size) of custom size, you could also write it as the following.
int main (void)
{
gol* gameOfLife;
gameOfLife = create_default_gol();
free(gameOfLife);
}
gol* create_default_gol()
{
size_t size = 20;
return create_gol(size);
}
gol* create_gol(size_t new_size)
{
gol* g_rtn = malloc(sizeof *g_rtn);
g_rtn->size = new_size;
g_rtn->board = malloc(new_size * sizeof *g_rtn->board);
size_t i, b;
for (i = 0; i < new_size; ++i){
g_rtn->board[i] = malloc(sizeof (int) * new_size);
for(b=0;b<new_size;b++){
g_rtn->board[i][b] = 0;
}
}
return g_rtn;
}
Doing this just minimizes the amount of code needed.
/**
* BLOCK_LOW
* Returns the offset of a local array
* with regards to block decomposition
* of a global array.
*
* @param (int) process rank
* @param (int) total number of processes
* @param (int) size of global array
* @return (int) offset of local array in global array
*/
#define BLOCK_LOW(id, p, n) ((id)*(n)/(p))
/**
* BLOCK_HIGH
* Returns the index immediately after the
* end of a local array with regards to
* block decomposition of a global array.
*
* @param (int) process rank
* @param (int) total number of processes
* @param (int) size of global array
* @return (int) offset after end of local array
*/
#define BLOCK_HIGH(id, p, n) (BLOCK_LOW((id)+1, (p), (n)))
/**
* BLOCK_SIZE
* Returns the size of a local array
* with regards to block decomposition
* of a global array.
*
* @param (int) process rank
* @param (int) total number of processes
* @param (int) size of global array
* @return (int) size of local array
*/
#define BLOCK_SIZE(id, p, n) ((BLOCK_HIGH((id), (p), (n))) - (BLOCK_LOW((id), (p), (n))))
/**
* BLOCK_OWNER
* Returns the rank of the process that
* handles a certain local array with
* regards to block decomposition of a
* global array.
*
* @param (int) index in global array
* @param (int) total number of processes
* @param (int) size of global array
* @return (int) rank of process that handles index
*/
#define BLOCK_OWNER(i, p, n) (((p)*((i)+1)-1)/(n))
/*Matricefilenames:
small matrix A.bin of dimension 100 × 50
small matrix B.bin of dimension 50 × 100
large matrix A.bin of dimension 1000 × 500
large matrix B.bin of dimension 500 × 1000
An MPI program should be implemented such that it can
• accept two file names at run-time,
• let process 0 read the A and B matrices from the two data files,
• let process 0 distribute the pieces of A and B to all the other processes,
• involve all the processes to carry out the the chosen parallel algorithm
for matrix multiplication C = A * B ,
• let process 0 gather, from all the other processes, the different pieces
of C ,
• let process 0 write out the entire C matrix to a data file.
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "mpi-utils.c"
void read_matrix_binaryformat (char*, double***, int*, int*);
void write_matrix_binaryformat (char*, double**, int, int);
void create_matrix (double***,int,int);
void matrix_multiplication (double ***, double ***, double ***,int,int, int);
int main(int argc, char *argv[]) {
int id,p; // Process rank and total amount of processes
int rowsA, colsA, rowsB, colsB; // Matrix dimensions
double **A; // Matrix A
double **B; // Matrix B
double **C; // Result matrix C : AB
int local_rows; // Local row dimension of the matrix A
double **local_A; // The local A matrix
double **local_C; // The local C matrix
MPI_Init (&argc, &argv);
MPI_Comm_rank (MPI_COMM_WORLD, &id);
MPI_Comm_size (MPI_COMM_WORLD, &p);
if(argc != 3) {
if(id == 0) {
printf("Usage:\n>> %s matrix_A matrix_B\n",argv[0]);
}
MPI_Finalize();
exit(1);
}
if (id == 0) {
read_matrix_binaryformat (argv[1], &A, &rowsA, &colsA);
read_matrix_binaryformat (argv[2], &B, &rowsB, &colsB);
}
if (p == 1) {
create_matrix(&C,rowsA,colsB);
matrix_multiplication (&A,&B,&C,rowsA,colsB,colsA);
char* filename = "matrix_C.bin";
write_matrix_binaryformat (filename, C, rowsA, colsB);
free(A);
free(B);
free(C);
MPI_Finalize();
return 0;
}
// For this assignment we have chosen to bcast the whole matrix B:
MPI_Bcast (&B, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast (&colsA, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast (&colsB, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast (&rowsA, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast (&rowsB, 1, MPI_INT, 0, MPI_COMM_WORLD);
local_rows = BLOCK_SIZE(id, p, rowsA);
/* SCATTER VALUES */
int *proc_elements = (int*)malloc(p*sizeof(int)); // amount of elements for each processor
int *displace = (int*)malloc(p*sizeof(int)); // displacement of elements for each processor
int i;
for (i = 0; i<p; i++) {
proc_elements[i] = BLOCK_SIZE(i, p, rowsA)*colsA;
displace[i] = BLOCK_LOW(i, p, rowsA)*colsA;
}
create_matrix(&local_A,local_rows,colsA);
MPI_Scatterv(&A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,&local_A[0],
local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);
/* END SCATTER VALUES */
create_matrix (&local_C,local_rows,colsB);
matrix_multiplication (&local_A,&B,&local_C,local_rows,colsB,colsA);
/* GATHER VALUES */
MPI_Gatherv(&local_C[0], rowsA*colsB, MPI_DOUBLE,&C[0],
&proc_elements[0],&displace[0],MPI_DOUBLE,0, MPI_COMM_WORLD);
/* END GATHER VALUES */
char* filename = "matrix_C.bin";
write_matrix_binaryformat (filename, C, rowsA, colsB);
free (proc_elements);
free (displace);
free (local_A);
free (local_C);
free (A);
free (B);
free (C);
MPI_Finalize ();
return 0;
}
void create_matrix (double ***C,int rows,int cols) {
*C = (double**)malloc(rows*sizeof(double*));
(*C)[0] = (double*)malloc(rows*cols*sizeof(double));
int i;
for (i=1; i<rows; i++)
(*C)[i] = (*C)[i-1] + cols;
}
void matrix_multiplication (double ***A, double ***B, double ***C, int rowsC,int colsC,int colsA) {
double sum;
int i,j,k;
for (i = 0; i < rowsC; i++) {
for (j = 0; j < colsC; j++) {
sum = 0.0;
for (k = 0; k < colsA; k++) {
sum = sum + (*A)[i][k]*(*B)[k][j];
}
(*C)[i][j] = sum;
}
}
}
/* Reads a 2D array from a binary file*/
void read_matrix_binaryformat (char* filename, double*** matrix, int* num_rows, int* num_cols) {
int i;
FILE* fp = fopen (filename,"rb");
fread (num_rows, sizeof(int), 1, fp);
fread (num_cols, sizeof(int), 1, fp);
/* storage allocation of the matrix */
*matrix = (double**)malloc((*num_rows)*sizeof(double*));
(*matrix)[0] = (double*)malloc((*num_rows)*(*num_cols)*sizeof(double));
for (i=1; i<(*num_rows); i++)
(*matrix)[i] = (*matrix)[i-1]+(*num_cols);
/* read in the entire matrix */
fread ((*matrix)[0], sizeof(double), (*num_rows)*(*num_cols), fp);
fclose (fp);
}
/* Writes a 2D array in a binary file */
void write_matrix_binaryformat (char* filename, double** matrix, int num_rows, int num_cols) {
FILE *fp = fopen (filename,"wb");
fwrite (&num_rows, sizeof(int), 1, fp);
fwrite (&num_cols, sizeof(int), 1, fp);
fwrite (matrix[0], sizeof(double), num_rows*num_cols, fp);
fclose (fp);
}
My task is to do a parallel matrix multiplication of matrix A and B and gather the results in matrix C.
I am doing this by dividing matrix A in rowwise pieces and each process is going to use its piece to multiply matrix B, and get back its piece from the multiplication. Then I am going to gather all the pieces from the processes and put them together to matrix C.
I already posted a similar question. This code is improved and I have made progress, but I am still getting a segmentation fault after the scatterv call.
So I see a few problems right away:
MPI_Bcast (&B, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
Here, you're passing not a pointer to doubles, but a pointer to a pointer to a pointer to a double (B is defined as double **B) and you're telling MPI to follow that pointer and send 1 double from there. That is not going to work.
You might think that what you're accomplishing here is sending the pointer to the matrix, from which all tasks can read the array -- that doesn't work. The processes don't share a common memory space (that's why MPI is called distributed memory programming) and the pointer doesn't go anywhere. You're actually going to have to send the contents of the matrix,
MPI_Bcast (&(B[0][0]), rowsB*colsB, MPI_DOUBLE, 0, MPI_COMM_WORLD);
and you're going to have to make sure the other processes have correctly allocated memory for the B matrix ahead of time.
There are similar pointer problems elsewhere:
MPI_Scatterv(&A[0], ..., &local_A[0]
Again, A is a pointer to a pointer to doubles (double **A) as is local_A, and you need to be pointing MPI to pointer to doubles for this to work, something like
MPI_Scatterv(&(A[0][0]), ..., &(local_A[0][0])
that error seems to be present in all the communications routines.
Remember that anything that looks like (buffer, count, TYPE) in MPI means that the MPI routines follow the pointer buffer and send the next count pieces of data of type TYPE there. MPI can't follow pointers within the buffer you sent because in general it doesn't know they're there. It just takes the next (count * sizeof(TYPE)) bytes from pointer buffer and does whatever communication is appropriate with them. So you have to pass it a pointer to a stream of data of type TYPE.
Having said all that, it would be a lot easier to work with you on this if you had narrowed things down a bit; right now the program you've posted includes a lot of I/O stuff that's irrelevant, and it means that no one can just run your program to see what happens without first figuring out the matrix format and then generating two matrices on their own. When posting a question about source code, you really want to post a (a) small bit of source which (b) reproduces the problem and (c) is completely self-contained.
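Since every count in those (buffer, count, TYPE) triples is a number of elements in a flat buffer, it can help to compute the per-process counts and displacements in one place. The sketch below reuses the question's BLOCK_* macros in plain C, with no MPI calls; the helper name block_counts is hypothetical. Note that the arrays for scattering A depend on colsA, while gathering C needs a second pair built with colsB, because each local piece of C holds local_rows*colsB doubles. As far as I can tell, the posted code reuses the colsA-based arrays for the gather and never allocates C on process 0 when p > 1.

```c
#define BLOCK_LOW(id, p, n)  ((id)*(n)/(p))
#define BLOCK_HIGH(id, p, n) (BLOCK_LOW((id)+1, (p), (n)))
#define BLOCK_SIZE(id, p, n) (BLOCK_HIGH((id), (p), (n)) - BLOCK_LOW((id), (p), (n)))

/* Fill counts[] and displs[] (each of length p) for a row-wise block
 * decomposition of a rows x cols matrix stored as one contiguous buffer.
 * Both are measured in elements, which is what MPI_Scatterv and
 * MPI_Gatherv expect for their count and displacement arrays. */
void block_counts(int p, int rows, int cols, int *counts, int *displs)
{
    int i;
    for (i = 0; i < p; ++i) {
        counts[i] = BLOCK_SIZE(i, p, rows) * cols;  /* elements owned by rank i */
        displs[i] = BLOCK_LOW(i, p, rows) * cols;   /* offset of rank i's first element */
    }
}
```

For example, with p = 3 and rowsA = 10 the ranks own 3, 3 and 4 rows. Calling block_counts(p, rowsA, colsA, ...) gives the arrays for scattering A, and block_counts(p, rowsA, colsB, ...) gives the arrays for gathering C.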
Consider this an extended comment, as Jonathan Dursi has already given a fairly elaborate answer. Your matrices are really represented in a weird way, but at least you followed the advice given to your other question and allocate space for them as contiguous blocks and not separately for each row.
Given that, you should replace:
MPI_Scatterv(&A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,&local_A[0],
local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);
with
MPI_Scatterv(A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,local_A[0],
local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);
A[0] already points to the beginning of the matrix data and there is no need to make a pointer to it. The same goes for local_A[0] as well as for the parameters to the MPI_Gatherv() call.
It has been said many times already - MPI doesn't do pointer chasing and only works with flat buffers.
I've also noticed another mistake in your code - memory for your matrices is not freed correctly. You are only freeing the array of pointers and not the matrix data itself:
free(A);
should really become
free(A[0]); free(A);
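To illustrate both points without MPI, here is a small sketch of the same layout as create_matrix() in the question: m[0] is the flat buffer that can be handed to a communication routine, and freeing takes two calls. The function names are hypothetical, and the NULL checks are an addition; rows and cols are assumed to be positive.

```c
#include <stdlib.h>

/* One array of row pointers plus one contiguous block of rows*cols doubles.
 * Returns NULL on allocation failure. */
double **alloc_matrix(int rows, int cols)
{
    int i;
    double **m = malloc((size_t)rows * sizeof *m);
    if (m == NULL) {
        return NULL;
    }
    m[0] = malloc((size_t)rows * (size_t)cols * sizeof **m);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (i = 1; i < rows; i++) {
        m[i] = m[i-1] + cols;   /* each row starts cols doubles after the previous one */
    }
    return m;
}

/* The flat buffer: element (i, j) lives at flat_buffer(m)[i*cols + j]. */
double *flat_buffer(double **m)
{
    return m[0];
}

/* Free the data block first, then the row-pointer array. */
void free_matrix(double **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

Passing m (or &m[0]) to MPI would send the row pointers themselves, which is the mistake discussed above; flat_buffer(m) is what the routines need.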
I am writing a C-program where I need 2D-arrays (dynamically allocated) with negative indices or where the index does not start at zero. So for an array[i][j] the row-index i should take values from e.g. 1 to 3 and the column-index j should take values from e.g. -1 to 9.
For this purpose I created the following program, here the variable columns_start is set to zero, so just the row-index is shifted and this works really fine.
But when I assign other values than zero to the variable columns_start, I get the message (from valgrind) that the command "free(array[i]);" is invalid.
So my questions are:
Why is it invalid to free the memory that I allocated just before?
How do I have to modify my program to shift the column-index?
Thank you for your help.
#include <stdio.h>
#include <stdlib.h>
main()
{
int **array, **array2;
int rows_end, rows_start, columns_end, columns_start, i, j;
rows_start = 1;
rows_end = 3;
columns_start = 0;
columns_end = 9;
array = malloc((rows_end-rows_start+1) * sizeof(int *));
for(i = 0; i <= (rows_end-rows_start); i++) {
array[i] = malloc((columns_end-columns_start+1) * sizeof(int));
}
array2 = array-rows_start; //shifting row-index
for(i = rows_start; i <= rows_end; i++) {
array2[i] = array[i-rows_start]-columns_start; //shifting column-index
}
for(i = rows_start; i <= rows_end; i++) {
for(j = columns_start; j <= columns_end; j++) {
array2[i][j] = i+j; //writing stuff into array
printf("%i %i %d\n",i, j, array2[i][j]);
}
}
for(i = 0; i <= (rows_end-rows_start); i++) {
free(array[i]);
}
free(array);
}
When you shift the column indices, you overwrite the row pointers stored in the original array. In
array2[i] = array[i-rows_start]-columns_start;
array2[i] and array[i-rows_start] are the same memory cell, because array2 was initialized with array-rows_start.
So the pointers that malloc returned have been replaced by shifted ones, and deallocation requires the reverse shift. Try the following:
free(array[i] + columns_start);
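To make the shift and the reverse shift explicit, here is a small sketch for a single row. The helper names shifted_row_alloc and shifted_row_free are made up for illustration. Note that, as far as the C standard is concerned, the shifted pointer is only well defined when it still points into the allocation (for example start = -1); with a positive start it points before the block, which usually works in practice but is not guaranteed.

```c
#include <stdlib.h>

/* Allocate a row whose valid indices run from start to end inclusive and
 * return the shifted pointer, so that row[start] is the first element.
 * Hypothetical helper; assumes start <= 0 so that the shifted pointer
 * stays inside the allocated block. */
int *shifted_row_alloc(int start, int end)
{
    int *base = malloc((size_t)(end - start + 1) * sizeof *base);
    if (base == NULL)
        return NULL;
    return base - start;      /* shift: base[0] becomes row[start] */
}

/* free() must receive exactly the pointer malloc() returned,
 * so undo the shift before freeing. */
void shifted_row_free(int *row, int start)
{
    if (row != NULL)
        free(row + start);    /* reverse shift recovers the malloc pointer */
}
```

Passing the shifted pointer straight to free is exactly what valgrind reports as an invalid free.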
IMHO, shifting the array pointers like this gives no benefit, while complicating the program logic and inviting errors. Try adjusting the indices on the fly inside a single loop instead.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int a[] = { -1, 41, 42, 43 };
int *b;//you will always read the data via this pointer
b = &a[1];// 1 is becoming the "zero pivot"
printf("zero: %d\n", b[0]);
printf("-1: %d\n", b[-1]);
return EXIT_SUCCESS;
}
If you don't need just a contiguous block, then you may be better off with hash tables instead.
As far as I can see, your free and malloc calls look good. But your shifting doesn't make sense. Why don't you just add an offset to the array index instead of using array2:
int maxNegValue = 10;
int myNegValue = -6;
array[x][myNegValue+maxNegValue] = ...;
This way, the index you actually use is never negative.
For malloc, you allocate (maxNegValue + maxPosValue + 1) * sizeof(...) bytes; the + 1 accounts for index zero when both bounds are inclusive.
OK, I understand now that you need free(array.. + offset); when using your shifting approach, and that's probably not what you want. If you don't need a very fast implementation, I'd suggest using a struct containing the offset and an array, and then writing a function that takes this struct and x/y as arguments to access the array.
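A minimal sketch of the offset idea, assuming logical indices from -10 to 9 inclusive. The names MAX_NEG, MAX_POS, slot and offset_array_alloc are made up for this example.

```c
#include <stdlib.h>

/* Assumed logical index range for this sketch: -MAX_NEG .. MAX_POS. */
enum { MAX_NEG = 10, MAX_POS = 9 };

/* Translate a logical index (possibly negative) to a zero-based slot. */
int slot(int logical)
{
    return logical + MAX_NEG;
}

/* One slot per logical index; the + 1 covers index zero.
 * calloc zero-initializes the storage. */
int *offset_array_alloc(void)
{
    return calloc(MAX_NEG + MAX_POS + 1, sizeof(int));
}
```

The pointer returned by calloc is never modified, so a plain free(array) is all that is needed afterwards, and an access looks like array[slot(-6)] = 42;.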
I don't know why valgrind would complain about that free statement, but there seems to be a lot of pointer juggling going on so it doesn't surprise me that you get this problem in the first place. For instance, one thing which caught my eye is:
array2 = array-rows_start;
This will make array2[0] dereference memory which you didn't allocate. I fear it's just a matter of time until you get the offset calculations wrong and run into this problem.
In one comment you wrote
but in my program I need a lot of these arrays with all different beginning indices, so I hope to find a more elegant solution instead of defining two offsets for every array.
I think I'd hide all this in a matrix helper struct (+ functions) so that you don't have to clutter your code with all the offsets. Consider this in some matrix.h header:
struct matrix; /* opaque type */
/* Allocates a matrix with the given dimensions, sample invocation might be:
*
* struct matrix *m;
* matrix_alloc( &m, -2, 14, -9, 33 );
*/
void matrix_alloc( struct matrix **m, int minRow, int maxRow, int minCol, int maxCol );
/* Releases resources allocated by the given matrix, e.g.:
*
* struct matrix *m;
* ...
* matrix_free( m );
*/
void matrix_free( struct matrix *m );
/* Get/Set the value of some element in the matrix; takes logical (potentially negative)
* coordinates and translates them to zero-based coordinates internally, e.g.:
*
* struct matrix *m;
* ...
* int val = matrix_get( m, 9, -7 );
*/
int matrix_get( struct matrix *m, int row, int col );
void matrix_set( struct matrix *m, int row, int col, int val );
And here's what an implementation might look like (this would be matrix.c):
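#include <stdlib.h>
#include "matrix.h"
struct matrix {
int minRow, maxRow, minCol, maxCol; /* inclusive bounds */
int **elem;
};
/* Parameter order matches the declaration in matrix.h: rows first, then columns. */
void matrix_alloc( struct matrix **m, int minRow, int maxRow, int minCol, int maxCol ) {
/* bounds are treated as inclusive (as in the question), hence the + 1 */
int numRows = maxRow - minRow + 1;
int numCols = maxCol - minCol + 1;
*m = malloc( sizeof( struct matrix ) );
(*m)->minRow = minRow; (*m)->maxRow = maxRow;
(*m)->minCol = minCol; (*m)->maxCol = maxCol;
(*m)->elem = malloc( numRows * sizeof( *(*m)->elem ) );
for ( int i = 0; i < numRows; ++i )
(*m)->elem[i] = malloc( numCols * sizeof( int ) );
/* checking the malloc results omitted for brevity */
}
void matrix_free( struct matrix *m ) {
/* free every row, then the row pointer array, then the struct itself */
for ( int i = 0; i <= m->maxRow - m->minRow; ++i )
free( m->elem[i] );
free( m->elem );
free( m );
}
int matrix_get( struct matrix *m, int row, int col ) {
return m->elem[row - m->minRow][col - m->minCol];
}
void matrix_set( struct matrix *m, int row, int col, int val ) {
m->elem[row - m->minRow][col - m->minCol] = val;
}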
This way you only need to get this stuff right once, in a central place. The rest of your program doesn't have to deal with raw arrays but rather the struct matrix type.
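If you'd rather keep the data in one block, the same idea can be sketched with flat storage. This is a variant, not the implementation described in this answer, and the names flat_matrix, fm_alloc, fm_free and fm_at are hypothetical. It assumes inclusive bounds and does no bounds checking.

```c
#include <stdlib.h>

/* Hypothetical variant of the matrix helper that keeps all elements in a
 * single flat block instead of one allocation per row. */
struct flat_matrix {
    int minRow, maxRow, minCol, maxCol;   /* inclusive logical bounds */
    int numCols;                          /* elements per row */
    int *data;                            /* numRows * numCols elements */
};

struct flat_matrix *fm_alloc(int minRow, int maxRow, int minCol, int maxCol)
{
    if (maxRow < minRow || maxCol < minCol)
        return NULL;

    struct flat_matrix *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;

    size_t numRows = (size_t)(maxRow - minRow + 1);
    size_t numCols = (size_t)(maxCol - minCol + 1);

    /* calloc zero-initializes every element. */
    m->data = calloc(numRows * numCols, sizeof *m->data);
    if (m->data == NULL) {
        free(m);
        return NULL;
    }

    m->minRow = minRow; m->maxRow = maxRow;
    m->minCol = minCol; m->maxCol = maxCol;
    m->numCols = (int)numCols;
    return m;
}

void fm_free(struct flat_matrix *m)
{
    if (m != NULL) {
        free(m->data);
        free(m);
    }
}

/* Pointer to the element at logical (row, col); no bounds checking. */
int *fm_at(struct flat_matrix *m, int row, int col)
{
    size_t r = (size_t)(row - m->minRow);
    size_t c = (size_t)(col - m->minCol);
    return &m->data[r * (size_t)m->numCols + c];
}
```

For the ranges from the question, struct flat_matrix *m = fm_alloc(1, 3, -1, 9); gives a 3 x 11 matrix, *fm_at(m, 1, -1) = 5; writes its first element, and fm_free(m) releases everything with two free calls and no pointer un-shifting.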