Segmentation Fault on MPI_Gather with 2D arrays - c

I have a problem with an MPI code in C.
I think I wrote the right algorithm to process a double loop over a 2D array, but when I try to use MPI_Gather to collect the data from the processes, I get a segmentation fault. Here is the code:
#define NN 4096
#define NM 4096
double global[NN][NM];
void range(int n1, int n2, int nprocs, int irank, int *ista, int *iend){
int iwork1;
int iwork2;
iwork1 = ( n2 - n1 + 1 ) / nprocs;
iwork2 = ( ( n2 - n1 + 1 ) % nprocs );
*ista = irank * iwork1 + n1 + fmin(irank, iwork2);
*iend = *ista + iwork1 - 1;
if ( iwork2 > irank )
iend = iend + 1;
}
void runCalculation(int n, int m, int argc, char** argv)
{
const int iter_max = 1000;
const double tol = 1.0e-6;
double error = 1.0;
int rank, size;
int start, end;
MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
MPI_Comm_size( MPI_COMM_WORLD, &size );
if (size != 16) MPI_Abort( MPI_COMM_WORLD, 1 );
memset(global, 0, n * m * sizeof(double));
if(rank == 0){
for (int j = 0; j < n; j++)
{
global[j][0] = 1.0;
}
}
int iter = 0;
while ( error > tol && iter < iter_max )
{
error = 0.0;
MPI_Bcast(global, NN*NM, MPI_DOUBLE, 0, MPI_COMM_WORLD);
if(iter == 0)
range(1, n, size, rank, &start, &end);
int size = end - start;
double local[size][NM];
memset(local, 0, size * NM * sizeof(double));
for( int j = 1; j < size - 1; j++)
{
for( int i = 1; i < m - 1; i++ )
{
local[j][i] = 0.25 * ( global[j][i+1] + global[j][i-1]
+ global[j-1][i] + global[j+1][i]);
error = fmax( error, fabs(local[j][i] - global[j][i]));
}
}
MPI_Gather(&local[0][0], size*NM, MPI_DOUBLE, &global[0][0], NN*NM, MPI_DOUBLE, 0, MPI_COMM_WORLD);
printf("%d\n", iter);
if(iter % 100 == 0)
printf("%5d, %0.6f\n", iter, error);
iter++;
}
MPI_Finalize();
}
I run this with 4096x4096 arrays. On process rank 0 it produces a segmentation fault at the MPI_Gather line. I checked whether the sizes of the local arrays are OK and they look fine to me.
Edit: added the initialization of local. New segmentation fault:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x10602000
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 19216 on machine_name exited on signal 11 (Segmentation fault).

The recvcount parameter of MPI_Gather indicates the number of items it receives from each process, not the total number of items it receives.
MPI_Gather(&local[0][0], size*NM, MPI_DOUBLE, &global[0][0], NN*NM, MPI_DOUBLE, 0, MPI_COMM_WORLD);
Should be:
MPI_Gather(&local[0][0], size*NM, MPI_DOUBLE, &global[0][0], size*NM, MPI_DOUBLE, 0, MPI_COMM_WORLD);
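For illustration, here is a minimal, self-contained example (the names and sizes are only for this demo, they are not taken from the question) showing that recvcount is the per-process count:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define NCOLS 4   /* demo size, unrelated to the question's NM */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each rank owns exactly one row in this toy example */
    double local[NCOLS];
    for (int i = 0; i < NCOLS; i++)
        local[i] = rank * 100.0 + i;

    /* the root ends up with size * NCOLS doubles in total,
       but recvcount is the count received from EACH process: NCOLS */
    double *gathered = NULL;
    if (rank == 0)
        gathered = malloc((size_t)size * NCOLS * sizeof(double));

    MPI_Gather(local, NCOLS, MPI_DOUBLE,
               gathered, NCOLS, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int r = 0; r < size; r++)
            printf("row from rank %d starts with %.0f\n", r, gathered[r * NCOLS]);
        free(gathered);
    }
    MPI_Finalize();
    return 0;
}
If the per-rank blocks were not all the same size (which range() can produce when the number of rows does not divide evenly by the number of processes), you would need MPI_Gatherv with explicit per-rank counts and displacements instead of MPI_Gather.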

Related

Does MPI_Reduce need an existing pointer for the receive buffer?

The MPI documentation states that the address of the receive buffer (recvbuf) is significant only at root, meaning that the memory need not be allocated in the other processes. This is confirmed by this question.
int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
MPI_Op op, int root, MPI_Comm comm)
At first I thought that recvbuf did not even have to exist: that the memory for recvbuf itself did not have to be allocated (e.g. by dynamic allocation). Unfortunately (it took me a lot of time to understand my mistake!), it turns out that even if the memory it points to is not valid, the pointer itself has to exist.
See below for the code I have in mind, with a version that gives a segfault, and one that does not.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char **argv) {
// MPI initialization
int world_rank, world_size;
MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int n1 = 3, n2 = 10; // Sizes of the 2d arrays
long **observables = (long **) malloc(n1 * sizeof(long *));
for (int k = 0 ; k < n1 ; ++k) {
observables[k] = (long *) calloc(n2, sizeof(long));
for (long i = 0 ; i < n2 ; ++i) {
observables[k][i] = k * i * world_rank; // Whatever
}
}
long **obs_sum; // This will hold the sum on process 0
#ifdef OLD // Version that gives a segfault
if (world_rank == 0) {
obs_sum = (long **) malloc(n2 * sizeof(long *));
for (int k = 0 ; k < n2 ; ++k) {
obs_sum[k] = (long *) calloc(n2, sizeof(long));
}
}
#else // Correct version
// We define all the pointers in all the processes.
obs_sum = (long **) malloc(n2 * sizeof(long *));
if (world_rank == 0) {
for (int k = 0 ; k < n2 ; ++k) {
obs_sum[k] = (long *) calloc(n2, sizeof(long));
}
}
#endif
for (int k = 0 ; k < n1 ; ++k) {
// This is the line that results in a segfault if OLD is defined
MPI_Reduce(observables[k], obs_sum[k], n2, MPI_LONG, MPI_SUM, 0,
MPI_COMM_WORLD);
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
// You may free memory here
return 0;
}
Am I interpreting this correctly? What is the rationale behind this behavior?
The problem is not MPI, but the fact that you are passing obs_sum[k] while obs_sum itself has not been defined/allocated at all (on the non-root ranks).
for (int k = 0 ; k < n1 ; ++k) {
// This is the line that results in a segfault if OLD is defined
MPI_Reduce(observables[k], obs_sum[k], n2, MPI_LONG, MPI_SUM, 0,
MPI_COMM_WORLD);
}
Even if MPI_Reduce() does not use that value on non-root ranks, the generated code still has to evaluate obs_sum[k]: it takes obs_sum (undefined and not allocated), adds k to it and reads the pointer stored there (segfault), which is what would then be passed to MPI_Reduce().
For example, allocating just the array of row pointers should be sufficient for it to work:
#else // Correct version
// We define all the pointers in all the processes.
obs_sum = (long **) malloc(n2 * sizeof(long *));
// try commenting out the following lines
// if (world_rank == 0) {
// for (int k = 0 ; k < n2 ; ++k) {
// obs_sum[k] = (long *) calloc(n2, sizeof(long));
// }
// }
#endif
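As a further illustration (this is a sketch of a variant, not part of the fix above): since recvbuf is only significant at the root, you could even keep the rank-0-only allocation from the OLD version and simply avoid evaluating obs_sum[k] on the other ranks:
for (int k = 0 ; k < n1 ; ++k) {
    /* obs_sum[k] is only evaluated where obs_sum actually exists (rank 0);
       the other ranks pass NULL, which MPI_Reduce() ignores on non-root ranks */
    long *recvptr = (world_rank == 0) ? obs_sum[k] : NULL;
    MPI_Reduce(observables[k], recvptr, n2, MPI_LONG, MPI_SUM, 0,
               MPI_COMM_WORLD);
}
Either way the point is the same: every expression you pass must be evaluable on every rank, even when MPI ignores its value there.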
I would allocate a 2D array as a flat array - I really hate this array-of-arrays representation. Wouldn't this be better?
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char **argv) {
// MPI initialization
int world_rank, world_size;
MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int n1 = 3, n2 = 10; // Sizes of the 2d arrays
long *observables = (long *) malloc(n1*n2*sizeof(long));
for (int k = 0 ; k < n1 ; ++k) {
for (long i = 0 ; i < n2 ; ++i) {
observables[k*n2+i] = k * i * world_rank; // Whatever
}
}
long *obs_sum = NULL; // This will hold the sum on process 0 (NULL rather than C++'s nullptr, since this is C)
if (world_rank == 0) {
obs_sum = (long *) malloc(n1*n2*sizeof(long));
}
MPI_Reduce(observables, obs_sum, n1*n2, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
// You may free memory here
return 0;
}

MPI Search In Array

I'm trying to find a specific value inside an array using parallel search with MPI. When my code finds the value, it shows an error.
ERROR
Assertion failed in file src/mpid/ch3/src/ch3u_buffer.c at line 77: FALSE
memcpy argument memory ranges overlap, dst_=0x7ffece7eb590 src_=0x7ffece7eb590 len_=4
PROGRAM
const char *FILENAME = "input.txt";
const size_t ARRAY_SIZE = 640;
int main(int argc, char **argv)
{
int *array = malloc(sizeof(int) * ARRAY_SIZE);
int rank,size;
MPI_Status status;
MPI_Request request;
int done,myfound,inrange,nvalues;
int i,j,dummy;
/* Let the system do what it needs to start up MPI */
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
myfound=0;
if (rank == 0)
{
createFile();
array = readFile(FILENAME);
}
MPI_Bcast(array, ARRAY_SIZE, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Irecv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &request);
MPI_Test(&request, &done, &status);
nvalues = ARRAY_SIZE / size; //EACH PROCESS RUNS THAT MUCH NUMBER IN ARRAY
i = rank * nvalues; //OFFSET FOR EACH PROCESS INSIDE THE ARRAY
inrange = (i <= ((rank + 1) * nvalues - 1) && i >= rank * nvalues); //LIMIT OF THE OFFSET
while (!done && inrange)
{
if (array[i] == 17)
{
dummy = 1;
for (j = 0; j < size; j++)
{
MPI_Send(&dummy, 1, MPI_INT, j, 1, MPI_COMM_WORLD);
}
printf("P:%d found it at global index %d\n", rank, i);
myfound = 1;
}
printf("P:%d - %d - %d\n", rank, i, array[i]);
MPI_Test(&request, &done, &status);
++i;
inrange = (i <= ((rank + 1) * nvalues - 1) && i >= rank * nvalues);
}
if (!myfound)
{
printf("P:%d stopped at global index %d\n", rank, i - 1);
}
MPI_Finalize();
}
The error is somewhere in here, because when I put a value that is not in the array (for example -5) into the if condition, the program runs smoothly.
dummy = 1;
for (j = 0; j < size; j++)
{
MPI_Send(&dummy, 1, MPI_INT, j, 1, MPI_COMM_WORLD);
}
printf("P:%d found it at global index %d\n", rank, i);
myfound = 1;
Thanks
Your program is invalid with respect to the MPI standard because you use the same buffer (&dummy) for both MPI_Irecv() and MPI_Send().
You can either use two distinct buffers (e.g. dummy_send and dummy_recv), or, since you do not seem to care about the value of dummy, use NULL as the buffer and send/receive zero-size messages.
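Here is a minimal, self-contained sketch of the first option with two distinct buffers (the array contents, sizes and names are just for the demo; run it with a handful of processes, e.g. mpirun -n 4):
#include <stdio.h>
#include <mpi.h>

#define ARRAY_SIZE 64
#define TARGET 17

int main(int argc, char **argv)
{
    int array[ARRAY_SIZE];
    int rank, size, done = 0, found = 0;
    int dummy_send = 1, dummy_recv = 0;   /* two distinct buffers */
    MPI_Request request;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)                        /* fill the data on the root only */
        for (int i = 0; i < ARRAY_SIZE; i++)
            array[i] = i;

    MPI_Bcast(array, ARRAY_SIZE, MPI_INT, 0, MPI_COMM_WORLD);

    /* the notification is received into its own buffer ... */
    MPI_Irecv(&dummy_recv, 1, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &request);

    int nvalues = ARRAY_SIZE / size;      /* assumes no more processes than elements */
    for (int i = rank * nvalues; !done && i < (rank + 1) * nvalues; i++) {
        if (array[i] == TARGET) {
            found = 1;
            /* ... and sent from a different one, so the two never overlap */
            for (int j = 0; j < size; j++)
                MPI_Send(&dummy_send, 1, MPI_INT, j, 1, MPI_COMM_WORLD);
            printf("P:%d found it at global index %d\n", rank, i);
        }
        MPI_Test(&request, &done, &status);
    }
    if (!found)
        printf("P:%d stopped searching\n", rank);

    MPI_Wait(&request, &status);          /* the notification always arrives in this demo */
    MPI_Finalize();
    return 0;
}
The second option works the same way, except that both the MPI_Irecv and the MPI_Send use a NULL buffer and a count of 0, so there is no buffer to overlap at all.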

distributed algorithm in C

I am a beginner in C. I have to create a distributed architecture with the MPI library. Here is the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
int main(int argc, char **argv)
{
int N, w = 1, L = 2, M = 50; // with N number of threads
int T= 2;
int myid;
int buff;
float mit[N][T]; // I initialize a 2d array
for(int i = 0; i < N; ++i){
mit[i][0]= M / (float) N;
for (int j = 1; j < T; ++j){
mit[i][j] = 0;
}
}
float tab[T]; // 1d array
MPI_Status stat;
/*********************************************
start
*********************************************/
MPI_Init(&argc,&argv); // Initialisation
MPI_Comm_size(MPI_COMM_WORLD, &N);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
for(int j = 0; j < T; j++) {
for(int i = 0; i < N; i++) { // I iterate for each slave
if (myid !=0) {
float y = ((float) rand()) / (float) RAND_MAX;
mit[i][j + 1] = mit[i][j]*(1 + w * L * y);
buff=mit[i][j+1];
MPI_Send(&buff, 128, MPI_INT, 0, 0, MPI_COMM_WORLD); // I send the variable buff to the master
buff=0;
}
if( myid == 0 ) { // Master
for(int i = 1; i < N; i++){
MPI_Recv(&buff, 128, MPI_INT, i, 0, MPI_COMM_WORLD, &stat);
tab[j] += buff; // I need to receive all the variables buff sent by the salves, sum them and stock into the tab at the index j
}
printf("\n%.20f\n",tab[j]); // I print the result of the sum at index j
}
}
}
MPI_Finalize();
return 0;
}
I compile the program in the terminal with: mpicc .c -o my_file
Then I start it with 101 processes: mpirun -np 101 my_file_c
But I get the following error in the terminal:
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
It seems that I have a problem with the master but I don't know why...
Any idea ???
Thank you :)
This behavior is very likely the result of a memory corruption.
You cannot do
int buff=mit[i][j+1];
MPI_Send(&buff, 128, MPI_INT, ...);
because buff is a single int: telling MPI to send (and, on the master side, receive) 128 ints starting at its address reads and writes far past that variable, which is where the memory corruption comes from.
Depending on what you want to achieve, you can try instead
int buff=mit[i][j+1];
MPI_Send(&buff, 1, MPI_INT, ...);
// ...
MPI_Recv(&buff, 1, MPI_INT, ...);
or, if you really mean to transfer 128 elements starting at that location (note that mit holds float, so the pointer and the MPI datatype have to match, and there must actually be 128 elements there):
float *buff = &mit[i][j+1];
MPI_Send(buff, 128, MPI_FLOAT, ...);
// ... and fix MPI_Recv() the same way
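If the first variant is what you want, here is a minimal, self-contained sketch of the one-value-per-message pattern (the names are illustrative and the rest of the question's logic is left out on purpose; note that mit holds float, so MPI_FLOAT is used rather than MPI_INT):
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    float value, buff, sum = 0.0f;
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    value = (float) rank;                 /* stand-in for mit[i][j+1] */

    if (rank != 0) {
        /* count is 1 (one float), and the MPI datatype matches the C type */
        MPI_Send(&value, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int i = 1; i < size; i++) {
            MPI_Recv(&buff, 1, MPI_FLOAT, i, 0, MPI_COMM_WORLD, &stat);
            sum += buff;                  /* accumulate, as tab[j] does */
        }
        printf("sum = %f\n", sum);
    }

    MPI_Finalize();
    return 0;
}
Since every worker's value ends up summed on the master anyway, this exchange could also be written as a single MPI_Reduce with MPI_SUM.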

Program hangs after MPI_Bcast

I am new to MPI. I need to write a program for matrix multiplication on a 2D topology (grid). The first matrix (A) is distributed along the x coordinate, the second matrix (B) along the y coordinate. Every process computes one submatrix. I use MPI_Bcast to send the submatrices along the grid dimensions, but after that the program doesn't continue. What did I do wrong?
Here is the code.
#include<stdio.h>
#include<stdlib.h>
#include<mpi/mpi.h>
#define NUM_DIMS 2
#define N 81
#define A(i, j) A[N*(i)+(j)]
#define B(i, j) B[N*(i)+(j)]
#define C(i, j) C[N*(i)+(j)]
#define AA(i, j) AA[k *(i)+(j)] //
#define BB(i, j) BB[k*(i)+(j)]
#define CC(i, j) CC[k*(i)+(j)]
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);
int threadCount;
int threadRank;
MPI_Comm_size(MPI_COMM_WORLD, &threadCount);
int dims[NUM_DIMS] = {0};
// Create the grid
int periods[2] = {0, 0};
MPI_Comm comm_2D;
MPI_Comm comm_1D[2];
MPI_Dims_create(threadCount, NUM_DIMS, dims);
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm_2D);
MPI_Comm_rank(comm_2D, &threadRank);
int k = N/dims[1];
double *A = (double*)calloc(N*N, sizeof(double));
double *B = (double*)calloc(N*N, sizeof(double));
double *C = (double*)calloc(N*N, sizeof(double));
double startTime = MPI_Wtime();
int subdims[2];
subdims[0] = 0;
subdims[1] = 1;
MPI_Cart_sub(comm_2D, subdims, &comm_1D[0]);
subdims[0] = 1;
subdims[1] = 0;
MPI_Cart_sub(comm_2D, subdims, &comm_1D[1]);
MPI_Datatype column, matrix;
MPI_Type_vector(N, N / k, N, MPI_DOUBLE, &column);
MPI_Type_create_resized(column, 0, N / k * sizeof(double), &column);
MPI_Type_commit(&column);
double *AA, *BB, *CC;
AA = (double*)calloc(N * k, sizeof(double));
BB = (double*)calloc(N * k, sizeof(double));
CC = (double*)calloc(k * k , sizeof(double));
int threadCoords[2];
MPI_Comm_rank(comm_2D, &threadRank);
MPI_Cart_coords(comm_2D, threadRank, NUM_DIMS, threadCoords);
if (threadCoords[0] == 0) {
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
A(i, j) = 1;
B(i, j) = 1;
}
}
}
if (threadCoords[1] == 0) {
MPI_Scatter(A, N * k, MPI_DOUBLE, AA, N * k, MPI_DOUBLE, 0, comm_1D[0]);
}
if (threadCoords[0] == 0) {
int offset[3] = {0, 1, 2};
int send[3] = {1, 1, 1};
MPI_Scatterv(B, send, offset, column, BB, N * k , MPI_DOUBLE, 0, comm_1D[1]);
}
int r = MPI_Bcast(AA, k*N, MPI_DOUBLE, 0, comm_1D[1]);
fprintf(stderr, "r = %d\n", r);
int p = MPI_Bcast(BB, k*N, MPI_DOUBLE, 0, comm_1D[0]);
fprintf(stderr, "p = %d\n", p);
/*...*/
}

Troubles in MPI -> Failing at address: (nil)

I'm a beginner in C and MPI and I'm trying to write a program that multiplies 2 matrices with MPI.
But I don't know what is wrong in my code.
I try to 'slice' the matrix M1 into n lines and send them to the other processes to do the multiplication, and I broadcast the matrix M2. After that I do a Gather to build the final matrix M3.
I run it like this:
mpirun -n 2 matrix
But I get an error in the terminal:
[adiel-VirtualBox:07921] *** Process received signal ***
[adiel-VirtualBox:07921] Signal: Segmentation fault (11)
[adiel-VirtualBox:07921] Signal code: (128)
[adiel-VirtualBox:07921] Failing at address: (nil)
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 7921 on node adiel-VirtualBox exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
2 total processes killed (some possibly by mpirun during cleanup)
mpirun: clean termination accomplished
Can anyone help me?
Here's my code:
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
//#include "mpe.h"
#include <math.h>
void printMatrix(double *M, int m, int n) {
int lin, col;
for (lin=0; lin<m; lin++) {
for (col=0; col<n; col++)
printf("%.2f \t", M[(lin*n+col)]);
printf("\n");
}
}
double* allocateMatrix(int m, int n){
double* M;
M = (double *)malloc(m*n*sizeof(double));
return M;
}
int main( int argc, char *argv[] )
{
int rank, size;
int m1,n1,m2,n2;
int row, col,ctrl,i,k,lines,proc;
double *M1, *M2, *M3, **vp, *v;
MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
MPI_Comm_size( MPI_COMM_WORLD, &size );
m1 = m2 = n1 = n2 = 3;
lines = (int)ceil(n1/size);
v = (double *)malloc(lines*n1*sizeof(double));
M2 = allocateMatrix(m2,n2);
M3 = allocateMatrix(m1,n2);
if(rank==0)
M1 = allocateMatrix(m1,n1);
//startin matrix
for (col = 0; col < n1; col++){
for (row = 0; row < m1; row++) {
if(rank==0)
M1[(row*m1+col)] = 0;
M2[(row*m2+col)] = 0;
M3[(row*m1+col)] = 0;
}
}
//startin pointers with 0
for(i=0;i<lines*n1;i++)
v[i] = 0;
//populate
if(rank == 0){
for (col = 0; col < n1; col++){
for (row = 0; row < m1; row++) {
M1[row*m1+col] = row*3+(col+1);
M2[(row*m2+col)] = 1;
}
}
}
//---------------------sharing and multiply---------------//
//slicing M1 and sending to other process
if(rank == 0){
proc = size-1;
//for each line
for(row = 0;row<m1;row++){
ctrl = floor(row/lines);
//on each column
for(col=0;col<n1;col++){
v[(ctrl*n1)+col] = M1[(row*n1)+col];
}
if(row%lines == (lines - 1)){
if(proc!=0){
MPI_Send(v,lines*n1,MPI_DOUBLE,proc,1, MPI_COMM_WORLD);
proc--;
//clearing pointers
for(i=0;i<lines*n1;i++)
v[i] = 0;
}
}
}
}
//MPI_Bcast(m1, m*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(M2, m2*n2, MPI_DOUBLE, 0, MPI_COMM_WORLD);
//receiving process
if(rank!=0)
MPI_Recv(v,lines*n1,MPI_DOUBLE,0,1,MPI_COMM_WORLD, MPI_STATUS_IGNORE);
for(row=0;row<lines;row++){
if(v[row*n1]!=0){
for (col = 0; col < n1; col++){
double val = 0.0;
for(k=0;k<m1;k++){
val += v[(row*n1)+k] * M2[(k*n1)+col];
}
M3[((size-1-rank)*size*n1)+(row*n1)+col] = val;
}
}
}
if(rank!=0){
for(row = 0; row < lines; row++){
MPI_Gather(&M3[((size-1-rank)*size*n1)+(row*n1)], n1, MPI_DOUBLE, &M3[((size-1-rank)*size*n1)+(row*n1)], n1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
}
if(rank == 0){
printf("matrix 1------------------------\n");
printMatrix(M1,m1,n1);
printf("matrix 2------------------------\n");
printMatrix(M2,m2,n2);
printf("matrix 3------------------------\n");
printMatrix(M3,m1,n2);
}
MPI_Finalize();
return 0;
}
For one thing, doing all of your sends before the broadcast and all of the receives after it is asking for trouble. I can easily see that leading to MPI resource exhaustion or deadlock failures. With such a small input that shouldn't arise, but you should fix it regardless. I'll take another look after that.
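To make that ordering point concrete, here is a sketch of one way to restructure (only the communication order is shown; the row-to-rank mapping is illustrative rather than the exact mapping of the posted loop): every rank enters the collective first, and only then does the point-to-point distribution happen.
/* 1. everybody takes part in the collective first */
MPI_Bcast(M2, m2 * n2, MPI_DOUBLE, 0, MPI_COMM_WORLD);

/* 2. then rank 0 hands out slices of M1 and the other ranks receive theirs */
if (rank == 0) {
    for (proc = 1; proc < size; proc++)
        MPI_Send(&M1[(proc - 1) * lines * n1], lines * n1, MPI_DOUBLE,
                 proc, 1, MPI_COMM_WORLD);
} else {
    MPI_Recv(v, lines * n1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
With equal-sized slices, the hand-rolled distribution loop could also be replaced by a single MPI_Scatter call.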
