Parallel bitonic sort with MPI in C

I'm trying to implement a parallel bitonic sort using MPI and C, but the program deadlocks (or otherwise blocks) when it sends and receives the array in the comp_exchange_max or comp_exchange_min function. Can you help me resolve this problem? Thanks.
void comp_exchange_max(int j, int rank, int *local_numbers, int dim_array, int *ordered_array)
{
    int message_receive[dim_array];
    int i, k, q;
    if (rank > 0)
    {
        MPI_Recv(&message_receive, dim_array, MPI_INT, rank ^ (1 << j), 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(local_numbers, dim_array, MPI_INT, rank ^ (1 << j), 0, MPI_COMM_WORLD);
    }
    /* comparison: keep the dim_array largest values, filling from the top */
    k = dim_array - 1;
    q = dim_array - 1;
    for (i = dim_array - 1; i >= 0; --i)
    {
        if (local_numbers[k] > message_receive[q])
            ordered_array[i] = local_numbers[k--];
        else
            ordered_array[i] = message_receive[q--];
    }
}
void comp_exchange_min(int j, int rank, int *local_numbers, int dim_array, int *ordered_array)
{
    int message_receive[dim_array];
    int i, k, q;
    if (rank > 0)
    {
        MPI_Send(local_numbers, dim_array, MPI_INT, rank ^ (1 << j), 0, MPI_COMM_WORLD);
        MPI_Recv(&message_receive, dim_array, MPI_INT, rank ^ (1 << j), 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    /* comparison: keep the dim_array smallest values, filling from the bottom */
    k = 0;
    q = 0;
    for (i = 0; i < dim_array; ++i)
    {
        if (local_numbers[k] < message_receive[q])
            ordered_array[i] = local_numbers[k++];
        else
            ordered_array[i] = message_receive[q++];
    }
}
all code is here
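For what it's worth, here is a minimal sketch of the two pieces this kind of compare-exchange step relies on, written in plain C with no MPI calls so it can be checked in isolation. The helper names (partner_of, sends_first, keep_low_half, keep_high_half) are made up for illustration and are not part of the posted code. The ordering rule in sends_first is only one possible convention: if both partners of a pair agree on who sends first, the two blocking calls always meet. MPI_Sendrecv is another common way to get the same effect. Note also that rank 0 has a partner too (rank 1 << j), so if the rank > 0 guard keeps rank 0 out of the exchange, that partner would wait forever.

```c
#include <assert.h>

/* Partner of `rank` in step j of the bitonic network: flip bit j. */
int partner_of(int rank, int j)
{
    assert(rank >= 0 && j >= 0);
    return rank ^ (1 << j);
}

/* Hypothetical ordering rule (an assumption, not from the post):
   the lower rank of each pair sends first, the higher one receives first. */
int sends_first(int rank, int j)
{
    return rank < partner_of(rank, j);
}

/* Keep the n smallest values of two sorted arrays of length n. */
void keep_low_half(const int *mine, const int *theirs, int n, int *out)
{
    int k = 0, q = 0;
    for (int i = 0; i < n; ++i) {
        if (mine[k] < theirs[q])
            out[i] = mine[k++];
        else
            out[i] = theirs[q++];
    }
}

/* Keep the n largest values of two sorted arrays of length n. */
void keep_high_half(const int *mine, const int *theirs, int n, int *out)
{
    int k = n - 1, q = n - 1;
    for (int i = n - 1; i >= 0; --i) {
        if (mine[k] > theirs[q])
            out[i] = mine[k--];
        else
            out[i] = theirs[q--];
    }
}
```

With a rule like this, each rank would check sends_first(rank, j) once per step and issue its send and receive in that order, while its partner does the opposite.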


MPI_Gather doesn't receive data

The program should take two command-line arguments: N, the number of items each worker should generate, and H, the highest value in the range of random numbers generated by each worker. Each worker makes a list of those random values, and bigList is where I'm trying to gather them all back to, but nothing shows up in the bigList array. So for example:
Running mpirun -np 3 a.out 4 20 gets:
RANK: 1 --- NUM: 18
RANK: 1 --- NUM: 6
RANK: 1 --- NUM: 12
RANK: 1 --- NUM: 10
RANK: 2 --- NUM: 9
RANK: 2 --- NUM: 3
RANK: 2 --- NUM: 6
RANK: 2 --- NUM: 5
and BigList is empty, when I'd expect it to be composed of every number listed above.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char* argv[]){
  double t1, t2;
  MPI_Init(&argc, &argv);
  int rank;
  int wsize;
  int N = 10, H = 5;
  int num, k, i;
  int locarr[25];
  int bigList[300];
  if(argc > 1){
    N = atoi(argv[1]);
    H = atoi(argv[2]);
  }
  t1 = MPI_Wtime();
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &wsize);
  if( rank != 0){
    for(i = 0; i < N; i++){
      locarr[i] = (((rand() % H) + 1) / rank);
      printf("RANK: %d --- NUM: %d\n", rank, locarr[i]);
    }
    MPI_Gather(&locarr, N, MPI_INT, bigList, N, MPI_INT, 0, MPI_COMM_WORLD);
  }
  if( rank == 0){
    printf("BigList: ");
    for(k = 0; k < (rank * N); k++){
      printf(" %d", bigList[k]);
    }
  }
  t2 = MPI_Wtime();
  // printf("\nMPI_Wtime(): %f\n", t2 - t1);
  MPI_Finalize();
  return 0;
}
Let me expand on the comment by Gilles Gouaillardet.
The MPI_Gather call itself is written correctly. To get the expected results, two changes are needed.
MPI_Gather is a collective operation, so every process in the communicator, including rank 0, must call it. Only the number generation belongs inside the rank check; the gather goes after it:
if( rank != 0){
  for(i = 0; i < N; i++){
    locarr[i] = (((rand() % H) + 1) / rank);
    printf("RANK: %d --- NUM: %d\n", rank, locarr[i]);
  }
}
MPI_Gather(&locarr, N, MPI_INT, bigList, N, MPI_INT, 0, MPI_COMM_WORLD);
Also, rank 0 prints the contents of bigList, but the loop condition is k < rank*N. For rank 0 this is always false (k < 0*N), so the loop body never runs and nothing is printed. The bound should use the world size (wsize) instead of rank:
if( rank == 0){
  printf("BigList: ");
  for(k = 0; k < wsize*N; k++){
    printf(" %d", bigList[k]);
  }
}
The print loop was also printing some garbage values at the start of bigList: rank 0 never fills its own locarr, yet its block occupies the first N entries of the gathered array. Replacing the print part of the code with this skips that block:
if (rank == 0)
{
    printf("BigList: ");
    for (k = N; k < wsize * N; k++)
        printf(" %d", bigList[k]);
}
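To make the layout behind both fixes concrete, here is a small sketch in plain C (no MPI calls) of where each value ends up in the root's buffer when every rank contributes N ints to a gather. The helper names gathered_index and copy_worker_values are hypothetical and only serve the illustration; the sample values are the ones from the question's output.

```c
#include <assert.h>

/* Index in the root's receive buffer of element i sent by `rank`,
   when every rank contributes n ints to the gather. */
int gathered_index(int rank, int i, int n)
{
    assert(rank >= 0 && i >= 0 && i < n);
    return rank * n + i;
}

/* Copy only the values that came from ranks 1..wsize-1, skipping the
   first n entries (rank 0's own block). Returns how many were copied. */
int copy_worker_values(const int *big, int wsize, int n, int *out)
{
    int count = 0;
    for (int k = n; k < wsize * n; k++)
        out[count++] = big[k];
    return count;
}
```

For mpirun -np 3 a.out 4 20, the buffer holds 3 * 4 = 12 entries: indices 0 to 3 belong to rank 0, 4 to 7 to rank 1, and 8 to 11 to rank 2.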
The full code for the problem is:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char *argv[])
{
    double t1, t2;
    MPI_Init(&argc, &argv);
    int rank;
    int wsize;
    int N = 10, H = 5;
    int num, k, i;
    int locarr[25];
    int bigList[300];
    /* both arguments must be present before argv[2] is read */
    if (argc > 2)
    {
        N = atoi(argv[1]);
        H = atoi(argv[2]);
    }
    t1 = MPI_Wtime();
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);
    /* only the workers generate numbers; rank 0 would divide by zero */
    if (rank != 0)
    {
        for (i = 0; i < N; i++)
        {
            locarr[i] = (((rand() % H) + 1) / rank);
            printf("RANK: %d --- NUM: %d\n", rank, locarr[i]);
        }
    }
    /* collective call: every rank, including rank 0, takes part */
    MPI_Gather(locarr, N, MPI_INT, bigList, N, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0)
    {
        printf("BigList: ");
        /* skip the first N entries, which hold rank 0's unused block */
        for (k = N; k < wsize * N; k++)
            printf(" %d", bigList[k]);
        printf("\n");
    }
    MPI_Finalize();
    return 0;
}

Parallel Merge Sort using MPI

I implemented a parallel merge sort in this code using a tree-structured scheme, but it doesn't sort the array!
Could you take a look at it and tell me what is wrong?
For communication among the processes I used plain MPI_Send() and MPI_Recv().
I used the numbers 0, 1 and 2 as tags (the fifth argument of MPI_Recv()).
For 8 processes, the tree scheme gives the array to the process with rank 0, which splits it in half, gives the right half to process 4 and keeps the left half.
Then process 4 splits its array in half, gives the right half to process 6 and keeps the left half.
With this scheme, all the processes end up working on the problem and none of them is idle,
since at the leaves of the tree every process has a piece of the array on which to run the sequential merge_sort_inc.
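The rank arithmetic behind that tree can be sketched on its own in plain C, without any MPI calls. tree_parent and tree_right_child use the same bit expressions as parallelMerge() below; root_height is a hypothetical helper that I am assuming here, which finds the smallest height h with 2^h >= number of processes using an integer loop. If I read the posted main() right, (int)log(nodeCount) uses the natural logarithm and would give 2 rather than 3 for nodeCount = 8.

```c
#include <assert.h>

/* Parent of `rank` at a given height: clear bit `height`.
   The root (rank 0) is its own parent. */
int tree_parent(int rank, int height)
{
    assert(rank >= 0 && height >= 0);
    return rank & ~(1 << height);
}

/* Right child of `rank` at a given height (> 0): set bit height - 1. */
int tree_right_child(int rank, int height)
{
    assert(height > 0);
    return rank | (1 << (height - 1));
}

/* Hypothetical helper: smallest h such that 2^h >= nprocs. */
int root_height(int nprocs)
{
    int h = 0;
    assert(nprocs > 0);
    while ((1 << h) < nprocs)
        h++;
    return h;
}
```

For 8 processes this gives a root height of 3, so rank 0 hands its right half to rank 4, and rank 4 (now at height 2) hands its right half to rank 6, which matches the description above.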
/* print_array() takes the elements of Array to the output */
void print_array(int arr[], int size)
for (int i = 0; i < size; i++)
printf("%d ", arr[i]);
/*copyarray() takes as first argument an Array a[] which its elements
between indexes start_a and end_a are to be copied to the dynamic Array *b with size of (size_b) */
void copyarray(int a[] ,int start_a , int end_a, int* b, int size_b)
int i = 0;
for (i = 0; i < size_b;i++)
b[i] = a[start_a];
if (start_a == end_a)
/* merge () function is just the sequential implementation of merge Sort Algorithm */
void merge(int Arr[], int left, int mid, int right)
int n_l = (mid - left + 1);
int n_r = (right - mid);
int* Arr_l = (int*)calloc(n_l, sizeof(int));
int* Arr_r = (int*)calloc(n_r, sizeof(int));
if (Arr_l == NULL)
if (Arr_r == NULL)
for (int i = 0;i < n_l;i++)
Arr_l[i] = Arr[left + i];
for (int j = 0;j < n_r;j++)
Arr_r[j] = Arr[mid + 1 + j];
int i = 0, j = 0, k = left;
while (i < n_l && j < n_r)
if (Arr_l[i] <= Arr_r[j])
Arr[k] = Arr_l[i];
Arr[k] = Arr_r[j];
while (i < n_l)
Arr[k] = Arr_l[i];
while (j < n_r)
Arr[k] = Arr_r[j];
/*merge_sort_inc() is the sequential algorithm of sorting in increasing order*/
void merge_sort_inc(int Arr[], int left, int right)
int mid = (int)(left + (right - left) / 2);
if (left < right)
merge_sort_inc(Arr, left, mid);
merge_sort_inc(Arr, mid + 1, right - 1);
merge(Arr, left, mid, right);
/*parallelMerge() builds first the tree-structural communication between the processors. at the leafs of the tree,
where there is no more divide and concurrent progress the Function gives the the processor the sequential Merge sort algorithm*/
void parallelMerge(int* array, int size, int height)
int parent;
int rank;
int numberOfproc;
int next;
int rightChild;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numberOfproc);
parent = rank & ~(1 << height);
next = height - 1;
rightChild = rank | (1 << (height - 1));
if (height > 0)
if (rightChild >= numberOfproc)
parallelMerge(array, size, next);
int left_size = (int)(size / 2);
int right_size = size - left_size;
int* leftArray = (int*)calloc(left_size, sizeof(int));
int * rightArray = (int*)calloc(right_size,sizeof(int));
if (leftArray == NULL)
if (rightArray == NULL)
int massage[2];
int i, j , k;
MPI_Status status;
copyarray(array, 0, left_size, leftArray, left_size);
copyarray(array, size - left_size, size, rightArray,right_size);
massage[0] = next;
massage[1] = right_size;
MPI_Send(massage, 2, MPI_INT, rightChild,0, MPI_COMM_WORLD);
MPI_Send(rightArray, right_size, MPI_INT, rightChild, 1, MPI_COMM_WORLD);
parallelMerge(leftArray, left_size, next);
MPI_Recv(rightArray, right_size, MPI_INT, rightChild, 2, MPI_COMM_WORLD, &status);
i = j = k = 0;
while (i < left_size && j < right_size)
if (leftArray[i] < rightArray[j])
array[k] = leftArray[i]; i++, k++;
array[k] = rightArray[j]; j++, k++;
while (i<left_size)
array[k] = leftArray[i];
while (j<right_size)
array[k] = rightArray[j];
merge_sort_inc(array, 0 ,size);
if (parent != rank)
MPI_Send(array, size, MPI_INT, parent, 2, MPI_COMM_WORLD);
int main()
/*building an array with the help of Random function*/
time_t t;
int Arr[100];
int arrSize = sizeof(Arr) / sizeof(int);
for (int i = 0; i < arrSize; i++)
Arr[i] = rand() / 100;
printf("the unsorted array is : \n ");
print_array(Arr, arrSize);
/*starting the parallel sorting*/
int rank;
int comm_size;
MPI_Comm_rank(MPI_COMM_WORLD , &rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
double start = MPI_Wtime();//capture time
if (rank == 0)
int roothight = 0;
int nodeCount = 1;
while (nodeCount < comm_size)
roothight = (int)log(nodeCount);
int* newarray = (int*)calloc(arrSize, sizeof(int));
if (newarray == NULL)
return 1;
copyarray(Arr, 0, arrSize - 1, newarray, arrSize );
parallelMerge(newarray, arrSize, roothight);
double midle = MPI_Wtime();
int massage[2];
int height;
int size_array;
MPI_Status status;
MPI_Recv(massage, 2, MPI_INT, MPI_ANY_SOURCE,0, MPI_COMM_WORLD, &status);
height = massage[0];
size_array = massage[1];
int* newarray = (int*)calloc(size_array, sizeof(int));
if (newarray == NULL)
return 1;
MPI_Recv(newarray, size_array, MPI_INT, MPI_ANY_SOURCE,1, MPI_COMM_WORLD, &status);
parallelMerge(newarray, size_array, height);
double end = MPI_Wtime();
printf("\n the sorted array is : \n");
print_array(Arr, arrSize);
printf("\n the sorting takes %lf time ", (end - start));
return 0;
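Two places in the sequential part look suspicious to me, assuming the bounds are meant to be inclusive (merge() computes n_r = right - mid, which suggests they are): the second recursive call sorts up to right - 1 instead of right, and parallelMerge() calls merge_sort_inc(array, 0, size), which touches array[size]. Below is a minimal sketch of a sequential merge sort with inclusive bounds for comparison. The names merge_halves and merge_sort_inclusive are mine, not from the posted code, and the single temporary buffer is just one way to write the merge.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge the sorted ranges arr[left..mid] and arr[mid+1..right] (inclusive). */
static void merge_halves(int *arr, int left, int mid, int right)
{
    int n = right - left + 1;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        abort();
    int i = left, j = mid + 1, k = 0;
    while (i <= mid && j <= right)
        tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
    while (i <= mid)
        tmp[k++] = arr[i++];
    while (j <= right)
        tmp[k++] = arr[j++];
    memcpy(arr + left, tmp, (size_t)n * sizeof *tmp);
    free(tmp);
}

/* Sort arr[left..right] in increasing order; both bounds are inclusive. */
void merge_sort_inclusive(int *arr, int left, int right)
{
    assert(arr != NULL);
    if (left >= right)
        return;
    int mid = left + (right - left) / 2;
    merge_sort_inclusive(arr, left, mid);
    merge_sort_inclusive(arr, mid + 1, right);   /* up to right, not right - 1 */
    merge_halves(arr, left, mid, right);
}
```

With inclusive bounds, an array of size elements is sorted by calling merge_sort_inclusive(array, 0, size - 1).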

Using MPI_Sendrecv for Conway's Game of Life, but the program can't exchange data at the borders

I'm trying to write Conway's Game of Life with the Rabbits pattern. I used a 2D Cartesian topology to build the process group, and the processes communicate with MPI_Sendrecv. But the code doesn't work: it just hangs without any response when I run it. I have spent a long time looking for the problem but made no progress. Could you please help me figure it out? I'd be very grateful!
#include <stdio.h>
#include "mpi.h"
#include <math.h>
#include <stdlib.h>
#define array 20
#define arrayhalf (array/2)
main(int argc, char *argv[])
int ndims = 2, ierr;
int p, my_rank, my_cart_rank;
MPI_Comm comm2d;
MPI_Datatype newtype;
int dims[ndims], coord[ndims];
int wrap_around[ndims];
int reorder, nrows, ncols;
int x[arrayhalf+2][arrayhalf+2], x2[arrayhalf+2][arrayhalf+2], x_rev[array+4][array+4];
int left, right, down, top;
MPI_Status status;
int tag_up = 20, tag_down =21, tag_left = 22, tag_right = 23;
long start, stop;
/*** start up initial MPI environment ***/
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
/* hard-code 2 processes in each dimension */
nrows = ncols = (int) sqrt(p);
dims[0] = dims[1] = 2;
/* create cartesian topology for processes */
MPI_Dims_create(p, ndims, dims);
/*if (my_rank == 0)
printf("PW[%d]/[%d%]: PEdims = [%d x %d] \n", my_rank, p, dims[0], dims[1]);*/
/* create cartesian mapping, and check it is created either correct or wrong */
wrap_around[0] = wrap_around[1] = 1; /*set periodicity to be true */
reorder = 0;
ierr = 0;
ierr = MPI_Cart_create(MPI_COMM_WORLD, ndims, dims, wrap_around, reorder, &comm2d);
if (ierr != 0)
printf("ERROR[%d] creating CART\n", ierr);
MPI_Type_vector( arrayhalf, 1, arrayhalf+2, MPI_INT, &newtype);
MPI_Type_commit( &newtype );
/* get the neighbour processes, which are needed for the halo exchange */
int SHIFT_ROW = 0;
int SHIFT_COL = 1;
int DISP = 1;
/*** load pattern ***/
/* initialize the data array */
int i, j ;
for (i = 0; i < arrayhalf + 2 ; i++)
for (j = 0; j < arrayhalf + 2; j++)
x[i][j] = 0;
x2[i][j] = 0;
if (my_rank == 0)
int r,c;
r = arrayhalf / 2;
c = arrayhalf / 2;
/* rabbits pattern
1 1 1 1
1 1 1 1
x[r][c] = 1;
x[r][c+4] = 1;
x[r][c+5] = 1;
x[r][c+6] = 1;
x[r+1][c] = 1;
x[r+1][c+1] = 1;
x[r+1][c+2] = 1;
x[r+1][c+5] = 1;
x[r+2][c+1] = 1;
/*** calculate the next generation ***/
int row, col;
int steps;
steps = atoi(argv[1]); /* get the generation number from command line */
start = MPI_Wtime();
int soc;
int destination;
for (i = 1; i <= steps; i++)
/*** use halo exchange for the boundary elements ***/
int * send_buffer = (int*) malloc((arrayhalf)*sizeof(int));
int * recv_buffer = (int*) malloc((arrayhalf)*sizeof(int));
/*int * send_buffer = (int *) calloc(arrayhalf,sizeof(int));
int * recv_buffer = (int *) calloc(arrayhalf,sizeof(int));
/* to up */
MPI_Cart_shift(comm2d, 1, 1, &soc,&destination);
MPI_Sendrecv( &x[1][1], arrayhalf, MPI_INT, destination, tag_up,& x[arrayhalf + 1][1], arrayhalf, MPI_INT, soc, tag_up, comm2d, &status );
/* to down */
MPI_Cart_shift(comm2d, 1, 1, &destination,&soc);
MPI_Sendrecv( &x[arrayhalf][1], arrayhalf, MPI_INT, destination, tag_down,& x[0][1], arrayhalf, MPI_INT, soc, tag_down, comm2d, &status);
/* to left */
MPI_Cart_shift(comm2d, 0, 1, &destination,&soc);
MPI_Sendrecv( &x[1][1], 1,newtype, destination, tag_left,& x[1][arrayhalf+1], 1, newtype, soc, tag_left, comm2d, &status );
/*for (j=0;j<arrayhalf;j++) {
MPI_Sendrecv( send_buffer, arrayhalf,MPI_INT, destination, tag_left,recv_buffer, arrayhalf, MPI_INT, soc, tag_left, comm2d, &status );
for (j=0;j<arrayhalf;j++) {
/* to right */
MPI_Cart_shift(comm2d, 0, 1, &soc,&destination);
MPI_Sendrecv( &x[1][arrayhalf], 1, newtype, destination, tag_right, &x[1][0], 1, newtype, soc, tag_right, comm2d, &status );
/*for (j=0;j<arrayhalf;j++) {
MPI_Sendrecv( send_buffer, arrayhalf,MPI_INT, destination, tag_right,recv_buffer, arrayhalf, MPI_INT, soc, tag_right, comm2d, &status );
for (j=0;j<arrayhalf;j++) {
/*** sum the neighbour values and get the next generation ***/
for (row = 1; row < arrayhalf; row++)
for (col = 1; col < arrayhalf; col++)
int neighbor;
neighbor = x[row - 1][col - 1] + x[row - 1][col] + x[row - 1][col + 1] + x[row][col - 1] +
x[row][col + 1] +
x[row + 1][col - 1] + x[row + 1][col] + x[row + 1][col + 1];
if (neighbor == 3)
x2[row][col] = 1;
else if (x[row][col] == 1 && neighbor == 2)
x2[row][col] = 1;
x2[row][col] = 0;
/* used to be swap */
for (row = 1; row < arrayhalf; row++)
for (col = 1; col < arrayhalf; col++)
x[row][col] = x2[row][col];
/*** print the final generation ***/
int population = 0;
int* A;
int process_num = dims[0]*dims[1];
int row_indx;
int col_indx;
int k;
if(my_rank == 0)
A = (int*) malloc((arrayhalf+2)*(arrayhalf+2)*sizeof(int));
for (k= 1; k< process_num; k++)
MPI_Recv(A,(arrayhalf+2)*(arrayhalf+2), MPI_INT,k, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
for (i = 0; i<arrayhalf+2; i++)
for (j = 0; j<arrayhalf+2; j++)
row_indx = (k%dims[1])*(arrayhalf+2)+i;
col_indx = (k/dims[0]*(arrayhalf+2))+j;
x_rev[row_indx][col_indx] = A[i*(arrayhalf+2)+j];
for (i = 0; i<arrayhalf+2; i++)
for (j = 0; j<arrayhalf+2; j++)
x_rev[i][j] = x[i][j];
for (row = 0; row < array+4; row++) {
for (col = 0; col < array+4; col++)
population = population + 1;
stop = MPI_Wtime();
printf("Running Time: %f\n ",stop-start);
printf("Population: %d\n",population);
printf("Generation: %d\n",steps);
A = (int*) malloc((array+4)*(array+4)*sizeof(int));
for (i=0; i< arrayhalf +2; i++)
for(j = 0; j<arrayhalf+2; j++)
A[i*(arrayhalf+2)+j] = x[i][j];
MPI_Comm_free( &comm2d );
MPI_Type_free( &newtype );
I think I found the error.
It is in line 176.
Rank 0 is trying to receive a message from rank 0, but rank 0 does not send a message to itself. You should start the loop from 1 and not 0.

Program stops at MPI_Send

The program stops working when I run it with more than one process.
It stops at the first MPI_Send.
What am I doing wrong?
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define SIZE 200000
#define SIZE2 256
#define VYVOD 1
int main(int argc, char *argv[])
int NX, NT;
double TK, UM, DX, DY, DT;
double starttime, endtime;
int numnode, rank, delta=0, ierr, NXnode;
double **U;
double **U1;
double *sosed1;
double *sosed2;
int i, j, k;
MPI_Status stats;
NX = 1*(SIZE2+1);
TK = 20.00;
UM = 10.0;
DX = 0.1;
DY = DX;
DT = 0.1;
NT = (TK/DT);
if(rank == 0)
printf("\nTotal nodes: %d\n", numnode);
NX = NX - 2;
NXnode = (NX-(NX%numnode))/numnode;
if (rank < (NX%numnode))
delta = rank * NXnode + rank + 1;
delta = rank * NXnode + (NX%numnode) + 1;
if(rank == 0){
printf("Order counting complete, NXnode = %d\n", NXnode);
U = (double**)malloc(NXnode*sizeof(double*));
U1 = (double**)malloc(NXnode*sizeof(double*));
sosed1 = (double*)malloc(SIZE*sizeof(double));
sosed2 = (double*)malloc(SIZE*sizeof(double));
for (i=0; i < NXnode; i++)
U[i] = (double*)malloc(SIZE*sizeof(double));
U1[i] = (double*)malloc(SIZE*sizeof(double));
if (U[i]==NULL || U1[i]==NULL)
printf("Error at memory allocation!");
return 1;
if(rank == 0){
starttime = MPI_Wtime();
printf("Array allocation complete\n");
for (i = 0; i < NXnode; i++)
for (j = 1; j < SIZE-1; j++)
if ((delta)<=(NXnode/2))
U1[i][j]=-2*(UM/NXnode) + 2*UM;
printf("Array init 1 complete, rank %d\n", rank);
if (rank > 0)
MPI_Send(&(U1[0][0]), SIZE, MPI_DOUBLE , rank-1, 0, MPI_COMM_WORLD);
MPI_Recv(&(sosed1[0]), SIZE, MPI_DOUBLE , rank-1, 1, MPI_COMM_WORLD, &stats);
int initInd = 0;
for (initInd = 0; initInd < SIZE; initInd++)
if (rank < (numnode-1))
MPI_Send(&(U1[NXnode-1][0]), SIZE, MPI_DOUBLE , rank+1, 1, MPI_COMM_WORLD);
MPI_Recv(&(sosed2[0]), SIZE, MPI_DOUBLE , rank+1, 0, MPI_COMM_WORLD, &stats);
int initInd = 0;
for (initInd = 0; initInd < SIZE; initInd++)
printf("Send complete, rank %d\n", rank);
printf("Array init complete, rank %d\n", rank);
for (k = 1; k <= NT; k++)
int cycle = 0;
for (cycle=1; cycle < SIZE-1; cycle++)
U[0][cycle] = U1[0][cycle] + DT/(DX*DX)*(U1[1][cycle]-2*U1[0][cycle]+sosed1[cycle])+DT/(DY*DY)*(U1[0][cycle+1]+U1[0][cycle-1]-(U1[0][cycle]*2));
for (i=1; i<NXnode-1; i++)
for(j=1; j<SIZE-1; j++)
U[i][j] = U1[i][j] + DT/(DX*DX)*(U1[i+1][j]-2*U1[i][j]+U[i-1][j])+DT/(DY*DY)*(U1[i][j+1]+U1[i][j-1]-(U1[i][j]*2));
for (cycle=1; cycle < SIZE-1; cycle++)
/*U[0] = U1[0]+DT/(DX*DX)*(U1[0+1]-2*U1[0]+sosed1);
for (j = 0; j<NXnode; j++)
if (rank > 0)
MPI_Send(&(U[0][0]), SIZE, MPI_DOUBLE , rank-1, 0, MPI_COMM_WORLD);
if (rank < (numnode-1))
MPI_Send(&(U[NXnode-1][0]), SIZE, MPI_DOUBLE , rank+1, 0, MPI_COMM_WORLD);
if (rank > 0)
MPI_Recv(&(sosed1[0]), SIZE, MPI_DOUBLE , rank-1, 0, MPI_COMM_WORLD, &stats);
if (rank < (numnode-1))
MPI_Recv(&(sosed2[0]), SIZE, MPI_DOUBLE , rank+1, 0, MPI_COMM_WORLD, &stats);
for (i = 0; i<NXnode; i++)
for (j=0; j<SIZE; j++)
printf("Array count complete, rank %d\n", rank);
if (rank == 0)
printf("\n## TIME: %f\n", endtime-starttime);
I tried it like this, so that rank 0 would go first, but it still doesn't work:
if (rank == 0 && numnode > 1)
MPI_Recv(&(sosed2[0]), SIZE, MPI_DOUBLE , rank+1, 0, MPI_COMM_WORLD, &stats);
MPI_Send(&(U1[NXnode-1][0]), SIZE, MPI_DOUBLE , rank+1, 1, MPI_COMM_WORLD);
int initInd = 0;
for (initInd = 0; initInd < SIZE; initInd++)
else if (rank == 0)
int initInd = 0;
for (initInd = 0; initInd < SIZE; initInd++)
else if (rank < (numnode-1))
MPI_Send(&(U1[0][0]), SIZE, MPI_DOUBLE , rank-1, 1, MPI_COMM_WORLD);
MPI_Recv(&(sosed1[0]), SIZE, MPI_DOUBLE , rank-1, 0, MPI_COMM_WORLD, &stats);
MPI_Recv(&(sosed2[0]), SIZE, MPI_DOUBLE , rank+1, 0, MPI_COMM_WORLD, &stats);
MPI_Send(&(U1[NXnode-1][0]), SIZE, MPI_DOUBLE , rank+1, 1, MPI_COMM_WORLD);
else if (rank == (numnode - 1))
MPI_Send(&(U1[0][0]), SIZE, MPI_DOUBLE , rank-1, 1, MPI_COMM_WORLD);
MPI_Recv(&(sosed1[0]), SIZE, MPI_DOUBLE , rank-1, 0, MPI_COMM_WORLD, &stats);
int initInd = 0;
for (initInd = 0; initInd < SIZE; initInd++)
Solved: I used the same tag for all Send/Recv calls.
MPI_Send is a blocking call: it may not return until the matching MPI_Recv has been posted (presumably in another process), and with messages as large as the ones here that is typically what happens.
In your program, all processes except rank 0 call MPI_Send immediately after the first barrier, and no one is ready to receive the message, so MPI_Send blocks indefinitely. Essentially, every process is waiting for its message to be accepted by the process with the next lower rank (rank 2 is waiting for rank 1, rank 1 is waiting for rank 0), and rank 0 is not accepting any messages at all (it goes on to the next block of code and in turn calls MPI_Send too), so everything just hangs.
It looks like you are missing the communication part for the process with rank 0; it should do something like MPI_Recv(from rank 1); ...; MPI_Send(to rank 1);.
Another thing is that you use MPI_Send with tag 1 but call MPI_Recv with tag 0. These will not match. You need to use the same tag on both sides, or specify MPI_ANY_TAG in the receive operation.
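The two rules above can be written down as a tiny plain-C sketch with no MPI calls. Everything here is illustrative: ANY_TAG_SKETCH is a made-up stand-in for MPI_ANY_TAG, and the even/odd convention in sends_first_in_chain is just one possible way to make sure that, for every pair of neighbouring ranks, one side is sending while the other is receiving.

```c
#include <assert.h>

#define ANY_TAG_SKETCH (-1)   /* hypothetical stand-in for MPI_ANY_TAG */

/* A receive accepts a message only if the tags agree
   or the receive uses the wildcard. */
int tag_matches(int send_tag, int recv_tag)
{
    return recv_tag == ANY_TAG_SKETCH || recv_tag == send_tag;
}

/* Assumed convention: even ranks send first and then receive,
   odd ranks receive first and then send. */
int sends_first_in_chain(int rank)
{
    assert(rank >= 0);
    return rank % 2 == 0;
}
```

With this convention, neighbouring ranks r and r + 1 never both start with a blocking send, so the exchange cannot stall the way the original ordering does.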

MPI debugging Segmentation fault

I'm trying to sort an array of random numbers using odd-even transposition, but I keep getting a segmentation fault when running my code:
[islb:48966] *** Process received signal ***
[islb:48966] Signal: Segmentation fault (11)
[islb:48966] Signal code: Address not mapped (1)
[islb:48966] Failing at address: 0x28
[islb:48966] [ 0] /lib64/[0x7fc3da4cb810]
[islb:48966] [ 1] /lib64/[0x7fc3da1c7cf3]
[islb:48966] [ 2] /usr/local/lib/[0x7fc3d9c372db]
[islb:48966] [ 3] /usr/local/lib/openmpi/[0x7fc3d58507a8]
[islb:48966] [ 4] /usr/local/lib/openmpi/[0x7fc3d5850d11]
[islb:48966] [ 5] /usr/local/lib/openmpi/[0x7fc3d5849489]
[islb:48966] [ 6] /usr/local/lib/[0x7fc3da742f40]
[islb:48966] [ 7] oddEven[0x40115a]
[islb:48966] [ 8] /lib64/[0x7fc3da161c36]
[islb:48966] [ 9] oddEven[0x400c19]
[islb:48966] *** End of error message ***
mpirun noticed that process rank 1 with PID 48966 on node islb exited on signal 11 (Segmentation fault).
The program allocates the array. The error seems to occur when the array is scattered among the processes, as the print statement directly after the scatter call only prints for process 0 and then the error message appears.
Here's my code:
#include <stdio.h>
#include <math.h>
#include <malloc.h>
#include <time.h>
#include <string.h>
#include "mpi.h"
const int MAX = 10000;
int myid, numprocs;
int i, n, j, k, arrayChunk, minindex;
int A, B;
int temp;
int swap(int *x, int *y) {
temp = *x;
*x = *y;
*y = temp;
return 0;
int main(int argc, char **argv) {
int* arr = NULL;
int* value = NULL;
MPI_Status status;
//int arr[] = {16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
time_t t1, t2;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
if (myid == 0) {
printf("Enter the number of elements you would like in the array \n");
scanf("%d", &n);
arrayChunk = n/numprocs;
//printf("cpus: %d, #s per cpu: %d\n", numprocs, arrayChunk);
//Allocate memory for the array
arr = malloc(n * sizeof(int));
value = malloc(n * sizeof(int));
// Generate an array of size n random numbers and prints them
printf("Elements in the array: ");
for (i = 0; i < n; i++) {
arr[i] = (rand() % 100) + 1;
printf("%d ", arr[i]);
if ((n % numprocs) != 0) {
if (myid == 0)
printf("Number of Elements are not divisible by numprocs \n");
// Broadcast the size of each chunk
MPI_Bcast(&arrayChunk, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Scatter(&arr, arrayChunk, MPI_INT, &value, arrayChunk, MPI_INT, 0, MPI_COMM_WORLD);
printf("Processor %d receives %d\n", myid, value[0]);
for (i = 0; i < numprocs; i++) {
if (i % 2 == 0) {
if (myid%2 == 0) {
MPI_Send(&value[0], arrayChunk, MPI_INT, myid + 1, 0, MPI_COMM_WORLD);
MPI_Recv(&value[arrayChunk], arrayChunk, MPI_INT, myid + 1, 0, MPI_COMM_WORLD, &status);
for (j = 0; j < (arrayChunk * 2 - 1); j++) {
minindex = j;
for (k = j + 1; k < arrayChunk * 2; k++) {
if (value[k] < value[minindex]) {
minindex = k;
if (minindex > j) {
swap(&value[j], &value[minindex]);
//printf("myid %d i: %d, %d\n", myid, i, value[0]);
} else {
MPI_Recv(&value[arrayChunk], arrayChunk, MPI_INT, myid - 1, 0, MPI_COMM_WORLD, &status);
MPI_Send(&value[0], arrayChunk, MPI_INT, myid - 1, 0, MPI_COMM_WORLD);
for (j = 0; j < (arrayChunk * 2 - 1); j++) {
minindex = j;
for (k = j + 1; k < arrayChunk * 2; k++) {
if (value[k] < value[minindex]) {
minindex = k;
if (minindex > j) {
swap(&value[j], &value[minindex]);
for (j = 0; j < arrayChunk; j++) {
swap(&value[j], &value[j + arrayChunk]);
//printf("myid %d i: %d, %d\n", myid, i, value[0]);
} else {
if ((myid%2 == 1) && (myid != (numprocs-1))) {
MPI_Send(&value[0], arrayChunk, MPI_INT, myid + 1, 0, MPI_COMM_WORLD);
MPI_Recv(&value[arrayChunk], arrayChunk, MPI_INT, myid + 1, 0, MPI_COMM_WORLD, &status);
for (j = 0; j < (arrayChunk * 2 - 1); j++) {
minindex = j;
for (k = j + 1; k < arrayChunk * 2; k++) {
if (value[k] < value[minindex]) {
minindex = k;
if (minindex > j) {
swap(&value[j], &value[minindex]);
//printf("myid %d i: %d, %d\n", myid, i, value[0]);
} else if (myid != 0 && myid != (numprocs-1)) {
MPI_Recv(&value[arrayChunk], arrayChunk, MPI_INT, myid - 1, 0, MPI_COMM_WORLD, &status);
MPI_Send(&value[0], 1, MPI_INT, myid - 1, 0, MPI_COMM_WORLD);
for (j = 0; j < (arrayChunk * 2 - 1); j++) {
minindex = j;
for (k = j + 1; k < arrayChunk * 2; k++) {
if (value[k] < value[minindex]) {
minindex = k;
if (minindex > j) {
swap(&value[j], &value[minindex]);
for (j = 0; j < arrayChunk; j++) {
swap(&value[j], &value[j + arrayChunk]);
//printf("myid %d i: %d, %d\n", myid, i, value[0]);
MPI_Gather(&value[0], arrayChunk, MPI_INT, &arr[0], arrayChunk, MPI_INT, 0, MPI_COMM_WORLD);
if (myid == 0) {
printf("Sorted array: ");
for (i = 0; i < n; i++) {
printf("%d ", arr[i]);
printf("Time in sec. %f\n", difftime(t2, t1));
// Free allocated memory
if (arr != NULL) {
arr = NULL;
value = NULL;
return 0;
I'm not very familiar with C and it could well be that I've used malloc and/or addresses and pointers incorrectly, as such it's probably something simple.
Sorry for the amount of code but I thought it would be better to supply all of it to allow for proper debugging.
The problem is in your MPI_Scatter command. You try to scatter the information and store in value, but if you look above that code, only rank 0 has allocated any memory for value. When any and all other ranks try to store data into value, you will get a segmentation fault (and indeed you do). Instead, remove the value = malloc(...); line from inside the if block, and put it after the MPI_Bcast as value = malloc(arrayChunk * sizeof(int));. I've not looked through the rest of the code to see if there are any issues elsewhere as well, but that is likely the cause of the initial seg-fault.
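Two related details, sketched in plain C without MPI. First, the posted call passes &arr and &value, which are the addresses of the pointer variables rather than of the buffers they point to; a buffer argument is taken as a void * and written through directly, so the pointer itself (arr, value) is what should be passed. Second, the exchange loop writes to value[arrayChunk] and beyond, so the buffer needs room for two chunks. The helpers fill_buffer and alloc_chunk_buffer are hypothetical stand-ins used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a receive call: writes `count` ints through a void *,
   the way a communication library fills a receive buffer. */
void fill_buffer(void *buf, int count, int start)
{
    int *p = buf;
    for (int i = 0; i < count; i++)
        p[i] = start + i;
}

/* To be called on every rank once the chunk size is known: room for the
   rank's own chunk plus the neighbour's chunk used in the exchange. */
int *alloc_chunk_buffer(int array_chunk)
{
    assert(array_chunk > 0);
    return malloc(2 * (size_t)array_chunk * sizeof(int));
}
```

A call such as fill_buffer(value, 4, 10) writes into the allocated block, whereas fill_buffer(&value, 4, 10) would overwrite the pointer variable and whatever lies next to it in memory.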
I would build the program with debugging info (most likely the -g compile flag), try to get a core dump, and use the gdb debugger to locate the bug. A core file is created when a process crashes, and it holds the process memory image at the moment of the crash.
If no core dump file is created after the program crashes, you will need to figure out how to enable it on your system. You can create a simple buggy program (for example with a=x/0; or a similar error) and experiment a bit. The core dump may be called core, PID.core (where PID is the ID of the crashed process), or something similar. Sometimes it is enough to set the core file size to unlimited using ulimit. Also check the kernel.core_* sysctls on Linux.
Once you have the core dump, you can use it with gdb or a similar debugger (ddd):
gdb executable_file core
