I'm developing an application in C in which the user wants to find a certain pattern of two digits in a two-dimensional array.
For example, given a 10x10 array of random single-digit numbers, the user wants to find the pair 1,0. The program searches for 1 and, when it finds one, searches for 0 in all directions (top, bottom, sides, diagonals and anti-diagonals) to depth 1. In other words, it looks for 0 among the neighbours of the 1 in the surrounding 3x3 sub-matrix. The function search_number() searches for the second digit.
I've implemented sequential code for this and I'm trying to convert it to MPI.
I'm a complete beginner with MPI and this is my first time using it.
Here is my attempt with MPI.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 255
#define BS N/2
MPI_Status status;
int search_number(int arr[N][N],int row,int col,int digit_2){
    int count=0;
    for (int i=row-1;i<=row+1;i++){ //from row-1 to row+1 = 3 indexes for rows
        for(int j=col-1;j<=col+1;j++){ //from col-1 to col+1 = 3 indexes for cols
            // skip [row,col] itself and any [i,j] outside the matrix bounds
            if(i<0 || j<0 || i>=N || j>=N || (i==row && j==col)) continue;
            if(arr[i][j] == digit_2){ //if second number is found, increase the counter
                count++;
            }
        }
    }
    return count;
}
int main(int argc, char **argv)
int nproc,taskId,source,i,j,k,positionX,positionY;
int sum=0;
MPI_Datatype type;
int a[N][N];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskId);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Type_vector(N, BS, N, MPI_INT, &type);
if (taskId == 0) {
srand( time(NULL) );
//Generate two NxN matrix
for (i=0; i<N; i++) {
for (j=0; j<N; j++) {
a[i][j]= rand()%10;
printf("Passing 1st chunk:\n");
// first chunk
MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
printf("Passing 2nd Chunk:\n");
//second chunk
MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
source = 0;
MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);
for(int i=0;i<N;i++){
for(int j=0;j<N;j++){
if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
sum+= search_number(a,i,j,0); // funtion will return the count of 0's shared with 1
//Send result to root
MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);
//root receives results
if(taskId == 0)
printf("Count: %d\n",sum);
// printMatrix(resultFinal);
The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line if I set N > 255 at the top, but it works for N from 0 to 255. Can you point out my mistake?
The issue I'm facing is my program gets stuck at Passing Chunk 1 line
if I pass set N>255 on top. But works until 0 to 255.
As @Gilles Gouaillardet already pointed out in the comments, and as explained in more detail in this answer:
MPI_Send() is allowed to block until a matching receive is posted (and
that generally happens when the message is "large") ... and the
required matching receive never gets posted.
A typical fix would be to issue a MPI_Irecv(...,src = 0,...) on rank 0
before the MPI_Send() (and MPI_Wait() after), or to handle 0 -> 0
communication with MPI_Sendrecv().
Besides that, your parallelization seems wrong. Namely, with
MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
you send the same workload to processes 0 and 1, and
MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
has the same issue with processes 2 and 3.
You should try a stencil-like approach in which neighbouring processes share only their border rows. For instance, a possible distribution for a 10x10 matrix and 4 processes could be:
process 0 works with rows 0, 1 and 2;
process 1 works with rows 2, 3 and 4;
process 2 works with rows 4, 5 and 6;
process 3 works with rows 6, 7, 8 and 9;
Currently, you send BS*N elements to each process; however, in:
MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);
you specify that you are expecting to receive N*N.
Moreover in:
for(int i=0;i<N;i++){
for(int j=0;j<N;j++){
if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
sum+= search_number(a,i,j,0); // funtion will return the count of 0's shared with 1
the processes work on positions of the matrix a that they did not receive, which should not be the case.
Finally, instead of
//Send result to root
MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);
(which also sends BS integers although sum is a single int) you should use MPI_Reduce, which:
Reduces values on all processes to a single value
I'm trying to write an MPI program that calculates the sum of an array of integers.
For this purpose I used MPI_Scatter to send chunks of the array to the other processes then MPI_Gather to get the sum of each chunk by the root process(process 0).
The problem is one of the processes receives two elements but the other one receives random numbers. I'm running my code with 3 processes.
Here is what I have:
#include <stdio.h>
#include <mpi.h>
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
int number1[2]; //buffer for processes
int sub_sum = 0;
int sub_sums[2];
int sum;
int number[4];
if(world_rank == 0){
//All processes
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
printf("I'm process %d , I received the array : ",world_rank);
for(int i=0 ; i<2 ; i++){
printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
MPI_Gather(&sub_sum, 1, MPI_INT, &sub_sums, 1, MPI_INT, 0,MPI_COMM_WORLD);
if(world_rank == 0){
for(int i=0; i<2;i++){
sum+= sub_sums[i];
printf("\nthe sum of array is: %d\n",sum);
return 0;
The result:
I'm process 1 , I received the array : 5 9
I'm process 2 , I received the array : 1494772352 32767
the sum of array is: 14
It seems that you misunderstood how MPI_Scatter works. Your code is hardcoded to work correctly with only two processes, but you are running it with three, on the wrong assumption that during the MPI_Scatter call the root rank only sends data to the other processes. In fact, the root rank (i.e., rank 0) also receives part of the data.
The problem is one of the processes receives two elements but the
other one receives random numbers.
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
You have hardcoded the input as number = {1, 3, 5, 9} (only 4 elements). During the MPI_Scatter call, process 0 gets the first and second elements of number (i.e., {1, 3}), process 1 gets the other two (i.e., {5, 9}), and process 2 gets whatever happens to lie in memory past the end of number, since the root needs 3 x 2 = 6 elements to serve three processes. Consequently:
I'm process 2 , I received the array : 1494772352 32767
You get
the sum of array is: 14
because the array sub_sums holds the sum computed by process 0, which is zero since you excluded that process from the summation, and the sum computed by process 1, which is 5 + 9 = 14. Hence, 0 + 14 = 14.
To fix this you need to remove if(world_rank!=0) from:
printf("I'm process %d , I received the array : ",world_rank);
for(int i=0 ; i<2 ; i++){
printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
and run your code with only 2 processes.
For the last step, instead of MPI_Gather you can use MPI_Reduce to perform the sum in parallel and collect the value directly on the root rank. Consequently, you would not need to perform the sum manually on the root rank.
A running example:
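#include <stdio.h>
#include <mpi.h>

int main(int argc,char *argv[]){
    MPI_Init(NULL,NULL); // Initialize the MPI environment
    int world_rank;
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
    MPI_Comm_size(MPI_COMM_WORLD,&world_size);
    int number1[2]; // receive buffer on every process
    int number[4];  // only filled in on the root
    if(world_rank == 0){
        number[0]=1;
        number[1]=3;
        number[2]=5;
        number[3]=9;
    }
    //All processes
    MPI_Scatter(number, 2, MPI_INT, number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
    printf("I'm process %d , I received the array : ",world_rank);
    int sub_sum = 0;
    for(int i=0 ; i<2 ; i++){
        printf("%d ",number1[i]);
        sub_sum = sub_sum + number1[i];
    }
    printf("\n");
    int sum = 0;
    // Sum the partial sums of all processes directly on the root
    MPI_Reduce(&sub_sum, &sum, 1, MPI_INT, MPI_SUM,0,MPI_COMM_WORLD);
    if(world_rank == 0)
        printf("\nthe sum of array is: %d\n",sum);
    MPI_Finalize();
    return 0;
}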
Input : {1,3,5,9} running with 2 processes
I'm process 0 , I received the array : 1 3
I'm process 1 , I received the array : 5 9
the sum of array is: 18
If you really want only processes 1 and 2 to receive the data and perform the sum, I would suggest looking into the routines MPI_Send and MPI_Recv.
if I have this code:
int main(void) {
int result=0;
int num[6] = {1, 2, 4, 3, 7, 1};
if (my_rank != 0) {
MPI_Reduce(num, &result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
} else {
MPI_Reduce(num, &result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
printf("result = %d\n", result);
the result printed is 1.
But if num[0]=9, then the result is 9.
I read that to solve this problem I must define the variable num as an array.
I can't understand how MPI_Reduce works with MPI_MIN. Why, if num[0] is not the smallest number, must I define the variable num as an array?
MPI_Reduce performs a reduction over the members of the communicator - not the members of the local array. sendbuf and recvbuf must both be of the same size.
I think the standard says it best:
Thus, all processes provide input buffers and output buffers of the same length, with elements of the same type. Each process can provide one element, or a sequence of elements, in which case the combine operation is executed element-wise on each entry of the sequence.
MPI does not get the minimum of all elements in the array, you have to do that manually.
You can use MPI_MIN to obtain the min value among those passed via reduction.
Let's examine the function declaration:
int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype
datatype, MPI_Op op, int root, MPI_Comm comm)
Each process sends its value (or array of values) using the buffer sendbuf.
The process identified by root receives the values, combines them with op, and stores the result in the buffer recvbuf. The number of elements contributed by each process is specified in count, so recvbuf must have room for count elements of the given datatype.
If each process has only one integer to send (count = 1), then recvbuf is also a single integer; if each process has two integers, then recvbuf is an array of integers of size 2. See this nice post for further explanations and nice pictures.
Now it should be clear that your code is wrong: sendbuf and recvbuf must be of the same size, and there is no need for the condition on my_rank. Put simply, recvbuf is significant only at the root process, while sendbuf is significant at every process, including the root.
In your example you can assign one or more elements of the array to each process and then compute the minimum value (if there are as many processes as values in the array) or the array of minimum values (if there are more values than processes).
Here is a working example that illustrates the usage of MPI_MIN, MPI_MAX and MPI_SUM (slightly modified from this), in the case of simple values (not arrays).
Each process does some work, depending on its rank, and sends the time spent doing the work to the root process. The root process collects the times and outputs their min, max and average values.
#include <stdio.h>
#include <mpi.h>

int myrank, numprocs;

/* just a function to waste some time */
float work()
{
    float x, y = 0.0f;
    if (myrank%2) {
        /* odd ranks do a lot more work than even ranks */
        for (int i = 0; i < 100000000; ++i) {
            x = i/0.001;
            y += x;
        }
    } else {
        for (int i = 0; i < 100000; ++i) {
            x = i/0.001;
            y += x;
        }
    }
    return y;
}

int main(int argc, char **argv)
{
    int node;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);
    printf("Hello World from Node %d\n",node);

    /*variables used for gathering timing statistics*/
    double mytime, maxtime, mintime, avgtime;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Barrier(MPI_COMM_WORLD); /*synchronize all processes*/

    mytime = MPI_Wtime(); /*get time just before work section */
    work();
    mytime = MPI_Wtime() - mytime; /*get time just after work section*/

    /*compute max, min, and average timing statistics*/
    MPI_Reduce(&mytime, &maxtime, 1, MPI_DOUBLE,MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&mytime, &mintime, 1, MPI_DOUBLE, MPI_MIN, 0,MPI_COMM_WORLD);
    MPI_Reduce(&mytime, &avgtime, 1, MPI_DOUBLE, MPI_SUM, 0,MPI_COMM_WORLD);

    /* plot the output */
    if (myrank == 0) {
        avgtime /= numprocs;
        printf("Min: %lf Max: %lf Avg: %lf\n", mintime, maxtime,avgtime);
    }

    MPI_Finalize();
    return 0;
}
If I run this on my OSX laptop, this is what I get:
urcaurca$ mpirun -n 4 ./a.out
Hello World from Node 3
Hello World from Node 0
Hello World from Node 2
Hello World from Node 1
Min: 0.000974 Max: 0.985291 Avg: 0.493081
I'm trying to send a number to p-1 processes. Process 0 sends this value to all other processes. I use an MPI_SEND Command to do this. When I explicitly write out MPI_SEND commands for 3 processes, it works fine. But when I want to put it in a loop, it gives me the output as well as a segmentation fault code. Here is my code:
#include <stdlib.h>
#include <mpi.h>
#include "a1.h"
int main(int argc, char** argv)
RGB *image;
int width, height, max;
int windowLength = atoi(argv[3]);
int my_rank, p, local_height, source, i;
int dest;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
int *processorRows;
processorRows = (int*)malloc(sizeof(int)*(p+1));
if (my_rank == 0) {
printf("Process %d is reading...\n", my_rank);
image = readPPM(argv[1], &width, &height, &max);
//calculate rows to each process
for (i=0; i<p; i++) {
processorRows[i] = height/p;
for (i=0; i< height%p; i++){
for (dest=1; dest<p; dest++) {
MPI_Send(processorRows + dest, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
//MPI_Send(processorRows + 2, 1, MPI_INT, 2, 0, MPI_COMM_WORLD);
//MPI_Send(processorRows + 3, 1, MPI_INT, 3, 0, MPI_COMM_WORLD);
else {
MPI_Recv(processorRows, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
printf("I am Process %d and will run %d rows...\n", my_rank, *processorRows);
//processImage(width, height, image, windowLength);
//writePPM(argv[2], width, height, max, image);
If I were to remove the for loop, replace "dest" with 1, and uncomment the other 2 MPI_SEND lines, it works completely fine when running mpirun -np 4 ./program
Not sure what's going on here...
I'm not exactly sure what you are trying to accomplish. But from the statement
Process 0 sends this value to all other processes.
and from the code, I would expect you to do a scatter from process 0 to all other processes rather than this send-receive loop.
Remove all the send-receive pairs and the loops, and use a single scatter operation. Here is the documentation for MPI_Scatter: https://www.open-mpi.org/doc/v1.8/man3/MPI_Scatter.3.php. If you are unsure about the scatter operation, have a look at this neat explanation: http://mpitutorial.com/tutorials/mpi-scatter-gather-and-allgather/
It looks like the size of the processorRows array is the total number of processes used, and you are trying to send one element of this array to each of the other ranks. Hence, your code should look something like this:
int *processorRows;
processorRows = (int*)malloc(sizeof(int)*(p+1));
if (my_rank == 0) {
printf("Process %d is reading...\n", my_rank);
image = readPPM(argv[1], &width, &height, &max);
for (i=0; i<p; i++) {
processorRows[i] = height/p;
for (i=0; i< height%p; i++){
MPI_Scatter(processorRows, 1, MPI_INT, processorRows, 1, MPI_INT, 0, MPI_COMM_WORLD);
I removed the lines
#include "a1.h"
image = readPPM(argv[1], &width, &height, &max);
since I do not have that header or the readPPM function, set height manually to 10, and the code worked. Maybe the problem is with the height variable?
After a lot of searching I finally have a function that allocates the memory for a 2D array as one contiguous (linear) block.
The function is:
int malloc2dint(int ***array, int n, int m)
{
    /* allocate the n*m contiguous items */
    int *p = (int *)malloc(n*m*sizeof(int));
    if (!p) return -1;

    /* allocate the row pointers into the memory */
    (*array) = (int **)malloc(n*sizeof(int*));
    if (!(*array)) {
        free(p); /* do not leak the data block on failure */
        return -1;
    }

    /* set up the pointers into the contiguous memory */
    int i;
    for (i=0; i<n; i++)
        (*array)[i] = &(p[i*m]);

    return 0;
}
Using this method I can broadcast and scatter a dynamically allocated 2D array correctly, but the problem with MPI_Gather still exists.
main function is:
int length = atoi(argv[1]);
int rank, size, from, to, i, j, k, **first_array, **second_array, **result_array;
MPI_Init (&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
//2D dynamic memory allocation
malloc2dint(&first_array, length, length);
malloc2dint(&second_array, length, length);
malloc2dint(&result_array, length, length);
//Related boundary to each task
from = rank * length/size;
to = (rank+1) * length/size;
//Intializing first and second array
if (rank==0)
for(i=0; i<length; i++)
for(j=0; j<length; j++)
first_array[i][j] = 1;
second_array[i][j] = 1;
//Broadcast second array so all tasks will have it
MPI_Bcast (&(second_array[0][0]), length*length, MPI_INT, 0, MPI_COMM_WORLD);
//Scatter first array so each task has matrix values between its boundary
MPI_Scatter (&(first_array[0][0]), length*(length/size), MPI_INT, first_array[from], length*(length/size), MPI_INT, 0, MPI_COMM_WORLD);
//Now each task will calculate matrix multiplication for its part
for (i=from; i<to; i++)
for (j=0; j<length; j++)
for (k=0; k<length; k++)
result_array[i][j] += first_array[i][k]*second_array[k][j];
//printf("\nrank(%d)->result_array[%d][%d] = %d\n", rank, i, j, result_array[i][j]);
//this line print the correct value
//Gathering info from all task and put each partition to resulat_array
MPI_Gather (&(result_array[from]), length*(length/size), MPI_INT, result_array, length*(length/size), MPI_INT, 0, MPI_COMM_WORLD);
if (rank==0)
for (i=0; i<length; i++)
printf("\n\t| ");
for (j=0; j<length; j++)
printf("%2d ", result_array[i][j]);
return 0;
Now when I run mpirun -np 2 xxx.out 4 the output is:
| 4 4 4 4 | ---> Good Job!
| 4 4 4 4 | ---> Good Job!
| 1919252078 1852795251 1868524912 778400882 | ---> Where are you baby?!!!
| 540700531 1701080693 1701734758 2037588068 | ---> Where are you baby?!!!
Finally, mpirun reports that the process with rank 0 exited on signal 6 (aborted).
The strange thing for me is that MPI_Bcast and MPI_Scatter work fine but MPI_Gather does not.
Any help will be highly appreciated.
The problem is with how you are passing the buffers. You are doing it correctly in MPI_Scatter, but incorrectly in MPI_Gather.
Passing the send buffer as &result_array[from] makes MPI read the memory where the list of row pointers is stored rather than the actual data of the matrix. Use &result_array[from][0] instead.
The same applies to the receive buffer: pass &result_array[0][0] instead of result_array, so that MPI gets a pointer to the position where the data lies in memory.
Hence, instead of:
//Gathering info from all task and put each partition to resulat_array
MPI_Gather (&(result_array[from]), length*(length/size), MPI_INT, result_array, length*(length/size), MPI_INT, 0, MPI_COMM_WORLD);
use:
//Gathering info from all task and put each partition to resulat_array
MPI_Gather (&(result_array[from][0]), length*(length/size), MPI_INT, &(result_array[0][0]), length*(length/size), MPI_INT, 0, MPI_COMM_WORLD);
I have a 2D double precision array that is being manipulated in parallel by several processes. Each process manipulates a part of the array, and at the end of every iteration, I need to ensure that all the processes have the SAME copy of the 2D array.
Assume an array of size 10*10 and 2 processes (or processors). Process 1 (P1) manipulates the first 5 rows of the 2D array (5*10=50 elements in total) and P2 manipulates the last 5 rows (50 elements in total). At the end of each iteration, I need P1 to have (ITS OWN first 5 rows + P2's last 5 rows), and P2 to have (P1's first 5 rows + ITS OWN last 5 rows). I hope the scenario is clear.
I am trying to broadcast using the code given below. But my program keeps exiting with this error: "APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)".
I am already using a contiguous 2D memory allocator as pointed out here: MPI_Bcast a dynamic 2d array by Jonathan. But I am still getting the same error.
Can someone help me out?
My code:
double **grid, **oldgrid;
int gridsize; // size of grid
int rank, size; // rank of current process and no. of processes
int rowsforeachprocess, offset; // to keep track of rows that need to be handled by each process
/* allocation, MPI_Init, and lots of other stuff */
rowsforeachprocess = ceil((float)gridsize/size);
offset = rank*rowsforeachprocess;
/* Each process is handling "rowsforeachprocess" #rows.
 * Lots of work done here
 * Now I need to broadcast these rows to all other processes.
 */
for(i=0; i<gridsize; i++){
    MPI_Bcast(&(oldgrid[i]), gridsize-2, MPI_DOUBLE, (i/rowsforeachprocess), MPI_COMM_WORLD);
}
Part 2: The code above is part of a parallel solver for the laplace equation using 1D decomposition and I did not want to use a Master-worker model. Will my code be easier if I use a Master-worker model?
The crash-causing problem here is a 2d-array pointer issue -- &(oldgrid[i]) is a pointer-to-a-pointer to doubles, not a pointer to doubles, and it points to the pointer to row i of your array, not to row i of your array. You want MPI_Bcast(&(oldgrid[i][0]),.. or MPI_Bcast(oldgrid[i],....
There's another way to do this, too, which uses only one expensive collective communication call instead of one per row: if you need everyone to have a copy of the whole array, you can use MPI_Allgather to gather the data together and distribute it to everyone or, in the general case where the processes don't all have the same number of rows, MPI_Allgatherv. Instead of the loop over broadcasts, this would look a little like:
int *counts = malloc(size*sizeof(int));
int *displs = malloc(size*sizeof(int));
for (int i=0; i<size; i++) {
    counts[i] = rowsforeachprocess*gridsize;
    displs[i] = i*rowsforeachprocess*gridsize;
}
/* the last process gets whatever rows are left over */
counts[size-1] = (gridsize-(size-1)*rowsforeachprocess)*gridsize;
MPI_Allgatherv(oldgrid[offset], mynumrows*gridsize, MPI_DOUBLE,
               oldgrid[0], counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);
where counts are the number of items sent by each task, and displs are the displacements.
But finally, are you sure that every process has to have a copy of the entire array? If you're just computing a laplacian, you probably just need neighboring rows, not the whole array.
This would look like:
int main(int argc, char**argv) {
double **oldgrid;
const int gridsize=10; // size of grid
int rank, size; // rank of current process and no. of processes
int rowsforeachprocess; // to keep track of rows that need to be handled by each process
int offset, mynumrows;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
rowsforeachprocess = (int)ceil((float)gridsize/size);
offset = rank*rowsforeachprocess;
mynumrows = rowsforeachprocess;
if (rank == size-1)
mynumrows = gridsize-offset;
malloc2ddouble(&oldgrid, mynumrows+2, gridsize);
for (int i=0; i<mynumrows+2; i++)
for (int j=0; j<gridsize; j++)
oldgrid[i][j] = rank;
/* exchange row data with neighbours */
int highneigh = rank+1;
if (rank == size-1) highneigh = 0;
int lowneigh = rank-1;
if (rank == 0) lowneigh = size-1;
/* send data to high neighbour and receive from low */
MPI_Sendrecv(oldgrid[mynumrows], gridsize, MPI_DOUBLE, highneigh, 1,
oldgrid[0], gridsize, MPI_DOUBLE, lowneigh, 1,
MPI_COMM_WORLD, &status);
/* send data to low neighbour and receive from high */
MPI_Sendrecv(oldgrid[1], gridsize, MPI_DOUBLE, lowneigh, 1,
oldgrid[mynumrows+1], gridsize, MPI_DOUBLE, highneigh, 1,
MPI_COMM_WORLD, &status);
for (int proc=0; proc<size; proc++) {
if (rank == proc) {
printf("Rank %d:\n", proc);
for (int i=0; i<mynumrows+2; i++) {
for (int j=0; j<gridsize; j++) {
printf("%f ", oldgrid[i][j]);