MPI joining vectors - c

I'm trying to do some parallel calculations and then reduce the results into one vector.
I do this by dividing a for loop into parts that are calculated separately, each part producing a piece of the vector. Later I'd like to join all those subvectors into one main vector by replacing parts of it with the values obtained from the individual processes. Needless to say, I have no idea how to do it and my attempts have been in vain.
Any help will be appreciated.
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(A, n*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(b, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x0, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);

printf("My id: %d, mySize: %d, myStart: %d, myEnd: %d", rank, size, mystart, myend);

while(delta > granica)
{
    ii++;
    delta = 0;
    //if(rank > 0)
    //{
    for(i = mystart; i < myend; i++)
    {
        xNowe[i] = b[i];
        for(j = 0; j < n; j++)
        {
            if(i != j)
            {
                xNowe[i] -= A[i][j] * x0[j];
            }
        }
        xNowe[i] = xNowe[i] / A[i][i];
        printf("Result in iteration %d: %f", i, xNowe[i]);
    }

    MPI_Reduce(xNowe, xNowe, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

I'm going to ignore your calculations and assume they're all doing whatever it is you want them to do and at the end, you have an array called xNowe that has the results for your rank somewhere within it (in some subarray).
You have two options.
The first way uses an MPI_REDUCE in the way you're currently doing it.
What needs to happen is that each process sets all of the values that do not pertain to its own range to 0; then you can just do one big MPI_REDUCE (as you're already doing), where each process contributes its xNowe array, which will look something like this (depending on the input/rank/etc.):
rank: 0 1 2 3 4 5 6 7
value: 0 0 1 2 0 0 0 0
When you do the reduction (with MPI_SUM as the op), you'll get an array (on rank 0) that has each value filled in with the value contributed by each rank.
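A rough sketch of that first approach, reusing the question's variables, might look like the following. Note that MPI_Reduce is not allowed to use the same buffer for send and receive on the root (as the question's call does), so the root side uses MPI_IN_PLACE here:
/* Sketch: zero the entries this rank does not own, then one MPI_SUM
 * reduction assembles the full vector on rank 0. */
for (i = 0; i < n; i++) {
    if (i < mystart || i >= myend) {
        xNowe[i] = 0.0;
    }
}

if (rank == 0) {
    MPI_Reduce(MPI_IN_PLACE, xNowe, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
} else {
    MPI_Reduce(xNowe, NULL, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
}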
The second way uses an MPI_GATHER. Some might consider this to be the "more proper" way.
For this version, instead of using MPI_REDUCE to get the result, you only send the data that was calculated on your rank. You wouldn't have one large array. So your code would look something like this:
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(A, n*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(b, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x0, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);

printf("My id: %d, mySize: %d, myStart: %d, myEnd: %d", rank, size, mystart, myend);

while(delta > granica)
{
    ii++;
    delta = 0;
    for(i = mystart; i < myend; i++)
    {
        xNowe[i-mystart] = b[i];
        for(j = 0; j < n; j++)
        {
            if(i != j)
            {
                xNowe[i-mystart] -= A[i][j] * x0[j];
            }
        }
        xNowe[i-mystart] = xNowe[i-mystart] / A[i][i];
        printf("Result in iteration %d: %f", i, xNowe[i-mystart]);
    }
}

MPI_Gather(xNowe, myend-mystart, MPI_DOUBLE, result, myend-mystart, MPI_DOUBLE, 0, MPI_COMM_WORLD);
You would obviously need to create a new array on rank 0 that is called result to hold the resulting values.
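For instance, a minimal allocation for it might look like this (a sketch, assuming n total elements as in the code above):
/* Sketch: only the root needs the full-length receive buffer. */
double *result = NULL;
if (rank == 0) {
    result = malloc(n * sizeof(double));
}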
UPDATE:
As pointed out by Hristo in the comments below, MPI_GATHER might not work here if myend - mystart is not the same on all ranks. If that's the case, you'd need to use MPI_GATHERV which allows you to specify a different size for each rank.
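A rough MPI_Gatherv sketch under that assumption could look like this; the chunk arithmetic on the root is an assumption and would have to match however mystart/myend are actually computed:
/* Sketch: gather differently sized pieces onto rank 0.
 * recvcounts and displs are only needed on the root. */
int mycount = myend - mystart;
int *recvcounts = NULL, *displs = NULL;

if (rank == 0) {
    recvcounts = malloc(size * sizeof(int));
    displs     = malloc(size * sizeof(int));
    int chunk = n / size;                         /* assumed block size per rank */
    for (int r = 0; r < size; r++) {
        displs[r]     = r * chunk;
        recvcounts[r] = (r == size - 1) ? n - r * chunk : chunk;
    }
}

MPI_Gatherv(xNowe, mycount, MPI_DOUBLE,
            result, recvcounts, displs, MPI_DOUBLE,
            0, MPI_COMM_WORLD);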

Related

Program Stuck while sending data to slaves MPI

I'm developing an application in C in which the user wants to find a certain pattern of 2-digit numbers in a 2-dimensional array.
For example, there is a 10x10 array with random single-digit numbers and the user wants to find 1,0. Our program searches for 1 and, when it is found, searches for 0 in all directions (top, bottom, sides, diagonals and anti-diagonals) to depth 1. Simply put, it searches for the zero around the 1 within a 3x3 sub-matrix. The function search_number() performs the job of searching for the second digit.
I've implemented sequential code for it and I'm trying to convert it to MPI.
I'm a complete beginner with MPI and this is my first time practising it.
Here is my attempt with MPI.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 255
#define BS N/2

MPI_Status status;

int search_number(int arr[N][N], int row, int col, int digit_2){
    int count=0;
    for (int i=row-1; i<=row+1; i++){      //from -row to +row = 3 indexes for rows
        for(int j=col-1; j<=col+1; j++){   //from -col to +col = 3 indexes for cols
            // skip for [row,col] and -1 for both [i,j] as well as till maximum size
            if(i<0 || j<0 || i>=N || j>=N || (i==row && j==col)) continue;
            if(arr[i][j] == digit_2){      //if second number is found, increase the counter
                count++;
            }
        }
    }
    return count;
}

int main(int argc, char **argv)
{
    int nproc,taskId,source,i,j,k,positionX,positionY;
    int sum=0;
    MPI_Datatype type;
    int a[N][N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskId);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    MPI_Type_vector(N, BS, N, MPI_INT, &type);
    MPI_Type_commit(&type);

    //root
    if (taskId == 0) {
        srand( time(NULL) );
        //Generate two NxN matrix
        for (i=0; i<N; i++) {
            for (j=0; j<N; j++) {
                a[i][j]= rand()%10;
            }
        }

        printf("Passing 1st chunk:\n");
        // first chunk
        MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
        MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
        printf("Passing 2nd Chunk:\n");
        //second chunk
        MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
        MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
    }

    //workers
    source = 0;
    MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);

    for(int i=0;i<N;i++){
        for(int j=0;j<N;j++){
            if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
                sum+= search_number(a,i,j,0); // function will return the count of 0's shared with 1
            }
        }
    }
    //Send result to root
    MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);

    //root receives results
    if(taskId == 0)
    {
        printf("Count: %d\n",sum);
        // printMatrix(resultFinal);
    }

    MPI_Finalize();
}
The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line if I set N>255 at the top, but it works for N up to 255. Can you point out my mistake?
The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line
if I set N>255 at the top, but it works for N up to 255.
As Gilles Gouaillardet already pointed out in the comments, and in more detail in this answer:
MPI_Send() is allowed to block until a matching receive is posted (and
that generally happens when the message is "large") ... and the
required matching receive never gets posted.
A typical fix would be to issue a MPI_Irecv(...,src = 0,...) on rank 0
before the MPI_Send() (and MPI_Wait() after), or to handle 0 -> 0
communication with MPI_Sendrecv().
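A rough sketch of the MPI_Irecv variant on rank 0 follows; the temporary buffer myblock is an assumption (the receive buffer of a pending receive must be distinct from the buffer being sent):
/* Sketch: rank 0 posts the receive for its own chunk before sending
 * it to itself, so the self-send can always complete. */
int (*myblock)[N] = malloc(sizeof(int[BS][N]));   /* distinct buffer for the local chunk */
MPI_Request req;

if (taskId == 0) {
    MPI_Irecv(&myblock[0][0], BS*N, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    MPI_Send(&a[0][0], BS*N, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}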
Besides that, your parallelization seems wrong, namely:
MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
you have sent the same workload to processes 0 and 1, and:
MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
the same issue occurs with processes 2 and 3.
You should try to use a stencil-like approach where each process only shares its border rows with its neighbours. For instance, a possible distribution for a 10x10 matrix and 4 processes could be (a sketch of how each rank might compute such a range follows the list):
process 0 works with rows 0, 1 and 2;
process 1 works with rows 2, 3 and 4;
process 2 works with rows 4, 5 and 6;
process 3 works with rows 6, 7, 8 and 9;
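A rough sketch of that row arithmetic, reusing N, nproc and taskId from the question (rows_per_rank, lo, hi, send_lo and send_hi are illustrative names):
/* Sketch: block row distribution plus one shared border row on each
 * side, so a 1 sitting on a block boundary can still see a 0 in the
 * neighbouring row. */
int rows_per_rank = N / nproc;
int lo = taskId * rows_per_rank;                          /* first row owned by this rank */
int hi = (taskId == nproc - 1) ? N : lo + rows_per_rank;  /* one past the last owned row  */

int send_lo = (lo > 0) ? lo - 1 : 0;                      /* include the border row below, if any */
int send_hi = (hi < N) ? hi + 1 : N;                      /* include the border row above, if any */
/* the root would send rows [send_lo, send_hi) of a to rank taskId,
 * i.e. (send_hi - send_lo) * N ints, and the worker would receive
 * into a buffer of exactly that size */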
Currently, you send BS*N elements to each process; however, in:
MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);
you specify that you are expecting to receive N*N.
Moreover in:
for(int i=0;i<N;i++){
    for(int j=0;j<N;j++){
        if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
            sum+= search_number(a,i,j,0); // function will return the count of 0's shared with 1
        }
    }
}
the processes are working with positions of the matrix a that they did not receive, which naturally should not be the case.
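In other words, each worker would receive only the rows it was sent and loop over just those. A rough sketch, ignoring the shared-border detail discussed above; local is an illustrative name and search_number_local is a hypothetical variant of search_number that takes a BS x N block:
/* Sketch: receive this rank's BS rows into a local block and search
 * only inside that block. */
int local[BS][N];
MPI_Recv(&local[0][0], BS*N, MPI_INT, 0, taskId, MPI_COMM_WORLD, &status);

for (int i = 0; i < BS; i++) {
    for (int j = 0; j < N; j++) {
        if (local[i][j] == 1) {
            sum += search_number_local(local, i, j, 0);  /* hypothetical BS x N version of search_number */
        }
    }
}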
Finally, instead of:
//Send result to root
MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);
you should actually use MPI_Reduce, i.e.,
Reduces values on all processes to a single value
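A minimal sketch of that reduction, reusing the question's sum and taskId (total is an illustrative name):
/* Sketch: combine every rank's partial count into one value on rank 0. */
int total = 0;
MPI_Reduce(&sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

if (taskId == 0) {
    printf("Count: %d\n", total);
}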

How to convert MPI_Reduce into MPI_Send and MPI_Recv?

I am working on a parallel processing program that should use MPI_Send() and MPI_Recv() instead of MPI_Reduce(). I understand that MPI_Send() needs to send a value from each processor to the root processor, i.e. rank 0, and MPI_Recv() needs to receive all of those values on the root.
I keep getting an error where the value passed to Send does not arrive on the receiving side, which leaves the final value at 0. The MPI_Reduce() call is still in the code but commented out to show what needs to be replaced. Can anyone help?
#include "mpi.h"
#include <stdio.h>
#include <math.h>
int main( int argc, char *argv[])
{
int n, i;
double PI25DT = 3.141592653589793238462643;
double pi, h, sum, x;
int numprocs, myid;
double startTime, endTime;
/* Initialize MPI and get number of processes and my number or rank*/
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
/* Processor zero sets the number of intervals and starts its clock*/
if (myid==0) {
n=600000000;
startTime=MPI_Wtime();
for (int i = 0; i < numprocs; i++) {
if (i != myid) {
MPI_Send(&n, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
}
}
}
else {
MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
/* Calculate the width of intervals */
h = 1.0 / (double) n;
/* Initialize sum */
sum = 0.0;
/* Step over each inteval I own */
for (i = myid+1; i <= n; i += numprocs) {
/* Calculate midpoint of interval */
x = h * ((double)i - 0.5);
/* Add rectangle's area = height*width = f(x)*h */
sum += (4.0/(1.0+x*x))*h;
}
/* Get sum total on processor zero */
//MPI_Reduce(&sum,&pi,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
double value = 0;
if (myid != 0) {
MPI_Send(&sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
else {
for (int i = 1; i < numprocs; i++) {
MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
pi += value;
}
}
/* Print approximate value of pi and runtime*/
if (myid==0) {
printf("pi is approximately %.16f, Error is %e\n",
pi, fabs(pi - PI25DT));
endTime=MPI_Wtime();
printf("runtime is=%.16f",endTime-startTime);
}
MPI_Finalize();
return 0;
}
You are using MPI_INT to send a value of type double:
if (myid != 0) {
    MPI_Send(&sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    //                ^^^^^^^
}
int is 4 bytes long; double is 8 bytes long. Although the receive operation succeeds, it cannot construct a value of type MPI_DOUBLE given only 4 bytes from the message, so it doesn't write anything into value and it remains 0.0. Indeed, if you replace:
MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
with
MPI_Status status;
int count;

MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_DOUBLE, &count);
if (count == MPI_UNDEFINED) {
    printf("Short message received\n");
    MPI_Abort(MPI_COMM_WORLD, 0);
}
your program will abort, indicating that the body of the conditional statement was executed due to MPI_Get_count() returning MPI_UNDEFINED in count, which signals that the length of the received message was not an integer multiple of the size of MPI_DOUBLE.
Also, pi must be explicitly initialised to sum before the receive loop, otherwise you will get the wrong value of pi due to either of the following errors:
pi is left uninitialised and has arbitrary initial value, and
the contribution of rank 0 is not added to the final result.
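Putting both fixes together, the hand-rolled reduction could look roughly like this:
/* Sketch: send the partial sum as MPI_DOUBLE and seed pi with
 * rank 0's own contribution before accumulating the others. */
if (myid != 0) {
    MPI_Send(&sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}
else {
    pi = sum;                     /* rank 0's contribution */
    for (int i = 1; i < numprocs; i++) {
        double value;
        MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        pi += value;
    }
}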

MPI_Recv() freezing program, not receiving value from MPI_Send() in C

I'm trying to write an implementation of a hyper quicksort in MPI, but I'm having an issue where a process gets stuck on MPI_Recv().
While testing with 2 processes, it seems that inside the else of the if (rank % comm_sz == 0), process 1 never receives the pivot from process 0. Process 0 successfully sends its pivot and recurses through the method correctly. I put in some print debug statements and received the output:
(arr, 0, 2, 0, 9)
Rank 0 sending pivot 7 to 1
(arr, 1, 2, 0, 9)
Rank 1 pre-recv from 0
After which, the post-recv message from rank 1 never prints. Rank 0 prints its post-send message and continues through its section of the array. Is there something wrong with my implementation of MPI_Send() or MPI_Recv() that may be causing this?
Here is my code for the quicksort:
(For reference, comm_sz in the parameters for the method refers to the number of processes looking at that section of the array.)
void hyper_quick(int *array, int rank, int comm_sz, int s, int e) {
    printf("(arr, %d, %d, %d, %d)\n", rank, comm_sz, s, e);
    // Keeps recursing until there is only one element
    if (s < e) {
        int pivot;
        if (comm_sz > 1) {
            // One process gets a random pivot within its range and sends that to every process looking at that range
            if (rank % comm_sz == 0) {
                pivot = rand() % (e - s) + s;
                for (int i = rank + 1; i < comm_sz; i++) {
                    int partner = rank + i;
                    printf("Rank %d sending pivot %d to %d\n", rank, pivot, partner);
                    MPI_Send(&pivot, 1, MPI_INT, partner, rank, MPI_COMM_WORLD);
                    printf("Rank %d successfully sent %d to %d\n", rank, pivot, partner);
                }
            }
            else {
                int partner = rank - (rank % comm_sz);
                printf("Rank %d pre-recv from %d\n", rank, partner);
                MPI_Recv(&pivot, 1, MPI_INT, partner, rank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("Rank %d received pivot %d from %d\n", rank, pivot, partner);
            }
        }
        else {
            pivot = rand() % (e - s) + s;
        }

        int tmp = array[pivot];
        array[pivot] = array[e];
        array[e] = tmp;

        // Here is where the actual quick sort happens
        int i = s;
        int j = e - 1;
        while (i < j) {
            while (array[e] >= array[i] && i < j) {
                i++;
            }
            while (array[e] < array[j] && i < j) {
                j--;
            }
            if (i < j) {
                tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
            }
        }
        if (array[e] < array[i]) {
            tmp = array[i];
            array[i] = array[e];
            array[e] = tmp;
            pivot = i;
        }
        else {
            pivot = e;
        }

        // Split remaining elements between remaining processes
        if (comm_sz > 1) {
            // Elements greater than pivot
            if (rank % comm_sz >= comm_sz/2) {
                hyper_quick(array, rank, comm_sz/2, pivot + 1, e);
            }
            // Elements lesser than pivot
            else {
                hyper_quick(array, rank, comm_sz/2, s, pivot - 1);
            }
        }
        // Recurse remaining elements in current process
        else {
            hyper_quick(array, rank, 1, s, pivot - 1);
            hyper_quick(array, rank, 1, pivot + 1, e);
        }
    }
} // end hyper_quick
Rank 0 sending pivot 7 to 1
MPI_Send(&pivot, 1, MPI_INT, partner, rank, MPI_COMM_WORLD);
                                      ^^^^
So the sender tag is zero.
Rank 1 pre-recv from 0
MPI_Recv(&pivot, 1, MPI_INT, partner, rank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                                      ^^^^
And the receiver tag is one.
If the receiver asks for only messages with a specific tag, it will not receive a message with a different tag.
Sometimes there are cases when A might have to send many different types of messages to B. Instead of B having to go through extra measures to differentiate all these messages, MPI allows senders and receivers to also specify message IDs with the message (known as tags). When process B only requests a message with a certain tag number, messages with different tags will be buffered by the network until B is ready for them. [MPI Tutorial -- Send and Receive]
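A minimal fix along those lines is to make both sides agree on the tag; for example a fixed tag of 0 (shown here as a sketch; MPI_ANY_TAG on the receiving side would also work):
/* Sketch: sender and receiver use the same tag so the messages match. */

/* on the rank that picked the pivot: */
MPI_Send(&pivot, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);

/* on the receiving ranks: */
MPI_Recv(&pivot, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);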

mpi_gather, 2d dynamic array in c, exited on signal 6 (aborted)

After a lot of searching I finally have a function which allocates memory for an nD array contiguously, like a linear vector.
The function is:
int malloc2dint(int ***array, int n, int m)
{
    /* allocate the n*m contiguous items */
    int *p = (int *)malloc(n*m*sizeof(int));
    if (!p) return -1;

    /* allocate the row pointers into the memory */
    (*array) = (int **)malloc(n*sizeof(int*));
    if (!(*array))
    {
        free(p);
        return -1;
    }

    /* set up the pointers into the contiguous memory */
    int i;
    for (i=0; i<n; i++)
        (*array)[i] = &(p[i*m]);

    return 0;
}
Using this method I can broadcast and also scatter a 2D dynamically allocated array correctly, but the problem with MPI_Gather still exists.
The main function is:
int length = atoi(argv[1]);
int rank, size, from, to, i, j, k, **first_array, **second_array, **result_array;

MPI_Init (&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

//2D dynamic memory allocation
malloc2dint(&first_array, length, length);
malloc2dint(&second_array, length, length);
malloc2dint(&result_array, length, length);

//Related boundary to each task
from = rank * length/size;
to = (rank+1) * length/size;

//Initializing first and second array
if (rank==0)
{
    for(i=0; i<length; i++)
        for(j=0; j<length; j++)
        {
            first_array[i][j] = 1;
            second_array[i][j] = 1;
        }
}

//Broadcast second array so all tasks will have it
MPI_Bcast (&(second_array[0][0]), length*length, MPI_INT, 0, MPI_COMM_WORLD);

//Scatter first array so each task has matrix values between its boundary
MPI_Scatter (&(first_array[0][0]), length*(length/size), MPI_INT, first_array[from], length*(length/size), MPI_INT, 0, MPI_COMM_WORLD);

//Now each task will calculate matrix multiplication for its part
for (i=from; i<to; i++)
    for (j=0; j<length; j++)
    {
        result_array[i][j]=0;
        for (k=0; k<length; k++)
            result_array[i][j] += first_array[i][k]*second_array[k][j];

        //printf("\nrank(%d)->result_array[%d][%d] = %d\n", rank, i, j, result_array[i][j]);
        //this line prints the correct value
    }

//Gathering info from all task and put each partition to resulat_array
MPI_Gather (&(result_array[from]), length*(length/size), MPI_INT, result_array, length*(length/size), MPI_INT, 0, MPI_COMM_WORLD);

if (rank==0)
{
    for (i=0; i<length; i++)
    {
        printf("\n\t| ");
        for (j=0; j<length; j++)
            printf("%2d ", result_array[i][j]);
        printf("|\n");
    }
}

MPI_Finalize();
return 0;
Now when I run mpirun -np 2 xxx.out 4 the output is:
| 4 4 4 4 | ---> Good Job!
| 4 4 4 4 | ---> Good Job!
| 1919252078 1852795251 1868524912 778400882 | ---> Where are you baby?!!!
| 540700531 1701080693 1701734758 2037588068 | ---> Where are you baby?!!!
Finally, mpirun notices that process rank 0 exited on signal 6 (aborted).
The strange point for me is that MPI_Bcast and MPI_Scatter work fine but MPI_Gather does not.
Any help will be highly appreciated.
The problem is with how you are passing the buffers. You are doing it correctly in MPI_Scatter, but then do it incorrectly for MPI_Gather.
Passing the send buffer via &result_array[from] will read the memory where the row-pointer list is stored rather than the actual data of the matrix. Use &result_array[from][0] instead.
Similarly for the receive buffer: pass &result_array[0][0] instead of result_array to supply a pointer to the position where the data actually lies in memory.
Hence, instead of:
//Gathering info from all task and put each partition to resulat_array
MPI_Gather (&(result_array[from]), length*(length/size), MPI_INT, result_array, length*(length/size), MPI_INT, 0, MPI_COMM_WORLD);
Do:
//Gathering info from all task and put each partition to resulat_array
MPI_Gather (&(result_array[from][0]), length*(length/size), MPI_INT, &(result_array[0][0]), length*(length/size), MPI_INT, 0, MPI_COMM_WORLD);

MPI Broadcast 2D array

I have a 2D double precision array that is being manipulated in parallel by several processes. Each process manipulates a part of the array, and at the end of every iteration, I need to ensure that all the processes have the SAME copy of the 2D array.
Assume an array of size 10*10 and 2 processes (or processors). Process 1 (P1) manipulates the first 5 rows of the 2D array (5*10=50 elements in total) and P2 manipulates the last 5 rows (50 elements in total). At the end of each iteration, I need P1 to have (ITS OWN first 5 rows + P2's last 5 rows), and P2 should have (P1's first 5 rows + its OWN last 5 rows). I hope the scenario is clear.
I am trying to broadcast using the code given below. But my program keeps exiting with this error: "APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)".
I am already using a contiguous 2D memory allocator as pointed out here: MPI_Bcast a dynamic 2d array by Jonathan. But I am still getting the same error.
Can someone help me out?
My code:
double **grid, **oldgrid;
int gridsize;                    // size of grid
int rank, size;                  // rank of current process and no. of processes
int rowsforeachprocess, offset;  // to keep track of rows that need to be handled by each process

/* allocation, MPI_Init, and lots of other stuff */

rowsforeachprocess = ceil((float)gridsize/size);
offset = rank*rowsforeachprocess;

/* Each process is handling "rowsforeachprocess" #rows.
 * Lots of work done here
 * Now I need to broadcast these rows to all other processes.
 */

for(i=0; i<gridsize; i++){
    MPI_Bcast(&(oldgrid[i]), gridsize-2, MPI_DOUBLE, (i/rowsforeachprocess), MPI_COMM_WORLD);
}
Part 2: The code above is part of a parallel solver for the Laplace equation using 1D decomposition, and I did not want to use a master-worker model. Will my code be easier if I use a master-worker model?
The crash-causing problem here is a 2d-array pointer issue -- &(oldgrid[i]) is a pointer-to-a-pointer to doubles, not a pointer to doubles, and it points to the pointer to row i of your array, not to row i of your array. You want MPI_Bcast(&(oldgrid[i][0]),.. or MPI_Bcast(oldgrid[i],....
There's another way to do this, too, which only uses one expensive collective communicator instead of one per row; if you need everyone to have a copy of the whole array, you can use MPI_Allgather to gather the data together and distribute it to everyone; or, in the general case where the processes don't have the same number of rows, MPI_Allgatherv. Instead of the loop over broadcasts, this would look a little like:
{
    int *counts = malloc(size*sizeof(int));
    int *displs = malloc(size*sizeof(int));
    for (int i=0; i<size; i++) {
        counts[i] = rowsforeachprocess*gridsize;
        displs[i] = i*rowsforeachprocess*gridsize;
    }
    counts[size-1] = (gridsize-(size-1)*rowsforeachprocess)*gridsize;

    MPI_Allgatherv(oldgrid[offset], mynumrows*gridsize, MPI_DOUBLE,
                   oldgrid[0], counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);
    free(counts);
    free(displs);
}
where counts are the number of items sent by each task, and displs are the displacements.
But finally, are you sure that every process has to have a copy of the entire array? If you're just computing a laplacian, you probably just need neighboring rows, not the whole array.
This would look like:
int main(int argc, char **argv) {
    double **oldgrid;
    const int gridsize=10;       // size of grid
    int rank, size;              // rank of current process and no. of processes
    int rowsforeachprocess;      // to keep track of rows that need to be handled by each process
    int offset, mynumrows;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    rowsforeachprocess = (int)ceil((float)gridsize/size);
    offset = rank*rowsforeachprocess;
    mynumrows = rowsforeachprocess;
    if (rank == size-1)
        mynumrows = gridsize-offset;
    malloc2ddouble(&oldgrid, mynumrows+2, gridsize);

    for (int i=0; i<mynumrows+2; i++)
        for (int j=0; j<gridsize; j++)
            oldgrid[i][j] = rank;

    /* exchange row data with neighbours */
    int highneigh = rank+1;
    if (rank == size-1) highneigh = 0;

    int lowneigh = rank-1;
    if (rank == 0) lowneigh = size-1;

    /* send data to high neighbour and receive from low */
    MPI_Sendrecv(oldgrid[mynumrows], gridsize, MPI_DOUBLE, highneigh, 1,
                 oldgrid[0],         gridsize, MPI_DOUBLE, lowneigh,  1,
                 MPI_COMM_WORLD, &status);

    /* send data to low neighbour and receive from high */
    MPI_Sendrecv(oldgrid[1],           gridsize, MPI_DOUBLE, lowneigh,  1,
                 oldgrid[mynumrows+1], gridsize, MPI_DOUBLE, highneigh, 1,
                 MPI_COMM_WORLD, &status);

    for (int proc=0; proc<size; proc++) {
        if (rank == proc) {
            printf("Rank %d:\n", proc);
            for (int i=0; i<mynumrows+2; i++) {
                for (int j=0; j<gridsize; j++) {
                    printf("%f ", oldgrid[i][j]);
                }
                printf("\n");
            }
            printf("\n");
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
