I am new to MPI and have written the following program in C. Instead of using pointers, I would like to set up my arrays as shown below. The first array element is read correctly, but the remaining elements are not. Can you please tell me whether this is the correct way to use scatter and gather?
This is the result I get:
$ mpicc test.c -o test
$ mpirun -np 4 test
1. Processor 0 has data 0 1 2 3
2. Processor 0 has data 0
3. Processor 0 doubling the data, now has 5
2. Processor 1 has data 32767
3. Processor 1 doubling the data, now has 5
2. Processor 2 has data -437713961
3. Processor 2 doubling the data, now has 5
2. Processor 3 has data 60
3. Processor 3 doubling the data, now has 5
4. Processor 0 has data: 5 1 2 3
The correct result should be:
$ mpicc test.c -o test
$ mpirun -np 4 test
1. Processor 0 has data 0 1 2 3
2. Processor 0 has data 0
3. Processor 0 doubling the data, now has 5
2. Processor 1 has data 1
3. Processor 1 doubling the data, now has 5
2. Processor 2 has data 2
3. Processor 2 doubling the data, now has 5
2. Processor 3 has data 3
3. Processor 3 doubling the data, now has 5
4. Processor 0 has data: 5 5 5 5
Any help would be greatly appreciated. The following code is run using 4 processors:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
int size, rank;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int globaldata[4]; /*wants to declare array this way*/
int localdata[4]; /*without using pointers*/
int i;
if (rank == 0) {
for (i = 0; i < size; i++)
globaldata[i] = i;
printf("1. Processor %d has data: ", rank);
for (i = 0; i < size; i++)
printf("%d ", globaldata[i]);
printf("\n");
}
MPI_Scatter(globaldata, 1, MPI_INT, &localdata, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf("2. Processor %d has data %d\n", rank, localdata[rank]);
localdata[rank]= 5;
printf("3. Processor %d now has %d\n", rank, localdata[rank]);
MPI_Gather(&localdata, 1, MPI_INT, globaldata, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (rank == 0) {
printf("4. Processor %d has data: ", rank);
for (i = 0; i < size; i++)
printf("%d ", globaldata[i]);
printf("\n");
}
MPI_Finalize();
return 0;
}
Your setup and your scatter are in principle OK. Your problem is in the printing, as you have misunderstood a detail of scatter/gather here.
When scattering the 4-element array, each process gets only one element (as you define with the 2nd and 5th arguments of the MPI_Scatter() call). This element is stored at index 0 of the local array, so on every rank other than 0, localdata[rank] reads an element that was never received. In effect, the local data is a scalar.
In general, you may scatter very big arrays and each process may still have to process a big local array. In these cases it is essential to correctly calculate the global indices and the local indices.
Assume the following toy problem: you want to scatter the array [1 2 3 4 5 6] to two processes. Proc0 should have the [1 2 3] part and the Proc1 should have the [4 5 6] part. In this case, the global array has size 6 and the local arrays have size 3. The Proc0 gets the global elements 0, 1, 2 and assigns them to its local 0, 1, 2. The Proc1 gets the global elements 3, 4, 5 and assigns them to its local 0, 1, 2.
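The index bookkeeping in this toy problem can be written down as a few small helpers. This is only a sketch: the function names are hypothetical (they are not part of MPI), and it assumes every process receives the same number of contiguous elements, as MPI_Scatter does.

```c
/* Hypothetical helpers (not part of MPI): translate between global and
   local indices when every rank holds `chunk` contiguous elements. */

/* Global index of the element stored at local_i on the given rank. */
static int global_index(int rank, int chunk, int local_i) {
    return rank * chunk + local_i;
}

/* Rank that holds the element with the given global index. */
static int owner_rank(int chunk, int global_i) {
    return global_i / chunk;
}

/* Position of that element inside its owner's local array. */
static int local_index(int chunk, int global_i) {
    return global_i % chunk;
}
```

With a chunk of 3, global_index(1, 3, 0) is 3: the first local element of Proc1 is the global element 3, which holds the value 4 in the array above.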
You will probably understand this concept better once you learn about MPI_Scatterv, which doesn't assume the same number of local elements for every process.
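For MPI_Scatterv, the root has to supply one element count and one displacement per process. A minimal sketch of how those two arrays might be filled is shown here; the helper name is hypothetical, and the choice of giving the leftover elements to the lowest ranks is an assumption, not something MPI prescribes.

```c
/* Hypothetical helper: split n elements over nprocs ranks as evenly as
   possible. The first (n % nprocs) ranks get one extra element.
   counts[r] is the number of elements for rank r, displs[r] is the
   global index at which rank r's chunk starts. */
static void fill_counts_displs(int n, int nprocs, int *counts, int *displs) {
    int base = n / nprocs;
    int extra = n % nprocs;
    int offset = 0;
    for (int r = 0; r < nprocs; r++) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }
}
```

The two arrays would then be passed as the sendcounts and displs arguments of MPI_Scatterv on the root, while each rank passes its own counts[rank] as the receive count.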
This version of your code seems to work:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
int size, rank;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int globaldata[4];/*wants to declare array this way*/
int localdata;/*without using pointers*/
int i;
if (rank == 0) {
for (i=0; i<size; i++)
globaldata[i] = i;
printf("1. Processor %d has data: ", rank);
for (i=0; i<size; i++)
printf("%d ", globaldata[i]);
printf("\n");
}
MPI_Scatter(globaldata, 1, MPI_INT, &localdata, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf("2. Processor %d has data %d\n", rank, localdata);
localdata= 5;
printf("3. Processor %d now has %d\n", rank, localdata);
MPI_Gather(&localdata, 1, MPI_INT, globaldata, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (rank == 0) {
printf("4. Processor %d has data: ", rank);
for (i=0; i<size; i++)
printf("%d ", globaldata[i]);
printf("\n");
}
MPI_Finalize();
return 0;
}
Enjoy learning MPI! :-)
I'm developing an application in C in which the user wants to find a certain pattern of two digits in a 2-dimensional array.
For example, there is a 10x10 array of random single-digit numbers and the user wants to find 1,0. Our program will search for 1 and, when it is found, will search for 0 in all directions (top, bottom, sides, diagonals and anti-diagonals) to depth 1. Put simply, it will search for zeros around the 1 in a sub-matrix of size 3x3. The function search_number() does the job of searching for the second digit.
I've implemented sequential code for it and I'm trying to convert it to MPI.
I'm a complete beginner with MPI and am practicing it for the first time.
Here is my attempt with MPI.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 255
#define BS N/2
MPI_Status status;
int search_number(int arr[N][N],int row,int col,int digit_2){
int count=0;
for (int i=row-1;i<=row+1;i++){ //from -row to +row = 3 indexes for rows
for(int j=col-1;j<=col+1;j++){ //from -col to +col = 3 indexes for cols
// skip for [row,col] and -1 for both [i,j] as well as till maximum size
if(i<0 || j<0 || i>=N || j>=N || i==row && j==col) continue;
if(arr[i][j] == digit_2){ //if second number is found, increase the counter
count++;
}
}
}
return count;
}
int main(int argc, char **argv)
{
int nproc,taskId,source,i,j,k,positionX,positionY;
int sum=0;
MPI_Datatype type;
int a[N][N];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskId);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Type_vector(N, BS, N, MPI_INT, &type);
MPI_Type_commit(&type);
//root
if (taskId == 0) {
srand( time(NULL) );
//Generate two NxN matrix
for (i=0; i<N; i++) {
for (j=0; j<N; j++) {
a[i][j]= rand()%10;
}
}
printf("Passing 1st chunk:\n");
// first chunk
MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
printf("Passing 2nd Chunk:\n");
//second chunk
MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
}
//workers
source = 0;
MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);
for(int i=0;i<N;i++){
for(int j=0;j<N;j++){
if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
sum+= search_number(a,i,j,0); // funtion will return the count of 0's shared with 1
}
}
}
//Send result to root
MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);
//root receives results
if(taskId == 0)
{
printf("Count: %d\n",sum);
// printMatrix(resultFinal);
}
MPI_Finalize();
}
The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line if I set N > 255 at the top, but it works for values from 0 to 255. Can you point out my mistake?
The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line
if I set N > 255 at the top, but it works for values from 0 to 255.
As @Gilles Gouaillardet already pointed out in the comments, and as explained in more detail in this answer:
MPI_Send() is allowed to block until a matching receive is posted (and
that generally happens when the message is "large") ... and the
required matching receive never gets posted.
A typical fix would be to issue a MPI_Irecv(...,src = 0,...) on rank 0
before the MPI_Send() (and MPI_Wait() after), or to handle 0 -> 0
communication with MPI_Sendrecv().
Besides that, your parallelization seems wrong, namely:
MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
you have sent the same workload to processes 0 and 1, and:
MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
processes 2 and 3 have the same issue.
You should try a stencil-like approach in which neighbouring processes share only their border rows. For instance, a possible distribution for a 10x10 matrix and 4 processes could be:
process 0 works with rows 0, 1 and 2;
process 1 works with rows 2, 3 and 4;
process 2 works with rows 4, 5 and 6;
process 3 works with rows 7, 8 and 9;
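One way such a row split with shared borders could be computed is sketched here. The helper name and the exact split are assumptions (it is not the distribution listed above): each rank owns a contiguous block of rows and additionally receives one neighbouring "halo" row on each side, so that search_number() can look at all 8 neighbours of a cell in an owned row. It assumes nprocs is not larger than the number of rows.

```c
/* Hypothetical helper: for the given rank, compute
   - the rows it owns:   [*own_first, *own_last]  (it counts 1s only here)
   - the rows it needs:  [*recv_first, *recv_last] (owned rows plus one
     halo row above and below, clamped to the matrix boundaries). */
static void row_range(int n_rows, int nprocs, int rank,
                      int *own_first, int *own_last,
                      int *recv_first, int *recv_last) {
    int base = n_rows / nprocs;   /* rows every rank gets at least */
    int extra = n_rows % nprocs;  /* first `extra` ranks get one more */
    *own_first = rank * base + (rank < extra ? rank : extra);
    *own_last = *own_first + base + (rank < extra ? 1 : 0) - 1;
    *recv_first = (*own_first > 0) ? *own_first - 1 : 0;
    *recv_last = (*own_last < n_rows - 1) ? *own_last + 1 : n_rows - 1;
}
```

For 10 rows and 4 processes this gives rank 1 the owned rows 3 to 5 and the received rows 2 to 6. Because every row is owned by exactly one rank, no 1 is counted twice when the per-rank counts are summed.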
Currently, to each process you send BS*N elements, however in:
MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);
you specify that you are expecting to receive N*N.
Moreover in:
for(int i=0;i<N;i++){
for(int j=0;j<N;j++){
if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
sum+= search_number(a,i,j,0); // funtion will return the count of 0's shared with 1
}
}
}
the processes are working on positions of the matrix a that they did not receive, which naturally should not be the case.
Finally instead of
//Send result to root
MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);
you should actually use a MPI_Reduce i.e.,
Reduces values on all processes to a single value
I'm trying to write a MPI program that calculates the sum of an array of integers.
For this purpose I used MPI_Scatter to send chunks of the array to the other processes then MPI_Gather to get the sum of each chunk by the root process(process 0).
The problem is one of the processes receives two elements but the other one receives random numbers. I'm running my code with 3 processes.
Here is what I have:
#include <stdio.h>
#include <mpi.h>
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
int number1[2]; //buffer for processes
int sub_sum = 0;
int sub_sums[2];
int sum;
int number[4];
if(world_rank == 0){
number[0]=1;
number[1]=3;
number[2]=5;
number[3]=9;
}
//All processes
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
if(world_rank!=0){
printf("I'm process %d , I received the array : ",world_rank);
for(int i=0 ; i<2 ; i++){
printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
}
printf("\n");
}
MPI_Gather(&sub_sum, 1, MPI_INT, &sub_sums, 1, MPI_INT, 0,MPI_COMM_WORLD);
if(world_rank == 0){
sum=0;
for(int i=0; i<2;i++){
sum+= sub_sums[i];
}
printf("\nthe sum of array is: %d\n",sum);
}
MPI_Finalize();
return 0;
}
The result:
I'm process 1 , I received the array : 5 9
I'm process 2 , I received the array : 1494772352 32767
the sum of array is: 14
It seems that you misunderstood how MPI works. Your code is hardcoded to work (correctly) with only two processes. However, you are running it with 3 processes, on the wrong assumption that during the MPI_Scatter call the root rank only sends data to the other processes. In fact, the root rank (i.e., rank = 0) also receives part of the data.
The problem is one of the processes receives two elements but the
other one receives random numbers.
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
So you have hardcoded the input as number = {1, 3, 5, 9} (only 4 elements). What happens during the MPI_Scatter call is that process 0 gets the first and second elements of the array number (i.e., {1, 3}), process 1 gets the other two elements (i.e., {5, 9}), and process 2 gets some random values, because the scatter reads past the end of the 4-element array. Consequently:
I'm process 2 , I received the array : 1494772352 32767
You get
the sum of array is: 14
because the array sub_sums holds the sum computed by process 0, which is zero since you excluded it from the computation, and the sum computed by process 1, which is 5 + 9 = 14. Hence, 0 + 14 = 14.
To fix this you need to remove if(world_rank!=0) from:
if(world_rank!=0){
printf("I'm process %d , I received the array : ",world_rank);
for(int i=0 ; i<2 ; i++){
printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
}
printf("\n");
}
and run your code with only 2 processes.
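The mismatch between the hardcoded buffer sizes and the number of processes can be caught with a simple check before the collective call. This is a sketch with a hypothetical helper name; it only encodes the rule that the root buffer must hold sendcount elements for every rank in the communicator, the root included.

```c
/* Hypothetical helper: returns 1 if a root buffer of root_len elements is
   large enough for a scatter (or gather) that moves `count` elements per
   rank across nprocs ranks, and 0 otherwise. */
static int collective_buffer_ok(int root_len, int count, int nprocs) {
    return (long long)count * nprocs <= root_len;
}
```

With number[4] and a count of 2, the check passes for 2 processes and fails for 3. The same rule applies to sub_sums[2] in the MPI_Gather call, which is also too small once a third process contributes a value.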
For the last step, instead of MPI_Gather you can use MPI_Reduce to perform the sum in parallel and collect the value directly on the root rank. Consequently, you would not need to perform the sum manually on the root rank.
A running example:
#include <stdio.h>
#include <mpi.h>
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
int number1[2];
int number[4];
if(world_rank == 0){
number[0]=1;
number[1]=3;
number[2]=5;
number[3]=9;
}
//All processes
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
printf("I'm process %d , I received the array : ",world_rank);
int sub_sum = 0;
for(int i=0 ; i<2 ; i++){
printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
}
printf("\n");
int sum = 0;
MPI_Reduce(&sub_sum, &sum, 1, MPI_INT, MPI_SUM,0,MPI_COMM_WORLD);
if(world_rank == 0)
printf("\nthe sum of array is: %d\n",sum);
MPI_Finalize();
return 0;
}
Input : {1,3,5,9} running with 2 processes
Output
I'm process 0 , I received the array : 1 3
I'm process 1 , I received the array : 5 9
the sum of array is: 18
If you really want only processes 1 and 2 to receive the data and perform the sum, I would suggest looking into the routines MPI_Send and MPI_Recv.
I am trying to learn to use MPI. Below is my simple program to test MPI scatter and gather. I don't understand how it works and why it produces the result
1 2 3 4 4 5 6 7 8 9 10 11
instead of expected
1 2 3 4 5 6 7 8 9 10 11 12
The documentation and all the examples I can find are too complicated/poorly worded for me to understand. I just want to scatter an array across 3 processes and add one to each value in each process. Alternatively I would be happy to see how a 2D array was sent row by row to each process and each row was processed simply.
int main(int argc, char **argv) {
int rank; // my process ID
int size = 3; // number of processes/nodes
MPI_Status status;
MPI_Init(&argc, &argv); // start MPI
MPI_Comm_size(MPI_COMM_WORLD, &size); // initialize MPI
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
unsigned char inData[12]; // data returned after being "processed"
unsigned char outData[12]; // buffer for receiving data
unsigned long datasize = 12; // size of data to process
unsigned char testData[12]; // data to be processed
if (rank == 0) {
// initialize data
for (int i = 0; i < datasize; i++) {
testData[i] = i;
outData[i] = 0;
inData[i] = 0;
}
}
// scatter the data to the processes
// I am not clear about the numbers sent in and out
MPI_Scatter(&testData, 12, MPI_UNSIGNED_CHAR, &outData,
12, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
// process data
for (int i = 0; i < 4; i++) { outData[i] = outData[i] + 1; }
MPI_Barrier(MPI_COMM_WORLD);
// gather processed data
MPI_Gather(&outData, 12, MPI_UNSIGNED_CHAR, &inData,
12, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
//print processed data from root
if (rank == 0) {
for (int i = 0; i < 12; i++) {
printf("\n%d", inData[i]);
}
MPI_Finalize();
}
return 0;
}
Though your main error is using 12 instead of 4, let's do it step-by-step.
// int size = 3; // number of processes/nodes
int size;
...
MPI_Comm_size(MPI_COMM_WORLD, &size); // initialize MPI
assert(size == 3);
There is no point in setting size to 3. This value will be overwritten by MPI_Comm_size with the actual number of processes. This number is determined by how you run your MPI application (e.g. mpirun -np 3).
//unsigned char outData[12]; // buffer for receiving data
unsigned char outData[4];
We have 12 elements and 3 processes, so 4 elements per process. So, 4 elements are enough for outData.
outData[i] = 0;
inData[i] = 0;
There is no point in zeroing these buffers, they will be overwritten.
// scatter the data to the processes
// I am not clear about the numbers sent in and out
MPI_Scatter(&testData, 4 /*12*/, MPI_UNSIGNED_CHAR, &outData,
4 /*12*/, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
We have 4 elements per process, so the count should be 4, not 12.
MPI_Barrier(MPI_COMM_WORLD);
You don't need barriers here.
MPI_Gather(&outData, 4 /*12*/, MPI_UNSIGNED_CHAR, &inData,
4 /*12*/, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
Same story, 4 instead of 12.
MPI_Finalize();
This should be called by all processes.
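To see why a count of 4 yields 1 to 12, it may help to model the data movement in plain C. The following is a sequential sketch with no MPI calls at all, under the assumption that scatter hands rank r the elements starting at r * count and gather puts them back at the same offset; the function name is hypothetical.

```c
#include <string.h>

/* Plain-C model (no MPI) of scatter -> add one -> gather.
   Each "rank" sees its chunk at local indices 0..count-1.
   Assumes count <= 16 so the chunk fits in the local buffer. */
static void model_scatter_gather(const unsigned char *send,
                                 unsigned char *result,
                                 int count, int nprocs) {
    unsigned char local[16];
    for (int rank = 0; rank < nprocs; rank++) {
        /* "scatter": this rank's chunk starts at rank * count */
        memcpy(local, send + rank * count, (size_t)count);
        /* process the local chunk */
        for (int i = 0; i < count; i++)
            local[i] = (unsigned char)(local[i] + 1);
        /* "gather": the chunk returns to the same offset on the root */
        memcpy(result + rank * count, local, (size_t)count);
    }
}
```

Called with the 12 values 0 to 11, a count of 4 and 3 ranks, the result holds 1 to 12, which is the output the question expected.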
I am new to MPI and I am trying to write a program that uses MPI_Scatter. I have 4 nodes (0, 1, 2, 3). Node 0 is the master, the others are slaves. The master asks the user for the number of elements of the array to send to the slaves. Then it creates an array of size number of elements * 4. Then every node prints its results.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
int main(int argc, char **argv) {
int id, nproc, len, numberE, i, sizeArray;
int *arrayN=NULL;
int arrayNlocal[sizeArray];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (id == MASTER){
printf("Enter number of elements: ");
scanf("%d", &numberE);
sizeArray = numberE * 4;
arrayN = malloc(numberE * sizeof(int));
for (i = 0; i < sizeArray; i++){
arrayN[i] = i + 1;
}
}
MPI_Scatter(arrayN, numberE, MPI_INT, &arrayNlocal, numberE,MPI_INT, MPI_COMM_WORLD);
printf("Node %d has: ", id);
for (i = 0; i < numberE; i++){
printf("%d ",arrayNlocal[i]);
}
MPI_Finalize();
return 0;
}
And this is the error I get:
BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
PID 9278 RUNNING AT 192.168.100.100
EXIT CODE: 139
CLEANING UP REMAINING PROCESSES
YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
In arrayNlocal[sizeArray];, sizeArray is not initialized. The best way to go is to broadcast numberE to every process and then allocate memory for arrayNlocal. Something like:
MPI_Bcast(&numberE, 1, MPI_INT, 0, MPI_COMM_WORLD);
arrayN is an array of size sizeArray = numberE * 4, so:
arrayN = malloc(sizeArray * sizeof(int));
MPI_Scatter() needs pointers to the data to be sent on root node, and a pointer to receive buffer on each process of the communicator. Since arrayNlocal is an array:
MPI_Scatter(arrayN, numberE, MPI_INT, arrayNlocal, numberE,MPI_INT,MASTER, MPI_COMM_WORLD);
or alternatively:
MPI_Scatter(arrayN, numberE, MPI_INT, &arrayNlocal[0], numberE,MPI_INT,MASTER, MPI_COMM_WORLD);
id is not initialized in id == MASTER: it must be rank==MASTER.
As is, the prints at the end might occur in a mixed way between processes.
Try to compile your code using mpicc main.c -o main -Wall to enable all warnings: it can save you a few hours in the near future!
So far, my application is reading in a txt file with a list of integers. These integers need to be stored in an array by the master process, i.e. the processor with rank 0. This is working fine.
Now, when I run the program I have an if statement checking whether it's the master process and if it is, I'm executing the MPI_Scatter command.
From what I understand this will subdivide the array with the numbers and pass it out to the slave processes i.e. all with rank > 0 . However, I'm not sure how to handle the MPI_Scatter. How does the slave process "subscribe" to get the sub-array? How can I tell the non-master processes to do something with the sub-array?
Can someone please provide a simple example to show me how the master process sends out elements from the array and then have the slaves add the sum and return this to the master, which adds all the sums together and prints it out?
My code so far:
#include <stdio.h>
#include <mpi.h>
//A pointer to the file to read in.
FILE *fr;
int main(int argc, char *argv[]) {
int rank,size,n,number_read;
char line[80];
int numbers[30];
int buffer[30];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
fr = fopen ("int_data.txt","rt"); //We open the file to be read.
if(rank ==0){
printf("my rank = %d\n",rank);
//Reads in the flat file of integers and stores it in the array 'numbers' of type int.
n=0;
while(fgets(line,80,fr) != NULL) {
sscanf(line, "%d", &number_read);
numbers[n] = number_read;
printf("I am processor no. %d --> At element %d we have number: %d\n",rank,n,numbers[n]);
n++;
}
fclose(fr);
MPI_Scatter(&numbers,2,MPI_INT,&buffer,2,MPI_INT,rank,MPI_COMM_WORLD);
}
else {
MPI_Gather ( &buffer, 2, MPI_INT, &numbers, 2, MPI_INT, 0, MPI_COMM_WORLD);
printf("%d",buffer[0]);
}
MPI_Finalize();
return 0;
}
This is a common misunderstanding of how operations work in MPI with people new to it; particularly with collective operations, where people try to start using broadcast (MPI_Bcast) just from rank 0, expecting the call to somehow "push" the data to the other processors. But that's not really how MPI routines work; most MPI communication requires both the sender and the receiver to make MPI calls.
In particular, MPI_Scatter() and MPI_Gather() (and MPI_Bcast, and many others) are collective operations; they have to be called by all of the tasks in the communicator. All processors in the communicator make the same call, and the operation is performed. (That's why scatter and gather both require as one of the parameters the "root" process, where all the data goes to / comes from). By doing it this way, the MPI implementation has a lot of scope to optimize the communication patterns.
So here's a simple example (Updated to include gather):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
int size, rank;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int *globaldata=NULL;
int localdata;
if (rank == 0) {
globaldata = malloc(size * sizeof(int) );
for (int i=0; i<size; i++)
globaldata[i] = 2*i+1;
printf("Processor %d has data: ", rank);
for (int i=0; i<size; i++)
printf("%d ", globaldata[i]);
printf("\n");
}
MPI_Scatter(globaldata, 1, MPI_INT, &localdata, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf("Processor %d has data %d\n", rank, localdata);
localdata *= 2;
printf("Processor %d doubling the data, now has %d\n", rank, localdata);
MPI_Gather(&localdata, 1, MPI_INT, globaldata, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (rank == 0) {
printf("Processor %d has data: ", rank);
for (int i=0; i<size; i++)
printf("%d ", globaldata[i]);
printf("\n");
}
if (rank == 0)
free(globaldata);
MPI_Finalize();
return 0;
}
Running it gives:
gpc-f103n084-$ mpicc -o scatter-gather scatter-gather.c -std=c99
gpc-f103n084-$ mpirun -np 4 ./scatter-gather
Processor 0 has data: 1 3 5 7
Processor 0 has data 1
Processor 0 doubling the data, now has 2
Processor 3 has data 7
Processor 3 doubling the data, now has 14
Processor 2 has data 5
Processor 2 doubling the data, now has 10
Processor 1 has data 3
Processor 1 doubling the data, now has 6
Processor 0 has data: 2 6 10 14