if I have this code:
int main(void) {
int result=0;
int num[6] = {1, 2, 4, 3, 7, 1};
if (my_rank != 0) {
MPI_Reduce(num, &result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
} else {
MPI_Reduce(num, &result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
printf("result = %d\n", result);
}
}
the printed result is 1.
But if num[0] = 9, then the result is 9.
I read that to solve this problem I must define the variable num as an array.
I can't understand how the function MPI_Reduce works with MPI_MIN. Why, if num[0] is not equal to the smallest number, must I define the variable num as an array?
MPI_Reduce performs a reduction over the members of the communicator - not the members of the local array. sendbuf and recvbuf must both be of the same size.
I think the standard says it best:
Thus, all processes provide input buffers and output buffers of the same length, with elements of the same type. Each process can provide one element, or a sequence of elements, in which case the combine operation is executed element-wise on each entry of the sequence.
MPI does not take the minimum of all elements in the array; you have to do that yourself.
You can use MPI_MIN to obtain the min value among those passed via reduction.
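For example, here is a minimal sketch (my own illustration, not your exact code): the receive buffer is a 6-element array, so the reduction is done element-wise across the ranks, and the minimum over that array is then taken locally on the root.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int my_rank;
    int num[6] = {1, 2, 4, 3, 7, 1};
    int result[6];                      /* recvbuf must match sendbuf: 6 ints */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* element-wise across ranks: result[i] = min over all ranks of num[i] */
    MPI_Reduce(num, result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        int min = result[0];
        for (int i = 1; i < 6; ++i)     /* the minimum over the array is a local loop */
            if (result[i] < min)
                min = result[i];
        printf("min = %d\n", min);      /* prints 1 here */
    }

    MPI_Finalize();
    return 0;
}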
Let's examine the function declaration:
int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype,
               MPI_Op op, int root, MPI_Comm comm)
Each process sends its value (or array of values) via the buffer sendbuf.
The process identified by the root id receives those buffers and combines them, element-wise with op, into the buffer recvbuf. The number of elements contributed by each process is specified by count, so recvbuf must be allocated to hold sizeof(datatype)*count bytes.
If each process has only one integer to send (count = 1), then recvbuf is also a single integer; if each process has two integers, then recvbuf is an array of two integers. See this nice post for further explanations and nice pictures.
Now it should be clear that your code is wrong: sendbuf and recvbuf must be of the same size, and there is no need for the rank condition at all. recvbuf is only significant at the root process, while every process (the root included) provides sendbuf.
In your example you could assign one or more elements of the array to each process and then compute the minimum value (if there are as many processes as values in the array) or an array of minimum values (if there are more values than processes); a sketch of the one-value-per-process variant follows below.
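For instance, a minimal sketch of the one-value-per-process variant (a fragment, not your code: it assumes MPI is already initialized, my_rank holds the rank, and the program is launched with exactly 6 processes so that each rank owns one element):

int values[6] = {1, 2, 4, 3, 7, 1};  /* same data on every rank, for illustration only */
int myvalue = values[my_rank];       /* each rank contributes a single element */
int minvalue;

/* count = 1: one value per process, reduced with MPI_MIN onto rank 0 */
MPI_Reduce(&myvalue, &minvalue, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

if (my_rank == 0)
    printf("min = %d\n", minvalue);  /* prints 1 */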
Here is a working example that illustrates the usage of MPI_MIN, MPI_MAX and MPI_SUM (slightly modified from this), for simple values rather than arrays.
Each process does some work, depending on its rank, and sends the time spent doing that work to the root process. The root process collects the times and prints their min, max and average values.
#include <stdio.h>
#include <mpi.h>
int myrank, numprocs;
/* just a function to waste some time */
float work()
{
float x, y = 0;   /* y must be initialized before accumulating into it */
if (myrank%2) {
for (int i = 0; i < 100000000; ++i) {
x = i/0.001;
y += x;
}
} else {
for (int i = 0; i < 100000; ++i) {
x = i/0.001;
y += x;
}
}
return y;
}
int main(int argc, char **argv)
{
int node;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &node);
printf("Hello World from Node %d\n",node);
/*variables used for gathering timing statistics*/
double mytime, maxtime, mintime, avgtime;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Barrier(MPI_COMM_WORLD); /*synchronize all processes*/
mytime = MPI_Wtime(); /*get time just before work section */
work();
mytime = MPI_Wtime() - mytime; /*get time just after work section*/
/*compute max, min, and average timing statistics*/
MPI_Reduce(&mytime, &maxtime, 1, MPI_DOUBLE,MPI_MAX, 0, MPI_COMM_WORLD);
MPI_Reduce(&mytime, &mintime, 1, MPI_DOUBLE, MPI_MIN, 0,MPI_COMM_WORLD);
MPI_Reduce(&mytime, &avgtime, 1, MPI_DOUBLE, MPI_SUM, 0,MPI_COMM_WORLD);
/* print the output */
if (myrank == 0) {
avgtime /= numprocs;
printf("Min: %lf Max: %lf Avg: %lf\n", mintime, maxtime,avgtime);
}
MPI_Finalize();
return 0;
}
If I run this on my OSX laptop, this is what I get:
urcaurca$ mpirun -n 4 ./a.out
Hello World from Node 3
Hello World from Node 0
Hello World from Node 2
Hello World from Node 1
Min: 0.000974 Max: 0.985291 Avg: 0.493081
Related
I'm developing an application in C in which the user wants to find a certain pattern of 2-digit numbers in a 2-dimensional array.
For example, there is a 10x10 array of random single-digit numbers and the user wants to find 1,0. Our program will search for 1 and, when it is found, search for 0 in all directions (top, bottom, sides, diagonals and anti-diagonals) to depth 1. Put simply, it looks for a zero around the 1 within a 3x3 sub-matrix. The function search_number() performs the search for the second digit.
I've implemented the sequential code for it and I'm trying to convert it to MPI.
I'm a super noob with MPI and this is my first time practicing it.
Here is my attempt with MPI.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 255
#define BS N/2
MPI_Status status;
int search_number(int arr[N][N],int row,int col,int digit_2){
int count=0;
for (int i=row-1;i<=row+1;i++){ //from -row to +row = 3 indexes for rows
for(int j=col-1;j<=col+1;j++){ //from -col to +col = 3 indexes for cols
// skip for [row,col] and -1 for both [i,j] as well as till maximum size
if(i<0 || j<0 || i>=N || j>=N || i==row && j==col) continue;
if(arr[i][j] == digit_2){ //if second number is found, increase the counter
count++;
}
}
}
return count;
}
int main(int argc, char **argv)
{
int nproc,taskId,source,i,j,k,positionX,positionY;
int sum=0;
MPI_Datatype type;
int a[N][N];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskId);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Type_vector(N, BS, N, MPI_INT, &type);
MPI_Type_commit(&type);
//root
if (taskId == 0) {
srand( time(NULL) );
//Generate two NxN matrix
for (i=0; i<N; i++) {
for (j=0; j<N; j++) {
a[i][j]= rand()%10;
}
}
printf("Passing 1st chunk:\n");
// first chunk
MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
printf("Passing 2nd Chunk:\n");
//second chunk
MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
}
//workers
source = 0;
MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);
for(int i=0;i<N;i++){
for(int j=0;j<N;j++){
if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
sum+= search_number(a,i,j,0); // funtion will return the count of 0's shared with 1
}
}
}
//Send result to root
MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);
//root receives results
if(taskId == 0)
{
printf("Count: %d\n",sum);
// printMatrix(resultFinal);
}
MPI_Finalize();
}
The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line if I set N > 255 at the top, but it works for N up to 255. Can you point out my mistake?
The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line if I set N > 255 at the top, but it works for N up to 255.
As Gilles Gouaillardet already pointed out in the comments, and in more detail in this answer:
MPI_Send() is allowed to block until a matching receive is posted (and
that generally happens when the message is "large") ... and the
required matching receive never gets posted.
A typical fix would be to issue a MPI_Irecv(...,src = 0,...) on rank 0
before the MPI_Send() (and MPI_Wait() after), or to handle 0 -> 0
communication with MPI_Sendrecv().
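For illustration, here is a sketch of that fix (my own fragment, not a drop-in replacement: it receives each rank's portion into a hypothetical local array chunk, so the receive does not alias the send buffer on rank 0, and it keeps your tags and chunk sizes; for large N, chunk should be heap-allocated):

int chunk[BS][N];                    /* each rank's portion of the matrix */
MPI_Request req = MPI_REQUEST_NULL;

if (taskId == 0) {
    /* post the receive for rank 0's own chunk BEFORE sending anything */
    MPI_Irecv(&chunk[0][0], BS*N, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);

    MPI_Send(&a[0][0],  BS*N, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Send(&a[0][0],  BS*N, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send(&a[BS][0], BS*N, MPI_INT, 2, 2, MPI_COMM_WORLD);
    MPI_Send(&a[BS][0], BS*N, MPI_INT, 3, 3, MPI_COMM_WORLD);

    MPI_Wait(&req, MPI_STATUS_IGNORE);
} else {
    MPI_Recv(&chunk[0][0], BS*N, MPI_INT, 0, taskId, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}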
Besides that, your parallelization seems wrong. Namely, with:
MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
you send the same workload to processes 0 and 1, and with:
MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
you send the same workload to processes 2 and 3 as well.
You should try to use a stencil-like approach, where each process only shares border rows with its neighbours. For instance, a possible distribution for a 10x10 matrix and 4 processes (see the sketch after this list) could be:
process 0 works with rows 0, 1 and 2;
process 1 works with rows 2, 3 and 4;
process 2 works with rows 4, 5 and 6;
process 3 works with rows 6, 7, 8 and 9;
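A sketch of how each rank could derive its own row range, including the shared border rows (my own fragment; it assumes the number of rows N is divisible by the number of processes nproc):

/* each rank owns rows [first, last) and additionally reads one border row on each side */
int rows_per_rank = N / nproc;                      /* assumes N % nproc == 0 */
int first = taskId * rows_per_rank;
int last  = first + rows_per_rank;

int halo_first = (first > 0) ? first - 1 : first;   /* border shared with the rank above */
int halo_last  = (last  < N) ? last  + 1 : last;    /* border shared with the rank below */

/* the rank then searches only rows [halo_first, halo_last) of its part of the matrix */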
Currently, you send BS*N elements to each process; however, in:
MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);
you specify that you expect to receive N*N elements.
Moreover in:
for(int i=0;i<N;i++){
for(int j=0;j<N;j++){
if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
sum+= search_number(a,i,j,0); // funtion will return the count of 0's shared with 1
}
}
}
the processes are working with positions of the matrix a that they did not receive; naturally, that should not be the case.
Finally, instead of:
//Send result to root
MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);
you should actually use MPI_Reduce, i.e.,
Reduces values on all processes to a single value
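For instance (a sketch; total is a hypothetical variable holding the global count on rank 0):

int total = 0;
/* sums every rank's local count and delivers the result to rank 0 */
MPI_Reduce(&sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (taskId == 0)
    printf("Count: %d\n", total);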
I'm trying to write an MPI program that calculates the sum of an array of integers.
For this purpose I used MPI_Scatter to send chunks of the array to the other processes, then MPI_Gather to collect the sum of each chunk at the root process (process 0).
The problem is that one of the processes receives two elements but the other one receives random numbers. I'm running my code with 3 processes.
Here is what I have:
#include <stdio.h>
#include <mpi.h>
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
int number1[2]; //buffer for processes
int sub_sum = 0;
int sub_sums[2];
int sum;
int number[4];
if(world_rank == 0){
number[0]=1;
number[1]=3;
number[2]=5;
number[3]=9;
}
//All processes
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
if(world_rank!=0){
printf("I'm process %d , I received the array : ",world_rank);
for(int i=0 ; i<2 ; i++){
printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
}
printf("\n");
}
MPI_Gather(&sub_sum, 1, MPI_INT, &sub_sums, 1, MPI_INT, 0,MPI_COMM_WORLD);
if(world_rank == 0){
sum=0;
for(int i=0; i<2;i++){
sum+= sub_sums[i];
}
printf("\nthe sum of array is: %d\n",sum);
}
MPI_Finalize();
return 0;
}
The result:
I'm process 1 , I received the array : 5 9
I'm process 2 , I received the array : 1494772352 32767
the sum of array is: 14
It seems that you misunderstood how MPI works; your code is hardcoded to work (correctly) with only two processes. However, you are trying to run the code with 3 processes, under the wrong assumption that during the MPI_Scatter call the root rank only sends data to the other processes. If you look at the following image (taken from source):
you notice that the root rank (i.e., rank = 0) also receives part of the data.
The problem is that one of the processes receives two elements but the other one receives random numbers.
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
So you have hardcoded the input number = {1, 3, 5, 9} (with only 4 elements); what happens during the MPI_Scatter call is that process 0 gets the first and second elements of the array number (i.e., {1, 3}), process 1 gets the other two elements (i.e., {5, 9}), and process 2 gets some random values. Consequently:
I'm process 2 , I received the array : 1494772352 32767
You get
the sum of array is: 14
because the array sub_sums contains the sum computed by process 0, which is zero since you excluded it from the summation loop, and the sum computed by process 1, which is 5 + 9 = 14. Hence, 0 + 14 = 14.
To fix this you need to remove if(world_rank!=0) from:
if(world_rank!=0){
printf("I'm process %d , I received the array : ",world_rank);
for(int i=0 ; i<2 ; i++){
printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
}
printf("\n");
}
and run your code with only 2 processes.
For the last step, instead of MPI_Gather you can use MPI_Reduce to perform the sum in parallel and collect the value directly on the root rank. Consequently, you would not need to perform the sum manually on the root rank.
A running example:
#include <stdio.h>
#include <mpi.h>
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
int number1[2];
int number[4];
if(world_rank == 0){
number[0]=1;
number[1]=3;
number[2]=5;
number[3]=9;
}
//All processes
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
printf("I'm process %d , I received the array : ",world_rank);
int sub_sum = 0;
for(int i=0 ; i<2 ; i++){
printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
}
printf("\n");
int sum = 0;
MPI_Reduce(&sub_sum, &sum, 1, MPI_INT, MPI_SUM,0,MPI_COMM_WORLD);
if(world_rank == 0)
printf("\nthe sum of array is: %d\n",sum);
MPI_Finalize();
return 0;
}
Input : {1,3,5,9} running with 2 processes
Output
I'm process 0 , I received the array : 1 3
I'm process 1 , I received the array : 5 9
the sum of array is: 18
If you really want only processes 1 and 2 to receive the data and perform the sums, I would suggest looking into the routines MPI_Send and MPI_Recv; a minimal sketch follows.
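A sketch of that point-to-point variant (my own fragment; it assumes MPI is initialized, world_rank is set, and the run uses exactly 3 processes, with rank 0 only distributing and collecting):

int number[4] = {1, 3, 5, 9};
int half[2];
int sub_sum, partial, sum = 0;

if (world_rank == 0) {
    MPI_Send(&number[0], 2, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* first half to rank 1 */
    MPI_Send(&number[2], 2, MPI_INT, 2, 0, MPI_COMM_WORLD);   /* second half to rank 2 */

    MPI_Recv(&partial, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    sum += partial;
    MPI_Recv(&partial, 1, MPI_INT, 2, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    sum += partial;
    printf("the sum of array is: %d\n", sum);
} else if (world_rank == 1 || world_rank == 2) {
    MPI_Recv(half, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    sub_sum = half[0] + half[1];
    MPI_Send(&sub_sum, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
}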
I have a very simple MPI program to test the behavior of MPI_Reduce. My objectives are simple:
Start by having each process create a random number (range 1-100)
Then run the program with mpirun -np 5 <program_name_here>
Have process 0, find the sum of all 5 numbers
Have process 1, find the product of all 5 numbers
Have process 2, find the max of all 5 numbers
Have process 3, find the min of all 5 numbers
Have process 4, find the bitwise and of all 5 numbers
And here's my program:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <time.h>
int sum = 0;
int product = 0;
int max = 0;
int min = 0;
int bitwiseAnd = 0;
int main ( int argc, char **argv )
{
int my_id, num_procs;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
int num;
srand(time(NULL) * my_id);
num = rand() % 100; //give num the random number
printf("Process #%i: Here is num: %i\n",my_id,num);
if(my_id == 0){
printf("Okay it entered 0\n");
MPI_Reduce(&num, &sum,1,MPI_INT,MPI_SUM, 0, MPI_COMM_WORLD);
}else if(my_id == 1){
printf("Okay it entered 1\n");
MPI_Reduce(&num, &product,1,MPI_INT,MPI_PROD, 0, MPI_COMM_WORLD);
}else if(my_id == 2){
printf("Okay it entered 2\n");
MPI_Reduce(&num, &max,1,MPI_INT,MPI_MAX, 0, MPI_COMM_WORLD);
}else if(my_id == 3){
printf("Okay it entered 3\n");
MPI_Reduce(&num, &min,1,MPI_INT,MPI_MIN, 0, MPI_COMM_WORLD);
}else if(my_id == 4){
printf("Okay it entered 4\n");
MPI_Reduce(&num, &bitwiseAnd,1,MPI_INT,MPI_BAND, 0, MPI_COMM_WORLD);
}
MPI_Barrier(MPI_COMM_WORLD);
if(my_id == 0){
printf("I am process %i and the sum is %i\n",my_id,sum);
printf("I am process %i and the product is %i\n",my_id,product);
printf("I am process %i and the max is %i\n",my_id,max);
printf("I am process %i and the min is %i\n",my_id,min);
printf("I am process %i and the bitwiseAdd is %i\n",my_id,bitwiseAnd);
}
MPI_Finalize();
}
This produces output like this:
[blah#blah example]$ mpirun -np 5 all
Process #2: Here is num: 21
Okay it entered 2
Process #4: Here is num: 52
Okay it entered 4
Process #0: Here is num: 83
Okay it entered 0
Process #1: Here is num: 60
Okay it entered 1
Process #3: Here is num: 66
Okay it entered 3
I am process 0 and the sum is 282
I am process 0 and the product is 0
I am process 0 and the max is 0
I am process 0 and the min is 0
I am process 0 and the bitwiseAdd is 0
[blah#blah example]$
Why doesn't process 0 pick up the MPI_Reduce results from the other processes?
I figured out what's wrong with your program by experimentation, and based on that, I have a hypothesis as to why it's wrong.
This modified version of your program does what you expected it to do:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
int main (int argc, char **argv)
{
int my_id;
int num_procs;
int num;
int sum = 0;
int product = 0;
int max = 0;
int min = 0;
int bitwiseAnd = 0;
int seed = time(0);
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
srand(seed * my_id);
num = rand() % 100;
printf("Process #%i: Here is num: %i\n",my_id,num);
MPI_Reduce(&num, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &product, 1, MPI_INT, MPI_PROD, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &min, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &bitwiseAnd, 1, MPI_INT, MPI_BAND, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
if (my_id == 0) {
printf("The sum is %i\n", sum);
printf("The product is %i\n", product);
printf("The max is %i\n", max);
printf("The min is %i\n", min);
printf("The bitwiseAnd is %i\n", bitwiseAnd);
}
MPI_Finalize();
return 0;
}
Many of the changes I made are just cosmetic. The change that makes the difference is that all processes must execute all of the MPI_Reduce calls in order for all of the results to be computed.
Now, why does that matter? I must emphasize that this is a hypothesis. I do not know. But an explanation that fits the available facts is: in both my and your implementation of MPI, the actual computation in an MPI_Reduce call happens only on the root process, but all the other processes must also call MPI_Reduce in order to send a message with their values. That message doesn't depend on the operation argument. So the MPI_SUM call did what it was supposed to do by accident, because the other calls to MPI_Reduce provided the values it needed. But none of the other calls did any computation at all.
If my hypothesis is correct, you're going to need to structure your program quite a bit differently if you want each computation carried out in a different process. Abstractly, you want an all-to-all broadcast so that all processes have all the numbers, then local computation of the sum, product, etc., and then an all-to-one send of the values back to the root. If I'm reading http://mpitutorial.com/tutorials/mpi-scatter-gather-and-allgather/#mpi_allgather-and-modification-of-average-program correctly, MPI_Allgather is the name of the function that does all-to-all broadcasts.
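For instance, a sketch of the all-gather structure (my own fragment; it reuses num, my_id and num_procs from the program above and assumes a run with 5 processes):

int all_nums[5];   /* one slot per rank; assumes the run uses -np 5 */

/* all-to-all broadcast: afterwards every rank holds every rank's num */
MPI_Allgather(&num, 1, MPI_INT, all_nums, 1, MPI_INT, MPI_COMM_WORLD);

/* each rank can now do "its" computation locally, e.g. rank 0 the sum */
if (my_id == 0) {
    int s = 0;
    for (int i = 0; i < num_procs; ++i)
        s += all_nums[i];
    printf("I am process %i and the sum is %i\n", my_id, s);
}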
The answer from zwol is basically correct, but I would like to revisit his hypothesis:
MPI_Reduce is a collective operation; it has to be called by all members of the communicator argument. In the case of MPI_COMM_WORLD this means all initial ranks in the application.
The MPI standard (5.9.1) is also helpful here:
The routine is called by all group members using the same arguments
for count, datatype, op, root and comm. Thus, all processes provide
input buffers of the same length [...]
It is important to understand that the root is not the one doing all the computation. The operation is done in a distributed fashion, usually using a tree algorithm. This means only a logarithmic number of time steps has to be performed, which is much more efficient than collecting all the data at the root and doing the operation there, especially for a large number of ranks.
So if you want the result at rank 0, you indeed have to run the code unconditionally like this:
MPI_Reduce(&num, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &product, 1, MPI_INT, MPI_PROD, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &min, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &bitwiseAnd, 1, MPI_INT, MPI_BAND, 0, MPI_COMM_WORLD);
If you need the result at different ranks, you can change the root parameter accordingly. If you want the result to be available at all ranks, use MPI_Allreduce instead.
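For example, a sketch with MPI_Allreduce (same arguments as MPI_Reduce minus the root; afterwards every rank holds the reduced value and could print it):

/* like MPI_Reduce with MPI_SUM, but the result lands on every rank */
MPI_Allreduce(&num, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
printf("I am process %i and the sum is %i\n", my_id, sum);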
I am new to MPI and I am trying to write a program that uses MPI_Scatter. I have 4 nodes (0, 1, 2, 3). Node 0 is the master, the others are slaves. The master asks the user for the number of elements of the array to send to the slaves. Then it creates an array of size (number of elements * 4). Then every node prints its results.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
int main(int argc, char **argv) {
int id, nproc, len, numberE, i, sizeArray;
int *arrayN=NULL;
int arrayNlocal[sizeArray];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (id == MASTER){
printf("Enter number of elements: ");
scanf("%d", &numberE);
sizeArray = numberE * 4;
arrayN = malloc(numberE * sizeof(int));
for (i = 0; i < sizeArray; i++){
arrayN[i] = i + 1;
}
}
MPI_Scatter(arrayN, numberE, MPI_INT, &arrayNlocal, numberE,MPI_INT, MPI_COMM_WORLD);
printf("Node %d has: ", id);
for (i = 0; i < numberE; i++){
printf("%d ",arrayNlocal[i]);
}
MPI_Finalize();
return 0;
}
And this is the error I get:
BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
PID 9278 RUNNING AT 192.168.100.100
EXIT CODE: 139
CLEANING UP REMAINING PROCESSES
YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
In int arrayNlocal[sizeArray];, sizeArray is not initialized. The best way to go is to broadcast numberE to every process and then allocate memory for arrayNlocal. Something like:
MPI_Bcast(&numberE, 1, MPI_INT, 0, MPI_COMM_WORLD);
arrayN is an array of size sizeArray = numberE * 4, so:
arrayN = malloc(sizeArray * sizeof(int));
MPI_Scatter() needs a pointer to the data to be sent on the root node and a pointer to the receive buffer on each process of the communicator. Since arrayNlocal is an array:
MPI_Scatter(arrayN, numberE, MPI_INT, arrayNlocal, numberE,MPI_INT,MASTER, MPI_COMM_WORLD);
or alternatively:
MPI_Scatter(arrayN, numberE, MPI_INT, &arrayNlocal[0], numberE,MPI_INT,MASTER, MPI_COMM_WORLD);
id is not initialized in id == MASTER: it must be rank==MASTER.
As is, the prints at the end may appear interleaved across processes.
Try to compile your code using mpicc main.c -o main -Wall to enable all warnings: it can save you a few hours in the near future!
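Putting those pieces together, a minimal sketch of the corrected distribution part (my own fragment for the body of main, not a complete fix: it broadcasts numberE first, then allocates both buffers and scatters numberE elements to each of the nproc ranks):

int id, nproc, numberE = 0, sizeArray, i;
int *arrayN = NULL;
int *arrayNlocal = NULL;

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nproc);
MPI_Comm_rank(MPI_COMM_WORLD, &id);

if (id == MASTER) {
    printf("Enter number of elements: ");
    scanf("%d", &numberE);
}

/* every rank needs numberE before it can allocate its receive buffer */
MPI_Bcast(&numberE, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
arrayNlocal = malloc(numberE * sizeof(int));

if (id == MASTER) {
    sizeArray = numberE * nproc;            /* one chunk of numberE ints per rank */
    arrayN = malloc(sizeArray * sizeof(int));
    for (i = 0; i < sizeArray; i++)
        arrayN[i] = i + 1;
}

MPI_Scatter(arrayN, numberE, MPI_INT, arrayNlocal, numberE, MPI_INT,
            MASTER, MPI_COMM_WORLD);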
I'm a beginner in MPI programming. I'm trying to write a program that dynamically takes in one-dimensional arrays of different sizes (multiples of 100, 1000, 10000, 1000000 and so on) and scatters them to the allotted processor cores. The processor cores calculate the sum of the received elements and send the sum back. The root process prints the sum of the elements in the input array.
I used MPI_Scatter() and MPI_Reduce() to solve the problem. However, when the number of processor cores allotted is odd, some of the data gets left out. For example, with an input data size of 100 and 3 processes, only 99 elements are added and the last one is left out.
I searched for alternatives and found that MPI_Scatterv() can be used for uneven distribution of data. But there is no material available to guide me through its implementation. Can someone help me? I'm posting my code here. Thanks in advance.
#include <stdio.h>
#include <mpi.h>
#include <stdlib.h>
void readArray(char * fileName, double ** a, int * n);
int Numprocs, MyRank;
int mpi_err;
#define Root = 0
void init_it(int *argc, char ***argv) {
mpi_err = MPI_Init(argc, argv);
mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);
mpi_err = MPI_Comm_size(MPI_COMM_WORLD, &Numprocs);
}
int main(int argc, char** argv) {
/* .......Variables Initialisation ......*/
int index;
double *InputBuffer, *RecvBuffer, sum=0.0, psum = 0.0;
double ptime = 0.0, Totaltime= 0.0,startwtime = 0.0, endwtime = 0.0;
int Scatter_DataSize;
int DataSize;
FILE *fp;
init_it(&argc,&argv);
if (argc != 2) {
fprintf(stderr, "\n*** Usage: arraySum <inputFile>\n\n");
exit(1);
}
if (MyRank == 0) {
startwtime = MPI_Wtime();
printf("Number of nodes running %d\n",Numprocs);
/*...... Read input....*/
readArray(argv[1], &InputBuffer, &DataSize);
printf("Size of array %d\n", DataSize);
}
if (MyRank!=0) {
MPI_Recv(&DataSize, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, NULL);
}
else {
int i;
for (i=1;i<Numprocs;i++) {
MPI_Send(&DataSize, 1, MPI_INT, i, 1, MPI_COMM_WORLD);
d[i]= i*Numprocs;
}
}
Scatter_DataSize = DataSize / Numprocs;
RecvBuffer = (double *)malloc(Scatter_DataSize * sizeof(double));
MPI_Barrier(MPI_COMM_WORLD);
mpi_err = MPI_Scatter(InputBuffer, Scatter_DataSize, MPI_DOUBLE,
RecvBuffer, Scatter_DataSize, MPI_DOUBLE,
0, MPI_COMM_WORLD);
for (index = 0; index < Scatter_DataSize; index++) {
psum = psum + RecvBuffer[index];
}
//printf("Processor %d computed sum %f\n", MyRank, psum);
mpi_err = MPI_Reduce(&psum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (MyRank == 0) {
endwtime = MPI_Wtime();
Totaltime = endwtime - startwtime;
printf("Total sum %f\n",sum);
printf("Total time %f\n", Totaltime);
}
MPI_Finalize();
return 0;
}
void readArray(char * fileName, double ** a, int * n) {
int count, DataSize;
double * InputBuffer;
FILE * fin;
fin = fopen(fileName, "r");
if (fin == NULL) {
fprintf(stderr, "\n*** Unable to open input file '%s'\n\n",
fileName);
exit(1);
}
fscanf(fin, "%d\n", &DataSize);
InputBuffer = (double *)malloc(DataSize * sizeof(double));
if (InputBuffer == NULL) {
fprintf(stderr, "\n*** Unable to allocate %d-length array", DataSize);
exit(1);
}
for (count = 0; count < DataSize; count++) {
fscanf(fin, "%lf", &InputBuffer[count]);
}
fclose(fin);
*n = DataSize;
*a = InputBuffer;
}
In your case, you may just play with the sendcount[] array of MPI_Scatterv. Indeed, a trivial implementation would be to compute the number of elements (say, Nelement) of type sendtype that all the processes but one will receive. One of the processes (for instance the last one) will get the remaining data. In that case, sendcount[i] = Nelement for indexes i from 0 to p-2 (p being the number of processes in the communicator, for you MPI_COMM_WORLD). Then process p-1 will get sendcount[p-1] = DataSize - Nelement*(p-1). Concerning the array of displacements displs[], you just have to specify the displacement (in number of elements) from which to take the outgoing data for process i (cf. [1], page 161). For the previous example this would be:
for (i=0; i<p; ++i)
displs[i]=Nelement*i;
If you decide that another process q must handle the remaining data instead, remember to set the proper displacement displs[q+1] for process q+1, with 0 ≤ q < q+1 ≤ p.
[1] MPI: A Message-Passing Interface Standard (Version 3.1): http://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
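As an illustration, here is a sketch of that setup (my own fragment; it assumes DataSize has already been made known to every rank, e.g. via an MPI_Bcast, and uses p for Numprocs and MyRank for the local rank; InputBuffer is only significant at the root):

int *sendcounts = malloc(p * sizeof(int));
int *displs     = malloc(p * sizeof(int));
int Nelement    = DataSize / p;          /* what every rank but the last receives */

for (int i = 0; i < p; ++i) {
    sendcounts[i] = Nelement;
    displs[i]     = Nelement * i;
}
sendcounts[p - 1] = DataSize - Nelement * (p - 1);   /* the last rank takes the remainder */

RecvBuffer = malloc(sendcounts[MyRank] * sizeof(double));
MPI_Scatterv(InputBuffer, sendcounts, displs, MPI_DOUBLE,
             RecvBuffer, sendcounts[MyRank], MPI_DOUBLE,
             0, MPI_COMM_WORLD);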
The computation of Scatter_DataSize:
Scatter_DataSize = DataSize / Numprocs;
is correct only if DataSize is a multiple of Numprocs, which in your case (since DataSize is always even) happens when Numprocs is even. When Numprocs is odd, you should explicitly compute the remainder and assign it to one MPI process; I suggest the last one.