MPI with C slower if more processes are used

I am learning MPI with C and I wrote some code based on the one presented in this link:
In this code, a vector containing 1e8 values is summed. However, I observe that the run time grows when I use more processes. The code is given below:
/*
Based on the code presented at
Code which splits a vector and sends the pieces to other processes.
If the main vector does not split equally among all processes, the leftover is passed to process id 1.
Process id 0 is the root process. Therefore it does not count as a worker when passing information.
Each process calculates the partial sum of its vector values and sends it back to the root process, which calculates the total sum.
Since the processes are independent, the printing order will be different at each run.

compile as: mpicc -o vector_sum vector_send.c -lm
run as: time mpirun -n x vector_sum
x = number of splits desired + root process. For example: if x = 3, the vector will be split in two.
*/
#include <stdio.h>
#include <math.h>
#include <mpi.h>

#define vec_len 100000000
double vec1[vec_len];
double vec2[vec_len];

int main(int argc, char* argv[]){
    // defining program variables
    int i;
    double sum, partial_sum;

    // defining parallel step variables
    int my_id, num_proc, ierr, an_id, root_process; // id of process and total number of processes
    int num_2_send, num_2_recv, start_point, vec_size, rows_per_proc, leftover;

    ierr = MPI_Init(&argc, &argv);
    root_process = 0;
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    if(my_id == root_process){
        // Root process: define vector size, how to split the vector, and send the pieces to the workers
        vec_size = 1e8; // size of main vector
        for(i = 0; i < vec_size; i++){
            //vec1[i] = pow(-1.0,i+2)/(2.0*(i+1)-1.0); // defining main vector... Correct answer for total sum = 0.78539816339
            vec1[i] = pow(i,2)+1.0; // defining main vector...
            //printf("Main vector position %d: %f\n", i, vec1[i]); // uncomment if you wish to print the main vector
        }
        rows_per_proc = vec_size / (num_proc - 1); // average values per process: using (num_proc - 1) because proc 0 does not count as a worker.
        rows_per_proc = floor(rows_per_proc); // getting the maximum integer possible.
        leftover = vec_size - (num_proc - 1)*rows_per_proc; // counting the leftover.

        // splitting and sending the values
        for(an_id = 1; an_id < num_proc; an_id++){
            if(an_id == 1){ // worker id 1 will have more values if there is any leftover.
                num_2_send = rows_per_proc + leftover; // counting the amount of data to be sent.
                start_point = (an_id - 1)*num_2_send; // defining initial position in the main vector (data will be sent from here)
            }
            else{
                num_2_send = rows_per_proc;
                start_point = (an_id - 1)*num_2_send + leftover; // starting point for other processes if there is leftover.
            }
            ierr = MPI_Send(&num_2_send, 1, MPI_INT, an_id, 1234, MPI_COMM_WORLD); // sending the number of values the worker must expect.
            ierr = MPI_Send(&vec1[start_point], num_2_send, MPI_DOUBLE, an_id, 1234, MPI_COMM_WORLD); // sending pieces of the main vector.
        }

        sum = 0;
        for(an_id = 1; an_id < num_proc; an_id++){
            ierr = MPI_Recv(&partial_sum, 1, MPI_DOUBLE, an_id, 4321, MPI_COMM_WORLD, MPI_STATUS_IGNORE); // receiving partial sum.
            sum = sum + partial_sum;
        }
        printf("Total sum = %f.\n", sum);
    }
    else{
        // Workers: define which operation will be carried out by each one
        ierr = MPI_Recv(&num_2_recv, 1, MPI_INT, root_process, 1234, MPI_COMM_WORLD, MPI_STATUS_IGNORE); // receiving the number of values the worker must expect.
        ierr = MPI_Recv(&vec2, num_2_recv, MPI_DOUBLE, root_process, 1234, MPI_COMM_WORLD, MPI_STATUS_IGNORE); // receiving main vector pieces.
        partial_sum = 0;
        for(i=0; i < num_2_recv; i++){
            //printf("Position %d from worker id %d: %f\n", i, my_id, vec2[i]); // uncomment if you wish to print position, id and value of the split vector
            partial_sum = partial_sum + vec2[i];
        }
        printf("Partial sum of %d: %f\n",my_id, partial_sum);
        ierr = MPI_Send(&partial_sum, 1, MPI_DOUBLE, root_process, 4321, MPI_COMM_WORLD); // sending partial sum to root process.
    }

    ierr = MPI_Finalize();
    return 0;
}
Obs.: Compile as
mpicc -o vector_sum vector_send.c -lm
and run as:
time mpirun -n x vector_sum
with x = 2 and 5. You will see that with x=5 it takes more time to run.
Did I do something wrong? I did not expect it to be slower, since the summation of each chunk is independent. Or is it a matter of how the program sends the data to each process? It seems to me that the loop that sends the data to each process is responsible for the longer run time.

As suggested by Gilles Gouaillardet, I modified the code to generate the vector pieces in each process instead of passing them from the root process. It worked! Now the elapsed time is smaller with more processes. I am posting the new code below:
/*
Based on the code presented at
Code which calculates the sum of a vector using parallel computation.
If the main vector does not split equally among all processes, the leftover is passed to process id 1.
Process id 0 is the root process. Therefore it does not count as a worker when passing information.
Each process generates its piece of the vector, calculates the partial sum of those values, and sends it back to the root process, which calculates the total sum.
Since the processes are independent, the printing order will be different at each run.

compile as: mpicc -o vector_sum vector_send.c -lm
run as: time mpirun -n x vector_sum
x = number of splits desired + root process. For example: if x = 3, the vector will be split in two.

Acknowledgements: I would like to thank Gilles Gouaillardet for the helpful suggestion.
*/
#include <stdio.h>
#include <math.h>
#include <mpi.h>

#define vec_len 100000000
double vec2[vec_len];

int main(int argc, char* argv[]){
    // defining program variables
    int i;
    double sum, partial_sum;

    // defining parallel step variables
    int my_id, num_proc, ierr, an_id, root_process; // id of process and total number of processes
    int vec_size, rows_per_proc, leftover, num_2_gen, start_point;

    ierr = MPI_Init(&argc, &argv);
    root_process = 0;
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    if(my_id == root_process){
        vec_size = 1e8; // defining main vector size
        rows_per_proc = vec_size / (num_proc - 1); // average values per process: using (num_proc - 1) because proc 0 does not count as a worker.
        rows_per_proc = floor(rows_per_proc); // getting the maximum integer possible.
        leftover = vec_size - (num_proc - 1)*rows_per_proc; // counting the leftover.

        // defining the number of values and the position corresponding to the main vector
        for(an_id = 1; an_id < num_proc; an_id++){
            if(an_id == 1){ // worker id 1 will have more values if there is any leftover.
                num_2_gen = rows_per_proc + leftover; // counting the amount of data to be generated.
                start_point = (an_id - 1)*num_2_gen; // defining corresponding initial position in the main vector.
            }
            else{
                num_2_gen = rows_per_proc;
                start_point = (an_id - 1)*num_2_gen + leftover; // defining corresponding initial position in the main vector for other processes if there is leftover.
            }
            ierr = MPI_Send(&num_2_gen, 1, MPI_INT, an_id, 1234, MPI_COMM_WORLD); // sending the number of values that must be generated.
            ierr = MPI_Send(&start_point, 1, MPI_INT, an_id, 1234, MPI_COMM_WORLD); // sending the initial position in the main vector.
        }

        sum = 0;
        for(an_id = 1; an_id < num_proc; an_id++){
            ierr = MPI_Recv(&partial_sum, 1, MPI_DOUBLE, an_id, 4321, MPI_COMM_WORLD, MPI_STATUS_IGNORE); // receiving partial sum.
            sum = sum + partial_sum;
        }
        printf("Total sum = %f.\n", sum);
    }
    else{
        ierr = MPI_Recv(&num_2_gen, 1, MPI_INT, root_process, 1234, MPI_COMM_WORLD, MPI_STATUS_IGNORE); // receiving the number of values the worker must generate.
        ierr = MPI_Recv(&start_point, 1, MPI_INT, root_process, 1234, MPI_COMM_WORLD, MPI_STATUS_IGNORE); // receiving the initial position.

        // generate and sum vector pieces
        partial_sum = 0;
        for(i = start_point; i < start_point + num_2_gen; i++){
            vec2[i] = pow(i,2)+1.0;
            partial_sum = partial_sum + vec2[i];
        }
        printf("Partial sum of %d: %f\n",my_id, partial_sum);
        ierr = MPI_Send(&partial_sum, 1, MPI_DOUBLE, root_process, 4321, MPI_COMM_WORLD); // sending partial sum to root process.
    }

    ierr = MPI_Finalize();
    return 0;
}
In this new version, instead of passing the pieces of the main vector, the root passes only the information needed to generate those pieces in each process.
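The bookkeeping behind this version (how many elements each worker generates and where its piece starts) can be checked without MPI. The following is a minimal sketch in plain C that mirrors the arithmetic of the code above; `worker_chunk` is a hypothetical helper name, not part of the original program or of MPI.

```c
#include <assert.h>

/* Hypothetical helper mirroring the arithmetic above: rank 0 is the root and
   does no work, worker 1 takes the leftover, the other workers get equal pieces. */
void worker_chunk(int vec_size, int num_proc, int an_id,
                  int *start_point, int *num_2_gen)
{
    assert(num_proc > 1);                    /* at least one worker besides the root */
    assert(an_id >= 1 && an_id < num_proc);  /* only workers own a piece */

    int workers = num_proc - 1;
    int rows_per_proc = vec_size / workers;             /* integer division already rounds down */
    int leftover = vec_size - workers * rows_per_proc;  /* elements that do not divide evenly */

    if (an_id == 1) {
        *num_2_gen = rows_per_proc + leftover;
        *start_point = 0;
    } else {
        *num_2_gen = rows_per_proc;
        *start_point = (an_id - 1) * rows_per_proc + leftover;
    }
}
```

For vec_size = 10 and num_proc = 4, worker 1 covers indices 0 to 3, worker 2 covers 4 to 6, and worker 3 covers 7 to 9, so the pieces tile the vector with no gaps or overlaps. Only two integers per worker cross the network, which is why this version scales while the one that ships the data itself does not.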

The new code using MPI_Reduce() is faster and simpler than the previous one:
/*
Based on the code presented at
Code which calculates the sum of a vector using parallel computation.
If the main vector does not split equally among all processes, the leftover is passed to process id 0.
Process id 0 is the root process. However, it also performs part of the calculations.
Each process generates its piece of the vector and calculates the partial sum of those values. MPI_Reduce() is used to calculate the total sum.
Since the processes are independent, the printing order will be different at each run.

compile as: mpicc -o vector_sum vector_sum.c -lm
run as: time mpirun -n x vector_sum
x = number of processes. For example: if x = 3, the vector will be split in three, since the root process also computes a piece.

Acknowledgements: I would like to thank Gilles Gouaillardet for the helpful suggestion.
*/
#include <stdio.h>
#include <math.h>
#include <mpi.h>

#define vec_len 100000000
double vec2[vec_len];

int main(int argc, char* argv[]){
    // defining program variables
    int i;
    double sum, partial_sum;

    // defining parallel step variables
    int my_id, num_proc, ierr, an_id, root_process;
    int vec_size, rows_per_proc, leftover, num_2_gen, start_point;

    vec_size = 1e8; // defining the main vector size

    ierr = MPI_Init(&argc, &argv);
    root_process = 0;
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    rows_per_proc = vec_size/num_proc; // getting the number of elements for each process.
    rows_per_proc = floor(rows_per_proc); // getting the maximum integer possible.
    leftover = vec_size - num_proc*rows_per_proc; // counting the leftover.

    if(my_id == 0){
        num_2_gen = rows_per_proc + leftover; // if there is leftover, it is calculated in process 0
        start_point = my_id*num_2_gen; // the corresponding position on the main vector
    }
    else{
        num_2_gen = rows_per_proc;
        start_point = my_id*num_2_gen + leftover; // the corresponding position on the main vector
    }

    partial_sum = 0;
    for(i = start_point; i < start_point + num_2_gen; i++){
        vec2[i] = pow(i,2) + 1.0; // defining vector values
        partial_sum += vec2[i]; // calculating partial sum
    }
    printf("Partial sum of process id %d: %f.\n", my_id, partial_sum);

    MPI_Reduce(&partial_sum, &sum, 1, MPI_DOUBLE, MPI_SUM, root_process, MPI_COMM_WORLD); // calculating total sum

    if(my_id == root_process){
        printf("Total sum is %f.\n", sum);
    }

    ierr = MPI_Finalize();
    return 0;
}
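To see that the decomposition used with MPI_Reduce() gives the same result as a serial sum, the partial sums can be simulated in an ordinary loop over ranks. The sketch below is plain C with no MPI calls; `partial_sum_for_rank` and `simulated_reduce_sum` are hypothetical names, the loop over ranks only stands in for what MPI_Reduce() with MPI_SUM does across real processes, and `i*i` replaces `pow(i,2)` so that no math library is needed.

```c
#include <assert.h>

/* Partial sum of i*i + 1 over the piece owned by my_id.
   Rank 0 also takes the leftover, as in the code above. */
double partial_sum_for_rank(int vec_size, int num_proc, int my_id)
{
    assert(num_proc > 0 && my_id >= 0 && my_id < num_proc);

    int rows_per_proc = vec_size / num_proc;
    int leftover = vec_size - num_proc * rows_per_proc;
    int num_2_gen = (my_id == 0) ? rows_per_proc + leftover : rows_per_proc;
    int start_point = (my_id == 0) ? 0 : my_id * rows_per_proc + leftover;

    double partial_sum = 0.0;
    for (int i = start_point; i < start_point + num_2_gen; i++) {
        partial_sum += (double)i * (double)i + 1.0;
    }
    return partial_sum;
}

/* Stand-in for MPI_Reduce with MPI_SUM: add up every rank's partial sum. */
double simulated_reduce_sum(int vec_size, int num_proc)
{
    double sum = 0.0;
    for (int rank = 0; rank < num_proc; rank++) {
        sum += partial_sum_for_rank(vec_size, num_proc, rank);
    }
    return sum;
}
```

For a small vector the total is the same for any number of simulated ranks. With 1e8 elements the totals obtained with different process counts may differ in the last digits, because floating-point addition is not associative and the order of the additions changes.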


Sum of the elements of an array in parallel?

I'm trying to write an MPI program that calculates the sum of an array of integers.
For this purpose I used MPI_Scatter to send chunks of the array to the other processes, then MPI_Gather so that the root process (process 0) gets the sum of each chunk.
The problem is one of the processes receives two elements but the other one receives random numbers. I'm running my code with 3 processes.
Here is what I have:
#include <stdio.h>
#include <mpi.h>

int main(int argc,char *argv[]){
    MPI_Init(NULL,NULL); // Initialize the MPI environment
    int world_rank;
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
    MPI_Comm_size(MPI_COMM_WORLD,&world_size);

    int number1[2]; //buffer for processes
    int sub_sum = 0;
    int sub_sums[2];
    int sum = 0;

    int number[4];
    if(world_rank == 0){
        number[0]=1; number[1]=3;
        number[2]=5; number[3]=9;
    }

    //All processes
    MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);

    if(world_rank!=0){
        printf("I'm process %d , I received the array : ",world_rank);
        for(int i=0 ; i<2 ; i++){
            printf("%d ",number1[i]);
            sub_sum = sub_sum + number1[i];
        }
        printf("\n");
    }

    MPI_Gather(&sub_sum, 1, MPI_INT, &sub_sums, 1, MPI_INT, 0,MPI_COMM_WORLD);

    if(world_rank == 0){
        for(int i=0; i<2;i++){
            sum+= sub_sums[i];
        }
        printf("\nthe sum of array is: %d\n",sum);
    }

    MPI_Finalize();
    return 0;
}
The result:
I'm process 1 , I received the array : 5 9
I'm process 2 , I received the array : 1494772352 32767
the sum of array is: 14
It seems that you misunderstood how MPI works. Your code is hardcoded to work (correctly) with only two processes. However, you are running it with 3 processes, under the wrong assumption that during the MPI_Scatter call the root rank only sends data to the other processes. In fact, MPI_Scatter distributes the send buffer across all ranks of the communicator, so the root rank (i.e., rank = 0) also receives part of the data.
The problem is one of the processes receives two elements but the
other one receives random numbers.
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
So you have hardcoded the input as number = {1,3,5,9} (only 4 elements). During the MPI_Scatter call, process 0 gets the first and second elements of the array number (i.e., {1, 3}), process 1 gets the other two elements (i.e., {5, 9}), and process 2 is served from beyond the end of the 4-element array, so it gets whatever values happen to be there. Consequently:
I'm process 2 , I received the array : 1494772352 32767
You get
the sum of array is: 14
because the array sub_sums holds the sum computed by process 0, which is zero since you excluded it, and the sum computed by process 1, which is 5 + 9 = 14. Hence, 0 + 14 = 14.
To fix this you need to remove if(world_rank!=0) from:
if(world_rank!=0){
    printf("I'm process %d , I received the array : ",world_rank);
    for(int i=0 ; i<2 ; i++){
        printf("%d ",number1[i]);
        sub_sum = sub_sum + number1[i];
    }
}
and run your code with only 2 processes.
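The data layout of MPI_Scatter can be mimicked in plain C: rank r receives the elements from index r*sendcount up to (but not including) (r+1)*sendcount of the root's send buffer. The sketch below contains no MPI calls; `scatter_fits` and `simulated_scatter` are hypothetical helper names used only to illustrate why 4 elements with sendcount = 2 are enough for 2 processes but not for 3.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a root buffer of total_len elements is large enough to give
   sendcount elements to each of nprocs ranks (the root included), else 0. */
int scatter_fits(int total_len, int sendcount, int nprocs)
{
    assert(sendcount >= 0 && nprocs > 0);
    return sendcount * nprocs <= total_len;
}

/* Copies the piece that `rank` would receive from sendbuf into recvbuf. */
void simulated_scatter(const int *sendbuf, int sendcount, int rank, int *recvbuf)
{
    assert(rank >= 0 && sendcount >= 0);
    memcpy(recvbuf, sendbuf + rank * sendcount, (size_t)sendcount * sizeof(int));
}
```

With number = {1, 3, 5, 9} and sendcount = 2, rank 0 receives {1, 3} and rank 1 receives {5, 9}; scatter_fits(4, 2, 3) returns 0, which corresponds to the out-of-bounds read that produced the garbage values on process 2.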
For the last step, instead of MPI_Gather you can use MPI_Reduce to perform the sum in parallel and collect the value directly on the root rank. Consequently, you do not need to perform the sum manually on the root rank.
A running example:
#include <stdio.h>
#include <mpi.h>

int main(int argc,char *argv[]){
    MPI_Init(NULL,NULL); // Initialize the MPI environment
    int world_rank;
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
    MPI_Comm_size(MPI_COMM_WORLD,&world_size);

    int number1[2];
    int number[4];
    if(world_rank == 0){
        number[0]=1; number[1]=3;
        number[2]=5; number[3]=9;
    }

    //All processes
    MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
    printf("I'm process %d , I received the array : ",world_rank);
    int sub_sum = 0;
    for(int i=0 ; i<2 ; i++){
        printf("%d ",number1[i]);
        sub_sum = sub_sum + number1[i];
    }
    printf("\n");

    int sum = 0;
    MPI_Reduce(&sub_sum, &sum, 1, MPI_INT, MPI_SUM,0,MPI_COMM_WORLD);
    if(world_rank == 0)
        printf("\nthe sum of array is: %d\n",sum);

    MPI_Finalize();
    return 0;
}
Input : {1,3,5,9} running with 2 processes
I'm process 0 , I received the array : 1 3
I'm process 1 , I received the array : 5 9
the sum of array is: 18
If you really want only processes 1 and 2 to receive the data and perform the sum, I would suggest looking into the routines MPI_Send and MPI_Recv.

How to convert MPI_Reduce into MPI_Send and MPI_Recv?

I am working on a parallel processing program that uses MPI_Send() and MPI_Recv() instead of using MPI_Reduce(). I understand that MPI_Send() will need to send a value from each processor to the root processor aka 0 and MPI_Recv() will need to receive all of the values from each processor.
I keep getting the error where the value in Send will not be sent to the Receiving side thus making the final value 0. The MPI_Reduce() function is still in the code but commented out to see what needs to be replaced. Can anyone help?
#include "mpi.h"
#include <stdio.h>
#include <math.h>

int main( int argc, char *argv[])
{
    int n, i;
    double PI25DT = 3.141592653589793238462643;
    double pi, h, sum, x;
    int numprocs, myid;
    double startTime, endTime;

    /* Initialize MPI and get number of processes and my number or rank*/
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* Processor zero sets the number of intervals and starts its clock*/
    if (myid==0) {
        n = 1000000; /* value not shown in the source; 1000000 is an assumed placeholder */
        startTime = MPI_Wtime();
        for (int i = 0; i < numprocs; i++) {
            if (i != myid) {
                MPI_Send(&n, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
        }
    }
    else {
        MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Calculate the width of intervals */
    h = 1.0 / (double) n;
    /* Initialize sum */
    sum = 0.0;
    /* Step over each interval I own */
    for (i = myid+1; i <= n; i += numprocs) {
        /* Calculate midpoint of interval */
        x = h * ((double)i - 0.5);
        /* Add rectangle's area = height*width = f(x)*h */
        sum += (4.0/(1.0+x*x))*h;
    }

    /* Get sum total on processor zero */
    //MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    double value = 0;
    if (myid != 0) {
        MPI_Send(&sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    else {
        for (int i = 1; i < numprocs; i++) {
            MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            pi += value;
        }
    }

    /* Print approximate value of pi and runtime*/
    if (myid==0) {
        printf("pi is approximately %.16f, Error is %e\n",
               pi, fabs(pi - PI25DT));
        endTime = MPI_Wtime();
        printf("runtime is=%.16f",endTime-startTime);
    }

    MPI_Finalize();
    return 0;
}
You are using MPI_INT to send a value of type double:
if (myid != 0) {
MPI_Send(&sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
// ^^^^^^^
int is typically 4 bytes long; double is 8 bytes long. Although the receive operation succeeds, it cannot construct a value of type MPI_DOUBLE given only 4 bytes from the message, so it doesn't write anything into value and it remains 0.0. Indeed, if you replace the MPI_Recv call with:
MPI_Status status;
int count;
MPI_Recv(&value, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_DOUBLE, &count);
if (count == MPI_UNDEFINED) {
    printf("Short message received\n");
    MPI_Abort(MPI_COMM_WORLD, 0);
}
your program will abort, indicating that the body of the conditional statement was executed due to MPI_Get_count() returning MPI_UNDEFINED in count, which signals that the length of the received message was not an integer multiple of the size of MPI_DOUBLE. The fix is to pass MPI_DOUBLE as the datatype in the MPI_Send call.
Also, pi must be explicitly initialised to sum before the receive loop, otherwise you will get the wrong value of pi due to either of the following errors:
pi is left uninitialised and has arbitrary initial value, and
the contribution of rank 0 is not added to the final result.
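Both points can be illustrated without MPI. The sketch below is plain C; `elements_in_message` is a hypothetical helper that mimics how a received byte count relates to an element count (returning -1 where MPI_Get_count() would report MPI_UNDEFINED), and `combine_on_root` shows the accumulation the receive loop needs. Sizes are passed explicitly rather than assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole elements in a message of msg_bytes bytes, or -1 if the
   message length is not an integer multiple of the element size. */
int elements_in_message(size_t msg_bytes, size_t elem_size)
{
    assert(elem_size > 0);
    if (msg_bytes % elem_size != 0) {
        return -1;
    }
    return (int)(msg_bytes / elem_size);
}

/* What rank 0 should compute: start from its own sum, then add each value
   received from the other ranks (passed here as an array). */
double combine_on_root(double own_sum, const double *received, int n_received)
{
    assert(n_received >= 0);
    double pi = own_sum;  /* initialise with rank 0's contribution */
    for (int i = 0; i < n_received; i++) {
        pi += received[i];
    }
    return pi;
}
```

A 4-byte message interpreted as 8-byte elements gives elements_in_message(4, 8) == -1, which is the situation created by sending with MPI_INT and receiving with MPI_DOUBLE. In the MPI program, the equivalent of combine_on_root is to write pi = sum; before the loop of MPI_Recv calls.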

How does MPI_Reduce with MPI_MIN work?

If I have this code:
int main(void) {
    int result=0;
    int num[6] = {1, 2, 4, 3, 7, 1};
    int my_rank;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    if (my_rank != 0) {
        MPI_Reduce(num, &result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    } else {
        MPI_Reduce(num, &result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
        printf("result = %d\n", result);
    }
    MPI_Finalize();
    return 0;
}
the printed result is 1.
But if num[0]=9, then the result is 9.
I read that to solve this problem I must define the variable num as an array.
I can't understand how the function MPI_Reduce works with MPI_MIN. Why, if num[0] is not equal to the smallest number, must I define the variable num as an array?
MPI_Reduce performs a reduction over the members of the communicator - not the members of the local array. sendbuf and recvbuf must both be of the same size.
I think the standard says it best:
Thus, all processes provide input buffers and output buffers of the same length, with elements of the same type. Each process can provide one element, or a sequence of elements, in which case the combine operation is executed element-wise on each entry of the sequence.
MPI does not compute the minimum of all elements in the array; you have to do that manually.
You can use MPI_MIN to obtain the min value among those passed via reduction.
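The element-wise behaviour can be simulated in plain C. The sketch below has no MPI calls; `simulated_reduce_min` is a hypothetical stand-in for MPI_Reduce with MPI_MIN that takes the send buffers of all ranks stored back to back, and `local_min` is the manual step over one process's own array.

```c
#include <assert.h>

/* Stand-in for MPI_Reduce with MPI_MIN: bufs holds nprocs send buffers of
   `count` ints each, stored back to back; result receives `count` ints. */
void simulated_reduce_min(const int *bufs, int nprocs, int count, int *result)
{
    assert(nprocs > 0 && count > 0);
    for (int j = 0; j < count; j++) {
        int m = bufs[j];                  /* element j of rank 0 */
        for (int r = 1; r < nprocs; r++) {
            int v = bufs[r * count + j];  /* element j of rank r */
            if (v < m) {
                m = v;
            }
        }
        result[j] = m;
    }
}

/* The manual step: minimum over one process's local array. */
int local_min(const int *a, int n)
{
    assert(n > 0);
    int m = a[0];
    for (int i = 1; i < n; i++) {
        if (a[i] < m) {
            m = a[i];
        }
    }
    return m;
}
```

If every rank holds {9, 2, 4, 3, 7, 1}, then result[0] is 9, the minimum of the first elements across ranks, which matches what the question observed. To obtain the overall minimum, each rank would first compute local_min over its own array and then reduce that single value with count = 1.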
Let's examine the function declaration:
int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype
datatype, MPI_Op op, int root, MPI_Comm comm)
Each process sends its value (or array of values) using the buffer sendbuf.
The process identified by root receives the contributions, combines them with op, and stores the result in recvbuf. count is the number of elements in each process's send buffer, so recvbuf must have room for count elements of datatype.
If each process has only one integer to send (count = 1), then recvbuf is also a single integer. If each process has two integers, then recvbuf is an array of two integers. See this nice post for further explanations and nice pictures.
Now it should be clear that your code is wrong: sendbuf and recvbuf must be of the same size, and there is no need for the condition if(my_rank != 0). Simply put, recvbuf is significant only at the root process, while sendbuf is read on every process, the root included.
In your example you can assign one or more elements of the array to each process and then compute the minimum value (if there are as many processes as values in the array) or the array of minimum values (if there are more values than processes).
Here is a working example that illustrates the usage of MPI_MIN, MPI_MAX and MPI_SUM (slightly modified from this), in the case of simple values (not arrays).
Each process does some work, depending on its rank, and sends the time spent doing the work to the root process. The root process collects the times and outputs their min, max and average values.
#include <stdio.h>
#include <mpi.h>

int myrank, numprocs;

/* just a function to waste some time */
float work()
{
    float x, y = 0;
    if (myrank%2) {
        for (int i = 0; i < 100000000; ++i) {
            x = i/0.001;
            y += x;
        }
    } else {
        for (int i = 0; i < 100000; ++i) {
            x = i/0.001;
            y += x;
        }
    }
    return y;
}

int main(int argc, char **argv)
{
    int node;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);
    printf("Hello World from Node %d\n",node);

    /*variables used for gathering timing statistics*/
    double mytime, maxtime, mintime, avgtime;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Barrier(MPI_COMM_WORLD); /*synchronize all processes*/

    mytime = MPI_Wtime(); /*get time just before work section */
    work();
    mytime = MPI_Wtime() - mytime; /*get time just after work section*/

    /*compute max, min, and average timing statistics*/
    MPI_Reduce(&mytime, &maxtime, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&mytime, &mintime, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&mytime, &avgtime, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* print the output */
    if (myrank == 0) {
        avgtime /= numprocs;
        printf("Min: %lf Max: %lf Avg: %lf\n", mintime, maxtime,avgtime);
    }

    MPI_Finalize();
    return 0;
}
If I run this on my OSX laptop, this is what I get:
urcaurca$ mpirun -n 4 ./a.out
Hello World from Node 3
Hello World from Node 0
Hello World from Node 2
Hello World from Node 1
Min: 0.000974 Max: 0.985291 Avg: 0.493081

MPI_Scatter of 1D array

I am new to MPI and I am trying to write a program that uses MPI_Scatter. I have 4 nodes (0, 1, 2, 3). Node 0 is the master, the others are slaves. The master asks the user for the number of array elements to send to each slave. Then it creates an array of size number of elements * 4. Then every node prints its results.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0

int main(int argc, char **argv) {
    int id, nproc, len, numberE, i, sizeArray;
    int *arrayN=NULL;
    int arrayNlocal[sizeArray];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (id == MASTER){
        printf("Enter number of elements: ");
        scanf("%d", &numberE);
        sizeArray = numberE * 4;
        arrayN = malloc(numberE * sizeof(int));
        for (i = 0; i < sizeArray; i++){
            arrayN[i] = i + 1;
        }
    }

    MPI_Scatter(arrayN, numberE, MPI_INT, &arrayNlocal, numberE,MPI_INT, MPI_COMM_WORLD);

    printf("Node %d has: ", id);
    for (i = 0; i < numberE; i++){
        printf("%d ",arrayNlocal[i]);
    }
    return 0;
}
And as an error I get:
In arrayNlocal[sizeArray];, sizeArray is not initialized. The best way to go is to broadcast numberE to every process and then allocate memory for arrayNlocal. Something like:
MPI_Bcast(&numberE, 1, MPI_INT, 0, MPI_COMM_WORLD);
arrayN is an array of size sizeArray = numberE * 4, so:
arrayN = malloc(sizeArray * sizeof(int));
MPI_Scatter() needs pointers to the data to be sent on root node, and a pointer to receive buffer on each process of the communicator. Since arrayNlocal is an array:
MPI_Scatter(arrayN, numberE, MPI_INT, arrayNlocal, numberE,MPI_INT,MASTER, MPI_COMM_WORLD);
or alternatively:
MPI_Scatter(arrayN, numberE, MPI_INT, &arrayNlocal[0], numberE,MPI_INT,MASTER, MPI_COMM_WORLD);
id is not initialized in id == MASTER: it must be rank==MASTER.
As is, the prints at the end might occur in a mixed way between processes.
Try to compile your code using mpicc main.c -o main -Wall to enable all warnings: it can save you a few hours in the near future!

C MPI - spawning multiple threads in batches

I have a basic question about MPI programming in C. Essentially, I want a master process that spawns a specific number of child processes, collects some information from all of them (waiting until all of the children finish), and calculates some metric. Based on this metric it decides whether it has to spawn more threads, and it keeps doing this until the metric meets some specific condition. I have searched through the literature, to no avail. How can this be done? Any pointers?
Thanks for the help.
Courtesy: An introduction to the Message Passing Interface (MPI) using C. In the "complete parallel program to sum an array", let's say, "for some lame reason", I want the master process to sum the contents of the array twice. That is, in the first iteration the master process starts the slave processes, which compute the sums of their parts of the array. Once they are done and the master process has returned the value, I would like the master process to invoke another set of threads to do the computation again. Why would the code below not work? I added a while loop around the part of the master process that spawns the slave processes.
#include <stdio.h>
#include <mpi.h>

#define max_rows 100000
#define send_data_tag 2001
#define return_data_tag 2002

int array[max_rows];
int array2[max_rows];

int main(int argc, char **argv)
{
    long int sum, partial_sum,number_of_times;
    MPI_Status status;
    int my_id, root_process, ierr, i, num_rows, num_procs,
        an_id, num_rows_to_receive, avg_rows_per_process,
        sender, num_rows_received, start_row, end_row, num_rows_to_send;

    /* Now replicate this process to create parallel processes.
     * From this point on, every process executes a separate copy
     * of this program */
    ierr = MPI_Init(&argc, &argv);
    root_process = 0;

    /* find out MY process ID, and how many processes were started. */
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

    if(my_id == root_process) {
        /* I must be the root process, so I will query the user
         * to determine how many numbers to sum. */
        //printf("please enter the number of numbers to sum: ");
        //scanf("%i", &num_rows);
        while (number_of_times<2)
        {
            if(num_rows > max_rows) {
                printf("Too many numbers.\n");
            }
            avg_rows_per_process = num_rows / num_procs;

            /* initialize an array */
            for(i = 0; i < num_rows; i++) {
                array[i] = i + 1;
            }

            /* distribute a portion of the vector to each child process */
            for(an_id = 1; an_id < num_procs; an_id++) {
                start_row = an_id*avg_rows_per_process + 1;
                end_row = (an_id + 1)*avg_rows_per_process;
                if((num_rows - end_row) < avg_rows_per_process)
                    end_row = num_rows - 1;
                num_rows_to_send = end_row - start_row + 1;
                ierr = MPI_Send( &num_rows_to_send, 1 , MPI_INT,
                                 an_id, send_data_tag, MPI_COMM_WORLD);
                ierr = MPI_Send( &array[start_row], num_rows_to_send, MPI_INT,
                                 an_id, send_data_tag, MPI_COMM_WORLD);
            }

            /* and calculate the sum of the values in the segment assigned
             * to the root process */
            sum = 0;
            for(i = 0; i < avg_rows_per_process + 1; i++) {
                sum += array[i];
            }
            printf("sum %li calculated by root process\n", sum);

            /* and, finally, I collect the partial sums from the slave processes,
             * print them, and add them to the grand sum, and print it */
            for(an_id = 1; an_id < num_procs; an_id++) {
                ierr = MPI_Recv( &partial_sum, 1, MPI_LONG, MPI_ANY_SOURCE,
                                 return_data_tag, MPI_COMM_WORLD, &status);
                sender = status.MPI_SOURCE;
                printf("Partial sum %li returned from process %i\n", partial_sum, sender);
                sum += partial_sum;
            }
            printf("The grand total is: %li\n", sum);
        }
    }
    else {
        /* I must be a slave process, so I must receive my array segment,
         * storing it in a "local" array, array2. */
        ierr = MPI_Recv( &num_rows_to_receive, 1, MPI_INT,
                         root_process, send_data_tag, MPI_COMM_WORLD, &status);
        ierr = MPI_Recv( &array2, num_rows_to_receive, MPI_INT,
                         root_process, send_data_tag, MPI_COMM_WORLD, &status);
        num_rows_received = num_rows_to_receive;

        /* Calculate the sum of my portion of the array */
        partial_sum = 0;
        for(i = 0; i < num_rows_received; i++) {
            partial_sum += array2[i];
        }

        /* and finally, send my partial sum to the root process */
        ierr = MPI_Send( &partial_sum, 1, MPI_LONG, root_process,
                         return_data_tag, MPI_COMM_WORLD);
    }

    ierr = MPI_Finalize();
    return 0;
}
You should start by looking at MPI_Comm_spawn and collective operations. To collect information from old child processes, one would typically use MPI_Reduce.
This Stack Overflow question might also be helpful.
"spawn more threads..."
I guess you meant the right thing, since you mostly used "process" instead of "thread", but just to clarify: MPI only deals with processes and not with threads.
I'm not sure how well you know MPI already - let me know if my answer was any help or if you need more hints.
The MPI-2 standard includes process management functionality. It's described in detail in Chapter 5. I have not used it myself though, so perhaps someone else may weigh in with more practical hints.
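The control flow asked for in the question (run a batch, collect results, compute a metric, decide whether to run another batch) can be sketched independently of how the batch is launched. The following is plain C with no MPI calls; `batch_fn`, `run_batches` and `halving_batch` are hypothetical names, and the callback is only a stub for the place where a real program would call MPI_Comm_spawn (or reuse already running workers) and gather the results.

```c
#include <assert.h>

/* Hypothetical callback type: runs one batch of n_children workers and
   returns the metric computed from their results. */
typedef double (*batch_fn)(int batch_index, int n_children);

/* Master-side control loop: keep launching batches until the metric drops
   to `target` or below, or until max_batches have run.
   Returns the number of batches that were run. */
int run_batches(batch_fn run_batch, int n_children, double target, int max_batches)
{
    assert(run_batch != 0 && n_children > 0 && max_batches > 0);
    int batches = 0;
    double metric;
    do {
        metric = run_batch(batches, n_children);
        batches++;
    } while (metric > target && batches < max_batches);
    return batches;
}

/* Toy batch for demonstration only: the metric halves with every batch. */
double halving_batch(int batch_index, int n_children)
{
    (void)n_children;  /* unused in this toy example */
    double metric = 1.0;
    for (int i = 0; i <= batch_index; i++) {
        metric /= 2.0;
    }
    return metric;
}
```

With the toy batch and a target of 0.1, the metrics are 0.5, 0.25, 0.125 and 0.0625, so the loop stops after four batches. The max_batches bound is a design choice of this sketch that keeps the master from looping forever if the metric never reaches the target.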
