When do I free an MPI sub-group? - c

I'm creating MPI groups in a loop that perform a task, but when I want to free the group, the computation aborts. When should I free the group?
The error I get is:
[KLArch:13617] *** An error occurred in MPI_Comm_free
[KLArch:13617] *** reported by process [1712324609,2]
[KLArch:13617] *** on communicator MPI_COMM_WORLD
[KLArch:13617] *** MPI_ERR_COMM: invalid communicator
[KLArch:13617] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[KLArch:13617] *** and potentially your MPI job)
[KLArch:13611] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[KLArch:13611] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
What I intend to do is to do a calculation with an increasing number of processes in parallel as a benchmark for MPI, i.e. doing the whole calculation with only 1 processe, take the time, run the same calculation with 2 processes, take the time, run the same calculation with 4 processes, take the time... and compare how the problem scales with the number of processes.
MWE:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
int j = 0;
int size = 0;
int rank = 0;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// Get the group of processes in MPI_COMM_WORLD
MPI_Group world_group;
MPI_Comm_group(MPI_COMM_WORLD, &world_group);
// Construct a group containing all of the ranks smaller than i in MPI_COMM_WORLD
for (i = 1; i <= size; i++)
{
int group_ranks[i];
for (int j = 0; j < i; j++)
{
group_ranks[j] = j;
}
// Construct a group with all the ranks smaller than i
MPI_Group sub_group;
MPI_Group_incl(world_group, i, group_ranks, &sub_group);
// Create a communicator based on the group
MPI_Comm sub_comm;
MPI_Comm_create(MPI_COMM_WORLD, sub_group, &sub_comm);
int sub_rank = -1;
int sub_size = -1;
// If this rank isn't in the new communicator, it will be
// MPI_COMM_NULL. Using MPI_COMM_NULL for MPI_Comm_rank or
// MPI_Comm_size is erroneous
if (MPI_COMM_NULL != sub_comm)
{
MPI_Comm_rank(sub_comm, &sub_rank);
MPI_Comm_size(sub_comm, &sub_size);
}
// Do some work
printf("WORLD RANK/SIZE: %d/%d \t Group RANK/SIZE: %d/%d\n",
rank, size, sub_rank, sub_size);
// Free the communicator and group
MPI_Barrier(MPI_COMM_WORLD);
//MPI_Comm_free(&sub_comm);
//MPI_Group_free(&sub_group);
j = 0;
}
MPI_Group_free(&world_group);
MPI_Finalize();
return 0;
}
The code throws the error, if I uncomment MPI_Comm_free(&sub_comm); and MPI_Group_free(&sub_group); at the end of the loop.

Let me collect some remarks about this code.
that barrier is not needed.
if you test this on a multi-node system you have to be aware that your processes are not spread evenly: 13 processes on 3 6-core nodes would give you 6+6+1 which is unbalanced. You would want 5+4+4 or so. Your mpiexec would do this correctly; achieving this in your code is a little harder. Just be aware of this since you are doing benchmarking.
It's a little tricky getting this code right. When you make a subgroup, all processes have the same value for the group, including the ones that are not in the group. For instance they do not get MPI_GROUP_NULL. Then you have to call MPI_Comm_create collectively on the large communicator; processes that are not in the group get MPI_COMM_NULL as result. They do not participate in the actions on the subcommunicator. Also, and this was your problem: they do not free the subcommunicator, but they do free the subgroup.
(That last point was also pointed out by #GillesGouaillardet)

Related

MPI get the number of non-finalized processes

I have the following code
int main(void) {
...
int numtasks, rank;
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int k = 2;
if (rank == 3) {
do {
...
} while (number_of_not_finalized_processes >= numtasks - k);
}
...
MPI_Finalize();
}
and want to loop process with rank = 3 until k processes are finalized with MPI_Finalize(). How can I find out, how many processes have been finalized and define number_of_not_finalized_processes ? Or maybe I have any better alternative way to find out how many processes are still alive or passed the certain part of code?
You can solve this with one-sided communication. Create a window where every process before it finalizes:
Acquires a lock on the window on rank 3
Uses MPI_Accumulate to increase the counter
This is a very asynchronous solution. If your processes do something synchronized, say a loop where they dynamically decide to exit the loop, you could for instance at the synchonization point create a subcommunicator of only the active processes.

Im learning MPI with C and don't understand why my code don't work

I'm new to MPI and I want my C program, which needs to be launched with two process, to output this:
Hello
Good bye
Hello
Good bye
... (20 times)
But when I start it with mpirun -n 2 ./main it gives me this error and the program don't work:
*** An error occurred in MPI_Send
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[laptop:3786023] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
*** An error occurred in MPI_Send
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[laptop:3786024] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[61908,1],0]
Exit code: 1
--------------------------------------------------------------------------
Here's my code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <unistd.h>
#include <string.h>
#define N 32
void send (int rank)
{
char mess[N];
if (rank == 0)
sprintf(mess, "Hello");
else
sprintf(mess, "Good bye");
MPI_Send(mess, strlen(mess), MPI_CHAR, !rank, 0, MPI_COMM_WORLD);
}
void receive (int rank)
{
char buf[N];
MPI_Status status;
MPI_Recv(buf, N, MPI_CHAR, !rank, 0, MPI_COMM_WORLD, &status);
printf("%s\n", buf);
}
int main (int argc, char **argv)
{
if (MPI_Init(&argc, &argv)) {
fprintf(stderr, "Erreur MPI_Init\n");
exit(1);
}
int size, rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
for (size_t i = 0; i < 20; i ++) {
if (rank == 0) {
send(rank);
receive(rank);
} else {
receive(rank);
send(rank);
}
}
MPI_Finalize();
return 0;
}
I don't understand that error and I don't know how to debug that.
Thank you if you can help me to solve that (probably idiot) problem !
The main error is your function is called send(), and that conflicts with the function of the libc, and hence results in this bizarre behavior.
In order to avoid an other undefined behavior caused by using uninitialized data, you also have to send the NULL terminating character, e.g.
MPI_Send(mess, strlen(mess)+1, MPI_CHAR, !rank, 0, MPI_COMM_WORLD);

Recursive Function using MPI library

I am using the recursive function to find determinant of a 5x5 matrix. Although this sounds a trivial problem and can be solved using OpenMP if the dimensions are huge. I am trying to use MPI to solve this however I cant understand how to deal with recursion which is accumulating the results.
So my question is, How do I use MPI for this?
PS: The matrix is a Hilbert matrix so the answer would be 0
I have written the below code but I think it simply does the same part n times rather than dividing the problem and then accumulating result.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define ROWS 5
double getDeterminant(double matrix[5][5], int iDim);
double matrix[ROWS][ROWS] = {
{1.0, -0.5, -0.33, -0.25,-0.2},
{-0.5, 0.33, -0.25, -0.2,-0.167},
{-0.33, -0.25, 0.2, -0.167,-0.1428},
{-0.25,-0.2, -0.167,0.1428,-0.125},
{-0.2, -0.167,-0.1428,-0.125,0.111},
};
int rank, size, tag = 0;
int main(int argc, char** argv)
{
//Status of messages for each individual rows
MPI_Status status[ROWS];
//Message ID or Rank
MPI_Request req[ROWS];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
double result;
result = getDeterminant(matrix, ROWS);
printf("The determinant is %lf\n", result);
//Set barrier to wait for all processess
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}
double getDeterminant(double matrix[ROWS][ROWS], int iDim)
{
int iCols, iMinorRow, iMinorCol, iTemp, iSign;
double c[5];
double tempMat[5][5];
double dDet;
dDet = 0;
if (iDim == 2)
{
dDet = (matrix[0][0] * matrix[1][1]) - (matrix[0][1] * matrix[1][0]);
return dDet;
}
else
{
for (iCols = 0; iCols < iDim; iCols++)
{
int temp_row = 0, temp_col = 0;
for (iMinorRow = 0; iMinorRow < iDim; iMinorRow++)
{
for (iMinorCol = 0; iMinorCol < iDim; iMinorCol++)
{
if (iMinorRow != 0 && iMinorCol != iCols)
{
tempMat[temp_row][temp_col] = matrix[iMinorRow][iMinorCol];
temp_col++;
if (temp_col >= iDim - 1)
{
temp_row++;
temp_col = 0;
}
}
}
}
//Hanlding the alternate signs while calculating diterminant
for (iTemp = 0, iSign = 1; iTemp < iCols; iTemp++)
{
iSign = (-1) * iSign;
}
//Evaluating what has been calculated if the resulting matrix is 2x2
c[iCols] = iSign * getDeterminant(tempMat, iDim - 1);
}
for (iCols = 0, dDet = 0.0; iCols < iDim; iCols++)
{
dDet = dDet + (matrix[0][iCols] * c[iCols]);
}
return dDet;
}
}
Expected result should be a very small value close to 0. I am getting the same result but not using MPI
The provided program will be executed in n processes. mpirun launches the n processes and they all executes the provided code. It is the expected behaviour. Unlike openmp, MPI is not a shared memory programming model rather a distributed memory programming model. It uses message passing to communicate with other processes.There are no global variables in MPI. All the data in your program will be local to your process.. If you need to share a data between process, you have to use MPI_send or MPI_Bcast or etc. explicitly to send it. You can use collective operations like MPI_Bcastto send it to all processes or point to point operations like MPI_send to send to specific processes.
For your application to do the expected behaviour, you have to tailor make it for MPI (unlike in openmp where you can use pragmas). All processes has an identifier or rank. Typically, rank 0 (lets call it your main process) should pass the data to all processes using MPI_Send (or any other methods) and the remaining processes should receive it using MPI_Recv (use MPI_Recv for MPI_Send). After receiving the local data from main process, the local processes should perform some computation on it and then send the results back to the main process. Main process will agreggate the result. This is a very basic scenario using MPI. You can use MPI IO etc..
MPI does not do anything by itself for synchronization or data sharing. It just launches instance of the application n times and provides required routines. It is the application developer who is in charge of communication (data structures etc), synchronization(using MPI_Barrier), etc. among processes.
Following is a simple send receive program using MPI. When you run the below code, say with n as 2, two copies of this program will be launched. In the program, using MPI_Comm_rank(), each process will get its id. We can use this ID for further computations/controlling the flow of code. In the code below, the process with rank 0 will send the variable number using MPI_Send and the process with rank 1 will receive this value using MPI_Recv. We can see that if and else if to differentiate between processes and change the control flow to send and receive the data. This is a very basic MPI program that share the data between processes.
// Find out rank, size
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int number;
if (world_rank == 0) {
number = -1;
MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
MPI_STATUS_IGNORE);
printf("Process 1 received number %d from process 0\n",
number);
}
Here is a tutorial on MPI.

I don't see what the issue is in my program in MPI

I don't know how to fix the problem with this program so far. The purpose of this program is to add up all the number in an array but I can only barely manage to send the arrays before errors start to appear. It has to do with the for loop in the if statement my_rank!=0 section.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]){
int my_rank, p, source, dest, tag, total, n = 0;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
//15 processors(1-15) not including processor 0
if(my_rank != 0){
MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
int arr[n];
MPI_Recv( arr, n, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
//printf("%i ", my_rank);
int i;
for(i = ((my_rank-1)*(n/15)); i < ((my_rank-1)+(n/15)); i++ ){
//printf("%i ", arr[0]);
}
}
else{
printf("Please enter an integer:\n");
scanf("%i", &n);
int i;
int arr[n];
for(i = 0; i < n; i++){
arr[i] = i + 1;
}
for(dest = 0; dest < p; dest++){
MPI_Send( &n, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
MPI_Send( arr, n, MPI_INT, dest, tag, MPI_COMM_WORLD);
}
}
MPI_Finalize();
}
When I take that for loop out it compiles and run but when I put it back in it just stops working. Here is the error it is giving me:
[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
[compute-0-24.local:1072] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Please enter an integer:
--------------------------------------------------------------------------
mpirun has exited due to process rank 8 with PID 1072 on
node compute-0-24 exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-0-16.local][[31957,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.4.237 failed: Connection refused (111)
[cs-cluster:11677] 14 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[cs-cluster:11677] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
There are two problems in the code you posted:
The send loop starts from p=0, which means that process of rank zero will send to itself. However, since there's no receiving part for process zero, this won't work. Just make the loop to start from p=1 and that should solve it.
The tag you use isn't initialised. So it's value can be whatever (which is OK), but can be a different whatever per process, which will lead to the various communications to never match each-other. Just initialise tag=0 for example, and that should fix that.
With this, your code snippet should work.
Learn to read the informative error messages that Open MPI gives you and to apply some general debugging strategies.
[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
The library is telling you that the receive operation was called with an invalid rank value. Armed with that knowledge, you take a look at your code:
int my_rank, p, source, dest, tag, total, n = 0;
...
//15 processors(1-15) not including processor 0
if(my_rank != 0){
MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
...
The rank is source. source is an automatic variable declared some lines before but never initialised, therefore its initial value is completely random. You fix it by assigning source an initial value of 0 or by simply replacing it with 0 since you've already hard-coded the rank of the sender by singling out its code in the else block of the if operator.
The presence of the above error eventually hints you to examine the other variables too. Thus you notice that tag is also used uninitialised and you either initialise it to e.g. 0 or replace it altogether.
Now your program is almost correct. You notice that it seems to work fine for n up to about 33000 (the default eager limit of the self transport divided by sizeof(int)), but then it hangs for larger values. You either fire a debugger of simply add a printf statement before and after each send and receive operation and discover that already the first call to MPI_Send with dest equal to 0 never returns. You then take a closer look at your code and discover this:
for(dest = 0; dest < p; dest++){
dest starts from 0, but this is wrong since rank 0 is only sending data and not receiving. You fix it by setting the initial value to 1.
Your program should now work as intended (or at least for values of n that do not lead to stack overflow in int arr[n];). Congratulations! Now go and learn about MPI_Probe and MPI_Get_count, which will help you do the same without explicitly sending the length of the array first. Then learn about MPI_Scatter and MPI_Reduce, which will enable you to implement the algorithm even more elegantly.

MPI Deadlock with collective functions

I'm writing a simple program in C with MPI library.
The intent of this program is the following:
I have a group of processes that perform an iterative loop, at the end of this loop all processes in the communicator must call two collective functions(MPI_Allreduce and MPI_Bcast). The first one sends the id of the processes that have generated the minimum value of the num.val variable, and the second one broadcasts from the source num_min.idx_v to all processes in the communicator MPI_COMM_WORLD.
The problem is that I don't know if the i-th process will be finalized before calling the collective functions. All processes have a probability of 1/10 to terminate. This simulates the behaviour of the real program that I'm implementing. And when the first process terminates, the others cause deadlock.
This is the code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
typedef struct double_int{
double val;
int idx_v;
}double_int;
int main(int argc, char **argv)
{
int n = 10;
int max_it = 4000;
int proc_id, n_proc;double *x = (double *)malloc(n*sizeof(double));
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &n_proc);
MPI_Comm_rank(MPI_COMM_WORLD, &proc_id);
srand(proc_id);
double_int num_min;
double_int num;
int k;
for(k = 0; k < max_it; k++){
num.idx_v = proc_id;
num.val = rand()/(double)RAND_MAX;
if((rand() % 10) == 0){
printf("iter %d: proc %d terminato\n", k, proc_id);
MPI_Finalize();
exit(EXIT_SUCCESS);
}
MPI_Allreduce(&num, &num_min, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD);
MPI_Bcast(x, n, MPI_DOUBLE, num_min.idx_v, MPI_COMM_WORLD);
}
MPI_Finalize();
exit(EXIT_SUCCESS);
}
Perhaps I should create a new group and new communicator before calling MPI_Finalize function in the if statement? How should I solve this?
If you have control over a process before it terminates you should send a non-blocking flag to a rank that cannot terminate early (lets call it the root rank). Then instead of having a blocking all_reduce, you could have sends from all ranks to the root rank with their value.
The root rank could post non-blocking receives for a possible flag, and the value. All ranks would have to have sent one or the other. Once all ranks are accounted for you can do the reduce on the root rank, remove exited ranks from communication and broadcast it.
If your ranks exit without notice, I am not sure what options you have.

Resources