Suppose I have a code block that attempts to calculate the first 15 numbers of the Fibonacci sequence and distribute each computed number among 3 worker processes (via MPI_Send) using a for loop, as shown in the code block below.
int main(int argc, char* argv[]) {
int rank, size, recieve_data1, recieve_data2;
MPI_Init(NULL, NULL);
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
printf("Available ranks are: %d \n \n", rank); // first rank rollcall
fflush(stdout);
int num1 = 1; int num2 = 1;
int RecieveNum; int SumNum;
for (int n = 0; n < 16; n++) {
if (rank == 0) {
// perform the Fibonacci sequence algorithm
SumNum = num1 + num2;
num1 = num2;
num2 = SumNum;
// define the sorting algorithm
int DeliverTo = (n % 3) + 1;
// send calculated result
MPI_Send(&SumNum, 1, MPI_INT, DeliverTo, 1, MPI_COMM_WORLD);
}
else {
// receive the element integer
MPI_Recv(&RecieveNum, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
// print and flush the buffer
printf("I am process rank %d, and I recieved the number %d. \n", rank, RecieveNum);
fflush(stdout);
}
}
printf("Available ranks are: %d \n \n", rank); // second rank roll call
fflush(stdout);
/*
more code that I run...
*/
MPI_Finalize();
return 0;
}
Before the for loop is entered, processes 0, 1, 2, and 3 all respond to the first printf("Available ranks are: %d \n \n", rank);. However, after executing the for loop and reaching the second printf, only process 0 responds. I was expecting all 4 processes, 0 through 3, to respond again after the loop. To solve this problem, I isolated this section of code and attempted to debug it for several hours with no success. This issue is particularly problematic, as I have additional code (not shown here for the sake of brevity) that accesses the numbers generated by this sequence.
Finally, I am running the code by building the solution, running the VS terminal as an administrator, and typing mpiexec -n 4 my_file_name.exe. No build errors or compilation mistakes occurred. From what I can see (correct me if I'm wrong), all processes hang after completing the for loop, but I am unsure why or how to fix it.
After searching the site, I did not find anything that answered this question (from my point of view). I am a bit of an MPI (and Stack Overflow) newbie, so any code pointers are also welcome. Thanks!
You have process zero compute whom to send to, and then every other process does a receive. That means all processes that are not the computed receiver will hang.
This scenario where you send to a dynamically computed receiver is not easy to do in MPI. You either need to
send a message to all other processes "no, I have nothing for you", or
send a message to all, and have all-but-one process ignore the data (see the sketch below), or
use one-sided operations where you MPI_Put the data on the computed receiver.
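For example, here is a rough sketch of the second option (send to all, let all-but-one ignore), reusing the loop body and variable names from the question; it is one possible fix, not the only one:

if (rank == 0) {
    SumNum = num1 + num2;
    num1 = num2;
    num2 = SumNum;
    int DeliverTo = (n % 3) + 1;
    // send the value together with the rank it is really meant for, to every worker
    int payload[2] = { SumNum, DeliverTo };
    for (int dest = 1; dest < size; dest++)
        MPI_Send(payload, 2, MPI_INT, dest, 1, MPI_COMM_WORLD);
}
else {
    int payload[2];
    MPI_Recv(payload, 2, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
    if (payload[1] == rank) { // all other workers simply ignore the data
        printf("I am process rank %d, and I received the number %d.\n", rank, payload[0]);
        fflush(stdout);
    }
}

Every worker now posts exactly one receive per iteration and rank 0 posts one matching send to each worker, so no rank is left blocked in MPI_Recv and all four ranks reach the second roll-call printf.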
I am starting to get into parallel computing, and have started with MPI using C. I understand how to do this with point-to-point communication (send/recv); my confusion starts when I try to use collective communication with bcast and reduce.
My code goes as follows:
int collective(int val, int rank, int n, int *toSum){
int *globalBuf=malloc(n*sizeof(int*));
int globalSum=0;
int localSum=0;
struct timespec before;
if(rank==0){
//only rank 0 will start timer
clock_gettime(CLOCK_MONOTONIC, &before);
}
int numInts=(val*100000)/n;
int *mySum = malloc((numInts)*sizeof(int *));
int j;
for(j=rank*numInts;j<numInts*rank+numInts;j++){
localSum=localSum+(toSum[j]);
}
MPI_Bcast(&localSum, 1, MPI_INT, rank, MPI_COMM_WORLD);
MPI_Reduce(&localSum, &globalSum, n, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if(rank==0){
printf("Communicative sum = %d\n", globalSum);
//only rank 0 will end the timer
//an display
struct timespec after;
clock_gettime(CLOCK_MONOTONIC, &after);
printf("Time to complete = %f\n",(after.tv_nsec-before.tv_nsec));
}
}
Where the parameters being passed in can be described as:
val = the number of total ints that need to be summed - divided by 100000
rank= the rank of this process
n = the total number of processes
toSum = the ints that are going to be added together
Where I begin to run into errors is when I try to broadcast this process's localSum to be handled by rank 0.
I will explain what I've put into the function call so you can possibly understand where my confusion comes from.
For MPI_Bcast:
&localSum - the address of this processes sum
1 - there is one value that I want to broadcast, the int held by localSum
MPI_INT - meaning implied
rank - the rank of this process that is broadcasting
MPI_COMM_WORLD - meaning implied
For MPI_Reduce
&localSum - the address of the variable that it will "reducing"
&globalSum - the address of the variable that I want to hold the reduced values of localSum
n - the number of "localSum"s that this process will reduce (n is number of processes)
MPI_INT - meaning implied
MPI_SUM - meaning implied
0 - I want rank 0 to be the process that will reduce so it can print
MPI_COMM_WORLD - meaning implied
When I look through the code, I feel it makes sense logically, and it compiles okay; however, when I run the program with m processes, I get the following error message:
Assertion failed in file src/mpi/coll/helper_fns.c at line 84: FALSE
memcpy argument memory ranges overlap, dst_=0x7fffffffd2ac src_=0x7fffffffd2a8 len_=16
internal ABORT - process 0
Can anyone help me find a solution? Apologies to anyone who sees this as second nature; this is only my third parallel program, and my first time using bcast/reduce!
I see two issues in the collective operation calls (MPI_Bcast, MPI_Reduce) in your code. First, in MPI_Reduce you are reducing an integer localSum from every process into an integer globalSum; that is a single integer. But your MPI_Reduce call tries to reduce n values per process, when you really only need to reduce 1 value from each of the n processes. That is what causes this error.
The reduce should ideally be like this, if you want to reduce a single value:
MPI_Reduce(&localSum, &globalSum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
For the broadcast,
MPI_Bcast(&localSum, 1, MPI_INT, rank, MPI_COMM_WORLD);
every rank is broadcasting in your call above, because you pass rank as the root argument. In a broadcast there should be a single root process that broadcasts the value to all the other processes. So the call should look like this:
int rootProcess = 0;
MPI_Bcast(&localSum, 1, MPI_INT, rootProcess, MPI_COMM_WORLD);
Here, rootProcess sends the value contained in its localSum to all the processes. Meanwhile, all the processes calling this broadcast receive the value from rootProcess and store it in their own local variable localSum.
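Putting both corrections together, a minimal sketch of the relevant part of collective() could look like this (keeping the variable names from the question; the MPI_Bcast is only needed if every rank, not just rank 0, must know the total):

int rootProcess = 0;
// each rank contributes exactly one int; rank 0 receives the sum of them all
MPI_Reduce(&localSum, &globalSum, 1, MPI_INT, MPI_SUM, rootProcess, MPI_COMM_WORLD);
// optional: give every rank a copy of the total by broadcasting it from the root
MPI_Bcast(&globalSum, 1, MPI_INT, rootProcess, MPI_COMM_WORLD);
if (rank == 0) {
    printf("Communicative sum = %d\n", globalSum);
}

Alternatively, MPI_Allreduce(&localSum, &globalSum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD) combines the reduce and the broadcast into a single call.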
I am using a recursive function to find the determinant of a 5x5 matrix. Although this sounds like a trivial problem, it would be worth parallelising (for example with OpenMP) if the dimensions were huge. I am trying to use MPI to solve it, however I can't understand how to deal with the recursion that accumulates the results.
So my question is: how do I use MPI for this?
PS: The matrix is a Hilbert matrix, so the answer would be 0.
I have written the code below, but I think it simply does the same work n times rather than dividing the problem and then accumulating the result.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define ROWS 5
double getDeterminant(double matrix[5][5], int iDim);
double matrix[ROWS][ROWS] = {
{1.0, -0.5, -0.33, -0.25,-0.2},
{-0.5, 0.33, -0.25, -0.2,-0.167},
{-0.33, -0.25, 0.2, -0.167,-0.1428},
{-0.25,-0.2, -0.167,0.1428,-0.125},
{-0.2, -0.167,-0.1428,-0.125,0.111},
};
int rank, size, tag = 0;
int main(int argc, char** argv)
{
//Status of messages for each individual rows
MPI_Status status[ROWS];
//Message ID or Rank
MPI_Request req[ROWS];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
double result;
result = getDeterminant(matrix, ROWS);
printf("The determinant is %lf\n", result);
//Set barrier to wait for all processess
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}
double getDeterminant(double matrix[ROWS][ROWS], int iDim)
{
int iCols, iMinorRow, iMinorCol, iTemp, iSign;
double c[5];
double tempMat[5][5];
double dDet;
dDet = 0;
if (iDim == 2)
{
dDet = (matrix[0][0] * matrix[1][1]) - (matrix[0][1] * matrix[1][0]);
return dDet;
}
else
{
for (iCols = 0; iCols < iDim; iCols++)
{
int temp_row = 0, temp_col = 0;
for (iMinorRow = 0; iMinorRow < iDim; iMinorRow++)
{
for (iMinorCol = 0; iMinorCol < iDim; iMinorCol++)
{
if (iMinorRow != 0 && iMinorCol != iCols)
{
tempMat[temp_row][temp_col] = matrix[iMinorRow][iMinorCol];
temp_col++;
if (temp_col >= iDim - 1)
{
temp_row++;
temp_col = 0;
}
}
}
}
//Handling the alternating signs while calculating the determinant
for (iTemp = 0, iSign = 1; iTemp < iCols; iTemp++)
{
iSign = (-1) * iSign;
}
//Evaluating what has been calculated if the resulting matrix is 2x2
c[iCols] = iSign * getDeterminant(tempMat, iDim - 1);
}
for (iCols = 0, dDet = 0.0; iCols < iDim; iCols++)
{
dDet = dDet + (matrix[0][iCols] * c[iCols]);
}
return dDet;
}
}
The expected result should be a very small value close to 0. I am getting that result, but without actually using MPI.
The provided program will be executed in n processes. mpirun launches the n processes and they all execute the provided code. This is the expected behaviour. Unlike OpenMP, MPI is not a shared-memory programming model but a distributed-memory programming model. It uses message passing to communicate with other processes. There are no global variables in MPI; all the data in your program is local to your process. If you need to share data between processes, you have to send it explicitly, e.g. with MPI_Send or MPI_Bcast. You can use collective operations like MPI_Bcast to send it to all processes, or point-to-point operations like MPI_Send to send it to specific processes.
For your application to behave as expected, you have to tailor it for MPI (unlike in OpenMP, where you can use pragmas). All processes have an identifier, or rank. Typically, rank 0 (let's call it your main process) should pass the data to all processes using MPI_Send (or some other method) and the remaining processes should receive it using MPI_Recv (MPI_Recv matches MPI_Send). After receiving their local data from the main process, the other processes should perform some computation on it and then send the results back to the main process, which will aggregate them. This is a very basic MPI scenario. You can also use MPI I/O, etc.
MPI does not do anything by itself for synchronization or data sharing. It just launches n instances of the application and provides the required routines. The application developer is in charge of communication (data structures, etc.) and synchronization (using MPI_Barrier, etc.) among the processes.
Following is a simple send/receive program using MPI. When you run the code below, say with n as 2, two copies of this program will be launched. In the program, each process gets its id using MPI_Comm_rank(). We can use this id for further computations and for controlling the flow of the code. In the code below, the process with rank 0 sends the variable number using MPI_Send, and the process with rank 1 receives this value using MPI_Recv. The if and else if are used to differentiate between the processes and change the control flow so that one sends and the other receives the data. This is a very basic MPI program that shares data between processes.
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
MPI_Init(&argc, &argv);
// Find out rank, size
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int number;
if (world_rank == 0) {
number = -1;
MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process 1 received number %d from process 0\n", number);
}
MPI_Finalize();
return 0;
}
Here is a tutorial on MPI.
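Applied to your determinant code, one possible decomposition is sketched below. It is only a sketch, not a drop-in replacement for main(): it reuses matrix, getDeterminant(), rank and size from your program (so it belongs after the MPI_Comm_rank/MPI_Comm_size calls), gives each rank a subset of the first-row columns of the cofactor expansion, and combines the partial sums on rank 0 with MPI_Reduce.

double localDet = 0.0;
for (int col = rank; col < ROWS; col += size) {      // each rank takes every size-th column
    double minor[ROWS][ROWS];
    int mr = 0;
    for (int r = 1; r < ROWS; r++) {                  // build the minor: drop row 0 and column col
        int mc = 0;
        for (int c = 0; c < ROWS; c++) {
            if (c == col) continue;
            minor[mr][mc++] = matrix[r][c];
        }
        mr++;
    }
    double sign = (col % 2 == 0) ? 1.0 : -1.0;        // alternating cofactor signs along row 0
    localDet += sign * matrix[0][col] * getDeterminant(minor, ROWS - 1);
}
double det = 0.0;
MPI_Reduce(&localDet, &det, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf("The determinant is %lf\n", det);

With 5 columns and 4 processes, ranks 0-3 would handle columns {0, 4}, {1}, {2} and {3} respectively, and only rank 0 prints the combined result.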
I don't know how to fix the problem with this program so far. The purpose of this program is to add up all the numbers in an array, but I can barely manage to send the arrays before errors start to appear. It has to do with the for loop in the my_rank != 0 branch of the if statement.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]){
int my_rank, p, source, dest, tag, total, n = 0;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
//15 processors(1-15) not including processor 0
if(my_rank != 0){
MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
int arr[n];
MPI_Recv( arr, n, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
//printf("%i ", my_rank);
int i;
for(i = ((my_rank-1)*(n/15)); i < ((my_rank-1)+(n/15)); i++ ){
//printf("%i ", arr[0]);
}
}
else{
printf("Please enter an integer:\n");
scanf("%i", &n);
int i;
int arr[n];
for(i = 0; i < n; i++){
arr[i] = i + 1;
}
for(dest = 0; dest < p; dest++){
MPI_Send( &n, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
MPI_Send( arr, n, MPI_INT, dest, tag, MPI_COMM_WORLD);
}
}
MPI_Finalize();
}
When I take that for loop out it compiles and runs, but when I put it back in it just stops working. Here is the error it is giving me:
[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
[compute-0-24.local:1072] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Please enter an integer:
--------------------------------------------------------------------------
mpirun has exited due to process rank 8 with PID 1072 on
node compute-0-24 exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-0-16.local][[31957,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.4.237 failed: Connection refused (111)
[cs-cluster:11677] 14 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[cs-cluster:11677] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
There are two problems in the code you posted:
The send loop starts from dest = 0, which means that the process of rank zero will send to itself. However, since there is no receiving part for process zero, this won't work. Just make the loop start from dest = 1 and that should solve it.
The tag you use isn't initialised, so its value can be anything (which is OK in itself), but it can be a different "anything" in each process, which will lead to communications that never match each other. Just initialise tag = 0, for example, and that should fix it.
With this, your code snippet should work.
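In other words, a sketch of the corrected sender side, keeping your variable names (the receiving ranks must of course use the same, initialised tag and source rank 0):

tag = 0;                           // give the tag a defined value so the sends and receives match
for(dest = 1; dest < p; dest++){   // start at 1: rank 0 has no matching receives for itself
    MPI_Send( &n, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
    MPI_Send( arr, n, MPI_INT, dest, tag, MPI_COMM_WORLD);
}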
Learn to read the informative error messages that Open MPI gives you and to apply some general debugging strategies.
[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
The library is telling you that the receive operation was called with an invalid rank value. Armed with that knowledge, you take a look at your code:
int my_rank, p, source, dest, tag, total, n = 0;
...
//15 processors(1-15) not including processor 0
if(my_rank != 0){
MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
...
The rank here is source. source is an automatic variable declared some lines before but never initialised, therefore its initial value is completely random. You fix it by giving source an initial value of 0, or by simply replacing it with 0, since you've already hard-coded the rank of the sender by singling out its code in the else branch of the if statement.
The presence of the above error eventually hints that you should examine the other variables too. Thus you notice that tag is also used uninitialised, and you either initialise it to e.g. 0 or replace it altogether.
Now your program is almost correct. You notice that it seems to work fine for n up to about 33000 (the default eager limit of the self transport divided by sizeof(int)), but then it hangs for larger values. You either fire up a debugger or simply add a printf statement before and after each send and receive operation, and discover that already the first call to MPI_Send with dest equal to 0 never returns. You then take a closer look at your code and discover this:
for(dest = 0; dest < p; dest++){
dest starts from 0, but this is wrong since rank 0 is only sending data and not receiving. You fix it by setting the initial value to 1.
Your program should now work as intended (or at least for values of n that do not lead to stack overflow in int arr[n];). Congratulations! Now go and learn about MPI_Probe and MPI_Get_count, which will help you do the same without explicitly sending the length of the array first. Then learn about MPI_Scatter and MPI_Reduce, which will enable you to implement the algorithm even more elegantly.
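As a taste of that, here is a sketch of the receiving side using MPI_Probe and MPI_Get_count, so the length no longer has to be sent in a separate message. It assumes rank 0 sends only the data array with tag 0, and it is not a drop-in patch for the code above:

MPI_Status status;
MPI_Probe(0, 0, MPI_COMM_WORLD, &status);      // block until a message from rank 0 with tag 0 is pending
int n;
MPI_Get_count(&status, MPI_INT, &n);           // ask how many MPI_INTs the pending message contains
int *arr = malloc(n * sizeof(int));            // heap allocation (needs #include <stdlib.h>), also avoids the VLA stack-overflow issue
MPI_Recv(arr, n, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);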
I'm writing a simple program in C with the MPI library.
The intent of this program is the following:
I have a group of processes that perform an iterative loop, and at the end of this loop all processes in the communicator must call two collective functions (MPI_Allreduce and MPI_Bcast). The first one determines the id of the process that generated the minimum value of the num.val variable, and the second one broadcasts from the source rank num_min.idx_v to all processes in the communicator MPI_COMM_WORLD.
The problem is that I don't know whether the i-th process will have been finalized before calling the collective functions. Each process has a probability of 1/10 of terminating at every iteration. This simulates the behaviour of the real program that I'm implementing. And when the first process terminates, the others deadlock.
This is the code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
typedef struct double_int{
double val;
int idx_v;
}double_int;
int main(int argc, char **argv)
{
int n = 10;
int max_it = 4000;
int proc_id, n_proc;double *x = (double *)malloc(n*sizeof(double));
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &n_proc);
MPI_Comm_rank(MPI_COMM_WORLD, &proc_id);
srand(proc_id);
double_int num_min;
double_int num;
int k;
for(k = 0; k < max_it; k++){
num.idx_v = proc_id;
num.val = rand()/(double)RAND_MAX;
if((rand() % 10) == 0){
printf("iter %d: proc %d terminato\n", k, proc_id);
MPI_Finalize();
exit(EXIT_SUCCESS);
}
MPI_Allreduce(&num, &num_min, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD);
MPI_Bcast(x, n, MPI_DOUBLE, num_min.idx_v, MPI_COMM_WORLD);
}
MPI_Finalize();
exit(EXIT_SUCCESS);
}
Perhaps I should create a new group and a new communicator before calling the MPI_Finalize function in the if statement? How should I solve this?
If you have control over a process before it terminates, you should send a non-blocking flag to a rank that cannot terminate early (let's call it the root rank). Then, instead of having a blocking allreduce, you could have sends from all ranks to the root rank with their values.
The root rank could post non-blocking receives for a possible flag and for the value; every rank would have to send one or the other. Once all ranks are accounted for, you can do the reduction on the root rank, remove the exited ranks from the communication, and broadcast the result.
If your ranks exit without notice, I am not sure what options you have.
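A simplified sketch of that idea is below, reusing num, proc_id and n_proc from the question. The tag names TAG_VALUE and TAG_EXIT are made up for the example, the root uses blocking receives for brevity (the non-blocking variant would post MPI_Irecv per rank and MPI_Waitall on them), it shows a single iteration only, and across iterations the root would additionally have to remember which ranks have already exited and stop expecting messages from them:

#define TAG_VALUE 1
#define TAG_EXIT  2

if (proc_id != 0) {
    // workers send exactly one message per iteration: either their value or an exit flag
    if ((rand() % 10) == 0) {
        double dummy = 0.0;
        MPI_Send(&dummy, 1, MPI_DOUBLE, 0, TAG_EXIT, MPI_COMM_WORLD);
        MPI_Finalize();
        exit(EXIT_SUCCESS);
    }
    MPI_Send(&num.val, 1, MPI_DOUBLE, 0, TAG_VALUE, MPI_COMM_WORLD);
} else {
    // the root never terminates early; it collects one message per worker and checks the tag
    double min_val = num.val;
    int min_rank = 0;
    for (int src = 1; src < n_proc; src++) {
        double v;
        MPI_Status st;
        MPI_Recv(&v, 1, MPI_DOUBLE, src, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_VALUE && v < min_val) {
            min_val = v;
            min_rank = src;          // track the minimum and which rank owns it
        }
        // TAG_EXIT: note that src has left and skip it in later iterations
    }
}

The subsequent broadcast of x would also have to avoid the exited ranks, for example by having the root receive x from min_rank and forward it with point-to-point sends to the ranks that are still alive.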
I'm trying to do a simple test of MPI's RMA operations using MPI_Win_lock and MPI_Win_unlock. The program just lets process 0 update the integer value in process 1, which then displays it.
The below program runs correctly (at least the result seems correct to me):
#include "mpi.h"
#include "stdio.h"
#define root 0
int main(int argc, char *argv[])
{
int myrank, nprocs;
int send, recv, err;
MPI_Win nwin;
int *st;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Alloc_mem(1*sizeof(int), MPI_INFO_NULL, &st);
st[0] = 0;
if (myrank != root) {
MPI_Win_create(st, 1*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
}
else {
MPI_Win_create(NULL, 0, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
}
if (myrank == root) {
st[0] = 1;
MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, nwin);
MPI_Put(st, 1, MPI_INT, 1, 0, 1, MPI_INT, nwin);
MPI_Win_unlock(1, nwin);
MPI_Win_free(&nwin);
}
else { // rank 1
MPI_Win_free(&nwin);
printf("Rank %d, st = %d\n", myrank, st[0]);
}
MPI_Free_mem(st);
MPI_Finalize();
return 0;
}
The output I got is Rank 1, st = 1. But curiously, if I switch the lines in the else block for rank 1 to
else { // rank 1
printf("Rank %d, st = %d\n", myrank, st[0]);
MPI_Win_free(&nwin);
}
The output is Rank 1, st = 0.
I cannot figure out the reason behind this. The reason I care about reading the data before MPI_Win_free is that, in the original program, I need to put all of this inside a while loop and let rank 0 determine when to stop the loop: when the condition is satisfied, rank 0 updates the flag (st) in rank 1. I want to put MPI_Win_free outside the while loop so that the window is only freed after the loop. Now it seems that I cannot do this and instead need to create and free the window in every iteration of the loop?
I'll be honest, MPI RMA is not my speciality, but I'll give this a shot:
The problem is that you're running into a race condition. When you do the MPI_PUT operation, it sends the data from rank 0 to rank 1 to be put into the buffer at some point in the future. You don't have any control over that from rank 0's perspective.
On rank 1's side, you're not doing anything to complete the operation. I know that RMA (or one-sided) operations sound like they shouldn't require any intervention on the target side, but they do require a bit. When you use one-sided operations, you have to have something on the receiving side that also synchronizes the data. In this case, you're trying to use MPI put/get operations in combination with non-MPI load/store operations. This is erroneous and results in the race condition you're seeing. When you switch the MPI_WIN_FREE to come first, you complete all of the outstanding operations, so your data is correct.
You can find out lots more about passive target synchronization (which is what you're doing here) with this question: MPI with C: Passive RMA synchronization.
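If you do not strictly need passive-target synchronization, one way to make the ordering explicit is to use active-target synchronization with MPI_Win_fence instead of lock/unlock: the second fence completes the put and makes it visible at the target before the local load. This is only a sketch built on the window setup from the question:

MPI_Win_fence(0, nwin);                              // open an access/exposure epoch on every rank
if (myrank == root) {
    st[0] = 1;
    MPI_Put(st, 1, MPI_INT, 1, 0, 1, MPI_INT, nwin); // put st[0] into rank 1's window
}
MPI_Win_fence(0, nwin);                              // close the epoch: all puts are now visible at their targets
if (myrank != root)
    printf("Rank %d, st = %d\n", myrank, st[0]);     // safe to read the local window memory here
MPI_Win_free(&nwin);

If you want to keep the passive-target lock/unlock style inside a loop, the target still needs to synchronize before reading the window memory with plain loads (for example with its own lock/unlock epoch on the window), and it needs some way of knowing that the origin's epoch has completed.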