MPI_Send and MPI_Recv functions getting stuck - C

I have a PageRank code where I have to compute the page rank in parallel. My code hangs where I wrote the following MPI_Send and MPI_Recv calls. What could be the problem?
int **sendto_list = (int **)malloc(comm_size*sizeof(int*));
for(i=0; i < comm_size; i++) {
    sendto_list[i] = (int *)malloc(18*sizeof(int));
    sendto_list[i][0] = 16;
    sendto_list[i][1] = 0;
}

int temp_data = 1;
for(i=0; i < comm_size; i++) {
    if(request_list[i][1] > 0) {
        for(k=0; k < request_list[i][1]; ) {
            for(j=0; j < 200; j++) {
                if( k >= request_list[i][1] )
                    break;
                sendrecv_buffer_int[j] = request_list[i][k+2];
                k++;
            }
            // Request appropriate process for pagerank.
            if(i != my_rank)
                MPI_Send(&temp_data, 1, MPI_INT, i, TAG_PR_REQ, MPI_COMM_WORLD);
        }
    }
    if( i != my_rank )
        MPI_Send(&temp_data, 1, MPI_INT, i, TAG_PR_DONE, MPI_COMM_WORLD);
}
int expected_requests = 0, done = 0, temp, s;
s = 0;
while( (done == 0) && (comm_size > 1) ) {
    if(expected_requests == (comm_size - 1))
        break;
    int count;
    // Receive pagerank requests or messages with TAG_PR_DONE (can be from any process).
    MPI_Recv(&temp, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    MPI_Get_count(&status, MPI_INT, &count);
    switch(status.MPI_TAG) {
        case TAG_PR_REQ: {
            for(i = 0; i < count; i++)
                insert_into_adj_list(&sendto_list[status.MPI_SOURCE], sendrecv_buffer_int[i], num_nodes);
            break;
        }
        case TAG_PR_DONE: {
            expected_requests++;
            break;
        }
        default:
            break;
    }
}

From a cursory glance over your code, it looks as if your issue is that your MPI_Send() calls are blocking and nothing is posting the matching receives to free them up.
If (request_list[i][1] > 0) and (i != my_rank) both evaluate to true, you try to perform 2 MPI_Send() operations to process rank i, but you only have 1 matching MPI_Recv() operation in each process, i.e. process rank i.
You may want to try changing
if(request_list[i][1] > 0) {
    ...
}
if( i != my_rank )
    MPI_Send(&temp_data, 1, MPI_INT, i, TAG_PR_DONE, MPI_COMM_WORLD);
to
if(request_list[i][1] > 0) {
    ...
} else if( i != my_rank ) {
    MPI_Send(&temp_data, 1, MPI_INT, i, TAG_PR_DONE, MPI_COMM_WORLD);
}
Note the addition of else, turning the second if into an else if. This ensures only 1 MPI_Send() operation per destination process; it does not look like both MPI_Send() operations should be executed when the above conditions are true.
Alternatively, if you need to, you could look into MPI_Isend() and MPI_Irecv(), although I don't think they will solve your problem entirely in this case. I still think you need the else if clause.
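For illustration only, the general non-blocking pattern (not a drop-in replacement for your loops, and assuming temp_data stays valid until the wait) looks like:
MPI_Request req;
MPI_Isend(&temp_data, 1, MPI_INT, i, TAG_PR_DONE, MPI_COMM_WORLD, &req);
/* ... post the matching receives or do other useful work here ... */
MPI_Wait(&req, MPI_STATUS_IGNORE); /* complete the request before reusing temp_data */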
I'd also like to point out that in C there is no need to cast the return of malloc(). This topic has been covered extensively on Stack Overflow, so I won't dwell on it too long.
You should also check that the result of malloc() is a valid pointer; it returns NULL if an error occurred.
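For example, a minimal sketch of a checked allocation (assuming <stdio.h> and <stdlib.h> are included) could look like:
int **sendto_list = malloc(comm_size * sizeof(int *)); /* no cast needed in C */
if (sendto_list == NULL) {
    fprintf(stderr, "malloc failed\n");
    MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
}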

Related

Implementing MPI_Allreduce in C - why does my code hang indefinitely?

I am trying to implement my own version of MPI_Allreduce in C. The logic is that odd ranks send their data to the even rank at rank-1, i.e. rank 1 sends to rank 0. Rank 0 then receives the data from rank 1 and operates on it depending on the MPI_Op operator passed as an argument. I am unsure if my way of operating on the data is correct; for example, if MPI_SUM is passed then for each count passed my code does recvbuf[count] += sendbuf[count]. Once the even rank has received data from the odd rank+1, it then does rank /= 2 and the contents of the while loop repeat, with rank 0 being unchanged, rank 2 becoming rank 1 and now containing the original ranks 2 and 3 data, and so on. num is also halved each iteration of the while loop until it becomes 1, which is when only rank 0 should remain.
My code thus far is below:
int tree_reduction(const int rank_in, const int np, const int *sendbuf, int *recvbuf,
                   int count, MPI_Op op, MPI_Comm comm){
    // Create variables for rank and size
    int rank = rank_in;
    int num = np;
    int tag = 0;
    int depth = 1; // Depth of the tree
    // While size is greater than 1 there is 2 or more ranks to operate on
    while(num > 1){
        if(rank < num){
            if( (rank % 2) != 0 ){ // If rank is odd
                MPI_Ssend(sendbuf, count, MPI_INT, (rank-1)*depth, tag, comm);
                rank *= num; // any ranks above 0 will be filtered out
                break;
            }
            else{ // If rank is even
                MPI_Recv(recvbuf, count, MPI_INT, (rank+1)*depth, tag, comm,
                         MPI_STATUS_IGNORE);
                /* START OF OPERATORS */
                if(op == MPI_SUM){
                    for(int c=0; c<count; c++){
                        recvbuf[c] += sendbuf[c];
                    }
                }
                if(op == MPI_PROD){
                    for(int c=0; c<count; c++){
                        recvbuf[count] *= sendbuf[count];
                    }
                }
                if(op == MPI_MIN){
                    for(int c=0; c<count; c++){
                        if(sendbuf[count] < recvbuf[count]){
                            recvbuf[count] = sendbuf[count];
                        }
                    }
                }
                if(op == MPI_MAX){
                    for(int c=0; c<count; c++){
                        if(sendbuf[count] > recvbuf[count]){
                            recvbuf[count] = sendbuf[count];
                        }
                    }
                }
                /* END OF OPERATORS */
            }
            depth *= 2;
        }
        num = num/2;
    }
    // NEED TO BROADCAST BACK TO ALL RANKS
    MPI_Bcast(recvbuf, count, MPI_INT, 0, comm);
    return 0;
}
When I run this with more than 2 processes, it hangs indefinitely without printing anything to the terminal. Why is this?

Gather a split 2D array with MPI in C

I need to adapt this part of a very long code to MPI in C.
for (i = 0; i < total; i++) {
    sum = A[next][0][0]*B[i][0] + A[next][0][1]*B[i][1] + A[next][0][2]*B[i][2];
    next++;
    while (next < last) {
        col = column[next];
        sum += A[next][0][0]*B[col][0] + A[next][0][1]*B[col][1] + A[next][0][2]*B[col][2];
        final[col][0] += A[next][0][0]*B[i][0] + A[next][1][0]*B[i][1] + A[next][2][0]*B[i][2];
        next++;
    }
    final[i][0] += sum;
}
And I was thinking of code like this:
for (i = 0; i < num_threads; i++) {
    for (j = 0; j < total; j++) {
        check_thread[i][j] = false;
    }
}
part = total / num_threads;
for (i = thread_id * part; i < ((thread_id + 1) * part); i++) {
    sum = A[next][0][0]*B[i][0] + A[next][0][1]*B[i][1] + A[next][0][2]*B[i][2];
    next++;
    while (next < last) {
        col = column[next];
        sum += A[next][0][0]*B[col][0] + A[next][0][1]*B[col][1] + A[next][0][2]*B[col][2];
        if (!check_thread[thread_id][col]) {
            check_thread[thread_id][col] = true;
            temp[thread_id][col] = 0.0;
        }
        temp[thread_id][col] += A[next][0][0]*B[i][0] + A[next][1][0]*B[i][1] + A[next][2][0]*B[i][2];
        next++;
    }
    if (!check_thread[thread_id][i]) {
        check_thread[thread_id][i] = true;
        temp[thread_id][i] = 0.0;
    }
    temp[thread_id][i] += sum;
}

*

for (i = 0; i < total; i++) {
    for (j = 0; j < num_threads; j++) {
        if (check_thread[j][i]) {
            final[i][0] += temp[j][i];
        }
    }
}
Then I need to gather all the temporary parts into one; I was thinking of MPI_Allgather and something like this just before the last two for loops (where the * is):
MPI_Allgather(temp, (part*sizeof(double)), MPI_DOUBLE, temp, sizeof(**temp), MPI_DOUBLE, MPI_COMM_WORLD);
But I get an execution error. Is it possible to send and receive in the same variable? If not, what could be another solution in this case?
You are calling MPI_Allgather with the wrong parameters:
MPI_Allgather(temp, (part*sizeof(double)), MPI_DOUBLE, temp, sizeof(**temp), MPI_DOUBLE, MPI_COMM_WORLD);
Instead you should have (source):
MPI_Allgather
Gathers data from all tasks and distribute the combined data to all
tasks
Input Parameters
sendbuf starting address of send buffer (choice)
sendcount number of elements in send buffer (integer)
sendtype data type of send buffer elements (handle)
recvcount number of elements received from any process (integer)
recvtype data type of receive buffer elements (handle)
comm communicator (handle)
Your sendcount and recvcount arguments are both wrong; instead of (part*sizeof(double)) and sizeof(**temp) you should pass the number of elements from the matrix temp that will be gathered by all processes involved.
The matrix can be gathered in a single call if it is contiguously allocated in memory; if it was created as an array of pointers, then you will have to call MPI_Allgather for each row of the matrix, or use MPI_Allgatherv instead.
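For example, a rough sketch (my_part, gathered and elems_per_rank are placeholder names: each rank holds its contribution contiguously in my_part, and gathered has room for size * elems_per_rank doubles) would be:
MPI_Allgather(my_part, elems_per_rank, MPI_DOUBLE,
              gathered, elems_per_rank, MPI_DOUBLE, MPI_COMM_WORLD);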
Is it possible to send and receive in the same variable?
Yes, by using the In-place Option
When the communicator is an intracommunicator, you can perform an
all-gather operation in-place (the output buffer is used as the input
buffer). Use the variable MPI_IN_PLACE as the value of sendbuf. In
this case, sendcount and sendtype are ignored. The input data of each
process is assumed to be in the area where that process would receive
its own contribution to the receive buffer. Specifically, the outcome
of a call to MPI_Allgather that used the in-place option is identical
to the case in which all processes executed n calls to
MPI_GATHER ( MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, recvbuf,
recvcount, recvtype, root, comm )
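So a sketch of the in-place variant (same placeholder names as above, with each rank's own contribution already stored at its offset inside gathered) would be:
MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
              gathered, elems_per_rank, MPI_DOUBLE, MPI_COMM_WORLD);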

"Primary job terminated normally but 1 process returned a non-zero exit code" only in some cases

Good afternoon, I've developed a 2D FFT in MPI for scientific purposes.
Everything used to work until I implemented MPI_Scatterv.
Since implementing it, something odd started happening. In particular, if I stay below 64 modes I don't get problems, but when I push above that I get the message:
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
>--------------------------------------------------------------------------
>mpiexec noticed that process rank 0 with PID 0 on node MacBook-Pro-di-Mirco
>exited on signal 11 (Segmentation fault: 11).
I can't figure out where the mistake is, but I'm pretty sure it is in MPI_Scatterv.
Could anyone help me please?
/********************************** Setup factors for scattering **********************************/
// Alloc the arrays
int* displs = (int *)malloc(size*sizeof(int));
int* scounts = (int *)malloc(size*sizeof(int));
int* receive = (int *)malloc(size*sizeof(int));

// Setup matrix
int modes_per_proc[size];
for (int i = 0; i < size; i++){
    modes_per_proc[i] = 0;
}

// Set modes per processor
cores_handler( nx*nz, size, modes_per_proc);

// Scattering parameters
for (int i=0; i<size; ++i) {
    scounts[i] = modes_per_proc[i]*ny*2;
    receive[i] = scounts[i];
    displs[i] = displs[i-1] + modes_per_proc[i-1]*ny*2; // *2 to handle complex numbers
    if (i == 0) displs[0] = 0;
}

/************************************************ Data scattering ***********************************************/
MPI_Scatterv(U, scounts, displs, MPI_DOUBLE, u, receive[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
The cores_handler function:
void cores_handler( int modes, int size, int modes_per_proc[size]) {
    int rank = 0;
    int check = 0;
    for (int i = 0; i < modes; i++) {
        modes_per_proc[rank] = modes_per_proc[rank]+1;
        rank = rank+1;
        if (rank == size ) rank = 0;
    }
    for (int i = 0; i < size; i++){
        //printf("%d modes on rank %d\n", modes_per_proc[i], i);
        check = check+modes_per_proc[i];
    }
    if ( (int)(check - modes) != 0 ) {
        printf("[ERROR] check - modes = %d!!\nUnable to scatter modes properly\nAbort... \n", check - modes);
    }
}

MPI_Recv() freezing program, not receiving value from MPI_Send() in C

I'm trying to write an implementation of a hyper quicksort in MPI, but I'm having an issue where a process gets stuck on MPI_Recv().
While testing with 2 processes, it seems that inside the else of the if (rank % comm_sz == 0), process 1 is never receiving the pivot from process 0. Process 0 successfully sends its pivot and recurses through the method correctly. I put in some print debug statements and received the output:
(arr, 0, 2, 0, 9)
Rank 0 sending pivot 7 to 1
(arr, 1, 2, 0, 9)
Rank 1 pre-recv from 0
After which, the post-recv message from rank 1 never prints. Rank 0 prints its post-send message and continues through its section of the array. Is there something wrong with my implementation of MPI_Send() or MPI_Recv() that may be causing this?
Here is my code for the quicksort:
(For reference, comm_sz in the parameters for the method refers to the number of processes looking at that section of the array.)
void hyper_quick(int *array, int rank, int comm_sz, int s, int e) {
    printf("(arr, %d, %d, %d, %d)\n", rank, comm_sz, s, e);
    // Keeps recursing until there is only one element
    if (s < e) {
        int pivot;
        if (comm_sz > 1) {
            // One process gets a random pivot within its range and sends that to every process looking at that range
            if (rank % comm_sz == 0) {
                pivot = rand() % (e - s) + s;
                for (int i = rank + 1; i < comm_sz; i++) {
                    int partner = rank + i;
                    printf("Rank %d sending pivot %d to %d\n", rank, pivot, partner);
                    MPI_Send(&pivot, 1, MPI_INT, partner, rank, MPI_COMM_WORLD);
                    printf("Rank %d successfully sent %d to %d\n", rank, pivot, partner);
                }
            }
            else {
                int partner = rank - (rank % comm_sz);
                printf("Rank %d pre-recv from %d\n", rank, partner);
                MPI_Recv(&pivot, 1, MPI_INT, partner, rank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("Rank %d received pivot %d from %d\n", rank, pivot, partner);
            }
        }
        else {
            pivot = rand() % (e - s) + s;
        }
        int tmp = array[pivot];
        array[pivot] = array[e];
        array[e] = tmp;
        // Here is where the actual quick sort happens
        int i = s;
        int j = e - 1;
        while (i < j) {
            while (array[e] >= array[i] && i < j) {
                i++;
            }
            while (array[e] < array[j] && i < j) {
                j--;
            }
            if (i < j) {
                tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
            }
        }
        if (array[e] < array[i]) {
            tmp = array[i];
            array[i] = array[e];
            array[e] = tmp;
            pivot = i;
        }
        else {
            pivot = e;
        }
        // Split remaining elements between remaining processes
        if (comm_sz > 1) {
            // Elements greater than pivot
            if (rank % comm_sz >= comm_sz/2) {
                hyper_quick(array, rank, comm_sz/2, pivot + 1, e);
            }
            // Elements lesser than pivot
            else {
                hyper_quick(array, rank, comm_sz/2, s, pivot - 1);
            }
        }
        // Recurse remaining elements in current process
        else {
            hyper_quick(array, rank, 1, s, pivot - 1);
            hyper_quick(array, rank, 1, pivot + 1, e);
        }
    }
}
Rank 0 sending pivot 7 to 1
MPI_Send(&pivot, 1, MPI_INT, partner, rank, MPI_COMM_WORLD);
^^^^
So the sender tag is zero.
Rank 1 pre-recv from 0
MPI_Recv(&pivot, 1, MPI_INT, partner, rank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
^^^^
And the receiver tag is one.
If the receiver asks for only messages with a specific tag, it will not receive a message with a different tag.
Sometimes there are cases when A might have to send many different types of messages to B. Instead of B having to go through extra measures to differentiate all these messages, MPI allows senders and receivers to also specify message IDs with the message (known as tags). When process B only requests a message with a certain tag number, messages with different tags will be buffered by the network until B is ready for them. [MPI Tutorial -- Send and Receive]
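One way to make the tags match (a sketch that keeps your variable names) is to have the receiver use the sender's rank, which it already has in partner, as the tag:
MPI_Recv(&pivot, 1, MPI_INT, partner, partner, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
Alternatively, both sides could agree on a single fixed tag, or the receiver could pass MPI_ANY_TAG if it does not care which tag arrives.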

MPI joining vectors

I'm trying to do some parallel calculations and then reduce them to one vector.
I try it by dividing the for loop into parts which should be calculated separately from the vector. Later I'd like to join all those subvectors into one main vector by replacing parts of it with values obtained from the processes. Needless to say, I have no idea how to do it and my attempts were in vain.
Any help will be appreciated.
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(A, n*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(b, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x0, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
printf("My id: %d, mySize: %d, myStart: %d, myEnd: %d", rank, size, mystart, myend);

while(delta > granica)
{
    ii++;
    delta = 0;
    //if(rank > 0)
    //{
    for(i = mystart; i < myend; i++)
    {
        xNowe[i] = b[i];
        for(j = 0; j < n; j++)
        {
            if(i != j)
            {
                xNowe[i] -= A[i][j] * x0[j];
            }
        }
        xNowe[i] = xNowe[i] / A[i][i];
        printf("Result in iteration %d: %d", i, xNowe[i]);
    }

    MPI_Reduce(xNowe, xNowe, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
I'm going to ignore your calculations and assume they're all doing whatever it is you want them to do and at the end, you have an array called xNowe that has the results for your rank somewhere within it (in some subarray).
You have two options.
The first way uses an MPI_REDUCE in the way you're currently doing it.
What needs to happen is that you should probably set all of the values that do not pertain to your rank to 0, then you can just do a big MPI_REDUCE (as you're already doing), where each process contributes its xNowe array which will look something like this (depending on the input/rank/etc.):
rank: 0 1 2 3 4 5 6 7
value: 0 0 1 2 0 0 0 0
When you do the reduction (with MPI_SUM as the op), you'll get an array (on rank 0) that has each value filled in with the value contributed by each rank.
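A sketch of that first option (assuming xNowe holds n doubles and result is a separate n-element array on rank 0; result is just an illustrative name here):
// Zero the entries this rank did not compute, then sum everything onto rank 0.
for (i = 0; i < n; i++)
    if (i < mystart || i >= myend)
        xNowe[i] = 0.0;
MPI_Reduce(xNowe, result, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);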
The second way uses an MPI_GATHER. Some might consider this to be the "more proper" way.
For this version, instead of using MPI_REDUCE to get the result, you only send the data that was calculated on your rank. You wouldn't have one large array. So your code would look something like this:
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(A, n*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(b, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x0, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
printf("My id: %d, mySize: %d, myStart: %d, myEnd: %d", rank, size, mystart, myend);

while(delta > granica)
{
    ii++;
    delta = 0;
    for(i = mystart; i < myend; i++)
    {
        xNowe[i-mystart] = b[i];
        for(j = 0; j < n; j++)
        {
            if(i != j)
            {
                xNowe[i-mystart] -= A[i][j] * x0[j];
            }
        }
        xNowe[i-mystart] = xNowe[i-mystart] / A[i][i];
        printf("Result in iteration %d: %d", i, xNowe[i-mystart]);
    }
}
// recvcount is the number of elements received from each rank, not the total.
MPI_Gather(xNowe, myend-mystart, MPI_DOUBLE, result, myend-mystart, MPI_DOUBLE, 0, MPI_COMM_WORLD);
You would obviously need to create a new array on rank 0 that is called result to hold the resulting values.
UPDATE:
As pointed out by Hristo in the comments below, MPI_GATHER might not work here if myend - mystart is not the same on all ranks. If that's the case, you'd need to use MPI_GATHERV which allows you to specify a different size for each rank.
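A sketch of that MPI_GATHERV version (recvcounts and displs are illustrative names, only meaningful on rank 0) could look like:
int local_n = myend - mystart;
int *recvcounts = NULL, *displs = NULL;
if (rank == 0) {
    recvcounts = malloc(size * sizeof(int));
    displs = malloc(size * sizeof(int));
}
// Rank 0 needs to know how many elements each rank will send.
MPI_Gather(&local_n, 1, MPI_INT, recvcounts, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (rank == 0) {
    displs[0] = 0;
    for (int r = 1; r < size; r++)
        displs[r] = displs[r-1] + recvcounts[r-1];
}
MPI_Gatherv(xNowe, local_n, MPI_DOUBLE,
            result, recvcounts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);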
