MPI and MemoryLeaks and MPI_Wait() with async send and recv - c

I am new to MPI programming and I am trying to create a program that would perform 2-way communication between processes in a ring.
I was getting MemoryLeaks errors at the MPI_Finalize() statement. Later I found out that I could use the -fsanitize=address -fno-omit-frame-pointer flags to help me debug where the leaks could be.
Now I get a very bizarre (at least for me) error.
Here's my code:
MPI_Request request_s1, request_s2, request_r1, request_r2;
// receiving 2 elems from the left neighbor, which i shall be needing
if (0 > MPI_Irecv(lefties, EXTENT, MPI_DOUBLE, my_left, 1, MPI_COMM_WORLD, &request_r1)) {
return 2;
}
// receiving 2 elems from my right neighbor which i will be appending at the end of my input
if (0 > MPI_Irecv(righties, EXTENT, MPI_DOUBLE, my_right, 1, MPI_COMM_WORLD, &request_r2)) {
return 2;
}
// sending the first 2 elems which will be required by the left neighbor
if (0 > MPI_Isend(my_output_buffer, EXTENT, MPI_DOUBLE, my_left, 1, MPI_COMM_WORLD, &request_s1)) {
return 2;
}
// sending the last 2 elems to my right neighbor
if (0 > MPI_Isend(&my_output_buffer[displacement - EXTENT], EXTENT, MPI_DOUBLE, my_right, 1, MPI_COMM_WORLD, &request_s2)) {
return 2;
}
MPI_Wait(&request_r2, MPI_STATUS_IGNORE);
MPI_Wait(&request_r1, MPI_STATUS_IGNORE);
The error I get is
[my_machine:18353] *** An error occurred in MPI_Wait
[my_machine:18359] *** reported by process [204079105,1]
[my_machine:18359] *** on communicator MPI_COMM_WORLD
[my_machine:18359] *** MPI_ERR_TRUNCATE: message truncated
[my_machine:18359] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[my_machine:18359] *** and potentially your MPI job)
[my_machine:18353] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[my_machine:18353] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
and I have no clue how to progress from here.

You seem to be reusing your request variables. Don't. Every request that is created has to be completed with a wait; in the snippet shown, request_s1 and request_s2 are never waited on.
It wouldn't hurt to initialize the request variables with MPI_REQUEST_NULL, in case you're waiting for a request that was not created.
The 0 > MPI_Whatever(...) idiom is strange. Compare against MPI_SUCCESS instead: MPI_SUCCESS != MPI_Whatever(...).
But even that may not work, because by default MPI routines do not return on error but abort the program (the MPI_ERRORS_ARE_FATAL line in your output).
And it may be something else entirely which I can't tell without seeing the rest of the code.

Related

Can MPI_Gather be used to receive data from threads that use MPI_Send?

I have a master process and several slave processes. I want every slave process to send one integer back to the master, so I guess I should gather them using MPI_Gather. But somehow it doesn't work, and I have started to think that MPI_Gather is incompatible with MPI_Send.
The relevant lines of code look like this:
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &process_id);
MPI_Comm_size(MPI_COMM_WORLD, &process_count);
int full_word_count = 0;
int* receiving_buffer = (int*)malloc(sizeof(int) * 100);
if (process_id == 0)
{
// Some Master code here ...
MPI_Gather(&full_word_count, 1, MPI_INT, receiving_buffer, 1, MPI_INT, 0, MPI_COMM_WORLD);
// ...
}
else
{
// Some Slave code here ...
MPI_Send(&full_word_count, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
//...
}
MPI_Finalize();
I also know that I used "1" for MPI_Gather because I tried to run with only two processes, so process 1 would send and process 0 would gather; of course, for more processes I should modify it using ranks. But my main question here is whether I can use MPI_Gather combined with MPI_Send in a situation like this and, if so, how.
MPI_Gather() is a collective operation and must hence be called by all the ranks of the communicator; it cannot be matched by MPI_Send() calls on the other ranks. All ranks must also provide matching signatures (datatype and count) and use the same root value.
Note the send buffer of the root rank is also gathered into the receive buffers, so if the send count is 1, you really should allocate your receive buffer with
int* receiving_buffer = (int*)malloc(sizeof(int) * process_count);
and since all ranks send 1 * MPI_INT, a correct receive signature is also 1 * MPI_INT.
Also note that "threads" is improper in this context. MPI tasks or MPI processes are the right terminology.
Keep in mind that the standard does not specify how a collective operation should be implemented. In the case of MPI_Gather(), a naive implementation would have all MPI tasks send their buffer to the root rank. But some more sophisticated algorithm can be used such as a tree-based gather, and in that case, not all tasks would send their buffer to the root rank.

Trying to receive a vector with MPI_Recv

I'm implementing the Chan and Dehne sorting algorithm using MPI and the CGM realistic parallel model. So far, each process receives N/p numbers from the original vector and sorts them sequentially using quicksort. Each process then creates a sample of size p from its local vector and sends it to P0. P0 should receive all the samples in a bigger vector of size p*p so that it can accommodate the data from all processors. This is where I'm stuck: it seems to be working, but for some reason, after P0 receives all the data, it exits with Signal: Segmentation fault (11). Thank you.
Here is the relevant part of the code:
// Step 2. Each process calculates it's local sample with size comm_sz
local_sample = create_local_sample(sub_vec, n_over_p, comm_sz);
// Step 3. Each process sends it's local sample to P0
if (my_rank == 0) {
global_sample_receiver = (int*)malloc(pow(comm_sz,2)*sizeof(int));
global_sample_receiver = local_sample;
for (i = 1; i < comm_sz; i++) {
MPI_Recv(global_sample_receiver+(i*comm_sz), comm_sz, MPI_INT,
i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
} else {
MPI_Send(local_sample, comm_sz, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
printf("P%d got here\n", my_rank);
MPI_Finalize();
What is funny is that every process reaches the command printf("P%d got here\n", my_rank); and therefore prints to the terminal. Also, global_sample_receiver does contain the data it is supposed to contain at the end, but the program still finishes with a segmentation fault.
Here is the output:
P2 got here
P0 got here
P3 got here
P1 got here
[Krabbe-Ubuntu:05969] *** Process received signal ***
[Krabbe-Ubuntu:05969] Signal: Segmentation fault (11)
[Krabbe-Ubuntu:05969] Signal code: Address not mapped (1)
[Krabbe-Ubuntu:05969] Failing at address: 0x18000003e7
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 5969 on node Krabbe-Ubuntu
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Edit: I found the problem, turns out local_sample also needed a malloc.
The issue is that you overwrite global_sample_receiver (which is a pointer) with local_sample (which is another pointer) on rank zero. The block you just allocated is leaked, and the receives then write beyond the comm_sz elements of the local sample.
If you want to set the first comm_sz elements of global_sample_receiver to the first comm_sz elements of local_sample, then you have to copy the data (i.e. not the pointer) manually.
memcpy(global_sample_receiver, local_sample, comm_sz * sizeof(int));
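To make the difference between copying the pointer and copying the data concrete, here is a small MPI-free sketch. The helper name make_global_sample and the parameter p (standing in for comm_sz) are assumptions for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate room for p samples of p ints each and copy the caller's own
 * sample into the first slot. Returns NULL if the allocation fails.
 * The caller keeps ownership of local_sample and frees the result. */
int *make_global_sample(const int *local_sample, int p)
{
    int *global = malloc((size_t)p * (size_t)p * sizeof *global);
    if (global == NULL)
        return NULL;

    /* Copy the data. Writing `global = local_sample;` here instead would
     * leak the new block and leave only p ints behind the pointer. */
    memcpy(global, local_sample, (size_t)p * sizeof *global);
    return global;
}
```

After this call the remaining p*(p-1) slots are still uninitialised and are where the samples from ranks 1 to p-1 would be received.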
That being said, the natural MPI way of doing this is via MPI_Gather().
Here is what step 3 would look like:
// Step 3. Each process sends its local sample to P0
if (my_rank == 0) {
    global_sample_receiver = (int*)malloc(comm_sz * comm_sz * sizeof(int));
}
// The send buffer comes first, the receive buffer (significant only on the root) second.
MPI_Gather(local_sample, comm_sz, MPI_INT, global_sample_receiver, comm_sz, MPI_INT, 0, MPI_COMM_WORLD);

C - MPI child processes receiving incorrect string

I am making an MPI password cracker that uses a brute-force approach to crack a SHA512 hash. My code works fine with 1 password and multiple processes, or with multiple passwords and 1 process, but when I use multiple passwords and multiple processes I get the following error:
[ubuntu:2341] *** An error occurred in MPI_Recv
[ubuntu:2341] *** reported by process [191954945,1]
[ubuntu:2341] *** on communicator MPI_COMM_WORLD
[ubuntu:2341] *** MPI_ERR_TRUNCATE: message truncated
[ubuntu:2341] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ubuntu:2341] *** and potentially your MPI job)
I believe this is caused by process rank #1 receiving a string "/" instead of the password hash.
The issue is I am not sure why.
I have also noticed something strange with my code, I have the following loop in process rank 0:
while(!done){
MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &done, &status);
if(done==1) {
for(i=1;i<size;i++){
if(i!=status.MPI_SOURCE){
printf("sending done to process %d\n", i);
MPI_Isend(&done, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &request[i]);
}
}
}
}
This keeps looping, waiting for one of the child processes to alert it that it has found the password. Let's say I am running 2 worker processes (excluding the base process) and process 2 finds the password; the output will then be:
sending done to process 1
sending done to process 1
It should only be sending that once, or, if it does send it twice, surely one of those values should be 2 rather than both being 1?
The main bulk of my code is as follows:
Process 0 :
while(!feof(f)) {
fscanf(f, "%s\n", buffer);
int done = 0;
int i, sm;
// lengths of the word (we know it should be 92 though)
length = strlen(buffer);
// Send the password to every process except process 0
for (sm=1;sm<size;sm++) {
MPI_Send(&length, 1, MPI_INT, sm, 0, MPI_COMM_WORLD);
MPI_Send(buffer, length+1, MPI_CHAR, sm, 0, MPI_COMM_WORLD);
}
// While the passwords are busy cracking - Keep probing.
while(!done){
MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &done, &status);
if(done==1) {
for(i=1;i<size;i++){
if(i!=status.MPI_SOURCE){
printf("sending done to process %d\n", i);
MPI_Isend(&done, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &request[i]);
}
}
}
}
}
Which loops through the file, grabs a new password, sends the string to the child processes, at which point they receive it:
MPI_Recv(&length, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("string to be recieived has %d characters\n", length);
MPI_Recv(buffer, length+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process %d received %s\n", rank, buffer);
The child processes crack the password, then repeat with the next one (assuming there is one; currently it's an infinite loop, but I want to sort it out with 2 passwords before I fix that).
All processes receive the correct password the first time; it's when they grab the second password that only 1 process has the correct one and the rest receive a "/" character.
Alright, typical case of me getting worked up and overlooking something simple.
I'll leave this question up in case anyone else runs into the same issue.
I was forgetting to actually receive the solution after probing for it.
I was never fully clear on how probe differs from receive, but probe only reports that a matching message is pending; to take it out of the "queue" you then have to collect it with a receive. That presumably also explains the repeated output: the message from process 2 was never received, so the Iprobe for the next password found it again and rank 0 notified process 1 a second time.

I don't see what the issue is in my program in MPI

I don't know how to fix the problem with this program. Its purpose is to add up all the numbers in an array, but I can barely manage to send the arrays before errors start to appear. It has to do with the for loop in the if (my_rank != 0) section.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]){
int my_rank, p, source, dest, tag, total, n = 0;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
//15 processors(1-15) not including processor 0
if(my_rank != 0){
MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
int arr[n];
MPI_Recv( arr, n, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
//printf("%i ", my_rank);
int i;
for(i = ((my_rank-1)*(n/15)); i < ((my_rank-1)+(n/15)); i++ ){
//printf("%i ", arr[0]);
}
}
else{
printf("Please enter an integer:\n");
scanf("%i", &n);
int i;
int arr[n];
for(i = 0; i < n; i++){
arr[i] = i + 1;
}
for(dest = 0; dest < p; dest++){
MPI_Send( &n, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
MPI_Send( arr, n, MPI_INT, dest, tag, MPI_COMM_WORLD);
}
}
MPI_Finalize();
}
When I take that for loop out it compiles and runs, but when I put it back in it just stops working. Here is the error it gives me:
[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
[compute-0-24.local:1072] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Please enter an integer:
--------------------------------------------------------------------------
mpirun has exited due to process rank 8 with PID 1072 on
node compute-0-24 exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-0-16.local][[31957,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.4.237 failed: Connection refused (111)
[cs-cluster:11677] 14 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[cs-cluster:11677] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
There are two problems in the code you posted.
First, the send loop starts from dest=0, which means that the process of rank zero will send to itself. However, since there's no receiving part for process zero, this won't work. Just make the loop start from dest=1 and that should solve it.
Second, the tag you use isn't initialised. So its value can be anything (which by itself is OK, as long as it is a valid tag), but it can be a different value in each process, which will lead to the various communications never matching each other. Just initialise tag=0, for example, and that should fix it.
With this, your code snippet should work.
Learn to read the informative error messages that Open MPI gives you and to apply some general debugging strategies.
[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
The library is telling you that the receive operation was called with an invalid rank value. Armed with that knowledge, you take a look at your code:
int my_rank, p, source, dest, tag, total, n = 0;
...
//15 processors(1-15) not including processor 0
if(my_rank != 0){
MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
...
The rank is source. source is an automatic variable declared some lines before but never initialised, therefore its initial value is indeterminate. You fix it by assigning source an initial value of 0 or by simply replacing it with 0, since you've already hard-coded the rank of the sender by singling out its code in the else block of the if statement.
The presence of the above error should prompt you to examine the other variables too. Thus you notice that tag is also used uninitialised, and you either initialise it to e.g. 0 or replace it altogether.
Now your program is almost correct. You notice that it seems to work fine for n up to about 33000 (the default eager limit of the self transport divided by sizeof(int)), but then it hangs for larger values. You either fire up a debugger or simply add a printf statement before and after each send and receive operation and discover that already the first call to MPI_Send with dest equal to 0 never returns. You then take a closer look at your code and discover this:
for(dest = 0; dest < p; dest++){
dest starts from 0, but this is wrong since rank 0 is only sending data and not receiving. You fix it by setting the initial value to 1.
Your program should now work as intended (or at least for values of n that do not lead to stack overflow in int arr[n];). Congratulations! Now go and learn about MPI_Probe and MPI_Get_count, which will help you do the same without explicitly sending the length of the array first. Then learn about MPI_Scatter and MPI_Reduce, which will enable you to implement the algorithm even more elegantly.

MPI - Message Truncation in MPI_Recv

I am having problems in one of my projects related to MPI development. I am working on the implementation of an RNA parsing algorithm using MPI, in which a master node starts parsing an input string based on some parsing rules and a parsing table (which contains different states and related actions). In the parsing table there are multiple actions for each state, which can be done in parallel, so I have to distribute these actions among different processes. To do that, I send the current state and parsing info (the current parsing stack) to the nodes, and use a separate thread to receive actions from other nodes while the main thread is busy parsing based on the received actions. Following are the code snippets of the sender and receiver:
Sender Code:
StackFlush(&snd_stack);
StackPush(&snd_stack, state_index);
StackPush(&snd_stack, current_ch);
StackPush(&snd_stack, actions_to_skip);
elements_in_stack = stack.top + 1;
for(int a=elements_in_stack-1;a>=0;a--)
StackPush(&snd_stack, stack.contents[a]);
StackPush(&snd_stack, elements_in_stack);
elements_in_stack = parse_tree.top + 1;
for(int a=elements_in_stack-1;a>=0;a--)
StackPush(&snd_stack, parse_tree.contents[a]);
StackPush(&snd_stack, elements_in_stack);
elements_in_stack = snd_stack.top+1;
MPI_Send(&elements_in_stack, 1, MPI_INT, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD);
MPI_Send(&snd_stack.contents[0], elements_in_stack, MPI_CHAR, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK, MPI_COMM_WORLD);
Receiver Code:
MPI_Recv(&e_count, 1, MPI_INT, MPI_ANY_SOURCE, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD, &status);
if(e_count == 0){
break;
}
while((bt_stack.top + e_count) >= bt_stack.maxSize - 1){usleep(500);}
pthread_mutex_lock(&mutex_bt_stack); //using mutex for accessing shared data among threads
MPI_Recv(&bt_stack.contents[bt_stack.top + 1], e_count, MPI_CHAR, status.MPI_SOURCE, MSG_ACTION_STACK, MPI_COMM_WORLD, &status);
bt_stack.top += e_count;
pthread_mutex_unlock(&mutex_bt_stack);
The program runs fine for small inputs with little communication. But as we increase the input size, the amount of communication increases too, so the receiver gets many requests while it has processed only a few, and then it crashes with the following errors:
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(186)...................: MPI_Recv(buf=0x5b8d7b1, count=19, MPI_CHAR, src=3, tag=1, MPI_COMM_WORLD, status=0x41732100) failed
MPIDI_CH3U_Request_unpack_uebuf(625): Message truncated; 21 bytes received but buffer size is 19
Rank 0 in job 73 hpc081_56549 caused collective abort of all ranks exit status of rank 0: killed by signal 9.
I have also tried this using non-blocking MPI calls, but I still get similar errors.
I don't know what the rest of the code looks like, but here's an idea. Since there is a break I'm assuming the receiver code is part of a loop or a switch statement. If that's the case, there is a mismatch between sends and receives when the element count becomes 0:
The sender will send the element count and a zero-length message (the MPI_Send(&snd_stack.contents... line).
There will be no matching receive for this second message because the receiver breaks out of the loop.
The zero-length message will then match something else, possibly causing the error you are seeing down the line.
