I am making a MPI password cracker, that uses brute force approach to crack a SHA512 hash key, I have code that works fine with 1 password and multiple processes, or multiple passwords and 1 process, but when I do multiple passwords & multiple processes I get the following error:
[ubuntu:2341] *** An error occurred in MPI_Recv
[ubuntu:2341] *** reported by process [191954945,1]
[ubuntu:2341] *** on communicator MPI_COMM_WORLD
[ubuntu:2341] *** MPI_ERR_TRUNCATE: message truncated
[ubuntu:2341] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ubuntu:2341] *** and potentially your MPI job)
I believe this is caused by process rank #1 receiving a string "/" instead of the password hash.
The issue is I am not sure why.
I have also noticed something strange with my code, I have the following loop in process rank 0:
while(!done){
MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &done, &status);
if(done==1) {
for(i=1;i<size;i++){
if(i!=status.MPI_SOURCE){
printf("sending done to process %d\n", i);
MPI_Isend(&done, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &request[i]);
}
}
}
}
Which keeps looping waiting for one of the child processes to alert it that it has found the password. Lets say I am running 2 processes (excluding the base process) and process 2 finds the password, the output will then be:
sending done to process 1
sending done to process 1
When it should only be sending that once, or at the least if it is sending it twice surely the one of those values should be 2, not both of them being 1?
The main bulk of my code is as follows:
Process 0 :
while(!feof(f)) {
fscanf(f, "%s\n", buffer);
int done = 0;
int i, sm;
// lengths of the word (we know it should be 92 though)
length = strlen(buffer);
// Send the password to every process except process 0
for (sm=1;sm<size;sm++) {
MPI_Send(&length, 1, MPI_INT, sm, 0, MPI_COMM_WORLD);
MPI_Send(buffer, length+1, MPI_CHAR, sm, 0, MPI_COMM_WORLD);
}
// While the passwords are busy cracking - Keep probing.
while(!done){
MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &done, &status);
if(done==1) {
for(i=1;i<size;i++){
if(i!=status.MPI_SOURCE){
printf("sending done to process %d\n", i);
MPI_Isend(&done, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &request[i]);
}
}
}
}
}
Which loops through the file, grabs a new password, sends the string to the child processes, at which point they receive it:
MPI_Recv(&length, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("string to be recieived has %d characters\n", length);
MPI_Recv(buffer, length+1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process %d received %s\n", rank, buffer);
The child processes crack the password, then repeat with the next one. (assuming there is one, currently it's an infinite loop but I want to sort it out with 2 passwords before I fix that).
All processes recieve the correct password for the first time, it's when they grab the second password that only 1 process has the correct one and the rest receive a "/" character.
Alright, typical case of me getting worked up and over looking something simple.
I'll leave this question just in case anyone else happens to have the same issue though.
I was forgetting to also receive the solution after probing it.
Was never fully clear how probe differed from receive but I guess probe just flags something changes, but to actually take it out of the "queue" you need to then collect it with receive.
Related
I am trying to implement some form of persistent calling. Somehow the following code keeps hanging - I guessed I must have introduced a deadlock but can't really wrap my head around it...
MPI_Request r[4];
[...]
MPI_Send_init(&Arr[1][1], 1, MPI_DOUBLE, 1, A, MPI_COMM_WORLD, &r[0]);
MPI_Recv_init(&Arr[1][0], 1, MPI_DOUBLE, 0, A, MPI_COMM_WORLD, &r[1]);
MPI_Send_init(&Arr[2][1], 1, MPI_DOUBLE, 0, B, MPI_COMM_WORLD, &r[2]);
MPI_Recv_init(&Arr[2][0], 1, MPI_DOUBLE, 1, B, MPI_COMM_WORLD, &r[3]);
[...]
MPI_Startall(4, r);
MPI_Waitall(4, r, MPI_STATUSES_IGNORE);
I think this is perfect material for deadlock - what would be the remedy here if I want to init these send/receive message and just invoke the processes later all with Startall and Waitall?
EDIT: So if I do
MPI_Start(&r[0]);
MPI_Wait(&r[0], &status):
Then it does not hang. Invoking something like:
for (int k=0; k<1; k++)
{
MPI_Start(&r[k]);
MPI_Wait(&r[k], &status);
}
fail and hang. if that helps
your tags do not match.
for example, rank 0 receives from itself with tag A
but it sends to itself with tag B
I have to admit I'm not familiar with the concept of MPI requests and the MPI_Send/Recv_init. However, I could reproduce the deadlock with simple sends and receives. This is the code (it has a deadlock):
double someVal = 3.5;
const int MY_FIRST_TAG = 42;
MPI_Send(&someVal, 1, MPI_DOUBLE, 1, MY_FIRST_TAG, MPI_COMM_WORLD);
MPI_Recv(&someVal, 1, MPI_DOUBLE, 0, MY_FIRST_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
Even if you run it with only two processes, the problem is the following: Both, process 0 and 1 send a message to process 1. Then both processes want to receive a message from process 0. Process 1 can because process zero actually sent a message to process 1. But nobody sent a message to process 0. Consequently, this process will wait there forever.
How to fix: You need to specify that only process 0 sends to process 1 and only process 1 is supposed to receive from process 0. You can simply do it with:
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
MPI_Send(&someVal, 1, MPI_DOUBLE, 1, MY_FIRST_TAG, MPI_COMM_WORLD);
else // Assumption: Only two processes
MPI_Recv(&someVal, 1, MPI_DOUBLE, 0, MY_FIRST_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
I'm not 100% sure how to translate this to the concept of requests and MPI_Send/Recv_init but maybe this helps you nonetheless.
This is what I am trying to achieve.
Blue is the message.
Yellow is when the specific node changes the leader known to it.
Green is the final election of each node.
The code seems correct to me but it's always stuck inside the while loop no matter what I tried. For a small number of nodes during runtime it returns a segmentation fault after a while.
election_status=0;
firstmsg[0]=world_rank; // self rank
firstmsg[1]=0; // counter for hops
chief=world_rank; // each node declares himself as leader
counter=0; // message counter for each node
// each node sends the first message to the next one
MPI_Send(&firstmsg, 2, MPI_INT, (world_rank+1)%world_size, 1, MPI_COMM_WORLD);
printf("Sent ID with counter to the right node [%d -> %d]\n",world_rank, (world_rank+1)%world_size);
while (election_status==0){
// EDIT: Split MPI_Recv for rank 0 and rest
if (world_rank==0){
MPI_Recv(&incoming, 2, MPI_INT, world_size-1, 1, MPI_COMM_WORLD, &status);
}
else {
MPI_Recv(&incoming, 2, MPI_INT, (world_rank-1)%world_size, 1, MPI_COMM_WORLD, &status);
}
counter=counter+1;
if (incoming[0]<chief){
chief=incoming[0];
}
incoming[1]=incoming[1]+1;
// if message is my own and hopped same times as counter
if (incoming[0]==world_rank && incoming[1]==counter) {
printf("Node %d declares node %d a leader.\n", world_rank, chief);
election_status=1;
}
//sends the incremented message to the next node
MPI_Send(&incoming, 2, MPI_INT, (world_rank+1)%world_size, 1, MPI_COMM_WORLD);
}
MPI_Finalize();
In order to determine some minimum number among a number of ranks for all ranks, use MPI_Allreduce!
MPI_Send is blocking. It can block forever until a matching receive is posted. Your program deadlocks on the first call to MPI_Send - and any successive once should it complete by coincidence. To avoid that specifically use MPI_Sendrecv.
(world_rank-1)%world_size will produce -1 for world_rank == 0. Using -1 as rank number is not valid. It might coincidentially be MPI_ANY_SOURCE.
What I have
I have a C-program using MPI, and it uses 4 processes: 1 vehicle(taskid=0) and 3 passengers.
Vehicle can accommodate 2 passengers at a time.
3 customers keep coming back to get a ride.
For Vehicle, I have:
int passengers[C] = {0};
while(1)
MPI_Recv(&pid, 1, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &status);
//put pid in passengers[totalNumberArrived]
if(totalNumberArrived == 2)
printf("vehicle left...");
sleep(5);
printf("vehicle came back...");
for (i=0; i<2; i++)
MPI_Send(&passengers[i], 1, MPI_INT, passengers[i], 1, MPI_COMM_WORLD);
totalNumberArrived = 0;
if(done)//omitting the details here
break;
And, for each passengers, I have:
for (i to NumOfRound)
sleep(X);
printf("%d is sending a msg", tasked)
MPI_Send(&taskid, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
MPI_Recv(&pid, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &pstatus);
printf("%d received from %d\n", tasked, pid, pstatus.MPI_SOURCE);
issue
If the vehicle left for a ride with taskid 1 and 3, I expect to see this kind of output:
vehicle left...
vehicle came back...
1 is sending a msg
3 is sending a msg (this could be before 1's msg though)
but I sometimes get
vehicle left...
1 is sending a msg
3 is sending a msg
vehicle came back...
which looks like the passenger is not blocked until the vehicle comes back.
I thought that MPI_Recv blocks the task until it gets a msg from the vehicle, so I researched and read that MPI_Recv does block but this kind of issues occur because the printf is not necessarily printing in order. I also read that some recommends to use flush but in some cases flush doesn't work.
I'm not sure what I should do in my case. Is it really just the matter of printf order?
I've also read this: Ordering Output in MPI
and wonder if I should add a master thread and let it be the central controller for vehicle and passengers??
You can't rely on printing output to be ordered between processes. The only thing you can count on is that output will be ordered per processes. Therefore, if for some reason it's critical that you can print things to STDOUT/STDERR in order, you need to aggregate it to one process first.
I am trying to probe for messages in order to communicate between MPI processes (8 processes). The first process to reach a certain part of the code will signal all the other processes, and the others will terminate upon reaching there.
Here is what I've implemented: (any simpler solutions are welcome)
if(depth == size){
endTime = MPI_Wtime();
MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &stat, MPI_STATUS_IGNORE);
if(!stat){
printf("Execution completed in %.5f seconds.\n", endTime - beginTime);
for (ctr = 0; ctr < mpi_processors; ctr++) {
if(ctr == mpi_my_pid) continue;
MPI_Isend(&stat, 1, MPI_INT, ctr, 0, MPI_COMM_WORLD, &req);
printf("sent to %d from %d\n",ctr,mpi_my_pid);
}
}
return 1;
}
The code is self-explanatory. stat is a dummy variable just for "send"ing a message, and also used as the flag for the Iprobe. The problem is, stat is always zero in all processes, meaning that probing doesn't return any waiting messages to be received. But I can confirm that MPI_Isend runs correctly and sends the message.
Am I doing something fundamentally wrong, or is there a simple bug somewhere that I can't see?
Thanks,
Can.
I am having problems in one my project related to MPI development. I am working on the implementation of an RNA parsing algorithm using MPI in which I started the parsing of an input string based on some parsing rules and parsing table (contains different states and related actions) with a master node. In parsing table, there are multiple actions for each state which can be done in parallel. So, I have to distribute these actions among different processes. To do that, I am sending the current state and parsing info (current stack of parsing) to the nodes using separate thread to receive actions from other nodes while the main thread is busy in parsing based on received actions. Following are the code snippets of the sender and receiver:
Sender Code:
StackFlush(&snd_stack);
StackPush(&snd_stack, state_index);
StackPush(&snd_stack, current_ch);
StackPush(&snd_stack, actions_to_skip);
elements_in_stack = stack.top + 1;
for(int a=elements_in_stack-1;a>=0;a--)
StackPush(&snd_stack, stack.contents[a]);
StackPush(&snd_stack, elements_in_stack);
elements_in_stack = parse_tree.top + 1;
for(int a=elements_in_stack-1;a>=0;a--)
StackPush(&snd_stack, parse_tree.contents[a]);
StackPush(&snd_stack, elements_in_stack);
elements_in_stack = snd_stack.top+1;
MPI_Send(&elements_in_stack, 1, MPI_INT, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD);
MPI_Send(&snd_stack.contents[0], elements_in_stack, MPI_CHAR, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK, MPI_COMM_WORLD);
Receiver Code:
MPI_Recv(&e_count, 1, MPI_INT, MPI_ANY_SOURCE, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD, &status);
if(e_count == 0){
break;
}
while((bt_stack.top + e_count) >= bt_stack.maxSize - 1){usleep(500);}
pthread_mutex_lock(&mutex_bt_stack); //using mutex for accessing shared data among threads
MPI_Recv(&bt_stack.contents[bt_stack.top + 1], e_count, MPI_CHAR, status.MPI_SOURCE, MSG_ACTION_STACK, MPI_COMM_WORLD, &status);
bt_stack.top += e_count;
pthread_mutex_unlock(&mutex_bt_stack);
The program is running fine for small input having less communications but as we increase the input size which in response increases the communication so the receiver receives many requests while processing few then it get crashed with the following errors:
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(186) ……………………………………: MPI_Recv(buf=0x5b8d7b1, count=19, MPI_CHAR, src=3, tag=1, MPI_COMM_WORLD, status=0x41732100) failed
MPIDI_CH3U_Request_unpack_uebuf(625)L Message truncated; 21 bytes received but buffer size is 19
Rank 0 in job 73 hpc081_56549 caused collective abort of all ranks exit status of rank 0: killed by signal 9.
I have also tried this by using Non-Blocking MPI calls but still the similar errors.
I don't know what the rest of the code looks like, but here's an idea. Since there is a break I'm assuming the receiver code is part of a loop or a switch statement. If that's the case, there is a mismatch between sends and receives when the element count becomes 0:
The sender will send the element count and a zero-length message (the MPI_Send(&snd_stack.contents... line).
There will be no matching receive for this second message because the receiver breaks out of the loop.
The zero-length message will then match something else, possibly causing the error you are seeing down the line.