I'm learning MPI_Send, but I'm confused about how it behaves. I wrote a simple ping-pong program in which the rank-0 node sends a message to the rank-1 node, and the latter then returns a message to the former.
if (rank == 0) { /* Send Ping, Receive Pong */
dest = 2;
source = 2;
rc = MPI_Send(pingmsg, strlen(pingmsg)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
rc = MPI_Recv(buff, strlen(pongmsg)+1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
printf("Rank0 Sent: %s & Received: %s\n", pingmsg, buff);
}
else if (rank == 2) { /* Receive Ping, Send Pong */
dest = 0;
source = 0;
rc = MPI_Recv(buff, strlen(pingmsg)+1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
printf("Rank1 received: %s & Sending: %s\n", buff, pongmsg);
rc = MPI_Send(pongmsg, strlen(pongmsg)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
}
I ran this program in a 3-node environment. However, the system displays:
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173)..............: MPI_Send(buf=0xbffffb90, count=10, MPI_CHAR, dest=2, tag=1, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1811): Communication error with rank 2: Unknown error 4294967295
I'm wondering why I can send a message from the rank-0 node to the rank-1 node, but an error occurs when I change the destination from the rank-1 node to the rank-2 node (as in the code above)? Thanks.
Have you checked whether the count (strlen(...)+1) is the same in the matching MPI_Send and MPI_Recv calls?
The amount of data sent with MPI_Send must be less than or equal to the count passed to the matching MPI_Recv; otherwise the receive fails with a truncation error.
Related
I am creating a program in C, and I need assistance identifying the source for an MPI program when I receive the data from any source. Essentially, the lines of code are below:
int i; // Declares the variable to store the received int in.
if(WORLD_RANK == 0) {
MPI_Recv(&i, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Received %d from %d\n", i, SOURCE_WORLD_RANK);
} else { /* complete some work here, then MPI_Send the data back to rank 0 */ }
SOURCE_WORLD_RANK needs to be the rank that sent the data I received (the value saved in i).
I am writing an MPI program that has the first instance working as a master, sending and receiving results from its workers.
The receive function does something like this:
struct result *check_for_message(void) {
...
static unsigned int message_size;
static char *buffer;
static bool started_reception = false;
static MPI_Request req;
if (!started_reception) {
MPI_Irecv(&message_size, 1, MPI_INT, MPI_ANY_SOURCE, SIZE_TAG,
MPI_COMM_WORLD, &req);
started_reception = true;
} else {
int flag = 0;
MPI_Status status;
MPI_Test(&req, &flag, &status);
if (flag == 1) {
started_reception = false;
buffer = calloc(message_size + 1, sizeof(char));
DIE_IF_NULL(buffer); // printf + MPI_Finalize + exit
MPI_Request content_req;
MPI_Irecv(buffer, MAX_MSG_SIZE, MPI_CHAR, status.MPI_SOURCE, CONTENT_TAG,
MPI_COMM_WORLD, &content_req);
MPI_Wait(&content_req, MPI_STATUS_IGNORE);
ret = process_request(buffer);
free(buffer);
}
}
...
}
The send function does something like this:
MPI_Request size_req;
MPI_Request content_req;
MPI_Isend(&size, 1, MPI_INT, dest, SIZE_TAG, MPI_COMM_WORLD, &size_req);
MPI_Wait(&size_req, MPI_STATUS_IGNORE);
MPI_Isend(buf, size, MPI_CHAR, dest, CONTENT_TAG, MPI_COMM_WORLD,
&content_req);
MPI_Wait(&content_req, MPI_STATUS_IGNORE);
I noticed that if I remove the MPI_Wait calls in the sending function, execution often hangs, or a signal kills one of the instances (I can check the output, but I think it was a SIGSEGV related to a free).
When I add the MPI_Wait calls, it always seems to run perfectly. Could this be related to the order in which the two sends are performed? Aren't they supposed to be delivered in order?
I run the program locally with -n 16 but have also tested with -n 128. The messages that I send are above 50 chars (90% of the time), some being even > 300 chars.
The following code is an MPI ping-pong program. A message of increasing length is sent and returned; with each iteration the number of elements in the message increases.
There are 2 statements in the code that I don't understand.
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
At first sight they did not look necessary for passing the message back and forth, but when I deleted them and compiled my code, it gave an error.
I also checked this post: what is the difference between MPI_Probe and MPI_Get_count in mpi
"While MPI_Probe may be used to find the size of a message you have to use MPI_Get_count to get that size. MPI_Probe returns a status which is a data structure providing information about the message, including its source, tag and size. But to get that size you call MPI_Get_count with the status as an argument."
Why is it important to know the size of the message via MPI_Probe and MPI_Get_count? This confuses me because you already specify the number of elements you send and receive in the MPI_Send and MPI_Recv calls.
for (message_size = 1; message_size <= MAX_ARRAY_SIZE; message_size <<= 1)
{
// Use a loop to vary the message size
if (myRank == 0)
{
double startTime, endTime;
numberOfElementsToSend = message_size;
printf("Rank %2.1i: Sending %i elements\n", myRank, numberOfElementsToSend);
// Measure the time spent in MPI communication
// (use the variables startTime and endTime)
startTime = MPI_Wtime();
MPI_Send(myArray, numberOfElementsToSend, MPI_INT, 1, 0,
MPI_COMM_WORLD);
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
MPI_Recv(myArray, numberOfElementsReceived, MPI_INT, 1, 0,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
endTime = MPI_Wtime();
printf("Rank %2.1i: Received %i elements\n",
myRank, numberOfElementsReceived);
printf("Ping Pong took %f seconds\n", endTime - startTime);
}
else if (myRank == 1)
{
// Probe message in order to obtain the amount of data
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
MPI_Recv(myArray, numberOfElementsReceived, MPI_INT, 0, 0,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Rank %2.1i: Received %i elements\n",
myRank, numberOfElementsReceived);
numberOfElementsToSend = numberOfElementsReceived;
printf("Rank %2.1i: Sending back %i elements\n",
myRank, numberOfElementsToSend);
MPI_Send(myArray, numberOfElementsToSend, MPI_INT, 0, 0,
MPI_COMM_WORLD);
}
}
I have a piece of code that, unfortunately, I couldn't run, so I am trying to find out whether it contains a logic error or whether something is missing. Here is the code:
main(int argc, char *argv[]) {
int numtasks, rank, dest, source, rc, count, tag=1;
char inmsg, outmsg='x';
MPI_Status Stat;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {
dest = 1;
source = 1;
rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
}
else if (rank == 1) {
dest = 0;
source = 0;
rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
}
rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
printf("Task %d: Received %d char(s) from task %d with tag %d \n",
rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG);
MPI_Finalize();
}
Also, is it allowed to store the return value of the MPI send and receive calls in a variable, as is done here with rc?
Your code is wrong. It contains a deadlock, which means that it can hang forever or misbehave otherwise. MPI_Send is a blocking operation - it may block until the respective MPI_Recv is called. So both processes will be stuck at their respective MPI_Send operation before MPI_Recv is called. Use MPI_Sendrecv instead.
Note that due to optimizations, MPI may instead choose to send the data immediately for small messages, so the code may complete even though it is wrong. Do not rely on that!
Normally, you don't have to check MPI return codes, as errors are fatal in MPI by default. In particular, don't assign the return code without checking it for MPI_SUCCESS.
Note that you can easily install MPI on any system, e.g. Open MPI is available for most Linux distributions. There is no reason not to play around with MPI on a normal desktop system.
I have completed a simple MPI program from a given skeleton, in which process 0 sends a message to process 1 and receives a message from process p-1. The code is below.
In the skeleton given to me,
char *message;
message= (char*)malloc(msg_size);
is confusing me. To check the correctness of the program, I am trying to look at the value of the message that is being sent or received. Should it be a hexadecimal value?
int main(int argc, char **argv)
{
double startwtime, endwtime;
float elapsed_time, bandwidth;
int my_id, next_id; /* process id-s */
int p; /* number of processes */
char* message; /* storage for the message */
int i, k, max_msgs, msg_size, v;
MPI_Status status; /* return status for receive */
MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &my_id );
MPI_Comm_size( MPI_COMM_WORLD, &p );
if (argc < 3)
{
fprintf (stderr, "need msg count and msg size as params\n");
goto EXIT;
}
if ((sscanf (argv[1], "%d", &max_msgs) < 1) ||
(sscanf (argv[2], "%d", &msg_size) < 1))
{
fprintf (stderr, "need msg count and msg size as params\n");
goto EXIT;
}
message = (char*)malloc (msg_size);
if (argc > 3) v=1; else v=0; /*are we in verbose mode*/
/* don't start timer until everybody is ok */
MPI_Barrier(MPI_COMM_WORLD);
int t=0;
if( my_id == 0 ) {
startwtime = MPI_Wtime();
// do max_msgs times:
// send message of size msg_size chars to process 1
// receive message of size msg_size chars from process p-1
while(t<max_msgs) {
MPI_Send((char *) message, msg_size, MPI_CHAR, 1 , 0, MPI_COMM_WORLD);
MPI_Recv((char *) message, msg_size, MPI_CHAR, p-1, 0, MPI_COMM_WORLD, &status);
t++;
}
MPI_Barrier(MPI_COMM_WORLD);
endwtime = MPI_Wtime();
elapsed_time = endwtime-startwtime;
bandwidth = 2.0 * max_msgs * msg_size / (elapsed_time);
printf("Number, size of messages: %3d , %3d \n", max_msgs, msg_size);
fflush(stdout);
printf("Wallclock time = %f seconds\n", elapsed_time );
fflush(stdout);
printf("Bandwidth = %f bytes per second\n", bandwidth);
fflush(stdout);
} else if( my_id == p-1 ) {
// do max_msgs times:
// receive message of size msg_size from process to the left
// send message of size msg_size to process to the right (p-1 sends to 0)
while(t<max_msgs) {
MPI_Send((char *) message, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
MPI_Recv((char *) message, msg_size, MPI_CHAR, my_id-1, 0, MPI_COMM_WORLD, &status);
t++;
}
} else {
while(t<max_msgs) {
MPI_Send((char *) message, msg_size, MPI_CHAR, my_id+1, 0, MPI_COMM_WORLD);
MPI_Recv((char *) message, msg_size, MPI_CHAR, my_id-1, 0, MPI_COMM_WORLD, &status);
t++;
}
}
MPI_Barrier(MPI_COMM_WORLD);
EXIT:
MPI_Finalize();
return 0;
}
I am not completely sure this is what you mean, but I will try.
As far as I understand, you want to know what message is being sent. In the code you provide, memory is allocated for the message, but no readable content is ever written to it. The allocation happens in this line:
message = (char*)malloc (msg_size);
malloc reserves the memory for the message so that it can be written to, but it does not give it any initial value. The memory may still contain data that was previously stored there and then freed. The message being sent is then whatever "garbage" was there before. This is probably what you are calling hexadecimal (I hope I understand this right).
The type of value in this case is char (defined as MPI_CHAR in the MPI_Send and MPI_Recv functions). Here you can find more data types for MPI.
I suggest filling the message with a value built from my_id and next_id, so you know who is sending to whom.
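A minimal sketch of that suggestion is below. fill_message is a hypothetical helper, and the text format and '.' padding are arbitrary choices. Note that next_id is declared in the posted code but never assigned, so the caller would have to compute it first, for example as (my_id + 1) % p (an assumption about the intended ring).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill message (capacity msg_size bytes) with readable text naming the
 * sender and the intended receiver, padded with '.' and NUL-terminated.
 * The text is truncated if the buffer is too small.
 */
void fill_message(char *message, int msg_size, int my_id, int next_id)
{
    int written;

    assert(message != NULL && msg_size > 0);
    /* Pad first so every byte of the buffer has a defined value. */
    memset(message, '.', (size_t)msg_size);
    message[msg_size - 1] = '\0';
    written = snprintf(message, (size_t)msg_size, "from %d to %d",
                       my_id, next_id);
    /* snprintf wrote a terminator; turn it back into padding if the */
    /* text ended before the last byte of the buffer.                */
    if (written >= 0 && written < msg_size - 1)
        message[written] = '.';
}
```

Calling fill_message(message, msg_size, my_id, next_id) right after the malloc, and printing the buffer with %s after each MPI_Recv in verbose mode, makes it easy to see which rank a message came from.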