I'm developing a client/server application by using MPI. I'm using MPI_Open_port(), MPI_Comm_Accept() and MPI_Comm_connect() for doing that. Inside the application I also use MPI_Send() and MPI_Recv() for the client/server comms.
All is fine when I transfer small arrays by using MPI_Send()/MPI_Recv(). I have a problem when some array growth in size. For some reason for large arrays MPI_Send/Recv functions are blocking the execution. It is like I'm sending and receiving the data on the the same Rank, what is true, but I'm using different communicators. I really don't understand why it doesn't work.
For a size 16*16*16 all works fine but, for instance, for a size equal to 32*32*32 it doesn't. I really want to know what is happening and I'll appreciate any advice for solving this issue.
Thanks.
The server is:
#include<mpi.h>
#include<stdio.h>
#define SIZE 32*32*32
//#define SIZE 16*16*16
int main(int argc, char *argv[])
{
MPI_Comm client;
char port_name[MPI_MAX_PORT_NAME];
FILE* port_file;
int buf[SIZE];
MPI_Init(&argc,&argv);
port_file = fopen("ports.txt","w");
MPI_Open_port(MPI_INFO_NULL, port_name);
fprintf(port_file,"%s\n",port_name);
fclose(port_file);
MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
printf("Client connected...\n");
printf("Sending data...\n");
MPI_Send((void*)&buf, SIZE, MPI_INT, 0, 0, client);
printf("Data sent...\n");
MPI_Comm_free(&client);
MPI_Close_port(port_name);
MPI_Finalize();
return 0;
}
The client is:
#include<mpi.h>
#include<stdio.h>
#define SIZE 32*32*32
//#define SIZE 16*16*16
int main(int argc, char *argv[])
{
MPI_Comm server;
char port_name[MPI_MAX_PORT_NAME];
FILE* port_file;
int buf[SIZE];
MPI_Init(&argc,&argv);
port_file = fopen("ports.txt","r");
fscanf(port_file,"%s\n",port_name);
fclose(port_file);
MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
printf("Client connected...\n");
printf("Receiving data...\n");
MPI_Recv((void*)&buf, SIZE, MPI_INT, MPI_ANY_SOURCE,
MPI_ANY_TAG, server, MPI_STATUS_IGNORE);
printf("Data received...\n");
//MPI_Comm_disconnect(&server);
MPI_Finalize();
return 0;
}
Related
I am receiving a writing error when trying to scatter a dynamically allocated matrix (it is contiguous), it happens when more than 5 cores are involved in the computation. I have placed printfs and it occurs in the scatter, the code is the next:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cblas.h>
#include <sys/time.h>
int main(int argc, char* argv[])
{
int err = MPI_Init(&argc, &argv);
MPI_Comm world;
world=MPI_COMM_WORLD;
int size = 0;
err = MPI_Comm_size(world, &size);
int rank = 0;
err = MPI_Comm_rank(world, &rank);
int n_rows=2400, n_cols=2400, n_rpc=n_rows/size;
float *A, *Asc, *B, *C; //Dyn alloc A B and C
Asc=malloc(n_rpc*n_cols*sizeof(float));
B=malloc(n_rows*n_cols*sizeof(float));
C=malloc(n_rows*n_cols*sizeof(float));
A=malloc(n_rows*n_cols*sizeof(float));
if(rank==0)
{
for (int i=0; i<n_rows; i++)
{
for (int j=0; j<n_cols; j++)
{
A[i*n_cols+j]= i+1.0;
B[i*n_cols+j]=A[i*n_cols+j];
}
}
}
struct timeval start, end;
if(rank==0) gettimeofday(&start,NULL);
MPI_Bcast(B, n_rows*n_cols, MPI_FLOAT, 0, MPI_COMM_WORLD);
if(rank==0) printf("Before Scatter\n"); //It is breaking here
MPI_Scatter(A, n_rpc*n_cols, MPI_FLOAT, Asc, n_rpc*n_cols, MPI_FLOAT, 0, MPI_COMM_WORLD);
if(rank==0) printf("After Scatter\n");
/* Some computation */
err = MPI_Finalize();
if (err) DIE("MPI_Finalize");
return err;
}
Upto 4 cores, it works correctly and performs the scatter, but with 5 or more it does not, and I can not find a clear reason.
The error message is as follows:
[raspberrypi][[26238,1],0][btl_tcp_frag.c:130:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev error (0xac51e0, 8)
Bad address(3)
[raspberrypi][[26238,1],0][btl_tcp_frag.c:130:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev error (0xaf197048, 29053982)
Bad address(1)
[raspberrypi:05345] pml_ob1_sendreq.c:308 FATAL
Thanks in advance!
Multiple errors, first of all, take care of using always the same type when defining variables. Then, when you use scatter, the send count and receive are the same, and you will be sending Elements/Cores. Also when receiving with gather you have to receive the same amount you sent, so again Elements/Cores.
I have n workers, 1 master (rank 0) and need to send messages via MPI from the n workers to the master. The message format is a variable length vector (float *dta) and constant size header struct { int32_t x, int32_t y } dtaHdr.
The master just loops through the incoming results and processes them. It is important for it to be able to associate which dtaHdrgoes with which dta.
I know how to:
Create MPI_Datatype for the constant sized dtaHdr and send it via P2P MPI_Send/MPI_Recv.
Send a variable length vector of any datatype (e.g. MPI_Float) via P2P MPI_Send/MPI_Recv.
The issue is I have no idea how to combine these two approaches.
I know that I could:
Send header first and then the data second in two separate messages.
That has the problem of reordering & interleaving of messages. I need a reliable, easy, and scalable way to associate the header and its data on master. With two messages I can't see a way how to always & simply get both a header and data for incoming message on the master. I.e. two workers could send hdr & data messages to master & they could become interleaved. (I'm not sure what the guarantees for ordering are TBH, even after reading MPI specification).
Encode both the header and the data to a MPI_Byte array and send it esseentially as a binary blob.
Sounds really dirty & breaks some guarantees.
My question is: how do I MPI-idiomatically send one identifiable logical message that contains both a constant sized header of one type and a variable sized vector of some second type.
This program uses MPI_Pack and MPI_Unpack to send two different types in the same message:
#include <mpi.h>
#include <stdio.h>
#define ARRAY_SIZE(array) \
(sizeof(array) / sizeof(array[0]))
struct data_header {
int32_t x;
int32_t y;
};
MPI_Datatype dt_header;
MPI_Datatype dt_vector;
void sendmsg(void) {
struct data_header header = { 1, 2 };
float example[] = { 1.0, 2.0, 3.0, 4.0 };
char buffer[4096];
int position;
MPI_Pack(&header, 1, dt_header, buffer, sizeof(buffer), &position, MPI_COMM_WORLD);
MPI_Pack(example, 1, dt_vector, buffer, sizeof(buffer), &position, MPI_COMM_WORLD);
MPI_Send(buffer, position, MPI_PACKED, 0, 0, MPI_COMM_WORLD);
}
void recvmsg(void) {
struct data_header header;
float example[4];
char buffer[4096];
int position = 0;
MPI_Recv(buffer, sizeof(buffer), MPI_PACKED, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Unpack(buffer, sizeof(buffer), &position, &header, 1, dt_header, MPI_COMM_WORLD);
MPI_Unpack(buffer, sizeof(buffer), &position, example, 1, dt_vector, MPI_COMM_WORLD);
printf("x = %d, y = %d\n", header.x, header.y);
for (int index = 0; index < ARRAY_SIZE(example); index++) {
printf("%f ", example[index]);
}
printf("\n");
}
int main(void) {
int world_size;
int world_rank;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Type_contiguous(2, MPI_INT, &dt_header);
MPI_Type_commit(&dt_header);
MPI_Type_contiguous(4, MPI_FLOAT, &dt_vector);
MPI_Type_commit(&dt_vector);
if (0 == world_rank) {
recvmsg();
}
else {
sendmsg();
}
MPI_Finalize();
return 0;
}
Output
x = 1, y = 2
1.000000 2.000000 3.000000 4.000000
This is really just proof-of-concept code. Hopefully, it will help you find the solution you are looking for.
Note
This code does no error checking and should not be used in a production environment.
I have the following code which works:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
int world_rank, world_size;
MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int n = 10000;
int ni, i;
double t[n];
int x[n];
int buf[n];
int buf_size = n*sizeof(int);
MPI_Buffer_attach(buf, buf_size);
if (world_rank == 0) {
for (ni = 0; ni < n; ++ni) {
int msg_size = ni;
int msg[msg_size];
for (i = 0; i < msg_size; ++i) {
msg[i] = rand();
}
double time0 = MPI_Wtime();
MPI_Bsend(&msg, msg_size, MPI_INT, 1, 0, MPI_COMM_WORLD);
t[ni] = MPI_Wtime() - time0;
x[ni] = msg_size;
MPI_Barrier(MPI_COMM_WORLD);
printf("P0 sent msg with size %d\n", msg_size);
}
}
else if (world_rank == 1) {
for (ni = 0; ni < n; ++ni) {
int msg_size = ni;
int msg[msg_size];
MPI_Request request;
MPI_Barrier(MPI_COMM_WORLD);
MPI_Irecv(&msg, msg_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
MPI_Wait(&request, MPI_STATUS_IGNORE);
printf("P1 received msg with size %d\n", msg_size);
}
}
MPI_Buffer_detach(&buf, &buf_size);
MPI_Finalize();
}
As soon as I remove the print statements, the program crashes, telling me there is a MPI_ERR_BUFFER: invalid buffer pointer. If I remove only one of the print statements the other print statements are still executed, so I believe it crashes at the end of the program. I don't see why it crashes and the fact that it does not crash when I am using the print statements goes beyond my logic...
Would anybody have a clue what is going on here?
You are simply not providing enough buffer space to MPI. In buffered mode, all ongoing messages are stored in the buffer space which is used as a ring buffer. In your code, there can be multiple messages that need to be buffered, regardless of the printf. Note that not even 2*n*sizeof(int) would be enough buffer space - the barriers do not provide a guarantee that the buffer is locally freed even though the corresponding receive is completed. You would have to provide (n*(n-1)/2)*sizeof(int) memory to be sure, or something in-between and hope.
Bottom line: Don't use buffered mode.
Generally, use standard blocking send calls and write the application such that it doesn't deadlock. Tune the MPI implementation such that small messages regardless of the receiver - to avoid wait times on late receivers.
If you want to overlap communication and computation, use nonblocking messages - providing proper memory for each communication.
EDIT: There is no problem with this code in particular. I created a reduced version of my code and this part works perfectly. I still don't understand why it is not working in my whole code, because I have everything commented but this, but that's maybe too particular. Sorry, wrong question.
(I edited and added at the bottom the error that I get).
I'm trying to parallelize a C program.
I'm encountering errors when I try to pass an array allocated with malloc from the master process to the rest of processes. Or better, when I try to receive it.
This is the piece of code I'm having trouble with:
if (rank == 0)
{
int *data=(int *) malloc(size*sizeof(int));
int error_code = MPI_Send(data, size, MPI_INT, 1, 1, MPI_COMM_WORLD);
if (error_code != MPI_SUCCESS) {
char error_string[BUFSIZ];
int length_of_error_string;
MPI_Error_string(error_code, error_string, &length_of_error_string);
printf("%3d: %s\n", rank, error_string);
}
printf("Data sent.");
}
else if (rank == 1)
{
int *data=(int *) malloc(size*sizeof(int));
int error_code = MPI_Recv(data, size, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
if (error_code != MPI_SUCCESS) {
char error_string[BUFSIZ];
int length_of_error_string;
MPI_Error_string(error_code, error_string, &length_of_error_string);
printf("%3d: %s\n", rank, error_string);
}
printf("Received.");
}
"Data sent." is printed, followed by a segmentation fault (with memory dump) caused by the second process and "Received" is never printed.
I think I'm not receiving well the data. But I tried several possibilities, I think I have to pass the address of the variable and not just the pointer to the first position, so I thought this was the correct way, but it is not working.
From the error codes nothing gets printed.
Does anyone know what's causing the error and what was my mistake?
Thanks!
EDIT:
This is the exact error:
*** Process received signal ***
*** End of error message ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
EDIT 2:
This code works:
int main(int argc, char* argv[])
{
int size_x = 12;
int size_y = 12;
int rank, size, length;
char nodename[BUFSIZ];
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Get_processor_name(nodename, &length);
MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
if (rank == 0)
{
int *data=malloc(size*sizeof(int));
int error_code = MPI_Send(data, size, MPI_INT, 1, 1, MPI_COMM_WORLD);
if (error_code != MPI_SUCCESS)
{
char error_string[BUFSIZ];
int length_of_error_string;
MPI_Error_string(error_code, error_string, &length_of_error_string);
printf("%3d: %s\n", rank, error_string);
}
printf("Data sent.");
}
else if (rank > 0)
{
int *data=malloc(size*sizeof(int));
int error_code = MPI_Recv(data, size, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
if (error_code != MPI_SUCCESS)
{
char error_string[BUFSIZ];
int length_of_error_string;
MPI_Error_string(error_code, error_string, &length_of_error_string);
printf("%3d: %s\n", rank, error_string);
}
printf("Received.");
}
MPI_Finalize();
return 0;
}
I found the problem, it was not the MPI calls, but there was a problem with a previous function (forgot a variable in "printf") which I didn't notice. That made the whole code break. Tricky MPI...
I've been having a bug in my code for some time and could not figure out yet how to solve it.
What I'm trying to achieve is easy enough: every worker-node (i.e. node with rank!=0) gets a row (represented by 1-dimensional arry) in a square-structure that involves some computation. Once the computation is done, this row gets sent back to the master.
For testing purposes, there is no computation involved. All that's happening is:
master sends row number to worker, worker uses the row number to calculate the according values
worker sends the array with the result values back
Now, my issue is this:
all works as expected up to a certain size for the number of elements in a row (size = 1006) and number of workers > 1
if the elements in a row exceed 1006, workers fail to shutdown and the program does not terminate
this only occurs if I try to send the array back to the master. If I simply send back an INT, then everything is OK (see commented out line in doMasterTasks() and doWorkerTasks())
Based on the last bullet point, I assume that there must be some race-condition which only surfaces when the array to be sent back to the master reaches a certain size.
Do you have any idea what the issue could be?
Compile the following code with: mpicc -O2 -std=c99 -o simple
Run the executable like so: mpirun -np 3 simple <size> (e.g. 1006 or 1007)
Here's the code:
#include "mpi.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MASTER_RANK 0
#define TAG_RESULT 1
#define TAG_ROW 2
#define TAG_FINISHOFF 3
int mpi_call_result, my_rank, dimension, np;
// forward declarations
void doInitWork(int argc, char **argv);
void doMasterTasks(int argc, char **argv);
void doWorkerTasks(void);
void finalize();
void quit(const char *msg, int mpi_call_result);
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
void doMasterTasks(int argc, char **argv) {
printf("Starting to distribute work...\n");
int size = dimension;
int * dataBuffer = (int *) malloc(sizeof(int) * size);
int currentRow = 0;
int receivedRow = -1;
int rowsLeft = dimension;
MPI_Status status;
for (int i = 1; i < np; i++) {
MPI_Send(¤tRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
// MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
MPI_Recv(&receivedRow, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(¤tRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
shutdownWorkers();
free(dataBuffer);
}
void doWorkerTasks() {
printf("Worker %d started\n", my_rank);
// send the processed row back as the first element in the colours array.
int size = dimension;
int * data = (int *) malloc(sizeof(int) * size);
memset(data, 0, sizeof(size));
int processingRow = -1;
MPI_Status status;
for (;;) {
MPI_Recv(&processingRow, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (status.MPI_TAG == TAG_FINISHOFF) {
printf("Finish-OFF tag received!\n");
break;
} else {
// MPI_Send(data, size, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
MPI_Send(&processingRow, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
}
}
printf("Slave %d finished work\n", my_rank);
free(data);
}
int main(int argc, char **argv) {
if (argc == 2) {
sscanf(argv[1], "%d", &dimension);
} else {
dimension = 1000;
}
doInitWork(argc, argv);
if (my_rank == MASTER_RANK) {
doMasterTasks(argc, argv);
} else {
doWorkerTasks();
}
finalize();
}
void quit(const char *msg, int mpi_call_result) {
printf("\n%s\n", msg);
MPI_Abort(MPI_COMM_WORLD, mpi_call_result);
exit(mpi_call_result);
}
void finalize() {
mpi_call_result = MPI_Finalize();
if (mpi_call_result != 0) {
quit("Finalizing the MPI system failed, aborting now...", mpi_call_result);
}
}
void doInitWork(int argc, char **argv) {
mpi_call_result = MPI_Init(&argc, &argv);
if (mpi_call_result != 0) {
quit("Error while initializing the system. Aborting now...\n", mpi_call_result);
}
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
}
Any help is greatly appreciated!
Best,
Chris
If you take a look at your doWorkerTasks, you see that they send exactly as many data messages as they receive; (and they receive one more to shut them down).
But your master code:
for (int i = 1; i < np; i++) {
MPI_Send(¤tRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
MPI_Send(¤tRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
sends np-2 more data messages than it receives. In particular, it only keeps receiving data until it has no more to send, even though there should be np-2 more data messages outstanding. Changing the code to the following:
int rowsLeftToSend= dimension;
int rowsLeftToReceive = dimension;
for (int i = 1; i < np; i++) {
MPI_Send(¤tRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeftToSend--;
currentRow++;
}
while (rowsLeftToReceive > 0) {
MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
rowsLeftToReceive--;
if (rowsLeftToSend> 0) {
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(¤tRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeftToSend--;
currentRow++;
}
}
Now works.
Why the code doesn't deadlock (note this is deadlock, not a race condition; this is a more common parallel error in distributed computing) for smaller message sizes is a subtle detail of how most MPI implementations work. Generally, MPI implementations just "shove" small messages down the pipe whether or not the receiver is ready for them, but larger messages (since they take more storage resources on the receiving end) need some handshaking between the sender and the receiver. (If you want to find out more, search for eager vs rendezvous protocols).
So for the small message case (less than 1006 ints in this case, and 1 int definitely works, too) the worker nodes did their send whether or not the master was receiving them. If the master had called MPI_Recv(), the messages would have been there already and it would have returned immediately. But it didn't, so there were pending messages on the master side; but it didn't matter. The master sent out its kill messages, and everyone exited.
But for larger messages, the remaining send()s have to have the receiver particpating to clear, and since the receiver never does, the remaining workers hang.
Note that even for the small message case where there was no deadlock, the code didn't work properly - there was missing computed data.
Update: There was a similar problem in your shutdownWorkers:
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
Here you are sending to all processes, including rank 0, the one doing the sending. In principle, that MPI_Send should deadlock, as it is a blocking send and there isn't a matching receive already posted. You could post a non-blocking receive before to avoid this, but that's unnecessary -- rank 0 doesn't need to let itself know to end. So just change the loop to
for (int i = 1; i < np; i++)
tl;dr - your code deadlocked because the master wasn't receiving enough messages from the workers; it happened to work for small message sizes because of an implementation detail common to most MPI libraries.