I've had a bug in my code for some time and haven't yet figured out how to solve it.
What I'm trying to achieve is simple enough: every worker node (i.e. a node with rank != 0) gets a row (represented by a one-dimensional array) of a square structure, and that row involves some computation. Once the computation is done, the row is sent back to the master.
For testing purposes, there is no computation involved. All that's happening is:
master sends a row number to a worker, and the worker uses the row number to calculate the corresponding values
worker sends the array with the result values back
Now, my issue is this:
everything works as expected as long as a row has at most 1006 elements (size = 1006) and the number of workers is > 1
if a row has more than 1006 elements, the workers fail to shut down and the program does not terminate
this only occurs if I try to send the array back to the master. If I simply send back an INT, everything is OK (see the commented-out lines in doMasterTasks() and doWorkerTasks())
Based on the last point, I assume there must be some race condition that only surfaces when the array sent back to the master reaches a certain size.
Do you have any idea what the issue could be?
Compile the following code with: mpicc -O2 -std=c99 -o simple
Run the executable like so: mpirun -np 3 simple <size> (e.g. 1006 or 1007)
Here's the code:
#include "mpi.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MASTER_RANK 0
#define TAG_RESULT 1
#define TAG_ROW 2
#define TAG_FINISHOFF 3
int mpi_call_result, my_rank, dimension, np;
// forward declarations
void doInitWork(int argc, char **argv);
void doMasterTasks(int argc, char **argv);
void doWorkerTasks(void);
void finalize();
void quit(const char *msg, int mpi_call_result);
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
void doMasterTasks(int argc, char **argv) {
printf("Starting to distribute work...\n");
int size = dimension;
int * dataBuffer = (int *) malloc(sizeof(int) * size);
int currentRow = 0;
int receivedRow = -1;
int rowsLeft = dimension;
MPI_Status status;
for (int i = 1; i < np; i++) {
MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
// MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
MPI_Recv(&receivedRow, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
shutdownWorkers();
free(dataBuffer);
}
void doWorkerTasks() {
printf("Worker %d started\n", my_rank);
// send the processed row back as the first element in the colours array.
int size = dimension;
int * data = (int *) malloc(sizeof(int) * size);
memset(data, 0, sizeof(int) * size); // zero the whole row, not just sizeof(int) bytes
int processingRow = -1;
MPI_Status status;
for (;;) {
MPI_Recv(&processingRow, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (status.MPI_TAG == TAG_FINISHOFF) {
printf("Finish-OFF tag received!\n");
break;
} else {
// MPI_Send(data, size, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
MPI_Send(&processingRow, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
}
}
printf("Slave %d finished work\n", my_rank);
free(data);
}
int main(int argc, char **argv) {
if (argc == 2) {
sscanf(argv[1], "%d", &dimension);
} else {
dimension = 1000;
}
doInitWork(argc, argv);
if (my_rank == MASTER_RANK) {
doMasterTasks(argc, argv);
} else {
doWorkerTasks();
}
finalize();
}
void quit(const char *msg, int mpi_call_result) {
printf("\n%s\n", msg);
MPI_Abort(MPI_COMM_WORLD, mpi_call_result);
exit(mpi_call_result);
}
void finalize() {
mpi_call_result = MPI_Finalize();
if (mpi_call_result != 0) {
quit("Finalizing the MPI system failed, aborting now...", mpi_call_result);
}
}
void doInitWork(int argc, char **argv) {
mpi_call_result = MPI_Init(&argc, &argv);
if (mpi_call_result != 0) {
quit("Error while initializing the system. Aborting now...\n", mpi_call_result);
}
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
}
Any help is greatly appreciated!
Best,
Chris
If you look at your doWorkerTasks, you'll see that each worker sends exactly as many data messages as it receives (and it receives one more, to shut it down).
But your master code:
for (int i = 1; i < np; i++) {
MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
sends np-2 more data messages than it receives. In particular, it only keeps receiving data until it has no more to send, even though there should be np-2 more data messages outstanding. Changing the code to the following:
int rowsLeftToSend= dimension;
int rowsLeftToReceive = dimension;
for (int i = 1; i < np; i++) {
MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeftToSend--;
currentRow++;
}
while (rowsLeftToReceive > 0) {
MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
rowsLeftToReceive--;
if (rowsLeftToSend> 0) {
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeftToSend--;
currentRow++;
}
}
makes it work.
Why the code doesn't deadlock for smaller message sizes (note that this is a deadlock, not a race condition; deadlock is the more common error in distributed parallel programs) comes down to a subtle detail of how most MPI implementations work. Generally, MPI implementations just "shove" small messages down the pipe whether or not the receiver is ready for them, but larger messages (since they take more storage resources on the receiving end) need some handshaking between the sender and the receiver. (If you want to find out more, search for eager vs. rendezvous protocols.)
So in the small-message case (up to 1006 ints here, and a single int certainly works too), the workers' sends completed whether or not the master was receiving them. Had the master called MPI_Recv(), the messages would already have been there and it would have returned immediately. It didn't, so messages were left pending on the master side, but that didn't matter: the master sent out its kill messages and everyone exited.
But for larger messages, the remaining sends need the receiver to participate before they can complete, and since the master never posts the matching receives, the remaining workers hang.
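To make the counting argument concrete, here is a small serial model (no MPI calls; the name `uncollected_results` is hypothetical) that replays the master's bookkeeping for both the original and the corrected loop and reports how many worker results are never received. Like the original code, it assumes dimension >= the number of workers.

```c
#include <stdbool.h>

/* Serial model of the master's loop: each MPI_Send of a row increments
 * `sent`, each MPI_Recv of a result increments `received`. Returns the
 * number of results still outstanding when the master leaves its loop.
 * Assumes dimension >= nworkers, as the original code does. */
int uncollected_results(int dimension, int nworkers, bool fixed) {
    int sent = nworkers;   /* initial distribution: one row per worker */
    int received = 0;

    if (!fixed) {
        /* Original loop: stops receiving as soon as no rows are left to send. */
        int rowsLeft = dimension - nworkers;
        for (;;) {
            received++;
            if (rowsLeft == 0)
                break;
            sent++;
            rowsLeft--;
        }
    } else {
        /* Corrected loop: keeps receiving until every row has come back. */
        int rowsLeftToSend = dimension - nworkers;
        int rowsLeftToReceive = dimension;
        while (rowsLeftToReceive > 0) {
            received++;
            rowsLeftToReceive--;
            if (rowsLeftToSend > 0) {
                sent++;
                rowsLeftToSend--;
            }
        }
    }
    return sent - received;
}
```

For example, with np = 3 (two workers) and 1007 rows, the original loop leaves 1 result (np-2) uncollected, while the corrected loop leaves none.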
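A toy model of the two protocols may help here. It is a deliberate simplification, not how any particular MPI library is implemented, and the eager limit is a hypothetical parameter: real thresholds depend on the implementation and transport and are usually tunable. The 4024-byte value in the usage note is only what 1006 four-byte ints occupy, chosen so the model reproduces the numbers in the question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a blocking standard-mode send. With the eager protocol, a
 * message no larger than eager_limit is pushed to the receiver and the send
 * returns at once; a larger message uses a rendezvous and cannot complete
 * until the receiver has posted a matching receive. */
bool send_completes(size_t message_bytes, size_t eager_limit,
                    bool matching_receive_posted) {
    if (message_bytes <= eager_limit)
        return true;                 /* eager: buffered on the receiving side */
    return matching_receive_posted;  /* rendezvous: needs the receiver */
}

/* Number of workers left blocked in MPI_Send when the master never posts
 * receives for `uncollected` results of `message_bytes` each. */
int workers_left_blocked(int uncollected, size_t message_bytes,
                         size_t eager_limit) {
    int blocked = 0;
    for (int i = 0; i < uncollected; i++)
        if (!send_completes(message_bytes, eager_limit, false))
            blocked++;
    return blocked;
}
```

With a hypothetical limit of 4024 bytes, a 1006-int row (4024 bytes with 4-byte ints) goes out eagerly and a 1007-int row does not, which matches the behaviour in the question. A practical consequence: temporarily replacing MPI_Send with MPI_Ssend, which does not complete before the matching receive has started, makes latent deadlocks like this one show up at every message size.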
Note that even for the small message case where there was no deadlock, the code didn't work properly - there was missing computed data.
Update: There was a similar problem in your shutdownWorkers:
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
Here you are sending to all processes, including rank 0, the one doing the sending. In principle that MPI_Send could deadlock, as it is a blocking send and no matching receive has been posted. You could post a non-blocking receive beforehand to avoid this, but that's unnecessary: rank 0 doesn't need to tell itself to finish. So just change the loop to
for (int i = 1; i < np; i++)
tl;dr - your code deadlocked because the master wasn't receiving enough messages from the workers; it happened to work for small message sizes because of an implementation detail common to most MPI libraries.
Related
So, somehow MPI_Probe receives the same message even though it is only sent once.
I run the program with only 2 processes, of which process 1 sends two messages: one to request a task and another one to send the result. I send the messages with different tags to tell the two apart.
So what's supposed to happen is the following:
Process 0 waits for a task request.
Process 1 sends a task request, indicated by tag=0.
Process 0 sends the task.
Process 1 does the task and sends the results back to process 0, indicated by tag=1.
This is where the first problem occurs: process 0 still receives tag=0, as shown by the printf.
Process 0 should receive the results and enter the else-if block where tag==1. It does not.
Process 1 breaks out of the while loop (for debugging purposes).
Process 0 is supposed to be blocked by MPI_Probe, but it is not; instead it keeps running and always shows that the received tag is still 0.
The code is still messy and inefficient. I just want a minimal working program to build upon and to optimize. But still any tip is appreciated!
The code:
if(rank == 0) {
struct stack* idle_stack = init_stack(env_size-1);
struct sudoku_stack* sudoku_stack_ptr = init_sudoku_stack(256);
push_sudoku(sudoku_stack_ptr, sudoku);
int it=0;
while(1) {
printf("ITERATION %d\n", it);
int idle_stack_size = stack_size(idle_stack);
int _sudoku_stack_empty = sudoku_stack_empty(sudoku_stack_ptr);
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("TAG: %d\n", status.MPI_TAG);
// So this part is supposed to be entered once
if(status.MPI_TAG == 0) {
if(!sudoku_stack_empty(sudoku_stack_ptr)) {
printf("SENDING TASK\n");
int *next_sudoku = pop_sudoku(sudoku_stack_ptr);
MPI_Send(next_sudoku, v_size, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
} else {
// But since the Tag stays 0, it is called multiple times until
// a stack overflow occurs
printf("PUSHING TO IDLE STACK\n");
push(idle_stack, status.MPI_SOURCE);
}
} else if(status.MPI_TAG == 1) {
// This part should actually be entered by the second received message
printf("RECEIVING SOLUTION\n");
int count;
MPI_Get_count(&status, MPI_INT, &count);
int* recv_sudokus = (int*)malloc(count * sizeof(int));
MPI_Recv(recv_sudokus, count, MPI_INT, status.MPI_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
for(int i = 0; i < count; i+=v_size) {
printf("%d ", recv_sudokus[i]);
if((i+1) % m_size == 0){
printf("\n");
}
}
// DEBUG - EXIT PROGRAM
teardown(sudoku_stack_ptr, idle_stack);
break;
push_sudoku(sudoku_stack_ptr, recv_sudokus);
} else if(status.MPI_TAG == 2) {
//int* solved_sudoku = (int*)malloc(v_size * sizeof(int));
//MPI_Recv(solved_sudoku, v_size, MPI_INT, status.MPI_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
//TODO
}
it++;
}
} else {
int* sudoku = (int*)malloc(sizeof(int)*v_size);
int* possible_sudokus = (int*)malloc(sizeof(int)*m_size*v_size);
while(1) {
// Send task request
printf("REQUESTING TASK\n");
int i = 0;
MPI_Send(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
// Wait for and receive task
printf("RECEIVING TASK\n");
MPI_Recv(sudoku, v_size, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("CALCULATING\n");
int index = 0;
for(int i = 1; i <= m_size; i++) {
int is_safe_res = is_safe_first_empty_cell(sudoku, i);
if(is_safe_res) {
int* sudoku_cp = (int*)malloc(sizeof(int)*v_size);
memcpy(sudoku_cp, sudoku, sizeof(int)*v_size);
insert_to_first_empty_cell(sudoku_cp, i);
memcpy(&possible_sudokus[index], sudoku_cp, sizeof(int)*v_size);
index+=v_size;
free(sudoku_cp);
}
}
printf("SENDING\n");
MPI_Send(possible_sudokus, index, MPI_INT, 0, 1, MPI_COMM_WORLD); // index already counts ints
break;
}
}
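The symptom described above is consistent with how MPI_Probe works: it only reports the envelope (source, tag, size) of the next matching message and leaves that message queued. Unless the tag-0 request is actually consumed with MPI_Recv, every later MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, ...) returns the same request again. The serial sketch below uses a hypothetical fixed-size FIFO (`msg_queue`, `probe`, `receive` are illustrative names, not MPI calls) to show the difference between peeking and receiving.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the queue of messages from one sender.
 * No overflow or underflow checks in this sketch. */
typedef struct {
    int tags[16];
    int head, tail;
} msg_queue;

void enqueue(msg_queue *q, int tag) { q->tags[q->tail++] = tag; }

/* Like MPI_Probe: report the tag of the next message without removing it.
 * Returns false if nothing is pending (MPI_Probe would block instead). */
bool probe(const msg_queue *q, int *tag) {
    if (q->head == q->tail)
        return false;
    *tag = q->tags[q->head];
    return true;
}

/* Like MPI_Recv: remove the next message and return its tag. */
int receive(msg_queue *q) { return q->tags[q->head++]; }
```

Probing twice yields tag 0 both times; only after `receive` does the tag-1 result become visible. In the master, that corresponds to calling MPI_Recv for the request (for example into a small int buffer, using status.MPI_SOURCE and status.MPI_TAG from the probe) in the tag == 0 branch before sending the task.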
I have the following code which works:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
int world_rank, world_size;
MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int n = 10000;
int ni, i;
double t[n];
int x[n];
int buf[n];
int buf_size = n*sizeof(int);
MPI_Buffer_attach(buf, buf_size);
if (world_rank == 0) {
for (ni = 0; ni < n; ++ni) {
int msg_size = ni;
int msg[msg_size];
for (i = 0; i < msg_size; ++i) {
msg[i] = rand();
}
double time0 = MPI_Wtime();
MPI_Bsend(&msg, msg_size, MPI_INT, 1, 0, MPI_COMM_WORLD);
t[ni] = MPI_Wtime() - time0;
x[ni] = msg_size;
MPI_Barrier(MPI_COMM_WORLD);
printf("P0 sent msg with size %d\n", msg_size);
}
}
else if (world_rank == 1) {
for (ni = 0; ni < n; ++ni) {
int msg_size = ni;
int msg[msg_size];
MPI_Request request;
MPI_Barrier(MPI_COMM_WORLD);
MPI_Irecv(&msg, msg_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
MPI_Wait(&request, MPI_STATUS_IGNORE);
printf("P1 received msg with size %d\n", msg_size);
}
}
MPI_Buffer_detach(&buf, &buf_size);
MPI_Finalize();
}
As soon as I remove the print statements, the program crashes with MPI_ERR_BUFFER: invalid buffer pointer. If I remove only one of the print statements, the other print statements are still executed, so I believe it crashes at the end of the program. I don't see why it crashes, and the fact that it does not crash when I use the print statements defies my logic...
Would anybody have a clue what is going on here?
You are simply not providing enough buffer space to MPI. In buffered mode, all ongoing messages are stored in the buffer space which is used as a ring buffer. In your code, there can be multiple messages that need to be buffered, regardless of the printf. Note that not even 2*n*sizeof(int) would be enough buffer space - the barriers do not provide a guarantee that the buffer is locally freed even though the corresponding receive is completed. You would have to provide (n*(n-1)/2)*sizeof(int) memory to be sure, or something in-between and hope.
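The arithmetic behind that figure, as a small helper (the function name is hypothetical). Each message buffered through MPI_Bsend also costs MPI_BSEND_OVERHEAD bytes of bookkeeping, a constant defined in mpi.h; it is passed in as a parameter here so the sketch compiles without MPI.

```c
#include <stddef.h>

/* Bytes of attached buffer needed if all n messages of the question's loop
 * (sizes 0, 1, ..., n-1 ints) could be pending at once.
 * In real code, pass MPI_BSEND_OVERHEAD as per_message_overhead. */
size_t bsend_bytes_needed(size_t n, size_t per_message_overhead) {
    size_t total_ints = n * (n - 1) / 2;   /* 0 + 1 + ... + (n-1) */
    return total_ints * sizeof(int) + n * per_message_overhead;
}
```

For n = 10000 that is 49,995,000 ints (about 200 MB with 4-byte ints) plus the per-message overhead, far more than the n*sizeof(int) bytes attached in the question, and far too large for a stack array.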
Bottom line: Don't use buffered mode.
Generally, use standard blocking send calls and write the application so that it doesn't deadlock. Tune the MPI implementation so that small messages are sent eagerly regardless of whether the receiver is ready, to avoid wait times on late receivers.
If you want to overlap communication and computation, use nonblocking messages - providing proper memory for each communication.
I am using example code from an MPI book [will give the name shortly].
What it does is the following:
a) It creates two communicators: world (= MPI_COMM_WORLD), containing all the processes, and workers, which excludes the random number server (the last rank).
b) The server generates random numbers and serves them to the workers on request.
c) Each worker separately counts the number of samples falling inside and outside a unit circle inscribed in the square [-1, 1] x [-1, 1].
d) The inside and outside counts are combined with MPI_Allreduce, and PI is computed from their ratio, until a sufficient level of accuracy is reached.
The code compiles fine. However, when I run it with the following command (actually with any value of n)
>mpiexec -n 2 apple.exe 0.0001
I get the following errors:
Fatal error in MPI_Allreduce: Invalid communicator, error stack:
MPI_Allreduce(855): MPI_Allreduce(sbuf=000000000022EDCC, rbuf=000000000022EDDC,
count=1, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
MPI_Allreduce(780): Null communicator
pi = 0.00000000000000000000
job aborted:
rank: node: exit code[: error message]
0: PC: 1: process 0 exited without calling finalize
1: PC: 123
Edit: (Removed: But when I remove either one of the two MPI_Allreduce() calls, it runs without any runtime errors, albeit with a wrong answer.)
Code:
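#include <mpi.h>
#include <mpe.h>
#include <stdio.h>   /* printf, sscanf, getchar */
#include <stdlib.h>  /* rand, srand */
#include <limits.h>  /* INT_MAX */
#include <math.h>    /* fabs */
#include <time.h>    /* time */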
#define CHUNKSIZE 1000
/* message tags */
#define REQUEST 1
#define REPLY 2
int main(int argc, char *argv[])
{
int iter;
int in, out, i, iters, max, ix, iy, ranks [1], done, temp;
double x, y, Pi, error, epsilon;
int numprocs, myid, server, totalin, totalout, workerid;
int rands[CHUNKSIZE], request;
MPI_Comm world, workers;
MPI_Group world_group, worker_group;
MPI_Status status;
MPI_Init(&argc,&argv);
world = MPI_COMM_WORLD;
MPI_Comm_size(world,&numprocs);
MPI_Comm_rank(world,&myid);
server = numprocs-1; /* last proc is server */
if(myid==0) sscanf(argv[1], "%lf", &epsilon);
MPI_Bcast(&epsilon, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Comm_group(world, &world_group);
ranks[0] = server;
MPI_Group_excl(world_group, 1, ranks, &worker_group);
MPI_Comm_create(world, worker_group, &workers);
MPI_Group_free(&worker_group);
if(myid==server) /* I am the rand server */
{
srand(time(NULL));
do
{
MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, REQUEST, world, &status);
if(request)
{
for(i=0; i<CHUNKSIZE;)
{
rands[i] = rand();
if(rands[i]<=INT_MAX) ++i;
}
MPI_Send(rands, CHUNKSIZE, MPI_INT,status.MPI_SOURCE, REPLY, world);
}
}
while(request>0);
}
else /* I am a worker process */
{
request = 1;
done = in = out = 0;
max = INT_MAX; /* max int, for normalization */
MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
MPI_Comm_rank(workers, &workerid);
iter = 0;
while(!done)
{
++iter;
request = 1;
MPI_Recv(rands, CHUNKSIZE, MPI_INT, server, REPLY, world, &status);
for(i=0; i<CHUNKSIZE;)
{
x = (((double) rands[i++])/max)*2-1;
y = (((double) rands[i++])/max)*2-1;
if(x*x+y*y<1.0) ++in;
else ++out;
}
/* ** see error here ** */
MPI_Allreduce(&in, &totalin, 1, MPI_INT, MPI_SUM, workers);
MPI_Allreduce(&out, &totalout, 1, MPI_INT, MPI_SUM, workers);
/* only one of the above two MPI_Allreduce() functions working */
Pi = (4.0*totalin)/(totalin+totalout);
error = fabs( Pi-3.141592653589793238462643);
done = (error<epsilon||(totalin+totalout)>1000000);
request = (done)?0:1;
if(myid==0)
{
printf("\rpi = %23.20f", Pi);
MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
}
else
{
if(request)
MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
}
MPI_Comm_free(&workers);
}
}
if(myid==0)
{
printf("\npoints: %d\nin: %d, out: %d, <ret> to exit\n", totalin+totalout, totalin, totalout);
getchar();
}
MPI_Finalize();
}
What is the error here? Am I missing something? Any help or pointer will be highly appreciated.
You are freeing the workers communicator before you are done using it. Move the MPI_Comm_free(&workers) call after the while(!done) { ... } loop.
I've got many slave nodes which might or might not send messages to the master node, so currently there's no way for the master node to know how many MPI_Recv calls to expect. For efficiency reasons, the slave nodes have to send the minimum number of messages to the master node.
I managed to find a cool trick, which sends an additional "done" message when it's no longer expecting any messages. Unfortunately, it doesn't seem to work in my case, where there is a variable number of senders. Any idea how to go about this? Thanks!
if(rank == 0){ //MASTER NODE
while (1) {
MPI_Recv(&buffer, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (status.MPI_TAG == DONE) break;
/* Do stuff */
}
}else{ //MANY SLAVE NODES
if(some conditions){
MPI_Send(&buffer, 64, MPI_INT, root, 1, MPI_COMM_WORLD);
}
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Send(NULL, 1, MPI_INT, root, DONE, MPI_COMM_WORLD);
It doesn't work: the program seems to be still waiting in an MPI_Recv.
A simpler and more elegant option would be to use the MPI_IBARRIER. Have each worker call all of the sends that it needs to and then call MPI_IBARRIER when it's done. On the master, you can loop on both an MPI_IRECV on MPI_ANY_SOURCE and an MPI_IBARRIER. When the MPI_IBARRIER is done, you know that everyone has finished and you can cancel the MPI_IRECV and move on. The pseudocode would look something like this:
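if (master) {
    int index, cancelled;
    MPI_Status status;
    /* Start the barrier. Each worker joins once all its sends have completed. */
    MPI_Ibarrier(MPI_COMM_WORLD, &requests[0]);
    /* Post the first receive. */
    MPI_Irecv(..., MPI_ANY_SOURCE, ..., &requests[1]);
    for (;;) {
        /* MPI_Waitany takes a single status argument. */
        MPI_Waitany(2, requests, &index, &status);
        /* Index 0: the barrier finished, so every worker is done. */
        if (index == 0)
            break;
        /* Index 1: we received a message. Do the work, then post the next receive. */
        MPI_Irecv(..., MPI_ANY_SOURCE, ..., &requests[1]);
    }
    /* Cancel the outstanding receive and complete it. If it had already
     * matched a message, the cancel fails and that message must be processed. */
    MPI_Cancel(&requests[1]);
    MPI_Wait(&requests[1], &status);
    MPI_Test_cancelled(&status, &cancelled);
    if (!cancelled) {
        /* Do the work for this last message. */
    }
} else {
    /* Keep sending work back to the master until we're done. MPI_Ssend only
     * completes once the master's receive has matched the message, so no
     * message can still be unmatched when the barrier completes. */
    while ( ...work is to be done... ) {
        MPI_Ssend(...);
    }
    /* When we finish, join the Ibarrier. Note that
     * you can't use an MPI_Barrier here because it
     * has to match with the MPI_Ibarrier above. */
    MPI_Ibarrier(MPI_COMM_WORLD, &request);
    MPI_Wait(&request, MPI_STATUS_IGNORE);
}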
1. You called MPI_Barrier in the wrong place; it should be called after the MPI_Send of the DONE message.
2. The root should exit its loop only when it has received DONE from all other ranks (size - 1 of them).
The code after some modifications:
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char** argv)
{
MPI_Init(NULL, NULL);
int size;
MPI_Comm_size(MPI_COMM_WORLD, &size);
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Status status;
int DONE = 888;
int buffer = 77;
int root = 0 ;
printf("here is rank %d with size=%d\n" , rank , size);fflush(stdout);
int num_of_DONE = 0 ;
if(rank == 0){ //MASTER NODE
while (1) {
MPI_Recv(&buffer, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("root recev %d from %d with tag = %d\n" , buffer , status.MPI_SOURCE , status.MPI_TAG );fflush(stdout);
if (status.MPI_TAG == DONE)
num_of_DONE++;
printf("num_of_DONE=%d\n" , num_of_DONE);fflush(stdout);
if(num_of_DONE == size -1)
break;
/* Do stuff */
}
}else{ //MANY SLAVE NODES
if(1){
buffer = 66;
MPI_Send(&buffer, 1, MPI_INT, root, 1, MPI_COMM_WORLD);
printf("rank %d sent data.\n" , rank);fflush(stdout);
}
}
if(rank != 0)
{
buffer = 55;
MPI_Send(&buffer, 1, MPI_INT, root, DONE, MPI_COMM_WORLD);
}
MPI_Barrier(MPI_COMM_WORLD);
printf("rank %d done.\n" , rank);fflush(stdout);
MPI_Finalize();
return 0;
}
output:
hosam#hosamPPc:~/Desktop$ mpicc -o aa aa.c
hosam#hosamPPc:~/Desktop$ mpirun -n 3 ./aa
here is rank 2 with size=3
here is rank 0 with size=3
rank 2 sent data.
here is rank 1 with size=3
rank 1 sent data.
root recev 66 from 1 with tag = 1
num_of_DONE=0
root recev 66 from 2 with tag = 1
num_of_DONE=0
root recev 55 from 2 with tag = 888
num_of_DONE=1
root recev 55 from 1 with tag = 888
num_of_DONE=2
rank 0 done.
rank 1 done.
rank 2 done.
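Why counting DONE messages is enough: MPI does not let a message overtake an earlier one from the same sender to the same receiver on the same communicator when both could match the same receive, which is the case here because the root receives with MPI_ANY_TAG. So each worker's data messages are received before its DONE, and once size - 1 DONEs have been counted, nothing is left unreceived. The serial sketch below (hypothetical names, no MPI calls) replays an arrival sequence through the root's loop.

```c
#include <stddef.h>

/* One arrival as seen by the root's MPI_Recv: who sent it and with which tag. */
typedef struct {
    int source;
    int tag;
} arrival;

/* Replays the root's loop over a sequence of arrivals that respects
 * per-sender ordering. Returns how many data messages were processed before
 * the loop exited, or -1 if the sequence ran out first (the real root would
 * then block in MPI_Recv). */
int data_messages_handled(const arrival *seq, size_t len, int size, int done_tag) {
    int num_of_done = 0, handled = 0;
    for (size_t i = 0; i < len; i++) {
        if (seq[i].tag == done_tag) {
            if (++num_of_done == size - 1)
                return handled;
        } else {
            handled++;   /* "Do stuff" with a data message */
        }
    }
    return -1;
}
```

Feeding it the arrival order from the output above ({1, 1}, {2, 1}, {2, 888}, {1, 888} with size 3) reports both data messages as handled; a worker that sends no data at all just contributes its DONE.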
Ok, so the aim of the game here is for each one of 64 processors (representing an 8x8 grid) to generate a random number (between 0 and 1), and give process zero a string representing the complete situation. For example grid:
[0,1,0,1]
[1,1,1,1]
[0,0,0,0]
would ultimately produce the string '010111110000' for this 4x3 grid.
Each process can only communicate with the ones above and to their left.
To do this, I have each process receive a string of all numbers on its right (if it's not on the far right), add its number to the front of the string and send it to the left.
If the process is on the far left, it also receives a string from the process below it (except for the bottom-left process, rank 56) describing the state of all nodes below that rank. It joins its own value, the string from the rest of its row and the string from below, and sends the result up.
All far left nodes begin their row's string.
My attempted code is below:
#include <stdio.h>
#include "mpi.h"
#include <string.h>
#include <stdlib.h>
int farLeft(int rank){// edit
if (rank%8==0){
return 1;
}
return 0;
}
int farRight(int rank){// edit
if (rank%8==7){
return 1;
}
return 0;
}
int main(argc, argv)
int argc;
char **argv;
{
char inputList[100],myWhisp[100],snum[256];
int rank,value;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
srand(rank);
value = rand() % 2;
sprintf(snum, "%d", value);
strcpy(myWhisp,snum);
if (farLeft(rank)){
MPI_Recv(inputList, strlen(inputList)+1, MPI_CHAR, rank+1, 0, MPI_COMM_WORLD, &status);
strcat(snum,inputList);
strcpy(myWhisp,snum);
if (rank !=56){
MPI_Recv(inputList, strlen(inputList)+1, MPI_CHAR, rank+8, 0, MPI_COMM_WORLD, &status);//rank48 crashes here
strcat(myWhisp,inputList);
}
strcpy(inputList,myWhisp);
if(rank==0){
printf("%s\n",inputList);
}
else{
MPI_Send(inputList, strlen(inputList)+1, MPI_CHAR, rank-8, 0, MPI_COMM_WORLD);
}
}
else if (farRight(rank)){
strcpy(inputList,myWhisp);
MPI_Send(inputList, strlen(inputList)+1, MPI_CHAR, rank-1, 0, MPI_COMM_WORLD);
}
else{
MPI_Recv(inputList, strlen(inputList)+1, MPI_CHAR, rank+1, 0, MPI_COMM_WORLD, &status);
strcat(snum,inputList);
strcpy(inputList,snum);
MPI_Send(inputList, strlen(inputList)+1, MPI_CHAR, rank-1, 0, MPI_COMM_WORLD);
}
MPI_Finalize();
return 0;
}
I'm getting a truncation error on rank 48, the second-to-last rank in the far-left column. It happens in the receive inside if(rank != 56), so I guess there's something wrong with the way I send/receive inputList...
Thanks very much.
You're passing the count parameter of MPI_Recv as strlen(inputList)+1, but inputList was never initialised. You probably want sizeof(inputList) here.
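For rank 48 in particular the numbers can be worked out, assuming its earlier receive (with a count taken from uninitialised memory) happened to succeed. That receive leaves the 7-character string from ranks 49 to 55 in inputList, so the second MPI_Recv is posted with a count of strlen(inputList)+1 = 8, while rank 56 sends its whole row: 8 characters plus the terminating '\0', i.e. 9 chars. A message larger than the posted receive count is exactly what MPI reports as a truncation error. The helpers below (hypothetical names, no MPI calls) compute these lengths for the question's 8x8 layout.

```c
/* The question's layout: 8 ranks per row, 8 rows, rank = 8*row + column. */
#define COLS 8
#define ROWS 8

/* Chars (including the trailing '\0') that far-left rank `rank` receives
 * from the far-left rank below it. Valid for ranks 0, 8, ..., 48. */
int chars_from_below(int rank) {
    int rows_below = ROWS - 1 - rank / COLS;
    return rows_below * COLS + 1;
}

/* Chars (including '\0') that `rank` receives from rank+1 in its row:
 * one digit for every rank to its right. Valid for non-far-right ranks. */
int chars_from_right(int rank) {
    int to_the_right = COLS - 1 - rank % COLS;
    return to_the_right + 1;
}
```

With sizeof(inputList) (100 chars) as the count instead, every receive can hold the largest possible string, the complete 64-character grid plus its terminator; MPI_Get_count on the returned status then gives the number of chars actually received.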