MPI_Scatter() followed by malloc results in a segfault - c

I'm programming with MPI and C, using the root rank to read data from a file and then distribute it to the remaining ranks. My MPI_Scatter works fine and I print out the values to make sure they're correct (and they are). My problem is that after allocating the structures, I get a segfault when trying to access them from ranks other than the root rank.
pr_graph * graph = malloc(sizeof(*graph));
....
MPI_Scatter(verticesCountArray, 1, MPI_INT, &(graph->nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
MPI_Scatter(edgesCountArray, 1, MPI_INT, &(graph->nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
for(int rank = 0; rank < numProcesses; rank++){
    if (rank == myrank){
        fprintf(stderr, "%d %d \n", graph->nvtxs, graph->nedges);
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        // graph->xadj[graph->nvtxs] = graph->nedges;
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
And my output is:
2 4
2 4
2 4
Which is correct. But when I uncomment the commented line, I get:
2 4
2 4
[phi01:07170] *** Process received signal ***
[phi01:07170] Signal: Segmentation fault (11)
[phi01:07170] Signal code: (128)
[phi01:07170] Failing at address: (nil)
[phi01:07170] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5740503390]
[phi01:07170] [ 1] ./pagerank[0x401188]
[phi01:07170] [ 2] ./pagerank[0x400c73]
[phi01:07170] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f5740149830]
[phi01:07170] [ 4] ./pagerank[0x400ce9]
[phi01:07170] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 7170 on node phi01 exited on signal 11 (Segmentation fault).
This suggests that only rank 0 could access the structure it allocated. Can anyone point out why? Thank you!
EDIT:
Plugging in hard-coded values for the two receive buffers does NOT segfault AND prints the correct values. It seems the error is rooted in the use of MPI_Scatter().
graph->nvtxs = 2;
graph->nedges = 4;
for(int rank = 0; rank < numProcesses; rank++){
    if (rank == myrank){
        fprintf(stderr, "%d %d \n", graph->nvtxs, graph->nedges);
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        graph->xadj[graph->nvtxs] = graph->nedges;
    }
    MPI_Barrier(MPI_COMM_WORLD);
}

I found a solution to the problem. I'll post it first then try to understand why it works.
pr_int * nvtxs = malloc(sizeof(pr_int));
pr_int * nedges = malloc(sizeof(pr_int));
MPI_Scatter(verticesCountArray, 1, MPI_INT, &(nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
MPI_Scatter(edgesCountArray, 1, MPI_INT, &(nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
graph->nvtxs = nvtxs;
graph->nedges = nedges;
for(int rank = 0; rank < numProcesses; rank++){
    if (rank == myrank){
        fprintf(stderr, "%d %d \n", graph->nvtxs, graph->nedges);
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        graph->xadj[graph->nvtxs] = graph->nedges;
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
I think I wasn't using actual buffers (pointers) for receiving, just regular variables. They might have been treated as pointers (address values) somewhere along the way, which would explain why the allocation sizes might have been crazy. I'm still not sure why I was able to print the values, however, or why rank 0 worked with no problems. Any ideas would be greatly appreciated! Thank you!
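For what it's worth, one detail that stands out in the original calls is the datatype mismatch: the root sends each count as MPI_INT while every rank receives it as MPI_UNSIGNED_LONG, so the sender and receiver disagree about the element type and size. A minimal sketch with matching types on both sides (hypothetical: it assumes verticesCountArray and edgesCountArray are plain int arrays and keeps the received counts in int temporaries) would look like:
// Hypothetical sketch: the send and receive datatypes should describe the same type.
int nvtxs = 0, nedges = 0;
MPI_Scatter(verticesCountArray, 1, MPI_INT, &nvtxs, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Scatter(edgesCountArray, 1, MPI_INT, &nedges, 1, MPI_INT, 0, MPI_COMM_WORLD);
graph->nvtxs = nvtxs;
graph->nedges = nedges;
If pr_int is actually unsigned long, then the root-side count arrays and both datatype arguments would need to use MPI_UNSIGNED_LONG instead.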

Related

How to send the last element array of each processor in MPI

I am struggling to write code that behaves like the following example, which is similar to the up-sweep phase of a prefix scan, without using the MPI_Scan function:
WholeArray[16] = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
Processor 0 got [0 , 1 , 2 , 3] , Processor 1 got [4 , 5 , 6 , 7]
Processor 2 got [8 , 9 , 10 , 11] , Processor 3 got [12 , 13 , 14 , 15]
To send and sum the last element of each local array over 2 strides:
(stride 1)
Processor 0 send Array[3] , Processor 1 receive from Processor 0 and add to Array[3]
Processor 2 send Array[3], Processor 3 receive from Processor 2 and add to Array[3]
(stride 2)
Processor 1 sends Array[3], Processor 3 receive from Processor 1 and add to Array[3]
Finally, I want to use MPI_Gather so that the result is:
WholeArray = [0 , 1 , 2 , 3 , 4 , 5 , 6 ,10 , 8 , 9 , 10 , 11 , 12 , 13 ,14 , 36]
I find it hard to write code that makes the program behave like the following 4-node example:
(1st stride) - Processor 0 send to Processor 1 and Processor 1 receive from Processor 0
(1st stride) - Processor 2 send to Processor 3 and Processor 3 receive from Processor 2
(2nd stride) - Processor 1 send to Processor 3 and Processor 3 receive from Processor 1
Here is the code that I have written so far:
int Send_Receive(int* my_input, int size_per_process, int rank, int size)
{
    int key = 1;
    int temp = my_input[size_per_process-1];
    while(key <= size/2)
    {
        if((rank+1) % key == 0)
        {
            if(rank/key % 2 == 0)
            {
                MPI_Send(&temp, 1, MPI_INT, rank+key, 0, MPI_COMM_WORLD);
            }
            else
            {
                MPI_Recv(&temp, 1, MPI_INT, rank-key, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                my_input[size_per_process] += temp;
            }
            key = 2 * key;
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }
    return (*my_input);
}
There are a few issues in your code, namely: 1) it always sends the same temp variable across processes.
MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
The temp variable is initialized before the loop:
int temp = my_input[size_per_process-1];
while(key <= size/2)
{ ...}
but never updated inside the loop. This leads to wrong results, since after the first stride the last element of the my_input array will be different for some processes. Instead you should do:
temp = localdata[size_per_process-1];
MPI_Send(&temp, 1, MPI_INT, rank+key, 0, MPI_COMM_WORLD);
Moreover, 2) the following statement
my_input[size_per_process]+= temp;
does not add temp to the last position of the array my_input. Instead, it should be:
my_input[size_per_process-1]+= temp;
Finally, 3) there are deadlock and infinite-loop issues. For starters, having a call to a collective communication routine such as MPI_Barrier inside a conditional is typically a big red flag. Instead of:
while(key <= size/2)
{
    if((rank+1) % key == 0){
        ...
        MPI_Barrier(MPI_COMM_WORLD);
    }
}
you should have:
while(key <= size/2)
{
    if((rank+1) % key == 0){
        ...
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
to ensure that every process calls the MPI_Barrier.
The infinite loop happens because the while condition depends on key being updated, but key is only updated when (rank+1) % key == 0 evaluates to true. Therefore, whenever that condition evaluates to false, the process never updates key and consequently gets stuck in an infinite loop.
A running example with all the problems fixed:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv){
    int rank, mpisize, total_size = 16;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpisize);

    int *data = NULL;
    if(rank == 0){
        data = malloc(total_size * sizeof(int));
        for(int i = 0; i < total_size; i++)
            data[i] = i;
    }

    int size_per_process = total_size / mpisize;
    int *localdata = malloc(size_per_process * sizeof(int));
    MPI_Scatter(data, size_per_process, MPI_INT, localdata, size_per_process, MPI_INT, 0, MPI_COMM_WORLD);

    int key = 1;
    int temp = 0;
    while(key <= mpisize/2){
        if((rank+1) % key == 0){
            if(rank/key % 2 == 0){
                temp = localdata[size_per_process-1];
                MPI_Send(&temp, 1, MPI_INT, rank+key, 0, MPI_COMM_WORLD);
            }
            else {
                MPI_Recv(&temp, 1, MPI_INT, rank-key, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                localdata[size_per_process-1] += temp;
            }
        }
        key = 2 * key;
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Gather(localdata, size_per_process, MPI_INT, data, size_per_process, MPI_INT, 0, MPI_COMM_WORLD);

    if(rank == 0){
        for(int i = 0; i < total_size; i++)
            printf("%d ", data[i]);
        printf("\n");
    }

    free(data);
    free(localdata);
    MPI_Finalize();
    return 0;
}
Input:
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
Output:
[0,1,2,3,4,5,6,10,8,9,10,11,12,13,14,36]

How to Send/Receive in MPI using all processors

This program is written using the C language and MPI. I am new to MPI and want to use all processors to do some calculations, including process 0. To learn this concept, I have written the following simple program. But this program hangs after receiving input from process 0 and won't send the results back to process 0.
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int number;
    int result;
    if (world_rank == 0)
    {
        number = -2;
        int i;
        for(i = 0; i < 4; i++)
        {
            MPI_Send(&number, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
        for(i = 0; i < 4; i++)
        { /* Error: can't get the results sent by the other processes below */
            MPI_Recv(&number, 1, MPI_INT, i, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Process 0 received number %d from i:%d\n", number, i);
        }
    }

    /* I want to do this without using an else statement here, so that I can use process 0 to do some calculations as well */
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("*Process %d received number %d from process 0\n", world_rank, number);
    result = world_rank + 1;
    MPI_Send(&result, 1, MPI_INT, 0, 99, MPI_COMM_WORLD); /* problem happens here when trying to send the result back to process 0 */

    MPI_Finalize();
}
Running it gives these results:
:$ mpicc test.c -o test
:$ mpirun -np 4 test
*Process 1 received number -2 from process 0
*Process 2 received number -2 from process 0
*Process 3 received number -2 from process 0
/* hangs here and will not continue */
If you can, please show me with an example or edit the above code if possible.
I don't really see what would be wrong with using two if statements surrounding the working domain. But anyway, here is an example of what could be done.
I modified your code to use collective communications, as they make much more sense than the series of sends/receives you used. Since the initial communication uses a uniform value, I use an MPI_Bcast(), which does the same thing in a single call.
Conversely, since the result values are all different, a call to MPI_Gather() is perfectly appropriate.
I also introduce a call to sleep() just to simulate that the processes are working for a while, prior to sending back their results.
The code now looks like this:
#include <mpi.h>
#include <stdlib.h> // for malloc and free
#include <stdio.h>  // for printf
#include <unistd.h> // for sleep

int main( int argc, char *argv[] ) {
    MPI_Init( &argc, &argv );
    int world_rank;
    MPI_Comm_rank( MPI_COMM_WORLD, &world_rank );
    int world_size;
    MPI_Comm_size( MPI_COMM_WORLD, &world_size );

    // sending the same number to all processes via broadcast from process 0
    int number = world_rank == 0 ? -2 : 0;
    MPI_Bcast( &number, 1, MPI_INT, 0, MPI_COMM_WORLD );
    printf( "Process %d received %d from process 0\n", world_rank, number );

    // Do something useful here
    sleep( 1 );
    int my_result = world_rank + 1;

    // Now collecting individual results on process 0
    int *results = world_rank == 0 ? malloc( world_size * sizeof( int ) ) : NULL;
    MPI_Gather( &my_result, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD );

    // Process 0 prints what it collected
    if ( world_rank == 0 ) {
        for ( int i = 0; i < world_size; i++ ) {
            printf( "Process 0 received result %d from process %d\n", results[i], i );
        }
        free( results );
    }

    MPI_Finalize();
    return 0;
}
After compiling it as follows:
$ mpicc -std=c99 simple_mpi.c -o simple_mpi
It runs and gives this:
$ mpiexec -n 4 ./simple_mpi
Process 0 received -2 from process 0
Process 1 received -2 from process 0
Process 3 received -2 from process 0
Process 2 received -2 from process 0
Process 0 received result 1 from process 0
Process 0 received result 2 from process 1
Process 0 received result 3 from process 2
Process 0 received result 4 from process 3
Actually, processes 1-3 are indeed sending the result back to processor 0. However, processor 0 is stuck in the first iteration of this loop:
for(i=0; i<4; i++)
{
    MPI_Recv(&number, 1, MPI_INT, i, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 0 received number %d from i:%d\n", number, i);
}
In the first MPI_Recv call, processor 0 blocks waiting to receive a message from itself with tag 99, a message that it has not sent yet.
Generally, it is a bad idea for a processor to send/receive messages to itself, especially with blocking calls. Process 0 already has the value in memory; it does not need to send it to itself.
However, a workaround is to start the receive loop from i=1:
for(i=1; i<4; i++)
{
    MPI_Recv(&number, 1, MPI_INT, i, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 0 received number %d from i:%d\n", number, i);
}
Running the code now will give you:
Process 1 received number -2 from process 0
Process 2 received number -2 from process 0
Process 3 received number -2 from process 0
Process 0 received number 2 from i:1
Process 0 received number 3 from i:2
Process 0 received number 4 from i:3
Process 0 received number -2 from process 0
Note that using MPI_Bcast and MPI_Gather as mentioned by Gilles is a much more efficient and standard way for data distribution/collection.

Matrix multiplication error with Open MPI

I'm trying to compute an NxN matrix multiplication using Open MPI and C. Everything runs as expected, except for the MPI_Bcast(). As far as I understand, the MASTER must broadcast matrix_2 to the rest of the WORKER processes. At the same time, when the WORKERS reach the MPI_Bcast() they should wait there until the selected process (in this case the MASTER) does the broadcast.
The error I'm getting is a segmentation fault with "Address not mapped", so it surely has something to do with the dynamic allocation of the matrices. What I do is send parts of matrix_1 to each process, and each one of them then does partial multiplications and additions with the previously broadcast matrix_2.
I know that the error must be in the MPI_Bcast(), because when I comment it out the program finishes correctly (but obviously without computing the product). There must be something I'm not aware of. I leave both the code and the error message I got. Thanks in advance.
CODE
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* MACROS */
#define MASTER_TO_SLAVE_TAG 1
#define SLAVE_TO_MASTER_TAG 4
#define MASTER 0
#define WORKER 1

int *matrix_1;
int *matrix_2;
int *result;
double start_time;
double end_time;
int procID;
int numProc;
int size, numRows, from, to;
int i,j,k;
MPI_Status status;
MPI_Request request;

void addressMatrixMemory(int);

int main(int argc, char *argv[]){
    size = atoi(argv[1]);
    MPI_Init (&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &procID);
    MPI_Comm_size(MPI_COMM_WORLD, &numProc);
    addressMatrixMemory(size);

    /* MASTER starts. */
    if(procID == MASTER){
        start_time = MPI_Wtime();
        for(i = 1; i < numProc; i++){
            numRows = size/(numProc - 1);
            from = (i - 1) * numRows;
            if(((i + 1) == numProc) && ((size % (numProc - 1))) != 0){
                to = size;
            } else {
                to = from + numRows;
            }
            MPI_Isend(&from, 1, MPI_INT, i, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &request);
            MPI_Isend(&to, 1, MPI_INT, i, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &request);
            MPI_Isend(matrix_1, (to - from) * size, MPI_INT, i, MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &request);
        }
    }
    MPI_Bcast(&matrix_2, size * size, MPI_INT, MASTER, MPI_COMM_WORLD);

    /* WORKERS task */
    if(procID >= WORKER){
        int row, col;
        int *matrix = malloc(sizeof(matrix_1[0])*size*size);
        MPI_Recv(&from, 1, MPI_INT, MASTER, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &status);
        MPI_Recv(&to, 1, MPI_INT, MASTER, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &status);
        MPI_Recv(matrix, (to - from) * size, MPI_INT, MASTER, MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &status);
        for(row = from; row < to; row++){
            for(col = 0; col < size; col++){
                result[row * size + col] = 0;
                for(k = 0; k < size; k++);
                    result[row * size + col] += matrix[row * size + k] * matrix_2[k * size + col];
            }
        }
        MPI_Isend(&from, 1, MPI_INT, MASTER, SLAVE_TO_MASTER_TAG, MPI_COMM_WORLD, &request);
        MPI_Isend(&to, 1, MPI_INT, MASTER, SLAVE_TO_MASTER_TAG + 1, MPI_COMM_WORLD, &request);
        MPI_Isend(&result[from], (to - from) * size, MPI_INT, MASTER, SLAVE_TO_MASTER_TAG + 2, MPI_COMM_WORLD, &request);
    }

    /* MASTER gathers WORKERS job. */
    if(procID == MASTER){
        for(i = 1; i < numProc; i++){
            MPI_Recv(&from, 1, MPI_INT, i, SLAVE_TO_MASTER_TAG, MPI_COMM_WORLD, &status);
            MPI_Recv(&to, 1, MPI_INT, i, SLAVE_TO_MASTER_TAG + 1, MPI_COMM_WORLD, &status);
            MPI_Recv(&result[from], (to - from) * size, MPI_INT, i, SLAVE_TO_MASTER_TAG + 2, MPI_COMM_WORLD, &status);
        }
        end_time = MPI_Wtime();
        printf("\nRunning Time = %f\n\n", end_time - start_time);
    }

    MPI_Finalize();
    free(matrix_1);
    free(matrix_2);
    free(result);
    return EXIT_SUCCESS;
}

void addressMatrixMemory(int n){
    matrix_1 = malloc(sizeof(matrix_1[0])*n*n);
    matrix_2 = malloc(sizeof(matrix_2[0])*n*n);
    result = malloc(sizeof(result[0])*n*n);

    /* Matrix init with values between 1 and 100. */
    srand(time(NULL));
    int r = rand() % 100 + 1;
    int i;
    for(i = 0; i < n*n; i++){
        matrix_1[i] = r;
        r = rand() % 100 + 1;
        matrix_2[i] = r;
        r = rand() % 100 + 1;
    }
}
ERROR MESSAGE
[tuliansPC:28270] *** Process received signal ***
[tuliansPC:28270] Signal: Segmentation fault (11)
[tuliansPC:28270] Signal code: Address not mapped (1)
[tuliansPC:28270] Failing at address: 0x603680
[tuliansPC:28270] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f0a98ce0340]
[tuliansPC:28270] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x97ffe) [0x7f0a9899fffe]
[tuliansPC:28270] [ 2] /usr/lib/libmpi.so.1(opal_convertor_pack+0x129) [0x7f0a98fef779]
[tuliansPC:28270] [ 3] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_prepare_src+0x1fd) [0x7f0a923c385d]
[tuliansPC:28270] [ 4] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_rndv+0x1dc) [0x7f0a93245c9c]
[tuliansPC:28270] [ 5] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x8ec) [0x7f0a9323856c]
[tuliansPC:28270] [ 6] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x3fc) [0x7f0a914f49fc]
[tuliansPC:28270] [ 7] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_pipeline+0xbc) [0x7f0a914f4d5c]
[tuliansPC:28270] [ 8] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0x134) [0x7f0a914ec7a4]
[tuliansPC:28270] [ 9] /usr/lib/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x64) [0x7f0a917096a4]
[tuliansPC:28270] [10] /usr/lib/libmpi.so.1(MPI_Bcast+0x13d) [0x7f0a98f5678d]
[tuliansPC:28270] [11] ej5Exec() [0x400e8c]
[tuliansPC:28270] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f0a98929ec5]
[tuliansPC:28270] [13] ej5Exec() [0x400ac9]
[tuliansPC:28270] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 28270 on node tuliansPC exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Let's start with the first problem that jumps out: you're using non-blocking communication incorrectly. MPI_Isend is a non-blocking send, which means that when you call it, all you are really doing is telling MPI about a message that you'd like to send at some point in the future. It may get sent right then, it may not. To guarantee that the data is actually sent, you need to complete the call with something like MPI_Wait. Usually, when people use non-blocking calls (MPI_Isend), they don't mix them with blocking calls (MPI_Recv). If you use all non-blocking calls, you can complete them all with a single function, MPI_Waitall.
Try fixing these issues first and see if that solves your problem. Just because you commented out the collective doesn't mean the other issues weren't there. MPI programs can be notoriously difficult to debug because of weird behavior like this.
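As a hedged illustration of that completion pattern (not a fix for everything above), the master's loop could keep one request per MPI_Isend and complete them all with MPI_Waitall; the froms/tos arrays and the row offset on matrix_1 are hypothetical additions here, so that every send buffer stays valid and distinct until completion:
/* Illustrative sketch only: complete every MPI_Isend before the buffers are reused. */
MPI_Request requests[3 * (numProc - 1)];
int froms[numProc], tos[numProc];   /* hypothetical per-worker copies of from/to */
int r = 0;
for(i = 1; i < numProc; i++){
    numRows = size / (numProc - 1);
    froms[i] = (i - 1) * numRows;
    tos[i] = (i + 1 == numProc) ? size : froms[i] + numRows;
    MPI_Isend(&froms[i], 1, MPI_INT, i, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &requests[r++]);
    MPI_Isend(&tos[i], 1, MPI_INT, i, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &requests[r++]);
    MPI_Isend(matrix_1 + froms[i] * size, (tos[i] - froms[i]) * size, MPI_INT, i,
              MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &requests[r++]);
}
MPI_Waitall(r, requests, MPI_STATUSES_IGNORE); /* block until all the sends have completed */
The same idea applies on the worker side: the three MPI_Isend calls that return the results should also be completed (with MPI_Wait or MPI_Waitall) before from, to, and result are reused or the program exits.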

Error in rank ID using MPI_Reduce

Well, I'm doing some homework using MPI + C. In fact, I just wrote a small program for programming assignment 3.2 from Peter Pacheco's book, An Introduction to Parallel Programming. The code seems to work for 3 or 5 processes... but when I try more than 6 processes, the program breaks.
I'm using a very "bad" debugging approach, which is to put in some printfs to trace where problems are occurring. Using this "method" I discovered that after MPI_Reduce some strange behaviour occurs and my program gets confused about the rank IDs; specifically, rank 0 disappears and one very large (and erroneous) rank appears.
My code is below, and after it I'm posting the output for 3 and 9 processes... I'm running with
mpiexec -n X ./name_of_program
where X is the number of processes.
My code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(void)
{
    MPI_Init(NULL,NULL);
    long long int local_toss=0, local_num_tosses=-1, local_tosses_in_circle=0, global_tosses_in_circle=0;
    double local_x=0.0, local_y=0.0, pi_estimate=0.0;
    int comm_sz, my_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        printf("\nEnter the number of dart tosses: ");
        fflush(stdout);
        scanf("%lld",&local_num_tosses);
        fflush(stdout);
    }

    //
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Bcast( &local_num_tosses, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    srand( rand() ); //tried to improve randomness here!
    for (local_toss=0; local_toss<local_num_tosses; local_toss++) {
        local_x = (-1) + (double)rand() / (RAND_MAX / 2);
        local_y = (-1) + (double)rand() / (RAND_MAX / 2);
        if ( (local_x*local_x + local_y*local_y) <= 1 ) { local_tosses_in_circle++; }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Reduce
    (
        &local_tosses_in_circle,
        &global_tosses_in_circle,
        comm_sz,
        MPI_LONG_LONG_INT,
        MPI_SUM,
        0,
        MPI_COMM_WORLD
    );

    printf("\n\nDEBUG: myrank = %d, comm_size = %d", my_rank, comm_sz);
    fflush(stdout);
    MPI_Barrier(MPI_COMM_WORLD);

    if (my_rank == 0) {
        pi_estimate = ( (double)(4*global_tosses_in_circle) )/( (double) comm_sz*local_num_tosses );
        printf("\nPi estimate = %1.5lf \n", pi_estimate);
        fflush(stdout);
    }

    MPI_Finalize();
    return 0;
}
Now, 2 outputs:
(i) For 3 processes:
Enter the number of dart tosses: 1000000
DEBUG: myrank = 0, comm_size = 3
DEBUG: myrank = 1, comm_size = 3
DEBUG: myrank = 2, comm_size = 3
Pi estimate = 3.14296
(ii) For 9 processes (note that the \n output is strange; sometimes it does not work):
Enter the number of dart tosses: 10000000
DEBUG: myrank = 1, comm_size = 9
DEBUG: myrank = 7, comm_size = 9
DEBUG: myrank = 3, comm_size = 9
DEBUG: myrank = 2, comm_size = 9DEBUG: myrank = 5, comm_size = 9
DEBUG: myrank = 8, comm_size = 9
DEBUG: myrank = 6, comm_size = 9
DEBUG: myrank = 4, comm_size = 9DEBUG: myrank = -3532887, comm_size = 141598939[PC:06511] *** Process received signal ***
[PC:06511] Signal: Segmentation fault (11)
[PC:06511] Signal code: (128)
[PC:06511] Failing at address: (nil)
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 6511 on node PC exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
It works for me when the third argument of MPI_Reduce is 1, not comm_sz (because the number of elements in each buffer is 1):
MPI_Reduce
(
    &local_tosses_in_circle,
    &global_tosses_in_circle,
    1, // instead of comm_sz
    MPI_LONG_LONG_INT,
    MPI_SUM,
    0,
    MPI_COMM_WORLD
);
When you increase the number of processes, MPI_Reduce overwrites other variables on the function's stack, e.g. my_rank and comm_sz, and corrupts the data.
Also, I don't think you need any of the MPI_Barrier statements. MPI_Reduce and MPI_Bcast are blocking anyway.
I wouldn't worry about the newlines. They are not missing, they just end up somewhere else in the output, probably because many processes write to stdout at the same time.
By the way: Debugging using printf's is very common.

Very strange MPI behavior

I wrote a program in MPI where a value goes around each processor in a sort of ring fashion x times (for example, if I wanted it to go twice around the "ring" of four processors, it would go to 0, 1, 2, 3, 0, 1 .... 3).
Everything compiled fine, but when I ran the program on my Ubuntu VM it never output anything. It wouldn't even print the first output. Can anyone explain what's going on?
This is my code:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv){
    int rank, size, tag, next, from, num;
    tag = 201;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    next = (rank + 1)/ size;
    from = (rank - 1)/size;

    if (rank == 0){
        printf("How many times around the ring? :: ");
        scanf ("%d", &num);
        MPI_Send(&num, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    }

    do{
        MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
        printf("Process %d received %d from process %d\n", rank, num, status.MPI_SOURCE);
        if (rank == 0){
            num--;
            printf("Process 0 has decremented the number\n");
        }
        printf("Process %d sending %d to process %d\n", rank, num, next);
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
    } while (num > 0);

    printf("Process %d has exited", rank);
    if (rank == 0){
        MPI_Recv(&num, 1, MPI_INT, size - 1, tag, MPI_COMM_WORLD, &status);
        printf("Process 0 has received the last round, exiting");
    }

    MPI_Finalize();
    return 0;
}
There's a problem with your neighbour assignment. If we insert the following line after the next/from calculation
printf("Rank %d: from = %d, next = %d\n", rank, from, next);
we get:
$ mpirun -np 4 ./ring
Rank 0: from = 0, next = 0
Rank 1: from = 0, next = 0
Rank 2: from = 0, next = 0
Rank 3: from = 0, next = 1
You want something more like
next = (rank + 1) % size;
from = (rank - 1 + size) % size;
which gives
$ mpirun -np 4 ./ring
Rank 0: from = 3, next = 1
Rank 1: from = 0, next = 2
Rank 2: from = 1, next = 3
Rank 3: from = 2, next = 0
and after that your code seems to work.
Whether your code is correct or not, your first printf should produce output.
If you have no messages printed at all, not even the printf in the "if (rank == 0)" block, then it could be a problem with your VM. Are you sure you have a network interface activated on that VM?
If the answer is yes, it might be useful to check its compatibility with MPI by looking at the Open MPI FAQ on TCP questions. Sections 7 (How do I tell Open MPI which TCP networks to use?) and 13 (Does Open MPI support virtual IP interfaces?) both seem relevant to any possible problems with running MPI in a virtual machine.
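For a single-node run inside a VM, one way to experiment along those lines (a hedged example: the exact BTL component names depend on the Open MPI version, and "lo" is assumed to be the loopback interface) is to pass MCA parameters to mpirun:
$ mpirun --mca btl_tcp_if_include lo -np 4 ./ring
$ mpirun --mca btl self,sm -np 4 ./ring
The first line restricts the TCP BTL to the loopback interface; the second skips TCP entirely and uses the shared-memory transport between processes on the same node.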
