Well, I'm doing some homework using MPI + C. In fact, I just wrote a small program for programming assignment 3.2 from Peter Pacheco's book, An Introduction to Parallel Programming. The code seems to work for 3 or 5 processes... but when I try more than 6 processes, the program breaks.
I'm using a very "bad" debugging approach, which is to put some printfs to trace where the problems occur. Using this "method" I discovered that after MPI_Reduce some strange behaviour occurs and my program gets confused about the rank IDs; specifically, rank 0 disappears and one very large (and erroneous) rank appears.
My code is below, and after it I'm posting the output for 3 and 9 processes... I'm running with
mpiexec -n X ./name_of_program
where X is the number of processes.
My code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(void)
{
    MPI_Init(NULL, NULL);

    long long int local_toss = 0, local_num_tosses = -1, local_tosses_in_circle = 0, global_tosses_in_circle = 0;
    double local_x = 0.0, local_y = 0.0, pi_estimate = 0.0;
    int comm_sz, my_rank;

    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        printf("\nEnter the number of dart tosses: ");
        fflush(stdout);
        scanf("%lld", &local_num_tosses);
        fflush(stdout);
    }

    //
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Bcast( &local_num_tosses, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    srand( rand() ); //tried to improve randomness here!
    for (local_toss = 0; local_toss < local_num_tosses; local_toss++) {
        local_x = (-1) + (double)rand() / (RAND_MAX / 2);
        local_y = (-1) + (double)rand() / (RAND_MAX / 2);
        if ( (local_x*local_x + local_y*local_y) <= 1 ) { local_tosses_in_circle++; }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Reduce
    (
        &local_tosses_in_circle,
        &global_tosses_in_circle,
        comm_sz,
        MPI_LONG_LONG_INT,
        MPI_SUM,
        0,
        MPI_COMM_WORLD
    );

    printf("\n\nDEBUG: myrank = %d, comm_size = %d", my_rank, comm_sz);
    fflush(stdout);
    MPI_Barrier(MPI_COMM_WORLD);

    if (my_rank == 0) {
        pi_estimate = ( (double)(4*global_tosses_in_circle) ) / ( (double) comm_sz*local_num_tosses );
        printf("\nPi estimate = %1.5lf \n", pi_estimate);
        fflush(stdout);
    }

    MPI_Finalize();
    return 0;
}
Now, 2 outputs:
(i) For 3 processes:
Enter the number of dart tosses: 1000000
DEBUG: myrank = 0, comm_size = 3
DEBUG: myrank = 1, comm_size = 3
DEBUG: myrank = 2, comm_size = 3
Pi estimate = 3.14296
(ii) For 9 processes (note that the \n output is strange; sometimes it does not work):
Enter the number of dart tosses: 10000000
DEBUG: myrank = 1, comm_size = 9
DEBUG: myrank = 7, comm_size = 9
DEBUG: myrank = 3, comm_size = 9
DEBUG: myrank = 2, comm_size = 9DEBUG: myrank = 5, comm_size = 9
DEBUG: myrank = 8, comm_size = 9
DEBUG: myrank = 6, comm_size = 9
DEBUG: myrank = 4, comm_size = 9DEBUG: myrank = -3532887, comm_size = 141598939[PC:06511] *** Process received signal ***
[PC:06511] Signal: Segmentation fault (11)
[PC:06511] Signal code: (128)
[PC:06511] Failing at address: (nil)
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 6511 on node PC exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
It works for me when the third argument of MPI_Reduce is 1, not comm_sz (because the number of elements in each buffer is 1):
MPI_Reduce
(
    &local_tosses_in_circle,
    &global_tosses_in_circle,
    1, //instead of comm_sz
    MPI_LONG_LONG_INT,
    MPI_SUM,
    0,
    MPI_COMM_WORLD
);
With a count of comm_sz, MPI_Reduce writes comm_sz long long values into global_tosses_in_circle, which only has room for one. As you increase the number of processes, the extra elements overwrite other variables on the function's stack, e.g. my_rank and comm_sz, and corrupt the data.
Also, I don't think you need any of the MPI_Barrier calls: MPI_Reduce and MPI_Bcast are blocking anyway.
I wouldn't worry about the newlines. They are not missing, just somewhere else in the output, probably because many processes write to stdout at the same time.
By the way: Debugging using printf's is very common.
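Putting the pieces together, here is a minimal sketch of the whole program with the count fixed to 1 and the barriers removed. It is a sketch, not a drop-in replacement; I also seed each rank differently, since srand(rand()) seeds every rank identically and they all draw the same numbers:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(void)
{
    long long int local_num_tosses = 0, local_tosses_in_circle = 0, global_tosses_in_circle = 0;
    int comm_sz, my_rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        printf("Enter the number of dart tosses: ");
        fflush(stdout);
        scanf("%lld", &local_num_tosses);
    }

    /* No barriers needed: MPI_Bcast blocks until every rank has the value. */
    MPI_Bcast(&local_num_tosses, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD);

    srand((unsigned)(my_rank + 1)); /* a different seed per rank */
    for (long long int t = 0; t < local_num_tosses; t++) {
        double x = -1.0 + (double)rand() / (RAND_MAX / 2);
        double y = -1.0 + (double)rand() / (RAND_MAX / 2);
        if (x * x + y * y <= 1.0)
            local_tosses_in_circle++;
    }

    /* Count is 1: each rank contributes exactly one long long. */
    MPI_Reduce(&local_tosses_in_circle, &global_tosses_in_circle, 1,
               MPI_LONG_LONG_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        double pi_estimate = (double)(4 * global_tosses_in_circle)
                           / ((double)comm_sz * local_num_tosses);
        printf("Pi estimate = %1.5lf\n", pi_estimate);
    }

    MPI_Finalize();
    return 0;
}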
Related
#include "mpi.h"
#include <stdio.h>
int main(int argc,char *argv[]){
int numtasks, rank, rc, count, tag=1, i =0;
MPI_Status Stat;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) //for process 0 we print received messages
{
for(i=0; i< 9; i ++){
printf("value of i is: %d\n",i );
rc = MPI_Recv(&inmsg, 1, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &Stat);
printf("Task %d: Received %d char(s) from task %d with tag %d \n", rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG);
}
}
else //for the other 9 processes
{
if(rank % 2 == 0){ //if rank is an even number
rc = MPI_Send(&outmsg, 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD); //send message to process with rank 0
}
}
MPI_Finalize();
}
//
This program is run with 10 processes. The process with rank 0 receives messages and prints them out if the source process has an even-numbered rank. Processes with a rank other than 0 send the process with rank 0 a message containing the character 'x'.
Now, regarding rank 0: it has a for loop that loops 9 times. In the loop it prints out the value of the iterating variable i, plus the received character count and the source process.
However, when I run my program it does not terminate.
The output looks like this:
Task 0: Received 0 char(s) from task 2 with tag 1
value of i is: 1
Task 0: Received 0 char(s) from task 6 with tag 1
value of i is: 2
Task 0: Received 0 char(s) from task 4 with tag 1
value of i is: 3
Task 0: Received 0 char(s) from task 8 with tag 1
value of i is: 4
How do I get it to print the other values of i such as 5,6,7,8,9?
You're using a master-slave architecture for parallel processing: your process 0 is the master and is waiting for input from the 9 other processes, but in your code only the processes with an even rank ever send anything, namely processes 2, 4, 6 and 8.
You didn't define any behavior for processes 1, 3, 5, 7 and 9, so the master is still waiting for them; hence the program keeps waiting for the parallel processes to finish.
You need to complete your source code here:
if(rank % 2 == 0){ //if rank is an even number
    rc = MPI_Send(&outmsg, 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD); //send message to process with rank 0
}else{
    //logic for process 1,3,5,7,9
}
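If all you need is for the nine receives on rank 0 to complete, the simplest completion is to have every non-root rank send one character; a sketch, assuming outmsg holds the 'x' being sent:
else //for the other 9 processes: every one of them sends one 'x' to rank 0
{
    rc = MPI_Send(&outmsg, 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
}
Alternatively, if only the even ranks are supposed to send, keep the if (rank % 2 == 0) test and shrink rank 0's receive loop to the number of actual senders (four with 10 processes).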
I'm programming using MPI and C and I'm using the root rank to read data from a file and then distribute it to the remaining ranks. My MPI_Scatter works fine and I print out the values to make sure they're correct (and they are). My problem is that, after allocating the structures, I get a segfault when trying to access them from ranks other than the root rank.
pr_graph * graph = malloc(sizeof(*graph));
....
MPI_Scatter(verticesCountArray, 1, MPI_INT, &(graph->nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
MPI_Scatter(edgesCountArray, 1, MPI_INT, &(graph->nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);

for(int rank = 0; rank < numProcesses; rank++){
    if (rank == myrank){
        fprintf(stderr, "%d %d \n", graph->nvtxs, graph->nedges);
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        // graph->xadj[graph->nvtxs] = graph->nedges;
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
And my output is:
2 4
2 4
2 4
Which is correct. But when I uncomment the commented line, I get:
2 4
2 4
[phi01:07170] *** Process received signal ***
[phi01:07170] Signal: Segmentation fault (11)
[phi01:07170] Signal code: (128)
[phi01:07170] Failing at address: (nil)
[phi01:07170] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5740503390]
[phi01:07170] [ 1] ./pagerank[0x401188]
[phi01:07170] [ 2] ./pagerank[0x400c73]
[phi01:07170] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f5740149830]
[phi01:07170] [ 4] ./pagerank[0x400ce9]
[phi01:07170] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 7170 on node phi01 exited on signal 11 (Segmentation fault).
This means that only rank 0 could access the structure it allocated. Can anyone point out why? Thank you!
EDIT:
Plugging in hard-coded values for the two receive buffers does NOT segfault and prints out the correct values. It seems that the error is rooted in the use of MPI_Scatter().
graph->nvtxs = 2;
graph->nedges = 4;

for(int rank = 0; rank < numProcesses; rank++){
    if (rank == myrank){
        fprintf(stderr, "%d %d \n", graph->nvtxs, graph->nedges);
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        graph->xadj[graph->nvtxs] = graph->nedges;
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
I found a solution to the problem. I'll post it first then try to understand why it works.
pr_int * nvtxs = malloc(sizeof(pr_int));
pr_int * nedges = malloc(sizeof(pr_int));

MPI_Scatter(verticesCountArray, 1, MPI_INT, &(nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
MPI_Scatter(edgesCountArray, 1, MPI_INT, &(nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);

graph->nvtxs = nvtxs;
graph->nedges = nedges;

for(int rank = 0; rank < numProcesses; rank++){
    if (rank == myrank){
        fprintf(stderr, "%d %d \n", graph->nvtxs, graph->nedges);
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        graph->xadj[graph->nvtxs] = graph->nedges;
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
I think I wasn't using actual buffers (pointers) for receiving, just regular variables. They might have been converted to pointers (address values) during the call to malloc, and that's why the sizes used for the structure might have been crazy. I'm still not sure why I was able to print the values, however, or even why rank 0 worked with no problems. Any ideas would be greatly appreciated! Thank you!
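For comparison, here is a minimal, self-contained sketch of the pattern I'm trying to get right: one count scattered per rank, received into a plain local int, with the same datatype on both sides of MPI_Scatter. The array contents and types here are made up for illustration and are not taken from my real program:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int myrank, numProcesses;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);

    /* Root builds one count per rank (values invented for this sketch). */
    int *verticesCountArray = NULL;
    if (myrank == 0) {
        verticesCountArray = malloc(numProcesses * sizeof(int));
        for (int r = 0; r < numProcesses; r++)
            verticesCountArray[r] = 2;
    }

    /* Receive into a plain int; send and receive datatypes match. */
    int my_nvtxs = 0;
    MPI_Scatter(verticesCountArray, 1, MPI_INT,
                &my_nvtxs, 1, MPI_INT,
                0, MPI_COMM_WORLD);

    unsigned long *xadj = malloc((my_nvtxs + 1) * sizeof(*xadj));
    xadj[my_nvtxs] = 0; /* safe: the buffer really has my_nvtxs + 1 entries */
    fprintf(stderr, "rank %d: nvtxs = %d\n", myrank, my_nvtxs);

    free(xadj);
    free(verticesCountArray);
    MPI_Finalize();
    return 0;
}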
This program is written using the C language and MPI. I am new to MPI and want to use all processors to do some calculations, including process 0. To learn this concept, I have written the following simple program. But this program hangs at the bottom after receiving input from process 0 and won't send results back to process 0.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int number;
    int result;

    if (world_rank == 0)
    {
        number = -2;
        int i;
        for(i = 0; i < 4; i++)
        {
            MPI_Send(&number, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
        for(i = 0; i < 4; i++)
        {   /* Error: can't get the results sent by the other processes below */
            MPI_Recv(&number, 1, MPI_INT, i, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Process 0 received number %d from i:%d\n", number, i);
        }
    }

    /* I want to do this without using an else statement here, so that I can use process 0 to do some calculations as well */
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("*Process %d received number %d from process 0\n", world_rank, number);

    result = world_rank + 1;
    MPI_Send(&result, 1, MPI_INT, 0, 99, MPI_COMM_WORLD); /* the problem happens here when trying to send the result back to process 0 */

    MPI_Finalize();
}
Running it and getting results:
:$ mpicc test.c -o test
:$ mpirun -np 4 test
*Process 1 received number -2 from process 0
*Process 2 received number -2 from process 0
*Process 3 received number -2 from process 0
/* hangs here and will not continue */
If you can, please show me with an example or edit the above code if possible.
I don't really get what would be wrong with using two if statements surrounding the working domain. But anyway, here is an example of what could be done.
I modified your code to use collective communications, as they make much more sense than the series of sends/receives you used. Since the initial communication sends a uniform value, I use an MPI_Bcast(), which does the same in one single call.
Conversely, since the result values are all different, a call to MPI_Gather() is perfectly appropriate.
I also introduce a call to sleep(), just to simulate that the processes work for a while before sending back their results.
The code now looks like this:
#include <mpi.h>
#include <stdlib.h> // for malloc and free
#include <stdio.h> // for printf
#include <unistd.h> // for sleep
int main( int argc, char *argv[] ) {
    MPI_Init( &argc, &argv );

    int world_rank;
    MPI_Comm_rank( MPI_COMM_WORLD, &world_rank );
    int world_size;
    MPI_Comm_size( MPI_COMM_WORLD, &world_size );

    // sending the same number to all processes via broadcast from process 0
    int number = world_rank == 0 ? -2 : 0;
    MPI_Bcast( &number, 1, MPI_INT, 0, MPI_COMM_WORLD );
    printf( "Process %d received %d from process 0\n", world_rank, number );

    // Do something useful here
    sleep( 1 );
    int my_result = world_rank + 1;

    // Now collecting individual results on process 0
    int *results = world_rank == 0 ? malloc( world_size * sizeof( int ) ) : NULL;
    MPI_Gather( &my_result, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD );

    // Process 0 prints what it collected
    if ( world_rank == 0 ) {
        for ( int i = 0; i < world_size; i++ ) {
            printf( "Process 0 received result %d from process %d\n", results[i], i );
        }
        free( results );
    }

    MPI_Finalize();
    return 0;
}
After compiling it as follows:
$ mpicc -std=c99 simple_mpi.c -o simple_mpi
It runs and gives this:
$ mpiexec -n 4 ./simple_mpi
Process 0 received -2 from process 0
Process 1 received -2 from process 0
Process 3 received -2 from process 0
Process 2 received -2 from process 0
Process 0 received result 1 from process 0
Process 0 received result 2 from process 1
Process 0 received result 3 from process 2
Process 0 received result 4 from process 3
Actually, processes 1-3 are indeed sending the result back to processor 0. However, processor 0 is stuck in the first iteration of this loop:
for(i=0; i<4; i++)
{
    MPI_Recv(&number, 1, MPI_INT, i, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 0 received number %d from i:%d\n", number, i);
}
In the first MPI_Recv call, processor 0 will block waiting to receive a message from itself with tag 99, a message that it has not sent yet.
Generally, it is a bad idea for a process to send/receive messages to itself, especially using blocking calls. Rank 0 already has the value in memory; it does not need to send it to itself.
However, a workaround is to start the receive loop from i=1:
for(i=1; i<4; i++)
{
    MPI_Recv(&number, 1, MPI_INT, i, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 0 received number %d from i:%d\n", number, i);
}
Running the code now will give you:
Process 1 received number -2 from process 0
Process 2 received number -2 from process 0
Process 3 received number -2 from process 0
Process 0 received number 2 from i:1
Process 0 received number 3 from i:2
Process 0 received number 4 from i:3
Process 0 received number -2 from process 0
Note that using MPI_Bcast and MPI_Gather as mentioned by Gilles is a much more efficient and standard way for data distribution/collection.
How do I pair processes using MPI in C? It's a tree-structured approach. Process 0 should be adding the values from all of the other even processes, which they are paired with. I only need it to work for powers of 2.
Should I be using MPI_Reduce instead of MPI_Send/MPI_Recv? If so, why?
My program doesn't seem to get past the for loop inside the first if statement. Why?
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <mpi.h>
int main(void){
    int sum, comm_sz, my_rank, i, next, value;
    int divisor = 2;
    int core_difference = 1;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    srandom((unsigned)time(NULL) + my_rank);
    value = random() % 10;

    //process should receive and add
    if (my_rank % divisor == 0){
        printf("IF----");
        printf("Process %d generates: %d\n", my_rank, value);
        for (i = 0; i < comm_sz; i++)
        {
            MPI_Recv(&value, 1, MPI_INT, i, my_rank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += value;
            printf("Current Sum=: %d\n", sum);
        }
        printf("The NEW divisor is:%d\n", divisor);
        divisor *= 2;
        core_difference *= 2;
    }
    //sending the random value - no calculation
    else if (my_rank % divisor == core_difference){
        printf("ELSE----");
        printf("Process %d generates: %d\n", my_rank, value);
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    else
        if (my_rank == 0)
            printf("Sum=: %d\n", sum);

    MPI_Finalize();
    return 0;
}
The problem is that your receiving ranks all start by waiting for a message from rank 0, and rank 0 starts by waiting for a message from itself. If I add a print statement before each send and receive with the processes involved in the operation, here's the output:
$ mpiexec -n 8 ./a.out
IF----Process 0 generates: 5
ELSE----Process 1 generates: 1
ELSE----Process 3 generates: 1
IF----Process 4 generates: 9
ELSE----Process 5 generates: 7
IF----Process 6 generates: 2
ELSE----Process 7 generates: 0
0 RECV FROM 0
1 SEND TO 0
3 SEND TO 0
4 RECV FROM 0
5 SEND TO 0
6 RECV FROM 0
7 SEND TO 0
IF----Process 2 generates: 7
2 RECV FROM 0
1 SEND TO 0 DONE
3 SEND TO 0 DONE
5 SEND TO 0 DONE
7 SEND TO 0 DONE
Obviously, everyone is hanging while waiting for rank 0, including rank 0. If you want to send to yourself, you'll need to use either MPI_Sendrecv to do both the send and receive at the same time or use nonblocking sends and receives (MPI_Isend/MPI_Irecv).
As you said, another option would be to use collectives, but if you do that, you'll need to create new subcommunicators. Collectives require all processes in the communicator to participate. You can't pick just a subset.
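As for the MPI_Reduce question: if every rank in MPI_COMM_WORLD contributes its value, the hand-rolled pairing disappears entirely, because the library performs the reduction (typically as a tree) internally. A minimal sketch of that approach, assuming you really do want the sum of every rank's random value on rank 0:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

int main(void){
    int comm_sz, my_rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    srandom((unsigned)time(NULL) + my_rank);
    int value = random() % 10;
    printf("Process %d generates: %d\n", my_rank, value);

    /* Every rank contributes one int; the summed result lands on rank 0. */
    int sum = 0;
    MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("Sum = %d\n", sum);

    MPI_Finalize();
    return 0;
}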
I wrote an MPI program where a message goes around each processor in a sort of ring fashion x times (for example, if I wanted it to go twice around the "ring" of four processors, it would go to 0, 1, 2, 3, 0, 1, ... 3).
Everything compiled fine, but when I ran the program on my Ubuntu VM it never output anything. It wouldn't even print the first output. Can anyone explain what's going on?
This is my code:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv){
    int rank, size, tag, next, from, num;
    tag = 201;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    next = (rank + 1)/ size;
    from = (rank - 1)/size;

    if (rank == 0){
        printf("How many times around the ring? :: ");
        scanf ("%d", &num);
        MPI_Send(&num, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    }

    do{
        MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
        printf("Process %d received %d from process %d\n", rank, num, status.MPI_SOURCE);
        if (rank == 0){
            num--;
            printf("Process 0 has decremented the number\n");
        }
        printf("Process %d sending %d to process %d\n", rank, num, next);
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
    } while (num > 0);

    printf("Process %d has exited", rank);

    if (rank == 0){
        MPI_Recv(&num, 1, MPI_INT, size - 1, tag, MPI_COMM_WORLD, &status);
        printf("Process 0 has received the last round, exiting");
    }

    MPI_Finalize();
    return 0;
}
There's a problem with your neighbour assignment. If we insert the following line after the next/from calculation:
printf("Rank %d: from = %d, next = %d\n", rank, from, next);
we get:
$ mpirun -np 4 ./ring
Rank 0: from = 0, next = 0
Rank 1: from = 0, next = 0
Rank 2: from = 0, next = 0
Rank 3: from = 0, next = 1
You want something more like
next = (rank + 1) % size;
from = (rank - 1 + size) % size;
which gives
$ mpirun -np 4 ./ring
Rank 0: from = 3, next = 1
Rank 1: from = 0, next = 2
Rank 2: from = 1, next = 3
Rank 3: from = 2, next = 0
and after that your code seems to work.
Whether or not the rest of your code is correct, your first printf should produce output.
If you have no messages printed at all, not even the printf in the "if (rank == 0)" block, then it could be a problem with your VM. Are you sure you have a network interface activated on that VM?
If the answer is yes, it might be useful to check its compatibility with MPI in the Open MPI FAQ on TCP questions. Section 7 (How do I tell Open MPI which TCP networks to use?) and section 13 (Does Open MPI support virtual IP interfaces?) both seem relevant to possible problems with running MPI in a virtual machine.
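For example, Open MPI lets you restrict TCP communication to a specific interface via the btl_tcp_if_include MCA parameter; the interface name below is only an example, so check what the VM actually has with ifconfig or ip addr first:
$ mpirun --mca btl_tcp_if_include eth0 -np 4 ./ring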