I'm studying a bit of MPI and decided to test it with a program split across separate object files: main.c holds the main program and function.c holds a function.
Only function.c uses MPI. I compile as follows:
gcc -c main.c
to create main.o, and mpicc -c function.c to create function.o. Of course I also created function.h.
I link with mpicc -o program main.o function.o
Here is main.c
#include <stdio.h>
#include "function.h"
int main(int argc, char *argv[])
{
printf("Hello\n");
function();
printf("Bye\n");
}
Only function has the MPI code, but when I run the program with mpiexec -np 2 I get
Hello
Hello
----- function job here -----
Bye
Bye
But I wanted it to be
Hello
------ function job -----
Bye
What can I do?
Your whole program runs on both of the processes you requested with -np 2. A common way to prevent duplicate printouts, final results, etc., is to have a single process do those things by checking its rank first. Like:
int id;
MPI_Comm_rank(MPI_COMM_WORLD, &id);
if (id == 0) {
printf("only process %d does this one\n", id);
}
printf("hello from process %d\n", id); // all processes do this one
When starting out in MPI I found it helpful to print the rank along with whatever partial results or data each process was dealing with. It helped me make more sense of what was happening.
Basically mpirun -np 2 starts 2 identical processes, and you have to use the MPI_Comm_rank function to check each process's rank.
Here is a quick snippet:
int main(int argc, char **argv)
{
int myrank;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
printf("Hello\n");
function();
MPI_Barrier(MPI_COMM_WORLD);
printf("Done\n");
} else {
function();
MPI_Barrier(MPI_COMM_WORLD);
}
MPI_Finalize();
return 0;
}
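I generally prefer this method for printing data.
It involves barriers, so you must be careful: every process in the communicator has to reach each barrier.
/* num_procs comes from MPI_Comm_size, my_rank from MPI_Comm_rank */
for (i = 0; i < num_procs; i++) {
    if (i == my_rank) {
        printf("rank %d: ...\n", my_rank);   /* your output here */
        fflush(stdout);
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
If only a subset of the processes prints, run the barrier on a communicator that contains just those processes.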
Another method is for every process to write its output to a dedicated file. This way:
you don't need a barrier
you don't lose the printfs of any process
your output is explicit, so there is no clutter while debugging.
Code:
char my_op_file_str[64];
snprintf(my_op_file_str, sizeof my_op_file_str, "output%d", my_rank);
freopen(my_op_file_str, "w", stdout);   /* stdout now goes to this process's file */
Now use printf anywhere you like.
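A variant of the per-process file idea can be sketched with an explicit FILE * instead of taking over stdout. The helper names rank_log_name and open_rank_log are hypothetical, and the "output<rank>" naming simply follows the snippet above; in an MPI program the rank argument would come from MPI_Comm_rank.

```c
#include <assert.h>
#include <stdio.h>

/* Build the per-process file name, e.g. "output3" for rank 3.
 * Returns the number of characters written, or a negative value on error. */
int rank_log_name(char *buf, size_t len, int rank)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "output%d", rank);
}

/* Open this process's private log file for writing (hypothetical helper).
 * Returns NULL if the name does not fit or the file cannot be opened. */
FILE *open_rank_log(int rank)
{
    char name[64];
    int n = rank_log_name(name, sizeof name, rank);
    if (n < 0 || (size_t)n >= sizeof name)
        return NULL;
    return fopen(name, "w");
}
```

Each process would call open_rank_log once, write with fprintf(log, ...), and fclose(log) before it exits. Keeping stdout untouched means ordinary printf output still reaches the terminal.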
Below is a very basic MPI program
#include <mpi.h>
#include <stdio.h>
int main(int argc, char * argv[]) {
int rank;
int size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Barrier(MPI_COMM_WORLD);
printf("Hello from %d\n", rank);
MPI_Barrier(MPI_COMM_WORLD);
printf("Goodbye from %d\n", rank);
MPI_Barrier(MPI_COMM_WORLD);
printf("Hello 2 from %d\n", rank);
MPI_Barrier(MPI_COMM_WORLD);
printf("Goodbye 2 from %d\n", rank);
MPI_Finalize();
return 0;
}
You would expect the output to be (for 2 processes)
Hello from 0
Hello from 1
Goodbye from 1
Goodbye from 0
Hello 2 from 1
Hello 2 from 0
Goodbye 2 from 0
Goodbye 2 from 1
Or something similar (the hellos and goodbyes should be grouped, but process order is not guaranteed).
Here is my actual output:
Hello from 0
Goodbye from 0
Hello 2 from 0
Goodbye 2 from 0
Hello from 1
Goodbye from 1
Hello 2 from 1
Goodbye 2 from 1
Am I fundamentally misunderstanding what MPI_Barrier is supposed to do? From what I can tell, if I use it only once, then it gives me expected results, but any more than that and it seems to do absolutely nothing.
I realize many similar questions have been asked before, but the askers in the questions I viewed misunderstood the function of MPI_Barrier.
the hellos and goodbyes should be grouped
They should not. There is additional buffering inside the printf function, which is independent of MPI, and a second layer of buffering where stdout from multiple MPI processes is gathered onto a single user terminal.
printf just writes into an in-memory buffer of libc (glibc), which is only occasionally flushed to the real file descriptor (stdout; use fflush to flush the buffer). fprintf(stderr, ...) usually has less buffering than stdout.
Remote tasks are started by mpirun/mpiexec, usually through an ssh remote shell, which forwards stdout/stderr. ssh (and TCP too) buffers data, and reordering can occur when mpirun/mpiexec or another entity shows the data on your terminal, because several data streams are multiplexed into one.
What you get looks like this: the 4 strings from the first process are buffered and flushed at its exit (all strings were printed to stdout, which usually has a buffer of several kilobytes), and the 4 strings from the second process are likewise buffered until exit. Each group of 4 strings is sent by ssh or another launch method to your console as a single "packet", and your console just shows both packets of 4 lines each in some order, either "4_lines_packet_from_id_0; 4_lines_packet_from_id_1;" or "4_lines_packet_from_id_1;4_lines_packet_from_id_0;".
MPI_Barrier is supposed to do?
MPI_Barrier synchronizes parts of code, but it can't disable any buffering in libc/glibc printing and file I/O functions, nor in ssh or other remote shell.
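As a minimal sketch of the fflush suggestion above, the helper below prints one line and flushes it immediately. The name print_flushed and the "(rank): message" layout are assumptions made for illustration. This only removes the libc buffering; the launcher's forwarding of stdout can still reorder lines from different processes.

```c
#include <assert.h>
#include <stdio.h>

/* Print one line tagged with the caller's rank and push it out of the
 * libc buffer right away. Returns 0 on success, EOF on failure. */
int print_flushed(int rank, const char *msg)
{
    assert(msg != NULL);
    if (printf("(%d): %s\n", rank, msg) < 0)
        return EOF;
    return fflush(stdout);
}
```

An alternative with a similar effect is to call setvbuf(stdout, NULL, _IONBF, 0) once, before the first output, which makes stdout unbuffered for the rest of the run.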
If all of your processes run on machines with synchronized system clocks (they will be on a single machine, and they should be when ntpd runs on your cluster), you can add a timestamp field to every printf to check that the real order respects your barrier (gettimeofday reads the current time and has no extra buffering). You can then sort the timestamped output even if printf and ssh reorder messages.
#include <mpi.h>
#include <sys/time.h>
#include <stdio.h>
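void print_timestamped_message(int mpi_rank, const char *s)
{
struct timeval now;
gettimeofday(&now, NULL);
/* tv_sec and tv_usec have implementation-defined types, so cast them for printf */
printf("[%ld.%06ld](%d): %s\n", (long)now.tv_sec, (long)now.tv_usec, mpi_rank, s);
}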
int main(int argc, char * argv[]) {
int rank;
int size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Barrier(MPI_COMM_WORLD);
print_timestamped_message(rank, "First Hello");
MPI_Barrier(MPI_COMM_WORLD);
print_timestamped_message(rank, "First Goodbye");
MPI_Barrier(MPI_COMM_WORLD);
print_timestamped_message(rank, "Second Hello");
MPI_Barrier(MPI_COMM_WORLD);
print_timestamped_message(rank, "Second Goodbye");
MPI_Finalize();
return 0;
}
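The sorting step mentioned above could look roughly like this, assuming the lines have been collected into an array and follow the "[seconds.microseconds](rank): message" layout printed by the program. The function names parse_stamp, compare_stamped and sort_stamped are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Extract seconds and microseconds from a line such as
 * "[1700000000.000123](1): First Hello". Returns 1 on success, 0 otherwise. */
int parse_stamp(const char *line, long *sec, long *usec)
{
    assert(line != NULL && sec != NULL && usec != NULL);
    return sscanf(line, "[%ld.%ld]", sec, usec) == 2;
}

/* qsort comparator over an array of `const char *` lines: earlier first.
 * Lines that fail to parse are treated as time 0. */
int compare_stamped(const void *a, const void *b)
{
    const char *la = *(const char *const *)a;
    const char *lb = *(const char *const *)b;
    long sa = 0, ua = 0, sb = 0, ub = 0;
    parse_stamp(la, &sa, &ua);
    parse_stamp(lb, &sb, &ub);
    if (sa != sb)
        return (sa > sb) - (sa < sb);
    return (ua > ub) - (ua < ub);
}

/* Sort the collected lines in place by timestamp. */
void sort_stamped(const char **lines, size_t count)
{
    qsort(lines, count, sizeof *lines, compare_stamped);
}
```

After sorting, every "First Hello" should come before any "First Goodbye" if the barriers did their job, whatever order the terminal showed. The comparison relies on the microsecond field being zero-padded to six digits, as in the program above.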
Hi fellow programmers,
I wanted to write a simple multi-threaded program in C with pthread, but somehow pthread_join seems to hang.
It does not always happen: sometimes everything runs fine, the next time it hangs again, usually on the first thread.
I already reduced the code to the bare minimum so that I can exclude other problems. In the full code the threads did some computations, of course. But the problem still persists even with this very reduced code, on more than one machine and with different OSes.
strace shows me that the hang has something to do with a FUTEX_WAIT. The full last lines:
write(1, "Joining Thread 0...\n", 20Joining Thread 0...
) = 20
futex(0x7fff61718a20, FUTEX_WAIT, 1634835878, NULL
I tried to debug it with gdb, but my debugging skills are very limited, especially with multi-threaded programs.
I also tried compiling with different C standards (C99 and ANSI) and both pthread options (-lpthread, -pthread), but the problem persists.
The (reduced) code monte2.c:
#include <stdio.h>
#include <pthread.h>
void *monte(struct MonteArgs *args) {
pthread_exit(NULL);
}
int main(int argc, char **argv) {
int numThreads, numSamples;
int i;
if (argc != 3) {
printf("Usage: monte threads samples\n");
exit(1);
}
numThreads = atoi(argv[1]);
pthread_t threads[numThreads];
numSamples = atoi(argv[2]);
for (i=0; i<numThreads; i++) {
if (pthread_create(&threads[i], NULL, monte, NULL)) {
printf("Error Creating Thread %d!\n", i);
return 1;
}
}
for (i=0; i<numThreads; i++){
printf("Joining Thread %d...\n", i);
pthread_join(&threads[i], NULL);
}
printf("End!\n");
fflush(stdout);
return(0);
}
I compile with
gcc monte2.c -lpthread -o monte
and run with
./monte 3 100
where the first argument is the number of threads, and the second is actually not needed for the reduced code.
It's been a while since I've done multi-threaded C, but you shouldn't ignore compiler warnings :-). Compile with -Wall.
You should be seeing a warning like this:
note: expected 'pthread_t' but argument is of type 'pthread_t *'
int WINPTHREAD_API pthread_join(pthread_t t, void **res);
You are passing a pthread_t* when you should be passing pthread_t.
Refer to the pthread_join docs: http://man7.org/linux/man-pages/man3/pthread_join.3.html
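A minimal sketch of the corrected create/join pattern, with the handle passed by value to pthread_join. The names worker and run_and_join are hypothetical, and the thread function uses the void *(void *) signature that pthread_create expects.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Trivial thread body: returns immediately. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create `count` threads and join them all.
 * Returns the number of threads joined, or -1 if allocation fails. */
int run_and_join(int count)
{
    assert(count >= 0);
    if (count == 0)
        return 0;

    pthread_t *threads = malloc((size_t)count * sizeof *threads);
    if (threads == NULL)
        return -1;

    int created = 0;
    for (int i = 0; i < count; i++) {
        /* pthread_create takes a POINTER to the handle it fills in. */
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            break;
        created++;
    }

    int joined = 0;
    for (int i = 0; i < created; i++) {
        /* pthread_join takes the handle BY VALUE: threads[i], not &threads[i]. */
        if (pthread_join(threads[i], NULL) == 0)
            joined++;
    }

    free(threads);
    return joined;
}
```

Calling run_and_join(3) should report three joined threads. Building with -Wall -pthread makes the compiler flag the pointer mismatch if &threads[i] slips back in.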
I'm messing around with Open MPI, and I have a weird bug.
It seems that even after MPI_Finalize(), each of the threads keeps running.
I have followed a guide for a simple Hello World program, and it looks like this:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
// Initialize the MPI environment
MPI_Init(NULL, NULL);
// Get the number of processes
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
// Get the rank of the process
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Get the name of the processor
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processor_name, &name_len);
// Print off a hello world message
printf("Hello world from processor %s, rank %d"
" out of %d processors\n",
processor_name, world_rank, world_size);
// Finalize the MPI environment.
MPI_Finalize();
printf("This is after finalize");
}
Notice the last printf()... This should only be printed once, since the parallel part is finalized, right?!
However, the output from this program, if I run it with 6 processes for example, is:
mpirun -np 6 ./hello_world
Hello world from processor ubuntu, rank 2 out of 6 processors
Hello world from processor ubuntu, rank 1 out of 6 processors
Hello world from processor ubuntu, rank 3 out of 6 processors
Hello world from processor ubuntu, rank 0 out of 6 processors
Hello world from processor ubuntu, rank 4 out of 6 processors
Hello world from processor ubuntu, rank 5 out of 6 processors
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
Am I misunderstanding how MPI works? Should each thread/process not be stopped by the finalize?
This is just undefined behavior.
The number of processes running after this routine is called is
undefined; it is best not to perform much more than a return rc after
calling MPI_Finalize.
http://www.mpich.org/static/docs/v3.1/www3/MPI_Finalize.html
The MPI standard only requires that rank 0 return from MPI_FINALIZE. I won't copy the entire text here because it's rather lengthy, but you can find it in version 3.0 of the standard (the latest for a few more days) in Chapter 8, section 8.7 (Startup) on pages 359 - 361. Here are the most relevant parts:
Although it is not required that all processes return from MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD return, so that users can know that the MPI portion of the computation is over. In addition, in a POSIX environment, users may desire to supply an exit code for each process that returns from MPI_FINALIZE.
There's even an example that's trying to do exactly what you said:
Example 8.10 The following illustrates the use of requiring that at least one process return and that it be known that process 0 is one of the processes that return. One wants code like the following to work no matter how many processes return.
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
...
MPI_Finalize();
if (myrank == 0) {
resultfile = fopen("outfile","w");
dump_results(resultfile);
fclose(resultfile);
}
exit(0);
The MPI standard doesn't say anything else about the behavior of an application after calling MPI_FINALIZE. All this function is required to do is clean up internal MPI state, complete communication operations, etc. While it's certainly possible (and allowed) for MPI to kill the other ranks of the application after a call to MPI_FINALIZE, in practice, that is almost never the way that it is done. There's probably a counter example, but I'm not aware of it.
When I started with MPI, I had the same misunderstanding about MPI_Init and MPI_Finalize. I thought the code between these calls ran in parallel and the code outside them ran serially. Finally I saw this answer and figured out how it actually works.
J Teller's answer:
https://stackoverflow.com/a/2290951/893863
int main(int argc, char *argv[]) {
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if (myid == 0) { // Do the serial part on a single MPI thread
printf("Performing serial computation on cpu %d\n", myid);
PreParallelWork();
}
ParallelWork(); // Every MPI thread will run the parallel work
if (myid == 0) { // Do the final serial part on a single MPI thread
printf("Performing the final serial computation on cpu %d\n", myid);
PostParallelWork();
}
MPI_Finalize();
return 0;
}
This is a pretty basic MPI question, but I can't wrap my head around it. I have a main function that calls another function that uses MPI. I want the main function to execute in serial, and the other function to execute in parallel. My code is like this:
int main (int argc, char *argv[])
{
//some serial code goes here
parallel_function(arg1, arg2);
//some more serial code goes here
}
void parallel_function(int arg1, int arg2)
{
//init MPI and do some stuff in parallel
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
//now do some parallel stuff
//....
//finalize to end MPI??
MPI_Finalize();
}
My code runs fine and gets the expected output, but the issue is that the main function is also being run in separate processes and so the serial code executes more than once. I don't know how it's running multiple times, because I haven't even called MPI_Init yet (if I printf in main before I call parallel_function, I see multiple printf's)
How can I stop my program running in parallel after I'm done?
Thanks for any responses!
Have a look at this answer.
Short story: MPI_Init and MPI_Finalize do not mark the beginning and end of parallel processing. MPI processes run in parallel in their entirety.
@suszterpatt is correct to state that "MPI processes run in parallel in their entirety". When you run a parallel program using, for example, mpirun or mpiexec, this starts the number of processes you requested (with the -n flag) and each process begins execution at the start of main. So in your example code
int main (int argc, char *argv[])
{
//some serial code goes here
parallel_function(arg1, arg2);
//some more serial code goes here
}
every process will execute the //some serial code goes here and //some more serial code goes here parts (and of course they will all call parallel_function). There isn't one master process which calls parallel_function and then spawns other processes once MPI_Init is called.
Generally it is best to avoid doing what you are doing: MPI_Init should be one of the first function calls in your program (ideally it should be the first). In particular, take note of the following (from here):
The MPI standard does not say what a program can do before an MPI_INIT or after an MPI_FINALIZE. In the MPICH implementation, you should do as little as possible. In particular, avoid anything that changes the external state of the program, such as opening files, reading standard input or writing to standard output.
Not respecting this can lead to some nasty bugs.
It is better practice to rewrite your code to something like the following:
int main (int argc, char *argv[])
{
// Initialise MPI
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
// Serial part: executed only by process with rank 0
if (my_rank==0)
{
// Some serial code goes here
}
// Parallel part: executed by all processes.
// Serial part: executed only by process with rank 0
if (my_rank==0)
{
// Some more serial code goes here
}
// Finalize MPI
MPI_Finalize();
return 0;
}
Note: I am not a C programmer, so use the above code with care. Also, shouldn't main always return something, especially when defined as int main()?
I am trying to learn Unix C and doing some exercises for practice. The current problem I am working on involves POSIX threads (mainly pthread_create() and pthread_join())
The problem asks to repeatedly print "Hello World" using two threads. One thread is to print "Hello" 1000 times, while the second thread prints "World" 1000 times. The main program/thread is to wait for the two threads to finish before proceeding.
Here is what I have right now.
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <pthread.h>
void *print_hello(void *arg)
{
int iCount;
for(iCount = 0; iCount < 1000; iCount++)
{
printf("Hello\n");
}
return NULL;
}
void *print_world(void *arg)
{
int iCount;
for(iCount = 0; iCount < 1000; iCount++)
{
printf("World\n");
}
return NULL;
}
int main(void)
{
/* int status; */
pthread_t thread1;
pthread_t thread2;
pthread_create(&thread1, NULL, print_hello, (void*)0);
pthread_create(&thread2, NULL, print_world, (void*)0);
pthread_join(thread1, NULL);
pthread_join(thread2, NULL);
return 0;
}
This does not seem to work fully. It prints "Hello" as expected. But "World" is not printed at all. Seems like the second thread is not running at all. Not sure I am using pthread_join correctly. My intention is for the main thread to "wait" for these two threads as the exercise asks.
Any help would be appreciated.
What makes you think it isn't running both threads? I think the output is just screaming past you too quickly to notice -- you're going to get a large number of each thread's output in a single block.
Try redirecting the output to a file and reviewing what actually got printed.
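The same check can be done inside the program. The sketch below runs both threads against a temporary file and counts the lines afterwards. The names count_hello_world, print_word and struct job are hypothetical, and writing to a tmpfile() stream instead of stdout is an assumption made so the result can be counted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define REPEAT 1000

/* What each thread prints, and where. */
struct job {
    FILE *out;
    const char *word;
};

/* Thread body: write the word REPEAT times, one per line. */
static void *print_word(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < REPEAT; i++)
        fprintf(j->out, "%s\n", j->word);
    return NULL;
}

/* Run both threads against a temporary file, then count the lines.
 * Returns 0 on success and stores the counts; returns -1 on failure. */
int count_hello_world(int *hello, int *world)
{
    assert(hello != NULL && world != NULL);
    FILE *out = tmpfile();
    if (out == NULL)
        return -1;

    struct job a = { out, "Hello" }, b = { out, "World" };
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, print_word, &a) != 0) {
        fclose(out);
        return -1;
    }
    if (pthread_create(&t2, NULL, print_word, &b) != 0) {
        pthread_join(t1, NULL);
        fclose(out);
        return -1;
    }
    /* Wait for both threads before reading anything back. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    *hello = *world = 0;
    rewind(out);
    char line[64];
    while (fgets(line, sizeof line, out) != NULL) {
        if (strcmp(line, "Hello\n") == 0)
            (*hello)++;
        else if (strcmp(line, "World\n") == 0)
            (*world)++;
    }
    fclose(out);
    return 0;
}
```

If both counts come back as 1000, both threads ran to completion and the missing "World" lines were only scrolling past in large blocks.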
I just ran your code.
$ gcc ./foo.c -pthread
$ ./a.out | grep World | wc -l
1000
$ ./a.out | grep Hello | wc -l
1000
Works for me on Ubuntu 10.10 with gcc-4.5.2. Double check your compilation and your output.