What happens when I MPI_Send to a process that has finished?
I am learning MPI and writing a small sugar-distribution simulation in C. When the factories stop producing, those processes end. When warehouses run empty, they end. Can I tell from the return value of MPI_Send whether a shop's order to a warehouse failed because the warehouse process has already ended? The documentation doesn't mention a specific error code for this situation; it only says that no error is returned on success.
Can I do:
if (MPI_Send(...)) {
...
/* destination has ended */
...
}
And disregard the error code?
Thanks
Writing code with unmatched MPI_Send calls is not allowed by the standard. Among other things, this means the resulting behavior will be implementation dependent. The range of possible behaviors includes several "obvious" options: exit, hang/deadlock, memory corruption, and so on.
Most implementations have some level of debugging output that can help track down this kind of logic error. You can use MPI_Wait* to block until all MPI_Send/MPI_Recv pairs have completed. In general, however, it is not possible to know that an MPI_Send will never be matched until the receiving rank enters MPI_Finalize. Put another way, waiting on such a send will make the program hang.
In any event, this is an error condition for MPI_Finalize: the target rank of the MPI_Send should be detected as having exited, so the MPI_Send can never be matched. However, this kind of error condition may prevent the MPI job from cleaning up all of its rank processes.
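As far as I'm aware, the only thing MPI_Send() reports is its int error code (MPI_SUCCESS on success), and it has no out parameters: it gives no information about whether the message was delivered, probably because message buffering can mean that no such information exists yet when the call returns. Note also that the default error handler, MPI_ERRORS_ARE_FATAL, aborts the job on error, so you only see a non-success return value at all if you install MPI_ERRORS_RETURN on the communicator.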
If you want one process to see when another ends, you should send a message from the finishing process with a designated tag, and periodically post nonblocking receives (or probes) at the other process to see whether an exit notification was sent.
Or, if you want the entire program to abort when one process stops, the easiest thing to do is to call MPI_Abort() in the finishing process with MPI_COMM_WORLD as the communicator; the standard describes this as a "best attempt" to abort all processes in the communicator's group, which for MPI_COMM_WORLD means the whole job.
Edit: To actually answer the question in the title, "What happens when I MPI_Send to a process that has finished?": as I understand it, that depends on whether buffering is used or not. If buffering is not used, then the program will hang. If buffering is used, then MPI_Send() will buffer the message and the process will continue to run, but because no matching receive will ever be posted, the message will never leave the buffer. Doing this a lot will eventually cause the program to run out of memory.
So more recently, I have been developing some asynchronous algorithms in my research. I was doing some parallel performance studies and I have been suspicious that I am not properly understanding some details about the various non-blocking MPI functions.
I've seen some insightful posts on here, namely:
MPI: blocking vs non-blocking
MPI Non-blocking Irecv didn't receive data?
There are a few things I am uncertain about, or just want to clarify, related to working with non-blocking functionality; I think clarifying them could help me increase the performance of my current software.
From the Nonblocking Communication part of the MPI 3.0 standard:
A nonblocking send start call initiates the send operation, but does not complete it. The send start call can return before the message was copied out of the send buffer. A separate send complete call is needed to complete the communication, i.e., to verify that the data has been copied out of the send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed concurrently with computations done at the sender after the send was initiated and before it completed.
...
If the send mode is standard then the send-complete call may return before a matching receive is posted, if the message is buffered. On the other hand, the receive-complete may not complete until a matching receive is posted, and the message was copied into the receive buffer.
So as a first set of questions about the MPI_Isend (and similarly MPI_Irecv), it seems as though to ensure a non-blocking send finishes, I need to use some mechanism to check that it is complete because in the worst case, there may not be suitable hardware to transfer the data concurrently, right? So if I never use something like MPI_Test or MPI_Wait following the non-blocking send, the MPI_Isend may never actually get its message out, right?
This question applies to some of my work because I am sending messages via MPI_Isend and not actually testing for completeness until I get the expected response message because I want to avoid the overhead of MPI_Test calls. While this approach has been working, it seems faulty based on my reading.
Further, the second paragraph appears to say that for the standard non-blocking send, MPI_Isend, it may not even begin to send any of its data until the destination process has called a matching receive. Given the availability of MPI_Probe/MPI_Iprobe, does this mean an MPI_Isend call will at least send out some preliminary metadata of the message, such as size, source, and tag, so that the probe functions on the destination process can know a message wants to be sent there and so the destination process can actually post a corresponding receive?
Related is a question about the probe. In the Probe and Cancel section, the standard says that
MPI_IPROBE(source, tag, comm, flag, status) returns flag = true if there is a message that can be received and that matches the pattern specified by the arguments source, tag, and comm. The call matches the same message that would have been received by a call to MPI_RECV(..., source, tag, comm, status) executed at the same point in the program, and returns in status the same value that would have been returned by MPI_RECV(). Otherwise, the call returns flag = false, and leaves status undefined.
Going off of the above passage, it is clear the probing will tell you whether there's an available message you can receive corresponding to the specified source, tag, and comm. My question is, should you assume that the data for the corresponding send from a successful probing has not actually been transferred yet?
It seems reasonable to me now, after reading the standard, that indeed a message the probe is aware of need not be a message that the local process has actually fully received. Given the previous details about the standard non-blocking send, it seems you would need to post a receive after doing the probing to ensure the source non-blocking standard send will complete, because there might be times where the source is sending a large message that MPI does not want to copy into some internal buffer, right? And either way, it seems that posting the receive after a probing is how you ensure that you actually get the full data from the corresponding send to be sent. Is this correct?
This latter question relates to one place in my code where I call MPI_Iprobe and, if it succeeds, call MPI_Recv to get the message. However, I think this could be problematic, because I was assuming that a successful probe means the whole message has already arrived, so the MPI_Recv would run quickly since the full message would already be in local memory somewhere. I now suspect this assumption was wrong, and some clarification would be helpful.
The MPI standard does not mandate a progress thread. That means that MPI_Isend() might do nothing at all until communications are progressed. Progress happens under the hood inside most MPI subroutines; MPI_Test(), MPI_Wait() and MPI_Probe() are the most obvious ones.
I am afraid you are mixing progress and synchronous send (e.g. MPI_Ssend()).
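MPI_Probe() is a local operation: it will not contact the sender to ask whether something was sent, nor push the sender's side of the transfer forward. It only reports on messages that have already reached the local process (in many implementations, possibly just their envelope, with the payload still to come).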
Performance-wise, you should avoid unexpected messages as much as possible, which means a receive should be posted on one end before the message is sent by the other end.
There is a trade-off between performance and portability here:
if you want to write portable code, then you cannot assume there is a MPI progress thread
if you want to optimize your application on a given system, you should give a try to a MPI library that implements a progress thread on the interconnect you are using
Keep in mind that most MPI implementations (note that this is not mandated by the MPI standard, and you should not rely on it) send small messages in eager mode.
It means MPI_Send() will likely return immediately if the message is small enough (and small enough depends among other things on your MPI implementation, how it is tuned or which interconnect is used).
I know similar questions have been asked, but I think my situation is a little bit different. I need to check whether a child thread is alive, and if it's not, print an error message. The child thread is supposed to run all the time. So basically I just need a non-blocking pthread_join, and in my case there are no race conditions. The child thread can be killed, so I can't set some kind of shared variable from the child thread when it completes, because it will not be set in that case.
Killing of child thread can be done like this:
kill -9 child_pid
EDIT: All right, this example is wrong, but I'm still sure there is some way to kill a specific thread.
EDIT: My motivation for this is to implement another layer of security in my application, which requires this check. This check can be bypassed, of course, but that is another story.
EDIT: Let's say my application is intended as a demo for reverse-engineering students, and their task is to hack it. I placed some anti-hacking/anti-debugging obstacles in the child thread, and I wanted to be sure that this child thread is kept alive. As mentioned in some comments, it's probably not that easy to kill the child without messing up the parent, so maybe this check is not necessary. Security checks are also present in the main thread, but this time I needed to add them in another thread to keep the main thread responsive.
Killed by what, and why can't that thing indicate the thread is dead? But even then, this sounds fishy.
it's almost universally a design error if you need to check if a thread/process is alive - the logic in the code should implicitly handle this.
In your edit it seems you want to do something about a possibility of a thread getting killed by something completely external.
Well, good news. There is no way to do that without bringing the whole process down. All ways of non-voluntary death of a thread kill all threads in the process, apart from cancellation but that can only be triggered by something else in the same process.
The kill(1) command does not send signals to a particular thread, but to an entire process. Read signal(7) and pthreads(7) carefully.
Signals and threads don't mix well together. As a rule of thumb, you don't want to use both.
BTW, using kill -KILL or kill -9 is a mistake. The receiving process doesn't get the opportunity to handle the SIGKILL signal. You should use SIGTERM ...
If you want to handle SIGTERM in a multi-threaded application, read signal-safety(7) and consider setting up a pipe(7) to self (polled with poll(2) in some event loop) that the signal handler write(2)s to. That well-known trick is well explained in the Qt documentation. You could also consider the Linux-specific signalfd(2) syscall.
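A minimal sketch of that self-pipe trick for a single SIGTERM handler. The handler does nothing but write(2) one byte, which is async-signal-safe; all real shutdown work happens in normal code once poll() reports the pipe readable. The names install_sigterm_pipe() and wait_for_sigterm() are hypothetical; in a real program the poll() would sit in whatever event loop the application already has.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int sig_pipe[2] = { -1, -1 };

/* Signal handler: only async-signal-safe calls (see signal-safety(7)). */
static void on_sigterm(int signo)
{
    int saved_errno = errno;        /* don't clobber errno in interrupted code */
    unsigned char byte = (unsigned char)signo;
    ssize_t ignored = write(sig_pipe[1], &byte, 1); /* full pipe: wakeup already pending */
    (void)ignored;
    errno = saved_errno;
}

/* Returns the read end of the self-pipe, or -1 on error. */
int install_sigterm_pipe(void)
{
    if (pipe(sig_pipe) == -1)
        return -1;
    for (int i = 0; i < 2; i++)     /* neither end may ever block */
        fcntl(sig_pipe[i], F_SETFL, fcntl(sig_pipe[i], F_GETFL) | O_NONBLOCK);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;
    return sig_pipe[0];
}

/* One event-loop step: 1 if SIGTERM arrived, 0 on timeout, -1 on error. */
int wait_for_sigterm(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc == 0 || (rc < 0 && errno == EINTR))
        return 0;
    if (rc < 0)
        return -1;
    unsigned char byte;
    while (read(fd, &byte, 1) == 1) /* drain: several signals may have queued bytes */
        ;
    return 1;
}
```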
If you think of using pthread_kill(3), you probably should not in your case (however, using it with a 0 signal is a valid but crude way to check that the thread exists). Read some Pthread tutorial. Don't forget to pthread_join(3) or pthread_detach(3).
Child thread is supposed to run all the time.
This is the wrong approach. You should know when and how a child thread terminates because you are coding the function passed to pthread_create(3) and you should handle all error cases there and add relevant cleanup code (and perhaps synchronization). So the child thread should run as long as you want it to run and should do appropriate cleanup actions when ending.
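A sketch of "handle it in the thread function itself", assuming the only ways the worker can end while the process survives are a normal return, pthread_exit() or pthread_cancel() (external kills take down the whole process). A cleanup handler clears an atomic flag on all three paths, so the owner can query worker_is_alive() without blocking. All names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int worker_alive;   /* 1 while worker() has not finished */
static atomic_int stop_requested; /* set by the owner to ask worker() to return */

/* Cleanup handler: runs on normal return (via pthread_cleanup_pop(1)),
   on pthread_exit() and on pthread_cancel(). */
static void mark_dead(void *unused)
{
    (void)unused;
    atomic_store(&worker_alive, 0);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_cleanup_push(mark_dead, NULL);
    while (!atomic_load(&stop_requested)) {
        /* ... the thread's periodic checks would go here ... */
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */
        nanosleep(&ts, NULL);       /* also a cancellation point */
    }
    pthread_cleanup_pop(1);         /* 1: run mark_dead on normal exit too */
    return NULL;
}

/* Set the flag before pthread_create() so there is no window in which a
   freshly started worker looks dead. */
int start_worker(pthread_t *tid)
{
    atomic_store(&stop_requested, 0);
    atomic_store(&worker_alive, 1);
    int rc = pthread_create(tid, NULL, worker, NULL);
    if (rc != 0)
        atomic_store(&worker_alive, 0);
    return rc;
}

int worker_is_alive(void) { return atomic_load(&worker_alive); }

void request_stop(void) { atomic_store(&stop_requested, 1); }
```

The owner should still pthread_join() (or detach) the worker; the flag only answers "is it still running?" cheaply.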
Consider also some other inter-process communication mechanism (like socket(7), fifo(7) ...); they are generally more suitable than signals, notably for multi-threaded applications. For example you might design your application as some specialized web or HTTP server (using libonion or some other HTTP server library). You'll then use your web browser, or some HTTP client command (like curl) or HTTP client library like libcurl to drive your multi-threaded application. Or add some RPC ability into your application, perhaps using JSONRPC.
(your putative usage of signals smells very bad and is likely to be some XY problem; consider strongly using something better)
my motivation for this is to implement another layer of security in my application
I don't understand that at all. How can signal and threads add security? I'm guessing you are decreasing the security of your software.
I wanted to be sure that this child thread is kept alive.
You can't be sure, other than by coding well and avoiding bugs (but be aware of Rice's theorem and the Halting Problem: there cannot be any reliable and sound static source code program analysis to check that). If something else (e.g. some other thread, or even bad code in your own one) is e.g. arbitrarily modifying the call stack of your thread, you've got undefined behavior and you can just be very scared.
In practice tools like the gdb debugger, address and thread sanitizers, other compiler instrumentation options, valgrind, can help to find most such bugs, but there is No Silver Bullet.
Maybe you want to take advantage of process isolation, but then you should give up your multi-threading approach, and consider some multi-processing approach. By definition, threads share a lot of resources (notably their virtual address space) with other threads of the same process. So the security checks mentioned in your question don't make much sense. I guess that they are adding more code, but just decrease security (since you'll have more bugs).
Reading a textbook like Operating Systems: Three Easy Pieces should be worthwhile.
You can use pthread_kill() to check if a thread exists.
SYNOPSIS
#include <signal.h>
int pthread_kill(pthread_t thread, int sig);
DESCRIPTION
The pthread_kill() function shall request that a signal be delivered
to the specified thread.
As in kill(), if sig is zero, error checking shall be performed
but no signal shall actually be sent.
Something like
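#include <errno.h>
#include <signal.h>

/* thread_id must not have been joined yet (or detached and finished):
 * using a stale thread ID is undefined behavior. Some implementations
 * (recent glibc, for example) also return 0 for a thread that has exited
 * but has not been joined yet. */
int rc = pthread_kill( thread_id, 0 );
if ( rc == ESRCH )
{
    // thread no longer exists...
}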
It's not very useful, though, as stated by others elsewhere, and it's really weak as any type of security measure. Anything with permissions to kill a thread will be able to stop it from running without killing it, or make it run arbitrary code so that it doesn't do what you want.
I'm running a multi-threaded C program (process?) that uses semaphores and pthreads. The threads keep interacting, blocking, waking and printing prompts on stdout continuously, without any human intervention. I want to be able to exit this process (gracefully, after printing a message and putting down all threads, not via a crude CTRL+C SIGINT) by pressing a keyboard character like #.
What are my options for getting such an input from the user?
What more relevant information could I provide that will help to solve this problem?
Edit:
All your answers sound interesting, but my primary question remains. How do I get user input, when I don't know which thread is currently executing? Also, semaphore blocking using sem_wait() breaks if signalled via SIGINT, which may cause a deadlock.
There is no difference in reading standard input from threads except if more than one thread is trying to read it at the same time. Most likely your threads are not all calling functions to read standard input all the time, though.
If you regularly need to read input from the user you might want to have one thread that just reads this input and then sets flags or posts events to other threads based on this input.
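A sketch of such a dedicated input thread: it is the only thread that reads, and it turns '#' into an atomic flag that the other threads poll. The names quit_requested, input_thread() and start_input_thread() are hypothetical. Note that a terminal in its default canonical mode only delivers input after Enter; reacting to a single keypress needs ICANON turned off via termios.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

atomic_int quit_requested;      /* polled by the worker threads */

/* The only thread that reads user input; arg points to the fd to read
   (normally STDIN_FILENO). */
static void *input_thread(void *arg)
{
    int fd = *(const int *)arg;
    char c;
    for (;;) {
        ssize_t n = read(fd, &c, 1);
        if (n == 1 && c != '#')
            continue;           /* some other key: ignore it */
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        break;                  /* '#', end of file or a real error */
    }
    atomic_store(&quit_requested, 1);
    return NULL;
}

int start_input_thread(pthread_t *tid, int *fd)
{
    return pthread_create(tid, NULL, input_thread, fd);
}
```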
If the kill character is the only thing you want, or if this is just going to be used for debugging, then what you probably want is to poll occasionally for new data on standard input. You can do this either by setting standard input to non-blocking and trying to read from it occasionally: if read returns -1 with errno set to EAGAIN (or EWOULDBLOCK), no keys were pressed (a return of 0 means end of file). This method has some problems, though. I've never used stdio.h functions on a FILE * after setting the underlying file descriptor (an int) to non-blocking, but I suspect that they may act oddly; you could avoid the stdio functions and use read directly. There is still an issue I read about once where the blocking/non-blocking flag could be changed by another process if you forked and exec-ed a new program that had access to a copy of that file descriptor. I'm not sure whether this is a problem on all systems. Non-blocking mode can be set or cleared with an fcntl call.
Or you could use one of the polling functions with a very small (0) timeout to see whether data is ready. The poll system call is probably the simplest, but there is also select. Various operating systems have other polling functions.
#include <poll.h>
#include <unistd.h>
...
/* return 0 if no data is available on stdin.
 *      > 0 if there is data ready
 *      < 0 if there is an error
 */
int poll_stdin(void) {
    struct pollfd pfd = { .fd = STDIN_FILENO, .events = POLLIN };
    /* We only ask for POLLIN, but the kernel may still report POLLHUP or
     * POLLERR in pfd.revents; the next read() will reveal which. */
    return poll(&pfd, 1, 0);   /* timeout 0: return immediately */
}
You can call this function from one of your threads; as long as it returns 0, just keep going. When it returns a positive number, read a character from stdin to see what it was. Note that if you are using the stdio functions on stdin elsewhere, other characters may already be buffered in front of the new one: poll tells you that the operating system has something new for you, not what C's stdio has.
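A sketch of the poll-then-read step in one call, assuming the caller passes STDIN_FILENO and uses no stdio on stdin anywhere else, so what poll() reports and what read() returns agree. The name check_quit_key() is hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if '#' was read, 0 if no key (or another key) was waiting,
   -1 on error or end of file. Never blocks. */
int check_quit_key(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, 0);          /* timeout 0: just look */
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                       /* nothing pending */
    char c;
    ssize_t n = read(fd, &c, 1);        /* read(2), not getchar(): no stdio buffer */
    if (n <= 0)
        return -1;                      /* end of file or error */
    return c == '#';
}
```

Calling it once per iteration of a worker loop consumes at most one character per call, so keys typed before the '#' are simply skipped over on earlier iterations.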
If you are regularly reading from standard input in other threads then things just get messy. I'm assuming you aren't doing that (because if you are and it works correctly you probably wouldn't be asking this question).
You would have a thread listening for keyboard input, and then it would join() the other threads when receiving # as input.
Another way is to trap SIGINT and use it to handle the shutdown of your application.
The way I would do it is to keep a global int "should_die" or something, whose range is 0 or 1, and another global int "died," which keeps track of the number of threads terminated. should_die and died are both initially zero. You'll also need two semaphores to provide mutex around the globals.
At a certain point, a thread checks the should_die variable (after acquiring the mutex, of course). If it should die, it acquires the died_mutex, ups the died count, releases the died_mutex, and dies.
The main initial thread periodically wakes up, checks that the number of threads that have died is less than the number of threads, and goes back to sleep. The main thread dies when all the other threads have checked in.
If the main thread doesn't spawn all the threads itself, a small modification would be to have "threads_alive" instead of "died". threads_alive is incremented when a thread forks, and decremented when the thread dies.
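A sketch of the should_die/died scheme as described, using POSIX binary semaphores as the two mutexes. Function names and the 10 ms polling interval are hypothetical choices. Because a worker checks out just before returning, the main thread should still pthread_join() the workers after wait_for_all() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

static int should_die = 0;      /* 0 or 1 */
static int died = 0;            /* threads that have checked out */
static sem_t should_die_mutex;  /* binary semaphores used as mutexes */
static sem_t died_mutex;

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

void shutdown_init(void)
{
    sem_init(&should_die_mutex, 0, 1);
    sem_init(&died_mutex, 0, 1);
}

void request_die(void)
{
    sem_wait(&should_die_mutex);
    should_die = 1;
    sem_post(&should_die_mutex);
}

/* Worker side: returns 1 (after checking out) if the thread must exit now. */
int check_should_die(void)
{
    sem_wait(&should_die_mutex);
    int die = should_die;
    sem_post(&should_die_mutex);
    if (die) {
        sem_wait(&died_mutex);
        died++;
        sem_post(&died_mutex);
    }
    return die;
}

/* Main thread: wake up periodically until all nthreads have checked out. */
void wait_for_all(int nthreads)
{
    for (;;) {
        sem_wait(&died_mutex);
        int n = died;
        sem_post(&died_mutex);
        if (n >= nthreads)
            return;
        sleep_ms(10);
    }
}

void *demo_worker(void *unused)
{
    (void)unused;
    while (!check_should_die())
        sleep_ms(1);            /* stand-in for one unit of real work */
    return NULL;
}
```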
In general, terminating a multithreaded operation cleanly is a pain in the butt, and besides special cases where you can use things like the semaphore barrier design pattern, this is the best I've heard of. I'd love to hear it if you find a better, cleaner one.
~anjruu
In general, I have threads waiting on a set of events and one of those events is the termination event.
In the main thread, when I have triggered the termination event, I then wait on all the threads having exited.
SIGINT is actually not that difficult to handle and is often used for graceful termination. You need a signal handler and a way to tell all the threads that it's time to stop. One global flag that threads check in their loops and the signal handler sets might do. Same approach works for "on user command" termination, though you need a way to get the input from the terminal - either poll in a dedicated thread, or again, set the terminal to generate a signal for you.
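A sketch of the SIGINT-plus-global-flag approach, including the sem_wait() problem raised in the question: an interrupted sem_wait() fails with EINTR, and the wrapper treats that as "go look at the flag" rather than as an error. It assumes atomic_int is lock-free (true on common platforms), which is what makes it usable from a signal handler; names are hypothetical. The signal is delivered to only one thread, and a signal landing between the flag check and sem_wait() is missed, so threads blocked elsewhere still need an explicit wake-up.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static atomic_int stop_flag;    /* set by the handler, read by every thread */

static void on_sigint(int signo)
{
    (void)signo;
    atomic_store(&stop_flag, 1);
}

int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* sem_wait() that treats EINTR as "re-check stop_flag" instead of an error.
   Returns 0 when the semaphore was taken, -1 when shutdown was requested
   (or on another error). */
int sem_wait_or_stop(sem_t *sem)
{
    for (;;) {
        if (atomic_load(&stop_flag))
            return -1;
        if (sem_wait(sem) == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* interrupted by a signal: loop and re-check the flag */
    }
}
```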
The tricky part is to unblock waiting threads. You have to carefully design the notification protocol of who tells who to stop and what they need to do - put dummy message into a queue, set a flag and signal a cv, etc.
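A sketch of the "set a flag and signal a cv" option, assuming a hypothetical work queue reduced to an item counter plus a shutdown flag under one mutex. pthread_cond_broadcast() wakes every blocked consumer at once, each one re-checks the predicate and returns, and the main thread then joins them all.

```c
#include <pthread.h>
#include <stdbool.h>

struct queue {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int             items;
    bool            shutting_down;
};

static struct queue q = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .changed = PTHREAD_COND_INITIALIZER,
    .items = 0,
    .shutting_down = false,
};

/* Blocks until an item is available or shutdown is requested.
   Returns true if an item was taken, false if the caller should exit
   (shutdown wins even if items remain). */
bool queue_take(struct queue *qp)
{
    pthread_mutex_lock(&qp->lock);
    while (qp->items == 0 && !qp->shutting_down)
        pthread_cond_wait(&qp->changed, &qp->lock); /* loop: spurious wakeups */
    bool got = qp->items > 0 && !qp->shutting_down;
    if (got)
        qp->items--;
    pthread_mutex_unlock(&qp->lock);
    return got;
}

/* Set the flag under the lock, then wake every blocked consumer. */
void queue_shutdown(struct queue *qp)
{
    pthread_mutex_lock(&qp->lock);
    qp->shutting_down = true;
    pthread_cond_broadcast(&qp->changed);
    pthread_mutex_unlock(&qp->lock);
}

void *consumer(void *arg)
{
    struct queue *qp = arg;
    while (queue_take(qp)) {
        /* ... process one item ... */
    }
    return NULL;
}
```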
I have two different way to check whether a process is still up and running:
1) using GetExitCodeProcess()
2) walking the list of processes using CreateToolhelp32Snapshot() and checking PIDs
Now, in both cases I'm still told that a process I terminated with TerminateProcess is still alive, even though it is not.
Is there a way to positively know whether a process is still alive or dead passing the PID?
thanks!
Don't use PID for something like this. PIDs are reused and have a very narrow range, with a very high collision probability. In other words, you will find a running process but will be a different process.
A call to GetExitCodeProcess should return STILL_ACTIVE for active processes. After a call to TerminateProcess, the process will be dead, and a different value will be returned.
Another way to check if a process is alive is WaitForSingleObject. If you call this on the process handle with a timeout of 0, it will immediately return WAIT_TIMEOUT if the process is still running.
You cannot assume a low level API call functions the way it seems or how you think it should function from its name or high level description. A kernel still has things to do and often calls are just requests to the kernel and there are a multitude of things a kernel needs to do (depending on implementation) before it will actually release the PID. In this case after you issue the call you may assume the process is dead, however the kernel still has to clean up.
From MSDN :
The TerminateProcess function is used to unconditionally cause a process to exit. The state of global data maintained by dynamic-link libraries (DLLs) may be compromised if TerminateProcess is used rather than ExitProcess.
TerminateProcess initiates termination and returns immediately. This stops execution of all threads within the process and requests cancellation of all pending I/O. The terminated process cannot exit until all pending I/O has been completed or canceled.
A process cannot prevent itself from being terminated.
Could you make use of the Process Status API? There are functions for enumerating all running processes on a system - this could help you.
This is for an assignment I'm working on, and NO I'm not looking for you to just GIVE me the answer. I just need someone to point me in the right direction, maybe with a line or two of sample code.
I need to figure out how to set the priority of a file read operation from within my program. To the point:
server process receives a message and spawns a child to handle it
child tries to open the filename from the message and starts loading the file contents into the message queue
there may be several children running at the same time, and the initial message contains a priority so some messages may get more device access
The only way I can think to do this (right now, anyways) would be to increment a counter every time I create a message, and to do something like sched_yield after the counter reaches a given value for that process' assigned priority. That's most likely a horrible, horrible approach, but it's all I can think of at the moment. The assignment is more about the message queues than anything else, but we still have to have data transfer priority.
Any help/guidance is appreciated :)
Have the pool of child processes share a semaphore. Once a child acquires the semaphore it can read a predefined number of bytes from the resource and return it to the client. The number of bytes read can be related to the priority of the request. Once the process has read the predefined number of bytes release the semaphore.
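A sketch of that scheme. The answer doesn't say which semaphore API to use; this assumes a POSIX unnamed semaphore placed in anonymous shared memory before fork(), so every child inherits the same one, and a hypothetical scale BYTES_PER_PRIORITY so that a higher priority means a bigger chunk per turn at the device.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS on glibc */
#include <semaphore.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical scale: bytes a child may read per turn for each priority level. */
#define BYTES_PER_PRIORITY 512

/* Create the shared lock in the server before forking children.
   Returns NULL on failure. */
sem_t *create_shared_device_lock(void)
{
    sem_t *sem = mmap(NULL, sizeof *sem, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sem == MAP_FAILED)
        return NULL;
    if (sem_init(sem, 1, 1) == -1) {    /* pshared = 1: shared between processes */
        munmap(sem, sizeof *sem);
        return NULL;
    }
    return sem;
}

/* One turn at the device: hold the lock for a priority-sized chunk, then
   release it so other children get a turn. Returns bytes read, 0 at end
   of file, -1 on error. buf must hold priority * BYTES_PER_PRIORITY bytes. */
ssize_t read_one_turn(sem_t *lock, int fd, char *buf, int priority)
{
    size_t quantum = (size_t)priority * BYTES_PER_PRIORITY;
    if (sem_wait(lock) == -1)
        return -1;
    ssize_t n = read(fd, buf, quantum);
    sem_post(lock);
    return n;
}
```

Each child would loop on read_one_turn() and push every chunk into the message queue until it returns 0.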
Until recently, there was no IO prioritization in Linux. Now there is ionice. But I doubt you are meant to use it in your assignment.
Are you sure your assignment is talking about files and not System V message queues?
Read the man pages for:
msgctl(2), msgget(2), msgrcv(2), msgsnd(2), capabilities(7),
mq_overview(7), svipc(7)
Although I think you can use a file as a key to create a message queue, so that multiple processes have a way to rendezvous via the message queue, a Sys V message queue itself is not a file.
Just wondering, because you mention "message queues" specifically and talk about "priorities", which might conceivably map to the msgtyp field of e.g. msgsnd and msgrcv, though with the information you've given it's hard to tell what the assignment is really about.
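A sketch of how the message type could carry the priority, assuming a hypothetical convention where a smaller mtype means a more urgent request (range 1..LOWEST_PRIORITY). A negative msgtyp makes msgrcv() return the first message with the lowest mtype that is at most |msgtyp|, so urgent requests overtake older, less urgent ones. The struct and function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* System V messages start with a long mtype (> 0). */
struct request {
    long mtype;                  /* priority: 1 is the most urgent */
    char text[64];
};

#define LOWEST_PRIORITY 10

int send_request(int qid, long priority, const char *text)
{
    struct request r;
    r.mtype = priority;
    strncpy(r.text, text, sizeof r.text - 1);
    r.text[sizeof r.text - 1] = '\0';
    return msgsnd(qid, &r, sizeof r.text, 0);
}

/* Blocks until a request arrives; returns its priority, or -1 on error. */
long receive_most_urgent(int qid, char *out, size_t outlen)
{
    struct request r;
    if (msgrcv(qid, &r, sizeof r.text, -LOWEST_PRIORITY, 0) == -1)
        return -1;
    strncpy(out, r.text, outlen - 1);
    out[outlen - 1] = '\0';
    return r.mtype;
}
```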