weird output when I use pthread and printf - c

I wrote a program using pthreads.
Environment: Windows 7, CYGWIN_NT-6.1 i686 Cygwin, gcc (GCC) 4.5.3
The source code:
#include<stdio.h>
#include<pthread.h>
void *th_func(void *p)
{
    int iLoop = 0;
    for(iLoop = 0;iLoop<100;iLoop++)
    {
        printf("Thread Thread Thread Thread\n");
    }
    return;
}
int main()
{
    int iLoop = 0;
    pthread_t QueThread;
    printf("Main : Start Main\n");
    printf("Main : Start Create Thread\n");
    pthread_create(&QueThread,NULL,th_func,NULL);
    printf("Main : End Create Thread\n");
    for(iLoop = 0;iLoop<100;iLoop++)
    {
        printf("Main Main Main Main\n");
    }
    pthread_join(QueThread,NULL);
    printf("Main : End Main\n");
    printf("---------------\n");
    return 0;
}
When I compile the source code there are no warnings or errors, but its output is weird.
Part of its output:
Main : Start Main
Main : Start Create Thread
Thread Thread Thread ThreThread Thread Thread Thread
Main Main Main Main
Thread Thread Thread Thread
Main Main Main Main
I want to know the cause of this phenomenon.
In this output, "Main : End Create Thread" is not printed completely, and on line 3 the newline \n at the end of "Thread Thread Thread Thread\n" has disappeared.
Is everyone's output like this? It does not happen every time, but it happens sometimes.
If I use a mutex so that only one thread calls printf at a time, the weird output seems to stop.
POSIX says printf is thread-safe, and according to Cygwin.com, Cygwin provides a POSIX-style API. Yet I still get this unexpected output.
Is printf really thread-safe?
I ran the same program 100 times on Linux (Ubuntu), and this output never occurred.
In addition, I don't understand why some words in the output disappeared.

This looks like it may be a bug in Cygwin, or maybe something is misconfigured. Several answers here indicate that 'thread safe' only promises that the function won't cause harm to the program, and that thread safety doesn't necessarily mean that a function is 'atomic'. But, as far as I know, POSIX doesn't formally define 'thread safe' (if anyone has a pointer to such a definition, please post it in a comment).
However, not only does POSIX specify that printf() is thread safe, POSIX also specifies that:
All functions that reference ( FILE *) objects shall behave as if they use flockfile() and funlockfile() internally to obtain ownership of these ( FILE *) objects.
Since printf() implicitly references the stdout FILE* object, all printf() calls should be atomic with respect to each other (and any other function that uses stdout).
Note that this might not be true on other systems, but in my experience it does hold true for many multi-threaded systems.

The POSIX standard has functions like putc_unlocked() where the commentary says:
Versions of the functions getc(), getchar(), putc(), and putchar() respectively named getc_unlocked(), getchar_unlocked(), putc_unlocked(), and putchar_unlocked() shall be provided which are functionally equivalent to the original versions, with the exception that they are not required to be implemented in a thread-safe manner. They may only safely be used within a scope protected by flockfile() (or ftrylockfile()) and funlockfile(). These functions may safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the (FILE *) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions.
That clearly indicates that the low-level functions for single character I/O are normally thread-safe. However, it also indicates that the level of granularity is a single character output operation. The specification for printf() says:
Characters generated by fprintf() and printf() are printed as if fputc() had been called.
And for putc(), it says:
The putc() function shall be equivalent to fputc(), except that if it is implemented as a macro it may evaluate stream more than once, so the argument should never be an expression with side-effects.
The page for fputc() doesn't say anything about thread-safety, so you have to look elsewhere for that information.
Another section describes threads and says:
All functions defined by this volume of POSIX.1-2008 shall be thread-safe, except that the following functions need not be thread-safe.
And the list following includes the *_unlocked() functions.
So, printf() and fputc() have to be thread-safe, but the writing by printf() is done 'as if' by fputc(), so the interleaving of output between threads may be at the character level, which is more or less consistent with what you see. If you want to make calls to printf() non-interleaved, you would need to use the flockfile() and funlockfile() calls to give your thread ownership of stdout while the printf() is executed. Similarly for fprintf(). You could write an fprintf_locked() function quite easily to achieve this result:
#include <stdarg.h>
#include <stdio.h>

int fprintf_locked(FILE *fp, const char *format, ...)
{
    flockfile(fp);
    va_list args;
    va_start(args, format);
    int rc = vfprintf(fp, format, args);
    va_end(args);
    funlockfile(fp);
    return rc;
}
You could insert a fflush(fp) in there if you wished. You could also have a vfprintf_locked() and have the function above call that to do the lock, format, (flush) and unlock operations. It's probably how I'd code it, trusting the compiler to inline the code if that was appropriate and doable. Supporting the versions using stdout is likewise pretty straightforward.
Note the fragment of POSIX specification for flockfile() quoted by Michael Burr in his answer:
All functions that reference (FILE *) objects, except those with names ending in _unlocked, shall behave as if they use flockfile() and funlockfile() internally to obtain ownership of these (FILE *) objects.
Apart from the odd parentheses around the FILE *, these lines impact all the other standard I/O functions, but you have to know that these lines exist in one of the less frequently used man pages. Thus, my fprintf_locked() function should be unnecessary. If you find an aberrant implementation of fprintf() that does not lock the file, then the fprintf_locked() function could be used instead, but it should only be done under protest — the library should be doing that for you anyway.

Just because a function is thread-safe, it doesn't mean it's atomic.
In your case, if you want to ensure that your output doesn't get interleaved, you need to use a mutex to ensure that only one thread calls printf at a time.

Threads behave like this for a reason. If threads were executed one after another and not 'at the same time' (in an interleaved manner), there would be no point in this kind of 'concurrency'. When you use mutexes, the threads will be blocked according to your intention, and they generate the expected output.
Also, you write return; in a function that returns void * (it should be return NULL;), and that is undefined behavior, so anything can happen when running your program.

To put it simply: you have two threads trying to access the same resource, and there is no priority check or anything like a mutex. Without a mutex or some form of priority, the threads get the resource in an essentially random order. Try creating two threads, one printing yes and the other printing no, and you will see the same unusual behavior. Also remember that the running time differs between threads in this case. If you try the same thing with one thread writing to a file and the other writing to the console, you will not encounter this issue, because the threads no longer share a stream. Hope that helps.

Related

What is unlocked_stdio in C?

I was browsing random Linux manual pages when I came across this odd one. You can see it by running "man unlocked_stdio", or you can view it in your browser by going to this page.
So, what is it for? It has unusual functions like getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked, etc. All of these functions have one thing in common: they operate on a FILE stream. I know that they are the normal I/O functions with "_unlocked" appended to their names, but what does that mean?
It has to do with thread safety.
From your link:
Each of these functions has the same behavior as its counterpart without the "_unlocked" suffix, except that they do not use locking (they do not set locks themselves, and do not test for the presence of locks set by others) and hence are thread-unsafe. See flockfile(3).
And from flockfile:
The stdio functions are thread-safe. This is achieved by assigning to each FILE object a lockcount and (if the lockcount is nonzero) an owning thread. For each library call, these functions wait until the FILE object is no longer locked by a different thread, then lock it, do the requested I/O, and unlock the object again.
Here is some pseudocode that shows how it works. This is not necessarily exactly how it is implemented in reality, but it demonstrates the idea and clearly shows the difference from the unlocked version. Functionality-wise, the locked version is essentially a wrapper around the unlocked version.
int getchar(void) {
    // Wait until stdinlock is unlocked and then lock it
    // This is an atomic operation
    wait_until_unlocked_and_then_lock(stdinlock);
    // Get the character from stdin
    int ret = getchar_unlocked();
    // Release the lock to make the input stream available to other threads
    unlock(stdinlock);
    // And return the value
    return ret;
}

How does the function pthread_yield work?

I am implementing a threads library in C and I am stuck on the meaning of pthread_yield(). I have looked it up on the man page in the terminal but I did not really understand the explanation. Could someone explain it to me?
Note well that its name notwithstanding, pthread_yield is not standardized. Its Linux manual page says this, for example:
This call is nonstandard, but present on several other systems. Use the standardized sched_yield(2) instead.
The specifications for sched_yield() are written in much the same terms as those of pthread_yield(), however:
The sched_yield() function shall force the running thread to relinquish the processor until it again becomes the head of its thread list. It takes no arguments.
This just means that the thread that calls the function allows other threads and processes a chance to run, waiting to resume until its turn comes again. It is not necessary to do this in a preemptive multitasking system such as pthreads is designed around -- the kernel manages assigning CPU time to threads and processes without any such help -- but there may occasionally be special cases where it smooths out thread scheduling issues.
In glibc, pthread_yield() merely invokes the sched_yield() system call (cf. nptl/pthread_yield.c in the library's source tree):
/* With the 1-on-1 model we implement this function is equivalent to
   the 'sched_yield' function.  */
int
pthread_yield (void)
{
  return sched_yield ();
}
As you are implementing a thread library, note that the glibc source code above (version 2.31) for pthread_yield() produces unusual pthread API behavior, which may be an implementation bug, because it returns the result of sched_yield() directly. Like most Linux system calls, sched_yield() returns -1 and sets errno if it fails (even though the manual states that it never actually fails). So, in theory, pthread_yield() returns -1 and sets errno on error, whereas the pthread API usually returns 0 on success and the error number on failure (errno is not supposed to be set). So the manual is wrong, or at least does not match glibc's implementation, when it describes the return value as:
RETURN VALUE
On success, pthread_yield() returns 0; on error, it returns an error number.
The expected source code could be something like:
#include <errno.h>
#include <sched.h>

int
pthread_yield (void)
{
  return sched_yield () == -1 ? errno : 0;
}
For example, pthread_sigmask() is coded as:
int
pthread_sigmask (int how, const sigset_t *newmask, sigset_t *oldmask)
{
  [...]
  return sigprocmask (how, newmask, oldmask) == -1 ? errno : 0;
  [...]
}
which complies with what is stated in the manual:
RETURN VALUE
On success, pthread_sigmask() returns 0; on error, it returns an
error number.

calling an elaborate function from SIGINT handle - when is it ok?

I have a large function, and I want to allow the user to stop it and send it to a recovery procedure (implemented by elaborate_function). I thought of using a SIGINT handler:
void handle(int sig){
    FILE * devNull = fopen("/dev/null", "w");
    elaborate_function(devNull);
}
void foo(){
    signal(SIGINT, handle);
    ...
}
The elaborate_function() does the following:
file manipulation
printf / fprintf
calls other functions
system calls (tar, cp, mv...)
reset
I know that using printf from a signal handler is considered unsafe (as said here) but from what I understand, the unsafety only means possible bad output, and this doesn't bother me in this specific case (I'll maybe pass /dev/null as the FILE* fout parameter to block output, as suggested here).
Aside from the <stdio.h> functions, is there any other problem with what I intend? This is the simplest non-blocking input mechanism I could think of.
I'm running only on Linux, so I don't worry about portability.
Many functions are unsafe because internally they may lock something (like a mutex), so the "worst" scenario for you is a race condition that causes a deadlock. You can also get a race condition that causes a SIGSEGV, and many other hellish bugs that you want to avoid (since race conditions are hard to debug).
As you have said, only the functions listed here are safe.
For a non-blocking input mechanism:
You can create a thread that will poll the input file.
You can use the signal to raise a flag (like "must_read_input") and, in your code, call "elaborate_function" as soon as you see the flag raised.
You can modify your "elaborate_function" to use only async-signal-safe functions.
There are plenty of ways.

Should we use exit() in C?

There is a question about using exit in C++. The answer there says it is not a good idea, mainly because of RAII: if exit is called somewhere in the code, destructors of objects will not be called, so if, for example, a destructor was meant to write data to a file, this will not happen.
I was wondering what the situation is in C. Do similar issues apply in C? Since C doesn't have constructors/destructors, I thought the situation might be different. So is it OK to use exit in C? For example, I have sometimes seen the following function used in C:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

void die(const char *message)
{
    if(errno) {
        perror(message);
    } else {
        printf("ERROR: %s\n", message);
    }
    exit(1);
}
Unlike abort(), the exit() function in C is considered a "graceful" exit.
From C11 (N1570) 7.22.4.4/p2 The exit function (emphasis mine):
The exit function causes normal program termination to occur.
The Standard also says in 7.22.4.4/p4 that:
Next, all open streams with unwritten buffered data are flushed, all
open streams are closed, and all files created by the tmpfile function
are removed.
It is also worth looking at 7.21.3/p5 Files:
If the main function returns to its original caller, or if the exit
function is called, all open files are closed (hence all output
streams are flushed) before program termination. Other paths to
program termination, such as calling the abort function, need not
close all files properly.
However, as mentioned in comments below you can't assume that it will cover every other resource, so you may need to resort to atexit() and define callbacks for their release individually. In fact it is exactly what atexit() is intended to do, as it says in 7.22.4.2/p2 The atexit function:
The atexit function registers the function pointed to by func, to be
called without arguments at normal program termination.
Notably, the C standard does not say precisely what should happen to objects of allocated storage duration (i.e. malloc()), so you need to be aware of how this is handled on a particular implementation. On a modern hosted OS the system will most likely take care of it, but you might still want to handle it yourself in order to silence memory debuggers such as Valgrind.
Yes, it is ok to use exit in C.
To ensure that all buffers are handled and that shutdown is graceful and orderly, it is recommended to use the atexit function; more information on this here.
An example code would be like this:
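#include <stdio.h>
#include <stdlib.h>

/* example resources, at file scope so cleanup() can reach them */
static FILE *fp = NULL;
static char *ptr = NULL;

void cleanup(void){
    /* example of closing file pointer and free up memory */
    if (fp) fclose(fp);
    if (ptr) free(ptr);
}
int main(int argc, char **argv){
    /* ... */
    atexit(cleanup);
    /* ... */
    return 0;
}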
Now, whenever exit is called (or main returns), the cleanup function will be executed. That function can handle graceful shutdown and the cleanup of buffers, memory, etc.
You don't have constructors and destructors, but you can have resources (e.g. files, streams, sockets), and it is important to close them correctly. Buffered data may not have been written out yet, so exiting the program without first closing the resource correctly could lead to corruption.
Using exit() is OK
Two major aspects of code design that have not yet been mentioned are 'threading' and 'libraries'.
In a single-threaded program, in the code you're writing to implement that program, using exit() is fine. My programs use it routinely when something has gone wrong and the code isn't going to recover.
But…
However, calling exit() is a unilateral action that can't be undone. That's why both 'threading' and 'libraries' require careful thought.
Threaded programs
If a program is multi-threaded, then using exit() is a dramatic action which terminates all the threads. It will probably be inappropriate to exit the entire program. It may be appropriate to exit the thread, reporting an error. If you're cognizant of the design of the program, then maybe that unilateral exit is permissible, but in general, it will not be acceptable.
Library code
And that 'cognizant of the design of the program' clause applies to code in libraries, too. It is very seldom correct for a general purpose library function to call exit(). You'd be justifiably upset if one of the standard C library functions failed to return just because of an error. (Obviously, functions like exit(), _Exit(), quick_exit(), abort() are intended not to return; that's different.) The functions in the C library therefore either "can't fail" or return an error indication somehow. If you're writing code to go into a general purpose library, you need to consider the error handling strategy for your code carefully. It should fit in with the error handling strategies of the programs with which it is intended to be used, or the error handling may be made configurable.
I have a series of library functions (in a package with header "stderr.h", a name which treads on thin ice) that are intended to exit as they're used for error reporting. Those functions exit by design. There are a related series of functions in the same package that report errors and do not exit. The exiting functions are implemented in terms of the non-exiting functions, of course, but that's an internal implementation detail.
I have many other library functions, and a good many of them rely on the "stderr.h" code for error reporting. That's a design decision I made and is one that I'm OK with. But when the errors are reported with the functions that exit, it limits the general usefulness of the library code. If the code calls the error reporting functions that do not exit, then the main code paths in the function have to deal with error returns sanely: detect them and relay an error indication to the calling code.
The code for my error reporting package is available in my SOQ (Stack Overflow Questions) repository on GitHub as files stderr.c and stderr.h in the src/libsoq sub-directory.
One reason to avoid exit in functions other than main() is the possibility that your code might be taken out of context. Remember, exit is a form of non-local control flow, like an uncatchable exception.
For example, you might write some storage management functions that exit on a critical disk error. Then someone decides to move them into a library. Exiting from a library will make the calling program exit in an inconsistent state that it may not be prepared for.
Or you might run it on an embedded system. There is nowhere to exit to; the whole thing runs in a while(1) loop in main(). exit() might not even be defined in the standard library.
Depending on what you are doing, exit may be the most logical way out of a program in C. I know it's very useful for checking to make sure chains of callbacks work correctly. Take this example callback I used recently:
unsigned char cbShowDataThenExit( unsigned char *data, unsigned short dataSz,unsigned char status)
{
    printf("cbShowDataThenExit with status %X (dataSz %d)\n", status, dataSz);
    printf("status:%d\n",status);
    printArray(data,dataSz);
    cleanUp();
    exit(0);
}
In the main loop, I set everything up for this system and then wait in a while(1) loop. It is possible to use a global flag to exit the while loop instead, but this is simple and does what it needs to do. If you are dealing with any open buffers like files and devices, you should clean them up before exiting for consistency.
In a big project it is terrible when any code can call exit() instead of dumping core, because a trace is very important for maintaining an online server.

printf flush at program exit

I'm interested in knowing how the printf() function's flush works when the program exits.
Let's take the following code:
#include <stdio.h>

int main(int ac, char **av)
{
    printf("Hi");
    return 0;
}
In this case, how does printf() manage to flush its buffer to stdout?
I guess it's platform dependent, so let's take Linux.
It could be implemented using gcc's __attribute__((destructor)), but then the standard library would be compiler dependent. I assume this is not how it works.
Any explanations or links to documentation are appreciated. Thank you.
The C runtime will register atexit() handlers to flush standard buffers when exit() is called.
See this explanation.
When the program exits normally, the exit function has always performed a clean shutdown of the standard I/O library, this causes all buffered output data to be flushed.
Returning an integer value from the main function is equivalent to calling exit with the same value. So, return 0 has the same effect as exit(0).
If _Exit or _exit is called, the process is terminated immediately and the stdio buffers are not flushed.
Just to expand on trofanjoe's response:
exit causes normal program termination. atexit functions are called in
reverse order of registration, open files are flushed, open streams
are closed, and control is returned to the environment.
and
Within main, return expr is equivalent to exit(expr). exit has the
advantage that it can be called from other functions
From man stdio on my machine here (emphasis added), which runs RHEL 5.8:
A file may be subsequently reopened, by the same or another
program execution, and its contents reclaimed or modified (if it can
be repositioned at the start). If the main function returns to its
original caller, or the exit(3) function is called, all open files are
closed (hence all output streams are flushed) before program
termination. Other methods of program termination, such as abort(3)
do not bother about closing files properly.

Resources