So, I was browsing random Linux manual pages when I encountered this weird one. You can see it by executing "man unlocked_stdio", or you can view it in your browser by going to this page.
So, what is this for? It lists weird functions like getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked, etc. All of these functions have one thing in common: they take a FILE stream parameter. I can see that they are the normal I/O functions with "_unlocked" appended to their names, but what does that mean?
It has to do with thread safety.
From your link:
Each of these functions has the same behavior as its counterpart without the "_unlocked" suffix, except that they do not use locking (they do not set locks themselves, and do not test for the presence of locks set by others) and hence are thread-unsafe. See flockfile(3).
And from flockfile:
The stdio functions are thread-safe. This is achieved by assigning to each FILE object a lockcount and (if the lockcount is nonzero) an owning thread. For each library call, these functions wait until the FILE object is no longer locked by a different thread, then lock it, do the requested I/O, and unlock the object again.
Here is some pseudocode that shows how it works. This is not necessarily how it is implemented in reality, but it demonstrates the idea and clearly shows the difference from the unlocked version. Functionally, the locked version is essentially a wrapper around the unlocked version.
int getchar(void) {
// Wait until stdinlock is unlocked and then lock it
// This is an atomic operation
wait_until_unlocked_and_then_lock(stdinlock);
// Get the character from stdin
int ret = getchar_unlocked();
// Release the lock to make the input stream available to other threads
unlock(stdinlock);
// And return the value
return ret;
}
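This is also why the unlocked variants exist: with flockfile(3)/funlockfile(3) you can take the stream lock once yourself and then make many cheap unlocked calls while holding it. A minimal sketch of that pattern (the helper name read_line is just for illustration):
#include <stdio.h>
#include <sys/types.h>
/* Read one line from stream while holding the stream lock ourselves,
   so the cheaper *_unlocked calls are safe to use inside the loop. */
ssize_t read_line(FILE *stream, char *buf, size_t size)
{
    size_t i = 0;
    if (size == 0)
        return -1;
    flockfile(stream);                  /* take the FILE lock once */
    while (i + 1 < size) {
        int c = getc_unlocked(stream);  /* no per-character locking */
        if (c == EOF)
            break;
        buf[i++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(stream);                /* release the FILE lock */
    buf[i] = '\0';
    return (i == 0) ? -1 : (ssize_t)i;
}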
Related
I have a threaded server that can add/append/read files and relay data to the client.
If a file is being added, no other thread can append/read it. If a file is being appended, no threads can append/read it. If a file is being read, no other thread can append to it. However, if a file is being read, other threads can read it.
Currently I have a mutex system that will do this, except it won't allow multiple reads.
To fix this, in the read method, I will change:
pthread_mutex_lock(&(fm->mutex));//LOCK
//do some things
...
pthread_mutex_unlock(&(fm->mutex));
to
pthread_mutex_trylock(&(fm->mutex));//TRYLOCK [NonBlocking, so the thread can continue the read]
//do some things
...
pthread_mutex_unlock(&(fm->mutex));
Question
How can I unlock the file without allowing the other methods (just append really) to begin writing to the file before all the other read()'s have finished?
Example
For example, if the reading thread that originally locked the file completes and unlocks the file and there are still other threads trying to read the file, then an appending thread gets the chance to lock the file and begin appending while the others are still reading, which is a no-no.
Idea
I want to keep a count of the number of threads currently reading a file. When a thread finishes, reduce the count. If the count is 0, meaning no threads are still reading, unlock the file. But I'm worried that this would not be thread-safe. If this is a viable solution, how could I make it thread-safe? Another concern: I believe only the thread that locked a mutex can successfully unlock it.
It sounds like you may be looking for a read-write lock, which is provided by pthreads. It allows two modes of locking: a shared/read-lock mode, which can be locked by multiple threads at once, and an exclusive/write-lock mode, where the lock call won't return until all other threads (readers and writers) have given up their hold on the lock.
You could use a semaphore instead of the mutex (see this link about the differences). The semaphore does thread-safe synchronized counting for you.
You can get by without an additional mutex to lock the file for writing if you limit the number of simultaneous read accesses to a (sufficiently large) number N and require a writer to acquire all N units of the semaphore. That way write access can only be gained when the number of readers is zero, and all other readers are locked out until the writer has finished.
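A rough sketch of that scheme with a POSIX counting semaphore follows. The names are mine, and the extra writer_gate mutex is my addition so that two writers cannot deadlock by each grabbing part of the semaphore:
#include <pthread.h>
#include <semaphore.h>

#define MAX_READERS 64                  /* the "sufficiently large" N */

static sem_t read_slots;                /* counts free reader slots */
static pthread_mutex_t writer_gate = PTHREAD_MUTEX_INITIALIZER;

void file_lock_init(void)   { sem_init(&read_slots, 0, MAX_READERS); }

void read_lock(void)        { sem_wait(&read_slots); }   /* take one slot */
void read_unlock(void)      { sem_post(&read_slots); }   /* give it back */

void write_lock(void)
{
    pthread_mutex_lock(&writer_gate);   /* one writer collects slots at a time */
    for (int i = 0; i < MAX_READERS; i++)
        sem_wait(&read_slots);          /* blocks until no readers remain */
}

void write_unlock(void)
{
    for (int i = 0; i < MAX_READERS; i++)
        sem_post(&read_slots);
    pthread_mutex_unlock(&writer_gate);
}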
Note that the POSIX documentation for pthread_mutex_lock() says:
If successful, the pthread_mutex_lock(), pthread_mutex_trylock(), and pthread_mutex_unlock() functions shall return zero; otherwise, an error number shall be returned to indicate the error.
Since you don't show your code testing the return values, you don't know whether your lock operations (in particular) succeeded or not.
Separately, since you want a read/write lock, why not use one:
pthread_rwlock_rdlock()
pthread_rwlock_wrlock()
pthread_rwlock_unlock()
pthread_rwlock_init()
pthread_rwlock_destroy()
There are four pthread_rwlockattr_*() functions and a total of 9 pthread_rwlock_*() functions; I only listed the most important functions in the family.
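As an illustration, a minimal sketch of the read/append case with one such lock per file (file_rwlock here stands in for whatever you keep in your fm structure):
#include <pthread.h>

static pthread_rwlock_t file_rwlock = PTHREAD_RWLOCK_INITIALIZER;

void read_file(void)
{
    pthread_rwlock_rdlock(&file_rwlock);    /* shared: many readers at once */
    /* ... read the file ... */
    pthread_rwlock_unlock(&file_rwlock);
}

void append_file(void)
{
    pthread_rwlock_wrlock(&file_rwlock);    /* exclusive: waits for everyone */
    /* ... append to the file ... */
    pthread_rwlock_unlock(&file_rwlock);
}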
It seems that glibc's implementation of fprintf() is thread-safe, but is that so for Microsoft's CRT, as well?
By thread-safe, I don't just mean that it doesn't crash, but also that if multiple threads (in the same process) call fprintf(), the texts will not be mixed.
That is, for example, if thread A calls fprintf(stdout, "aaaa"); and thread B calls fprintf(stdout, "bbbb"); it's guaranteed not to mix to become aabbaabb.
Is there such a guarantee?
Yes. In the multithreaded runtime libraries, every stream has an associated lock. This lock is acquired at the beginning of any call to a printf function and not released until just before that printf function returns.
This behavior is required by C11 (there was no concept of "threads" in standard C until C11). C11 §7.21.2/7-8 states:
Each stream has an associated lock that is used to prevent data races when multiple threads of execution access a stream, and to restrict the interleaving of stream operations performed by multiple threads. Only one thread may hold this lock at a time. The lock is reentrant: a single thread may hold the lock multiple times at a given time.
All functions that read, write, position, or query the position of a stream lock the stream before accessing it. They release the lock associated with the stream when the access is complete.
Visual C++ does not fully support C11, but it does conform to this requirement. A couple of other Visual C++-specific comments:
As long as you are not defining _CRT_DISABLE_PERFCRIT_LOCKS (which only works with the statically-linked runtime libraries, libcmt.lib and friends) or using the _nolock-suffixed functions, most operations on a single stream are atomic.
If you require atomicity across multiple operations on a stream, you can acquire and release the stream lock yourself using _lock_file and _unlock_file.
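For example, if several fprintf calls must appear together, something along these lines should work (a sketch against the Microsoft CRT; print_record is an illustrative name):
#include <stdio.h>

void print_record(const char *name, int value)
{
    /* Hold the stream lock across several calls so output from
       other threads cannot be interleaved between them. */
    _lock_file(stdout);
    fprintf(stdout, "name:  %s\n", name);
    fprintf(stdout, "value: %d\n", value);
    _unlock_file(stdout);
}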
What would happen if you call read (or write, or both) in two different threads on the same file descriptor (let's say we're interested in a local file, or that it's a socket file descriptor), without explicitly using a synchronization mechanism?
read and write are syscalls, so on a single-core CPU it's probably unlikely that two reads would be executed "at the same time". But with multiple cores...
What will the Linux kernel do?
And let's be a bit more general : is the behavior always the same for other kernels (like BSDs) ?
Edit: According to the close documentation, we should be sure that the file descriptor isn't being used by a syscall in another thread. So it seems that explicit synchronization would be required before closing a file descriptor (and therefore also around read/write, if threads that may call them are still running).
Any system level (syscall) file descriptor access is thread safe in all mainstream UNIX-like OSes.
Though, depending on their age, they are not necessarily signal-safe.
If you call read, write, accept or similar on a file descriptor from two different tasks then the kernel's internal locking mechanism will resolve contention.
For reads, though, each byte may be read only once, and writes may land in any undefined order.
The stdio library functions fread, fwrite, and co. also have internal locking on the control structures by default, though it is possible to disable that with flags.
The comment about close is because it doesn't make a lot of sense to close a file descriptor in any situation in which some other thread might be trying to use it. So while it is 'safe' as far as the kernel is concerned, it can lead to odd, hard to diagnose corner cases.
If a thread closes a file descriptor while a second thread is trying to read from it, the second thread may get an unexpected EBADF error. Worse, if a third thread is simultaneously opening a new file, that might reallocate the same fd, and the second thread might accidentally read from the new file rather than the one it was expecting...
Have a care for those who follow in your footsteps
It's perfectly normal to protect the file descriptor with a mutex semaphore. It removes any dependence on kernel behaviour so your message boundaries are now certain. You then don't have to cite the last paragraph at the bottom of a 15,489 line manpage which explains why the mutex isn't necessary (I exaggerated, but you get my meaning)
It also makes it clear to anyone reading your code that the file descriptor is being used by more than one thread.
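A minimal sketch of what that looks like (send_message and fd_mutex are illustrative names; the loop also copes with partial writes so the message boundary stays intact):
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t fd_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Write one complete message while holding the mutex, so messages from
   different threads cannot interleave on the shared descriptor. */
int send_message(int fd, const char *msg)
{
    size_t len = strlen(msg), off = 0;
    pthread_mutex_lock(&fd_mutex);
    while (off < len) {
        ssize_t n = write(fd, msg + off, len - off);
        if (n < 0) {
            pthread_mutex_unlock(&fd_mutex);
            return -1;
        }
        off += (size_t)n;
    }
    pthread_mutex_unlock(&fd_mutex);
    return 0;
}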
Fringe Benefit
There is a fringe benefit to using a mutex that way. Suppose you've got different messages coming from the different threads and some of those messages are more important than others. All you need to do is set the thread priorities to reflect their messages' importance. That way the OS will ensure that your messages will be sent in order of importance for minimal effort on your part.
The result would depend on how the threads are scheduled to run at that particular instant in time.
One way to think about the potential for undefined behavior with multi-threading is to treat these calls as you would any other unsynchronized shared-memory operation, e.g. updating a linked list or changing a variable.
If you use mutex/semaphores/lock or some other synchronization mechanism, it should work as intended.
Say my program has some threads. Since the file descriptors are shared among the threads, if I call close(stderr), none of the threads will output to stderr. My question: is there a way to shut down the output of stderr in one thread, but not the others?
To be more specific, one thread of my program calls a third-party library function, and it keeps outputting warning messages which I know are useless. I have no access to this third-party library's source.
No. File descriptors are global resources available to all threads in a process. Standard error is file descriptor number 2, of course, so it is a global resource and you can't stop the third party code from writing to it.
If the problem is serious enough to warrant the treatment, you can do:
int fd2_copy = dup(2);                      /* save the real stderr fd */
int fd2_null = open("/dev/null", O_WRONLY); /* a sink for the unwanted output */
Before calling your third-party library function:
dup2(fd2_null, 2);           /* point fd 2 at /dev/null */
third_party_library_function();
dup2(fd2_copy, 2);           /* restore the real stderr */
Basically, for the duration of the third-party library, switch standard error to /dev/null, reinstating the normal output after the function.
You should, of course, error check the system calls.
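A slightly fuller, error-checked version of the same idea might look like this (silence_stderr and restore_stderr are just illustrative names):
#include <fcntl.h>
#include <unistd.h>

static int fd2_copy = -1;

/* Point fd 2 at /dev/null; returns 0 on success, -1 on failure. */
int silence_stderr(void)
{
    int fd2_null;

    fd2_copy = dup(2);                    /* keep the real stderr */
    if (fd2_copy < 0)
        return -1;
    fd2_null = open("/dev/null", O_WRONLY);
    if (fd2_null < 0 || dup2(fd2_null, 2) < 0) {
        if (fd2_null >= 0)
            close(fd2_null);
        close(fd2_copy);
        fd2_copy = -1;
        return -1;
    }
    close(fd2_null);                      /* fd 2 now refers to /dev/null */
    return 0;
}

/* Put the saved stderr back on fd 2. */
void restore_stderr(void)
{
    if (fd2_copy >= 0) {
        dup2(fd2_copy, 2);
        close(fd2_copy);
        fd2_copy = -1;
    }
}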
The downside of this is that while this thread is executing the third party function, any other thread that needs to write to standard error will also write to /dev/null.
You'd probably have to think in terms of adding an 'error writing thread' (EWT) which can be synchronized with the 'third-party library executing thread' (TPLET). Other threads would write a message to the EWT. If the TPLET was executing the third-party library, the EWT would wait until it was done, and only then write any queued messages. (While that would 'work', it is hard work.)
One way around this would be to have the error reporting functions used by the general code (other than the third-party library code) write to fd2_copy rather than standard error per se. This would require a disciplined use of error reporting functions, but is a whole heap easier than an extra thread.
stderr is per-process, not per-thread, so closing it will close it for all threads.
If you want to skip particular messages, maybe you can use grep -v.
On Linux it is possible to give the current thread its own private file descriptor table, using the unshare() function declared in <sched.h>:
unshare(CLONE_FILES);
After that call, you can call close(2); and it will affect only the current thread.
Note however that once the file descriptor table is unshared, you can't go back to sharing it again - it's a one-way operation. This is also Linux-specific, so it's not portable.
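A minimal sketch of that approach (Linux-only; _GNU_SOURCE is needed for the unshare declaration, and noisy_thread / third_party_library_function are placeholders):
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

extern void third_party_library_function(void);

void *noisy_thread(void *arg)
{
    /* Give this thread its own private copy of the fd table... */
    if (unshare(CLONE_FILES) != 0) {
        perror("unshare");
        return NULL;
    }
    /* ...so closing fd 2 here leaves the other threads untouched. */
    close(2);
    third_party_library_function();   /* its warnings no longer go anywhere */
    return NULL;
}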
I'm writing a web server.
Each connection is served by a separate thread, so I don't know in advance the number of threads.
There is also a group of text files (I don't know their number either), and each thread can read/write each file.
A file can be written by just one thread at a time, but different threads can write to different files at the same time.
If a file is read by one or more threads (reads can be concurrent), no thread can write on THAT file.
Now, I noticed this (Thread safe multi-file writing) solution, but I'd also like to use functions such as fgets(), for example.
So, can I flock() a file, and then use a fgets() or another stdio read/write library function?
First of all, use fcntl, not flock. The latter is a non-standard, deprecated BSD function and does not work with NFS and possibly other filesystems. fcntl locking on the other hand is POSIX standard and is intended to work everywhere.
Now if you want to use file-level reader-writer locking mixed with stdio, it will work, but you have to take some care to ensure that buffering does not break your assumptions about locks. The method I'm about to explain is not the only one, but I believe it's the clearest/simplest:
When you want to operate on one of your files with stdio, obtaining the correct type of lock (read or write, aka shared or exclusive) should be the first thing you do after fopen. Use fileno to get the file descriptor number and apply the lock to it. After that, perform your entire read or write operation. Do not make any attempt to unlock the file; instead, call fclose to close the file and let the lock be released implicitly when the descriptor is closed. Otherwise you may release the lock while buffered data is still unwritten, or later read data that was buffered before the lock was released and is no longer valid.
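Here's a sketch of that pattern for the read side (read_whole_file is an illustrative name; a writer does the same with "a" instead of "r" and F_WRLCK instead of F_RDLCK):
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Read a file line by line under a shared (read) fcntl lock. The lock is
   taken right after fopen and released implicitly when fclose closes the
   descriptor, so no buffered data crosses the lock boundary. */
int read_whole_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_RDLCK;                    /* shared lock; F_WRLCK to write */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                           /* 0 means "the whole file" */

    if (fcntl(fileno(fp), F_SETLKW, &fl) != 0) {   /* block until granted */
        fclose(fp);
        return -1;
    }

    char line[1024];
    while (fgets(line, sizeof line, fp) != NULL) {
        /* ... process the line ... */
    }

    fclose(fp);                             /* closes the fd, drops the lock */
    return 0;
}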