C - Does printf() cause a thread to sleep?

When a program is doing I/O (e.g. writing to a file), my understanding is that the thread briefly sleeps and then resumes. My question is: when we print using printf(), does the C program's thread sleep in any way?

Since you've specifically asked about printf(), I'm going to assume the most generic case: it fills a reasonably sized buffer, invokes the write(2) system call on stdout, and stdout happens to point to your terminal.
In most operating systems, when you invoke certain system calls the calling thread/process is removed from the CPU's runnable list and placed on a separate waiting list. This can happen with I/O calls such as read and write whenever they have to wait. Being temporarily removed from processing due to I/O is not the same as being put to sleep via a timer.
For example, Linux has an uninterruptible sleep state meant specifically for threads/processes waiting on I/O, and an interruptible sleep state for those waiting on timers and events. From a user's perspective the two look the same, but their implementations behind the scenes are significantly different.
To answer your question: a thread calling printf() isn't exactly sleeping; it is waiting for the buffer to be flushed to the device. Even then there are a few more quirks, which you can read about in signal(7), and there is more about the various process/thread states on Marek's blog.
Hope this helps.

Much of the point of stdio.h is that it buffers I/O: a call to printf will often simply put text into a memory buffer (owned by the library by default) and perform zero system calls, thus offering no opportunity to yield the CPU. Even when something like write(2) is called, the thread may continue running: the kernel can copy the data into kernel memory (from which it will be transferred to the disk later, e.g. by DMA) and return immediately.
Of course, even on a single-core system, most operating systems frequently interrupt the running thread in order to share the CPU. So another thread can still run at any time, even if no blocking calls are made.

Related

System call: does Read function change process?

I learned that when a system call function is called, the process changes. But what is process B if I call the read function without calling fork()? Isn't there only one process?
On x86-64, there is one specific instruction for making system calls: syscall (https://www.felixcloutier.com/x86/syscall.html). When you call read() in C, the C library's wrapper places the proper syscall number in a register along with the arguments you provide and executes one syscall instruction. When syscall is executed, the CPU jumps to the address stored in the IA32_LSTAR model-specific register. After that, it is in kernel mode executing the kernel's syscall handler.
At that point, it is still in the context of process A. Within its handler, the kernel realizes that you want to read from disk. It will thus start a DMA operation by writing some registers of the hard-disk controller. From there, process A is waiting for IO. There is no point in leaving the core idle so the kernel calls the scheduler and it will probably decide to switch the context of the core to another process B.
When the DMA IO operation is done, the hard-disk controller triggers an interrupt. The kernel thus puts process A back into the ready queue and calls the scheduler which will probably have the effect of switching the context of the core back to process A.
The image you provide isn't very clear so I can understand the confusion. Overall, on most architectures it will work similarly to what is stated above.
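As a rough illustration of the "syscall number plus arguments" step described above, the sketch below uses the generic syscall() wrapper that Linux's C library provides. It assumes Linux with glibc; raw_read is a name invented for this example, and it is expected to behave like read() for simple cases.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Same job as read(2), but the syscall number (SYS_read) is passed
 * explicitly instead of being hidden inside the read() wrapper. */
long raw_read(int fd, void *buf, unsigned long count)
{
    return syscall(SYS_read, fd, buf, count);
}
```

If the descriptor has no data ready, this call blocks in the kernel exactly as read() would, which is the point at which the scheduler may pick another process.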
The image is somewhat misleading. What actually happens is, the read system call needs to wait for IO. There is nothing else that can be done in the context of process (or thread) A.
So the kernel needs to find something else for the CPU to do. Usually there is some other process (or several) that does have something to do, i.e. is not waiting for a system call to return. It could also be another thread of process A that is given time to execute (from the kernel's point of view, a thread and a process aren't really much different). Several processes may get to execute while process A waits for its system call to complete, too.
And if there is nothing for any other process or thread to do, then the kernel will just idle and let the CPU sleep for a bit, basically to save power (especially important on a laptop).
So the image in the question shows just one possible situation.

Safe operations on PortAudio's PaStreamFinishedCallback

I'd like to know what operations are safe in PortAudio's PaStreamFinishedCallback. I know typically it is not a good idea to attempt operations that could block on the PaStreamCallback for playback as that could cause pops/glitches on the user's or other application's audio streams. Do the same limitations apply to the PaStreamFinishedCallback? I guess ultimately I'm curious if that callback is also called on the OS's audio thread.
Alternately, is there a function like Pa_StopStream that will block until the callback has returned paComplete/paAbort, but without inducing a stop? That'd actually be ideal for my use, since I have a thread that's the right place for me to clean up. I know I could achieve this by having my callback signal to my thread that it's done, and then the thread could call Pa_StopStream but that feels heavy handed.
edit: To give a bit more context about my use, I have a ring buffer that holds some PCM and uses a pthread condvar to signal when space is available in the buffer. One thread writes into this ring and then the PaStreamCallback reads out of the other end. When things are finished, the writer sets a closed flag on the ring and then the callback drains whatever is left. I'd like to make sure my ring drains and that PortAudio flushes. The callback is the only place that knows when the ring drains, so returning paComplete feels appropriate. But then I need some way to know that it's OK to deallocate my ring.
The answer to this is that it depends highly on the host and the behavior may change over time even for one host. I went ahead and read the implementation, and I discovered a couple of useful pieces of information here.
Pa_StopStream will just invoke the host system's Stop()-like behavior. I didn't read all the implementations but presumably most have some sort of blocking Stop(). That means that it's unlikely that blocking for a stop, without actually asking for one, will be a supported behavior.
PaStreamFinishedCallback is also just a thin wrapper on the host's own stream stopped callback. For example, in OSX Core Audio this is a Listener on kAudioOutputUnitProperty_IsRunning. It's entirely up to the host how and when this is called. I think the smart play here is to be as cautious as possible -- assume no blocking operations are safe inside this callback.
So, if you're in the same situation as me where one thread feeds PCM into a ring buffer, and the PaStreamCallback reads from that ring, then you'll probably want to:
1. Subscribe to PaStreamFinishedCallback
2. Producer thread closes ring buffer and lets PaStreamCallback drain it
3. Return paComplete from PaStreamCallback when the ring is drained
4. Signal the producer thread that work is done from PaStreamFinishedCallback, in my case using pthread_cond_signal
5. Producer thread wakes up and cleans up by deallocating
Even signaling (and locking mutexes) from the audio thread is probably best avoided, but it's hard to imagine there's an alternative. For regular reading from the PCM ring buffer, the PaStreamCallback should probably spin some limited number of times before giving up. For the completion signal, the producer thread should lock and then immediately wait, so that it holds the lock as little as possible.
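The final signal-and-wait handshake from the steps above might look roughly like this. It is a sketch only: it makes no PortAudio calls, a plain thread stands in for the stream-finished callback, and every name here (struct completion, completion_signal, completion_wait, run_handshake) is hypothetical.

```c
#include <pthread.h>

struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;
};

void completion_init(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

/* Would be called from the stream-finished callback: short critical section. */
void completion_signal(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Called by the producer thread: lock, then wait right away. The loop
 * guards against spurious wakeups and against the signal arriving first. */
void completion_wait(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Stand-in for the audio side of the handshake. */
static void *fake_finished_callback(void *arg)
{
    completion_signal(arg);
    return NULL;
}

/* Runs one handshake; returns 1 once it is safe to free the ring. */
int run_handshake(void)
{
    struct completion c;
    pthread_t t;

    completion_init(&c);
    if (pthread_create(&t, NULL, fake_finished_callback, &c) != 0)
        return -1;
    completion_wait(&c);          /* producer sleeps until signalled */
    pthread_join(t, NULL);

    int done = c.done;
    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.lock);
    return done;
}
```

Keeping the done flag under the mutex means the producer cannot miss the wakeup even if the callback fires before the producer starts waiting.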

How does a process know that semaphore is available

I have a very basic doubt.
When a process is waiting on a semaphore, it goes into a sleep state.
So there is no way it can poll the semaphore value.
Does the kernel poll the semaphore value and, if it is available, send a signal to all processes waiting for it? If so, won't that be too much overhead for the kernel?
Or does the signal() call internally notify all the processes waiting for the semaphore?
Please let me know on this.
The operating system schedules the waiting process again when another process tells the operating system that it is done with the semaphore.
Semaphores are just one of the ways of interacting with the OS scheduler.
The kernel doesn't poll the semaphore; it doesn't need to. Every time a process calls sem_post() (or equivalent), that involves interaction with the kernel. What the kernel does during the sem_post() is look up whatever processes have previously called sem_wait() on the same semaphore. If one or more processes have called sem_wait(), it picks the process with the highest priority and schedules it. This shows up as that sem_wait() finally returning and that process carries on executing.
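A small sketch of that wake-up path, assuming POSIX unnamed semaphores and threads rather than separate processes (the names ready, producer and wait_for_producer are made up for the example): the waiter never polls, it simply returns from sem_wait() once the other side has called sem_post().

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;          /* counts "results available", starts at 0 */
static int shared_value;

static void *producer(void *arg)
{
    (void)arg;
    shared_value = 42;       /* publish the result first */
    sem_post(&ready);        /* then wake whoever is waiting */
    return NULL;
}

/* Returns the value published by the producer, or -1 on setup failure. */
int wait_for_producer(void)
{
    pthread_t t;

    if (sem_init(&ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    /* Blocks while the count is 0; retry if a signal interrupts the wait. */
    while (sem_wait(&ready) == -1 && errno == EINTR)
        continue;

    int v = shared_value;
    pthread_join(t, NULL);
    sem_destroy(&ready);
    return v;
}
```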
How This is Implemented Under the Hood
Fundamentally the kernel needs to implement something called an "atomic test and set". That is an operation whereby the value of some variable can be tested and, if a certain condition is met (such as value == 0), the variable is altered (e.g. value = 1). If this succeeds, the kernel will do one thing (like schedule a process); if it does not (because the condition value == 0 was false), the kernel will do something different (like put a process on the do-not-schedule list). The 'atomic' part is that this decision is made without anything else being able to look at and change the same variable at the same time.
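The test-and-set idea can be sketched in user-space C with C11 atomics. This is only an illustration of the operation itself, not of how any particular kernel implements semaphores; try_acquire and release are names chosen for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Succeeds only if *flag was 0, and sets it to 1 in the same indivisible
 * step. No other thread can slip in between the test and the set. */
bool try_acquire(atomic_int *flag)
{
    int expected = 0;
    return atomic_compare_exchange_strong(flag, &expected, 1);
}

/* Makes the flag available again. */
void release(atomic_int *flag)
{
    atomic_store(flag, 0);
}
```

A caller whose try_acquire() fails would take the "do something different" branch, for instance queueing itself to be woken later.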
There are several ways of doing this. One is to suspend all processes (or at least all activity within the kernel) so that nothing else is testing the value of the variable at the same time. That's not very fast.
For example, the Linux kernel once had something called the Big Kernel Lock. I don't know if this was used to process semaphore interactions, but that's the kind of thing that OSes used to have for atomic test & sets.
These days CPUs have atomic test & set opcodes, which is a lot faster. The good ole' Motorola 68000 had one of these (TAS) a long time ago; other CPU families provide equivalents, such as the LOCK-prefixed XCHG and CMPXCHG instructions on x86 and the load-reserve/store-conditional pair (lwarx/stwcx.) on PowerPC.
If you root around inside Linux you'll find mention of futexes. A futex is a "fast userspace mutex": it relies on the CPU's atomic instructions to handle the uncontended case in user space, and only involves the kernel when a thread has to wait or be woken.
Post a Semaphore in Hardware
A variation is a mailbox semaphore. This is a special variation on a semaphore that is extremely useful in some system types where hardware needs to wake up a process at the end of a DMA transfer. A mailbox is a special location in memory which when written to will cause an interrupt to be raised. This can be turned into a semaphore by the kernel because when that interrupt is raised, it goes through the same motions as it would had something called sem_post().
This is incredibly handy; a device can DMA a large amount of data to some pre-arranged buffer, and top that off with a small DMA transfer to the mailbox. The kernel handles the interrupt, and if a process has previously called sem_wait() on the mailbox semaphore the kernel schedules it. The process, which also knows about this pre-arranged buffer, can then process the data.
On real-time DSP systems this is very useful, because it's very fast and very low latency; it allows a process to receive data from some device with very little delay. The alternative, a full-blown device driver stack that uses read() / write() to transfer data from the device to the process, is incredibly slow by comparison.
Speed
The speed of semaphore interactions depends entirely on the OS.
For OSes like Windows and Linux, the context switch time is fairly slow (in the order of several microseconds, if not tens of microseconds). Basically this means that when a process calls something like sem_post(), the kernel is doing a lot of different things whilst it has the opportunity before finally returning control to the process(es). What it's doing during this time could be, well, almost anything!
If a program makes use of a lot of threads, and they're all rapidly interacting with each other using semaphores, quite a lot of time is lost to sem_post() and sem_wait(). This places an emphasis on doing a decent amount of work once a process has returned from sem_wait() before calling the next sem_post().
However, on OSes like VxWorks, the context switch time is lightning fast. That is, there's very little code in the kernel that gets run when sem_post() is called. The result is that a semaphore interaction is a lot more efficient. Moreover, an OS like VxWorks is written in such a way as to guarantee that the time taken to do all this sem_post() / sem_wait() work is constant.
This influences the architecture of one's software on these systems. On VxWorks, where a context switch is cheap, there's very little penalty in having a large number of threads all doing quite small tasks. On Windows / Linux there's more of an emphasis on the opposite.
This is why OSes like VxWorks are excellent for hard real time applications, and Windows / Linux are not.
The Linux PREEMPT_RT patch set in part aims to improve the latency of the Linux kernel during operations like this. For example, it pushes a lot of device interrupt handlers (device drivers) up into kernel threads; these are scheduled almost like any other thread. The idea is to reduce the amount of work done by the kernel itself (and have more done by kernel threads), so that the work it still has to do (such as handling sem_post() / sem_wait()) takes less time and is more consistent in how long it takes. It's still not a hard guarantee of latency, but it's a pretty good improvement. This is what we call a soft-realtime kernel. The impact, though, is that overall throughput of the machine can be lower.
Signals
Signals are nasty, horrible things that really get in the way of using things like sem_post() and sem_wait(). I avoid them like the plague.
If you are on a Linux platform and you do have to use signals, take a serious long look at signalfd (see its man page). This is a far better way of dealing with signals because you can choose to accept them at a convenient time (simply by calling read()), instead of having to handle them as soon as they occur. Certainly if you're using epoll() or select() anywhere at all in a program then signalfd is the way to go.

What is meant by "blocking system call"?

What is the meaning of "blocking system call"?
In my operating systems course, we are studying multithreaded programming. I'm unsure what is meant when I read in my textbook: "it can allow another thread to run when a thread makes a blocking system call".
A blocking system call is one that must wait until the action can be completed. read() would be a good example - if no input is ready, it'll sit there and wait until some is (provided you haven't set it to non-blocking, of course, in which case it wouldn't be a blocking system call). Obviously, while one thread is waiting on a blocking system call, another thread can be off doing something else.
For a blocking system call, the caller can't do anything until the system call returns. If the system call may be lengthy (e.g. involve file IO or networking IO) this can be a bad thing (e.g. imagine a frustrated user hammering a "Cancel" button in an application that doesn't respond because that thread is blocked waiting for a packet from the network that isn't arriving). To get around that problem (to do useful work while you wait for a blocking system call to return) you can use threads - while one thread is blocked the other thread/s can continue doing useful work.
The alternative is non-blocking system calls. In this case the system call returns (almost) immediately. For lengthy system calls the result of the system call is either sent to the caller later (e.g. as some sort of event or message or signal) or polled by the caller later. This allows you to have a single thread waiting for many different lengthy system calls to complete at the same time; and avoids the hassle of threads (and locking, race conditions, the overhead of thread switches, etc). However, it also increases the hassle involved with getting and handling the system call's results.
It is (almost always) possible to write a non-blocking wrapper around a blocking system call; where the wrapper spawns a thread and returns (almost) immediately, and the spawned thread does the blocking system call and either sends the system call's results to the original caller or stores them where the original caller can poll for them.
It is also (almost always) possible to write a blocking wrapper around a non-blocking system call; where the wrapper does the system call and waits for the results before it returns.
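As one possible shape for such a blocking wrapper, the sketch below assumes a POSIX system and a descriptor that has been put into non-blocking mode. read_blocking and set_nonblocking are hypothetical helper names, and poll() is just one of several ways to wait for readiness.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Puts fd into non-blocking mode; returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Blocking wrapper around a non-blocking read(): keeps trying until data
 * (or end of file, or a real error) is available. */
ssize_t read_blocking(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                     /* data or end of file */
        if (errno == EINTR)
            continue;                     /* interrupted, just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                    /* genuine error */

        /* Nothing ready yet: sleep until the descriptor becomes readable. */
        struct pollfd p = { .fd = fd, .events = POLLIN };
        if (poll(&p, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
}
```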
I would suggest reading this very short text:
http://files.mkgnu.net/files/upstare/UPSTARE_RELEASE_0-12-8/manual/html-multi/x755.html
In particular you can read there why blocking system calls can be a worry with threads, not just with concurrent processes:
This is particularly problematic for multi-threaded applications since
one thread blocking on a system call may indefinitely delay the update
of the code of another thread.
Hope it helps.
A blocking system call is one through which a process requests a service from the system that is not currently available, so the call blocks the process until it is.
If you want to make it clear in context with multi threading you can go through the link...

Strategy flushing file outputs at termination

I have an application that monitors a high-speed communication link and writes logs to a file (via standard C file IO). The response time to messages that arrive on the link is important, so I knowingly don't fflush the file at each message, because this slows down my response time.
However, in some circumstances my application is terminated "violently" (e.g. by killing the process), and in these cases the last few log messages are not written (even if the communication link has been quiet for some time).
What techniques/strategies can I use to make sure most of my data is flushed, but without giving up speed of response?
Edit: The application runs on Windows
Using a thread is the standard solution to this. Have your data collection code write data to a thread-safe queue and use a semaphore to signal the writing thread.
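However, before you go there, double-check your assertion that fflush() would be slow. Most operating systems have a file system cache, which makes writes very fast: they are a simple memory-to-memory block copy. The data gets written to disk lazily, and a crash of your process won't affect data that has already reached the cache.
If you are on Unix or Linux, your process would receive some termination signal, which you can catch (except SIGKILL) and then fflush() in your signal handler. Note that fflush() is not on the POSIX list of async-signal-safe functions, so a more cautious variant is to have the handler only set a flag and do the flush from normal code.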
For signal catching see man sigaction.
EDIT: No idea about Windows.
I would suggest an asynchronous write-through. That way you don't need to wait for the write IOP to happen, nor will the OS delay the IOP. See CreateFile() flags FILE_FLAG_WRITE_THROUGH | FILE_FLAG_OVERLAPPED.
You don't need FILE_FLAG_NO_BUFFERING. That's only to skip the OS cache. You would only need it if you are worried about the entire OS dying violently.
If your program terminates by calling exit() or returning from main(), the C standard guarantees that open streams are flushed and closed, so no special handling is needed. From your description, though, your program is being killed by a signal, in which case that flush never happens.
I'm having trouble understanding what the problem is exactly.
If it's just that you're trying to find a happy medium between flushing often and the default fully buffered output, then maybe line buffering is what you want:
setvbuf(stream, NULL, _IOLBF, BUFSIZ);
