I'd like to know what operations are safe in PortAudio's PaStreamFinishedCallback. I know typically it is not a good idea to attempt operations that could block on the PaStreamCallback for playback as that could cause pops/glitches on the user's or other application's audio streams. Do the same limitations apply to the PaStreamFinishedCallback? I guess ultimately I'm curious if that callback is also called on the OS's audio thread.
Alternately, is there a function like Pa_StopStream that will block until the callback has returned paComplete/paAbort, but without inducing a stop? That'd actually be ideal for my use, since I have a thread that's the right place for me to clean up. I know I could achieve this by having my callback signal to my thread that it's done, and then the thread could call Pa_StopStream but that feels heavy handed.
edit: To give a bit more context about my use, I have a ring buffer that holds some PCM and uses a pthread condvar to signal when space is available in the buffer. One thread writes into this ring and then the PaStreamCallback reads out of the other end. When things are finished, the writer sets a closed flag on the ring and then the callback drains whatever is left. I'd like to make sure my ring drains and that PortAudio flushes. The callback is the only place that knows when the ring drains, so returning paComplete feels appropriate. But then I need some way to know that it's ok to deallocate my ring.
The answer to this is that it depends highly on the host and the behavior may change over time even for one host. I went ahead and read the implementation, and I discovered a couple of useful pieces of information here.
Pa_StopStream will just invoke the host system's Stop()-like behavior. I didn't read all the implementations but presumably most have some sort of blocking Stop(). That means that it's unlikely that blocking for a stop, without actually asking for one, will be a supported behavior.
PaStreamFinishedCallback is also just a thin wrapper on the host's own stream stopped callback. For example, in OSX Core Audio this is a Listener on kAudioOutputUnitProperty_IsRunning. It's entirely up to the host how and when this is called. I think the smart play here is to be as cautious as possible -- assume no blocking operations are safe inside this callback.
So, if you're in the same situation as me, where one thread feeds PCM into a ring buffer and the PaStreamCallback reads from that ring, then you'll probably want to:
1. Subscribe to PaStreamFinishedCallback.
2. Have the producer thread close the ring buffer and let PaStreamCallback drain it.
3. Return paComplete from PaStreamCallback when the ring is drained.
4. Signal the producer thread that work is done from PaStreamFinishedCallback, in my case using pthread_cond_signal.
5. Have the producer thread wake up and clean up by deallocating.
Even signaling (and locking mutexes) from the audio thread is probably best to avoid, but it's hard to imagine there's an alternative. For regular reading from PCM ring buffer, the PaStreamCallback should probably spin some limited number of times before giving up. For the completion signal, the producer thread should lock and then immediately wait, so that it holds the lock as little as possible.
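The completion handshake described above might look roughly like the following sketch. It uses only pthreads so that it can run without an audio device. The names finish_gate, on_stream_finished, wait_for_finish and fake_audio_thread are hypothetical; on_stream_finished only has the shape of a PaStreamFinishedCallback (a void function taking the stream's userData pointer), which you would register with Pa_SetStreamFinishedCallback. The done flag is an addition of this sketch, so that a signal sent before the producer starts waiting is not lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: lets the producer sleep until the stream has finished. */
struct finish_gate {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int             done;   /* set once by the finished callback */
};

void finish_gate_init(struct finish_gate *g) {
    assert(g != NULL);
    pthread_mutex_init(&g->mu, NULL);
    pthread_cond_init(&g->cv, NULL);
    g->done = 0;
}

/* Same shape as a PaStreamFinishedCallback: void f(void *userData). */
void on_stream_finished(void *user_data) {
    struct finish_gate *g = user_data;
    pthread_mutex_lock(&g->mu);      /* held only long enough to set the flag */
    g->done = 1;
    pthread_cond_signal(&g->cv);
    pthread_mutex_unlock(&g->mu);
}

/* Producer side: returns once on_stream_finished() has run. */
void wait_for_finish(struct finish_gate *g) {
    pthread_mutex_lock(&g->mu);
    while (!g->done)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->cv, &g->mu);
    pthread_mutex_unlock(&g->mu);
    /* From here on it is safe to free the ring buffer. */
}

/* Stand-in for the host's audio thread, used only to exercise the sketch. */
static void *fake_audio_thread(void *arg) {
    on_stream_finished(arg);
    return NULL;
}

/* Runs the handshake once; returns 1 if the producer observed completion. */
int finish_handshake_demo(void) {
    struct finish_gate g;
    pthread_t t;
    finish_gate_init(&g);
    if (pthread_create(&t, NULL, fake_audio_thread, &g) != 0)
        return 0;
    wait_for_finish(&g);
    pthread_join(t, NULL);
    return g.done;
}
```

In a real program the producer would call wait_for_finish() after closing the ring, and only then deallocate it.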
When a program is doing I/O, my understanding is that the thread will briefly sleep and then resume (e.g. when writing to a file). My question is that when we do printing using printf(), does a C program thread sleep in any way ?
Since you've specifically asked for printf(), I'm going to assume that you mean in the most generic way where it will fill a reasonably sized buffer and invoke the system call write(2) to stdout and that the stdout happens to point to your terminal.
In most operating systems, when you invoke certain system calls the calling thread/process is removed from CPU runnable list and placed in a separate waiting list. This is true for all I/O calls like read/write/etc. Being temporarily removed from processing due to I/O is not the same as being put to sleep via a timer.
For example, Linux has an uninterruptible sleep state specifically meant for threads/processes waiting on I/O, and an interruptible sleep state for those waiting on timers and events. From a user's perspective the two look the same, but their implementations behind the scenes are significantly different.
To answer your question, a call to printf() isn't exactly sleeping; it is waiting for the buffer to be flushed to the device. Even then there are a few more quirks, which you can read about in signal(7), and there is even more about the various process/thread states on Marek's blog.
Hope this helps.
Much of the point of stdio.h is that it buffers I/O: a call to printf will often simply put text into a memory buffer (owned by the library by default) and perform zero system calls, thus offering no opportunity to yield the CPU. Even when something like write(2) is called, the thread may continue running: the kernel can copy the data into kernel memory (from which it will be transferred to the disk later, e.g. by DMA) and return immediately.
Of course, even on a single-core system, most operating systems frequently interrupt the running thread in order to share it. So another thread can still run at any time, even if no blocking calls are made.
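To make the buffering point concrete, here is a small sketch that prints through stdio into a pipe and checks whether the kernel has seen any bytes before and after fflush(). The function names readable_now and stdio_buffering_demo are made up for this example, and the 4096-byte buffer size is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen(), pipe(), poll() */
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if fd has bytes waiting in the kernel right now, else 0. */
static int readable_now(int fd) {
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN) != 0;
}

/* Reports whether the reader side could see data before and after fflush().
 * Returns 0 on success, -1 if the pipe or stream could not be set up. */
int stdio_buffering_demo(int *before_flush, int *after_flush) {
    int fds[2];
    FILE *out;

    assert(before_flush != NULL && after_flush != NULL);
    if (pipe(fds) != 0)
        return -1;
    out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    setvbuf(out, NULL, _IOFBF, 4096);     /* fully buffered, library-owned buffer */

    fprintf(out, "hello\n");              /* lands in the stdio buffer only */
    *before_flush = readable_now(fds[0]);

    fflush(out);                          /* this is where write(2) happens */
    *after_flush = readable_now(fds[0]);

    fclose(out);                          /* also closes fds[1] */
    close(fds[0]);
    return 0;
}
```

With a fully buffered stream, before_flush comes back 0 and after_flush comes back 1: the fprintf call itself made no system call at all.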
From epoll's man page:
epoll is a variant of poll(2) that can be used either as an edge-triggered
or a level-triggered interface
When would one use the edge triggered option? The man page gives an example that uses it, but I don't see why it is necessary in the example.
When an FD becomes read or write ready, you might not necessarily want to read (or write) all the data immediately.
Level-triggered epoll will keep nagging you as long as the FD remains ready, whereas edge-triggered won't bother you again until the next time you get an EAGAIN (so it's more complicated to code around, but can be more efficient depending on what you need to do).
Say you're writing from a resource to an FD. If you register your interest for that FD becoming write ready as level-triggered, you'll get constant notification that the FD is still ready for writing. If the resource isn't yet available, that's a waste of a wake-up, because you can't write any more anyway.
If you were to add it as edge-triggered instead, you'd get notification that the FD was write ready once, then when the other resource becomes ready you write as much as you can. Then if write(2) returns EAGAIN, you stop writing and wait for the next notification.
The same applies for reading, because you might not want to pull all the data into user-space before you're ready to do whatever you want to do with it (thus having to buffer it, etc etc). With edge-triggered epoll you get told when it's ready to read, and then can remember that and do the actual reading "as and when".
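The difference is easy to observe with a small Linux-only sketch. It registers the read end of a pipe with epoll, writes one byte that is never read, and counts how many of two back-to-back epoll_wait() calls report the pipe. The function name count_notifications is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Pass 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns the number of notifications seen (0..2), or -1 on setup failure. */
int count_notifications(uint32_t extra_flags) {
    int fds[2];
    int ep;
    int notifications = 0;
    struct epoll_event ev, out;

    if (pipe(fds) != 0)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = fds[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        /* The byte is deliberately left unread between the two calls. */
        for (int i = 0; i < 2; i++)
            if (epoll_wait(ep, &out, 1, 0) == 1)
                notifications++;
    } else {
        notifications = -1;
    }

    close(ep);
    close(fds[0]);
    close(fds[1]);
    return notifications;
}
```

count_notifications(0) reports the pipe on both calls, while count_notifications(EPOLLET) reports it only once, because no new data arrived after the first notification.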
In my experiments, ET doesn't guarantee that only one thread wakes up, although it often wakes up only one. The EPOLLONESHOT flag is for this purpose.
Level triggered
Use level trigger mode when you can't consume all the data in the FD and want epoll to keep triggering while data is available.
For example, if you are receiving a large file from an FD and cannot consume all of its data in one go, you want epoll to keep notifying you so that you can pick up where you left off. Level-triggered mode suits this case.
Disadvantage
thundering herd
The EPOLLEXCLUSIVE flag is meant to prevent the thundering herd phenomenon.
less efficiency
When a read/write event occurs on the monitored file descriptor, epoll_wait() notifies the handler to read or write. If you don’t read or write all the data at once (e.g., the read/write buffer is too small), then the next time epoll_wait() is called, it will notify you to continue reading or writing on the file descriptor you didn’t finish reading or writing on, but of course, if you never read or write, it will keep notifying you.
If the system has a large number of ready file descriptors that you don’t need to read or write, and they return every time, this can greatly reduce the efficiency of the handler retrieving the ready file descriptors it cares about.
use cases
redis epoll: since the IO thread of Redis is single-threaded, level-triggered mode is used.
Edge triggered
Use edge triggered mode and make sure all data available is buffered and will be handled eventually.
As Chris Dodd mentioned in the comments
ET is also particularly nice with a multithreaded server on a multicore machine. You can run one thread per core and have all of them call epoll_wait on the same FD. When data comes in on an FD, exactly one thread will be woken to handle it
use cases
nginx epoll model
golang netpoll
Is there a way to serialize the C write() so that I can write bytes to a socket shared between k threads with no data loss? I imagine that a solution to this problem involves user-space locking, so what about scalability? Thank you in advance.
I think the right answer depends on whether your threads need to synchronously wait for a response or not. If they just need to write some message to a socket and not wait for the peer to respond, I think the best answer is to have a single thread that is dedicated to writing messages from a queue that the other threads place messages on. That way, the worker threads can simply place their messages on the queue and get on with doing something else.
Of course, the queue has to be protected by a mutex but any one thread only has to hold the lock for as long as it is manipulating the queue (guaranteed to be quite a short time). The more obvious alternative of letting every thread write directly to the socket requires each thread to hold the lock for as long as it takes the write operation to complete. This will always be much longer than just adding an item to a queue since write is a system call and potentially, it could block for a long period.
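A minimal sketch of that dedicated-writer design, assuming small fixed-size messages: worker threads call mq_push() and carry on, while a single mq_writer thread is the only code that ever calls write() on the descriptor. All names here (msg_queue, mq_init, mq_push, mq_close, mq_writer) and the QCAP/MSG_MAX limits are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define QCAP    16   /* queue slots */
#define MSG_MAX 64   /* largest message, in bytes */

struct msg_queue {
    pthread_mutex_t mu;
    pthread_cond_t  not_empty, not_full;
    char            msgs[QCAP][MSG_MAX];
    size_t          lens[QCAP];
    size_t          head, count;
    int             closed;
    int             fd;      /* the shared socket (or any descriptor) */
};

void mq_init(struct msg_queue *q, int fd) {
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
    q->fd = fd;
}

/* Worker threads: copy the message in; the lock is held only for the copy. */
void mq_push(struct msg_queue *q, const void *data, size_t len) {
    assert(len <= MSG_MAX);
    pthread_mutex_lock(&q->mu);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->mu);
    size_t tail = (q->head + q->count) % QCAP;
    memcpy(q->msgs[tail], data, len);
    q->lens[tail] = len;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Tells the writer to exit once the queue is empty. */
void mq_close(struct msg_queue *q) {
    pthread_mutex_lock(&q->mu);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Writer thread: the only place that touches q->fd. */
void *mq_writer(void *arg) {
    struct msg_queue *q = arg;
    for (;;) {
        char buf[MSG_MAX];
        size_t len, off = 0;

        pthread_mutex_lock(&q->mu);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->mu);
        if (q->count == 0) {             /* closed and fully drained */
            pthread_mutex_unlock(&q->mu);
            return NULL;
        }
        len = q->lens[q->head];
        memcpy(buf, q->msgs[q->head], len);
        q->head = (q->head + 1) % QCAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->mu);

        while (off < len) {              /* the slow write runs without the lock */
            ssize_t n = write(q->fd, buf + off, len - off);
            if (n <= 0)
                return NULL;
            off += (size_t)n;
        }
    }
}
```

Start the writer with pthread_create(&t, NULL, mq_writer, &q), push from any number of threads, then call mq_close() and join the writer at shutdown.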
Even if your threads need a response to their messages, it may still pay to do something similar. Your socket servicing thread becomes more complex because you'll have to do something like select() on the socket for reads and writes to stop it from blocking and you'll also need a way to match up messages to responses and a way to inform the threads when their responses have arrived.
Since POSIX does not seem to specify atomicity guarantees on send(2), you will likely have to use a mutex. Scalability of course goes down the drain with this sort of serialization.
One possible approach would be to use a locking mechanism. Every thread should acquire a lock before writing anything to the socket and release it once it is done.
If all of your threads are sending exactly the same kind of message, the receiving end will not have any problem reading the data. But if different threads can send different kinds of data, possibly with different contents, you should associate a unique message id with each kind of data. It's also better to send the thread id (not strictly necessary, but it might help you debug small issues).
You can have a structure like:
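#include <stddef.h>  /* size_t */

typedef struct my_socket_data_st
{
    int msg_id;                 /* identifies the kind of message */
#ifdef __debug_build__
    int thread_id;              /* sending thread, debug builds only */
#endif
    size_t data_size_in_bytes;  /* number of payload bytes that follow */
    /* .... Followed by your data .... */
} my_socket_data_t;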
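As a rough sketch of how such a header could be sent under a lock, the code below writes a small header and then the payload while holding one mutex, so that frames from different threads cannot interleave. The frame_header layout, socket_lock, write_all and send_frame are all invented for this example. Fields go out in host byte order, which assumes both ends run on the same kind of machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed-size header that precedes every payload. */
struct frame_header {
    uint32_t msg_id;
    uint32_t payload_len;
};

/* One lock per shared socket; a single global is enough for the sketch. */
static pthread_mutex_t socket_lock = PTHREAD_MUTEX_INITIALIZER;

/* write() may accept fewer bytes than asked for; loop until all are sent. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends header + payload as one unit. Returns 0 on success, -1 on error. */
int send_frame(int fd, uint32_t msg_id, const void *payload, uint32_t len) {
    struct frame_header h = { msg_id, len };
    int rc;

    assert(payload != NULL || len == 0);
    pthread_mutex_lock(&socket_lock);     /* held across both writes */
    rc = write_all(fd, &h, sizeof h);
    if (rc == 0)
        rc = write_all(fd, payload, len);
    pthread_mutex_unlock(&socket_lock);
    return rc;
}
```

The receiver reads sizeof(struct frame_header) bytes first, then exactly payload_len more.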
Scalability depends on a lot of things, including the hardware resources your application will be running on. Since it is a network application, you will have to think about network bandwidth as well. The OS places no limitation on sending/receiving data over a socket that matters here (there are a few, but I think you can ignore them for now for your application), but you will have to consider whether to make the send synchronous or asynchronous based on your requirements. Also, since you are taking a lock, you will have to think about lock contention. If the lock is not readily available to other threads, performance will degrade by a huge factor.
I'm trying to understand how asynchronous file operations are emulated using threads. I've found next to no material to read on the subject.
Is it possible that:
a process uses a thread to open a regular file (HDD).
the parent gets the file descriptor from the thread, now it may close the thread.
the parent uses the file descriptor with a new thread, reading X bytes from the file.
the parent gets the file descriptor with the seek-position of the current file state.
the parent may repeat these operations, without the need to open, or seek, every time it wishes to "continue" reading a new chunk of the file?
This is just a wild guess of mine; I would appreciate it if anybody could shed more light on how it's emulated efficiently.
UPDATE:
By efficient I actually mean that I don't want the thread to "wait" from the moment the file is opened. Think of a non-blocking HTTP daemon which serves a client a huge file: you want to use the thread to read chunks of the file without blocking the daemon, but you don't want to keep the thread busy while "waiting" for the actual transfer to take place; you want to use the thread for other blocking operations for other clients.
To understand asynchronous I/O better, it may be helpful to think in terms of overlapping operations. That is, the number of pending operations (operations that have been started but not yet completed) can simultaneously go above one.
A diagram that explains asynchronous I/O might look like this: http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx
If you are using the asynchronous I/O capabilities provided by the underlying operating system, then it is possible to asynchronously read from multiple files without spawning an equal number of threads.
If your underlying operating system does not provide asynchronous I/O, or if you decide not to use it (in other words, you wish to emulate asynchronous operation using only blocking I/O, the regular Read/Write provided by the operating system), then it is necessary to spawn as many threads as there are simultaneous I/O operations. This is because when a thread makes a blocking I/O call, it cannot continue its execution until the operation finishes. In order to start another blocking I/O operation, that operation has to be issued from another thread that is not already occupied.
When you open/create a file, fire up a thread. Now store that thread id/ptr as your file handle.
Basically the thread will do nothing except sit in a loop waiting for an "event". A semaphore would be good here. When you want to do a read, you add the read command to a queue (remember to protect the queue add with a critical section), return a unique id, and then increment the semaphore. If the thread is asleep it will now wake up, grab the first message off the queue and process it. When it has completed, the command is removed from the queue.
To poll whether a file read has completed you can simply check whether it's still in the command queue. If it's not there, the command has completed.
Furthermore, if you want to allow synchronous reads as well, you can wait, after sending the message, for an "event" that is triggered by the completion. You then check whether the unique id is still in the queue, and if it isn't, you return control. If it still is, you go back to a wait state until the relevant unique id has been processed.
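A stripped-down sketch of this worker-thread scheme, with some liberties taken: instead of looking a unique id up in the queue, each request carries its own done flag, and the caller polls that. It uses a POSIX unnamed semaphore, which is an assumption about the platform (Linux supports it). All names (read_req, io_worker, io_submit_read and so on) are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_PENDING 8

/* One queued read; the caller owns it until `done` becomes 1. */
struct read_req {
    int     fd;
    void   *buf;
    size_t  len;
    ssize_t result;   /* bytes read, or -1 on error */
    int     done;
};

struct io_worker {
    pthread_t        thread;
    pthread_mutex_t  mu;        /* protects queue, stop and the done flags */
    sem_t            wakeups;   /* one post per queued command or stop request */
    struct read_req *queue[MAX_PENDING];
    size_t           head, count;
    int              stop;
};

static void *io_worker_main(void *arg) {
    struct io_worker *w = arg;
    for (;;) {
        sem_wait(&w->wakeups);              /* sleep until there is work */
        pthread_mutex_lock(&w->mu);
        if (w->count == 0) {
            int stop = w->stop;
            pthread_mutex_unlock(&w->mu);
            if (stop)
                return NULL;
            continue;                       /* interrupted wait, nothing queued */
        }
        struct read_req *r = w->queue[w->head];
        w->head = (w->head + 1) % MAX_PENDING;
        w->count--;
        pthread_mutex_unlock(&w->mu);

        ssize_t n = read(r->fd, r->buf, r->len);   /* the blocking call */

        pthread_mutex_lock(&w->mu);
        r->result = n;
        r->done = 1;
        pthread_mutex_unlock(&w->mu);
    }
}

int io_worker_start(struct io_worker *w) {
    assert(w != NULL);
    w->head = w->count = 0;
    w->stop = 0;
    pthread_mutex_init(&w->mu, NULL);
    sem_init(&w->wakeups, 0, 0);
    return pthread_create(&w->thread, NULL, io_worker_main, w);
}

/* Queues a read and returns at once; 0 on success, -1 if the queue is full. */
int io_submit_read(struct io_worker *w, struct read_req *r) {
    int rc = -1;
    r->done = 0;
    pthread_mutex_lock(&w->mu);
    if (w->count < MAX_PENDING) {
        w->queue[(w->head + w->count) % MAX_PENDING] = r;
        w->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&w->mu);
    if (rc == 0)
        sem_post(&w->wakeups);
    return rc;
}

/* Non-blocking completion check. */
int io_is_done(struct io_worker *w, struct read_req *r) {
    pthread_mutex_lock(&w->mu);
    int done = r->done;
    pthread_mutex_unlock(&w->mu);
    return done;
}

/* Lets already-queued reads finish, then stops and joins the worker. */
void io_worker_stop(struct io_worker *w) {
    pthread_mutex_lock(&w->mu);
    w->stop = 1;
    pthread_mutex_unlock(&w->mu);
    sem_post(&w->wakeups);
    pthread_join(w->thread, NULL);
}
```

Because read() advances the descriptor's file position, successive requests on the same fd continue where the previous one stopped, with no reopening or seeking in between.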
I have a worker thread that is listening to a TCP socket for incoming traffic, and buffering the received data for the main thread to access (let's call this socket A). However, the worker thread also has to do some regular operations (say, once per second), even if there is no data coming in. Therefore, I use select() with a timeout, so that I don't need to keep polling. (Note that calling receive() on a non-blocking socket and then sleeping for a second is not good: the incoming data should be immediately available for the main thread, even though the main thread might not always be able to process it right away, hence the need for buffering.)
Now, I also need to be able to signal the worker thread to do some other stuff immediately; from the main thread, I need to make the worker thread's select() return right away. For now, I have solved this as follows (approach basically adopted from here and here):
At program startup, the worker thread creates for this purpose an additional socket of the datagram (UDP) type, and binds it to some random port (let's call this socket B). Likewise, the main thread creates a datagram socket for sending. In its call to select(), the worker thread now lists both A and B in the fd_set. When the main thread needs to signal, it sendto()'s a couple of bytes to the corresponding port on localhost. Back in the worker thread, if B remains in the fd_set after select() returns, then recvfrom() is called and the bytes received are simply ignored.
This seems to work very well, but I can't say I like the solution, mainly as it requires binding an extra port for B, and also because it adds several additional socket API calls which may fail I guess – and I don't really feel like figuring out the appropriate action for each of the cases.
I think ideally, I would like to call some function which takes A as input, and does nothing except makes select() return right away. However, I don't know such a function. (I guess I could for example shutdown() the socket, but the side effects are not really acceptable :)
If this is not possible, the second best option would be creating a B which is much dummier than a real UDP socket, and doesn't really require allocating any limited resources (beyond a reasonable amount of memory). I guess Unix domain sockets would do exactly this, but: the solution should not be much less cross-platform than what I currently have, though some moderate amount of #ifdef stuff is fine. (I am targeting mainly for Windows and Linux – and writing C++ by the way.)
Please don't suggest refactoring to get rid of the two separate threads. This design is necessary because the main thread may be blocked for extended periods (e.g., doing some intensive computation – and I can't start periodically calling receive() from the innermost loop of calculation), and in the meanwhile, someone needs to buffer the incoming data (and due to reasons beyond what I can control, it cannot be the sender).
Now that I was writing this, I realized that someone is definitely going to reply simply "Boost.Asio", so I just had my first look at it... Couldn't find an obvious solution, though. Do note that I also cannot (easily) affect how socket A is created, but I should be able to let other objects wrap it, if necessary.
You are almost there. Use the "self-pipe" trick. Open a pipe, add its read end to your select() read fd_set, and write a byte to its write end from the main thread to unblock the worker thread. It is portable across POSIX systems.
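A minimal POSIX-only sketch of the self-pipe trick follows. The names wake_worker and wait_for_event and the return codes are made up for this example, and data_fd stands in for socket A.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Main thread: make the worker's select() return by making the pipe readable. */
int wake_worker(int wake_write_fd) {
    char byte = 1;
    return write(wake_write_fd, &byte, 1) == 1 ? 0 : -1;
}

/* Worker thread: waits on the data socket and on the pipe's read end.
 * Returns 1 if woken through the pipe, 2 if data_fd is readable,
 * 0 on timeout and -1 on error. */
int wait_for_event(int data_fd, int wake_read_fd, int timeout_ms) {
    fd_set readfds;
    struct timeval tv;
    int maxfd = data_fd > wake_read_fd ? data_fd : wake_read_fd;
    int n;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    FD_ZERO(&readfds);
    FD_SET(data_fd, &readfds);
    FD_SET(wake_read_fd, &readfds);

    n = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;                          /* 0 = timeout, -1 = error */

    if (FD_ISSET(wake_read_fd, &readfds)) {
        char drain[64];
        if (read(wake_read_fd, drain, sizeof drain) < 0)   /* discard wake-up bytes */
            return -1;
        return 1;
    }
    return 2;
}
```

The worker calls wait_for_event() in its loop with the one-second timeout it already uses, and the main thread calls wake_worker() on the pipe's write end whenever it needs attention right away.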
I have seen a variant of similar technique for Windows in one system (in fact used together with the method above, separated by #ifdef WIN32). Unblocking can be achieved by adding a dummy (unbound) datagram socket to fd_set and then closing it. The downside is that, of course, you have to re-open it every time.
However, in the aforementioned system, both of these methods are used rather sparingly, and for unexpected events (e.g., signals, termination requests). Preferred method is still a variable timeout to select(), depending on how soon something is scheduled for a worker thread.
Using a pipe rather than socket is a bit cleaner, as there is no possibility for another process to get hold of it and mess things up.
Using a UDP socket definitely creates the potential for stray packets to come in and interfere.
An anonymous pipe will never be available to any other process (unless you give it to it).
You could also use signals, but in a multithreaded program you'll want to make sure that all threads except for the one you want have that signal masked.
On Unix this is straightforward using a pipe. If you are on Windows and want to keep using select() so that your code stays compatible with Unix, the trick of creating an unbound UDP socket and closing it works well and is easy. But you have to make it thread-safe.
The only way I found to make this thread-safe is to close and recreate the socket in the same thread that is running the select(). Of course this is difficult if the thread is blocked in the select(). This is where the Windows call QueueUserAPC comes in. While Windows is blocking in the select(), the thread can handle Asynchronous Procedure Calls. You can schedule one from a different thread using QueueUserAPC. Windows interrupts the select(), executes your function in the same thread, and continues with the select(). In your APC function you can now close the socket and recreate it. This is guaranteed thread-safe and you will never lose a signal.
To keep it simple:
Keep the socket handle in a global variable; when you close that global socket, select() returns immediately: closesocket(g_socket);