I am working on an audio player written in C that uses libasound (ALSA) as the audio back-end. I have implemented a callback mechanism that allows the audio player to send audio to ALSA in a timely manner. I configured my pcm_handle to internally hold a buffer with 500ms (= buffer_time) worth of audio data. By using poll() on ALSA's poll descriptors the program is notified to add more data to the buffer. I also poll() on my own custom poll descriptor in order to notify the loop when to stop/pause.
I want to be able to pause the audio output (just like pausing a song), and thus I must pause the pcm handle, i.e. tell ALSA to stop sending audio frames from the internal buffer to the sound card. One could use the snd_pcm_pause() function; however, as the documentation shows, this function does not work on all hardware. My audio player targets embedded devices, and I want to support all hardware, so I prefer not to use that function.
Alternatively I could use the snd_pcm_drain() function, which pauses the pcm handle after all pending audio samples in the 500ms buffer have been played. This will, however, result in a latency of up to 500ms. It would be possible to reduce this latency by decreasing the buffer size, but that will never eliminate the problem and would increase the chance of an underrun.
Another option is the snd_pcm_drop() function, which pauses the pcm handle and discards any pending audio samples in the buffer. This solves the latency problem, but when the pcm handle is restarted, those samples are lost, which makes this option not ideal either.
I was personally leaning towards snd_pcm_drop(). To solve the lost-samples problem, I am looking for a way to retrieve the pending samples from the ALSA buffer so that I can play them as soon as the pcm handle is started again. I tried calling snd_pcm_readi() just before snd_pcm_drop() in order to retrieve the pending samples, but this gave me segmentation faults; I believe ALSA does not allow the use of this function on SND_PCM_STREAM_PLAYBACK handles.
So is there another option to pause the pcm handle without latency and without losing pending audio frames? If not, as suggested in my solution, is there a way to retrieve those pending frames from ALSA's internal buffer?
My implementation strongly resembles the "write and wait transfer method" shown here. The following pseudo code gives a simplified version of my implementation (without a solution for the current problem):
write_and_poll_loop(...) {
    while (1) {
        poll(...); /** Here we wait for room in the buffer or for a pause command to come in */
        if (ALSA_buffer_has_room) {
            customAudioCallback(); /** In this callback audio samples are written to the ALSA buffer */
        }
        else if (pause_command) {
            snd_pcm_drop();    /** Discard all samples in the ALSA buffer */
            snd_pcm_prepare(); /** Prepare pcm_handle for future playback */
            block_until_play_command(); /** Block until we want to play audio again */
        }
    }
}
EDIT: I realized that by using snd_pcm_hw_params_can_pause() I can check whether the hardware supports pausing. If it cannot, I fall back to the snd_pcm_drop() method and just pay the price of losing samples. Nevertheless, I would love to see a solution that is independent of the hardware.
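A minimal sketch of that fallback logic, assuming a playback handle opened with libasound; the function name pause_playback is mine, and the sketch is untested against real hardware:

```c
#include <alsa/asoundlib.h>

/* Pause playback, preferring hardware pause when available.
 * Returns 0 on success, a negative error code otherwise. */
static int pause_playback(snd_pcm_t *pcm)
{
    snd_pcm_hw_params_t *params;
    snd_pcm_hw_params_alloca(&params);

    int err = snd_pcm_hw_params_current(pcm, params);
    if (err < 0)
        return err;

    if (snd_pcm_hw_params_can_pause(params)) {
        /* Hardware supports pausing: no latency, no lost samples. */
        return snd_pcm_pause(pcm, 1);
    }

    /* Fallback: drop pending samples and re-prepare for later playback.
     * The samples still in the ring buffer are lost in this path. */
    err = snd_pcm_drop(pcm);
    if (err < 0)
        return err;
    return snd_pcm_prepare(pcm);
}
```

Resuming is the mirror image: snd_pcm_pause(pcm, 0) in the hardware path, or simply writing new samples after the snd_pcm_prepare() in the fallback path.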
I have an imx8 module running Linux on my PCB, and I would like some tips or pointers on how to modify the UART driver so that I can detect the end of a frame very quickly (less than 2ms) from my user-space C application. The UART frame does not have any specific ending character or frame length. The standard VTIME of 100ms is much too long.
I am reading from a SIM card; I have no control over the data, its size, or its content. I just need to detect the end of the frame very quickly. The frame could be 3 bytes or 500. The SIM card reacts to data it receives: typically I send it a couple of bytes, and it responds a couple of ms later with an uninterrupted string of bytes of unknown length. I am using an iMX8MP.
I thought about using the IDLE interrupt to detect the frame end: turn it on when any byte is received and off once the idle interrupt fires. How can I propagate this signal back to user space? Or is there an existing method to do this?
Waiting for an "idle" is a poor way to do this.
Use termios to set raw mode with VTIME of 0 and VMIN of 1. This will allow the userspace app to get control as soon as a single byte arrives. See:
How to read serial with interrupt serial?
How do I use termios.h to configure a serial port to pass raw bytes?
How to open a tty device in noncanonical mode on Linux using .NET Core
But, you need a "protocol" of sorts, so you can know how much to read to get a complete packet. Prefix all data with a struct that has (e.g.) a type and a payload length; then send "payload length" bytes. The receiver reads that fixed-length struct and then reads the payload, which is "payload length" bytes long. This struct is always sent (in both directions).
See my answer: thread function doesn't terminate until Enter is pressed for a working example.
What you have/need is similar to doing socket programming using a stream socket except that the lower level is the UART rather than an actual socket.
My example code uses sockets, but if you change the low level to open your uart in raw mode (as above), it will be very similar.
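The read side of such a protocol can be sketched like this; the pkt_hdr layout is hypothetical (not a real SIM protocol), and both helper names are mine:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical wire header: every packet starts with this. */
struct pkt_hdr {
    uint8_t  type;    /* message type */
    uint8_t  _pad;
    uint16_t length;  /* payload length in bytes */
};

/* Read exactly len bytes, looping over partial reads (xrecv-style). */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;          /* EOF */
        got += n;
    }
    return (ssize_t)got;
}

/* Read one packet: fixed-size header first, then exactly hdr->length bytes. */
static ssize_t read_packet(int fd, struct pkt_hdr *hdr, void *payload, size_t max)
{
    if (read_full(fd, hdr, sizeof *hdr) != (ssize_t)sizeof *hdr)
        return -1;
    if (hdr->length > max)
        return -1;          /* payload too big for the caller's buffer */
    return read_full(fd, payload, hdr->length);
}
```

Because the header carries the length, the receiver never has to guess where a frame ends, no matter how the bytes are chunked by the UART.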
UPDATE:
In a comment, the OP (Engo) asked: "How quickly after the frame finished would I have the data at the application level? When I try to read my random-length frames, currently reading in 512-byte chunks, it will sometimes read the whole frame in one go, other times it reads the frame broken up into chunks."
In my link, in the last code block, there is an xrecv function. It shows how to read partial data that comes in chunks.
That is what you'll need to do.
Things missing from your post:
You didn't post which imx8 board/configuration you have. And, which SIM card you have (the protocols are card specific).
And, you didn't post your other code [or any code] that drives the device and illustrates the problem.
How much time must pass without receiving a byte before the [uart] device is "idle"? That is, (e.g.) the device sends 100 bytes and is then finished. How many byte times does one wait before considering the device to be "idle"?
What speed is the UART running at?
A thorough description of the device, its capabilities, and how you intend to use it.
A uart device doesn't have an "idle" interrupt. From some imx8 docs, the DMA device may have an "idle" interrupt and the uart can be driven by the DMA controller.
But, I looked at some of the linux kernel imx8 device drivers, and, AFAICT, the idle interrupt isn't supported.
I need to read everything in one go and get this data within a few hundred microseconds.
Based on the scheduling granularity, it may not be possible to guarantee that a process runs in a given amount of time.
It is possible to help this a bit. You can change the process to use the R/T scheduler (e.g. SCHED_FIFO). Also, you can use sched_setaffinity to lock the process to a given CPU core. There is a corresponding call to lock IRQ interrupts to a given CPU core.
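A sketch of both calls; the priority value 50 and the helper name pin_and_boost are arbitrary, and the SCHED_FIFO request is treated as best-effort because it usually requires privileges:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Try to reduce wakeup latency: pin the process to one core, then request
 * SCHED_FIFO. SCHED_FIFO usually needs CAP_SYS_NICE/root, so failure there
 * is reported but not fatal. */
static int pin_and_boost(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof set, &set) < 0)
        return -1;                              /* pinning failed */

    struct sched_param sp = { .sched_priority = 50 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
        perror("SCHED_FIFO (non-fatal, needs privileges)");
    return 0;
}
```

Pinning the IRQ for the UART to the same core (via /proc/irq/N/smp_affinity) complements this, keeping the interrupt and the handling process cache-warm on one CPU.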
I assume that the SIM card acts like a [passive] device (like a disk). That is, you send it a command, and it sends back a response or does a transfer.
Based on what command you give it, you should know how many bytes it will send back. Or, it should tell you how many optional bytes it will send (similar to the struct in my link).
The method you've described (wait for idle, then "race" to get/process data whose length you don't know) is fraught with problems.
Even if you could get it to work, it will be unreliable. At some point, system activity will be just high enough to delay wakeup of your process and you'll miss the window.
If you're reading data, why must you process the data within a fixed period of time (e.g. 100 us)? What happens if you don't? Does the device catch fire?
Without more specific information, there are probably other ways to do this.
I've programmed such systems before that relied on data races. They were unreliable. Either missing data. Or, for some motor control applications, device lockup. The remedy was to redesign things so that there was some positive/definitive way to communicate that was tolerant of delays.
Otherwise, I think you've "fallen in love" with "idle interrupt" idea, making this an XY problem: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
I've tried multiple example programs that appear to have code to handle xruns under playback:
https://albertlockett.wordpress.com/2013/11/06/creating-digital-audio-with-alsa/
https://www.linuxjournal.com/article/6735 (listing 3)
When using snd_pcm_writei(), it appears that when the return value is -EPIPE (which is xrun/underrun), they do:
if (rc == -EPIPE) {
    /* EPIPE means underrun */
    fprintf(stderr, "underrun occurred\n");
    snd_pcm_prepare(handle);
}
I.e. call snd_pcm_prepare() on the handle.
However, I still get stuttering when I run programs like these. Typically I get at least a few, maybe half a dozen, xruns, and then playback continues smoothly without further xruns. If something else is using the sound card, such as Firefox, I get many more xruns, and sometimes only xruns. But even after killing every other program that uses the sound card, I still experience some initial xruns and audible stuttering on the speakers.
This is not acceptable for me, how can I modify this type of xrun handling to prevent stuttering?
My own attempt at figuring this out:
From the ALSA API, I see that snd_pcm_prepare() does:
Prepare PCM for use.
This is not very helpful to an ALSA beginner like myself. It is not explained how this can be used to recover from xrun issues.
I also note, from: https://www.alsa-project.org/alsa-doc/alsa-lib/pcm.html
SND_PCM_STATE_XRUN
The PCM device reached overrun (capture) or underrun (playback). You can use the -EPIPE return code from I/O functions
(snd_pcm_writei(), snd_pcm_writen(), snd_pcm_readi(), snd_pcm_readn())
to determine this state without checking the actual state via
snd_pcm_state() call. It is recommended to use the helper function
snd_pcm_recover() to recover from this state, but you can also use
snd_pcm_prepare(), snd_pcm_drop() or snd_pcm_drain() calls.
Again, it is not clear to me. I can use snd_pcm_prepare() OR I can use these other calls? What is the difference? What should I use?
The best way to handle underruns is to avoid handling them by preventing them. This can be done by writing samples early enough, before the buffer is empty. To do this,
reorganize your program so that the new samples are already available to be written when you need to call snd_pcm_write*(), and/or
increase the priority of your process/thread (if possible; this is probably not helpful if other programs interfere with your disk I/O), and/or
increase the buffer size (this also increases the latency).
When an underrun happens, you have to decide what should happen with the samples that should have been played but were not written to the buffer at the correct time.
To play these samples later (i.e., to move all following samples to a later time), configure the device so that an underrun stops it. (This is the default setting.) Your program has to restart the device when it has new samples.
To continue with the following samples at the same time, as if the missing samples had actually been played, configure the device so that an underrun does not stop it. This can be done by setting the stop threshold¹ to the boundary value². (Other errors, like unplugging a USB device, will still stop the device.)
When an underrun does happen, the device will play those samples that happen to be in the ring buffer. By default, these are the old samples from some time ago, which will not sound correct. To play silence instead (which will not sound correct either, but in a different way), tell the device to clear each part of the buffer immediately after it has been played by setting the silence threshold³ to zero and the silence size⁴ to the boundary value.
To (try to) reinitialize a device after an error (an xrun or some other error), you could call either snd_pcm_prepare() or snd_pcm_recover(). The latter calls the former, and also handles a suspended device (by waiting for it to be resumed).
¹stop threshold: snd_pcm_sw_params_set_stop_threshold()
²boundary value: snd_pcm_sw_params_get_boundary()
³silence threshold: snd_pcm_sw_params_set_silence_threshold()
⁴silence size: snd_pcm_sw_params_set_silence_size()
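The two configurations above can be sketched together, assuming an already-configured playback handle; configure_free_running is my own name, error checking is abbreviated, and the sketch is untested against real hardware:

```c
#include <alsa/asoundlib.h>

/* Configure playback so an underrun does not stop the device and already-
 * played parts of the ring buffer are cleared to silence. */
static int configure_free_running(snd_pcm_t *pcm)
{
    snd_pcm_sw_params_t *sw;
    snd_pcm_uframes_t boundary;
    snd_pcm_sw_params_alloca(&sw);

    snd_pcm_sw_params_current(pcm, sw);
    snd_pcm_sw_params_get_boundary(sw, &boundary);

    /* stop threshold = boundary: an underrun no longer stops the device */
    snd_pcm_sw_params_set_stop_threshold(pcm, sw, boundary);

    /* silence threshold 0 + silence size = boundary: each part of the ring
     * buffer is cleared right after it is played, so an underrun produces
     * silence rather than stale samples */
    snd_pcm_sw_params_set_silence_threshold(pcm, sw, 0);
    snd_pcm_sw_params_set_silence_size(pcm, sw, boundary);

    return snd_pcm_sw_params(pcm, sw);
}
```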
I'm using libao (ao_play) to play some buffers. I listen for keyboard keys, and for each key I have a wav sound to play. It's simple.
With ao_play I see that the application blocks while it is playing the sound. Because I want to play multiple sounds at the same time, I needed to use threads (with the pthread lib).
It works, but it feels like a workaround, and if I play too many files (maybe 10 or so) everything freezes for a few seconds and then recovers.
Well, my question is: how do I play multiple sounds at the same time, non-blocking, using libao (and without using threads)?
This is not a real design, more of a guess.
First of all, you'll need threads because it's a good old tradition to separate computations from visualisations, or audializations in this case. You'll need an audio thread that renders the stream and sends it to the output.
So, each time your main thread discovers a keypress, it sends a note to the audio thread. The latter captures the event and adds a wave to the currently played stream. The stream is rendered in frames (64, 1024, or 10240 samples, whatever latency you fancy); if the wave itself is a simple mix of a few possible samples, it can be notably realtime. You should keep track of the notes currently played and the position within each sample. If latency is low, and thus granularity high, you can even align sample edges with buffer edges, which would notably simplify rendering.
And after the current buffer is rendered, you simply send it to the DAC and proceed with the next frame.
A quick glance at libao's help page does not reveal any mixing capabilities, so you'll need to create a simple mixer on your own, or you may want to use an existing solution, some simple open-source audio rendering library.
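Such a mixer can be as simple as summing the active voices into a wide accumulator and clipping; the helper below is a sketch with made-up names, producing one 16-bit mono frame ready to hand to ao_play:

```c
#include <stddef.h>
#include <stdint.h>

/* Mix several 16-bit mono voices into one output frame with hard clipping. */
static void mix_frame(int16_t *out, size_t frame_len,
                      const int16_t *const *voices, size_t n_voices)
{
    for (size_t i = 0; i < frame_len; i++) {
        int32_t acc = 0;                 /* wide accumulator avoids overflow */
        for (size_t v = 0; v < n_voices; v++)
            acc += voices[v][i];
        if (acc > INT16_MAX) acc = INT16_MAX;   /* clip, don't wrap */
        if (acc < INT16_MIN) acc = INT16_MIN;
        out[i] = (int16_t)acc;
    }
}
```

The audio thread would call this once per frame over the voices still playing, advance each voice's position by frame_len, and drop voices that have reached their end.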
I'm trying to use Audio Queue Services to play mp3 audio that is being delivered from an external process. I am using NSTask and NSOutputHandle to get the output from the command - that part works fine. I'm using Audio File Stream Services to parse the data from the command - that seems to work as well. In my Audio File Stream Services listener function, I'm not sure what to do with the packets that come in. It would be great if I could just throw them at the audio queue, but apparently it doesn't work that way. You're supposed to define a series of buffers and enqueue them on the audio queue. Can the buffers correspond to the packets, or do I have to somehow convert them? I'm not very good at C or pointer math, so the idea of converting arbitrary-sized packets to non-matching-sized buffers is kind of scary to me. I've read the Apple docs many times, but they only cover reading from a file, which seems to skip this whole packet/buffer conversion step.
You should be able to configure the AudioQueue such that the buffer sizes match your packet sizes. Additionally, the AudioQueue will do the job of decoding the mp3 - you shouldn't need to do any of your own conversions.
Use the inBufferByteSize parameter to configure the buffer size:
OSStatus AudioQueueAllocateBuffer (
AudioQueueRef inAQ,
UInt32 inBufferByteSize,
AudioQueueBufferRef *outBuffer
);
If your packets are all different sizes, you can use AudioQueueAllocateBuffer to allocate each buffer with that custom size before filling it, and free it instead of re-queueing it after use by the audio queue callback.
For less memory management (which impacts performance), if you know the max packet size, you can allocate a buffer that big, and then only partially fill that buffer (after checking the packet size to make sure it fits). There's a parameter, mAudioDataByteSize, for the amount with which each buffer is actually filled.
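The partial-fill pattern can be illustrated with a plain-C stand-in for the relevant buffer fields; the struct below is hypothetical and only mirrors mAudioData / mAudioDataByteSize (in real code the buffer comes from AudioQueueAllocateBuffer and is handed to AudioQueueEnqueueBuffer):

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the AudioQueue buffer fields relevant here. */
struct queue_buf {
    void     *mAudioData;          /* pre-allocated storage */
    uint32_t  mAudioDataCapacity;  /* allocated (max packet) size */
    uint32_t  mAudioDataByteSize;  /* how much is actually filled */
};

/* Copy one parsed packet into a pre-allocated max-size buffer, recording
 * how much of it is actually filled. Returns 0, or -1 if it won't fit. */
static int fill_buffer(struct queue_buf *b, const void *packet, uint32_t packet_len)
{
    if (packet_len > b->mAudioDataCapacity)
        return -1;                         /* packet bigger than buffer */
    memcpy(b->mAudioData, packet, packet_len);
    b->mAudioDataByteSize = packet_len;    /* partial fill is fine */
    return 0;
}
```

The key point is that the buffer is allocated once at the maximum packet size, and mAudioDataByteSize tells the queue how many of those bytes are real data for each enqueue.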
I'm making a small library for controlling various embedded devices using the C language. I'm using UDP sockets to communicate with each of the devices. The devices send me various interesting data, alarms, and notifications, and at the same time they send some data that is used internally by the library but may not be interesting to users. So, I've implemented a callback approach, where the user can register a callback function for interesting events on each of the devices. Right now, the overall design of this library is something like this:
I have two threads running.
In one of the threads, an infinite while event-loop uses select and non-blocking sockets to maintain the communication with each of the devices.
Basically, every time I receive a packet from any of the devices, I strip off the header, which is 20 bytes of some useless information, and add my own header containing DEVICE_ID, REQUEST_TIME (the time the request was sent to retrieve that packet), RETRIEVAL_TIME (the time the packet actually arrived), REQUEST_ID, and REQUEST_TYPE (alarm, data, notification, etc.).
Now, this thread (the one with the infinite loop) puts the packet with the new header into a ring buffer and then notifies the other thread (thread #2) to parse this information.
In thread #2, when the notification is received, it locks the buffer, pops the packet, and starts parsing it.
Every message contains some information that the user may not be interested in, so I'm providing the user-callback approach to act upon the data that is useful to the user.
Basically, I'm doing something like this in thread #2:
THREAD #2
lock(buffer_mutex)
wait(data_put_in_buffer_cond, buffer_mutex) /** condition waits must hold the mutex */
packet_t* packet = pop_packet_from_buffer(buf);
unlock(buffer_mutex)
/* parsing the packet... */
parsed_packet_t* parsed_packet = parse_and_change_endianness(packet->data);
/* header was put by thread #1 in host byte order, so no parsing necessary */
header_t* header = get_header(packet);
foreach attribute in parsed_packet->attribute_list {
    register_info_t* rinfo = USER_REGISTERED_EVENT_TABLE[header->device_id][attribute.attr_id];
    /* user is registered for this attribute ID on this device ID */
    if (rinfo != NULL) {
        rinfo->callback(packet);
    }
    // Do some other stuff with this attribute...
}
/* thread #1 sets a free callback for each kind of packet it puts in the buffer.
 * This is not a critical section of the buffer, so it is fine without locks,
 * but the packet must stay alive until the user callbacks above have run.
 */
buffer.free_callback(packet);
free(parsed_packet);
Now, my concern is: what will happen if the callback function that the user implements takes some time to complete, and meanwhile I drop some packets because the ring buffer is in overwriting mode? I've tested my API with 3 to 4 devices, and I don't see many drop events even if the callback function waits a decent amount of time, but I suspect this approach may not be the best.
Would it be a better design if I used some sort of thread pool to run the user callback functions? In that case I would need to make an explicit copy of the packet before sending it to the user callback. Each packet is about 500 to 700 bytes, and I get around 2 packets per second from each device. Any suggestions or comments on improving the current design or solving these issues would be appreciated.
Getting 500-700 bytes per device is not a problem at all, especially since you only have 3-4 devices. Even if you had, let's say, 100 devices, it should not be a problem; the copy overhead would most probably be negligible. So, my suggestion would be: do not try to optimize beforehand until you are certain that buffer copying is your bottleneck.
About losing packets: as you say in your question, you are already using a ring buffer (I assume that is something like a circular queue, right?). If the queue becomes full, then you just need to make thread #1 wait until there is some available space in the queue. Clearly, more events from different devices may arrive in the meantime, but that should not be a problem: once you have space again, select will let you know that you have available data from those devices, so you will just need to process all that data. Of course, in order to have a balanced system, you can set the size of the queue to a value that reduces as much as possible the number of times the queue is full and thread #1 needs to wait.
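A blocking push along those lines can be sketched with a pthread condition-variable ring buffer; all names below are illustrative, not taken from the library in the question:

```c
#include <pthread.h>
#include <stddef.h>

#define QCAP 64

/* Bounded ring buffer: push blocks when full, so packets are never
 * overwritten; pop blocks when empty. */
struct ring {
    void           *slot[QCAP];
    size_t          head, tail, count;
    pthread_mutex_t mu;
    pthread_cond_t  not_full, not_empty;
};

static void ring_init(struct ring *r)
{
    r->head = r->tail = r->count = 0;
    pthread_mutex_init(&r->mu, NULL);
    pthread_cond_init(&r->not_full, NULL);
    pthread_cond_init(&r->not_empty, NULL);
}

static void ring_push(struct ring *r, void *p)
{
    pthread_mutex_lock(&r->mu);
    while (r->count == QCAP)                   /* thread #1 waits here */
        pthread_cond_wait(&r->not_full, &r->mu);
    r->slot[r->head] = p;
    r->head = (r->head + 1) % QCAP;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->mu);
}

static void *ring_pop(struct ring *r)
{
    pthread_mutex_lock(&r->mu);
    while (r->count == 0)                      /* thread #2 waits here */
        pthread_cond_wait(&r->not_empty, &r->mu);
    void *p = r->slot[r->tail];
    r->tail = (r->tail + 1) % QCAP;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->mu);
    return p;
}
```

With this shape, back-pressure replaces overwriting: the receive thread simply does not read from the sockets while the queue is full, and select reports the pending data once space frees up.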