I'm trying to generate short tones of variable lengths through ALSA in a small C program. A few examples I've tried work fine when playing one second's worth of sound, but anything shorter than that produces no sound at all.
I'm filling a buffer with a sine wave like so:
#define BUFFER_LEN 44100
int freq = 700;   // tone frequency in Hz
int fs = 44100;   // sampling frequency in Hz
float buffer[BUFFER_LEN];
for (int k = 0; k < BUFFER_LEN; k++) {
    buffer[k] = sin(2 * M_PI * freq / fs * k);
}
Setting the pcm device parameters:
if ((err = snd_pcm_set_params(handle,
                              SND_PCM_FORMAT_FLOAT,
                              SND_PCM_ACCESS_RW_INTERLEAVED,
                              1,              /* channels */
                              44100,          /* sample rate */
                              1,              /* allow software resampling */
                              500000)) < 0) { /* latency in microseconds */
    printf("Playback open error: %s\n", snd_strerror(err));
    exit(EXIT_FAILURE);
}
Playback:
frames = snd_pcm_writei(handle, buffer, BUFFER_LEN);
But if I change it to play the tone for, say, a quarter of a second (i.e. changing BUFFER_LEN to 11025), nothing comes out of the speaker anymore.
I've tried changing the types from floats to shorts, setting the PCM format to other values, and trying different ways of filling the buffer with sine waves.
If anything, I hear a little 'bip' in the speakers, but not what I expect. The program doesn't segfault or crash; I'm just puzzled about how to make ALSA play shorter samples.
I don't know if I need to work with some exact frame or buffer size multiple, but I'm very open to suggestions.
Your application writing samples to the buffer, and the hardware reading samples from the buffer and playing them, are two different processes that run asynchronously.
If there is not enough free space in the buffer for the amount of samples you're trying to write, then snd_pcm_writei() will wait until enough space is available. But when snd_pcm_writei() returns, up to a full buffer of samples might not yet have been played.
To wait until the buffer is empty, use snd_pcm_drain().
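As a hedged illustration (reusing the question's handle, buffer, BUFFER_LEN and frames; error handling trimmed), the write could be followed by a drain so that even a quarter-second buffer is played to the end before the device is closed:

/* Write the tone, then block until ALSA has actually played everything. */
frames = snd_pcm_writei(handle, buffer, BUFFER_LEN);
if (frames < 0)
    frames = snd_pcm_recover(handle, frames, 0);   /* try to recover from an xrun */
snd_pcm_drain(handle);    /* wait until the ring buffer is empty */
snd_pcm_close(handle);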
I have been trying to output video (from my webcam) simultaneously to both a file ('out.mkv') and pipe:
The file gets filtered frames, and the pipe: gets unfiltered rawvideo.
My frame rate is 30 fps. However, I am getting a far lower framerate in my file output.
Attached is the while loop which reads packets and writes them to output:
while (1) {
    av_read_frame(ifmt_ctx, packet);
    stream_index = packet->stream_index;
    StreamContext *stream = &file_stream_ctx[stream_index];
    av_packet_rescale_ts(packet,
                         ifmt_ctx->streams[stream_index]->time_base,
                         stream->dec_ctx->time_base);
    ret = avcodec_send_packet(stream->dec_ctx, packet);
    while (ret >= 0) {
        ret = avcodec_receive_frame(stream->dec_ctx, stream->dec_frame);
        stream->dec_frame->pts = stream->dec_frame->best_effort_timestamp;
        ret = filter_encode_write_frame(stream->dec_frame, stream_index,
                                        file_stream_ctx, file_filter_ctx, file_ofmt_ctx);
        ret = av_interleaved_write_frame(pipe_ofmt_ctx, packet);
    }
}
'ifmt_ctx' is the AVFormatContext for the webcam.
'file_ofmt_ctx' is the AVFormatContext for the output file, and 'pipe_ofmt_ctx' is the AVFormatContext for the pipe.
'file_stream_ctx' and 'file_filter_ctx' are the stream and filter contexts used for filtering and encoding the file output.
My guess is that writing to the pipe is taking too long and not allowing the next packet to be read on time - causing a lower frame rate. Does that make sense? If so - any suggestions on how to fix it? (I tried using AVFMT_FLAG_NONBLOCK but it doesn't seem to help).
Thanks
Hillel
It is hard to tell without any profiling, but you're probably correct given what you're seeing. I'd be curious to know whether you see better performance if you eliminate one of the two writes.
I always write the packets or frames (depending on the case) into a queue and have a separate thread do writing. You might even want two threads - one for the video (packet queue) and one for the frames.
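For illustration only, here is a minimal sketch of such a queue plus writer thread, assuming pthreads and FFmpeg's reference-counted packets; the queue capacity, the drop-on-full policy and all names are made up for the example:

#include <errno.h>
#include <pthread.h>
#include <libavformat/avformat.h>

#define QUEUE_CAP 256

typedef struct PacketQueue {
    AVPacket *items[QUEUE_CAP];
    int head, tail, count, done;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    AVFormatContext *ofmt_ctx;      /* output the writer thread drains into */
} PacketQueue;

/* Called from the capture loop: clone the packet and hand it to the writer. */
static int queue_push(PacketQueue *q, const AVPacket *pkt)
{
    AVPacket *copy = av_packet_clone(pkt);
    if (!copy)
        return AVERROR(ENOMEM);
    pthread_mutex_lock(&q->lock);
    if (q->count == QUEUE_CAP) {    /* queue full: drop the packet (simplification) */
        pthread_mutex_unlock(&q->lock);
        av_packet_free(&copy);
        return AVERROR(EAGAIN);
    }
    q->items[q->tail] = copy;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Writer thread: pops packets and does the (slow) writes off the capture thread. */
static void *writer_thread(void *arg)
{
    PacketQueue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->done)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0 && q->done) {     /* drained and told to stop */
            pthread_mutex_unlock(&q->lock);
            break;
        }
        AVPacket *pkt = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        av_interleaved_write_frame(q->ofmt_ctx, pkt);
        av_packet_free(&pkt);
    }
    return NULL;
}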
As an aside, I think the line avcodec_send_packet(stream->dec_ctx, packet); consumes the packet, so when you use it later in ret = av_interleaved_write_frame(pipe_ofmt_ctx, packet); that seems like it wouldn't work. You could try copying the packet and see if that helps at all:
av_packet_ref(new_packet, packet);
Assuming you've allocated new_packet, of course. It doesn't sound like this is what you're seeing, so what you're doing might be OK, but it's something else you could try.
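A hedged sketch of how that could look inside the question's loop (new_packet is a hypothetical name; error handling trimmed):

/* Keep an extra reference to the packet before handing it to the decoder,
 * then write that reference to the pipe. */
AVPacket *new_packet = av_packet_alloc();
if (new_packet && av_packet_ref(new_packet, packet) >= 0) {
    ret = avcodec_send_packet(stream->dec_ctx, packet);
    /* ... decode, filter and encode for the file output ... */
    ret = av_interleaved_write_frame(pipe_ofmt_ctx, new_packet);
}
av_packet_free(&new_packet);   /* unreferences (if still referenced) and frees */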
I'm using libavformat to read packets from rtsp and remux it to mp4 (fragmented).
Video frames are to be kept intact, meaning I don't want to transcode/modify/change anything.
Video frames shall be remuxed into mp4 in their original form. (i.e.: NALUs shall remain the same).
I have updated libavformat to latest (currently 4.4).
Here is my snippet:
//open input, probesize is set to 32, we don't need to decode anything
avformat_open_input
//open output with custom io
avformat_alloc_output_context2(&ofctx,...);
ofctx->pb = avio_alloc_context(buffer, bufsize, 1/*write flag*/, 0, 0, &writeOutput, 0);
ofctx->flags |= AVFMT_FLAG_NOBUFFER | AVFMT_FLAG_FLUSH_PACKETS | AVFMT_FLAG_CUSTOM_IO;
avformat_write_header(...);
//loop
av_read_frame()
LOGPACKET_DETAILS //<- this works, packets are coming
av_write_frame() //<- this doesn't work, my write callback is not called. Also tried av_interleaved_write_frame(); that doesn't seem to work either.
int writeOutput(void *opaque, uint8_t *buffer, int buffer_size) {
    printf("writeOutput: writing %d bytes: ", buffer_size);
    return buffer_size;
}
avformat_write_header works, it prints the header correctly.
I'm looking for the reason on why my custom IO is not called after a frame has been read.
There must be some additional flags that should be set to tell avformat not to care about decoding and just write out whatever comes in.
More information:
The input stream is VBR-encoded H.264. It seems av_write_frame calls my write function only when an SPS, PPS or IDR frame arrives; non-IDR frames are not passed through at all.
Update
I found out that if I request an IDR frame every second (I can ask the encoder for that), writeOutput is called every second.
I created a test: after a client joins, I ask the encoder to produce IDRs at 1 Hz, ten times. libav then calls writeOutput at 1 Hz for 10 seconds, but afterwards the encoder falls back to producing an IDR only every 10 seconds, and libav again calls writeOutput only every 10 s, which makes my decoder fail. With IDRs at 1 Hz the decoder is fine.
I inherited some ALSA code that runs on a Linux embedded platform.
The existing implementation does blocking reads and writes using snd_pcm_readi() and snd_pcm_writei().
I am tasked with making this run on an ARM processor, but I find that the blocking interleaved reads push the CPU to 99%, so I am exploring non-blocking reads and writes.
I open the device as can be expected:
snd_pcm_t *handle;
const char* hwname = "plughw:0"; // example name
snd_pcm_open(&handle, hwname, SND_PCM_STREAM_CAPTURE, SND_PCM_NONBLOCK);
Other ALSA stuff then happens which I can supply on request.
It's worth mentioning at this point that:
we set a sampling rate of 48,000 [Hz]
the sample type is signed 32 bit integer
the device always overrides our requested period size to 1024 frames
Reading the stream like so:
int32_t *buffer; // buffer sized to hold period_size samples
int actual = snd_pcm_readi(handle, buffer, period_size);
This call takes approx 15 [ms] to complete in blocking mode. Obviously, variable actual will read 1024 on return.
The problem is: in non-blocking mode, this function also takes about 15 ms to complete, and actual also always reads 1024 on return.
I would expect the function to return immediately, with actual being <= 1024 and the call quite possibly returning -EAGAIN (-11).
In between read attempts I plan to put the thread to sleep for a specific amount of time, yielding CPU time to other processes.
Am I misunderstanding the ALSA API? Or could it be that my code is missing a vital step?
If the function returns a value of 1024, then at least 1024 frames were available at the time of the call.
(It's possible that the 15 ms is time needed by the driver to actually start the device.)
Anyway, blocking or non-blocking mode does not make any difference regarding CPU usage. To reduce CPU usage, replace the default device with plughw or hw, but then you lose features like device sharing or sample rate/format conversion.
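As a rough illustration of that suggestion (the device name "hw:0,0" is only an example; your card may require a different name and native rate/format):

/* Opening the hardware device directly avoids the conversion plugins that
 * cost CPU, at the price of having to use what the card supports natively. */
snd_pcm_t *pcm;
int err = snd_pcm_open(&pcm, "hw:0,0", SND_PCM_STREAM_CAPTURE, 0);
if (err < 0)
    fprintf(stderr, "cannot open hw:0,0: %s\n", snd_strerror(err));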
I solved my problem by wrapping snd_pcm_readi() as follows:
/*
** Read interleaved stream in non-blocking mode
*/
template <typename SampleType>
snd_pcm_sframes_t snd_pcm_readi_nb(snd_pcm_t* pcm, SampleType* buffer, snd_pcm_uframes_t size, unsigned samplerate)
{
    const snd_pcm_sframes_t avail = ::snd_pcm_avail(pcm);
    if (avail < 0) {
        return avail;
    }

    if (avail < size) {
        snd_pcm_uframes_t remain = size - avail;
        unsigned long msec = (remain * 1000) / samplerate;

        static const unsigned long SLEEP_THRESHOLD_MS = 1;
        if (msec > SLEEP_THRESHOLD_MS) {
            msec -= SLEEP_THRESHOLD_MS;
            // exercise for the reader: sleep for msec
        }
    }

    return ::snd_pcm_readi(pcm, buffer, size);
}
This works quite well for me. My audio process now 'only' takes 19% CPU time.
And it doesn't matter whether the PCM interface was opened with SND_PCM_NONBLOCK or 0.
Next I'm going to run a callgrind analysis to see whether more CPU cycles can be saved elsewhere in the code.
I am using a TFT LCD screen (ILI9163C, 160x128) connected over SPI to an Atheros AR9331 module running the OpenWrt Linux distribution, so I am driving the LCD through spidev0.1. Filling the screen or writing any string to the LCD takes far too long. What can I do to get an acceptable drawing speed?
Thanks.
This is the function I'm using to write data over SPI using spidev:
void spi_transactor(unsigned char *write_data, int mode, int size)
{
    int ret;
    struct spi_ioc_transfer xfer[4];
    unsigned char *init_reg;

    init_reg = (unsigned char*) malloc(size);
    memcpy(init_reg, write_data, size);

    if (mode)
    {
        gpio_set_value(_rs, 1); // DATA
    }
    else
    {
        gpio_set_value(_rs, 0); // COMMAND
    }

    memset(xfer, 0, sizeof xfer);

    xfer[0].bits_per_word = 8;
    xfer[0].tx_buf = (unsigned long)init_reg;
    xfer[0].rx_buf = 0; //( unsigned long ) &buf_rx[0];
    xfer[0].len = size; //wlength + rlength;
    xfer[0].delay_usecs = 0;
    xfer[0].speed_hz = speedx; // 8MHZ
    //xfer[0].speed_hz = 160000000; // 40MHZ

    ret = ioctl(spi_fd, SPI_IOC_MESSAGE(1), &xfer);

    gpio_set_value(_rs, 1);
}
The main performance issue here is that you make a hard copy of the data to send on the heap, every time the function is called. You also set up the communication parameters from scratch each time, even though they are always the same. To make things worse, the function has a massive bug: it leaks memory as if there's no tomorrow.
The hard copies aren't really necessary unless the SPI communication takes too much time for the program to sit and busy-wait on it to finish (rather likely). What you can do in that case is this:
Outsource the whole SPI business to a separate thread.
Create a work queue for the thread, using your favourite ADT for such. It should be a thread-safe FIFO.
Data is copied into the ADT as hard copies, by the caller.
The thread picks one chunk of work from the ADT and transmits it from there, without making yet another hard copy.
The thread waits for the SPI communication to finish, then makes sure that the ADT deletes the data before grabbing the next chunk. For hard real-time requirements, you can have the thread prepare the next message in advance while waiting for the previous one.
The communication parameters in "xfer" are set up once by the thread; it only changes the data buffer address and length from transfer to transfer (see the sketch below).
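For illustration, a hedged sketch of that last point, reusing the question's spi_fd and speedx (assumed here to be globals) and assuming a worker thread that already owns the data to transmit:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

extern int spi_fd;          /* from the question's code */
extern uint32_t speedx;     /* SPI clock, e.g. 8 MHz, from the question's code */

static struct spi_ioc_transfer xfer;   /* set up once, reused for every transfer */

void spi_writer_init(void)
{
    memset(&xfer, 0, sizeof xfer);
    xfer.bits_per_word = 8;
    xfer.delay_usecs   = 0;
    xfer.speed_hz      = speedx;
}

/* Called by the worker thread for each queued chunk: no malloc, no memcpy. */
int spi_write_chunk(const unsigned char *data, int size)
{
    xfer.tx_buf = (unsigned long)data;   /* transmit straight from the queue entry */
    xfer.len    = size;
    return ioctl(spi_fd, SPI_IOC_MESSAGE(1), &xfer);
}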
I am writing a seek routine for analog FM radio using rtl_sdr with a generic DVB-T stick (tuner is a FC0013). Code is mostly taken from rtl_power.c and rtl_fm.c.
My approach is:
Tune to the new frequency
Gather a few samples
Measure RSSI and store it
Do the same for the next frequency
Upon detecting a local peak which is above a certain threshold, tune to the frequency at which it was detected.
The issue is that I can’t reliably map samples to the frequency at which they were gathered. Here’s the relevant (pseudo) code snippet:
/* freq is the new target frequency */
rtlsdr_cancel_async(dongle.dev);
optimal_settings(freq, demod.rate_in);
fprintf(stderr, "\nSeek: currently at %d Hz (optimized to %d).\n", freq, dongle.freq);
rtlsdr_set_center_freq(dongle.dev, dongle.freq);
/* get two bursts of samples to measure RSSI */
if (rtlsdr_read_sync(dongle.dev, samples, samplesSize, &samplesRead) < 0)
fprintf(stderr, "\nSeek: rtlsdr_read_sync failed\n");
/* rssi = getRssiFromSamples(samples, samplesRead) */
fprintf(stderr, "\nSeek: rssi=%.2f", rssi);
if (rtlsdr_read_sync(dongle.dev, samples, samplesSize, &samplesRead) < 0)
fprintf(stderr, "\nSeek: rtlsdr_read_sync failed\n");
/* rssi = getRssiFromSamples(samples, samplesRead) */
fprintf(stderr, "\nSeek: rssi=%.2f\n", rssi);
When I scan the FM band with that snippet of code, I see that the two RSSI measurements typically differ significantly. In particular, the first measurement is usually in the neighborhood of the second measurement taken from the previous frequency, indicating that some of the samples were taken while still tuned into the old frequency.
I’ve also tried inserting a call to rtlsdr_reset_buffer() before gathering the samples, in an effort to flush any samples still stuck in the pipe, with no noticeable effect. Even a combination of
usleep(500000);
rtlsdr_cancel_async(dongle.dev);
rtlsdr_reset_buffer(dongle.dev);
does not change the picture, other than the usleep() slowing down the seek operation considerably. (Buffer size is 16384 samples, at a sample rate of 2 million, thus the usleep() delay is well above the time it takes to get one burst of samples.)
How can I ensure the samples I take were obtained after tuning into the new frequency?
Are there any buffers for samples which I would need to flush after tuning into a different frequency?
Can I rely on tuning being completed by the time rtlsdr_set_center_freq() returns, or does the tuner need some time to stabilize after that? In the latter case, how can I reliably tell when the frequency change is complete?
Anything else I might have missed?
Going through the code of rtl_power.c again, I found this function:
void retune(rtlsdr_dev_t *d, int freq)
{
    uint8_t dump[BUFFER_DUMP];
    int n_read;
    rtlsdr_set_center_freq(d, (uint32_t)freq);
    /* wait for settling and flush buffer */
    usleep(5000);
    rtlsdr_read_sync(d, &dump, BUFFER_DUMP, &n_read);
    if (n_read != BUFFER_DUMP) {
        fprintf(stderr, "Error: bad retune.\n");
    }
}
Essentially, the tuner needs to settle, with no apparent indicator of when this process is complete.
rtl_power.c solves this by waiting for 5 milliseconds and then discarding a few samples (BUFFER_DUMP is defined as 4096, at sample rates between 1 and 2.8 MS/s).
I found 4096 samples to be insufficient, so I went for the maximum of 16384. Results look a lot more stable this way, though even this does not always seem sufficient for the tuner to stabilize.
For a band scan, an alternative approach would be a loop that keeps acquiring samples and determining their RSSI until the RSSI values begin to stabilize, i.e. until the changes are no longer monotonic or fall below a certain threshold.
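A hedged sketch of that loop, assuming the question's dongle, samples and samplesSize globals and its getRssiFromSamples() pseudo-helper are in scope (the ~1 dB threshold and the retry count are arbitrary choices):

#include <math.h>

double settle_and_measure_rssi(void)
{
    double prev = -1e9;
    for (int i = 0; i < 10; i++) {                 /* hard upper bound on retries */
        int samplesRead = 0;
        if (rtlsdr_read_sync(dongle.dev, samples, samplesSize, &samplesRead) < 0)
            break;
        double rssi = getRssiFromSamples(samples, samplesRead);
        if (fabs(rssi - prev) < 1.0)               /* change below ~1 dB: settled */
            return rssi;
        prev = rssi;
    }
    return prev;   /* best estimate if it never settled */
}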