I'm going to write an application in Silverlight that consists of 2 threads, one that plays sound and another that records sound. And whatever is recorded will be what was played plus some ambient noise.
The problem is that Silverlight adds a delay to the sound to be played, and because I don't know how much is this delay, I would not know precisely what was played when something is recorded.
Do you know where I can find more information about this delay (how much is it, is it constant, will it change if I restart my application or computer, will it be the same in different computers, ...), or how could I measure it with an accuracy of 1 ms?
To measure the delay you can play some form of generated sound (like sin wave with increasing amplitudes), capture it and match input and output signals.
The delay itself is a complicated matter especially when dealing with low latencies. There are a lot of things involved in building delay including SL itself, audio stack, OS and audio hardware. Some additional info is here.
Related
I am designing the controller and data acquisition unit for a rocket engine test stand. This system needs to control a number of actuators on the test stand and also be able to transmit collected data back to the host computer where the team will be watching live data/camera feeds from safety.
The overall design requirements are as follows:
Acquire data from ~15 analog sensors at 1KHz
Control the actuators on the test stand including valves and ignition switches
Transmit data back to the host computer in our shelter in real time
Accept control from the host computer for things like manual valve actuation, test sequence modification, sequence abortion, etc.
I am not exactly sure where to begin when laying out the software for this system. I am considering using an STM32 ARM Cortex-M4 processor running at 180 MHz. I am having trouble figuring how I should approach the problem. I have considered using an RTOS system but based on what I have seen those generate large overheads as you run them faster as the scheduler has to run each tick. The other idea I'm bouncing around is a state machine combined with some timer-based interrupts for reading and then sending data back out to the PC. Any advice as to how to approach this problem to minimize code complexity would be greatly appreciated. Thanks.
EDIT:
I have been told to clarify a number of things concerning the technical specs of the system.
My actuators consist of:
6 solenoids (controlled digitally through relays/MOSFET, and switched around once a second)
2 DC motors (driven with PWM outputs in a PID loop, need to be able to ramp position controllably)
One igniter, again controlled through a relay/MOSFET
My sensors consist of:
8 pressure transducers (analog voltages)
4 thermocouples (analog voltages)
2 motor encoders (quadrature encoders)
1 light sensor (analog voltage)
1 Load cell (analog voltage)
Ideally all of the collected data (all of the above sensors) plus some additional data (timestamps, motor set positions, solenoid positions) is streamed back to the host computer at in real time.
Given the motor control with PWM & PID, you need to specify a desired resolution, either in PWM timer ticks or ADC reads. This is the most critical part. It doesn't hurt if the ADC has greater resolution than your specified resolution either. The PCB has to be designed accordingly, with sufficient resolution on resistors etc.
After you've done this, find MCU with sufficiently accurate ADC. I would imagine that 12 bit resolution is enough for most applications, but I don't know your specific case.
Next, you need to decide how fast you want the PID to be. Should an output on the PWM result in a read on the ADC in the next cycle, or could you settle for slower response? The realtime bottleneck here will be the ADC conversion clock, not the CPU.
The rest of the system doesn't seem time critical at all - you just have to ensure that everything is read/set synchronously. The data transmission to/from the host should preferably be done over CAN since it comes with hard real-time characteristics. Doesn't seem that you need a whole lot of bandwidth.
I have designed systems very similar to this using bare metal 16 bit MCUs running on 16MHz. Processing speed is really not a big concern, but meeting real-time deadlines is. That means you can forget about using Linux toys like Rasp PI, it's completely out of the question. And a RTOS is likely overkill since it mostly adds additional complexity.
A bare metal Cortex M with sufficient ADC resolution and CAN seems like a good choice. If you can stay away from floating point, that's nice too - depends on how advanced math you need. If you need nothing more advanced than PID, it can be implemented with fixed point just fine. (Or PI rather, since that usually works best for fast motor control systems.)
I'm using libao (ao_play) to play some buffers. I listen the keyboard keys and for each key I have a wav sound to play. It's simple.
With ao_play I see that the application blocks while is playing the sound. Because I want to play multiple audios at same time, I needed to use threads (with pthread lib).
It works, but I fell like a workaround and if I play to much files (maybe 10 or something like this) so everything stuck for some seconds and so come back.
Well, my question is: how to play multiple sounds at same time non-blocking using libao (and not using threads)?
This not a real design, more like a guess.
First of all, you'll need threads because it's a good old tradition to separate computations from visualisations, or audializations in this case. You'll need an audio thread that renders the stream and sends it to the output.
So, each time your main thread discovers a keypress, it sends a note to the audio thread. That latter captures an event and adds a wave to the currently played stream. The stream is rendered in frames (64, or 1024, or 10240 samples, or whatever you fancy your latency, if the wave itself is a simple mix of few possible samples, it can be notably realtime.) You should keep track of notes currently played, position per each sample. If latency is low, thus granularity high, you can even align sample edges by buffer edges, which would notably simplify rendering.
And after current buffer is rendered you simply send it to DAC and proceed with the next frame.
A quick glance at libao's help page does not reveal any mixing capabilities, so you'll need to create a simple mixer on your own, or you may actually need an existing solution, some simple opensource audio rendering library.
I am making a game for the GameBoy in GBDK, and I'm trying to add sounds to the game. GBDK has a function that plays sounds from an array of values, the only problem is that while its playing the sound the rest of the script freezes. Is there a way I can get them to run at the same time?
There is no way to have code running while using sampled audio playback. This is due to the fact that it actually uses full CPU to preform this playback. If you want to use regular sound effects, you'll either need to pause the game while they play, or use a different method. I'll try to summarize using the other playback method below, but it is kind of complicated and I'm no expert.
Using "normal" sound effects
This is kind of WIP - I'm not too experienced with it but it should let you get started.
To use sound effects, you need to write to GameBoy audio registers. This is found in GBDK's hardware.h, which is automatically included with references to gb\gb.h. But (of course) the registers don't have any documentation. This information is found on the GB Cribsheet. There's also this sound documentation file (unfortunately it behaves weirdly on windows encodings - Open with something other than notepad), along with some other information found on the Devrs.com sound documentation.
Working off of GBSOUND.TXT:
The addresses through which the sound channels can be accessed are:
$Addresses: (Description), (Register shorthand)
$FF10 -- $FF14: Channel 1, Referred to as NR10-NR14
$FF15 is unused, was probably going to be a sweep reg for channel 2
originally
$FF16 -- $FF19: Channel 2, Referred to as NR21-NR24
$FF1A -- $FF1E: Channel 3, Referred to as NR30-NR34
$FF1F is unused, was probably going to be a sweep reg for channel 4
originally
$FF20 -- $FF23: Channel 4, Referred to as NR41-NR44
$FF24 controls the Vin status and volume, Referred to as NR50
$FF25 selects which output each channel goes to, Referred to as NR51
$FF26 is the status register, and also controls the sound circuit's power.
Referred to as NR52
$FF27 -- $FF2F are unused.
$FF30 -- $FF3F is the load register space for the 4-bit samples for channel
3
In GBDK, the registers are named NR10_REG, NR11_REG, NR12_REG, ect.
Also, try looking at the example program sound.c, which doesn't compile for me unfortuantely. I might edit this to include more info.
To answer #franklin's question:
Which begs the question, how does a gameboy play both the game and sound at the same time?
They usually don't do that with sample playback. For instance, if you look at Pokémon Yellow, Pikachu's cry is done with sample playback. But while that is playing, nothing else is done. On the other hand, things like normal background music are done using the other audio hardware (sorry, not very detailed wiki link). Similarly, while move sound effects are done with the noise channel (used for the sample playback as well), they aren't actually sampled audio. As such, the game can continue running.
I am writing a small module in C to handle jitter and drift for a full-duplex audio system. It acts as a very primitive voice chat module, which connects to an external modem that uses a separate clock, independent from my master system clock (ie: it is not slaved off of the system master clock).
The source is based off of an existing example available online here: http://svn.xiph.org/trunk/speex/libspeex/jitter.c
I have 4 audio streams:
Network uplink (my voice, after processing, going to the far side speaker)
Network downlink (far side's voice, before processing, coming to me)
Speaker output (the far side's voice, after processing, to the local speakers)
Mic input (my voice, before processing, coming from the local microphone)
I have two separate threads of execution. One handles the local devices and buffer (ie: playing processed audio to the speakers, and capturing data from the microphone and passing it off to the DSP processing library to remove background noise, echo, etc). The other thread handles pulling the network downlink signal and passing it off to the processing library, and taking the processed data from the library and pushing it via the uplink connection.
The two threads use mutexes and a set of shared circular/ring buffers. I am looking for a way to implement a sure-fire (safe and reliable) jitter and drift correction mechanism. By jitter, I am referring to a clock having variable duty cycle, but the same frequency as an ideal clock.
The other potential issue I would need to correct is drift, which would assume both clocks use an ideal 50% duty cycle, but their base frequency is off by ±5%, for example.
Finally, these two issues can occur simultaneously. What would be the ideal approach to this? My current approach is to use a type of jitter buffer. They are just data buffers which implement a moving average to count their average "fill" level. If a thread tries to read from the buffer, and not-enough data is available and there is a buffer underflow, I just generate data for it on-the-fly by either providing a spare zeroed-out packet, or by duplicating a packet (ie: packet loss concealment). If data is coming in too quickly, I discard an entire packet of data, and keep going. This handles the jitter portion.
The second half of the problem is drift correction. This is where the average fill level metric comes in useful. For all buffers, I can calculate the relative growth/reduction levels in various buffers, and add or subtract a small number of samples every so often so that all buffer levels hover around a common average "fill" level.
Does this approach make sense, and are there any better or "industry standard" approaches to handling this problem?
Thank you.
References
Word Clock – What’s the difference between jitter and frequency drift?, Accessed 2014-09-13, <http://www.apogeedigital.com/knowledgebase/fundamentals-of-digital-audio/word-clock-whats-the-difference-between-jitter-and-frequency-stability/>
Jitter.c, Accessed 2014-09-13, <http://svn.xiph.org/trunk/speex/libspeex/jitter.c>
I faced a similar, although admittedly simpler, problem. I won't be able to fully answer your question but i hope sharing my solutions to some practical problems i ran into will benefit you anyway.
Last year i was working on a system which should simultaneously record from and render to multiple audio devices, each potentially ticking off a different clock. The most obvious example being a duplex stream on 2 devices, but it also handled multiple inputs/outputs only. All in all being a bit simpler than your situation (single threaded and no network i/o). In the end i don't believe dealing with more than 2 devices is harder than 2, any system with multiple clocks is going to have to deal with the same problems.
Some stuff i've learned:
Pick one stream and designate it's clock as "the truth" (i.e., sync all other streams to a common master clock). If you don't do this you won't have a well-defined notion of "current sample position", and without it there's nothing to sync to. This also has the benefit that at least one stream in the system will always be clean (no dropping/padding samples).
Your approach of using an additional buffer to handle jitter is correct. Without it you'd be constantly dropping/padding even on streams with the same nominal sample rate.
Consider whether or not you'd want to introduce such a jitter buffer for the "master" stream also. Doing so means introducing artificial latency in the master stream, not doing so means the rest of your streams will lag behind.
I'm not sure whether it's a good idea to drop entire packets. Why not try to use up as much of the samples as possible? Especially with large packet sizes this is far less noticeable.
To elaborate on the above, I got badly bitten by the following case: assume s1 (master) producing 48000 frames every second and s2 producing 96000 every 2 seconds. Round 1: read 48000 from s1, 0 from s2. Round 2: read 48000 from s1, 96000 from s2 -> overflow. Discard entire packet. Round 3: read 48000 from s1, 0 from s2. Etc. Obviously this is a contrived example but i ran into cases where on average I dropped 50% of secondary stream's data using this scheme. Introduction of the jitter buffer does help but didn't completely fix this problem. Note that this is not strictly related to clock jitter/skew, it's just that some drivers like to update their padding values periodically and they will not accurately report to you what is really in the hardware buffer.
Another variation on this problem happens when you really do got clock jitter but the API of your choice doesn't let you control packet size (e.g., allows you to request less frames than are actually available). Assume s1 (master) recording #1000 Hz and s2 alternating each second #1000 and 1001hz. Round 1, read 1000 frames from both. Round 2, read 1000 frames from s1, and 1001 from s2 -> overflow. Etc, on average you'll dump around 50% of frames on s2. Note that this is not so much a problem if your API lets you say "give me 1000 samples even though i know you've got more". By doing so though, you'll eventually overflow the hardware input buffer.
To have the most control over when to drop/pad, I found it easiest to allways keep input buffers empty and output buffers full. This way all dropping/padding takes place in the jitter buffer and you'll at least know and control what's happening.
If possible try to separate your program logic: the hard part is finding out where to pad/drop samples. Once you've got that in place it's easy to try different variations of pad/drop, sample-and-hold, interpolation etc.
All in all I'd say your solution looks very reasonable, although I'm not sure about the "drop entire packet thing" and I'd definitely pick one stream as the master to sync against. For completeness here's the solution I eventually came up with:
1 Assume a jitter buffer of size J on each stream.
2: Wait for a packet of size M to become available on the master stream (M is typically derived from the stream latency). We're going to deliver M frames of input/output to the app. I didn't implement an additional buffer on the master stream.
3: For all input streams: let H be the number of recorded frames in the hardware buffer, B be the number of recorded frames currently in the jitter buffer, and A being the number of frames available to the application: A equals H + B.
3a: If A < M, we have input underflow. Offer A recorded frames + (M - A) padding frames to the app. Since the device is likely slow, fill 1/2 of the jitter buffer with silence.
3b: If A == M, offer A frames to the app. The jitter buffer is now empty.
3c: If A > M but (A - M) <= J, offer M recorded frames to the app. A - M frames stay in the jitter buffer.
3d: If A > M and (A - M) > J, we have input overflow. Offer M recorded frames to the app, of the remaining frames put J/2 back in the jitter buffer, we use up M + J/2 frames and we drop A - (M + J/2) frames as overflow. Don't try to keep the jitter buffer full because the device is likely fast and we don't want to overflow again on the next round.
4: Sort of the inverse of 3: for outputs, fast devices will underflow, slow devices will overflow.
A, H and B are the same thing but this time they don't represent available frames but available padding (e.g., how much frames can i offer to the app to write to).
Try to keep hardware buffers full at all costs.
This scheme worked out quite well for me, although there's a few things to consider:
It involves a lot of bookkeeping. Make sure that for input buffers, data always flows from hardware->jitter buffer->application and for outputs always from app->jitter buffer->hardware. It's very easy to make the mistake of thinking you can "skip" frames in the jitter buffer if there's enough samples available from the hardware directly to the app. This will essentially mess up the chronological order of frames in an audio stream.
This scheme introduces variable latency on secondary streams because i try to postpone the moment of padding/dropping as long as possible. This may or may not be a problem. I found that in practice postponing these operations gives audibly better results, probably because many "minor" glitches of only a few samples are more annoying than the occasional larger hiccup.
Also, PortAudio (an open source audio project) has implemented a similar scheme, see http://www.portaudio.com/docs/proposals/001-UnderflowOverflowHandling.html. It may be worthwile to browse through the mailinglist and see what problems/solutions came up there.
Note that everything i've said so far is only about interaction with the audio hardware, i've no idea whether this will work equally well with the network streams but I don't see any obvious reason why not. Just pick 1 audio stream as the master and sync the other one to it and do the same for the network streams. This way you'll end up with two more-or-less independent systems connected only by the ringbuffer, each with an internally consistent clock, each running on it's own thread. If you're aiming for low audio latency, you'll also want to drop the mutexes and opt for a lock-free fifo of some sorts.
I am curious to see if this is possible. I'll throw in my two bits though.
I am a novice programmer, but studied audio engineering/interactive audio.
My first assumption is that this is not possible. At least not on a sample-to-sample basis. Especially not for complex audio data and waveforms such as human speech. The program could have no expectation of what the waveform "should" look like.
This is why there are high-end audio interfaces with temperature controlled internal clocks.
On the other hand, maybe there is a library that can detect the symptoms of jitter, somehow...
In which case I would be very curious to hear about it.
As far as drift correction, maybe I don't understand something on the programming front, but shouldn't you be pulling audio at a specific sample rate? I believe sample rate/drift is handled at the hardware level.
I really hope this helps. You might have to steer me closer to home.
Say I have a target of x requests/sec that I want to generate continuously. My goal is to start these requests at roughly the same interval, rather than just generating x requests and then waiting until 1 second has elapsed and repeating the whole thing over and over again. I'm not making any assumptions about these requests, some might take much longer than others, which is why my scheduler thread will not perform the requests (or wait for them to finish), but hand them over to a sufficiently sized Thread Pool.
Now if x is in the range of hundreds or less, I might get by with .net's Timers or Thread.Sleep and checking actually elapsed time using Stopwatch.
But if I want to go into the thousands or tens of thousands, I could try going high-resolution timer to maintain my roughly the same interval approach. But this would (in most programming environments on a general OS) imply some amount of hand-coding with spin waiting and so forth, and I'm not sure it's worthwhile to take this route.
Extending the initial approach, I could instead use a Timer to sleep and do y requests on each Timer event, monitor the actual requests per second achieved doing this and fine-tune y at runtime. The effect is somewhere in between "put all x requests and wait until 1 second elapsed since start", which I'm trying not to do, and "wait more or less exactly 1/x seconds before starting the next request".
The latter seems like a good compromise, but is there anything that's easier while still spreading the requests somewhat evenly over time? This must have been implemented hundreds of times by different people, but I can't find good references on the issue.
So what's the easiest way to implement this?
One way to do it:
First find (good luck on Windows) or implement a usleep or nanosleep function. As a first step, this could be (on .net) a simple Thread.SpinWait() / Stopwatch.Elapsed > x combo. If you want to get fancier, do Thread.Sleep() if the time span is large enough and only do the fine-tuning using Thread.SpinWait().
That done, just take the inverse of the rate and you have the time interval you need to sleep between each event. Your basic loop, which you do on one dedicated thread, then goes
Fire event
Sleep(sleepTime)
Then every, say, 250ms (or more for faster rates), check the actually achieved rate and adjust the sleepTime interval, perhaps with some smoothing to dampen wild temporary swings, like this
newRate = max(1, sleepTime / targetRate * actualRate)
sleepTime = 0.3 * sleepTime + 0.7 * newRate
This adjusts to what is actually going on in your program and on your system, and makes up for the time spent to invoke the event callback, and whatever the callback is doing on that same thread etc. Without this, you will probably not be able to get high accuracy.
Needless to say, if your rate is so high that you cannot use Sleep but always have to spin, one core will be spinning continuously. The good news: We get ever more cores on our machines, so one core matters less and less :) More serious though, as you mentioned in the comment, if your program does actual work, your event generator will have less time (and need) to waste cycles.
Check out https://github.com/EugenDueck/EventCannon for a proof of concept implementation in .net. It's implemented roughly as described above and done as a library, so you can embed that in your program if you use .net.