How can I get current microphone input level with C WinAPI? - c

Using Windows API, I want to implement something like following:
i.e. Getting current microphone input level.
I am not allowed to use external audio libraries, but I can use Windows libraries. So I tried using waveIn functions, but I do not know how to process audio input data in real time.
This is the method I am currently using:
Record for 100 milliseconds
Select highest value from the recorded data buffer
Repeat forever
But I think this is way too hacky, and not a recommended way. How can I do this properly?

Having built a tuning wizard for a very dated, but well known, A/V conferencing applicaiton, what you describe is nearly identical to what I did.
A few considerations:
Enqueue 5 to 10 of those 100ms buffers into the audio device via waveInAddBuffer. IIRC, when the waveIn queue goes empty, weird things happen. Then as the waveInProc callbacks occurs, search for the sample with the highest absolute value in the completed buffer as you describe. Then plot that onto your visualization. Requeue the completed buffers.
It might seem obvious to map the sample value as follows onto your visualization linearly.
For example, to plot a 16-bit sample
// convert sample magnitude from 0..32768 to 0..N
length = (sample * N) / 32768;
DrawLine(length);
But then when you speak into the microphone, that visualization won't seem as "active" or "vibrant".
But a better approach would be to give more strength to those lower energy samples. Easy way to do this is to replot along the μ-law curve (or use a table lookup).
length = (sample * N) / 32768;
length = log(1+length)/log(N);
length = max(length,N)
DrawLine(length);
You can tweak the above approach to whatever looks good.

Instead of computing the values yourself, you can rely on values from Windows. This is actually the values displayed in your screenshot from the Windows Settings.
See the following sample for the IAudioMeterInformation interface:
https://learn.microsoft.com/en-us/windows/win32/coreaudio/peak-meters.
It is made for the playback but you can use it for capture also.
Some remarks, if you open the IAudioMeterInformation for a microphone but no application opened a stream from this microphone, then the level will be 0.
It means that while you want to display your microphone peak meter, you will need to open a microphone stream, like you already did.
Also read the documentation about IAudioMeterInformation it may not be what you need as it is the peak value. It depends on what you want to do with it.

Related

Measure difference between two files

I have a question that's very specific, yet very general at the same time. (Also, I don't know if this is quite the right site for this.)
The Scenario
Let's say I have an uncompressed video vid.avi. It is then run through [Some compression algorithm], which is lossy. I want to compare vid.avi and the new, compressed file to determine just how much data was lost in the compression. How can I compare the files and how can I measure the difference between the two, using the original as the reference point? Is it possible at all? I would prefer a generic answer that will work with any language, but I would also gladly accept an answer that's specific to a language.
EDIT: Let me be more specific. I want something that compares two video files in a similar way that the Notepad++ Compare plugin compares text files. I just want to find out how close each individual pixel's colour is to the original file's colour for that pixel.
Thanks in advance, and thank you for taking the time to read this question.
It is generally the change in video quality that people want to measure when comparing compression methods, rather than a loss of data.
If you did want to measure somehow the data loss, you would have to define what you mean by 'data' and how you wanted to measure it. Video compression is quite complex and the approach may even differ frame by frame within a video. Data could mean the colour depth for each pixel, the number of frames per second, whether a frame is encoded based on a delay to other frames etc.
Video quality is subjective so the reduction in quality after compression will not be an absolute value. The usual way to measure the quality is similar to the technique used for audio - Mean Opinion Score: https://en.wikipedia.org/wiki/Mean_opinion_score. Its essentially uses a well defined process to try to apply some objectivity to a test audiences subjective experience.

LMS implementation in LabVIEW fpga (High throughput personality)

Have to say surprised with the limitations of labVIEW FPGA in array implementation. I am not experienced enough in labview to make this comment but I found it very difficult to work with arrays !!.
I am working on Active Noise Cancellation project. I need to collect audio data from two microphones #40k Sample rate and 100 samples per frame (at least) and output the audio through loud speaker #40k sample rate. For this purpose, I am using myRIO 1900 in FPGA High throughput personality.
Right now, I am trying to implement LMS algorithm in LabVIEW FPGA platform. I have attached the MATLAB code below, what I want to implement !
Till array intializations are fine but when it comes to temporary vector manipulations. like :
x_temp(1:nr_c-1)=x_temp(2:nr_c);
x_temp(nr_c)=x(i);
I am going mad ! How to do this ? LabVIEW doesnt allow us to index the 1D arrays properly ( we can only extract one particular element of array not part of array, I can extract x_temp(1:nr_c) with sub array function but how to extract x_temp(2:nr_c) ??)
Please help me how to do the manipulation of 'x_temp' vector with the above basic statements.
PS: 1.LabVIEW FPGA supports only one dimensional array operations.
Even though Adaptive Filter Toolkit pallettes are available in myRIO FPGA High throughput personality, I cannot install Adaptive Filter Toolkit on myRIO Target.!!!
3.In MyLMS program : x : input vector(Array) , y: Desired Signal, nr_c : number of filter coefficients ,step: step size. I hope you understood the rest program :)
MyLMS.m :
function [y_hat,e,w] = MyLms(step, nr_c, x, y)
%intializing all vectors
coeffs=zeros(1,nr_c);
x_temp=zeros(1,nr_c);
y_hat=zeros(length(x),1);
e=zeros(length(x),1);
for i=1:length(x)
%temporary vector formation
x_temp(1:nr_c-1)=x_temp(2:nr_c);
x_temp(nr_c)=x(i);
%LMS algorithm iterations
y_hat(i) = x_temp * coeffs';
e(i)= y(i) - y_hat(i);
coeffs = coeffs+step*e(i)*x_temp;
end
I am by no means a LabVIEW FGPA expert, but I think you would encounter very similar limitations in VHDL.
At any rate, the typical technique for accessing subarrays is using rotate and replace subset (see Joel_G's post). For more information about the FPGA constraints and capabilities, see the help document: Array Palette Details (FPGA Module).

How can I get X11 screen buffer (or how can I get X11 to write to /dev/fb0)

I'm trying to get pixel data from my X11 instance, I've seen this thread (How do take a screenshot correctly with xlib?) and the double for loop is taking just too long for me (over a million loops, as the system I'm building requires the highest amount of efficiency possible, sitting around for 600 miliseconds is just not an option). Is there no way to just get a raw array of pixels to avoid the for loop? I know the XImage class has a "data" member which is supposed to contain all of the pixels, but the organizational system that it uses is foreign to me. Any help would be greatly appreciated! The best end game here would be for me to just be able to have X11 write directly to /dev/fb0

How do I detect whether the sample supplied by VideoSink.OnSample() is right-side up?

We're currently using the Silverlight VideoSink to capture video from users' local webcams, kinda like so:
protected override void OnSample(long sampleTime, long frameDuration, byte[] sampleData)
{
if (FrameShouldBeSubmitted())
{
byte[] resampledData = ResizeFrame(sampleData);
mediaController.SetVideoFrame(resampledData);
}
}
Now, on most of the machines that we've tested, the video sample provided in the byte[] sampleData parameter is upside-down, i.e., if you try to take the RGBA data and turn it into, say, a WriteableBitmap, the bitmap will be upside-down. That's odd, but fairly easy to correct, of course -- you just have to reverse the array as you encode it.
The problem is that at least on some machines (e.g., the single Macintosh in our test environment), the video sample provided is no longer upside-down, but right-side up, and hence, flipping the image actually results in an image that's received upside-down on the far side.
I reported this to MS as a bug, but their (terse) response was that it was "As Designed". Further attempts at clarification have so far been ignored.
Now, I'll grant that it's kinda entertaining to imagine the discussions behind this design decision: "OK, just to make it interesting, let's play the video rightside up on a Mac, but let's turn it upside down for Windows!" "Great idea!" "Yeah, that'll keep those developers guessing!" But beyond that, I can't find this, umm, "feature" documented anywhere, nor can I find any documentation on how one is supposed to be able to tell that a given video sample is upside down or rightside up. Any thoughts on how to tell this?
EDIT 3/29/10 4:50 pm - I got a response from MS which said that the appropriate way to tell was through the Stride property on the VideoFormat object, i.e., if the stride value is negative, the image will be upside-down. However, my own testing indicates that unless I'm doing something wrong, this isn't the case. At least on my own machine, whether the stride value is zero or negative (the only options I see), the sampled image is still upside-down.
I was going to suggest looking at VideoFormat.Stride provided at VideoSink.OnFormatChange but then I noticed your edit. I went ahead and tested it at my dev machine, image is upside down and stride is negative as expected. Have you checked again recently?
Even though stride made perfect sense for native applications (using stride at pointer operations), I agree that current behavior is not what you expect from a modern API. However performance wise, it is better not to make changes on data received from native API.
Yet at this point, while we are talking about performance, why not provide samples in formats other than PixelFormatType.Format32bppArgb so that we can avoid color space conversion? BTW, there is a VideoCaptureDevice.DesiredFormat property which only works for resolution as there is no alternative to PixelFormatType.Format32bppArgb.

Given an audio stream, find when a door slams (sound pressure level calculation?)

Not unlike a clap detector ("Clap on! clap clap Clap off! clap clap Clap on, clap off, the Clapper! clap clap ") I need to detect when a door closes. This is in a vehicle, which is easier than a room or household door:
Listen: http://ubasics.com/so/van_driver_door_closing.wav
Look:
It's sampling at 16bits 4khz, and I'd like to avoid lots of processing or storage of samples.
When you look at it in audacity or another waveform tool it's quite distinctive, and almost always clips due to the increase in sound pressure in the vehicle - even when the windows and other doors are open:
Listen: http://ubasics.com/so/van_driverdoorclosing_slidingdoorsopen_windowsopen_engineon.wav
Look:
I expect there's a relatively simple algorithm that would take readings at 4kHz, 8 bits, and keep track of the 'steady state'. When the algorithm detects a significant increase in the sound level it would mark the spot.
What are your thoughts?
How would you detect this event?
Are there code examples of sound pressure level calculations that might help?
Can I get away with less frequent sampling (1kHz or even slower?)
Update: Playing with Octave (open source numerical analysis - similar to Matlab) and seeing if the root mean square will give me what I need (which results in something very similar to the SPL)
Update2: Computing the RMS finds the door close easily in the simple case:
Now I just need to look at the difficult cases (radio on, heat/air on high, etc). The CFAR looks really interesting - I know I'm going to have to use an adaptive algorithm, and CFAR certainly fits the bill.
-Adam
Looking at the screenshots of the source audio files, one simple way to detect a change in sound level would be to do a numerical integration of the samples to find out the "energy" of the wave at a specific time.
A rough algorithm would be:
Divide the samples up into sections
Calculate the energy of each section
Take the ratio of the energies between the previous window and the current window
If the ratio exceeds some threshold, determine that there was a sudden loud noise.
Pseudocode
samples = load_audio_samples() // Array containing audio samples
WINDOW_SIZE = 1000 // Sample window of 1000 samples (example)
for (i = 0; i < samples.length; i += WINDOW_SIZE):
// Perform a numerical integration of the current window using simple
// addition of current sample to a sum.
for (j = 0; j < WINDOW_SIZE; j++):
energy += samples[i+j]
// Take ratio of energies of last window and current window, and see
// if there is a big difference in the energies. If so, there is a
// sudden loud noise.
if (energy / last_energy > THRESHOLD):
sudden_sound_detected()
last_energy = energy
energy = 0;
I should add a disclaimer that I haven't tried this.
This way should be possible to be performed without having the samples all recorded first. As long as there is buffer of some length (WINDOW_SIZE in the example), a numerical integration can be performed to calculate the energy of the section of sound. This does mean however, that there will be a delay in the processing, dependent on the length of the WINDOW_SIZE. Determining a good length for a section of sound is another concern.
How to Split into Sections
In the first audio file, it appears that the duration of the sound of the door closing is 0.25 seconds, so the window used for numerical integration should probably be at most half of that, or even more like a tenth, so the difference between the silence and sudden sound can be noticed, even if the window is overlapping between the silent section and the noise section.
For example, if the integration window was 0.5 seconds, and the first window was covering the 0.25 seconds of silence and 0.25 seconds of door closing, and the second window was covering 0.25 seconds of door closing and 0.25 seconds of silence, it may appear that the two sections of sound has the same level of noise, therefore, not triggering the sound detection. I imagine having a short window would alleviate this problem somewhat.
However, having a window that is too short will mean that the rise in the sound may not fully fit into one window, and it may apppear that there is little difference in energy between the adjacent sections, which can cause the sound to be missed.
I believe the WINDOW_SIZE and THRESHOLD are both going to have to be determined empirically for the sound which is going to be detected.
For the sake of determining how many samples that this algorithm will need to keep in memory, let's say, the WINDOW_SIZE is 1/10 of the sound of the door closing, which is about 0.025 second. At a sampling rate of 4 kHz, that is 100 samples. That seems to be not too much of a memory requirement. Using 16-bit samples that's 200 bytes.
Advantages / Disadvantages
The advantage of this method is that processing can be performed with simple integer arithmetic if the source audio is fed in as integers. The catch is, as mentioned already, that real-time processing will have a delay, depending on the size of the section that is integrated.
There are a couple of problems that I can think of to this approach:
If the background noise is too loud, the difference in energy between the background noise and the door closing will not be easily distinguished, and it may not be able to detect the door closing.
Any abrupt noise, such as a clap, could be regarded as the door is closing.
Perhaps, combining the suggestions in the other answers, such as trying to analyze the frequency signature of the door closing using Fourier analysis, which would require more processing but would make it less prone to error.
It's probably going to take some experimentation before finding a way to solve this problem.
You should tap in to the door close switches in the car.
Trying to do this with sound analysis is overengineering.
There are a lot of suggestions about different signal processing
approaches to take, but really, by the time you learn about detection
theory, build an embedded signal processing board, learn the processing
architecture for the chip you chose, attempt an algorithm, debug it, and then
tune it for the car you want to use it on (and then re-tune and re-debug
it for every other car), you will be wishing you just stickey taped a reed
switch inside the car and hotglued a magnet to the door.
Not that it's not an interesting problem to solve for the dsp experts,
but from the way you're asking this question, it's clear that sound
processing isn't the route you want to take. It will just be such a nightmare
to make it work right.
Also, the clapper is just an high pass filter fed into a threshold detector. (plus a timer to make sure 2 claps quickly enough together)
There is a lot of relevant literature on this problem in the radar world (it's called detection theory).
You might have a look at "cell averaging CFAR" (constant false alarm rate) detection. Wikipedia has a little bit here. Your idea is very similar to this, and it should work! :)
Good luck!
I would start by looking at the spectral. I did this on the two audio files you gave, and there does seem to be some similarity you could use. For example the main difference between the two seems to be around 40-50Hz. My .02.
UPDATE
I had another idea after posting this. If you can, add an accelerometer onto the device. Then correlate the vibrational and acoustic signals. This should help with cross vehicle door detection. I'm thinking it should be well correlated since the sound is vibrationally driven, wheres the stereo for example, is not. I've had a device that was able to detect my engine rpm with a windshield mount (suction cup), so the sensitivity might be there. (I make no promises this works!)
(source: charlesrcook.com)
%% Test Script (Matlab)
clear
hold all %keep plots open
dt=.001
%% Van driver door
data = wavread('van_driver_door_closing.wav');
%Frequency analysis
NFFT = 2^nextpow2(length(data));
Y = fft(data(:,2), NFFT)/length(data);
freq = (1/dt)/2*linspace(0,1,NFFT/2);
spectral = [freq' 2*abs(Y(1:NFFT/2))];
plot(spectral(:,1),spectral(:,2))
%% Repeat for van sliding door
data = wavread('van_driverdoorclosing.wav');
%Frequency analysis
NFFT = 2^nextpow2(length(data));
Y = fft(data(:,2), NFFT)/length(data);
freq = (1/dt)/2*linspace(0,1,NFFT/2);
spectral = [freq' 2*abs(Y(1:NFFT/2))];
plot(spectral(:,1),spectral(:,2))
The process for finding distinct spike in audio signals is called transient detection. Applications like Sony's Acid and Ableton Live use transient detection to find the beats in music for doing beat matching.
The distinct spike you see in the waveform above is called a transient, and there are several good algorithms for detecting it. The paper Transient detection and classification in energy matters describes 3 methods for doing this.
I would imagine that the frequency and amplitude would also vary significantly from vehicle to vehicle. Best way to determine that would be taking a sample in a Civic versus a big SUV. Perhaps you could have the user close the door in a "learning" mode to get the amplitude and frequency signature. Then you could use that to compare when in usage mode.
You could also consider using Fourier analysis to eliminate background noises that aren't associated with the door close.
Maybe you should try to detect significant instant rise in air pressure that should mark a door close. You can pair it with this waveform and sound level analysis and these all might give you a better result.
On the issue of less frequent sampling, the highest sound frequency which can be captured is half of the sampling rate. Thus, if the car door sound was strongest at 1000Hz (for example) then a sampling rate below 2000Hz would lose that sound entirely
A very simple noise gate would probably do just fine in your situation. Simply wait for the first sample whose amplitude is above a specified threshold value (to avoid triggering with background noise). You would only need to get more complicated than this if you need to distinguish between different types of noise (e.g. a door closing versus a hand clap).

Resources