I am trying to work on a system where the quality of a recorded sentence is rated by a computer. There are three modes under which this system operates:
When the person records a sentence using a mic and mixer arrangement.
When the user records over a landline.
When the user records over a mobile phone.
I notice that the scores I get from recordings using the above 3 sources are in the following order: Mic_score > Landline_score > mobile_score
It is likely that the above order is because of the effects of the codecs and channel characteristics. My question is:
What can be done to compensate for channel/codec introduced artifacts to get consistent scores across channels? If some sort of inverse filtering, then please provide some links where I could get started.
How do I detect what channel the input speech has been recorded on? Use HMMs?
Edit 1: I am not at liberty to go into the details of the criteria. The current scores that I get from the mic, landline and mobile (for the same sentence said (and similarly spoken over the three mediums) is something like 80, 66, 41. This difference may be because of the channel effects. If the content and manner of speaking the sentence is the same, then I am looking for an algorithm that normalizes the scores (they need not be the same, but they should be close).
It may very well be that the sound quality is different.
Have you tried listening to some examples?
You can also use any spectrum analyzer to look at that data in detail. I suggest http://www.baudline.com/. Things your should look out for: Distance between the noise floor and the speech.
Also look at the high frequency noise bursts when the letters t, f and s are spoken. In low quality lines the difference between these letters disappears.
Why do you want to skew the quality measures? Giving an objective response of the quality seems to make more sense.
The landline codec will remove all frequencies around and above 4 kHz. The cell phone codec will throw away more information as part of a lossy compression process. Unless you have another side channel of information regarding the original audio content, there is no reliable way to recover the audio that was thrown away.
You best bet to normalize is to low pass filter the audio to match the 8 kHz telco codec, and the run the result through some cellular standard compression algorithm (there may be one published for your particular mobile cellular protocol). This should reduce the quality of all 3 signals to about the same.
Related
I am working on the ESP32 microcontroller and I would like to implement iBeacon advertising feature. I have been reading about the iBeacon. I have learnt about the specific format that iBeacon packet uses:
https://os.mbed.com/blog/entry/BLE-Beacons-URIBeacon-AltBeacons-iBeacon/
From what I understand, iBeacon preset is set and not meant to be modified. I must set a custom UUID, major and minor numbers such as:
uint8_t Beacon_UUID[16] = {0x00,0x11,0x22,0x33,0x44,0x55,0x66,0x77,0x88,0x99,0xAA,0xBB,0xCC,0xDD,0xEE,0xFF};
uint8_t Beacon_MAJOR[2] = {0x12,0x34};
uint8_t Beacon_MINOR[2] = {0x56,0x78};
The only thing that I am confused about is the TX Power byte. What should I set it to?
According to the website that I have referred above:
Blockquote
A scanning application reads the UUID, major number and minor number and references them against a database to get information about the beacon; the beacon itself carries no descriptive information - it requires this external database to be useful. The TX power field is used with the measured signal strength to determine how far away the beacon is from the smart phone. Please note that TxPower must be calibrated on a beacon-by-beacon basis by the user to be accurate.
Blockquote
It mentions what is TxPower and how it should be determined but I still cannot make any sense out of it. Why would I need to measure how far away the beacon is from the smart phone if? That should be done by the iBeacon scanner not the advertiser(me).
When you are making a hardware device transmit iBeacon, it is your responsibility to measure the output power of the transmitter and put the corresponding value into the TxPower byte of the iBeacon transmission.
Why? Because receiving applications that detect your beacon need to know how strong your transmitter is to estimate distance. Otherwise there would be no way for the receiving application to tell if a medium signal level like -75 dB is from a nearby weak transmitter or a far away strong transmitter.
The basic procedure is to put a receiver exactly one meter away from your transmitter and measure the RSSI at that distance. The one meter RSSI is what you put into TxPower byte of the iBeacon advertisement.
The specifics of how to measure this properly can be a bit tricky, because every receiver has a different "specificity" meaning they will read a higher or lower RSSI depending on their antenna gain. When Apple came out with iBeacon several years ago, they declared the reference receiver an iPhone 4S -- this was the newest phone available at that time. You would run beacon detection app like AirLocate (not available in the App Store) or my Beacon Locate (available in the App Store). The basic procedure is to aim the back of the phone at the beacon when it is exactly one meter away and use the app to measure the RSSI. Many detector apps have a "calibrate" feature which averages RSSI measurements over 30 seconds or so. For best results when calibrating, do this with both transmitter and receiver at least 3 feet above the ground and minimize metal or dense walls nearby. Ideally, you would do this outdoors using two plastic tripods (or do the same inside an antenna chamber.)
It is hard to find a reference iPhone 4S these days, and other iPhone models can measure surprisingly different RSSI values. My tests show that an iPhone SE 2nd edition measures signals very similarly to an iPhone 4S. But even these models are not made anymore. If you cannot get one of these, use the oldest iPhone you can get without a case and take the best measurement you can as described above. Obviously a ideal measurement requires more effort -- you have to decide how much effort you are willing to put into this. An ideal measurement is only important if you expect receiving applications to want to get the best distance measurements possible.
I would like to know how noise can be removed from data (say, radio data that is an array of rows and columns with each data point representing intensity of the radiation in the given frequency and time).The array can contain radio bursts. But many fixed frequency radio noise also exists(RFI=radio frequency intereference).How to remove such noise and bring out only the burst.
I don't mean to be rude, but this question isn't clear at all. Please sharpen it up.
The normal way to remove noise is first to define it exactly and then filter it out. Usually this is done in the frequency domain. For example, if you know the normalized power spectrum P(f) of the noise, build a filter with response
e/(e + P(f))
where e<1 is an attenuation factor.
You can implement the filter digitally using FFT or a convolution kernel.
When you don't know the spectrum of the noise or when it's white, then just use the inverse of the signal band.
What is a simple way to see if my low-pass filter is working? I'm in the process of designing a low-pass filter and would like to run tests on it in a relatively straight forward manner.
Presently I open up a WAV file and stick all the samples in a array of ints. I then run the array through the low-pass filter to create a new array. What would an easy way to check if the low-pass filter worked?
All of this is done in C.
You can use a broadband signal such as white noise to measure the frequency response:
generate white noise input signal
pass white noise signal through filter
take FFT of output from filter
compute log magnitude of FFT
plot log magnitude
Rather than coding this all up you can just dump the output from the filter to a text file and then do the analysis in e.g. MATLAB or Octave (hint: use periodogram).
Depends on what you want to test. I'm not a DSP expert, but I know there are different things one could measure about your filter (if that's what you mean by testing).
If the filter is linear then all information of the filter can be found in the impulse response. Read about it here: http://en.wikipedia.org/wiki/Linear_filter
E.g. if you take the Fourier transform of the impulse response, you'll get the frequency response. The frequency response easily tells you if the low-pass filter is worth it's name.
Maybe I underestimate your knowledge about DSP, but I recommend you to read the book on this website: http://www.dspguide.com. It's a very accessible book without difficult math. It's available as a real book, but you can also read it online for free.
EDIT: After reading it I'm convinced that every programmer that ever touches an ADC should definitely have read this book first. I discovered that I did a lot of things the difficult way in past projects that I could have done a thousand times better when I had a little bit more knowledge about DSP. Most of the times an unexperienced programmer is doing DSP without knowing it.
Create two monotone signals, one of a low frequency and one of a high frequency. Then run your filter on the two. If it works, then the low frequency signal should be unmodified whereas the high frequency signal will be filtered out.
Like Bart above mentioned.
If it's LTI system, I would insert impulse and record the samples and perform FFT using matlab and plot magnitude.
You ask why?
In time domain, you have to convolute the input x(t) with the impulse response d(t) to get the transfer function which is tedious.
y(t) = x(t) * d(t)
In frequency domain, convolution becomes simple multiplication.
y(s) = x(s) x d(s)
So, transfer function is y(s)/x(s) = d(s).
That's the reason you take FFT of impulse response to see the behavior of the filter.
You should be able to programmatically generate tones (sine waves) of various frequencies, stuff them into the input array, and then compare the signal energy by summing the squared values of the arrays (and dividing by the length, though that's mathematically not necessary here because the signals should be the same length). The ratio of the output energy to the input energy gives you the filter gain. If your LPF is working correctly, the gain should be close to 1 for low frequencies, close to 0.5 at the bandwidth frequency, and close to zero for high frequencies.
A note: There are various (but essentially the same in spirit) definitions of "bandwidth" and "gain". The method I've suggested should be relatively insensitive to the transient response of the filter because it's essentially averaging the intensity of the signal, though you could improve it by ignoring the first T samples of the input, where T is related to the filter bandwidth. Either way, make sure that the signals are long compared to the inverse of the filter bandwidth.
When I check a digital filter, I compute the magnitude response graph for the filter and plot it. I then generate a linear sweeping sine wave in code or using Audacity, and pass the sweeping sine wave through the filter (taking into account that things might get louder, so the sine wave is quiet enough not to clip) . A visual check is usually enough to assert that the filter is doing what I think it should. If you don't know how to compute the magnitude response I suspect there are tools out there that will compute it for you.
Depending on how certain you want to be, you don't even have to do that. You can just process the linear sweep and see that it attenuated the higher frequencies.
All the examples I have seen of neural networks are for a fixed set of inputs which works well for images and fixed length data. How do you deal with variable length data such sentences, queries or source code? Is there a way to encode variable length data into fixed length inputs and still get the generalization properties of neural networks?
I have been there, and I faced this problem.
The ANN was made for fixed feature vector length, and so are many other classifiers such as KNN, SVM, Bayesian, etc.
i.e. the input layer should be well defined and not varied, this is a design problem.
However, some researchers opt for adding zeros to fill the missing gap, I personally think that this is not a good solution because those zeros (unreal values) will affect the weights that the net will converge to. in addition there might be a real signal ending with zeros.
ANN is not the only classifier, there are more and even better such as the random forest. this classifier is considered the best among researchers, it uses a small number of random features, creating hundreds of decision trees using bootstrapping an bagging, this might work well, the number of the chosen features normally the sqrt of the feature vector size. those features are random. each decision tree converges to a solution, using majority rules the most likely class will chosen then.
Another solution is to use the dynamic time warping DTW, or even better to use Hidden Markov models HMM.
Another solution is the interpolation, interpolate (compensate for missing values along the small signal) all the small signals to be with the same size as the max signal, interpolation methods include and not limited to averaging, B-spline, cubic.....
Another solution is to use feature extraction method to use the best features (the most distinctive), this time make them fixed size, those method include PCA, LDA, etc.
another solution is to use feature selection (normally after feature extraction) an easy way to select the best features that give the best accuracy.
that's all for now, if non of those worked for you, please contact me.
You would usually extract features from the data and feed those to the network. It is not advisable to take just some data and feed it to net. In practice, pre-processing and choosing the right features will decide over your success and the performance of the neural net. Unfortunately, IMHO it takes experience to develop a sense for that and it's nothing one can learn from a book.
Summing up: "Garbage in, garbage out"
Some problems could be solved by a recurrent neural network.
For example, it is good for calculating parity over a sequence of inputs.
The recurrent neural network for calculating parity would have just one input feature.
The bits could be fed into it over time. Its output is also fed back to the hidden layer.
That allows to learn the parity with just two hidden units.
A normal feed-forward two-layer neural network would require 2**sequence_length hidden units to represent the parity. This limitation holds for any architecture with just 2 layers (e.g., SVM).
I guess one way to do it is to add a temporal component to the input (recurrent neural net) and stream the input to the net a chunk at a time (basically creating the neural network equivalent of a lexer and parser) this would allow the input to be quite large but would have the disadvantage that there would not necessarily be a stop symbol to seperate different sequences of input from each other (the equivalent of a period in sentances)
To use a neural net on images of different sizes, the images themselves are often cropped and up or down scaled to better fit the input of the network. I know that doesn't really answer your question but perhaps something similar would be possible with other types of input, using some sort of transformation function on the input?
i'm not entirely sure, but I'd say, use the maximum number of inputs (e.g. for words, lets say no word will be longer than 45 characters (longest word found in a dictionary according to wikipedia), and if a shorter word is encountered, set the other inputs to a whitespace character.
Or with binary data, set it to 0. the only problem with this approach is if an input filled with whitespace characters/zeros/whatever collides with a valid full length input (not so much a problem with words as it is with numbers).
I love to work on AI optimization software (Genetic Algorithms, Particle Swarm, Ant Colony, ...). Unfortunately I have run out of interesting problems to solve. What problem would you like to have solved?
This list of NP complete problems should keep you busy for a while...
How about the Hutter Prize?
From the entry on Wikipedia:
The Hutter Prize is a cash prize
funded by Marcus Hutter which rewards
data compression improvements on a
specific 100 MB English text file.
[...]
The goal of the Hutter Prize is to
encourage research in artificial
intelligence (AI). The organizers
believe that text compression and AI
are equivalent problems.
Basically the idea is that in order to make a compressor which is able to compress data most efficiently, the compressor must be, in Marcus Hutter's words, "smarter". For more information on the relation between artificial intelligence and compression, see the Motivation and FAQ sections of the Hutter Prize website.
Does the Netflix Prize count?
I would like my bank balance optimised so that there is as much money as possible left at the end of the month, instead of the other way round.
What about the Go Game ?
Here's an interesting practical problem I came up while tinkering with color quantization and image compression.
The basic idea is that I would like a program to which I give a picture and it reduces the amount of colors is it as much as possible without me noticing it. Since every person has a different sensitivity of the eye (and eyes have different sensitivity of red/green/blue intensities), it should be possible to specify this sensitivity threshold in some way.
In other words, in a truecolor picture, replace every pixel's color with another color so that:
The total count of different colors in a picture would be the smallest possible; and
Every new pixel would have it's color no further from the original color than some user-specified value D.
The D can be defined in different ways, pick your favorite. For example:
Separate red, green and blue components for specifying the maximum possible deviation for each of them (for every pixel you get a rectangular cuboid of valid replacement values);
A real number which would represent the maximum allowable distance in the RGB cube (for every pixel you get a sphere of valid replacement values);
Something inbetween or completely different.
Most efficient solution to a given set of Sudoku puzzles. (excluding brute-force methods)