What functions exist for classifying speech sounds by deriving values from Python or JavaScript?
We know that speech has characteristics: speech sounds, tone, noise, pitch, strength, sound duration, fundamental tone, overtone, timbre and formant. Therefore, it is necessary to determine the values for each characteristic with the use of some algorithm. TensorFlow has no such possibility, as it can only classify sounds according to their general character: crying, screaming, vocal, ect.
Related
I need to take a .wav file with 44.1k and downsample it to 11.25k (Dividing samples by 4).
I also need to apply a bandpass filter to 300 Hz -> 3,200 Hz to the .wav file.
I am new to audio programming, I have been trying to research how to specifically approach this task but every search has pointed to a library or tool that would simply make the conversion for me. It's a training exercise for a new position I have acquired (to familiarize myself with both C programming and audio programming) so I have to accomplish it manually.
Thanks
Separate the code that deals with the specifics of .wav files from the specifics of working with audio samples. There are lots of how-tos on the Web for reading and writing .WAV files, especially PCM samples.
Once you have your samples in memory, downsampling is trivial. You literally take every _n_th sample and throw away the rest. There are other approaches that might better preserve the fidelity. For example, you might use a low-pass filter first and then resample the filtered waveform. A low-pass filter can be implemented with weighted averaging of the recent samples.
Bandpass filtering can be accomplished a few ways. The most direct is to transform the samples from the time domain to the frequency domain, manipulate the signal in frequency space, and transform it back. The transform used for this is called a Fourier transform. The most common way to do this in software is with an algorithm called a Fast Fourier Transform (FFT). It's "fast" because it eliminates a lot of redundant calculations.
Reading up on how to implement a digital filter is probably the first step. I would suggest looking up FIR and IIR filters.
Or if you are lazy, there probably exists several third party libraries that you can use.
I am trying to make a kaiser window for a audio signal using both Matlab and c.I have been looking at Matlab and gnu scientific library documentation to understand how to use a modified bessel function of first kind and 0th order, but I still have some questions:
It seems that GSL does not accept a 0 order bessel function, I don't understand the documentation on this point.
I don't know if I should use a regular or irregular function. What are their differences? Matlab do not have that.
which is the fastest method to filter the signal: time domain or frequency domain?
how to filter the signal on the frequency domain?
I will only answer to the last three points. (Warning : I am french and my english isn't great...)
1) When you consider the Fourier transform of a signal multiplied by a specific window, in the spectral domain, you convolute the original spectrum of the signal by the spectrum of your window. In an ideal mathematical world, you would love to have a Dirac since it's convolution would only shift the signal. But to get a Dirac in the frequency you would need a periodic signal in the time domain which isn't defined on a compact (i.e. finite like your sound record) support. And this is too bad because there is a theorem (Paley-Wiener's corollary) that states that if your time-domain support is compact your frequency-domain support is not bounded and the decreasing behaviour of the Fourier transform increase with the regularity of the signal (i.e. window in our case). Great then ! All we have to chose is a nice regular (smooth ?) window. Unfortunately, to get a really smooth window, we have to narrow it (wide smooth windows exist but have other drawbacks dues to their derived function...its like too large constants ahead appealing algorithmic complexity) and it's spectrum will be wider (for the same reason invoked in the theorem). But you (and Obama) believe in compromise to face the (Pontryagin) duality, don't you ? The gaussian is a great compromise since its Fourier transform is a gaussian too (sum of random variables ? convolution ? +,x-morphism in the complex plane...every thing is linked but its a too long non-linear story to be told here). Therefore a lot of window tend to look like a gaussian.
Here is a bunch of windows and spectrums stolen to my speech processing teacher :
2) It's a pure mathematical duality, so it depends of what you mean by fiter. Does applying a Sobel filter into the frequency-domain make any sense ? (in fact it may...)
3) Again, it depends of what you mean by filter.
I think I can answer (1) and (2):
(1) You can program the zeroth order Bessel functions yourself (spherical or not) by consulting a handbook on mathematical functions such as Abramowitz and Stegun, Gradshteyn and Ryzhik, or the Digital Library of Mathematical Functions (http://dlmf.nist.gov/).
(2) By regular and irregular, I presume you mean regular or modified Bessel functions. Bessel functions are solutions to the three-dimensional heat equation posed in cylindrical coordinates. Your boundary conditions determine your use of regular or modified Bessel functions. For nice discussions about regular and modified Bessel functions, I suggest reading The Conduction of Heat in Solids by Carslaw and Jaeger and Boundary Value Problems of Heat Conduction by M. Necati Ozisik. You can also try for difficult problems Classical Electrodynamics by John David Jackson.
How are the regular and modified Bessel functions different? The regular J Bessel function is somewhat oscillatory in nature (see a nice old book like Jahnke, Emde, and Losch for hand-drawn Bessel function graphs, a lost art form if you ask me) whereas I and K are single-valued.
I can't really help you much on (3) and (4), as I'm not much an electrical engineer (although I would like to learn more!).
It is summer, and so I have decided to take it upon myself to write a data-compression program, preferably in C code. I have a decent beginners understanding of how compression works. I just have a few questions:
1) Would c be a suitable programming language to accomplish this task?
2) Should I be working in byte's with the input file? Or at a binary level somehow?
If someone could just give me a nudge in the correct direction, I'd really appreciate it. I would like to code this myself however, and not use a pre-existing compression library or anything like that.
You could start by looking at Huffman Encoding. A lot of computer science classes implement that as a project so it should be manageable. C would be appropriate for Huffman encoding, but it might be easier to do it first in a higher-level language so that you understand the concepts.There are slides, hints, and an example project available in Java for a masters-level project at the University of Pennsylvania (search for "huff" on that page).
To answer your questions:
C is suitable.
It depends on the algorithm, or the way you are thinking about `compression'.
My opinion will be, first decide whether you want to do a lossless compression or a lossy compression, then pick an algorithm to implement. Here are a few pointers:
For the lossless one, some are very intuitive, such as the run-length encoding,
e.g., if there is 11 as and 5 bs, you just encode them as 11a5b.
Some algorithms use a dictionary, please refer to LZW encoding.
Finally, I do recommend Huffman encoding since it is very straight-forward, simple and helpful to gain experience in learning algorithm (for your educational purpose).
For lossy ones, Discrete Fourier Transform (DFT), or wavelet, is used in JPEG compression. This is useful to understand multimedia compression.
Wikipedia page is a good starting point.
Yes, C is well suited for this kind of work.
Whether you work with bytes or bits will depend on the algorithm that you decide to implement. For example, Huffman coding is inherently bit-oriented whereas many other compression algorithms are not.
C is a great choice for writing a compression program. You can use plenty of other languages too, though.
Your computer probably can't directly address units of memory smaller than a byte (pretty much by definition), so working with bytes is probably a good choice. Some of how you work with the data will be affected by the compression algorithm you choose.
Good luck!
1) Would c be a suitable programming language to accomplish this task?
Yes.
2) Should I be working in byte's with the input file? Or at a binary level somehow?
They're the same, so the question makes no sense.
not use a pre-existing compression library
Can you use a pre-existing compression algorithm? There are dozens and "compression algorithm" -- when used with Google -- will reveal a great deal of helpful information.
Where Can I find algorithm details for holistic word recognition? I need to build a simple OCR system in hardware (FPGAs actually), and the scientific journals seems so abstract?
Are there any open source (open core) codes for holistic word recognition?
Thanks
For an algorithm that is quite suitable for FPGA implementation (embarrassingly parallel) you might look at:
http://en.wikipedia.org/wiki/Cross-correlation
It is fast, and easily implemented.
The only thing is: it recognizes a shape (in your case some text) DEPENDENT of the rotation and size / stretch / skew etc. But if that isn't a problem, it can be very fast and is quite robust. You should only watch out for interpretation problems with characters that are similar (like o and c).
I used it to find default texts on scanned forms to obtain bearings where Region of Interests are and searching in those images (6M pixels) only took around 15 ms with our implementation on a Core2 CPU in a single thread.
Just wondering if it's possible to go through a flac, mp3, wav, etc file and edit portions, or the entire file by removing sections based on a specific frequency range?
So for example, I have a recording of a friend reciting a poem with a few percussion instruments in the background. Could I write a C program that goes through the entire file and cuts out everything except the vocals (human voice frequency ranges from 85-255 Hz, from what I've been reading)?
Thanks in advance for any ideas!
To address the OP's specific example: I think your understanding of human voice frequency is wrong. Perhaps the fundamental frequency of male spoken voice stays in that range (for tenor singing, or female speech or singing, or shouting, even the fundamental will go much higher, maybe 500-1000 Hz). But that doesn't even matter, because even if the fundamental is low, the overtones which create the different vowel sounds will go up to 2000-4000 Hz or higher. And the frequencies which define "noise" consonants like "t" and "s" go all the way to the top of the audio range, say 5000-10000 Hz. Percussion fills this same audio range, so I doubt that you can separate voice and percussion by filtering certain frequencies in or out.
It is certainly possible, otherwise digital studio mixing software wouldn't exist.
What your'e effectively asking for is to attenuate frequency ranges across an entire file. In analog land, you would apply a low-pass and a high-pass filter (or some other combination of filters) to attenuate the frequencies.
In software, you'd solve this problem by writing a digital filter of sorts that would reduce the output of various frequencies. Frequencies would be identified via an FFT computation.
The fastest thing to do would be to use an audio editing app and apply the changes there.
There is an audio library called PortAudio that may provide some support for editing an audio stream at the numerical level. It is written in C, and has a C API.
If you want to test out audio processing algorithms I strongly suggest Supercollider. It's free and has many kinds of audio filter built in. But eliminating voice could require considerable tweaking. Supercollider will allow you to write code driven by various parameters and then hook those parameters up to a GUI which you'll be able to tweak while supplying it with live (or recorded) data.
Even if you want to write C code, you'll learn a lot from using Supercollider first. Many of the filters are surprisingly easy to implement in C but you'll need to write a certain amount of framework code before you can get started.
Additionally, I learnt quite a bit about writing digital audio filters from this book. Among other things, it discusses some of the characteristics of human speech, as well as how to build filters to selectively enhance or knock out particular frequencies. It also provides working C code.
SciPy can do all sorts of signal processing.
You can also use MAX/MSP (but that's paid) or PureData (that's free) for working with music algorithms , they are the basis from which supercollider was created. And are excellent softwares if you want to do that on real-time envirollments.