Just wondering if it's possible to go through a FLAC, MP3, WAV, etc. file and edit portions, or the entire file, by removing sections based on a specific frequency range?
So for example, I have a recording of a friend reciting a poem with a few percussion instruments in the background. Could I write a C program that goes through the entire file and cuts out everything except the vocals (human voice frequency ranges from 85-255 Hz, from what I've been reading)?
Thanks in advance for any ideas!
To address the OP's specific example: I think your understanding of human voice frequency is wrong. The fundamental frequency of male spoken voice may stay in that range, but for tenor singing, female speech or singing, or shouting, even the fundamental will go much higher, maybe 500-1000 Hz. But that doesn't even matter, because even if the fundamental is low, the overtones which create the different vowel sounds go up to 2000-4000 Hz or higher. And the frequencies which define "noise" consonants like "t" and "s" go all the way to the top of the audio range, say 5000-10000 Hz. Percussion fills this same audio range, so I doubt that you can separate voice and percussion by filtering certain frequencies in or out.
It is certainly possible, otherwise digital studio mixing software wouldn't exist.
What you're effectively asking for is to attenuate frequency ranges across an entire file. In analog land, you would apply a low-pass and a high-pass filter (or some other combination of filters) to attenuate the unwanted frequencies.
In software, you'd solve this problem by writing a digital filter that attenuates the output at certain frequencies. The frequency content of the signal can be identified via an FFT computation.
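To make the filtering idea concrete, here is a minimal sketch of a one-pole low-pass filter in C. The `LowPass` struct and function names are my own, not from any particular library, and the coefficient comes from the standard RC-filter approximation; a serious application would use a properly designed FIR or IIR filter instead.

```c
#include <math.h>

/* One-pole low-pass filter: attenuates content above the cutoff.
 * A sketch only; the coefficient uses the simple RC approximation. */
typedef struct {
    double a;      /* smoothing coefficient in (0, 1) */
    double state;  /* previous output sample */
} LowPass;

void lowpass_init(LowPass *lp, double cutoff_hz, double sample_rate)
{
    const double pi = 3.141592653589793;
    double rc = 1.0 / (2.0 * pi * cutoff_hz);
    double dt = 1.0 / sample_rate;
    lp->a = dt / (rc + dt);
    lp->state = 0.0;
}

double lowpass_process(LowPass *lp, double x)
{
    /* Move the state a fraction of the way toward the new sample. */
    lp->state += lp->a * (x - lp->state);
    return lp->state;
}
```

Chaining this with an analogous high-pass stage gives a crude bandpass; the roll-off of a single pole is shallow, which is part of why isolating a voice this way is so hard.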
The fastest thing to do would be to use an audio editing app and apply the changes there.
There is an audio library called PortAudio that may provide some support for editing an audio stream at the numerical level. It is written in C, and has a C API.
If you want to test out audio processing algorithms I strongly suggest SuperCollider. It's free and has many kinds of audio filters built in. But eliminating a voice could require considerable tweaking. SuperCollider will allow you to write code driven by various parameters and then hook those parameters up to a GUI, which you'll be able to tweak while supplying it with live (or recorded) data.
Even if you want to write C code, you'll learn a lot from using SuperCollider first. Many of the filters are surprisingly easy to implement in C, but you'll need to write a certain amount of framework code before you can get started.
Additionally, I learnt quite a bit about writing digital audio filters from this book. Among other things, it discusses some of the characteristics of human speech, as well as how to build filters to selectively enhance or knock out particular frequencies. It also provides working C code.
SciPy can do all sorts of signal processing.
You can also use Max/MSP (which is paid) or Pure Data (which is free) for working with music algorithms; they are the basis from which SuperCollider was created, and they are excellent software if you want to do this in real-time environments.
Related
I need to take a .wav file sampled at 44.1 kHz and downsample it to 11.025 kHz (dividing the sample rate by 4).
I also need to apply a bandpass filter from 300 Hz to 3,200 Hz to the .wav file.
I am new to audio programming, I have been trying to research how to specifically approach this task but every search has pointed to a library or tool that would simply make the conversion for me. It's a training exercise for a new position I have acquired (to familiarize myself with both C programming and audio programming) so I have to accomplish it manually.
Thanks
Separate the code that deals with the specifics of .wav files from the specifics of working with audio samples. There are lots of how-tos on the Web for reading and writing .WAV files, especially PCM samples.
Once you have your samples in memory, downsampling by an integer factor is simple: you keep every nth sample and throw away the rest. Done naively, though, any content above the new Nyquist frequency aliases into the result, so fidelity is better preserved if you apply a low-pass filter first and then decimate the filtered waveform. A basic low-pass filter can be implemented as a weighted average of the most recent samples.
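As a sketch of that approach (the function name is mine, not from any library), the following averages each group of input samples as a crude low-pass step before decimating; a real design would use a proper FIR anti-aliasing filter ahead of the decimation.

```c
#include <stddef.h>

/* Downsample by an integer factor, averaging each group of `factor`
 * input samples as a crude low-pass step before decimating.
 * Returns the number of output samples written. */
size_t downsample(const short *in, size_t n, short *out, int factor)
{
    size_t m = 0;
    for (size_t i = 0; i + (size_t)factor <= n; i += (size_t)factor) {
        long sum = 0;
        for (int k = 0; k < factor; k++)
            sum += in[i + k];
        out[m++] = (short)(sum / factor);
    }
    return m;
}
```

For 44.1 kHz PCM input, calling this with `factor = 4` yields 11.025 kHz output.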
Bandpass filtering can be accomplished a few ways. The most direct is to transform the samples from the time domain to the frequency domain, manipulate the signal in frequency space, and transform it back. The transform used for this is called a Fourier transform. The most common way to do this in software is with an algorithm called a Fast Fourier Transform (FFT). It's "fast" because it eliminates a lot of redundant calculations.
Reading up on how to implement a digital filter is probably the first step. I would suggest looking up FIR and IIR filters.
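For reference, the direct-form FIR structure those texts describe is only a few lines of C. This is a hedged sketch (my own function name); the filter's behavior is determined entirely by the coefficient array `h`, which you would design for your 300-3,200 Hz band, e.g. with the windowed-sinc method.

```c
#include <stddef.h>

/* Apply an FIR filter: y[i] = sum over k of h[k] * x[i - k].
 * Samples before x[0] are treated as zero (direct form). */
void fir_apply(const double *h, size_t taps,
               const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t k = 0; k < taps && k <= i; k++)
            acc += h[k] * x[i - k];
        y[i] = acc;
    }
}
```

With `h = {0.5, 0.5}` this is a two-tap moving average (a mild low-pass); a bandpass just means a longer, appropriately designed `h`.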
Or, if you'd rather not implement it yourself, there probably exist several third-party libraries that you can use.
The premise of the project will be:
There will be a prerecorded track of guitar, for example. The student will play the same track on his guitar. I need to compare these two sounds and find out whether the student played it well or not. I will be using an STM32 microcontroller and the Keil uVision software for simulation at first (programming in C).
I know that I will be using an ADC with DMA, and I assume I would apply a Fast Fourier Transform to the signals and then somehow compare the two frequency responses. Also, would there be a problem with tempo? It is not realistic that every note will land on the exact same millisecond, so a direct sample-by-sample comparison won't work.
I've seen some methods like Hidden Markov Model or Goertzel algorithm but I am not quite sure what they do and if they are optimal and easy for the project. So my question would be: is there a specific algorithm that suits best and how would I implement it on my code (since I haven't really started working on code, mostly theoretical reading so far).
edit: I've made a similar post yesterday but my premise was too complicated to solve so I am posting on a new premise, much easier to accomplish. I thought not to ask on the first thread since it would mix up two different issues.
Assuming that you can use FFT to find out which notes are playing at what time (this may prove to be difficult for distorted guitar chords), you can do this e.g. 10 times per second for both streams, and then check how often the notes in both streams match. This will give you a percentage; if you need a binary value, you'd have to apply a threshold to it.
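The comparison itself is trivial once each stream has been reduced to one detected note per analysis frame. A sketch (assuming notes are represented as MIDI numbers; the function name is mine):

```c
#include <stddef.h>

/* Compare two equal-length streams of detected notes (one entry per
 * analysis frame) and return the fraction of frames that match. */
double match_fraction(const int *a, const int *b, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] == b[i])
            hits++;
    return n ? (double)hits / (double)n : 0.0;
}
```

The binary "played it well" verdict is then just `match_fraction(...) >= threshold` for some experimentally chosen threshold.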
If both streams are not of equal length (different tempo) then you will have to stretch. You don't have to stretch the actual audio, just the times between the note measurements (e.g. every 100 ms for the first stream and every 125 ms for the second stream).
So the biggest problem may be to find out what notes are playing at any given moment in time.
I'd start with constructing a mapping of frequencies to notes. It may also be a good idea to low-pass filter the signal at around 1100 Hz to get rid of some of the unwanted harmonics (you can't play much higher than that on a guitar anyway) and similarly high-pass filter it at 80 Hz. Then after the FFT or DFT (the FFT is just a fast algorithm for computing the DFT, so the result is the same), find the frequencies that are close to real note frequencies. Then pick the loudest one and those that are above a certain threshold relative to the loudest one (e.g. drop anything that is less than half as loud as the loudest one, but some experimentation will be needed to find a good threshold value).
I'm planning on doing a small project involving ECG signals. I am currently getting ECG signals via a COM port and recording these in a txt file using C programming.
My next step is to be able to plot all those data points in real-time. Can this be done using C programming? If not, I do not mind collecting a sample that is 2 minutes long and then plotting those data points.
After that, I want to be able to take the FFT of the time-domain data and be able to plot the frequency plot.
My end goal is to design a GUI using C that shows a person the real-time ECG waveform as well as the frequency plot.
I did make another post and was advised to try:
RRDTool (http://oss.oetiker.ch/rrdtool/) : However, this doesn't seem to be a straightforward implementation for C.
OpenGL Utility Toolkit (http://www.opengl.org/resources/libraries/glut/) : This seems to be really powerful for generating 3D plots. However, I couldn't find helpful guides simplifying 2D plot implementations
KST (http://kst-plot.kde.org/): This was the most interesting software. I've played around with it a bit and like its simplicity. It also lets me get FFT data. However, I'm not sure how to connect it with my end goal of having the GUI, since it is a separate program.
If someone could recommend a C-based implementation and some tutorials/sample code to go along with it, that would be great. Additionally, advice on other alternatives to reach my end goal would also be much appreciated!
I can suggest using SDL. It's more 2D-oriented and easier to learn than OpenGL, written in C, and quite powerful.
If you want you can first try out SDL in Python using pygame.
Qt (http://qt.nokia.com/products/) has a plot widget that you could use, and it has bindings for many languages.
You might want to look at PLplot. Works with many languages, including C, on all the common desktop operating systems (Mac, Windows, Linux, Unix). There are quite a few examples, and most (if not all) examples are available in a number of different languages.
Where can I find algorithm details for holistic word recognition? I need to build a simple OCR system in hardware (FPGAs, actually), and the scientific journals seem so abstract.
Are there any open source (open core) codes for holistic word recognition?
Thanks
For an algorithm that is quite suitable for FPGA implementation (embarrassingly parallel) you might look at:
http://en.wikipedia.org/wiki/Cross-correlation
It is fast, and easily implemented.
The only caveat: it recognizes a shape (in your case, some text) dependent on its rotation and size/stretch/skew, etc. But if that isn't a problem, it can be very fast and is quite robust. You should only watch out for interpretation problems with characters that look similar (like o and c).
I used it to find default texts on scanned forms to obtain bearings where Region of Interests are and searching in those images (6M pixels) only took around 15 ms with our implementation on a Core2 CPU in a single thread.
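As a 1-D illustration of the idea (my own function name, and unnormalized for brevity): slide the template across the signal and keep the offset with the highest correlation score. The 2-D image version is the same with an extra pair of loops, and every offset can be evaluated independently, which is what makes it so parallelizable on an FPGA.

```c
#include <stddef.h>

/* Return the offset at which `tmpl` best matches `signal` under
 * (unnormalized) cross-correlation. Each candidate offset's score
 * is independent of the others, so they can be computed in parallel. */
size_t best_match(const double *signal, size_t n,
                  const double *tmpl, size_t m)
{
    size_t best = 0;
    double best_score = -1e300;
    for (size_t off = 0; off + m <= n; off++) {
        double score = 0.0;
        for (size_t k = 0; k < m; k++)
            score += signal[off + k] * tmpl[k];
        if (score > best_score) {
            best_score = score;
            best = off;
        }
    }
    return best;
}
```

For real images you'd normalize the score (e.g. by local energy) so bright regions don't dominate.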
Possible Duplicate:
Power Efficient Software Coding
Adobe announced at Google I/O that the next version of Flash, 10.1, is going to be more efficient on devices where power consumption matters.
This got me to thinking: how do you write code that uses less power? Are there any helpful resources regarding this topic?
My guess would be that it is a combination of:
reducing the complexity of your application
writing efficient code that is executed quickly (presumably because processing time = power consumed)
There's actually one much bigger way to reduce power consumption that hasn't been touched on.
Let's take a computer and divide all functions into two basic groups. Those implemented in hardware and those implemented in software.
If a function is implemented in hardware (that is, there is circuitry where you can put the inputs on one set of wires and the outputs come out another set of wires) then the power consumption is roughly the switching power of the gates involved. The clock ticks once (draining a little power) and the bus goes hot with the output (draining a little power).
If a function is implemented in software (that is, there is no single circuit which implements the function) then it requires the use of multiple circuits, multiple clock cycles, and oftentimes lots of memory accesses. Keep in mind that processor registers and SRAM are built from latching circuits (flip-flops and cross-coupled inverter cells) that continue to draw power as long as they are powered up.
As a simple example, let's look at the H.264 decoder. H.264 is a video coding standard used by QuickTime videos. It's also used in MPEG files, many AVIs, and by Skype. Because it's so common, someone sat down and designed a dedicated hardware chip: you feed the encoded file in one end and the red, green, and blue video channels come out the other end.
Before this chip existed (and before Flash 10.1) you had to decode this in software. Decoding it involves lots of sines and cosines. Sine and cosine are transcendental functions (that is, there is no way to express them exactly with the four basic math operations, only as an infinite series). This means the best you could do was run a loop 32-64 times, getting gradually more accurate, with each iteration of the loop adding, multiplying, and dividing. Each iteration of the loop also moves values in and out of registers (which, as you recall, uses power).
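For illustration, such an iterative evaluation might look like the following Taylor-series sketch (my own function, with no range reduction, so it assumes |x| is modest); each loop iteration performs exactly the multiplies and divides described above.

```c
/* Approximate sin(x) by summing its Taylor series term by term.
 * Each iteration multiplies and divides to produce the next term
 * x^(2k+1)/(2k+1)! from the previous one, then accumulates it. */
double sine_series(double x, int terms)
{
    double term = x;  /* current series term */
    double sum = x;
    for (int k = 1; k < terms; k++) {
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
}
```

A hardware sine unit collapses all of those iterations (and their register traffic) into one fixed circuit, which is the power argument being made here.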
Flash used to decode video mathematically, in software. Now it just says "pass the video to the H.264 chip". Of course, it also has to check for the existence of this chip and fall back to software if it doesn't exist. This means Flash, as a whole, is now larger. But on any system (like HTC phones) with an H.264 chip, it now uses less power.
Apply this same logic for:
Multiplying (adding multiple times in software)
Modulus (repeated subtraction or a division in software)
Comparing (subtracting and checking if negative in software)
Drawing (sines/cosines/nastiness in software. Easy to pass to a videocard)
Seeing as this is probably aimed at embedded devices, I would venture to say that the best way to save power is to not be on, and to minimize how long the device is on. This means putting the processor to sleep and waking it up only when work needs to be done. The best way I can think of to do this would be to make the application entirely interrupt-driven.
In addition to Kevin's suggestion, I would think that minimizing Internet communications would help. This would include fetching data in bulk so more time can be spent asleep.
Also keep in mind that accessing devices like drives and wifi increases power consumption. Try to minimize access to such devices.