I am new to speech and speaker recognition. I understand how MFCCs work, but as far as I understand (and have found), the coefficients vary between different words. My question: are there any other feature extraction methods that are text-independent? If so, please point me to them.
Any hint will be very helpful.
Thanks in advance.
It's not clear what you mean by text-independent. An MFCC feature (like any other feature) is a vector of real coefficients. Similar audio frames (in terms of human perception) may give you similar coefficients, but they also may not. In speech recognition, background noise and individual voice characteristics can change the coefficients drastically. That's why classifiers such as GMMs or DNNs are used to determine speech units given a particular MFCC vector. If you're interested in other feature extraction algorithms, you can read about LPC and PLP features.
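To give a feel for what an LPC feature vector is, here is a minimal sketch using the autocorrelation method with the Levinson-Durbin recursion. The function names and the toy signal are illustrative assumptions, not taken from any particular library, and a real front end would also window and pre-emphasize each frame first.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one audio frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1.0 and the frame is modelled as
    frame[n] ~= -sum(a[j] * frame[n - j] for j in 1..order).
    Assumes the frame is not silent (r[0] > 0).
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= (1.0 - k * k)                # remaining prediction error
    return a, err

# Toy frame: a decaying exponential, which a first-order predictor fits well.
frame = [0.5 ** n for n in range(50)]
coefficients, error = lpc(frame, order=2)
```

The resulting coefficient vector plays the same role as an MFCC vector: one fixed-length feature per frame, which a classifier then has to interpret.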
Does anyone have a good book or article recommendation for procedural generation of background music? (No vocals, just instruments.)
I'm not interested in:
How do I generate the sound of a particular note on a particular instrument
I'm interested in:
How do I generate the melody / score for the music.
Thanks!
EDIT:
Thanks for the reference to Brian Eno. I'm definitely looking into the ambient, user-can-ignore type of music, i.e. the background music of a game. It's there to provide some basic mood, but the focus is the game.
Some time ago I ran into ChucK, which is a programming language for generating music/sound/audio:
ChucK presents a new time-based, concurrent programming model that's highly precise and expressive (we call this strongly-timed), as well as dynamic control rates, and the ability to add and modify code on-the-fly. In addition, ChucK supports MIDI, OSC, HID device, and multi-channel audio. It's fun and easy to learn, and offers composers, researchers, and performers a powerful programming tool for building and experimenting with complex audio synthesis/analysis programs, and real-time interactive control.
I believe the end result can be converted into MIDI, which can then be converted into a score or sheet notation.
I don't know if this is what you're looking for. Hope this helps!
EDIT
After thinking about this a little longer, I think what you can possibly do (and this sounds a bit crazy) is write code that generates ChucK code. So define a set of rules for your music/score generation and then use that to create valid ChucK code. After you run the ChucK code, you can get a MIDI file which you can then convert into score/sheet-music.
The book "Computer Models of Musical Creativity" by David Cope should help you along with the theoretical side of computer-assisted composition, though you might want some music theory under your belt before you dive in.
If you are interested in procedural music check out the Condition30 site -- condition30.com
This music is all procedural.
If you're interested in an implementation of procedural music based on cellular automata in C#, you could grab the source code from http://proceduralmidi.codeplex.com/. A binary is also available.
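As a rough illustration of the cellular-automaton idea (not how that project does it; the mapping below is an assumption made up for this sketch), each generation of an elementary automaton can be read as the set of notes sounding at one time step. The scale, the choice of rule 30, and the function names are all hypothetical.

```python
# MIDI note numbers for a C major pentatonic scale (illustrative choice).
PENTATONIC = [60, 62, 64, 67, 69, 72, 74, 76]

def step(cells, rule=30):
    """Advance a 1-D binary cellular automaton by one generation (edges wrap)."""
    n = len(cells)
    out = []
    for i in range(n):
        left, mid, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        pattern = (left << 2) | (mid << 1) | right   # neighbourhood as 0..7
        out.append((rule >> pattern) & 1)            # look up that bit of the rule
    return out

def melody(width=8, generations=16, rule=30):
    """Return one list of MIDI notes per generation; an empty list is a rest."""
    cells = [0] * width
    cells[width // 2] = 1                            # single live cell to start
    score = []
    for _ in range(generations):
        score.append([PENTATONIC[i % len(PENTATONIC)]
                      for i, alive in enumerate(cells) if alive])
        cells = step(cells, rule)
    return score

for notes in melody(generations=4):
    print(notes)
```

Restricting the output to a pentatonic scale is a cheap way to keep arbitrary note combinations from sounding harsh, which suits ignorable background music.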
For a linguistics course we implemented part-of-speech (POS) tagging using a hidden Markov model, where the hidden variables were the parts of speech. We trained the system on some tagged data, and then tested it and compared our results with the gold data.
Would it have been possible to train the HMM without the tagged training set?
In theory you can do that. In that case you would use the Baum-Welch algorithm. It is described very well in Rabiner's HMM tutorial.
However, having applied HMMs to part-of-speech tagging, I found that the error you get with the standard form is not very satisfying. Baum-Welch is a form of expectation maximization, which only converges to a local maximum. Rule-based approaches beat HMMs hands down, iirc.
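For a sense of what training without tags involves, here is a minimal sketch of Baum-Welch for a discrete HMM on a single observation sequence. The function names and toy numbers are illustrative assumptions; there is no scaling, so it only suits short sequences, and the states it learns are anonymous clusters rather than named parts of speech.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(obs[0..t], state at t is i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    return alpha

def backward(obs, A, B):
    """beta[t][i] = P(obs[t+1..] | state at t is i)."""
    n = len(A)
    beta = [[1.0] * n]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the likelihood under the old parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t, given obs.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t, given obs.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood

def train(obs, pi, A, B, iterations=10):
    """Repeat the EM update and record the likelihood before each update."""
    history = []
    for _ in range(iterations):
        pi, A, B, likelihood = baum_welch_step(obs, pi, A, B)
        history.append(likelihood)
    return pi, A, B, history

# Toy data: word ids 0..2, two hidden states, arbitrary starting guesses.
obs = [0, 1, 0, 2, 2, 1, 0, 0, 2, 1]
pi, A, B, history = train(obs, [0.6, 0.4],
                          [[0.7, 0.3], [0.4, 0.6]],
                          [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
```

The likelihood in `history` never decreases from one iteration to the next, but where it ends up depends on the starting guesses, which is the local-maximum problem mentioned above.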
I believe the Natural Language Toolkit (NLTK) for Python has an HMM implementation for that exact purpose.
I took NLP a couple of years ago, but I believe that without tagging the HMM could help determine the symbol emission/state transition probabilities of n-grams (i.e. what are the odds of "world" occurring after "hello"), but not parts of speech. It needs the tagged corpus to learn how the POS interrelate.
If I'm way off on this let me know in the comments!
I am working on a school project on face detection, based on the technique described by Viola and Jones (2001/2004).
I've read that OpenCV has an implementation of this algorithm and that it works very well.
I was wondering if you have any advice on which pre-processing techniques to apply to the images before testing for the existence of a face (e.g. histogram equalization)?
I basically used the code from this sample program from the OpenCV page, and it worked very well for my master's thesis project. If you get bad results or your lighting is strange, you can try histogram equalization.
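For reference, this is roughly what histogram equalization does, written as a plain-Python sketch on a grayscale image stored as rows of integers. The function name and the tiny test image are illustrative; in practice you would call OpenCV's own `cv2.equalizeHist` on an 8-bit grayscale image rather than roll your own.

```python
def equalize(image, levels=256):
    """Histogram-equalize a grayscale image given as rows of ints in [0, levels)."""
    hist = [0] * levels
    for row in image:
        for pixel in row:
            hist[pixel] += 1

    # Cumulative histogram: how many pixels are at or below each gray level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in image]     # one gray level only: nothing to stretch

    # Map each gray level so the cumulative histogram becomes roughly linear.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[pixel] for pixel in row] for row in image]

dark = [[50, 100], [150, 200]]
print(equalize(dark))   # [[0, 85], [170, 255]]
```

The effect is to spread the occupied gray levels over the full range, which is why it tends to help when the lighting is poor.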
A friend and I did something similar for a university project, and especially on low-resolution video sequences it really helped to upsample the frame, doubling its size. It was the idea of my friend, who had previously taken an image processing class. Although nominally equivalent, decreasing the initial scan window size and the horizontal and vertical steps didn't produce the same result. In other words, it may be better to work on larger images with larger scan windows than on smaller images with smaller scan windows. I don't know exactly why.
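A minimal sketch of doubling a frame's size, assuming nearest-neighbour interpolation (the answer does not say which interpolation was used, and a smoother method such as bilinear may well behave differently). The function name is hypothetical.

```python
def upsample2x(image):
    """Double width and height by repeating every pixel (nearest neighbour)."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]  # repeat horizontally
        out.append(wide)
        out.append(wide[:])                                # repeat vertically
    return out

print(upsample2x([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```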
Bye ;-)
I know it's too late, but do go through this site as well.
It covers the common pre-processing required for the images: equalising the image, editing out irrelevant content, etc.
I am thinking about creating a database system for images where they are stored with compact signatures and then matched against a "query image" that could be a resized, cropped, brightened, rotated or a flipped version of the stored one. Note that I am not talking about image similarity algorithms but rather strictly about duplicate detection. This would make things a lot simpler. The system wouldn't care if two images have an elephant on them, it would only be important to detect if the two images are in fact the same image.
Histogram comparisons simply won't work for cropped query images. The only viable way to go I see is shape/edge detection. Images would first be somehow discretized, every pixel being converted to an 8-level grayscale for example. The discretized image will contain vast regions in the same colour which would help indicate shapes. These shapes then could be described with coefficients and their relative position could be remembered. Compact signatures would be produced out of that. This process will be carried out over each image being stored and over each query image when a comparison has to be performed. Does that sound like an efficient and realisable algorithm? To illustrate this idea:
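The discretization step described above could be sketched like this: quantize to 8 gray levels, then find the connected regions of equal level. The function names and the choice of 4-connectivity are assumptions for illustration; turning the regions into a compact, crop- and rotation-tolerant signature is the hard part and is not attempted here.

```python
def quantize(image, levels=8, max_value=255):
    """Reduce a grayscale image (rows of ints in 0..max_value) to `levels` levels."""
    bucket = (max_value + 1) // levels
    return [[min(pixel // bucket, levels - 1) for pixel in row] for row in image]

def regions(quantized):
    """Find 4-connected regions of equal level; return a list of (level, size)."""
    height, width = len(quantized), len(quantized[0])
    seen = [[False] * width for _ in range(height)]
    found = []
    for y in range(height):
        for x in range(width):
            if seen[y][x]:
                continue
            level, size, stack = quantized[y][x], 0, [(y, x)]
            seen[y][x] = True
            while stack:                       # iterative flood fill
                cy, cx = stack.pop()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and quantized[ny][nx] == level):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            found.append((level, size))
    return found

image = [[0, 10, 200],
         [5, 220, 210]]
print(regions(quantize(image)))   # [(0, 3), (6, 3)]
```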
removed dead ImageShack link
I know this is an immature research area. I have read Wikipedia on the subject, and I would ask you to propose your ideas about such an algorithm.
SURF should do its job.
http://en.wikipedia.org/wiki/SURF
It is fast and robust. It is invariant to rotation and scaling, and also to blur and contrast/lighting changes (though less strongly).
There is an example of automatic panorama stitching.
Check the article on SIFT first:
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
If you want to do a feature detection driven model, you could perhaps take the singular value decomposition of the images (you'd probably have to do a SVD for each color) and use the first few columns of the U and V matrices along with the corresponding singular values to judge how similar the images are.
Very similar to the SVD method is one called principal component analysis, which I think will be easier to use for comparing images. The PCA method is pretty close to just taking the SVD and getting rid of the singular values by factoring them into the U and V matrices. If you follow the PCA path, you might also want to look into correspondence analysis. By the way, PCA was a common method used in the Netflix Prize for extracting features.
How about converting this Python code back to C?
Check out tineye.com. They have a good system that's always improving. I'm sure you can find research papers from them on the subject.
The Wikipedia article you might be referring to is the one on feature detection.
If you are running on an Intel/AMD processor, you could use the Intel Integrated Performance Primitives to get access to a library of image processing functions. Beyond that, there is the OpenCV project, another library of image processing functions. The advantage of using a library is that you can try various algorithms, already implemented, to see what will work for your situation.
I'm trying to implement a fuzzy logic membership function in C for a hobby robotics project but I'm not quite sure how to start.
I have inputs about objects near a point, such as distance or which directions are clear/obstructed, and I want to map how strongly these inputs belong to sets like very near, near, far, very far. Does anyone have a tip on how to start? Thanks.
Disclaimer: I've never implemented a fuzzy controller (I've only ever used PI or PID in real life) and my control class was 10 years ago.
Here's a presentation demonstrating moving towards a target using distance and angle as inputs and power as the output: FuzzyTech's example of positioning a crane.
This just presents the topic and theory i.e. no code.
The best source is probably one of the robotics groups,
e.g. the Seattle Robotics Society fuzzy logic tutorial; it is technical ... and long.
If you can access technical journals, then search Google Scholar for "fuzzy logic" "path planning" robotics.
If you're looking for some ideas on how to implement fuzzy logic, then perhaps an application note from one of the microchip manufacturers will get you started, e.g. Microchip's paper on airflow control or servo control. I know it's not Arduino, but Microchip's papers are usually very clearly presented.
And finally, an example in C++; it's probably more complex than you're looking for: Free fuzzy logic library.
Good luck.
I'm no expert in fuzzy logic, but according to my basic understanding, you could start by deciding what distances would constitute near (say 10 cm) and far (say 1 m), then you use probabilities to fill in the range in between (so 55 cm might be 50% near, 50% far). Then you do something similar for your other properties, and combine the probabilities associated with each property with more probabilities.
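A minimal sketch of that idea, using the 10 cm and 1 m figures from the answer and a straight-line ramp in between. Fuzzy logic usually calls these values degrees of membership rather than probabilities, and min/max are the classic choices for combining them. It is written in Python for brevity, but it is plain arithmetic that ports line for line to C; all function names are made up for this example.

```python
def near(distance_cm, near_cm=10.0, far_cm=100.0):
    """Degree (0..1) to which a distance counts as 'near'."""
    if distance_cm <= near_cm:
        return 1.0
    if distance_cm >= far_cm:
        return 0.0
    return (far_cm - distance_cm) / (far_cm - near_cm)   # linear ramp down

def far(distance_cm, near_cm=10.0, far_cm=100.0):
    """Degree (0..1) to which a distance counts as 'far'."""
    return 1.0 - near(distance_cm, near_cm, far_cm)

def fuzzy_and(a, b):
    """Classic fuzzy AND: the weaker of the two memberships."""
    return min(a, b)

def fuzzy_or(a, b):
    """Classic fuzzy OR: the stronger of the two memberships."""
    return max(a, b)

print(near(55), far(55))   # 0.5 0.5

# Example rule: "slow down if the obstacle is near AND the path ahead is blocked",
# where `blocked` is a hypothetical membership value from another sensor.
blocked = 0.8
slow_down = fuzzy_and(near(55), blocked)
```

Sets such as "very near" and "very far" can be added the same way, each with its own pair of breakpoints, so that every input distance gets one membership value per set.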
Do you have a good reference for designing fuzzy controls?
I suppose you could start here. I think they at least describe simple fuzzification and defuzzification routines.
The guys at MakeProto have created an automatic code generator for fuzzy systems that outputs C code from Matlab fuzzy systems, or from a hand-defined fuzzy system.
Might be worth taking a look at.
http://makeproto.com/blog/?p=35
A fuzzy inference system can be implemented in both C and C++. Learn how to frame fuzzy logic in C.