Orange logistic regression formula in OLTP

Once the logistic regression function is known, how does one actually apply it in an online system to a new respondent, say someone asking for a loan? Do we call Python and Orange and pass a dataset with a single record to the API, or do we take the function and embed it in, say, Java? That is not clear to me.
Thanks.

You can either call Python and Orange, or take the learned LR model parameters (the coefficients) and implement the LR classifier yourself in Java.
So the answer is: either approach works.
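If you take the second route, scoring a new applicant is just the logistic function applied to a weighted sum of the features. Below is a minimal sketch in Python (it ports to Java line for line). The feature names and coefficient values are hypothetical placeholders, not output from Orange; it assumes you have exported the intercept and per-feature coefficients from your trained model, and that the new record is encoded exactly as during training (same scaling, same dummy variables).

```python
import math

# Hypothetical exported model: placeholder values, NOT produced by Orange.
# "intercept" is the bias term b0; every other key is a feature coefficient.
MODEL = {
    "intercept": -1.5,
    "income_k": 0.03,     # hypothetical feature: income in thousands
    "has_default": -2.0,  # hypothetical binary feature (0 or 1)
}


def predict_proba(record, model=MODEL):
    """P(class = 1 | record) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = model["intercept"]
    for name, coef in model.items():
        if name != "intercept":
            z += coef * record[name]
    return 1.0 / (1.0 + math.exp(-z))


def classify(record, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return int(predict_proba(record) >= threshold)


if __name__ == "__main__":
    applicant = {"income_k": 50.0, "has_default": 0.0}
    print(predict_proba(applicant), classify(applicant))
```

Embedding the coefficients like this avoids a Python round-trip per request, at the cost of having to re-export them whenever the model is retrained.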

Related

Procedural music generation?

Does anyone have good book or article recommendations for procedural generation of background music? (No vocals, just instruments.)
I'm not interested in:
how to generate the sound of a particular note on a particular instrument.
I'm interested in:
how to generate the melody/score for the music.
Thanks!
EDIT:
Thanks for the reference to Brian Eno. I'm definitely looking for the ambient, "user can ignore it" type of music. Think of the background music of a game: it is there to set a basic mood, but the focus is the game.
Some time ago I ran into ChucK, a programming language for generating music, sound and audio:
ChucK presents a new time-based, concurrent programming model that's highly precise and expressive (we call this strongly-timed), as well as dynamic control rates, and the ability to add and modify code on-the-fly. In addition, ChucK supports MIDI, OSC, HID device, and multi-channel audio. It's fun and easy to learn, and offers composers, researchers, and performers a powerful programming tool for building and experimenting with complex audio synthesis/analysis programs, and real-time interactive control.
I believe the end result can be converted into MIDI, which can then be converted into a score or sheet notation.
I don't know if this is what you're looking for. Hope this helps!
EDIT
After thinking about this a little longer, I think what you can possibly do (and this sounds a bit crazy) is write code that generates ChucK code. So define a set of rules for your music/score generation and then use that to create valid ChucK code. After you run the ChucK code, you can get a MIDI file which you can then convert into score/sheet-music.
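As a rough illustration of "rules in, ChucK code out", here is a minimal sketch in Python. The rule set (a bounded random walk over a C major pentatonic scale) and the function names are hypothetical choices for the example, not something from ChucK itself. The emitted text uses standard ChucK idioms (`SinOsc s => dac;`, `Std.mtof`, `::ms => now`) and could be saved as, e.g., `melody.ck` and run with `chuck melody.ck`. It only plays audio; getting a MIDI file out, as suggested above, would need extra work.

```python
import random

# Hypothetical rule set: C major pentatonic scale as MIDI note numbers.
PENTATONIC_C = [60, 62, 64, 67, 69, 72, 74, 76]


def generate_melody(length, max_step=2, seed=0):
    """Bounded random walk over the scale: move at most `max_step` degrees per note."""
    rng = random.Random(seed)
    idx = len(PENTATONIC_C) // 2
    notes = []
    for _ in range(length):
        notes.append(PENTATONIC_C[idx])
        step = rng.randint(-max_step, max_step)
        # Clamp so the walk never leaves the scale table.
        idx = min(max(idx + step, 0), len(PENTATONIC_C) - 1)
    return notes


def to_chuck(notes, note_ms=300):
    """Emit ChucK source that plays the notes one after another on a sine oscillator."""
    lines = ["SinOsc s => dac;", "0.3 => s.gain;"]
    for n in notes:
        lines.append(f"Std.mtof({n}) => s.freq; {note_ms}::ms => now;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_chuck(generate_melody(16)))
```

Swapping the random walk for Markov chains, grammars or other composition rules only changes `generate_melody`; the code-emitting half stays the same.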
The book "Computer Models of Musical Creativity" by David Cope should help you along with the theoretical side of computer-assisted composition, though you might want some music theory under your belt before you dive in.
If you are interested in procedural music check out the Condition30 site -- condition30.com
This music is all procedural.
If you're interested in an implementation of procedural music based on cellular automata in C#, you could grab the source code from http://proceduralmidi.codeplex.com/. A binary is also available.
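To give a feel for the cellular-automaton approach, here is a minimal Python sketch: a one-dimensional elementary CA evolves a row of cells, and each live cell in a generation is mapped to one note of a scale, so every generation becomes a chord. The cell-to-pitch mapping and the default rule 90 are hypothetical choices for illustration; this is not how the proceduralmidi project maps its automaton to music.

```python
# Hypothetical mapping: one cell per note of a C major scale (MIDI numbers).
SCALE = [60, 62, 64, 65, 67, 69, 71, 72]


def step(cells, rule):
    """Apply a Wolfram elementary CA rule (0-255) with wrap-around edges."""
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        pattern = (left << 2) | (centre << 1) | right  # neighbourhood as 0..7
        out.append((rule >> pattern) & 1)
    return out


def ca_score(generations, rule=90):
    """Return one chord (list of MIDI notes) per CA generation."""
    cells = [0] * len(SCALE)
    cells[len(SCALE) // 2] = 1  # single live seed cell in the middle
    score = []
    for _ in range(generations):
        score.append([SCALE[i] for i, alive in enumerate(cells) if alive])
        cells = step(cells, rule)
    return score


if __name__ == "__main__":
    for chord in ca_score(8):
        print(chord)
```

Different rules give very different textures: rule 90 produces symmetric, repeating patterns, while chaotic rules such as 30 sound far less predictable.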

Training Hidden Markov Models without Tagged Corpus Data

For a linguistics course we implemented part-of-speech (POS) tagging using a hidden Markov model, where the hidden variables were the parts of speech. We trained the system on some tagged data, then tested it and compared our results with the gold data.
Would it have been possible to train the HMM without the tagged training set?
In theory you can do that. In that case you would use the Baum-Welch algorithm, which is described very well in Rabiner's HMM tutorial.
However, having applied HMMs to part-of-speech tagging, I found the error you get with the standard form not very satisfying. Baum-Welch is a form of expectation maximization, which only converges to a local maximum. Rule-based approaches beat HMMs hands down, iirc.
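For reference, here is a compact single-sequence sketch of Baum-Welch in plain Python, following the scaled forward-backward formulation described in Rabiner's tutorial. Observations are integer symbol ids (for POS tagging, word indices); the random initialisation, iteration count and function names are assumptions of this sketch, and it is not tuned for speed. Note that the learned states are anonymous: after training you would still have to decide which hidden state corresponds to which tag.

```python
import math
import random


def _normalise(row):
    total = sum(row)
    return [v / total for v in row]


def forward_backward(obs, A, B, pi):
    """Scaled forward-backward pass.

    Returns alpha_hat, beta_hat and the scale factors c_t. With this scaling,
    gamma_t(i) = alpha_hat[t][i] * beta_hat[t][i] and log P(obs) = sum(log c_t).
    """
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    c = [0.0] * T
    for t in range(T):
        for j in range(N):
            prev = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(N))
            alpha[t][j] = prev * B[j][obs[t]]
        c[t] = sum(alpha[t])
        alpha[t] = [a / c[t] for a in alpha[t]]
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N)) / c[t + 1]
    return alpha, beta, c


def baum_welch(obs, n_states, n_symbols, iters=20, seed=0):
    """Re-estimate (A, B, pi) from one unlabeled symbol sequence.

    Also returns the log-likelihood before each update; EM guarantees it never
    decreases, but it may stop at a local maximum.
    """
    rng = random.Random(seed)

    def rand_row(n):
        return _normalise([rng.random() + 0.1 for _ in range(n)])

    A = [rand_row(n_states) for _ in range(n_states)]
    B = [rand_row(n_symbols) for _ in range(n_states)]
    pi = rand_row(n_states)
    T, N = len(obs), n_states
    history = []
    for _ in range(iters):
        alpha, beta, c = forward_backward(obs, A, B, pi)
        history.append(sum(math.log(x) for x in c))
        gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
        # Expected number of i -> j transitions, summed over time.
        xi = [[0.0] * N for _ in range(N)]
        for t in range(T - 1):
            for i in range(N):
                for j in range(N):
                    xi[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                 * beta[t + 1][j] / c[t + 1])
        pi = gamma[0][:]
        A = [_normalise(xi[i]) for i in range(N)]
        B = [_normalise([sum(gamma[t][i] for t in range(T) if obs[t] == k)
                         for k in range(n_symbols)]) for i in range(N)]
    return A, B, pi, history
```

Running it from several random seeds and keeping the best final log-likelihood is a common way to soften the local-maximum problem, though it does not remove it.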
I believe the Natural Language Toolkit (NLTK) for Python has an HMM implementation for that exact purpose.
It has been a couple of years since I did NLP, but I believe that without tagging, the HMM could help determine the symbol emission/state transition probabilities of n-grams (i.e. the odds of "world" occurring after "hello"), but not the parts of speech. It needs the tagged corpus to learn how the POS tags interrelate.
If I'm way off on this let me know in the comments!

Duplicate image detection algorithms?

I am thinking about creating a database system for images where they are stored with compact signatures and then matched against a "query image" that could be a resized, cropped, brightened, rotated or a flipped version of the stored one. Note that I am not talking about image similarity algorithms but rather strictly about duplicate detection. This would make things a lot simpler. The system wouldn't care if two images have an elephant on them, it would only be important to detect if the two images are in fact the same image.
Histogram comparisons simply won't work for cropped query images. The only viable way to go I see is shape/edge detection. Images would first be somehow discretized, every pixel being converted to an 8-level grayscale for example. The discretized image will contain vast regions in the same colour which would help indicate shapes. These shapes then could be described with coefficients and their relative position could be remembered. Compact signatures would be produced out of that. This process will be carried out over each image being stored and over each query image when a comparison has to be performed. Does that sound like an efficient and realisable algorithm?
I know this is an immature research area. I have read Wikipedia on the subject, and I would ask you to propose your ideas about such an algorithm.
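The discretisation step proposed in the question can be sketched in plain Python: quantise 0-255 grayscale pixels into 8 bands, then flood-fill 4-connected regions of equal band and record each region's band, area and centroid (relative to the image size). This is a hypothetical illustration of the idea, not a tested duplicate detector; the grayscale input as a list of rows and the (level, area, y, x) "signature" tuple are assumptions of the sketch, and turning the region list into a compact, crop-robust signature is the hard part left open.

```python
from collections import deque


def quantize(gray, levels=8):
    """Map 0-255 grayscale pixels onto `levels` discrete bands 0..levels-1."""
    return [[min(p * levels // 256, levels - 1) for p in row] for row in gray]


def regions(q, min_area=1):
    """Find 4-connected regions of equal level.

    Returns (level, area, centroid_y, centroid_x) per region, with the
    centroid expressed as a fraction of the image height/width.
    """
    h, w = len(q), len(q[0])
    seen = [[False] * w for _ in range(h)]
    out = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            level, area, sum_y, sum_x = q[y][x], 0, 0, 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                area += 1
                sum_y += cy
                sum_x += cx
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and q[ny][nx] == level:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                out.append((level, area, sum_y / area / h, sum_x / area / w))
    return out
```

A `min_area` well above 1 would discard the many tiny regions that noise and JPEG artefacts produce.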
SURF should do the job.
http://en.wikipedia.org/wiki/SURF
It is fast and robust, invariant to rotation and scaling, and also, though less strongly, to blur and contrast/lighting changes.
There is an example of automatic panorama stitching.
Check the article on SIFT first:
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
If you want a feature-detection-driven model, you could perhaps take the singular value decomposition of the images (you'd probably have to do an SVD for each color channel) and use the first few columns of the U and V matrices, along with the corresponding singular values, to judge how similar the images are.
Very similar to the SVD method is principal component analysis, which I think will be easier to use for comparing images. The PCA method is pretty close to just taking the SVD and getting rid of the singular values by factoring them into the U and V matrices. If you follow the PCA path, you might also want to look into correspondence analysis. By the way, PCA was a common method for extracting features in the Netflix Prize.
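A minimal NumPy sketch of the SVD idea, simplified to use only the leading singular values (not the U and V columns the answer also mentions) of a single grayscale channel. Singular values are unchanged by flips and 90-degree rotations, and normalising them removes uniform contrast scaling; they are not robust to cropping or arbitrary rotations, so treat this as an illustration rather than a duplicate detector. The function names and the choice of k are assumptions; k must not exceed the smaller image dimension.

```python
import numpy as np


def sv_signature(img, k=8):
    """Top-k singular values of a 2-D grayscale image, scaled to unit length."""
    s = np.linalg.svd(np.asarray(img, dtype=float), compute_uv=False)[:k]
    return s / np.linalg.norm(s)


def sv_distance(a, b, k=8):
    """Euclidean distance between the normalised singular-value signatures."""
    return float(np.linalg.norm(sv_signature(a, k) - sv_signature(b, k)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 24))
    print(sv_distance(img, np.fliplr(img)))        # essentially zero
    print(sv_distance(img, rng.random((32, 24))))  # noticeably larger
```

For colour images you would compute one signature per channel and concatenate them, as the answer suggests.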
How about converting this Python code back to C?
Check out tineye.com. They have a good system that's always improving. I'm sure you can find research papers from them on the subject.
The Wikipedia article you might be referring to is the one on feature detection.
If you are running on an Intel/AMD processor, you could use the Intel Integrated Performance Primitives to get access to a library of image processing functions. Beyond that, there is the OpenCV project, another library of image processing functions. The advantage of using a library is that you can try various already-implemented algorithms to see what works for your situation.

How to automatically excerpt user generated content?

I run a website that allows users to write blog posts. I would really like to summarize the written content and use it to fill, for example, the <meta name="description".../> tag.
What methods can I employ to automatically summarize/describe the contents of user generated content?
Are there any (preferably free) methods out there that have solved this problem?
(I've seen other websites just copy the first 100 or so words but this strikes me as a sub-optimal solution.)
Think of the task of summarization as a challenge to 'select the most important sentences' from the document.
The Automatic Creation of Literature Abstracts by H.P. Luhn (1958) describes a naive method that actually performs quite well. Try giving it a shot.
If your website is written in Python, coding this algorithm with NLTK (the Natural Language Toolkit) is a fun task.
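A simplified, stdlib-only Python sketch of Luhn-style sentence scoring: the most frequent non-stop words count as "significant", significant words separated by at most `max_gap` insignificant words form a cluster, a cluster scores (significant words)² / span, and each sentence takes its best cluster score. Choosing the top-N frequent words (instead of Luhn's upper and lower frequency cut-offs), the tiny stop-word list, the naive sentence splitter and the gap of 4 are all assumptions of this sketch; NLTK would give you better tokenisation and stop words.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); NLTK ships a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "this", "for", "on", "with", "as", "was", "are", "be"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def cluster_score(flags, max_gap):
    """Best (significant words)^2 / span over clusters of significant words."""
    best, start, last, count = 0.0, None, None, 0
    for i, significant in enumerate(flags):
        if not significant:
            continue
        if start is not None and i - last - 1 > max_gap:
            best = max(best, count ** 2 / (last - start + 1))  # close cluster
            start = None
        if start is None:
            start, count = i, 0
        count += 1
        last = i
    if start is not None:
        best = max(best, count ** 2 / (last - start + 1))
    return best


def luhn_summary(text, n_sentences=2, top_words=10, max_gap=4):
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokens(s) if w not in STOP_WORDS)
    significant = {w for w, _ in freq.most_common(top_words)}
    scores = [cluster_score([w in significant for w in tokens(s)], max_gap)
              for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Because whole sentences are selected, the result is always readable text, which suits a meta description better than an arbitrary word cut.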
Make it predictable.
From a user's perspective, simply using the first paragraph is not bad at all.
Using any automation is bound to fall flat in some cases, so I suggest displaying the first paragraph (maybe truncated at some point) as a summary and offering the ability to override it through an optional field.
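A minimal Python sketch of that predictable approach: use the author's optional override if it is filled in, otherwise the first paragraph, truncated at a word boundary and HTML-escaped for the attribute. The blank-line paragraph separator and the 155-character limit are assumptions (a common rule of thumb, not a standard); adjust both to your blog format.

```python
import html


def meta_description(body, override="", max_chars=155):
    """Text for <meta name="description">: the author's override if present,
    otherwise the first paragraph, cut at a word boundary and HTML-escaped."""
    text = " ".join(override.split())
    if not text:
        paragraphs = [p for p in body.split("\n\n") if p.strip()]
        text = " ".join(paragraphs[0].split()) if paragraphs else ""
    if len(text) > max_chars:
        # Leave room for the ellipsis and avoid cutting a word in half.
        text = text[: max_chars - 3].rsplit(" ", 1)[0].rstrip(",;:") + "..."
    return html.escape(text, quote=True)
```

Escaping matters here because user-written text ends up inside an HTML attribute.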
I might try using Amazon Mechanical Turk or any number of other crowdsourcing options.
Another item to check out: AutoSummary Semantic Analysis Engine, a SourceForge project.
Not a trivial task... You should look for articles or books on "extractive summarization"
A few starters could be:
Books:
Natural Language Processing with Python
Foundations of Statistical Natural Language Processing
Articles:
Language independent extractive summarization
Extractive summarization: how to identify the gist of a text
Extractive Summarization using Inter- and Intra- Event Relevance
Yahoo has a free API for this:
http://developer.yahoo.com/search/content/V1/termExtraction.html
Apple's patent 6424362, "Auto-summary of document content", contains sample code which might be useful.
This borders on artificial intelligence so there's not going to be an "easy" solution out there, but there are products that target this problem.
Check out Copernic Summarizer, for one.
Noun phrases typically tend to be important elements of a sentence. Picking sentence(s) with a high density of noun phrases could yield a good summary. You could get noun phrases using a POS tagger.
For a good summary, it is desirable to use complete, meaningful sentences; reading a broken sentence is slightly jarring.
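A rough sketch of the noun-phrase-density idea, assuming each sentence has already been POS-tagged into (word, tag) pairs with Penn Treebank tags (for example by `nltk.pos_tag`). The chunk pattern (optional determiner, any adjectives, one or more nouns) and the "noun phrases per token" density are simplifying assumptions; a real chunker would be more thorough. Selecting whole sentences keeps the summary readable.

```python
def noun_phrase_count(tagged):
    """Count maximal runs matching roughly DT? JJ* NN+ (Penn Treebank tags)."""
    count, i, n = 0, 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # NN, NNS, NNP, NNPS
            k += 1
        if k > j:  # at least one noun, so a noun phrase ends at k
            count += 1
            i = k
        else:
            i += 1
    return count


def np_density(tagged):
    """Noun phrases per token; 0.0 for an empty sentence."""
    return noun_phrase_count(tagged) / len(tagged) if tagged else 0.0


def best_sentence(tagged_sentences):
    """Pick the tagged sentence with the highest noun-phrase density."""
    return max(tagged_sentences, key=np_density)
```

To pick several sentences, sort by `np_density` instead of taking the maximum, then restore the original order.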
Alternatively, when the author posts the article, they can highlight the keywords to be used in the description, which can then be automatically put in the meta description tag.

Fuzzy logic membership function in C

I'm trying to implement a fuzzy logic membership function in C for a hobby robotics project but I'm not quite sure how to start.
I have inputs about objects near a point, such as distance or which directions are clear/obstructed, and I want to map how strongly these inputs belong to sets like very near, near, far, very far. Does anyone have a tip on how to start? Thanks.
Disclaimer: I've never implemented a fuzzy controller (I've only ever used PI or PID in real-life) and control class was 10 years ago.
Here's a presentation demonstrating moving towards a target, using distance and angle as inputs and power as the output: FuzzyTech's example, positioning a crane.
This just presents the topic and theory, i.e. no code.
The best source is probably one of the robotics groups, e.g. the Seattle Robotics Society fuzzy logic tutorial. It is technical... and long.
If you can access technical journals, search Google Scholar for "fuzzy logic" "path planning" robotics.
If you're looking for ideas on how to implement fuzzy logic, perhaps an application note from one of the microchip manufacturers will get you started, e.g. Microchip's papers on airflow control or servo control. I know it's not Arduino, but Microchip's papers are usually very clearly presented.
And finally, an example in C++, though it's probably more complex than you're looking for: Free fuzzy logic library
Good luck.
I'm not an expert in fuzzy logic, but according to my basic understanding, you could start by deciding what distances constitute near (say 10 cm) and far (say 1 m), then fill in the range in between with degrees of membership between 0 and 1 (so 55 cm might be 50% near and 50% far). Then you do something similar for your other properties and combine the membership degrees of the different properties with fuzzy operators (commonly min for AND and max for OR).
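Here is a minimal sketch of that fuzzification step using trapezoidal membership functions, written in Python but using no library calls, so it translates line for line into C (for example, one struct of four floats per set). The four sets follow the question (very near, near, far, very far); the breakpoints in centimetres are hypothetical, chosen so that near peaks at 10 cm and far at 100 cm, which reproduces the "55 cm is 50% near, 50% far" example.

```python
INF = float("inf")

# Hypothetical breakpoints in cm: (a, b, c, d) means 0 below a, rising to 1 at b,
# staying 1 until c, falling back to 0 at d. Infinite ends act as "shoulders".
SETS = {
    "very near": (-INF, -INF, 5.0, 10.0),
    "near": (5.0, 10.0, 10.0, 100.0),
    "far": (10.0, 100.0, 100.0, 200.0),
    "very far": (100.0, 200.0, INF, INF),
}


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership degree of x in [0, 1]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


def fuzzify(distance_cm):
    """Degree of membership of a distance in every set."""
    return {name: trapezoid(distance_cm, *abcd) for name, abcd in SETS.items()}


if __name__ == "__main__":
    print(fuzzify(55.0))  # near and far are both 0.5 here
```

Triangular sets are the special case b == c; the check order in `trapezoid` means vertical edges (a == b or c == d) never divide by zero.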
Do you have a good reference for designing fuzzy controls?
I suppose you could start here. I think they at least describe simple fuzzification and defuzzification routines.
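To show how fuzzification, rule evaluation and defuzzification fit together, here is a small self-contained Python sketch (again easy to port to C). It deliberately uses the simpler zero-order Sugeno style, where each rule outputs a crisp value and the result is the weighted average, rather than Mamdani centroid defuzzification. The inputs (distance in cm and a 0..1 "path clear" degree), the rules and the output speeds are hypothetical.

```python
def ramp_down(x, lo, hi):
    """Membership 1 at or below lo, 0 at or above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x, lo, hi):
    return 1.0 - ramp_down(x, lo, hi)


def speed_command(distance_cm, clear):
    """Fuzzify, evaluate rules (AND = min), defuzzify by weighted average.

    `clear` is a 0..1 degree saying how unobstructed the path ahead is.
    Returns a motor speed between 0.0 (stop) and 1.0 (full speed).
    """
    near = ramp_down(distance_cm, 10.0, 100.0)
    far = ramp_up(distance_cm, 10.0, 100.0)
    rules = [
        (near, 0.0),                   # IF near THEN stop
        (min(far, 1.0 - clear), 0.4),  # IF far AND obstructed THEN slow
        (min(far, clear), 1.0),        # IF far AND clear THEN full speed
    ]
    total = sum(weight for weight, _ in rules)
    return sum(weight * out for weight, out in rules) / total if total else 0.0
```

With a Mamdani controller you would instead clip each rule's output set and take the centroid of their union; the weighted average is cheaper, which helps on a small microcontroller.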
The guys at MakeProto have created an automatic code generator for fuzzy systems that outputs C code, either from MATLAB fuzzy systems or from a hand-defined fuzzy system.
It might be worth taking a look at:
http://makeproto.com/blog/?p=35
A fuzzy inference system can be implemented in both C and C++. See "Learn how to frame fuzzy logic in C".
