How to create my own recommendation engine? [closed]

How to create my own recommendation engine? [closed] - database

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I am interested in recommendation engines these days and I want to improve myself in this area. I am currently reading "Programming Collective Intelligence" I think this is the best book about this subject, from O'Reilly. But I don't have any ideas how to implement engine; What I mean by "no idea" is "don't know how to start". I have a project like Last.fm in my mind.
Where do (should be implemented on database side or backend side) I start creating
recommendation engine?
What level of database knowledge will be needed?
Is there any open source ones that can be used for help or any resource?
What should be the first steps that I have to do?

Presenting recommendations can be split up in to two main sections:
Feature extraction
Recommendation
Feature extraction is very specific to the object being recommended. For music, for example, some features of the object might be the frequency response of the song, the power, the genre, etc. The features for the users might be age, location, etc. You then create a vector for each user and song with the various elements of the vector corresponding to different features of interest.
Performing the actual recommendation only requires well thought out feature vectors. Note that if you don't choose the right features your recommendation engine will fail. This would be like asking you to tell me my sex based on my age. Of course my age may provide a bit of information, but I think you could imagine better questions to ask. Anyways, once you have your feature vectors for each user and song, you will need to train the recommendation engine. I think the best way to do this would be to get a whole bunch of users to take your demographic test and then tell you specific songs that they like. At this point you have all the information you need. Your job is to draw a decision boundary with the information you have. Consider a simple example. You want to predict whether or not a user likes AC/DC's "Back in Black" based on age and sex. Imagine a graph showing 100 data points. The x axis is age, the y axis is sex (1 is male, 2 is female). A black mark indicates that the user likes the song while a red mark means they don't like the song. My guess is that this graph might have a lot of black marks corresponding to users that are male and between the ages of 12 and 37 while the rest of the marks will be red. So, if we were to manually select a decision boundary, it'd be a rectangle around this area holding the majority of the black marks. This is called the decision boundary because, if a completely new person comes to you and tells you their age and sex, you only have to plot them on the graph and ask whether or not they fall within that box.
So, the hard part here is finding the decision boundary. The good news is that you don't need to know how to do that. You just need to know how to use some of the common tools. You can look into using neural networks, support vector machines, linear classifiers, etc. Again, don't let the big names fool you. Most people can't tell you what these things are really doing. They just know how to plug things in and get results.
I know it's a bit late, but I hope this helps anyone that stumbles on this thread.

I've built up one for a video portal myself. The main idea that I had was about collecting data about everything:
Who uploaded a video?
Who commented on a video?
Which tags where created?
Who visited the video? (also tracking anonymous visitors)
Who favorited a video?
Who rated a video?
Which channels was the video assigned to?
Text streams of title, description, tags, channels and comments are collected by a fulltext indexer which puts weight on each of the data sources.
Next I created functions which return lists of (id,weight) tuples for each of the above points. Some only consider a limited amount of videos (eg last 50), some modify the weight by eg rating, tag count (more often tagged = less expressive). There are functions that return the following lists:
Similar videos by fulltext search
Videos uploaded by the same user
Other videos the users from these comments also commented on
Other videos the users from these favorites also favorited
Other videos the raters from these ratings also rated on (weighted)
Other videos in the same channels
Other videos with the same tags (weighted by "expressiveness" of tags)
Other videos played by people who played this video (XY latest plays)
Similar videos by comments fulltext
Similar videos by title fulltext
Similar videos by description fulltext
Similar videos by tags fulltext
All these will be combined into a single list by just summing up the weights by video ids, then sorted by weight. This works pretty well for around 1000 videos now. But you need to do background processing or extreme caching for this to be speedy.
I'm hoping that I can reduce this to a generic recommendation engine or similarity calculator soon and release as a rails/activerecord plugin. Currently it's still a well integrated part of my project.
To give a small hint, in ruby code it looks like this:
def related_by_tags
tag_names.find(:all, :include => :videos).inject([]) { |result,t|
result + t.video_ids.map { |v|
[v, TAG_WEIGHT / (0.1 + Math.log(t.video_ids.length) / Math.log(2))]
}
}
end
I would be interested on how other people solve such algorithms.

This is really a very big question you are asking, so even if I could give you a detailed answer I doubt I'd have the time.... but I do have a suggestion, take a look at Greg Linden's blog and his papers on item-based collaborative filtering. Greg implemented the idea of recommendations engines at Amazon using the item based approach, he really knows his stuff and his blog and papers are very readable.
Blog: http://glinden.blogspot.com/
Paper: http://www.computer.org/portal/web/csdl/doi/10.1109/MIC.2003.1167344 (I'm afraid you need to log in to read it in full, as you are a CS student this should be possible).
Edit
You could also take a look at Infer.Net, they include an example of building a recommender system for movie data.

I have a 2 part blog on collaborative filtering based recommendation engine for implementation in Hadoop.
http://pkghosh.wordpress.com/2010/10/19/recommendation-engine-powered-by-hadoop-part-1/
http://pkghosh.wordpress.com/2010/10/31/recommendation-engine-powered-by-hadoop-part-2/
Here is the github repository for the open source project
https://github.com/pranab/sifarish
Feel free to use if you like it.

An example recommendation engine that is open source (AGPLv3-licensed) has been published by Filmaster.com recently. It's written in C++ and uses best practices from the white papers produced as part of the Netflix challange. An article about it can be found at: http://polishlinux.org/gnu/open-source-film-recommendation-engine/
and the code is here: http://bitbucket.org/filmaster/filmaster-test/src/tip/count_recommendations.cpp

Related

How can I make my Google Form manage unique questions for an interview committee?

I want the members of an interview committee to fill out a Google Form that will help manage which questions they will each ask the candidates they are interviewing. I don't want a candidate to be asked the same question by multiple interviewers.
Step 1, they select from among 96 "competencies," e.g., "Accountability" might be a competency they believe the position requires. They can choose up to, let's say, 5 competencies.
Step 2, based on which competencies they selected, they can now choose interview questions. We have a bank of 970 potential interview questions, each one of them directly related to one of the competencies. As interviewers select questions, those questions should be eliminated as options for the other interviewers. [I found some code to support the elimination part of this step in AppScript. What I CANNOT figure out is how to make the list of questions also based on which competencies they selected in the prior step. They should be choosing from a short list, not from a list of all 970. Each competency only has a handful of questions tied to it.]
Step 3, once everyone has completed the Form, we can produce a doc or pdf so everyone on the committee can see what everyone else is asking.
Is this possible??

If I understood your question correctly, Google Forms are not suitable for your task. You can find Add-Ons and GAS example codes that can change, for example, options for some dropdown question.
The problem is those changes are reflected to all the form viewers/users, in your case interview committee. Think about it like you have a single Google Doc file, "Terms and Conditions" that you can edit and others can view. Google Form are designed with a "same for everybody" idea in mind, so you can see answer percentages right away, like you have poll on twitter.
What you can do, since you already played with GAS is to create GAS WebApp that presents different options for each user. But if you are beginner I wouldn't recommend this, you will have to be GAS + JuniorWeb developer at least. Plus, since users can work concurrently, you will have to double check if what user selected became unavailable meanwhile (after competencies are selected, while you are picking the questions someone maybe submitted one of your competencies, when you hit submit your selections are invalid, you'll have to do it again).
The simplest way is if you have a company website, to ask the website developer to make some kind of company portal (employees only, not visible to visitors). Then to create this tool you need and put it on the company portal.
I saw some solutions for "race for resources" in Google Sheets where you can have competencies listed, then one true/false checkbox column "Reserved" and one "Who Reserved" column. And the filter is set to hide Reserved=true (filters rows for everybody, "live"). Then users can "collaborate" and user is aware if something they wanted became unavailable second ago. "Who Reserved" would require GAS code to handle onEdit event. But you have a second level - questions, I have no idea how you could make this.

Are there any composite-avatar scripts that I could steal or take a look at?

In a project I'll be working on soon there will be a need to generate avatars. The generation process will be one of those where the user can select different heads, hairstyles, clothing, etc. Some items will also be unavailable at first and will have to be earned or purchased.
I already have a fair idea on how to do this, but since it is a nontrivial amount of code, it would be nice to see working examples of code. Ideally I could just take a script and integrate into my page, but just gathering ideas from other people would be good too.
I'll be working with PHP, but examples in other languages are welcome too.
Added: To clarify, I don't mean a random avatar generator (or one that generates an avatar based on some hash value). A random avatar generator is subtly different from what I have intented. In a random avatar generator the programmer-artist has a much greater say of what goes where. He can carefully pick out pieces that will not conflict with each other, and he can discard those that give him trouble.
In my case the avatar generator is more like this. The user chooses which head to use, which hairstyle to apply, which piece of clothing to use, etc. There are way more pieces there, with artists adding new ones every once in a while. It's much harder to test how pieces will or won't fit together. Sometimes more advanced blending is required (like a hat would have a part of it in front of hair, and part of it behind the hair). Etc.

[EDIT: Revised answer to updated question]
I think that it's primarily a matter of designing the parts accordingly. You have some basic forms (male, female, tall / small, etc.) for which you have stylesets (e.g. hair) designed to match and align perfectly. Solving this algorithmically instead is probably a bad choice in terms of workload and probably not necessary for non-animated figures.
However, maybe you'll need some additional alpha-channel/transparency masks or something for combining head, hair and hat.
Other than that, these parts would have to be combined layer for layer like Monster ID.

infernowebmedia.com
They sell a cheap and fully functional website code to make your own avatar site.

Need suggestions for an Applied AI project

I have a course in my current semester in which I'm required to do a project on application of AI. I have decided to do this on game AI. I have 2 basic ideas: implementing an FPS bot(s) or implementing soccer AI.
I'm quiet a noob at AI right now, I've implemented basic pathfinding algos (A*, etc), and have studied about Finite state machines, some First Order logic, basic Neural Network stuff(Backpropagation ALgo), and am currently doing a course on Genetic Algorithms.
Our main focus is on the bot right now. Our plans include:
Each 'bot' would be implemented using a Finite State Machine (FSM), which would contain the possible states the bot could have; & the rules for the action/state changes that are going to take place when it receives an input.
In bot group movement, each bot would decide whether to strike, ways to strike; based on range, number of bots, existing fights using Neural Networks.
By using genetic algorithms the opponents next move could be anticipated based on repetitive moves.
Although I've programmed a few 2d games till now in my free time (like pacman, tetris, etc), I've never really gone into the 3d area. We will most probably be using a 3d engine.
We want to concentrate most of our energy on the AI part. We would like not to be bothered with unnecessary details about the animation/3d models, etc. For example, if we could find a framework which has functions like Moveright() which just moves the bot to the right, it would be really awesome.
My basic question is : is it too ambitious to go about it in the way we have planned, considering the duration of the project is abour 3 months? Should we go 3d and use a 3d game engine? is it easy to use such engines, if you have no experience with them before? If yes, what kind of engine would be suitable to our project?
I came accross another idea, given in the book AI Game programming by example, where the player would have a top down view of the bots. Would that way be more appropriate?
Thanks .. sorry about the length of the question .. it's just that my problem is a bit too specific.

My basic question is : is it too
ambitious to go about it in the way we
have planned, considering the duration
of the project is abour 3 months?
Yes -- but that's not necessarily a bad thing :)
Should we go 3d and use a 3d game
engine?
No. Mainly because you said:
We want to concentrate most of our
energy on the AI part.
Here's what I'd do, based on my experience (and knowing that, as a student, I often bit off way more than I could chew, too):
Make your simulation function irrespective of a graphical component. Have it publish "updates" to another layer, that consist of player and ball vectors. By doing so you'll be keeping your AI tasks separate from everything else, which means you have fewer bugs to worry about, and you can also unit test your underlying simulation much easier.
Take those "updates" and create your first "visualization" layer -- make it the simplest 2D representation possible. It could just be a stream of text lines: "Player 1 has the ball / Player 1 kicked ball at (30,40) with speed 20kph". That will be hard enough for your first pass since you'll be figuring out how to take data published by the simulation and doing something with it.
Your next visualization might add a 2D grid of ANSI graphics (think rogue-like) to actually show players and the ball moving. Your next one after that might be sprites. And so on. Note how you incrementally increase the complexity of your visualization... don't make your first step go to using a technology (3d graphics engine) you've never used before. (You'll never finish your project in that case.)
As for your questions about which route to take -- FSMs, NNs, GAs, top-down design -- you should rank your interest in them from most to least (along with the rest of your group) and then tackle them, in that order. You might consider doing one style for one team and a different design for the other team. You might want to make your FSM team play against a FSM team that's had an additional tweak done to it, in order to compare and contrast if you think your changes are actually being beneficial (you might be surprised and find out they make the team worse). Actually, that's where unit testing and splitting the simulation from the visualization come in very, very handy -- you should be able to "sim" as many games as you need to to get experimental results without worrying about graphics. You might even do it in batches overnight with scripts.
In general, my advice to you is this: break down your project into the tiniest pieces you can, and tackle them one at a time, so no matter where you're at when time runs out, you'll have something interesting to show off.

You could have a look at guntactyx, that's what I had to use when I did my AI unit at uni.
It takes care of all the display, physics, sound etc... for you, all you have to do is program your team of bots.
The API includes functions to make the bot move left or right, shoot, hear sounds (like gun shots) etc... and it comes with a few sample bots so you don't start from scratch.
Also, it's quite fun to watch your bots battling your friends' bots :)

Do you use Styrofoam balls to model your systems? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
[Objective-C]
Do you still use Styrofoam balls to model your systems, where each ball
represents a class?
Tom Love: We do, actually. We've also done a 3D animation version of
it, which we found to be nowhere near
as useful as the Styrofoam balls.
There's something about a physical,
conspicuous structure hanging from the
ceiling right in the middle of a
development project that's regularly
updated to provide not only the
structure of the system that you're
building, but also the current status
of each one of the classes.
We've done it on 19 projects the last time I've counted. One of them was 1,856 classes, which is big - actually, probably bigger than it should be. It was a big commercial project, so it needed to be somewhat big.
Masterminds of Programming
It is the first time I've read or heard about using styrofoam balls to model classes.
Is that a commonly used technique? And, how does that sort of modeling help us to design better the system?
If you have any photos to share which can show us how the classes are represented it'd be great!
Update: So, it seems that the material most people use is the paper. Styrofoam balls are actually oddballs, not a commonly used technique.
Noticeable techniques:
"paper plates and string" modeling, NealB
Post-it Notes on a whiteboard, Jason
Class-Responsibility-Collaboration cards, duffymo
Sheets of ruled paper taped to the wall, AMissico
Thank you all for the very good answers.

I found a couple of styrofoam models for:
Windows 95
and
Lotus Notes
(if that helps)
Actually, here's a Tom Love case study that shows a couple of his models.
This model may represent the least
expensive CASE tool on the market --
materials cost $20.35. It was more
useful than any CASE tools I have ever
used.
We used it in three important ways.
It fixed the number of classes that we would deliver in the finished
application and we did not allow new
ones to be added, unless existing ones
could be removed.
It was a very useful way to publicly document which classes had
been code reviewed (blue ribbons) and
tested (green ribbons).
It helped everyone understand what was being built and how much time and
effort it takes to do testing,
documentation and code reviews.
Edit: photo of object model
alt text http://img686.imageshack.us/img686/82/stryrofoamobjectmodel.jpg

The styrofoam ball model appears to date back to the mid 1990's - a time when CASE (Computer Aided Systems Analysis)
systems were all the rage.
At that time CASE systems promised significant benefits but were dismally slow,
buggy, unstable, overextended and downright awkward to use. Basically, long on potential but short on delivery.
I remember having a conversation with an analyst working on a different project from mine. Her team had
become so frustrated with their CASE system that they trashed it and resorted to "paper plates and string"
modeling. They reserved a meeting room, removed all the furniture, and laid out their process model using labeled
paper plates with strings (representing data flows) connecting them. She claimed it was much more
useful than the CASE system it replaced.
I suspect that the styrofoam ball model had similar roots.
Using styrofoam balls or paper plates fostered design "buy-in". If a team
finds something to rally around it naturally creates a common design focus. It is simple, concrete and
minimal - using it requires a lot
of face to face interaction and discussion. And that is where the value comes from. I suspect
if you brought a new person into the project and told them to bring themselves up-to-speed by
reviewing the "model" they would be "dead in the water". However, walk them through the
"model" and a real conversation would occur where all the required information need to
perform on the project would be imparted very quickly and efficiently.
Do I think styrofoam balls could become a sustainable modeling tool? No, I don't. They would be a real
pain to keep up to date in a changing environment. They convey little information. There are better tools available
today. And most importantly, if the team you are working with don't "buy" it, and they
probably won't, it will look really stupid - kind of like a sports team mascot, a rallying point
only if the team "buys it".

No, we don't do this. And in my 30-odd year history in the IT industry, I've never heard of anyone doing this.
The only way this could help you design better systems is by:
keeping the class count down since it's hard to build the styrofoam model; and
minimising changes, since updating it would be a serious pain in the rear end.
Other than those two dubious features, I can't see this as being very useful. I'd almost conclude it was some sort of prank. Far better to do some real work, I think.
Seriously, if we tried to model our application with styro coffee cups and straws, our bosses would be calling the men in white coats.

Post-it Notes on a whiteboard seem to be popular in the circles I travel in. Objects go on the Post-Its, and you rearrange them until you get your relationships the way you want em.
And then there are the Color Modeling people who use a 4-pack of colored Post-Its and assign an archetype to each color. It doesn't sound like this is much of an improvement, but standing across a room looking at it, you can tell where there are missing features or unidentified objects in the system.

There is one application to this that I think we tend to forget-- using tools to articulate an architecture comes naturally to us after years in the industry, but there are valuable, albeit less technically-minded, stakeholders who may not grasp vital concepts as readily. It would sometimes be a lifesaver to point to a cluster of balls and say, "This is the Language Processing Model, and if I implement the feature you want, it will have consequences here, here, and here. You can see that there are a lot of balls connected there".
Architects, be they designing buildings or systems, might rely on those tangible models to indoctrinate the check writers into the process.

And I thought that UML was useless. The styrofoam ball model makes UML look positively elegant by comparison.
Ward Cunningham's CRC card idea is more useful, even cheaper, and still retains that tactile quality that Dr. Love was after.
I had never heard of the idea until I read this question. It deserves an up vote for originality. And the "Windows" and "Lotus Notes" pictures are priceless.

Sheets of ruled paper taped to the wall, where each sheet is a component, class, entity, or whatever is needed. Everyone has a pencil.
Everyone can write on them "flushing" out the model during the design meetings. Such as, meeting notes, implemetation notes, new classes, removed classes, reasons why you do not have a particular class, and so on. After the design meeting, the principal designer takes them down and rewrite them, again "flushing" them out with pen in "rough-draft" versions. The designer can then make decisions based on the notes of each sheet, create new sheets for any additional components. Generate topics for next meeting, note any descrepancies, note any design / implementation details needed for coding, or whatever else they need to do.
Repeat the meetings until everyone is satisfied. Pencil is new stuff, pen is previous items. Once everyone is happy, the designer creates the working-draft, and posts where everyone can see and initial, in pen, their acceptance of the "working-draft".
Nothing is final. Pen versions are "latest" versions. Pencil versions are "work-in-progress" or "draft" versions.
Simple, fast, flexible, no wasting time on the computer, with high visiblity. Working man's Wiki.

No. My team does not do this.
And I am badly tempted to mock with image macros. But I'm contemplating that the idea is silly enough that it is self-mocking.

How to create a smart chat-bot? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I know that it's still an open problem so I don't expect to see complete answers here. I just want to find some approaches to solve the next problem:
I have a model (assume that is's bot's memory), and different words are associated with different objects in the model. Speaking with the bot is like executing sql-queries with a DB. Language is a very hard formalizable protocol. And we can't just write a million lines of code to implement some real language. But I believe that it's absolutely possible to implement some self-learning mechanism. How can it be implemented? Is it possible to implement learning "from scratch" or "from few basic words"? Just want to hear your ideas.
Actually, English is a very strict language and it's one of the easiest languages for experimenting with AI. Many other languages allow you to change the order of words (for example). And in some cases changed order can change the whole meaning or just add some intonation. I really don't have any ideas how to teach a bot for these things.

The first step, in taking this game to the next level, is ...
...to have a very clear view of prior art!
(and pardon me to say, the question doesn't suggest that you have such an extensive insight into the matter [and you're not alone, count me in ;-)])
Even, and maybe in particular, if your intention is to apply completely novel techniques and models, it seems important to review the literature on current and past practices. Aside from possibly identifying elements that may be adapted or reused in a new implementation, a survey of the domain will provide an keen understanding of the nature of the problem[s].
I've personally tried -on various and multiple occasions!- either the naive approach or the sophomoric approach to tackling broadly-defined problems. With the naive approach, one has but a very slight idea of the true nature and scope of the problem. The sophomoric sees us better equipped with domain knowledge and also with related tools, but this can also be misleading because without a deeper understanding, we tend to mis-read/mis-understand new material offered to us and also misuse some of the tools (a bit like the the fellow who's "good with a hammer" for whom many things look like a nail...)
It is particularly easy to make these mistakes in the field of NLP. That's because
Common sense seems to be all is required: after all a child, who's native tongue is English understands subtleties like
"He's not really an expert"
"He's really not an expert" (small wink at the OP's reference to the ordering of word in the English language)
We live in such exciting times, technology and knowledge wise: Processing power, programming language and tools, mathematical techniques, availability of affordable corpora... to name a few of these things that make this moment in time so special.
Far from me the idea of discouraging you in your chat-bot endeavor, I just hope that this long and generic exposé will encourage to look-before-you-leap, as this will truly save you time in the long run, I think in two ways:
provide you some frames of references (again, even if your intention is to "think outside these boxes")
maybe entice you to redefine the problem, for example by limiting it to particular domains of conversation (sports, or health, or life at a particular university campus...) or by focusing on a particular aspect of the problem (semantic awareness, smooth, natural sounding grammar, use of colloquial forms...)
Good luck ;-)

Check out MegaHAL's implementation for some ideas. We've used a variant of this bot for ages in an IRC channel of ours, and he does on occasion appear to be the intelligent mixture of many of our dominant personalities.

You "train" the bot -
each time the bot answer, you rank (or the tester) the answer - if the answer is good/logical - give high rank, if the answer is bad... low/negative rank.
use the ranking in the future to choose the answer, and this is how the bot learns...

There's a great description of Eliza in Paradigms of AI Programming. You should be able to implement a simple Eliza bot in a few days of work.
This isn't a learning algorithm, but it's surprising how realistic answers can be from something so simple.

You can create your own chat bot on BOT libre, http://www.botlibre.com.
The bots learns, can be trained, can be scripted, and your can program them, or let them program themselves.
Thew site supports embedding your bot on your own site, has REST API access, Android, IRC, Twitter. Free hosting, even for commercial bots.

AIML from the AliceBot project may help you out. It's a whole XML schema (if that doesn't put you off) for the branch of AI its concerned with.
An example from Wikipedia:
<category>
<pattern>WHAT IS YOUR NAME</pattern>
<template>My name is <bot name="name"/>.</template>
</category>
RebbeccaAIML is one quite well documented implementation.