Binarize the ratings - MovieLens dataset

I am working on a personalised news recommendation engine based on users' click behaviour. My features will be predefined news categories (such as politics, sports, etc.).
Whenever a user clicks on an article, I build or update the user's profile based on that article, then recommend another article from the article pool.
To evaluate this system, I need a dataset containing binary user-item interactions (whether the user clicked on a recommended article or not), but I couldn't find an appropriate dataset for this specific context. What I'm trying to do instead is binarize the MovieLens dataset, then calculate precision and recall.
What I actually do with the MovieLens dataset is as follows: if a user's rating for an item is larger than that user's average rating, I assign it a binary rating of 1, and 0 otherwise.
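In pandas, that per-user binarization might be sketched like this (toy ratings; MovieLens-style column names assumed):

```python
import pandas as pd

# Toy ratings frame with MovieLens-style columns (userId, movieId, rating).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2],
    "movieId": [10, 20, 30, 10, 40],
    "rating":  [5.0, 3.0, 4.0, 2.0, 4.0],
})

# Each user's own average rating, broadcast back onto that user's rows.
user_mean = ratings.groupby("userId")["rating"].transform("mean")

# Binary rating: 1 if strictly above the user's average, 0 otherwise.
ratings["liked"] = (ratings["rating"] > user_mean).astype(int)
```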
Is this approach the right way to evaluate such systems?

Binarizing makes no difference. Precision and recall are relative, so the fact that someone rated an item is all you need. The rule for deciding what counts as a "good" rating is meaningless for testing purposes.
Epinions has two datasets: one for ratings, the other binary for trust.
Use MAP@k (mean average precision at k) for some number of recommendations. This takes account of the ranking within a group of recommendations, which is, no doubt, how they will be used.
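A plain-Python sketch of MAP@k, where `relevant` would be each user's set of held-out positive items (hypothetical helper functions, not from any particular library):

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user: precision at each hit, averaged over the
    smaller of k and the number of relevant (held-out) items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k) if relevant else 0.0

def map_at_k(recommended_lists, relevant_sets, k):
    """MAP@k: the mean of AP@k over all users."""
    aps = [average_precision_at_k(r, s, k)
           for r, s in zip(recommended_lists, relevant_sets)]
    return sum(aps) / len(aps)
```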
BTW, there is already an open-source recommender that does this; it allows mixing multiple events/actions/indicators and can also use content similarity. It is based on PredictionIO's framework, which is Spark-based.

Related

Infinite scrolling for personalized recommendations

I've recently been researching solutions that would allow me to display a personalized ranking of products in an online e-commerce store.
A natural solution for this problem would be to use a managed ML service such as
AWS Personalize.
Based on my understanding, it can be implemented in terms of 2 service calls:
Recommendations - return up to ~500 products based on the user's profile
Ranking - given a list of product ids (up to ~500), reorder it according to the user's profile
I was wondering if there exists an implementation / indexing strategy that would allow displaying the entire product catalog (let's assume 10k products)?
I can imagine an implementation that would:
Return 50 products per page
Call recommendations to grab first 500 products
Alternatively, pick top 500 products platform-wise and rerank them according to the user.
For the remaining results, i.e. pages 11 to N, a database query would be executed, excluding those 500 products by id. The recommended ordering wouldn't be as relevant anymore, since the top recommendations have already been listed and the user is less likely to encounter relevant results on the 11th page. As a downside, such a query would need a relatively large array of ids to be included as part of the query.
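The strategy above could be sketched roughly like this (`fetch_recommendations` and `query_catalog` are placeholder callables standing in for the personalization service and the fallback database query, not a real AWS Personalize API):

```python
PAGE_SIZE = 50
RECS_LIMIT = 500

def get_page(user_id, page, fetch_recommendations, query_catalog):
    # Personalized top-500 item ids for this user.
    recs = fetch_recommendations(user_id, limit=RECS_LIMIT)
    if (page + 1) * PAGE_SIZE <= RECS_LIMIT:
        # Pages 1-10 (0-indexed 0-9) come straight from the ranking.
        start = page * PAGE_SIZE
        return recs[start:start + PAGE_SIZE]
    # Pages 11+ fall back to a non-personalized catalog query that
    # excludes the ids already shown in the recommended section.
    offset = page * PAGE_SIZE - RECS_LIMIT
    return query_catalog(exclude_ids=recs, offset=offset, limit=PAGE_SIZE)
```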
Is there a better solution for this? I've seen many e-commerce platforms offering a "Recommended" sort option for their product listings that allows infinite scrolling. Would that mean that such a store is typically using predefined ranks, i.e. an arbitrary rank assigned to each product by the content manager that is exactly the same for every user on the platform?
I don't think I've ever seen an ecommerce site that shows me 10K products without any differentiation. Most ecommerce sites use a process called "merchandising" to decide which product to show, to which customer, in which position/treatment, at which time, and in which context.
Personalized recommendations may be part of that process, but they are generally only a part. For instance, if you're browsing "Books about architecture", the fact that the recommendation engine thinks you're really interested in CDs by Duran Duran is not super useful. And even within "books about architecture", there may be other attributes that are more likely to drive your buying behaviour.
Most ecommerce sites use a range of attributes to decide product ranking in a product listing page, for example "relevance", "price", "is the product on special offer?", "is the product in stock?", "do we make a big margin on this product?", "is this a best seller?", "is the supplier reliable?". Personalized recommendations are sprinkled into these factors, but the weightings are very much specific to the vendor.
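For illustration only (the attribute names and weights below are invented; real weightings are very much vendor-specific), blending those factors might look like:

```python
# Hypothetical weighting of merchandising factors; tune per business.
WEIGHTS = {
    "relevance": 0.4,    # personalized recommendation score, 0..1
    "in_stock": 0.2,
    "margin": 0.2,
    "best_seller": 0.2,
}

def merchandising_score(product):
    """Blend the personalized relevance score with business factors."""
    return sum(WEIGHTS[k] * product.get(k, 0.0) for k in WEIGHTS)

def rank(products):
    return sorted(products, key=merchandising_score, reverse=True)
```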
Most recommendation engines will provide a "relevance score" or similar indicator. In my experience, this has a long-tail distribution - a handful of products will score highly, and the rest will have very low relevancy scores. The ecommerce businesses I have worked with have a cut-off point - a score of less than x means we'll ignore the recommendation in favour of other criteria.
Again, in my experience, personalized recommendations are useful for squeezing the last few percentage points of conversion, but nowhere near as valuable as robust search, intuitive categorization, and rich meta data.
I'd suggest that if your customer has scrolled through 500 recommended products, it's unlikely the next 9,500 need to be in "recommended" order - by that point, the ordering from your recommendation mechanism is probably not statistically meaningful.

Common ways for pair matching of users in a database?

Hope you are safe and well!
I have a question about regular or common ways of pair-matching in a database of users: say there are a few properties for each user, and when matching, each user can change the filtering options to only match those who fit their own requirements (so there is mutual selection between users). We want to efficiently match 1000 users as precisely as possible.
For example, let's say there are 3 properties for every user: gender (female/male/other), study level (elementary/intermediate/advanced), and grade (freshman/sophomore/junior/senior). When matching, each user can choose to only match with people of their selected gender, study level, and grade.
Focusing on 1 user, I guess that from a database perspective we could use filtering options in queries to get a list of those who satisfy both "my requirements" and "I fit their requirements". However, I think this would be slow and cause concurrency problems when there are 1000+ users in the matching phase at the same time.
I saw another post here discussing the blossom algorithm and greedy algorithms, which seem promising if we look at this as a graph. Are they doable in this case? I guess if two users mutually fit each other's requirements, there would be an edge between the two nodes, and the edge weight could be a combined matching score over all 3 properties?
Anyway, I'm wondering: is there a common way to do pair matching precisely with 1000+ users at the same time?
Thank you so much!
If the requirement is that each match must have the exact same properties, then the solution is fairly simple: just do a multiple-criteria sort (e.g. first sort by gender, then within each gender category sort by study level, etc.) and pair the identical users.
However, in a random dataset you're very unlikely to have perfect matches for all users. In that case you would want to score pairs by how closely each category matches and use a more complex algorithm to maximize your overall matches. What you would do depends heavily on your use case and userbase size. Honestly, 1000 users is a very small number for modern computers; pretty much any polynomial-time method (including blossom, as you mentioned) would work fine.
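A rough sketch of the scoring-plus-matching idea, with made-up user dicts: an edge exists only when both users pass each other's filter, weighted by how many properties the pair shares. Greedy matching is used here as a simple baseline; blossom (e.g. NetworkX's `max_weight_matching`) would give an optimal maximum-weight matching instead.

```python
from itertools import combinations

PROPS = ("gender", "level", "grade")

def greedy_match(users):
    """Each user is a dict with the three properties plus a "wants"
    predicate implementing that user's filter over other users."""
    edges = []
    for i, j in combinations(range(len(users)), 2):
        a, b = users[i], users[j]
        if a["wants"](b) and b["wants"](a):  # mutual selection
            weight = sum(a[p] == b[p] for p in PROPS)
            edges.append((weight, i, j))
    edges.sort(reverse=True)  # heaviest edges first
    matched, pairs = set(), []
    for _, i, j in edges:
        if i not in matched and j not in matched:
            matched.update((i, j))
            pairs.append((i, j))
    return pairs
```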

Data set for Doc2Vec general sentiment analysis

I am trying to build a doc2vec model, using gensim + sklearn, to perform sentiment analysis on short sentences like comments, tweets, reviews, etc.
I downloaded an Amazon product review dataset, a Twitter sentiment analysis dataset, and an IMDB movie review dataset.
Then I combined these into 3 categories: positive, negative, and neutral.
Next, I trained a gensim doc2vec model on the above data so I could obtain the input vectors for the classifying neural net.
And I used a sklearn LinearRegression model to predict on my test data, which is about 10% from each of the above three datasets.
Unfortunately the results were not as good as I expected. Most of the tutorials out there seem to focus on only one specific task, like 'classify Amazon reviews only' or 'Twitter sentiment only'; I couldn't find anything more general-purpose.
Can someone share their thoughts on this?
How good did you expect, and how good did you achieve?
Combining the three datasets may not improve overall sentiment-detection ability, if the signifiers of sentiment vary in those different domains. (Maybe, 'positive' tweets are very different in wording than product-reviews or movie-reviews. Tweets of just a few to a few dozen words are often quite different than reviews of hundreds of words.) Have you tried each separately to ensure the combination is helping?
Is your performance in line with other online reports of using roughly the same pipeline (Doc2Vec + LinearRegression) on roughly the same dataset(s), or wildly different? That will be a clue as to whether you're doing something wrong, or just have too-high expectations.
For example, the doc2vec-IMDB.ipynb notebook bundled with gensim tries to replicate an experiment from the original 'Paragraph Vector' paper, doing sentiment-detection on an IMDB dataset. (I'm not sure if that's the same dataset as you're using.) Are your results in the same general range as that notebook achieves?
Without seeing your code, and details of your corpus-handling & parameter choices, there could be all sorts of things wrong. Many online examples have nonsense choices. But maybe your expectations are just off.

Automatic product classification and query weighting

I'm facing ranking problems using Solr, and I'm stuck.
Given an e-commerce site, for the query "ipad" I obtain:
ipad case for ipad 2
ipad case
ipad connection kit
ipad 32gb wifi
This is a problem, since we want to rank the main products (or products by themselves) first, and TF/IDF ranks the accessories first due to descriptions like "ipad case compatible with ipad, ipad2, ipad3, ipad retina, ipad mini, etc.".
Furthermore, using the categories we have no way of determining whether an item is an accessory or a main product.
I wonder if automatic classification would help. Any other solution that improves this ranking (like Named Entity Recognition) would also be appreciated.
Could you provide tagged data?
If you have >50k items, a Naive Bayes classifier with a bigram language model trained on the product names will catch almost all accessories with ~99% accuracy. I guess you could train such a Naive Bayes with Mahout; however, product names contain a pretty limited number of bigrams, so nowadays this can be trained easily and fast even on a smartphone.
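A toy sklearn version of that idea (made-up titles and labels; a real model needs properly tagged data, as noted above):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training examples: product titles labeled by hand.
titles = [
    "ipad case for ipad 2", "ipad smart cover", "iphone charging cable",
    "ipad 32gb wifi", "iphone 5 16gb black", "ipad mini 64gb",
]
labels = ["accessory", "accessory", "accessory",
          "product", "product", "product"]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigrams + bigrams of the title
    MultinomialNB(),
)
clf.fit(titles, labels)
```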
Tagging is a typical Mechanical Turk task; it shouldn't be that expensive to tag a few items. However, if you insist on a semi-supervised algorithm, I found iterative similarity aggregation pretty useful.
The main idea is that you give a few seed tokens like "case" / "power adapter", and it iteratively finds new tokens that are indicators of the unwanted class because they appear in the same contexts.
Here is the paper, but I have written a blog post about this as well, which sums up the intention in plain language. The paper also mentions the same "let the user find the right item" paradigm that Sean has proposed, so both can be used in conjunction.
Oh, and if you need advice on machine learning with Lucene & Solr, I can recommend the talk by my friend Tommaso Teofili at ApacheCon Europe this year. You can find the slides on SlideShare. There is also a YouTube video of the talk out there; just search for it ;)
TF/IDF is just going to rank based on the words in the query vs. the words in the title, as you have found. It sounds like that is not the right definition of a "good result", and that you want to favor products over accessories.
Of course, you can simply attach heuristics to patch the problem. For example, treat the title as a set of words rather than a multiset, so that "iPad" appearing several times makes no difference. Or just boost the score of items that you know are products. These aren't learning per se, but they are simple, directly reflect your business knowledge, and probably have some positive effect.
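The set-of-words heuristic could be applied to titles at index time, e.g. with a small helper like this (hypothetical, not a Solr feature):

```python
def dedupe_title_terms(title):
    """Collapse the title to a set of words (first occurrence kept, in
    order) so repeated query terms like "ipad" can't inflate TF."""
    seen, out = set(), []
    for word in title.lower().split():
        if word not in seen:
            seen.add(word)
            out.append(word)
    return " ".join(out)
```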
If you want to learn here, you probably need to use the one best source of knowledge about what the best results are: your users. You know what they click in response to each query. You can learn a term-item model that associates search terms to items clicked. You can view that as many types of problem -- actually a latent-factor recommender model could work well there.
Have a look at Ted's slides on how to use a recommender as a "search engine": http://www.slideshare.net/tdunning/search-as-recommendation

What are the most efficient algorithms to recommend items to groups of users?

Collaborative filtering usually applies to predicting ratings for an individual user, but how would these algorithms change when an item (or items) needs to be recommended to multiple people (for example, friends wanting to watch a movie or choose a holiday together)?
Since this question is at a very general level, I will answer it at that level.
The key change is that a loss function that is typically minimized (or an objective function is maximized) for an individual would be minimized for a set. Unless you have training data for sets, this tends to be very difficult. What's more, the set could change depending on the recommendation.
Nonetheless, a naive approach would be to suggest a least-common-denominator item: one that, averaged over the group's members, maximizes the objective function.
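A toy sketch of that naive aggregation (the predicted ratings below are made-up numbers; any per-user recommender could supply them):

```python
import numpy as np

# Rows are group members, columns are candidate items (made-up predictions).
predicted = np.array([
    [4.5, 2.0, 3.0],   # member A's predicted ratings for items 0..2
    [4.0, 4.5, 1.0],   # member B
    [3.5, 4.0, 2.5],   # member C
])

# Average each item's predicted rating over the group and pick the best.
group_score = predicted.mean(axis=0)
best_item = int(group_score.argmax())
```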
