web2py "like" equivalents with Google App Engine

Is there any way to generate queries similar to the like, contains and startswith operators with the App Engine BigTable database?
I would like to do something similar to:
db(db.some_table.like('something')).select()
with App Engine in web2py.

App Engine does not support full-text search, so the short answer is no.
What you can do with web2py is create a computed field containing a list of keywords to search for.
import re
def tokenize(r):
    # lower-cased words of the title, stored in the computed 'keywords' field
    return [x.lower() for x in re.compile(r'\w+').findall(r.title)]
db.define_table('data',
    Field('title'),
    Field('keywords', 'list:string', compute=tokenize, writable=False, readable=False))
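The computed field can be mimicked in plain Python to see what gets stored and matched. This is only an illustration: `tokenize_title` and `rows_containing` are hypothetical helpers, and the list of dicts stands in for the table. On GAE the membership test is answered by the index on the list property rather than by a Python loop.

```python
import re

WORD_RE = re.compile(r'\w+')

def tokenize_title(title):
    """Lower-cased words of a title: what the computed 'keywords' field holds."""
    return [w.lower() for w in WORD_RE.findall(title)]

def rows_containing(rows, keyword):
    """Rows whose keyword list includes `keyword`, like keywords.contains(...)."""
    kw = keyword.lower()
    return [r for r in rows if kw in r['keywords']]

# Hypothetical rows; the keywords are computed once, when each row is written.
rows = [{'title': t, 'keywords': tokenize_title(t)}
        for t in ['Salsa Club Night', 'Jazz night', 'Salsa lessons']]

print([r['title'] for r in rows_containing(rows, 'SALSA')])
```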
On GAE, the keywords field is stored as a StringListProperty.
Then, instead of searching in title, you search in keywords:
rows = db(db.data.keywords.contains(my_keyword.lower())).select()
This works on GAE and it is very efficient. The problem is that you will not be able to combine it in complex queries because of GAE's "exploding indexes" problem. For example, if you have N keywords and want to search for two of them:
rows = db(db.data.keywords.contains(my_keyword1.lower())&
db.data.keywords.contains(my_keyword2.lower())).select()
your index size becomes N^2, so you have to perform the more complex queries locally:
query2 = lambda r: my_keyword2.lower() in r.keywords  # second keyword checked in memory
rows = db(db.data.keywords.contains(my_keyword1.lower())).select().find(query2)
All of this works both on GAE and off GAE; it is portable.
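The division of labour in the last snippet can be illustrated without web2py. In this sketch `rows` is hypothetical in-memory data and `two_keyword_search` is a made-up helper: the first list comprehension plays the role of the indexed `contains` query, and the second plays the role of `.find(...)`.

```python
def two_keyword_search(rows, kw1, kw2):
    """Rows whose keywords contain both kw1 and kw2."""
    kw1, kw2 = kw1.lower(), kw2.lower()
    # Step 1: single-keyword filter (the part the datastore index would answer).
    candidates = [r for r in rows if kw1 in r['keywords']]
    # Step 2: local filter on the fetched rows (the part Rows.find does).
    return [r for r in candidates if kw2 in r['keywords']]

rows = [
    {'title': 'Salsa club', 'keywords': ['salsa', 'club']},
    {'title': 'Salsa lessons', 'keywords': ['salsa', 'lessons']},
    {'title': 'Jazz club', 'keywords': ['jazz', 'club']},
]
print([r['title'] for r in two_keyword_search(rows, 'Salsa', 'CLUB')])
```

Using the more selective keyword in step 1 keeps the number of rows fetched and filtered locally small.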

Related

Google App Engine Search API (Python) - search on multiple search indexes

Is it possible to perform a search on multiple indexes using App Engine's Search API in order to leverage the search.Query options for both indexes?
For example, search.Index(a) has Book documents and search.Index(b) has Movie documents. Then, the following query:
search.Query(
    query_string="Dog",
    options=search.QueryOptions(
        limit=10,
        offset=0,
        sort_options=search.SortOptions(
            match_scorer=search.MatchScorer())))
would return the first 10 results from the Book and Movie indexes combined that best match the query string "Dog".
The docs seem to suggest that the Search API is limited to one index per query, but I'm wondering if there is a workaround.
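One workaround is to run the same query against each index separately and merge the results by score in your own code. The sketch below does not call the Search API; it assumes each index's results have already been reduced to hypothetical `(score, doc_id)` pairs sorted by descending score. Scores computed against different indexes may not be directly comparable, so the combined ranking is approximate.

```python
import heapq
from itertools import islice

def merge_top(results_per_index, limit=10):
    """Merge lists of (score, doc_id), each sorted by descending score,
    and return the overall top `limit` entries."""
    merged = heapq.merge(*results_per_index, key=lambda pair: pair[0], reverse=True)
    return list(islice(merged, limit))

# Hypothetical per-index results for the query "Dog".
books = [(0.9, 'book:dog-days'), (0.4, 'book:cats')]
movies = [(0.7, 'movie:dog-story'), (0.2, 'movie:birds')]
print(merge_top([books, movies], limit=3))
```

To get the combined top 10, each index has to be asked for its own top 10, because all ten best matches could come from a single index.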

Simple search in App Engine

I want people to be able to search on a title field and a short description field (max 150 characters), so no real full-text search. Mainly they search for keywords, like "salsa" or "club", but I also want them to be able to search for "salsa" and match words like "salsaclub", so at least some form of partial matching.
Would the new Search API be useful for this kind of search, or would I be better off putting all keywords, including possible partial matches, in a list and filter on this list?
Putting all the keywords and partial matches (some sort of support for stemming, etc.) in a list might work if you limit yourself to a small number of query terms (i.e. one or two); anything more complex will become costly. If you want anything more than one or two terms, I would look at the alternatives.
You haven't said whether you're using Python, Java, Go or PHP. If Python, have a look at Whoosh for App Engine https://github.com/tallstreet/Whoosh-AppEngine or go with the Search API.
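As a rough sketch of the "keywords plus partial matches in a list" approach: if every word is stored together with all of its prefixes, an equality filter on that list finds "salsaclub" when the user types "salsa". `prefix_keywords` is a hypothetical helper and the minimum prefix length of 3 is an arbitrary assumption. The list grows with the total length of the words, which is part of why this gets costly, and it only covers prefixes: "club" inside "salsaclub" is still not matched.

```python
import re

def prefix_keywords(text, min_len=3):
    """All distinct prefixes (at least min_len characters) of every word.
    Words shorter than min_len contribute nothing."""
    keys = set()
    for word in re.findall(r'\w+', text.lower()):
        for end in range(min_len, len(word) + 1):
            keys.add(word[:end])
    return sorted(keys)

keys = prefix_keywords('Salsaclub')
print('salsa' in keys, 'club' in keys)
```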

Implementing keyword search on Google App Engine?

I'm trying to implement a keyword/tags search for a certain entity type in GAE's datastore:
from google.appengine.ext import db
class Docs(db.Model):
    title = db.StringProperty()
    user = db.StringProperty()
    tags = db.StringListProperty()
I also wrote a very basic search function (using a fake list of tags, not the datastore values) that takes a query string and matches it against each set of tags. It ranks Docs based on how many query words match the tags. This is pretty much all I need it to do, except that it should use the actual datastore.
I have no idea how to get the actual datastore values, though. For my search function to work I need a list of all the entities in the datastore, which is impossible (?).
I also tried looking into GAE's experimental full-text search, and Relation Index Entities as a way to search the datastore without using the function I wrote. Neither was successful.
Any other ideas on how to search for entities based on tags?
It's a very simple query. If you need to find all Docs with the tag "findme", it's simply:
num_results = 10
# an equality filter on a list property matches entities where any element equals the value
query = Docs.all().filter("tags =", "findme")
results = query.fetch(num_results)  # get list of results
It's well documented:
https://developers.google.com/appengine/docs/python/datastore/queries
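The ranking described in the question (more matching tags ranks higher) can be layered on top of such queries: fetch the candidates whose tags include a query word, then count matches locally. In this sketch `rank_docs` is a hypothetical helper and the candidates are an in-memory list of `(title, tags)` pairs; on App Engine they would come from the datastore, for example one query per query word.

```python
def rank_docs(query, docs):
    """Titles of docs sharing at least one tag with the query,
    most matching tags first (ties broken alphabetically)."""
    words = set(query.lower().split())
    scored = []
    for title, tags in docs:
        hits = len(words & {t.lower() for t in tags})
        if hits:
            scored.append((hits, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]

docs = [
    ('Python intro', ['python', 'tutorial']),
    ('GAE search', ['python', 'appengine', 'search']),
    ('Java tips', ['java']),
]
print(rank_docs('python search', docs))
```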

Solr Lucene - Not sure how to index data so documents scored properly

Here's my goal. A user has a list of skill+proficiency tuples.
We want to find users based on some skill/experience criteria:
java, novice *
php, expert *
mysql, advanced
The * skills are highly desired and all others are nice to have. Users who meet or exceed the requested level would be ranked highest. But it should also degrade nicely: if no users have both java and php experience, but some have one of the highly desired skills, they should be ranked at the top. Users with only one of the optional skills may appear at the bottom.
An idea I had is to index a user's skills in fields like this:
skill_novice: java
skill_novice: php
skill_advanced: php
skill_expert: php
skill_novice: mysql
skill_advanced: mysql
...so that, at a minimum, I can do a boolean query to find people who meet the highly desired skills:
(skill_novice:java AND skill_expert:php)
But this doesn't degrade nicely (if no matches are found), nor does it find the optional skills. Perhaps instead I can do something like this:
skill_novice:java AND
(skill_novice:php^0.1 OR skill_advanced:php^0.2 OR skill_expert:php^0.3)
Is there a better way to accomplish this?
I think you could boost the field with the different values at index time:
// mysql expert (this field goes into the expert user's Document)
Field mysqlf = new Field("skill", "mysql",
        Field.Store.YES,
        Field.Index.ANALYZED);
mysqlf.setBoost(10.0F);
// mysql beginner (this field goes into the beginner user's Document)
mysqlf = new Field("skill", "mysql",
        Field.Store.YES,
        Field.Index.ANALYZED);
mysqlf.setBoost(1.0F);
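Index-time boosts are stored in the field norms, so you need to keep norms enabled for this to work (Field.Index.ANALYZED keeps them; ANALYZED_NO_NORMS would drop them).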
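Staying with the query-time approach from the question instead, the cumulative field layout (a php expert is also indexed as advanced and novice) can be generated mechanically, and the same level list can drive a query that degrades nicely. This is one possible sketch, not a tested Solr recipe: `index_fields` and `build_query` are hypothetical helpers, the boost values are arbitrary, and the standard Lucene query parser is assumed. Every clause is joined with OR, so users matching only some skills still appear with a lower score; because of the cumulative layout, a user who exceeds the requested level matches more level clauses and typically scores higher.

```python
LEVELS = ['novice', 'advanced', 'expert']

def index_fields(skills):
    """(skill, level) pairs -> (field, value) pairs; a user at a given
    level is also indexed at every lower level."""
    fields = []
    for skill, level in skills:
        for lvl in LEVELS[:LEVELS.index(level) + 1]:
            fields.append(('skill_' + lvl, skill))
    return fields

def build_query(criteria, desired_boost=10, optional_boost=1):
    """(skill, minimum level, desired?) triples -> Lucene query string."""
    clauses = []
    for skill, level, desired in criteria:
        levels = LEVELS[LEVELS.index(level):]  # levels that meet or exceed
        group = ' OR '.join('skill_%s:%s' % (lvl, skill) for lvl in levels)
        boost = desired_boost if desired else optional_boost
        clauses.append('(%s)^%d' % (group, boost))
    return ' OR '.join(clauses)

criteria = [('java', 'novice', True), ('php', 'expert', True), ('mysql', 'advanced', False)]
print(index_fields([('php', 'expert')]))
print(build_query(criteria))
```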

Wildcard search on Appengine in python

I'm just starting with Python on Google App Engine building a contact database. What is the best way to implement wildcard search?
For example, can I do query('name=', %ewman%)?
Unfortunately, Google App Engine can't do partial text matches.
From the docs:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2", "abc", u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
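The range trick can be tried on any sorted list. In this sketch a sorted Python list stands in for the datastore's sorted index, and `bisect` finds the values v with prefix <= v < prefix + u"\ufffd", the same two bounds as the GQL example; `prefix_range` is a hypothetical helper. It only handles prefix searches such as "Newm%", not "%ewman%". Strictly, a few code points (U+FFFE, U+FFFF and characters outside the Basic Multilingual Plane) sort above U+FFFD, so values continuing with those would be missed.

```python
import bisect

def prefix_range(sorted_values, prefix):
    """Values starting with `prefix`, using the same inequality bounds as
    the GQL example: prefix <= v < prefix + u'\ufffd'."""
    lo = bisect.bisect_left(sorted_values, prefix)
    hi = bisect.bisect_left(sorted_values, prefix + u'\ufffd')
    return sorted_values[lo:hi]

names = sorted(['Abbott', 'Newman', 'Newmark', 'Newton', 'Norman'])
print(prefix_range(names, 'Newm'))
```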
App Engine can't do 'like' queries, because it can't do them efficiently. Nor can your SQL database, though: A 'foo LIKE "%bar%"' query can only be executed by doing a sequential scan over the entire table.
What you need is an inverted index. Basic fulltext search is available in App Engine with SearchableModel. Bill Katz has written an enhanced version here, and there's a commercial solution for App Engine (with a free version) available here.
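To make the idea concrete, here is a minimal in-memory inverted index in plain Python. It is not SearchableModel or any of the linked libraries; it only shows why lookups are cheap: each word maps directly to the documents that contain it, so a query intersects a few small sets instead of scanning every row. The helpers are hypothetical, and like SearchableModel-style indexing it matches whole words only, so "ewman" would not find "Newman" without extra work.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lower-cased word to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r'\w+', text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Ids of documents containing every word of the query (AND semantics)."""
    words = re.findall(r'\w+', query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {1: 'Paul Newman', 2: 'Randy Newman', 3: 'Isaac Newton'}
index = build_inverted_index(docs)
print(sorted(search(index, 'newman')))
```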

Resources