Google App Engine Search API (Python) - search on multiple search indexes

Is it possible to perform a search on multiple indexes using App Engine's search api in order to leverage the search.Query options for both indexes?
For example, search.Index(a) has Book documents and search.Index(b) has Movie documents. Then, the following query:
search.Query(
    query_string="Dog",
    options=search.QueryOptions(
        limit=10,
        offset=0,
        sort_options=search.SortOptions(
            match_scorer=search.MatchScorer())))
would return the first 10 results from the Book and Movie indexes combined that best match the query string "Dog".
From the docs it seems the Search API is limited to a single index per query, but I'm wondering if there is a workaround.
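One possible workaround (a sketch only, not an official feature: the index names 'books' and 'movies' and the helper search_all below are assumptions) is to run the same Query against each index separately and merge the scored documents in application code:

from google.appengine.api import search

def search_all(query_string, limit=10):
    # Build one Query and run it against each index separately.
    query = search.Query(
        query_string=query_string,
        options=search.QueryOptions(
            limit=limit,
            sort_options=search.SortOptions(
                match_scorer=search.MatchScorer())))
    scored = []
    for index_name in ('books', 'movies'):  # assumed index names
        for doc in search.Index(name=index_name).search(query):
            # sort_scores holds the score produced by the MatchScorer.
            score = doc.sort_scores[0] if doc.sort_scores else 0.0
            scored.append((score, doc))
    # Keep the overall best `limit` documents across both indexes.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]

Note that paging and offsets across the combined result set have to be handled by the caller, since each index is still queried and paged independently.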

Related

Simple search in App Engine

I want people to be able to search on a title field and a short description field (max 150 characters), so no real full-text search is needed. Mainly they search for keywords, like "salsa" or "club", but I also want them to be able to search for "salsa" and match words like "salsaclub", so at least some form of partial matching is needed.
Would the new Search API be useful for this kind of search, or would I be better off putting all keywords, including possible partial matches, in a list and filter on this list?
Putting all the keywords and partial matches (with some sort of support for stemming, etc.) into a list might work if you limit yourself to a small number of query terms (i.e. 1 or 2); anything more complex will become costly. If you want anything more than one or two terms, I would look at the alternatives.
You haven't said whether you're using Python, Java, Go, or PHP. If Python, have a look at Whoosh for App Engine https://github.com/tallstreet/Whoosh-AppEngine or go with the Search API.
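If you do go the list-of-keywords route, a minimal sketch (the model name Listing, the helper index_tokens, and the 3-character minimum prefix are all assumptions, not from the question) is to store each word of the title and description plus its prefixes in a StringListProperty, so that a query for "salsa" also matches an entity containing "salsaclub":

import re
from google.appengine.ext import db

def index_tokens(text, min_prefix=3):
    # Lowercase each word and add every prefix of length >= min_prefix,
    # so "salsaclub" yields "sal", "sals", ..., "salsaclub".
    terms = set()
    for word in re.findall(r'\w+', text.lower()):
        terms.add(word)
        for i in range(min_prefix, len(word) + 1):
            terms.add(word[:i])
    return list(terms)

class Listing(db.Model):  # hypothetical model
    title = db.StringProperty()
    description = db.StringProperty()
    search_terms = db.StringListProperty()

    def put(self, **kwargs):
        text = '%s %s' % (self.title or '', self.description or '')
        self.search_terms = index_tokens(text)
        return super(Listing, self).put(**kwargs)

# An equality filter on a list property matches any member of the list.
results = Listing.all().filter('search_terms =', 'salsa').fetch(10)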

Implementing keyword search on Google App Engine?

I'm trying to implement a keyword/tags search for a certain entity type in GAE's datastore:
class Docs(db.Model):
    title = db.StringProperty()
    user = db.StringProperty()
    tags = db.StringListProperty()
I also wrote a very basic search function (using a fake list of tags, not the datastore values), that takes a query string and matches it to each set of tags. It ranks Docs based on how many query words match the tags. This is pretty much all I need it to do, except using the actual datastore.
I have no idea how to get the actual datastore values though. For my search function to work I need a list of all the entities in the datastore, which is impossible (?).
I also tried looking into GAE's experimental full-text search, and Relation Index Entities as a way to search the datastore without using the function I wrote. Neither was successful.
Any other ideas on how to search for entities based on tags?
It's a very simple query; if you need to find all Docs with the tag "findme", it's simply:
num_results = 10
# An equality filter on a list property matches entities whose list contains the value.
query = Docs.all().filter("tags =", "findme")
results = query.fetch(num_results)  # get a list of up to num_results matching Docs
It's well documented:
https://developers.google.com/appengine/docs/python/datastore/queries
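To get the ranking behaviour described in the question (ordering Docs by how many query words match their tags), one hedged sketch building on the query above (the helper name search_docs and the per-word fetch limit of 100 are assumptions):

def search_docs(query_string, num_results=10):
    words = set(w.lower() for w in query_string.split())
    candidates = {}
    # Fetch candidate Docs for each query word with an equality filter on tags.
    for word in words:
        for doc in Docs.all().filter("tags =", word).fetch(100):
            candidates[doc.key()] = doc
    # Rank by the number of query words that appear in each Doc's tags.
    ranked = sorted(candidates.values(),
                    key=lambda d: len(words & set(d.tags)),
                    reverse=True)
    return ranked[:num_results]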

GAE Full Text Search API phrase matching

I can only find exact phrase matching for queries in the experimental Search API for Google App Engine. For example the query 'best prices hotel' will only match that exact phrase. It will not match texts such as 'best hotel prices' or 'best price hotels'. It's of course a much more difficult task to match text in a general way but I thought the Search API would at least be able to handle some of that.
Another example is the query 'new cars' which will not match the text 'new and used cars'.
You should be able to use the '~' operator to rewrite queries to include plurals.
E.g., ~hotel or ~"best prices hotel".
Documentation about this operator should be added in the next app engine SDK release.
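For illustration, building on the answer above (the index name 'hotels' is an assumption, and the '~' operator is applied per term as in the ~hotel example):

from google.appengine.api import search

index = search.Index(name='hotels')  # assumed index name

# Original query.
results = index.search('best prices hotel')

# Rewritten with the '~' operator applied to each term, as suggested above,
# so plural variants such as "price"/"prices" and "hotel"/"hotels" also match.
results = index.search('~best ~prices ~hotel')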

how to Index URL in SOLR so I can boost results after website

I have thousands of documents indexed in my SOLR which represents data crawled from different websites. One of the fields of a document is SourceURL which contains the url of a webpage that I crawled and indexed into this Document.
I want to boost results from a specific website using boost query.
For example I have 4 documents each containing in SourceURL the following data
https://meta.stackoverflow.com/page1
http://www.stackoverflow.com/page2
https://stackoverflow.com/page3
https://stackexchange.com/page1
I want to boost all results that are from stackoverflow.com, and not its subdomains (in this case results 2 and 3).
Do you know how I can index the url field and then use a boost query to identify all the documents from a specific website, as in the case above?
One way would be to parse the url prior to index time and record whether it is a primary domain (a primarydomain boolean field in your schema.xml, for example).
Then you can boost the primarydomain field in your query results. See using the DisMaxQParserPlugin from the Solr Wiki for an example on how to boost fields at query time.
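As a hedged illustration (the Solr URL, collection name, and searchable field names below are assumptions), the query-time boost could be expressed with DisMax parameters like these:

import urllib

# Boost documents flagged primarydomain=true at query time via the DisMax
# boost query (bq) parameter.
params = {
    'q': 'some search terms',
    'defType': 'dismax',
    'qf': 'title content',            # assumed searchable fields
    'bq': 'primarydomain:true^5.0',   # boost primary-domain documents
    'wt': 'json',
}
url = 'http://localhost:8983/solr/collection1/select?' + urllib.urlencode(params)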

web2py like equivalents with google app engine

Is there any way to generate queries similar to the like, contains, startswith operators with app engine BigTable database?
So that I could do something similar to:
db(db.some_table.like('someting')).select()
with app engine in web2py.
App Engine does not support full-text search, so the short answer is no.
What you can do with web2py is create a computed field with a list of keywords to search for.
import re
def tokenize(r): return [x.lower() for x in re.compile(r'\w+').findall(r.title)]
db.define_table('data',
    Field('title'),
    Field('keywords', 'list:string', compute=tokenize, writable=False, readable=False))
On GAE the keywords field is a StringListProperty().
Then instead of searching in title, you search in keywords:
rows = db(db.data.keywords.contains(my_keyword.lower())).select()
This works on GAE and it is very efficient. The problem is that you will not be able to combine it into complex queries because of the GAE "exploding indexes" problem. For example, if you have N keywords and want to search for two keywords:
rows = db(db.data.keywords.contains(my_keyword1.lower()) &
          db.data.keywords.contains(my_keyword2.lower())).select()
Your index size becomes N^2. So you have to perform more complex queries locally:
query2 = lambda r: my_keyword2.lower() in r.keywords
rows = db(db.data.keywords.contains(my_keyword1.lower())).select().find(query2)
All of this works both on GAE and off GAE; it is portable.
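For the startswith case specifically, a common datastore idiom (not from the answer above, and assuming the GAE adapter translates both inequality filters, which are on the same field) is a range filter bounded with u'\ufffd':

# Prefix ("startswith") matching via an inequality range filter on the field.
prefix = 'someth'
rows = db((db.data.title >= prefix) &
          (db.data.title < prefix + u'\ufffd')).select()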
