I'm trying to implement a keyword/tags search for a certain entity type in GAE's datastore:
from google.appengine.ext import db

class Docs(db.Model):
    title = db.StringProperty()
    user = db.StringProperty()
    tags = db.StringListProperty()
I also wrote a very basic search function (using a fake list of tags, not the datastore values), that takes a query string and matches it to each set of tags. It ranks Docs based on how many query words match the tags. This is pretty much all I need it to do, except using the actual datastore.
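For reference, here is a minimal sketch of the in-memory ranking described above. It assumes each document is available as a (title, tags) pair. The function name rank_docs, the sample data, and the case-insensitive matching are all illustrative assumptions, not the asker's actual code.

```python
def rank_docs(query, docs):
    """Rank (title, tags) pairs by how many query words appear in their tags.

    Docs with no matching word are dropped. Ties keep their input order,
    because Python's sort is stable even with reverse=True.
    """
    words = set(query.lower().split())
    scored = []
    for title, tags in docs:
        # Count the distinct query words found among this doc's tags.
        hits = len(words & {tag.lower() for tag in tags})
        if hits:
            scored.append((hits, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical sample data standing in for the "fake list of tags".
docs = [
    ("Salsa night", ["salsa", "dance", "club"]),
    ("Jazz trio", ["jazz", "music"]),
    ("Latin club", ["club", "latin"]),
]
print(rank_docs("salsa club", docs))  # ['Salsa night', 'Latin club']
```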
I have no idea how to get the actual datastore values, though. For my search function to work I would need a list of all the entities in the datastore, which seems impossible.
I also looked into GAE's experimental full-text search and Relation Index Entities as ways to search the datastore without using the function I wrote. Neither worked out.
Any other ideas on how to search for entities based on tags?
It's a very simple query. If you need to find all Docs with the tag "findme", it's simply:
num_results = 10
# An equality filter on a list property matches any entity whose
# tags list contains "findme".
query = Docs.all().filter("tags =", "findme")
results = query.fetch(num_results)  # get list of results
It's well documented:
https://developers.google.com/appengine/docs/python/datastore/queries
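To get the "rank by number of matching words" behaviour from the question without loading every entity, one option is to run one equality query per query word and count how often each key comes back. The sketch below keeps the datastore call behind a hypothetical fetch_keys_for_tag callable so the counting logic can be tried locally; the App Engine lambda in the docstring is an assumption about how you might wire it up, not tested code.

```python
from collections import Counter


def rank_by_tag_queries(query, fetch_keys_for_tag, limit=10):
    """Run one equality query per query word and rank keys by hit count.

    fetch_keys_for_tag is a hypothetical callable. On App Engine it might be
    lambda tag: Docs.all(keys_only=True).filter("tags =", tag).fetch(100)
    """
    counts = Counter()
    # dict.fromkeys drops duplicate words but keeps their order.
    for word in dict.fromkeys(query.lower().split()):
        # Count each entity at most once per word (pass a list, not a dict,
        # because Counter.update treats a mapping's values as counts).
        counts.update(list(dict.fromkeys(fetch_keys_for_tag(word))))
    return [key for key, _ in counts.most_common(limit)]


# Fake in-memory "datastore" for trying it out locally.
fake_index = {"salsa": ["doc1"], "club": ["doc1", "doc3"], "jazz": ["doc2"]}
print(rank_by_tag_queries("salsa club", lambda tag: fake_index.get(tag, [])))
# ['doc1', 'doc3']
```

Each word costs one query, so this stays cheap for short queries; the ranked keys can then be loaded with a single batch get.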
Is it possible to perform a search on multiple indexes using App Engine's search api in order to leverage the search.Query options for both indexes?
For example, search.Index(a) has Book documents and search.Index(b) has Movie documents. Then, the following query:
search.Query(
    query_string="Dog",
    options=search.QueryOptions(
        limit=10,
        offset=0,
        sort_options=search.SortOptions(
            match_scorer=search.MatchScorer())))
would return the first 10 results from the Book and Movie indexes combined that best match the query string "Dog".
From the docs, it seems the Search API is limited to a single index per query, but I'm wondering if there is a workaround.
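One possible workaround is to run the same search.Query against each index separately and merge the two result lists yourself. The sketch below shows only the merge step in plain Python and assumes you have already turned each index's results into (score, doc_id) pairs sorted by descending score, for example from ScoredDocument.sort_scores when a MatchScorer is set. It also assumes scores from the two indexes are comparable, which is not guaranteed. Note that offsets don't carry across indexes: page N of the combined list needs offset + limit results from each index.

```python
import heapq
from itertools import islice


def merge_scored(results_a, results_b, limit=10):
    """Merge two (score, doc_id) lists that are each sorted by descending score.

    Assumes scores from the two indexes are comparable, which the
    Search API does not guarantee.
    """
    merged = heapq.merge(results_a, results_b,
                         key=lambda item: item[0], reverse=True)
    return [doc_id for _, doc_id in islice(merged, limit)]


# Hypothetical scores standing in for the Book and Movie index results.
books = [(0.9, "book:1"), (0.4, "book:2")]
movies = [(0.7, "movie:1"), (0.1, "movie:2")]
print(merge_scored(books, movies, limit=3))  # ['book:1', 'movie:1', 'book:2']
```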
I need to search by DocId because I have files in Drive that I am also searching, and need to merge the results. I also need to limit the results by other fields. I tried this query:
INFO: Searching with query: DocId:(4842249208725504 5405199162146816 5510752278413312 5581121022590976 5827411627212800)
However, it found 0 results even though the documents exist. I also tried doc_id and id.
log.info("Searching with query: " + q);
try {
    Results<ScoredDocument> results = getIndex().search(q);
I will also need to filter by other fields, for example:
DocId:(123456789) year:(2012)
Filtering on the other fields works, but not on DocId, even though the admin interface lists DocId as one of the fields: http://localhost:8888/_ah/admin/search?subsection=searchIndex...
Add an atom field named docId to each document and store the doc id in that field. Then you can search on it as normal (as you suggested).
Here is a quote from the documentation:
While it is convenient to create readable, meaningful unique document identifiers, you cannot include the doc_id in a search. Consider this scenario: You have an index with documents that represent parts, using the part's serial number as the doc_id. It will be very efficient to retrieve the document for any single part, but it will be impossible to search for a range of serial numbers along with other field values, such as purchase date. Storing the serial number in an atom field solves the problem.
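Once each document carries that extra field (in Python, something like search.AtomField(name="docId", value=str(entity_id)) in the document's field list), the query is just a string. Below is a small sketch of a query builder; the helper name and the docId field name are assumptions. The ids are joined with OR because adjacent terms in a query are implicitly ANDed, so DocId:(a b c) asks for a single field to equal all of them at once. No escaping is done, so values must not contain query syntax.

```python
def build_doc_id_query(doc_ids, **filters):
    """Build a Search API query string that matches any of doc_ids.

    Assumes every document stores its id in an atom field named docId.
    The ids are joined with OR because adjacent terms are implicitly ANDed.
    No escaping is done, so values must not contain query syntax.
    """
    parts = ["docId:(%s)" % " OR ".join(str(d) for d in doc_ids)]
    # Extra field filters, sorted only to make the output predictable.
    for field, value in sorted(filters.items()):
        parts.append("%s:(%s)" % (field, value))
    return " ".join(parts)


print(build_doc_id_query([4842249208725504, 5405199162146816], year=2012))
# docId:(4842249208725504 OR 5405199162146816) year:(2012)
```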
If you know the doc ID in advance, rather than searching for it, why not just get it directly?
doc = index.get("AZ125")
https://developers.google.com/appengine/docs/python/search/#Python_Retrieving_documents_by_doc_ids
I want people to be able to search from a title field and a short description field (max 150 characters), so no real full-text search. Mainly they search for keywords, like "salsa" or "club", but I also want them to be able to search for "salsa" and match words like "salsaclub", so at least some form of partial matching.
Would the new Search API be useful for this kind of search, or would I be better off putting all keywords, including possible partial matches, in a list and filter on this list?
Putting all the keywords and partial matches in a list (with some sort of support for stemming, etc.) might work if you limit yourself to a small number of query terms (i.e. 1 or 2); anything more complex will become costly. If you want anything more than one or two terms, I would look at the alternatives.
You haven't said whether you're using Python, Java, Go, or PHP. If Python, have a look at Whoosh for App Engine https://github.com/tallstreet/Whoosh-AppEngine or go with the Search API.
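To make the "put all partial matches in a list" idea concrete, here is a sketch of a hypothetical helper that expands each word into all of its substrings of at least 3 characters (the minimum length is an assumption). Stored in a StringListProperty, these let a plain equality filter match "salsa" or "club" against "salsaclub". It also shows where the cost comes from: a single 9-letter word already produces 28 tokens, and the count grows roughly quadratically with word length, with each token adding index entries.

```python
def partial_tokens(text, min_len=3):
    """Return every substring of at least min_len characters of each word.

    Stored in a StringListProperty, these let a plain equality filter match
    "salsa" or "club" against "salsaclub".
    """
    tokens = set()
    for word in text.lower().split():
        for start in range(len(word)):
            for end in range(start + min_len, len(word) + 1):
                tokens.add(word[start:end])
    return tokens


tokens = partial_tokens("salsaclub")
print("salsa" in tokens, "club" in tokens)  # True True
print(len(tokens))  # 28 tokens for a single 9-letter word
```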
In my application, I'd like the Search API to weight a match in the name field higher than a match in the other fields.
A user can also fill in an 'about' message, which has way more text, so it could be more likely that a match happens there. Is there any way to do this?
SortExpression (https://developers.google.com/appengine/docs/python/search/sortexpressionclass) provides a way to set the sort based on a particular expression, but it only offers a document-wise score (i.e. not per field).
Another (probably bad) idea is to search only the name field, using a query string like "name: my_search_term_here".
So, as far as I know, the App Engine Search API offers no way to boost one field at query time (i.e. nothing like the ^ operator in Apache Solr/Lucene).
I am not aware of this functionality in Google App Engine. Having said that, you could split the problem into two steps. First, search for the term in the name field, which gives you a list of documents; call it list1. Then search for the same term in the less important fields, which gives you another list, list2. You can then combine these two lists however you want, e.g. make a new list3 by concatenating list1 and list2 so that all items from list1 come before items from list2 (skipping any document already in list1). Hope this helps.
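The combining step can be sketched in a few lines of Python. It assumes you already have the doc ids from the two searches (e.g. one with a query string like "name:salsa" and one like "about:salsa"); the function name and sample ids are hypothetical. Since a document can match in both searches, it keeps only its first (higher-priority) position.

```python
def combine_ranked(name_hits, other_hits):
    """Return name-field matches first, then other matches not already listed."""
    seen = set()
    combined = []
    for doc_id in list(name_hits) + list(other_hits):
        # A document can match in both searches; keep its first (higher) rank.
        if doc_id not in seen:
            seen.add(doc_id)
            combined.append(doc_id)
    return combined


list1 = ["ann", "bob"]         # e.g. ids from a "name:salsa" search
list2 = ["cat", "ann", "dan"]  # e.g. ids from an "about:salsa" search
print(combine_ranked(list1, list2))  # ['ann', 'bob', 'cat', 'dan']
```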
I am wondering if there is any way to transform an end-user query into a more complicated Solr query based on some rules.
For example, if the user types in 32" television, then I want to use the dismax query parser to let solr take care of this user query string like below:
http://localhost:8983/solr/select/?q=32" television&defType=dismax
However, if the user types in "televisions on sale", then I want to do a regular search for the token televisions with the isOnSale flag set to true, like below:
http://localhost:8983/solr/select/?q=name:televisions AND isOnSale:true
Is this possible? Or does this logic require an advanced search form where the user explicitly ticks a checkbox to see only on-sale items?
Thanks.
Transforming the user query is quite possible. You can do it in the following two ways:
Implement a servlet filter that intercepts the user query and transforms it before dispatching it to the Solr request handler.
Look at the query parser plugins in Solr and implement one based on an existing parser, such as the standard query parser, modified to apply your transformation rules.
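The rule layer itself can be quite small. Below is a Python sketch of the same idea as the servlet filter, i.e. rewriting the query before it reaches Solr. The single rule (a trailing "on sale" becomes name:<term> clauses plus isOnSale:true, anything else goes to dismax) mirrors the two examples in the question; the trigger phrase and function name are assumptions, and terms are not escaped for Solr syntax.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select/"
ON_SALE_SUFFIX = " on sale"  # hypothetical trigger phrase


def transform_query(user_query):
    """Map a raw user query to Solr request parameters using simple rules."""
    text = user_query.strip()
    if text.lower().endswith(ON_SALE_SUFFIX):
        # Rule: "<terms> on sale" -> name:<term> clauses plus the sale flag.
        terms = text[:-len(ON_SALE_SUFFIX)].split()
        clauses = ["name:%s" % term for term in terms] + ["isOnSale:true"]
        return {"q": " AND ".join(clauses)}
    # Fallback: hand the raw string to the dismax parser.
    return {"q": text, "defType": "dismax"}


for raw in ['32" television', "televisions on sale"]:
    print(SOLR_SELECT + "?" + urlencode(transform_query(raw)))
```

Using urlencode also takes care of the quote character in 32", which the raw URL in the question leaves unencoded.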
Let the search happen through the whole index and let the user choose. If a review shows up, render it with the appropriate view. If a product shows up, offer to search for more products.
Samsung 32 in reviews --read more
LG 32 in offers --find more like this
Your offers page can offer more options, such as filtering products on sale.
You may use a global boost field on documents. For example, a product on sale has a boost of 1.0 while out-of-stock products have 0.33. A review of a new product has 1.0; reviews of older products have less.
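Here is a plain-Python illustration of what a multiplicative global boost does to the ranking, using the 1.0 / 0.33 values above. The doc ids and relevance scores are made up. In Solr itself this would be done server-side, for example with the edismax parser's boost parameter pointing at a numeric boost field.

```python
def apply_global_boost(results, boosts, default=1.0):
    """Re-rank (doc_id, relevance) pairs by relevance times a per-doc boost."""
    rescored = [(doc_id, score * boosts.get(doc_id, default))
                for doc_id, score in results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical relevance scores and boosts, using the values from the answer.
results = [("tv-out-of-stock", 0.9), ("tv-on-sale", 0.6)]
boosts = {"tv-on-sale": 1.0, "tv-out-of-stock": 0.33}
print([doc_id for doc_id, _ in apply_global_boost(results, boosts)])
# ['tv-on-sale', 'tv-out-of-stock']
```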
Maybe you can set up the search so that, whatever someone searches for, isOnSale is used as a secondary sort parameter. So by default sort by score and then by isOnSale, or just sort by isOnSale. That way you still get all "television" ads in the results, just with the ones on sale on top.
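A sketch of the two sort orders as Solr request parameters, built in Python. It assumes isOnSale is a single-valued, indexed boolean field, where descending order should put true before false. The order of the sort clauses matters: isOnSale first floats on-sale items to the top, while score first only uses isOnSale to break exact score ties.

```python
from urllib.parse import urlencode


def sale_sort_params(user_query, sale_first=True):
    """Solr params that keep every match but change the result order."""
    if sale_first:
        # On-sale items first, ordered by relevance within each group.
        sort = "isOnSale desc,score desc"
    else:
        # Relevance first; isOnSale only breaks exact score ties.
        sort = "score desc,isOnSale desc"
    return {"q": user_query, "sort": sort}


print("http://localhost:8983/solr/select/?"
      + urlencode(sale_sort_params("name:television")))
```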