Google Datastore - Search Optimization Technique - google-app-engine

I am dealing with a real-estate app. A Home will hvae typical properties like Price, Bed Rooms, Bath Rooms, SqFt, Lot size etc. User will search for Homes and such a query will require multiple inequality filters like: Price between x and y, rooms greater than z, bathrooms more than p... etc.
I know that multiple inequality filters are not allowed. I also do not want to perform any filtering in my code and/because I want to be able to use Cursors.
so I have come up with two solutions. I am not sure if these are right - so wonder if gurus can shed some light
Solution 1: I will discretize the values of each attribute and save them in a list-field, then use IN. For example: If there are 3 bed rooms, instead of storing beds=3, I will store beds = [1,2,3]. Now if a user searches for homes with say at least two bedrooms, then instead of writing the filter as beds>2, I will write the filter as "beds IN [2]" - and my home above [1,2,3] will qualify - so so will any home with 2 beds [1,2] or 4 beds [1,2,3,4] and so on
Solution 2: It is similar to the first one but instead of creating a list-property, I will actually add attributed (columns) to the home. So a home with 3 bed rooms will have the following attributed/columns/properties: col-bed-1:true, col-bed-2:true, col-bed-3:true. Now if a user searches for homes with say at least two bedrooms, then instead of writing the filter as beds>2, I will write the filter as "col-bed-2 = true" - and my home will qualify - so will any home with 2 beds, 3 beds, 4 beds and so on
I know both solutions will work, but I want to know:
1. Which one is better both from a performance and google pricing perspective
2. Is there a better solution to do this?

I do almost exactly your use case with a python gae app that lists posts with housing advertisements (similar to craigslist). I wrote it in python and searching with a filter is working and straightforward.
You should choose a language: Python, Java or Go, and then use the Google Search API (that has built-in filtering for equalities or inequalities) and build datastore indexes that you can query using the search API.
For instance, you can use a python class like the following to populate the datastore and then use the Search API.
class Home(db.Model):
address = db.StringProperty(verbose_name='address')
number_of_rooms = db.IntegerProperty()
size = db.FloatProperty()
added = db.DateTimeProperty(verbose_name='added', auto_now_add=True) # readonly
last_modified = db.DateTimeProperty(required=True, auto_now=True)
timestamp = db.DateTimeProperty(auto_now=True) #
image_url = db.URLProperty();
I definitely think that you should avoid storing permutations for several reasons: Permutations can explode in size and makes the code difficult to read. Instead you should do like I did and find examples where someone else has already solved an equal or similar problem.
This appengine demo might help you.

Related

Is it possible to re-order query results in memory?

and thanks in advance for any and all help!!
I'm running a query on the datastore that looks like this:
forks = Thing.query(ancestor=user.subscriber_key).filter(
Thing.status==True,
Thing.fork_of==thing_key,
Thing.start_date <= user.day_threshold(),
Thing.level.IN([1,2,3,4,5])).order(
Thing.level)
This query works and returns the results I expect. However, I would like to sort it on one additional field (Thing.last_touched). If I add this to the sort, it won't work because Thing.last_touched is not the property to which the inequality filter is applied. I can't add an additional inequality filter, since we're only allowed one, plus it's not needed (actually, that's why Thing.leve.IN is there.. not needed as a filter, but required for the sort).
So, what I'm wondering is, could I run the query with the filters that I want, and then run code to sort the query results myself? I know I could pull all the parameters I want to sort and store them in dictionaries and sort them that way, but it seems to me there ought to be a way to handle this with the query.
I've searched for days for this but have had no luck.
Just in case you need it, here's the class definition of Thing:
class Thing(ndb.Model):
title = ndb.StringProperty()
level = ndb.IntegerProperty()
fork = ndb.BooleanProperty()
recursion_level = ndb.IntegerProperty()
fork_of = ndb.KeyProperty()
creation_date = ndb.DateTimeProperty(auto_now_add=True)
last_touched = ndb.DateTimeProperty(auto_now=True)
status = ndb.BooleanProperty()
description = ndb.StringProperty()
owner_id = ndb.StringProperty()
frequency = ndb.IntegerProperty()
start_date = ndb.DateTimeProperty(auto_now_add=True)
due_date = ndb.DateTimeProperty()
One of the main reasons that Google AppEngine is so fast even when dealing with insane amounts of data is because of the very limited query options. All standard queries are "scans" over an index, i.e. there is some table (index) that keeps references to your actual data entires in order sorted by ONE of the data's properties. So, let's say you add the following entries:
Thing A: start-date = Wednesday (I'm just going to use weekdays for simplicity)
Thing B: start-date = Friday
Thing C: start-date = Monday
Thing D: start-date = Thursday
Then, AppEngine will create an index that looks like this:
1 - Monday -> Thing C
2 - Wednesday -> Thing A
3 - Thursday -> Thing D
4 - Friday -> Thing B
Now, any query will correspond to a continuous block in this (or another) index. If you, for example, say "All Things with start-date >= Tuesday", it will return entries in row 2 through 4 (i.e. Thing A, Thing D, and Thing B in that exact order!). If you query for "< Thursday", you get 1-2. If you say "> Tuesday and <= Thursday" you get 2-3.
And if you are doing inequality filters on a different property, AppEngine will use a different index.
This is why you can only do one inequality filter and why the sort-order is always also specified by the property that you do an inequality filter of. Because AppEngine is not designed to be able to return items 1, 2, 4 (with a gap*) out of an index, or items 4, 2, 3 (no gap, but out of order).
So, if you need to sort your entries on a different property other than the one you use for inequality filtering, you basically have 3 choices:
Perform your query with the inequality filter, read all results into memory, and sort them in your code afterwards (I think this is what you mean by storing them in a dictionary)
Perform your query WITHOUT the inequality filter, but sorted on the right property. Then, as you loop over the returned entries, simply check the inequality yourself and drop the ones that don't match
Perform your query with the inequality filter and just return the items in the wrong order, and let the client-application worry about sorting them! ;)
Generally I would assume that you have much more unused resources available client-side to do the sorting, so I would probably go for option 3 in most cases. But if you need to sort the entries server-side (e.g. for a mobile-app targeted at older smart-phones), it will depend on the size of your database and the fraction of entries that usually match your inequality filter, whether option 1 or option 2 are better. If your inequality filter only removes a small fraction of the entries, option 2 might be much faster (as it doesn't require any O(>n) sorting), but if you have a huge database of entries and only a very small number of them will match the inequality, definitely go for option 1.
BTW: The talk "App Engine Datastore Under the Covers" from Google I/O 2008 might be a very helpful resource. It's a bit technical, but it gives a great overview of this topic and I consider it must-know information if you want to do anything in AppEngine. Note, though, that this talk is a bit out-dated. There are a bunch more things that you can do with queries now-a-days. But ALL of these extra things (if I understand correctly) are API functions that in the end just generate a set of several simple queries (exactly like the ones described in this talk) and then just combine the results of these in memory in your application (just like you would if you did your own sorting).
*There are some exceptions where AppEngine can generate the intersection of two (or more?) index-scans to drop items from the results, but I don't think that you could use that to change the order of the returned entries.

Firebase + AngularFire -> States?

I'd like to know how I would deal with object states in a FireBase environment.
What do I mean by states? Well, let's say you have an app with which you organize order lists. Each list consists of a bunch of orders, so it can be considered a hierarchical data structure. Furthermore each list has a state which might be one of the following:
deferred
open
closed
sent
acknowledged
ware completely received
ware partially received
something else
On the visual (HTML) side the lists shall be distinguished by their state. Each state shall be presented to the client in its own, say, div-element, listing all the related orders beneath.
So the question is, how do I deal with this state in FireBase (or any other document based database)?
structure
Do I...
... (option 1) use a state-field for each orderlist and filter on the clientside by using if or something similar:
orderlist1.state = open
order1
order2
orderlist2.state = open
order1
orderlist3.state = closed
orderlist4.state = deferred
... (option 2) use the hierarchy of FireBase to classify the orderlists like so:
open
orderlist1
order1
order2
orderlist2
order1
closed
orderlist3
deferred
orderlist4
... (option 3) take a totally different approach?
So, what's the royal road here?
retrieval, processing & visual output of option 2
Since for option 1 the answer to this question is apparantly pretty straight forward (if state == ...) I continue with option 2: how do I retrieve the data in option 2? Do I use a Firebase-object for each state, like so:
var closedRef = new Firebase("https://xxx.firebaseio.com/closed");
var openRef = new Firebase("https://xxx.firebaseio.com/open");
var deferredRef = new Firebase("https://xxx.firebaseio.com/deferred");
var somethingRef = new Firebase("https://xxx.firebaseio.com/something");
Or what's considered the best approach to deal with that sort of data/structure?
There is no universal answer to this question. The "best approach" is going to depend on the particulars of your use case, which you haven't provided here. Specifically, how you will be reading and manipulating the data.
Data architecture in NoSQL is all about working hard on writes to make reads easy. It's all about how you plan to use the data. (It's also enough material for a chapter in a book.)
The advantage to "option 1" is that you can easily iterate all the entire list. Great if your list is measured in hundreds. This is a great approach if you want to fetch the list and manipulate it on the fly on the client side.
The advantage to "option 2" is that you can easily grab a subset of the list. Great if your list is measured in thousands and you will typically be fetching open issues only rather than closed ones. This is great for archiving/new/old lists like yours.
There are other options as well.
Sorted Data using Priorities
Perhaps the most universal approach is to use ordered data. This allows you to query a subset of your records using something like:
new Firebase(URL).startAt('open').endAt('open').limit(10);
This is sufficient in most cases where you have only one criteria, or when you can create a unique identifier from multiple criteria (e.g. 'open:marketing') without difficulty. Examples are scoreboards, state lists like yours, data ordered by timestamps.
Using an index
You can also create custom subsets of your data by creating an index of keys and using that to fetch the others.
This is most useful when there is no identifiable characteristic of your subsets. For example, if I pick them from a list and store my favorites.
I think my this plnkr can help you for this.
Here, click on edit/add and just check the country(order in your case) - State(state in your case) dependent dropdown may be the same as you want.just one single thing you may need to add is filter it.
They both are different tables in db.
You can also get it from git.

GQL query with "like" operator [duplicate]

Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do != but the API does this using a a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
i'm facing the same problem, but i found something on google app engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Altough App Engine does not support LIKE queries, have a look at the properties ListProperty and StringListProperty. When an equality test is done on these properties, the test will actually be applied on all list members, e.g., list_property = value tests if the value appears anywhere in the list.
Sometimes this feature might be used as a workaround to the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post.
You need to use search service to perform full text search queries similar to SQL LIKE.
Gaelyk provides domain specific language to perform more user friendly search queries. For example following snippet will find first ten books sorted from the latest ones with title containing fern
and the genre exactly matching thriller:
def documents = search.search {
select all from books
sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
where title =~ 'fern'
and genre = 'thriller'
limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App engine launched a general-purpose full text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here , it is like a Datastore access API. There is a FAQ with this question specifically, here is the answer
How do I do a like query (LIKE "foo%")
You can do something like a startWith, or endWith if you reverse the order when stored and searched. You do a range query with the starting value you want, and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
init.py#354">http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/init.py#354
It works!
class Article(search.SearchableModel):
text = db.TextProperty()
...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with GAE Datastore low-level Java API. Me and works perfectly
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filter);
In general, even though this is an old post, a way to produce a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop results in python (or Java) for elements containing what you're looking for.
Let's say you want to filter users given a q='luigi'
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),self.user_model.email >= q.lower(),self.user_model.username >= q.lower()))
for _qry in qry:
if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
users.append(_qry)
It is not possible to do a LIKE search on datastore app engine, how ever creating an Arraylist would do the trick if you need to search a word in a string.
#Index
public ArrayList<String> searchName;
and then to search in the index using objectify.
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list with all the items that contain the world you did on the search
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly means that it's not prohibitively expensive - both price-wise and performance-wise - to create and updates indexes) then Relation Index Entity (RIE) may be the answer.
Yes, you will have to build additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to play around (one is 5000 limit on the length of list property in GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Ojbectify and RIE with Python posts.
"Like" is often uses as a poor-man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.

Co-occurrence of words in documents with Google big table

Given document-D1: containing words (w1,w2,w3)
and document D2 and words (w2,w3..)
and document Dn and words ( w1,w2, wn)
Can I structure my data in big table to answer the questions like:
which words occur most frequently with w1,
or which words occur most frequently with w1 and w2.
What I am trying to achieve is to find the third word Wx (suggestion) which ocures most frequently in documents togehter with given words W1 and W2
I know the solution in SQL, but is it possible with google-big table?
I know I would have to build my indices by myself, the question is how should I structure them to avoid index explosion
thanks
almir
The only way to do this that I'm aware of is to index all 3-tuples of words, with their counts. Your kind would look something like this:
class Tuple(db.Model):
words = db.StringListProperty()
count = db.IntegerProperty()
Then, you need to insert or update the appropriate tuple entity for each set of 3 unique words in your text. Eg, the string "the king is dead" would result in the tuples (the, king, is), (the, king, dead), (the, is, dead), (king, is, dead)... This obviously results in an exponential explosion in entries, but I'm not aware of any way around that for what you want to do.
To find the suggestions, you'd do something like this:
q = Tuple.all().filter('word =', w1).filter('word =', w2).order('-count')
In the broader sense of recommendation algorithms, however, there is a lot of research into more efficient ways to do this. It's an open question, as evidenced by the existence of the Netflix challenge.
Using list-properties and merge-join is the best way to answer set membership questions in Google App Engine: Building Scalable, Complex Apps on App Engine.
You could setup your model as follows:
class Document(db.Model):
word = db.StringListProperty()
name = db.StringProperty()
...
doc.word = ["google", "app", "engine"]
Then it would be easy to query for co-occurrence. For example, which documents have the words google and engine?
results = db.GqlQuery(
"SELECT * FROM Documents "
"WHERE word = 'google'"
" and word = 'engine'")
docs = [d.name for d in results]
There are some limitations, though. From the presentation:
Index writes are done in parallel on
Bigtable Fast-- e.g., update a list
property of 1000 items with 1000 row
writes simultaneously! Scales linearly
with number of items Limited to 5000
indexed properties per entity
But queries must unpackage all result
entities When list size > ~100, reads
are too expensive! Slow in wall-clock
time Costs too much CPU
You could also create a model of words and save in the StringListProperty only their keys, but depending on the size of your documents even that would not be feasible.
There is nothing inherent to the AppEngine datastore that will help you with this problem. You will need to index the words in the documents programatically.

Google App Engine: Is it possible to do a Gql LIKE query?

Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do != but the API does this using a a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
i'm facing the same problem, but i found something on google app engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Altough App Engine does not support LIKE queries, have a look at the properties ListProperty and StringListProperty. When an equality test is done on these properties, the test will actually be applied on all list members, e.g., list_property = value tests if the value appears anywhere in the list.
Sometimes this feature might be used as a workaround to the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post.
You need to use search service to perform full text search queries similar to SQL LIKE.
Gaelyk provides domain specific language to perform more user friendly search queries. For example following snippet will find first ten books sorted from the latest ones with title containing fern
and the genre exactly matching thriller:
def documents = search.search {
select all from books
sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
where title =~ 'fern'
and genre = 'thriller'
limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App engine launched a general-purpose full text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here , it is like a Datastore access API. There is a FAQ with this question specifically, here is the answer
How do I do a like query (LIKE "foo%")
You can do something like a startWith, or endWith if you reverse the order when stored and searched. You do a range query with the starting value you want, and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
init.py#354">http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/init.py#354
It works!
class Article(search.SearchableModel):
text = db.TextProperty()
...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with GAE Datastore low-level Java API. Me and works perfectly
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filter);
In general, even though this is an old post, a way to produce a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop results in python (or Java) for elements containing what you're looking for.
Let's say you want to filter users given a q='luigi'
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),self.user_model.email >= q.lower(),self.user_model.username >= q.lower()))
for _qry in qry:
if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
users.append(_qry)
It is not possible to do a LIKE search on datastore app engine, how ever creating an Arraylist would do the trick if you need to search a word in a string.
#Index
public ArrayList<String> searchName;
and then to search in the index using objectify.
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list with all the items that contain the world you did on the search
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly means that it's not prohibitively expensive - both price-wise and performance-wise - to create and updates indexes) then Relation Index Entity (RIE) may be the answer.
Yes, you will have to build additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to play around (one is 5000 limit on the length of list property in GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Ojbectify and RIE with Python posts.
"Like" is often uses as a poor-man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.

Resources