Understanding ListProperty backend behavior in GAE - google-app-engine

I'm trying to understand how you're supposed to access items in a GAE db.ListProperty(db.Key).
Example:
A Magazine db.Model entity has a db.ListProperty(db.Key) that contains 10 Article entities. I want to get the Magazine object and display the Article names and dates. Do I make 10 queries for the actual article objects? Do I do a batch query? What if there's 50 articles? (Don't batch queries rely on the IN operator, which is limited to 30 or fewer elements?)

So you are describing something like this:
class Magazine(db.Model):
ArticleList = db.ListProperty(db.Key)
class Article(db.Model):
ArticleName = db.StringProperty()
ArticleDate = db.DateProperty()
In this case the simplest way to grab the listed articles is to use the Model.get() method, which looks for a key list.
m = Magazine.get() #grab the first record
articles = Article.get(m.ArticleList) #get Articles using key list
for a in articles:
name = a.ArticleName
date = a.ArticleDate
#do something with this data
Depending on how you plan on working with the data you may be better off adding a Magazine reference property to your Article entities instead.

You need to read Modeling Entity Relationships, especially the part about one to many.

Related

yii2 find with() - What is it for?

Maybe this question is very simple but I couldn't understand what is with() in yii2 despite I've read couple of articles about it. What does this mean:
$players = PlayersModel::find()->with("countries")->all();
What is this for? In my database (tables are related) on what purpose can it be used:
Please show me useful aspect of this feature: with()
with() is explained in the Yii 2 Guide.
This method allows to eagerly load the relational data in your query.
In your example there is PlayersModel. I assume there is also ClubsModel that represents data from database table clubs.
Let's say Player belongs to one of the Clubs. There should be defined relation between PlayersModel and ClubsModel. If it's defined in PlayersModel it could be something like:
public function getClub()
{
return $this->hasOne(ClubsModel::className(), ['id' => 'id_club']);
}
So now there is relation named club. Each time you call $model->club (where $model is object of PlayersModel) you get related ClubsModel object.
Now - when you look for specific Player:
$player = PlayersModel::find()->where(['id' => $id])->one();
or (a bit simpler to write):
$player = PlayersModel::findOne($id);
This is one performed SQL query. In next step you want to get the Club of this Player - there is relation already defined so you can call:
$club = $player->club;
But this performes another SQL query - it's called lazy loading.
Let's say you know you need Player data together with his Club data at once - you can use with() to get this:
$player = PlayersModel::find()->where(['id' => $id])->with('club')->one();
It's one SQL query. Now when you call:
$club = $player->club;
There is no need for second query this time because this relational data is already fetched - it's called eager loading.

Google NDB: Best way to read child entities from an entity, repeated property vs regular query?

Let's say i have this really simple parent/child relatiosnship (any Answer class instances always has a Question parent):
class Answer(ndb.Model):
content = ndb.StringProperty()
timestamp = ndb.DateTimeProperty()
def to_message():
"""Returns a protoRPC message object of the answer"""
class Question(ndb.Model):
content = ndb.StringProperty()
answers = ndb.KeyProperty(repeated = True, kind = 'Answer')
def to_message(self):
"""Returns a protoRPC message object of the question"""
The two to message methods are simply used to return a protoRPC object.
The question is: in my to_message method, in the Question class, if i want to fetch all child Answer instances, retrieve them, and use their own to_message method to make them into a nice rpc Message, is it better to:
Iterate over the anwers repeated KeyProperty list
Do a query using a filter on the "parent" property, and iterate over the list it outputs
In terms of NDB access, the first method seems to be the best, but since we're going to go over the free limit anyway, i'm more wondering if the datastore is not more efficient at fetching stuff than i am, iterating over that list.
Edit: The original question has actually a very simple and obvious answer: the first way.
The real question would be, in case I have to filter out some Answer entities based on their attributes (for instance timestamp): is it better to query using a filter, or iterate over the list and use a condition to gather only the "interesting" entities?
With that schema you don't have to query anything because you already have the keys of each answer as a list of keys in question_entity.answers
So you only have to fetch the answers using that keys. Is better if you get all the answers in only one operation.
list_of_answers = ndb.get_multi(question_entity.answers)
(More info at NDB Entities and Keys)
On the other hand, if you model that relationship with a KeyProperty in Answer:
class Answer(ndb.Model):
question = ndb.KeyProperty(Question)
content = ndb.StringProperty()
timestamp = ndb.DateTimeProperty()
def to_message():
"""Returns a protoRPC message object of the answer"""
or with ancestors:
answer = Answer(parent=question_entity.key)
In these cases you should use a normal query for retrieve the answers:
answers = Answer.query(Answer.question == question_entity.key)
or an ancestor query:
answers = Answer.query(ancestor = question_entity.key)
respectively.
This means two jobs: Query the index plus fetching the datastore. In conclusion, in this case the first approach is cheaper for retrieving datastore data.
Using ndb.get_multi on the list of keys to fetch the Answers, and then iterating to call their to_message methods will be the most efficient.

Tracking item order for storage to and retrieval from a DB

I'm trying to figure out how I'm going to 'CRUD' the order of items I have in a group that I'm storing in a database. (Pseudo statement of: select * items from app where group_id = 1;)
My guess is that I just use an numeric field and just increase/decrease the number as more items are added to/removed from the group. I can then just update the items number in this field as they are moved around. However, I've seen this go really badly wrong in an old legacy app where items would get out of sync and you'd have a group where the order ended up something like this:
0,1,1,3,4,5
0,1,1,1,4,5
This wasn't handled very gracefully by the application either, and broke the application necessitating manual intervention to reorder the items in the DB.
Is there a way to avoid this pitfall?
EDIT: I would also maybe want the items available in multiple groups with multiple orders.
I think in that case I would need a many to many relationship for both the group to item relationship and the item to order relationship. /EDIT
I'll be doing this in the Django framework.
I'm not really sure what you are asking; because ordering is one thing, and grouping of related objects is something else entirely.
Databases don't store the order of things, but rather the relationships (grouping) of things. The order of things is a user interface detail and not something that a database should be used for.
In django, you can create a ManyToMany relationship. This essentially creates a "box" where you can add and remove items that are related to a particular model. Here is the example from the documentation:
from django.db import models
class Publication(models.Model):
title = models.CharField(max_length=30)
# On Python 3: def __str__(self):
def __unicode__(self):
return self.title
class Meta:
ordering = ('title',)
class Article(models.Model):
headline = models.CharField(max_length=100)
publications = models.ManyToManyField(Publication)
# On Python 3: def __str__(self):
def __unicode__(self):
return self.headline
class Meta:
ordering = ('headline',)
Here an Article can belong to many Publications, and Publications have one or more Articles associated with them:
a = Article.create(headline='Hello')
b = Article.create(headline='World')
p = Publication.create(title='My Publication')
p.article_set.add(a)
p.article_set.add(b)
p.save()
# You can also add an article to a publication from the article object:
c = Article.create(headline='The Answer is 42')
c.publications.add(p)
To know how many articles belong to a publication:
Publication.objects.get(title='My Publication').article_set.count()

Should the size of entities be as small as possible when I count them by "count()" method?

I'm wondering if I should have a kind only for counting entities.
For example
There is a model like the following.
class Message(db.Model):
title = db.StringProperty()
message = db.StringProperty()
created_on = db.DateTimeProperty()
created_by = db.ReferenceProperty(User)
category = db.StringProperty()
And there are 100000000 entities made of this model.
I want to count entities which category equals 'book'.
In this case, should I create the following mode for counting them?
class Category(db.Model):
category = db.StringProperty()
look_message = db.ReferenceProperty(Message)
Does this small model make it faster to count?
And does it erase smaller memory?
I'm thinking to count them like the following by the way
q = db.Query(Message).filter('category =', 'book')
count = q.count(10000)
Counting 100000000 entities is a very expensive operation on a NoSQL database as the App Engine datastore. You'll probably want to count as you update, or run a map-reduce operation to count after the fact.
App Engine also offers a simple way to query how many entities of each type you have:
https://developers.google.com/appengine/docs/python/datastore/stats
For example, to count all Messages:
from google.appengine.ext.db import stats
kind_stats = stats.KindStat().all().filter("kind_name =", "Message").get()
count = kind_stats.count
Note that stats are updated asynchronously, so they'll lag the actual count.
I think that you have to create another entity like this.
This entity will just count the number of messages by category.
Just change your category to this:
class Category(db.model):
category = db.StringProperty()
totalOfMessages = db.IntegerProperty(default=0)
In the message class you change to reference the category class, just change the category property to:
category = db.ReferenceProperty(Category)
When you create a new Message object, you have to update the counter, increment when you create a new message or decrement if you delete.
The best way to work with counters on GAE is using Sharding Counters
Count is implemented as an index scan that discards all data except the number of records seen . It never looks up the entity, so the size of the entity does not matter.
That being said, counting like this does not scale and is quite costly in a system without a fixed schema. It would likely be better to use another method like a Sharded Counter, MapReduce or Materialized View/Fork Join. If you really want it to scale, this talk is pretty informative: http://www.google.com/events/io/2010/sessions/high-throughput-data-pipelines-appengine.html

Google App Engine: Is it possible to do a Gql LIKE query?

Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do != but the API does this using a a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
i'm facing the same problem, but i found something on google app engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Altough App Engine does not support LIKE queries, have a look at the properties ListProperty and StringListProperty. When an equality test is done on these properties, the test will actually be applied on all list members, e.g., list_property = value tests if the value appears anywhere in the list.
Sometimes this feature might be used as a workaround to the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post.
You need to use search service to perform full text search queries similar to SQL LIKE.
Gaelyk provides domain specific language to perform more user friendly search queries. For example following snippet will find first ten books sorted from the latest ones with title containing fern
and the genre exactly matching thriller:
def documents = search.search {
select all from books
sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
where title =~ 'fern'
and genre = 'thriller'
limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App engine launched a general-purpose full text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here , it is like a Datastore access API. There is a FAQ with this question specifically, here is the answer
How do I do a like query (LIKE "foo%")
You can do something like a startWith, or endWith if you reverse the order when stored and searched. You do a range query with the starting value you want, and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
init.py#354">http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/init.py#354
It works!
class Article(search.SearchableModel):
text = db.TextProperty()
...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with GAE Datastore low-level Java API. Me and works perfectly
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filter);
In general, even though this is an old post, a way to produce a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop results in python (or Java) for elements containing what you're looking for.
Let's say you want to filter users given a q='luigi'
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),self.user_model.email >= q.lower(),self.user_model.username >= q.lower()))
for _qry in qry:
if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
users.append(_qry)
It is not possible to do a LIKE search on datastore app engine, how ever creating an Arraylist would do the trick if you need to search a word in a string.
#Index
public ArrayList<String> searchName;
and then to search in the index using objectify.
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list with all the items that contain the world you did on the search
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly means that it's not prohibitively expensive - both price-wise and performance-wise - to create and updates indexes) then Relation Index Entity (RIE) may be the answer.
Yes, you will have to build additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to play around (one is 5000 limit on the length of list property in GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Ojbectify and RIE with Python posts.
"Like" is often uses as a poor-man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.

Resources