I am trying to efficiently return a list of entities, along with their referenced entities, in a template view. For example, suppose we are working with two kinds:
from google.appengine.ext import ndb

class Event(ndb.Model):
    """A fun activity or event"""
    name = ndb.StringProperty()
    venue = ndb.KeyProperty()

class Venue(ndb.Model):
    """The venue of an event"""
    name = ndb.StringProperty()
    address = ndb.StringProperty()
The Event kind references Venue via an ndb.KeyProperty(). To display a list of events and their respective venues into a template, I can first do this query:
# we can fetch this from memcache
events = Event.query().fetch()
Then, in my view:
{% for event in events %}
Event Name: {{event.name}}
Event Venue: {{event.venue.get().name}} # is this line costly?
{% endfor %}
With this method, I think there will be a get() call for each event's venue. If that is true, it sounds expensive. Assuming there are 100 events, each page load would incur 100 event.venue.get() calls. This means that a modest 10,000 page views per day would incur 10,000 * 100 = 1,000,000 get() requests. Does that sound correct?
Is this the best approach to this problem? If not, what options can I consider?
First, depending on the total number of venues in your dataset, they may all easily fit into Memcache. So unless a venue is modified, you can go for days without touching the datastore - regardless of the number of page views. Make sure you use Memcache for your venues too.
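Note that ndb already caches individual get() results in memcache automatically, but query results are not auto-cached. Here is a minimal sketch of explicitly caching the venue list (the cache key and one-hour expiry are illustrative choices):
from google.appengine.api import memcache

def get_all_venues():
    # Try memcache first; fall back to the datastore on a miss.
    venues = memcache.get('all_venues')
    if venues is None:
        venues = Venue.query().fetch()
        # Cache for an hour; delete this key whenever a venue is modified.
        memcache.set('all_venues', venues, time=3600)
    return venues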
Second, a more efficient way to retrieve entities is with a batch request. Loop through your events and collect the set of venue keys you need (which, by the way, may be smaller than the number of events if several events happen in the same venue; I don't see that deduplication in your code), then issue a single batch request for all venues.
Here is the Python code to fetch all of the venue names:
venue_keys = set(event.venue for event in events)
venues = ndb.get_multi(list(venue_keys))
# Map each venue key to its name, skipping keys whose entity no longer exists.
venue_name = {venue.key: venue.name for venue in venues if venue is not None}
Then, in your template, you can use:
Event Venue: {{ venue_name.get(event.venue, 'No venue') }}
I have a use case that I could use some advice on.
We publish multiple products, each of which has its own subtree on the site. Generally, a piece of content gets published to just a single product, e.g. a news article gets published to product A and can be accessed at one URL.
However, sometimes we have content that we want to publish to multiple products, e.g. a single news article gets published to products A, B, and C and will be available at 3 different URLs.
With our current CMS we end up doing this by copying and pasting the content, which is a hassle for editors, especially if the content needs to be updated.
An ideal scenario would be one where an editor edits the content in one place and specifies the products to publish to, and the content is then served at more than one URL, each with a product-specific template.
It seems that RoutablePageMixin could be useful here, but I'm not sure how to handle letting the editor specify the destination products and making the routing aware of that choice.
Has anyone solved a similar problem using Wagtail?
I have solved a similar problem in Wagtail; RoutablePageMixin is the key to solving it.
If you have /blog/A/slug-product/, /blog/B/slug-product/, and /blog/C/slug-product/, you can extract the slug value slug-product from each URL, then use that value to look up the right content in your db.
# Import path for Wagtail 2.x
from wagtail.contrib.routable_page.models import RoutablePageMixin, route

class BlogPage(RoutablePageMixin, Page):

    def get_posts(self):
        return PostPage.objects.descendant_of(self).live()

    @route(r'^(\d{4})/(\d{2})/(\d{2})/(.+)/$')
    def post_by_date_slug(self, request, year, month, day, slug, *args, **kwargs):
        post_page = self.get_posts().filter(slug=slug).first()
        return Page.serve(post_page, request, *args, **kwargs)
As you can see, I did not use the date info from the URL, only the slug value, to get the blog post object; you can follow this pattern and use a regex to match whatever URL you want.
If the slug values in the URLs are also different, this solution might not work very well, but in most cases it works fine.
I have written a blog post about how to use RoutablePageMixin to make a page routable; you can check this link if you want to learn more about RoutablePageMixin.
Routable Page
Rather than thinking of your news articles as being child objects of one or more products, it might help to think of them as one big pool of news articles which are categorised by product. Your product page will then effectively be a filtered index page of news articles.
Here's how I'd model it:
If you want your news articles to exist at a canonical URL that's independent of any particular category, or you want to make use of page moderation and/or previewing, then define NewsArticle as a page model; otherwise, define it as a snippet or a ModelAdmin-managed model.
On the NewsArticle model, have an InlinePanel where editors can associate as many related products as required:
# Import paths for Wagtail 2.x
from django.db import models
from modelcluster.fields import ParentalKey
from wagtail.core.models import Page, Orderable
from wagtail.core.fields import RichTextField
from wagtail.admin.edit_handlers import FieldPanel, InlinePanel, PageChooserPanel

class NewsArticle(Page):
    body = RichTextField()
    date = models.DateField()

    content_panels = Page.content_panels + [
        FieldPanel('body'),
        FieldPanel('date'),
        InlinePanel('related_products', label="Related products"),
    ]

class NewsArticleRelatedProduct(Orderable):
    news_article = ParentalKey(NewsArticle, related_name='related_products')
    product = models.ForeignKey(ProductPage, on_delete=models.CASCADE, related_name='news_articles')

    panels = [
        PageChooserPanel('product'),
    ]
On your ProductPage model, add a method that returns a queryset of news items, filtered and sorted appropriately:
class ProductPage(Page):
    # ...
    def get_news_articles(self):
        # Follow the reverse relation through NewsArticleRelatedProduct
        # back to the live NewsArticle pages themselves.
        return NewsArticle.objects.live().filter(related_products__product=self).order_by('-date')
You can then loop over the news articles in your product page template, using a tag like {% for news_article in page.get_news_articles %}.
I am writing a system on app engine that collects "samples" and provides services for querying and analyzing the samples. The data model for a sample looks similar to this:
from google.appengine.ext import ndb

class Sample(ndb.Model):
    category = ndb.StringProperty()
    name = ndb.StringProperty()
    data = ndb.JsonProperty()
    timestamp = ndb.DateTimeProperty()
    tags = ndb.StringProperty(repeated=True)
As you can see, for each sample there is a set of string tags. For example something like:
['CustomerA', '2.0.5', 'featureX', 'logTypeB', ...]
I have a handler that allows querying over all samples in the system based upon filters on the base properties and including a set of tags to require. Note: the results set can be very large, so the query supports paging/limits so I return data a bit at a time. That all works.
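(For reference, that kind of paging is typically done with ndb query cursors; here is a minimal sketch, where the function name and the page size of 20 are illustrative:)
from google.appengine.datastore.datastore_query import Cursor

def get_sample_page(category, tags, cursor_str=None, page_size=20):
    # Build the filtered query; each tag filter requires the sample to contain that tag.
    query = Sample.query(Sample.category == category)
    for tag in tags:
        query = query.filter(Sample.tags == tag)
    start = Cursor(urlsafe=cursor_str) if cursor_str else None
    samples, next_cursor, more = query.fetch_page(page_size, start_cursor=start)
    # Return the page plus an opaque cursor the client can send back for the next page.
    return samples, next_cursor.urlsafe() if (more and next_cursor) else None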
Now when I am putting a user interface on top of this I would like a way to present the user with an autocomplete field for entering additional tags to further filter the results. So for example if they have restricted it down to samples with the following tags:
Sample(..., tags=['CustomerA', '2.0.5', 'featureX'])
Sample(..., tags=['CustomerA', '2.0.5', 'featureY'])
Sample(..., tags=['CustomerB', '2.0.5', 'featureX'])
Sample(..., tags=['CustomerB', '2.0.5', 'featureX'])
Sample(..., tags=['CustomerB', '2.0.5', 'featureY'])
then I want to show them an autocomplete that includes:
['CustomerA', 'CustomerB', '2.0.5', 'featureX', 'featureY']
In other words, I need a handler that can return a unique list of tags that exist in the current result set. The problem is that I can't see any way to do this in App Engine without iterating over all matching samples (potentially very many) and building up a set of unique tags to return.
I could keep a separate set of entities for all tags in the system, but this doesn't solve the problem either. It would allow me to quickly find all the tags that exist over all the Samples in the system, but not restrict it to the set of Samples that pass the current filter.
Any ideas on what I could do to implement this in a reasonable way?
The best way to do this is to save the tags in a separate entity that is used solely for the autocomplete. Since tag names are unique, you can use the tag itself as the entity key. This can be made simple using the ndb model hooks. For example:
class SampleTag(ndb.Model):
    tag = ndb.StringProperty()

class Sample(ndb.Model):
    category = ndb.StringProperty()
    name = ndb.StringProperty()
    data = ndb.JsonProperty()
    timestamp = ndb.DateTimeProperty()
    tags = ndb.StringProperty(repeated=True)

    def _pre_put_hook(self):
        for tag in self.tags:
            # Use the tag string as the key name so each tag exists only once.
            SampleTag.get_or_insert(tag, tag=tag)
Then you can use the values in SampleTag to display in your autocomplete.
This is just an example; it's not very efficient, especially if you have long lists of tags. To improve it, you should determine which tags (if any) have been added since the last save, and only loop through those. You may also wish to use async calls, or delegate the _pre_put routine entirely to a task queue, which will speed up the time it takes to put() your models.
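A minimal sketch of that optimization; the _original_tags attribute used to remember the tags as loaded is a hypothetical helper, not part of ndb:
class Sample(ndb.Model):
    # ... same properties as above ...
    tags = ndb.StringProperty(repeated=True)

    @classmethod
    def _post_get_hook(cls, key, future):
        # Remember the tags as they were when the entity was loaded.
        entity = future.get_result()
        if entity is not None:
            entity._original_tags = set(entity.tags)

    def _pre_put_hook(self):
        # Only insert tags that were added since the entity was loaded.
        new_tags = set(self.tags) - getattr(self, '_original_tags', set())
        futures = [SampleTag.get_or_insert_async(tag, tag=tag) for tag in new_tags]
        ndb.Future.wait_all(futures)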
Also, this doesn't handle deletion, which is a bit trickier, since you cannot know in advance whether a tag is still used elsewhere. To handle that, I'd use a cron job that periodically checks whether your tags are still in use.
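A rough sketch of such a cleanup job; the handler URL is illustrative and would be wired up in cron.yaml:
import webapp2

class CleanupTagsHandler(webapp2.RequestHandler):
    def get(self):
        # Delete SampleTag entities whose tag no longer appears on any Sample.
        to_delete = []
        for sample_tag in SampleTag.query():
            in_use = Sample.query(Sample.tags == sample_tag.tag).get(keys_only=True)
            if in_use is None:
                to_delete.append(sample_tag.key)
        ndb.delete_multi(to_delete)

app = webapp2.WSGIApplication([('/tasks/cleanup_tags', CleanupTagsHandler)])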
When is it appropriate to filter a collection vs. having several collections in Backbone?
For example, consider a music library app. It would have a view for displaying genres and another view for displaying the selected genre's music.
Would you rather make one huge collection with all the music and then filter it or several smaller ones?
Having just one collection would let you add filtering by other attributes as well, but suppose you have tons of music: how do you prevent loading it all when the application starts, if the user is only going to need one genre?
I think the simplest approach is to have a single shared Collection that intelligently fetches data already filtered by genre from the server:
// code simplified and not tested
var SongsCollection = Backbone.Collection.extend({
    model: Song,

    url: function() {
        return '/songs/' + this.genre;
    },

    // Backbone passes (models, options) to initialize.
    initialize: function( models, opts ) {
        this.genre = opts.genre;
    }
});

var mySongsCollection = new SongsCollection( null, { genre: "rock" } );
mySongsCollection.fetch();
You have to make this Collection re-fetch its data from the server any time the user changes the selected genre:
mySongsCollection.genre = "punk";
mySongsCollection.fetch();
It's mostly a design choice, but my vote would be to choose a scheme that loosely reflects the database storing the collections.
If you're likely to be storing data in an SQL database, you will more likely than not have separate tables for songs and genres. You would probably connect them either via a genre_id column in the songs table or, if songs can have more than one genre, via a separate song_genres join table. Consequently, you would probably want separate collections representing genres and the songs within them. In this case, backbone-relational might be a very useful tool for helping keep them straight.
If you're storing information in a non-relational key-value or document store, it might make sense to simply store the genre with the song directly and filter accordingly. In this case, you might end up storing your document keys/queries in such a way that you could access songs either directly (e.g., via songs) or through the genre (e.g., genre:genre_id/songs). If this is the route you go, it may be more convenient to create a single huge collection of songs and set up corresponding filters in both the application and the database environment.
I've got two models: Group and Item. An Item has a list of tags and belongs to a Group.
from google.appengine.ext import db

class Group(db.Model):
    name = db.StringProperty()

class Item(db.Model):
    title = db.StringProperty()
    tags = db.StringListProperty()
    group = db.ReferenceProperty(Group)
So far, typical actions are adding a tag to an Item, removing a tag from an Item, and showing all Items matching a given Group and tag.
What's a good way for getting a list of all unique tags used within a Group?
Ideally I'd like to have a property in Group that reflects the tags used:
class Group(db.Model):
    name = db.StringProperty()
    aggregated_tags = db.StringListProperty()
It would be even better if this included the number of Items that have this tag.
Immediate consistency is not a requirement, i.e. it is fine if the aggregated list of tags does not match the actual list of tags in use, as long as they become consistent eventually.
Item and Group are not in the same entity group, so I can't have a transaction that updates the Item and the Group at the same time.
The best way to do this is to maintain the list yourself. Update the group's tags whenever you add or remove one from an item; you can do this in a task queue task if you want it to happen asynchronously.
Alternatively, you could write a mapreduce that you run periodically that recalculates the tag set for every group - in fact, this is pretty much a classic use-case for mapreduce.
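A minimal sketch of the task queue approach, using the deferred library (which must be enabled for your app); the recalculation below simply rebuilds the whole tag set for one group, and using a dict of counts instead of a set would also give you the per-tag Item counts you mentioned:
from google.appengine.ext import db, deferred

def update_group_tags(group_key):
    # Rebuild the aggregated tag list for a single Group.
    group = db.get(group_key)
    tags = set()
    for item in Item.all().filter('group =', group):
        tags.update(item.tags)
    group.aggregated_tags = sorted(tags)
    group.put()

# After adding or removing a tag on an item:
deferred.defer(update_group_tags, item.group.key())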
I have a persistent class stored in a GAE datastore. I have removed one of the attributes from the class. The new records in this table show a value <none> for the removed attribute. But is there a way I can completely drop this column off the table?
Thanks.
Added the following 'migration' code according to moraes' suggestion, but it isn't achieving the desired result:
PersistenceManager pm = PMF.get().getPersistenceManager();
try {
    Query q = pm.newQuery(UserLogin.class);
    Collection<UserLogin> list = (Collection<UserLogin>) q.execute();
    for (UserLogin obj : list) {
        obj.setLoginDate(obj.getLoginDate());
    }
    pm.makePersistentAll(list);
} finally {
    pm.close();
}
I found the answer to this problem in this Article:
http://code.google.com/appengine/articles/update_schema.html
"Removing Deleted Properties from the Datastore
If you remove a property from your model, you will find that existing entities still have the property. It will still be shown in the admin console and will still be present in the datastore. To really clean out the old data, you need to cycle through your entities and remove the data from each one.
Make sure you have removed the properties from the model definition.
If your model class inherits from db.Model, temporarily switch it to inherit from db.Expando. (db.Model instances can't be modified dynamically, which is what we need to do in the next step.)
Cycle through existing entities (like described above). For each entity, use delattr to delete the obsolete property and then save the entity.
If your model originally inherited from db.Model, don't forget to change it back after updating all the data."
And here is an example with code:
http://sandrylogan.wordpress.com/2010/12/08/delattr/
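For the UserLogin model above, a minimal sketch of that procedure might look like this (old_property is a placeholder for the removed attribute's name):
from google.appengine.ext import db

# Temporarily inherit from db.Expando instead of db.Model.
class UserLogin(db.Expando):
    pass

for entity in UserLogin.all():
    if hasattr(entity, 'old_property'):
        delattr(entity, 'old_property')
        entity.put()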
If you are using ndb (and you probably should), you can easily delete properties by deleting them from entity._properties:
for entity in MyModel.query():
    if 'old_property' in entity._values:
        del entity._properties['old_property']
        del entity._values['old_property']
        entity.put()
Or you could make it faster by using an asynchronous query map:
@ndb.tasklet
def cleanup(entity):
    if 'old_property' in entity._values:
        del entity._properties['old_property']
        del entity._values['old_property']
        yield entity.put_async()

MyModel.query().map(cleanup)
There is no concept of a "table" in the datastore. Each entity can have arbitrary properties that don't follow a common schema. The only "schema" is in your model code, and existing records don't change automatically when you change your models.
So, to delete the property from existing records, you need to iterate over all records and re-save them without the property.
The datastore viewer gets its list of columns from the datastore stats, which are updated on a regular basis. If you've removed that column from every entity that had it, wait a day or two and the datastore viewer will stop showing it.