Structuring Google App Engine for strong consistency - google-app-engine

I want to run over this plan I have for achieving strong consistency with my GAE structure. Currently, here's what I have (it's really simple, I promise):
You have a Class (Class meaning Classroom not a programming "class") model and an Assignment model and also a User model. Now, a Class has an integer list property called memberIds, which is an indexed list of User ids. A class also has a string list of Assignment ids.
Anytime a new Assignment is created, its respective Class entity is also updated and adds the new Assignment id to its list.
What I want to do is get new Assignments for a user. What I do is query for all Classes where memberId = currentUserId. Each Class I get back has a list of assignment ids. I use those ids to get their respective Assignments by key. After months with this data model, I just realized that I might not get strong consistency with this (for the Class query part).
If user A posts an assignment (which consequently updates ClassA), user B who checks in for new assignments a fraction of a second later might not yet see the updated changes to ClassA (right?).
This is undesirable. One solution would be to use ancestor queries, but that is not possible in my case, and entity groups are limited to 1 write per second, which I'm not sure is enough for my case.
So here's what I figured: anytime a new assignment is posted, we do this:
1. Get the respective Class entity.
2. Add the assignment id to the Class.
3. Get the ids of all this Class's members.
4. Fetch all users who are members, by key.
5. A User entity has a list of Classes that the user is a member of (a LocalStructuredProperty, sort of like a dictionary: {"classId" : "242", "hasNewAssignment" : "Yes"}). We flag that class as hasNewAssignment = YES.
Now when a user wants to get new assignments, instead of querying for groups that have a new assignment and which I am a member of, I:
1. Check the User object's list of Classes and see which classes have new assignments.
2. Retrieve those classes by key (strongly consistent so far, right?).
3. Check the assignments list of those classes and retrieve all assignments by key.
So throughout this process, I've never queried. All results should be strongly consistent, right? Is this a good solution? Am I overcomplicating things? Are my read/write costs skyrocketing with this? What do you think?
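The plan above can be sketched roughly as follows. This is a minimal in-memory model, not Datastore code: plain dicts stand in for the three kinds, every access is a lookup by key, and the names post_assignment and new_assignments_for are hypothetical. Resetting the flag after a read is an assumption; the plan above does not say when the flag is cleared.

```python
# In-memory stand-in for the datastore: every access below is a lookup by key
# (a dict lookup), never a query. All names are hypothetical.
classes = {"242": {"member_ids": [1, 2], "assignment_ids": []}}
users = {
    1: {"classes": {"242": {"has_new_assignment": False}}},
    2: {"classes": {"242": {"has_new_assignment": False}}},
}
assignments = {}


def post_assignment(class_id, assignment_id, body):
    """Write path: store the assignment, then flag every member of the class."""
    assignments[assignment_id] = body
    cls = classes[class_id]                      # get by key
    cls["assignment_ids"].append(assignment_id)
    for user_id in cls["member_ids"]:            # one extra write per member
        users[user_id]["classes"][class_id]["has_new_assignment"] = True


def new_assignments_for(user_id):
    """Read path: follow keys from User to Class to Assignment."""
    found = []
    for class_id, flags in users[user_id]["classes"].items():
        if flags["has_new_assignment"]:
            for assignment_id in classes[class_id]["assignment_ids"]:
                found.append(assignments[assignment_id])
            flags["has_new_assignment"] = False  # assumption: reset once read
    return found


post_assignment("242", "a1", "Read chapter 3")
print(new_assignments_for(2))  # ['Read chapter 3']
```

Note the trade-off the sketch makes visible: the read path touches only keys, while each post costs one write per class member.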

Non-ancestor queries are not strongly consistent. Gets by key are strongly consistent.
I think you did the right thing:
Your access is strongly consistent.
Your reads will be cheaper: a get costs half as much as a query that returns one entity.
Your writes will be more expensive: you also need to update all the User entities.
So the cost depends on your usage pattern: how many assignment reads do you have vs new assignment creation.

I think using ancestor queries is a better solution.
Set the ancestor of each Assignment entity to the Class to which the assignment is allotted.
Set the ancestor of each Student entity to the Class to which the student belongs.
This way all the assignments and students in a particular class belong to the same entity group, so strong consistency is guaranteed for any query that deals with only a single class.
N.B. I am assuming that not too many people will be posting assignments into a class at the same time (but any number of people can post assignments into different classes, as they belong to different entity groups).
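To make the idea concrete, here is a small sketch that models keys as paths and an ancestor query as a prefix match on that path. It is an illustration in plain Python, not the Datastore API; make_key, has_ancestor and ancestor_query are hypothetical helpers.

```python
# Keys modelled as tuples of (kind, id) pairs; a parent's key is a prefix
# of each of its descendants' keys. All helper names are hypothetical.
def make_key(*path):
    return tuple(path)


def has_ancestor(key, ancestor):
    """True when `ancestor` is a prefix of `key`."""
    return key[:len(ancestor)] == ancestor


class_a = make_key(("Class", "A"))

entities = {
    make_key(("Class", "A"), ("Assignment", 1)): "Essay",
    make_key(("Class", "A"), ("Student", 7)): "Alice",
    make_key(("Class", "B"), ("Assignment", 2)): "Quiz",
}


def ancestor_query(ancestor, kind):
    """Return entities of `kind` whose key path starts with `ancestor`."""
    return [
        value
        for key, value in entities.items()
        if has_ancestor(key, ancestor) and key[-1][0] == kind
    ]


print(ancestor_query(class_a, "Assignment"))  # ['Essay']
```

Everything under ("Class", "A") forms one group in this model, which is why a query scoped to a single class never needs to look at another class's entities.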

Related

should I put the user in the ancestor path or separately?

My app should contain several users, each of them having a list of objects ( only one user own the object ).
My question is : Would it be better to put an entity User that references the Ids of its objects, or should I put the user as the ancestor of the objects ? Please be kind, I am just beginning with nosql and datastore !
Which approach you take will depend heavily on your access patterns: what makes sense for easy retrieval, the frequency of writes, and so on. Start your design process by building a basic entity-relationship model, then work out what information you need to get to, how frequently it is required, and what security restrictions apply. Then look at how you need to adjust the model to reflect these access use cases, taking into account performance, ease of use, and security requirements.
Which approach you should choose depends mainly on the consistency model (strong vs eventual) you require for your entities. In Google Cloud Datastore, an entity group (an entity and its descendants) is a unit with strong consistency, transactionality, and locality.
You can read more on the topic here and here.
There is one more important thing to take into account. If you model a parent-child relationship between a user and an object, the parent becomes part of the object's key. If you later change the object's owner, you end up with a different object in terms of its key.
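The difference between the two options can be sketched with a toy key type. This is an illustration only, assuming a key is made of (parent, kind, id); the Key namedtuple below is hypothetical and is not the Datastore key class.

```python
from collections import namedtuple

# Hypothetical key model: the parent key is part of the identity.
Key = namedtuple("Key", ["parent", "kind", "id"])

alice = Key(None, "User", "alice")
bob = Key(None, "User", "bob")

# Option 1: the user is the ancestor of the object.
# The same object id under a different owner is a different key.
owned_by_alice = Key(alice, "Item", 42)
owned_by_bob = Key(bob, "Item", 42)
print(owned_by_alice == owned_by_bob)  # False

# Option 2: a root-level object that stores its owner as a property.
# Changing the owner updates a field; the key stays the same.
item_key = Key(None, "Item", 42)
store = {item_key: {"owner": alice}}
store[item_key]["owner"] = bob
print(list(store) == [item_key])  # True
```

With the ancestor approach, transferring ownership means creating a new entity under the new parent and deleting the old one; with the reference approach it is a single property update.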

NDB Modeling One-to-one with KeyProperty

I'm quite new to ndb but I've already understood that I need to rewire a certain area in my brain to create models. I'm trying to create a simple model - just for the sake of understanding how to design an ndb database - with a one-to-one relationship: for instance, a user and his info. After searching around a lot - found documentation but it was hard to find different examples - and experimenting a bit (modeling and querying in a couple of different ways), this is the solution I found:
    from google.appengine.ext import ndb


    class Monster(ndb.Model):
        name = ndb.StringProperty()

        @classmethod
        def get_by_name(cls, name):
            # Query by the (assumed unique) name property.
            return cls.query(cls.name == name).get()

        def get_info(self):
            # Emulate the "join": find the Info entity that points at this Monster.
            return Info.query(Info.monster == self.key).get()


    class Info(ndb.Model):
        monster = ndb.KeyProperty(kind='Monster')
        address = ndb.StringProperty()


    a = Monster(name="Dracula")
    a.put()
    b = Info(monster=a.key, address="Transilvania")
    b.put()
    print(Monster.get_by_name("Dracula").get_info().address)
NDB doesn't accept joins, so the "join" we want has to be emulated using class methods and properties. With the above system I can easily reach a property in the second database (Info) through a unique property in the first (in this case "name" - suppose there are no two monsters with the same name).
However, if I want to print a list with 100 monster names and respective addresses, the second database (Info) will be hit 100 times.
Question: is there a better way to model this to increase performance?
If it's truly a one-to-one relationship, why are you creating two models? Given your example, the Address entity cannot be shared with any other Monster, so why not put the address details in the Monster?
There are some reasons why you wouldn't:
1. The address could become large, and it is less efficient to retrieve hundreds of properties when you only need a couple, though projection queries may help there.
2. You change your mind and want to see all monsters that live in Transylvania, in which case you would create the Address entity and the Monster would have a KeyProperty that points to the Address. This obviously fails when you work out that some monsters can live in multiple places (Werewolves: London, Transylvania, New York ;-) ), in which case you either have a repeated KeyProperty in the Monster or an intermediate entity that points to the Monster and the Address. In your case I don't think that monsters on the whole have that many documented addresses ;-)
Also, if you are uniquely identifying monsters by name, you should consider storing the name as part of the key. Doing a Monster.get_by_id("dracula") is quicker than a query by name.
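As a rough illustration of those two suggestions (address stored on the Monster, name used as the key), here is an in-memory sketch. It is plain Python rather than NDB: the dict stands in for the Monster kind, and the get_by_id and list_names_and_addresses functions here are hypothetical stand-ins, not the NDB methods.

```python
# One dict stands in for the Monster kind. The key is the monster's unique
# name and the address lives on the same record, so no second lookup is needed.
monsters = {
    "dracula": {"name": "Dracula", "address": "Transylvania"},
    "nessie": {"name": "Nessie", "address": "Loch Ness"},
}


def get_by_id(monster_id):
    """Direct lookup by key; no scan over the name property."""
    return monsters.get(monster_id)


def list_names_and_addresses():
    """One pass over the monsters yields both fields."""
    return [(m["name"], m["address"]) for m in monsters.values()]


print(get_by_id("dracula")["address"])  # Transylvania
print(list_names_and_addresses())
```

Listing 100 monsters with their addresses then touches 100 records in total instead of 100 monsters plus 100 separate Info lookups.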
As I wrote (poorly) in the comment: if 1. above holds and it is a true one-to-one relationship, I would create Address as a child entity (with Monster as the parent/ancestor in the key). This allows you to:
1. Let other entities point to the Address.
2. Fetch a bunch of child entities with a single ancestor query, if you create several.
3. Get the Monster and its owned entities, again with an ancestor query.
If you have a bunch of entities that should only exist if the Monster instance exists and they are not children, then you have to run queries on all the entity types with KeyProperties matching the key. If these entities are not PolyModels, you have to perform a query for each entity type (and know that you need to query a given entity type, which involves a registry of some kind, or hard-coding things).
I suspect what you may be trying could be achieved by using elements described in the link below
Have a look at "Operations on Multiple Keys or Entities" "Expando Models" "Model Hooks"
https://developers.google.com/appengine/docs/python/ndb/entities
(This is probably more a comment than an answer)

Using ancestors or reference properties in Google App Engine?

Currently, a lot of my code makes extensive use of ancestors to put and fetch objects. However, I'm looking to change some stuff around.
I initially thought that ancestors helped make querying faster if you knew who the ancestor of the entity you're looking for was. But I think it turns out that ancestors are mostly useful for transaction support. I don't make use of transactions, so I'm wondering if ancestors are more of a burden on the system here than a help.
What I have is a User entity, and a lot of other entities such as say Comments, Tags, Friends. A User can create many Comments, Tags, and Friends, and so whenever a user does so, I set the ancestor for all these newly created objects as the User.
So when I create a Comment, I set the ancestor as the user:
comment = Comment(aUser, key_name = commentId)
Now the only reason I'm doing this is strictly for querying purposes. I thought it would be faster when I wanted to get all comments by a certain user to just get all comments with a common ancestor rather than querying for all comments where authorEmail = userEmail.
So when I want to get all comments by a certain user, I do:
commentQuery = db.GqlQuery('SELECT * FROM Comment WHERE ANCESTOR IS :1', userKey)
So my question is, is this a good use of ancestors? Should each Comment instead have a ReferenceProperty that references the User object that created the comment, and filter by that?
(Also, my thinking was that using ancestors instead of an indexed ReferenceProperty would save on write costs. Am I mistaken here?)
You are right about the write cost: an ancestor is part of the key, which comes "free". Using a reference property will increase your write cost if the reference property is indexed.
Since you query on that reference property, it will need to be indexed.
Ancestors are not only important for transactions. In the HRD (the default datastore implementation), if you don't create each comment with the same ancestor, the queries will not be strongly consistent.
-- Adding Nick's comment ---
Every entity with the same parent will be in the same entity group, and writes to entity groups are serialized, so using ancestors here will slow things down iff you're writing multiple entities concurrently. Since all the entities in a group are 'owned' by the user that forms the root of the group in your instance, though, this shouldn't be a problem - and in fact, what you're doing is actually a recommended design pattern.

Make a MustOverride (abstract) member with EF 4?

I have an abstract class Contact.
It leads to two subclasses:
Company (Title)
Person (FirstName, LastName)
I want to add a computed 'Title' column to the Person table that returns FirstName + ' ' + LastName, which will give me better search options.
So I want the Contact table to have an abstract property Title, which each of the two subclasses implements, so that I will be able to use:
Dim contacts = From c In context.Contacts
Where c.Title.Contains("Nash")
I am pretty sure this is impossible, the question is what is the efficient alternative way?
In my scenario I have a ListBox showing all the Contacts of both Company and Person types, I have a search TextBox and I want the Service query (GetContacts(searchQuery As String)) to query the filtered set against the DB.
Update
After Will's answer, I decided to create a computed column in the Person table, as above.
The question is what would be the most efficient way to implement the WCF-RIA query method:
    Public Function GetContacts(searchQuery As String) As IQueryable(Of Contact)
        'Do here whatever it takes to retrieve from Contacts + People
        'and mix the results of both tables ordered by Title
    End Function
Unfortunately, while there is a way to do this with partial classes, I am 99% sure you cannot mix linq queries that touch entity properties and "POCO" properties defined in a partial class.
The LINQ to Entities context actually converts these queries to SQL, and it cannot handle situations where a particular method isn't directly supported by the context. A common example for L2E is the inability to use enums in your query. Just as it doesn't know how to handle enums, the context certainly doesn't know how to handle your POCO properties when converting to raw SQL.
An option you might want to investigate is to create the computed column within your database, or to run your queries, do the traditional ToArray() to trigger enumeration, and then examine the computed column in memory. This might not be a good solution, depending on the size of your table, however.
So, essentially, you wish to search two disparate types (backed by two different tables) and then combine the results for display to the user.
I would have to say that polymorphism is NOT the best solution. The desire to show them in the UI shouldn't force a design decision all the way down into your type definitions.
I have done something similar a few times before in WPF. I've done it two ways; by using polymorphism in the form of facade types which wrap the models and which can be treated by a common base type, and by treating all the different types in the collection as System.Object.
The first way is okay when you need type safety and the ability to treat different types the same way. The wrappers extend a common base class and are coded to "know" how to handle each of their wrapped types correctly.
The second way is okay when you don't need type safety, such as when exposing a collection to a WPF View where you are displaying them in an ItemsControl, which can figure out the correct DataTemplate to use by the type of each instance in the collection.
I'm not sure which way is best for you, but whichever it is, you should query both your Company and Person tables separately, Union the two result sets, then sort them appropriately.
Pseudocode:
    // Wrapper version
    var results = Company
        .Where(x => x.Title.Contains(searchTerm))
        .Select(x => new CompanyWrapper(x))
        .Cast<BaseWrapper>()
        .Union(
            Person
                .Where(x => x.ComputedTitle.Contains(searchTerm))
                .Select(x => new PersonWrapper(x))
                .Cast<BaseWrapper>());

    // System.Object version
    var results = Company
        .Where(x => x.Title.Contains(searchTerm))
        .Cast<object>()
        .Union(
            Person
                .Where(x => x.ComputedTitle.Contains(searchTerm))
                .Cast<object>());
In both cases, you may not have to downcast specifically. Again, the first gives you type safety if you need it in the UI, the second is simpler and requires less code on the backend but is only useful if you don't require type safety in the UI.
As for sorting, once you've searched and combined your result, you can OrderBy to sort the result, however you will have to provide a function which can perform the ordering. This function will differ depending on which version you choose.
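The search, combine, and sort steps can be sketched independently of EF. The example below is in Python rather than VB or C#, purely as a language-neutral illustration; the Company and Person dataclasses, display_title, and search_contacts are hypothetical names, and the case-insensitive match is an assumption rather than something the original query specifies.

```python
from dataclasses import dataclass


@dataclass
class Company:
    title: str


@dataclass
class Person:
    first_name: str
    last_name: str


def display_title(contact):
    """The ordering function: one title per type, computed for Person."""
    if isinstance(contact, Person):
        return f"{contact.first_name} {contact.last_name}"
    return contact.title


def search_contacts(companies, people, term):
    """Filter both collections, combine them, then sort by display title."""
    needle = term.lower()  # assumption: case-insensitive search
    combined = list(companies) + list(people)
    matches = [c for c in combined if needle in display_title(c).lower()]
    return sorted(matches, key=lambda c: display_title(c).lower())


results = search_contacts(
    [Company("Nash Motors"), Company("Acme")],
    [Person("Steve", "Nash"), Person("Bob", "Ross")],
    "nash",
)
print([display_title(c) for c in results])  # ['Nash Motors', 'Steve Nash']
```

Here display_title plays the role of the function passed to OrderBy: it is the one place that knows how each type produces its title.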

Google Appengine: Is This a Good set of Entity Groups?

I am trying to wrap my head around Entity Groups in Google AppEngine. I understand them in general, but since it sounds like you can not change the relationships once the object is created AND I have a big data migration to do, I want to try to get it right the first time.
I am making an Art site where members can sign up as a regular Member or as one of a handful of non-polymorphic Entity "types" (Artist, Venue, Organization, ArtistRepresentative, etc). Artists, for example, can have Artwork, which can in turn have other Relationships (Gallery, Media, etc). All these things are connected via References and I understand that you don't need Entity Groups to merely do References. However, some of the References NEED to exist, which is why I am looking at Entity Groups.
From the docs:
"A good rule of thumb for entity groups is that they should be about the size of a single user's worth of data or smaller."
That said, I have a couple hopefully yes/no questions.
Question 0: I gather you don't need Entity Groups just to do transactions. However, since Entity Groups are stored in the same region of Big Table, this helps cut down on consistency issues and race conditions. Is this a fair look at Entity Groups and Transactions together?
Question 1: When a child Entity is saved, do any parent objects get implicitly accessed/saved? i.e. If I set up an Entity Group with path Member/Artist/Artwork, if I save an Artwork object, do the Member and Artist objects get updated/accessed? I would think not, but I am just making sure.
Question 2: If the answer to Question 1 is yes, does the accessing/updating only travel up the path and not affect other children. i.e. If I update Artwork, no other Artwork child of Member is updated.
Question 3: Assuming it is very important that the Member and its associated account type entity exist when a user signs up and that only the user will be updating its Member and associated account type Entity, does it make sense to put these in Entity Groups together?
i.e. Member/Artist, Member/Organization, Member/Venue.
Similarly, assuming only the user will be able to update the Artwork entities, does it make sense to include those as well? Note: Media/Gallery/etc which are references to Artwork may be related to lots of Artwork, not just those owned by the user (i.e. many to many relations).
It makes sense to have all the user's bits in an entity group if it works the way I suspect (i.e. Q1/Q2 are "no"), since they will all be in the same region of BigTable. However, adding the Artwork to the entity group seems like it might violate the "keep it small" principle and, honestly, it may not need to be in transactions aside from saving bandwidth/retries when users are uploading artwork images.
Any thoughts? Am I approaching Entity Groups wrong?
0: You do need entity groups for transactions among multiple entities
1: Modifying/accessing children does not modify/access a parent
2: N/A
3: Sounds reasonable. My feeling is, entity groups should not be used unless you need transactions among them.
It is not necessary to have the Artwork as a child for permission purposes. But if you need transactional modifications to them (including, e.g., creation and deletion), it might be better. For example: if you delete an account, you delete the user entity, but before you delete the children you get DeadlineExceeded or the server crashes. Now you have an orphaned Artwork. If you have more than 1,000 Artworks for an Artist, you must delete in batches.
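A minimal sketch of batched deletion, assuming a delete_multi callable that accepts a list of keys (passed in here as a parameter so the sketch stays independent of any SDK). The helper names and the batch size of 500 are hypothetical. Deleting the children first and the parent last is a design choice of this sketch: if the run is interrupted, the parent still exists and the cleanup can be retried.

```python
def batches(keys, size):
    """Yield consecutive slices of `keys`, each with at most `size` items."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def delete_owned(parent_key, child_keys, delete_multi, batch_size=500):
    """Delete children in batches, then the parent. Returns the call count."""
    calls = 0
    for chunk in batches(child_keys, batch_size):
        delete_multi(chunk)
        calls += 1
    delete_multi([parent_key])  # parent goes last so a retry can find it
    return calls + 1


# Usage with a stand-in that only records what it was asked to delete.
recorded = []
call_count = delete_owned("artist-1", list(range(1200)), recorded.append)
print(call_count)                  # 4
print([len(c) for c in recorded])  # [500, 500, 200, 1]
```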
Good luck!
