I am getting strange behaviour in the results of App Engine ndb queries.
Models (simplified):
from google.appengine.ext import ndb
from google.appengine.ext.ndb import polymodel

class Trainer(polymodel.PolyModel):
    user = ndb.KeyProperty(kind='User')
A full set of objects (approx. 100-200) is collected with a query:
trainers = Trainer.query()
At this stage, individual trainer objects have a valid user object
that can be obtained by:
user = trainer.user.get()
The trainers collection is split into several intermediate Python lists.
After that, most of the trainer objects have a None value for .user.
This code worked for several years. Has anyone else experienced problems like this with ndb.KeyProperty? Is it possible that key properties get purged when the dataset gets too big?
It turned out that there were some KeyProperty fields with keys that were pointing nowhere (kind of like dangling pointers).
The biggest issue, however, was the fact that Trainer objects were used as arguments in deferred calls, so those objects had to be serialized in some way (presumably using pickling). The associated key properties used to work; maybe because of different timing, that approach no longer does. Passing the urlsafe keys of the trainer objects as arguments to the deferred functions solved the problem.
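For reference, a minimal sketch of the fix (the task function name and its body are hypothetical):

from google.appengine.ext import deferred, ndb

def process_trainer(trainer_urlsafe):
    # Rebuild the key and fetch a fresh entity inside the task,
    # instead of pickling the whole Trainer into the task payload.
    trainer = ndb.Key(urlsafe=trainer_urlsafe).get()
    # ... work with trainer ...

# Enqueue with the urlsafe key instead of the entity itself:
deferred.defer(process_trainer, trainer.key.urlsafe())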
Related
My application runs on Google App Engine and uses Google Cloud Datastore.
I was alerted by one of my users that some entries they are associated with, and had previously been seeing, are not appearing to them.
Indeed, when I query the Datastore for these entities (with a single property filter), they are not returned. I was able to find them via querying on a different property and after writing them to the datastore, the index is updated and they are returned in the query.
Perhaps technically my query is not guaranteed to return the entities, as it is only weakly consistent, but none of the entities were changed recently, and usually any inconsistent results are resolved quite quickly. (It has been several days now.)
So it seems like the index entries for this property on these entities were lost or damaged somehow. What to do? Wait and hope the index will be regenerated? I can write entities for this user to the datastore to regenerate the index...but doing it for all my users is not really an option.
The only similar case I can see on SO is this question: "Seems the indexes are missing for new entities created since some time late June 1st, 2015", which resulted from this incident: https://status.cloud.google.com/incident/appengine/15015. But no similar incident occurred recently, according to the status dashboard.
Have you changed which properties are indexed?
If you have, previously existing entities will only be updated in the indexes the next time they are put.
That could cause older entities to not show up in a query.
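As an illustration, a hedged sketch (the model is hypothetical; assume tag was previously declared with indexed=False):

from google.appengine.ext import ndb

class Entry(ndb.Model):
    # Entities written while this was indexed=False have no index rows,
    # so queries filtering on tag will not return them.
    tag = ndb.StringProperty(indexed=True)

# Re-putting an old entity rebuilds its index rows (entry_key is hypothetical):
entity = entry_key.get()
entity.put()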
An App Engine engineer kindly pointed me towards the root of the issue - my query filter property was on a UserProperty type.
From the docs at https://cloud.google.com/appengine/docs/legacy/standard/python/users/userobjects :
A User value with an email address that does not represent a Google
account at the time it is created will never match a User value that
represents a real user.
My user's email address was not a Google Account, but he likely created one recently, causing user_id() to go from None to an integer id.
This means that from now, when I do a query like this:
from google.appengine.api.users import User

u = User('name@non_google_domain.com')
Entity.all().filter('user_property =', u)
Internally, u is now actually looked up by the combination of name@non_google_domain.com and the integer id, instead of the combination of name@non_google_domain.com and None, causing my entities not to be returned.
Indeed, examining e._entity['user_property'] of a returned entity, the new ones are of the form User('name@non_google_domain.com', _user_id='12345678912345678900') and the old ones are User('name@non_google_domain.com').
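That check, as a snippet (e is any entity returned by the query; the values in the comments are the ones I observed):

raw = e._entity['user_property']
# New entities: User('name@non_google_domain.com', _user_id='12345678912345678900')
# Old entities: User('name@non_google_domain.com')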
UserProperty is no longer recommended for use, because of issues like this.
We have MongoDB data models that are written by multiple systems; currently, a bug in a different system can corrupt a single document in a collection such that it can no longer be mapped to the correct Java object (for example, a missing _class attribute in a subdocument will cause an instantiation exception). When we then query for all documents in the collection using Java, the entire query fails because of the single bad document.
We would like to use an approach which is tolerant of instantiation exceptions; the intent is for any bad documents to be discarded, while still returning objects for all the documents that can be mapped.
Could you please advise the best approach to achieve this outcome?
I think you should be able to mark this field as @Transient in the entity to make Spring Data ignore it in MongoDB communication.
I have a class, let's say Blarkar. Blarkar has an embedded class kar. Sometimes when I query for an instance of Blarkar I want the complete object, but other times I don't need all its embedded objects and their embedded objects. How do I load an object without its embedded objects?
You can't. GAE loads an entity whole or not at all. Generally this is not a problem, and you shouldn't try to optimize unless you know you have a real issue. But if so, you can split your entity into multiple parts, e.g. User and UserExtraStuff.
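A minimal sketch of such a split, assuming ndb (all names here are hypothetical):

from google.appengine.ext import ndb

class User(ndb.Model):
    name = ndb.StringProperty()

class UserExtraStuff(ndb.Model):
    # Stored under the same key id as its User, so it can be fetched on demand.
    bio = ndb.TextProperty()
    photo = ndb.BlobProperty()

# Load only the lightweight part; fetch the rest when actually needed:
user = ndb.Key(User, user_id).get()
extra = ndb.Key(UserExtraStuff, user_id).get()  # user_id is assumed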
There is a special type of query called a projection query, but this is not likely going to be useful - it lets you select some data out of an index without doing a full entity lookup. It's only useful in limited types of inequality queries. The data has to be in the index.
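For completeness, a projection query sketch in ndb (assumes name is an indexed property of the hypothetical User model above):

q = User.query(projection=[User.name])
names = [u.name for u in q]
# Only the projected property is populated on the results; accessing an
# unprojected property raises UnprojectedPropertyError.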
I'm using the App Engine Datastore, but this might be platform-agnostic; no idea.
Assume a database entity called Comment. Each Comment belongs to a User. Every Comment has a date property, pretty standard so far.
I want something that will let me specify a User and get back a dictionary-ish data structure (coming from a Python background, pardon; hash table, map, or however it should be called in this context) where:
keys: every date appearing in the User's comments
values: the Comments that were made on that date.
I guess I could just iterate over a range of dates and build a map like this myself, but I seriously doubt I need to "invent" my own solution here.
Is there a way/tool/technique to do this?
Datastore supports both references and list properties. This lets you build one-to-many relationships in two ways:
Parent (User) has a list property containing keys of Child entities (Comment).
Child has a key property pointing to Parent.
Since you need to limit Comments by date, you'd best go with option two. Then you could query Comments which have date=somedate (or date range) and where user=someuserkey.
There is no native grouping functionality in Datastore, so to also "group" by date, you can add a sort on date to the query. Then, when you iterate over the results and the date changes, you can use/store it as a grouping key.
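A sketch of that pattern in ndb (model and property names are assumed; user_key is assumed to be the ndb.Key of the User):

from collections import defaultdict
from google.appengine.ext import ndb

class Comment(ndb.Model):
    user = ndb.KeyProperty(kind='User')
    date = ndb.DateTimeProperty()
    text = ndb.TextProperty()

comments_by_date = defaultdict(list)
q = Comment.query(Comment.user == user_key).order(Comment.date)
for comment in q:
    # The sort guarantees all comments of one day arrive together.
    comments_by_date[comment.date.date()].append(comment)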
Update
Designing NoSQL databases should be access-oriented (versus data-model-oriented in SQL): for often-used operations you should be getting data out as cheaply (= with as few operations) as possible.
So, as a rule of thumb, you should, in one operation, only get the data that is needed at that moment (= shown on that page to the user). I'm not sure about your app's design, but I doubt you need all of a user's full comments (with text and everything) at one time.
I'd start by saying you shouldn't apologize for having a Python background. App Engine started out supporting only Python. Using the db module, you could have a User entity as the parent of several DailyCommentBatch entities, each the parent of a couple of Comment entities. IIRC, this will keep all related entities stored together (or close).
If you are using NDB (I love it), you may employ a StructuredProperty at either the User or DailyCommentBatch level.
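A sketch of that StructuredProperty variant at the DailyCommentBatch level (ndb; names assumed):

from google.appengine.ext import ndb

class Comment(ndb.Model):
    date = ndb.DateTimeProperty()
    text = ndb.TextProperty()

class DailyCommentBatch(ndb.Model):
    # One batch per user per day; the comments live inline on the batch,
    # so a single get returns the whole day.
    day = ndb.DateProperty()
    comments = ndb.StructuredProperty(Comment, repeated=True)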
I was reading this StackOverflow question about eager loading which led me to this blog post about efficient dereferencing on GAE.
Is it correct that if I dereference two ReferenceProperties that point to the same object in the datastore, the framework doesn't maintain any kind of identity map and performs two separate get requests? The objects returned are also different instances, and changes on one are obviously not reflected on the other.
Isn't this less than ideal? I'm coming from a SQLAlchemy background, where I find the session pattern really intuitive.
That's correct. Guido's new NDB project does perform this mapping, but the current db framework doesn't. The reason for this is what you'd expect: if two different parts of the code fetch and modify the same entity, it could create unwanted side-effects. The intuitive expectation is that if you fetched the object, it's yours and nothing else is going to change it underneath you unless you want it to.
If you're trying to dereference a batch of entities at the same time, you can convert the list of keys into a set first to eliminate duplicate fetches.
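A sketch of that batch pattern with the db API, in the spirit of the linked blog post (Comment.author is an assumed ReferenceProperty):

from google.appengine.ext import db

comments = Comment.all().fetch(100)
# Collect the raw referenced keys without triggering one get per entity:
author_keys = set(Comment.author.get_value_for_datastore(c) for c in comments)
# A single batch get resolves all unique keys at once:
authors = db.get(list(author_keys))
authors_by_key = dict((a.key(), a) for a in authors if a is not None)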