How can I fetch the latest entry of a model newly put into NDB? - google-app-engine

How can I get the latest entry of a model newly put into NDB?
1: If I use the same parent key, how do I do it?
I see the documentation says:
Entities whose keys have the same root form an entity group or group.
If entities are in different groups, then changes to those entities
might sometimes seem to occur "out of order". If the entities are
unrelated in your application's semantics, that's fine. But if some
entities' changes should be consistent, your application should make
them part of the same group when creating them.
Does this mean that, with the same parent key, the order is the insertion order?
But how do I get the last one?
2: If I don't use the same parent key (the model is the same), how do I do it?

If you're OK with eventual consistency (i.e. you might not see the very latest one immediately) you can just add a DateTimeProperty with auto_now_add=True and then run a query sorting by that property to get the latest one. (This is also approximate since you might have several entities saved close together which are ordered differently than you expect.)
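A minimal sketch of that approach (the Entry model here is hypothetical, just to show the auto_now_add property and the ordering query):

from google.appengine.ext import ndb

class Entry(ndb.Model):
    # Hypothetical model; the important part is the timestamp set on creation.
    body = ndb.StringProperty()
    created = ndb.DateTimeProperty(auto_now_add=True)

def get_latest_entry():
    # Eventually consistent: a just-written entity may not appear immediately.
    return Entry.query().order(-Entry.created).get()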
If you need it to be exactly correct, the only way I can see is to create an entity whose job it is to hold a reference to the latest entry, and update that entity in the same transaction as the entry you're creating. Something like:
class LatestHolder(ndb.Model):
    latest = ndb.KeyProperty(kind='Entry')

# code to update:
@ndb.transactional(xg=True)
def put_new_entry(entry):
    entry.put()  # put first so entry.key is complete
    holder = LatestHolder.get_or_insert('fixed-key')
    holder.latest = entry.key
    holder.put()
Note that I've used a globally fixed key name here with no parent for the holder class. This is a bottleneck; you might prefer to make several LatestHolder entities with different parents if your "latest entry" only needs to be from a particular parent, in which case you just pass a parent key to get_or_insert.
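If you go the per-parent route, a rough sketch might look like this (put_new_entry_for_parent and the 'latest' key name are made-up names, not from the original answer):

@ndb.transactional(xg=True)  # xg only needed if entry is outside the parent's group
def put_new_entry_for_parent(entry, parent_key):
    entry.put()
    # One holder per parent avoids a single global bottleneck.
    holder = LatestHolder.get_or_insert('latest', parent=parent_key)
    holder.latest = entry.key
    holder.put()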

Related

How to ensure isolation with non-ancestor query

I want to create a user using ndb such as below:
def create_user(self, google_id, ....):
    user_keys = UserInformation.query(UserInformation.google_id == google_id).fetch(keys_only=True)
    if user_keys:  # check whether the user exists.
        # already created
        ...(SNIP)...
    else:
        # create new user entity.
        UserInformation(
            # primary key is an incomplete key
            google_id = google_id,
            facebook_id = None,
            twitter_id = None,
            name =
            ...(SNIP)...
        ).put()
If this function is called twice at the same time, two users are created ("isolation" is not ensured between the query and the put()).
So I added @ndb.transactional to the function above.
But then the following error occurs:
BadRequestError: Only ancestor queries are allowed inside transactions.
How do I ensure isolation with a non-ancestor query?
The ndb library doesn't allow non-ancestor queries inside transactions. So if you make create_user() transactional you get the above error because you call UserInformation.query() inside it (without an ancestor).
If you really want to do that you'd have to place all your UserInformation entities inside the same entity group by specifying a common ancestor and make your query an ancestor one. But that has performance implications, see Ancestor relation in datastore.
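A rough sketch of that ancestor-based variant (user_root_key is a made-up helper name; the common ancestor doesn't need to correspond to an existing entity):

from google.appengine.ext import ndb

def user_root_key():
    # Artificial common ancestor for all UserInformation entities.
    return ndb.Key('UserRoot', 'root')

@ndb.transactional
def create_user(google_id, **props):
    # Ancestor queries are allowed inside transactions.
    existing = UserInformation.query(
        UserInformation.google_id == google_id,
        ancestor=user_root_key()).get(keys_only=True)
    if existing:
        return existing
    user = UserInformation(parent=user_root_key(), google_id=google_id, **props)
    return user.put()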
Otherwise, even if you split the function in 2, one non-transactional making the query followed by a transactional one just creating the user - which would avoid the error - you'll still be facing the datastore eventual consistency, which is actually the root cause of your problem: the result of the query may not immediately return a recently added entity because it takes some time for the index corresponding to the query to be updated. Which leads to room for creating duplicate entities for the same user. See Balancing Strong and Eventual Consistency with Google Cloud Datastore.
One possible approach would be to check later/periodically if there are duplicates and remove them (eventually merging the info inside into a single entity). And/or mark the user creation as "in progress", record the newly created entity's key and keep querying until the key appears in the query result, when you finally mark the entity creation as "done" (you might not have time to do that inside the same request).
Another approach would be (if possible) to determine an algorithm to obtain a (unique) key based on the user information and just check if an entity with such a key exists instead of making a query. Key lookups are strongly consistent and can be done inside transactions, so that would solve your duplicates problem. For example you could use the google_id as the key ID. Just an example, as that's not ideal either: you may have users without a google_id, users may want to change their google_id without losing other info, etc. Maybe also track the user creation in progress in the session info to prevent repeated attempts to create the same user in the same session (but that won't help with attempts from different sessions).
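A sketch of that key-based approach, using google_id as the key ID (with the caveats just mentioned):

from google.appengine.ext import ndb

@ndb.transactional
def create_user(google_id, **props):
    # Key lookups are strongly consistent and allowed inside transactions.
    key = ndb.Key(UserInformation, google_id)
    user = key.get()
    if user is None:
        user = UserInformation(id=google_id, **props)
        user.put()
    return user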
For your use case, perhaps you could use ndb models' get_or_insert method, which according to the API docs:
Transactionally retrieves an existing entity or creates a new one.
So you can do:
user = UserInformation.get_or_insert(*args, **kwargs)
without risking the creation of a new user.
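For example, again assuming google_id is usable as the key name (an assumption, not something the question guarantees):

user = UserInformation.get_or_insert(
    google_id,              # key name to retrieve or create; must be a string
    google_id=google_id,
    facebook_id=None,
    twitter_id=None)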
The complete docs:
classmethod get_or_insert(*args, **kwds)
Transactionally retrieves an existing entity or creates a new one.
Positional Args:
name: Key name to retrieve or create.
Keyword Arguments:
namespace – Optional namespace.
app – Optional app ID.
parent – Parent entity key, if any.
context_options – ContextOptions object (not keyword args!) or None.
**kwds – Keyword arguments to pass to the constructor of the model class if an instance for the specified key name does not already exist. If an instance with the supplied key_name and parent already exists, these arguments will be discarded.
Returns: Existing instance of Model class with the specified key name and parent, or a new one that has just been created.

Multiple objects with same FK in context.entity.local [Entity Framework]

I have some problems regarding entity framework and saving to the db.
As my current program works, it deserializes a JSON object, which results in a list of objects that match the database. Each of these objects looks like this:
Every object is a parent with a relation to one child-object.
Each child object have a relation to one or many parent-objects.
After the deserialization is complete, each child object is created as a new object for every parent (meaning I get several instances of the same object).
When I try to save the objects to the db, this of course doesn't work, since I'm trying to insert many child objects with the same PK. I can clearly see that context.childentity.local contains many objects with the same PK.
Is there any easy way to solve this issue? Can I in some way tell EF to refer all duplicates to the same object?
Best regards Anton

Is there any side effect of not having a physical entity for it to act as parent key

If I go through the Google App Engine tutorial, I can see their examples seem to encourage us to have a parent for entities.
Hence, I have the following workable code for user creation (with email as the unique key):
def parent_key():
    return ndb.Key('parent', 'parent')

class User(ndb.Model):
    email = ndb.StringProperty(required = True)
    timestamp = ndb.DateTimeProperty(required = True)

class RegisterHandler2(webapp2.RequestHandler):
    def get(self):
        email = self.request.get('email')
        user_timestamp = int(time.time())
        user = User.get_or_insert(email, parent=parent_key(), email=email, timestamp=datetime.datetime.fromtimestamp(user_timestamp))
Note that the parent entity doesn't physically exist.
Although the above code runs totally fine, I was wondering whether any problems can occur if the parent entity doesn't physically exist?
One of my concerns about not having a parent is eventual consistency. After a write operation, I want my read operation to be able to fetch the latest written value. I'm using User.get_or_insert to write (and read), and User.get_by_id to read only.
I want User.get_by_id in the next request to return the latest value after I execute User.get_or_insert. To achieve strong consistency, is a parent key important?
There are no problems as long as you don't actually need this parent entity.
You should not make a decision to use parent entities lightly. In fact, using entity groups (parent-child entities) limit the number of entities you can update per second and makes it necessary to know the parent key to retrieve a child entity.
You may run into serious problems. For example, if the "User" entities are children of some parent entity, and all other entities are children of "User" entities, that turns all of your data into one big entity group. Assuming your app is fairly active, you will see datastore operation failures because of this performance limitation.
Note also that a key of an entity gets longer if you have to include a key of a parent entity into it. If you create a chain of entities (e.g. parent -> user -> album -> photo), a key for each "photo" entity will include a key for album, a key for user and a key for parent entity. It becomes a nightmare to manage and requires much more storage space.
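To make that concrete, here is a sketch of how such a chained key would have to be built and carried around (the kinds and IDs are purely illustrative):

from google.appengine.ext import ndb

# Every child key repeats the full chain of its ancestors.
photo_key = ndb.Key('parent', 'parent',
                    'User', 'alice@example.com',
                    'Album', 'holiday-2014',
                    'Photo', 42)

# To fetch the photo later you must know (or have stored) the whole path.
photo = photo_key.get()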
Using a parent key that doesn't correspond to an entity that actually has properties (which is what I think you're referring to as a 'physical entity') is a standard technique.
You can even decide later to add properties to that key.
I've been using this technique for years.

Achieving Strong Consistency Using get_or_insert

I have a model like this:
class UserModel(ndb.Model):
    ''' model class which stores all the user information '''
    fname = ndb.StringProperty(required=True)
    lname = ndb.StringProperty(required=True)
    sex = ndb.StringProperty(required=True, choices=['male', 'female'])
    age = ndb.IntegerProperty(required=True)
    dob = ndb.DateTimeProperty(required=True)
    email = ndb.StringProperty(default=None)
    mobile = ndb.StringProperty(required=True)
    city = ndb.StringProperty(required=True)
    state = ndb.StringProperty(required=True)
Since none of the above fields are unique (not even email, because many people may not have email IDs), I am using the following logic to create a string ID:
1. Take the first two letters of 'state' and change them to upper case.
2. Take the first two letters of 'city' and change them to upper case.
3. Get the count of all records in the database and increment it by one.
4. Append all of them together.
I am using get_or_insert for inserting the entity.
Though adding a user will not happen too often, any kind of clash would be catastrophic: the probability of contention is low, but its impact is very high.
My questions are:
1. Will using get_or_insert guarantee that I will never have duplicate IDs?
2. The get_or_insert documentation says "Transactionally retrieves an existing entity or creates a new one." How can something perform an operation "transactionally" without using an ancestor query?
PS: For several reasons I can't keep all the user entities in the same entity group.
In order to provide transactionality, get_or_insert uses a Datastore transaction. In order to use a query in a transaction it must be an ancestor query, however transactions can also get and put, which don't require a parent to be set on the entity.
However, as @Greg mentioned, you absolutely do not want to use this scheme for generating user IDs. In particular, doing a count on your datastore is incredibly slow, will not scale, and is eventually consistent. Because the query is eventually consistent, it may return a count smaller than the actual count until the indexes catch up (which for a large app will be all the time). This means you could wait several hours before an insert would actually succeed.
If you want to provide a customer ID with a State and City, I would recommend doing the following:
1. Do a put using automatic IDs.
2. Expose to the user a "Customer ID" which is the State + City + ID.
3. When you want to look up a customer given their "Customer ID", just do a get for the ID portion (see the sketch after this list).
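A minimal sketch of that recommendation (function names are illustrative, not from the answer):

from google.appengine.ext import ndb

def create_customer(state, city, **props):
    customer = UserModel(state=state, city=city, **props)
    customer.put()  # the datastore assigns an automatic integer ID
    # Human-readable "Customer ID", e.g. "KA-BA-5629499534213120".
    return '%s-%s-%d' % (state[:2].upper(), city[:2].upper(), customer.key.id())

def get_customer(customer_id):
    # Only the numeric portion is needed for the strongly consistent get.
    numeric_id = int(customer_id.rsplit('-', 1)[-1])
    return UserModel.get_by_id(numeric_id)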
If you keep that ID scheme (for which you honestly don't really need steps 1 and 2, just 3), there is no reason for it to create duplicate IDs. With get_or_insert, it'll look for the exact ID you provide and fetch it if it exists, or simply create it if it doesn't, as explained here. So you CANNOT have duplicate IDs (provided you use this ID as the key in your model). If you follow the link provided, it clearly states that:
The get and subsequent (possible) put operations are wrapped in a transaction to ensure atomicity. This means that get_or_insert() will never overwrite an existing entity, and will insert a new entity if and only if no entity with the given kind and name exists.
And the fact that it does this transactionally means it'll lock up the entity group to make sure you don't have contention. Since you don't seem to have ancestors, I think it'll just lock the entity you're updating.
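Roughly, get_or_insert behaves like this hand-rolled sketch (an illustration of the transactional get-then-put pattern, not the library's actual source):

from google.appengine.ext import ndb

@ndb.transactional
def get_or_insert_sketch(model_class, key_name, **props):
    key = ndb.Key(model_class, key_name)
    entity = key.get()  # strongly consistent lookup by key
    if entity is None:
        entity = model_class(id=key_name, **props)
        entity.put()
    return entity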

NDB Modeling One-to-one with KeyProperty

I'm quite new to ndb but I've already understood that I need to rewire a certain area in my brain to create models. I'm trying to create a simple model - just for the sake of understanding how to design an ndb database - with a one-to-one relationship: for instance, a user and his info. After searching around a lot - found documentation but it was hard to find different examples - and experimenting a bit (modeling and querying in a couple of different ways), this is the solution I found:
from google.appengine.ext import ndb

class Monster(ndb.Model):
    name = ndb.StringProperty()

    @classmethod
    def get_by_name(cls, name):
        return cls.query(cls.name == name).get()

    def get_info(self):
        return Info.query(Info.monster == self.key).get()

class Info(ndb.Model):
    monster = ndb.KeyProperty(kind='Monster')
    address = ndb.StringProperty()

a = Monster(name = "Dracula")
a.put()
b = Info(monster = a.key, address = "Transilvania")
b.put()

print Monster.get_by_name("Dracula").get_info().address
NDB doesn't accept joins, so the "join" we want has to be emulated using class methods and properties. With the above system I can easily reach a property in the second database (Info) through a unique property in the first (in this case "name" - suppose there are no two monsters with the same name).
However, if I want to print a list with 100 monster names and respective addresses, the second database (Info) will be hit 100 times.
Question: is there a better way to model this to increase performance?
If it's truly a one-to-one relationship, why are you creating 2 models? Given your example, the Address entity cannot be shared with any other Monster, so why not put the address details in the Monster?
There are some reasons why you wouldn't.
Address could become large, and it is therefore less efficient to retrieve hundreds of properties when you only need a couple - though projection queries may help there.
You change your mind and you want to see all monsters that live in Transylvania - in which case you would create the Address entity and the Monster would have a key property that points to the Address. This obviously fails when you work out that some monsters can live in multiple places (Werewolves - London, Transylvania, New York ;-) ), in which case you either have a repeated KeyProperty in the Monster or an intermediate entity that points to the monster and the address. In your case I don't think that monsters on the whole have that many documented addresses ;-)
Also if you are uniquely identifying monsters by name you should consider storing the name as part of the key. Doing a Monster.get_by_id("dracula") is quicker than a query by name.
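A sketch of that, reusing the question's Monster model with the name doubling as the key ID (this assumes names really are unique):

# Create the monster with its name as the key ID.
Monster(id="dracula", name="Dracula").put()

# Strongly consistent lookup by key; no query or extra index needed.
dracula = Monster.get_by_id("dracula")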
As I wrote (poorly) in the comment: if point 1 above holds and it is a true one-to-one relationship, I would then create Address as a child entity (Monster is the parent/ancestor in the key) when creating the address. This allows you to:
1. Allow other entities to point to the Address.
2. If you create a bunch of child entities, fetch them all with a single ancestor query.
3. If you have to get a Monster and the entities it owns, again it's an ancestor query (see the sketch after this list).
By contrast, if you have a bunch of entities that should only exist if the Monster instance exists and they are not children, then you have to do queries on all the entity types with KeyProperty's matching the key; and if these entities are not PolyModels, then you have to perform a query for each entity type (and know you need to perform the query on a given entity, which involves a registry of some type, or hard coding things).
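A sketch of that parent/child layout (a variant of the question's Info model; the KeyProperty is dropped because the parent is encoded in the key itself):

class Info(ndb.Model):
    # No KeyProperty needed: Monster is the ancestor in the key.
    address = ndb.StringProperty()

dracula = Monster(id="dracula", name="Dracula")
dracula.put()

Info(parent=dracula.key, address="Transilvania").put()

# Single, strongly consistent ancestor query for everything Dracula owns.
infos = Info.query(ancestor=dracula.key).fetch()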
I suspect what you may be trying could be achieved by using elements described in the link below
Have a look at "Operations on Multiple Keys or Entities" "Expando Models" "Model Hooks"
https://developers.google.com/appengine/docs/python/ndb/entities
(This is probably more a comment than an answer)
