I want to create a user with ndb like below:
def create_user(self, google_id, ....):
    user_keys = UserInformation.query(
        UserInformation.google_id == google_id).fetch(keys_only=True)
    if user_keys:  # check whether the user already exists
        # already created
        ...(SNIP)...
    else:
        # create a new user entity
        UserInformation(
            # the primary key is an incomplete key
            google_id=google_id,
            facebook_id=None,
            twitter_id=None,
            name=
            ...(SNIP)...
        ).put()
If this function is called twice at the same time, two users are created ("isolation" is not ensured between the query and the put()).
So I added @ndb.transactional to the above function.
But then the following error occurred:
BadRequestError: Only ancestor queries are allowed inside transactions.
How can I ensure isolation with a non-ancestor query?
The ndb library doesn't allow non-ancestor queries inside transactions. So if you make create_user() transactional you get the above error because you call UserInformation.query() inside it (without an ancestor).
If you really want to do that you'd have to place all your UserInformation entities inside the same entity group by specifying a common ancestor and make your query an ancestor one. But that has performance implications, see Ancestor relation in datastore.
Otherwise, even if you split the function in 2, one non-transactional making the query followed by a transactional one just creating the user - which would avoid the error - you'll still be facing the datastore eventual consistency, which is actually the root cause of your problem: the result of the query may not immediately return a recently added entity because it takes some time for the index corresponding to the query to be updated. Which leads to room for creating duplicate entities for the same user. See Balancing Strong and Eventual Consistency with Google Cloud Datastore.
One possible approach would be to check later/periodically if there are duplicates and remove them (eventually merging the info inside into a single entity). And/or mark the user creation as "in progress", record the newly created entity's key and keep querying until the key appears in the query result, when you finally mark the entity creation as "done" (you might not have time to do that inside the same request).
Another approach would be (if possible) to determine an algorithm to obtain a (unique) key based on the user information and just check if an entity with such a key exists instead of making a query. Key lookups are strongly consistent and can be done inside transactions, so that would solve your duplicates problem. For example you could use the google_id as the key ID. That's just an example, and not ideal either: you may have users without a google_id, users may want to change their google_id without losing other info, etc. You could also track an in-progress user creation in the session info to prevent repeated attempts to create the same user in the same session (but that won't help with attempts from different sessions).
For your use case, perhaps you could use ndb models' get_or_insert method, which according to the API docs:
Transactionally retrieves an existing entity or creates a new one.
So you can do:
user = UserInformation.get_or_insert(*args, **kwargs)
without risking the creation of a new user.
The complete docs:
classmethod get_or_insert(*args, **kwds)

    Transactionally retrieves an existing entity or creates a new one.

    Positional Args:
        name: Key name to retrieve or create.

    Keyword Arguments:
        namespace – Optional namespace.
        app – Optional app ID.
        parent – Parent entity key, if any.
        context_options – ContextOptions object (not keyword args!) or None.
        **kwds – Keyword arguments to pass to the constructor of the model
        class if an instance for the specified key name does not already
        exist. If an instance with the supplied key_name and parent already
        exists, these arguments will be discarded.

    Returns:
        Existing instance of Model class with the specified key name and
        parent, or a new one that has just been created.
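Applied to the original question, a minimal sketch (assuming a google_id is always present and is acceptable as a key name; the caveats about key-based keys mentioned earlier still apply):

```python
from google.appengine.ext import ndb

class UserInformation(ndb.Model):
    google_id = ndb.StringProperty()
    facebook_id = ndb.StringProperty()
    twitter_id = ndb.StringProperty()
    name = ndb.StringProperty()

def create_user(google_id, name):
    # get_or_insert is transactional: two concurrent calls with the
    # same key name yield the same entity instead of a duplicate.
    return UserInformation.get_or_insert(
        google_id,            # used as the key name
        google_id=google_id,  # also kept as a plain property
        name=name)
```

Since this is a key-name lookup rather than a query, it is strongly consistent and sidesteps the index-lag window entirely.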
Related
I am learning google.cloud.datastore, and I'd like to know how to delete a property, along with its value, from an entity. Also, is it possible to delete a specific property, or a list of properties, from all entities of a certain kind?
My understanding is that Datastore stores/manipulates data in a row-wise way (entities)?
cheers
Your understanding is correct: all datastore write operations happen at the entity level. So in order to modify one property or a subset of properties, you'd retrieve the entity, modify (or delete) the properties in question and save the entity back.
The exact details depend on the language and library used. From Updating an entity:
To update an existing entity, modify the properties of the entity
and store it using the key:
with client.transaction():
    key = client.key('Task', 'sample_task')
    task = client.get(key)
    task['done'] = True
    client.put(task)
The object data overwrites the existing entity. The entire object is
sent to Cloud Datastore. If the entity does not exist, the update will
fail. If you want to update-or-create an entity, use upsert as
described previously.
Note: To delete a property, remove the property from the entity, then save the entity.
In the above snippet, for example, deleting the done property of the task entity, if existing, would be done like this:
with client.transaction():
    key = client.key('Task', 'sample_task')
    task = client.get(key)
    if 'done' in task:
        del task['done']
        client.put(task)
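For the second part of the question (removing a property from all entities of a kind): there is no server-side "drop column", so you have to iterate over the entities. A rough sketch with google.cloud.datastore (the helper name and batch size are my own choices):

```python
from google.cloud import datastore

client = datastore.Client()

def delete_property_from_kind(kind, prop, batch_size=500):
    """Remove `prop` from every entity of `kind`, saving in batches."""
    to_save = []
    for entity in client.query(kind=kind).fetch():
        if prop in entity:
            del entity[prop]
            to_save.append(entity)
        if len(to_save) >= batch_size:  # put_multi is limited to 500 entities
            client.put_multi(to_save)
            to_save = []
    if to_save:
        client.put_multi(to_save)
```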
If I go through the Google App Engine tutorial, I can see their examples seem to encourage us to have a parent for entities.
Hence, I have the following working code for user creation (with email as the unique identifier):
def parent_key():
    return ndb.Key('parent', 'parent')

class User(ndb.Model):
    email = ndb.StringProperty(required=True)
    timestamp = ndb.DateTimeProperty(required=True)

class RegisterHandler2(webapp2.RequestHandler):
    def get(self):
        email = self.request.get('email')
        user_timestamp = int(time.time())
        user = User.get_or_insert(
            email,
            parent=parent_key(),
            email=email,
            timestamp=datetime.datetime.fromtimestamp(user_timestamp))
Note that the parent entity doesn't physically exist.
Although the above code runs totally fine, I was wondering what problems can occur if the parent entity doesn't physically exist?
One of my concerns about not having a parent is eventual consistency. After a write operation, I want my read operation to be able to fetch the latest written value. I'm using User.get_or_insert to write (and read), and User.get_by_id to read only.
I want that after I execute User.get_or_insert, the next request's User.get_by_id returns the latest value. To achieve strong consistency, is the parent key the important thing?
There are no problems as long as you don't actually need this parent entity.
You should not make the decision to use parent entities lightly. In fact, using entity groups (parent-child entities) limits the number of entities you can update per second and makes it necessary to know the parent key in order to retrieve a child entity.
You may run into serious problems. For example, if entity "User" is a child of some parent entity, and all other entities are children of "User" entities, that turns all of your data into one big entity group. Assuming your app is fairly active, you will see datastore operation failures because of this performance limitation.
Note also that a key of an entity gets longer if you have to include the key of a parent entity in it. If you create a chain of entities (e.g. parent -> user -> album -> photo), the key of each "photo" entity will include a key for the album, a key for the user and a key for the parent entity. It becomes a nightmare to manage and requires much more storage space.
Using a parent key that doesn't correspond to an entity that actually has properties (which is what I think you're referring to as a 'physical entity') is a standard technique.
You can even decide later to add properties to that key.
I've been using this technique for years.
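To connect this to the strong-consistency concern above: key lookups are strongly consistent, but only if you look up the full key, parent included. A sketch using the User model from the question (the email value is made up):

```python
from google.appengine.ext import ndb

# Strongly consistent right after get_or_insert, because this is a key
# lookup - but the parent must be part of the key:
user = User.get_by_id('alice@example.com', parent=ndb.Key('parent', 'parent'))

# A plain User.get_by_id('alice@example.com') builds a different,
# parentless key and would return None.
```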
I'm looking at the GAE example for the datastore here, and among other things this confused me a bit.
def guestbook_key(guestbook_name=DEFAULT_GUESTBOOK_NAME):
    """Constructs a Datastore key for a Guestbook entity with guestbook_name."""
    return ndb.Key('Guestbook', guestbook_name)
I understand why we need the key, but why is 'Guestbook' necessary? Is it so you can query for all 'Guestbook' objects in the datastore? But if you need to search the datastore for a type of object, why isn't there something like query(type(Greeting)), considering that that is the ndb.Model you are putting in?
Additionally, if you are feeling generous: when creating the object you are storing, why do you have to set a parent?
greeting = Greeting(parent=guestbook_key(guestbook_name))
First: GAE Datastore is one big distributed database used by all GAE apps concurrently. To distinguish entities GAE uses system-wide keys. A key is composed of:
Your application name (implicitly set, not visible via API)
Namespace, set via Namespace API (if not set in code, then an empty namespace is used).
Kind of entity. This is just a string and has nothing to do with types at database level. Datastore is schema-less so there are no types. However, language based APIs (Java JDO/JPA/objectify, Python NDB) map this to classes/objects.
Parent keys (afaik, serialised inside key). This is used to establish entity groups (defining scope of transactions).
A particular entity identifier: name (string) or ID (long). They are unique within namespace and kind (and parent key if defined) - see this for more info on ID uniqueness.
See Key methods (java) to see what data is actually stored within the key.
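In ndb terms, this structure is directly visible on the key object itself (a sketch; the kind and ID values are arbitrary):

```python
from google.appengine.ext import ndb

# A key is a path of (kind, identifier) pairs:
key = ndb.Key('Guestbook', 'default_guestbook', 'Greeting', 42)

key.pairs()   # (('Guestbook', 'default_guestbook'), ('Greeting', 42))
key.kind()    # 'Greeting' - the kind of the last pair
key.id()      # 42
key.parent()  # ndb.Key('Guestbook', 'default_guestbook')
```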
Second: It seems that GAE Python API does not allow you to query Datastore without defining classes that map to entity kind (I don't use GAE Python, so I might be wrong). Java does have a low-level API that you can use without mapping to classes.
Third: You are not required to define a parent to an entity. Defining a parent is a way to define entity groups, which are important when using transactions. See ancestor paths and
transactions.
That's what a key is: a path consisting of pairs of kind and ID. The key is what identifies what kind it is.
I don't understand your second question. You don't have to set a parent, but if you want to set one, you can only do it when creating the entity.
How can I get the latest entry of a model newly put into NDB?
1: What if I use the same parent key? How?
I see the documentation says:
Entities whose keys have the same root form an entity group or group.
If entities are in different groups, then changes to those entities
might sometimes seem to occur "out of order". If the entities are
unrelated in your application's semantics, that's fine. But if some
entities' changes should be consistent, your application should make
them part of the same group when creating them.
Does this mean that with the same parent key, the order is insertion order?
But how do I get the last one?
2: What if I don't use the same parent key (the model being the same)? How?
If you're OK with eventual consistency (i.e. you might not see the very latest one immediately) you can just add a DateTimeProperty with auto_now_add=True and then run a query sorting by that property to get the latest one. (This is also approximate since you might have several entities saved close together which are ordered differently than you expect.)
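A minimal sketch of that approach (the model and property names are placeholders):

```python
from google.appengine.ext import ndb

class Entry(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True)

def latest_entry():
    # Eventually consistent: an entity written milliseconds ago may
    # not appear until its index entries catch up.
    return Entry.query().order(-Entry.created).get()
```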
If you need it to be exactly correct, the only way I can see is to create an entity whose job it is to hold a reference to the latest entry, and update that entity in the same transaction as the entry you're creating. Something like:
class LatestHolder(ndb.Model):
    latest = ndb.KeyProperty(kind='Entry')

# code to update:
@ndb.transactional(xg=True)
def put_new_entry(entry):
    entry.put()  # put first so the entry has a complete key
    holder = LatestHolder.get_or_insert('fixed-key')
    holder.latest = entry.key
    holder.put()
Note that I've used a globally fixed key name here with no parent for the holder class. This is a bottleneck; you might prefer to make several LatestHolder entities with different parents if your "latest entry" only needs to be from a particular parent, in which case you just pass a parent key to get_or_insert.
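The read side would then be a plain key lookup against the same fixed key name, e.g.:

```python
def get_latest_entry():
    # Key lookups are strongly consistent, so this always reflects
    # the last committed put_new_entry() call.
    holder = LatestHolder.get_by_id('fixed-key')
    return holder.latest.get() if holder and holder.latest else None
```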
I have a Kind of 'Customer'. I want to run a transaction that locks the entire Kind when a new 'Customer' is about to be inserted. The transaction would first query to check that the new 'Customer' Name does not already exist, then the 2nd part of the transaction runs the insert if no matches are found. This way I'm enforcing a Unique Constraint (and also restricting the operation to approx 1 insert per second).
My unsatisfactory solution for getting all my 'Customer' entities into the same entity group is to create a Kind called 'EntityGroups', with a single record called 'CustomersGroup'. This one record is used every time as the Parent of newly created 'Customer' entities, thereby grouping the entire Kind into one entity group.
My question is: I am concerned about using a phantom record such as 'CustomersGroup', because if anything happened and it were lost or deleted, I could not assign any new 'Customer' entities to the same group! I imagine it would be better to give each 'Customer' entity a static, arbitrary parent key, such as '1111111'? I think the terminology is "virtual root entity" - how do I do this?
Please help with any advice on how I can best handle this!
Why don't you use NDB's get_or_insert, which "transactionally retrieves an existing entity or creates a new one"?
https://developers.google.com/appengine/docs/python/ndb/modelclass#Model_get_or_insert
Your CustomerGroup record does not need to exist for it to act as a parent. Just create its key by hand and assign it as the parent of the record in question.
You don't need to worry about it being deleted, because it never has to exist in the first place!
When you create a model and set another as its parent, the system does not check (nor does it need to) that that parent actually exists.
So for example:
rev_key = ndb.Key('CustomerGroup', '11111', 'Customer', 'New_Customer_Name')
Yet a model with a key of: ('CustomerGroup', '11111') does not actually exist but it can still be in the ancestor chain.
GrantsV, you can achieve this by creating a proxy entity for each unique constraint and using cross-group transactions to commit the constraints with the normal writes.
class UniqueConstraint(db.Model):
    # Consider adding a reference to the owner of the constraint.

    @classmethod
    @db.transactional(propagation=db.MANDATORY, xg=True)
    def reserve(cls, kind, property, value):
        key = cls.__get_key(kind, property, value)
        if db.get(key):
            raise Exception  # Already exists
        cls(key=key).put()

    @classmethod
    @db.transactional(propagation=db.MANDATORY, xg=True)
    def release(cls, kind, property, value):
        db.delete(cls.__get_key(kind, property, value))

    @classmethod
    def __get_key(cls, kind, property, value):
        # Consider using a larger entity group.
        return db.Key.from_path(cls.kind(), '%s:%s:%s' % (kind, property, value))
        # To restrict to 1 insert per second per kind, use:
        # return db.Key.from_path(cls.kind(), kind, cls.kind(), '%s:%s' % (property, value))
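A hypothetical caller would open the cross-group transaction itself (propagation=MANDATORY means reserve() refuses to run outside one). Customer here is assumed to be an ordinary db.Model with a name property:

```python
@db.transactional(xg=True)
def create_customer(name):
    # Raises if another customer has already reserved this name.
    UniqueConstraint.reserve('Customer', 'name', name)
    customer = Customer(name=name)
    customer.put()
    return customer
```

The reservation and the customer write commit or roll back together, which is what actually enforces the uniqueness.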
You can create a parent entity, like this:
class CustomerParent(ndb.Model):
    pass
Then you instantiate and store your parent entity:
customers_parent = CustomerParent()
customers_parent.put()
Finally, when you create all your customer entities, you specify the parent:
a_customer = Customer(parent=customers_parent.key, ...)
a_customer.put()
Hope this helps!