I have a datastore transaction where I create an entity (a user) letting datastore generate the ID for me.
I then would like to use that ID so that I can create another entity (of another kind).
This is possible with the regular Datastore save API:
datastore.save(user).then(() => {
  userId = user.key.id // returns entity's ID created by datastore
})
However, this does not seem possible when using a transaction:
transaction.save(user).then(() => {
  console.log( user.key.id )
})
The above outputs "cannot read 'then' of undefined". There's no callback listed in the docs for transaction.save(), but I tried anyway.
When creating/saving an entity with a transaction, how can I retrieve that entity's autogenerated ID?
Allocate IDs
You can use Method: projects.allocateIds before entering the transaction:
Allocates IDs for the given keys, which is useful for referencing an entity before it is inserted.
The same operation is also available through the client libraries, for example the Python client library.
This makes Datastore 'reserve' those IDs (even though no entities are created with them yet) and avoids ID collisions before inserting.
Inside your transaction, you take one of the allocated IDs and assign it to the first entity. You can then reference the same ID from your second entity, even before the transaction commits the first entity's insert.
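For illustration, a minimal sketch with the Python client library (google-cloud-datastore); the User and Profile kinds and the property names are assumptions, not from the question:

from google.cloud import datastore

client = datastore.Client()

# Reserve an ID before the transaction; no entity is created yet.
incomplete_key = client.key('User')
user_key = client.allocate_ids(incomplete_key, 1)[0]  # complete key with an ID

with client.transaction():
    user = datastore.Entity(key=user_key)
    user['name'] = 'example'
    client.put(user)
    # The ID is already known, so a second entity can reference it
    # before the transaction commits.
    profile = datastore.Entity(key=client.key('Profile'))
    profile['userId'] = user_key.id
    client.put(profile)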
Best practice for transactions
Transactions have a maximum duration of 60 seconds with a 10 second idle expiration time after 30 seconds.
Because of that, the best practice is to perform everything possible before entering your transaction. Allocating IDs is, of course, one of the things you can do before the transaction.
Related
I want to create a user with ndb, as below:
def create_user(self, google_id, ....):
    user_keys = UserInformation.query(
        UserInformation.google_id == google_id).fetch(keys_only=True)
    if user_keys: # check whether user exist.
        # already created
        ...(SNIP)...
    else:
        # create new user entity.
        UserInformation(
            # primary key is incompletekey
            google_id = google_id,
            facebook_id = None,
            twitter_id = None,
            name =
            ...(SNIP)...
        ).put()
If this function is called twice at the same time, two users are created ("isolation" is not ensured between the get() and the put()).
So I added @ndb.transactional to the function above, but the following error occurred:
BadRequestError: Only ancestor queries are allowed inside transactions.
How can I ensure isolation with a non-ancestor query?
The ndb library doesn't allow non-ancestor queries inside transactions. So if you make create_user() transactional you get the above error because you call UserInformation.query() inside it (without an ancestor).
If you really want to do that you'd have to place all your UserInformation entities inside the same entity group by specifying a common ancestor and make your query an ancestor one. But that has performance implications, see Ancestor relation in datastore.
Otherwise, even if you split the function in two (a non-transactional part making the query, followed by a transactional part just creating the user), which would avoid the error, you'll still be facing the datastore's eventual consistency, which is actually the root cause of your problem: the result of the query may not immediately include a recently added entity, because it takes some time for the index corresponding to the query to be updated. This leaves room for creating duplicate entities for the same user. See Balancing Strong and Eventual Consistency with Google Cloud Datastore.
One possible approach would be to check later/periodically if there are duplicates and remove them (merging their info into a single entity if needed). And/or mark the user creation as "in progress", record the newly created entity's key, and keep querying until the key appears in the query result, at which point you finally mark the entity creation as "done" (you might not have time to do that inside the same request).
Another approach would be (if possible) to determine an algorithm that derives a (unique) key from the user information and just check whether an entity with that key exists instead of making a query. Key lookups are strongly consistent and can be done inside transactions, so that would solve your duplicates problem. For example, you could use the google_id as the key ID. That's just an example, and not ideal either: you may have users without a google_id, users may want to change their google_id without losing other info, etc. Maybe also track the in-progress user creation in the session info to prevent repeated attempts to create the same user in the same session (but that won't help with attempts from different sessions).
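A minimal ndb sketch of that key-based approach, using the google_id as the key ID (with the caveats just mentioned; the property set is illustrative):

from google.appengine.ext import ndb

class UserInformation(ndb.Model):
    google_id = ndb.StringProperty()
    name = ndb.StringProperty()

@ndb.transactional
def create_user_if_absent(google_id, name):
    key = ndb.Key(UserInformation, google_id)
    user = key.get()  # key lookup: strongly consistent and allowed in transactions
    if user is None:
        user = UserInformation(key=key, google_id=google_id, name=name)
        user.put()
    return user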
For your use case, perhaps you could use ndb models' get_or_insert method, which according to the API docs:
Transactionally retrieves an existing entity or creates a new one.
So you can do:
user = UserInformation.get_or_insert(*args, **kwargs)
without risking the creation of a new user.
The complete docs:
classmethod get_or_insert(*args, **kwds)
    Transactionally retrieves an existing entity or creates a new one.

    Positional Args:
        name: Key name to retrieve or create.

    Keyword Arguments:
        namespace – Optional namespace.
        app – Optional app ID.
        parent – Parent entity key, if any.
        context_options – ContextOptions object (not keyword args!) or None.
        **kwds – Keyword arguments to pass to the constructor of the model
            class if an instance for the specified key name does not already
            exist. If an instance with the supplied key_name and parent
            already exists, these arguments will be discarded.

    Returns:
        Existing instance of the Model class with the specified key name and
        parent, or a new one that has just been created.
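Concretely, assuming the google_id makes a suitable key name (the other arguments are illustrative):

user = UserInformation.get_or_insert(
    google_id,              # key name: looked up first, created only if absent
    google_id=google_id,
    facebook_id=None,
    twitter_id=None)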
Given the entity below in the Google App Engine datastore, is it better to define an index on reportingIds, or to define a separate entity which has only the personId and reportingIds fields? Based on the documentation, my understanding is that defining an index increases the count of operations against the datastore quota.
Below are the entities in GAE Go. My code needs to scan through Person entities frequently, and it needs to limit its scan to Person entities that have at least one reporting person. I see two approaches:
1. Define an index on reportingIds and query by specifying filters.
2. Create/update a PersonWithReporters entity whenever a Person gets a new reporting person.
In the second case, my code needs to iterate through all the PersonWithReporters entities and need not construct any index/query; I can iterate using keys, which are always guaranteed to have the latest data. I'm not sure which approach is beneficial considering datastore operation counts against the quota limit.
type Person struct {
    Id string // unique person id
    // many other personal details, his personal settings etc.
    reportingIds []string // ids of the Person this guy manages
}
type PersonWithReporters struct {
    Id string // Person managing reportees
    reportingIds []string // ids of the Person this guy manages
}
An approach with a separate entity gives you two advantages.
As you have already mentioned, you don't need to index/query all Person entities.
Every time a Person gets a new reporting person, you will create a new entity, which may be significantly cheaper than updating a Person entity which has many other properties, some of which, presumably, are indexed.
Your approach with a separate entity is also not ideal. When you index a property with multiple values, under the hood the Datastore creates an index entry for each value. So, when you add reporting person number 3 to this entity, you have to update 3 index entries instead of 1.
You can optimize your data model even further by creating a Reporter entity with no properties! Every time a new reporting person is added, you create this Reporter entity with ID set to the ID of a reporting person, and make it a child entity of a Person entity representing a person to whom this reporter reports.
Now, when you need to iterate through all persons with someone reporting to them, you run a simple query on this Reporter kind, with no filters. This query can be keys-only (there is nothing other than a key in this entity anyway, but keys-only queries are treated differently: they are basically free).
For every entity returned by this query you retrieve its key, and this key contains an ID (which is an ID of a reporting person), and a parent key, which includes an ID of a person who this reporter reports to.
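The answer above is language-agnostic; to keep the examples here in one language, a Python sketch of the parent/child layout and the keys-only query (kind names assumed):

from google.cloud import datastore

client = datastore.Client()

def add_reporter(manager_id, reporter_id):
    # An empty child entity: the key alone encodes the relationship.
    key = client.key('Person', manager_id, 'Reporter', reporter_id)
    client.put(datastore.Entity(key=key))

def managers_with_reporters():
    query = client.query(kind='Reporter')
    query.keys_only()  # keys-only queries are basically free
    for entity in query.fetch():
        key = entity.key
        yield key.parent.name, key.name  # (manager id, reporter id)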
Unless App Engine's datastore in Go is very different from how it works in Java or Python, you cannot index an array natively, so option 1 is out of the question, and so is option 2.
I suggest option three, which is to define a

type PersonWithReporters struct {
    Id          string // concatenate(managing_Person_id, separator, reporter_Person_id) to avoid id collisions
    ReportingId string // indexed; fields must be exported for the datastore package to store them
    ManagingId  string // probably indexed as well
}
You would create multiple of these entities instead of a single entity with an array, and you add an index on ReportingId. Now you can create a filter query on this entity and should be able to retrieve the desired information.
I would worry more about performance and not too much about the quota limits, they are pretty high. Just implement it, see how it works and whether quota is your main concern here.
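For illustration, that filter query might look like this (sketched in Python to keep one language across these examples; entity and property names follow the suggestion above):

query = client.query(kind='PersonWithReporters')
query.add_filter('ManagingId', '=', managing_person_id)  # everyone reporting to this person
reporter_ids = [e['ReportingId'] for e in query.fetch()]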
I'm trying to get my head around a login model that uses several authentication methods.
1.) For example, when a new user tries to log in with OpenID my backend is going to insert two entities into the datastore:
Insert a new user, where the automatically inserted id will be his $userId
(kind: User, id: autoId)
Insert a new login that is linked to the $userId
(kind: AuthOpenid, name: $openId), Property(userId: $userId)
This will allow me to make lookup by key requests when a user tries to log in, which enforces strongly consistent data, right?
The idea is that one user can have many logins (like stackexchange) and I don't have to worry about write/read limits because no entities have ancestors while still enforcing consistency.
2.) On a related note: Assuming my users are allowed to pick a username once they have provided an authentication method, how do I efficiently check if a username is taken?
My idea was to insert a new entity for every picked username.
Insert a new username
(kind: Username, name: $username)
Now I can simply make a lookup by key request to see if a username is taken. As far as I know, common lookups will be stored in memcache anyways, so this should be efficient, right?
I could also reverse the procedure and just attempt to insert a username and see if it fails.
1) Your approach looks good. As you've noted, Lookup operations (lookup by key) are guaranteed to return consistent results.
You're also correct that by putting each AuthOpenid entity in its own entity group (no common ancestor), you will avoid the write throughput limit of 1 write/second on any particular entity group (there's no corresponding limit on rate of entity group creation).
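For illustration in Python (the kinds and the userId property come from the question; the client setup is assumed):

from google.cloud import datastore

client = datastore.Client()

auth = client.get(client.key('AuthOpenid', open_id))  # open_id is the $openId from step 1
if auth is not None:
    user = client.get(client.key('User', auth['userId']))  # strongly consistent lookups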
2) This will also work, but you will need to execute the read and write operations as part of a transaction. This ensures that if two users try to reserve the same username, only one of them will succeed.
In Cloud Datastore, an insert mutation will fail if an entity with the same key already exists, so this will also work.
(Note that this is different from the put() operation in the App Engine Datastore which uses upsert semantics.)
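And a minimal sketch of the transactional username reservation (Python client library; the userId property is an assumption):

from google.cloud import datastore

client = datastore.Client()

def reserve_username(username, user_id):
    key = client.key('Username', username)
    with client.transaction():
        if client.get(key) is not None:  # read and write in one transaction
            return False  # already taken
        entity = datastore.Entity(key=key)
        entity['userId'] = user_id
        client.put(entity)
    return True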
I need to do some continuous aggregation on a data set. I am using App Engine's High Replication Datastore.
Let's say we have a simple object with a property that holds a string of the date when it was created. There are other fields associated with the object, but they're not important in this example.
Let's say I create and store some objects. Below is the date associated with each object. The objects are stored in the order below, in separate transactions.
Obj1: 2012-11-11
Obj2: 2012-11-11
Obj3: 2012-11-12
Obj4: 2012-11-13
Obj5: 2012-11-14
The idea is to use a cursor to continually check for newly indexed objects, then perform aggregation on them.
Here are the questions I have:
1) Are objects indexed in order? That is, is it possible for Obj4 to be indexed before Obj1, Obj2, and Obj3? This would be an issue if I use an ORDER BY query and a cursor to continue searching: some entities will not be found if there is a delay in indexing.
2) If no ORDER BY is specified, what order are entities returned in a query?
3) How would I go about checking for newly indexed entities? That is, grab all entities, store the cursor, then later check whether any new entities were indexed since the last query?
A little less important, but food for thought:
4) Are all fields indexed together? That is, if I have a date property and, let's say, a name property, will both properties appear to be indexed at the same time for a given object?
5) If multiple entities are written in the same transaction, are all entities in the transaction indexed at the same time?
6) If all entities belong to the same entity group, are all entities indexed at the same time?
Thanks for the responses.
1. All entities have default indexes for every property. If you use ORDER BY someProperty then you will get entities ordered by the values of that property. You are correct about index building: queries use indexes, and indexes are built asynchronously, meaning that it's possible a query will not find an entity immediately after it was added.
2. ORDER BY defaults to ASC, i.e. ascending order.
3. Add a created timestamp to your entity, then order by it and repeat the cursor (a sketch follows this list). See Cursors and Data Updates.
4. Indexes are built after the put() operation returns. They are also built in parallel, meaning that when you query, some indexes may be built and some not. See Life of a Datastore Write. Note that if you want to force an "apply" on an entity you can issue a get() after the put(), which will force the changes to be applied (= indexes written).
5. and 6. All entities touched in the same transaction must be in the same entity group (= have a common parent). The transaction isolation docs state that transactions can be unapplied, meaning that a query right after a put() will not find the new entities. Again, you can force an entity to be applied via a read or an ancestor query.
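A sketch of the cursor pattern from point 3, using the Python client library (the kind and the created property are assumptions):

from google.cloud import datastore

client = datastore.Client()

def fetch_since(cursor=None):
    query = client.query(kind='Obj')
    query.order = ['created']  # requires a created timestamp property
    iterator = query.fetch(start_cursor=cursor, limit=100)
    page = list(next(iterator.pages))       # entities found this round
    return page, iterator.next_page_token   # feed the token back in next time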
I have some entities of a kind, and I need to keep them within a limited amount by discarding old ones, just like log entry maintenance. Any good approach on GAE to do this?
Options in my mind:
Option 1. Add a date property to each of these entities. Create a cron job to check the datastore statistics daily. If the size exceeds the limit, query some entities of that kind sorted by date, oldest first, and delete them until the size is less than, for example, 0.9 * max_limit.
Option 2. Option 1 requires an additional property with an index. I observed that the entity key IDs seem to be increasing, so I'd like to query keys only, sorted in ascending order, and delete the ones with smaller IDs. This requires no additional property (date) or index. But I'm seriously worried about whether key IDs are assured to be increasing.
I think this is a common data-maintenance task. Is there any mature way to do it?
By the way, a tiny ad for my app, free and purely for coder's fun! http://robotypo.appspot.com
You cannot assume that the IDs are always increasing. The docs about ID generation only guarantee that:
IDs allocated in this manner will not be used by the Datastore's
automatic ID sequence generator and can be used in entity keys without
conflict.
The default sort order is also not guaranteed to be sorted by ID number:
If no sort orders are specified, the results are returned in the order
they are retrieved from the Datastore.
which is vague and doesn't say that the default order is by ID.
One solution may be to use a rotating counter that keeps track of the first element. When you want to add new entities: fetch the counter, increment it, mod it by the limit, and add a new element with its ID set to the value of the counter. This must all be done in a transaction to guarantee that the counter isn't being incremented by another request. The new element with the same key will overwrite one that was there, if any.
When you want to fetch them all, you can manually generate the keys (since they are all known), do a bulk fetch, sort by ID, then split them into two parts at the value of the counter and swap those.
If you want the IDs to be unique, you could maintain a separate counter (using transactions to modify and read it so you're safe) and create entities with IDs as its value, then delete IDs when the limit is reached.
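A sketch of the rotating-counter idea above with the Python client library (kind names and the limit are assumptions; slot + 1 because a numeric ID of 0 is not valid):

from google.cloud import datastore

client = datastore.Client()
LIMIT = 1000  # assumed maximum number of entities to keep

def add_entry(payload):
    with client.transaction():
        counter_key = client.key('Counter', 'ring')
        counter = client.get(counter_key) or datastore.Entity(key=counter_key)
        slot = (counter.get('value', -1) + 1) % LIMIT
        counter['value'] = slot
        client.put(counter)
        # Reusing the slot's key overwrites the oldest entry once the ring is full.
        entry = datastore.Entity(key=client.key('Entry', slot + 1))
        entry['payload'] = payload
        client.put(entry)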
You can create a second entity (let's call it A) that keeps a list of the keys of the entities you want to limit, like this (pseudo-code):
class A:
List<Key> limitedEntities;
When you add a new entity, you add its key to the list in A. If the length of the list exceeds the limit, you take the first element of the list and remove the corresponding entity.
Note that when you add or delete an entity, you should modify the list of entity A in a transaction. Since these entities belong to different entity groups, you should consider using cross-group transactions.
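A hedged ndb sketch of that bookkeeping entity (the limit, the singleton ID, and xg=True for the cross-group transaction are assumptions):

from google.appengine.ext import ndb

LIMIT = 1000

class A(ndb.Model):
    limited_entities = ndb.KeyProperty(repeated=True)

@ndb.transactional(xg=True)  # the entities live in different entity groups
def add_limited(entity):
    a = A.get_by_id('singleton') or A(id='singleton')
    a.limited_entities.append(entity.put())
    if len(a.limited_entities) > LIMIT:
        oldest = a.limited_entities.pop(0)
        oldest.delete()
    a.put()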
Hope this helps!