Google App Engine (datastore) - will a deleted key regenerate? - google-app-engine

I've got a simple question about datastore keys. If I delete an entity, is there any possibility that the key will be generated again? Or is each key unique and only ever generated once?
Thanks.

It is definitely possible to re-use keys.
Easy to test, for example using the datastore admin page:
create an entity for one of your entity models using a custom/specified key name and some property values
delete the entity
create another one using the same key name and different property values...
As for keys with auto-generated IDs, re-use is theoretically possible, but rather unlikely given the large number of possible IDs. From Assigning identifiers:
Cloud Datastore can be configured to generate auto IDs using two different auto id policies:
The default policy generates a random sequence of unused IDs that are approximately uniformly distributed. Each ID can be up to 16 decimal digits long.
The legacy policy creates a sequence of non-consecutive smaller integer IDs.
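To make the re-use test concrete, here is a minimal sketch using the Python client library (google-cloud-datastore); the Employee kind and the property values are just placeholders:

# Create, delete, then re-create an entity under the same key name.
from google.cloud import datastore

client = datastore.Client()
key = client.key("Employee", "bobby")          # custom/specified key name

first = datastore.Entity(key=key)
first["fname"] = "Bob"
client.put(first)

client.delete(key)                             # delete the entity

second = datastore.Entity(key=key)             # same key name, new values
second["fname"] = "Robert"
client.put(second)                             # succeeds: the key is re-used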

Related

AppEngine, DataStore: Preallocating normally-distributed IDs (*not* monotonically incrementing)

There are three schemes for setting IDs on datastore entities (a sketch of all three follows the list):
Provide your own string or int64 ID.
Don't provide them and let AE allocate int64 IDs for you.
Pre-allocate a block of int64 IDs.
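As a rough illustration of the three schemes with the Python client library (google-cloud-datastore; the question itself uses Go), where the Task kind is only illustrative:

from google.cloud import datastore

client = datastore.Client()

# 1. Provide your own string (or int64) ID as part of the key.
named = datastore.Entity(key=client.key("Task", "my-task-id"))
client.put(named)

# 2. Don't provide an ID: put() an entity with an incomplete key and
#    let the service allocate an int64 ID.
auto = datastore.Entity(key=client.key("Task"))
client.put(auto)                                      # auto.key.id is now filled in

# 3. Pre-allocate a block of IDs up front.
keys = client.allocate_ids(client.key("Task"), 10)    # ten complete keys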
The documentation has this to say about ID generation:
This (1):
Cloud Datastore can be configured to generate auto IDs using two different auto id policies:
The default policy generates a random sequence of unused IDs that are approximately uniformly distributed. Each ID can be up to 16 decimal digits long.
The legacy policy creates a sequence of non-consecutive smaller integer IDs.
If you want to display the entity IDs to the user, and/or depend upon their order, the best thing to do is use manual allocation.
and this (2):
Note: Instead of using key name strings or generating numeric IDs automatically, advanced applications may sometimes wish to assign their own numeric IDs manually to the entities they create. Be aware, however, that there is nothing to prevent Cloud Datastore from assigning one of your manual numeric IDs to another entity. The only way to avoid such conflicts is to have your application obtain a block of IDs with the datastore.AllocateIDs function. Cloud Datastore's automatic ID generator will keep track of IDs that have been allocated with this function and will avoid reusing them for another entity, so you can safely use such IDs without conflict.
and this (3):
Cloud Datastore generates a random sequence of unused IDs that are approximately uniformly distributed. Each ID can be up to 16 decimal digits long.
System-allocated ID values are guaranteed unique to the entity group. If you copy an entity from one entity group or namespace to another and wish to preserve the ID part of the key, be sure to allocate the ID first to prevent Cloud Datastore from selecting that ID for a future assignment.
I have a particular entity type that is stored with an ancestor. However, I'd like to have globally-unique IDs, and AE's IDs (allocated via datastore.AllocateIDs with Go) will not be globally unique when stored under an ancestor (in an entity group). Pre-allocation would solve this, since pre-allocated IDs are ancestor-agnostic. However, you are obviously given an interval in response... a continuous range of IDs that have been reserved.
Isn't there some way to preallocate those nice, opaque, uniformly-distributed IDs?
While we're on the subject, I had assumed that the opaque IDs from AE were the result of some pseudorandom number generator with persisted state for each entity type, but the word "track" in (2) seems to imply that there is a cost to optimistically generating and buffering IDs that might not be used. It'd be great if someone could clarify this.
The simple solution is to do the following:
When trying to allocate a new ID for an entity:
Repeat the following:
Generate a random K bit integer. Use it for the entity ID field. [Use a uniform random distribution].
Create a Cloud Datastore transaction.
Insert the new entity. [If the transaction aborts because the entity already exists try again with a new random number].
If you make K big enough (for example 128) and have a properly seeded random number generator, then it is statistically impossible to generate an ID collision and you can remove the retry loop.
If you make K big enough, stop using the integer ID field in the entity key and instead use the string one: Base64-URL-encode the random number as a string.
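A rough sketch of that recipe using the Python client library (google-cloud-datastore); the Player kind, the property, and the helper name are made up for illustration:

import base64
import secrets

from google.cloud import datastore

client = datastore.Client()

def new_random_key(kind):
    # 128 random bits, Base64-URL-encoded, used as the string key name.
    raw = secrets.token_bytes(16)                        # K = 128 bits
    name = base64.urlsafe_b64encode(raw).decode().rstrip("=")
    return client.key(kind, name)

key = new_random_key("Player")
with client.transaction():
    # With K = 128 a collision is statistically impossible, so this check
    # (and any retry loop around it) is mostly for illustration.
    if client.get(key) is not None:
        raise RuntimeError("ID collision, retry with a new random number")
    entity = datastore.Entity(key=key)
    entity["score"] = 0
    client.put(entity)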

Generating a unique id GAE datastore

In MySQL I used auto-increment to generate an id for every user. I would like to create a similar user table in Google Datastore where the id for a user will be unique. According to these docs: https://cloud.google.com/appengine/docs/java/datastore/entities
System-allocated ID values are guaranteed unique to the entity group.
But according to this post: Ever see duplicate IDs when using Google App Engine and ndb? the IDs are not unique. I need this id to be unique. It is confusing because the docs say the id is unique, but the post says the id is not unique, it is the key that is unique. My objective is for no two users to have the same id. How can I guarantee this? I would prefer for the database to take care of this for me, as opposed to creating large ids manually using things such as UUIDs.
As Igor correctly observed, IDs are always unique as long as the entity has no parent.
I can't think of any reason to make user entities children of some other entities, so you are safe.
Note that IDs will not be sequential; this helps spread the load evenly across the entire dataset and is a by-product of how the Datastore is designed.
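For illustration, a minimal sketch with the Python client library (google-cloud-datastore) showing that root entities of the same kind receive distinct system-allocated IDs; the User kind is only an example:

from google.cloud import datastore

client = datastore.Client()

# Root entities (no parent key): the allocated numeric IDs are unique
# for this kind, but not sequential.
users = [datastore.Entity(key=client.key("User")) for _ in range(3)]
client.put_multi(users)

for user in users:
    print(user.key.id)    # three distinct, scattered numeric IDs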

Key vs ID/Name?

I do not want to create an autogenerated key for my entities so I specify my own:
// Imports assume the old proto-based Cloud Datastore client (v1beta2);
// makeKey/makeProperty/makeValue are static helpers from DatastoreHelper.
import static com.google.api.services.datastore.client.DatastoreHelper.*;
import com.google.api.services.datastore.DatastoreV1.CommitRequest;
import com.google.api.services.datastore.DatastoreV1.Entity;
import com.google.api.services.datastore.DatastoreV1.Mutation;

Entity employee = Entity.newBuilder().setKey(makeKey("Employee", "bobby"))
        .addProperty(makeProperty("fname", makeValue("fname").setIndexed(false)))
        .addProperty(makeProperty("lname", makeValue("lname").setIndexed(false)))
        .build();
CommitRequest request = CommitRequest.newBuilder()
        .setMode(CommitRequest.Mode.NON_TRANSACTIONAL)
        .setMutation(Mutation.newBuilder().addInsert(employee))
        .build();
datastore.commit(request);
When I check what the entity looks like, I see this:
Why is this auto-generated key generated if I specified my own key (bobby)? It seems bobby was also created, but now I have bobby and this autogenerated key. What is the difference between the key and id/name?
You can't specify your own key; keys actually contain information necessary for datastore operations. This note in the documentation gives you an idea:
Note: The URL-safe string looks cryptic, but it is not encrypted! It can easily be decoded to recover the original entity's kind and identifier:
key = Key(urlsafe=url_string)
kind_string = key.kind()
ident = key.id()
If you use such URL-safe keys, don't use sensitive data such as email addresses as entity identifiers. (A possible solution would be to use the MD5 hash of the sensitive data as the identifier. This stops third parties, who can see the encrypted keys, from using them to harvest email addresses, though it doesn't stop them from independently generating their own hash of a known email address and using it to check whether that address is present in the Datastore.)
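A tiny sketch of the MD5-as-identifier suggestion from that note, assuming the Python client library (google-cloud-datastore) and a made-up Account kind:

import hashlib

from google.cloud import datastore

client = datastore.Client()

email = "alice@example.com"
ident = hashlib.md5(email.encode("utf-8")).hexdigest()   # opaque identifier
key = client.key("Account", ident)                       # instead of the raw email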
What you can specify is the ID portion of the key, either as a number or as a string:
A key is a series of kind-ID pairs. You want to make sure each entity has a key that is unique within its application and namespace. An application can create an entity without specifying an ID; the Datastore automatically generates a numeric ID. If an application picks some IDs "by hand" and they're numeric and the application lets the Datastore generate some IDs automatically, the Datastore might choose some IDs that the application already used. To avoid this, the application should "reserve" the range of numbers it will use to choose IDs (or use string IDs to avoid this issue entirely).
This is the url-safe version of your key, suitable for use in links. Use KeyFactory.stringToKey to convert it to an actual key, and you'll see that it contains your string name.
What you create with makeKey("Employee", "bobby") is a key for an entity of kind Employee with the name bobby. What you see as Key in the datastore viewer is a representation of exactly that.
Generally speaking a key always consists of
optional parent key (with entity type and name/id)
entity type
entity name/id
Maybe someone here can tell you how to decode the key into its components, but rest assured that you're doing everything right and the behavior is as expected.
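As a rough illustration of those components with the Python client library (google-cloud-datastore), using a made-up Department parent for the Employee/bobby key from the question:

from google.cloud import datastore

client = datastore.Client()

# A key is a series of kind-ID pairs: optional parent pairs first,
# then the entity's own kind and name/id.
parent = client.key("Department", "sales")
key = client.key("Employee", "bobby", parent=parent)
print(key.flat_path)                # ('Department', 'sales', 'Employee', 'bobby')

# The "Key" column in the datastore viewer is a URL-safe encoding of those pairs.
encoded = key.to_legacy_urlsafe()
decoded = datastore.Key.from_legacy_urlsafe(encoded)
print(decoded.kind, decoded.name)   # Employee bobby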

Datastore why use key and id?

I had a question regarding why Google App Engine's Datastore uses both a key and an ID. Coming from a relational database background I am comparing entities with rows, so why, when storing an entity, does it require a key (which is a long automatically generated string) and an ID (which can be manually or automatically entered)? This seems like a big waste of space to identify a record. Again I am new to this type of database, so I may be missing something.
Key design is a critical part of efficient Datastore operations. The keys are what are stored in the built-in and custom indexes and when you are querying, you can ask to have only keys returned (in Python: keys_only=True). A keys-only query costs a fraction of a regular query, both in $$ and to a lesser extent in time, and has very low deserialization overhead.
So, if you have useful/interesting things stored in your key id's, you can perform keys-only queries and get back lots of useful data in a hurry and very cheaply.
Note that this extends into parent keys and namespaces, which are all part of the key and therefore additional places you can "store" useful data and retrieve all of it with keys-only queries.
It's an important optimization to understand and a big part of our overall design.
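For example, with the standalone Python client library (google-cloud-datastore) a keys-only query looks roughly like this (ndb uses keys_only=True as mentioned above); the Employee kind is only illustrative:

from google.cloud import datastore

client = datastore.Client()

# Only keys come back: kind, id/name, parent path and namespace, which is
# cheap to fetch and to deserialize.
query = client.query(kind="Employee")
query.keys_only()
keys = [entity.key for entity in query.fetch()]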
Basically, the key is built from two pieces of information:
The entity type (in Objectify, it is the class of the object)
The id/name of the entity
So, for a given entity type, key and id are quite the same.
If you do not specify the ID yourself, then a random ID is generated and the key is created based on that random id.

Getting values out of DynamoDB

I've just started looking into Amazon's DynamoDB. Obviously the scalability appeals, but I'm trying to get my head out of SQL mode and into NoSQL mode. Can this be done (with all the scalability advantages of DynamoDB):
Have a load of entries (say 5 - 10 million) indexed by some number. One of the fields in each entry will be a creation date. Is there an effective way for DynamoDB to give my web app all the entries created between two dates?
A simpler question: can DynamoDB give me all entries in which a field matches a certain number? That is, there'll be another field that is a number, for argument's sake let's say between 0 and 10. Can I ask DynamoDB to give me all the entries which have the value 6, for example?
Do both of these queries need a scan of the entire dataset (which I assume is a problem given the dataset size)?
many thanks
Is there an effective way for DynamoDB to give my web app all the entries created between two dates?
Yup, please have a look at the Primary Key concept within the Amazon DynamoDB Data Model, specifically the Hash and Range Type Primary Key:
In this case, the primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. Amazon DynamoDB builds an unordered hash index on the hash primary key attribute and a sorted range index on the range primary key attribute. [...]
The listed samples feature your use case exactly: the Reply (Id, ReplyDateTime, ...) table uses a primary key of type Hash and Range, with hash attribute Id and range attribute ReplyDateTime.
You'll use this via the Query API; see RangeKeyCondition for details and Querying Tables in Amazon DynamoDB for respective examples.
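The answer above references the original low-level API (RangeKeyCondition); with today's boto3 resource API the same date-range query might look roughly like this, using the Reply (Id, ReplyDateTime) sample table mentioned above and made-up key values:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Reply")

# Hash key fixed, range key constrained to a date interval.
response = table.query(
    KeyConditionExpression=(
        Key("Id").eq("DynamoDB#Thread 1")
        & Key("ReplyDateTime").between("2015-01-01T00:00:00Z",
                                       "2015-12-31T23:59:59Z")
    )
)
items = response["Items"]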
Can DynamoDB give me all entries in which a field matches a certain number? [...] Can I ask DynamoDB to give me all the entries which have the value 6, for example?
This is possible as well, albeit only by means of the Scan API (i.e. it does require reading every item in the table); see ScanFilter for details and Scanning Tables in Amazon DynamoDB for respective examples.
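A rough boto3 sketch of such a scan; the rating attribute stands in for the hypothetical 0-10 field from the question:

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("Reply")

# A scan reads every item and only then applies the filter.
response = table.scan(FilterExpression=Attr("rating").eq(6))
items = response["Items"]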
Do both of these queries need a scan of the entire dataset (which I assume is a problem given the dataset size)?
As mentioned, the first approach works with a Query while the second requires a Scan. Generally, a query operation is more efficient than a scan operation; that's good advice to get started, though the details are more complex and depend on your use case. See the section Scan and Query Performance within the Query and Scan in Amazon DynamoDB overview:
For quicker response times, design your tables in a way that can use the Query, Get, or BatchGetItem APIs, instead. Or, design your application to use scan operations in a way that minimizes the impact on your table's request rate. For more information, see Provisioned Throughput Guidelines in Amazon DynamoDB.
So, as usual when applying NoSQL solutions, you might need to adjust your architecture to accommodate these constraints.
