Google Cloud Datastore unique autogenerated ids - google-app-engine

I'm using Google Cloud Datastore and using namespaces to partition data. Some kinds use autogenerated IDs from Cloud Datastore, created with keys like this:
const {Datastore} = require('@google-cloud/datastore');
const datastore = new Datastore();
// Incomplete key (kind only): Datastore assigns the numeric ID on save.
const key = datastore.key(['example']);
This code generates a key with kind 'example', and Cloud Datastore automatically assigns the entity an integer numeric ID. (Source: https://cloud.google.com/datastore/docs/concepts/entities#kinds_and_identifiers)
But this "unique" ID is only unique within its namespace. I have seen the same ID in different namespaces.
So my question is: is it possible to tell Cloud Datastore that autogenerated IDs must be unique across all namespaces?
Maybe this question makes no sense, but I would prefer to have IDs that are unique across the whole datastore (if possible).
I have seen the allocateIds function in the Cloud Datastore documentation, but I would like to know whether it takes namespaces into account, because I can include a namespace in the request and I'm afraid the IDs it returns are the same as the autogenerated ones.
Thank you in advance!

No: you cannot tell Datastore to allocate unique IDs across all entity groups and namespaces.
However, there is an easy fix: if you believe in statistics and correctly seeded random number generators, you will generally be better off generating your own GUIDs for keys.
If you don't believe in statistics and random numbers, you can still generate a GUID and transactionally verify that it doesn't exist in your Datastore before writing the entity in question.
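A minimal sketch of the GUID approach, using the JDK's random (version 4) UUIDs; the helper class name is just for illustration:

```java
import java.util.UUID;

// Generate a globally unique string id client-side instead of relying on
// Datastore's per-namespace numeric allocation. Collisions between random
// version-4 UUIDs are statistically negligible.
class GuidKeys {
    static String newId() {
        return UUID.randomUUID().toString(); // 36-char string, e.g. "3f2c9a6e-..."
    }
}
```

The string can then be used as a key name rather than an autogenerated numeric ID, e.g. KeyFactory.createKey("example", GuidKeys.newId()) in the Java API.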
If you are truly desperate to have Datastore do the ID allocation for you, you can call allocateIds manually for a constant key. (For example, ask it to allocate IDs for an arbitrary but unchanging key in the default namespace, and it will return an integer that is unique within that sequence and can be used somewhere else.)

Related

ASP.NET Core: how to hide database ids?

Maybe this has been asked a lot, but I can't find a comprehensive post about it.
Q: What are the options when you don't want to pass the ids from the database to the frontend? You don't want the user to be able to see how many records are in your database.
What I found/heard so far:
Encrypt and decrypt the Id on backend
Use a GUID instead of a numeric auto-incremented Id as PK
Use a GUID together with an auto-incremented Id as PK
Q: Do you know any other or do you have experience with any of these? What are the performance and technical issues? Please provide documentation and blog posts on this topic if you know any.
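To make the first option concrete, here is a minimal sketch of reversibly encrypting a numeric id into an opaque token. The class and key handling are hypothetical; single-block AES is used because an 8-byte id fits exactly one block, which also keeps the token deterministic (stable URLs). Key management and rotation are left out.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: AES-encrypt a numeric id into a URL-safe token and back.
class IdObfuscator {
    private final SecretKeySpec key;

    IdObfuscator(byte[] secret16Bytes) {
        this.key = new SecretKeySpec(secret16Bytes, "AES"); // 128-bit key
    }

    String encode(long id) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, key);
        byte[] plain = ByteBuffer.allocate(8).putLong(id).array();
        // One 8-byte block in, one 16-byte block out, then URL-safe base64
        return Base64.getUrlEncoder().withoutPadding().encodeToString(c.doFinal(plain));
    }

    long decode(String token) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, key);
        byte[] plain = c.doFinal(Base64.getUrlDecoder().decode(token));
        return ByteBuffer.wrap(plain).getLong();
    }
}
```

The frontend only ever sees the token; the backend decodes it back to the real id before querying.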
Two things:
The sheer existence of an id doesn't tell you anything about how many records are in a database. Even if the id is something like 10, that doesn't mean there's only 10 records; it's just likely the tenth that was created.
Exposing ids has nothing to do with security, one way or another. Ids only have a meaning in the context of the database table they reside in. Therefore, in order to discern anything based on an id, the user would have to have access directly to your database. If that's the case, you've got far more issues than whether or not you exposed an id.
If users shouldn't be able to access certain ids, such as perhaps an edit page, where an id is passed as part of the URL, then you control that via row-level access policies, not by obfuscating or attempting to hide the id. Security by obscurity is not security.
That said, if you're just totally against the idea of sequential ids, then use GUIDs. There is no performance impact to using GUIDs. It's still a clustered index, just as any other primary key. They take up more space than something like an int, obviously, but we're talking a difference of 12 bytes per id - hardly anything to worry about with today's storage.

how to keep memcache and datastore in sync

Suppose I have a million users registered with my app. Now a new user signs up, and I want to show him which of his contacts already have the app installed. A user can have many contacts, say 500. If I fetch an entity for each contact from the datastore, that is very time- and money-consuming. Memcache is a good option, but I would have to keep it in sync for that kind. I can get dedicated memcache for such a large data set, but how do I sync it? My logic would be: if a contact is not in memcache, assume it is not registered with the app. A backend module with manual scaling could be used to keep both in sync. But I don't know how good this design is. Any help will be appreciated.
This is not how memcache is designed to be used. You should never rely on memcache: keys can be evicted at any time, so in your case you could never be sure whether a contact exists or not.
It is also not clear what the problem with the datastore is. Datastore is designed to read data very fast; take advantage of it.
When new users install your app, create a lookup entity with the phone number as the key. You don't necessarily need any other properties. Something like this:
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Entity contactLookup = new Entity("ContactLookup", "somePhoneNumber");
datastore.put(contactLookup);
That will keep a log of who's got the app installed.
Then, to check which of your users contacts are already using your app, you can create an array of keys out of the phone numbers from the users address book (with their permission of course!), and perform a batch get. Something like this:
Set<Key> keys = new HashSet<>();
for (String phoneNumber : phoneNumbers) {
    keys.add(KeyFactory.createKey("ContactLookup", phoneNumber));
}
Map<Key, Entity> entities = datastore.get(keys);
Now, entities will be those contacts that have your app installed.
You may need to batch the keys to reduce load. The Python API does this for you, but I'm not sure about the Java APIs. Even if a user has 500 contacts, that is only 5 calls (assuming batches of 100).
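If batching is needed, a simple sketch (partition() is a hypothetical helper, not part of the GAE SDK):

```java
import java.util.ArrayList;
import java.util.List;

// Split a list of keys into sublists of at most batchSize, so you can issue
// one datastore.get() call per sublist instead of one oversized call.
class Batching {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // subList is a view; copy it if the source list will be modified later
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }
}
```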
Side note: you may want to consider hashing phone numbers for storage.
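A sketch of that hashing, assuming phone numbers are normalized first. Note that a plain hash of a phone number is still brute-forceable over the small phone-number space; a keyed hash such as HMAC raises the bar.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Store a SHA-256 digest of the normalized phone number instead of the raw
// number, and use the digest as the lookup-entity key name.
class PhoneHash {
    static String sha256Hex(String normalizedPhone) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(normalizedPhone.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b)); // lowercase hex, two chars per byte
        }
        return sb.toString();
    }
}
```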
Memcache is a good option to reduce costs and improve performance, but you should not assume that it is always available. Even a dedicated Memcache may fail or an individual record can be evicted. Besides, all this synchronization logic will be very complicated and error-prone.
You can use Memcache to indicate if a contact is registered with the app, in which case you do not have to check the datastore for that contact. But I would recommend checking all contacts not found in Memcache in the Datastore.
Verifying if a record is present in a datastore is fast and inexpensive. You can use .get(java.lang.Iterable<Key> keys) method to retrieve the entire list with a single datastore call.
You can further improve performance by creating an entity with no properties for registered users. This way there will be no overhead in retrieving these entities.
Since you don't use Python and therefore don't have access to NDB, the suggestion would be: when you add a user, add him to memcache and create an async query (or a task queue job) to push the same data to your datastore. That way memcache is written first, and the datastore eventually follows.
Then all you need to do on reads is query memcache first (since it is written first), and if memcache returns nothing (it is volatile, after all), query the actual datastore and refill memcache.
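The read path described above, sketched with in-memory maps standing in for memcache and the datastore (both stand-ins are assumptions for the demo, not GAE APIs):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside read: check the cache first, fall back to the authoritative
// store on a miss, and refill the cache with what was found.
class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> datastore;

    CacheAside(Map<String, String> datastore) {
        this.datastore = datastore;
    }

    Optional<String> get(String key) {
        String v = cache.get(key);
        if (v == null) {                      // cache miss: the cache is volatile
            v = datastore.get(key);           // authoritative lookup
            if (v != null) cache.put(key, v); // refill for subsequent reads
        }
        return Optional.ofNullable(v);
    }
}
```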

Search API, create documents and indexes

I need help with the Search API.
I am Brazilian and I'm using Google Translate to communicate.
My questions are:
Do I create a document and an index for each entity persisted in the datastore?
And for the objects that are already persisted in the datastore, do I have to go through the whole database ("all the bank") to create a document and an index for each one, if I want to search them with the Search API?
I am using Java.
It's reasonable to use the Search API to search for objects that are also stored in the Datastore. You can create a Search document for each Datastore entity (so that there's a one-to-one correspondence between them). But you don't need a separate Search index for each one: all the Search documents can be added to a single index. Or, if you have a huge number of documents and there is some natural partitioning between them, you could distribute them over some modest number of indexes. Assuming you can know, via some external means, which single index to search, keeping indexes from getting too big can help performance.
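A minimal sketch of that partitioning idea; the index-name scheme and bucket count are assumptions, not part of the Search API:

```java
// Distribute Search documents over a fixed, modest number of indexes by hashing
// a partition key (e.g. a customer id). Both indexing and searching for that
// partition then go to the single index this function names.
class IndexChooser {
    static final int NUM_INDEXES = 8; // assumed bucket count

    static String indexNameFor(String partitionKey) {
        // floorMod keeps the bucket non-negative even for negative hash codes
        int bucket = Math.floorMod(partitionKey.hashCode(), NUM_INDEXES);
        return "docs-" + bucket;
    }
}
```

The returned name would then be used when building the IndexSpec for the Search service, so a given partition always maps to the same index.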
I've tried to answer the question that I think you're asking. It's difficult for me to understand the English that the Google translator has produced. In particular, what does "I go all the bank ..." mean?

Visual understanding of the GAE datastore

I am trying to understand how the Google App Engine (GAE) datastore is designed and how to use it. I am having a bit of a hard time visualising the structure from the description on the Getting Started page.
Can somebody explain the datastore using figures for us visually oriented people? Or point to a good tutorial again with visual learning in mind?
I am specifically looking for answers with diagrams/figures that explains how GAE is used.
The 2008 Google I/O session "Under the Covers of the Google App Engine Datastore" gives a good visual overview of the datastore.
https://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore
http://snarfed.org/datastore_talk.html
For more I/O talks, go to:
https://developers.google.com/appengine/docs/videoresources
Very much simplified, the GAE datastore can be viewed as a hashmap of hashmaps.
I guess there's no correct answer here, just different mental models. Depending on your programming background you may find mine enlightening, disturbing, or both.

I picture the datastore as a single huge distributed key-value collection of buckets that comprises all entity data of any kind, in any namespace, across all GAE apps of all users. A single bucket is called an entity group. It has a root key which (under the hood) consists of your appID, a namespace, a kind, and an entity ID or name. In an entity group reside one or more entities whose keys extend the root key. The entity belonging to the root key itself may or may not exist. Operations within a single entity group are atomic (transactional). An entity is a simple map-like data structure.

The two built-in indexes (ascending and descending) are again two giant sorted collections of index entries. Each index entry is a data structure of appID, namespace, kind, property name, property type, property value, entity key - in that order. Each (auto-)indexed value of each property of each entity creates two such index entries. There's another index with just entity keys in it. Custom indexes, however, go to yet another sorted collection with entries containing appID, namespace, index type, combined index value, entity key. That's the only part of the whole datastore that uses metadata: it stores an index definition which tells the store how the combined index value is formed from the entity. This is the picture that's burnt into my mind and from which I know how to make the datastore happy.
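The built-in index described above can be pictured as a sorted collection of tuples. A toy model of one entry (the record shape is an illustration of the mental model, not the real on-disk format; the property type is folded into the value encoding for brevity):

```java
// One entry of the ascending built-in index, ordered lexicographically by
// (appId, namespace, kind, propertyName, propertyValue, entityKey) - scanning
// a contiguous range of these tuples is what answers a simple property query.
record IndexEntry(String appId, String namespace, String kind,
                  String propertyName, String propertyValue, String entityKey)
        implements Comparable<IndexEntry> {
    public int compareTo(IndexEntry o) {
        int c;
        if ((c = appId.compareTo(o.appId)) != 0) return c;
        if ((c = namespace.compareTo(o.namespace)) != 0) return c;
        if ((c = kind.compareTo(o.kind)) != 0) return c;
        if ((c = propertyName.compareTo(o.propertyName)) != 0) return c;
        if ((c = propertyValue.compareTo(o.propertyValue)) != 0) return c;
        return entityKey.compareTo(o.entityKey);
    }
}
```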

alternative to GAE Keys for mySQL?

I'm wondering if GAE keys (com.google.appengine.api.datastore.Key) can be used with local MySQL apps. I presume it's not possible, so if I define the primary keys in my models as longs, do I lose too much of the key functionality, like the KeyService and querying using keys?
Thanks
No, you do not lose functionality. You can still query using the long keys and the performance is the same.
The 'Key' datastore type allows you to create keys based on a string. For example, you might want to create the key based on the user's email address. You won't be able to have this functionality with long keys.
But in most cases you do not need this option.
