Assuming the allocated IDs have never been used by calling put() for an Entity, how long do they stick around? Are they ever put back into use by the datastore, or are they allocated forever?
The documentation says,
These keys are guaranteed not to have been returned previously by the
data store's internal ID generator, nor will they be returned by
future calls to the internal ID generator.
I'll go out on a limb and say that the use of 'guarantee' and 'future' above means forever.
I'm making a class that takes an object and assigns it a unique ID number. It's nothing special: it takes the object, assigns a number to it, and then increments the ID counter for the next object that will receive an ID number.
I want to call the member UUID or GUID, standing for universally unique identifier or globally unique identifier, because the name makes it very clear what the member is.
However, I looked up the term and Wikipedia says:
A universally unique identifier (UUID) is a 128-bit number used to
identify information in computer systems.
Which makes me think the terms have a very specific meaning, and that using them to mean just a unique number given to each object may not be proper usage. I'm thinking of using a 32-bit int or the like.
Is this an incorrect use of UUID or GUID? I don't think it matters, but I'm writing in C++.
Yes, I think it would be a misuse of those terms. The word universally in UUID or globally in GUID means that the identifier is not only unique within your specific system, but within any system developed for any purpose, anywhere. A 32-bit integer that you simply increment for each new entity doesn't have that property. It may be unique within your system, but not universally. I would just call it Identifier or something similar.
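For illustration, here is a minimal sketch of the class described in the question, using a name that doesn't claim universal uniqueness. The names IdentifierGenerator, Tracked and identifier are hypothetical, and std::atomic is used only on the assumption that objects might be created from several threads; a plain integer is enough in single-threaded code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hands out process-local identifiers: unique within one run of this program,
// but NOT universally unique, so the name avoids "UUID"/"GUID".
class IdentifierGenerator {
public:
    // Returns 1, 2, 3, ...; fetch_add makes concurrent calls safe.
    std::uint32_t next() {
        std::uint32_t id = counter_.fetch_add(1, std::memory_order_relaxed) + 1;
        assert(id != 0 && "32-bit identifier space exhausted (wrapped around)");
        return id;
    }

private:
    std::atomic<std::uint32_t> counter_{0};
};

// An object that carries its number under a descriptive, non-UUID name.
struct Tracked {
    explicit Tracked(IdentifierGenerator& gen) : identifier(gen.next()) {}
    std::uint32_t identifier;
};
```

Two different programs (or two runs of the same program) will both hand out 1, 2, 3, ..., which is exactly why this is an identifier and not a UUID.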
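You will find your answer here. In C++ on Windows, you have a function for creating a GUID:
#include <objbase.h>  // declares CoCreateGuid; link with Ole32.lib
GUID gidReference;
HRESULT hCreateGuid = CoCreateGuid(&gidReference);  // check SUCCEEDED(hCreateGuid) before use
The documentation explains what it does: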
The CoCreateGuid function calls the RPC function UuidCreate, which
creates a GUID, a globally unique 128-bit integer. Use CoCreateGuid
when you need an absolutely unique number that you will use as a
persistent identifier in a distributed environment. To a very high
degree of certainty, this function returns a unique value – no other
invocation, on the same or any other system (networked or not), should
return the same value.
And when we dig into UuidCreate, we see:
For security reasons, it is often desirable to keep ethernet addresses
on networks from becoming available outside a company or organization.
The UuidCreate function generates a UUID that cannot be traced to the
ethernet address of the computer on which it was generated. It also
cannot be associated with other UUIDs created on the same computer. If
you do not need this level of security, your application can use the
UuidCreateSequential function, which behaves exactly as the UuidCreate
function does on all other versions of the operating system.
I think this is important, so that you know not only that you misused the term, but also why; this information may be valuable to you in the future.
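To make the difference concrete, here is a rough sketch of what a version-4 (random) UUID from RFC 4122 looks like: 128 bits, with 4 bits fixed for the version and 2 for the variant, printed as 8-4-4-4-12 hex digits. It uses std::mt19937_64 purely for illustration; that generator is not cryptographically secure, so on Windows you would still call CoCreateGuid (or use a vetted library) in real code. make_uuid_v4 and format_uuid are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// 128 bits = 16 bytes, versus the 32-bit counter discussed in the question.
using Uuid = std::array<std::uint8_t, 16>;

// Version-4 (random) UUID layout from RFC 4122.
// NOTE: std::mt19937_64 is NOT a cryptographic RNG; illustration only.
Uuid make_uuid_v4(std::mt19937_64& rng) {
    Uuid u{};
    for (int i = 0; i < 16; i += 8) {
        std::uint64_t r = rng();  // 64 random bits fill 8 bytes
        for (int j = 0; j < 8; ++j)
            u[i + j] = static_cast<std::uint8_t>(r >> (8 * j));
    }
    u[6] = static_cast<std::uint8_t>((u[6] & 0x0F) | 0x40);  // version nibble = 4
    u[8] = static_cast<std::uint8_t>((u[8] & 0x3F) | 0x80);  // variant bits = 10
    return u;
}

// Formats the 16 bytes as 8-4-4-4-12 lowercase hex digits.
std::string format_uuid(const Uuid& u) {
    std::string s;
    char buf[3];
    for (int i = 0; i < 16; ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10) s += '-';
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(u[i]));
        s += buf;
    }
    assert(s.size() == 36);
    return s;
}
```

That leaves 122 random bits, so uniqueness is probabilistic rather than absolute, which matches the "to a very high degree of certainty" wording in the documentation quoted above.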
When you insert an Entity into the datastore with an @Id Long id; property, the datastore automatically generates a random (or seemingly random) Long id, such as 5490350115034675.
I would like to set the Long id myself, but still have the value generated randomly by the datastore.
I found this piece of code that seems to do just that:
Key<MyEntity> entityKey = factory().allocateId(MyEntity.class);
Long commentId = entityKey.getId();
Then I can pass the commentId into the constructor of MyEntity and save it to the datastore.
When I do that, however, I don't seem to get a randomly generated id; the ids follow an odd pattern where the first allocated id is 1, the next is 10002, then 20001, and so on.
I'm not sure what all that means or whether it is safe to keep using... Is this the only way to do this?
When you use autogenerated ids (i.e. Long), GAE uses the 'scattered' id generator, which gives you ids from a broad range of the keyspace. This is because high-volume writing (thousands per second) of more-or-less contiguous values in an index results in a lot of table splitting, which hurts performance.
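A rough way to picture "ids from a broad range of the keyspace": take a sequential counter and reverse its bits, so consecutive values end up far apart in a sorted index. This is only an illustration of the idea; it is an assumption, not App Engine's actual scattered-id algorithm, and scatter is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only: NOT App Engine's real scattered-id algorithm.
// Reverses the low `bits` bits of a sequential counter. Bit reversal is a
// bijection on [0, 2^bits), so distinct counters still give distinct ids,
// but consecutive counters (1, 2, 3, ...) land far apart in the keyspace.
std::uint64_t scatter(std::uint64_t seq, unsigned bits) {
    assert(bits > 0 && bits < 64);
    assert(seq > 0 && seq < (std::uint64_t{1} << bits));
    std::uint64_t out = 0;
    for (unsigned i = 0; i < bits; ++i) {
        out = (out << 1) | ((seq >> i) & 1u);  // bit i moves to bit (bits-1-i)
    }
    return out;
}
```

With bits = 52, counters 1, 2 and 3 map to 2^51, 2^50 and 2^51 + 2^50, so neighbours in the counter are nowhere near each other in the index. Note that the mapping is a fixed function of a counter: anyone who knows it can enumerate ids, so looking random is not the same as being unguessable.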
When you use allocateId(), you get an id from the older allocator that was used before scattered ids. They aren't necessarily contiguous or monotonic but they tend to start small and grow.
You can mix and match; allocations will never conflict.
I presume, however, that you want random-looking ids because you want them to be hard to guess. Despite their appearance at first glance, the scattered id allocator does not produce unguessable ids. If you want sparse ids that will prevent someone from scanning your keyspace, you need to explicitly add a random element. Or just use UUID.randomUUID() in the first place.
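One way to explicitly add a random element while keeping the guarantee that allocated ids never collide: keep the id from allocateId() as the high bits and append some random low bits. Uniqueness comes from the allocated part; the random part means someone who knows one id has to try about 2^20 values to hit a neighbour. This is a sketch under assumptions: sparse_id, random_bits and the 20-bit split are hypothetical choices, and the result is kept positive so it fits a Java long id.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical split: the low kRandomBits bits are random, the rest is the
// allocated id. Unique because the allocated part is unique.
constexpr unsigned kRandomBits = 20;

std::int64_t sparse_id(std::int64_t allocated, std::uint64_t random) {
    // Keep the result a positive signed 64-bit value (like a Java long).
    assert(allocated > 0 && allocated < (std::int64_t{1} << (63 - kRandomBits)));
    std::uint64_t low = random & ((std::uint64_t{1} << kRandomBits) - 1);
    return static_cast<std::int64_t>(
        (static_cast<std::uint64_t>(allocated) << kRandomBits) | low);
}

// The random part must be unpredictable. std::random_device is usually backed
// by the OS entropy source, but the C++ standard does not guarantee that.
std::uint64_t random_bits() {
    static std::random_device rd;
    return (static_cast<std::uint64_t>(rd()) << 32) | rd();
}
```

The allocated id can be recovered with id >> kRandomBits. If you need much more than 20 bits of unguessability, UUID.randomUUID() as suggested above is the simpler route.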
App Engine allocates IDs using its own internal algorithm, designed to improve datastore performance. I would trust the App Engine team to do their magic.
Introducing your own scheme for allocating IDs is not that simple: you have to account for eventual consistency, etc. And it's unlikely that you will gain anything, performance-wise, from all this effort.
After several months of evaluating, reevaluating and planning different data structures and web/application servers, I'm now at a point where I need to bang my head against implementation details. The (at the moment theoretical) question I'm facing is this:
Say I'm using G-WAN's KV store to store C structs for Users and the like (works fine, tested). How should I go about removing these objects from the KV store, and later from memory, without encountering a race condition?
This is where I am at the moment:
Thread A:
grab other objects referencing the one to be deleted
set references to NULL
delete object
Thread B:
try to get object -> kv could return object, as it's not yet deleted
try to do something with the object -> could already be deleted here, so I would access already freed memory?
or something else which could happen:
Thread B:
get thing referencing object
follow reference -> object might not be deleted here
do something with reference -> object might be deleted here -> problem
or
Thread B:
get some other object which could reference the to-be-deleted object
grab object which isn't yet deleted
set reference to object -> object might be deleted here -> problem
Is there a way to avoid these kinds of conditions, other than using locks? I've found a multitude of documents describing algorithms for different producer/consumer situations, hashtables, ..., sometimes even with wait-free implementations (I haven't yet found a good example that shows the difference between lock-free and wait-free, though I get it conceptually), but I haven't been able to figure out how to deal with these kinds of things.
Am I overthinking this, or is there an easy way to avoid all these situations? I'm free to change the data and storage layout in any way I want, and I can use processor-specific instructions freely (e.g. CAS).
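On the lock-free vs wait-free question, a shared counter is about the smallest example that shows the difference. Both increment functions below are correct under contention; they differ only in their progress guarantee. The function names and the hammer test helper are hypothetical, and whether fetch_add is truly wait-free depends on the CPU: it is a single instruction on x86, but may compile to a retry loop on LL/SC architectures.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Lock-free, not wait-free: a CAS retry loop. Some thread always succeeds,
// but an unlucky thread can, in principle, keep losing the race and retry forever.
int increment_lock_free(std::atomic<int>& counter) {
    int expected = counter.load();
    // On failure, compare_exchange_weak reloads `expected` with the current value.
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
    }
    return expected + 1;
}

// Wait-free where the CPU has a native atomic add (e.g. x86 LOCK XADD):
// every call finishes in a bounded number of steps, whatever other threads do.
int increment_wait_free(std::atomic<int>& counter) {
    return counter.fetch_add(1) + 1;
}

// Runs `threads` workers that each call `inc` `per_thread` times on a shared
// counter, and returns the final count.
template <typename Inc>
int hammer(Inc inc, int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] { for (int i = 0; i < per_thread; ++i) inc(counter); });
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Both versions reach the same total; the difference is only whether an individual thread is guaranteed to finish its own operation in bounded time.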
Several questions there:
deleting a GWAN KV stored struct
When removing a KV from a persistence pointer or freeing the KV, you have to make sure that nobody is dereferencing freed data.
This is application dependent. You can introduce some tolerance by using G-WAN memory pools, which will make data survive a KV deletion as long as the memory is not overwritten (or the pool freed).
deleting a GWAN KV key-value pair
G-WAN's KV store does the bookkeeping (using atomic intrinsics) to protect values fetched by threads and unprotects them after the request has been processed.
If you need to keep data for a longer time, make a copy.
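As a sketch of the general kind of bookkeeping described above (an assumption about the technique, not G-WAN's actual implementation or API), a reference count per object lets Thread A remove an object while Thread B is still using it: A unpublishes the pointer and drops the store's reference, and whoever drops the last reference frees the object. Object, try_acquire, release and unpublish are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object header; not G-WAN's real layout.
struct Object {
    std::atomic<int> refs{1};  // 1 = the reference held by the store itself
    int payload = 0;
};

// Take a reference only while the object is alive (refs > 0). A count that
// has reached 0 is never incremented again, so a dying object cannot be
// "resurrected". CAVEAT: obj must still point to valid memory here; see below.
bool try_acquire(Object* obj) {
    int n = obj->refs.load(std::memory_order_relaxed);
    while (n > 0) {
        if (obj->refs.compare_exchange_weak(n, n + 1, std::memory_order_acquire,
                                            std::memory_order_relaxed))
            return true;
    }
    return false;
}

// Drop one reference; the caller that drops the last one frees the object.
// Returns true if this call freed it.
bool release(Object* obj) {
    int prev = obj->refs.fetch_sub(1, std::memory_order_acq_rel);
    assert(prev > 0 && "released more times than acquired");
    if (prev == 1) {
        delete obj;
        return true;
    }
    return false;
}

// Thread A's "set references to NULL" step: atomically remove the pointer so
// new lookups cannot find it; the returned pointer still owns the store's ref.
Object* unpublish(std::atomic<Object*>& slot) {
    return slot.exchange(nullptr, std::memory_order_acq_rel);
}
```

This covers the question's scenarios where Thread B already holds a reference: the object stays alive until B calls release(). It does not by itself close the window between B reading the pointer and calling try_acquire(); that has to be handled by the store (fetch and acquire as one protected step), by pooled memory as mentioned above, or by a reclamation scheme such as hazard pointers or epoch-based reclamation.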
Other storage tools, like in-memory SQLite, use locks. In this case, lock granularity is very important.
I would like to implement deleted-key detection for App Engine, if possible without storing any extra entities or markers upon deletion, so that I can return a 404 or 410 response accordingly.
AFAIK, numeric ids for new entity keys are assigned in no particular order (at least not a simple one), but they are of course reserved/allocated and never implicitly reused for new entities.
So is there a way to check if a particular key was previously allocated, but entity stored under this key was since deleted?
I do not care if a key was manually allocated and never used to store any data, I'll treat it as deleted.
No, there's no way to determine if a key has already been allocated.
You mention that you'll treat allocated but unused keys as deleted, but note that this will result in returning the wrong status code in these cases - including in the potential situation where a key is allocated and later used: you'll mistakenly report it as deleted until it's first used.
I'm copying entities from one kind to another, and want to map their long ids in a predictable way. After the mapping is over, I want auto-generation of ids to kick in.
To protect the entities I copy, I want to use allocateIdRange and manually allocate each id as I copy it. My hope is that this will cause the datastore to protect these new ids, and only assign other ids to new entities created after the copy.
One return code has me worried: CONTENTION
Indicates the given KeyRange is empty but the datastore's automatic ID
allocator may assign new entities keys in this range. However it is
safe to manually assign Keys in this range if either of the following
is true:
1. No other request will insert entities with the same kind and parent as the given KeyRange until all entities with manually assigned keys from this range have been written.
2. Overwriting entities written by other requests with the same kind and parent as the given KeyRange is acceptable.
Number 2 is out for me. It is not acceptable for these entities to be overwritten.
Number 1 I think is acceptable, but the wording is scary enough that I want to make sure. If I allocate 5 ids, from 100 to 104, and I get CONTENTION back, this seems to indicate that the entities I copy MAY be overwritten with new entities with automatic ids in the future. BUT, if I hurry up and write my own entities with ids manually set to 100, 101, 102, 103, and 104, I will be safe and new entities with automatic ids will NOT receive these ids.
I'm worried because I don't understand how this would work. I don't think of the id allocator as paying attention to what gets written.
TL;DR
Imagine the following scenario:
allocateIdRange(100, 104); // returns CONTENTION
putEntityWithManualId(100);
putEntityWithManualId(101);
putEntityWithManualId(102);
putEntityWithManualId(103);
putEntityWithManualId(104);
// all puts succeed
Now, when I later call
putNewEntityWithAutomaticId();
is there any risk that the automatic id will be 100, 101, 102, 103, or 104?
The documentation says the following:
The datastore's automatic ID allocator will not assign a key to a new entity that will overwrite an existing entity, so once the range is populated there will no longer be any contention.
Thus, you don't need to worry that your newly copied entities will be overwritten.
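A toy model of the documented behaviour may help with the worry that the allocator doesn't pay attention to what gets written. ToyKind is hypothetical and far simpler than the real allocator; it only assumes, as the documentation states, that an automatic id is never one already used by an existing entity of the same kind and parent.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy model, NOT the datastore's real allocator: one kind/parent whose
// automatic ids skip any id already used by an existing entity.
class ToyKind {
public:
    // Manual write with a caller-chosen id (the copied entities).
    void put_with_id(std::int64_t id) {
        assert(id > 0);
        existing_.insert(id);  // a repeated id would mean an overwrite
    }

    // Automatic write: advance the candidate past every occupied id.
    std::int64_t put_with_auto_id() {
        while (existing_.count(next_) != 0) ++next_;
        std::int64_t id = next_++;
        existing_.insert(id);
        return id;
    }

private:
    std::set<std::int64_t> existing_;
    std::int64_t next_ = 100;  // deliberately starts inside the copied range
};
```

The model also shows where condition 1 comes from: if put_with_auto_id() ran before the manual puts, it could return 100, and the later put_with_id(100) would overwrite that entity. Once all five manual puts have succeeded, that can no longer happen.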