How can I make custom auto-generated IDs for documents in Firestore - ReactJS

For my project I need IDs that can be easily shared, so Firestore's default auto-generated IDs won't work.
I am looking for a way to auto-generate an ID like 8329423 that is either incremented or randomly chosen in the range 0 to 9999999.

Firestore's auto-ID fields are designed to statistically guarantee that no two clients will ever generate the same value. This is why they're as long as they are: it's to ensure there is enough randomness (entropy) in them.
This allows Firestore to determine these keys completely client-side without needing to look up on the server whether the key it generated was already generated on another client before. And this in turn has these main benefits:
Since the keys are generated client-side, they can also be generated when the client is not connected to any server.
Since the keys are generated client-side, there is no need for a roundtrip to the server to generate a new key. This significantly speeds up the process.
Since the keys are generated client-side, there is no contention between clients generating keys. Each client just generates keys as needed.
If these benefits are important to your use-case, then you should strongly consider whether you're likely to create a better unique ID than Firestore already does. For example, Firestore's IDs have 62^20 unique values, which is why they're statistically guaranteed never to generate the same value over a very long period of time. Your proposed range of 0 - 9999999 has only 10 million unique values, which makes duplicates far more likely.
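To see why that entropy matters, here is a minimal sketch (in Python, and not the SDK's actual code) of how such an auto-ID can be generated entirely client-side:

import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 possible characters

def auto_id(length=20):
    # 62^20 possible values, generated with no server roundtrip
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

print(auto_id())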
If you really want this scheme for IDs, you will need to store the IDs that you've already given out on the server (likely in Firestore), so that you can check against it when generating a new key. A very common way to do this is to keep a counter of the last ID you've already handed out in a document. To generate a new unique ID, you:
Read the latest counter value from the document.
Increment the counter.
Write the updated counter value to the document.
Use the updated counter value in your code.
Since this read-update-write happens from multiple clients, you will need to use a transaction for it. Also note that the clients are now coordinating the key generation, so you're going to run into throughput limits on the number of keys you can generate; see the sketch below.
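Here is a minimal sketch of that counter pattern. It uses the server-side Python client for brevity (the web SDK's runTransaction has the same read-increment-write shape), and the counters/user_ids document path and value field are assumptions:

from google.cloud import firestore

db = firestore.Client()
counter_ref = db.collection('counters').document('user_ids')  # assumed path

@firestore.transactional
def next_id(transaction, ref):
    snapshot = ref.get(transaction=transaction)   # read the latest counter
    current = snapshot.get('value') if snapshot.exists else 0
    transaction.set(ref, {'value': current + 1})  # increment and write back
    return current + 1

new_id = next_id(db.transaction(), counter_ref)
print('%07d' % new_id)  # zero-padded into the 0-9999999 range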

Related

Working with accumulated bucket values in Entity Framework

I'm attempting to find design patterns/strategies for working with accumulated bucket values in a database where concurrency can be a problem. I don't know the proper search terms to use to find information on the topic.
Here's my use case (I'm using code-first Entity Framework, so EF-specific advice is welcome):
I have a database table that contains a quantity value. This quantity value can be incremented or decremented by multiple clients at the same time. Because of that, I call it a "bucket" value: it is a bucket for a bunch of accumulated activity, as opposed to the other strategy where you keep every piece of activity and calculate the value from that activity. I am looking for strategies (within the context of EF) for ensuring the accuracy of this bucket value when multiple clients may attempt to change it simultaneously (concurrency).
The answer "you must track activity and derive your value from that activity" is acceptable, but I want to consider all bucket-centric solutions as well.
I am looking for advice on search terms to use to find good information on this topic as well as specific links.
Edit: You may assume that all activity is relative to the "bucket" value (no clients will be making an absolute change to the value; they will only increment or decrement).
Without directly coding the SQL Queries that update the buckets, you would have to use client-side Optimistic Concurrency. See Entity Framework Optimistic Concurrency Patterns. Clients whose update would overwrite a change will get an exception, after which you can reload with the current value and retry. This pattern requires a ROWVERSION column on the target table.
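To illustrate that pattern outside EF, here is a minimal optimistic-concurrency retry loop in Python using sqlite3; the buckets table and an integer version column (standing in for ROWVERSION) are assumptions:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE buckets (id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)')
conn.execute('INSERT INTO buckets VALUES (1, 0, 0)')

def increment_bucket(conn, bucket_id, delta, retries=5):
    for _ in range(retries):
        quantity, version = conn.execute(
            'SELECT quantity, version FROM buckets WHERE id = ?',
            (bucket_id,)).fetchone()
        cur = conn.execute(
            'UPDATE buckets SET quantity = ?, version = version + 1 '
            'WHERE id = ? AND version = ?',
            (quantity + delta, bucket_id, version))
        conn.commit()
        if cur.rowcount == 1:  # no concurrent writer got in between read and write
            return quantity + delta
    raise RuntimeError('too much contention on bucket %d' % bucket_id)

print(increment_bucket(conn, 1, +1))  # 1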
If you code the updates in T-SQL you can write an atomic update, something like:
update foo with (updlock)
set bucket_a = bucket_a + 1
output inserted.*  -- returns the updated row to the caller
where id = @id
(The 'updlock' isn't strictly necessary in this query, but it is good form any time you want to ensure this kind of isolation.)

Creating Fixed Width ID Based On Serial Number in Python NDB Datastore

I have a model named UserModel and I know that it will never grow beyond 10000 entities. There is nothing unique in UserModel that I can use for creating a key, hence I decided to have string keys of the format USRXXXXX,
where XXXXX represents the serial count, e.g. USR00001, USR12345.
I chose the following way to generate the IDs:
def generate_unique_id():
    qry = UserModel.query()
    num = qry.count() + 1        # current number of users + 1
    id = 'USR' + '%05d' % num    # zero-pad to five digits
    return id

def create_entity(model, id, **kwargs):
    ent = model.get_or_insert(id, **kwargs)
    # check if it is the newly created record or an existing one
    if ent.key.id() != id:
        raise InsertError('failed to add new user, please retry the operation')
    return True
Questions:
Is this the best way of achieving a serial count of fixed width? Is this solution optimal and idiomatic?
Does using get_or_insert like above guarantee that I will never have duplicate records?
Will it increase my billing, because for counting the number of records I am doing UserModel.query() without any filters? In a way I am fetching all the records. Or does billing only come into the picture once I use the fetch API on the qry object?
Since you only need a unique key for the UserModel entities, I don't quite understand why you need to create the key manually. The IDs that are generated automatically by App Engine are guaranteed to be unique.
Regarding your questions, we have the following:
I think not. You could instead first allocate IDs (check the section Using Numeric Key IDs), which are guaranteed unique, and hand them out in order; see the sketch after this answer.
Even though get_or_insert is strongly consistent, the query you perform (qry = UserModel.query()) is not. Thus, you may end up overwriting existing entities. For more information about eventual consistency, take a look here.
No, it will not increase your billing. When you execute Model.query().count(), the datastore under the hood executes a Model.query().fetch(keys_only=True) and counts the number of results. Keys-only queries generate small datastore operations, which, under Google's latest pricing changes, are not billable.
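For the first point, here is a rough sketch of pre-allocating IDs with ndb; if memory serves, allocate_ids returns a (first_key, last_key) pair of Keys, and allocated IDs are guaranteed never to be handed out again (though not necessarily contiguous). The name property is just an example:

from google.appengine.ext import ndb

class UserModel(ndb.Model):
    name = ndb.StringProperty()

first_key, last_key = UserModel.allocate_ids(1)  # reserve one numeric ID
uid = 'USR%05d' % first_key.id()                 # format as the asker wants
user = UserModel(id=uid, name='Alice')
user.put()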
Probably not. You might get away with what you are trying to do if your UserModel entities have ancestors for stronger consistency.
No, get_or_insert does not guarantee that you won't have duplicates. Although you are unlikely to have duplicates in this case, you are more likely to lose data. Say you are inserting two entities with no ancestors - Model.query().count() might take some time to reflect the creation of the first entity, causing the second entity to get the same ID as the first one and thus overwrite it (i.e. you end up with the 2nd entity only, under the ID of the first one).
Model.query().count() is short for len(Model.query().fetch()) (although with some optimizations) so every time you generate an ID you fetch all entities.

When is it appropriate to use UUIDs for a web project?

I'm busy with the database design of a new project, and I'm not sure whether to use UUIDs or normal table-unique auto-increment ids.
Up to now, the sites I've built have all run on a single server, and very heavy traffic has never been too much of a concern. However, this web application will eventually run concurrently on multiple servers, serve an API, and need to process thousands of requests per second, and I want to make sure that the design I choose now doesn't cripple any of those possibilities later.
I have my suspicions, of course, and they should be clear through the way I phrased my question, but I would like to hear from those with more experience what trouble I can run into later if I do or don't have UUIDs, and what I should really be basing my decision on.
So, in short: What are the considerations I should give into deciding whether or not to use UUIDs for all database models, so that any one object can be identified uniquely by one string, and when is it appropriate to use this as the primary key, instead of table-by-table auto-increment?
Note: I've seen this question (When are you truly forced to use UUID as part of the design?), and read all the answers, but they mostly answer "How rarely do UUIDs collide", instead of "When is it appropriate to use them".
One consideration that I've used when deciding on UUIDs vs. auto-increment ids is whether they're going to be user-visible, and if so, whether I want users to know how many I have of that table. For example, if I didn't want to make public the number of registered users my site has, I wouldn't assign auto-increment user ids.
And to address one other specific point you raised: it's still possible to use auto-incrementing IDs with multiple servers (in MySQL, the auto_increment_offset and auto_increment_increment server variables exist for exactly this). You just need to start all the IDs at different offsets, and increment accordingly. That is, if you had 3 servers, you could start server A at 1, server B at 2, and server C at 3, and then increment the IDs by 10 each time instead of 1. That way, you could guarantee no collisions.
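As a toy sketch of that offset scheme (the offsets and the step of 10 come from the example above):

def id_generator(offset, step=10):
    # each server gets its own offset, so the streams never overlap
    current = offset
    while True:
        yield current
        current += step

server_a = id_generator(1)  # 1, 11, 21, ...
server_b = id_generator(2)  # 2, 12, 22, ...
server_c = id_generator(3)  # 3, 13, 23, ...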
And finally, the last thing I consider is how important performance is to my application. Integers are much more easily indexed than UUIDs that are string-based, so indexes are smaller, more quickly searched, etc.
UUIDs or GUIDs can be very useful, especially for the web. If you use auto-increment values to store a UserId, anyone can view the source of your web pages, see how predictable the IDs are, and then try arbitrary integer values to get data they are not supposed to see.
GUIDs are not created in any sequential format; if you create them one right after the other, their sequence cannot easily be guessed.
I don't think it's necessary to use GUIDs for simple lookup-type data such as ColorId 1=Blue, 2=Red, 3=Green.
GUIDs are also very useful for session and state management.
That's my $0.02

Enforcing Unique Constraint in GAE

I am trying out Google App Engine Java; however, the absence of a unique constraint is making things difficult.
I have been through this post, and this blog suggests a method to implement something similar. My background is in MySQL. Moving to the datastore without a unique constraint makes me jittery, because I never had to worry about duplicate values before, and checking each value before inserting a new one still leaves room for error.
"No, you still cannot specify unique
during schema creation."
-- David Underhill talks about GAE and the unique constraint (post link)
What are you guys using to implement something similar to a unique or primary key?
I heard about an abstract datastore layer created using the low-level API which worked like a regular RDB; however, it was not free (and I do not remember the name of the software).
Schematic view of my problem
1. sNo = biggest serial_number in the db
2. sNo++
3. Insert new entry with sNo as serial_number value // checkpoint
4. User adds data pertaining to current serial_number
5. Update entry with data where serial_number is sNo
However, at step 3 (the checkpoint), I feel two users might add the same sNo. And that is what is preventing me from working with App Engine.
This and other similar questions come up often when talking about transitioning from a traditional RDB to a BigTable-like datastore like App Engine's.
It's often useful to discuss why the datastore doesn't support unique keys, since it informs the mindset you should be in when thinking about your data storage schemes. The reason unique constraints are not available is because it greatly limits scalability. Like you've said, enforcing the constraint means checking all other entities for that property. Whether you do it manually in your code or the datastore does it automatically behind the scenes, it still needs to happen, and that means lower performance. Some optimizations can be made, but it still needs to happen in one way or another.
The answer to your question is, really think about why you need that unique constraint.
Secondly, remember that keys do exist in the datastore, and are a great way of enforcing a simple unique constraint.
my_user = MyUser(key_name=users.get_current_user().email())
my_user.put()
This will guarantee that no MyUser will ever be created with that email ever again, and you can also quickly retrieve the MyUser with that email:
my_user = MyUser.get_by_key_name(users.get_current_user().email())
In the python runtime you can also do:
my_user = MyUser.get_or_insert(key_name=users.get_current_user().email())
Which will insert or retrieve the user with that email.
Anything more complex than that will not be scalable though. So really think about whether you need that property to be globally unique, or if there are ways you can remove the need for that unique constraint. Often times you'll find with some small workarounds you didn't need that property to be unique after all.
You can generate unique serial numbers for your products without needing to enforce unique IDs or querying the entire set of entities to find out what the largest serial number currently is. You can use transactions and a singleton entity to generate the 'next' serial number. Because the operation occurs inside a transaction, you can be sure that no two products will ever get the same serial number.
This approach will, however, be a potential performance chokepoint and limit your application's scalability. If it is the case that the creation of new serial numbers does not happen so often that you get contention, it may work for you.
EDIT:
To clarify, the singleton that holds the current -- or next -- serial number that is to be assigned is completely independent of any entities that actually have serial numbers assigned to them. They do not all need to be part of one entity group. You could have entities from multiple models using the same mechanism to get a new, unique serial number.
I don't remember Java well enough to provide sample code, and my Python example might be meaningless to you, but here's pseudo-code to illustrate the idea:
Receive request to create a new inventory item.
Enter transaction.
Retrieve current value of the single entity of the SerialNumber model.
Increment value and write it to the database
Return value as you exit transaction.
Now, the code that does all the work of actually creating the inventory item and storing it along with its new serial number DOES NOT need to run in a transaction.
Caveat: as I stated above, this could be a major performance bottleneck, as only one serial number can be created at any one time. However, it does provide you with the certainty that the serial number you just generated is unique and not in use; a runnable sketch follows.
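Here is a runnable Python version of that pseudo-code, using an ndb transaction; the SerialNumber model and its single 'counter' entity are assumptions (in Java the shape is the same inside a datastore transaction):

from google.appengine.ext import ndb

class SerialNumber(ndb.Model):
    value = ndb.IntegerProperty(default=0)

@ndb.transactional
def next_serial():
    key = ndb.Key(SerialNumber, 'counter')       # the singleton entity
    counter = key.get() or SerialNumber(key=key)
    counter.value += 1
    counter.put()
    return counter.value

serial = next_serial()
# creating the inventory item with this serial number
# can now happen outside the transaction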
I encountered this same issue in an application where users needed to reserve a timeslot. I needed to "insert" exactly one unique timeslot entity while expecting users to simultaneously request the same timeslot.
I have isolated an example of how to do this on app engine, and I blogged about it. The blog posting has canonical code examples using Datastore, and also Objectify. (BTW, I would advise to avoid JDO.)
I have also deployed a live demonstration where you can advance two users toward reserving the same resource. In this demo you can experience the exact behavior of app engine datastore click by click.
If you are looking for the behavior of a unique constraint, these should prove useful.
-broc
I first thought an alternative to the transaction technique in broc's blog could be to make a singleton class containing a synchronized method (say addUserName(String name)) responsible for adding a new entry only if it is unique, and throwing an exception otherwise. Then make a ServletContextListener which instantiates a single instance of this singleton and adds it as an attribute to the ServletContext. Servlets can then call the addUserName() method on the singleton instance, which they obtain through getServletContext().
However this is NOT a good idea because GAE is likely to split the app across multiple JVMs so multiple singleton class instances could still occur, one in each JVM. see this thread
A more GAE-like alternative would be to write a GAE module responsible for checking uniqueness and adding new entries, then use manual or basic scaling with...
<max-instances>1</max-instances>
Then you have a single instance running on GAE which acts as a single point of authority, adding users one at a time to the datastore. If you are concerned about this instance being a bottleneck you could improve the module, adding queuing or an internal master/slave architecture.
This module based solution would allow many unique usernames to be added to the datastore in a short space of time, without risking entitygroup contention issues.

mapping encoded keys to shorter identifiers in appengine

I want to send unique references to the client so that the client can refer back to specific objects. The encoded keys App Engine provides are sometimes 50 bytes long, and I probably only need two or three bytes (I could hope to need four or five, but that won't be for a while!).
Sending the larger keys is actually prohibitively expensive, since I might be sending 400 references at a time.
So, I want to map these long keys to much shorter keys. An obvious solution is to store a mapping in the datastore, but then when I'm sending 400 objects I'm doing 400 additional queries, right? Maybe I mitigate the expense by keeping copies of the mappings in memcache as well. Is there a better way?
Can I just yank the number out of the unencoded keys that appengine creates and use that? I only need whatever id I use to be unique per entity kind, not across the whole app.
Thanks,
Riley
Datastore keys include extra information you don't need - like the app ID. So you definitely do not need to send the entire keys.
If these references are to a particular Kind in your datastore, then you can do even better and just send the key_name or numeric ID (whichever your keys use). If the latter is the case, then you could transmit each key with just a few bytes (you could opt for either a variable-length or fixed-length integer encoding depending on which would be more compact for your specific case [probably the former until most of the IDs you're sending get quite large]).
When you receive these partial keys back from the user, it should be easy to reconstruct the full key which you need to retrieve the entities from the datastore. If you are using the Python runtime, you could use db.Key.from_path(kind_name, numeric_id_or_key_name).
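For instance, with the Python runtime and a hypothetical kind named Item whose keys use numeric IDs:

from google.appengine.ext import db

short_id = 12345  # the few-byte reference the client sent back
full_key = db.Key.from_path('Item', short_id)
entity = db.get(full_key)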
A scheme like this should be both simpler and (a lot) faster than trying to use the datastore/memcache to store a custom mapping.
You don't need a custom mapping mechanism. Just use entity key names to store your short identifier:
entity = MyKind(key_name=your_short_id)
entity.put()
Then you can fetch these short identifiers in one query:
keys = MyKind.all(keys_only=True).filter(...).fetch(400)
short_ids = [key.name() for key in keys]
Finally, use MyKind.get_by_key_name(short_id) in order to retrieve entities from identifiers sent back by your users.
