Is it possible to generate duplicate UUIDs in the same millisecond?

Is it possible to create two duplicate UUIDs one after the other? I'm unfamiliar with how UUIDs are generated, but I would guess that if you created two separate UUIDs from the same MAC address in the same millisecond, then they would be exactly the same. Is this true?
I guess I'm asking two questions in one. I'm very interested to know what parameters are used to generate a random UUID. I'm guessing it's more than just the timestamp and MAC address.

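In the Python uuid module, uuid1() takes the timestamp and generates a random clock sequence with random.randrange(1<<14L) (that is the Python 2 code; Python 3 uses random.getrandbits(14)). So you are combining a timestamp counted in 100-nanosecond intervals with a random number from 0 to 16383. CPython also bumps the timestamp by one if it would repeat within the same process, so a duplicate would need two separate generators on the same machine to pick the same timestamp and clock sequence. My guess is that it would be possible but highly unlikely.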
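To check the answer above empirically, the sketch below generates a burst of version 1 UUIDs in a tight loop, so many of them fall in the same millisecond, and confirms that none repeat and that the clock sequence stays within its 14-bit range. It assumes CPython 3; whether CPython delegates to the platform's libuuid or uses its pure-Python fallback depends on the build, and both paths are expected to avoid repeats within one process. The helper name `generate_v1_batch` is only for illustration.

```python
import uuid

def generate_v1_batch(count):
    """Generate `count` version 1 UUIDs as fast as possible (hypothetical helper)."""
    return [uuid.uuid1() for _ in range(count)]

batch = generate_v1_batch(10_000)

# Many of these share a millisecond, yet all values should still be distinct.
print(len(set(batch)) == len(batch))

# The clock sequence is a 14-bit field, so it always falls in 0..16383.
print(all(0 <= u.clock_seq < (1 << 14) for u in batch))
```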
If you are worried about this being an issue, you always have UUIDv3, UUIDv4, and my choice, UUIDv5.

Related

Is it safe to use UUIDs generated from 2 different systems without a chance of collision?

We have an old system that generated some UUIDs. We have more records that need UUIDs but can't use the old system to generate them, so we will need to generate them elsewhere. This immediately struck me as not a good idea, and I have been searching for an answer but haven't found this exact question. There would be no check to make sure the UUID wasn't already generated by the old system. The UUID would just get populated for records that don't have one. Safe?
If you use UUID V1, all values generated are designed to be unique, provided each generator has its own node ID (normally its MAC address). However, most folks avoid V1 because it leaks data about the system that generated it (specifically, the MAC address) and the exact time, neither of which is ideal in many contexts.
Most folks use UUID V4, which is statistically unique. While it is theoretically possible for the same value to be generated twice, the odds of it ever actually happening before the heat death of the universe are approximately zero, which is good enough for any practical purpose.
UUID V3/V5 are used when you want deterministic values (the same namespace and name always produce the same UUID), which doesn't sound like a good fit for your needs.
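The leak that this answer attributes to V1 is easy to see: the node field and the 60-bit timestamp can be read straight back out of the value. The sketch below uses Python's standard `uuid` module; `describe_v1` is a hypothetical helper, and the fixed node `0x0123456789AB` in the usage example is a made-up stand-in for a real MAC address.

```python
import datetime
import uuid

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def describe_v1(u):
    """Return the node (often a MAC address) and creation time stored in a v1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # Format the 48-bit node as six colon-separated bytes, most significant first.
    mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    # u.time counts 100-ns intervals since 1582; convert to Unix seconds.
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 10_000_000
    created = datetime.datetime.fromtimestamp(unix_seconds, tz=datetime.timezone.utc)
    return mac, created

mac, created = describe_v1(uuid.uuid1(node=0x0123456789AB))
print(mac)  # 01:23:45:67:89:ab
print(created.isoformat())
```

With a plain `uuid.uuid1()` the node is usually the machine's real MAC address, which is exactly the information most people would rather not publish.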

Should I use UUID or something else?

Very often I have a task where I need to create an object and know its ID before saving it to the DB (PostgreSQL).
I can do this with UUIDs, but they have lots of disadvantages:
- lower performance when selecting or including
- lower performance with joins
- more storage space needed
So question is: How can I generate an ID for the object beforehand and minimize UUID negative consequences?
We faced this issue on a project. I ran some tests (about 4M rows, if I remember right) which indicated that UUIDs didn't hurt PG's performance too badly compared with ints. Having used UUIDs as primary keys for some time now, I wouldn't hesitate to do so again. I must add the caveat, though, that we have yet to see how this performs in production at a large scale.
Check this out: http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database
The nice thing about using UUIDs is that you never have to worry about clashes. The not-so-nice thing: they are a bit cumbersome if you're typing a query by hand for a test.
If you end up selecting based on a large list of uuids use this trick: https://www.datadoghq.com/blog/100x-faster-postgres-performance-by-changing-1-line/
Hope this helps,
Adam.
You can use any UUID generator in whatever programming language you like to do this. I would suggest using PostgreSQL's native uuid type (16 bytes) rather than a text column, to keep the space and join overhead down. Before version 13, core PostgreSQL had no built-in generator (you needed the uuid-ossp or pgcrypto extension; PostgreSQL 13 added gen_random_uuid()), but generating the UUID in your application is what lets you know it before saving anyway.
A major issue you may run into is that some things that are relatively painless with numeric IDs become a bigger issue with UUIDs. These include:
- Typing in identifiers
- Selecting a series of records inserted at a similar time (because numeric IDs are sequential)
However, if you use the uuid type in PostgreSQL, selection and join performance should not be too bad. How you generate the UUIDs is up to you as a programmer.
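A minimal sketch of generating the ID in the application, so it is known before anything is saved. `Order` and `OrderLine` are hypothetical models, and no database driver is involved; handing the values to PostgreSQL is left to whatever data layer you use.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Order:
    # Hypothetical model: the ID is assigned at construction time,
    # so it exists before the row is ever written to the database.
    customer: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)

@dataclass
class OrderLine:
    order_id: uuid.UUID  # can reference the parent before either is saved
    sku: str

order = Order(customer="alice")
lines = [OrderLine(order_id=order.id, sku="A-1"), OrderLine(order_id=order.id, sku="B-2")]

# PostgreSQL's uuid type stores the raw 16 bytes; the text form is 36 characters.
print(len(order.id.bytes), len(str(order.id)))  # 16 36
```

The 16-versus-36 comparison is the reason to prefer the native uuid column over storing the string form.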
Of course UUIDs will perform worse than integers; the question is at what data volume it starts to matter. To be honest, 4M rows is too small a sample to say whether it will be a performance issue, although if the requirements indicate that the data volume will stay below 4M rows, that is OK.
The article at https://rclayton.silvrback.com/do-you-really-need-a-uuid-guid gives better guidance on how and when to use UUIDs.

Java UUID: how does it work?

Is java.util.UUID unique in each file?
For example:
We have a file IDFile1.java, which generates a random ID using UUID.randomUUID().toString().
And we have a second file, IDFile2.java, which generates another ID.
Will these two IDs collide with each other?
Is there any way to "turn back" a used ID generated by java.util.UUID, so that the ID could be used again?
The purpose of functions like randomUUID is to munge together various pieces of information which are likely to have some amount of randomness such that two UUIDs generated at different times or in different places would have an extremely small likelihood of matching unless all sources of potential randomness happened to yield identical results. No effort is made to keep track of which UUIDs have or have not been issued; instead, the goal is to have enough randomness that the probability of an unintentional match will be small relative to e.g. the probability of a computer being smashed to a million pieces by a meteor strike.
Note that the system may use "number of UUIDs issued" as part of its UUID calculation, but that would only be one of many factors that go into it. The purpose of such a counter isn't to allow one to "go back" to a previous UUID, but rather to ensure that if e.g. two UUIDs are requested nearly simultaneously without any source of randomness having become available between them, the two requests will yield different values.
UUIDs are unrelated to which file generates them. Whichever file generates them, they almost certainly will not be the same.
For question 2, the way UUIDs are generated doesn't really allow them to be regenerated in any meaningful way. Some UUID versions are generated from information about your computer, the current time, and so on. Java's UUID.randomUUID() instead uses a cryptographically strong random number generator, and the results are known as version 4 (type 4) UUIDs.

IDs for Information on More Than One DB/Server

I'm working on a project that I want to be as flexible and scalable as possible from the beginning. A problem I'm concerned about is one best described by Joshua Schacter in Founders at Work, who noted it as one detail he wishes he had planned for ahead of time.
Scaling past one machine, one database, is very challenging, even with replication. The tools that are there are not quite right.
For example, when you add things to a table and it numbers them, that means you can't have a second machine also adding to them because the numbers will collide. So what do you do? You have to come up with some completely different way to do it.
Do you have a central server that hands out number sets, or do you come up with something that's not numbers? Do you use random numbers and hope they never collide? Whatever it is, auto-assigned IDs just don't fly.
Has anyone here faced this problem? What are ways to move beyond auto-incremented IDs, or is there a way to have them scale with multiple servers?
Use GUID/UUID (globally/universally unique identifier). In theory it's guaranteed to be unique across multiple machines.
With GUIDs, your chances of collision are astronomically low.
It's also possible to have what we called SmartGUIDs (usually called COMB GUIDs; see this analysis, particularly page 7), where you encode a timestamp within the GUID. You get record creation date information "for free", so you can do without a separate record-creation datetime column, which wins back some of what you lost by moving from a 32-bit integer to a 128-bit GUID. These can also be made monotonic, unlike regular GUIDs, which is useful for clustered indexes and for sorting.
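The analysis referenced in that answer isn't reproduced here, so the following is only a sketch of the COMB idea under my own assumptions. The hypothetical `make_time_ordered_uuid` borrows the UUIDv7 bit layout from RFC 9562: a 48-bit Unix millisecond timestamp first, then the version and variant bits, then random bits. Putting the timestamp in the leading bytes suits databases that compare UUIDs byte by byte from the front; some COMB variants place it at the end instead because of how SQL Server orders uniqueidentifier values.

```python
import os
import time
import uuid

def make_time_ordered_uuid(unix_ms=None):
    """Hypothetical COMB-style helper using the UUIDv7 layout:
    48-bit Unix ms timestamp, version 7, RFC variant, random remainder."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80       # timestamp in the top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")  # 80 random bits below it
    value &= ~(0xF << 76)                           # clear the version nibble
    value |= 0x7 << 76                              # set version 7
    value &= ~(0x3 << 62)                           # clear the variant bits
    value |= 0x2 << 62                              # set the RFC variant (binary 10)
    return uuid.UUID(int=value)

def extract_unix_ms(u):
    """Recover the creation timestamp "for free" from the top 48 bits."""
    return u.int >> 80

a = make_time_ordered_uuid(1_000)
b = make_time_ordered_uuid(1_001)
print(a < b, extract_unix_ms(b))  # True 1001
```

The ordering only holds across different milliseconds; two values created in the same millisecond are ordered by their random bits, so strict monotonicity within a millisecond would need an extra counter.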
You can also use composite keys with some kind of server/db ID with a regular auto-increment identity or auto-number.
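A sketch of the composite-key approach, assuming each server is assigned a unique small integer ID up front. `ServerScopedIdAllocator` is hypothetical, and `itertools.count` stands in for each server's own auto-increment column. The optional `pack` step folds the pair into one 64-bit integer; the 16/48 bit split is an arbitrary assumption.

```python
import itertools

class ServerScopedIdAllocator:
    """Hypothetical helper: each server has a fixed server_id and its own
    auto-incrementing counter, so (server_id, local_id) pairs never clash."""

    SERVER_BITS = 16   # assumption: up to 65,536 servers
    COUNTER_BITS = 48  # assumption: 48 bits of counter per server

    def __init__(self, server_id):
        if not 0 <= server_id < (1 << self.SERVER_BITS):
            raise ValueError("server_id does not fit in SERVER_BITS")
        self.server_id = server_id
        self._counter = itertools.count(1)  # stands in for the auto-increment column

    def next_key(self):
        """Composite key, e.g. for a (server_id, local_id) primary key."""
        return (self.server_id, next(self._counter))

    @classmethod
    def pack(cls, key):
        """Optionally fold the pair into a single 64-bit integer."""
        server_id, local_id = key
        if not 0 <= local_id < (1 << cls.COUNTER_BITS):
            raise OverflowError("local_id does not fit in COUNTER_BITS")
        return (server_id << cls.COUNTER_BITS) | local_id

server_a = ServerScopedIdAllocator(1)
server_b = ServerScopedIdAllocator(2)
keys = [server_a.next_key() for _ in range(3)] + [server_b.next_key() for _ in range(3)]
print(keys)  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
```

Both servers reuse the same local numbers, but the server ID keeps every key distinct without any coordination at insert time.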

What is a UUID?

Well, what is one?
It's an identification number that will uniquely identify something. The idea is that the ID number will be universally unique, so no two things should have the same UUID. In fact, if you were to generate 10 trillion random (version 4) UUIDs, the chance of any two being the same would be roughly 9 × 10^-12, or about one in a hundred billion.
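Collision odds like these come from the birthday bound. A version 4 UUID has 122 random bits (6 of its 128 bits are fixed for version and variant), so the probability of at least one duplicate among n values is approximately 1 - exp(-n(n-1) / (2 · 2^122)). The sketch below evaluates that approximation; `collision_probability` is just an illustrative name.

```python
import math

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits

def collision_probability(n, bits=RANDOM_BITS):
    """Birthday-bound approximation of P(at least one duplicate among n values)."""
    space = 2 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))

print(f"{collision_probability(10**13):.2e}")  # 9.40e-12
```

The same formula gives about one in a billion for roughly 103 trillion UUIDs, which is the figure often quoted for version 4.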
Standardized identifiers
UUIDs are defined in RFC 4122. They're Universally Unique IDentifiers that can be generated without the use of a centralized authority. There are four major types of UUIDs, used in slightly different scenarios (RFC 4122 also defines a rarely used version 2 for DCE Security). All UUIDs are 128 bits in length, but are commonly represented as 32 hexadecimal characters separated by four hyphens.
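The textual form described above can be checked with Python's standard `uuid` module: the 128 bits print as 32 hex digits in groups of 8-4-4-4-12, and the string parses back to the same value.

```python
import uuid

u = uuid.uuid4()
text = str(u)

# 128 bits -> 32 hex digits, shown in 8-4-4-4-12 groups joined by 4 hyphens.
groups = text.split("-")
print([len(g) for g in groups])     # [8, 4, 4, 4, 12]
print(len(u.hex), text.count("-"))  # 32 4
print(uuid.UUID(text) == u)         # True
```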
Version 1 UUIDs, historically the most common, combine a MAC address and a timestamp to produce sufficient uniqueness. In the event of multiple UUIDs being generated fast enough that the timestamp doesn't increment before the next generation, the timestamp is manually incremented by 1. If no MAC address is available, or if its presence would be undesirable for privacy reasons, 6 random bytes sourced from a cryptographically secure random number generator may be used for the node ID instead.
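The random node ID option can be sketched with Python's `uuid.uuid1(node=...)`. RFC 4122 asks that such a node ID have the multicast bit (the least significant bit of the first octet) set, so it can never equal a real network card's address. `random_node` is a hypothetical helper; `secrets.randbits` supplies the cryptographically secure bits.

```python
import secrets
import uuid

def random_node():
    """48 secure random bits with the multicast bit set, so the node ID
    cannot be mistaken for a real MAC address (hypothetical helper)."""
    return secrets.randbits(48) | (1 << 40)  # bit 40 = LSB of the first octet

node = random_node()
first = uuid.uuid1(node=node)
second = uuid.uuid1(node=node)

print(first.version, second.version)  # 1 1
print(first.node == node, first != second)  # True True
```

In current CPython, repeated calls in one process also get strictly increasing timestamps, which is the "manually incremented" behaviour described above.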
Version 3 and Version 5 UUIDs, the least common, use the MD5 and SHA-1 hash functions respectively, applied to a namespace plus an already unique name, to produce a unique ID. This can be used to generate a UUID from a URL, for example.
Version 4 UUIDs are simply 128 bits of random data, with some bit-twiddling to identify the UUID version and variant.
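Generating a name-based UUID from a URL looks like this with Python's standard `uuid` module; the URL is a placeholder. The key property is that the result is deterministic: the same namespace and name always hash to the same UUID.

```python
import uuid

url = "https://example.com/articles/42"  # placeholder URL

v5_a = uuid.uuid5(uuid.NAMESPACE_URL, url)  # SHA-1 based
v5_b = uuid.uuid5(uuid.NAMESPACE_URL, url)  # same inputs, same output
v3 = uuid.uuid3(uuid.NAMESPACE_URL, url)    # MD5 based

print(v5_a == v5_b, v5_a.version, v3.version)  # True 5 3
print(uuid.uuid5(uuid.NAMESPACE_URL, url + "/") == v5_a)
```

Changing the name at all, even by a trailing slash, produces an unrelated UUID.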
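The bit-twiddling amounts to overwriting the high nibble of byte 6 with the version (4) and the top two bits of byte 8 with the variant (binary 10). The sketch below does it by hand; `set_v4_bits` and `manual_uuid4` are hypothetical names, and in practice `uuid.uuid4()` does the same job.

```python
import os
import uuid

def set_v4_bits(raw):
    """Turn 16 random bytes into a version 4 UUID by fixing 6 bits."""
    b = bytearray(raw)
    b[6] = (b[6] & 0x0F) | 0x40  # high nibble of byte 6 = version 4
    b[8] = (b[8] & 0x3F) | 0x80  # top two bits of byte 8 = variant 10
    return uuid.UUID(bytes=bytes(b))

def manual_uuid4():
    """128 bits from the OS random source, then the version/variant fix-up."""
    return set_v4_bits(os.urandom(16))

u = manual_uuid4()
print(u.version, u.variant == uuid.RFC_4122)  # 4 True
```

Those 6 fixed bits are why a version 4 UUID carries 122 random bits rather than 128.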
UUID collisions are extremely unlikely, especially within a single application.
UUID stands for Universally Unique IDentifier.
It's a 128-bit value used for unique identification in software development. UUID is the same as GUID (Microsoft's term) and is part of the Distributed Computing Environment (DCE), standardized by the Open Software Foundation (OSF).
As mentioned, they are intended to have a high likelihood of uniqueness over space and time, and random ones are also computationally difficult to guess. For time-based UUIDs, generation is based on the current timestamp and a unique property of the workstation that generated the UUID.
Image from https://segment.com/blog/a-brief-history-of-the-uuid/
It's a very long string of bits that is supposed to be unique now and forever, i.e. with practically no chance of a clash with any other UUID produced by you or anybody else in the world.
For a time-based UUID, the way it works is simply that the current timestamp and a network-related unique property of the computer that generated it become part of the bit string. The standard uses the MAC address, a low-level, hard-wired ID for your network card, rather than an IP address, which is only meant to be unique while you're connected to the internet.
Originally every network card in the world had its own unique MAC address, but on later hardware you can change the MAC address through software, so it is no longer as reliable a unique ID.
It's a Universally Unique Identifier
A UUID is a 128-bit number that is used to uniquely identify some entity. Depending on the specific mechanisms used, a UUID is guaranteed to be different or is, at least, extremely likely to be different from any other UUID generated. The UUID relies upon a combination of components to ensure uniqueness. A time-based UUID contains a reference to the network address of the host that generated it, a timestamp, and a randomly generated component. Because the network address identifies a unique computer, and the timestamp is unique for each UUID generated on a particular host, those two components should sufficiently ensure uniqueness.
I just want to add that it is better to use usUUID (static Windows identifiers).
For example, if a computer user relies on third-party software such as a screen reader for blind or low-vision users, that software (in this case the screen reader) will work better with stable, unique identifiers.
After all, how happy would you be if someone moved your car after you had memorized where you parked it?
A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems. The term globally unique identifier (GUID) is also used, typically in software created by Microsoft.
When generated according to the standard methods, UUIDs are for practical purposes unique. Their uniqueness does not depend on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is close enough to zero to be negligible.
