What is a UUID? - uuid

Well, what is one?

It's an identification number that uniquely identifies something. The idea is that the ID is universally unique, so no two things should ever share the same UUID. In fact, if you were to generate 10 trillion random (version 4) UUIDs, the chance of any two of them being the same would be roughly 1 in 100 billion (about 0.00000000001).

Standardized identifiers
UUIDs are defined in RFC 4122. They're Universally Unique IDentifiers that can be generated without a centralized authority. There are four major types of UUIDs, which are used in slightly different scenarios. All UUIDs are 128 bits in length, but are commonly represented as 32 hexadecimal characters separated by four hyphens.
Version 1 UUIDs, the most common, combine a MAC address and a timestamp to produce sufficient uniqueness. In the event of multiple UUIDs being generated fast enough that the timestamp doesn't increment before the next generation, the timestamp is manually incremented by 1. If no MAC address is available, or if its presence would be undesirable for privacy reasons, 6 random bytes sourced from a cryptographically secure random number generator may be used for the node ID instead.
Version 3 and Version 5 UUIDs, the least common, use the MD5 and SHA1 hash functions respectively, plus a namespace, plus an already unique data value to produce a unique ID. This can be used to generate a UUID from a URL for example.
Version 4 UUIDs are simply 128 bits of random data with some bit-twiddling to identify the UUID version and variant. Those fixed bits take up 6 of the 128, leaving 122 random bits.
UUID collisions are extremely unlikely to happen, especially not in a single application space.
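As a rough illustration of the versions described above, here is a minimal sketch using Python's standard-library uuid module. The example URL is an arbitrary assumption; any string works as the name for versions 3 and 5.

```python
import uuid

# Arbitrary example name; any string works for the name-based versions.
name = "https://example.com/some/page"

examples = {
    1: uuid.uuid1(),                          # timestamp + node ID
    3: uuid.uuid3(uuid.NAMESPACE_URL, name),  # MD5 of namespace + name
    4: uuid.uuid4(),                          # random bits
    5: uuid.uuid5(uuid.NAMESPACE_URL, name),  # SHA-1 of namespace + name
}

for version, u in examples.items():
    text = str(u)
    # Canonical text form: 32 hex digits in 8-4-4-4-12 groups, 36 characters in total.
    print(version, text, len(text), "chars,", len(u.bytes) * 8, "bits")
```

Whatever the version, the value is always 16 bytes, and the version number can be read back from the UUID itself through its version attribute.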

UUID stands for Universally Unique IDentifier.
It's a 128-bit value used for a unique identification in software development. UUID is the same as GUID (Microsoft) and is part of the Distributed Computing Environment (DCE), standardized by the Open Software Foundation (OSF).
As mentioned, they are intended to have a high likelihood of uniqueness over space and time and to be computationally difficult to guess. Their generation is based on the current timestamp and a unique property of the workstation that generated the UUID.

It's a very long string of bits that is supposed to be unique now and forever, i.e. with no possible clash with any other UUID produced by you or anybody else in the world.
The way it works is simple: the current timestamp and a network-related unique property of the computer that generated it are made part of the bit string. That property could be the IP address, which ought to be unique while you're connected to the internet, or the MAC address, which is lower level: a hard-wired ID for your network card.
Originally every network card in the world had its own unique MAC address, but on later hardware you can change the MAC address through software, so it is no longer as reliable a unique ID.

It's a Universally Unique Identifier

A UUID is a 128-bit number that is used to uniquely identify some entity. Depending on the specific mechanisms used, a UUID is guaranteed to be different or is, at least, extremely likely to be different from any other UUID generated. The UUID relies upon a combination of components to ensure uniqueness. A UUID contains a reference to the network address of the host that generated the UUID, a timestamp and a randomly generated component. Because the network address identifies a unique computer, and the timestamp is unique for each UUID generated from a particular host, those two components should sufficiently ensure uniqueness.
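To make those components concrete, the sketch below builds a version 1 UUID with Python's uuid module and reads the individual fields back. The node ID and clock sequence are made-up values, passed in explicitly only so they are easy to recognise in the output; normally the library picks them.

```python
import uuid

# Hypothetical node ID and clock sequence, fixed here so the fields are easy to spot.
NODE = 0x001A2B3C4D5E
CLOCK_SEQ = 0x1234

u = uuid.uuid1(node=NODE, clock_seq=CLOCK_SEQ)

print("uuid      :", u)
print("version   :", u.version)
print("node      :", f"{u.node:012x}")   # 48-bit node field (usually the MAC address)
print("clock_seq :", hex(u.clock_seq))   # 14-bit sequence, the "random" component
print("timestamp :", u.time)             # 100 ns intervals since 1582-10-15
```

The last group of the printed UUID is the node ID verbatim, which is why version 1 UUIDs can reveal which machine generated them.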

I just want to add that it is better to use usUUID (static Windows identifiers).
For example, if a computer user relies on third-party software such as a screen reader for blind or low-vision users, that other software (in this case the screen reader) will play better with unique identifiers!
After all, how happy would you be if someone moved your car after you had noted where you parked it?

A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems. The term globally unique identifier (GUID) is also used, typically in software created by Microsoft.
When generated according to the standard methods, UUIDs are for practical purposes unique. Their uniqueness does not depend on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is close enough to zero to be negligible.

Related

Non-standard UUID generation (such as using counters)

I find myself often in a situation where I want to generate a very large number of UUIDs rapidly. It would be desirable to generate one UUID "properly," then manipulate it to generate the rest. For example, I might just "add one" to the lowest bits of the UUID.
For most of my applications, I control the consumers of the UUIDs sufficiently that I can guarantee that they accept these UUIDs without complication. I'm curious whether I could say the same if I did not control the consumers.
In particular, I'm interested in the following three algorithms:
Generate a UUID v5 with some unique string. For subsequent UUIDs, add 1 to the "node" section (last bytes)
Generate a UUID v4 from a pseudorandom source. Again, add 1 to the "node" section for subsequent UUIDs (in effect, using a tainted random number source with a strong correlation to the first number)
Generate a UUIDv5 with some unique string. Manually change the version number to v4, and begin adding 1 for subsequent UUIDs.
All of these are clearly not standard by the intent of the standard. However, are there any issues which would cause this to run afoul of the letter of the law, not just the spirit?
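For reference, the "add 1 to the node section" idea from the second algorithm can be sketched in Python as below. This is only an illustration of the bit layout, under the assumption that the node field is the low 48 bits of the value; next_uuid is a hypothetical helper, and nothing here says whether a given consumer would accept the result.

```python
import uuid

NODE_MASK = (1 << 48) - 1  # the node field is the low 48 bits of the 128-bit value

def next_uuid(u: uuid.UUID) -> uuid.UUID:
    """Return a copy of u with its node field incremented, wrapping within 48 bits."""
    node = (u.int + 1) & NODE_MASK
    return uuid.UUID(int=(u.int & ~NODE_MASK) | node)

seed = uuid.uuid4()
batch = [seed]
for _ in range(4):
    batch.append(next_uuid(batch[-1]))

for u in batch:
    # Version and variant live outside the node field, so they survive the increment.
    print(u, u.version, u.variant)
```

Every UUID in the batch still reports version 4 and the RFC 4122 variant, so a purely syntactic check cannot tell them from independently generated ones. What is lost is the randomness: all of them share the same first four groups.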

Is this an acceptable usage of UUID or GUID?

I'm making a class that takes an object and assigns it an ID number that's unique. It's nothing special, it takes the object, assigns a number to it, and then increments the ID counter for the next object that will receive an ID number.
I want to call the member UUID, or GUID, standing for Universally unique identifier, or Globally unique identifier, because this language is very clear for what the member is.
However I looked up the term and Wikipedia says:
A universally unique identifier (UUID) is a 128-bit number used to
identify information in computer systems.
UUID
Which makes me think the terms have a very specific meaning, and that possibly my use of it to just mean a unique number given to each object is not proper usage. I'm thinking of using 32-bit int or the like.
Is this an incorrect use of UUID or GUID? I don't think it matters, but I'm writing in C++.
Yes, I think it would be a misuse of those terms. The word universally in UUID or globally in GUID means that the identifier is not only unique within your specific system, but within any system developed for any purpose, anywhere. A 32-bit integer that you simply increment for each new entity doesn't have that property. It may be unique within your system, but not universally. I would just call it Identifier or something similar.
In C++ on Windows there is a function for creating a GUID:
GUID gidReference;
HRESULT hCreateGuid = CoCreateGuid( &gidReference );
what it means is:
The CoCreateGuid function calls the RPC function UuidCreate, which
creates a GUID, a globally unique 128-bit integer. Use CoCreateGuid
when you need an absolutely unique number that you will use as a
persistent identifier in a distributed environment. To a very high
degree of certainty, this function returns a unique value – no other
invocation, on the same or any other system (networked or not), should
return the same value.
and when we dive in we see:
For security reasons, it is often desirable to keep ethernet addresses
on networks from becoming available outside a company or organization.
The UuidCreate function generates a UUID that cannot be traced to the
ethernet address of the computer on which it was generated. It also
cannot be associated with other UUIDs created on the same computer. If
you do not need this level of security, your application can use the
UuidCreateSequential function, which behaves exactly as the UuidCreate
function does on all other versions of the operating system.
I deem it important that you know not only that you misused the term but also why, because this information may be valuable to you in the future.

Java UUID: how does it work?

Is a java.util.UUID unique across files?
For example:
We have a file IDFile1.java that generates a random ID using UUID.randomUUID().toString().
And we have a second file, IDFile2.java, which generates another ID.
Will these two IDs collide with each other?
Is there any way to "return" an ID generated by java.util.UUID, so that the same ID could be issued again?
The purpose of functions like randomUUID is to munge together various pieces of information which are likely to have some amount of randomness such that two UUIDs generated at different times or in different places would have an extremely small likelihood of matching unless all sources of potential randomness happened to yield identical results. No effort is made to keep track of which UUIDs have or have not been issued; instead, the goal is to have enough randomness that the probability of an unintentional match will be small relative to e.g. the probability of a computer being smashed to a million pieces by a meteor strike.
Note that the system may use "number of UUIDs issued" as part of its UUID calculation, but that would only be one of many factors that go into it. The purpose of such a counter isn't to allow one to "go back" to a previous UUID, but rather to ensure that if e.g. two UUIDs are requested nearly simultaneously without any source of randomness having become available between them, the two requests will yield different values.
UUIDs are unrelated to which file generates them. It doesn't matter which file generates them; they will almost certainly not be the same.
For question 2, the way UUIDs are generated doesn't really allow for them to be regenerated in any meaningful way. In general they are generated from some info about your computer, the current time and other stuff. Java's UUID.randomUUID() uses a cryptographically secure random number generator, and the UUIDs it produces are known as type 4 UUIDs.

Which UUID version to use?

Which version of the UUID should you use? I saw a lot of threads explaining what each version entails, but I am having trouble figuring out what's best for what applications.
There are two different ways of generating a UUID.
If you just need a unique ID, you want a version 1 or version 4.
Version 1: This generates a unique ID based on a network card MAC address and current time. If any of these things is sensitive in any way, don't use this. The advantage of this version is that, while looking at a list of UUIDs generated by machines you trust, you can easily know whether many UUIDs got generated by the same machine, or infer some time relationship between them.
Version 4: These are generated from random (or pseudo-random) numbers. If you just need to generate a UUID, this is probably what you want. The advantage of this version is that when you're debugging and looking at a long list of information matched with UUIDs, it's quicker to spot matches.
If you need to generate reproducible UUIDs from given names, you want a version 3 or version 5. If you are interacting with other systems, this choice was already made and you should check which version and namespaces they use.
Version 3: This generates a unique ID from an MD5 hash of a namespace and name. If you are dealing with very strict resource requirements (e.g. a very busy Arduino board), use this.
Version 5: This generates a unique ID from an SHA-1 hash of a namespace and name. This is the more secure and generally recommended version.
If you want a random number, use a random number library. If you want a unique identifier with effectively 0.00...many more 0s here...001% chance of collision, you should use UUIDv1. See Nick's post for UUIDv3 and v5.
UUIDv1 is NOT secure. It isn't meant to be. It is meant to be UNIQUE, not un-guessable. UUIDv1 uses the current timestamp, plus a machine identifier, plus some random-ish stuff to make a number that will never be generated by that algorithm again. This is appropriate for a transaction ID (even if everyone is doing millions of transactions/s).
To be honest, I don't understand why UUIDv4 exists... from reading RFC4122, it looks like that version does NOT eliminate the possibility of collisions. It is just a random number generator. If that is true, then you have a very GOOD chance of two machines in the world eventually creating the same "UUID"v4 (quotes because there isn't a mechanism for guaranteeing U.niversal U.niqueness). In that situation, I don't think that algorithm belongs in an RFC describing methods for generating unique values. It would belong in an RFC about generating randomness. For a set of random numbers:
chance_of_collision = 1 - (set_size! / (set_size - tries)!) / (set_size ^ tries)
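To put numbers on that formula, here is a small Python sketch. The exact version follows the formula above term by term and is only practical for small inputs; the second function uses the usual birthday approximation 1 - exp(-n(n-1)/(2N)), which is an addition of mine and not part of the formula as written. The figure of 122 random bits for version 4 comes from the 128 bits minus the 6 version and variant bits.

```python
import math

def collision_probability(set_size: int, tries: int) -> float:
    """Exact birthday probability, following the formula above.

    Computed as 1 - prod((set_size - i) / set_size), so only practical for small tries.
    """
    p_unique = 1.0
    for i in range(tries):
        p_unique *= (set_size - i) / set_size
    return 1.0 - p_unique

def collision_probability_approx(set_size: int, tries: int) -> float:
    """Approximation 1 - exp(-n(n-1) / (2N)), usable for very large values."""
    return -math.expm1(-tries * (tries - 1) / (2 * set_size))

# Sanity check on the classic birthday problem: 23 people, 365 days.
print(collision_probability(365, 23))         # a little over 0.5
print(collision_probability_approx(365, 23))  # close to the exact value

# Version 4 UUIDs have 122 random bits.
V4_SPACE = 2 ** 122
print(collision_probability_approx(V4_SPACE, 10 ** 9))
print(collision_probability_approx(V4_SPACE, 10 ** 13))
```

Under these assumptions, even 10 trillion random UUIDs give a collision probability on the order of one in a hundred billion, which is the sense in which version 4 is treated as unique in practice.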
That's a very general question. One answer is: "it depends what kind of UUID you wish to generate". But a better one is this: "Well, before I answer, can you tell us why you need to code up your own UUID generation algorithm instead of calling the UUID generation functionality that most modern operating systems provide?"
Doing that is easier and safer, and since you probably don't need to generate your own, why bother coding up an implementation? In that case, the answer becomes use whatever your O/S, programming language or framework provides. For example, in Windows, there is CoCreateGuid or UuidCreate or one of the various wrappers available from the numerous frameworks in use. In Linux there is uuid_generate.
If you, for some reason, absolutely need to generate your own, then at least have the good sense to stay away from generating v1 and v2 UUIDs. It's tricky to get those right. Stick, instead, to v3, v4 or v5 UUIDs.
Update:
In a comment, you mention that you are using Python and link to this. Looking through the interface provided, the easiest option for you would be to generate a v4 UUID (that is, one created from random data) by calling uuid.uuid4().
If you have some data that you need to (or can) hash to generate a UUID from, then you can use either v3 (which relies on MD5) or v5 (which relies on SHA1). Generating a v3 or v5 UUID is simple: first pick the UUID type you want to generate (you should probably choose v5) and then pick the appropriate namespace and call the function with the data you want to use to generate the UUID from. For example, if you are hashing a URL you would use NAMESPACE_URL:
uuid.uuid3(uuid.NAMESPACE_URL, 'https://ripple.com')
Please note that this UUID will be different than the v5 UUID for the same URL, which is generated like this:
uuid.uuid5(uuid.NAMESPACE_URL, 'https://ripple.com')
A nice property of v3 and v5 UUIDs is that they should be interoperable between implementations. In other words, if two different systems are using an implementation that complies with RFC4122, they will (or at least should) both generate the same UUID if all other things are equal (i.e. generating the same version UUID, with the same namespace and the same data). This property can be very helpful in some situations (especially in content-addressable storage scenarios), but perhaps not in your particular case.
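A quick way to see that reproducibility, sketched with the same Python uuid module and the same example URL used above:

```python
import uuid

url = "https://ripple.com"

# Same version, namespace, and name: the result is reproducible.
a = uuid.uuid5(uuid.NAMESPACE_URL, url)
b = uuid.uuid5(uuid.NAMESPACE_URL, url)
print(a == b)  # True

# Changing the version or the namespace changes the result.
print(uuid.uuid3(uuid.NAMESPACE_URL, url) == a)  # False
print(uuid.uuid5(uuid.NAMESPACE_DNS, url) == a)  # False
```

Because nothing random or time-dependent goes into the hash, running this on another machine or on another day should print the same UUID for a.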
Postgres documentation describes the differences between UUIDs. A couple of them:
V3:
uuid_generate_v3(namespace uuid, name text) - This function generates a version 3 UUID in the given namespace using the specified input name.
V4:
uuid_generate_v4 - This function generates a version 4 UUID, which is derived entirely from random numbers.
Since it's not mentioned yet: you can use uuidv1 if you want to be able to sort your entities by creation time without a separate, explicit timestamp. While that's not 100% precise and in many cases not the best way to go (due to the lack of explicitness), it comes in handy in some scenarios, e.g. when you're working with a Cassandra database.
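As a minimal sketch of that idea in Python: the time attribute of a version 1 UUID holds the embedded timestamp, which can be decoded and used as a sort key. created_at is a hypothetical helper name, not part of the uuid module.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100 ns intervals since the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def created_at(u: uuid.UUID) -> datetime:
    """Decode the creation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry this timestamp layout")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

ids = [uuid.uuid1() for _ in range(5)]

# Sort on the embedded timestamp, not on the string form: the text starts with
# the low bits of the timestamp, so plain string order is not chronological.
for u in sorted(ids, key=lambda u: u.time):
    print(created_at(u).isoformat(), u)
```

The decoded time is only as good as the clock of the machine that generated the UUID, which is one reason this is not 100% precise.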
Version 1: UUIDs using a timestamp and monotonic counter.
Version 3: UUIDs based on the MD5 hash of some data.
Version 4: UUIDs with random data.
Version 5: UUIDs based on the SHA1 hash of some data.
Version 6: UUIDs using a timestamp and monotonic counter.
Version 7: UUIDs using a Unix timestamp.
Version 8: UUIDs using user-defined data.
Read more at Rust documentation.

Is it possible to generate duplicate UUIDs in the same millisecond?

Is it possible to create two duplicate UUIDs one after the other? I'm unfamiliar with how UUIDs are generated, but I would guess that if you created two separate UUIDs from the same MAC address in the same millisecond, then they would be exactly the same. Is this true?
I guess I'm asking two questions in one. I'm very interested to know what parameters are used to generate a random UUID. I'm guessing its more than just timestamp and MAC address.
In the Python uuid module, uuid1 takes the timestamp and, when no clock sequence is supplied, picks a random 14-bit one (random.randrange(1<<14) in the version I looked at), so you are combining a timestamp counted in 100-nanosecond intervals with a random number from 0 to 16383, so... My guess is it would be possible but highly unlikely.
If you are worried about this being an issue, you always have UUIDv3, UUIDv4, and my choice, UUIDv5.
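One way to probe this empirically is to generate version 1 UUIDs in a tight loop, far faster than one per millisecond, and count the distinct values. This is a sketch in Python; the outcome reflects only the implementation you run it on and is not a guarantee made by the standard.

```python
import uuid

N = 5_000
ids = [uuid.uuid1() for _ in range(N)]  # generated as fast as the loop allows

print("generated:", len(ids))
print("distinct :", len(set(ids)))
print("distinct timestamps:", len({u.time for u in ids}))
print("clock_seq in 14-bit range:", all(0 <= u.clock_seq < (1 << 14) for u in ids))
```

If the number of distinct values equals the number generated, the implementation is compensating for calls that land within the same clock tick, typically by bumping the timestamp or the clock sequence.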

Resources