Java UUID: how does it work?

Is java.util.UUID unique per file?
For example:
We have a file, IDFile1.java, that generates a random ID using UUID.randomUUID().toString().
And we have a second file, IDFile2.java, which generates another ID.
Will these two IDs collide with each other?
Is there any way to "turn back" a used ID generated by java.util.UUID, meaning that the ID could be issued again?

The purpose of functions like randomUUID is to munge together various pieces of information which are likely to have some amount of randomness such that two UUIDs generated at different times or in different places would have an extremely small likelihood of matching unless all sources of potential randomness happened to yield identical results. No effort is made to keep track of which UUIDs have or have not been issued; instead, the goal is to have enough randomness that the probability of an unintentional match will be small relative to e.g. the probability of a computer being smashed to a million pieces by a meteor strike.
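To put a rough number on "extremely small likelihood", the sketch below applies the standard birthday-bound approximation to version 4 UUIDs. It assumes all 122 random bits come from an ideal uniform source, which real generators only approximate; the class name is made up for the example. For a billion UUIDs it gives a probability of any collision of roughly 9.4e-20.

```java
public class CollisionOdds {
    // A version 4 UUID has 122 random bits; the other 6 bits are fixed version/variant bits.
    static final double SPACE = Math.pow(2, 122);

    // Birthday-bound approximation for n values drawn uniformly from SPACE:
    // p(any collision) ~ 1 - exp(-n^2 / (2 * SPACE)).
    // Math.expm1 keeps precision when the exponent is tiny.
    static double collisionProbability(double n) {
        return -Math.expm1(-(n * n) / (2 * SPACE));
    }

    public static void main(String[] args) {
        long[] counts = {1_000_000L, 1_000_000_000L, 1_000_000_000_000L};
        for (long n : counts) {
            System.out.printf("%,d UUIDs -> p(any collision) ~ %.2e%n", n, collisionProbability(n));
        }
    }
}
```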
Note that the system may use "number of UUIDs issued" as part of its UUID calculation, but that would only be one of many factors that go into it. The purpose of such a counter isn't to allow one to "go back" to a previous UUID, but rather to ensure that if e.g. two UUIDs are requested nearly simultaneously without any source of randomness having become available between them, the two requests will yield different values.

A UUID is unrelated to the file that generates it. No matter which file generates them, they will almost certainly not be the same.
As for question 2, the way UUIDs are generated doesn't allow a particular one to be regenerated in any meaningful way. Some UUID versions (such as version 1) are built from information about your computer, the current time and similar inputs. Java's UUID.randomUUID(), however, uses a cryptographically strong pseudorandom number generator, and the UUIDs it produces are known as type 4 (version 4) UUIDs.
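A minimal sketch of the scenario in the question: two independent randomUUID() calls standing in for IDFile1.java and IDFile2.java. The class name is hypothetical; version() and variant() are real java.util.UUID methods, and for randomUUID() they report 4 and 2 (the RFC 4122 variant).

```java
import java.util.UUID;

public class RandomUuidDemo {
    public static void main(String[] args) {
        // Two independent calls, as if one were made in IDFile1.java and the other in IDFile2.java.
        // Neither call knows about the other; uniqueness comes only from the random bits.
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();

        for (UUID id : new UUID[] {first, second}) {
            // version() is 4 (random) and variant() is 2 (RFC 4122 layout) for randomUUID().
            System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
        }
        System.out.println("equal? " + first.equals(second));
    }
}
```

There is no registry of issued values, so nothing is ever "returned" for reuse; the only way to get a given UUID again is to store it yourself.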

Related

Is it safe to use UUIDs generated from 2 different systems without a chance of collision?

We have an old system that generated some UUIDs. We have more records that need UUIDs but can't use the old system to generate them, so we will need to generate them elsewhere. This immediately struck me as a bad idea, and I have been searching for an answer but haven't found this exact question. There would be no check to make sure a UUID wasn't already generated by the old system; the UUID would just be populated for records that don't have one. Is that safe?
If you use UUID V1, all values generated are guaranteed to be unique, provided each generator has a distinct node ID (normally its MAC address) and manages its clock sequence correctly. However, most folks avoid V1 because it leaks data about the system that generated it (specifically, the MAC address) and the exact time, neither of which is ideal in many contexts.
Most folks use UUID V4, which is statistically unique. While it is theoretically possible for the same value to be generated twice, the odds of it ever actually happening before the heat death of the universe are approximately zero, which is good enough for any practical purpose.
UUID V3/V5 are used for when you want predictable values, which doesn’t sound like a good fit for your needs.
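Java only ships a factory for version 3, UUID.nameUUIDFromBytes, which MD5-hashes exactly the bytes you pass (no namespace is prepended). The v5 method below is a hand-rolled sketch of the RFC 4122 name-based construction (SHA-1 over the namespace bytes followed by the name bytes, with the version and variant bits overwritten); NameBasedUuids and v5 are hypothetical names, not a JDK API. Both show the property that makes these versions predictable: the same input always yields the same UUID.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class NameBasedUuids {
    // RFC 4122 namespace ID for DNS names.
    static final UUID NAMESPACE_DNS = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    // Version 5 sketch: SHA-1(namespace bytes + name bytes), truncated to 16 bytes,
    // then the version and variant bits are overwritten. The JDK has no built-in v5 factory.
    static UUID v5(UUID namespace, String name) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            ByteBuffer ns = ByteBuffer.allocate(16);
            ns.putLong(namespace.getMostSignificantBits());
            ns.putLong(namespace.getLeastSignificantBits());
            sha1.update(ns.array());
            sha1.update(name.getBytes(StandardCharsets.UTF_8));
            byte[] hash = sha1.digest();                 // 20 bytes; only the first 16 are used
            hash[6] = (byte) ((hash[6] & 0x0f) | 0x50);  // version 5
            hash[8] = (byte) ((hash[8] & 0x3f) | 0x80);  // RFC 4122 variant
            ByteBuffer bb = ByteBuffer.wrap(hash, 0, 16);
            return new UUID(bb.getLong(), bb.getLong());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Version 3 (MD5) is built in; calling it twice with the same bytes gives the same UUID.
        UUID a = UUID.nameUUIDFromBytes("example".getBytes(StandardCharsets.UTF_8));
        UUID b = UUID.nameUUIDFromBytes("example".getBytes(StandardCharsets.UTF_8));
        System.out.println(a + " v" + a.version() + " same? " + a.equals(b));

        UUID c = v5(NAMESPACE_DNS, "www.example.com");
        System.out.println(c + " v" + c.version());
    }
}
```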

Non-standard UUID generation (such as using counters)

I find myself often in a situation where I want to generate a very large number of UUIDs rapidly. It would be desirable to generate one UUID "properly," then manipulate it to generate the rest. For example, I might just "add one" to the lowest bits of the UUID.
For most of my applications, I control the consumers of the UUIDs sufficiently that I can guarantee that they accept these UUIDs without complication. I'm curious whether I could say the same if I did not control the consumers.
In particular, I'm interested in the following three algorithms:
Generate a UUID v5 with some unique string. For subsequent UUIDs, add 1 to the "node" section (last bytes)
Generate a UUID v4 from a pseudorandom source. Again, add 1 to the "node" section for subsequent UUIDs (in effect, using a tainted random number source with a strong correlation to the first number)
Generate a UUIDv5 with some unique string. Manually change the version number to v4, and begin adding 1 for subsequent UUIDs.
All of these are clearly not standard by the intent of the standard. However, are there any issues that would cause them to run afoul of the letter of the law, not just the spirit?
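For concreteness, here is what "add 1 to the node section" looks like with java.util.UUID; the next helper is hypothetical. It only shows that the derived values keep the version and variant bits of the starting UUID, so a consumer that merely parses or validates the format would accept them. It does not show that the derived values meet the standard's requirements for that version.

```java
import java.util.UUID;

public class IncrementedUuids {
    // Hypothetical helper: derive the next UUID by adding 1 to the low 64 bits (node end first).
    // The version nibble lives in the high 64 bits and is untouched. The variant bits are the
    // top two bits of the low 64 bits, so they only change if the lower 62 bits overflow.
    static UUID next(UUID previous) {
        return new UUID(previous.getMostSignificantBits(), previous.getLeastSignificantBits() + 1);
    }

    public static void main(String[] args) {
        UUID u = UUID.randomUUID();
        for (int i = 0; i < 3; i++) {
            System.out.println(u + " version=" + u.version() + " variant=" + u.variant());
            u = next(u);
        }
    }
}
```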

Should I use UUID or something else?

Very often I have a task where I need to build an object and know its ID before saving it to the DB (PostgreSQL).
I can do this with UUID but it has lots of disadvantages:
- lower performance when selecting or including
- lower performance with joins
- more space required
So the question is: how can I generate an ID for the object beforehand while minimizing the negative consequences of UUIDs?
We faced this issue on a project. I ran some tests (about 4M rows, if I remember right) which indicated that uuids didn't hurt PG's performance too badly compared with ints. Having used uuids as primary keys for some time now, I wouldn't hesitate to do so again. I must add the caveat, though, that we have yet to see how this performs in production on a large scale.
Check this out: http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database
The nice thing about using uuids is that you practically never have to worry about clashes. The not-so-nice thing: they are a bit cumbersome if you're manually entering a query for a test.
If you end up selecting based on a large list of uuids use this trick: https://www.datadoghq.com/blog/100x-faster-postgres-performance-by-changing-1-line/
Hope this helps,
Adam.
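You can use any uuid generator in whatever programming language to do this. I would suggest storing them in PostgreSQL's native uuid type (16 bytes) rather than as text, to keep the space and join overhead down. Core PostgreSQL did not include a UUID generator before version 13, which added gen_random_uuid(); on older versions you need an extension such as uuid-ossp or pgcrypto, or you generate the UUIDs in your application first.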
A major issue you may run into is that with numeric ids, a number of things are relatively painless that become a bigger issue with uuids. These include:
Typing in identifiers
Selecting a series of records inserted at a similar time (because numeric ids are sequential).
However, if you use the UUID type in PostgreSQL, selection and join performance should not be too bad. And how you generate the UUIDs is up to you as the programmer.
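A minimal sketch of generating the ID in the application before the object is saved; Order and its fields are hypothetical. It also shows why the native uuid type is cheaper than storing the text form: the binary value is 16 bytes, while the canonical string is 36 characters. Binding the UUID to an INSERT (for example through the PostgreSQL JDBC driver) is not shown.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class ClientSideId {
    // Hypothetical entity: the ID is assigned in the application, before any INSERT.
    record Order(UUID id, String description) {
        static Order create(String description) {
            return new Order(UUID.randomUUID(), description);
        }
    }

    // 16-byte binary form, the same size as PostgreSQL's native uuid type.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    static UUID fromBytes(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        return new UUID(bb.getLong(), bb.getLong());
    }

    public static void main(String[] args) {
        Order order = Order.create("example");
        // The ID is already known here, so child objects can reference it before anything is saved.
        System.out.println("id          = " + order.id());
        System.out.println("binary size = " + toBytes(order.id()).length + " bytes");
        System.out.println("text size   = " + order.id().toString().length() + " chars");
    }
}
```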
Of course, UUIDs will perform worse than integers; the question is at what data volume this matters. To be honest, 4M rows is too small to tell whether it will be a performance issue, though if the requirements indicate that the data volume will stay below 4M rows, that is fine.
The article https://rclayton.silvrback.com/do-you-really-need-a-uuid-guid gives good advice on how and when to use UUIDs.

Is it possible to generate duplicate UUIDs in the same millisecond?

Is it possible to create two duplicate UUIDs one after the other? I'm unfamiliar with how UUIDs are generated, but I would guess that if you created two separate UUIDs from the same MAC address in the same millisecond, then they would be exactly the same. Is this true?
I guess I'm asking two questions in one. I'm very interested to know what parameters are used to generate a random UUID. I'm guessing it's more than just the timestamp and MAC address.
For UUIDv1, the Python uuid package takes the timestamp and generates a random clock sequence with random.randrange(1<<14L) (Python 2 syntax), so you are combining a timestamp with 100-nanosecond resolution and a random number from 0 to 16383. My guess is that a duplicate would be possible but highly unlikely.
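Java's UUID class can read the fields of a version 1 UUID (timestamp(), clockSequence(), node()) but does not generate them. The pack helper below is a hypothetical sketch of the version 1 layout, using a made-up MAC address, to show why two UUIDs from the same MAC in the same millisecond normally differ: the timestamp counts 100-nanosecond ticks, so one millisecond spans 10,000 distinct values, and the 14-bit clock sequence adds further variation.

```java
import java.util.UUID;

public class V1Layout {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Hypothetical helper: packs the version 1 inputs into a UUID.
    // timestamp: 60 bits of 100-ns ticks, clockSeq: 14 bits, node: 48 bits (normally a MAC address).
    static UUID pack(long timestamp, int clockSeq, long node) {
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHi = (timestamp >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;   // 0x1000 = version 1
        long lsb = ((0x8000L | (clockSeq & 0x3FFFL)) << 48)                // 0x8000 = RFC 4122 variant
                | (node & 0xFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    static long ticksFromUnixMillis(long millis) {
        return millis * 10_000 + UUID_EPOCH_OFFSET;
    }

    public static void main(String[] args) {
        long node = 0xAABBCCDDEEFFL; // made-up MAC address
        long ts = ticksFromUnixMillis(System.currentTimeMillis());
        UUID a = pack(ts, 1234, node);
        UUID b = pack(ts, 1235, node);     // same tick, same MAC, different clock sequence
        UUID c = pack(ts + 1, 1234, node); // next 100-ns tick, still within the same millisecond
        System.out.println(a + "\n" + b + "\n" + c);
        System.out.println("timestamp round-trips: " + (a.timestamp() == ts));
    }
}
```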
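If you are worried about this being an issue, you always have UUIDv3, UUIDv4, and my choice, UUIDv5. Keep in mind that v3 and v5 are derived deterministically from a namespace and a name, so they are only unique if the names you feed them are.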

IDs for Information on More Than One DB/Server

I'm working on a project that I want to be as flexible and scalable as possible from the beginning. A problem I'm concerned about is best described by Joshua Schachter in Founders at Work, who noted it as one detail he wished he had planned for ahead of time.
Scaling past one machine, one database, is very challenging, even with replication. The tools that are there are not quite right.
For example, when you add things to a table and it numbers them, that means you can't have a second machine also adding to them because the numbers will collide. So what do you do? You have to come up with some completely different way to do it.
Do you have a central server that hands out number sets, or do you come up with something that's not numbers? Do you use random numbers and hope they never collide? Whatever it is, auto-assigned IDs just don't fly.
Has anyone here faced this problem? What are ways to move beyond auto-incremented IDs, or is there a way to have them scale with multiple servers?
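One option raised in the question, a central server that hands out number sets, can be sketched as block allocation: each application server reserves a range of IDs at once and then assigns them locally. Central, Server and the block size are hypothetical; in a real deployment the central counter would be a database sequence or a dedicated service rather than an in-memory AtomicLong.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BlockAllocator {
    // Hypothetical central allocator: hands out disjoint blocks [start, start + blockSize).
    static final class Central {
        private final AtomicLong nextBlockStart = new AtomicLong(1);
        private final long blockSize;

        Central(long blockSize) { this.blockSize = blockSize; }

        long[] reserve() {
            long start = nextBlockStart.getAndAdd(blockSize);
            return new long[] {start, start + blockSize};
        }
    }

    // Each app server assigns IDs from its current block locally and only contacts
    // the central allocator when that block is used up.
    static final class Server {
        private final Central central;
        private long next;
        private long end;

        Server(Central central) { this.central = central; }

        synchronized long nextId() {
            if (next == end) {
                long[] block = central.reserve();
                next = block[0];
                end = block[1];
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        Central central = new Central(100);
        Server a = new Server(central);
        Server b = new Server(central);
        System.out.println(a.nextId() + " " + b.nextId() + " " + a.nextId()); // prints "1 101 2"
    }
}
```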
Use a GUID/UUID (globally/universally unique identifier). UUIDs are designed to be unique across multiple machines without any coordination between them.
With GUIDs, your chances of a collision are astronomically low.
It's also possible to have what we called SmartGUIDs (usually called COMB GUIDs; see this analysis, particularly page 7), where you encode a timestamp within the GUID. You get record-creation date information "for free", so you can skip a separate timestamp column for the record creation datetime, which gets back some of what you lost by moving from a 32-bit integer to a 128-bit GUID. These can also be made monotonic, unlike regular GUIDs, which is useful for clustered indexes and for sorting.
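A COMB-style sketch: the original COMB design placed the time in the last bytes to suit SQL Server's sort order, whereas this version follows the UUIDv7 field layout from RFC 9562, with Unix milliseconds in the leading 48 bits and random bits elsewhere. TimeOrderedUuid is a hypothetical name. It only orders IDs across milliseconds; IDs created within the same millisecond come out in random order, so strict monotonicity would need an extra counter, which is not shown.

```java
import java.security.SecureRandom;
import java.util.UUID;

public class TimeOrderedUuid {
    private static final SecureRandom RANDOM = new SecureRandom();

    // UUIDv7-style layout (RFC 9562):
    // 48-bit Unix milliseconds | 4-bit version (7) | 12 random bits | 2-bit variant | 62 random bits.
    static UUID create(long unixMillis) {
        long msb = (unixMillis << 16) | 0x7000L | (RANDOM.nextInt() & 0x0FFFL);
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    // Recover the creation time "for free" from the leading 48 bits.
    static long embeddedMillis(UUID id) {
        return id.getMostSignificantBits() >>> 16;
    }

    public static void main(String[] args) {
        UUID id = create(System.currentTimeMillis());
        System.out.println(id + " created at " + java.time.Instant.ofEpochMilli(embeddedMillis(id)));
    }
}
```

Because the time sits in the most significant bits, comparing IDs as unsigned 128-bit values (for example with Long.compareUnsigned on the high halves) sorts them by creation millisecond.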
You can also use composite keys with some kind of server/db ID with a regular auto-increment identity or auto-number.
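A minimal sketch of such a composite key: each server keeps its own counter (standing in for its auto-increment column), and the key is the pair (serverId, localId). Key, Node and the server IDs are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CompositeKeys {
    // The key is the pair; localId alone may repeat across servers.
    record Key(int serverId, long localId) {}

    // Hypothetical per-server generator standing in for each database's auto-increment column.
    static final class Node {
        private final int serverId;
        private final AtomicLong counter = new AtomicLong();

        Node(int serverId) { this.serverId = serverId; }

        Key nextKey() { return new Key(serverId, counter.incrementAndGet()); }
    }

    public static void main(String[] args) {
        Node a = new Node(1);
        Node b = new Node(2);
        Key k1 = a.nextKey(); // Key[serverId=1, localId=1]
        Key k2 = b.nextKey(); // Key[serverId=2, localId=1]
        System.out.println(k1 + " " + k2 + " equal? " + k1.equals(k2));
    }
}
```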

Resources