Is this an acceptable usage of UUID or GUID?

I'm making a class that takes an object and assigns it an ID number that's unique. It's nothing special, it takes the object, assigns a number to it, and then increments the ID counter for the next object that will receive an ID number.
I want to call the member UUID or GUID, standing for universally unique identifier or globally unique identifier, because those names make it very clear what the member is.
However I looked up the term and Wikipedia says:
A universally unique identifier (UUID) is a 128-bit number used to
identify information in computer systems.
Which makes me think the terms have a very specific meaning, and that possibly my use of it to just mean a unique number given to each object is not proper usage. I'm thinking of using 32-bit int or the like.
Is this an incorrect use of UUID or GUID? I don't think it matters, but I'm writing in C++.

Yes, I think it would be a misuse of those terms. The word universally in UUID or globally in GUID means that the identifier is not only unique within your specific system, but within any system developed for any purpose, anywhere. A 32-bit integer that you simply increment for each new entity doesn't have that property. It may be unique within your system, but not universally. I would just call it Identifier or something similar.
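To illustrate the difference, here is a minimal sketch of the kind of counter the question describes. The class name IdAllocator and its member names are hypothetical, chosen only for this example. Two separate allocators happily hand out the same numbers, which is exactly why the result is an identifier rather than a universally unique one.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: hands out identifiers that are unique only within
// one allocator instance, so the member is called "id", not "uuid".
class IdAllocator {
public:
    // Returns the next unused identifier and advances the counter.
    std::uint32_t next() {
        // Guard against silently wrapping around and reusing identifiers.
        assert(next_id_ != std::numeric_limits<std::uint32_t>::max() &&
               "identifier space exhausted");
        return next_id_++;
    }

private:
    std::uint32_t next_id_ = 0;
};
```

Calling next() on one allocator yields 0, 1, 2, and so on, and a second allocator starts again at 0.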

The documentation for CoCreateGuid answers this. In C++ on Windows, you can create a GUID like this:
GUID gidReference;
HRESULT hCreateGuid = CoCreateGuid( &gidReference );
what it means is:
The CoCreateGuid function calls the RPC function UuidCreate, which
creates a GUID, a globally unique 128-bit integer. Use CoCreateGuid
when you need an absolutely unique number that you will use as a
persistent identifier in a distributed environment. To a very high
degree of certainty, this function returns a unique value – no other
invocation, on the same or any other system (networked or not), should
return the same value.
and when we dive in we see:
For security reasons, it is often desirable to keep ethernet addresses
on networks from becoming available outside a company or organization.
The UuidCreate function generates a UUID that cannot be traced to the
ethernet address of the computer on which it was generated. It also
cannot be associated with other UUIDs created on the same computer. If
you do not need this level of security, your application can use the
UuidCreateSequential function, which behaves exactly as the UuidCreate
function does on all other versions of the operating system.
I deem it important that you know not only that you misused the term, but also why, because this information may be valuable to you in the future.

Related

How to not expose base64 encoded UUIDs

I have a doubt regarding the exposure of internal database primary keys.
I have decided to use UUIDs in place of auto-increment longs (see here for details). This way, among other things, people cannot discover the relative size of my data or their growth over time.
Now, the UUID doesn't provide any internal information but it is not very URL friendly, although it is URL safe. Furthermore if long PKs shouldn't be exposed, then UUIDs shouldn't either.
Usually to make UUIDs more user friendly, people base64 encode them.
Example:
- UUID: 7b3149e7-bdab-4895-b659-a5f5b0d0
- base64: ezFJ572rSJW2WQAApfWw0A
My point is: anyone could still take that base64 string from the URL and decode it to obtain the original UUID. This means that even in this case the UUID would end up being exposed.
Should I use another type of encoding? Is there something already known out there, or should I create my own custom encoding? If so, are there any guidelines I should follow?
Thank you
At first glance, you could add a small degree of secrecy to those identifiers by running them through a one-way hash function such as SHA-2 (which is a cryptographic function, not an encoding). In practice, however, this buys you no real security advantage.
If you rely only on object reference IDs for access control and try to keep them secret, I suggest you rethink your access control and authorization model.
It is good to have random, non-guessable, collision-free object reference IDs. However, relying on the secrecy of a reference ID for security is a big flaw (older OWASP Top 10 lists called this Insecure Direct Object References, and the OWASP Top 10 2017 covers it under Broken Access Control). You need to consider the full AAA chain: authentication, authorization, and audit/accountability. Access should rely on a random, unique token with a short validity period that is tied to a subject; the system can then use that token to decide authorization and access levels, so that subjects interact only with the objects they are entitled to.
The reason you aren't supposed to expose PKs is that they may (a) leak information and (b) allow people to guess other values. Neither is true of UUIDs (at least v3/4/5), which is one of the main reasons to use them in the first place. The human factor you mention is why so many folks use base64 (or other) encoding; it's not for security.
That said, you should never rely on URL secrecy as security; there are far too many ways that URLs leak, and your users may even do it intentionally--but they'd be very upset if sending a link to their friend meant that friend had full access to their account.
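To make the "encoding is not secrecy" point concrete, here is a small self-contained sketch that decodes the base64 string from the question back into raw bytes. The function names are hypothetical and the decoder is deliberately minimal (no padding handling beyond ignoring leftover bits); it is an illustration, not a hardened implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Maps one base64 or base64url character to its 6-bit value, or -1.
int base64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+' || c == '-') return 62;
    if (c == '/' || c == '_') return 63;
    return -1;
}

// Decodes unpadded base64 into bytes; stops at the first invalid character.
std::vector<std::uint8_t> decodeBase64(const std::string& text) {
    std::vector<std::uint8_t> bytes;
    std::uint32_t buffer = 0;  // only the low bits are ever read back
    int bits = 0;              // number of not-yet-emitted bits in buffer
    for (char c : text) {
        const int v = base64Value(c);
        if (v < 0) break;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            bytes.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    assert(bits < 8);  // at most a partial byte of leftover bits remains
    return bytes;
}

// Renders bytes as lowercase hexadecimal.
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::uint8_t b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Feeding it "ezFJ572rSJW2WQAApfWw0A" yields the 16 bytes 7b3149e7bdab4895b6590000a5f5b0d0, so anyone holding the URL can recover the UUID without any key. Note that the decoded value contains two zero bytes that the UUID as printed in the question appears to be missing.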

How do I correctly use libsodium so that it is compatible between versions?

I'm planning on storing a bunch of records in a file, where each record is then signed with libsodium. However, I would like future versions of my program to be able to check signatures the current version has made, and ideally vice-versa.
For the current version of Sodium, signatures are made using the Ed25519 algorithm. I imagine that the default primitive can change in new versions of Sodium (otherwise libsodium wouldn't expose a way to choose a particular one, I think).
Should I...
1. Always use the default primitive (i.e. crypto_sign)
2. Use a specific primitive (i.e. crypto_sign_ed25519)
3. Do (1), but store the value of sodium_library_version_major() in the file (either in a dedicated 'sodium version' field or a general 'file format revision' field) and quit if the currently running version is lower
4. Do (3), but also store crypto_sign_primitive()
5. Do (4), but also store crypto_sign_bytes() and friends
...or should I do something else entirely?
My program will be written in C.
Let's first identify the set of possible problems and then try to solve them. We have some data (a record) and a signature. The signature can be computed with different algorithms. The program can evolve and change its behaviour, and libsodium can also (independently) evolve and change its behaviour. On the signature generation front we have:
crypto_sign(), which uses some default algorithm to produce signatures (at the time of writing it just invokes crypto_sign_ed25519())
crypto_sign_ed25519(), which produces signatures based on specific ed25519 algorithm
I assume that for one particular algorithm given the same input data and the same key we'll always get the same result, as it's math and any deviation from this rule would make the library completely unusable.
Let's take a look at the two main options:
Using crypto_sign_ed25519() all the time and never changing this. Not that bad of an option, because it's simple: as long as crypto_sign_ed25519() exists in libsodium and is stable in its output, you have nothing to worry about, with a stable fixed-size signature and zero management overhead. Of course, in the future someone could discover a serious flaw in this algorithm, and if you're not prepared to change algorithms, that could become a serious problem for you.
Using crypto_sign(). With this we suddenly have a lot of problems, because the algorithm can change, so you must store some metadata along with the signature, which opens up a set of questions:
what to store?
should this metadata be record-level or file-level?
What do the functions you mentioned offer for the second approach?
sodium_library_version_major() is a function to tell us the library API version. It's not directly related to changes in supported/default algorithms so it's of little use for our problems.
crypto_sign_primitive() is a function that returns a string identifying the algorithm used in crypto_sign(). That's a perfect match for what we need, because supposedly its output will change at exactly the time when the algorithm would change.
crypto_sign_bytes() is a function that returns the size of signature produced by crypto_sign() in bytes. That's useful for determining the amount of storage needed for the signature, but it can easily stay the same if algorithm changes, so it's not the metadata we need to store explicitly.
Now that we know what to store there is a question of processing that stored data. You need to get the algorithm name and use that to invoke matching verification function. Unfortunately, from what I see, libsodium itself doesn't provide any simple way to get the proper function given the algorithm name (like EVP_get_cipherbyname() or EVP_get_digestbyname() in openssl), so you need to make one yourself (which of course should fail for unknown name). And if you have to make one yourself maybe it would be even easier to store some numeric identifier instead of the name from library (more code though).
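The name-to-function lookup described above could be sketched roughly as follows. This is written in C++ for brevity and does not call libsodium at all: VerifierRegistry and Verifier are hypothetical names, and the verifier is a stand-in for whatever real routine you would register (for Ed25519 that would presumably wrap crypto_sign_ed25519_verify_detached()). The key string "ed25519" in the usage note is assumed to be what crypto_sign_primitive() reported when the record was written.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// A verification routine: returns true when the signature is valid.
using Verifier = std::function<bool(const std::string& message,
                                    const std::string& signature)>;

// Hypothetical registry keyed by the algorithm name stored next to
// the signature (or in a file-level field).
class VerifierRegistry {
public:
    void add(const std::string& name, Verifier v) {
        assert(v && "verifier must be callable");
        table_[name] = std::move(v);
    }

    // Throws for names this build does not know, so that an unknown
    // algorithm fails loudly instead of being silently accepted.
    const Verifier& find(const std::string& name) const {
        const auto it = table_.find(name);
        if (it == table_.end())
            throw std::invalid_argument("unknown signature algorithm: " + name);
        return it->second;
    }

private:
    std::map<std::string, Verifier> table_;
};
```

A program would register one entry per algorithm it can still verify, for example under "ed25519", and call find() with the name read from the file. Swapping the string key for a small application-defined integer works the same way.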
Now let's get back to file-level vs record-level. To solve that there are another two questions to ask — can you generate new signatures for old records at any given time (is that technically possible, is that allowed by policy) and do you need to append new records to old files?
If you can't generate new signatures for old records or you need to append new records and don't want the performance penalty of signature regeneration, then you don't have much choice and you need to:
have dynamic-size field for your signature
store the algorithm (dynamic string field or internal (for your application) ID) used to generate the signature along with the signature itself
If you can generate new signatures or especially if you don't need to append new records, then you can get away with simpler file-level approach when you store the algorithm used in a special file-level field and, if the signature algorithm changes, regenerate all signatures when saving the file (or use the old one when appending new records, that's also more of a compatibility policy question).
Other options? Well, what's so special about crypto_sign()? It's that its behaviour is not under your control, libsodium developers choose the algorithm for you (no doubt they choose good one), but if you have any versioning information in your file structure (not signature-specific, I mean) nothing prevents you from making your own particular choice and using one algorithm with one file version and another with another (with conversion code when needed, of course). Again, that's also based on the assumption that you can generate new signature and that's allowed by policy.
Which brings us back to the original two choices with question of whether it's worth the trouble of doing all that compared to just using crypto_sign_ed25519(). That mostly depends on your program life span, I'd probably say (just as an opinion) that if that's less than 5 years then it's easier to just use one particular algorithm. If it can easily be more than 10 years, then no, you really need to be able to survive algorithm (and probably even whole crypto library) changes.
Just use the high-level API.
Functions from the high-level API are not going to use a different algorithm without the major version of the library being bumped.
The only breaking change one can expect in libsodium 1.x.y is the removal of deprecated/undocumented functions (that don't even exist in current releases compiled with the --enable-minimal switch). Everything else will remain backward compatible.
New algorithms might be introduced in 1.x.y versions without high-level wrappers, and will be stabilized and exposed via a new high-level API in libsodium 2.
Therefore, do not bother calling crypto_sign_ed25519(). Just use crypto_sign().

Using dependent types to provide a compile-time proof that some integer is a valid row-id in a database?

In my never-ending wandering through dependent type land, a strange idea came into my head. I do a lot of database programming, and it would be nice if I could get rid of all that sanity checking and validity checking. One especially annoying case is functions that accept an Integer and expect it to be a valid row-id of a certain table. A very silly example is:
function loadStudent(studentId: Integer) : Student
Supposing my language of choice supports dependent types in their full glory, would it be possible to utilize the type system to make loadStudent accept only valid studentId values :
function loadStudent(studentId : ValidRowId("students_table") ) : Student
If yes, how do I write a data constructor for ValidRowId type? All the examples I have seen thus far were pure (no IO involved).
Maybe I'm misunderstanding the question, but I don't see how it's possible without doing IO. How can you know that an id is valid without searching the database to see if there is a record with that id?
I suppose that you could, at program start up time, read all the current IDs into a table in memory and then do your checks against that. But you would have to somehow know if another user had added or deleted records after you created the table.
Okay, you could have all threads on all computers that access the database communicate with some central server that keeps this master list so that it would always be current. But we already have a central place that keeps track of all the IDs currently in use in the database: it's called "the database". What would be the advantage of going to a whole bunch of work to maintain a duplicate copy of a subset of the data on the database? It's unlikely you'd get much performance gain, and you'd create the possibility that bugs in your code, bad connections, etc, would result in the data getting out of sync.

Enum storage in Database field

Is it best to store the enum value or the enum name in a database table field?
For example, should I store 'TJLeft' as a string or its equivalent numeric value in the database?
Public Enum TextJustification
TJLeft
TJCenter
TJRight
End Enum
I'm currently leaning towards the name, as someone could come along later and explicitly assign a different value.
Edit -
Some of the enums are under my control but some are from third parties.
Another reason to store the numeric value is if you're using the [Flags] attribute on your enumeration in cases where you may want to allow for multiple enumeration values. Say, for example you want to let someone pick what days of the week that they're available for something...
[Flags]
public enum WeekDays
{
Monday=1,
Tuesday=2,
Wednesday=4,
Thursday=8,
Friday=16
}
In this case, you can store the numeric value in the db for any combination of the values (for example, 3 == Monday and Tuesday)
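As a rough illustration of how such a combined value is read back, here is the same idea sketched in C++ (not C#). The helper hasDay and the constant kAllDays are hypothetical names for this example; the stored integer is whatever the database column holds.

```cpp
#include <cassert>
#include <cstdint>

// C++ counterpart of the [Flags] enum above: each day owns one bit.
enum WeekDays : std::uint32_t {
    Monday = 1, Tuesday = 2, Wednesday = 4, Thursday = 8, Friday = 16
};

// All bits that currently have a meaning (31 for the five days above).
constexpr std::uint32_t kAllDays =
    Monday | Tuesday | Wednesday | Thursday | Friday;

// True when `day` is set in the combined value read from the database.
inline bool hasDay(std::uint32_t stored, WeekDays day) {
    assert((stored & ~kAllDays) == 0 && "unknown bits in stored value");
    return (stored & day) != 0;
}
```

With this, a stored 3 tests true for Monday and Tuesday and false for every other day.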
I always use lookup tables consisting of the fields
OID int (pk) as the numeric value
ProgID varchar (unique) as the value's identifier in C# (i.e. const name, or enum symbol)
ID nvarchar as the display value (UI)
dbscript lets me generate C# code from my lookup tables, so my code is always in sync with the database.
For your own enums, use the numeric values, for one simple reason: it allows for every part of enum functionality, out of the box, with no hassle. The only caveat is that in the enum definition, every member must be explicitly given a numeric value, which can never change (or, at least, not after you've made the first release). I always add a prominent comment to enums that get persisted to the database, so people don't go changing the constants.
Here are some reasons why numeric values are better than string identifiers:
It is the simplest way to represent the value
Database searching/sorting is faster
Lower database storage cost (which could be a serious issue for some applications)
You can add [Flags] to your enum and not break your code and/or existing data
For [Flags] stored in a string field:
Poorly normalized data
Could generate false-positive anomalies when doing matching (i.e., if you have members "Sales" and "RetailSales", merely doing a substring search for "Sales" will match on either type). This has to be constrained either by using a regex on word boundaries (finicky using databases, and slow), or constraining in the enum itself, which is nonstandard, error-prone, and very difficult to debug.
For string fields (either [Flags] or not), if the database is obfuscated, this field has to be handled, which greatly affects the ability and efficiency when doing searching/sorting code, as mentioned in the previous point
You can rename any of the members without breaking the database code and/or existing client data.
Less over-the-wire data transfer space/time needed
There are only two situations where using the member names in the database may be an advantage:
If you're doing a lot of data editing manually... but who does that? And if you are, there's a good chance you're not going to be using an enum anyway.
Third-party enums where they may not be so diligent as to maintain the numeric value constants. But I have to say, anyone releasing a decently-written API is overwhelmingly likely to be smart enough to keep the enum values constant. (The identifiers have to stay the same since changing them would break existing code.)
On lookup tables, which I strongly discourage because they are a one-way bullet train to a maintenance nightmare:
Adding [Flags] functionality requires the use of a junction table, which means more complicated queries (existing ones need to be rewritten), and added complexity. What about existing client data?
If the identifier is stored in the data table, what's the point of having a lookup table in the first place?
If the numeric value is stored in the data table, you gain nothing since you still have to look up the identifier from the lookup table. To make it easier, you could create a view... for every table that has an enum value in it. And then let's not even think about [Flags] enums.
Introducing any kind of synchronization between database and code is just asking for trouble. What about existing client data?
Store an ID (value) and a varchar name; this lets you query on either way. Searching on the name is reasonable if your IDs (values) may get out of sync later.
It is better to use the integer representation... If you have to change the Enum later (add more values etc) you can explicitly assign the integer value to the Enum value so that your Enum representation in code still matches what you have in the database.
It depends on how important performance is versus readability. Databases can index numeric values a lot easier than strings, which means you can get better performance without using as much memory. It would also reduce the amount of data going across the wire somewhat. On the other hand, when you look at a numeric value in your database which you then have to refer to a code file to translate, that can be annoying.
In most cases, I'd suggest using the value, but you will need to make sure you're explicitly setting those values so that if you add a value in the future it doesn't shift the references around.
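A minimal sketch of "explicitly setting those values", written in C++ rather than VB: every member gets a fixed number, and a conversion function rejects numbers the code does not know. The names fromStored and toStored are hypothetical, and the values 0, 1, 2 are an assumption matching what the VB enum in the question would get by default.

```cpp
#include <cassert>
#include <optional>

// PERSISTED TO THE DATABASE: never renumber or reuse existing values.
enum class TextJustification : int {
    Left = 0,
    Center = 1,
    Right = 2
};

// Converts a stored column value back to the enum; unknown numbers
// yield an empty optional instead of an invalid enum value.
std::optional<TextJustification> fromStored(int value) {
    switch (value) {
        case 0: return TextJustification::Left;
        case 1: return TextJustification::Center;
        case 2: return TextJustification::Right;
        default: return std::nullopt;
    }
}

// Converts an enum member to the number written to the database.
int toStored(TextJustification j) {
    const int value = static_cast<int>(j);
    assert(fromStored(value).has_value());  // must round-trip
    return value;
}
```

Adding a member later means appending a new number (3, 4, ...) and a new case, so rows already in the database keep their meaning.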
As often, it depends on many things:
Do you want to sort by the natural order of the enums? Use the numeric values.
Do you work directly in the database using a low-level tool? Use the name.
Do you have huge amounts of data and performance is an issue? Use the number.
For me the most important issue is, most of the time, maintainability:
If your enums change in the future, names will either match correctly or fail hard and loud. With numbers, someone can add an enum member and thereby change the numbers of the other members, so you have to update all the tables where the enum is used, with almost no way to know if you missed a table.
If you are trying to get the enum values stored in the database back, then try this:
EnumValue = DirectCast([Enum].Parse(GetType(TextJustification), reader.Item("put_field_name_here").ToString), TextJustification)
Tell me if it works for you.

What is a UUID?

Well, what is one?
It's an identification number that uniquely identifies something. The idea is that the ID is universally unique, so no two things should ever have the same UUID. In fact, if you were to generate 10 trillion random (version 4) UUIDs, the chance of any two being the same would be on the order of 1 in 100 billion (about 10^-11).
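For anyone who wants to check such figures, here is a small sketch using the usual birthday approximation, p ≈ 1 - exp(-n² / (2 · 2^bits)). The function name is hypothetical, and the 122 random bits are an assumption that holds for random (version 4) UUIDs only.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two of `count` identifiers
// collide when each carries `randomBits` uniformly random bits.
double collisionProbability(double count, int randomBits) {
    assert(count >= 0.0 && randomBits > 0);
    const double space = std::ldexp(1.0, randomBits);  // 2^randomBits
    // expm1 keeps precision when the probability is very small.
    return -std::expm1(-(count * count) / (2.0 * space));
}
```

Under that assumption, collisionProbability(1e13, 122) comes out at roughly 9.4e-12 for 10 trillion UUIDs.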
Standardized identifiers
UUIDs are defined in RFC 4122. They're Universally Unique IDentifiers, that can be generated without the use of a centralized authority. There are four major types of UUIDs which are used in slightly different scenarios. All UUIDs are 128 bits in length, but are commonly represented as 32 hexadecimal characters separated by four hyphens.
Version 1 UUIDs, the most common, combine a MAC address and a timestamp to produce sufficient uniqueness. In the event of multiple UUIDs being generated fast enough that the timestamp doesn't increment before the next generation, the timestamp is manually incremented by 1. If no MAC address is available, or if its presence would be undesirable for privacy reasons, 6 random bytes sourced from a cryptographically secure random number generator may be used for the node ID instead.
Version 3 and Version 5 UUIDs, the least common, use the MD5 and SHA1 hash functions respectively, plus a namespace, plus an already unique data value to produce a unique ID. This can be used to generate a UUID from a URL for example.
Version 4 UUIDs are simply 128 bits of random data, with some bit-twiddling that overwrites 6 of those bits to identify the UUID version and variant, leaving 122 random bits.
UUID collisions are extremely unlikely to happen, especially not in a single application space.
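The version 4 recipe can be sketched in a few lines. This is an illustration only: the names Uuid, makeUuidV4 and uuidToString are hypothetical, and the caller supplies the random engine. For real identifiers the engine should be a cryptographically secure source, which the standard C++ engines are not guaranteed to be.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

using Uuid = std::array<std::uint8_t, 16>;

// Fills 16 bytes from `rng`, then overwrites the version and variant bits.
template <class Rng>
Uuid makeUuidV4(Rng& rng) {
    std::uniform_int_distribution<unsigned> byte(0, 255);
    Uuid id{};
    for (auto& b : id) b = static_cast<std::uint8_t>(byte(rng));
    id[6] = static_cast<std::uint8_t>((id[6] & 0x0F) | 0x40);  // version 4
    id[8] = static_cast<std::uint8_t>((id[8] & 0x3F) | 0x80);  // RFC 4122 variant
    assert((id[6] >> 4) == 4);
    return id;
}

// Formats the bytes as 8-4-4-4-12 lowercase hexadecimal groups.
std::string uuidToString(const Uuid& id) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < id.size(); ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10) out += '-';
        out += digits[id[i] >> 4];
        out += digits[id[i] & 0x0F];
    }
    return out;
}
```

In the resulting string the first character of the third group is always 4, and the first character of the fourth group is always one of 8, 9, a or b.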
UUID stands for Universally Unique IDentifier.
It's a 128-bit value used for a unique identification in software development. UUID is the same as GUID (Microsoft) and is part of the Distributed Computing Environment (DCE), standardized by the Open Software Foundation (OSF).
As mentioned, they are intended to have a high likelihood of uniqueness over space and time and are computationally difficult to guess. Generation is based on the current timestamp and a unique property of the workstation that generated the UUID.
It's a very long string of bits that is supposed to be unique now and forever, i.e. with no possible clash with any other UUID produced by you or anybody else in the world.
It works by making the current timestamp and a network-related unique property of the generating computer part of the bit string. That property could be the IP address, which ought to be unique while you're connected to the internet, or the MAC address, which is lower level: a hard-wired ID for your network card.
Originally every network card in the world had its own unique MAC address, but on later hardware you can change the MAC address through software, so it is no longer as reliable a unique ID.
It's a Universally Unique Identifier
A UUID is a 128-bit number that is used to uniquely identify some entity. Depending on the specific mechanisms used, a UUID is guaranteed to be different or is, at least, extremely likely to be different from any other UUID generated. The UUID relies upon a combination of components to ensure uniqueness. A UUID contains a reference to the network address of the host that generated the UUID, a timestamp and a randomly generated component. Because the network address identifies a unique computer, and the timestamp is unique for each UUID generated from a particular host, those two components should sufficiently ensure uniqueness.
I just want to add that it is better to use usUUID (static Windows identifiers).
For example, if a computer user relies on third-party software such as a screen reader for blind or low-vision users, that other software (in this case the screen reader) will play better with unique identifiers!
After all, how happy would you be if someone moved your car after you had noted where you parked it?
A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems. The term globally unique identifier (GUID) is also used, typically in software created by Microsoft.
When generated according to the standard methods, UUIDs are for practical purposes unique. Their uniqueness does not depend on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is close enough to zero to be negligible.

Resources