Need a unique identifier that does not change on software update - uuid

I have used a GUID as a unique identifier and hashed it to generate a unique number. But a software update changed the GUID, so hashing the new GUID produced a different number for the machine than the original one.
Now I need a unique number that I can retrieve programmatically and that does not change on a software update.

Actually, you can create a version 5 GUID. This type of GUID uses SHA-1 to hash a name together with a namespace. As long as you always use the same name and namespace, your GUID will always be the same.
RFC 4122 describes how the five different types of GUIDs are generated.
See section 4.3, "Algorithm for Creating a Name-Based UUID".
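For illustration, here is a minimal Go sketch of that idea. It assumes the github.com/google/uuid package (not mentioned in the question), whose NewSHA1 function produces a name-based version 5 UUID; the machine name used here is a made-up example.

    package main

    import (
        "fmt"

        "github.com/google/uuid"
    )

    func main() {
        // NameSpaceDNS is one of the standard namespaces from RFC 4122.
        // "my-machine-identifier" is a placeholder; use something that is
        // stable across software updates, e.g. a serial number you control.
        id := uuid.NewSHA1(uuid.NameSpaceDNS, []byte("my-machine-identifier"))
        fmt.Println(id) // same namespace + name, same UUID, every run
    }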

Related

Ensuring unique generated keys across tables in more than one machine

I want to use this Go package https://github.com/bwmarrin/snowflake to generate int64 primary keys for my tables in PostgreSQL. If my application server is running on at least two machines, how can I prevent duplicate keys from being generated?
So snowflake provides a 63-bit integer stored in an int64. In the default layout that is 41 bits of timestamp, 10 bits of node ID (up to 1024 nodes) and 12 bits of sequence, so according to the documentation each node can generate 4096 unique IDs every millisecond. Across all nodes that is 4096 × 1024 ≈ 4.2 million IDs per millisecond, or billions per second. Because the node ID is embedded in every ID, two nodes with different node IDs can never produce the same value.
So I think if you pass a distinct node ID to each server in an environment variable and generate IDs based on that, you should be safe.
It can also help to add a prefix to the ID based on the entity or domain, so that keys for different kinds of records stay in separate ranges and a clash across tables becomes even less likely.
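Here is a rough sketch of that setup, using the github.com/bwmarrin/snowflake package from the question; SNOWFLAKE_NODE_ID is a made-up environment variable name, and each machine must be started with a different value for it.

    package main

    import (
        "fmt"
        "log"
        "os"
        "strconv"

        "github.com/bwmarrin/snowflake"
    )

    func main() {
        // e.g. SNOWFLAKE_NODE_ID=0 on the first machine, =1 on the second.
        nodeID, err := strconv.ParseInt(os.Getenv("SNOWFLAKE_NODE_ID"), 10, 64)
        if err != nil {
            log.Fatalf("invalid SNOWFLAKE_NODE_ID: %v", err)
        }

        node, err := snowflake.NewNode(nodeID)
        if err != nil {
            log.Fatal(err)
        }

        id := node.Generate()
        fmt.Println(id.Int64()) // suitable for a PostgreSQL BIGINT primary key
    }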

How can I prevent SQL to allocate same uniqueidentifier on the same table? [duplicate]

I'm using Guids as primary keys in my database and was wondering if it is ever possible that a duplicate Guid might be generated. Are Guids guaranteed to be unique?
From Wikipedia:
While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128, or 3.4×10^38) is so large that the probability of the same number being generated twice is very small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 universally unique GUIDs.
You can check more info on that here.
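To put "very small" into numbers, here is a rough birthday-bound estimate. It assumes random version 4 GUIDs, which carry 122 random bits rather than the full 128:

    P(collision among n GUIDs) ≈ n^2 / (2 · 2^122)

    For n = 10^9:  P ≈ (10^9)^2 / (2 · 5.3×10^36) ≈ 9×10^-20

so you would need roughly 2.7×10^18 GUIDs before the probability of even one collision reaches 50%.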
A possible safety net against duplicate GUIDs (if you still want one) is to add a UNIQUE constraint, so the database rejects a duplicate instead of silently storing it.
They are guaranteed to be unique if you use NEWSEQUENTIALID() to generate them. Microsoft claims[1] that it will be unique and that
no other computer in the world will generate a duplicate of that GUID value.
Wikipedia states[2] that
Microsoft added a function to the Transact-SQL language called NEWSEQUENTIALID(), which generates GUIDs that are guaranteed to increase in value, but may start with a lower number (still guaranteed unique) when the server restarts.
In short: in practice, they're going to be unique.
[1] http://msdn.microsoft.com/en-us/library/ms190215(SQL.105).aspx
[2] http://en.wikipedia.org/wiki/Globally_unique_identifier#Sequential_algorithms

Database Design and flattening composite primary keys

I'm building a claims database with the above schema so far. The three-part key on tblPatient is there to uniquely identify an individual's claim for a certain problemX. The same patientID can appear in tblPatient as long as the admission/discharge dates are different.
This database is also concerned (not pictured) with claims that are NOT associated with problemX. These claims will be identified with another three-part key of patientID, claimsFromDate, claimsThroughDate. So tblPatient.admissionDate and tblPatient.DischargeDate do not have to be equal to claimsFromDate and claimsThroughDate, and if they are equal it is happenstance.
Since tblPatient.patientID is repeated more than once (for those that have more than one visit), I cannot simply copy it to another table without breaking unique constraints for a primary key. I need to be able to relate patientID with the rest of the claims. Do I need to redesign my tblPatient to only have one field as a primary key, or include the already-existing three-part key and just roll with it?
First of all: in a perfect, purist database world you would split your patients table in two: one containing the patients, and another called "PatientClaims" or such. If you do not care about patient-specific data apart from the patient ID, then you should at least rename the table.
That same purist approach would also tell you that the primary key is defined as "the only set of data that uniquely identifies the row", which is likely your three fields. (I could imagine you could leave out DischargeDate, but only if you are sure that the logic for doing so is sound.)
Seeing, however, that you have to work with:
1. A three-part key, which is never pleasant
2. Having that key-combo across two tables, and likely having to join them
I would suggest simply defining a new key - like "ClaimID" using whatever autoincrement functions your database of choice has available.
Unrelated note: Your whole state/county double-table set looks kinda weird - but that might just be me not understanding what you are modelling.

Create Guid PK in Database VS. in Code

What is "better" generating Primary Keys on the database or generating them in application code, specifically when using GUID/UniqueIdentifier datatype for the keys.
I have read up on the difference between using Guid's and int data types, and it sounds like Guids are feasable for so called "generating offline".
E.g.
instead of having a NEWID() contstraint in the database, In one project (where we are using Entity Framework) we use in the application code Guid.NewGuid() to generate the PK when inserting data.
Is this a bad approach ?
My concerns are:
Database indexes: database performance may suffer because the IDs will not be sequential
The one-in-64-billion chance that a key is already in use (considering that the application will not be enormous, but may need room to grow)
Perhaps there are other disadvantages?
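(This is not the C#/Entity Framework code the question refers to, just a rough Go sketch of the same "generate the key in application code" idea, assuming the github.com/google/uuid package and a database/sql connection; the users table, its columns and the Postgres driver are made up for illustration.)

    package main

    import (
        "database/sql"
        "log"

        "github.com/google/uuid"
        _ "github.com/lib/pq" // hypothetical driver choice
    )

    // insertUser creates the primary key in application code, the
    // equivalent of calling Guid.NewGuid() before the insert.
    func insertUser(db *sql.DB, name string) (uuid.UUID, error) {
        id := uuid.New() // random (version 4) GUID generated client-side
        _, err := db.Exec("INSERT INTO users (id, name) VALUES ($1, $2)", id, name)
        return id, err
    }

    func main() {
        db, err := sql.Open("postgres", "postgres://localhost/example?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        if _, err := insertUser(db, "alice"); err != nil {
            log.Fatal(err)
        }
    }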
Well, actually, GUIDs can be sequential as of SQL Server 2005. There is a function named NEWSEQUENTIALID(), link here:
Creates a GUID that is greater than any GUID previously generated by
this function on a specified computer since Windows was started. After
restarting Windows, the GUID can start again from a lower range, but
is still globally unique. When a GUID column is used as a row
identifier, using NEWSEQUENTIALID can be faster than using the NEWID
function. This is because the NEWID function causes random activity
and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to
completely fill the data and index pages.

Primary key vs. RRN

What's the difference between a primary key and an RRN?
A Primary key uniquely and unambiguously identifies a given record (in a database table/view) or a given row (in a text file). Although it can be convenient for the primary key to be based on a single Field (a single "column"), it is also possible for a primary key to be based on several Fields/Columns.
RRN is an acronym which can either be understood as "Record Row Number" or "Relative Row Number". The Record Row Number is generally understood to be a number, typically (but not necessarily) assigned by simple increment (based on the value of the previous such RRN assigned) which is "added" to the other Fields/Column of a particular record type. Many DBMSes supply features for the support of such "auto-incremented" or more generally automatically assigned RRNs.
Defined as above, an RRN can be used as a Primary Key.
There are many advantages -and drawbacks- to having a [semantically void] RRN as opposed to a Primary key based on [one or several] attribute (Field or Column) values of the record. This is probably discussed in other SO questions; here are a few of the most common arguments:
A primary key may be modified, an RRN is "immutable".
For example, if the primary key is a Social Security Number (SSN), a record may at some time be updated because the SSN was originally input with a typo. When/if that happens, any related records which use this SSN to refer to the updated record need to be updated as well. Had these related records used the [non-significant] RRN, they would be immune to possible changes of the SSN value.
When there is no "natural" Primary key based on a single column, it may be more convenient to use the RRN
RRNs are typically shorter
Related tables and lists which refer to original records by way of a non-RRN primary key, somehow duplicate the underlying information. This can be both an advantage and a drawback: one can know the underlying field value without having to look it up in the original table: good if you want the related table to contain such info, bad if you don't (ex: Social Security Number can be considered sensitive etc.)
RRNs are guaranteed to be unique (short of a bug in the RRN-generation logic), whereas attribute-value-based keys have a propensity to turn out non-unique ("Oops! We thought we could use the phone number as a house ID; dang, the phone company started reusing numbers...")
A Primary key identifies a row in a table.
An RRN (I presume you mean Relative Record Number) also identifies a row, but by its position in a subset (i.e. a query result).
I've found it useful if you need to extrapolate a sequential order for a set of records not related to the primary key.
