I have several tables whose only unique data is a uniqueidentifier (a Guid) column. Because guids are non-sequential (and they're client-side generated so I can't use newsequentialid()), I have made a non-primary, non-clustered index on this ID field rather than giving the tables a clustered primary key.
I'm wondering what the performance implications are for this approach. I've seen some people suggest that tables should have an auto-incrementing ("identity") int as a clustered primary key even if it doesn't have any meaning, as it means that the database engine itself can use that value to quickly look up a row instead of having to use a bookmark.
My database is merge-replicated across a bunch of servers, so I've shied away from identity int columns as they're a bit hairy to get right in replication.
What are your thoughts? Should tables have primary keys? Or is it ok to not have any clustered indexes if there are no sensible columns to index that way?
When dealing with indexes, you have to determine what your table is going to be used for. If you are primarily inserting 1000 rows a second and not doing any querying, then a clustered index is a hit to performance. If you are doing 1000 queries a second, then not having an index will lead to very bad performance. The best thing to do when trying to tune queries/indexes is to use the Query Plan Analyzer and SQL Profiler in SQL Server. This will show you where you are running into costly table scans or other performance blockers.
As for the GUID vs ID argument, you can find people online that swear by both. I have always been taught to use GUIDs unless I have a really good reason not to. Jeff has a good post that talks about the reasons for using GUIDs: https://blog.codinghorror.com/primary-keys-ids-versus-guids/.
As with most anything development related, if you are looking to improve performance there is not one, single right answer. It really depends on what you are trying to accomplish and how you are implementing the solution. The only true answer is to test, test, and test again against performance metrics to ensure that you are meeting your goals.
[Edit]
@Matt, after doing some more research on the GUID/ID debate I came across this post. Like I mentioned before, there is not a true right or wrong answer. It depends on your specific implementation needs. But these are some pretty valid reasons to use GUIDs as the primary key:
For example, there is an issue known as a "hotspot", where certain pages of data in a table are under relatively high concurrency contention. Basically, what happens is most of the traffic on a table (and hence page-level locks) occurs on a small area of the table, towards the end. New records will always go to this hotspot, because IDENTITY is a sequential number generator. These inserts are troublesome because they require an exclusive page lock on the page they are added to (the hotspot). This effectively serializes all inserts to a table thanks to the page locking mechanism. NewID(), on the other hand, does not suffer from hotspots. Values generated using the NewID() function are only sequential for short bursts of inserts (where the function is being called very quickly, such as during a multi-row insert), which causes the inserted rows to spread randomly throughout the table's data pages instead of all at the end, thus eliminating a hotspot from inserts.
Also, because the inserts are randomly distributed, the chance of page splits is greatly reduced. While a page split here and there isn't too bad, the effects do add up quickly. With IDENTITY, page Fill Factor is pretty useless as a tuning mechanism and might as well be set to 100%: rows will never be inserted in any page but the last one. With NewID(), you can actually make use of Fill Factor as a performance-enabling tool. You can set Fill Factor to a level that approximates estimated volume growth between index rebuilds, and then schedule the rebuilds during off-peak hours using DBCC DBREINDEX. This effectively delays the performance hits of page splits until off-peak times.
If you even think you might need to enable replication for the table in question, then you might as well make the PK a uniqueidentifier and flag the guid field as ROWGUIDCOL. Replication will require a uniquely valued guid field with this attribute, and it will add one if none exists. If a suitable field exists, then it will just use the one that's there.
Yet another huge benefit of using GUIDs for PKs is the fact that the value is indeed guaranteed unique, not just among all values generated by this server, but among all values generated by all computers, whether it be your db server, web server, app server, or client machine. Pretty much every modern language has the capability of generating a valid guid now; in .NET you can use System.Guid.NewGuid. This is VERY handy when dealing with cached master-detail datasets in particular. You don't have to employ crazy temporary keying schemes just to relate your records together before they are committed. You just fetch a perfectly valid new Guid from the operating system for each new record's permanent key value at the time the record is created.
http://forums.asp.net/t/264350.aspx
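To make the master-detail point above concrete, here is a minimal sketch in Python rather than .NET. The Order and OrderLine classes and their field names are made up for illustration; the only real API used is the standard library's uuid.uuid4(). The parent and child records get their permanent keys, and are related to each other, before anything has been sent to a database.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    # Hypothetical detail record: it carries the parent's key as its foreign key.
    order_id: uuid.UUID
    product: str
    line_id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Order:
    # Hypothetical master record: the key is generated on the client at creation time.
    customer: str
    order_id: uuid.UUID = field(default_factory=uuid.uuid4)
    lines: list = field(default_factory=list)

    def add_line(self, product):
        # No temporary key and no database round trip is needed to relate the rows.
        line = OrderLine(order_id=self.order_id, product=product)
        self.lines.append(line)
        return line


order = Order(customer="Contoso")
order.add_line("widget")
order.add_line("gadget")
```

With identity columns, the same in-memory graph would need placeholder keys that get rewritten once the server has assigned the real values.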
The primary key serves three purposes:
indicates that the column(s) should be unique
indicates that the column(s) should be non-null
document the intent that this is the unique identifier of the row
The first two can be specified in lots of ways, as you have already done.
The third reason is good:
for humans, so they can easily see your intent
for the computer, so a program that might compare or otherwise process your table can query the database for the table's primary key.
A primary key doesn't have to be an auto-incrementing number field, so I would say that it's a good idea to specify your guid column as the primary key.
Just jumping in, because Matt's baited me a bit.
You need to understand that although a clustered index is put on the primary key of a table by default, that the two concepts are separate and should be considered separately. A CIX indicates the way that the data is stored and referred to by NCIXs, whereas the PK provides a uniqueness for each row to satisfy the LOGICAL requirements of a table.
A table without a CIX is just a Heap. A table without a PK is often considered "not a table". It's best to get an understanding of both the PK and CIX concepts separately so that you can make sensible decisions in database design.
Rob
Nobody answered the actual question: what are the pluses/minuses of a table with neither a PK nor a clustered index?
In my opinion, if you optimize for faster inserts (especially incremental bulk-insert, e.g. when you bulk load data into a non-empty table), such a table, with NO clustered index, NO constraints, NO Foreign Keys, NO Defaults and NO Primary Key, in a database with the Simple Recovery Model, is the best. Now, if you ever want to query this table (as opposed to scanning it in its entirety), you may want to add non-clustered, non-unique indexes as needed, but keep them to a minimum.
I too have always heard having an auto-incrementing int is good for performance even if you don't actually use it.
A Primary Key needn't be an autoincrementing field, in many cases this just means you are complicating your table structure.
Instead, a Primary Key should be the minimum collection of attributes (note that most DBMS will allow a composite primary key) that uniquely identifies a tuple.
In technical terms, it should be the field that every other field in the tuple is fully functionally dependent upon. (If it isn't you might need to normalise).
In practice, performance issues may mean that you merge tables, and use an incrementing field, but I seem to recall something about premature optimisation being evil...
Since you are doing replication, you are correct that identities are something to steer clear of. I would make your GUID a primary key, but nonclustered, since you can't use newsequentialid. That strikes me as your best course. If you don't make it a PK but put a unique index on it, sooner or later the people who maintain the system may fail to understand the FK relationships properly and introduce bugs.
Related
We have a very large database and have been using shards which we want to get away from. The shards work by everytime a table gets really big, we start a new table that has the same schema as the previous table and keep a number in another table that helps us find which table the data is in. This is a cumbersome manual process and means we have data spread out over N different tables all with the same schema.
The idea we are trying for is to eliminate this need for sharding by using indexes. Our data lookup queries do not use unique keys and many records are returned that have the same values across columns.
The following illustrates many of our lookup selects for a particular table, the fields with the * indicate that field may or may not be in the select.
where clause: scheduled_test, *script, *label, *error_message
group/order: messenger_id, timeslice, script, label, error_message, step_sequence, *adapter_type
My thought is that I would not want to create an index with all of these 11 fields. I instead picked 3 of the ones that seemed to be used more commonly, including the one that is always in the where clause. I had read that it is advisable not to have too wide an index with too many fields. I also had heard that the optimizer will use the indexed fields first, and that it is not uncommon to have non-unique indexes even though MSDN states to the effect that uniqueness is the big advantage of an index. It's just not how our data is designed. I realize SQL will add something to the index to make it unique, but that doesn't seem to matter for our purposes.
When I look at the execution plan in SQL Server Management Studio on a query that is similar to what we might run, it says "clustered index seek cost 100%", but it is using the clustered index that I created, so I am hoping this is better than the default clustered index that is just the generated primary key (previously how the table was defined). I am hoping that what I have here is as good as or better than our sharding method and will eliminate the need for the shards.
We do insert a lot of data into the tables all at once, but these rows all have the same data values across many columns and I think they would even tend to get inserted at the end as well. These inserts don't share values with older data, and if the index is just 3 columns hopefully that would not be a very big hit on the inserts.
Does what I am saying seem reasonable, or what else should I look into or consider? Thanks a lot. I am not that familiar with these types of indexing issues but have been looking on various websites and experimenting.
Generally, the narrower the clustered index the better as the clustering key of the clustered index will be added to all non-clustered indexes, making them less efficient.
SQL Server will add a uniquifier to non-unique clustered indexes, making them (and all non-clustered indexes) even wider still.
If the space used by these indexes is not an issue for you, then you should consider whether the value of the clustered index key is ever increasing (or decreasing) as if it isn't, you will get page splits and fragmentation which will definitely hurt your inserts.
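To get a feel for the page-split point, here is a toy simulation in Python. It is only a sketch under stated assumptions, not a model of SQL Server's real storage engine: a "page" is a sorted list holding at most page_capacity keys, an insert past the end of the last full page simply allocates a new page, and an insert into any other full page splits it in half. The function name and parameters are made up for this illustration.

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=8):
    """Insert keys into sorted fixed-size pages; return (split_count, page_count)."""
    pages = []   # each page is a sorted list of keys; pages are kept in key order
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Locate the page whose key range should receive this key.
        lows = [page[0] for page in pages]
        idx = max(bisect.bisect_right(lows, key) - 1, 0)
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Ever-increasing key: start a fresh page at the end, no split.
            pages.append([key])
            continue
        # The key lands inside a full page: split it into two half-full pages.
        splits += 1
        bisect.insort(page, key)
        mid = len(page) // 2
        pages[idx:idx + 1] = [page[:mid], page[mid:]]
    return splits, len(pages)


sequential_keys = list(range(1000))
random_keys = sequential_keys[:]
random.Random(42).shuffle(random_keys)

seq_splits, seq_pages = simulate_inserts(sequential_keys)
rand_splits, rand_pages = simulate_inserts(random_keys)
print("sequential:", seq_splits, "splits,", seq_pages, "pages")
print("random:    ", rand_splits, "splits,", rand_pages, "pages")
```

In this model the ever-increasing keys never split a page and leave every page full, while the shuffled keys split pages repeatedly and end up spread over more, partly empty pages. That is the fragmentation effect described above, in miniature; real numbers will depend on fill factor, row size, and the engine.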
It's probably worth setting this up in a test system if you can to examine the impact different indexing strategies have on your normal queries.
Is there a benefit to having a single column primary key vs a composite primary key?
I have a table that consists of two id columns which together make up the primary key.
Are there any disadvantages to this? Is there a compelling reason for me to throw in a third column that would be unique on its own?
Database Normalization nuts will tell you one thing.
I'm just going to offer my own opinion of what I've learned over the years: I stick an auto-incrementing ID field on every ($&(##$)# one of my tables. It makes life a million times easier in the long run to be able to single out a single row with impunity.
This is from a "down in the trenches" developer.
Single column keys are simple to write, simple to maintain, and simple to understand.
If you're going to have a huge number of rows - billions? - maybe saving a byte here and there will help.
But if you're not looking at extreme cases, optimizing for "simple" is often the best way to go.
If you are a coder and the database is nothing to you but a glorified object-store, then sure, by all means inject surrogate keys willy nilly. In fact go one better and just delegate all DB schema design and DB interaction to your favourite ORM and be done with it. Indeed, when I want a small or medium scale object-store, that's exactly what I do.
If you are approaching an information systems or information management problem, then it is a completely different story. When you start dealing with 10's (or more likely 100's) of millions of dirty records integrated from multiple sources, several or all of which are not under your control; at that point the seductive lure of an easy answer to the problems of 'identity' is a trap.
Yes you sometimes still introduce a surrogate key internally to allow for concise FK relationships and improved cache efficiency on covering indices; but, you gain those benefits at the cost of substantial pain at managing the natural-key/surrogate-key relationship.
In this case it will be important to make sure you don't allow the surrogate key to leak. Your public APIs at the business-logic layer should use the natural key; nothing above a document/record cache should be aware of the existence of a surrogate key. Be aware that the cost of matching updates against the existing surrogate keys can be prohibitive, and a far larger scalability hit than the incremental cost of moving a few extra bytes per request over the internal network.
So in conclusion:
If the DB is just being used as an object-store: let the ORM worry about object identity, and there should almost certainly be a surrogate key.
If the DB is being used as a database: the introduction of a surrogate key is an engineering design decision with serious tradeoffs in both directions. The decision will need to be made on a case by case basis, with full recognition of the resulting costs to be accepted in exchange for the benefits gained either way.
Update
The 'convenience' of a surrogate key is really just the ability to punt on the question of identity. This is often necessary in a database, and reasonable in the caching layer as I allow, but beyond that it leads to brittle data designs. The problem is that identity is not something that has one correct answer. For non-trivial data-intensive systems you will routinely find yourself needing to work in terms of equivalence classes, rather than the reference identity that object-oriented programming lulls us into thinking is normal.
What it really comes down to is a realization that the whole concept of a 'primary key' is a fiction invented to help the relational model work efficiently; but, adopting a surrogate key, cements that fiction and makes the whole system brittle and inflexible. Business logic needs to be able to provide their own definitions of equality — sometimes four copies of the same file need to be considered four files, sometimes they should be considered indistinguishable from the original file; when you edit one of them, is that then a new file? the same file? The answer to both questions is of course yes, when... Working with natural keys provides this critical ability to work in terms of conceptual equivalence classes. If you let surrogate keys infect your business logic, you quickly lose this.
I have had to use multi-column primary keys in the past, and it became quite a nightmare very quickly.
If you have one table that references your first table, how does it contain that primary key? Now add another table that references only the second table but needs to find data in the first. Now another... on down the rabbit hole.
If you know that you will only have the one table, there's probably not an issue either way- use whichever represents your data better. But if you'll be using it in joins, you can lose performance pretty quickly.
Is there a benefit to having a single column primary key vs a composit[sic] primary key?
Yes. If the primary key also happens to be the clustered index, it is common that the clustered index is duplicated fully for each secondary index in the table. Therefore, having a fatter clustered index, which is what one would get with a composite, implies an increase in storage cost. Also, foreign references to this table would need to specify both fields to refer to a unique entry, which implies a further storage cost. There is also an arguably greater cost in development time because there is a slight increase in the complexity of the join.
On the other hand, depending on the distribution of the values of your two key fields, it may be the case that concurrent access to your table is greatly improved because chronologically-successive inserts could occur on different physical pages; this could be the case, for example, if your fields are time-independent (and non-monotonic like an auto-incrementer) like clientID, or something like that. This could be significant for performance in a high concurrency environment.
I have a table that consists of two id columns which together make up the primary key.
Are there any disadvantages to this? Is there a compelling reason for me
to throw in a third column that would be unique on its own?
If the most common way in which your table is queried is to specify those three fields as restrictions, then having all three in a composite key would likely be the fastest lookup.
And there is another important point that I almost forgot. Since having a composite key means that foreign references to this table from other tables must specify all fields in the key, it also means that some queries performed on the other table that required a restriction on one or more of the parts of the composite index of this table, can be performed without requiring a join. This could be considered similar to the concept of denormalization for the sake of performance (and arguably sacrificing a little ease of maintainability).
In general I prefer to have a surrogate key because there are very few truly good natural keys (the key problem is not uniqueness but that they change over time), and the longer the natural key, the more it affects performance when used as a PK. If you have a natural key, you should create a unique index on it and then use the surrogate key as the PK used for joining to other tables. That enforces the uniqueness of the natural key data but fixes the problems of join performance and the extra time to update all child records when the natural key changes.
There is one case where I ignore this, and that is a joining table. If it is a table that is used to enforce a many-to-many relationship and consists only of two surrogate keys from other tables, then you really gain nothing from adding a surrogate key. Typically the individual keys are used for joins, not the PK, and surrogate keys almost never change. In a joining table, I just add the two columns I need and nothing else.
In most databases I know (MySQL, PostgreSQL) the composite key will generate an index. So if you specify your key as composite the DB should provide you an efficient way to lookup tuples from the DB using that key. I think it is the case for all DBs. I think you do not have to bother about performance there.
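The claim that a composite key brings its own index is easy to check. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in (SQLite is not one of the databases named above, and the student_course table is a made-up joining table). It declares a two-column primary key and then lists the indexes the engine created on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical joining table: the two foreign keys together form the primary key.
conn.execute("""
    CREATE TABLE student_course (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id)
    )
""")

# SQLite records the automatically created index for the composite key in sqlite_master.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'student_course'"
    )
]

conn.executemany(
    "INSERT INTO student_course (student_id, course_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

# A lookup on the leading key column can be served by that same index.
courses = [
    row[0]
    for row in conn.execute(
        "SELECT course_id FROM student_course WHERE student_id = ? ORDER BY course_id",
        (1,),
    )
]
print(index_names, courses)
```

Other engines name and store the index differently, so treat this as a demonstration of the idea rather than of any particular product's behaviour.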
Don't use multi-column keys. They get very difficult to maintain, especially if the components of the key are not human-understandable.
Use an internally generated key instead.
Imagine you have a composite primary key (field1 and field2, for example) instead of just one auto-incrementing identifier. Clients' requirements are very changeable, and after some development the client says that field2 is no longer compulsory and can be null. It can then no longer be part of the primary key of the table. Imagine this table is one of the most important in your model. Then all the foreign keys have to be changed, because field2 cannot stay in the composite primary key. It's a nightmare changing the primary key all over the model.
Also, if there are a lot of foreign keys, I think it is not a very good idea to add several key columns to each table just to make the link.
I'm not sure there's enough information for us to make your call for you. Here are a few observations that might be helpful though.
Is the primary key a clustered index? Is the table referenced by other tables through a foreign key? If yes, then you may benefit from a single-column key, because that key will appear in those other tables. This is how you would save space.
If the table is not referenced by other tables, then you would be using extra space in your table without much additional benefit. And, if this table only contains the two columns now, then you would increase the table size by 50%.
If you use an extra column for the primary key, do not forget your natural key (the two-column key). Create a unique constraint on the composite key. You still want to maintain the integrity of the real data.
The decision should always be based on requirements and the intended meaning of the data. A table with only a single attribute key clearly enforces a different kind of constraint and implies that your table has a very different meaning to the same table with a multi attribute key. On the other hand adding an additional unique column would also be a waste of resources and add meaningless complexity if you don't actually need to use it anywhere.
One caveat to the auto-incrementing column is that it can give a false impression of uniqueness. Sure, your identity column is always unique, but that's just a meaningless value you've attached to the table. Unless you also have a unique constraint attached to the set of columns that represent the actual semantic primary key of the table, you have no guarantee of meaningful uniqueness.
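A small sketch of that caveat, again using Python's sqlite3 module as a stand-in database. The two enrollment tables and their columns are hypothetical. Both have an auto-assigned surrogate id, but only the second also declares a unique constraint on the natural two-column key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key only: every row gets a unique id, but nothing protects the real data.
conn.execute("""
    CREATE TABLE enrollment_loose (
        id         INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL
    )
""")

# Surrogate key plus a unique constraint on the natural (two-column) key.
conn.execute("""
    CREATE TABLE enrollment_strict (
        id         INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        UNIQUE (student_id, course_id)
    )
""")

for _ in range(2):
    conn.execute("INSERT INTO enrollment_loose (student_id, course_id) VALUES (1, 10)")
loose_count = conn.execute("SELECT COUNT(*) FROM enrollment_loose").fetchone()[0]

conn.execute("INSERT INTO enrollment_strict (student_id, course_id) VALUES (1, 10)")
try:
    conn.execute("INSERT INTO enrollment_strict (student_id, course_id) VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The database itself refuses the second copy of the same natural key.
    duplicate_rejected = True

print(loose_count, duplicate_rejected)
```

The loose table happily stores the same enrollment twice under two different ids; the strict one rejects the duplicate.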
We have a database with 500+ tables, in which almost all the tables have a clustered PK that is of datatype guid (uniqueidentifier).
We are in the process of testing a switch from "normal" "random" guids generated through .NET's Guid.NewGuid() method to sequential guids generated through the NHibernate guid.comb algorithm. This seems to be working well, but what about clients that already have millions of rows with "random" primary key values?
Will they benefit from the fact that new ids generated from now on will be sequential?
Could/should anything be done to their existing data?
Thanks in advance for any pointers on this.
You could do this, but I'm not sure you would want to. I don't see any benefit in using sequential guids; in fact, using guids as a primary key is not recommended unless there are distributed/replication reasons involved. Are you using a clustered index?
Having said that, if you go ahead, I recommend loading a table with values from your algorithm first.
You are going to have hassles with foreign keys. You will need to associate the old and new guids in the aforementioned table, drop the foreign keys, perform a transactional update, then reapply the foreign keys.
I don't think it is worth the hassle unless you are moving away from guids altogether to, say, an integer-based system.
It depends whether the tables are clustered on the primary index or on another index. For instance, if you are creating large amounts of new records in a table with a GUID PK and a creation date, it usually makes sense to cluster by the creation date in order to optimize the insert operation.
On the other hand, depending on the queries done, a cluster on the GUID may be better, in which case using sequential GUIDs can help with the insert performance. I'd say that it isn't possible to give a final answer to your question without in-depth knowledge of the usage.
I'm facing a similar issue. I think it would be possible to update existing data by writing an application that replaces your existing keys using the NHibernate guid.comb algorithm. To propagate the new keys to related foreign key tables, maybe it would be possible to temporarily cascade updates? Doing this through .NET code would be slower than an SQL script. Another option might be to duplicate the guid.comb logic in SQL, but I'm not sure if this is possible.
If you choose to retain the existing data, using the guid.comb algorithm should have some performance improvement, there will still be page splitting when inserts occur but because new guids are sequential instead of totally random this will be at least somewhat reduced. Another option to consider would be to remove the clustered index on your GUID primary key, although I'm not sure how much existing query performance will be impacted.
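For readers who have not seen a COMB guid, the idea is roughly "random bytes plus a timestamp placed where the database sorts first". Below is a minimal Python sketch of that idea. It is not NHibernate's exact byte layout; the comb_guid function and its now_ms parameter are made up for illustration, and it assumes (as is commonly described for SQL Server's uniqueidentifier ordering) that the last six bytes are the most significant for sorting.

```python
import os
import time
import uuid


def comb_guid(now_ms=None):
    """COMB-style value: 10 random bytes followed by a 6-byte millisecond timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    random_part = os.urandom(10)
    # Keep the low 48 bits of the timestamp, big-endian so byte order follows time order.
    time_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


earlier = comb_guid(now_ms=1_000)
later = comb_guid(now_ms=2_000)

# The trailing six bytes increase with time, so later values sort after earlier
# ones wherever those bytes are compared first.
print(earlier, later)
print(earlier.bytes[10:] < later.bytes[10:])
```

Values created within the same millisecond still differ in their random part, so they stay unique but fall in arbitrary order relative to each other.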
For SQL Server, is it better to use a uniqueidentifier (GUID) or a bigint for an identity column?
That depends on what you're doing:
If speed is the primary concern then a plain old int is probably big enough.
If you really will have more than 2 billion (with a B ;) ) records, then use bigint or a sequential guid.
If you need to be able to easily synchronize with records created remotely, then Guid is really great.
Update
Some additional (less-obvious) notes on Guids:
They can be hard on indexes, and that cuts to the core of database performance
You can use sequential guids to get back some of the indexing performance, but give up some of the randomness used in point two.
Guids can be hard to debug by hand (where id='xxx-xxx-xxxxx'), but you get some of that back via sequential guids as well (where id='xxx-xxx' + '123').
For the same reason, Guids can make ID-based security attacks more difficult- but not impossible. (You can't just type 'http://example.com?userid=xxxx' and expect to get a result for someone else's account).
In general I'd recommend a BIGINT over a GUID (as guids are big and slow), but the question is, do you even need that? (I.e. are you doing replication?)
If you're expecting less than 2 billion rows, the traditional INT will be fine.
If you are doing replication, or you have sales people who run disconnected databases that need to merge, use a GUID. Otherwise I'd go for an int or bigint. They are far easier to deal with in the long run.
Depends on what you need. DB performance would gain from an integer, while GUIDs are useful for replication and for not having to hear back from the DB what identity has been created, i.e. code can create the GUID identity before inserting the row.
If you're planning on using merge replication then a ROWGUIDCOL is beneficial to performance (see here for info). Otherwise we need more info about what your definition of 'better' is; better for what?
Unless you have a real need for a GUID, such as being able to generate keys anywhere and not just on the server, then I would stick with using INTEGER-based keys. GUIDs are expensive to create and make it harder to actually look at the data. Plus, have you ever tried to type a GUID in an SQL query? It's painful!
There can be a few more reasons to use a GUID.
If the primary key is of a numeric type (int, bigint or any other), then you either need to make it an identity column or check the last saved value in the table.
In that case, if the record in the foreign table is saved in the same transaction, it is difficult to get the last identity value of the primary key. If, for example, IDENT_CURRENT is used, that again affects performance while saving the record in the foreign key table.
So when saving records within a transaction, it is convenient to first generate a Guid for the primary key, and then save the generated key (Guid) in the primary and foreign table(s).
It really depends on whether or not the information coming in is somehow sequential. For things such as users, a GUID might be better. But for sequential data, such as orders or other things that need to be easily sortable, a bigint may well be a better solution, as it will be indexed and provide fast sorting without the cost of another index.
It really depends whether you're expecting to have replication in the picture. Replication requires a row UUID, so if you're planning on doing that you may as well do it up front.
I'm with Andrew Rollings.
Now you could argue space efficiency. An int is what, 8 bytes max (4 for an int, 8 for a bigint in SQL Server)? A GUID is 16 bytes, so it is going to be much longer.
But I have two main reasons for preference: readability and access time. Numbers are easier for me than GUIDs (since I can always find the next/previous record easily).
As for access time, note that some DBs can start to have BIG problems with GUIDs. I know this is the case with MySQL (MySQL InnoDB Primary Key Choice: GUID/UUID vs Integer Insert Performance). This may not be much of a problem with SQL Server, but it's something to watch out for.
I'd say stick with INT or BIGINT. The only time I would think you'd want the GUID is when you are going to give them out and don't want people to be able to guess the IDs of other records for security reasons.
I've worked on a number of database systems in the past where moving entries between databases would have been made a lot easier if all the database keys had been GUID / UUID values. I've considered going down this path a few times, but there's always a bit of uncertainty, especially around performance and un-read-out-over-the-phone-able URLs.
Has anyone worked extensively with GUIDs in a database? What advantages would I get by going that way, and what are the likely pitfalls?
Advantages:
Can generate them offline.
Makes replication trivial (as opposed to int's, which makes it REALLY hard)
ORM's usually like them
Unique across applications. So We can use the PK's from our CMS (guid) in our app (also guid) and know we are NEVER going to get a clash.
Disadvantages:
Larger space use, but space is cheap(er)
Can't order by ID to get the insert order.
Can look ugly in a URL, but really, WTF are you doing putting a REAL DB key in a URL!? (This point disputed in comments below)
Harder to do manual debugging, but not that hard.
Personally, I use them for most PK's in any system of a decent size, but I got "trained" on a system which was replicated all over the place, so we HAD to have them. YMMV.
I think the duplicate data thing is rubbish - you can get duplicate data however you do it. Surrogate keys are usually frowned upon where ever I've been working. We DO use the WordPress-like system though:
unique ID for the row (GUID/whatever). Never visible to the user.
public ID is generated ONCE from some field (e.g. the title - make it the-title-of-the-article)
UPDATE:
So this one gets +1'ed a lot, and I thought I should point out a big downside of GUID PK's: Clustered Indexes.
If you have a lot of records, and a clustered index on a GUID, your insert performance will SUCK, as you get inserts in random places in the list of items (that's the point), not at the end (which is quick).
So if you need insert performance, maybe use a auto-inc INT, and generate a GUID if you want to share it with someone else (e.g., showing it to a user in a URL).
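A sketch of that last suggestion, using Python's sqlite3 module as a stand-in for whatever database you run. The article table, its columns, and the add_article helper are hypothetical. The auto-incrementing integer stays the internal key, and a separately generated GUID is the only identifier ever shown to the outside world.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Hypothetical table: integer key for storage and joins, GUID for public use.
conn.execute("""
    CREATE TABLE article (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        public_id TEXT NOT NULL UNIQUE,
        title     TEXT NOT NULL
    )
""")


def add_article(title):
    # The GUID is generated by the application; the database assigns the integer id.
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO article (public_id, title) VALUES (?, ?)",
        (public_id, title),
    )
    return public_id


first = add_article("First post")
second = add_article("Second post")

# Internally, rows are still ordered and joined by the cheap integer key.
rows = conn.execute("SELECT id, title FROM article ORDER BY id").fetchall()

# Externally (for example in a URL), only the GUID is used to find a row.
found = conn.execute(
    "SELECT title FROM article WHERE public_id = ?", (second,)
).fetchone()
print(rows, found)
```

The unique constraint on public_id does mean one extra index to maintain, so this trades a little insert cost for keeping the clustered key narrow and sequential.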
Why doesn't anyone mention performance? When you have multiple joins, all based on these nasty GUIDs the performance will go through the floor, been there :(
@Matt Sheppard:
Say you have a table of customers. Surely you don't want a customer to exist in the table more than once, or lots of confusion will happen throughout your sales and logistics departments (especially if the multiple rows about the customer contain different information).
So you have a customer identifier which uniquely identifies the customer and you make sure that the identifier is known by the customer (in invoices), so that the customer and the customer service people have a common reference in case they need to communicate. To guarantee no duplicated customer records, you add a uniqueness-constraint to the table, either through a primary key on the customer identifier or via a NOT NULL + UNIQUE constraint on the customer identifier column.
Next, for some reason (which I can't think of), you are asked to add a GUID column to the customer table and make that the primary key. If the customer identifier column is now left without a uniqueness guarantee, you are asking for future trouble throughout the organization, because the GUIDs will always be unique and so the primary key will no longer stop the same customer from being entered twice.
Some "architect" might tell you that "oh, but we handle the real customer uniqueness constraint in our app tier!". Right. Fashion in general-purpose programming languages and (especially) middle-tier frameworks changes all the time, and those will generally never outlive your database. And there is a very good chance that you will at some point need to access the database without going through the present application. == Trouble. (But fortunately, you and the "architect" are long gone, so you will not be there to clean up the mess.) In other words: Do maintain obvious constraints in the database (and in other tiers, as well, if you have the time).
In other words: There may be good reasons to add GUID columns to tables, but please don't fall for the temptation to make that lower your ambitions for consistency within the real (==non-GUID) information.
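To make the point concrete, here is a small sketch (SQLite via Python's sqlite3 module, with invented table and column names) of a customer table that has a GUID primary key and still keeps the real uniqueness rule in the database:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        guid            TEXT PRIMARY KEY,      -- surrogate key, always unique
        customer_number TEXT NOT NULL UNIQUE,  -- the real business identifier
        name            TEXT NOT NULL
    )
    """
)


def insert_customer(customer_number: str, name: str) -> bool:
    """Return True if the row was stored, False if the database rejected it."""
    try:
        conn.execute(
            "INSERT INTO customer VALUES (?, ?, ?)",
            (str(uuid.uuid4()), customer_number, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_customer("C-1001", "Acme Ltd"))      # True
print(insert_customer("C-1001", "ACME Limited"))  # False
```

Without the UNIQUE constraint the second insert would succeed, because the fresh GUID satisfies the primary key on its own.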
The main advantages are that you can create unique id's without connecting to the database. And id's are globally unique so you can easily combine data from different databases. These seem like small advantages but have saved me a lot of work in the past.
The main disadvantages are a bit more storage needed (not a problem on modern systems) and the id's are not really human readable. This can be a problem when debugging.
There are some performance problems like index fragmentation. But those are easily solvable (COMB GUIDs by Jimmy Nilsson: http://www.informit.com/articles/article.aspx?p=25862 )
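Roughly, the COMB idea is to keep part of the GUID random and fill the rest with the current time, so that values generated later sort later. The sketch below is a simplified variant and not the code from the article: it assumes a Unix millisecond timestamp in the last 6 bytes, and it assumes the database orders GUIDs by those trailing bytes, which depends on the engine. The result is also not a standard version 4 UUID.

```python
import os
import time
import uuid
from typing import Optional


def comb_guid(now_ms: Optional[int] = None) -> uuid.UUID:
    """Return a GUID made of 10 random bytes followed by a 6-byte timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000  # milliseconds since the Unix epoch
    random_part = os.urandom(10)
    time_part = now_ms.to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


a = comb_guid(now_ms=1_000)
b = comb_guid(now_ms=2_000)
# The trailing 6 bytes grow with time even though the rest is random.
print(a.bytes[10:] < b.bytes[10:])  # True
```

Two values generated in the same millisecond share the same tail, so ordering within a millisecond is still random.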
Edit: merged my two answers to this question.
@Matt Sheppard: I think he means that you can duplicate rows with different GUIDs as primary keys. This is an issue with any kind of surrogate key, not just GUIDs. And like he said, it is easily solved by adding meaningful unique constraints to non-key columns. The alternative is to use a natural key, and those have real problems.
GUIDs may cause you a lot of trouble in the future if they are used as "uniqifiers", letting duplicated data get into your tables. If you want to use GUIDs, please consider still maintaining UNIQUE-constraints on other column(s).
One other small issue to consider with using GUIDs as primary keys is if you are also using that column as a clustered index (a relatively common practice). You are going to take a hit on insert because a GUID is not sequential in any way, so there will be page splits, etc. when you insert. Just something to consider if the system is going to have high IO...
primary-keys-ids-versus-guids
The Cost of GUIDs as Primary Keys (SQL Server 2000)
Myths, GUID vs. Autoincrement (MySQL 5)
This is really what you want.
UUID Pros
Unique across every table, every database, every server
Allows easy merging of records from different databases
Allows easy distribution of databases across multiple servers
You can generate IDs anywhere, instead of having to roundtrip to the database
Most replication scenarios require GUID columns anyway
GUID Cons
It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes
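The size figure in the cons list is easy to check. This little sketch only compares raw value sizes; how much space a given database actually uses per row and per index entry is a separate question.

```python
import struct
import uuid

guid_bytes = len(uuid.uuid4().bytes)    # raw size of a GUID
int_bytes = struct.calcsize("<i")       # raw size of a 32-bit integer
text_chars = len(str(uuid.uuid4()))     # size if the GUID is stored as text

print(guid_bytes, int_bytes, guid_bytes // int_bytes)  # 16 4 4
print(text_chars)  # 36
```

Storing the GUID as a string instead of a native 16-byte type makes the gap noticeably bigger.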
There is one thing that is not really addressed, namely that using random (UUIDv4) IDs as primary keys will harm the performance of the primary key index. It will happen whether or not your table is clustered around the key.
RDBMSs usually ensure the uniqueness of the primary keys, and serve the lookups by key, with a structure called a B-tree, which is a search tree with a large branching factor (a binary search tree has a branching factor of 2). Now, a sequential integer ID would cause the inserts to occur on just one side of the tree, leaving most of the leaf nodes untouched. Adding random UUIDs will cause the insertions to split leaf nodes all over the index.
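The difference in insert pattern can be seen without a real B-tree. The sketch below uses a plain sorted list as a stand-in for the ordered leaf level (my simplification, it does not model pages or splits) and counts how many inserts land at the very end:

```python
import bisect
import random
import uuid


def tail_insert_fraction(keys):
    """Insert keys into a sorted list; return the share that landed at the end."""
    ordered = []
    at_tail = 0
    for key in keys:
        position = bisect.bisect_right(ordered, key)
        if position == len(ordered):
            at_tail += 1
        ordered.insert(position, key)
    return at_tail / len(ordered)


rng = random.Random(42)  # fixed seed so the run is repeatable
sequential_keys = list(range(1, 1001))
random_keys = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(1000)]

print(tail_insert_fraction(sequential_keys))
print(tail_insert_fraction(random_keys))
```

Every sequential key goes to the end, so the first figure is 1.0. For the random keys only a small handful of inserts happen to be larger than everything before them, and the rest land somewhere in the middle, which is where the scattered leaf splits come from.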
Likewise if the data stored is mostly temporal, it is often the case that the most recent data needs to be accessed and joined against the most. With random UUIDs the patterns will not benefit from this, and will hit more index rows, thereby needing more of the index pages in memory. With sequential IDs if the most-recent data is needed the most, the hot index pages would require less RAM.
Advantages:
UUID values are unique across tables and databases. That's why rows can be merged between two databases or across distributed databases.
A UUID is safer to pass through a URL than an integer.
If you pass a UUID through a URL, attackers can't guess the next id. But if you pass an integer such as 10, attackers can guess that the next id is 11, then 12, etc.
UUIDs can be generated offline.
One thing not mentioned so far: UUIDs make it much harder to profile data
For web apps at least, it's common to access a resource with the id in the url, like stackoverflow.com/questions/45399. If the id is an integer, this both
provides information about the number of questions (ie September 5th, 2008, the 45,399th question was asked)
provides a leverage point to iterate through questions (what happens when I increment that by 1? I open the next asked question)
From the first point, I can combine the timestamp from the question and the number to profile how frequently questions are asked and how that changes over time. This matters less on a site like Stack Overflow, with publicly available information, but, depending on context, this may expose sensitive information.
For example, I am a company that offers customers a permissions-gated portal. The address is portal.com/profile/{customerId}. If the id is an integer, you could profile the number of customers regardless of being able to see their information by querying for lastKnownCustomerCount + 1 regularly, and checking if the result is 404 - NotFound (customer does not exist) or 403 - Forbidden (customer does exist, but you do not have access to view).
The non-sequential nature of UUIDs mitigates these issues. This isn't guaranteed to prevent profiling, but it's a start.
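Here is a toy version of the portal example, just to show how little the probing takes. Everything in it is hypothetical: the portal is an in-memory set, the customer count of 37 is invented, and the status codes follow the 404/403 behaviour described above.

```python
# Hypothetical portal: customer IDs 1..37 exist, but the caller may view none of them.
EXISTING_INT_IDS = set(range(1, 38))


def portal_status(customer_id: int) -> int:
    """Return the HTTP status an unauthorised caller would see."""
    return 403 if customer_id in EXISTING_INT_IDS else 404


def count_customers_by_probing(start: int = 1) -> int:
    """Walk IDs upward until the first 404 and count the 403 responses."""
    count = 0
    customer_id = start
    while portal_status(customer_id) == 403:
        count += 1
        customer_id += 1
    return count


print(count_customers_by_probing())  # 37
```

With random UUIDs as customer ids there is no "next" id to compute, so this loop has nothing to walk. Returning the same status for "does not exist" and "not allowed" would also close this particular leak, whatever the id type.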