We have SQL Server database tables that are running out of IDs soon (the primary key is a 32-bit integer, and the maximum ID is already about 2 billion). What is the best way to fix this issue?
Multiple attributes (3) in a table have sequential IDs, and the range is expected to be exhausted soon.
Altering the ID type to a 64-bit integer is an option, but that would bring the database down for too long, because the table has billions of rows.
Related
Two related tables in my production database are close to exhausting all 32 bits' worth of possible primary key values. I have to migrate this key to bigint. Here, and in other topics, I have found suggestions similar to this:
Drop FKs
Drop PK
Alter column Id
Alter referencing columns
Restore PK
Restore FKs
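In T-SQL, that sequence looks roughly like the sketch below. All table, column, and constraint names here (Parent, Child, PK_Parent, FK_Child_Parent) are hypothetical stand-ins, not names from my schema:

-- 1-2. Drop the foreign key and the primary key
ALTER TABLE dbo.Child DROP CONSTRAINT FK_Child_Parent;
ALTER TABLE dbo.Parent DROP CONSTRAINT PK_Parent;

-- 3. Widen the key column
ALTER TABLE dbo.Parent ALTER COLUMN Id BIGINT NOT NULL;

-- 4. Widen the referencing column to match
ALTER TABLE dbo.Child ALTER COLUMN ParentId BIGINT NOT NULL;

-- 5-6. Restore the primary key and the foreign key
ALTER TABLE dbo.Parent ADD CONSTRAINT PK_Parent PRIMARY KEY CLUSTERED (Id);
ALTER TABLE dbo.Child ADD CONSTRAINT FK_Child_Parent
    FOREIGN KEY (ParentId) REFERENCES dbo.Parent (Id);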
However, when I tried to perform ALTER COLUMN Id BIGINT on my local database with 1.5 billion rows (less than half the rows in the corresponding table on the production DB), I faced a huge amount of data written to the log (about 800 GB, when the table itself was about 50 GB after filling it with test data) after waiting 2.5 hours for the operation to complete.
So I cancelled the operation, realizing that this migration approach is inapplicable to the production DB.
It seems that migrating keys from int32 to int64 is a relatively common problem. So my question is: how do other developers and database engineers usually deal with it in production environments?
P.S. It may be important that each row in these tables does not contain very much data.
The first problematic table consists of 6 ints (mostly nullable), 1 tinyint, and 1 datetime. The primary key is an int.
The second table is just an int column referencing the first table's Id, plus 2 varchar(255) columns. The primary key is a composite of the int and varchar columns.
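In DDL terms, the two tables are shaped roughly like this (the names here are placeholders, not my real schema):

CREATE TABLE dbo.FirstTable (
    Id INT NOT NULL PRIMARY KEY,     -- the key that is running out
    A INT NULL, B INT NULL, C INT NULL, D INT NULL, E INT NULL,
    Flag TINYINT NOT NULL,
    CreatedAt DATETIME NOT NULL
);

CREATE TABLE dbo.SecondTable (
    FirstTableId INT NOT NULL REFERENCES dbo.FirstTable (Id),
    Name VARCHAR(255) NOT NULL,
    Value VARCHAR(255) NULL,
    PRIMARY KEY (FirstTableId, Name)
);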
I was trying to create an ID column in SQL Server (from VB.NET) that would provide a sequence of numbers for every new row created in the database. So I used the following technique to create the ID column.
SELECT * FROM T_Users

ALTER TABLE T_Users
ADD User_ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY
Then I registered a few usernames in the database and it worked just fine. For example, the first six rows would be 1, 2, 3, 4, 5, 6. Then I registered 4 more users the NEXT day, but this time the ID numbers jumped from 6 to a very large number, such as: 1, 2, 3, 4, 5, 6, 1002, 1003, 1004, 1005. Then two days later, I registered two more users and the new rows read 3002, 3004. So my question is: why is it skipping such a large range every other day I register users? Is the technique I used to create the sequence wrong? If it is wrong, can anyone please tell me how to do it right?
Now, as I was getting frustrated with the technique above, I alternatively tried sequentially generated GUID values. The sequence of GUID values was generated fine. However, the only downside is that it generates very long values (4 times the size of an INT). My question here is: does using a GUID have any significant advantage over an INT?
Upside of GUIDs:
GUIDs are good if you ever want offline clients to be able to create new records, as you will never get a primary key clash when the new records are synchronised back to the main database.
Downsides of GUIDs:
GUIDs as primary keys can affect the performance of the DB, because for a clustered primary key the DB will want to keep the rows in order of the key values. But this means a lot of inserts between existing records, because random GUIDs land all over the index.
An IDENTITY column doesn't suffer from this, because the next record is guaranteed to have the highest value, so each new row is simply tacked onto the end every time. No reshuffle needs to happen.
There is a compromise, which is to generate a pseudo-GUID: you would still expect a key clash only every 70 years or so, but it helps the indexing immensely.
The other downsides are that a) they take up more storage space, and b) they are a real pain to write SQL against: it is much easier to type UPDATE TABLE SET FIELD = 'value' WHERE KEY = 50003 than UPDATE TABLE SET FIELD = 'value' WHERE KEY = '{F820094C-A2A2-49cb-BDA7-549543BB4B2C}'.
Your declaration of the IDENTITY column looks fine to me. The gaps in your key values are probably due to failed attempts to add a row. The IDENTITY value will be incremented but the row never gets committed. Don't let it bother you, it happens in practically every table.
EDIT:
This question covers what I meant by pseudo-GUID: INSERTs with sequential GUID key on clustered index not significantly faster
In SQL Server 2005+ you can use NEWSEQUENTIALID() to get a GUID that is supposed to be greater than any previously generated on that machine. See here for more info: http://technet.microsoft.com/en-us/library/ms189786%28v=sql.90%29.aspx
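A minimal sketch of how that is used (the table and column names here are invented); note that NEWSEQUENTIALID() is only allowed in a DEFAULT constraint on a uniqueidentifier column:

CREATE TABLE dbo.T_Documents (
    Doc_ID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_T_Documents_ID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_T_Documents PRIMARY KEY CLUSTERED,
    Title VARCHAR(255) NOT NULL
);

-- Each insert gets a GUID greater than the previous one generated on
-- this machine (since the last Windows restart), so new rows append
-- to the end of the clustered index instead of splitting pages.
INSERT INTO dbo.T_Documents (Title) VALUES ('first');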
Is the technique I used to create the sequence wrong?
No. If anything, your Google skills are non-existent. A short search for "SQL Server identity skipping values" will give you a TON of results, including:
SQL Server 2012 column identity increment jumping from 6 to 1000+ on 7th entry
and the canonical:
Why are there gaps in my IDENTITY column values?
You basically assume, wrongly, that SQL Server will not optimize its access for performance. Identity numbers are markers, nothing else; please make no assumption that they are gap-free.
In particular: SQL Server preallocates numbers in blocks of 1000, and if you restart the server (like on your workstation), the remainder of the current block is lost.
http://www.sqlserver-training.com/sequence-breaks-gap-in-numbers-after-restart-sql-server-gap-between-numbers-after-restarting-server/
If you use a manual sequence instead (new in SQL Server 2012), you can define the cache size for this preallocation and set it to 1 - at the cost of slightly lower performance when you do a lot of inserts.
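A sketch of that (SQL Server 2012+; the sequence and table names are made up for illustration):

CREATE SEQUENCE dbo.UserIdSeq
    AS INT
    START WITH 1
    INCREMENT BY 1
    NO CACHE;   -- no preallocated block to lose on a restart;
                -- rollbacks can still leave gaps

CREATE TABLE dbo.T_Accounts (
    Account_ID INT NOT NULL
        CONSTRAINT DF_T_Accounts_ID DEFAULT (NEXT VALUE FOR dbo.UserIdSeq)
        CONSTRAINT PK_T_Accounts PRIMARY KEY,
    AccountName VARCHAR(255) NOT NULL
);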
My question here is does using GUID have any significant advantage over INT?
Yes. You can have a lot more rows with GUIDs than with ints. For example, int32 is limited to about 2 billion values. For some of us that is too low (I have tables in the 10 billion row range), and even a 64-bit bigint is limited. In a truly zettabyte-scale database, you have to use sequential, self-generated GUIDs.
Most normal humans never see a difference, as few of us really deal with that many rows. And the larger size makes a lot of things slower (larger key size = more space in indices = larger indices = more memory/IO for the same operation). Plus, even your sequential ID will jump.
Why not just adjust your expectations to reality - identity is not meant to be gap-free - or use a sequence with a cache of 1.
I know that an SQLite database automatically generates a unique ID (autoincrement) for every record inserted.
Does anyone know if there is any possibility of running out of unique IDs in an SQLite3 database while executing a REPLACE query?
I mean that every piece of data in the database has its own type. For example, the system unique ID is something like an int. What would the database do with the next record if it generates a unique ID equal to MAX_INT?
The maximum possible ID is 9223372036854775807 (2^63 - 1). In any case, if it can't find a new auto-increment primary key, it will return SQLITE_FULL.
There is one other important point. If you use INTEGER PRIMARY KEY, it can reuse keys that were in the table earlier, but have been deleted. If you use INTEGER PRIMARY KEY AUTOINCREMENT, it can never reuse an auto-increment key.
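A quick sketch of the difference (the table names are invented):

-- Plain rowid alias: after deleting the highest rows, their ids can be
-- handed out again, because SQLite picks max(rowid) + 1.
CREATE TABLE plain_ids (id INTEGER PRIMARY KEY, name TEXT);

-- AUTOINCREMENT: ids are strictly increasing and never reused, tracked
-- in the internal sqlite_sequence table.
CREATE TABLE strict_ids (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);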
http://www.sqlite.org/autoinc.html
"If the largest ROWID is equal to the largest possible integer (9223372036854775807) then the database engine starts picking positive candidate ROWIDs at random until it finds one that is not previously used. If no unused ROWID can be found after a reasonable number of attempts, the insert operation fails with an SQLITE_FULL error. If no negative ROWID values are inserted explicitly, then automatically generated ROWID values will always be greater than zero."
Given that SQLite IDs are 64-bit, you could insert a new record every 100 milliseconds for roughly 29 billion years before running out (2^63 ids / 10 inserts per second / ~31.5 million seconds per year ≈ 2.9 × 10^10 years).
Just a generic SQL Server 2008 question: I have a table that has around 15 columns, of string, int, and bool types only. I am not storing any binary data, and I have an auto-generated PK column "ID" with IDENTITY enabled to generate a unique ID on every entry. My question is: how many rows can a table like this have? Is there any row limitation on a SQL Server table?
There are limits defined in "Maximum Capacity Specifications for SQL Server" on MSDN.
Some relevant ones:
1024 columns per standard table
Rows per table: Limited by available storage
Bytes per row: 8060 (except for row overflow data)
Basically, don't worry...
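If you want to see how big a table currently is, sp_spaceused gives the row count and the data/index sizes (the table name below is a placeholder):

EXEC sp_spaceused 'dbo.YourTable';
-- Returns: rows, reserved, data, index_size, unused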
Using SQL Server 2005 and 2008.
I've got a potentially very large table (hundreds of millions of rows) consisting of the following columns:
CREATE TABLE dbo.MyTable (   -- table name omitted in the original post; MyTable is a placeholder
    date  SMALLDATETIME,
    id    BIGINT,
    value FLOAT
)
which is being partitioned on the date column in daily partitions. The question then is: should the primary key be on (date, id) or (value, id)?
I can imagine that SQL Server is smart enough to know that it's already partitioning on date, and therefore, if I'm always querying for whole chunks of days, I can have it second in the primary key. Or I can imagine that SQL Server needs that column first in the primary key to get the benefit of partitioning.
Can anyone lend some insight into which way the table should be keyed?
As is standard practice, the primary key should be the candidate key that uniquely identifies a given row.
What you wish to do is known as aligned partitioning, which ensures that the primary key is also split by your partitioning key and stored with the appropriate table data. This is the default behaviour in SQL Server.
For full details, consult the reference Partitioned Tables and Indexes in SQL Server 2005
There is no specific need for the partition key to be the first field of any index on the partitioned table; as long as it appears within the index, the index can be aligned to the partition scheme.
With that in mind, you should apply the normal rules for index field order, supporting the most queries / best selectivity of the values.
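To make the alignment concrete, here is a minimal sketch; the partition function, scheme, table name, and boundary dates are all invented for illustration:

-- Daily partitions on the date column
CREATE PARTITION FUNCTION pfDaily (SMALLDATETIME)
    AS RANGE RIGHT FOR VALUES ('2010-01-01', '2010-01-02', '2010-01-03');

CREATE PARTITION SCHEME psDaily
    AS PARTITION pfDaily ALL TO ([PRIMARY]);

CREATE TABLE dbo.Measurements (
    date  SMALLDATETIME NOT NULL,
    id    BIGINT        NOT NULL,
    value FLOAT         NULL,
    CONSTRAINT PK_Measurements PRIMARY KEY CLUSTERED (date, id)
) ON psDaily (date);

-- The clustered PK contains the partitioning column (date), so the
-- index is aligned; (id, date) would align just as well, because the
-- key only needs to include date, not lead with it.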