I know that sqlite database automatically generates a unique id (autoincrement) for every record inserted.
Does anyone know if there is any possibility of running out of system unique IDs in sqlite3 database, while executing the replace query?
I mean that every piece of data in database has its own type. For example, system unique id is something like int. What would the database do with the next record, if it generates unique id equal to MAX_INT?
Thanks.
The maximum possible ID is 9223372036854775807 (2^63 - 1). In any case, if it can't find a new auto-increment primary key, it will return SQLITE_FULL.
There is one other important point. If you use INTEGER PRIMARY KEY, it can reuse keys that were in the table earlier, but have been deleted. If you use INTEGER PRIMARY KEY AUTOINCREMENT, it can never reuse an auto-increment key.
http://www.sqlite.org/autoinc.html
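To make the reuse point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names plain and with_autoinc and the helper insert_delete_insert are made up for illustration; the sketch inserts three rows, deletes the newest one, and inserts again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plain (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE with_autoinc (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
""")

def insert_delete_insert(table):
    """Insert three rows, delete the newest one, insert again; return the new id."""
    for name in ("a", "b", "c"):
        conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    conn.execute(f"DELETE FROM {table} WHERE id = 3")
    cur = conn.execute(f"INSERT INTO {table} (name) VALUES (?)", ("d",))
    return cur.lastrowid

plain_id = insert_delete_insert("plain")           # id 3 is handed out again
autoinc_id = insert_delete_insert("with_autoinc")  # id 3 is skipped, 4 is used
print(plain_id, autoinc_id)
```

Only the highest key was deleted here, which is the simplest case where the two declarations visibly differ.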
"If the largest ROWID is equal to the largest possible integer (9223372036854775807) then the database engine starts picking positive candidate ROWIDs at random until it finds one that is not previously used. If no unused ROWID can be found after a reasonable number of attempts, the insert operation fails with an SQLITE_FULL error. If no negative ROWID values are inserted explicitly, then automatically generated ROWID values will always be greater than zero."
Given that sqlite ids are 64-bit you could insert a new record every 100 milliseconds for the next 546 years before running out (I made up those numbers; but you get the idea).
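You don't have to wait that long to see what happens at the limit. As a rough sketch (Python's sqlite3 module, a made-up table t without AUTOINCREMENT), you can insert the largest possible id by hand and then let SQLite choose the next one:

```python
import sqlite3

MAX_ROWID = 2**63 - 1  # 9223372036854775807

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")

# Occupy the largest possible rowid explicitly.
conn.execute("INSERT INTO t (id, name) VALUES (?, ?)", (MAX_ROWID, "last"))

# Let SQLite pick the id: it cannot use MAX_ROWID + 1, so it tries
# random positive candidates until it finds an unused one.
cur = conn.execute("INSERT INTO t (name) VALUES (?)", ("next",))
random_id = cur.lastrowid
print(random_id)
```

The exact value printed differs from run to run, since it is chosen at random.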
I have a RecyclerView list of items that uses an SQLite database to store user input data. I use the traditional _id column as INTEGER PRIMARY KEY AUTOINCREMENT. If I understand correctly, newly inserted rows in the database are added below existing rows and the new ROWID takes the largest existing ROWID and increments it by +1. Therefore, a cursor search for the latest insert will have to scan down the entire set of rows to reach the bottom of the database. For example, after 10 inserts, the cursor has to search down from 1, 2, 3,... until it gets to row 10.
To avoid a lengthy search of the entire set of ROWIDs, is there any way to have new inserts be added to the top of the database and not the bottom? That way a cursor search for the latest insert using moveToFirst() will be very fast since the cursor will stop at the first row it searches, the top of the database. The cursor would search 10, 9, 8,...3,2,1 and therefore the search would be very fast since it would stop at 10, the first row at the top of the database.
You are thinking too much about the database internals. Indexes are designed for this kind of optimisation.
Add a new numeric column that holds the ordering you want, and use ORDER BY on it in your selects. Do not forget to create an index on this column, and verify with EXPLAIN that your selects actually use the index.
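A minimal sketch of that advice, using Python's sqlite3 module. The table items, the column sort_key and the index name idx_items_sort_key are assumptions made up for the example, not names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (_id INTEGER PRIMARY KEY, title TEXT, sort_key INTEGER);
    CREATE INDEX idx_items_sort_key ON items (sort_key);
""")
conn.executemany(
    "INSERT INTO items (title, sort_key) VALUES (?, ?)",
    [(f"item {n}", n) for n in range(1, 11)],
)

# Newest entry first: the index lets SQLite read it without sorting the table.
query = "SELECT _id, sort_key FROM items ORDER BY sort_key DESC LIMIT 1"
latest = conn.execute(query).fetchone()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(latest, plan)
```

If the index name does not show up in the plan text, the query is not using it and is worth a second look.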
First, if you are concerned about overheads, then use the recommended INTEGER PRIMARY KEY as opposed to INTEGER PRIMARY KEY AUTOINCREMENT. Both will result in a unique id, but the latter has overheads, as per the documentation:
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and
disk I/O overhead and should be avoided if not strictly needed. It is
usually not needed.
SQLite Autoincrement
If I understand correctly, newly inserted rows in the database are
added below existing rows and the new ROWID takes the largest existing
ROWID and increments it by +1.
Generally, but not necessarily: there is no guarantee that the value will increment by 1.
AUTOINCREMENT uses a table called sqlite_sequence that holds one row per AUTOINCREMENT table, storing the table name along with the highest sequence number used so far. The next sequence number will be that value + 1 (usually), UNLESS the highest rowid is greater than the value in the sqlite_sequence table.
Without AUTOINCREMENT, the next sequence number is the highest rowid + 1 (usually).
AUTOINCREMENT guarantees a higher number. Without AUTOINCREMENT, a lower number can be used, but only once the next number would exceed 9223372036854775807. With AUTOINCREMENT, needing a number higher than this results in an SQLITE_FULL error.
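The interplay between sqlite_sequence and the highest rowid can be watched directly. This is only a sketch with Python's sqlite3 module; the table t and the helper current_seq are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

def current_seq():
    """Return the stored sequence value for table t, or None if no row exists yet."""
    row = conn.execute("SELECT seq FROM sqlite_sequence WHERE name = 't'").fetchone()
    return row[0] if row else None

first_id = conn.execute("INSERT INTO t (name) VALUES ('a')").lastrowid
seq_after_first = current_seq()

# An explicit rowid above the stored value pulls the sequence up with it.
conn.execute("INSERT INTO t (id, name) VALUES (100, 'b')")
seq_after_explicit = current_seq()

next_id = conn.execute("INSERT INTO t (name) VALUES ('c')").lastrowid
print(first_id, seq_after_first, seq_after_explicit, next_id)
```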
Again, with regard to rowids and searching:
The data for rowid tables is stored as a B-Tree structure containing
one entry for each table row, using the rowid value as the key. This
means that retrieving or sorting records by rowid is fast. Searching
for a record with a specific rowid, or for all records with rowids
within a specified range is around twice as fast as a similar search
made by specifying any other PRIMARY KEY or indexed value.
ROWIDs and the INTEGER PRIMARY KEY
To avoid a lengthy search of the entire set of ROWIDs, is there any
way to have new inserts be added to the top of the database and not
the bottom?
Yes there is, simply specify the value for the rowid or typically the alias when inserting (but beware using an already used value and good luck with managing the numbering). However, I doubt that doing so would result in a faster search. Tables have a rowid by default largely due to the rowid being optimised for searching by rowid.
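As a small illustration (Python's sqlite3 module, with a made-up items table), specifying the rowid alias yourself looks like this, and fetching either end of the key range is a single ordered lookup in both directions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [(f"item {n}",) for n in range(1, 11)],
)

# Supplying the rowid alias explicitly puts the row wherever the key sorts.
conn.execute("INSERT INTO items (_id, title) VALUES (?, ?)", (-1, "explicit key"))

# Reading from either end of the rowid B-tree needs no full scan in application code.
latest = conn.execute("SELECT _id, title FROM items ORDER BY _id DESC LIMIT 1").fetchone()
lowest = conn.execute("SELECT _id, title FROM items ORDER BY _id ASC LIMIT 1").fetchone()
print(latest, lowest)
```

So in practice ORDER BY _id DESC LIMIT 1 already gives the latest automatically numbered row without rearranging anything.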
I was trying to create an ID column in SQL server, VB.net that would provide a sequence of numbers for every new row created in a database. So I used the following technique to create the ID column.
select * from T_Users
ALTER TABLE T_Users
ADD User_ID INT NOT NULL IDENTITY(1,1) Primary Key
Then I registered a few usernames in the database and it worked just fine. For example, the first six rows would be 1,2,3,4,5,6. Then I registered 4 more users the NEXT day, but this time the ID numbers jumped from 6 to a very large number, such as: 1,2,3,4,5,6,1002,1003,1004,1005. Then two days later, I registered two more users and the new rows read 3002,3004. So my question is: why is it skipping such a large range of numbers every other day I register users? Is the technique I used to create the sequence wrong? If it is wrong, can anyone please tell me how to do it right?
As I was getting frustrated with the technique used above, I tried using sequentially generated GUID values instead. The sequence of GUID values was generated fine. The only downside is that it generates very long values (4 times the INT size). My question here is: does using GUID have any significant advantage over INT?
Regards,
Upside of GUIDs:
GUIDs are good if you ever want offline clients to be able to create new records, as you will never get a primary key clash when the new records are synchronised back to the main database.
Downside of GUIDs:
GUIDS as primary keys can have an effect on the performance of the DB, because for a clustered primary key, the DB will want to keep the rows in order of the key values. But this means a lot of inserts between existing records, because the GUIDs will be random.
Using IDENTITY column doesn't suffer from this because the next record is guaranteed to have the highest value and so the row is just tacked on the end every time. No re-shuffle needs to happen.
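A toy illustration of that difference, in plain Python rather than SQL Server: keep a sorted list as a stand-in for the clustered index and count how many new keys land somewhere before the current end. The helper count_mid_inserts is made up for this sketch, and Python's UUID ordering is not the same as SQL Server's uniqueidentifier ordering, so treat it as an analogy only.

```python
import bisect
import random
import uuid

def count_mid_inserts(keys):
    """Count keys that have to be placed before the current end of a sorted list."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos < len(ordered):
            mid_inserts += 1  # lands between existing keys
        ordered.insert(pos, key)
    return mid_inserts

rng = random.Random(42)  # fixed seed so the run is repeatable
random_guids = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(1000)]
identity_values = list(range(1, 1001))

guid_mid = count_mid_inserts(random_guids)
identity_mid = count_mid_inserts(identity_values)
print(guid_mid, identity_mid)
```

With random keys nearly every insert falls in the middle, while the increasing integers are always appended at the end.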
There is a compromise which is to generate a pseudo-GUID which means you would expect a key clash every 70 years or so, but helps the indexing immensely.
The other downsides are that a) they do take up more storage space, and b) are a real pain to write SQL against, i.e. much easier to type UPDATE TABLE SET FIELD = 'value' where KEY = 50003 than UPDATE TABLE SET FIELD = 'value' where KEY = '{F820094C-A2A2-49cb-BDA7-549543BB4B2C}'
Your declaration of the IDENTITY column looks fine to me. The gaps in your key values are probably due to failed attempts to add a row. The IDENTITY value will be incremented but the row never gets committed. Don't let it bother you, it happens in practically every table.
EDIT:
This question covers what I was meaning by pseudo-GUID. INSERTs with sequential GUID key on clustered index not significantly faster
In SQL Server 2005+ you can use NEWSEQUENTIALID() to get a random value that is supposed to be greater than the previous ones. See here for more info http://technet.microsoft.com/en-us/library/ms189786%28v=sql.90%29.aspx
Is the technique I used to create the sequence wrong?
No. If anything, your google skills are non-existent. A short look for "Sql server identity skipping values" will give you a TON of results, including:
SQL Server 2012 column identity increment jumping from 6 to 1000+ on 7th entry
and the canonical:
Why are there gaps in my IDENTITY column values?
You basically wrongly assume SQL Server will not optimize its access for performance. Identity numbers are markers, nothing else; please do not assume they have no gaps.
In particular: SQL Server preallocates numbers in blocks of 1000, and if you restart the server (like on your workstation) the remainder of the block is lost.
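To see why a restart produces a jump, here is a toy model in Python. It is NOT SQL Server's actual implementation; the class CachedIdentity and its fields are invented purely to illustrate "reserve a block, lose the unused part on restart", and the exact values a real server hands out after a restart can differ slightly from this model.

```python
class CachedIdentity:
    """Toy model: ids are reserved in blocks; a restart discards the unused part."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self.next_value = 1      # next id to hand out
        self.reserved_until = 0  # highest id already reserved persistently

    def next_id(self):
        # Reserve a new block whenever the current one is used up.
        if self.next_value > self.reserved_until:
            self.reserved_until += self.block_size
        value = self.next_value
        self.next_value += 1
        return value

    def restart(self):
        # After a restart only the reserved upper bound is known,
        # so the unused ids below it are never handed out.
        self.next_value = self.reserved_until + 1

gen = CachedIdentity()
before = [gen.next_id() for _ in range(6)]
gen.restart()
after = [gen.next_id() for _ in range(4)]
print(before, after)
```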
http://www.sqlserver-training.com/sequence-breaks-gap-in-numbers-after-restart-sql-server-gap-between-numbers-after-restarting-server/-
If you use a manual sequence instead (new in SQL Server 2012), you can define the cache size for this (pregeneration) and set it to 1, at the cost of slightly lower performance when you do a lot of inserts.
My question here is does using GUID have any significant advantage over INT?
Yes. You can have a lot more rows with GUIDs than with int. For example, int32 is limited to about 2 billion rows. For some of us that is too low (I have tables in the 10 billion range) and even a 64-bit bigint is limited. And for a truly zettabyte-scale database, you have to use a self-generated sequential GUID.
Any normal human does not see a difference as we all do not really deal with that many rows. And the larger size makes a lot of things slower (larger key size = larger space in indices = larger indices = more memory / io for the same operation). Plus even your sequential id will jump.
Why not just adjust your expectation to reality - identity is not meant to be without gaps - or use a sequence with cache 1.
How can I get last autoincrement value of specific table right after I open database? It's not last_insert_rowid() because there is no insertion transaction. In other words I want to know in advance which number autoincrement will choose when inserting new row for particular table.
It depends on how the autoincremented column has been defined.
If the column definition is INTEGER PRIMARY KEY AUTOINCREMENT, then SQLite will keep the largest ID in an internal table called sqlite_sequence.
If the column definition does NOT contain the keyword AUTOINCREMENT, SQLite will use its ‘regular’ routine to determine the new ID. From the documentation:
The usual algorithm is to give the newly created row a ROWID that is
one larger than the largest ROWID in the table prior to the insert. If
the table is initially empty, then a ROWID of 1 is used. If the
largest ROWID is equal to the largest possible integer
(9223372036854775807) then the database engine starts picking positive
candidate ROWIDs at random until it finds one that is not previously
used. If no unused ROWID can be found after a reasonable number of
attempts, the insert operation fails with an SQLITE_FULL error. If no
negative ROWID values are inserted explicitly, then automatically
generated ROWID values will always be greater than zero.
I remember reading that, for columns without AUTOINCREMENT, the only surefire way to determine the next ID is to VACUUM the database first; that will reset all ID counters to the largest existing ID for that table + 1. But I can’t find that quote anymore, so this may no longer be true.
That said, I agree with slash_rick_dot that fetching auto-incremented IDs beforehand is a bad idea, especially if there’s even a remote chance that another process might write to the database at the same time.
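If you still want to peek, a best-effort sketch in Python's sqlite3 module could look like the following. The helper peek_next_id and the tables a and p are made up for the example, the result is only a guess that any concurrent writer can invalidate, and the sketch ignores the edge case where the maximum id has already been reached.

```python
import sqlite3

def peek_next_id(conn, table, autoincrement):
    """Best-effort guess at the next id; another writer can invalidate it."""
    if autoincrement:
        # AUTOINCREMENT tables record their high-water mark in sqlite_sequence.
        row = conn.execute(
            "SELECT seq FROM sqlite_sequence WHERE name = ?", (table,)
        ).fetchone()
        last = row[0] if row else 0
    else:
        # Otherwise the usual algorithm starts from the largest existing rowid.
        last = conn.execute(f"SELECT MAX(rowid) FROM {table}").fetchone()[0] or 0
    return last + 1

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT);
    CREATE TABLE p (id INTEGER PRIMARY KEY, v TEXT);
""")
for table in ("a", "p"):
    conn.executemany(f"INSERT INTO {table} (v) VALUES (?)", [("x",), ("y",), ("z",)])
    conn.execute(f"DELETE FROM {table}")  # empty the table again

guess_a = peek_next_id(conn, "a", autoincrement=True)
guess_p = peek_next_id(conn, "p", autoincrement=False)
actual_a = conn.execute("INSERT INTO a (v) VALUES ('n')").lastrowid
actual_p = conn.execute("INSERT INTO p (v) VALUES ('n')").lastrowid
print(guess_a, actual_a, guess_p, actual_p)
```

Note how the two tables disagree after being emptied: the AUTOINCREMENT table carries on after its old maximum, the plain one starts over.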
Different databases implement auto-increment differently. But as far as I know, none of them will answer the question you are asking.
The auto increment feature is intended to create a unique ID for a newly added table row. If a row hasn't been inserted yet, then the feature hasn't produced the id.
And it makes sense... If you did get the next auto increment number, what would you do with it? Likely the intent is to assign it as the primary key of the not-yet-inserted new row. But between the time you got the id, and the time you used it to insert the row, the database could have used that id to insert a row for another process.
Your choices are this: manage the creation of ids yourself, or wait until rows are inserted before using their auto-created identifiers.
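The second choice is usually the simpler one. A minimal sketch with Python's sqlite3 module, where the orders and order_lines tables are made up for illustration: insert the parent row first, then read the id the database actually assigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (
        id INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES orders(id),
        item TEXT
    );
""")

with conn:  # one transaction for both inserts
    # Insert first, then ask which id was assigned to that row.
    order_id = conn.execute(
        "INSERT INTO orders (customer) VALUES (?)", ("alice",)
    ).lastrowid
    conn.execute(
        "INSERT INTO order_lines (order_id, item) VALUES (?, ?)",
        (order_id, "tire"),
    )

stored = conn.execute("SELECT order_id FROM order_lines").fetchone()[0]
print(order_id, stored)
```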
Auto-incrementing primary key is making big jumps as if huge numbers of rows are getting deleted and re-inserted. I'm positive they aren't getting deleted though. Nowhere in my code do I delete from the table!
I have a table with a bigint column as auto incrementing primary key and a varchar column that is indexed.
I noticed that the primary key values made huge jumps. For example..
ID Name
1 Foo
2 Bar
12586 Woo
12587 Hoo
987698 What
987698 Is Going On
The primary key is clustered. Could that be it?
If it keeps making these big jumps, it's going to overflow. What will happen then?
When you say "auto incrementing primary key," do you mean IDENTITY(1,1), or something else?
Over what time period are you seeing the increase from 1 to 987698? And why are you seeing that last value twice?
Have you run SQL Profiler to look at the activity on that table?
Are you using transactions? If so, are you experiencing rollbacks or exceptions/errors?
Check out the identity properties of your table. (Ident_current,Ident_seed,Ident_incr). Are you using transactions ?
Microsoft changed the traditional identity behaviour in SQL Server 2012. You can implement custom sequencing instead. Just follow this link; it should help you solve your problem.
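-- Create the Test schema
CREATE SCHEMA Test;
GO
-- Create a table to hold the orders (column types are an assumption)
CREATE TABLE Test.Orders (OrderID int PRIMARY KEY, Name varchar(20) NOT NULL, Qty int NOT NULL);
GO
-- Create a sequence
CREATE SEQUENCE Test.CountBy1
    START WITH 1
    INCREMENT BY 1;
GO
-- Insert one record, taking its key from the sequence
INSERT Test.Orders (OrderID, Name, Qty)
    VALUES (NEXT VALUE FOR Test.CountBy1, 'Tire', 2);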
http://msdn.microsoft.com/en-us/library/ff878058.aspx
I have a purely academic question about SQLite databases.
I am using SQLite.net to use a database in my WinForm project, and as I was setting up a new table, I got to thinking about the maximum values of an ID column.
I use the IDENTITY for my [ID] column, which according to SQLite.net DataType Mappings, is equivalent to DbType.Int64. I normally start my ID columns at zero (with that row as a test record) and have the database auto-increment.
The maximum value (Int64.MaxValue) is 9,223,372,036,854,775,807. For my purposes, I'll never even scratch the surface on reaching that maximum, but what happens in a database that does? While trying to read up on this, I found that DB2 apparently "wraps" the value around to the negative value (-9,223,372,036,854,775,807) and increments from there, until the database can't insert rows because the ID column has to be unique.
Is this what happens in SQLite and/or other database engines?
I doubt anybody knows for sure, because if a million rows per second were being inserted, it would take about 292,471 years to reach the wrap-around-risk point -- and databases have been around for a tiny fraction of that time (actually, so has Homo Sapiens;-).
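For the curious, the figure can be reproduced in a couple of lines of Python, assuming 365-day years and a constant one million inserts per second:

```python
MAX_ID = 2**63 - 1                    # largest signed 64-bit value
ROWS_PER_SECOND = 1_000_000           # assumed insert rate
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Whole years until the id space is used up at that rate.
years_to_exhaust = MAX_ID // (ROWS_PER_SECOND * SECONDS_PER_YEAR)
print(years_to_exhaust)  # 292471
```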
IDENTITY is not actually the proper way to auto-increment in SQLite. Using it will require you to do the incrementing in the app layer. In the SQLite shell, try:
create table bar (id IDENTITY, name VARCHAR);
insert into bar (name) values ('John');
select * from bar;
You will see that id is simply null. SQLite does not give any special significance to IDENTITY, so it is basically an ordinary (untyped) column.
On the other hand, if you do:
create table baz (id INTEGER PRIMARY KEY, name VARCHAR);
insert into baz (name) values ('John');
select * from baz;
it will be 1 as I think you expect.
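If you'd rather check this from code than from the shell, the same experiment can be sketched with Python's built-in sqlite3 module (same bar and baz tables as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bar (id IDENTITY, name VARCHAR);
    CREATE TABLE baz (id INTEGER PRIMARY KEY, name VARCHAR);
    INSERT INTO bar (name) VALUES ('John');
    INSERT INTO baz (name) VALUES ('John');
""")

bar_row = conn.execute("SELECT id, name FROM bar").fetchone()  # id stays NULL
baz_row = conn.execute("SELECT id, name FROM baz").fetchone()  # id is generated
print(bar_row, baz_row)
```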
Note that there is also an INTEGER PRIMARY KEY AUTOINCREMENT. The basic difference is that AUTOINCREMENT ensures keys are never reused. So if you remove John, 1 will never be reused as an id. Either way, if you use PRIMARY KEY (with optional AUTOINCREMENT) and run out of ids, SQLite is supposed to fail with SQLITE_FULL, not wrap around.
By using IDENTITY, you do open the (probably irrelevant) possibility that your app will incorrectly wrap around if the db were ever full. This is quite possible, because IDENTITY columns in SQLite can hold any value (including negative ints). Again, try:
insert into bar VALUES ('What the hell', 'Bill');
insert into bar VALUES (-9, 'Mary');
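Both of those are completely valid for bar. With baz, the -9 insert would also work, but the text id would be rejected with a datatype mismatch, because an INTEGER PRIMARY KEY column only accepts integers. More importantly, with baz you can avoid manually specifying id. That way, there will never be junk in your id column.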
The documentation at http://www.sqlite.org/autoinc.html indicates that the ROWID will try to find an unused value via randomization once it reached its maximum number.
For AUTOINCREMENT it will fail with SQLITE_FULL on all attempts to insert into this table, once there was a maximum value in the table:
If the table has previously held a row with the largest possible ROWID, then new INSERTs are not allowed and any attempt to insert a new row will fail with an SQLITE_FULL error.
This is necessary, as the AUTOINCREMENT guarantees that the ID is monotonically increasing.
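This is easy to try out. A minimal sketch with Python's sqlite3 module and a made-up table t: insert the largest possible id once, delete the row again, and attempt a normal insert. The sketch catches the generic sqlite3.Error rather than assuming which exception subclass Python maps SQLITE_FULL to.

```python
import sqlite3

MAX_ROWID = 2**63 - 1  # 9223372036854775807

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

# The table holds the largest possible id once, then is emptied again.
conn.execute("INSERT INTO t (id, name) VALUES (?, ?)", (MAX_ROWID, "last"))
conn.execute("DELETE FROM t")

try:
    conn.execute("INSERT INTO t (name) VALUES (?)", ("next",))
    insert_error = None
except sqlite3.Error as exc:
    # AUTOINCREMENT refuses to go back to a lower id, even in an empty table.
    insert_error = exc

print(repr(insert_error))
```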
I can't speak to any specific DB2 implementation logic, but the "wrap around" behavior you describe is standard for numbers that implement signing via two's complement.
As for what would actually happen, that's completely up in the air as to how the database would handle it. The issue arises at the point in time of actually CREATING the id that's too large for the field, as it's unlikely that the engine internally uses a data type of more than 64 bits. At that point, it's anybody's guess...the internal language used to develop the engine could throw up, the number could silently wrap around and just cause a primary key violation (assuming that a conflicting ID existed), the world could come to an end due to your overflow, etc.
But pragmatically, Alex is correct. The theoretical limit on the number of rows involved here (assuming it's one id per row and not any sort of cheater identity insert shenanigans) would basically render the situation moot, as by the time that you could conceivably enter that many rows at even a stupendous insertion rate we'll all be dead anyway, so it doesn't matter :)