In other words, is the following "cursoring" approach guaranteed to work:
1. retrieve rows from the DB
2. save the largest ID from the returned records for later, e.g. in LastMax
3. later, "SELECT * FROM MyTable WHERE Id > {0}", LastMax
In order for that to work, I have to be sure that every row I didn't get in step 1 has an Id greater than LastMax. Is this guaranteed, or can I run into weird race conditions?
Guaranteed as in absolutely under no circumstances whatsoever could you possibly get a value that might be less than or equal to the current maximum value? No, there is no such guarantee. That said, the circumstances under which that scenario could happen are limited:
Someone disables identity insert and inserts a value.
Someone reseeds the identity column.
Someone changes the sign of the increment value (i.e. instead of +1 it is changed to -1)
Assuming none of these circumstances, you are safe from race conditions creating a situation where the next value is lower than an existing value. That said, there is no guarantee that the rows will be committed in the order of their identity values. For example:
Open a transaction, insert into your table with an identity column. Let's say it gets the value 42.
In another session, insert and commit another row into the same table. Let's say it gets the value 43.
Until the first transaction is committed, 43 exists but 42 does not. The identity column is simply reserving a value; it is not dictating the order of commits.
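A minimal T-SQL illustration of that ordering (the table definition and the values 42/43 are made up for the example):

-- assumes something like: CREATE TABLE MyTable (Id INT IDENTITY(1,1) PRIMARY KEY, Val VARCHAR(10));

-- Session 1:
BEGIN TRANSACTION;
INSERT INTO MyTable (Val) VALUES ('A'); -- reserves Id 42, not yet visible to readers

-- Session 2 (autocommit):
INSERT INTO MyTable (Val) VALUES ('B'); -- reserves and commits Id 43

-- Readers now see Id 43 but not Id 42, until Session 1 issues:
COMMIT TRANSACTION;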
I think this can go wrong depending on the duration of transactions
Consider the following sequence of events:
1. Transaction A starts
2. Transaction A performs an insert - this reserves a new identity value
3. Transaction B starts
4. Transaction B performs an insert - this reserves a new identity value
5. Transaction B commits
6. Your code performs its select and sees the identity value from the second transaction
7. Transaction A commits
The row inserted by Transaction A will never be found by your code. It was not yet committed when step 6 was performed, and the next query will not find it either, because its identity value is lower than the one the query starts from.
It could work if you perform the query at the READ UNCOMMITTED isolation level.
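For example, a sketch at that isolation level (the NOLOCK table hint is the per-query equivalent; note that dirty reads can return rows that are later rolled back):

DECLARE @LastMax INT = 42; -- illustrative
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM MyTable WHERE Id > @LastMax;
-- or per query:
SELECT * FROM MyTable WITH (NOLOCK) WHERE Id > @LastMax;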
Identity values will always follow the increment that defines the identity:
IDENTITY [(seed, increment)] (see http://msdn.microsoft.com/en-us/library/aa933196(SQL.80).aspx)
which can be positive or negative (you can have it increment forwards or backwards). If you set your identity to increment forward, your identity values will always be larger than the previous ones, but you may miss some if you roll back an INSERT.
Yes, if you set your identity increment to a positive value your loop logic will work.
The only time records might get inserted that you wouldn't get would be if someone turns IDENTITY_INSERT on and manually inserts a record into a skipped id (or, in some cases, at a negative number). This is a fairly rare occurrence and generally would only be done by a system admin, for instance to reinsert an accidentally deleted record.
The only thing that SQL Server guarantees is that your IDENTITY column will always be incremented.
Things to consider though:
If an INSERT fails, the IDENTITY column is incremented anyway;
If a rollback occurs, the IDENTITY column does not return to its previous value;
Which explains why SQL Server doesn't guarantee sequential IDENTITY values.
There is a way to reset an IDENTITY column using the DBCC CHECKIDENT command (sketched after these points). But before doing so, please consider the following:
Ensure your IDENTITY column is not referenced by any other table, as the foreign keys would not be updated along with it, which means big trouble ahead;
You can use the SET IDENTITY_INSERT ON/OFF statement to specify an IDENTITY value manually while INSERTing a row (never forget to turn it back off afterward).
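A hedged sketch of both operations (object names and values are illustrative):

-- reseed the identity so that the next insert gets 101:
DBCC CHECKIDENT ('dbo.MyTable', RESEED, 100);

-- manually insert a specific identity value, e.g. to restore a deleted row:
SET IDENTITY_INSERT dbo.MyTable ON;
INSERT INTO dbo.MyTable (Id, Name) VALUES (42, 'restored row');
SET IDENTITY_INSERT dbo.MyTable OFF; -- turn it back off immediately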
An IDENTITY column is one of those elements that should never be changed in an RDBMS.
Here is a link that should help you: Understanding IDENTITY columns
EDIT: What you are doing should work, since the IDENTITY value will always increase with each INSERTed row. So:
Selecting rows from data table;
Saving LastMax state;
Selecting rows where Id > LastMax.
Step 3 will only select rows whose IDENTITY value is greater than LastMax, i.e. rows inserted since LastMax was saved.
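A minimal T-SQL sketch of the loop (table and variable names are illustrative):

DECLARE @LastMax INT;
-- steps 1 and 2: initial read, then remember the largest Id
SELECT * FROM MyTable;
SELECT @LastMax = MAX(Id) FROM MyTable;
-- step 3, later: only rows inserted (and committed) since then
SELECT * FROM MyTable WHERE Id > @LastMax;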
Related
I hope the question is not too generic.
I have a table Person that has a PK Identity column Id.
Via C#, I insert new entries for Person and the Ids get set to 1, 2, 3 for the 3 persons added.
Also via C#, I delete the persons with Id = 1, 2, 3 so that there is no Person in the table anymore.
Afterwards, I run some change scripts (I can't post them as they are too long) also on Table Person.
I don't do any RESEED.
Now the fun:
If I call SELECT IDENT_CURRENT('Person') it shows 3 instead of 4.
If I do an insert of Person again, I get a Person with the Id 3 added instead of Id 4.
Any idea why and how this can happen?
EDIT
I think I found the explanation of my question:
While performing DB changes via SQL Server Management Studio, the designer creates a temp table Tmp_Person and moves the data from Person into it. Afterwards it renames Tmp_Person to Person. Since this is a new table, the identity starts again from the beginning.
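In simplified form, the designer's generated script looks roughly like this (the real script also copies constraints and uses IF EXISTS guards; column names follow the question):

CREATE TABLE dbo.Tmp_Person (Id INT IDENTITY(1,1) NOT NULL, Name NVARCHAR(100));
SET IDENTITY_INSERT dbo.Tmp_Person ON;
INSERT INTO dbo.Tmp_Person (Id, Name) SELECT Id, Name FROM dbo.Person;
SET IDENTITY_INSERT dbo.Tmp_Person OFF;
DROP TABLE dbo.Person;
EXEC sp_rename 'dbo.Tmp_Person', 'Person';
-- Person was empty here, so nothing was copied and the new table's identity restarts at its seed.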
An IDENTITY property doesn't guarantee uniqueness. That's what a PRIMARY KEY or UNIQUE INDEX is for. This is covered in the documentation in the remarks section, along with other intended behaviour. CREATE TABLE (Transact-SQL) IDENTITY (Property) - Remarks:
The identity property on a column does not guarantee the following:
Uniqueness of the value - Uniqueness must be enforced by using a PRIMARY KEY or UNIQUE constraint or UNIQUE index.
Consecutive values within a transaction - A transaction inserting multiple rows is not guaranteed to get consecutive values for the rows because other concurrent inserts might occur on the table. If values must be consecutive then the transaction should use an exclusive lock on the table or use the SERIALIZABLE isolation level.
Consecutive values after server restart or other failures - SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use its own mechanism to generate key values. Using a sequence generator with the NOCACHE option can limit the gaps to transactions that are never committed.
Reuse of values - For a given identity property with specific seed/increment, the identity values are not reused by the engine. If a particular insert statement fails or if the insert statement is rolled back then the consumed identity values are lost and will not be generated again. This can result in gaps when the subsequent identity values are generated.
These restrictions are part of the design in order to improve performance, and because they are acceptable in many common situations. If you cannot use identity values because of these restrictions, create a separate table holding a current value and manage access to the table and number assignment with your application.
Emphasis mine for this question.
I am using SQL Server 2012. In my database I have set a primary key on userid and set Identity Specification to Yes (Is Identity: Yes, Identity Increment: 1, Identity Seed: 1).
I just inserted 5 users and the userids are 1, 2, 3, 4, 5. I am sure I haven't done any insert after that, and no other SP or trigger is using this table; it is just a new table. Now when I tried to insert the 6th user, it was inserted with userid 1001,
the 7th with 1002, and the 8th with 2002.
Why do the userids jump like this?
Usually gaps occur when:
1. records are deleted;
2. an error occurred when attempting to insert a new record (e.g. a NOT NULL constraint error); the identity value is helplessly skipped;
3. somebody has inserted/updated with an explicit value (e.g. the IDENTITY_INSERT option);
4. the increment value is more than 1.
The identity property on a column does not guarantee the following:
Uniqueness of the value – Uniqueness must be enforced by using a PRIMARY KEY or UNIQUE constraint or UNIQUE index.
Consecutive values within a transaction – A transaction inserting multiple rows is not guaranteed to get consecutive values for the rows because other concurrent inserts might occur on the table. If values must be consecutive then the transaction should use an exclusive lock on the table or use the SERIALIZABLE isolation level.
Consecutive values after server restart or other failures – SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert. If gaps are not acceptable then the application should use a sequence generator with the NOCACHE option or use their own mechanism to generate key values.
Reuse of values – For a given identity property with specific seed/increment, the identity values are not reused by the engine. If a particular insert statement fails or if the insert statement is rolled back then the consumed identity values are lost and will not be generated again. This can result in gaps when the subsequent identity values are generated.
Also,
If an identity column exists for a table with frequent deletions, gaps can occur between identity values. If this is a concern, do not use the IDENTITY property. However, to make sure that no gaps have been created or to fill an existing gap, evaluate the existing identity values before explicitly entering one with SET IDENTITY_INSERT ON.
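Those cached values that are lost on a restart are exactly what typically produces jumps of 1000 (int) or 10000 (bigint) on SQL Server 2012. A hedged sketch of the usual remedies (object names and the reseed value are illustrative):

-- repair the current gap by reseeding to the last good value:
DBCC CHECKIDENT ('dbo.Users', RESEED, 5);

-- On SQL Server 2012, identity pre-allocation can be disabled with the
-- startup trace flag -T272 (added via SQL Server Configuration Manager).

-- On SQL Server 2017 and later there is a supported per-database switch:
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;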
Also, check the Identity Column Properties, specifically the Identity Increment value. It should be 1.
Open your table in design view
Now check that the Identity Seed and Identity Increment values are correct. If not, correct them.
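The same properties can also be read from T-SQL (the table name is an assumption):

SELECT name, seed_value, increment_value, last_value
FROM sys.identity_columns
WHERE object_id = OBJECT_ID('dbo.Users');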
I have a table in MS SQL Server 2008 and I have set its primary key to increment automatically. But if I delete any row from this table and then insert new rows, the identity continues from the next identity value, which creates a gap in the identity values. My program requires all the identities or keys to be in sequence.
Like:
The Assignment table has 16 rows in total with sequential identities (1-16), but if I delete the row at the 16th position:
Delete From Assignment Where assignment_id=16;
and after this operation when I insert a new row
Insert into Assignment(assignment_title)Values('myassignment');
Rather than assigning 16 as the primary key to this new row, it assigns 17.
How can I solve this problem?
Renaming or re-numbering primary key values is not good database management practice. I suggest you keep the primary key as is and create a separate index column holding the values you need re-numbered. Then create a trigger to run a routine that re-numbers every row in the order you expect, by seeking out the "gaps" and filling them with values incremented from their previous value.
This is SQL Server's standard behaviour. If you deleted a row with ID=8 in your example, you would still have a gap.
All you could do is write a function getSmallestFreeID in SQL Server, call it for every insert, and have it return the smallest unassigned ID. But you would have to take great care with transactions and ACID.
The behavior you desire isn't possible without some post processing logic to renumber the rows.
Consider this scenario:
Session 1 begins a transaction, inserts a row (id=16), but doesn't commit yet.
Session 2 begins a transaction, inserts a row (id=17) and commits.
Session 1 rolls back.
Whether 16 will or will not exist in the table is decided after 17 is committed.
And you can't renumber these in a trigger; you'll get deadlocks.
What you probably need to do is query the data while adding a sequential row number, as in the sketch below. Gaps in the identity values themselves aren't a problem.
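A sketch against the question's Assignment table: the computed row number stays gapless even when assignment_id has holes.

SELECT ROW_NUMBER() OVER (ORDER BY assignment_id) AS seq_no,
       assignment_id,
       assignment_title
FROM Assignment;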
Well, I recently faced the same problem: I needed the ID values in an external C# application in order to retrieve files named exactly after the ID.
Here is what I did to avoid the identity property: since it was a small table, I entered the id values manually. If that is not practical in your case, use a SEQUENCE (available since SQL Server 2012), as sketched below.
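A minimal sketch of that approach (the sequence, table, and column names are hypothetical):

CREATE SEQUENCE dbo.FileIdSeq AS INT START WITH 1 INCREMENT BY 1;

INSERT INTO dbo.Files (Id, Name)
VALUES (NEXT VALUE FOR dbo.FileIdSeq, 'file1.dat');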
Use UPDATE instead of DELETE to keep the id values in order.
How can I get last autoincrement value of specific table right after I open database? It's not last_insert_rowid() because there is no insertion transaction. In other words I want to know in advance which number autoincrement will choose when inserting new row for particular table.
It depends on how the autoincremented column has been defined.
If the column definition is INTEGER PRIMARY KEY AUTOINCREMENT, then SQLite will keep the largest ID in an internal table called sqlite_sequence.
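In that case the next value can be read directly (the table name is assumed; the sqlite_sequence row only exists once at least one row has been inserted):

SELECT seq FROM sqlite_sequence WHERE name = 'mytable';
-- the next AUTOINCREMENT id will normally be seq + 1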
If the column definition does NOT contain the keyword AUTOINCREMENT, SQLite will use its ‘regular’ routine to determine the new ID. From the documentation:
The usual algorithm is to give the newly created row a ROWID that is one larger than the largest ROWID in the table prior to the insert. If the table is initially empty, then a ROWID of 1 is used. If the largest ROWID is equal to the largest possible integer (9223372036854775807) then the database engine starts picking positive candidate ROWIDs at random until it finds one that is not previously used. If no unused ROWID can be found after a reasonable number of attempts, the insert operation fails with an SQLITE_FULL error. If no negative ROWID values are inserted explicitly, then automatically generated ROWID values will always be greater than zero.
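Under that algorithm the next ROWID can be estimated with a query like this (a sketch, assuming no negative ROWIDs and no concurrent writers):

SELECT IFNULL(MAX(rowid), 0) + 1 AS next_rowid FROM mytable;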
I remember reading that, for columns without AUTOINCREMENT, the only surefire way to determine the next ID is to VACUUM the database first; that will reset all ID counters to the largest existing ID for that table + 1. But I can’t find that quote anymore, so this may no longer be true.
That said, I agree with slash_rick_dot that fetching auto-incremented IDs beforehand is a bad idea, especially if there’s even a remote chance that another process might write to the database at the same time.
Different databases implement auto-increment differently. But as far as I know, none of them will answer the question you are asking.
The auto increment feature is intended to create a unique ID for a newly added table row. If a row hasn't been inserted yet, then the feature hasn't produced the id.
And it makes sense... If you did get the next auto increment number, what would you do with it? Likely the intent is to assign it as the primary key of the not-yet-inserted new row. But between the time you got the id, and the time you used it to insert the row, the database could have used that id to insert a row for another process.
Your choices are this: manage the creation of ids yourself, or wait until rows are inserted before using their auto-created identifiers.
I have a simple table with a primary key. Most of the read operations fetch one row by the exact value of the key.
The data in each row maintains some relationship with rows before and after it in the key order. So when I insert a new row I need to read the 2 rows between which it is going to enter, make some computation and then to insert.
The concern, clearly, is that at the same time another connection may add a row with a key value in the same interval. I am covered if it is exactly the same key value, since the second insert would fail; but if the key value is different yet falls in the same interval, the relationship may be broken.
The solution seems to be to lock the whole table for writing when I decide to add a new row, or (if possible, which I doubt) to lock an interval of key values. Yet I'd prefer that read-only transactions would not be blocked at that time.
I am using ODBC with libodbc++ wrapper for C++ in the client program and IBM DB2 free edition (although the DB choice may still change). This is what I thought of doing:
start the connection in the auto-commit and default isolation mode
when I need to add a new row, set auto-commit to false and the isolation level to serializable
read the rows before and after the new key value
compute and insert the new row
commit
return back to the auto-commit and default isolation mode
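In SQL terms, a sketch of roughly what those steps issue (column names and parameter markers are assumptions; in ODBC the isolation level is typically set through connection attributes, e.g. SQLSetConnectAttr with SQL_ATTR_TXN_ISOLATION, rather than in SQL):

SELECT * FROM mytable WHERE k = (SELECT MAX(k) FROM mytable WHERE k < ?); -- row before
SELECT * FROM mytable WHERE k = (SELECT MIN(k) FROM mytable WHERE k > ?); -- row after
-- compute the new row from its neighbours, then:
INSERT INTO mytable (k, data) VALUES (?, ?);
COMMIT;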
Will this do the job? Will other transactions be allowed to read at the same time? Are there other/better ways to do it?
BTW, I don't see in the libodbc++ interface a way to specify a read-only transaction. Is it possible in ODBC?
EDIT: thanks for the very useful answers, I had trouble selecting one.
If your database is in SERIALIZABLE mode, you won't have any issues at all. Given a key K, to get the previous and next keys you have to run the following queries:
select key from keys where key > K order by key limit 1; # M?
select key from keys where key < K order by key desc limit 1; # I?
The above works in MySQL. These equivalent queries work in DB2 (from the comments):
select key from keys where key = (select min(key) from keys where key > K);
select key from keys where key = (select max(key) from keys where key < K);
The first query sets up a range lock that prevents other transactions from inserting a key greater than K and less than or equal to M.
The second query sets up a range lock that prevents other transactions from inserting a key less than K and greater than or equal to I.
The unique index on the primary key prevents K from being inserted twice. So you're completely covered.
This is what transactions are about; so you can write your code as if the entire database is locked.
Note: This requires a database that supports true serializability. Fortunately, DB2 does. Other DBMSs that support true serializability include SQL Server and MySQL/InnoDB. DBMSs that don't: Oracle, and PostgreSQL before 9.1 (which added true serializability).
If your database and storage engine allow that, you should issue SELECT FOR UPDATE for both rows you are trying to insert between.
This will conflict with any concurrent SELECT FOR UPDATE.
The downside is that a lock of rows 10 and 12 (to insert 11) will also prevent selecting 8 and 10 (to insert 9).
InnoDB in MySQL can also place a next-key lock on the index, that is, a lock on the index record together with the gap before it.
In this case, you would only need to issue a SELECT FOR UPDATE on one row, and a row before the locked gap could still be inserted concurrently.
However, this requires forcing the index and providing a range condition on the index which may or may not be possible depending on your query.
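A sketch of the first variant in MySQL/InnoDB syntax (the table, column, and key values are illustrative, following the 10/11/12 example above):

START TRANSACTION;
-- lock the two neighbours of the row to be inserted (key 11):
SELECT * FROM keys WHERE k IN (10, 12) FOR UPDATE;
-- compute the new row from its neighbours, then:
INSERT INTO keys (k, data) VALUES (11, '...');
COMMIT;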
Your general approach is correct. But you should use a SELECT statement that covers the two rows and all the possible rows in between. For example:
SELECT * FROM MYTABLE WHERE PKCOL BETWEEN 6 AND 10
In database systems with pessimistic locking and a serializable transaction isolation level, this SELECT statement should prevent new rows from being inserted that would change its result.