I have a sequence and I want to atomically retrieve several sequential values from it.
As far as I know, sequences do not support transactions.
The standard NEXT VALUE FOR syntax does not help here, although in other databases it can be used this way.
I found the sp_sequence_get_range procedure, but I'm not sure whether it is atomic. Testing such cases is quite tricky.
Is it possible?
In short, yes, sequences have the property you're looking for.
Now a little bit as to why. At its base, a sequence is just a row in a system table that has (among other things) a column that represents the current value in the sequence. Normally, when you call NEXT VALUE FOR, SQL Server fetches that current value, increments it by 1, and finally returns the fetched value. When you use sp_sequence_get_range, it does the same thing except that it increments the value by the size of the range you asked for (e.g. if you called it with @range_size = 10, it increments the value by 10).
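To make the description above concrete, here is a toy Python model of that behaviour. It is not SQL Server code: the class name ToySequence and its methods are made up for illustration, and the lock only stands in for whatever internal synchronization the engine actually uses.

```python
import threading


class ToySequence:
    """Toy in-memory model of a sequence (hypothetical, for illustration only)."""

    def __init__(self, start=1):
        self._current = start          # the next value to hand out
        self._lock = threading.Lock()  # stand-in for the engine's own synchronization

    def get_range(self, range_size):
        """Reserve range_size consecutive values and return the first one."""
        with self._lock:
            first = self._current
            self._current += range_size  # a single increment covers the whole range
            return first

    def next_value(self):
        """Same operation with a range of exactly one value."""
        return self.get_range(1)


seq = ToySequence(start=1)
first = seq.get_range(10)  # reserves 1..10
after = seq.next_value()   # the next caller starts past the reserved range
```

Because the read and the increment happen as one step, two callers can never be handed overlapping ranges in this model.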
One thing that I was curious about was whether or not a rollback of a transaction that contained the sp_sequence_get_range procedure call also rolled back the allocation of that range. I'm happy to report that it does not. So, you needn't worry about two calls to that procedure getting the same value.
Lastly, I caution you against the fallacy of a gapless sequence. If you get a range and that thread dies for whatever reason (e.g. your application terminates, the SQL Server shuts down expectedly or unexpectedly), that range is lost forever. That is, there's no putting an "unused" range back in the queue so that the next call for a range gets that range.
I have to create a unique ID for invoices. I have a table id and another column for this unique number. I use the serializable isolation level. Using
var seq = @"SELECT invoice_serial + 1 FROM invoice WHERE ""type""=@type ORDER BY invoice_serial DESC LIMIT 1";
doesn't help, because even with FOR UPDATE it won't read the correct value under the serializable isolation level.
The only solution seems to be adding some retry code.
Sequences do not generate gap-free sets of numbers, and there's really no way of making them do that because a rollback or error will "use" the sequence number.
I wrote up an article on this a while ago. It's directed at Oracle but is really about the fundamental principles of gap-free numbers, and I think the same applies here.
Well, it’s happened again. Someone has asked how to implement a requirement to generate a gap-free series of numbers and a swarm of nay-sayers have descended on them to say (and here I paraphrase slightly) that this will kill system performance, that it’s rarely a valid requirement, that whoever wrote the requirement is an idiot blah blah blah.
As I point out on the thread, it is sometimes a genuine legal requirement to generate gap-free series of numbers. Invoice numbers for the 2,000,000+ organisations in the UK that are VAT (sales tax) registered have such a requirement, and the reason for this is rather obvious: it makes it more difficult to hide the generation of revenue from tax authorities. I’ve seen comments that it is a requirement in Spain and Portugal, and I’d not be surprised if it was a requirement in many other countries.
So, if we accept that it is a valid requirement, under what circumstances are gap-free series of numbers a problem? Group-think would often have you believe that it always is, but in fact it is only a potential problem when all of the following hold:
The series of numbers must have no gaps.
Multiple processes create the entities to which the number is associated (eg. invoices).
The numbers must be generated at the time that the entity is created.
If all of these requirements must be met then you have a point of serialisation in your application, and we’ll discuss that in a moment.
First let’s talk about methods of implementing a series-of-numbers requirement if you can drop any one of those requirements.
If your series of numbers can have gaps (and you have multiple processes requiring instant generation of the number) then use an Oracle Sequence object. They are very high performance and the situations in which gaps can be expected have been very well discussed. It is not too challenging to minimise the number of values skipped by making design efforts to minimise the chance of a process failure between generation of the number and committing the transaction, if that is important.
If you do not have multiple processes creating the entities (and you need a gap-free series of numbers that must be instantly generated), as might be the case with the batch generation of invoices, then you already have a point of serialisation. That in itself may not be a problem, and may be an efficient way of performing the required operation. Generating the gap-free numbers is rather trivial in this case. You can read the current maximum value and apply an incrementing value to every entity with a number of techniques. For example if you are inserting a new batch of invoices into your invoice table from a temporary working table you might:
insert into
  invoices
    (
    invoice#,
    ...)
with curr as (
  -- Coalesce needs a fallback argument so that an empty table starts from 0
  select Coalesce(Max(invoice#), 0) max_invoice#
  from   invoices)
select
  curr.max_invoice#+rownum,
  ...
from
  curr,
  tmp_invoice
  ...
Of course you would protect your process so that only one instance can run at a time (probably with DBMS_Lock if you're using Oracle), and protect the invoice# with a unique key constraint, and probably check for missing values with separate code if you really, really care.
If you do not need instant generation of the numbers (but you need them gap-free and multiple processes generate the entities) then you can allow the entities to be generated and the transaction committed, and then leave generation of the number to a single batch job: an update on the entity table, or an insert into a separate table.
So what if we need the trifecta of instant generation of a gap-free series of numbers by multiple processes? All we can do is to try to minimise the period of serialisation in the process, and I offer the following advice, and welcome any additional advice (or counter-advice of course).
Store your current values in a dedicated table. DO NOT use a sequence.
Ensure that all processes use the same code to generate new numbers by encapsulating it in a function or procedure.
Serialise access to the number generator with DBMS_Lock, making sure that each series has its own dedicated lock.
Hold the lock in the series generator until your entity creation transaction is complete by releasing the lock on commit.
Delay the generation of the number until the last possible moment.
Consider the impact of an unexpected error after generating the number and before the commit is completed — will the application rollback gracefully and release the lock, or will it hold the lock on the series generator until the session disconnects later? Whatever method is used, if the transaction fails then the series number(s) must be “returned to the pool”.
Can you encapsulate the whole thing in a trigger on the entity’s table? Can you encapsulate it in a table or other API call that inserts the row and commits the insert automatically?
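The advice above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python with SQLite rather than Oracle, SQLite's BEGIN IMMEDIATE write lock stands in for DBMS_Lock, and the table names (series, invoices), the function create_invoice and its fail flag are all hypothetical.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE series (name TEXT PRIMARY KEY, last_value INTEGER NOT NULL);
    CREATE TABLE invoices (invoice_no INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    INSERT INTO series VALUES ('invoice', 0);
""")


def create_invoice(conn, customer, fail=False):
    """Create one invoice; the number and the row commit or roll back together."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock: serialisation starts here
    try:
        conn.execute(
            "UPDATE series SET last_value = last_value + 1 WHERE name = 'invoice'")
        (number,) = conn.execute(
            "SELECT last_value FROM series WHERE name = 'invoice'").fetchone()
        if fail:
            raise RuntimeError("simulated failure before commit")
        conn.execute("INSERT INTO invoices VALUES (?, ?)", (number, customer))
        conn.execute("COMMIT")       # lock released only once the invoice is durable
        return number
    except Exception:
        conn.execute("ROLLBACK")     # the counter update is undone with everything else
        raise


create_invoice(conn, "alice")
try:
    create_invoice(conn, "bob", fail=True)   # fails after taking a number
except RuntimeError:
    pass
create_invoice(conn, "carol")

numbers = [row[0] for row in
           conn.execute("SELECT invoice_no FROM invoices ORDER BY invoice_no")]
```

Because the counter lives in an ordinary table, the failed attempt's number is rolled back and handed to the next caller, which is exactly what a sequence cannot do.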
Original article
You could create a sequence with no cache, then get the next value from the sequence and use that as your counter.
CREATE SEQUENCE invoice_serial_seq START 101 CACHE 1;
SELECT nextval('invoice_serial_seq');
More info here
You need to lock the table against inserts, have retry code, or both. There's no other option available. If you stop to think about what can happen with:
parallel processes rolling back
locks timing out
you'll see why.
In 2006, someone posted a gapless-sequence solution to the PostgreSQL mailing list: http://www.postgresql.org/message-id/44E376F6.7010802@seaworthysys.com
I want to know how a sequence for a table works internally in an Oracle database, especially how the value of the sequence is incremented.
Does it use a trigger to increment the value of the sequence, or something else?
Oracle does not handle sequences the way it handles other objects, such as tables. If you insert 100 records using NEXTVAL and issue a ROLLBACK, the sequence does not get rolled back. Instead, the 100 records will have incremented the sequence, and the next insert will get the 101st value of the sequence.
This leads to gaps ("spaces") in the sequence, and it is what allows multiple people to safely use sequences without the risk of duplicates. If two users simultaneously grab NEXTVAL, they will be assigned unique numbers.
Oracle caches sequence values in memory. The number of values cached per sequence is set with the CACHE option of CREATE SEQUENCE; the init.ora parameter SEQUENCE_CACHE_ENTRIES, which in old releases controlled how many sequences could be cached, is obsolete in later versions.
I have an SSIS package that appears to call one task three times that the designer does not indicate. A recent change in the stored procedure now logs when the task is called. There are always three entries when the execution path does not pass through it or four when it does. It contains no loops and three calls are almost always within a tenth of a second or less while when the fourth occurs it tends to be a wider time gap.
I've tried poking around in it as raw XML, but nothing immediately obvious jumps out. I see the number of references to its key that seem to match the number of incoming and outgoing constraints, only one reference to the called proc and offhand no extra references to the object the proc itself is referenced in.
Is it possible to find all predecessors to/callers of a task? Is there a better way to identify just what is calling the task?
Is there any performance impact or any kind of issues?
The reason I am doing this is that we are doing some synchronization between two sets of DBs with similar tables and we want to avoid duplicate PK errors when synchronizing data.
Yes, it's okay.
Note: If you have performance concerns you could use the "CACHE" option on "CREATE SEQUENCE":
"Specify how many values of the sequence the database preallocates and keeps in memory for faster access. This integer value can have 28 or fewer digits. The minimum value for this parameter is 2. For sequences that cycle, this value must be less than the number of values in the cycle. You cannot cache more values than will fit in a given cycle of sequence numbers. Therefore, the maximum value allowed for CACHE must be less than the value determined by the following formula:"
(CEIL (MAXVALUE - MINVALUE)) / ABS (INCREMENT)
"If a system failure occurs, all cached sequence values that have not been used in committed DML statements are lost. The potential number of lost values is equal to the value of the CACHE parameter."
Sure. What you plan on doing is actually a rather common practice. Just make sure the variables in your client code which you use to hold IDs are big enough (i.e., use longs instead of ints)
The only problem we recently had with creating tables with really large seeds was when we tried to interface with a system we did not control. That system was apparently reading our IDs as a char(6) field, so when we sent row 10000000 it would fail to write.
Performance-wise we have seen no issues on our side with using large ID numbers.
No performance impact that we've seen. I routinely bump sequences up by a large amount. The gaps come in handy if you need to "backfill" data into the table.
The only time we had a problem was when a really large sequence exceeded MAXINT on a particular client program. The sequence was fine, but the conversion to an integer in the client app started failing! In our case it was easy to refactor the ID column in the table and get things running again, but in retrospect this could have been a messy situation if the tables had been arranged differently!
If you are synching two tables why not change the PK seed/increment amount so that everything takes care of itself when a new PK is added?
Let's say you had to synch the data from 10 patient tables in 10 different databases.
Let's also say that eventually all databases had to be synched into a Patient table at headquarters.
Increment the PK by ten for each row, but ensure the last digit is different for each database.
DB0 10,20,30..
DB1 11,21,31..
.....
DB9 19,29,39..
When everything is merged there are guaranteed to be no conflicts.
This is easily scaled to n database tables. Just make sure your PK key type will not overflow. I think BigInt could be big enough for you...
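As a quick check of the arithmetic, here is a small Python sketch of the scheme. The function name key_stream and its parameters are made up for illustration; in a real database the same effect would come from configuring each table's seed and increment.

```python
def key_stream(db_index, n_databases=10, count=5):
    """Return the first `count` keys issued by database number db_index (0-based)."""
    # Step by n_databases; the remainder (db_index) identifies the source database.
    return [n_databases * k + db_index for k in range(1, count + 1)]


streams = {db: key_stream(db) for db in range(10)}
all_keys = [key for keys in streams.values() for key in keys]

# streams[0] starts 10, 20, 30 and streams[9] starts 19, 29, 39, as in the table above.
no_conflicts = len(set(all_keys)) == len(all_keys)
```

Scaling to n databases only means changing n_databases, as long as the key type can hold values n times larger than before.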
I'm trying to find if there is a reliable way (using SQLite) to find the ID of the next row to be inserted, before it gets inserted. I need to use the id for another insert statement, but don't have the option of instantly inserting and getting the next row.
Is predicting the next id as simple as getting the last id and adding one? Is that a guarantee?
Edit: A little more reasoning...
I can't insert immediately because the insert may end up being canceled by the user. User will make some changes, SQL statements will be stored, and from there the user can either save (inserting all the rows at once), or cancel (not changing anything). In the case of a program crash, the desired functionality is that nothing gets changed.
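Try SELECT * FROM SQLITE_SEQUENCE WHERE name='TABLE'; (with your table's name). The result contains a field called seq, which is the largest ID handed out so far for the selected table. Note that SQLite only tracks this for tables whose key is declared with AUTOINCREMENT. Add 1 to this value to get the next ID.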
Also see the SQLite Autoincrement article, which is where the above info came from.
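A minimal sketch of that lookup using Python's built-in sqlite3 module. The table name item is hypothetical, and the prediction is only safe under the assumption that nothing else inserts into the table between the lookup and the insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sqlite_sequence only has a row for tables declared with AUTOINCREMENT.
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",), ("c",)])

row = conn.execute("SELECT seq FROM sqlite_sequence WHERE name = 'item'").fetchone()
# Before the first insert there is no row yet, so treat that case as seq = 0.
predicted_id = (row[0] if row else 0) + 1

# Assumes no other writer got in between the lookup and this insert.
actual_id = conn.execute("INSERT INTO item (name) VALUES ('d')").lastrowid
```

Here predicted_id and actual_id agree, but only because a single connection did all the work.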
Cheers!
Either scrapping or committing a series of database operations all at once is exactly what transactions are for. Query BEGIN; before the user starts fiddling and COMMIT; once he/she's done. You're guaranteed that either all the changes are applied (if you commit) or everything is scrapped (if you query ROLLBACK;, if the program crashes, power goes out, etc). Once you read from the db, you're also guaranteed that the data is good until the end of the transaction, so you can grab MAX(id) or whatever you want without worrying about race conditions.
http://www.sqlite.org/lang_transaction.html
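For illustration, a small Python sketch of the cancel path using the built-in sqlite3 module. The table name note is made up, and BEGIN IMMEDIATE is my own choice here (an assumption, not part of the answer above): it takes the write lock up front so no other connection can insert while the transaction is open.

```python
import sqlite3

# isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO note (body) VALUES ('existing')")

conn.execute("BEGIN IMMEDIATE")  # the user starts editing
(max_id,) = conn.execute("SELECT MAX(id) FROM note").fetchone()
new_id = max_id + 1              # stable while we hold the write lock
conn.execute("INSERT INTO note (id, body) VALUES (?, 'draft')", (new_id,))
conn.execute("ROLLBACK")         # the user pressed cancel

count_after_cancel = conn.execute("SELECT COUNT(*) FROM note").fetchone()[0]
```

After the rollback the table is back to its single original row, just as if the draft had never been written.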
You can probably get away with adding 1 to the value returned by sqlite3_last_insert_rowid under certain conditions, for example, using the same database connection and there are no other concurrent writers. Of course, you may refer to the sqlite source code to back up these assumptions.
However, you might also seriously consider using a different approach that doesn't require predicting the next ID. Even if you get it right for the version of sqlite you're using, things could change in the future and it will certainly make moving to a different database more difficult.
Insert the row with an INVALID flag of some kind, get the ID, edit the row as needed, then either delete it or mark it as valid. And don't worry about gaps in the sequence.
BTW, you will need to figure out how to do the invalid part yourself. Marking something as NULL might work depending on the specifics.
Edit: If you can, use Eevee's suggestion of using proper transactions. It's a lot less work.
I realize your application using SQLite is small and SQLite has its own semantics. Other solutions posted here may well have the effect that you want in this specific setting, but in my view every single one of them I have read so far is fundamentally incorrect and should be avoided.
In a normal environment holding a transaction for user input should be avoided at all costs. The way to handle this, if you need to store intermediate data, is to write the information to a scratch table for this purpose and then attempt to write all of the information in an atomic transaction. Holding transactions invites deadlocks and concurrency nightmares in a multi-user environment.
In most environments you cannot assume data retrieved via SELECT within a transaction is repeatable. For example
SELECT Balance FROM Bank ...
UPDATE Bank SET Balance = valuefromselect + 1.00 WHERE ...
By the time the UPDATE runs, the value of Balance may well have been changed by someone else. Sometimes you can get around this by first updating the row(s) you're interested in within a transaction, as this is guaranteed to lock the row and prevent further updates from changing its value until your transaction has completed.
However, sometimes a better way to ensure consistency in this case is to check your assumptions about the contents of the data in the WHERE clause of the update and check row count in the application. In the example above when you "UPDATE Bank" the WHERE clause should provide the expected current value of balance:
WHERE Balance = valuefromselect
If the expected balance no longer matches, neither does the WHERE condition: the UPDATE does nothing and the row count comes back as 0. This tells you there was a concurrency issue and you need to rerun the operation when something else isn't trying to change your data at the same time.
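Here is a minimal sketch of that check in Python with the built-in sqlite3 module. The Bank table follows the example above; the AccountId column and the add_one function are hypothetical additions to make it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bank (AccountId INTEGER PRIMARY KEY, Balance REAL NOT NULL)")
conn.execute("INSERT INTO Bank VALUES (1, 100.0)")


def add_one(conn, account_id, expected_balance):
    """Add 1.00, but only if the balance is still what we read earlier."""
    cur = conn.execute(
        "UPDATE Bank SET Balance = ? WHERE AccountId = ? AND Balance = ?",
        (expected_balance + 1.00, account_id, expected_balance),
    )
    return cur.rowcount == 1  # 0 rows means someone changed the balance first


(balance,) = conn.execute("SELECT Balance FROM Bank WHERE AccountId = 1").fetchone()
first_try = add_one(conn, 1, balance)  # succeeds: the balance still matches
stale_try = add_one(conn, 1, balance)  # fails: the value we read is now out of date
```

When the function returns False the caller would re-read the balance and try again.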
select max(id) from particular_table is unreliable, for the reason given in the SQLite documentation:
http://www.sqlite.org/autoinc.html
"The normal ROWID selection algorithm described above will generate monotonically increasing unique ROWIDs as long as you never use the maximum ROWID value and you never delete the entry in the table with the largest ROWID. If you ever delete rows or if you ever create a row with the maximum possible ROWID, then ROWIDs from previously deleted rows might be reused when creating new rows and newly created ROWIDs might not be in strictly ascending order."
I think this can't be done, because there is no way to be sure that nothing will get inserted between you asking and you inserting. (You might be able to lock the table to inserts, but yuck.)
(BTW, I've only used MySQL, but I don't think that will make any difference.)
Most likely you should be able to +1 the most recent id. I would look at all of the existing ids (going back a while) in the ordered table. Are they consistent, and is each row's ID one more than the last? If so, you'll probably be fine. I'd leave a comment in the code explaining the assumption, however. Taking a lock will also help guarantee that you're not getting additional rows while you do this.
Select the last_insert_rowid() value.
Most of what needs to be said on this topic has already been said. However, be very careful of race conditions when doing this. If two people both open your application/webpage/whatever, and one of them adds a row, the other user will try to insert a row with the same ID and you will have lots of issues.
select max(id) from particular_table;
The next id will be +1 from the maximum id.
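Whether that holds depends on how the table is declared. The following Python sketch (built-in sqlite3 module, hypothetical table names plain and auto) compares the MAX(id) + 1 prediction with the ID SQLite actually assigns after the row with the largest ID has been deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plain (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE auto (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT)")


def predict_then_insert(table):
    """Insert ids 1..3, delete id 3, then compare MAX(id)+1 with the real next id."""
    for v in ("a", "b", "c"):
        conn.execute(f"INSERT INTO {table} (v) VALUES (?)", (v,))
    conn.execute(f"DELETE FROM {table} WHERE id = 3")
    (max_id,) = conn.execute(f"SELECT MAX(id) FROM {table}").fetchone()
    actual = conn.execute(f"INSERT INTO {table} (v) VALUES ('d')").lastrowid
    return max_id + 1, actual


plain_result = predict_then_insert("plain")  # (3, 3): id 3 is reused
auto_result = predict_then_insert("auto")    # (3, 4): AUTOINCREMENT never reuses ids
```

So for a plain INTEGER PRIMARY KEY the prediction matches but a deleted ID comes back, while for an AUTOINCREMENT table the prediction itself is wrong.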