I'm trying to find if there is a reliable way (using SQLite) to find the ID of the next row to be inserted, before it gets inserted. I need to use the id for another insert statement, but don't have the option of instantly inserting and getting the next row.
Is predicting the next id as simple as getting the last id and adding one? Is that a guarantee?
Edit: A little more reasoning...
I can't insert immediately because the insert may end up being canceled by the user. User will make some changes, SQL statements will be stored, and from there the user can either save (inserting all the rows at once), or cancel (not changing anything). In the case of a program crash, the desired functionality is that nothing gets changed.
Try SELECT * FROM SQLITE_SEQUENCE WHERE name='TABLE';. The result contains a field called seq, which is the largest ROWID ever handed out for the selected table (note that sqlite_sequence only exists for tables declared with AUTOINCREMENT). Add 1 to this value to get the next ID.
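A minimal sketch of that lookup, assuming a table created with AUTOINCREMENT and named mytable (both names are placeholders):

SELECT seq + 1 AS predicted_next_id FROM sqlite_sequence WHERE name = 'mytable';

Keep in mind this is only a prediction; another writer could still claim that ID before your insert runs.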
Also see the SQLite Autoincrement article, which is where the above info came from.
Cheers!
Either scrapping or committing a series of database operations all at once is exactly what transactions are for. Issue BEGIN; before the user starts fiddling and COMMIT; once they're done. You're guaranteed that either all the changes are applied (if you commit) or everything is scrapped (if you issue ROLLBACK;, the program crashes, the power goes out, etc.). Once you read from the db, you're also guaranteed that the data is good until the end of the transaction, so you can grab MAX(id) or whatever you want without worrying about race conditions.
http://www.sqlite.org/lang_transaction.html
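A minimal sketch of that pattern in SQLite (table and column names are invented for illustration):

BEGIN;
INSERT INTO orders (customer, total) VALUES ('Alice', 10.00);
-- the parent ID is available for the child insert, no prediction needed
INSERT INTO order_items (order_id, sku) VALUES (last_insert_rowid(), 'SKU-1');
-- ...more statements accumulated while the user edits...
COMMIT; -- or ROLLBACK; if the user cancels

Because last_insert_rowid() is read inside the same transaction and connection, the parent/child inserts stay consistent without guessing the next ID.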
You can probably get away with adding 1 to the value returned by sqlite3_last_insert_rowid under certain conditions: for example, if you use the same database connection and there are no other concurrent writers. Of course, you may want to refer to the SQLite source code to back up these assumptions.
However, you might also seriously consider using a different approach that doesn't require predicting the next ID. Even if you get it right for the version of sqlite you're using, things could change in the future and it will certainly make moving to a different database more difficult.
Insert the row with an INVALID flag of some kind, get the ID, edit it as needed, and then either delete it or mark it as valid. And don't worry about gaps in the sequence.
BTW, you will need to figure out how to do the invalid part yourself. Marking something as NULL might work depending on the specifics.
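A rough sketch of that approach, assuming an invented is_valid column and documents table (adapt to your schema):

INSERT INTO documents (title, is_valid) VALUES ('draft', 0); -- placeholder row
SELECT last_insert_rowid();                                  -- the ID you can hand out now
-- later, if the user saves:
UPDATE documents SET title = 'final title', is_valid = 1 WHERE id = :reserved_id;
-- or, if the user cancels:
DELETE FROM documents WHERE id = :reserved_id;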
Edit: If you can, use Eevee's suggestion of using proper transactions. It's a lot less work.
I realize your application using SQLite is small and SQLite has its own semantics. Other solutions posted here may well have the effect that you want in this specific setting, but in my view every single one of them I have read so far is fundamentally incorrect and should be avoided.
In a normal environment, holding a transaction open during user input should be avoided at all costs. If you need to store intermediate data, the way to handle this is to write the information to a scratch table for this purpose and then attempt to write all of the information in one atomic transaction. Holding transactions open invites deadlocks and concurrency nightmares in a multi-user environment.
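A rough sketch of the scratch-table idea (table names are invented; the point is that the long-lived user edit never holds a transaction open):

-- while the user edits, outside any long-running transaction:
INSERT INTO pending_orders (session_id, customer, total) VALUES ('abc123', 'Alice', 10.00);

-- when the user finally saves, one short atomic transaction:
BEGIN;
INSERT INTO orders (customer, total)
  SELECT customer, total FROM pending_orders WHERE session_id = 'abc123';
DELETE FROM pending_orders WHERE session_id = 'abc123';
COMMIT;
-- if the user cancels, simply delete the scratch rows instead.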
In most environments you cannot assume data retrieved via SELECT within a transaction is repeatable. For example
SELECT Balance FROM Bank ...
UPDATE Bank SET Balance = valuefromselect + 1.00 WHERE ...
Between the SELECT and the UPDATE, the value of Balance may well have changed. Sometimes you can get around this by updating the row(s) you're interested in in Bank first, within a transaction, as this is guaranteed to lock the row and prevent other updates from changing its value until your transaction has completed.
However, sometimes a better way to ensure consistency in this case is to check your assumptions about the contents of the data in the WHERE clause of the UPDATE and check the row count in the application. In the example above, when you "UPDATE Bank" the WHERE clause should provide the expected current value of Balance:
WHERE Balance = valuefromselect
If the expected balance no longer matches, neither does the WHERE condition -- the UPDATE does nothing and the row count comes back as 0. This tells you there was a concurrency issue, and you need to rerun the operation when something else isn't trying to change your data at the same time.
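Put together, a rough sketch of that optimistic pattern (Bank, Balance and AccountID are placeholder names; parameter syntax varies by database and driver):

SELECT Balance FROM Bank WHERE AccountID = :account;  -- remember the result as :old_balance

UPDATE Bank
   SET Balance = :old_balance + 1.00
 WHERE AccountID = :account
   AND Balance = :old_balance;  -- matches nothing if someone else changed the balance
-- in the application: if the affected row count is 0, re-read and retry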
select max(id) from particular_table is unreliable, for the reason below:
http://www.sqlite.org/autoinc.html
"The normal ROWID selection algorithm described above will generate monotonically increasing unique ROWIDs as long as you never use the maximum ROWID value and you never delete the entry in the table with the largest ROWID. If you ever delete rows or if you ever create a row with the maximum possible ROWID, then ROWIDs from previously deleted rows might be reused when creating new rows and newly created ROWIDs might not be in strictly ascending order."
I think this can't be done, because there is no way to be sure that nothing will get inserted between you asking and you inserting. (You might be able to lock the table against inserts, but yuck.)
(BTW, I've only used MySQL, but I don't think that will make any difference.)
Most likely you should be able to +1 the most recent id. I would look at all (going back a while) of the existing ids in the ordered table: are they consistent, and is each row's ID one more than the last? If so, you'll probably be fine. I'd leave a comment in the code explaining the assumption, however. Taking a lock will help guarantee that you're not getting additional rows while you do this as well.
Select the last_insert_rowid() value.
Most of what needs to be said on this topic already has been... However, be very careful of race conditions when doing this. If two people both open your application/webpage/whatever and one of them adds a row, the other user will try to insert a row with the same ID and you will have lots of issues.
select max(id) from particular_table;
The next id will be +1 from the maximum id.
I have a legacy database, and I was able to use ALL_TAB_MODIFICATIONS to easily find candidates for deletion: tables that are no longer used by the application using that database. I then continued with a more detailed review of these tables and possibly deleted them (I know it isn't guaranteed, hence I call them just "candidates for deletion").
My question is: is there anything similar I could use to find columns which aren't used?
By that I mean that all newly inserted data has NULL in the column, and it is never UPDATEd to a non-null value.
If there isn't a similar view provided by Oracle, what other approach would you recommend to find them?
This isn't needed for storage reasons, but because the database is also open for reporting purposes, and we have had cases where reports were built on old columns and thus produced wrong results.
Why not invest a little and get an exact result?
The idea is to store the content of the tables, say, at the beginning of the month and repeat this at the end of the month.
From the difference you can see which table columns were changed by updates or inserts. You'd probably tolerate changes caused by deletes.
You'll only need twice the space of your DB and to invest a bit in the reporting SQLs.
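A rough sketch of the comparison, assuming a snapshot copy was taken at the start of the month with CREATE TABLE mytable_snapshot AS SELECT * FROM mytable (all names here are placeholders):

-- count rows where a given column differs between the snapshot and the current data;
-- DECODE treats two NULLs as equal, which is convenient here
SELECT COUNT(*)
  FROM mytable t
  JOIN mytable_snapshot s ON s.id = t.id
 WHERE DECODE(t.some_column, s.some_column, 0, 1) = 1;

A result of 0 for a column (combined with no new non-null values in freshly inserted rows) makes it a candidate for being unused.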
Please note also that dropping a column - even one that is not actively used - may invalidate your application in case the column is referenced or SELECT * FROM is used.
I am trying to create a cache for a table in Oracle DB. I monitor the changes in the DB using DBMS_CHANGE_NOTIFICATION to automatically update the cache.
This is, however, only working in a satisfactory manner as long as the updates I do are rather small: if I delete a large portion of rows, the ALL_ROWS flag of the notification structure is set to true and the array of ROWIDs is NULL.
By trial and error I found out that the threshold for the number of updated rows is about 100, which is really too little. If a table contains several million rows and I delete a thousand, I get no information on what was updated, and I have to refresh the cache for the whole table, which is unacceptable.
Can I somehow change this threshold? I could not find a specific answer in the documentation:
If the ALL_ROWS (0x1) bit is set it means that either the entire table is modified (for example, DELETE * FROM t) or row level granularity of information is not requested or not available in the notification and the receiver has to conservatively assume that the entire table has been invalidated.
This only gives me vague information.
From the docs I found this:
If the ALL_ROWS bit is set in the table operation flag, then it means that all rows within the table may have been potentially modified. In addition to operations like TRUNCATE that affect all rows in the tables, this bit may also be set if individual rowids have been rolled up into a FULL table invalidation.
This can occur if too many rows were modified on a given table in a single transaction (more than 80) or the total shared memory consumption due to rowids on the RDBMS is determined too large (exceeds 1 % of the dynamic shared pool size). In this case, the recipient must conservatively assume that the entire table has been invalidated and the callback/application must be able to handle this condition.
I rolled my own solution years ago, which gives me control/flexibility, but perhaps someone has a workaround for you (commit in small chunks of 50? but what if your app isn't the only one changing the table?). I think the whole point is to only cache tables that change slowly, but this restriction does seem silly to me.
Currently there is a procedure where you can specify the value:
SET_ROWID_THRESHOLD
It would be nice if I could look up the current value with a getter, but I haven't found one.
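For reference, a minimal sketch of calling it, assuming the procedure lives in the DBMS_CQ_NOTIFICATION package (the newer name of DBMS_CHANGE_NOTIFICATION) and takes the table name plus the new row threshold; check the documentation for your Oracle version, as the exact signature is an assumption here:

BEGIN
  -- raise the per-transaction rowid rollup threshold for this table (values are illustrative)
  DBMS_CQ_NOTIFICATION.SET_ROWID_THRESHOLD('MYSCHEMA.MYTABLE', 1000);
END;
/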
I have a website that has a very popular forum on it and occasionally throughout the day I see several deadlocks happening between two identical (minus the data within them) update statements on the same forum. I'm not exactly sure why this is happening on this query as there are many other queries on the site that run with high concurrency without issue.
[deadlock graph image]
The query between the two processes is nearly identical, the graph shows it as:
update [Forum] set [DateModified] = @DateModified, [LatestLocalThreadID] = @LatestLocalThreadID where ID = 310
Can anyone shed any light on what could be causing this?
This is because there is a foreign key to ForumThreads that generates an S-lock when you set LatestLocalThreadID (to make sure that the row still exists when the statement completes). A possible fix would be to prefix the update statement with
SELECT *
FROM ForumThreads WITH (XLOCK, ROWLOCK, HOLDLOCK)
WHERE ID = @LatestLocalThreadID
in order to X-lock on that. You can also try UPDLOCK as a less aggressive mode. This can of course cause deadlocks in other places, but it is the best first try.
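For illustration, the prefixed statement might end up looking roughly like this (parameter names are taken from the question; the exact shape of your batch may differ):

BEGIN TRAN;

SELECT *
FROM ForumThreads WITH (XLOCK, ROWLOCK, HOLDLOCK)
WHERE ID = @LatestLocalThreadID;

UPDATE [Forum]
SET [DateModified] = @DateModified,
    [LatestLocalThreadID] = @LatestLocalThreadID
WHERE ID = 310;

COMMIT;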
Basically, deadlocks are prevented by always accessing the objects (tables, pages, rows) in the same order. In your example there's one process accessing Forum first and ForumThreads second, and another process doing it vice versa. An update usually searches first for the rows to update and uses S-locks during the search. The rows it has identified to be changed are then locked with X-locks, and the actual change happens.
The quick and dirty solution might be to do a BEGIN TRAN, then lock the objects in the order you need, do the update, and follow with the COMMIT that releases the locks again. But this will bring down the overall throughput of your website because of blocking locks.
The better way is to identify the two statements (you might edit your question and give us the other one when you find it) and their execution plans. It should be possible to rewrite the transactions to access all objects in the same order - and prevent the deadlock.
Let's say we generate our order numbers in SQL. Normally I get the next number with a
SELECT COUNT(numbers)+1 FROM X
etc.
The problem is, I want to give this number to the user first, then wait for the user to input the contents, and only then do the insert into the table. But since there are multiple users, I also want each of them to get a number, just not the same number as the first user. Is there a way to do this more elegantly?
In short, I want the number to be reserved for a specific user and inserted if he commits; if not, the number should just be released.
Create a table of Numbers. Pre-populate the table with values and use this table as a queue. A transaction can reserve a Number by dequeuing a row. On rollback, the number becomes available again. Other transactions can concurrently dequeue other numbers thanks to the readpast semantics of using a table as a queue. Add more numbers (insert more rows) as needed.
If this seems overkill, rest assured: it is not. Naive solutions may not account for concurrency or rollbacks, which are not trivial to solve.
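A minimal T-SQL sketch of the idea (table and column names are invented; this is the usual table-as-queue pattern, not a drop-in implementation):

CREATE TABLE OrderNumbers (Number int PRIMARY KEY);
-- pre-populate, e.g.: INSERT INTO OrderNumbers (Number) VALUES (1), (2), (3), ...;

BEGIN TRAN;
-- reserve a number; READPAST lets concurrent callers skip rows already locked by others
DELETE TOP (1)
FROM OrderNumbers WITH (ROWLOCK, READPAST)
OUTPUT deleted.Number;
-- ...insert the order using the returned number...
COMMIT; -- or ROLLBACK, which puts the number back automatically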
You could insert into the table, and then update it when the user commits their data.
To get around your "releasing" of the numbers you could:
Have a flag on the table to say if the row is "free" or not. When you first insert, the flag is "not free". If the user commits their data, keep it as "not free".
If they release their number, mark it as "free".
When assigning a number to a user, find the first "free" row; if there aren't any, insert a new one.
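A rough T-SQL sketch of that flag approach (names are invented; the hints are one way to keep two users from grabbing the same row):

-- claim the first free number, if any
UPDATE TOP (1) OrderNumbers WITH (ROWLOCK, READPAST)
SET IsFree = 0
OUTPUT inserted.Number
WHERE IsFree = 1;

-- releasing a number later:
UPDATE OrderNumbers SET IsFree = 1 WHERE Number = @ReservedNumber;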
If you don't need all numbers to be sequential without gaps in your system, you can make a simple table containing a column of identity type. Insert a fake record into it and use @@IDENTITY as the generated number. This solution of course has some drawbacks, as Remus Rusanu mentioned for 'naive solutions'.
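A quick sketch of that identity-table trick (table and column names are invented):

CREATE TABLE OrderNumberSeq (ID int IDENTITY(1,1) PRIMARY KEY, Dummy bit NULL);

INSERT INTO OrderNumberSeq (Dummy) VALUES (NULL);
SELECT @@IDENTITY AS ReservedNumber; -- SCOPE_IDENTITY() is the safer choice if triggers are involved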
If possible, avoid displaying this number to the user before it is really stored in the database. You can generate some abstract number for temporary reference, e.g. from the date and time, and after inserting the data into the database you can display the real number. Almost nothing to code, but 100% reliable.
I have a table with more than a million rows. This table is used to index tiff images. Each image has fields like date, number, etc. I have users that index these images in batches of 500. I need to know if it is better to first insert 500 rows and then perform 500 updates or, when the user finishes indexing, to do the 500 inserts with all the data. A very important thing is that if I do the 500 inserts first, this time is free for me because I can do it the night before.
So the question is: is it better to do inserts only, or inserts and updates, and why? I have defined an id value for each image, and I also have other indices on the fields.
Updates in SQL Server can result in ghosted rows - i.e. SQL Server crosses one row out and puts a new one in. The crossed-out row is deleted later.
Both inserts and updates can cause page splits in this way; they both effectively 'add' data, it's just that updates flag the old stuff out first.
On top of this, updates need to look up the row first, which for lots of data can take longer than the update itself.
Inserts will just about always be quicker, especially if they are either in order or if the underlying table doesn't have a clustered index.
When inserting larger amounts of data into a table, look at the current indexes - they can take a while to change and build. Adding values in the middle of an index is always slower.
You can think of it like appending to an address book: Mr Z can just be added to the last page, while you'll have to find space in the middle for Mr M.
Doing the inserts first and then the updates does seem to be a better idea for several reasons. You will be inserting at a time of low transaction volume. Since the inserts carry most of the data, that is the better time to do them.
Since you are using an id value (which is presumably indexed) for the updates, the overhead of the updates will be very low. You would also be touching less data during your updates.
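For illustration, a rough sketch of the two-phase approach (table and column names are invented; the real schema will differ):

-- the night before, at low load: pre-insert the placeholder rows in one batch
INSERT INTO tiff_index (image_id)
SELECT image_id FROM staging_batch;

-- during indexing: cheap, id-keyed updates as each image is processed
UPDATE tiff_index
SET doc_date = @doc_date, doc_number = @doc_number
WHERE image_id = @image_id;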
You could also turn off transactions at the batch (500 inserts/updates) level and use it for each individual record, thus reducing some overhead.
Finally, test this out to see the actual performance on your server before making a final decision.
This isn't a cut-and-dried question. Krishna's and Galegian's points are spot on.
For updates, the impact will be lessened if the updates affect fixed-length fields. If you are updating varchar or blob fields, you may add the cost of page splits during the update when the new value surpasses the length of the old one.
I think inserts will run faster. They do not require a lookup (when you do an update you are basically doing the equivalent of a select with a where clause). Also, an insert won't lock the rows the way an update will, so it won't interfere with any selects that are happening against the table at the same time.
The execution plan for each query will tell you which one should be more expensive. The real limiting factor will be the writes to disk, so you may need to run some tests while running perfmon to see which query causes more writes and causes the disk queue to get the longest (longer is bad).
I'm not a database guy, but I imagine doing the inserts in one shot would be faster because the updates require a lookup whereas the inserts do not.