Does SQLite have anything like SQL Server's rowversion column that will increment every time a row changes? Essentially, I want to have a logical clock for each of my tables that updates whenever a table updates. With this logical clock, my application can hold the version it most recently saw, and can only re-fetch if data has changed.
I could implement this with something like:
CREATE TRIGGER FOO_VERSION_INSERT_TRIGGER
AFTER INSERT ON FOO FOR EACH ROW
BEGIN
    UPDATE CLOCKS
    SET VERSION = (
        SELECT IFNULL(MAX(VERSION), 0) + 1 FROM CLOCKS
    )
    WHERE TABLE_NAME = 'FOO';
END;
CREATE TRIGGER FOO_VERSION_UPDATE_TRIGGER
AFTER UPDATE ON FOO FOR EACH ROW
BEGIN
    UPDATE CLOCKS
    SET VERSION = (
        SELECT IFNULL(MAX(VERSION), 0) + 1 FROM CLOCKS
    )
    WHERE TABLE_NAME = 'FOO';
END;
CREATE TRIGGER FOO_VERSION_DELETE_TRIGGER
AFTER DELETE ON FOO FOR EACH ROW
BEGIN
    UPDATE CLOCKS
    SET VERSION = (
        SELECT IFNULL(MAX(VERSION), 0) + 1 FROM CLOCKS
    )
    WHERE TABLE_NAME = 'FOO';
END;
But this seems like something that should natively exist already.
You can use PRAGMA data_version to see whether any other connection modified the database, and thus whether your program needs to refresh its data.
The integer values returned by two invocations of "PRAGMA data_version" from the same connection will be different if changes were committed to the database by any other connection in the interim.
It won't tell you which specific tables were changed, though.
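For illustration, here is a minimal sketch of that polling pattern using Python's stdlib sqlite3 module (the file path and table name are made up for the demo):

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
reader = sqlite3.connect(path)
writer = sqlite3.connect(path)

writer.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, val TEXT)")
writer.commit()

def data_version(conn):
    # PRAGMA data_version changes when *another* connection commits.
    return conn.execute("PRAGMA data_version").fetchone()[0]

before = data_version(reader)
writer.execute("INSERT INTO foo (val) VALUES ('x')")
writer.commit()
after = data_version(reader)

print(before != after)  # True: some other connection changed the database
```

If the value has not changed since the last poll, the application can skip re-fetching entirely.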
In my web application, I store a counter value in the database table, which I need to increment or reset at each transaction (which are highly concurrent). Do I need to explicitly lock the row to avoid lost updates? Read committed transaction isolation level is being used at the connection level. The following statement updates the counter
UPDATE Counter c SET value =
CASE
WHEN c.last_updated = SYSDATE THEN c.value+1
ELSE 1
END,
last_updated = SYSDATE
WHERE c.counter_id = 123;
The statement is atomic and read committed isolation level implicitly locks the rows for update statements, as far as I know. Does this render the use of explicit locking redundant in this case?
You're talking about optimistic locking vs. pessimistic locking ("explicit lock").
If you go with pessimistic locking, you're guaranteed to have no lost updates. However, the approach comes at a price:
It doesn't scale well - you're essentially serializing access to the row being updated, and if the client running the first transaction hangs for some reason - everyone is stuck.
Given the nature of the usually multi-tier web apps, it may be difficult (or impossible) to implement, as the explicit lock needs to be run in the same database connection as the update itself, which your middle tier may or may not guarantee.
So you can go with optimistic locking instead. Assume the following table:
create table t (id int, value int, version int);
insert into t (id, value, version) values (1, 1, 1);
Basically, the logic would be like this (PL/SQL code):
declare
current_version t.version%type;
current_value t.value%type;
new_value t.value%type;
begin
-- read current version of a row
select version, value
into current_version, current_value
from t where id = 1;
-- calculate new value; while we're doing this,
-- someone else may update the row, changing its version
new_value := func_calculate_new_value(current_value);
-- update the row...
update t
set
value = new_value,
version = version + 1
where 1 = 1
and id = 1
-- but ONLY if the version of the row is the one we read
-- otherwise there would be a lost update
and version = current_version
;
if sql%rowcount = 0 then
-- 0 updated rows means the version is different
-- we're not updating because we don't want lost updates
-- so we roll back and throw to let the caller know
rollback;
raise_application_error(-20000, 'Row version has changed');
end if;
end;
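The same compare-and-swap loop can be sketched outside PL/SQL. Here is an illustrative version in Python against SQLite, with a stand-in for func_calculate_new_value and an added retry loop (the retry policy is my assumption, not part of the answer above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER, version INTEGER)")
conn.execute("INSERT INTO t (id, value, version) VALUES (1, 1, 1)")
conn.commit()

def calculate_new_value(value):
    # Stand-in for func_calculate_new_value from the answer.
    return value + 10

def update_with_version_check(conn, row_id, retries=3):
    for _ in range(retries):
        # Read the current version and value...
        current_version, current_value = conn.execute(
            "SELECT version, value FROM t WHERE id = ?", (row_id,)).fetchone()
        new_value = calculate_new_value(current_value)
        # ...then update ONLY if the version is still the one we read.
        cur = conn.execute(
            "UPDATE t SET value = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_value, row_id, current_version))
        conn.commit()
        if cur.rowcount == 1:  # version matched: no lost update
            return new_value
        # rowcount == 0 means someone else bumped the version; re-read and retry
    raise RuntimeError("Row version kept changing; giving up")

print(update_with_version_check(conn, 1))  # 11
```

Retrying on a version conflict (rather than failing outright, as the PL/SQL above does) is a common refinement when conflicts are expected to be rare and transient.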
I have a table Conservation_Dev with these columns (amongst 30 other):
I3_IDENTITY - a BIGINT and a unique key
STATE - a 2-letter US state abbreviation
ZONE - when I want to store the time zone for this record
I also have a table TimeZoneCodes that maps US states to time zones (forget the fact that some states are in more than one time zone):
state_code - the 2-letter abbreviation for the state
time_zone - the text with the time zone (EST, CST, etc)
I have data being loaded into Conservation_Dev without the time zone data and that is something that I can't control. I want to create an after insert trigger that updates the record. Doing some research on previous threads I came up with the following:
CREATE TRIGGER [dbo].[PopulateTimeZoneBasedOnUSState]
ON [dbo].[Conservation_Dev]
AFTER INSERT
AS
BEGIN
UPDATE [dbo].[Conservation_Dev]
SET [ZONE] = (SELECT [time_zone]
FROM [dbo].[TimeZoneCodes] Z
WHERE Z.[state_code] = i.[STATE])
FROM Inserted i
WHERE [I3_IDENTITY] = i.[I3_IDENTITY]
END
I get an error, though:
Ambiguous column name 'I3_IDENTITY'
Also, is this the right way to do this? Will this be a problem if the data load is, say, 5 or 10 thousand records at a time through an SSIS import package?
Try:
CREATE TRIGGER [dbo].[PopulateTimeZoneBasedOnUSState]
ON [dbo].[Conservation_Dev]
AFTER INSERT
AS
BEGIN
UPDATE A
SET A.[ZONE] = Z.[time_zone]
FROM [dbo].[Conservation_Dev] as A
INNER JOIN Inserted as i
ON A.[I3_IDENTITY] = i.[I3_IDENTITY]
INNER JOIN [dbo].[TimeZoneCodes] as Z
ON Z.[state_code] = i.[STATE]
END
My vote would be to move this update into the SSIS package so that all of the data manipulation happens in one place, and then to a stored procedure that runs after the data is loaded. A trigger fires on every insert, and I think set-based queries/updates would perform better. Triggers can also be hard to find from a troubleshooting standpoint, though if they are well documented that may not be much of an issue. The trigger does, however, populate the Zone as soon as a record is inserted.
The problem with the trigger you are creating is that SQL Server doesn't know which I3_IDENTITY you are referring to in the first part of the WHERE clause. This should fix it:
CREATE TRIGGER [dbo].[PopulateTimeZoneBasedOnUSState]
ON [dbo].[Conservation_Dev]
AFTER INSERT
AS
BEGIN
UPDATE [dbo].[Conservation_Dev]
SET [ZONE] = (SELECT TOP 1 [time_zone] FROM [dbo].[TimeZoneCodes] Z WHERE Z.[state_code] = i.[STATE])
FROM Inserted i
WHERE [dbo].[Conservation_Dev].[I3_IDENTITY] = i.[I3_IDENTITY]
END
This would be a table-level update that updates all time zones in one sweep. I would use something like this in a stored procedure after the initial inserts are complete.
CREATE PROCEDURE dbo.UpdateAllZones
AS
UPDATE c
SET c.Zone = z.time_zone
FROM dbo.Conservation_Dev c INNER JOIN dbo.TimeZoneCodes z ON c.STATE = z.state_code
GO
Executed as
EXEC dbo.UpdateAllZones
My C# program is doing the following (pseudo code):
START TRANSACTION READ COMMITTED
Select isOriginal, * from myTable where tnr = x;
//Record found?
Yes:
//isOriginal?
Yes:
update myTable set is_active = 0 where tnr = x;
No:
delete from myTable where tnr = x;
//Do some simple logic on the values
Insert into myTable (newvalues)
No:
return record_not_found;
END TRANSACTION
However, when I start two instances of my program and both edit the same record at the same time two records are inserted as they both find the record in the select query.
What should happen is that the first transaction finds the record and inserts a new row while the second transaction returns a record not found.
How can I fix this? Put my transaction to serializable? Check the return value of update/delete? What's the best way?
Edit:
It should work on Sybase, Oracle & SQL Server.
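Since the question already hints at it ("Check the return value of update/delete?"), one portable sketch is to treat the affected-row count of the DELETE as the atomic claim: only one transaction can delete a given row. A simplified illustration with Python's stdlib sqlite3 (only the non-original branch is shown, and the replacement row's values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (tnr INTEGER PRIMARY KEY, isOriginal INTEGER, is_active INTEGER)")
conn.execute("INSERT INTO myTable VALUES (1, 0, 1)")
conn.commit()

def consume(conn, tnr):
    # The DELETE is the atomic claim: exactly one caller sees rowcount == 1.
    cur = conn.execute("DELETE FROM myTable WHERE tnr = ?", (tnr,))
    if cur.rowcount == 0:
        conn.rollback()
        return "record_not_found"
    # Insert the replacement row (values here are made up for the demo).
    conn.execute("INSERT INTO myTable (tnr, isOriginal, is_active) VALUES (?, 0, 1)",
                 (tnr + 1000,))
    conn.commit()
    return "ok"

print(consume(conn, 1))  # ok
print(consume(conn, 1))  # record_not_found
```

The same row-count check works through JDBC/ODBC on Sybase, Oracle, and SQL Server, which is why it travels better than engine-specific lock hints.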
Without knowing which DB you are using, you could set up a lock field in the table, where each concurrent thread writes its PID, thread ID, or at least a unique timestamp. Then just do:
update myTable set lock = <pid> where lock is null limit 1;
select isOriginal, * from myTable where lock = <pid>
If you are using MS SQL, you will need to look at NOLOCK and ROWLOCK.
NOLOCK tells SQL Server to ignore any locks and read directly from the actual tables. The pro is great read performance; the con is that you are circumventing the locking system entirely. ROWLOCK, on the other hand, asks SQL Server to use row-level locks. Performance does suffer with ROWLOCK, so you need to determine whether you need to lock on UPDATEs/DELETEs.
In your case SELECT isOriginal, * FROM myTable WITH (NOLOCK) WHERE tnr=x
Then UPDATE myTable WITH (ROWLOCK) SET is_active=0 WHERE tnr=x
I have a simple table in my SQL Server 2008 DB:
Tasks_Table
-id
-task_complete
-task_active
-column_1
-..
-column_N
The table stores instructions for uncompleted tasks that have to be executed by a service.
I want to be able to scale my system in future. Until now only 1 service on 1 computer read from the table. I have a stored procedure, that selects all uncompleted and inactive tasks. As the service begins to process tasks it updates the task_active flag in all the returned rows.
To enable scaling of the system, I want to allow deployment of the service on more machines. Because I want to prevent a task from being returned to more than one service, I have to update the stored procedure that returns uncompleted and inactive tasks.
I figured that I have to lock the table (only one reader at a time; I know I have to use an appropriate ISOLATION LEVEL) and update the task_active flag in each row of the result set before returning it.
So my question is: how do I modify the SELECT result set in the stored procedure before returning it?
This is the typical dequeue pattern. It is implemented using the OUTPUT clause and is described in the MSDN documentation; see the Queues paragraph in OUTPUT Clause (Transact-SQL):
UPDATE TOP(1) Tasks_Table WITH (ROWLOCK, READPAST)
SET task_active = 1
OUTPUT INSERTED.id,INSERTED.column_1, ...,INSERTED.column_N
WHERE task_active = 0;
The ROWLOCK, READPAST hints allow for high throughput and high concurrency: multiple threads/processes can enqueue new tasks while multiple threads/processes dequeue tasks. There is no order guarantee.
Updated
If you want to order the result you can use a CTE:
WITH cte AS (
SELECT TOP(1) id, task_active, column_1, ..., column_N
FROM Tasks_Table WITH (ROWLOCK, READPAST)
WHERE task_active = 0
ORDER BY <order by criteria>)
UPDATE cte
SET task_active = 1
OUTPUT INSERTED.id, INSERTED.column_1, ..., INSERTED.column_N;
I discuss this and other enqueue/dequeue techniques in the article Using Tables as Queues.
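The OUTPUT clause is SQL Server-specific, but the same "claim one task atomically" idea can be sketched for other engines. Here is an illustrative version with Python's stdlib sqlite3, using a single immediate transaction in place of the lock hints (table and column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE Tasks_Table (id INTEGER PRIMARY KEY, task_active INTEGER, column_1 TEXT)")
conn.executemany("INSERT INTO Tasks_Table VALUES (?, 0, ?)", [(1, "a"), (2, "b")])

def dequeue(conn):
    # Claim at most one inactive task; the immediate transaction keeps
    # two workers from grabbing the same row.
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT id, column_1 FROM Tasks_Table WHERE task_active = 0 LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    conn.execute("UPDATE Tasks_Table SET task_active = 1 WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row

r1 = dequeue(conn)
r2 = dequeue(conn)
print(r1, r2)  # (1, 'a') (2, 'b')
```

Unlike READPAST, a blocked SQLite worker waits for the write lock rather than skipping locked rows, so this is a sketch of the semantics, not a drop-in equivalent.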
I have an application that uses incident numbers (amongst other types of numbers). These numbers are stored in a table called "Number_Setup", which contains the current value of the counter.
When the app generates a new incident, it reads the Number_Setup table and gets the required number counter row (counters can be reset daily, weekly, etc. and are stored as ints). It then increments the counter and updates the row with the new value.
The application is multiuser (approximately 100 users at any one time, as well as SQL jobs that run, grab hundreds of incident records, and request incident numbers for each). The incident table has some duplicate incident numbers where there should be none.
A stored proc is used to retrieve the next counter.
SELECT #Counter = counter, #ShareId=share_id, #Id=id
FROM Number_Setup
WHERE LinkTo_ID=#LinkToId
AND Counter_Type='I'
IF isnull(#ShareId,0) > 0
BEGIN
-- use parent counter
SELECT #Counter = counter, #ID=id
FROM Number_Setup
WHERE Id=#ShareID
END
SELECT #NewCounter = #Counter + 1
UPDATE Number_Setup SET Counter = #NewCounter
WHERE id=#Id
I've now surrounded that block with a transaction, but I'm not entirely sure it will 100% fix the problem, as I think there are still shared locks, so the counter can be read anyway.
Perhaps I can check that the counter hasn't been updated, in the update statement
UPDATE Number_Setup SET Counter = #NewCounter
WHERE Counter = #Counter
IF ##ERROR = 0 AND ##ROWCOUNT > 0
COMMIT TRANSACTION
ELSE
ROLLBACK TRANSACTION
I'm sure this is a common problem with invoice numbers in financial apps etc.
I cannot put the logic in code either and use locking at that level.
I've also looked at HOLDLOCK, but I'm not sure of its application. Should it be put on the two SELECT statements?
How can I ensure no duplicates are created?
The trick is to do the counter update and read in a single atomic operation:
UPDATE Number_Setup SET Counter = Counter+1
OUTPUT INSERTED.Counter
WHERE id=#Id;
This though does not assign the new counter to #NewCounter, but instead returns it as a result set to the client. If you have to assign it, use an intermediate table variable to output the new counter INTO:
declare #NewCounter int;
declare #tabCounter table (NewCounter int);
UPDATE Number_Setup SET Counter = Counter+1
OUTPUT INSERTED.Counter INTO #tabCounter (NewCounter)
WHERE id=#Id
SELECT #NewCounter = NewCounter FROM #tabCounter;
This solves the problem of making the counter increment atomic. You still have other race conditions in your procedure, because LinkTo_ID and share_id can still be updated after the first SELECT, so you can increment the counter of the wrong link-to item; but that cannot be solved from this code sample alone, as it also depends on the code that actually updates share_id and/or LinkTo_ID.
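For comparison, the same "increment and read in one atomic step" idea can be sketched in SQLite via Python's stdlib sqlite3, where an immediate transaction plays the role of the atomic UPDATE ... OUTPUT (names follow the question's Number_Setup table):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE Number_Setup (id INTEGER PRIMARY KEY, counter INTEGER)")
conn.execute("INSERT INTO Number_Setup VALUES (1, 0)")

def next_counter(conn, row_id):
    # Increment and read back inside one immediate transaction, so no
    # other writer can slip in between the UPDATE and the SELECT.
    conn.execute("BEGIN IMMEDIATE")
    conn.execute("UPDATE Number_Setup SET counter = counter + 1 WHERE id = ?", (row_id,))
    (value,) = conn.execute(
        "SELECT counter FROM Number_Setup WHERE id = ?", (row_id,)).fetchone()
    conn.execute("COMMIT")
    return value

print([next_counter(conn, 1) for _ in range(3)])  # [1, 2, 3]
```

Each caller gets a distinct value because the read-increment-write never interleaves with another writer.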
BTW, you should get into the habit of naming your fields with consistent case, and of using that exact case in your T-SQL code. Your scripts run fine now only because you have a case-insensitive collation server; if you deploy on a case-sensitive collation server and your scripts don't match the exact case of the field/table names, errors will follow galore.
Have you tried using GUIDs instead of autoincrements as your unique identifier?
If you have the ability to modify your job that gets multiple records, I would change the thinking so that your counter is an identity column. Then when you get the next record you can just do an insert and read back @@IDENTITY. That would ensure that you get the biggest number. You would also have to run DBCC CHECKIDENT with RESEED to reset the counter instead of just updating the table when you want to reset the identity. The only issue is that you'd have to do 100 or so inserts as part of your SQL job to get a group of identities. That may be too much overhead, but using an identity column is a guaranteed way to get unique numbers.
I might be missing something, but it seems like you are trying to reinvent technology that has already been solved by most databases.
Instead of reading and updating the Counter column in the Number_Setup table, why don't you just use an autoincrementing primary key for your counter? You'll never have a duplicate value for a primary key.
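As a sketch of that suggestion with Python's stdlib sqlite3 (table and column names are illustrative): each insert hands back a unique number via the autoincrementing key, with no read-then-update race to protect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incident (id INTEGER PRIMARY KEY AUTOINCREMENT, created TEXT)")

def new_incident_number(conn):
    # The engine assigns the next key itself, so concurrent callers
    # can never receive the same number.
    cur = conn.execute("INSERT INTO incident (created) VALUES (datetime('now'))")
    conn.commit()
    return cur.lastrowid

numbers = [new_incident_number(conn) for _ in range(3)]
print(numbers)  # [1, 2, 3]
```

The trade-off is that identity/autoincrement values cannot be reset daily or weekly the way the question's per-period counters can, at least not without reseeding.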