Protecting end users of a service against counterfeit data – which tools?

I need to create a service, but I need help with the choice of tools.
Imagine a service in which users create data that has value as a historical record (e.g. transactions). Other users can see this data, but they need proof that the data is real and has not been falsified by other users or even by the service itself.
Example:
User A creates a record with the number 42.
A couple of months pass.
User B sees this record and wants to be sure that the service can't have replaced the number with some other value, say 37.
The service has a 24-hour trust window: it may change user data created on the current day.
Question: which instruments can help me achieve that?
I was thinking about publishing daily backups (or reports?) that any user can download. From each report a hash is calculated and inserted into the next backup; thus a chain of hashes is created. If the service changes anything in the past, the hashes in this chain will no longer match. Of course, I'll create an open-source tool for easily comparing diffs between the data and checking that the chain is valid.
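For illustration, a minimal sketch of how such a chain could be stored (the table and column names here are just placeholders):
-- One row per daily report; each row's chain hash covers the previous day's chain hash.
CREATE TABLE DAILY_REPORT (
  REPORT_DATE     DATE PRIMARY KEY,
  PREV_CHAIN_HASH CHAR(64) NOT NULL,  -- CHAIN_HASH of the previous day ('' for the very first report)
  REPORT_HASH     CHAR(64) NOT NULL,  -- SHA-256 of the published report file
  CHAIN_HASH      CHAR(64) NOT NULL   -- SHA2(CONCAT(PREV_CHAIN_HASH, REPORT_HASH), 256)
);
-- Changing any past report changes its REPORT_HASH, which breaks every CHAIN_HASH
-- after it, so verifying only the latest CHAIN_HASH detects the edit.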
Point of trust: there is one thing I'm afraid of. The service could maintain many databases simultaneously and rewrite all backups with all hashes at once (because the first backup has no hash of a previous one). So, to cover that case too, I'm thinking of storing the hashes somewhere the service can't change at all, for example in one of the existing blockchains (BTC, ETH, ...) from the official wallet of the service. Or maybe a DAG-based ledger such as IOTA?
What do you think of this point of trust?
Can I achieve my goal in some simpler way (without a blockchain)? If so, how?
What are the bottlenecks in this logic?

There are two participating variables here:
- the timestamp at which the record is created,
- the data.
Solution premise:
- Tamper-proof.
- The data can be changed within the same GMT calendar day without violating the tamper-proof guarantee (this can be changed to a fixed window after creation).
- An RDBMS as the data store (this can be changed to any NoSQL store with minor modifications; the idea remains the same).
- Doesn't depend on any other mechanism which can be faulty or error-prone.
- Single-query verification.
## Proposed solution
Create the data table:
CREATE TABLE TEST(
ID INT PRIMARY KEY AUTO_INCREMENT,
DATA VARCHAR(64) NOT NULL,
CREATED_AT DATETIME DEFAULT CURRENT_TIMESTAMP()
);
Create the checksum table, which monitors tampering:
CREATE TABLE SIGN(
ID INT PRIMARY KEY AUTO_INCREMENT,
DATA_ID INT NOT NULL,
SIGNATURE VARCHAR(128) NOT NULL,
CREATED_AT DATETIME DEFAULT CURRENT_TIMESTAMP(),
UPDATED_AT TIMESTAMP
);
Create a trigger on insert of data:
/** Trigger on insert */
DELIMITER //
CREATE TRIGGER sign_after_insert
AFTER INSERT
ON TEST FOR EACH ROW
BEGIN
-- INSERT VAL
INSERT INTO SIGN(DATA_ID, `SIGNATURE`) VALUES(
NEW.ID, MD5(CONCAT (NEW.DATA, DATE(NEW.CREATED_AT)))
);
END; //
DELIMITER ;
Create a trigger for update of data:
-- UPDATE TRIGGER
DELIMITER //
CREATE TRIGGER SIGN_AFTER_UPDATE
AFTER UPDATE
ON TEST FOR EACH ROW
BEGIN
-- UPDATE VALS
IF (NEW.DATA <> OLD.DATA) AND (DATE(OLD.CREATED_AT) = CURRENT_DATE() ) THEN
UPDATE SIGN SET SIGNATURE=MD5(CONCAT(NEW.DATA, DATE(NEW.CREATED_AT))) WHERE DATA_ID=OLD.ID;
END IF;
END; //
DELIMITER ;
Test
Step 1: insert the data
INSERT INTO TEST(DATA) VALUES ('DATA2');
The signature of the data combined with the date on which it was created is stored as the signature in the SIGN table.
Step 2: update the data
The signature will be updated only if the value changes and it is the SAME DAY the record was created.
UPDATE TEST SET DATA='DATA' WHERE ID =1;
Step 3: validate
You can always validate the data signature as:
SELECT MD5(CONCAT(T.DATA, DATE(T.`CREATED_AT`))) AS CHECKSUM, S.SIGNATURE FROM TEST AS T, SIGN AS S WHERE S.DATA_ID = T.ID AND S.`ID` = 1;
Output
| CHECKSUM | SIGNATURE |
| ------ | ------ |
| 2bba70178abdafc5915ba0b5061597fa | 2bba70178abdafc5915ba0b5061597fa |

Related

SQL Azure Alternative to Service Broker

Our software is a collection of Windows applications that connect to a SQL database. Currently all our client sites have their own server and SQL Server database, however I'm working on making our software work with Azure-hosted databases too.
I've hit one snag with it, and so far not found anything particularly helpful while Googling around.
The current SQL Server version includes a database auditing system I wrote, which does the following:-
The C# Applications include in the connection string information about which program and version it is, and which User is currently logged in.
Important tables have Update and Delete triggers, which send details of any changes to a Service Broker queue. (I don't log Inserts).
The Service Broker then goes through the queue, and records details of the change to a separate AuditLog table.
These details include:-
Table, PK of the row changed, Field, Old Value, New Value, whether it was an Update or Delete, date/time of change, UserID of the user logged in to our software, and which program and version made the change.
This all works very nicely, and I was hoping to keep the system as-is for the Azure version, but unfortunately SQL Azure does not have Service Broker.
So, I need to look for an alternative, which as I mentioned is proving troublesome.
There is SQL Azure Managed Instance, which does have Service Broker; however, it is way too expensive for us to even consider. Not one of our clients would pay that much per month.
Anything else I've looked at doesn't seem to have everything I need. In particular, logging which program, version and UserID. Note that this isn't the SQL login UserID, which will be the same for everyone, this is the ID from the Users table with which they log in to our software, and is passed in the Connection String.
So, ideally I'd like something similar to what I have, just with something else in the place of the Service Broker:-
The C# Applications include in the connection string information about which program and version it is, and which User is currently logged in.
Important tables have Update and Delete triggers, which send details of any changes to an asynchronous queue of some sort.
Something then goes through the queue outside the normal program flow, and records details of the change to a separate AuditLog table.
The asynchronous queue and processing outside the normal program flow is important. Obviously I could very easily have the Update and Delete triggers do all the processing and add the records to the AuditLog table, in fact that was v1.0 of the system, but the problem there is that SQL will wait until the triggers have finished before returning to the C# program. This then causes the C# program to slow down considerably when multiple Updates or Deletes are happening.
I'd be happy to look into other logging systems instead of the above, however something which only records data changes without the extra information I pass, specifically program, version and UserID, won't be of any use to me. Our Users always want to know this information whenever they query something they think is an incorrect change.
So, any suggestions for an alternative to Service Broker for SQL Azure please? TIA!
Ok, looks like I have a potential solution: Temporal Tables
Temporal Tables work in Azure, and record a new row in a History table whenever something changes:-
CREATE TABLE dbo.LMSTemporalTest
(
[EmployeeID] INT NOT NULL PRIMARY KEY CLUSTERED
, [Name] NVARCHAR(100) NOT NULL
, [Position] NVARCHAR(100) NOT NULL
, [Department] NVARCHAR(100) NOT NULL
, [Address] NVARCHAR(1024) NOT NULL
, [AnnualSalary] DECIMAL (10,2) NOT NULL
, [UpdatedBy] UniqueIdentifier NOT NULL
, [UpdatedDate] DateTime NOT NULL
, [ValidFrom] DateTime2 (2) GENERATED ALWAYS AS ROW START HIDDEN
, [ValidTo] DateTime2 (2) GENERATED ALWAYS AS ROW END HIDDEN
, PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.LMSTemporalTestHistory));
GO
I can then insert a record into the table...
INSERT INTO LMSTemporalTest(EmployeeID,Name,Position,Department,Address,AnnualSalary, UpdatedBy, UpdatedDate)
VALUES(1, 'Bob', 'Builder', 'Fixers','Oops I forgot', 1, '0D7F5584-C79B-4044-87BD-034A770C4985', GetDate())
GO
Update the row...
UPDATE LMSTemporalTest SET
Address = 'Sunflower Valley, Bobsville',
UpdatedBy = '2C62290B-61A9-4B75-AACF-02B7A5EBFB80',
UpdatedDate = GetDate()
WHERE EmployeeID = 1
GO
Update the row again...
UPDATE LMSTemporalTest SET
AnnualSalary = 420.69,
UpdatedBy = '47F25135-35ED-4855-8050-046CD73E5A7D',
UpdatedDate = GetDate()
WHERE EmployeeID = 1
GO
And then check the results:-
SELECT * FROM LMSTemporalTest
GO
| EmployeeID | Name | Position | Department | Address | AnnualSalary | UpdatedBy | UpdatedDate |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| 1 | Bob | Builder | Fixers | Sunflower Valley, Bobsville | 420.69 | 47F25135-35ED-4855-8050-046CD73E5A7D | 2019-07-01 16:20:00.230 |
Note: Because I set them as Hidden, the Valid From and Valid To don't show up
Check the changes for a date / time range:-
SELECT * FROM LMSTemporalTest
FOR SYSTEM_TIME BETWEEN '2019-Jul-01 14:00' AND '2019-Jul-01 17:10'
WHERE EmployeeID = 1
ORDER BY ValidFrom;
GO
| EmployeeID | Name | Position | Department | Address | AnnualSalary | UpdatedBy | UpdatedDate |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| 1 | Bob | Builder | Fixers | Oops I forgot | 1.00 | 0D7F5584-C79B-4044-87BD-034A770C4985 | 2019-07-01 16:20:00.163 |
| 1 | Bob | Builder | Fixers | Sunflower Valley, Bobsville | 1.00 | 2C62290B-61A9-4B75-AACF-02B7A5EBFB80 | 2019-07-01 16:20:00.197 |
| 1 | Bob | Builder | Fixers | Sunflower Valley, Bobsville | 420.69 | 47F25135-35ED-4855-8050-046CD73E5A7D | 2019-07-01 16:20:00.230 |
And I can even view the History table
SELECT * FROM LMSTemporalTestHistory
GO
| EmployeeID | Name | Position | Department | Address | AnnualSalary | UpdatedBy | UpdatedDate | ValidFrom | ValidTo |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| 1 | Bob | Builder | Fixers | Oops I forgot | 1.00 | 0D7F5584-C79B-4044-87BD-034A770C4985 | 2019-07-01 16:20:00.163 | 2019-07-01 16:20:00.16 | 2019-07-01 16:20:00.19 |
| 1 | Bob | Builder | Fixers | Sunflower Valley, Bobsville | 1.00 | 2C62290B-61A9-4B75-AACF-02B7A5EBFB80 | 2019-07-01 16:20:00.197 | 2019-07-01 16:20:00.19 | 2019-07-01 16:20:00.22 |
Note: the current row doesn't show up, as it's still Valid
All of our important tables have CreatedBy, CreatedDate, UpdatedBy and UpdatedDate already, so I can use those for the UserID logging. No obvious way of handling the Program and Version as standard, but I can always add another hidden field and use Triggers to set that.
EDIT: Actually tested it out
First hurdle was: can you actually change an existing table into a Temporal Table, and the answer was: yes!
ALTER TABLE Clients ADD
[ValidFrom] DateTime2 (2) GENERATED ALWAYS AS ROW START HIDDEN NOT NULL DEFAULT '1753-01-01 00:00:00.000',
[ValidTo] DateTime2 (2) GENERATED ALWAYS AS ROW END HIDDEN NOT NULL DEFAULT '9999-12-31 23:59:59.997',
PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
GO
ALTER TABLE Clients SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ClientsHistory))
GO
An important bit above is the defaults on the ValidFrom and ValidTo fields. It only works if ValidTo is the maximum value a DateTime2 can be, hence '9999-12-31 23:59:59.997'. ValidFrom doesn't seem to matter, so I set that to the minimum just to cover everything.
Ok, so I've converted a table, but it now has two extra fields that the non-Azure table doesn't, which are theoretically hidden, but will our software complain about them?
Seems not. Fired up the software, edited a record on the Clients table and saved it, and the software didn't complain at all.
Checked the Clients and ClientsHistory tables:-
SELECT * FROM Clients
FOR SYSTEM_TIME BETWEEN '1753-01-01 00:00:00.000' AND '9999-12-31 23:59:59.997'
WHERE sCAccountNo = '0001064'
ORDER BY ValidFrom
Shows two records, the original and the edited one, and the existing UpdatedUser and UpdatedDate fields show correctly so I know who made the change and when.
SELECT * FROM ClientsHistory
Shows the original record, with ValidTo set to the date of the change.
All seems good, now I just need to check that it still only returns the current record in queries and to our software:-
SELECT * FROM Clients
WHERE sCAccountNo = '0001064'
Just returned the one record, and doesn't show the HIDDEN fields, ValidFrom and ValidTo.
Did a search in our software for Client 0001064, and again it just returned the one record, and didn't complain about the two extra fields.
Still need to set up a few Triggers and add another HIDDEN field to record the program and version from the Connection String, but it looks like Temporal Tables gives me a viable audit option.
Only downside so far is that it creates an entire record row for each set of changes, meaning you have to compare it to other records to find out what changed, but I can write something to simplify that easily enough.
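For what it's worth, a rough sketch of that kind of comparison using the example table above (a sketch only, not something built into the software):
SELECT EmployeeID,
       AnnualSalary,
       LAG(AnnualSalary) OVER (PARTITION BY EmployeeID ORDER BY ValidFrom) AS PreviousAnnualSalary,
       ValidFrom
FROM LMSTemporalTest FOR SYSTEM_TIME ALL
ORDER BY EmployeeID, ValidFrom;
GO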

SSIS data flow - copy new data or update existing

I query data from table A (the source) based on a certain condition and insert it into a temp table (the destination) before upserting into CRM.
If the data already exists in CRM, I don't want to query it from table A and insert it into the temp table (I want this table to stay empty) unless that data has been updated or new data was created. So basically I want to query only new data, or data from table A that already exists in CRM but has been modified. At the moment my data flow is like this:
1. Clear the temp table (a DELETE SQL statement).
2. Query from source table A and insert into the temp table.
3. From the temp table, insert into CRM using a script component.
In source table A I have audit columns: createdOn and modifiedOn.
I found one way to do this ("SSIS DataFlow - copy only changed and new records"), but I'm not really clear on how to do it.
What is the best and simplest way to achieve this?
The link you posted is basically saying to stage everything and use a MERGE to update your table (essentially an UPDATE/INSERT).
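A minimal sketch of that staging MERGE (the table and column names are placeholders, not from your schema):
MERGE dbo.DestinationTable AS tgt
USING dbo.StagingTableA AS src
    ON tgt.BusinessKey = src.BusinessKey
WHEN MATCHED AND src.modifiedOn > tgt.modifiedOn THEN
    UPDATE SET tgt.SomeColumn = src.SomeColumn,
               tgt.modifiedOn = src.modifiedOn
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, SomeColumn, modifiedOn)
    VALUES (src.BusinessKey, src.SomeColumn, src.modifiedOn);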
The only way I can really think of to make your process quicker (to a significant degree) by partially selecting from table A would be to add a "last updated" timestamp to table A and enforce that it is always kept up to date.
One way to do this is with a trigger; see here for an example.
You could then select based on that timestamp, perhaps keeping a record of the last timestamp used each time you run the SSIS package, and then adding a margin of safety to that.
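For reference, a minimal sketch of such a trigger (the table and key column names here are placeholders):
CREATE TRIGGER trg_TableA_SetModifiedOn
ON dbo.TableA
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET modifiedOn = GETDATE()
    FROM dbo.TableA AS t
    INNER JOIN inserted AS i ON i.ID = t.ID;  -- assumes an ID primary key
END;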
Edit: I just saw that you already have a modifiedOn column, so you could use that as described above.
Examples:
There are a few different ways you could do it:
ONE
Include the modifiedOn column on in your final destination table.
You can then build a dynamic query for your data flow source in a SSIS string variable, something like:
"SELECT * FROM [table A] WHERE modifiedOn >= DATEADD(DAY, -1, '" + #[User::MaxModifiedOnDate] + "')"
#[User::MaxModifiedOnDate] (string variable) would come from an Execute SQL Task, where you would write the result of the following query to it:
SELECT FORMAT(CAST(MAX(modifiedOn) AS date), 'yyyy-MM-dd') MaxModifiedOnDate FROM DestinationTable
The DATEADD part, as well as the CAST to a certain degree, represent your margin of safety.
TWO
If this isn't an option, you could keep a data load history table that would tell you when you need to load from, e.g.:
CREATE TABLE DataLoadHistory
(
DataLoadID int PRIMARY KEY IDENTITY
, DataLoadStart datetime NOT NULL
, DataLoadEnd datetime
, Success bit NOT NULL
)
You would begin each data load with this (Execute SQL Task):
CREATE PROCEDURE BeginDataLoad
@DataLoadID int OUTPUT
AS
INSERT INTO DataLoadHistory
(
DataLoadStart
, Success
)
VALUES
(
GETDATE()
, 0
)
SELECT @DataLoadID = SCOPE_IDENTITY()
You would store the returned DataLoadID in a SSIS integer variable, and use it when the data load is complete as follows:
CREATE PROCEDURE DataLoadComplete
@DataLoadID int
AS
UPDATE DataLoadHistory
SET
DataLoadEnd = GETDATE()
, Success = 1
WHERE DataLoadID = @DataLoadID
When it comes to building your query for table A, you would do it the same way as before (with the dynamically generated SQL query), except MaxModifiedOnDate would come from the following query:
SELECT FORMAT(CAST(MAX(DataLoadStart) AS date), 'yyyy-MM-dd') MaxModifiedOnDate FROM DataLoadHistory WHERE Success = 1
So the DataLoadHistory table, rather than your destination table.
Note that this would fail on the first run, as there'd be no successful entries in the history table, so you'd need to insert a dummy record, or find some other way around it.
THREE
I've seen it done a lot where, say, your data load runs every day, and you just stage the last 7 days, or something like that: some margin of safety that you're pretty sure will never be exceeded (because the process is being monitored for failures).
It's not my preferred option, but it is simple, and can work if you're confident in how well the process is being monitored.
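For example, the source query for that approach could be as simple as (a sketch; the 7-day window is arbitrary):
SELECT *
FROM [table A]
WHERE modifiedOn >= DATEADD(DAY, -7, CAST(GETDATE() AS date));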

How to reset SQL Server 2008 Column based on years

I'm working on leave software, and my problem is that I need to reset the leave days to the default number of days (30 days) after one year. Would you please help me with that?
PS: I'm using VB.NET and SQL Server.
create table Addemployees
(
Fname varchar (500),
Lname varchar (500),
ID int not null identity(1, 1) primary key,
CIN varchar (500),
fromD date,
toD date,
Email varchar(500),
phone varchar(500),
Leave_num int
)
This is the table that contains the column Leave_num, which holds the leave numbers inserted by the user.
update addemployees
set leave_num = 30
As for how you trigger this logic: there are many ways you could go about it. You'll need some sort of scheduler, like an Agent job, or whatever else you have at your disposal, to run this process on a recurring, scheduled basis. The key thing is not to keep updating Leave_num if it has already been updated. You could maintain an extra column on each row indicating the last time it was reset. This is probably the simplest, but if it's truly an all-or-nothing type thing, and those dates will all be the same, that's sort of a waste of space.
You could then either create a separate table which just contains information about when these once-a-year jobs run, or something like an Extended Property (which is a little more involved to set up).
Whatever solution you choose, just save off the date (or even just the year); then, when your process runs, if the difference since the last update is greater than a year (or if the year of the last update is less than the current year), run your update and then refresh however you're storing that information, be it columns, a separate table, or an extended property.
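As a minimal sketch against the Addemployees table above (the Last_reset column is an addition I'm assuming here, not part of your current schema):
ALTER TABLE Addemployees ADD Last_reset date NULL;

UPDATE Addemployees
SET Leave_num = 30,
    Last_reset = GETDATE()
WHERE Last_reset IS NULL
   OR DATEDIFF(YEAR, Last_reset, GETDATE()) >= 1;
You would run this from whatever scheduler you use (such as an Agent job), as described above.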

How to retrieve the last autoincremented ID from a SQLite table?

I have a table Messages with columns ID (primary key, autoincrement) and Content (text).
I have a table Users with columns username (primary key, text) and Hash.
A message is sent by one Sender (user) to many recipients (user) and a recipient (user) can have many messages.
I created a table Messages_Recipients with two columns: MessageID (referring to the ID column of the Messages table) and Recipient (referring to the username column in the Users table). This table represents the many-to-many relation between recipients and messages.
So, the question I have is this. The ID of a new message will be created after it has been stored in the database. But how can I hold a reference to the MessageRow I just added in order to retrieve this new MessageID? I can always search the database for the last row added of course, but that could possibly return a different row in a multithreaded environment?
EDIT: As I understand it for SQLite you can use the SELECT last_insert_rowid(). But how do I call this statement from ADO.Net?
My Persistence code (messages and messagesRecipients are DataTables):
public void Persist(Message message)
{
pm_databaseDataSet.MessagesRow messagerow;
messagerow=messages.AddMessagesRow(message.Sender,
message.TimeSent.ToFileTime(),
message.Content,
message.TimeCreated.ToFileTime());
UpdateMessages();
var x = messagerow;//I hoped the messagerow would hold a
//reference to the new row in the Messages table, but it does not.
foreach (var recipient in message.Recipients)
{
var row = messagesRecipients.NewMessages_RecipientsRow();
row.Recipient = recipient;
//row.MessageID= How do I find this??
messagesRecipients.AddMessages_RecipientsRow(row);
UpdateMessagesRecipients();//method not shown
}
}
private void UpdateMessages()
{
messagesAdapter.Update(messages);
messagesAdapter.Fill(messages);
}
One other option is to look at the system table sqlite_sequence. Your SQLite database will have that table automatically if you created any table with an AUTOINCREMENT primary key. This table is for SQLite to keep track of the autoincrement field so that it won't repeat the primary key even after you delete some rows or after some insert fails (read more about this here http://www.sqlite.org/autoinc.html).
So with this table there is the added benefit that you can find out your newly inserted item's primary key even after you inserted something else (in other tables, of course!). After making sure that your insert is successful (otherwise you will get a false number), you simply need to do:
select seq from sqlite_sequence where name='table_name'
With SQL Server you'd SELECT SCOPE_IDENTITY() to get the last identity value for the current process.
With SQlite, it looks like for an autoincrement you would do
SELECT last_insert_rowid()
immediately after your insert.
http://www.mail-archive.com/sqlite-users#sqlite.org/msg09429.html
In answer to your comment, to get this value from ADO.NET you would use code like the following (shown here with the System.Data.SQLite provider):
using (SQLiteConnection conn = new SQLiteConnection(connString))
{
    string sql = "SELECT last_insert_rowid()";
    SQLiteCommand cmd = new SQLiteCommand(sql, conn);
    conn.Open();
    long lastID = (long)cmd.ExecuteScalar();
}
I've had issues with using SELECT last_insert_rowid() in a multithreaded environment. If another thread inserts into another table that has an autoinc, last_insert_rowid will return the autoinc value from the new table.
Here's where they state that in the doco:
If a separate thread performs a new INSERT on the same database connection while the sqlite3_last_insert_rowid() function is running and thus changes the last insert rowid, then the value returned by sqlite3_last_insert_rowid() is unpredictable and might not equal either the old or the new last insert rowid.
That's from sqlite.org doco
According to Android Sqlite get last insert row id there is another query:
SELECT rowid from your_table_name order by ROWID DESC limit 1
Sample code based on @polyglot's solution:
SQLiteCommand sql_cmd = conn.CreateCommand(); // conn is an open SQLiteConnection
sql_cmd.CommandText = "select seq from sqlite_sequence where name='myTable';";
int newId = Convert.ToInt32(sql_cmd.ExecuteScalar());
sqlite3_last_insert_rowid() is unsafe in a multithreaded environment (and documented as such by SQLite).
However, the good news is that you can play the odds; see below.
ID reservation is NOT implemented in SQLite; you can also avoid the internal PK by using your own UNIQUE primary key, if there is something in your data that is always unique.
Note:
See whether the RETURNING clause solves your issue (a short sketch follows this note):
https://www.sqlite.org/lang_returning.html
As this is only available in recent versions of SQLite and may have some overhead, consider relying on the fact that it's really bad luck if another insertion lands in between your requests to SQLite.
Also, if you absolutely need to fetch SQLite's internal PK, consider whether you can design your own predictable PK instead:
https://sqlite.org/withoutrowid.html
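A short sketch of RETURNING with the Messages table from the question (requires SQLite 3.35 or later):
INSERT INTO Messages (Content) VALUES ('hello')
RETURNING ID;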
If you need a traditional AUTOINCREMENT PK, then yes, there is a small risk that the id you fetch may belong to another insertion. A small but unacceptable risk.
A workaround is to call sqlite3_last_insert_rowid() twice: #1 BEFORE the insert, then #2 AFTER the insert, as in:
sqlite3_int64 IdLast = sqlite3_last_insert_rowid(m_db); // Before the insert (this id is already used)
const int rc = sqlite3_exec(m_db, sql, NULL, NULL, &m_zErrMsg);
sqlite3_int64 IdEnd = sqlite3_last_insert_rowid(m_db);  // After the insert; most probably the right one
In the vast majority of cases IdEnd == IdLast + 1. This is the "happy path", and you can rely on IdEnd being the ID you are looking for.
Otherwise you need to do an extra SELECT, where you can use criteria based on the range IdLast to IdEnd (any additional criteria in the WHERE clause are good to add, if you have them).
Use ROWID (which is an SQLite keyword) to SELECT the relevant id range:
"SELECT my_pk_id FROM Symbols WHERE ROWID > %lld AND ROWID <= %lld;", IdLast, IdEnd);
// notice the > in ROWID > %lld, as we already know that IdLast is NOT the one we are looking for.
As the second call to sqlite3_last_insert_rowid() is done right after the INSERT, this SELECT generally returns only 2 or 3 rows at most.
Then search the SELECT results for the data you inserted to find the proper id.
Performance improvement: as the call to sqlite3_last_insert_rowid() is way faster than the INSERT (even if a mutex may occasionally make this wrong, it is statistically true), I bet on IdEnd being the right one and walk the SELECT results from the end. In nearly every case we tested, the last row does contain the ID you are looking for.
Performance improvement: If you have an additional UNIQUE Key, then add it to the WHERE to get only one row.
I experimented with 3 threads doing heavy insertions; it worked as expected. The preparation and DB handling take the vast majority of CPU cycles, and the result is that the odds of a mixed-up ID are in the range of 1 in 1000 insertions (the situation where IdEnd > IdLast + 1).
So the penalty of an additional SELECT to resolve this is rather low.
In other words, the benefit of using sqlite3_last_insert_rowid() is great for the vast majority of insertions, and with some care it can even be used safely in a multithreaded environment.
Caveat: the situation is slightly more awkward in transactional mode.
Also, SQLite doesn't explicitly guarantee that IDs will be contiguous and increasing (unless AUTOINCREMENT is used). At least I didn't find any documentation about that; that's my impression from looking at the SQLite source code.
The simplest method would be using:
SELECT MAX(id) FROM yourTableName LIMIT 1;
If you are trying to grab this last id in order to use it against another table, for example (if an invoice is added THEN add the ItemsList rows with that invoice ID),
in this case use something like:
var cmd_result = cmd.ExecuteNonQuery(); // returns the number of affected rows
Then use cmd_result to determine whether the previous query executed successfully, something like if (cmd_result > 0), followed by your query SELECT MAX(id) FROM yourTableName LIMIT 1;, just to make sure that you are not targeting the wrong row id in case the previous command did not add any rows.
In fact, the cmd_result > 0 check is very necessary in case anything fails, especially if you are developing a serious application; you don't want your users waking up to find random items added to their invoices.
I recently came up with a solution to this problem that sacrifices some performance overhead to ensure you get the correct last inserted ID.
Let's say you have a table people. Add a column called random_bigint:
create table people (
id int primary key,
name text,
random_bigint int not null
);
Add a unique index on random_bigint:
create unique index people_random_bigint_idx
ON people(random_bigint);
In your application, generate a random bigint whenever you insert a record. I guess there is a trivial possibility that a collision will occur, so you should handle that error.
My app is in Go and the code that generates a random bigint looks like this:
import (
    "crypto/rand"
    "math/big"
)

// RandomPositiveBigInt returns a uniformly random int64 in [0, 2^63 - 1).
func RandomPositiveBigInt() (int64, error) {
    nBig, err := rand.Int(rand.Reader, big.NewInt(9223372036854775807))
    if err != nil {
        return 0, err
    }
    return nBig.Int64(), nil
}
After you've inserted the record, query the table with a where filter on the random bigint value:
select id from people where random_bigint = <put random bigint here>
The unique index will add a small amount of overhead on the insertion. The id lookup, while very fast because of the index, will also add a little overhead.
However, this method will guarantee a correct last inserted ID.

How to make tasks double-checked (the way how to store it in the DB)?

I have a DB that stores different types of tasks and more items in different tables.
In many of these tables (whose structures differ) I need a way to require that an item be double-checked, meaning that the item can't be 'saved' (of course it will actually be saved) before someone else goes into the program and confirms it.
What would be the right way to mark which items are confirmed?
1. Each of these tables gets an "IsConfirmed" column; then, when the reviewer wants to confirm all the pending items, the program walks through all the tables and builds a list of the items that are not yet confirmed.
2. A third table holds the table name and the Id of each row that has to be confirmed.
I hope you have a better idea than the two ugly options above.
Is the double-confirmed status something that happens exactly once for an entity? Or can it be rejected and need to go through confirmation again? In the latter case, do you need to keep all of this history? Do you need to keep track of who confirmed each time (e.g. so you don't have the same person performing both confirmations)?
The simple case:
ALTER TABLE dbo.Table ADD ConfirmCount TINYINT NOT NULL DEFAULT 0;
ALTER TABLE dbo.Table ADD Processed BIT NOT NULL DEFAULT 0;
When the first confirmation:
UPDATE dbo.Table SET ConfirmCount = 1 WHERE PK = <PK> AND ConfirmCount = 0;
On second confirmation:
UPDATE dbo.Table SET ConfirmCount = 2 WHERE PK = <PK> AND ConfirmCount = 1;
When rejected:
UPDATE dbo.Table SET ConfirmCount = 0 WHERE PK = <PK>;
Now obviously your background job can only treat rows where Processed = 0 and ConfirmCount = 2. Then when it has processed that row:
UPDATE dbo.Table SET Processed = 1 WHERE PK = <PK>;
If you have a more complex scenario than this, please provide more details, including the goals of the double-confirm process.
Consider adding a new table to hold the records to be confirmed (e.g. TasksToBeConfirmed). Once the records are confirmed, move those records to the permanent table (Tasks).
The disadvantage of adding an "IsConfirmed" column is that virtually every SQL statement that uses the table will have to filter on "IsConfirmed" to prevent getting unconfirmed records. Every time this is missed, a defect is introduced.
In cases where you need confirmed and unconfirmed records, use UNION.
This pattern is a little more work to code and implement, but in my experience, significantly improves performance and reduces defects.
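A minimal sketch of that pattern, and of the UNION mentioned above (the table and column names are placeholders):
BEGIN TRANSACTION;

INSERT INTO Tasks (TaskID, Title, CreatedBy)
SELECT TaskID, Title, CreatedBy
FROM TasksToBeConfirmed
WHERE TaskID = @TaskID;

DELETE FROM TasksToBeConfirmed
WHERE TaskID = @TaskID;

COMMIT;

-- When you need confirmed and unconfirmed records together:
SELECT TaskID, Title, 1 AS IsConfirmed FROM Tasks
UNION ALL
SELECT TaskID, Title, 0 AS IsConfirmed FROM TasksToBeConfirmed;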
