MS SQL Server trigger to update item rating and number of votes - sql-server

To make this easier to understand, I will present the exact same problem as if it were about a forum. The actual app has nothing to do with forums (it is a very specialized app intended for hardcore graphic designers), but I think the parallel is easier for most of us to grasp.
Let's suppose that there is a thread table that stores information about each forum thread and a threadrating table that stores thread ratings per user (1-5). For efficiency I decided to cache the rating average and number of votes in the thread table and triggers sounded like a good idea for updating it (I used to do such stuff in the actual application code, but I think triggers are worth a try, despite the debugging dangers).
As you know, MS SQL Server doesn't support a trigger to be executed per row, it has to be per statement. So I tried defining it this way:
CREATE TRIGGER thread_rating ON threadrating
AFTER INSERT
AS
UPDATE thread
SET
thread.rating = (thread.rating * thread.voters + SUM(inserted.rating))/(thread.voters + COUNT(inserted.rating)),
thread.voters = thread.voters + COUNT(inserted.rating)
FROM thread
INNER JOIN inserted ON(inserted.threadid = thread.threadid)
GROUP BY inserted.threadid
but I get an error for the "GROUP BY" clause (which I expected). The question is, how can I make this work?
Sorry if the question is stupid, but it's the first time I actually try to use triggers.
Additional info: The thread table would contain threadid (int, primary key), rating (float), voters (int) and some other fields that are irrelevant to the current question.
The threadrating table only contains threadid (foreign key), userid (foreign key to the primary key of the users table) and rating (tinyint between 1 and 5).
The error message is "Incorrect syntax near the keyword 'GROUP'."

First, I strongly recommend that you not use triggers.
If you're getting a syntax error, check that your parens are balanced as well as your begin/ends. In your case, you have an end (at the end) but no begin. You can fix that by just removing the end.
Once you fix that, you'll likely get some more errors like "columns x,y,z not in an aggregate or group by". That's because you have several columns that are not in either. You need to add thread.rating, thread.voters, etc. to your group by or perform some kind of aggregate on them.
This is all assuming that there are multiple records with the same threadID (ie, it's not the primary key). If that's not the case, then what's the purpose of the group by?
Edit:
I'm stumped on the syntax error. I worked around it with a couple correlated sub queries. I guessed at your table structure so modify as needed and try this:
--CREATE TABLE ThreadRating (threadid int not null, userid int not null, rating int not null)
--CREATE TABLE Thread (threadid int not null, rating int not null, voters int not null)
ALTER TRIGGER thread_rating ON threadrating
AFTER INSERT
AS
UPDATE Thread
SET Thread.rating =
(SELECT (Thread.Rating * Thread.Voters + SUM(I.Rating)) / (Thread.Voters + COUNT(I.Rating))
FROM Inserted I WHERE I.ThreadID = Thread.ThreadID) -- aggregate only the newly inserted votes
,Thread.Voters =
(SELECT Thread.Voters + COUNT(I.Rating)
FROM Inserted I WHERE I.ThreadID = Thread.ThreadID)
FROM Thread
JOIN Inserted ON Inserted.ThreadID = Thread.ThreadID
If that's what you wanted, then we can check the performance/execution plan and modify as needed. We might be able to get it to work with the group by yet.
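The arithmetic the trigger has to perform for a multi-row insert can be checked outside the database. The following Python snippet is only an illustrative sketch (the function name `apply_votes` and the dictionary layout are made up for this example, not part of the question): it groups the inserted votes per thread first, then folds each group's sum and count into the cached average, which is roughly what a per-statement trigger has to do with the rows in `inserted`.

```python
from collections import defaultdict

def apply_votes(threads, inserted):
    """Fold a batch of new votes into the cached (rating, voters) per thread.

    threads:  {threadid: (rating, voters)}  -- hypothetical cache layout
    inserted: list of (threadid, rating)    -- plays the role of `inserted`
    """
    # Group the batch per thread: sum and count of the new ratings.
    totals = defaultdict(lambda: [0, 0])
    for threadid, rating in inserted:
        totals[threadid][0] += rating
        totals[threadid][1] += 1

    # Update each affected thread once, using its old cached values.
    for threadid, (vote_sum, vote_count) in totals.items():
        rating, voters = threads[threadid]
        new_voters = voters + vote_count
        threads[threadid] = ((rating * voters + vote_sum) / new_voters, new_voters)
    return threads

threads = {1: (4.0, 2), 2: (0.0, 0)}
apply_votes(threads, [(1, 5), (1, 3), (2, 2)])
```

The important detail is that every thread is updated exactly once per batch, from its old cached values, no matter how many votes for it arrive in the same statement.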
Alternatives to triggers
If you are updating data that impacts ratings in only a few select places, I'd recommend updating the ratings directly there. Factoring the logic into a trigger is nice but brings lots of problems (performance, visibility, etc.). This can be aided by a function.
Consider this: your trigger will execute every single time someone touches that table. Things like view counts, last updated dates, etc. will execute this trigger. You can add logic to short circuit the trigger in those cases but it gets complicated rapidly.

D'ohh! I totally misread your question and I thought you were asking about MySQL. Mea culpa! I will leave the solution below intact, and mark it as community wiki. Maybe it'll be useful to someone with a similar problem on MySQL.
MySQL triggers are executed per row. Also the pseudo-table "inserted" is a Microsoft SQL Server convention.
MySQL uses pseudo-tables NEW and OLD as extensions to the trigger language.
Here's a solution to your problem:
CREATE TRIGGER thread_rating
AFTER INSERT ON threadrating
FOR EACH ROW
BEGIN
UPDATE thread
SET rating = (rating*voters + NEW.rating)/(voters+1),
voters = voters + 1
WHERE threadid = NEW.threadid;
END
Likewise you'd need triggers for UPDATE and DELETE:
-- trigger names must be unique, hence the suffixes
CREATE TRIGGER thread_rating_update
AFTER UPDATE ON threadrating
FOR EACH ROW
BEGIN
UPDATE thread
SET rating = (rating*voters - OLD.rating + NEW.rating)/voters
WHERE threadid = NEW.threadid;
END
CREATE TRIGGER thread_rating_delete
AFTER DELETE ON threadrating
FOR EACH ROW
BEGIN
UPDATE thread
-- note: when the last vote is deleted, voters-1 is 0; handle that case as your schema requires
SET rating = (rating*voters - OLD.rating)/(voters-1),
voters = voters - 1
WHERE threadid = OLD.threadid;
END
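For readers who want to try the per-row formulas without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used only as a stand-in: its triggers are also per-row and expose NEW and OLD, but the syntax differs slightly from MySQL, and the schema and trigger names below are assumptions based on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thread (threadid INTEGER PRIMARY KEY, rating REAL NOT NULL, voters INTEGER NOT NULL);
CREATE TABLE threadrating (threadid INTEGER, userid INTEGER, rating INTEGER);

CREATE TRIGGER thread_rating_insert AFTER INSERT ON threadrating
FOR EACH ROW BEGIN
    UPDATE thread
    SET rating = (rating * voters + NEW.rating) / (voters + 1),
        voters = voters + 1
    WHERE threadid = NEW.threadid;
END;

CREATE TRIGGER thread_rating_update AFTER UPDATE ON threadrating
FOR EACH ROW BEGIN
    UPDATE thread
    SET rating = (rating * voters - OLD.rating + NEW.rating) / voters
    WHERE threadid = NEW.threadid;
END;

INSERT INTO thread VALUES (1, 0.0, 0);
""")

# Three users vote 4, 5 and 3; then user 12 changes their vote from 3 to 5.
conn.executemany("INSERT INTO threadrating VALUES (1, ?, ?)", [(10, 4), (11, 5), (12, 3)])
conn.execute("UPDATE threadrating SET rating = 5 WHERE userid = 12")
rating, voters = conn.execute(
    "SELECT rating, voters FROM thread WHERE threadid = 1").fetchone()
```

After these statements the cached values should match the plain average of the stored votes (4, 5 and 5), which is an easy way to sanity-check the incremental formulas.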

You may find the following reading helpful:
An introduction to Triggers
Wikipedia: DB Triggers


Query help to track daily updates made to table(for specific column)

I have 2 tables, Individual (IndividualId is the primary key) and IndividualAudit. Every time an update is made on the Individual table,
a record goes to the audit table. There are many columns that can be modified, but I am interested only in picking up records where SSN is modified.
I'm using the query below:
Select DI.IndividualId,DI.ssn FRom Individual I
INNER JOIN IndividualAudit A
ON(I.IndividualId = A.IndividualId and A.UpdateDate = GETDATE())
where i.updatedate = GETDATE() and I.ssn <> a.ssn
group by I.IndividualId,I.ssn
Can someone please tell me whether my approach is correct?
Actually, I was searching on Google and got scared looking at the link below:
Query help when using audit table
The person who answered a similar query in that post seems to be very good with SQL, and compared with his answer my approach looks quite naive.
So I just want to know where I am wrong in my understanding.
Thanks a lot
Rather than fixing the query, I'd suggest instead using an update trigger aimed specifically at changes to that SSN column you're concerned about. The query you've supplied won't work because of the date comparison (as user2159471 has pointed out). But even after you get the query fixed, you'll still have to run it in order to see which SSNs have been updated.
Instead use a SQL update trigger that, perhaps, inserts an entry into a third table each time an individual's SSN gets changed. Then you can look at that table any time you like, or run a report against it, to see who's been changed.
The trigger code looks like this:
CREATE TRIGGER MyCoolNewTrigger ON Individual
FOR UPDATE
AS
SET NOCOUNT ON
IF (UPDATE(SSN))
BEGIN
-- inserted/deleted can hold many rows per statement, so log them set-based
-- instead of copying a single value into scalar variables
INSERT INTO IndividualUpdateLog (NewSSN, OldSSN, ChangeDate)
SELECT i.SSN, d.SSN, GETDATE() -- new SSN, old SSN, time of change
FROM inserted i
INNER JOIN deleted d ON d.IndividualId = i.IndividualId
WHERE i.SSN <> d.SSN
END

How to retrieve the last autoincremented ID from a SQLite table?

I have a table Messages with columns ID (primary key, autoincrement) and Content (text).
I have a table Users with columns username (primary key, text) and Hash.
A message is sent by one Sender (user) to many recipients (user) and a recipient (user) can have many messages.
I created a table Messages_Recipients with two columns: MessageID (referring to the ID column of the Messages table and Recipient (referring to the username column in the Users table). This table represents the many to many relation between recipients and messages.
So, the question I have is this. The ID of a new message will be created after it has been stored in the database. But how can I hold a reference to the MessageRow I just added in order to retrieve this new MessageID? I can always search the database for the last row added of course, but that could possibly return a different row in a multithreaded environment?
EDIT: As I understand it for SQLite you can use the SELECT last_insert_rowid(). But how do I call this statement from ADO.Net?
My Persistence code (messages and messagesRecipients are DataTables):
public void Persist(Message message)
{
pm_databaseDataSet.MessagesRow messagerow;
messagerow=messages.AddMessagesRow(message.Sender,
message.TimeSent.ToFileTime(),
message.Content,
message.TimeCreated.ToFileTime());
UpdateMessages();
var x = messagerow;//I hoped the messagerow would hold a
//reference to the new row in the Messages table, but it does not.
foreach (var recipient in message.Recipients)
{
var row = messagesRecipients.NewMessages_RecipientsRow();
row.Recipient = recipient;
//row.MessageID= How do I find this??
messagesRecipients.AddMessages_RecipientsRow(row);
UpdateMessagesRecipients();//method not shown
}
}
private void UpdateMessages()
{
messagesAdapter.Update(messages);
messagesAdapter.Fill(messages);
}
One other option is to look at the system table sqlite_sequence. Your sqlite database will have that table automatically if you created any table with autoincrement primary key. This table is for sqlite to keep track of the autoincrement field so that it won't repeat the primary key even after you delete some rows or after some insert failed (read more about this here http://www.sqlite.org/autoinc.html).
So with this table there is the added benefit that you can find out your newly inserted item's primary key even after you inserted something else (in other tables, of course!). After making sure that your insert is successful (otherwise you will get a false number), you simply need to do:
select seq from sqlite_sequence where name='table_name'
With SQL Server you'd SELECT SCOPE_IDENTITY() to get the last identity value for the current process.
With SQLite, it looks like for an autoincrement you would do
SELECT last_insert_rowid()
immediately after your insert.
http://www.mail-archive.com/sqlite-users@sqlite.org/msg09429.html
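In answer to your comment, to get this value you would want to use ADO.NET code against a SQLite provider (System.Data.SQLite is assumed here), on the same connection that ran the INSERT, like:
using (SQLiteConnection conn = new SQLiteConnection(connString))
{
conn.Open();
// ... run the INSERT on this same connection first;
// last_insert_rowid() is tracked per connection ...
SQLiteCommand cmd = new SQLiteCommand("SELECT last_insert_rowid()", conn);
// the rowid is a 64-bit integer, so convert instead of casting to Int32
long lastID = Convert.ToInt64(cmd.ExecuteScalar());
}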
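As a language-neutral illustration of the same idea, here is a small sketch with Python's built-in sqlite3 module (the table layout is simplified from the question, and the helper name `persist` is made up for this example). The point it tries to show is that the id is read on the same connection that performed the INSERT, and is then reused for the rows in Messages_Recipients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Messages (ID INTEGER PRIMARY KEY AUTOINCREMENT, Content TEXT);
CREATE TABLE Messages_Recipients (MessageID INTEGER, Recipient TEXT);
""")

def persist(conn, content, recipients):
    """Insert a message and its recipients; return the new message ID."""
    with conn:  # one transaction for the message and its recipients
        cur = conn.execute("INSERT INTO Messages (Content) VALUES (?)", (content,))
        message_id = cur.lastrowid  # id generated by this cursor's INSERT
        # Same value via SQL: valid because we are on the same connection
        # and nothing else has inserted on it in between.
        sql_id = conn.execute("SELECT last_insert_rowid()").fetchone()[0]
        assert sql_id == message_id
        conn.executemany(
            "INSERT INTO Messages_Recipients (MessageID, Recipient) VALUES (?, ?)",
            [(message_id, r) for r in recipients],
        )
    return message_id

first = persist(conn, "hello", ["alice", "bob"])
second = persist(conn, "again", ["carol"])
```

Capturing the id immediately after the INSERT matters here, because the later inserts into Messages_Recipients change what last_insert_rowid() reports on that connection.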
I've had issues with using SELECT last_insert_rowid() in a multithreaded environment. If another thread inserts into another table that has an autoinc, last_insert_rowid will return the autoinc value from the new table.
Here's where they state that in the doco:
If a separate thread performs a new INSERT on the same database connection while the sqlite3_last_insert_rowid() function is running and thus changes the last insert rowid, then the value returned by sqlite3_last_insert_rowid() is unpredictable and might not equal either the old or the new last insert rowid.
That's from sqlite.org doco
According to Android Sqlite get last insert row id there is another query:
SELECT rowid from your_table_name order by ROWID DESC limit 1
Sample code from @polyglot's solution:
SQLiteCommand sql_cmd = new SQLiteCommand(conn); // conn: your already open SQLiteConnection
sql_cmd.CommandText = "select seq from sqlite_sequence where name='myTable'; ";
int newId = Convert.ToInt32( sql_cmd.ExecuteScalar( ) );
sqlite3_last_insert_rowid() is unsafe in a multithreaded environment (and is documented as such by SQLite).
However, the good news is that you can play the odds; see below.
ID reservation is NOT implemented in SQLite. You can also avoid the built-in PK by using your own UNIQUE primary key if some value in your data is always distinct.
Note:
See if the clause on RETURNING won't solve your issue
https://www.sqlite.org/lang_returning.html
As this is only available in recent versions of SQLite (3.35.0 and later) and may have some overhead, consider relying on the fact that it takes really bad luck to have another insertion land between your requests to SQLite.
Also consider whether you absolutely need to fetch SQLite's internal PK, or whether you can design your own predictable PK:
https://sqlite.org/withoutrowid.html
If you need a traditional AUTOINCREMENT PK, then yes, there is a small risk that the id you fetch belongs to another insertion. Small, but unacceptable.
A workaround is to call sqlite3_last_insert_rowid() twice:
#1 BEFORE the insert, then #2 AFTER the insert,
as in:
sqlite3_int64 IdLast = sqlite3_last_insert_rowid(m_db); // Before: this id is already used
const int rc = sqlite3_exec(m_db, sql, NULL, NULL, &m_zErrMsg);
sqlite3_int64 IdEnd = sqlite3_last_insert_rowid(m_db); // After the insert: most probably the right one
In the vast majority of cases IdEnd==IdLast+1. This is the "happy path", and you can rely on IdEnd being the ID you are looking for.
Otherwise you need to do an extra SELECT, using criteria based on the range IdLast to IdEnd (any additional criteria you can put in the WHERE clause are good to add).
Use ROWID (which is an SQLite keyword) to SELECT the id range that is relevant.
"SELECT my_pk_id FROM Symbols WHERE ROWID>%lld AND ROWID<=%lld;", (long long)IdLast, (long long)IdEnd);
// notice the > in: ROWID>%lld, as we already know that IdLast is NOT the one we look for.
As the second call to sqlite3_last_insert_rowid() is made right after the INSERT, this SELECT generally returns only 2 or 3 rows at most.
Then search the SELECT result for the data you inserted to find the proper id.
Performance improvement: as the call to sqlite3_last_insert_rowid() is way faster than the INSERT (a mutex may occasionally make that wrong, but it is statistically true), I bet on IdEnd being the right one and walk the SELECT results from the end. In nearly every case we tested, the last row did contain the ID we were looking for.
Performance improvement: If you have an additional UNIQUE key, then add it to the WHERE to get only one row.
I experimented with 3 threads doing heavy insertions, and it worked as expected. Preparation and DB handling take the vast majority of CPU cycles, so the odds of an ID mix-up (the situation where IdEnd>IdLast+1) were in the range of 1 in 1000 insertions.
So the penalty of an additional SELECT to resolve this is rather low.
In other words, sqlite3_last_insert_rowid() pays off in the vast majority of insertions and, with some care, can even be used safely in a multithreaded program.
Caveat: the situation is slightly more awkward in transactional mode.
Also, SQLite does not explicitly guarantee that IDs will be contiguous and growing (unless AUTOINCREMENT is used). (At least I didn't find information about that, but looking at the SQLite source code, it precludes that.)
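The RETURNING clause mentioned in the note above can be tried with a short sketch in Python's built-in sqlite3 module. The Symbols table and my_pk_id column are borrowed from the SELECT in this answer; the `name` column and the helper `insert_symbol` are assumptions for illustration. Because RETURNING needs SQLite 3.35.0 or later, the sketch falls back to the cursor's lastrowid on older libraries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Symbols (my_pk_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

def insert_symbol(conn, name):
    """Insert a row and return its generated id."""
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        # RETURNING hands back the id as part of the INSERT statement itself,
        # so no second call is needed to find out which id was assigned.
        row = conn.execute(
            "INSERT INTO Symbols (name) VALUES (?) RETURNING my_pk_id", (name,)
        ).fetchone()
        return row[0]
    # Older SQLite: fall back to the id recorded for this cursor's INSERT.
    return conn.execute("INSERT INTO Symbols (name) VALUES (?)", (name,)).lastrowid

ids = [insert_symbol(conn, n) for n in ("AAA", "BBB", "CCC")]
```

With RETURNING the id travels with the statement that created it, which sidesteps the before/after comparison described above wherever a recent enough SQLite is available.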
The simplest method would be using:
SELECT MAX(id) FROM yourTableName LIMIT 1;
If you are grabbing this last id in order to affect another table (for example: if an invoice is added, THEN add the ItemsList to that invoice ID),
in this case use something like:
var cmd_result = cmd.ExecuteNonQuery(); // returns the number of affected rows
Then use cmd_result to determine whether the previous query executed successfully, something like if(cmd_result > 0), followed by your query SELECT MAX(id) FROM yourTableName LIMIT 1; just to make sure that you are not targeting the wrong row id in case the previous command did not add any rows.
In fact, the cmd_result > 0 check is very necessary in case anything fails, especially if you are developing a serious application: you don't want your users waking up to find random items added to their invoice.
I recently came up with a solution to this problem that sacrifices some performance overhead to ensure you get the correct last inserted ID.
Let's say you have a table people. Add a column called random_bigint:
create table people (
id integer primary key, -- INTEGER (not INT) makes id an alias for the rowid, so SQLite assigns it automatically
name text,
random_bigint int not null
);
Add a unique index on random_bigint:
create unique index people_random_bigint_idx
ON people(random_bigint);
In your application, generate a random bigint whenever you insert a record. I guess there is a trivial possibility that a collision will occur, so you should handle that error.
My app is in Go and the code that generates a random bigint looks like this:
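import (
"crypto/rand"
"math/big"
)

// RandomPositiveBigInt returns a cryptographically random integer in [0, 2^63-1).
func RandomPositiveBigInt() (int64, error) {
nBig, err := rand.Int(rand.Reader, big.NewInt(9223372036854775807))
if err != nil {
return 0, err
}
return nBig.Int64(), nil
}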
After you've inserted the record, query the table with a where filter on the random bigint value:
select id from people where random_bigint = <put random bigint here>
The unique index will add a small amount of overhead on the insertion. The id lookup, while very fast because of the index, will also add a little overhead.
However, this method will guarantee a correct last inserted ID.

T-SQL Insert or update

I have a question regarding performance of SQL Server.
Suppose I have a table persons with the following columns: id, name, surname.
Now, I want to insert a new row in this table. The rule is the following:
If id is not present in the table, then insert the row.
If id is present, then update.
I have two solutions here:
First:
update persons
set id=@p_id, name=@p_name, surname=@p_surname
where id=@p_id
if @@ROWCOUNT = 0
insert into persons(id, name, surname)
values (@p_id, @p_name, @p_surname)
Second:
if exists (select id from persons where id = @p_id)
update persons
set id=@p_id, name=@p_name, surname=@p_surname
where id=@p_id
else
insert into persons(id, name, surname)
values (@p_id, @p_name, @p_surname)
What is a better approach? It seems like in the second choice, to update a row, it has to be searched two times, whereas in the first option - just once. Are there any other solutions to the problem? I am using MS SQL 2000.
Both work fine, but I usually use option 2 (pre-mssql 2008) since it reads a bit more clearly. I wouldn't stress about the performance here either...If it becomes an issue, you can use NOLOCK in the exists clause. Though before you start using NOLOCK everywhere, make sure you've covered all your bases (indexes and big picture architecture stuff). If you know you will be updating every item more than once, then it might pay to consider option 1.
Option 3 is to not use destructive updates. It takes more work, but basically you insert a new row every time the data changes (never update or delete from the table) and have a view that selects all the most recent rows. It's useful if you want the table to contain a history of all its previous states, but it can also be overkill.
Option 1 seems good. However, if you're on SQL Server 2008, you could also use MERGE, which may perform well for such UPSERT tasks.
Note that you may want to use an explicit transaction and the XACT_ABORT option for such tasks, so that the transaction consistency remains in the case of a problem or concurrent change.
I tend to use option 1. If there is a record in the table, you save one search. If there isn't, you don't lose anything. Moreover, in the second option you may run into funny locking and deadlocking issues related to lock incompatibility.
There's some more info on my blog:
http://sqlblogcasts.com/blogs/piotr_rodak/archive/2010/01/04/updlock-holdlock-and-deadlocks.aspx
You could just use @@ROWCOUNT to see if the update did anything. Something like:
UPDATE MyTable
SET SomeData = 'Some Data' WHERE ID = 1
IF @@ROWCOUNT = 0
BEGIN
INSERT MyTable
SELECT 1, 'Some Data'
END
Aiming to be a little more DRY, I avoid writing out the values list twice.
begin tran
-- insert a placeholder row only when the id is missing
-- (no FROM clause, so at most one row is inserted)
insert into persons (id)
select @p_id
where not exists (select * from persons where id = @p_id)
update persons
set name=@p_name, surname=@p_surname
where id = @p_id
commit
Columns name and surname have to be nullable.
The transaction means no other user will ever see the "blank" record.
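The update-then-insert pattern (option 1) can be tried out with a short sketch in Python's built-in sqlite3 module. SQLite stands in for SQL Server purely so the snippet is runnable, the cursor's `rowcount` plays the role of @@ROWCOUNT, and the helper name `upsert_person` is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, surname TEXT)")

def upsert_person(conn, p_id, p_name, p_surname):
    """Update the row if it exists, otherwise insert it (option 1)."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE persons SET name = ?, surname = ? WHERE id = ?",
            (p_name, p_surname, p_id),
        )
        if cur.rowcount == 0:  # nothing was updated, so the id is new
            conn.execute(
                "INSERT INTO persons (id, name, surname) VALUES (?, ?, ?)",
                (p_id, p_name, p_surname),
            )

upsert_person(conn, 1, "Ada", "Lovelace")  # inserts
upsert_person(conn, 1, "Ada", "King")      # updates the same row
rows = conn.execute("SELECT id, name, surname FROM persons").fetchall()
```

Wrapping both statements in one transaction keeps the pair atomic for this connection; how much protection that gives against concurrent writers depends on the database and isolation level in use.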

How to make tasks double-checked (the way how to store it in the DB)?

I have a DB that stores different types of tasks and more items in different tables.
In many of these tables (whose structures differ) I need a way to require that an item be double-checked, meaning that the item can't be 'saved' (of course it will be saved) before someone else goes into the program and confirms it.
What should be the right way to say which item is confirmed:
Each of these tables should have a column "IsConfirmed", then when that guy wants to confirm all the stuff, the program walks thru all the tables and creates a list of the items that are not checked.
There should be a third table that holds the table name and Id of that row that has to be confirmed.
I hope you have a better idea than the two uglies above.
Is the double-confirmed status something that happens exactly once for an entity? Or can it be rejected and need to go through confirmation again? In the latter case, do you need to keep all of this history? Do you need to keep track of who confirmed each time (e.g. so you don't have the same person performing both confirmations)?
The simple case:
ALTER TABLE dbo.Table ADD ConfirmCount TINYINT NOT NULL DEFAULT 0;
ALTER TABLE dbo.Table ADD Processed BIT NOT NULL DEFAULT 0;
On first confirmation:
UPDATE dbo.Table SET ConfirmCount = 1 WHERE PK = <PK> AND ConfirmCount = 0;
On second confirmation:
UPDATE dbo.Table SET ConfirmCount = 2 WHERE PK = <PK> AND ConfirmCount = 1;
When rejected:
UPDATE dbo.Table SET ConfirmCount = 0 WHERE PK = <PK>;
Now obviously your background job can only treat rows where Processed = 0 and ConfirmCount = 2. Then when it has processed that row:
UPDATE dbo.Table SET Processed = 1 WHERE PK = <PK>;
If you have a more complex scenario than this, please provide more details, including the goals of the double-confirm process.
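The guarded updates above can be exercised with a small sketch in Python's built-in sqlite3 module. This is a condensed variant under stated assumptions: the table name Tasks and the helper names are hypothetical, SQLite stands in for SQL Server, and the two confirmation statements are folded into one UPDATE that only advances rows still below 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tasks (PK INTEGER PRIMARY KEY,
                    ConfirmCount INTEGER NOT NULL DEFAULT 0,
                    Processed INTEGER NOT NULL DEFAULT 0);
INSERT INTO Tasks (PK) VALUES (1);
""")

def confirm(conn, pk):
    """Advance ConfirmCount one step (0 -> 1 -> 2); return True if a row changed."""
    cur = conn.execute(
        "UPDATE Tasks SET ConfirmCount = ConfirmCount + 1 "
        "WHERE PK = ? AND ConfirmCount < 2",
        (pk,),
    )
    return cur.rowcount == 1

def ready_for_processing(conn):
    """Rows the background job may pick up: confirmed twice, not yet processed."""
    return [r[0] for r in conn.execute(
        "SELECT PK FROM Tasks WHERE Processed = 0 AND ConfirmCount = 2")]

results = [confirm(conn, 1), confirm(conn, 1), confirm(conn, 1)]
ready = ready_for_processing(conn)
```

Because the WHERE clause carries the expected state, a third confirmation attempt changes nothing, and the caller can tell from the affected-row count whether its confirmation actually took effect.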
Consider adding a new table to hold the records to be confirmed (e.g. TasksToBeConfirmed). Once the records are confirmed, move those records to the permanent table (Tasks).
The disadvantage of adding an "IsConfirmed" column is that virtually every SQL statement that uses the table will have to filter on "IsConfirmed" to prevent getting unconfirmed records. Every time this is missed, a defect is introduced.
In cases where you need confirmed and unconfirmed records, use UNION.
This pattern is a little more work to code and implement, but in my experience, significantly improves performance and reduces defects.

How to control order of Update query execution?

I have a table in MS SQL 2005. And would like to do:
update Table
set ID = ID + 1
where ID > 5
And the problem is that ID is primary key and when I do this I have an error, because when this query comes to row with ID 8 it tries to change the value to 9, but there is old row in this table with value 9 and there is constraint violation.
Therefore I would like to control the update query to make sure that it's executed in the descending order.
So not for ID = 1,2,3,4 and so on, but rather ID = 98574 (or whatever the highest is) and then 98573, 98572 and so on. In this situation there will be no constraint violation.
So how to control the order of update execution? Is there a simple way to accomplish this programmatically?
Transact SQL defers constraint checking until the statement finishes.
That's why this query:
UPDATE mytable
SET id = CASE WHEN id = 7 THEN 8 ELSE 7 END
WHERE id IN (7, 8)
will not fail, though it swaps id's 7 and 8.
It seems that some duplicate values are left after your query finishes.
Try this:
update Table
set ID = (ID + 1) * 100000
where ID > 5
update Table
set ID = ID / 100000
where ID > 500000
Don't forget the parentheses, so that the +1 survives the later integer division by 100000:
update Table
set ID = (ID + 1) * 100000
where ID > 5
If the IDs get too big here, you can always use a loop.
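The two-step shift can be tried end to end with a sketch in Python's built-in sqlite3 module. SQLite is only a stand-in here so that the snippet is runnable, and the table `t`, its `label` column and the `OFFSET` constant are assumptions for illustration: the rows are first moved into a range that no existing ID occupies (already carrying the +1), then scaled back down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 10)])

OFFSET = 100000  # chosen larger than any existing ID

with conn:
    # Step 1: move the affected rows far out of the way, with the +1 applied.
    conn.execute("UPDATE t SET ID = (ID + 1) * ? WHERE ID > 5", (OFFSET,))
    # Step 2: scale back down; only the moved rows are >= OFFSET.
    conn.execute("UPDATE t SET ID = ID / ? WHERE ID >= ?", (OFFSET, OFFSET))

ids = [r[0] for r in conn.execute("SELECT ID FROM t ORDER BY ID")]
label_at_7 = conn.execute("SELECT label FROM t WHERE ID = 7").fetchone()[0]
```

Neither step can collide with an existing key, since the intermediate values lie above every original ID and the final values land on slots that step 1 has already vacated.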
Personally I would not update an id field this way. I would create a work table that maps old ids to new ids. It stores both ids, and then all the updates are done from that. If you are not using cascade delete (which could incidentally lock your tables for a long time), then start with the child tables and work up; otherwise start with the pk table. Do not do this unless you are in single user mode, or you can get some nasty data integrity problems if other users are changing things while the tables are not consistent with each other.
PKs are nothing to fool around with changing and if at all possible should not be changed.
Before you do any changes to production data in this way, make sure to take a full backup. Messing this up can cost you your job if you can't recover.
