T-SQL Insert or update - sql-server

I have a question regarding performance of SQL Server.
Suppose I have a table persons with the following columns: id, name, surname.
Now, I want to insert a new row in this table. The rule is the following:
If id is not present in the table, then insert the row.
If id is present, then update.
I have two solutions here:
First:
update persons
set id=@p_id, name=@p_name, surname=@p_surname
where id=@p_id
if @@ROWCOUNT = 0
    insert into persons(id, name, surname)
    values (@p_id, @p_name, @p_surname)
Second:
if exists (select id from persons where id = @p_id)
    update persons
    set id=@p_id, name=@p_name, surname=@p_surname
    where id=@p_id
else
    insert into persons(id, name, surname)
    values (@p_id, @p_name, @p_surname)
What is a better approach? It seems like in the second choice, to update a row, it has to be searched two times, whereas in the first option - just once. Are there any other solutions to the problem? I am using MS SQL 2000.

Both work fine, but I usually use option 2 (pre-mssql 2008) since it reads a bit more clearly. I wouldn't stress about the performance here either...If it becomes an issue, you can use NOLOCK in the exists clause. Though before you start using NOLOCK everywhere, make sure you've covered all your bases (indexes and big picture architecture stuff). If you know you will be updating every item more than once, then it might pay to consider option 1.
Option 3 is to not use destructive updates. It takes more work, but basically you insert a new row every time the data changes (never update or delete from the table) and have a view that selects all the most recent rows. It's useful if you want the table to contain a history of all its previous states, but it can also be overkill.
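To make option 3 concrete, here is a minimal sketch. The names persons_history, persons_current, version_id and changed_at are made up for illustration, and the column types are guesses since the question only lists the column names. It sticks to a correlated MAX subquery so it should also run on SQL Server 2000.

```sql
-- Hypothetical insert-only table: every change becomes a new row.
CREATE TABLE persons_history (
    version_id int IDENTITY(1,1) PRIMARY KEY,  -- grows with every change
    id         int NOT NULL,                   -- the person's logical id
    name       varchar(50) NULL,
    surname    varchar(50) NULL,
    changed_at datetime NOT NULL DEFAULT GETDATE()
);
GO

-- View exposing only the most recent version of each person.
CREATE VIEW persons_current AS
SELECT h.id, h.name, h.surname
FROM persons_history h
WHERE h.version_id = (SELECT MAX(h2.version_id)
                      FROM persons_history h2
                      WHERE h2.id = h.id);
GO

-- "Insert" and "update" are both plain inserts.
INSERT INTO persons_history (id, name, surname) VALUES (1, 'Ann', 'Smith');
INSERT INTO persons_history (id, name, surname) VALUES (1, 'Ann', 'Jones');

SELECT id, name, surname FROM persons_current;  -- one row: 1, Ann, Jones
```

The upsert question disappears because nothing is ever updated, at the cost of a growing table and a slightly more expensive read path.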

Option 1 seems good. However, if you're on SQL Server 2008 or later, you could also use MERGE, which may perform well for such UPSERT tasks.
Note that you may want to use an explicit transaction and the XACT_ABORT option for such tasks, so that the transaction consistency remains in the case of a problem or concurrent change.
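A rough sketch of what that could look like on SQL Server 2008 or later, assuming the persons table from the question already exists. The variable types and sample values are guesses, and the HOLDLOCK hint is my own addition as a common precaution against two sessions inserting the same id at once, not something the question requires.

```sql
SET XACT_ABORT ON;  -- a run-time error rolls back the whole transaction

DECLARE @p_id int, @p_name varchar(50), @p_surname varchar(50);
SELECT @p_id = 1, @p_name = 'Ann', @p_surname = 'Smith';  -- sample values

BEGIN TRANSACTION;

-- Single-statement upsert: update the row if the id exists, insert it otherwise.
MERGE persons WITH (HOLDLOCK) AS tgt
USING (SELECT @p_id AS id, @p_name AS name, @p_surname AS surname) AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET tgt.name = src.name, tgt.surname = src.surname
WHEN NOT MATCHED THEN
    INSERT (id, name, surname) VALUES (src.id, src.name, src.surname);

COMMIT TRANSACTION;
```

The explicit transaction is not strictly needed for a single MERGE, but it keeps the pattern the same if more statements are added later.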

I tend to use option 1. If there is a record in the table, you save one search. If there isn't, you don't lose anything. Moreover, with the second option you may run into funny locking and deadlocking issues caused by lock incompatibility.
There's some more info on my blog:
http://sqlblogcasts.com/blogs/piotr_rodak/archive/2010/01/04/updlock-holdlock-and-deadlocks.aspx
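For reference, a sketch of option 1 with the locking hints named in that post's title applied to it. This is my reading of the usual pattern rather than a quote from the blog; it assumes the persons table from the question, an index on id, and guessed variable types.

```sql
SET XACT_ABORT ON;  -- a run-time error rolls back the whole transaction

DECLARE @p_id int, @p_name varchar(50), @p_surname varchar(50);
SELECT @p_id = 1, @p_name = 'Ann', @p_surname = 'Smith';  -- sample values

BEGIN TRANSACTION;

-- HOLDLOCK keeps the key-range lock until commit, so another session
-- cannot insert the same id between this UPDATE and the INSERT below.
UPDATE persons WITH (UPDLOCK, HOLDLOCK)
SET name = @p_name, surname = @p_surname
WHERE id = @p_id;

IF @@ROWCOUNT = 0
    INSERT INTO persons (id, name, surname)
    VALUES (@p_id, @p_name, @p_surname);

COMMIT TRANSACTION;
```

Nothing here needs SQL Server 2005 or later syntax, so it should be usable on SQL Server 2000 as well.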

You could just use @@ROWCOUNT to see if the update did anything. Something like:
UPDATE MyTable
SET SomeData = 'Some Data' WHERE ID = 1
IF @@ROWCOUNT = 0
BEGIN
    INSERT MyTable
    SELECT 1, 'Some Data'
END

Aiming to be a little more DRY, I avoid writing out the values list twice.
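begin tran
-- add a "blank" row only when the id is not there yet
insert into persons (id)
select @p_id
where not exists (select * from persons where id = @p_id)
-- the row now exists either way, so a single update fills in the values
update persons
set name=@p_name, surname=@p_surname
where id = @p_id
commit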
Columns name and surname have to be nullable.
The transaction means no other user will ever see the "blank" record.

Related

Which is more efficient update where or if exists then update

I would like to know which is more efficient and why.
if not exists (select 1 from table where ID = 101 and TT = 5)
begin
update table
set TT = 5
where ID = 101;
end;
or
update table
set TT = 5
where ID = 101 and TT <> 5;
Assume there is a clustered index on ID and nothing else; the table was created with the default settings.
WHERE, IF EXISTS and IN all have different performance benefits. I would suggest checking out these two articles.
https://www.sqlshack.com/t-sql-commands-performance-comparison-not-vs-not-exists-vs-left-join-vs-except/
https://sqlchitchat.com/sqldev/tsql/semi-joins-in-sql-server/
SQL Server will generally optimize a non-updating UPDATE to not actually issue any updates. Therefore, with a simple table, you are not going to see much difference.
If you have triggers, they will be fired if the UPDATE statement executes, irrespective of how many rows are updated.
If the UPDATE statement executes over rows, even if they are modified to the same value, they will appear in the trigger.
If rows are filtered out with a WHERE clause, for example AND TT <> 5, then the trigger will fire with 0 rows.
rowversion and GENERATED AS columns will be updated regardless.
Clustered key columns will cause a delete and insert of the whole row.
If ALLOW_SNAPSHOT_ISOLATION or READ_COMMITTED_SNAPSHOT are on, even if not being used, then due to the way row-versioning works, an actual update will always be made.
If the IF EXISTS is complex, it still may not be worth doing, but in simple cases it usually is.
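One thing to watch apart from performance: the two forms in the question are not equivalent if TT allows NULLs, because TT <> 5 is not true for a NULL value. A small sketch using a throwaway table variable (the @t name and its contents are made up for the demonstration):

```sql
-- Throwaway table standing in for the real one; TT is nullable here.
DECLARE @t TABLE (ID int PRIMARY KEY, TT int NULL);
INSERT INTO @t (ID, TT) VALUES (101, NULL);

-- Plain inequality: NULL <> 5 evaluates to UNKNOWN, so the row is skipped.
UPDATE @t SET TT = 5 WHERE ID = 101 AND TT <> 5;
SELECT @@ROWCOUNT AS RowsTouchedPlain;     -- 0

-- NULL-aware predicate: behaves like the IF NOT EXISTS version.
UPDATE @t SET TT = 5 WHERE ID = 101 AND (TT <> 5 OR TT IS NULL);
SELECT @@ROWCOUNT AS RowsTouchedNullSafe;  -- 1
```

If TT is declared NOT NULL the extra predicate is unnecessary.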

Query help to track daily updates made to table(for specific column)

I have 2 tables, Individual (IndividualId is the primary key) and IndividualAudit. Every time an update is made on the Individual table,
a record goes to the audit table. There are many columns that can be modified, but I am interested only in picking up records where SSN is modified.
I'm using the query below:
Select DI.IndividualId,DI.ssn FRom Individual I
INNER JOIN IndividualAudit A
ON(I.IndividualId = A.IndividualId and A.UpdateDate = GETDATE())
where i.updatedate = GETDATE() and I.ssn <> a.ssn
group by I.IndividualId,I.ssn
Can someone please tell me whether my approach is correct?
I was searching on Google and got worried after reading the post below:
Query help when using audit table
The person who answered a similar question there seems to be very good with SQL, and compared with his answer my approach looks quite naive.
So I just want to know where my understanding is wrong.
Thanks a lot
Rather than fixing the query, I'd suggest instead using an update trigger aimed specifically at changes to that SSN column you're concerned about. The query you've supplied won't work because of the date comparison (as user2159471 has pointed out). But even after you get the query fixed, you'll still have to run it in order to see which SSNs have been updated.
Instead, use a SQL update trigger that, perhaps, inserts an entry into a third table each time an individual's SSN gets changed. Then you can look at that table any time you like, or run a report against it, to see who's been changed.
The trigger code looks like this:
CREATE TRIGGER MyCoolNewTrigger ON Individual
FOR UPDATE
AS
SET NOCOUNT ON
IF (UPDATE(SSN))
BEGIN
    -- deleted holds the old rows, inserted the new ones; joining them on the
    -- primary key logs every affected row, not just one row per statement
    INSERT INTO IndividualUpdateLog (NewSSN, OldSSN, ChangeDate)
    SELECT i.SSN, d.SSN, GETDATE()
    FROM inserted i
    INNER JOIN deleted d ON i.IndividualId = d.IndividualId
    WHERE i.SSN <> d.SSN -- skip rows whose SSN did not actually change (NULLs need extra handling)
END
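Once the log table is being filled, the daily report could look something like the sketch below. It also shows why comparing a datetime column with = GETDATE() never matches: GETDATE() includes the current time down to milliseconds, so you need a range covering the whole day. The @today variable is just a helper name I made up, and IndividualUpdateLog is assumed to have the columns used in the trigger.

```sql
-- Strip the time portion: today at 00:00:00 (works on older versions without the date type).
DECLARE @today datetime;
SET @today = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);

-- Half-open range [today, tomorrow) picks up everything logged today.
SELECT NewSSN, OldSSN, ChangeDate
FROM IndividualUpdateLog
WHERE ChangeDate >= @today
  AND ChangeDate <  DATEADD(day, 1, @today);
```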

UPDATE slow when setting column to NULL

I have a SQL Server 2008 table with 80,000 rows and am executing the following query:
UPDATE dbo.TableName WITH (ROWLOCK)
SET HelloWorldID = NULL
WHERE HelloWorldID = @helloWorldID
HelloWorldID is an int and the @helloWorldID parameter is also int.
The query is taking too long and I'd like to optimize it. I created a nonclustered index on HelloWorldID but it didn't matter. I may have to redesign this...maybe put the HelloWorldID on another table that links it to the TableName table?
Since the command you're waiting on is DELETE I have to guess that there is a trigger on dbo.TableName and that it is performing additional work that you do not expect. Or perhaps some CASCADE option that is affecting other tables that have triggers on them.
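A quick way to check both guesses is to look in the catalog views. This is only a sketch: it assumes the table really is called dbo.TableName and that you are on SQL Server 2005 or later, where sys.triggers and sys.foreign_keys exist.

```sql
-- Triggers defined directly on the table.
SELECT t.name, t.is_disabled, t.is_instead_of_trigger
FROM sys.triggers AS t
WHERE t.parent_id = OBJECT_ID(N'dbo.TableName');

-- Foreign keys pointing at the table, with their cascade settings.
SELECT fk.name,
       OBJECT_NAME(fk.parent_object_id) AS referencing_table,
       fk.update_referential_action_desc,
       fk.delete_referential_action_desc
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.TableName');
```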
It all depends on how many rows will be updated by this query.
If you're updating a lot of rows, say 30% of the table, then the index will actually slow down the query (as index will be updated along with the table, and it won't help with filtering the rows for update). Also ROWLOCK will slow it down, because the engine will issue a separate lock for each row (as opposed to pagelocks that would occur normally).
Try removing the index and running this update using WITH(TABLOCK) just to see what happens.
I get this problem sometimes. Your query depends on simultaneously getting a write lock on every row in the table that meets the conditions of the WHERE clause. Depending on your needs for full 'ACID', you could do something like this:
SELECT getdate() -- force @@rowcount=1
while @@rowcount > 0
    UPDATE TOP (1000) dbo.TableName
    SET HelloWorldID = NULL
    WHERE HelloWorldID = @helloWorldID
This will do the update in smaller chunks and help overcome locking issues. But remember, this method gives up on doing the query as a single transaction. You will need to tune the 1000 to a value that is right for your server.

T-SQL: what COLUMNS have changed after an update?

OK. I'm doing an update on a single row in a table.
All fields will be overwritten with new data except for the primary key.
However, not all values will change b/c of the update.
For example, if my table is as follows:
TABLE (id int ident, foo varchar(50), bar varchar(50))
The initial value is:
id  foo  bar
-----------------
1   hi   there
I then execute UPDATE tbl SET foo = 'hi', bar = 'something else' WHERE id = 1
What I want to know is what column has had its value changed and what was its original value and what is its new value.
In the above example, I would want to see that the column "bar" was changed from "there" to "something else".
Possible without doing a column by column comparison? Is there some elegant SQL statement like EXCEPT that will be more fine-grained than just the row?
Thanks.
There is no special statement you can run that will tell you exactly which columns changed, but nevertheless the query is not difficult to write:
DECLARE @Updates TABLE
(
    OldFoo varchar(50),
    NewFoo varchar(50),
    OldBar varchar(50),
    NewBar varchar(50)
)
UPDATE FooBars
SET <some_columns> = <some_values>
OUTPUT deleted.foo, inserted.foo, deleted.bar, inserted.bar INTO @Updates
WHERE <some_conditions>
SELECT *
FROM @Updates
WHERE OldFoo != NewFoo
OR OldBar != NewBar
If you're trying to actually do something as a result of these changes, then best to write a trigger:
CREATE TRIGGER tr_FooBars_Update
ON FooBars
FOR UPDATE AS
BEGIN
IF UPDATE(foo) OR UPDATE(bar)
INSERT FooBarChanges (OldFoo, NewFoo, OldBar, NewBar)
SELECT d.foo, i.foo, d.bar, i.bar
FROM inserted i
INNER JOIN deleted d
ON i.id = d.id
WHERE d.foo <> i.foo
OR d.bar <> i.bar
END
(Of course you'd probably want to do more than this in a trigger, but there's an example of a very simplistic action)
You can use COLUMNS_UPDATED instead of UPDATE but I find it to be a pain, and it still won't tell you which columns actually changed, just which columns were included in the UPDATE statement. So for example you can write UPDATE MyTable SET Col1 = Col1 and it will still tell you that Col1 was updated even though not one single value actually changed. When writing a trigger you need to actually test the individual before-and-after values in order to ensure you're getting real changes (if that's what you want).
P.S. You can also UNPIVOT as Rob says, but you'll still need to explicitly specify the columns in the UNPIVOT clause, it's not magic.
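Here is a self-contained sketch that combines the OUTPUT idea with a hand-rolled unpivot, using the sample row from the question. It uses table variables so it can be run as is; CROSS APPLY (VALUES ...) needs SQL Server 2008 or later, the columns still have to be listed by hand, and the @tbl name is only for the demo.

```sql
-- Demo table holding the row from the question.
DECLARE @tbl TABLE (id int IDENTITY(1,1) PRIMARY KEY, foo varchar(50), bar varchar(50));
INSERT INTO @tbl (foo, bar) VALUES ('hi', 'there');

-- Captures before/after values of the updated row(s).
DECLARE @Updates TABLE (OldFoo varchar(50), NewFoo varchar(50),
                        OldBar varchar(50), NewBar varchar(50));

UPDATE @tbl
SET foo = 'hi', bar = 'something else'
OUTPUT deleted.foo, inserted.foo, deleted.bar, inserted.bar INTO @Updates
WHERE id = 1;

-- One output row per column whose value really changed (NULL-aware).
SELECT c.ColumnName, c.OldValue, c.NewValue
FROM @Updates AS u
CROSS APPLY (VALUES ('foo', u.OldFoo, u.NewFoo),
                    ('bar', u.OldBar, u.NewBar)) AS c (ColumnName, OldValue, NewValue)
WHERE c.OldValue <> c.NewValue
   OR (c.OldValue IS NULL AND c.NewValue IS NOT NULL)
   OR (c.OldValue IS NOT NULL AND c.NewValue IS NULL);
-- returns one row: bar | there | something else
```

If the columns have different data types, they would need to be cast to a common type inside the VALUES list.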
Try unpivotting both inserted and deleted, and then you could join, looking for where the value has changed.
You could detect this in a Trigger, or utilise CDC in SQL Server 2008.
If you create a trigger FOR UPDATE (equivalently, AFTER UPDATE) then the inserted table will contain the rows with the new values, and the deleted table will contain the corresponding rows with the old values.
Another option for tracking data changes is to write the data to another (possibly temporary) table and then analyse the differences using XML. The changed data is written to an audit table together with the column names. The one catch is that you need to know the table's fields in order to prepare the temporary table.
You can find this solution here:
part 1
part 2
If you are using SQL Server 2008, you should probably take a look at the new Change Data Capture feature. This will do what you want.
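Roughly, enabling it looks like the sketch below. MyDatabase is a placeholder name, FooBars is borrowed from the earlier answer, and you should check that your edition supports CDC and that SQL Server Agent is running before relying on it.

```sql
USE MyDatabase;  -- placeholder database name
GO

-- Enable CDC for the database (requires sysadmin).
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for one table; @role_name = NULL means no gating role.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'FooBars',
    @role_name     = NULL;
GO

-- Changes land in cdc.<schema>_<table>_CT by default.
-- __$operation: 1 = delete, 2 = insert, 3 = value before update, 4 = value after update.
SELECT * FROM cdc.dbo_FooBars_CT;
```

Note that CDC records the before and after row images; you still compare the columns yourself to see which ones changed.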
OUTPUT deleted.bar AS [OLD VALUE], inserted.bar AS [NEW VALUE]
@Calvin I was just basing on the UPDATE example. I am not saying this is the full solution. I was giving a hint that you could do this somewhere in your code ;-)
Since I already got a -1 from the above answer, let me pitch this in:
If you don't really know which Column was updated, I'd say create a trigger and use COLUMNS_UPDATED() function in the body of that trigger (See this)
I have created in my blog a Bitmask Reference for use with this COLUMNS_UPDATED(). It will make your life easier if you decide to follow this path (Trigger + Columns_Updated())
If you're not familiar with Trigger, here's my example of basic Trigger http://dbalink.wordpress.com/2008/06/20/how-to-sql-server-trigger-101/

How to make tasks double-checked (the way how to store it in the DB)?

I have a DB that stores different types of tasks and more items in different tables.
In many of these tables (which have different structures) I need a way to require that an item be double-checked, meaning that the item can't count as 'saved' (of course it will be stored) until someone else goes into the program and confirms it.
What should be the right way to say which item is confirmed:
Each of these tables should have a column "IsConfirmed", then when that guy wants to confirm all the stuff, the program walks thru all the tables and creates a list of the items that are not checked.
There should be a third table that holds the table name and Id of that row that has to be confirmed.
I hope you have a better idea than the two uglies above.
Is the double-confirmed status something that happens exactly once for an entity? Or can it be rejected and need to go through confirmation again? In the latter case, do you need to keep all of this history? Do you need to keep track of who confirmed each time (e.g. so you don't have the same person performing both confirmations)?
The simple case:
ALTER TABLE dbo.Table ADD ConfirmCount TINYINT NOT NULL DEFAULT 0;
ALTER TABLE dbo.Table ADD Processed BIT NOT NULL DEFAULT 0;
On first confirmation:
UPDATE dbo.Table SET ConfirmCount = 1 WHERE PK = <PK> AND ConfirmCount = 0;
On second confirmation:
UPDATE dbo.Table SET ConfirmCount = 2 WHERE PK = <PK> AND ConfirmCount = 1;
When rejected:
UPDATE dbo.Table SET ConfirmCount = 0 WHERE PK = <PK>;
Now obviously your background job can only treat rows where Processed = 0 and ConfirmCount = 2. Then when it has processed that row:
UPDATE dbo.Table SET Processed = 1 WHERE PK = <PK>;
If you have a more complex scenario than this, please provide more details, including the goals of the double-confirm process.
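As an example of one such more complex scenario, here is a sketch where the two confirmations must come from different people. The FirstConfirmedBy and SecondConfirmedBy columns, the user ids and the @Tasks table variable are all invented for the demonstration; on a real table you would add the columns with ALTER TABLE.

```sql
-- Demo table: ConfirmCount as in the simple case, plus who confirmed.
DECLARE @Tasks TABLE (
    TaskId            int PRIMARY KEY,
    ConfirmCount      tinyint NOT NULL DEFAULT 0,
    FirstConfirmedBy  int NULL,
    SecondConfirmedBy int NULL
);
INSERT INTO @Tasks (TaskId) VALUES (1);

-- First confirmation by user 42.
UPDATE @Tasks
SET ConfirmCount = 1, FirstConfirmedBy = 42
WHERE TaskId = 1 AND ConfirmCount = 0;

-- The same user tries to confirm again: the extra predicate rejects it.
UPDATE @Tasks
SET ConfirmCount = 2, SecondConfirmedBy = 42
WHERE TaskId = 1 AND ConfirmCount = 1 AND FirstConfirmedBy <> 42;
SELECT @@ROWCOUNT AS RowsConfirmed;  -- 0

-- A different user (7) succeeds.
UPDATE @Tasks
SET ConfirmCount = 2, SecondConfirmedBy = 7
WHERE TaskId = 1 AND ConfirmCount = 1 AND FirstConfirmedBy <> 7;
SELECT @@ROWCOUNT AS RowsConfirmed;  -- 1
```

The application can check @@ROWCOUNT after each UPDATE to tell the user whether the confirmation was accepted.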
Consider adding a new table to hold the records to be confirmed (e.g. TasksToBeConfirmed). Once the records are confirmed, move those records to the permanent table (Tasks).
The disadvantage of adding an "IsConfirmed" column is that virtually every SQL statement that uses the table will have to filter on "IsConfirmed" to prevent getting unconfirmed records. Every time this is missed, a defect is introduced.
In cases where you need confirmed and unconfirmed records, use UNION.
This pattern is a little more work to code and implement, but in my experience, significantly improves performance and reduces defects.
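A minimal sketch of the pattern, using table variables so it runs on its own. The Title column and the sample rows are made up; with real tables you would wrap the move in a transaction.

```sql
-- Stand-ins for the staging table and the permanent table.
DECLARE @TasksToBeConfirmed TABLE (TaskId int PRIMARY KEY, Title varchar(100));
DECLARE @Tasks              TABLE (TaskId int PRIMARY KEY, Title varchar(100));

INSERT INTO @TasksToBeConfirmed (TaskId, Title) VALUES (1, 'Pay invoice');
INSERT INTO @TasksToBeConfirmed (TaskId, Title) VALUES (2, 'Ship order');

-- Confirming task 1 moves it in a single statement:
-- the deleted row is written straight into the permanent table.
DELETE FROM @TasksToBeConfirmed
OUTPUT deleted.TaskId, deleted.Title INTO @Tasks (TaskId, Title)
WHERE TaskId = 1;

-- When both confirmed and unconfirmed records are needed, combine them.
SELECT TaskId, Title, 1 AS IsConfirmed FROM @Tasks
UNION ALL
SELECT TaskId, Title, 0 AS IsConfirmed FROM @TasksToBeConfirmed
ORDER BY TaskId;
-- returns: (1, Pay invoice, 1) and (2, Ship order, 0)
```

I used UNION ALL here because a record can only be in one of the two tables, so there are no duplicates for UNION to remove.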
