Update a column only when the value is different - sql-server

UPDATE table1 SET Name = 'Deepak' WHERE id = 1 AND Name != 'Deepak'
Does adding the condition on the Name column improve performance, given that id has a clustered index, Name has a non-clustered index, and there is roughly a 60% probability of getting a '0 rows updated' message after running the above query?

Reasons to Only Update If Different
Reduces locking
Prevents unnecessary activity if you have triggers or certain configurations of SQL Server Replication
Preserves audit trail columns like LastModifiedDateTime
Only Update If Different Using EXCEPT
Most people's main complaint would probably be the extra query complexity, but I find using EXCEPT makes this process super easy. EXCEPT is ideal because it handles any data type and NULL values without issue.
UPDATE Table1
SET Col1 = @NewVal1
   ,Col2 = @NewVal2
    ...
   ,LastModifiedBy = @UserID
   ,LastModifiedDateTime = GETDATE()
WHERE EXISTS (
    SELECT Col1, Col2
    EXCEPT
    SELECT @NewVal1, @NewVal2
)
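A fuller sketch of the same pattern, assuming a hypothetical dbo.Table1 keyed by an ID column and an @ID parameter; the EXCEPT returns a row (making EXISTS true) only when at least one value differs, and it treats NULLs as equal:
-- Hedged sketch: dbo.Table1, ID and @ID are assumptions for illustration.
UPDATE t
SET Col1 = @NewVal1,
    Col2 = @NewVal2,
    LastModifiedBy = @UserID,
    LastModifiedDateTime = GETDATE()
FROM dbo.Table1 AS t
WHERE t.ID = @ID
  AND EXISTS (
      SELECT t.Col1, t.Col2   -- current values of the row being updated
      EXCEPT
      SELECT @NewVal1, @NewVal2   -- proposed new values
  );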

Related

Which is more efficient update where or if exists then update

I would like to know which is more efficient and why.
if not exists (select 1 from table where ID = 101 and TT = 5)
begin
update table
set TT = 5
where ID = 101;
end;
or
update table
set TT = 5
where ID = 101 and TT <> 5;
Assume there is a clustered index on ID (nothing more; the table uses default table-creation settings).
WHERE, IF EXISTS and IN all have different performance benefits. I would suggest checking out these two articles.
https://www.sqlshack.com/t-sql-commands-performance-comparison-not-vs-not-exists-vs-left-join-vs-except/
https://sqlchitchat.com/sqldev/tsql/semi-joins-in-sql-server/
SQL Server will generally optimize a non-updating UPDATE to not actually issue any updates. Therefore, with a simple table, you are not going to see much difference.
If you have triggers, they will be fired if the UPDATE statement executes, regardless of how many rows are updated.
Rows the UPDATE statement touches will appear in the trigger even if they are modified to the same value.
If rows are filtered out with a WHERE clause, for example and TT <> 5, then the trigger will still fire, with 0 rows (see the sketch after this list).
rowversion and GENERATED AS columns will be updated regardless.
Updating clustered key columns will cause a delete and insert of the whole row.
If ALLOW_SNAPSHOT_ISOLATION or READ_COMMITTED_SNAPSHOT are on, even if not being used, then due to the way row-versioning works, an actual update will always be made.
If the IF EXISTS is complex, it still may not be worth doing, but in simple cases it usually is.
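A minimal sketch (hypothetical dbo.Demo table and trigger names) of the trigger behaviour described above:
-- Hedged sketch: dbo.Demo and trg_Demo_Update are assumptions for illustration.
CREATE TABLE dbo.Demo (ID int PRIMARY KEY, TT int);
GO
CREATE TRIGGER trg_Demo_Update ON dbo.Demo AFTER UPDATE
AS
BEGIN
    DECLARE @n int = (SELECT COUNT(*) FROM inserted);
    PRINT 'Trigger fired; rows in inserted = ' + CAST(@n AS varchar(10));
END;
GO
INSERT dbo.Demo (ID, TT) VALUES (101, 5);

-- Filtered UPDATE: zero rows qualify, but the AFTER UPDATE trigger still fires (with 0 rows).
UPDATE dbo.Demo SET TT = 5 WHERE ID = 101 AND TT <> 5;

-- Guarded UPDATE: the statement never runs, so the trigger does not fire at all.
IF NOT EXISTS (SELECT 1 FROM dbo.Demo WHERE ID = 101 AND TT = 5)
    UPDATE dbo.Demo SET TT = 5 WHERE ID = 101;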

in-place update leads to forwarded records

I am well aware of what a forwarded record within a heap is.
Since I want to keep forwarded records at 0, we decided to update only columns that cannot be extended.
Recently on my system I encountered forwarded records.
Table design is like this:
CREATE TABLE dbo.test (
    HashValue BINARY(16) NOT NULL,
    LoadTime DATETIME NOT NULL,
    LoadEndTime DATETIME NULL
    -- [other columns that never get updated]
) WITH (DATA_COMPRESSION = PAGE);
The insert statements ALWAYS bring all the columns, so none is left NULL. I checked the query logs.
I insert a value of '9999-12-31' for the LoadEndTime.
Now the system performs an update on LoadEndTime like this:
;WITH CTE AS (
    SELECT *,
           COALESCE(LEAD(LoadTime) OVER (PARTITION BY HashValue ORDER BY LoadTime), '9999-12-31') AS EndTimeStamp
    FROM dbo.test
)
UPDATE CTE SET LoadEndTime = EndTimeStamp;
Since the LoadEndTime column is always filled, there should be no extension of that column within the row when the update is executed. It should be an in-place update. Still, I always get forwarded records after that process... It doesn't make sense to me...

SQL Server Performance With Large Query

Hi everyone, I have a couple of queries for some reports, where each query pulls data from 35+ tables. Each table has almost 100K records. All the queries use UNION ALL, for example:
;With CTE
AS
(
Select col1, col2, col3 FROM Table1 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table3 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table4 WHERE Some_Condition
.
.
. And so on
)
SELECT col1, col2, col3 FROM CTE
ORDER BY col3 DESC
So far I have only tested this query on the dev server, and I can see it takes its time to return results. These 35+ tables are not related to each other, and this is the only way I can think of to get all the desired data in the result set.
Is there a better way to do this kind of query?
If this is the only way, how can I improve the performance of this query by making any changes, if possible?
My Opinion
I don't mind having a few dirty reads in this report. I was thinking of using query hints with NOLOCK or setting the transaction isolation level to READ UNCOMMITTED.
Will any of this help?
Edit
Every table has 5-10 bit columns, each with a corresponding date column, and my condition for each SELECT statement is something like
WHERE BitColumn = 1 AND DateColumn IS NULL
Suggestion By Peers
Filtered Index
CREATE NONCLUSTERED INDEX IX_Table_Column
ON TableName (BitColumn)
WHERE BitColumn = 1
Filtered Index with Included Column
CREATE NONCLUSTERED INDEX fIX_IX_Table_Column
ON TableName(BitColumn)
INCLUDE (DateColumn)
WHERE DateColumn IS NULL
Is this the best way to go, or are there any other suggestions?
There are lots of things that can be done to make it faster.
If I assume you need to do these UNIONs, then you can speed up the query by:
Caching the results, for example,
Can you create an indexed view from the whole statement? Or are there lots of different WHERE conditions, so there'd be lots of indexed views? Be aware that this will slow down modifications (INSERT, etc.) on those tables.
Can you cache it in a different way? Maybe in the mid layer?
Can it be recalculated in advance?
Make a covering index. The leading columns are the columns from the WHERE clause, followed by all other columns from the query as included columns (see the index sketch after this list).
Note that a covering index can also be filtered, but a filtered index isn't used if the query's WHERE clause has variables/parameters that can potentially hold a value not covered by the filtered index (i.e., the row isn't covered).
ORDER BY will cause a sort
If you can cache it, then it's fine - no sort will be needed (it's cached sorted)
Otherwise, the sort is CPU bound (and I/O bound if it doesn't fit in memory). To speed it up, do you use a fast collation? The performance difference between the slowest and fastest collations can be as much as 3x. For example, SQL_EBCDIC280_CP1_CS_AS, SQL_Latin1_General_CP1251_CS_AS, and SQL_Latin1_General_CP1_CI_AS are among the fastest collations. However, it's hard to make recommendations without knowing the collation characteristics you need.
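A hedged sketch of the covering index from the list above, using the hypothetical names from the question (Table1, BitColumn, DateColumn, col1-col3):
-- Hedged sketch: table, column, and index names are assumptions for illustration.
CREATE NONCLUSTERED INDEX IX_Table1_Bit_Date_Covering
ON dbo.Table1 (BitColumn, DateColumn)
INCLUDE (col1, col2, col3);

-- Filtered variant: smaller, but the optimizer may not use it when the query's
-- predicate is parameterized with values that could fall outside the filter.
CREATE NONCLUSTERED INDEX IX_Table1_Bit_Date_Covering_F
ON dbo.Table1 (BitColumn, DateColumn)
INCLUDE (col1, col2, col3)
WHERE BitColumn = 1 AND DateColumn IS NULL;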
Network
'network packet size' for the connection that does the SELECT should be the maximum possible value, 32,767 bytes, if the result set (number of rows) will be big. This can be set on the client side, e.g., in the connection string if you use .NET and SqlConnection. It minimizes CPU overhead when sending data from SQL Server and improves performance on both sides, client and server. It can boost performance by tens of percent if the network is the bottleneck.
Use the shared memory endpoint if the client is on the SQL Server machine; otherwise use TCP/IP, for the best performance.
General things
As you said, using isolation level READ UNCOMMITTED will improve the performance.
...
You probably can't make changes beyond rewriting the query, etc., but just in case: adding more memory if it isn't sufficient now, or using SQL Server 2014 in-memory features :-), would surely help.
There are way too many things that could be tuned but it's hard to point out the key ones if the question isn't very specific.
Hope this helps a bit
Well, you haven't given any statistics or sample run times from any execution, so it is not possible to guess what is slow and whether it is really slow. How much data is in the result set? It might just be that retrieving 100K result rows takes its time. If a result set of 10,000 rows takes 5 minutes, then yes, something can definitely be looked at. So if you have a sample query, the number of rows in the result, and how long a couple of executions with different WHERE conditions took, post that; it will help us compare results.
BTW, do not use a CTE; just use a regular inner and outer SELECT. Make sure tempdb is configured properly: do not leave the MDF and LDF at the default 10% growth. By trial and error you will learn how much the log and tempdb grow for a variety of range queries, and based on that you should set the initial and increment sizes of the tempdb MDF and LDF. For the covered filtered index, the INCLUDE columns should be col1, col2, and col3, not the Date column, unless Date is also in the select list.
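A hedged sketch of the tempdb sizing, assuming the default logical file names (tempdev, templog); replace the sizes with values learned from your own trial and error:
-- Hedged sketch: file names and sizes are assumptions for illustration.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 4GB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 1GB, FILEGROWTH = 256MB);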
How frequently does the data in the original 35 tables get updated? If at most once per day, or if they all get updated at almost the same time, then indexed views can be a possible solution. But if the original tables get updated more than once a day, or at arbitrary times that do not line up, then do not think about indexed views.
If disk space is not an issue, as a last resort try and test performance using a trigger on each of the 35 tables. Create a new table to hold the final results you expect from this SELECT query, then create insert/update/delete triggers on each of the 35 tables that check the conditions and, only when they match, apply the same insert/update/delete to the new table. Yes, you will need a column in the new table that identifies which table each row came from. Because Date is a nullable column, you do not get the full advantage of an index on that column, as mostly you are looking for WHERE Date IS NULL.
If the only query you ever run against the new table is WHERE Date IS NULL, then do not even bother to create that column; just create the BIT columns and the other columns (col1, col2, col3, etc.); a sketch follows below. If you give a real example of your query and explain the actual tables, other details can be worked out later.
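A hedged sketch of the trigger-maintained results table (hypothetical table, trigger, and column names; only the INSERT path is shown, and UPDATE/DELETE triggers would need the same treatment):
-- Hedged sketch: dbo.ReportRows, trg_Table1_Report, and column names are assumptions.
CREATE TABLE dbo.ReportRows (
    SourceTable sysname NOT NULL,   -- which of the 35 tables the row came from
    col1 int, col2 int, col3 int
);
GO
CREATE TRIGGER trg_Table1_Report ON dbo.Table1
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Copy only the rows that satisfy the report condition.
    INSERT dbo.ReportRows (SourceTable, col1, col2, col3)
    SELECT 'Table1', i.col1, i.col2, i.col3
    FROM inserted AS i
    WHERE i.BitColumn = 1 AND i.DateColumn IS NULL;
END;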
The query hints or the isolation level are only going to help you in case any blocking occurs.
If you don't mind dirty reads and there are locks during execution, it could be a good idea.
The key question is how much data matches the WHERE clause you need to use (WHERE BitColumn = 1 AND DateColumn IS NULL).
If the subset filtered by that is small compared with the total number of rows, then use an index on both columns, BitColumn and DateColumn, including the columns from the SELECT clause to avoid lookup operations in your query plan.
CREATE NONCLUSTERED INDEX IX_[Choose an IndexName]
ON TableName(BitColumn, DateColumn)
INCLUDE (col1, col2, col3)
Of course the space needed for that covered-filtered index depends on the datatype of the fields involved and the number of rows that satisfy WHERE BitColumn = 1 AND DateColumn IS NULL.
After that I recommend using a view instead of a CTE:
CREATE VIEW [Choose a ViewName]
AS
(
Select col1, col2, col3 FROM Table1 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE Some_Condition
.
.
.
)
By doing that, your query plan should look like 35 small index scans, but if most of the data satisfies the WHERE clause of your index, the performance is going to be similar to scanning the 35 source tables and the solution won't be worth it.
But you say "Every Table has 5-10 Bit columns and a Corresponding Date column..", so I think it is not going to be a good idea to make an index per bit column.
If you need to filter using different BitColumns and different DateColumns, use a computed column in your table:
ALTER TABLE Table1 ADD ComputedFilterFlag AS
    CAST(
        CASE WHEN BitColumn1 = 1 AND DateColumn1 IS NULL THEN 1 ELSE 0 END +
        CASE WHEN BitColumn2 = 1 AND DateColumn2 IS NULL THEN 2 ELSE 0 END +
        CASE WHEN BitColumn3 = 1 AND DateColumn3 IS NULL THEN 4 ELSE 0 END
    AS tinyint)
I recommend you use the value 2^(X-1) for conditionX (BitColumnX = 1 AND DateColumnX IS NULL). This allows you to filter using any combination of those criteria.
By using value 3 you can locate all rows that satisfy the Bit1/Date1 and Bit2/Date2 conditions. Any condition combination has its corresponding ComputedFilterFlag value because ComputedFilterFlag acts as a bitmap of conditions.
If you have fewer than 8 different filters you should use tinyint to save space in the index and decrease the I/O operations needed.
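A hedged worked example of the bitmap values, assuming the three conditions above:
-- condition1 -> 1, condition2 -> 2, condition3 -> 4
-- A row meeting only conditions 1 and 2 gets ComputedFilterFlag = 1 + 2 = 3.
-- To find rows meeting conditions 1 and 2, whether or not condition 3 also holds,
-- enumerate every flag value that has both bits set; this keeps the predicate seekable:
SELECT col1, col2, col3
FROM Table1
WHERE ComputedFilterFlag IN (3, 7);   -- 3 = 1+2, 7 = 1+2+4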
Then use an index over the ComputedFilterFlag column:
CREATE NONCLUSTERED INDEX IX_[Choose an IndexName]
ON TableName(ComputedFilterFlag)
INCLUDE (col1, col2, col3)
And create the view:
CREATE VIEW [Choose a ViewName]
AS
(
Select col1, col2, col3 FROM Table1 WHERE ComputedFilterFlag IN [Choose the Target Filter Value set]--(1, 3, 5, 7)
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE ComputedFilterFlag IN [Choose the Target Filter Value set]--(1, 3, 5, 7)
.
.
.
)
By doing that, your index covers all the conditions and your query plan should look like 35 small index seeks.
But this is a tricky solution; maybe refactoring your table schema could produce simpler and faster results.
You'll never get real-time results from a UNION ALL query over many tables, but I can tell you how I got a little speed out of a similar situation. Hopefully this will help you out.
You can actually run all of them at once with a little bit of coding and ingenuity.
You create a global temporary table instead of a common table expression, and don't put any keys on the global temporary table; that would just slow things down. Then you start all the individual queries, which insert into the global temporary table. I've done this a hundred or so times manually and it's faster than a union query because you get a query running on each CPU core. The tricky part is the mechanism for determining when the individual queries have finished; you're on your own for that piece, hence I do these manually. A sketch follows below.
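A hedged sketch of the global temporary table approach (hypothetical names; only two of the 35 feeder queries are shown). Each INSERT can be run from its own connection so they execute concurrently; run the final SELECT once all of them have finished:
-- Hedged sketch: ##ReportStaging and the filter columns are assumptions for illustration.
CREATE TABLE ##ReportStaging (col1 int, col2 int, col3 int);

-- connection 1:
INSERT ##ReportStaging (col1, col2, col3)
SELECT col1, col2, col3 FROM Table1 WHERE BitColumn = 1 AND DateColumn IS NULL;

-- connection 2:
INSERT ##ReportStaging (col1, col2, col3)
SELECT col1, col2, col3 FROM Table2 WHERE BitColumn = 1 AND DateColumn IS NULL;

-- after every feeder query has completed:
SELECT col1, col2, col3 FROM ##ReportStaging ORDER BY col3 DESC;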

UPDATE slow when setting column to NULL

I have a SQL Server 2008 table with 80,000 rows and am executing the following query:
UPDATE dbo.TableName WITH (ROWLOCK)
SET HelloWorldID = NULL
WHERE HelloWorldID = @helloWorldID
HelloWorldID is an int and the @helloWorldID parameter is also an int.
The query is taking too long and I'd like to optimize it. I created a nonclustered index on HelloWorldID but it didn't matter. I may have to redesign this...maybe put the HelloWorldID on another table that links it to the TableName table?
Since the command you're waiting on is DELETE I have to guess that there is a trigger on dbo.TableName and that it is performing additional work that you do not expect. Or perhaps some CASCADE option that is affecting other tables that have triggers on them.
It all depends on how many rows will be updated by this query.
If you're updating a lot of rows, say 30% of the table, then the index will actually slow down the query (as the index will be updated along with the table, and it won't help with filtering the rows for the update). The ROWLOCK hint will also slow it down, because the engine will issue a separate lock for each row (as opposed to the page locks that would occur normally).
Try removing the index and running this update using WITH(TABLOCK) just to see what happens.
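A minimal sketch of that suggestion, reusing the question's table and parameter:
-- Hedged sketch of the TABLOCK variant described above.
UPDATE dbo.TableName WITH (TABLOCK)
SET HelloWorldID = NULL
WHERE HelloWorldID = @helloWorldID;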
I get this problem sometimes. Your query depends on simultaneously getting a write lock on every row in the table meeting the conditions of the WHERE clause. Depending on your needs for full 'ACID', you could do something like this:
SELECT GETDATE() -- force @@ROWCOUNT = 1
WHILE @@ROWCOUNT > 0
    UPDATE TOP (1000) dbo.TableName
    SET HelloWorldID = NULL
    WHERE HelloWorldID = @helloWorldID
This will do the update in smaller chunks and help overcome locking issues. But remember, this method gives up on doing the query as a single transaction. You will need to tune the 1000 to a value that is right for your server.

T-SQL Insert or update

I have a question regarding performance of SQL Server.
Suppose I have a table persons with the following columns: id, name, surname.
Now, I want to insert a new row in this table. The rule is the following:
If id is not present in the table, then insert the row.
If id is present, then update.
I have two solutions here:
First:
update persons
set id=@p_id, name=@p_name, surname=@p_surname
where id=@p_id
if @@ROWCOUNT = 0
    insert into persons(id, name, surname)
    values (@p_id, @p_name, @p_surname)
Second:
if exists (select id from persons where id = @p_id)
    update persons
    set id=@p_id, name=@p_name, surname=@p_surname
    where id=@p_id
else
    insert into persons(id, name, surname)
    values (@p_id, @p_name, @p_surname)
What is the better approach? It seems like with the second choice, to update a row it has to be searched for twice, whereas with the first option only once. Are there any other solutions to the problem? I am using MS SQL 2000.
Both work fine, but I usually use option 2 (pre-mssql 2008) since it reads a bit more clearly. I wouldn't stress about the performance here either...If it becomes an issue, you can use NOLOCK in the exists clause. Though before you start using NOLOCK everywhere, make sure you've covered all your bases (indexes and big picture architecture stuff). If you know you will be updating every item more than once, then it might pay to consider option 1.
Option 3 is to not use destructive updates. It takes more work, but basically you insert a new row every time the data changes (never update or delete from the table) and have a view that selects all the most recent rows. It's useful if you want the table to contain a history of all its previous states, but it can also be overkill.
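A hedged sketch of that insert-only idea, assuming a hypothetical history table and view (works on SQL Server 2000, no window functions needed):
-- Hedged sketch: persons_history and persons_current are assumptions for illustration.
CREATE TABLE persons_history (
    id int NOT NULL,
    name varchar(100) NULL,
    surname varchar(100) NULL,
    valid_from datetime NOT NULL DEFAULT GETDATE()   -- every change inserts a new row
);
GO
CREATE VIEW persons_current
AS
SELECT p.id, p.name, p.surname
FROM persons_history AS p
WHERE p.valid_from = (SELECT MAX(h.valid_from)
                      FROM persons_history AS h
                      WHERE h.id = p.id);   -- only the most recent row per id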
Option 1 seems good. However, if you're on SQL Server 2008, you could also use MERGE, which may perform well for such UPSERT tasks.
Note that you may want to use an explicit transaction and the XACT_ABORT option for such tasks, so that transactional consistency is preserved in the case of a problem or concurrent change.
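A hedged sketch of the MERGE alternative (SQL Server 2008+), reusing the question's table and parameters; the HOLDLOCK hint and explicit transaction reflect the consistency note above:
-- Hedged sketch: HOLDLOCK guards against two sessions racing between the match check and the insert.
SET XACT_ABORT ON;
BEGIN TRAN;
MERGE persons WITH (HOLDLOCK) AS target
USING (SELECT @p_id AS id, @p_name AS name, @p_surname AS surname) AS source
    ON target.id = source.id
WHEN MATCHED THEN
    UPDATE SET name = source.name, surname = source.surname
WHEN NOT MATCHED THEN
    INSERT (id, name, surname) VALUES (source.id, source.name, source.surname);
COMMIT;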
I tend to use option 1. If there is a record in the table, you save one search. If there isn't, you don't lose anything. Moreover, with the second option you may run into funny locking and deadlocking issues related to lock incompatibility.
There's some more info on my blog:
http://sqlblogcasts.com/blogs/piotr_rodak/archive/2010/01/04/updlock-holdlock-and-deadlocks.aspx
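A hedged sketch of how option 2 can be hardened against that race, taking UPDLOCK/HOLDLOCK on the existence check inside a transaction (the topic of the linked post):
-- Hedged sketch: serializes the check-then-insert so two sessions cannot both take the insert branch.
BEGIN TRAN;
IF EXISTS (SELECT 1 FROM persons WITH (UPDLOCK, HOLDLOCK) WHERE id = @p_id)
    UPDATE persons SET name = @p_name, surname = @p_surname WHERE id = @p_id;
ELSE
    INSERT INTO persons (id, name, surname) VALUES (@p_id, @p_name, @p_surname);
COMMIT;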
You could just use @@ROWCOUNT to see if the update did anything. Something like:
UPDATE MyTable
SET SomeData = 'Some Data' WHERE ID = 1
IF @@ROWCOUNT = 0
BEGIN
INSERT MyTable
SELECT 1, 'Some Data'
END
Aiming to be a little more DRY, I avoid writing out the values list twice.
begin tran
insert into persons (id)
select @p_id
where not exists (select * from persons where id = @p_id)

update persons
set name=@p_name, surname=@p_surname
where id = @p_id
commit
Columns name and surname have to be nullable.
The transaction means no other user will ever see the "blank" record.
Edit: cleanup
