MS SQL Server performance: UPDATEing a single column vs all columns - sql-server

MS SQL Server 2005/2008 on Windows Server 2003+
I am only updating one row at a time; the UPDATE is in response to a user change on a web form.
I am updating a few columns in a table using the PK. The table has 95 columns. Typically one FK column and one or two other columns will be updated. The table has 6 FKs.
Is it of benefit to dynamically generate the UPDATE statement so that only the changed columns appear in the SET portion, or should I stick with the current stored procedure that does a parameterized update of all the columns?
Currently, and not subject to immediate change, the data from the web form is posted back to the server and is available for the update. I can't jump to an AJAX scenario where only changed data is posted back to the server from the client browser at this point.
Thanx,
G

SQL Server reads and writes "pages" that consist of 8 KB of data. Typically, a page will contain one or more rows.
Since disk I/O is the expensive part of an update, the cost of updating half the columns versus all of them is roughly the same: either way, the same 8 KB page gets written.
There's another aspect that usually doesn't come into play because SQL Server writes in 8 KB pages. But imagine your row looks like this:
id int identity
col1 varchar(50)
col2 varchar(50)
Now if you update col1 to be 5 bytes longer, col2 has to be moved forward by five bytes. So even if you don't update col2, it will still have to be written to disk.
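For illustration, a minimal sketch of the two patterns being compared; the table, column, and parameter names are hypothetical, standing in for the parameters of the existing stored procedure:

-- Current approach: parameterized update of every column
UPDATE dbo.Widget
SET Name = @Name,
    StatusId = @StatusId,
    Notes = @Notes
    -- ...and so on for the remaining columns of the 95-column table...
WHERE WidgetId = @WidgetId;

-- Alternative: dynamically generated statement touching only the changed columns
UPDATE dbo.Widget
SET StatusId = @StatusId,
    Notes = @Notes
WHERE WidgetId = @WidgetId;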

In terms of performance, it's better to update multiple columns in a single UPDATE than to update single columns in multiple UPDATEs. Once the database locks a row for updating, the time spent changing the values is not a performance issue; on the other hand, the time it takes to lock a row can cause performance problems, and it can get worse if you have multiple connections trying to access the same information. I would recommend staying as you are with the parameterized stored procedure rather than trying to update single columns one at a time.
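A short sketch of the contrast described above, with hypothetical names; the single statement locks and writes the row once, the pair does so twice:

-- One statement: one lock acquisition, one write
UPDATE dbo.Widget
SET StatusId = @StatusId, Notes = @Notes
WHERE WidgetId = @WidgetId;

-- Two statements: the same row is locked and written twice
UPDATE dbo.Widget SET StatusId = @StatusId WHERE WidgetId = @WidgetId;
UPDATE dbo.Widget SET Notes = @Notes WHERE WidgetId = @WidgetId;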

Related

Read Committed Snapshot isolation with LOBs

I have a table in an SQL Server 2017 DB used by a lot of long-running transactions that originate from multiple threads. This causes deadlocking several times a day, so I am considering implementing read committed snapshot isolation. The trick is that this table has 3 VARBINARY(MAX) columns, each containing between 10 and 1000 MB of data (with the mean around 20 MB), besides several int and bit columns.
Now the questions:
Q1: Will SQL Server copy the entire row (including the VARBINARY(MAX) columns) into the TEMPDB?
Q2: If so, would the performance benefit from moving the VARBINARY(MAX) columns into a separate table with a 1:1 relationship to the original table?
SQL Server has to present you with a consistent view of your data (e.g. T2 sees your row, including the LOB, as it was before T1 started its mutating transaction). Which means: yes, it has no choice but to copy the LOB along with the rest of the row data. Which makes me think that, yes, performance may benefit from having a separate table for the LOBs.
As usual, I would recommend doing a simple experiment that measures performance with both configurations. Please post your results here.
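A minimal sketch of the two changes being discussed; the database, table, and column names here are hypothetical:

-- Option 1: switch the database to read committed snapshot isolation
-- (needs exclusive access, or WITH ROLLBACK IMMEDIATE to kick out other sessions)
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Option 2: move the VARBINARY(MAX) columns into a 1:1 side table keyed by the original PK
CREATE TABLE dbo.DocumentPayload
(
    DocumentId int NOT NULL PRIMARY KEY
        REFERENCES dbo.Document (DocumentId),
    Payload1 varbinary(max) NULL,
    Payload2 varbinary(max) NULL,
    Payload3 varbinary(max) NULL
);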

Find out the recently selected rows from an Oracle table, and can I update a LAST_ACCESSED column whenever the table is accessed?

I have a database table which has more than 1 million records uniquely identified by a GUID column. I want to find out which of these records were selected or retrieved in the last 5 years. The select query can happen from multiple places. Sometimes the row will be returned as a single row, sometimes as part of a set of rows. There is a select query that does the fetching over a JDBC connection from Java code, and a SQL procedure also fetches data from the table.
My intention is to clean up the table: I want to delete all rows which were never used (retrieved via a select query) in the last 5 years.
Does Oracle DB have any inbuilt metadata which can give me this information?
My alternative solution was to add a column LAST_ACCESSED and update this column whenever I select a row from this table. But that is a costly operation for me in terms of the time taken for the whole process. At least 1,000 - 10,000 records will be selected from the table for a single operation. Is there any efficient way to do this rather than updating the table after reading it? Mine is a multi-threaded application, so updating such a large data set may result in deadlocks or long waiting periods for the next read query.
Any elegant solution to this problem?
Oracle Database 12c introduced a new feature called Automatic Data Optimization that brings you Heat Maps to track table access (modifications as well as read operations). Be careful: the feature currently has to be licensed under the Advanced Compression Option or the In-Memory Option.
Heat Maps track whenever a database block has been modified or whenever a segment, i.e. a table or table partition, has been accessed. They do not track select operations per individual row, nor per individual block, because the overhead would be too heavy (data is generally read often and concurrently; having to keep a counter for each row would quickly become a very costly operation). However, if you have your data partitioned by date, e.g. you create a new partition for every day, you can over time easily determine which days are still read and which ones can be archived or purged. Note that Partitioning is also an option that needs to be licensed.
Once you have reached that conclusion you can then either use In-Database Archiving to mark rows as archived or just go ahead and purge the rows. If you happen to have the data partitioned you can do easy DROP PARTITION operations to purge one or many partitions rather than having to do conventional DELETE statements.
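A hedged sketch of what enabling and querying this looks like in Oracle 12c; tracking is at the segment/partition level, and the schema, table, and column names below are hypothetical:

-- Enable Heat Map tracking (licensed feature, see the note above)
ALTER SYSTEM SET HEAT_MAP = ON;

-- Later, inspect when each table or partition was last read or written
SELECT * FROM dba_heat_map_segment WHERE owner = 'APP_OWNER';

-- In-Database Archiving: mark old rows as archived instead of deleting them
ALTER TABLE app_owner.big_table ROW ARCHIVAL;
UPDATE app_owner.big_table
SET ora_archive_state = '1'
WHERE created_date < ADD_MONTHS(SYSDATE, -60);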
I couldn't use any inbuilt solutions. I tried the solutions below:
1) The DB audit feature for select statements.
2) Adding a trigger to update a date column whenever a select query is executed on the table.
Both were discarded. Auditing uses up a lot of space and has a performance hit; similarly, the trigger also had a performance hit.
Finally, I resolved the issue by maintaining a separate table into which entries older than 5 years that are still used or selected in a query are inserted. While deleting, I cross-check this table and avoid deleting the entries present in it.

Real time table alternative vs swapping table

I use SSMS 2016. I have a view that returns a few million records. The view is not indexed, and should not be, as the underlying data is being updated (insert, delete, update) every 5 minutes by a job on the server so that updated data sets can be displayed to the calling client application in the GUI.
The view does a very heavy volume of converting INT values to VARCHAR and appending string values to them.
The view also does some CAST operations on NULLs, assigning them column aliases. And the worst performance hit is that the view uses the FOR XML PATH('') function on 20 columns.
The view also uses two CTEs as the source, as well as subqueries to define a single column value.
I made sure I created the right indexes (clustered, nonclustered, composite, and covering) on the columns used in the view's SELECT, JOIN, and WHERE clauses.
The Database Tuning Advisor also has not suggested anything that could substantially improve performance.
As a workaround, I decided to create two identical physical tables with a clustered index on each and keep them updated using a MERGE statement (further wrapped in a stored procedure and then run as a SQL Server Agent job). To ensure there is no long locking of the view, I will then swap (rename) the table names immediately after each merge finishes. So in this case all the heavy workload falls onto the SQL Server Agent job that keeps the tables updated.
The problem is that the merge will take roughly 15 minutes given the current size of the data, which may increase in the future. So I need a real-time design to ensure that the view has the most up-to-date information.
Any ideas?
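For reference, a minimal sketch of the rename-swap step described in the question, assuming the live name the clients read from is ReportData and the freshly merged copy is ReportData_Staging (both hypothetical); sp_rename takes brief schema-modification locks, so the swap itself is quick:

BEGIN TRANSACTION;
    -- Promote the freshly merged table by cycling the names
    EXEC sp_rename 'dbo.ReportData', 'ReportData_Swap';
    EXEC sp_rename 'dbo.ReportData_Staging', 'ReportData';
    EXEC sp_rename 'dbo.ReportData_Swap', 'ReportData_Staging';
COMMIT TRANSACTION;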

Fastest way to compare multiple column values in SQL Server?

I have a table in SQL Server consisting of 200 million records, present on two different servers. I need to move this table from server 1 to server 2.
The table on server 1 can be a subset or a superset of the table on server 2. Some of the records (around 1 million) on server 1 have been updated, and I need to apply those updates on server 2. So currently I am following this approach:
1) Use SSIS to move data from server 1 to a staging database on server 2.
2) Then compare the data in staging with the table on server 2, column by column. If any of the columns is different, I update the whole row.
This is taking a lot of time. I tried using HASHBYTES in order to compare rows, like this:
HASHBYTES('sha',CONCAT(a.[account_no],a.[transaction_id], ...))
<>
HASHBYTES('sha',CONCAT(b.[account_no],b.[transaction_id], ...))
But this is taking even more time.
Any other approach which can be faster and can save time?
This is a problem that's pretty common.
First - do not try and do the updates directly in SQL - the performance will be terrible, and will bring the database server to its knees.
In this context, TS1 will be the table on Server 1 and TS2 will be the table on Server 2.
Using SSIS - create two steps within the job:
First, find the deleted rows: scan TS2 by ID, and delete any TS2 ID that does not exist in TS1.
Second, scan TS1, and if the ID exists in TS2, you will need to update that record. If memory serves, SSIS can inspect for differences and only update if needed; otherwise, just execute the update statement.
While scanning TS1, if the ID does not exist in TS2, then insert the record.
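Just to illustrate the logic of those steps (the recommendation above is to run them through SSIS), here is a rough T-SQL sketch, assuming TS1 has already been staged on Server 2 as in step 1 of the question, ID is the key, and the column names are hypothetical:

-- 1) Delete rows that no longer exist in TS1
DELETE t2
FROM dbo.TS2 AS t2
WHERE NOT EXISTS (SELECT 1 FROM dbo.TS1 AS t1 WHERE t1.ID = t2.ID);

-- 2) Update rows present in both tables
UPDATE t2
SET t2.col1 = t1.col1,
    t2.col2 = t1.col2
FROM dbo.TS2 AS t2
JOIN dbo.TS1 AS t1 ON t1.ID = t2.ID;

-- 3) Insert rows present in TS1 but missing from TS2
INSERT INTO dbo.TS2 (ID, col1, col2)
SELECT t1.ID, t1.col1, t1.col2
FROM dbo.TS1 AS t1
WHERE NOT EXISTS (SELECT 1 FROM dbo.TS2 AS t2 WHERE t2.ID = t1.ID);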
I can't speak to performance on this due to variations in schemas and servers, but it will be compute-intensive to analyze the 200 million records. It WILL take a long time.
For ongoing execution, you will need to add a "last modified date" timestamp to each record and a trigger to update the field on any legitimate change. Then use that to filter down your problem space. The first scan will not be terrible, as it ONLY looks at the IDs. The insert/update phase will actually benefit from the last-modified-date filter, assuming the number of records being modified is small (< 5%?) relative to the overall dataset. You will also need to add an index to that column to aid in the filtering.
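A hedged sketch of that "last modified date" plumbing on the source table, with hypothetical names:

-- Add the tracking column and an index to support filtering
ALTER TABLE dbo.TS1 ADD LastModified datetime2 NOT NULL
    CONSTRAINT DF_TS1_LastModified DEFAULT (SYSUTCDATETIME());
CREATE INDEX IX_TS1_LastModified ON dbo.TS1 (LastModified);

-- Keep it current on any legitimate change
CREATE TRIGGER trg_TS1_LastModified ON dbo.TS1
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET LastModified = SYSUTCDATETIME()
    FROM dbo.TS1 AS t
    JOIN inserted AS i ON i.ID = t.ID;
END;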
The other option is to perform a burn-and-load each time: disable any constraints around TS2, truncate TS2, copy the data into TS2 from TS1, and finally re-enable the constraints and rebuild any indexes.
Best of luck to you.

Best practices for moving data using triggers in SQL Server 2000

I am having some trouble trying to move data from SQL Server 2000 (SP4) to Oracle 10g. The link is ready and working; now my issue is how to move the detail data. My case is the following:
Table A is Master
Table B is Detail
Both are related and work together with the trigger (FOR INSERT).
My query needs to read both tables to build a complete result. When the trigger gets fired on the first insert into the master, it works fine. In the next step the user will insert one or more details into Table B, so the trigger fires every time a record is added. My problem is that I need to send, for example:
1 Master - 1 Detail = 2 rows (Works Normal)
1 Master - 2 Details = 4 rows (Trouble)
In the second case the problem is that the select run for each insert duplicates data. If the detail table gets 2 details, the normal result would be 2 selects with 1 row each, but in the second select the rows get doubled (it queries the first detail inserted again).
How can I move one row per insert using triggers on Table B?
Most of the time this boils down to a coding error, and I blogged about it here:
http://www.brentozar.com/archive/2009/01/triggers-need-to-handle-multiple-records/
However, I'm concerned about what's going to happen with rollbacks. If you have a program on your SQL Server that does several things in a row to different tables, and they're encapsulated in different transactions, I can envision scenarios where data will get inserted into Oracle but it won't be in SQL Server. I would advise against using triggers for cross-server data synchronization.
Instead, consider using something like DTS or SSIS to synchronize the two servers regularly.
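For reference, a minimal sketch of a set-based INSERT trigger that handles multi-row inserts by selecting from the inserted pseudo-table; the table, column, linked-server, and schema names are all hypothetical:

CREATE TRIGGER trg_TableB_Insert ON dbo.TableB
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Operate on ALL newly inserted detail rows at once, never one at a time
    INSERT INTO ORACLE_LINK..APP_SCHEMA.DETAIL_COPY (MasterId, DetailId, Amount)
    SELECT i.MasterId, i.DetailId, i.Amount
    FROM inserted AS i;
END;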
