Computed column vs Updates in SQL Server

I would like to know whether computed columns in SQL Server offer any performance boost; please see the example below.
I currently have a stored procedure in production that updates tables by concatenating two VARCHAR columns into a third column, which is NULL when the row is created.
I would like to switch this logic to a computed column, which would generate the value automatically when the data is loaded.
The question is: will this reduce the processing time for that derived column? I cannot make the change and test it in production at this point, so before I do: in general, are there any advantages to using computed columns instead of updates?
Please note that the updated column will remain as it is, and the total number of affected records will be up to a million.
UPDATES:
Table definition
CREATE TABLE TableA
(
ColumnA VARCHAR(50),
ColumnB VARCHAR(50),
ColumnC VARCHAR(50)
)
ColumnA and ColumnB will be populated with data from an SSIS package, and ColumnC will be updated by the stored procedure, which is
UPDATE TableA
SET ColumnC = ISNULL(ColumnA,'') + ISNULL(ColumnB,'')
These updates will affect up to millions of records.
If I instead use:
CREATE TABLE TableA
(
ColumnA VARCHAR(50),
ColumnB VARCHAR(50),
ColumnC as ISNULL(ColumnA,'') + ISNULL(ColumnB,'')
)
will this populate ColumnC more quickly?

For the update, I suggest adding a WHERE clause so you don't update rows that do not need updating. Every updated row takes a lock and writes an entry to the transaction log. Because ColumnC starts out as NULL and a comparison with NULL is never true, the filter also has to check for NULL explicitly.
UPDATE TableA
SET ColumnC = ISNULL(ColumnA,'') + ISNULL(ColumnB,'')
WHERE ColumnC IS NULL
OR ColumnC <> ISNULL(ColumnA,'') + ISNULL(ColumnB,'')
You can control transaction log growth by updating in batches:
(this is from memory so may have syntax error(s))
select 1 -- sets @@ROWCOUNT to 1 so the loop is entered
while (@@ROWCOUNT > 0)
begin
UPDATE top(10000) TableA
SET ColumnC = ISNULL(ColumnA,'') + ISNULL(ColumnB,'')
WHERE ColumnC IS NULL
OR ColumnC <> ISNULL(ColumnA,'') + ISNULL(ColumnB,'')
end
Computed Column
A computed column is a virtual column unless it is persisted. So if you don't persist it, loading ColumnA and ColumnB takes no extra time, but a SELECT on ColumnC will be slower because the value is computed on the fly.
If ColumnC is persisted, the cost is similar to the update, but it is paid at the moment ColumnA or ColumnB is inserted or updated.
As stated in the comments, a computed column is always consistent. An updated column is only consistent as of the last time the UPDATE was run.
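To make the persisted variant concrete, here is a minimal sketch. The table name dbo.TableA_Persisted is hypothetical, and the multi-row VALUES syntax assumes SQL Server 2008 or later. It is an illustration under those assumptions, not a measured recommendation.

```sql
-- Sketch only: dbo.TableA_Persisted is a hypothetical name.
CREATE TABLE dbo.TableA_Persisted
(
    ColumnA VARCHAR(50) NULL,
    ColumnB VARCHAR(50) NULL,
    -- PERSISTED stores the value on disk; SQL Server recalculates it
    -- whenever ColumnA or ColumnB is inserted or updated.
    ColumnC AS (ISNULL(ColumnA, '') + ISNULL(ColumnB, '')) PERSISTED
);

-- Only the base columns are supplied; ColumnC cannot be written directly.
INSERT INTO dbo.TableA_Persisted (ColumnA, ColumnB)
VALUES ('abc', NULL), ('foo', 'bar');

SELECT ColumnA, ColumnB, ColumnC
FROM dbo.TableA_Persisted;
```

Dropping the PERSISTED keyword gives the virtual form, where nothing extra is stored and the expression is evaluated each time ColumnC is read.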

Related

Update Query with Order by while preventing two users updating the same row

I have a SQL Server table with an expirydate column, and I want to update the rows with the nearest expirydate. Running two queries (a select, then an update) won't work because two users may update the same row at the same time, so it has to be a single query.
The following query:
Update Top(5) table1
Set col1 = 1
Output deleted.* Into table2
This query runs fine, but it doesn't sort by expirydate.
This query:
WITH q AS
(
SELECT TOP 5 *
FROM table1
ORDER BY expirydate
)
UPDATE table1
SET col1 = 1
OUTPUT deleted.* INTO table2
WHERE table1.id IN (SELECT id FROM q)
It works, but again I run the risk of two users updating the same row at the same time.
What options do I have to make this work?
Thanks for the help

In these types of scenarios, if you want a more optimistic concurrency approach, you need to include an ORDER BY and/or a WHERE clause to filter out the rows.
In application design it is common to use SELECT TOP (@count) FROM... style queries to fill the interface; however, to execute DELETE or UPDATE statements you would use the primary key to specifically identify the rows to modify.
As long as you are not executing deletes, you could use a timestamp or other date-based discriminator column to ensure that your updates only affect the rows that haven't been changed since the last select.
So you could query the current time as part of the select query:
SELECT TOP 5 *, SYSDATETIMEOFFSET() as [Selected]
FROM table1
ORDER BY expirydate
Alternatively, query for the timestamp first and add a CREATED column to the table to track new records, so that you do not include them in deletes. Either way, you need to ensure that the query that selects the rows always returns the same records, even if it is run tomorrow. That means no one may modify the expirydate column; if it can be modified, you can't use it as your primary sort or filter key.
DECLARE @selectedTimestamp DateTimeOffset = (SELECT SYSDatetimeoffset())
SELECT TOP 5 *, SYSDATETIMEOFFSET() as [Selected]
FROM table1
WHERE CREATED < @selectedTimestamp
ORDER BY expirydate
Then, in your update, make sure you only update rows that have not changed since the time you selected them. This requires either a standard audit trigger on the table that keeps the CREATED and MODIFIED columns up to date, or managing those columns manually in your update statement:
WITH q AS
(
SELECT TOP 5 *
FROM table1
WHERE CREATED < @selectedTimestamp
ORDER BY expirydate
)
UPDATE table1
SET col1 = 1, MODIFIED = SYSDatetimeoffset()
OUTPUT deleted.* INTO table2
WHERE table1.id IN (SELECT id FROM q)
AND MODIFIED < @selectedTimestamp
In this way we are effectively ignoring our change if another user has already updated records that were in the same or similar initial selection range.
Ultimately you could combine my initial advice to UPDATE based on the primary key AND the modified dates if you are genuinely concerned about the rows being updated twice.
If you need a more pessimistic approach, you could lock the rows with a specific user based flag so that other users cannot even select those rows, but that requires a much more detailed explanation.
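For comparison, one pessimistic variant that is often used for queue-style tables relies on lock hints rather than a user-based flag, so it is a different technique from the one mentioned above. The sketch below assumes that col1 = 0 marks rows that have not been claimed yet, that table2 has the same column layout as table1, and that the session runs at READ COMMITTED or REPEATABLE READ (READPAST is not allowed at other isolation levels). Treat it as a starting point to test, not a verified solution.

```sql
-- Assumption: col1 = 0 means "not yet claimed"; adjust to your own marker.
WITH q AS
(
    SELECT TOP (5) *
    FROM table1 WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE col1 = 0
    ORDER BY expirydate
)
UPDATE q                        -- updating the CTE updates table1
SET col1 = 1
OUTPUT deleted.* INTO table2;   -- captures the rows as they were before the update
```

UPDLOCK makes the statement take update locks while it reads, and READPAST lets a concurrent session skip rows that are already locked instead of waiting on them, so two users should end up with different rows. An index on expirydate is likely needed for the locks to stay narrow.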

SQL trigger for audit table getting out of sync

I recently created a SQL trigger to replace a very expensive query that I used to run in order to limit the number of updates my database does each day.
Before I perform an update, I check how many updates have already occurred that day. This used to be done by querying:
SELECT COUNT(*) FROM Movies WHERE DateAdded = Date.Now
My database has over 1 million records and this query runs about 1-2k times a minute, so you can see why I wanted a new approach.
So I created an audit table and set up a SQL trigger to update it whenever an INSERT or UPDATE happens on the Movies table. However, I'm noticing that the audit table gets out of sync by a few hundred every day (the audit table count is higher than the actual number of updates in the Movies table). This does not pose a huge issue, but I'm curious what could be causing it and how to go about debugging it.
SQL Trigger:
ALTER TRIGGER [dbo].[trg_Audit]
ON [dbo].[Movies]
AFTER UPDATE, INSERT
AS
BEGIN
UPDATE Audit SET [count] = [count] + 1 WHERE [date] = CONVERT (date, GETDATE())
IF @@ROWCOUNT=0
INSERT INTO audit ([date], [count]) VALUES (GETDATE(), 1)
END
The above trigger only fires after an UPDATE or INSERT on the Movies table. It tries to increment the count in the Audit table, and if no row exists for today (IF @@ROWCOUNT=0) it creates one. Any help would be much appreciated! Thanks.

Something like this should work:
create table dbo.Movies (
A int not null,
B int not null,
DateAdded datetime not null
)
go
create view dbo.audit
with schemabinding
as
select CONVERT(date,DateAdded) as dt,COUNT_BIG(*) as cnt
from dbo.Movies
group by CONVERT(date,DateAdded)
go
create unique clustered index IX_MovieCounts on dbo.audit (dt)
This is called an indexed view. The advantage is that SQL Server takes responsibility for maintaining the data stored in this view, and it's always right.
Unless you're on Enterprise/Developer edition, you'd query the audit view using the NOEXPAND hint:
SELECT * from audit with (noexpand)
This has the advantages that
a) You don't have to write the triggers yourself now (SQL Server does actually have something quite similar to triggers behind the scenes),
b) It can now cope with multi-row inserts, updates and deletes, and
c) You don't have to write the logic to cope with an update that changes the DateAdded value.
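As a sketch of how the original daily check could then read from the indexed view defined above (the column alias UpdatesToday is made up for illustration):

```sql
-- Returns today's count from the indexed view, or 0 when no row exists yet.
SELECT ISNULL(
    (SELECT cnt
     FROM dbo.audit WITH (NOEXPAND)
     WHERE dt = CONVERT(date, GETDATE())),
    0) AS UpdatesToday;
```

Because the view is indexed on dt, this should be a single-row lookup rather than a count over the whole Movies table. Note that it counts rows by their current DateAdded value, which may not be the same number the trigger was accumulating.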

Rather than incrementing the count by 1 you should probably be incrementing it by the number of records that have changed e.g.
UPDATE Audit
SET [count] = [count] + (SELECT COUNT(*) FROM INSERTED)
WHERE [date] = CONVERT (date, GETDATE())
IF @@ROWCOUNT=0
INSERT INTO audit ([date], [count])
VALUES (GETDATE(), (SELECT COUNT(*) FROM INSERTED))

Copy table rows using OUTPUT INTO in SQL Server 2005

I have a table from which I need to copy records back into itself. As part of that, I want to capture the new rows using an OUTPUT clause into a table variable so I can perform other operations on the rows in the same process. I want each row to contain its new key and the key it was copied from. Here's a contrived example:
INSERT
MyTable (myText1, myText2) -- myId is an IDENTITY column
OUTPUT
Inserted.myId,
Inserted.myText1,
Inserted.myText2
INTO
-- How do I get previousId into this table variable AND the newly inserted ID?
@MyTable
SELECT
-- MyTable.myId AS previousId,
MyTable.myText1,
MyTable.myText2
FROM
MyTable
WHERE
...
SQL Server barks if the number of columns on the INSERT doesn't match the number of columns from the SELECT statement. Because of that, I can see how this might work if I added a column to MyTable, but that isn't an option. Previously, this was implemented with a cursor which is causing a performance bottleneck -- I'm purposely trying to avoid that.
How do I copy these records while preserving the copied row's key in a way that will achieve the highest possible performance?

I'm a little unclear as to the context: is this in an AFTER INSERT trigger?
Anyway, I can't see any way to do this in a single call. The OUTPUT clause will only allow you to return rows that you have inserted. What I would recommend is as follows:
DECLARE @MyTable TABLE (
myID INT,
previousID INT,
myText1 VARCHAR(20),
myText2 VARCHAR(20)
)
INSERT @MyTable (previousID, myText1, myText2)
SELECT myID, myText1, myText2 FROM inserted
INSERT MyTable (myText1, myText2)
SELECT myText1, myText2 FROM inserted
-- @@IDENTITY now points to the last identity value inserted, so...
UPDATE m SET myID = i.newID
FROM @MyTable m, (SELECT @@IDENTITY - ROW_NUMBER() OVER(ORDER BY myID DESC) + 1 AS newID, myID FROM inserted) i
WHERE m.previousID = i.myID
...
Of course, you wouldn't put this into an AFTER INSERT trigger, because it will give you a recursive call, but you could do it in an INSTEAD OF INSERT trigger. I may be wrong on the recursive issue; I've always avoided the recursive call, so I've never actually found out. Using @@IDENTITY and ROW_NUMBER(), however, is a trick I've used several times in the past to do something similar.

Update table column with data from other columns in same row

I have a table that has several columns of textual data. The goal is to concatenate those columns into a single different column in the same table and same row.
What is the SQL Server query syntax that would allow me to do this?

Something like this:
UPDATE myTable SET X = Y + Z

Do you absolutely have to duplicate your data? If one of the column values changes, you will have to update the concatenated value.
A computed column:
alter table dbo.MyTable add ConcatenatedColumn = ColumnA + ColumnB
Or a view:
create view dbo.MyView as
select ColumnA, ColumnB, ColumnA + ColumnB as 'ConcatenatedColumn'
from dbo.MyTable
Now you can update ColumnA or ColumnB, and ConcatenatedColumn will always be in sync. If that's the behaviour you need, of course.
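One detail worth checking before relying on plain + concatenation: under the default CONCAT_NULL_YIELDS_NULL ON setting, a NULL in either operand makes the whole result NULL. A small illustration with made-up variables (the inline DECLARE initializers assume SQL Server 2008 or later):

```sql
DECLARE @A VARCHAR(50) = 'abc',
        @B VARCHAR(50) = NULL;

SELECT
    @A + @B                         AS PlainConcat,  -- NULL, because @B is NULL
    ISNULL(@A, '') + ISNULL(@B, '') AS NullSafe;     -- 'abc'
```

If the source columns are nullable, wrapping each one in ISNULL inside the computed column or view keeps the concatenated value from disappearing.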

Might be misunderstanding but:
Alter table myTable add combinedColumn Varchar(1000);
Update myTable set combinedColumn = textField1 + textField2;

select
textfield1 + textfield2 + ... + textfieldN as conc_text,
otherfield1,
otherfield2,
...
otherfieldN
from
mytable

SQL - Inserting and Updating Multiple Records at Once

I have a stored procedure that is responsible for inserting or updating multiple records at once. I want to perform this in my stored procedure for the sake of performance.
This stored procedure takes in a comma-delimited list of permit IDs and a status. The permit IDs are stored in a variable called @PermitIDs. The status is stored in a variable called @Status. I have a user-defined function that converts this comma-delimited list of permit IDs into a table. I need to go through each of these IDs and do either an insert or an update into a table called PermitStatus.
If a record with the permit ID does not exist, I want to add a record. If it does exist, I want to update the record with the given @Status value. I know how to do this for a single ID, but I do not know how to do it for multiple IDs. For single IDs, I do the following:
-- Determine whether to add or edit the PermitStatus
DECLARE @count int
SET @count = (SELECT Count(ID) FROM PermitStatus WHERE [PermitID]=@PermitID)
-- If no records were found, insert the record, otherwise update it
IF @count = 0
BEGIN
INSERT INTO
PermitStatus
(
[PermitID],
[UpdatedOn],
[Status]
)
VALUES
(
@PermitID,
GETUTCDATE(),
1
)
END
ELSE
UPDATE
PermitStatus
SET
[UpdatedOn]=GETUTCDATE(),
[Status]=@Status
WHERE
[PermitID]=@PermitID
How do I loop through the records in the table returned by my user-defined function to dynamically insert or update the records as needed?

create a split function, and use it like:
SELECT
*
FROM YourTable y
INNER JOIN dbo.splitFunction(@Parameter) s ON y.ID=s.Value
I prefer the number table approach
For this method to work, you need to do this one time table setup:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this function:
CREATE FUNCTION [dbo].[FN_ListToTableAll]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this WILL return empty rows
----------------
SELECT
ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS ListValue
) AS InnerQuery
INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
WHERE SUBSTRING(ListValue, number, 1) = @SplitOn
);
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTableAll(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
RowNumber ListValue
----------- ----------
1 1
2 2
3 3
4
5
6 4
7 5
8 6777
9
10
11
(11 row(s) affected)
To make what you need work, do the following:
--this would be the existing table
DECLARE @OldData table (RowID int, RowStatus char(1))
INSERT INTO @OldData VALUES (10,'z')
INSERT INTO @OldData VALUES (20,'z')
INSERT INTO @OldData VALUES (30,'z')
INSERT INTO @OldData VALUES (70,'z')
INSERT INTO @OldData VALUES (80,'z')
INSERT INTO @OldData VALUES (90,'z')
--these would be the stored procedure input parameters
DECLARE @IDList varchar(500)
,@StatusList varchar(500)
SELECT @IDList='10,20,30,40,50,60'
,@StatusList='A,B,C,D,E,F'
--stored procedure local variable
DECLARE @InputList table (RowID int, RowStatus char(1))
--convert input parameters into a table
INSERT INTO @InputList
(RowID,RowStatus)
SELECT
i.ListValue,s.ListValue
FROM dbo.FN_ListToTableAll(',',@IDList) i
INNER JOIN dbo.FN_ListToTableAll(',',@StatusList) s ON i.RowNumber=s.RowNumber
--update all old existing rows
UPDATE o
SET RowStatus=i.RowStatus
FROM @OldData o WITH (UPDLOCK, HOLDLOCK) --to avoid race condition when there is high concurrency as per @emtucifor
INNER JOIN @InputList i ON o.RowID=i.RowID
--insert only the new rows
INSERT INTO @OldData
(RowID, RowStatus)
SELECT
i.RowID, i.RowStatus
FROM @InputList i
LEFT OUTER JOIN @OldData o ON i.RowID=o.RowID
WHERE o.RowID IS NULL
--display the old table
SELECT * FROM @OldData order BY RowID
OUTPUT:
RowID RowStatus
----------- ---------
10 A
20 B
30 C
40 D
50 E
60 F
70 z
80 z
90 z
(9 row(s) affected)
EDIT: thanks to @Emtucifor for the tip about the race condition. I have included the locking hints in my answer to prevent race condition problems when there is high concurrency.

There are various methods to accomplish the parts you are asking about.
Passing Values
There are dozens of ways to do this. Here are a few ideas to get you started:
Pass in a string of identifiers and parse it into a table, then join.
SQL 2008: Join to a table-valued parameter
Expect data to exist in a predefined temp table and join to it
Use a session-keyed permanent table
Put the code in a trigger and join to the INSERTED and DELETED tables in it.
Erland Sommarskog provides a wonderful comprehensive discussion of lists in sql server. In my opinion, the table-valued parameter in SQL 2008 is the most elegant solution for this.
Upsert/Merge
Perform a separate UPDATE and INSERT (two queries, one for each set, not row-by-row).
SQL 2008: MERGE.
An Important Gotcha
However, one thing that no one else has mentioned is that almost all upsert code, including SQL 2008 MERGE, suffers from race condition problems when there is high concurrency. Unless you use HOLDLOCK and other locking hints depending on what's being done, you will eventually run into conflicts. So you either need to lock, or respond to errors appropriately (some systems with huge transactions per second have used the error-response method successfully, instead of using locks).
One thing to realize is that different combinations of lock hints implicitly change the transaction isolation level, which affects what type of locks are acquired. This changes everything: which other locks are granted (such as a simple read), the timing of when a lock is escalated to update from update intent, and so on.
I strongly encourage you to read more detail on these race condition problems. You need to get this right.
Conditional Insert/Update Race Condition
“UPSERT” Race Condition With MERGE
Example Code
CREATE PROCEDURE dbo.PermitStatusUpdate
@PermitIDs varchar(8000), -- or (max)
@Status int
AS
SET NOCOUNT, XACT_ABORT ON -- see note below
BEGIN TRAN
DECLARE @Permits TABLE (
PermitID int NOT NULL PRIMARY KEY CLUSTERED
)
INSERT @Permits
SELECT Value FROM dbo.Split(@PermitIDs) -- split function of your choice
UPDATE S
SET
UpdatedOn = GETUTCDATE(),
Status = @Status
FROM
PermitStatus S WITH (UPDLOCK, HOLDLOCK)
INNER JOIN @Permits P ON S.PermitID = P.PermitID
INSERT PermitStatus (
PermitID,
UpdatedOn,
Status
)
SELECT
P.PermitID,
GetUTCDate(),
@Status
FROM @Permits P
WHERE NOT EXISTS (
SELECT 1
FROM PermitStatus S
WHERE P.PermitID = S.PermitID
)
COMMIT TRAN
RETURN @@ERROR;
Note: XACT_ABORT helps guarantee the explicit transaction is closed following a timeout or unexpected error.
To confirm that this handles the locking problem, open several query windows and execute an identical batch like so:
WAITFOR TIME '11:00:00' -- use a time in the near future
EXEC dbo.PermitStatusUpdate @PermitIDs = '123,124,125,126', @Status = 1
All of these different sessions will execute the stored procedure in nearly the same instant. Check each session for errors. If none exist, try the same test a few times more (since it's possible to not always have the race condition occur, especially with MERGE).
The writeups at the links I gave above give even more detail than I did here, and also describe what to do for the SQL 2008 MERGE statement as well. Please read those thoroughly to truly understand the issue.
Briefly, with MERGE, no explicit transaction is needed, but you do need to use SET XACT_ABORT ON and use a locking hint:
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Table WITH (HOLDLOCK) AS TableAlias
...
This will prevent concurrency race conditions causing errors.
I also recommend that you do error handling after each data modification statement.
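Filling in the elided MERGE for the PermitStatus case might look roughly like the sketch below. The sample IDs and the status value 2 are made up, the multi-row VALUES and inline DECLARE initializer assume SQL Server 2008 or later, and the table variable stands in for whatever split function or parameter supplies the IDs.

```sql
SET NOCOUNT, XACT_ABORT ON;

-- Hypothetical inputs for illustration only.
DECLARE @Status int = 2;
DECLARE @Permits TABLE (PermitID int NOT NULL PRIMARY KEY);
INSERT @Permits (PermitID) VALUES (123), (124), (125);

-- HOLDLOCK keeps the key range locked between the existence check and the write.
MERGE dbo.PermitStatus WITH (HOLDLOCK) AS S
USING @Permits AS P
    ON S.PermitID = P.PermitID
WHEN MATCHED THEN
    UPDATE SET UpdatedOn = GETUTCDATE(),
               [Status]  = @Status
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PermitID, UpdatedOn, [Status])
    VALUES (P.PermitID, GETUTCDATE(), @Status);
```

The primary key on the table variable matters here: MERGE raises an error if more than one source row matches the same target row, so the source must not contain duplicate permit IDs.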

If you're using SQL Server 2008, you can use table valued parameters - you pass in a table of records into a stored procedure and then you can do a MERGE.
Passing in a table valued parameter would remove the need to parse CSV strings.
Edit:
ErikE has raised the point about race conditions, please refer to his answer and linked articles.
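A minimal sketch of the table-valued parameter route, assuming SQL Server 2008 or later. The type name dbo.PermitIDList and the procedure name dbo.PermitStatusUpdateTvp are hypothetical, and the HOLDLOCK hint follows the race-condition advice referenced above.

```sql
-- Hypothetical table type holding the permit IDs to upsert.
CREATE TYPE dbo.PermitIDList AS TABLE
(
    PermitID int NOT NULL PRIMARY KEY
);
GO

CREATE PROCEDURE dbo.PermitStatusUpdateTvp
    @Permits dbo.PermitIDList READONLY,   -- TVPs must be declared READONLY
    @Status  int
AS
SET NOCOUNT, XACT_ABORT ON;

MERGE dbo.PermitStatus WITH (HOLDLOCK) AS S
USING @Permits AS P
    ON S.PermitID = P.PermitID
WHEN MATCHED THEN
    UPDATE SET UpdatedOn = GETUTCDATE(), [Status] = @Status
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PermitID, UpdatedOn, [Status])
    VALUES (P.PermitID, GETUTCDATE(), @Status);
GO

-- Caller side: fill a variable of the table type and pass it in.
DECLARE @ids dbo.PermitIDList;
INSERT @ids (PermitID) VALUES (123), (124), (125), (126);
EXEC dbo.PermitStatusUpdateTvp @Permits = @ids, @Status = 1;
```

From client code the same parameter can be supplied as a structured table value, which avoids building and parsing a CSV string altogether.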

If you have SQL Server 2008, you can use MERGE. Here's an article describing this.

You should be able to do your insert and your update as two set based queries.
The code below was based on a data load procedure that I wrote a while ago that took data from a staging table and inserted or updated it into the main table.
I've tried to make it match your example, but you may need to tweak this (and create a table valued UDF to parse your CSV into a table of ids).
-- Update where the join on permitstatus matches
Update
PermitStatus
Set
[UpdatedOn]=GETUTCDATE(),
[Status]=staging.Status
From
PermitStatus status
Join
StagingTable staging
On
staging.PermitId = status.PermitId
-- Insert the new records, based on the Where Not Exists
Insert
PermitStatus(Updatedon, Status, PermitId)
Select GETUTCDATE(), staging.status, staging.permitId
From
StagingTable staging
Where Not Exists
(
Select 1 from PermitStatus status
Where status.PermitId = staging.PermitId
)

Essentially you have an upsert stored procedure (eg. UpsertSinglePermit)
(like the code you have given above) for dealing with one row.
So the steps I see are to create a new stored procedure (UpsertNPermits) which does
a) Parse input string into n record entries (each record contains permit id and status)
b) Foreach entry in above, invoke UpsertSinglePermit
