How to execute REPLACE query in SQL Server

MySQL has this incredibly useful yet proprietary REPLACE INTO SQL command.
Can this easily be emulated in SQL Server 2005?
Starting a new transaction, doing a SELECT, then either an UPDATE or an INSERT, and finally a COMMIT is always a bit of a pain, especially when doing it in the application and therefore always having to keep two versions of the statement.
I wonder if there is an easy and universal way to implement such a function in SQL Server 2005?

This is something that annoys me about MSSQL (rant on my blog). I wish MSSQL supported upsert.
@Dillie-O's code is a good way in older SQL versions (+1 vote), but it is still basically two I/O operations (the EXISTS check, then the UPDATE or INSERT).
There's a slightly better way in this post, basically:
--try an update
update tablename
set field1 = 'new value',
    field2 = 'different value',
    ...
where idfield = 7
--insert if failed
if @@rowcount = 0 and @@error = 0
    insert into tablename
        ( idfield, field1, field2, ... )
    values ( 7, 'value one', 'another value', ... )
This reduces it to one I/O operation if it's an update, or two if it's an insert.
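A minimal runnable sketch of the update-then-insert pattern above, assuming a hypothetical temp table #UpsertDemo standing in for "tablename". It runs the same upsert twice so that both branches execute. On its own this pattern is not safe for concurrent callers; the SERIALIZABLE variant in a later answer addresses that.

```sql
-- #UpsertDemo is a hypothetical stand-in for "tablename".
IF OBJECT_ID('tempdb..#UpsertDemo') IS NOT NULL DROP TABLE #UpsertDemo;
CREATE TABLE #UpsertDemo (
    idfield int PRIMARY KEY,
    field1  varchar(50) NOT NULL
);

DECLARE @id int, @value varchar(50);
SET @id = 7;

-- First call: no row with idfield = 7 yet, so the UPDATE touches 0 rows
-- and the INSERT runs.
SET @value = 'value one';
UPDATE #UpsertDemo SET field1 = @value WHERE idfield = @id;
IF @@ROWCOUNT = 0
    INSERT INTO #UpsertDemo (idfield, field1) VALUES (@id, @value);

-- Second call: the row now exists, so the UPDATE succeeds and the INSERT is skipped.
SET @value = 'new value';
UPDATE #UpsertDemo SET field1 = @value WHERE idfield = @id;
IF @@ROWCOUNT = 0
    INSERT INTO #UpsertDemo (idfield, field1) VALUES (@id, @value);

SELECT idfield, field1 FROM #UpsertDemo;  -- one row: 7, 'new value'
```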
SQL Server 2008 introduces MERGE from the SQL:2003 standard:
merge tablename as target
using (values ('new value', 'different value'))
    as source (field1, field2)
    on target.idfield = 7
when matched then
    update
    set field1 = source.field1,
        field2 = source.field2,
        ...
when not matched then
    insert ( idfield, field1, field2, ... )
    values ( 7, source.field1, source.field2, ... );
Now it's really just one I/O operation, but awful code :-(
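The MERGE above can also take the key from the source row instead of hard-coding it in the ON clause. The sketch below assumes SQL Server 2008 or later and a hypothetical temp table #MergeDemo. The HOLDLOCK hint on the target is a commonly recommended addition: without it, MERGE checks for a match and then inserts under ordinary locking, so two concurrent sessions can both decide the row is missing and one of them gets a primary key violation.

```sql
-- Requires SQL Server 2008+. #MergeDemo is a hypothetical stand-in for "tablename".
IF OBJECT_ID('tempdb..#MergeDemo') IS NOT NULL DROP TABLE #MergeDemo;
CREATE TABLE #MergeDemo (
    idfield int PRIMARY KEY,
    field1  varchar(50) NOT NULL,
    field2  varchar(50) NOT NULL
);

DECLARE @id int;
SET @id = 7;

-- HOLDLOCK (SERIALIZABLE semantics for this table) keeps the key range locked
-- from the match check until the statement's transaction ends.
MERGE #MergeDemo WITH (HOLDLOCK) AS target
USING (SELECT @id AS idfield, 'new value' AS field1, 'different value' AS field2) AS source
    ON target.idfield = source.idfield
WHEN MATCHED THEN
    UPDATE SET field1 = source.field1,
               field2 = source.field2
WHEN NOT MATCHED THEN
    INSERT (idfield, field1, field2)
    VALUES (source.idfield, source.field1, source.field2);  -- MERGE must end with a semicolon
```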

The functionality you're looking for is traditionally called an UPSERT. At least knowing what it's called might help you find what you're looking for.
I don't think SQL Server 2005 has any great ways of doing this. 2008 introduces the MERGE statement that can be used to accomplish this as shown in: http://www.databasejournal.com/features/mssql/article.php/3739131 or http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
MERGE was available in the beta of 2005, but it was removed from the final release.

What the upsert/merge is doing is something to the effect of...
IF EXISTS (SELECT * FROM [Table] WHERE Id = X)
UPDATE [Table] SET...
ELSE
INSERT INTO [Table]
So hopefully the combination of those articles and this pseudo code can get things moving.

I wrote a blog post about this issue.
The bottom line is that if you want cheap updates and want to be safe for concurrent usage, try:
update t
set hitCount = hitCount + 1
where pk = @id
if @@rowcount < 1
begin
    begin tran
    update t with (serializable)
    set hitCount = hitCount + 1
    where pk = @id
    if @@rowcount = 0
    begin
        insert t (pk, hitCount)
        values (@id,1)
    end
    commit tran
end
This way you have 1 operation for updates and a max of 3 operations for inserts. So, if you are generally updating, this is a safe cheap option.
I would also be very careful not to use anything that is unsafe for concurrent usage. It's really easy to get primary key violations or duplicate rows in production.

Related

Recompilation issue with in-memory tables

In our high-load OLTP processing we use permanent in-memory tables in place of temporary tables (similar to https://learn.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/faster-temp-table-and-table-variable-by-using-memory-optimization?view=sql-server-2017, case C). But even under low load we found a very large number of recompilations in stored procedures, with the reason '2 - Statistics changed'. The number of rows in these tables varies from 0 to 50-100 on each execution. There is no way to disable auto update statistics on in-memory tables. Also, the KEEPFIXED PLAN option cannot be applied in subqueries like this:
if exists(
select 1 from dbo.mytable option (KEEPFIXED PLAN)
)
begin
select 1
end
Any ideas, how can we avoid excessive recompilations?
Aaron, thank you very much for your suggestion; solution found. It's not perfect (we have to rewrite a lot of code), but it works. We had to change the IF statement into a query so that we can apply the hint this way:
declare @exists int
select @exists = 1 where exists (select 1 from dbo.MyTable) option (KEEPFIXED PLAN)
if @exists = 1
begin
    select 'exists!!!'
end
else
begin
    select 'not exists...'
end

Deleting 1 millions rows in SQL Server

I am working on a client's database and there are about 1 million rows that need to be deleted due to a bug in the software. Is there an efficient way to delete them besides:
DELETE FROM table_1 where condition1 = 'value' ?
Here is a structure for a batched delete as suggested above. Do not try 1M at once...
The size of the batch and the WAITFOR delay are obviously quite variable, and would depend on your server's capabilities, as well as your need to mitigate contention. You may need to manually delete some rows, measuring how long they take, and adjust your batch size to something your server can handle. As mentioned above, batches of more than about 5000 rows can trigger lock escalation to a table lock (which I was not aware of).
This would be best done after hours... but 1M rows is really not a lot for SQL Server to handle. If you watch your messages in SSMS, it may take a while for the print output to show, but it will appear after several batches; just be aware it won't update in real time.
Edit: Added a stop time, @MAXRUNTIME and @BSTOPATMAXTIME. If you set @BSTOPATMAXTIME to 1, the script will stop on its own at the desired time, say 8:00 AM. This way you can schedule it nightly to start at, say, midnight, and it will stop before production at 8 AM.
Edit: Answer is pretty popular, so I have added the RAISERROR in lieu of PRINT per comments.
DECLARE @BATCHSIZE INT, @WAITFORVAL VARCHAR(8), @ITERATION INT, @TOTALROWS INT, @MAXRUNTIME VARCHAR(8), @BSTOPATMAXTIME BIT, @MSG VARCHAR(500)
SET DEADLOCK_PRIORITY LOW;
SET @BATCHSIZE = 4000
SET @WAITFORVAL = '00:00:10'
SET @MAXRUNTIME = '08:00:00' -- 8AM
SET @BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET @ITERATION = 0 -- LEAVE THIS
SET @TOTALROWS = 0 -- LEAVE THIS
WHILE @BATCHSIZE>0
BEGIN
    -- IF @BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
    IF CONVERT(VARCHAR(8),GETDATE(),108) >= @MAXRUNTIME AND @BSTOPATMAXTIME=1
    BEGIN
        RETURN
    END
    DELETE TOP(@BATCHSIZE)
    FROM SOMETABLE
    WHERE 1=2 -- PLACEHOLDER: REPLACE 1=2 WITH YOUR DELETE CONDITION
    SET @BATCHSIZE=@@ROWCOUNT
    SET @ITERATION=@ITERATION+1
    SET @TOTALROWS=@TOTALROWS+@BATCHSIZE
    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR)
    RAISERROR (@MSG, 0, 1) WITH NOWAIT
    WAITFOR DELAY @WAITFORVAL
END
BEGIN TRANSACTION
DoAgain:
DELETE TOP (1000)
FROM <YourTable>
IF ##ROWCOUNT > 0
GOTO DoAgain
COMMIT TRANSACTION
Maybe this solution from Uri Dimant:
WHILE 1 = 1
BEGIN
    DELETE TOP(2000)
    FROM Foo
    WHERE <predicate>;
    IF @@ROWCOUNT < 2000 BREAK;
END
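To see the loop above in action without touching real data, here is a small sketch against a hypothetical temp table #Foo holding 10,000 rows, of which the 5,000 even ids are flagged 'B' as the bad rows. Because there is no enclosing transaction, each DELETE commits on its own, so locks are released between batches (and, under the SIMPLE recovery model, log space can be reused), which is the point of batching.

```sql
-- Hypothetical demo table: 10,000 rows, the even ids are the "bad" rows.
IF OBJECT_ID('tempdb..#Foo') IS NOT NULL DROP TABLE #Foo;
CREATE TABLE #Foo (id int IDENTITY(1,1) PRIMARY KEY, flag char(1) NOT NULL);

INSERT INTO #Foo (flag)
SELECT TOP (10000) 'G'
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

UPDATE #Foo SET flag = 'B' WHERE id % 2 = 0;   -- 5,000 bad rows

DECLARE @batch int, @deleted int;
SET @batch = 2000;

WHILE 1 = 1
BEGIN
    -- No surrounding BEGIN TRANSACTION: each DELETE is its own autocommit transaction.
    DELETE TOP (@batch) FROM #Foo WHERE flag = 'B';
    SET @deleted = @@ROWCOUNT;
    RAISERROR('Deleted %d rows in this batch', 0, 1, @deleted) WITH NOWAIT;
    IF @deleted < @batch BREAK;   -- a short (or empty) batch means we are done
END
```

With these numbers the loop deletes 2000, 2000 and then 1000 rows before stopping.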
(Link: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b5225ca7-f16a-4b80-b64f-3576c6aa4d1f/how-to-quickly-delete-millions-of-rows?forum=transactsql)
Here is something I have used:
If the bad data is mixed in with the good-
SELECT columns
INTO #table
FROM old_table
WHERE statement to exclude bad rows
TRUNCATE TABLE old_table
INSERT INTO old_table
SELECT columns FROM #table
I'm not sure how good this would be, but what if you did the following (provided table_1 is a standalone table, i.e. not referenced by any other table):
create a duplicate of table_1 named table_1_dup
insert into table_1_dup select * from table_1 where condition1 <> 'value';
drop table table_1
EXEC sp_rename 'table_1_dup', 'table_1';
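A sketch of the copy/drop/rename idea above, assuming a hypothetical permanent table dbo.table_1_demo in place of table_1. Two details are easy to miss: `condition1 <> 'value'` silently drops rows where condition1 is NULL, so the sketch keeps them explicitly; and SELECT ... INTO copies only the columns and data, not the primary key, other indexes, constraints, triggers or permissions, so those have to be re-created afterwards.

```sql
-- dbo.table_1_demo is a hypothetical stand-in for table_1.
IF OBJECT_ID('dbo.table_1_demo') IS NOT NULL DROP TABLE dbo.table_1_demo;
IF OBJECT_ID('dbo.table_1_demo_dup') IS NOT NULL DROP TABLE dbo.table_1_demo_dup;
CREATE TABLE dbo.table_1_demo (id int NOT NULL PRIMARY KEY, condition1 varchar(10) NULL);
INSERT INTO dbo.table_1_demo (id, condition1) VALUES (1, 'value');   -- bad row
INSERT INTO dbo.table_1_demo (id, condition1) VALUES (2, 'keep');
INSERT INTO dbo.table_1_demo (id, condition1) VALUES (3, NULL);      -- must survive

BEGIN TRANSACTION;

-- Copy only the rows to keep. "<> 'value'" alone would also drop the NULL row.
SELECT *
INTO dbo.table_1_demo_dup
FROM dbo.table_1_demo
WHERE condition1 <> 'value' OR condition1 IS NULL;

DROP TABLE dbo.table_1_demo;
EXEC sp_rename 'dbo.table_1_demo_dup', 'table_1_demo';

-- SELECT ... INTO does not copy the primary key, so re-create it (along with
-- any other indexes, constraints, triggers and permissions the old table had).
ALTER TABLE dbo.table_1_demo ADD CONSTRAINT PK_table_1_demo PRIMARY KEY (id);

COMMIT TRANSACTION;
```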
If you cannot afford to get the database out of production while repairing, do it in small batches. See also: How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table
If you are in a hurry and need the fastest way possible:
take the database out of production
drop all non-clustered indexes and triggers
delete the records (or if the majority of records are bad, copy+drop+rename the table)
(if applicable) fix the inconsistencies caused by the fact that you dropped triggers
re-create the indexes and triggers
bring the database back in production

Is a single SQL Server statement atomic and consistent?

Is a statement in SQL Server ACID?
What I mean by that
Given a single T-SQL statement, not wrapped in a BEGIN TRANSACTION / COMMIT TRANSACTION, are the actions of that statement:
Atomic: either all of its data modifications are performed, or none of them is performed.
Consistent: When completed, a transaction must leave all data in a consistent state.
Isolated: Modifications made by concurrent transactions must be isolated from the modifications made by any other concurrent transactions.
Durable: After a transaction has completed, its effects are permanently in place in the system.
The reason I ask
I have a single statement in a live system that appears to be violating the rules of the query.
In effect my T-SQL statement is:
--If there are any slots available,
--then find the earliest unbooked transaction and mark it booked
UPDATE Transactions
SET Booked = 1
WHERE TransactionID = (
SELECT TOP 1 TransactionID
FROM Slots
INNER JOIN Transactions t2
ON Slots.SlotDate = t2.TransactionDate
WHERE t2.Booked = 0 --only book it if it's currently unbooked
AND Slots.Available > 0 --only book it if there's empty slots
ORDER BY t2.CreatedDate)
Note: But a simpler conceptual variant might be:
--Give away one gift, as long as we haven't given away five
UPDATE Gifts
SET GivenAway = 1
WHERE GiftID = (
    SELECT TOP 1 g2.GiftID
    FROM Gifts g2
    WHERE g2.GivenAway = 0
      AND (SELECT COUNT(*) FROM Gifts g3 WHERE g3.GivenAway = 1) < 5
    ORDER BY g2.GiftValue DESC
)
In both of these statements, notice that they are single statements (UPDATE...SET...WHERE).
There are cases where the wrong transaction is being "booked"; it's actually picking a later transaction. After staring at this for 16 hours, I'm stumped. It's as though SQL Server is simply violating the rules.
I wondered: what if the results of the Slots view are changing before the update happens? What if SQL Server is not holding SHARED locks on the transactions on that date? Is it possible that a single statement can be inconsistent?
So I decided to test it
I decided to check if the results of sub-queries, or inner operations, are inconsistent. I created a simple table with a single int column:
CREATE TABLE CountingNumbers (
Value int PRIMARY KEY NOT NULL
)
From multiple connections, in a tight loop, I call the single T-SQL statement:
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
In other words the pseudo-code is:
while (true)
{
ADOConnection.Execute(sql);
}
And within a few seconds I get:
Violation of PRIMARY KEY constraint 'PK__Counting__07D9BBC343D61337'.
Cannot insert duplicate key in object 'dbo.CountingNumbers'.
The duplicate value is (1332)
Are statements atomic?
The fact that a single statement wasn't atomic makes me wonder whether single statements are atomic at all.
Or is there a more subtle definition of statement that differs from (for example) what SQL Server considers a statement?
Does this fundamentally means that within the confines of a single T-SQL statement, SQL Server statements are not atomic?
And if a single statement is atomic, what accounts for the key violation?
From within a stored procedure
Rather than a remote client opening n connections, I tried it with a stored procedure:
CREATE procedure [dbo].[DoCountNumbers] AS
SET NOCOUNT ON;
DECLARE @bumpedCount int
SET @bumpedCount = 0
WHILE (@bumpedCount < 500) --safety valve
BEGIN
    SET @bumpedCount = @bumpedCount+1;
    PRINT 'Running bump '+CAST(@bumpedCount AS varchar(50))
    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    IF (@bumpedCount >= 500)
    BEGIN
        PRINT 'WARNING: Bumping safety limit of 500 bumps reached'
    END
END
PRINT 'Done bumping process'
and opened 5 tabs in SSMS, pressed F5 in each, and watched as they too violated ACID:
Running bump 414
Msg 2627, Level 14, State 1, Procedure DoCountNumbers, Line 14
Violation of PRIMARY KEY constraint 'PK_CountingNumbers'.
Cannot insert duplicate key in object 'dbo.CountingNumbers'.
The duplicate key value is (4414).
The statement has been terminated.
So the failure is independent of ADO and ADO.NET; it happens with no client library at all.
For 15 years I've been operating under the assumption that a single statement in SQL Server is consistent; and the only
What about TRANSACTION ISOLATION LEVEL xxx?
For different variants of the SQL batch to execute:
default (read committed): key violation
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
default (read committed), explicit transaction: key violation
BEGIN TRANSACTION
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
COMMIT TRANSACTION
serializable: deadlock
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
COMMIT TRANSACTION
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
snapshot (after altering database to enable snapshot isolation): key violation
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRANSACTION
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
COMMIT TRANSACTION
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
Bonus
Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64)
Default transaction isolation level (READ COMMITTED)
Turns out every query I've ever written is broken
This certainly changes things. Every update statement I've ever written is fundamentally broken. E.g.:
--Update the user with their last invoice date
UPDATE Users
SET LastInvoiceDate = (SELECT MAX(InvoiceDate) FROM Invoices WHERE Invoices.uid = Users.uid)
Wrong value; because another invoice could be inserted after the MAX and before the UPDATE. Or an example from BOL:
UPDATE Sales.SalesPerson
SET SalesYTD = SalesYTD +
(SELECT SUM(so.SubTotal)
FROM Sales.SalesOrderHeader AS so
WHERE so.OrderDate = (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader AS so2
WHERE so2.SalesPersonID = so.SalesPersonID)
AND Sales.SalesPerson.BusinessEntityID = so.SalesPersonID
GROUP BY so.SalesPersonID);
without exclusive holdlocks, the SalesYTD is wrong.
How have I been able to do anything all these years?
I've been operating under the assumption that a single statement in SQL Server is consistent
That assumption is wrong. The following two transactions have identical locking semantics:
STATEMENT
BEGIN TRAN; STATEMENT; COMMIT
No difference at all. Single statements and auto-commits do not change anything.
So merging all logic into one statement does not help (if it does, it was by accident because the plan changed).
Let's fix the problem at hand. SERIALIZABLE will fix the inconsistency you are seeing because it guarantees that your transactions behave as if they executed single-threadedly. Equivalently, they behave as if they executed instantly.
You will be getting deadlocks. If you are ok with a retry loop, you're done at this point.
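A retry loop for the SERIALIZABLE approach could look like the sketch below (TRY...CATCH needs SQL Server 2005 or later). It uses a temp stand-in #CountingNumbersRetry for the question's table, retries only when the session is chosen as the deadlock victim (error 1205), and gives up after a hypothetical limit of five retries. With a single session no deadlock occurs, so the demo simply inserts one row; the retry path matters when several sessions run it against a shared table.

```sql
-- #CountingNumbersRetry is a temp stand-in for the question's CountingNumbers.
IF OBJECT_ID('tempdb..#CountingNumbersRetry') IS NOT NULL DROP TABLE #CountingNumbersRetry;
CREATE TABLE #CountingNumbersRetry (Value int PRIMARY KEY NOT NULL);

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DECLARE @retries int, @msg nvarchar(2048);
SET @retries = 0;

WHILE 1 = 1
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        INSERT INTO #CountingNumbersRetry (Value)
        SELECT ISNULL(MAX(Value), 0) + 1 FROM #CountingNumbersRetry;
        COMMIT TRANSACTION;
        BREAK;  -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retries < 5
            SET @retries = @retries + 1;  -- deadlock victim: try again
        ELSE
        BEGIN
            SET @msg = ERROR_MESSAGE();
            RAISERROR(@msg, 16, 1);  -- any other error, or too many retries
            BREAK;
        END
    END CATCH
END

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```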
If you want to invest more time, apply locking hints to force exclusive access to the relevant data:
UPDATE Gifts -- U-locked anyway
SET GivenAway = 1
WHERE GiftID = (
    SELECT TOP 1 g2.GiftID
    FROM Gifts g2 WITH (UPDLOCK, HOLDLOCK) --this normally just S-locks.
    WHERE g2.GivenAway = 0
      AND (SELECT COUNT(*) FROM Gifts g3 WITH (UPDLOCK, HOLDLOCK) WHERE g3.GivenAway = 1) < 5
    ORDER BY g2.GiftValue DESC
)
You will now see reduced concurrency. That might be totally fine depending on your load.
The very nature of your problem makes achieving concurrency hard. If you require a solution for that we'd need to apply more invasive techniques.
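The same locking-hint idea applies to the CountingNumbers test from the question. A minimal sketch, assuming a temp stand-in #CountingNumbersHint: reading MAX(Value) WITH (UPDLOCK, HOLDLOCK) should make concurrent sessions queue on the lock instead of all reading the same maximum, so they neither collide on the key nor deadlock the way plain SERIALIZABLE did. A temp table is private to one session, so to repeat the concurrency test, point the INSERT at a permanent table and run it from several sessions.

```sql
-- #CountingNumbersHint is a temp stand-in for the question's CountingNumbers.
IF OBJECT_ID('tempdb..#CountingNumbersHint') IS NOT NULL DROP TABLE #CountingNumbersHint;
CREATE TABLE #CountingNumbersHint (Value int PRIMARY KEY NOT NULL);

DECLARE @i int;
SET @i = 0;
WHILE @i < 3
BEGIN
    -- UPDLOCK: U locks conflict with other U locks, so a second session waits here
    -- instead of reading the same MAX(Value).
    -- HOLDLOCK: keep the lock (and key range) until the statement's transaction ends.
    INSERT INTO #CountingNumbersHint (Value)
    SELECT ISNULL(MAX(Value), 0) + 1
    FROM #CountingNumbersHint WITH (UPDLOCK, HOLDLOCK);

    SET @i = @i + 1;
END

SELECT Value FROM #CountingNumbersHint ORDER BY Value;  -- 1, 2, 3
```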
You can simplify the UPDATE a bit:
WITH g AS (
    SELECT TOP 1 g1.*
    FROM Gifts g1
    WHERE g1.GivenAway = 0
      AND (SELECT COUNT(*) FROM Gifts g2 WITH (UPDLOCK, HOLDLOCK) WHERE g2.GivenAway = 1) < 5
    ORDER BY g1.GiftValue DESC
)
UPDATE g -- U-locked anyway
SET GivenAway = 1
This gets rid of one unnecessary join.
Below is an example of an UPDATE statement that increments a counter value atomically:
-- Do this once for test setup
CREATE TABLE CountingNumbers (Value int PRIMARY KEY NOT NULL)
INSERT INTO CountingNumbers VALUES(1)
-- Run this in parallel: start it in two tabs on SQL Server Management Studio
-- You will see each connection generating new numbers without duplicates and without timeouts
while (1=1)
BEGIN
    declare @nextNumber int
    -- Taking the Update lock is only relevant in case this statement is part of a larger transaction
    -- to prevent deadlock
    -- When executing without a transaction, the statement will itself be atomic
    UPDATE CountingNumbers WITH (UPDLOCK, ROWLOCK) SET @nextNumber=Value=Value+1
    print @nextNumber
END
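A SELECT does not take exclusive locks. Under the default READ COMMITTED level its shared locks are released as soon as the data has been read, so once the SELECT is over, its lock is gone. Only then does the UPDATE take its locks, now that it knows from the SELECT's results what to lock. Meanwhile, anyone else can run the same SELECT! Under SERIALIZABLE the shared locks are held until the transaction ends, but shared locks are compatible with each other, so two sessions can still both read the same row and then deadlock when both try to update it.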
The only sure way to safely read and lock a row is:
begin transaction
--lock what i need to read
update mytable set col1=col1 where mykey=@key
--now read what i need
select @d1=col1,@d2=col2 from mytable where mykey=@key
--now do here calculations checks whatever i need from the row i read to decide my update
if @d1<@d2 set @d1=@d2 else set @d1=@d2 * 2 --just an example calc
--now do the actual update on what i read and the logic
update mytable set col1=@d1,col2=@d2 where mykey=@key
commit transaction
This way, any other connection running the same statements for the same data will wait at the first (fake) update statement until the previous one is done. This ensures that when the lock is released, only one connection is granted the lock for the 'update', and that connection will read committed, finalized data to make its calculations and decide whether and what to actually update in the second, 'real' update.
In other words, when you need to select information to decide if/how to update, you need a begin/commit transaction block, plus you need to start with a fake update of what you need to select, before you select it (an UPDATE with an OUTPUT clause will also do).
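The answer mentions that an UPDATE with an OUTPUT clause can replace the fake update plus SELECT. A sketch of that variant, assuming a temp stand-in #mytable with the same columns as the answer's mytable: the no-op UPDATE locks the row for the rest of the transaction, like the fake update, and OUTPUT hands back the values it read in the same statement. OUTPUT ... INTO a table variable is used because OUTPUT cannot assign directly to scalar variables.

```sql
-- #mytable is a temp stand-in for the answer's mytable.
IF OBJECT_ID('tempdb..#mytable') IS NOT NULL DROP TABLE #mytable;
CREATE TABLE #mytable (mykey int PRIMARY KEY, col1 int NOT NULL, col2 int NOT NULL);
INSERT INTO #mytable (mykey, col1, col2) VALUES (1, 3, 5);

DECLARE @key int, @d1 int, @d2 int;
DECLARE @row TABLE (col1 int, col2 int);
SET @key = 1;

BEGIN TRANSACTION;

-- Fake update: locks the row until COMMIT and returns its current values.
UPDATE #mytable
SET col1 = col1
OUTPUT deleted.col1, deleted.col2 INTO @row (col1, col2)
WHERE mykey = @key;

SELECT @d1 = col1, @d2 = col2 FROM @row;

-- Same example calculation as in the answer.
IF @d1 < @d2 SET @d1 = @d2 ELSE SET @d1 = @d2 * 2;

UPDATE #mytable SET col1 = @d1, col2 = @d2 WHERE mykey = @key;

COMMIT TRANSACTION;

SELECT mykey, col1, col2 FROM #mytable;  -- 1, 5, 5
```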

SQL Server Triggers: Update

Using Microsoft SQL Server, I am writing a SQL trigger for an update on a table and I am stuck. I am not very proficient in SQL, so it may be something basic that I am missing.
CREATE TRIGGER test
ON tableName
AFTER UPDATE
AS
BEGIN
    DECLARE @variableA int
    SELECT @variableA = variableA FROM DELETED
    DECLARE @variableB int
    SELECT @variableB = variableB FROM INSERTED;
    IF (@variableA <> @variableB)
    BEGIN
        -- Do what I want
    END
END
This works correctly, as it performs the action when the two variables are different. However, I do not want to consider ALL records from tableName.
I wrote the following to get only the entries that I want.
WITH
[table] (variableID, variable)
AS
(
    SELECT variableID, variable
    FROM tableName
    WHERE variable= 'value'
)
SELECT * FROM [table]
So what I want is to apply the trigger ONLY to the values that are found in the SELECT. Am I going about this the right way?
An UPDATE query isn't guaranteed to only impact a single row at a time, and your queries against INSERTED and DELETED need to reflect that to be on the safe side. Probably the easiest way to detect a changed value is to join the two trigger tables on the primary key.
CREATE TRIGGER test ON tableName
AFTER UPDATE AS
BEGIN
IF EXISTS (
SELECT * FROM INSERTED I
INNER JOIN DELETED D ON I.variableID = D.variableID
WHERE D.VariableA <> I.VariableB
/* AND further conditions */
) BEGIN
-- Perform your action
END
END
Where this snippet has /* AND further conditions */ would be a good place to insert the additional checks you want to do against the data before running your action.
For example, you can limit your action to updates where variable was 'value' before the update...
AND D.variable = 'value'
or where variable is set to 'value' by the update...
AND I.variable = 'value'
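Putting the pieces together, here is a sketch of a set-based trigger that handles multi-row updates, using hypothetical tables dbo.TriggerDemo and dbo.TriggerDemoLog (triggers cannot be created on temp tables). Unlike the question, it compares the same column before and after the update (D.variableA against I.variableA), and instead of a single IF it writes one log row per qualifying row, limited to rows where variable was 'value' before the update. GO is the batch separator understood by SSMS and sqlcmd; CREATE TRIGGER must be the first statement in its batch.

```sql
IF OBJECT_ID('dbo.TriggerDemo') IS NOT NULL DROP TABLE dbo.TriggerDemo;  -- also drops its trigger
IF OBJECT_ID('dbo.TriggerDemoLog') IS NOT NULL DROP TABLE dbo.TriggerDemoLog;
CREATE TABLE dbo.TriggerDemo (
    variableID int PRIMARY KEY,
    variableA  int NOT NULL,
    variable   varchar(20) NOT NULL
);
CREATE TABLE dbo.TriggerDemoLog (
    variableID int NOT NULL,
    oldA       int NOT NULL,
    newA       int NOT NULL
);
GO
CREATE TRIGGER dbo.trg_TriggerDemo_Update ON dbo.TriggerDemo
AFTER UPDATE AS
BEGIN
    SET NOCOUNT ON;
    -- One set-based statement: works whether the UPDATE touched 0, 1 or many rows.
    INSERT INTO dbo.TriggerDemoLog (variableID, oldA, newA)
    SELECT D.variableID, D.variableA, I.variableA
    FROM INSERTED I
    INNER JOIN DELETED D ON I.variableID = D.variableID
    WHERE D.variableA <> I.variableA
      AND D.variable = 'value';  -- only rows that had variable = 'value' before the update
END
GO
INSERT INTO dbo.TriggerDemo (variableID, variableA, variable) VALUES (1, 10, 'value');
INSERT INTO dbo.TriggerDemo (variableID, variableA, variable) VALUES (2, 20, 'other');

-- A single UPDATE that touches both rows; only row 1 should be logged.
UPDATE dbo.TriggerDemo SET variableA = variableA + 1;

SELECT variableID, oldA, newA FROM dbo.TriggerDemoLog;  -- 1, 10, 11
```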

SQL Server SELECT/UPDATE Stored Procedure Weirdness

I have a table I'm using as a work queue. Essentially, it consists of a primary key, a piece of data, and a status flag (processed/unprocessed). I have multiple processes trying to grab the next unprocessed row, so I need to make sure that they observe proper lock and update semantics to avoid race condition nastiness. To that end, I've defined a stored procedure they can call:
CREATE PROCEDURE get_from_q
AS
DECLARE @queueid INT;
BEGIN TRANSACTION TRAN1;
SELECT TOP 1
    @queueid = id
FROM
    MSG_Q WITH (updlock, readpast)
WHERE
    MSG_Q.status=0;
SELECT TOP 1 *
FROM
    MSG_Q
WHERE
    MSG_Q.id=@queueid;
UPDATE MSG_Q
SET status=1
WHERE id=@queueid;
COMMIT TRANSACTION TRAN1;
Note the use of "WITH (updlock, readpast)" to make sure that I lock the target row and ignore rows that are similarly locked already.
Now, the procedure works as listed above, which is great. While I was putting this together, however, I found that if the second SELECT and the UPDATE are reversed in order (i.e. UPDATE first then SELECT), I got no data back at all. And no, it didn't matter whether the second SELECT was before or after the final COMMIT.
My question is thus why the order of the second SELECT and UPDATE makes a difference. I suspect that there is something subtle going on there that I don't understand, and I'm worried that it's going to bite me later on.
Any hints?
By default, transactions are READ COMMITTED:
"Specifies that shared locks are held while the data is being read to avoid dirty reads, but the data can be changed before the end of the transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default."
http://msdn.microsoft.com/en-us/library/aa259216.aspx
I think you are getting nothing back from the SELECT because the record is still marked as dirty. You'd have to change the transaction isolation level, OR do what I do: update first and then read the record. To do this, you have to flag the record with a unique value (I use a getdate() for batches, but a GUID is probably what you want to use).
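An alternative that avoids needing a unique flag value is to claim the row and return it in one statement with UPDATE ... OUTPUT. This is a swapped-in technique, not what either poster used; the sketch assumes SQL Server 2005 or later and a hypothetical temp table #MSG_Q_demo shaped like the question's MSG_Q. READPAST makes each caller skip rows another caller has already locked, and SET NOCOUNT ON suppresses the extra row-count messages that some clients trip over.

```sql
SET NOCOUNT ON;

-- #MSG_Q_demo is a hypothetical stand-in for the question's MSG_Q.
IF OBJECT_ID('tempdb..#MSG_Q_demo') IS NOT NULL DROP TABLE #MSG_Q_demo;
CREATE TABLE #MSG_Q_demo (
    id     int IDENTITY(1,1) PRIMARY KEY,
    data   varchar(50) NOT NULL,
    status int NOT NULL DEFAULT 0   -- 0 = unprocessed, 1 = processed
);
INSERT INTO #MSG_Q_demo (data) VALUES ('first');
INSERT INTO #MSG_Q_demo (data) VALUES ('second');

-- Pick the oldest unprocessed row, skipping rows other callers have locked,
-- mark it processed, and return it to the caller in the same statement.
WITH next_item AS (
    SELECT TOP (1) id, data, status
    FROM #MSG_Q_demo WITH (ROWLOCK, READPAST)
    WHERE status = 0
    ORDER BY id
)
UPDATE next_item
SET status = 1
OUTPUT inserted.id, inserted.data;  -- returns id 1, 'first'
```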
Although not directly answering your question here, rather than reinventing the wheel and making life difficult for yourself, unless you enjoy it of course ;-), may I suggest that you look at using SQL Server Service Broker.
It provides an existing framework for using queues etc.
To find out more, visit:
Service Broker Link
Now back to the question: I am not able to replicate your problem. As you will see if you execute the code below, data is returned regardless of the order of the SELECT/UPDATE statements.
So your example above then.
create table #MSG_Q
(id int identity(1,1) primary key,status int)
insert into #MSG_Q select 0
DECLARE @queueid INT
BEGIN TRANSACTION TRAN1
SELECT TOP 1 @queueid = id FROM #MSG_Q WITH (updlock, readpast) WHERE #MSG_Q.status=0
UPDATE #MSG_Q SET status=1 WHERE id=@queueid
SELECT TOP 1 * FROM #MSG_Q WHERE #MSG_Q.id=@queueid
COMMIT TRANSACTION TRAN1
select * from #MSG_Q
drop table #MSG_Q
Returns the Results (1,1) and (1,1)
Now swapping the statement order.
create table #MSG_Q
(id int identity(1,1) primary key,status int)
insert into #MSG_Q select 0
DECLARE @queueid INT
BEGIN TRANSACTION TRAN1
SELECT TOP 1 @queueid = id FROM #MSG_Q WITH (updlock, readpast) WHERE #MSG_Q.status=0
SELECT TOP 1 * FROM #MSG_Q WHERE #MSG_Q.id=@queueid
UPDATE #MSG_Q SET status=1 WHERE id=@queueid
COMMIT TRANSACTION TRAN1
select * from #MSG_Q
drop table #MSG_Q
Results in: (1,0), (1,1) as expected.
Perhaps you could qualify your issue further?
More experimentation leads me to conclude that I was chasing a red herring, brought about by the tools I was using to exec my stored procedure. I was initially using DBVisualizer (free edition) and Netbeans, and they both appear to be confused by something about the format of the results. DBVisualizer suggests that I'm getting multiple result sets back, and that the free edition doesn't handle that.
Since then, I grabbed the free MS SQL Server Management Studio Express and things work perfectly. For those interested, the URL to SMSE is here:
MS SQL Server SMSE
Don't forget to install the MSXML6 service pack, too:
MSXML Service Pack 1
So, totally my bad in this case. :-(
Major thanks and kudos to you guys for your answers, though. You helped me confirm that what I was doing should work, which led me to the change I had to make to actually "solve" the issue. Thanks ever so much!
One more point: including SET NOCOUNT ON in the stored procedure fixed things for all ODBC clients. Apparently the row count from the first SELECT was confusing the ODBC clients, and telling SQL Server not to return that value makes things work perfectly...
