I am using SQL Server. I have a stored procedure that does the following:
INSERT INTO Tbl1
SELECT col1, col2, col3
FROM Tbl2
My question is: does this need a transaction with a commit? From looking online, it does not seem so. It will be part of a nightly batch process, so I want to make sure it behaves properly. Should I use a TRY...CATCH in this case?
If you have only this single statement, it doesn't matter, because each statement is implicitly a transaction: if it fails, it won't insert any rows, and otherwise it will commit the changes automatically. You might, however, have a scenario where at the beginning of your procedure you delete rows and then insert rows into the table. In such a scenario it is a good idea to wrap both statements in one transaction, so that if the delete succeeds but the insert fails, everything is rolled back and you still have the original data in your table.
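As a sketch of that delete-then-reload scenario (the DELETE step and the idea of Tbl2 as a staging source are assumptions for illustration; only Tbl1, Tbl2 and the column names come from the question), you could wrap both statements in an explicit transaction with TRY...CATCH:
BEGIN TRY
    BEGIN TRANSACTION;

    -- Clear out yesterday's data (hypothetical step)
    DELETE FROM Tbl1;

    -- Reload from the source table
    INSERT INTO Tbl1
    SELECT col1, col2, col3
    FROM Tbl2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Either both statements take effect or neither does
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;  -- re-raise the error so the nightly batch job sees the failure
END CATCH;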
Related
A common case for DB transactions is performing operations on multiple tables, as you can then easily rollback all operations if one fails. However, a common scenario I run into is wanting to insert records to multiple tables where the later inserts need the serial ID from the previous inserts.
Since the ID is not generated/available until the transaction is actually committed, how can one accomplish this? If you have to commit after the first insert in order to get the ID and then execute the second insert, it seems to defeat the purpose of the transaction in the first place because after committing (or if I don't use a transaction at all) I cannot rollback the first insert if the second insert fails.
This seems like such a common use case for DB transactions that I can't imagine it would not be supported in some way. How can this be accomplished?
A CTE (common table expression) with data-modifying statements should cover your need; see the manual.
Typical example :
WITH cte AS (INSERT INTO table_A (id) VALUES ... RETURNING id)
INSERT INTO table_B (id) SELECT id FROM cte
see the demo in dbfiddle
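A slightly fuller, self-contained sketch of the same pattern (PostgreSQL syntax, reusing the table_A / table_B names from above; the columns shown are assumed purely for illustration):
-- Hypothetical schema, for illustration only
CREATE TABLE table_A (id serial PRIMARY KEY, label text);
CREATE TABLE table_B (a_id int REFERENCES table_A (id), note text);

-- One statement: both inserts succeed or fail together,
-- and the generated id is exposed via RETURNING before any commit
WITH cte AS (
    INSERT INTO table_A (label)
    VALUES ('parent row')
    RETURNING id
)
INSERT INTO table_B (a_id, note)
SELECT id, 'child row'
FROM cte;
The generated id is produced when the row is inserted, not when the transaction commits, so later statements inside the same transaction can consume it.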
I am using SQL Server 2016
I tried the following query.
SELECT CONVERT(BIGINT, 'A') col1 INTO #tmp
This query obviously fails, because 'A' cannot be converted to BIGINT.
However, the temporary table (#tmp) is created even if the query fails.
Why? I think this is by design, but I want to know.
P.S. PDW (Parallel Data Warehouse) does not create the temporary table.
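One quick way to see the behavior for yourself (a sketch that simply checks what the question already describes):
-- This statement raises a conversion error at runtime...
SELECT CONVERT(BIGINT, 'A') col1 INTO #tmp;
GO

-- ...but in the same session the temporary table exists anyway
SELECT OBJECT_ID('tempdb..#tmp') AS tmp_object_id;  -- non-NULL means #tmp was created
GO

DROP TABLE #tmp;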
I am querying a table in a SQL Server database that gets continuous inserts from other sources. The SELECT statement that reads the data from this table is used in my ETL job, and it queries only a selected partition of the table.
SELECT *
FROM REALTIMESRC
WHERE PARTITION = '2018-11';
I understand that a SELECT statement by default introduces a shared lock on the rows that it selects.
When this table gets inserts from other sources into the same partition that I am querying, do those inserts get impacted by my SELECT operation?
I am presuming that the shared lock introduced by the SELECT statement applies at the row level and doesn't apply to new inserts that happen in parallel. Can someone please clarify this?
I understand that a SELECT statement by default introduces a shared lock on the rows that it selects.
That is correct, yes.
When this table gets inserts from other sources into the same partition that I am querying, do those inserts get impacted by my SELECT operation?
No, since the insert only introduces new rows that you haven't selected, there shouldn't be any problem.
I am presuming that the shared lock introduced by the SELECT statement applies at the row level and doesn't apply to new inserts that happen in parallel.
Yes, that is correct - the INSERT and SELECT should work just fine in parallel.
There might be some edge cases where you could run into trouble:
if the INSERT statement tries to insert more than 5000 rows in a single transaction, SQL Server might opt to escalate those 5000 individual locks into a table-level exclusive lock - at which point no more SELECT operations would be possible until the INSERT transaction completes
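If that escalation edge case ever becomes a real problem, one possible mitigation (a sketch, not something the scenario above requires) is to stop SQL Server from escalating locks on that table, or simply to keep each insert transaction well under a few thousand rows:
-- Assumes you are allowed to change the table definition;
-- trades extra lock memory for better concurrency
ALTER TABLE REALTIMESRC SET (LOCK_ESCALATION = DISABLE);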
I know at least three ways to insert a record if it doesn't already exist in a table:
The first one is using IF NOT EXISTS:
IF NOT EXISTS(select 1 from table where <condition>)
INSERT...VALUES
The second one is using merge:
MERGE table AS target
USING (SELECT values) AS source
ON (condition)
WHEN NOT MATCHED THEN
INSERT ... VALUES ...
The third one is using insert...select:
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table where <condition>)
But which one is the best?
The first option does not seem to be thread-safe: the record might be inserted between the SELECT in the IF and the INSERT that follows if two or more users try to insert the same record.
As for the second option, MERGE seems to be overkill for this, as the documentation states:
Performance Tip: The conditional behavior described for the MERGE statement works best when the two tables have a complex mixture of matching characteristics. For example, inserting a row if it does not exist, or updating the row if it does match. When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.
So I think the third option is the best for this scenario (only insert the record if it doesn't already exist, no need to update if it does), but I would like to know what SQL Server experts think.
Please note that after the insert I'm not interested in knowing whether the record was already there or whether it's a brand new record; I just need it to be there so that I can carry on with the rest of the stored procedure.
When you need to guarantee the uniqueness of records on a condition that cannot be expressed by a UNIQUE or PRIMARY KEY constraint, you indeed need to make sure that the existence check and the insert are done in one transaction. You can achieve this by either:
Using one SQL statement performing the check and the insert (your third option)
Using a transaction with the appropriate isolation level
There is a fourth way though that will help you better structure your code and also make it work in situations where you need to process a batch of records at once. You can create a TABLE variable or a temporary table, insert all of the records that need to be inserted in there and then write the INSERT, UPDATE and DELETE statements based on this variable.
Below is (pseudo)code demonstrating this approach:
-- Logic to create the data to be inserted, if necessary
DECLARE @toInsert TABLE (idCol INT PRIMARY KEY, dataCol VARCHAR(MAX))
INSERT INTO @toInsert (idCol, dataCol) VALUES (1, 'row 1'), (2, 'row 2'), (3, 'row 3')
-- Logic to insert the data (realTable is the target table)
INSERT INTO realTable (idCol, dataCol)
SELECT TI.idCol, TI.dataCol
FROM @toInsert TI
WHERE NOT EXISTS (SELECT 1 FROM realTable RT WHERE RT.dataCol = TI.dataCol)
In many situations I use this approach, as it makes the T-SQL code easier to read, refactor, and unit test.
Following Vladimir Baranov's comment and reading Dan Guzman's blog posts about Conditional INSERT/UPDATE Race Condition and “UPSERT” Race Condition With MERGE, it seems that all three options suffer from the same drawbacks in a multi-user environment.
Eliminating the MERGE option as overkill, we are left with options 1 and 3.
Dan's proposed solution is to use an explicit transaction and add lock hints to the SELECT to avoid the race condition.
This way, option 1 becomes:
BEGIN TRANSACTION
IF NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
BEGIN
INSERT...VALUES
END
COMMIT TRANSACTION
and option 3 becomes:
BEGIN TRANSACTION
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
COMMIT TRANSACTION
Of course, in both options there needs to be some error handling - every transaction should be wrapped in a TRY...CATCH so that we can roll back the transaction in case of an error.
That being said, I think the 3rd option is probably my personal favorite, but I don't think there should be a difference.
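Putting the pieces together, here is a sketch of option 3 with both the lock hints and the error handling in place (dbo.Customers and CustomerName are hypothetical names used only for illustration):
-- Hypothetical table and parameter, for illustration only
DECLARE @CustomerName VARCHAR(100) = 'Acme';

BEGIN TRY
    BEGIN TRANSACTION;

    INSERT INTO dbo.Customers (CustomerName)
    SELECT @CustomerName
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Customers WITH (UPDLOCK, HOLDLOCK)
                      WHERE CustomerName = @CustomerName);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Roll back whatever the transaction did and surface the original error
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;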
Update
Following a conversation I've had with Aaron Bertrand in the comments of some other question - I'm not entirely convinced that using ISOLATION LEVEL is a better solution than individual query hints, but at least that's another option to consider:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table where <condition>);
COMMIT TRANSACTION;
My kindergarten SQL Server taught me that a trigger may be fired with multiple rows in the inserted and deleted pseudo tables. I mostly write my trigger code with this in mind, often resulting in some cursor-based kludge. However, I'm really only able to test them firing for a single row at a time. How can I make SQL Server fire my trigger with multiple rows, and will it actually ever do so? Can I set a flag so that SQL Server will only fire single-row triggers?
Trigger definitions should always handle multiple rows.
Taken from SQLTeam:
-- BAD Trigger code following:
CREATE TRIGGER trg_Table1
ON Table1
For UPDATE
AS
DECLARE #var1 int, #var2 varchar(50)
SELECT #var1 = Table1_ID, #var2 = Column2
FROM inserted
UPDATE Table2
SET SomeColumn = #var2
WHERE Table1_ID = #var1
The above trigger will only work for the last row in the inserted table.
This is how you should implement it:
CREATE TRIGGER trg_Table1
ON Table1
FOR UPDATE
AS
UPDATE t2
SET SomeColumn = i.SomeColumn
FROM Table2 t2
INNER JOIN inserted i
ON t2.Table1_ID = i.Table1_ID
Yes: if a statement affects more than one row, it is handled by a single trigger call, as you might want to revert the whole transaction. It is not possible to logically split it into separate trigger calls, and I don't think SQL Server provides such a flag. You can make SQL Server call your trigger with multiple rows by issuing an UPDATE or DELETE statement that affects multiple rows.
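For example, a small test sketch (TriggerTest and trg_TriggerTest are hypothetical names): a single UPDATE that touches three rows fires the trigger once, with all three rows in inserted.
-- Hypothetical test table and trigger, for illustration only
CREATE TABLE TriggerTest (id INT PRIMARY KEY, val INT);
GO
CREATE TRIGGER trg_TriggerTest ON TriggerTest
AFTER UPDATE
AS
BEGIN
    -- All rows touched by the triggering statement arrive here together
    DECLARE @cnt INT = (SELECT COUNT(*) FROM inserted);
    PRINT CONCAT('Rows in inserted: ', @cnt);
END;
GO
INSERT INTO TriggerTest (id, val) VALUES (1, 10), (2, 20), (3, 30);

-- One statement, three rows: the trigger fires once and prints 3
UPDATE TriggerTest SET val = val + 1;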
First, it concerns me that you are making the triggers handle multiple rows by using a cursor. Do not do that! Use a set-based statement instead, joining to the inserted or deleted pseudo tables. Someone put one of those cursor-based triggers on our database before I came to work here. It took over forty minutes to handle a 400,000-record insert (and I often have to do inserts of over 100,000 records to this table for one client). Changing it to a set-based solution brought the time down to less than a minute. While all triggers must be capable of handling multiple rows, you must not do so by creating a performance nightmare.
If you can write a SELECT statement for the cursor, you can write a set-based INSERT, UPDATE, or DELETE based on that same SELECT.
I've always written my triggers to handle multiple rows. It was my understanding that if a single query inserted/updated/deleted multiple rows, then only one trigger would fire, and as such you would have to use a cursor to move through the records one by one.
One SQL statement always invokes one trigger execution - that's part of the definition of a trigger. (It's also a circumstance that seems to trip up everyone who writes a trigger at least once.) I believe you can discover how many records are being affected by inspecting @@ROWCOUNT.
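For example (a sketch against the Table1 example above; the important detail is reading @@ROWCOUNT in the very first statement of the trigger body, before anything else resets it):
CREATE TRIGGER trg_Table1_RowCount ON Table1
AFTER UPDATE
AS
BEGIN
    -- @@ROWCOUNT here still reflects the UPDATE that fired the trigger
    DECLARE @affected INT = @@ROWCOUNT;

    -- Common short-circuit: nothing to do if zero rows were touched
    IF @affected = 0
        RETURN;

    -- ...set-based logic joining to inserted/deleted goes here...
END;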