Just a curious question that I thought I'd ask the Snowflake experts. We know that Snowflake's default isolation level is READ COMMITTED. I have one transaction, say A, in which I truncate table T1 and reload it with transformed fresh data; at the same time, another transaction, say B, tries to read the data from table T1 while it is being truncated by transaction A. Would transaction B be able to read the data from table T1 while it is still being truncated in transaction A?
My mind says yes; transaction B should be able to read from table T1 because transaction A is still in progress and not yet committed.
Try running these two scripts in two different tabs in app.snowflake.com:
Script 1:
set xxx = 'xxx1';
select $xxx;
-- xxx1
select *
from will_transact;
begin transaction;
delete from will_transact
where a='a';
-- number of rows deleted = 1
commit;
select *
from will_transact;
begin transaction;
truncate table will_transact;
commit;
Script 2:
select $xxx;
-- Error: Session variable '$XXX' does not exist (line 1)
create or replace table will_transact as
select 'a' a, 'b' b;
select *
from will_transact;
-- 1 row
select *
from will_transact;
-- 1 row
select *
from will_transact;
-- 0 rows
create or replace table will_transact as
select 'a' a, 'b' b;
select *
from will_transact;
select *
from will_transact;
-- 0 rows
If you run these two scripts in parallel, step by step, you will notice that:
Each tab runs a separate session; the variables are not shared.
Once you start a transaction and delete rows or truncate the table, the other session doesn't notice the changes until they are committed.
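Distilled back to the original question, the timeline looks like this (a sketch, reusing the will_transact table from the scripts above):
-- session A
begin transaction;
truncate table will_transact;
-- not yet committed

-- session B, while A's transaction is still open
select count(*) from will_transact;
-- returns the last committed row count, not 0

-- session A
commit;

-- session B
select count(*) from will_transact;
-- 0 rows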
All the explanations of phantom reads I managed to find demonstrate a phantom read by running two SELECT statements in one transaction (e.g. https://blobeater.blog/2017/10/26/sql-server-phantom-reads/ ):
BEGIN TRAN
SELECT #1
DELAY DURING WHICH AN INSERT TAKES PLACE IN A DIFFERENT TRANSACTION
SELECT #2
END TRAN
Is it possible to reproduce a phantom read in one SELECT statement? This would mean that the SELECT statement starts in transaction #1. Then an insert runs in transaction #2 and commits. Finally, the SELECT statement from transaction #1 completes, but does not return a row that transaction #2 has inserted.
The SQL Server Transaction Isolation Levels documentation defines a phantom row as one "that matches the search criteria but is not initially seen" (emphasis mine). Consequently, more than one SELECT statement is needed for a phantom read to occur.
Data inserted during SELECT statement execution might not be returned under the READ COMMITTED isolation level, depending on the timing, but this is not a phantom read by definition. The example below shows this behavior.
--create table with enough data for a long-running SELECT query
CREATE TABLE dbo.PhantomReadExample(
PhantomReadExampleID int NOT NULL
CONSTRAINT PK_PhantomReadExample PRIMARY KEY
, PhantomReadData char(8000) NOT NULL
);
--insert 100K rows
WITH
t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
,t1m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a CROSS JOIN t1k AS b)
INSERT INTO dbo.PhantomReadExample WITH(TABLOCKX) (PhantomReadExampleID, PhantomReadData)
SELECT num*2, 'data'
FROM t1m
WHERE num <= 100000;
GO
--run this on connection 1
SELECT *
FROM dbo.PhantomReadExample
ORDER BY PhantomReadExampleID;
GO
--run this on connection 2 while the connection 1 SELECT is running
INSERT INTO dbo.PhantomReadExample(PhantomReadExampleID, PhantomReadData)
VALUES(1, 'data');
GO
Shared locks are acquired on rows as they are read during the SELECT query scan to ensure only committed data are read, but these are released immediately once the data are read, to improve concurrency. This allows other sessions to insert, update, and delete rows while the SELECT query is running.
The inserted row is not returned in this case because the ordered clustered index scan had already passed the point of the insert.
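Timing is the only thing that keeps the row out: a committed insert ahead of the scan position will be returned. A sketch (the key value is illustrative; it lies beyond the highest existing key, 200000, so the ordered scan reaches it last):
--run this on connection 2 while the connection 1 SELECT is running,
--before the scan reaches the end of the key range
INSERT INTO dbo.PhantomReadExample(PhantomReadExampleID, PhantomReadData)
VALUES(200001, 'data');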
Below is the Wikipedia definition of phantom reads:
A phantom read occurs when, in the course of a transaction, new rows are added by another transaction to the records being read.
This can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERT) new rows (in the target table) which fulfill that WHERE clause.
This is certainly possible to reproduce in a single reading query (of course other database activity must also be happening to produce the phantom rows).
Setup
CREATE TABLE Test(X INT PRIMARY KEY);
Connection 1 (leave this running)
SET NOCOUNT ON;
WHILE 1 = 1
INSERT INTO Test VALUES (CRYPT_GEN_RANDOM(4))
Connection 2
This is extremely likely to return some rows if run at the locking read committed isolation level (the default for the on-premises product, and enforced with the table hint below):
WITH CTE AS
(
SELECT *
FROM Test WITH (READCOMMITTEDLOCK)
WHERE X BETWEEN 0 AND 2147483647
)
SELECT *
FROM CTE c1
FULL OUTER HASH JOIN CTE c2 ON c1.X = c2.X
WHERE (c1.X IS NULL OR c2.X IS NULL)
The returned rows are values added between the first and second read of the table for rows matching the WHERE X BETWEEN 0 AND 2147483647 predicate. The query references the CTE twice, so Test is scanned twice; the HASH join hint stops the optimizer from collapsing the two reads into a single scan.
I'm trying to write an idempotent db migration script, which, among other things, needs to shuffle some data. Later in the script, one of the columns I'm selecting from is removed (the purpose of the migration is to move data from that column into a new place), so I have something like this (generated by EF Core):
IF NOT EXISTS (SELECT * FROM [__EFMigrationsHistory] WHERE [MigrationId] = N'AName')
BEGIN
INSERT INTO Foos (A, B)
SELECT OldA, OldB FROM Bars
END
-- a little later in the script:
IF NOT EXISTS (SELECT * FROM [__EFMigrationsHistory] WHERE [MigrationId] = N'AnotherName')
BEGIN
ALTER TABLE [Bars] DROP COLUMN [OldB];
END
However, this isn't as idempotent as I'd hoped it would be; the second time I run the script, it fails with an error on the first INSERT statement, since the OldB column doesn't exist on Bars anymore.
However, the guard clause above will always be false if OldB has been dropped, because in the same go as dropping OldB, we also insert that row into the migrations history (and yes, I've checked that this is true now too; the row exists). So the INSERT should never run without all columns it cares about existing.
How can I write an idempotent INSERT like the one above, that doesn't validate existence of all columns until it's actually run?
You could check if all columns exist:
IF NOT EXISTS (SELECT * FROM [__EFMigrationsHistory]
WHERE [MigrationId] = N'AName')
BEGIN
IF (SELECT COUNT(*)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('Bars')
AND name IN ('OldA', 'OldB')) = 2
BEGIN
EXEC('INSERT INTO Foos (A, B)
SELECT OldA, OldB FROM Bars');
END
END
-- a little later in the script:
IF NOT EXISTS (SELECT * FROM [__EFMigrationsHistory]
WHERE [MigrationId] = N'AnotherName')
BEGIN
EXEC('ALTER TABLE [Bars] DROP COLUMN [OldB]');
END
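The EXEC is what makes this work: wrapping the statements in dynamic SQL defers compilation, so the missing OldB column is never validated unless the guard passes. If you prefer, the column check can be written more compactly with COL_LENGTH (a sketch):
IF COL_LENGTH('dbo.Bars', 'OldA') IS NOT NULL
AND COL_LENGTH('dbo.Bars', 'OldB') IS NOT NULL
BEGIN
EXEC('INSERT INTO Foos (A, B)
SELECT OldA, OldB FROM Bars');
END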
I want to generate a custom identity column related to the type of product.
Can this query guarantee the order of the identity values and handle concurrency?
This is a sample query:
BEGIN TRAN
INSERT INTO TBLKEY
VALUES((SELECT 'A-' + CAST(MAX(CAST(ID AS INT)) + 1 AS NVARCHAR) FROM TBLKEY),'EHSAN')
COMMIT
Try this:
BEGIN TRAN
INSERT INTO TBLKEY
VALUES((SELECT 'A-' + CAST(MAX(CAST(ID AS INT)) + 1 AS NVARCHAR) FROM TBLKEY WITH (UPDLOCK)),'EHSAN')
COMMIT
When selecting the max ID you acquire a U lock on the row. That U lock is incompatible with the U lock that another session running the same query will try to acquire at the same time, so only one such query executes at a time. The IDs will be in order and contiguous, without any gaps between them.
A better solution would be to create an extra table dedicated solely to storing the current or next id, and use that instead of the maximum.
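A minimal sketch of that approach (the KeySource table and its column are illustrative names):
CREATE TABLE KeySource (NextId int NOT NULL);
INSERT INTO KeySource VALUES (1);

BEGIN TRAN
DECLARE @id int
-- read and increment in one atomic statement; the exclusive lock serializes callers
UPDATE KeySource SET @id = NextId, NextId = NextId + 1
INSERT INTO TBLKEY VALUES ('A-' + CAST(@id AS NVARCHAR(20)), 'EHSAN')
COMMIT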
You can understand the difference by doing the following:
Prepare a table
CREATE TABLE T(id int not null PRIMARY KEY CLUSTERED)
INSERT INTO T VALUES(1)
And then run the following query in two different sessions, one after the other, less than 10 seconds apart:
BEGIN TRAN
DECLARE @idv int
SELECT @idv = MAX(id) FROM T
WAITFOR DELAY '0:0:10'
INSERT INTO T VALUES(@idv + 1)
COMMIT
Wait for a while until both queries complete. Observe that one of them succeeded and the other failed.
Now do the same with the following query
BEGIN TRAN
DECLARE @idv int
SELECT @idv = MAX(id) FROM T WITH (UPDLOCK)
WAITFOR DELAY '0:0:5'
INSERT INTO T VALUES(@idv + 1)
COMMIT
View the contents of T: this time both inserts succeed and the table contains 1, 2 and 3.
Clean up with DROP TABLE T.
This would be a bad thing to do as there is no way to guarantee that two queries running at the same time wouldn't get MAX(ID) as being the same value.
If you used a standard identity column, you could also have a computed column which uses it, or just return the generated key when you return the data.
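A sketch of that suggestion (table and column names are illustrative):
CREATE TABLE TBLKEY2 (
NumericId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Name nvarchar(50) NOT NULL,
DisplayId AS ('A-' + CAST(NumericId AS nvarchar(20))) -- derived custom key
);
INSERT INTO TBLKEY2 (Name) VALUES (N'EHSAN');
SELECT DisplayId, Name FROM TBLKEY2;
-- A-1, EHSAN
The identity column guarantees uniqueness without blocking, at the cost of possible gaps after rollbacks.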
I have an Employee table with 5 records, and I am running the script below. The script returns the record with EmpID 2 while, at the same time, the record is being deleted.
Is this the right way?
begin transaction A
select * from Employee where EmpID=2
begin transaction B
delete from Employee where EmpID=2
commit transaction B
commit transaction A
You may just use this:
DELETE
FROM employee
WHERE empId = 2
OUTPUT DELETED.*
This will delete the record and output its contents in one statement, atomically.
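If you need the deleted row for further processing, OUTPUT ... INTO captures it. A sketch, assuming Employee has EmpID and Name columns (Name is illustrative):
DECLARE @deleted TABLE (EmpID int, Name nvarchar(50));
DELETE FROM Employee
OUTPUT DELETED.EmpID, DELETED.Name INTO @deleted
WHERE EmpID = 2;
SELECT * FROM @deleted;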
If you want to select the row with EmpID = 2, so you can do some work on it and ensure that this doesn't get changed before your delete, use an update lock:
begin transaction A
select * from Employee with (updlock) where EmpID=2
delete from Employee where EmpID=2
commit transaction A
You could also use a transaction isolation level higher than read committed, preferably serializable.
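A sketch of the same flow under SERIALIZABLE:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
begin transaction A
select * from Employee where EmpID=2
-- the key-range locks taken by the select are held until commit,
-- so no other session can change or delete the row in between
delete from Employee where EmpID=2
commit transaction A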
My database is SQL Server 2005/8. In a booking system we have a limit of 24 bookings on an event. This code in a stored procedure checks:
- that the current user (@UserId) is not already booked on the event (@EventsId)
- that the current event has a current booking list of under 24
- and then inserts a new booking.
BEGIN TRANSACTION
IF (((select count(*) from dbo.aspnet_UsersEvents with (updlock)
where UserId = @UserId and EventsId = @EventsId) = 0)
AND ((SELECT COUNT(*) FROM dbo.aspnet_UsersEvents with (updlock)
WHERE EventsId = @EventsId) < 24))
BEGIN
insert into dbo.aspnet_UsersEvents (UserId, EventsId)
Values (@UserId, @EventsId)
END
COMMIT
The problem is that it is not safe. Two users might perform the test simultaneously and conclude they can both book. Both insert a row and we end up with 25 bookings.
Simply enclosing it in a transaction does not work. I tried adding WITH (UPDLOCK) to the selects in the hope that one would take update locks and keep the other out. That does not work.
Three options:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
Change the lock hint to WITH (UPDLOCK, HOLDLOCK)
Add a unique constraint to dbo.aspnet_UsersEvents and a TRY/CATCH around the insert.
You can use the following script to confirm that the lock is taken and immediately released if you omit HOLDLOCK. You will also see that the lock is not released (no 'releasing lock reference on KEY' output) when HOLDLOCK is used.
(Gist script)
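Applied to the original procedure, option 2 would look like this (a sketch; only the hints change):
BEGIN TRANSACTION
IF (((select count(*) from dbo.aspnet_UsersEvents with (updlock, holdlock)
where UserId = @UserId and EventsId = @EventsId) = 0)
AND ((SELECT COUNT(*) FROM dbo.aspnet_UsersEvents with (updlock, holdlock)
WHERE EventsId = @EventsId) < 24))
BEGIN
insert into dbo.aspnet_UsersEvents (UserId, EventsId)
Values (@UserId, @EventsId)
END
COMMIT
HOLDLOCK gives the reads serializable range semantics, so a concurrent session cannot sneak a 25th row in between the check and the insert.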
Just do it in one statement, at READ COMMITTED or higher.
INSERT dbo.aspnet_UsersEvents
(UserId,EventsId)
OUTPUT inserted.UserEventsId -- Or whatever, just getting back one row identifies the insert was successful
SELECT @UserId
, @EventsId
WHERE ( SELECT COUNT(*)
FROM dbo.aspnet_UsersEvents
WHERE UserId = @UserId
AND EventsId = @EventsId ) = 0
AND ( SELECT COUNT(*)
FROM dbo.aspnet_UsersEvents
WHERE EventsId = @EventsId ) < 24
Side note: your SELECT COUNT(*) for duplicate checking seems excessive; personally I'd use NOT EXISTS (SELECT NULL FROM dbo.aspnet_UsersEvents WHERE UserId = @UserId AND EventsId = @EventsId).
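Wired into the single-statement version, that looks like (a sketch):
INSERT dbo.aspnet_UsersEvents
(UserId, EventsId)
OUTPUT inserted.UserEventsId
SELECT @UserId
, @EventsId
WHERE NOT EXISTS ( SELECT NULL
FROM dbo.aspnet_UsersEvents
WHERE UserId = @UserId
AND EventsId = @EventsId )
AND ( SELECT COUNT(*)
FROM dbo.aspnet_UsersEvents
WHERE EventsId = @EventsId ) < 24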