Auto-increment skips the sequence with MERGE statement in Snowflake - snowflake-cloud-data-platform

I am working on a use case where I need to implement a surrogate key. I have a column ID that should auto-increment by 1, but when I use MERGE it skips 2 to 4 values in the sequence.
create or replace table auto_increment(id int primary key autoincrement start 1 increment 1,name varchar(50),city varchar(50));
create or replace table emp(name varchar(50),city varchar(50));
create or replace stream emp_stream on table emp;
insert into emp values('salman','mumbai'),('akshay','pune'),('aamir','mumbai');
merge into auto_increment as a
using emp_stream as e
on e.name=a.name
when matched then update
set a.name=e.name,
a.city=e.city
when not matched then insert (name,city) values(e.name,e.city);
select * from auto_increment;
ID
1
2
3
insert into emp values('aamir','chennai'),('akshay','mumbai'),('ranjikant','chennai'),('mahesh babu','hyderabad');
merge into auto_increment as a
using emp_stream as e
on e.name=a.name
when matched then update
set a.name=e.name,
a.city=e.city
when not matched then insert (name,city) values(e.name,e.city);
select * from auto_increment;
ID
1
2
3
6
7
Why has it skipped 4 and 5? When I use MERGE again, it creates more gaps in the sequence.

It's already answered here:
MERGE command results in gaps in sequence numbers
Per the Snowflake documentation, Snowflake does not guarantee there will be no gaps in sequences.
https://docs.snowflake.net/manuals/user-guide/querying-sequences.html.
I can say that the Snowflake development team is working on improving sequences for MERGE statements.
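If gap-free surrogate keys really matter, one possible workaround (a sketch of my own, not part of the linked answer - validate it before relying on it) is to assign the id explicitly instead of relying on the AUTOINCREMENT default, so MERGE never burns sequence values. Splitting the MERGE into an update plus an insert, wrapped in one transaction so the stream contents stay stable across both statements:
begin transaction;
-- update the rows that already exist in the target
update auto_increment
set city = e.city
from emp_stream e
where e.name = auto_increment.name;
-- insert only genuinely new names, computing a dense id explicitly
-- (coalesce handles the empty-table case)
insert into auto_increment (id, name, city)
select (select coalesce(max(id), 0) from auto_increment)
           + row_number() over (order by e.name),
       e.name,
       e.city
from emp_stream e
where not exists (select 1 from auto_increment a where a.name = e.name);
commit;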

Related

What is the best way to assert that a set of columns could form a primary key in Snowflake?

Infamously, primary key constraints are not enforced in Snowflake SQL:
-- Generating a table with 4 rows that contain duplicates and NULLs:
CREATE OR REPLACE TEMP TABLE PRIMARY_KEY_TEST AS
SELECT
*
FROM (
SELECT 1 AS PK, 'TEST_TEXT' AS TEXT
UNION ALL SELECT 1 AS PK, 'TEST_TEXT' AS TEXT
UNION ALL SELECT NULL AS PK, NULL AS TEXT
UNION ALL SELECT NULL AS PK, NULL AS TEXT
)
;
SELECT *
FROM PRIMARY_KEY_TEST
;
PK   | TEXT
-----+----------
1    | TEST_TEXT
1    | TEST_TEXT
NULL | NULL
NULL | NULL
-- These constraints will NOT throw any errors in Snowflake
ALTER TABLE PRIMARY_KEY_TEST ADD PRIMARY KEY (PK);
ALTER TABLE PRIMARY_KEY_TEST ADD UNIQUE (TEXT);
However, knowing that a set of columns has values that are unique for every row and never NULL is vital to check when updating a set of data.
So I'm looking for an easy-to-write-and-read (ideally 1-2 lines) piece of code (probably based on some Snowflake function) that throws an error if a set of columns no longer forms a viable primary key in Snowflake SQL.
Any Suggestions?
Such a test query is easy to write using QUALIFY and a windowed COUNT. The pattern is to place the primary key column list into the PARTITION BY clause and search for non-unique values; an additional check for NULLs can be added too. If the column list is a valid primary key candidate, the query returns no rows; if there are rows violating the rules, they are returned:
-- checking if PK is applicable
SELECT *
FROM PRIMARY_KEY_TEST
QUALIFY COUNT(*) OVER(PARTITION BY PK) > 1
OR PK IS NULL;
-- checking if the TEXT column is applicable
SELECT *
FROM PRIMARY_KEY_TEST
QUALIFY COUNT(*) OVER(PARTITION BY TEXT) > 1
OR TEXT IS NULL;
-- checking if the PK,TEXT columns are applicable
SELECT *
FROM PRIMARY_KEY_TEST
QUALIFY COUNT(*) OVER(PARTITION BY PK,TEXT) > 1
OR PK IS NULL
OR TEXT IS NULL;
I'd still prefer code that can throw an error though
It is possible using Snowflake Scripting and RAISE exception:
BEGIN
    LET my_exception EXCEPTION (-20002, 'Columns cannot be used as PK.');
    IF (EXISTS(SELECT *
               FROM PRIMARY_KEY_TEST
               QUALIFY COUNT(*) OVER(PARTITION BY PK) > 1
                   OR PK IS NULL
              )) THEN
        RAISE my_exception;
    END IF;
END;
-20002 (P0001): Uncaught exception of type 'MY_EXCEPTION' on line 8 at position 5 : Columns cannot be used as PK.
You can enforce NOT NULL in Snowflake by adding a NOT NULL constraint on the columns that you don't want to be nullable.
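For example (a one-line sketch; against the sample table above this should fail precisely because PK already contains NULLs, which is the enforcement you get):
ALTER TABLE PRIMARY_KEY_TEST MODIFY COLUMN PK SET NOT NULL;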
The primary key constraint is informational only; it is not enforced when you insert data into a table. For a primary key you will have to either remove/delete the offending data, or check whether the data already exists before inserting and only update in that case.
Depending upon what you are doing, you may use the following:
MERGE (insert and update)
Use DISTINCT to check whether the row exists, then update, or delete the old row and insert the new one.
You could use the ROW_NUMBER analytical function to identify the duplicates (see the sketch below).
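A minimal sketch of that last point against the sample table above (the ORDER BY inside the window is arbitrary here); every row numbered 2 or higher within its PK group is a duplicate:
SELECT *
FROM PRIMARY_KEY_TEST
QUALIFY ROW_NUMBER() OVER (PARTITION BY PK ORDER BY TEXT) > 1;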

Inserting a concatenation of an identity column with another column

I have an identity column and another column. While inserting a new row into the
table, I need to insert into a third column the concatenation of the two columns.
For reference, please see the table below:
------------------------------------------------
A | B   | C
------------------------------------------------
1 | 33  | 133  (1 [identity result] + 33)
2 | 112 | 2112
Please help me to solve this issue.
There is already an answer to this question, but I think it is not the best way to achieve it.
Here's an example of how to achieve it with a computed column.
CREATE TABLE dbo.calculatedTEST (
A INT IDENTITY(1,1) NOT NULL,
B INT NOT NULL,
c AS CONVERT(INT,CONVERT(VARCHAR(max),A)+CONVERT(VARCHAR(max),B))
)
insert into dbo.calculatedTEST
(B)
values
(1),
(1),
(2),
(2)
select * from dbo.calculatedTEST
A computed column is computed from an expression that can use other
columns in the same table. The expression can be a noncomputed column
name, constant, function, and any combination of these connected by
one or more operators. The expression cannot be a subquery.
Unless otherwise specified, computed columns are virtual columns that
are not physically stored in the table. Their values are recalculated
every time they are referenced in a query. The Database Engine uses
the PERSISTED keyword in the CREATE TABLE and ALTER TABLE statements
to physically store computed columns in the table. Their values are
updated when any columns that are part of their calculation change. By
marking a computed column as PERSISTED, you can create an index on a
computed column that is deterministic but not precise. Additionally,
if a computed column references a CLR function, the Database Engine
cannot verify whether the function is truly deterministic. In this
case, the computed column must be PERSISTED so that indexes can be
created on it. For more information, see Creating Indexes on Computed
Columns.
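Building on the quoted documentation, here is a sketch of the PERSISTED variant (the table and index names are illustrative), which also allows indexing the computed column:
CREATE TABLE dbo.calculatedTEST_persisted (
    A INT IDENTITY(1,1) NOT NULL,
    B INT NOT NULL,
    -- deterministic and precise, so it can be PERSISTED and indexed
    c AS CONVERT(INT, CONVERT(VARCHAR(10), A) + CONVERT(VARCHAR(10), B)) PERSISTED
);
CREATE INDEX IX_calculatedTEST_persisted_c ON dbo.calculatedTEST_persisted (c);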
You don't need to insert column C; you can easily get column C using a SELECT statement,
like this:
select A, B,
       cast(cast(A as varchar(max)) + cast(B as varchar(max)) as varchar(max)) as C
from Your_Table_Name
If you really need to insert column C, then you have to run an INSERT and an UPDATE query together to insert the value into the C column of the table.
Like:
insert into Table_Name(B) values('33'); select IDENT_CURRENT('Table_Name');
--you'll get the inserted identity.
--now run the UPDATE query for the identity you got from the insert query.
Sample.
create table #tab1
(
    Id bigint identity(1,1) primary key,
    a int,
    b varchar(50)
)
insert into #tab1(a) values(88);
declare @id1 as bigint;
set @id1 = (select SCOPE_IDENTITY());
update #tab1 set b = cast(Id as varchar(max)) + cast(a as varchar(max)) where Id = @id1;

Insert multiple rows of default values into a table

I have a table with a single column, which is an auto-generated identity
create table SingleIdTable (
id int identity(1,1) not null
)
I can insert a single row with an auto-generated id with:
insert into SingleIdTable default values
I want to insert many rows and use the output syntax to get their ids, something like:
insert into SingleIdTable
output inserted.Id into #TableOfIds
select (default values) from SomeOtherTable where Attribute is null
The intention is to insert a row into SingleIdTable for each row in SomeOtherTable where Attribute is null, using an auto-generated id. The above doesn't work, but how could I do it? I note that if my table had more than just a single column I could do it, but I can't select empty rows, which is what I really want to do.
I can't change the definition of SomeOtherTable.
If you are on SQL Server 2008+ you can use MERGE for this. Example syntax below.
MERGE INTO SingleIdTable
USING (SELECT *
FROM SomeOtherTable
WHERE Attribute IS NULL) T
ON 1 = 0
WHEN NOT MATCHED THEN
INSERT
DEFAULT VALUES
OUTPUT INSERTED.id;
I'm not sure what practical use this single column table has though?
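If you want the generated ids captured in the #TableOfIds table from the question rather than just returned to the client, the OUTPUT clause of MERGE can write into it directly; a sketch assuming #TableOfIds has a single int column:
CREATE TABLE #TableOfIds (id INT NOT NULL);
MERGE INTO SingleIdTable
USING (SELECT *
       FROM SomeOtherTable
       WHERE Attribute IS NULL) T
ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT DEFAULT VALUES
OUTPUT INSERTED.id INTO #TableOfIds (id);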
You did not specify which version of SQL Server you are on. If you happen to be on SQL Server 2012, you can probably replace your SingleIdTable with a sequence: http://msdn.microsoft.com/en-us/library/ff878091.aspx
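For reference, a rough sketch of that sequence-based alternative on SQL Server 2012+ (the sequence name and the #TableOfIds target are illustrative):
CREATE SEQUENCE dbo.IdSequence AS INT START WITH 1 INCREMENT BY 1;
CREATE TABLE #TableOfIds (id INT NOT NULL);
-- one generated id per qualifying row, no single-column helper table needed
INSERT INTO #TableOfIds (id)
SELECT NEXT VALUE FOR dbo.IdSequence
FROM SomeOtherTable
WHERE Attribute IS NULL;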

SQL Server Merge and Indexing Speed

I have a merge statement that needs to compare on many columns. The source table has 26,000 rows. The destination table has several million rows. The destination table only has a typical primary key index on an int-type column.
I did some selects with group by to count the number of unique values in the source.
The test part of the Merge is
Merge Into desttable
Using #temptable
On
(
    desttable.ColumnA = #temptable.ColumnA
    and desttable.ColumnB = #temptable.ColumnB
    and desttable.ColumnC = #temptable.ColumnC
    and desttable.ColumnD = #temptable.ColumnD
    and desttable.ColumnE = #temptable.ColumnE
    and desttable.ColumnF = #temptable.ColumnF
)
When Not Matched Then Insert Values (.......)
-- ColumnA: 167 unique values in #temptable
-- ColumnB: 1 unique values in #temptable
-- ColumnC: 13 unique values in #temptable
-- ColumnD: 89 unique values in #temptable
-- ColumnE: 550 unique values in #temptable
-- ColumnF: 487 unique values in #temptable
-- ColumnA: 3690 unique values in desttable
-- ColumnB: 3 unique values (plus null is possible) in desttable
-- ColumnC: 1113 unique values in desttable
-- ColumnD: 2662 unique values in desttable
-- ColumnE: 1770 unique values in desttable
-- ColumnF: 1480 unique values in desttable
The merge right now takes a very, very long time. I think I need to change my primary key but am not sure what the best tactic might be. 26,000 rows can be inserted on the first merge, but subsequent merges might only have ~2,000 inserts to do. Since I have no indexes and only a simple PK, everything is slow. :)
Can anyone point out how to make this better?
Thanks!
Well, an obvious candidate would be an index on the columns you use to do your matching in the MERGE statement - do you have an index on (ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, ColumnF) on your destination table?
This tuple of columns is being used to determine whether or not a row from your source table already exists in the database. If you don't have that index or any other usable index in place, you get a table scan on the large destination table for each row in your source table, basically.
If not: I would try to add it and then see how the runtime behavior changes. Does the MERGE now run in a little less than a very, very long time?
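For example, a sketch of such a composite index (the name and column order are up to you; putting the most selective columns first usually helps):
CREATE NONCLUSTERED INDEX IX_desttable_MergeKey
    ON desttable (ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, ColumnF);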
My suggestion: if you only need to run it once, then the MERGE statement is acceptable, provided time is not that critical. But if you're going to use the script more often, I think it will be better to do it step by step instead of using the MERGE statement: write your own SELECT, INSERT, UPDATE, and DELETE statements to attain the goal. With this you'll have more control over almost everything (query optimization, indexing, etc.).
In your case, separating the six WHERE criteria might be more efficient than combining them all at once. The downside is that you'll have a longer script.
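A rough sketch of the insert path done step by step (the explicit column list is an assumption, since the original MERGE elides it; NULLs in ColumnB would need extra handling here, just as they do in the MERGE join condition):
-- insert only the source rows that do not already exist in the destination
INSERT INTO desttable (ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, ColumnF)
SELECT t.ColumnA, t.ColumnB, t.ColumnC, t.ColumnD, t.ColumnE, t.ColumnF
FROM #temptable AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM desttable AS d
    WHERE d.ColumnA = t.ColumnA
      AND d.ColumnB = t.ColumnB
      AND d.ColumnC = t.ColumnC
      AND d.ColumnD = t.ColumnD
      AND d.ColumnE = t.ColumnE
      AND d.ColumnF = t.ColumnF
);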

Question about skipping IDs in an identity column in MSSQL

Say I have an MSSQL table with two columns: an int ID column that's the identity column and some other datetime or whatever column. Say the table has 10 records with IDs 1-10. Now I delete the record with ID = 5.
Are there any scenarios where another record will "fill-in" that missing ID? I.e. when would a record be inserted and given an ID of 5?
No, unless you specifically enable identity inserts (typically done when copying tables with identity columns) and insert a row manually with the id of 5. SQL Server keeps track of the last identity inserted into each table with identity columns and increments the last inserted value to obtain the next value on insert.
Only if you manually enable explicit identity values by using the SET IDENTITY_INSERT command and then do an insert with ID = 5.
Otherwise MS SQL will always increment to a higher number, and missing slots are never re-used.
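A minimal sketch of that manual fill-in (dbo.MyTable and SomeDate are hypothetical stand-ins for the table described in the question):
SET IDENTITY_INSERT dbo.MyTable ON;
-- re-use the deleted id explicitly; an explicit column list is required here
INSERT INTO dbo.MyTable (ID, SomeDate) VALUES (5, GETDATE());
SET IDENTITY_INSERT dbo.MyTable OFF;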
One scenario not already mentioned where another record will "fill-in" missing IDENTITY values is when the IDENTITY is reseeded. Example (SQL Server 2008):
CREATE TABLE Test
(
ID INTEGER IDENTITY(1, 1) NOT NULL,
data_col INTEGER NOT NULL
);
INSERT INTO Test (data_col)
VALUES (1), (2), (3), (4);
DELETE
FROM Test
WHERE ID BETWEEN 2 AND 3;
DBCC CHECKIDENT ('Test', RESEED, 1)
INSERT INTO Test (data_col)
VALUES (5), (6), (7), (8);
SELECT T1.ID, T1.data_col
FROM Test AS T1
ORDER BY data_col;
The results are:
ID data_col
1 1
4 4
2 5
3 6
4 7
5 8
This shows that not only are the 'holes' filled in with new auto-generated values, but values that were auto-generated before the reseed are reused and can even duplicate existing IDENTITY values.
