Snowflake - add sequence to an existing column - snowflake-cloud-data-platform

I have to migrate a SQL server database to Snowflake, most of my dimension tables have an identity column as PK, these columns are then referenced across multiple facts tables.
I am planning on copying these tables in snowflake however I need to first insert the existing data (so the identity values stay the same) and then alter my tables to add a sequence to my PK, the sequence will start from the higher value + 1.
I am a bit stuck as it does not seem to be possible to alter an existing column and add a sequence to it, is there any work around or best practice I should be following?
Cheers

You'll want to create a SEQUENCE and then reference that when you create the table with the SEQUENCE as your DEFAULT for your primary key. This is in the documentation here:
https://docs.snowflake.net/manuals/sql-reference/sql/create-sequence.html
https://docs.snowflake.net/manuals/sql-reference/sql/create-table.html#optional-parameters
And then load your data with the data specified for that column, so that it'll populate with the current values. You will want to set your "next value" when you create the SEQUENCE, as you can't alter it later.

Related

SSIS only extract Delta changes

After some advice. I'm using SSIS\SQL Server 2014. I have a nightly SSIS package that pulls in data from non-SQL Server db's into a single table (the SQL table is truncated beforehand each time) and I then extract from this table to create a daily csv file.
Going forward, I only want to extract to csv on a daily basis the records that have changed i.e. the Deltas.
What is the best approach? I was thinking of using CDC in SSIS, but as I'm truncating the SQL table before the initial load each time, will this be best method? Or will I need to have a master table in SQL with an initial load, then import into another table and just extract where there are different? For info, the table in SQL contains a Primary Key.
I just want to double check as CDC assumes the tables are all in SQL Server, whereas my data is coming from outside SQL Server first.
Thanks for any help.
The primary key on that table is your saving grace here. Obviously enough, the SQL Server database that you're pulling the disparate data into won't know from one table flush to the next which records have changed, but if you add two additional tables, and modify the existing table with an additional column, it should be able to figure it out by leveraging HASHBYTES.
For this example, I'll call the new table SentRows, but you can use a more meaningful name in practice. We'll call the new column in the old table HashValue.
Add the column HashValue to your table as a varbinary data type. NOT NULL as well.
Create your SentRows table with columns for all the columns in the main table's primary key, plus the HashValue column.
Create a RowsToSend table that's structurally identical to your main table, including the HashValue.
Modify your queries to create the HashValue by applying HASHBYTES to all of the non-key columns in the table. (This will be horribly tedious. Sorry about that.)
Send out your full data set.
Now move all of the key values and HashValues to the SentRows table. Truncate your main table.
On the next pull, compare the key values and HashValues from SentRows to the new data in the main table.
Primary key match + hash match = Unchanged row
Primary key match + hash mismatch = Updated row
Primary key in incoming data but missing from existing data set = New row
Primary key not in incoming data but in existing data set = Deleted row
Pull out any changes you need to send to the RowsToSend table.
Send the changes from RowsToSend.
Move the key values and HashValues to your SentRows table. Update hashes for changed key values, insert new rows, and decide how you're going to handle deletes, if you have to deal with deletes.
Truncate the SentRows table to get ready for tomorrow.
If you'd like (and you'll thank yourself later if you do) add a computed column to the SentRows table with default of GETDATE(), which will tell you when the row was added.
And away you go. Nothing but deltas from now on.
Edit 2019-10-31:
Step by step (or TL;DR):
1) Flush and Fill MainTable.
2) Compare keys and hashes on MainTable to keys and hashes on SentRows to identify new/changed rows.
3) Move new/changed rows to RowsToSend.
4) Send the rows that are in RowsToSend.
5) Move all the rows from RowsToSend to SentRows.
6) Truncate RowsToSend.

SQL Server re-uses the same IDENTITY Id twice

I hope the question is not too generic.
I have a table Person that has a PK Identity column Id.
Via C#, I insert new entries for Person and the Id get set to 1,2,3 for the 3 persons added.
Also via C#, I perform all deletions of the persons with Id=1,2,3 so that there's no Person in the Table anymore.
Afterwards, I run some change scripts (I can't post them as they are too long) also on Table Person.
I don't do any RESEED.
Now the fun:
If I call SELECT IDENT_CURRENT('Person') it shows 3 instead of 4.
If I do an insert of Person again, I get a Person with the Id 3 added instead of Id 4.
Any idea why and how this can happen?
EDIT
I think I found the explanation of my question:
While performing DB Changes via SQL Server Management Studio, The Designer creates
a temp table Tmp_Person and moves the data from Person inside there. Afterwards he performs a rename of Tmp_Person to Person. Since this is a new table the Index starts again from the beginning.
An IDENTITY property doesn't guarentee uniqueness. That's what a PRIMARY KEY or UNIQUE INDEX is for. This is covered in the documentation in the remarks section, along with other intended behaviour. CREATE TABLE (Transact-SQL) IDENTITY (Property) - Remarks:
The identity property on a column does not guarantee the following:
Uniqueness of the value - Uniqueness must be enforced by using a PRIMARY KEY or UNIQUE constraint or UNIQUE index.
Consecutive values within a transaction - A transaction inserting multiple rows is not guaranteed to get consecutive values for the rows
because other concurrent inserts might occur on the table. If values
must be consecutive then the transaction should use an exclusive lock
on the table or use the SERIALIZABLE isolation level.
Consecutive values after server restart or other failures -SQL Server might cache identity values for performance reasons and some of
the assigned values can be lost during a database failure or server
restart. This can result in gaps in the identity value upon insert. If
gaps are not acceptable then the application should use its own
mechanism to generate key values. Using a sequence generator with the
NOCACHE option can limit the gaps to transactions that are never
committed.
Reuse of values - For a given identity property with specific seed/increment, the identity values are not reused by the engine. If a
particular insert statement fails or if the insert statement is rolled
back then the consumed identity values are lost and will not be
generated again. This can result in gaps when the subsequent identity
values are generated.
These restrictions are part of the design in order to improve
performance, and because they are acceptable in many common
situations. If you cannot use identity values because of these
restrictions, create a separate table holding a current value and
manage access to the table and number assignment with your
application.
Emphasis mine for this question.

Changing columns to identity (SQL Server)

My company has an application with a bunch of database tables that used to use a sequence table to determine the next value to use. Recently, we switched this to using an identity property. The problem is that in order to upgrade a client to the latest version of the software, we have to change about 150 tables to identity. To do this manually, you can right click on a table, choose design, change (Is Identity) to "Yes" and then save the table. From what I understand, in the background, SQL Server exports this to a temporary table, drops the table and then copies everything back into the new table. Clients may have their own unique indexes and possibly other things specific to the client, so making a generic script isn't really an option.
It would be really awesome if there was a stored procedure for scripting this task rather than doing it in the GUI (which takes FOREVER). We made a macro that can go through and do this, but even then, it takes a long time to run and is error prone. Something like: exec sp_change_to_identity 'table_name', 'column name'
Does something like this exist? If not, how would you handle this situation?
Update: This is SQL Server 2008 R2.
This is what SSMS seems to do:
Obtain and Drop all the foreign keys pointing to the original table.
Obtain the Indexes, Triggers, Foreign Keys and Statistics of the original table.
Create a temp_table with the same schema as the original table, with the Identity field.
Insert into temp_table all the rows from the original table (Identity_Insert On).
Drop the original table (this will drop its indexes, triggers, foreign keys and statistics)
Rename temp_table to the original table name
Recreate the foreign keys obtained in (1)
Recreate the objects obtained in (2)

Making primary key and identity column after data has been loaded

I have quick question for you SQL gurus. I have existing tables without primary key column and Identity is not set. Now I am trying to modify those tables by making existing integer column as primary key and adding identity values for that column. My question is should I first copy all the records from the table to a temp table before making those changes . Do I loose all the previous records if I ran the T-SQL commnad to make primary key and add identity column on those tables. What are the approaches should I take such as
1) Create temp table to copy all the records from the table to be modified
2) Load all the records to the temptable
3) Make changes on the table schema
4) Finally load the records from the temp table to the original table.
Or
there are better ways that this? I really appreciate your help
Thanks
Tools>Options>Designers>Table and Database Designers
Uncheck "Prevent saving changes that require table re-creation"
[Edit] I've tried this with populated tables and I didn't lose data, but I don't really know much about this.
Hopefully you don't have too many records in the table. What happens if you use Management studio to change an existing field to identity is that it creates another table with the identity field set. it turns identity insert on and inserets the records from the original table, then turns identity insert off. Then it drops the old table and renames the table it just created. This can be quite a lengthy process if you have many records. If so I would script this out and then do it in a job that runs during the off hours because the table will be completely locked while you do this.
just do all of your changes in management studio, copy/paste the generated script into a file. DON'T SAVE CHANGES at this point. Look over and edit that script as necessary, it will probably do almost exactly what you are thinking (it will drop the original table and rename the temp one to the original's name), but handle all constraints and FKs as well.
If your existing integer column is unique and suitable, there should be no problem converting it to a PK.
Another alternative, if you don't want to use the existing column, you can add a new PK columns to the main table, populate it and seed it, then run update statements to update all other tables with new PK.
Whatever way you do it, make sure you do a back-up first!!
You can always add the IDENTITY column after you have finished copying your data around. You can also then reset the IDENTITY seed to the max integer + 1. That should solve your problems.
DBCC CHECKIDENT ('MyTable', RESEED, n)
Where n is the number you want the identity to start at.

How to add a constant column when replicating a database?

I am using SQL Server 2000 and I have two databases that both replicate (transactional push subscription) to a single database. I need to know which database the records came from.
So I want to add a fixed column specified in the publication to my table so I can tell which database the row originated from.
How do I go about doing this?
I would like to avoid altering the main databases mostly due to the fact there are many tables I would need to do this to. I was hoping for some built in feature of replication that would do this for me some where. Other than that I would go with the view idea.
You could use a calculated column Use the following on the two databases:
ALTER TABLE TableName ADD
MyColumn AS 'Server1'
Then just define the single "master" database to use a VARCHAR column (or whatever you want) that you fill using the calculated columns value.
You can create a view, which adds the "constant" column, and use it as a replication source.
So the solution for me was to set up the replication publications to allow transformations and create a DTS package for each site that appends the siteid into the tables to keep the ids unique as I can't use guids.

Resources