Snowflake alter column to have default value

I have a table in Snowflake. I want to alter one column so that it can have a default value.
Following is the structure:
I want to set the default value for the LAST_UPDATED column.
I am running this query:
alter table "TEST_STATUS" modify LAST_UPDATED set default CURRENT_TIMESTAMP() ;
I am getting this error:
Unsupported feature 'Alter Column Set Default'.
How do I alter the table?

You cannot use ALTER TABLE to change the default for an existing column unless the new default is a sequence; otherwise a default can only be set when the column is added or the table is created.
Check the Default Values section here.
You need to recreate your table.
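For example, a minimal sketch of that recreate-and-copy, assuming TEST_STATUS has ID and STATUS columns alongside LAST_UPDATED (adjust to your real structure):
-- Recreate the table with the default, copy the data across, then swap the names.
CREATE OR REPLACE TABLE "TEST_STATUS_NEW" (
    ID            NUMBER,
    STATUS        VARCHAR,
    LAST_UPDATED  TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);

INSERT INTO "TEST_STATUS_NEW" (ID, STATUS, LAST_UPDATED)
    SELECT ID, STATUS, LAST_UPDATED FROM "TEST_STATUS";

ALTER TABLE "TEST_STATUS" RENAME TO "TEST_STATUS_OLD";
ALTER TABLE "TEST_STATUS_NEW" RENAME TO "TEST_STATUS";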

A default value on a column has two behaviors:
when the value is missing, use a constant (e.g. 42), which could in principle be implemented as a read operation;
when inserting, use a more complex expression to set a value, like a sequence or CURRENT_DATE(); this can only be done on write.
Some databases handle the latter form by just "rewriting" the data then and there, but Snowflake takes a no-free-lunch / no-hidden-costs approach: if you want your table rewritten (to push on a complex new value like the second case), you rewrite your table yourself. When you have a simple table with 10 rows, being made to jump through hoops like this can seem absurd. But when you have tables with terabytes of data, rewriting all of it takes a lot of compute time and, more importantly, is definitely not atomic, so it needs to be done intentionally as part of a structured data migration process.
Like nearly everything about Snowflake, it's designed for heavy lifting, so big tasks should be planned tasks: a large rewrite might call for a bigger warehouse and pausing ingest processes.
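For completeness, the one kind of default that ALTER TABLE can set on an existing column in Snowflake is a sequence. A sketch, with a hypothetical sequence and a hypothetical ID column:
CREATE SEQUENCE TEST_STATUS_SEQ;
ALTER TABLE "TEST_STATUS" MODIFY COLUMN ID SET DEFAULT TEST_STATUS_SEQ.NEXTVAL;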

Related

Stored procedure to update different columns

I have an API that I'm trying to read that gives me just the updated field. I'm trying to take that and update my tables using a stored procedure. So far the only way I have been able to figure out how to do this is with dynamic SQL, but I would prefer not to do that if there is a way to avoid it.
If it were just a couple of columns, I'd just write a proc for each, but we are talking about 100 fields and any of them could be updated together. One ticket might just need a timestamp updated, the next might need a timestamp and who modified it, while the one after that might just need a note.
Everything I've read and have been taught tells me that dynamic SQL is bad, and while I'll write it if I have to, I'd prefer to have a proc.
You can perhaps do something like this:
-- Update OldTable from NewTable wherever the keys match and the values differ.
IF EXISTS (SELECT * FROM NewTable n
           WHERE NOT EXISTS (SELECT * FROM OldTable o
                             WHERE o.PrimaryKey = n.PrimaryKey
                               AND o.OldRecords = n.NewRecords))
BEGIN
    UPDATE o
    SET    o.OldRecords = n.NewRecords
    FROM   OldTable o
    JOIN   NewTable n ON n.PrimaryKey = o.PrimaryKey;
END
The best way to solve your problem is using MERGE:
Performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.
As you can see, the update could be more complex, but more efficient as well. Using MERGE requires some proficiency, but once you start to use it you'll use it with pleasure again and again.
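For example, a hedged sketch with hypothetical Tickets and TicketUpdates tables, using COALESCE so that columns the API did not supply keep their current values:
MERGE dbo.Tickets AS t
USING dbo.TicketUpdates AS s
    ON t.TicketId = s.TicketId
WHEN MATCHED THEN
    UPDATE SET t.ModifiedDate = COALESCE(s.ModifiedDate, t.ModifiedDate),
               t.ModifiedBy   = COALESCE(s.ModifiedBy, t.ModifiedBy),
               t.Note         = COALESCE(s.Note, t.Note)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TicketId, ModifiedDate, ModifiedBy, Note)
    VALUES (s.TicketId, s.ModifiedDate, s.ModifiedBy, s.Note);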
I am not sure how your business logic works that determines what columns are updated at what time. If there are separate business functions that require updating different but consistent columns per function, you will probably want to have individual update statements for each function. This will ensure that each process updates only the columns that it needs to update.
On the other hand, if your API is such that you really don't know ahead of time what needs to be updated, then building a dynamic SQL query is a good idea.
Another option is to build a save proc that sets every user-configurable field. As long as the calling process has all of that data, it can call the save procedure and pass every updateable column. There is no harm in an UPDATE MyTable SET MyCol = @MyCol with the same value on each side.
Note that even if all of the values are the same, the rowversion (or timestamp) columns, if present, will still be updated.
With our software, the tables that users can edit have a widely varying range of columns. We chose to create a single save procedure for each table that has all of the update-able columns as parameters. The calling processes (our web servers) have all the required columns in memory. They pass all of the columns on every call. This performs fine for our purposes.
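A minimal sketch of that kind of save procedure, with hypothetical column names (a real one would list all ~100 updateable columns as parameters):
CREATE PROCEDURE dbo.SaveTicket
    @TicketId     INT,
    @ModifiedDate DATETIME,
    @ModifiedBy   NVARCHAR(100),
    @Note         NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    -- Every updateable column is set on every call, even if unchanged.
    UPDATE dbo.Tickets
    SET    ModifiedDate = @ModifiedDate,
           ModifiedBy   = @ModifiedBy,
           Note         = @Note
    WHERE  TicketId     = @TicketId;
END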

Quickly dropping and re-creating multiple indexes, views, statistics when altering a column

I have a column "StoreNumber" in my "Project" table which I want to change to be "NOT NULL". I recently sanitized all of the old data so that there are no null entries. However, when I execute the following statement it fails due to multiple dependencies to various views, indexes, and statistics
ALTER TABLE [Project]
ALTER COLUMN StoreNumber VARCHAR(9) NOT NULL
GO
What is the fastest way to drop all of these views/indexes/statistics then run the alter statement, and then recreate all of the views/indexes/statistics again? I know that I could copy out all of the drop and create statements one by one but I would prefer to generate the script in a single query.
On a side note, why does SQL Server care if I'm making the column more restrictive? The data does not contain nulls and I'm not altering the data type or the size of the columns. How would this type of change ever break a dependent view, index, or statistics? I'm sure there is sound reasoning that I'm not seeing but I would like an example.
Just thinking: will it work if you set a default value first? (I didn't check the syntax myself.)
ALTER TABLE Project
ADD CONSTRAINT col_sn_def
DEFAULT '' FOR StoreNumber;
GO
The following will drop multiple indexes. Note that the final entry does not include a trailing comma.
DROP INDEX [index1_1] ON [schema].[table1],
[index1_2] ON [schema].[table1],
[index2_1] ON [schema].[table2],
[index3_1] ON [schema].[table3],
...n,
[lastIndexToDrop] ON [schema].[tableName]
Drop View looks like this. Note the semicolon to terminate the statement.
DROP VIEW [schema].[view1], [schema].[view2];
I am only concerned with Indexes in my application at this time. To quickly recreate the indexes, I am reading a .sql file into code and executing it in an ExecuteNonQuery call. If I had views to consider, I would follow the same method of reading from a file into a command to execute with ExecuteNonQuery.
https://msdn.microsoft.com/en-us/library/ms173492.aspx
https://msdn.microsoft.com/en-us/library/ms176118.aspx
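If you would rather generate the drop statements in a single query instead of copying them out one by one, a sketch along these lines (using the SQL Server catalog views; review the output before running it) will produce a DROP INDEX for every index that touches StoreNumber:
SELECT 'DROP INDEX ' + QUOTENAME(i.name) + ' ON '
     + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ';'
FROM sys.indexes i
JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
JOIN sys.tables t ON t.object_id = i.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE c.name = 'StoreNumber'
  AND i.is_primary_key = 0
  AND i.type > 0;   -- skip heaps, which have no index name to drop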

Schema change table rebuild

I'm working on a script to keep table schemas synchronized.
Is there an exhaustive list of actions done to a table schema in MS SQL that require the table to be dropped and recreated and the data to be reinserted?
You may be better off standardizing on the CREATE-COPY-DROP-RENAME (CCDR) strategy and only attempting an in-place alter in the few scenarios where your DDL will not require a rebuild rather than trying to compile the exhaustive list. This is the strategy described here: link.
AFAIK, you are only permitted to add columns to an existing table (without rebuilding) if the column is:
added to the end of the table AND
is nullable or has a default constraint
In all other cases, MSSQL can fail, either because it does not know what value to use for the newly added column in existing rows, or because data loss would result (truncation, for example). Even defaulted columns added in the middle will force a rebuild.
To further complicate things, in some cases the success of your deploy will depend on the type of data in the table, and not simply the schema involved. For example, altering a column length to a greater value (varchar(50) --> varchar(100)) will likely succeed; however, decreasing the length is only sometimes permitted. Migrating data type changes is another tricky mess.
In short, I would always rebuild and rarely alter in place.
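A rough sketch of that rebuild (CCDR) pattern, with a hypothetical dbo.MyTable; in practice you would also recreate indexes, constraints, and permissions on the new table before swapping the names:
CREATE TABLE dbo.MyTable_New (
    Id   INT          NOT NULL,
    Name VARCHAR(100) NOT NULL   -- the new definition you could not ALTER in place
);

INSERT INTO dbo.MyTable_New (Id, Name)
SELECT Id, Name
FROM   dbo.MyTable;

DROP TABLE dbo.MyTable;
EXEC sp_rename 'dbo.MyTable_New', 'MyTable';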
--
To illustrate in-row data affecting outcome:
create table dbo.Yak(s varchar(100));
insert into dbo.Yak
values(replicate('a', 100));
go
-- attempt to alter datatype to 50 (FAIL: String or binary data would be truncated.)
alter table dbo.Yak
alter column s varchar(50);
go
-- shorten the data in row to avoid data loss
delete from dbo.Yak;
insert into dbo.Yak
values(replicate('a', 50));
go
-- again, attempt to alter datatype to 50 (SUCCESS)
alter table dbo.Yak
alter column s varchar(50);
go
select len(s),* from dbo.Yak;
go
--cleanup
drop table dbo.Yak;
go
In Management Studio, select the table you want to change and right-click Design. Change the datatype of a column in the table design window (tested with int to money).
Instead of saving, right-click in the window and select "Generate Change Script". Copy the SQL statements from the dialog.
*) In previous versions (SQL2000), any changes would recreate the whole table (as far as I remember). It seems that renaming and adding columns have been optimized to ALTER TABLE statements.
I've gotten pretty spoiled by using Visual Studio Database Projects to manage this sort of thing. Once my schema is imported into a project, I can make whatever change I want, and the VSDP will figure out whether the change can be done w/o dropping objects (with an ALTER, for example), or whether it needs to create a new object and copy values over from the old one (which it does automatically).
Plan on a little work to understand how you'll fit this into your specific environment and workflow, but I've found the effort to be very worthwhile.

LINQ Inserts without IDENTITY column

I'm using LINQ, but my database tables do not have an IDENTITY column (although they are using a surrogate Primary Key ID column)
Can this work?
To get the identity values for a table, there is a stored procedure called GetIDValueForOrangeTable(), which looks at a SystemValues table and increments the ID therein.
Is there any way I can get LINQ to get the ID value from this SystemValues table on an insert, rather than the built in IDENTITY?
As an aside, I don't think this is a very good idea, especially not for a web application. I imagine there will be a lot of concurrency conflicts because of this SystemValues lookup. Am I justified in my concern?
Cheers
Duncan
Sure you can make this work with LINQ, and safely, too:
wrap the access to the underlying SystemValues table in the "GetIDValue.....()" function in a TRANSACTION (and not with the READUNCOMMITTED isolation level!), then one and only one user can access that table at any given time and you should be able to safely distribute ID's
call that stored proc from LINQ just before saving your entity and store the ID if you're dealing with a new entity (if the ID hasn't been set yet)
store your entity in the database
That should work - not sure if it's any faster and any more efficient than letting the database handle the work - but it should work - and safely.
Marc
UPDATE:
Something like this (adapt to your needs) will work safely:
CREATE PROCEDURE dbo.GetNextTableID(@TableID INT OUTPUT)
AS BEGIN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
UPDATE SystemTables
SET MaxTableID = MaxTableID + 1
WHERE ........
SELECT
@TableID = MaxTableID
FROM
dbo.SystemTables
COMMIT TRANSACTION
END
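A usage sketch (the OrangeTable columns are placeholders): fetch the ID first, then use it in the insert:
DECLARE @NewOrangeID INT;
EXEC dbo.GetNextTableID @TableID = @NewOrangeID OUTPUT;

INSERT INTO dbo.OrangeTable (ID, Name)
VALUES (@NewOrangeID, 'example row');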
As for performance - as long as you have a reasonable number (less than 50 maybe) of concurrent users, and as long as this SystemTables table isn't used for much else, it should perform OK.
You are very justified in your concern. If two users try to insert at the same time, both might be given the same number unless you do as marc_s described and put the whole thing in a transaction. However, if the transaction doesn't wrap around your whole insert as well as the table that contains the ID values, you may still end up with gaps if the outer insert fails (it got a value but then, for some other reason, didn't insert a record). Since most people do this to avoid gaps (in most cases an unnecessary requirement), it makes life more complicated and still may not achieve the result. Using an identity field is almost always a better choice.

What is a maintainable way to store large text fields without sacrificing performance?

I have been dancing around this issue for a while but it keeps coming up. We have a system where many of our tables start with a description that is originally stored as NVARCHAR(150), and then we get a ticket asking to expand the field to 250, then 1000, etc.
This cycle is repeated on every "note" and/or "description" field we add to most tables. Of course the concern for me is performance and breaking the 8k limit of the page. However, my other concern is making the system less maintainable by breaking these fields out of EVERY table in the system into a lazy-loaded reference.
So here I am, faced with the same 2 options that have been staring me in the face (others are welcome); please lend me your opinions.
Change all notes and/or descriptions to NVARCHAR(MAX) and make sure we exclude these fields from all listings. Basically, never do a SELECT * FROM [TableName] unless it is only retrieving one record.
Remove all notes and/or description fields and replace them with a foreign key reference to a [Notes] table.
CREATE TABLE [dbo].[Notes] (
    [NoteId] [int] NOT NULL,
    [NoteText] [NVARCHAR](MAX) NOT NULL )
Obviously I would prefer option 1, because option 2 would change so much in our system. However, if option 2 is really the only good way to proceed, then at least I can say these changes are necessary and I have done the homework.
UPDATE:
I ran several tests on a sample database with 100,000 records in it. What I find is that, because of clustered index scans, the IO required for option 1 is roughly twice that of option 2. If I select a large number of records (1000 or more), option 1 is twice as slow even if I do not include the large text field in the select. As I request fewer rows, the lines blur more. In a web app where page sizes of 50 or so are the norm, option 1 will work, but I will be converting all instances to option 2 in the (very) near future for scalability.
Option 2 is better for several reasons:
When querying your tables, the large text fields fill up pages quickly, forcing the database to scan more pages to retrieve data. This is especially taxing when you don't actually need to return the text data.
As you mentioned, it gives you a clean break to change the data type in one swoop. Microsoft has deprecated TEXT in SQL Server 2008, so you should stick with VARCHAR/VARBINARY.
Separate filegroups. Having all your text data in a slower, cheaper storage location might be something you decide to pursue in the future. If not, no harm, no foul.
While Option 1 is easier for now, Option 2 will give you more flexibility in the long-term. My suggestion would be to implement a simple proof-of-concept with the "notes" information separated from the main table and perform some of your queries on both examples. Compare the execution plans, client statistics and logical I/O reads (SET STATISTICS IO ON) for some of your queries against these tables.
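For instance, a minimal sketch of that comparison, with hypothetical wide and split table names:
SET STATISTICS IO ON;

SELECT TOP (1000) ProjectId, Name   -- option 1: wide table, text column excluded
FROM   dbo.Project_Wide;

SELECT TOP (1000) ProjectId, Name   -- option 2: narrow table, notes split out
FROM   dbo.Project_Narrow;

SET STATISTICS IO OFF;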
A quick note to those suggesting the use of TEXT/NTEXT, from MSDN:
This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use varchar(max), nvarchar(max) and varbinary(max) data types instead. For more information, see Using Large-Value Data Types.
I'd go with Option 2.
You can create a view that joins the two tables to make the transition easier on everyone, and then go through a clean-up process that removes the view and uses the single table wherever possible.
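A minimal sketch of such a transitional view, with hypothetical table and column names (assuming the main table carries a NoteId foreign key):
CREATE VIEW dbo.ItemWithNotes
AS
SELECT i.ItemId,
       i.Title,
       n.NoteText
FROM   dbo.Item i
LEFT JOIN dbo.Notes n ON n.NoteId = i.NoteId;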
You want to use a TEXT field. TEXT fields aren't stored directly in the row; instead, it stores a pointer to the text data. This is transparent to queries, though - if you ask for a TEXT field, it will return the actual text, not the pointer.
Essentially, using a TEXT field is somewhat between your two solutions. It keeps your table rows much smaller than using a varchar, but you'll still want to avoid asking for them in your queries if possible.
The TEXT/NTEXT data type has practically unlimited length while taking up next to nothing in your record.
It comes with a few strings attached, like special behavior with string functions, but for a secondary "notes/description" type of field these may be less of a problem.
Just to expand on Option 2
You could (a sketch follows this list):
Rename existing MyTable to MyTable_V2
Move the Notes column into a joined Notes table (with 1:1 joining ID)
Create a VIEW called MyTable that joins MyTable_V2 and Notes tables
Create an INSTEAD OF trigger on MyTable view which saves the Notes column into the Notes table (IF NULL then delete any existing Notes row, if NOT NULL then Insert if not found, otherwise Update). Perform appropriate action on MyTable_V2 table
Note: We've had trouble doing this where there is a Computed column in MyTable_V2 (I think that was the problem, either way we've hit snags when doing this with "unusual" tables)
All new Insert/Update/Delete code should be written to operate directly on MyTable_V2 and Notes tables
Optionally: Have the INSTEAD OF trigger on MyTable log the fact that it was called (it can do this minimally, e.g. UPDATE a pre-existing log table row with GetDate() only if the existing row's date is > 24 hours old, so it will only do an update once a day).
When you are no longer getting any log records you can drop the INSTEAD OF trigger on MyTable view and you are now fully MyTable_V2 compliant!
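A compressed sketch of those steps, with hypothetical columns and only the UPDATE path of the trigger shown:
CREATE VIEW dbo.MyTable
AS
SELECT t.Id, t.Title, n.NoteText AS Notes
FROM   dbo.MyTable_V2 t
LEFT JOIN dbo.Notes n ON n.NoteId = t.Id;
GO

CREATE TRIGGER dbo.MyTable_InsteadOfUpdate
ON dbo.MyTable
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Write the non-note columns to the real table.
    UPDATE t
    SET    t.Title = i.Title
    FROM   dbo.MyTable_V2 t
    JOIN   inserted i ON i.Id = t.Id;

    -- Upsert the note text into the Notes table.
    MERGE dbo.Notes AS n
    USING (SELECT Id, Notes FROM inserted WHERE Notes IS NOT NULL) AS i
        ON n.NoteId = i.Id
    WHEN MATCHED THEN UPDATE SET n.NoteText = i.Notes
    WHEN NOT MATCHED THEN INSERT (NoteId, NoteText) VALUES (i.Id, i.Notes);

    -- Remove notes that were set to NULL.
    DELETE n
    FROM   dbo.Notes n
    JOIN   inserted i ON i.Id = n.NoteId
    WHERE  i.Notes IS NULL;
END
GO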
Huge amount of hassle to implement, as you surmised.
Alternatively, trawl the code for all references to MyTable and change them to MyTable_V2, put a VIEW in place of MyTable for SELECT only, and don't create the INSTEAD OF trigger.
My plan would be to fix all Insert/Update/Delete statements referencing the now-deprecated MyTable. For me this would be made somewhat easier because we use unique names for all tables and columns in the database, and we use the same names in all application code, so my confidence that a simple FIND had located every instance would be high.
P.S. Option 2 is also preferable if you have any SELECT * lying around. We have had clients whose application performance went downhill fast when they added large Text/Blob columns to existing tables - because of "lazy" SELECT * statements. Hopefully that isn't the case in your shop though!
