Behavior of DEFAULT option in Snowflake Tables - snowflake-cloud-data-platform

I've created a table and added default values for some columns.
E.g.
Create table Table1( COL1 NUMBER(38,0),
COL2 STRING,
MODIFIED_DT STRING DEFAULT CURRENT_DATE(),
IS_USER_MODIFIED BOOLEAN DEFAULT 'FALSE' )
Current Behavior:
During data load, I see that when running inserts, my column 'MODIFIED_DT' is getting inserted with default values.
However, if there are any subsequent updates, the default value is not getting updated.
Expected Behavior:
My requirement is that the column value should be set automatically by ANY INSERT/UPDATE operation.
E.g. In SQL Server, if I add a Default, the column value will always be inserted/updated with the default values whenever a DML operation takes place on the record
Is there a way to make it work, or does the default value apply only to inserts?
Is there a way to add logic to the DEFAULT values?
E.g. In the above table's example, for the column IS_USER_MODIFIED, can I do:
Case when CURRENT_USER() = 'Admin_Login' then 'FALSE' Else 'TRUE' end
If not, is there another option in snowflake to implement such functionality?

The following is generic to most (all?) databases and is not specific to Snowflake...
Default values on columns in table definitions only get inserted when there is no explicit reference to that column in an INSERT statement. So if I have a table with 2 columns (column_a and column_b and with a default value for column_b) and I execute this type of INSERT:
INSERT INTO [dbo].[doc_exz]
([column_a])
VALUES
(3),
(2);
column_b will be set to the default value. However, with this INSERT statement:
INSERT INTO [dbo].[doc_exz]
([column_a]
,[column_b])
VALUES
(5,1),
(6,NULL);
column_b will have values of 1 and null. Because I have explicitly referenced column_b the value I use, even if it is NULL, will be written to the record even though that column definition has a default value.
Default values only work with INSERT statements, not UPDATE statements: an existing record must already have a "value" in the column, even if it is NULL, so when you UPDATE it the default doesn't apply. For that reason I don't believe your statement about defaults working with updates on SQL Server is correct. I've just tried it, to be sure, and it doesn't.
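To make the rule above concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database (an assumption; the thread itself is about SQL Server and Snowflake). The default value 50 is hypothetical. It shows the default being applied only when the column is omitted from the INSERT, and not being applied by a later UPDATE.

```python
import sqlite3

# SQLite is only a stand-in here; the default value 50 is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_exz (column_a INTEGER, column_b INTEGER DEFAULT 50)")

# column_b is omitted, so the default is applied.
conn.execute("INSERT INTO doc_exz (column_a) VALUES (3)")

# column_b is listed explicitly, so the supplied values win, including NULL.
conn.execute("INSERT INTO doc_exz (column_a, column_b) VALUES (5, 1), (6, NULL)")

# An UPDATE that does not mention column_b leaves the stored NULL alone;
# the default is not re-applied.
conn.execute("UPDATE doc_exz SET column_a = 60 WHERE column_a = 6")

rows = conn.execute(
    "SELECT column_a, column_b FROM doc_exz ORDER BY column_a"
).fetchall()
print(rows)
```

The row that was inserted with an explicit NULL keeps that NULL after the UPDATE, which is the behaviour described above.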
Snowflake-specific Answer
Given that column defaults only work with INSERT statements, they are not going to be a solution to your problem. The only straightforward solution I can think of is to explicitly include these columns in your INSERT/UPDATE statements.
You could write a stored procedure to do the INSERT/UPDATES, and automatically populate these columns, but that would perform poorly for bulk changes and probably wouldn't be simple to use as you'd need to pass in the table name, the list of columns and the list of values.
Obviously, if you are inserting/updating these records using an external tool you'd put this logic in the tool rather than trying to implement it in Snowflake.
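As a rough sketch of putting this logic in the client rather than in the table definition, the helper below stamps the audit columns in the same UPDATE that changes the data. It uses Python's sqlite3 module as a stand-in for the real connection, and the function name update_col2 is hypothetical. The table layout and the 'Admin_Login' name come from the question.

```python
import sqlite3
from datetime import date

ADMIN_LOGIN = "Admin_Login"  # login name taken from the question


def update_col2(conn, col1, col2, user):
    """Update COL2 and stamp the audit columns in the same statement."""
    conn.execute(
        "UPDATE table1 SET col2 = ?, modified_dt = ?, is_user_modified = ? "
        "WHERE col1 = ?",
        (col2, date.today().isoformat(), user != ADMIN_LOGIN, col1),
    )


# Stand-in table mirroring the one in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (col1 INTEGER, col2 TEXT, "
    "modified_dt TEXT, is_user_modified BOOLEAN DEFAULT 0)"
)
conn.execute(
    "INSERT INTO table1 (col1, col2, modified_dt) VALUES (1, 'a', '2000-01-01')"
)

update_col2(conn, 1, "b", user="some_user")
row = conn.execute(
    "SELECT col2, modified_dt, is_user_modified FROM table1"
).fetchone()
print(row)
```

Every code path that modifies the table has to go through such a helper for the audit columns to stay trustworthy, which is the main weakness of this approach.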

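Snowflake has a "DERIVED COLUMN" feature. These columns are VIRTUAL/COMPUTED and are not populated by the ETL process; their values come from the expression in the column definition. Note that because the columns are virtual, the expression is evaluated when the row is queried rather than stored when the row is written, so check that this matches what you need for audit-style columns.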
Nice thing is, we can even write CASE logic in the column definition. This solved my problem.
CREATE OR REPLACE TABLE DB_NAME.DBO.TEST_TABLE
(
FILE_ID NUMBER(38,0),
MANUAL_OVERRIDE_FLG INT as (case when current_user() = 'some_admin_login' then 0 else 1 end),
RECORD_MODIFIED_DT DATE as (CURRENT_DATE()),
RECORD_MODIFIED_BY STRING as (current_user())
);

Related

Using `BEFORE INSERT` trigger to change the datatype of incoming data to match the column datatype in PostgreSQL

I have a postgres table, with a column C which has type T. People will be using COPY to insert data into this table. Sometimes they try to insert a value for C that isn't of type T, but I have a postgres function which can convert the value to T.
I'm trying to write a BEFORE INSERT trigger on the table which will call this function on the data so that I can ensure that I get no insert type errors. However it doesn't appear to work, I'm getting errors when trying to insert the data, even with the trigger there.
Before I spend too much time investigating, I want to find out if this is possible. Can I use triggers in this way to change the type of incoming data?
I want this to run on postgresql 9.3, but I have noticed the error and non-functioning trigger on postgres 9.5.
As Patrick stated you have to specify a permissive target so that Postgres validation doesn't reject the data before you get a chance to manipulate it.
Another way without a second table, is to create a view on your base table that casts everything to varchar, and then have an INSTEAD OF trigger that populates the base table whenever an insert is tried on the view.
For example, the table tab1 below has an integer column. The view v_tab1 has a varchar instead so any insert will work for the view. The instead of trigger then checks to see if the entered value is numeric and if not uses a 0 instead.
create table tab1 (i1 int, v1 varchar);
create view v_tab1 as select cast(i1 as varchar) i1, v1 from tab1;
create or replace function v_tab1_insert_trgfun() returns trigger as
$$
declare
safe_i1 int;
begin
if new.i1 ~ '^([0-9]+)$' then
safe_i1 = new.i1::int;
else
safe_i1 = 0;
end if;
insert into tab1 (i1, v1) values (safe_i1, new.v1);
return new;
end;
$$
language plpgsql;
create trigger v_tab1_insert_trigger instead of insert on v_tab1 for each row execute procedure v_tab1_insert_trgfun();
Now the inserts will work regardless of the value
insert into v_tab1 values ('12','hello');
insert into v_tab1 values ('banana','world');
select * from tab1;
Giving
|i1 |v1 |
+-----+-----+
|12 |hello|
|0 |world|
Fiddle at: http://sqlfiddle.com/#!15/9af5ab/1
No, you cannot use this approach. The reason is that the backend already populates a record with the values that are to be inserted into the table. That is in the form of the NEW parameter that is available in the trigger. So the error is thrown even before the trigger fires.
The same applies to rules, incidentally, so Kevin's suggestion in his comment won't work.
Probably your best solution is to create a staging table with "permissive" column data types (such as text) and then put a BEFORE INSERT trigger on that table that casts all column values to their correct type before inserting them in the final table. If that second insertion is successful you can even RETURN NULL from the trigger so the row won't go into the staging table (not sure, though, what COPY thinks about that...). Those records that do end up in the staging table have some weird data in them and you can then deal with those rows manually.
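The staging-table idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module instead of PostgreSQL, it moves the rows in a batch step rather than in a BEFORE INSERT trigger, and the table name tab1_staging is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (i1 INTEGER, v1 TEXT);          -- final, strictly typed table
CREATE TABLE tab1_staging (i1 TEXT, v1 TEXT);     -- permissive staging table
""")

# Bulk load lands in the staging table, where every value is accepted as text.
conn.executemany(
    "INSERT INTO tab1_staging (i1, v1) VALUES (?, ?)",
    [("12", "hello"), ("banana", "world")],
)

# A value counts as clean when it is non-empty and contains only digits.
is_clean = "i1 <> '' AND i1 NOT GLOB '*[^0-9]*'"

# Move the clean rows into the final table, casting on the way.
conn.execute(
    "INSERT INTO tab1 (i1, v1) "
    f"SELECT CAST(i1 AS INTEGER), v1 FROM tab1_staging WHERE {is_clean}"
)
conn.execute(f"DELETE FROM tab1_staging WHERE {is_clean}")

loaded = conn.execute("SELECT i1, v1 FROM tab1").fetchall()
leftover = conn.execute("SELECT i1, v1 FROM tab1_staging").fetchall()
print(loaded, leftover)
```

Rows that fail the check stay behind in the staging table, where they can be inspected and fixed by hand.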

Convert Date Stored as VARCHAR into INT to compare to Date Stored as INT

I'm using SQL Server 2014. My request I believe is rather simple. I have one table containing a field holding a date value that is stored as VARCHAR, and another table containing a field holding a date value that is stored as INT.
The date value in the VARCHAR field is stored like this: 2015M01
The date value in the INT field is stored like this: 201501
I need to compare these tables against each other using EXCEPT. My thought process was to somehow extract or TRIM the "M" out of the VARCHAR value and see if it would let me compare the two. If anyone has a better idea such as using CAST to change the date formats or something feel free to suggest that as well.
I am also concerned that even extracting the "M" out of the VARCHAR may still prevent the comparison since one will still remain VARCHAR and the other is INT. If possible through a T-SQL query to convert on the fly that would be great advice as well. :)
REPLACE the string and then CONVERT to integer
SELECT A.*, B.*
FROM TableA A
INNER JOIN
(SELECT intField
FROM TableB
) as B
ON CONVERT(INT, REPLACE(A.varcharField, 'M', '')) = B.intField
Since you say you already have the query and are using EXCEPT, you can simply change the definition of that one "date" field in the query containing the VARCHAR value so that it matches the INT format of the other query. For example:
SELECT Field1, CONVERT(INT, REPLACE(VarcharDateField, 'M', '')) AS [DateField], Field3
FROM TableA
EXCEPT
SELECT Field1, IntDateField, Field3
FROM TableB
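A small runnable sketch of the same EXCEPT comparison, using Python's sqlite3 module as a stand-in for SQL Server (so CAST replaces CONVERT). The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Field1 INTEGER, VarcharDateField TEXT);
CREATE TABLE TableB (Field1 INTEGER, IntDateField INTEGER);
INSERT INTO TableA VALUES (1, '2015M01'), (2, '2015M02');
INSERT INTO TableB VALUES (1, 201501);
""")

# Strip the 'M' and cast to integer so both sides of EXCEPT have the same type.
diff = conn.execute("""
SELECT Field1, CAST(REPLACE(VarcharDateField, 'M', '') AS INTEGER) AS DateField
FROM TableA
EXCEPT
SELECT Field1, IntDateField
FROM TableB
""").fetchall()
print(diff)
```

Only the row that has no counterpart in TableB after conversion is returned.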
HOWEVER, while I realize that this might not be feasible, your best option, if you can make this happen, would be to change how the data in the table with the VARCHAR field is stored so that it is actually an INT in the same format as the table with the data already stored as an INT. Then you wouldn't have to worry about situations like this one.
Meaning:
Add an INT field to the table with the VARCHAR field.
Do an UPDATE of that table, setting the INT field to the string value with the M removed.
Update any INSERT and/or UPDATE stored procedures used by external services (app, ETL, etc) to do that same M removal logic on the way in. Then you don't have to change any app code that does INSERTs and UPDATEs. You don't even need to tell anyone you did this.
Update any "get" / SELECT stored procedures used by external services (app, ETL, etc) to do the opposite logic: convert the INT to VARCHAR and add the M on the way out. Then you don't have to change any app code that gets data from the DB. You don't even need to tell anyone you did this.
This is one of many reasons that having a Stored Procedure API to your DB is quite handy. I suppose an ORM can just be rebuilt, but you still need to recompile, even if all of the code references are automatically updated. But making a datatype change (or even moving a field to a different table, or even replacing a field with a simple CASE statement) "behind the scenes" and masking it so that any code outside of your control doesn't know that a change happened is not nearly as difficult as most people might think. I have done all of these operations (datatype change, move a field to a different table, replace a field with simple logic, etc.) and it buys you a lot of time until the app code can be updated. That might be another team who handles that. Maybe their schedule won't allow for making any changes in that area (plus testing) for 3 months. Ok. It will be there waiting for them when they are ready. And if there are several areas to update, then they can be done one at a time. You can even create new stored procedures to run in parallel for any updated app code to have the proper INT datatype as the input parameter. And once all references to the VARCHAR value are gone, then delete the original versions of those stored procedures.
If you want everything in the first table that is not in the second, you might consider something like this:
select t1.*
from t1
where not exists (select 1
from t2
where cast(replace(t1.varcharfield, 'M', '') as int) = t2.intfield
);
This should be close enough to except for your purposes.
I should add that you might need to include other columns in the where statement. However, the question only mentions one column, so I don't know what those are.
You could create a persisted view on the table with the char column, with a calculated column where the M is removed. Then you could JOIN the view to the table containing the INT column.
CREATE VIEW dbo.PersistedView
WITH SCHEMABINDING
AS
SELECT ConvertedDateCol = CONVERT(INT, REPLACE(VarcharCol, 'M', ''))
--, other columns including the PK, etc
FROM dbo.TablewithCharColumn;
-- The first index on a schema-bound view must be UNIQUE CLUSTERED
CREATE UNIQUE CLUSTERED INDEX IX_PersistedView
ON dbo.PersistedView(<the PK column>);
SELECT *
FROM dbo.PersistedView pv
INNER JOIN dbo.TableWithIntColumn ic ON pv.ConvertedDateCol = ic.IntDateCol;
If you provide the actual details of both tables, I will edit my answer to make it clearer.
A persisted view with a computed column will perform far better on the SELECT statement where you join the two columns compared with doing the CONVERT and REPLACE every time you run the SELECT statement.
However, a persisted view will slightly slow down inserts into the underlying table(s), and will prevent you from making DDL changes to the underlying tables.
If you're looking to not persist the values via a schema-bound view, you could create a non-persisted computed column on the table itself, then create a non-clustered index on that column. If you are using the computed column in WHERE or JOIN clauses, you may see some benefit.
By way of example:
CREATE TABLE dbo.PCT
(
PCT_ID INT NOT NULL
CONSTRAINT PK_PCT
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, SomeChar VARCHAR(50) NOT NULL
, SomeCharToInt AS CONVERT(INT, REPLACE(SomeChar, 'M', ''))
);
CREATE INDEX IX_PCT_SomeCharToInt
ON dbo.PCT(SomeCharToInt);
INSERT INTO dbo.PCT(SomeChar)
VALUES ('2015M08');
SELECT SomeCharToInt
FROM dbo.PCT;
Results:

Get auto Incremented field value in SQL Server 2008 from C# Code

I have the following table:
tbl_ProductCatg
Id IDENTITY
Code
Description
a few more.
Id field is auto-incremented and I have to insert this field value in Code field.
i.e. if Id generated is 1 then in Code field the value should be inserted like 0001(formatted for having length of four),if id is 77 Code should be 0077.
For this, I made the query like:
insert into tbl_ProductCatg(Code,Description)
values(RIGHT('000'+ltrim(Str(SCOPE_IDENTITY()+1,4)),4),'testing')
This query runs well in SQL Server Query Analyzer, but if I run it from C# it inserts NULL in Code even though the Id field is generated correctly.
Thanks
You may want to look at Computed Columns (Definition)
From what it sounds like you are trying to do, this would work well for you.
CREATE TABLE tbl_ProductCatg
(
ID INT IDENTITY(1, 1)
, Code AS RIGHT('000' + CAST(ID AS VARCHAR(4)), 4)
, Description NVARCHAR(128)
)
or
ALTER TABLE tbl_ProductCatg
ADD Code AS RIGHT('000' + CAST(id AS VARCHAR(4)), 4)
You can also make the column be PERSISTED so it is not calculated every time it is referenced.
Marking a column as PERSISTED specifies that the Database Engine will physically store the computed values in the table, and update the values when any other columns on which the computed column depends are updated.
Unfortunately SCOPE_IDENTITY isn't designed to be used during an insert so the value will not be populated until after the insert happens.
I can see three solutions. The first is a stored procedure that does the insert, reads the scope identity, and then updates the field:
insert into tbl_ProductCatg(Code, Description) values(NULL,'testing')
update tbl_ProductCatg SET code=RIGHT('000'+ltrim(Str(SCOPE_IDENTITY(),4)),4) WHERE id=SCOPE_IDENTITY()
The second option, is taking this a step further and making this into a trigger which runs on UPDATE and INSERT. I've always been taught to avoid triggers where possible and instead do things at the SP level, but triggers are justified in some cases.
The third option is computed fields, as described by @Adam Wenger
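The insert-then-update approach can be sketched from client code like this. It is an illustration only: Python's sqlite3 module stands in for SQL Server and C#, and cursor.lastrowid plays the role that SCOPE_IDENTITY() plays in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_ProductCatg ("
    "Id INTEGER PRIMARY KEY AUTOINCREMENT, Code TEXT, Description TEXT)"
)

# Step 1: insert the row without a Code so the database generates the Id.
cur = conn.execute(
    "INSERT INTO tbl_ProductCatg (Description) VALUES (?)", ("testing",)
)
new_id = cur.lastrowid  # the Id generated by the insert above

# Step 2: derive the zero-padded Code from the generated Id and store it.
code = f"{new_id:04d}"
conn.execute("UPDATE tbl_ProductCatg SET Code = ? WHERE Id = ?", (code, new_id))

row = conn.execute("SELECT Id, Code, Description FROM tbl_ProductCatg").fetchone()
print(row)
```

The generated Id is only available after the insert has run, which is why the Code has to be filled in by a second statement (or by a computed column).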

T-SQL: what COLUMNS have changed after an update?

OK. I'm doing an update on a single row in a table.
All fields will be overwritten with new data except for the primary key.
However, not all values will change b/c of the update.
For example, if my table is as follows:
TABLE (id int ident, foo varchar(50), bar varchar(50))
The initial value is:
id foo bar
-----------------
1 hi there
I then execute UPDATE tbl SET foo = 'hi', bar = 'something else' WHERE id = 1
What I want to know is what column has had its value changed and what was its original value and what is its new value.
In the above example, I would want to see that the column "bar" was changed from "there" to "something else".
Possible without doing a column by column comparison? Is there some elegant SQL statement like EXCEPT that will be more fine-grained than just the row?
Thanks.
There is no special statement you can run that will tell you exactly which columns changed, but nevertheless the query is not difficult to write:
DECLARE @Updates TABLE
(
OldFoo varchar(50),
NewFoo varchar(50),
OldBar varchar(50),
NewBar varchar(50)
)
UPDATE FooBars
SET <some_columns> = <some_values>
OUTPUT deleted.foo, inserted.foo, deleted.bar, inserted.bar INTO @Updates
WHERE <some_conditions>
SELECT *
FROM @Updates
WHERE OldFoo != NewFoo
OR OldBar != NewBar
If you're trying to actually do something as a result of these changes, then best to write a trigger:
CREATE TRIGGER tr_FooBars_Update
ON FooBars
FOR UPDATE AS
BEGIN
IF UPDATE(foo) OR UPDATE(bar)
INSERT FooBarChanges (OldFoo, NewFoo, OldBar, NewBar)
SELECT d.foo, i.foo, d.bar, i.bar
FROM inserted i
INNER JOIN deleted d
ON i.id = d.id
WHERE d.foo <> i.foo
OR d.bar <> i.bar
END
(Of course you'd probably want to do more than this in a trigger, but there's an example of a very simplistic action)
You can use COLUMNS_UPDATED instead of UPDATE, but I find it to be a pain, and it still won't tell you which columns actually changed, just which columns were included in the UPDATE statement. So for example you can write UPDATE MyTable SET Col1 = Col1 and it will still tell you that Col1 was updated even though not one single value actually changed. When writing a trigger you need to actually test the individual before-and-after values in order to ensure you're getting real changes (if that's what you want).
P.S. You can also UNPIVOT as Rob says, but you'll still need to explicitly specify the columns in the UNPIVOT clause, it's not magic.
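If the old and new rows are available on the client side (for example from an OUTPUT clause), the before-and-after comparison itself is short. The sketch below is plain Python with a hypothetical helper name, changed_columns, and it assumes both rows are dictionaries with the same keys.

```python
def changed_columns(old_row, new_row):
    """Return {column: (old_value, new_value)} for columns whose value differs."""
    return {
        col: (old_row[col], new_row[col])
        for col in old_row
        if old_row[col] != new_row[col]
    }


# Rows from the question, before and after the UPDATE.
before = {"id": 1, "foo": "hi", "bar": "there"}
after = {"id": 1, "foo": "hi", "bar": "something else"}

changes = changed_columns(before, after)
print(changes)
```

This is still a column-by-column comparison; it is just driven by the row's keys instead of being written out by hand.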
Try unpivotting both inserted and deleted, and then you could join, looking for where the value has changed.
You could detect this in a Trigger, or utilise CDC in SQL Server 2008.
If you create an AFTER UPDATE trigger then the inserted table will contain the rows with the new values, and the deleted table will contain the corresponding rows with the old values.
An alternative way to track data changes is to write the data to another (possibly temporary) table and then analyse the difference using XML. The changed data is written to an audit table together with the column names. The only catch is that you need to know the table's fields in order to prepare the temporary table.
You can find this solution here:
part 1
part 2
If you are using SQL Server 2008, you should probably take a look at the new Change Data Capture feature. This will do what you want.
OUTPUT deleted.bar AS [OLD VALUE], inserted.bar AS [NEW VALUE]
@Calvin I was just basing on the UPDATE example. I am not saying this is the full solution. I was giving a hint that you could do this somewhere in your code ;-)
Since I already got a -1 from the above answer, let me pitch this in:
If you don't really know which Column was updated, I'd say create a trigger and use COLUMNS_UPDATED() function in the body of that trigger (See this)
I have created in my blog a Bitmask Reference for use with this COLUMNS_UPDATED(). It will make your life easier if you decide to follow this path (Trigger + Columns_Updated())
If you're not familiar with Trigger, here's my example of basic Trigger http://dbalink.wordpress.com/2008/06/20/how-to-sql-server-trigger-101/

Using UDF for default value of a column

I created a UDF that I am using to generate a default value for a column. It works great, but I want to pass another field as a parameter into the function. Is this possible?
For example, one of the fields is a DealerID field, and I want to pass in the value of the DealerID field into my UDF because I will use it to calculate the new value. Any help would be appreciated!
No, because the default value will be needed before DealerID is known (eg on INSERT)
Edit:
This means that SQL Server does not know the value in the table at the time of insert, only after. Therefore, it cannot pass it to a UDF for the default.
For example, what about a multiple row insert, or where you have NEWID() default?
Now, using logic based on DealerID: if it's a GUID, why? It's an internal, non-user readable value.
If you really need this, you'll have to use a computed column for the "base" value and another column for the "actual" value with ISNULL.
I had a similar issue where I wanted to automatically assign a URL slug for new records inserted to the table. The approach I took was to set the field's default value to 'NOTSET' (just a text placeholder value) then used an insert trigger to update the field ON INSERT to the value of my UDF (where the field value is NOTSET) as follows:
CREATE TRIGGER [dbo].[TR_MyTable_MyTriggerName]
ON [dbo].[tblMyTable]
AFTER INSERT
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Set the URL slug from the record title for rows that still hold the placeholder
UPDATE tblMyTable
SET fldURLSlug = dbo.UDF_MyFunction(INS.[fldRecordTitle])
FROM tblMyTable MT INNER JOIN inserted INS ON MT.fldRecordId = INS.fldRecordId
WHERE INS.fldURLSlug = 'NOTSET'
END
GO
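The placeholder-plus-trigger pattern can be tried out with the sketch below. It rests on several assumptions: Python's sqlite3 module stands in for SQL Server, the trigger name TR_MyTable_SetSlug is hypothetical, and a built-in lower/replace expression stands in for the UDF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblMyTable (
    fldRecordId    INTEGER PRIMARY KEY,
    fldRecordTitle TEXT,
    fldURLSlug     TEXT DEFAULT 'NOTSET'   -- placeholder default
);

-- After each insert, replace the placeholder with a value derived from
-- another column of the same row. lower/replace stands in for the UDF.
CREATE TRIGGER TR_MyTable_SetSlug AFTER INSERT ON tblMyTable
BEGIN
    UPDATE tblMyTable
    SET fldURLSlug = lower(replace(NEW.fldRecordTitle, ' ', '-'))
    WHERE fldRecordId = NEW.fldRecordId
      AND fldURLSlug = 'NOTSET';
END;
""")

# Slug omitted: the default placeholder is stored, then the trigger replaces it.
conn.execute("INSERT INTO tblMyTable (fldRecordTitle) VALUES ('Hello World')")
# Slug supplied explicitly: the trigger leaves it alone.
conn.execute(
    "INSERT INTO tblMyTable (fldRecordTitle, fldURLSlug) VALUES ('Other', 'custom')"
)

slugs = conn.execute(
    "SELECT fldURLSlug FROM tblMyTable ORDER BY fldRecordId"
).fetchall()
print(slugs)
```

Because the trigger runs after the row exists, it can read the other columns of that row, which a column default cannot do.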
Please correct me if you have a specific reason why you need to use a UDF, but why not just define the default value for the column in your table DDL, which will then be overridden if you supply a specific value in your UPDATE, INSERT etc.? Using a UDF in a SELECT will cause the function to be executed for every row, an overhead you will save if it is taken care of at the table definition level.
