I have a table with CDC enabled that's throwing the following weird behavior.
In an update where one of three nullable columns already has a value [8,23|NULL|NULL] and I update only the other two columns [AlexJ, 1], CDC tracks a change against all three columns.
2018-06-22 13:55:37.763 NULL NULL NULL
2018-06-22 13:55:37.763 8,23 AlexJ 1
I use a templated query to get these data from the cdc.dbo_Tablename_CT table.
...
SELECT sys.fn_cdc_map_lsn_to_time([__$start_lsn]) AS 'ModifiedDate',
[Tags],[ModifiedBy], [IsInactive]
FROM cdc.fn_cdc_get_all_changes_dbo_Tablename
(#from_lsn, #to_lsn, N'all update old')
WHERE Id = #Id
...
How do I get around this? It's most annoying and may direct me away from using CDC, not that deploying and maintaining a CDC'ed table is a walk in the park in the best of times.
https://learn.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-capture-instance-ct-transact-sql?view=sql-server-2017
It’s always going to put null for varchar(max) … the tags column.
Large Object Data Types
Columns of data type image, text, and ntext are always assigned a NULL value when __$operation = 1 or __$operation = 3. Columns of data type varbinary(max), varchar(max), or nvarchar(max) are assigned a NULL value when __$operation = 3 unless the column changed during the update. When __$operation = 1, these columns are assigned their value at the time of the delete. Computed columns that are included in a capture instance always have a value of NULL.
I've created a table and added default values for some columns.
E.g.
Create table Table1( COL1 NUMBER(38,0),
COL2 STRING,
MODIFIED_DT STRING DEFAULT CURRENT_DATE(),
IS_USER_MODIFIED BOOLEAN DEFAULT 'FALSE' )
Current Behavior:
During data load, I see that when running inserts, my column 'MODIFIED_DT' is getting inserted with default values.
However, if there are any subsequent updates, the default value is not getting updated.
Expected Behavior:
My requirement is that the column value should be taken care of automatically by ANY INSERT/UPDATE operation.
E.g. in SQL Server, if I add a Default, the column value will always be inserted/updated with the default values whenever a DML operation takes place on the record.
Is there a way to make it work? Or does the default value apply only to inserts?
Is there a way to add logic to the DEFAULT values?
E.g. In the above table's example, for the column IS_USER_MODIFIED, can I do:
Case when CURRENT_USER() = 'Admin_Login' then 'FALSE' Else 'TRUE' end
If not, is there another option in snowflake to implement such functionality?
The following is generic to most (all?) databases and is not specific to Snowflake...
Default values on columns in table definitions only get inserted when there is no explicit reference to that column in an INSERT statement. So if I have a table with 2 columns (column_a and column_b and with a default value for column_b) and I execute this type of INSERT:
INSERT INTO [dbo].[doc_exz]
([column_a])
VALUES
(3),
(2);
column_b will be set to the default value. However, with this INSERT statement:
INSERT INTO [dbo].[doc_exz]
([column_a]
,[column_b])
VALUES
(5,1),
(6,NULL);
column_b will have values of 1 and null. Because I have explicitly referenced column_b the value I use, even if it is NULL, will be written to the record even though that column definition has a default value.
Default values only work with INSERT statements not UPDATE statements (as an existing record must have a "value" in the column, even if it is a NULL value, so when you UPDATE it the default doesn't apply) - so I don't believe your statement about defaults working with updates on SQL Server is correct; I've just tried it, just to be sure, and it doesn't.
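A minimal sketch of the three cases above, run against an in-memory SQLite database from Python rather than SQL Server or Snowflake. The table name doc_exz and the columns come from the example; the default value of 50 is an assumption chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The default of 50 is an assumed value, not taken from the original example.
conn.execute(
    "CREATE TABLE doc_exz (column_a INTEGER, column_b INTEGER DEFAULT 50)"
)

# column_b is omitted: the default is applied.
conn.execute("INSERT INTO doc_exz (column_a) VALUES (3), (2)")

# column_b is referenced explicitly: the supplied value wins, even NULL.
conn.execute(
    "INSERT INTO doc_exz (column_a, column_b) VALUES (5, 1), (6, NULL)"
)

# An UPDATE that does not mention column_b leaves it as it was;
# the default is not re-applied.
conn.execute("UPDATE doc_exz SET column_a = column_a + 100 WHERE column_a = 6")

rows = conn.execute(
    "SELECT column_a, column_b FROM doc_exz ORDER BY column_a"
).fetchall()
print(rows)
```

The row inserted with an explicit NULL keeps its NULL after the UPDATE, which is the behavior described above.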
Snowflake-specific Answer
Given that column defaults only work with INSERT statements, they are not going to be a solution to your problem. The only straightforward solution I can think of is to explicitly include these columns in your INSERT/UPDATE statements.
You could write a stored procedure to do the INSERT/UPDATES, and automatically populate these columns, but that would perform poorly for bulk changes and probably wouldn't be simple to use as you'd need to pass in the table name, the list of columns and the list of values.
Obviously, if you are inserting/updating these records using an external tool you'd put this logic in the tool rather than trying to implement it in Snowflake.
Snowflake has a "DERIVED COLUMN" feature. These columns are VIRTUAL/COMPUTED and are not used in ETL process. However, any DML activity will automatically influence the column values.
Nice thing is, we can even write CASE logic in the column definition. This solved my problem.
CREATE OR REPLACE TABLE DB_NAME.DBO.TEST_TABLE
(
FILE_ID NUMBER(38,0),
MANUAL_OVERRIDE_FLG INT as (case when current_user() = 'some_admin_login' then 0 else 1 end),
RECORD_MODIFIED_DT DATE as (CURRENT_DATE()),
RECORD_MODIFIED_BY STRING as (current_user())
);
I'm using SQL Server 2014. My request I believe is rather simple. I have one table containing a field holding a date value that is stored as VARCHAR, and another table containing a field holding a date value that is stored as INT.
The date value in the VARCHAR field is stored like this: 2015M01
The date value in the INT field is stored like this: 201501
I need to compare these tables against each other using EXCEPT. My thought process was to somehow extract or TRIM the "M" out of the VARCHAR value and see if it would let me compare the two. If anyone has a better idea such as using CAST to change the date formats or something feel free to suggest that as well.
I am also concerned that even extracting the "M" out of the VARCHAR may still prevent the comparison since one will still remain VARCHAR and the other is INT. If possible through a T-SQL query to convert on the fly that would be great advice as well. :)
REPLACE the string and then CONVERT to integer
SELECT A.*, B.*
FROM TableA A
INNER JOIN
(SELECT intField
FROM TableB
) as B
ON CONVERT(INT, REPLACE(A.varcharField, 'M', '')) = B.intField
Since you say you already have the query and are using EXCEPT, you can simply change the definition of that one "date" field in the query containing the VARCHAR value so that it matches the INT format of the other query. For example:
SELECT Field1, CONVERT(INT, REPLACE(VarcharDateField, 'M', '')) AS [DateField], Field3
FROM TableA
EXCEPT
SELECT Field1, IntDateField, Field3
FROM TableB
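The same EXCEPT pattern can be tried out quickly with SQLite from Python. This is only a sketch: the sample rows are made up, Field3 is dropped for brevity, and SQLite's CAST(... AS INTEGER) stands in for T-SQL's CONVERT(INT, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Field1 INTEGER, VarcharDateField TEXT);
    CREATE TABLE TableB (Field1 INTEGER, IntDateField INTEGER);
    -- Made-up sample rows for illustration only.
    INSERT INTO TableA VALUES (1, '2015M01'), (2, '2015M02'), (3, '2015M03');
    INSERT INTO TableB VALUES (1, 201501), (2, 201502);
""")

# Strip the 'M', cast to an integer, then compare the two result sets.
only_in_a = conn.execute("""
    SELECT Field1,
           CAST(REPLACE(VarcharDateField, 'M', '') AS INTEGER) AS DateField
    FROM TableA
    EXCEPT
    SELECT Field1, IntDateField
    FROM TableB
""").fetchall()
print(only_in_a)
```

Only the row with no counterpart in TableB survives the EXCEPT.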
HOWEVER, while I realize that this might not be feasible, your best option, if you can make this happen, would be to change how the data in the table with the VARCHAR field is stored so that it is actually an INT in the same format as the table with the data already stored as an INT. Then you wouldn't have to worry about situations like this one.
Meaning:
Add an INT field to the table with the VARCHAR field.
Do an UPDATE of that table, setting the INT field to the string value with the M removed.
Update any INSERT and/or UPDATE stored procedures used by external services (app, ETL, etc) to do that same M removal logic on the way in. Then you don't have to change any app code that does INSERTs and UPDATEs. You don't even need to tell anyone you did this.
Update any "get" / SELECT stored procedures used by external services (app, ETL, etc) to do the opposite logic: convert the INT to VARCHAR and add the M on the way out. Then you don't have to change any app code that gets data from the DB. You don't even need to tell anyone you did this.
This is one of many reasons that having a Stored Procedure API to your DB is quite handy. I suppose an ORM can just be rebuilt, but you still need to recompile, even if all of the code references are automatically updated. But making a datatype change (or even moving a field to a different table, or even replacing a field with a simple CASE statement) "behind the scenes" and masking it so that any code outside of your control doesn't know that a change happened is not nearly as difficult as most people might think. I have done all of these operations (datatype change, move a field to a different table, replace a field with simple logic, etc.) and it buys you a lot of time until the app code can be updated. That might be another team who handles that. Maybe their schedule won't allow for making any changes in that area (plus testing) for 3 months. OK. It will be there waiting for them when they are ready. And if there are several areas to update, then they can be done one at a time. You can even create new stored procedures to run in parallel for any updated app code to have the proper INT datatype as the input parameter. And once all references to the VARCHAR value are gone, you can delete the original versions of those stored procedures.
If you want everything in the first table that is not in the second, you might consider something like this:
select t1.*
from t1
where not exists (select 1
from t2
where cast(replace(t1.varcharfield, 'M', '') as int) = t2.intfield
);
This should be close enough to except for your purposes.
I should add that you might need to include other columns in the where statement. However, the question only mentions one column, so I don't know what those are.
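One way in which NOT EXISTS is only "close" to EXCEPT: EXCEPT returns distinct rows, while NOT EXISTS keeps duplicates from the first table. A small sketch with SQLite from Python, using made-up rows and the table and column names from the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (varcharfield TEXT);
    CREATE TABLE t2 (intfield INTEGER);
    -- Made-up rows; '2015M03' appears twice on purpose.
    INSERT INTO t1 VALUES ('2015M01'), ('2015M03'), ('2015M03');
    INSERT INTO t2 VALUES (201501);
""")

# NOT EXISTS keeps every unmatched row of t1, duplicates included.
not_exists_rows = conn.execute("""
    SELECT t1.varcharfield
    FROM t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM t2
        WHERE CAST(REPLACE(t1.varcharfield, 'M', '') AS INTEGER) = t2.intfield
    )
""").fetchall()

# EXCEPT collapses the unmatched rows to distinct values.
except_rows = conn.execute("""
    SELECT CAST(REPLACE(varcharfield, 'M', '') AS INTEGER) FROM t1
    EXCEPT
    SELECT intfield FROM t2
""").fetchall()

print(not_exists_rows, except_rows)
```

Whether that difference matters depends on whether the first table can contain duplicate rows.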
You could create a persisted view on the table with the char column, with a calculated column where the M is removed. Then you could JOIN the view to the table containing the INT column.
CREATE VIEW dbo.PersistedView
WITH SCHEMABINDING
AS
SELECT ConvertedDateCol = CONVERT(INT, REPLACE(VarcharCol, 'M', ''))
--, other columns including the PK, etc
FROM dbo.TablewithCharColumn;
-- The first index on a schema-bound view must be UNIQUE CLUSTERED.
CREATE UNIQUE CLUSTERED INDEX IX_PersistedView
ON dbo.PersistedView(<the PK column>);
SELECT *
FROM dbo.PersistedView pv
INNER JOIN dbo.TableWithIntColumn ic ON pv.ConvertedDateCol = ic.IntDateCol;
If you provide the actual details of both tables, I will edit my answer to make it clearer.
A persisted view with a computed column will perform far better on the SELECT statement where you join the two columns compared with doing the CONVERT and REPLACE every time you run the SELECT statement.
However, a persisted view will slightly slow down inserts into the underlying table(s), and will prevent you from making DDL changes to the underlying tables.
If you're looking to not persist the values via a schema-bound view, you could create a non-persisted computed column on the table itself, then create a non-clustered index on that column. If you are using the computed column in WHERE or JOIN clauses, you may see some benefit.
By way of example:
CREATE TABLE dbo.PCT
(
PCT_ID INT NOT NULL
CONSTRAINT PK_PCT
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, SomeChar VARCHAR(50) NOT NULL
, SomeCharToInt AS CONVERT(INT, REPLACE(SomeChar, 'M', ''))
);
CREATE INDEX IX_PCT_SomeCharToInt
ON dbo.PCT(SomeCharToInt);
INSERT INTO dbo.PCT(SomeChar)
VALUES ('2015M08');
SELECT SomeCharToInt
FROM dbo.PCT;
Results:
I've written an Oracle DB Conversion Script that transfers Data from a previous singular table into a new DB with a main table and several child/reference/maintenance tables. Naturally, this more standardized layout (previous could have, say Bob/Storage Room/Ceiling as the [Location] value) has more fields than the old table and thus cannot be exactly converted over.
For the moment, I have inserted a record value (ex.) [NO_CONVERSION_DATA] into each of my child tables. For my main table, I need to set (ex.) [Color_ID] to 22, [Type_ID] to 57 since there is no explicit conversion for these new fields (annually, all of these records are updated, and after the next update all records will exist with proper field values whereupon the placeholder value/record [NO_CONVERSION_DATA] will be removed from the child tables).
I also similarly need to set [Status_Id] something like the following (not working):
INSERT INTO TABLE1 (STATUS_ID)
VALUES
-- Status was not set as Recycled, Disposed, etc. during Conversion
IF STATUS_ID IS NULL THEN
(CASE
-- [Owner] field has a value, set ID to 2 (Assigned)
WHEN RTRIM(LTRIM(OWNER)) IS NOT NULL THEN 2
-- [Owner] field has no value, set ID to 1 (Available)
WHEN RTRIM(LTRIM(OWNER)) IS NULL THEN 1
END as Status)
Can anyone more experienced with Oracle & PL/SQL assist with the syntax/layout for what I'm trying to do here?
Ok, I figured out how to set the 2 specific columns to the same value for all rows:
UPDATE TABLE1
SET COLOR_ID = 24;
UPDATE INV_ASSETSTEST
SET TYPE_ID = 20;
I'm still trying to figure out setting the STATUS_ID based upon the value in the [OWNER] field being NULL/NOT NULL. Coco's solution below looked good at first glance (regarding his comment, not the solution posted itself), but the below causes each of my NON-NULLABLE columns to flag and the statement will not execute:
INSERT INTO TABLE1(STATUS_ID)
SELECT CASE
WHEN STATUS_ID IS NULL THEN
CASE
WHEN TRIM(OWNER) IS NULL THEN 1
WHEN TRIM(OWNER) IS NOT NULL THEN 2
END
END FROM TABLE1;
I've tried piecing a similar UPDATE statement together, but so far no luck.
Try with this
INSERT INTO TABLE1 (STATUS_ID)
VALUES
(
case
when STATUS_ID IS NULL THEN
(CASE
-- [Owner] field has a value, set ID to 2 (Assigned)
WHEN RTRIM(LTRIM(OWNER)) IS NOT NULL THEN 2
-- [Owner] field has no value, set ID to 1 (Available)
WHEN RTRIM(LTRIM(OWNER)) IS NULL THEN 1
END )
end);
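Since the rows already exist in TABLE1, an UPDATE with a CASE expression may be closer to what the question is after than an INSERT. The following is only a sketch, run against SQLite from Python instead of Oracle; the ID column and the sample rows are assumptions. Oracle treats an empty string as NULL, so TRIM(OWNER) IS NULL is enough there; SQLite does not, hence the extra NULLIF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The ID column and these rows are made up for illustration.
    CREATE TABLE TABLE1 (ID INTEGER PRIMARY KEY, OWNER TEXT, STATUS_ID INTEGER);
    INSERT INTO TABLE1 (ID, OWNER, STATUS_ID) VALUES
        (1, 'Bob', NULL),   -- has an owner, no status yet
        (2, '   ', NULL),   -- blank owner, no status yet
        (3, NULL,  NULL),   -- no owner, no status yet
        (4, 'Ann', 5);      -- status already set during conversion
""")

# Only touch rows whose status was not set during conversion.
conn.execute("""
    UPDATE TABLE1
    SET STATUS_ID = CASE
            WHEN NULLIF(TRIM(OWNER), '') IS NULL THEN 1  -- Available
            ELSE 2                                       -- Assigned
        END
    WHERE STATUS_ID IS NULL
""")

rows = conn.execute("SELECT ID, STATUS_ID FROM TABLE1 ORDER BY ID").fetchall()
print(rows)
```

Because only STATUS_ID is assigned, the NOT NULL constraints on the other columns are not triggered the way they are by an INSERT that supplies a single column.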
I am using SSIS to move excel data to a temp sql server table and from there to the target table.
So my temp table consists of only varchar columns - my target table expects money values for some columns. In my temp table the original excel columns have a formula but leave an empty cell on some rows which is represented by the temp table with an empty cell as well. But when I cast one of these columns to money these originally blank cells become 0,00 in the target column.
Of course that is not what I want, so how can I get NULL values in there? Keeping in mind that it is possible that a wanted 0,00 shows up in one of these columns.
I guess I would need to edit my temp table to turn the empty cells to NULL. Can I do this from within a SSIS package or is there a setting for the table I could use?
thank you.
For existing data you can write a simple script that updates data to NULL where empty.
UPDATE YourTable SET Column = NULL WHERE Column = ''
For inserts you can use NULLIF function to insert nulls if empty
INSERT INTO YourTable (yourColumn)
SELECT NULLIF(sourceColumn, '') FROM SourceTable
Edit: for multiple column updates you need to combine the two solutions and write something like:
UPDATE YourTable SET
Column1 = NULLIF(Column1, '')
, Column2 = NULLIF(Column2, '')
WHERE Column1 = '' OR Column2 = ''
etc
That will update all
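To see why NULLIF has to be applied before the cast, here is a small sketch using SQLite from Python. It is not SQL Server: REAL stands in for money, the sample values are made up, and a period is used as the decimal separator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable (sourceColumn TEXT);
    -- Made-up staging values: a normal amount, a blank cell, a genuine zero.
    INSERT INTO SourceTable VALUES ('12.50'), (''), ('0.00');
""")

# Casting the raw text turns the blank cell into 0.0.
naive = conn.execute(
    "SELECT CAST(sourceColumn AS REAL) FROM SourceTable ORDER BY rowid"
).fetchall()

# NULLIF first: the blank becomes NULL, the genuine zero stays a zero.
with_nullif = conn.execute(
    "SELECT CAST(NULLIF(sourceColumn, '') AS REAL) "
    "FROM SourceTable ORDER BY rowid"
).fetchall()

print(naive)
print(with_nullif)
```

The genuine 0.00 is preserved because NULLIF only returns NULL when its two arguments are equal.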
I know that the value itself for a RowVersion column is not in and of itself useful, except that it changes each time the row is updated. However, I was wondering if they are useful for relative (inequality) comparison.
If I have a table with a RowVersion column, are either of the following true:
Will all updates that occur simultaneously (either same update statement or same transaction) have the same value in the RowVersion column?
If I do update "A", followed by update "B", will the rows involved in update "B" have a higher value than the rows involved in update "A"?
Thanks.
From MSDN:
Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a rowversion column within the database. This counter is the database rowversion. This tracks a relative time within a database, not an actual time that can be associated with a clock. Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column.
http://msdn.microsoft.com/en-us/library/ms182776.aspx
As far as I understand, nothing ACTUALLY happens simultaneously in the system. This means that all rowversions should be unique. I venture to say that they would be effectively useless if duplicates were allowed within the same table. Also lending credence to rowversions not being duplicated is MSDN's stance on not using them as primary keys: the stated reason is not that it would cause violations, but that it would cause foreign key issues.
According to MSDN, "The rowversion data type is just an incrementing number..." so yes, later is larger.
To the question of how much it increments, MSDN states, "[rowversion] tracks a relative time within a database" which indicates that it is not a fluid integer incrementing, but time based. However, this "time" reveals nothing of when exactly, but rather when in relation to other rows a row was inserted/modified.
Some additional information.
RowVersion converts nicely to bigint and thus one can display better readable output when debugging:
CREATE TABLE [dbo].[T1](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Value] [nvarchar](50) NULL,
[RowVer] [timestamp] NOT NULL
)
insert into t1 ([value]) values ('a')
insert into t1 ([value]) values ('b')
insert into t1 ([value]) values ('c')
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
update t1 set [value] = 'x' where id = 3
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
update t1 set [value] = 'y'
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
Id Value RowVer
1 a 2037
2 b 2038
3 c 2039
Id Value RowVer
1 a 2037
2 b 2038
3 x 2040
Id Value RowVer
1 y 2041
2 y 2042
3 y 2043
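The numbers above are what a single per-database counter would produce. As a toy model only (plain Python, not SQL Server; the class name and the starting value of 2036 are assumptions picked to match the output), the same sequence falls out like this:

```python
class ToyDatabase:
    """Toy model: one counter per database, bumped for every row change."""

    def __init__(self, start):
        self.counter = start
        self.rows = {}  # id -> [value, rowver]

    def _next_rowversion(self):
        self.counter += 1
        return self.counter

    def insert(self, row_id, value):
        self.rows[row_id] = [value, self._next_rowversion()]

    def update(self, value, row_ids=None):
        # Every touched row gets its own new counter value, even inside
        # one statement. The iteration order here is an arbitrary choice;
        # a real engine does not guarantee any particular row order.
        targets = list(self.rows) if row_ids is None else row_ids
        for row_id in targets:
            self.rows[row_id] = [value, self._next_rowversion()]


db = ToyDatabase(start=2036)  # assumed starting point
for row_id, value in [(1, "a"), (2, "b"), (3, "c")]:
    db.insert(row_id, value)
after_inserts = {k: list(v) for k, v in db.rows.items()}

db.update("x", row_ids=[3])
after_single_update = {k: list(v) for k, v in db.rows.items()}

db.update("y")
after_full_update = {k: list(v) for k, v in db.rows.items()}
print(after_full_update)
```

In this model a later change always gets a larger value, and rows changed by the same statement still get different values.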
I spent ages trying to sort something out with this: asking for rows updated after a particular sequence number. The timestamp is really just a sequence number. It's also big-endian, while C# functions like BitConverter.ToInt64 want little-endian.
I ended up creating a DB view on the table I want data from, with an alias column 'SequenceNo':
SELECT ID, CONVERT(bigint, Timestamp) AS SequenceNo
FROM dbo.[User]
C# Code First sees the view (i.e. UserV) identically to a normal table,
then in my LINQ I can join the view and parent table and compare with a sequence number
var users = (from u in context.GetTable<User>()
join uv in context.GetTable<UserV>() on u.ID equals uv.ID
where mysequenceNo < uv.SequenceNo
orderby uv.SequenceNo
select u).ToList();
to get what I want - all the entries changed since the last time I checked.
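If the raw 8 bytes are read on the client instead of being converted in a view, the byte order has to be handled explicitly. A minimal sketch in Python (not C#); the function name and the sample bytes are made up for illustration:

```python
def rowversion_to_int(raw: bytes) -> int:
    """Interpret an 8-byte rowversion/timestamp value as a big-endian integer."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(raw, byteorder="big")


# Hypothetical raw value: 0x00000000000007F5 is 2037 in decimal.
raw = bytes.fromhex("00000000000007F5")

sequence_no = rowversion_to_int(raw)
wrong_order = int.from_bytes(raw, byteorder="little")

print(sequence_no)
print(sequence_no == wrong_order)
```

Reading the same bytes as little-endian gives a very different number, which is the mismatch described above.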
What makes you think Timestamp data types are evil? The data type is very useful for concurrency checking. Linq-To-SQL uses this data type for this very purpose.
The answers to your questions:
1) No. This value is updated each time the row is updated. If you are updating the row say five times, each update will increment the Timestamp value. Of course, you realize that updates that "occur simultaneously" really don't. They still only occur one at a time, in turn.
2) Yes.
Just as a note, timestamp is deprecated in SQL Server 2008 onwards. rowversion should be used instead.
From this page on MSDN:
The timestamp syntax is deprecated. This feature will be removed in a
future version of Microsoft SQL Server. Avoid using this feature in
new development work, and plan to modify applications that currently
use this feature.
Rowversion does break one of the "idealistic" approaches of SQL - that an UPDATE statement is a single, atomic action, and acts as if all UPDATEs (both to all columns within a row, and all rows within the table) occur "at the same time". But in this case, with Rowversion, it is possible to determine that one row was updated at a slightly different time than another.
Note that the order in which rows are updated (by a single update statement) is not guaranteed - it may, by coincidence follow the same order as the clustered key for the table, but I wouldn't count on that being true.
To answer part of your question: you can end up with duplicate values according to MSDN:
Duplicate rowversion values can be generated by using the SELECT INTO
statement in which a rowversion column is in the SELECT list. We do
not recommend using rowversion in this manner.
Source: rowversion (Transact-SQL)
Every database has a counter that is incremented one by one on every data modification that is done in the database. If the table containing the affected (by update/insert) row contains a timestamp/rowversion column, the current counter value of the database is stored in that column of the updated/inserted record.