I have a simple table like this:
CREATE OR REPLACE TABLE ETL_LOG (
NAME VARCHAR(1000) NOT NULL,
SCHEMA_NAME VARCHAR(1000) NOT NULL,
QUERY_TEXT VARCHAR(50000) NOT NULL,
STATE VARCHAR(1000) NOT NULL,
ERROR_CODE VARCHAR(1000) NULL,
ERROR_MESSAGE VARCHAR(500000) NULL,
SCHEDULED_TIME TIMESTAMP_LTZ(3) NOT NULL,
NEXTS_SCHEDULED_TIME TIMESTAMP_LTZ(3) NULL,
COMPLETED_TIME TIMESTAMP_LTZ(3) NOT NULL,
RUN_ID VARCHAR(5000) NOT NULL,
UNIQUE(RUN_ID)
);
When I insert data, despite the unique RUN_ID I get duplicates like the ones below. No idea why this might be. I have only displayed the unique column (RUN_ID) and COMPLETED_TIME here. What causes this? (The whitespace is just table formatting; it is not present in the actual data.)
+-------------------------------+-------------------------+
| COMPLETED_TIME | RUN_ID |
+-------------------------------+-------------------------+
| 2020-04-30 01:05:30.034 -0700 | 1588233900020 |
| 2020-04-30 01:05:30.034 -0700 | 1588233900020 |
| 2020-04-30 01:06:17.659 -0700 | 1588233960000 |
| 2020-04-30 01:06:17.659 -0700 | 1588233960000 |
+-------------------------------+-------------------------+
I think the title is misleading. If you are using Snowflake (according to the screenshot and the tag you chose), please note that Snowflake does not enforce UNIQUE or PRIMARY KEY constraints:
Snowflake supports defining and maintaining constraints, but does not
enforce them, except for NOT NULL constraints, which are always
enforced.
https://docs.snowflake.com/en/sql-reference/constraints-overview.html#supported-constraint-types
Due to the limitation pointed out above, I ended up creating this workaround:
CREATE OR REPLACE TABLE ETL_LOG_DEDUP AS SELECT DISTINCT * FROM ETL_LOG;
I put this in a task that simply retrieves the deduplicated rows from the original table. Maybe not ideal, but it does what I need. I created a third task that truncates the primary table once a week.
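For what it's worth, the SELECT DISTINCT behaviour is easy to sandbox outside Snowflake. Here is a minimal sketch in Python with SQLite standing in (SQLite with no unique index declared behaves like Snowflake's unenforced UNIQUE constraint: nothing stops the duplicate row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE etl_log (run_id TEXT, completed_time TEXT)")

# Insert the same RUN_ID twice -- nothing rejects the duplicate,
# just as Snowflake's unenforced UNIQUE constraint rejects nothing.
for _ in range(2):
    conn.execute(
        "INSERT INTO etl_log VALUES ('1588233900020', '2020-04-30 01:05:30.034')"
    )

# The workaround: materialize only the distinct rows.
conn.execute("CREATE TABLE etl_log_dedup AS SELECT DISTINCT * FROM etl_log")
rows = conn.execute("SELECT * FROM etl_log_dedup").fetchall()
print(len(rows))  # 1
```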
Just for context: we have ~3k tasks running daily and several other monitoring solutions along the data pipeline. I just wanted a clear and efficient log because of the limitations of Snowflake's native logging feature, which only retains task history for a limited time.
I'm using SQL Server 2017 and I want to add a NOT NULL column without a DEFAULT, but supply values for existing records, e.g. using WITH VALUES, in a single query.
Let me explain. I understand that I cannot create a NOT NULL column without supplying values. But a DEFAULT clause also sets a default value for future inserts, which I don't want. I want a default value to be used only when adding this new column, and that's it.
Assume such a sequence of queries:
CREATE TABLE items (
id INT NOT NULL PRIMARY KEY IDENTITY(1,1),
name VARCHAR(255) NOT NULL
);
ALTER TABLE items ADD description VARCHAR(255) NOT NULL; -- No default value needed because the table is empty
INSERT INTO items(name) VALUES ('test'); -- ERROR
The last query gives an error (as expected):
Error: Cannot insert the value NULL into column 'description', table 'suvibackend.dbo.items'; column does not allow nulls. INSERT fails.
That is because we didn't supply a value for the description column. It's obvious.
Now let's consider a situation where there are already some records in the items table. Adding the NOT NULL column without the DEFAULT and WITH VALUES clauses would fail (obviously), so let's use them now:
CREATE TABLE items (
id INT NOT NULL PRIMARY KEY IDENTITY(1,1),
name varchar(255) NOT NULL
);
INSERT INTO items(name) VALUES ('name-test-1');
INSERT INTO items(name) VALUES ('name-test-2');
ALTER TABLE items ADD description VARCHAR(255) NOT NULL DEFAULT 'no-description' WITH VALUES;
So now our table looks like this as expected:
SELECT * FROM items;
--------------------------------------
| id | name | description |
| --- | ----------- | -------------- |
| 1 | name-test-1 | no-description |
| 2 | name-test-2 | no-description |
--------------------------------------
But from now on, it is possible to INSERT records without a description:
INSERT INTO items(name) VALUES ('name-test-3'); -- No description column
SELECT * FROM ITEMS;
--------------------------------------
| id | name | description |
| --- | ----------- | -------------- |
| 1 | name-test-1 | no-description |
| 2 | name-test-2 | no-description |
| 3 | name-test-3 | no-description |
--------------------------------------
But compare this to our first situation (an empty table, no DEFAULT clause): the behavior is different. I still want an error when no value is supplied for the description column.
SQL Server has created a default constraint for this column, which I don't want to have.
The solution is either to drop the constraint after adding the new column with the DEFAULT clause, or to split adding the new column into three queries:
CREATE TABLE items
(
id INT NOT NULL PRIMARY KEY IDENTITY(1,1),
name varchar(255) NOT NULL
);
INSERT INTO items(name) VALUES ('name-test-1');
INSERT INTO items(name) VALUES ('name-test-2');
ALTER TABLE items
ADD description VARCHAR(255) NULL;
UPDATE items
SET description = 'no-description';
ALTER TABLE items
ALTER COLUMN description VARCHAR(255) NOT NULL;
INSERT INTO items(name)
VALUES ('name-test-3'); -- ERROR as expected
My question:
Is there a way to achieve this in a single query, but without having a default constraint created? It would be nice if it were possible to use a default value just for one query, without permanently creating a constraint.
Although you can't specify an ephemeral default constraint that's automatically dropped after adding the column (i.e. single statement operation), you can explicitly name the constraint to facilitate dropping it immediately afterward.
ALTER TABLE dbo.items
ADD description VARCHAR(255) NOT NULL
CONSTRAINT DF_items_description DEFAULT 'no-description' WITH VALUES;
ALTER TABLE dbo.items
DROP CONSTRAINT DF_items_description;
Explicit constraint names are a best practice, IMHO, as they make subsequent DDL operations easier.
I am using queries like the ones below:
Create table UserDetails
(
UserId int primary key identity(1,1),
UserName nvarchar(50) Not Null,
UserContactNumber int Not Null,
UserEmail varchar(50) Not Null,
UserPassword nvarchar(50) Not Null,
UserConfirmPassword nvarchar(50) Not Null
)
Insert into UserDetails
values ('Shefali',36547895,'s.jain#gmail.com',HASHBYTES('MD5','Shefali1234$')
,HASHBYTES('MD5','Shefali1234$'))
Result:
+---+---------+----------+------------------+---------------+---------------+
| 1 | Shefali | 36547895 | s.jain#gmail.com | ꉹ㇒ᆔ唡鈑쳕켆� | ꉹ㇒ᆔ唡鈑쳕켆� |
+---+---------+----------+------------------+---------------+---------------+
HASHBYTES returns a VARBINARY(16) for MD5 (see the Microsoft Docs). What you're seeing is correct, and exactly what you'd expect from what you're doing: the 16 raw digest bytes are being reinterpreted as NVARCHAR characters. Storing the hash in a BINARY(16) column would give you something a little nicer to look at. You shouldn't try to get the password back out of the DB; rather, hash the candidate password on the web server and test it against the stored hash.
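To see those 16 bytes concretely, here is a quick Python check (hashlib's MD5 produces the same digest HASHBYTES does for a one-byte-per-character string; a sketch, not your SQL Server setup):

```python
import hashlib

# MD5 always yields a 16-byte digest; these raw bytes are what gets
# mangled when they are reinterpreted as NVARCHAR characters.
digest = hashlib.md5("Shefali1234$".encode("utf-8")).digest()
print(len(digest))   # 16
print(digest.hex())  # the same bytes, readable as hex
```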
Also, you don't need to store both the password and the password confirmation. It is enough to test that they match before they get stored, probably in JavaScript in the frontend.
Further, be certain that what you're doing is what you want to be doing. MD5 is not considered secure, e.g.:
https://codahale.com/how-to-safely-store-a-password/
If you do go with bcrypt, you can store the hash in BINARY(60), as in https://stackoverflow.com/a/5882472/2281968.
I have a basic question about how to solve an import problem. I have a CSV with about 40 fields that need to be inserted across about 5 tables.
Let say tables are like this
tpeople
Column Name | Datatype
GUID | uniqueidentifier
Fname | varchar
Lname | varchar
UserEnteredGUID | uniqueidentifier
tcompany
Column Name | Datatype
GUID | uniqueidentifier
CompanyTypeGUID | uniqueidentifier
PrintName | varchar
Website | varchar
tcompanyLocation
Column Name | Datatype
GUID | uniqueidentifier
CompanyGUID | uniqueidentifier
City | varchar
As you can see, the database is normalized and the tables carry several GUID columns.
My question is: when I write, for example, a Python script to enter the data, how should I handle the GUIDs?
For example I want to add:
Fname: John
Lname: Smith
Company: IBM
Location: New York
Website: www.ibm.com
UserEntered: Admin
How do I make sure all relations/GUID are correct?
I would try:
insert into tpeople(GUID,Fname,Lname,UserEnteredGUID) values("","John","Smith",???)
Question
How to get UserEnteredGUID? Do I have to make a select on GUID from UserEntered table where user equals "Admin"?
Or here:
insert into tcompany(GUID,CompanyTypeGUID,PrintName,Website) values("",??,"IBM","www.ibm.com")
The same issue here: how should I handle CompanyTypeGUID? It would also mean that I have to populate the CompanyType table BEFORE I add anything to the tcompany table?
This does not look right to me; it feels like working backwards, reasoning from how each table is connected to the others. There has to be a way to insert records into a normalized database where this GUID/foreign-key bookkeeping is somehow automated.
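To make the idea concrete, here is roughly what I imagine in Python (sqlite3 as a stand-in; the tuser table name is made up here, standing in for wherever the Admin user actually lives):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tuser (GUID TEXT PRIMARY KEY, UserName TEXT)")
conn.execute(
    "CREATE TABLE tpeople (GUID TEXT PRIMARY KEY, Fname TEXT,"
    " Lname TEXT, UserEnteredGUID TEXT)"
)

admin_guid = str(uuid.uuid4())
conn.execute("INSERT INTO tuser VALUES (?, ?)", (admin_guid, "Admin"))

# Step 1: look up the existing GUID instead of guessing it.
(user_guid,) = conn.execute(
    "SELECT GUID FROM tuser WHERE UserName = ?", ("Admin",)
).fetchone()

# Step 2: generate the new row's own GUID client-side and insert.
conn.execute(
    "INSERT INTO tpeople VALUES (?, ?, ?, ?)",
    (str(uuid.uuid4()), "John", "Smith", user_guid),
)
print(user_guid == admin_guid)  # True
```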
I hope somebody understands my problem and can guide me towards a solution.
Thanks!
Is the following DB-schema ok?
REQUEST-TABLE
REQUEST-ID | TYPE | META-1 | META-2 |
This table stores all the requests, each of which has a unique REQUEST-ID. The TYPE is either A, B or C; it tells us which table contains the specific request parameters. Besides that, we have one table per type; these tables store the parameters for the respective requests. META-1 and META-2 are just additional info like timestamps and such.
TYPE-A-TABLE
REQUEST-ID | PARAM_X | PARAM_Y | PARAM_Z
TYPE-B-TABLE
REQUEST-ID | PARAM_I | PARAM_J
TYPE-C-TABLE
REQUEST-ID | PARAM_L | PARAM_M | PARAM_N | PARAM_O | PARAM_P | PARAM_Q
The REQUEST-ID is the foreign key into the REQUEST-TABLE.
Is this design normal/best-practice? Or is there a better/smarter way? What are the alternatives?
It somehow feels strange to me to have to query the REQUEST-TABLE first to find out which TYPE table contains the information I need, and only then run the actual query I'm interested in.
For instance, imagine a method which, given an ID, should retrieve the parameters. This method would need two DB accesses:
- Find the correct table to query
- Query that table to get the parameters
Note: in reality we have about 10 types of requests, i.e. 10 TYPE tables, and each of them holds many entries.
Meta-Note: I find it hard to come up with a proper title for this question (one that is not overly broad). Please feel free to make suggestions or edit the title.
For exclusive types, you just need to make sure rows in one type table can't reference rows in any other type table.
create table requests (
request_id integer primary key,
request_type char(1) not null
-- You could also use a table to constrain valid types.
check (request_type in ('A', 'B', 'C')),
meta_1 char(1) not null,
meta_2 char(1) not null,
-- Foreign key constraints don't reference request_id alone. If they
-- did, they might reference the wrong type.
unique (request_id, request_type)
);
You need that apparently redundant unique constraint so the pair of columns can be the target of a foreign key constraint.
create table type_a (
request_id integer not null,
request_type char(1) not null default 'A'
check (request_type = 'A'),
primary key (request_id),
foreign key (request_id, request_type)
references requests (request_id, request_type) on delete cascade,
param_x char(1) not null,
param_y char(1) not null,
param_z char(1) not null
);
The check() constraint guarantees that only 'A' can be stored in the request_type column. The foreign key constraint guarantees that each row will reference an 'A' row in the table "requests". Other type tables are similar.
create table type_b (
request_id integer not null,
request_type char(1) not null default 'B'
check (request_type = 'B'),
primary key (request_id),
foreign key (request_id, request_type)
references requests (request_id, request_type) on delete cascade,
param_i char(1) not null,
param_j char(1) not null
);
Repeat for each type table.
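A quick way to convince yourself the composite-key trick actually rejects mismatched types is to try it in a throwaway database. Here is a sketch with Python and SQLite (the answer's DDL is generic SQL, not SQLite-specific; SQLite enforces foreign keys once the pragma is on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE requests (
        request_id   INTEGER PRIMARY KEY,
        request_type TEXT NOT NULL CHECK (request_type IN ('A', 'B', 'C')),
        UNIQUE (request_id, request_type)
    )""")
conn.execute("""
    CREATE TABLE type_a (
        request_id   INTEGER PRIMARY KEY,
        request_type TEXT NOT NULL DEFAULT 'A' CHECK (request_type = 'A'),
        FOREIGN KEY (request_id, request_type)
            REFERENCES requests (request_id, request_type) ON DELETE CASCADE
    )""")

conn.execute("INSERT INTO requests VALUES (1, 'B')")
try:
    # Request 1 is a 'B', so the composite FK must reject a type_a row for it.
    conn.execute("INSERT INTO type_a (request_id) VALUES (1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```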
I usually create one updatable view for each type. The views join the table "requests" with one type table. Application code uses the views instead of the base tables. When I do that, it usually makes sense to revoke privileges on the base tables. (Not shown.)
If you don't know which type something is, then there's no alternative to running one query to get the type, and another query to select or update.
select request_type from requests where request_id = 42;
-- Say it returns 'A'. I'd use the view type_a_only.
update type_a_only
set param_x = '!' where request_id = 42;
In my own work, it's pretty rare to not know the type, but it does happen sometimes.
The phrase you may be looking for is "how do I model inheritance in a relational schema". It's been asked before. Whilst this is a reference to object oriented software design, the basic question is the same: how do I deal with data where there is a "x is a type of y" relationship.
In your case, "request" is the abstract class, and typeA, TypeB etc. are the subclasses.
Your solution is one of the classic answers - "table per subclass". It's clean and easy to maintain, but does mean you can have multiple database access requests to retrieve the data.
SYSTEM.ADMIN(ADMIN)=> create table test ( name varchar(20), age int);
CREATE TABLE
SYSTEM.ADMIN(ADMIN)=> alter table test add column dob varchar(20) NOT NULL;
ERROR: ALTER TABLE: not null constraint for column "DOB" not allowed without default value
Do we have to specify a default value after NOT NULL even on an empty table?
SYSTEM.ADMIN(ADMIN)=> alter table test add column dob varchar(20) NOT NULL DEFAULT '0';
ALTER TABLE
Is this expected behavior?
You can create the table from scratch without specifying a default value.
create table test ( name varchar(20)
, age int
,dob varchar(20) NOT NULL );
However, when adding a column, Netezza (which derives from PostgreSQL) requires a default value to fill in the rows that would otherwise be NULL. This is expected. The sequence to remove the default afterwards is as follows:
create table test ( name varchar(20), age int);
ALTER TABLE test add column dob varchar(20) NOT NULL default 'a';
ALTER TABLE test ALTER COLUMN dob DROP DEFAULT;
This behavior is expected. When altering a table, Netezza uses a versioned table approach. If you add a column to a table, there will actually be two different table versions under the covers which are presented as a single table to the user.
The original table version (the one without the new NOT NULL DEFAULT column) is not modified until a GROOM VERSIONS collapses the versions back into a single underlying table. The upside is that the ALTER is fast because it doesn't require a scan/update of the existing rows; instead, the system knows to supply the DEFAULT value for the column that doesn't exist in the original underlying table version.
When altering a table to add a column with the NOT NULL property, the system requires a DEFAULT specification so that it knows how to represent the added column. This is required whether the table actually has any rows or not.
TESTDB.ADMIN(ADMIN)=> CREATE TABLE TEST ( NAME VARCHAR(20), AGE INT);
CREATE TABLE
TESTDB.ADMIN(ADMIN)=> insert into test values ('mine',5);
INSERT 0 1
TESTDB.ADMIN(ADMIN)=> ALTER TABLE TEST ADD COLUMN DOB VARCHAR(20) NOT NULL DEFAULT '0';
ALTER TABLE
TESTDB.ADMIN(ADMIN)=> insert into test values ('yours',50);
INSERT 0 1
TESTDB.ADMIN(ADMIN)=> select* from test;
NAME | AGE | DOB
-------+-----+-----
yours | 50 | 0
mine | 5 | 0
(2 rows)
The good news is that you can then alter the newly added column to remove that default.
TESTDB.ADMIN(ADMIN)=> ALTER TABLE TEST ALTER COLUMN DOB DROP DEFAULT;
ALTER TABLE
TESTDB.ADMIN(ADMIN)=> \d test
Table "TEST"
Attribute | Type | Modifier | Default Value
-----------+-----------------------+----------+---------------
NAME | CHARACTER VARYING(20) | |
AGE | INTEGER | |
DOB | CHARACTER VARYING(20) | NOT NULL |
Distributed on random: (round-robin)
Versions: 2
TESTDB.ADMIN(ADMIN)=> select * from test;
NAME | AGE | DOB
-------+-----+-----
yours | 50 | 0
mine | 5 | 0
(2 rows)
As a parting note, it's important to groom any versioned tables as promptly as possible to keep their versioned nature from degrading performance over time.
TESTDB.ADMIN(ADMIN)=> GROOM TABLE TEST VERSIONS;
NOTICE: Groom will not purge records deleted by transactions that started after 2015-07-27 01:32:16.
NOTICE: If this process is interrupted please either repeat GROOM VERSIONS or issue 'GENERATE STATISTICS ON "TEST"'
NOTICE: Groom processed 1 pages; purged 0 records; scan size unchanged; table size unchanged.
GROOM VERSIONS
TESTDB.ADMIN(ADMIN)=> \d test
Table "TEST"
Attribute | Type | Modifier | Default Value
-----------+-----------------------+----------+---------------
NAME | CHARACTER VARYING(20) | |
AGE | INTEGER | |
DOB | CHARACTER VARYING(20) | NOT NULL |
Distributed on random: (round-robin)
At this point the table is no longer a versioned table, and all values for the NOT NULL column are fully materialized.