Significance of Constraints in Snowflake - snowflake-cloud-data-platform

Snowflake allows UNIQUE, PRIMARY KEY, FOREIGN KEY and NOT NULL constraints, but I read that it enforces only the NOT NULL constraint. What, then, is the purpose of the other keys, and under what circumstances should we define them? I would appreciate any examples.
Thank you,
Prashanth.

They express intent, helping people understand your data models. Data modeling tools can use them to generate diagrams. You can also programmatically access them to validate data integrity yourself.

Constraints
Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.
Constraints are provided primarily for data modeling purposes and compatibility with other databases, as well as to support client tools that utilize constraints. For example, Tableau supports using constraints to perform join culling (join elimination), which can improve the performance of generated queries and cube refresh.
Constraints can also improve query performance:
Extended Constraint Properties
RELY | NORELY
Specifies whether a constraint in NOVALIDATE mode is taken into account during query rewrite.
By default, this constraint property is set to NORELY.
If you have ensured that the data in the table does comply with the constraints, you can change this to RELY to indicate that the query optimizer should expect the data in the table to adhere to the constraints. Setting this can improve query performance (e.g. by eliminating unnecessary joins).
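As a sketch of how this looks in practice (table and constraint names here are illustrative, not from the original question), a foreign key can be declared with the RELY property, or an existing constraint can be switched to RELY once you have verified the data yourself:

```sql
-- Hypothetical tables: declare an unenforced FK and tell the optimizer
-- it may trust it during query rewrite.
ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customers
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
  RELY;

-- Or flip the property on an existing constraint after validating the data:
ALTER TABLE orders
  MODIFY CONSTRAINT fk_orders_customers RELY;
```

Note that Snowflake still does not check the data; RELY is a promise from you to the optimizer, and queries can return wrong results if the promise is broken.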
Understanding How Snowflake Can Eliminate Redundant Joins
In some cases, a join on a key column can refer to tables that are not needed for the join. If your tables have key columns and you are using and enforcing the UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints, Snowflake can improve query performance by eliminating unnecessary joins on key columns.
Eliminating an Unnecessary Left Outer Join
Eliminating an Unnecessary Self-Join
Eliminating an Unnecessary Join on a Primary Key and Foreign Key
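As an example of the first case (continuing the hypothetical orders/customers tables, with the FK on orders.customer_id declared with RELY), the optimizer can drop the join entirely when the joined table contributes nothing to the result:

```sql
-- No columns from customers are selected, and the FK guarantees each
-- order matches at most one customer, so the join can be eliminated
-- and only orders is scanned.
SELECT o.order_id, o.amount
FROM orders o
LEFT OUTER JOIN customers c
  ON o.customer_id = c.customer_id;
```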

Related

Unique constraint still allows duplicate values

I have a table with a unique constraint on event_time and card_nr; nevertheless, I'm still able to insert duplicate values into the table using the statements below.
The screenshot below shows the table I used as well as my DDL & insert queries.
There is also an EXPLAIN statement to prove that the columns are unique.
Shouldn't an error be thrown, since I'm violating the constraint?
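The DDL from the question's screenshot is not reproduced here, but the behavior can be sketched with a minimal reconstruction (column types are assumptions):

```sql
CREATE TABLE events (
  event_time TIMESTAMP_NTZ,
  card_nr    VARCHAR,
  CONSTRAINT uq_event UNIQUE (event_time, card_nr)
);

-- In Snowflake, BOTH inserts succeed: the UNIQUE constraint is stored
-- as metadata only and is never checked at insert time.
INSERT INTO events VALUES ('2020-01-01 10:00:00', '1234');
INSERT INTO events VALUES ('2020-01-01 10:00:00', '1234');
```

In an enforcing database (PostgreSQL, SQL Server, etc.) the second insert would fail with a unique-violation error; in Snowflake you must deduplicate yourself, e.g. with QUALIFY ROW_NUMBER() or a MERGE.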
Supported Constraint Types
Snowflake supports the following constraint types from the ANSI SQL standard:
UNIQUE
PRIMARY KEY
FOREIGN KEY
NOT NULL
...
Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.
What is the point of supporting a unique constraint but not enforcing it?
Constraints
Constraints are provided primarily for data modeling purposes and compatibility with other databases, as well as to support client tools that utilize constraints. For example, Tableau supports using constraints to perform join culling (join elimination), which can improve the performance of generated queries and cube refresh.

Creating a SQL database without defining primary key

So in my work environment we don't use a 'primary key' as defined by SQL Server. In other words, we don't right-click a column and select "set as primary key".
We do however still have primary keys, we just use a unique ID column. In stored procedures we use these to access the data like you would in any relational database.
My question is: other than the built-in functionality that comes with defining a primary key in SQL Server (Entity Framework support, etc.), is there a good reason to use the 'primary key' functionality over just using a unique ID column and accessing your tables with it in your own stored procedures?
The biggest drawback I see (again, other than being able to use Entity Framework and the like) is that you have to keep track, mentally or otherwise, of which ID relates to which table.
There is nothing "special" about the PRIMARY KEY constraint. It's just a uniqueness constraint and you can achieve the same results by using the UNIQUE NOT NULL syntax to define your keys instead.
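To illustrate the point (SQL Server syntax; table names are made up for the example), these two declarations give the same uniqueness guarantee:

```sql
-- Declared as a primary key:
CREATE TABLE customers_pk (
  customer_id INT NOT NULL PRIMARY KEY,
  name        VARCHAR(100)
);

-- Declared as UNIQUE NOT NULL: the same key semantics.
CREATE TABLE customers_uq (
  customer_id INT NOT NULL UNIQUE,
  name        VARCHAR(100)
);
```

One practical difference in SQL Server: a PRIMARY KEY constraint is backed by a clustered index by default, while a UNIQUE constraint gets a nonclustered one, but either can be overridden with the CLUSTERED/NONCLUSTERED keywords.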
However, uniqueness constraints (i.e. keys in general, not "primary" keys specifically) are very important for data integrity reasons. They ensure that your data is unique which means that sensible, meaningful results can be derived from your data. It's extremely difficult to get accurate results from a database that contains duplicate data. Also, uniqueness constraints are required to enforce referential integrity between tables, which is another very important aspect of data integrity. Poor data integrity is a data management problem that costs businesses billions of dollars every year and that's the bottom line of why keys are important.
There is a further reason unique indexes are important: query optimization and performance. Unique indexes improve query performance. If your data is supposed to be unique, then creating a unique index on it will give the query optimizer the best chance of picking a good execution plan for your queries.
I think the drawback is not using the primary key at all and using a unique key constraint for something it wasn't intended to do.
Unique keys: You can have many of them. They are meant to offer a way to determine uniqueness among rows.
Primary key: like the Highlander, there can only be one. Its intended use is to identify the rows of the table.
I can't think of any good reason not to use a primary key. My opinion is that without a primary key, your table isn't actually a table. It's just a lump of data.
Follow-up: if you don't believe me, check out this guy who asked a bunch of DBAs if it was OK not to use a primary key:
Is it OK not to use a Primary Key When I don't Need one
There are philosophical and practical answers to your question.
The practical answer is that using the primary key constraint enforces "not null", and "unique". This protects you from application-level bugs.
The philosophical answer is that you want developers to operate at the highest possible level of abstraction, so that they don't have to stuff their brain full of detail when trying to solve problems.
Primary and foreign keys are abstractions that allow us to make assumptions about the underlying data model. We can think in terms of (business) entities, and their relationships.
In your workplace, you're forcing developers to think in terms of tables and indexes and conventions. You no longer think about "customers" and "orders" and "line items", but about software artefacts that represent those business entities, and the "we always represent uniqueness by a combination of a GUID and unique index" rule. That mental model is already complicated enough in most applications; you're just making it harder for yourselves, especially when bringing new developers into the team.

Do databases use foreign keys transparently?

Do database engines utilize foreign keys transparently or a query should explicitly use them?
Based on my experience, there is no explicit notion of foreign keys on a table beyond a constraint that maintains referential integrity and the fact that the key (a single field or a group of fields) is a key, which makes searches efficient.
To clarify why this is important, here is an example: I have a middleware (ArcGIS in my case) for which I can control the back-end database (so I can create keys, indices, etc.), and I usually use the front end (a RESTful API here). The middleware itself is a black box, and I want to provide effective tools that take advantage of the underlying DBMS's capabilities. So what I want to understand is: if I build foreign key constraints and run queries that, if implemented normally, would make use of those foreign keys, should I see performance improvements?
Is that generally the case, or do various engines do it differently? (I am using PostgreSQL.)
Foreign keys aren't there to improve performance. They're there to enforce data integrity. They will decrease performance for inserts/updates/deletes, but they make no difference to queries.
Some DBMSs will automatically add an index to the foreign key field, which may be where the confusion is coming from. Postgres does not do this; you'll need to create the index yourself. (And yes, the database will use this index transparently.)
As far as I know, database engines need explicit queries to use foreign keys. You have to write some sort of join query to get data from related tables.
However, some data-access frameworks hide the complexity of accessing data through foreign keys by providing a transparent way of reaching related tables, but I am not sure that provides much improvement in performance.
This depends entirely on the database engine.
In PostgreSQL constraints won't cause performance improvements directly, only indexes will do that.
CREATE INDEX is a PostgreSQL language extension. There are no provisions for indexes in the SQL standard.
However, adding some constraints will automatically create an index on the affected column(s) -- e.g. UNIQUE and PRIMARY KEY constraints create a btree index.
The FOREIGN KEY constraint won't create indexes on the referencing column(s), but:
A foreign key must reference columns that either are a primary key or form a unique constraint. This means that the referenced columns always have an index (the one underlying the primary key or unique constraint); so checks on whether a referencing row has a match will be efficient. Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.
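Putting that together in PostgreSQL (illustrative names, not from the question), the referenced side is indexed automatically but the referencing side is not:

```sql
CREATE TABLE customers (
  customer_id INT PRIMARY KEY        -- PK creates a unique btree index
);

CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT REFERENCES customers (customer_id)
  -- no index is created on orders.customer_id by the FK itself
);

-- Worth adding if you DELETE from or UPDATE keys in customers, so the
-- engine can find referencing rows without scanning all of orders:
CREATE INDEX orders_customer_id_idx ON orders (customer_id);
```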

Obtain referential integrity at the expense of 2NF- is it a reasonable trade off?

Consider the following two tables:
Table A: [K1, K2, PropA]
Table B: [K3, PropB]
The primary key of Table A is composite [K1,K2].
The primary key of table B is K3.
There is an inclusion dependency between the tables: values in K3 have to be matched by values in K2. Unfortunately, since K2 is not unique on its own, I can't define a foreign key constraint on these columns.
As I see it, the solution is either to enforce it in the application layer or to propagate the K1 column to table B, so that it contains the entire primary key of table A as a foreign key.
My question: is this considered a good or bad practice in DB design? Assume that adding the extra column is not a problem from a maintenance perspective or integrity perspective (insert is transactional).
I am using MSSQL and Oracle.
Create table C, with just one attribute, K2, which is the primary key. Now you can reference K2 with a foreign key constraint.
It would probably be a bad idea to put the K1 attribute into table B. If I've understood you correctly, doing that would create a non-key dependency in violation of 2NF.
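The suggested design can be sketched like this (types are assumptions; the point is that both A and B now reference the shared key table C):

```sql
-- C holds the domain of valid K2 values.
CREATE TABLE C (
  K2 INT PRIMARY KEY
);

CREATE TABLE A (
  K1    INT,
  K2    INT NOT NULL REFERENCES C (K2),
  PropA VARCHAR(100),
  PRIMARY KEY (K1, K2)
);

CREATE TABLE B (
  K3    INT PRIMARY KEY REFERENCES C (K2),
  PropB VARCHAR(100)
);
```

Every K3 in B and every K2 in A must now exist in C, which gives the required inclusion without duplicating K1 into B.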
Your question does raise an important point: the often poor support for data integrity constraints offered in modern database software means database designers sometimes have to make some awkward compromises.
"... the solution is to either enforce it in the application layer or to propagate ..."
You are right, or you are wrong, depending on perspective.
The true solution would be for the DBMS vendors to support the general case of inclusion dependencies, of which the foreign key is only a special case.
But as long as the DBMS vendors don't do that, you are right: as long as you restrict yourself to those vendors' offerings, you have no choice but to move constraint enforcement to the application [*], or compromise the design with the ugly hacks needed to get your constraint enforced using just FKs.
A system that offers you a solution that involves neither of "enforcement in the application" and "screw up the design" is at shark.armchair.mb.ca/~erwin. Disclosure : that project is of my own making.
[*] or put all the constraint enforcement code in triggers, such that at least a new application cannot "forget" to enforce some given existing business rule.

Relationship between tables to prevent insertion of records where foreign key doesn't exist

Hi, I've set up two very basic tables. One table acts as a lookup, with an identity field as its primary key. The other table uses the lookup ID as a foreign key.
I have created a relationship constraint, so now I cannot delete from the lookup table if the foreign key is used in the "main" table.
However, my issue is that I can add a record with a foreign key value that doesn't exist.
To my way of thinking this shouldn't be allowed. Can anyone tell me what setting I need to use to enforce this, and whether this is typical database design or not?
Thanks Dave
Your way of thinking is correct. Good database design provides some way of enforcing what is called "referential integrity". This is simply a buzzword for the concept you have derived on your own: namely, that a foreign key should be rejected if it refers to a nonexistent row. For a general discussion of referential integrity, see the following Wikipedia article. It's short.
http://en.wikipedia.org/wiki/Referential_integrity
Some programmers would like to enforce referential integrity inside their programs. In general, it's a much better plan to define a referential integrity constraint inside the database and let the DBMS do the enforcement. It's easier, it's faster, and it's more effective.
The SQL Data Definition Language (DDL) provides a way to declare a foreign key constraint when you create a table. The syntax differs a little between different dialects of SQL, but it's basically the same idea in all of them. Here's a capsule summary.
http://www.w3schools.com/sql/sql_foreignkey.asp
The documentation for SQL Server should have a description of the referential integrity constraint under the CREATE TABLE command.
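For the scenario in the question, the SQL Server DDL looks roughly like this (table and column names are made up for the example):

```sql
CREATE TABLE lookup (
  lookup_id INT IDENTITY PRIMARY KEY,
  label     VARCHAR(50)
);

CREATE TABLE main (
  main_id   INT IDENTITY PRIMARY KEY,
  lookup_id INT NOT NULL
    CONSTRAINT fk_main_lookup REFERENCES lookup (lookup_id)
);

-- With the constraint in place, this insert is rejected when no
-- lookup row with lookup_id = 999 exists:
-- INSERT INTO main (lookup_id) VALUES (999);
```

If you can still insert orphan rows, check whether the relationship was created with enforcement disabled (in SQL Server, `WITH NOCHECK` / "Enforce Foreign Key Constraint" set to No in the designer).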
