Foreign Key Useful in SQLite? - database

I have two tables 'Elements' and 'Lists'
Lists has a primary key and a list name.
Elements has data pertaining to an individual entry in the list.
Elements needs a column that holds which list the element is in.
I've read about SQL's foreign key constraint and figure that it is the best way to link the tables, but I'm using SQLite, which doesn't enforce the foreign key constraint.
Is there a point to declaring the foreign key constraint if there is no enforcement?

It's always good to do, even if your database doesn't enforce the constraint (old MySQL, for instance). The reason is that someday someone will read your schema (perhaps even you).
If you can't use the new version, you can still declare the constraint and enforce it with triggers. In either case, I wouldn't omit the notation. It's far too helpful.
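As a rough illustration of the trigger approach, here is a minimal Python sketch using the standard sqlite3 module. Only the table names Elements and Lists come from the question; the column names (id, name, list_id, data), the trigger name, and the try_insert helper are assumptions made for the example. It covers inserts only; a fuller version would also need triggers for updates to Elements and deletes from Lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Keep built-in enforcement off so the trigger alone does the checking.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
CREATE TABLE Lists (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Elements (
    id      INTEGER PRIMARY KEY,
    list_id INTEGER NOT NULL REFERENCES Lists(id),  -- declared for readers of the schema
    data    TEXT
);
-- Hypothetical trigger: abort any insert whose list_id has no parent row.
CREATE TRIGGER elements_list_fk_insert
BEFORE INSERT ON Elements
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'Elements.list_id has no matching Lists.id')
    WHERE (SELECT id FROM Lists WHERE id = NEW.list_id) IS NULL;
END;
INSERT INTO Lists (id, name) VALUES (1, 'groceries');
""")


def try_insert(conn, list_id, data):
    """Insert an element; return False if the trigger rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Elements (list_id, data) VALUES (?, ?)",
                (list_id, data),
            )
        return True
    except sqlite3.DatabaseError:
        return False


print(try_insert(conn, 1, "milk"))    # list 1 exists, so this is accepted
print(try_insert(conn, 99, "ghost"))  # no list 99, so the trigger aborts it
```

The REFERENCES clause stays in the schema even though the trigger does the work, so the relationship is still documented for whoever reads it later.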

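SQLite now enforces foreign keys (since version 3.6.19), so upgrade if you are on an older release. Note that enforcement has to be switched on for each connection with PRAGMA foreign_keys = ON.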
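For what it's worth, here is a small sketch of what enforcement looks like from Python's standard sqlite3 module. The column names and the open_db helper are assumptions for the example, not part of the question. In a typical build enforcement is off until the connection asks for it, hence the PRAGMA.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a connection with foreign key enforcement switched on."""
    conn = sqlite3.connect(path)
    # Enforcement is a per-connection setting, usually off by default.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


conn = open_db()
conn.executescript("""
CREATE TABLE Lists (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Elements (
    id      INTEGER PRIMARY KEY,
    list_id INTEGER NOT NULL REFERENCES Lists(id),
    data    TEXT
);
INSERT INTO Lists (id, name) VALUES (1, 'groceries');
INSERT INTO Elements (list_id, data) VALUES (1, 'milk');
""")

try:
    # There is no list 99, so SQLite should refuse this row.
    conn.execute("INSERT INTO Elements (list_id, data) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

Wrapping the PRAGMA in a helper like open_db is one way to avoid forgetting it on a new connection.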

A foreign key is a field (or fields) that points to the primary key of another table. The purpose of the foreign key is to ensure referential integrity of the data. In other words, only values that are supposed to appear in the database are permitted.
It only enforces the "business rule". If you require this from the business side, then yes, it is required.
Indexing will not be affected.
You can still create indexes as required.
Have a look at Foreign Key and Wikipedia: Foreign key.

Related

ForeignKey column comes from a lot of tables

I have a table with a column named parentKey. This column holds keys (which by definition are foreign keys) to MANY other tables (at least 4). It seems strange to me to even create a column like this, and I haven't yet seen a table built this way, because you can't add a foreign key constraint when the column doesn't link to one single table. So I don't know if this should be allowed to exist. It's there and it works, but I'm not sure if I should leave it like this.
My idea is to create a column for each of the possible tables, name each one properly (MyTable1Key, MyTable2Key, and so on), and make them foreign keys. The problem is that if one of the foreign keys is assigned, the others will be null (and will never be assigned, so they will always stay null).
So should I leave this parentKey column as it is, or should I split it into separate columns linked to tables by foreign keys, and so have null values in some columns?
Unless you have a good reason, do not combine multiple foreign keys into a single column. As you've already noted it removes the referential integrity of your foreign key.
Either you will risk having a key which could belong to two tables or you have a master table somewhere that you should use as your foreign key reference. It is possible to have a primary key as a foreign key.
It sounds like you may be looking at the supertype-subtype pattern, in which case the question "How do I apply subtypes into an SQL Server database?" might give you some good ideas.
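A minimal sketch of the master-table (supertype-subtype) idea, written against SQLite through Python's sqlite3 module purely for illustration. Every table and column name except parentKey is hypothetical: parent is the supertype, project and folder stand in for the "many other tables", and item is the table that owns parentKey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: one row per possible parent, whatever its kind.
CREATE TABLE parent (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('project', 'folder'))
);
-- Subtypes: the primary key is also a foreign key to the supertype.
CREATE TABLE project (
    id    INTEGER PRIMARY KEY REFERENCES parent(id),
    title TEXT NOT NULL
);
CREATE TABLE folder (
    id   INTEGER PRIMARY KEY REFERENCES parent(id),
    path TEXT NOT NULL
);
-- The referencing table needs only one constrained column.
CREATE TABLE item (
    id        INTEGER PRIMARY KEY,
    parentKey INTEGER NOT NULL REFERENCES parent(id)
);
INSERT INTO parent (id, kind) VALUES (1, 'project'), (2, 'folder');
INSERT INTO project (id, title) VALUES (1, 'Website');
INSERT INTO folder (id, path) VALUES (2, '/tmp');
""")


def add_item(conn, parent_key):
    """Insert an item; return False if parent_key matches no parent row."""
    try:
        with conn:
            conn.execute("INSERT INTO item (parentKey) VALUES (?)", (parent_key,))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_item(conn, 1))  # parent 1 is a project
print(add_item(conn, 3))  # no parent 3 of any kind
```

With this layout parentKey can carry a real foreign key constraint, and no column has to sit permanently null.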

Do we need to explicitly mention which column is foreign key column?

When we create relational database tables, we have to use foreign key columns. It is obvious, otherwise we can not create relationships.
However, I noticed that it is enough to have a foreign key column, you do not need to say that there is a foreign key relationship in table A with table B.
As long as you can write the queries you can retrieve the data.
Do we use this concept to make things easy? I know that when I look at a database schema that marks which columns are foreign keys, it is easier to understand and start working with.
Are there any other reasons?
The point is Referential integrity. If you don't enforce it, sooner or later a bug in the code or some other accident happens and your database is left in an inconsistent state. These inconsistencies are very hard or impossible to fix afterwards.
When we create relational database tables, we have to use foreign key
columns. It is obvious, otherwise we can not create relationships.
Incorrect. You do not need to create foreign keys (though it's a good idea), and they do not represent relationships. They enforce the integrity of the relationship. A foreign key makes sure that a value in one column exists in another column.
However, I noticed that it is enough to have a foreign key column, you
do not need to say that there is a foreign key relationship in table A
with table B. As long as you can write the queries you can retrieve the data.
Yes, the relationship is based on the data itself, not on the inclusion of a foreign key. Also, foreign keys do not need to be between two tables; a table can have a foreign key to itself.
Do we use this concept to make things easy?
No, we use foreign keys to enforce integrity. That they happen to make ERD diagrams easier to understand is simply a bonus.

Do databases use foreign keys transparently?

Do database engines use foreign keys transparently, or should a query use them explicitly?
In my experience there is no explicit notion of a foreign key on a table, apart from a constraint that maintains uniqueness of the key and the fact that the key (a single field or a group of fields) is indexed, which makes searches efficient.
To clarify this, here is an example of why it is important: I have a middleware (ArcGIS in my case) for which I can control the back-end database (so I can create keys, indices, etc.), and I usually use the front end (a RESTful API here). The middleware itself is a black box that is meant to provide effective tools for taking advantage of the underlying DBMS's capabilities. So what I want to understand is this: if I build foreign key constraints and issue requests that, if implemented normally, would translate into queries using those foreign keys, should I see performance improvements?
Is that generally the case, or do various engines do it differently? (I am using PostgreSQL.)
Foreign keys aren't there to improve performance. They're there to enforce data integrity. They will decrease performance for inserts/updates/deletes, but they make no difference to queries.
Some DBMSs will automatically add an index to the foreign key field, which may be where the confusion is coming from. Postgres does not do this; you'll need to create the index yourself. (And yes, the database will use this index transparently.)
As far as I know, database engines need explicit queries to use foreign keys. You have to write some sort of join query to get data from related tables.
However, some data access frameworks hide this complexity by providing a transparent way of accessing data from related tables, though I am not sure that provides much improvement in performance.
This depends completely on the database engine.
In PostgreSQL constraints won't cause performance improvements directly, only indexes will do that.
CREATE INDEX is a PostgreSQL language extension. There are no provisions for indexes in the SQL standard.
However, adding some constraints will automatically create an index on the affected column(s). For example, UNIQUE and PRIMARY KEY constraints create a btree index on those columns.
The FOREIGN KEY constraint won't create indexes on the referencing column(s), but:
A foreign key must reference columns that either are a primary key or form a unique constraint. This means that the referenced columns always have an index (the one underlying the primary key or unique constraint); so checks on whether a referencing row has a match will be efficient. Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.

Does a foreign key automatically create an index?

I've been told that if I foreign key two tables, that SQL Server will create something akin to an index in the child table. I have a hard time believing this to be true, but can't find much out there related specifically to this.
My real reason for asking this is because we're experiencing some very slow response time in a delete statement against a table that has probably 15 related tables. I've asked our database guy and he says that if there is a foreign key on the fields, then it acts like an index. What is your experience with this? Should I add indexes on all foreign key fields or are they just unnecessary overhead?
A foreign key is a constraint, a relationship between two tables - that has nothing to do with an index per se.
However, it makes a lot of sense to index all the columns that are part of any foreign key relationship. An FK-relationship will often need to look up a relating table and extract certain rows based on a single value or a range of values.
So it makes good sense to index any columns involved in an FK, but an FK per se is not an index.
Check out Kimberly Tripp's excellent article "When did SQL Server stop putting indexes on Foreign Key columns?".
Wow, the answers are all over the map. So the Documentation says:
A FOREIGN KEY constraint is a candidate for an index because:
Changes to PRIMARY KEY constraints are checked with FOREIGN KEY constraints in related tables.
Foreign key columns are often used in join criteria when the data from related tables is combined in queries by matching the column(s) in the FOREIGN KEY constraint of one table with the primary or unique key column(s) in the other table. An index allows Microsoft® SQL Server™ 2000 to find related data in the foreign key table quickly. However, creating this index is not a requirement. Data from two related tables can be combined even if no PRIMARY KEY or FOREIGN KEY constraints are defined between the tables, but a foreign key relationship between two tables indicates that the two tables have been optimized to be combined in a query that uses the keys as its criteria.
So it seems pretty clear (although the documentation is a bit muddled) that it does not in fact create an index.
No, there is no implicit index on foreign key fields, otherwise why would Microsoft say "Creating an index on a foreign key is often useful". Your colleague may be confusing the foreign key field in the referring table with the primary key in the referred-to table - primary keys do create an implicit index.
Foreign keys do not create indexes. Only alternate key constraints (UNIQUE) and primary key constraints create indexes. This is true in Oracle and SQL Server.
In PostgreSQL you can check for indexes yourself by running \d tablename in psql.
You will see that btree indexes have been automatically created on columns with primary key and unique constraints, but not on columns with foreign keys.
I think that answers your question at least for postgres.
Say you have a big table called orders and a small table called customers. There is a foreign key from an order to a customer. Now if you delete a customer, SQL Server must check that no orders would be orphaned; if any would be, it raises an error.
To check whether there are any such orders, SQL Server has to search the big orders table. If there is an index, the search will be fast; if there is not, the search will be slow.
So in this case, the slow delete could be explained by the absence of an index, especially if SQL Server has to search 15 big tables without one.
P.S. If the foreign key has ON DELETE CASCADE, SQL Server still has to search the orders table, this time to remove any orders that reference the deleted customer.
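To make the effect of the index visible, here is a hedged sketch that uses SQLite (through Python's sqlite3 module) as a stand-in; it is not SQL Server, and the plan text differs between engines and versions. The orders and customers tables follow the example above, while the column names, the index name idx_orders_customer_id, and the child_lookup_plan helper are assumptions. It asks the engine how it would find the orders of one customer, which is the lookup a delete on customers has to perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);
""")


def child_lookup_plan(conn):
    """Return the plan text for finding the orders of a single customer."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM orders WHERE customer_id = ?", (1,)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_without_index = child_lookup_plan(conn)
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_with_index = child_lookup_plan(conn)

print(plan_without_index)  # expected to describe a full scan of orders
print(plan_with_index)     # expected to mention idx_orders_customer_id
```

In SQL Server the equivalent check would be to compare the execution plans of the slow DELETE before and after indexing the referencing columns.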
SQL Server autocreates indices for Primary Keys, but not for Foreign Keys. Create the index for the Foreign Keys. It's probably worth the overhead.
It depends. On MySQL an index is created if you don't create it on your own:
MySQL requires that foreign key columns be indexed; if you create a table with a foreign key constraint but no index on a given column, an index is created.
Source: https://dev.mysql.com/doc/refman/8.0/en/constraint-foreign-key.html
The same holds for MySQL 5.6.
Strictly speaking, foreign keys have absolutely nothing to do with indexes, yes. But, as the speakers above me pointed out, it makes sense to create one to speed up the FK-lookups. In fact, in MySQL, if you don't specify an index in your FK declaration, the engine (InnoDB) creates it for you automatically.
Not to my knowledge. A foreign key only adds a constraint that the value in the child key also be represented somewhere in the parent column. It's not telling the database that the child key also needs to be indexed, only constrained.
I notice that Entity Framework 6.1 pointed at MSSQL does automatically add indexes on foreign keys.

Primary Key versus Unique Constraint?

I'm currently designing a brand new database. In school, we always learned to put a primary key in each table.
I've read a lot of articles, discussions and newsgroup posts saying that it's better to use a unique constraint (aka unique index in some databases) instead of a PK.
What's your point of view?
A Primary Key is really just a candidate key that does not allow for NULL. As such, in SQL terms - it's no different than any other unique key.
However, for our non-theoretical RDBMS's, you should have a Primary Key - I've never heard it argued otherwise. If that Primary Key is a surrogate key, then you should also have unique constraints on the natural key(s).
The important bit to walk away with is that you should have unique constraints on all the candidate (whether natural or surrogate) keys. You should then pick the one that is easiest to reference in a Foreign Key to be your Primary Key*.
You should also have a clustered index*. This could be your Primary Key or a natural key, but it's not required to be either. You should pick your clustered index based on query usage of the table. When in doubt, the Primary Key is not a bad first choice.
Though it's technically only required to refer to a unique key in a foreign key relationship, it's accepted standard practice to greatly favor the primary key. In fact, I wouldn't be surprised if some RDBMS only allow primary key references.
Edit: It's been pointed out that Oracle's terms "clustered table" and "clustered index" mean something different than in SQL Server. The equivalent of what I'm speaking of in Oracle-ese is an Index-Organized Table, and it is recommended for OLTP tables, which, I think, would be the main focus of SO questions. I assume if you're responsible for a large OLAP data warehouse, you should already have your own opinions on database design and optimization.
Can you provide references to these articles?
I see no reason to change the tried and true methods. After all, Primary Keys are a fundamental design feature of relational databases.
Using UNIQUE to serve the same purpose sounds really hackish to me. What is their rationale?
Edit: My attention just got drawn back to this old answer. Perhaps the discussion that you read regarding PK vs. UNIQUE dealt with people making something a PK for the sole purpose of enforcing uniqueness on it. The answer to this is: if it IS a key, then make it a key; otherwise make it UNIQUE.
A primary key is just a candidate key (unique constraint) singled out for special treatment (automatic creation of indexes, etc).
I expect that the folks who argue against them see no reason to treat one key differently than another. That's where I stand.
[Edit] Apparently I can't comment even on my own answer without 50 points.
@chris: I don't think there's any harm. "Primary Key" is really just syntactic sugar. I use them all the time, but I certainly don't think they're required. A unique key is required, yes, but not necessarily a Primary Key.
It would be very rare denormalization that would make you want to have a table without a primary key. Primary keys have unique constraints automatically just by their nature as the PK.
A unique constraint would be used when you want to guarantee uniqueness in a column in ADDITION to the primary key.
The rule of always having a PK is a good one.
http://msdn.microsoft.com/en-us/library/ms191166.aspx
You should always have a primary key.
However, I suspect your question is just worded a bit misleadingly, and you actually mean to ask whether the primary key should always be an automatically generated number (also known as a surrogate key) or some unique field which is actual meaningful data (also known as a natural key), like SSN for people, ISBN for books and so on.
This question is an age old religious war in the DB field.
My take is that natural keys are preferable if they are indeed unique and never change. However, you should be careful: even something seemingly stable like a person's SSN may change under certain circumstances.
Unless the table is a temporary table to stage the data while you work on it, you always want to put a primary key on the table and here's why:
1 - a unique constraint can allow nulls but a primary key never allows nulls. If you run a query with a join on columns with null values you eliminate those rows from the resulting data set because null is not equal to null. This is how even big companies can make accounting errors and have to restate their profits. Their queries didn't show certain rows that should have been included in the total because there were null values in some of the columns of their unique index. Shoulda used a primary key.
2 - a unique index will automatically be placed on the primary key, so you don't have to create one.
3 - some database engines (SQL Server by default, MySQL's InnoDB always) put a clustered index on the primary key, which can make queries faster because the rows are stored in key order in the data blocks. (In SQL Server this can be altered to place the clustered index on a different index if that would speed up the queries.) If a table doesn't have a clustered index, the rows won't be stored contiguously in the data blocks, which can make queries slower because the read/write head has to travel all over the disk to pick up the data.
4 - many front end development environments require a primary key in order to update the table or make deletions.
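Point 1 above can be reproduced in a few lines. This is only a sketch on SQLite via Python's sqlite3 module; the invoices and payments tables and their columns are invented for the illustration. The UNIQUE column accepts a NULL, and the join then silently drops that row because NULL = NULL is not true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- ref is UNIQUE but nullable, which a PRIMARY KEY would normally forbid.
CREATE TABLE invoices (ref TEXT UNIQUE, amount INTEGER NOT NULL);
CREATE TABLE payments (ref TEXT, paid INTEGER NOT NULL);
INSERT INTO invoices (ref, amount) VALUES ('A1', 100), (NULL, 250);
INSERT INTO payments (ref, paid) VALUES ('A1', 100), (NULL, 250);
""")

# Total over the table itself: both rows count.
total_invoiced = conn.execute("SELECT SUM(amount) FROM invoices").fetchone()[0]

# Total over the join: the NULL row never matches, so it disappears.
total_joined = conn.execute("""
    SELECT SUM(i.amount)
    FROM invoices AS i
    JOIN payments AS p ON i.ref = p.ref
""").fetchone()[0]

print(total_invoiced, total_joined)  # 350 100
```

Declaring ref as NOT NULL (or as the primary key in an engine that enforces non-null primary keys) would have rejected the second invoice at insert time instead of hiding it at query time.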
Primary keys should be used in situations where you will be establishing relationships from this table to other tables that will reference this value. However, depending on the nature of the table and the data that you're thinking of applying the unique constraint to, you may be able to use that particular field as a natural primary key rather than having to establish a surrogate key. Of course, surrogate vs natural keys are a whole other discussion. :)
Unique keys can be used if there will be no relationship established between this table and other tables. For example, a table that contains a list of valid email addresses that will be compared against before inserting a new user record or some such. Or unique keys can be used when you have values in a table that has a primary key but must also be absolutely unique. For example, if you have a users table that has a user name. You wouldn't want to use the user name as the primary key, but it must also be unique in order for it to be used for log in purposes.
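The users example could look roughly like this. It is a sketch on SQLite through Python's sqlite3 module, and the column names and the register helper are assumptions: id is the surrogate primary key that other tables would reference, while user_name carries a separate unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,   -- surrogate key, the target of foreign keys
    user_name TEXT NOT NULL UNIQUE   -- not the PK, but must never repeat
);
INSERT INTO users (user_name) VALUES ('alice');
""")


def register(conn, user_name):
    """Insert a user and return the new id, or None if the name is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO users (user_name) VALUES (?)", (user_name,)
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


print(register(conn, "bob"))    # a new id
print(register(conn, "alice"))  # None, because the name already exists
```

If a user later changes their name, only the users row changes; rows in other tables keep pointing at the same id.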
We need to make a distinction here between logical constructs and physical constructs, and similarly between theory and practice.
To begin with: from a theoretical perspective, if you don't have a primary key, you don't have a table. It's just that simple. So, your question isn't whether your table should have a primary key (of course it should) but how you label it within your RDBMS.
At the physical level, most RDBMSs implement the Primary Key constraint as a Unique Index. If your chosen RDBMS is one of these, there's probably not much practical difference, between designating a column as a Primary Key and simply putting a unique constraint on the column. However: one of these options captures your intent, and the other doesn't. So, the decision is a no-brainer.
Furthermore, some RDBMSs make additional features available if Primary Keys are properly labelled, such as diagramming, and semi-automated foreign-key-constraint support.
Anyone who tells you to use Unique Constraints instead of Primary Keys as a general rule should provide a pretty damned good reason.
The thing is that a primary key can be one or more columns which uniquely identify a single record of a table, whereas a Unique Constraint is just a constraint on a field which allows only a single instance of any given data element in a table.
PERSONALLY, I use either GUID or auto-incrementing BIGINTS (Identity Insert for SQL SERVER) for unique keys utilized for cross referencing amongst my tables. Then I'll use other data to allow the user to select specific records.
For example, I'll have a list of employees, and have a GUID attached to every record that I use behind the scenes, but when the user selects an employee, they're selecting them based off of the following fields: LastName + FirstName + EmployeeNumber.
My primary key in this scenario is LastName + FirstName + EmployeeNumber while unique key is the associated GUID.
posts saying that it's better to use unique constraint (aka unique index for some db) instead of PK
I guess the only point here is the same old "natural vs. surrogate keys" discussion, because unique indexes and PKs are the same thing.
Translating:
posts saying that it's better to use natural key instead of surrogate key
I usually use both PK and UNIQUE KEY. Because even if you don't denote PK in your schema, one is always generated for you internally. It's true both for SQL Server 2005 and MySQL 5.
But I don't use the PK column in my SQLs. It is for management purposes like DELETEing some erroneous rows, finding out gaps between PK values if it's set to AUTO INCREMENT. And, it makes sense to have a PK as numbers, not a set of columns or char arrays.
I've written a lot on this subject: if you read anything of mine be clear that I was probably referring specifically to Jet a.k.a. MS Access.
In Jet, the tables are physically ordered on the PRIMARY KEY using a non-maintained clustered index (is clustered on compact). If the table has no PK but does have candidate keys defined using UNIQUE constraints on NOT NULL columns then the engine will pick one for the clustered index (if your table has no clustered index then it is called a heap, arguably not a table at all!) How does the engine pick a candidate key? Can it pick one which includes nullable columns? I really don't know. The point is that in Jet the only explicit way of specifying the clustered index to the engine is to use PRIMARY KEY. There are of course other uses for the PK in Jet e.g. it will be used as the key if one is omitted from a FOREIGN KEY declaration in SQL DDL but again why not be explicit.
The trouble with Jet is that most people who create tables are unaware of or unconcerned about clustered indexes. In fact, most users (I wager) put an autoincrement Autonumber column on every table and define the PRIMARY KEY solely on this column while failing to put any unique constraints on the natural key and candidate keys (whether an autoincrement column can actually be regarded as a key without exposing it to end users is another discussion in itself). I won't go into detail about clustered indexes here, but suffice it to say that IMO a sole autoincrement column is rarely the ideal choice.
Whatever your SQL engine, the choice of PRIMARY KEY is arbitrary and engine specific. Usually the engine will apply special meaning to the PK, therefore you should find out what it is and use it to your advantage. I encourage people to use NOT NULL UNIQUE constraints in the hope they will give greater consideration to all candidate keys, especially when they have chosen to use 'autonumber' columns which (should) have no meaning in the data model. But I'd rather folk chose one well-considered key and used PRIMARY KEY on it than put it on the autoincrement column out of habit.
Should all tables have a PK? I say yes because doing otherwise means at the very least you are missing out on a slight advantage the engine affords the PK and at worst you have no data integrity.
BTW Chris OC makes a good point here about temporal tables, which require sequenced primary keys (lowercase) which cannot be implemented via simple PRIMARY KEY constraints (SQL key words in uppercase).
PRIMARY KEY
1. NULL: It doesn't allow NULL values. In effect, PRIMARY KEY = UNIQUE KEY + NOT NULL constraint.
2. Index: By default it adds a clustered index (this is SQL Server behavior).
3. Limit: A table can have only one PRIMARY KEY (on one or more columns).
UNIQUE KEY
1. NULL: Allows a NULL value, but only one (in SQL Server; most other engines allow several).
2. Index: By default it adds a UNIQUE non-clustered index.
3. Limit: A table can have more than one UNIQUE key (each on one or more columns).
If you plan on using LINQ-to-SQL, your tables will require Primary Keys if you plan on performing updates, and they will require a timestamp column if you plan on working in a disconnected environment (such as passing an object through a WCF service application).
If you like .NET, PK's and FK's are your friends.
I submit that you may need both. Primary keys by nature need to be unique and not nullable. They are often surrogate keys, as integers create faster joins than character fields and especially than multiple-field character joins. However, as these are often autogenerated, they do not guarantee uniqueness of the data record excluding the id itself. If your table has a natural key that should be unique, you should have a unique index on it to prevent data entry of duplicates. This is a basic data integrity requirement.
Edited to add: It is also a real problem that real world data often does not have a natural key that truly guarantees uniqueness in a normalized table structure, especially if the database is people centered. Names, even name, address and phone number combined (think father and son in the same medical practice) are not necessarily unique.
I was thinking about this problem myself. If you use UNIQUE, you will violate second normal form (2NF), according to which every non-PK attribute has to depend on the PK. The attributes in this unique constraint should then be considered part of the PK.
Sorry for replying to this 7 years later, but I didn't want to start a new discussion.
