How should I design this table? - database

Because I have a poor memory, I want to write a simple application to store table column information, especially the meaning of table columns. Now I have a problem with my table design. I plan to make the table have the following columns:
id, table_name, column_name, data_type, is_pk, meaning
But this design can’t express the foreign key relationship. For example: table1.column1 and table2.column3 and table8.column5 all have the same data type and the same meaning, how can I modify my table design to express this information(relationship)?
Great thanks!
PS:
In fact, recently I'm working on a legacy application. The database is poorly designed. The foreign key relationship is not expressed on the database layer but on the application layer. Now my boss not allow us to modify the database. We just need to make the application working. So I can't do some work on the database directly.

Depending on your DBMS, you could probably use comments on the table / column to record the meaning of each one of those columns. Most DBMS allow you to perform some kind of annotation.
If you must have it in your table you have a few choices.
Free text If this is just to serve as a memory aid, it doesn't really need to be machine readable. This makes it easier for you to read / use directly.
fk_id Store the ID of the field this foreign key maps into. You could then define a view that pulls in the meaning column from this foreign key.
Meaning Table Store meaning as an ID into a seperate table and use a view to make it easier to work with.
Create a document Keep it in a document instead. That way you can print it out and have it handy.
You could try designing a full de-normalized schema for this, but I'd argue thats seriously over-thinking something that's just meant as a memory aid.

I would just add a column to your design "FK_Column_ID" that will hold a reference to column ID in case of a FK constraint.
The other way will be to create a duplicate of your DB as DBDefinitions or something like that.

Almost all DBMS allow you to attach descriptions or comments to table, index, and column definitions.
Oracle:
COMMENT ON COLUMN employees.job_id IS 'abbreviated job title';
If you specify foreign key relationships as part of the schema, the database will keep track of them, and will display them for you.

It is not possible to define a compound foreign key relationship with a single additional column. I would suggest that you create a second table to define the foreign keys, perhaps with the following columns:
id, fk_name, primary_table_id, foreign_table_id
and add a fk_id column to relate the fields used in the foreign key relationship. This works for both the single column foreign key and the compound foreign key.
Alternatively, and with some attempt at diplomacy, tell your boss that if you can't fix the root cause of an issue, then the time required to complete the project will be much longer than expected. First you will take some time to implement a work around which will not perform adequately, then you will take more time to implement the fix you should have implemented in the first place (which in this case is fixing the database.)

If you're not allowed to edit the database then presumably you're creating this in another standalone DBMS. I don't think is something you can acheive simply and you may well be better of just writing it up in a text document.
I think that you need more than one table. If you create a table of tables:
id, table_name, meaning
And then a table of columns:
id, column_name, datatype, meaning
You can then create a link table:
table_id, column_id, is_pk, meaning
This will enable you to have the same column linked to more than one table - thus expressing your foreign keys. As I said above though - it may be more effort than its worth.

FWIW, I do this quite often and the best "simple application" I've found is a spreadsheet.
I use a page for table/column defs, and extra pages as I need them for things like FK relationships, lookup values etc.
One of the great things about a spreadsheet for this app, is adding columns to the sheet as you need them, & removing them when you don't.
The indexing ability of a spreadsheet is also v. useful when you have a large number of tables to work with.

I know this does not answer your question directly, but how about using a database diagram?
I also have a poor memory (age I guess) and I always have an up to date diagram on my wall.
You can show all the tables, fields and foreign keys and also add comments.
I use the PowerAMC (aka PowerDesigner from Sybase) database designer, it also generates the SQL script to create the database, perhaps not very useful for legacy databases, although it will reverse engineer the database and create the diagram automatically (it can take some time to make the diagram readable).

I don't see a reason why you should implement some app to store some info there. You can as well use smth like OneNote or any other available organizer, development wiki, etc.: there are tons of ways to store info in such a way that it comes handy when you look up for it in future.
If you can make some inner changes, you can change keys' constraints names to readable pattern, like table1_colName_table2_colName.
And at least you can make some diagram, whether hand-made or using some design application.
If all this doesn't solve your problem, some more details are needed on what exactly you need to solve :)

Related

The foreign key usage/extra table

I recently started looking into Database design. I have worked with Oracle, but would now like to create logical or conceptual design relationships first, before I implement them into the database. Learning the basics you could say.
I would like to create a database for cars. I have some tables, but am having trouble with the relationships, and when to use a foreign key/extra table.
I have created a car table, and added attributes. Now it is very clear to me to use a manufacturer foreign key in the car table referencing the manufacturer table.
But for example I would like to show what type(SUV, sedan, etc.) the car is. Furthermore I would like to show What class(normal, Upperclass, etc.) the car is. Since I will only differentiate between a maximum of 5 car types, do I still need to add a foreign key? Same goes for the Class Situation as well.
I have heard to always use a foreign key, because it safeguards the integrity of the database, but at University my teacher always told us to use the Minimum amount of tables as possible, therefore putting me in an awkward spot.
What should I do?
I would greatly appreciate clarification in this matter.
Even in toy systems, you should use foreign keys and extra tables; learning this habit will serve you well when the database gets larger, or when you start designing real databases.
I imagine that the remark about "minimum amount of tables" (should be "minimum number") is to prevent you creating tables containing two values (e.g. "yes" and "no", "present" and "not present"), again a good habit to learn. It could also refer to queries, where you do not want to join unnecessary tables; this is liable to lead to the query retrieving duplicate rows.
Car type and car class are examples of what is called an enumerated domain. A domain is the set of values that an attribute may take on. In Oracle terms it's the set of values that may be placed in a given column. Oracle, unlike some other SQL dialects, does not have a CREATE DOMAIN statement. You may be able to use CREATE TYPE instead but I can't tell you how.
The simplest solution for you may be to just create a lookup table, as you suggest. Don't worry too much about added complexity here.
It is easier to manage 100 tables than 100,000 lines of code. That's an old slogan from the dba world, but it has some truth to it.

Database normalization - How not OK is it to have a table with no relationships?

I'm really new to database design, as I will now demonstrate:
I have an MS Sql database that I need to add a table to. The table contains information that pertains to another table. However, there are no candidates for primary keys (all fields can be duplicates). The only thing the table will ever be used for is to keep records that may be required for a certain kind of query, and they can be retrieved super-easily using a field that my other tables also contain (but never uniquely).
Specifically, my main table has a bunch of chemistry records. Each chemistry record is associated with another set of records called quality-control records (in my second table). They are associated by a field called "BatchID". The super-easy part is that I can say, "get all records with this BatchID" and get exactly what I need. But there can be multiple instances of any BatchID in both tables (in fact, there usually are), so I'd need to jump through hoops to link them. In a more general sense, in theory, is it OK to have a table floating around not attached to anything?
The overwhelmingly simple solution is to just put the quality control in the db with no relationships to the chemistry table. I'd need to insert at least one other table to relate it to anything else, maybe more, and the only reason for complicating my life like that is that I don't want to violate some important precept of database design.
My question is, is it ever OK to just have a free-floating table in a database? Or is that right out?
Thanks for any help.
In theory, it's ok to have a table that doesn't have any foreign key constraints. But the table you describe (both tables you describe) should probably have a foreign key that references the table of batches. We'd expect the table of batches to have "BatchID" as its primary key.
The relational model requires tables to have at least one candidate key. It's almost always a bad idea to have a SQL table that doesn't have a candidate key.

Foreign Key Referencing Multiple Tables

I have a column with a uniqueidentifier that can potentially reference one of four different tables. I have seen this done in two ways, but both seem like bad practice.
First, I've seen a single ObjectID column without explicitly declaring it as a foreign key to a specific table. Then you can just shove any uniqueidentifier you want in it. This means you could potentially insert IDs from tables that are not part of the 4 tables I wanted.
Second, because the data can come from four different tables, I've also seen people make 4 different foreign keys. And in doing so, the system relies on ONE AND ONLY ONE column having a non-NULL value.
What's a better approach to doing this? For example, records in my table could potentially reference Hospitals(ID), Clinics(ID), Schools(ID), or Universities(ID)... but ONLY those tables.
Thanks!
You might want to consider a Type/SubType data model. This is very much like class/subclasses in object oriented programming, but much more awkward to implement, and no RDBMS (that I am aware of) natively supports them. The general idea is:
You define a Type (Building), create a table for it, give it a primary key
You define two or more sub-types (here, Hospital, Clinic, School, University), create tables for each of them, make primary keys… but the primary keys are also foreign keys that reference the Building table
Your table with one “ObjectType” column can now be built with a foreign key onto the Building table. You’d have to join a few tables to determine what kind of building it is, but you’d have to do that anyway. That, or store redundant data.
You have noticed the problem with this model, right? What’s to keep a Building from having entries in in two or more of the subtype tables? Glad you asked:
Add a column, perhaps “BuildingType”, to Building, say char(1) with allowed values of {H, C, S, U} indicating (duh) type of building.
Build a unique constraint on BuildingID + BuildingType
Have the BulidingType column in the subtables. Put a check constraint on it so that it can only ever be set to the value (H for the Hospitals table, etc.) In theory, this could be a computed column; in practice, this won't work because of the following step:
Build the foreign key to relate the tables using both columns
Voila: Given a BUILDING row set with type H, an entry in the SCHOOL table (with type S) cannot be set to reference that Building
You will recall that I did say it was hard to implement.
In fact, the big question is: Is this worth doing? If it makes sense to implement the four (or more, as time passes) building types as type/subtype (further normalization advantages: one place for address and other attributes common to every building, with building-specific attributes stored in the subtables), it may well be worth the extra effort to build and maintain. If not, then you’re back to square one: a logical model that is hard to implement in the average modern-day RDBMS.
Let's start at the conceptual level. If we think of Hospitals, Clinics, Schools, and Universities as classes of subject matter entities, is there a superclass that generalizes all of them? There probably is. I'm not going to try to tell you what it is, because I don't understand your subject matter as well as you do. But I'm going to proceed as if we can call all of them "Institutions", and treat each of the four as subclasses of Institutions.
As other responders have noted, class/subclass extension and inheritance are not built into most relational database systems. But there is plenty of assistance, if you know the right buzzwords. What follows is intended to teach you the buzzwords, in database lingo. Here is a summary of the buzzwords coming: "ER Generalization", "ER Specialization", "Single Table Inheritance", "Class Table Inheritance", "Shared Primary Key".
Staying at the conceptual level, ER modeling is a good way of understanding the data at a conceptual level. In ER modeling, there is a concept, "ER Generalization", and a counterpart concept "ER Specialization" that parallel the thought process I just presented above as "superclass/subclass". ER Specialization tells you how to diagram subclasses, but it doesn't tell you how to implement them.
Next we move down from the conceptual level to the logical level. We express the data in terms of relations or, if you will, SQL tables. There are a couple of techniques for implementing subclasses. One is called "Single Table Inheritance". The other is called "Class Table Inheritance". In connection with Class table inheritance, there is another technique that goes by the name "Shared primary Key".
Going forward in your case with class table inheritance, we first design a table called "Institutions", with an Id field, a name field, and all of the fields that pertain to institutions, no matter which of the four kinds they are. Things like mailing address fields, for instance. Again, you understand your data better than I do, and you can find fields that are in all four of your existing tables. We populate the id field in the usual way.
Next we design four tables called "Hospitals", "Clinics", "Schools", and "Universities". These will contain an id field, plus all of the data fields that pertain only to that kind of institution. For instance, a hospital might have a "bed capacity". Again, you understand your data better than I do, and you can figure these out from the fields in your existing tables that didn't make it into the Institutions table.
This is where "shared primary key" comes in. When a new entry is made into "Institutions", we have to make a new parallel entry into one of four specialized subclass tables. But we don't use some sort of autonumber feature to populate the id field. Instead, we put a copy of the id field from the "Institutions" table into the id field of the subclass table.
This is a little work, but the benefits are well worth the effort. Shared primary key enforces the one-to-one nature of the relationship between subclass entries and superclass entries. It makes joining superclass data and subclass data simple, easy, and fast. It eliminates the need for a special field to tell you which subclass a given institution belongs in.
And, in your case, it provides a handy answer to your original question. The foreign key you were originally asking about is now always a foreign key to the Institutions table. And, because of the magic of shared-primary-key, the foreign key also references the entry in the appropriate subclass table, with no extra work.
You can create four views that combine institution data with each of the four subclass tables, for convenience.
Look up "ER Specialization", "Class Table Inheritance", "Shared Primary Key", and maybe "Single Table Inheritance" on the web, and here in SO. There are tags for most of these concepts or techniques here in SO.
You could put a trigger on the table and enforce the referential integrity there. I don't think there's a really good out-of-the-box feature to implement this requirement.

How can I create a foreign key on a column, every record of which may refer to a column in one of several tables?

I'm in the process of creating a social network. It has several entities like news, photo, which can have comments. Since all comments have the same columns and behave the same way, and the only difference is their type — news, or photo, or something else to be added in the future — I decided to create one table for all comments with a column named type. It worked perfectly until I decided to add foreign keys to my database schema.
The comment table have a column parent, which refers to id of news or photo table, depending on the column type.
The problem is, I can't add a foreign key which refers to the unknown in advance table, and even more, which refers to several tables at once.
The whole database now uses foreign keys, except this one parent column in the comment table. It bothers me because it's the only place where I can't add a foreign key.
I'm sure I can't create such a foreign key; something in my database design needs to be changed. I decided to create one table for comments to be ready to add new comment types for new entities in the future — video, music, article, etc — and don't run into maintenance hell when I want to add one new column for all comments.
If I absolutely have to create a separate table for each comment type to be able to use foreign keys fully, I'll do that. But maybe another common solution to this problem already exists, and I'm just not aware of it?
Maybe I should create some sort of link table, which links the comment table with other entities' tables? But maybe this solution is even more complex than creating a separate table for each comment type?
Maybe I should have several columns in the comment table, like newsId, photoId, to which I can add foreign key?
These solutions just don't seem elegant to me, or I just misunderstand something. My whole perception of this issue might be plain wrong. That's why I'm here. Please share your ideas.
I think your problem is that you have several entities - news, photos. But these are all just types of (say) items. Like comments,items will probably have some attributes in common as well as some distinct attributes. One of those attributes will be the ability to be commented upon.
In this approach you have a table CommentableItems (1), with the common attributes. Then you have some sub-tables NewsItems, PhotoItems, etc. It is quite easy to set-up the keys for these tables to enforce the required one-to-one relationship. Obviously, Comments has a foreign key which references CommentableItems.
(1) Actually I would probably shoot myself rather than allow a table called something as ghastly as CommentableItems into my schema, but this is just for the sake of example.

Reasons not to use an auto-incrementing number for a primary key

I'm currently working on someone else's database where the primary keys are generated via a lookup table which contains a list of table names and the last primary key used. A stored procedure increments this value and checks it is unique before returning it to the calling 'insert' SP.
What are the benefits for using a method like this (or just generating a GUID) instead of just using the Identity/Auto-number?
I'm not talking about primary keys that actually 'mean' something like ISBNs or product codes, just the unique identifiers.
Thanks.
An auto generated ID can cause problems in situations where you are using replication (as I'm sure the techniques you've found can!). In these cases, I generally opt for a GUID.
If you are not likely to use replication, then an auto-incrementing PK will most likely work just fine.
There's nothing inherently wrong with using AutoNumber, but there are a few reasons not to do it. Still, rolling your own solution isn't the best idea, as dacracot mentioned. Let me explain.
The first reason not to use AutoNumber on each table is you may end up merging records from multiple tables. Say you have a Sales Order table and some other kind of order table, and you decide to pull out some common data and use multiple table inheritance. It's nice to have primary keys that are globally unique. This is similar to what bobwienholt said about merging databases, but it can happen within a database.
Second, other databases don't use this paradigm, and other paradigms such as Oracle's sequences are way better. Fortunately, it's possible to mimic Oracle sequences using SQL Server. One way to do this is to create a single AutoNumber table for your entire database, called MainSequence, or whatever. No other table in the database will use autonumber, but anyone that needs a primary key generated automatically will use MainSequence to get it. This way, you get all of the built in performance, locking, thread-safety, etc. that dacracot was talking about without having to build it yourself.
Another option is using GUIDs for primary keys, but I don't recommend that because even if you are sure a human (even a developer) is never going to read them, someone probably will, and it's hard. And more importantly, things implicitly cast to ints very easily in T-SQL but can have a lot of trouble implicitly casting to a GUID. Basically, they are inconvenient.
In building a new system, I'd recommend using a dedicated table for primary key generation (just like Oracle sequences). For an existing database, I wouldn't go out of my way to change it.
from CodingHorror:
GUID Pros
Unique across every table, every database, every server
Allows easy merging of records from different databases
Allows easy distribution of databases across multiple servers
You can generate IDs anywhere, instead of having to roundtrip to the database
Most replication scenarios require GUID columns anyway
GUID Cons
It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes
The article provides a lot of good external links on making the decision on GUID vs. Auto Increment. If I can, I go with GUID.
It's useful for clients to be able to pre-allocate a whole bunch of IDs to do a bulk insert without having to then update their local objects with the inserted IDs. Then there's the whole replication issue, as mentioned by Galwegian.
The procedure method of incrementing must be thread safe. If not, you may not get unique numbers. Also, it must be fast, otherwise it will be an application bottleneck. The built in functions have already taken these two factors into account.
My main issue with auto-incrementing keys is that they lack any meaning
That's a requirement of a primary key, in my mind -- to have no other reason to exist other than identifying a record. If it has no real-world meaning, then it has no real-world reason to change. You don't want primary keys to change, generally speaking, because you have to search-replace your whole database or worse. I have been surprised at the sorts of things I have assumed would be unique and unchanging that have not turned out to be years later.
Here's the thing with auto incrementing integers as keys:
You HAVE to have posted the record before you get access to it. That means that until you have posted the record, you cannot, for example, prepare related records that will be stored in another table, or any one of a lot of other possible reasons why it might be useful to have access to the new record's unique id, before posting it.
The above is my deciding factor, whether to go with one method, or the other.
Using a unique identifiers would allow you to merge data from two different databases.
Maybe you have an application that collects data in multiple database and then "syncs" with a master database at various times in the day. You wouldn't have to worry about primary key collisions in this scenario.
Or, possibly, you might want to know what a record's ID will be before you actually create it.
One benefit is that it can allow the database/SQL to be more cross-platform. The SQL can be exactly the same on SQL Server, Oracle, etc...
The only reason I can think of is that the code was written before sequences were invented and the code forgot to catch up ;)
I would prefer to use a GUID for most of the scenarios in which the post's current method makes any sense to me (replication being a possible one). If replication was the issue, such a stored procedure would have to be aware of the other server which would have to be linked to ensure key uniqueness, which would make it very brittle and probably a poor way of doing this.
One situation where I use integer primary keys that are NOT auto-incrementing identities is the case of rarely-changed lookup tables that enforce foreign key constraints, that will have a corresponding enum in the data-consuming application. In that scenario, I want to ensure the enum mapping will be correct between development and deployment, especially if there will be multiple prod servers.
Another potential reason is that you deliberately want random keys. This can be desirable if, say, you don't want nosey browsers leafing through every item you have in the database, but it's not critical enough to warrant actual authentication security measures.
My main issue with auto-incrementing keys is that they lack any meaning.
For tables where certain fields provide uniqueness (whether alone or in combination with another), I'd opt for using that instead.
A useful side benefit of using a GUID primary key instead of an auto-incrementing one is that you can assign the PK value for a new row on the client side (in fact you have to do this in a replication scenario), sparing you the hassle of retrieving the PK of the row you just added on the server.
One of the downsides of a GUID PK is that joins on a GUID field are slower (unless this has changed recently). Another upside of using GUIDs is that it's fun to try and explain to a non-technical manager why a GUID collision is rather unlikely.
Galwegian's answer is not necessarily true.
With MySQL you can set a key offset for each database instance. If you combine this with a large enough increment it will for fine. I'm sure other vendors would have some sort of similar settings.
Lets say we have 2 databases we want to replicate. We can set it up in the following way.
increment = 2
db1 - offset = 1
db2 - offset = 2
This means that
db1 will have keys 1, 3, 5, 7....
db2 will have keys 2, 4, 6, 8....
Therefore we will not have key clashes on inserts.
The only real reason to do this is to be database agnostic (if different db versions use different auto-numbering techniques).
The other issue mentioned here is the ability to create records in multiple places (like in the central office as well as on traveling users' laptops). In that case, though, you would probably need something like a "sitecode" that was unique to each install that was prefixed to each ID.

Resources