I am writing a project management web app just for practice. The basic idea is that a user can add a project to the app and then manage their tasks and appointments related to the project through the interface. I'm currently designing the Database and I was wondering what best practice would dictate here.
I have 4 tables so far:
+----------+ +-------------+ +--------------+ +-------------+
|Users     | |Projects     | |Tasks         | |Appointments |
+----------+ +-------------+ +--------------+ +-------------+
|id        | |id           | |id            | |id           |
|username  | |project_name | |task_name     | |appt_name    |
|fname     | |project_desc | |task_details  | |appt_details |
|sname     | |             | |task_deadline | |appt_date    |
+----------+ +-------------+ +--------------+ +-------------+
I'm taking the basic relationships as:
one user can have many projects, tasks, and appointments.
one project can have many users, tasks and appointments.
one task can have many users, but only be associated with one project. A task can't be associated with an appointment.
The rules for the tasks also apply to the appointments.
My question is: when is it suitable to use mapping tables and when is it suitable to include the data directly in the associated table? My take on my example would be:
have a mapping table for each of users-projects/tasks/appts because there can be many users for each type and many of each type per user
in the tasks and appointments tables include a project_id field that can be used to associate tasks and appointments with projects and thereby the users of that project.
Would this be the correct approach or is there a better solution? I'm fairly new to database design so I would really appreciate some constructive criticism
I'm currently designing the Database and I was wondering what best practice would dictate here
Best Practice dictates that the data must be modelled, as data, without regard to the use or the app. Without regard to the platform as well; but the world is upside-down and backwards these days, and the platform is chosen first.
Modelling means that you identify and consider the entities first, before you consider what you are going to do with them second (such as "mapping").
No Option
My question is: when is it suitable to use mapping tables
It is the normal method.
Correct
theoretically founded
allows all functions and capabilities that users expect databases to have
eg. aggregation, single or multiple item (subset of the list) searches are very fast, etc
easy to expand
prevents preventable errors
gives you chips that you can cash in, in Heaven.
and when is it suitable to include the data directly in the associated table?
Never. That will create a comma-separated list in a single column.
Incorrect
No theoretical basis
breaks First Normal Form
beloved of the incompetent (they not only don't know the rules, they don't know when they are breaking the few rules they do know)
database features and functions cannot be used
eg. searching for, determining if, a specific user is working on a project will cause a tablescan
result is not a database, it is a Record Filing System
difficult to expand
you will spend half your life fixing preventable errors, and the other half thinking about how to replace it without letting anyone notice
guarantees you a specific place in hell, sixth level, with the frauds and those who cheat workers out of their wages, one level below murderers, one above pædophiles and war-mongers
have a mapping table for each of users-projects/tasks/appts because there can be many users for each type and many of each type per user
Generally, yes. But that is not clear. "Type" rings alarm bells; it sounds like you intend to have one table that supports all possibilities; nullable Foreign Keys; etc. Refer "Never" above.
There should be an Associative Table (not "mapping") between only those pairs of tables that need it, not between each and every possibility. And each such table relates ("links", "maps", "connects") just one discrete pair.
This will be resolved when the Normalisation is completed, next ...
Consideration
The requirement does sound a bit suspicious. I do not accept that those tables are all isolated, fragmentary facts. Consider:
First, Tasks are probably a child of Project (you've implied that; such a dependency should be explicit). Likewise, Appointments should be a child of Project. As in, a Task cannot exist except in the context of a Project. Likewise for an Appointment.
Then you have to evaluate whether Users should be related to Projects (as given in the requirement). It seems to me that a User is assigned to a Task (and thus related to the Project because the Task belongs to one Project), and not to all Tasks in the Project. Likewise for User::Appointment.
If Users are related to Projects (and not to specific Tasks), as per the requirement, it does seem too general. Especially if an Appointment applies to a Project, and therefore to all Users assigned to the Project.
So it appears to me on the info received thus far, plus my suggestions (which have not been confirmed, so this one is thin ice), that Appointments are made at the lower level, the Task level, and may well apply to all Users assigned to the Task.
There may be a second type of Appointment, at the Project level, which applies to the distinct set of all Users assigned to all Tasks in the Project.
As long as my suggestions above are correct (particularly that Users are assigned to Tasks, and that an Appointment made at the Task level applies to all Users assigned to that Task), then there are no Associative ("mapping") Tables at all.
IDs cannot provide row uniqueness. How do you ensure row uniqueness, as demanded of relational databases?
As you can see, stamping an ID column on every table that is perceived in the first draft of the model cripples, prevents, the data modelling exercise. You need 10 to 12 drafts. Somewhere around the fifth, you will assign Keys. At 9 or 10, you will assign IDs to the few tables (if any) that need them.
Assigning IDs first guarantees a first draft implementation in an RFS, which means no database integrity, no database capability.
Consider, confirm/deny, discuss, etc.
Here's a diagram to use as a discussion platform. Please use the link at the bottom of it, and familiarise yourself with the Notation, to whatever level you see fit.
Project Management ERD • Second Draft
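To make the argument above concrete, here is a minimal sketch (SQLite via Python; all table and column names are mine, illustrative only, not taken from the linked ERD) of a model where the Keys are the data: a Task is identified within its Project by a composite Key, so row uniqueness is guaranteed by the data itself rather than by a stamped ID, and an orphan Task cannot exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE project (
    project_name TEXT PRIMARY KEY,      -- the Key is the data, not a stamped ID
    project_desc TEXT
);
CREATE TABLE task (
    project_name  TEXT NOT NULL REFERENCES project (project_name),
    task_name     TEXT NOT NULL,
    task_deadline TEXT,
    -- composite Key: a Task is identified only within its Project
    PRIMARY KEY (project_name, task_name)
);
""")
conn.execute("INSERT INTO project VALUES ('Apollo', 'practice project')")
conn.execute("INSERT INTO task VALUES ('Apollo', 'build rocket', '2024-07-01')")
try:
    # the composite Key guarantees row uniqueness: the same Task twice is rejected
    conn.execute("INSERT INTO task VALUES ('Apollo', 'build rocket', '2024-08-01')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
try:
    # and an orphan Task, outside any Project, is rejected too
    conn.execute("INSERT INTO task VALUES ('Zeus', 'no such project', NULL)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
```

With an `id` column stamped on every table, neither of those errors would have been caught by the database.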
One suggestion may not sound like a technical one, more like grammar. When describing your entities and their relationships with each other, do not mention or even think about tables, columns or whatever. At the beginning of the design process, they are entities -- not tables, attributes -- not columns. Don't influence the physical design too early.
Do use words that closely match the relationships. For example, I doubt that in the normal course of conversation, one user will ask another if they "have a relationship" with a project. It will be more like "Are you involved in this project?" or "Are you working on this project?" So a user can be involved in many projects and a project can have many users involved in it. Be specific in naming just what the relationship is, but you don't have to get anal about it. There could be several close fits -- choose one and go on.
As for mapping tables, when you describe a many-to-many relationship, you don't really have much choice.
You do have a choice in a one-to-many relationship. A task, for example, is "performed for" only one project. This means that the FK to Project can be part of the Task tuple. But you can also implement a one-to-many mapping table. This is generally done when there seems to be at least a possibility that the relationship might evolve into a many-to-many sometime in the future.
The difference between a many-to-many and a one-to-many mapping table is trivial:
create table UserProjectMap(
UserID int not null,
ProjectID int not null,
constraint FK_UserProject_User foreign key( UserID )
references Users( ID ),
constraint FK_UserProject_Project foreign key( ProjectID )
references Projects( ID ),
constraint PK_UserProjectMap primary key( UserID, ProjectID )
);
create table TaskProjectMap(
TaskID int not null,
ProjectID int not null,
constraint FK_TaskProject_Task foreign key( TaskID )
references Tasks( ID ),
constraint FK_TaskProject_Project foreign key( ProjectID )
references Projects( ID ),
constraint PK_TaskProjectMap primary key( TaskID )
);
In case you missed it, it's the last line of each definition.
Converting a one-to-many mapping table to many-to-many is easy -- just drop the unique constraint on one side. Or, in the example above, redefine the PK to include both FK fields. That means no structural changes, which are extremely difficult to do when a design has been in use for any length of time -- unless you've prepared for them ahead of time.
But that's 500-level work.
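To see the "last line" difference in action, here is a small sketch (SQLite via Python; table names are mine, illustrative only) showing how the primary key alone flips the cardinality of a mapping table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OneToMany (TaskID INT NOT NULL, ProjectID INT NOT NULL,
                        PRIMARY KEY (TaskID));             -- one project per task
CREATE TABLE ManyToMany (TaskID INT NOT NULL, ProjectID INT NOT NULL,
                         PRIMARY KEY (TaskID, ProjectID)); -- many projects per task
""")
conn.execute("INSERT INTO OneToMany VALUES (1, 10)")
try:
    # a second project for the same task violates PK (TaskID): rejected
    conn.execute("INSERT INTO OneToMany VALUES (1, 20)")
    one_to_many_allows_second = True
except sqlite3.IntegrityError:
    one_to_many_allows_second = False

conn.execute("INSERT INTO ManyToMany VALUES (1, 10)")
conn.execute("INSERT INTO ManyToMany VALUES (1, 20)")  # accepted: PK covers both columns
many_rows = conn.execute("SELECT COUNT(*) FROM ManyToMany").fetchone()[0]
```

Identical columns, identical foreign keys; only the primary key definition decides which relationship the table can hold.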
Oh, one more piece of advice. Don't be too quick to denormalize or make any changes for no better reason than it will make queries or DML easier for the developers. The sole purpose of the database (and your goal as the designer) is to serve the needs of the users, not the db developers. At the top of that list of needs is data integrity. Don't sacrifice data integrity for a little more performance or for ease of maintenance. The DBAs may grumble, but the users will appreciate it -- and it's the users who ultimately pay your salary.
Related
I have a SQL Server 2008 database with a snowflake-style schema, so lots of different lookup tables, like Language, Countries, States, Status, etc. All these lookup tables have almost identical structures: Two columns, Code and Decode. My project manager would like all of these different tables to be one BIG table, so I would need another column, say CodeCategory, and my primary key columns for this big table would be CodeCategory and Code. The problem is that for any of the tables that have the actual code (say Language Code), I cannot establish a foreign key relationship into this big decode table, as the CodeCategory would not be in the fact table, just the code. And codes by themselves will not be unique (they will be within a CodeCategory), so I cannot make an FK from just the fact table code field into the Big lookup table Code field.
So am I missing something, or is this impossible to do and still be able to do FKs in the related tables? I wish I could do this: have a FK where one of the columns I was matching to in the lookup table would match to a string constant. Like this (I know this is impossible but it gives you an idea what I want to do):
ALTER TABLE [dbo].[Users] WITH CHECK ADD CONSTRAINT [FK_User_AppCodes]
FOREIGN KEY('Language', [LanguageCode])
REFERENCES [dbo].[AppCodes] ([AppCodeCategory], [AppCode])
The above does not work, but if it did I would have the FK I need. Where I have the string 'Language', is there any way in T-SQL to substitute the table name from code instead?
I absolutely need the FKs, so if nothing like this is possible, then I will have to stick with my many little lookup tables. Any assistance would be appreciated.
Brian
It is not impossible to accomplish this, but it is impossible to accomplish this and not hurt the system on several levels.
While a single lookup table (as has been pointed out already) is a truly horrible idea, I will say that this pattern does not require a single field PK or that it be auto-generated. It requires a composite PK comprised of ([AppCodeCategory], [AppCode]) and then BOTH fields need to be present in the fact table that would have a composite FK of both fields back to the PK. Again, this is not an endorsement of this particular end-goal, just a technical note that it is possible to have composite PKs and FKs in other, more appropriate scenarios.
The main problem with this type of approach to constants is that each constant is truly its own thing: Languages, Countries, States, Statii, etc. are all completely separate entities. While the structure of them in the database is the same (as of today), the data within that structure does not represent the same things. You would be locked into a model that either disallows adding additional lookup fields later (such as ISO codes for Language and Country but not the others, or something related to States that is not applicable to the others), or would require adding NULLable fields with no way to know which Category/ies they applied to (have fun debugging issues related to that and/or explaining to the new person -- who has been there for 2 days and is tasked with writing a new report -- that the 3 digit ISO Country Code does not apply to the "Deleted" status).
This approach also requires that you maintain an arbitrary "Category" field in all related tables. And that is per lookup. So if you have CountryCode, LanguageCode, and StateCode in the fact table, each of those FKs gets a matching CategoryID field, so now that is 6 fields instead of 3. Even if you were able to use TINYINT for CategoryID, if your fact table has even 200 million rows, then those three extra 1 byte fields now take up 600 MB, which adversely affects performance. And let's not forget that backups will take longer and take up more space, but disk is cheap, right? Oh, and if backups take longer, then restores also take longer, right? Oh, but the table has closer to 1 billion rows? Even better ;-).
While this approach looks maybe "cleaner" or "easier" now, it is actually more costly in the long run, especially in terms of wasted developer time, as you (and/or others) in the future try to work around issues related to this poor design choice.
Has anyone even asked your project manager what the intended benefit of this is? It is a reasonable question: if you are going to spend some number of hours making changes to the system, there should be a stated benefit for that time spent. It certainly does not make interacting with the data any easier, and in fact will make it harder, especially if you choose a string for the "Category" instead of a TINYINT or maybe SMALLINT.
If your PM still presses for this change, then it should be required, as part of that project, to also change any enums in the app code accordingly so that they match what is in the database. Since the database is having its values munged together, you can accomplish that in C# (assuming your app code is in C#, if not then translate to whatever is appropriate) by setting the enum values explicitly with a pattern of the first X digits are the "category" and the remaining Y digits are the "value". For example:
Assume the "Country" category == 1 and the "Language" category == 2; you could do:
enum AppCodes
{
// Countries
UnitedStates = 1000001,
Canada = 1000002,
SomewhereElse = 1000003,
// Languages
EnglishUS = 2000001,
EnglishUK = 2000002,
French = 2000003
};
Absurd? Completely. But also analogous to the request of merging all lookup tables into a single table. What's good for the goose is good for the gander, right?
Is this being suggested so you can minimise the number of admin screens you need for CRUD operations on your standing data? I've been here before and decided it was better/safer/easier to build a generic screen which used metadata to decide what table to extract from/write to. It was a bit more work to build but kept the database schema 'correct'.
All the standing data tables had the same basic structure, they were mainly for dropdown population with occasional additional fields for business rule purposes.
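A rough sketch of that metadata-driven idea (Python with SQLite; the `LOOKUPS` dict and `fetch_lookup` helper are my own illustrative names, not from the original system): one whitelisted metadata map lets a single generic routine serve every two-column standing-data table, so the schema keeps its separate, properly-constrained lookup tables while the admin code stays generic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Language (Code TEXT PRIMARY KEY, Decode TEXT);
CREATE TABLE Country  (Code TEXT PRIMARY KEY, Decode TEXT);
INSERT INTO Language VALUES ('en', 'English'), ('fr', 'French');
INSERT INTO Country  VALUES ('US', 'United States');
""")

# the metadata: which standing-data tables exist and their editable columns
LOOKUPS = {"Language": ("Code", "Decode"), "Country": ("Code", "Decode")}

def fetch_lookup(table):
    # whitelist check guards against SQL injection via the table name
    if table not in LOOKUPS:
        raise ValueError(f"unknown lookup table: {table}")
    code_col, decode_col = LOOKUPS[table]
    return conn.execute(f"SELECT {code_col}, {decode_col} FROM {table}").fetchall()

languages = fetch_lookup("Language")
```

The same map can drive the generic admin screen's insert/update paths, and occasional extra columns per table just become longer tuples in the metadata.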
I'm working on a database design for an application that will manage Model Congress groups. I'm having particular trouble with the object meant to represent the congress. Right now, the field list looks like this:
+------------+
| Congress   |
+------------+
| congressID |
| adminID    |
| speakerID  |
| hopperID   |
| floorID    |
| rulesID    |
| (etc.)     |
+------------+
This is what each field is meant to represent. Tables/objects are all caps.
congressID: The primary key (obviously)
adminID: References the unique PERSON who runs the model congress, i.e., the teacher.
speakerID: References the unique REP (representative) who acts as speaker for the congress
hopperID: References the special COMMITTEE (a committee being any place a bill can be sent) to which new bills are initially sent.
floorID: References the COMMITTEE used to represent the congress's floor
rulesID: References the Rules COMMITTEE
As you can see, these fields are important to reference in the context of each model congress. The issue I am having is how to represent the foreign keys, primarily the last four.
It seems I have two choices:
Include them all in the Congress table as they are now, or
Make smaller tables for each field with composite primary keys, e.g. repID+congressID in Speaker, committeeID+congressID in Hopper, etc.
Is more granular necessarily better? Or does this needlessly complicate things? I've been skirting around my design with the first layout for a while, but whenever I try to draw the ERD from that point, the relationships appear hopelessly mangled.
GROUPS should be a table that can define simply the # of the congress in the name or description (1st, 2nd, etc.), and each PERSON should have an ID, with fields for their names and biographics. Then, an M:N table (call it, say, GROUPMEMBERS) with its own ID, an ID referencing GROUPS, and a reference to either PERSON or GROUPS (two fields, so you could attach groups to other groups in a hierarchy). Then, you have a tag table (GROUPROLES) with its own ID, a begin date (and possibly an end date), a reference to GROUPMEMBERS, and a reference to a TYPES table defining one role per row within the membership.
What does this allow you to do? A person's role can now change mid-congress with the dates attached to the roles. Committees can be part of congress (named groups with the # of congress in the description or name), and roles within a committee (like chairman) can be assigned.
Drawbacks? Reporting out involves a bit of gymnastics, but not impossible. The table names aren't directly correlative, but they shouldn't need to be exposed to the application users. Constraints about number of members per role per group would need to be either set up as custom functions in the database, or application-controlled (not recommended).
This also allows you to set up dependent infrastructure that tracks flow of items through committees and you can even report out how many times a single person saw a single thing by their membership to groups the item went through. It can also be used to track votes and denormalize information to large group votes (like passing or dying in committee based on votes) simply by running an aggregate on the vote in question and referencing the group. If this were the case, I'd probably make roles a subclass of a historical table and run everything out of a central history.
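As a sketch of the tables described above (SQLite via Python; the table names come from the answer, but the columns and sample data are my own guesses at the intent), including the kind of query that finds the current holder of a role:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GROUPS       (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PERSON       (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE TYPES        (ID INTEGER PRIMARY KEY, RoleName TEXT);
-- M:N membership; either PersonID or ParentGroupID is filled, letting
-- groups (committees) nest inside other groups (the congress)
CREATE TABLE GROUPMEMBERS (ID INTEGER PRIMARY KEY,
                           GroupID INTEGER NOT NULL REFERENCES GROUPS(ID),
                           PersonID INTEGER REFERENCES PERSON(ID),
                           ParentGroupID INTEGER REFERENCES GROUPS(ID));
-- one role per row, dated so a role can change mid-congress
CREATE TABLE GROUPROLES   (ID INTEGER PRIMARY KEY,
                           MemberID INTEGER NOT NULL REFERENCES GROUPMEMBERS(ID),
                           TypeID INTEGER NOT NULL REFERENCES TYPES(ID),
                           BeginDate TEXT NOT NULL, EndDate TEXT);
INSERT INTO GROUPS  VALUES (1, '1st Congress');
INSERT INTO PERSON  VALUES (1, 'Rep. Smith');
INSERT INTO TYPES   VALUES (1, 'Speaker');
INSERT INTO GROUPMEMBERS VALUES (1, 1, 1, NULL);
INSERT INTO GROUPROLES   VALUES (1, 1, 1, '2024-01-01', NULL);
""")
# the current Speaker is the role row with no end date
speaker = conn.execute("""
    SELECT p.Name FROM GROUPROLES r
    JOIN GROUPMEMBERS m ON m.ID = r.MemberID
    JOIN PERSON p       ON p.ID = m.PersonID
    JOIN TYPES t        ON t.ID = r.TypeID
    WHERE t.RoleName = 'Speaker' AND r.EndDate IS NULL
""").fetchone()[0]
```

This is the "gymnastics" the answer mentions: every role lookup is a three-table join, but the history and hierarchy come for free.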
I have seen many posts on building database schemas for object tagging (such as dlamblin's post and Artilheiro's post as well).
What I cannot seem to find in my many days of research is the schema logic in implementing a tagging schema that allows for the tags to be assigned to a user (such as LinkedIn's Skills and Expertise system, where tags that have been added by the user can be indexed and searched). This could be as simple as changing the "object" in question to a user, but I have a feeling it is more complicated than that.
I want to be able to construct something almost exactly like this, except in categories. In example, if we took some of LinkedIn's skills and categorized them, we could have something like: IT/Computing, Retail, Project Management, etc.
I know there are a couple common methodologies and architectures to categorizing data, specifically Nested Set and Adjacency List. I have heard many things about both, such as "Nested Set's insertion and deletion are resource intensive", and "Adjacency List Models are awkward, finite, and don't cover unlimited depth."
So I have two questions wrapped into one post:
What would a rough example schema look like in regards to tagging skills to users, where they can be indexed and searched, or even be able to construct a pool of users for a specific tag?
What is the best way to categorize something of this nature in light of the necessity to have categorization?
Are there any other models that would suit this better that I am unaware of? (Oops, I think that is three questions)
What is the best way to categorize something of this nature in light of the necessity to have categorization?
Depends how much flexibility you need. For example, the adjacency list may be perfectly fine if you can assume the depth of your category hierarchy has a fixed limit of, say 1 or 2 levels.
Are there any other models that would suit this better that I am unaware of?
Path enumeration is a way to represent hierarchy in a concatenated list of the ancestor names. So each sub-category tag would name not only its own name, but its parent and any further grandparents up to the root.
You are already familiar with absolute pathnames in any shell environment: "/usr/local/bin" is a path enumeration of "usr", "local", and "bin" with the hierarchical relationship between them encoded in the order of the string.
This solution also has the possibility of data anomalies -- it's your responsibility to create an entry for "/usr/local" as well as "/usr/local/bin" and if you don't, some things start breaking.
What would a rough example schema look like in regards to tagging skills to users, where they can be indexed and searched, or even be able to construct a pool of users for a specific tag?
Implementing this in the database is almost as simple as naming the tags individually, but it requires that your tag "name" column be long enough to store the longest path in the hierarchy.
CREATE TABLE taguser (
tag_path VARCHAR(255),
user_id INT,
PRIMARY KEY (tag_path,user_id),
FOREIGN KEY (tag_path) REFERENCES tagpaths (tag_path),
FOREIGN KEY (user_id) REFERENCES users (user_id)
);
Indexing is exactly the same as simple tagging, but you can only search for sub-category tags if you specify the whole string from the root of the hierarchy.
SELECT user_id FROM taguser WHERE tag_path = '/IT/Computing'; -- uses index
SELECT user_id FROM taguser WHERE tag_path LIKE '%/Computing'; -- can't use index
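The flip side, worth noting, is that anchoring the wildcard at the end keeps the index usable: a pattern like `'/IT/%'` is a prefix, so in most engines it becomes a range scan on the `tag_path` index, which makes subtree queries ("everyone under /IT") cheap. A quick sketch (SQLite via Python, illustrative data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE taguser (tag_path TEXT, user_id INT,
                                      PRIMARY KEY (tag_path, user_id))""")
conn.executemany("INSERT INTO taguser VALUES (?, ?)",
                 [("/IT/Computing", 1), ("/IT/Networking", 2), ("/Retail", 3)])
# leading-anchored pattern: a range scan on the (tag_path, user_id) index;
# a leading wildcard like '%/Computing' would force a full scan instead
subtree = [r[0] for r in conn.execute(
    "SELECT user_id FROM taguser WHERE tag_path LIKE '/IT/%' ORDER BY user_id")]
```

So with path enumeration you trade suffix searches for fast whole-subtree searches.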
I think the best logic is the same as stated in the post you linked:
+--------+
| user   |
+--------+
| userid |
| ...    |
+--------+

+-----------+
| linktable |
+-----------+
| userid    | <- (fk and pk)
| tagid     | <- (fk and pk)
+-----------+

+-------+
| tag   |
+-------+
| tagid |
| ...   |
+-------+
Pretty much the way to go imo. If you want to categorise the tags you can always attach a category table to the tag table.
You didn't say which database, so I'm going to play devil's advocate and suggest how it would work in MongoDB. Create your user like this:
db.users.insert({
name: "bob",
skills: [ "surfing", "knitting", "eating"]
})
Then create an index on "skills". Mongo will add each skill in the array to the index, allowing quick lookups. Finding users with an intersection of 2 skills has similar performance to SQL databases, but the syntax is much nicer:
db.users.find({skills: {$in: ["surfing", "knitting"]}})
The upside is that a single disk seek will fetch all the information you need on a user. The downside is that it takes a lot more disk space, and somewhat more RAM. But if it can avoid disk seeks caused by joins, it could be a win.
Sorry for the noob question, but is there any real need to use a one-to-one relationship between tables in your database? You can implement all necessary fields inside one table. Even if the data becomes very large, you can enumerate the column names you need in the SELECT statement instead of using SELECT *. When do you really need this separation?
1 to 0..1
The "1 to 0..1" between super and sub-classes is used as part of the "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table with the "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting off the "0..1" portion into a separate table might bring some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so a "cut-off point" where this strategy becomes viable can vary considerably.
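A minimal sketch of that split (SQLite via Python; `person`/`employee` are my own illustrative names): the "0..1" side lives in its own table whose primary key doubles as the foreign key, and a LEFT JOIN reassembles the single-table, NULLable-fields view whenever it is needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- the "0..1" side: a row exists here only for the few people who are
-- employees; its PK doubles as the FK back to person
CREATE TABLE employee (id INTEGER PRIMARY KEY REFERENCES person(id),
                       salary NUMERIC NOT NULL);
""")
conn.execute("INSERT INTO person VALUES (1, 'Ann')")
conn.execute("INSERT INTO person VALUES (2, 'Bob')")
conn.execute("INSERT INTO employee VALUES (1, 50000)")
# LEFT JOIN reconstructs the merged view; non-employees show NULL salary
rows = conn.execute("""
    SELECT p.name, e.salary FROM person p
    LEFT JOIN employee e ON e.id = p.id ORDER BY p.id
""").fetchall()
```

No NULL is stored for Bob anywhere; the NULL only appears in the reassembled view.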
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion of rows gets cached, so more of them can fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only whole rows. By splitting the row, you are allowing a lock to take place on only one of its halves.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?
My 2 cents.
I work in a place where we all develop in a large application, and everything is a module. For example, we have a users table, and we have a module that adds facebook details for a user, another module that adds twitter details to a user. We could decide to unplug one of those modules and remove all its functionality from our application. In this case, every module adds their own table with 1:1 relationships to the global users table, like this:
create table users ( id int primary key, ...);
create table users_fbdata ( id int primary key, ..., constraint fk_fbdata_users foreign key (id) references users (id) );
create table users_twdata ( id int primary key, ..., constraint fk_twdata_users foreign key (id) references users (id) );
If you place two one-to-one tables in one, it's likely you'll have semantics issues. For example, if every device has one remote controller, it doesn't seem quite right to place the device and the remote controller, with their respective bunches of characteristics, in one table. You might even have to spend time figuring out if a certain attribute belongs to the device or the remote controller.
There might be cases, when half of your columns will stay empty for a long while, or will not ever be filled in. For example, a car could have one trailer with a bunch of characteristics, or might have none. So you'll have lots of unused attributes.
If your table has 20 attributes, and only 4 of them are used occasionally, it makes sense to break the table into 2 tables for performance issues.
In such cases it isn't good to have everything in one table. Besides, it isn't easy to deal with a table that has 45 columns!
If data in one table is related to, but does not 'belong' to the entity described by the other, then that's a candidate to keep it separate.
This could provide advantages in future, if the separate data needs to be related to some other entity, also.
The most sensible time to use this would be if there were two separate concepts that would only ever relate in this way. For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. I accept that this is a contrived example to demonstrate the point.
Another reason is that you want to specialize a concept in different ways. If you have a Person table and want to add the concept of different types of Person, such as Employee, Customer, Shareholder - each one of these would need different sets of data. The data that is similar between them would be on the Person table, the specialist information would be on the specific tables for Customer, Shareholder, Employee.
Some database engines struggle to efficiently add a new column to a very large table (many rows) and I have seen extension-tables used to contain the new column, rather than the new column being added to the original table. This is one of the more suspect uses of additional tables.
You may also decide to divide the data for a single concept between two different tables for performance or readability issues, but this is a reasonably special case if you are starting from scratch - these issues will show themselves later.
First, I think it is a question of modelling and of defining what constitutes a separate entity. Suppose you have customers with one and only one address. Of course you could implement everything in a single customer table, but if, in the future, you allow them to have 2 or more addresses, then you will need to refactor that (not a problem, but take a conscious decision).
I can also think of an interesting case not mentioned in other answers where splitting the table could be useful:
Imagine, again, you have customers with a single address each, but this time the address is optional. Of course you could implement that as a bunch of NULL-able columns such as ZIP, state, street. But suppose that, given that you do have an address, the state is not optional, but the ZIP is. How do you model that in a single table? You could use a constraint on the customer table, but it is much easier to split the address into another table and make the foreign key NULLable. That way your model is much more explicit in saying that the entity address is optional, and that ZIP is an optional attribute of that entity.
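One way to sketch this (SQLite via Python; table and column names are mine, illustrative only): the address is optional simply by its row being absent, while inside the address table `state` stays NOT NULL and `zip` stays NULLable, so the constraint is expressed directly in the schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- the address is optional per customer (its row may simply be absent),
-- but IF an address exists, state is mandatory and zip is not
CREATE TABLE customer_address (
    customer_id INTEGER PRIMARY KEY REFERENCES customer(id),
    street TEXT, state TEXT NOT NULL, zip TEXT
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'no-address customer')")
conn.execute("INSERT INTO customer VALUES (2, 'with-address customer')")
conn.execute("INSERT INTO customer_address VALUES (2, 'Main St', 'CA', NULL)")
try:
    # an address without a state is rejected by the NOT NULL constraint
    conn.execute("INSERT INTO customer_address VALUES (1, 'Elm St', NULL, '90210')")
    state_optional = True
except sqlite3.IntegrityError:
    state_optional = False
```

In the single-table version, "state is required only when an address is present" would need a CHECK constraint spanning several NULLable columns.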
not very often.
you may find some benefit if you need to implement some security - so some users can see some of the columns (table1) but not others (table2)..
of course some databases (Oracle) allow you to do this kind of security in the same table, but some others may not.
You are referring to database normalization. One example that I can think of in an application that I maintain is Items. The application allows the user to sell many different types of items (i.e. InventoryItems, NonInventoryItems, ServiceItems, etc...). While I could store all of the fields required by every item in one Items table, it is much easier to maintain to have a base Item table that contains fields common to all items and then separate tables for each item type (i.e. Inventory, NonInventory, etc..) which contain fields specific to only that item type. Then, the item table would have a foreign key to the specific item type that it represents. The relationship between the specific item tables and the base item table would be one-to-one.
Below, is an article on normalization.
http://support.microsoft.com/kb/283878
As with all design questions the answer is "it depends."
There are few considerations:
how large will the table get (both in terms of fields and rows)? It can be inconvenient to house your users' names and passwords with other, less commonly used data, from both a maintenance and a programming perspective
fields in the combined table which have constraints could become cumbersome to manage over time. For example, if a trigger needs to fire for a specific field, that's going to happen for every update to the table, regardless of whether that field was affected.
how certain are you that the relationship will be 1:1? As this question points out, things can get complicated quickly.
Another use case can be the following: you might import data from some source and update it daily, e.g. information about books. Then, you add data yourself about some books. Then it makes sense to put the imported data in another table than your own data.
I normally encounter two general kinds of 1:1 relationship in practice:
IS-A relationships, also known as supertype/subtype relationships. This is when one kind of entity is actually a type of another entity (EntityA IS A EntityB). Examples:
Person entity, with separate entities for Accountant, Engineer, Salesperson, within the same company.
Item entity, with separate entities for Widget, RawMaterial, FinishedGood, etc.
Car entity, with separate entities for Truck, Sedan, etc.
In all these situations, the supertype entity (e.g. Person, Item or Car) would have the attributes common to all subtypes, and the subtype entities would have attributes unique to each subtype. The primary key of the subtype would be the same as that of the supertype.
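The PK-sharing arrangement described above can be sketched in SQLite as follows: the subtype's primary key is simultaneously a foreign key to the supertype, which is what enforces the one-to-one relationship. Names (Person, Engineer, discipline) are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in
conn.executescript("""
    -- Supertype: attributes common to all subtypes.
    CREATE TABLE Person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- Subtype: its primary key IS the supertype's primary key.
    CREATE TABLE Engineer (
        person_id  INTEGER PRIMARY KEY REFERENCES Person(person_id),
        discipline TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO Person VALUES (1, 'Ada')")
conn.execute("INSERT INTO Engineer VALUES (1, 'software')")

# An Engineer row without a matching Person is rejected:
try:
    conn.execute("INSERT INTO Engineer VALUES (99, 'civil')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Because the subtype's key is a PRIMARY KEY as well as a FOREIGN KEY, each Person can have at most one Engineer row, giving the 1:1 cardinality for free.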
"Boss" relationships. This is when a person is the unique boss or manager or supervisor of an organizational unit (department, company, etc.). When there is only one boss allowed for an organizational unit, then there is a 1:1 relationship between the person entity that represents the boss and the organizational unit entity.
The main time to use a one-to-one relationship is when inheritance is involved.
Below, a person can be a staff member and/or a customer. The staff and customer tables inherit the person attributes. The advantage is that if a person is both a staff member AND a customer, their details are stored only once, in the generic person table. The child tables hold the details specific to staff and customers.
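The person/staff/customer design described can be sketched as follows (column names such as job_title and loyalty_points are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Generic parent: one row per person, whatever roles they hold.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        email     TEXT NOT NULL
    );
    -- Child tables share the parent's primary key (1:1 each).
    CREATE TABLE staff (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        job_title TEXT NOT NULL
    );
    CREATE TABLE customer (
        person_id      INTEGER PRIMARY KEY REFERENCES person(person_id),
        loyalty_points INTEGER NOT NULL DEFAULT 0
    );
""")
# One person who is both staff AND a customer: stored once in person.
conn.execute("INSERT INTO person VALUES (1, 'Sam', 'sam@example.com')")
conn.execute("INSERT INTO staff VALUES (1, 'Clerk')")
conn.execute("INSERT INTO customer VALUES (1, 120)")

row = conn.execute("""
    SELECT p.name, s.job_title, c.loyalty_points
    FROM person p
    JOIN staff    s ON s.person_id = p.person_id
    JOIN customer c ON c.person_id = p.person_id
""").fetchone()
print(row)  # ('Sam', 'Clerk', 120)
```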
In my time programming I have encountered this in only one situation: when there is both a 1-to-many and a 1-to-1 relationship between the same two entities ("Entity A" and "Entity B").
That is, "Entity A" has multiple "Entity B"s and each "Entity B" has only one "Entity A",
and
"Entity A" has only one current "Entity B" and that "Entity B" has only one "Entity A".
For example, a Car can only have one current Driver, and the Driver can only drive one Car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. (I borrowed this example from @Steve Fenton's answer.)
But a Driver can drive multiple Cars over time, just not at the same time. So the Car and Driver entities are in a 1-to-many or many-to-many relationship - and if we also need to know who the current driver is, we need the 1-to-1 relation as well.
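Both relationships can coexist in one schema, as in this sketch (table and column names are invented for illustration): a history table carries the many-to-many side, while a UNIQUE current_driver_id column on car carries the 1-to-1 "current" side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE driver (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- UNIQUE current_driver_id: a driver drives at most one car
    -- right now; the single column means a car has one current driver.
    CREATE TABLE car (
        id                INTEGER PRIMARY KEY,
        model             TEXT NOT NULL,
        current_driver_id INTEGER UNIQUE REFERENCES driver(id)
    );
    -- Who drove which car when: the many-to-many side.
    CREATE TABLE drive_history (
        car_id     INTEGER REFERENCES car(id),
        driver_id  INTEGER REFERENCES driver(id),
        started_at TEXT NOT NULL,
        PRIMARY KEY (car_id, driver_id, started_at)
    );
""")
conn.execute("INSERT INTO driver VALUES (1, 'Alice')")
conn.execute("INSERT INTO car VALUES (1, 'Sedan', 1)")
conn.execute("INSERT INTO drive_history VALUES (1, 1, '2024-01-01')")

current = conn.execute(
    "SELECT d.name FROM car c "
    "JOIN driver d ON d.id = c.current_driver_id WHERE c.id = 1"
).fetchone()[0]
print(current)  # Alice
```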
Another use case might be when the maximum number of columns in a database table would otherwise be exceeded. Then you could move the extra columns into a second table joined one-to-one.
The users I am concerned with can either be "unconfirmed" or "confirmed". The latter means they get full access, where the former means they are pending on approval from a moderator. I am unsure how to design the database to account for this structure.
One thought I had was to have 2 different tables: confirmedUser and unconfirmedUser that are pretty similar except that unconfirmedUser has extra fields (such as "emailConfirmed" or "confirmationCode"). This is slightly impractical as I have to copy over all the info when a user does get accepted (although I imagine it won't be that bad - not expecting heavy traffic).
The second way I imagined this would be to actually put all the users in the same table and have a key towards a table with the extra "unconfirmed" data if need be (perhaps also add a "confirmed" flag in the user table).
What are the advantages and disadvantages of each approach, and is there perhaps a better way to design the database?
The first approach means you'll need to write every query you have for two tables - for everything that's common. Bad (tm). The second option is definitely better. That way you can add a simple where confirmed = True (or False) as required for specific access.
What you could actually ponder over is whether or not the confirmation data (not the user, just the data) is stored in the same table. Perhaps it would be cleaner and more normalized to keep all confirmation data in a separate table, so you left join confirmation on confirmation.userid = users.id where confirmation.userid is not null (or similar, or an inner join, or get all rows and filter server-side, etc.) to get only confirmed users. The additional data like confirmation email, date, etc. can be stored here.
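The separate-confirmation-table idea can be sketched like this (column names such as confirmed_at are assumptions): a row in confirmation means the user is confirmed, and the left join's NULL/NOT NULL test splits the two groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    -- One optional row per user: its presence means "confirmed".
    CREATE TABLE confirmation (
        userid       INTEGER PRIMARY KEY REFERENCES users(id),
        confirmed_at TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, 'alice'), (2, 'bob')])
conn.execute("INSERT INTO confirmation VALUES (1, '2024-01-01')")

confirmed = [r[0] for r in conn.execute("""
    SELECT u.username FROM users u
    LEFT JOIN confirmation c ON c.userid = u.id
    WHERE c.userid IS NOT NULL
""")]
unconfirmed = [r[0] for r in conn.execute("""
    SELECT u.username FROM users u
    LEFT JOIN confirmation c ON c.userid = u.id
    WHERE c.userid IS NULL
""")]
print(confirmed, unconfirmed)  # ['alice'] ['bob']
```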
Personally I would go for your second option: 1 users table with a confirmed/pending column of type boolean. Copying over data from one table to another identical table is impractical.
You can then create groups and attach specific access rights to each group and assign each user to a specific group if the need arises.
Logically, this is inheritance (aka. category, subclassing, subtype, generalization hierarchy etc.).
Physically, inheritance can be implemented in 3 ways, as mentioned here, here, here and probably in many other places on SO.
In this particular case, the strategy with all types in the same table seems most appropriate [1], since the hierarchy is simple and unlikely to gain new subclasses, the subclasses differ by only a few fields, and you need to maintain a parent-level key (i.e. unconfirmed and confirmed users should not have overlapping keys).
[1] I.e. the "second way" mentioned in your question. Whether to also put the confirmation data in the same table depends on the needed cardinality - i.e. is there a 1:N relationship there?
The best way to do this is to have a users table with a StatusID as a foreign key; the Status table would hold all the different types and combinations of confirmation. In my opinion this is the best way to structure the database both for normalization and for your programming needs.
So your Status table would look like this:
StatusID | Description
=============================================
1 | confirmed
2 | unconfirmed
3 | CC confirmed
4 | CC unconfirmed
5 | acct confirmed CC unconfirmed
6 | all confirmed
user table
userID | StatusID
=================
456 | 1
457 | 2
458 | 2
459 | 1
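The status-lookup design above can be sketched and queried like this (a minimal SQLite illustration of the two tables shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: every confirmation state gets a row here.
    CREATE TABLE status (
        StatusID    INTEGER PRIMARY KEY,
        Description TEXT NOT NULL
    );
    CREATE TABLE users (
        userID   INTEGER PRIMARY KEY,
        StatusID INTEGER NOT NULL REFERENCES status(StatusID)
    );
""")
conn.executemany("INSERT INTO status VALUES (?, ?)",
                 [(1, 'confirmed'), (2, 'unconfirmed')])
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(456, 1), (457, 2), (458, 2), (459, 1)])

# Pull all fully confirmed users via the lookup table.
confirmed_ids = [r[0] for r in conn.execute("""
    SELECT u.userID FROM users u
    JOIN status s ON s.StatusID = u.StatusID
    WHERE s.Description = 'confirmed'
    ORDER BY u.userID
""")]
print(confirmed_ids)  # [456, 459]
```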
If you have a need for the confirmation code, you can store that inside the user table, and program it to change after it is used, so that the same field can be reused when a user needs to reset a password or whatever.
maybe I am assuming too much?