Supertype/subtype database schema question - database

I have a database of resources with the typical address, email and all that jazz. One resources can be used by one or more counties. The resources are categorized by Education, Health Care and a couple others. A resource will only ever have one category so it cannot be education and health care, for instance. I would like to use the supertype/subtype relationship. Currently, no category (health care, education etc.) do not have any differing attributes. How could I amend my schema to accommodate that?
below is a screen cap of my current schema.
http://imgur.com/fbrFB

The whole point of a supertype/subtype structure is to gather the attributes common to all subtypes together in one table, the supertype, and to isolate the attributes unique to each subtype in separate tables.
If all your subtypes have the same attributes, what's the point?
I think you'll get more benefit from reconsidering how you're going to handle addresses. Anyone who has a PO box for mail is liable to have different ZIP codes for their physical and mailing addresses.

Related

When I should use one to one relationship?

Sorry for that noob question but is there any real needs to use one-to-one relationship with tables in your database? You can implement all necessary fields inside one table. Even if data becomes very large you can enumerate column names that you need in SELECT statement instead of using SELECT *. When do you really need this separation?
1 to 0..1
The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table with "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting-off the "0..1" portion into a separate table might save some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so a "cut-off point" where this strategy becomes viable can vary considerably.
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end-up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion or rows gets cached, so more of them can fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only the whole rows. By splitting the row, you are allowing a lock to take place on only one of its halfs.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?
My 2 cents.
I work in a place where we all develop in a large application, and everything is a module. For example, we have a users table, and we have a module that adds facebook details for a user, another module that adds twitter details to a user. We could decide to unplug one of those modules and remove all its functionality from our application. In this case, every module adds their own table with 1:1 relationships to the global users table, like this:
create table users ( id int primary key, ...);
create table users_fbdata ( id int primary key, ..., constraint users foreign key ...)
create table users_twdata ( id int primary key, ..., constraint users foreign key ...)
If you place two one-to-one tables in one, its likely you'll have semantics issue. For example, if every device has one remote controller, it doesn't sound quite good to place the device and the remote controller with their bunch of characteristics in one table. You might even have to spend time figuring out if a certain attribute belongs to the device or the remote controller.
There might be cases, when half of your columns will stay empty for a long while, or will not ever be filled in. For example, a car could have one trailer with a bunch of characteristics, or might have none. So you'll have lots of unused attributes.
If your table has 20 attributes, and only 4 of them are used occasionally, it makes sense to break the table into 2 tables for performance issues.
In such cases it isn't good to have everything in one table. Besides, it isn't easy to deal with a table that has 45 columns!
If data in one table is related to, but does not 'belong' to the entity described by the other, then that's a candidate to keep it separate.
This could provide advantages in future, if the separate data needs to be related to some other entity, also.
The most sensible time to use this would be if there were two separate concepts that would only ever relate in this way. For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. I accept that this is contrived example to demonstrate the point.
Another reason is that you want to specialize a concept in different ways. If you have a Person table and want to add the concept of different types of Person, such as Employee, Customer, Shareholder - each one of these would need different sets of data. The data that is similar between them would be on the Person table, the specialist information would be on the specific tables for Customer, Shareholder, Employee.
Some database engines struggle to efficiently add a new column to a very large table (many rows) and I have seen extension-tables used to contain the new column, rather than the new column being added to the original table. This is one of the more suspect uses of additional tables.
You may also decide to divide the data for a single concept between two different tables for performance or readability issues, but this is a reasonably special case if you are starting from scratch - these issues will show themselves later.
First, I think it is a question of modelling and defining what consist a separate entity. Suppose you have customers with one and only one single address. Of course you could implement everything in a single table customer, but if, in the future you allow him to have 2 or more addresses, then you will need to refactor that (not a problem, but take a conscious decision).
I can also think of an interesting case not mentioned in other answers where splitting the table could be useful:
Imagine, again, you have customers with a single address each, but this time it is optional to have an address. Of course you could implement that as a bunch of NULL-able columns such as ZIP,state,street. But suppose that given that you do have an address the state is not optional, but the ZIP is. How to model that in a single table? You could use a constraint on the customer table, but it is much easier to divide in another table and make the foreign_key NULLable. That way your model is much more explicit in saying that the entity address is optional, and that ZIP is an optional attribute of that entity.
not very often.
you may find some benefit if you need to implement some security - so some users can see some of the columns (table1) but not others (table2)..
of course some databases (Oracle) allow you to do this kind of security in the same table, but some others may not.
You are referring to database normalization. One example that I can think of in an application that I maintain is Items. The application allows the user to sell many different types of items (i.e. InventoryItems, NonInventoryItems, ServiceItems, etc...). While I could store all of the fields required by every item in one Items table, it is much easier to maintain to have a base Item table that contains fields common to all items and then separate tables for each item type (i.e. Inventory, NonInventory, etc..) which contain fields specific to only that item type. Then, the item table would have a foreign key to the specific item type that it represents. The relationship between the specific item tables and the base item table would be one-to-one.
Below, is an article on normalization.
http://support.microsoft.com/kb/283878
As with all design questions the answer is "it depends."
There are few considerations:
how large will the table get (both in terms of fields and rows)? It can be inconvenient to house your users' name, password with other less commonly used data both from a maintenance and programming perspective
fields in the combined table which have constraints could become cumbersome to manage over time. for example, if a trigger needs to fire for a specific field, that's going to happen for every update to the table regardless of whether that field was affected.
how certain are you that the relationship will be 1:1? As This question points out, things get can complicated quickly.
Another use case can be the following: you might import data from some source and update it daily, e.g. information about books. Then, you add data yourself about some books. Then it makes sense to put the imported data in another table than your own data.
I normally encounter two general kinds of 1:1 relationship in practice:
IS-A relationships, also known as supertype/subtype relationships. This is when one kind of entity is actually a type of another entity (EntityA IS A EntityB). Examples:
Person entity, with separate entities for Accountant, Engineer, Salesperson, within the same company.
Item entity, with separate entities for Widget, RawMaterial, FinishedGood, etc.
Car entity, with separate entities for Truck, Sedan, etc.
In all these situations, the supertype entity (e.g. Person, Item or Car) would have the attributes common to all subtypes, and the subtype entities would have attributes unique to each subtype. The primary key of the subtype would be the same as that of the supertype.
"Boss" relationships. This is when a person is the unique boss or manager or supervisor of an organizational unit (department, company, etc.). When there is only one boss allowed for an organizational unit, then there is a 1:1 relationship between the person entity that represents the boss and the organizational unit entity.
The main time to use a one-to-one relationship is when inheritance is involved.
Below, a person can be a staff and/or a customer. The staff and customer inherit the person attributes. The advantage being if a person is a staff AND a customer their details are stored only once, in the generic person table. The child tables have details specific to staff and customers.
In my time of programming i encountered this only in one situation. Which is when there is a 1-to-many and an 1-to-1 relationship between the same 2 entities ("Entity A" and "Entity B").
When "Entity A" has multiple "Entity B" and "Entity B" has only 1 "Entity A"
and
"Entity A" has only 1 current "Entity B" and "Entity B" has only 1 "Entity A".
For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. - I borrowed this example from #Steve Fenton's answer
Where a Driver can drive multiple Cars, just not at the same time. So the Car and Driver entities are 1-to-many or many-to-many. But if we need to know who the current driver is, then we also need the 1-to-1 relation.
Another use case might be if the maximum number of columns in the database table is exceeded. Then you could join another table using OneToOne

Why use a 1-to-1 relationship in database design?

I am having a hard time trying to figure out when to use a 1-to-1 relationship in db design or if it is ever necessary.
If you can select only the columns you need in a query is there ever a point to break up a table into 1-to-1 relationships. I guess updating a large table has more impact on performance than a smaller table and I'm sure it depends on how heavily the table is used for certain operations (read/ writes)
So when designing a database schema how do you factor in 1-to-1 relationships? What criteria do you use to determine if you need one, and what are the benefits over not using one?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in different pattern than the others, for example:
You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
You need different security on different columns, but your DBMS does not support column-level permissions.
Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating so you can still modify the other from your trigger (but there are other ways to work-around that).
Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...
See also: When I should use one to one relationship?
Separation of duties and abstraction of database tables.
If I have a user and I design the system for each user to have an address, but then I change the system, all I have to do is add a new record to the Address table instead of adding a brand new table and migrating the data.
EDIT
Currently right now if you wanted to have a person record and each person had exactly one address record, then you could have a 1-to-1 relationship between a Person table and an Address table or you could just have a Person table that also had the columns for the address.
In the future maybe you made the decision to allow a person to have multiple addresses. You would not have to change your database structure in the 1-to-1 relationship scenario, you only have to change how you handle the data coming back to you. However, in the single table structure you would have to create a new table and migrate the address data to the new table in order to create a best practice 1-to-many relationship database structure.
Well, on paper, normalized form looks to be the best. In real world usually it is a trade-off. Most large systems that I know do trade-offs and not trying to be fully normalized.
I'll try to give an example. If you are in a banking application, with 10 millions passbook account, and the usual transactions will be just a query of the latest balance of certain account. You have table A that stores just those information (account number, account balance, and account holder name).
Your account also have another 40 attributes, such as the customer address, tax number, id for mapping to other systems which is in table B.
A and B have one to one mapping.
In order to be able to retrieve the account balance fast, you may want to employ different index strategy (such as hash index) for the small table that has the account balance and account holder name.
The table that contains the other 40 attributes may reside in different table space or storage, employ different type of indexing, for example because you want to sort them by name, account number, branch id, etc. Your system can tolerate slow retrieval of these 40 attributes, while you need fast retrieval of your account balance query by account number.
Having all the 43 attributes in one table seems to be natural, and probably 'naturally slow' and unacceptable for just retrieving single account balance.
It makes sense to use 1-1 relationships to model an entity in the real world. That way, when more entities are added to your "world", they only also have to relate to the data that they pertain to (and no more).
That's the key really, your data (each table) should contain only enough data to describe the real-world thing it represents and no more. There should be no redundant fields as all make sense in terms of that "thing". It means that less data is repeated across the system (with the update issues that would bring!) and that you can retrieve individual data independently (not have to split/ parse strings for example).
To work out how to do this, you should research "Database Normalisation" (or Normalization), "Normal Form" and "first, second and third normal form". This describes how to break down your data. A version with an example is always helpful. Perhaps try this tutorial.
Often people are talking about a 1:0..1 relationship and call it a 1:1. In reality, a typical RDBMS cannot support a literal 1:1 relationship in any case.
As such, I think it's only fair to address sub-classing here, even though it technically necessitates a 1:0..1 relationship, and not the literal concept of a 1:1.
A 1:0..1 is quite useful when you have fields that would be exactly the same among several entities/tables. For example, contact information fields such as address, phone number, email, etc. that might be common for both employees and clients could be broken out into an entity made purely for contact information.
A contact table would hold common information, like address and phone number(s).
So an employee table holds employee specific information such as employee number, hire date and so on. It would also have a foreign key reference to the contact table for the employee's contact info.
A client table would hold client information, such as an email address, their employer name, and perhaps some demographic data such as gender and/or marital status. The client would also have a foreign key reference to the contact table for their contact info.
In doing this, every employee would have a contact, but not every contact would have an employee. The same concept would apply to clients.
Just a few samples from past projects:
a TestRequests table can have only one matching Report. But depending on the nature of the Request, the fields in the Report may be totally different.
in a banking project, an Entities table hold various kind of entities: Funds, RealEstateProperties, Companies. Most of those Entities have similar properties, but Funds require about 120 extra fields, while they represent only 5% of the records.

Implementating Generalizations When At The Logical Design Stage Of Database Design?

I am designing a database and as i do not have much experience in this subject, i am faced with a problem which i do not know how to go about solving.
In my conceptual model i have an object known as "Vehicle" which the customer orders and the stock system monitors. This supertype has two subtypes "Motorcar" and "Motorcycle". The user can order one or the other or even both.
Now that i am at the logical design stage, i need to know how i can have the system allow for two different types of products. The problem i have is that if i put each of the objects separate attributes into the same relation, then i will have columns that are of no use to some objects.
For example, if i just have a generic table holding both "Motorcars" and "Motorcycles" which i call "Vehicles" and all of their attributes, the cars will not need some of the motorcycle attributes and the motorcycle will not need all of the car attributes.
Is there a way to solve this issue?
The decision will need to be guided by the amount of shared information. I would start by identifying all the attributes and the rules about them.
If the majority of information is shared, you might not split into multiple tables. On the other hand, you can always split tables and then join into a view for ease of use.
For instance, you might have a vehicle table with only share information, and then a motorcar table with a foreign key to the vehicles table and a motorcycle table with a foreign key to the vehicles table. There is a certain difficulty ensuring that you don't have a motorocar row AND a motorcycle row referring to the same vehicle, and so there are other possibilities to mitigate that - but all that is unnecessary if the majority of information is common, you just have unused columns in a single vehicles table. You can even enforce with constraints to ensure that columns are NULL for types where they should not be filled in.

Shared Entity in one to many database relationship

I have a database I'm working on the design for. I have manufacturers and I have distributors on separate tables containing practically the same information with few exceptions. Both groups have one-many contacts that need to be connected to them. I created a contact table to hold contact information, one!
Do I need a second contact table? I'm trying to make this as DRY as possible. How would that look? Thank you in advance
Maybe a case for the party-role pattern? Manufacturer and Distributor are roles played by Parties. Contacts apply to Parties, not the role(s) they play. So you'd have:
a table named Party
a table named ContactMethod (or similar)
a 1:M relationship from Party to ContactMethod
which would resolve the need for two Contact tables. How you model the roles side will depend on wider requirements. The canonical model would have:
a single supertype named Role
a M:M relationship from Party to Role
a subtype of Role for each specific role (Distributor and Manufacturer in your case).
(Note: as an aside, this also allows a Party to play both manufacturer and distributor roles - which may or may not be relevant).
There are 3 'standard' patterns for implementing a subtype hierarchy in relational tables:
table for entire hierarchy
table per leaf subtype
table per type
(1) would apply if you don't have any role-specific relationships. (However I suspect that's unlikely; there's probably information related to Distributors that doesn't apply to Manufacturers and vice-versa).
(2) means multiple relationships from Party (i.e. one to each role subtype).
(3) avoids both above but means an extra join in navigating from Party to its role(s).
Like I say, choice depends on wider reqs.
hth.

Database Design: Explain this schema

Full disclosure...Trying feverishly here to learn more about databases so I am putting in the time and also tried to get this answer from the source to no avail.
Barry Williams from databaseanswers has this schema posted.
Clients and Fees Schema
I am trying to understand the split of address tables in this schema. Its clear to me that the Addresses table contains the details of a given address. The Client_Addresses and Staff_Addresses tables are what gets me.
1) I understand the use of Primary Foreign Keys as shown but I was under the assumption that when these are used you don't have a resident Primary Key in that same table (date_address_from in this case). Can someone explain the reasoning for both and put it into words how this actually works out?
2) Why would you use date_address_from as the primary key instead of something like client_address_id as the PK? What if someone enters two addresses in one day would there be conflicts in his design? If so or if not, what?
3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?
Evaluation
First an Audit, then the specific answers.
This is not a Data Model. This is not a Database. It is a bucket of fish, with each fish drawn as a rectangle, and where the fins of one fish are caught in the the gills of another, there is a line. There are masses of duplication, as well as masses of missing elements. It is completely unworthy of using as an example to learn anything about database design from.
There is no Normalisation at all; the files are very incomplete (see Mike's answer, there are a hundred more problem like that). The other_details and eg.s crack me up. Each element needs to be identified and stored: StreetNo, ApartmentNo, StreetName, StreetType, etc. not line_1_number_street, which is a group.
Customer and Staff should be normalised into a Person table, with all the elements identified.
And yes, if Customer can be either a Person or an Organisation, then a supertype-subtype structure is required to support that correctly.
So what this really is, the technically accurate terms, is a bunch of flat files, with descriptions for groups of fields. Light years distant from a database or a relational one. Not ready for evaluation or inspection, let alone building something with. In a Relational Data Model, that would be approximately 35 normalised tables, with no duplicated columns.
Barry has (wait for it) over 500 "schemas" on the web. The moment you try to use a second "schema", you will find that (a) they are completely different in terms of use and purpose (b) there is no commonality between them (c) let's say there was a customer file in both; they would be different forms of customer files.
He needs to Normalise the entire single "schema" first,
then present the single normlaised data model in 500 sections or subject areas.
I have written to him about it. No response.
It is important to note also, that he has used some unrecognisable diagramming convention. The problem with these nice interesting pictures is that they convey some things but they do not convey the important things about a database or a design. It is no surprise that a learner is confused; it is not clear to experienced database professionals. There is a reason why there is a standard for modelling Relational databases, and for the notation in Data Models: they convey all the details and subtleties of the design.
There is a lot that Barry has not read about yet: naming conventions; relations; cardinality; etc, too many to list.
The web is full of rubbish, anyone can "publish". There are millions of good- and bad-looking "designs" out there, that are not worth looking at. Or worse, if you look, you will learn completely incorrect methods of "design". In terms of learning about databases and database design, you are best advised to find someone qualified, with demonstrated capability, and learn from them.
Answer
He is using composite keys without spelling it out. The PK for client_addresses is client_id, address_id, date_address_from). That is not a bad key, evidently he expects to record addresses forever.
The notion of keeping addresses in a separate file is a good one, but he has not provided any of the fields required to store normalised addresses, so the "schema" will end up with complete duplication of addresses; in which case, he could remove addresses, and put the lines back in the client and staff files, along with their other_details, and remove three files that serve absolutely no purpose other than occupying disk space.
You are thinking about Associative Tables, which resolve the many-to-many relations in Databases. Yes, there, the columns are only the PKs of the two parent tables. These are not Associative Tables or files; they contain data fields.
It is not the PK, it is the third element of the PK.
The notion of a person being registered at more than one address in a single day is not reasonable; just count the one address they slept the most at.
Others have answered that.
Do not expect to identify any evidence of databases or design or Normalisation in this diagram.
1) In each of those tables the primary key is a compound key consisting of three attributes: (staff_id, address_id, date_address_from) and (client_id, address_id, date_address_from). This presumably means that the mapping of clients/staff to addresses is expected to change over time and that the history of those changes is preserved.
2) There's no obvious reason to create a new "id" attribute in those tables. The compound key does the job adequately. Why would you want to create the same address twice for the same client on the same date? If you did then that might be a reason to modify the design but that seems like an unlikely requirement.
3) No. The apparent purpose is that they are the applicable dates for the mapping of address to client/staff - not dates applicable to the address alone.
3) Along the lines of
normalization...Since both
date_address_from and date_address_to
are the same in the Client_Addresses
and Staff_Addresses table should those
fields just not be included in the
main Address table?
No. But you did find a problem.
The designer has decided that clients and staff are two utterly different things. By "utterly different", I mean they have no attributes in common.
That's not true, is it? Both clients and staff have addresses. I'm sure most of them have telephones, too.
Imagine that someone on staff is also a client. How many places is that person's name stored? That person's address? Can you hear Mr. Rogers in the background saying, "Can you spell 'update anomaly'? . . . I knew you could."
The problem is that the designer was thinking of clients and staff as different kinds of people. They're not. "Client" describes a business relationship between a service provider (usually, that is, not a retailer) and a customer, which might be either a person or a company. "Staff" describes a employment relationship between a company and a person. Not different kinds of people--different kinds of relationships.
Can you see how to fix that?
This 2 extra tables enables you to have address history per one person.
You can have them both in one table, but since staff and client are separated, it is better to separate them as well (b/c client id =1 and staff id =1 can't be used on the same table of address).
there is no "single" solution to a design problem, you can use 1 person table and then add a column to different between staff and client. BUT The major Idea is that the DB should be clear, readable and efficient, and not to save tables.
about 2 - the pk is combined, both clientID, AddressID and from.
so if someone lives 6 month in the states, then 6 month in Israel, and then back to the states, to the same address - you need only 2 address in address table, and 3 in the client_address.
The idea of heaving the from_Date as part of the key is right, although it doesn't guaranty data integrity - as you also need manually to check that there isn't overlapping dates between records of the same person.
about 3 - no (look at 2).
Viewing the data model, i think:
1) PF means that the field is both part of the primary key of the table and foreign key with other table.
2) In the same way, the primary key of Staff_Addresses is {staff_id,address_id,date_adderess_from} not just date_adderess_from
3) The same that 2)
In reference to Staff_Addresses table, the Primary Key on date_address_from basically prevents a record with the same staff_id/address_id entered more than once. Now, i'm no DBA, but i like my PKs to be integers or guids for performance reasons/faster indexing. If i were to do this i would make a new column, say, Staff_Address_Id and make it the PK column and put a unique constraint on staff_id/address_id/date_address_from.
As for your last concern, Addresses table is really a generic address storage structure. It shouldn't care about date ranges during which someone resided there. It's better to be left to specific implementations of an address such as Client/Staff addresses.
Hope this helps a little.

Resources