Reusing a database table for many other entities? Is this possible? - database

Say for example, I have an ADDRESS table, that will store similar attributes of other entities like address, city, zip, country, etc. The entities are USER, COMPANY, BANK, BRANCH, etc. I would like to use this one table ADDRESS to store the addresses of the other entities rather than creating other tables for each entity to store the ADDRESS like so, USER_ADDRESS, COMPANY_ADDRESS, BANK_ADDRESS, BRANCH_ADDRESS.
Is this possible? Am I breaking any laws or conventions? What are the consequences, if any?

Each entity (USER, COMPANY, etc.) should contain a reference to an entry in the ADDRESS table.
There are a few issues:
If 2 users have the same address, they should reference the same address id.
You will need to normalise addresses so that you're not duplicating information (e.g. in many countries, if you know the zip code, then you also know the city and country).
Of course, you may not want a well-normalised database. Saving the entire address as a string will improve read performance by reducing the number of join operations.
A lot of things depend on the exact use of the database.
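As a rough illustration of the shared-table approach described above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`users`, `company`, `address_id`, `country_code`) and the helper `get_or_create_address` are assumptions made for this example, not something from the question. The helper shows one way to make two entities with the same address reference the same address id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE address (
    id           INTEGER PRIMARY KEY,
    street       TEXT NOT NULL,
    city         TEXT NOT NULL,
    zip          TEXT NOT NULL,
    country_code TEXT NOT NULL,
    UNIQUE (street, city, zip, country_code)   -- one row per distinct address
);
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES address(id)
);
CREATE TABLE company (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES address(id)
);
""")


def get_or_create_address(street, city, zip_code, country_code):
    """Return the id of the matching address row, inserting it if needed."""
    key = (street, city, zip_code, country_code)
    row = conn.execute(
        "SELECT id FROM address"
        " WHERE street = ? AND city = ? AND zip = ? AND country_code = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO address (street, city, zip, country_code) VALUES (?, ?, ?, ?)",
        key,
    )
    return cur.lastrowid


# Two users at the same address end up sharing one ADDRESS row.
shared_id = get_or_create_address("1 Main St", "Springfield", "12345", "US")
conn.execute("INSERT INTO users (name, address_id) VALUES (?, ?)", ("Ann", shared_id))
conn.execute(
    "INSERT INTO users (name, address_id) VALUES (?, ?)",
    ("Bob", get_or_create_address("1 Main St", "Springfield", "12345", "US")),
)
address_count = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
```

Note that the exact-match lookup only deduplicates addresses that were typed identically; real address matching usually needs some normalisation first.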

It is fine to use a single ADDRESS table for that purpose and have an ADDRESS_ID in each of the other entities. It depends on the use case and the way you prefer to implement it. I most probably wouldn't do it. I also wouldn't do the other solution you're suggesting (an address table per entity).
So, let's say you want to implement a function to search for all the addresses, where it doesn't matter what type of entity is connected to it. You will have to search the ADDRESS table. If you get results, then you have to search the other four tables to see which record is connected to that address.
You could add a field ENTITY_TYPE in the ADDRESS table where you specify which type of entity it is connected to, so you don't have to search the four tables, but I don't recommend this since you can have consistency errors (USER 17 points to ADDRESS 14, but ADDRESS 14 has ENTITY_TYPE = BANK).
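The lookup described above can be written without an ENTITY_TYPE column by asking each entity table which of its rows reference the matching addresses. The sketch below uses Python's sqlite3 with only two entity tables to keep it short; the table names, sample rows and the function `owners_in_city` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES address(id)
);
CREATE TABLE bank (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES address(id)
);
INSERT INTO address VALUES (14, 'Lisbon'), (15, 'Porto');
INSERT INTO users VALUES (17, 'Ann', 14);
INSERT INTO bank VALUES (3, 'First Bank', 14), (4, 'Second Bank', 15);
""")


def owners_in_city(city):
    """Return (entity_type, entity_id, name) for every entity located in `city`.

    The entity type is derived from the table each row comes from, so it
    cannot disagree with the foreign key the way a stored ENTITY_TYPE could.
    """
    return conn.execute(
        """
        SELECT 'USER' AS entity_type, u.id, u.name
          FROM users u JOIN address a ON a.id = u.address_id
         WHERE a.city = :city
        UNION ALL
        SELECT 'BANK', b.id, b.name
          FROM bank b JOIN address a ON a.id = b.address_id
         WHERE a.city = :city
        ORDER BY 1, 2
        """,
        {"city": city},
    ).fetchall()
```

Each additional entity type costs one more branch in the UNION ALL, which is the price of keeping the address table free of back-references.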
Now, with your other solution (having four separate tables to store the addresses of the four different entities) you're just going to have to search those four tables and then search the corresponding entity table to get the entity you're looking for.
My solution in most cases is adding the address fields to the entities tables themselves. Having ADDRESS, ZIP_CODE and COUNTRY_CODE (always use proper country codes, not country names https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes) will make it simple. When you present a list of items (users, banks, companies, offices, whatever), it's really common to show the name and the address at the same time in a table. Having no JOINS makes it faster and easier to process. If you want to update an address, it's on the table itself. No lookups!
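A minimal sketch of the inline approach, again with Python's sqlite3. The `company` table, its sample row and the two-letter check on `country_code` (ISO 3166-1 alpha-2) are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    address      TEXT NOT NULL,
    zip_code     TEXT NOT NULL,
    country_code TEXT NOT NULL CHECK (length(country_code) = 2)
);
INSERT INTO company (name, address, zip_code, country_code)
VALUES ('Acme', '74 Example Road', '1000-001', 'PT');
""")

# Updating an address touches only the entity's own row.
conn.execute(
    "UPDATE company SET address = ?, zip_code = ? WHERE name = ?",
    ("12 Sample Avenue", "1000-002", "Acme"),
)

# Listing name and address together needs no JOIN.
listing = conn.execute(
    "SELECT name, address, zip_code, country_code FROM company ORDER BY name"
).fetchall()
```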
Of course, like most things in programming, it depends on what your needs are.
Also, please, don't try to split the ADDRESS in more fields. I've seen ADDRESS_TYPE (street, road, avenue, square, ...), STREET_NAME, STREET_NUMBER, BLOCK_NUMBER, BLOCK_FLOOR, BLOCK_LETTER. I'm pretty sure you're never going to need something like SELECT * FROM USER WHERE STREET_NUMBER = 74.

Related

How can I get rid of an "indirect" foreign key? Should I?

Here's an example schema to illustrate what I'm talking about:
Let's say I'm storing information about some activities (seminars, trainings, whatever) that are being hosted in a certain set of locations, identified by type (hackerspace, swimming pool, etc) and city. Each activity happens at all of the locations of a suitable type at once (e.g. any programming seminar happens at all of the hackerspaces at once), so any person may choose to attend an activity in any of the suitable locations. Therefore, any activity is associated only with some location type, while an attendance record is associated with some activity (and therefore implicitly with some location type) and the city where this particular user attended the activity.
The most common query in the system by far is generating a report of all activities attended by a given person.
Am I right in feeling that this is ugly? Should I try to redesign this, and if so, how?
P.S. I'd rather not reveal the actual data I'm storing in a database where I had to employ a similar design, so I hope that this analogy makes some sense.
It sounds like you need a LocationTypes table with a list of location types. Then, Location can have a foreign key relationship to LocationTypes.
But, I don't like assuming that the set of locations doesn't change over time. So that is overly simplistic. So, I would have another entity of something like LocationSets, which would list the locations for a given activity over time. The LocationSets would contain the "type" which can be used. The locations associated with a location set would be in another table, a junction table connecting the location sets and the locations.
Then Activities would have a LocationSetId. And Attendance would have a LocationId. You might want to enforce that at any given time, the Attendance location is consistent with the locations in the Activity's LocationSets. This could be done at the application layer, through a trigger in the database, or through a mechanism such as a function-based constraint (if your database supports those).
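One possible reading of this design, sketched with Python's sqlite3 and using the trigger option for the consistency rule. All table and column names, the sample data, and the helpers `attend` and `activities_for` are assumptions for illustration. The sketch ignores the "over time" aspect and models a single location set per activity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE location (
    id      INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES location_type(id),
    city    TEXT NOT NULL
);
CREATE TABLE location_set (
    id      INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES location_type(id)
);
-- Junction table: which locations belong to which set.
CREATE TABLE location_set_member (
    location_set_id INTEGER NOT NULL REFERENCES location_set(id),
    location_id     INTEGER NOT NULL REFERENCES location(id),
    PRIMARY KEY (location_set_id, location_id)
);
CREATE TABLE activity (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    location_set_id INTEGER NOT NULL REFERENCES location_set(id)
);
CREATE TABLE attendance (
    person      TEXT NOT NULL,
    activity_id INTEGER NOT NULL REFERENCES activity(id),
    location_id INTEGER NOT NULL REFERENCES location(id)
);
-- Reject attendance at a location outside the activity's location set.
CREATE TRIGGER attendance_location_check
BEFORE INSERT ON attendance
BEGIN
    SELECT RAISE(ABORT, 'location not in activity location set')
     WHERE NOT EXISTS (
        SELECT 1
          FROM activity a
          JOIN location_set_member m ON m.location_set_id = a.location_set_id
         WHERE a.id = NEW.activity_id AND m.location_id = NEW.location_id
     );
END;

INSERT INTO location_type VALUES (1, 'hackerspace'), (2, 'swimming pool');
INSERT INTO location VALUES (10, 1, 'Berlin'), (11, 2, 'Berlin');
INSERT INTO location_set VALUES (100, 1);
INSERT INTO location_set_member VALUES (100, 10);
INSERT INTO activity VALUES (1000, 'Programming seminar', 100);
""")


def attend(person, activity_id, location_id):
    """Record an attendance; return False if the trigger rejects it."""
    try:
        conn.execute(
            "INSERT INTO attendance VALUES (?, ?, ?)",
            (person, activity_id, location_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False


def activities_for(person):
    """The report from the question: activities a person attended, with city."""
    return conn.execute(
        "SELECT a.name, l.city FROM attendance t"
        " JOIN activity a ON a.id = t.activity_id"
        " JOIN location l ON l.id = t.location_id"
        " WHERE t.person = ? ORDER BY a.name",
        (person,),
    ).fetchall()
```

With this sample data, `attend("alice", 1000, 10)` is accepted because location 10 is in the seminar's set, while location 11 (a swimming pool) is rejected.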

customer-address, property-address and company-address

I am modelling a loan database for a friend.
A Customer can have 0 to N Addresses (a street address or a POBox address, or even more than one street address and more than one POBox address). A Property must have only one Address. A Company (employment info) must have only one Address.
It will be better to have a separate Addresses table for the Customers table. The address for Property and Company can go with Properties and Companies table.
But since we have an Addresses table here, do you think it is a good idea or not to share that Addresses table for Companies and Properties tables as well?
When we think about the relationship between entities, should we pick a single point in time (a static view?) or look at a range of time (a dynamic view?) to analyze their relationship? For example, a company can only have ONE address at a certain point in time, but that company may have moved from one place to another recently. Then a company may have more than one address for a certain range of time.
Customer would be better with a 1 to N than a 0 to N relationship: since you are making loans, you will want to know their address.
A Company (employment info) must have only one address.
Then a company may have more than one address for a certain range of time.
You are contradicting yourself a bit: why would you need the two addresses? I think the company will have just one official address until everything has moved to the new address, at which point you can update your DB to the new one.
But since we have an Addresses table here, do you think it is a good idea or not to share that Addresses table for Companies and Properties tables as well?
Yes
And here a nice link with some ideas on modelling:
http://www.databaseanswers.org/data_models/
A Company (employment info) must have only one Address.
Not necessarily. A Company can have a mailing address and a physical address.
Since we have an Addresses table here, do you think it is a good idea or not to share that Addresses table for Companies and Properties tables as well?
Yes, it's a good idea to put addresses in the Addresses table. Your Properties table would have an address row foreign key, and your Companies table would have 2 foreign keys, one for a mailing address and one for a physical address. The mailing address would be an optional (nullable) foreign key.
You would need a CustomerAddress table to maintain the 0 to N relationship between Customer and Address. If you want, you can also have a 0 to N relationship between Address and Customer.
The table would look like this.
CustomerAddress
---------------
CustomerAddress ID
Customer ID
Address ID
The CustomerAddress ID is the primary (clustering) index. It is an ascending integer or long, or some other unique ID.
You would have a unique index on (Customer ID, Address ID).
If you want to associate addresses with customers, you would have another unique index on (Address ID, Customer ID).
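The CustomerAddress table and its two unique indexes could look roughly like this in SQLite, driven from Python's sqlite3. The snake_case names, index names, sample rows and helper functions are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address  (id INTEGER PRIMARY KEY, line TEXT NOT NULL);
CREATE TABLE customer_address (
    id          INTEGER PRIMARY KEY,   -- ascending surrogate key
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    address_id  INTEGER NOT NULL REFERENCES address(id)
);
-- Find the addresses of a customer; also forbids duplicate pairs.
CREATE UNIQUE INDEX ux_customer_address
    ON customer_address (customer_id, address_id);
-- Find the customers at an address.
CREATE UNIQUE INDEX ux_address_customer
    ON customer_address (address_id, customer_id);

INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO address VALUES (10, '1 Main St'), (11, 'PO Box 42');
""")


def link(customer_id, address_id):
    """Attach an address to a customer; return False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO customer_address (customer_id, address_id) VALUES (?, ?)",
            (customer_id, address_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def addresses_of(customer_id):
    rows = conn.execute(
        "SELECT address_id FROM customer_address"
        " WHERE customer_id = ? ORDER BY address_id",
        (customer_id,),
    )
    return [r[0] for r in rows]


def customers_at(address_id):
    rows = conn.execute(
        "SELECT customer_id FROM customer_address"
        " WHERE address_id = ? ORDER BY customer_id",
        (address_id,),
    )
    return [r[0] for r in rows]


link(1, 10)  # Ann: street address
link(1, 11)  # Ann: PO box
link(2, 10)  # Bob shares the street address
```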
A company can only have ONE address at a certain point in time, but that company may have moved from one place to another recently. Then a company may have more than one address for a certain range of time.
If this information is important, then you have to include a date-written column in your CompanyAddress table. You would create a unique index on (Company ID, Date written descending). This way, the first row you retrieve from the CompanyAddress table for a company would be its most current address.
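A small sketch of that history table using Python's sqlite3. The names `company_address`, `date_written` and `current_address_id`, and the sample rows, are assumptions for the example. The query uses an explicit ORDER BY rather than relying on index order, since SQL does not guarantee row order without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_address (
    company_id   INTEGER NOT NULL,
    address_id   INTEGER NOT NULL,
    date_written TEXT    NOT NULL   -- ISO 8601 dates sort correctly as text
);
CREATE UNIQUE INDEX ux_company_address_date
    ON company_address (company_id, date_written DESC);

INSERT INTO company_address VALUES
    (1, 50, '2019-03-01'),
    (1, 51, '2023-07-15'),
    (2, 60, '2021-01-10');
""")


def current_address_id(company_id):
    """Return the most recently written address id, or None if there is none."""
    row = conn.execute(
        "SELECT address_id FROM company_address"
        " WHERE company_id = ?"
        " ORDER BY date_written DESC LIMIT 1",
        (company_id,),
    ).fetchone()
    return row[0] if row else None
```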
It seems like a very popular idea to put all Addresses in their own table. Developers love to seek out repetition and eliminate it. But in this case I would hesitate to dignify addresses with Entity status by putting them in their own dedicated table, because if, like most applications, you don't treat addresses as full-fledged entities, this gets overcomplicated.
If you treated addresses as real entities then if two companies somehow shared the same address, or one inhabited a location for a while, then another one inhabited that same location, then those companies would reference the same address. Because when your application was accepting an address as input it would go see if there was an existing address and reference it rather than just slam some garbage into the address table. Which one do you intend to do? I expect it's the slam one, which is fine, because like most business applications you totally don't care if the new address you're putting in is the same as some other address already in the database, you have no interest in tracking the addresses as individual things. And that's the difference between entities and cat food.
So with the consolidation we have to introduce an intersection table, and index it, and all our entities that have addresses have to join to it, we have to think about whether to get the address eagerly or use lazy loading. We chucked all the addresses into one bucket and have to work to make sure everybody can get to their own address quickly. For real entities this makes some sense because different things need to link to the same entity, but we established above that we don't care about that, nobody is sharing these entries.
Where's the repetition we're eliminating by consolidating addresses into one table? The addresses are going to end up in the database somewhere regardless, with the same fields, we're not saving space. The only repetition is in the DDL used to generate the schema, which we can manage by making a reusable component (where "component" is the Hibernate term) for the address (which addresses redundancy in the application code) and using the ORM tool to generate the schema. Or, worst case, just ignore it, addresses don't change that much, it's not your biggest problem.
These requirements you are describing sound suspiciously enterprise-y for a project you're doing for a friend. Possibly your friend's brain has been poisoned by overexposure to elaborate requirements concocted by committees who don't know what they're doing. It's bad enough we have to put up with this junk at work, but for personal projects? Try to talk him down.
But maybe your friend is outsourcing his enterprise-y work to you and you're stuck with 0-N addresses per customer. If so, contain the damage: make a table exclusively for customer addresses, so you don't need the intersection table, and put the other entities' addresses inline. Making these entities that have only one address go get their address from another table doesn't buy you anything but more joins. If you need history, write it to a separate history table where it's out of the way.

Separate tables for name and email addresses?

I've been fighting my initial decision since I began work on my database. I'm debating whether or not I need a separate table for email addresses. My database looks like this:
people(id, first_name, last_name, email)
addresses(id, address, street, city, state, zip, latitude, longitude)
addresses_people(id, person_id, address_id)
phone_numbers(id, person_id, phone_number, type)
I figured I didn't need a separate table for email addresses as I only wanted one per person, even if they have more. The problem I seem to have, though, is that some people will NOT have email addresses. Very often I will be storing children in the people table. Now it seems like it would be better design if I put the email addresses in a separate table to avoid the thousands of empty email fields I'll have.
It's a huge hassle to change this now as the app is already in production somewhat, but changing it now as opposed to a year or two from now would be exponentially easier. Is it worth a day or two to change the emails to another table?
In my opinion you are over-engineering. Optional email field is fine. Actually having a separate table might introduce much bigger overhead.
The only reason for a separate table is to model 1-N relationship if you expect the user to have more than one e-mail.
I believe there is no need to change it. Having some fields in a row be empty is not a big deal most of the time. In fact, it is common in most databases. The design of the database should depend on the objects you are trying to model primarily and not be concerned with separating things into tables simply for storage reasons unless you have serious storage constraints or other extenuating circumstances.
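To make the "optional field" option concrete, here is a short sketch with Python's sqlite3 that keeps `email` as a nullable column on the `people` table from the question. The sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
    id         INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    email      TEXT              -- nullable: children may have no email
);
INSERT INTO people (first_name, last_name, email) VALUES
    ('Dana', 'Reed', 'dana@example.com'),
    ('Sam',  'Reed', NULL),
    ('Kim',  'Reed', NULL);
""")

# People who can be emailed: a plain filter, no second table or JOIN.
with_email = conn.execute(
    "SELECT first_name, email FROM people WHERE email IS NOT NULL"
).fetchall()

# Missing emails are ordinary NULLs.
without_email_count = conn.execute(
    "SELECT COUNT(*) FROM people WHERE email IS NULL"
).fetchone()[0]
```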
See "Database design - empty fields", which also sheds a lot of light on your question.

Why use a 1-to-1 relationship in database design?

I am having a hard time trying to figure out when to use a 1-to-1 relationship in db design or if it is ever necessary.
If you can select only the columns you need in a query, is there ever a point in breaking up a table into 1-to-1 relationships? I guess updating a large table has more impact on performance than updating a smaller table, and I'm sure it depends on how heavily the table is used for certain operations (reads/writes).
So when designing a database schema how do you factor in 1-to-1 relationships? What criteria do you use to determine if you need one, and what are the benefits over not using one?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in different pattern than the others, for example:
You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
You need different security on different columns, but your DBMS does not support column-level permissions.
Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating so you can still modify the other from your trigger (but there are other ways to work-around that).
Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...
See also: When I should use one to one relationship?
Separation of duties and abstraction of database tables.
If I have a user and I design the system for each user to have an address, but then I change the system, all I have to do is add a new record to the Address table instead of adding a brand new table and migrating the data.
EDIT
Right now, if you wanted to have a person record and each person had exactly one address record, then you could have a 1-to-1 relationship between a Person table and an Address table, or you could just have a Person table that also had the columns for the address.
In the future maybe you made the decision to allow a person to have multiple addresses. You would not have to change your database structure in the 1-to-1 relationship scenario, you only have to change how you handle the data coming back to you. However, in the single table structure you would have to create a new table and migrate the address data to the new table in order to create a best practice 1-to-many relationship database structure.
Well, on paper, normalized form looks to be the best. In the real world it is usually a trade-off. Most large systems that I know make trade-offs and do not try to be fully normalized.
I'll try to give an example. Say you are in a banking application with 10 million passbook accounts, and the usual transaction is just a query of the latest balance of a certain account. You have table A that stores just that information (account number, account balance, and account holder name).
Your account also has another 40 attributes, such as the customer address, tax number, and IDs for mapping to other systems, which are in table B.
A and B have one to one mapping.
In order to be able to retrieve the account balance fast, you may want to employ different index strategy (such as hash index) for the small table that has the account balance and account holder name.
The table that contains the other 40 attributes may reside in different table space or storage, employ different type of indexing, for example because you want to sort them by name, account number, branch id, etc. Your system can tolerate slow retrieval of these 40 attributes, while you need fast retrieval of your account balance query by account number.
Having all the 43 attributes in one table seems to be natural, and probably 'naturally slow' and unacceptable for just retrieving single account balance.
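The split into tables A and B might be sketched as follows with Python's sqlite3. The table names, the three detail columns standing in for the 40 attributes, and the sample row are assumptions for the example. SQLite has no hash indexes or tablespaces, so the storage and indexing choices described above are DBMS-specific and not shown; only the 1-to-1 shape is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Table A: small, hot, queried constantly.
CREATE TABLE account_balance (
    account_no    INTEGER PRIMARY KEY,
    balance_cents INTEGER NOT NULL,
    holder_name   TEXT NOT NULL
);
-- Table B: wide, cold; shares the primary key, hence 1-to-1 with A.
CREATE TABLE account_detail (
    account_no         INTEGER PRIMARY KEY
                       REFERENCES account_balance(account_no),
    customer_address   TEXT,
    tax_number         TEXT,
    external_system_id TEXT
);
INSERT INTO account_balance VALUES (1001, 250000, 'J. Smith');
INSERT INTO account_detail VALUES (1001, '1 Main St', 'TX-123', 'CRM-9');
""")


def latest_balance(account_no):
    """The frequent query: reads only the narrow table."""
    row = conn.execute(
        "SELECT balance_cents FROM account_balance WHERE account_no = ?",
        (account_no,),
    ).fetchone()
    return row[0] if row else None


def full_account(account_no):
    """The rare query: pays for a JOIN to fetch the remaining attributes."""
    return conn.execute(
        "SELECT a.holder_name, a.balance_cents, d.customer_address, d.tax_number"
        " FROM account_balance a"
        " JOIN account_detail d ON d.account_no = a.account_no"
        " WHERE a.account_no = ?",
        (account_no,),
    ).fetchone()
```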
It makes sense to use 1-1 relationships to model an entity in the real world. That way, when more entities are added to your "world", they only also have to relate to the data that they pertain to (and no more).
That's the key really, your data (each table) should contain only enough data to describe the real-world thing it represents and no more. There should be no redundant fields as all make sense in terms of that "thing". It means that less data is repeated across the system (with the update issues that would bring!) and that you can retrieve individual data independently (not have to split/ parse strings for example).
To work out how to do this, you should research "Database Normalisation" (or Normalization), "Normal Form" and "first, second and third normal form". This describes how to break down your data. A version with an example is always helpful. Perhaps try this tutorial.
Often people are talking about a 1:0..1 relationship and call it a 1:1. In reality, a typical RDBMS cannot support a literal 1:1 relationship in any case.
As such, I think it's only fair to address sub-classing here, even though it technically necessitates a 1:0..1 relationship, and not the literal concept of a 1:1.
A 1:0..1 is quite useful when you have fields that would be exactly the same among several entities/tables. For example, contact information fields such as address, phone number, email, etc. that might be common for both employees and clients could be broken out into an entity made purely for contact information.
A contact table would hold common information, like address and phone number(s).
So an employee table holds employee specific information such as employee number, hire date and so on. It would also have a foreign key reference to the contact table for the employee's contact info.
A client table would hold client information, such as an email address, their employer name, and perhaps some demographic data such as gender and/or marital status. The client would also have a foreign key reference to the contact table for their contact info.
In doing this, every employee would have a contact, but not every contact would have an employee. The same concept would apply to clients.
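A sketch of the contact/employee/client layout with Python's sqlite3. Column names and sample rows are assumptions; the UNIQUE constraint on `contact_id` is what keeps each side at "zero or one" rather than "many".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (
    id      INTEGER PRIMARY KEY,
    address TEXT NOT NULL,
    phone   TEXT
);
CREATE TABLE employee (
    employee_number INTEGER PRIMARY KEY,
    hire_date       TEXT NOT NULL,
    contact_id      INTEGER NOT NULL UNIQUE REFERENCES contact(id)
);
CREATE TABLE client (
    id            INTEGER PRIMARY KEY,
    email         TEXT,
    employer_name TEXT,
    contact_id    INTEGER NOT NULL UNIQUE REFERENCES contact(id)
);
INSERT INTO contact VALUES
    (1, '1 Main St', '555-0100'),
    (2, '2 Oak Ave', '555-0101'),
    (3, '3 Elm Rd',  NULL);
INSERT INTO employee VALUES (7, '2020-01-15', 1);
INSERT INTO client VALUES (1, 'c@example.com', 'Acme', 2);
""")


def employee_contact(employee_number):
    """Return (hire_date, address, phone) for one employee."""
    return conn.execute(
        "SELECT e.hire_date, c.address, c.phone"
        " FROM employee e JOIN contact c ON c.id = e.contact_id"
        " WHERE e.employee_number = ?",
        (employee_number,),
    ).fetchone()


# Every employee has a contact, but not every contact has an employee.
contacts_without_employee = conn.execute(
    "SELECT COUNT(*) FROM contact c"
    " WHERE NOT EXISTS (SELECT 1 FROM employee e WHERE e.contact_id = c.id)"
).fetchone()[0]
```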
Just a few samples from past projects:
a TestRequests table can have only one matching Report. But depending on the nature of the Request, the fields in the Report may be totally different.
in a banking project, an Entities table holds various kinds of entities: Funds, RealEstateProperties, Companies. Most of those Entities have similar properties, but Funds require about 120 extra fields, while they represent only 5% of the records.

Should User and Address be in separate tables?

Currently my users table has the below fields
Username
Password
Name
Surname
City
Address
Country
Region
TelNo
MobNo
Email
MembershipExpiry
NoOfMembers
DOB
Gender
Blocked
UserAttempts
BlockTime
Disabled
I'm not sure if I should put the address fields in another table. I have heard that I will be breaking 3NF if I don't although I can't understand why. Can someone please explain?
There are several points that are definitely not 3NF; and some questionable ones in addition:
Could there be multiple addresses per user?
Is an address optional or mandatory?
Does the information in City, Country, Region duplicate that in Address?
Could a user have multiple TelNos?
Is a TelNo optional or mandatory?
Could a user have multiple MobNos?
Is a MobNo optional or mandatory?
Could a user have multiple Emails?
Is an Email optional or mandatory?
Is NoOfMembers calculated from the count of users?
Can there be more than one UserAttempts?
Can there be more than one BlockTime per user?
If the answer to any of these questions is yes, then it indicates a problem with 3NF in that area. The reason for 3NF is to remove duplication of data; to ensure that updates, insertions and deletions leave the data in consistent form; and to minimise the storage of data - in particular there is no need to store data as "not yet known/unknown/null".
In addition to the questions asked here, there is also the question of what constitutes the primary key for your table - I would guess it is something to do with user, but name and the other information you give is unlikely to be unique, so will not suffice as a PK. (If you think name plus surname is unique are you suggesting that you will never have more than one John Smith?)
EDIT:
In the light of further information that some fields are optional, I would suggest that you separate out the optional fields into different tables, and establish 1-1 links between the new tables and the user table. This link would be established by creating a foreign key in the new table referring to the primary key of the user table. As you say none of the fields can have multiple values then they are unlikely to give you problems at present. If however any of these change, then not splitting them out will give you problems in upgrading the application and the data to support the application. You still need to address the primary key issue.
As long as every user has one address and every address belongs to one user, they should go in the same table (a 1-to-1 relationship). However, if users aren't required to enter addresses (an optional relationship) a separate table would be appropriate. Also, in the odd case that many users share the same address (e.g. they're convicts in the same prison), you have a 1-to-many relationship, in which case a separate table would be the way to go. EDIT: And yes, as someone pointed out in the comments, if users have multiple addresses (a 1-to-many the other way around), there should also be separate tables.
Just as a point that I think might help someone with this question, I once had a situation where I put addresses right in the user/site/company/etc tables because I thought, why would I ever need more than one address for them? Then after we completed everything it was brought to my attention by a different department that we needed the possibility of recording both a shipping address and a billing address.
The moral of the story is, this is a frequent requirement, so if you think you ever might want to record shipping and billing addresses, or can think of any other type of address you might want to record for a user, go ahead and put it in a separate table.
In today's age, I think phone numbers are a no brainer as well to be stored in a separate table. Everyone has mobile numbers, home numbers, work numbers, fax numbers, etc., and even if you only plan on asking for one, people will still put two in the field and separate them by a semi-colon (trust me). Just something else to consider in your database design.
The point is that if you can imagine having two addresses for the same user in the future, you should split now and have an address table with a FK pointing back to the users table.
P.S. Your table is missing an identity to be used as PK, something like Id or UserId or DataId, call it the way you want...
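A minimal sketch of that split using Python's sqlite3: a `users` table with an identity column as PK and an `address` table whose FK points back to it. The `kind` column with shipping/billing values, the one-address-per-kind rule, and the sample data are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,      -- the identity column used as PK
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE address (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    kind    TEXT NOT NULL CHECK (kind IN ('shipping', 'billing')),
    address TEXT NOT NULL,
    city    TEXT NOT NULL,
    country TEXT NOT NULL,
    UNIQUE (user_id, kind)             -- at most one address of each kind
);
INSERT INTO users VALUES (1, 'jsmith');
INSERT INTO address (user_id, kind, address, city, country) VALUES
    (1, 'shipping', '1 Main St', 'Springfield', 'US'),
    (1, 'billing',  'PO Box 42', 'Springfield', 'US');
""")


def addresses_for(user_id):
    """Return {kind: address} for one user."""
    rows = conn.execute(
        "SELECT kind, address FROM address WHERE user_id = ?", (user_id,)
    )
    return dict(rows.fetchall())
```

Going from one address per user to several then only means inserting more rows, not changing the schema.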
By adding them to a separate table, you will have an easier time expanding your application if you decide to later. I generally have a simple user table with user_id or id, user_name, first_name, last_name, password, created_at & updated_at. I then have a profile table with the other info.
It's really all preference though.
You should never group two different types of data in a single table, period. The reason is that if your application is intended to be used in production, sooner or later different use cases will come up that require a more highly normalised table structure.
My recommendation - Adhere to SOLID principles even in DB design.

Resources