Should User and Address be in separate tables? - sql-server

Currently my users table has the below fields
Username
Password
Name
Surname
City
Address
Country
Region
TelNo
MobNo
Email
MembershipExpiry
NoOfMembers
DOB
Gender
Blocked
UserAttempts
BlockTime
Disabled
I'm not sure if I should put the address fields in another table. I have heard that I will be breaking 3NF if I don't although I can't understand why. Can someone please explain?

There are several points that are definitely not 3NF; and some questionable ones in addition:
Could there be multiple addresses per user?
Is an address optional or mandatory?
Does the information in City, Country, Region duplicate that in Address?
Could a user have multiple TelNos?
Is a TelNo optional or mandatory?
Could a user have multiple MobNos?
Is a MobNo optional or mandatory?
Could a user have multiple Emails?
Is an Email optional or mandatory?
Is NoOfMembers calculated from the count of users?
Can there be more than one UserAttempts?
Can there be more than one BlockTime per user?
If the answer to any of these questions is yes, then it indicates a problem with 3NF in that area. The reason for 3NF is to remove duplication of data; to ensure that updates, insertions and deletions leave the data in consistent form; and to minimise the storage of data - in particular there is no need to store data as "not yet known/unknown/null".
In addition to the questions asked here, there is also the question of what constitutes the primary key for your table - I would guess it is something to do with user, but name and the other information you give is unlikely to be unique, so will not suffice as a PK. (If you think name plus surname is unique are you suggesting that you will never have more than one John Smith?)
EDIT:
In the light of further information that some fields are optional, I would suggest that you separate out the optional fields into different tables, and establish 1-1 links between the new tables and the user table. This link would be established by creating a foreign key in the new table referring to the primary key of the user table. As you say none of the fields can have multiple values then they are unlikely to give you problems at present. If however any of these change, then not splitting them out will give you problems in upgrading the application and the data to support the application. You still need to address the primary key issue.
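As a rough sketch of the 1-1 split described above (not the asker's real schema): the column names user_id and user_addresses and the sample rows are made up for illustration, and SQLite via Python's sqlite3 module stands in for SQL Server so the example can be run anywhere. The optional address columns move to their own table, whose primary key is also a foreign key to users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,   -- surrogate key (hypothetical name)
    username TEXT NOT NULL UNIQUE,
    name     TEXT NOT NULL,
    surname  TEXT NOT NULL
);
-- Optional data lives in its own table. user_id is both PK and FK,
-- so each user can have at most one row here (a 1-to-0..1 link).
CREATE TABLE user_addresses (
    user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
    address TEXT NOT NULL,
    city    TEXT NOT NULL,
    region  TEXT,
    country TEXT NOT NULL
);
""")

# Two different people called John Smith: name + surname cannot be the key.
conn.execute("INSERT INTO users VALUES (1, 'jsmith', 'John', 'Smith')")
conn.execute("INSERT INTO users VALUES (2, 'jsmith2', 'John', 'Smith')")
conn.execute(
    "INSERT INTO user_addresses VALUES (1, '1 High St', 'Valletta', NULL, 'Malta')"
)

# Users without an address simply have no row in user_addresses.
rows = conn.execute("""
    SELECT u.username, a.city
    FROM users u
    LEFT JOIN user_addresses a ON a.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()
print(rows)
```

The second user comes back with no city rather than with a row of empty address columns, which is the point of the split.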

As long as every user has one address and every address belongs to one user, they should go in the same table (a 1-to-1 relationship). However, if users aren't required to enter addresses (an optional relationship), a separate table would be appropriate. Also, in the odd case that many users share the same address (e.g. they're convicts in the same prison), you have a 1-to-many relationship, in which case a separate table would be the way to go. EDIT: And yes, as someone pointed out in the comments, if users have multiple addresses (a 1-to-many the other way around), there should also be separate tables.

Just as a point that I think might help someone with this question: I once had a situation where I put addresses right in the user/site/company/etc. tables because I thought, why would I ever need more than one address for them? Then, after we completed everything, it was brought to my attention by a different department that we needed the possibility of recording both a shipping address and a billing address.
The moral of the story is, this is a frequent requirement, so if you think you ever might want to record shipping and billing addresses, or can think of any other type of address you might want to record for a user, go ahead and put it in a separate table.
In today's age, I think phone numbers are a no brainer as well to be stored in a separate table. Everyone has mobile numbers, home numbers, work numbers, fax numbers, etc., and even if you only plan on asking for one, people will still put two in the field and separate them by a semi-colon (trust me). Just something else to consider in your database design.
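A minimal sketch of the shipping/billing case, assuming a hypothetical addresses table with an address_type column (all names and rows are made up, and SQLite through Python's sqlite3 is used only so the example is runnable). The rule "one address of each type per user" is an assumption of this sketch; drop the UNIQUE constraint if a user may have several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE addresses (
    address_id   INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    address_type TEXT NOT NULL CHECK (address_type IN ('shipping', 'billing')),
    address      TEXT NOT NULL,
    UNIQUE (user_id, address_type)  -- assumed rule: one address per type per user
);
""")

conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO addresses (user_id, address_type, address) VALUES (?, ?, ?)",
    [(1, "shipping", "12 Dock Road"), (1, "billing", "PO Box 7")],
)

# Both addresses belong to the same user without widening the users table.
address_types = [
    row[0]
    for row in conn.execute(
        "SELECT address_type FROM addresses WHERE user_id = 1 ORDER BY address_type"
    )
]
print(address_types)
```

The same shape works for phone numbers (a phone_numbers table with a type column), which avoids two numbers being typed into one field.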

The point is that if you can imagine having two addresses for the same user in the future, you should split now and have an address table with an FK pointing back to the users table.
P.S. Your table is missing an identity to be used as PK, something like Id or UserId or DataId, call it the way you want...

By adding them to a separate table, you will have an easier time expanding your application if you decide to later. I generally have a simple user table with user_id or id, user_name, first_name, last_name, password, created_at & updated_at. I then have a profile table with the other info.
It's really all preference though.

You should never group two different types of data in a single table, period. The reason is that if your application is intended to be used in production, sooner or later different use cases will come up that require a more highly normalised table structure.
My recommendation - Adhere to SOLID principles even in DB design.

Related

One-To-Many relationship task

I have a table Subject
It has many fields, two of them are code and flag.
Previously, those two fields together formed the unique key for rows in this table.
But now I need one more field: system.
There are tens of rows in Subject
And 4-7 systems.
What is a better way?
Create a table System for systems and a cross-table mapping systems to subjects (code and flag remain in table Subject)
Create one table of mapping without creating table System
Just add another column in table Subject
Create table System and add to the table Subject foreign key for table System?
So, It's all about database normalization.
And the third option is pretty bad.
To me, the fourth option looks best.
But I can't explain to myself why it is better than options 1 and 2.
So I read the rules of database normalization, and it seems to me that the first option satisfies all the rules too.
This is the reason why I am asking this question.
It is not typically a great idea to design a SQL schema only to 1st or 2nd normal form; many databases use something at or near 3rd normal form. However, there may still be some relations in 3rd normal form where dependencies, and therefore redundancy, still exist. This can be addressed by Boyce–Codd Normal Form (Codd, 1974).
It is also not typical to see 4th normal form and less so 5th normal form and beyond due to challenges with data maintenance in a "living" database.
Let's put this another way.
If you find yourself making many columns NULLable, consider a separate table to hold those values in an organized fashion. For example, use an Address table for addresses, with a PersonAddress linking table between Person and Address; the PersonAddress linking table might even have an AddressTypeId column that links to an AddressType table with rows for the Postal, Street and Business address types. For another example, consider email addresses, where people have personal, family, business and other email address types, and even several of the same type for different uses: a doctor with a business practice email and 2-3 hospital email addresses where the doctor practices.
Linking tables for those type scenarios are likely better than 3-4 or more email or postal address columns in one table where many after the first are nullable or perhaps redundant.
Review your data; consider whether your Subject, for example, may link to multiple System rows, or whether placing a SubjectId column in the other table may lead to duplicates of that ID across differing system rows. If it is always and forever a 1-1 relationship it may be OK, but for a 1-n or n-n relationship it may not be OK to have the id in the other table, and a linking table may provide a good mechanism to link them.
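To make the linking-table idea concrete for the Subject/System question (option 1 in the list), here is a small sketch. The table and column names (subject_system, subject_id, system_id) and the sample rows are hypothetical, and SQLite via Python's sqlite3 stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    code       TEXT    NOT NULL,
    flag       INTEGER NOT NULL,
    UNIQUE (code, flag)             -- code + flag still identify a subject
);
CREATE TABLE system (
    system_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
-- Linking table: one row per (subject, system) pair, so a subject can
-- belong to several systems and a system can hold several subjects.
CREATE TABLE subject_system (
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    system_id  INTEGER NOT NULL REFERENCES system(system_id),
    PRIMARY KEY (subject_id, system_id)
);
""")

conn.executemany("INSERT INTO subject VALUES (?, ?, ?)", [(1, "A", 0), (2, "A", 1)])
conn.executemany("INSERT INTO system VALUES (?, ?)", [(1, "billing"), (2, "crm")])
conn.executemany("INSERT INTO subject_system VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

systems_per_subject = conn.execute("""
    SELECT s.code, s.flag, COUNT(*) AS system_count
    FROM subject s
    JOIN subject_system ss ON ss.subject_id = s.subject_id
    GROUP BY s.subject_id, s.code, s.flag
    ORDER BY s.subject_id
""").fetchall()
print(systems_per_subject)
```

If a subject can only ever belong to one system, option 4 (a plain system_id foreign key column in subject) is simpler; the linking table earns its keep only when the relationship is n-n.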

Separate tables for name and email addresses?

I've been fighting my initial decision since I began work on my database. I'm debating whether or not I need a separate table for email addresses. My database looks like this:
people(id, first_name, last_name, email)
addresses(id, address, street, city, state, zip, latitude, longitude)
addresses_people(id, person_id, address_id)
phone_numbers(id, person_id, phone_number, type)
I figured I didn't need a separate table for email addresses as I only wanted one per person, regardless of whether they have more. The problem I seem to have, though, is that some people will NOT have email addresses. Very often I will be storing children in the people table. Now it seems like it would be better design to put the email addresses in a separate table, to avoid the thousands of empty email fields I'll have.
It's a huge hassle to change this now as the app is already in production somewhat, but changing it now as opposed to a year or two from now would be exponentially easier. Is it worth a day or two to change the emails to another table?
In my opinion you are over-engineering. Optional email field is fine. Actually having a separate table might introduce much bigger overhead.
The only reason for a separate table is to model 1-N relationship if you expect the user to have more than one e-mail.
I believe there is no need to change it. Having some fields in a row be empty is not a big deal most of the time. In fact, it is common in most databases. The design of the database should depend on the objects you are trying to model primarily and not be concerned with separating things into tables simply for storage reasons unless you have serious storage constraints or other extenuating circumstances.
See Database design - empty fields, which also sheds a lot of light on your question.

Simple Database - Design issues

Just a homework question I am trying to figure out, I would appreciate some assistance.
Apparently, there are three problems with this database design:
Account = {AccNumber, Type, Balance}
Customer = {CustID, FirstName, LastName, Address, AccNumber}
The one that is pretty obvious is that 'CustID' is useless if 'AccNumber' exists.
I am not quite sure about the second and third problem.
Is there a problem with having separate attributes for 'FirstName' and 'LastName'? Can't we just use 'Name'?
And another option: if 'AccNumber' is the primary key (assuming CustID will be removed), it probably should be placed at the beginning:
Such as:
Customer = {AccNumber, Name, Address}
Any input would be appreciated!
Thanks
The customer-account relationship, at first glance, appears to be a many-many relationship, which necessitates the use of an intermediary relationship table. For instance, I have three accounts of my own at my bank. In addition, my wife has two of her own. Finally, we have a shared account. The schema above could not well handle such relationships.
You could, indeed, just use "Name" - but you may need to know what the first or last names are at some point in the future, and such a concatenation can be quite problematic to split.
Good luck with your homework...
The problem is that you haven't presented us with what the database should represent in words; as it is now, there's nothing "wrong" with the design, since we don't know what the design is supposed to model.
I certainly wouldn't say that CustID is useless, as it serves as the primary key of the table. What you need to determine is the relationship between customers and accounts. It should be one of the following:
1) A single customer can be tied to multiple accounts, but a single account can be tied to only one customer.
2) A single customer can be tied to only one account, but an account can be tied to multiple customers.
3) A single customer can be tied to multiple accounts, and a single account can be tied to multiple customers.
Right now, with AccNumber in the Customer table, your design models #2.
As it is designed right now, each customer can have only one bank account.
The many-to-many relationship will be a problem. Instead, you might create a third table that holds the relationships. For example:
Account = {AccNumber, Type, Balance}
Connection = {ConnID, AccNumber, CustID}
Customer = {CustID, FirstName, LastName, Address}
This way, both Account and Customer are parented by Connection (for lack of a better name). You could query all connections with a certain AccNumber and find all the customers using that account, and vice versa.
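The three-table layout above can be tried out with a short sketch. The table and column names are the ones from the answer; the column types, the UNIQUE constraint on Connection and the sample data (one person with three accounts, a spouse with two, and one shared account, as in the earlier answer) are assumptions made for illustration. SQLite via Python's sqlite3 is used so it runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Account (
    AccNumber INTEGER PRIMARY KEY,
    Type      TEXT    NOT NULL,
    Balance   INTEGER NOT NULL      -- assumed: smallest currency unit
);
CREATE TABLE Customer (
    CustID    INTEGER PRIMARY KEY,
    FirstName TEXT NOT NULL,
    LastName  TEXT NOT NULL,
    Address   TEXT NOT NULL
);
CREATE TABLE Connection (
    ConnID    INTEGER PRIMARY KEY,
    AccNumber INTEGER NOT NULL REFERENCES Account(AccNumber),
    CustID    INTEGER NOT NULL REFERENCES Customer(CustID),
    UNIQUE (AccNumber, CustID)      -- assumed: no duplicate links
);
""")

conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", [
    (1, "Pat", "Doe", "1 Main St"),
    (2, "Sam", "Doe", "1 Main St"),
])
conn.executemany(
    "INSERT INTO Account VALUES (?, 'checking', 0)",
    [(101,), (102,), (103,), (201,), (202,), (301,)],
)
conn.executemany("INSERT INTO Connection (AccNumber, CustID) VALUES (?, ?)", [
    (101, 1), (102, 1), (103, 1),   # three accounts of my own
    (201, 2), (202, 2),             # two accounts of my spouse
    (301, 1), (301, 2),             # one shared account
])

# All customers using account 301.
shared_customers = [r[0] for r in conn.execute(
    "SELECT CustID FROM Connection WHERE AccNumber = 301 ORDER BY CustID")]

# And vice versa: how many accounts customer 1 can use.
my_account_count = conn.execute(
    "SELECT COUNT(*) FROM Connection WHERE CustID = 1").fetchone()[0]
print(shared_customers, my_account_count)
```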

Database Design: Explain this schema

Full disclosure...Trying feverishly here to learn more about databases so I am putting in the time and also tried to get this answer from the source to no avail.
Barry Williams from databaseanswers has this schema posted.
Clients and Fees Schema
I am trying to understand the split of address tables in this schema. It's clear to me that the Addresses table contains the details of a given address. The Client_Addresses and Staff_Addresses tables are what gets me.
1) I understand the use of Primary Foreign Keys as shown but I was under the assumption that when these are used you don't have a resident Primary Key in that same table (date_address_from in this case). Can someone explain the reasoning for both and put it into words how this actually works out?
2) Why would you use date_address_from as the primary key instead of something like client_address_id as the PK? What if someone enters two addresses in one day would there be conflicts in his design? If so or if not, what?
3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?
Evaluation
First an Audit, then the specific answers.
This is not a Data Model. This is not a Database. It is a bucket of fish, with each fish drawn as a rectangle, and where the fins of one fish are caught in the gills of another, there is a line. There are masses of duplication, as well as masses of missing elements. It is completely unworthy of using as an example to learn anything about database design from.
There is no Normalisation at all; the files are very incomplete (see Mike's answer, there are a hundred more problems like that). The other_details and eg.s crack me up. Each element needs to be identified and stored: StreetNo, ApartmentNo, StreetName, StreetType, etc. not line_1_number_street, which is a group.
Customer and Staff should be normalised into a Person table, with all the elements identified.
And yes, if Customer can be either a Person or an Organisation, then a supertype-subtype structure is required to support that correctly.
So what this really is, the technically accurate terms, is a bunch of flat files, with descriptions for groups of fields. Light years distant from a database or a relational one. Not ready for evaluation or inspection, let alone building something with. In a Relational Data Model, that would be approximately 35 normalised tables, with no duplicated columns.
Barry has (wait for it) over 500 "schemas" on the web. The moment you try to use a second "schema", you will find that (a) they are completely different in terms of use and purpose (b) there is no commonality between them (c) let's say there was a customer file in both; they would be different forms of customer files.
He needs to Normalise the entire single "schema" first, then present the single normalised data model in 500 sections or subject areas.
I have written to him about it. No response.
It is important to note also, that he has used some unrecognisable diagramming convention. The problem with these nice interesting pictures is that they convey some things but they do not convey the important things about a database or a design. It is no surprise that a learner is confused; it is not clear to experienced database professionals. There is a reason why there is a standard for modelling Relational databases, and for the notation in Data Models: they convey all the details and subtleties of the design.
There is a lot that Barry has not read about yet: naming conventions; relations; cardinality; etc, too many to list.
The web is full of rubbish, anyone can "publish". There are millions of good- and bad-looking "designs" out there, that are not worth looking at. Or worse, if you look, you will learn completely incorrect methods of "design". In terms of learning about databases and database design, you are best advised to find someone qualified, with demonstrated capability, and learn from them.
Answer
He is using composite keys without spelling it out. The PK for client_addresses is (client_id, address_id, date_address_from). That is not a bad key; evidently he expects to record addresses forever.
The notion of keeping addresses in a separate file is a good one, but he has not provided any of the fields required to store normalised addresses, so the "schema" will end up with complete duplication of addresses; in which case, he could remove addresses, and put the lines back in the client and staff files, along with their other_details, and remove three files that serve absolutely no purpose other than occupying disk space.
You are thinking about Associative Tables, which resolve the many-to-many relations in Databases. Yes, there, the columns are only the PKs of the two parent tables. These are not Associative Tables or files; they contain data fields.
It is not the PK, it is the third element of the PK.
The notion of a person being registered at more than one address in a single day is not reasonable; just count the one address they slept the most at.
Others have answered that.
Do not expect to identify any evidence of databases or design or Normalisation in this diagram.
1) In each of those tables the primary key is a compound key consisting of three attributes: (staff_id, address_id, date_address_from) and (client_id, address_id, date_address_from). This presumably means that the mapping of clients/staff to addresses is expected to change over time and that the history of those changes is preserved.
2) There's no obvious reason to create a new "id" attribute in those tables. The compound key does the job adequately. Why would you want to create the same address twice for the same client on the same date? If you did then that might be a reason to modify the design but that seems like an unlikely requirement.
3) No. The apparent purpose is that they are the applicable dates for the mapping of address to client/staff - not dates applicable to the address alone.
3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?
No. But you did find a problem.
The designer has decided that clients and staff are two utterly different things. By "utterly different", I mean they have no attributes in common.
That's not true, is it? Both clients and staff have addresses. I'm sure most of them have telephones, too.
Imagine that someone on staff is also a client. How many places is that person's name stored? That person's address? Can you hear Mr. Rogers in the background saying, "Can you spell 'update anomaly'? . . . I knew you could."
The problem is that the designer was thinking of clients and staff as different kinds of people. They're not. "Client" describes a business relationship between a service provider (usually, that is, not a retailer) and a customer, which might be either a person or a company. "Staff" describes an employment relationship between a company and a person. Not different kinds of people--different kinds of relationships.
Can you see how to fix that?
These 2 extra tables enable you to keep an address history per person.
You can have them both in one table, but since staff and client are separated, it is better to separate the address links as well (because client id = 1 and staff id = 1 can't be used in the same address table).
There is no "single" solution to a design problem; you could use one person table and add a column to differentiate between staff and client. But the main idea is that the DB should be clear, readable and efficient, not that it should save on tables.
About 2: the PK is a combined key of clientID, AddressID and the from date.
So if someone lives 6 months in the States, then 6 months in Israel, and then goes back to the States, to the same address, you need only 2 rows in the address table and 3 in client_address.
The idea of having the from date as part of the key is right, although it doesn't guarantee data integrity, as you also need to check manually that there are no overlapping dates between records of the same person.
About 3: no (see 2).
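The States/Israel/States example and the overlap caveat can be sketched as follows. The client_addresses columns follow the names discussed in this thread; the clients and addresses tables are cut down to the bare minimum, the dates are invented, and the overlap query is just one possible way to do the manual check. SQLite via Python's sqlite3 is used so it runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE clients   (client_id  INTEGER PRIMARY KEY);
CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, line_1 TEXT NOT NULL);
CREATE TABLE client_addresses (
    client_id         INTEGER NOT NULL REFERENCES clients(client_id),
    address_id        INTEGER NOT NULL REFERENCES addresses(address_id),
    date_address_from TEXT    NOT NULL,  -- ISO dates compare correctly as text
    date_address_to   TEXT,              -- NULL means "still current"
    PRIMARY KEY (client_id, address_id, date_address_from)
);
""")

conn.execute("INSERT INTO clients VALUES (1)")
conn.executemany("INSERT INTO addresses VALUES (?, ?)",
                 [(10, "US address"), (20, "Israel address")])
# 2 address rows, 3 history rows: US, then Israel, then back to the same US address.
conn.executemany("INSERT INTO client_addresses VALUES (?, ?, ?, ?)", [
    (1, 10, "2020-01-01", "2020-06-30"),
    (1, 20, "2020-07-01", "2020-12-31"),
    (1, 10, "2021-01-01", None),
])

# The composite key does not prevent overlapping periods; this self-join
# lists pairs of periods for the same client that overlap.
OVERLAP_SQL = """
SELECT a.client_id, a.date_address_from, b.date_address_from
FROM client_addresses a
JOIN client_addresses b
  ON a.client_id = b.client_id
 AND a.date_address_from < b.date_address_from
 AND (a.date_address_to IS NULL OR a.date_address_to >= b.date_address_from)
"""
overlaps_before = conn.execute(OVERLAP_SQL).fetchall()

# A period that starts in the middle of the first US stay is accepted by the key...
conn.execute("INSERT INTO client_addresses VALUES (1, 20, '2020-03-01', '2020-04-30')")
# ...and only the check query notices it.
overlaps_after = conn.execute(OVERLAP_SQL).fetchall()
print(overlaps_before, overlaps_after)
```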
Viewing the data model, I think:
1) PF means that the field is both part of the primary key of the table and a foreign key to another table.
2) In the same way, the primary key of Staff_Addresses is {staff_id, address_id, date_address_from}, not just date_address_from.
3) The same as 2).
In reference to the Staff_Addresses table, having date_address_from in the primary key basically prevents a record with the same staff_id/address_id from being entered more than once for the same date. Now, I'm no DBA, but I like my PKs to be integers or GUIDs for performance reasons/faster indexing. If I were to do this I would make a new column, say Staff_Address_Id, make it the PK column, and put a unique constraint on staff_id/address_id/date_address_from.
As for your last concern, Addresses table is really a generic address storage structure. It shouldn't care about date ranges during which someone resided there. It's better to be left to specific implementations of an address such as Client/Staff addresses.
Hope this helps a little.
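A sketch of that alternative, with a surrogate integer key plus a unique constraint on the three original key columns. The column name staff_address_id follows the suggestion above; the foreign keys to the staff and address tables are left out to keep the example short, and SQLite via Python's sqlite3 stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff_addresses (
    staff_address_id  INTEGER PRIMARY KEY,   -- surrogate key
    staff_id          INTEGER NOT NULL,
    address_id        INTEGER NOT NULL,
    date_address_from TEXT    NOT NULL,
    date_address_to   TEXT,
    -- The natural key is still enforced, just not as the PK.
    UNIQUE (staff_id, address_id, date_address_from)
);
""")

INSERT_SQL = """
INSERT INTO staff_addresses (staff_id, address_id, date_address_from)
VALUES (1, 10, '2024-01-01')
"""
conn.execute(INSERT_SQL)

# Entering the same staff/address/date combination again is rejected.
try:
    conn.execute(INSERT_SQL)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Either design rejects the duplicate; the surrogate key mainly gives other tables a single narrow column to reference.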

Table "Inheritance" in SQL Server

I am currently in the process of looking at restructuring our contact management database and I wanted to hear people's opinions on solving the problem of a number of contact types having shared attributes.
Basically we have 6 contact types which include Person, Company and Position # Company.
In the current structure all of these have an address however in the address table you must store their type in order to join to the contact.
This consistent requirement to join on contact type gets frustrating after a while.
Today I stumbled across a post discussing "Table Inheritance" (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server).
Basically you have a parent table and a number of sub tables (in this case one for each contact type). From there you enforce integrity so that a sub table must have a master equivalent where its type is defined.
The way I see it, by this method I would no longer need to store the type in tables like address, as the id is unique across all types.
I just wanted to know if anybody had any feelings on this method, whether it is a good way to go, or perhaps alternatives?
I'm using SQL Server 05 & 08 should that make any difference.
Thanks
Ed
I designed a database just like the link you provided suggests. The case was to store the data for many different technical reports. The number of report types is undefined and will probably grow to about 40 different types.
I created one master report table, that has an autoincrement primary key. That table contains all common information like customer, testsite, equipmentid, date etc.
Then I have one table for each report type that contains the specific information relating to that report type. That table has the same primary key as the master and references the master as well.
My idea for splitting this into different tables with a 1:1 relation (which normally would be a no-no) was to avoid getting one single table with a huge number of columns, which gets very difficult to maintain as you're constantly adding columns.
My design with table inheritance gave me segmented data and expandability without being difficult to maintain. The only thing I had to do was to write a special save method to handle writing to two tables automatically. So far I'm very happy with the design and haven't really found any drawbacks, except for a slightly more complicated save method.
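Roughly, that master/sub-table design and its save method could look like the sketch below. This is not the answerer's actual code: the pressure_report sub-table, its max_pressure column, the function name save_pressure_report and the sample values are all invented for illustration, and SQLite via Python's sqlite3 stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Master table: columns common to every report type.
CREATE TABLE report (
    report_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    report_type TEXT NOT NULL,
    customer    TEXT NOT NULL,
    report_date TEXT NOT NULL
);
-- One sub-table per report type; it shares the master's primary key.
CREATE TABLE pressure_report (
    report_id    INTEGER PRIMARY KEY REFERENCES report(report_id),
    max_pressure REAL NOT NULL
);
""")


def save_pressure_report(conn, customer, report_date, max_pressure):
    """Write the master row and the sub-type row in one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        cur = conn.execute(
            "INSERT INTO report (report_type, customer, report_date) "
            "VALUES ('pressure', ?, ?)",
            (customer, report_date),
        )
        report_id = cur.lastrowid
        conn.execute(
            "INSERT INTO pressure_report VALUES (?, ?)",
            (report_id, max_pressure),
        )
    return report_id


new_id = save_pressure_report(conn, "Acme", "2024-05-01", 12.5)
row = conn.execute("""
    SELECT r.customer, p.max_pressure
    FROM report r
    JOIN pressure_report p ON p.report_id = r.report_id
    WHERE r.report_id = ?
""", (new_id,)).fetchone()
print(new_id, row)
```

Adding a new report type means adding one more sub-table and one more save function, while the master table stays unchanged.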
Google on "gen-spec relational modeling". You'll find a lot of articles discussing exactly this pattern. Some of them focus on table design, while others focus on an object oriented approach.
Table inheritance pops up in a few of them.
I know this won't help much now, but initially it may have been better to have an Entity table rather than 6 different contact types. Then each Entity could have as many addresses as necessary and there would be no need for type in the join.
You'll still have the problem that if you want the sub-type fields and you have only the master contact, you'll have to know what table to go looking at - or else join to all of them. But otherwise this is a workable solution to a common problem.
Another possibility (fairly similar in structure, but different in how you think of it) is to simply put all your contacts into one table. Then for the more specific fields (birthday say for people and department for position#company) create separate tables that are associated with that contact.
Contact Table
--------------
Name
Phone Number
Address Table
-------------
Street / state, etc
ContactId
ContactBirthday Table
--------------
Birthday
ContactId
Departments Table
-----------------
Department
ContactId
It requires a different way of thinking of things though - instead of thinking of people vs. companies, you think of the various functional requirements for the task at hand - if you want to send out birthday cards, get all the contacts that have birthdays associated with them, etc..
I'm going to go out on a limb here and suggest you should rethink your normalization strategy (as you seem to be lucky enough to be able to rethink your schema quite fundamentally). If you typically store an address for each contact, then your contact table should have the address fields in it. Alternatively if the address is stored per company then the address should be stored in the company table and your contacts linked to that company.
If your contacts only have one address, or one (or even 3, just not 'many') instance of the other fields, think about rationalizing them into a single table. In my experience having a few null fields is a far better alternative than needing left joins to data you aren't sure exists.
Fortunately for anyone who vehemently disagrees with me you did ask for opinions! :) IMHO you should only normalize when you really need to. Where you are rethinking schemas, denormalization should be considered at every opportunity.
When you have a 7th type, you'll have to create another table.
I'm going to try this approach. Yes, you have to create new tables when you have a new type, but since this table will probably have different columns, you'll end up doing this anyway if you don't use this scheme.
If the tables that inherit the master don't differentiate much from one another, I'd recommend you try another approach.
May I suggest that we just add a Type table. I.e. a person has an address, name etc.; then, as each use case (student, teacher) presents itself, we have a PersonType table that maps an entry from the person table to n types, with subsequent new tables (teacher, alien, singer) added as the system evolves...
