Given a schema for a DVD rental store, should customers' phone numbers belong to the addresses table, or the users table, and why? Are there any benefits associated with one approach or the other?
Why do you even have an addresses table (unless you want more than one address for a given customer)?
Your primary "client" is a customer. You don't rent DVDs to an address, you rent them to a person. You can't take a block of land to court when the occupant runs off with your prized "Free Willy" collector's edition trilogy.
In a world where a person only lived at one place, the address would be part of the customer table (and so would the phone in a one-phone-per-customer scenario).
If you want multiple addresses, that's fine, have a separate addresses table tying those addresses back to the customer.
But you should probably also have a similar setup for phones. Either allow up to N phone numbers per customer (with N columns) in the customers table, or (much better) have a separate phones table allowing any number of phone numbers per customer.
Something like:
customers:
    cust_id
    cust_stuff
addresses:
    cust_id references customers(cust_id)
    addr_seq_num
    addr_stuff
phones:
    cust_id references customers(cust_id)
    phon_seq_num
    phon_stuff
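As a minimal sketch, the three-table layout above could be written out with Python's standard-library sqlite3 module. The choice of SQLite and the columns name, street and phone_number (standing in for cust_stuff, addr_stuff and phon_stuff) are assumptions made only for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL              -- stands in for cust_stuff
);
CREATE TABLE addresses (
    cust_id      INTEGER NOT NULL REFERENCES customers(cust_id),
    addr_seq_num INTEGER NOT NULL,
    street       TEXT NOT NULL,        -- stands in for addr_stuff
    PRIMARY KEY (cust_id, addr_seq_num)
);
CREATE TABLE phones (
    cust_id      INTEGER NOT NULL REFERENCES customers(cust_id),
    phon_seq_num INTEGER NOT NULL,
    phone_number TEXT NOT NULL,        -- stands in for phon_stuff
    PRIMARY KEY (cust_id, phon_seq_num)
);
"""

def open_store():
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn

conn = open_store()
conn.execute("INSERT INTO customers VALUES (1, 'Pat')")
# Any number of phones per customer: one row each, numbered by phon_seq_num.
conn.executemany("INSERT INTO phones VALUES (?, ?, ?)",
                 [(1, 1, "555-0100"), (1, 2, "555-0101")])
phone_count = conn.execute(
    "SELECT COUNT(*) FROM phones WHERE cust_id = 1").fetchone()[0]
```

The composite primary key (cust_id, seq_num) is one way to read the outline; a single surrogate key per row would work as well.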
There's no one correct answer to this, except to say "it depends".
It really depends on what you're modelling with your database schema. Does a phone number logically belong to a user, or an address that could potentially be shared by multiple users?
Example - a mobile phone number might be tied to a particular person, and so be part of the users table. A land-line number might be tied to a particular location or residence, and so be part of the address.
Basic cases of information modeling:
Case A. Each customer can have more than one phone number.
In this case, phone number belongs in a separate table.
Case A1. A customer is not required to have a phone number, i.e. the "relationship" is 1-1 to 0-n (assuming every phone number must always "be for" some customer). Nothing to do.
Case A2. It is the case that each customer is indeed required to have a phone number. You can model this as a relationship that is 1-1 to 1-n, but the "1" of the 1-n part is very hard to enforce in SQL systems (and in the cheapest of them, probably just impossible). That does not mean that you shouldn't be documenting the business rule properly as it is.
Case B. Each customer has AT MOST one phone number.
Case B1. Each customer is required to have a phone number. This means that each customer always has exactly one phone number. Phone number is best put in the customer table. (Note that "to have a phone number" means "to have a phone number THAT IS KNOWN TO THE STORE in question"!)
Case B2. It is not required for a customer to have a phone number. In formal relational theory, it is required that you define a separate table which will hold only the known phone numbers. In informal modeling techniques such as ER and UML, you can model this as an "optional attribute". In SQL systems, many would define a nullable attribute for this.
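A minimal sketch of Case B2 done with a separate table instead of a nullable column, using Python's sqlite3. The table and column names are assumptions. Making cust_id the primary key of the phone table is what limits each customer to at most one number; a customer with no known number simply has no row there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- Holds only the phone numbers actually known to the store.
CREATE TABLE customer_phone (
    cust_id INTEGER PRIMARY KEY REFERENCES customer(cust_id),
    phone   TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0100')")

def insert_rejected(cust_id, phone):
    """Return True if the database refuses this phone row."""
    try:
        conn.execute("INSERT INTO customer_phone VALUES (?, ?)", (cust_id, phone))
    except sqlite3.IntegrityError:
        return True
    return False

# The LEFT JOIN reports Bob without a phone, yet no NULL is stored in a base table.
rows = conn.execute("""
    SELECT c.name, p.phone
    FROM customer c LEFT JOIN customer_phone p USING (cust_id)
    ORDER BY c.cust_id
""").fetchall()
```

Calling insert_rejected(1, "555-0101") here would be refused, because Ann already has a row.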
As for "phone numbers 'belonging' to addresses": is there any kind of "connection" between phone numbers and addresses that is relevant to your business? I mean, let's say some customer has two addresses and two phone numbers. Is it important to know which of those two phone numbers belongs to which one of those two addresses? What address would a cellphone number 'belong to'?
Just an assumption about your site/app, but usually I'd say "Addresses", because user information tends to be info that you pull out frequently to run the site (ID, username, visits etc) whereas phone number may not be?
What do you mean by "traditional?" Since a user can have an arbitrarily large number of contact phone numbers (home, work, personal mobile, work mobile, fax, etc.), it seems like there should be a separate phone numbers table, each row of which includes a number and a value that says what type of number it is.
Contact information is notoriously difficult to model in a relational schema. In order to keep your sanity, I would advise that you make a minimum number of assumptions with respect to phone numbers. Allowing multiple phone numbers for one customer/account is good; beyond that it's hard to apply rules to phone numbers.
There is one well known exception: many pizza delivery shops use phone numbers as primary keys for customers. This works because in general there is one phone associated with the place to which one delivers pizza. On the other hand, many people no longer have land lines, so perhaps even that system is breaking down. In any case, I don't think this applies to DVD rental.
More than one customer may have the same phone number: perhaps multiple people from the same building buy from you.
What happens when you update the phone number for one customer; should it update that number for the other people that supposedly share that number?
Yes if the address for both parties is still the same (but that might actually break some privacy laws).
No if there are now two different addresses.
Related
I'm trying to better understand designing a database schema. After reviewing the solution for a problem that I'm working on, I don't understand why the solution chooses to use an aggregation for the attributes "address" and "phone number" for a given "musician". Here are the specifications, I'm only interested in bullet point 1:
Each musician that records at Notown has an SSN, a name, an address, and a phone number. Poorly paid musicians often share the same address, and no address has more than one phone.
Each instrument used in songs recorded at Notown has a name (e.g., guitar, synthesizer, flute) and a musical key (e.g., C, B-flat, E-flat).
Each album recorded on the Notown label has a title, a copyright date, a format (e.g., CD or MC), and an album identifier.
Each song recorded at Notown has a title and an author.
Each musician may play several instruments, and a given instrument may be played by several musicians.
Each album has a number of songs on it, but no song may appear on more than one album.
Each song is performed by one or more musicians, and a musician may perform a number of songs.
Each album has exactly one musician who acts as its producer. A musician may produce several albums, of course.
Here is a solution that I found:
The ER Diagram I created looks almost exactly the same, except for the fact that I made "address" and "phone number" attributes of "musician" instead of giving each of them an entity set of their own, creating a relationship, and turning it into an aggregation. I don't understand why this would be done in this situation. Can anyone explain?? Thank you!
I'm not able to see the image you linked to, but anyway...
"no address has more than one phone"
This means we should make the phone number an attribute of the address - unless we want to allow for multiple phones per address in the future.
So it would not be completely wrong to make phones a table. But then, we know little about the future. Would there be multiple musicians sharing the same address and the same phones? (I.e. the phone number would be linked to an address.) Or would there be multiple musicians sharing the same address, but each would have their own phone? (I.e. the phone number would be linked to a musician. To use a phone table and link the phones to musicians, however, would only be necessary if a musician could have multiple phone numbers. Otherwise we'd still not make a phone table, but rather make the phone a musician's attribute.)
"poorly paid musicians often share the same address"
This means we make the address a table of its own. Thus there is only one row to change in case the phone number or some other attribute changes. If we made the address a musician's attribute instead, we'd store the address redundantly and could get inconsistent data (e.g. same address, but different phone numbers).
A possible data model:
address (address_id, street, city, phone, ...)
musician (musician_id, ssn, name, address_id, ...)
This is a 1:n relation. A musician has one address; an address can belong to multiple musicians.
The primary purpose of database normalization is to make it more difficult for anomalous data to get into the database. Reading the first bullet point, we see that each address may have zero or one phone numbers associated with it. In other words, the phone number is an attribute of/identified by the address. Which normalization level does this violate?
To illustrate how not normalizing the address fields (including phone number) increases the chances of anomalous data, let's say you have four students staying at that address. This means you have four rows where the address data exists. Suppose the phone number changes. You have to make sure you change all four versions of the data. I said there were four students, but suppose there are actually five and I just missed one? Or suppose you found only three when you went to make the change? An address may have at most one phone number; however, you now have several copies of the same address with different phone numbers. This is anomalous data.
If this data is normalized, you would have only one copy to change. Since this data is referenced by all the students who live there, no matter how many, this change is "propagated" to all of them. The integrity of the data is maintained.
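A small sketch of that propagation, assuming SQLite via Python's sqlite3 and a cut-down version of the address/musician model (only a few columns, with made-up sample rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    phone      TEXT              -- zero or one phone per address
);
CREATE TABLE musician (
    musician_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address_id  INTEGER NOT NULL REFERENCES address(address_id)
);
""")
conn.execute("INSERT INTO address VALUES (1, '12 Elm St', '555-0100')")
# Four people share the one address row.
conn.executemany("INSERT INTO musician VALUES (?, ?, 1)",
                 [(1, "Ana"), (2, "Ben"), (3, "Cy"), (4, "Di")])

# The phone number changes: exactly one row is updated...
cur = conn.execute("UPDATE address SET phone = '555-0199' WHERE address_id = 1")
rows_changed = cur.rowcount

# ...and every resident sees the new number through the join.
phones_seen = {phone for (phone,) in conn.execute(
    "SELECT a.phone FROM musician m JOIN address a USING (address_id)")}
```

However many residents there are, there is no second copy of the number that could be missed.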
I am familiar with creating a bridge table between facts and dimension table.
Is it a good idea to create bridge table between dimension and its multidimensional attributes?
e.g., customer has multiple phone numbers. Can I just create a customer telephone dimension which has one to many relationship with customer dimension or is creating a bridge table advisable?
Answering specifically for the multiple phones example.
I usually try to avoid bridge tables as much as possible. They are a complication of design, and keeping things simple is a better approach (although not always possible, of course).
In case of the multiple phones per customer, I would create 2 attributes:
Primary Phone
Other Phones
The first attribute will contain a main customer phone and is mandatory.
The second attribute might contain one or more other phone numbers, concatenated into a delimited string (e.g., "415-111-1111, 415-222-2222"). Such a design is acceptable because you (most likely) will use these extra phones only as descriptive information in your reports. Also, most likely you will have a varying but reasonably limited number of such phones, let's say 0-3 or so, which means that this attribute will be either empty or contain a reasonably short string.
The above design is simple and clean and works for most situations, unless you need to perform specific analytics on the phone numbers, or if there are too many of them and they must be all used. In cases like that, I would put them into a fact table ("Customer Phones"), which might contain:
Customer_ID
Phone_Profile_ID
Date
Phone Number
Phone_Profile is a dimension that should contain phone attributes, e.g., "Phone Type" {"Land Line", "Mobile"}, "Phone Use" {"Primary", "Secondary"}, etc.
Such a fact table can also be a periodic snapshot (annual, monthly, etc.) of all customer phones and serve as a phone catalog. However, such elaborate designs are rarely needed (unless you design for a Call Center or similar phone-heavy application).
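For the simpler two-attribute design, the load step might look roughly like the Python sketch below. The function name and the convention that the main number comes first in the input list are assumptions, not part of any particular ETL tool.

```python
def phone_attributes(phones):
    """Split a customer's phones into the two dimension attributes.

    `phones` is a list with the main number first (an assumed convention).
    """
    if not phones:
        raise ValueError("Primary Phone is mandatory")
    primary, *others = phones
    # Remaining numbers are concatenated into one delimited, descriptive string.
    return {"Primary Phone": primary, "Other Phones": ", ".join(others)}

row = phone_attributes(["415-111-1111", "415-222-2222", "415-333-3333"])
```

A customer with a single number ends up with an empty "Other Phones" string.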
I have an old application that needs upgrading. Doesn't everything nowadays?
The existing DB schema consists of predefined fields like phone, fax, email. Obviously with the social explosion over the last 5-7 years (or longer depending on your country) end users need more control over creating contact cards the way they see fit rather than just what I think might be useful.
I'm concerned here with "digital" addresses, i.e. one-line addresses such as phone=ccc ccc ccc ccc.
Since physical addresses are pretty standard in terms of requirements, in this case users will have to use what they are given (location, postal, delivery) in order to keep the scope manageable.
So I'm wondering what the best practice format for storing digital info is. To me it seems I have two choices:
A simple 4 field table (ContactId, AddressTypeId, Address, FormatterId)
1000, "phone", "ccc ccc ccc ccc", phoneformatter
1000, "facebook", "myfacebook", facebookformatter
This would then be JOINED anywhere it's needed. The table would get massive though, and the join performance would degrade over time, I suspect.
A json blob that would require additional processing once read (ContactId, Addresses)
1000, [{"phone": "ccc ccc ccc ccc"}, {"facebook": "myfacebook"}]
Or ... something else.
This db is for use in a given country by customers only trading domestically with client bases ranging from 3000-12000 accounts and then however many contacts per account - averages about 10 in current system.
My primary concern is user flexibility but performance is a key consideration in that. So I dunno, just do whatever and throw heaps of hardware at it ;)
Application is in C# if that makes any difference re: post query processing.
I would not go for the JSON blob. This will be nasty if you need to answer any queries like:-
Does anyone have me in their Facebook contacts?
What's the most popular type of social media contact?
You would be forced to parse the JSON for every record and be unable to create a simple index.
Your additional solution is nearly correct; however, FormatterId would need to be on an AddressType table. What you have is not normalised, as FormatterId would depend only on AddressTypeId. So you would have three tables:-
Contact
ContactAddress
AddressType
You haven't stated if you need to store two addresses of the same type against a single contact, e.g. if someone has two Twitter accounts. Answering this question will allow you to define the correct primary key on ContactAddress. It would be (ContactId, AddressTypeId) if you can only have one of each type per contact; otherwise create a synthetic key (ContactAddressId).
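Here is a rough sketch of the three tables, using Python's sqlite3 as a stand-in for whatever database the application uses. It assumes the synthetic-key variant (several addresses of one type allowed); the Name column, the index name and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (
    ContactId INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL
);
CREATE TABLE AddressType (
    AddressTypeId TEXT PRIMARY KEY,        -- e.g. 'phone', 'facebook'
    FormatterId   TEXT NOT NULL            -- depends only on the type
);
CREATE TABLE ContactAddress (
    ContactAddressId INTEGER PRIMARY KEY,  -- synthetic key: allows two of a type
    ContactId     INTEGER NOT NULL REFERENCES Contact(ContactId),
    AddressTypeId TEXT NOT NULL REFERENCES AddressType(AddressTypeId),
    Address       TEXT NOT NULL
);
CREATE INDEX ix_type_address ON ContactAddress (AddressTypeId, Address);
""")
conn.executemany("INSERT INTO Contact VALUES (?, ?)", [(1000, "A"), (1001, "B")])
conn.executemany("INSERT INTO AddressType VALUES (?, ?)",
                 [("phone", "phoneformatter"), ("facebook", "facebookformatter")])
conn.executemany(
    "INSERT INTO ContactAddress (ContactId, AddressTypeId, Address) VALUES (?, ?, ?)",
    [(1000, "phone", "ccc ccc ccc ccc"),
     (1000, "facebook", "myfacebook"),
     (1001, "facebook", "otherfacebook")])

# "Does anyone have me in their Facebook contacts?" becomes an indexed lookup.
holders = [r[0] for r in conn.execute(
    "SELECT ContactId FROM ContactAddress "
    "WHERE AddressTypeId = 'facebook' AND Address = ?", ("myfacebook",))]

# "What's the most popular type of contact?" becomes a plain GROUP BY.
most_popular = conn.execute(
    "SELECT AddressTypeId FROM ContactAddress "
    "GROUP BY AddressTypeId ORDER BY COUNT(*) DESC LIMIT 1").fetchone()[0]
```

Neither query needs to parse anything per row, which is the point of preferring this over the JSON blob.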
Well, I believe you have a table named contact
contact(contactid, contact details, other details)
and now you want to remove these contact details from the contact table, because they may contain digital addresses, phone numbers and so on.
But the table you are considering
(ContactId, AddressTypeId, Address, FormatterId) is not in normal form, and you can't uniquely identify a tuple until you read all four columns, which is bad; in this case indexing is also not going to help you.
So it is better to have a separate table for each type of digital address, with an index on contactid:
facebookdetails(contactid, rest of the details)
phonedetails(contactid, rest of the details)
And then the query can be a join of all the tables; it will not degrade the performance.
Hope this will help :)
Admittedly, I am simply looking for some direction here. I have a specific situation, and being a novice in database design I am lost on how to begin tackling this problem. Let me start by explaining my situation.
I have a mysql table called contacts. As the name implies, it stores a list of contacts and the attributes that go along with each such as first name, last name, email, phone number etc. I would like users of my application to be able to add an unlimited amount of certain attributes for each contact. So, for instance rather than a contact having one phone number, the user could add another number, and another if they choose etc so essentially, a contact in my database can have as many phone numbers as the user needs. This will also be true for other fields in the table, but for the sake of simplicity let's just stick with phone number as an example.
So what is the best way to approach this? Should I have a separate table called contactsPhone and have a matching id column so that any number of rows in the phone table can be associated with one row in the contacts table? Or is there a way to store an ArrayList of some sort in the contacts table so I can have multiple phone numbers in just one field?
You should be looking at modelling something like this in a document database - a relational database is a poor choice for a flexible schema. You may be able to just have this specific portion of your data in a document database.
If you must, the common solution is the entity-attribute-value pattern - note that this requires multiple joins, makes ad hoc queries difficult and is generally slow.
Update:
I misread the question a bit - if you do know which attributes you want to hold multiple values and this list will not change (or not change much), entity-attribute-value may not be the best way forward.
A one-to-many table per each of these attributes will work (and is a standard relational solution for this kind of problem) - each such table will require a foreign key to your contacts table and a column to hold a single attribute value. This allows you to have multiple attribute values against a single contact.
"I would like users of my application to be able to add an unlimited amount of certain attributes for each contact. So, for instance rather than a contact having one phone number, the user could add another number, and another if they choose etc so essentially, a contact in my database can have as many phone numbers as the user needs."
You're not describing an unlimited number of attributes for each contact. (That's a Good Thing.) You're describing an unlimited number of rows for a single attribute, in this case a contact's phone number.
So, yes, a table of contact phone numbers would work well. You might want to give some thought to how the user might want to identify phone numbers. For example, do they need to distinguish home phone numbers from work numbers and so on?
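A minimal sketch of such a table, with a label column for telling home from work numbers. It uses Python's sqlite3 as a stand-in for MySQL; the table name contactsPhone comes from the question, while the label and number columns and the two helper functions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (
    id         INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL
);
CREATE TABLE contactsPhone (
    id         INTEGER PRIMARY KEY,
    contact_id INTEGER NOT NULL REFERENCES contacts(id),
    label      TEXT NOT NULL DEFAULT 'other',   -- e.g. home, work, mobile
    number     TEXT NOT NULL
);
""")
conn.execute("INSERT INTO contacts VALUES (1, 'Sam')")

def add_phone(contact_id, number, label="other"):
    """Attach one more phone number to a contact: one row per number."""
    conn.execute(
        "INSERT INTO contactsPhone (contact_id, label, number) VALUES (?, ?, ?)",
        (contact_id, label, number))

def phones_for(contact_id):
    """Return (label, number) pairs for a contact in insertion order."""
    return conn.execute(
        "SELECT label, number FROM contactsPhone "
        "WHERE contact_id = ? ORDER BY id", (contact_id,)).fetchall()

add_phone(1, "555-0100", "home")
add_phone(1, "555-0101", "work")
add_phone(1, "555-0102")
```

The same shape (foreign key plus one value column) would repeat for each other attribute that needs multiple values.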
I need to implement one user ID per user. After looking around, the closest thing I can find is using mobile phone numbers to text users. Assuming this is the best available method, how should I handle mobile numbers being recycled by telephone companies? Any ideas?
The following is the best attempt I found at answering how to implement one ID per person: http://stackoverflow.com/questions/5964664/account-verification-only-1-account-per-person
Thank you.
Back when I worked for a marketing client, our business was based on determining whether two people were the same - "our records show Bob Smith. Is the Robert Smith who responded the same as Bob Smith?"
Our data could be segmented into name, method of contact (address, phone, email) and relationship (Alonzo works for Square Tires Inc). Once the data had been scrubbed/standardized, we would then compare that entity to possible matches in the system. Matching on a single segment was insufficient to categorize as a match. We required two different segment matches for an entity to be matched to another. This was important as matching phone and address might yield different members of the same household. Depending on your criteria, that might be sufficient but generally it is not.
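The two-segment rule can be sketched in Python as follows. This is only an illustration of the idea, not the system described: the Entity class, the exact-equality comparison on already-standardized values, and the sample data are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str                   # standardized name
    contacts: frozenset         # standardized addresses, phones, emails
    relationships: frozenset    # e.g. employers

def matched_segments(a, b):
    """Return the set of segments on which two entities agree."""
    segments = set()
    if a.name == b.name:
        segments.add("name")
    if a.contacts & b.contacts:             # any shared method of contact
        segments.add("contact")
    if a.relationships & b.relationships:
        segments.add("relationship")
    return segments

def is_match(a, b, required=2):
    """Two entities match only if enough *different* segments agree."""
    return len(matched_segments(a, b)) >= required

on_file = Entity("robert smith", frozenset({"555-0100", "12 elm st"}),
                 frozenset({"square tires inc"}))
responder = Entity("robert smith", frozenset({"555-0100"}), frozenset())
housemate = Entity("alice smith", frozenset({"555-0100", "12 elm st"}), frozenset())
```

Here the responder matches on name and contact, while the housemate shares both phone and address but still agrees on only one segment, so is not treated as the same person.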
With all that said, I don't think you have enough information to make a semi-authoritative guess as to whether phone number 1234567 is the original user or a person who has had their phone number reassigned. I would attempt to capture more personally identifiable information about the entity registering the phone number. The registration of an existing phone number with different PII would not cause the original holder of the number to become invalidated, it merely signifies that a new user has come onboard. Until you get enough information back about the original holder of that number having a new method of contact, you should probably retain the association to the phone number. It could be the case the second registrant supplied a bogus number or either applicant fat-fingered their number and you wouldn't know which was accurate.
You can still run into issues with, say, corporate-issued mobile phones. A company has a pool of phones they distribute to folks. People quit and get hired, and those phone numbers remain the domain of the company but are assigned to people who then register for your service.
Hope this helps give you some starting points for solving the problem.