Database Design for Expanding Lists

Admittedly, I am simply looking for some direction here. I have a specific situation, and being a novice in database design I am lost on how to begin tackling this problem. Let me start by explaining my situation.
I have a MySQL table called contacts. As the name implies, it stores a list of contacts and the attributes that go along with each, such as first name, last name, email, phone number, etc. I would like users of my application to be able to add an unlimited amount of certain attributes for each contact. So, for instance rather than a contact having one phone number, the user could add another number, and another if they choose etc so essentially, a contact in my database can have as many phone numbers as the user needs. This will also be true for other fields in the table, but for the sake of simplicity let's just stick with phone number as an example.
So what is the best way to approach this? Should I have a separate table called contactsPhone and have a matching id column so that any number of rows in the phone table can be associated with one row in the contacts table? Or is there a way to store an ArrayList of some sort in the contacts table so I can have multiple phone numbers in just one field?

You should look at modelling something like this in a document database - a relational database is a poor choice for a flexible schema. You may be able to keep just this specific portion of your data in a document database.
If you must use a relational database, the common solution is the entity-attribute-value pattern - note that this requires multiple joins, makes ad hoc queries difficult and is generally slow.
Update:
I misread the question a bit - if you do know which attributes you want to hold multiple values and this list will not change (or not change much), entity-attribute-value may not be the best way forward.
A one-to-many table for each of these attributes will work (and is the standard relational solution for this kind of problem). Each such table needs a foreign key to your contacts table and a column to hold a single attribute value. This allows you to store multiple attribute values against a single contact.
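A minimal sketch of that one-to-many layout, using Python's built-in sqlite3 module as a stand-in for MySQL. The table name contact_phones, the column names and the sample contact are assumptions for illustration, not part of the original schema.

```python
import sqlite3


def build_contacts_db():
    """Create an in-memory database with a contacts table and a child phone table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
    conn.executescript("""
        CREATE TABLE contacts (
            id         INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name  TEXT NOT NULL
        );
        -- One row per phone number; contact_id points back at the owning contact.
        CREATE TABLE contact_phones (
            id         INTEGER PRIMARY KEY,
            contact_id INTEGER NOT NULL REFERENCES contacts(id) ON DELETE CASCADE,
            phone      TEXT NOT NULL
        );
        CREATE INDEX idx_contact_phones_contact ON contact_phones(contact_id);
    """)
    return conn


def add_phone(conn, contact_id, phone):
    """Attach one more phone number to an existing contact."""
    conn.execute(
        "INSERT INTO contact_phones (contact_id, phone) VALUES (?, ?)",
        (contact_id, phone),
    )


def phones_for(conn, contact_id):
    """Return every phone number stored for a contact, in insertion order."""
    rows = conn.execute(
        "SELECT phone FROM contact_phones WHERE contact_id = ? ORDER BY id",
        (contact_id,),
    )
    return [row[0] for row in rows]


conn = build_contacts_db()
conn.execute("INSERT INTO contacts (id, first_name, last_name) VALUES (1, 'Ada', 'Lovelace')")
add_phone(conn, 1, "555-0100")
add_phone(conn, 1, "555-0101")
print(phones_for(conn, 1))
```

Other repeatable attributes (email addresses, for example) would each get a table of the same shape.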

I would like users of my application to be able to add an unlimited
amount of certain attributes for each contact. So, for instance rather
than a contact having one phone number, the user could add another
number, and another if they choose etc so essentially, a contact in my
database can have as many phone numbers as the user needs.
You're not describing an unlimited number of attributes for each contact. (That's a Good Thing.) You're describing an unlimited number of rows for a single attribute, in this case a contact's phone number.
So, yes, a table of contact phone numbers would work well. You might want to give some thought to how the user might want to identify phone numbers. For example, do they need to distinguish home phone numbers from work numbers, and so on?
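One possible way to let users tell numbers apart is a label column on the phone table. This is only a sketch: the label values ('home', 'work', 'mobile', 'other') and the column names are assumptions, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contact_phones (
        id         INTEGER PRIMARY KEY,
        contact_id INTEGER NOT NULL,
        -- Hypothetical set of labels; adjust to whatever the users need.
        label      TEXT NOT NULL DEFAULT 'other'
                   CHECK (label IN ('home', 'work', 'mobile', 'other')),
        phone      TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO contact_phones (contact_id, label, phone) VALUES (?, ?, ?)",
    [(1, "home", "555-0100"), (1, "work", "555-0101"), (1, "work", "555-0102")],
)


def phones_by_label(conn, contact_id, label):
    """Return a contact's numbers that carry the given label."""
    rows = conn.execute(
        "SELECT phone FROM contact_phones "
        "WHERE contact_id = ? AND label = ? ORDER BY id",
        (contact_id, label),
    )
    return [row[0] for row in rows]


print(phones_by_label(conn, 1, "work"))
```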

Related

How can I fit these 3 columns into one?

For a project, I have a database with some tables.
All of them are related to one another. The table organization has a relationship with offer and user, etc.
However, I have some tables that only serve to link two other tables together.
Take a look at the diagram. I use users_interests to link the user to his interests. Same thing for badges and group.
It doesn't really feel efficient. When I try to get the interests of a user, I must first request the user's interests and, from that, request the interest details.
Is there a way that I can request the interests of a user without having to go through a second table?
https://dbdiagram.io/d/5da3f119ff5115114db53551
There are a couple of things you should consider when looking at this question/functionality and what you're going to be doing with the data.
Do you have a unique set of interests that will remain the same or be added to, without duplicates?
Do you need to have multiple interests for a single user?
If you answered "yes" to both of these questions, then you should leave things the way they are. You'll be able to reuse the same interests entry for multiple users as well as each user being able to have multiple interests.
If you only want one interest per user, but want to keep a "master list" (aka: lookup table) of interests, then you can move the interest_id to the users table.
If you don't care about duplicates, but you still want each user to have multiple interests, move the name column to the users_interests table.
Finally, if you don't care about duplicates and you want each user to have a single interest, move the name column to the users table.
FYI, how you currently have the tables structured will likely take up less disk space for a large number of users.
It's unclear what you're trying to do, so I included all options. Each option has its own set of pros and cons, and each will be required by one project or another at some point. You might want to learn now the differences and the reasoning behind choosing one option over another, so you don't have to think too hard about it later.
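If the current structure is kept, the link table does not force two round trips: a single JOIN returns a user's interests in one query. A sketch under assumptions (the column names are guesses based on the question, the sample rows are made up, and SQLite stands in for the real database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Link table: one row per (user, interest) pair.
    CREATE TABLE users_interests (
        user_id     INTEGER NOT NULL REFERENCES users(id),
        interest_id INTEGER NOT NULL REFERENCES interests(id),
        PRIMARY KEY (user_id, interest_id)
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO interests VALUES (1, 'chess'), (2, 'hiking');
    INSERT INTO users_interests VALUES (1, 1), (1, 2), (2, 2);
""")


def interests_of(conn, user_id):
    """Fetch the interest names for one user with a single joined query."""
    rows = conn.execute(
        """
        SELECT i.name
        FROM users_interests AS ui
        JOIN interests AS i ON i.id = ui.interest_id
        WHERE ui.user_id = ?
        ORDER BY i.name
        """,
        (user_id,),
    )
    return [row[0] for row in rows]


print(interests_of(conn, 1))
```

Here 'hiking' is stored once and shared by both users, which is the "no duplicates" property described above.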
You can do it if you structure the interests table like this:
id
user_id
name
but then you will lose the uniqueness of your interests: you will have many rows with the same name only because they connect to different users. As far as I understand it, your database is perfectly fine as it is now.

Bridge table in dimensional modeling

I am familiar with creating a bridge table between fact and dimension tables.
Is it a good idea to create a bridge table between a dimension and its multivalued attributes?
For example, a customer has multiple phone numbers. Can I just create a customer telephone dimension that has a one-to-many relationship with the customer dimension, or is creating a bridge table advisable?
Answering specifically for the multiple phones example.
I usually try to avoid bridge tables as much as possible. They are a complication of design, and keeping things simple is a better approach (although not always possible, of course).
In case of the multiple phones per customer, I would create 2 attributes:
Primary Phone
Other Phones
The first attribute will contain a main customer phone and is mandatory.
The second attribute might contain one or more other phone numbers, concatenated into a delimited string (e.g., "415-111-1111, 415-222-2222"). Such a design is acceptable because you will (most likely) use these extra phones only as descriptive information in your reports. Also, you will most likely have a varying but reasonably limited number of such phones - let's say, 0-3 or so - which means that this attribute will be either empty or contain a reasonably short string.
The above design is simple and clean and works for most situations, unless you need to perform specific analytics on the phone numbers, or if there are too many of them and they must be all used. In cases like that, I would put them into a fact table ("Customer Phones"), which might contain:
Customer_ID
Phone_Profile_ID
Date
Phone Number
Phone_Profile is a dimension that should contain phone attributes, e.g., "Phone Type" {"Land Line", "Mobile"}, "Phone Use" {"Primary", "Secondary"}, etc.
Such a fact table can also be a periodic snapshot (annual, monthly, etc.) of all customer phones and serve as a phone catalog. However, such elaborate designs are rarely needed (unless you design for a call center or a similar phone-heavy application).
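A rough sketch of the "Customer Phones" fact table and its Phone_Profile dimension, again using SQLite through Python. The table names (fact_customer_phones, dim_phone_profile), the snapshot_date column name and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension describing what kind of phone a fact row refers to.
    CREATE TABLE dim_phone_profile (
        phone_profile_id INTEGER PRIMARY KEY,
        phone_type       TEXT NOT NULL,   -- 'Land Line' or 'Mobile'
        phone_use        TEXT NOT NULL    -- 'Primary' or 'Secondary'
    );
    -- Fact table: one row per customer phone per snapshot date.
    CREATE TABLE fact_customer_phones (
        customer_id      INTEGER NOT NULL,
        phone_profile_id INTEGER NOT NULL
                         REFERENCES dim_phone_profile(phone_profile_id),
        snapshot_date    TEXT NOT NULL,
        phone_number     TEXT NOT NULL
    );
    INSERT INTO dim_phone_profile VALUES
        (1, 'Mobile', 'Primary'),
        (2, 'Land Line', 'Secondary');
    INSERT INTO fact_customer_phones VALUES
        (100, 1, '2024-01-01', '415-111-1111'),
        (100, 2, '2024-01-01', '415-222-2222'),
        (101, 1, '2024-01-01', '415-333-3333');
""")


def phone_counts_by_type(conn):
    """Count stored phones per phone type, the kind of analysis this design enables."""
    rows = conn.execute(
        """
        SELECT p.phone_type, COUNT(*)
        FROM fact_customer_phones AS f
        JOIN dim_phone_profile AS p ON p.phone_profile_id = f.phone_profile_id
        GROUP BY p.phone_type
        ORDER BY p.phone_type
        """
    )
    return dict(rows.fetchall())


print(phone_counts_by_type(conn))
```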

In Access database table, sequential field must be unique, but only when student ID matches between records

I am maintaining an Access Database for use with student admissions. I have a primary table which houses biographical information, and a secondary table which has application information, and allows for multiple applications per student (with each student having a unique student ID; that ID is stored in both tables and is how the applications are matched to the student).
Each application is assigned an "Application Number," and each student can only have one application with a specified number (i.e., student A cannot have two applications numbered "1", but can have 1, 2, and 3).
I would like to create a validation rule of some kind to prevent duplicates, but the whole column is not unique... it's only as it relates to the specified student.
Is there a way to create such a rule, or should I be arranging my data differently? I am open to making changes if it means a more efficient workflow.
I hope this makes sense... I wasn't sure how best to describe this. Thank you for any help.
If you are expecting the user doing the data entry to come up with a valid unique "application number", then the rule you are looking for would be a unique index on both StudentId and ApplicationNumber. (Remember, you can create an index which includes multiple columns.) This would mean that every pair of StudentId and ApplicationNumber must be unique.
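A small sketch of such a two-column unique index. SQLite (via Python's sqlite3) stands in for Access here, and the table and index names are assumptions; in Access the same kind of index can be defined on the table in design view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (
        id                 INTEGER PRIMARY KEY,
        student_id         INTEGER NOT NULL,
        application_number INTEGER NOT NULL
    );
    -- Uniqueness applies to the pair, not to either column on its own.
    CREATE UNIQUE INDEX ux_student_application
        ON applications (student_id, application_number);
""")


def try_add(conn, student_id, application_number):
    """Insert an application; return False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO applications (student_id, application_number) VALUES (?, ?)",
            (student_id, application_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_add(conn, 1, 1),  # student 1, application 1
    try_add(conn, 1, 2),  # same student, different number
    try_add(conn, 2, 1),  # different student, same number
    try_add(conn, 1, 1),  # duplicate pair, rejected by the index
]
print(results)
```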
However, I should note that requiring the person doing the data entry to come up with a unique application number themselves is very user-unfriendly.
Consider the following alternatives:
Have the database suggest a unique application number. Or, better yet,
Do not even suggest any number while the application is being filled-in, but instead issue a unique application number at the moment that the application is submitted. Or, even better yet,
Stop storing application numbers in the database, and instead have the database calculate them only when there is a need to display them, based on the student ID and the date of data entry of the application. (Caveat: if a student has 3 applications, and application #2 gets deleted, then the old application #3 will be renumbered to #2, thus causing confusion. So, this will only work if deletion is disallowed.)
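The last alternative could look roughly like this: the number is derived at query time by counting a student's applications entered up to and including the current one. SQLite stands in for Access, and the table name, the entered_on column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (
        id         INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL,
        entered_on TEXT NOT NULL        -- ISO date of data entry
    );
    INSERT INTO applications (student_id, entered_on) VALUES
        (7, '2024-01-10'),
        (8, '2024-01-11'),
        (7, '2024-02-01'),
        (7, '2024-03-05');
""")


def numbered_applications(conn, student_id):
    """Return (id, application_number) pairs, numbering by entry date, then id."""
    rows = conn.execute(
        """
        SELECT a.id,
               (SELECT COUNT(*)
                  FROM applications AS b
                 WHERE b.student_id = a.student_id
                   AND (b.entered_on < a.entered_on
                        OR (b.entered_on = a.entered_on AND b.id <= a.id))
               ) AS application_number
        FROM applications AS a
        WHERE a.student_id = ?
        ORDER BY application_number
        """,
        (student_id,),
    )
    return rows.fetchall()


print(numbered_applications(conn, 7))
```

Deleting the row with id 3 and running the query again shows the caveat in action: the row with id 4 moves from number 3 to number 2.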

How can I store an indefinite amount of stuff in a field of my database table?

Here's a simple version of the website I'm designing: Users can belong to one or more groups. As many groups as they want. When they log in they are presented with the groups they belong to. Ideally, in my Users table I'd like an array or something that is unbounded to which I can keep on adding the IDs of the groups that user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table which has an indefinite amount of user IDs which belong in that group. (side question: would that be more efficient than getting all the users of the group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column up with an indefinite list of IDs... The only way I can think of is making it like some super long varchar and having the list JSON encoded in there or something, but ewww
Please and thanks
Oh and it's a MySQL database (my website is in PHP), but after 2 years of PHP development I've recently decided PHP sucks and I hate it and ASP.NET web applications are the only way for me, so I guess I'll be implementing this on whatever kind of database I'll need for that.
Your intuition is correct; you don't want to have one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns:
user_id
group_id
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
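In database design, this might be represented with three tables: User, Group, and a UserGroup table that holds one row for each pairing of a user with a group.
This way, a UserGroup row represents any combination of a User and a Group without the problem of having "infinite columns."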
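A sketch of the joining table described in the answers above, using the user_group_membership name from the first answer. Everything else (the other table and column names, the sample users and groups) is assumed, and SQLite stands in for MySQL. The groups table is called site_groups here only because GROUPS is a reserved word in some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE site_groups (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Joining table: holds both primary keys, one row per membership.
    CREATE TABLE user_group_membership (
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        group_id INTEGER NOT NULL REFERENCES site_groups(group_id),
        PRIMARY KEY (user_id, group_id)
    );
    -- Second index so lookups by group are as direct as lookups by user.
    CREATE INDEX idx_membership_group ON user_group_membership(group_id);

    INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO site_groups VALUES (10, 'admins'), (20, 'editors');
    INSERT INTO user_group_membership VALUES (1, 10), (1, 20), (2, 20);
""")


def groups_of(conn, user_id):
    """Names of the groups a user belongs to (what to show at login)."""
    rows = conn.execute(
        """
        SELECT g.name
        FROM user_group_membership AS m
        JOIN site_groups AS g ON g.group_id = m.group_id
        WHERE m.user_id = ?
        ORDER BY g.name
        """,
        (user_id,),
    )
    return [row[0] for row in rows]


def members_of(conn, group_id):
    """Names of the users in a group, answered from the same joining table."""
    rows = conn.execute(
        """
        SELECT u.name
        FROM user_group_membership AS m
        JOIN users AS u ON u.user_id = m.user_id
        WHERE m.group_id = ?
        ORDER BY u.name
        """,
        (group_id,),
    )
    return [row[0] for row in rows]


print(groups_of(conn, 1), members_of(conn, 20))
```

The same table answers both directions of the relationship, so a separate list of user IDs on the group side is not needed.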
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form (1NF). 1NF is the first step in a design process called data normalization. Data normalization is a major aspect of database design. Normalized design is usually good design, although there are some situations where a different design might be better suited.
If your data is not in 1NF, you will end up doing sequential scans for some queries where a normalized database could be accessed via a quick indexed lookup. For a table with a billion rows, this could mean waiting an hour rather than a few seconds. With one value per field, each item of data can be indexed and looked up directly.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.

Extendable database schema for contacts (social)

I have an old application that needs upgrading. Doesn't everything nowadays?
The existing DB schema consists of predefined fields like phone, fax, email. Obviously with the social explosion over the last 5-7 years (or longer depending on your country) end users need more control over creating contact cards the way they see fit rather than just what I think might be useful.
I'm concerned here with "digital" addresses, i.e. one-line addresses such as phone=ccc ccc ccc ccc.
Since physical addresses are pretty standard in terms of requirements, in this case users will have to use what they are given (location, postal, delivery) in order to keep the scope manageable.
So I'm wondering what the best practice format for storing digital info is. To me it seems I have two choices:
A simple 4 field table (ContactId, AddressTypeId, Address, FormatterId)
1000, "phone", "ccc ccc ccc ccc", phoneformatter
1000, "facebook", "myfacebook", facebookformatter
This would then be JOINED anywhere it's needed. The table would get massive though, and I suspect the join performance would degrade over time.
A json blob that would require additional processing once read (ContactId, Addresses)
1000, [{"phone": "ccc ccc ccc ccc"}, {"facebook": "myfacebook"}]
Or ... something else.
This db is for use in a given country by customers only trading domestically with client bases ranging from 3000-12000 accounts and then however many contacts per account - averages about 10 in current system.
My primary concern is user flexibility but performance is a key consideration in that. So I dunno, just do whatever and throw heaps of hardware at it ;)
Application is in C# if that makes any difference re: post query processing.
I would not go for the JSON blob. This will be nasty if you need to answer any queries like:-
Does anyone have me in their Facebook contacts?
What's the most popular type of social media contact?
You would be forced to parse the JSON for every record and be unable to create a simple index.
Your first solution is nearly correct; however, FormatterId would need to be on an AddressType table. What you have is not normalised, as FormatterId would depend only on AddressTypeId. So you would have three tables:-
Contact
ContactAddress
AddressType
You haven't stated whether you need to store two addresses of the same type against a single contact, e.g. if someone has two Twitter accounts. Answering this question will allow you to define the correct primary key on ContactAddress. It would either be (ContactId, AddressTypeId), if you can only have one of each type per contact, or a synthetic key (ContactAddressId).
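A sketch of the three-table layout, assuming the synthetic-key variant so that a contact may hold several addresses of one type. The Name columns, the numeric FormatterId values, the index and the sample rows are assumptions; SQLite stands in for whatever database the C# application uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (ContactId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- FormatterId lives here because it depends only on the address type.
    CREATE TABLE AddressType (
        AddressTypeId INTEGER PRIMARY KEY,
        Name          TEXT NOT NULL UNIQUE,
        FormatterId   INTEGER NOT NULL
    );
    CREATE TABLE ContactAddress (
        ContactAddressId INTEGER PRIMARY KEY,  -- synthetic key
        ContactId        INTEGER NOT NULL REFERENCES Contact(ContactId),
        AddressTypeId    INTEGER NOT NULL REFERENCES AddressType(AddressTypeId),
        Address          TEXT NOT NULL
    );
    -- A plain index supports "who has this address?" lookups.
    CREATE INDEX idx_address_lookup ON ContactAddress(AddressTypeId, Address);

    INSERT INTO Contact VALUES (1000, 'first contact'), (1001, 'second contact');
    INSERT INTO AddressType VALUES (1, 'phone', 1), (2, 'facebook', 2);
    INSERT INTO ContactAddress (ContactId, AddressTypeId, Address) VALUES
        (1000, 1, 'ccc ccc ccc ccc'),
        (1000, 2, 'myfacebook'),
        (1001, 2, 'otherfacebook');
""")


def most_popular_type(conn):
    """Name of the address type with the most stored addresses."""
    row = conn.execute(
        """
        SELECT t.Name, COUNT(*) AS n
        FROM ContactAddress AS a
        JOIN AddressType AS t ON t.AddressTypeId = a.AddressTypeId
        GROUP BY t.Name
        ORDER BY n DESC, t.Name
        LIMIT 1
        """
    ).fetchone()
    return row[0]


def contacts_with_address(conn, type_name, address):
    """Contact IDs that list the given address under the given type."""
    rows = conn.execute(
        """
        SELECT a.ContactId
        FROM ContactAddress AS a
        JOIN AddressType AS t ON t.AddressTypeId = a.AddressTypeId
        WHERE t.Name = ? AND a.Address = ?
        ORDER BY a.ContactId
        """,
        (type_name, address),
    )
    return [row[0] for row in rows]


print(most_popular_type(conn), contacts_with_address(conn, "facebook", "myfacebook"))
```

Both example queries from this answer become ordinary SQL, with no per-record JSON parsing.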
Well, I believe you have a table named contact
contact(contactid, contact details, other details)
and now you want to move these contact details out of the contact table, because they may contain digital addresses, phone numbers and so on.
But the table you are considering,
(ContactId, AddressTypeId, Address, FormatterId), is not in normal form, and you can't uniquely identify a tuple until you read all four columns, which is bad. In this case indexing is not going to help you either.
So it would be better to have a separate table for each type of digital address, with an index on contactid:
facebookdetails(contactid, rest of the details)
phonedetails(contactid, rest of the details)
The query can then be a join of all these tables, and it will not degrade performance.
Hope this will help :)

Resources