Designing a database with several different kinds of products? [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
As part of a recent project I have started planning out, I am required to build the structure of a database which will contain several products. As an example, think of the way Amazon is structured. It has several categories and within those categories, several sub-categories.
My problem is that conceptually I am unsure how to build the database tables. I have thought of creating a self-referencing table for the categories and sub-categories, but since I plan to have a wide variety of products within the database, I don't know if I should group them all into one table called "Products" or put them in separate tables.
For example, a toilet would be one product while a television could be another. Even though they have different categories/sub-categories, they are both products. By placing them in one "Products" table, they would share attributes that would make no sense for both of them. A toilet would not need an attribute for resolution or display size (unless it is a very special toilet?) and a television wouldn't need a seat size attribute.
I thought that one way to get around this and still keep everything in one table would be to create a bunch of nullable attributes that could be left empty for items that don't need them, but common sense is telling me that this is probably not the best way to go about things.
So at this point, I feel that my real problem is figuring out how to structure this database and its tables with several categories/sub-categories and different kinds of items. Would I create a table for televisions and a table for toilets? How would this all be structured? How are these sorts of problems normally planned out?
Thanks

A generic products table is a good way to go. You're not going to want to create a new table in your schema every time you have a new type of product.
Similarly, for the categories, a self-referencing table with a parent/child relationship is better, so you don't have to create a new table each time you want a new level of sub-category.
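A minimal sketch of such a self-referencing category table, using Python's built-in sqlite3 module. The table name `categories`, the column `parent_id`, the helper `category_path` and the sample rows are assumptions made for illustration; they do not come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every category points at its parent in the same table.
conn.execute("""
    CREATE TABLE categories (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES categories(id)  -- NULL for top-level categories
    )
""")
conn.executemany(
    "INSERT INTO categories (id, name, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Household", None),
        (2, "Bathroom", 1),
        (3, "Toilets", 2),
        (4, "Electronics", None),
    ],
)


def category_path(conn, category_id):
    """Return the category names from the root down to the given category."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM categories WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, c.parent_id, chain.depth + 1
            FROM categories AS c
            JOIN chain ON c.id = chain.parent_id
        )
        SELECT name FROM chain ORDER BY depth DESC
        """,
        (category_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(" > ".join(category_path(conn, 3)))
```

Adding a deeper level of sub-category is then just another row with a `parent_id`; no schema change is needed.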
Your products table should contain information that's common amongst all your products. E.g. name and possibly price (although if you have different prices for an individual product, then price is best stored in another table that references the product).
If you have a bunch of other information that relates to characteristics for each product, then maybe create an attributes table and another table that references each attribute's value for that product.
Here's a simple example schema:
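A minimal sketch of what such a schema could look like, using Python's built-in sqlite3 module. All table names, column names and sample values here are hypothetical and chosen for illustration; they are not the answerer's own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical layout: common columns live in products, everything else
# is stored as (product, attribute, value) rows.
conn.executescript("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL
    );
    CREATE TABLE attributes (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE product_attribute_values (
        product_id   INTEGER NOT NULL REFERENCES products(id),
        attribute_id INTEGER NOT NULL REFERENCES attributes(id),
        value        TEXT NOT NULL,
        PRIMARY KEY (product_id, attribute_id)
    );

    INSERT INTO products VALUES (1, 'Ceramic toilet', 149.0), (2, 'Television', 499.0);
    INSERT INTO attributes VALUES (1, 'seat_size'), (2, 'resolution'), (3, 'display_size');
    INSERT INTO product_attribute_values VALUES
        (1, 1, 'standard'),
        (2, 2, '3840x2160'),
        (2, 3, '55 in');
""")


def attributes_of(conn, product_id):
    """Return a dict of attribute name -> value for one product."""
    rows = conn.execute(
        """
        SELECT a.name, v.value
        FROM product_attribute_values AS v
        JOIN attributes AS a ON a.id = v.attribute_id
        WHERE v.product_id = ?
        """,
        (product_id,),
    ).fetchall()
    return dict(rows)


print(attributes_of(conn, 2))
```

The toilet never gets a resolution row and the television never gets a seat size row, so no column is left empty for products that do not need it. The trade-off is that every value is stored as text and has to be joined back in.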

This is more of a design decision than anything else.
This is how I would separate the tables:
categories (e.g. household)
sub_categories (e.g. bathroom, with a foreign key to household)
products (e.g. Ceramic toilet)
As for the extra attributes, you can either store these directly within the products table, or create another table called products_extra_attributes and store an optional (nullable) foreign key in the products table that points to the additional attributes for the individual product.
Make sense? I'll make an edit later on if not as I'm answering this question from my phone.

It depends on how many products you have. If you only sold toilets and televisions, I'd say go ahead and make totally separate tables for them. However, if you have hundreds of different product types, all with different attributes, I would suggest creating a products table that stores the common attributes (they all have a cost and, probably, a size), then a product type table that specifies a set of attributes for each product type, then an attributes table to define the attributes, and lastly a product values table.
So for example, take a Sony TV. It would be in products with the price and a link to the product type, which would be TV. The product type would join one-to-many to the attributes that all TVs have, and the Sony TV would have entries in the product values table for each of those attributes. This way, you wouldn't have to redefine shared attributes, so when you started selling other things that had a resolution, you could just add that attribute to the new product type.
Make sense?
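As a rough sketch of the four tables described above (plus a link table between types and attributes), using Python's built-in sqlite3 module. Every table name, column name and sample row is an assumption made for illustration. The helper `missing_attributes` is likewise hypothetical; it shows one thing the product type table buys you, namely finding attributes a product's type expects but that have no value yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE product_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE attributes    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

    -- Which attributes each product type is expected to have.
    CREATE TABLE product_type_attributes (
        product_type_id INTEGER NOT NULL REFERENCES product_types(id),
        attribute_id    INTEGER NOT NULL REFERENCES attributes(id),
        PRIMARY KEY (product_type_id, attribute_id)
    );
    CREATE TABLE products (
        id              INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        price           REAL NOT NULL,
        product_type_id INTEGER NOT NULL REFERENCES product_types(id)
    );
    CREATE TABLE product_values (
        product_id   INTEGER NOT NULL REFERENCES products(id),
        attribute_id INTEGER NOT NULL REFERENCES attributes(id),
        value        TEXT NOT NULL,
        PRIMARY KEY (product_id, attribute_id)
    );

    INSERT INTO product_types VALUES (1, 'TV'), (2, 'Monitor');
    INSERT INTO attributes VALUES (1, 'resolution'), (2, 'display_size');
    -- 'resolution' is defined once and shared by both product types.
    INSERT INTO product_type_attributes VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO products VALUES (1, 'Sony TV', 799.0, 1);
    INSERT INTO product_values VALUES (1, 1, '3840x2160');
""")


def missing_attributes(conn, product_id):
    """List attributes the product's type expects but that have no value yet."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM products AS p
        JOIN product_type_attributes AS pta ON pta.product_type_id = p.product_type_id
        JOIN attributes AS a ON a.id = pta.attribute_id
        LEFT JOIN product_values AS pv
               ON pv.product_id = p.id AND pv.attribute_id = a.id
        WHERE p.id = ? AND pv.product_id IS NULL
        ORDER BY a.name
        """,
        (product_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(missing_attributes(conn, 1))
```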

Related

A Master Category Table Where Records Have Various Categories OR There Should Be A Table For Each Category Type

Recently I encountered an application where a master table is maintained that contains the data of more than 20 categories. For example, it has categories named Country, State and City.
So my question is: is it better to move each category out into a separate table and fetch the data through joins, or should everything stay inside a single table?
P.S. In future the category count might increase to 50 or more.
P.S. The application is based on EF6 + SQL Server.
Edited Version
I just want to know what the best approach is in the above scenario: a single table with proper indexing, or the DB normalization approach, putting each category into a separate table and maintaining relationships through FKs.
Normally, categories are put into separate tables. This conforms more closely with normalized database structures and the definition of entities. In particular, it allows for proper foreign key relationships to be defined. That is a big win for data integrity.
Sometimes categories are put into a single table. This can, of course, be confusing; consider, for instance, "Florida, Massachusetts" or "Washington, Iowa" (these are real places).
Putting categories in one table has one major advantage: all the text is in a single location. That can be very handy for internationalization efforts. To be honest, that is the situation where I have seen this used.
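To illustrate the foreign key point made above, here is a small sketch with one table per category, using Python's built-in sqlite3 module rather than SQL Server. The table names, columns and the helper `add_city` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE state (
        id         INTEGER PRIMARY KEY,
        country_id INTEGER NOT NULL REFERENCES country(id),
        name       TEXT NOT NULL,
        UNIQUE (country_id, name)
    );
    CREATE TABLE city (
        id       INTEGER PRIMARY KEY,
        state_id INTEGER NOT NULL REFERENCES state(id),
        name     TEXT NOT NULL
    );

    INSERT INTO country VALUES (1, 'United States');
    INSERT INTO state VALUES (1, 1, 'Florida'), (2, 1, 'Massachusetts');
    -- The town of Florida, Massachusetts: unambiguous because it is a city row.
    INSERT INTO city VALUES (1, 2, 'Florida');
""")


def add_city(conn, state_id, name):
    """Insert a city; return False if the state does not exist."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO city (state_id, name) VALUES (?, ?)", (state_id, name)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_city(conn, 1, "Miami"))
print(add_city(conn, 99, "Nowhere"))
```

With a single master table, a city's parent reference could point at any row, including a country or another city, unless extra checks are added.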

How to design a database table from a form? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm learning how to design databases, and I've been asked to create the table that will hold this form: Medical History. I'm learning to use Django/Python and I've already made the markup in HTML and CSS, but I don't think that making each question on the form a column would be the best approach. For example, I've thought of making the family history a separate table, while in the review of systems I want to make each one a set.
A pragmatic approach is to define tables based on the following criteria:
1) easy to select data from them (without needing many JOINs or convoluted queries that require ORs or string splitting)
2) easy to understand (each concept maps to one table)
=> usually, normalized structures do the trick here
Of course, these criteria are challenged in highly transactional environments (many INSERTs, UPDATEs, DELETEs).
I would assume then your case has moderate INSERTs, but more SELECTs (reports).
For Family history section I would normalize everything:
DiseaseType
    DiseaseTypeId
    Code -- use to separate from a name that can change in time
    Name -- breast cancer, colon cancer etc.
CollateralOption
    CollateralOptionId
    Code -- I would put UNIQUE constraints on Codes and Names
    Name -- no, yes, father
FamilyHistory
    FamilyHistoryId INT PK IDENTITY -- this may be missing, but I prefer if I use an ORM
    PatientId -> FK -> Patient
    DiseaseTypeId -> FK -> DiseaseType
    CollateralOptionId -> FK -> CollateralOption
    Checked BIT -- you may not define this and have records for Checked ones.
                -- having this may put some storage pressure
                -- but prevent some "stuffing" in the queries
These structures make it easy to COUNT the number of patients with colon cancer cases in their family, for example.
In short: unless there is a serious reason against it, go for normalized structures.
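The family history tables and the COUNT query mentioned above could be sketched as follows with Python's built-in sqlite3 module. This is an illustration under assumptions: the snake_case names, the `patient` table contents and the sample rows are made up, SQLite types stand in for the SQL Server ones, and the optional Checked column is left out so that only checked combinations are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE disease_type (
        disease_type_id INTEGER PRIMARY KEY,
        code            TEXT NOT NULL UNIQUE,
        name            TEXT NOT NULL UNIQUE
    );
    CREATE TABLE collateral_option (
        collateral_option_id INTEGER PRIMARY KEY,
        code                 TEXT NOT NULL UNIQUE,
        name                 TEXT NOT NULL UNIQUE
    );
    -- One row per checked (patient, disease, relative) combination.
    CREATE TABLE family_history (
        family_history_id    INTEGER PRIMARY KEY,
        patient_id           INTEGER NOT NULL REFERENCES patient(patient_id),
        disease_type_id      INTEGER NOT NULL REFERENCES disease_type(disease_type_id),
        collateral_option_id INTEGER NOT NULL
            REFERENCES collateral_option(collateral_option_id),
        UNIQUE (patient_id, disease_type_id, collateral_option_id)
    );

    INSERT INTO patient VALUES (1, 'Patient A'), (2, 'Patient B'), (3, 'Patient C');
    INSERT INTO disease_type VALUES
        (1, 'COLON', 'colon cancer'), (2, 'BREAST', 'breast cancer');
    INSERT INTO collateral_option VALUES (1, 'FATHER', 'father'), (2, 'MOTHER', 'mother');
    INSERT INTO family_history (patient_id, disease_type_id, collateral_option_id) VALUES
        (1, 1, 1), (1, 1, 2), (2, 2, 2), (3, 1, 2);
""")


def patients_with_family_disease(conn, disease_code):
    """Count distinct patients with at least one relative having the disease."""
    (count,) = conn.execute(
        """
        SELECT COUNT(DISTINCT fh.patient_id)
        FROM family_history AS fh
        JOIN disease_type AS dt ON dt.disease_type_id = fh.disease_type_id
        WHERE dt.code = ?
        """,
        (disease_code,),
    ).fetchone()
    return count


print(patients_with_family_disease(conn, "COLON"))
```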
I don't see any advantage in performing design tricks on this data structure. Yes, making a boolean attribute of each of your checkboxes, and a string attribute of each of your free texts, will lead to a high number of attributes in one table. But this is just the logical structure of your data. All these attributes are dependent on the key, some person id (or at least that's what I assume, as a medical layman). Also, I assume that they are independent of each other, i.e. not determined by some other combination of attributes. So they go in the same table. Putting them in several tables won't gain anything, but will force you to do lots of joins if you query on different types of attributes (like all patients whose mother had breast cancer and who now have breast lumps).
I don't know exactly what you mean by making sets of some attributes. Do you mean to have just one attribute, and encode the sequence of boolean values e.g. in one integer, like 5 for yes-no-yes? Again that's not worth the trouble, as it won't save any space or whatever, but will make queries more complicated.
If you are still in doubt, try to formulate the most frequent use cases for those data, which will probably be typical queries on combinations of these attributes. Then we might see whether a different structure would make your life easier.

One-To-Many join table to avoid nullable columns [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 10 months ago.
I wonder whether I am the first programmer struggling with this problem, but I can't find anything on SO about it.
The point of my question: is it a good idea to make a one-to-many join table in order to prevent NULL references?
To explain: in our business requirements, we have some activities that cause a payment, i.e. sales, loans, rents, services etc. Each activity can have zero, one or more payments.
When designing the DB, we have a table for each activity (Sales, Loans, Rents, Services etc.) and a Payment table. The relation between the activities and the payments is one-to-many: each loan can have many payments, and each rent can have many payments.
But there is a problem: each payment can belong to a loan or a sale or any other activity, and we need to relate it to its corresponding activity. I am thinking about two options:
1) Add a foreign key in the Payments table for each kind of activity (LoanID, RentID, ServiceID etc.) and make them nullable, since a loan is neither a service nor a rent.
I personally don't like this solution. It is very error-prone: one can very easily forget to add the matching FK because it is nullable, and then we don't know what the payment is about, so we lose referential integrity. It is possible to overcome this problem by creating a constraint that ensures there is exactly one FK set, but it is not so easy to create the right constraint and take all possible options into account, and it is hard to recreate the constraint when adding new FK columns.
Not to mention the ugliness of such a table, or the main issue of leaving unnecessary nullable columns in a table.
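For reference, the "exactly one FK is set" constraint from option 1 could be sketched like this with Python's built-in sqlite3 module. The table and column names are hypothetical. The CHECK relies on SQLite evaluating `x IS NOT NULL` to 0 or 1, which is an SQLite-specific shortcut; other databases would typically need a CASE expression per column instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE loans    (id INTEGER PRIMARY KEY);
    CREATE TABLE rents    (id INTEGER PRIMARY KEY);
    CREATE TABLE services (id INTEGER PRIMARY KEY);

    CREATE TABLE payments (
        id         INTEGER PRIMARY KEY,
        amount     REAL NOT NULL,
        loan_id    INTEGER REFERENCES loans(id),
        rent_id    INTEGER REFERENCES rents(id),
        service_id INTEGER REFERENCES services(id),
        -- Exactly one of the nullable FKs must be filled in.
        CHECK (
            (loan_id IS NOT NULL) + (rent_id IS NOT NULL) + (service_id IS NOT NULL) = 1
        )
    );

    INSERT INTO loans VALUES (1);
    INSERT INTO rents VALUES (1);
""")


def try_insert_payment(conn, amount, loan_id=None, rent_id=None, service_id=None):
    """Insert a payment; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO payments (amount, loan_id, rent_id, service_id) "
                "VALUES (?, ?, ?, ?)",
                (amount, loan_id, rent_id, service_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert_payment(conn, 100.0, loan_id=1))             # one FK set
print(try_insert_payment(conn, 100.0))                        # no FK set
print(try_insert_payment(conn, 100.0, loan_id=1, rent_id=1))  # two FKs set
```

The maintenance concern raised above still applies: each new activity type means another nullable column and another term in the CHECK.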
2) A second solution is to create join tables in between for each kind of activity (ActivityPayments, i.e. LoanPayments etc.) that hold the activity ID and the payment ID, like a many-to-many table.
This avoids the problems described above: each payment is related to its corresponding activity, there is no loss of referential integrity, and there are no nullable columns.
The problem, however, is that it enlarges the database, adds another layer between the tables, and needs more work when joining in queries.
Does anyone have any ideas?
Another option is to create a supertype table, say Activity, with all of the common attributes:
This should keep the number of tables small, and still allow you to identify the activity type for a payment. Note that this assumes that common attributes exist between the different activities. If that is not the case, the second option you listed is probably the way to go.
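A rough sketch of the supertype idea, using Python's built-in sqlite3 module. The column names, the `customer` common attribute, the subtype columns and the sample rows are all assumptions for illustration, not the answerer's schema. Each subtype row reuses the supertype's key as its own primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Supertype: one row per activity, holding the common attributes.
    CREATE TABLE activity (
        activity_id   INTEGER PRIMARY KEY,
        activity_type TEXT NOT NULL
            CHECK (activity_type IN ('sale', 'loan', 'rent', 'service')),
        customer      TEXT NOT NULL  -- hypothetical common attribute
    );
    -- Subtypes: the primary key is also a foreign key to the supertype.
    CREATE TABLE loan (
        loan_id       INTEGER PRIMARY KEY REFERENCES activity(activity_id),
        interest_rate REAL NOT NULL
    );
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY REFERENCES activity(activity_id),
        item    TEXT NOT NULL
    );
    -- Payments need a single, non-nullable FK.
    CREATE TABLE payment (
        payment_id  INTEGER PRIMARY KEY,
        activity_id INTEGER NOT NULL REFERENCES activity(activity_id),
        amount      REAL NOT NULL
    );

    INSERT INTO activity VALUES (1, 'loan', 'Customer A'), (2, 'sale', 'Customer B');
    INSERT INTO loan VALUES (1, 0.05);
    INSERT INTO sale VALUES (2, 'Television');
    INSERT INTO payment (activity_id, amount) VALUES (1, 100.0), (1, 150.0), (2, 499.0);
""")


def total_paid_by_type(conn):
    """Sum payments per activity type via the supertype table."""
    rows = conn.execute(
        """
        SELECT a.activity_type, SUM(p.amount)
        FROM payment AS p
        JOIN activity AS a ON a.activity_id = p.activity_id
        GROUP BY a.activity_type
        """
    ).fetchall()
    return dict(rows)


print(total_paid_by_type(conn))
```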
Look up the following tags in SO.
single-table-inheritance
class-table-inheritance
shared-primary-key
The info tab on these tags gives you a brief explanation, and the questions grouped under the tag will give you some examples.
Single table inheritance is similar to the solution you presented and are unhappy with. Yes, it does involve NULLs. Generally, user errors here are prevented by the application.
Class table inheritance is like the solution offered by AMS. Note that SalesID and LoanID are listed as both a PK and an FK. This hints at the technique of shared primary key. With this, SalesID and LoanID are copies of a value in ActivityID. Again, it's the application layer that does the necessary work to make sure the copies are right.
In this specific case (not necessarily applicable in similar situations), we usually calculate dynamically, in a view/function, what each payment was for (in chronological order).
In other instances we had one sale table where each product can be a physical product, a service or any other for-pay offer, so that limits all debit transactions to one table.
HTA

design a database for a shopping-cart application? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have never designed a database/data-model/schema from scratch, especially for a web-application.
In some recent job interviews, I was asked to 'design' a database for a shopping cart application. Now I am working on a mobile shopping application (retail, uses PhoneGap) with a backend that needs to store and process product and order info. The scale of this problem is so huge that I don't know where to start. I was hoping for some advice on:
How should I approach such a problem (shopping cart application DB)? Where should I start?
Are there any common mistakes/pitfalls that I should avoid?
What optimization/efficiency paradigms should I keep in mind when designing such a DB?
How should I go about identifying entities in the problem space (products, orders, etc.)? How should I derive the relationships between them?
When an interviewer asks such a question, what exactly is he looking for? Is there something I should/should not say?
I should also clarify that:
Yes, I am a noob, and my motives are to learn database design AND prepare for upcoming job interviews. I have read DBMS books where they describe individual concepts in detail, but I have no clue how to put those things together and start designing a database.
I have seen other threads on database design. The authors already tend to possess some knowledge of how to break the problem down. I would like to understand the methodology behind doing that.
Links to outside resources, comments, suggestions and anything that will put me on the right track is much appreciated. I hope this thread serves as a learning experience for myself and others.
There can be five tables in the database:
CATEGORY: this table stores information about the product categories of your store and the category hierarchy. The parent field of this table stores the ID of the parent category.
PRODUCT: all products of your store are stored in this table. It has a foreign key categoryID, which identifies the category to which a product belongs.
ORDER: this table stores information about all orders made by visitors of your store.
ORDERED_SHOPPING_CART: this table is tightly connected with the PRODUCT and ORDER tables; it stores information on the content of customers' orders.
SPECIAL_OFFER: this table contains a list of products that are shown on the home page as special offers.
Here is a brief answer on how I would tackle this problem. Firstly, there are loads of open source or free web-based shopping carts. This means that you can get one, set up the database and then have a good look around at what they did.
Ask yourself questions such as: why have they done that? Why is it good? What downside could there be? How would I do it differently? Why?
I would try to procure a database design tool that allows you to visualize the database (like the database designer in Visual Studio; I have one from MicroOLAP that does pgsql databases).
Then you need to think about what you need in the database. What is the customer going to do? Buy products! Therefore you need a products table. Without going down the whole route you can see the point. Imagine what is needed, then basically make a table for it.
If you have more than one option for a field in a table, make another table with a relation to it. Say you have a product table with a status field, and there could be more than one status (e.g. out of stock, limited number, big item, expensive). Instead of hard-coding these values, make a table and allow the user to add items to it. Then in the product table add a field status_id and link it to the status table.
Many-to-many relationships are useful things to know. (I fell short on this myself.) Say you have component and product tables. A product can be made up of lots of components, and a component could be allocated to many products. Create a mediator table, something like prodcomp, with fields like id, prod_id, comp_id and qtyneeded.
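The mediator table could be sketched like this with Python's built-in sqlite3 module. The prodcomp fields follow the ones named above; the `product` and `component` table layouts, the constraints, the helper `components_for` and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE product   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Mediator table: one row per (product, component) pair.
    CREATE TABLE prodcomp (
        id        INTEGER PRIMARY KEY,
        prod_id   INTEGER NOT NULL REFERENCES product(id),
        comp_id   INTEGER NOT NULL REFERENCES component(id),
        qtyneeded INTEGER NOT NULL CHECK (qtyneeded > 0),
        UNIQUE (prod_id, comp_id)
    );

    INSERT INTO product VALUES (1, 'Desk'), (2, 'Shelf');
    INSERT INTO component VALUES (1, 'Screw'), (2, 'Board');
    INSERT INTO prodcomp (prod_id, comp_id, qtyneeded) VALUES
        (1, 1, 8), (1, 2, 3), (2, 1, 4), (2, 2, 2);
""")


def components_for(conn, product_name):
    """Return (component name, quantity) pairs for a product, sorted by name."""
    return conn.execute(
        """
        SELECT c.name, pc.qtyneeded
        FROM prodcomp AS pc
        JOIN product AS p ON p.id = pc.prod_id
        JOIN component AS c ON c.id = pc.comp_id
        WHERE p.name = ?
        ORDER BY c.name
        """,
        (product_name,),
    ).fetchall()


print(components_for(conn, "Desk"))
```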
Learn to index correctly.
Don't create the database until you have a solid idea of how it will work. This saves time in recreating it later.
There may be more to this; however, I hope I have given you a good start.

When to split up models into multiple database tables? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm working with Ruby on Rails, but this question I think is broader than that and applies to database design generally.
When is it a good idea to split a single model up into multiple tables? For example, assume I have a User model, and the number of fields in the model is really starting to add up. For example, the User can enter his website, his birthday, his time zone, his etc etc.
Is there any advantage or disadvantage to splitting up the model, such that maybe the User table only has basic info like login and email, and then there is another table that every User has that is something like UserInfo, and another that is UserPermissions, and another that is UserPrivacySettings or something like that?
Edit: To add additional gloss on this, most of the fields are rarely accessed, except on pages specific to them. For example, things like birthday are only ever accessed if someone clicks through to a User's profile. Furthermore, some of the fields (which are rarely accessed) have the potential to be extremely large. Most of the fields have the potential to be either set to blank or nil.
Generally it is a good idea to put things which have a one-to-one relationship in the same table. Unless your userbase includes the Queen or Paddington Bear, a user has just one birthday, so that should be an attribute of the USERS table. Things which have a one-to-many relationship should be in separate tables. So, if a user can have multiple privacy settings by all means split them out.
Splitting one table into several tables can make queries more complicated or slower, if we want to retrieve all the user's information at once. On the other hand if we have a set of attributes which is only ever queried or updated in a discrete fashion then having a separate table to hold that data is a sound idea.
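A small sketch of that trade-off, using Python's built-in sqlite3 module instead of Rails. The `users` / `user_profiles` split, the column names and the helper functions are hypothetical. The login path touches only the narrow table, while the profile page pays for one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Frequently read, always populated columns.
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        login TEXT NOT NULL UNIQUE,
        email TEXT NOT NULL
    );
    -- Rarely read, often empty or large columns; at most one row per user.
    CREATE TABLE user_profiles (
        user_id   INTEGER PRIMARY KEY REFERENCES users(id),
        website   TEXT,
        birthday  TEXT,
        time_zone TEXT,
        bio       TEXT
    );

    INSERT INTO users VALUES
        (1, 'alice', 'alice@example.com'), (2, 'bob', 'bob@example.com');
    INSERT INTO user_profiles (user_id, birthday, time_zone)
        VALUES (1, '1990-04-01', 'UTC');
""")


def login_info(conn, login):
    """Basic lookup: no join needed."""
    return conn.execute(
        "SELECT id, email FROM users WHERE login = ?", (login,)
    ).fetchone()


def full_profile(conn, login):
    """Profile page: LEFT JOIN so users without a profile row still appear."""
    return conn.execute(
        """
        SELECT u.login, u.email, p.birthday, p.time_zone
        FROM users AS u
        LEFT JOIN user_profiles AS p ON p.user_id = u.id
        WHERE u.login = ?
        """,
        (login,),
    ).fetchone()


print(login_info(conn, "alice"))
print(full_profile(conn, "bob"))
```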
This would be a situation for analysis.
When you find that a lot of the fields in such a table are NULL and can be grouped together (e.g. UserContactInfo), it is time to look at extracting the information to its own table.
You want to avoid having a table with tens/hundreds of fields with only sparsely entered data.
Rather, try to group the data logically, and create the main table containing the fields that are mostly all populated. Then you can split subsets of the data, almost as you would represent them on the UI (Contact Info, Personal Interest, Work Related Info, etc.), into separate tables.
Retrieving a row is more expensive if it has many columns, especially if you usually need just some of the fields. Also, hosting stuff such as the components of an address in a separate class is a case of DRY. On the other hand, if you do need all fields of an object, it takes longer to execute a compound query.
I would normally not bother to distribute classes over several tables just to make the code more readable (i.e. without actually reusable parts like addresses).
