Database Design Structure | Product & ProductKids - database

What would be the best way to design a MySQL table that can be filled from an HTML (+PHP) form?
Actually I have this structure:
Table PRODUCT
ID | Ordernumber | Name | Desc | Price
Table PRODUCT_KIDS
ID | MasterProductID | Ordernumber | Price
The only difference between my 2 tables are Name and Desc.
The ADD HTML form looks like this:
DATA FOR PRODUCT
Ordernumber
Name
Desc
Price
DATA FOR PRODUCT_KIDS
Ordernumber
Price
For some reason, customers sometimes want, for example, only 2 PRODUCT_KIDS without their MasterProduct. In this case I need the Name and the Description from the Master-Product.
My questions are the following:
Should I merge these two tables together? Is this the best way to search for something?
If I merge these 2 tables, should I save the Name & Desc for the PRODUCT_KIDS as well (for the example above)?
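With the two-table structure above, the Name and Desc of a child do not have to be copied: they can be read through MasterProductID with a join. A minimal sketch using Python's built-in sqlite3 module; the column name description (used instead of Desc, which is an SQL keyword), the sample rows and the function names are made up for illustration, and the SELECT itself is plain SQL that is not specific to SQLite:

```python
import sqlite3


def build_db():
    """Create the question's two tables in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product (
            id          INTEGER PRIMARY KEY,
            ordernumber TEXT NOT NULL,
            name        TEXT NOT NULL,
            description TEXT,
            price       REAL NOT NULL
        );
        CREATE TABLE product_kids (
            id                INTEGER PRIMARY KEY,
            master_product_id INTEGER NOT NULL REFERENCES product(id),
            ordernumber       TEXT NOT NULL,
            price             REAL NOT NULL
        );
        -- Hypothetical sample data: one master product with two children.
        INSERT INTO product VALUES (1, 'M-100', 'Shirt', 'Cotton shirt', 20.0);
        INSERT INTO product_kids VALUES (1, 1, 'K-101', 18.0);
        INSERT INTO product_kids VALUES (2, 1, 'K-102', 22.0);
    """)
    return con


def kids_with_master_text(con):
    """Return each child with the name and description of its master product."""
    return con.execute("""
        SELECT k.ordernumber, p.name, p.description, k.price
        FROM product_kids AS k
        JOIN product AS p ON p.id = k.master_product_id
        ORDER BY k.ordernumber
    """).fetchall()
```

Calling kids_with_master_text(build_db()) lists both children together with the master's name and description, even when the master itself is not part of the order.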

what would be the best way to design a mysql table which can be filled with HTML (+PHP) form.
Who cares? Contrary to popular belief, MySQL is JUST ANOTHER relational database and PHP just another web scripting technology. The answer is the same whether you use MySQL + PHP or Oracle + Java or SQL Server + ASP.NET.
The principles of relational database design apply to any relational database. As such, the question is not related to MySql and particularly not at all to PHP.
Table PRODUCT
Table PRODUCT_KIDS
This is a simplistic view of the topic that leaves out a lot of items, some of them legal requirements, such as possible international taxation, different shipping codes and their prices (not all items can be combined in one shipment) and, for example, customization of items in general - I remember writing a shop where PIPES were sold, in custom lengths ;) And some items require separate shipping ;)
"The Data Model Resource Book", Volume 1, discusses standard enterprise scenarios in great depth - including address management (not as simple as most people do it), accounting and.... the whole shop blabla (storage, inventory, pricing). BOTH (!) of your approaches are simplistic and would be totally illegal in my jurisdiction because they would not take into account legal requirements for properly tracking taxation on various products.
I can only suggest getting it - they also go into great depth on some of the industry-level particularities. For example ;) - Apparel. Make a shop for clothes and you go nuts on "variants in size AND color for the same product". Your approach would result in a "shirt" possibly having 200 children (sizes * colors) ;)
I would suggest: back to the drawing board. With a good book ;) I personally loved reading this book - hm - a long time ago.

Related

What is the best way to indicate a value that has special handling in a database table?

The setup.
I have a table that stores a list of physical items for a game. Items also have a hierarchical list of categories. Example base table:
Items
id | parent_id | is_category | name | description
-- | --------- | ----------- | ------- | -----------
1 | 0 | 1 | Weapon | Something intended to cause damage
2 | 1 | 1 | Ranged | Attack from a distance
3 | 1 | 1 | Melee | Must be able to reach the target with arm
4 | 2 | 0 | Musket | Shoots hot lead.
5 | 2 | 0 | Bomb | Fire damage over area
6 | 0 | 1 | Mount | Something that carries a load.
7 | 6 | 0 | Horse | It has no name.
8 | 6 | 0 | Donkey | Don't assume or you become one.
The system is currently running on PHP and SQLite, but the database back-end is flexible and may use MySQL, and the front-end may eventually use JavaScript or Objective-C/Swift.
The problem.
In the sample above the program must have a different special handling for each of the top level categories and the items underneath them. e.g. Weapon and Mount are sold by different merchants, weapons may be carried while a mount cannot.
What is the best way to flag the top level tiers in code for special handling?
While the top level categories are relatively fixed I would like to keep them in the DB so it is easier to generate the full hierarchy for visualization using a single (recursive) function.
Nearly all foreign keys that identify an item may also identify an item category so separating them into different tables seemed very clunky.
My thoughts.
I can use a string match on the name and store the id in an internal constant upon first execution. An ugly solution at best that I would like to avoid.
I can store the id in an internal constant at install time. Better, but still not quite what I prefer.
I can store an array in code of the top level elements instead of putting them in the table. This creates a lot of complications like how does a child point to the top level parent. Another id would have to be added to the table that is used by like 100 of the 10K rows.
I can store an array in code and enable identity insert at install time to add the top level elements sharing the identity of the static array. Probably my best idea, but I don't really like the idea of identity insert; it just doesn't feel "database" to me. Also, what if a new top level item appears? Maybe start the ids at 1 million for these categories?
I can add a flag column "varchar(1) top_category" or "int top_category" with a character or bit-map indicating the value. Again a column used on like 10 of 10k rows.
As a software person I tend to find software solutions, so I'm curious if there is a more DB type solution out there.
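For reference, with the single Items table shown above, the top level category of any row can be resolved inside the database with a recursive query. A minimal sketch in Python with SQLite (recursive common table expressions need SQLite 3.8.3 or later); the function name and the root_name column are made up for illustration, and the description column is left out for brevity:

```python
import sqlite3

# The sample rows from the Items table: (id, parent_id, is_category, name).
ROWS = [
    (1, 0, 1, "Weapon"), (2, 1, 1, "Ranged"), (3, 1, 1, "Melee"),
    (4, 2, 0, "Musket"), (5, 2, 0, "Bomb"), (6, 0, 1, "Mount"),
    (7, 6, 0, "Horse"), (8, 6, 0, "Donkey"),
]


def top_level_names():
    """Map every item or category name to the name of its top level category."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "is_category INTEGER, name TEXT)"
    )
    con.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute("""
        WITH RECURSIVE tree(id, name, root_name) AS (
            -- Anchor: top level rows are their own root.
            SELECT id, name, name FROM items WHERE parent_id = 0
            UNION ALL
            -- Step: every child inherits the root of its parent.
            SELECT i.id, i.name, t.root_name
            FROM items AS i
            JOIN tree AS t ON i.parent_id = t.id
        )
        SELECT name, root_name FROM tree ORDER BY id
    """).fetchall()
    return dict(rows)
```

With this, code only needs to know the handful of top level names (or ids), whatever the depth of the hierarchy below them.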
Original table, with a join to actions.
Yes, you can put everything in a single table. You'd just need to establish unique rows for every scenario. This sqlfiddle gives you an example... but IMO it starts to become difficult to make sense of. It doesn't take care of all scenarios, because full joins aren't available (a limitation of sqlfiddle, which is otherwise awesome).
IMO, breaking things out into tables makes more sense. Here's another example of how I'd start to approach a schema design for some of the scenarios you described.
The base tables themselves look clunky, but they give so much more flexibility in how the data is used.
tl;dr analogy ahead
A database isn't a list of outfits, organized in rows. It's where you store the clothes that make up an outfit.
So the clunky feel of breaking things out into separate tables is actually the benefit of relational databases. Putting everything into a single table feels efficient and optimized at first... but as you expand complexity... it starts to become a pain.
Think of your schema as a dresser. Drawers are your tables. If you only have a few socks and underwear, putting them all in one drawer is efficient. But once you get enough socks, it can become a pain to have them all in the same drawer as your underwear. You have dress socks, crew socks, ankle socks, furry socks. So you put them in another drawer. Once you have shirts, shorts, pants, you start putting them in drawers too.
The urge to put all data into a single table is often driven by how you intend to use the data.
Assuming your dresser is fully stocked and neatly organized, you have several potential unique outfits, all neatly organized in your dresser. You just need to put them together. Selects and joins are how you would assemble those outfits. The fact that your favorite jean/t-shirt/sock combo isn't all in one drawer doesn't make it clunky or inefficient. The fact that they are separated and organized allows you to:
1. Quickly know where to get each item
2. See potential other new favorite combos
3. Quickly see what you have of each component of your outfit
There's nothing wrong with choosing to think of the outfit first, then how you will put it away later. If you only have one outfit, putting everything in one drawer is way easier than putting each piece in a separate drawer. However, as you expand your wardrobe, the single drawer for everything starts to become inefficient.
You typically want to plan for expansion and versatility. Your program can put the data together however you need it. A well organized schema can do that for you. Whether you use an ORM and do model driven data storage, or start with the schema and then build models based on it, the more complex your data requirements become, the more similar both approaches become.
A relational database is meant to store entities in tables that relate to each other. Very often you'll see examples of a company database consisting of departments, employees, jobs, etc. or of stores holding products, clients, orders, and suppliers.
It is very easy to query such database and for example get all employees that have a certain job in a particular department:
select *
from employees
where job_id = (select id from job where name = 'accountant')
and dept_id = (select id from departments where name = 'buying');
You on the other hand have only one table containing "things". One row can relate to another meaning "is of type". You could call this table "something". And were it about company data, we would get the job thus:
select *
from something
where description = 'accountant'
and parent_id = (select id from something where description = 'job');
and the department thus:
select *
from something
where description = 'buying'
and parent_id = (select id from something where description = 'department');
These two would still have to be related by persons working in a department in a job. A mere "is type of" doesn't suffice then. The short query I've shown above would become quite big and complex with your type of database. Imagine the same with a more complicated query.
And your app would either not know anything about what it's selecting (well, it would know it's something which is of some type and another something that is of some type and the person (if you go so far as to introduce a person table) is connected somehow with these two things), or it would have to know what description "department" means and what description "job" means.
Your database is blind. It doesn't know what a "something" is. If you make a programming mistake some time (most of us do), you may even store wrong relations (A Donkey is of type Musket and hence "shoots hot lead" while you can ride it) and your app may crash at one point or another not able to deal with a query result.
Don't you want your app to know what a weapon is and what a mount is? That a weapon enables you to fight and a mount enables you to travel? So why make this a secret? Do you think you gain flexibility? Well, then add food to your table without altering the app. What will the app do with this information? You see, you must code this anyway.
Separate entity from data. Your entities are weapons and mounts so far. These should be tables. Then you have instances (rows) of these entities that have certain attributes. A bomb is a weapon with a certain range for instance.
Tables could look like this:
person (person_id, name, strength_points, ...)
weapon (weapon_id, name, range_from, range_to, weight, force_points, ...)
person_weapon(person_id, weapon_id)
mount (mount_id, name, speed, endurance, ...)
person_mount(person_id, mount_id)
food (food_id, name, weight, energy_points, ...)
person_food (person_id, food_id)
armor (armor_id, name, protection_points, ...)
person_armor <= a table for m:n or a mere person.id_armor for 1:n
...
This is just an example, but it shows clearly what entities your app is dealing with. It knows weapons and food are something the person carries, so these can only have a maximum total weight for a person. A mount is something to use for transport and can make a person move faster (or carry weight, if your app and tables allow for that). Etc.
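A minimal sketch of a few of these tables, using Python's built-in sqlite3 module. The column subsets, the sample rows and the helper names are assumptions made for illustration. Because weapons and food live in their own tables, "what a person carries" becomes an explicit query over exactly those two entities, and a mount never counts toward it:

```python
import sqlite3


def build_world():
    """Create a cut-down version of the entity tables with sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    con.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE weapon (weapon_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                             weight REAL NOT NULL);
        CREATE TABLE food   (food_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                             weight REAL NOT NULL);
        CREATE TABLE mount  (mount_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                             speed REAL NOT NULL);
        CREATE TABLE person_weapon (
            person_id INTEGER NOT NULL REFERENCES person(person_id),
            weapon_id INTEGER NOT NULL REFERENCES weapon(weapon_id),
            PRIMARY KEY (person_id, weapon_id));
        CREATE TABLE person_food (
            person_id INTEGER NOT NULL REFERENCES person(person_id),
            food_id   INTEGER NOT NULL REFERENCES food(food_id),
            PRIMARY KEY (person_id, food_id));
        CREATE TABLE person_mount (
            person_id INTEGER NOT NULL REFERENCES person(person_id),
            mount_id  INTEGER NOT NULL REFERENCES mount(mount_id),
            PRIMARY KEY (person_id, mount_id));
        -- Hypothetical sample data.
        INSERT INTO person VALUES (1, 'Ann');
        INSERT INTO weapon VALUES (1, 'Musket', 4.5);
        INSERT INTO food   VALUES (1, 'Bread', 0.5);
        INSERT INTO mount  VALUES (1, 'Horse', 40.0);
        INSERT INTO person_weapon VALUES (1, 1);
        INSERT INTO person_food   VALUES (1, 1);
        INSERT INTO person_mount  VALUES (1, 1);
    """)
    return con


def carried_weight(con, person_id):
    """Total weight of everything a person carries: weapons plus food."""
    row = con.execute("""
        SELECT COALESCE(SUM(weight), 0) FROM (
            SELECT w.weight FROM person_weapon AS pw
            JOIN weapon AS w ON w.weapon_id = pw.weapon_id
            WHERE pw.person_id = ?
            UNION ALL
            SELECT f.weight FROM person_food AS pf
            JOIN food AS f ON f.food_id = pf.food_id
            WHERE pf.person_id = ?
        )
    """, (person_id, person_id)).fetchone()
    return row[0]
```

For the sample person, carried_weight adds the musket and the bread; the horse is not part of the sum because the schema itself says a mount is not something that is carried.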

What is the correct database structure to store historical data?

I am designing a database in Sqlite that is intended to help facilitate the creation of predictive algorithms for FIRST Robotics Competition. On the surface, things looked pretty easy, but I'm struggling with a single issue: How to store past ratings of a team. I have looked over previous questions pertaining to how historical data is stored, but I'm not sure any of it applies well to my situation (though it could certainly be that I just don't understand it well enough).
Each team has an individual rating, and after each match that team participates in the rating gets revised. Now, there are several ways I could go about storing them, but none of them seem particularly good. I'll go through the ones that I have thought through, in no particular order.
Option 1:
Each team has its own table. It would include the match_id and the rating after the match was done, and could possibly also include the rating before. The problem is that there would be close to 10,000 tables. I'm pretty sure that's inefficient, especially considering I believe that it's also unnormalized (correct me if I'm wrong).
Table Name: Team_id
match_id | rating_after
Option 2:
Historical ratings for each team are stored in the match table, and current ratings are stored in the team table. A simplified version of the team table looks like this:
Table : Team_list
team_id | team_name | team_rating
That isn't really the problem; the problem is with the historical data. The historical data would be stored with the match. Likely, it would be each team's rating before the match.
The problem I have with this one is how tough a search it will be to find the previous ratings. This comes from how FRC works. There are 3 teams on each side (forming what is known as an alliance) for a total of 6 teams. These alliances are normally designated by the colors Red and Blue.
They are randomly assigned ahead of time, and could include any team at the event, on either side. In other words, the match table would look like this (simplified):
Table: match_table
match_id | Red1 | Red2 | Red3 | Blue1 | Blue2 | Blue3 | RedScore | BlueScore | Red1Rating | Red2Rating | etc.....
So each team has to be included in the match info, as well as a rating for each team. If I were to create more than one rating (such as an updated rating design that I want to do a pure comparison test with), things could get clogged really fast.
In order to find the previous rating for team # 67, for instance, I'd have to search Red1, Red2, Red3, Blue1, etc. and then look at the column that pertains to the position, all while being sure that this really is the most recent match.
Note: This might involve knowing not only the year of the data and the week it was taken in (I would get this data from a join with an event table), but also the match level (whether it was qualifications or playoffs) and the match # (which is not match_id).
Sure, this option is normalized, but it's also got a weird search pattern, and isn't easy from a front-end standpoint (I might build a front-end for some of the data in the future, so I want to keep that in mind as well).
My question: Is there an easier/more efficient option that I am missing?
Because both designs feel somewhat inefficient. The first has too many tables, the other has a table that will have well over 100,000 entries and will have to be searched in a convoluted pattern. I feel as if there is some simple design solution that I simply haven't thought of.
There's only one sane answer:
team_rating:
team_id, rating, start_date, end_date
Make all ranges closed by using the creation date of the team as the first rating's start_date, and some arbitrarily distant future date (e.g. 2199-01-01) as the end_date for the current row. All dates are inclusive.
Queries to find the rating at any date are then a simple
select rating
from team_rating
where team_id = $id
and $date between start_date and end_date
and rating history is just
select start_date, rating
from team_rating
where team_id = $id
order by start_date
It's key that both start and end dates are stored, otherwise the queries are trainwrecks.
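A minimal sketch of how rows in such a table could be maintained and queried, using Python's built-in sqlite3 module. The function names are made up for illustration, dates are stored as ISO text (which sorts correctly as strings), and the sketch assumes at most one rating change per team per day; with several matches a day, timestamps would be needed instead of dates:

```python
import sqlite3
from datetime import date, timedelta

FAR_FUTURE = "2199-01-01"  # sentinel end_date marking the current row


def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE team_rating (
            team_id    INTEGER NOT NULL,
            rating     REAL    NOT NULL,
            start_date TEXT    NOT NULL,
            end_date   TEXT    NOT NULL,
            PRIMARY KEY (team_id, start_date)
        )
    """)
    return con


def set_rating(con, team_id, rating, on_date):
    """Close the team's current row the day before on_date, then open a new one."""
    day_before = (date.fromisoformat(on_date) - timedelta(days=1)).isoformat()
    con.execute(
        "UPDATE team_rating SET end_date = ? WHERE team_id = ? AND end_date = ?",
        (day_before, team_id, FAR_FUTURE),
    )
    con.execute(
        "INSERT INTO team_rating VALUES (?, ?, ?, ?)",
        (team_id, rating, on_date, FAR_FUTURE),
    )


def rating_on(con, team_id, on_date):
    """Rating in force on a given date, or None before the first rating."""
    row = con.execute(
        "SELECT rating FROM team_rating "
        "WHERE team_id = ? AND ? BETWEEN start_date AND end_date",
        (team_id, on_date),
    ).fetchone()
    return row[0] if row else None
```

Ending the old row one day before the new start_date keeps the inclusive ranges from overlapping, so the BETWEEN lookup matches at most one row.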

Database Schema for User Tagging using Nested Set/Adjacency List

I have seen many posts on building database schemas for object tagging (such as dlamblin's post and Artilheiro's post as well).
What I cannot seem to find in my many days of research is the schema logic in implementing a tagging schema that allows for the tags to be assigned to a user (such as LinkedIn's Skills and Expertise system, where tags that have been added by the user can be indexed and searched). This could be as simple as changing the "object" in question to a user, but I have a feeling it is more complicated than that.
I want to be able to construct something almost exactly like this, except in categories. For example, if we took some of LinkedIn's skills and categorized them, we could have something like: IT/Computing, Retail, Project Management, etc.
I know there are a couple common methodologies and architectures to categorizing data, specifically Nested Set and Adjacency List. I have heard many things about both, such as "Nested Set's insertion and deletion are resource intensive", and "Adjacency List Models are awkward, finite, and don't cover unlimited depth."
So I have two questions wrapped into one post:
What would a rough example schema look like in regards to tagging skills to users, where they can be indexed and searched, or even be able to construct a pool of users for a specific tag?
What is the best way to categorize something of this nature in light of the necessity to have categorization?
Are there any other models that would suit this better that I am unaware of? (Oops, I think that is three questions)
What is the best way to categorize something of this nature in light of the necessity to have categorization?
Depends how much flexibility you need. For example, the adjacency list may be perfectly fine if you can assume the depth of your category hierarchy has a fixed limit of, say 1 or 2 levels.
Are there any other models that would suit this better that I am unaware of?
Path enumeration is a way to represent hierarchy in a concatenated list of the ancestor names. So each sub-category tag would name not only its own name, but its parent and any further grandparents up to the root.
You are already familiar with absolute pathnames in any shell environment: "/usr/local/bin" is a path enumeration of "usr", "local", and "bin" with the hierarchical relationship between them encoded in the order of the string.
This solution also has the possibility of data anomalies -- it's your responsibility to create an entry for "/usr/local" as well as "/usr/local/bin" and if you don't, some things start breaking.
What would a rough example schema look like in regards to tagging skills to users, where they can be indexed and searched, or even be able to construct a pool of users for a specific tag?
Implementing this in the database is almost as simple as naming the tags individually, but it requires that your tag "name" column be long enough to store the longest path in the hierarchy.
CREATE TABLE taguser (
tag_path VARCHAR(255),
user_id INT,
PRIMARY KEY (tag_path,user_id),
FOREIGN KEY (tag_path) REFERENCES tagpaths (tag_path),
FOREIGN KEY (user_id) REFERENCES users (user_id)
);
Indexing is exactly the same as simple tagging, but you can only search for sub-category tags if you specify the whole string from the root of the hierarchy.
SELECT user_id FROM taguser WHERE tag_path = '/IT/Computing'; -- uses index
SELECT user_id FROM taguser WHERE tag_path LIKE '%/Computing'; -- can't use index
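A minimal sketch of the path enumeration tables in Python with SQLite. The sample paths, the user rows and the helper name are assumptions made for illustration. It shows the other kind of search this layout allows, anchored at the root of the path: everyone tagged with a category or with anything below it. The sketch assumes tag names contain no LIKE wildcards (% or _):

```python
import sqlite3


def build_tags():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY);
        CREATE TABLE tagpaths (tag_path VARCHAR(255) PRIMARY KEY);
        CREATE TABLE taguser (
            tag_path VARCHAR(255),
            user_id  INT,
            PRIMARY KEY (tag_path, user_id),
            FOREIGN KEY (tag_path) REFERENCES tagpaths (tag_path),
            FOREIGN KEY (user_id) REFERENCES users (user_id)
        );
        -- Hypothetical sample data; every ancestor path gets its own row.
        INSERT INTO users VALUES (1), (2), (3);
        INSERT INTO tagpaths VALUES ('/IT'), ('/IT/Computing'), ('/Retail');
        INSERT INTO taguser VALUES ('/IT/Computing', 1), ('/IT', 2), ('/Retail', 3);
    """)
    return con


def users_under(con, path):
    """Users tagged with `path` itself or with any tag below it."""
    rows = con.execute(
        "SELECT DISTINCT user_id FROM taguser "
        "WHERE tag_path = ? OR tag_path LIKE ? ORDER BY user_id",
        (path, path + "/%"),
    ).fetchall()
    return [user_id for (user_id,) in rows]
```

The pattern has its wildcard only at the end, unlike the '%/Computing' search above, which is the shape of LIKE that an index on tag_path can typically help with; whether it actually does depends on the database and its collation settings.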
I think the best logic is the same as stated in the post you linked:
+--------+
| user   |
+--------+
| userid |
| ...    |
+--------+
+-----------+
| linktable |
+-----------+
| userid    | <- (fk and pk)
| tagid     | <- (fk and pk)
+-----------+
+-------+
| tag   |
+-------+
| tagid |
| ...   |
+-------+
Pretty much the way to go, IMO. If you want to categorise the tags, you can always attach a category table to the tag table.
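A minimal sketch of the three tables plus an attached category table, in Python with SQLite. The table is named users rather than user here, and the name columns, the sample rows and the helper names are assumptions made for illustration:

```python
import sqlite3


def build_skills():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE users    (userid INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE category (categoryid INTEGER PRIMARY KEY,
                               name TEXT NOT NULL UNIQUE);
        CREATE TABLE tag (
            tagid      INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE,
            categoryid INTEGER NOT NULL REFERENCES category(categoryid));
        CREATE TABLE linktable (
            userid INTEGER NOT NULL REFERENCES users(userid),
            tagid  INTEGER NOT NULL REFERENCES tag(tagid),
            PRIMARY KEY (userid, tagid));
        -- Hypothetical sample data.
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO category VALUES (1, 'IT/Computing'), (2, 'Retail');
        INSERT INTO tag VALUES (1, 'PHP', 1), (2, 'MySQL', 1), (3, 'Merchandising', 2);
        INSERT INTO linktable VALUES (1, 1), (1, 2), (2, 1), (3, 3);
    """)
    return con


def users_with_tag(con, tag_name):
    """The pool of users who have a specific tag."""
    rows = con.execute("""
        SELECT u.name FROM users AS u
        JOIN linktable AS l ON l.userid = u.userid
        JOIN tag AS t ON t.tagid = l.tagid
        WHERE t.name = ? ORDER BY u.name
    """, (tag_name,)).fetchall()
    return [name for (name,) in rows]


def users_in_category(con, category_name):
    """Users having at least one tag in the given category."""
    rows = con.execute("""
        SELECT DISTINCT u.name FROM users AS u
        JOIN linktable AS l ON l.userid = u.userid
        JOIN tag AS t ON t.tagid = l.tagid
        JOIN category AS c ON c.categoryid = t.categoryid
        WHERE c.name = ? ORDER BY u.name
    """, (category_name,)).fetchall()
    return [name for (name,) in rows]
```

The composite primary key on linktable stops the same tag from being attached to a user twice, and it also serves as the index for lookups by user.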
You didn't say which database, so I'm going to play devil's advocate and suggest how it would work in MongoDB. Create your user like this:
db.users.insert({
name: "bob",
skills: [ "surfing", "knitting", "eating"]
})
Then create an index on "skills". Mongo will add each skill in the array to the index, allowing quick lookups. Finding users with an intersection of 2 skills has similar performance to SQL databases, but the syntax is much nicer:
db.users.find({skills: {"$all": ["surfing", "knitting"]}})
The upside is that a single disk seek will fetch all the information you need on a user. The downside is that it takes a lot more disk space, and somewhat more RAM. But if it can avoid disk seeks caused by joins, it could be a win.

Database design issue:

I'm building a Volunteer Management System and I'm having some DB design issues:
To explain the process:
Volunteers can sign up for accounts. Volunteers report their hours to a project (each volunteer can have multiple projects). Volunteer supervisors are notified when a volunteer's number of hours is close to some specified amount, so that they can give them a reward.
For example:
a volunteer who has volunteered 10 hours receives a free t shirt.
The problem I'm having is how to design the DB in such a way that a single reward profile can be related to multiple projects as well as have a single reward profile be "multi-tiered". A big thing about this is that rewards structures may change so they can't be just hardcoded.
Example of what I mean by "multi-tiered" reward profile:
A volunteer who has volunteered 10 hours receives a free t shirt.
A volunteer who has volunteered 40 hours receives a free $50 appreciation check.
The solutions I've come up with myself are:
To have a reward profile table that relates one row to each reward profile.
rewardprofile:
rID(primary key) - int
description - varchar / char(100)
details - varchar / file (XML)
Aside, just while on the topic, can DB field entries be files?
OR
To have a rewards table where each row relates one preset amount of hours to one reward, and a second rewards profile table that binds the rewards entries together:
rewards:
rID(primary key) - int
rpID (references rewardsProfile) - int
numberOfHrs - int
rewardDesc - varchar / char(100)
rewardsprofile:
rpID(primary key) - int
description
so this might look something like:
rewardsprofile:
rpid | desc
rp01 | no reward
rp02 | t-shirt only
rp03 | t-shirt and check
rewards
rid | rpID | hours | desc
r01 | rp02 | 10 | t-shirt
r02 | rp03 | 10 | t-shirt
r03 | rp03 | 40 | check
I'm sure this issue is nothing new but my google fu is weak and I don't know how to phrase this in a meaningful way. I think there must be a solution out there more formalized than my (hack and slash) method. If anyone can direct me to what this problem is called or any solutions to it, that would be swell. Thanks for all your time!
Cheers,
-Jeremiah Tantongco
Yes, database fields can be files (type binary, character large object, or xml) depending on the implementation of the specific database.
The rewardsprofile table looks like it might be challenging to maintain if you have a large number of different rewards in the future. One thing you might consider is a structure like:
rewards:
rID(primary key) - int
numberOfHrs - int
rewardDesc - varchar / char(100)
volunteers:
vID(primary key) - int
.. any other fields you want here ..
rewardshistory:
vID (foreign key references volunteers)
rID (foreign key references rewards)
Any time you want to add a reward, you add it to the rewards table. Old rewards stay in the table (you might want a 'current' field or something to track whether the reward can still be assigned). The rewardshistory table tracks which rewards have been given to which volunteers.
This is a rough structure of how I would handle this:
Volunteers
volunteerid
firstname
lastname
VolunteerAddress
volunteerid
Street1
Street2
City
State
Postalcode
Country
Addresstype (home, business, etc.)
VolunteerPhone
volunteerid
Phone number
Phonetype
VolunteerEmail
volunteerid
EmailAddress
Project
Projectid
projectname
VolunteerHours
volunteerid
hoursworked
projectid
DateWorked
Rewards
Rewardid
Rewardtype (Continual, datelimited, etc.)
Reward
RewardBeginDate
RewardEndDate
RequiredHours
Awarded
VolunteerID
RewardID
RewardDate
You will probably have some time-limited rewards; that's why I added the date fields. You would then set up a job to calculate rewards once a week or once a month or so. Make sure to exclude those who have already received that particular award, if pertinent (you don't want to give a new t-shirt for every 10 hours worked, do you?).
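The periodic reward calculation could be sketched as a single query: total each volunteer's hours, match the totals against the required hours, and leave out anything already recorded in Awarded. This is a simplified illustration in Python with SQLite; it ignores the date-limited reward types and the link between rewards and projects, and the sample rows and the function names are made up:

```python
import sqlite3


def build_volunteers():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE VolunteerHours (
            volunteerid INTEGER NOT NULL,
            projectid   INTEGER NOT NULL,
            hoursworked REAL    NOT NULL,
            DateWorked  TEXT    NOT NULL);
        CREATE TABLE Rewards (
            Rewardid      INTEGER PRIMARY KEY,
            Reward        TEXT    NOT NULL,
            RequiredHours REAL    NOT NULL);
        CREATE TABLE Awarded (
            VolunteerID INTEGER NOT NULL,
            RewardID    INTEGER NOT NULL REFERENCES Rewards(Rewardid),
            RewardDate  TEXT    NOT NULL,
            PRIMARY KEY (VolunteerID, RewardID));
        -- Hypothetical sample data.
        INSERT INTO VolunteerHours VALUES
            (1, 1, 6, '2024-01-05'), (1, 2, 5, '2024-01-12'),
            (2, 1, 45, '2024-01-20');
        INSERT INTO Rewards VALUES (1, 't-shirt', 10), (2, 'check', 40);
        INSERT INTO Awarded VALUES (2, 1, '2024-01-10');
    """)
    return con


def rewards_due(con):
    """(volunteerid, Rewardid) pairs that are earned but not yet awarded."""
    return con.execute("""
        SELECT h.volunteerid, r.Rewardid
        FROM (SELECT volunteerid, SUM(hoursworked) AS total
              FROM VolunteerHours GROUP BY volunteerid) AS h
        JOIN Rewards AS r ON h.total >= r.RequiredHours
        LEFT JOIN Awarded AS a
               ON a.VolunteerID = h.volunteerid AND a.RewardID = r.Rewardid
        WHERE a.VolunteerID IS NULL
        ORDER BY h.volunteerid, r.Rewardid
    """).fetchall()
```

In the sample data, volunteer 1 has 11 hours and is due the t-shirt, while volunteer 2 has 45 hours, already has the t-shirt, and is due only the check. Since the thresholds are rows in Rewards, changing the reward structure means changing data, not code.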
Yes, DB field entries can be files. Or, more precisely, they can be filespecs that reference files. Is that what you really meant?
While we are on the subject of data fields that reference other data, how much do you know about foreign keys? What can you accomplish with references to files that you couldn't accomplish even better by the judicious use of foreign keys?
Foreign keys, and the keys that they refer to, are fundamental concepts in the relational model of data. Without this model, your database design is going to be pretty random.
Morning,
You really must place all your tables on a chart then determine the business rules for that chart in the entity relationship diagram. Once you decide what the direct relationships are between each and every table only then would you test to see if you get the desired answers. This procedure is called database design and it appears that you didn't do that as of yet but got ahead of yourself a little bit from what I see.
There are plenty of good books on database design on the market. The one I use is "Database Design For Mere Mortals". It is very easy to read and understand.
Hope this helps.

Database schema design

I'm quite new to database design and have some questions about best practices and would really like to learn.
I am designing a database schema. I have a good idea of the requirements, and now it's a matter of getting it down in black and white.
In this pseudo-database-layout, I have a table of customers, table of orders and table of products.
TBL_PRODUCTS:
ID
Description
Details
TBL_CUSTOMER:
ID
Name
Address
TBL_ORDER:
ID
TBL_CUSTOMER.ID
prod1
prod2
prod3
etc
Each 'order' has only one customer, but can have any number of 'products'.
The problem is that, in my case, a given order can have any number of products (hundreds for a single order). On top of that, each product on an order needs more than just a 'quantity': it can have values that span pages of text, specific to that product on that order.
My question is, how can I store that information?
Assuming I can't store a variable length array as single field value, the other option is to have a string that is delimited somehow and split by code in the application.
An order could have, say, 100 products, each product having either only a small int, or 5000 characters of free text (or anything in between), unique only to that order.
On top of that, each order must have its own audit trail, as many things can happen to it throughout its lifetime.
An audit trail would contain the usual information - user, time/date, action and can be any length.
Would I store an audit trail for a specific order in its own table (as they could become quite lengthy), created as the order is created?
Are there any places where I could learn more about techniques for database design?
The most common way would be to store the order items in another table.
TBL_ORDER:
ID
TBL_CUSTOMER.ID
TBL_ORDER_ITEM:
ID
TBL_ORDER.ID
TBL_PRODUCTS.ID
Quantity
UniqueDetails
The same can apply to your Order audit trail. It can be a new table such as
TBL_ORDER_AUDIT:
ID
TBL_ORDER.ID
AuditDetails
First of all, Google Third Normal Form. In most cases your tables should be in 3NF, but there are exceptions made for performance or ease of use, and only experience can really teach you those.
What you have is not normalized. You need a "Join table" to implement the many to many relationship.
TBL_ORDER:
ID
TBL_CUSTOMER.ID
TBL_ORDER_PRODUCT_JOIN:
ID
TBL_ORDER.ID
TBL_Product.ID
Quantity
TBL_ORDER_AUDIT:
ID
TBL_ORDER.ID
Audit_Details
The basic conventional name for the ID column in the Orders table (plural, because ORDER is a keyword in SQL) is "Order Number", with the exact spelling varying (OrderNum, OrderNumber, Order_Num, OrderNo, ...).
The TBL_ prefix is superfluous; it is doubly superfluous since it doesn't always mean table, as for example in the TBL_CUSTOMER.ID column name used in the TBL_ORDER table. Also, it is a bad idea, in general, to try using a "." in the middle of a column name; you would have to always treat that name as a delimited identifier, enclosing it in either double quotes (standard SQL and most DBMS) or square brackets (MS SQL Server; not sure about Sybase).
Joe Celko has a lot to say about things like column naming. I don't agree with all he says, but it is readily searchable. See also Fabian Pascal 'Practical Issues in Database Management'.
The other answers have suggested that you need an 'Order Items' table - they're right; you do. The answers have also talked about storing the quantity in there. Don't forget that you'll need more than just the quantity. For example, you'll need the price prevailing at the time of the order. In many systems, you might also need to deal with discounts, taxes, and other details. And if it is a complex item (like an airplane), there may be only one 'item' on the order, but there will be an enormous number of subordinate details to be recorded.
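A minimal sketch of an order items table that records the price prevailing at the time of the order, together with an audit table, in Python with SQLite. The column names (including list_price_cents, which is not in the question's product table), the sample rows and the helper names are assumptions made for illustration; prices are kept as integer cents to avoid rounding surprises:

```python
import sqlite3


def build_shop():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE products  (id INTEGER PRIMARY KEY, description TEXT NOT NULL,
                                list_price_cents INTEGER NOT NULL);
        CREATE TABLE orders    (order_num INTEGER PRIMARY KEY,
                                customer_id INTEGER NOT NULL REFERENCES customers(id));
        CREATE TABLE order_items (
            order_num        INTEGER NOT NULL REFERENCES orders(order_num),
            product_id       INTEGER NOT NULL REFERENCES products(id),
            quantity         INTEGER NOT NULL,
            unit_price_cents INTEGER NOT NULL,  -- copied at order time
            unique_details   TEXT,              -- free text for this order only
            PRIMARY KEY (order_num, product_id));
        CREATE TABLE order_audit (
            id        INTEGER PRIMARY KEY,
            order_num INTEGER NOT NULL REFERENCES orders(order_num),
            user_name TEXT NOT NULL,
            at        TEXT NOT NULL,
            action    TEXT NOT NULL);
        -- Hypothetical sample data.
        INSERT INTO customers VALUES (1, 'ACME');
        INSERT INTO products  VALUES (1, 'Widget', 500);
        INSERT INTO orders    VALUES (1, 1);
    """)
    return con


def add_item(con, order_num, product_id, quantity, user_name, details=None):
    """Add a line to an order, freezing the current price, and log the action."""
    (price,) = con.execute(
        "SELECT list_price_cents FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    con.execute(
        "INSERT INTO order_items VALUES (?, ?, ?, ?, ?)",
        (order_num, product_id, quantity, price, details),
    )
    con.execute(
        "INSERT INTO order_audit (order_num, user_name, at, action) "
        "VALUES (?, ?, datetime('now'), ?)",
        (order_num, user_name, f"added product {product_id} x {quantity}"),
    )


def order_total_cents(con, order_num):
    return con.execute(
        "SELECT COALESCE(SUM(quantity * unit_price_cents), 0) "
        "FROM order_items WHERE order_num = ?", (order_num,)
    ).fetchone()[0]
```

Because unit_price_cents is copied into the order line, a later change to the product's list price leaves existing orders untouched. One audit table shared by all orders, keyed by order_num, is enough; there is no need to create a table per order.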
While not a reference on how to design database schemas, I often use the schema library at DatabaseAnswers.org. It is a good jumping off location if you want to have something that is already roughed in. They aren't perfect and will most likely need to be modified to fit your needs, but there are more than 500 of them in there.
Learn Entity-Relationship (ER) modeling for database requirements analysis.
Learn relational database design and some relational data modeling for the overall logical design of tables. Data normalization is an important part of this piece, but by no means all there is to learn. Relational database design is pretty much DBMS independent within the main stream DBMS products.
Learn physical database design. Learn index design as the first stage of designing for performance. Some index design is DBMS independent, but physical design becomes increasingly dependent on special features of your DBMS as you get more detailed. This can require a book that's specifically tailored to the DBMS you intend to use.
You don't have to do all the above learning before you ever design and build your first database. But what you don't know WILL hurt you. Like any other skill, the more you do it, the better you'll get. And learning what other people already know is a lot cheaper than learning by trial and error.
Take a look at Agile Web Development with Rails, it's got an excellent section on ActiveRecord (an implementation of the same-named design pattern in Rails) and does a really good job of explaining these types of relationships, even if you never use Rails. Here's a good online tutorial as well.

Resources