Database design issue

I'm building a Volunteer Management System and I'm having some DB design issues:
To explain the process:
Volunteers can sign up for accounts. Volunteers report their hours to a project (each volunteer can have multiple projects). Volunteer supervisors are notified when a volunteer's number of hours is close to some specified amount, so that they can give the volunteer a reward.
For example:
a volunteer who has volunteered 10 hours receives a free t shirt.
The problem I'm having is how to design the DB so that a single reward profile can be related to multiple projects and can also be "multi-tiered". A big thing about this is that reward structures may change, so they can't just be hardcoded.
Example of what I mean by "multi-tiered" reward profile:
A volunteer who has volunteered 10 hours receives a free t shirt.
A volunteer who has volunteered 40 hours receives a free $50 appreciation check.
The solutions I've come up with myself are:
To have a reward profile table that relates one row to each reward profile.
rewardprofile:
rID(primary key) - int
description - varchar / char(100)
details - varchar / file (XML)
Aside, just while on the topic, can DB field entries be files?
OR
To have a rewards table in which each row relates one preset amount to one reward, as follows, and a second rewards profile table that binds the rewards entries together:
rewards:
rID(primary key) - int
rpID (references rewardsProfile) - int
numberOfHrs - int
rewardDesc - varchar / char(100)
rewardsprofile:
rpID(primary key) - int
description
so this might look something like:
rewardsprofile:
rpid | desc
rp01 | no reward
rp02 | t-shirt only
rp03 | t-shirt and check
rewards
rid | rpID | hours | desc
r01 | rp02 | 10 | t-shirt
r02 | rp03 | 10 | t-shirt
r03 | rp03 | 40 | check
I'm sure this issue is nothing new but my google fu is weak and I don't know how to phrase this in a meaningful way. I think there must be a solution out there more formalized than my (hack and slash) method. If anyone can direct me to what this problem is called or any solutions to it, that would be swell. Thanks for all your time!
Cheers,
-Jeremiah Tantongco

Yes, database fields can be files (type binary, character large object, or xml) depending on the implementation of the specific database.
The rewardsprofile table looks like it might be challenging to maintain if you have a large number of different rewards in the future. One thing you might consider is a structure like:
rewards:
rID(primary key) - int
numberOfHrs - int
rewardDesc - varchar / char(100)
volunteers:
vID(primary key) - int
.. any other fields you want here ..
rewardshistory:
vID (foreign key references volunteers)
rID (foreign key references rewards)
Any time you want to add a reward, you add it to the rewards table. Old rewards stay in the table (you might want a 'current' field or something to track whether the reward can still be assigned). The rewardshistory table tracks which rewards have been given to which volunteers.
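To make that structure concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `is_current` column is my stand-in for the suggested 'current' field, and the `name` column, the sample rows, and the helper `assignable_rewards` are all made up for illustration; treat it as one possible reading of the layout above, not a finished design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rewards (
    rID         INTEGER PRIMARY KEY,
    numberOfHrs INTEGER NOT NULL,
    rewardDesc  TEXT NOT NULL,
    is_current  INTEGER NOT NULL DEFAULT 1   -- assumed flag: 1 = can still be assigned
);
CREATE TABLE volunteers (
    vID  INTEGER PRIMARY KEY,
    name TEXT NOT NULL                       -- placeholder for "any other fields"
);
CREATE TABLE rewardshistory (
    vID INTEGER NOT NULL REFERENCES volunteers(vID),
    rID INTEGER NOT NULL REFERENCES rewards(rID),
    PRIMARY KEY (vID, rID)                   -- each reward is recorded once per volunteer
);
INSERT INTO rewards (rID, numberOfHrs, rewardDesc, is_current) VALUES
    (1, 10, 't-shirt', 1),
    (2, 40, '$50 check', 1),
    (3, 20, 'mug', 0);                       -- retired reward, kept for history
INSERT INTO volunteers VALUES (1, 'Ann');
INSERT INTO rewardshistory VALUES (1, 1);    -- Ann already has the t-shirt
""")

def assignable_rewards(conn, vid):
    """Rewards that are still current and not yet given to this volunteer."""
    rows = conn.execute("""
        SELECT r.rewardDesc
        FROM rewards AS r
        WHERE r.is_current = 1
          AND NOT EXISTS (SELECT 1 FROM rewardshistory AS h
                          WHERE h.vID = ? AND h.rID = r.rID)
        ORDER BY r.numberOfHrs
    """, (vid,)).fetchall()
    return [desc for (desc,) in rows]

remaining = assignable_rewards(conn, 1)
```

With the sample rows, only the check is left for Ann: the t-shirt is already in her history and the mug is no longer current.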

This is a rough structure of how I would handle this:
Volunteers
    volunteerid
    firstname
    lastname
VolunteerAddress
    volunteerid
    Street1
    Street2
    City
    State
    Postalcode
    Country
    Addresstype (home, business, etc.)
VolunteerPhone
    volunteerid
    Phone number
    Phonetype
VolunteerEmail
    volunteerid
    EmailAddress
Project
    Projectid
    projectname
VolunteerHours
    volunteerid
    hoursworked
    projectid
    DateWorked
Rewards
    Rewardid
    Rewardtype (Continual, datelimited, etc.)
    Reward
    RewardBeginDate
    RewardEndDate
    RequiredHours
Awarded
    VolunteerID
    RewardID
    RewardDate
You will probably have some time-limited rewards; that's why I added the date fields. You would then set up a job to calculate rewards once a week or once a month or so. Make sure to exclude those who have already received that particular award, if pertinent. (You don't want to give a new t-shirt for every 10 hours worked, do you?)
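A rough sketch of what such a periodic job could look like, again with Python's sqlite3. I'm assuming that NULL begin/end dates mean a continual reward, that a date-limited reward only has to be active on the day the job runs, and that hours are summed across all projects; the function name `run_reward_job` and the sample data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VolunteerHours (volunteerid INTEGER, projectid INTEGER,
                             hoursworked REAL, DateWorked TEXT);
CREATE TABLE Rewards (Rewardid INTEGER PRIMARY KEY, Reward TEXT, RequiredHours REAL,
                      RewardBeginDate TEXT, RewardEndDate TEXT);  -- NULL = no date limit
CREATE TABLE Awarded (VolunteerID INTEGER, RewardID INTEGER, RewardDate TEXT,
                      PRIMARY KEY (VolunteerID, RewardID));
INSERT INTO VolunteerHours VALUES (1, 100, 6, '2024-01-05'),
                                  (1, 200, 5, '2024-01-12'),
                                  (2, 100, 45, '2024-01-10');
INSERT INTO Rewards VALUES (1, 't-shirt', 10, NULL, NULL),
                           (2, 'check', 40, NULL, NULL),
                           (3, 'holiday gift', 5, '2023-12-01', '2023-12-31');
INSERT INTO Awarded VALUES (2, 1, '2024-01-11');   -- volunteer 2 already has the t-shirt
""")

def run_reward_job(conn, run_date):
    """Find (volunteer, reward) pairs that are due, record them, and return them."""
    due = conn.execute("""
        SELECT h.volunteerid, r.Rewardid
        FROM (SELECT volunteerid, SUM(hoursworked) AS total
              FROM VolunteerHours GROUP BY volunteerid) AS h
        JOIN Rewards AS r ON h.total >= r.RequiredHours
        WHERE (r.RewardBeginDate IS NULL OR r.RewardBeginDate <= :d)
          AND (r.RewardEndDate IS NULL OR r.RewardEndDate >= :d)
          -- skip rewards this volunteer has already received
          AND NOT EXISTS (SELECT 1 FROM Awarded AS a
                          WHERE a.VolunteerID = h.volunteerid
                            AND a.RewardID = r.Rewardid)
        ORDER BY h.volunteerid, r.Rewardid
    """, {"d": run_date}).fetchall()
    conn.executemany("INSERT INTO Awarded VALUES (?, ?, ?)",
                     [(v, r, run_date) for v, r in due])
    return due

first = run_reward_job(conn, "2024-01-31")
second = run_reward_job(conn, "2024-01-31")
```

The first run awards the t-shirt to volunteer 1 and the check to volunteer 2; the second run finds nothing new, because every due pair is already in Awarded. If a reward should repeat (say, every 10 hours), the NOT EXISTS check and the primary key on Awarded would need to change.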

Yes, DB field entries can be files. Or, more precisely, they can be filespecs that reference files. Is that what you really meant?
While we are on the subject of data fields that reference other data, how much do you know about foreign keys? What can you accomplish with references to files that you couldn't accomplish even better by the judicious use of foreign keys?
Foreign keys, and the keys that they refer to, are fundamental concepts in the relational model of data. Without this model, your database design is going to be pretty random.

Morning,
You really must place all your tables on a chart and then determine the business rules for that chart in an entity relationship diagram. Only once you have decided what the direct relationships are between each and every table would you test to see whether you get the desired answers. This procedure is called database design, and from what I see it appears that you haven't done it yet and have got a little ahead of yourself.
There are plenty of good books on database design on the market. The one I use is "Database Design For Mere Mortals". It is very easy to read and understand.
Hope this helps.

Related

RDBMS: How to model a company having different products at multiple locations

I have an existing database that models all the products a company is either producing or consuming. The database is quite simple:
Table: companies {PK: company_id}
+------------+--------------+
| company_id | company_name |
+------------+--------------+
Table: products {PK: product_id}
+------------+------------+---------------+
| company_id | product_id | product_price |
+------------+------------+---------------+
Now, if I need to add location information to it, it starts to get complicated.
Basically, now a company has many locations and each location has many products.
To further complicate matters, some attributes of the product e.g. price may not be the same at each location. I would like to share other common attributes at all locations (Basically, I want to avoid creating three copies of product A that's used at all three locations).
I'm not sure what the best way to model this is. I can think of
Table: company_location
+------------+-------------+
| company_id | location_id |
+------------+-------------+
Table: location_product
+-------------+------------+
| location_id | product_id |
+-------------+------------+
But this design would not allow product attributes to change per location, without creating an entirely different product for each location. I also don't have a way to maintain a master product list per company.
Any help is appreciated.
PS: I'm using a postgreSQL database
The rules of normalization would tell you that you need your non-key attributes to depend on all of the key values (and nothing else).
If price is determined by:
- The company who makes it
- The location that sells it
- What the product actually is
Then that implies that PRICE needs a candidate key that specifies company, location and product.
The issue becomes what the relationships are between companies, products and locations. Also, what else do you know (what columns do you have) about these three kinds of things?
If they are all totally independent, for example, the products are commodities and don't depend at all on companies and the locations are independent distributors, which have nothing to do with either companies or what kind of products are sold there, then really a single three-way join is probably your best bet.
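For that fully independent case, a minimal sketch of the three-way table might look like the following. It uses Python's sqlite3 so it can be run as-is, although the question is about PostgreSQL; the table name `product_price`, the `location_name` and `product_name` columns, and the sample rows are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id  INTEGER PRIMARY KEY, company_name  TEXT);
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, location_name TEXT);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);

-- price depends on the full (company, location, product) key and nothing else
CREATE TABLE product_price (
    company_id    INTEGER NOT NULL REFERENCES companies(company_id),
    location_id   INTEGER NOT NULL REFERENCES locations(location_id),
    product_id    INTEGER NOT NULL REFERENCES products(product_id),
    product_price REAL NOT NULL,
    PRIMARY KEY (company_id, location_id, product_id)
);

INSERT INTO companies VALUES (1, 'Acme');
INSERT INTO locations VALUES (1, 'Berlin'), (2, 'Paris');
INSERT INTO products  VALUES (1, 'Product A');   -- stored once, shared by all locations
INSERT INTO product_price VALUES (1, 1, 1, 2.5), (1, 2, 1, 3.0);
""")

# Price of product 1 for company 1 at each location
prices = conn.execute("""
    SELECT l.location_name, pp.product_price
    FROM product_price AS pp
    JOIN locations AS l ON l.location_id = pp.location_id
    WHERE pp.company_id = 1 AND pp.product_id = 1
    ORDER BY l.location_name
""").fetchall()

product_rows = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

Product A exists as a single row in products while its price varies per location, which is the part the question's location_product table could not express.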
However, if there are some linkages between company, product and location, then you need to normalize these items out appropriately. At the end, you may still find yourself tempted to keep price as the only attribute in a three-way join. Alternatively, you may find that your data is actually more hierarchical (companies have locations which sell products that are fundamentally different in some meaningful way from similar products sold at other locations). In such a case the price might live on the leaf level of a tree structure.
It's really hard to say for sure what would work best for you without understanding your business rules better.
The bottom line is, you should aim for third normal form (3NF).
You probably want something like this:

What is the correct database structure to store historical data?

I am designing a database in SQLite that is intended to help facilitate the creation of predictive algorithms for FIRST Robotics Competition. On the surface, things looked pretty easy, but I'm struggling with a single issue: how to store past ratings of a team. I have looked over previous questions pertaining to how historical data is stored, but I'm not sure any of it applies well to my situation (though it could certainly be that I just don't understand it well enough).
Each team has an individual rating, and after each match that team participates in the rating gets revised. Now, there are several ways I could go about storing them, but none of them seem particularly good. I'll go through the ones that I have thought through, in no particular order.
Option 1:
Each team has its own table. It would include the match_id and the rating after the match was done, and could possibly also include the rating before. The problem is that there would be close to 10,000 tables. I'm pretty sure that's inefficient, especially considering I believe that it's also unnormalized (correct me if I'm wrong).
Table Name: Team_id
match_id | rating_after
Option 2:
Historical ratings for each team are stored in the match table, and current ratings are stored in the team table. A simplified version of the team table looks like this:
Table : Team_list
team_id | team_name | team_rating
That isn't really the problem; the problem is with the historical data. The historical data would be stored with the match. Likely, it would be each team's rating before the match.
The problem I have with this one is how tough a search it will be to find the previous ratings. This comes from the structure of how FRC works. There are 3 teams on each side (forming what is known as an alliance) for a total of 6 teams. (These alliances are normally designated by the colors Red and Blue.)
These alliances are randomly assigned ahead of time, and could include any team at the event, on either side. In other words, the match table would look like this (simplified):
Table: match_table
match_id | Red1 | Red2 | Red3 | Blue1 | Blue2 | Blue3 | RedScore | BlueScore | Red1Rating | Red2Rating | etc.....
So each team has to be included in the match info, as well as a rating for each team. If I were to create more than one rating (such as an updated rating design that I want to do a pure comparison test with), things could get clogged really fast.
In order to find the previous rating for team # 67, for instance, I'd have to search Red1, Red2, Red3, Blue1, etc. and then look at the column that pertains to the position, all while being sure that this really is the most recent match.
Note: This might involve knowing not only the year of the data and the week it was taken in (I would get this data from a join with an event table), but also the match level (whether it was qualifications or playoffs) and match # (which is not match_id).
Sure, this option is normalized, but it's also got a weird search pattern, and isn't easy from a front-end standpoint (I might build a front-end for some of the data in the future, so I want to keep that in mind as well).
My question: Is there an easier/more efficient option that I am missing?
Because both designs feel somewhat inefficient. The first has too many tables, the other has a table that will have well over 100,000 entries and will have to be searched in a convoluted pattern. I feel as if there is some simple design solution that I simply haven't thought of.
There's only one sane answer:
team_rating:
team_id, rating, start_date, end_date
Make all ranges closed by using the creation date of the team as the first rating's start_date and some arbitrarily distant future date (e.g. 2199-01-01) as the end_date for the current row, with all dates being inclusive.
Queries to find the rating at any date are then a simple
select rating
from team_rating
where team_id = $id
and $date between start_date and end_date
and rating history is just
select start_date, rating
from team_rating
where team_id = $id
order by start_date
It's key that both start and end dates are stored, otherwise the queries are trainwrecks.
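Here is a small runnable sketch of that table using Python's sqlite3, including the bookkeeping needed when a rating is revised. The helper names `set_rating` and `rating_on`, the sample ratings, and the choice of ISO date strings are my assumptions. Because both ends of a range are inclusive, the old row is closed on the day before the new one starts, which also means this sketch allows only one revision per team per day; ratings revised after every match would need a finer-grained key such as a timestamp.

```python
import sqlite3
from datetime import date, timedelta

FAR_FUTURE = "2199-01-01"   # sentinel end_date marking the current row

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE team_rating (
        team_id    INTEGER NOT NULL,
        rating     REAL NOT NULL,
        start_date TEXT NOT NULL,   -- ISO dates sort correctly as text
        end_date   TEXT NOT NULL,
        PRIMARY KEY (team_id, start_date)
    )""")

def set_rating(conn, team_id, rating, start):
    """Close the team's open row the day before `start`, then open a new row."""
    day_before = (date.fromisoformat(start) - timedelta(days=1)).isoformat()
    conn.execute(
        "UPDATE team_rating SET end_date = ? WHERE team_id = ? AND end_date = ?",
        (day_before, team_id, FAR_FUTURE))
    conn.execute("INSERT INTO team_rating VALUES (?, ?, ?, ?)",
                 (team_id, rating, start, FAR_FUTURE))

def rating_on(conn, team_id, day):
    """Rating in effect on `day`, or None if the team had no rating yet."""
    row = conn.execute(
        "SELECT rating FROM team_rating "
        "WHERE team_id = ? AND ? BETWEEN start_date AND end_date",
        (team_id, day)).fetchone()
    return row[0] if row else None

set_rating(conn, 67, 1500.0, "2014-01-01")
set_rating(conn, 67, 1520.0, "2014-03-08")
set_rating(conn, 67, 1495.0, "2014-03-15")

history = conn.execute(
    "SELECT start_date, rating FROM team_rating WHERE team_id = ? ORDER BY start_date",
    (67,)).fetchall()
```

Looking up team 67 on any date is then a single indexed query, with no need to scan the six team columns of the match table.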

Change Data Capture and SQL Server Analysis Services

I'm designing a database application where data is going to change over time. I want to persist historical data and allow my users to analyze it using SQL Server Analysis Services, but I'm struggling to come up with a database schema that allows this. I've come up with a handful of schemas that could track the changes (including relying on CDC), but then I can't figure out how to turn that schema into a working BISM within SSAS. I've also been able to create a schema that translates nicely into a BISM, but then it doesn't have the historical capabilities I'm looking for. Are there any established best practices for doing this sort of thing?
Here's an example of what I'm trying to do:
I have a fact table called Sales which contains monthly sales figures. I also have a regular dimension table called Customers which allows users to look at sales figures broken down by customer. There is a many-to-many relationship between customers and sales representatives, so I can make a reference dimension called Responsibility that refers to the customer dimension and a Sales Representative reference dimension that refers to the Responsibility dimension. I now have the Sales facts linked to Sales Representatives by the chain of reference dimensions Sales -> Customer -> Responsibility -> Sales Representative, which allows me to see sales figures broken down by sales rep. The problem is that the Sales facts aren't the only things that change over time. I also want to be able to maintain a history of which Sales Representative was Responsible for a Customer at the time of a particular Sales fact. I also want to know where the Sales Representative's office was located at the time of a particular Sales fact, which may be different from his current location. I might also want to know the size of a customer's organization at the time of a particular Sales fact, which also might be different than it is currently. I have no idea how to model this in a BISM-friendly way.
You mentioned that you currently have a fact table which contains monthly sales figures. So one record per customer per month. So each record in this fact table is actually an aggregation of individual sales "transactions" that occurred during the month for the corresponding dimensions.
So in a given month, there could be 5 individual sales transactions for $10 each for customer 123...and each individual sales transaction could be handled by a different Sales Rep (A, B, C, D, E). In the fact table you describe there would be a single record for $50 for customer 123...but how do we model the SalesReps (A-B-C-D-E)?
Based on your goals...
to be able to maintain a history of which Sales Representative was Responsible for a Customer at the time of a particular Sales fact
to know where the Sale Representative's office was located at the time of a particular sales fact
to know the size of a customer's organization at the time of a particular Sales fact
...I think it would be easier to model at a lower granularity...specifically a sales-transaction fact table which has a grain of 1 record per sales transaction. Each sales transaction would have a single customer and single sales rep.
FactSales
DateKey (date of the sale)
CustomerKey (customer involved in the sale)
SalesRepKey (sales rep involved in the sale)
SalesAmount (amount of the sale)
Now for the historical change tracking...any dimension with attributes for which you want to track historical changes will need to be modeled as a "Slowly Changing Dimension" and will therefore require the use of "Surrogate Keys". So for example, in your customer dimension, Customer ID will not be the primary key...instead it will simply be the business key...and you will use an arbitrary integer as the primary key...this arbitrary key is referred to as a surrogate key.
Here's how I'd model the data for your dimensions...
DimCustomer
CustomerKey (surrogate key, probably generated via IDENTITY function)
CustomerID (business key, what you will find in your source systems)
CustomerName
Location (attribute we wish to track historically)
-- the following columns are necessary to keep track of history
BeginDate
EndDate
CurrentRecord
DimSalesRep
SalesRepKey (surrogate key)
SalesRepID (business key)
SalesRepName
OfficeLocation (attribute we wish to track historically)
-- the following columns are necessary to keep track of historical changes
BeginDate
EndDate
CurrentRecord
FactSales
DateKey (this is your link to a date dimension)
CustomerKey (this is your link to DimCustomer)
SalesRepKey (this is your link to DimSalesRep)
SalesAmount
What this does is allow you to have multiple records for the same customer.
Ex. CustomerID 123 moves from NC to GA on 3/5/2012...
CustomerKey | CustomerID | CustomerName | Location | BeginDate | EndDate | CurrentRecord
1 | 123 | Ted Stevens | North Carolina | 01-01-1900 | 03-05-2012 | 0
2 | 123 | Ted Stevens | Georgia | 03-05-2012 | 01-01-2999 | 1
The same applies with SalesReps or any other dimension in which you want to track the historical changes for some of the attributes.
So when you slice the sales transaction fact table by CustomerID, CustomerName (or any other non-historically-tracked attribute) you should see a single record with the facts aggregated across all transactions for the customer. And if you instead decide to analyze the sales transactions by CustomerName and Location (the historically tracked attribute), you will see a separate record for each "version" of the customer location corresponding to the sales amount while the customer was in that location.
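The two ways of slicing can be tried out with a small sketch. It uses Python's sqlite3 rather than SQL Server, leaves out DimSalesRep for brevity, and rewrites the example dates in ISO form so they compare correctly as text. The helpers `customer_key_on` and `record_sale` and the three sample sales are invented, and I'm assuming BeginDate is inclusive and EndDate exclusive, since both example rows share 03-05-2012.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey   INTEGER PRIMARY KEY,   -- surrogate key
    CustomerID    INTEGER NOT NULL,      -- business key
    CustomerName  TEXT NOT NULL,
    Location      TEXT NOT NULL,         -- historically tracked attribute
    BeginDate     TEXT NOT NULL,
    EndDate       TEXT NOT NULL,
    CurrentRecord INTEGER NOT NULL
);
CREATE TABLE FactSales (
    DateKey     TEXT NOT NULL,
    CustomerKey INTEGER NOT NULL REFERENCES DimCustomer(CustomerKey),
    SalesAmount REAL NOT NULL
);
INSERT INTO DimCustomer VALUES
    (1, 123, 'Ted Stevens', 'North Carolina', '1900-01-01', '2012-03-05', 0),
    (2, 123, 'Ted Stevens', 'Georgia',        '2012-03-05', '2999-01-01', 1);
""")

def customer_key_on(conn, customer_id, day):
    """Surrogate key of the customer version in effect on `day` (assumed half-open range)."""
    row = conn.execute(
        "SELECT CustomerKey FROM DimCustomer "
        "WHERE CustomerID = ? AND BeginDate <= ? AND ? < EndDate",
        (customer_id, day, day)).fetchone()
    return row[0] if row else None

def record_sale(conn, customer_id, day, amount):
    """Store a sale against the customer version valid on the sale date."""
    conn.execute("INSERT INTO FactSales VALUES (?, ?, ?)",
                 (day, customer_key_on(conn, customer_id, day), amount))

record_sale(conn, 123, "2012-02-10", 10)   # while in North Carolina
record_sale(conn, 123, "2012-04-01", 20)   # after the move to Georgia
record_sale(conn, 123, "2012-05-01", 30)

by_customer = conn.execute("""
    SELECT d.CustomerID, SUM(f.SalesAmount)
    FROM FactSales AS f JOIN DimCustomer AS d ON d.CustomerKey = f.CustomerKey
    GROUP BY d.CustomerID""").fetchall()

by_location = conn.execute("""
    SELECT d.Location, SUM(f.SalesAmount)
    FROM FactSales AS f JOIN DimCustomer AS d ON d.CustomerKey = f.CustomerKey
    GROUP BY d.Location ORDER BY d.Location""").fetchall()
```

Grouping by the business key gives one row with the total for customer 123, while grouping by Location splits the same sales between the North Carolina and Georgia versions of the customer.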
By the way, if you have some time and are interested in learning more, I highly recommend the Kimball bible "The Data Warehouse Toolkit"...which should provide a solid foundation on dimensional modeling scenarios.
The established best practices way of doing what you want is a dimensional model with slowly changing dimensions. Sales reps are frequently used to describe the usefulness of SCDs. For example, sales managers with bonuses tied to the performance of their teams don't want their totals to go down if a rep transfers to a new territory. SCDs are perfect for tracking this sort of thing (and the situations you describe) and allow you to see what things looked like at any point historically.
Spend some time on Ralph Kimball's website to get started. The first 3 articles I'd recommend you read are Slowly Changing Dimensions, Slowly Changing Dimensions Part 2, and The 10 Essential Rules of Dimensional Modeling.
Here are a few things to focus on in order to be successful:
You are not designing a 3NF transactional database. Get comfortable with denormalization.
Make sure you understand what grain means and explicitly define the grain of your database.
Do not use natural keys as keys, and do not bake any intelligence into your surrogate keys (with the exception of your time keys).
The goals of your application should be query speed and ease of understanding and navigation.
Understand type 1 and type 2 slowly changing dimensions and know where to use them.
Make sure you have a sponsor on the business side with the power to "break ties". You will find different people in the organization with different definitions of the same thing, and you need an enforcer with the power to make decisions. To see what I mean, ask 5 different people in your organization to define "customer" or "gross profit". You'll be lucky to get 2 people to define either the same way.
Don't try to wing it. Read The Data Warehouse Lifecycle Toolkit and embrace the ideas, even if they seem strange at first. They work.
OLAP is powerful and can be life changing if implemented skillfully. It can be an absolute nightmare if it isn't.
Have fun!

Database Design Structure | Product & ProductKids

What would be the best way to design a MySQL table that can be filled from an HTML (+PHP) form?
Actually I have this structure:
Table PRODUCT
ID | Ordernumber | Name | Desc | Price
Table PRODUCT_KIDS
ID | MasterProductID | Ordernumber | Price
The only difference between my 2 tables is Name and Desc.
The ADD HTML form looks like this:
DATA FOR PRODUCT
Ordernumber
Name
Desc
Price
DATA FOR PRODUCT_KIDS
Ordernumber
Price
For some reason, customers sometimes want, for example, only 2 PRODUCT_KIDS without their MasterProduct. In this case I need the Name and the Description from the Master-Product.
My questions are the following:
Should I merge these two tables together? Is this the best way to search for something?
When I merge these 2 tables, should I save the Name & Desc for the PRODUCT_KIDS as well (for the example above)?
what would be the best way to design a mysql table which can be filled with HTML (+PHP) form.
Who cares? Contrary to popular belief, MySQL is JUST ANOTHER relational database and PHP just another web scripting technology. The answer is the same whether you use MySQL + PHP or Oracle + Java or SQL Server + ASP.NET.
The principles of relational database design apply to any relational database. As such, the question is not related to MySQL and particularly not at all to PHP.
Table PRODUCT
Table PRODUCT_KIDS
This is a simplistic view of the topic that leaves out a lot of items, even legally required ones, such as possible international taxation, different shipping codes and their prices (not all items can be combined in one shipment) and, for example, customization of items in general. I remember writing a shop where PIPES were sold, in custom lengths ;) And some items require separate shipping ;)
"The Data Model Resource Book", Volume 1, discusses standard enterprise scenarios in great depth, including address management (not as simple as most people do it), accounting and.... the whole shop blabla (storage, inventory, pricing). BOTH (!) of your approaches are simplistic and would be totally illegal in my jurisdiction because they would not take into account legal requirements for properly tracking taxation on various products.
I can only suggest getting it. They also go into great depth on some of the industry-level particularities. For example ;) Apparel. Make a shop for clothes and you go nuts on "variants in size AND color for the same product". Your approach would result in a "shirt" possibly having 200 children (sizes * colors) ;)
I would suggest: back to the drawing board. With a good book ;) I personally loved reading this book - hm - a long time ago.

mapping a relationship on an ER diagram if it won't be used in practice?

I have the following tables:
Persons   | Enquiries | Products
----------+-----------+----------
PersonID  | EnID      | ProductID
FirstName | EnDate    | Product
LastName  | Enquiry   | Price
Email     |           |
Etc.      |           |
I'm going to make a 1:N relationship between Persons and Enquiries. There's also an explicit M:N relationship between Enquiries and Products. However, there's no need for the business folks to record whether an enquiry is about a particular product or not.
My question:
From a logical design point of view, do I still need to record the relationship on the ER diagram and implement it within my RDBMS even if I'm not going to make any use of it?
Many thanks,
zan
From my experience, if you're even asking this question, then you need to make the table.
From the sounds of your question, you have feelings that there may be a use for mapping the statistics between Enquiries and Products, either now or in the future.
Many-to-Many tables, when implemented in this fashion:
Product-Enquiries
=============================
P_ID-E_ID PK, int, AUTO_INCR
P_ID FK, int
E_ID FK, int
Are very small tables, take less than five minutes to set up, and can be ignored with little consequence.
However, the moment it enters someone's head that "hey, we should be able to tell which products people are asking about", the act of creating these types of tables, as well as implementing the DML into the application logic becomes a pain.
Plus, all you have to do is write your SELECTs to get the complete listing of information regarding the topic available in the system, instead of waiting until there's been a buildup of records in the table to answer the question.
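As a rough illustration of how little is involved, here is the junction table and the "which products are people asking about" query in Python's sqlite3. The table name `ProductEnquiries`, the `PE_ID` column name, the UNIQUE constraint, and the sample rows are my own assumptions layered on the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Product TEXT, Price REAL);
CREATE TABLE Enquiries (EnID INTEGER PRIMARY KEY, EnDate TEXT, Enquiry TEXT);

-- many-to-many link: one row per (product, enquiry) pair
CREATE TABLE ProductEnquiries (
    PE_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    P_ID  INTEGER NOT NULL REFERENCES Products(ProductID),
    E_ID  INTEGER NOT NULL REFERENCES Enquiries(EnID),
    UNIQUE (P_ID, E_ID)               -- assumed: a pair is linked at most once
);

INSERT INTO Products VALUES (1, 'Widget', 9.99), (2, 'Gadget', 19.99), (3, 'Gizmo', 4.50);
INSERT INTO Enquiries VALUES (1, '2024-01-02', 'Do you ship abroad?'),
                             (2, '2024-01-03', 'Bulk discount?');
INSERT INTO ProductEnquiries (P_ID, E_ID) VALUES (1, 1), (1, 2), (2, 2);
""")

# Number of enquiries per product, including products nobody has asked about
enquiries_per_product = conn.execute("""
    SELECT p.Product, COUNT(pe.E_ID) AS n
    FROM Products AS p
    LEFT JOIN ProductEnquiries AS pe ON pe.P_ID = p.ProductID
    GROUP BY p.ProductID, p.Product
    ORDER BY n DESC, p.Product
""").fetchall()
```

If the link rows have been recorded from day one, this query can answer the question immediately for the full history of enquiries.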
I guess you can cut out the relationship if you know it will never be used. This will keep the design simpler and more manageable. Or you can merge the two entities into the Enquiries one.
Relationships between tables are created only for a purpose. If you are the developer and you won't be developing something that uses the relationship between the product and enquiries tables, then why create it?
An RDBMS neither promotes nor restricts the creation of relationships. Create them if required; otherwise leave them out.

Resources