There is a Book table where the combination of title, edition, and author is always unique.
I want all bookstores to be able to add their books, but different bookstores can have the same book at different prices. So I came up with this table design.
So when one bookstore tries to add a book that has already been added by another bookstore, the current bookstore should only have to fill in the pricing details, not the book details.
The problem with this is: what if the book details that were already added have some missing or incorrect info? In that case, the current bookstore can flag the entry and moderators or admins can fix it.
Is there any better way to achieve this? I'm not comfortable with this design logic at all.
Your design makes sense. You want to keep the "static" information in 1 table, and link "dynamic" information like you did.
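As a rough illustration of that split (the table and column names below are my assumptions, not your actual schema):

-- Static book facts live once, in a single table.
CREATE TABLE Book (
    book_id INT PRIMARY KEY,
    title   VARCHAR(200) NOT NULL,
    edition VARCHAR(50)  NOT NULL,
    author  VARCHAR(200) NOT NULL,
    UNIQUE (title, edition, author)        -- the same book cannot be entered twice
);

CREATE TABLE Bookstore (
    bookstore_id INT PRIMARY KEY,
    name         VARCHAR(200) NOT NULL
);

-- Dynamic, per-store data: each store attaches its own price to an existing book.
CREATE TABLE BookstoreBook (
    bookstore_id INT NOT NULL REFERENCES Bookstore (bookstore_id),
    book_id      INT NOT NULL REFERENCES Book (book_id),
    price        DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (bookstore_id, book_id)    -- one price per store per book
);

A "flagged for review" column (or a small flag table) can then hang off Book to support the moderation workflow you describe.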
Your other question is related to data integrity. You can put NOT NULL constraints on fields to ensure all fields are filled in, but garbage entries are always possible. This is a universal problem.
Potential solutions to mitigate this:
any and all data that can be selected instead of typed in should be linked via another table. Ex:
BookGenre
bookgenreid PK
genre CHAR
Book
bookid PK
genre FK, BookGenre.bookgenreid
...
So you store all possible genres in a separate table, so your users cannot invent new genres or mistype values. The same goes for authors, countries, ... This also makes it easier to build queries and avoids things like [ SciFi, Science Fiction, Sciance fiction, ... ]
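In rough DDL (types assumed, written as an add-on to the Book table sketched above), the foreign key is what stops users from inventing or mistyping genres:

CREATE TABLE BookGenre (
    bookgenreid INT PRIMARY KEY,
    genre       VARCHAR(40) NOT NULL UNIQUE   -- each genre is spelled exactly one way
);

-- In the Book table, genre becomes a reference instead of free text
-- (column name assumed):
ALTER TABLE Book
    ADD genre_id INT REFERENCES BookGenre (bookgenreid);

-- Any insert or update that uses a genre_id with no matching BookGenre row
-- is rejected by the DBMS, so 'Sciance fiction' can never sneak in.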
not everyone should be able to enter new books in the system. Ex. when I worked at a wholesale distributor, only a select group of employees could create new products in the database, and they had established a convention on how to do it. They worked closely with purchasing and receiving. You will need to dedicate "data administrators".
So try to control as much as you can in the database and/or the application. Avoid free text fields as much as possible, as users will always think of new ways to mess it up. Ex. at work we currently have a HUGE project to standardise addresses between unlinked systems. It is an enormous undertaking, which involves AI. All this only because no two people enter addresses exactly the same way.
I need to create a data model for an education-based application. The question I want to ask is: is it better to make one junction table per pair of tables with a many-to-many relation, or to create one big junction table to deal with all the many-to-many relationships?
Say, I have student, tutor, subject, grade tables.
student and tutor are in many-to-many
tutor and subject are in many-to-many
tutor and grade are also in many-to-many
A student can have many tutors for one subject of one grade.
There can be many tutors for one subject of one grade.
A subject of one grade can be taught by many tutors.
Above are just a few examples of the relationships.
My question is how to model these relationships efficiently? Should I have one junction table for each of the relationships or should I combine them into one big bridge table?
So, if I have a class table as well, then from the big bridge table I could get, for each class, which tutor taught which subject of which grade, along with the other details of the class.
Let's assume the database is not yet electronic, but a good old filing cabinet instead.
Let's assume the database is for a library, and there are a couple of distinct sorts of "many-to-many info" to be maintained: authors to books (coauthored books have >1 author), readers to books, readers to readers, book availability in possibly multiple site locations of the library, ...
Would you ever think of stashing all those distinct sorts of information in one big filing cabinet? Imagine what the consequences are for its users. Sometimes you'll be prohibited from doing something "readers to books" merely because someone else is right there doing something "readers to readers". If and when you manage to gain access and it's finally your turn to do something, say "authors to books", your work will be slowed down because all the "readers to books" stuff might come in between and you'll have to spend extra time merely skipping the unneeded stuff. If a "conversion operation" must be performed, say, a new kind of many-to-many stuff is discovered and must be integrated into the single filing cabinet, the entire database is inaccessible while the conversion operation is being performed (people adding filing cards of a color that wasn't yet in use). Etc., etc. Those undesirable properties carry over almost 1-1 to the electronic equivalent.
As someone else put it: don't be afraid of tables. It's what a DBMS is good at.
EDIT
Brief: just keep it at one table per fact type, and abstain from making (or trying to discover) geeky abstractions like "they're all just properties" / "they're all just some many-to-many relation" / ... They're geeky because an end user/business user will not "see" them, and thus there is no business value in making them.
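To make that concrete for the student/tutor example, a rough sketch with one junction table per fact type (all table and column names here are assumptions):

-- Fact type 1: "this student is tutored by this tutor".
CREATE TABLE student_tutor (
    student_id INT NOT NULL REFERENCES student (id),
    tutor_id   INT NOT NULL REFERENCES tutor (id),
    PRIMARY KEY (student_id, tutor_id)
);

-- Fact type 2: "this tutor teaches this subject at this grade".
CREATE TABLE tutor_subject_grade (
    tutor_id   INT NOT NULL REFERENCES tutor (id),
    subject_id INT NOT NULL REFERENCES subject (id),
    grade_id   INT NOT NULL REFERENCES grade (id),
    PRIMARY KEY (tutor_id, subject_id, grade_id)
);

Whether "tutor teaches subject at grade" is one fact type or two is a business question, not a geeky one, which is exactly the point above.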
Today I was designing a database for a potential personal project of mine. Since I couldn't decide which would be the better option, I asked my Databases teacher; unfortunately he couldn't tell me which of the two options is better than the other, or why.
I designed the database for a dummy data generator. Since I want to generate multilingual data, I came up with these tables (a simplification of the actual tables).
(first and last)names: id, name
streets: id, name
languages: id, name
Each names.name and streets.name originates from a language; sometimes a name can have multiple origins (ex: Nick is both a Dutch and an English name).
Each language has multiple names and streets.
These two rules result in a many-to-many relationship. At the moment I've got only two such tables, but I know I will end up with between 10 and 20 tables of this kind.
The regular way to do this would be to just make 10 to 20 many-to-many relationship tables.
Another idea I came up with was to use just one many-to-many table with a third column which specifies which table the id relates to.
At the moment I've got the design on my other PC so I will update it with my ideas visualized after dinner (2 hours or so).
Which idea is better and why?
To make the project idea a bit clearer:
It is always a hassle to create enough good, realistic-looking working data for projects. This application will generate that data for you and return the needed SQL, so you only have to run the queries.
The user comes to the site to get the data. He states his table name and his column names, and then he can link the column names to types of data, think of:
* Firstname
* Lastname
* Email address (which will be randomly generated from the name of the person)
* Address details (street, house number, zip code, place, country)
* A lot more
Then, after linking the columns with the types, the user can set the number of rows he wants to generate. The application will then choose a country at random and generate realistic-looking data according to the country the people live in.
That's actually an excellent question. This sort of thing leads to a genuine problem in database design and there is a real tradeoff. I don't know what RDBMS you are using, but....
Basically you have four choices, all of them with serious downsides:
1. One M-M table with one column per potential target table and a check constraint so that only one fkey can be filled in besides language (see the sketch after this list). Ick....
2. One M-M table per relationship. This makes things quite hard to manage over time especially if you need to change something from an int to a bigint at some point.
3. One M-M table with a polymorphic relationship. You lose a lot of referential integrity checks when you do this and to make it safe, have fun coding (and testing!) triggers.
4. Look carefully at the advanced features in your rdbms for a solution. For example in postgresql this can be solved with table inheritance. The downside is that you lose portability and end up in advanced territory.
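For reference, a minimal sketch of option 1 for the names/streets case (all identifiers are assumptions):

-- One link table with a nullable FK per possible target table, plus a CHECK
-- that exactly one of them is filled in.
CREATE TABLE language_link (
    language_id INT NOT NULL REFERENCES languages (id),
    name_id     INT REFERENCES names (id),
    street_id   INT REFERENCES streets (id),
    CHECK ( (name_id IS NOT NULL AND street_id IS NULL)
         OR (name_id IS NULL     AND street_id IS NOT NULL) )
);

-- Every new kind of table means another nullable column and a longer CHECK,
-- which is where the "ick" comes from.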
Unfortunately there is no single definitive answer. You need to consider the tradeoffs carefully and decide what makes sense for your project. If I were working with just one RDBMS, I would do the last one. But if not, I would probably do one table per relationship and focus on tooling to manage the problems that come up. The former preference reflects my level of knowledge and confidence; the latter is a bit more of a personal opinion.
So I hope this helps you look at the tradeoffs and select what is right for you.
I'm looking to create a database for tracking purposes. For the purposes of this question I have abstracted it a bit and transformed it into a Product Design tracking database.
I'm trying to make it as normalized and efficient as possible.
Essentially I want to be able to track employees and what designs they've participated in. The queries I want to run aren't particularly complex. I want to be able to query how many employees participated in the design of a specific model, what models and products an employee has designed, how many employees designed a product, etc. The products would be redesigned every year or half year.
My concern is how I'm managing the many-to-many relationship between models and employees. In the example provided I have a Design table between the Model and Employee tables. This will essentially be a dump of all employee-to-model design assignments, which both resolves the many-to-many relationship (which I understand is a bad thing to have unresolved) and makes query design relatively simple. I also assume I can index it by either Emp_ID or Model_ID to make it more efficient.
However, I'm worried this table may get a bit unwieldy over time. In its current role I could make this database very inefficient and probably not notice any degradation in performance. However, I'm hoping to make this relatively scalable, as I want it to be easy to administer (whether I admin it or someone else takes over), and I'm hoping to add features over time (CRM-like functionality, for example).
I was thinking I could create a table for each employee and track design projects in a separate empID_design table, but that also seemed very unwieldy. Essentially, every other way I thought up of doing this ended up creating a large number of tables instead of just inserting a row.
One other thing was I wanted to be able to track project managers. In the current form I thought tracking it in the Design table made sense. I don't think the Project Mgr would change mid-design but is there an elegant way to track it if they did?
Any help or advice you can provide is appreciated. I'm a bit rusty with database design and ERD design, so if you notice something that doesn't make sense, it's more likely a mistake I made as opposed to a fancy, nuanced design I thought up.
To give a basic idea of what each table could be representing:
Company: Black and Decker
Product: Rotary Tool
Model: D-5230
Designed by: George Santos, Kevin Smith, John Rodes
Project Manager: Kevin Smith
Thank you in advance!
ERD Diagram: http://i.stack.imgur.com/flo4l.png
It sounds to me like the intersection table between model and employee ought to be a role table, in which each row has:
An employee ID
A model ID
A role ID: designer, project manager, lead designer, etc
In that way an employee could even have multiple roles on a project.
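A hedged sketch of that intersection table, reusing the Emp_ID / Model_ID names from your ERD and adding an assumed Role lookup:

CREATE TABLE Role (
    Role_ID INT PRIMARY KEY,
    Name    VARCHAR(50) NOT NULL UNIQUE    -- 'Designer', 'Project Manager', 'Lead Designer', ...
);

CREATE TABLE Design (
    Model_ID INT NOT NULL REFERENCES Model (Model_ID),
    Emp_ID   INT NOT NULL REFERENCES Employee (Emp_ID),
    Role_ID  INT NOT NULL REFERENCES Role (Role_ID),
    PRIMARY KEY (Model_ID, Emp_ID, Role_ID)   -- one row per person, per role, per model
);

"Who managed model X" then becomes a filter on the role rather than a separate column, and a mid-design change of project manager is just another row (optionally with start/end dates).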
I am currently in the process of looking at a restructure of our contact management database, and I wanted to hear people's opinions on solving the problem of a number of contact types having shared attributes.
Basically we have 6 contact types, which include Person, Company and Position # Company.
In the current structure all of these have an address; however, in the address table you must store their type in order to join to the contact.
This consistent requirement to join on contact type gets frustrating after a while.
Today I stumbled across a post discussing "Table Inheritance" (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server).
Basically you have a parent table and a number of sub tables (in this case, one per contact type). From there you enforce integrity so that a sub table must have a master equivalent where its type is defined.
The way I see it, by this method I would no longer need to store the type in tables like address, as the id is unique across all types.
I just wanted to know if anybody had any feelings on this method: whether it is a good way to go, or whether there are better alternatives?
I'm using SQL Server 05 & 08 should that make any difference.
Thanks
Ed
I designed a database just like the link you provided suggests. The case was to store the data for many different technical reports. The number of report types is undefined and will probably grow to about 40 different types.
I created one master report table, that has an autoincrement primary key. That table contains all common information like customer, testsite, equipmentid, date etc.
Then I have one table for each report type that contains the specific information relating to that report type. That table has the same primary key as the master and references the master as well.
My idea for splitting this into different tables with a 1:1 relation (which normally would be a no-no) was to avoid getting one single table with a huge number of columns, which gets very difficult to maintain as you're constantly adding columns.
My design with table inheritance gave me segmented data and expandability without being difficult to maintain. The only thing I had to do was write a special save method to handle writing to the two tables automatically. So far I'm very happy with the design and haven't really found any drawbacks, except for a slightly more complicated save method.
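A cut-down sketch of that layout (the report type and its columns are invented for illustration):

-- Master table: one row per report, with the common fields.
CREATE TABLE report (
    report_id   INT PRIMARY KEY,           -- autoincrement/identity in practice
    customer    VARCHAR(200) NOT NULL,
    testsite    VARCHAR(200),
    equipmentid VARCHAR(50),
    report_date DATE NOT NULL
);

-- One table per report type, sharing the master's key (a 1:1 relation).
CREATE TABLE pressure_test_report (
    report_id     INT PRIMARY KEY REFERENCES report (report_id),
    test_pressure DECIMAL(10, 2),
    passed        CHAR(1)
);

-- The "special save method" is then just: insert into report, take the new
-- report_id, and insert the type-specific row with that same key.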
Google on "gen-spec relational modeling". You'll find a lot of articles discussing exactly this pattern. Some of them focus on table design, while others focus on an object oriented approach.
Table inheritance pops up in a few of them.
I know this won't help much now, but initially it may have been better to have an Entity table rather than 6 different contact types. Then each Entity could have as many addresses as necessary and there would be no need for type in the join.
You'll still have the problem that if you want the sub-type fields and you have only the master contact, you'll have to know what table to go looking at - or else join to all of them. But otherwise this is a workable solution to a common problem.
Another possibility (fairly similar in structure, but different in how you think of it) is to simply put all your contacts into one table. Then, for the more specific fields (birthday, say, for people, and department for position#company), create separate tables that are associated with that contact.
Contact Table
--------------
Name
Phone Number
Address Table
-------------
Street / state, etc
ContactId
ContactBirthday Table
--------------
Birthday
ContactId
Departments Table
-----------------
Department
ContactId
It requires a different way of thinking about things, though: instead of thinking of people vs. companies, you think of the various functional requirements for the task at hand. If you want to send out birthday cards, get all the contacts that have birthdays associated with them, etc.
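For instance, the birthday-card run becomes a plain join to the attribute table (assuming the Contact table has a ContactId key, which the sketch above implies):

-- Every contact with a birthday on file, whatever kind of contact it is.
SELECT c.Name, b.Birthday
FROM Contact c
JOIN ContactBirthday b ON b.ContactId = c.ContactId
ORDER BY b.Birthday;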
I'm going to go out on a limb here and suggest you should rethink your normalization strategy (as you seem to be lucky enough to be able to rethink your schema quite fundamentally). If you typically store an address for each contact, then your contact table should have the address fields in it. Alternatively if the address is stored per company then the address should be stored in the company table and your contacts linked to that company.
If your contacts only have one address, or one (or even 3, just not 'many') instance of the other fields, think about rationalizing them into a single table. In my experience having a few null fields is a far better alternative than needing left joins to data you aren't sure exists.
Fortunately for anyone who vehemently disagrees with me you did ask for opinions! :) IMHO you should only normalize when you really need to. Where you are rethinking schemas, denormalization should be considered at every opportunity.
When you have a 7th type, you'll have to create another table.
I'm going to try this approach. Yes, you have to create new tables when you have a new type, but since this table will probably have different columns, you'll end up doing this anyway if you don't use this scheme.
If the tables that inherit the master don't differentiate much from one another, I'd recommend you try another approach.
May I suggest that we just add a Type table. I.e. a person has an address, a name, etc.; then, as each use case presents itself (student, teacher, ...), we have a PersonType table that links an entry in the person table to n types, and we add the subsequent new tables (teacher, alien, singer) as the system evolves...
I'm quite new to database design and have some questions about best practices and would really like to learn.
I am designing a database schema; I have a good idea of the requirements and now it's a matter of getting it down in black and white.
In this pseudo-database-layout, I have a table of customers, table of orders and table of products.
TBL_PRODUCTS:
ID
Description
Details
TBL_CUSTOMER:
ID
Name
Address
TBL_ORDER:
ID
TBL_CUSTOMER.ID
prod1
prod2
prod3
etc
Each 'order' has only one customer, but can have any number of 'products'.
The problem is that, in my case, the products for a given order can be any number (hundreds for a single order). On top of that, each product for an order needs more than just a 'quantity': it can have values that span pages of text for a specific product on a specific order.
My question is, how can I store that information?
Assuming I can't store a variable length array as single field value, the other option is to have a string that is delimited somehow and split by code in the application.
An order could have, say, 100 products, each product having either only a small int, or 5000 characters of free text (or anything in between), unique to that order.
On top of that, each order must have its own audit trail, as many things can happen to it throughout its lifetime.
An audit trail would contain the usual information - user, time/date, action and can be any length.
Would I store the audit trail for a specific order in its own table (as they could become quite lengthy), created as the order is created?
Are there any places where I could learn more about techniques for database design?
The most common way would be to store the order items in another table.
TBL_ORDER:
ID
TBL_CUSTOMER.ID
TBL_ORDER_ITEM:
ID
TBL_ORDER.ID
TBL_PRODUCTS.ID
Quantity
UniqueDetails
The same can apply to your Order audit trail. It can be a new table such as
TBL_ORDER_AUDIT:
ID
TBL_ORDER.ID
AuditDetails
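In rough DDL (the types and the FK column names are my assumptions):

CREATE TABLE TBL_ORDER_ITEM (
    ID            INT PRIMARY KEY,
    OrderID       INT NOT NULL REFERENCES TBL_ORDER (ID),
    ProductID     INT NOT NULL REFERENCES TBL_PRODUCTS (ID),
    Quantity      INT NOT NULL,
    UniqueDetails VARCHAR(8000)           -- the per-order, per-product free text
);

CREATE TABLE TBL_ORDER_AUDIT (
    ID           INT PRIMARY KEY,
    OrderID      INT NOT NULL REFERENCES TBL_ORDER (ID),
    AuditUser    VARCHAR(100) NOT NULL,
    AuditTime    TIMESTAMP NOT NULL,      -- DATETIME2 on SQL Server
    AuditAction  VARCHAR(50) NOT NULL,
    AuditDetails VARCHAR(8000)
);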
First of all, Google "Third Normal Form". In most cases your tables should be in 3NF, but there are cases where they are not, because of performance or ease of use, and only experience can really teach you that.
What you have is not normalized. You need a "Join table" to implement the many to many relationship.
TBL_ORDER:
ID
TBL_CUSTOMER.ID
TBL_ORDER_PRODUCT_JOIN:
ID
TBL_ORDER.ID
TBL_Product.ID
Quantity
TBL_ORDER_AUDIT:
ID
TBL_ORDER.ID
Audit_Details
The basic conventional name for the ID column in the Orders table (plural, because ORDER is a keyword in SQL) is "Order Number", with the exact spelling varying (OrderNum, OrderNumber, Order_Num, OrderNo, ...).
The TBL_ prefix is superfluous; it is doubly superfluous since it doesn't always mean table, as for example in the TBL_CUSTOMER.ID column name used in the TBL_ORDER table. Also, it is a bad idea, in general, to try using a "." in the middle of a column name; you would have to always treat that name as a delimited identifier, enclosing it in either double quotes (standard SQL and most DBMS) or square brackets (MS SQL Server; not sure about Sybase).
Joe Celko has a lot to say about things like column naming. I don't agree with all he says, but it is readily searchable. See also Fabian Pascal 'Practical Issues in Database Management'.
The other answers have suggested that you need an 'Order Items' table - they're right; you do. The answers have also talked about storing the quantity in there. Don't forget that you'll need more than just the quantity. For example, you'll need the price prevailing at the time of the order. In many systems, you might also need to deal with discounts, taxes, and other details. And if it is a complex item (like an airplane), there may be only one 'item' on the order, but there will be an enormous number of subordinate details to be recorded.
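So the order-items table from the other answers tends to grow columns that snapshot the commercial terms at order time; a hedged illustration (all names and types are assumptions):

-- Copying price, discount and tax onto the line at order time means later
-- catalogue changes don't rewrite what the customer actually paid.
CREATE TABLE order_line (
    order_line_id INT PRIMARY KEY,
    order_id      INT NOT NULL,            -- FK to the orders table
    product_id    INT NOT NULL,            -- FK to the products table
    quantity      INT NOT NULL,
    unit_price    DECIMAL(10, 2) NOT NULL, -- price prevailing when the order was placed
    discount      DECIMAL(10, 2) DEFAULT 0,
    tax           DECIMAL(10, 2) DEFAULT 0,
    line_details  VARCHAR(4000)            -- the free text unique to this line
);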
While it is not a reference on how to design database schemas, I often use the schema library at DatabaseAnswers.org. It is a good jumping-off point if you want something that is already roughed in. The schemas aren't perfect and will most likely need to be modified to fit your needs, but there are more than 500 of them in there.
Learn Entity-Relationship (ER) modeling for database requirements analysis.
Learn relational database design and some relational data modeling for the overall logical design of tables. Data normalization is an important part of this piece, but by no means all there is to learn. Relational database design is pretty much DBMS-independent within the mainstream DBMS products.
Learn physical database design. Learn index design as the first stage of designing for performance. Some index design is DBMS independent, but physical design becomes increasingly dependent on special features of your DBMS as you get more detailed. This can require a book that's specifically tailored to the DBMS you intend to use.
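As one small, hedged example of that first stage (using the order_line sketch above; the right indexes always depend on the queries you actually run):

-- Index the foreign-key columns you join and filter on most, e.g.
-- "all items for one order" and "all orders containing one product".
CREATE INDEX ix_order_line_order   ON order_line (order_id);
CREATE INDEX ix_order_line_product ON order_line (product_id);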
You don't have to do all the above learning before you ever design and build your first database. But what you don't know WILL hurt you. Like any other skill, the more you do it, the better you'll get. And learning what other people already know is a lot cheaper than learning by trial and error.
Take a look at Agile Web Development with Rails, it's got an excellent section on ActiveRecord (an implementation of the same-named design pattern in Rails) and does a really good job of explaining these types of relationships, even if you never use Rails. Here's a good online tutorial as well.