I need to create a data model for an education based application. The question I want to ask is is it better to make one junction table for two tables with many-to-many relation or create one big junction table to deal with all many-to-many relationships?
Say, I have student, tutor, subject, grade tables.
student and tutor are in many-to-many
tutor and subject are in many-to-many
tutor and grade are also in many-to-many
A student can have many tutors for one subject of one grade.
There can be many tutors for one subject of one grade.
A subject of one grade can be taught by many tutors.
Above are just a few examples of the relationships.
My question is how to model these relationships efficiently? Should I have one junction table for each of the relationships or should I combine them into one big bridge table?
So, if I have a class table as well, then from the big bridge table I can get for which class which tutor taught which subject of what grade along with other details of the class.
Let's assume the database is not yet electronic, but a good old filing cabinet instead.
Let's assume the database is for a library, and there are a couple of distinct sorts of "many-to-many info" to be maintained : authors to books (coauthored books have >1 author), readers to books, readers to readers, book availability in possibly multiple site locations of the library, ...
Would you ever think of stashing all those distinct sorts of information in one big filing cabinet ? Imagine what the consequences are for its users ? Sometimes you'll be prohibited to do something "readers to books" merely because someone else is right there doing something "readers to readers". If and when you manage to gain access and it's finally your turn do so something, say "authors to books", your work will be slowed down because all the "readers to books" stuff might come in between and you'll have to spend extra time merely skipping the unneeded stuff. If a "conversion operation" must be performed, say, a new kind of many-to-many stuff is discovered and must be integrated in the single filing cabinet, the entire database is inaccessible while the conversion operation is being performed (people adding filing cards of a color that wasn't yet in use). Etc. etc. . Those undesirable properties carry over almost 1-1 to the electronic equivalent.
As someone else put it : don't be afraid of tables. It's what a DBMS is good at.
EDIT
Brief : just keep it at one table per fact type, and abstain from making (/trying to discover) geeky abstractions like "they're all just properties" / "they're all just some many-to-many-relation" / ... . They're geeky because an end user/business user will not "see" it. And thus there is no business value in making them.
Related
There is a Book table that is always unique with title, edition, and author.
And I want all bookstores to add their books, but different bookstores can have the same book with different pricing. So I come up with this table design.
So when one bookstore tries to add a book and the book is already been added by another bookstore the current bookstore should have to just fill in the pricing detail, not including the book detail.
The problem with this is, what if the book detail already been added has some missing or incorrect info? in this case, the current bookstore can flag and moderators or admins can fix it.
Is there any better way to achieve this? I don't comfortable with this design logic at all.
Your design makes sense. You want to keep the "static" information in 1 table, and link "dynamic" information like you did.
Your other question is related to data integrity. You can put "not null" conditions on fields to ensure all fields are filed, but garbage entries are always possible. This is a universal problem.
Potential solutions to mitigate this:
any and all data that can be selected instead of typed in should be linked via another table. Ex:
BookGenre
bookgenreid PK
genre CHAR
Book
bookid PK
genre FK, BookGenre.bookgenreid
...
So you store all possible genres in a separate table, so your users cannot invent new genres or mistype values. Same for authors, countries, ... This makes it easier to build queries as well and avoid things like [ SciFi, Science Fiction, Sciance fiction, ... ]
not everyone should be able to enter new books in the system. Ex. when I worked at a wholesale distributor, only a select group employees could create new products in the database, and they had established a convention on how to do it. They worked closely with purchasing and receiving. You will need to dedicate "data administrators".
So try to control as much as you can in the database and - or the application. Avoid free text fields as much as possible, as users will always think of new ways to mess it up. Ex. at work currently we have a HUGE project to standardise addresses between unlinked systems. It is a enormous undertaking, which involves AI. All this only because no 2 persons enter addresses exactly the same.
This is an problem about drawing ERD in one of my course:
A local startup is contemplating launching Jungle, a new one stop
online eCommerce site.
As they have very little experience designing and implementing
databases, they have asked you to help them design a database for
tracking their operations.
Jungle will sell a range of products, and they will need to track
information such as the name and price for each. In order to sell as
many products as possible, Jungle would like to display short reviews
alongside item listings. To conserve space, Jungle will only keep
track of the three most recent reviews for each product. Of course, if
an item is new (or just unpopular), it may have less than three
reviews stored.
Each time a customer buys something on Jungle, their details will be
stored for future access. Details collected by Jungle include
customer’s names, addresses, and phone numbers. Should a customer buy
multiple items on Jungle, their details can then be reused in future
transactions.
For maximum convenience, Jungle would also like to record credit card
information for its users. Details stored include the account and BSB
numbers. When a customer buys something on Jungle, the credit card
used is then linked to the transaction. Each customer may be linked to
one or more credit cards. However, as some users do not wish to have
their credit card details recorded, a customer may also be linked to
no credit cards. For such transactions, only the customer and product
will be recorded.
And this is the solution:
The problem is the Buys action connect with 3 others entities: Product, Customer, and Card. I find this very hard to read and understand.
Is an action involving more than 2 entities common in production? If it is, how should I understand and use it? Or if it's not, what is the better way of design for this problem?
While the bulk of relationships in practice are binary relationships, ternary and higher relationships are normal elements of the entity-relationship model. Some examples are supplies (supplier_id, product_id, region_id) or enrolled (student_id, course_id, semester_id). However, they often get converted into entity sets via the introduction of a surrogate identifier, due to dislike of composite keys or confusion with network data models in which only directed binary relationships are supported.
Reading cardinality indicators on non-binary relationships are a common source of confusion. See my answer to designing relationship between vehicle,customer and workshop in erd diagram for more info on how I handle this.
Your solution has some problems. First, Buys is indicated as an associative entity, but is used like a ternary relationship with an optional role. Neither is correct in my opinion. See my answer to When to use Associative entities? for an explanation of associative entities in the ER model.
Modeling a purchase transaction as a relationship is usually a mistake, since relationships are identified by the (keys of the) entities they relate. If (CustomerID, ProductID) is identifying, then a customer can buy a product only once, and only one product per transaction. Adding a date/time into the relationship's key is better, but still problematic. Adding a surrogate identifier and turning it into a regular entity set is almost certainly the best course of action.
Second, the Crow's foot cardinality indicators are unclear. It looks like customers and products are optional in the Buys relationship, or even as if multiple customers could be involved in the same transaction. There are three different concepts involved here - optionality, participation and cardinality - which should preferably be indicated in different ways. See my answer to is optionality (mandatory, optional) and participation (total, partial) are same? for more on the topic.
A card is optional for a purchase transaction. From the description, it sounds as if cards may participate totally, meaning we won't store information about a card unless it's used in a transaction. Furthermore, only a single card can be related to each transaction.
A customer is required for a purchase transaction, and it sounds like customers may participate totally, meaning we won't store information about customers unless they purchase something. Only a single customer can be related to each transaction.
Products are required for a purchase transaction, and since we'll offer products before they're bought, products will participate partially in transactions. However, multiple products can be related to each transaction.
I would represent transactions for this problem with something like the following structure:
I'm not saying converting a ternary or higher relationship into an entity set is always the right thing to do, but in this case it is.
Physically, that would require two tables to represent (not counting Customer, Product, Card or ProductReview) since we can denormalize TransactionCustomer and TransactionCard into Transaction, but TransactionProduct is a many-to-many relationship and requires its own table (as do ternary and higher relationships).
Transaction (TransactionID PK, TransactionDateTime, CustomerID, CardID nullable)
TransactionProduct (TransactionID PK, ProductID PK, Quantity, Price)
This is probably a simple problem for an experienced database developer, but I'm struggling... I have trouble translating a certain ER diagram to a DB model, any help is appreciated.
I have a setup similar to slide 17 of this presentation:
http://www.cbe.wwu.edu/misclasses/mis421s04/presentations/supersubtype.ppt
Slide 17 shows an ER diagram with an Employee supertype having an Employee Type attribute and as subtypes the Employee Types themselves (Hourly, Salaried and Consultant), which is very similar to my design situation.
In my case, suppose Salaried Employees are the only ones that can be bosses of other employees and I wanted to somehow indicate if a certain Salaried employee is the boss of the Hourly and/or Salaried Employee and/or Consultant (either, none or both), how could that be designed in a database model, also considering these are one-to-many relationships?
I can put a PK-FK relationship between them, which would result in all tables having two FKeys and (like Consultant having FK_Employee and FK_SalariedEmployee) and SalariedEmployee referencing itself, but I keep thinking that might not be the wisest solution....although I'm not sure why (integrity issues?).
Is this or an acceptable solution or is there a better one?
Thanks in advance for any help!
Your case looks like an instance of the design pattern known as “Generalization Specialization” (Gen-Spec for short). The gen-spec pattern is familiar to object oriented programmers. It’s covered in tutorials when teaching about inheritance and subclasses.
The design of SQL tables that implement the gen-spec pattern can be a little tricky. Database design tutorials often gloss over this topic. But it comes up again and again in practice.
If you search the web on “generalization specialization relational modeling” you’ll find several useful articles that teach you how to do this. You’ll also be pointed to several times this topic has come up before in this forum.
The articles generally show you how to design a single table to capture all the generalized data and one specialized table for each subclass that will contain all the data specific to that subclass. The interesting part involves the primary key for the subclass tables. You won’t use the autonumber feature of the DBMS to populate the sub class primary key. Instead, you’ll program the application to propagate the primary key value obtained for the generalized table to the appropriate subclass table.
This creates a two way association between the generalized data and the specialized data. A simple view for each specialized subclass will collect generalized and specialized data together. It’s easy once you get the hang of it, and it performs fairly well.
In your specific case, declaring the "boss of" FK to reference the PK in the Salaried Employees table will be enough to do the trick. This will produce the two way association you want, and also prevent employees who are not salaried from being referenced as bosses.
Hi I am doing an assignment on ER modelling and there is a part that I'm stuck on, here is an extract:
Patient is a person who is either admitted to the hospital or is registered in an outpatient program. Each patient has a patient number (ID), name, dob, and tele. Resident patients have a Date Admitted. Each outpatient is scheduled for zero or more return visits, which have data and comments. Each time a patient is admitted to the hospital or registered as an outpatient, they receive a new patient number.
I can't do the last section bolded. I have attempted the question: http://tinypic.com/r/358dus9/4
Also if anyone can check if I've done it correctly, would be highly appreciated thanks!
Sometimes assignments also contain "information" that is pretty much immaterial.
The purpose is precisely to learn to filter out the 'real' information from the noise.
(With the caveat that there are dozens and dozens of ER dialects, and each has its own peculiarities,) ER does not have a way to express the information that "attribute x in entity y is to be autogenerated by the system.". For this reason, and as far as the actual ER modeling is concerned, your bold phrase is just noise.
I agree with Erwin on this one. I'll add that not having to have a consistent structure for the patient means that you don't have to create another table for the patient, you can just put it into the ER case directly.
Generally, this is a bad practice however. In reality, you would still have a regular patients table with identifiable patients over several visits. Then again, this is a class and as we all know, the #1 rule is not to disobey the teacher (no matter how insane it is). The real lesson here is to learn how to take requirements, have them clarify the requirements, explain the consequences if they don't follow your advice on how the data will be modeled and then go ahead with whatever they say as they have the final say as the client.
Depends on the course that you're taking, as well. Microsoft SQL Server/SQL Express has the autonumber setting possible, while Oracle does not feature this (although it's accomplished through this). Insofar as the modeling is concerned, there is no way to model that requirement specifically, as far as I know.
Entity-relationship diagrams are used to model the relationships and the data itself as it exists. What you're looking for is more of a UML approach to describing the process in which it has data created for that field.
I am wondering when and when not to pull a data structure into a separate database table when it appears in several tables.
I have pulled the 12 attribute address structure into a separate table because I have a couple of different entities containing a single address in this format.
But how about my 3 attribute person name structure (given, middle, surname)?
Should this be put into its own table referenced with a foreign key for all the entities containing a name... e.g. the company table has a contact person name, the citizen table has a person name etc.
Are these best left as attributes in the main tables or should they be extracted?
I would usually keep the address on the Person table, unless there was an unusual need for absolutely uniform addresses on each entity, or if an entity could have an arbitrary number of addresses, or if addresses need to be shared between entities, or if it was a large enterprise product where I know I have to invest in infrastructure all over the place or I will end up gutting everything down the road.
Having your addresses in a seperate table is interesting because it's flexible, but in the context of a small project lacking a special need like the ones mentioned above, it's probably a slight waste. Always be aware of the balance between complexity and flexibility. Flexibility is important, but be discriminating... It's easy to invest way too much there!
In concrete terms, the times that I experimented with (for instance) one-to-one relationships for things like addresses, I ended up refactoring them back into the table because it introduced a bunch of headaches including more complex queries, dealing with situations where the address does not exist, etc. More entities also increases your cognitive load -- it makes the project harder to think about. In my case, it was an unecessary cost because there was no concrete need and, in truth, not even a gain in flexibility.
So, based on my experiences, I would "try" to keep the addresses in the same table, and I would definitely keep the names on them - again, unless there was a special need.
So to paraphrase Einstein, make it as simple as possible and no simpler. But in the short term, experiment. It's the best way to learn these lessons.
It's about not repeating information, so you don't want to store the same information in two places when one will do.
Another useful rule of thumb is one entity per table. If you find that one table contains, say, "person" AND "order" then you probably should split those into two tables.
And (putting myself at risk of repeating information...) you might find it helpful to review some database design basics, there are plenty of related questions here on stackoverflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Creating a person entity across your data model will give you this present and future advantages -
The same person occurring as a contact, or individual in different contexts. Saves redundancy.
Info can be maintained and kept current with far-less effort.
Easier to search for a person and identify them - i.e. is it the same John Smith?
You can expand the information - i.e. maintain addresses for this person far more easily.
Programming will be more consistent and debugging will be easier as well.
Moves you closer to a 'self-documenting' system.
As a counterpoint to the other (entirely valid) replies: within your application's current structure, how likely will it be for a given individual (not just name, the actual "person" -- multiple people could be "John Smith") to appear in more than one table? The less likely this is to happen, the less likely you are to get benefits from normalization.
Another way to think of it is entities. Outside of labels (names), is their any overlap between "customer" entity and an "employee" entity?
Extract them. Your aim should be to have no repeating data in your database.
Read about Normalization
It really depends on the problem you are trying to solve. In general it is probably a good idea to have some sort of 'person' table which holds details of people. However, there are occasions where that is potentially a very bad idea.
One example would be if you are holding details of prescriptions written out to people by a doctor. In some countries it is a legal requirment that the prescription details are held with the name in which they were prescribed NOT the name the person is going under currently. For instance a woman might be prescribed a drug as miss X, but then she gets married and becomes Mrs Y. If you had a person table that was linked to the prescriptions table you would now have the wrong details and would possibly face legal consequences. In that case you would need to probably copy the relevant details of the person into the prescription table, even though this would be duplicating data.
So again - it depends on the problem you are trying to solve. Don't just blindly follow what people consider to be best practices. Understand your data and any issues surrounding it, then try to follow best practices that fit.
Depends on what you're using the database for.
If you want fast queries on your tables you should de-normalize your tables. Having to run multiple JOIN's will take longer and make your queries more complex.
On the other hand if your intention is to have a flexible storage database which is not meant to be hit with a ton of fast-response queries, then normalizing the tables by splitting them out into multiple xref'ed tables will provide more flexibility in your design and reduce the need for submitting duplicated data.
Since de-normalization is "optimization", I would suggest you normalize the tables first, index them properly and see if you're getting any bottlenecks on your queries. If so, flatten the affected tables where needed.
You should really consider your whole database structure and do a ER diagram (entity relationship diagram) first. OF COURSE there should be another table called "Person" where the concept of a person is stored...