Schema design: many to many plus additional one to many - database

I have this scenario and I'm not sure exactly how it should be modeled in the database. The objects I'm trying to model are: teams, players, the team-player membership, and a list of fees due for each player on a given team. So, the fees depend on both the team and the player.
So, my current approach is the following:
**teams**
id
name
**players**
id
name
**team_players**
id
player_id
team_id
**team_player_fees**
id
team_players_id
amount
send_reminder_on
Schema layout ERD
In this schema, team_players is the junction table for teams and players. And the table team_player_fees has records that belong to records to the junction table.
For example, playerA is on teamA and has the fees of $10 and $20 due in Aug and Feb. PlayerA is also on teamB and has the fees of $25 and $25 due in May and June. Each player/team combination can have a different set of fees.
Questions:
Are there better ways to handle such
a scenario?
Is there a term for this type of
relationship? (so I can google it) Or know of any references with similar structures?

Thus is a perfectly fine design. It is not uncommon for a junction table (AKA intersection table) to have attributes of its own - such as joining_date - and that can include dependent tables. There is, as far as I know, no special name for this arrangement.
One of the reasons why it might feel strange is that these tables frequently don't exist in a logical data model. At that stage they are represented by a many-to-many join notation. It's only when we get to the physical model that we have to materialize the junction table. (Of course many people skip the logical model and go straight to physical.)

Related

Redundant relation: Is this a violation of database normalization?

I have a table with products that I offer. For each product ever sold, an entry is created in the ProductInstance table. This refers to this instance of the product and contains information such as the next due date (if the product is to be billed monthly) and other information relevant to this instance (e.g. personal branding).
For understanding: The products are service contracts. The template of the contract is stored in the product table (e.g. "Monthly lawn mowing"). The product instance is then e.g. "Monthly lawn mowing in sample street" and contains information like the size of the garden or something specific to this instance of the service instead of the general product.
An invoice is created for a product instance either one time or recurring. An Invoice may consists of several entries. Each entry is represented by an element in the InvoiceEntry table. This is linked to the ProductInstance to create the reference to the invoice.
I want to extend the database with purchase orders. To do this, a record is created in the Order table. This contains a relation to the customer and e.g. the order date. The single products of the order are mapped by an OrderEntry. The initial invoice generated for the order is linked via the field "invoice_id" in the table order. The invoice items from the initial order are created per OrderEntry and create one InvoiceEntry each. However, I want the ProductInstance to be created only after the invoice is paid. Therefore the OrderEntry has a relation to the product and not only to the ProductInstance. Once the order has been created, the instance is created and linked to the OrderEntry.
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
And for the Product: OrderEntry <-> Product and OrderEntry <-> ProductInstance <-> Product.
Model of the described database
My question is if this "duplicate" relation is problematic, or could cause problems later. One case that feels messy to me is, what should I do if I want to upgrade the ProductInstance later (to an other product [e.g. upgrade to bigger service])? The order would still show the old product_id but the instance would point to a new product_id.
This is a nice example of real-life messy requirements, where the 'pure' theory of normalisation has to be tempered by compromises. There's no 'slam-dunk right' approach; there's some definitely 'wrong' approaches -- your proposed schema exhibits some of those. I suspect there's not even a 'best' approach. Thank you for expanding the description of the business context -- especially for the ProductInstance table.
But still your description won't support legally required behaviour:
An invoice is created for a product instance either one time or recurring. An Invoice may consists of several entries. Each entry is represented by an element in the InvoiceEntry table.
... I want the ProductInstance to be created only after the invoice is paid.
An invoice represents an indebtedness from customer to supplier. It applies at one date only, not "recurring". (So leaving out the Invoice date has exactly got in the way of you "thinking about relations".) A recurring or cyclical billing arrangement would be represented by something like a 'contract' table, from which an Invoice is generated by some scheduled process.
Or ... your "recurring" means the invoice is paid once up-front for a recurring service(?) Still you need an Invoice date. The terms of service/its recurrence would be on the ProductInstance table.
I can see no merit in delaying recording the ProductInstance 'til after invoice payment. Where are you going to hold the terms of service in the meantime? If you're raising an invoice, your auditors/the statutory authorities will want you to provide records of what the indebtedness relates to. Create ProductInstance ab initio and put a status on it. (Or in the application look up the Invoice's paid status before actually providing the service.)
There's something else about Invoices you're currently failing to capture -- and that has also lead you to a wrong design: in general there is more making up the total $ value of an invoice than product lines, such as discounts applying to the invoice overall rather than particular products; delivery charges; installation costs or inspection/certification; taxes (local/State/Federal).
From your description perhaps the only one applying is taxes. ("in this world nothing can be said to be certain, except death and taxes.") And taxes are not specific to products/no product_instance_id is applicable on an InvoiceEntry.
For this reason, on ERP schemas in general, there is no foreign key declared from InvoiceEntry to Product/Instance. (In your case you might get away with product_instance_id being nullable, but yeuch.) There might be a system-generated XRef text column, which contains different content according to what the InvoiceEntry represents, but any referencing can't be declared to the schema. (There might be a 'fully normalised' way to represent that with an auxiliary linkage table, but maintaining that in step adds too much complexity to the application.)
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
Again think about the business sequence of operations that generate these records: ordering happens as a prelude to invoicing. You can't put an invoice_id on Order, because you haven't created the Invoice yet. You might put the order_id on Invoice. But here you're again in the situation that not all Invoices arrive via Orders -- some might be cash sales/immediate delivery. (You could make order_id nullable, but yeuch.) For this reason on ERP schemas in general, there is no foreign key declared from Invoice to Order, etc, etc.
And the same thinking with OrderEntry <-> InvoiceEntry: your proposed schema has the sequencing wrong/the reference points the wrong way. (And not every InvoiceEntry will have corresponding OrderEntry.)
On OrderEntry, having all of (OrderEntry)id and product_id and product_instance_id seems to me to give you way too many opportunities for tangling it all up. Can an Order have multiple Entrys for the same product_id? -- why/how? Can it have multiple Entrys for the same product_instance_id? -- why/how? Can there be a product_instance_id which refers to a different product_id than OrderEntry.product_id? This is exactly the sort of risk for confusing entanglement that normalisation aims to remove/reduce.
The customer is ordering a ProductInstance: mowing a particular size of garden at a particular address, fortnightly on a Tuesday afternnon. So OrderEntry.product_instance_id is what you want; .product_id is wrong. So (again) you need to create ProductInstance at time of recording the Order. Furthermore I strongly suspect you don't need an id on OrderEntry; instead give it a compound key (order_entry_id, product_instance_id). [**]
[**] I see you're using 'eloquent'. I suspect this is requiring id on every table. So you're not even using a relational database, this is some sort of Object-Relational hybrid. Insisting on a dedicated single id as key on every table is toxic. It has lead schema designers astray every time I get called in to help -- as here. Please if you can at all avoid it, don't do that.

Relational Database: When do we need to add more entities?

We had a discussion today related to W3 lecture case study about how many entities we need for each situation. And I have some confusion as below:
Case 1) An employee is assigned to be a member of a team. A team with more than 5 members will have a team leader. The members of the team elect the team leader. List the entity(s) which you can identify in the above statement? In this cases, if we don't create 2 entities for above requirement, we need to add two more attributes for each employee which can lead to anomaly issues later. Therefore, we need to have 2 entities as below:
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) TEAM (PK teamId&employeeId) -> 2 entities
Case 2) The company also introduced a mentoring program, whereby a new employee will be paired with someone who has been in the company longer." How many entity/ies do you need to model the mentoring program?
The Answer from Lecturer is 1. With that, we have to add 2 more attributes for each Employee, mentorRole (Mentor or Mentee) and pairNo (to distinguish between different pairs and to know who mentors whom), doesn't it?
My question is why can't we create a new Entity named MENTORING which will be similar to TEAM in Q1? And why we can only do that if this is a many-many relationship?
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) TEAM (PK is pairNo&employeeId) -> 2 entities
Thank you in advance
First of all, about terminology: I use entity to mean an individual person, thing or event. You and I are two distinct entities, but since we're both members of StackOverflow, we're part of the same entity set. Entity sets are contrasted with value sets in the ER model, while the relational model has no such distinction.
While you're right about the number of entity sets, there's some issues with your implementation. TEAM's PK shouldn't be teamId, employeeId, it should be only teamId. The EMPLOYEE table should have a teamId foreign key (not part of the PK) to indicate team membership. The employeeId column in the TEAM table could be used to represent the team leader and is dependent on the teamId (since each team can have only one leader at most).
With only one entity set, we would probably represent team membership and leadership as:
EMPLOYEE(employeeId PK, team, leader)
where team is some team name or number which has to be the same for team members, and leader is a true/false column to indicate whether the employee in that row is the leader of his/her team. A problem with this model is that we can't ensure that a team has only one leader.
Again, there's some issues with the implementation. I don't see the need to identify pairs apart from the employees involved, and having a mentorRole (mentor or mentee) indicates that the association will be recorded for both mentor and mentee. This is redundant and creates an opportunity for inconsistency. If the goal was to represent a one-to-one relationship, there are better ways. I suggest a separate table MENTORING(menteeEmployeeId PK, mentorEmployeeId UQ) (or possibly a unique but nullable mentorEmployeeId in the EMPLOYEE table, depending on how your DBMS handles nulls in unique indexes).
The difference between the two cases is that teams can have any number of members and one leader, which is most effectively implemented by identifying teams separately from employees, whereas mentorship is a simpler association that is sufficiently identified by either of the two people involved (provided you consistently use the same role as identifier). You could create a separate entity set for mentoring, with relationships to the employees involved - it might look like my MENTORING table but with an additional surrogate key as PK, but there's no need for the extra identifier.
And why we can only do that if this is a many-many relationship?
What do you mean? Your examples don't contain a many-to-many relationship and we don't create additional entity sets for many-to-many relationships. If you're thinking of so-called "bridge" tables, you've got some concepts mixed up. Entity sets aren't tables. An entity set is a set of values, a table represents a relation over one or more sets of values. In Chen's original method, all relationships were represented in separate tables. It's just that we've gotten used to denormalizing simple one-to-one and one-to-many relationships into the same tables as entity attributes, but we can't do the same for many-to-many binary relationships or ternary and higher relationships in general.

Can or Should an ERD Action involve more than 2 Entities?

This is an problem about drawing ERD in one of my course:
A local startup is contemplating launching Jungle, a new one stop
online eCommerce site.
As they have very little experience designing and implementing
databases, they have asked you to help them design a database for
tracking their operations.
Jungle will sell a range of products, and they will need to track
information such as the name and price for each. In order to sell as
many products as possible, Jungle would like to display short reviews
alongside item listings. To conserve space, Jungle will only keep
track of the three most recent reviews for each product. Of course, if
an item is new (or just unpopular), it may have less than three
reviews stored.
Each time a customer buys something on Jungle, their details will be
stored for future access. Details collected by Jungle include
customer’s names, addresses, and phone numbers. Should a customer buy
multiple items on Jungle, their details can then be reused in future
transactions.
For maximum convenience, Jungle would also like to record credit card
information for its users. Details stored include the account and BSB
numbers. When a customer buys something on Jungle, the credit card
used is then linked to the transaction. Each customer may be linked to
one or more credit cards. However, as some users do not wish to have
their credit card details recorded, a customer may also be linked to
no credit cards. For such transactions, only the customer and product
will be recorded.
And this is the solution:
The problem is the Buys action connect with 3 others entities: Product, Customer, and Card. I find this very hard to read and understand.
Is an action involving more than 2 entities common in production? If it is, how should I understand and use it? Or if it's not, what is the better way of design for this problem?
While the bulk of relationships in practice are binary relationships, ternary and higher relationships are normal elements of the entity-relationship model. Some examples are supplies (supplier_id, product_id, region_id) or enrolled (student_id, course_id, semester_id). However, they often get converted into entity sets via the introduction of a surrogate identifier, due to dislike of composite keys or confusion with network data models in which only directed binary relationships are supported.
Reading cardinality indicators on non-binary relationships are a common source of confusion. See my answer to designing relationship between vehicle,customer and workshop in erd diagram for more info on how I handle this.
Your solution has some problems. First, Buys is indicated as an associative entity, but is used like a ternary relationship with an optional role. Neither is correct in my opinion. See my answer to When to use Associative entities? for an explanation of associative entities in the ER model.
Modeling a purchase transaction as a relationship is usually a mistake, since relationships are identified by the (keys of the) entities they relate. If (CustomerID, ProductID) is identifying, then a customer can buy a product only once, and only one product per transaction. Adding a date/time into the relationship's key is better, but still problematic. Adding a surrogate identifier and turning it into a regular entity set is almost certainly the best course of action.
Second, the Crow's foot cardinality indicators are unclear. It looks like customers and products are optional in the Buys relationship, or even as if multiple customers could be involved in the same transaction. There are three different concepts involved here - optionality, participation and cardinality - which should preferably be indicated in different ways. See my answer to is optionality (mandatory, optional) and participation (total, partial) are same? for more on the topic.
A card is optional for a purchase transaction. From the description, it sounds as if cards may participate totally, meaning we won't store information about a card unless it's used in a transaction. Furthermore, only a single card can be related to each transaction.
A customer is required for a purchase transaction, and it sounds like customers may participate totally, meaning we won't store information about customers unless they purchase something. Only a single customer can be related to each transaction.
Products are required for a purchase transaction, and since we'll offer products before they're bought, products will participate partially in transactions. However, multiple products can be related to each transaction.
I would represent transactions for this problem with something like the following structure:
I'm not saying converting a ternary or higher relationship into an entity set is always the right thing to do, but in this case it is.
Physically, that would require two tables to represent (not counting Customer, Product, Card or ProductReview) since we can denormalize TransactionCustomer and TransactionCard into Transaction, but TransactionProduct is a many-to-many relationship and requires its own table (as do ternary and higher relationships).
Transaction (TransactionID PK, TransactionDateTime, CustomerID, CardID nullable)
TransactionProduct (TransactionID PK, ProductID PK, Quantity, Price)

Entity Relationship Diagram - Where are the purchases?

Considering that this community has questions related to modeling databases, I am here seeking help.
I'm developing an auction system based on another one seen in a book I'm reading, and I'm having trouble. The problem context is the following:
In the auction system, a user makes product announcements (he/she defines a product). It defines
the product name, and the initial offer (called initial bid). The initial bid expresses the minimum amount to be offering. A
product only exists in the database when it is announced by a user. A
user defines a number of products, but a product belongs to a single
user.
Every product has an expiration date. When a certain date arrives, if
there are no offers for the product, it is not sold. If there are
offers for the product, the highest bidder wins the given product.
The offers have a creation date, and the amount offered. An offer is
made to a product from a user. A user can make different offers for
different products. A product can be offered by different users. The
same user can not do more than two offers for the same product.
This kind of context for me is easy to do. The problem is that I need to store a purchase. (I'm sorry, but I do not know if the word is that in English). I need a way to know which offer was successful, and actually "bought" a product. Relative to what was said, part of my Conceptual Model (Entity Relationship Diagram) is as follows:
I've tried to aggregate USERS with PRODUCTS, and make a connection/relationship between the aggregation and PRODUCTS, which would give me the PURCHASES event. When this was broken down (decomposed) I would have a table showing which offer bought what product.
Nevertheless, I would have a cardinality problem. This table would have two foreign keys. One for BIDS, and the other for PRODUCTS. This would allow an N-N relationship of these two, meaning that I could save more than one bid as the buyer of a product, or that the same bid could "buy" many products (so I say in the resulting PURCHASES table).
Am I not seeing something here? Can you guys help me with this? Thank you for reading, for your time, and patience. And if you need some more detail, please do not hesitate to ask.
EDIT
The property "Initial Bid" on the PRODUCTS entity is not a relationship.
This property represents a monetary value, a minimum amount that an offer must have to be given to a particular product.
You are approaching things backwards. First we determine a relevant application relationship, then we determine its properties. Given an application relationship and the possible application situations that can arise, only certain relationship/association sets/tables can arise. A purchase can have only one bid per product and one product per bid. So it is just a fact that the bid:product cardinality of PURCHASES is 1:1. Say so. This is so regardless of what other application relationships you wish to record. If it were a different application relationship between those entities then the cardinality might be different. Wait--USERS_PRODUCTS and BIDS form exactly such a pair of appplication relationships, with user:product being 1:0..N for the former and 0..N:0..N for the latter. Cardinality for each participant reflects (is a property of) the particular application relationship that a diamond represents.
Similarly: Foreign keys are properties of pairs of relationship/association sets/tables. Declaring them tells the DBMS to enforce that participant id values appear in entity sets/tables. Ie {bid} referencing BIDS and {product} referencing PRODUCTS. And candidate keys (of which one can be a primary key) are a property of a relationship set/table. It is the candidate keys of PURCHASES that lead to declaration of a constraint that enforces its cardinality. It isn't many:many on bid:product because "bid BID purchased product PRODUCT at price $AMOUNT on date DATE" isn't many:many on bid:product. Since either its bid or its product uniquely identify a purchase both {bid} and {product} are candidate keys.
Well... I tried to follow the given advices, and also tried to follow what I had previously spoken to do, using aggregation. It seems to me that the result is correct. Please observe:
Of course, a user makes offers for products. A user record can relate to multiple records in PRODUCTS. Likewise, a product can be offered by multiple users, and thus, a record in PRODUCTS can relate to multiple records on USERS. If I'm wrong on this, please correct me.
Looking at purchase, we see that a product is only properly purchased by a single bid. There is no way to say that a user buys a product, or a product purchase itself. It is through the relationship between USERS and PRODUCTS that an bid is created, and it is an bid that is able to buy a product. Thus, it is necessary to aggregate such a relationship, then set the purchase event.
We must remember that only one product can be purchased for a single bid. So here we have a cardinality of 1 to 1. Decomposition here will ask discretion of the data modeler.
By decomposing this Conceptual Model, we will have the following Logic Model:
Notice how relationships are respected with appropriate attributes. We can know who announced a product (the seller), and we can know what offers were made to it.
Now, as I said before, there is the PURCHASES relationship. As we have a relationship of 1 to 1 here, the rule tells us that we must choose which side the relationship will be interpreted (which entity shall keep such a relationship).
It is usually recommended to keep the relationship on the side that is likely in the future to become a "many". There is no need to do so in this case because it is clear that this relationship can not be preserved in a BIDS record. Therefore, we maintain such a relationship portrayed by "Winner Bid" on the PRODUCTS entity. As you can see, I set a unique identifier for BIDS. By defining the Physical Model, we have a surrogate key.
Well, I finish my answer here, but I think I can not consider it right yet. I would like someone to comment anything if possible. Thank you.
EDIT
I'd like to apologize. It seems that there was a mistake on my part. Well, I currently live in Brazil, and here we learn the ER-Diagram in a way that seems to me to be different from what many of you are used to. Until yesterday I've been answering some questions related to the subject here in the community, and I found odd certain different notations used. Only now I'm noticing that we are not speaking the same language. I believe it was embarrassing for me in a way. I'm sorry. Really, I'm sorry.
Oh, and one more thing (it may be interesting for someone):
The cardinality between BIDS and PRODUCTS is really wrong. It should
be 0 to 1 in both, as was said in the comments.
There are no relationships relating with relationships here. We have
what is called here (in my country, or in my past course)
"Aggregation" (Represented in the first drawing by the rectangle with clipped lines). The internal relationship (BIDS) when decomposed will
become an entity, and the relationship between BIDS and PRODUCTS is
made later. An aggregation says that a relationship needs to be resolved first so that it can relate with another entity. This creates a certain restriction, which may be necessary in certain business rules. It says that the joining of two entities define records that relate to records of another entity.
Certain relationships do not necessarily need to be named. Once you
break down certain types of relationships, there is no need (it's
optional) to name them (between relations N to N), especially if new
relationship does not arise from these. Of course, if I were to name the white relationships presented, we then would have USER'S BIDS (between USERS and BIDS), and PRODUCT'S BIDS (between BIDS and PRODUCTS).
With respect to my study material, it's all in portuguese, and I believe you will not understand, I do not know. But here's one of the books I read during my past classes:
ISBN: 978-85-365-0252-6
Title: PROJETO de BANCO de DADOS Uma Visão Prática
Authors: Felipe Machado, Mauricio Abreu
Publishing company: EDITORA ÉRICA
COVER:
LINK:
http://www.editorasaraiva.com.br/produtos/show/isbn:9788536502526/titulo:projeto-de-banco-de-dados-uma-visao-pratica-edicao-revisada-e-atualizada/
Well... What do I do now? Haha... I do not know what to do. I'll leave my question here. If someone with the same method that I can help me, I'll be grateful. Thank you.
To record the winning bid requires a functional dependency Product ID -> Bid ID. This could be implemented as one or more nullable columns (since not all products have been purchased yet) in the Products table, or better in a separate table (Purchases) using Product ID as primary key.

What is the best way to represent a constrained many-to-many relationship within a relational database?

I would like to establish a many-to-many relationship with a constraint that only one or no entity from each side of the relationship can be linked at any one time.
A good analogy to the problem is cars and parking garage spaces. There are many cars and many spaces. A space can contain one car or be empty; a car can only be in one space at a time, or no space (not parked).
We have a Cars table and a Spaces table (and possibly a linking table). Each row in the cars table represents a unique instance of a car (with license, owner, model, etc.) and each row in the Spaces table represents a unique parking space (with address of garage floor level, row and number). What is the best way to link these tables in the database and enforce the constraint describe above?
(I am using C#, NHibernate and Oracle.)
If you're not worried about history - ie only worried about "now", do this:
create table parking (
car_id references car,
space_id references space,
unique car_id,
unique space_id
);
By making both car and space references unique, you restrict each side to a maximum of one link - ie a car can be parked in at most one space, and a space can has at most one car parked in it.
in any relational database, many to many relationships must have a join table to represent the combinations. As provided in the answer (but without much of the theoretical background), you cannot represent a many to many relationship without having a table in the middle to store all the combinations.
It was also mentioned in the solution that it only solves your problem if you don't need history. Trust me when I tell you that real world applications almost always need to represent historical data. There are many ways to do this, but a simple method might be to create what's called a ternary relationship with an additional table. You could, in theory, create a "time" table that also links its primary key (say a distinct timestamp) with the inherited keys of the other two source tables. this would enable you to prevent errors where two cars are located in the same parking spot during the same time. using a time table can allow you the ability to re-use the same time data for multiple parking spots using a simple integer id.
So, your data tables might look like so
table car
car_id (integers/numbers are fastest to index)
...
table parking-space
space_id
location
table timeslot
time_id integer
begin_datetime (don't use seconds unless you must!)
end_time (don't use seconds unless you must!)
now, here's where it gets fun. You add the middle table with a composite primary key that is made up of car.car_id + parking_space.space_id + time_id. There are other things you could add to optimize here, but you get the idea, I hope.
table reservation
car_id PK
parking_space_id PK
time_id PK (it's an integer - just try to keep it as highly granular as possible - 30 minute increments or something - if you allow this to include seconds / milliseconds /etc the advantages are cancelled out because you can't re-use the same value from the time table)
(this would also be the place to store variable rates, discounts, etc distinct to this particular account, reservation, etc).
now, you can reduce the amount of data because you aren't replicating the timestamp in the join table (reservation). By using an integer, you can re-use that timeslot for multiple parking spaces, but you could also apply a constraint preventing two cars from renting that given spot for the same "timeslot" for a given day / timeframe. This would also make it easier to store some history about the customers - who knows, you might want to see reports on customers who rent more often and offer them discounts or something.
By using the ternary relationship model, you are making each spot unique to a given timeslot (perhaps with some added validation rules), so the system can only store one car in one parking spot for one given time period.
By using integers as keys instead of timestamps, you are assured that the database won't need to do any heavy lifting to index the keys and sort / query against. This is a common practice in data warehousing / OLAP reporting when you have massive datasets and you need efficiency. I think it applies here as well.
create a third table.
parking
--------
car_id
space_id
start_dt
end_dt
for the constraint, i guess the problem with your situation is that you need to check a complex rule against the intersection table itself. if you try this in a trigger, it will report a mutation.
one way to avoid this would be to replicate the table, and query against this replication for the constraint.

Resources