I have a table with products that I offer. For each product ever sold, an entry is created in the ProductInstance table. This refers to this instance of the product and contains information such as the next due date (if the product is to be billed monthly) and other information relevant to this instance (e.g. personal branding).
For understanding: The products are service contracts. The template of the contract is stored in the product table (e.g. "Monthly lawn mowing"). The product instance is then e.g. "Monthly lawn mowing in sample street" and contains information like the size of the garden or something specific to this instance of the service instead of the general product.
An invoice is created for a product instance either one time or recurring. An Invoice may consists of several entries. Each entry is represented by an element in the InvoiceEntry table. This is linked to the ProductInstance to create the reference to the invoice.
I want to extend the database with purchase orders. To do this, a record is created in the Order table. This contains a relation to the customer and e.g. the order date. The single products of the order are mapped by an OrderEntry. The initial invoice generated for the order is linked via the field "invoice_id" in the table order. The invoice items from the initial order are created per OrderEntry and create one InvoiceEntry each. However, I want the ProductInstance to be created only after the invoice is paid. Therefore the OrderEntry has a relation to the product and not only to the ProductInstance. Once the order has been created, the instance is created and linked to the OrderEntry.
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
And for the Product: OrderEntry <-> Product and OrderEntry <-> ProductInstance <-> Product.
Model of the described database
My question is if this "duplicate" relation is problematic, or could cause problems later. One case that feels messy to me is, what should I do if I want to upgrade the ProductInstance later (to an other product [e.g. upgrade to bigger service])? The order would still show the old product_id but the instance would point to a new product_id.
This is a nice example of real-life messy requirements, where the 'pure' theory of normalisation has to be tempered by compromises. There's no 'slam-dunk right' approach; there's some definitely 'wrong' approaches -- your proposed schema exhibits some of those. I suspect there's not even a 'best' approach. Thank you for expanding the description of the business context -- especially for the ProductInstance table.
But still your description won't support legally required behaviour:
An invoice is created for a product instance either one time or recurring. An Invoice may consists of several entries. Each entry is represented by an element in the InvoiceEntry table.
... I want the ProductInstance to be created only after the invoice is paid.
An invoice represents an indebtedness from customer to supplier. It applies at one date only, not "recurring". (So leaving out the Invoice date has exactly got in the way of you "thinking about relations".) A recurring or cyclical billing arrangement would be represented by something like a 'contract' table, from which an Invoice is generated by some scheduled process.
Or ... your "recurring" means the invoice is paid once up-front for a recurring service(?) Still you need an Invoice date. The terms of service/its recurrence would be on the ProductInstance table.
I can see no merit in delaying recording the ProductInstance 'til after invoice payment. Where are you going to hold the terms of service in the meantime? If you're raising an invoice, your auditors/the statutory authorities will want you to provide records of what the indebtedness relates to. Create ProductInstance ab initio and put a status on it. (Or in the application look up the Invoice's paid status before actually providing the service.)
There's something else about Invoices you're currently failing to capture -- and that has also lead you to a wrong design: in general there is more making up the total $ value of an invoice than product lines, such as discounts applying to the invoice overall rather than particular products; delivery charges; installation costs or inspection/certification; taxes (local/State/Federal).
From your description perhaps the only one applying is taxes. ("in this world nothing can be said to be certain, except death and taxes.") And taxes are not specific to products/no product_instance_id is applicable on an InvoiceEntry.
For this reason, on ERP schemas in general, there is no foreign key declared from InvoiceEntry to Product/Instance. (In your case you might get away with product_instance_id being nullable, but yeuch.) There might be a system-generated XRef text column, which contains different content according to what the InvoiceEntry represents, but any referencing can't be declared to the schema. (There might be a 'fully normalised' way to represent that with an auxiliary linkage table, but maintaining that in step adds too much complexity to the application.)
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
Again think about the business sequence of operations that generate these records: ordering happens as a prelude to invoicing. You can't put an invoice_id on Order, because you haven't created the Invoice yet. You might put the order_id on Invoice. But here you're again in the situation that not all Invoices arrive via Orders -- some might be cash sales/immediate delivery. (You could make order_id nullable, but yeuch.) For this reason on ERP schemas in general, there is no foreign key declared from Invoice to Order, etc, etc.
And the same thinking with OrderEntry <-> InvoiceEntry: your proposed schema has the sequencing wrong/the reference points the wrong way. (And not every InvoiceEntry will have corresponding OrderEntry.)
On OrderEntry, having all of (OrderEntry)id and product_id and product_instance_id seems to me to give you way too many opportunities for tangling it all up. Can an Order have multiple Entrys for the same product_id? -- why/how? Can it have multiple Entrys for the same product_instance_id? -- why/how? Can there be a product_instance_id which refers to a different product_id than OrderEntry.product_id? This is exactly the sort of risk for confusing entanglement that normalisation aims to remove/reduce.
The customer is ordering a ProductInstance: mowing a particular size of garden at a particular address, fortnightly on a Tuesday afternnon. So OrderEntry.product_instance_id is what you want; .product_id is wrong. So (again) you need to create ProductInstance at time of recording the Order. Furthermore I strongly suspect you don't need an id on OrderEntry; instead give it a compound key (order_entry_id, product_instance_id). [**]
[**] I see you're using 'eloquent'. I suspect this is requiring id on every table. So you're not even using a relational database, this is some sort of Object-Relational hybrid. Insisting on a dedicated single id as key on every table is toxic. It has lead schema designers astray every time I get called in to help -- as here. Please if you can at all avoid it, don't do that.
Considering that this community has questions related to modeling databases, I am here seeking help.
I'm developing an auction system based on another one seen in a book I'm reading, and I'm having trouble. The problem context is the following:
In the auction system, a user makes product announcements (he/she defines a product). It defines
the product name, and the initial offer (called initial bid). The initial bid expresses the minimum amount to be offering. A
product only exists in the database when it is announced by a user. A
user defines a number of products, but a product belongs to a single
user.
Every product has an expiration date. When a certain date arrives, if
there are no offers for the product, it is not sold. If there are
offers for the product, the highest bidder wins the given product.
The offers have a creation date, and the amount offered. An offer is
made to a product from a user. A user can make different offers for
different products. A product can be offered by different users. The
same user can not do more than two offers for the same product.
This kind of context for me is easy to do. The problem is that I need to store a purchase. (I'm sorry, but I do not know if the word is that in English). I need a way to know which offer was successful, and actually "bought" a product. Relative to what was said, part of my Conceptual Model (Entity Relationship Diagram) is as follows:
I've tried to aggregate USERS with PRODUCTS, and make a connection/relationship between the aggregation and PRODUCTS, which would give me the PURCHASES event. When this was broken down (decomposed) I would have a table showing which offer bought what product.
Nevertheless, I would have a cardinality problem. This table would have two foreign keys. One for BIDS, and the other for PRODUCTS. This would allow an N-N relationship of these two, meaning that I could save more than one bid as the buyer of a product, or that the same bid could "buy" many products (so I say in the resulting PURCHASES table).
Am I not seeing something here? Can you guys help me with this? Thank you for reading, for your time, and patience. And if you need some more detail, please do not hesitate to ask.
EDIT
The property "Initial Bid" on the PRODUCTS entity is not a relationship.
This property represents a monetary value, a minimum amount that an offer must have to be given to a particular product.
You are approaching things backwards. First we determine a relevant application relationship, then we determine its properties. Given an application relationship and the possible application situations that can arise, only certain relationship/association sets/tables can arise. A purchase can have only one bid per product and one product per bid. So it is just a fact that the bid:product cardinality of PURCHASES is 1:1. Say so. This is so regardless of what other application relationships you wish to record. If it were a different application relationship between those entities then the cardinality might be different. Wait--USERS_PRODUCTS and BIDS form exactly such a pair of appplication relationships, with user:product being 1:0..N for the former and 0..N:0..N for the latter. Cardinality for each participant reflects (is a property of) the particular application relationship that a diamond represents.
Similarly: Foreign keys are properties of pairs of relationship/association sets/tables. Declaring them tells the DBMS to enforce that participant id values appear in entity sets/tables. Ie {bid} referencing BIDS and {product} referencing PRODUCTS. And candidate keys (of which one can be a primary key) are a property of a relationship set/table. It is the candidate keys of PURCHASES that lead to declaration of a constraint that enforces its cardinality. It isn't many:many on bid:product because "bid BID purchased product PRODUCT at price $AMOUNT on date DATE" isn't many:many on bid:product. Since either its bid or its product uniquely identify a purchase both {bid} and {product} are candidate keys.
Well... I tried to follow the given advices, and also tried to follow what I had previously spoken to do, using aggregation. It seems to me that the result is correct. Please observe:
Of course, a user makes offers for products. A user record can relate to multiple records in PRODUCTS. Likewise, a product can be offered by multiple users, and thus, a record in PRODUCTS can relate to multiple records on USERS. If I'm wrong on this, please correct me.
Looking at purchase, we see that a product is only properly purchased by a single bid. There is no way to say that a user buys a product, or a product purchase itself. It is through the relationship between USERS and PRODUCTS that an bid is created, and it is an bid that is able to buy a product. Thus, it is necessary to aggregate such a relationship, then set the purchase event.
We must remember that only one product can be purchased for a single bid. So here we have a cardinality of 1 to 1. Decomposition here will ask discretion of the data modeler.
By decomposing this Conceptual Model, we will have the following Logic Model:
Notice how relationships are respected with appropriate attributes. We can know who announced a product (the seller), and we can know what offers were made to it.
Now, as I said before, there is the PURCHASES relationship. As we have a relationship of 1 to 1 here, the rule tells us that we must choose which side the relationship will be interpreted (which entity shall keep such a relationship).
It is usually recommended to keep the relationship on the side that is likely in the future to become a "many". There is no need to do so in this case because it is clear that this relationship can not be preserved in a BIDS record. Therefore, we maintain such a relationship portrayed by "Winner Bid" on the PRODUCTS entity. As you can see, I set a unique identifier for BIDS. By defining the Physical Model, we have a surrogate key.
Well, I finish my answer here, but I think I can not consider it right yet. I would like someone to comment anything if possible. Thank you.
EDIT
I'd like to apologize. It seems that there was a mistake on my part. Well, I currently live in Brazil, and here we learn the ER-Diagram in a way that seems to me to be different from what many of you are used to. Until yesterday I've been answering some questions related to the subject here in the community, and I found odd certain different notations used. Only now I'm noticing that we are not speaking the same language. I believe it was embarrassing for me in a way. I'm sorry. Really, I'm sorry.
Oh, and one more thing (it may be interesting for someone):
The cardinality between BIDS and PRODUCTS is really wrong. It should
be 0 to 1 in both, as was said in the comments.
There are no relationships relating with relationships here. We have
what is called here (in my country, or in my past course)
"Aggregation" (Represented in the first drawing by the rectangle with clipped lines). The internal relationship (BIDS) when decomposed will
become an entity, and the relationship between BIDS and PRODUCTS is
made later. An aggregation says that a relationship needs to be resolved first so that it can relate with another entity. This creates a certain restriction, which may be necessary in certain business rules. It says that the joining of two entities define records that relate to records of another entity.
Certain relationships do not necessarily need to be named. Once you
break down certain types of relationships, there is no need (it's
optional) to name them (between relations N to N), especially if new
relationship does not arise from these. Of course, if I were to name the white relationships presented, we then would have USER'S BIDS (between USERS and BIDS), and PRODUCT'S BIDS (between BIDS and PRODUCTS).
With respect to my study material, it's all in portuguese, and I believe you will not understand, I do not know. But here's one of the books I read during my past classes:
ISBN: 978-85-365-0252-6
Title: PROJETO de BANCO de DADOS Uma Visão Prática
Authors: Felipe Machado, Mauricio Abreu
Publishing company: EDITORA ÉRICA
COVER:
LINK:
http://www.editorasaraiva.com.br/produtos/show/isbn:9788536502526/titulo:projeto-de-banco-de-dados-uma-visao-pratica-edicao-revisada-e-atualizada/
Well... What do I do now? Haha... I do not know what to do. I'll leave my question here. If someone with the same method that I can help me, I'll be grateful. Thank you.
To record the winning bid requires a functional dependency Product ID -> Bid ID. This could be implemented as one or more nullable columns (since not all products have been purchased yet) in the Products table, or better in a separate table (Purchases) using Product ID as primary key.
I'm trying to develop a database model for candidate, their registered exams and result of the exams when its being taken.
This is what I've done so far. however im unsure if am on the right track especially from the examination table to the examination result table.
how easy will it be to right write an insert sql code for examinationresult population for a particular candidate
the examination types are categorised into science, art and social science. they all have 4 components each
Note on Progression
Given the fact that the Question changes substantially (in clarifying the requirement, not is scope) in response to my Response and TRD, this is going to take some back-and-forth. Let's identify Steps: your Step numbers are odd, starting from 1; mine, in response, are even. Parts of previous Response Steps have become obsolete, they may no longer make sense.
I would suggest a bounty, except for the fact that you have few points.
Response Step 2 to Initial Question & Step 1 Diagram
This is what I've done so far.
You have done some good work, but it is too early for assigning PKs. Besides, assigning an ID on every file as a starting point will cripple the modelling process, the result will not be a database. You have to model the data (not the database) first, then assign Keys when the entities are clear and stable. So drop all your IDs and PKs and model the data, as data. Forget about what you want to do with the data (ie. forget the app).
how easy will it be to right write an insert sql code for examinationresult population for a particular candidate
Right now you can't. You have no relationship between Candidate and Examination[Result]. That is not a problem because the modelling is incomplete at this stage, when it is complete the code will be simple.
The entity Course is implied, but it is missing.
however im unsure if am on the right track especially from the examination table to the examination result table
You are on the right track with some of the other files, but the Examination cluster needs work. This will take a bit of back-and-forth. Once you answer the questions in the comments, we can proceed.
The main issue is this: how is Examination identified.
An ID field does not identify anything, nor does it provide uniqueness in the data, which is required if you want data integrity. IDs result in a Record Filing System with no integrity, however, it appears you want a database with data integrity. Is that correct ?
Go back to the user and discuss how courses and components are identified, what codes they use, etc. Those are the natural Keys that they use to identify their data, that they will enter into the system when they need look something up, or to enter examination results.
Eg. It is not reasonable to contemplate an Examination that exists independently (as you have modelled it). People do not go to a hall and sit for any old exam. The exam exists only in the context of a course, they sit for an exam for a course.
Then the course, and not the exam, has components, which are examined. And each course has a different number of components.
Eg. a Course which is identified as ENG101 for English Literature year 1
And then the components within that. Eg. 2b Short essay on poetry.
They may need to identify the year and semester of the course as well, in which case, you need a CourseOffering per semester.
Consider this, as a discussion point. Courier is example data, blue is Key, green is non-key:
TRD Step 2
Response Step 4
Response to Question & Description
This is what I've done so far.
My previous response still applies:
You have done some good work, but it is too early for assigning PKs. Besides, assigning an ID on every file as a starting point will cripple the modelling process, the result will not be a database. You have to model the data (not the database) first, then assign Keys when the entities are clear and stable. So drop all your IDs and PKs and model the data, as data. Forget about what you want to do with the data (ie. forget the app).
You have not addressed that issue, that I identified in your Step 1 Diagram, in your Step 3 Diagram. It appears, from the evidence, that you might be happy with IDs as "Primary Keys" (there aren't), despite the hindrance having been identified to you. That means your understanding of the data is crippled, and the progress of your diagrams will be slow.
My previous response still applies:
An ID field does not identify anything, nor does it provide uniqueness in the data, which is required if you want data integrity. IDs result in a Record Filing System with no integrity, however, it appears you want a database with data integrity. Is that correct ?
You must answer these questions, otherwise your design cannot proceed. These are severe errors that must be corrected. One cannot build on, or progress, a foundation that contains severe errors.
Could you please confirm, you do want a Relational Database, with the integrity and performance that Relational Databases are capable of, that is easy to code against, as opposed to a Record Filing System, with no integrity or speed, that will be difficult to code against. Correct ?
If [1] is correct. Since ID fields as "Primary Keys" do not provide row uniqueness, which is demanded for a Relational Database, how exactly, do you intend to provide the required row uniqueness ? Alternately, are you happy to have an RFS that is full of duplicate rows (each with an unique record ID) ?
how easy will it be to right write an insert sql code for examinationresult population for a particular candidate
My previous response still applies:
Right now you can't. You have no relationship between Candidate and Examination[Result]. That is not a problem because the modelling is incomplete at this stage, when it is complete the code will be simple.
Ok, in your Step 3 Diagram, you have drawn a line between Candidate file and the ExaminationResult file (as opposed to, inserting a relationship in a database).
In a record filing system, sure, you can just draw a line between any two files, insert the relevant ID field, and hey presto, you have "linked" or "connected" or "mapped" the two files.
But database design (as opposed to file design) does not progress like that, you cannot just draw a line between any two objects, insert the relevant ID field, and hey presto, create a database relationship. No. There is no basis, no integrity, in the dashed line that you have drawn. Eg. in your Step 3 Diagram, any Candidate can be related to any Examination[Result].
That is "normal" or "ordinary" in record filing systems, but in a database, it is something to be recognised and understood as an error, and thus prevented. Because we expect integrity in a database, and because it can be prevented, easily.
however im unsure if am on the right track especially from the examination table to the examination result table
My previous response still applies:
You are on the right track with some of the other files, but the Examination cluster needs work. This will take a bit of back-and-forth. Once you answer the questions in the comments, we can proceed.
The main issue is this: how is Examination identified.
An ID field does not identify a row (it identifies a record, which has no relevance whatsoever in a database).
The same two problems (a) lack of a valid identifier, and (b) lack of row uniqueness, exists with your Candidate, Component and ExaminationResult files.
Response to Diagram as a Diagram (as opposed to the content)
You have improved it over your Step 1 Diagram, and in response to my Response Step 2, great. But the relationships (most of them) are still incorrect. And the basis of Candidate::Examination is still not resolved.
It appears to me that you are not clear about the notation (notches; circles; crows feet) and precisely what they mean at the parent and child ends). So you need to learn that first, and then draw the diagram, rather than the other way round.
It is great that you are using a Notation that is meaningful, and many details are shown (many people don't, they draw nice-looking diagrams that lack the detail required for a full understanding of the model. That means that every notch; circle; crows foot, has specific meaning, and must be drawn correctly, in order to convey that meaning to the reader.
Entities do not exist in isolation, there must always be a parent first, in order for the child to be a child of the parent. There is no such thing as "equal". Dependency is always in one direction.
Your relationships that are 1-and-only-1 on one side, and 1-and-only-1 on the other side, are incorrect, they indicate a Normalisation error. The field in the subordinate record can be Normalised into the ordinate record.
Eg. AdmissionLetter is not a separate file, some form of AdmissionLetter identifier (not an ID field) should be located in Candidate.
Eg. Title::Candidate is a drawing error, it should be 1 at the Title end and 0-to-many at the Candidate end.
In a data model, bold (by convention) means a migrated Foreign Key. The Primary Key that is migrated is not bold.
Response to Diagram Content
From your replies, the term Subject trumps the term Component; Category trumps various loosely-identified elements into one clear entity.
It is not reasonable to contemplate an Examination that exists independently (as you have modelled it).
People do not go to a hall and sit for any old exam, any old Subject. The exam exists only in the context of a Subject, they sit for an exam for a Subject.
I accept that the Examination is one sitting, for four Subjects
I accept that the four Subjects are defined by a Category.
I accept that the Candidate is registered for a Category.
Thus the exam exists only in the context of a Subject, which exists only in the context of a Category, and the Candidate sits for an exam which is a Category, which contains four (the number does not matter) Subjects.
Having resolved that, two questions remain:
Do you need to record an Examination as an event, independent of the Candidates who sit in that event. Eg. Examination(Location, DateTime) ?
Does the Examination event examine Candidates in one, or more than one, Category ?
The notion of four Subjects that are implemented as four repeated fields in one record breaks Second Normal Form, which demands that repeating fields are Normalised into separate records in a child file.
Therefore, for both your Component and ExaminationResult files, that issue needs to be resolved.
Note that the fact that that problem is repeated in two separate files is a second alarm that it is an error.
I have clarified the Category/Subject issues for you, and resolved the Normalisation error.
I have given simple identifiers for Categories and Subjects.
If you do not implement that, you will not have integrity between the Candidate and the Subject they are being Examined for. As well, you will suffer various problems when you get to the coding stage.
I have no idea what you are trying to do with exComp, therefore I have no response. Perhaps you can say a few words about it.
Thus far, there is still no reasonable way of relating Candidates to Examinations or ExaminationResults. That is, it has no basis, nothing has been defined as the basis for the relationship, and thus the relationship has no integrity.
On the basis of what I have been able to ascertain thus far, there must be some sort of registration for an exam. Otherwise you would not know that a Candidate is sitting for an exam.
When the Candidate registers, they register for an exam, and that exam is defined (and therefore constrained) by a Category. Otherwise any Candidate can sit for any exam, which I believe, you would like to prevent.
Further, the [four] exam Subjects that they sit for, should be constrained by the Category that they registered for.
You do want to ensure that you do not record an Economics exam result for a Candidate who is registered for Science, correct ?
I have determined that the basis of an exam is the Registration. That is the event, the fact, the recording of which, establishes that a Candidate will sit for an exam.
The identifier virtually jumps out at you, it is CategoryCode plus CandidateID. Voila! we have row uniqueness. Magnifique! we have integrity.
Now the integrity of ExaminationResult can be implemented: it is constrained to the CandidateRegistration::Category and to the Category::Subject.
To be Resolved: Do you need to identify the fact of a Candidate registering for an examination (RegistrationDate, AdmissionLetter of whatever) vs the fact that the Candidate sat for the examination (eg. ExaminationDate) ? A sort of roll call.
Right now, I have modelled that as a single fact with no differentiation, and the table is called Examination because you seem to be focussed on that.
Predicate
These days, people seem to be throwing themselves at drawing a diagram, without understanding either the basics of a Relational Database, or of the exercise of modelling data. Predictably, that results in an ill-defined diagram (many relevant details are omitted) [gratefully, your diagram has some definition], and it produces a record filing system with no integrity, no relational power, no speed, instead of a Relational Database with integrity, power, and speed.
One concept that is often missing is Predicates. A competent reader can read a good data model, and ascertain the Predicates, because they are drawn in the model, in the form of notation, but a novice doesn't understand the notation, or the relevance of the various items, and therefore will miss the Predicates. In sum, the Predicates are all the constraints that are placed on the data:
Row Identification:
The basis of it existence, and how it is Identified: Independent (square corners); or Dependent (round corners)
Row Uniqueness: Primary and Alternate Keys (note, IDs are not Keys)
Relationships between rows:
Identifying (solid lines); or Non-identifying (dashed lines)
Meaning, relevance, purpose: the all-important Verb Phrase
Further, a novice cannot determine the Predicates when there is no diagram, or when the diagram is poor, or when they are designing the filing system and drawing the diagram themselves. Thus they do not identify the relevant Predicates in their diagram.
Predicates are very important during the modelling exercise, in that as well as the model expressing the Predicates, the Predicates confirm the accuracy of the model, it is a feedback loop. It is an essential part of the modelling exercise. Since I am executing the modelling task for you, I am working out the Predicates as I perform that task, they are obvious to me. But they may not be obvious to you.
When the data model is published, and ready for discussion with the users, these Predicates are incorporated into it. They come under the heading of Business Rules, they form a part of that, because that is the way the user perceives them. Consequently, during the walkthroughs and discussions, the Predicates (as well as the other stated Business Rules) are either confirmed or denied by the user. They need to be stated explicitly, because unlike the technically educated developer, the user cannot be expected to read all the relevant Predicates from the notation in a good data model.
In this situation, I am the modeller, and you are the "user". Thus I have decided to provide the Predicates for you, explicitly. So that you can confirm or deny them, and thus we can progress the modelling exercise. Once you get used to reading the Predicates from a good data model, you will not need to have them declared explicitly for you. Again, Predicates are very important because they verify (or not) the accuracy of the model. So please read them carefully and comment on any Predicates that you do not completely agree with, or that you do not understand.
Of course, it is not necessary to explicitly declare all the Predicates, there are just too many, we declare just the more relevant ones, that relate to:
(a) rows (tables), the basis for their existence
(b) their identification
(c) all dependencies
(d) relationships, both sides (one side is the Verb Phrase).
Step 4 TRD
I have implemented all the above, as detailed. Please consider this TRD as a discussion platform for the next iteration, and comment. Courier indicates example data, blue indicates Key values, green indicates non-key values:
Step 4 TRD
Response Step 6 to Chat Step 5
All issues discussed have been resolved, and implemented in the model. Sorry, I do not have time right now to post details, this is simply identifies the updated models.
Entity-Relation and full Predicates on page 1
All resolved issues have been implemented.
Predicates
Now that it is stable, I am now giving you the second side of the Relation Predicates (child-to-parent). And now that you understand them, I have deleted the repeated, annoying "Each" that is demanded for novices.
Entity-Relation-Key on page 2
Now that the TRD is stable, we are ready to proceed to Determination of Keys
(Second only to Normalisation, Key Determination is a critical part of the modelling exercise. The two tasks are normally performed side-by-side, they are inseparable, I have already determined the keys. In this case, given the limitations of the communication media, I am presenting it as a sequential step).
Here, I use an Extension to the IDEF1X Notation that allows me to concentrate of the components that are relevant to the task, I expect that it is self-explanatory. The Key columns only, are given. Foreign Keys are not Bold (as they are in the DM). All that, is intended to make it easy on the eye.
Most tables have one Key (Primary). Where there are two Keys (Primary and Alternate), the AK is below the line.
This is my recommendation for the Keys, as requested, for your review.
Step 6 TRD and TRK 6
Currently I am having trouble learning normalization. While I know basic concepts behind 1NF - 3NF, I still do not understand the steps that one needs to follow before normalization.
According to my understanding, one has to first collect base entities, their attributes, relation among the entities and then start normalization. But I do not understand, whether I am supposed to normalize all attributes at once or normalize attributes of the entities that have some sort of relation with each other.
Considering an example of a store.
store(name, address, contact)
customer(sn, name, address)
item(id, name, price)
transaction(id, date, customer_sn, item_id, quantity, total_price)
According to my understanding I would either try to normalize all attributes at once or normalize attributes of only customer, item and transaction.
I know I am missing something, I just cannot figure it out.
Any help is appreciated. Thanks for your valuable time.
According to my understanding, one has to first collect base entities,
their attributes, relation among the entities and then start
normalization.
No, you don't have to do that. And in fact it's often hard, if not impossible, to even identify some base entities before you start normalizing.
At the logical level, you start by collecting all the attributes you need, and you put them in one big relation. You will find smaller examples of this in virtually every textbook about database systems.
After you collect all the required attributes, determine the dependencies among them.
For a store, you might collect these attributes.
A. store name
B. store physical address
C. item sku
D. item name
E. item price
F. transaction timestamp
G. transaction type (as "Purchase", "Return", "Store Credit", etc.)
H. transaction register (which machine performed the transaction)
I. transaction cashier (which employee performed the transaction)
J. transaction item sku
K. transaction item price
L. transaction item quantity
M. transaction item extended price
N. transaction item sales tax
O. transaction total
P. transaction payment type (cash, check, credit card)
Q. transaction payment amount
Following that, you might determine that these functional dependencies apply.
A->B
B->A
C->D
C->E
FH->GI
FH->O
J->K
FHJ->L
KL->M
KL->N
M->N
This is where you start normalizing. Each time you decompose one relation to raise it to a higher normal form, you add another relation. Each time you add another relation, you apply all the principles of normalization to it, too. The process is recursive.
Good CASE tools can generate every possible 5NF decomposition based on that list of dependencies.
Other design decisions are also important, but might have nothing to do with normalization.
For example, one design project might decide to store only one phone number per person. Another project might decide to store several phone numbers for each person. That kind of decision is important, but it has nothing to do with normalization. That is, storing one phone number per person doesn't violate any normal form, and neither does storing multiple phone numbers per person.