So I am taking a class in database design and management and am kind of confused from a design perspective. My example is an invoice system. I just made it up quick so it doesn't have a ton of complexity in it.
There are Customers, Orders, Invoices and Payments entities
Customers
CustId(PK),
Street,
Zip,
City,
..
Orders
OrderID(PK)
CustID(FK)
Date
Amt
....
Invoices
InvoiceID(PK),
OrderID(FK),
Date,
AmtDue,
AmtPaid,
....
Payments
PaymentNo(PK),
InvoiceID(FK),
PayMethod,
Date,
Amt,
...
Customer entity has a one to many relationship with Orders
Orders entity has a one to many relationship with Invoices
Invoices Entity has a one to many relationship with Payments.
To get the results of a query to list all Payments made by a Customer the query would have to join Payments with the Invoice table, the Invoice table with the Orders table and the Orders table with the Customer table.
Is this the correct way to do it? One could also just put a custID in the payment entity which would then just require one join, but then there is unneeded information in the payment entity. Is this just a design thing or is it a performance issue?
Bonus question. Let's say there should be a report that says what the total customer balance is. Does there need to be a customer balance field in the database or can this be a calculated item that is produced by joining tables and adding up the amount billed vs amount paid?
Thanks!
Is this the correct way to do it?
Yes. Based on the information provided, it looks reasonable.
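The join path described in the question can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the sample rows and the trimmed-down column lists are assumptions, not part of the original design. Note that the Customers table itself only needs to be joined if customer columns (name, city, ...) are wanted in the output; filtering on Orders.CustID is enough otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Orders   (OrderID INTEGER PRIMARY KEY,
                       CustID INTEGER NOT NULL REFERENCES Customers(CustID),
                       Amt NUMERIC);
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY,
                       OrderID INTEGER NOT NULL REFERENCES Orders(OrderID),
                       AmtDue NUMERIC);
CREATE TABLE Payments (PaymentNo INTEGER PRIMARY KEY,
                       InvoiceID INTEGER NOT NULL REFERENCES Invoices(InvoiceID),
                       PayMethod TEXT, Amt NUMERIC);
-- Made-up sample data, for illustration only
INSERT INTO Customers VALUES (1, 'Springfield'), (2, 'Shelbyville');
INSERT INTO Orders    VALUES (10, 1, 100), (11, 2, 50);
INSERT INTO Invoices  VALUES (100, 10, 100), (101, 11, 50);
INSERT INTO Payments  VALUES (1000, 100, 'card', 60),
                             (1001, 100, 'cash', 40),
                             (1002, 101, 'card', 50);
""")

# Payments -> Invoices -> Orders, filtered on the customer that owns the order
payments_for_customer = conn.execute("""
    SELECT p.PaymentNo, p.Amt
    FROM Payments p
    JOIN Invoices i ON i.InvoiceID = p.InvoiceID
    JOIN Orders   o ON o.OrderID   = i.OrderID
    WHERE o.CustID = ?
    ORDER BY p.PaymentNo
""", (1,)).fetchall()

print(payments_for_customer)  # [(1000, 60), (1001, 40)]
```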
One could also just put a custID in the payment entity which would then just require one join, but then there is unneeded information in the payment entity. Is this just a design thing or is it a performance issue?
The question you're asking falls under "normal forms", often called normalization. Your target should be Boyce-Codd normal form (similar to 3NF), which should be described in your textbook. I will warn you that misinformation about, and misunderstanding of, database design issues is abundant on the internet, so be careful which answers you pay attention to.
The goal of normalization is to eliminate redundancy, and thus to eliminate "anomalies", whereby two logically equivalent queries produce inconsistent results. If the same information is kept in two places, and is updated in only one, then two queries against the two different values will produce different -- i.e., inconsistent -- results.
In your example, if there is a Payments.CustID, should I believe that one, or the one derived from joining Payments through Invoices to Orders? The same goes for total customer balance: do I believe the stored total, or the one I computed from the constituents?
If you are going to "denormalize for performance", as is so often alleged to be necessary, what are you going to do to ensure the redundant values are consistent?
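To make the anomaly concrete, here is a small sketch using Python's sqlite3 module. The redundant Payments.CustID column and the sample rows are hypothetical; the point is only that once the same fact lives in two places, an update to one of them makes two "equivalent" queries disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders   (OrderID INTEGER PRIMARY KEY, CustID INTEGER NOT NULL);
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY,
                       OrderID INTEGER NOT NULL REFERENCES Orders(OrderID));
-- CustID below is the redundant, denormalized copy
CREATE TABLE Payments (PaymentNo INTEGER PRIMARY KEY,
                       InvoiceID INTEGER NOT NULL REFERENCES Invoices(InvoiceID),
                       CustID INTEGER NOT NULL,
                       Amt NUMERIC);
INSERT INTO Orders   VALUES (10, 1);
INSERT INTO Invoices VALUES (100, 10);
INSERT INTO Payments VALUES (1000, 100, 1, 60);
-- The order is corrected to belong to customer 2; the copy in Payments is forgotten
UPDATE Orders SET CustID = 2 WHERE OrderID = 10;
""")

# Answer 1: read the stored copy
stored = conn.execute(
    "SELECT CustID FROM Payments WHERE PaymentNo = 1000").fetchone()[0]

# Answer 2: derive the customer through the relationships
derived = conn.execute("""
    SELECT o.CustID
    FROM Payments p
    JOIN Invoices i ON i.InvoiceID = p.InvoiceID
    JOIN Orders   o ON o.OrderID   = i.OrderID
    WHERE p.PaymentNo = 1000
""").fetchone()[0]

print(stored, derived)  # 1 2 -- the two answers no longer agree
```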
Bonus question. Let's say there should be a report that says what the total customer balance is.
As a matter of fact, in practice balances are sort of a special case. It's often necessary to know the balance at points in time. While it's possible to compute, say, monthly account balances from inception based on transactions, as a practical matter applications usually "draw a line in the sand" and record the balance for future reference. Steps are taken -- must be, for the sake of the business -- to ensure the historical information does not change or, if it does, that the recorded balance is updated to reflect the change. From that description alone, you can imagine that the work of enforcing consistency throughout the system is much more work than relying on the DBMS to enforce it. And that is why, insofar as is feasible, it's better to eliminate all redundant data, and let the DBMS do the job it was designed to do.
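For the simple case where no point-in-time snapshot is needed, the balance can be derived on demand instead of stored. The sketch below (Python's sqlite3 module, made-up sample rows, column names taken from the question) sums billed and paid amounts in two separate subqueries; joining Invoices and Payments in a single aggregate would count an invoice once per payment and inflate the billed total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders   (OrderID INTEGER PRIMARY KEY, CustID INTEGER NOT NULL);
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY,
                       OrderID INTEGER NOT NULL REFERENCES Orders(OrderID),
                       AmtDue NUMERIC NOT NULL);
CREATE TABLE Payments (PaymentNo INTEGER PRIMARY KEY,
                       InvoiceID INTEGER NOT NULL REFERENCES Invoices(InvoiceID),
                       Amt NUMERIC NOT NULL);
-- Made-up sample data: customer 1 was billed 150 and has paid 90
INSERT INTO Orders   VALUES (10, 1), (11, 1);
INSERT INTO Invoices VALUES (100, 10, 100), (101, 11, 50);
INSERT INTO Payments VALUES (1000, 100, 60), (1001, 100, 30);
""")

balance = conn.execute("""
    SELECT
      (SELECT COALESCE(SUM(i.AmtDue), 0)
         FROM Invoices i
         JOIN Orders o ON o.OrderID = i.OrderID
        WHERE o.CustID = :cust)
      -
      (SELECT COALESCE(SUM(p.Amt), 0)
         FROM Payments p
         JOIN Invoices i ON i.InvoiceID = p.InvoiceID
         JOIN Orders   o ON o.OrderID   = i.OrderID
        WHERE o.CustID = :cust)
""", {"cust": 1}).fetchone()[0]

print(balance)  # 60
```

With this approach Invoices.AmtPaid from the question would also be a derived value rather than a stored one.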
In your analysis, seek Boyce-Codd normal form. Understand your data, eliminate the redundancies, and recognize the relations. Let the DBMS enforce referential integrity. Countless errors will be avoided, and time saved. Only when specific circumstances conspire to show that specific business requirements cannot be satisfied on a particular system with a given, correct design, does one begin the tedious and error-prone work of introducing redundant information and compensating for it with external controls.
"Is this the correct way to do it?" Of course, given your current design. But it's not the ONLY way. So you're studying DB "normalization" and seeing the pros and cons of the various "forms" of normalization. In the "real world" things can change on a dime, due to a management decision or whatever. I tend to use "compound primary keys" instead of simply one field for primary and others as FK. I handle my "FK" programmatically instead of relegating that responsibility to the DB.
I also create and utilize a number of "intermediate" tables, or sometimes "VIEWS", that I use more easily than a bunch of code with too many JOINs. (3rd Normal form addicts can hate, but my code runs faster than a scalded rabbit).
An Order means nothing without a Customer; an Invoice means nothing without an Order; a Payment is great, but means nothing without both an Order and Invoice. So lemme throw this out there -- what's wrong with having a "summary" type of entity that has Cust, Order, Invoice #, and Payment Id ?
This is a problem about drawing an ERD from one of my courses:
A local startup is contemplating launching Jungle, a new one stop
online eCommerce site.
As they have very little experience designing and implementing
databases, they have asked you to help them design a database for
tracking their operations.
Jungle will sell a range of products, and they will need to track
information such as the name and price for each. In order to sell as
many products as possible, Jungle would like to display short reviews
alongside item listings. To conserve space, Jungle will only keep
track of the three most recent reviews for each product. Of course, if
an item is new (or just unpopular), it may have less than three
reviews stored.
Each time a customer buys something on Jungle, their details will be
stored for future access. Details collected by Jungle include
customer’s names, addresses, and phone numbers. Should a customer buy
multiple items on Jungle, their details can then be reused in future
transactions.
For maximum convenience, Jungle would also like to record credit card
information for its users. Details stored include the account and BSB
numbers. When a customer buys something on Jungle, the credit card
used is then linked to the transaction. Each customer may be linked to
one or more credit cards. However, as some users do not wish to have
their credit card details recorded, a customer may also be linked to
no credit cards. For such transactions, only the customer and product
will be recorded.
And this is the solution:
The problem is that the Buys action connects with three other entities: Product, Customer, and Card. I find this very hard to read and understand.
Is an action involving more than 2 entities common in production? If it is, how should I understand and use it? Or if it's not, what is the better way of design for this problem?
While the bulk of relationships in practice are binary relationships, ternary and higher relationships are normal elements of the entity-relationship model. Some examples are supplies (supplier_id, product_id, region_id) or enrolled (student_id, course_id, semester_id). However, they often get converted into entity sets via the introduction of a surrogate identifier, due to dislike of composite keys or confusion with network data models in which only directed binary relationships are supported.
Reading cardinality indicators on non-binary relationships is a common source of confusion. See my answer to designing relationship between vehicle,customer and workshop in erd diagram for more info on how I handle this.
Your solution has some problems. First, Buys is indicated as an associative entity, but is used like a ternary relationship with an optional role. Neither is correct in my opinion. See my answer to When to use Associative entities? for an explanation of associative entities in the ER model.
Modeling a purchase transaction as a relationship is usually a mistake, since relationships are identified by the (keys of the) entities they relate. If (CustomerID, ProductID) is identifying, then a customer can buy a product only once, and only one product per transaction. Adding a date/time into the relationship's key is better, but still problematic. Adding a surrogate identifier and turning it into a regular entity set is almost certainly the best course of action.
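The "can buy a product only once" consequence is easy to reproduce. The following is a minimal sketch with Python's sqlite3 module, assuming a hypothetical Buys table whose key is just (CustomerID, ProductID):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Buys (
        CustomerID INTEGER NOT NULL,
        ProductID  INTEGER NOT NULL,
        PRIMARY KEY (CustomerID, ProductID)  -- the relationship is identified by its participants
    )
""")

conn.execute("INSERT INTO Buys VALUES (1, 42)")  # first purchase is fine

try:
    conn.execute("INSERT INTO Buys VALUES (1, 42)")  # same customer, same product again
    second_purchase_allowed = True
except sqlite3.IntegrityError:
    second_purchase_allowed = False

print(second_purchase_allowed)  # False
```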
Second, the Crow's foot cardinality indicators are unclear. It looks like customers and products are optional in the Buys relationship, or even as if multiple customers could be involved in the same transaction. There are three different concepts involved here - optionality, participation and cardinality - which should preferably be indicated in different ways. See my answer to is optionality (mandatory, optional) and participation (total, partial) are same? for more on the topic.
A card is optional for a purchase transaction. From the description, it sounds as if cards may participate totally, meaning we won't store information about a card unless it's used in a transaction. Furthermore, only a single card can be related to each transaction.
A customer is required for a purchase transaction, and it sounds like customers may participate totally, meaning we won't store information about customers unless they purchase something. Only a single customer can be related to each transaction.
Products are required for a purchase transaction, and since we'll offer products before they're bought, products will participate partially in transactions. However, multiple products can be related to each transaction.
I would represent transactions for this problem with something like the following structure:
I'm not saying converting a ternary or higher relationship into an entity set is always the right thing to do, but in this case it is.
Physically, that would require two tables to represent (not counting Customer, Product, Card or ProductReview) since we can denormalize TransactionCustomer and TransactionCard into Transaction, but TransactionProduct is a many-to-many relationship and requires its own table (as do ternary and higher relationships).
Transaction (TransactionID PK, TransactionDateTime, CustomerID, CardID nullable)
TransactionProduct (TransactionID PK, ProductID PK, Quantity, Price)
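As a rough sketch, the two tables above could be declared like this with Python's sqlite3 module. The column types, the minimal Customer, Card and Product tables, and the sample rows are assumptions for illustration. The table name is quoted because TRANSACTION is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Card     (CardID INTEGER PRIMARY KEY,
                       CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID));
CREATE TABLE Product  (ProductID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

CREATE TABLE "Transaction" (
    TransactionID       INTEGER PRIMARY KEY,
    TransactionDateTime TEXT    NOT NULL,
    CustomerID          INTEGER NOT NULL REFERENCES Customer(CustomerID),
    CardID              INTEGER          REFERENCES Card(CardID)  -- nullable: card is optional
);

CREATE TABLE TransactionProduct (
    TransactionID INTEGER NOT NULL REFERENCES "Transaction"(TransactionID),
    ProductID     INTEGER NOT NULL REFERENCES Product(ProductID),
    Quantity      INTEGER NOT NULL,
    Price         NUMERIC NOT NULL,  -- price charged at the time of sale
    PRIMARY KEY (TransactionID, ProductID)
);

-- Made-up sample data: one purchase without a stored card, two products
INSERT INTO Customer VALUES (1, 'Alice');
INSERT INTO Product  VALUES (7, 'Kettle'), (8, 'Toaster');
INSERT INTO "Transaction" VALUES (1, '2024-01-15 10:30:00', 1, NULL);
INSERT INTO TransactionProduct VALUES (1, 7, 1, 25), (1, 8, 2, 30);
""")

line_count = conn.execute(
    "SELECT COUNT(*) FROM TransactionProduct WHERE TransactionID = 1").fetchone()[0]
card_used = conn.execute(
    'SELECT CardID FROM "Transaction" WHERE TransactionID = 1').fetchone()[0]

print(line_count, card_used)  # 2 None
```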
I'm trying to develop a database model for candidates, their registered exams, and the results of the exams once they have been taken.
This is what I've done so far. However, I'm unsure if I'm on the right track, especially from the examination table to the examination result table.
How easy will it be to write the INSERT SQL code to populate examinationresult for a particular candidate?
The examination types are categorised into science, art and social science. They all have 4 components each.
Note on Progression
Given the fact that the Question changes substantially (in clarifying the requirement, not in scope) in response to my Response and TRD, this is going to take some back-and-forth. Let's identify Steps: your Step numbers are odd, starting from 1; mine, in response, are even. Parts of previous Response Steps have become obsolete, they may no longer make sense.
I would suggest a bounty, except for the fact that you have few points.
Response Step 2 to Initial Question & Step 1 Diagram
This is what I've done so far.
You have done some good work, but it is too early for assigning PKs. Besides, assigning an ID on every file as a starting point will cripple the modelling process, the result will not be a database. You have to model the data (not the database) first, then assign Keys when the entities are clear and stable. So drop all your IDs and PKs and model the data, as data. Forget about what you want to do with the data (ie. forget the app).
How easy will it be to write the INSERT SQL code to populate examinationresult for a particular candidate?
Right now you can't. You have no relationship between Candidate and Examination[Result]. That is not a problem because the modelling is incomplete at this stage, when it is complete the code will be simple.
The entity Course is implied, but it is missing.
However, I'm unsure if I'm on the right track, especially from the examination table to the examination result table.
You are on the right track with some of the other files, but the Examination cluster needs work. This will take a bit of back-and-forth. Once you answer the questions in the comments, we can proceed.
The main issue is this: how is Examination identified.
An ID field does not identify anything, nor does it provide uniqueness in the data, which is required if you want data integrity. IDs result in a Record Filing System with no integrity, however, it appears you want a database with data integrity. Is that correct ?
Go back to the user and discuss how courses and components are identified, what codes they use, etc. Those are the natural Keys that they use to identify their data, that they will enter into the system when they need to look something up, or to enter examination results.
Eg. It is not reasonable to contemplate an Examination that exists independently (as you have modelled it). People do not go to a hall and sit for any old exam. The exam exists only in the context of a course, they sit for an exam for a course.
Then the course, and not the exam, has components, which are examined. And each course has a different number of components.
Eg. a Course which is identified as ENG101 for English Literature year 1
And then the components within that. Eg. 2b Short essay on poetry.
They may need to identify the year and semester of the course as well, in which case, you need a CourseOffering per semester.
Consider this, as a discussion point. Courier is example data, blue is Key, green is non-key:
TRD Step 2
Response Step 4
Response to Question & Description
This is what I've done so far.
My previous response still applies:
You have done some good work, but it is too early for assigning PKs. Besides, assigning an ID on every file as a starting point will cripple the modelling process, the result will not be a database. You have to model the data (not the database) first, then assign Keys when the entities are clear and stable. So drop all your IDs and PKs and model the data, as data. Forget about what you want to do with the data (ie. forget the app).
You have not addressed that issue, that I identified in your Step 1 Diagram, in your Step 3 Diagram. It appears, from the evidence, that you might be happy with IDs as "Primary Keys" (they aren't), despite the hindrance having been identified to you. That means your understanding of the data is crippled, and the progress of your diagrams will be slow.
My previous response still applies:
An ID field does not identify anything, nor does it provide uniqueness in the data, which is required if you want data integrity. IDs result in a Record Filing System with no integrity, however, it appears you want a database with data integrity. Is that correct ?
You must answer these questions, otherwise your design cannot proceed. These are severe errors that must be corrected. One cannot build on, or progress, a foundation that contains severe errors.
Could you please confirm, you do want a Relational Database, with the integrity and performance that Relational Databases are capable of, that is easy to code against, as opposed to a Record Filing System, with no integrity or speed, that will be difficult to code against. Correct ?
If [1] is correct. Since ID fields as "Primary Keys" do not provide row uniqueness, which is demanded for a Relational Database, how exactly, do you intend to provide the required row uniqueness ? Alternately, are you happy to have an RFS that is full of duplicate rows (each with an unique record ID) ?
How easy will it be to write the INSERT SQL code to populate examinationresult for a particular candidate?
My previous response still applies:
Right now you can't. You have no relationship between Candidate and Examination[Result]. That is not a problem because the modelling is incomplete at this stage, when it is complete the code will be simple.
Ok, in your Step 3 Diagram, you have drawn a line between Candidate file and the ExaminationResult file (as opposed to, inserting a relationship in a database).
In a record filing system, sure, you can just draw a line between any two files, insert the relevant ID field, and hey presto, you have "linked" or "connected" or "mapped" the two files.
But database design (as opposed to file design) does not progress like that, you cannot just draw a line between any two objects, insert the relevant ID field, and hey presto, create a database relationship. No. There is no basis, no integrity, in the dashed line that you have drawn. Eg. in your Step 3 Diagram, any Candidate can be related to any Examination[Result].
That is "normal" or "ordinary" in record filing systems, but in a database, it is something to be recognised and understood as an error, and thus prevented. Because we expect integrity in a database, and because it can be prevented, easily.
However, I'm unsure if I'm on the right track, especially from the examination table to the examination result table.
My previous response still applies:
You are on the right track with some of the other files, but the Examination cluster needs work. This will take a bit of back-and-forth. Once you answer the questions in the comments, we can proceed.
The main issue is this: how is Examination identified.
An ID field does not identify a row (it identifies a record, which has no relevance whatsoever in a database).
The same two problems, (a) lack of a valid identifier, and (b) lack of row uniqueness, exist with your Candidate, Component and ExaminationResult files.
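The row-uniqueness point can be illustrated with a small sketch (Python's sqlite3 module). Both tables and the SubjectCode values are hypothetical: the first file has only a generated ID, the second is keyed on a code the users would actually recognise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Record-ID only: nothing stops the same subject being filed twice
CREATE TABLE SubjectById (
    SubjectID INTEGER PRIMARY KEY AUTOINCREMENT,
    Name      TEXT NOT NULL
);
-- Keyed on a user-visible code (hypothetical values such as 'PHY')
CREATE TABLE Subject (
    SubjectCode TEXT PRIMARY KEY,
    Name        TEXT NOT NULL
);
""")

# The ID-only file happily stores the same fact twice, each with a "unique" ID
conn.execute("INSERT INTO SubjectById (Name) VALUES ('Physics')")
conn.execute("INSERT INTO SubjectById (Name) VALUES ('Physics')")
duplicates = conn.execute(
    "SELECT COUNT(*) FROM SubjectById WHERE Name = 'Physics'").fetchone()[0]

# The keyed table rejects the second copy
conn.execute("INSERT INTO Subject VALUES ('PHY', 'Physics')")
try:
    conn.execute("INSERT INTO Subject VALUES ('PHY', 'Physics')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(duplicates, duplicate_accepted)  # 2 False
```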
Response to Diagram as a Diagram (as opposed to the content)
You have improved it over your Step 1 Diagram, and in response to my Response Step 2, great. But the relationships (most of them) are still incorrect. And the basis of Candidate::Examination is still not resolved.
It appears to me that you are not clear about the notation (notches; circles; crows feet) and precisely what they mean at the parent and child ends. So you need to learn that first, and then draw the diagram, rather than the other way round.
It is great that you are using a Notation that is meaningful, and many details are shown (many people don't, they draw nice-looking diagrams that lack the detail required for a full understanding of the model). That means that every notch; circle; crows foot, has specific meaning, and must be drawn correctly, in order to convey that meaning to the reader.
Entities do not exist in isolation, there must always be a parent first, in order for the child to be a child of the parent. There is no such thing as "equal". Dependency is always in one direction.
Your relationships that are 1-and-only-1 on one side, and 1-and-only-1 on the other side, are incorrect, they indicate a Normalisation error. The field in the subordinate record can be Normalised into the ordinate record.
Eg. AdmissionLetter is not a separate file, some form of AdmissionLetter identifier (not an ID field) should be located in Candidate.
Eg. Title::Candidate is a drawing error, it should be 1 at the Title end and 0-to-many at the Candidate end.
In a data model, bold (by convention) means a migrated Foreign Key. The Primary Key that is migrated is not bold.
Response to Diagram Content
From your replies, the term Subject trumps the term Component; Category trumps various loosely-identified elements into one clear entity.
It is not reasonable to contemplate an Examination that exists independently (as you have modelled it).
People do not go to a hall and sit for any old exam, any old Subject. The exam exists only in the context of a Subject, they sit for an exam for a Subject.
I accept that the Examination is one sitting, for four Subjects
I accept that the four Subjects are defined by a Category.
I accept that the Candidate is registered for a Category.
Thus the exam exists only in the context of a Subject, which exists only in the context of a Category, and the Candidate sits for an exam which is a Category, which contains four (the number does not matter) Subjects.
Having resolved that, two questions remain:
Do you need to record an Examination as an event, independent of the Candidates who sit in that event. Eg. Examination(Location, DateTime) ?
Does the Examination event examine Candidates in one, or more than one, Category ?
The notion of four Subjects that are implemented as four repeated fields in one record breaks Second Normal Form, which demands that repeating fields are Normalised into separate records in a child file.
Therefore, for both your Component and ExaminationResult files, that issue needs to be resolved.
Note that the fact that that problem is repeated in two separate files is a second alarm that it is an error.
I have clarified the Category/Subject issues for you, and resolved the Normalisation error.
I have given simple identifiers for Categories and Subjects.
If you do not implement that, you will not have integrity between the Candidate and the Subject they are being Examined for. As well, you will suffer various problems when you get to the coding stage.
I have no idea what you are trying to do with exComp, therefore I have no response. Perhaps you can say a few words about it.
Thus far, there is still no reasonable way of relating Candidates to Examinations or ExaminationResults. That is, it has no basis, nothing has been defined as the basis for the relationship, and thus the relationship has no integrity.
On the basis of what I have been able to ascertain thus far, there must be some sort of registration for an exam. Otherwise you would not know that a Candidate is sitting for an exam.
When the Candidate registers, they register for an exam, and that exam is defined (and therefore constrained) by a Category. Otherwise any Candidate can sit for any exam, which I believe, you would like to prevent.
Further, the [four] exam Subjects that they sit for, should be constrained by the Category that they registered for.
You do want to ensure that you do not record an Economics exam result for a Candidate who is registered for Science, correct ?
I have determined that the basis of an exam is the Registration. That is the event, the fact, the recording of which, establishes that a Candidate will sit for an exam.
The identifier virtually jumps out at you, it is CategoryCode plus CandidateID. Voila! we have row uniqueness. Magnifique! we have integrity.
Now the integrity of ExaminationResult can be implemented: it is constrained to the CandidateRegistration::Category and to the Category::Subject.
To be Resolved: Do you need to identify the fact of a Candidate registering for an examination (RegistrationDate, AdmissionLetter or whatever) vs the fact that the Candidate sat for the examination (eg. ExaminationDate) ? A sort of roll call.
Right now, I have modelled that as a single fact with no differentiation, and the table is called Examination because you seem to be focussed on that.
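To show what those constraints look like when declared, here is a minimal sketch using Python's sqlite3 module. It is one possible reading of the model described above, not the actual TRD: the table and column names other than CategoryCode and CandidateID, the code values, and the Score column are assumptions. It also shows what the INSERT for a particular candidate's result could look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Category  (CategoryCode TEXT PRIMARY KEY);
CREATE TABLE Candidate (CandidateID INTEGER PRIMARY KEY);

-- A Subject exists only in the context of a Category
CREATE TABLE CategorySubject (
    CategoryCode TEXT NOT NULL REFERENCES Category(CategoryCode),
    SubjectCode  TEXT NOT NULL,
    PRIMARY KEY (CategoryCode, SubjectCode)
);

-- The registration: identified by CategoryCode plus CandidateID
CREATE TABLE Examination (
    CategoryCode TEXT    NOT NULL REFERENCES Category(CategoryCode),
    CandidateID  INTEGER NOT NULL REFERENCES Candidate(CandidateID),
    PRIMARY KEY (CategoryCode, CandidateID)
);

-- A result is constrained to the registration AND to the category's subjects
CREATE TABLE ExaminationResult (
    CategoryCode TEXT    NOT NULL,
    CandidateID  INTEGER NOT NULL,
    SubjectCode  TEXT    NOT NULL,
    Score        INTEGER NOT NULL,
    PRIMARY KEY (CategoryCode, CandidateID, SubjectCode),
    FOREIGN KEY (CategoryCode, CandidateID)
        REFERENCES Examination(CategoryCode, CandidateID),
    FOREIGN KEY (CategoryCode, SubjectCode)
        REFERENCES CategorySubject(CategoryCode, SubjectCode)
);

-- Made-up sample data: candidate 1 is registered for Science only
INSERT INTO Category VALUES ('SCI'), ('SOC');
INSERT INTO CategorySubject VALUES ('SCI', 'PHY'), ('SOC', 'ECO');
INSERT INTO Candidate VALUES (1);
INSERT INTO Examination VALUES ('SCI', 1);
""")

def try_insert(row):
    """Attempt to record one result; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO ExaminationResult VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

physics_ok        = try_insert(("SCI", 1, "PHY", 71))  # registered category, valid subject
economics_in_sci  = try_insert(("SCI", 1, "ECO", 80))  # Economics is not a Science subject
unregistered_cat  = try_insert(("SOC", 1, "ECO", 80))  # candidate never registered for SOC

print(physics_ok, economics_in_sci, unregistered_cat)  # True False False
```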
Predicate
These days, people seem to be throwing themselves at drawing a diagram, without understanding either the basics of a Relational Database, or of the exercise of modelling data. Predictably, that results in an ill-defined diagram (many relevant details are omitted) [gratefully, your diagram has some definition], and it produces a record filing system with no integrity, no relational power, no speed, instead of a Relational Database with integrity, power, and speed.
One concept that is often missing is Predicates. A competent reader can read a good data model, and ascertain the Predicates, because they are drawn in the model, in the form of notation, but a novice doesn't understand the notation, or the relevance of the various items, and therefore will miss the Predicates. In sum, the Predicates are all the constraints that are placed on the data:
Row Identification:
The basis of its existence, and how it is Identified: Independent (square corners); or Dependent (round corners)
Row Uniqueness: Primary and Alternate Keys (note, IDs are not Keys)
Relationships between rows:
Identifying (solid lines); or Non-identifying (dashed lines)
Meaning, relevance, purpose: the all-important Verb Phrase
Further, a novice cannot determine the Predicates when there is no diagram, or when the diagram is poor, or when they are designing the filing system and drawing the diagram themselves. Thus they do not identify the relevant Predicates in their diagram.
Predicates are very important during the modelling exercise, in that as well as the model expressing the Predicates, the Predicates confirm the accuracy of the model, it is a feedback loop. It is an essential part of the modelling exercise. Since I am executing the modelling task for you, I am working out the Predicates as I perform that task, they are obvious to me. But they may not be obvious to you.
When the data model is published, and ready for discussion with the users, these Predicates are incorporated into it. They come under the heading of Business Rules, they form a part of that, because that is the way the user perceives them. Consequently, during the walkthroughs and discussions, the Predicates (as well as the other stated Business Rules) are either confirmed or denied by the user. They need to be stated explicitly, because unlike the technically educated developer, the user cannot be expected to read all the relevant Predicates from the notation in a good data model.
In this situation, I am the modeller, and you are the "user". Thus I have decided to provide the Predicates for you, explicitly. So that you can confirm or deny them, and thus we can progress the modelling exercise. Once you get used to reading the Predicates from a good data model, you will not need to have them declared explicitly for you. Again, Predicates are very important because they verify (or not) the accuracy of the model. So please read them carefully and comment on any Predicates that you do not completely agree with, or that you do not understand.
Of course, it is not necessary to explicitly declare all the Predicates, there are just too many, we declare just the more relevant ones, that relate to:
(a) rows (tables), the basis for their existence
(b) their identification
(c) all dependencies
(d) relationships, both sides (one side is the Verb Phrase).
Step 4 TRD
I have implemented all the above, as detailed. Please consider this TRD as a discussion platform for the next iteration, and comment. Courier indicates example data, blue indicates Key values, green indicates non-key values:
Step 4 TRD
Response Step 6 to Chat Step 5
All issues discussed have been resolved, and implemented in the model. Sorry, I do not have time right now to post details, this simply identifies the updated models.
Entity-Relation and full Predicates on page 1
All resolved issues have been implemented.
Predicates
Now that it is stable, I am now giving you the second side of the Relation Predicates (child-to-parent). And now that you understand them, I have deleted the repeated, annoying "Each" that is demanded for novices.
Entity-Relation-Key on page 2
Now that the TRD is stable, we are ready to proceed to Determination of Keys
(Second only to Normalisation, Key Determination is a critical part of the modelling exercise. The two tasks are normally performed side-by-side, they are inseparable, I have already determined the keys. In this case, given the limitations of the communication media, I am presenting it as a sequential step).
Here, I use an Extension to the IDEF1X Notation that allows me to concentrate on the components that are relevant to the task, I expect that it is self-explanatory. The Key columns only, are given. Foreign Keys are not Bold (as they are in the DM). All that, is intended to make it easy on the eye.
Most tables have one Key (Primary). Where there are two Keys (Primary and Alternate), the AK is below the line.
This is my recommendation for the Keys, as requested, for your review.
Step 6 TRD and TRK 6
I was a little miffed about the one-to-one relationship explanation on the 'I Think You Mean A Many To One' article.
In this instance for example, a product has one price because the business in question is small, niche, localized and supports only a single currency. Multiple prices per product make no sense in this case? I'm doubtful I'm grasping the concept correctly though, because everywhere I read says it will probably be a many-to-one even if you think it isn't?
Can somebody enlighten me please? :)
I'm posting this as an "answer" in an attempt to gain enough reputation to help in comments instead. The difference between one-to-many and one-to-one is this:
View a one-to-one as an extension of the table you are looking at.
Table B extends Table A, meaning the information wasn't necessarily relevant enough to include in Table A directly, but the two tables are related row for row. Basically, Table A is not dependent on the information in Table B, but Table B's information is very dependent on Table A. For the price example, it means that a row in Table A is related to a row in Table B. So if you are entering unique information in your Price table for every item in Table A, then this would be useful, say if you had a description column about the item in your price table. Otherwise, the price table may just be irrelevant to have in the schema.
In a one-to-many relationship, Table B usually has no reference back to Table A. So in the case of price, the items you are looking at do have a price, but prices aren't exclusive to items. To put it another way, a number of things may have the price 9.99, but 9.99 only needs to exist in your pricing table once.
I am not familiar with the article you refer to. However, price is a classic example of a slowly changing dimension. Price may be constant at any point in time, but over time, the price changes.
Such dimensions are typically implemented by having effective and end dates for the period in question.
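A minimal sketch of that idea, using Python's sqlite3 module: each price row carries an effective date and an end date, and a lookup picks the row whose period covers the day in question. The table name, the half-open [EffectiveDate, EndDate) convention, the NULL end date for the current price, and storing prices in cents are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductPrice (
    ProductID     INTEGER NOT NULL,
    EffectiveDate TEXT    NOT NULL,  -- ISO dates compare correctly as text
    EndDate       TEXT,              -- NULL = still current
    PriceCents    INTEGER NOT NULL,
    PRIMARY KEY (ProductID, EffectiveDate)
);
-- Made-up history: the price went up on 2023-07-01
INSERT INTO ProductPrice VALUES
    (1, '2023-01-01', '2023-07-01', 999),
    (1, '2023-07-01', NULL,        1099);
""")

def price_on(product_id, day):
    """Return the price in cents in effect on the given ISO date, or None."""
    row = conn.execute("""
        SELECT PriceCents
        FROM ProductPrice
        WHERE ProductID = ?
          AND EffectiveDate <= ?
          AND (EndDate IS NULL OR ? < EndDate)
    """, (product_id, day, day)).fetchone()
    return row[0] if row else None

print(price_on(1, "2023-03-15"))  # 999
print(price_on(1, "2023-07-01"))  # 1099
print(price_on(1, "2022-12-31"))  # None (no price defined yet)
```

At any single point in time the product still has exactly one price, which matches the questioner's intuition; the "many" only appears across time.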
Now, at a given point in time, a product probably does have only one price. Things that affect the price -- coupons, discounts for the purchaser, volume discounts, for example -- are not properties of the product. These are properties of the transaction.
That said, there may be circumstances where a fixed volume discount does not make sense. So, the "price" for a product might include volume, as well as time.
In any case, I would agree with you that price is not a good example of a 1-1 relationship. There are other factors such as time and volume that affect it.
We're doing a complex bit of data accumulation. Our customer sends us some stuff that includes two dimensions (time and a business unit). Time is mostly year-month. The business unit dimension has just a few attributes: a name, and a few categories to which BU's can belong for reporting and analysis purposes.
The stuff they send us includes some current state information (dates and codes). These seem fact-like. They also send some information that characterizes the relationship with the business unit (mostly additional codes). Again, these are unique to the business unit and time period.
Finally, they send us stuff that is clearly additive facts. It includes currency and counts that have proper units.
Should I commingle this qualitative information in a single fact table with the additive facts? Or should I separate the qualitative stuff (which can only be used with counts) from the quantitative stuff (which can be used with sum)?
Only put things in the fact table if they are degenerate (that is, they would cause a high-cardinality/uniqueness problem in your dimension, taking the dimension to a 1-1 relationship with the fact table). Kimball recommends avoiding the temptation to put anything but degenerate dimensions in with the facts (a unique order number, for instance).
You can always put these in what Kimball calls a "junk" dimension. All those codes can simply be lumped into a junk dimension. Most dates would go in the fact table as keys into your date dimension in a particular role (usually with a natural int key of the form YYYYMMDD, one of the few times we use a meaningful key instead of a meaningless identity surrogate key).
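Roughly, the two ideas look like this in plain Python. The field names (status_code, tier_code, revenue) and the sample rows are hypothetical stand-ins for "all those codes" and the additive facts; a real load would of course write to dimension and fact tables rather than in-memory structures.

```python
from datetime import date

def date_key(d: date) -> int:
    """Natural integer key of the form YYYYMMDD for the date dimension."""
    return d.year * 10000 + d.month * 100 + d.day

# Made-up incoming rows: business unit, period, two qualitative codes, one additive fact
source_rows = [
    {"bu": "North", "period": date(2024, 1, 1), "status_code": "A", "tier_code": "X", "revenue": 100},
    {"bu": "South", "period": date(2024, 1, 1), "status_code": "A", "tier_code": "X", "revenue": 250},
    {"bu": "North", "period": date(2024, 2, 1), "status_code": "B", "tier_code": "X", "revenue": 120},
]

junk_dim = {}    # (status_code, tier_code) -> surrogate key, one row per distinct combination
fact_rows = []   # (date_key, bu, junk_key, revenue)

for r in source_rows:
    combo = (r["status_code"], r["tier_code"])
    # Reuse the key if this combination was seen before, otherwise assign the next one
    junk_key = junk_dim.setdefault(combo, len(junk_dim) + 1)
    fact_rows.append((date_key(r["period"]), r["bu"], junk_key, r["revenue"]))

print(junk_dim)      # {('A', 'X'): 1, ('B', 'X'): 2}
print(fact_rows[0])  # (20240101, 'North', 1, 100)
```

The fact rows end up carrying only keys and the additive measure, while the codes live once per distinct combination in the junk dimension.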
I like to naively view the star as all the facts and then which columns go into which dimensions is simply determined by convenience. One should not necessarily view them as corresponding to a particular business entity - remember, the star is not an ERD-style normalized OLTP database.
If the data is both directly related to the additive fact and is not something you want to be grouping, sorting, or searching on, then putting it in the fact table is okay.
Be aware, though, that non-additive data in the fact table will either prevent roll-ups or will become a lossy operation.
Brad Wilson accurately describes the risk of adding them to your fact table. In the past, I've added junk attributes to my fact table only to require refactoring later.
The stuff they send us includes some current state information (dates and codes). These seem fact-like. They also send some information that characterizes the relationship with the business unit (mostly additional codes). Again, these are unique to the business unit and time period.
What business purpose do the dates serve? Offhand, I'd recommend making these their own dimensions and describe them accurately.
How volatile are the extra codes that come in? If the grain of your fact table is date and BU, why can't they be included in the BU dimension and treated as slowly changing attributes?
Without more details I can't make a firm recommendation but these would be the first questions I'd ask myself.