I am confused about the very concept of ACID.
All references/textbooks describe ACID as a set of properties that the database system is expected/required to maintain in order to preserve data integrity.
But the C part of ACID, i.e. Consistency, does not really seem to be a responsibility of the database.
Some references (e.g. Silberschatz) describe it in the sense that the transaction code itself (if run in isolation) leaves the database in a consistent state, i.e. the transaction code is correct; that is the application programmer's perspective, not the DBMS's.
Other references are vague, describing it only as "leaving the database in a consistent state".
So which is correct?
In transactions, the technical term consistent means "satisfying all known integrity constraints".
By definition, integrity constraints are declared to the dbms, and are expected to be enforced by the dbms. Why? Because if application programmers were responsible, every application programmer might make a different decision about what constitutes a consistent update operation.
For example, one application programmer might decide that every unit price that's more than $10,000 is in error. Another application programmer might decide that every unit price that's more than $12,000 is in error--but that $11,000 is a valid unit price. One programmer's work will accept a unit price of $11,000; the other's will kick it out as an error.
Relational databases resolve that inconsistency by centralizing decisions about unit prices. In this particular case, that decision might be centralized in the form of a CHECK() constraint on the "unit_price" column. Once that integrity constraint is in place, every update operation will result in a consistent value in the "unit_price" column. It doesn't matter
how many application programmers there are,
how well (or how poorly) trained they are,
how many different applications there are,
what languages they're written in,
whether the sleep-deprived DBA is executing an update from a command-line console.
All those update operations will find it impossible to set the "unit_price" column to a value that fails to satisfy all the known integrity constraints. That means all those update operations will result in a state that satisfies all the known integrity constraints. That's the definition of consistent.
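Here is a minimal sketch of that idea, assuming a hypothetical "products" table and the $10,000 limit from the example above:

    -- Hypothetical table: the CHECK() constraint centralizes the business
    -- rule, so no application can store an out-of-range unit price.
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        unit_price  NUMERIC(10,2) NOT NULL,
        CHECK (unit_price > 0 AND unit_price <= 10000)
    );

    INSERT INTO products VALUES (1, 'Widget', 9500.00);   -- succeeds
    INSERT INTO products VALUES (2, 'Gadget', 11000.00);  -- rejected by the dbms

The second INSERT fails no matter which application, language, or console issued it; that is exactly what centralizing the decision buys you.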
In the relational model, integrity constraints and business rules mean the same thing.[1] Some people use business rules to mean something else; you have to determine their meaning by careful reading.
Integrity constraints (or business rules) should never be under the control of end users if your data is important. Constraints can easily be changed by a DBA, usually with a single SQL statement. But knowing which statement to execute, and when to execute it, is not in most end users' skill set.
The terms consistent and correct mean two different things. A database state can be consistent without being correct. A unit price that is within the range of a CHECK() constraint might still be the wrong price, a person's name might be misspelled, etc.
Neither the relational model nor the SQL standards are defined by a particular SQL implementation. They're especially not defined by MySQL's behavior, which is just barely SQL. (CHECK constraints parsed but not enforced, indeterminate results using GROUP BY, no analytic functions, nonstandard quoting in backticks, etc.)
If I had Database System Concepts in front of me, and I wanted to understand what the author meant by "Ensuring consistency for an individual transaction is the responsibility of the programmer who codes the transaction.", I'd ask myself these questions.
Is there any errata that refers to the section in question?
What does the author say about ACID properties in earlier editions? In later editions?
Does the author distinguish correct transactions (they have the right data) from consistent transactions (they don't violate any of the integrity constraints)?
Does the author always say that consistency is the responsibility of the programmer? (I'd say that correctness is; consistency isn't.)
What programmer is the author talking about? (Application programmer writing a Rails app? DBA writing a stored proc? Hard-core programmer writing the transactional subsystem for PostgreSQL?)
Does the author never say that consistency is the responsibility of the dbms?
Does the author ever say that, of the four ACID properties of transactions, A, I, and D are the responsibility of the dbms, but C isn't? (I'll bet he doesn't.)
Is consistency of an individual transaction different from consistency of a database? Where else does the author talk about that?
"Centralized control of the database can help in avoiding such problems--insofar as they can be avoided--by permitting the data administrator to define, and the DBA to implement, integrity constraints (also known as business rules) to be checked whenever any update operation is performed." An Introduction to Database Systems, 7th ed, C.J. Date, p 18.
I haven't touched databases since I graduated from school, so please forgive me if my question is too entry-level.
I still remember how to draw an ERD with UML. Recently, my boss asked me to create a database for an inventory system with a frontend. I googled some similar systems and found that in the backend their databases don't have any relationships between tables (I reverse-engineered the UML from the DB).
So I thought about it: the application seems to work fine even without relationships (no foreign keys), so why do we still need relationships between tables?
This is one of the areas where there is often a noticeable disparity between the theory that is taught in CS courses and the reality of what happens in practice.
Often what you'll run into is a mash-up between the two: an ERD model that shows all the proper relationships and keys, and the "reality" of what actually gets implemented in the database.
The implementation side is probably the part that catches people by surprise, as you have seen: no relationships defined, and foreign keys are simply implied by the matching column names across different tables. This is a tradeoff.
On one hand, managing foreign keys in a database has overhead. Every time a row is added or modified, the database will need to examine those foreign keys and make sure that the change will preserve the relational integrity. After all, that's what you are asking for when you define those relationships, right? And in an ideal world where that overhead is negligible, this is probably a good thing, because as DBAs we like it when our physical implementation matches the idealized model we spent all that time creating. We sleep better knowing that every entry in the customer table references a valid location in the company_location table.
On the other hand, there is reality. That overhead is not something we can easily ignore. Not when that nightly batch load is 4 hours late, and some marketing manager is asking you every 10 minutes for an estimate on when his data will be available. So we cut some corners and make some compromises. And hey, we're pretty good programmers, right? Certainly we can code the application in a way that will always maintain the referential integrity of the database without having to spend all that extra time dealing with foreign keys in the database... well, maybe. The truth is that it is really hard to be sure that RI will always be preserved by an application that is already implementing some potentially complex business logic.
There are, of course, many other reasons for using explicit RI, and plenty of good reasons for ignoring it in the physical implementation. You are right: at the end of the day, applications often do work OK without relationships being defined. And at the end of the day, I will probably get home safe even if I don't put on my seatbelt for the drive. But having the relationships implemented in the database is a pretty solid insurance policy when it comes to guaranteeing the integrity of our data. Analysts that use that database to generate business insights like consistent data. And transactional applications might depend on the assumption that the data is relationally consistent.
I guess my point is that there is no "always right" answer here, and it really is a case-by-case thing. I would just suggest starting from the assumption that you'll physically implement the model, complete with RI, as faithfully as possible. Then, if you find hot spots, carefully and conservatively relax those constraints as needed.
Foreign keys take care of referential integrity.
To explain this a bit more: by adding a foreign key you are saying "what is in this column must be in the column I am pointing to as well". This makes sure your naming stays consistent.
If you did not do this, mistakes could be introduced when adding redundant information, like calling "James" "Jamed" by mistake.
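A minimal sketch of that guarantee, with hypothetical "people" and "orders" tables (the names and columns are just for illustration):

    -- The foreign key guarantees every order references an existing person.
    CREATE TABLE people (
        name VARCHAR(50) PRIMARY KEY
    );

    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        person_name VARCHAR(50) NOT NULL REFERENCES people (name)
    );

    INSERT INTO people VALUES ('James');

    INSERT INTO orders VALUES (1, 'James');  -- succeeds
    INSERT INTO orders VALUES (2, 'Jamed');  -- rejected: no such person

Without the REFERENCES clause, the second insert would silently succeed and the typo would live in your data forever.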
At first I thought it wasn't a relational DB, but after I read that I can join tables, and saw what is written on their site https://crate.io/overview/ (see Use cases), I'm not sure anymore.
I was especially confused by the sentence:
CrateDB is based on a NoSQL architecture, but features standard SQL.
from https://crate.io/overview/high-level-architecture/
Going by Codd's 12 rules (which have been used to identify relational databases), CrateDB is not a relational database - yet. CrateDB's eventual consistency model does not prevent it from becoming one.
Rule 0: For any system that is advertised as, or claimed to be, a relational data base management system, that system must be able to manage data bases entirely through its relational capabilities.
CrateDB doesn't have any interface other than SQL with which data can be inserted, retrieved, and updated.
Rule 1: All information in a relational data base is represented explicitly at the logical level and in exactly one way — by values in tables.
Exactly what can be found in CrateDB.
Rule 2: Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
This is strictly enforced. Access through primary keys will even give you read-after-write consistency.
Rule 3: Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.
CrateDB supports null.
Rule 4: The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
CrateDB has, among other meta-tables, Information Schema tables.
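For instance, the catalog can be interrogated with the same SQL used for regular data. A sketch (the 'doc' schema is, as I understand it, CrateDB's default schema for user tables):

    -- List user tables via the Information Schema meta-tables.
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'doc';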
Rule 5: A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following items:
Data definition.
View definition.
Data manipulation (interactive and by program).
Integrity constraints.
Authorization.
Transaction boundaries (begin, commit and rollback).
CrateDB supports data definition and data manipulation parts and only a single integrity constraint (primary key). This is definitely incomplete.
Rule 6: All views that are theoretically updatable are also updatable by the system.
CrateDB does not support views yet.
Rule 7: The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data.
CrateDB currently only does that for data retrieval...
Rule 8: Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.
CrateDB's use of SQL allows for this; performance/storage level improvements are even delivered via system upgrades.
Rule 9: Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
Parts of this are still missing (the views, inserts/updates on joins). However for retrieving data, this is already the case.
Rule 10: Integrity constraints specific to a particular relational data base must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
This is quite tricky for a distributed database, specifically the foreign key constraints. CrateDB only supports primary key constraints for now.
Rule 11: A relational DBMS has distribution independence.
In CrateDB any kind of sharding/partitioning/distribution is handled transparently for the user. Any kinds of constraints/settings for data distribution are applied on the data definition level.
Rule 12: If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).
One could argue that COPY FROM directly violates this rule since there is no type checking and conversion happening underneath. However there is no other command/language/API that would allow data manipulation otherwise.
While CrateDB certainly has some catching up to do, there is no reason why it wouldn't become a relational database in this sense soon. Its SQL support may not be on par with Oracle's or Postgres' but many people can live without some very use-case specific features.
As said above, none of the rules is directly violated; rather, some are simply not yet implemented in a satisfactory manner, so there is no reason why CrateDB can't become a fully relational database eventually.
(Disclaimer: I work there)
Since the beginning of the relational model the three main components that a system must have to be considered relational are (applying Codd's three-component definition of "data model" to the relational model):
data is presented as relations (tables)
manipulation is via relation and/or logic operators/expressions
integrity is enforced by relation and/or logic operators/expressions
Also, a multi-user DBMS has been understood to support apparently atomic persistent transactions while benefiting from implementation via overlapped execution (ACID), and a distributed DBMS has been understood to support an apparently single database while benefiting from implementation at multiple sites.
By these criteria CrateDB is not relational.
It has tables, but its manipulation of tables is extremely limited and it has almost no integrity functionality. Re manipulation: it allows querying for rows of a table meeting a condition (including aggregation), and it allows joining multiple tables, but that's not optimized, even for equijoin. Re constraints: its only functionality is column typing, primary keys and non-null columns. It uses a tiny subset of SQL.
See the pages at your link re Supported Features and Standard SQL Compliance as addressed in:
Crate SQL
Data Definition
Constraints (PRIMARY KEY Constraint, NOT NULL Constraint)
Indices
Data Manipulation
Querying Crate
Retrieving Data (FROM Clause, Joins)
Joins
Crate SQL Syntax Reference
As usual with non-relational DBMSs, their documentation does not reflect an understanding or appreciation of the relational model or other fundamental DBMS functionality.
CrateDB is a distributed SQL database. The underlying technology is similar to what so called NoSQL databases typically use (shared nothing architecture, columnar indexes, eventual-consistency, support for semi-structured records) - but makes it accessible via a traditional SQL interface.
So therefore - YES, CrateDB is somewhat of a relational SQL DB.
What is the difference between a DBMS and an RDBMS? Please give some examples, including some newer tools. Why can't we really use a DBMS instead of an RDBMS, or vice versa?
A relational DBMS will expose to its users "relations, and nothing else". Other DBMS's will violate that principle in various ways. E.g. in IDMS, you could do <ACCEPT <hostvar> FROM CURRENCY> and this would expose the internal record id of the "current record" to the user, violating the "nothing else".
A relational DBMS will allow its users to operate exclusively at the logical level, i.e. work exclusively with assertions of fact (which are represented as tuples). Other DBMS's made/make their users operate more at the "record" level (too "low" on the conceptual-logical-physical scale) or at the "document" level (in a certain sense too "high" on that same scale, since a "document" is often one particular view of a multitude of underlying facts).
A relational DBMS will also offer facilities for manipulation of the data, in the form of a language that supports the operations of the relational algebra. Other DBMS's, seeing as they don't support relations to begin with, obviously cannot build their data manipulation facilities on relational algebra, and as a consequence the data manipulation facilities/language is mostly ad hoc. On the "too low" end of the spectrum, this forces DBMS users to hand-write operations such as JOIN again and again and again. On the "too high" end of the spectrum, it causes problems of combinatorial explosion in language complexity/size (the RA has some 4 or 5 primitive operators and that's all it needs - can you imagine 4 or 5 operators that will allow you to do just any "document transform" anyone would ever want to do?)
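As an illustration, with hypothetical customers/orders tables, three of those primitive operators compose declaratively in a single SQL statement, where a record-at-a-time system would need hand-written loops:

    SELECT c.name, o.order_date                        -- project
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id     -- join
    WHERE o.total > 100;                               -- restrict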
(Note very carefully that even SQL systems violate basic relational principles quite seriously, so "relational DBMS" is a thing that arguably doesn't even exist, except then in rather small specialized spaces, see e.g. http://www.thethirdmanifesto.com/ - projects page.)
DBMS: database management system; here we can store some data and retrieve it.
Imagine a single table: save and read.
RDBMS: relational database management system; here you can join several tables together and get related, queried data (say, data for a particular user or for a particular order, not all users or all orders).
The normal forms come into play in an RDBMS: we don't need to store repeated data again and again; we can store it in one table, use its id in another table, and for reading we can join both tables and get what we want.
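A quick sketch of that idea, with hypothetical users/orders tables:

    -- The user's details are stored once; orders reference them by id.
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    VARCHAR(50)
    );

    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER REFERENCES users (user_id),
        amount   NUMERIC(10,2)
    );

    -- Reading: join both tables to get the orders for one particular user.
    SELECT u.name, o.order_id, o.amount
    FROM users u
    JOIN orders o ON o.user_id = u.user_id
    WHERE u.user_id = 42;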
DBMS:
DBMS applications store data as files. In a DBMS, data is generally stored in either a hierarchical form or a navigational form. Normalization is not present in a DBMS.
RDBMS:
RDBMS applications store data in tabular form. In an RDBMS, the tables have an identifier called a primary key and the data values are stored in the form of tables. Normalization is present in an RDBMS.
Say I had three tables: Accommodation, Train Stations and Airports. Would I have address columns in each table or an address table that is referenced by the other tables? Is there such a thing as over-normalization?
Database normalization is all about constructing relations (tables) that maintain certain functional dependencies among the facts (columns) within the relation (table) and among the various relations (tables) making up the schema (database). Bit of a mouthful, but that is what it is all about.
A Simple Guide to Five Normal Forms in Relational Database Theory is the classic reference for normal forms. This paper defines in simple terms what the essence of each normal form is and its significance with respect to database table design. This is a very good "touch-stone" reference.
To answer your specific question properly requires additional information. Some critical questions you have to ask are:
Is an Address a simple fact (e.g. a blob of text) or a composite fact (e.g. composed of multiple attributes: Address Line, City Name, Postal Code etc.)?
What are the other "facts" relating to "Accommodation", "Airport" and "Train Station"?
What sets of "facts" uniquely and minimally identify an "Airport", an "Accommodation" and a "Train Station" (these facts are typically called a key or candidate key)?
What functional dependencies exist among the Address facts and the facts composing each relation's key?
All this to say, the answer to your question is not as straight forward as one might hope for!
Is there such a thing as "over-normalization"? Maybe. This depends on whether the functional dependencies you have identified and used to build your tables are of significance to your application domain.
For example, suppose it was determined that an address was composed of multiple attributes, one of which is postal code. Technically a postal code is a composite item too (at least Canadian postal codes are). Further normalizing your database to recognize these facts would probably be an over-normalization, because the components of a postal code are irrelevant to your application; factoring them into the database design would add structure without benefit.
For addresses, I would almost always create a separate address table. Not only for normalization but also for consistency in fields stored.
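A hypothetical sketch of that design (table and column names are mine, purely for illustration):

    -- One address table, referenced by each entity table, so the address
    -- fields are defined -- and later changed -- in exactly one place.
    CREATE TABLE address (
        address_id  INTEGER PRIMARY KEY,
        line1       VARCHAR(100),
        line2       VARCHAR(100),
        city        VARCHAR(50),
        region      VARCHAR(50),
        postal_code VARCHAR(10),
        country     CHAR(2)
    );

    CREATE TABLE accommodation (
        accommodation_id INTEGER PRIMARY KEY,
        name             VARCHAR(100),
        address_id       INTEGER REFERENCES address (address_id)
    );

    -- train_station and airport would reference address the same way.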
As for such a thing as over-normalization, absolutely there is! It's hard to give you guidance on what is and isn't over-normalization, as I think it mostly comes from experience. However, follow the books on each level of normalization, and once it starts to get difficult to see where things are, you've probably gone too far.
Look at all the sample/example databases you can as well. They will give you a good indication on when you should be splitting out data and when you shouldn't.
Also, be well aware of the type and amount of data you're storing, along with the speed of access, etc. A lot of modern web software is going fully de-normalized for performance and scalability reasons. It's worth looking into those for the reasons why, and when, you should and shouldn't de-normalize.
Would I have address columns in each table or an address table that is referenced by the other tables?
Can airports, train stations and accommodation each have a different address format?
A single ADDRESS table minimizes the work necessary to deal with addresses - suite, RR, postal/zip code, state/province...
Is there such a thing as over-normalization?
There are different levels of normalization. I've only encountered what I'd consider poor design rather than normalization.
Personally I'd go for another table.
I think it makes the design cleaner, makes reporting on addresses much simpler and will make any changes you need to make to the address schema easier.
If you need to have it denormalized later on you can always create two views that contain the Train station and airport information along with any address information you need.
This isn't really what I understand by normalisation. You don't seem to be talking about removing redundancy, just how to partition the storage or data model. I'm assuming that the example of addresses for Accommodation, Train Stations and Airports will all be disjoint?
As far as I know, it would only be normalisation if you started thinking along the lines of: postcode is functionally dependent upon street address, so it should be factored out into its own table.
In which case this could be either desirable or undesirable, depending on context. Perhaps desirable if you administer the records and can ensure correctness, and less desirable if users can update their own records.
A related question is Is normalizing a person’s name going too far?
If you have a project/piece of functionality that is very performance-sensitive, it may be smart to denormalize the database in some cases. However, this can lead to maintenance issues for various reasons. You may instead want to duplicate the data with cache tables, but there are drawbacks to this as well. It's really a case-by-case basis, but in normal practice database normalization is a good thing. 99% of the non-normalized databases I've seen are not that way by design, but rather through a misunderstanding/mistake by the developer.
Would I have address columns in each table or an address table that is referenced by the other tables?
As others have alluded to, this is not really a question of normalization because you're not attempting to reduce redundancy or organize dependencies. Either way is perfectly acceptable. Moving the addresses to a separate table might make sense if you are going to have centralized validation or business logic specific to addresses.
Is there such a thing as over-normalization?
Yes. As has been mentioned, in large systems (lots of data, lots of transactions, or both) you can normalize to the point where performance becomes an issue. This is why lots of systems use denormalized database for reporting and querying.
In addition to performance though, there is also the issue of how easy the data is to query. In systems where there will be a lot of end-user querying of the data (can be dangerous!), a denormalized structure is easier for most non-technical or non-database people to understand.
Like most things we deal with, it's a trade-off between understanding, performance, and future maintainability and there is rarely a clear-cut answer to where you draw the line in any given system.
With experience, you will learn where the line is best drawn for the systems you write.
With that said, my preference is to err on the side of more vs less normalization.
If you are using Oracle 9i, you could store address objects in your tables. That would remove the (justified) concerns about address formats.
I agree with S.Lott, and would like to add:
A good answer depends on what you know already. The basic "math" of relational database theory, however, defines very well-defined, distinct levels of normalization. You cannot normalize anymore when you've reached the ultimate normal form.
Depending on what you want to model with your three entities, and how you identify them, you can come up with very different conceptual data models, all of which can be represented in a mix of normal forms -- or not normalized at all (like one table for all data, with descriptors and NULL holes all over the place...).
Consider that you normalize your three entities to the ultimate normal form. I can now introduce a new requirement, or use case, or extension, which gives an up-to-now descriptive attribute an ordered, referencing, or structured nature if you look at its content. Then the model should represent this behavior, and what used to be an attribute will perhaps be better represented as a separate entity referenced by other entities.
Over-normalization? Only in the sense that you can normalize a given model to the point where it gets inefficient to store, or process, on a given DB platform. Depending on what can be handled efficiently there, you might want to de-normalize certain aspects, trading off redundancy for speed (data warehouse dbs do this all the time) and insight, or vice versa.
All (working) db designs I've seen so far have a rather normalized conceptual data model, with quite some denormalization done at the logical and/or physical data model level (speaking in Sybase PowerDesigner terms) to make the model "manageable" -- either that, or they were not working, i.e. they failed because the maintenance problems became king-size real quick.
When you say "address", I presume you mean a complete address, like street, city, state/province, maybe country, and zip/postal code. That's 4 or 5 fields, maybe more if you allow for "address line 1" and "address line 2", care-of's, etc. That should definitely be in a separate table, with an "addressid" to link to the Station, etc. tables. Otherwise, you are creating 3 separate copies of the same set of field definitions. That's bad news because it creates extra effort to keep them consistent. Like, what if initially you are only dealing with U.S. addresses (I'm an American so I'll assume U.S.), but later you find you also need to allow for Canadians? You'll need to expand the size of the postal code field and add a country code. If there's a common table, then you only have to do this once. If there isn't, then you have to do this three times. And it's likely that the "three times" is not just changing the database schema, but changing every place in your programs that processes an address.
One of the benefits of normalization is to minimize the impact of changes.
There are times when you want to denormalize to make queries more efficient. But this should be done very cautiously, only after you have good reason to believe that the fully normalized model creates serious inefficiency problems. In my humble experience, most programmers are far too quick to denormalize, usually with a quick "oh, breaking that out into a separate table is too much trouble".
I think in this situation it is OK to have address columns in each table. You'll hardly have an address which will be used more than twice; most of the addresses will be used just once per entity.
But what could go in an extra table are the names of streets, cities, countries...
And most important, every train station, accommodation and airport will probably have just one address, so it's an n:1 relation.
I can only add one more constructive note to the answers already posted here. However you choose to normalize your database, that very process becomes almost trivial when the addresses are standardized (look the same). This is because as you endeavor to prevent duplicates, all the addresses that are actually the same do look the same.
Now, standardizing addresses is not trivial. There are CASS services which do this for you (for US addresses) which have been certified by the USPS. I actually work for SmartyStreets where this is our expertise, so I'd suggest you start your search there. You can either perform batch processing or use the API to standardize the addresses as you receive them.
Without something like this, your database may be normalized, but duplicate address data (whether correct, or incomplete and invalid, etc.) will still seep in because of the many, many forms an address can take. If you have any further questions about this, I'll personally assist you.
Should dates for a temporal database be stored in one table or two? If one, doesn't this violate normalisation?
PERSON   DATE1   DATE2   INFO1   INFO2   STATUS
PERSON1  DATE11  DATE21  INFO11  INFO21  DEPRECATED
PERSON2  DATE21  DATE22  INFO21  INFO22  CURRENT
PERSON1  DATE31  DATE32  INFO31  INFO32  CURRENT
The DATE1 and DATE2 columns indicate that INFO1 and INFO2 are true for the period between DATE1 and DATE2. If DATE2 < TODAY, the facts are deprecated and shouldn't show up any more in the user interface, but they shouldn't be deleted, for historical purposes. For example, INFO11 and INFO21 are now deprecated.
Should I split this table ? Should I store the state (deprecated or current) in the table ?
To clarify the question further: Deprecated is the term used by the business; if you prefer, read it as "not current". The problem is not semantic, and it's not about SQL queries either. I just want to know which design violates, or best suits, the normalisation rules. (I know normalisation is not always the way to go; that is not my question either.)
"I want to know which design violates Normalisation rules"
Depends on which set of normalization rules you want to go by.
The first and most likely violation of normal forms - in Date's book it is a violation of first NF - is the end-date in the rows that hold "current" information (abstracting away the possibility of future-dated information): you violate 1NF if you make that attribute nullable.
Violations of BCNF may obviously occur as a consequence of your choice of keys (as it is the case in nontemporal database designs too - the temporal aspect makes no difference here). Wrt "choice of keys": if you use separate start- and end-dates (and SQL kind of leaves you no other choice), then most likely you should declare TWO keys: one that includes the start date, and one that includes the end-date.
Another design issue is the multiple data columns. This issue is discussed at length in "Temporal Data and the Relational Model": if INFO1 and INFO2 can change independently of one another, it might be better to decompose your tables to hold just one attribute each, in order to avoid the "explosion of row count" that might otherwise occur if you have to create a complete new row every time one single attribute in the row changes. In that case, your design as you gave it constitutes a violation of SIXTH normal form, as (that normal form is) defined in "Temporal Data and the Relational Model".
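A hypothetical sketch of that decomposition (the names are mine; the nullable end-date is kept here only to show where the 1NF concern above comes in):

    -- One table per independently changing attribute, each carrying
    -- its own validity period.
    CREATE TABLE person_info1 (
        person    VARCHAR(50) NOT NULL,
        date_from DATE        NOT NULL,
        date_to   DATE,            -- NULL for "current": the 1NF concern above
        info1     VARCHAR(100),
        PRIMARY KEY (person, date_from)
    );

    CREATE TABLE person_info2 (
        person    VARCHAR(50) NOT NULL,
        date_from DATE        NOT NULL,
        date_to   DATE,
        info2     VARCHAR(100),
        PRIMARY KEY (person, date_from)
    );

    -- A change to INFO1 now adds a row only to person_info1; the rows
    -- for INFO2 are untouched, avoiding the row-count explosion.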
Normalization is a Relational database concept - it does not apply as well to temporal databases. That's not to say that you cannot store temporal data in a relational database. You definitely can.
But if you are going with Temporal Database Design, then the concepts of Temporal Normalization apply rather than Relational normalization.
You have not indicated the meaning of the dates. Do they refer to (a) the period when the stated fact was true in real life, or (b) the period when the stated fact was believed to be true by the holder of the database? If (b), then I would never do it this way: move the updated line to an archive table/log immediately when the update is done. If (a), then the following statement is questionable:
"the facts are deprecated and shouldn't show any more in the user interface"
If a fact doesn't need to show up in the user interface anymore, then it doesn't need to be in the database anymore either. Keeping such facts there achieves only one thing: it deteriorates general performance for all the rest.
If you really need these historical statements of fact to suit your requirements, then chances are that your so-called "deprecated facts" are still very much relevant to the business, and therefore not "deprecated" at all. Assuming that for this reason there are very few "genuinely deprecated" facts in your database, your design is good. Just keep the number of "genuinely deprecated" facts small by periodically removing them from the operational database.
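Such a periodic sweep could look like the following sketch (assuming a hypothetical person_facts table with the DATE2 end-date column, and an archive table of the same shape):

    -- Move genuinely deprecated rows out of the operational table.
    START TRANSACTION;

    INSERT INTO person_facts_archive
    SELECT * FROM person_facts WHERE date2 < CURRENT_DATE;

    DELETE FROM person_facts WHERE date2 < CURRENT_DATE;

    COMMIT;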
(PS) To say that your design is good doesn't mean you won't run into any problems. SQL is extremely ill-suited to handling this kind of information elegantly. "Temporal Data and the Relational Model" is an excellent treatment of the subject. Another book, the one by Snodgrass, is often praised too, though not by me. That one is something of a cookbook with recipes for dealing with these problems in SQL, as proven by the following conversation on SO about this book:
(Q) "Why would I read that ?"
(A) "Because the trigger you asked for is on page 135."