Design principles for designing database architecture of financial transaction system? - sql-server

I want to design a database that will keep records of financial transactions. I want to design it as a product so that it can be used for any type of financial transaction. Are there design principles specific to financial transaction databases that can help me make the database durable over the long term, with minimal architectural changes? Some good examples would be a great help too.
Thanks

Some things particular to financial systems include internal controls (this is a critical accounting term; do some research and really think this one through). For example, the person entering the check value can't also approve it. Use stored procs rather than SQL generated from the application, so that you can restrict rights to only the procs (no dynamic SQL at all, ever, in a financial system) and users can only do what they are authorized to do. No one except the production dba and an alternate should have rights to the tables. Fraud is what you are trying to protect the system from, not just outside attacks. Security is critical to financial systems.
You also need audit tables to know who changed what data, when, and what the old value was. This is not only an additional way to help find problems if someone got around the internal controls (or the system failed to implement some critical ones) and stole money; it is often critical for undoing a mistake without having to restore from backup. In general, accounting systems often have data fields that are not viewable by the user and that are populated through default values or in some other way the user doesn't see.
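One common way to capture "who changed what, when, and what the old value was" is a trigger that writes to an audit table. The sketch below is only an illustration under assumed names (account, account_audit, modified_by); it uses SQLite through Python so it is runnable as-is, and a SQL Server version would use its own trigger syntax and the session's login instead of a modified_by column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id    INTEGER PRIMARY KEY,
        balance_cents INTEGER NOT NULL,
        modified_by   TEXT    NOT NULL
    );
    CREATE TABLE account_audit (
        audit_id    INTEGER PRIMARY KEY,
        account_id  INTEGER NOT NULL,
        old_balance INTEGER NOT NULL,
        new_balance INTEGER NOT NULL,
        changed_by  TEXT    NOT NULL,
        changed_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- record the old and new value every time the balance is updated
    CREATE TRIGGER trg_account_audit
    AFTER UPDATE OF balance_cents ON account
    BEGIN
        INSERT INTO account_audit (account_id, old_balance, new_balance, changed_by)
        VALUES (OLD.account_id, OLD.balance_cents, NEW.balance_cents, NEW.modified_by);
    END;
""")
conn.execute("INSERT INTO account VALUES (1, 10000, 'alice')")
conn.execute(
    "UPDATE account SET balance_cents = 7500, modified_by = 'bob' WHERE account_id = 1"
)
audit_rows = conn.execute(
    "SELECT account_id, old_balance, new_balance, changed_by FROM account_audit"
).fetchall()
```

With the old value on file, a mistaken change can be reversed by writing old_balance back, without touching a backup.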
Another thing is that you need to view actions in time, so things that might look like a natural relationship may need denormalizing to preserve what the cost was at the time the action happened. So if you have an hourly rate table, you would use it as a lookup to copy the rate onto the record at the time of the action, not join to it to get the current rate when you query.
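To make the hourly rate example concrete, here is a small sketch with hypothetical tables (hourly_rate, time_entry) in SQLite via Python. The rate is copied onto each time entry when it is logged, so a later rate change does not rewrite history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hourly_rate (
        employee_id INTEGER PRIMARY KEY,
        rate_cents  INTEGER NOT NULL
    );
    CREATE TABLE time_entry (
        entry_id    INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES hourly_rate(employee_id),
        hours       INTEGER NOT NULL,
        rate_cents  INTEGER NOT NULL  -- copied from hourly_rate when the entry is made
    );
    INSERT INTO hourly_rate VALUES (1, 5000);
""")

def log_time(employee_id, hours):
    """Insert a time entry, looking up and storing the rate in force right now."""
    conn.execute(
        "INSERT INTO time_entry (employee_id, hours, rate_cents) "
        "SELECT employee_id, ?, rate_cents FROM hourly_rate WHERE employee_id = ?",
        (hours, employee_id),
    )

log_time(1, 8)                                                        # logged at the old rate
conn.execute("UPDATE hourly_rate SET rate_cents = 6000 WHERE employee_id = 1")
log_time(1, 8)                                                        # logged at the new rate

# Cost comes from the stored rate, not from a join to the current rate.
costs = [row[0] for row in conn.execute(
    "SELECT hours * rate_cents FROM time_entry ORDER BY entry_id"
)]
```

Had the query joined to hourly_rate instead, both entries would have been costed at the new rate.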
Financial systems almost always have private data in them; think about how you are going to protect this data. You will need to encrypt and decrypt data. You probably want encrypted backups as well.
This data is the lifeblood of a company; it is critical that you have a good backup plan and plenty of practice restoring. Off-site backups are critical.
Data integrity is critical. You need the correct datatypes and you need pk/fk relationships, constraints and triggers to enforce the rules. A financial system can't afford to have orphaned records.
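As a small illustration of how pk/fk relationships keep orphaned records out, here is a sketch with hypothetical customer and invoice tables. It uses SQLite through Python, where foreign keys have to be switched on per connection; SQL Server enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is set
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    );
    INSERT INTO customer VALUES (1, 'XYZ');
    INSERT INTO invoice  VALUES (100, 1, 2500);
""")

def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# An invoice for a customer that does not exist would be an orphan.
orphan_insert_rejected = rejected("INSERT INTO invoice VALUES (101, 99, 500)")
# Deleting a customer that still has invoices would orphan those invoices.
parent_delete_rejected = rejected("DELETE FROM customer WHERE customer_id = 1")
```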
You need to consider deletes very carefully. Financial systems often do soft deletes (marking records as deleted) to avoid losing historical data. Yes, XYZ company is no longer a customer, but you don't want to lose the financial history of the orders they had in the past. I would not even consider using cascade delete in a financial system.
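A soft delete can be as simple as a nullable "deleted" marker plus a view that hides marked rows. The following is a rough sketch under assumed names (customer, customer_order, active_customer, deleted_at), again in SQLite via Python so it can be run directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        deleted_at  TEXT            -- NULL means the customer is still active
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total_cents INTEGER NOT NULL
    );
    -- day-to-day screens read from this view and never see soft-deleted rows
    CREATE VIEW active_customer AS
        SELECT customer_id, name FROM customer WHERE deleted_at IS NULL;

    INSERT INTO customer (customer_id, name) VALUES (1, 'XYZ'), (2, 'ABC');
    INSERT INTO customer_order VALUES (10, 1, 9900);
""")

# "Delete" XYZ by marking the row instead of removing it.
conn.execute("UPDATE customer SET deleted_at = CURRENT_TIMESTAMP WHERE customer_id = 1")

active = conn.execute("SELECT name FROM active_customer").fetchall()
history = conn.execute(
    "SELECT c.name, o.total_cents "
    "FROM customer_order o JOIN customer c ON c.customer_id = o.customer_id"
).fetchall()
```

XYZ disappears from the active list, but its order history is still there for reports.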
Don't just talk to accountants in designing the system, talk to financial people who will run the system and auditors who will audit the results. Read and know thoroughly the published accounting standard for the country you are designing for. Look at tax implications. This is complex stuff.
Think about data warehousing and archiving data. Financial systems often query old data for reports; reporting is big, big, big for financial systems. Think about how to do it effectively without affecting day-to-day data entry.

Depending on what you are actually trying to achieve, to create a "financial transaction" system that is useful you will need to teach yourself about journals, ledgers and other details of accounting. It isn't as simple as logging the actual transactions in a table...
Really, I don't think you will find database design principles for financial systems that are all that different from those for any database system that needs its information to be 100% correct.
Hence, reading the following when working with databases never hurt anyone:
Database Design Best Practices
Do you source control your databases?
Database Development Mistakes Made by App Developers
What are some of your most useful database standards?

Related

Bad real-world database schemas

Our master's thesis project is creating a database schema analyzer. As a foundation for this, we are working on quantifying bad database design.
Our supervisor has tasked us with analyzing a real world schema, of our choosing, such that we can identify some/several design issues. These issues are to be used as a starting point in the schema analyzer.
Finding a good schema is a bit difficult because we do not want a schema which is well designed in all aspects, but a schema that is more "rare to medium".
We have already scheduled the following schemas for analysis: wikimedia, moodle and drupal. Not sure in which category each fit. It is not necessary that the schema is open source.
The database engine used is not important, though we would like to focus on SQL Server, PostgreSQL and Oracle.
For now literature will be deferred, as this task is supposed to give us real world examples which can be used in the thesis. i.e. "Design X is perceived by us as bad design, which our analyzer identifies and suggests improvements to", instead of coming up with contrived examples.
I will update this post when we have some kind of a tool ready.
Check the Dell-dvd-store, you can use it for free.
The Dell DVD Store is an open source simulation of an online ecommerce site with implementations in Microsoft SQL Server, Oracle and MySQL along with driver programs and web applications
Bill Karwin has written a great book about bad designs: SQL antipatterns
I'm working on a project including a geographical information system. And in my opinion these designs are often "medium" to "rare".
Here are some examples:
1) Geonames.org
You can find the data and the schema here: http://download.geonames.org/export/dump/ (scroll down to the bottom of the page for the schema, it's in plain text on the site !)
It'd be interesting to see how this DB design performs with such a HUGE amount of data!
2) OpenGeoDB
This one is very popular in German-speaking countries (Germany, Austria, Switzerland) because it's a database containing nearly every city/town/village in the German-speaking region with zip code, name, hierarchy and coordinates.
This one comes with a .sql schema and the table fields are in english, so this shouldn't be a problem.
http://fa-technik.adfc.de/code/opengeodb/
The interesting thing in both examples is how they managed the hierarchy of entities like Country -> State -> County -> City -> Village etc.
PS: Maybe you could judge my DB design too ;) DB Schema of a Role Based Access Control
vBulletin has a really bad database schema.
"we are working on quantifying bad database design."
It seems to me like you are developing a model, or process, or apparatus, that takes a relational schema as input and scores it for quality.
I invite you to ponder the following:
Can a physical schema be "bad" while the logical schema is nonetheless "extremely good" ? Do you intend to distinguish properly between "logical schema" and "physical schema" ? How do you dream to achieve that ?
How do you decide that a certain aspect of physical design is "bad" ? Take for example the absence of some index. If the relvar that that "supposedly desirable index" is to be on, is itself constrained to be a singleton, then what detrimental effects would the absence of that index cause for the system ? If there are no such detrimental effects, then what grounds are there for qualifying the absence of such an index as "bad" ?
How do you decide that a certain aspect of logical design is "bad" ? Choices in logical design are done as a consequence of what the actual requirements are. How can you make any judgment whatsoever about a logical design, without a formalized and machine-readable way to specify what the actual requirements are ?
Wow - you have an ambitious project ahead of you. To determine what is a good database design may be impossible, except for broadly understood principles and guidelines.
Here are a few ideas that come to mind:
I work for a company that does database management for several large retail companies. We have custom databases designed for each of these companies, according to how they intend for us to use the data (for direct mail, email campaigns, etc.), and what kind of analysis and selection parameters they like to use. For example, a company that sells musical equipment in stores and online will want to distinguish between walk-in and online customers, categorize the customers according to the type of items they buy (drums, guitars, microphones, keyboards, recording equipment, amplifiers, etc.), and keep track of how much they spent, and what they bought, over the past 6 months or the past year. They use this information to decide who will receive catalogs in the mail. These mailings are very expensive; maybe one or two dollars per customer, so the company wants to mail the catalogs only to those most likely to buy something. They may have 15 million customers in their database, but only 3 million buy drums, and only 750,000 have purchased anything in the past year.
If you were to analyze the database we created, you would find many "work" tables, that are used for specific selection purposes, and that may not actually be properly designed, according to database design principles. While the "main" tables are efficiently designed and have proper relationships and indexes, these "work" tables would make it appear that the entire database is poorly designed, when in reality, the work tables may just be used a few times, or even just once, and we haven't gone in yet to clear them out or drop them. The work tables far outnumber the main tables in this particular database.
One also has to take into account the volume of the data being managed. A customer base of 10 million may have transaction data numbering 10 to 20 million transactions per week. Or per day. Sometimes, for manageability, this data has to be partitioned into tables by date range, and then a view would be used to select data from the proper sub-table. This is efficient for this huge volume, but it may appear repetitive to an automated analyzer.
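The "sub-tables by date range behind a view" layout described above looks roughly like this. It is only a sketch with made-up monthly tables (txn_2024_01, txn_2024_02) and a txn_all view, written for SQLite via Python so it runs as-is; SQL Server's own partitioned views or table partitioning would be set up differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one table per month; the CHECK keeps rows out of the wrong sub-table
    CREATE TABLE txn_2024_01 (
        txn_id       INTEGER PRIMARY KEY,
        txn_date     TEXT    NOT NULL
                     CHECK (txn_date >= '2024-01-01' AND txn_date < '2024-02-01'),
        amount_cents INTEGER NOT NULL
    );
    CREATE TABLE txn_2024_02 (
        txn_id       INTEGER PRIMARY KEY,
        txn_date     TEXT    NOT NULL
                     CHECK (txn_date >= '2024-02-01' AND txn_date < '2024-03-01'),
        amount_cents INTEGER NOT NULL
    );
    -- callers query the view and do not need to know which sub-table holds a row
    CREATE VIEW txn_all AS
        SELECT txn_id, txn_date, amount_cents FROM txn_2024_01
        UNION ALL
        SELECT txn_id, txn_date, amount_cents FROM txn_2024_02;

    INSERT INTO txn_2024_01 VALUES (1, '2024-01-15', 1200), (2, '2024-01-31', 800);
    INSERT INTO txn_2024_02 VALUES (3, '2024-02-01', 500);
""")

all_count = conn.execute("SELECT COUNT(*) FROM txn_all").fetchone()[0]
feb_total = conn.execute(
    "SELECT SUM(amount_cents) FROM txn_all WHERE txn_date >= '2024-02-01'"
).fetchone()[0]
```

The two tables are structurally identical on purpose, which is exactly the kind of repetition an automated analyzer might flag as bad design.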
Your analyzer would need to be user configurable before the analysis began. Some items must be skipped, while others may be absolutely critical.
Also, how does one analyze stored procedures and user-defined functions, etc? I have seen some really ugly code that works quite efficiently. And, some of the ugliest, most inefficient code was written for one-time use only.
OK, I am out of ideas for the moment. Good luck with your project.
If you can get ahold of it, the project management system Clarity has a horrible database design. I don't know if they have a trial version you can download.

Why do we need Audit Columns in Database Tables?

I have seen many database designs having following audit columns on all the tables...
Created By
Create DateTime
Updated By
Updated DateTime
From one perspective I see tables from the following view...
Entity Tables:
Good candidate for audit columns.
Reference Tables:
Audit columns may or may not be required. In some cases last-update information is not required at all because the record is never going to be modified.
Reference Data Tables
Like Country Names, Entity State etc... Audit columns may not be required because this information is created only at system installation time and is never going to be changed.
I have seen many designers blindly put all audit columns to all tables, is this practice good, if yes what could be the reason...
I just want to know because to me it seems illogical. It is difficult for me to figure out why do they design their db this way? I am not saying they are wrong or right, just want to know the WHY?
You can also suggest an alternative auditing pattern or solution, if one is available...
Thanks and Regards
Data auditing is a required internal control for many business systems (see Sarbanes-Oxley for reasons why). It must be at the database level to ensure that all changes are captured, especially unauthorized ones.
Even with lookup tables an unauthorized change could wreak havoc in your system, and thus it is important to know who made the change and when. When is especially important because it helps the dbas know how far back to go for a backup to restore information accidentally or maliciously changed.
We like to think all our employees are trustworthy, but many of the thefts of personal data and the malicious changes to destroy company data come from internal sources (this is why it is dangerous to have many disgruntled employees), as does almost all of the fraud. Yet most programmers seem to think that they only have to protect against outside threats.
Of course you are still going to have a few people who can make unauthorized changes, you can't prevent system admins from doing this. But with auditing at least you can limit the potential for data damage (and be especially careful when hiring dbas and allow no one else admin rights on your database servers).
These columns are for the benefit of the DBA and the database developers. They just provide a quick mechanism to answer questions like "When did this record last change?" "who changed it?" They are not robust enough or fine-grained enough to satisfy compliance with SOX, HIPAA or whatever.
It is simply easier to have these columns on every table. All data can change, so it is useful to know when changes happened, especially if that data isn't supposed to change. It is possible to automate the process of adding them, by using the data dictionary to generate scripts.
It is good practice for these columns to be populated independently of the application, by triggers or some similar mechanism. These columns are metadata, the application shouldn't really be aware of them.
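Here is a rough sketch of audit columns that the application never touches: a default fills in the creation time and a trigger maintains the update time. The country table and trigger name are hypothetical, and SQLite (via Python) is used only to keep the example runnable. SQLite has no notion of a logged-in database user, so the "Created By" / "Updated By" columns are left out here; on a server database the trigger would fill them from the session's login.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (
        country_code TEXT PRIMARY KEY,
        name         TEXT NOT NULL,
        created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- set by the database
        updated_at   TEXT                                       -- set by the trigger
    );
    -- the application only updates "name"; the trigger stamps the change
    CREATE TRIGGER trg_country_touch
    AFTER UPDATE OF name ON country
    BEGIN
        UPDATE country SET updated_at = CURRENT_TIMESTAMP
        WHERE country_code = NEW.country_code;
    END;
""")

# The application's statements never mention the audit columns.
conn.execute("INSERT INTO country (country_code, name) VALUES ('DE', 'Germny')")
before = conn.execute("SELECT created_at, updated_at FROM country").fetchone()

conn.execute("UPDATE country SET name = 'Germany' WHERE country_code = 'DE'")
after = conn.execute("SELECT created_at, updated_at FROM country").fetchone()
```

Even for a "static" lookup table like this one, the stamp shows that the row was touched after installation and when.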
Relying on a full-blown audit trail to provide this functionality is usually not an option. Audit data which is collected for compliance purposes usually has restricted access, and indeed may be stored in a separate physical location.
Many applications are developed using some OOP language in which there is generally a class like BusinessObject that contains what is perceived as generally helpful information, such as these auditing fields. Not all subclassing entities may need it, but it's there if they do. Since the overhead in the db is small and there is a good chance that the client will request another odd statistic based on the audit fields, it's better to have them around than not to have them at all. If something represents a static list of information such as country names, I generally wouldn't put it in the db at all; enumerated data types were created for just such purposes.
I came across this thread by chance, as the same question popped up in my mind this morning. Every answer makes a good point and I definitely agree with all of you. It is undeniable that business data and transaction data must be safeguarded. What the author doubts, rather, is the value of audit fields for configuration or static data.
This kind of configuration data is not updatable by users. Usually it can be placed elsewhere as well, like properties, config files or even hard-coded constants. Of course putting configuration data in these places might be bad design or style, but from the perspective of auditing, does it matter? In addition, if this data is not updatable by users, then the only ones who can update it are either dbas or hackers. Truly malicious dbas or hackers will already know the laws before they break them, and they do find ways to circumvent them.
To me, the question is more related to the environment in your company. Does your company have a culture of keeping track of every little bit of tiny information? Does your company constantly enforce strict discipline, monitoring or auditing? Having these auditing fields for non-user data are simply for their satisfaction, more than any other purposes.

What are the pros/cons of and best practices for using a single database?

Here at work (a multi-billion dollar manufacturing company with a 12 person Windows development team) we are about to go to a single master database for all new applications and will have it broken up with schemas for what we normally would have had databases for before. There will also be a few common schemas with stuff like employee directory and branch directory and so on...
I'm still not sure how I feel about this move, but we're about to have a meeting on this in a few hours to discuss pros, cons, best practices, pitfalls and so on... so I'm looking for your thoughts on this... Is it good? Is it bad? What problems are we going to run into a year from now?
Any thoughts, tips, or advice is welcome. Thanks
EDIT
In response to a comment on this question, we are using SQL Server 2005 and we are actually talking about moving what would have been separate databases on the same instance into a single database. The driving issue is the complete lack of referential integrity across databases, as the majority of our applications need access to common data such as an employee record or branch information.
UPDATE
Several people requested that I update this question with the results from our meeting so here it is. We debated back and forth the pros and cons of doing this (I even showed them this question using the projector) and by the time we were done we had pretty much covered the pros and cons covered here. About half of us thought we could get it done with the right resources and commitment, and about half thought we couldn't do it (or that it wouldn't work out well). We decided to use some time with Microsoft to get their thoughts and platform specific advice. I will be sure to update this question and my blog after we've talked to them. Thanks for all the help and helpful answers.
Larger databases are harder to maintain due to sheer size: backups take longer and disaster recovery is slower, which in turn requires more frequent backups. You can address these by creating filegroups and using filegroup-level backups in your maintenance plans, and on crash recovery you can use the 'piecemeal restore' strategy to speed things up.
Proper use of filegroups will make most of the 'cons' cited by previous replies go away: they can distribute the I/O, they can sanitize your maintenance plans and backup/restore strategy, and they offer availability by taking offline only the damaged portion of the db in case of a crash. So I'd say that while those 'cons' are legitimate concerns, they can be mitigated by a proper deployment strategy. It's true, though, that these mitigation actions require a truly experienced dba at the helm, as they go beyond the comfort zone of a developer turned dba by need.
Some of the pros I can think of quickly:
Consistency. You can have a backup-restore so that all data is consistent. Separate dbs don't allow this because you cannot coordinate a consistent set of backups unless you take them all offline, or make them r/o, during the backup.
Dirt cheap high availability: you can deploy database mirroring for disaster recoverability and high availability. Multiple databases have problems because one cannot coordinate a simultaneous failover and apps are faced with the dilemma of seeking each database current location.
Security. While most other posts see one database as harder to secure, I'd say it is easier to secure. Multiple databases seem harder to secure properly simply because what everyone does is make one login and add it to each database's db_owner group. Having one database will make things harder (unless you end up making everyone dbo, which is very bad), but once you start doing the right thing (granular access), one db is not harder than multiple dbs; it is actually easier because you won't have to copy/maintain common groups/rights across multiple dbs.
Control. Will be easier to impose certain policies and good practices on a single db rather than multiple ones (no data access to developers, app data access only through execute rights on the schema to enforce procedures access etc).
There are also some cons I did not see in other posts:
This will be much harder to pull off than you think right now
Increased coupling between formerly separate applications will impose development restrictions: you can't simply alter your schema, you will have to coordinate it with the rest of the apps (you can argue that this was also the case before, but was brushed under the carpet by having separate dbs, and you're right)
Log writes that are now distributed across multiple db logs will be consolidated into one single log file. If your writes are significant, this may turn out to be a serious bottleneck and force you to buy some expensive fast drives for the new, consolidated log file. In general this can be addressed by making the log drive a striped array across as many stripes as needed to make it fast enough (usually RAID 10).
GAM/SGAM/PFS allocations will also be consolidated, but again this will be alleviated by proper use of file groups.
Pros:
You only need to remember one connection string
When users report that access is slow, you know which DB is causing the trouble
Cons:
Backups of The One DB will take a long time and will get progressively longer over time.
Restoring data from a backup will get increasingly difficult.
Performance Tuning (SQL Profiler, Execution Plan estimation) for a feature for one app will slow down every app.
Restricting access to a single application's data is cumbersome if at all possible which will likely mean in practice that all devs and DBAs will be given keys to the ENTIRE kingdom.
New developers/DBAs have a much larger learning curve as they need to navigate a large and mostly useless (to them) database structure which means higher costs for training/ramp up.
When The One database goes down, everyone in your organization plays solitaire until it is restored.
Creating test instances for app development means copying your entire db
The only "Pro" I can think of is that all of your systems will be in the one database and therefore a single place to backup, store, etc. However, I would consider this to also be one of the biggest "Cons".
Some other general Cons:
Much harder to move an application to a different location/server in the future.
Possible locking issues if any applications make use of tempdb.
Possible unrelated performance degradation on one application when another application is being used.
Much harder to implement an application level security model if all tables are in the same database.
It sounds to me as though your company is transitioning between two completely distinct motives for using database technology. The first is application support. The second is data integration. If I'm right about this, the process will open up a huge can of worms, and many of the issues won't even be addressed by putting all the data in one big database.
Consider two of the points you made. The first is the complete lack of referential integrity across different databases. The second is the idea that each application will have its own schema. What this permits to happen is complete lack of referential integrity across schemas, putting you back in the quicksand you are in now.
Fixing the data so that referential integrity is present, and fixing the schemas so that referential integrity is enforced, and fixing the applications so that the applications agree with the new schemas will turn out to be a monumental task.
Here's what your company really needs to do: Have one single CONCEPTUAL database that contains all "enterprise data", and defined in such a way that both referential integrity and entity integrity are enforced. Revise existing schemas so that they conform to the CONCEPTUAL database except for data that is both purely local to that schema and undocumented in the unified conceptual database. Use constraints wherever needed to guarantee that the data covered by these schemas doesn't lose integrity.
Make the decision about whether these schemas belong in one database or many databases based on database administration, fail soft, security, and performance requirements and NOT on the need to integrate data. Whether you use one platform or multiple platforms is a separable decision.
Where necessary, maintain synchronized copies of the same data in separate databases. Include the overhead of doing this in your performance considerations above.
Document the conceptual database out the gazoo. Don't just settle for definitions of the FORM of data. Insist on definitions of the semantics of the data as well.
Notice that if you use ID fields instead of natural keys to enforce referential integrity, you will have to generate each ID field in one schema, and let the association between ID and dependent data propagate by means of synonyms, views, and synchronized replication.
This is not going to be easy.
As the DB gets bigger, backing it up gets more difficult because of its size.
This could mean a serious scalability problem if you want to add high-traffic applications in the future, since it is much easier to add new database servers which run separate dbs than it is to parallelize a single DB. At least in SQL Server.
Pros:
The convenience of having everything in one place
Thinking less about good database design
Cons:
Even unrelated things are in one place
Less thinking about good database design leading to poorly normalized data
To me this just sounds like laziness and a belief that all this "fancy ivory tower database stuff" is worthless.
I can see that being scary, but considering the number of businesses that use Oracle EBS, or SAP, or other systems that are, in essence, this same configuration, I don't see it being a Bad Thing™. It's a big move, and will be tough to get correct, but it can really improve integration across the enterprise in the long run.
I've never heard of this approach and would like to know how the meeting goes. I see no real benefit in combining multiple applications into a single database when the data doesn't relate to each other.
I'm thinking you might have issues if you decide at some point that an application requires its own database server.
Ah, the old EggsInOneBasket design pattern. It's not a favourite.
You're just compounding any problems caused by damage to that database. Spread the risk!
For the referential integrity issue, you can make copies of those shared tables in the subsidiary databases. You can't use real replication, but what you do is deny everything but select on these to most users.
On the same server, you can either push or pull data from the official repository of the master data and insert any new rows/update any changed rows. You can even do this with a trigger in the master database (I don't recommend it, though).
If it's different instances or servers, you can use linked servers or SSIS.
You can put the common data into a "core" schema in each database. Then you can have tools to check that all your core tables in every subsidiary database are consistent. The worst that can happen is that an application does not see a new employee because the core isn't updated. And keeping your databases separate lets you decouple them and gives you maintenance windows. (You can even decouple and run "standalone" if your master is down for maintenance.)
I expect you'll only be seeing a few dozen of these core entity tables in even a largish enterprise.
There are many other ways to solve the referential integrity (RI) issue. I am not as familiar with SQL Server as with other DBs. In Informix you can use synonyms to point to objects in other DBs and use these for your RI. In Oracle you can create DB links to one or more DBs to accomplish the same thing.
These approaches have the issue that if any of the DBs are down, the RI will fail, causing issues in the dependent DBs. Selects would work, but inserts would fail.
Consolidation can be a good idea, depending upon the size of the schemas and other issues with scalability. SQL Server has serious scalability issues. Other DB platforms allow horizontal scaling with either a share-everything approach (Oracle's RAC, latest Informix release) or a partitioned share-nothing approach (DB2's DPF, Informix XPS, Netezza, Teradata).
I am with some of the others here interested to hear the results of your meeting.

Data Warehouse Considerations: When and Why? [closed]

A little background here:
I know what a data warehouse is, more or less. I've read several dozen guides on data warehousing, I've played with SSAS, I know what a star schema and a dimension table and a fact table is, I know what ETL is and how to do it. This is not a "how" question or a request for tutorials.
My issue is that all of the material I've read on data warehousing seems to gloss over the rationale for building a data warehouse. They all figuratively, or in some cases literally start with the phrase "so you've decided to build a data warehouse..." Except I haven't made that decision yet.
So I'm hoping that SO members can point me to, or help come up with, some kind of semi-objective test. Something that I can adapt to a particular system and end up with either "yep, we need a data warehouse" or "no, the payoff today would be too small." I think that the specific questions I should be able to answer are:
At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?
What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?
Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?
When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?
Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.
I generally try not to ask multi-parters but I think that these are all very closely-related. I'm willing to accept any answer that addresses at least the first 4 questions, although the last would really help to crystallize this in my mind. Links are fine if somebody's already written about this, as long as they're reasonably concise and specific (link to Ralph Kimball's home page = not helpful).
Hope I've made the question clear - thanks in advance for your answers!
I'll see if I can do my best to answer your questions succinctly.
1. At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?
a. If you find that reporting and monitoring are impairing the performance of your production system and/or an offline data store.
b. If you find that getting answers to your business questions requires building a lot of complex SQL each time.
c. If you find that every time you make a change to your transactional schema, you have to go back and rework all of your reporting queries.
d. If you want to bring together data from multiple sources.
2. What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?
3. Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?
I'll answer these together. I wouldn't think of a data warehouse as an all or nothing venture. It's simply a concise phrase that means "storing your data in a way that allows you to more easily and quickly answer business questions."
Transactional databases are designed to efficiently interface with applications. Data warehouses, data marts, operational data stores and reporting tables are built to efficiently interface with people, if that makes sense.
4. When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?
Good question. If your transactional system provides you with sufficient insight into your business, you probably do not have a need for warehousing.
If you only have one source of data and performance is not a problem, you can probably gain insight from creation of simple reporting tables.
5. Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.
That's a big question that would take far more space than I'm allotted here.
On this one, I can point you to a few places that might provide the insight you seek.
"Implementing A Data Warehouse: A Methodology that worked" by Bruce Ullrey is a book documenting one man's journey to building a data warehouse. It's not highly polished, which gives it more realism. It reads like a journal with lots of models and other visuals that illustrate his efforts pretty well.
"Business Intelligence Roadmap" by Larissa Moss. Standard fare. Walks you through the process of building a BI practice at a high level.
"The Profit Impact of Business Intelligence" by Steve Williams gives a number of case studies that show the value of building data warehouses.
The main purpose of a DW is to speed up (and simplify) reporting and analytics. It enables slicing and dicing of data in any way a business user can think of.
For a first step DW, you can simply implement a Kimball star schema and run SQL queries against it. If this proves to be still too slow, start thinking about pre-calculated aggregations (cubes).
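As a rough illustration of that first step, here is a minimal sketch of a star schema queried with plain SQL, using Python's built-in `sqlite3` so it runs anywhere. The table and column names (`fact_sales`, `dim_date`, `dim_product`, `amount_cents`) and the sample rows are made up for the example; a real warehouse would be loaded by an ETL process rather than by hand.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes users slice and dice by.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table: one row per sale, with keys into each dimension plus measures.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount_cents INTEGER
);

INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240210, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO fact_sales VALUES
    (20240115, 1, 3, 3000),
    (20240115, 2, 1, 1200),
    (20240210, 1, 2, 2000);
""")

# Slice by product category: one join from the fact table to one dimension.
by_category = conn.execute("""
    SELECT p.category, SUM(f.amount_cents)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# Slice the same facts by month instead: only the joined dimension changes.
by_month = conn.execute("""
    SELECT d.year, d.month, SUM(f.amount_cents)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year, d.month
    ORDER BY d.year, d.month
""").fetchall()

print(by_category)
print(by_month)
```

Every report has the same shape: the fact table joined to whichever dimensions the user wants to group by. That is why slicing and dicing tends to be simpler here than against a normalized schema.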
Slicing and dicing information against a DW is far simpler than against a normalized DB. A replicated report server will improve performance, but will not simplify slicing and dicing. Also keep in mind that the DW belongs to the business users, so it is up to them to come up with various slice/dice ideas at any time; IT people should simply provide an environment in which this is possible.
If you just run a few reports from time to time on your operational system and are satisfied with the performance, there is no need for a DW.
All my experience is with systems where business users endlessly complain about slow reports and inability to write "complicated queries", while production people complain that the database gets bogged down due to reporting. In all cases a simple Kimball star and a report server with cache and snapshots were good enough.
You should consider building a data warehouse when at least two of the following criteria apply:
Huge amount of data
Many big complex selects (possibly compared to few inserts, updates, and deletes) that just take too long to execute (and are complicated to write)
Data from different systems needs to get combined
It really comes down to what you consider a data warehouse. In many cases you can move gradually from OLTP systems with some reports to a full-blown data warehouse, as long as you stick to a relational database management system. A first step could be to build a single fact table and keep using the normalized tables as dimensions. You can then add more facts, more fact tables, or dedicated dimension tables, at first in the same database (or one of the databases of the involved systems), possibly moving to a separate database later.
A full data warehouse (separate database, star schema) offers the best options for tuning select statements, apart from going to a specialized system. It is also cleanly decoupled from the OLTP system(s). That decoupling covers schema design, but also resources such as CPU, I/O, and memory, and organizational matters such as the scheduling of new releases. Of course, it is a lot of work that you may not need.
It's in the answers above: just because you have a handful of complex queries doesn't mean you should build a DWH. The same holds for the other criteria if they occur in isolation.
I can't offer much here except this advice: go agile. The requirements for a DWH depend heavily on the possibilities the users see, so they are likely to change. Automating tests with databases is a pain, but fooling around in a production system with no proper tests is worse.
At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?
I'd recommend a data warehouse when you observe that performing reporting and analysis activities on the transactional data store is harmful to both.
What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?
I have nothing to offer here. I'd say that keeping the transactional and reporting databases separate seems sensible to me, regardless of whether you call it a warehouse or not. Data mining can be a very CPU-intensive activity.
Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?
I have nothing to offer here.
When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?
I'd say that if you don't need to keep long history, aren't doing intensive analysis of the data, and your reporting needs are limited to an ad hoc query from time to time, then perhaps a data warehouse isn't necessary.
Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.
My employers have all used data warehouses for many years prior to my arrival, so I can't speak to what things were like before I arrived.
From my experience, the first sign for starting to think about data warehousing is when you have (or are developing) a transactional database and the users start adding lots of reporting and data history requirements. Which is pretty much always. It's always easier to have a separate data warehouse or reporting database than to try to design a transactional system that handles the reporting needs that end users always have. Storing history (for business entities) in a transactional system adds complexity and bloats a database that should be as responsive as possible.
On the flip side, I've been in large companies where many groups created data warehouses because data of interest was spread across many systems and was therefore difficult to query. The problem was that each group created their own data warehouse because all the existing warehouses in the company did not have the right subset of information, or had a data model that was regarded as non-optimal or incorrect. This made the situation worse by creating even more disparate data systems that were hard to compare.
A DW could be considered when an organization has been running a transactional system for a long time and then realizes it needs to do some data mining to identify patterns in the business, and ultimately wants to use those patterns to help top management make decisions that benefit the company.
The following steps need to be taken to build a data warehouse:
An ETL platform and a database need to be chosen for the warehouse.
A reporting tool such as SSRS, Tableau, etc. needs to be chosen for visualization.
One may also opt for a data analysis language such as R for further use.
Together, these choices support development of the data warehouse and its reporting.
Why do some data warehouse projects fail?
There are five primary reasons:
lack of partnership between the IT department and business users;
incorrect data warehouse architecture;
not enough experienced people;
improper planning, such as failure to use a proven methodology and a plan to ensure that no details are omitted;
and depending on bleeding-edge technology.

Audit trails and implementing SOX/HIPAA/etc, best practices for sensitive data

I consider myself to be relatively proficient in terms of application design, but I've never had to work with sensitive data. I've been wondering about what the best practices were for audit trails and how exactly one should implement them. I don't have to do it right now, but it'd be nice to be able to confidently talk with a medical company if they ask me to do some work for them.
Let's say we have a "school" database, with 'teachers', 'classes', 'students' all normalized in a many-to-many 'grades' table. What would you log? Every insert/update on the 'grades' table? Only updates (say, a kid breaks in and wants to change grades, this should send up red flags)? Does this vary entirely based on how paranoid one wants to be? Is there a best practice?
Is this something that should be done in the database? (A trigger on each sensitive SELECT which inserts a row to an 'audit' table logging each query?) What should be logged? Is there functionality automatically built into Oracle/DB2 that does it for you? Should this be application-side logic?
If anyone has any formal documentation/books on how to deal with sensitive data (not quite DoD "Trusted Computing" spec, but something along the lines of that :P), I'd appreciate it. I'm sorry if this question is terribly vague. I realize that this varies from application to application. I just want to hear your detailed experiences with dealing with sensitive data.
The first thing to understand is the native auditing capabilities of your chosen DBMS. These vary in detail, but generally provide a way to configure which operations are audited, and provide secure storage for the audit records that they generate.
The next thing to understand is what you want to audit. In the case of HIPAA and SOX, for example, you are probably looking at PII (Personally Identifiable Information). Remember the fuss made about people accessing Obama's phone records, or various celebrities' medical records, or ... Those were caught because the system audited who read those records, and the audit analysis officer (AAO) spotted that the celebrity records were accessed by people who were not specifically authorized to do so. So, those systems must be logging who accesses each record, and spotting when the user who does so does not have an authentic business reason to do so. In these cases, it appears that the users had read authority for the records, so if their ordinary duties required them to look at the records, they could do so. But when they were not required to do so, they were abusing their power and were appropriately sanctioned (up to and including losing jobs over it).
What this means is that you probably don't want to track who accesses the table of States which records the state code and full name (and assorted other bits of information about the state). There is nothing confidential about that list - it doesn't matter who reads it. Of course, almost no-one should write to it; the list of states does not change very often - but that can probably be handled by revoking update and delete permission on the table from everyone.
OTOH, you probably do want to record who accesses the records in medical histories (HIPAA), or who modifies the data in the accounting systems (SOX). You might or might not need to worry about who reads the accounting data; a lot of that can be dealt with by basic permissions (accounting staff have permission; IT staff do not). However, auditing is always an extra line of defense.
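To make the "who modified what" part concrete, here is a minimal sketch of a trigger-based audit table for the 'grades' example from the question, using Python's built-in `sqlite3`. All names (`grades_audit`, `modified_by`, the user strings) are hypothetical. SQLite has no database users, so the sketch assumes the application writes the acting user into a `modified_by` column; a server DBMS would normally take this from the session instead. It covers writes only; auditing reads would need the DBMS's native audit features or application-side logging.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grades (
    student_id INTEGER,
    class_id INTEGER,
    grade TEXT,
    modified_by TEXT,          -- assumption: set by the application on every write
    PRIMARY KEY (student_id, class_id)
);

-- Audit table: one row per change, keeping both the old and the new value.
CREATE TABLE grades_audit (
    audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id INTEGER,
    class_id INTEGER,
    old_grade TEXT,
    new_grade TEXT,
    changed_by TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Fires whenever an UPDATE statement sets the grade column.
CREATE TRIGGER grades_audit_update AFTER UPDATE OF grade ON grades
BEGIN
    INSERT INTO grades_audit (student_id, class_id, old_grade, new_grade, changed_by)
    VALUES (OLD.student_id, OLD.class_id, OLD.grade, NEW.grade, NEW.modified_by);
END;
""")

# A legitimate insert, followed by a suspicious update.
conn.execute("INSERT INTO grades VALUES (1, 101, 'C', 'teacher_a')")
conn.execute(
    "UPDATE grades SET grade = ?, modified_by = ? WHERE student_id = ? AND class_id = ?",
    ("A", "student_1", 1, 101),
)

audit_rows = conn.execute(
    "SELECT student_id, class_id, old_grade, new_grade, changed_by FROM grades_audit"
).fetchall()
print(audit_rows)
```

Because the trigger lives in the database, it records the change no matter which application path issued the UPDATE. It only helps if ordinary users cannot write to the audit table themselves.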
Bear in mind that audit records are no help whatsoever if they are never looked at. In general, auditing slows a system down (simply because it is doing more work when it writes audit records); it is important to understand how much it slows down before deciding to implement your auditing strategy. However, there are some things that are more important than application speed, and one of those is keeping yourself and other staff members out of jail. Auditing can be necessary to ensure that happens.
Oracle has a product called Oracle Audit Vault; DB2 probably has an equivalent.
You should start with prevention. The system should not allow invalid actions. Period. If the system allows 'dubious' actions that need to be monitored, that's "business logic", and you are probably better off implementing it like the rest of your business logic.
If you want to do something in your database, you can look into log shipping (terminology might differ from RDBMS to RDBMS). Basically, any DML operation is logged to a file. You can use this information for backups and point-in-time recovery, even for replication/HA/failover/etc. If you ship your logs to a separate, "trusted" system in an "append-only" fashion (i.e. the log shipping process has privileges to create new log files, but not to modify existing information), you already have primitive auditing functionality. If you do it in a secure way (i.e. authentication, non-repudiation), you are probably even quite close to "compliance" :-p
Of course, sifting through lots and lots of INSERT/UPDATE/DELETE statements is not the most sophisticated way to work.
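The append-only idea can be illustrated with a small sketch. This is not log shipping itself: it swaps in a hash chain, a different and much simpler technique, to show how a trusted system could detect that already-received log entries were modified later. The function names and the entry layout are invented for the example. A hash chain gives tamper evidence only; real non-repudiation would additionally need something like digital signatures.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(statement, user, prev_hash):
    # Hash a canonical JSON encoding of the entry's contents.
    body = {"statement": statement, "user": user, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, statement, user):
    """Append an entry whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "statement": statement,
        "user": user,
        "prev_hash": prev_hash,
        "hash": _digest(statement, user, prev_hash),
    })


def verify(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(entry["statement"], entry["user"], entry["prev_hash"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, "UPDATE grades SET grade = 'A' WHERE student_id = 1", "student_1")
append_entry(audit_log, "DELETE FROM grades WHERE student_id = 2", "teacher_a")
intact = verify(audit_log)

# Simulate someone rewriting history on the log host.
audit_log[0]["statement"] = "-- edited"
tamper_detected = not verify(audit_log)
print(intact, tamper_detected)
```

Editing any earlier entry changes its hash, which no longer matches the stored value or the `prev_hash` of the next entry, so the check fails from that point on.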

Resources