Best practices for creating a data model

For a current project I'm creating a data model. Are there any sources where I can find "best practices" for a good data model? Good means flexible, efficient, with good performance, style, ... Some example questions would be "naming of columns", "what data should be normalized", or "which attributes should be split out into their own table". The source should be a book :-)

Personally I think you should read a book on performance tuning before beginning to model a database. The right design can make a world of difference. If you are not an expert in performance tuning, you aren't qualified to design a database.
These books are database specific; here is one for SQL Server.
http://www.amazon.com/Server-Performance-Tuning-Distilled-Experts/dp/1430219025/ref=sr_1_1?s=books&ie=UTF8&qid=1313603282&sr=1-1
Another book that you should read before starting to design is one about antipatterns. It's always good to know what you should avoid doing.
http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1313603622&sr=1-1
Do not get stuck in the trap of designing for flexibility. People use that as a way to get out of doing the work of designing correctly, and flexible databases almost always perform badly. If more than 5% of your database design depends on flexibility, you haven't modeled correctly, in my opinion. All the worst COTS products I've had to work with were designed for flexibility first.
Any decent database book will discuss normalization. You can also find that information easily on the web. Be sure to actually create the PK/FK (primary key/foreign key) relationships.
As far as naming columns goes, pick a standard and stick with it consistently. Consistency is more important than the actual standard. Don't name columns ID (see the SQL Antipatterns book). Use the same name and datatype if a column is going to appear in several different tables. What you are going for is not having to use functions in joins because of datatype mismatches.
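A minimal sketch of that last point (the table and column names are made up), with an explicit PK/FK relationship and the same column name and datatype on both sides of the join:

CREATE TABLE Customer (
    customer_id INT NOT NULL PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);

CREATE TABLE CustomerOrder (
    order_id INT NOT NULL PRIMARY KEY,
    customer_id INT NOT NULL,   -- same name, same datatype as Customer.customer_id
    order_date DATE NOT NULL,
    CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY (customer_id) REFERENCES Customer (customer_id)
);

-- The join needs no CAST/CONVERT because the datatypes already match
SELECT o.order_id, c.customer_name
FROM CustomerOrder o
JOIN Customer c ON c.customer_id = o.customer_id;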
Always remember that databases can (and will) be changed outside the application. Anything that is needed for data integrity must be in the database, not in the application code. The data will be there long after the application has been replaced.
The most important things for database design:
Thorough definition of the data needed (including correct datatypes) and the relationships between pieces of data (including correct normalization)
data integrity
performance
security
consistency (of datatypes, naming standards etc.)

The best book I've read on the design of database systems was "An Introduction to Database Systems". Joe Celko's SQL for Smarties books are also worth reading.
Assuming you're building an application and not just a database, and assuming you're using an Object Oriented language, Applying UML and Patterns by Craig Larman has a good discussion on mapping databases to objects.
In terms of defining "good", in my experience "maintainable" is probably top of the list. Maintainability in database design means many things, such as sticking to conventions - I often recommend http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/. Normalization is another obvious maintainability strategy. I often recommend being generous with column types - it's hard to change an application if you find out that postal codes in different countries are longer than in the US. I often recommend using views to abstract complex data relations away for less experienced developers.
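For example, a view can hide a multi-table join behind a simple name. A minimal sketch (the tables and the is_active flag are made up):

CREATE VIEW ActiveCustomerOrders AS
SELECT c.customer_id, c.customer_name, o.order_id, o.order_date, o.total_amount
FROM Customer c
JOIN CustomerOrder o ON o.customer_id = c.customer_id
WHERE c.is_active = 1;

-- Less experienced developers can now simply write:
-- SELECT * FROM ActiveCustomerOrders WHERE order_date >= '2024-01-01';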
A key thing with maintainability is the ability to test and deploy. It's worth reading up about Continuous Database Integration (http://www.codeproject.com/KB/architecture/Database_CI.aspx) - whilst not strictly associated with the design of the database schema, it's important context.
As for performance - I believe you should design for maintainability first, and only design for performance if you know you have a problem. Sometimes, you know in advance that performance will be a major problem - designing a database for Facebook (or Stack Exchange), designing a database with huge amounts of data (terabytes and up), or huge numbers of users. Most systems don't fall into that camp - so I recommend regular performance tests, with representative data, to find if you have a problem, and only tune when you can prove you have to. Many performance optimizations are at the expense of maintainability - denormalization, for instance.
Oh, and in general, avoid triggers and stored procedures if you can. That's just my opinion, though...

Even though it is not a book, I recommend reading "Query Evaluation Techniques for Large Databases". It gives a background on query processing, which largely influences your schema design, especially for data-intensive (e.g., analytical) workloads. It is less hands-on, but I believe every database designer should read it at least once :-).


How to choose a DBMS?

I am learning to design systems, specifically the database part of them, and I know a lot of similar posts are available on SO, but either they are a decade old (and a lot has changed since then) or they don't have helpful answers.
All the previous answers or blog posts compare relational and non-relational DBMSs (or SQL and NoSQL) in some way, and I can't see a well-defined line between the two, because as soon as I read about a property of a NoSQL DBMS, I find there's a recent version of a SQL DBMS that provides the same property, and vice versa. For example, document data stores keep data in JSON-like objects and you can even query into these objects, but then I found that PostgreSQL also provides this functionality.
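For instance, a minimal sketch (the docs table and its contents are made up) of storing and querying JSON documents in PostgreSQL via the jsonb type:

CREATE TABLE docs (
    id serial PRIMARY KEY,
    body jsonb NOT NULL
);

INSERT INTO docs (body)
VALUES ('{"name": "Alice", "address": {"city": "Berlin"}}');

-- Query into the nested document, much like a document store would
SELECT id, body ->> 'name' AS name
FROM docs
WHERE body -> 'address' ->> 'city' = 'Berlin';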
The next common point of comparison is scalability. Well, I can see a lot of giants like Amazon using SQL DBMSs, and they don't seem to have any scalability issues.
The next point of comparison is that SQL DBMSs enforce a schema. We can kind of do that with MongoDB schema validation.
And now NewSQL databases claim to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.
So it seems like relational and non-relational DBMSs are moving towards each other.
Keeping all these things in mind, how do you choose a DBMS? I mean, what points do you consider? What's the thought process?
Thank you!
Choosing as a platform for learning or choosing for a specific project? That really does have an impact. If it's for a specific project then the choice will depend on the needs of the project... where is the data coming from? What is the purpose of storing it? (Is it for record keeping, transactional, etc.) Who will access it and how? What needs to be accomplished?
There are a lot of options to consider, that's a fact, but it's helpful to consider the purpose of the project to realize which systems are a better fit.
Beyond that, another major factor in a decision is the skills of the people involved or what is already available in the company (if anything).
Do you have specific details at this point, or is this more of a learning exercise?

Why did object oriented databases fail?

Why did object oriented databases fail?
I find it astonishing that:
Foo bar = new Foo();
bar.saveToDatabase();
Lost to:
Foo bar = new Foo();
/* write complicated code to extract stuff from foo */
/* write complicated code to write stuff to database */
Related questions:
Are Object oriented databases still in use?
Object Oriented vs Relational Databases
Why have object oriented databases not been successful (yet)?
Probably because of their coupling with specific programming languages.
First, I don't believe they have "failed" entirely. There are still some around, and they're still used by a couple of companies, as far as I know.
Anyway, probably because a lot of the data we want to store in a database is relational in nature.
The problem is that while yes, OO databases are easier to integrate into OO programming languages, relational databases make it easier to define queries and, well, relations between the data stored. Which is often the complicated part.
I have been using db4o (an object oriented database) lately on my personal pet projects, just because it is so quick to set up and get going. No need to fuss with the nitty-gritty details.
That aside, as I see it, the main reasons why they haven't become popular are:
Reporting is more difficult on object oriented databases. Related to this, it is also easier to manually look into the actual data in a relational database.
Microsoft and Oracle base so much of their business on relational databases.
Lots of businesses already have relational databases in place.
The IT departments often have relational database expertise.
And, as Jan Aagaard has pointed out, lately it is because the problem has been solved in a different way, giving programmers the object oriented feel even though they program against a relational database.
There are countless numbers of existing applications out there storing their data in relational databases. This data is the lifeblood of those companies. They have collectively invested huge amounts in storing, maintaining and reporting on this data. The cost and risk of moving this priceless information into a fundamentally different environment is extremely high.
Now consider that ORM tools can map modern application data structures into traditional relational models, and you remove pretty much any incentive to migrate to an OODBMS. They provide a low-risk alternative to a very costly, high-risk migration.
Because, as much as ODBMS advertisements were laden with derogatory language about ORM systems, it wasn't that hard to make ORMs do the job, and without all the various hits taken in switching to a pure ODBMS.
What actually happened is that your first code sample won; it just happens to sit on top of an RDBMS persistence layer.
I think it is because the problem was solved differently. You might be using a relational database behind the scenes when you are coding in Ruby on Rails or LINQ to SQL, but it feels like you are working with objects.
Very subjective, but a few reasons come to mind:
Performance has not been as good as relational databases (or at least that's the perception)
Again with performance - relational databases allow you to do things like denormalizing data to further improve performance.
Legacy support for all the non-OO apps that need to access the data.
I think a lot of your answer lies in the "Why we abandoned Object Databases" answer of "Object Oriented vs Relational Databases".
As far as your example goes, it doesn't have to be that way. Linq to SQL is actually a quite nice basic layer over a DBMS, and Linq to Entities (v2 -- v1 sucked) will be pretty cool too. (N)Hibernate has been solving the problem you're having for years now using RDBMSes.
So I guess my answer to you is that O/R mappers are getting to the point where they solve your problem nicely, and you don't need an ODBMS to get what you need.
They will succeed some day. They are the future.
Looking back at software technologies through history, the trend is sacrificing performance to reduce complexity (Assembly => C => C++ => .NET). An application that takes 30 minutes to code now once took a month.
ORMs are the right answer to the wrong question. Currently they are the choice, since they make life easier in the absence of a better solution, but they cannot handle the level of complexity they aim to. "Problems cannot be solved by the same level of thinking that created them." - A. Einstein
As others have mentioned, relational databases are heavily used and relied upon, and replacing them carries a lot of risk. Look at the interval between SQL Server versions and the major changes between these versions and other Microsoft products (a conservative approach, which is necessary here). I'll also add the following items:
The current approach still works. You may argue it will work forever (we can still write assembly, after all), but what I mean is that it stops making sense once the AVERAGE level of project complexity, and the time it takes to develop projects on relational databases, rings the alarm bell.
Major companies have not gotten seriously involved. When the market signals, they will.
The problem is not well defined yet. Unfortunately, current failures help.
It needs improvements in other sciences (QC, AI) rather than in computing. Storing and querying multidimensional data on a flat infrastructure, without enough smartness for self-organization, are the top obstacles at the theoretical level.
Why not?
I guess they were a solution to a problem nobody was having, or not having enough to pay for it.
Further, OOP and set-based programming are not always very compatible.
Personally, when I started reading about OO databases, I couldn't help but think: "Boy, I hope I never have to work on one of those. Update 1 million rows out of a 6 million row table and then make sure all appropriate records in other tables get updated as well?"
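In SQL that kind of set-based change stays short. A minimal sketch (the orders and order_lines tables and the 10% price change are made up):

BEGIN TRANSACTION;

-- Apply a 10% increase to every order placed in 2010
UPDATE orders
SET total_amount = total_amount * 1.10
WHERE order_date >= '2010-01-01' AND order_date < '2011-01-01';

-- Keep the related line items consistent in the same transaction
UPDATE order_lines
SET unit_price = unit_price * 1.10
WHERE order_id IN (
    SELECT order_id FROM orders
    WHERE order_date >= '2010-01-01' AND order_date < '2011-01-01'
);

COMMIT;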

Data Warehouse Considerations: When and Why?

A little background here:
I know what a data warehouse is, more or less. I've read several dozen guides on data warehousing, I've played with SSAS, I know what a star schema and a dimension table and a fact table is, I know what ETL is and how to do it. This is not a "how" question or a request for tutorials.
My issue is that all of the material I've read on data warehousing seems to gloss over the rationale for building a data warehouse. They all figuratively, or in some cases literally, start with the phrase "so you've decided to build a data warehouse..." Except I haven't made that decision yet.
So I'm hoping that SO members can point me to, or help come up with, some kind of semi-objective test. Something that I can adapt to a particular system and end up with either "yep, we need a data warehouse" or "no, the payoff today would be too small." I think that the specific questions I should be able to answer are:
At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?
What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?
Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?
When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?
Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.
I generally try not to ask multi-parters but I think that these are all very closely-related. I'm willing to accept any answer that addresses at least the first 4 questions, although the last would really help to crystallize this in my mind. Links are fine if somebody's already written about this, as long as they're reasonably concise and specific (link to Ralph Kimball's home page = not helpful).
Hope I've made the question clear - thanks in advance for your answers!
I'll see if I can do my best to answer your questions succinctly.
1. At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?
a. If you find that reporting and monitoring are impairing the performance of your production system and/or an offline data store.
b. If you find that getting answers to your business questions requires building a lot of complex SQL each time.
c. If you find that every time you make a change to your transactional schema, you have to go back and rework all of your reporting queries.
d. If you want to bring together data from multiple sources.
2. What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?
3. Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?
I'll answer these together. I wouldn't think of a data warehouse as an all or nothing venture. It's simply a concise phrase that means "storing your data in a way that allows you to more easily and quickly answer business questions."
Transactional databases are designed to efficiently interface with applications. Data warehouses, data marts, operational data stores and reporting tables are built to efficiently interface with people, if that makes sense.
4. When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?
Good question. If your transactional system provides you with sufficient insight into your business, you probably do not have a need for warehousing.
If you only have one source of data and performance is not a problem, you can probably gain insight from creation of simple reporting tables.
5. Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.
That's a big question that would take far more space than I'm allotted here.
On this one, I can point you to a few places that might provide the insight you seek.
"Implementing A Data Warehouse: A Methodology that worked" by Bruce Ullrey is a book documenting one man's journey to building a data warehouse. It's not highly polished, which gives it more realism. It reads like a journal with lots of models and other visuals that illustrate his efforts pretty well.
"Business Intelligence Roadmap" by Larissa Moss. Standard fare. Walks you through the process of building a BI practice at a high level.
"The Profit Impact of Business Intelligence" by Steve Williams gives a number of case studies that show the value of building data warehouses.
The main purpose of a DW is to speed up (simplify) reporting and analytics. It enables slicing and dicing of data in any way a business user can think of.
For a first-step DW, you can simply implement a Kimball star schema and run SQL queries against it. If this still proves too slow, start thinking about pre-calculated aggregations (cubes).
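As a rough illustration, here is a minimal Kimball-style sketch (the sales fact and dimension tables are made up) showing how short a slice-and-dice query becomes:

CREATE TABLE dim_date (
    date_key INT PRIMARY KEY,          -- e.g. 20240131
    calendar_date DATE NOT NULL,
    calendar_year INT NOT NULL,
    calendar_month INT NOT NULL
);

CREATE TABLE dim_product (
    product_key INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category VARCHAR(50) NOT NULL
);

CREATE TABLE fact_sales (
    date_key INT NOT NULL REFERENCES dim_date (date_key),
    product_key INT NOT NULL REFERENCES dim_product (product_key),
    quantity INT NOT NULL,
    sales_amount DECIMAL(12,2) NOT NULL
);

-- "Revenue by category and month" is a three-table join, not a report-writing project
SELECT d.calendar_year, d.calendar_month, p.category, SUM(f.sales_amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.calendar_year, d.calendar_month, p.category;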
The slicing and dicing of information against a DW is way simpler than against a normalized DB. A replicated report server will improve performance, but will not simplify slicing and dicing. Also keep in mind that the DW belongs to the business users, so it is up to them to come up with various slice/dice ideas at any time -- IT people should simply provide an environment in which something like this is possible.
If you just run a few reports from time to time on your operational system and are satisfied with the performance, there is no need for a DW.
All my experience is with systems where business users endlessly complain about slow reports and inability to write "complicated queries", while production people complain that the database gets bogged down due to reporting. In all cases a simple Kimball star and a report server with cache and snapshots were good enough.
You should consider building a data warehouse when two of the following criteria match:
Huge amount of data
Many big complex selects (possibly compared to few inserts, updates, and deletes) that just take too long to execute (and are complicated to write)
Data from different systems needs to get combined
It's really a question of what you consider a data warehouse. In many cases you can move gradually from OLTP systems with some reports to a full-blown data warehouse, as long as you can stick to a relational database management system. A first step could be to build a single fact table and keep using the normalized tables for dimensions, then add more facts, more fact tables, or dedicated dimension tables to the game. Start in the same database (or one of the databases of the involved systems), possibly moving to a separate database later.
A full data warehouse (separate database, star schema) offers the best options for tuning select statements, apart from going to a specialized system. It is also cleanly decoupled from the OLTP system(s). Think schema design, but also resources like CPU, I/O and memory, and organizational concerns like scheduling of new releases. Of course it is a lot of work that you possibly don't need.
It's in the answers above: just because you have a handful of complex queries doesn't mean you should build a DWH; the same holds for the other criteria if they come in isolation.
Can't offer much here, but the advice: go agile. The requirements for a DWH depend extremely on the possibilities the users see, and therefore the requirements are likely to change. Automating tests with databases is a pain, but fooling around in a production system with no proper tests is worse.
At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?
I'd recommend a data warehouse when you observe that performing reporting and analysis activities in the transactional data store is harmful to both.
What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?
I have nothing to offer here. I'd say that keeping the transactional and reporting databases separate seems sensible to me, regardless of whether you call it a warehouse or not. Data mining can be a very CPU-intensive activity.
Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?
I have nothing to offer here.
When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?
I'd say that if you don't need to keep long history, aren't doing intensive analysis of the data, and your reporting needs are limited to an ad hoc query from time to time, then perhaps a data warehouse isn't necessary.
Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.
My employers have all used data warehouses for many years prior to my arrival, so I can't speak to what things were like before I arrived.
From my experience, the first sign for starting to think about data warehousing is when you have (or are developing) a transactional database and the users start adding lots of reporting and data history requirements. Which is pretty much always. It's always easier to have a separate data warehouse or reporting database than to try to design a transactional system that handles the reporting needs that end users always have. Storing history (for business entities) in a transactional system adds complexity and bloats a database that should be as responsive as possible.
On the flip side, I've been in large companies where many groups created data warehouses because data of interest was spread across many systems and was therefore difficult to query. The problem was that each group created their own data warehouse because all the existing warehouses in the company did not have the right subset of information, or had a data model that was regarded as non-optimal or incorrect. This made the situation worse by creating even more disparate data systems that were hard to compare.
A DW could be considered if one has been using a transactional system for a long period and later realizes that they need to perform some data mining to determine different data patterns of the business. Finally, with the help of the discovered data patterns, one wants to help top management make further decisions for the benefit of the company.
The following steps need to be taken to build up a data warehouse:
An ETL platform and a database need to be chosen.
A reporting tool like SSRS, Tableau, etc. needs to be chosen for visualization.
One may opt for a data analysis language like R for further use.
Finally, all this will help in developing the data warehouse and reporting tool.
"I think that why do some projects fail?"
There are five primary reasons:
lack of partnership between the IT department and business users;
incorrect data warehouse architecture;
not enough experienced people;
improper planning, such as failure to use a proven methodology and a plan to ensure that no details are omitted;
and depending on bleeding-edge technology.

What are the general guidelines and best practices to keep in mind while designing a database for an application?

My question is regarding database modeling. I tried to look for this question amongst other database design questions on SO but haven't found it, so here I am asking:
What are the general guidelines and best practices to keep in mind while designing a database for an application?
What are the best resources/books/university lectures available on database design concepts?
Thanks.
Just some things I've learned from experience (I'm sure some will disagree, but I've been querying and designing and programming databases for 30+ years and have seen the effects of stupid design up close and personal):
There are three critical things to consider in all database design - data integrity (without this you essentially have no data), security and performance. All other considerations take a back seat to these three.
Never create a table without a way to uniquely identify a record.
There really are very few true natural keys that really work as a primary key. If you don't have control over whether it will change, do not use it as a primary key (you don't really want to change the company name through 27 child tables, do you?). Use a surrogate key instead. Using a surrogate key does not exempt you from the need to set unique indexes where you could have used a unique composite key; always set these indexes if you can determine a way to have a unique composite. Duplicate records are the bane of an application's existence. It seems obvious, but never ever consider name to be a key field; names are not and never will be unique.
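For example, a minimal sketch (SQL Server flavored; the Company table and tax_number column are made up) of a surrogate key backed by a unique index on the natural candidate:

CREATE TABLE Company (
    company_id INT IDENTITY(1,1) PRIMARY KEY,   -- surrogate key referenced by child tables
    company_name VARCHAR(100) NOT NULL,
    tax_number VARCHAR(20) NOT NULL
);

-- Still enforce the natural uniqueness so duplicate companies cannot sneak in
CREATE UNIQUE INDEX UX_Company_TaxNumber ON Company (tax_number);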
Do not use a GUID as your primary key, as it can kill performance. If you need a GUID for replication, also consider having an int or bigint primary key.
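Something along these lines (again a SQL Server flavored sketch with made-up names):

CREATE TABLE Customer (
    customer_id BIGINT IDENTITY(1,1) PRIMARY KEY,               -- narrow, sequential clustered key
    replication_guid UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(), -- kept for replication, not used as the PK
    customer_name VARCHAR(100) NOT NULL,
    CONSTRAINT UQ_Customer_ReplicationGuid UNIQUE (replication_guid)
);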
Do not design as if you will be changing database backends unless you know up front you will be doing so. Virtually all the good techniques for performance tuning are database specific; don't harm your own ability to tune your database for a non-existent requirement.
Avoid entity-attribute-value (EAV, sometimes called value-entity) table structures. They are miserable to query.
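To see why, compare pulling three attributes out of an EAV store with the same query against ordinary columns (a minimal sketch; all names are made up):

-- EAV style: one self-join per attribute just to rebuild a single row
SELECT e.entity_id,
       v_name.attr_value  AS customer_name,
       v_city.attr_value  AS city,
       v_phone.attr_value AS phone
FROM entities e
LEFT JOIN entity_values v_name  ON v_name.entity_id = e.entity_id  AND v_name.attr_name  = 'name'
LEFT JOIN entity_values v_city  ON v_city.entity_id = e.entity_id  AND v_city.attr_name  = 'city'
LEFT JOIN entity_values v_phone ON v_phone.entity_id = e.entity_id AND v_phone.attr_name = 'phone';

-- Conventional design: the same information is a single-table select
SELECT customer_id, customer_name, city, phone
FROM customers;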
Add everything you need to ensure data integrity into your database design; things like defaults, constraints, triggers, etc. are necessary to avoid having useless data. Do not rely on the application code to do this or you will be sorry.
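For instance, a minimal sketch (the Orders table is made up; it assumes a Customer table like the one sketched above):

CREATE TABLE Orders (
    order_id INT IDENTITY(1,1) PRIMARY KEY,
    customer_id BIGINT NOT NULL
        REFERENCES Customer (customer_id),                    -- no orphan orders
    order_date DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),   -- sensible default
    quantity INT NOT NULL CHECK (quantity > 0),               -- bad data rejected even if the app forgets
    status VARCHAR(20) NOT NULL DEFAULT 'NEW'
        CHECK (status IN ('NEW', 'SHIPPED', 'CANCELLED'))
);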
Others mentioned normalization, I agree you must understand this thoroughly even if you later decide to denormalize.
Do not stack views on top of views if you want any kind of performance at all. Every database I've seen that does this eventually becomes a huge performance problem.
Consider what data you will need to manage the database as well as what the application needs. If you are going to be serious about databases, you need to understand database auditing, and your database should implement ways to find out who made what change, when, and what the old data was. You'll thank me the first time someone malicious changes the data or someone accidentally deletes all the records in a table.
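A bare-bones audit pattern might look like this (a SQL Server flavored sketch; the Customer table and the audited column are the made-up ones from above):

CREATE TABLE CustomerAudit (
    audit_id INT IDENTITY(1,1) PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    old_name VARCHAR(100),
    new_name VARCHAR(100),
    changed_by SYSNAME NOT NULL DEFAULT SUSER_SNAME(),      -- who
    changed_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()  -- when
);
GO

CREATE TRIGGER trg_Customer_Audit ON Customer AFTER UPDATE
AS
BEGIN
    -- capture the old and new values for every updated row
    INSERT INTO CustomerAudit (customer_id, old_name, new_name)
    SELECT d.customer_id, d.customer_name, i.customer_name
    FROM deleted d
    JOIN inserted i ON i.customer_id = d.customer_id;
END;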
Really think through how the data will be queried when designing. It can make a huge difference in the design.
Do not store more than one piece of information in a field. It might look cool to put a comma-delimited list into one field rather than add a related table, but it is a really bad idea.
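A minimal sketch of the related-table alternative (made-up names; assumes the Customer table from above):

-- Instead of Customer.phone_numbers = '555-1234,555-9876'
CREATE TABLE CustomerPhone (
    customer_id BIGINT NOT NULL REFERENCES Customer (customer_id),
    phone_number VARCHAR(20) NOT NULL,
    phone_type VARCHAR(10) NOT NULL,   -- e.g. 'home', 'mobile'
    PRIMARY KEY (customer_id, phone_number)
);

-- "Who has this phone number?" is now an indexable equality, not a LIKE over a list
SELECT customer_id FROM CustomerPhone WHERE phone_number = '555-1234';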
Elegance is often the enemy of performance in databases. Pick performance over elegance every time and you won't go wrong.
Avoid the use of database keywords in the naming of objects. Your programmers will thank you. Pick a naming convention and be consistent in always using it. If a field appears in multiple tables, make sure it has the same name (exception: if an id field has two possible foreign keys in the same table, use the id field name with a prefix to identify the difference, say Sales_person_id vs. Customer_person_id) and the same datatype and length, if applicable, in all of them. Fix misspellings right away; you really don't want to spend the next ten years remembering that in TableA it is persnoid instead of personid.
Read about database refactoring (search on Amazon for some good books) and consider how to be able to do this in your design. Few databases are designed to be refactored, and being able to do so is critical to fixing database problems that arise from badly thought-out designs or changes to business requirements.
While you are reading, read about performance tuning; you'll learn a tremendous amount about what to avoid in designing the database.
I'm sure there's more but this is enough to start with.
One additional thing I wanted to add. Do not design your database as if the data entry application page is the most critical thing. Data is often queried more often than it is written, even in a transactional database. Really think about how easy it will be to get data back out of the database (oh, so that's why the EAV model is so bad!) and what effect the design will have on reporting. This is especially critical as I often see that the people doing the reporting are not the people who designed the database, or the reporting tasks come later in the project than creating the data entry. Databases are not easy to refactor, so consider the whole life cycle of the data when designing a database. Think about things like storing moment-in-time values: you can't find out how much an order was for two years later by multiplying the quantity ordered by the price in the products table, because that wasn't the price at the time of the order. Reporting needs this type of information, but it is often too late to get it by the time the reports are written if the design was done badly. Stuff that works fine when you are handling one record at a time can be a disaster when you need to look at thousands or millions of records. Not every application is going to create a separate reporting database, so really think about this.
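The moment-in-time point in concrete terms (a minimal sketch; the OrderLines table is made up):

-- Copy the price onto the order line at the time of sale
CREATE TABLE OrderLines (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    unit_price DECIMAL(12,2) NOT NULL,   -- the price as it was when the order was placed
    PRIMARY KEY (order_id, product_id)
);

-- Two years later this still reports the real order value,
-- even though the price in the Products table has changed many times since
SELECT order_id, SUM(quantity * unit_price) AS order_total
FROM OrderLines
GROUP BY order_id;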
DEPENDS
This question is like saying "what is the best car to buy"; it really depends on many factors, including the amount of data, the number of concurrent users, what you are trying to do, etc. FYI, normalization is good for some database uses, but bad for others (data warehousing).
Give us a better idea of how you intend to use the data, and you'll get some better recommendations.
While I agree with others that your question right now is much too broad and can't really be answered (except for the "it depends" approach :-)), there is one book I would wholeheartedly recommend for anyone beginning database design in general:
Michael Hernandez: Database Design for Mere Mortals(R): A Hands-On Guide to Relational Database Design
It's a really hands-on, no-frills, down to earth book and introduces all the major and important concepts in a very understandable, very approachable fashion. Well written, interesting, very sound and useful - highly recommended!
Marc
Your question is too broad. Normalization and denormalization are the most commonly used concepts.
The best thing to do is to start with a well normalized database. The wikipedia article has some good information on that along with some good references.
Typically you'll end up denormalizing parts of your database for better performance, but you almost always want to start with it in 4th normal form.
Look at the Wikipedia article about database normalization. There is also a further reading section.
If you are designing a new database for a brand new application, you should try using an ORM library (like the JPA implementations in Java) that relieves you of database design, because these tools generate the database from the domain model. If you don't have any experience in this field, a database generated with ORM tools will be much better than yours.
Consider all your use cases. Think about every single possible way someone might want to get to data, and plan for those. Wear your designer, developer, tester, and user hats.
Try to think of database tables as representing physical objects.
Normalize, as others have said.

To what extent should a developer learn specifics about database systems?

Modern database systems today come with loads of features, and you would agree with me that to learn one database you must unlearn the concepts you learned in another. For example, each database implements locking differently from the others, so carrying the concepts of one database to another would be a recipe for failure. And there are other areas where two databases will perform very, very differently.
So while developing database-driven systems, do programmers need to know the database in detail so that they code for performance? I don't think it would be appropriate to call in the DBA for performance later, as his job is only to maintain the database and help out the developer in case of an emergency, not on a regular basis.
What do you think should be the extent the developer needs to gain an insight into the database?
I think these are the most important things (from most important to least, IMO):
SQL (obviously) - It helps to know how to at least do basic queries, aggregates (sum(), etc), and inner joins
Normalization - DB design skills are a major requirement
Locking Model/MVCC - It's nice to have at least a basic grasp of how your database manages row locking (or uses MVCC to accomplish similar goals with optimistic locking)
ACID compliance, Txns - Please know how these work and interact
Indexing - While I don't think that you need to be an expert in tablespaces, placing data on separate drives for optimal performance, and other minutiae, it does help to have a high-level knowledge of how index scans work vs. table scans. It also helps to be able to read a query plan and understand why it might be choosing one over the other (see the sketch just after this list).
Basic Tools - You'll probably find yourself wanting to copy production data to a test environment at some point, so knowing the basics of how to restore/backup your database will be important.
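A quick illustration of the index-scan-vs-table-scan point (a SQL Server flavored sketch; the Orders table and the index are made up):

-- Without an index on customer_id this query scans the whole table
SELECT order_id, order_date
FROM Orders
WHERE customer_id = 42;

-- A nonclustered index turns it into a cheap seek
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON Orders (customer_id)
    INCLUDE (order_date);

-- In SQL Server Management Studio, "Include Actual Execution Plan" (Ctrl+M)
-- shows whether the optimizer chose the seek or a scan.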
Fortunately, there are some great FOSS and free commercial databases out there today that can be used to learn quite a bit about db fundamentals.
I think a developer should have a fairly good grasp of how their database system works, no matter which one it is. When making design and architecture decisions, they need to understand the possible implications when it comes to the database.
Personally, I think you should know how databases work as well as the relational model and the rhetoric behind it, including all forms of normalization (even though I rarely see a need to go beyond third normal form). The core concepts of the relational model do not change from relational database to relational database - implementation may, but so what?
Developers that don't understand the rationale behind database normalization, indexes, etc. are going to suffer if they ever work on a non-trivial project.
I think it really depends on your job. If you are a developer in a large company with dedicated DBAs then maybe you don't need to know much, but if you are in a small company then it may be really helpful knowing more about databases. In small companies you may wear more than one hat.
It cannot hurt to know more in any situation.
It certainly can't hurt to be familiar with relational database theory, and have a good working knowledge of the standard SQL syntax, as well as knowing what stored procedures, triggers, views, and indexes are. Obviously it's not terribly important to learn the database-specific extensions to SQL (T-SQL, PL/SQL, etc) until you start working with that database.
I think it's important to have a basic understanding of databases when developing database applications, just like it's important to have an understanding of the hardware your software runs on. You don't have to be an expert, but you shouldn't be totally ignorant of anything your software interacts with.
That said, you probably shouldn't need to do much SQL as an application developer. Most of the interaction with the database should be done through stored procedures developed by the DBA; I'm not a big fan of including SQL code in your application code. If your queries are in stored procedures, then the DBA can change the implementation of the stored procedure, or even the database schema, and so long as the result is the same it doesn't require any changes to your application code.
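For example, a minimal sketch (made-up procedure and table names): the application calls the procedure and never touches the underlying schema.

CREATE PROCEDURE GetCustomerOrders
    @CustomerId INT
AS
BEGIN
    SELECT order_id, order_date, total_amount
    FROM Orders
    WHERE customer_id = @CustomerId;
END;

-- The application just runs: EXEC GetCustomerOrders @CustomerId = 42;
-- The DBA can later rewrite the body (new schema, new indexes, a view underneath)
-- without touching application code, as long as the result shape stays the same.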
If you are uncertain about how to best access the database you should be using tried and tested solutions like the application blocks from Microsoft - http://msdn.microsoft.com/en-us/library/cc309504.aspx. They can also prove helpful to you by examining how that code is implemented.
Basic knowledge of SQL queries is a must; with that you can develop a simple system. But when you are going to implement complex systems, you should know normalization, procedures, functions, etc.
