Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
While researching the topic, I came across this post: Should you enforce constraints at the database level as well as the application level?
The person who answered that question claimed that we should enforce database constraints because it is "easier, integrity, flexible".
The reason I bring up this question is my recent maintenance work on one of our very robust systems. Due to a change in a business rule, a data column that used to be CHAR(5) now needs to accept 8 characters. This table has many dependencies, and the change would also affect many other tables, not only in this database but also in a few other systems, so increasing the size to CHAR(8) is effectively impossible.
So my question goes back to database design: wouldn't it be much easier if you reduced, or even eliminated, the need for database constraints? If the scenario above had happened in such a system, all you would have to do is change the front-end or application-level validation to make sure the user enters 8 characters for that field.
In my opinion, we should minimize database constraints to anticipate future changes in the data structure. What are your thoughts?
It's easier to maintain 100 tables than 100,000 lines of code. In general, constraints that are enforced in the application but not in the database have to be replicated across many applications. Sometimes those applications are even written and maintained by different teams.
Keeping all those changes in sync when the requirements change is a nightmare. The ripple effect is even worse than the case you outline of changing a five-character field into an eight-character field. This is how things were done before databases were invented.
Having said that, there are situations where it's better to enforce constraints in applications than in the database. There are even cases where it's better to enforce a constraint in both places (example: a NOT NULL constraint).
And very large organizations sometimes maintain a data dictionary, where every data item is cataloged, defined, and described in terms of features, including constraints. In this kind of environment, databases actually acquire their data definitions from the dictionary. And application programs do the same thing, generally at precompile time.
Future proofing such an arrangement is still a challenge.
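To illustrate the NOT NULL case enforced in both places, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for your actual database; the customer table and the helper functions are hypothetical. The application check gives the user an early, readable error, while the column constraint still protects the data from any other writer (another application, an ad-hoc script) that skips that check.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Database-level rule: every customer row must have an email.
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL)"
    )


def validate_customer(email) -> None:
    # Application-level copy of the same rule, so the user gets a clear message early.
    if email is None or not email.strip():
        raise ValueError("email is required")


def add_customer(conn: sqlite3.Connection, email) -> int:
    validate_customer(email)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO customer (email) VALUES (?)", (email,))
    return cur.lastrowid
```

A writer that bypasses add_customer and inserts NULL directly still gets an sqlite3.IntegrityError from the database.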
I agree with you that constraints like field length should be avoided; you never know how your business will change. And hardware is cheap nowadays, so it really isn't necessary to use CHAR(8) just to save storage.
But constraints such as NOT NULL constraints, duplicate checks, and foreign key constraints for a header/detail table pair are better kept. They are like the goalkeeper of your data integrity.
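A minimal sketch of the header/detail case, again using Python's sqlite3 module as a stand-in; the order_header and order_detail tables are hypothetical. Note that SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE order_header (
    id       INTEGER PRIMARY KEY,
    order_no TEXT NOT NULL UNIQUE,                           -- duplicate check
    customer TEXT NOT NULL                                   -- not-null check
);
CREATE TABLE order_detail (
    id        INTEGER PRIMARY KEY,
    header_id INTEGER NOT NULL REFERENCES order_header(id),  -- foreign key
    line_no   INTEGER NOT NULL,
    product   TEXT NOT NULL,
    UNIQUE (header_id, line_no)                              -- no repeated line numbers per order
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Orphan detail rows, duplicate order numbers, repeated line numbers and missing customers are all rejected with sqlite3.IntegrityError, no matter which application does the insert.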
Database systems provide a number of benefits; one of the most important is (physical) data independence. Data independence can be defined as the immunity of application programs to changes in the way the data is physically stored and accessed. This concept is tightly related to data-model design and normalization rules, where data constraints are fundamental.
Database sharing is one of the application integration patterns, widely used between independent applications. The trade-off is between spreading data-integrity code across all applications and centralizing it inside the database.
Minimizing database constraints means minimizing your use of a wide range of well-known, proven technologies developed over many years by a wide variety of very smart people.
As a footnote, regarding:
"This table has many dependencies and will also affect many other tables not only in the database but also a few other systems"
Besides the smell of redundancy, this at least shows the side effects of the change. Think about having to find those side effects through code review instead!
Applications come and applications go, but data remains.
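The footnote's point, that declared constraints make a change's side effects discoverable, can be sketched as a catalog query. This assumes SQLite (through Python's sqlite3) as a stand-in; other databases expose the same information through their own catalog views. The product, order_line and price_history tables used to try it out are hypothetical.

```python
import sqlite3


def referencing_columns(conn: sqlite3.Connection, target_table: str):
    """List (child_table, child_column, parent_column) for every declared FK to target_table."""
    found = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Row layout: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2] == target_table:
                found.append((table, fk[3], fk[4]))
    return found
```

Called with the name of the table whose column you want to widen, this lists every dependent column you would have to change with it, without reading any application code. Rules that live only in application code leave no such trail.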
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
We have an application to manage companies, teams, branches, employees, etc., with different tables for each. Now we have a requirement to give our technology partners access to the same system so that they can do the same things we do. At the same time, we need to supervise these partners in our system.
So, in terms of DB schema, what would be the best way to manage them?
1) Duplicate the entire schema for partners. For that we would have to duplicate around 50-60 tables now, and many more in the future as the system grows.
2) Add a flag to each table that tells whether a row is an internal or external entity.
Please share suggestions if you have any experience with this.
Consider the following points before finalizing any of the approaches.
Do you want a holistic view of the data
By this I mean: do you want to view the data your partner creates and the data you create in a single report/form? If the answer is yes, it makes sense to store the data in the same set of tables and differentiate it based on some set of columns.
Is your application functionality going to vary significantly
If the answer to this question is NO, it makes sense to keep the data in the same set of tables. This way any changes you make to your system will automatically be reflected for all users, and you won't have to replicate your code across schemas/databases.
Are you and your partner going to use the same master / reference data
If the answer to this question is yes then again it makes sense to use the same set of tables since you will do away with unnecessary redundant data.
Implementation
Rather than creating a flag, I would recommend creating a master table known as user_master. The key of this table should be made available in every transaction table. This way, if you want to include a second partner down the line, you can make a new entry in your user_master table and make the necessary modifications to your application code. Your application code should manage the security. Needless to say, you also need to implement as much security as possible at the database level.
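A minimal sketch of the user_master approach, using Python's sqlite3 as a stand-in database. Only user_master comes from the answer; the team table and the helper functions are hypothetical. Every transaction row carries the owner key, the foreign key rejects unknown owners, and every application query filters on that key.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE user_master (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE            -- e.g. 'internal', 'partner_a'
        );
        CREATE TABLE team (                      -- hypothetical transaction table
            id             INTEGER PRIMARY KEY,
            user_master_id INTEGER NOT NULL REFERENCES user_master(id),
            name           TEXT NOT NULL
        );
    """)
    return conn


def add_owner(conn: sqlite3.Connection, name: str) -> int:
    with conn:
        return conn.execute("INSERT INTO user_master (name) VALUES (?)", (name,)).lastrowid


def add_team(conn: sqlite3.Connection, owner_id: int, name: str) -> None:
    with conn:
        conn.execute("INSERT INTO team (user_master_id, name) VALUES (?, ?)", (owner_id, name))


def teams_for(conn: sqlite3.Connection, owner_id: int) -> list:
    # Every query is scoped by the owner key, so one party never sees another's rows.
    rows = conn.execute(
        "SELECT name FROM team WHERE user_master_id = ? ORDER BY name", (owner_id,))
    return [r[0] for r in rows]
```

Adding a second partner later is one new user_master row rather than a copy of 50-60 tables.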
Other Suggestions
To physically separate the data of these entities, you can implement either partitioning or sharding, depending on the database you are using.
Perform thorough regression testing and check that your data is not visible in partner reports or forms. Also check that a partner is not able to update or insert your data.
Since the data in your system will increase significantly, it would make sense to performance test your reports, forms, and programs.
If you are using indexes, you will need to revisit them, since your WHERE conditions will change.
Also, revisit your keys and relationships.
Neither of your suggested approaches is advisable. Follow the guidelines below to secure your whole system and audit your technology partner as well.
[1] Create a module on the admin side that shows you the existing tables as well as tables that will be added in the future.
[2] Create a user for your technology partner and grant permissions on those objects.
[3] Keep one audit-trail table and insert an entry (user name, IP, etc.) into it. That way you will have complete tracking of the activity carried out by your technology partner.
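One way to keep the audit trail described in [3], sketched with Python's sqlite3 as a stand-in: the application writes the change and the audit row in the same transaction, so one cannot be committed without the other. The branch and audit_trail column names and the rename_branch helper are assumptions, and the user name and IP address are passed in by the caller because the database itself does not know them.

```python
import sqlite3
from datetime import datetime, timezone


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE branch (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE audit_trail (
            id         INTEGER PRIMARY KEY,
            user_name  TEXT NOT NULL,
            ip_address TEXT NOT NULL,
            action     TEXT NOT NULL,
            table_name TEXT NOT NULL,
            row_id     INTEGER,
            logged_at  TEXT NOT NULL             -- ISO-8601 UTC timestamp
        );
    """)
    return conn


def rename_branch(conn, user_name, ip_address, branch_id, new_name) -> None:
    # The change and its audit row share one transaction: both commit or neither does.
    with conn:
        conn.execute("UPDATE branch SET name = ? WHERE id = ?", (new_name, branch_id))
        conn.execute(
            "INSERT INTO audit_trail"
            " (user_name, ip_address, action, table_name, row_id, logged_at)"
            " VALUES (?, ?, 'UPDATE', 'branch', ?, ?)",
            (user_name, ip_address, branch_id, datetime.now(timezone.utc).isoformat()),
        )
```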
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am relatively new to coding in the Microsoft stack, and some practices at my new workplace differ from things I've seen before. Namely, I have seen a practice where read-only tables (ones the application is not meant to insert into, edit, or delete from) are put in an "lkp" schema: "lkp.EmailType", "lkp.Gender", "lkp.Prefix", and so on.
However, when I started developing some MVC5 apps using Entity Framework with a Database-First approach, I noticed while debugging that it attempts to both pluralize the table name and change the schema, so queries against "lkp.Gender" become a SELECT statement on "dbo.Genders". After looking into the pluralizing functionality, it seems best practice leans toward pluralizing table names, so I went ahead and did that for this application (this is a new application; we are using a DB structure similar to prior ones but do not have to keep it the same).
The last thing I need to do is change these tables' schema to "dbo" instead of "lkp". Talking with some coworkers about their other projects, I found that while they might use the dbo schema for read-only lookup tables, they might name them differently, such as "dbo.LkpGenders" or the like.
This takes some work (removing constraints on other tables that reference these LKP tables, and so on), so before putting too much effort into the change I wanted to ask the community whether it is even a good idea, and whether to spend my time making the LKP tables work or doing away with them.
In short: is using an LKP schema for read-only tables an old practice, or is it still a good idea and I have just been at other workplaces and on projects that were doing it "wrong"? As an added bonus, it would be good to know why MVC5/EF may be using the dbo schema for tables it generated an EDMX from just fine. Should I be using a naming convention, DB views, or LKP schemas for this kind of read-only lookup data?
Some thoughts:
I like plural table names. A row can contain an entity; a table can contain many entities. But naming conventions should be guidelines rather than carved-in-stone rules. No single rule can be the best alternative in every situation, so allow some flexibility.
My only exception to that last caveat is to name tables and views identically. That is, the database object Employees could be either a table or view. The apps that use it wouldn't know which one it is (or care) and the DB developers could quickly find out (if it was pertinent). There is absolutely no reason to differentiate between tables and views by name and many good reasons to abstract tables and views to just "data sources".
The practice of keeping significant tables in their own database/schema is a valid one. There is something about these tables (read-only) that group them together organizationally, so it can make sense to group them together physically. The problem can be when there are other such attributes: read-only employee data, read-only financial data, etc. If employee and financial data are also segregated into their own database/schema, which is the more significant attribute that would determine where they are located: read-only or employee/financial?
In your particular instance, I would not think that "read-only" is significant enough to rate segregation. Firstly, read-only is not a universal constraint -- someone must be able to maintain the data. So it is "read-only here, writable there". Secondly, just about any grouping of data can have some of that data that is generally read-only. Does it make sense to gather read-only data that is of use only to application X and read-only data that is of use only to application Y in the same place just because they are both read-only? And suppose application X now needs to see (read-only, of course) some of application Y's data to implement a new feature? Would that data be subject to relocation to the read-only database?
A better alternative would be to place X-only data in its own location, Y-only data in its own location, and so forth. Company-wide data would go in dbo. Each location could have different requirements for the same common data: read-only for some, writable for others. These differing requirements could be implemented by local views. A do-nothing INSTEAD OF trigger on the view would render it completely read-only, but a view with working triggers would make it indistinguishable from the underlying table(s). Each application would have its own view in its own space, with triggers as appropriate. So each sees the same data, but only one can manipulate it.
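The read-only versus writable view idea can be sketched with SQLite (through Python's sqlite3), which also supports INSTEAD OF triggers on views; SQL Server's trigger syntax differs. The gender table and the x_/y_ view names are hypothetical, and only INSERT is covered for brevity (UPDATE and DELETE would need their own triggers). In SQLite a view with no trigger already rejects writes with an error; the do-nothing trigger, which uses RAISE(IGNORE), turns that into a silent no-op.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gender (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);

-- Application X: read-only view. The do-nothing trigger silently discards inserts.
CREATE VIEW x_gender AS SELECT id, code FROM gender;
CREATE TRIGGER x_gender_insert INSTEAD OF INSERT ON x_gender
BEGIN
    SELECT RAISE(IGNORE);
END;

-- Application Y: writable view. Its trigger passes inserts through to the table.
CREATE VIEW y_gender AS SELECT id, code FROM gender;
CREATE TRIGGER y_gender_insert INSTEAD OF INSERT ON y_gender
BEGIN
    INSERT INTO gender (code) VALUES (NEW.code);
END;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Both applications read the same rows; an insert through y_gender lands in the table, while the same insert through x_gender changes nothing.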
Another advantage to accessing common (dbo) data or shared data from another location through local views is that each application, even though they are looking at the same data, may want the data in different formats and/or different field names. Views allow you to provide the data to each application exactly the way that application wants to see it.
This can also greatly improve the maintainability of your physical data. If a table needs to be normalized or denormalized or a field renamed, added or dropped entirely, go ahead and do it. Just rewrite the views to minimize if not completely eliminate the differences that make it back to the apps. Application code may not have to be changed at all. How's that for cool?
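A minimal sketch of that maintainability argument, again with Python's sqlite3 as a stand-in and hypothetical table, view and column names: the employee table is normalized into employee plus department, the view is rewritten, and the application's query returns exactly the same rows.

```python
import sqlite3


def create_v1(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL,
                               dept_name TEXT NOT NULL);
        -- The application only ever queries this view, using the names it expects.
        CREATE VIEW app_employees AS
            SELECT id AS EmployeeId, full_name AS Name, dept_name AS Department
            FROM employee;
    """)


def normalize_departments(conn: sqlite3.Connection) -> None:
    # Physical change: move department names into their own table ...
    conn.executescript("""
        DROP VIEW app_employees;
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        INSERT INTO department (name) SELECT DISTINCT dept_name FROM employee;
        CREATE TABLE employee_new (
            id            INTEGER PRIMARY KEY,
            full_name     TEXT NOT NULL,
            department_id INTEGER NOT NULL REFERENCES department(id)
        );
        INSERT INTO employee_new (id, full_name, department_id)
            SELECT e.id, e.full_name, d.id
            FROM employee e JOIN department d ON d.name = e.dept_name;
        DROP TABLE employee;
        ALTER TABLE employee_new RENAME TO employee;
        -- ... then rewrite the view so the application sees exactly the same shape.
        CREATE VIEW app_employees AS
            SELECT e.id AS EmployeeId, e.full_name AS Name, d.name AS Department
            FROM employee e JOIN department d ON d.id = e.department_id;
    """)


def app_query(conn: sqlite3.Connection) -> list:
    # Application code: unchanged by the restructuring above.
    return conn.execute(
        "SELECT EmployeeId, Name, Department FROM app_employees ORDER BY EmployeeId"
    ).fetchall()
```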
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 4 years ago.
I have been given the task to design a database to store a lot of information for our company. Because the task is rather big and contains multiple modules where users should be able to do stuff, I'm worried about designing a good data model for this. I just don't want to end up with a badly designed database.
I want to have some decent examples of database structures for contracts / billing / orders etc to combine those in one nice relational database. Are there any resources out there that can help me with some examples regarding this?
Barry Williams has published a library of about six hundred data models for all sorts of applications. Almost certainly it will give you a "starter for ten" for all your subsystems. Access to this library is free so check it out.
It sounds like this is a big "enterprise-y" application your organisation wants, and you seem to be a bit of a beginner with databases. If at all possible, start with a single subsystem, say Orders, and get that working: not just the database tables built, but also a skeleton front end for it. Once that is good enough, add another related subsystem such as Billing. You don't want to end up with a sprawling monster.
Also make sure you have a decent data modelling tool. SQL Power Architect is nice enough for a free tool.
Before you start read up on normalization until you have no questions about it at all. If you only did this in school, you probably don't know enough about it to design yet.
Gather your requirements for each module carefully. You need to know:
Business rules (which are specific to applications and which must be enforced in the database because they must be enforced on all records no matter the source),
Are there legal or regulatory concerns (HIPAA, for instance, or Sarbanes-Oxley requirements)?
Security (does data need to be encrypted?)
What data do you need to store and why (is this data available anywhere else)?
Which pieces of data will only have one row of data and which will need to have multiple rows?
How do you intend to enforce uniqueness of the row in each table? Do you have a natural key or do you need a surrogate key (suggest a surrogate key in almost all cases)?
Do you need replication?
Do you need auditing?
How is the data going to be entered into the database? Will it come from the application one record at a time (or even from multiple applications), or will some of it come from bulk inserts from an ETL tool or from another database?
Do you need to know who entered the record and when? (This is highly likely to be necessary in an enterprise system.)
What kind of lookup tables will you need? Data entry is much more accurate when you can use lookup tables and restrict the users to the values.
What kind of data validation do you need?
Roughly how many records will the system have? You need to have an idea to know how big to create your test data.
How are you going to query the data? Will you be using stored procs or an ORM or dynamic queries?
Some very basic things to remember in your design: choose the right data type for your data. Do not store dates or numbers you intend to do math on in string fields. Do store numbers that are not candidates for math (part numbers, zip codes, phone numbers, etc.) as string data, as you may need leading zeros. Do not store more than one piece of information in a field. So no comma-concatenated lists (these indicate the need for a related table), and while you are at it, if you find yourself doing something like phone1, phone2, phone3, stop right away and design a related table. Do use foreign keys for data integrity purposes.
All the way through your design, consider data integrity. Data that has no integrity is meaningless and useless. Do design for performance; this is critical in database design and is NOT premature optimization. Databases do not refactor easily, so it is important to get the most critical parts of the performance equation right the first time. In fact, all databases need to be designed for data integrity, performance, and security.
Do not be afraid to have multiple joins; properly indexed, these will perform just fine. Do not try to put everything into an entity-attribute-value (EAV) table. Use these as sparingly as possible. Try to learn to think in terms of handling sets of data; it will help your design. Databases are optimized to do things in sets.
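A minimal sketch of two of those rules, using Python's sqlite3 as a stand-in; the table and column names and the add_customer helper are hypothetical. The ZIP code is stored as text so leading zeros survive, and phone numbers go in a related table instead of phone1, phone2, phone3 columns, so a customer can have as many as needed.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            id       INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            zip_code TEXT NOT NULL             -- text, so '02134' keeps its leading zero
        );
        -- One row per phone number instead of phone1, phone2, phone3 columns.
        CREATE TABLE customer_phone (
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            phone_type  TEXT NOT NULL,         -- e.g. 'home', 'mobile'
            phone       TEXT NOT NULL,
            PRIMARY KEY (customer_id, phone)
        );
    """)
    return conn


def add_customer(conn, name, zip_code, phones) -> int:
    """phones: list of (phone_type, phone) pairs, as many as the customer has."""
    with conn:
        cid = conn.execute(
            "INSERT INTO customer (name, zip_code) VALUES (?, ?)", (name, zip_code)
        ).lastrowid
        conn.executemany(
            "INSERT INTO customer_phone (customer_id, phone_type, phone) VALUES (?, ?, ?)",
            [(cid, t, p) for t, p in phones],
        )
    return cid
```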
There's more but this is enough to start digesting.
Try to keep your concerns separate here. Users being able to update the database is more of an "application design" problem. If you get your database design right then it should be a case of developing a nice front end for it.
First thing to look at is Normalization. This is the process of eliminating any redundant data from your tables. This will help keep your database neat, and only store information that is relevant to your needs.
The Data Model Resource Book.
http://www.amazon.com/Data-Model-Resource-Book-Vol/dp/0471380237/ref=dp_cp_ob_b_title_0
HEAVY stuff, but very well thought out. Three volumes in all...
It has a lot of very well thought-out generic structures, but they are NOT easy, as they cover everything ;) Always a good starting point, though.
The database should not be the model. It is used to save information between work sessions.
You should not build your application upon a data model, but upon a good object oriented model that follows business logic.
Once your object model is done, then think about how you can save and load it, with all the database design that goes with it.
(But apparently your company just wants you to design a database, not an application?)
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
My question is about database modeling. I looked for it among the other database design questions on SO but haven't found it, so here I am asking:
What are the general guidelines and best practices to keep in mind while designing database for an application ?
What are the best resources/books/University Lectures available on Database Design Concepts ?
Thanks.
Just some things I've learned from experience (I'm sure some will disagree, but I've been querying, designing, and programming databases for 30+ years and have seen the effects of stupid design up close and personal):
There are three critical things to consider in all database design - data integrity (without this you essentially have no data), security and performance. All other considerations take a back seat to these three.
Never create a table without a way to uniquely identify a record.
There really are very few true natural keys that work as a primary key. If you don't have control over whether a value will change, do not use it as a primary key (you don't really want to change the company name through 27 child tables, do you?). Use a surrogate key instead. Using a surrogate key does not exempt you from the need to set unique indexes if you could have used a unique composite key. Always set these indexes if you can determine a way to have a unique composite. Duplicate records are the bane of an application's existence. It seems obvious, but never ever consider a name to be a key field; names are not and never will be unique.
Do not use a GUID as your primary key, as it can kill performance. If you need a GUID for replication, also consider having an int or bigint primary key.
Do not design as if you will be changing database backends unless you know up front that you will be doing so. Virtually all the good techniques for performance tuning are database-specific; don't harm your own ability to tune your database for a nonexistent requirement.
Avoid entity-attribute-value (EAV) table structures. They are miserable to query.
Add all things you need to ensure data integrity into your database design, things like defaults, constraints, triggers, etc. are necessary to avoid having useless data. Do not rely on the application code to do this or you will be sorry.
Others mentioned normalization, I agree you must understand this thoroughly even if you later decide to denormalize.
Do not stack views on top of views if you want any kind of performance at all. Every database I've seen that does this is eventually a huge performance problem.
Consider what data you will need to manage the database as well as what the application needs. If you are going to be serious about databases you need to understand database auditing and your database should implement ways to find out who made what change and when and what the old data was. You'll thank me the first time someone malicious changes the data or someone deletes all the records in a table accidentally.
Really think through how the data will be queried when designing. It can make a huge difference in the design.
Do not store more than one piece of information in a field. It might look cool to put a comma delimited list into one field rather than add a related table but it is a really bad idea.
Elegance is often the enemy of performance in databases. Pick performance over elegance every time and you won't go wrong.
Avoid the use of database keywords in the naming of objects. Your programmers will thank you. Pick a naming convention and be consistent in always using it. If a field is in multiple tables, make sure it has the same name (exception: if a table has two foreign keys to the same id field, use the id field name with a prefix to identify the difference, say Sales_person_id and Customer_person_id), the same data type, and the same length, if applicable, in all of them. Fix misspellings right away; you really don't want to spend the next ten years remembering that in tablea it is persnoid instead of personid.
Read about database refactoring (search on amazon for some good books) and consider how to be able to do this in your design. Few databases are designed to be refactored and being able to do so is critical towards being able to fix database problems that arise from badly thought out designs or changes to business requirements.
While you are reading, read about performance tuning, you'll learn a tremendous amount about what to avoid in designing the database.
I'm sure there's more but this is enough to start with.
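The surrogate-key tip above (a surrogate primary key plus a unique index on the natural composite) can be sketched like this, using Python's sqlite3 as a stand-in; the company columns and the choice of (country_code, registration_no) as the natural composite are hypothetical. Child tables would reference the never-changing id, so renaming a company touches one row, while the unique index still rejects duplicates.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE company (
            id              INTEGER PRIMARY KEY,   -- surrogate key: never changes
            name            TEXT NOT NULL,         -- can change, so not a key
            country_code    TEXT NOT NULL,
            registration_no TEXT NOT NULL
        );
        -- The natural composite is still enforced, so duplicates are rejected.
        CREATE UNIQUE INDEX ux_company_registration
            ON company (country_code, registration_no);
    """)
    return conn


def add_company(conn, name, country_code, registration_no) -> int:
    with conn:
        return conn.execute(
            "INSERT INTO company (name, country_code, registration_no) VALUES (?, ?, ?)",
            (name, country_code, registration_no),
        ).lastrowid
```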
One additional thing I wanted to add: do not design your database as if the data entry application page is the most critical thing. Data is often queried more often than it is written, even in a transactional database. Really think about how easy it will be to get data back out of the database (oh, so that's why the EAV model is so bad!) and what effect the design will have on reporting. This is especially critical because I often see that the people doing the reporting are not the people who design the database, or that reporting tasks come later in the project than creating the data entry. Databases are not easy to refactor, so consider the whole life cycle of the data when designing a database. Think about things like storing moment-in-time values: two years later you can't find out how much an order was for by multiplying the quantity ordered by the price in the products table, because that wasn't the price at the time of the order. Reporting needs this type of information, but when the design is done badly, it is often too late to get it by the time the reports are written. Stuff that works fine when you are handling one record at a time can be a disaster when you need to look at thousands or millions of records. Not every application is going to create a separate reporting database, so really think about this.
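The moment-in-time point above can be sketched with Python's sqlite3 as a stand-in; the product and order_line tables and the helper functions are hypothetical, and prices are kept as integer cents to avoid floating-point rounding. The order line copies the price at the time of the order, so later price changes do not rewrite history; line_totals shows both the correct historical total and the naive one computed from the current price.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                              price_cents INTEGER NOT NULL);
        CREATE TABLE order_line (
            id               INTEGER PRIMARY KEY,
            product_id       INTEGER NOT NULL REFERENCES product(id),
            quantity         INTEGER NOT NULL CHECK (quantity > 0),
            unit_price_cents INTEGER NOT NULL    -- price copied at order time
        );
    """)
    return conn


def place_order(conn, product_id: int, quantity: int) -> None:
    # Copy the current price into the order line as part of the insert.
    with conn:
        conn.execute(
            "INSERT INTO order_line (product_id, quantity, unit_price_cents)"
            " SELECT id, ?, price_cents FROM product WHERE id = ?",
            (quantity, product_id),
        )


def line_totals(conn) -> list:
    # (line id, total at order time, naive total using today's price)
    return conn.execute(
        "SELECT ol.id, ol.quantity * ol.unit_price_cents, ol.quantity * p.price_cents"
        " FROM order_line ol JOIN product p ON p.id = ol.product_id ORDER BY ol.id"
    ).fetchall()
```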
DEPENDS
This question is like asking "what is the best car to buy"; it really depends on many factors, including the amount of data, the number of concurrent users, what you are trying to do, etc. FYI, normalization is good for some database uses but bad for others (data warehouses).
Give us a better idea of how you intend to use the data, and you'll get some better recommendations.
While I agree with others that your question right now is much too broad and can't really be answered (except for the "it depends" approach :-)), there is one book I would wholeheartedly recommend for anyone beginning database design in general:
Michael Hernandez: Database Design for Mere Mortals(R): A Hands-On Guide to Relational Database Design
It's a really hands-on, no-frills, down to earth book and introduces all the major and important concepts in a very understandable, very approachable fashion. Well written, interesting, very sound and useful - highly recommended!
Marc
Your question is too broad. Normalization and denormalization are the most-used concepts.
The best thing to do is to start with a well normalized database. The wikipedia article has some good information on that along with some good references.
Typically you'll end up denormalizing parts of your database for better performance, but you almost always want to start with it in 4th normal form.
Look at wikipedia article about database normalization. There is also further reading section.
If you are designing a new database for a brand-new application, you should try using an ORM library (like JPA implementations in Java), which frees you from database design because these tools generate the database from the domain model. If you don't have any experience in this field, a database generated with ORM tools will be much better than yours.
Consider all your use cases. Think about every single possible way someone might want to get to data, and plan for those. Wear your designer, developer, tester, and user hats.
Try to think of database tables as representing physical objects.
Normalize, as others have said.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
What should the data model be for a workflow application? Currently we are using an Entity-Attribute-Value (EAV) model in SQL Server 2000, with users able to create dynamic forms (in ASP.NET), but as the data grows, performance is degrading, reports are hard to generate, and it gets worse when many users query the data concurrently.
As you have probably realized, the problem with an EAV model is that tables grow very large and queries grow very complex very quickly. For example, EAV-based queries typically require lots of subqueries just to get at the same data that would be trivial to select if you were using more traditionally-structured tables.
Unfortunately, it is quite difficult to move to a traditionally-structured relational model while simultaneously leaving old forms open to modification.
Thus, my suggestion: consider closing changes on well-established forms and moving their data to standard, normalized tables. For example, if you have a set of shipping forms that are not likely to change (or whose change you could manage by changing the app because it happens so rarely), then you could create a fixed table and then copy the existing data out of your EAV table(s). This would A) improve your ability to do reporting, B) reduce the amount of data in your existing EAV table(s) and C) improve your ability to support concurrent users / improve performance because you could build more appropriate indices into your data.
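A minimal sketch of moving one well-established form type out of EAV storage, using Python's sqlite3 as a stand-in for SQL Server; the form_value and shipping_form tables and their attributes are hypothetical. The pivot needs one MAX(CASE ...) per attribute, which also illustrates why EAV queries grow so complex.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical EAV storage: one row per (form instance, field).
        CREATE TABLE form_value (
            form_id   INTEGER NOT NULL,
            form_type TEXT NOT NULL,
            attribute TEXT NOT NULL,
            value     TEXT,
            PRIMARY KEY (form_id, attribute)
        );
        -- Fixed table for a form type that is no longer allowed to change.
        CREATE TABLE shipping_form (
            form_id     INTEGER PRIMARY KEY,
            carrier     TEXT,
            tracking_no TEXT,
            weight_kg   REAL
        );
    """)
    return conn


def migrate_shipping_forms(conn: sqlite3.Connection) -> None:
    # Pivot the EAV rows into columns, copy them into the fixed table,
    # then remove them from the EAV table, all in one transaction.
    with conn:
        conn.execute("""
            INSERT INTO shipping_form (form_id, carrier, tracking_no, weight_kg)
            SELECT form_id,
                   MAX(CASE WHEN attribute = 'carrier'     THEN value END),
                   MAX(CASE WHEN attribute = 'tracking_no' THEN value END),
                   CAST(MAX(CASE WHEN attribute = 'weight_kg' THEN value END) AS REAL)
            FROM form_value
            WHERE form_type = 'shipping'
            GROUP BY form_id
        """)
        conn.execute("DELETE FROM form_value WHERE form_type = 'shipping'")
```

After the migration, shipping reports query a narrow typed table that can be indexed directly, and the EAV table only holds forms that are still evolving.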
In short, think of the dynamic EAV-based system as a way to collect users' needs (they tell you by building their forms) and NOT as permanent storage. As the forms evolve into their final form, you transition to fixed tables in order to gain the benefits discussed above.
One last thing. If all of this isn't possible, have you considered segmenting your EAV table into multiple, category-specific tables? For example, have all of your shipping forms in one table, personnel forms in a second, etc. It won't solve the querying structure problem (needing subqueries) but it will help shrink your tables and improve performance.
I hope this helps - I do sympathize with your plight as I've been in a similar situation myself!
Typically, when your database schema becomes very large and multiple users are trying to access the same information in many different ways, data warehousing is applied to reduce the load on the database server. Unlike a traditional schema, where you are most likely using normalization to maintain data integrity, a data warehouse is optimized for speed and stores multiple copies of your data.
Try using the relational model of data. It works.