Database Design - LKP Schemas for Read-Only Tables? Best Practices? [closed] - sql-server

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I am relatively new to coding in the Microsoft stack and some practices in my new workplace differ from things I've seen before. Namely, I have seen a practice where Read-Only tables (ones that the application is not meant to be able to insert/edit/delete in) are prefixed with "lkp.EmailType", "lkp.Gender", "lkp.Prefix" and so on.
However, when I started developing some MVC5 apps using Entity Framework and a Database-First approach - when debugging my code I noticed it attempts to both pluralize the table name and change the schema - so "lkp.Gender" queries take on a select statement on "dbo.Genders". After looking into the pluralizing functionality, it seems best practice leans toward pluralizing table names, so I went ahead and did that for this application (this is a new application but we are using a similar DB structure as prior ones but do not have to keep it the same).
The last thing I need to do - is change these table schemas to be "dbo" as opposed to "lkp". In talking with some coworkers on their other projects, they found while read only lookup tables might use the DBO schema for their project, they might name it differently such as "dbo.LkpGenders" or the like.
This takes a bit of work to remove constraints on other tables using these LKP tables and such and I wanted to ask the community before I put too much effort toward this change if it is even a good idea or not and put my time towards either making LKP tables work or doing away with them.
In short - Is usage of LKP schemas for read-only tables an old practice or is this still a good idea to do and I just have been in other workplaces and project who were doing it "wrong"? As an added bonus, reasoning why MVC5/EF may be using DBO schemas on something it created an EDMX fine out of would be good to know. Should I be using a naming convention, DB Views, or LKP schemas for this kind of read-only lookup data?

Some thoughts:
I like plural table names. A row can contain an entity; a table can contain many entities. But, naming conventions should be guidelines rather than carved-in-stone rules. It is impossible that any one rule would be the best alternative under all situations. So allow some flexibility.
My only exception to that last caveat is to name tables and views identically. That is, the database object Employees could be either a table or view. The apps that use it wouldn't know which one it is (or care) and the DB developers could quickly find out (if it was pertinent). There is absolutely no reason to differentiate between tables and views by name and many good reasons to abstract tables and views to just "data sources".
The practice of keeping significant tables in their own database/schema is a valid one. There is something about these tables (read-only) that group them together organizationally, so it can make sense to group them together physically. The problem can be when there are other such attributes: read-only employee data, read-only financial data, etc. If employee and financial data are also segregated into their own database/schema, which is the more significant attribute that would determine where they are located: read-only or employee/financial?
In your particular instance, I would not think that "read-only" is significant enough to rate segregation. Firstly, read-only is not a universal constraint -- someone must be able to maintain the data. So it is "read-only here, writable there". Secondly, just about any grouping of data can have some of that data that is generally read-only. Does it make sense to gather read-only data that is of use only to application X and read-only data that is of use only to application Y in the same place just because they are both read-only? And suppose application X now needs to see (read-only, of course) some of application Y's data to implement a new feature? Would that data be subject to relocation to the read-only database?
A better alternative would be to place X-only data in its own location, Y-only data in its own location and so forth. Company-wide data would go in dbo. Each location could have different requirements for the same common data -- read-only for some, writable for others. These differing requirements could be implemented by local views. A do nothing "instead of" trigger on the view would render it completely read only, but a view with working triggers would make it indistinguishable from the underlying table(s). Each application would have its own view in its own space with triggers as appropriate. So each sees the same data but only one can manipulate that data.
Another advantage to accessing common (dbo) data or shared data from another location through local views is that each application, even though they are looking at the same data, may want the data in different formats and/or different field names. Views allow you to provide the data to each application exactly the way that application wants to see it.
This can also greatly improve the maintainability of your physical data. If a table needs to be normalized or denormalized or a field renamed, added or dropped entirely, go ahead and do it. Just rewrite the views to minimize if not completely eliminate the differences that make it back to the apps. Application code may not have to be changed at all. How's that for cool?

Related

Database schema for Partners [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
We have an application to manage company, teams, branches,employee etc and have different tables for that. Now we have a requirement that we have to give access of same system to our technology partners so that they can also do the same thing which we are doing. But at the same time we need to supervise these partners in our system.
So in terms of DB schema what will be the best way to manage them:
1)To duplicate the entire schema for partners, and for that we have to duplicate around 50-60 tables and many more in future as system will grows.
2)To create some flag in each table which will tell it is internal or external entity.
Please suggest if anyone has any experience.
Consider the following points before finalizing any of the approaches.
Do you want a holistic view of the data
By this I mean that do you want to view the data your partner creates and which you create in a single report / form. If the answer is yes then it would make sense to store the database in the same set of tables and differentiate them based on some set of columns.
Is your application functionality going to vary significantly
If the answer to this question is NO then it would make sense to keep the data in the same set of tables. This way any changes you do to your system will automatically reflect to all the users and you won't have to replicate your code bits across schemas / databases.
Are you and your partner going to use the same master / reference data
If the answer to this question is yes then again it makes sense to use the same set of tables since you will do away with unnecessary redundant data.
Implementation
Rather than creating a flag I would recommend creating a master table known as user_master. The key of this table should be made available in every transaction table. This way if you want to include a second partner down the line you can make a new entry in your user_master table and make necessary modifications to your application code. Your application code should manage the security. Needless to say that you need to implement as much security as possible at the database level too.
Other Suggestions
To physical separate data of these entities you can either implement
partitioning or sharding depending upon the db you are using.
Perform thorough regression testing and check that your data is not
visible in partner reports or forms. Also, check that partner is not
able to update or insert your data.
Since the data in your system will increase significantly it would
make sense to performance test your reports, forms and programs.
If you are using indexes then you will need to revisit those since
your where conditions would change.
Also, revisit your keys and relationships.
None of your asked suggestion is advisable. You need to follow given guideline to secure your whole system and audit your technology partner as well.
[1]You should create a module on Admin side which will show you existing tables as well table which will be added in future.
[2]Create user for your technology partner and provide permission on those objects.
[3]Keep one audit-trail table, and insert entry of user name/IP etc.in it. So you will have complete tracking of activity carried out by your technology partner.

Database constraints vs Application level validation [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
While researching the topic, I came across this post: Should you enforce constraints at the database level as well as the application level?
The person who answered the question claimed that we should enforce Database constraint because it is "easier, integrity, flexible".
The reason I brought out this question is because of my recent maintenance work in one of a very robust systems. Due to a change in business rule, one of the data columns used to have CHAR(5) is now accepting 8 Characters. This table has many dependencies and will also affect many other tables not only in the database but also a few other systems, thus increasing the size to CHAR(8) is literally impossible.
So my question goes back to the database design - wouldn't it be so much easier if you reduce or even eliminate the need of database constraints? If the above mentioned scenario would have happened, all you have to do is to change the front-end or application level validation to make sure the user enter 8 characters for that field.
In my opinion, we should minimize the database constraint to anticipate any changes in the data structure in the future. What is your thought?
It's easier to maintain 100 tables than 100,000 lines of code. In general, constraints that are enforced in the application but not in the database have to be replicated across many applications. Sometimes those applications are even written and maintained by different teams.
Keeping all those changes in sync when the requirements change is a nightmare. The ripple effect is even worse than the cases you outline for changing a five character field into an 8 character field. This is how things were done before databases were invented.
Having said that, there are situations where it's better to enforce the constraints in applications than in the database. There are even cases where it's better to enforce a constraint in both places. (Example: non null constraint).
And very large organizations sometimes maintain a data dictionary, where every data item is cataloged, defined, and described in terms of features, including constraints. In this kind of environment, databases actually acquire their data definitions from the dictionary. And application programs do the same thing, generally at precompile time.
Future proofing such an arrangement is still a challenge.
I agree with you that, constraints like the length of the field should be avoided, you never know how your business will changed. and hardware nowadays are cheep, it really not necessary to use CHAR(8) just for less storage.
But those contraints like not null constraints,duplicate check and foreignkey constraints for a header details table is better to be kept. it's like the goal keeper of your data intergrate.
Database systems provide a number of benefits, one of the most important is (physical) data independence. Data independence can be defined as an immunity of application program to change in the way that the data is physically stored and accessed, this concept is tightly related to data-model design and normalization roles where data constraints are fundamental.
Database sharing is one of the application integration patterns, widely used between independent applications. Tradeoff will be trying to spread data integrity code in all applications or in a centric fashion inside database.
Minimizing the database constraint will minimize usage of wide range of well-known, proven technologies developed over many years by a wide variety of very smart people.
As a foot note:
This table has many dependencies and will also affect many other
tables not only in the database but also a few other systems
Beside this smells redundancy, at least it shows the side effect of the change. Think about when you have to find the side effects with code review!
Application comes, applications go but data remains.

Data independence in relational database

Can anybody explain how data independence in ensured in a relational database? What says that nothing will change for the user if the database structure changes?
For example, I have relation R (and have created an application which uses this relation) and the database admin decides to make a decomposition of R to R1 and R2. How is application inalterability ensured for the end user?
I asked myself exactly the same question during my Database class.
According Codd's 12 rules, there are two kinds of data independence:
Physical Data Independence requires that changes at the physical level (like data structures) have no impact in the applications that consume the database. For example, let's say you decide to stop using a Hash Index in your table and decide to use a B-Tree Index instead: Your application that executes queries against this table doesn't have to change at all.
Logical Data Independence states that changes at the logical level (tables, columns, rows) will have no impact in the applications that access the database. As you already noticed, this feature is harder to implement that Physical Data Independence but there are still cases when this feature works. For example, if you add Tables, Columns or Rows to your current scheme the already working queries aren't affected at all.
Your question is not phrased very clearly. I don't see the relationship between between "data independence" and "application inalterability".
A proper relational structure decomposes data into entities and relationships. The idea is that when a value changes, it only changes in one place. This is the reasoning behind the various "normal forms" of data.
Most user applications do not want to see data in a normalized form. They want to see data in a denormalized form, often with lots of fields gathered together on one line. Similarly, an update might involve several fields in different entities, but to a user, it is just one thing.
A relational database can maintain the structure of the data and allow you to combine data for different viewpoints. It has nothing to do with your second point. Application independence (I think this is a better word than "inalterability") depends on how the application is designed. A well-designed application has a well-design application programming interface (also known as an API).
It seems that a lot of database developers think that the physical data structure is good enough as an API. However, this is often a bad design decision. Often, a better design decision is to have all database operations performed through stored procedures, views, and user defined functions. In other words, don't directly update a table. Create a stored procedure called something usp_table_update that takes fields and updates the table.
With such a structure, you can modify the underlying database structure and maintain user applications at the same time.
what says that nothing will change for the user if the database
structure changes?
Well, database structures can change for many reasons. On a high level, I see two possibilities:
Performance / internal database reasons
Business rules / the world outside the application changed
#1: in this case, the DBA has decided to change some structure for performance or ... In that case an extra layer, for example using stored procedures, views etc. can help to "hide" the change to the application/user. Or a good data-layer on the application side could be helpfull.
#2: if the outside world changes, or your business rules change, NOTHING you can do on the database level can keep that away from the user. For example a company that always has used only ONE currency in the database is suddenly going international: in that case your database has to be adopted to support multi currency and it will need serious alteration in the database and for the user.
For example, I have relation R (and created application which uses this relation) and the database admin desides to make a decomposion of R to R1 and R2. How the application inalternability is ensured for the end user?
The admin should create a view which would represent R1 and R2 as the original R.

Database per application VS One big database for all applications [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I'm designing a few applications that will share 2 or 3 database tables and all of the other tables will be independent of each app. The shared databases contain mostly user information, and there might occur the case where other tables need to be shared, but that's my instinct speaking.
I'm leaning over the one database for all applications solution because I want to have referential integrity, and I won't have to keep the same information up to date in each of the databases, but I'm probably going to end with a database of 100+ tables where only groups of ten tables will have related information.
The database per application approach helps me keep everything more organized, but I don't know a way to keep the related tables in all databases up to date.
So, the basic question is: which of both approaches do you recommend?
Thanks,
Jorge Vargas.
Edit 1:
When I talk about not being able to have referential integrity, it's because there's no way to have foreign keys in tables when those tables are in different databases, and at least one of the tables per application will need a foreign key to one of the shared tables.
Edit 2:
Links to related questions:
SQL design around lack of cross-database foreign key references
Keeping referential integrity across multiple databases
How to salvage referential integrity with mutiple databases
Only the second one has an accepted answer. Still haven't decided what to do.
Answer:
I've decided to go with a database per application with cross-database references to a shared database, adding views to each database mimicking the tables in the shared database, and using NHibernate as my ORM. As the membership system I'll be using the asp.net one.
I'll also use triggers and logical deletes to try and keep to a minimum the number of ID's I'll have flying around livin' la vida loca without a parent. The development effort needed to keep databases synced is too much and the payoff is too little (as you all have pointed out). So, I'd rather fight my way through orphaned records.
Since using an ORM and Views was first suggested by svinto, he gets the correct answer.
Thanks to all for helping me out with this tough decision.
Neither way looks ideal
I think you should consider not making references in database layer for cross-application relations, and make them in application layer. That would allow you to split it to one database per app.
I'm working on one app with 100+ tables. I have them in one database, and are separated by prefixes - each table has prefix for module it belongs to. Then i have built a layer on top of database functions to use this custom groups. I'm also building data administrator, which takes advantage of this table groups and makes editing data very easy.
It depends and your options are a bit different depending on the database and frameworks you're using. I'd recommend using some sort of ORM and that way you don't need to bother that much. Anyways you could probably put each app in it's own schema in the database and then either reference the shared tables by schemaname.tablename or create views in each application schema that's just a SELECT * FROM schemaname.tablename and then code against that view.
There are no hard and fast rules to choose one over the other.
Multiple databases provide modularity. As far as sync-ing across multiple databases are concerned, one can use the concept of linked servers and views thereof and can gain the advantages of integrated database (unified access) as well.
Also, keeping multiple databases can help better management of security, data, backup & restore, replication, scaling out etc!
My 2cents.
THat does not sound like "a lot of applications" at all, but like "one application system with different executables". Naturally they can share one database. Make smart usage of Schemata to isolate the different funcational areas within one database.
One database for all application in my opinion .Data would be stored once no repitation.
With the other approach you would end up replicating and in my opinion when you start replicating it will bring its own headache and data would go out of sync too
The most appropriate approach from scalability and maintenence point of view would be to make the "shared/common" tables subset self-sufficient and put it to "commons" database, for all others have 1 db per application of per logical scope (business logic determined) and maintain this structure always
This will ease the planning and execution commissioning/decommissioning/relocation/maintenence procedures of your software (you will know exactly which two affected DBs (commons+app_specific) are involved if you know which app you are going to touch and vice versa.
At our business, we went with a separate database per application, with cross database references for the small amount of shared information and an occasional linked server. This has worked pretty well with a development, staging, build and production environments.
For users, our entire user base is on windows. We use Active Directory to manage the users with application references to groups, so that the apps don't have to manage users, which is nice. We did not centralize the group management, that is each application has tables for groups and security which is not so nice but works.
I would recommend, that if your applications are really different, to have a database per application. Looking back, the central shared database for users sounds workable as well.
You can use triggers for cross database referential integrity:
Create a linked server to the server that holds the database that you want to reference. Then use 4-part naming to reference the table in the remote database that holds the reference data. Then put this in the insert and update triggers on the table.
EXAMPLE(assumes single row inserts and updates):
DECLARE #ref (datatype appropriate to your field)
SELECT #ref = refField FROM inserted
IF NOT EXISTS (SELECT *
FROM referenceserver.refDB.dbo.refTable
WHERE refField = #ref)
BEGIN
RAISERROR(...)
ROLLBACK TRAN
END
To do multi row inserts and updates you can join the tables on the reference field but it can be very slow.
I think the answer to this question depends entirely on your non functional requirements. If you are designing a application that will one day need to be deployed across 100's of nodes then you need to design your database so that if need be it could be horizontally scaled. If on the other hand this application is to be used by a hand full of users and may have a short shelf life then you approach will be different. I have recently listened to a pod cast of how EBAY's architecture is set-up, http://www.se-radio.net/podcast/2008-09/episode-109-ebay039s-architecture-principles-randy-shoup, and they have a database per application stream and they use sharding to split tables across physical nodes. Now their non-functional requirements are that the system is available 24/7, is fast, can support thousands of users and that is does not lose any important data. EBAY make millions of pounds and so can support the effort that this takes to develop and maintain.
Anyway this does not answer your question:) my personnel opinion would be to make sure your non-functional requirements have been documented and signed off by someone. That way you can decide on the best Architecture. I would be tempted to have each application using its own database and a central database for shared data. And I would try to minimise the dependencies between them, which I'm sure is not easy or you would have done it:), but I would also try to steer clear of having to produce some sort of middle ware software to keep tables in sync as this could create a headaches for you.
At the end of the day you need to get your system up and running and the guys with the pointy hair wont give a monkeys chuff about how cool your design is.
We went for splitting the database down, and having one common database for all the shared tables. Due to them all being on the save SQL Server instance it didn't affect the cost of running queries across multiple database.
The key in replication for us was that whole server was on a Virtual Machine (VM), so for replication to create Dev/Test environments, IT Support would just create a copy of that image and restore additional copies when required.

Good place to look for example Database Designs - Best practices [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Improve this question
I have been given the task to design a database to store a lot of information for our company. Because the task is rather big and contains multiple modules where users should be able to do stuff, I'm worried about designing a good data model for this. I just don't want to end up with a badly designed database.
I want to have some decent examples of database structures for contracts / billing / orders etc to combine those in one nice relational database. Are there any resources out there that can help me with some examples regarding this?
Barry Williams has published a library of about six hundred data models for all sorts of applications. Almost certainly it will give you a "starter for ten" for all your subsystems. Access to this library is free so check it out.
It sounds like this is a big "enterprise-y" application your organisation wants, and you seem to be a bit of a beginner with databases. If at all possible you should start with a single sub-system - say, Orders - and get that working. Not just the database tables build but some skeleton front-end for it. Once that is good enough add another, related sub-system such as Billing. You don't want to end up with a sprawling monster.
Also make sure you have a decent data modelling tool. SQL Power Architect is nice enough for a free tool.
Before you start read up on normalization until you have no questions about it at all. If you only did this in school, you probably don't know enough about it to design yet.
Gather your requirements for each module carefully. You need to know:
Business rules (which are specific to applications and which must be enforced in the database because they must be enforced on all records no matter the source),
Are there legal or regulatory concerns (HIPAA for instance or Sarbanes-Oxley requirements)
security (does data need to be encrypted?)
What data do you need to store and why (is this data available anywhere else)
Which pieces of data will only have one row of data and which will need to have multiple rows?
How do you intend to enforce uniqueness of the the row in each table? Do you have a natural key or do you need a surrogate key (suggest a surrogate key in almost all cases)?
Do you need replication?
Do you need auditing?
How is the data going to be entered into the database? Will it come from the application one record at a time (or even from multiple applications)or will some of it come from bulk inserts from an ETL tool or from another database.
Do you need to know who entered the record and when (highly likely this will be necessary in an enterprise system.
What kind of lookup tables will you need? Data entry is much more accurate when you can use lookup tables and restrict the users to the values.
What kind of data validation do you need?
Roughly how many records will the system have? You need to have an idea to know how big to create your test data.
How are you going to query the data? Will you be using stored procs or an ORM or dynamic queries?
Some very basic things to remember in your design. Choose the right data type for your data. Do not store dates or numbers you intend to do math on in string fields. Do store numbers that are not candidates for math (part numbers, zip codes, phone numbers, etc) as string data as you may need leading zeros. Do not store more than one piece of information in a field. So no comma-concatenated lists (these indicate the need for a related table) and while you are at it if you find yourself doing something like phone1, phone2, phone 3, stop right away and design a related table. Do use foreign keys for data integrity purposes.
All the way through your design consider data integrity. Data that has no integrity is meaningless and useless. Do design for performance, this is critical in database design and is NOT premature optimization. Database do not refactor easily, so it is important to get the most critical parts of the performance equation right the first time. In fact all databases need to be designed for data integrity, performance and security.
Do not be afraid to have multiple joins, properly indexed these will perform just fine. Do not try to put everything into an entity value type table. Use these as sparingly as possible. Try to learn to think in terms of handling sets of data, it will help your design. Databases are optimized to do things in sets.
There's more but this is enough to start digesting.
Try to keep your concerns separate here. Users being able to update the database is more of an "application design" problem. If you get your database design right then it should be a case of developing a nice front end for it.
First thing to look at is Normalization. This is the process of eliminating any redundant data from your tables. This will help keep your database neat, and only store information that is relevant to your needs.
The Data Model Resource Book.
http://www.amazon.com/Data-Model-Resource-Book-Vol/dp/0471380237/ref=dp_cp_ob_b_title_0
HEAVY stuff, but very well through out. 3 volumes all in all...
Has a lot of very well through out generic structures - but they are NOT easy, as they cover everything ;) Always a good starting point, though.
The database should not be the model. It is used to save informations between sessions of work.
You should not build your application upon a data model, but upon a good object oriented model that follows business logic.
Once your object model is done, then think about how you can save and load it, with all the database design that goes with it.
(but apparently your company just want you to design a database ? not an application ?)

Resources