We have a SQL Server database design scenario: we need to store data about different organizations (e.g. Customer, Vendor, Distributor, ...). All the different organizations share (almost) the same kind of information, such as address details, and they will be referenced from other tables (i.e. linked via OrgId, so we have to look up OrgName in many different places).
I see two options:
Create a table for each organization type, e.g. OrgCustomer, OrgDistributor, OrgVendor, etc. All the tables will have a similar structure, and some tables will have extra special fields; for example, the customer table has a HomeAddress field that the other Org tables don't have, and vice versa.
Create a common OrgMaster table and store ALL the different Orgs in a single place. The table will have an OrgType field to distinguish between the different types of Orgs, and the special fields will be appended to the OrgMaster table (only the relevant Org records will have values in those fields; for the others they'll be NULL).
Some Pros & Cons of #1:
PROS:
It helps distribute the load when accessing different types of Org data, so I believe this improves performance.
Provides full scope for customizing any particular Org table without affecting the other existing Org types.
Not sure whether different indexes on separate tables work better than a single big table.
CONS:
Replication of design: if I have to increase the size of the ZipCode field, I have to do it in ALL the tables.
Replication of the data-manipulation implementation (we use stored procedures for CRUD operations, so the replication goes n-fold: 3-4 INSERT SPs, 2-3 SELECT SPs, etc.).
Everything grows n-fold, from DB constraints and indexing to stored procedures to the business objects in the application code.
A common change made in one place has to be made in all the other places as well.
Some Pros & Cons of #2:
PROS:
N-fold becomes 1-fold :-)
Maintenance gets easier because we can implement single entry points for all operations (e.g. a single SP to handle CRUD operations).
We only have to worry about maintaining a single table; indexing and other optimizations are limited to that one table.
CONS:
Does it create a bottleneck? Can that be managed by implementing views and other optimized data-access strategies?
The flip side of a centralized implementation is that a single change has to be tested and verified everywhere it is used.
The design might seem a little less 'organized/structured', especially because of the few Orgs that need 'special' fields (which are irrelevant to the other Org types).
I also have an Option #3 in mind: keep the Org tables separate but create a common OrgAddress table for the common fields. But that puts me somewhere between #1 and #2 and creates even more confusion!
To be honest, I'm an experienced programmer but not an equally experienced DBA, since that's not my mainstream job, so please help me work out the right trade-off between parameters like design complexity and performance.
Thanks in advance. Feel free to ask any technical questions; suggestions are welcome.
Hemant
I would say your 2nd option is close; just a few points:
Customer, Distributor, Vendor are TYPES of organizations, so I would suggest:
A table [Organization] that has all the columns common to all organizations, plus a primary key for the row.
Separate tables [Vendor], [Customer], [Distributor] with the columns specific to each one and an FK to the [Organization] row's PK.
This sounds like a "supertype/subtype relationship".
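As a rough sketch of that supertype/subtype layout (the column sizes and the extra columns here are illustrative assumptions, not taken from your spec):

CREATE TABLE Organization (
    OrgId       INT IDENTITY PRIMARY KEY,
    OrgName     NVARCHAR(100) NOT NULL,
    OrgType     CHAR(1) NOT NULL,       -- optional discriminator, e.g. 'C', 'V', 'D'
    AddressLine NVARCHAR(200),
    ZipCode     NVARCHAR(10)
);

CREATE TABLE Customer (
    OrgId       INT PRIMARY KEY REFERENCES Organization(OrgId),
    HomeAddress NVARCHAR(200)           -- customer-only field
);

CREATE TABLE Vendor (
    OrgId INT PRIMARY KEY REFERENCES Organization(OrgId)
    -- vendor-specific columns go here
);

Other tables keep referencing Organization.OrgId for the OrgName lookups, and each subtype table carries only the columns specific to that type.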
I have worked on various applications that have implemented all of your options. To be honest, you probably need to take into account the way your users work with the data, how many records you are expecting, commonality (the same organisation having multiple functions), and what level of updating of the records you are expecting.
Option 1 worked well in an app where there was very little commonality. I have used what is effectively your option 3 in an app where there was more commonality, and didn't like it very much (there is more work involved in getting the data from different layers all of the time). A rewrite of this app is implementing your option 2 because of this.
HTH
I want to create a database for a forum website.
All the users of the forum website will be stored in a table named USERS with the following fields:
user_name
user_ID
(and additional details)
There will be a single table named FORUMS with the following fields:
forum_ID
forum_creatorID (which is the ID of one of the users)
forum_topic
replies
views
And for each forum created (i.e. for each row in the FORUMS table), there will be a separate table named "forum_ID"_replies, where the actual forum_ID of that forum replaces the part in quotes.
Thus, each forum will have a separate table where all the replies for that particular forum are saved.
The fields in the "forum_ID"_replies table are:
user_ID
user_name
comment
timestamp (for the comment)
I hope I have made my design clear. Now, my question is this:
I saved user_name as one of the fields in each "forum_ID"_replies table. But I think the user_name could instead be looked up from the USERS table using the user_ID, rather than being stored in each "forum_ID"_replies table. That way, redundancy is reduced.
But if user_name is stored in each table, the lookup of user_name is avoided and the result can be displayed faster.
Which is better?
Storing names along with their IDs for faster access, or storing only the IDs to avoid redundancy?
"Optimal", "better" etc. are all subjective.
Most database designers would have several problems with your proposal.
Database normalization recommends not duplicating data - for good reason. What happens if your user changes their username? You have to update the user table, but also find all the "forum_id"_replies tables where their username occurs; if you mess that up, all of a sudden, you have a fairly obvious bug - people think they're replying to "bob", but they're actually replying to "jane".
From a performance point of view, unless you have esoteric performance demands (e.g. you're running Facebook), the join to the user table will have no measurable impact - you're joining on a primary key column, and this is what databases are really, really good at.
Finally, creating separate tables for each forum is not really a good idea unless you have huge performance/scalability needs (read: you're Facebook) - the additional complexity in maintaining the database, building queries, connecting your apps to the database etc. is significant; the performance overhead of storing multiple forums in a single table usually is not.
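As a rough sketch of the single-table alternative (the names below just mirror your description, plus an assumed reply_ID key and created_at column):

CREATE TABLE REPLIES (
    reply_ID   INT PRIMARY KEY,
    forum_ID   INT NOT NULL REFERENCES FORUMS(forum_ID),
    user_ID    INT NOT NULL REFERENCES USERS(user_ID),
    comment    VARCHAR(2000),
    created_at DATETIME
);

-- An index on forum_ID keeps "all replies for forum X" fast:
CREATE INDEX idx_replies_forum ON REPLIES (forum_ID);

-- Replies for one forum, with user_name looked up via the join:
SELECT r.comment, r.created_at, u.user_name
FROM REPLIES r
JOIN USERS u ON u.user_ID = r.user_ID
WHERE r.forum_ID = 42;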
"Better" depends on your criteria. If (as you write in the comments) you are concerned about scalability and supporting huge numbers of posts, my recommendation is to start by building a way of testing and measuring your scalability levels. Once you can test and measure, you can test different solutions, and know whether they have a material impact - very often, this shows counter-intuitive outcomes. Performance optimizations often come at the expense of other criteria - your design, for instance, is more error prone (repeated information means you can get discrepancies) and more expensive to code (writing the logic to join to different tables for each forum). If you can't prove that it has a material benefit in scalability, and that this benefit meets your business requirements, you're probably wasting time & money.
You can use tools like DBMonster to populate your database with test data, and JMeter to run lots of concurrent database queries - use those tools to try both solutions, and see if your solution is, indeed, faster.
Someone else pointed out to me that the following database design has serious issues. Can anyone tell me why?
A tb_user table stores all the users' information.
The tb_user table will have only 3-8 users.
Each user's data will be saved in a separate table, named after the user.
Say a user is called bill_admin; then he has a separate table, i.e. bill_admin_data, to store all the data belonging to him. All users' data shares the same structure.
The person who pointed out this problem said I should merge all the data into one table and use an FK to distinguish it, but I would note the following:
There will only be 3-8 users, so there aren't going to be a lot of tables anyway.
Each user has a very large data table, say 500K records.
Is it bad practice to design a database like this? And why? Thank you.
Because it isn't very maintainable.
1) Adding data to a database should never require modifying the structure. In your model, if you ever need to add another person you will need a new table (or two). You may not think you will ever need to do this, but trust me. You will.
So assume, for example, you want to add functionality to your application to add a new user to the database. With this structure you will have to give your end users rights to create new tables, which creates security problems.
2) It violates the DRY principle. That is, you are creating multiple copies of the same table structure that are identical. This makes maintenance a pain in the butt.
3) Querying across multiple users will be unnecessarily complicated. There is no good reason to split each user into a separate table other than having a vendetta against the person who has to write queries against this DB model.
4) If you are splitting it into multiple tables for performance because each user has a lot of rows, you are reinventing the wheel. The RDBMS you are using undoubtedly has an indexing feature which allows it to efficiently query large tables. Your home-grown hack is not going to outperform the platform's approach for handling large data.
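For illustration only, a merged layout might look like this (I'm assuming tb_user has a user_id key and a user_name column; the data columns are placeholders):

CREATE TABLE tb_user_data (
    data_id  INT PRIMARY KEY,
    user_id  INT NOT NULL REFERENCES tb_user(user_id),
    payload  VARCHAR(255),     -- whatever columns the per-user tables held
    created  DATETIME
);

-- One index lets the engine narrow 8 x 500K rows to a single user's rows efficiently:
CREATE INDEX idx_user_data_user ON tb_user_data (user_id);

-- All of bill_admin's rows, without a dedicated table:
SELECT d.*
FROM tb_user_data d
JOIN tb_user u ON u.user_id = d.user_id
WHERE u.user_name = 'bill_admin';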
I wouldn't say it's bad design per se. It is just not the type of design that relational databases are designed and optimized for.
Of course, you can store your data as you mention, but many operations won't be trivial. For example:
Adding a new person
Removing a person
Generating reports based on data across all your people
If you don't really care about any of this, go ahead and create your tables as you propose, although I would recommend using a non-relational database, such as MongoDB, which is better suited to this type of structure.
If you prefer to use a relational database, aggregating data by type rather than by person gives you a lot of flexibility when adding new people and generating reports.
500K rows is not "very large", so don't worry about size when making your design.
It is good to use a document-based database like MongoDB for this type of requirement.
We make software for managing participants in grants given to non-profits. (For example, if your family needs food stamps, then that office must somehow track your family and report to the state.)
Up until now we have focused on one particularly complex grant. We now want to expand to other grants. Our first target was a fairly simple grant, and the code for it was just piled onto the old application. We have now decided the best course of action is to separate the two programs (because not all of our clients have both grants). This sounds easy in theory.
We can manage the code complexity this brings about fairly easily with patches and SVN's merge functionality. What is significantly harder is that the database is shared: the two grants share a few tables and a few procedures, and this is a rather large legacy database (more than 40 tables, hundreds of stored procedures).
What exactly is the best way to keep these two databases separate while still sharing their common elements? We are not concerned about conflicts between the two applications writing to the same DB (we have locks for that); rather, we're concerned about schema conflicts during development, updating our clients' servers, and managing the complexity.
We have a few options we've thought of:
Using Schemas (shared, grant1, grant2)
Using prefixed names
The book SQL Antipatterns has a solution to this sort of thing. Sharing common data is fine; just move the extended data into a separate table, much like extending a class in OOP.
Persons Table
--------------------------
PersonID LastName FirstName
FoodStamps Table (From Application 1)
--------------------------
PersonID FoodStampAllotment
HousingGrant Table (From Application 2)
--------------------------
PersonID GrantAmount
You could add prefixes to the extended tables if you want.
This query will return just the people in the FoodStamps program:
SELECT * FROM Persons
JOIN FoodStamps
ON FoodStamps.PersonID = Persons.PersonID
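In DDL form, and folding in the schema idea from the question (this is only a sketch; the schema names and column types are assumptions):

CREATE SCHEMA shared;
GO
CREATE SCHEMA grant1;
GO
CREATE SCHEMA grant2;
GO

-- Shared core table
CREATE TABLE shared.Persons (
    PersonID  INT PRIMARY KEY,
    LastName  NVARCHAR(50),
    FirstName NVARCHAR(50)
);

-- Extension table for application 1
CREATE TABLE grant1.FoodStamps (
    PersonID           INT PRIMARY KEY REFERENCES shared.Persons(PersonID),
    FoodStampAllotment MONEY
);

-- Extension table for application 2
CREATE TABLE grant2.HousingGrant (
    PersonID    INT PRIMARY KEY REFERENCES shared.Persons(PersonID),
    GrantAmount MONEY
);

Each application then only deploys its own schema plus the shared one, which keeps the surface for schema conflicts small.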
We're doing a little redesign and consolidation of some tables in a database. Where we once had two tables, 'administrators' and 'users', we're now combining them into a single table called 'users'. To facilitate that change we've created a 'user_types' table and a 'user_user_types' table, which is the one-to-many linking table between 'users' and 'user_types'.
The reason we need the 'user_types' table is that there are different types of administrators, one being a super admin who has access to everything in our system. In the old setup there was a bit field in the 'administrators' table called 'SiteAdmin' which indicated whether that particular admin was a super admin or not. Since shifting to this new system, my thought was that there would be a super admin entry in the 'user_types' table and we'd just link the appropriate users to that user type. However, my fellow programmer says he still wants a 'SiteAdmin' bit field in the new 'users' table. I think that is a bit redundant, but he claims there would be excess load and it would be more processor intensive for SQL Server to join the 'users' and 'user_types' tables to determine whether a particular user is a super admin.
My question is: which way is better? Is the hit on SQL Server from joining the two tables really so great that it warrants adding the bit field to the 'users' table to flag super admins?
Thanks!
Your re-design seems reasonable and more maintainable down the road. I doubt the join will impact performance, especially since you are probably querying for the user type(s) once per user upon login.
You could write some tests against both designs: fill the tables with a bunch of fake data and time the join-based query against the flag-based one.
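For reference, the lookup being debated is roughly this (I'm assuming column names like user_id, user_type_id and a name column on 'user_types'; adjust to your actual schema):

-- Is user 123 a super admin? With keys on the join columns this is a few index seeks.
SELECT 1
FROM users u
JOIN user_user_types uut ON uut.user_id = u.user_id
JOIN user_types ut ON ut.user_type_id = uut.user_type_id
WHERE u.user_id = 123
  AND ut.name = 'SuperAdmin';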
The performance difference will depend on how much data is in these tables. If you're talking about a few hundred or a few thousand user rows, you won't see any difference between the solutions. If you have millions of users, and perhaps a great deal of simultaneous access to the data, you may see a difference.
I would find it strange if there were much of a performance hit from joining to what will be a small table.
What could be a bigger problem is storing different types of users in two different ways, which is confusing for maintenance. I think your solution is more straightforward for the long term.
I have an MS Access database with plenty of data. It's used by an application my team and I are developing. However, we've never added any foreign keys to this database because we could control the relations from the code itself. We've never had any problems with this, and probably never will.
However, as development has progressed, I fear there's a risk of losing sight of all the relationships between the 30+ tables, even though we use well-normalized data. So it would be a good idea to at least get the relations between the tables documented.
Altova has created DatabaseSpy, which can show the structure of a database, but without the relations there isn't much to display. I could still use it to add relations, but I don't want to modify the database itself.
Is there any software that can analyse a database by its structure and data and then make a best guess about its relations? (Just for documentation, not to modify the database.)
This application was created more than 10 years ago and has over 3000 paying customers who all use it. It's actually document-based, using an XML document for its internal storage. The database is just used as storage, and a single import/export routine converts it to and from XML. Unfortunately, the XML structure isn't very practical for documentation, and there's a second layer around this XML document that exposes it as an object model. That object model is far from perfect too, but that's what 10 years of development can do to an application. We do want to improve it, but that takes time and we can't disappoint the current users by delaying new updates. Basically, we're stuck with the current design, and to improve it we need to make sure things are well documented. That's what I'm working on now.
Only 30+ tables? Shouldn't take but a half hour or an hour to create all the relationships required. Which I'd urge you to do. Yes, I know that you state your code checks for those. But what if you've missed some? What if there are indeed orphaned records? How are you going to know? Or do you have bullet proof routines which go through all your tables looking for all these problems?
Use a largish 23" LCD monitor and have at it.
If your database does not have relationships defined somewhere other than code, there is no real way to guess how tables relate to each other.
Worse, you can't know the type of relationship and whether cascading of update and deletion should occur or not.
Having said that, if you followed some strict rules for naming your foreign key fields, then it could be possible to reconstruct the structure of the relationships.
For instance, I use a scheme like this one:
Table Product
- Field ID /* The Unique ID for a Product */
- Field Designation
- Field Cost
Table Order
- Field ID /* the unique ID for an Order */
- Field ProductID
- Field Quantity
The relationship is easy to detect when looking at Order: Order.ProductID is related to Product.ID, and this can easily be ascertained from code by going through each field.
If you have a similar scheme, then how much you can get out of it depends on how well you followed your own convention, but it could approach 100% accuracy, although you'll probably have some exceptions (which you can build into your code or, better, look up somewhere).
The other option is for each table's unique ID to follow a different numbering scheme.
Say your Order.ID in fact follows a scheme like OR001, OR002, etc., and Product.ID follows PD001, PD002, etc.
In that case, going through all fields in all tables, you can search for FK records that match each PK.
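For example, once a candidate pair such as Order.ProductID -> Product.ID has been found, a query along these lines confirms it by looking for child values with no parent ([Order] is bracketed because it's a reserved word):

SELECT O.ID, O.ProductID
FROM [Order] AS O
LEFT JOIN Product AS P ON P.ID = O.ProductID
WHERE P.ID IS NULL;
-- No rows returned means every Order.ProductID matches a Product.ID,
-- which supports creating the relationship (and otherwise flags orphaned records).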
If you're following a sane convention for naming your fields and tables, then you can probably automate the discovery of the relations between them, store that in a table and manually go through to make corrections.
Once you're done, use that result table to actually build the relationships from code using the Database.CreateRelation() method (look up the Access documentation, there is sample code for it).
You can build a small piece of VBA code, divided into two parts:
Step 1 creates the database relations with the Database.CreateRelation method.
Step 2 deletes all the relations created in step 1 again.
As Tony said, 30 tables is not that many, and the script should be easy to set up. Once it is set up, stop the process after step 1, run the Access documenter (Tools\Analyse\Documenter) to get your documentation ready, then launch step 2. Your database will then be unchanged and your documentation ready.
I advise you to keep this code and run it regularly against your database to check that your relational model still matches the data.
There might be a tool out there that can "guess" the relations, but I doubt it. Frankly, I am scared of databases without proper foreign keys in particular, and of multi-user apps that use Access as a DBMS as well.
I guess the app must be some sort of internal tool; otherwise I would suggest that you move to a proper DBMS (SQL Server Express is free) and add the foreign keys.