What's the better way to structure this data within a database? - sql-server

We're doing a little redesign and consolidation of some tables in a database. Where we once had two tables, 'administrators' and 'users', we're now combining them into a single table called 'users'. To facilitate that change we've created a 'user_types' table and a 'user_user_types' table, the latter being the one-to-many linking table between 'users' and 'user_types'.
The reason we have to use the 'user_types' table is that there are different types of administrators, one being a super admin who has access to everything in our system. In the old setup there was a bit field in the 'administrators' table called 'SiteAdmin' which indicated whether a particular admin was a super admin or not. My thought was that under the new system there would be a 'super admin' entry in the 'user_types' table and we'd simply link the appropriate users to that type. However, my fellow programmer here says he still wants a 'SiteAdmin' bit field in the new 'users' table. I think that is redundant, but he claims it would add excess load and be more processor intensive on the SQL Server to join the 'users' and 'user_types' tables to determine whether a particular user is a super admin.
My question is: which way is better? Is the cost of joining the two tables so great that it warrants adding a bit field to the 'users' table to flag super admins?
Thanks!

Your re-design seems reasonable and more maintainable down the road. I doubt the join will impact performance, especially since you are probably querying for the user type(s) once per user upon login.
You could write some unit tests against both designs: fill the tables with a bunch of fake data and run a timer against the join-based lookup versus the flag-based one.
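As a concrete starting point for such a test, here is a minimal sketch of the join-based check, using SQLite as a portable stand-in for SQL Server (table names follow the question; the 'super admin' type name and the column names are assumptions):

```python
import sqlite3

# In-memory stand-in for the proposed schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE user_types (user_type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE user_user_types (
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    user_type_id INTEGER NOT NULL REFERENCES user_types(user_type_id),
    PRIMARY KEY (user_id, user_type_id)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO user_types VALUES (1, 'super admin'), (2, 'standard');
INSERT INTO user_user_types VALUES (1, 1), (2, 2);
""")

def is_super_admin(user_id):
    # The join probes the linking table's composite primary key,
    # so the lookup stays cheap even as the tables grow.
    cur.execute("""
        SELECT 1
        FROM user_user_types uut
        JOIN user_types ut ON ut.user_type_id = uut.user_type_id
        WHERE uut.user_id = ? AND ut.type_name = 'super admin'
    """, (user_id,))
    return cur.fetchone() is not None

print(is_super_admin(1), is_super_admin(2))  # True False
```

Swapping the function body for a `SELECT SiteAdmin FROM users WHERE user_id = ?` variant gives you the flag-based version to time against it.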

The performance difference will depend on how much data is in these tables. If you're talking about a few hundred or a few thousand user rows, you won't see any difference between the solutions. If you have millions of users, and perhaps a great deal of simultaneous access to the data, you may see a difference.

I would find it strange if there was much of a performance hit joining to what will be a small table.
What could be a bigger problem is storing different types of users in two different ways, which is confusing for maintenance. I think your solution is more straightforward for the long term.

Related

SQL Server Table With Multiple Entry Types

I'm not that experienced with SQL Server, but I need to come up with a solution to the following problem.
I'm creating a database that holds cars for sale. Cars are purchased via a handful of ways (contracts), here are 2 examples of the pricing fields needed:
I've left out unnecessary fields for the sake of clarity.
Type: Personal Contract Hire
Fields: InitialPayment, MonthlyPayment
Type: Personal Contract Purchase
Fields: InitialPayment, MonthlyPayment, GFMVPayment
The differences are subtle.
The question is, would it be better to create a table for each type along with some kind of header table or create a single table with a few extra unused fields? Or something else?
I know the purists will hate me for even raising the question of redundancy but the solution has to be practical too and I'm worried about overcomplicating something that needn't be.
I'm using Entity Framework as my ORM.
Any thoughts?
I've never designed a database, but I work with them every day at my job. The databases I encounter were designed by professionals with years of experience in IT, and many of our tables face the same issue you are describing here. Every single time, the answer is to create a single table with a few extra unused fields. I realize this may just be the preference of the IT team and that it is not the only way to do it, but as someone who writes dozens of business-analytics queries a day, I can confidently say that this design is very natural and easy to use.
You're probably going to run into this problem again in the future. You may even create another type that requires a 4th field. Imagine if every time that happened, you just added another table. Your database would quickly become hard to manage, and anyone else using it would need to memorize which three or four tables give access to pretty much the same data, with only subtle differences. That's not very user-friendly.
Overall, I suggest creating a single table with some unused fields.
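A minimal sketch of that single-table approach, using SQLite in place of SQL Server (the table and column names are assumptions based on the fields listed in the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE cars (
    car_id          INTEGER PRIMARY KEY,
    contract_type   TEXT NOT NULL,   -- e.g. 'PCH' or 'PCP'
    initial_payment REAL NOT NULL,
    monthly_payment REAL NOT NULL,
    gfmv_payment    REAL             -- NULL for Personal Contract Hire rows
);
INSERT INTO cars VALUES (1, 'PCH', 500.0, 199.0, NULL);
INSERT INTO cars VALUES (2, 'PCP', 750.0, 249.0, 4000.0);
""")

# One query serves both contract types; the type column keeps them apart.
cur.execute("SELECT contract_type, gfmv_payment FROM cars ORDER BY car_id")
rows = cur.fetchall()
print(rows)  # [('PCH', None), ('PCP', 4000.0)]
```

Rows of each contract type simply leave the fields they don't use as NULL, so a new contract type that adds a fourth pricing field means one new nullable column rather than a new table.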

Which is better? Redundancy with faster access to data, or no redundancy and slower data access?

I want to create a database for a forum website...
All the users of the forum website will be stored in a table named USERS with the following fields :
user_name
user_ID
(and additional details)
There will be a single table named FORUMS with the following fields:
forum_ID
forum_creatorID(which is the ID of one of the users)
forum_topic
replies
views
And for each forum created (i.e. for each row in the FORUMS table), there'll be a separate table named "forum_ID"_replies, where the quoted part is replaced by that forum's exact forum_ID...
Thus, each forum will have a separate table where all the replies to that particular forum are saved...
The fields in the "forum_ID"_replies table are:
user_ID
user_name
comment
timestamp(for the comment)
I hope I made my design clear... Now, my doubt is:
I saved user_name as one of the fields in each "forum_ID"_replies table. But I think the user_name could instead be referenced (or accessed) from the USERS table using the user_ID, instead of being stored in each "forum_ID"_replies table. That way, redundancy is reduced.
But if user_name is stored in each table, the lookup of user_name is avoided, and results can be displayed faster.
Which is more optimal ?
Storing names along with their IDs for faster access, or storing only the IDs to avoid redundancy?
"Optimal", "better" etc. are all subjective.
Most database designers would have several problems with your proposal.
Database normalization recommends not duplicating data - for good reason. What happens if your user changes their username? You have to update the user table, but also find all the "forum_id"_replies tables where their username occurs; if you mess that up, all of a sudden, you have a fairly obvious bug - people think they're replying to "bob", but they're actually replying to "jane".
From a performance point of view, unless you have esoteric performance demands (e.g. you're running Facebook), the join to the user table will have no measurable impact - you're joining on a primary key column, and this is what databases are really, really good at.
Finally, creating separate tables for each forum is not really a good idea unless you have huge performance/scalability needs (read: you're Facebook) - the additional complexity in maintaining the database, building queries, connecting your apps to the database etc. is significant; the performance overhead of storing multiple forums in a single table usually is not.
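To make the normalized alternative concrete, here is a minimal sketch (SQLite as a stand-in; names adapted from the question) of a single replies table keyed by forum_ID, with user_name resolved through the primary-key join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL
);
CREATE TABLE replies (
    reply_id   INTEGER PRIMARY KEY,
    forum_id   INTEGER NOT NULL,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    comment    TEXT,
    created_at TEXT
);
CREATE INDEX idx_replies_forum ON replies(forum_id);

INSERT INTO users VALUES (1, 'bob');
INSERT INTO replies VALUES (1, 42, 1, 'first reply', '2013-01-01 10:00');
""")

# user_name is resolved via the primary-key join instead of being copied
# into every reply row; renaming 'bob' then takes a single UPDATE.
cur.execute("""
    SELECT u.user_name, r.comment
    FROM replies r
    JOIN users u ON u.user_id = r.user_id
    WHERE r.forum_id = 42
""")
rows = cur.fetchall()
print(rows)  # [('bob', 'first reply')]
```

The index on forum_id gives each forum's replies fast access without a table per forum.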
"Better" depends on your criteria. If (as you write in the comments) you are concerned about scalability and supporting huge numbers of posts, my recommendation is to start by building a way of testing and measuring your scalability levels. Once you can test and measure, you can test different solutions, and know whether they have a material impact - very often, this shows counter-intuitive outcomes. Performance optimizations often come at the expense of other criteria - your design, for instance, is more error prone (repeated information means you can get discrepancies) and more expensive to code (writing the logic to join to different tables for each forum). If you can't prove that it has a material benefit in scalability, and that this benefit meets your business requirements, you're probably wasting time & money.
You can use tools like DBMonster to populate your database with test data, and JMeter to run lots of concurrent database queries - use those tools to try both solutions, and see if your solution is, indeed, faster.

What is wrong with this database design?

Someone pointed out to me that the following database design has serious issues; can anyone tell me why?
a tb_user table saves all the users' information
the tb_user table will have 3-8 users only
each user's data will be saved in a separate table, named after the user's name
Say a user is called bill_admin; then he has a separate table, i.e. bill_admin_data, to save all the data that belongs to him. All users' data shares the same structure.
The person who pointed out this problem said I should merge all the data into one table and use an FK to distinguish the users, but I have the following counterpoints:
there will only be 3-8 users, so there aren't going to be a lot of tables anyway.
each user has a very large data table, say 500K records.
Is it bad practice to design a database like this? And why? Thank you.
Because it isn't very maintainable.
1) Adding data to a database should never require modifying the structure. In your model, if you ever need to add another person you will need a new table (or two). You may not think you will ever need to do this, but trust me. You will.
So assume, for example, you want to add functionality to your application to add a new user to the database. With this structure you will have to give your end users rights to create new tables, which creates security problems.
2) It violates the DRY principle. That is, you are creating multiple copies of the same table structure that are identical. This makes maintenance a pain in the butt.
3) Querying across multiple users will be unnecessarily complicated. There is no good reason to split each user into a separate table other than having a vendetta against the person who has to write queries against this DB model.
4) If you are splitting it into multiple tables for performance because each user has a lot of rows, you are reinventing the wheel. The RDBMS you are using undoubtedly has an indexing feature which allows it to efficiently query large tables. Your home-grown hack is not going to outperform the platform's approach for handling large data.
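A minimal sketch of the merged design the answer describes, using SQLite as a stand-in (the table names follow the question; the payload column is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tb_user (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL
);
CREATE TABLE tb_user_data (
    data_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES tb_user(user_id),
    payload TEXT
);
-- the index on the FK is what keeps per-user queries fast,
-- even with hundreds of thousands of rows per user
CREATE INDEX idx_user_data_user ON tb_user_data(user_id);

INSERT INTO tb_user VALUES (1, 'bill_admin'), (2, 'jane_admin');
""")
cur.executemany(
    "INSERT INTO tb_user_data (user_id, payload) VALUES (?, ?)",
    [(1, 'row-a'), (1, 'row-b'), (2, 'row-c')],
)

# One table, one query shape for every user.
cur.execute("SELECT COUNT(*) FROM tb_user_data WHERE user_id = ?", (1,))
count = cur.fetchone()[0]
print(count)  # 2
```

Adding a ninth user is just an INSERT into tb_user; no DDL rights or new tables are required.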
I wouldn't say it's bad design per se. It is just not the type of design that relational databases are designed and optimized for.
Of course, you can store your data as you mention, but many operations won't be trivial. For example:
Adding a new person
Removing a person
Generating reports based on data across all your people
If you don't really care about doing these things, go ahead and create your tables as you propose, although I would recommend using a non-relational database, such as MongoDB, which is better suited to this type of structure.
If you prefer to use a relational database, aggregating data by type rather than by person gives you lots of flexibility when adding new people and calculating reports.
500k lines is not "very large", so don't worry about size when making your design.
It is good to use a document-based database like MongoDB for this type of requirement.

Can you have 2 tables with identical structure in a good DB schema?

2 tables:
- views
- downloads
Identical structure:
item_id, user_id, time
Should I be worried?
I don't think that there is a problem, per se.
When designing a DB there are lots of different parameters, and some (e.g.: performance) may take precedence.
Case in point: even if the structures (and I suppose indexing) are identical, maybe "views" has more records and will be accessed more often.
This alone could be a good reason not to burden it with records from the downloads table.
Also, the fact that they are identical now does not mean they will be in the future: views and downloads are different, after all, so sooner or later one or both could grow an extra field or two.
These tables are the same NOW, but their schemas may change in the future. If they represent two different concepts, it is good to keep them separate. What if you wanted a foreign key from another table to the downloads table but not the views table? If they were the same table, you could not do this.
I think the answer has to be "it depends". As someone else pointed out, if the schema of one or both tables is likely to evolve, then no. I can think of other cases as well (simplifying the security model by allowing apps/users access to one or the other).
Having said this, I work with a legacy DB where this is a problem. We have multiple identical tables for customer invoices. Data is actually moved between them at different stages in the processing life-cycle. It makes for a complicated mess when trying to access data. It would have been easily solved by a state flag in the original schema, but we now have 20+ years of code written against the multi-table version.
Short answer: depends on why they are the same schema :).
From a E/R modelling point of view I don't see a problem with that, as long as they represent two semantically different entities.
From an implementation point of view, it really depends on how you plan to query that data:
If you plan to query those tables independently from each other, keeping them separate is a good choice
If you plan to query those tables together (maybe with a UNION or a JOIN operation), you should consider storing them in a single table with a discriminator column to distinguish their type
When considering whether to consolidate them into a single table you should also take into account other factors like:
The amount of data stored in each table
The rate at which data grows in each table
The ratio of read/write operations executed on each table
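A minimal sketch of the discriminator-column variant, using SQLite as a stand-in (the table name, event_type values, and sample data are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE item_events (
    item_id    INTEGER NOT NULL,
    user_id    INTEGER NOT NULL,
    event_time TEXT NOT NULL,
    event_type TEXT NOT NULL CHECK (event_type IN ('view', 'download'))
);
INSERT INTO item_events VALUES (7, 1, '2013-01-01', 'view');
INSERT INTO item_events VALUES (7, 1, '2013-01-02', 'download');
INSERT INTO item_events VALUES (8, 2, '2013-01-03', 'view');
""")

# Independent queries just filter on the discriminator...
cur.execute("SELECT COUNT(*) FROM item_events WHERE event_type = 'view'")
views = cur.fetchone()[0]
# ...while combined reporting needs no UNION at all.
cur.execute("SELECT COUNT(*) FROM item_events")
total = cur.fetchone()[0]
print(views, total)  # 2 3
```

The CHECK constraint keeps the discriminator to the known types; an index on (event_type, item_id) would be the natural next step if the per-type queries dominate.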
Chris Date and Dave McGoveran formalised the "Principle of Orthogonal Design". Roughly speaking it means that in database design you should avoid the possibility of allowing the same tuple in two different relvars. The aim being to avoid certain types of redundancy and ambiguity that could result.
Arguably it isn't always totally practical to do that and it isn't necessarily clear cut exactly when the principle is being broken. However, I do think it's a good guiding rule, if only because it avoids the problem of duplicate logic in data access code or constraints, i.e. it's a good DRY principle. Avoid having tables with potentially overlapping meanings unless there is some database constraint that prevents duplication between them.
It depends on the context - what is a View and what is a Download? Does a Download imply a View (how else would it be downloaded)?
It's possible that you have well-defined, separate concepts there - but it is a smell I'd want to investigate further. It seems likely that a View and a Download are related somehow, but your model doesn't show anything.
Are you saying that both tables have an 'item_id' primary key? In that case, the fields have the same name but not the same meaning: one is a 'view_id' and the other a 'download_id'. You should rename your fields accordingly to avoid this kind of misunderstanding.

SQL-Server DB design time scenario (distributed or centralized)

We have an SQL Server DB design-time scenario: we have to store data about different organizations in our database (e.g. Customer, Vendor, Distributor, ...). All the different organizations share (almost) the same type of information, like address details, etc., and they will be referenced in other tables (i.e. linked via OrgId, and we have to look up OrgName in many different places).
I see two options:
We create a table for each organization type, like OrgCustomer, OrgDistributor, OrgVendor, etc. All the tables will have a similar structure, and some tables will have extra special fields, e.g. the customer table has a HomeAddress field (which the other Org tables don't have), and vice versa.
We create a common OrgMaster table and store ALL the different Orgs in a single place. The table will have an OrgType field to distinguish among the different types of Orgs, and the special fields will be appended to the OrgMaster table (only the relevant Org records will have values in such fields; in other cases they'll be NULL).
Some Pros & Cons of #1:
PROS:
It helps distribute the load when accessing different types of Org data, so I believe this improves performance.
It provides full scope for customizing any particular Org table without affecting the other existing Org types.
I'm not sure whether different indexes on different/distributed tables work better than a single big table.
CONS:
Replication of design. If I have to increase the size of the ZipCode field, I have to do it in ALL the tables.
Replication in the manipulation implementation (i.e. we've used stored procedures for CRUD operations, so the replication goes n-fold: 3-4 INSERT SPs, 2-3 SELECT SPs, etc.)
Everything grows n-fold, right from DB constraints/indexing to SPs to the business objects in the application code.
A (common) change in one place has to be made in all the other places as well.
Some Pros & Cons of #2:
PROS:
N-fold becomes 1-fold :-)
Maintenance gets easier because we can try to implement single entry points for all the operations (i.e. a single SP to handle CRUD operations, etc.)
We only have to worry about maintaining a single table. Indexing and other optimizations are limited to a single table.
CONS:
Does it create a bottleneck? Can that be managed by implementing views and other optimized data-access strategies?
The other side of a centralized implementation is that a single change has to be tested and verified in ALL the places. The changes aren't isolated.
The design might seem a little less 'organized/structured', especially due to those few Orgs for which we need to add 'special' fields (which are irrelevant to the other Org types).
I also have an Option #3 in mind: keep the Org tables separate but create a common OrgAddress table to store the common fields. But this puts me in the middle of #1 and #2, and it creates even more confusion!
To be honest, I'm an experienced programmer but not an equally experienced DBA, because that's not my mainstream job, so please help me find the correct tradeoff between parameters like design complexity and performance.
Thanks in advance. Feel free to ask any technical questions; suggestions are welcome.
Hemant
I would say that your 2nd option is close; just a few points:
Customer, Distributor, Vendor are TYPES of organizations, so I would suggest:
Table [Organization] which has all columns common to all organizations and a primary key for the row.
Separate tables [Vendor], [Customer], [Distributor] with specific columns for each one and FK to the [Organization] row PK.
This sounds like a "supertype/subtype relationship".
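A minimal sketch of that supertype/subtype layout, using SQLite as a stand-in for SQL Server (column names are assumptions; only the Customer subtype is shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Organization (
    OrgId   INTEGER PRIMARY KEY,
    OrgName TEXT NOT NULL,
    Address TEXT
);
CREATE TABLE Customer (
    OrgId       INTEGER PRIMARY KEY REFERENCES Organization(OrgId),
    HomeAddress TEXT
);
INSERT INTO Organization VALUES (1, 'Acme Ltd', '1 Main St');
INSERT INTO Customer VALUES (1, '2 Side St');
""")

# OrgName lookups touch only the supertype table; customer-specific
# fields live in the subtype table, joined on the shared primary key.
cur.execute("""
    SELECT o.OrgName, c.HomeAddress
    FROM Customer c
    JOIN Organization o ON o.OrgId = c.OrgId
""")
rows = cur.fetchall()
print(rows)  # [('Acme Ltd', '2 Side St')]
```

Vendor and Distributor would follow the same pattern: each subtype table holds only its special fields plus an FK to the Organization row's PK, so a ZipCode resize happens in exactly one place.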
I have worked on various applications that have implemented all of your options. To be honest, you probably need to take account of the way your users work with the data, how many records you are expecting, commonality (the same organisation having multiple functions), and what level of updating of the records you are expecting.
Option 1 worked well in an app where there was very little commonality. I have used what is effectively your option 3 in an app where there was more commonality, and didn't like it very much (there is more work involved in getting the data from different layers all of the time). A rewrite of this app is implementing your option 2 because of this.
HTH
