Database performance concerns - repeated data

I have some database performance concerns (not yet a real issue, but I would like to make sure everything is good enough).
I have around 10 tables that are connected. There is one main object/table that is the 'mother' of all and contains the userID (these records are user-specific). So if I want to get any record from any table for a specific user, I would have to do, let's say, 5+ joins.
I decided to avoid the complication of joins, so I added this userID to all tables. Now if I want to get records from a specific table for a specific user, I won't need any joins.
My question is: would this cause any issues, and is it bad practice?
I am using Microsoft technologies for both the application and the database.

A little hard to say without understanding the structure of your joins. I take it from your question that you have a hierarchy of tables like:
Customers -> Invoices -> Orders -> OrderItems (where -> = one-to-many), and the question is whether it is problematic to include, say, CustomerID in the OrderItems table, because without it, to determine the CustomerID for a particular order item you'd have to traverse back up the chain to get the Order, then the Invoice, in order to get the CustomerID (given that CustomerID is the join column between Customers and Invoices).
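To make that hierarchy concrete, here is a minimal sketch of the fully normalized lookup. It uses SQLite through Python's built-in sqlite3 module as a stand-in for SQL Server, and the table names, columns and sample rows are hypothetical; the point is only the chain of joins needed to get from an order item back to its customer.

```python
import sqlite3

# Hypothetical tables for the chain Customers -> Invoices -> Orders -> OrderItems.
SCHEMA = """
CREATE TABLE Customers  (CustomerID  INTEGER PRIMARY KEY, Address TEXT);
CREATE TABLE Invoices   (InvoiceID   INTEGER PRIMARY KEY,
                         CustomerID  INTEGER NOT NULL REFERENCES Customers(CustomerID));
CREATE TABLE Orders     (OrderID     INTEGER PRIMARY KEY,
                         InvoiceID   INTEGER NOT NULL REFERENCES Invoices(InvoiceID));
CREATE TABLE OrderItems (OrderItemID INTEGER PRIMARY KEY,
                         OrderID     INTEGER NOT NULL REFERENCES Orders(OrderID));
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    # Illustrative sample rows: one customer, invoice, order and order item.
    conn.executescript("""
        INSERT INTO Customers  VALUES (1, '1 Main St');
        INSERT INTO Invoices   VALUES (10, 1);
        INSERT INTO Orders     VALUES (100, 10);
        INSERT INTO OrderItems VALUES (1000, 100);
    """)
    return conn


def customer_address_for_item(conn: sqlite3.Connection, order_item_id: int) -> str:
    # Fully normalized: walk back up OrderItems -> Orders -> Invoices -> Customers.
    row = conn.execute(
        """
        SELECT c.Address
        FROM OrderItems oi
        JOIN Orders    o ON o.OrderID    = oi.OrderID
        JOIN Invoices  i ON i.InvoiceID  = o.InvoiceID
        JOIN Customers c ON c.CustomerID = i.CustomerID
        WHERE oi.OrderItemID = ?
        """,
        (order_item_id,),
    ).fetchone()
    return row[0]
```

With a copied CustomerID on OrderItems, the same lookup would shrink to a single join to Customers, at the price of keeping that copy correct.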
The answer is probably "it depends". If you are a purist, you would probably avoid that, but if you often find that you have an OrderItem record but don't have the InvoiceID to hand, and you need to find, say, the customer address from the Customers table, then maybe it's worth it.
Things to think about include whether the relationships are volatile, e.g. whether, say, an invoice can be transferred from one customer to another, or an order can be transferred from one invoice to another. If that were the case, then you would have to remember to change not only the CustomerID in Invoices, but also the CustomerID on all of the orders on that invoice, and on all of the order items in each order. If you take the fully normalized approach, you only have one update to make. If not, you could have many writes to many different tables, depending on how deep you go into the hierarchy and how many records are on the many side in your db. If you have sprinkled CustomerIDs liberally all over the hierarchy, that could be a lot of writes, and keeping everything in sync could be a pain in the neck.
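A minimal sketch of that write amplification, assuming a hypothetical denormalized layout in which CustomerID has been copied into Orders and OrderItems (SQLite via Python's sqlite3; names and sample rows are illustrative). `transfer_invoice` runs every write in one transaction and reports how many rows it touched; passing `update_copies=False` mimics forgetting the copies, and `stale_copies` then counts the order items left disagreeing with their invoice.

```python
import sqlite3

# Hypothetical denormalized variant: CustomerID copied down the hierarchy.
SCHEMA = """
CREATE TABLE Invoices   (InvoiceID   INTEGER PRIMARY KEY, CustomerID INTEGER NOT NULL);
CREATE TABLE Orders     (OrderID     INTEGER PRIMARY KEY, InvoiceID  INTEGER NOT NULL,
                         CustomerID  INTEGER NOT NULL);
CREATE TABLE OrderItems (OrderItemID INTEGER PRIMARY KEY, OrderID    INTEGER NOT NULL,
                         CustomerID  INTEGER NOT NULL);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative data: one invoice, two orders, three order items, customer 1.
    conn.execute("INSERT INTO Invoices VALUES (10, 1)")
    conn.executemany("INSERT INTO Orders VALUES (?, 10, 1)", [(100,), (101,)])
    conn.executemany("INSERT INTO OrderItems VALUES (?, ?, 1)",
                     [(1000, 100), (1001, 100), (1002, 101)])
    conn.commit()
    return conn


def transfer_invoice(conn, invoice_id, new_customer_id, update_copies=True) -> int:
    """Move an invoice to another customer; return the number of rows written."""
    with conn:  # one transaction: all writes succeed or none do
        written = conn.execute(
            "UPDATE Invoices SET CustomerID = ? WHERE InvoiceID = ?",
            (new_customer_id, invoice_id)).rowcount
        if update_copies:
            # Every redundant copy further down the hierarchy must follow.
            written += conn.execute(
                "UPDATE Orders SET CustomerID = ? WHERE InvoiceID = ?",
                (new_customer_id, invoice_id)).rowcount
            written += conn.execute(
                """UPDATE OrderItems SET CustomerID = ?
                   WHERE OrderID IN (SELECT OrderID FROM Orders WHERE InvoiceID = ?)""",
                (new_customer_id, invoice_id)).rowcount
    return written


def stale_copies(conn) -> int:
    """Count order items whose redundant CustomerID disagrees with their invoice."""
    return conn.execute(
        """
        SELECT COUNT(*) FROM OrderItems oi
        JOIN Orders   o ON o.OrderID   = oi.OrderID
        JOIN Invoices i ON i.InvoiceID = o.InvoiceID
        WHERE oi.CustomerID <> i.CustomerID OR o.CustomerID <> i.CustomerID
        """).fetchone()[0]
```

With this sample data the transfer writes six rows instead of one; skip the copies and the audit query reports three stale order items.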
If the folks who are writing code against your db don't really understand exactly what you've done, it could turn into a bloody mess, and I think that's why people tend to avoid it.
I would say to be practical about it. Consider how often you need to traverse the hierarchy to do something you want to do, and whether the foreign key you are replicating is ever likely to change.

Related

Database design, an included attribute vs multiple joins? Confused

So I am taking a class in database design and management and am kind of confused from a design perspective. My example is an invoice system. I just made it up quickly, so it doesn't have a ton of complexity in it.
There are Customers, Orders, Invoices and Payments entities:
Customers: CustID (PK), Street, Zip, City, ...
Orders: OrderID (PK), CustID (FK), Date, Amt, ...
Invoices: InvoiceID (PK), OrderID (FK), Date, AmtDue, AmtPaid, ...
Payments: PaymentNo (PK), InvoiceID (FK), PayMethod, Date, Amt, ...
The Customers entity has a one-to-many relationship with Orders.
The Orders entity has a one-to-many relationship with Invoices.
The Invoices entity has a one-to-many relationship with Payments.
To list all Payments made by a Customer, the query would have to join Payments with Invoices, Invoices with Orders, and Orders with Customers.
Is this the correct way to do it? One could also just put a CustID in the Payments entity, which would then require just one join, but then there is unneeded information in the Payments entity. Is this just a design thing, or is it a performance issue?
Bonus question: let's say there should be a report that shows the total customer balance. Does there need to be a customer balance field in the database, or can this be a calculated item produced by joining tables and adding up the amount billed vs. the amount paid?
Thanks!
Is this the correct way to do it?
Yes. Based on the information provided, it looks reasonable.
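As a sketch of that query, assuming SQLite via Python's sqlite3, the question's tables trimmed to the columns involved, amounts stored as integer cents, and made-up sample rows. Because Orders already carries CustID, the join to Customers is only needed when customer columns are actually selected.

```python
import sqlite3

# Tables from the question, trimmed to the columns this query needs.
SCHEMA = """
CREATE TABLE Customers (CustID    INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Orders    (OrderID   INTEGER PRIMARY KEY,
                        CustID    INTEGER NOT NULL REFERENCES Customers(CustID));
CREATE TABLE Invoices  (InvoiceID INTEGER PRIMARY KEY,
                        OrderID   INTEGER NOT NULL REFERENCES Orders(OrderID));
CREATE TABLE Payments  (PaymentNo INTEGER PRIMARY KEY,
                        InvoiceID INTEGER NOT NULL REFERENCES Invoices(InvoiceID),
                        PayMethod TEXT, Amt INTEGER);  -- Amt in cents (assumption)
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Illustrative sample data: customer 1 paid invoice 20 in two installments.
    conn.executescript("""
        INSERT INTO Customers VALUES (1, 'Springfield'), (2, 'Shelbyville');
        INSERT INTO Orders    VALUES (10, 1), (11, 2);
        INSERT INTO Invoices  VALUES (20, 10), (21, 11);
        INSERT INTO Payments  VALUES (30, 20, 'card', 6000),
                                     (31, 20, 'cash', 4000),
                                     (32, 21, 'card', 5000);
    """)
    return conn


def payments_for_customer(conn: sqlite3.Connection, cust_id: int) -> list:
    # Payments -> Invoices -> Orders; the customer is reached through the FKs.
    return conn.execute(
        """
        SELECT p.PaymentNo, p.Amt
        FROM Payments p
        JOIN Invoices i ON i.InvoiceID = p.InvoiceID
        JOIN Orders   o ON o.OrderID   = i.OrderID
        WHERE o.CustID = ?
        ORDER BY p.PaymentNo
        """,
        (cust_id,),
    ).fetchall()
```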
One could also just put a custID in the payment entity which would then just require one join, but then there is unneeded information in the payment entity. Is this just a design thing or is it a performance issue?
The question you're asking falls under "normal forms", often called normalization. Your target should be Boyce-Codd normal form (similar to 3NF), which should be described in your textbook. I will warn you that misinformation and misunderstanding of database design issues are very abundant on the interwebs, so beware of which answers you pay attention to.
The goal of normalization is to eliminate redundancy, and thus to eliminate "anomalies", whereby two logically equivalent queries produce inconsistent results. If the same information is kept in two places and is updated in only one, then two queries against the two different values will produce different -- i.e., inconsistent -- results.
In your example, if there is a Payments.CustID, should I believe that one, or the one derived from joining Payments to Orders? The same goes for the total customer balance: do I believe the stored total, or the one I computed from the constituents?
If you are going to "denormalize for performance", as is so often alleged to be necessary, what are you going to do to ensure the redundant values are consistent?
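One such external control is a periodic audit query. The sketch below (SQLite via Python's sqlite3; trimmed, hypothetical versions of the tables above, with an added, redundant Payments.CustID) lists payments whose stored CustID disagrees with the one derived through Invoices and Orders. Note that it only detects drift after the fact, and that it needs the very joins the redundant column was meant to avoid.

```python
import sqlite3

# Trimmed tables; Payments carries a *redundant* CustID copied from Orders.
SCHEMA = """
CREATE TABLE Orders   (OrderID   INTEGER PRIMARY KEY, CustID  INTEGER NOT NULL);
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY, OrderID INTEGER NOT NULL);
CREATE TABLE Payments (PaymentNo INTEGER PRIMARY KEY, InvoiceID INTEGER NOT NULL,
                       CustID    INTEGER NOT NULL);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative data: payment 31 belongs to order 11 (customer 2),
    # but its stored CustID says 1, i.e. the copy has drifted.
    conn.executescript("""
        INSERT INTO Orders   VALUES (10, 1), (11, 2);
        INSERT INTO Invoices VALUES (20, 10), (21, 11);
        INSERT INTO Payments VALUES (30, 20, 1), (31, 21, 1);
    """)
    return conn


def inconsistent_payments(conn: sqlite3.Connection) -> list:
    """Return PaymentNo values whose stored CustID disagrees with the derived one."""
    rows = conn.execute(
        """
        SELECT p.PaymentNo
        FROM Payments p
        JOIN Invoices i ON i.InvoiceID = p.InvoiceID
        JOIN Orders   o ON o.OrderID   = i.OrderID
        WHERE p.CustID <> o.CustID
        ORDER BY p.PaymentNo
        """
    ).fetchall()
    return [payment_no for (payment_no,) in rows]
```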
Bonus question. Let's say there should be a report that says what the total customer balance is.
As a matter of fact, in practice balances are sort of a special case. It's often necessary to know the balance at points in time. While it's possible to compute, say, monthly account balances from inception based on transactions, as a practical matter applications usually "draw a line in the sand" and record the balance for future reference. Steps are taken -- must be, for the sake of the business -- to ensure the historical information does not change or, if it does, that the recorded balance is updated to reflect the change. From that description alone, you can imagine that enforcing consistency throughout the system yourself is much more work than relying on the DBMS to enforce it. And that is why, insofar as is feasible, it's better to eliminate all redundant data and let the DBMS do the job it was designed to do.
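For comparison, here is a sketch of computing the balance on demand rather than storing it, assuming amounts are kept in integer cents and using trimmed, hypothetical versions of the tables from the question (SQLite via Python's sqlite3). Invoices and payments are summed in separate queries; joining both in a single pass would repeat each invoice's AmtDue once per payment and overstate the amount billed. Recording historical balances, as described above, would be a separate step layered on top of this.

```python
import sqlite3

# Trimmed tables from the question; amounts are integer cents (assumption).
SCHEMA = """
CREATE TABLE Orders   (OrderID   INTEGER PRIMARY KEY, CustID  INTEGER NOT NULL);
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY, OrderID INTEGER NOT NULL,
                       AmtDue    INTEGER NOT NULL);
CREATE TABLE Payments (PaymentNo INTEGER PRIMARY KEY, InvoiceID INTEGER NOT NULL,
                       Amt       INTEGER NOT NULL);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative data: customer 1 is billed 100.00 + 30.00 and has paid 100.00;
    # customer 2 is billed 50.00 and has paid nothing.
    conn.executescript("""
        INSERT INTO Orders   VALUES (10, 1), (11, 2), (12, 1);
        INSERT INTO Invoices VALUES (20, 10, 10000), (21, 11, 5000), (22, 12, 3000);
        INSERT INTO Payments VALUES (30, 20, 6000), (31, 20, 4000);
    """)
    return conn


def customer_balance(conn: sqlite3.Connection, cust_id: int) -> int:
    """Amount billed minus amount paid, in cents, derived from the base tables."""
    billed = conn.execute(
        """
        SELECT COALESCE(SUM(i.AmtDue), 0)
        FROM Invoices i JOIN Orders o ON o.OrderID = i.OrderID
        WHERE o.CustID = ?
        """,
        (cust_id,),
    ).fetchone()[0]
    paid = conn.execute(
        """
        SELECT COALESCE(SUM(p.Amt), 0)
        FROM Payments p
        JOIN Invoices i ON i.InvoiceID = p.InvoiceID
        JOIN Orders   o ON o.OrderID   = i.OrderID
        WHERE o.CustID = ?
        """,
        (cust_id,),
    ).fetchone()[0]
    return billed - paid
```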
In your analysis, seek Boyce-Codd normal form. Understand your data, eliminate the redundancies, and recognize the relations. Let the DBMS enforce referential integrity. Countless errors will be avoided, and time saved. Only when specific circumstances conspire to show that specific business requirements cannot be satisfied on a particular system with a given, correct design, does one begin the tedious and error-prone work of introducing redundant information and compensating for it with external controls.
"Is this the correct way to do it?" Of course, given your current design. But it's not the ONLY way. So you're studying DB "normalization" and seeing the pros and cons of the various "forms" of normalization. In the "real world" things can change on a dime, due to a management decision or whatever. I tend to use "compound primary keys" instead of simply one field for primary and others as FK. I handle my "FK" programmatically instead of relegating that responsibility to the DB.
I also create and use a number of "intermediate" tables, or sometimes VIEWs, that I find easier to work with than a bunch of code with too many JOINs. (3rd normal form addicts can hate, but my code runs faster than a scalded rabbit.)
An Order means nothing without a Customer; an Invoice means nothing without an Order; a Payment is great, but means nothing without both an Order and an Invoice. So lemme throw this out there -- what's wrong with having a "summary" type of entity that has Cust, Order, Invoice #, and Payment Id?
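One way to get that flat "summary" shape without storing the keys twice is a view, which the answer already mentions as an option. A minimal sketch, again with SQLite through Python's sqlite3 and trimmed, hypothetical tables (`CustomerPayments` is an invented name). In SQLite a view is just a stored query, so this buys convenience rather than speed; whether it beats a stored summary table depends on the DBMS and the workload.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Orders   (OrderID   INTEGER PRIMARY KEY, CustID INTEGER NOT NULL);
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY,
                       OrderID   INTEGER NOT NULL REFERENCES Orders(OrderID));
CREATE TABLE Payments (PaymentNo INTEGER PRIMARY KEY,
                       InvoiceID INTEGER NOT NULL REFERENCES Invoices(InvoiceID),
                       Amt       INTEGER NOT NULL);

-- The "summary" shape: customer, order, invoice and payment on one row.
-- Nothing is copied; the joins run whenever the view is queried.
CREATE VIEW CustomerPayments AS
SELECT o.CustID, o.OrderID, i.InvoiceID, p.PaymentNo, p.Amt
FROM Payments p
JOIN Invoices i ON i.InvoiceID = p.InvoiceID
JOIN Orders   o ON o.OrderID   = i.OrderID;
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Illustrative sample rows.
    conn.executescript("""
        INSERT INTO Orders   VALUES (10, 1), (11, 2);
        INSERT INTO Invoices VALUES (20, 10), (21, 11);
        INSERT INTO Payments VALUES (30, 20, 6000), (31, 20, 4000), (32, 21, 5000);
    """)
    return conn


def summary_rows(conn: sqlite3.Connection, cust_id: int) -> list:
    # Application code reads the view as if it were a single flat table.
    return conn.execute(
        "SELECT CustID, OrderID, InvoiceID, PaymentNo FROM CustomerPayments "
        "WHERE CustID = ? ORDER BY PaymentNo",
        (cust_id,),
    ).fetchall()
```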

1:1 Relationships. Split into more than 1 table? Bad?

I am creating a mobile game where I am optimistically hoping I'll have millions of players.
I have created a users table that currently has roughly 8 columns (e.g. userid, username, password, last_signin, etc.).
For every user I'll also need to record the amount of in-game currency they have (e.g. gold, silver, gems, etc.).
This is a 1:1 relationship (a user will only ever have 1 value defining how much gold they have).
I am no database expert (which is why I am posting here). I worry that if I added the gold, silver, gems, etc. as new rows in the users table, the users table would be hammered with a crazy number of queries per second. Every time someone in the game finds more gold or silver, logs in, or creates an account, the users table will be accessed and/or updated.
Would it be smarter to add the gold, silver, and gems as columns in a new table called "resources" with the following columns: userid, gold, silver, gems? This new table would have exactly the same number of rows as the users table, since there is a 1:1 relationship between users and resources. I'm wondering if those queries would be faster, since the data is split up and not all queries would go to the same table.
To me it clearly seems better to put it all in one table since they are 1:1... but it also seemed like a bad idea to have the majority of the game's data in one table.
Thanks for any advice you can give!
Ryan
There are plenty of cases where good design calls for two tables in a 1:1 relationship with each other. There is no normalization rule that calls for decomposing tables in this manner. But normalization isn't the only handle on good design.
Access traffic is another handle. Your intuition that access to resources is going to be much more frequent than access to basic user data sounds credible. But you will need to check it out, to make sure that the transactions that access resources don't end up using basic user data anyway. It all boils down to which costs more: a fat user table or more joins.
Other responders have already hinted that there may come a day when the 1:1 relationship becomes a 1:many relationship. I can imagine one. The model of the game player gets expanded where a single user can get involved in multiple distinct instances of the game. In this case, a single user might have the same basic user data in all instances, but different resources in each instance. I have no way of telling if this is ever going to happen in your case. But, if it does, you're going to be better off with a separate resources table.
It really depends on your game design, how big your database is, and how you might expand your database in the future. I would put the resources in a separate table with a foreign key pointing to the user id because:
You can keep the user table slimmer for easier maintenance/backup.
A simple 1-to-1 JOIN between two tables doesn't take much more resources than having everything in the same table, as long as you have proper indexing.
By keeping your tables separate, you are practicing separation of concerns; multiple people can work on different things without having to worry about affecting other tables.
Easier to expand. In the future you may want to add other columns to the users table, such as birth_date, region, first_name, etc., that are more relevant to users' personal info. It will be confusing if columns with different purposes are stored together. (In PostgreSQL you can't simply rearrange column order, though you can create views for that.)
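A minimal sketch of the separate-table layout, assuming SQLite via Python's sqlite3 (column names follow the question; the sample row is made up). Making `user_id` both the primary key and the foreign key of `resources` lets the schema itself enforce the 1:1 relationship, and the join matches on the primary key of both tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    username    TEXT NOT NULL UNIQUE,
    last_signin TEXT
);
-- user_id is both the primary key and a foreign key, so a user can have at
-- most one resources row: the schema itself enforces the 1:1 relationship.
CREATE TABLE resources (
    user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
    gold    INTEGER NOT NULL DEFAULT 0,
    silver  INTEGER NOT NULL DEFAULT 0,
    gems    INTEGER NOT NULL DEFAULT 0
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO users (user_id, username) VALUES (1, 'ryan')")
        conn.execute("INSERT INTO resources (user_id, gold) VALUES (1, 50)")
    return conn


def player_summary(conn: sqlite3.Connection, user_id: int):
    # A 1:1 join where both sides are looked up by primary key.
    return conn.execute(
        """
        SELECT u.username, r.gold, r.silver, r.gems
        FROM users u JOIN resources r ON r.user_id = u.user_id
        WHERE u.user_id = ?
        """,
        (user_id,),
    ).fetchone()
```

A second resources row for the same user, or one for a user that does not exist, is rejected with `sqlite3.IntegrityError`.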
This is a 1:1 relationship (a user will only ever have 1 value defining how much gold they have).
... for now ;)
I am no database expert (which is why I am posting here). I worry If I added the gold, silver, gems, etc as new rows in the users table
New columns?
Would it be smarter to add the gold, silver, and gems as columns in a new table called "resources"
Probably, because:
You'll be doing smaller writes when you update the frequently updated part, without rewriting less-modified user data
It makes it easier to audit changes to the user data
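A sketch of what such a frequent write can look like, assuming the split layout and SQLite via Python's sqlite3 (the single `gold` column and the function names are hypothetical). Each update touches only the small resources row and lets the database do the arithmetic in one statement, so the application never reads a value and writes back a stale copy of it.

```python
import sqlite3

# Only the frequently updated table is needed here (hypothetical columns).
SCHEMA = """
CREATE TABLE resources (
    user_id INTEGER PRIMARY KEY,
    gold    INTEGER NOT NULL DEFAULT 0 CHECK (gold >= 0)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO resources (user_id, gold) VALUES (1, 50)")
    return conn


def add_gold(conn: sqlite3.Connection, user_id: int, amount: int) -> None:
    # The database does the arithmetic in a single UPDATE (no read-modify-write
    # round trip through application code).
    with conn:
        cur = conn.execute(
            "UPDATE resources SET gold = gold + ? WHERE user_id = ?", (amount, user_id)
        )
    if cur.rowcount != 1:
        raise KeyError(f"no resources row for user {user_id}")


def spend_gold(conn: sqlite3.Connection, user_id: int, amount: int) -> bool:
    # The balance check lives in the WHERE clause; 0 rows updated means "too poor".
    with conn:
        cur = conn.execute(
            "UPDATE resources SET gold = gold - ? WHERE user_id = ? AND gold >= ?",
            (amount, user_id, amount),
        )
    return cur.rowcount == 1


def current_gold(conn: sqlite3.Connection, user_id: int) -> int:
    return conn.execute(
        "SELECT gold FROM resources WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
```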

Database Design without inheritance

I have come up with the following schema for a client of mine. Does anything look off here, especially the Order Line Items? Should I use inheritance? I'm pretty sure that this site will only allow you to order courses, lessons, and gift cards, and that's it.
Any feedback would be appreciated
Just my thinking on the design:
You have Courses, Lessons and GiftCards tables for the possible purchase objects, and OrderLines contains IDs for each of these tables. But if a customer purchases a Lesson and a GiftCard, they should be shown as two lines in the order. Also, what will you do if your client wants to sell more kinds of objects?
Therefore I think it might be better to redesign this part, like this:
OrderLines rename to OrderItems;
add ItemType table with 3 rows: Courses, Lessons, GiftCards;
add Items table with (ItemId, ItemType, Title, Price, LanguageCode, SortOrder, etc.) fields.
This way it will also be possible to add reviews not only for Lessons, but for all possible items.
You will have to come up with a preferred way to store the fields for the item details. Right now Courses and Lessons share a lot of fields, so it might be reasonable to move all of them into the new Items table, as such fields also seem to be valid for GiftCards. And if you have some specific details, like for GiftCards, you might add specific tables, like GiftCardItems, with Items.id and a set of special fields not shared with other item types.
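A rough sketch of that redesign, assuming SQLite via Python's sqlite3. Only the Items/ItemType/OrderItems split and the GiftCardItems subtype idea come from the suggestion above; the GiftCardItems columns (`FaceValue`, `ExpiresOn`), prices in cents, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical tables following the suggested redesign.
SCHEMA = """
CREATE TABLE ItemTypes (ItemTypeID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Items (
    ItemID       INTEGER PRIMARY KEY,
    ItemTypeID   INTEGER NOT NULL REFERENCES ItemTypes(ItemTypeID),
    Title        TEXT    NOT NULL,
    Price        INTEGER NOT NULL,          -- cents (assumption)
    LanguageCode TEXT,
    SortOrder    INTEGER
);
-- Subtype table: fields only gift cards have (column names are illustrative).
CREATE TABLE GiftCardItems (
    ItemID    INTEGER PRIMARY KEY REFERENCES Items(ItemID),
    FaceValue INTEGER NOT NULL,
    ExpiresOn TEXT
);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, UserID INTEGER NOT NULL);
-- One row per purchased item, whatever its type.
CREATE TABLE OrderItems (
    OrderID  INTEGER NOT NULL REFERENCES Orders(OrderID),
    ItemID   INTEGER NOT NULL REFERENCES Items(ItemID),
    Quantity INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (OrderID, ItemID)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Illustrative catalogue and one order containing a lesson and a gift card.
    conn.executescript("""
        INSERT INTO ItemTypes VALUES (1, 'Courses'), (2, 'Lessons'), (3, 'GiftCards');
        INSERT INTO Items VALUES (1, 2, 'Intro lesson', 1500, 'en', 1),
                                 (2, 3, 'Gift card',    5000, 'en', 2);
        INSERT INTO GiftCardItems VALUES (2, 5000, NULL);
        INSERT INTO Orders VALUES (1, 42);
        INSERT INTO OrderItems (OrderID, ItemID) VALUES (1, 1), (1, 2);
    """)
    return conn


def order_lines(conn: sqlite3.Connection, order_id: int) -> list:
    # A lesson and a gift card bought together come back as two order lines.
    return conn.execute(
        """
        SELECT t.Name, i.Title, i.Price * oi.Quantity
        FROM OrderItems oi
        JOIN Items     i ON i.ItemID     = oi.ItemID
        JOIN ItemTypes t ON t.ItemTypeID = i.ItemTypeID
        WHERE oi.OrderID = ?
        ORDER BY i.SortOrder
        """,
        (order_id,),
    ).fetchall()
```

Adding a fourth kind of product then means inserting a row into ItemTypes (plus a subtype table only if it has fields of its own), rather than adding another ID column to every order line.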
A minor note: I would split Users into a couple of tables, as I suppose that this table will contain both customers and support staff. This means that this table might grow big (depending on how many customers are expected), and maintaining so many fields in a single table might become problematic as the table grows in number of rows.
And I agree with Matt: it is difficult to tell anything without requirements.
It is really hard to tell without knowing the requirements from your client. Everything looks good but I can't really tell if it is all inclusive of what the client wants without their requirements documentation.

Should I just use a single table?

I have an entity Order.
The order has information on date, client, associate who handled order etc.
Now the order also needs to store a state i.e. differentiate between won orders and lost orders.
The idea is that a customer may submit an order to the company, but could eventually back out.
(As domain info: the order is not for items. It is a services company that handles clients and makes offers on when they can deliver an order and at what price, etc. So the customer may find a better bargain, back out, and stop the ordering process with the company.)
The company wants data on both won orders and lost orders and the difference between a won order and a lost order is just a couple of more attributes e.g. ReasonLost which could be Price or Time.
My question is, what would be the best representation of the Order?
I was thinking of using a single table and just leaving ReasonLost as null for the orders won.
Does it make sense to create separate tables for WonOrder and LostOrder if the difference of these new entities is not significant?
What would be the best model for this case?
Use one table. Add an OrderState Field.
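A minimal sketch of the single-table option, assuming SQLite via Python's sqlite3. The CHECK constraints are an addition of this sketch, not part of the answer: they let the database reject a lost order without a reason, or a won order with one. Limiting reasons to 'Price' and 'Time' is likewise an assumption taken from the question's examples.

```python
import sqlite3

# One Orders table; a lost order is a row whose OrderState is 'lost'.
# Columns other than OrderState/ReasonLost are illustrative.
SCHEMA = """
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    ClientID   INTEGER NOT NULL,
    OrderDate  TEXT    NOT NULL,
    OrderState TEXT    NOT NULL CHECK (OrderState IN ('won', 'lost')),
    ReasonLost TEXT             CHECK (ReasonLost IN ('Price', 'Time')),
    -- A reason is required for lost orders and must be NULL for won ones.
    CHECK ((OrderState = 'lost') = (ReasonLost IS NOT NULL))
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, order_id: int, state: str, reason=None) -> bool:
    """Insert an order; return False if a CHECK constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Orders VALUES (?, 1, '2024-01-01', ?, ?)",
                (order_id, state, reason),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def lost_by_reason(conn: sqlite3.Connection) -> list:
    # The "data on lost orders" report is a plain filter on the one table.
    return conn.execute(
        "SELECT ReasonLost, COUNT(*) FROM Orders WHERE OrderState = 'lost' "
        "GROUP BY ReasonLost ORDER BY ReasonLost"
    ).fetchall()
```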
Caveat: If you are doing millions of transactions per day, then decisions like this need much more attention and analysis.
There is another alternative design that you might consider. In this alternative you keep a second table for the order lost reason and relate it to your order table as an optional 1:1. Note that this is effectively an implementation of a supertype/subtype pattern where the lost order subtype has one additional attribute.
It looks like this:
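A sketch of that layout, assuming SQLite via Python's sqlite3 and a hypothetical table name `OrderLostReasons`. The shared primary key makes the relationship optional 1:1, and a LEFT JOIN reassembles the full picture. It also assumes every order is either won or lost; if orders can still be pending, a state column is needed as well.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Orders (
    OrderID   INTEGER PRIMARY KEY,
    ClientID  INTEGER NOT NULL,
    OrderDate TEXT    NOT NULL
);
-- Optional 1:1 subtype: a row exists only for lost orders, and the shared
-- primary key allows at most one reason per order.
CREATE TABLE OrderLostReasons (
    OrderID    INTEGER PRIMARY KEY REFERENCES Orders(OrderID),
    ReasonLost TEXT NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Illustrative data: order 2 was lost on price.
    conn.executescript("""
        INSERT INTO Orders VALUES (1, 7, '2024-01-01'), (2, 8, '2024-01-02'),
                                  (3, 9, '2024-01-03');
        INSERT INTO OrderLostReasons VALUES (2, 'Price');
    """)
    return conn


def order_outcomes(conn: sqlite3.Connection) -> list:
    # LEFT JOIN keeps every order; a missing subtype row reads as "won"
    # under this sketch's assumption that every order is either won or lost.
    return conn.execute(
        """
        SELECT o.OrderID,
               CASE WHEN r.OrderID IS NULL THEN 'won' ELSE 'lost' END,
               r.ReasonLost
        FROM Orders o
        LEFT JOIN OrderLostReasons r ON r.OrderID = o.OrderID
        ORDER BY o.OrderID
        """
    ).fetchall()
```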
This alternative might be attractive under any of the following circumstances:
You lose very few orders.
Your order table isn't wide enough to hold a long enough lost order reason.
Your lost order reason is very, very big (even BLOB).
You have an aesthetic objection to maintaining a lost order reason in your order table.

Database relation design - Relating two tables twice in different tables

I have the following tables:
Post
Id int
User
Id int
Then I have the table
Favorite
PostId int
UserId int
and the table
Vote
PostId int
UserId int
IsUpVote bit
IsDownVote bit
LastActivity datetime2
The problem is that if I merged Favorite and Vote into a single table, I'd have something like:
UserPost
PostId int
UserId int
IsFavorited bit
IsUpVoted bit
IsDownVoted bit
LastActivity datetime2
IsDownVote couldn't be computed anymore (since I can no longer use a "doesn't exist: didn't vote; didn't vote up: voted down" pattern), and LastActivity would only reflect the last time the vote changed (up, down, or removed). So I'd maybe have to change that field's name or its functionality, or even both.
So the question is basically: how wrong is it to have two tables relating tables A and B (Post, User), which in this case are indexed by the same primary key (PostId, UserId) but are intended for different uses?
Favourites and Votes seem to be two different things, so IMHO you will be better off keeping them as separate tables. As you mentioned, you would lose functionality if you merged them, and I don't see any clear benefit to merge them. Stick with what you've got unless you can provide an awesome justification for the merge.
Nothing wrong at all.
I am not saying that the DDL provided shows correctly Normalised tables, but they are somewhat Normalised. As you have identified yourself, the two tables have different purposes, they have different meaning, so technically (theoretically, academically, and in practice [code] ), they are correct.
"related to the same parents" is not a criterion (there are many instances where there are many tables related to the same parents, and which are correct)
therefore such tables will "have the same PKs and FKs", so that is not a criterion either.
Only someone with no real concept of Normalisation, and no concept of the causes of negative performance, will suggest that "just because they have the same parents (and therefore the same pair of keys/indices)", they should be merged.
Vote and Favourite are two different Things, Entities, records of Action taken. Two tables is correct.
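A minimal sketch of keeping the two tables separate, assuming SQLite via Python's sqlite3 (SQLite has no bit or datetime2 types, so INTEGER 0/1 and ISO-8601 TEXT stand in, and "User" is quoted to be safe as an identifier). It also shows that the question's "no row: didn't vote; IsUpVote = 0: voted down" pattern keeps working, so this sketch drops IsDownVote in favour of deriving it; that is a design choice of the sketch, not of the answer.

```python
import sqlite3

# SQLite has no bit/datetime2 types: INTEGER 0/1 and ISO-8601 TEXT stand in.
SCHEMA = """
CREATE TABLE Post   (Id INTEGER PRIMARY KEY);
CREATE TABLE "User" (Id INTEGER PRIMARY KEY);
CREATE TABLE Favorite (
    PostId INTEGER NOT NULL REFERENCES Post(Id),
    UserId INTEGER NOT NULL REFERENCES "User"(Id),
    PRIMARY KEY (PostId, UserId)
);
CREATE TABLE Vote (
    PostId       INTEGER NOT NULL REFERENCES Post(Id),
    UserId       INTEGER NOT NULL REFERENCES "User"(Id),
    IsUpVote     INTEGER NOT NULL CHECK (IsUpVote IN (0, 1)),
    LastActivity TEXT    NOT NULL,
    PRIMARY KEY (PostId, UserId)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Illustrative data: user 1 voted up, user 2 voted down, user 3 only favorited.
    conn.executescript("""
        INSERT INTO Post VALUES (1);
        INSERT INTO "User" VALUES (1), (2), (3);
        INSERT INTO Vote VALUES (1, 1, 1, '2024-01-01T10:00:00'),
                                (1, 2, 0, '2024-01-01T11:00:00');
        INSERT INTO Favorite VALUES (1, 3);
    """)
    return conn


def vote_state(conn: sqlite3.Connection, post_id: int, user_id: int) -> str:
    # No row: didn't vote. IsUpVote = 0: voted down. No IsDownVote column needed.
    row = conn.execute(
        "SELECT IsUpVote FROM Vote WHERE PostId = ? AND UserId = ?", (post_id, user_id)
    ).fetchone()
    if row is None:
        return "none"
    return "up" if row[0] else "down"


def is_favorite(conn: sqlite3.Connection, post_id: int, user_id: int) -> bool:
    return conn.execute(
        "SELECT 1 FROM Favorite WHERE PostId = ? AND UserId = ?", (post_id, user_id)
    ).fetchone() is not None
```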
Distinction: The real reason IsDownVoted cannot be compared anymore is that it does not apply to Favourite. You have used an Indicator (bit) to identify that (although badly named), which is really a substitute for a Null column. Nulls are not good for performance, and it is a Good Thing that you have Indicators to identify the absence of data and have therefore avoided Nulls, but that is separate from breaking a Normalised design by merging them.
The merged table will perform slower on all accesses. When you SELECT Votes from it, you have to exclude Favourites, and vice versa, but it will be doing I/O for both, because they are located together (PostId, UserId). So the server is forever reading twice as many rows, using twice as much cache, etc. Then you will "add speed" by adding an index on (PostId, UserId, IsFavourited), making it even slower for Inserts and Deletes (while "speeding up" Selects). Messes get compounded, guaranteed; best not to have any mess in the first place.
When the database grows, you can independently add columns to either one of Vote and Favourite, without affecting the other. In a merged table, it will introduce complications.
You accept Answers too quickly.
I won't say what you should do table-wise, but if you use an int instead of bits, with values like 0, 1 and -1, you can compute the values you want with relatively simple calculations and comparisons.
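A sketch of that suggestion, assuming SQLite via Python's sqlite3 and a hypothetical `Value` column in which 1 means up, -1 means down and 0 means a withdrawn vote. `INSERT OR REPLACE` is SQLite's conflict clause (other databases have their own upsert syntax); it keeps one row per (PostId, UserId), and the score, upvote and downvote counts all fall out of SUM.

```python
import sqlite3

# Hypothetical Vote table using one int column instead of two bits:
# 1 = up, -1 = down, 0 = vote withdrawn.
SCHEMA = """
CREATE TABLE Vote (
    PostId INTEGER NOT NULL,
    UserId INTEGER NOT NULL,
    Value  INTEGER NOT NULL CHECK (Value IN (-1, 0, 1)),
    PRIMARY KEY (PostId, UserId)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def cast_vote(conn: sqlite3.Connection, post_id: int, user_id: int, value: int) -> None:
    # INSERT OR REPLACE keeps exactly one row per (PostId, UserId).
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO Vote (PostId, UserId, Value) VALUES (?, ?, ?)",
            (post_id, user_id, value),
        )


def tally(conn: sqlite3.Connection, post_id: int) -> tuple:
    """Return (score, upvotes, downvotes) using plain arithmetic on Value."""
    return conn.execute(
        """
        SELECT COALESCE(SUM(Value), 0),
               COALESCE(SUM(Value = 1), 0),
               COALESCE(SUM(Value = -1), 0)
        FROM Vote WHERE PostId = ?
        """,
        (post_id,),
    ).fetchone()
```

For example, after users 1 and 2 vote up and user 3 votes down on post 7, `tally(conn, 7)` is `(1, 2, 1)`.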
Talking relational databases, you should almost always aim for third normal form in your tables. Try looking at http://en.wikipedia.org/wiki/Database_normalization
Cheers!
