Database design help - hierarchical data - sql-server

So I have a table of user accounts (Users). There needs to be functionality in place for subaccounts.
So for instance, a company named Dunder Mifflin might have an account. The company will have subaccounts, Accounting and Sales. The Accounting account would have subaccounts for Kevin, Angela, and Oscar. And there's no limit on the number of levels.
My initial idea was to create a table like this:
CREATE TABLE Users
(
UserID INTEGER,
ParentUserID INTEGER,
...
)
Where a primary account's ParentUserID would just be null, but a subaccount would contain the UserID of its parent.
Is this a good design for this? I don't know of any other way.

It is a good design for it. An alternative is to use the HIERARCHYID data type to map the hierarchy, but support for that is limited (reporting, ORM tools etc.).
Actually I use EXACTLY this in a number of setups. There simply are not many alternatives that are not obviously dumb (like having X fields for the hierarchy). I actually know of no single alternative.

That is a good design and you really have little other choice. Read up on CTEs (Common Table Expressions), which will help you query these hierarchical relationships recursively.
Recursive querying of hierarchical structures was possible in SQL Server 2000 but has been much simpler with CTEs since SQL Server 2005.
A table that joins to itself will look something like this in your designer:

This is called a self-join and, yes, it is the standard way of representing hierarchical data. You will probably need a recursive query to get something like all of the users associated with Dunder Mifflin.
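A query of that kind can be sketched with a recursive CTE. This is a minimal sketch, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the Name column and the sample rows are made up for the demo, and in T-SQL the CTE would start with WITH rather than WITH RECURSIVE.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (
    UserID       INTEGER PRIMARY KEY,
    ParentUserID INTEGER REFERENCES Users(UserID),  -- NULL for a primary account
    Name         TEXT NOT NULL                      -- hypothetical column for the demo
);
INSERT INTO Users (UserID, ParentUserID, Name) VALUES
    (1, NULL, 'Dunder Mifflin'),
    (2, 1,    'Accounting'),
    (3, 1,    'Sales'),
    (4, 2,    'Kevin'),
    (5, 2,    'Angela'),
    (6, 2,    'Oscar'),
    (7, NULL, 'Some Other Company');
""")


def descendants(conn, root_id):
    """Return (UserID, Name, Depth) for root_id and every account beneath it."""
    sql = """
    WITH RECURSIVE Tree (UserID, Name, Depth) AS (
        -- anchor member: the root account itself
        SELECT UserID, Name, 0 FROM Users WHERE UserID = ?
        UNION ALL
        -- recursive member: children of rows already in Tree
        SELECT u.UserID, u.Name, t.Depth + 1
        FROM Users AS u
        JOIN Tree AS t ON u.ParentUserID = t.UserID
    )
    SELECT UserID, Name, Depth FROM Tree ORDER BY Depth, UserID
    """
    return conn.execute(sql, (root_id,)).fetchall()


dunder_mifflin = descendants(conn, 1)
for user_id, name, depth in dunder_mifflin:
    print("  " * depth + name)  # indent each account by its level
```

The Depth column is optional; it is only there to show that the number of levels does not need to be known in advance.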

What are subaccounts used for? Database design is a serious matter.
Earlier answers claim that the design you demonstrate is good by definition. When you have hierarchical data, yes, you always have a parent ID. However, very often you have some sort of group account to which accounts belong. That would be a more proper place to set up a hierarchy.

Related

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know if it's more efficient to have the FirstName and LastName in each of these tables, or to have another table, Persons, with each of the other tables linked to it by PersonID.
Personally, I like it this way, although trickier to implement, because I think it's cleaner, especially if you look at it from the object-oriented point of view. Would this add an unnecessary overhead to the database?
Don't know if it helps to mention I would like to use SQL Server and ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using Entity Framework, perhaps it's worth approaching the problem from how the framework is intended to work, rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for the note on adding unnecessary overhead to the database - I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly and as it has to cope with a more general case, doesn't always produce the most efficient SQL. If the performance is a problem after your application is built, working and correct you can revisit and fix up the most inefficient stuff then.
If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has its own unique fields, then you should carry on as you are and leave each of these tables separate with a foreign key of PersonID.
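The Persons / Roles / PersonRoles layout mentioned above could look roughly like this. It is only a sketch under assumptions: SQLite via Python's sqlite3 stands in for SQL Server, and the column names and sample people are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Persons (
    PersonID  INTEGER PRIMARY KEY,
    FirstName TEXT NOT NULL,
    LastName  TEXT NOT NULL
);
CREATE TABLE Roles (
    RoleID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL UNIQUE
);
-- Bridge table: one row per (person, role) pair, so a person can hold several roles
CREATE TABLE PersonRoles (
    PersonID INTEGER NOT NULL REFERENCES Persons(PersonID),
    RoleID   INTEGER NOT NULL REFERENCES Roles(RoleID),
    PRIMARY KEY (PersonID, RoleID)
);
INSERT INTO Persons VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO Roles VALUES (1, 'Student'), (2, 'Teacher'), (3, 'Admin');
INSERT INTO PersonRoles VALUES (1, 2), (2, 1), (2, 2);
""")


def roles_of(conn, person_id):
    """Return the role names held by one person, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT r.Name
        FROM PersonRoles AS pr
        JOIN Roles AS r ON r.RoleID = pr.RoleID
        WHERE pr.PersonID = ?
        ORDER BY r.Name
        """,
        (person_id,),
    )
    return [name for (name,) in rows]


ada_roles = roles_of(conn, 1)
alan_roles = roles_of(conn, 2)
print(ada_roles, alan_roles)
```

The name is stored once in Persons no matter how many roles a person has, which is the overlap case the answer describes.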
If the attributes (e.g. First Name, Last Name, Gender etc.) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same, then you could just make a single table for all the entities, with a PersonType or Role attribute added to distinguish each person's role. However, if the entities have a lot of different attributes, then it would be better to create separate tables; otherwise you will have a normalization problem.
Yes, that is a very bad way of structuring a DB. The DB structure should be designed according to the normal forms.
Please check the normalization forms.
You should avoid duplicate data as much as possible, or else the queries will become slower.
The main problem comes when you are trying to get data that is associated with more than one or two tables.

Database Is-a relationship

My problem relates to DB schema developing and is as follows.
I am developing a purchasing module, which I want to use for purchasing both items and services.
Following is my EER diagram (note that service has very few specialized attributes, 2 at most).
My question is whether to keep products and services in two tables or in just one.
One table option –
Reduces complexity as I will only need to specify item id which refers to item table which will have an “item_type” field to identify whether it’s a product or a service
Two table option –
I will have to refer to product or service separately everywhere I want to reference them, and will have to keep an “item_type” field in every table which refers to either product or service.
Currently planning to use option 1, but want to know expert opinion on this matter. Highly appreciate your time and advice. Thanks.
I'd certainly go with the "two tables" option. You see, you have to distinguish Products and Services, so you may either use switch(item_type) { ... } in your program or entirely distinct code paths for Product and for Service. And if a need for updating the DB schema arises, switch is harder to maintain.
The second reason is NULLs. I'd advise avoiding them as much as you can; they create more problems than they solve. With two tables you can declare all fields non-NULL and forget about NULL-processing. With the one-table option, you have to ensure yourself that if item_type=product, then Product-specific fields are not NULL and Service-specific ones are, and that if item_type=service, then Service-specific fields are not NULL and Product-specific ones are. That's not pleasant work, and the DBMS gives you no shortcut for it: there is no NOT NULL IF another_field = value column constraint in SQL or anything like it, so the best you can do is spell the rule out by hand, for example in a table-level CHECK constraint.
Go with two tables. It's easier to support. I once saw a DB where everything, every single piece of data went in just two tables — there were pages and pages of code to make sure that necessary fields are not NULL.
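For comparison, here is roughly what the one-table option costs if the NULL rule is pushed into the schema instead of application code. This is a hedged sketch: SQLite via Python's sqlite3 stands in for the real DBMS, and weight_kg and duration_days are hypothetical product-only and service-only columns, not fields from the question's diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Items (
    item_id       INTEGER PRIMARY KEY,
    item_type     TEXT NOT NULL CHECK (item_type IN ('product', 'service')),
    name          TEXT NOT NULL,
    weight_kg     REAL,     -- hypothetical product-only column
    duration_days INTEGER,  -- hypothetical service-only column
    -- Table-level rule: each type must fill its own column and leave the other NULL
    CHECK (
        (item_type = 'product' AND weight_kg IS NOT NULL AND duration_days IS NULL)
     OR (item_type = 'service' AND duration_days IS NOT NULL AND weight_kg IS NULL)
    )
)
""")


def try_insert(conn, row):
    """Insert one row; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Items (item_type, name, weight_kg, duration_days) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_product = try_insert(conn, ("product", "Paper ream", 2.5, None))
ok_service = try_insert(conn, ("service", "Delivery", None, 3))
bad_mixed = try_insert(conn, ("product", "Broken row", None, 3))
print(ok_product, ok_service, bad_mixed)
```

Every new type-specific column means editing that CHECK clause, whereas with two tables each column is simply declared NOT NULL in the table it belongs to.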
If I were to implement this, I would go for the two-table option. It's kind of like the first rule of schema normalization: remove multi-valued attributes. Using item_type is not recommended. Once you create separate tables you don't need item_type; you can just use the foreign key relationship.
Consider reading this article :
http://en.wikipedia.org/wiki/Database_normalization
It should help.

Database paid users and trial users in same table

If you had to design a database with paid users and trial users would you put them in the same table and differentiate between them with a field? Or would you put them in two separate tables?
Or would you do the best of both worlds and put them in the same table but create two views 1) PaidUsers and 2) TrialUsers
Thanks!
I'll just offer some performance considerations.
For single-user queries (e.g. a login check, or retrieving data for one user), there is no significant difference between the two strategies.
But if you need statistics computed separately, for example one set for paid users and another for trial users, then separating them into two tables may be a good idea.
Conversely, if you need statistics across all users regardless of whether they are paid or trial, a single table may be a good idea.
What if you need both scenarios? Well, I think that would be a case where the two kinds of users share some common attributes.
These common attributes should be put in one table, and the attributes specific to particular users should be put in a 'sub-table' inheriting from the former table, just as vonPetrushev said.
Since your paid users would probably be related to some additional data, but still have the same fieldset as non-paid, the correct way to do this is [is-a] approach:
User
id
username
password
fullname
...
Paiduser
user_id [fk->User::id]
account_id
.... [other additional data]
EDIT: Now, the trial users will be all records in User that do not have an entry in Paiduser. I'm assuming that the Paiduser fieldset is a superset of the fieldset of a trial/normal user [User].
EDIT 2: To get a list of trial users, which is the 'set difference' between User and Paiduser, the following sql should work:
select u.*
from User as u
left join Paiduser as p on u.id = p.user_id
where p.user_id is null
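To check the 'set difference' on real rows, here is a small runnable sketch. It assumes SQLite via Python's sqlite3 in place of MySQL or SQL Server, trims the tables to a couple of columns, and uses made-up sample users. It runs the anti-join in two equivalent forms (LEFT JOIN ... IS NULL and NOT EXISTS) and, for contrast, an inequality join, which pairs each user with every non-matching paid row instead of filtering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE Paiduser (
    user_id    INTEGER PRIMARY KEY REFERENCES User(id),
    account_id INTEGER NOT NULL
);
INSERT INTO User VALUES (1, 'michael'), (2, 'dwight'), (3, 'jim');
INSERT INTO Paiduser VALUES (1, 100), (3, 101);
""")

# Anti-join, form 1: keep User rows for which the outer join found no Paiduser row
trial_left_join = conn.execute("""
    SELECT u.id, u.username
    FROM User AS u
    LEFT JOIN Paiduser AS p ON u.id = p.user_id
    WHERE p.user_id IS NULL
    ORDER BY u.id
""").fetchall()

# Anti-join, form 2: the same condition expressed with NOT EXISTS
trial_not_exists = conn.execute("""
    SELECT u.id, u.username
    FROM User AS u
    WHERE NOT EXISTS (SELECT 1 FROM Paiduser AS p WHERE p.user_id = u.id)
    ORDER BY u.id
""").fetchall()

# Contrast: an inequality join is NOT a set difference; paid users still appear
inequality_rows = conn.execute("""
    SELECT u.id, p.user_id
    FROM User AS u
    JOIN Paiduser AS p ON u.id <> p.user_id
""").fetchall()

print(trial_left_join, trial_not_exists, len(inequality_rows))
```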
The best solution may depend on database type. My experience is with MySQL and SQL Server. I've always put all users into a single table. Then differentiate as needed using fields. This could apply to paid/ unpaid or anything else. This solution meets 3NF standards and seems easier to me for maintenance etc. What reason would there be to use multiple tables?
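The single-table approach combines naturally with the two views suggested in the question. A minimal sketch, assuming SQLite via Python's sqlite3 as the engine and a hypothetical is_paid flag column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    is_paid  INTEGER NOT NULL DEFAULT 0 CHECK (is_paid IN (0, 1))  -- hypothetical flag
);
-- Two views over the same table, one per kind of user
CREATE VIEW PaidUsers  AS SELECT id, username FROM Users WHERE is_paid = 1;
CREATE VIEW TrialUsers AS SELECT id, username FROM Users WHERE is_paid = 0;
INSERT INTO Users (id, username, is_paid) VALUES
    (1, 'michael', 1), (2, 'dwight', 0), (3, 'jim', 1);
""")


def usernames(conn, view_name):
    """List usernames from one of the two views (view_name is trusted, not user input)."""
    rows = conn.execute(f"SELECT username FROM {view_name} ORDER BY id")
    return [name for (name,) in rows]


paid_before = usernames(conn, "PaidUsers")
trial_before = usernames(conn, "TrialUsers")

# Upgrading a trial user is a single UPDATE; no row moves between tables
conn.execute("UPDATE Users SET is_paid = 1 WHERE username = 'dwight'")

paid_after = usernames(conn, "PaidUsers")
trial_after = usernames(conn, "TrialUsers")
print(paid_before, trial_before, paid_after, trial_after)
```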

Database Permission Structure

Many of my employer's applications share a similar internal permission structure for restricting data to a specific set of users or groups. Groups can also be nested.
The problem we're currently facing with this approach is that enumerating the permissions is incredibly slow. The current method uses a stored procedure with many cursors and temporary tables. This has worked fine for smaller applications, but we now have one particular system which is growing quickly, and it's starting to slow down.
The basic table structure is as follows;
tblUser { UserID, Username, WindowsLogonName }
tblGroup { GroupID, Name, Description, SystemFlag }
tblGroupGroup { GroupGroupID, Name, }
tblGroupUser { GroupUserID, Name, }
and to tie it all together;
tblPermission { PermissionID, SecurityObjectID, SecuredID, TableName, AllowFlag }
which contains rows like..
'5255-5152-1234-5678', '{ID of a Group}', '{ID for something in tblJob}', 'tblJob', 1
'4240-7678-5435-8774', '{ID of a User}', '{ID for something in tblJob}', 'tblJob', 1
'5434-2424-5244-5678', '{ID of a Group}', '{ID for something in tblTask}', 'tblTask', 0
Surely there must be a more efficient approach to enumerating all the groups, and getting the IDs of the secured rows?
To complicate things further; if a user is explicitly denied access to a row then this overrules any group permissions. This is all in MSSQL.
I'm guessing it would be useful to break apart tblPermission into a couple of tables: one for groups and one for users. By having both groups and users in there, it seems to add complexity to the design (and maybe that's why you need the stored procedures).
If you want to break down the tblPermission table (into something like tblUserPermission and tblGroupPermission) but still want a representation of the tables that looks like tblPermission, you can make a view that unions the data from the two tables.
Hope this helps. Do you have examples of what the stored procedures do?
I think you could use a recursive Common Table Expression (CTE) hierarchical query. You can find many examples if you search for it. This is one of them.
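As a rough illustration of that idea, the sketch below expands nested groups with a recursive CTE and resolves permissions in one set-based query instead of cursors. Several things here are assumptions, not taken from the question: the ParentGroupID/ChildGroupID and GroupID/UserID columns (the original column lists are abbreviated), the short string IDs, the sample rows, and the rule that any allow wins unless the user has an explicit deny row. SQLite via Python's sqlite3 stands in for MSSQL, where the CTE would start with WITH instead of WITH RECURSIVE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns: the question does not show the real ones
CREATE TABLE tblGroupGroup (ParentGroupID TEXT NOT NULL, ChildGroupID TEXT NOT NULL);
CREATE TABLE tblGroupUser  (GroupID TEXT NOT NULL, UserID TEXT NOT NULL);
CREATE TABLE tblPermission (
    SecurityObjectID TEXT NOT NULL,   -- a user ID or a group ID
    SecuredID        TEXT NOT NULL,
    TableName        TEXT NOT NULL,
    AllowFlag        INTEGER NOT NULL
);
INSERT INTO tblGroupGroup VALUES ('g_all', 'g_acct');      -- g_acct is nested in g_all
INSERT INTO tblGroupUser  VALUES ('g_acct', 'u1'), ('g_all', 'u2');
INSERT INTO tblPermission VALUES
    ('g_all',  'job1',  'tblJob',  1),
    ('g_acct', 'job2',  'tblJob',  1),
    ('u1',     'job1',  'tblJob',  0),   -- explicit user deny overrules the group allow
    ('u2',     'job3',  'tblJob',  1),
    ('g_acct', 'task1', 'tblTask', 1);
""")

ALLOWED_SQL = """
WITH RECURSIVE UserGroups (GroupID) AS (
    -- anchor: groups the user belongs to directly
    SELECT GroupID FROM tblGroupUser WHERE UserID = :user
    UNION ALL
    -- recursive step: parents of groups found so far (assumes no cycles in nesting)
    SELECT gg.ParentGroupID
    FROM tblGroupGroup AS gg
    JOIN UserGroups AS ug ON gg.ChildGroupID = ug.GroupID
)
SELECT p.SecuredID
FROM tblPermission AS p
WHERE p.TableName = :table
  AND (p.SecurityObjectID = :user
       OR p.SecurityObjectID IN (SELECT GroupID FROM UserGroups))
GROUP BY p.SecuredID
HAVING MAX(p.AllowFlag) = 1   -- at least one allow from the user or a group
   AND SUM(CASE WHEN p.SecurityObjectID = :user AND p.AllowFlag = 0
                THEN 1 ELSE 0 END) = 0   -- and no explicit user-level deny
ORDER BY p.SecuredID
"""


def allowed_ids(conn, user_id, table_name):
    """Return the SecuredIDs in table_name that user_id may access."""
    rows = conn.execute(ALLOWED_SQL, {"user": user_id, "table": table_name})
    return [secured_id for (secured_id,) in rows]


u1_jobs = allowed_ids(conn, "u1", "tblJob")
u2_jobs = allowed_ids(conn, "u2", "tblJob")
print(u1_jobs, u2_jobs)
```

Whether this actually beats the existing stored procedure depends on indexing and data volume, so it is a starting point to measure rather than a guaranteed fix.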
Perhaps your design is OK, but the implementation/code is wrong.
Some thoughts:
Are all your ID columns GUIDs? Not recommended; see the Kimberly L. Tripp article
Indexes on all foreign keys, perhaps with other columns in the key or in INCLUDE
Regular maintenance? e.g. fragmented indexes, stats out of date etc.
Are all datatypes matching (assumes no FKs): datatype precedence and implicit conversion errors may creep in
Some more schema info and examples of poorly performing code may help

Database schema design

I'm quite new to database design and have some questions about best practices and would really like to learn.
I am designing a database schema. I have a good idea of the requirements, and now it's a matter of getting it into black and white.
In this pseudo-database-layout, I have a table of customers, table of orders and table of products.
TBL_PRODUCTS:
ID
Description
Details
TBL_CUSTOMER:
ID
Name
Address
TBL_ORDER:
ID
TBL_CUSTOMER.ID
prod1
prod2
prod3
etc
Each 'order' has only one customer, but can have any number of 'products'.
The problem is that, in my case, an order can contain any number of products (hundreds for a single order). On top of that, each product on an order needs more than just a 'quantity': it can have values that span pages of text, specific to that product on that order.
My question is, how can I store that information?
Assuming I can't store a variable length array as single field value, the other option is to have a string that is delimited somehow and split by code in the application.
An order could have, say, 100 products, each product having either only a small int, or 5000 characters of free text (or anything in between), unique only to that order.
On top of that, each order must have its own audit trail, as many things can happen to it throughout its lifetime.
An audit trail would contain the usual information (user, time/date, action) and can be any length.
Would I store the audit trail for a specific order in its own table (as they could become quite lengthy), created when the order is created?
Are there any places where I could learn more about techniques for database design?
The most common way would be to store the order items in another table.
TBL_ORDER:
ID
TBL_CUSTOMER.ID
TBL_ORDER_ITEM:
ID
TBL_ORDER.ID
TBL_PRODUCTS.ID
Quantity
UniqueDetails
The same can apply to your Order audit trail. It can be a new table such as
TBL_ORDER_AUDIT:
ID
TBL_ORDER.ID
AuditDetails
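The layout above can be tried out directly. This is a minimal sketch, assuming SQLite through Python's sqlite3 as the engine; the table names drop the TBL_ prefix, the dotted column names become OrderID and ProductID, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    ID         INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL
);
-- One row per product on an order, however many there are
CREATE TABLE OrderItems (
    ID            INTEGER PRIMARY KEY,
    OrderID       INTEGER NOT NULL REFERENCES Orders(ID),
    ProductID     INTEGER NOT NULL,
    Quantity      INTEGER NOT NULL,
    UniqueDetails TEXT              -- free text specific to this product on this order
);
-- One shared audit table for all orders, one row per event
CREATE TABLE OrderAudit (
    ID           INTEGER PRIMARY KEY,
    OrderID      INTEGER NOT NULL REFERENCES Orders(ID),
    AuditDetails TEXT NOT NULL
);
INSERT INTO Orders (ID, CustomerID) VALUES (1, 42);
""")

long_text = "Free text specific to this order. " * 100  # stands in for pages of text
conn.executemany(
    "INSERT INTO OrderItems (OrderID, ProductID, Quantity, UniqueDetails) VALUES (?, ?, ?, ?)",
    [(1, 101, 3, None), (1, 102, 1, long_text)],
)
conn.executemany(
    "INSERT INTO OrderAudit (OrderID, AuditDetails) VALUES (?, ?)",
    [(1, "created"), (1, "details edited for product 102")],
)

item_rows = conn.execute(
    "SELECT ProductID, Quantity, LENGTH(UniqueDetails) FROM OrderItems "
    "WHERE OrderID = ? ORDER BY ProductID",
    (1,),
).fetchall()
audit_rows = conn.execute(
    "SELECT AuditDetails FROM OrderAudit WHERE OrderID = ? ORDER BY ID", (1,)
).fetchall()
print(item_rows, audit_rows)
```

Note that the audit trail is one table shared by all orders, filtered by OrderID, rather than a new table created per order.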
First of all, Google Third Normal Form. In most cases, your tables should be in 3NF, but there are exceptions made for performance or ease of use, and only experience can really teach you those.
What you have is not normalized. You need a "join table" to implement the many-to-many relationship.
TBL_ORDER:
ID
TBL_CUSTOMER.ID
TBL_ORDER_PRODUCT_JOIN:
ID
TBL_ORDER.ID
TBL_PRODUCTS.ID
Quantity
TBL_ORDER_AUDIT:
ID
TBL_ORDER.ID
Audit_Details
The basic conventional name for the ID column in the Orders table (plural, because ORDER is a keyword in SQL) is "Order Number", with the exact spelling varying (OrderNum, OrderNumber, Order_Num, OrderNo, ...).
The TBL_ prefix is superfluous; it is doubly superfluous since it doesn't always mean table, as for example in the TBL_CUSTOMER.ID column name used in the TBL_ORDER table. Also, it is a bad idea, in general, to try using a "." in the middle of a column name; you would have to always treat that name as a delimited identifier, enclosing it in either double quotes (standard SQL and most DBMS) or square brackets (MS SQL Server; not sure about Sybase).
Joe Celko has a lot to say about things like column naming. I don't agree with all he says, but it is readily searchable. See also Fabian Pascal 'Practical Issues in Database Management'.
The other answers have suggested that you need an 'Order Items' table - they're right; you do. The answers have also talked about storing the quantity in there. Don't forget that you'll need more than just the quantity. For example, you'll need the price prevailing at the time of the order. In many systems, you might also need to deal with discounts, taxes, and other details. And if it is a complex item (like an airplane), there may be only one 'item' on the order, but there will be an enormous number of subordinate details to be recorded.
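The point about the price prevailing at the time of the order can be shown in a few lines. This is a sketch under assumptions: SQLite via Python's sqlite3, prices stored as integer cents, and hypothetical column names (PriceCents, UnitPriceCents) and sample products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ID          INTEGER PRIMARY KEY,
    Description TEXT NOT NULL,
    PriceCents  INTEGER NOT NULL      -- current list price
);
CREATE TABLE OrderItems (
    OrderID        INTEGER NOT NULL,
    ProductID      INTEGER NOT NULL REFERENCES Products(ID),
    Quantity       INTEGER NOT NULL,
    UnitPriceCents INTEGER NOT NULL,  -- price copied in when the item is ordered
    PRIMARY KEY (OrderID, ProductID)
);
INSERT INTO Products VALUES (1, 'Paper ream', 500), (2, 'Stapler', 1200);
""")


def add_item(conn, order_id, product_id, quantity):
    """Add a product to an order, snapshotting its current price."""
    conn.execute(
        """
        INSERT INTO OrderItems (OrderID, ProductID, Quantity, UnitPriceCents)
        SELECT ?, ID, ?, PriceCents FROM Products WHERE ID = ?
        """,
        (order_id, quantity, product_id),
    )


def order_total(conn, order_id):
    """Total of an order in cents, using the prices stored on the order items."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(Quantity * UnitPriceCents), 0) "
        "FROM OrderItems WHERE OrderID = ?",
        (order_id,),
    ).fetchone()
    return total


add_item(conn, 1, 1, 10)   # 10 reams at 500 cents
add_item(conn, 1, 2, 1)    # 1 stapler at 1200 cents
total_before = order_total(conn, 1)

# A later price change must not rewrite history for existing orders
conn.execute("UPDATE Products SET PriceCents = 650 WHERE ID = 1")
total_after = order_total(conn, 1)
print(total_before, total_after)
```

Discounts and taxes would follow the same pattern: store on the order what applied at the time, rather than recomputing from current reference data.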
While not a reference on how to design database schemas, I often use the schema library at DatabaseAnswers.org. It is a good jumping off location if you want to have something that is already roughed in. They aren't perfect and will most likely need to be modified to fit your needs, but there are more than 500 of them in there.
Learn Entity-Relationship (ER) modeling for database requirements analysis.
Learn relational database design and some relational data modeling for the overall logical design of tables. Data normalization is an important part of this piece, but by no means all there is to learn. Relational database design is pretty much DBMS independent within the main stream DBMS products.
Learn physical database design. Learn index design as the first stage of designing for performance. Some index design is DBMS independent, but physical design becomes increasingly dependent on special features of your DBMS as you get more detailed. This can require a book that's specifically tailored to the DBMS you intend to use.
You don't have to do all the above learning before you ever design and build your first database. But what you don't know WILL hurt you. Like any other skill, the more you do it, the better you'll get. And learning what other people already know is a lot cheaper than learning by trial and error.
Take a look at Agile Web Development with Rails, it's got an excellent section on ActiveRecord (an implementation of the same-named design pattern in Rails) and does a really good job of explaining these types of relationships, even if you never use Rails. Here's a good online tutorial as well.

Resources