How would you accommodate this requirement in EF4.0 - sql-server

I have a new circumstances (for me anyway) and wonder what is the best way to do this in EF4.0, (database first). This is a madeup example, but it mimics the logic of what I need to do:
Say you have just two tables PEOPLE and TEAMS, each team has a team leader and a backup team leader. The people table has a single record for each person, with a unique ID, the team record has a unique ID, but also a TeamLeaderID and a BackupTeamLeaderId, which map to the people table.
How do you handle this in EF? If I only had a teamleaderid, I could access it by Team.People.Name, but since I know have two links from teams->people this design will not work.
I can think of lots of kludgy scenarios for this, but what is the proper way to set this up in EF (or alternatively resdesign the underlying tables).

I'm not sure why you are starting with an EF question. Do you have your database schema worked out? If so, present that. If not, then I'd start there.
Your Team table will need a Leader foreign key to the People table. If any person can only be in one team, then you could add a TeamID column to your People table. Each person with TempID set to a particular Team would be a player on that team.

Related

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know if it's more efficient to have the FirstName and LastName in each of these tables, or to have another table, Persons, and each of the other table to be linked to this one with PersonID.
Personally, I like it this way, although trickier to implement, because I think it's cleaner, especially if you look at it from the object-oriented point of view. Would this add an unnecessary overhead to the database?
Don't know if it helps to mention I would like to use SQL Server and ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using EntityFramework, perhaps its worth approaching the problem instead from how the framework is intended to work - rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for the note on adding unnecessary overhead to the database - I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly and as it has to cope with a more general case, doesn't always produce the most efficient SQL. If the performance is a problem after your application is built, working and correct you can revisit and fix up the most inefficient stuff then.
If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has it's own unique fields, then you should carry on as you are and leave each of these tables separate with a foreign key of PersonID.
If the attributes (i.e. First Name, Last Name, Gender etc) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same then you could just make a single table for all the entities with PersonType or Role attribute added to distinguish each person's role. However, if the entities has a lot of different attributes then it would be better that you create separate tables otherwise you will have normalization problem.
Yes that is a very bad way of structuring a DB. The DB structure should be designed based on the Normalizations.
Please check the normalization forms.
U should avoid the duplicate data as much as possible, else the queries will become slower.
And the main problem is when u r trying to get data that is associated with more than one or two tables.

Best practice: database referencing tables

In database design what are the feelings of tuple vs referencing table for small pieces of data?
For instance, supposing you are designing a schema involving office management. You want to record what department each employee belongs to, but are otherwise uninterested in any information relating to departments. So do you have department as a string/char/varchar/etc in your EMPLOYEE table, or have it instead be a foreign key, relating a DEPARTMENT table.
If the DEPARTMENT table is recording nothing other than department names, one would normally want to combine this with the EMPLOYEE table. But if this is contained in the EMPLOYEE table you cannot guarantee that some users will call HR "HumanResourses", some may call it "H-R", some may call it "human resources", etc. Having it as a foreign key guarantees that it can be only one thing. Also, if other information is ever to be added about departments, it would be easy if it is in a table of its own.
So what do people think about it? Naturally more tables and referencing is also likely to have a negative impact on performance. My question specifically is asked with Oracle 11g in mind, but I doubt that the type of rdms involved has much bearing on this design consideration.
If you use the related table, then you don't have the performance problem of updating 1,000,000 records because the Personnel Department became the Human Resources department.
You have another option. Create the table and use it as a lookup for data entry. But store the information in the main table.
However, I prefer the option of using the related table for the departments and storing the ID for the department and the employee in a join table that has the ids and start and endates. Over time employees tend to move from one department to another. It is helpful for reporting to be able to tell what department they were in when. You need to consider how the data will be used over time and in reporting when designing this sort of thing. Short-sighted designs are hard to fix later.
Your concern about having too many tables is really unfounded. Databases are designed to have many tables and to use joins. If you index correctly, there will not be preformance implications for most databases. And you know what,I know of realtional database with many many tables that have terrabytes of data that perform just fine.
You only have to worry about the performance impact of this sort of thing if you're dealing with truly massive datasets. For any regular office environment system like this, prefer the normalized schema.

Complex Database Relations (Junction Tables)

My Question is about the idea of combining two junction tables into one, for similarly related tables. Please read to see what I mean. Also note that this is indeed a problem I am faced with and therefore relevant to this forum. It is just a topic of broad consequence for which I'm hoping to elicit a bit more participation from various professionals to get a better census of "best practice" if you will.
I have this rather challenging database design problem. I'm hoping this will be sort of a wiki that many people can contribute to and learn from. To make this easier, I've created a set of graphics, and will break the problem down into 1) Process, and 2) Structure.
Process Steps
A request (DocRequest) for documentation (Publication) is made.
A new publication is created IF said publication does not already exist.
A running log (StatusReport) is kept for progress on fulfilling the request.
Note: For any given Publication there may be many DocRequests and StatusReports (including updates)
Database Structure
Note: Both the DocRequest and StatusReport tables have numerous fields and supporting tables not shown in the attached graphics. Furthermore, a particular Publication is the master record to which all records in those tables belong.
--Current Implementation--
Note: The major flaw with this design is that whenever you create either a new DocRequest and StatusReport record, you have to also create a new record in the Publications table (which acts like a junction table), but this also creates a new Publication as a result. This is not the desired behavior.
--Typical Implementation-- (for this type of relationship)
Note: This is ok, and probably ideal, but handles updates to either the DocRequest and StatusReport tables, independently linking them to the Publication to which they belong.
--My Preferred Implementation-- (for this special case)
Note: The idea I had here, was simply to combine the dual junction tables into one. In this case the junction table would get a new record anytime either the DocRequest or StatusReport had a insert occur. I will likely handle this with a trigger.
Discussion
Now for the discussion. I would like to know from my fellow Database Developers if you think this is a bad idea, and what issues might arise from this. I think the net number of records should be identical as with the two separate junction tables, and in fact uses slightly less space by saving an extra ID column. :)
Let me know what you guys think. I would really like to get many people involved in this discussion. Cheers! :)
I think you're hurting yourself by thinking in terms of junction tables. Just think of tables.
Since StatusReport has to do with the status of the document request,
you need a table that relates those two somehow.
"StatusReport" is an awful name for a table that stores facts about the status of a document request.
"ID" is an awful name for any column in any table.
The id number of the publication seems to have more to do with the document request than with the status of the request. (You said, "A new publication is created IF said publication does not already exist." Frankly, that's skating pretty close to the edge of not making sense.) So the publication number almost certainly belongs in the DocRequest table.
Referring to the diagram of your preferred implementation, I'd drop the table TripleJunction, and replace StatusReport with this.
-- Predicate: Document request number (doc_request_id) has status (status)
-- as of date and time (status_as_of).
create table document_request_status (
doc_request_id integer not null references DocRequest (id),
status_as_of timestamp not null default current_timestamp,
status varchar(10) not null,
-- other columns go here
primary key (doc_request_id, status_as_of)
);

Table "Inheritance" in SQL Server

I am currently in the process of looking at a restructure our contact management database and I wanted to hear peoples opinions on solving the problem of a number of contact types having shared attributes.
Basically we have 6 contact types which include Person, Company and Position # Company.
In the current structure all of these have an address however in the address table you must store their type in order to join to the contact.
This consistent requirement to join on contact type gets frustrating after a while.
Today I stumbled across a post discussing "Table Inheritance" (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server).
Basically you have a parent table and a number of sub tables (in this case each contact type). From there you enforce integrity so that a sub table must have a master equivalent where it's type is defined.
The way I see it, by this method I would no longer need to store the type in tables like address, as the id is unique across all types.
I just wanted to know if anybody had any feelings on this method, whether it is a good way to go, or perhaps alternatives?
I'm using SQL Server 05 & 08 should that make any difference.
Thanks
Ed
I designed a database just like the link you provided suggests. The case was to store the data for many different technical reports. The number of report types is undefined and will probably grow to about 40 different types.
I created one master report table, that has an autoincrement primary key. That table contains all common information like customer, testsite, equipmentid, date etc.
Then I have one table for each report type that contains the spesific information relating to that report type. That table have the same primary key as the master and references the master as well.
My idea for splitting this into different tables with a 1:1 relation (which normally would be a no-no) was to avoid getting one single table with a huge number of columns, that gets very difficult to maintain as your constantly adding columns.
My design with table inheritance gave me segmented data and expandability without beeing difficult to maintain. The only thing I had to do was to write special a special save method to handle writing to two tables automatically. So far I'm very happy with the design and haven't really found any drawbacks, except for a little more complicated save method.
Google on "gen-spec relational modeling". You'll find a lot of articles discussing exactly this pattern. Some of them focus on table design, while others focus on an object oriented approach.
Table inheritance pops up in a few of them.
I know this won't help much now, but initially it may have been better to have an Entity table rather than 6 different contact types. Then each Entity could have as many addresses as necessary and there would be no need for type in the join.
You'll still have the problem that if you want the sub-type fields and you have only the master contact, you'll have to know what table to go looking at - or else join to all of them. But otherwise this is a workable solution to a common problem.
Another possibility (fairly similar in structure, but different in how you think of it) is to simply put all your contacts into one table. Then for the more specific fields (birthday say for people and department for position#company) create separate tables that are associated with that contact.
Contact Table
--------------
Name
Phone Number
Address Table
-------------
Street / state, etc
ContactId
ContactBirthday Table
--------------
Birthday
ContactId
Departments Table
-----------------
Department
ContactId
It requires a different way of thinking of things though - instead of thinking of people vs. companies, you think of the various functional requirements for the task at hand - if you want to send out birthday cards, get all the contacts that have birthdays associated with them, etc..
I'm going to go out on a limb here and suggest you should rethink your normalization strategy (as you seem to be lucky enough to be able to rethink your schema quite fundamentally). If you typically store an address for each contact, then your contact table should have the address fields in it. Alternatively if the address is stored per company then the address should be stored in the company table and your contacts linked to that company.
If your contacts only have one address, or one (or even 3, just not 'many') instance of the other fields, think about rationalizing them into a single table. In my experience having a few null fields is a far better alternative than needing left joins to data you aren't sure exists.
Fortunately for anyone who vehemently disagrees with me you did ask for opinions! :) IMHO you should only normalize when you really need to. Where you are rethinking schemas, denormalization should be considered at every opportunity.
When you have a 7th type, you'll have to create another table.
I'm going to try this approach. Yes, you have to create new tables when you have a new type, but since this table will probably have different columns, you'll end up doing this anyway if you don't use this scheme.
If the tables that inherit the master don't differentiate much from one another, I'd recommend you try another approach.
May I suggest that we just add a Type table. Ie a person has an address, name etc then the student, teacher as each use case presents its self we have a PersonType table that has an entry from the person table to n types and the subsequent new tables teacher, alien, singer as the system eveolves...

Separating user table from people table in a relational database

I've done many web apps where the first thing you do is make a user table with usernames, passwords, names, e-mails and all of the other usual flotsam. My current project presents a situation where non-users records need to function similarly to users, but do not need to the ability to be a first order user.
Is it reasonable to create a second table, people_tb, that is the main relational table and data store, and only use the users_tb for authentication? Does separating user_tb from people_tb present any problems? If this is commonly done, what are some strategies and solutions as well as drawbacks?
This is certainly a good idea, as you are normalizing the database. I have done a similar design in an app that I am writing, where I have an employee table and a user table. Users may a from an external company or an employee, so I have separate tables because an employee is always a user, but a user may not be an employee.
The issues that you'll run into is that whenever you use the user table, you'll nearly always want the person table to get the name or other common attributes you would want to show up.
From a coding standpoint, if you're using straight SQL, it will take a little more effort to mentally parse the select statement. It may be a little more complicated if you're using an ORM library. I don't have enough experience with those.
In my application, I'm writing it in Ruby on Rails, so I'm constantly doing things like employee.user.name, where if I kept them together, it would be just employee.name or user.name.
From a performance standpoint, you are hitting two tables instead of one, but given proper indexes, it should be negligible. If you had an index that contained the primary key and the person name, for instance, the database would hit the user table, then the index for the person table (with a nearly direct hit), so the performance would be nearly the same as having one table.
You could also create a view in the database to keep both tables joined together to give you additional performance enhancements. I know in the later versions of Oracle you can even put an index on a view if needed to increase performance.
I routinely do that because for me the concept of "user" (username, password, create date, last login date) is different from "person" (name, address, phone, email). One of the drawbacks that you may find is that your queries will often require more joins to get the info you're looking for. If all you have is a login name, you'll need to join the "people" table to get the first and last name for example. If you base everything around the user id primary key, this is mitigated a bit, but still pops up.
If user_tb has auth info, I would very much keep it separate from people_tb. I would however keep a relationship between the two, and most of users' info would be stored in people_tb except all of the info needed for auth (which i guess will not be used for much else) Its a nice tradeoff between design and efficiency i think.
That is definitely what we do as we have millions of people records and only thousands of users. We also separate address, phones and emails into relational tables as many people have more than one of each of these things. Critial is to not rely on name as the identifier as name is not unique. Make sure the tables are joined through some type of surrogate key (an integer or a GUID is preferable) not name.
I always try to avoid as much data repetition as possible. If not all people need to login, you can have a generic people table with the information that applies to both people and users (eg. firstname, lastname, etc).
Then for people that login, you can have a users table that has a 1~1 relationship with people. This table can store the username and password.
I'd say go for the normalized design (two tables) and only denormalize (go down to one user/person table) if it will really make your life easier down the line. If however practically all people are also users it may be simpler to denormalize up front. Its up to you; I have used the normalized approach without problems.
Very reasonable.
As an example, take a look at the aspnet_* services tables here.
Their built in schema has a aspnet_Users and aspnet_Membership with the later table having more extended information about a given user (hashed passwords, etc) but the aspnet_User.UserID is used in the other portions of the schema for referential integrity etc.
Bottom line, it's very common, and good design, to have attributes in a separate table if they are different entities, as in your case.

Resources