Entity Framework inheritance and many to many mapping - sql-server

This is technically two problems, but I think I already know the solution to the first one; I'll post it here anyway just to check.
I'm using Entity Framework 6 and building the model Database First. In the database, all significant interactions (inserts and updates) are handled through stored procedures, mainly so I can do additional permission checks on the database side.
The first problem: I'm building a school's database. I have two tables, Professors and Students. I also have an additional table called PersonalData that contains personal information such as names and surnames. Professors and Students are each in a 1:0..1 relationship with PersonalData, meaning that every Professor must have a PersonalData record, and so must every Student. Not every PersonalData record has to belong to a Student or a Professor; some of them are parents.
At first I tried to implement this in EF as TPT (Table-Per-Type) inheritance, but problems may arise if a Student at some point becomes a Professor. Because of the way EF handles inheritance, the PersonalData collection could then contain TWO entities with the same ID.
I've googled this a bit and found that using inheritance in this kind of case is impossible, and that I'll probably need to go back to accessing PersonalData through relationships. That causes a problem of its own, though: I'll have to make sure manually that a PersonalData record is added for every Student/Professor. Unless anybody has a better idea?
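A minimal Java sketch of the relationship-based alternative, assuming plain in-memory classes named after the tables above (the fields and constructors are illustrative, not EF-generated code). Because a Student or Professor can only be constructed with a PersonalData reference, the "every Student/Professor has a PersonalData record" rule is checked in one place, and the same person can hold both roles without two entities sharing an ID.

```java
import java.util.Objects;

// Shared personal information; parents exist only as PersonalData rows.
final class PersonalData {
    final int id;
    final String firstName;
    final String lastName;

    PersonalData(int id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Each role references PersonalData instead of inheriting from it (1:0..1).
final class Student {
    final PersonalData person;

    Student(PersonalData person) {
        // Refuse to create a Student without its PersonalData record.
        this.person = Objects.requireNonNull(person, "person");
    }
}

final class Professor {
    final PersonalData person;

    Professor(PersonalData person) {
        // Same rule for professors.
        this.person = Objects.requireNonNull(person, "person");
    }
}
```

A student who later becomes a professor simply gets a new Professor object pointing at the existing PersonalData; the student's data does not have to change.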
The second problem is due to the fact that I'm using stored procedures for inserting and updating data. I also have a few many-to-many relationships in the DB, and when I import the tables involved, I see no way to map that relationship (its join table) to a stored procedure.
So does anyone know how to map many to many relationships to stored procedures in the Model Designer?

As for mapping many-to-many relationships to stored procedures, it seems to be impossible for now. What you can do is attach INSTEAD OF triggers to the join table, or to a view representing that table. The view method causes problems, however, since EF treats views as read-only, so you'll need to manually edit the .edmx and change the view's type from view to table; EF will then treat the view as a table. Also, if you use the view method, the EF Model Designer won't automatically turn the view representing the cross table into an association. You'll have to add the association manually, map it manually, and then delete the view entity so it doesn't cause an error, since two things (the association and the entity) would otherwise be mapped to the same object.

Related

Store multiple values in one database field in Access (hear me out)

So I've done extensive searching on this and I can't seem to find a good solution that actually applies to my situation.
I have a list of projects in a table, then a list of people. I want to assign multiple people to one project. Seems pretty common. Obviously, I can't make multiple columns on my projects table for each person, as the people will change fairly frequently.
I need to display this information very quickly in a continuous list of projects (ideally in a multiple-select combo box, since a list box is too tall, but those don't exist outside of the dreaded lookup fields).
I can think of two ways:
- Store multiple employee IDs delimited by commas in one field in my projects table (I know this goes against good database design). This would require some code to store and retrieve the data.
- Have a separate table for employees assigned to projects (ID, ProjectID, EmployeeID), with a one-to-many relationship from the projects table to this new table and another from the employees table to it. If a project has 3 employees assigned, this table would hold 3 records for it. Joining the tables this way seems a bit odd, and it would also require code to store and retrieve the data through a control like the one mentioned above.
Does anyone know if there is a better way (including displaying in an easy control) or how you usually tackle this problem?
The usual way to tackle this problem would be with a Junction Table. This is what you describe where you have a separate table maybe called EmployeeProject which has an EmployeeProjectID(PK), EmployeeID(FK) and ProjectID(FK).
In this way you model a Many-to-Many relationship where each project can have many employees involved and each employee can be involved in many projects. It's not actually all that difficult to do the SQL etc. required to pull the information back together again for display.
I would definitely stay away from storing comma-delimited values as this becomes significantly more complicated when you want to display or manipulate the data.
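As an illustration of how little work the junction table needs for display, here is a Java sketch that models EmployeeProject rows in memory (the record, the employee-name map and the method name are assumptions, not Access objects) and builds a comma-separated list of names for one project only when it is displayed, instead of storing it.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// One row of the junction table: which employee works on which project.
record EmployeeProject(int employeeProjectId, int employeeId, int projectId) {}

final class ProjectStaff {
    // Roughly what a query joining EmployeeProject to the employees table,
    // filtered on one ProjectID and sorted by name, would return.
    static String namesForProject(int projectId,
                                  List<EmployeeProject> links,
                                  Map<Integer, String> employeeNames) {
        return links.stream()
                .filter(link -> link.projectId() == projectId)
                .map(link -> employeeNames.get(link.employeeId()))
                .sorted()
                .collect(Collectors.joining(", "));
    }
}
```

The comma-separated string exists only at display time, so the stored data stays normalized and employees can be added or removed by inserting or deleting junction rows.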
There's a good guide here: http://en.tekstenuitleg.net/articles/software/create-a-many-to-many-relationship-in-access but if you google "many to many junction table" or similar, there are thousands of pages/articles about implementation.

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know whether it's more efficient to have FirstName and LastName in each of these tables, or to have another table, Persons, with each of the other tables linked to it through PersonID.
Personally, I prefer the second way, although it is trickier to implement, because I think it's cleaner, especially from an object-oriented point of view. Would it add unnecessary overhead to the database?
In case it helps: I would like to use SQL Server and ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using Entity Framework, perhaps it's worth approaching the problem from how the framework is intended to work, rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for adding unnecessary overhead to the database, I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly, and because it has to cope with the general case, it doesn't always produce the most efficient SQL. If performance turns out to be a problem once your application is built, working and correct, you can revisit it and fix the most inefficient parts then.
If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has its own unique fields, then you should carry on as you are and keep each of these tables separate, with a foreign key of PersonID.
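For the case where only the role matters, the Persons / Roles / PersonRoles layout can be sketched in Java as bridge rows that are collected into a set of roles per person. The Role enum, the record names and the lookup method are illustrative assumptions; the point is that one person can be both a student and a teacher while their name is stored exactly once.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Roles table, modelled as an enum for brevity.
enum Role { STUDENT, TEACHER, ADMIN, PERSONNEL }

// Persons table: the name lives here exactly once.
record Person(int personId, String firstName, String lastName) {}

// PersonRoles bridge table: one row per (person, role) pair.
record PersonRole(int personId, Role role) {}

final class RoleLookup {
    // All roles held by one person, gathered from the bridge rows.
    static Set<Role> rolesOf(int personId, List<PersonRole> bridge) {
        Set<Role> roles = EnumSet.noneOf(Role.class);
        for (PersonRole pr : bridge) {
            if (pr.personId() == personId) {
                roles.add(pr.role());
            }
        }
        return roles;
    }
}
```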
If the attributes (i.e. First Name, Last Name, Gender etc.) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same, then you could make a single table for all of them, with a PersonType or Role attribute added to distinguish each person's role. However, if the entities have a lot of different attributes, it would be better to create separate tables; otherwise you will have normalization problems.
Yes, that is a very bad way of structuring a DB. The DB structure should be designed based on normalization.
Please check the normal forms.
You should avoid duplicate data as much as possible; otherwise the queries will become slower.
The main problem arises when you are trying to get data that is associated with more than one or two tables.

In a many-to-many association, what is the benefit of creating a list of associated items?

When we have a many-to-many association between two tables and create entity Java beans for these tables, what is the benefit of creating a collection in each entity that references the items associated with it?
For example, we have two tables A and B that are associated with each other in a many-to-many way, with AB as the link table in the database.
public class A {
    // ...
    List<B> ListBs; // What is the benefit of creating this list? The same question applies to class B.
    // ...
}
The benefit is clear: with JPA you can have the List loaded together with the entity itself instead of having to execute a separate query.
This can be a bit dangerous, however, if you don't manage lazy initialization properly. If the association is fetched eagerly, the whole List is loaded from the DB every time you load an entity. Making it lazy, on the other hand, can cause problems (for example Hibernate's LazyInitializationException) if you try to access an item of the List after the Session has been closed.
In conclusion, you have to manage this kind of association carefully. Think about what your application actually needs to load and fit your model to that. If you use lazy loading, take care to reload the object if the previous Session has expired.
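The practical benefit of a collection on each side is navigation in either direction without writing a query; the catch is that your own code must keep both sides in sync. The plain-Java sketch below leaves out JPA annotations (and therefore lazy/eager loading) so that it runs standalone, and the addB/removeB helper methods are a common convention for bidirectional associations, not part of the JPA API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class A {
    private final List<B> listBs = new ArrayList<>();

    // Read-only view so callers go through the helpers below.
    List<B> getListBs() { return Collections.unmodifiableList(listBs); }

    // Updates both sides, mirroring a single row in the AB link table.
    void addB(B b) {
        if (!listBs.contains(b)) {
            listBs.add(b);
            b.listAs.add(this);
        }
    }

    // Removes the link from both sides.
    void removeB(B b) {
        if (listBs.remove(b)) {
            b.listAs.remove(this);
        }
    }
}

class B {
    // Package-private so A's helpers can maintain the inverse side.
    final List<A> listAs = new ArrayList<>();

    List<A> getListAs() { return Collections.unmodifiableList(listAs); }
}
```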

Is this an alright way to design a DB schema for a task scheduling application?

I'm making a to-do list thingy in my spare time for learning etc. I'm using SQL Server Compact 3.5 along with Entity Framework for data management. It is a desktop application, meant to be used by a single person.
I have close to no knowledge of database stuff and am focusing my energies more on the UI side of things.
I was going along merrily implementing CRUD for tasks when I thought it would be nice to have some scheduling for them: start a task in the future, repeat it daily/weekly/monthly/yearly/custom, etc.
I went on to design my DB to accommodate this with my limited knowledge and, poof, I ended up with about 14 new tables. I then searched online and found posts pointing to sysschedules on MSDN: all accomplished in one table. I lowered my head in shame and made a puny attempt to improve my design. I got it down to 10 tables while including some things I liked from the sysschedules table.
This is my (simplified) schema now:
A Task can have a SchedulingInfo associated with it.
I forced OO into this, so SchedulingInfo is an abstract type which has various 'subclasses'.
TimeOfDayToStart_Ticks represents the time of day to start, stored as ticks since I don't want to store it as a datetime.
The subclasses:
CustomSchedule: Used to allow a task to run some day, or a set of days, in the future.
IntervalSchedule: e.g. run every day, every 3 days, every 4 hours, etc.
Monthly/Yearly-Schedule: Set of days to run every month/year
MonthlyRelativeSchedule: I stole this from the sysschedules thing. It holds a set of days matching rules such as every second (Frequency) Saturday (DayType), or the last weekday of the month, etc. (see the previously mentioned link for the full explanation).
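The relative rules held by MonthlyRelativeSchedule map fairly directly onto java.time. The sketch below is only an illustration (it is not how sysschedules computes run dates, and the class and method names are made up): it resolves "the Nth given weekday" with TemporalAdjusters.dayOfWeekInMonth and "the last weekday of the month" by stepping back from the month's end.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.TemporalAdjusters;

final class MonthlyRelative {
    // e.g. frequency = 2, dayType = SATURDAY -> the second Saturday of the month.
    static LocalDate nthDayOfWeek(YearMonth month, int frequency, DayOfWeek dayType) {
        return month.atDay(1).with(TemporalAdjusters.dayOfWeekInMonth(frequency, dayType));
    }

    // Last Monday-to-Friday day of the month.
    static LocalDate lastWeekday(YearMonth month) {
        LocalDate d = month.atEndOfMonth();
        while (d.getDayOfWeek() == DayOfWeek.SATURDAY || d.getDayOfWeek() == DayOfWeek.SUNDAY) {
            d = d.minusDays(1);
        }
        return d;
    }
}
```

An ordinal of 5 may roll into the following month when that weekday occurs only four times, so a real schedule would have to decide how to treat that case.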
My code will retrieve a list of SchedulingInfo sorted by NextRun, dequeue a SchedulingInfo, instantiate a new Task with the relevant details, recalculate NextRun based on the SchedulingInfo subclass, and save the SchedulingInfo back to the DB.
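The retrieve/dequeue/recalculate loop described above can be sketched with a priority queue ordered by NextRun. Only an interval rule is shown, and the Schedule interface, the IntervalSchedule class and the in-memory queue are assumptions standing in for the EF entities and the database round-trip.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Each schedule subtype knows how to compute its following run.
interface Schedule {
    String taskTitle();
    LocalDateTime nextRun();
    void advance(); // recalculate nextRun after a run
}

final class IntervalSchedule implements Schedule {
    private final String taskTitle;
    private final Duration interval; // must be positive
    private LocalDateTime nextRun;

    IntervalSchedule(String taskTitle, LocalDateTime firstRun, Duration interval) {
        this.taskTitle = taskTitle;
        this.nextRun = firstRun;
        this.interval = interval;
    }

    public String taskTitle() { return taskTitle; }
    public LocalDateTime nextRun() { return nextRun; }
    public void advance() { nextRun = nextRun.plus(interval); }
}

final class Scheduler {
    private final PriorityQueue<Schedule> queue =
            new PriorityQueue<>(Comparator.comparing(Schedule::nextRun));

    void add(Schedule s) { queue.add(s); }

    // Returns the titles of tasks created for every schedule due at or before 'now'.
    List<String> runDue(LocalDateTime now) {
        List<String> created = new ArrayList<>();
        while (!queue.isEmpty() && !queue.peek().nextRun().isAfter(now)) {
            Schedule s = queue.poll();   // dequeue the earliest schedule
            created.add(s.taskTitle());  // stands in for "instantiate a new Task"
            s.advance();                 // recalculate NextRun
            queue.add(s);                // re-insert (in the app: save back to the DB)
        }
        return created;
    }
}
```

A schedule that is overdue by several intervals produces one task per missed interval here; whether to catch up like this or skip ahead to the next future run is a design decision the sketch leaves open.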
I feel weird about the number of tables. Will this affect performance if there are like thousands of entries? Or is this just like yucky design, full of bad practices or some such? Should I just use the single-table approach?
Yes, I think your flood of tables will have a negative impact on performance. If YearlySchedule and the others are entities derived from the base entity SchedulingInformation, and you have separate tables for the base and derived properties, you are forced to use Table-Per-Type inheritance mapping, which is known to be slow. (At least up to the current version of EF, 4.1. It has been announced that the SQL generated for queries with TPT mapping will be improved in the next release of EF.)
In my opinion your model is a typical case for Table-Per-Hierarchy mapping because I see four derived entity tables which only have a primary key column. So, these entities add nothing to the base class (except their navigation properties) and would only force unnecessary joins in queries.
I would throw these four tables away, and also the fifth (IntervalSchedule), and add its single property Interval_Ticks to the SchedulingInformation table.
The four ...Specifiers tables could then all refer to the SchedulingInformation table with their foreign keys.
So, this would result in:
Five tables: SchedulingInformation and 4 x *Specifiers
One abstract base entity: SchedulingInformation
Five derived entities: *Schedule
Four entities: *Specifier
Each of the *Schedule entities (except IntervalSchedule) has a collection of the corresponding *Specifier entity (one-to-many relationship). And you map the five *Schedule entities to the same SchedulingInformation table via Table-Per-Hierarchy inheritance mapping.
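What Table-Per-Hierarchy amounts to can be sketched in plain Java: every schedule type is stored as one SchedulingInformation row, and a discriminator value selects which subclass to materialize. The column names (discriminator, intervalTicks), the string discriminator values and the fromRow factory are assumptions for illustration; EF builds this mapping for you.

```java
// One row of the single SchedulingInformation table (TPH); columns that a
// subtype does not use, such as intervalTicks, are simply left null.
record SchedulingInformationRow(int id, String discriminator, Long intervalTicks) {}

abstract class SchedulingInformation {
    final int id;

    SchedulingInformation(int id) { this.id = id; }

    // Materializes the right subclass from the discriminator column.
    static SchedulingInformation fromRow(SchedulingInformationRow row) {
        return switch (row.discriminator()) {
            case "Interval" -> new IntervalSchedule(row.id(), row.intervalTicks());
            case "Custom" -> new CustomSchedule(row.id());
            case "Monthly" -> new MonthlySchedule(row.id());
            case "Yearly" -> new YearlySchedule(row.id());
            case "MonthlyRelative" -> new MonthlyRelativeSchedule(row.id());
            default -> throw new IllegalArgumentException("Unknown type: " + row.discriminator());
        };
    }
}

final class IntervalSchedule extends SchedulingInformation {
    final long intervalTicks;

    IntervalSchedule(int id, Long intervalTicks) {
        super(id);
        this.intervalTicks = intervalTicks; // required here; unboxing throws if null
    }
}

// These add no columns; their *Specifier rows would reference
// SchedulingInformation.id through a foreign key.
final class CustomSchedule extends SchedulingInformation { CustomSchedule(int id) { super(id); } }
final class MonthlySchedule extends SchedulingInformation { MonthlySchedule(int id) { super(id); } }
final class YearlySchedule extends SchedulingInformation { YearlySchedule(int id) { super(id); } }
final class MonthlyRelativeSchedule extends SchedulingInformation { MonthlyRelativeSchedule(int id) { super(id); } }
```

No join is needed to load any schedule: one row is enough to decide the type and fill its fields.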
That would be my primary plan to try and test.

Database design rules to follow for a programmer

We are working on a mapping application that uses Google Maps API to display points on a map. All points are currently fetched from a MySQL database (holding some 5M + records). Currently all entities are stored in separate tables with attributes representing individual properties.
This presents the following problems:
1. Every time there's a new property, we have to make changes in the database, the application code and the front-end. That is fine in itself, but some properties have to be added to all entities, and then going through 50+ different tables to add them becomes a nightmare.
2. There's no way to find all entities that share a given property, e.g. no way to find all schools, colleges or universities that have a geography dept (without querying schools, universities and colleges separately).
3. Removing a property is equally painful.
4. There are no standards for defining properties in individual tables. The same property can exist under a different name or data type in another table.
5. There's no way to link or group points based on their properties (somewhat related to point 2).
We are thinking of redesigning the whole database, but without a DBA's help and with little professional DB design experience, we are really struggling.
Another problem we're facing with the new design is that there are a lot of shared attributes/properties between entities.
For example:
An entity called "university" has 100+ attributes. Other entities (e.g. hospitals, banks, etc.) share quite a few attributes with universities, for example ATMs, parking, cafeteria, etc.
We don't really want to have properties in a separate table [and then link them back to entities with foreign keys], as that would require adding/removing them manually. Also, generalizing properties will result in groups containing 50+ attributes, and not all records (i.e. entities) require those properties.
So, keeping that in mind, here's what we are thinking about for the new design:
Have separate tables for each entity containing some basic info, e.g. id, name, etc.
Have 2 tables, attribute type and attribute, to store property information.
Link each entity (or table, if you like) to attribute using a many-to-many relation.
Store addresses in a separate table called addresses, and link entities to it via foreign keys.
We think this will allow us to be more flexible when adding, removing or querying on attributes.
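A rough in-memory Java sketch of the proposed attribute design (an attribute catalogue plus an entity-to-attribute link table), showing the query the current layout cannot answer: every entity, of any type, that has a given attribute. The record names and the entityType column are assumptions; in SQL this would be a single join from the link table to the attribute table, filtered on the attribute name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Row of the attribute table; attributeTypeId points at the attribute type table.
record Attribute(int attributeId, int attributeTypeId, String name) {}

// Row of the many-to-many link table between entities and attributes.
record EntityAttribute(String entityType, int entityId, int attributeId) {}

final class AttributeQueries {
    // Returns "type:id" keys of every entity linked to the named attribute.
    static Set<String> entitiesWith(String attributeName,
                                    List<Attribute> attributes,
                                    List<EntityAttribute> links) {
        Set<Integer> wanted = new HashSet<>();
        for (Attribute a : attributes) {
            if (a.name().equals(attributeName)) wanted.add(a.attributeId());
        }
        Set<String> result = new TreeSet<>();
        for (EntityAttribute link : links) {
            if (wanted.contains(link.attributeId())) {
                result.add(link.entityType() + ":" + link.entityId());
            }
        }
        return result;
    }
}
```

The reverse direction (all attributes of one university in a single row) is where the joins multiply, because each attribute has to be pivoted into its own column.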
This design, however, will result in more joins when fetching data; e.g. to display all "attributes" for a given university in a single row, we might need a query with 20+ joins.
We desperately need some opinions on this design approach, including any possible flaws in it.
Thanks for your time.
In trying to generalize your question without more specific examples, it's hard to truly critique your approach. If you'd like some more in depth analysis, try whipping up an ER diagram.
If your data model is changing so much that you're constantly adding/removing properties, and many of these properties overlap, you might be better off using EAV (Entity-Attribute-Value).
Otherwise, if you want to maintain a relational approach but are finding a lot of overlap with properties, you can analyze the entities and look for abstractions that link to them.
Example: my DB has Puppies, Kittens and Walruses, all with hasFur and furColor attributes. Remove those attributes from the three tables and create a FurryAnimal table that links to each of them.
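The FurryAnimal refactoring can be sketched in Java as a shared record that the three animal types reference, so the fur attributes are defined once and can be queried across all types in one place. The class names come from the example; the fields and the lookup method are illustrative assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;

// The extracted FurryAnimal table: fur attributes live here only.
record FurryAnimal(int furryAnimalId, boolean hasFur, String furColor) {}

// Each original table keeps its own columns and links to FurryAnimal.
record Puppy(int id, String breed, FurryAnimal fur) {}
record Kitten(int id, String name, FurryAnimal fur) {}
record Walrus(int id, double tuskLengthCm, FurryAnimal fur) {}

final class FurQueries {
    // One query over the shared table instead of one query per animal table.
    static List<Integer> idsWithColor(List<FurryAnimal> furry, String color) {
        return furry.stream()
                .filter(f -> f.hasFur() && color.equals(f.furColor()))
                .map(FurryAnimal::furryAnimalId)
                .collect(Collectors.toList());
    }
}
```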
Of course, the simplest answer is not to touch the data model. Instead, create views on the underlying tables that you can use to address (5), (4) and (2).
Problem 1 shouldn't be an issue: there should be one place where your objects are defined, with everything else generated or derived from it. Just refactor your code until that is the case.
Problem 2 is solved by having a metamodel that describes which properties are where. This is probably needed for 1 too.
You might want to totally avoid the problem by programming this in Smalltalk with Seaside on a Gemstone object oriented database. Then you can just have objects with collections and don't need so many joins.

Resources