Redundancy in the database vs. efficiency in the code

I would like to explain my question with an example. Let's say there are many courses that students can register for. Each course can have many discussion walls. Each discussion wall contains questions. Each question may have replies. And badges can be assigned to replies (or comments).
In my case, I need to know which reply belongs to which course (when listing replies), and the same for the badges. I am able to do this with Entity Framework, but the queries are becoming very complex and causing performance problems.
In this scenario, is it better to have a CourseId column in the Replies (or BadgeAssignments) table, or not? It would make my life a lot easier in some respects, but I'm not sure about the long term. What do you think? Is it sometimes better to have some redundancy? I do not think I will need to update the CourseId field later.

My pet peeve is sacrificing data integrity for performance. Obtaining a less-than-reliable answer faster is not a good solution. However, changes that improve performance that do not sacrifice data integrity are fine.
Redundancy may well sacrifice data integrity. It is certainly a critical point where anomalous data can start. The problem is that both "sets" of data must be rigidly synchronized which, depending on the design, may be easy or difficult to do. Either way, it takes system resources to maintain the synchronization so you are adding another hit on performance.
Fortunately, that performance hit will be added to the DML operations as that is where the synchronization will be performed. In general, shifting performance time from queries to DML (which are usually less sensitive to response time) can be a good solution.
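To make that trade-off concrete, here is a minimal sketch in plain C# (in-memory collections, no EF or database) of a redundant CourseId on replies that is derived on the write path. The Wall, Question and Reply records and the ReplyStore class are hypothetical stand-ins for the tables in the question. The extra lookup happens at insert time, so the read filters on CourseId with no joins.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Walls, Questions and Replies tables.
public record Wall(int WallId, int CourseId);
public record Question(int QuestionId, int WallId);
public record Reply(int ReplyId, int QuestionId, int CourseId, string Text);

public class ReplyStore
{
    private readonly Dictionary<int, Wall> _walls = new();
    private readonly Dictionary<int, Question> _questions = new();
    private readonly List<Reply> _replies = new();

    public void AddWall(Wall w) => _walls.Add(w.WallId, w);
    public void AddQuestion(Question q) => _questions.Add(q.QuestionId, q);

    // Write path: the redundant CourseId is derived here, never supplied by the caller,
    // so at insert time it cannot disagree with Question -> Wall -> Course.
    public Reply AddReply(int replyId, int questionId, string text)
    {
        Question q = _questions[questionId];      // throws KeyNotFoundException if missing
        int courseId = _walls[q.WallId].CourseId; // this extra lookup is the DML "hit"
        var reply = new Reply(replyId, questionId, courseId, text);
        _replies.Add(reply);
        return reply;
    }

    // Read path: filter on the denormalized column, no joins needed.
    public IReadOnlyList<Reply> RepliesForCourse(int courseId) =>
        _replies.Where(r => r.CourseId == courseId).ToList();
}
```

Note that the sketch only keeps the copy correct at insert time; if a question could ever move to another wall, every dependent reply would need updating too, which is exactly the synchronization risk described above.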
The devil is in the details, however, and you provide no details. Can the performance be improved sufficiently without redundancy? How difficult is it to keep the redundant data in sync? Another way of asking that last question: how likely is it for anomalous (unsynced) data to creep into the system? How much of a problem will unsynced data be, and how difficult will it be to fix?
There is not nearly enough information provided to answer these questions. But keep them in mind as you investigate solutions.

Each component of your system should be used the way it was designed to be used; that is what makes it the "best". Things work better when they work according to their design. Strictly speaking, this is my answer to your question.
The Relational Database
The purpose of a relational database is, first, to govern the integrity of your information and, second, to provide a storage and retrieval system. The RDBMS governs your truth, which in turn determines how that truth should be stored and retrieved.
Since it is difficult, but not impossible, to define what makes a digital discussion wall, question or reply unique, we typically use surrogate keys (i.e. auto-generated numbers) as those entities' primary keys. This means that adding the Course ID to Questions, Replies, or BadgeAssignments violates the principles of relational design. You may say "no biggie" in this instance, but it is a violation nonetheless and will have consequences as long as it persists (pun intended).
If we used natural keys for Courses, Walls, Questions, Replies, and BadgeAssignments, then the primary key of each of those tables would be a composite that includes the key of its parent. We would then, for example, have the primary key of Courses within the composite primary key of Replies without violating any principle of redundancy or normalization, and your life would be "easier".
That said, what is so hard about this query?
SELECT
D.CourseId, D.CourseName
,A.ReplyId, A.ReplyName
FROM
Replies A
JOIN Questions B On A.QuestionId = B.QuestionId
JOIN Walls C ON B.WallId = C.WallId
JOIN Courses D ON C.CourseId = D.CourseId
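For comparison, the same join chain can be written in LINQ query syntax. The sketch below runs over in-memory arrays, assuming hypothetical records whose column names mirror the SQL above, so the shape of the join can be checked without a database; it is not the EF query itself.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory rows mirroring the tables in the SQL query above.
public record Course(int CourseId, string CourseName);
public record Wall(int WallId, int CourseId);
public record Question(int QuestionId, int WallId);
public record Reply(int ReplyId, int QuestionId, string ReplyName);
public record ReplyRow(int CourseId, string CourseName, int ReplyId, string ReplyName);

public static class ReplyQueries
{
    // Same join chain as the SQL: Replies -> Questions -> Walls -> Courses (inner joins).
    public static List<ReplyRow> RepliesWithCourse(
        IEnumerable<Course> courses, IEnumerable<Wall> walls,
        IEnumerable<Question> questions, IEnumerable<Reply> replies) =>
        (from a in replies
         join b in questions on a.QuestionId equals b.QuestionId
         join c in walls on b.WallId equals c.WallId
         join d in courses on c.CourseId equals d.CourseId
         select new ReplyRow(d.CourseId, d.CourseName, a.ReplyId, a.ReplyName))
        .ToList();
}
```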
Entity Framework
Entity Framework (EF) can be configured to match your design whether we put CourseId in Replies or rely on the joins. But we can usually do better than EF when it comes to SQL performance.
One option would be to craft a SQL query (starting with the one above) that is optimized as far as your needs require, and turn it into a view. Then map a C# class to the view (instead of the tables), and the interactions are simplified. We would be letting EF excel at providing low-hassle data access and letting SQL excel at retrieving data.
Here is the difference in C# LINQ:
var replies = context.Replies
.Where(x => x.Questions.Walls.CourseId == 1)
.Select(x => new ReplyView
{
CourseId = x.Questions.Walls.Courses.CourseId,
CourseName = x.Questions.Walls.Courses.CourseName,
ReplyId = x.ReplyId,
ReplyName = x.ReplyName
}).ToList();
versus
var replies = context.RepliesView.Where(x => x.CourseId == 1).ToList();

Since you have tagged your question with entity-framework, I'll assume you are using SQL Server, in which case you may consider using indexed views to "cache" the JOINs without worrying that this cache will ever go out of sync: the DBMS will maintain it for you at all times.
For example, you can cache the JOIN between courses, students, discussion walls, questions, replies and badges. So when you want to know which badge belongs to which course, you just retrieve a single row from the indexed view, instead of performing the physical JOIN.
Alternatively, consider redesigning your keys and using identifying relationships to migrate key fields down the foreign key hierarchy, so when querying a child table you can get the key of a non-direct parent without JOINing the tables "in between".
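A minimal sketch of what migrating keys down an identifying relationship looks like, using hypothetical C# records as composite keys (WallNo, QuestionNo and ReplyNo are illustrative names, not from the question). Each child key embeds its parent's full key, so CourseId is part of a reply's identity rather than a redundant copy, and replies can be filtered by course without touching Questions or Walls.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical composite keys: each level migrates its parent's key and adds its own part.
public record WallKey(int CourseId, int WallNo);
public record QuestionKey(int CourseId, int WallNo, int QuestionNo);
public record ReplyKey(int CourseId, int WallNo, int QuestionNo, int ReplyNo);

public record Reply(ReplyKey Key, string Text)
{
    // The parent question's key is recoverable from the reply's own key.
    public QuestionKey Parent => new(Key.CourseId, Key.WallNo, Key.QuestionNo);
}

public static class CompositeKeyQueries
{
    // No joins: the course is already a component of every reply's primary key.
    public static List<Reply> ForCourse(IEnumerable<Reply> replies, int courseId) =>
        replies.Where(r => r.Key.CourseId == courseId).ToList();
}
```

In table terms this corresponds to a composite primary key (CourseId, WallNo, QuestionNo, ReplyNo) on Replies, with a composite foreign key (CourseId, WallNo, QuestionNo) referencing Questions. C# records compare by value, which is what makes them convenient as key types here.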
And last but not least, I warmly recommend reading Use the Index, Luke! for fundamental knowledge every developer should have about database performance...

I'll post an example here:
public class SchoolEntities : DbContext
{
public DbSet<Department> Departments { get; set; }
}
public class Department
{
// Primary key
public int DepartmentID { get; set; }
public string Name { get; set; }
// Navigation property
public virtual ICollection<Course> Courses { get; set; }
}
public class Course
{
// Primary key
public int CourseID { get; set; }
public string Title { get; set; }
public int Credits { get; set; }
// Foreign key
public int DepartmentID { get; set; }
// Navigation properties
public virtual Department Department { get; set; }
}
public partial class OnlineCourse : Course
{
public string URL { get; set; }
}
public partial class OnsiteCourse : Course
{
public string Location { get; set; }
public string Days { get; set; }
public System.DateTime Time { get; set; }
}
And that's a small example... do you have any of this information?

Related

Which tables should have a timestamp column? Concurrency check with Entity Framework, XAF, DDD

I am using DevExpress XAF WinForms to write an ERP system.
In practice I have found that my DbContext needs to have a DbSet for most of my business objects.
I am trying to figure out which tables should have a timestamp column for optimistic concurrency purposes.
For example, I have
[NavigationItem("Sales")]
public class SalesOrder : BaseSalesHeader
{
public SalesOrder()
{
Lines = new List<SalesOrderLine>();
}
[Aggregated]
public virtual List<SalesOrderLine> Lines { get; set; }
}
[NavigationItem("Production")]
public class SalesOrderLine : BaseSalesProductTransactionLine
{
[Browsable(false)]
[System.ComponentModel.DataAnnotations.Required]
[RuleRequiredField(DefaultContexts.Save)]
[ForeignKey("SalesOrder_Id")]
public virtual SalesOrder SalesOrder { get; set; }
}
In my DbContext I have
public DbSet<SalesOrder> SalesOrders { get; set; }
public DbSet<SalesOrderLine> SalesOrderLines { get; set; }
In my OnModelCreating I have
modelBuilder.Entity<SalesOrder>().HasMany(p => p.Lines).WithRequired(t => t.SalesOrder).WillCascadeOnDelete(true);
Sales Order Lines are accessible from two menus:
as part of a Sales Order, and as a Sales Order Line Item under the Production navigation item.
I think I should have the timestamp field in the SalesOrders table. Should I also have it in the SalesOrderLine table?
Here is the linked question at DevExpress Support.
Whether or not you want to apply optimistic concurrency (OC) for an entity is something we can't decide for you. But there are some things to consider:
It's not necessarily true that only entities that are exposed as a DbSet need OC. After all, any mapped entity can be changed when it's reachable through navigation properties. SalesOrder exposes SalesOrderLines through its Lines property, so you can certainly create a UI that modifies only SalesOrderLines while receiving a SalesOrder (including its lines) as input.
In Entity Framework (and other ORMs), a parent isn't marked as modified when one of its children is modified. If you save a SalesOrder with modified SalesOrderLines, there will only be update statements for the lines.
So, yes, you probably want to protect SalesOrderLine by OC as well. But also consider this:
OC isn't free. When you add a RowVersion* column to a table (and map it as a rowversion appropriately), Entity Framework will read its value after each insert or update. I've found that this can considerably harm performance in processes that update relatively many records (EF doesn't shine there anyway). Also, when a concurrency conflict occurs, EF will read the current values of the conflicting record(s) from the database.
I've seen applications where the performance impact from OC is mitigated by marking a parent object (having OC) as modified when any of its children is modified. I think that's rather contrived, but it may be something to consider.
* In SQL Server, timestamp is a deprecated synonym for the rowversion data type.
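The mechanism described in this answer can be illustrated with a small in-memory simulation (hypothetical VersionedTable and ConcurrencyConflictException types; this is not how EF is implemented). Each row carries a version that changes on every write, and an update succeeds only if the caller still holds the version it originally read; EF6 raises DbUpdateConcurrencyException in the equivalent situation.

```csharp
using System;
using System.Collections.Generic;

public class ConcurrencyConflictException : Exception
{
    public ConcurrencyConflictException(string message) : base(message) { }
}

// Hypothetical in-memory table where each row has a version, like a SQL Server rowversion column.
public class VersionedTable<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, (TValue Value, long Version)> _rows = new();
    private long _nextVersion = 1;

    // Returns the new version, analogous to EF reading the rowversion back after an insert.
    public long Insert(TKey key, TValue value)
    {
        long v = _nextVersion++;
        _rows.Add(key, (value, v));
        return v;
    }

    public (TValue Value, long Version) Read(TKey key) => _rows[key];

    // Analogous to "UPDATE ... WHERE Id = @id AND RowVersion = @original":
    // a version mismatch means someone else changed the row first.
    public long Update(TKey key, TValue newValue, long originalVersion)
    {
        var current = _rows[key];
        if (current.Version != originalVersion)
            throw new ConcurrencyConflictException($"Row {key} was modified by another user.");
        long v = _nextVersion++;
        _rows[key] = (newValue, v);
        return v;
    }
}
```

Because each table tracks its own versions, updating a SalesOrderLine in this model checks only the line's version, never its parent SalesOrder's, which mirrors the point that a parent isn't marked as modified when a child changes.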
Any table that could be updated by more than one user at the same time should really have some sort of timestamp associated with it. Personally, I put a timestamp on every table just to be doubly sure.
You can mark this timestamp field (a byte[] property) with the [Timestamp] attribute, and EF will know what to do with it automatically.
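For reference, a sketch of such a property; the RowVersion name is an assumption (any name works), and EF6 expects the type to be byte[]. TimestampAttribute lives in System.ComponentModel.DataAnnotations, and EF maps a property carrying it to a SQL Server rowversion column used as a concurrency token. The hypothetical HasTimestamp helper only uses reflection to confirm the attribute is present, so the snippet compiles without EF.

```csharp
using System.ComponentModel.DataAnnotations;
using System.Reflection;

public class SalesOrderLine
{
    public int Id { get; set; }
    public decimal Quantity { get; set; }

    // Mapped by EF as a rowversion column; the database generates a new value on every update.
    [Timestamp]
    public byte[] RowVersion { get; set; }
}

public static class ConcurrencyTokenCheck
{
    // Hypothetical helper: true if the named property carries [Timestamp].
    public static bool HasTimestamp<T>(string propertyName) =>
        typeof(T).GetProperty(propertyName)?.GetCustomAttribute<TimestampAttribute>() != null;
}
```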

How can I have multiple instances of a column that itself holds a foreign key reference?

I am new to Visual Studio, so to start learning it I downloaded a sample available at https://code.msdn.microsoft.com/ADPNET-Entity-Framework-2d1160cb and started working with it. Since I have fairly good knowledge of VB6 and SQL, it did not take me long to understand the pattern the sample is based on. Had Microsoft given a detailed explanation or a walkthrough of the sample, it would have been much easier to understand the basics. However, I somehow managed to work through it and have built a small desktop application in WPF using Entity Framework and MVVM. But I have now reached a point where I am completely stuck and can find no way out. The problem is as follows:
I have two tables, Advocate and Party. Table Advocate contains the names of advocates and has a primary key. Similarly, Party contains names and their respective primary keys.
Then I have two more tables, Case and CaseDetail. Table Case simply holds three columns: CaseId, CaseNo and Year. Table CaseDetail has CaseDetailId as its primary key and CaseId as a foreign key. What I need is for a particular case to have multiple advocates and multiple petitioners, so table CaseDetail has two columns to hold AdvocateId and PartyId as foreign keys.
If you look at the sample referred to above, you will not find how to deal with such a case. When I follow the pattern of the sample, I get a host of design-time and run-time errors.
Anyway, after a number of attempts I have somehow managed to set up EF correctly, but I doubt it will serve any purpose, since I need to have multiple instances of petitioners and advocates.
Here is the link to my EDMX:
https://www.dropbox.com/s/rkarzod1lezdnqs/EDMX.png?dl=0
From the image it can be seen that I have four different foreign keys, fldPetitioner, fldRespondent, fldAdvocate and fldSrAdvocate, each of which also has a navigation property to navigate back to the referenced row, with multiplicity 0..1 (zero or one). In such a scenario, would I be able to have multiple instances in these columns?
So please suggest what strategy should be adopted in the scenario described above when developing a WPF application using Entity Framework and MVVM.
It sounds like you're trying to use one-to-one relationships where there should be one-to-many relationships. Take a closer look at the relationship between Department and Employee in your linked MS sample. It results in '1 Department to many Employees'. This puts the DepartmentId on each entry in the Employee table, not the other way round, which is what I think you have at the moment. The analogous element from your question would be '1 CaseDetail to many Advocates'.
public class CaseDetail
{
//CaseDetail ID number
public int CaseDetailId { get; set; }
//...
//Any other properties go here
//...
//Navigation properties
public ICollection<Advocate> Advocates { get; set; }
/* Other collections would be executed similarly:
*
* public ICollection<Party> Petitioners { get; set; }
* public ICollection<Party> Respondents { get; set; }
*/
}
public class Advocate
{
//Advocate ID number
public int AdvocateId { get; set; }
//...
//Any other properties go here
//...
//Foreign key and navigation property
public int CaseDetailId { get; set; }
public CaseDetail CaseDetail { get; set; }
}
public class AdvocateConfiguration : EntityTypeConfiguration<Advocate>
{
public AdvocateConfiguration()
{
HasRequired(a => a.CaseDetail)
.WithMany(cd => cd.Advocates);
}
}
In the MS sample, you must have noticed that when entering data we assign a Department to an Employee, even though the department-to-employee relationship is one-to-many. In my case, I want to assign advocates to a CaseDetail, even though the CaseDetail-to-Advocate relationship is one-to-many. Imagine a simple billing application where table Invoice is a master table with its details in table InvoiceDetails, where we can have multiple products as foreign keys. Considering this scenario, please tell me whether or not there can be a derived entity with a navigation property to navigate back to the products.
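The invoice analogy describes a junction (detail) entity: each row links one parent to one product. A minimal sketch of the same shape for cases and advocates, using plain in-memory classes; CaseAdvocate and its IsSenior flag are hypothetical (the flag is one possible replacement for a separate fldSrAdvocate column). In EF6 such a link class would be mapped as its own entity with two required one-to-many relationships, which lets a case have many advocates and an advocate many cases.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Advocate
{
    public int AdvocateId { get; set; }
    public string Name { get; set; } = "";
    public List<CaseAdvocate> CaseLinks { get; } = new();
}

public class Case
{
    public int CaseId { get; set; }
    public string CaseNo { get; set; } = "";
    public List<CaseAdvocate> AdvocateLinks { get; } = new();

    // Navigate through the link rows to reach the advocates, like invoice details to products.
    public IEnumerable<Advocate> Advocates => AdvocateLinks.Select(l => l.Advocate);
}

// Hypothetical junction entity: one row per (case, advocate) pair.
public class CaseAdvocate
{
    public Case Case { get; }
    public Advocate Advocate { get; }
    public bool IsSenior { get; } // hypothetical per-link data

    private CaseAdvocate(Case c, Advocate a, bool isSenior)
    {
        Case = c;
        Advocate = a;
        IsSenior = isSenior;
    }

    // Creates the link and adds it to both navigation collections.
    public static CaseAdvocate Link(Case c, Advocate a, bool isSenior = false)
    {
        var link = new CaseAdvocate(c, a, isSenior);
        c.AdvocateLinks.Add(link);
        a.CaseLinks.Add(link);
        return link;
    }
}
```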

Entity Framework foreign key in another database

I like MVC and EF6, but I keep coming across fundamental problems with the way it (or I) work.
I have an app (a very simple one) in which one of my tables references a field in another database. How would EF handle this? It seems to get very complicated, whereas in the past it would have been a simple ADO.NET call to a stored procedure or something similar (I am aware I can use SPs with EF, but really, what's the point? I may as well just use ADO.NET again). Example model:
[Table("Target")]
public partial class Target
{
public int ID { get; set; }
public int SomeForeignKeyInMyDbID { get; set; }
public Guid? FOREGINKEYINANOTHERDB { get; set; }
}
When I scaffold views based on this model, it automatically creates the drop-down menus etc. really well, but it (obviously) cannot pick up the reference to the foreign key in the other database. I want to store the ID of the foreign key in the database but show its value in drop-downs etc.; I store the ID instead of the value for reporting reasons.
I thought that I would just be able to get a context to my other db, grab the values I need and bind them to the drop down list but the model structure is so tightly defined that I face hurdle after hurdle on this.
I read somewhere that my best option may be to use SPs for CRUD operations and then perform a LINQ to Entities query for the Index view and do a join on the FOREGINKEYINANOTHERDB field.
Any help much appreciated.
Thanks
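One way to do what the question describes, sketched with in-memory arrays standing in for the two contexts: query each database separately, then join on the Guid in memory. EF6 cannot combine queries from two different contexts into one SQL statement, so each side has to be materialized (e.g. with ToList()) before joining. TargetRow, LookupItem, TargetWithName and the ForeignKeyInAnotherDb property name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Row from the local database (the Target table in the question).
public record TargetRow(int ID, Guid? ForeignKeyInAnotherDb);
// Row from the other database that the Guid refers to (hypothetical shape).
public record LookupItem(Guid Id, string DisplayName);
public record TargetWithName(int ID, string DisplayName);

public static class CrossDbJoin
{
    // Left join in memory: targets whose Guid is null or unmatched get a placeholder name.
    public static List<TargetWithName> Combine(
        IEnumerable<TargetRow> targets, IEnumerable<LookupItem> lookups)
    {
        Dictionary<Guid, string> names = lookups.ToDictionary(l => l.Id, l => l.DisplayName);
        return targets
            .Select(t => new TargetWithName(
                t.ID,
                t.ForeignKeyInAnotherDb is Guid g && names.TryGetValue(g, out var n) ? n : "(none)"))
            .ToList();
    }
}
```

The same lookups list can feed the drop-down in the edit views, while only the Guid is stored in the local table.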

Cycles and/or Multiple Cascade Paths with Auto-Generated Linking Table

I have been using EF5 via Code First successfully so far to build out my database from my models. However, I recently ran into the (fairly) common issue of cycles/multiple cascade paths. I understand what the problem is, and normally I fix it by writing rules against my entities to disable CascadeOnDelete down one side of the branch. The difference between that usual scenario and my current one is that I typically create the middle "join" table of a many-to-many relationship myself.
So, for example, I may have: Users => UserLeagues <= Leagues
And then I do this:
modelBuilder.Entity<UserLeagues>()
.HasRequired(u => u.League)
.WithMany()
.HasForeignKey(l => l.LeagueId)
.WillCascadeOnDelete(false);
Here I have created the UserLeagues table myself (it requires some additional information, so this makes sense). In my most recent case, I just needed a plain many-to-many relationship, so I didn't bother to create the middle table. Instead, I let EF auto-generate it.
As a result, I am unsure of how to stop the cascade delete down the one side because I don't have access to the UserLeagues table directly like I do if I manually created that many-to-many table. Any advice? Here are my models...
public class User {
public int Id { get; set; }
public string Name { get; set; }
public virtual ICollection<League> Leagues { get; set; }
}
public class League {
public int Id { get; set; }
public int Score { get; set; }
public virtual ICollection<User> Users { get; set; }
}
When you let EF auto-generate a many-to-many relationship and its support table, you have no way of manually deleting the actual records in the join table once the relationship is removed, since that table isn't mapped to an entity.
Hence cascade delete needs to be on by default. That's the convention.
You could remove that convention altogether (for all many-to-many relationships and the foreign keys involved):
modelBuilder.Conventions.Remove<ManyToManyCascadeDeleteConvention>();
Another way to do it on a case-by-case basis would be to change the migration scripts (provided you're using migrations).
The code that migrations generate contains something like:
.ForeignKey("dbo.Leagues", t => t.League_Id, cascadeDelete: true)
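Just remove the cascadeDelete: true argument (together with its leading comma).
But then the database no longer removes those join rows for you, so you'll need to resort to manual SQL or an occasional cleanup to get rid of them. As long as the foreign key itself remains, a plain DELETE of a league that still has join rows will be rejected rather than silently leaving them behind.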
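A minimal in-memory sketch of what handling it yourself means once cascade delete is off for one side: the link rows for a league must be removed before the league itself, which is the step the cascade used to perform. All names are hypothetical, and the check in DeleteLeague only imitates a NO ACTION foreign key; it is not EF code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of Users, Leagues and the auto-generated UserLeagues link table,
// with the League foreign key on the link table behaving like NO ACTION (cascadeDelete removed).
public class LeagueDb
{
    public HashSet<int> Users { get; } = new();
    public HashSet<int> Leagues { get; } = new();
    public HashSet<(int UserId, int LeagueId)> UserLeagues { get; } = new();

    public void Join(int userId, int leagueId)
    {
        if (!Users.Contains(userId) || !Leagues.Contains(leagueId))
            throw new InvalidOperationException("Foreign key violation on insert.");
        UserLeagues.Add((userId, leagueId));
    }

    // Without cascade, a league that still has link rows cannot be deleted.
    public void DeleteLeague(int leagueId)
    {
        if (UserLeagues.Any(l => l.LeagueId == leagueId))
            throw new InvalidOperationException("Foreign key violation: link rows still reference this league.");
        Leagues.Remove(leagueId);
    }

    // The manual step the cascade used to perform: remove the link rows first.
    public void DeleteLeagueWithLinks(int leagueId)
    {
        UserLeagues.RemoveWhere(l => l.LeagueId == leagueId);
        DeleteLeague(leagueId);
    }
}
```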

Is it possible to update/change a primary key with NHibernate?

I have a database that uses natural keys (i.e. business oriented keys). Unfortunately, because of this, the primary keys for these objects may change over time.
I am currently researching the use of NHibernate for an O/RM for this database. However, in my testing I have noticed that there is no apparent way to change the primary key of an object and save it to the database.
For example, say I have a 'Business' object with a 'BusinessCode' as its primary key:
public class Business
{
public string BusinessCode { get; set; }
public string Name { get; set; }
...
}
If I do a Get, change the primary key, and try to save it back to the database using NHibernate, I receive either an exception or unexpected results (depending on whether I use Save(), Update(), or SaveOrUpdateCopy()):
Business b = session.Get<Business>("BusinessCode1");
b.BusinessCode = "BusinessCode22";
session.Update( b );
So is something like this possible?
I understand that many NHibernate folks recommend using primary keys that do not change (i.e. identities). But I have a couple of databases that use natural keys. Thanks.
I'm actually kicking myself that I even asked this question because I can easily mitigate this by doing a delete-insert type of operation. This was more of a proof-of-concept with an existing database. Thanks for your input!
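The delete-insert workaround can be sketched independently of NHibernate: copy the entity under the new key, then remove the old row, ideally inside a single transaction. The BusinessTable dictionary is a hypothetical stand-in for the Business table. With real foreign keys, child rows referencing the old code would also have to be re-pointed between the insert and the delete.

```csharp
using System;
using System.Collections.Generic;

public class Business
{
    public string BusinessCode { get; set; } = "";
    public string Name { get; set; } = "";
}

// Hypothetical in-memory table keyed by the natural key.
public class BusinessTable
{
    private readonly Dictionary<string, Business> _rows = new();

    public void Insert(Business b) => _rows.Add(b.BusinessCode, b);
    public Business Get(string code) => _rows[code];
    public bool Exists(string code) => _rows.ContainsKey(code);

    // "Changing" a primary key: insert a copy under the new key, then delete the old row.
    public Business ChangeKey(string oldCode, string newCode)
    {
        if (_rows.ContainsKey(newCode))
            throw new InvalidOperationException($"Key {newCode} already exists.");
        Business old = _rows[oldCode];
        var copy = new Business { BusinessCode = newCode, Name = old.Name };
        _rows.Add(newCode, copy); // insert first; referencing rows would be re-pointed at this point
        _rows.Remove(oldCode);    // then delete the old row
        return copy;
    }
}
```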
Have you tried using a composite-id?
Or Assigned Identifiers?
Is adding an Auto-incremented ID field totally out of the question?
