SQL Server + Entity CF won't use table index without hint

I have an Entity class which generates a fairly simple table: (obfuscated)
public class TableName
{
public int TableNameId { get; set; }
public int OtherTableId { get; set; }
public int UserId { get; set; }
public bool IsActive { get; set; }
// removed other columns for brevity
public virtual OtherTable OtherTable { get; set; }
public virtual User User { get; set; }
}
Entity automatically creates a primary key on TableNameId and foreign keys with indexes on OtherTableId and UserId.
The table will only ever have one record where IsActive is true for each combination of UserId and OtherTableId, but records are deactivated and replaced frequently.
Records are located with statements similar to this:
var record =
context.TableName
.FirstOrDefault(x => x.UserId == userId
&& x.OtherTableId == otherTableId
&& x.IsActive);
We found that with no other indexes, performance while pulling a record was poor since it has to scan through the UserId and OtherTableId indexes to find the records where IsActive = 1.
To address this, we added this index:
create nonclustered index ix_TableName_UserId_OtherTableId_IsActive
on TableName (UserId, OtherTableId, IsActive)
However, SQL does not use it. Instead, with this index in place it does a scan on the table's primary key!
Since Entity is pulling the entire row, I understand that the above index is not covering. However, in every other case we've had we're able to add an index similar to this one and it results in an index seek combined with a key lookup and gives satisfactory performance. In this case it will not use the index we're creating unless we use an index hint in the select statement. We'd like to avoid this since it requires custom SQL which kind of defeats the purpose of Code First Entity.
After much trial and error, the only index that SQL will use on its own is one that includes every column and is thus fully covering. However, I'd rather avoid this since it means essentially storing the table twice and 99% of the rows will be useless since IsActive = 0.
--- Edits below ---
For the code above, I found that Entity generates the following SQL. I don't know why it uses a subselect, but the SQL optimizer seems to filter it out (running the full SQL or the inner select results in the same plan).
SELECT TOP (1)
[Project1].[TableNameId] AS [TableNameId],
[Project1].[OtherTableId] AS [OtherTableId],
[Project1].[UserId] AS [UserId],
[Project1].[IsActive] AS [IsActive]
FROM ( SELECT
[Extent1].[TableNameId] AS [TableNameId],
[Extent1].[OtherTableId] AS [OtherTableId],
[Extent1].[UserId] AS [UserId],
[Extent1].[IsActive] AS [IsActive]
FROM [dbo].[TableName] AS [Extent1]
WHERE ([Extent1].[UserId] = <userId>)
AND ([Extent1].[OtherTableId] = <otherTableId>)
AND ([Extent1].[IsActive] = 1)
) AS [Project1]
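One option the question does not mention trying is a filtered index that only contains the active rows, which in T-SQL would be the same create index statement keyed on (UserId, OtherTableId) with a trailing "where IsActive = 1". Because the generated SQL above compares IsActive to the literal 1, the predicate matches such a filter. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module, with a SQLite partial index standing in for a SQL Server filtered index, and the index name and test data are made up. It does not show whether SQL Server's optimizer would pick the index without a hint; that depends on its statistics and cost estimates.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableName (
    TableNameId  INTEGER PRIMARY KEY,
    OtherTableId INTEGER NOT NULL,
    UserId       INTEGER NOT NULL,
    IsActive     INTEGER NOT NULL
);
-- Partial index: only rows with IsActive = 1 are stored in it.
-- The index name is hypothetical.
CREATE INDEX ix_TableName_Active
    ON TableName (UserId, OtherTableId)
    WHERE IsActive = 1;
""")

# 50 users x 10 other-table ids x 5 versions; only the last version is active.
rows = []
for user_id in range(1, 51):
    for other_id in range(1, 11):
        for version in range(5):
            rows.append((other_id, user_id, 1 if version == 4 else 0))
conn.executemany(
    "INSERT INTO TableName (OtherTableId, UserId, IsActive) VALUES (?, ?, ?)",
    rows,
)

LOOKUP_SQL = (
    "SELECT TableNameId, OtherTableId, UserId, IsActive FROM TableName "
    "WHERE UserId = ? AND OtherTableId = ? AND IsActive = 1 LIMIT 1"
)

# Same shape as the FirstOrDefault lookup in the question.
active_row = conn.execute(LOOKUP_SQL, (7, 3)).fetchone()

# The partial index holds one entry per active row, not one per table row.
active_count = conn.execute(
    "SELECT COUNT(*) FROM TableName WHERE IsActive = 1"
).fetchone()[0]
total_count = conn.execute("SELECT COUNT(*) FROM TableName").fetchone()[0]

# Inspect this to see which access path SQLite chose for the lookup.
plan = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP_SQL, (7, 3)).fetchall()
```

The point of the design is size: the index only has to hold the one active row per (UserId, OtherTableId) pair, so it avoids "storing the table twice" even if further columns were added to make it covering.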

Related

Inheritance Hierarchies in Entity Framework

I'm working on a database for a flying club that has a table for Flights and a table for ClubMembers. Flights, unfortunately, must be paid for by someone, so there is a BillTo that references the ClubMember who must pay.
So far it looks like this...
public class ClubMember
{
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
public class Flight
{
    public int ID { get; set; }
    public ClubMember PilotPayingTheBill { get; set; }
    public double EnormousPriceToBePaid { get; set; }
}
Pretty simple ... but then the messy world intervenes. Sometimes the plane is flown by a mechanic for maintenance purposes. I don't want to do this the lazy way and enter the mechanic as a dummy record in the ClubMember table. The database is too new for that kind of kludge. Plus, EF has a nifty ability to implement inheritance in the database, so I can keep it all nice and tidy like this:
public class BillableEntity
{
    public int ID { get; set; }
}
public class ClubMember : BillableEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
public class NonPayingUser : BillableEntity
{
    public string Description { get; set; }
}
public class Flight
{
    public int ID { get; set; }
    public BillableEntity BillTo { get; set; }
    public double EnormousPriceToBePaid { get; set; }
}
With a few instructions in my fluent configuration the NonPayingUser and ClubMembers are all put in their own tables, with ID as a primary key and foreign key - a nice, concise design I was very happy with. The Billto_id column in the flights table is not null so every flight will always have a billableEntity, which will be either a clubMember or NonPayingUser.
Writing the query in TSQL is pretty easy
select coalesce(cm.FirstName + ' ' + cm.LastName,np.Description) as BillTo
from Flights f
left outer join ClubMembers cm on f.billto_id = cm.ID
left outer join NonPayingUsers np on f.billto_id = np.ID
But doing the same thing in EF has me stumped.
The Flight class has a BillTo property whose type is the parent class, BillableEntity. I can cast it to the descendant classes like this...
var flights = DB.Flights
    .Select(f => new
    {
        PersonName = (f.BillTo as ClubMember).FirstName + " " + (f.BillTo as ClubMember).LastName,
        OtherName = (f.BillTo as NonPayingUser).Description
    });
but this produces monstrous amounts of TSQL.
One solution is just to write my own stored procs to join these tables together and use the EF classes to do all the basic CRUD on the individual tables, and that's the direction I'm leaning in. But is there a better way?
This is a duplicate of several similar questions...
EF (6, I'm looking at 7 RC1 now) always generates a huge amount of TSQL.
I think the best answer is a question: what is the problem with "producing monstrous amounts of TSQL"?
If you have a particular issue with a query, solve that issue; otherwise ignore it. You will see that in a huge percentage of cases it does not cause a problem for the DBMS.
I ended up creating a calculated field in the BillableEntities table
alter table BillableEntities drop column name;
alter table BillableEntities add Name as (case when [Description] is null then LastName + ', ' + FirstName else [Description] end)
and then marked it as DatabaseGenerated in the BillableEntity class.
[DatabaseGenerated(DatabaseGeneratedOption.Computed)]
public string FullName { get; private set; }
This way the SQL produced by EF is much more manageable. Performance is good, data integrity is good too.

servicestack ormlite throws "The ORDER BY clause is invalid ..." sql exception when using orderby with References

I have models like:
class Request
{
public int RequestId {get;set;}
[Reference]
public List<Package> Packages {get;set;}
}
class Package
{
public int PackageId {get;set;}
public int RequestId {get;set;}
}
If I run:
db.LoadSelect<Request>(q => q.OrderBy(x => x.RequestId));
OrmLite generates SQL like:
SELECT "RequestId" FROM "Request" ORDER BY "RequestId" ASC
SELECT "PackageId", "RequestId" FROM "Package"
WHERE "RequestId" IN (SELECT "Request"."RequestId" FROM "Request" ORDER BY "RequestId" ASC)
which will raise the following sql error:
The ORDER BY clause is invalid in views, inline functions, derived
tables, subqueries, and common table expressions, unless TOP or FOR
XML is also specified.
and the reason is obviously the ORDER BY in the subquery of the second query.
So there are two points here:
Is this a bug in OrmLite Sql Provider?
How to write ormlite queries to load models with references and at the same time sort them?
Hmmm, not being able to use ORDER BY in sub-selects seems to be a SQL Server-specific limitation. But as it shouldn't affect the behavior, I've cleared the ORDER BY term used in Load References sub-selects in this commit.
This change is available from v4.0.33+ that's now available on MyGet.
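The reason dropping the ORDER BY "shouldn't affect the behavior" is that IN only tests set membership, so the order of the subquery's rows cannot change which Package rows match. A minimal sketch of that reasoning, assuming Python's built-in sqlite3 module as a stand-in database (SQLite, unlike SQL Server, accepts the ORDER BY inside the subquery, which is what makes the comparison possible); the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Request (RequestId INTEGER PRIMARY KEY);
CREATE TABLE Package (PackageId INTEGER PRIMARY KEY, RequestId INTEGER);
INSERT INTO Request VALUES (3), (1), (2);
-- Package 13 points at a request that does not exist and must never match.
INSERT INTO Package VALUES (10, 2), (11, 1), (12, 2), (13, 7);
""")

# The shape OrmLite generated: ORDER BY inside the IN subquery.
with_order_by = conn.execute("""
    SELECT PackageId, RequestId FROM Package
    WHERE RequestId IN (SELECT RequestId FROM Request ORDER BY RequestId ASC)
    ORDER BY PackageId
""").fetchall()

# The shape after the fix: same subquery without ORDER BY.
without_order_by = conn.execute("""
    SELECT PackageId, RequestId FROM Package
    WHERE RequestId IN (SELECT RequestId FROM Request)
    ORDER BY PackageId
""").fetchall()
```

Both queries return the same packages; the ordering of the parent Request rows is still decided by the first query that loads them.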

Loading the last related record instantly for multiple parent records using Entity framework

Does anyone know a good approach using Entity Framework for the problem described below?
I am trying for our next release to come up with a performant way to show the placed orders for the logged on customer.
Of course paging is always a good technique to use when a lot of data is available, but I would like to see an answer without any paging techniques.
Here's the story: a customer places an order which gets an orderstatus = PENDING. Depending on some strategy we move that order up the chain in order to get it APPROVED.
Every change of status is logged so we can see a trace of statuses, and possibly an extra line of comment per status that gives valuable information to whoever views the order in an interface.
So an Order is linked to a Customer. One order can have multiple order statuses stored in OrderStatusHistory.
In my test scenario I am using a customer that has 100+ Orders, each with about 5 records in the OrderStatusHistory table.
For now I would like to see all orders on one page, without paging, where for each Order I show the last relevant Status and the extra comment (if there is one for this last status). Both fields come from OrderStatusHistory: the record with the highest Id for the given OrderId.
There are multiple scenarios I have tried, but I would like to see any potential other solutions or comments on the things I have already tried.
Using Include() when getting Orders still results in multiple queries being launched against the database. Each order triggers an extra query to get all order statuses in the history table. So all statuses are queried instead of just the last relevant one, and 100 extra queries are launched for 100 orders. You can imagine the problem when there are 100000+ orders in the database.
Having 2 computed columns on the database: LastStatus, LastStatusInformation and a regular Linq-Query which gets those columns which are available through the Entity-model.
The problem with this approach is the fact that those computed columns are determined using a scalar function which can not be changed without removing the formula from the computed column, etc...
In the end I am very familiar with SQL and Stored procedures, but since the rest of the data-layer uses Entity Framework I would like to stick to it as long as possible, even though I have my doubts about performance.
Using the SQL approach I would write something like this:
WITH cte (RN, OrderId, [Status], Information)
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY Id DESC), OrderId, [Status], Information
FROM OrderStatus
)
SELECT o.Id, cte.[Status], cte.Information AS StatusInformation, o.* FROM [Order] o
INNER JOIN cte ON o.Id = cte.OrderId AND cte.RN = 1
WHERE CustomerId = @CustomerId
ORDER BY 1 DESC;
which returns all orders for the customer with the statusinformation provided by the Common Table Expression.
Does anyone know a good approach using Entity Framework?
Something like this should work as you want (make only 1 db call), but I didn't test it:
var result = from order in context.Orders
where order.CustomerId == customerId
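// Last() is not translated by LINQ to Entities in EF6, so sort descending and take the first row instead
let lastStatus = order.OrderStatusHistory.OrderByDescending(x => x.Id).FirstOrDefault()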
select new
{
//you can return the whole order if you need
//Order = order,
//or only the information you actually need to display
Number = order.Number,
Status = lastStatus.Status,
ExtraComment = lastStatus.ExtraComment,
};
This assumes your Order class looks something like this:
public class Order
{
public int Id { get; set; }
public int CustomerId { get; set; }
public string Number { get; set; }
...
public ICollection<OrderStatusHistory> OrderStatusHistory { get; set; }
}
If your Order class doesn't have something like an ICollection<OrderStatusHistory> OrderStatusHistory property then you need to do a join first. Let me know if that is the case and I will edit my answer to include the join.
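Both the CTE from the question and the LINQ query aim at the same result: for each order of a customer, the single status row with the highest Id. The sketch below checks that shape on a tiny data set. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the sample rows are invented, and the second query (a correlated MAX(Id) subquery) is simply another way to write the same filter, not the SQL that Entity Framework would necessarily generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Number TEXT);
CREATE TABLE OrderStatus (
    Id INTEGER PRIMARY KEY, OrderId INTEGER, Status TEXT, Information TEXT);
INSERT INTO "Order" VALUES (1, 42, 'A-001'), (2, 42, 'A-002'), (3, 99, 'B-001');
INSERT INTO OrderStatus VALUES
    (1, 1, 'PENDING',  NULL),
    (2, 1, 'APPROVED', 'checked by sales'),
    (3, 2, 'PENDING',  NULL),
    (4, 3, 'PENDING',  NULL),
    (5, 2, 'REJECTED', 'credit limit');
""")

# Same idea as the CTE in the question: number the status rows per order,
# newest first, and keep only row number 1.
ROW_NUMBER_SQL = """
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY Id DESC) AS RN,
           OrderId, Status, Information
    FROM OrderStatus
)
SELECT o.Id, o.Number, cte.Status, cte.Information
FROM "Order" o
JOIN cte ON o.Id = cte.OrderId AND cte.RN = 1
WHERE o.CustomerId = ?
ORDER BY o.Id DESC
"""

# Equivalent filter: keep the status row whose Id is the maximum for its order.
MAX_ID_SQL = """
SELECT o.Id, o.Number, s.Status, s.Information
FROM "Order" o
JOIN OrderStatus s ON s.OrderId = o.Id
WHERE o.CustomerId = ?
  AND s.Id = (SELECT MAX(Id) FROM OrderStatus WHERE OrderId = o.Id)
ORDER BY o.Id DESC
"""

latest_by_row_number = conn.execute(ROW_NUMBER_SQL, (42,)).fetchall()
latest_by_max_id = conn.execute(MAX_ID_SQL, (42,)).fetchall()
```

Either way a single statement returns one row per order, which is the property to look for when inspecting the SQL that Entity Framework emits for the projection.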

Taking the Top X results of a complex NHibernate query (QueryOver syntax)

I'm working with an NHibernate query where I have to do some complex QueryOver join aliases to eagerly load the children of my root entity. When loading, I want to filter the root entity results returned by a number of properties, including some which are on the children.
I've got this all working fine using join aliases, but where I'm stumped is filtering the results down to the top "X" instances of the root entity when ordered by a property other than the root entity's Id. Since I'm grabbing children, there are a number of duplicate rows returned by the SQL. If I try to limit the number of results with a .Take, the take executes before NHibernate collapses the result set down to the distinct root entities. For reference, here's my domain model.
public class Project{
public int Id {get;set;}
public double Value {get;set;}
public IList<ProjectRole> Team {get;set;}
}
public class ProjectRole{
public User User {get;set;}
public Role Role {get;set;}
}
public class User{
public string LoginName {get;set;}
}
So I'm trying to grab all the projects where a User with the given LoginName is on the Project's Team. Then I want to order by the Project's value. I want to do this as efficiently as possible, without select n+1's etc.
What does this community recommend?
Additional Information:
As a stopgap, I'm currently returning all the results and then taking the top X in memory, but I don't want that to be permanent, because the query can return close to 10,0000 items, and I only want the top 7 or so. If I were writing straight SQL I'd just do something like this.
SELECT *
FROM Projects as p1
INNER JOIN (
SELECT distinct TOP (7)
topProjects.PGISourceItem_id as topsId,
topProjects.Value as topsValue
FROM Projects topProjects
left outer join ProjectRoles roles on topProjects.Id=roles.Project_id
left outer join PGUsers users on roles.User_id=users.Id
WHERE
(users.LoginName like 'DEV\APPROVER' or this_0_1_.IsPrivate = 0)
ORDER BY topProjects.Value desc
) as p2 on p1.Id = p2.topsId
But I can't figure out how to do this with NHibernate. The only subqueries I can create are either WHERE EXISTS or WHERE IN. And since I'm doing an ORDER BY Value I can't use WHERE IN because my select returns multiple properties.
If users have under 1k projects, this might work:
var subquery = QueryOver.Of<User>()
.Where(...)
.JoinAlias(x => x.Projects, () => proj)
.Select(Projections.Distinct(Projections.Property(() => proj.Id)));
session.QueryOver<Foo>()
.WithSubquery.WhereProperty(x => x.Id).In(subquery)
.Fetch(p => p.Collection)
.OrderBy(x => x.Value)
.Take(5)
.List();
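The underlying problem, that a row limit applied to the joined rows counts child rows rather than root entities, can be reproduced with plain SQL. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in database, a simplified two-table schema with made-up rows, and LIMIT where SQL Server would use TOP. It compares limiting the joined result with limiting the root ids in a subquery first and then joining the children. Whether QueryOver can express the second form is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (Id INTEGER PRIMARY KEY, Value REAL);
CREATE TABLE ProjectRoles (Project_id INTEGER, LoginName TEXT);
INSERT INTO Projects VALUES (1, 500.0), (2, 400.0), (3, 300.0);
INSERT INTO ProjectRoles VALUES
    (1, 'approver'), (1, 'other'), (1, 'third'),
    (2, 'approver'), (2, 'other'),
    (3, 'approver');
""")

# Naive: limit the joined rows. Project 1 has three team rows, so
# "top 2 rows" never gets past the first project.
naive_rows = conn.execute("""
    SELECT p.Id FROM Projects p
    JOIN ProjectRoles r ON r.Project_id = p.Id
    ORDER BY p.Value DESC
    LIMIT 2
""").fetchall()
naive_roots = sorted({row[0] for row in naive_rows})

# Two-step: pick the top 2 root ids first (one row per project, so the
# limit counts projects), then join the children for just those roots.
paged_rows = conn.execute("""
    SELECT p.Id, p.Value, r.LoginName
    FROM Projects p
    JOIN ProjectRoles r ON r.Project_id = p.Id
    WHERE p.Id IN (
        SELECT tp.Id FROM Projects tp
        WHERE EXISTS (SELECT 1 FROM ProjectRoles tr
                      WHERE tr.Project_id = tp.Id AND tr.LoginName = ?)
        ORDER BY tp.Value DESC
        LIMIT 2)
    ORDER BY p.Value DESC
""", ("approver",)).fetchall()
# Distinct root ids in the order they appear in the result.
paged_roots = list(dict.fromkeys(row[0] for row in paged_rows))
```

Using EXISTS instead of a join inside the subquery keeps it at one row per project, so no DISTINCT is needed and the subquery can select only the Id while still ordering by Value.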

How to include parent objects list in child entity object

I have the following tables in my database
Table1: tblAddressType (Id, Name)
Table2: tblAddressDtls (Id, AddressTypeId, Address1)
I am left joining the above two tables to get list of all address types and corresponding address details as follows
SQL Query:
select t1.*, t2.*
from tblAddressType t1
left outer join tblAddressDtls t2 on t1.Id = t2.AddressTypeId and t2.Id = 1;
For the above tables, I have created POCO entity classes as follows:
[Table("tblAddressType")]
public partial class AddressType
{
[Key]
[Column(Name="ID")]
public int ID { get; set; }
[Required]
[Column(Name = "Name")]
public virtual string Name {get; set;}
[Include]
[Association("AddressTypeAddress", "ID", "AddressTypeId")]
public virtual ICollection<Address> Addresses { get; set; }
}
[Table("tblAddress", SchemaName="dbo")]
public class Address
{
[Column(Name="ID")]
public int ID { get; set; }
[Column(Name = "AddressTypeId")]
public int? AddressTypeId{ get; set; }
[Column(Name = "Address1")]
public string Address1{ get; set; }
[Include]
[Association("AddressTypeAddress", "AddressTypeId", "ID", IsForeignKey = true)]
public virtual AddressType AddressType { get; set; }
}
To fetch the data as shown in the SQL query above, I have written the following LINQ query in my service code, and this query returns the data as needed:
var qry = (from p in dbContext.AddressTypes
join pa in (from t in dbContext.Addresses
where t.ID == 1 select t)
on p.ID equals pa.AddressTypeId into ppa
from t in ppa.DefaultIfEmpty()
select t).AsQueryable();
Now, I want to write a domain service method named "GetAddressById(int addressId)" which should return the matching Address object along with the list of AddressType objects, as I need to bind the list of AddressType objects to the drop-down box in the Add/Edit address screen.
I want to fetch the list of AddressType objects at the same time as the Address object, to avoid an extra round trip to the server from my Silverlight client app.
What is the best way to achieve this?
NEW:
I assume that in the database, Addresses has a relation to AddressTypes, and again that you are using Entity Framework.
public Address GetAddressById(int addressId)
{
    // Include has to come before SingleOrDefault, which executes the query.
    return dbContext.Addresses.Include("AddressType").SingleOrDefault(a => a.ID == addressId);
}
This gets the single address whose ID is addressId. If there is none it returns null, and if more than one is returned it throws an exception. The Include tells EF that you also want the related AddressType loaded when you get the address, and EF creates an appropriate join to make this happen. All of this goes to the database as a single query and gets the result that you want.
OLD:
Let's say we want the AddressType and all its Addresses with just one call to the db (assuming you use Entity Framework). We would call a method like
public AddressType GetAddressTypeIncludingAddresses(int id)
{
    /* if you use CTP5 of EF Code First you should even be able to do (at => at.Addresses) in the Include */
    return _context.AddressTypes.Include("Addresses").SingleOrDefault(at => at.ID == id);
}
and then, when you have it, just use addressType.ID and foreach (var address in addressType.Addresses) {} and the like.
I hope I understood your question, if not try again and I'll edit my answer.
You could do this by creating a stored proc in your database that returns multiple result sets: first the one that gets you your desired child and parent, and second the one that gets you your list of parents. Then you can use the workaround described here:
http://blogs.msdn.com/b/swiss_dpe_team/archive/2008/02/04/linq-to-sql-returning-multiple-result-sets.aspx
Which allows you to get each part of the results.
As an aside, you don't need a left join for your query. Since your where clause references the table on the right you will never get null values on the right side of the join. Use an inner join instead.
