Taking the Top X results of a complex NHibernate query (QueryOver syntax) - sql-server

I'm working with an NHibernate query where I have to use some complex QueryOver join aliases to eagerly load the children of my root entity. When loading, I want to filter the root entities returned by a number of properties, including some that are on the children.
I've got this all working fine using JoinAlias, but where I'm stumped is limiting the results to the top "X" instances of the root entity when ordered by a property other than the root entity's Id. Since I'm grabbing children, the SQL returns a number of duplicate rows. If I try to limit the number of results with a .Take, the take executes before NHibernate collapses the result set down to the distinct root entities. For reference, here's my domain model.
public class Project{
public int Id {get;set;}
public double Value {get;set;}
public IList<ProjectRole> Team {get;set;}
}
public class ProjectRole{
public User User {get;set;}
public Role Role {get;set;}
}
public class User{
public string LoginName {get;set;}
}
So I'm trying to grab all the projects where a User with the given LoginName is on the Project's Team. Then I want to order by the Project's Value. I want to do this as efficiently as possible, without SELECT N+1 queries and the like.
What does this community recommend?
Additional Information:
As a stopgap, I'm currently returning all the results and then taking the top X in memory, but I don't want that to be permanent, because the query can return close to 10,0000 items and I only want the top 7 or so. If I were writing straight SQL, I'd just do something like this.
SELECT *
FROM Projects as p1
INNER JOIN (
SELECT distinct TOP (7)
topProjects.PGISourceItem_id as topsId,
topProjects.Value as topsValue
FROM Projects topProjects
left outer join ProjectRoles roles on topProjects.Id=roles.Project_id
left outer join PGUsers users on roles.User_id=users.Id
WHERE
(users.LoginName like 'DEV\APPROVER' or this_0_1_.IsPrivate = 0)
ORDER BY topProjects.Value desc
) as p2 on p1.Id = p2.topsId
But I can't figure out how to do this with NHibernate. The only subqueries I can create are either WHERE EXISTS or WHERE IN. And since I'm doing an ORDER BY Value I can't use WHERE IN because my select returns multiple properties.

If users have fewer than 1k projects, this might work:
// Assumes User maps a Projects collection, which the question's model does not show.
Project proj = null;
var subquery = QueryOver.Of<User>()
.Where(...)
.JoinAlias(x => x.Projects, () => proj)
.Select(Projections.Distinct(Projections.Property(() => proj.Id)));
session.QueryOver<Project>()
.WithSubquery.WhereProperty(x => x.Id).In(subquery)
.Fetch(p => p.Team).Eager
.OrderBy(x => x.Value).Desc
.Take(5)
.List();
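To see why the limit has to apply to distinct root ids rather than to joined rows, here is a minimal plain-Java sketch. It is not NHibernate code: the class name, the sample rows, and both method names are made up for illustration. It mimics a joined result set in which each project repeats once per team member.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TopProjectsSketch {
    // One joined row: {projectId, projectValue}; a project repeats once per team member.
    // Rows are assumed to arrive already ordered by value, descending (hypothetical data).
    public static final List<int[]> JOINED_ROWS = Arrays.asList(
            new int[]{1, 900}, new int[]{1, 900}, new int[]{1, 900},
            new int[]{2, 800}, new int[]{2, 800},
            new int[]{3, 700});

    // Limit first, then collapse duplicates: roughly what a Take on the joined query amounts to.
    public static List<Integer> limitThenDistinct(List<int[]> rows, int limit) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (int[] row : rows.subList(0, Math.min(limit, rows.size()))) {
            ids.add(row[0]);
        }
        return new ArrayList<>(ids);
    }

    // Collapse duplicates first, then limit: what the inner SELECT DISTINCT TOP is meant to do.
    public static List<Integer> distinctThenLimit(List<int[]> rows, int limit) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (int[] row : rows) {
            if (ids.size() == limit) {
                break;
            }
            ids.add(row[0]);
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        System.out.println(limitThenDistinct(JOINED_ROWS, 3)); // prints [1]
        System.out.println(distinctThenLimit(JOINED_ROWS, 3)); // prints [1, 2, 3]
    }
}
```

Asking for three projects yields only one when the limit is applied to the joined rows, which is why the id subquery (or a second query for the children) is needed.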

Related

How to improve poor performance of EF Core SQL query that sorts on a child collection

My issue is with the queries that EF Core generates for fetching ordered items from a child collection of a parent.
I have a parent class which has a collection of child objects. I'm using Entity Framework Core 5.0.5 (code first) against a SQL Server database. I've tried to boil down the scenario, so let's call it an Owner with a collection of Pets.
I often want a list of owners with their oldest pet, so I'll do something like
Context.Owners
.Select(owner =>
new {
Owner = owner,
OldPet = owner.Pets.OrderBy(pet => pet.Age).LastOrDefault()
})
.Where(x => x.Owner.Id == 1);
This worked fine before (on EF6) and still works functionally. However, the issue I have is that EF Core now translates these sub-collection queries into something apparently cleverer, something like
SELECT *
FROM [Owners] AS [c]
LEFT JOIN (
SELECT *
FROM (
SELECT [c0].[Id] ... , ROW_NUMBER() OVER(PARTITION BY [c0].[OwnerId] ORDER BY [c0].[Age] DESC) AS [row]
FROM [Pets] AS [c0]
) AS [t]
WHERE [t].[row] <= 1
) AS [t0] ON [c].[Id] = [t0].[OwnerId]
The problem I'm having is that it seems to perform terribly. Looking at the execution plan it's doing a clustered index seek on the pets table, then sorting them. The 'number of rows read' is massive and the 'sorting' takes tens or hundreds of milliseconds.
The way EF6 does the same functionality seemed way more performant in this sort of scenario.
Is there a way to change the behaviour so I can choose? Or a way to rewrite this type of query such that I don't have this problem? I've tried many variations of using GroupBy etc and still have the same result.
If you use FirstOrDefault in a projection, EF Core has to generate such a join, which uses the ROW_NUMBER window function. To get the desired SQL, it is better to rewrite the query so that it is more predictable for the LINQ translator:
var query =
from owner in Context.Owners
from pet in owner.Pets
where owner.Id == 1
orderby pet.Age descending
select new
{
Owner = owner,
OldPet = pet
};
var result = query.FirstOrDefault();

MySQL query to Hibernate Query

Can anyone give me the equivalent Hibernate query for the MySQL query given below? I have been trying for the past few days with no success.
This is the MySQL query:
SELECT * FROM product WHERE category_id IN (SELECT id FROM category WHERE parent_category_id IN (SELECT id FROM category WHERE parent_category_id=53));
The Hibernate query I have written is:
public List<Product> findBy2ndLevel(String categoryName) {
Query query = null;
StringBuilder hql = new StringBuilder();
try {
hql.append("from Product product where product.category.id in");
hql.append("(select id from Category category where category.parentCategory.id in");
hql.append("(select id from Category category where category.parentCategory.id=:category_id))");
query = sessionFactory.getCurrentSession().createQuery(hql.toString());
query.setParameter("category_id",Integer.parseInt(categoryName));
} catch (HibernateException e) {
e.printStackTrace();
}
return query.list();
}
It is not working, though. Can anyone correct it or give me the equivalent HQL?
I'm assuming that the entity Product has one category and a field for it (called category), and that a category has one parent category and a field for it (called parent).
If so, something like this:
select p
from Product p join p.category c join c.parent pc
where pc.id = :categoryId
Furthermore, I don't really get your variable use: you link the id of the category with categoryName, and include a variable orgaId which is never really used for anything. Also, your code will throw a NullPointerException if the creation of the query fails, because query is still null when query.list() is called.

Loading the last related record instantly for multiple parent records using Entity framework

Does anyone know a good approach using Entity Framework for the problem described below?
I am trying for our next release to come up with a performant way to show the placed orders for the logged on customer.
Paging is of course a good technique when a lot of data is available, but I would like to see an answer that does not rely on paging.
Here's the story: a customer places an order which gets an orderstatus = PENDING. Depending on some strategy we move that order up the chain in order to get it APPROVED.
Every change of status is logged, so we can see a trace of the statuses and possibly an extra line of comment per status, which can provide valuable information to whoever sees the order in an interface.
So an Order is linked to a Customer. One order can have multiple order statuses stored in OrderStatusHistory.
In my test scenario I am using a customer who has 100+ orders, each with about 5 records in the OrderStatusHistory table.
For now I would like to show all orders on one page, without paging, and for each Order show the last relevant status and the extra comment, if there is one for that status. Both fields come from OrderStatusHistory: the record with the highest Id for the given OrderId.
There are multiple scenarios I have tried, but I would like to see any potential other solutions or comments on the things I have already tried.
Using Include() when getting Orders. This still results in multiple queries against the database: each order triggers an extra query to get all of its order statuses from the history table. So all statuses are queried instead of just the last relevant one, and 100 extra queries are launched for 100 orders. You can imagine the problem when there are 100000+ orders in the database.
Having 2 computed columns in the database, LastStatus and LastStatusInformation, and a regular LINQ query that reads those columns through the entity model.
The problem with this approach is that those computed columns are determined using a scalar function, which cannot be changed without first removing the formula from the computed column, etc.
In the end I am very familiar with SQL and Stored procedures, but since the rest of the data-layer uses Entity Framework I would like to stick to it as long as possible, even though I have my doubts about performance.
Using the SQL approach I would write something like this:
WITH cte (RN, OrderId, [Status], Information)
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY Id DESC), OrderId, [Status], Information
FROM OrderStatus
)
SELECT o.Id, cte.[Status], cte.Information AS StatusInformation, o.* FROM [Order] o
INNER JOIN cte ON o.Id = cte.OrderId AND cte.RN = 1
WHERE CustomerId = @CustomerId
ORDER BY 1 DESC;
which returns all orders for the customer, with the status information provided by the common table expression.
Does anyone know a good approach using Entity Framework?
Something like this should work as you want (make only 1 db call), but I didn't test it:
var result = from order in context.Orders
where order.CustomerId == customerId
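// Last() is not translatable in older Entity Framework versions, so order descending and take the first row
let lastStatus = order.OrderStatusHistory.OrderByDescending(x => x.Id).FirstOrDefault()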
select new
{
//you can return the whole order if you need
//Order = order,
//or only the information you actually need to display
Number = order.Number,
Status = lastStatus.Status,
ExtraComment = lastStatus.ExtraComment,
};
This assumes your Order class looks something like this:
public class Order
{
public int Id { get; set; }
public int CustomerId { get; set; }
public string Number { get; set; }
...
public ICollection<OrderStatusHistory> OrderStatusHistory { get; set; }
}
If your Order class doesn't have something like an ICollection<OrderStatusHistory> OrderStatusHistory property then you need to do a join first. Let me know if that is the case and I will edit my answer to include the join.
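As an illustration of what "last status per order" means, here is a minimal in-memory sketch in plain Java. It is not Entity Framework code: the StatusRow class and the lastStatusPerOrder method are hypothetical names introduced only for this example. It keeps, for each order, the history row with the highest id, which is the same selection the ROW_NUMBER() ... PARTITION BY OrderId ORDER BY Id DESC query with RN = 1 expresses.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LastStatusSketch {
    // Hypothetical stand-in for one row of the OrderStatusHistory table.
    public static final class StatusRow {
        public final int id;
        public final int orderId;
        public final String status;

        public StatusRow(int id, int orderId, String status) {
            this.id = id;
            this.orderId = orderId;
            this.status = status;
        }
    }

    // For each order id, keep only the history row with the highest id.
    public static Map<Integer, StatusRow> lastStatusPerOrder(List<StatusRow> history) {
        Map<Integer, StatusRow> latest = new LinkedHashMap<>();
        for (StatusRow row : history) {
            // merge stores the row if the order is new, otherwise keeps whichever row has the larger id
            latest.merge(row.orderId, row, (a, b) -> a.id >= b.id ? a : b);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<StatusRow> history = Arrays.asList(
                new StatusRow(1, 10, "PENDING"),
                new StatusRow(2, 11, "PENDING"),
                new StatusRow(3, 10, "APPROVED"));
        Map<Integer, StatusRow> latest = lastStatusPerOrder(history);
        System.out.println(latest.get(10).status); // prints APPROVED
        System.out.println(latest.get(11).status); // prints PENDING
    }
}
```

In the real application this selection should stay in the database, as the LINQ query above intends; the sketch only spells out the rule being applied.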

JPA - calculated column as entity class property?

Relatively new to JPA, so I have one kind of architectural question.
Let's say I have tables EMPLOYEE and DEPARTMENT with many to one relationship (i.e. many employees work for one department):
EMPLOYEE
EMPLOYEE_ID
EMPLOYEE_NAME
DEPARTMENT_ID
DEPARTMENT
DEPARTMENT_ID
DEPARTMENT_NAME
So I can define proper entities for Employee and Department, there's no problem. However, in one view I would like to display list of departments with number of employees working for that department, something like this:
SELECT D.DEPARTMENT_NAME,
(SELECT COUNT(*) FROM EMPLOYEE E WHERE E.DEPARTMENT_ID = D.DEPARTMENT_ID) NUMBER_OF_EMPLOYEES
FROM DEPARTMENT D
I'm just not sure what is the right strategy to accomplish this using JPA...
I don't want to always fetch number of employees for Department entity, as there is only one view when it is needed.
It looks like Hibernate's @Formula would be one possible approach, but as far as I know it does not conform to the JPA standard.
You can create any object in your QL using the "new" syntax - your class just needs a constructor that takes the values returned by your query.
For example, with a class like DepartmentEmployeeCount, with a constructor:
public DepartmentEmployeeCount(String departmentName, Long employeeCount) // COUNT returns a Long in JPQL
you could use QL something like:
SELECT NEW DepartmentEmployeeCount(D.DEPARTMENT_NAME, count(E.id)) from Department D left join D.employees E GROUP BY D.DEPARTMENT_NAME
Or if you were just selecting the count(*) you could simply cast the query result to a Number.
Alternatively, to do the same without the DepartmentEmployeeCount class, you could leave out the NEW, so:
SELECT D.DEPARTMENT_NAME, count(E.id)
This would return a List<Object[]> where each list item was an array of 2 elements, departmentName and count.
To answer your later question in the comments, to populate all fields of a Department plus a transient employeeCount field, one suggestion would be to do 2 queries. This would still be more efficient than your original query (a subselect for each employee count).
So one query to read the departments
SELECT D from Department D
giving you a List<Department>
Then a 2nd query returning a temporary array:
SELECT D.DEPARTMENT_ID, count(E.id) from Department D left join D.employees E GROUP BY D.DEPARTMENT_ID
giving you a List<Object[]> with DEPARTMENT_ID and count in it.
Then you use the 2nd list to update the transient count property on your first list.
(You could try selecting into a Map to make this lookup easier, but I think that's a Hibernate feature).
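The two-query merge described above could look roughly like the following plain-Java sketch. The Department class here is a simplified stand-in for the real entity, and applyCounts is a hypothetical helper name; the List<Object[]> argument mimics what the second query returns, with the department id in position 0 and the count in position 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DepartmentCountMerge {
    // Simplified stand-in for the Department entity; employeeCount plays the transient field.
    public static final class Department {
        public final long id;
        public final String name;
        public long employeeCount;

        public Department(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Copies the counts from the second query's rows onto the departments from the first query.
    public static void applyCounts(List<Department> departments, List<Object[]> countRows) {
        Map<Long, Long> countById = new HashMap<>();
        for (Object[] row : countRows) {
            // Cast through Number so the code does not depend on the exact numeric type returned
            countById.put(((Number) row[0]).longValue(), ((Number) row[1]).longValue());
        }
        for (Department d : departments) {
            // Departments missing from the count rows default to zero employees
            d.employeeCount = countById.getOrDefault(d.id, 0L);
        }
    }

    public static void main(String[] args) {
        List<Department> departments = Arrays.asList(
                new Department(1L, "Sales"), new Department(2L, "Research"));
        List<Object[]> countRows = new ArrayList<>();
        countRows.add(new Object[]{1L, 4L});
        countRows.add(new Object[]{2L, 0L});
        applyCounts(departments, countRows);
        System.out.println(departments.get(0).name + ": " + departments.get(0).employeeCount); // prints Sales: 4
    }
}
```

Building the map first keeps the merge linear in the number of departments instead of scanning the count list once per department.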
Option 1: I suggested this since you didn't like the constructor route MattR was suggesting. You mentioned the word "view" several times, and I know you were talking about the view shown to the user, but why not set up a view in your database that includes the computed columns and then create a read-only entity that maps to it?
Option 2: In response to your comment about not wanting to create a view. You could create a container object that holds the entity and the calculated column, then much like MattR suggests, you use a new in your select. Something like:
public class DepartmentInfo {
private Department department;
// this might have to be long or something
// just see what construct JPA tries to call
private int employeeCount;
public DepartmentInfo( Department d, int count ) {
department = d;
employeeCount = count;
}
// getters and setters here
}
Then your select becomes
SELECT new my.package.DepartmentInfo( D,
(SELECT COUNT(*) FROM EMPLOYEE E WHERE E.DEPARTMENT_ID = D.DEPARTMENT_ID))
FROM DEPARTMENT D
With that you can use the DepartmentInfo to get the properties you are interested in.
You could create a member in your entity as an additional column, and then reference it with an alias in your query. The column name in the @Column annotation must match the alias.
Say, for your original query, you can add a countEmployees member as follows. Also add insertable=false and updatable=false so the entity manager won't try to include it in insert or update statements:
public class Department {
@Column(name="DEPARTMENT_ID")
Long departmentId;
@Column(name="DEPARTMENT_NAME")
String departmentName;
@Column(name="countEmployees", insertable=false, updatable=false)
Long countEmployees;
//accessors omitted
}
And your query:
SELECT D.DEPARTMENT_NAME,
(SELECT COUNT(*) FROM EMPLOYEE E WHERE E.DEPARTMENT_ID = D.DEPARTMENT_ID) AS countEmployees
FROM DEPARTMENT D
This also applies when working with Spring Data JPA repositories.

How to include parent objects list in child entity object

I have the following tables in my database
Table1: tblAddressType (Id, Name)
Table2: tblAddressDtls (Id, AddressTypeId, Address1)
I am left joining the above two tables to get list of all address types and corresponding address details as follows
SQL Query:
select t1.*, t2.*
from tblAddressType t1
left outer join tblAddressDtls t2 on t1.Id = t2.AddressTypeId and t2.Id = 1;
For the above tables, I have created POCO entity classes as follows:
[Table("tblAddressType ")]
public partial class AddressType
{
[Key]
[Column(Name="ID")]
public int ID { get; set; }
[Required]
[Column(Name = "Name")]
public virtual string Name {get; set;}
[Include]
[Association("AddressTypeAddress", "ID", "AddressTypeId")]
public virtual ICollection<Address> Addresses { get; set; }
}
[Table("tblAddress", SchemaName="dbo")]
public class Address
{
[Column(Name="ID")]
public int ID { get; set; }
[Column(Name = "AddressTypeId")]
public int? AddressTypeId{ get; set; }
[Column(Name = "Address1")]
public string Address1{ get; set; }
[Include]
[Association("AddressTypeAddress", "AddressTypeId", "ID", IsForeignKey = true)]
public virtual AddressType AddressType { get; set; }
}
To fetch the data as shown in the SQL query above, I have written the following LINQ query in my service code, and it returns the data as needed:
var qry = (from p in dbContext.AddressTypes
join pa in (from t in dbContext.Addresses
where t.ID == 1 select t)
on p.ID equals pa.AddressTypeId into ppa
from t in ppa.DefaultIfEmpty()
select t).AsQueryable();
Now, I want to write a domain service method named "GetAddressById(int addressId)" which should return the matching Address object along with the list of AddressType objects, as I need to bind the list of AddressType objects to the drop-down box in the Add/Edit address screen.
I want to fetch the list of AddressType objects at the same time as the Address object itself, to avoid an extra round trip to the server from my Silverlight client app.
Could you suggest the best possible way to achieve this?
NEW:
I assume that in the database, Addresses has a relation to AddressTypes and again that you are using EntityFramework.
public Address GetAddressById(int addressId){
// Include must be applied to the query before SingleOrDefault materializes the entity
return dbContext.Addresses.Include("AddressType").SingleOrDefault(a => a.ID == addressId);
}
This line gets the single address whose ID is addressId. If there is none, it returns null, and if more than one matches, it throws an exception. The Include tells EF that you also want the AddressType loaded with the address, so EF creates an appropriate join, and everything runs as a single query against the database.
OLD:
Let's say we want the AddressType and all its Addresses with just one call to the db (assuming you use Entity Framework). We would call a method like
public AddressType GetAddressTypeIncludingAddresses(int id){
return _context.AddressType.Include("Addresses").SingleOrDefault(at => at.ID == id);
/*if you use ctp5 of ef code first you should even be able to do (at => at.Addresses) in the include */
}
Once you have it, use addressType.ID, foreach(var address in addressType.Addresses){}, and the like.
I hope I understood your question; if not, try again and I'll edit my answer.
You could do this by creating a stored proc in your database that returns multiple result sets: first the one that gets you your desired child and parent, and second the one that gets you your list of parents. Then you can use the workaround described here:
http://blogs.msdn.com/b/swiss_dpe_team/archive/2008/02/04/linq-to-sql-returning-multiple-result-sets.aspx
Which allows you to get each part of the results.
As an aside, you don't need a left join for your query. Since your where clause references the table on the right you will never get null values on the right side of the join. Use an inner join instead.
