STUFF SQL query in NHibernate, basically trying to replicate MySQL's GROUP_CONCAT - sql-server

Quick info on what I am trying to accomplish: I have some history tables and am joining the data into a single row for each history item. The issue I am having is replicating a STUFF(query, 1, 1, '') in NHibernate. I have all my relationships set up and working, and the query with just the joins works fine; I just can't figure out how to implement the STUFF with a subquery inside it.
This is the entire query:
SELECT h.*, u.FirstName, u.LastName, eh.*,
STUFF((
SELECT CONCAT(c.Name, ' - ')
FROM SubHistory sh
LEFT JOIN Cust c ON c.CustID = sh.SubCustId
WHERE h.Id = sh.HistoryId
FOR XML PATH ('')), 1, 1, ''
) AS Subs
FROM History h
LEFT JOIN EmailHistory eh ON eh.HistoryId = h.Id
LEFT JOIN Usr u ON u.UsrID = h.UserId
The Result I need is (columns):
H.ID - H.ReportID - H.UserID - (Concatenated Subs) - u.FirstName - u.LastName - eh.Email
I can do this without the STUFF in NHibernate like so:
// Alias variables (types as mapped in your domain model)
History historyAlias = null;
User usrAlias = null;
EmailHistory emailAlias = null;
IList<History> hist = session.QueryOver<History>(() => historyAlias)
    .Left.JoinAlias(() => historyAlias.User, () => usrAlias)
    .Left.JoinAlias(() => historyAlias.Email, () => emailAlias)
    .List();
I followed the tutorial at http://blog.andrewawhitaker.com/blog/2014/08/15/queryover-series-part-7-using-sql-functions/ for creating the STUFF function, but I do not think it was written with this kind of usage in mind, and I have been unable to make it work.
So, my question: can I implement STUFF the way I want, and if so, how? Or is there a better approach to getting the info I want?
Database Example for reference:
History Table
ID - ReportID - Name - UserID
SubHistory Table - (Can be many)
ID - HistoryID - SubInfo
UserTable
ID - FirstName - LastName
Cust Table
ID - CustInfo
EmailHistory Table - (Can only have one)
ID - HistoryID - Email

I imagine that with enough tinkering you could eventually get NHibernate to call STUFF, but it might take a very long time. There might be a faster/easier solution: create a view called HistoryView that essentially runs the query you have at the top. Then create a new C# class, also called HistoryView, with properties that correspond to your view's columns. These HistoryView objects will be read-only, but it will accomplish the goal you're after.
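A rough sketch of what that read-only class could look like (the property names here just mirror the columns listed in the question; the mapping itself would point at the HistoryView view and be marked immutable, e.g. mutable="false" in an hbm.xml mapping):
// Read-only projection of the HistoryView database view
// (the view body is essentially the STUFF/FOR XML query from the question).
public class HistoryView
{
    public virtual int Id { get; set; }
    public virtual int ReportId { get; set; }
    public virtual int UserId { get; set; }
    public virtual string Subs { get; set; }        // the concatenated Cust names
    public virtual string FirstName { get; set; }
    public virtual string LastName { get; set; }
    public virtual string Email { get; set; }
}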

So I ended up actually using a stored procedure call for this. I essentially did what Aron suggested with a view, but as a stored proc instead, and created a new object to hold the rows returned by the stored procedure.
How to call a SP in NHibernate:
http://nhibernate.info/doc/nhibernate-reference/querysql.html#sp_query
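For reference, the named-query route from that doc section looks roughly like this (the query name, procedure name, and HistoryView class are placeholders for whatever you actually define):
// In an .hbm.xml mapping (see the linked doc section), something like:
//   <sql-query name="GetHistoryWithSubs">
//     <return class="HistoryView" />
//     exec GetHistoryWithSubs
//   </sql-query>
// The procedure can then be called and materialized into HistoryView objects:
IList<HistoryView> rows = session
    .GetNamedQuery("GetHistoryWithSubs")
    .List<HistoryView>();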

Related

SQL Server - Dynamic Pivot using existing table

I have the following query
SELECT CONCAT(RIGHT('00' + CONVERT(VARCHAR(2),Procedures.SeriesNum),2),'-',RIGHT('0000' + CONVERT(VARCHAR(4),Procedures.ProcNum),4)) AS 'Procedure',
Procedures.Description,
Procedures.CurrentRev,
Procedures.DayToDayLevel,
Procedures.MaxLevel,
Users.Username,
UsersProcedures.RevTrained,
UsersProcedures.LevelTrained
FROM Procedures
CROSS JOIN Users
LEFT JOIN UsersProcedures ON UsersProcedures.Username = Users.Username AND Procedures.SeriesNum = UsersProcedures.SeriesNum AND Procedures.ProcNum = UsersProcedures.ProcNum
Which returns the following results:
However, I would like to use a pivot so that the info for each user (specifically the level they were trained at) is shown as a value in a column, with the column title being the username.
I would like to ignore CurrentRev and instead have one row for each revision of each procedure; if no users have been trained to the current revision (whether or not users have been trained to any previous revisions of that procedure), it should still appear as an empty row, like below:
I presume I need to use a pivot, although I've never attempted one before, plus the examples I've seen on the web seem to use a static list for the pivoted columns, whereas I want to use all records in the Users table.
I should point out, when it comes to displaying the results in a datagrid, it'll be coloured for clarity.
How would I go about attempting this?
UPDATE: I've got as far as this query
SELECT *
FROM (SELECT CONCAT(RIGHT('00' + CONVERT(VARCHAR(2),Procedures.SeriesNum),2),'-',RIGHT('0000' + CONVERT(VARCHAR(4),Procedures.ProcNum),4)) AS 'Procedure',
Procedures.Description,
Procedures.CurrentRev,
UsersProcedures.RevTrained,
Users.Username,
UsersProcedures.LevelTrained FROM Procedures CROSS JOIN Users LEFT JOIN UsersProcedures ON UsersProcedures.Username = Users.Username AND Procedures.SeriesNum = UsersProcedures.SeriesNum AND Procedures.ProcNum = UsersProcedures.ProcNum) AS Procs
PIVOT
(
MAX(LevelTrained)
FOR Procs.Username
IN(User1,User2,User3)
) AS PivotTable
Which gives me this:
However, I'd still like to group the results further, similar to my 2nd screenshot, i.e. if at least 1 user has already been trained on a particular revision, then don't show another empty row for that revision as well.
Also if RevTrained is NULL, then show CurrentRev in its place.
OK, here's the finished solution, using a dynamic query to get all the users, and a CASE expression to solve the above problems:
If at least 1 user has already been trained on a particular revision, then don't show another empty row for that revision as well.
Also if RevTrained is NULL, then show CurrentRev in its place.
DECLARE @users AS VARCHAR(MAX), @query AS VARCHAR(MAX)
SELECT @users = STUFF((SELECT ',' + CONCAT('[',Username,']') FROM Users FOR XML PATH(''),TYPE).value('.', 'VARCHAR(MAX)'),1,1,'')
SET @query = 'SELECT *
FROM (SELECT CONCAT(RIGHT(''00'' + CONVERT(VARCHAR(2),Procedures.SeriesNum),2),''-'',RIGHT(''0000'' + CONVERT(VARCHAR(4),Procedures.ProcNum),4)) AS ''Procedure Number'',
CASE WHEN UsersProcedures.RevTrained IS NULL THEN Procedures.CurrentRev ELSE UsersProcedures.RevTrained END AS ''Revision'',
Procedures.Description,
Procedures.MaxLevel AS ''Maximum Training Level'',
Procedures.DayToDayLevel AS ''Training level for Day to Day Usage'',
Users.Username,
UsersProcedures.LevelTrained FROM Procedures CROSS JOIN Users LEFT JOIN UsersProcedures ON UsersProcedures.Username = Users.Username AND Procedures.SeriesNum = UsersProcedures.SeriesNum AND Procedures.ProcNum = UsersProcedures.ProcNum) AS Procs
PIVOT(
MAX(LevelTrained)
FOR Procs.Username
IN('+@users+')) AS PivotTable'
execute(@query);

Updating column based on three tables

I know it's very unprofessional, but it's our business system so I can't change it.
I have three tables: t_posList, t_url, t_type. The table t_posList has a column named URL which is also stored in the table t_url (the ID of the table t_url is not saved in t_posList so I have to find it like posList.Url = t_url.Url).
The column t_posList.status of every data row should be updated to 'non-customer' (it will be a status id, but let's keep it simple) if the ID of t_url can NOT be found in t_type.url_id.
So the query has two steps: first I have to get all of the data rows where t_posList.Url = t_url.Url. After this I have to check which IDs of the found t_url rows can NOT be found in t_type.url_id.
I really hope you know what I mean. Because our system is very unprofessional and my SQL knowledge is not that good, I'm not able to write this query.
EDIT: I tried this:
UPDATE t_poslist SET status = (
SELECT 'non-customer'
FROM t_url, t_type
WHERE url in
(select url from t_url
LEFT JOIN t_type ON t_url.ID = t_type.url_id
WHERE t_type.url_id is null)
)
What about this?
UPDATE p
SET status = 'non-customer'
FROM t_poslist p
INNER JOIN t_url u ON u.url = p.url
WHERE NOT EXISTS
(
SELECT * FROM t_type t WHERE t.url_id = u.ID
)

SQL Server: transforming query for materialized path traversing in a view

I would like to transform this query in a view that I can call passing the UserId as a parameter to select all groups (including subtrees of those groups) related to that user.
In fact I have 2 tables:
"EVA_Roles" which contains the roles tree, defined through a materialized path in the RoleHid field as varchar(8000)
"EVA_UsersInRoles" which related users to roles through the field UserId
Problem here is that only some roles may be related to the user in the EVA_UsersInRoles table, but maybe those roles are parents to other roles in the tree hierarchy so I have to retrieve multiple subtrees for each user.
Finally I came up with this query which seems to work fine, but I would like to transform it in a View. The problem I'm facing of course is that the UserId parameter, which is the one I would use to filter the view results, is inside the subquery.
Any hint to refactor this into a view?
SELECT A.RoleId, E.EndDate FROM EVA_Roles A INNER JOIN EVA_Roles B ON
A.RoleHid LIKE B.RoleHid + '%' AND B.RoleHid IN (SELECT RoleHid FROM EVA_Roles C
LEFT JOIN EVA_UsersInRoles D ON C.RoleId = D.RoleId WHERE
(D.Userid = @0 OR C.RoleId = @1) AND C.ApplicationId = @2)
LEFT JOIN EVA_UsersInRoles E ON A.RoleId = E.RoleId AND E.UserId = @0 WHERE
A.ApplicationId = @2 ORDER BY A.RoleId
I left the parameters where I would pass values to the view. I think it may be impossible to refactor into a view. It was just to exploit my micro-ORM (PetaPoco) in a friendlier way; otherwise I have to use the SQL in my code, but that's ok, don't lose your mind over this.
About the tables definition:
EVA_Roles
RoleId INT - Primary Key
RoleHid VARCHAR(800) - Here I store the materialized path of the tree using nodes
ids... An example on this later.
RoleLevel INT - Security level of the role
RoleName INT - Name of the role (admin, guest, ...)
ApplicationID INT - Id of the application (in a multi app scenario)
EVA_UsersInRoles
RoleId - INT
UserId - INT
The materialized path in RoleHid follows this logic. Consider this data where RoleId 2 is child of RoleId 1:
RoleId 1
RoleHid "1."
RoleId 2
RoleHid "1.2."
With the query above I'm able to retrieve all subtrees tied to a specific user and application.

PostgreSQL: Can one define a session variable with the language and use it in views?

Here's a simplified example of schema:
Table l10n ( l10n_id SMALLINT, code VARCHAR(5) )
Table product ( product_id INT, ..language-neutral columns.. )
Table product_l10n ( product_id INT, l10n_id SMALLINT, ..language-specific columns.. )
Querying for products with localized data is done like this:
SELECT *
FROM product a
LEFT JOIN product_l10n b ON b.product_id = a.product_id
LEFT JOIN l10n c ON c.l10n_id = b.l10n_id
WHERE c.code = 'en-US';
To avoid that big fat query, I would like to use views.
The basic idea is to create a view based on the above query without the where clause.
Querying for products would then become:
SELECT * FROM product_view WHERE code = 'en-US';
Another idea would be to have a variable containing the language tag, defined for each DB connection/session.
The view would be based on the first query using the variable in the where clause.
The variable being set in the current DB session, querying for products would then be as simple as this:
SELECT * FROM product_view;
So my question is: Can one do that? How?
It's feasible using custom variables in postgresql.conf. See the doc Customized Options.
In postgresql.conf:
custom_variable_classes = 'myproject'
myproject.l10n_id = 'en-US'
At the beginning of the DB session (the param is set at session level by default):
SET myproject.l10n_id = 'en-US';
In views:
WHERE c.code = current_setting('myproject.l10n_id')
But... I don't like having to define a variable for the whole server. Is there a way to achieve the same but on a per database basis?
Thanks in advance,
Pascal
PS: I've posted another question regarding using l10n_id as SMALLINT or directly as the ISO code in a VARCHAR(5). See http://stackoverflow.com/questions/1307087/how-to-store-language-tag-ids-in-databases-as-smallint-or-varchar (sorry, only 1 URL for new users :-)
Well, what exactly is the problem with making the variable server-wide? It doesn't influence any other connection/query, so it should be fine.
Another approach builds on the fact that each connection can be identified by its backend pid, which can be obtained with:
select pg_backend_pid();
So, you could create a table, with columns like:
backend_pid int4
variable_name text
variable_value text
with a primary key on (backend_pid, variable_name), and provide a set of functions that get and set the value, internally checking pg_backend_pid.
There is still the problem of what happens if a connection closes without "cleaning up" (removing all its variables) and a new connection later starts with the same backend pid, thus inheriting variables from the previous connection, but this is usually not very likely.
I thought a bit about it and wrote a blog post with the exact SQL that will create the table and functions required to make it work.

What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?

The "N+1 selects problem" is generally stated as a problem in Object-Relational mapping (ORM) discussions, and I understand that it has something to do with having to make a lot of database queries for something that seems simple in the object world.
Does anybody have a more detailed explanation of the problem?
Let's say you have a collection of Car objects (database rows), and each Car has a collection of Wheel objects (also rows). In other words, Car → Wheel is a 1-to-many relationship.
Now, let's say you need to iterate through all the cars, and for each one, print out a list of the wheels. The naive O/R implementation would do the following:
SELECT * FROM Cars;
And then for each Car:
SELECT * FROM Wheel WHERE CarId = ?
In other words, you have one select for the Cars, and then N additional selects, where N is the total number of cars.
Alternatively, one could get all wheels and perform the lookups in memory:
SELECT * FROM Wheel;
This reduces the number of round-trips to the database from N+1 to 2.
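As a rough sketch of that two-query approach (hypothetical Car/Wheel classes assumed; the two lists come from the two SELECTs above), the wheels can be grouped by CarId once and then attached to their cars in memory:
using System.Collections.Generic;
using System.Linq;

class Wheel { public int Id { get; set; } public int CarId { get; set; } }
class Car { public int Id { get; set; } public List<Wheel> Wheels { get; set; } }

static class TwoQueryExample
{
    // cars comes from "SELECT * FROM Cars", wheels from "SELECT * FROM Wheel":
    // two round-trips total, regardless of how many cars there are.
    public static void AttachWheels(List<Car> cars, List<Wheel> wheels)
    {
        var wheelsByCar = wheels.ToLookup(w => w.CarId);   // group once
        foreach (var car in cars)
            car.Wheels = wheelsByCar[car.Id].ToList();
    }
}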
Most ORM tools give you several ways to prevent N+1 selects.
Reference: Java Persistence with Hibernate, chapter 13.
What is the N+1 query problem
The N+1 query problem happens when the data access framework executes N additional SQL statements to fetch the same data that could have been retrieved when executing the primary SQL query.
The larger the value of N, the more queries will be executed, and the larger the performance impact. And, unlike the slow query log that can help you find slow-running queries, the N+1 issue won't be spotted because each individual additional query runs sufficiently fast not to trigger the slow query log.
The problem is executing a large number of additional queries that, overall, take sufficient time to slow down response time.
Let’s consider we have the following post and post_comments database tables which form a one-to-many table relationship:
We are going to create the following 4 post rows:
INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence - Part 1', 1)
INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence - Part 2', 2)
INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence - Part 3', 3)
INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence - Part 4', 4)
And, we will also create 4 post_comment child records:
INSERT INTO post_comment (post_id, review, id)
VALUES (1, 'Excellent book to understand Java Persistence', 1)
INSERT INTO post_comment (post_id, review, id)
VALUES (2, 'Must-read for Java developers', 2)
INSERT INTO post_comment (post_id, review, id)
VALUES (3, 'Five Stars', 3)
INSERT INTO post_comment (post_id, review, id)
VALUES (4, 'A great reference book', 4)
N+1 query problem with plain SQL
If you select the post_comments using this SQL query:
List<Tuple> comments = entityManager.createNativeQuery("""
SELECT
pc.id AS id,
pc.review AS review,
pc.post_id AS postId
FROM post_comment pc
""", Tuple.class)
.getResultList();
And, later, you decide to fetch the associated post title for each post_comment:
for (Tuple comment : comments) {
String review = (String) comment.get("review");
Long postId = ((Number) comment.get("postId")).longValue();
String postTitle = (String) entityManager.createNativeQuery("""
SELECT
p.title
FROM post p
WHERE p.id = :postId
""")
.setParameter("postId", postId)
.getSingleResult();
LOGGER.info(
"The Post '{}' got this review '{}'",
postTitle,
review
);
}
You are going to trigger the N+1 query issue because, instead of one SQL query, you executed 5 (1 + 4):
SELECT
pc.id AS id,
pc.review AS review,
pc.post_id AS postId
FROM post_comment pc
SELECT p.title FROM post p WHERE p.id = 1
-- The Post 'High-Performance Java Persistence - Part 1' got this review
-- 'Excellent book to understand Java Persistence'
SELECT p.title FROM post p WHERE p.id = 2
-- The Post 'High-Performance Java Persistence - Part 2' got this review
-- 'Must-read for Java developers'
SELECT p.title FROM post p WHERE p.id = 3
-- The Post 'High-Performance Java Persistence - Part 3' got this review
-- 'Five Stars'
SELECT p.title FROM post p WHERE p.id = 4
-- The Post 'High-Performance Java Persistence - Part 4' got this review
-- 'A great reference book'
Fixing the N+1 query issue is very easy. All you need to do is extract all the data you need in the original SQL query, like this:
List<Tuple> comments = entityManager.createNativeQuery("""
SELECT
pc.id AS id,
pc.review AS review,
p.title AS postTitle
FROM post_comment pc
JOIN post p ON pc.post_id = p.id
""", Tuple.class)
.getResultList();
for (Tuple comment : comments) {
String review = (String) comment.get("review");
String postTitle = (String) comment.get("postTitle");
LOGGER.info(
"The Post '{}' got this review '{}'",
postTitle,
review
);
}
This time, only one SQL query is executed to fetch all the data we are further interested in using.
N+1 query problem with JPA and Hibernate
When using JPA and Hibernate, there are several ways you can trigger the N+1 query issue, so it’s very important to know how you can avoid these situations.
For the next examples, consider we are mapping the post and post_comments tables to the following entities:
The JPA mappings look like this:
#Entity(name = "Post")
#Table(name = "post")
public class Post {
#Id
private Long id;
private String title;
//Getters and setters omitted for brevity
}
#Entity(name = "PostComment")
#Table(name = "post_comment")
public class PostComment {
#Id
private Long id;
#ManyToOne
private Post post;
private String review;
//Getters and setters omitted for brevity
}
FetchType.EAGER
Using FetchType.EAGER either implicitly or explicitly for your JPA associations is a bad idea because you are going to fetch way more data than you need. What's more, the FetchType.EAGER strategy is also prone to N+1 query issues.
Unfortunately, the @ManyToOne and @OneToOne associations use FetchType.EAGER by default, so if your mappings look like this:
@ManyToOne
private Post post;
You are using the FetchType.EAGER strategy, and, every time you forget to use JOIN FETCH when loading some PostComment entities with a JPQL or Criteria API query:
List<PostComment> comments = entityManager
.createQuery("""
select pc
from PostComment pc
""", PostComment.class)
.getResultList();
You are going to trigger the N+1 query issue:
SELECT
pc.id AS id1_1_,
pc.post_id AS post_id3_1_,
pc.review AS review2_1_
FROM
post_comment pc
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 1
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 2
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 3
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 4
Notice the additional SELECT statements that are executed because the post association has to be fetched prior to returning the List of PostComment entities.
Unlike the default fetch plan, which you are using when calling the find method of the EntityManager, a JPQL or Criteria API query defines an explicit plan that Hibernate cannot change by injecting a JOIN FETCH automatically. So, you need to do it manually.
If you don't need the post association at all, you are out of luck when using FetchType.EAGER because there is no way to avoid fetching it. That's why it's better to use FetchType.LAZY by default.
But if you do want to use the post association, then you can use JOIN FETCH to avoid the N+1 query problem:
List<PostComment> comments = entityManager.createQuery("""
select pc
from PostComment pc
join fetch pc.post p
""", PostComment.class)
.getResultList();
for(PostComment comment : comments) {
LOGGER.info(
"The Post '{}' got this review '{}'",
comment.getPost().getTitle(),
comment.getReview()
);
}
This time, Hibernate will execute a single SQL statement:
SELECT
pc.id as id1_1_0_,
pc.post_id as post_id3_1_0_,
pc.review as review2_1_0_,
p.id as id1_0_1_,
p.title as title2_0_1_
FROM
post_comment pc
INNER JOIN
post p ON pc.post_id = p.id
-- The Post 'High-Performance Java Persistence - Part 1' got this review
-- 'Excellent book to understand Java Persistence'
-- The Post 'High-Performance Java Persistence - Part 2' got this review
-- 'Must-read for Java developers'
-- The Post 'High-Performance Java Persistence - Part 3' got this review
-- 'Five Stars'
-- The Post 'High-Performance Java Persistence - Part 4' got this review
-- 'A great reference book'
FetchType.LAZY
Even if you switch to using FetchType.LAZY explicitly for all associations, you can still bump into the N+1 issue.
This time, the post association is mapped like this:
@ManyToOne(fetch = FetchType.LAZY)
private Post post;
Now, when you fetch the PostComment entities:
List<PostComment> comments = entityManager
.createQuery("""
select pc
from PostComment pc
""", PostComment.class)
.getResultList();
Hibernate will execute a single SQL statement:
SELECT
pc.id AS id1_1_,
pc.post_id AS post_id3_1_,
pc.review AS review2_1_
FROM
post_comment pc
But, if afterward, you are going to reference the lazy-loaded post association:
for(PostComment comment : comments) {
LOGGER.info(
"The Post '{}' got this review '{}'",
comment.getPost().getTitle(),
comment.getReview()
);
}
You will get the N+1 query issue:
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 1
-- The Post 'High-Performance Java Persistence - Part 1' got this review
-- 'Excellent book to understand Java Persistence'
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 2
-- The Post 'High-Performance Java Persistence - Part 2' got this review
-- 'Must-read for Java developers'
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 3
-- The Post 'High-Performance Java Persistence - Part 3' got this review
-- 'Five Stars'
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 4
-- The Post 'High-Performance Java Persistence - Part 4' got this review
-- 'A great reference book'
Because the post association is fetched lazily, a secondary SQL statement will be executed when accessing the lazy association in order to build the log message.
Again, the fix consists in adding a JOIN FETCH clause to the JPQL query:
List<PostComment> comments = entityManager.createQuery("""
select pc
from PostComment pc
join fetch pc.post p
""", PostComment.class)
.getResultList();
for(PostComment comment : comments) {
LOGGER.info(
"The Post '{}' got this review '{}'",
comment.getPost().getTitle(),
comment.getReview()
);
}
And, just like in the FetchType.EAGER example, this JPQL query will generate a single SQL statement.
Even if you are using FetchType.LAZY and don't reference the child association of a bidirectional @OneToOne JPA relationship, you can still trigger the N+1 query issue.
How to automatically detect the N+1 query issue
If you want to automatically detect N+1 query issue in your data access layer, you can use the db-util open-source project.
First, you need to add the following Maven dependency:
<dependency>
<groupId>com.vladmihalcea</groupId>
<artifactId>db-util</artifactId>
<version>${db-util.version}</version>
</dependency>
Afterward, you just have to use SQLStatementCountValidator utility to assert the underlying SQL statements that get generated:
SQLStatementCountValidator.reset();
List<PostComment> comments = entityManager.createQuery("""
select pc
from PostComment pc
""", PostComment.class)
.getResultList();
SQLStatementCountValidator.assertSelectCount(1);
In case you are using FetchType.EAGER and run the above test case, you will get the following test case failure:
SELECT
pc.id as id1_1_,
pc.post_id as post_id3_1_,
pc.review as review2_1_
FROM
post_comment pc
SELECT p.id as id1_0_0_, p.title as title2_0_0_ FROM post p WHERE p.id = 1
SELECT p.id as id1_0_0_, p.title as title2_0_0_ FROM post p WHERE p.id = 2
-- SQLStatementCountMismatchException: Expected 1 statement(s) but recorded 3 instead!
One alternative is a single query that joins the parent and child tables:
SELECT
    table1.*,
    table2.*
FROM table1
INNER JOIN table2 ON table2.SomeFkId = table1.SomeId
That gets you a result set where child rows in table2 cause duplication by returning the table1 results for each child row in table2. O/R mappers should differentiate table1 instances based on a unique key field, then use all the table2 columns to populate child instances.
SELECT table1.* FROM table1
SELECT table2.* FROM table2 WHERE SomeFkId = #
The N+1 is where the first query populates the primary object and the second query populates all the child objects for each of the unique primary objects returned.
Consider:
class House
{
    public int Id { get; set; }
    public string Address { get; set; }
    public Person[] Inhabitants { get; set; }
}

class Person
{
    public string Name { get; set; }
    public int HouseId { get; set; }
}
and tables with a similar structure. A single query for the address "22 Valley St" may return:
Id Address Name HouseId
1 22 Valley St Dave 1
1 22 Valley St John 1
1 22 Valley St Mike 1
The O/RM should fill an instance of House with Id=1, Address="22 Valley St", and then populate the Inhabitants array with Person instances for Dave, John, and Mike, with just one query.
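A rough sketch of that mapping step (assuming the House/Person properties above are public and that the joined rows have already been read into a flat list; uses System.Linq and System.Collections.Generic):
// rows: one element per joined row, e.g. (1, "22 Valley St", "Dave"), ...
static List<House> MapHouses(IEnumerable<(int Id, string Address, string Name)> rows)
{
    return rows
        .GroupBy(r => new { r.Id, r.Address })     // one group per distinct house key
        .Select(g => new House
        {
            Id = g.Key.Id,
            Address = g.Key.Address,
            Inhabitants = g.Select(r => new Person { Name = r.Name, HouseId = g.Key.Id }).ToArray()
        })
        .ToList();
}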
An N+1 query for the same address used above would result in:
Id Address
1 22 Valley St
with a separate query like
SELECT * FROM Person WHERE HouseId = 1
and resulting in a separate data set like
Name HouseId
Dave 1
John 1
Mike 1
and the final result being the same as above with the single query.
The advantage of the single select is that you get all the data up front, which may be what you ultimately desire. The advantage of N+1 is that query complexity is reduced and you can use lazy loading, where the child result sets are only loaded upon first request.
Consider a Supplier with a one-to-many relationship with Product. One Supplier has (supplies) many Products.
***** Table: Supplier *****
+-----+-------------------+
| ID | NAME |
+-----+-------------------+
| 1 | Supplier Name 1 |
| 2 | Supplier Name 2 |
| 3 | Supplier Name 3 |
| 4 | Supplier Name 4 |
+-----+-------------------+
***** Table: Product *****
+-----+-----------+--------------------+-------+------------+
| ID | NAME | DESCRIPTION | PRICE | SUPPLIERID |
+-----+-----------+--------------------+-------+------------+
|1 | Product 1 | Name for Product 1 | 2.0 | 1 |
|2 | Product 2 | Name for Product 2 | 22.0 | 1 |
|3 | Product 3 | Name for Product 3 | 30.0 | 2 |
|4 | Product 4 | Name for Product 4 | 7.0 | 3 |
+-----+-----------+--------------------+-------+------------+
Factors:
Lazy mode for Supplier set to “true” (default)
Fetch mode used for querying on Product is Select
Fetch mode (default): Supplier information is accessed
Caching does not play a role for the first time the Supplier is accessed
Fetch mode is Select Fetch (default)
// It takes Select fetch mode as a default
Query query = session.createQuery("from Product p");
List list = query.list();
// Supplier is being accessed
displayProductsListWithSupplierName(list);
select ... various field names ... from PRODUCT
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
Result:
1 select statement for Product
N select statements for Supplier
This is the N+1 select problem!
I can't comment directly on other answers, because I don't have enough reputation. But it's worth noting that the problem essentially only arises because, historically, a lot of DBMSes have been quite poor when it comes to handling joins (MySQL being a particularly noteworthy example). So n+1 has often been notably faster than a join. And then there are ways to improve on n+1 but still without needing a join, which is what the original problem relates to.
However, MySQL is now a lot better than it used to be when it comes to joins. When I first learned MySQL, I used joins a lot. Then I discovered how slow they are, and switched to n+1 in the code instead. But, recently, I've been moving back to joins, because MySQL is now a heck of a lot better at handling them than it was when I first started using it.
These days, a simple join on a properly indexed set of tables is rarely a problem, in performance terms. And if it does give a performance hit, the use of index hints often solves it.
This is discussed here by one of the MySQL development team:
http://jorgenloland.blogspot.co.uk/2013/02/dbt-3-q3-6-x-performance-in-mysql-5610.html
So the summary is: If you've been avoiding joins in the past because of MySQL's abysmal performance with them, then try again on the latest versions. You'll probably be pleasantly surprised.
We moved away from the ORM in Django because of this problem. Basically, if you try and do
for p in person:
print p.car.colour
The ORM will happily return all people (typically as instances of a Person object), but then it will need to query the car table for each Person.
A simple and very effective approach to this is something I call "fanfolding", which avoids the nonsensical idea that query results from a relational database should map back to the original tables from which the query is composed.
Step 1: Wide select
select * from people_car_colour; # this is a view or sql function
This will return something like
p.id | p.name | p.telno | car.id | car.type | car.colour
-----+--------+---------+--------+----------+-----------
2 | jones | 2145 | 77 | ford | red
2 | jones | 2145 | 1012 | toyota | blue
16 | ashby | 124 | 99 | bmw | yellow
Step 2: Objectify
Suck the results into a generic object creator with an argument to split after the third item. This means that the "jones" object won't be made more than once.
Step 3: Render
for p in people:
print p.car.colour # no more car queries
See this web page for an implementation of fanfolding for python.
The good short explanation of the problem can be found in the Phabricator documentation:
The N+1 query problem is a common performance antipattern. It looks like this:
$cats = load_cats();
foreach ($cats as $cat) {
$cats_hats = load_hats_for_cat($cat);
// ...
}
Assuming load_cats() has an implementation that boils down to:
SELECT * FROM cat WHERE ...
..and load_hats_for_cat($cat) has an implementation something like this:
SELECT * FROM hat WHERE catID = ...
..you will issue "N+1" queries when the code executes, where N is the number of cats:
SELECT * FROM cat WHERE ...
SELECT * FROM hat WHERE catID = 1
SELECT * FROM hat WHERE catID = 2
SELECT * FROM hat WHERE catID = 3
SELECT * FROM hat WHERE catID = 4
...
Solution:
It is much faster to issue 1 query which returns 100 results than to
issue 100 queries which each return 1 result.
Load all your data before iterating through it.
Suppose you have COMPANY and EMPLOYEE. COMPANY has many EMPLOYEES (i.e. EMPLOYEE has a field COMPANY_ID).
In some O/R configurations, when you have a mapped Company object and go to access its Employee objects, the O/R tool will do one select for every employee, whereas if you were just doing things in straight SQL, you could SELECT * FROM employees WHERE company_id = XX. Thus N (# of employees) plus 1 (company).
This is how the initial versions of EJB Entity Beans worked. I believe things like Hibernate have done away with this, but I'm not too sure. Most tools usually include info as to their strategy for mapping.
Here's a good description of the problem
Now that you understand the problem it can typically be avoided by doing a join fetch in your query. This basically forces the fetch of the lazy loaded object so the data is retrieved in one query instead of n+1 queries. Hope this helps.
Check Ayende post on the topic: Combating the Select N + 1 Problem In NHibernate.
Basically, when using an ORM like NHibernate or EntityFramework, if you have a one-to-many (master-detail) relationship, and want to list all the details per each master record, you have to make N + 1 query calls to the database, "N" being the number of master records: 1 query to get all the master records, and N queries, one per master record, to get all the details per master record.
More database query calls → more latency time → decreased application/database performance.
However, ORMs have options to avoid this problem, mainly using JOINs.
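For example, with NHibernate's QueryOver API the join can be requested explicitly (a sketch with hypothetical Master/Detail entities; the exact fetch syntax varies between NHibernate versions):
// One joined query instead of 1 + N:
var masters = session.QueryOver<Master>()
    .Fetch(m => m.Details).Eager   // ask NHibernate to join-fetch the child collection
    .List();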
In my opinion, the article Hibernate Pitfall: Why Relationships Should Be Lazy is exactly the opposite of what the real N+1 issue is.
If you need the correct explanation, please refer to Hibernate - Chapter 19: Improving Performance - Fetching Strategies:
Select fetching (the default) is extremely vulnerable to N+1 selects problems, so we might want to enable join fetching.
The supplied link has a very simple example of the n + 1 problem. If you apply it to Hibernate, it's basically talking about the same thing. When you query for an object, the entity is loaded, but any associations (unless configured otherwise) will be lazy loaded. Hence one query for the root objects and another query to load the associations for each of these. 100 objects returned means one initial query and then 100 additional queries to get the association for each: n + 1.
http://pramatr.com/2009/02/05/sql-n-1-selects-explained/
N+1 select issue is a pain, and it makes sense to detect such cases in unit tests.
I have developed a small library for verifying the number of queries executed by a given test method or just an arbitrary block of code - JDBC Sniffer
Just add a special JUnit rule to your test class and place an annotation with the expected number of queries on your test methods:
@Rule
public final QueryCounter queryCounter = new QueryCounter();

@Expectation(atMost = 3)
@Test
public void testInvokingDatabase() {
    // your JDBC or JPA code
}
N+1 problem in Hibernate & Spring Data JPA
The N+1 problem is a performance issue in Object Relational Mapping that fires multiple select queries (N+1, to be exact, where N = number of records in the table) against the database for a single select query at the application layer. Hibernate & Spring Data JPA provide multiple ways to catch and address this performance problem.
What is N+1 Problem?
To understand the N+1 problem, let's consider a scenario. Let's say we have a collection of User objects mapped to the DB_USER table in the database, and each user has a collection of Role objects mapped to the DB_ROLE table via a join table DB_USER_ROLE. At the ORM level, a User has a many-to-many relationship with Role.
Entity Model
@Entity
@Table(name = "DB_USER")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String name;

    @ManyToMany(fetch = FetchType.LAZY)
    private Set<Role> roles;

    //Getter and Setters
}

@Entity
@Table(name = "DB_ROLE")
public class Role {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String name;

    //Getter and Setters
}
A user can have many roles. Roles are loaded lazily. Now let's say we want to fetch all users from this table and print the roles for each one. A very naive Object Relational implementation could be:
UserRepository with findAllBy method
public interface UserRepository extends CrudRepository<User, Long> {
List<User> findAllBy();
}
The equivalent SQL queries executed by the ORM will be:
First, get all users (1 query):
Select * from DB_USER;
Then, get the roles for each user, executed N times (where N is the number of users):
Select * from DB_USER_ROLE where userid = <userid>;
So we need one select for User and N additional selects for fetching the roles of each user, where N is the total number of users. This is the classic N+1 problem in ORM.
How to identify it?
Hibernate provides a tracing option that enables SQL logging in the console/logs. Using the logs, you can easily see whether Hibernate is issuing N+1 queries for a given call.
If you see multiple SQL entries for a given select query, then there is a high chance that it's due to the N+1 problem.
N+1 Resolution
At the SQL level, what the ORM needs to do to avoid N+1 is to fire a query that joins the two tables and gets the combined results in a single query: a fetch join that retrieves everything (user and roles) at once. In plain SQL:
select user0_.id, role2_.id, user0_.name, role2_.name, roles1_.user_id, roles1_.roles_id from db_user user0_ left outer join db_user_roles roles1_ on user0_.id=roles1_.user_id left outer join db_role role2_ on roles1_.roles_id=role2_.id
Hibernate & Spring Data JPA provide mechanisms to solve the N+1 ORM issue.
1. Spring Data JPA Approach:
If we are using Spring Data JPA, then we have two options to achieve this: using an entity graph, or using a select query with a fetch join.
public interface UserRepository extends CrudRepository<User, Long> {

    List<User> findAllBy();

    @Query("SELECT p FROM User p LEFT JOIN FETCH p.roles")
    List<User> findWithoutNPlusOne();

    @EntityGraph(attributePaths = {"roles"})
    List<User> findAll();
}
With findWithoutNPlusOne, the N+1 queries are avoided at the database level using a left join fetch; with findAll and the @EntityGraph attributePaths, Spring Data JPA likewise avoids the N+1 problem.
2. Hibernate Approach:
If it's pure Hibernate, then the following solutions will work.
Using HQL:
from User u join fetch u.roles roles
Using Criteria API:
Criteria criteria = session.createCriteria(User.class);
criteria.setFetchMode("roles", FetchMode.EAGER);
All these approaches work similarly and issue a similar database query with a left join fetch.
The issue, as others have stated more elegantly, is that you either have a Cartesian product of the OneToMany columns or you're doing N+1 selects: either a possibly gigantic result set or chattiness with the database, respectively.
I'm surprised this isn't mentioned, but this is how I have gotten around the issue... I make a semi-temporary ids table. I also do this when you hit the IN () clause limitation.
This doesn't work for all cases (probably not even a majority), but it works particularly well if you have a lot of child objects such that the Cartesian product would get out of hand (i.e. with lots of OneToMany columns the number of results is a multiplication of the columns) and it's more of a batch-like job.
First you insert your parent object ids as batch into an ids table.
This batch_id is something we generate in our app and hold onto.
INSERT INTO temp_ids
(product_id, batch_id)
(SELECT p.product_id, ?
FROM product p ORDER BY p.product_id
LIMIT ? OFFSET ?);
Now for each OneToMany column you just do a SELECT on the ids table INNER JOINing the child table with a WHERE batch_id= (or vice versa). You just want to make sure you order by the id column as it will make merging result columns easier (otherwise you will need a HashMap/Table for the entire result set which may not be that bad).
Then you just periodically clean the ids table.
This also works particularly well if the user selects say 100 or so distinct items for some sort of bulk processing. Put the 100 distinct ids in the temporary table.
Now the number of queries you are doing is determined by the number of OneToMany columns.
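For illustration, the per-collection child query described above would look something like this (product_child is a stand-in for whichever OneToMany table you are loading; @batchId is the application-generated batch id):
// One such query per OneToMany column; ordering by the parent id makes merging easier.
const string childSql = @"
    SELECT c.*
    FROM temp_ids t
    INNER JOIN product_child c ON c.product_id = t.product_id
    WHERE t.batch_id = @batchId
    ORDER BY c.product_id;";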
Without going into tech-stack implementation details, architecturally speaking there are at least two solutions to the N+1 problem:
Have only 1 big query with joins. This makes a lot of information travel from the database to the application layer, especially if there are multiple child records, and the typical result of a database query is a set of rows, not a graph of objects (there are solutions to that in different DB systems).
Have two (or more, if more children need to be joined) queries: one for the parent, and, after you have the parents, query the children by IDs and map them. This will minimize data transfer between the DB and APP layers (see the sketch below).
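A minimal sketch of that second option, assuming hypothetical Parent/Child tables and classes and a Dapper-style connection (Dapper expands the IN @ids list parameter automatically):
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Dapper;

class Parent { public int Id { get; set; } public string Name { get; set; } public List<Child> Children { get; set; } }
class Child { public int Id { get; set; } public int ParentId { get; set; } public string Name { get; set; } }

static class TwoQueryLoader
{
    public static List<Parent> LoadParentsWithChildren(IDbConnection conn)
    {
        // Query 1: the parents.
        var parents = conn.Query<Parent>("SELECT Id, Name FROM Parent").ToList();

        // Query 2: all children for exactly those parents, in one round-trip.
        var children = conn.Query<Child>(
            "SELECT Id, ParentId, Name FROM Child WHERE ParentId IN @ids",
            new { ids = parents.Select(p => p.Id).ToArray() });

        // Map the children back onto their parents in memory.
        var byParent = children.ToLookup(c => c.ParentId);
        foreach (var p in parents)
            p.Children = byParent[p.Id].ToList();

        return parents;
    }
}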
A generalisation of N+1
The N+1 problem is an ORM-specific name for a problem where you move loops that could reasonably be executed on the server to the client. The generic problem isn't specific to ORMs; you can have it with any remote API. In this article, I've shown how JDBC round trips are very costly if you're calling an API N times instead of only 1 time. The difference in the example is whether you're calling the Oracle PL/SQL procedure:
dbms_output.get_lines (call it once, receive N items)
dbms_output.get_line (call it N times, receive 1 item each time)
They're logically equivalent, but due to the latency between server and client, you're adding N latency waits to your loop, instead of waiting only once.
The ORM case
In fact, the ORM-y N+1 problem isn't even ORM specific either, you can achieve it by running your own queries manually as well, e.g. when you do something like this in PL/SQL:
-- This loop is executed once
for parent in (select * from parent) loop
-- This loop is executed N times
for child in (select * from child where parent_id = parent.id) loop
...
end loop;
end loop;
It would be much better to implement this using a join (in this case):
for rec in (
select *
from parent p
join child c on c.parent_id = p.id
)
loop
...
end loop;
Now, the loop is executed only once, and the logic of the loop has been moved from the client (PL/SQL) to the server (SQL), which can even optimise it differently, e.g. by running a hash join (O(N)) rather than a nested loop join (O(N log N) with index)
Auto-detecting N+1 problems
If you're using JDBC, you could use jOOQ as a JDBC proxy behind the scenes to auto-detect your N+1 problems. jOOQ's parser normalises your SQL queries and caches data about consecutive executions of parent and child queries. This even works if your queries aren't exactly the same, but semantically equivalent.
Take Matt Solnit's example: imagine that you define an association between Car and Wheels as LAZY and you need some Wheels fields. This means that after the first select, Hibernate is going to do "Select * from Wheels where car_id = :id" FOR EACH Car.
This makes the first select plus 1 more select for each of the N cars, which is why it's called the N+1 problem.
To avoid this, make the association fetch eager, so that Hibernate loads the data with a join.
But be careful: if you often don't access the associated Wheels, it's better to keep it LAZY or change the fetch type with Criteria.
The N+1 SELECT problem is really hard to spot, especially in projects with a large domain, up to the moment when it starts degrading performance. Even if the problem is fixed, e.g. by adding eager loading, further development may break the solution and/or introduce the N+1 SELECT problem again in other places.
I've created the open-source library jplusone to address those problems in JPA-based Spring Boot Java applications. The library provides two major features:
Generates reports correlating SQL statements with the executions of the JPA operations which triggered them and the places in your application's source code which were involved in them
2020-10-22 18:41:43.236 DEBUG 14913 --- [ main] c.a.j.core.report.ReportGenerator :
ROOT
com.adgadev.jplusone.test.domain.bookshop.BookshopControllerTest.shouldGetBookDetailsLazily(BookshopControllerTest.java:65)
com.adgadev.jplusone.test.domain.bookshop.BookshopController.getSampleBookUsingLazyLoading(BookshopController.java:31)
com.adgadev.jplusone.test.domain.bookshop.BookshopService.getSampleBookDetailsUsingLazyLoading [PROXY]
SESSION BOUNDARY
OPERATION [IMPLICIT]
com.adgadev.jplusone.test.domain.bookshop.BookshopService.getSampleBookDetailsUsingLazyLoading(BookshopService.java:35)
com.adgadev.jplusone.test.domain.bookshop.Author.getName [PROXY]
com.adgadev.jplusone.test.domain.bookshop.Author [FETCHING ENTITY]
STATEMENT [READ]
select [...] from
author author0_
left outer join genre genre1_ on author0_.genre_id=genre1_.id
where
author0_.id=1
OPERATION [IMPLICIT]
com.adgadev.jplusone.test.domain.bookshop.BookshopService.getSampleBookDetailsUsingLazyLoading(BookshopService.java:36)
com.adgadev.jplusone.test.domain.bookshop.Author.countWrittenBooks(Author.java:53)
com.adgadev.jplusone.test.domain.bookshop.Author.books [FETCHING COLLECTION]
STATEMENT [READ]
select [...] from
book books0_
where
books0_.author_id=1
Provides an API which allows you to write tests checking how effectively your application is using JPA (i.e. asserting the amount of lazy loading operations)
@SpringBootTest
class LazyLoadingTest {

    @Autowired
    private JPlusOneAssertionContext assertionContext;

    @Autowired
    private SampleService sampleService;

    @Test
    public void shouldBusinessCheckOperationAgainstJPlusOneAssertionRule() {
        JPlusOneAssertionRule rule = JPlusOneAssertionRule
                .within().lastSession()
                .shouldBe().noImplicitOperations().exceptAnyOf(exclusions -> exclusions
                        .loadingEntity(Author.class).times(atMost(2))
                        .loadingCollection(Author.class, "books")
                );

        // trigger business operation which you wish to be asserted against the rule,
        // i.e. calling a service or sending request to your API controller
        sampleService.executeBusinessOperation();

        rule.check(assertionContext);
    }
}
