I'm relatively new to JPA, and I have an architectural question.
Let's say I have tables EMPLOYEE and DEPARTMENT with many to one relationship (i.e. many employees work for one department):
EMPLOYEE
    EMPLOYEE_ID
    EMPLOYEE_NAME
    DEPARTMENT_ID
DEPARTMENT
    DEPARTMENT_ID
    DEPARTMENT_NAME
I can define proper entities for Employee and Department without any problem. However, in one view I would like to display a list of departments together with the number of employees working in each department, something like this:
SELECT D.DEPARTMENT_NAME,
(SELECT COUNT(*) FROM EMPLOYEE E WHERE E.DEPARTMENT_ID = D.DEPARTMENT_ID) NUMBER_OF_EMPLOYEES
FROM DEPARTMENT D
I'm just not sure what the right strategy is to accomplish this using JPA.
I don't want to always fetch the number of employees for the Department entity, as there is only one view where it is needed.
It looks like Hibernate's @Formula would be one possible approach, but as far as I know it is not part of the JPA standard.
You can create any object in your QL using the "new" syntax - your class just needs a constructor that takes the values returned by your query.
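For example, with a class like DepartmentEmployeeCount, with a constructor (COUNT returns a Long in JPQL, so the parameter is a Long):
public DepartmentEmployeeCount(String departmentName, Long employeeCount)
you could use QL something like this (in standard JPQL the class name after NEW must be fully qualified with its package):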
SELECT NEW DepartmentEmployeeCount(D.DEPARTMENT_NAME, count(E.id)) from Department D left join D.employees E GROUP BY D.DEPARTMENT_NAME
Or if you were just selecting the count(*) you could simply cast the query result to a Number.
Alternatively, to do the same without the DepartmentEmployeeCount class, you could leave out the NEW, so:
SELECT D.DEPARTMENT_NAME, count(E.id)
This would return a List<Object[]> where each list item was an array of 2 elements, departmentName and count.
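To see the shape of the rows such an aggregate query produces, here is a minimal sketch in Python, with an in-memory SQLite database standing in for the JPA provider. The table layout follows the question; the sample rows and the function name `department_counts` are made up for illustration.

```python
import sqlite3


def department_counts(conn):
    """Return (department_name, employee_count) rows, one per department."""
    return conn.execute(
        """
        SELECT d.department_name, COUNT(e.employee_id)
        FROM department d
        LEFT JOIN employee e ON e.department_id = d.department_id
        GROUP BY d.department_id, d.department_name
        ORDER BY d.department_name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        employee_name TEXT,
        department_id INTEGER REFERENCES department(department_id)
    );
    -- Hypothetical sample data: 'Legal' has no employees.
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Research'), (3, 'Legal');
    INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);
    """
)

rows = department_counts(conn)
for name, count in rows:
    print(name, count)
```

Counting `e.employee_id` rather than `*` is what lets a department without employees report 0: the LEFT JOIN still yields one row for it, but with a NULL employee id, which COUNT skips.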
To answer your later question in the comments, to populate all fields of a Department plus a transient employeeCount field, one suggestion would be to do 2 queries. This would still be more efficient than your original query (a subselect for each employee count).
So one query to read the departments
SELECT D from Department D
giving you a List<Department>
Then a 2nd query returning a temporary array:
SELECT D.DEPARTMENT_ID, count(E.id) from Department D left join D.employees E GROUP BY D.DEPARTMENT_ID
giving you a List<Object[]> with DEPARTMENT_ID and count in it.
Then you use the 2nd list to update the transient count property on your first list.
(You could try selecting into a Map to make this lookup easier, but I think that's a Hibernate feature).
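The merge step of the two-query approach can be sketched as follows. This is plain Python rather than JPA code; the `Department` class, its `employee_count` field and the sample lists are stand-ins for the entity with a transient property and for the two query results.

```python
from dataclasses import dataclass


@dataclass
class Department:
    department_id: int
    department_name: str
    employee_count: int = 0  # transient: filled in after loading, never persisted


def apply_counts(departments, count_rows):
    """Copy counts from (department_id, count) rows onto the matching departments."""
    counts = dict(count_rows)  # department_id -> count, for constant-time lookup
    for dept in departments:
        # A department missing from the count rows is treated as having no employees.
        dept.employee_count = counts.get(dept.department_id, 0)
    return departments


# Stand-ins for the results of the two queries.
departments = [Department(1, "Sales"), Department(2, "Research"), Department(3, "Legal")]
count_rows = [(1, 2), (2, 1), (3, 0)]

apply_counts(departments, count_rows)
```

Building the lookup dictionary once keeps the merge linear in the number of departments, instead of scanning the count list for every department.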
Option 1: I suggested this since you didn't like the constructor route MattR was suggesting. You mentioned the word "view" several times, and I know you were talking about the view to the user, but why not setup a view in your database that includes the computed columns and then create a read-only entity that maps to the computed columns?
Option 2: In response to your comment about not wanting to create a view. You could create a container object that holds the entity and the calculated column, then much like MattR suggests, you use a new in your select. Something like:
public class DepartmentInfo {
private Department department;
// JPQL COUNT returns a Long, so the constructor
// parameter has to accept one
private Long employeeCount;
public DepartmentInfo( Department d, Long count ) {
department = d;
employeeCount = count;
}
// getters and setters here
}
Then your select becomes
SELECT new my.package.DepartmentInfo( D,
(SELECT COUNT(*) FROM EMPLOYEE E WHERE E.DEPARTMENT_ID = D.DEPARTMENT_ID))
FROM DEPARTMENT D
With that you can use the DepartmentInfo to get the properties you are interested in.
You could create a member in your entity as an additional column, and then reference it with an alias in your query. The column name in the @Column annotation must match the alias.
Say, for your original query, you can add a countEmployees member as follows. Also add insertable=false and updatable=false so the entity manager won't try to include it in insert or update statements:
public class Department {
    @Column(name="DEPARTMENT_ID")
    Long departmentId;
    @Column(name="DEPARTMENT_NAME")
    String departmentName;
    @Column(name="countEmployees", insertable=false, updatable=false)
    Long countEmployees;
    //accessors omitted
}
And your query:
SELECT D.DEPARTMENT_NAME,
(SELECT COUNT(*) FROM EMPLOYEE E WHERE E.DEPARTMENT_ID = D.DEPARTMENT_ID) AS countEmployees
FROM DEPARTMENT D
This also applies when working with Spring Data JPA repositories.
We have a problem querying our database the way the ORM is meant to be used:
Tables:
employees  <1-n>  employee_card_validity  <n-1>  card  <1-n>  stamptimes
id                id                             id           id
                  employee_id                    no           card_id
                  card_id                                     timestamp
                  valid_from
                  valid_to
Employee is mapped onto Card via the EmployeeCardValidity Pivot which has additional attributes.
We reuse cards, which means that a card has multiple entries in the pivot table. Which entry applies is determined by valid_from/valid_to. These attributes are constrained not to overlap. That way there is always a unique relationship from Employee to Stamptimes, even though an Employee can have multiple cards and a card can belong to multiple Employees over time.
Where we fail is in defining a custom relationship from Employee to Stamptimes that takes this validity period into account. When I fetch a Stamptime, its timestamp assigns it unambiguously to one pivot entry of its Card, because it lies between that entry's valid_from and valid_to.
But I cannot define an appropriate relation that gives me all Stamptimes for a given Employee. The only thing I have so far is a static field in Employee that I use to limit the relationship to cards valid after a given time.
public static $date = '';
public function cardsX() {
return $this->belongsToMany('App\Models\Tempos\Card', 'employee_card_validity',
'employee_id', 'card_id')
->wherePivot('valid_from', '>', self::$date);
}
Then I would say in the Controller:
\App\Models\Tempos\Employee::$date = '2020-01-20 00:00:00';
$ags = DepartmentGroup::with(['departments.employees.cardsX.stamptimes'])
But I cannot do that dynamically depending on the actual query result as you could with sql:
SELECT ecv.card_id, employee_id, valid_from, valid_to, s.timestamp
FROM staff.employee_card_validity ecv
join staff.stamptimes s on s.card_id = ecv.card_id
and s.timestamp between valid_from and coalesce(valid_to , 'infinity'::timestamp)
where employee_id = ?
So my question is: is that database design unusual, or is an ORM just not capable of describing such relationships? Do I have to fall back to QueryBuilder/SQL in such cases?
Do you adapt your database model to the ORM, or the other way around?
You can try:
DB::query()->selectRaw('*')->from('employee_card_validity')
->join('stamptimes', function($join) {
return $join->on('employee_card_validity.card_id', '=', 'stamptimes.card_id')
->whereRaw('stamptimes.timestamp between employee_card_validity.valid_from and employee_card_validity.valid_to');
})->where('employee_id', ?)->get();
If you are on a Laravel version newer than 5.5, I believe you can define a model that extends the Pivot class, so:
EmployeeCardValidity::join('stamptimes', function($join) {
return $join->on('employee_card_validity.card_id', '=', 'stamptimes.card_id')
->whereRaw('stamptimes.timestamp between employee_card_validity.valid_from and employee_card_validity.valid_to');
})->where('employee_id', ?)->get();
But the code above only translates your SQL query. I could probably write something better if I knew your exact use cases.
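The validity-window join from the question can be tried out in isolation. The following is a minimal sketch in Python with an in-memory SQLite database rather than Laravel or PostgreSQL; the sample rows, the function name `stamptimes_for_employee` and the `FAR_FUTURE` constant (a stand-in for PostgreSQL's 'infinity') are assumptions made for illustration.

```python
import sqlite3

FAR_FUTURE = "9999-12-31 23:59:59"  # assumption: stands in for PostgreSQL's 'infinity'


def stamptimes_for_employee(conn, employee_id):
    """Return (card_id, timestamp) rows stamped while the card belonged to the employee."""
    return conn.execute(
        """
        SELECT s.card_id, s.timestamp
        FROM employee_card_validity ecv
        JOIN stamptimes s
          ON s.card_id = ecv.card_id
         AND s.timestamp BETWEEN ecv.valid_from AND COALESCE(ecv.valid_to, ?)
        WHERE ecv.employee_id = ?
        ORDER BY s.timestamp
        """,
        (FAR_FUTURE, employee_id),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_card_validity (
        id INTEGER PRIMARY KEY, employee_id INTEGER, card_id INTEGER,
        valid_from TEXT, valid_to TEXT
    );
    CREATE TABLE stamptimes (id INTEGER PRIMARY KEY, card_id INTEGER, timestamp TEXT);
    -- Card 7 belonged to employee 1 in January and to employee 2 from February on.
    INSERT INTO employee_card_validity VALUES
        (1, 1, 7, '2020-01-01 00:00:00', '2020-01-31 23:59:59'),
        (2, 2, 7, '2020-02-01 00:00:00', NULL);
    INSERT INTO stamptimes VALUES
        (1, 7, '2020-01-20 08:00:00'),
        (2, 7, '2020-02-03 08:05:00');
    """
)

first = stamptimes_for_employee(conn, 1)
second = stamptimes_for_employee(conn, 2)
```

Note the COALESCE on valid_to: without it, a card that is still assigned (valid_to is NULL) would match no stamp times at all, because a comparison with NULL never evaluates to true.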
If anyone could even just help me phrase this question better I'd appreciate it.
I have a SQL Server table, let's call it cars, which contains entries representing items and information about their owners including car_id, owner_accountNumber, owner_numCars.
We're using a system that sorts 'importantness of owner' based on number of cars owned, and relies on the owner_numCars column to do so. I'd rather not adjust this, if reasonably possible.
Is there a way I can update owner_numCars per owner_accountNumber using a stored procedure? Maybe some other more efficient way I can accomplish every owner_numCars containing the count of entries per owner_accountNumber?
Right now the only way I can think of to do this is (from the C# application):
SELECT owner_accountNumber, COUNT(*)
FROM mytable
GROUP BY owner_accountNumber;
and then foreach row returned by that query
UPDATE mytable
SET owner_numCars = <count result>
WHERE owner_accountNumber = <accountNumber result>
But this seems wildly inefficient compared to having the server handle the logic and updates.
Edit - Thanks for all the help. I know this isn't really a well set up database, but it's what I have to work with. I appreciate everyone's input and advice.
This solution takes into account that you want to keep the owner_numCars column in the CARs table and that the column should always be accurate in real time.
I'm defining table CARS as a table with attributes about cars, including each car's current owner. The number of cars owned by the current owner is de-normalized into this table. Say I, LAS, own three cars; then there are three entries in table CARS, as such:
car_id owner_accountNumber owner_numCars
1 LAS1 3
2 LAS1 3
3 LAS1 3
For owner_numCars to be used as an importance factor in a live interface, you'd need to update owner_numCars for every car every time LAS1 sells or buys a car or is removed from or added to a row.
Note you need to update CARS for both the old and new owners. If Sam buys car1, both Sam's and LAS' totals need to be updated.
You can use this procedure to update the rows. This SP is very context sensitive. It needs to be called after rows have been deleted or inserted for the deleted or inserted owner. When an owner is updated, it needs to be called for both the old and new owners.
To update real time as accounts change owners:
create procedure update_car_count
    @p_acct nvarchar(50) -- use your actual datatype here
AS
update CARS
set owner_numCars = (select count(*) from CARS where owner_accountNumber = @p_acct)
where owner_accountNumber = @p_acct;
GO
To update all account owners:
create procedure update_car_count_all
AS
update C
set owner_numCars = (select count(*) from CARS where owner_accountNumber = C.owner_accountNumber)
from CARS C
GO
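The rule that both the old and the new owner must be refreshed after a transfer can be sketched like this. It is Python with an in-memory SQLite database, not T-SQL; the helper `transfer_car` and the sample rows are hypothetical, while `update_car_count` mirrors the per-owner procedure.

```python
import sqlite3


def update_car_count(conn, account):
    """Recompute owner_numCars on every row that belongs to one owner."""
    conn.execute(
        """
        UPDATE cars
        SET owner_numCars = (SELECT COUNT(*) FROM cars WHERE owner_accountNumber = ?)
        WHERE owner_accountNumber = ?
        """,
        (account, account),
    )


def transfer_car(conn, car_id, new_owner):
    """Move a car to a new owner, then refresh the counts of both owners."""
    (old_owner,) = conn.execute(
        "SELECT owner_accountNumber FROM cars WHERE car_id = ?", (car_id,)
    ).fetchone()
    conn.execute(
        "UPDATE cars SET owner_accountNumber = ? WHERE car_id = ?", (new_owner, car_id)
    )
    update_car_count(conn, old_owner)
    update_car_count(conn, new_owner)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cars (
        car_id INTEGER PRIMARY KEY, owner_accountNumber TEXT, owner_numCars INTEGER
    );
    INSERT INTO cars VALUES (1, 'LAS1', 3), (2, 'LAS1', 3), (3, 'LAS1', 3);
    """
)

transfer_car(conn, 1, "SAM1")  # Sam buys car 1 from LAS
rows = conn.execute(
    "SELECT car_id, owner_accountNumber, owner_numCars FROM cars ORDER BY car_id"
).fetchall()
```

Skipping the call for the old owner would leave LAS1's remaining rows at 3, which is exactly the kind of stale value this design has to guard against.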
I think what you need is a view. In case you haven't used one: a view is a virtual table that displays or calculates data from a real table, and it always reflects the table's current data. So if you want to see your table with owner_numCars added you could do:
SELECT a.*, b.owner_numCars
from mytable as a
inner join
(SELECT owner_accountNumber, COUNT(*) as owner_numCars
FROM mytable
GROUP BY owner_accountNumber) as b
on a.owner_accountNumber = b.owner_accountNumber
You'd want to remove the owner_numCars column from the real table since you don't need to actually store that data on each row. If you can't remove it you can replace a.* with an explicit list of all the fields except owner_numCars.
You don't want to run SQL to update this value. What if the update doesn't run for a long time? What if someone loads a lot of data, then runs the score and finds that an owner with 100 cars counts as zero because the update didn't run? Data should live in only one place; updating a stored count makes it live in two. You want a view that computes this value from the tables whenever it is needed.
CREATE VIEW vOwnersInfo
AS
SELECT o.*,
ISNULL(c.Cnt,0) AS Cnt
FROM OWNERS o
LEFT JOIN
(SELECT OwnerId,
COUNT(1) AS Cnt
FROM Cars
GROUP BY OwnerId) AS c
ON o.OwnerId = c.OwnerId
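A quick way to convince yourself that a view never goes stale is to query it before and after a change. The sketch below uses Python with an in-memory SQLite database instead of SQL Server, so ISNULL becomes COALESCE; the table and column names and the sample rows are assumptions that loosely follow the view above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cars (
        car_id INTEGER PRIMARY KEY,
        owner_id INTEGER REFERENCES owners(owner_id)
    );
    -- The count is computed on every read; nothing is stored per row.
    CREATE VIEW v_owners_info AS
        SELECT o.owner_id, o.name, COALESCE(c.cnt, 0) AS cnt
        FROM owners o
        LEFT JOIN (SELECT owner_id, COUNT(*) AS cnt FROM cars GROUP BY owner_id) AS c
          ON o.owner_id = c.owner_id;
    INSERT INTO owners VALUES (1, 'LAS'), (2, 'Sam');
    INSERT INTO cars VALUES (1, 1), (2, 1);
    """
)


def owner_counts(conn):
    """Read (name, car count) for every owner through the view."""
    return conn.execute(
        "SELECT name, cnt FROM v_owners_info ORDER BY owner_id"
    ).fetchall()


before = owner_counts(conn)
conn.execute("INSERT INTO cars VALUES (3, 2)")  # Sam buys a car
after = owner_counts(conn)
```

No update statement touches a count here: the second read picks up Sam's new car simply because the view is re-evaluated.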
There are a lot of ways of doing this. Here is one way, using the COUNT() OVER window function and an updatable Common Table Expression (CTE). That way you won't have to worry about relating the data back, ids, etc.
;WITH cteCarCounts AS (
SELECT
owner_accountNumber
,owner_numCars
,NewNumberOfCars = COUNT(*) OVER (PARTITION BY owner_accountNumber)
FROM
MyTable
)
UPDATE cteCarCounts
SET owner_numCars = NewNumberOfCars
However, from a design perspective I would raise the question of whether this value (owner_numCars) should be on this table or on what I assume would be the owner table.
Rominus did make a good point about using a view if you want the data to always reflect the current value. You could also do it with a table-valued function, which could be more performant than a view. But if you are simply showing the value, then you could do something like this:
SELECT
owner_accountNumber
,owner_numCars = COUNT(*) OVER (PARTITION BY owner_accountNumber)
FROM
MyTable
By adding a where clause to either the CTE or the SELECT statement you will effectively limit your dataset and the solution should remain fast. E.g.
WHERE owner_accountNumber = @owner_accountNumber
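The read-only variant with a window function, including the optional filter, can be sketched as below. This uses Python with SQLite in place of SQL Server and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the function name `cars_with_counts` and the sample rows are made up.

```python
import sqlite3


def cars_with_counts(conn, account=None):
    """Return (car_id, owner, cars owned by that owner), optionally for one owner."""
    sql = """
        SELECT car_id,
               owner_accountNumber,
               COUNT(*) OVER (PARTITION BY owner_accountNumber) AS owner_numCars
        FROM mytable
    """
    params = ()
    if account is not None:
        # The filter is applied before the window is computed.
        sql += " WHERE owner_accountNumber = ?"
        params = (account,)
    sql += " ORDER BY car_id"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (car_id INTEGER PRIMARY KEY, owner_accountNumber TEXT);
    INSERT INTO mytable VALUES (1, 'LAS1'), (2, 'LAS1'), (3, 'SAM1');
    """
)

all_rows = cars_with_counts(conn)
las_rows = cars_with_counts(conn, "LAS1")
```

Filtering by the partition column is safe because all rows of that owner survive the filter; filtering on some other column would shrink the partitions and change the counts.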
I have 3 (hypothetical) tables.
Photos (a list of photos)
Attributes (things describing the photos)
PhotosToAttributes (a table to link the first 2)
I want to retrieve the Names of all the Photos that have a list of attributes.
For example, all photos that have both dark lighting and are portraits (AttributeID 1 and 2). Or, for example, all photos that have dark lighting, are portraits and were taken at a wedding (AttributeID 1 and 2 and 5). Or any arbitrary number of attributes.
The scale of the database will be maybe 10,000 rows in Photos, 100 Rows in Attributes and 100,000 rows in PhotosToAttributes.
This question: SQL: Many-To-Many table AND query is very close. (I think.) I also read the linked answers about performance. That leads to something like the following. But, how do I get Name instead of PhotoID? And presumably my code (C#) will build this query and adjust the attribute list and count as necessary?
SELECT PhotoID
FROM PhotosToAttributes
WHERE AttributeID IN (1, 2, 5)
GROUP by PhotoID
HAVING COUNT(1) = 3
I'm a bit database illiterate (it's been 20 years since I took a database class); I'm not even sure this is a good way to structure the tables. I wanted to be able to add new attributes and photos at will without changing the data access code.
It is probably a reasonable way to structure the database. An alternate would be to keep all the attributes as a delimited list in a varchar field, but that would lead to performance issues as you search the field.
Your code is close. To take it the final step, you just need to join the other two tables, like this:
Select p.Name, p.PhotoID
From Photos As p
Join PhotosToAttributes As pta On p.PhotoID = pta.PhotoID
Join Attributes As a On pta.AttributeID = a.AttributeID
Where a.Name In ('Dark Light', 'Portrait', 'Wedding')
Group By p.Name, p.PhotoID
Having Count(*) = 3;
By joining the Attributes table like that it means you can search for attributes by their name, instead of their ID.
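On the question of building the query from application code: a minimal sketch, in Python with SQLite rather than C#, of generating the IN list and the expected count from an arbitrary list of attribute ids. The function name `photos_with_all_attributes` and the sample rows are hypothetical, and the Attributes table is left out because the ids are matched directly.

```python
import sqlite3


def photos_with_all_attributes(conn, attribute_ids):
    """Return the names of photos that carry every attribute id in attribute_ids."""
    ids = sorted(set(attribute_ids))  # duplicates would break the count comparison
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # one bind parameter per id
    sql = f"""
        SELECT p.Name
        FROM Photos p
        JOIN PhotosToAttributes pta ON pta.PhotoID = p.PhotoID
        WHERE pta.AttributeID IN ({placeholders})
        GROUP BY p.PhotoID, p.Name
        HAVING COUNT(DISTINCT pta.AttributeID) = ?
        ORDER BY p.Name
    """
    return [name for (name,) in conn.execute(sql, (*ids, len(ids)))]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Photos (PhotoID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE PhotosToAttributes (PhotoID INTEGER, AttributeID INTEGER);
    INSERT INTO Photos VALUES (1, 'bride.jpg'), (2, 'street.jpg'), (3, 'cake.jpg');
    -- 1 = dark lighting, 2 = portrait, 5 = wedding
    INSERT INTO PhotosToAttributes VALUES (1, 1), (1, 2), (1, 5), (2, 1), (2, 2), (3, 5);
    """
)

dark_portraits = photos_with_all_attributes(conn, [1, 2])
dark_wedding_portraits = photos_with_all_attributes(conn, [1, 2, 5])
```

Only the placeholders are interpolated into the SQL text; the ids themselves are passed as bind parameters. COUNT(DISTINCT ...) keeps the result correct even if the link table contains a duplicate row.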
First, create a view from your joins:
create view vw_PhotosWithAttributes
as
select
p.PhotoId,
a.AttributeID,
p.Name PhotoName,
a.Name AttributeName
from Photos p
inner join PhotosToAttributes pa on p.PhotoId = pa.PhotoId
inner join Attributes a on a.AttributeID = pa.AttributeID
You can then easily query by attribute, name, or id, but don't forget to index the underlying fields properly.
Does anyone know a good approach using Entity Framework for the problem described below?
I am trying for our next release to come up with a performant way to show the placed orders for the logged on customer.
Of course paging is always a good technique to use when a lot of data is available, but I would like to see an answer that does not rely on paging.
Here's the story: a customer places an order which gets an orderstatus = PENDING. Depending on some strategy we move that order up the chain in order to get it APPROVED.
Every change of status is logged, so we can see a trace of statuses, and possibly an extra line of comment per status that gives valuable information to whoever sees this order in an interface.
So an Order is linked to a Customer. One order can have multiple order statuses stored in OrderStatusHistory.
In my test scenario I am using a customer that has 100+ Orders, each with about 5 records in the OrderStatusHistory table.
For now I would like to see all orders on one page, without paging, where for each Order I show the last relevant Status and the extra comment, if there is one for this last status. Both fields come from OrderStatusHistory, from the record with the highest Id for the given OrderId.
There are multiple scenarios I have tried, but I would like to see any potential other solutions or comments on the things I have already tried.
Trying to do Include() when getting Orders, but this still results in multiple queries launched on the database. Each order triggers an extra query to get all of its order statuses from the history table. So all statuses are queried instead of just the last relevant one, and 100 extra queries are launched for 100 orders. You can imagine the problem when there are 100000+ orders in the database.
Having 2 computed columns on the database: LastStatus, LastStatusInformation and a regular Linq-Query which gets those columns which are available through the Entity-model.
The problem with this approach is the fact that those computed columns are determined using a scalar function which can not be changed without removing the formula from the computed column, etc...
In the end I am very familiar with SQL and Stored procedures, but since the rest of the data-layer uses Entity Framework I would like to stick to it as long as possible, even though I have my doubts about performance.
Using the SQL approach I would write something like this:
WITH cte (RN, OrderId, [Status], Information)
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY Id DESC), OrderId, [Status], Information
FROM OrderStatus
)
SELECT o.Id, cte.[Status], cte.Information AS StatusInformation, o.* FROM [Order] o
INNER JOIN cte ON o.Id = cte.OrderId AND cte.RN = 1
WHERE CustomerId = @CustomerId
ORDER BY 1 DESC;
which returns all orders for the customer with the statusinformation provided by the Common Table Expression.
Does anyone know a good approach using Entity Framework?
Something like this should work as you want (make only 1 db call), but I didn't test it:
var result = from order in context.Orders
where order.CustomerId == customerId
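// LINQ to Entities in EF6 cannot translate Last(), so sort descending and take the first
let lastStatus = order.OrderStatusHistory.OrderByDescending(x => x.Id).FirstOrDefault()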
select new
{
//you can return the whole order if you need
//Order = order,
//or only the information you actually need to display
Number = order.Number,
Status = lastStatus.Status,
ExtraComment = lastStatus.ExtraComment,
};
This assumes your Order class looks something like this:
public class Order
{
public int Id { get; set; }
public int CustomerId { get; set; }
public string Number { get; set; }
...
public ICollection<OrderStatusHistory> OrderStatusHistory { get; set; }
}
If your Order class doesn't have something like an ICollection<OrderStatusHistory> OrderStatusHistory property then you need to do a join first. Let me know if that is the case and I will edit my answer to include the join.
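The single-query idea (one row per order, joined to its newest history record) can be checked outside Entity Framework. The following is a minimal sketch in Python with an in-memory SQLite database; the table is called Orders here because ORDER is a reserved word, and the sample rows and the function name `orders_with_last_status` are assumptions.

```python
import sqlite3


def orders_with_last_status(conn, customer_id):
    """Return (order id, number, last status, last comment) in a single query."""
    return conn.execute(
        """
        SELECT o.Id, o.Number, h.Status, h.ExtraComment
        FROM Orders o
        JOIN OrderStatusHistory h
          ON h.Id = (SELECT MAX(h2.Id)
                     FROM OrderStatusHistory h2
                     WHERE h2.OrderId = o.Id)
        WHERE o.CustomerId = ?
        ORDER BY o.Id DESC
        """,
        (customer_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Number TEXT);
    CREATE TABLE OrderStatusHistory (
        Id INTEGER PRIMARY KEY, OrderId INTEGER, Status TEXT, ExtraComment TEXT
    );
    INSERT INTO Orders VALUES (1, 10, 'A-001'), (2, 10, 'A-002'), (3, 11, 'B-001');
    INSERT INTO OrderStatusHistory VALUES
        (1, 1, 'PENDING', NULL),
        (2, 1, 'APPROVED', 'ok by manager'),
        (3, 2, 'PENDING', NULL),
        (4, 3, 'PENDING', NULL);
    """
)

result = orders_with_last_status(conn, 10)
```

An order without any history rows would drop out of this inner join; switch to a LEFT JOIN if such orders should still be listed.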
I'm creating a database to track my students' participation in classes. This is what I've set up so far. I'm working in Access 2007.
Participant Master table - name, contact info, enrolled class, enrolled semester. Enrolled class (Class A, Class B, Class C) and enrolled semester (Semester 1, Semester 2) are defined in tables. Primary key is an autoincrement number but students all get a school ID number (ParticipantID).
Query1 pulls name & address for students enrolled in class A, semester 2
(SELECT name, address FROM ParticipantMaster WHERE EnrClass = "Class A" and EnrSem = "Semester 2"). The query works.
DailySessionLog is a table to represent each daily class. It includes fields for date, instructor name (check from list), and discussion topic (check from list).
Now I want to link DailySessionLog to Query1 -- letting me check off every day whether a student was there for None, Partial, Half, or Full session that day. I'm having trouble linking these and creating a subform. Any help?
I tried having a ParticipantID field in DailySessionLog which I linked to ParticipantID in Query1. It doesn't recognize if it's a one:one or :many relationship. If I go ahead and create a subform using the Access wizard it treats the Participant data as the "higher" form and the DailySessionLog data as the "sub" form. I want it to be the other way around.
Thanks for helping!
To create a one-to-one or one-to-many relationship, you should link DailySessionLog to ParticipantMaster rather than to Query1. You would then create a query to show the daily session logs of a given class for a given semester. Example:
SELECT {field list} FROM ParticipantMaster INNER JOIN DailySessionLog ON {join expression} WHERE ParticipantMaster.EnrClass = "Class A" AND ParticipantMaster.EnrSem = "Semester 2"
However, it would be better to use variable parameters rather than hard-coded strings. Example:
SELECT {field list} FROM ParticipantMaster INNER JOIN DailySessionLog ON {join expression} WHERE ParticipantMaster.EnrClass = [ClassName] AND ParticipantMaster.EnrSem = [SemesterName]
Or, to use a value from a control on an open form:
SELECT {field list} FROM ParticipantMaster INNER JOIN DailySessionLog ON {join expression} WHERE ParticipantMaster.EnrClass = [Forms]![FormName]![ClassControlName] AND ParticipantMaster.EnrSem = [Forms]![FormName]![SemesterControlName]
EDIT
Actually, you want to use this AND xQbert's idea, so, with table names like this for brevity:
Participants (a.k.a. ParticipantMaster)
Sessions (a.k.a DailySessionLog)
ParticipantSession (a.k.a. Participant_daily_session_log)
the first query would be more like this:
SELECT {field list}
FROM
(Participants
INNER JOIN ParticipantSession ON Participants.ID = ParticipantSession.ParticipantID)
INNER JOIN Sessions ON ParticipantSession.SessionID = Sessions.ID
Where do you intend the database to "Store" the participation?
I think the problem is that you need another table: a Participant_Daily_Session_Log, which would store the result of your daily log for each student's participation.
Think about the DailySessionLog table: you don't want instructor name, topic and date listed for EACH student, do you?
So what you have is this: many students may attend a class, and a class may have many students. That is a many-to-many relationship, which needs to be resolved before Access can figure out what you want to do.
Think of the following tables:
Participant (ParticipantID)
Class (ClassID)
Session (SessionID, ClassID)
ClassParticipants (ClassId, ParticipantID, Semester, Year)
SessionParticipants (SessionID, ClassID, ParticipantID)
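How the junction table resolves the many-to-many relationship can be sketched with a small schema. This is Python with an in-memory SQLite database rather than Access; the `Attendance` column, the extra Session columns, the sample rows and the function name `attendance_sheet` are assumptions added for illustration, and ClassID is kept only on Session since it can be derived from there.

```python
import sqlite3


def attendance_sheet(conn, session_id):
    """Return (participant name, attendance) for one daily session."""
    return conn.execute(
        """
        SELECT p.Name, sp.Attendance
        FROM SessionParticipants sp
        JOIN Participant p ON p.ParticipantID = sp.ParticipantID
        WHERE sp.SessionID = ?
        ORDER BY p.Name
        """,
        (session_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Participant (ParticipantID INTEGER PRIMARY KEY, Name TEXT);
    -- Date, instructor and topic are stored once per session, not once per student.
    CREATE TABLE Session (
        SessionID INTEGER PRIMARY KEY, ClassID TEXT,
        SessionDate TEXT, Instructor TEXT, Topic TEXT
    );
    -- One row per student per session resolves the many-to-many relationship.
    CREATE TABLE SessionParticipants (
        SessionID INTEGER REFERENCES Session(SessionID),
        ParticipantID INTEGER REFERENCES Participant(ParticipantID),
        Attendance TEXT CHECK (Attendance IN ('None', 'Partial', 'Half', 'Full')),
        PRIMARY KEY (SessionID, ParticipantID)
    );
    INSERT INTO Participant VALUES (101, 'Ana'), (102, 'Ben');
    INSERT INTO Session VALUES
        (1, 'Class A', '2011-03-01', 'Lee', 'Intro'),
        (2, 'Class A', '2011-03-02', 'Lee', 'Review');
    INSERT INTO SessionParticipants VALUES (1, 101, 'Full'), (1, 102, 'Half'), (2, 101, 'None');
    """
)

day_one = attendance_sheet(conn, 1)
day_two = attendance_sheet(conn, 2)
```

With this layout the session is the parent record and the per-student attendance rows are its children, which matches the form/subform arrangement the question asks for.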