Using MAX() with GROUP BY - sql-server

I have a history table and I want to get the latest modification for one employee.
In the example below, does MAX() always return a single record?
CREATE TABLE EmployeeHIST
(
Id INT PRIMARY KEY,
EmployeeId INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
ModifiedDate DATETIME
)
INSERT INTO EmployeeHIST VALUES (1, 1, 'Jhon', 'Doo', '2013-01-24 23:45:12')
INSERT INTO EmployeeHIST VALUES (2, 1, 'Jhon', 'Doo', '2013-02-24 15:45:12')
INSERT INTO EmployeeHIST VALUES (3, 1, 'Jhon', 'Doo', '2013-02-24 15:45:12')
SELECT EmployeeId, MAX([ModifiedDate])
FROM EmployeeHIST
WHERE EmployeeId = 1
GROUP BY EmployeeId
OK, you are right. But if I also need the Id column for EmployeeId = 1, I will get two values, 2 and 3, so I need to apply a TOP 1, right?

MAX() returns one record for each combination of values defined in the GROUP BY.
With your sample data, yes, it always returns one record.
If you grouped by Id, EmployeeId you would get three records, as there are three unique combinations of those values.
The same applies to other aggregate functions such as MIN(), AVG(), COUNT(), etc.
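To illustrate the grouping rule with the sample table from the question (a minimal sketch, not part of the original answer), adding Id to the GROUP BY yields one row per (Id, EmployeeId) combination:

```sql
-- One output row per distinct (Id, EmployeeId) pair.
-- With the three sample rows this returns three rows (Id 1, 2 and 3),
-- because Id is unique and every row forms its own group.
SELECT Id, EmployeeId, MAX(ModifiedDate) AS MaxDate
FROM EmployeeHIST
WHERE EmployeeId = 1
GROUP BY Id, EmployeeId;
```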
UPDATE
If you want to get the id of the record that has the max(date) then you have the following option (there may be better ones):
;With MyCTE AS
(
SELECT EmployeeId, MAX([ModifiedDate]) AS MaxDate
FROM EmployeeHIST
GROUP BY EmployeeId
)
SELECT E.Id,E.EmployeeId,ModifiedDate
FROM EmployeeHIST E
JOIN MyCTE M
ON M.EmployeeId = E.EmployeeId
AND M.MaxDate = E.ModifiedDate
WHERE E.EmployeeId = 1
SQLFiddle 1
In this case both ids, 2 and 3, are returned. I do not know the business requirement here, but I believe you would want only 3 to be returned, so here is a solution for that:
;With MyCTE AS
(
SELECT EmployeeId, MAX([ModifiedDate]) AS MaxDate
FROM EmployeeHIST
GROUP BY EmployeeId
)
SELECT MAX(E.Id),E.EmployeeId,ModifiedDate
FROM EmployeeHIST E
JOIN MyCTE M
ON M.EmployeeId = E.EmployeeId
AND M.MaxDate = E.ModifiedDate
WHERE E.EmployeeId = 1
GROUP BY E.EmployeeId,ModifiedDate
SQLFiddle 2
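As an alternative to the CTE plus MAX(Id) approach, a window function can pick a single row per employee directly. This is a sketch of a different technique (ROW_NUMBER), assuming that the highest Id should win when two rows share the same ModifiedDate; the CTE name Ranked and the column rn are made up for illustration.

```sql
-- Number each employee's rows from newest to oldest;
-- ties on ModifiedDate are broken by the highest Id (assumption).
WITH Ranked AS
(
    SELECT Id, EmployeeId, ModifiedDate,
           ROW_NUMBER() OVER (PARTITION BY EmployeeId
                              ORDER BY ModifiedDate DESC, Id DESC) AS rn
    FROM EmployeeHIST
)
SELECT Id, EmployeeId, ModifiedDate
FROM Ranked
WHERE rn = 1
  AND EmployeeId = 1;
-- With the sample data this returns the row with Id = 3.
```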

Related

SQL Server 2014: Pairing rows from 2 tables based on values coming from a third one

I have 2 tables that contain typed events over time.
The first table #T1 contains events that always come before events in the second table #T2.
A third table #E contains records that define, for an event, the values that come in #T1 and #T2 respectively.
Sample data:
CREATE TABLE #T1
(
EventTimestamp DateTime,
VehicleId int,
EventId varchar(50),
EventValue varchar(50)
);
CREATE TABLE #T2
(
EventTimestamp DateTime,
VehicleId int,
EventId varchar(50),
EventValue varchar(50)
);
CREATE TABLE #E
(
EventId varchar(50),
FirstValue int,
LastValue varchar(50)
);
INSERT INTO #T1(EventTimestamp, VehicleId , EventId, EventValue)
VALUES (GETDATE(), 1, 'TwigStatus', '12'),
(GETDATE(), 2, 'SafeProtectEvent', '5')
INSERT INTO #T2(EventTimestamp, VehicleId , EventId, EventValue)
VALUES (DATEADD(second, 30, GETDATE()), 1, 'TwigStatus', '7'),
(DATEADD(second, 30, GETDATE()), 2, 'SafeProtectEvent', '6')
INSERT INTO #E(EventId, FirstValue, LastValue)
VALUES ('TwigStatus', '12', '7'),
('SafeProtectEvent', '5', '6')
DECLARE @EventId varchar(50) = 'TwigStatus';
DECLARE @FirstValue varchar(50) = '12';
DECLARE @LastValue varchar(50) = '7';
WITH ord AS
(
SELECT
first, last,
EventNr = ROW_NUMBER() OVER (ORDER BY first)
FROM
(SELECT
first = t1.EventTimestamp, last = t2.EventTimestamp,
rn = ROW_NUMBER() OVER (PARTITION BY t1.VehicleId ORDER BY t2.EventTimestamp)
FROM
#T1 t1
INNER JOIN
#T2 t2 ON t2.EventTimestamp > t1.EventTimestamp
AND t2.EventValue = @LastValue
WHERE
t1.EventId = @EventId AND t1.EventValue = @FirstValue) ids
WHERE
rn = 1
)
SELECT
t.VehicleId, o.first, o.last, t.EventId, t.EventValue
FROM
#T2 t
INNER JOIN
ord o ON t.EventTimestamp >= o.first
AND t.EventTimestamp <= o.last
WHERE t.EventId = @EventId;
DROP TABLE #E;
DROP TABLE #T1;
DROP TABLE #T2;
Basically, for a record in table E you see that for eventID 'TwigStatus' the value '12' should come first in table T1 and then '7' should be next in table T2. There is a second event sequence that is defined.
The VehicleId column is the link between the tables T1 and T2.
I need to compute the delay between two matching events in table T1 and T2.
To start simple, I do not use table E yet; I'm using variables that contain predefined values and I'm returning timestamps.
But the result of the query above:
VehicleId  first                    last                     EventId           EventValue
1          2020-09-15 16:00:37.670  2020-09-15 16:01:07.670  TwigStatus        7
2          2020-09-15 16:00:37.670  2020-09-15 16:01:07.670  SafeProtectEvent  6
is not what I'm expecting, because the EventId 'SafeProtectEvent' should be filtered out for now.
So I have 2 questions:
How can I keep the second event from showing up with the current query?
How can I work with the content of table E and get rid of the variables to process event sequences?
EDIT 1: Problem 1 Solved by adding a restriction on the query (see above)
Update/new version below - now allows rows in T1 without matching rows in T2.
Based on discussion on comments below, I have updated this suggestion.
This code replaces everything from the DECLARE @EventId to the end of that SELECT statement.
Logic is as follows - for each row in T1 ...
Determine the time boundaries for that row in T1 (between its EventTimestamp, and the next EventTimestamp in T1 for that vehicle; or 1 day in the future if there is no next event)
Find the matching rows in T2, where 'matching' means a) same VehicleId, b) same EventId, c) EventValue is limited by possibilities in #E, and d) occurs within the time boundaries of T1
Find the first of these rows, if available
Calculate EventDelay as the time between the two timestamps
; WITH t1 AS
(SELECT VehicleId,
EventTimestamp,
EventId,
EventValue,
COALESCE(LEAD(EventTimestamp, 1) OVER (PARTITION BY VehicleID ORDER BY EventTimestamp), DATEADD(day, 1, getdate())) AS NextT1_EventTimeStamp
FROM #T1
),
ord AS
(SELECT t1.VehicleId,
t1.EventTimestamp AS first,
t2.EventTimestamp AS last,
t1.EventId,
t2.EventValue,
ROW_NUMBER() OVER (PARTITION BY t1.VehicleId, t1.EventTimestamp, t1.EventId ORDER BY t2.EventTimestamp) AS rn
FROM t1
LEFT OUTER JOIN #E AS e ON t1.EventId = e.EventId
AND t1.EventValue = e.FirstValue
LEFT OUTER JOIN #T2 AS t2 ON t1.VehicleID = t2.VehicleID
AND t1.EventID = t2.EventID
AND t2.eventId = e.EventId
AND t2.EventValue = e.LastValue
AND t2.EventTimestamp > t1.EventTimestamp
AND t2.EventTimestamp < NextT1_EventTimeStamp
)
SELECT VehicleId, first, last, EventId, EventValue,
DATEDIFF(second, first, last) AS EventDelay
FROM ord
WHERE rn = 1
The ever-growing DB<>fiddle has the latest updates, as well as original posts and previous suggestions.

SQL subquery with GROUP BY that returns duplicate minimums. how is this interpreted by outer query?

I was working through some subquery questions, and the code below was provided as the answer.
My question:
If the inner query returns two minimum salaries that are the same but belong to different departments, how will the outer query interpret this? Will it recognize that the salaries refer to different departments?
SELECT first_name, last_name, salary, department_id
FROM employees
WHERE salary IN ( SELECT MIN(salary)
FROM employees
GROUP BY department_id );
thank you
No, it does not know anything about the department information. You need to change the IN to a JOIN:
SELECT e.first_name, e.last_name, e.salary, e.department_id
FROM employees e
INNER JOIN ( SELECT department_id, MIN(salary) AS salary
FROM employees
GROUP BY department_id) s
ON s.department_id=e.department_id
AND s.salary=e.salary;
As currently written, the statement returns every record from the employees table whose salary equals the minimum salary of any department, comparing on the salary value alone. The outer query knows nothing about department_id: there is no correlation between that attribute in the inner query and the outer query. You would need to change the logic, for example to a JOIN, to achieve that.
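The missing correlation can also be added without a JOIN. The following is a sketch of a correlated subquery, using the same employees columns as the question; the alias s is made up for illustration. The inner query is evaluated for the department of each outer row, so a salary only matches the minimum of its own department.

```sql
-- Correlated subquery: the inner MIN() is restricted to the
-- department of the current outer row, so equal minimum salaries
-- in different departments cannot produce cross-department matches.
SELECT e.first_name, e.last_name, e.salary, e.department_id
FROM employees e
WHERE e.salary = (SELECT MIN(s.salary)
                  FROM employees s
                  WHERE s.department_id = e.department_id);
```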
You can use rank() :
select e.*
from (select e.*, rank() over (partition by e.department_id order by e.salary) as seq
from employees e
) e
where e.seq = 1;
You can pass additional information/condition like this for the department.
CREATE TABLE employees (
empname VARCHAR(20)
,salary DECIMAL(18, 2)
,department_id INT
)
INSERT INTO employees
VALUES ('A', 100,1), ('B', 100, 2), ('C', 300, 2)
SELECT *
FROM employees
WHERE salary IN (
SELECT MIN(salary)
FROM employees
GROUP BY department_id
HAVING department_id = 2
)
AND department_id = 2;
Here is the output
empname salary department_id
-----------------------------
B 100.00 2
Although 100.00 is the minimum salary for both A and B, the query is restricted to department id 2, so only B appears in the output.

SQL Query to get Merchant Name with max date

I would like to achieve the below but am not sure how to go about it; any query pointing in the right direction would be a great help.
Tables: I have the three tables below:
Merchant(MerchantId, Name, Date),
MerchantCategory(MerchantId, CategoryId),
Category (CategoryId, Name)
How do I return the category name, the merchant count, and the name of the merchant with the max date?
From the requirement I understand that there should be 1 row per category, that the number of merchants should be shown and that the name of the merchant with the most recent date should be shown.
I have prepared a query below that generates some sample data and provides the result intended as I understand it.
The way this works is that the merchant volume is calculated by joining the merchant category table to the category table and then counting the merchant ids per category. The name is trickier: it requires an OUTER APPLY that, per category (per row), works out the top 1 name in the merchant table ordered by max(date) desc.
I hope this helps, any questions please let me know.
declare @Merchant table (
MerchantId int,
Name nvarchar(25),
Date Date
);
declare @MerchantCategory table (
MerchantId int,
CategoryId int
);
declare @Category table (
CategoryId int,
Name nvarchar(25)
);
insert into @Merchant (MerchantId, Name, Date)
values
(1, 'Lucy', '2019-01-05'),
(2, 'Dave', '2019-01-30'),
(3, 'Daniel' ,'2019-02-01');
insert into @MerchantCategory (MerchantId, CategoryId)
values
(1, 4),
(1, 5),
(2, 4),
(3, 5);
insert into @Category (CategoryId, Name)
values
(4, 'Cat1'),
(5, 'Cat2');
select c.Name, max(m.name) as MaxMerchantName, count(distinct mc2.merchantid) as Merchantvol from @Category c
left join @MerchantCategory mc2 on c.CategoryId=mc2.CategoryId
outer apply (select top 1 name, max(date) as date from @Merchant m inner join @MerchantCategory mc on m.MerchantId=mc.MerchantId where c.CategoryId=mc.CategoryId group by Name order by max(date) desc) m
group by c.Name;
I would have expected to see your efforts, so I am going against SO principles at the moment.
Try this:
select c.Name as category_name, count(*) as Merchant_Count, m.Name as Merchant_Name, max(Date) from
Merchant m join MerchantCategory mc
on m.MerchantId = mc.MerchantId
join Category c
on mc.CategoryId = c.CategoryId
group by c.Name, m.Name

Transforming and repeating multiple rows

I have a table with two IDs named FamilyID and PersonID. I need to repeat these rows with all combinations, as the screenshot below shows; note that each of the numbers gets an extra row.
Here is some SQL to create the table with some sample data. There is no set number of occurrences that could occur.
Is anyone aware of how this could be achieved?
CREATE TABLE #TempStackOverflow
(
FamilyID int,
PersonID int
)
insert into #TempStackOverflow
(
FamilyID,
PersonID
)
select
1012,
1
union
select
1013,
1
union
select
1014,
1
union
select
1015,
2
union
select
14774,
3
union
select
1019,
5
I understand that you need some sort of a complete list of matches within groups, but honestly, it would be much better if you would explain the business context, using plain English, in the first place.
The following query seems to produce your sample result:
with cte as (
select a.FamilyID, a.PersonID, a.PersonID as [GroupId] from #TempStackOverflow a
union all
select b.PersonID, b.FamilyID, b.PersonID from #TempStackOverflow b
)
select distinct c.FamilyID, s.PersonID
from cte c
inner join cte s on s.GroupId = c.GroupId
where c.FamilyID != s.PersonID;
Here is the simplest version I can come up with that groups the items by PersonId, as you do above. Obviously if you don't want that, then you can remove the outer query.
SELECT FamilyId,
PersonID
FROM (
SELECT FamilyId, PersonId, PersonID as SortBy
FROM #TempStackOverflow t1
UNION
SELECT PersonId, FamilyId, PersonId as SortBy
FROM #TempStackOverflow t1
UNION
SELECT t1.FamilyID, t2.FamilyID, t1.PersonID as SortBy
FROM #TempStackOverflow t1
FULL OUTER JOIN #TempStackOverflow t2
ON t1.PersonID = t2.PersonID
WHERE t1.FamilyID != t2.FamilyID
) as Src
ORDER BY SortBy

Select all records for customers where mindate in 2015

I want to select all records for customers whose first order is from 2015. I want any orders they placed after 2015 too, but I DON'T want the records for customers whose first order was in 2016. I am ultimately trying to find the percentage of people who order more than twice, but I want to exclude the customers who were new in 2016.
This doesn't work because 'mindate' is an invalid column name but I'm not sure why or how else to try it.
Select
od.CustomerID, OrderID, OrderDSC, OrderDTS
From
OrderDetail OD
Join
(Select
OrderID, Min(orderdts) as mindate
From
OrderDetail
Where
mindate Between '2015-1-1' and '2015-12-31'
Group By Orderid) b on od.OrderID = b.OrderID
This is because of the order in which the engine evaluates the phases of a query: the WHERE clause is processed before the SELECT list, so the alias mindate does not exist yet when WHERE is evaluated.
You can replace mindate with orderdts:
select OrderID, min(orderdts) as mindate
from OrderDetail
where orderdts between '2015-1-1' and '2015-12-31'
group by Orderid
The second option is to use a HAVING clause, which is evaluated after GROUP BY.
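A sketch of the HAVING variant follows. It assumes the grouping should be per CustomerID rather than per OrderID (the question asks for each customer's first order), and it uses a half-open date range; both choices are assumptions, not part of the original answer.

```sql
-- Keep only customers whose first order falls in 2015,
-- then join back to return all of their orders (including later years).
SELECT od.CustomerID, od.OrderID, od.OrderDSC, od.OrderDTS
FROM OrderDetail od
JOIN (
    SELECT CustomerID, MIN(OrderDTS) AS mindate
    FROM OrderDetail
    GROUP BY CustomerID
    -- HAVING runs after GROUP BY, so the aggregate can be filtered here
    HAVING MIN(OrderDTS) >= '2015-01-01'
       AND MIN(OrderDTS) <  '2016-01-01'
) b ON od.CustomerID = b.CustomerID;
```

Filtering on MIN(OrderDTS) in HAVING, instead of on OrderDTS in WHERE, is what excludes customers whose first order was before or after 2015.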
What I did was select the distinct CustomerIDs that fall within your date range and left join them with the table, so it filters out anyone who doesn't fall within your date range.
SELECT * FROM
(Select DISTINCT(CustomerID) as CustomerID
FROM OrderDetail WHERE OrderDTS between '2015-1-1' AND '2015-12-31') oIDs
LEFT JOIN
OrderDetail OD
ON oIDs.CustomerID = OD.CustomerID
Try using the EXISTS clause. It is basically a sub-query. Below is an example you should be able to adapt.
create table Test (Id int, aDate datetime)
insert Test values (1,'04/04/2014')
insert Test values (1,'05/05/2015')
insert Test values (1,'06/06/2016')
insert Test values (2,'04/30/2016')
insert Test values (3,'02/27/2014')
select t.* from Test t
where
aDate>='01/01/2015'
and exists(select * from Test x where x.Id=t.Id and x.aDate >='01/01/2015' and x.aDate<'01/01/2016')
I don't know the orderdts data type, but if it is datetime, orders on 2015-12-31 will not be included (unless the order date is exactly 2015-12-31 00:00:00.000). Note how this will skip the first record:
DECLARE @orders TABLE (CustomerID INT, orderDate DATETIME);
INSERT @orders VALUES (1, '2015-12-31 00:00:01.000'), (1, '2015-12-30'), (2, '2015-01-04');
SELECT * FROM @orders WHERE orderDate BETWEEN '2015-01-01' AND '2015-12-31';
In this case you would want the WHERE clause filter to look like:
WHERE orderDate >= '2015-01-01' AND orderDate < '2016-01-01';
Or
WHERE CAST(orderDate AS date) BETWEEN '2015-01-01' AND '2015-12-31';
(the first example will almost certainly perform better).
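A note on upper bounds for datetime columns, as a small sketch that assumes the column type is datetime (not datetime2): datetime values are rounded to increments of .000, .003 or .007 seconds, so a literal ending in .999 is rounded up to midnight of the next day.

```sql
-- datetime rounds to the nearest .000/.003/.007 second increment.
SELECT CAST('2015-12-31 23:59:59.999' AS datetime) AS RoundedUp,    -- 2016-01-01 00:00:00.000
       CAST('2015-12-31 23:59:59.997' AS datetime) AS LastInstant;  -- 2015-12-31 23:59:59.997
```

A half-open range (on or after the first day of the year, before the first day of the next year) avoids having to pick the last representable instant.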
Now, using this sample data:
-- Sample data
CREATE TABLE #LIST (LISTName varchar(10) NOT NULL);
INSERT #LIST
SELECT TOP (100) LEFT(newid(), 8)
FROM sys.all_columns a, sys.all_columns b;
-- You will probably want LISTName to be indexed
CREATE NONCLUSTERED INDEX nc_LISTName ON #LIST(LISTName);
You can implement Paul's solution like this:
DECLARE @LIST_Param varchar(8) = 'No List';
SELECT LISTName
FROM
(
SELECT distinct LISTName
FROM #LIST
UNION ALL
SELECT 'No List'
WHERE (SELECT COUNT(DISTINCT LISTName) FROM #LIST) < 1000000
) Distinct_LISTName
WHERE (@LIST_Param = 'No List' or @LIST_Param = LISTName);
Alternatively you can do this:
DECLARE @LIST_Param varchar(8) = 'No List';
WITH x AS
(
SELECT LISTName, c = COUNT(*)
FROM #LIST
WHERE (@LIST_Param = 'No List' or @LIST_Param = LISTName)
GROUP BY LISTName
),
c AS (SELECT s = SUM(c) FROM x)
SELECT LISTName
FROM x CROSS JOIN c
WHERE s < 1000000;
