Get newest record per group from subquery

I would like to get the latest record based on date for each email from my query.
This query produces multiple records for each email. Let's call this output, table C.
My question is: how do I filter the aliased result C down to only the most recent record for each email?
+-------------------+-----+------------+
| email             | id  | date       |
+-------------------+-----+------------+
| hello@example.com | 123 | 2020-06-21 |
+-------------------+-----+------------+
| hello@example.com | 123 | 2020-06-15 |
+-------------------+-----+------------+
Desired result is:
+-------------------+-----+------------+
| email             | id  | date       |
+-------------------+-----+------------+
| hello@example.com | 123 | 2020-06-21 |
+-------------------+-----+------------+
My starting query (that produces multiple email records) is the following:
SELECT DISTINCT
    Email,
    ID,
    Date
FROM [TABLE_A] AS a
LEFT JOIN (
    SELECT *
    FROM [TABLE_B]
    WHERE ID = '123'
) AS b
    ON a.Email = b.Key
My attempt:
SELECT c.Email, c.ID, c.Date
FROM (
    SELECT DISTINCT
        Email,
        ID,
        Date
    FROM [TABLE_A] AS a
    LEFT JOIN (
        SELECT *
        FROM [TABLE_B]
        WHERE ID = '123'
    ) AS b ON a.Email = b.Key
) AS c
INNER JOIN (
    SELECT Email, MAX(Date) AS MaxDate
    FROM c
    GROUP BY Email
) tm ON c.Email = tm.Email AND c.Date = tm.Date
It looks like SQL Server cannot 'see' table C, as I am getting this error:
Invalid object name

You can use WITH TIES in concert with ROW_NUMBER().
Example:
SELECT TOP 1 WITH TIES *
FROM YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY Id ORDER BY [Date] DESC)
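
The "invalid object name" error in the question arises because a derived-table alias such as c is visible to the outer query's SELECT and ON clauses, but it cannot be used as a FROM source inside a sibling subquery. The following is a minimal sketch of two ways around that, not the asker's actual schema: the table variable @c and its rows (including the second email) are hypothetical stand-ins for the output of the original join.

```sql
-- Hypothetical sample data standing in for the join output ("table C").
DECLARE @c TABLE (Email varchar(100), ID int, [Date] date);
INSERT INTO @c (Email, ID, [Date]) VALUES
    ('hello@example.com', 123, '2020-06-21'),
    ('hello@example.com', 123, '2020-06-15'),
    ('other@example.com', 123, '2020-06-10');

-- Option 1: TOP 1 WITH TIES keeps every row whose ROW_NUMBER() is 1,
-- which is the newest row of each Email partition.
SELECT TOP 1 WITH TIES Email, ID, [Date]
FROM @c
ORDER BY ROW_NUMBER() OVER (PARTITION BY Email ORDER BY [Date] DESC);

-- Option 2: a CTE, unlike a derived-table alias, can be referenced more than
-- once in the same statement, so the MAX(Date) self-join from the question works.
WITH c AS (
    SELECT Email, ID, [Date]
    FROM @c
)
SELECT c.Email, c.ID, c.[Date]
FROM c
INNER JOIN (
    SELECT Email, MAX([Date]) AS MaxDate
    FROM c
    GROUP BY Email
) AS tm
    ON c.Email = tm.Email AND c.[Date] = tm.MaxDate;
```

Both queries should return one row per email. Note that the join in option 2 compares against tm.MaxDate, the alias defined in the subquery, rather than tm.Date.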

Related

How can I create a new table in SQL where some rows are exact matches and some are not?

Say I have a few tables as follows:
Users table:
| id | name  | email            |
+----+-------+------------------+
| 1  | David | Dave@example.com |
| 2  | Bill  | Dave@example.com |
| 3  | David | Dave@example.com |
Favorites table:
| userid | favoriteanimal |
+--------+----------------+
| 1      | Hippo          |
| 2      | Dog            |
| 3      | Hippo          |
Activity table:
| userid | firstday  | lastday   | daysused |
+--------+-----------+-----------+----------+
| 3      | 7/31/2019 | 8/2/2019  | 2        |
| 1      | 8/3/2019  | 8/20/2019 | 4        |
| 2      | 7/31/2019 | 8/20/2019 | 15       |
I want to create a new table in my database that collapses the current tables as follows. Email, name, and favorite animal must all match. If they all match and there are multiple rows, look at the activity table: if the first day for one user is the day after a matching user's last day, combine those users into a single row. In the new row, firstday is the earliest first day, userid is the id of the user with the earliest first day, lastday is the latest last day, and daysused is the sum of daysused across the combined userids.
The results I'm expecting should look like below:
| userid | firstday  | lastday   | daysused | favoriteanimal |
+--------+-----------+-----------+----------+----------------+
| 2      | 7/31/2019 | 8/20/2019 | 15       | Hippo          |
| 3      | 7/31/2019 | 8/20/2019 | 6        | Dog            |
I have the following fiddle that I am playing around with, but I have been unsuccessful so far: http://sqlfiddle.com/#!18/09b76/11

Please check below query for your answer.
SELECT
    MIN(a.userid) USERID,
    u.name,
    u.email,
    f.favoriteanimal,
    MIN(a.firstday) FirstDay,
    MAX(a.lastday) LastDay,
    SUM(a.daysused) daysused,
    COUNT(a.userid) usercnt
FROM
    users u
    INNER JOIN activity a ON u.id = a.userid
    INNER JOIN favorites f ON f.userid = u.id
GROUP BY u.name,
    u.email,
    f.favoriteanimal

with data as (
    select u.id as userid, u.name, u.email, f.favoriteanimal,
           a.firstday, a.lastday, a.daysused,
           case when datediff(day,
                    lag(a.lastday) over (
                        partition by u.name, u.email, f.favoriteanimal
                        order by a.firstday, a.lastday),
                    a.firstday
                ) > 1 then 1 else 0 end as gap
    from users as u
    inner join activity as a on u.id = a.userid
    inner join favorites as f on f.userid = u.id
), concatenated as (
    select *,
           sum(gap) over (partition by name, email, favoriteanimal order by firstday, lastday) as grp
    from data
), agg as (
    select
        userid, favoriteanimal,
        row_number() over (partition by name, email, favoriteanimal, grp order by firstday) as rn,
        min(firstday) over (partition by name, email, favoriteanimal, grp) as firstday,
        max(lastday) over (partition by name, email, favoriteanimal, grp) as lastday,
        sum(daysused) over (partition by name, email, favoriteanimal, grp) as daysused
    from concatenated
)
select userid, firstday, lastday, daysused, favoriteanimal
from agg where rn = 1;
When I introduced a wider gap between users 1 and 3 it returns separate rows. Here's an example: https://rextester.com/MONL73656

The question changed while I was still writing this: it got much simpler and now allows matching only on e-mail. Matching only on e-mail lets us simplify this down to a GROUP BY query. But since I didn't want to throw away the work:
SELECT first.id as userid, first.firstday,
    coalesce(second.lastday, first.lastday) lastday,
    coalesce(second.daysused + first.daysused, first.daysused) daysused,
    base.favoriteanimal
FROM (
    SELECT DISTINCT u.name, u.email, f.favoriteanimal
    FROM Users u
    INNER JOIN Favorites f on f.userid = u.id
) base
CROSS APPLY (
    SELECT TOP 1 u.id, a.firstday, a.lastday, a.daysused
    FROM Users u
    INNER JOIN Favorites f on f.userid = u.id
    INNER JOIN Activity a on a.userid = u.id
    WHERE u.name = base.Name and u.email = base.email and f.favoriteanimal = base.favoriteanimal
    ORDER BY a.firstday
) first
OUTER APPLY (
    SELECT TOP 1 a.lastday, a.daysused
    FROM Users u
    INNER JOIN Favorites f on f.userid = u.id
    INNER JOIN Activity a on a.userid = u.id
    WHERE u.name = base.Name and u.email = base.email and f.favoriteanimal = base.favoriteanimal
        and u.id <> first.id and a.FirstDay = DateAdd(day, 1, first.LastDay)
) second
If you want to get really fancy, you can turn this into a recursive CTE that keeps running to find more and more "second" result sets, so a user could have many stops and starts.
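
A rough sketch of that recursive-CTE idea follows. It is one possible reading of the suggestion, not the answerer's own code. The table variables mirror the question's sample rows only so the script is self-contained, and the sketch assumes lastday is never before firstday and that each period has at most one period starting the day after it within the same name/email/animal group.

```sql
-- Sample rows copied from the question; table variables keep the sketch self-contained.
DECLARE @Users TABLE (id int, name varchar(50), email varchar(100));
DECLARE @Favorites TABLE (userid int, favoriteanimal varchar(50));
DECLARE @Activity TABLE (userid int, firstday date, lastday date, daysused int);

INSERT INTO @Users VALUES
    (1, 'David', 'Dave@example.com'),
    (2, 'Bill',  'Dave@example.com'),
    (3, 'David', 'Dave@example.com');
INSERT INTO @Favorites VALUES (1, 'Hippo'), (2, 'Dog'), (3, 'Hippo');
INSERT INTO @Activity VALUES
    (3, '2019-07-31', '2019-08-02', 2),
    (1, '2019-08-03', '2019-08-20', 4),
    (2, '2019-07-31', '2019-08-20', 15);

WITH src AS (
    SELECT u.id, u.name, u.email, f.favoriteanimal,
           a.firstday, a.lastday, a.daysused
    FROM @Users AS u
    INNER JOIN @Favorites AS f ON f.userid = u.id
    INNER JOIN @Activity AS a ON a.userid = u.id
),
chain AS (
    -- Anchor: periods that do not start the day after another matching period.
    SELECT s.id AS userid, s.name, s.email, s.favoriteanimal,
           s.firstday, s.lastday, s.daysused
    FROM src AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM src AS p
        WHERE p.name = s.name
          AND p.email = s.email
          AND p.favoriteanimal = s.favoriteanimal
          AND s.firstday = DATEADD(day, 1, p.lastday)
    )
    UNION ALL
    -- Recursive step: extend each chain with the period that starts the next day.
    SELECT c.userid, c.name, c.email, c.favoriteanimal,
           c.firstday, n.lastday, c.daysused + n.daysused
    FROM chain AS c
    INNER JOIN src AS n
        ON n.name = c.name
       AND n.email = c.email
       AND n.favoriteanimal = c.favoriteanimal
       AND n.firstday = DATEADD(day, 1, c.lastday)
)
-- Each chain start produces one row per extension; keep the longest one.
SELECT userid, firstday,
       MAX(lastday) AS lastday,
       MAX(daysused) AS daysused,
       favoriteanimal
FROM chain
GROUP BY userid, firstday, favoriteanimal;
```

With the sample rows, users 3 and 1 should collapse into a single row for user 3 (2019-07-31 to 2019-08-20, 6 days used), while user 2 stays as is. SQL Server's default recursion limit of 100 levels applies unless OPTION (MAXRECURSION n) is added.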

SQL Server group by but select 'top' date

I have a table in SQL server like so (Note the ID field is not unique):
-----------------------------------
| ID | IsAdamBrown | DateComplete |
| 1 | TRUE | 2017-01-01 |
| 1 | TRUE | 2017-01-03 |
-----------------------------------
I'd like to select one row for all the unique IDs in the table and the most recent 'DateComplete' for that ID.
My desired output in this case would be:
-----------------------------------
| ID | IsAdamBrown | DateComplete |
| 1 | TRUE | 2017-01-03 |
-----------------------------------
I've tried:
SELECT DISTINCT DateComplete, ID, IsAdamBrown
FROM thisTable
WHERE IsAdamBrown IS NOT NULL
GROUP BY DateComplete, ID, IsAdamBrown
ORDER BY DateComplete DESC
Unfortunately I still get both date rows back. In MySQL I would group by just the first two columns, and the ORDER BY would make sure the DateComplete was the most recent. SQL Server's requirement that the SELECT fields match the GROUP BY makes this impossible.
How can I get a single row back for each ID with the most recent DateComplete?

SELECT id,
       isadambrown,
       MAX(datecomplete) AS DateComplete
FROM thistable
GROUP BY id,
         isadambrown
ORDER BY MAX(datecomplete) DESC

You can do this with GROUP BY and MAX() on DateComplete:
SELECT ID, IsAdamBrown, MAX(DateComplete) AS DateComplete
FROM thisTable
WHERE IsAdamBrown IS NOT NULL
GROUP BY ID, IsAdamBrown
ORDER BY MAX(DateComplete) DESC

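You can use LIMIT, but note that it is MySQL syntax and is not supported by SQL Server; it also returns a single row overall rather than one per ID:
SELECT ID, IsAdamBrown, DateComplete
FROM thisTable
WHERE IsAdamBrown IS NOT NULL
GROUP BY ID, IsAdamBrown
ORDER BY DateComplete DESC LIMIT 1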

You can use this. I hope it will work for you.
SELECT ID, IsAdamBrown, DateComplete
FROM thisTable a
WHERE DateComplete IN
(
    SELECT MAX(DateComplete) FROM thisTable b WHERE a.ID = b.ID GROUP BY b.ID
)
ORDER BY DateComplete DESC

You can use ROW_NUMBER() to number the rows within each ID, and a subquery to keep only the first row, the one with the most recent DateComplete. This sorts the data by ID and by DateComplete (newest first), then returns the first row for each unique ID:
SELECT X.ID, X.IsAdamBrown, X.DateComplete
FROM (
    SELECT ID, IsAdamBrown, DateComplete,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DateComplete DESC) RN
    FROM thisTable
    WHERE IsAdamBrown IS NOT NULL
) X
WHERE X.RN = 1
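
The GROUP BY and ROW_NUMBER() answers agree on the question's sample data, but they can differ when IsAdamBrown varies within one ID. The sketch below illustrates this with hypothetical rows (the FALSE row and ID 2 are made up for the example, and IsAdamBrown is stored as text only for simplicity).

```sql
-- Hypothetical rows: ID 1 has two different IsAdamBrown values.
DECLARE @thisTable TABLE (ID int, IsAdamBrown varchar(5), DateComplete date);
INSERT INTO @thisTable (ID, IsAdamBrown, DateComplete) VALUES
    (1, 'TRUE',  '2017-01-01'),
    (1, 'FALSE', '2017-01-03'),
    (2, 'TRUE',  '2017-02-01');

-- GROUP BY ID, IsAdamBrown: one row per (ID, IsAdamBrown) pair,
-- so ID 1 appears twice.
SELECT ID, IsAdamBrown, MAX(DateComplete) AS DateComplete
FROM @thisTable
WHERE IsAdamBrown IS NOT NULL
GROUP BY ID, IsAdamBrown;

-- ROW_NUMBER() partitioned by ID only: exactly one row per ID,
-- carrying the IsAdamBrown value of the newest row.
SELECT X.ID, X.IsAdamBrown, X.DateComplete
FROM (
    SELECT ID, IsAdamBrown, DateComplete,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DateComplete DESC) AS RN
    FROM @thisTable
    WHERE IsAdamBrown IS NOT NULL
) AS X
WHERE X.RN = 1;
```

If IsAdamBrown is always the same for a given ID, either form gives one row per ID, and the GROUP BY version is the shorter of the two.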

How to get columns that are referenced in another table a certain number of times? in SQL Server

So I have two hypothetical tables
Country (CountryCode, CountryName)
Groups (GroupId, GroupName, CountryCode)
I know that group is a reserved word but it's just for the sake of the example
What I want to get is the countries with 3 or more groups without the use of another referential table.
I have tried the following
select *
from Country c
where CountryCode in (select g.CountryCode
                      from Group g
                      where g.CountryCode = c.CountryCode
                      group by g.CountryCode
                      having count(*) > 3)
But I get no results given I have the following data in my Groups table:
| GroupId | GroupName | CountryCode |
| 1       | 'asd'     | USA         |
| 4       | 'fgh'     | USA         |
| 3       | 'jkl'     | USA         |
| 4       | 'zxc'     | ARG         |
The result I want is:
| CountryCode | CountryName  |
| USA         | UnitedStates |
because there are 3 groups with CountryCode = USA

Get the qualifying country codes with GROUP BY and HAVING, then join the result with the Country table to get your expected result. (Group is a reserved word, so it is bracketed here.)
select C.*
from
(
    select g.CountryCode
    from [Group] g
    group by g.CountryCode
    having count(*) >= 3
) CC
INNER JOIN Country C ON C.CountryCode = CC.CountryCode
UPDATE: without JOIN
select C.*
from Country C
WHERE C.CountryCode IN
(
    select g.CountryCode
    from [Group] g
    group by g.CountryCode
    having count(*) >= 3
)

Your query is almost right. Drop the correlated WHERE clause and use >= 3 instead of > 3, since USA has exactly 3 groups. That said, @Mahedi Sabuj's answer is better for performance.
SELECT *
FROM Country c
WHERE CountryCode IN
(
    SELECT g.CountryCode
    FROM [Group] g
    GROUP BY g.CountryCode
    HAVING COUNT(*) >= 3
)

Select c.CountryCode, c.CountryName
FROM [Country] c
INNER JOIN [Group] g ON c.CountryCode = g.CountryCode
GROUP BY c.CountryCode, c.CountryName
HAVING COUNT(DISTINCT GroupName) >= 3
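
To see why the original query returned nothing, it may help to run both thresholds against the question's data. This is a small sketch using table variables; the Groups rows are taken from the question, while the ARG row in @Country (and its name) is an assumption, since the question only shows the USA row.

```sql
DECLARE @Country TABLE (CountryCode char(3), CountryName varchar(50));
DECLARE @Groups TABLE (GroupId int, GroupName varchar(50), CountryCode char(3));

INSERT INTO @Country VALUES
    ('USA', 'UnitedStates'),
    ('ARG', 'Argentina');      -- assumed row, not shown in the question
INSERT INTO @Groups VALUES
    (1, 'asd', 'USA'),
    (4, 'fgh', 'USA'),
    (3, 'jkl', 'USA'),
    (4, 'zxc', 'ARG');

-- COUNT(*) > 3: USA has exactly 3 groups, so no country qualifies.
SELECT c.CountryCode, c.CountryName
FROM @Country AS c
WHERE c.CountryCode IN (
    SELECT g.CountryCode
    FROM @Groups AS g
    GROUP BY g.CountryCode
    HAVING COUNT(*) > 3
);

-- COUNT(*) >= 3: "3 or more" now includes USA.
SELECT c.CountryCode, c.CountryName
FROM @Country AS c
WHERE c.CountryCode IN (
    SELECT g.CountryCode
    FROM @Groups AS g
    GROUP BY g.CountryCode
    HAVING COUNT(*) >= 3
);
```

The first query should come back empty and the second should return only the USA row, which suggests the comparison operator, rather than the correlated WHERE clause, was the cause of the empty result.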

Referencing outer table in an aggregate function in a subquery

I'm looking for a solution to particular query problem. I have a table Departments and table Employees designed like that:
Departments               Employees
=====================     ============================
ID | Name                 ID | Name | Surname | DeptID
---------------------     ----------------------------
1  | ADMINISTRATION       1  | X    | Y       | 2
2  | IT                   2  | Z    | Z       | 1
3  | ADVERTISEMENT        3  | O    | O       | 1
                          4  | A    | B       | 3
I'd like to get a list of all departments whose number of employees is smaller than the number of employees working in Administration.
That was one of my ideas, but it did not work:
select * from Departments as Depts where Depts.ID in
(select Employees.ID from Employees group by Employees.ID
having count(Employees.ID) < count(case when Depts.Name='ADMINISTRATION' then 1 end));

Using GROUP BY and HAVING:
SELECT
    d.ID, d.Name
FROM Departments d
LEFT JOIN Employees e
    ON e.DeptID = d.ID
GROUP BY d.ID, d.Name
HAVING
    COUNT(e.ID) < (SELECT COUNT(*) FROM Employees WHERE DeptID = 1)
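
The query above hard-codes DeptID = 1 for Administration. A possible variant, sketched here with table variables holding the question's sample rows, looks the department up by name instead. It assumes the name 'ADMINISTRATION' identifies exactly one department.

```sql
DECLARE @Departments TABLE (ID int, Name varchar(50));
DECLARE @Employees TABLE (ID int, Name varchar(50), Surname varchar(50), DeptID int);

INSERT INTO @Departments VALUES
    (1, 'ADMINISTRATION'),
    (2, 'IT'),
    (3, 'ADVERTISEMENT');
INSERT INTO @Employees VALUES
    (1, 'X', 'Y', 2),
    (2, 'Z', 'Z', 1),
    (3, 'O', 'O', 1),
    (4, 'A', 'B', 3);

-- LEFT JOIN keeps departments with no employees (their count is 0).
SELECT d.ID, d.Name
FROM @Departments AS d
LEFT JOIN @Employees AS e
    ON e.DeptID = d.ID
GROUP BY d.ID, d.Name
HAVING COUNT(e.ID) < (
    -- Headcount of the department named ADMINISTRATION.
    SELECT COUNT(*)
    FROM @Employees AS e2
    INNER JOIN @Departments AS d2
        ON d2.ID = e2.DeptID
    WHERE d2.Name = 'ADMINISTRATION'
);
```

With the sample rows, Administration has two employees, so IT and ADVERTISEMENT (one employee each) should be returned.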

Try this,
declare @Departments table (ID int, Name varchar(50))
insert into @Departments
values
    (1 ,'ADMINISTRATION')
    ,(2 ,'IT')
    ,(3 ,'ADVERTISEMENT')
declare @Employees table (ID int, Name varchar(50)
    ,Surname varchar(50),DeptID int)
insert into @Employees
values
    (1 ,'X','Y',2)
    ,(2 ,'Z','Z',1)
    ,(3 ,'O','O',1)
    ,(4 ,'A','B',3)
;
WITH CTE
AS (
    SELECT *
        ,row_number() OVER (
            PARTITION BY deptid ORDER BY id
        ) rn
    FROM @Employees
    WHERE deptid <> 1
)
SELECT *
FROM cte
WHERE rn < (
    SELECT count(id) admincount
    FROM @Employees
    WHERE DeptID = 1
)

How can I change this so only the best performing company is shown for each year?

I've been working on this single SQL statement for a couple of days now & I can't seem to get this done.
With some help from friends/family, I managed to get the statement close to completion, but there's a vital part still missing & I can't figure out how to do this exactly.
This is the code I have so far:
SELECT DISTINCT
    T1.Jaar,
    T1.CompanyName,
    CONCAT('€ ', CONVERT(money,T1.Kost)) AS 'Hoogste Prijs'
FROM
(
    SELECT
        Year(OrderDate) AS 'Jaar',
        CompanyName,
        SUM(Freight) AS 'Kost'
    FROM Orders
    JOIN Shippers S ON ShipVia = S.ShipperID
    GROUP BY YEAR(OrderDate), CompanyName
) T1
LEFT JOIN
(
    SELECT
        Year(OrderDate) AS 'Jaar',
        CompanyName,
        SUM(Freight) AS 'Kost'
    FROM Orders
    JOIN Shippers S ON ShipVia = S.ShipperID
    GROUP BY YEAR(OrderDate), CompanyName
) T2 ON T1.CompanyName = T2.CompanyName
ORDER BY Jaar
Which returns me the following resultset:
Now, for the part that I can't figure out:
Using the above statement, I need to expand it so that I only get the highest value for "Hoogste Prijs" for each year.
So in the end, my resultset should look like this:
+------+------------------+---------------+
| Jaar | CompanyName | Hoogste Prijs |
+------+------------------+---------------+
| 1996 | Federal Shipping | € 4233.78 |
| 1997 | United Package | € 12374.04 |
| 1998 | United Package | € 12122.14 |
+------+------------------+---------------+
From what I understand, I shouldn't be far off from the solution, but I can't seem to find it at all.

What about this:
with CompanyPerYear as ( -- this is your query, I removed the self-join since both give the same results
    SELECT
        Year(OrderDate) AS 'Jaar',
        CompanyName,
        SUM(Freight) AS 'Kost'
    FROM Orders
    JOIN Shippers S ON ShipVia = S.ShipperID
    GROUP BY YEAR(OrderDate), CompanyName
),
cte as (
    SELECT
        *,
        rn = ROW_NUMBER() OVER(PARTITION BY Jaar ORDER BY Kost DESC)
    FROM CompanyPerYear
)
SELECT
    Jaar,
    CompanyName,
    CONCAT('€ ', CONVERT(money,Kost)) AS 'Hoogste Prijs'
FROM cte
where rn = 1
