Count top 5 persons that were most together

I have a check-in table that consists of the following columns:
CheckInID int (primary key)
PersonID int
CheckInDate smalldatetime
I'm trying to create a query that gives me the top 3 persons who were most frequently checked in together with a specific person.
For example:
personID 1 was
18 times together with personID 3
13 times together with personID 9
11 times together with personID 4
Implementing this in C# is not really a problem for me, but I want to create a stored procedure, and T-SQL is not really my strong suit.

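Assuming that the check-in date is what identifies people who were together (two people were together if they checked in at the same CheckInDate):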
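-- @PersonId is the stored procedure's parameter holding the person of interest
SELECT TOP 3 PersonId, COUNT(*) AS cnt
FROM your_table
WHERE CheckInDate IN (SELECT CheckInDate -- every time the given person checked in
FROM your_table
WHERE PersonId = @PersonId)
AND PersonId <> @PersonId -- do not count the same person
GROUP BY PersonId
ORDER BY cnt DESC;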

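A way without a subquery or an "IN" predicate is a self-join on the check-in date. Whether it is actually faster depends on the execution plan, so measure it:
SELECT TOP 3 T2.PersonId
, COUNT(*) AS NB_TIMES_CHECKED_IN_TOGETHER
FROM your_table AS T1
INNER JOIN your_table AS T2
ON T1.CheckInDate = T2.CheckInDate -- checked in at the same time
AND T2.PersonId <> T1.PersonId -- do not count the same person
WHERE T1.PersonId = @PersonId -- the person of interest
GROUP BY T2.PersonId
ORDER BY NB_TIMES_CHECKED_IN_TOGETHER DESC;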
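Both forms can be tried without a SQL Server instance. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, so it writes LIMIT 3 instead of TOP 3. The CheckIn table name and the sample rows are hypothetical. It runs the IN-subquery form and a self-join on CheckInDate for person 1 and checks that they agree. A tie-breaker on PersonId is added to ORDER BY only to make the output deterministic.

```python
import sqlite3

# Hypothetical check-ins: (CheckInID, PersonId, CheckInDate).
# Person 1 checks in at three moments; other people overlap with some of them.
CHECKINS = [
    (1, 1, "2024-01-01 09:00"), (2, 1, "2024-01-02 09:00"), (3, 1, "2024-01-03 09:00"),
    (4, 3, "2024-01-01 09:00"), (5, 3, "2024-01-02 09:00"), (6, 3, "2024-01-03 09:00"),
    (7, 9, "2024-01-01 09:00"), (8, 9, "2024-01-02 09:00"),
    (9, 4, "2024-01-01 09:00"),
    (10, 7, "2024-01-02 09:00"),
    (11, 5, "2024-01-04 09:00"),  # never together with person 1
]

# IN-subquery form: count other people's check-ins at the moments :pid checked in.
IN_QUERY = """
SELECT PersonId, COUNT(*) AS cnt
FROM CheckIn
WHERE CheckInDate IN (SELECT CheckInDate FROM CheckIn WHERE PersonId = :pid)
  AND PersonId <> :pid
GROUP BY PersonId
ORDER BY cnt DESC, PersonId
LIMIT 3
"""

# Self-join form: pair each of :pid's check-ins with other people's check-ins at the same moment.
SELF_JOIN_QUERY = """
SELECT T2.PersonId, COUNT(*) AS cnt
FROM CheckIn AS T1
JOIN CheckIn AS T2
  ON T1.CheckInDate = T2.CheckInDate
 AND T2.PersonId <> T1.PersonId
WHERE T1.PersonId = :pid
GROUP BY T2.PersonId
ORDER BY cnt DESC, T2.PersonId
LIMIT 3
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CheckIn (CheckInID INTEGER PRIMARY KEY, PersonId INTEGER NOT NULL, CheckInDate TEXT NOT NULL)"
)
conn.executemany("INSERT INTO CheckIn VALUES (?, ?, ?)", CHECKINS)

via_in = conn.execute(IN_QUERY, {"pid": 1}).fetchall()
via_join = conn.execute(SELF_JOIN_QUERY, {"pid": 1}).fetchall()
print(via_in)    # [(3, 3), (9, 2), (4, 1)]
print(via_join)  # same result
```

The two forms can diverge if the target person has several rows with the same CheckInDate. The IN list collapses those rows, while the self-join counts each pairing once per row.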

Related

Left outer join with CASE condition on most recent date

I have two tables:
dbo.Order
PK_Order FK_Customer OrderDate Total
1 1 2020-01-20 150.00
2 1 2020-01-25 200.00
dbo.Customer:
PK_Customer Name Age
1 John Miller 25
2 Max Monroe 28
I would like to join these two tables BUT when a customer has more than one order, only the one with the most recent date should be joined. This would be the initial code to join the two:
SELECT *
FROM dbo.Customer as Customer
LEFT OUTER JOIN dbo.[Order] AS O -- ORDER is a reserved word, so it must be bracketed
ON Customer.PK_Customer = O.FK_Customer
I have never worked with case conditions in queries. Could anybody give me a hint?

I like using TOP 1 WITH TIES for problems like this:
SELECT TOP 1 WITH TIES *
FROM dbo.Customer c
LEFT OUTER JOIN dbo.[Order] o
ON c.PK_Customer = o.FK_Customer
ORDER BY
ROW_NUMBER() OVER (PARTITION BY c.PK_Customer ORDER BY o.OrderDate DESC);

You can LEFT JOIN only the orders with the latest date:
--CREATE TABLE [Order]
--(
-- PK_Order int,
-- FK_Customer int,
-- OrderDate date,
-- Total decimal(10,2)
--)
--INSERT [Order] VALUES
--(1,1,'2020-01-20',150),
--(2,1,'2020-01-25',200)
--CREATE TABLE Customer
--(
-- PK_Customer int,
-- Name nvarchar(20),
-- Age int
--)
--INSERT [Customer] VALUES
--(1,'John Miller',25),
--(2,'Max Monroe',28)
SELECT *
FROM dbo.Customer C
LEFT OUTER JOIN dbo.[Order] O
ON C.PK_Customer = O.FK_Customer
AND OrderDate=(SELECT MAX(OrderDate) FROM [Order] WHERE [Order].FK_Customer=O.FK_Customer)
Note 1: Since a customer can have several orders on the most recent date, all of them are kept.
Note 2: It's not a good idea to store age, because it must be updated every year. Store the date of birth instead.
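Here is a minimal sketch of the correlated MAX(OrderDate) join. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server. It uses the question's sample rows plus one hypothetical extra order (PK_Order 3) on customer 1's latest date, to illustrate Note 1: every order on the latest date is kept. The subquery's table is aliased as latest only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (PK_Customer INTEGER, Name TEXT, Age INTEGER);
CREATE TABLE [Order] (PK_Order INTEGER, FK_Customer INTEGER, OrderDate TEXT, Total REAL);
INSERT INTO Customer VALUES (1, 'John Miller', 25), (2, 'Max Monroe', 28);
INSERT INTO [Order] VALUES (1, 1, '2020-01-20', 150.00), (2, 1, '2020-01-25', 200.00);
""")
# Hypothetical extra order on customer 1's latest date, to illustrate Note 1.
conn.execute("INSERT INTO [Order] VALUES (3, 1, '2020-01-25', 50.00)")

QUERY = """
SELECT C.PK_Customer, C.Name, O.PK_Order, O.OrderDate, O.Total
FROM Customer C
LEFT OUTER JOIN [Order] O
  ON C.PK_Customer = O.FK_Customer
 AND O.OrderDate = (SELECT MAX(latest.OrderDate)      -- latest date for this customer
                    FROM [Order] AS latest
                    WHERE latest.FK_Customer = O.FK_Customer)
ORDER BY C.PK_Customer, O.PK_Order
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Customer 1 appears twice, once for each order on 2020-01-25. Max Monroe, who has no orders, comes back with NULL order columns because of the LEFT OUTER JOIN.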

This is similar to Tim's answer, except that the PARTITION BY is applied within the orders table, and the join keeps only row number 1 for each customer:
select * from #Customer c
left join
(select ROW_NUMBER() over (partition by FK_Customer order by OrderDate desc) as order_NUM,
PK_Order,
FK_Customer,
OrderDate,
Total from #Order
) o on c.PK_Customer = o.FK_Customer and order_NUM = 1
order by c.PK_Customer, o.OrderDate desc
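The same idea can be sketched in SQLite through Python's sqlite3 module, using only the question's two orders. Window functions need SQLite 3.25 or later. Plain table names replace the #temp tables here. The ROW_NUMBER() derived table and the order_NUM = 1 join condition are as in the answer. Unlike the MAX(OrderDate) approach, this returns at most one order per customer. When several orders share the latest date, which of them it returns is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (PK_Customer INTEGER, Name TEXT, Age INTEGER);
CREATE TABLE [Order] (PK_Order INTEGER, FK_Customer INTEGER, OrderDate TEXT, Total REAL);
INSERT INTO Customer VALUES (1, 'John Miller', 25), (2, 'Max Monroe', 28);
INSERT INTO [Order] VALUES (1, 1, '2020-01-20', 150.00), (2, 1, '2020-01-25', 200.00);
""")

QUERY = """
SELECT c.PK_Customer, c.Name, o.PK_Order, o.OrderDate, o.Total
FROM Customer c
LEFT JOIN (
    -- number each customer's orders, newest first
    SELECT ROW_NUMBER() OVER (PARTITION BY FK_Customer ORDER BY OrderDate DESC) AS order_NUM,
           PK_Order, FK_Customer, OrderDate, Total
    FROM [Order]
) o ON c.PK_Customer = o.FK_Customer AND o.order_NUM = 1   -- join only the newest
ORDER BY c.PK_Customer
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Putting order_NUM = 1 in the ON clause rather than in WHERE is what keeps customers without orders in the result.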

How to test against a list of items in an if statement

I have a large table (130 columns). It is a monthly dataset that is separated by month (Jan, Feb, Mar, ...). Every month I get a small set of duplicate rows. I would like to remove one of the duplicate rows; it does not matter which one is deleted.
This query seems to work when I select only the ID that I want to filter the duplicates on, but when I select everything (*) from the table, I end up with all of the rows, duplicates included. My goal is to filter out the duplicates and insert the result set into a new table.
SELECT DISTINCT a.[ID]
FROM MonthlyLoan a
JOIN (SELECT COUNT(*) as Count, b.[ID]
FROM MonthlyLoan b
GROUP BY b.[ID])
AS b ON a.[ID] = b.[ID]
WHERE b.Count > 1
and effectiveDate = '01/31/2017'
Any help will be appreciated.

This will show you all duplicates per ID:
;WITH Duplicates AS
(
SELECT ID,
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM MonthlyLoan
)
SELECT ID,
rn
FROM Duplicates
WHERE rn > 1
Alternatively, you can filter on rn = 2 to get exactly one duplicate row per duplicated ID.
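For the question's stated goal of copying each ID once into a new table, a minimal sketch could keep only the rn = 1 rows. It runs on SQLite through Python's sqlite3 module. The Balance column, the sample rows and the MonthlyLoanDeduped name are hypothetical. SQLite's CREATE TABLE ... AS stands in for SQL Server's SELECT ... INTO.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MonthlyLoan (ID INTEGER, effectiveDate TEXT, Balance REAL)")
# Hypothetical month of data: IDs 101 and 103 arrive twice.
conn.executemany(
    "INSERT INTO MonthlyLoan VALUES (?, ?, ?)",
    [
        (101, "2017-01-31", 500.0),
        (101, "2017-01-31", 500.0),
        (102, "2017-01-31", 750.0),
        (103, "2017-01-31", 300.0),
        (103, "2017-01-31", 300.0),
    ],
)

# Number the rows within each ID and keep only the first. Since the copies are
# identical, it does not matter which one gets rn = 1.
conn.execute("""
CREATE TABLE MonthlyLoanDeduped AS
SELECT ID, effectiveDate, Balance
FROM (
    SELECT ID, effectiveDate, Balance,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS rn
    FROM MonthlyLoan
) AS numbered
WHERE rn = 1
""")

kept = conn.execute("SELECT ID, Balance FROM MonthlyLoanDeduped ORDER BY ID").fetchall()
print(kept)  # [(101, 500.0), (102, 750.0), (103, 300.0)]
```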

Since your ID is duplicated, all you need is a HAVING clause on your aggregate.
See the example below.
create table #tableA
(
ID int not null
)
insert into #tableA
values
(1),(2),(2),(3),(3),(3),(4),(5)
select ID, COUNT(*) as [Count]
from #tableA
group by ID
having COUNT(*) > 1
Result:
ID Count
----------- -----------
2 2
3 3
To insert the result into a temporary table (#temp):
select ID, COUNT(*) as [Count]
into #temp
from #tableA
group by ID
having COUNT(*) > 1
select * from #temp
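You can reproduce the HAVING example in SQLite through Python's sqlite3 module with the same eight values. SQLite has no #temp tables, so in this sketch CREATE TEMP TABLE ... AS plays the role of SELECT ... INTO #temp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (ID INTEGER NOT NULL)")
conn.executemany("INSERT INTO tableA VALUES (?)", [(1,), (2,), (2,), (3,), (3,), (3,), (4,), (5,)])

# Group by ID and keep only the groups that occur more than once.
conn.execute("""
CREATE TEMP TABLE temp_dups AS
SELECT ID, COUNT(*) AS [Count]
FROM tableA
GROUP BY ID
HAVING COUNT(*) > 1
""")

dups = conn.execute("SELECT ID, [Count] FROM temp_dups ORDER BY ID").fetchall()
print(dups)  # [(2, 2), (3, 3)]
```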

SQL Server - Select most recent records with condition

I have a table like this:
ID EnrollDate ExitDate
1 4/1/16 8/30/16
2 1/1/16 null
2 1/1/16 7/3/16
3 2/1/16 8/1/16
3 2/1/16 9/1/16
4 1/1/16 12/12/16
4 1/1/16 12/12/16
4 1/1/16 12/12/16
4 1/1/16 null
5 5/1/16 11/12/16
5 5/1/16 11/12/16
5 5/1/16 11/12/16
I need to select the most recent records under these conditions:
1. If one and only one record has the most recent enroll date, select that record.
2. If two or more records share the same most recent enroll date and one and only one of them has either a NULL exit date or the most recent exit date, select the record with NULL. If there is no NULL record, pick the record with the most recent exit date.
3. If two or more records have the same enroll date and exit date, don't select those records.
So the expected result for the above table should be:
ID EnrollDate ExitDate
1 4/1/16 8/30/16
2 1/1/16 null
3 2/1/16 9/1/16
4 1/1/16 null
I wrote the following query with GROUP BY, but I am not sure how to apply conditions 2 and 3:
select t1.* from [table] t1
INNER JOIN(SELECT Id,MAX(EnrollDate) maxentrydate
FROM [table]
GROUP BY Id)t2 ON EnrollDate = t2.maxentrydate and t1.Id=t2.Id
Please let me know what is the best way to do this.

Using the rank() window function, I think it's possible.
This is untested, but it should work:
select t.ID, t.EnrollDate, t.ExitDate
from (select t.*,
rank() over(
partition by ID
order by EnrollDate desc,
case when ExitDate is null then 1 else 2 end,
ExitDate desc) as rnk
from tbl t) t
where t.rnk = 1
group by t.ID, t.EnrollDate, t.ExitDate
having count(*) = 1
The basic idea is that the rank() window function will rank the most "recent" rows with a value of 1, which we filter on in the outer query's where clause.
If more than one row has the same "most recent" data, they all share rank 1 and are then filtered out by the having count(*) = 1 clause.
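Here is a minimal sketch of the rank() approach, run on SQLite through Python's sqlite3 module. Window functions need SQLite 3.25 or later. The rows are the question's sample. The dates are rewritten as ISO strings so that text ordering matches date ordering, which is an assumption of this sketch. The table is called tbl as in the answer. The outer alias is renamed to ranked only for readability.

```python
import sqlite3

# The question's sample rows: (ID, EnrollDate, ExitDate); None stands for NULL.
ROWS = [
    (1, "2016-04-01", "2016-08-30"),
    (2, "2016-01-01", None),
    (2, "2016-01-01", "2016-07-03"),
    (3, "2016-02-01", "2016-08-01"),
    (3, "2016-02-01", "2016-09-01"),
    (4, "2016-01-01", "2016-12-12"),
    (4, "2016-01-01", "2016-12-12"),
    (4, "2016-01-01", "2016-12-12"),
    (4, "2016-01-01", None),
    (5, "2016-05-01", "2016-11-12"),
    (5, "2016-05-01", "2016-11-12"),
    (5, "2016-05-01", "2016-11-12"),
]

QUERY = """
SELECT ranked.ID, ranked.EnrollDate, ranked.ExitDate
FROM (
    SELECT t.*,
           RANK() OVER (
               PARTITION BY ID
               ORDER BY EnrollDate DESC,
                        CASE WHEN ExitDate IS NULL THEN 1 ELSE 2 END,  -- NULL exit first
                        ExitDate DESC) AS rnk
    FROM tbl t
) AS ranked
WHERE ranked.rnk = 1                 -- the "most recent" row(s) per ID
GROUP BY ranked.ID, ranked.EnrollDate, ranked.ExitDate
HAVING COUNT(*) = 1                  -- drop IDs whose top rows are exact ties
ORDER BY ranked.ID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INTEGER, EnrollDate TEXT, ExitDate TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The output has IDs 1 to 4 with the same rows as the expected result in the question. ID 5 drops out because its three identical rows all share rank 1.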

Use ROW_NUMBER coupled with CASE expression to achieve the desired result:
WITH Cte AS(
SELECT t.*,
ROW_NUMBER() OVER(
PARTITION BY t.ID
ORDER BY
t.EnrollDate DESC,
CASE WHEN t.ExitDate IS NULL THEN 0 ELSE 1 END,
t.ExitDate DESC
) AS rn
FROM Tbl t
INNER JOIN (
SELECT
ID,
COUNT(DISTINCT CHECKSUM(EnrollDate, ExitDate)) AS DistinctCnt, -- Count distinct combination of EnrollDate and ExitDate per ID
COUNT(*) AS RowCnt -- Count number of rows per ID
FROM Tbl
GROUP BY ID
) a
ON t.ID = a.ID
WHERE
(a.DistinctCnt = 1 AND a.RowCnt = 1)
OR a.DistinctCnt > 1
)
SELECT
ID, EnrollDate, ExitDate
FROM Cte c
WHERE Rn = 1
The ORDER BY clause in the ROW_NUMBER takes care of conditions 2 and 3.
The INNER JOIN and the WHERE clause take care of 1 and 4.

with B as (
select id, enrolldate ,
exitdate,
row_number() over (partition by id order by enrolldate desc, case when exitdate is null then 0 else 1 end, exitdate desc) rn
from ab )
select b1.id, b1.enrolldate, b1.exitdate from b b1
left join b b2
on b1.rn = b2.rn -1 and
b1.id = b2.id and
b1.exitdate = b2.exitdate and
b1.enrolldate = b2.enrolldate
where b1.rn = 1 and
b2.id is NULL
The LEFT JOIN is used to fulfill requirement 3: if a matching record is found (the second-ranked row has the same enroll and exit dates as the first), we don't want that ID.

Join two tables with conditions depending on multiples columns

In SQL Server 2008, I want to join two tables on a key that might have duplicates, but the match is unique given the information in other columns.
As a simplified purchase-record example:
Table A:
UserId PayDate Amount
1 2015 100
1 2010 200
2 2014 150
Table B:
UserId OrderDate Count
1 2009 4
1 2014 2
2 2013 5
Desired Result:
UserId OrderDate PayDate Amount Count
1 2009 2010 200 4
1 2014 2015 100 2
2 2013 2014 150 5
It's guaranteed that:
Table A and Table B have the same number of rows, and UserId in both tables covers the same set of numbers.
For any UserId, PayDate is always later than OrderDate.
Rows with the same UserId are matched by the sorted order of their dates. For example, Row 1 in Table A should match Row 2 in Table B.
My idea is to sort both tables by date, add another Id column to each, and then join on that Id column. But I am not authorized to write anything into the database. How can I do this?

ROW_NUMBER() will be your friend here. It allows you to add a virtual sequence number to your result set.
Run this and study the output:
SELECT UserID
, OrderDate
, "Count" As do_not_use_reserved_words_for_column_names
, Row_Number() OVER (PARTITION BY UserID ORDER BY OrderDate) As sequence
FROM table_b
The PARTITION BY determines when the counter should be "reset", i.e., it restarts after a change of UserID.
The ORDER BY, well, you've guessed it - determines the order of the sequence!
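To see the reset in action without a SQL Server instance, here is a minimal sketch that runs the snippet against SQLite via Python's sqlite3 module. It uses the question's Table B rows, with the years stored as integers (an assumption of this sketch). The Count column is bracket-quoted rather than renamed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_b (UserId INTEGER, OrderDate INTEGER, [Count] INTEGER)")
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?)", [(1, 2009, 4), (1, 2014, 2), (2, 2013, 5)])

# The counter restarts at 1 for each UserId and follows OrderDate within it.
rows = conn.execute("""
SELECT UserId, OrderDate, [Count],
       ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY OrderDate) AS sequence
FROM table_b
ORDER BY UserId, sequence
""").fetchall()
for row in rows:
    print(row)
# (1, 2009, 4, 1)
# (1, 2014, 2, 2)
# (2, 2013, 5, 1)
```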
Pull this all together:
; WITH payments AS (
SELECT UserID
, PayDate
, Amount
, Row_Number() OVER (PARTITION BY UserID ORDER BY PayDate) As sequence
FROM table_a
)
, orders AS (
SELECT UserID
, OrderDate
, "Count" As do_not_use_reserved_words_for_column_names
, Row_Number() OVER (PARTITION BY UserID ORDER BY OrderDate) As sequence
FROM table_b
)
SELECT orders.UserID
, orders.OrderDate
, orders.do_not_use_reserved_words_for_column_names
, payments.PayDate
, payments.Amount
FROM orders
LEFT
JOIN payments
ON payments.UserID = orders.UserID
AND payments.sequence = orders.sequence
P.S. I've opted for an outer join because I assumed that there's not always going to be a payment for every order.
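Once combined, the approach can be checked against the question's data. This minimal sketch runs it on SQLite through Python's sqlite3 module. The table_a/table_b names follow the answer. The years are stored as integers (an assumption), and Count is exposed as OrderCount instead of the answer's long alias. The output matches the desired result in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (UserId INTEGER, PayDate INTEGER, Amount INTEGER);
CREATE TABLE table_b (UserId INTEGER, OrderDate INTEGER, [Count] INTEGER);
INSERT INTO table_a VALUES (1, 2015, 100), (1, 2010, 200), (2, 2014, 150);
INSERT INTO table_b VALUES (1, 2009, 4), (1, 2014, 2), (2, 2013, 5);
""")

QUERY = """
WITH payments AS (
    SELECT UserId, PayDate, Amount,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY PayDate) AS sequence
    FROM table_a
),
orders AS (
    SELECT UserId, OrderDate, [Count] AS OrderCount,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY OrderDate) AS sequence
    FROM table_b
)
SELECT orders.UserId, orders.OrderDate, payments.PayDate, payments.Amount, orders.OrderCount
FROM orders
LEFT JOIN payments
  ON payments.UserId = orders.UserId
 AND payments.sequence = orders.sequence   -- n-th order pairs with n-th payment
ORDER BY orders.UserId, orders.sequence
"""

matched = conn.execute(QUERY).fetchall()
for row in matched:
    print(row)
# (1, 2009, 2010, 200, 4)
# (1, 2014, 2015, 100, 2)
# (2, 2013, 2014, 150, 5)
```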

Try:
;WITH t1
AS
(
SELECT UserId, PayDate, Amount,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY PayDate) AS RN
FROM TableA
),
t2
AS
(
SELECT UserId, OrderDate, [Count],
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY OrderDate) AS RN
FROM TableB
)
SELECT t1.UserId, t2.OrderDate, t1.PayDate, t1.Amount, t2.[Count]
FROM t1
INNER JOIN t2
ON t1.UserId = t2.UserId AND t1.RN = t2.RN

Return NULL columns if IDs don't exist in the table

I have a solution using a LEFT JOIN for the question below, but I'm looking for a more efficient query.
Select * from table1 where Id in (1,2,3,4,5);
returns only the IDs that exist in the table. Now I want all of the listed IDs to be returned, with NULL columns for any ID that does not exist in the table.
For example, the result must contain 3 and 5 even though those IDs do not exist in the table:
ID Name Designation
1 John Employee
2 Nar Manager
3 NULL NULL
4 Esh Executive
5 NULL NULL

select x.id, y.name, y.designation
from (
values (1), (2), (3), (4), (5) -- the requested IDs; row_number() over table1 would only produce as many IDs as table1 has rows
) as x(id)
left join table1 y
on x.id = y.id
This ought to work.
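Here is a minimal sketch of this LEFT JOIN idea on SQLite through Python's sqlite3 module, using the question's rows for IDs 1, 2 and 4. The requested IDs come from a VALUES list in a CTE named wanted (a hypothetical name) rather than from row_number() over table1. row_number() only yields as many numbers as table1 has rows: three here, so IDs 4 and 5 would never be generated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Id INTEGER PRIMARY KEY, Name TEXT, Designation TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "John", "Employee"), (2, "Nar", "Manager"), (4, "Esh", "Executive")],
)

# Every requested ID appears once in `wanted`; the LEFT JOIN fills in NULLs when table1 has no match.
rows = conn.execute("""
WITH wanted(id) AS (VALUES (1), (2), (3), (4), (5))
SELECT w.id, t.Name, t.Designation
FROM wanted AS w
LEFT JOIN table1 AS t ON t.Id = w.id
ORDER BY w.id
""").fetchall()
for row in rows:
    print(row)
```

IDs 3 and 5 come back with None (NULL) in the Name and Designation columns, as the question's expected output shows.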