Using RowNumber and Partition - sql-server

Consider this code:
Select U.[user_id] As UserID
, Max(AL.entry_dt) As LastLoginDate
From Users U with (nolock)
Inner Join activity_log AL with (nolock) On AL.[user_id] = U.[user_id]
And AL.activity_type = 'LOGIN'
And U.external_user = 1
Group By U.[user_id]
Having Max(al.entry_dt) < GetDate() - 30
Order By U.[user_id]
I was curious if the Row_Number / Partition could be used here? Perhaps to make this more effective, or if it can be used at all?
Essentially, I want 1 row per user with the last instance the user logged in where the user hasn't logged in during the last 30 days.
Bring on the pain.....

To use the result of the row_number() in a where clause, wrap the query in a subquery/derived table or common table expression:
Original answer for users that have logged in within the last 30 days:
select UserId, LastLoginDate
from (
Select
U.[user_id] As UserID
, AL.entry_dt As LastLoginDate
, rn = row_number() over(partition by u.user_id order by AL.entry_dt desc)
From Users U with (nolock)
Inner Join activity_log AL with (nolock)
On AL.[user_id] = U.[user_id]
And AL.activity_type = 'LOGIN'
And U.external_user = 1
Where AL.entry_dt > GetDate() - 30 -- swapped < for >
) sub
where rn = 1
Order By sub.[userid]
rextester demo: http://rextester.com/XZU40394
returns:
+--------+---------------+
| UserId | LastLoginDate |
+--------+---------------+
| 1 | 2017-09-13 |
| 2 | 2017-09-10 |
| 3 | 2017-09-07 |
+--------+---------------+
Updated answer for users who have not logged in in the last 30 days:
select UserId, LastLoginDate
from (
Select
U.[user_id] As UserID
, AL.entry_dt As LastLoginDate
, rn = row_number() over(partition by u.user_id order by AL.entry_dt desc)
From Users U with (nolock)
Inner Join activity_log AL with (nolock)
On AL.[user_id] = U.[user_id]
And AL.activity_type = 'LOGIN'
And U.external_user = 1
) sub
where rn = 1
and lastlogindate < getdate() - 30
Order By [userid]
rextester demo: http://rextester.com/XZU40394
returns:
+--------+---------------+
| UserId | LastLoginDate |
+--------+---------------+
| 4 | 2016-09-13 |
| 6 | 2016-09-10 |
+--------+---------------+
from test setup:
create table users (user_id int, external_user bit)
create table activity_log (user_id int, activity_type varchar(32), entry_dt date)
insert into users values (1,1),(2,1),(3,1),(4,1),(5,0),(6,1)
insert into activity_log values
(1,'login','20170913') ,(1,'login','20170912') ,(1,'login','20170911'),(1,'login','20160908')
,(2,'login','20170910') ,(2,'login','20170909') ,(2,'login','20170908')
,(3,'login','20170907') ,(3,'login','20170906') ,(3,'login','20170905')
,(4,'login','20160913') ,(4,'login','20160912') ,(4,'login','20160908')
,(5,'login','20160910') ,(5,'login','20160909') ,(5,'login','20160908')
,(6,'login','20160910') ,(6,'login','20160909') ,(6,'login','20160908')
To correct the query in the question, put the date condition in a having clause rather than a where clause, like so:
Select U.[user_id] As UserID
,Max(AL.entry_dt) As LastLoginDate
From Users U with (nolock)
Inner Join activity_log AL with (nolock) On AL.[user_id] = U.[user_id]
And AL.activity_type = 'LOGIN'
And U.external_user = 1
Group By U.[user_id]
having max(al.entry_dt) < GetDate() - 30
Order By U.[user_id]
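The answer above notes that a common table expression works as well as a derived table, but only shows the derived table. Here is a minimal sketch of the CTE form. The temp tables #users and #activity_log and their rows are made up for illustration and only mirror the column layout of the test setup.

```sql
-- Hypothetical sample data in temp tables (stand-ins for Users and activity_log).
create table #users (user_id int, external_user bit);
create table #activity_log (user_id int, activity_type varchar(32), entry_dt date);
insert into #users values (1,1),(2,1);
insert into #activity_log values
  (1,'LOGIN',dateadd(day,-1,getdate())) ,(1,'LOGIN',dateadd(day,-400,getdate()))
 ,(2,'LOGIN',dateadd(day,-45,getdate())),(2,'LOGIN',dateadd(day,-60,getdate()));

-- Number each user's logins, newest first, inside the CTE.
with last_login as (
  select U.user_id as UserID
       , AL.entry_dt as LastLoginDate
       , row_number() over (partition by U.user_id order by AL.entry_dt desc) as rn
  from #users U
  inner join #activity_log AL
     on AL.user_id = U.user_id
    and AL.activity_type = 'LOGIN'
  where U.external_user = 1
)
-- rn = 1 keeps the latest login; the date test keeps only stale users.
select UserID, LastLoginDate
from last_login
where rn = 1
  and LastLoginDate < dateadd(day,-30,getdate())
order by UserID;
```

With this sample data only user 2 comes back, because user 1's latest login is one day old.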

CROSS APPLY and OUTER APPLY let you return n records from a correlated query for each record in the related table. CROSS APPLY is similar to an INNER JOIN, but it runs the correlated query once for each record in the related table and drops records for which that query returns nothing. OUTER APPLY is similar to a LEFT OUTER JOIN: it returns all records from the related table, with NULLs where the correlated query finds no match. I think CROSS APPLY is what you want, since a user with no login activity at all has no last login date to show.
So in the example below, for each user, the correlated query returns the top 1 record in descending order of entry_dt, i.e. the latest login. With OUTER APPLY, all users would be returned even if no activity occurred.
MODIFIED DEMO: http://rextester.com/UQEI69366 (all 3 below) again thx to SQLZim for tester/data
SELECT U.[user_id] As UserID
, AL.entry_dt As LastLoginDate
FROM Users U with (nolock)
CROSS APPLY (SELECT top 1 *
FROM activity_log IAL
WHERE U.User_ID = IAL.User_ID
AND IAL.activity_type = 'LOGIN'
ORDER BY IAL.entry_DT Desc) AL
WHERE U.external_user = 1
AND AL.entry_dt < GetDate() - 30
ORDER BY U.[user_id]
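For comparison, here is a hedged sketch of the OUTER APPLY variant described above. It also returns users who have never logged in, with a NULL date. The temp tables and their rows are hypothetical stand-ins for Users and activity_log.

```sql
-- Hypothetical sample data: user 1 logged in recently, user 2 long ago, user 3 never.
create table #users (user_id int, external_user bit);
create table #activity_log (user_id int, activity_type varchar(32), entry_dt date);
insert into #users values (1,1),(2,1),(3,1);
insert into #activity_log values
  (1,'LOGIN',dateadd(day,-2,getdate()))
 ,(2,'LOGIN',dateadd(day,-90,getdate()));

select U.user_id as UserID
     , AL.entry_dt as LastLoginDate
from #users U
outer apply (select top 1 IAL.entry_dt          -- latest login per user, if any
             from #activity_log IAL
             where IAL.user_id = U.user_id
               and IAL.activity_type = 'LOGIN'
             order by IAL.entry_dt desc) AL
where U.external_user = 1
  and (AL.entry_dt is null                       -- never logged in
       or AL.entry_dt < dateadd(day,-30,getdate()))
order by U.user_id;
```

With this data the query returns user 2 with a login date and user 3 with NULL. Swapping OUTER APPLY for CROSS APPLY would drop user 3.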
If all you're after is users who haven't logged in in the past 30 days, a simple NOT EXISTS seems like it would work. The last login date doesn't matter in that case; you just need the list of users with no login in the last 30 days.
SELECT U.[user_id] As UserID
FROM Users U
WHERE not exists (SELECT *
FROM activity_log IAL
WHERE IAL.activity_type = 'LOGIN'
AND IAL.entry_dt > GetDate() - 30
AND IAL.[user_id] = U.[user_id])
AND U.external_user = 1
ORDER BY U.[user_id]
A simple LEFT JOIN would work as well (it returns all external users who have not had a login in the 30 days before the present date):
SELECT U.[user_id] As UserID
FROM Users U with (nolock)
LEFT JOIN activity_log AL
ON AL.[user_id] = U.[user_id]
AND AL.activity_type = 'LOGIN'
AND AL.entry_dt > GetDate() - 30
WHERE U.external_user = 1
and AL.user_ID is null
ORDER BY U.[user_id]

I was curious if the Row_Number / Partition could be used here? Perhaps to make this more effective, or if it can be used at all?
I prefer GROUP BY to ROW_NUMBER in your case, since ROW_NUMBER needs an additional index that GROUP BY does not. Read on for details.
Assuming you use the same query you posted, these are the indexes needed.
For the users table:
create index nci_test on
dbo.users(user_id,external_user)
For the activity log table, you will need to know more about the data.
For example, if the join filters out more rows than the where clause, the index can be:
create index nci_test1 on
dbo.activity_log(user_id,entry_dt,activity_type)
If the entry_dt column filters out more rows, entry_dt can be the leading column in the index above.
If you use ROW_NUMBER, it will need a POC (partition, order, covering) index, and since your query spreads across two tables, this can't be done.
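For reference, a POC index is keyed on the partitioning column, then the ordering column, and includes whatever else the query reads. The sketch below shows what one might look like on the activity log alone. The temp table and the index name ix_activity_log_poc are hypothetical, and whether the optimizer can use such an index to skip the sort once the join to Users is involved is the doubt raised above, so treat it as something to test.

```sql
-- Hypothetical stand-in for activity_log, so the statement can be run on its own.
create table #activity_log (user_id int, activity_type varchar(32), entry_dt date);

-- P: user_id (partition by), O: entry_dt desc (order by), C: activity_type (covering).
create index ix_activity_log_poc
on #activity_log (user_id, entry_dt desc)
include (activity_type);

-- A window that could match this index partitions on the activity log's own column:
--   row_number() over (partition by AL.user_id order by AL.entry_dt desc)
```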

Related

SQL COUNT on 3 tables with JOIN

I have 4 tables in a database. The warehouse contains boxes owned by clients, and the boxes have files in them. There is a Client table, a Warehouse table, a Boxes table, and a Files table.
So the Client table has WarehouseID as a foreign key, the Boxes table has ClientID as a foreign key, and the Files table has BoxID as a foreign key. I want to count the number of boxes and files that each client has in my query, as well as the number of boxes that are in and out of the warehouse. A Status field on the Boxes and Files tables determines if the boxes and files are in or out of the warehouse. I run the following query on the boxes and the numbers are correct:
SELECT
[c].[ClientID],
[c].[Name] AS [ClientName],
[w].[Name] AS [WarehouseName],
COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END) AS [BoxesIn],
SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction]
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
ON [c].[WarehouseID] = [w].[WarehouseID]
WHERE [c].[ClientID] = @ClientID
GROUP BY
[c].[ClientID],
[c].[Name],
[w].[Name]
This produces the output of:
ClientID | ClientName | WarehouseName | BoxCount | BoxesIn | BoxesOut | BoxesForDestruction
1 | ACME Corp. | FooFactory | 22744 | 22699 | 45 | 7888
The output of the count is correct. When I add the Files table to the INNER JOIN then the numbers get inflated. Here is the SQL:
SELECT
[c].[ClientID],
[c].[Name] AS [ClientName],
[w].[Name] AS [WarehouseName],
COUNT(DISTINCT [b].[BoxID]) AS [BoxCount],
COUNT(DISTINCT [f].[FileID]) AS [FileCount], -- *NEW*
SUM(CASE WHEN [b].[Status] = @IN THEN 1 ELSE 0 END) AS [BoxesIn],
SUM(CASE WHEN [b].[Status] = @OUT THEN 1 ELSE 0 END) AS [BoxesOut],
SUM(CASE WHEN [b].[DestructionDate] <= GETDATE() THEN 1 ELSE 0 END) AS [BoxesForDestruction]
FROM [Clients] AS [c] INNER JOIN [Boxes] AS [b]
ON [c].[ClientID] = [b].[ClientID]
INNER JOIN [Warehouses] AS [w]
ON [c].[WarehouseID] = [w].[WarehouseID]
INNER JOIN [Files] AS [f] -- *NEW*
ON [b].[BoxID] = [f].[BoxID] -- *NEW*
WHERE [c].[ClientID] = @ClientID
GROUP BY
[c].[ClientID],
[c].[Name],
[w].[Name]
This gives me the count output below (I've omitted the first 3 columns since they're not relevant):
BoxCount | FilesCount | BoxesIn | BoxesOut | BoxesForDestruction
19151 | 411961 | 411381 | 580 | 144615
The FilesCount is correct, but the other numbers are off. I know why this is happening, but I'm not sure how to fix it. The extra rows are created due to the multiple rows returned by the join on the boxes and files. When performing the SUM, the extra rows inflate the count. Since there is only one row for the warehouse, that join doesn't affect the count. How do I modify my query to get the correct number of files and boxes in and out of the warehouse?
A join repeats each row in the left hand table for each row in the right hand table. If you combine multiple joins some rows will be double counted. A solution is to move the count to a subquery. For example:
select *
from table1 t1
join (
select table1_id
, count(*)
from table2
group by
table1_id
) t2
on t2.table1_id = t1.id
join (
select table1_id
, count(*)
from table3
group by
table1_id
) t3
on t3.table1_id = t1.id
Building on Andomar's answer, I added the aliases "myColumnOne" and "myColumnTwo" next to count(*), since SQL Server requires every column of a derived table to have a name:
select *
from table1 t1
join (
select table1_id
, count(*) as myColumnOne
from table2
group by
table1_id
) t2
on t2.table1_id = t1.id
join (
select table1_id
, count(*) as myColumnTwo
from table3
group by
table1_id
) t3
on t3.table1_id = t1.id
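Applied to the boxes and files from the question, the same idea might look like the sketch below. Files are counted per box in a derived table first, so each box still appears exactly once when the box sums run. The temp tables, the sample rows, the int Status column and the @IN / @OUT values are all assumptions made for illustration.

```sql
-- Hypothetical miniature of the question's schema.
create table #Boxes (BoxID int, ClientID int, Status int);
create table #Files (FileID int, BoxID int);
insert into #Boxes values (1,1,1),(2,1,1),(3,1,2);      -- two boxes in, one out
insert into #Files values (10,1),(11,1),(12,1),(13,2);  -- box 3 holds no files
declare @IN int = 1, @OUT int = 2, @ClientID int = 1;

select b.ClientID
     , count(*)                                          as BoxCount
     , sum(case when b.Status = @IN  then 1 else 0 end)  as BoxesIn
     , sum(case when b.Status = @OUT then 1 else 0 end)  as BoxesOut
     , sum(coalesce(f.FileCount, 0))                     as FileCount
from #Boxes b
left join (select BoxID, count(*) as FileCount   -- one row per box, so no fan-out
           from #Files
           group by BoxID) f
       on f.BoxID = b.BoxID
where b.ClientID = @ClientID
group by b.ClientID;
```

For this sample data the result is BoxCount 3, BoxesIn 2, BoxesOut 1 and FileCount 4. The LEFT JOIN keeps boxes that hold no files.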

Group By and inner join with latest records based on TimeStamp

I have a History table as below:
ID | GroupCode | Category | TimeStamp
---+-----------+----------+-----------
1 | x | shoes | 2016-09-01
2 | y | blach | 2016-09-01
History table gets updated every month and a single entry for each GroupCode gets inserted in the table.
I have also a Current table which holds the latest position.
Before or after I update the History table with the current position I would like to find out whether the Category has changed from last month to this month.
I need to compare the last Category with the current Category and, if it has changed, then flag the CategoryChanged in the Current table.
Current table:
ID | GroupCode | Category | CategoryChanged
---+-----------+----------+----------------
1 | x | shoes | True
2 | y | blah | False
I tried to achieve this with an INNER JOIN, but I had no success joining to only the latest month and year entries in the History table.
--get the latest history row per group code based on timestamp
--(the CTE gets its own name so it does not clash with the history table)
;with LatestHistory
as
(select top 1 with ties groupcode,category
from
history
order by
row_number() over (partition by groupcode order by timestamp desc)
)
--now do a left join with current table
select
ct.ID,
ct.GroupCode,
ct.Category,
case when ct.category=ht.category or ht.category is null then 'False'
else 'true'
end as 'changed'
from
currenttable ct
left join
LatestHistory ht
on ht.groupcode=ct.groupcode
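Use the statement below to update, after checking that the select returns the values you expect. The CTE is repeated because a CTE only lasts for the one statement that follows it.
;with LatestHistory
as
(select top 1 with ties groupcode,category
from history
order by row_number() over (partition by groupcode order by timestamp desc)
)
update ct
set ct.CategoryChanged=
case when ct.category=ht.category or ht.category is null then 'False'
else 'true'
end
from
currenttable ct
left join
LatestHistory ht
on ht.groupcode=ct.groupcode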
If you make a CTE in which the History records get a row_number for each GroupCode, ordered by date descending, then you are interested in rows 1 and 2. You can therefore join the CTE to itself on GroupCode, select rows 1 and 2, and see whether the category has changed between them.
;WITH CTE AS (SELECT *, row_number() OVER (PARTITION BY GroupCode ORDER BY TimeStamp Desc) RN FROM History)
SELECT
C1.ID,
C1.GroupCode,
C1.Category,
CASE WHEN C1.Category = C2.Category THEN
'false'
else
'true'
end AS CategoryChanged
FROM CTE C1
JOIN
CTE C2
ON C1.GroupCode = C2.GroupCode
AND C1.Rn=1 AND C2.RN = 2;
If you have NULL categories, you can treat two NULLs as unchanged with the version below. You will need to decide how you want NULLs handled, since the question never mentioned them and people answering cannot guess what you want done with them.
;WITH CTE AS (SELECT *, row_number() OVER (PARTITION BY GroupCode ORDER BY TimeStamp Desc) RN FROM History)
SELECT
C1.ID,
C1.GroupCode,
C1.Category,
CASE WHEN C1.Category = C2.Category OR C1.Category IS NULL AND C2.Category IS NULL THEN
'false'
else
'true'
end AS CategoryChanged
FROM CTE C1
JOIN
CTE C2
ON C1.GroupCode = C2.GroupCode
AND C1.Rn=1 AND C2.RN = 2;

SQL SERVER MULTIPLE JOIN: avoid duplicate values

I have 3 tables in my db:-
Customer:
CId | CName | CLocation | CJoinDate
Payment:
TxnId | TxStatus | TxComment | TxAmount | CId | TxDate
Appointment:
AppId | AppCode | CId | ADate | AComment
When I do a Left join with two tables then the calculated results come right. But when I try to do join with 3 tables then the result calculated is wrong.
for eg:-
If I try this query then the total amount calculated is correct:
SELECT c.CName, sum(p.TxAmount)
FROM Customer c LEFT JOIN Payment p ON c.CId = p.CId
WHERE p.TxStatus = 1
GROUP BY c.CName;
In the above query I am just joining two tables which gives me the correct result.
Now I want to show all the records in one result, so I had to join 3 tables.
Below is the query I tried:
SELECT c.CName as Name, sum(p.TxAmount) as Payment, count(distinct a.ADate) as Total_Visit
FROM Customer c LEFT JOIN Payment p ON c.CId = p.CId LEFT JOIN Appointment a ON c.CId = a.CId
WHERE p.TxStatus = 1
GROUP BY c.CName;
The above query gives me the wrong Payment amount for each customer. The reason is that the Appointment table has more rows than the Payment table for each customer. To show all the appointment entries, the payment rows get duplicated, which makes the calculation wrong.
How can I fix the above query for this scenario?
Thanks
EDIT: Actually there are 2-3 more tables which I have to join, similar to the Appointment table, along with a GROUP BY clause for each month.
EDIT1: Fixed it by multiple CTE. Thanks for your valuable pointers it was indeed helpful.
Use a simple CTE expression if you are sure that your sum is calculated correctly by the first query
WITH cte AS
(
SELECT c.CName, c.CID, sum(p.TxAmount) AS sumAmount
FROM Customer c LEFT JOIN Payment p ON c.CId = p.CId
WHERE p.TxStatus = 1
GROUP BY c.CName, c.CID
)
SELECT cte.CName, cte.sumAmount, count(distinct a.ADate) as Total_Visit
FROM cte LEFT JOIN Appointment a ON cte.CId = a.CId
GROUP BY cte.CName, cte.sumAmount
Try to use a sub query:
SELECT
c.CName as Name
, sum(p.TxAmount) as Payment
, Total_Visit = (SELECT count(distinct a.ADate) FROM Appointment a WHERE a.CId = c.CId)
FROM Customer c
LEFT JOIN Payment p ON c.CId = p.CId
WHERE p.TxStatus = 1
GROUP BY c.CId, c.CName; -- c.CId must be grouped because the subquery references it
You can calculate total_visit per CId and then join the subquery
SELECT
c.CName as Name
, sum(p.TxAmount) as Payment
, a.Total_Visit
FROM Customer c
LEFT JOIN Payment p ON c.CId = p.CId
LEFT JOIN (SELECT a.CId, count(distinct a.ADate) Total_visit FROM Appointment a group by a.CId) as a on c.CId = a.CId
WHERE p.TxStatus = 1
GROUP BY c.CName, a.Total_Visit; -- Total_Visit is not aggregated, so it has to be grouped

Finding all users which appears in ALL departments in SQL server?

I have Table Users
I also have Table Departments
And I have a map table between Users and Departments.
I want to find the names and Ids of all users who appear in all departments.
For example, if there are 5 departments and Paul is only in department #1, Paul will not be in the output list.
He would be only if he is listed in all departments (1..5).
I started doing something which is very long (really long) using a temp table, and I assume there is a better way.
I also created a SQL Fiddle.
There's more than one way of doing this.
You could require that the number of departments that the user is in equals the total number of departments:
SELECT
*
FROM
Users
INNER JOIN
(
SELECT userId, COUNT(*) c FROM MapUserstoDepartments
GROUP BY userId
HAVING COUNT(*) = (SELECT COUNT(*) FROM Departments)
) UsersInAllDepartments
ON Users.userId = UsersInAllDepartments.userId
You could require that removing the user's departments from the list of all departments leaves nothing:
SELECT *
FROM Users
WHERE NOT EXISTS
(
SELECT depId FROM Departments
EXCEPT
SELECT depId FROM MapUserstoDepartments WHERE userId = Users.userId
)
I'm sure there are others.
Try this
SELECT u.userId, u.UserName
FROM MapUserstoDepartments m INNER JOIN
Users u ON u.userId = m.userId
GROUP BY u.userId, u.UserName
HAVING COUNT(m.depId) = (SELECT COUNT(*) FROM Departments)
That will produce
| USERID | USERNAME |
---------------------
| 100 | John |
And sqlfiddle
You can do it like this:
select u.*
from Users u
where not exists
(
select 1
from Departments d
where not exists
(
select 1
from MapUserstoDepartments m
where d.depId = m.depId
and m.userId = u.userId
)
)
SQL Fiddle
Is this what you want?
Select Tbl.userID , Tbl.username from (Select u.userid , u.username ,
count(u.userid) as Count from MapUsersToDepartments m
inner join Users u on m.UserID = u.userID
group by u.userid , u.username)Tbl
where Tbl.Count = (Select count(*) from Departments)
Here is the fiddle
http://sqlfiddle.com/#!3/5a960/53
select
Users.userId,
count(Departments.depId),
count(MapUserstoDepartments.userId)
from
Users
left join MapUserstoDepartments on MapUserstoDepartments.userId = Users.userId
left join Departments on Departments.depId = MapUserstoDepartments.depId
group by
Users.userId
having
(SELECT COUNT(*) from Departments) = count(MapUserstoDepartments.userId)
Try below query. Let me know if this will help
select * from users where userid in
(select userid from MapUserstoDepartments
group by userid
having count(userid) = 5)

need help to group my result

I am using sql server 2008 r2. here's my query.
SELECT TOP 25 A.*, U.Displayname AS UserName,
SU.Displayname AS SmoothieAuthorName,
S.Name AS SmoothieName, S.Id AS SmoothieId
FROM dbo.Activity AS A
LEFT JOIN dbo.[User] AS U ON A.UserId = U.Id
LEFT JOIN dbo.[User] AS SU ON A.SmoothieAuthorId = SU.Id
LEFT JOIN dbo.Smoothie AS S ON A.SmoothieId = S.Id
WHERE A.UserId = 2 --#UserId
AND A.UserId <> A.SmoothieAuthorId
ORDER BY CreatedDate DESC
Following is my result. Now I need to group the rows by SmoothieId and CreatedDate (grouping on the date part only, ignoring the time). The first two rows should collapse into one, and rows 3 to 5 should collapse into one. I'm not sure how to do it, please help.
Id | ActionType | UserId | SmoothieId | SmoothieAuthorId | CreatedDate | UserName | SmoothieAuthorName | SmoothieName | SmoothieId
1 | view | 2 | 128 | 1 | 2013-01-15 20:05:03.403 | mike | test1234 | new testing 2d | 128
2 | view | 2 | 128 | 1 | 2013-01-15 20:16:24.733 | mike | test1234 | new testing 2d | 128
12 | view | 2 | 128 | 1 | 2013-01-16 21:45:56.167 | mike | test1234 | new testing 2d | 128
13 | view | 2 | 128 | 1 | 2013-01-16 22:12:51.217 | mike | test1234 | new testing 2d | 128
14 | view | 2 | 128 | 1 | 2013-01-16 22:12:54.407 | mike | test1234 | new testing 2d | 128
15 | view | 2 | 69 | 1 | 2013-01-16 22:19:54.783 | mike | test1234 | sdfsdfwww | 69
If you need ALL columns from A including Id, I think you'll have a hard time, because Id differs on every row being grouped. I think you'll need to explicitly list the columns from A you are after.
I've also assumed you want a count of records you are grouping, hence the Count(TheDate) element.
Other than that, look at getting just the date portion of a datetime and group on that.
Something like;
SELECT ActionType, UserId, SmoothieId, SmoothieAuthorId,
TheDate, Count(TheDate) AS Occurances, UserName, SmoothieAuthorName,
SmoothieName
FROM (
SELECT TOP 25 A.ActionType, A.UserId, A.SmoothieId,
A.SmoothieAuthorId,
DATEADD(dd, 0, DATEDIFF(dd, 0, A.CreatedDate)) AS TheDate, -- strips the time part
U.Displayname AS UserName,
SU.Displayname AS SmoothieAuthorName,
S.Name AS SmoothieName -- S.Id dropped: a derived table cannot expose two columns named SmoothieId
FROM dbo.Activity AS A
LEFT JOIN dbo.[User] AS U ON A.UserId = U.Id
LEFT JOIN dbo.[User] AS SU ON A.SmoothieAuthorId = SU.Id
LEFT JOIN dbo.Smoothie AS S ON A.SmoothieId = S.Id
WHERE A.UserId = 2 --#UserId
AND A.UserId <> A.SmoothieAuthorId
ORDER BY A.CreatedDate DESC -- allowed here because of TOP; picks the 25 most recent rows
) x GROUP BY ActionType, UserId, SmoothieId, SmoothieAuthorId, UserName,
TheDate, SmoothieAuthorName, SmoothieName
ORDER BY TheDate DESC
Note This isn't tested, it is just a quick suggestion at what I'd try.
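On SQL Server 2008 and later, casting to the date type is another way to drop the time part before grouping. Here is a minimal sketch using the six rows from the question. The temp table #Activity, its reduced column list and the FirstId column are assumptions for illustration.

```sql
-- Hypothetical cut-down copy of the Activity rows shown in the question.
create table #Activity (Id int, SmoothieId int, CreatedDate datetime);
insert into #Activity values
  (1,  128, '2013-01-15T20:05:03.403')
 ,(2,  128, '2013-01-15T20:16:24.733')
 ,(12, 128, '2013-01-16T21:45:56.167')
 ,(13, 128, '2013-01-16T22:12:51.217')
 ,(14, 128, '2013-01-16T22:12:54.407')
 ,(15, 69,  '2013-01-16T22:19:54.783');

select SmoothieId
     , cast(CreatedDate as date) as TheDate      -- date only, time ignored
     , count(*)                  as Occurrences
     , min(Id)                   as FirstId      -- one way to pick a single Id per group
from #Activity
group by SmoothieId, cast(CreatedDate as date)
order by TheDate desc, SmoothieId;
```

This returns three rows: smoothie 69 on 2013-01-16 (1 row, Id 15), smoothie 128 on 2013-01-16 (3 rows, first Id 12) and smoothie 128 on 2013-01-15 (2 rows, first Id 1).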
I'm not sure I fully understand the question. Assuming you want distinct values for those columns, then just use DISTINCT:
SELECT DISTINCT
CAST(CreatedDate AS date) as DateWanted,
SmoothieId
FROM
(
SELECT TOP 25 A.*, U.Displayname AS UserName,
SU.Displayname AS SmoothieAuthorName,
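S.Name AS SmoothieName -- S.Id AS SmoothieId is left out: A.* already returns SmoothieId, and a derived table cannot have two columns with the same name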
FROM dbo.Activity AS A
LEFT JOIN dbo.[User] AS U ON A.UserId = U.Id
LEFT JOIN dbo.[User] AS SU ON A.SmoothieAuthorId = SU.Id
LEFT JOIN dbo.Smoothie AS S ON A.SmoothieId = S.Id
WHERE A.UserId = 2 --#UserId
AND A.UserId <> A.SmoothieAuthorId
) t
Reviewing your comment, you say you need all fields in the Activity table. What do you expect to receive in your Id column for the first 2 records, 1 or 2? Are all the other values in the other columns the same? To just get a single record, you're going to need to either pick one row over the other, or do a group by with the columns that have the same information.
Good luck.

Resources