SQL Server: select distinct by one column and by another column value

This is the data in a SQL Server table:
id user_id start_date status_id payment_id
======================================================
2 4 20-nov-11 1 5
3 5 23-nov-11 1 245
4 5 25-nov-11 1 128
5 6 20-nov-11 1 223
6 6 25-nov-11 2 542
7 4 29-nov-11 2 123
8 4 05-jan-12 2 875
I need to get distinct rows by user_id, ordered by id ascending, keeping only the row with the highest start_date for each user_id.
I need the following output:
id user_id start_date status_id payment_id
======================================================
8 4 05-jan-12 2 875
4 5 25-nov-11 1 128
6 6 25-nov-11 2 542
Please help!
What is the SQL query for this?

You can use row_number() in either a subquery or a CTE.
Subquery Version:
select id, user_id, start_date, status_id, payment_id
from
(
select id, user_id, start_date, status_id, payment_id,
row_number() over(partition by user_id order by start_date desc) rn
from yourtable
) src
where rn = 1
CTE Version:
;with cte as
(
select id, user_id, start_date, status_id, payment_id,
row_number() over(partition by user_id order by start_date desc) rn
from yourtable
)
select id, user_id, start_date, status_id, payment_id
from cte
where rn = 1
Or you can join the table to itself:
select t1.id,
t1.user_id,
t1.start_date,
t1.status_id,
t1.payment_id
from yourtable t1
inner join
(
select user_id, max(start_date) start_date
from yourtable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.start_date = t2.start_date
All of the queries produce the same rows (add ORDER BY user_id if you need them in the order shown):
| ID | USER_ID | START_DATE | STATUS_ID | PAYMENT_ID |
---------------------------------------------------------------------------
| 8 | 4 | January, 05 2012 00:00:00+0000 | 2 | 875 |
| 4 | 5 | November, 25 2011 00:00:00+0000 | 1 | 128 |
| 6 | 6 | November, 25 2011 00:00:00+0000 | 2 | 542 |
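To check the ROW_NUMBER() approach without a SQL Server instance, here is a small sketch that runs the subquery version against the question's sample data in SQLite, using Python's built-in sqlite3 module. SQLite is swapped in for SQL Server here and must be version 3.25 or later (the first with window functions); the dates are rewritten as ISO-8601 text so that text order matches date order, and ORDER BY user_id is added so the output order is deterministic.

```python
import sqlite3

# The question's sample rows: (id, user_id, start_date, status_id, payment_id).
# Assumption: dates are stored as ISO-8601 text so text order equals date order.
ROWS = [
    (2, 4, "2011-11-20", 1, 5),
    (3, 5, "2011-11-23", 1, 245),
    (4, 5, "2011-11-25", 1, 128),
    (5, 6, "2011-11-20", 1, 223),
    (6, 6, "2011-11-25", 2, 542),
    (7, 4, "2011-11-29", 2, 123),
    (8, 4, "2012-01-05", 2, 875),
]

# Subquery version of the answer; ORDER BY user_id is added for a stable order.
# Assumption: SQLite 3.25+ stands in for SQL Server (ROW_NUMBER support).
QUERY = """
SELECT id, user_id, start_date, status_id, payment_id
FROM (
    SELECT id, user_id, start_date, status_id, payment_id,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY start_date DESC) AS rn
    FROM yourtable
) AS src
WHERE rn = 1
ORDER BY user_id
"""


def latest_per_user(rows):
    """Return the row with the latest start_date for each user_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourtable (id INTEGER, user_id INTEGER, start_date TEXT, "
        "status_id INTEGER, payment_id INTEGER)"
    )
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_user(ROWS):
        print(row)
```

Run as a script, it prints the rows with ids 8, 4 and 6, matching the result table above apart from the date format.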

Not the best and untested:
select *
from ServersTable
join (
select User_Id, max(Id) as ID
from ServersTable x
where x.start_date = (
select max(start_date)
from ServersTable y
where y.User_Id = x.User_Id
)
group by User_Id) s on ServersTable.Id = s.Id
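Because this answer is marked untested, here is a hedged sketch of the same correlated-subquery idea run in SQLite from Python's sqlite3 module (an assumption standing in for SQL Server). The column names follow the question's table (user_id, start_date), ServersTable is the answer's placeholder name, and the sketch selects ServersTable.* so the derived table's columns are not repeated in the output. MAX(id) keeps a single row when a user has two rows on the same latest start_date.

```python
import sqlite3

# The question's sample rows: (id, user_id, start_date, status_id, payment_id),
# with dates rewritten as ISO-8601 text (assumption) so MAX() compares them correctly.
ROWS = [
    (2, 4, "2011-11-20", 1, 5),
    (3, 5, "2011-11-23", 1, 245),
    (4, 5, "2011-11-25", 1, 128),
    (5, 6, "2011-11-20", 1, 223),
    (6, 6, "2011-11-25", 2, 542),
    (7, 4, "2011-11-29", 2, 123),
    (8, 4, "2012-01-05", 2, 875),
]

# Correlated subquery finds each user's latest start_date; MAX(id) breaks ties.
QUERY = """
SELECT ServersTable.*
FROM ServersTable
JOIN (
    SELECT x.user_id, MAX(x.id) AS id
    FROM ServersTable AS x
    WHERE x.start_date = (
        SELECT MAX(y.start_date)
        FROM ServersTable AS y
        WHERE y.user_id = x.user_id
    )
    GROUP BY x.user_id
) AS s ON ServersTable.id = s.id
ORDER BY ServersTable.user_id
"""


def latest_per_user_self_join(rows):
    """Run the query against an in-memory SQLite copy of the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ServersTable (id INTEGER, user_id INTEGER, start_date TEXT, "
        "status_id INTEGER, payment_id INTEGER)"
    )
    conn.executemany("INSERT INTO ServersTable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Adding a second row for user 5 dated 2011-11-25 (say id 9) still returns one row for that user, the one with the higher id.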

Related

Count number of sales by order amount

I'm using SQL Server 2008 R2 and doing an analysis on a table that contains CustomerID, OrderAmount, and RegionID. I need to count the number of orders in each OrderAmount category for each region, and if there are no sales in a category, the count should be 0.
Sample of data:
CustomerID | OrderAmount | RegionID
10001 | 50 | 801
10002 | 25 | 801
10003 | 200 | 802
10001 | 100 | 802
10002 | 20 | 802
...
And my expected result is:
RegionID | CategoryID | Num_of_Sales
801 | 1 | 2 -----Below 100
801 | 2 | 0 -----100-200
802 | 1 | 2 -----Below 100
802 | 2 | 1 -----100-200
...
My question is:
1. How do I return 0 for a category that is empty?
2. Is there a better way to write the code (without using UNION)?
WITH Category1 AS(
SELECT * FROM Sales_Table
WHERE NewAmount <= 100
)
, Category2 AS(
SELECT * FROM Sales_Table
WHERE NewAmount BETWEEN 101 AND 200
)
, [...]
SELECT Region_ID, CategoryID, Num_of_Sales
FROM (
SELECT Region_ID, COUNT(*) AS [Num_of_Sales], 1 AS CategoryID
FROM Category1
GROUP BY Region_ID
UNION
SELECT Region_ID, COUNT(*) AS [Num_of_Sales], 2 AS CategoryID
FROM Category2
GROUP BY Region_ID
UNION
[...]
)z
ORDER BY Region_ID, CategoryID
So I used this code and got a result, but the count did not return 0 for the 100-200 category in Region 801.
What you are trying to achieve needs a set holding every RegionID and CategoryID combination. You can then join the counts to that set, as shown below.
With RegCatSales as
(
select RegionID,C,COUNT(*) AS [Num_of_Sales]
from
(
select RegionID,OrderAmount,
CASE
WHEN OrderAmount <= 100 THEN 1
WHEN OrderAmount BETWEEN 101 AND 200 THEN 2
END as C
from Sales_Table x
) xx
group by RegionID, C
),
Regions as
(
select distinct RegionID from RegCatSales
),
Categories as
(
select distinct C from RegCatSales
),
RegCat AS(
select distinct RegionID, C as CategoryID from Regions,Categories
)
select rc.RegionID,rc.CategoryID, ISNULL([Num_of_Sales],0) NUM_Of_Sales from
RegCatSales rcs
right join RegCat rc
on rc.RegionID= rcs.RegionID and rc.CategoryID = rcs.C
order by rc.RegionID, rc.CategoryID
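Here is a sketch of the same idea that also avoids UNION, written for SQLite through Python's sqlite3 module (an assumption, not SQL Server 2008 R2). Instead of deriving the categories from the data, it uses a fixed, hypothetical list of category IDs, so a category with no sales in any region still appears. The counts are LEFT JOINed onto every region/category pair, and COUNT of the joined column returns 0 where no order matches.

```python
import sqlite3

# The question's sample rows: (CustomerID, OrderAmount, RegionID).
SALES = [
    (10001, 50, 801),
    (10002, 25, 801),
    (10003, 200, 802),
    (10001, 100, 802),
    (10002, 20, 802),
]

QUERY = """
WITH Categories AS (        -- hypothetical fixed list: 1 = up to 100, 2 = 101-200
    SELECT 1 AS CategoryID
    UNION ALL
    SELECT 2
),
Regions AS (
    SELECT DISTINCT RegionID FROM Sales_Table
),
Binned AS (                 -- assign each order to a category
    SELECT RegionID,
           CASE
               WHEN OrderAmount <= 100 THEN 1
               WHEN OrderAmount BETWEEN 101 AND 200 THEN 2
           END AS CategoryID
    FROM Sales_Table
)
SELECT r.RegionID, c.CategoryID, COUNT(b.CategoryID) AS Num_of_Sales
FROM Regions AS r
CROSS JOIN Categories AS c
LEFT JOIN Binned AS b
       ON b.RegionID = r.RegionID AND b.CategoryID = c.CategoryID
GROUP BY r.RegionID, c.CategoryID
ORDER BY r.RegionID, c.CategoryID
"""


def category_counts(sales):
    """Count orders per region and category, with 0 for empty categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sales_Table (CustomerID INTEGER, OrderAmount INTEGER, RegionID INTEGER)"
    )
    conn.executemany("INSERT INTO Sales_Table VALUES (?, ?, ?)", sales)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

COUNT(b.CategoryID) counts only non-NULL values, which is why the unmatched 801/2 pair comes back as 0 without needing ISNULL.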

Multiple COUNT(DISTINCT) from CTE

This is for SQL Server 2012: a subset of the data in my CTE looks like this:
Employee | OrderID | OrderType
---------+---------+----------
Kala | 321111 | 953
Paul | 321222 | 1026
Don | 321333 | 1026
Don | 321333 | 953
Kala | 321444 | 953
I'd like the following result:
Employee | 953_Order_Count | 1026_Order_Count
---------+-----------------+-----------------
Kala | 2 | 0
Don | 1 | 1
Paul | 0 | 1
To check that what I want is possible, I ran:
SELECT
Employee,
OrderType,
COUNT(DISTINCT OrderID) AS 'Count'
FROM
CTE
GROUP BY
employee, ordertype
The following result is returned:
Employee | OrderType | Count
---------+-----------+------
Kala | 953 | 1
Paul | 1026 | 1
Don | 1026 | 1
Don | 953 | 1
Close, but not close enough. So I ran:
SELECT
Employee,
COUNT(DISTINCT OrderID) AS 'Total_Orders',
COUNT(DISTINCT (CASE WHEN OrderType = 1026 THEN OrderID END)) AS '1026_Order_Count',
COUNT(DISTINCT(CASE WHEN OrderType = 953 THEN OrderID END)) AS '953_Order_Count'
FROM
CTE
GROUP BY
Employee
The first count (Total_Orders) is accurate, but the other two return 0. If this were not a CTE, I'd use a recursive statement.
Any help is appreciated!
Just use conditional aggregation:
SELECT
Employee,
COUNT(CASE WHEN OrderType = 953 THEN 1 END) AS [953_Order_Count],
COUNT(CASE WHEN OrderType = 1026 THEN 1 END) AS [1026_Order_Count]
FROM CTE
GROUP BY
Employee;
Take the 953 count as an example: the CASE expression yields 1 when the order type is 953 and NULL (the implicit ELSE value) otherwise. COUNT ignores NULLs, so only the 953 orders are counted.
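A small sketch of this conditional aggregation, run in SQLite from Python's sqlite3 module with the question's sample rows. Orders is a hypothetical table standing in for the CTE, SQLite is an assumption standing in for SQL Server 2012, and ORDER BY Employee is added only to make the output order predictable.

```python
import sqlite3

# The question's sample rows: (Employee, OrderID, OrderType).
ORDERS = [
    ("Kala", 321111, 953),
    ("Paul", 321222, 1026),
    ("Don", 321333, 1026),
    ("Don", 321333, 953),
    ("Kala", 321444, 953),
]

# CASE yields 1 for a matching type and NULL otherwise; COUNT skips the NULLs.
QUERY = """
SELECT Employee,
       COUNT(CASE WHEN OrderType = 953 THEN 1 END)  AS "953_Order_Count",
       COUNT(CASE WHEN OrderType = 1026 THEN 1 END) AS "1026_Order_Count"
FROM Orders                 -- hypothetical table holding the CTE's rows
GROUP BY Employee
ORDER BY Employee
"""


def order_type_counts(orders):
    """Return (Employee, 953 count, 1026 count) per employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (Employee TEXT, OrderID INTEGER, OrderType INTEGER)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```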
Tim's answer looks fine. You could also use a PIVOT:
; with cte (Employee, OrderID, OrderType)
as
(
select 'Kala', 321111, 953
union select 'Paul', 321222, 1026
union select 'Don', 321333, 1026
union select 'Don', 321333, 953
union select 'Kala', 321444, 953
)
select Employee, [953] as [953_Order_Count],[1026] as [1026_Order_Count]
from
(
select Employee, OrderType from cte ) as sourceData
pivot
(
count(OrderType)
for OrderType
in ([953],[1026])
) as myPivot
If you want dynamic columns based on the set of values present in the OrderType column, you can build the query dynamically. See Taryn's answer to Understanding PIVOT function in T-SQL for an example.
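As a hedged illustration of building the columns dynamically, here is a Python sketch that first reads the distinct OrderType values and then generates one COUNT(CASE ...) column per value. It runs on SQLite, which has no PIVOT operator, so conditional aggregation stands in for PIVOT; Orders is a hypothetical table holding the CTE's rows. Column aliases cannot be bound as query parameters, so the values are passed through int() before being spliced into the SQL text.

```python
import sqlite3

# The question's sample rows: (Employee, OrderID, OrderType).
ORDERS = [
    ("Kala", 321111, 953),
    ("Paul", 321222, 1026),
    ("Don", 321333, 1026),
    ("Don", 321333, 953),
    ("Kala", 321444, 953),
]


def dynamic_order_counts(orders):
    """Return (column names, rows) with one count column per distinct OrderType."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (Employee TEXT, OrderID INTEGER, OrderType INTEGER)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)
    # Step 1: discover the values that should become columns.
    types = [row[0] for row in conn.execute(
        "SELECT DISTINCT OrderType FROM Orders ORDER BY OrderType")]
    # Step 2: generate one conditional-aggregation column per value.
    # int() ensures only integers end up in the generated SQL text.
    columns = ", ".join(
        f'COUNT(CASE WHEN OrderType = {int(t)} THEN 1 END) AS "{int(t)}_Order_Count"'
        for t in types
    )
    sql = (f"SELECT Employee, {columns} FROM Orders "
           "GROUP BY Employee ORDER BY Employee")
    cursor = conn.execute(sql)
    header = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return header, rows
```

A new OrderType value in the data produces an extra column without editing the query.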

How can I sum durations grouped by overlapping times in SQL Server

I am trying to create a stored proc in SQL Server 2008.
I have a "Timings" Table (which could have thousands of records):
StaffID | MachineID | StartTime | FinishTime
1 | 1 | 01/01/2018 12:00 | 01/01/18 14:30
2 | 1 | 01/01/2018 12:00 | 01/01/18 13:00
3 | 2 | 01/01/2018 12:00 | 01/01/18 13:00
3 | 2 | 01/01/2018 13:00 | 01/01/18 14:00
4 | 3 | 01/01/2018 12:00 | 01/01/18 12:30
5 | 3 | 01/01/2018 11:00 | 01/01/18 13:30
This shows how long each staff member was working on each machine.
I would like to produce a results table as below:
MachineID | StaffQty | TotalMins
1 | 1 | 90
1 | 2 | 60
2 | 1 | 120
3 | 1 | 120
3 | 2 | 30
This would show, for each machine, how many minutes it had exactly one person using it, how many minutes it had two people using it, and so on.
Normally, I would post what I have tried so far, but all my attempts seem to be so far away, I don't think there is much point.
Obviously, I would be very grateful for a complete solution, but I would also appreciate even a little nudge in the right direction.
I think this answers your question:
declare @t table (StaffID int, MachineID int, StartTime datetime2,FinishTime datetime2)
insert into @t(StaffID,MachineID,StartTime,FinishTime) values
(1,1,'2018-01-01T12:00:00','2018-01-01T14:30:00'),
(2,1,'2018-01-01T12:00:00','2018-01-01T13:00:00'),
(3,2,'2018-01-01T12:00:00','2018-01-01T12:30:00')
;With Times as (
select MachineID,StartTime as Time from @t
union
select MachineID,FinishTime from @t
), Ordered as (
select
*,
ROW_NUMBER() OVER (PARTITION BY MachineID ORDER BY Time) rn
from Times
), Periods as (
select
o1.MachineID,o1.Time as StartTime,o2.Time as FinishTime
from
Ordered o1
inner join
Ordered o2
on
o1.MachineID = o2.MachineID and
o1.rn = o2.rn - 1
)
select
p.MachineID,
p.StartTime,
MAX(p.FinishTime) as FinishTime,
COUNT(*) as Cnt,
DATEDIFF(minute,p.StartTime,MAX(p.FinishTime)) as TotalMinutes
from
@t t
inner join
Periods p
on
p.MachineID = t.MachineID and
p.StartTime < t.FinishTime and
t.StartTime < p.FinishTime
group by p.MachineID,p.StartTime
Results:
MachineID StartTime FinishTime Cnt TotalMinutes
----------- --------------------------- --------------------------- ----------- ------------
1 2018-01-01 12:00:00.0000000 2018-01-01 13:00:00.0000000 2 60
1 2018-01-01 13:00:00.0000000 2018-01-01 14:30:00.0000000 1 90
2 2018-01-01 12:00:00.0000000 2018-01-01 12:30:00.0000000 1 30
Hopefully you can see what each of the CTEs is doing. The only place where this may not give you exactly the results you're seeking is if one person's FinishTime is precisely equal to another person's StartTime on the same machine. That should hopefully be rare in real data.
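The query above returns one row per period; to get the MachineID / StaffQty / TotalMins table from the question, the per-period minutes still need to be summed by staff count. The sketch below does that final step in SQLite through Python's sqlite3 module (an assumption standing in for SQL Server). It uses LEAD() in place of the ROW_NUMBER self-join (in T-SQL, LEAD needs SQL Server 2012 or later, not 2008) and strftime('%s', ...) in place of DATEDIFF, and it loads all six sample rows from the question with the dates rewritten in ISO-8601 form. It requires SQLite 3.25 or later for window functions.

```python
import sqlite3

# The question's Timings rows: (StaffID, MachineID, StartTime, FinishTime),
# rewritten as ISO-8601 text (assumption) so SQLite's date functions parse them.
TIMINGS = [
    (1, 1, "2018-01-01 12:00", "2018-01-01 14:30"),
    (2, 1, "2018-01-01 12:00", "2018-01-01 13:00"),
    (3, 2, "2018-01-01 12:00", "2018-01-01 13:00"),
    (3, 2, "2018-01-01 13:00", "2018-01-01 14:00"),
    (4, 3, "2018-01-01 12:00", "2018-01-01 12:30"),
    (5, 3, "2018-01-01 11:00", "2018-01-01 13:30"),
]

QUERY = """
WITH Times AS (            -- distinct start/finish points per machine (UNION dedupes)
    SELECT MachineID, StartTime AS BoundaryTime FROM Timings
    UNION
    SELECT MachineID, FinishTime FROM Timings
),
Periods AS (               -- each point paired with the next point on the same machine
    SELECT MachineID, BoundaryTime AS StartTime,
           LEAD(BoundaryTime) OVER (PARTITION BY MachineID ORDER BY BoundaryTime)
               AS FinishTime
    FROM Times
),
Staffed AS (               -- how many Timings rows overlap each period
    SELECT p.MachineID, p.StartTime, p.FinishTime, COUNT(*) AS StaffQty
    FROM Periods AS p
    JOIN Timings AS t
      ON t.MachineID = p.MachineID
     AND p.StartTime < t.FinishTime
     AND t.StartTime < p.FinishTime
    GROUP BY p.MachineID, p.StartTime, p.FinishTime
)
SELECT MachineID, StaffQty,
       SUM((CAST(strftime('%s', FinishTime) AS INTEGER)
            - CAST(strftime('%s', StartTime) AS INTEGER)) / 60) AS TotalMins
FROM Staffed
GROUP BY MachineID, StaffQty
ORDER BY MachineID, StaffQty
"""


def minutes_by_staff_qty(timings):
    """Sum the minutes each machine spent with 1, 2, ... people on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Timings (StaffID INTEGER, MachineID INTEGER, "
                 "StartTime TEXT, FinishTime TEXT)")
    conn.executemany("INSERT INTO Timings VALUES (?, ?, ?, ?)", timings)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

On this data it should return the same five rows as the desired output in the question, including 120 minutes for machine 2, where one shift ends at 13:00 and the next starts at 13:00. Periods with nobody on the machine drop out because of the inner join.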
For SQL Server 2012+ (please mention your SQL Server version):
Try my script with other sample data, and please post that data if it does not work.
I think my script can be fixed for other test scenarios.
create table #temp(StaffID int,MachineID int,StartTime datetime,FinishTime datetime)
insert into #temp VALUES
(1, 1,'01/01/2018 12:00','01/01/18 14:30')
,(2, 1,'01/01/2018 12:00','01/01/18 13:00')
,(3, 2,'01/01/2018 12:00','01/01/18 12:30')
;
WITH CTE
AS (
SELECT t.*
,t1.StaffQty
,datediff(MINUTE, t.StartTime, t.FinishTime) TotalMinutes
FROM #temp t
CROSS APPLY (
SELECT count(*) StaffQty
FROM #temp t1
WHERE t.machineid = t1.machineid
AND (
t.StartTime >= t1.StartTime
AND t.FinishTime <= t1.FinishTime
)
) t1
)
SELECT MachineID
,StaffQty
,TotalMinutes - isnull(LAG(TotalMinutes, 1) OVER (
PARTITION BY t.MachineID ORDER BY t.StartTime
,t.FinishTime
), 0) AS TotalMinutes
FROM cte t;
drop table #temp
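For SQL Server 2008, which has no LAG function (it was added in SQL Server 2012), use this version instead; run it while #temp still exists: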
;
WITH CTE
AS (
SELECT t.*
,t1.StaffQty
,datediff(MINUTE, t.StartTime, t.FinishTime) TotalMinutes
,ROW_NUMBER() OVER (
PARTITION BY t.machineid ORDER BY t.StartTime
,t.FinishTime
) rn
FROM #temp t
CROSS APPLY (
SELECT count(*) StaffQty
FROM #temp t1
WHERE t.machineid = t1.machineid
AND (
t.StartTime >= t1.StartTime
AND t.FinishTime <= t1.FinishTime
)
) t1
)
SELECT t.MachineID
,t.StaffQty
,t.TotalMinutes - isnull(t1.TotalMinutes, 0) TotalMinutes
FROM cte t
OUTER APPLY (
SELECT TOP 1 TotalMinutes
FROM cte t1
WHERE t.MachineID = t1.machineid
AND t1.rn < t.rn
ORDER BY t1.rn DESC
) t1

Select rows based on count of child table

I have three entities: department, employee, and report. A department has many employees, each of whom has many reports. I want to select the one employee in each department who has the most reports. I have no idea how to even start this query. This question seems very similar, but I can't figure out how to manipulate those answers for what I want.
I have full access to the entire system, so I can make any changes necessary. In the event of a tie, it's safe to arbitrarily pick one of the results.
Department:
ID | Name
----|------
1 | DeptA
2 | DeptB
3 | DeptC
4 | DeptD
Employee:
ID | Name | DeptID
----|------|--------
1 | Joe | 1
2 | John | 1
3 | Emma | 2
4 | Jack | 3
5 | Sven | 3
6 | Axel | 4
7 | Brad | 4
8 | Jane | 4
Report:
ID | EmployeeID
----|------------
1 | 1
2 | 2
3 | 3
4 | 5
5 | 6
6 | 6
7 | 8
Desired result (assuming I queried names only):
Joe OR John (either is acceptable)
Emma
Sven
Axel
How to start this query? Well, get the information about each employee, the department, and the number of reports:
select e.name, e.deptid, count(*) as numreports
from employee e join
report r
on e.id = r.employeeid
group by e.id, e.name, e.deptid;
Now you just want the largest count in each department. I would suggest row_number() or rank(), depending on how you want to handle ties: row_number() picks one employee per department, while rank() returns all tied employees.
select er.*
from (select e.name, e.deptid, count(*) as numreports,
row_number() over (partition by e.deptid order by count(*) desc) as seqnum
from employee e join
report r
on e.id = r.employeeid
group by e.id, e.name, e.deptid
) er
where seqnum = 1;
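Here is a sketch of the row_number() approach run in SQLite from Python's sqlite3 module (an assumption standing in for SQL Server, version 3.25+ for window functions), using the question's sample data. It assumes the tables are named employee and report, and, as a small restructuring, counts reports in one CTE before numbering the rows in a second, instead of ordering by count(*) inside the window. Because of the tie in DeptA, either Joe or John may come back.

```python
import sqlite3

# The question's sample data: employee (id, name, deptid), report (id, employeeid).
EMPLOYEES = [(1, "Joe", 1), (2, "John", 1), (3, "Emma", 2), (4, "Jack", 3),
             (5, "Sven", 3), (6, "Axel", 4), (7, "Brad", 4), (8, "Jane", 4)]
REPORTS = [(1, 1), (2, 2), (3, 3), (4, 5), (5, 6), (6, 6), (7, 8)]

QUERY = """
WITH counts AS (            -- reports per employee (employees with none drop out)
    SELECT e.id, e.name, e.deptid, COUNT(*) AS numreports
    FROM employee AS e
    JOIN report AS r ON e.id = r.employeeid
    GROUP BY e.id, e.name, e.deptid
),
ranked AS (                 -- number employees in each department, most reports first
    SELECT name, deptid, numreports,
           ROW_NUMBER() OVER (PARTITION BY deptid ORDER BY numreports DESC) AS seqnum
    FROM counts
)
SELECT name, deptid, numreports
FROM ranked
WHERE seqnum = 1
ORDER BY deptid
"""


def top_reporter_per_dept(employees, reports):
    """Return one (name, deptid, numreports) row per department."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, deptid INTEGER)")
    conn.execute("CREATE TABLE report (id INTEGER, employeeid INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", employees)
    conn.executemany("INSERT INTO report VALUES (?, ?)", reports)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```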
If you want the department name instead of number, you can join that in as well.
Based on your question, the schema would be as follows (the #Report data includes two extra reports for Jane, IDs 8 and 9):
SELECT * into #Department FROM(
select 1 ID,'DEPTA' NAME
UNION ALL
select 2,'DEPTB'
UNION ALL
select 3,'DEPTC'
UNION ALL
select 4,'DEPTD')TAB
SELECT * INTO #Employee FROM (
SELECT 1 ID ,'Joe' Name , 1 DeptID
UNION ALL
SELECT 2 , 'John' , 1
UNION ALL
SELECT 3 , 'Emma' ,2
UNION ALL
SELECT 4 ,'Jack' , 3
UNION ALL
SELECT 5 ,'Sven' , 3
UNION ALL
SELECT 6 , 'Axel' , 4
UNION ALL
SELECT 7 ,'Brad' , 4
UNION ALL
SELECT 8 ,'Jane' , 4)AS A
SELECT * INTO #Report FROM(
SELECT 1 ID ,1 EmployeeID
UNION ALL
SELECT 2, 2
UNION ALL
SELECT 3 ,3
UNION ALL
SELECT 4, 5
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 6
UNION ALL
SELECT 7, 8
UNION ALL
SELECT 8, 8
UNION ALL
SELECT 9, 8
)AS A
Then apply DENSE_RANK() to rank the employees in each department by their number of reports:
;WITH CTE AS(
select DEP.ID DEP_ID, DEP.NAME DEP,EMP.ID EMP_ID, EMP.Name EMP
,DENSE_RANK() OVER(PARTITION BY DEP.ID ORDER BY COUNT(REP.ID) DESC) REP_RANK
,COUNT(REP.ID) NO_OF_REP FROM #Department DEP
inner join #Employee emp on emp.deptid=dep.id
inner join #report rep on rep.EmployeeID=emp.id
GROUP BY DEP.ID, DEP.NAME ,EMP.ID, EMP.Name
)
SELECT DEP, EMP, NO_OF_REP FROM CTE WHERE REP_RANK=1
In DEPTA, both Joe and John are returned because each has 1 report, which is the maximum count in DEPTA.
The result will be:
+-------+------+-----------+
| DEP | EMP | NO_OF_REP |
+-------+------+-----------+
| DEPTA | Joe | 1 |
| DEPTA | John | 1 |
| DEPTB | Emma | 1 |
| DEPTC | Sven | 1 |
| DEPTD | Jane | 3 |
+-------+------+-----------+
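To see the tie handling, this sketch runs the DENSE_RANK() version against the answer's data in SQLite from Python's sqlite3 module (an assumption standing in for SQL Server; window functions need SQLite 3.25+). SQLite does not use # temp tables, so the data goes into plain department, employee and report tables, and the counting and ranking are split into two CTEs.

```python
import sqlite3

DEPARTMENTS = [(1, "DEPTA"), (2, "DEPTB"), (3, "DEPTC"), (4, "DEPTD")]
EMPLOYEES = [(1, "Joe", 1), (2, "John", 1), (3, "Emma", 2), (4, "Jack", 3),
             (5, "Sven", 3), (6, "Axel", 4), (7, "Brad", 4), (8, "Jane", 4)]
# The answer's #Report rows: the question's seven reports plus IDs 8 and 9 for Jane.
REPORTS = [(1, 1), (2, 2), (3, 3), (4, 5), (5, 6), (6, 6), (7, 8), (8, 8), (9, 8)]

QUERY = """
WITH counts AS (            -- reports per employee, with the department name
    SELECT d.id AS dep_id, d.name AS dep, e.name AS emp, COUNT(r.id) AS no_of_rep
    FROM department AS d
    JOIN employee AS e ON e.deptid = d.id
    JOIN report AS r ON r.employeeid = e.id
    GROUP BY d.id, d.name, e.id, e.name
),
ranked AS (                 -- DENSE_RANK gives tied employees the same rank
    SELECT dep, emp, no_of_rep,
           DENSE_RANK() OVER (PARTITION BY dep_id ORDER BY no_of_rep DESC) AS rep_rank
    FROM counts
)
SELECT dep, emp, no_of_rep
FROM ranked
WHERE rep_rank = 1
ORDER BY dep, emp
"""


def top_reporters_with_ties(departments, employees, reports):
    """Return every employee tied for the most reports in their department."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, deptid INTEGER)")
    conn.execute("CREATE TABLE report (id INTEGER, employeeid INTEGER)")
    conn.executemany("INSERT INTO department VALUES (?, ?)", departments)
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", employees)
    conn.executemany("INSERT INTO report VALUES (?, ?)", reports)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Swapping DENSE_RANK() for ROW_NUMBER() in the ranked CTE would return a single arbitrary employee for DEPTA instead of both.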
Please try the code below:
SELECT D.NAME
FROM (
SELECT C.NAME, RANK() OVER (
PARTITION BY C.DEPTID ORDER BY C.COUNTS DESC
) RNK
FROM (
SELECT A.EMPLOYEEID, B.NAME, COUNT(*) AS COUNTS, B.DEPTID
FROM DBO.REPORT AS A
JOIN DBO.EMPLOYEE AS B ON A.EMPLOYEEID = B.ID
GROUP BY A.EMPLOYEEID, B.NAME, B.DEPTID
) AS C
) AS D
WHERE D.RNK = 1

LAG of MIN in SQL Analytic

I have a table containing the employee id, year id, client id, and the number of sales. For example:
--------------------------------------
id_emp | id_year | sales | client id
--------------------------------------
4 | 1 | 14 | 1
4 | 1 | 10 | 2
4 | 2 | 11 | 1
4 | 2 | 17 | 2
For an employee, I want one row per year with the minimum sales for that year and the minimum sales of the previous year.
One of the queries I tried is the following:
select distinct
id_emp,
id_year,
MIN(sales) OVER(partition by id_emp, id_year) AS min_sales,
LAG(min(sales), 1) OVER(PARTITION BY id_emp, id_year
ORDER BY id_emp, id_year) AS previous
from facts
where id_emp = 4
group by id_emp, id_year, sales;
I get the result:
-------------------------------------
id_emp | id_year | sales | previous
-------------------------------------
4 | 1 | 10 | (null)
4 | 1 | 10 | 10
4 | 2 | 11 | (null)
but I expect to get:
-------------------------------------
id_emp | id_year | sales | previous
-------------------------------------
4 | 1 | 10 | (null)
4 | 2 | 11 | 10
Oracle 11g R2 Schema Setup:
CREATE TABLE EMPLOYEE_SALES ( id_emp, id_year, sales, client_id ) AS
SELECT 4, 1, 14, 1 FROM DUAL
UNION ALL SELECT 4, 1, 10, 2 FROM DUAL
UNION ALL SELECT 4, 2, 11, 1 FROM DUAL
UNION ALL SELECT 4, 2, 17, 2 FROM DUAL;
Query 1:
SELECT ID_EMP,
ID_YEAR,
SALES AS SALES,
LAG( SALES ) OVER ( PARTITION BY ID_EMP ORDER BY ID_YEAR ) AS PREVIOUS
FROM (
SELECT e.*,
ROW_NUMBER() OVER ( PARTITION BY id_emp, id_year ORDER BY sales ) AS RN
FROM EMPLOYEE_SALES e
)
WHERE rn = 1
Query 2:
SELECT ID_EMP,
ID_YEAR,
MIN( SALES ) AS SALES,
LAG( MIN( SALES ) ) OVER ( PARTITION BY ID_EMP ORDER BY ID_YEAR ) AS PREVIOUS
FROM EMPLOYEE_SALES
GROUP BY ID_EMP, ID_YEAR
Results - Both give the same output:
| ID_EMP | ID_YEAR | SALES | PREVIOUS |
|--------|---------|-------|----------|
| 4 | 1 | 10 | (null) |
| 4 | 2 | 11 | 10 |
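Query 2 nests MIN() inside LAG(). This sketch splits that into two steps, a yearly MIN in a CTE and then LAG over the result, and runs it in SQLite from Python's sqlite3 module (an assumption standing in for Oracle; SQLite 3.25+ is needed for LAG). The hypothetical partition parameter lets you compare with the question's PARTITION BY id_emp, id_year, where each year is its own partition, so previous is always NULL.

```python
import sqlite3

# The question's sample data: (id_emp, id_year, sales, client_id).
FACTS = [(4, 1, 14, 1), (4, 1, 10, 2), (4, 2, 11, 1), (4, 2, 17, 2)]

# Aggregate first (one row per employee and year), then LAG across the years.
QUERY = """
WITH yearly AS (
    SELECT id_emp, id_year, MIN(sales) AS min_sales
    FROM facts
    GROUP BY id_emp, id_year
)
SELECT id_emp, id_year, min_sales,
       LAG(min_sales) OVER (PARTITION BY {partition} ORDER BY id_year) AS previous
FROM yearly
ORDER BY id_emp, id_year
"""


def min_with_previous(facts, partition="id_emp"):
    """partition is spliced into the SQL text; pass only trusted column lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (id_emp INTEGER, id_year INTEGER, "
                 "sales INTEGER, client_id INTEGER)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", facts)
    result = conn.execute(QUERY.format(partition=partition)).fetchall()
    conn.close()
    return result
```

With the default partition it returns the expected two rows; with partition="id_emp, id_year" the previous column is NULL (Python None) for both years.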
You mean like this?
select id_emp, id_year, min(sales) as min_sales,
lag(min(sales)) over (partition by id_emp order by id_year) as prev_year_min_sales
from facts
where id_emp = 4
group by id_emp, id_year;
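I believe the problem is that you include the sales column in your GROUP BY clause.
Try removing it and just use
GROUP BY id_emp,id_year
Also partition the LAG by id_emp only (PARTITION BY id_emp ORDER BY id_year); partitioning by id_emp, id_year makes each year its own partition, so LAG always returns NULL.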
You could get your desired output using ROW_NUMBER() and LAG() analytic functions.
For example,
Table
SQL> SELECT * FROM t;
ID_EMP ID_YEAR SALES CLIENT_ID
---------- ---------- ---------- ----------
4 1 14 1
4 1 10 2
4 2 11 1
4 2 17 2
Query
SQL> WITH DATA AS
(SELECT t.*,
row_number() OVER(PARTITION BY id_emp, id_year ORDER BY sales) rn
FROM t
)
SELECT id_emp,
id_year ,
sales ,
lag(sales) over(partition by id_emp order by id_year) previous
FROM DATA
WHERE rn =1;
ID_EMP ID_YEAR SALES PREVIOUS
---------- ---------- ---------- ----------
4 1 10
4 2 11 10
