Retrieve top 2 and bottom 2 employee salaries in each department [duplicate] - sql-server

This question already has answers here:
Select top and bottom rows
(9 answers)
Closed 7 months ago.
Employee salary table with department (deptno):
salary  deptno
--------------
15000   sales
16422   sales
18654   tech
25789   sales
12548   tech
13598   tech

WITH CTE(SALARY, DEPNO) AS
(
SELECT 15000, 'SALES' UNION ALL
SELECT 16422, 'SALES' UNION ALL
SELECT 18654, 'TECH' UNION ALL
SELECT 25789, 'SALES' UNION ALL
SELECT 12548, 'TECH' UNION ALL
SELECT 13598, 'TECH' UNION ALL
SELECT 1, 'KUMAR' UNION ALL
SELECT 2, 'KUMAR' UNION ALL
SELECT 70, 'KUMAR' UNION ALL
SELECT 500, 'KUMAR' UNION ALL
SELECT 1000, 'KUMAR'
)
SELECT X.SALARY, X.DEPNO,
CASE
WHEN X.TOP_COL IN (1, 2) THEN 'TWO MAX_SAL'
WHEN X.BOTTOM_COL IN (1, 2) THEN 'TWO MIN SAL'
END AS FLAG
FROM
(
-- RANK() gives tied salaries the same rank, so ties can return more than two top (or bottom) rows per department
SELECT C.SALARY, C.DEPNO,
RANK() OVER (PARTITION BY C.DEPNO ORDER BY C.SALARY DESC) AS TOP_COL,
RANK() OVER (PARTITION BY C.DEPNO ORDER BY C.SALARY ASC) AS BOTTOM_COL
FROM CTE AS C
) X
WHERE X.TOP_COL IN (1, 2) OR X.BOTTOM_COL IN (1, 2)
Your sample data is represented by the CTE. I have added an extra department, "KUMAR", with five rows to show the two minimum and two maximum salaries clearly.
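If ties matter, the same query can use ROW_NUMBER() instead of RANK(), which caps each direction at two rows per department (tied salaries are then broken arbitrarily). The sketch below is a hedged variant, not the answer's exact query: it runs the SQL through Python's built-in sqlite3 module against an in-memory table (SQLite 3.25 or later is needed for window functions), and the table name emp and the helper top_and_bottom_two are hypothetical. The ELSE branch of the CASE is safe only because the WHERE clause already guarantees each row is in one of the two groups.

```python
import sqlite3

# Sample data from the question plus the extra "KUMAR" department used in the answer.
ROWS = [
    (15000, "SALES"), (16422, "SALES"), (18654, "TECH"),
    (25789, "SALES"), (12548, "TECH"), (13598, "TECH"),
    (1, "KUMAR"), (2, "KUMAR"), (70, "KUMAR"), (500, "KUMAR"), (1000, "KUMAR"),
]

QUERY = """
SELECT x.salary, x.depno,
       CASE WHEN x.top_col <= 2 THEN 'TWO MAX_SAL'
            ELSE 'TWO MIN SAL'
       END AS flag
FROM (
    SELECT c.salary, c.depno,
           -- ROW_NUMBER() never repeats a number within a partition, so at most
           -- two rows qualify per direction; tied salaries are broken arbitrarily.
           ROW_NUMBER() OVER (PARTITION BY c.depno ORDER BY c.salary DESC) AS top_col,
           ROW_NUMBER() OVER (PARTITION BY c.depno ORDER BY c.salary ASC) AS bottom_col
    FROM emp AS c
) AS x
WHERE x.top_col <= 2 OR x.bottom_col <= 2
ORDER BY x.depno, x.salary DESC
"""


def top_and_bottom_two(rows):
    """Return (salary, depno, flag) for the two highest and two lowest salaries per department."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE emp (salary INTEGER, depno TEXT)")
        conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_and_bottom_two(ROWS):
        print(row)
```

Swapping ROW_NUMBER() back to RANK() reproduces the answer's behavior, including the extra rows when salaries tie.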

Related

Filtering on window functions in SQL Server [duplicate]

This question already has answers here:
Why no windowed functions in where clauses?
(8 answers)
SQL Condition on Window function
(5 answers)
Closed 12 months ago.
I generated a CTE called mycte from 5 select statements using union to combine them. The output looks like this for a particular job:
ID    JOB_ID  STATUS  BASE_ID  PERCENTAGE
-----------------------------------------
20DA  GBR01   0       12       20
21DA  GBR01   0       12       30
21DA  GBR01   0       14       50
For every unique JOB_ID the sum of the percentage must be 100%.
To test my CTE, I used:
SELECT JOB_ID, SUM(PERCENTAGE) AS myTOTAL
FROM myCTE
GROUP BY JOB_ID
HAVING SUM(PERCENTAGE) <> 100
ORDER BY SUM(PERCENTAGE)
The output showed that not all sum up to 100 because of dirty data in the database. I then attempted to extract 2 different tables, one for PERCENTAGE = 100% and the other for <> 100%.
Since the columns I needed to extract for the new table are ID, JOB_ID, STATUS, BASE_ID and PERCENTAGE, I applied:
SELECT
ID, JOB_ID, STATUS, BASE_ID, PERCENTAGE,
SUM(percentage) OVER (PARTITION BY JOB_ID, BASE_ID, ID) AS PERCENTAGE_SUM
FROM
mycte
Unfortunately, a WHERE clause cannot reference a window function.
Question: how do I extract only ID, JOB_ID, STATUS, BASE_ID, PERCENTAGE from mycte where sum of the percentage = 100?
Looking at the sample data, it looks like you need to partition by JOB_ID only:
WITH mycte AS (
...
), cte2 as (
SELECT
ID, JOB_ID, STATUS, BASE_ID, PERCENTAGE,
SUM(percentage) OVER (PARTITION BY JOB_ID) AS PERCENTAGE_SUM
FROM mycte
)
SELECT *
FROM cte2
WHERE PERCENTAGE_SUM = 100
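A minimal runnable sketch of the same two-step pattern (compute the window SUM inside a CTE, then filter on it in the outer query), assuming mycte is materialized as an ordinary table. It runs on an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25+). The job XYZ99 and the helper split_by_total are hypothetical; using <> instead of = produces the second table the question asks for.

```python
import sqlite3

# The three GBR01 rows from the question, plus a hypothetical "dirty" job
# XYZ99 whose percentages only add up to 90.
ROWS = [
    ("20DA", "GBR01", 0, 12, 20),
    ("21DA", "GBR01", 0, 12, 30),
    ("21DA", "GBR01", 0, 14, 50),
    ("30DA", "XYZ99", 0, 15, 40),
    ("31DA", "XYZ99", 0, 15, 50),
]

# {op} is filled in with "=" or "<>" to select complete or incomplete jobs.
QUERY = """
WITH cte2 AS (
    SELECT id, job_id, status, base_id, percentage,
           SUM(percentage) OVER (PARTITION BY job_id) AS percentage_sum
    FROM mycte
)
SELECT id, job_id, status, base_id, percentage
FROM cte2
WHERE percentage_sum {op} 100  -- legal here: the window was computed inside cte2
ORDER BY id, base_id
"""


def split_by_total(rows):
    """Return (rows whose job sums to 100, rows whose job does not)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mycte (id TEXT, job_id TEXT, status INTEGER,"
            " base_id INTEGER, percentage INTEGER)"
        )
        conn.executemany("INSERT INTO mycte VALUES (?, ?, ?, ?, ?)", rows)
        complete = conn.execute(QUERY.format(op="=")).fetchall()
        dirty = conn.execute(QUERY.format(op="<>")).fetchall()
        return complete, dirty
    finally:
        conn.close()


if __name__ == "__main__":
    good, bad = split_by_total(ROWS)
    print("sums to 100:", good)
    print("does not:", bad)
```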

select top 10 records for each group in sql [duplicate]

This question already has answers here:
Select top 10 records for each category
(14 answers)
Closed 8 years ago.
I have a table with the following fields:
Hour, PathId, Duration, Event, CellId, Channel
Each CellId has four PathIds (0, 1, 2, 3), and each PathId has many Events, Channels and Durations.
Now I want to display the top 10 records (per PathId) for each CellId.
(Grouping by CellId, PathId and Channel gives the Duration; we take the top ten for each PathId based on Duration.)
I have 50+ CellIds, and each CellId has four PathIds (0, 1, 2, 3).
Please help me.
I want to display the top 10 records (per PathId) for each CellId.
You can use the ROW_NUMBER() function to do that, something like:
WITH Ranked
AS
(
SELECT
Hour,PathId,Duration,Event,CellId,Channel,
ROW_NUMBER() OVER(PARTITION BY cellid ORDER BY pathId) AS RN
FROM tablename
)
SELECT Hour,PathId,Duration,Event,CellId,Channel
FROM Ranked
WHERE RN <= 10
The expression ROW_NUMBER() OVER(PARTITION BY cellid ORDER BY pathId) numbers the rows within each cellid group in pathId order, and the outer query keeps the first 10 rows of each group. (Note that this orders by pathId ascending.)
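The query above ranks rows by pathId rather than by duration. The question's parenthetical (group by cellid, pathid and channel, then take the top ten per pathid based on duration) can be read as: sum Duration per (CellId, PathId, Channel), then keep the ten largest totals per (CellId, PathId). The sketch below follows that reading, which is an assumption, and runs against an in-memory SQLite table through Python's sqlite3 (SQLite 3.25+). The sample rows, the helper top_n_per_path and its n parameter (default 10) are hypothetical; tablename is taken from the answer.

```python
import sqlite3

# Hypothetical rows: (Hour, PathId, Duration, Event, CellId, Channel).
ROWS = [
    (0, 0, 5, "a", 1, 10),
    (1, 0, 7, "b", 1, 10),
    (0, 0, 9, "c", 1, 11),
    (0, 0, 1, "d", 1, 12),
    (0, 1, 4, "e", 1, 10),
    (0, 0, 3, "f", 2, 10),
]

QUERY = """
WITH totals AS (
    -- One row per (CellId, PathId, Channel) with its total Duration.
    SELECT CellId, PathId, Channel, SUM(Duration) AS TotalDuration
    FROM tablename
    GROUP BY CellId, PathId, Channel
), ranked AS (
    -- Number the channels within each (CellId, PathId), longest total first.
    SELECT CellId, PathId, Channel, TotalDuration,
           ROW_NUMBER() OVER (PARTITION BY CellId, PathId
                              ORDER BY TotalDuration DESC) AS RN
    FROM totals
)
SELECT CellId, PathId, Channel, TotalDuration
FROM ranked
WHERE RN <= ?
ORDER BY CellId, PathId, RN
"""


def top_n_per_path(rows, n=10):
    """Return the n channels with the largest total Duration per (CellId, PathId)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (Hour INTEGER, PathId INTEGER, Duration INTEGER,"
            " Event TEXT, CellId INTEGER, Channel INTEGER)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_n_per_path(ROWS, n=2):
        print(row)
```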

How can I output information from a different table? [duplicate]

This question already has answers here:
How to get max number in a column?
(3 answers)
Closed 8 years ago.
I have a query that gets the maximum number of "stars":
<cfquery datasource="Intranet" name="getMaxstars">
SELECT TOP (1) WITH TIES employee, SUM(execoffice_status) AS 'total_max'
FROM CSEReduxResponses
GROUP BY employee
ORDER BY 'total_max' DESC
</cfquery >
I also have a different table, EMPLOYEE, which comes from a different datasource ("phonelist"). This table has the employees' first_name and last_name columns, and both tables share the emp_id column.
How can I output the employee's first_name and last_name from the other table?
What I eventually want to output is:
max:
john doe - stars = 4
Use a subquery like the one below:
select employee_id, sum(stars) as num_stars
from table_a
group by employee_id
having sum(stars) = (select max(num_stars)
from (select employee_id, sum(stars) as num_stars
from table_a
group by employee_id) x)
SELECT TOP (1) WITH TIES employee_id, SUM(stars) AS total
FROM Table_A
GROUP BY employee_id
ORDER BY total DESC
This is an alternative method.
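Neither query above brings in the names. Below is a sketch that joins the per-employee totals to EMPLOYEE, under two assumptions: both tables are reachable from one connection (in the question they come from separate datasources), and CSEReduxResponses.employee holds the emp_id. It keeps ties by comparing against MAX() of the totals, as in the subquery answer, and runs on an in-memory SQLite database through Python's sqlite3. The sample rows and the function name top_star_employees are hypothetical.

```python
import sqlite3

# Hypothetical rows: (employee, execoffice_status) and (emp_id, first_name, last_name).
RESPONSES = [(1, 1), (1, 3), (2, 2), (3, 4)]
EMPLOYEES = [(1, "john", "doe"), (2, "jane", "roe"), (3, "max", "mustermann")]

QUERY = """
WITH totals AS (
    SELECT employee, SUM(execoffice_status) AS total_max
    FROM CSEReduxResponses
    GROUP BY employee
)
SELECT e.first_name, e.last_name, t.total_max
FROM totals AS t
JOIN EMPLOYEE AS e ON e.emp_id = t.employee
-- Keep everyone tied for the highest total, like TOP (1) WITH TIES.
WHERE t.total_max = (SELECT MAX(total_max) FROM totals)
ORDER BY e.last_name, e.first_name
"""


def top_star_employees(responses, employees):
    """Return 'first last - stars = N' lines for the employee(s) with the most stars."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE CSEReduxResponses (employee INTEGER, execoffice_status INTEGER)")
        conn.execute("CREATE TABLE EMPLOYEE (emp_id INTEGER, first_name TEXT, last_name TEXT)")
        conn.executemany("INSERT INTO CSEReduxResponses VALUES (?, ?)", responses)
        conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", employees)
        return [f"{first} {last} - stars = {total}"
                for first, last, total in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print("max:")
    for line in top_star_employees(RESPONSES, EMPLOYEES):
        print(line)
```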

Select Top row from each group [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 8 years ago.
I have a table called DynamicText with the following fields: DynamicID, Content, Timestamp, and DynamicTextEnum.
I'm trying to design a query that selects the most recent record, based on Timestamp, from each group of rows grouped by DynamicTextEnum.
For example:
Enum Timestamp
-----------------
1 1/10/2012
1 2/10/2012
2 1/10/2012
3 3/10/2012
2 3/10/2012
3 4/10/2012
So the results would look like this:
Enum Timestamp
-----------------
1 2/10/2012
2 3/10/2012
3 4/10/2012
My current query simply SELECTs TOP 1 for one Enum, ordered by Timestamp DESC, but it doesn't work when I need all Enums. Any ideas?
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DynamicTextEnum ORDER BY Timestamp DESC) AS rn
FROM DynamicText
)
SELECT *
FROM cte
WHERE rn = 1
Take a look at this question and answer:
Efficiently select top row for each category in the set
You can use a subquery to get the latest enum/timestamp pair for each enum and join it back to the full rows. This assumes you don't have duplicate timestamps for any given enum.
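A sketch of that subquery-and-join approach, run against an in-memory SQLite table through Python's sqlite3 module. The DynamicID and Content values and the helper latest_per_enum are hypothetical, and the question's dates are stored as ISO-8601 text so that MAX() and string comparison order them chronologically. If two rows share an enum's latest timestamp, both are returned.

```python
import sqlite3

# Hypothetical rows shaped like the question's example:
# (DynamicID, Content, Timestamp, DynamicTextEnum).
ROWS = [
    (1, "a", "2012-01-10", 1),
    (2, "b", "2012-02-10", 1),
    (3, "c", "2012-01-10", 2),
    (4, "d", "2012-03-10", 3),
    (5, "e", "2012-03-10", 2),
    (6, "f", "2012-04-10", 3),
]

QUERY = """
SELECT d.DynamicID, d.Content, d."Timestamp", d.DynamicTextEnum
FROM DynamicText AS d
JOIN (
    -- Latest timestamp for each enum value.
    SELECT DynamicTextEnum, MAX("Timestamp") AS MaxTimestamp
    FROM DynamicText
    GROUP BY DynamicTextEnum
) AS latest
  ON latest.DynamicTextEnum = d.DynamicTextEnum
 AND latest.MaxTimestamp = d."Timestamp"
ORDER BY d.DynamicTextEnum
"""


def latest_per_enum(rows):
    """Return the full row holding the most recent Timestamp for each DynamicTextEnum."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE DynamicText (DynamicID INTEGER, Content TEXT,'
            ' "Timestamp" TEXT, DynamicTextEnum INTEGER)'
        )
        conn.executemany("INSERT INTO DynamicText VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_enum(ROWS):
        print(row)
```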

SQL Select Statement For Calculating A Running Average Column

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.
Let me explain
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2), the Numbers from the previous 3 rows
5 6 3 <----- Average of (3, 2, 4), the Numbers from the previous 3 rows
. . .
. . .
The first 3 rows of the Average column are NULL because they do not have three previous rows. Row 4 of the Average column shows the average of the Number column from the previous 3 rows.
I need some help trying to construct a SQL Select statement that will do this.
This should do it:
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;
Edit: I missed the point that it should average the three previous records...
For a general running average, I think something like this would work:
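SELECT
id, number,
-- SUM() OVER (ORDER BY ...) requires SQL Server 2012 or later;
-- multiplying by 1.0 avoids integer division when number is an int
SUM(number) OVER (ORDER BY ID) * 1.0 /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID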
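On SQL Server 2012 or later, window frames (ROWS BETWEEN ...) can produce both the three-row moving average and the general running average without a correlated subquery or self-join. This is a swapped-in technique, not one of the answers here. The sketch runs the SQL on an in-memory SQLite database through Python's sqlite3 (SQLite 3.25+) using the RowsToAverage test data from the first answer; the helper name averages is hypothetical. Number is multiplied by 1.0 because SQL Server's AVG over an int column returns an int.

```python
import sqlite3

# Test data from the first answer: (ID, Number).
ROWS = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 8), (7, 10)]

QUERY = """
SELECT ID, Number,
       CASE
           -- Only report a value once three earlier rows exist (rows 1-3 stay NULL).
           WHEN COUNT(*) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) = 3
           THEN AVG(Number * 1.0) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS MovingAverage,
       -- Average of every row up to and including the current one.
       AVG(Number * 1.0) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS RunningAverage
FROM RowsToAverage
ORDER BY ID
"""


def averages(rows):
    """Return (ID, Number, MovingAverage, RunningAverage) for each row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE RowsToAverage (ID INTEGER NOT NULL, Number INTEGER NOT NULL)")
        conn.executemany("INSERT INTO RowsToAverage VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in averages(ROWS):
        print(row)
```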
A simple self-join would seem to perform much better than a row-referencing subquery.
Generate 10k rows of test data:
IF OBJECT_ID('test10k', 'U') IS NOT NULL DROP TABLE test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
I would pull the special case of the first 3 rows out of the main query; you can UNION ALL them back in if you really want them in the result set. Self-join query:
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3 -- skip the first 3 rows, which have fewer than three predecessors
group by nr.id, nr.Number
On my machine this takes about 10 seconds, while the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to use my test source table):
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.
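A quick check of the self-join approach on the seven RowsToAverage rows from the first answer rather than test10k, run on an in-memory SQLite database through Python's sqlite3 (SQLite 3.25+). It filters on nr.RowNumber > 3 to drop rows that lack three predecessors, and Number * 1.0 keeps the average from being truncated to an integer. The helper name moving_average_self_join is hypothetical.

```python
import sqlite3

# Test data from the first answer: (ID, Number).
ROWS = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 8), (7, 10)]

QUERY = """
WITH NumberedRows AS (
    SELECT rta.*, ROW_NUMBER() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM RowsToAverage AS rta
)
SELECT nr.ID, nr.Number, AVG(trailing.Number * 1.0) AS MovingAverage
FROM NumberedRows AS nr
-- Each row joins to the three rows immediately before it.
JOIN NumberedRows AS trailing
  ON trailing.RowNumber BETWEEN nr.RowNumber - 3 AND nr.RowNumber - 1
WHERE nr.RowNumber > 3
GROUP BY nr.ID, nr.Number
ORDER BY nr.ID
"""


def moving_average_self_join(rows):
    """Return (ID, Number, MovingAverage) for rows that have three predecessors."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE RowsToAverage (ID INTEGER NOT NULL, Number INTEGER NOT NULL)")
        conn.executemany("INSERT INTO RowsToAverage VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in moving_average_self_join(ROWS):
        print(row)
```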
Check out some solutions here. I'm sure that you could adapt one of them easily enough.
If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running-total aggregate would be by far the most efficient way to calculate a running average like this.
Alternatively you can denormalize and store precalculated running values. Described here:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
SELECT performance is as fast as it gets; modifications, of course, are slower.
