Getting Top 10 based on column value - sql-server

I have a code that output a long list of the sum of count of work orders per name and sorts it by total, name and count:
;with cte as (
SELECT [Name],
[Emergency],
count([Emergency]) as [CountItem]
FROM tableA
GROUP BY [Name], [Emergency])
select Name,[Emergency],[Count],SUM([CountItem]) OVER(PARTITION BY Name) as Total from cte
order by Total desc, Name, [CountItem] desc
but I only want to get the top 10 Names with the highest total like the one below:
+-------+-------------------------------+-------+-------+
| Name | Emergency | Count | Total |
+-------+-------------------------------+-------+-------+
| PLB | No | 7 | 15 |
| PLB | No Hot Water | 4 | 15 |
| PLB | Resident Locked Out | 2 | 15 |
| PLB | Overflowing Tub | 1 | 15 |
| PLB | No Heat | 1 | 15 |
| GG | Broken Lock - Exterior | 6 | 6 |
| BOA | Broken Lock - Exterior | 2 | 4 |
| BOA | Garage Door not working | 1 | 4 |
| BOA | Resident Locked Out | 1 | 4 |
| 15777 | Smoke Alarm not working | 3 | 3 |
| FP | No air conditioning | 2 | 3 |
| FP | Flood | 1 | 3 |
| KB | No electrical power | 2 | 3 |
| KB | No | 1 | 3 |
| MEM | Noise Complaint | 3 | 3 |
| ANG | Parking Issue | 2 | 2 |
| ALL | Smoke Alarm not working | 2 | 2 |
| AAS | No air conditioning | 1 | 2 |
| AAS | Toilet - Clogged (1 Bathroom) | 1 | 2 |
+-------+-------------------------------+-------+-------+
Note: I'm not after unique values. As you can see from the example above it gets the top 10 names from a very long table.
What I want to happen is assign a row id for each name so all PLB above will have a row id of 1, GG = 2, BOA = 3, ...
So on my final select I will only add the where clause where row id <= 10. I already tried ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) but it's assigning 1 to every unique Name it encounters.

You may try this:
;with cte as (
SELECT [Name],
[Emergency],
count([Emergency]) as [CountItem]
FROM tableA
GROUP BY [Name], [Emergency]),
ct as (
select Name,[Emergency],[Count],SUM([CountItem]) OVER(PARTITION BY PropertyName) as Total from cte
),
ctname as (
select dense_rank() over ( order by total, name ) as RankName, Name,[Emergency],[Count], total from ct )
select * from ctname where rankname < 11

Related

SQL-Server Closure table query

I need a hierarchy for my database and decided to use the closure table model. The hierarchy tables have the usual structure, like this:
locations table
+----+---------+
| id | name |
+----+---------+
| 1 | Europe |
| 2 | France |
| 3 | Germany |
| 4 | Spain |
| 5 | Paris |
| 6 | Nizza |
| 7 | Berlin |
| 8 | Munich |
| 9 | Madrid |
+----+---------+
CREATE TABLE locations (
id int IDENTITY(1,1) PRIMARY KEY,
name varchar(30)
)
lacations_relation table
+----+--------+--------+-------+
| id | src_id | dst_id | depth |
+----+--------+--------+-------+
| 1 | 1 | 1 | 0 |
| 2 | 2 | 2 | 0 |
| 3 | 1 | 2 | 1 |
| 4 | 3 | 3 | 0 |
| 5 | 1 | 3 | 1 |
| 6 | 4 | 4 | 0 |
| 7 | 1 | 4 | 1 |
| 8 | 5 | 5 | 0 |
| 9 | 2 | 5 | 1 |
| 10 | 1 | 5 | 2 |
| 11 | 6 | 6 | 0 |
| 12 | 2 | 6 | 1 |
| 13 | 1 | 6 | 2 |
| 14 | 7 | 7 | 0 |
| 15 | 3 | 7 | 1 |
| 16 | 1 | 7 | 2 |
| 17 | 8 | 8 | 0 |
| 18 | 3 | 8 | 1 |
| 19 | 1 | 8 | 2 |
| 20 | 9 | 9 | 0 |
| 21 | 4 | 9 | 1 |
| 22 | 1 | 9 | 2 |
+----+--------+--------+-------+
CREATE TABLE locations_relation (
id int IDENTITY(1,1) PRIMARY KEY,
src_id int,
dst_id int,
depth int,
CONSTRAINT FK_src FOREIGN KEY (src_id)
REFERENCES locations (id),
CONSTRAINT FK_dst FOREIGN KEY (dst_id)
REFERENCES locations (id)
)
Now there is a third table, which holds information about documents and is referencing the locations table, which looks like this:
closure_junction
+----+------------+-------------+
| id | country_id | document_id |
+----+------------+-------------+
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 6 | 2 |
| 4 | 6 | 3 |
| 5 | 5 | 2 |
| 6 | 5 | 4 |
+----+------------+-------------+
CREATE TABLE closure_junction (
id int IDENTITY(1,1) PRIMARY KEY,
country_id int NOT NULL,
document_id int,
CONSTRAINT FK_countries FOREIGN KEY (id)
REFERENCES countries(id)
)
What I'd like to have is single SQL-Query which counts the document per location and if there are documents in a child it should be counted up in the parent. For example if paris holds 2 documents than france should automatically also hold 2 documents. The query should also output the path of each node to the root aswell as the depth of the node. I know there is way to do this recursively, but I'd like to avoid that.
I have a query which gives me the correct result, but I'm not satisfied with how it works. Is there a way to circumentvent storing the children in a column?
This is my query with the correct output:
;WITH cte (name, path, depth, children) AS
(
SELECT
node.name,
STRING_AGG(locations.name, ' / ' ) WITHIN GROUP (ORDER BY relation.depth DESC) as path,
MAX(relation.depth) as depth,
STRING_AGG(locations.id, ' ') as children
FROM locations node
INNER JOIN locations_relation relation
ON node.id = relation.dst_id
INNER JOIN locations
ON relation.src_id = locations.id
GROUP BY node.name
)
SELECT
name,
path,
depth,
COUNT(DISTINCT document_id) as count_docs
FROM cte
CROSS APPLY string_split(children, ' ')
LEFT JOIN closure_junction ON
closure_junction.country_id = value
GROUP BY name, path, depth
ORDER BY depth ASC
+---------+---------------------------+-------+------------+
| name | path | depth | count_docs |
+---------+---------------------------+-------+------------+
| Europe | Europe | 0 | 0 |
| France | Europe / France | 1 | 2 |
| Germany | Europe / Germany | 1 | 0 |
| Spain | Europe / Spain | 1 | 0 |
| Berlin | Europe / Germany / Berlin | 2 | 0 |
| Madrid | Europe / Spain / Madrid | 2 | 0 |
| Munich | Europe / Germany / Munich | 2 | 0 |
| Nizza | Europe / France / Nizza | 2 | 3 |
| Paris | Europe / France / Paris | 2 | 3 |
+---------+---------------------------+-------+------------+
Would be great if someone could give me a clue on how to accomplish this.
The count you can easily replace with a simple LEFT JOIN, but for this path you will still need to concatenate it somehow.
Something like this:
WITH CTE_path
AS
( SELECT node.id,
STRING_AGG(locations.name, ' / ' ) WITHIN GROUP (ORDER BY relation.depth DESC) as path
FROM locations node
INNER JOIN locations_relation relation
ON node.id = relation.dst_id
INNER JOIN locations
ON relation.src_id = locations.id
GROUP BY node.id)
SELECT l.name,count(DISTINCT cj.document_id),pa.path
FROM locations l
JOIN CTE_path pa
ON pa.id = l.id
LEFT JOIN locations_relation lr
ON l.id = lr.dst_id
LEFT JOIN closure_junction cj
ON cj.country_id = lr.src_id
GROUP BY l.name,pa.path

Divide selected value by count(*)

I have a Microsoft SQL Server with the following tables:
Projects
BookedHours (with fk_Project = Projects.ID)
Products
ProjectsToProducts (n:m with fk_Projects = Projects.ID and fk_Products = Products.ID)
I now want to select how many hours are booked to which product per month. The problem is, that one project can have multiple products (that's why I need the n:m table).
If I do the following, it will count the hours twice if a project has two products.
SELECT
P.ID AS fk_Product, MONTH(B.Datum) AS Monat, SUM(B.Hours) AS Stunden
FROM
tbl_BookedHours AS B
INNER JOIN
tbl_Projects AS M on B.fk_Project = M.ID
INNER JOIN
tbl_ProjectProduct AS PP ON PP.fk_Project = M.ID
INNER JOIN
tbl_Products AS P ON PP.fk_Product = P.ID
WHERE
YEAR(B.Datum) = 2020
GROUP BY
P.ID, MONTH(B.Datum)
ORDER BY
P.ID, MONTH(B.Datum)
I can get the number of products for each project with this SQL:
SELECT fk_Project, COUNT(*) AS Cnt
FROM tbl_ProjectProduct
GROUP By fk_MainProject
But how can I now divide the hours for each project by its individual factor and add it all up per product and month?
I could do it in my C# program or I could use a cursor and iterate through all projects, but I think there should be an more elegant way.
Edit with sample data:
|----------------| |----------------| |------------------------------|
| tbl_Projects | | tbl_Products | | tbl_ProjectProduct |
|----------------| |----------------| |------------------------------|
| ID | Name | | ID | Name | | ID | fk_Project | fk_Product |
|----+-----------| |----+-----------| |------------------------------|
| 1 | Project 1 | | 1 | Product 1 | | 1 | 1 | 1 |
| 2 | Project 2 | | 2 | Product 2 | | 2 | 1 | 2 |
| 3 | Project 3 | | 3 | Product 3 | | 3 | 2 | 1 |
| 4 | Project 4 | | 4 | Product 4 | | 4 | 3 | 3 |
|----------------| |----------------| | 5 | 4 | 1 |
| 6 | 4 | 2 |
| 7 | 4 | 4 |
|------------------------------|
|--------------------------------------|
| tbl_BookedHours |
|--------------------------------------|
| ID | fk_Project | Hours | Date |
|--------------------------------------|
| 1 | 1 | 10 | 2020-01-15 |
| 2 | 1 | 20 | 2020-01-20 |
| 3 | 2 | 10 | 2020-01-15 |
| 4 | 3 | 30 | 2020-01-18 |
| 5 | 2 | 20 | 2020-01-20 |
| 6 | 4 | 30 | 2020-01-25 |
| 7 | 1 | 10 | 2020-02-15 |
| 8 | 1 | 20 | 2020-02-20 |
| 9 | 2 | 10 | 2020-02-15 |
| 10 | 3 | 30 | 2020-03-18 |
| 11 | 2 | 20 | 2020-03-20 |
| 12 | 4 | 30 | 2020-03-25 |
|--------------------------------------|
The Result should be:
|----------------------------|
| fk_Product | Month | Hours |
|----------------------------|
| 1 | 1 | 55 |
| 2 | 1 | 25 |
| 3 | 1 | 30 |
| 4 | 1 | 10 |
| 1 | 2 | 25 |
| 2 | 2 | 15 |
| 1 | 3 | 30 |
| 2 | 3 | 10 |
| 3 | 3 | 30 |
| 4 | 3 | 10 |
|----------------------------|
For example booking Nr. 1 has to be divided by 2 (because Project 1 has two products) and one half of amount added to Product 1 and the other to Product 2 (Both in January). Booking Nr. 4 should not be divided, because Project 3 only has one product. Booking Numer 12 for example has to be divided by 3.
So that in total the Hours in the end add up to the same total.
I hope it's clearer now.
*** EDIT 2***
DECLARE #tbl_Projects TABLE (ID INT, [Name] VARCHAR(MAX))
INSERT INTO #tbl_Projects VALUES
(1,'Project 1'),
(2,'Project 2'),
(3,'Project 3'),
(4,'Project 4')
DECLARE #tbl_Products TABLE (ID INT, [Name] VARCHAR(MAX))
INSERT INTO #tbl_Products VALUES
(1,'Product 1'),
(2,'Product 2'),
(3,'Product 3'),
(4,'Product 4')
DECLARE #tbl_ProjectProduct TABLE (ID INT, fk_Project int, fk_Product int)
INSERT INTO #tbl_ProjectProduct VALUES
(1,1,1),
(2,1,2),
(3,2,1),
(4,3,3),
(5,4,1),
(6,4,2),
(7,4,4)
DECLARE #tbl_BookedHours TABLE (ID INT, fk_Project int, Hours int, [Date] Date)
INSERT INTO #tbl_BookedHours VALUES
(1,1,10,'2020-01-15'),
(2,1,20,'2020-01-20'),
(3,2,10,'2020-01-15'),
(4,3,30,'2020-01-18'),
(5,2,20,'2020-01-20'),
(6,4,30,'2020-01-25'),
(7,1,10,'2020-02-15'),
(8,1,20,'2020-02-20'),
(9,2,10,'2020-02-15'),
(10,3,30,'2020-03-18'),
(11,2,20,'2020-03-20'),
(12,4,30,'2020-03-25')
SELECT P.ID AS fk_Product, MONTH(B.Date) AS Month, SUM(B.Hours) AS SumHours
FROM #tbl_BookedHours AS B INNER JOIN #tbl_Projects AS M on B.fk_Project = M.ID
INNER JOIN #tbl_ProjectProduct AS PP ON PP.fk_Project = M.ID
INNER JOIN #tbl_Products AS P ON PP.fk_Product = P.ID
GROUP BY P.ID,MONTH(B.Date)
ORDER BY P.ID, MONTH(B.Date)
This gives me the wrong result, because it Counts the hours for both products:
| fk_Product | Month | SumHours |
|-------------------------------|
| 1 | 1 | 90 |
| 1 | 2 | 40 |
| 1 | 3 | 50 |
| 2 | 1 | 60 |
| 2 | 2 | 30 |
| 2 | 3 | 30 |
| 3 | 1 | 30 |
| 3 | 3 | 30 |
| 4 | 1 | 30 |
| 4 | 3 | 30 |
|-------------------------------|
Consider the following query. I modified your table variables to temp tables so it was easier to debug.
;WITH CTE AS
(
SELECT fk_Project, count(fk_Product) CNT
FROM #tbl_ProjectProduct
GROUP BY fk_Project
)
,CTE2 AS
(
SELECT t1.Date, t2.fk_Project, Hours/CNT NewHours
FROM #tbl_BookedHours t1
INNER JOIN CTE t2 on t1.fk_Project = t2.fk_Project
)
SELECT t4.ID fk_Product, MONTH(date) MN, SUM(NewHours) HRS
FROM CTE2 t1
INNER JOIN #tbl_Projects t2 on t1.fk_Project = t2.id
INNER JOIN #tbl_ProjectProduct t3 on t3.fk_Project = t2.ID
INNER JOIN #tbl_Products t4 on t4.ID = t3.fk_Product
GROUP BY t4.ID,MONTH(date)

Creating a conditonal ROW_NUMBER() Partition clause based on previous row value

I have a table that looks like this:
+----------------+--------+
| EvidenceNumber | ID |
+----------------+--------+
| 001 | 8 |
| 001.A | 8 |
| 001.A.01 | 8 |
| 001.A.02 | 8 |
| 001.B | 8 |
| 001.C | 8 |
| 001.D | 8 |
| 001.E | 8 |
| 001.F | 8 |
| 001.G | 8 |
| 001.G.01 | 8 |
+----------------+--------+
If 001 were a bag, inside of it was 001.A, 001.B, and so on through to 001.G
In the output above, 001.A was another bag, and that bag contained 001.A.01 and 001.A.02. The same thing can be seen with 001.G.01.
Every entry in this table is either a bag or an item. I am only interested in counting the amount of items per ID.
Since 001.A.01 and 001.A.02 is the last we see of the "001.A's" we know A.01 and A.02 were items.
Since we see 001.B only once, that was an item as well.
001.G was a bag, but 001.G.01 was an item.
The above output is showing 8 items and 3 bags.
I feel like Row_number and the Partition clause is the perfect tool for the job, but I can't find a way to partition based on a clause that uses a previous row's value.
Maybe something like that isn't even necessary here, but I pictured it like:
{001} -- variable
{001}.A -- variable seen again, obviously 001 was a bag. Create new variable {001.A} and move on.
{001.A}.01 -- same thing.
{001.A.01} -- Unique variable. This is a final step. This is a bag and should be Row number 1.
Obviously, the below code is just making "ItemNum" 1 for each item since there are not duplicates.
SELECT
ROW_NUMBER() OVER(Partition BY EvidenceNumber ORDER BY EvidenceNumber) AS ItemNum,
EvidenceNumber,
ID
FROM EVIDENCE
WHERE ID = '18'
ORDER BY EvidenceNumber
+---------+----------------+--------+
| ItemNum | EvidenceNumber | ID |
+---------+----------------+--------+
| 1 | 001 | 8 |
| 1 | 001.A | 8 |
| 1 | 001.A.01 | 8 |
| 1 | 001.A.02 | 8 |
| 1 | 001.B | 8 |
| 1 | 001.C | 8 |
| 1 | 001.D | 8 |
| 1 | 001.E | 8 |
| 1 | 001.F | 8 |
| 1 | 001.G | 8 |
| 1 | 001.G.01 | 8 |
+---------+----------------+--------+
Ideally, it would partition on the items only, so in this case:
+---------+----------------+----+
| ItemNum | EvidenceNumber | ID |
+---------+----------------+----+
| 0 | 001 | 8 |
| 0 | 001.A | 8 |
| 1 | 001.A.01 | 8 |
| 2 | 001.A.02 | 8 |
| 3 | 001.B | 8 |
| 4 | 001.C | 8 |
| 5 | 001.D | 8 |
| 6 | 001.E | 8 |
| 7 | 001.F | 8 |
| 0 | 001.G | 8 |
| 8 | 001.G.01 | 8 |
+---------+----------------+----+
I don't think window functions alone are the best approach. Instead:
select t.*,
(case when exists (select 1
from evidence t2
where t2.caseid = t.caseid and
t2.EvidenceNumber like t.EvidenceNumber + '.%'
)
then 0 else 1
end) as is_item
from evidence t ;
Then sum these up using another subquery:
select t.*,
sum(is_item) over (partition by caseid order by EvidenceNumber) as item_counter
from (select t.*,
(case when exists (select 1
from evidence t2
where t2.caseid = t.caseid and
t2.EvidenceNumber like t.EvidenceNumber + '.%'
)
then 0 else 1
end) as is_item
from evidence t
) t;
trick with Lead and Row_Number:
DECLARE #Table TABLE (
EvidenceNumber varchar(64),
Id int
)
INSERT INTO #Table VALUES
('001',8),
('001.A',8),
('001.A.01',8),
('001.A.02',8),
('001.B',8),
('001.C',8),
('001.D',8),
('001.E',8),
('001.F',8),
('001.G',8),
('001.G.01',8);
WITH CTE AS (
SELECT
[IsBag] = PATINDEX(EvidenceNumber+'%',
IsNull(LEAD(EvidenceNumber) OVER (ORDER BY EvidenceNumber),0)
),
[EvidenceNumber],
[Id]
FROM
#Table
)
SELECT
[NumItem] = IIF(IsBag = 0,ROW_NUMBER() OVER (PARTITION BY [ISBag] order by [IsBag]),0),
[EvidenceNumber],
[Id]
FROM
CTE
ORDER BY EvidenceNumber

What's an efficient way to count "previous" rows in SQL?

Hard to phrase the title for this one.
I have a table of data which contains a row per invoice. For example:
| Invoice ID | Customer Key | Date | Value | Something |
| ---------- | ------------ | ---------- | ------| --------- |
| 1 | A | 08/02/2019 | 100 | 1 |
| 2 | B | 07/02/2019 | 14 | 0 |
| 3 | A | 06/02/2019 | 234 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 |
| 5 | B | 04/02/2019 | 11 | 1 |
| 6 | A | 03/02/2019 | 12 | 0 |
I need to add another column that counts the number of previous rows per CustomerKey, but only if "Something" is equal to 1, so that it returns this:
| Invoice ID | Customer Key | Date | Value | Something | Count |
| ---------- | ------------ | ---------- | ------| --------- | ----- |
| 1 | A | 08/02/2019 | 100 | 1 | 2 |
| 2 | B | 07/02/2019 | 14 | 0 | 1 |
| 3 | A | 06/02/2019 | 234 | 1 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 | 0 |
| 5 | B | 04/02/2019 | 11 | 1 | 0 |
| 6 | A | 03/02/2019 | 12 | 0 | 0 |
I know I can do this using either a CTE like this...
(
select
count(*)
from table
where
[Customer Key] = t.[Customer Key]
and [Date] < t.[Date]
and Something = 1
)
But I have a lot of data and that's pretty slow. I know I can also use cross apply to achieve the same thing, but as far as I can tell that's not any better performing than just using a CTE.
So; is there a more efficient means of achieving this, or do I just suck it up?
EDIT: I originally posted this without the requirement that only rows where Something = 1 are counted. Mea culpa - I asked it in a hurry. Unfortunately I think that this means I can't use row_number() over (partition by [Customer Key])
Assuming you're using SQL Server 2012+ you can use Window Functions:
COUNT(CASE WHEN Something = 1 THEN CustomerKey END) OVER (PARTITION BY CustomerKey ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1 AS [Count]
Old answer before new required logic:
COUNT(CustomerKey) OVER (PARTITION BY CustomerKey ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1 AS [Count]
If you're not using 2012 an alternative is to use ROW_NUMBER
ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY [Date]) - 1 AS Count

SQL Server query for next row value where previous row value

This query gives me Event values from 1 to 20 within an hour, how to add to that if a consecutive Event value is >=200 as well?
SELECT ID, count(Event) as numberoftimes
FROM table_name
WHERE Event >=1 and Event <=20
GROUP BY ID, DATEPART(HH, AtHour)
HAVING DATEPART(HH, AtHour) <= 1
ORDER BY ID desc
In this dummy 24h table:
+----+-------+--------+
| ID | Event | AtHour |
+----+-------+--------+
| 1 | 1 | 11:00 |
| 1 | 4 | 11:01 |
| 1 | 1 | 11:02 |
| 1 | 20 | 11:03 |
| 1 | 200 | 11:04 |
| 1 | 1 | 13:00 |
| 1 | 1 | 13:05 |
| 1 | 2 | 13:06 |
| 1 | 500 | 13:07 |
| 1 | 39 | 13:10 |
| 1 | 50 | 13:11 |
| 1 | 2 | 13:12 |
+----+-------+--------+
I would like to select IDs with Event with values with range between 1 and 20 followed immediately by value greater than or equal to 200 within an hour.
Expected result should be something like that:
+----+--------+
| ID | AtHour |
+----+--------+
| 1 | 11 |
| 1 | 13 |
| 2 | 11 |
| 2 | 14 |
| 3 | 09 |
| 3 | 12 |
+----+--------+
or just how many times it has happened for unique ID instead of which hour.
Please excuse me I am still rusty with post formatting!
CREATE TABLE data (Id INT, Event INT, AtHour SMALLDATETIME);
INSERT data (Id, Event, AtHour) VALUES
(1,1,'2017-03-16 11:00:00'),
(1,4,'2017-03-16 11:01:00'),
(1,1,'2017-03-16 11:02:00'),
(1,20,'2017-03-16 11:03:00'),
(1,200,'2017-03-16 11:04:00'),
(1,1,'2017-03-16 13:00:00'),
(1,1,'2017-03-16 13:05:00'),
(1,2,'2017-03-16 13:06:00'),
(1,500,'2017-03-16 13:07:00'),
(1,39,'2017-03-16 13:10:00')
;
; WITH temp as (
SELECT rownum = ROW_NUMBER() OVER (PARTITION BY id ORDER BY AtHour)
, *
FROM data
)
SELECT a.id, DATEPART(HOUR, a.AtHour) as AtHour, COUNT(*) AS NumOfPairs
FROM temp a JOIN temp b ON a.rownum = b.rownum-1
WHERE a.Event BETWEEN 1 and 20 AND b.Event >= 200
AND DATEDIFF(MINUTE, a.AtHour, b.AtHour) <= 60
GROUP BY a.id, DATEPART(HOUR, a.AtHour)
;

Resources