Selecting grouped rows after the first two rows in SQL Server

This is a bit of a tricky situation, and my search-fu failed me.
Let's say I have the following data:
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 3 | 1 | foo | 2/8/2016 |
| 4 | 1 | foo | 2/11/2016 |
| 5 | 2 | bar | 1/11/2016 |
| 6 | 2 | bar | 2/11/2016 |
| 7 | 3 | baz | 2/1/2016 |
| 8 | 3 | baz | 2/3/2016 |
| 9 | 3 | baz | 2/11/2016 |
And I would like to omit a variable number of leading rows per SharedID (the rows with the most recent dates, in this case); let's say that number is 2 in this example. The resulting table would be something like this:
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 7 | 3 | baz | 2/1/2016 |
Is this possible in SQL? Essentially, I want to skip a variable number of rows in each SharedID group, using the Date column for the ordering. The goal is to get the oldest rows of each type and a list of their UIDs in the process.

Sure, it's possible. Use the ROW_NUMBER function to number each row, partitioning by the SharedID column so that the count restarts for every SharedID, ordering by Date descending so the most recent row gets 1, and then select the rows whose number is greater than your limit.
WITH cteNumberedRows AS (
SELECT UID, SharedID, Type, Date,
ROW_NUMBER() OVER(PARTITION BY SharedID ORDER BY Date DESC) AS RowNum
FROM YourTable
)
SELECT UID, SharedID, Type, Date
FROM cteNumberedRows
WHERE RowNum > 2;
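To sanity-check the ROW_NUMBER approach on the sample data, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite (3.25 or later, which added window functions) is only a stand-in for SQL Server here, the dates are stored as ISO strings so they sort chronologically, and the hard-coded 2 is replaced by a bound parameter; YourTable is the answer's placeholder name.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings so that
# ordering by the text column matches chronological order.
ROWS = [
    (1, 1, "foo", "2016-02-04"),
    (2, 1, "foo", "2016-02-05"),
    (3, 1, "foo", "2016-02-08"),
    (4, 1, "foo", "2016-02-11"),
    (5, 2, "bar", "2016-01-11"),
    (6, 2, "bar", "2016-02-11"),
    (7, 3, "baz", "2016-02-01"),
    (8, 3, "baz", "2016-02-03"),
    (9, 3, "baz", "2016-02-11"),
]

# Same shape as the answer's CTE; the hard-coded 2 becomes a bound parameter.
QUERY = """
WITH cteNumberedRows AS (
    SELECT UID, SharedID, Type, "Date",
           ROW_NUMBER() OVER (PARTITION BY SharedID ORDER BY "Date" DESC) AS RowNum
    FROM YourTable
)
SELECT UID, SharedID, Type, "Date"
FROM cteNumberedRows
WHERE RowNum > ?
ORDER BY UID
"""


def rows_after_most_recent(skip):
    """Return every row except the `skip` most recent ones per SharedID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE YourTable (UID INTEGER, SharedID INTEGER, Type TEXT, "Date" TEXT)'
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY, (skip,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rows_after_most_recent(2):
        print(row)
```

With skip = 2 this returns UIDs 1, 2 and 7, the rows the question expects; in SQL Server the same WHERE RowNum > @n pattern works with a variable.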

Not sure if I understand what you mean, but something like this?
SELECT t1.*
FROM MyTable t1
WHERE t1.UID NOT IN (
SELECT TOP 2 t2.UID
FROM MyTable t2
WHERE t2.SharedID = t1.SharedID
ORDER BY t2.[Date] DESC
)
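The correlated NOT IN version can be checked the same way. This sketch again uses SQLite through Python's sqlite3 module as a stand-in for SQL Server; SQLite has no TOP, so LIMIT plays that role. It assumes UID is never NULL, because NOT IN yields no rows at all if the subquery returns a NULL.

```python
import sqlite3

# Sample rows from the question (ISO dates so they sort chronologically).
ROWS = [
    (1, 1, "foo", "2016-02-04"),
    (2, 1, "foo", "2016-02-05"),
    (3, 1, "foo", "2016-02-08"),
    (4, 1, "foo", "2016-02-11"),
    (5, 2, "bar", "2016-01-11"),
    (6, 2, "bar", "2016-02-11"),
    (7, 3, "baz", "2016-02-01"),
    (8, 3, "baz", "2016-02-03"),
    (9, 3, "baz", "2016-02-11"),
]

# Keep every row whose UID is not among the `skip` most recent UIDs of its
# SharedID group. LIMIT stands in for SQL Server's TOP.
QUERY = """
SELECT t1.UID, t1.SharedID, t1.Type, t1."Date"
FROM MyTable AS t1
WHERE t1.UID NOT IN (
    SELECT t2.UID
    FROM MyTable AS t2
    WHERE t2.SharedID = t1.SharedID
    ORDER BY t2."Date" DESC
    LIMIT ?
)
ORDER BY t1.UID
"""


def rows_not_in_top(skip):
    """Return the rows left after removing the `skip` newest per SharedID."""
    conn = sqlite3.connect(":memory:")
    # UID is the primary key, so the subquery can never return NULL.
    conn.execute(
        'CREATE TABLE MyTable (UID INTEGER PRIMARY KEY, SharedID INTEGER, Type TEXT, "Date" TEXT)'
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY, (skip,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print([row[0] for row in rows_not_in_top(2)])
```

The NOT IN form evaluates a correlated subquery per row, which can be slower on large tables than the single pass of a ROW_NUMBER query.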

Related

Creating a conditional ROW_NUMBER() Partition clause based on previous row value

I have a table that looks like this:
+----------------+--------+
| EvidenceNumber | ID |
+----------------+--------+
| 001 | 8 |
| 001.A | 8 |
| 001.A.01 | 8 |
| 001.A.02 | 8 |
| 001.B | 8 |
| 001.C | 8 |
| 001.D | 8 |
| 001.E | 8 |
| 001.F | 8 |
| 001.G | 8 |
| 001.G.01 | 8 |
+----------------+--------+
If 001 is a bag, then inside it are 001.A, 001.B, and so on through 001.G.
In the table above, 001.A is another bag, and that bag contains 001.A.01 and 001.A.02. The same thing can be seen with 001.G and 001.G.01.
Every entry in this table is either a bag or an item. I am only interested in counting the number of items per ID.
Since 001.A.01 and 001.A.02 are the last we see of the "001.A" entries, we know A.01 and A.02 were items.
Since we see 001.B only once, it was an item as well.
001.G was a bag, but 001.G.01 was an item.
The table above therefore contains 8 items and 3 bags.
I feel like Row_number and the Partition clause is the perfect tool for the job, but I can't find a way to partition based on a clause that uses a previous row's value.
Maybe something like that isn't even necessary here, but I pictured it like:
{001} -- variable
{001}.A -- variable seen again, obviously 001 was a bag. Create new variable {001.A} and move on.
{001.A}.01 -- same thing.
{001.A.01} -- Unique variable. This is a final step. This is an item and should be row number 1.
As expected, the code below just sets ItemNum to 1 for every row: no EvidenceNumber is duplicated, so each row forms its own partition.
SELECT
ROW_NUMBER() OVER(Partition BY EvidenceNumber ORDER BY EvidenceNumber) AS ItemNum,
EvidenceNumber,
ID
FROM EVIDENCE
WHERE ID = 8
ORDER BY EvidenceNumber
+---------+----------------+--------+
| ItemNum | EvidenceNumber | ID |
+---------+----------------+--------+
| 1 | 001 | 8 |
| 1 | 001.A | 8 |
| 1 | 001.A.01 | 8 |
| 1 | 001.A.02 | 8 |
| 1 | 001.B | 8 |
| 1 | 001.C | 8 |
| 1 | 001.D | 8 |
| 1 | 001.E | 8 |
| 1 | 001.F | 8 |
| 1 | 001.G | 8 |
| 1 | 001.G.01 | 8 |
+---------+----------------+--------+
Ideally, it would number only the items and give every bag 0, so in this case:
+---------+----------------+----+
| ItemNum | EvidenceNumber | ID |
+---------+----------------+----+
| 0 | 001 | 8 |
| 0 | 001.A | 8 |
| 1 | 001.A.01 | 8 |
| 2 | 001.A.02 | 8 |
| 3 | 001.B | 8 |
| 4 | 001.C | 8 |
| 5 | 001.D | 8 |
| 6 | 001.E | 8 |
| 7 | 001.F | 8 |
| 0 | 001.G | 8 |
| 8 | 001.G.01 | 8 |
+---------+----------------+----+
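I don't think window functions alone are the best approach. Instead, first flag each row: a row is a bag (0) if another row with the same ID has an EvidenceNumber that starts with this row's number followed by a dot, and an item (1) otherwise:
select t.*,
(case when exists (select 1
from evidence t2
where t2.ID = t.ID and
t2.EvidenceNumber like t.EvidenceNumber + '.%'
)
then 0 else 1
end) as is_item
from evidence t ;
Then number the items with a running sum over that flag in an outer query, returning 0 for bags so the result matches the desired output:
select t.*,
(case when is_item = 1
then sum(is_item) over (partition by ID order by EvidenceNumber)
else 0
end) as item_counter
from (select t.*,
(case when exists (select 1
from evidence t2
where t2.ID = t.ID and
t2.EvidenceNumber like t.EvidenceNumber + '.%'
)
then 0 else 1
end) as is_item
from evidence t
) t;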
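A minimal sketch of the flag-then-running-sum idea on the question's data, using SQLite via Python's sqlite3 module as a stand-in for SQL Server. SQLite concatenates strings with || rather than +; the table name evidence and the column ID follow the question. A CASE around the running sum forces bags to 0, so the output can be compared with the desired ItemNum column directly.

```python
import sqlite3

# Evidence numbers from the question, all belonging to ID 8.
EVIDENCE = ["001", "001.A", "001.A.01", "001.A.02", "001.B", "001.C",
            "001.D", "001.E", "001.F", "001.G", "001.G.01"]

# Inner query: is_item = 0 when some row of the same ID starts with
# "<this number>." (i.e. the row has children), otherwise 1.
# Outer query: a running SUM of is_item numbers the items; bags get 0.
QUERY = """
SELECT EvidenceNumber, ID,
       CASE WHEN is_item = 1
            THEN SUM(is_item) OVER (PARTITION BY ID ORDER BY EvidenceNumber)
            ELSE 0
       END AS item_counter
FROM (
    SELECT t.*,
           CASE WHEN EXISTS (SELECT 1
                             FROM evidence t2
                             WHERE t2.ID = t.ID
                               AND t2.EvidenceNumber LIKE t.EvidenceNumber || '.%')
                THEN 0 ELSE 1
           END AS is_item
    FROM evidence t
) t
ORDER BY EvidenceNumber
"""


def number_items():
    """Return (EvidenceNumber, ID, item_counter) rows in evidence order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE evidence (EvidenceNumber TEXT, ID INTEGER)")
    conn.executemany("INSERT INTO evidence VALUES (?, 8)", [(e,) for e in EVIDENCE])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in number_items():
        print(row)
```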
Another option is a trick with LEAD and ROW_NUMBER (LEAD and IIF need SQL Server 2012 or later):
DECLARE @Table TABLE (
EvidenceNumber varchar(64),
Id int
)
INSERT INTO @Table VALUES
('001',8),
('001.A',8),
('001.A.01',8),
('001.A.02',8),
('001.B',8),
('001.C',8),
('001.D',8),
('001.E',8),
('001.F',8),
('001.G',8),
('001.G.01',8);
WITH CTE AS (
SELECT
[IsBag] = PATINDEX(EvidenceNumber+'%',
IsNull(LEAD(EvidenceNumber) OVER (PARTITION BY [Id] ORDER BY EvidenceNumber),0)
),
[EvidenceNumber],
[Id]
FROM
@Table
)
SELECT
[NumItem] = IIF(IsBag = 0,ROW_NUMBER() OVER (PARTITION BY [Id], [IsBag] ORDER BY [EvidenceNumber]),0),
[EvidenceNumber],
[Id]
FROM
CTE
ORDER BY EvidenceNumber
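The LEAD trick relies on children sorting immediately after their parent: a row is a bag exactly when the next EvidenceNumber in sort order starts with it. A minimal sketch under that assumption, again using SQLite via Python's sqlite3 module as a stand-in for SQL Server. SQLite has no PATINDEX or IIF, so a LIKE prefix test (including the separating dot, slightly stricter than a bare prefix) and CASE take their place; ROW_NUMBER is ordered by EvidenceNumber so the item numbers follow the evidence order.

```python
import sqlite3

# Evidence numbers from the question, all belonging to Id 8.
EVIDENCE = ["001", "001.A", "001.A.01", "001.A.02", "001.B", "001.C",
            "001.D", "001.E", "001.F", "001.G", "001.G.01"]

# IsBag = 1 when the next row (by EvidenceNumber) starts with "<this>.".
# For the last row LEAD is NULL, the LIKE is NULL, and CASE falls to 0.
# Items are then numbered within the (Id, IsBag = 0) partition.
QUERY = """
WITH cte AS (
    SELECT EvidenceNumber, Id,
           CASE WHEN LEAD(EvidenceNumber) OVER (PARTITION BY Id ORDER BY EvidenceNumber)
                     LIKE EvidenceNumber || '.%'
                THEN 1 ELSE 0
           END AS IsBag
    FROM evidence
)
SELECT CASE WHEN IsBag = 0
            THEN ROW_NUMBER() OVER (PARTITION BY Id, IsBag ORDER BY EvidenceNumber)
            ELSE 0
       END AS NumItem,
       EvidenceNumber, Id
FROM cte
ORDER BY EvidenceNumber
"""


def number_items_lead():
    """Return (NumItem, EvidenceNumber, Id) rows in evidence order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE evidence (EvidenceNumber TEXT, Id INTEGER)")
    conn.executemany("INSERT INTO evidence VALUES (?, 8)", [(e,) for e in EVIDENCE])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in number_items_lead():
        print(row)
```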

Getting Top 10 based on column value

I have a query that outputs a long list of work-order counts per name and emergency type, together with each name's total, sorted by total, name and count:
;with cte as (
SELECT [Name],
[Emergency],
count([Emergency]) as [CountItem]
FROM tableA
GROUP BY [Name], [Emergency])
select Name,[Emergency],[CountItem] as [Count],SUM([CountItem]) OVER(PARTITION BY Name) as Total from cte
order by Total desc, Name, [CountItem] desc
but I only want the 10 names with the highest totals, like this:
+-------+-------------------------------+-------+-------+
| Name | Emergency | Count | Total |
+-------+-------------------------------+-------+-------+
| PLB | No | 7 | 15 |
| PLB | No Hot Water | 4 | 15 |
| PLB | Resident Locked Out | 2 | 15 |
| PLB | Overflowing Tub | 1 | 15 |
| PLB | No Heat | 1 | 15 |
| GG | Broken Lock - Exterior | 6 | 6 |
| BOA | Broken Lock - Exterior | 2 | 4 |
| BOA | Garage Door not working | 1 | 4 |
| BOA | Resident Locked Out | 1 | 4 |
| 15777 | Smoke Alarm not working | 3 | 3 |
| FP | No air conditioning | 2 | 3 |
| FP | Flood | 1 | 3 |
| KB | No electrical power | 2 | 3 |
| KB | No | 1 | 3 |
| MEM | Noise Complaint | 3 | 3 |
| ANG | Parking Issue | 2 | 2 |
| ALL | Smoke Alarm not working | 2 | 2 |
| AAS | No air conditioning | 1 | 2 |
| AAS | Toilet - Clogged (1 Bathroom) | 1 | 2 |
+-------+-------------------------------+-------+-------+
Note: I'm not after unique values. As you can see from the example above, it returns every row for the top 10 names from a very long table.
What I want is to assign a row id to each name, so that all PLB rows above have a row id of 1, GG = 2, BOA = 3, ...
Then my final select only needs a where clause with row id <= 10. I already tried ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name), but because the numbering restarts for every name, it assigns 1 to the first row of every Name it encounters.
You can use DENSE_RANK ordered by the total (highest first) and then by name, so that every row of the same name gets the same rank, and then keep ranks 1 to 10:
;with cte as (
SELECT [Name],
[Emergency],
count([Emergency]) as [CountItem]
FROM tableA
GROUP BY [Name], [Emergency]),
ct as (
select Name,[Emergency],[CountItem] as [Count],SUM([CountItem]) OVER(PARTITION BY Name) as Total from cte
),
ctname as (
select dense_rank() over ( order by total desc, name ) as RankName, Name,[Emergency],[Count], total from ct )
select * from ctname where rankname < 11
order by RankName, [Count] desc
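A minimal sketch of the DENSE_RANK idea, run in SQLite via Python's sqlite3 module as a stand-in for SQL Server. The tableA rows are rebuilt by expanding a subset of the counts shown in the question (one row per work order), and the cut-off is a parameter so a smaller N can be checked against that subset.

```python
import sqlite3

# Subset of the (Name, Emergency, Count) rows shown in the question.
COUNTS = [
    ("PLB", "No", 7), ("PLB", "No Hot Water", 4), ("PLB", "Resident Locked Out", 2),
    ("PLB", "Overflowing Tub", 1), ("PLB", "No Heat", 1),
    ("GG", "Broken Lock - Exterior", 6),
    ("BOA", "Broken Lock - Exterior", 2), ("BOA", "Garage Door not working", 1),
    ("BOA", "Resident Locked Out", 1),
    ("15777", "Smoke Alarm not working", 3),
    ("FP", "No air conditioning", 2), ("FP", "Flood", 1),
    ("AAS", "No air conditioning", 1), ("AAS", "Toilet - Clogged (1 Bathroom)", 1),
]

QUERY = """
WITH cte AS (
    SELECT Name, Emergency, COUNT(Emergency) AS CountItem
    FROM tableA
    GROUP BY Name, Emergency
),
ct AS (
    SELECT Name, Emergency, CountItem AS "Count",
           SUM(CountItem) OVER (PARTITION BY Name) AS Total
    FROM cte
),
ctname AS (
    -- One rank per name: highest total first, ties broken by name.
    SELECT DENSE_RANK() OVER (ORDER BY Total DESC, Name) AS RankName,
           Name, Emergency, "Count", Total
    FROM ct
)
SELECT * FROM ctname
WHERE RankName <= ?
ORDER BY RankName, "Count" DESC
"""


def top_names(n):
    """Return (RankName, Name, Emergency, Count, Total) rows for the top n names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableA (Name TEXT, Emergency TEXT)")
    for name, emergency, times in COUNTS:
        # Expand each count back into individual work-order rows.
        conn.executemany("INSERT INTO tableA VALUES (?, ?)", [(name, emergency)] * times)
    result = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_names(4):
        print(row)
```

Because Total is the same for every row of a name, ordering by Total DESC, Name gives each name exactly one rank, so all rows of a name are kept or dropped together.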

What's an efficient way to count "previous" rows in SQL?

Hard to phrase the title for this one.
I have a table of data which contains a row per invoice. For example:
| Invoice ID | Customer Key | Date | Value | Something |
| ---------- | ------------ | ---------- | ------| --------- |
| 1 | A | 08/02/2019 | 100 | 1 |
| 2 | B | 07/02/2019 | 14 | 0 |
| 3 | A | 06/02/2019 | 234 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 |
| 5 | B | 04/02/2019 | 11 | 1 |
| 6 | A | 03/02/2019 | 12 | 0 |
I need to add another column that counts the number of previous rows per CustomerKey, but only if "Something" is equal to 1, so that it returns this:
| Invoice ID | Customer Key | Date | Value | Something | Count |
| ---------- | ------------ | ---------- | ------| --------- | ----- |
| 1 | A | 08/02/2019 | 100 | 1 | 2 |
| 2 | B | 07/02/2019 | 14 | 0 | 1 |
| 3 | A | 06/02/2019 | 234 | 1 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 | 0 |
| 5 | B | 04/02/2019 | 11 | 1 | 0 |
| 6 | A | 03/02/2019 | 12 | 0 | 0 |
I know I can do this using a correlated subquery like this...
select *,
(
select
count(*)
from [table]
where
[Customer Key] = t.[Customer Key]
and [Date] < t.[Date]
and Something = 1
) as [Count]
from [table] t
But I have a lot of data, and that's pretty slow. I know I can also use CROSS APPLY to achieve the same thing, but as far as I can tell that performs no better than the correlated subquery.
So, is there a more efficient way of achieving this, or do I just suck it up?
EDIT: I originally posted this without the requirement that only rows where Something = 1 are counted. Mea culpa - I asked it in a hurry. Unfortunately I think that this means I can't use row_number() over (partition by [Customer Key])
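Assuming you're using SQL Server 2012+, you can use a window aggregate whose frame ends at the row before the current one, so only earlier rows are counted:
COUNT(CASE WHEN Something = 1 THEN 1 END) OVER (PARTITION BY [Customer Key] ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS [Count]
Ending the frame at 1 PRECEDING excludes the current row, so no "- 1" is needed and rows where Something = 0 are counted correctly too.
Old answer, before the Something = 1 requirement (counting all previous rows):
COUNT([Customer Key]) OVER (PARTITION BY [Customer Key] ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1 AS [Count]
If you're on a version older than SQL Server 2012, ROW_NUMBER (available since 2005) gives the same result for that unconditional count:
ROW_NUMBER() OVER (PARTITION BY [Customer Key] ORDER BY [Date]) - 1 AS [Count]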
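A minimal sketch that checks a frame-based conditional count against the expected Count column, using SQLite via Python's sqlite3 module as a stand-in for SQL Server. It assumes the dates are dd/mm/yyyy (stored here as ISO strings so they sort correctly) and renames the columns without spaces (InvoiceID, CustomerKey, InvoiceDate) purely for convenience. With tied dates, a ROWS frame may count a same-day row that the subquery's [Date] < t.[Date] would skip; the sample has no such ties.

```python
import sqlite3

# (InvoiceID, CustomerKey, InvoiceDate, Value, Something) from the question.
INVOICES = [
    (1, "A", "2019-02-08", 100, 1),
    (2, "B", "2019-02-07", 14, 0),
    (3, "A", "2019-02-06", 234, 1),
    (4, "A", "2019-02-05", 74, 1),
    (5, "B", "2019-02-04", 11, 1),
    (6, "A", "2019-02-03", 12, 0),
]

# Count earlier rows of the same customer with Something = 1. The frame stops
# at 1 PRECEDING, so the current row is never included; COUNT skips the NULLs
# the CASE produces for Something = 0 and returns 0 for an empty frame.
QUERY = """
SELECT InvoiceID,
       COUNT(CASE WHEN Something = 1 THEN 1 END) OVER (
           PARTITION BY CustomerKey
           ORDER BY InvoiceDate
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS PrevCount
FROM invoices
ORDER BY InvoiceID
"""


def previous_counts():
    """Return (InvoiceID, PrevCount) pairs ordered by InvoiceID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (InvoiceID INTEGER, CustomerKey TEXT, "
        "InvoiceDate TEXT, Value INTEGER, Something INTEGER)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?, ?)", INVOICES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(previous_counts())
```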

SQL Server : update sequence number across multiple groups

I would like to update a table:
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | NULL |
| 2 | 1 | 2010-04-27 | NULL |
| 3 | 2 | 2010-04-28 | NULL |
| 4 | 3 | 2010-04-28 | NULL |
To this (note that created_at is used for ordering, and sequence is "grouped" by type_id):
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | 1 |
| 2 | 1 | 2010-04-27 | 2 |
| 3 | 2 | 2010-04-28 | 1 |
| 4 | 3 | 2010-04-28 | 1 |
Same question has been raised but for SQL Server.
Link
Thanks.
You can use ROW_NUMBER() to get a sequence number per type_id slice. A CTE makes the UPDATE operation simpler:
;WITH ToUpdate AS (
SELECT id, type_id, created_at, sequence,
ROW_NUMBER() OVER (PARTITION BY type_id ORDER BY created_at) AS newSeq
FROM mytable
)
UPDATE ToUpdate
SET sequence = newSeq
Demo here
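The same per-group numbering can be tried in SQLite via Python's sqlite3 module, used here only as a stand-in for SQL Server. SQLite cannot update through a CTE, so this sketch swaps in a correlated subquery that looks up each row's ROW_NUMBER() value by id; the result should match the target table in the question.

```python
import sqlite3

# (id, type_id, created_at) from the question; sequence starts out NULL.
ROWS = [
    (1, 1, "2010-04-26"),
    (2, 1, "2010-04-27"),
    (3, 2, "2010-04-28"),
    (4, 3, "2010-04-28"),
]

# For each row, compute ROW_NUMBER() per type_id ordered by created_at in a
# derived table, then pick that row's number by matching on id.
UPDATE_SQL = """
UPDATE mytable
SET sequence = (
    SELECT s.newSeq
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY type_id ORDER BY created_at) AS newSeq
        FROM mytable
    ) AS s
    WHERE s.id = mytable.id
)
"""


def renumber():
    """Fill in sequence and return (id, sequence) pairs ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, type_id INTEGER, "
        "created_at TEXT, sequence INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mytable (id, type_id, created_at) VALUES (?, ?, ?)", ROWS
    )
    conn.execute(UPDATE_SQL)
    result = conn.execute("SELECT id, sequence FROM mytable ORDER BY id").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(renumber())
```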

Finding the max and min date values in SQL Server tables

I have two tables:
A lookup table (tabOne):
KEY | Group | Name | Desc | Val_Key
----------------------------------------
1 | a | NameA | DescA | 10
2 | b | NameB | DescB | 20
3 | c | NameC | DescC | 30
4 | d | NameD | DescD | 40
5 | e | NameE | DescE | 50
6 | f | NameF | DescF | 60
A second table containing readings (tabTwo):
KEY | Date | Reading | Val_Key
----------------------------------------
1 | Date | Read | 10
2 | Date | Read | 20
3 | Date | Read | 40
4 | Date | Read | 40
5 | Date | Read | 30
6 | Date | Read | 20
7 | Date | Read | 40
8 | Date | Read | 20
9 | Date | Read | 10
10 | Date | Read | 20
11 | Date | Read | 50
12 | Date | Read | 60
What I need to do is join tabTwo with TabOne and create a column with the newest Reading and a column with the oldest reading for each item in the group column of TabOne.
In the end, I want a table that looks as follows:
KEY | Group | Name | Desc | Val_Key | LastReading | FirstReading |
-------------------------------------------------------------------------
1 | a | NameA | DescA | 10 | | |
2 | b | NameB | DescB | 20 | | |
3 | c | NameC | DescC | 30 | | |
4 | d | NameD | DescD | 40 | | |
5 | e | NameE | DescE | 50 | | |
6 | f | NameF | DescF | 60 | | |
Thanks!
Freddie
If this is SQL Server 2005 or newer, OUTER APPLY will help:
select TabOne.*,
last.Reading LastReading,
first.Reading FirstReading
from TabOne
outer apply
(
select top 1
Reading
from TabTwo
where TabTwo.Val_Key = TabOne.val_Key
order by TabTwo.Date desc
) last
outer apply
(
select top 1
Reading
from TabTwo
where TabTwo.Val_Key = TabOne.val_Key
order by TabTwo.Date asc
) first
A live test is available on SQL Fiddle.
@Nikola Markovinović's solution can be made more universally applicable if the subqueries are moved directly into the main query's SELECT clause, which is possible because each of them retrieves only one value and is therefore valid as a scalar expression:
SELECT
t1.[KEY],
t1.[Group],
t1.Name,
t1.[Desc],
t1.Val_Key,
(
SELECT TOP 1 Reading
FROM TabTwo
WHERE Val_Key = t1.Val_Key
ORDER BY Date DESC
) AS LastReading,
(
SELECT TOP 1 Reading
FROM TabTwo
WHERE Val_Key = t1.Val_Key
ORDER BY Date ASC
) AS FirstReading
FROM TabOne t1
If you needed e.g. dates along the way, you would probably have to stick to Nikola's solution. There is an alternative to it, but it's more cumbersome (albeit more standard too): it would involve grouping TabTwo's data by Val_Key to get earliest/latest dates per Val_Key, then joining back to TabTwo to access entire rows corresponding to the found dates to finally pull the necessary columns, and ultimately joining both result sets to TabOne to get the final column set.
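The more standard alternative described above (group, join back, then join to TabOne) could look roughly like this. It is a sketch in SQLite via Python's sqlite3 module as a stand-in for SQL Server; because the question only shows placeholders for Date and Reading, the dates, readings and the ReadDate column name are hypothetical. It assumes each Val_Key has at most one reading per date, since ties on the earliest or latest date would produce duplicate rows.

```python
import sqlite3

# Hypothetical sample data: the question only shows placeholders for Date
# and Reading, so these values exist purely to exercise the query.
TAB_ONE = [
    (1, "a", "NameA", "DescA", 10),
    (2, "b", "NameB", "DescB", 20),
    (3, "c", "NameC", "DescC", 30),  # has no readings at all
]
TAB_TWO = [  # (KEY, ReadDate, Reading, Val_Key)
    (1, "2012-01-01", 5, 10),
    (2, "2012-01-02", 7, 20),
    (3, "2012-01-05", 9, 10),
    (4, "2012-01-03", 8, 20),
    (5, "2012-01-04", 3, 20),
]

# 1) earliest/latest date per Val_Key, 2) join back to TabTwo to fetch the
# readings on those dates, 3) attach everything to TabOne with LEFT JOINs.
QUERY = """
WITH bounds AS (
    SELECT Val_Key, MIN(ReadDate) AS FirstDate, MAX(ReadDate) AS LastDate
    FROM TabTwo
    GROUP BY Val_Key
)
SELECT t1.Name, t1.Val_Key,
       lastR.Reading AS LastReading,
       firstR.Reading AS FirstReading
FROM TabOne AS t1
LEFT JOIN bounds AS b ON b.Val_Key = t1.Val_Key
LEFT JOIN TabTwo AS lastR ON lastR.Val_Key = b.Val_Key AND lastR.ReadDate = b.LastDate
LEFT JOIN TabTwo AS firstR ON firstR.Val_Key = b.Val_Key AND firstR.ReadDate = b.FirstDate
ORDER BY t1."KEY"
"""


def first_last_readings():
    """Return (Name, Val_Key, LastReading, FirstReading) rows ordered by KEY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE TabOne ("KEY" INTEGER, "Group" TEXT, Name TEXT, "Desc" TEXT, Val_Key INTEGER)'
    )
    conn.execute(
        'CREATE TABLE TabTwo ("KEY" INTEGER, ReadDate TEXT, Reading INTEGER, Val_Key INTEGER)'
    )
    conn.executemany("INSERT INTO TabOne VALUES (?, ?, ?, ?, ?)", TAB_ONE)
    conn.executemany("INSERT INTO TabTwo VALUES (?, ?, ?, ?)", TAB_TWO)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_last_readings():
        print(row)
```

The query is plain SQL, so apart from quoting the reserved column names it should also run on SQL Server.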
