SQL Server: Flag only First duplicate row - sql-server

I want to flag only the first duplicate ID-VL combination in the dataset shown below. Column FirstOccurence is what I want the end result to be.
ID VL FirstOccurence
1 a 1
1 b 1
2 a 1
2 a 0
3 a 1
3 a 0
4 a 1
4 a 0
5 a 1
5 b 1
5 a 0
There is currently not a unique index available in the original table.
Is there any way to do this with for instance the LAG-functionality? I cannot find any examples online that result in the flagging of duplicates. Any suggestions are much appreciated!
Kind regards,
Igor

One method is with ROW_NUMBER() along with a CASE expression:
SELECT
ID
,VL
,CASE ROW_NUMBER() OVER(PARTITION BY ID, VL ORDER BY ID, VL) WHEN 1 THEN 1 ELSE 0 END AS FirstOccurance
FROM dbo.example
ORDER BY
ID
,VL
,FirstOccurance;
Results:
+----+----+----------------+
| ID | VL | FirstOccurance |
+----+----+----------------+
| 1 | a | 1 |
| 1 | b | 1 |
| 2 | a | 0 |
| 2 | a | 1 |
| 3 | a | 0 |
| 3 | a | 1 |
| 4 | a | 0 |
| 4 | a | 1 |
| 5 | a | 0 |
| 5 | a | 1 |
| 5 | b | 1 |
+----+----+----------------+
Note that this result order differs from your end result. If there are one or more columns present in the table that provide the same ordering as the results in you question, specify that in the ORDER BY clause instead.

Related

Creating a conditonal ROW_NUMBER() Partition clause based on previous row value

I have a table that looks like this:
+----------------+--------+
| EvidenceNumber | ID |
+----------------+--------+
| 001 | 8 |
| 001.A | 8 |
| 001.A.01 | 8 |
| 001.A.02 | 8 |
| 001.B | 8 |
| 001.C | 8 |
| 001.D | 8 |
| 001.E | 8 |
| 001.F | 8 |
| 001.G | 8 |
| 001.G.01 | 8 |
+----------------+--------+
If 001 were a bag, inside of it was 001.A, 001.B, and so on through to 001.G
In the output above, 001.A was another bag, and that bag contained 001.A.01 and 001.A.02. The same thing can be seen with 001.G.01.
Every entry in this table is either a bag or an item. I am only interested in counting the amount of items per ID.
Since 001.A.01 and 001.A.02 is the last we see of the "001.A's" we know A.01 and A.02 were items.
Since we see 001.B only once, that was an item as well.
001.G was a bag, but 001.G.01 was an item.
The above output is showing 8 items and 3 bags.
I feel like Row_number and the Partition clause is the perfect tool for the job, but I can't find a way to partition based on a clause that uses a previous row's value.
Maybe something like that isn't even necessary here, but I pictured it like:
{001} -- variable
{001}.A -- variable seen again, obviously 001 was a bag. Create new variable {001.A} and move on.
{001.A}.01 -- same thing.
{001.A.01} -- Unique variable. This is a final step. This is a bag and should be Row number 1.
Obviously, the below code is just making "ItemNum" 1 for each item since there are not duplicates.
SELECT
ROW_NUMBER() OVER(Partition BY EvidenceNumber ORDER BY EvidenceNumber) AS ItemNum,
EvidenceNumber,
ID
FROM EVIDENCE
WHERE ID = '18'
ORDER BY EvidenceNumber
+---------+----------------+--------+
| ItemNum | EvidenceNumber | ID |
+---------+----------------+--------+
| 1 | 001 | 8 |
| 1 | 001.A | 8 |
| 1 | 001.A.01 | 8 |
| 1 | 001.A.02 | 8 |
| 1 | 001.B | 8 |
| 1 | 001.C | 8 |
| 1 | 001.D | 8 |
| 1 | 001.E | 8 |
| 1 | 001.F | 8 |
| 1 | 001.G | 8 |
| 1 | 001.G.01 | 8 |
+---------+----------------+--------+
Ideally, it would partition on the items only, so in this case:
+---------+----------------+----+
| ItemNum | EvidenceNumber | ID |
+---------+----------------+----+
| 0 | 001 | 8 |
| 0 | 001.A | 8 |
| 1 | 001.A.01 | 8 |
| 2 | 001.A.02 | 8 |
| 3 | 001.B | 8 |
| 4 | 001.C | 8 |
| 5 | 001.D | 8 |
| 6 | 001.E | 8 |
| 7 | 001.F | 8 |
| 0 | 001.G | 8 |
| 8 | 001.G.01 | 8 |
+---------+----------------+----+
I don't think window functions alone are the best approach. Instead:
select t.*,
(case when exists (select 1
from evidence t2
where t2.caseid = t.caseid and
t2.EvidenceNumber like t.EvidenceNumber + '.%'
)
then 0 else 1
end) as is_item
from evidence t ;
Then sum these up using another subquery:
select t.*,
sum(is_item) over (partition by caseid order by EvidenceNumber) as item_counter
from (select t.*,
(case when exists (select 1
from evidence t2
where t2.caseid = t.caseid and
t2.EvidenceNumber like t.EvidenceNumber + '.%'
)
then 0 else 1
end) as is_item
from evidence t
) t;
trick with Lead and Row_Number:
DECLARE #Table TABLE (
EvidenceNumber varchar(64),
Id int
)
INSERT INTO #Table VALUES
('001',8),
('001.A',8),
('001.A.01',8),
('001.A.02',8),
('001.B',8),
('001.C',8),
('001.D',8),
('001.E',8),
('001.F',8),
('001.G',8),
('001.G.01',8);
WITH CTE AS (
SELECT
[IsBag] = PATINDEX(EvidenceNumber+'%',
IsNull(LEAD(EvidenceNumber) OVER (ORDER BY EvidenceNumber),0)
),
[EvidenceNumber],
[Id]
FROM
#Table
)
SELECT
[NumItem] = IIF(IsBag = 0,ROW_NUMBER() OVER (PARTITION BY [ISBag] order by [IsBag]),0),
[EvidenceNumber],
[Id]
FROM
CTE
ORDER BY EvidenceNumber

What's an efficient way to count "previous" rows in SQL?

Hard to phrase the title for this one.
I have a table of data which contains a row per invoice. For example:
| Invoice ID | Customer Key | Date | Value | Something |
| ---------- | ------------ | ---------- | ------| --------- |
| 1 | A | 08/02/2019 | 100 | 1 |
| 2 | B | 07/02/2019 | 14 | 0 |
| 3 | A | 06/02/2019 | 234 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 |
| 5 | B | 04/02/2019 | 11 | 1 |
| 6 | A | 03/02/2019 | 12 | 0 |
I need to add another column that counts the number of previous rows per CustomerKey, but only if "Something" is equal to 1, so that it returns this:
| Invoice ID | Customer Key | Date | Value | Something | Count |
| ---------- | ------------ | ---------- | ------| --------- | ----- |
| 1 | A | 08/02/2019 | 100 | 1 | 2 |
| 2 | B | 07/02/2019 | 14 | 0 | 1 |
| 3 | A | 06/02/2019 | 234 | 1 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 | 0 |
| 5 | B | 04/02/2019 | 11 | 1 | 0 |
| 6 | A | 03/02/2019 | 12 | 0 | 0 |
I know I can do this using either a CTE like this...
(
select
count(*)
from table
where
[Customer Key] = t.[Customer Key]
and [Date] < t.[Date]
and Something = 1
)
But I have a lot of data and that's pretty slow. I know I can also use cross apply to achieve the same thing, but as far as I can tell that's not any better performing than just using a CTE.
So; is there a more efficient means of achieving this, or do I just suck it up?
EDIT: I originally posted this without the requirement that only rows where Something = 1 are counted. Mea culpa - I asked it in a hurry. Unfortunately I think that this means I can't use row_number() over (partition by [Customer Key])
Assuming you're using SQL Server 2012+ you can use Window Functions:
COUNT(CASE WHEN Something = 1 THEN CustomerKey END) OVER (PARTITION BY CustomerKey ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1 AS [Count]
Old answer before new required logic:
COUNT(CustomerKey) OVER (PARTITION BY CustomerKey ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1 AS [Count]
If you're not using 2012 an alternative is to use ROW_NUMBER
ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY [Date]) - 1 AS Count

Segregate every second group into 1 or 2 groups SQL

I have a list of multiple components that all have a model number. I want to group every second component based on the model it belongs to.
Model | Component | Group
1 1 1
1 2 2
1 3 1
1 4 2
1 5 1
2 1 1
2 2 2
2 3 1
Every second component belonging to a model should have an alternative group number.
I believe I have to use a windows function but haven't been able to solve.
I have assumed that your Model and Component ID numbers will not be perfectly incremental, but that they will be unique. As such, you can use the row_number windowed function along with the modulo operator % to get the remainder of the division of the row_number result by 2:
declare #t table (Model int, Component int);
insert into #t values (1,1),(1,2),(1,3),(1,4),(1,5),(2,1),(2,2),(2,2);
select Model
,Component
,case (row_number() over (partition by Model order by Component) % 2)
when 1 then 1
when 0 then 2
end as [Group]
from #t;
Output:
+-------+-----------+-------+
| Model | Component | Group |
+-------+-----------+-------+
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | 1 |
| 1 | 4 | 2 |
| 1 | 5 | 1 |
| 2 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 2 | 1 |
+-------+-----------+-------+

Create Tree Query From Numeric Mapping Table in SQL (Specific Format)

I have an exported table from accounting software like below.
AccountID AccountName
--------- -----------
11 Acc11
12 Acc12
13 Acc13
11/11 Acc11/11
11/12 Acc11/12
11/111 Acc11/111
11/11/001 Acc11/11/001
11/11/002 Acc11/11/002
12/111 Acc12/111
12/112 Acc12/112
I want to convert it to tree query in MS-SQL Server 2008 to use it as a Treelist datasource in my win aaplication.
I raised this question before and it's answered with a way that it was very very slow for my big table with more than 5000 records (Create Tree Query From Numeric Mapping Table in SQL). But I think counting "/" and separating AccountID field with "/" can solve my problem easier and very faster.
Anyway, My expected result must be like below:
AccountID AccountName ID ParentID Level HasChild
--------- ----------- --- --------- ------ --------
11 Acc11 1 Null 1 1
12 Acc12 2 Null 1 1
13 Acc13 3 Null 1 0
11/11 Acc11/11 4 1 2 1
11/12 Acc11/12 5 1 2 0
11/111 Acc11/111 6 1 2 0
11/11/001 Acc11/11/001 7 4 3 0
11/11/002 Acc11/11/002 8 4 3 0
12/111 Acc12/111 9 2 2 0
12/112 Acc12/112 10 2 2 0
Please Help Me.
I modified my answer given in the first question...
It would be best, if your table would keep the relation data directly in indexed columns. Before you change your table's structure you might try this:
A table with test data
DECLARE #tbl TABLE ( AccountID VARCHAR(100), AccountName VARCHAR(100));
INSERT INTO #tbl VALUES
('11','Acc11')
,('12','Acc12')
,('13','Acc13')
,('11/11','Acc11/11')
,('11/12','Acc11/12')
,('11/111','Acc11/111')
,('11/11/001','Acc11/11/001')
,('11/11/002','Acc11/11/002')
,('12/111','Acc12/111')
,('12/112','Acc12/112');
This will get the needed data into a newly created temp table called #tempHierarchy
SELECT AccountID
,AccountName
,ROW_NUMBER() OVER(ORDER BY LEN(AccountID)-LEN(REPLACE(AccountID,'/','')),AccountID) AS ID
,Extended.HierarchyLevel
,STUFF(
(
SELECT '/' + A.B.value('.','varchar(10)')
FROM Extended.IDsXML.nodes('/x[position() <= sql:column("HierarchyLevel")]') AS A(B)
FOR XML PATH('')
),1,2,'') AS ParentPath
,Extended.IDsXML.value('/x[sql:column("HierarchyLevel")+1][1]','varchar(10)') AS ownID
,Extended.IDsXML.value('/x[sql:column("HierarchyLevel")][1]','varchar(10)') AS ancestorID
INTO #tempHierarchy
FROM #tbl
CROSS APPLY(SELECT LEN(AccountID)-LEN(REPLACE(AccountID,'/','')) + 1 AS HierarchyLevel
,CAST('<x></x><x>' + REPLACE(AccountID,'/','</x><x>') + '</x>' AS XML) AS IDsXML) AS Extended
;
The intermediate result
+-----------+--------------+----+----------------+------------+-------+------------+
| AccountID | AccountName | ID | HierarchyLevel | ParentPath | ownID | ancestorID |
+-----------+--------------+----+----------------+------------+-------+------------+
| 11 | Acc11 | 1 | 1 | | 11 | |
+-----------+--------------+----+----------------+------------+-------+------------+
| 12 | Acc12 | 2 | 1 | | 12 | |
+-----------+--------------+----+----------------+------------+-------+------------+
| 13 | Acc13 | 3 | 1 | | 13 | |
+-----------+--------------+----+----------------+------------+-------+------------+
| 11/11 | Acc11/11 | 4 | 2 | 11 | 11 | 11 |
+-----------+--------------+----+----------------+------------+-------+------------+
| 11/111 | Acc11/111 | 5 | 2 | 11 | 111 | 11 |
+-----------+--------------+----+----------------+------------+-------+------------+
| 11/12 | Acc11/12 | 6 | 2 | 11 | 12 | 11 |
+-----------+--------------+----+----------------+------------+-------+------------+
| 12/111 | Acc12/111 | 7 | 2 | 12 | 111 | 12 |
+-----------+--------------+----+----------------+------------+-------+------------+
| 12/112 | Acc12/112 | 8 | 2 | 12 | 112 | 12 |
+-----------+--------------+----+----------------+------------+-------+------------+
| 11/11/001 | Acc11/11/001 | 9 | 3 | 11/11 | 001 | 11 |
+-----------+--------------+----+----------------+------------+-------+------------+
| 11/11/002 | Acc11/11/002 | 10 | 3 | 11/11 | 002 | 11 |
+-----------+--------------+----+----------------+------------+-------+------------+
And now a similar recursive approach takes place as in my first answer. But - as it is using a real table now and all the string splitting has taken place already - it should be faster...
WITH RecursiveCTE AS
(
SELECT th.*
,CAST(NULL AS BIGINT) AS ParentID
,CASE WHEN EXISTS(SELECT 1 FROM #tempHierarchy AS x WHERE x.ParentPath=th.AccountID) THEN 1 ELSE 0 END AS HasChild
FROM #tempHierarchy AS th WHERE th.HierarchyLevel=1
UNION ALL
SELECT sa.AccountID
,sa.AccountName
,sa.ID
,sa.HierarchyLevel
,sa.ParentPath
,sa.ownID
,sa.ancestorID
,(SELECT x.ID FROM #tempHierarchy AS x WHERE x.AccountID=sa.ParentPath)
,CASE WHEN EXISTS(SELECT 1 FROM #tempHierarchy AS x WHERE x.ParentPath=sa.AccountID) THEN 1 ELSE 0 END AS HasChild
FROM RecursiveCTE AS r
INNER JOIN #tempHierarchy AS sa ON sa.HierarchyLevel=r.HierarchyLevel+1
AND r.AccountID=sa.ParentPath
)
SELECT r.AccountID
,r.AccountName
,r.ID
,r.ParentID
,r.HierarchyLevel
,r.HasChild
FROM RecursiveCTE AS r
ORDER BY HierarchyLevel,ParentID;
And finally I clean up
DROP TABLE #tempHierarchy;
And here's the final result
+-----------+--------------+----+----------+----------------+----------+
| AccountID | AccountName | ID | ParentID | HierarchyLevel | HasChild |
+-----------+--------------+----+----------+----------------+----------+
| 11 | Acc11 | 1 | NULL | 1 | 1 |
+-----------+--------------+----+----------+----------------+----------+
| 12 | Acc12 | 2 | NULL | 1 | 1 |
+-----------+--------------+----+----------+----------------+----------+
| 13 | Acc13 | 3 | NULL | 1 | 0 |
+-----------+--------------+----+----------+----------------+----------+
| 11/11 | Acc11/11 | 4 | 1 | 2 | 1 |
+-----------+--------------+----+----------+----------------+----------+
| 11/111 | Acc11/111 | 5 | 1 | 2 | 0 |
+-----------+--------------+----+----------+----------------+----------+
| 11/12 | Acc11/12 | 6 | 1 | 2 | 0 |
+-----------+--------------+----+----------+----------------+----------+
| 12/111 | Acc12/111 | 7 | 2 | 2 | 0 |
+-----------+--------------+----+----------+----------------+----------+
| 12/112 | Acc12/112 | 8 | 2 | 2 | 0 |
+-----------+--------------+----+----------+----------------+----------+
| 11/11/001 | Acc11/11/001 | 9 | 4 | 3 | 0 |
+-----------+--------------+----+----------+----------------+----------+
| 11/11/002 | Acc11/11/002 | 10 | 4 | 3 | 0 |
+-----------+--------------+----+----------+----------------+----------+

Do Subselects make SQlite ignore distincts?

we are facing a strange behaviour in SQLite (Version 3).
We have a table for Vehicles with two columns referencing an engine and a gear.
Of course there could be more than one vehicle with the same engine gear combination.
I now want to find the distinct combination of engines and gears of the vehicles (and use it for an insert => thats why randomblob(36)).
Example:
Vehicle | EngineId | GearId
-----------------------------
1 | 1 | 1
1 | 1 | 2
1 | 2 | 1
1 | 2 | 2
1 | 1 | 2
1 | 1 | 2
The following select statement results in too many rows:
Select randomblob(36), tmp.EngineId, tmp.GearId from (Select distinct EngineId, GearId from tblVehicle order by EngineId, GearId) as tmp;
RandomId| EngineId | GearId
-----------------------------
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
4 | 2 | 2
5 | 1 | 2
6 | 1 | 2
But the expected result would just be:
RandomId| EngineId | GearId
-----------------------------
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
4 | 2 | 2
If I replace the randomblob(36) with a constant, the result is as expected (of course without a random Id).
Select 2, tmp.EngineId, tmp.GearId from (Select distinct EngineId, GearId from tblVehicle order by EngineId, GearId) as tmp;
Can someone explain me this behaviour of SQLite? Is this the expected behaviour?
This is a bug.
I can reproduce this with SQLite 3.6.23.1 but not with 3.7.15, so it has been fixed already.

Resources