Delete partial duplicate rows - sql - sql-server

I'm having trouble deleting partially duplicate rows.
The structure is like this:
+----+--------+-----------+------+
| id | userid | location  | week |
+----+--------+-----------+------+
| 1  | 001    | amsterdam | 11   |
| 2  | 001    | amsterdam | 23   |
| 3  | 002    | berlin    | 28   |
| 4  | 002    | berlin    | 22   |
| 5  | 003    | paris     | 19   |
| 6  | 003    | paris     | 35   |
+----+--------+-----------+------+
I only need to keep one row for each userid; it doesn't matter which week number it has.
Thanks,
Maxcim

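This should work across most databases (MySQL is an exception: it rejects a subquery that reads from the table being deleted from):
DELETE
FROM yourTable
WHERE id <> (SELECT MIN(t.id)
             FROM yourTable t
             WHERE t.userid = yourTable.userid)
This query deletes, from each userid group, every record except the one with the lowest id. The outer userid must be qualified with the table name: an unqualified userid resolves to t.userid, which makes the condition always true, so only the single lowest id in the whole table would survive. I assume that id is a unique column.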
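To check the corrected correlated DELETE, here is a minimal sketch that runs it against the sample rows. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption; the statement itself is portable SQL):

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, userid TEXT, location TEXT, week INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", [
    (1, "001", "amsterdam", 11), (2, "001", "amsterdam", 23),
    (3, "002", "berlin", 28), (4, "002", "berlin", 22),
    (5, "003", "paris", 19), (6, "003", "paris", 35),
])

# The subquery compares t.userid with the OUTER row's userid;
# an unqualified "userid" would bind to t.userid and match every row.
conn.execute("""
    DELETE FROM yourTable
    WHERE id <> (SELECT MIN(t.id)
                 FROM yourTable t
                 WHERE t.userid = yourTable.userid)
""")

remaining = conn.execute("SELECT id, userid FROM yourTable ORDER BY id").fetchall()
print(remaining)  # [(1, '001'), (3, '002'), (5, '003')]
```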

This method has been tested; try it.
We number the rows within each UserID with ROW_NUMBER() and then delete every row numbered higher than 1, keeping the first (original) one. The delete joins back on the unique Id column, so only the surplus rows are removed.
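BEGIN TRY
    BEGIN TRANSACTION;
    -- Number the rows within each UserID; RN = 1 is the row that is kept
    SELECT Id, UserID,
        RN = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Id)
    INTO #test1
    FROM dbo.MyTbl;
    -- Join on the unique Id (not on UserID), otherwise every row of a duplicated user is deleted
    DELETE t
    FROM dbo.MyTbl AS t
    INNER JOIN #test1
        ON #test1.Id = t.Id
    WHERE #test1.RN > 1;
    DROP TABLE #test1;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the delete if any statement failed, then re-raise the error
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;
GO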
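The same ROW_NUMBER() idea can be sketched in portable SQL, again using an in-memory SQLite database through Python's sqlite3 module as a stand-in (an assumption; window functions need SQLite 3.25 or later). Instead of a temp table, the numbered rows come from a derived table and are deleted by Id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTbl (Id INTEGER PRIMARY KEY, UserID TEXT, Location TEXT, Week INTEGER)")
conn.executemany("INSERT INTO MyTbl VALUES (?, ?, ?, ?)", [
    (1, "001", "amsterdam", 11), (2, "001", "amsterdam", 23),
    (3, "002", "berlin", 28), (4, "002", "berlin", 22),
    (5, "003", "paris", 19), (6, "003", "paris", 35),
])

# Number rows per UserID (RN = 1 survives) and delete the rest by their unique Id
conn.execute("""
    DELETE FROM MyTbl
    WHERE Id IN (
        SELECT Id
        FROM (SELECT Id,
                     ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Id) AS RN
              FROM MyTbl) AS numbered
        WHERE RN > 1
    )
""")

survivors = [row[0] for row in conn.execute("SELECT Id FROM MyTbl ORDER BY Id")]
print(survivors)  # [1, 3, 5]
```

In SQL Server itself you can also delete straight through a CTE, which avoids the temp table: WITH cte AS (SELECT RN = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Id) FROM dbo.MyTbl) DELETE FROM cte WHERE RN > 1;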

Related

How to get the top row from a SQL Server record set query with another constraint

I have two SQL Server tables as below:
Event
+------+-------------+----------+----------+----------+-----------------------------+
| Id   | EventTypeId | PersonId | UCNumber | Name     | DateEvent                   |
+------+-------------+----------+----------+----------+-----------------------------+
| 2307 | 3           | 2189     | 004947   | Migrated | 1900-01-01 00:00:00.6780000 |
| 2308 | 15          | 2189     | 004947   | Birthday | 2020-09-18 16:48:32.6870000 |
| 3400 | 15          | 2190     | 006857   | Birthday | 1900-01-01 00:00:00.0000000 |
| 3401 | 2           | 2190     | 006857   | Migrated | 2016-03-12 00:00:00.0000000 |
+------+-------------+----------+----------+----------+-----------------------------+
Person
+------+----------+-------+----------+-----------------------------+
| Id   | UCNumber | Name  | LastName | AnotherDate                 |
+------+----------+-------+----------+-----------------------------+
| 2189 | 004947   | John  | Smith    | 1900-01-01 00:00:00.0000000 |
| 2190 | 006857   | Alice | Timo     | 2020-02-20 00:00:00.0000000 |
+------+----------+-------+----------+-----------------------------+
I need to retrieve the top (most recent) row based on the Event's Id (the higher the Id, the more recent the event), and that event's EventTypeId must be 15.
I tried this:
Select P.Id, P.UCNUMBER, P.AnotherDate from
db.dbo.Person P
Inner join db.dbo.Event L on L.PersonId = P.Id
where P.Id in (
SELECT TOP (1) PersonId
FROM
db.dbo.Event
where PersonId = P.Id --and EventTypeID = 15
ORDER BY
Id DESC)
and EventTypeId = 15
but it doesn't work properly. I've posted only samples from the two tables; in general, the query also returns other events that are not the latest ones (the highest Id). Something is missing.
In this case, for instance, it should return only 1 row:
2189 004947 1900-01-01 00:00:00.0000000
Sounds like you just want ORDER BY and TOP 1.
SELECT TOP 1
p.id,
p.ucnumber,
p.anotherdate
FROM event e
LEFT JOIN person p
ON p.id = e.personid
WHERE e.eventtypeid = 15
ORDER BY e.dateevent DESC;
If you want all ties, in case several events share the latest time, replace TOP 1 with TOP 1 WITH TIES.
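The TOP 1 query returns a single row overall. If the requirement is instead per person (return a person only when their highest-Id event has EventTypeId 15), one option, sketched here under that assumption with SQLite via Python's sqlite3 module as a stand-in for SQL Server, is to rank each person's events with ROW_NUMBER() and filter on the top-ranked one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Event (Id INTEGER PRIMARY KEY, EventTypeId INTEGER, PersonId INTEGER,
                        UCNumber TEXT, Name TEXT, DateEvent TEXT);
    CREATE TABLE Person (Id INTEGER PRIMARY KEY, UCNumber TEXT, Name TEXT,
                         LastName TEXT, AnotherDate TEXT);
    INSERT INTO Event VALUES
        (2307, 3, 2189, '004947', 'Migrated', '1900-01-01 00:00:00.6780000'),
        (2308, 15, 2189, '004947', 'Birthday', '2020-09-18 16:48:32.6870000'),
        (3400, 15, 2190, '006857', 'Birthday', '1900-01-01 00:00:00.0000000'),
        (3401, 2, 2190, '006857', 'Migrated', '2016-03-12 00:00:00.0000000');
    INSERT INTO Person VALUES
        (2189, '004947', 'John', 'Smith', '1900-01-01 00:00:00.0000000'),
        (2190, '006857', 'Alice', 'Timo', '2020-02-20 00:00:00.0000000');
""")

# Rank each person's events by Id (highest Id = most recent), then keep the
# person only if that latest event has EventTypeId = 15.
rows = conn.execute("""
    SELECT p.Id, p.UCNumber, p.AnotherDate
    FROM (SELECT PersonId, EventTypeId,
                 ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY Id DESC) AS rn
          FROM Event) AS latest
    JOIN Person p ON p.Id = latest.PersonId
    WHERE latest.rn = 1 AND latest.EventTypeId = 15
""").fetchall()
print(rows)  # [(2189, '004947', '1900-01-01 00:00:00.0000000')]
```

Person 2190 is left out because their latest event (3401) has EventTypeId 2, which matches the single expected row in the question.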

Update table combining rows based on the same column value

I have a table called table, like this:
| id | name | city |
|----|-------|---------|
| 0 | Rose | Madrid |
| 1 | Alex | Lima |
| 2 | Rose | Sidney |
| 3 | Mario | Glasgow |
I need to UPDATE the table so that rows sharing the same name are combined into a new row and the original rows are deleted:
| id | name | city |
|----|-------|----------------|
| 1 | Alex | Lima |
| 3 | Mario | Glasgow |
| 4 | Rose | Madrid, Sidney |
I don't care if it has to be done in several SQL statements.
So far, all I've done is list the rows that are affected:
SELECT *
FROM table
WHERE name IN (
SELECT name
FROM table
GROUP BY name
HAVING COUNT(*) > 1
);
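Assuming id is an auto-increment primary key, you need an INSERT and a DELETE statement. This is SQLite syntax (group_concat(), instr() and the || operator); other databases use different functions, for example STRING_AGG() in SQL Server 2017 and later: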
insert into tablename(name, city)
select name, group_concat(city, ',')
from tablename
group by name
having count(*) > 1;
delete from tablename
where instr(city, ',') = 0
and exists (
select 1 from tablename t
where t.id <> tablename.id and t.name = tablename.name
and ',' || t.city || ',' like '%,' || tablename.city || ',%'
);
Results:
| id | name | city |
| --- | ----- | ------------- |
| 1 | Alex | Lima |
| 3 | Mario | Glasgow |
| 4 | Rose | Madrid,Sidney |
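A minimal sketch that runs the INSERT and DELETE above on the sample data, using Python's sqlite3 module (the answer's syntax is SQLite's). The DELETE checks the city column with instr() so the newly combined row, whose city contains a comma, is never deleted. SQLite does not guarantee the order in which group_concat() joins the cities, so the combined value may come out as Madrid,Sidney or Sidney,Madrid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY makes id auto-assigned (next value = current max + 1)
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", [
    (0, "Rose", "Madrid"), (1, "Alex", "Lima"),
    (2, "Rose", "Sidney"), (3, "Mario", "Glasgow"),
])

# Step 1: add one combined row per duplicated name
conn.execute("""
    INSERT INTO tablename (name, city)
    SELECT name, group_concat(city, ',')
    FROM tablename
    GROUP BY name
    HAVING count(*) > 1
""")

# Step 2: delete single-city rows whose city appears in another row's
# comma-separated city list for the same name
conn.execute("""
    DELETE FROM tablename
    WHERE instr(city, ',') = 0
      AND EXISTS (
        SELECT 1 FROM tablename t
        WHERE t.id <> tablename.id AND t.name = tablename.name
          AND ',' || t.city || ',' LIKE '%,' || tablename.city || ',%'
      )
""")

rows = conn.execute("SELECT id, name, city FROM tablename ORDER BY id").fetchall()
print(rows)
```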

What's an efficient way to count "previous" rows in SQL?

Hard to phrase the title for this one.
I have a table of data which contains a row per invoice. For example:
| Invoice ID | Customer Key | Date | Value | Something |
| ---------- | ------------ | ---------- | ------| --------- |
| 1 | A | 08/02/2019 | 100 | 1 |
| 2 | B | 07/02/2019 | 14 | 0 |
| 3 | A | 06/02/2019 | 234 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 |
| 5 | B | 04/02/2019 | 11 | 1 |
| 6 | A | 03/02/2019 | 12 | 0 |
I need to add another column that counts the number of previous rows per CustomerKey, but only if "Something" is equal to 1, so that it returns this:
| Invoice ID | Customer Key | Date | Value | Something | Count |
| ---------- | ------------ | ---------- | ------| --------- | ----- |
| 1 | A | 08/02/2019 | 100 | 1 | 2 |
| 2 | B | 07/02/2019 | 14 | 0 | 1 |
| 3 | A | 06/02/2019 | 234 | 1 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 | 0 |
| 5 | B | 04/02/2019 | 11 | 1 | 0 |
| 6 | A | 03/02/2019 | 12 | 0 | 0 |
I know I can do this using a CTE with a correlated subquery like this...
(
select
count(*)
from table
where
[Customer Key] = t.[Customer Key]
and [Date] < t.[Date]
and Something = 1
)
But I have a lot of data and that's pretty slow. I know I can also use CROSS APPLY to achieve the same thing, but as far as I can tell that doesn't perform any better than the CTE.
So, is there a more efficient way of achieving this, or do I just suck it up?
EDIT: I originally posted this without the requirement that only rows where Something = 1 are counted. Mea culpa - I asked it in a hurry. Unfortunately I think that this means I can't use row_number() over (partition by [Customer Key])
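Assuming you're using SQL Server 2012 or later, you can use a window function. Count only the rows where Something = 1, with a frame that stops at the row before the current one, so the current row itself is never counted (ending the frame at CURRENT ROW and subtracting 1 only works when the current row has Something = 1; otherwise it undercounts, down to -1):
COUNT(CASE WHEN Something = 1 THEN 1 END) OVER (PARTITION BY [Customer Key] ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS [Count]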
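A quick check of the windowed COUNT on the sample invoices, sketched with Python's sqlite3 module (SQLite 3.25+ supports the same frame syntax). The table name invoices is hypothetical, and the dates are stored as ISO strings, assuming the question's dates are DD/MM/YYYY, so that ORDER BY [Date] sorts chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; bracketed identifiers mirror the T-SQL column names
conn.execute("CREATE TABLE invoices ([Invoice ID] INTEGER, [Customer Key] TEXT, "
             "[Date] TEXT, Value INTEGER, Something INTEGER)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?, ?)", [
    (1, "A", "2019-02-08", 100, 1),
    (2, "B", "2019-02-07", 14, 0),
    (3, "A", "2019-02-06", 234, 1),
    (4, "A", "2019-02-05", 74, 1),
    (5, "B", "2019-02-04", 11, 1),
    (6, "A", "2019-02-03", 12, 0),
])

# Count earlier rows of the same customer with Something = 1.
# The frame ends at 1 PRECEDING, so the current row is never included,
# and COUNT over an empty frame is 0.
counts = conn.execute("""
    SELECT [Invoice ID],
           COUNT(CASE WHEN Something = 1 THEN 1 END) OVER (
               PARTITION BY [Customer Key] ORDER BY [Date]
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS [Count]
    FROM invoices
    ORDER BY [Invoice ID]
""").fetchall()
print(counts)  # [(1, 2), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0)]
```

The counts match the expected Count column in the question, including invoice 2, whose own Something is 0.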
Old answer, before the Something = 1 requirement was added (counts all previous rows):
COUNT([Customer Key]) OVER (PARTITION BY [Customer Key] ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1 AS [Count]
On versions before 2012, ROW_NUMBER (available since SQL Server 2005) gives the same result for that original requirement:
ROW_NUMBER() OVER (PARTITION BY [Customer Key] ORDER BY [Date]) - 1 AS [Count]

Exclude Secondary ID Records from Original SELECT

I'm relatively new to SQL and am running into a lot of issues trying to figure this one out. I've tried using a LEFT JOIN, and dabbled in using functions to get this to work but to no avail.
For every UserID, if any row for a ProductID has a NULL DateTermed, I need to remove all records of that ProductID for that UserID from my SELECT.
I am using SQL Server 2014.
Example Table
+--------------+-------------+---------------+
| UserID | ProductID | DateTermed |
+--------------+-------------+---------------+
| 578 | 2 | 1/7/2017 |
| 578 | 2 | 1/7/2017 |
| 578 | 1 | 1/15/2017 |
| 578 | 1 | NULL |
| 649 | 1 | 1/9/2017 |
| 649 | 2 | 1/11/2017 |
+--------------+-------------+---------------+
Desired Output
+--------------+-------------+---------------+
| UserID | ProductID | DateTermed |
+--------------+-------------+---------------+
| 578 | 2 | 1/7/2017 |
| 578 | 2 | 1/7/2017 |
| 649 | 1 | 1/9/2017 |
| 649 | 2 | 1/11/2017 |
+--------------+-------------+---------------+
Try the following:
SELECT a.userid, a.productid, a.datetermed
FROM yourtable a
LEFT OUTER JOIN (SELECT userid, productid FROM yourtable
                 WHERE datetermed IS NULL) b
ON a.userid = b.userid AND a.productid = b.productid
WHERE b.userid IS NULL
This left outer joins every record to the records with a NULL date for the same UserID and ProductID. Keeping only the records that found no match (b.userid IS NULL) leaves just the UserID/ProductID pairs that never have a NULL date.
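A minimal sketch that runs the anti-join on the example table, using Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server 2014 (an assumption; the query is standard SQL). The table name yourtable is taken from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (UserID INTEGER, ProductID INTEGER, DateTermed TEXT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", [
    (578, 2, "1/7/2017"), (578, 2, "1/7/2017"), (578, 1, "1/15/2017"),
    (578, 1, None), (649, 1, "1/9/2017"), (649, 2, "1/11/2017"),
])

# Keep only rows whose (UserID, ProductID) pair found no NULL-date partner
rows = conn.execute("""
    SELECT a.UserID, a.ProductID, a.DateTermed
    FROM yourtable a
    LEFT OUTER JOIN (SELECT UserID, ProductID
                     FROM yourtable
                     WHERE DateTermed IS NULL) b
        ON a.UserID = b.UserID AND a.ProductID = b.ProductID
    WHERE b.UserID IS NULL
    ORDER BY a.UserID, a.ProductID
""").fetchall()
print(rows)  # [(578, 2, '1/7/2017'), (578, 2, '1/7/2017'), (649, 1, '1/9/2017'), (649, 2, '1/11/2017')]
```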
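You can also use this WHERE condition:
SELECT
UserID, ProductID, DateTermed
FROM
[YourTableName]
WHERE
CONVERT(VARCHAR(20), UserID) + '-' + CONVERT(VARCHAR(20), ProductID) NOT IN (
SELECT CONVERT(VARCHAR(20), UserID) + '-' + CONVERT(VARCHAR(20), ProductID)
FROM [YourTableName]
WHERE DateTermed IS NULL
)
Concatenating UserID and ProductID with a separator gives a unique value for each pair, which you can then use as a "key" to exclude the pairs that have a NULL value in the DateTermed field. Without the separator, different pairs can collide (1 and 23, and 12 and 3, both give '123'). This assumes UserID and ProductID are never NULL, since a NULL key inside NOT IN makes the condition filter out every row.
Hope this helps.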
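To see why the separator matters, here is a small sketch with two hypothetical rows, (1, 23) and (12, 3), chosen so that plain concatenation produces the same key '123'. It uses Python's sqlite3 module, with CAST(... AS TEXT) and || standing in for T-SQL's CONVERT(VARCHAR, ...) and + (an assumption for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTableName (UserID INTEGER, ProductID INTEGER, DateTermed TEXT)")
# Hypothetical rows: '1' || '23' and '12' || '3' both produce the key '123'
conn.executemany("INSERT INTO YourTableName VALUES (?, ?, ?)", [
    (1, 23, None),
    (12, 3, "1/9/2017"),
])

def surviving(key_expr):
    """Return the (UserID, ProductID) rows kept when pairs with a NULL DateTermed are
    excluded by comparing the given key expression (a trusted, hard-coded SQL fragment)."""
    sql = f"""
        SELECT UserID, ProductID
        FROM YourTableName
        WHERE {key_expr} NOT IN (
            SELECT {key_expr} FROM YourTableName WHERE DateTermed IS NULL)
    """
    return conn.execute(sql).fetchall()

naive = surviving("CAST(UserID AS TEXT) || CAST(ProductID AS TEXT)")
separated = surviving("CAST(UserID AS TEXT) || '-' || CAST(ProductID AS TEXT)")
print(naive)      # [] because (12, 3) collides with the excluded pair (1, 23)
print(separated)  # [(12, 3)]
```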

Sql query to check if a certain value appears more than once in rows

I have a table with 5 columns like this:
+----+-------------------------+-----------+--------+-----------+
| Id | CreateDate | CompanyId | UserId | IsEnabled |
+----+-------------------------+-----------+--------+-----------+
| 1 | 2016-01-02 23:40:46.517 | 1 | 1 | 1 |
| 2 | 2016-01-16 00:07:59.857 | 1 | 2 | 1 |
| 3 | 2016-01-25 15:17:54.420 | 3 | 3 | 1 |
| 25 | 2016-03-07 16:48:39.260 | 24 | 10 | 0 |
| 26 | 2016-03-07 16:48:39.263 | 25 | 2 | 0 |
+----+-------------------------+-----------+--------+-----------+
(thanks to http://www.sensefulsolutions.com/2010/10/format-text-as-table.html for the ASCII table!)
I'm trying to check whether a UserId is recorded for more than one CompanyId.
So far I've managed to check whether a UserId appears more than once, using this query:
WITH T AS
(
SELECT * ,
Count(*) OVER (PARTITION BY UserId) as Cnt
From CompanyUser
)
select Distinct UserId
FROM T
Where Cnt >1
It returns 2 correctly.
Where I'm stuck is how to parameterize the UserId and check whether that user is recorded for more than one company.
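DECLARE @UserID AS bigint;
SET @UserID = 2;
-- Number of different companies this user is recorded for
SELECT COUNT(DISTINCT CompanyId) AS CompanyCount
FROM CompanyUser
WHERE UserId = @UserID;
A CompanyCount greater than 1 means the user is recorded for more than one company. I think this gives you what you need.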
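A minimal sketch of the parameterized check on the sample rows, using Python's sqlite3 module as a stand-in for SQL Server; the ? placeholder plays the role of @UserID. It also shows a GROUP BY ... HAVING COUNT(DISTINCT CompanyId) > 1 variant that lists every such user at once. Note that the question's original CTE counts rows rather than distinct companies, so it would also flag a user listed twice for the same company:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CompanyUser (Id INTEGER PRIMARY KEY, CreateDate TEXT, "
             "CompanyId INTEGER, UserId INTEGER, IsEnabled INTEGER)")
conn.executemany("INSERT INTO CompanyUser VALUES (?, ?, ?, ?, ?)", [
    (1, "2016-01-02 23:40:46.517", 1, 1, 1),
    (2, "2016-01-16 00:07:59.857", 1, 2, 1),
    (3, "2016-01-25 15:17:54.420", 3, 3, 1),
    (25, "2016-03-07 16:48:39.260", 24, 10, 0),
    (26, "2016-03-07 16:48:39.263", 25, 2, 0),
])

def company_count(user_id):
    """Number of distinct companies the given user is recorded for."""
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT CompanyId) FROM CompanyUser WHERE UserId = ?", (user_id,)
    ).fetchone()
    return count

print(company_count(2))  # 2 -> recorded for more than one company
print(company_count(1))  # 1

# All users at once: group by user and keep those with more than one distinct company
multi = [r[0] for r in conn.execute(
    "SELECT UserId FROM CompanyUser GROUP BY UserId HAVING COUNT(DISTINCT CompanyId) > 1")]
print(multi)  # [2]
```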
