deleting duplicates based on value of another column - sql-server

I have a table with 3 columns and the first column is 'name'. Some names are entered twice, some 3 times and some more than that. I would like to keep only one value for each name and delete the extra rows based on the values of Column 2 and 3. If column 2 and 3 are null, I would like to delete that row.
There are no primary keys or id column.
There are about 2.75 million rows in the table.
I would like to delete them using one query (preferably) in SQL 14. Can someone help, please?
Name    column2  column3
Suzy    english  null
Suzy    null     null
Suzy    null     5
John    null     null
John    7        7
George  null     benson
George  null     null
George  benson   null
George  5        benson
Would like to have it as:
Name    column2  column3
Suzy    english  null
Suzy    null     5
John    7        7
George  benson   null
George  5        benson
Many thanks in advance.

Use partitions over name with the appropriate order by:
WITH cte AS (
    SELECT ROW_NUMBER()
           OVER (PARTITION BY name
                 ORDER BY CASE
                     WHEN column2 = 'null' AND column3 = 'null' THEN 3
                     WHEN column3 = 'null' THEN 2
                     WHEN column2 = 'null' THEN 1
                     ELSE 0 END
           ) num
    FROM mytable
)
DELETE FROM cte WHERE num > 1
This deletes duplicates, keeping one row per name in this order of preference:
both column2 and column3 not null (an arbitrary one is kept if there are several of these)
only column3 not null
only column2 not null
both column2 and column3 null
Note that this query assumes (based on comments on the question) that your "null" values are actually the text string "null" and not an SQL NULL.
If they are actual NULLs, replace = 'null' with IS NULL.
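For real NULLs, the same idea might look like the sketch below. It assumes the table is called mytable, as in the answer, and uses the column names from the question (name, column2, column3). Like the query above, it keeps exactly one row per name.

```sql
-- Sketch only: assumes mytable(name, column2, column3) stores real NULLs.
WITH cte AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY name
               ORDER BY CASE
                            WHEN column2 IS NULL AND column3 IS NULL THEN 3
                            WHEN column3 IS NULL THEN 2
                            WHEN column2 IS NULL THEN 1
                            ELSE 0
                        END
           ) AS num
    FROM mytable
)
-- Deleting through the CTE removes the underlying rows ranked 2 or later.
DELETE FROM cte
WHERE num > 1;
```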

Delete from yourtable
where column2 is null and column3 is null
The above query is based on this part of the question:
"I would like to keep only one value for each name and delete the extra rows based on the values of Column 2 and 3. If column 2 and 3 are null, I would like to delete that row."
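Run as written, that statement also removes a name completely when every one of its rows has both columns null. If that is not wanted, a guarded variant could look like the sketch below. The table name yourtable comes from the answer; the EXISTS guard is an assumption about the intent, not something the question states.

```sql
-- Sketch only: delete all-NULL rows, but only for names that still have
-- at least one other row with data in column2 or column3.
DELETE t
FROM yourtable AS t
WHERE t.column2 IS NULL
  AND t.column3 IS NULL
  AND EXISTS (
        SELECT 1
        FROM yourtable AS o
        WHERE o.name = t.name
          AND (o.column2 IS NOT NULL OR o.column3 IS NOT NULL)
      );
```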

Related

SQL Check to see if any value in a list exists in another table

I have a temp table that looks something like this:
Record  DepartmentId  PositionId  EmployeeId  StatusId  CustomerId
1       Null          Null        Null        4
2       7             454         Null        Null
3       Null          454         Null        3
3       Null          Null        Null        Null      214
3       Null          Null        Null        Null      100
3       Null          Null        Null        Null      312
4       Null          Null        Null        Null      357
I inserted the above into the temp table from tables that looked like this:
Record Table
Record  Name
1       Red
2       blue
3       Green
4       Purple
Record-to-Department
Record  DepartmentId
2       7
Record-to-Position
Record  PositionId
2       454
3       454
Record-To-Status
Record  StatusId
1       4
3       3
Record-To-Customer
Record  CustomerId
3       214
3       100
3       312
4       357
I have an employee whose record looks something like this:
EmployeeId DepartmentId PositionId StatusId
342 7 454 4
Employee Customers:
EmployeeId CustomerId
342 357
342 95
342 720
In this scenario, it would return Record 1 (because it matches the StatusId), Record 2 (because it matches both the DepartmentId and the PositionId), but it would not return Record 3 because it only matches the PositionId and not the StatusId, and it would return RecordId 4 because one of the Employee CustomerIds matches the CustomerId on Record 4.
I got part of this answer on another question (please forgive me, I am new and trying to figure out how to ask everything I need to know), but I can't figure out how to handle the multiple records.
I tried selecting the employee's customer IDs into a table variable and then attempted to use COALESCE like this:
Declare @Customers table(CustomerId int)
INSERT INTO @Customers(CustomerId)
SELECT DISTINCT S.CustomerId
FROM employee_Customers S
Select * from tbl
WHERE
COALESCE(StatusId,@StatusId)=@StatusId AND
COALESCE(DepartmentId,@DepartmentId)=@DepartmentId AND
COALESCE(PositionId,@PositionId)=@PositionId AND
COALESCE(EmployeeCompanyId,@EmployeeCompanyId) = @EmployeeCompanyId AND
COALESCE((Select CustomerId from tbl_Requirement_to_Customer),(Select CustomerId from @Customers)) = (Select CustomerId from @Customers)
But I receive the error "Subquery Returned more than 1 value".
I have a possible solution you can try. I don't think it will be plug-and-play but hopefully you can adapt it to your situation. I am using just the data as presented in your temp table, your employee record and your Employee-customers correlation.
The basic logic is to join your temp table to the employee(s) using an OR condition, then count the populated values in each temp-table row and compare that with the count of matching values: the match count must be at least the populated count and greater than zero.
This returns your desired output:
select t.*
from Temp t
left join emp e on e.DepartmentId=t.DepartmentId or e.PositionId=t.PositionId or e.EmployeeId=t.EmployeeId or e.StatusId=t.StatusId
outer apply (
select case when exists (
select * from EmployeeCustomers ec join emp e on e.EmployeeId=ec.EmployeeId where ec.CustomerID=t.CustomerId
) then 1 else 0 end CustomerIdMatch
)c
outer apply (
values (
Iif(t.departmentId is null,0,1) +
Iif(t.PositionId is null,0,1) +
Iif(t.EmployeeId is null,0,1) +
Iif(t.StatusId is null,0,1) +
c.CustomerIdMatch
))x(Cnt)
outer apply (
values (
Iif(t.departmentId=e.DepartmentId,1,0) +
Iif(t.PositionId=e.PositionId,1,0) +
Iif(t.EmployeeId=e.EmployeeId,1,0) +
Iif(t.StatusId=e.StatusId,1,0) +
c.CustomerIdMatch
))y(Cnt2)
where cnt2>=cnt and cnt2>0
See working DB<>Fiddle

Adding values from a different table based on values in multiple columns of another table

I have 2 Tables
Table 1
Name    column2  column3   column 4
Suzy    English  null      null
Rocky   Polish   Irish     null
John    English  American  Funny
George  Funny    English   null
Table 2
Column    Value
English   2
Polish    3
Irish     2
Funny     0
American  1
The values in Column in Table 2 are unique.
I want to add a column to Table 1 that takes the values in columns 2, 3 and 4 of each row, looks up each one in the ‘column’ of Table 2, and adds the corresponding values together, so that Table 1 is updated to look like this:
Table 1
Name    column2  column3   column 4  Total
Suzy    english  null      null      2
Rocky   Polish   Irish     null      5
John    English  American  Funny     3
George  Funny    English   null      2
Is this possible at all? Or do I need to have another query first?
Your table structures are less than ideal, since apparently all of column2, column3 and column4 in table 1 contain items of the same "type".
There are various ways of creating your totals - we can either perform multiple joins or use a correlated subquery. I'm using the subquery here:
select
*,
(select SUM(t2.Value) from Table2 t2
where t2.Column1 in (t1.Column2,t1.Column3,t1.Column4)) as TotalValue
from
Table1 t1
You can also use left joins, one per column, and add the values as below:
select t1.*, [Total] = isnull(c2.Value,0) + isnull(c3.Value,0) + isnull(c4.Value,0)
from [Table 1] t1
left join [Table 2] c2 on t1.Column2 = c2.[Column]
left join [Table 2] c3 on t1.Column3 = c3.[Column]
left join [Table 2] c4 on t1.Column4 = c4.[Column]
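Both answers above only compute the total in a SELECT. To actually store it in the table, as the question asks, something along these lines may work. The names Table1, Table2, [Column], Value, Column2, Column3, Column4 and the int type for Total are assumptions for illustration; adjust them to the real schema.

```sql
-- Sketch only: table and column names are assumed, not taken from a real schema.
ALTER TABLE Table1 ADD Total int NULL;
-- GO is the SSMS/sqlcmd batch separator; it lets the new column exist
-- before the UPDATE below is compiled.
GO

UPDATE t1
SET Total = COALESCE(
                (SELECT SUM(t2.Value)
                 FROM Table2 AS t2
                 WHERE t2.[Column] IN (t1.Column2, t1.Column3, t1.Column4)),
                0)  -- rows with no matching value get 0 instead of NULL
FROM Table1 AS t1;
```

Note that the IN form counts a value once even if it appears in two columns of the same row, whereas the join form would count it once per column.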

delete multiple rows with same value in sql 14

I have a table with 3 columns and the first column is 'name'. Some names are entered twice, some 3 times and some more than that. I would like to keep only one value for each name and delete the extra rows.
There are no primary keys or id column.
There are about 1 million rows in the table.
I would like to delete them using one query (preferably) in SQL 14. Can someone help, please?
Name column2 column3
Suzy
Suzy
Suzy
John
John
George
George
George
George
Would like to have it as:
Name column2 column3
Suzy
John
George
Many thanks in advance
You can use the ROW_NUMBER function. Try it like this:
WITH CTE
AS (
SELECT NAME
,column2
,column3
,RN = ROW_NUMBER() OVER (
PARTITION BY NAME ORDER BY NAME
)
FROM < YourTableName >
)
DELETE
FROM CTE
WHERE RN > 1
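Before running a delete like this on about a million rows, it may be worth previewing how many rows would go. A minimal sketch, with YourTableName standing in for the real table name:

```sql
-- Sketch only: lists each duplicated name and how many extra rows it has.
-- The sum of extra_rows should equal the number of rows the DELETE removes.
SELECT name,
       COUNT(*) - 1 AS extra_rows
FROM YourTableName
GROUP BY name
HAVING COUNT(*) > 1;
```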

How can I always return Null for a column without updating the column's value in the database?

ID  Name      tuition  num of courses
1   Brandon   4430     6
2   Lisa      2300     3
3   Victoria  null     0
4   Jack      3330     4
The type of the tuition column is money, but I need to return null in my select statement without updating the values in the table.
I tried nullif(tuition is not null), but it didn't work.
How can I return results like those in the table below, without updating the table or modifying the data in database?
ID  Name      tuition  num of courses
1   Brandon   null     6
2   Lisa      null     3
3   Victoria  null     0
4   Jack      null     4
If you are returning null for every row, just code the column as:
NULL AS Tuition
Example query:
SELECT Id, Name, NULL as Tuition, NumCourses FROM TheTable
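A bare NULL literal carries no money type, so if the caller depends on the column's data type it may be safer to cast it explicitly. A small sketch, reusing the table and column names from the example query above (which are themselves placeholders):

```sql
-- Sketch only: returns NULL for every row while keeping the money type.
-- The stored data is not modified; this is a plain SELECT.
SELECT Id,
       Name,
       CAST(NULL AS money) AS Tuition,
       NumCourses
FROM TheTable;
```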
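I created the table and inserted the records as shown above.
This is a self-join query: NULLIF returns NULL when its two arguments are equal, and here each row's tuition is compared with itself.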
-- To make sure that the underlying table is not updated run both the queries together.
select TT.Id, TT.Name,
nullif(TT.Tuition, BT.Tuition) as Tuition, TT.NOCs
from tblTuition TT
join tblTuition BT
on TT.Id = Bt.Id
select * from tblTuition
Whenever you need a column to come back as null, you can write it like this:
SELECT NULL AS ABC FROM MYTABLE
The statement above adds a column named ABC to the select list with NULL in every row. The same approach works for any constant default value; for example, to get 1 in every row, use SELECT 1 AS ABC FROM MYTABLE

Update first rows that match condition

Update table1
set column1 = 'abc', column2 = 25
where column3 IN ('John','Kate','Tim')
Column3 contains John twice (two associated rows/records); similarly, it has Kate three times and Tim twice.
How can I adjust the query so that the update affects only the first row with John, the first row with Kate and the first with Tim?
For the reference, here is table1:
column1 column2 column3
aa 2 John (!)
affd 24 John
dfd 5 Tim (!)
ss 77 Kate (!)
s 4 Tim
s 1 Kate
sds 34 Kate
I want to update only the rows marked with (!)
I am especially interested in Ms Access! - but also curious how this is done in Sql Server in case it differs. Thank you!
Sql Server solution - Note, you must have a unique identity column for this to work (or some set of unique columns).
UPDATE table1
SET column1 = 'abc',
    column2 = 25
WHERE id IN (SELECT id
             FROM (SELECT id,
                          Row_number()
                            OVER (
                              PARTITION BY column3
                              ORDER BY rowyouwanttoorderby ) AS ROWNUM
                   FROM table1
                   WHERE column3 IN ( 'John', 'Kate', 'Tim' )) AS temp
             WHERE rownum = 1)
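In SQL Server the same update can also be written through a CTE, numbering the rows separately for each name. This is a sketch that assumes an id column defines which row counts as "first"; substitute whatever column gives the intended order.

```sql
-- Sketch only: assumes table1 has an id column that defines row order.
WITH ranked AS (
    SELECT column1,
           column2,
           ROW_NUMBER() OVER (
               PARTITION BY column3   -- restart numbering for each name
               ORDER BY id            -- "first" means lowest id here
           ) AS rn
    FROM table1
    WHERE column3 IN ('John', 'Kate', 'Tim')
)
-- Updating the CTE updates the underlying rows of table1.
UPDATE ranked
SET column1 = 'abc',
    column2 = 25
WHERE rn = 1;
```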
