Merge duplicate records and update its table - sql-server

Can you help me with a SQL query problem?
I want to merge (summing where necessary) all rows for customers that share the same customer name.
In my project, I have already found all the duplicated customers using this code:
select Firstname, Lastname, count(1) as RepeatedCount
from customer
group by FirstName, LastName
having count(1) > 1
How can I update the Customer table so that each customer has only one record, holding the sum of TotalSales and TotalVisits?
Sample data:
FirstName LastName TotalSales TotalVisits
---------- ---------- -------------- -----------
Michelle Go 0.00 0
Michelle Go 6975.00 1
Michelle Go 1195.00 1
Michelle Go 9145.00 3
Michelle Go 57785.00 5
Michelle Go 5845.00 1
Michelle Go 0.00 0
Michelle Go 0.00 0
Expected Output:
FirstName LastName TotalSales TotalVisits
---------- ---------- -------------- -----------
Michelle Go 80945.00 11

You have to use the aggregate function SUM with GROUP BY.
Query
SELECT FirstName,LastName,
SUM(totalsales) as totalsales,
SUM(totalvisits) as totalvisits
FROM customer
GROUP BY FirstName,LastName;
As a better practice, I suggest adding a unique CustomerId column,
so that you can identify and group rows easily.
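With a unique key in place, the duplicates can also be merged in place rather than only reported. The following is a minimal sketch, assuming the table has no identity column yet; the column name CustomerId comes from the suggestion above, while IDENTITY(1,1) and the alias KeepId are illustrative choices, not part of the original answer.

```sql
-- Assumption: customer has no identity column yet.
-- GO is a batch separator (SSMS/sqlcmd), needed so the new column is visible below.
ALTER TABLE customer ADD CustomerId INT IDENTITY(1,1) NOT NULL;
GO

BEGIN TRANSACTION;

-- Fold each group's totals into the row with the lowest CustomerId.
UPDATE c
SET c.TotalSales  = s.TotalSales,
    c.TotalVisits = s.TotalVisits
FROM customer AS c
JOIN (
    SELECT MIN(CustomerId)  AS KeepId,
           SUM(TotalSales)  AS TotalSales,
           SUM(TotalVisits) AS TotalVisits
    FROM customer
    GROUP BY FirstName, LastName
) AS s ON s.KeepId = c.CustomerId;

-- Remove every other row of each group.
DELETE FROM customer
WHERE CustomerId NOT IN (
    SELECT MIN(CustomerId)
    FROM customer
    GROUP BY FirstName, LastName
);

COMMIT TRANSACTION;
```

Running both statements inside one transaction means the table is never left with updated totals and the old duplicates still present.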

You could use SUM and GROUP BY and insert the results in a temporary table.
IF OBJECT_ID('tempdb..#tempTable') IS NOT NULL
DROP TABLE #tempTable
SELECT
c.FirstName,
c.LastName,
SUM(c.TotalSales) AS TotalSales,
SUM(c.TotalVisits) AS TotalVisits
INTO #tempTable
FROM Customer c
GROUP BY c.LastName, c.FirstName
Then TRUNCATE the original table, Customer, and INSERT the data from #tempTable:
TRUNCATE TABLE Customer
INSERT INTO Customer
SELECT * FROM #tempTable
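Because the table is empty between the TRUNCATE and the INSERT, it may be worth wrapping the steps in a transaction. The sketch below assumes that Customer has exactly the four columns shown in the question and is not referenced by a foreign key (TRUNCATE TABLE would fail otherwise); #merged is a hypothetical temp table name.

```sql
-- Sketch only: column list and #merged are assumptions based on the sample data.
SET XACT_ABORT ON;   -- roll back the whole transaction on a run-time error
BEGIN TRANSACTION;

SELECT FirstName,
       LastName,
       SUM(TotalSales)  AS TotalSales,
       SUM(TotalVisits) AS TotalVisits
INTO #merged
FROM Customer
GROUP BY FirstName, LastName;

TRUNCATE TABLE Customer;   -- can be rolled back in SQL Server

-- Explicit column lists avoid depending on column order.
INSERT INTO Customer (FirstName, LastName, TotalSales, TotalVisits)
SELECT FirstName, LastName, TotalSales, TotalVisits
FROM #merged;

COMMIT TRANSACTION;
DROP TABLE #merged;
```

If the table is referenced by foreign keys, DELETE FROM Customer can be used in place of TRUNCATE TABLE, at the cost of more logging.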

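Let us treat the CTE below as a stand-in for the main table. (In practice user_details would be a real table, since a CTE cannot be truncated or inserted into by later statements.)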
;WITH user_details
AS
(
SELECT 'Michelle' AS FirstName,'Go' AS LastName,0.00 AS totalsales,0 AS totalvists
UNION
SELECT 'MICHELLE','GO',6975.00,1
UNION
SELECT 'michelle','go',1195.00,1
UNION
SELECT 'michelle','go',9145.00,3
UNION
SELECT 'MICHELLE','GO',57785.00,5
UNION
SELECT 'MICHELLE','GO',5845.00,1
UNION
SELECT 'Michelle','Go',0.00,0
UNION
SELECT 'Michelle','Go',0.00,0
)
Group your query with aggregate functions and insert the result into a temp table:
SELECT
FirstName,
LastName,
SUM(totalsales) [totalsales],
SUM([totalvists]) [totalvisits]
INTO
#temp
FROM
user_details
GROUP BY
FirstName,
LastName
-- select * from #temp
Truncate your main table:
TRUNCATE TABLE user_details
Now insert the merged records back into your main table:
INSERT INTO user_details (FirstName,LastName,totalsales,totalvisits) -- Main table
SELECT
FirstName,
LastName,
totalsales,
totalvisits
FROM
#temp

Related

Delete a row based on another row

I have the following:
stock | Customer
12345 | NULL
12345 | ABC
What I want to do is remove the first without affecting the second anytime there is a set of rows like this:
if exists (select stock from table WHERE stock='12345' AND Customer is not null )
BEGIN
DELETE FROM table WHERE stock= '12345' AND Customer is null
END
The query works, but how can I change it so that I don't have to specify a stock? I want to keep a row with a null customer if it is the only row associated with that stock.
You can use exists:
DELETE t0
FROM table t0
WHERE Customer IS NULL
AND EXISTS
(
SELECT 1
FROM table t1
WHERE t0.stock = t1.stock
AND t1.Customer IS NOT NULL
)
This only deletes rows where Customer is NULL and at least one other row with the same stock has a non-NULL Customer.
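To check the behaviour on a small data set, the statement can be run against a throwaway temp table. This is only an illustration: #StockDemo and its four rows are made up for the test, mirroring the shape of the data in the question.

```sql
-- Hypothetical test data: one stock with both rows, one with only a customer,
-- and one with only a NULL customer.
CREATE TABLE #StockDemo (stock int, Customer varchar(10));

INSERT INTO #StockDemo (stock, Customer) VALUES
    (12345, NULL),
    (12345, 'ABC'),
    (11111, 'XYZ'),
    (55555, NULL);

DELETE t0
FROM #StockDemo t0
WHERE t0.Customer IS NULL
  AND EXISTS
  (
      SELECT 1
      FROM #StockDemo t1
      WHERE t1.stock = t0.stock
        AND t1.Customer IS NOT NULL
  );

-- Remaining rows: (11111, 'XYZ'), (12345, 'ABC'), (55555, NULL)
SELECT stock, Customer
FROM #StockDemo
ORDER BY stock;

DROP TABLE #StockDemo;
```

The NULL row for stock 55555 survives because no other row for that stock has a customer.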
Please check the following SQL DELETE command, which uses a CTE.
I used the COUNT function with a PARTITION BY clause.
COUNT(Customer) ignores NULLs, so counting the Customer column per stock gives the number of non-NULL customers for each stock, which lets me remove the NULL rows only where such a customer exists:
;with cte as (
select
stock,
Customer,
cnt = Count(Customer) over (partition by stock)
from StockCustomer
)
delete from cte
where Customer is null and cnt > 0
You can test different situations with rows like the following:
create table StockCustomer (stock int, Customer varchar(10))
insert into StockCustomer select 12345 , NULL
insert into StockCustomer select 12345 , 'ABC'
insert into StockCustomer select 11111 , 'XYZ'
insert into StockCustomer select 555555 , NULL
Use the following:
WITH CTE (stock, customer, DuplicateCount)
AS
(
SELECT stock, customer,
ROW_NUMBER() OVER(PARTITION BY Stock ORDER BY customer desc) AS DuplicateCount
FROM [Table]
)
DELETE
FROM CTE
WHERE DuplicateCount > 1 and customer is NULL
GO
You can use a self join as follows:
DELETE
FROM mytable
WHERE stock IN (
SELECT m1.stock
FROM mytable m1
INNER JOIN mytable m2 ON m2.stock = m1.stock
WHERE m1.customer IS NULL
AND m2.customer IS NOT NULL
)
AND customer IS NULL
Just do delete from table where customer is null if this is your only requirement.

Keep deleted data in subselect to insert afterwards

Is there a possible way to update an existing table with a group by summary of the same table?
Example:
Table A
data (decimal(5,2)) | id (int) | year (date)
In table A there are many records like
1.05 | 1 | 31.11.2015
10 | 1 | 31.11.2015
...
I now want to group by ID & YEAR and only have those records in table A.
11.05 | 1 | 31.11.2015
...
Is there a way to achieve this without a copy of table A? For example, can I store a complete result set in a variable, then truncate the table and insert the grouped rows back into table A?
If you want to truncate the tableA and insert the new result set to tableA, then store the new result set to a temporary table, truncate tableA and then insert data from temp table to tableA.
Query
select sum(data) as data,
id,[year]
into #tbl
from tableA
group by id,[year];
truncate table tableA;
insert into tableA(data,id,[year])
select data,id,[year] from #tbl;
drop table #tbl;
select * from tableA;
Use a select into #TempResult, Truncate your table and reinsert the values from your temporary result...
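Since the question asks about holding the rows in a variable, one possible variant is to capture the deleted rows with the OUTPUT clause and re-insert them grouped. This is a sketch, not the answerers' method: tableA and its columns come from the question, while @removed is a hypothetical table variable name.

```sql
-- Sketch only: @removed is a hypothetical table variable.
DECLARE @removed TABLE (data decimal(5,2), id int, [year] date);

BEGIN TRANSACTION;

-- Empty tableA and capture every deleted row in the table variable.
DELETE FROM tableA
OUTPUT deleted.data, deleted.id, deleted.[year]
INTO @removed (data, id, [year]);

-- Re-insert one summed row per (id, year).
INSERT INTO tableA (data, id, [year])
SELECT SUM(data), id, [year]
FROM @removed
GROUP BY id, [year];

COMMIT TRANSACTION;
```

DELETE is used here because TRUNCATE TABLE has no OUTPUT clause. Note that a group whose sum exceeds the range of decimal(5,2) would raise an arithmetic overflow error on insert.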
I think you should calculate SUM(data) grouped by ID and Year. You can use window functions (SQL Server 2005+) to do this, like this:
SELECT DISTINCT
SUM([data]) OVER(PARTITION BY ID, [Year]) AS [data],
ID, [Year]
FROM TableA
If your DBMS does not support window functions, you can try this:
SELECT
SUM([data]) AS [data],
ID, [Year]
FROM TableA
GROUP BY ID, [Year]
Doing this without keeping a copy is somewhat difficult. It is possible, though I can't guarantee the performance.
The following updates every row with the new sum, then deletes all rows except one for each combination of id and year of the date.
DECLARE @t table(data decimal(5,2), id int, year date)
INSERT @t
values(1.05, 1, '20151130'),(10, 1, '20151130')
;WITH CTE as
(
SELECT
data, SUM(data) OVER (partition by id, year(year)) new_data
FROM @t
)
UPDATE CTE SET data = new_data
;WITH CTE as
(
SELECT row_number() OVER (partition by id, year(year) ORDER BY (SELECT 1)) rn
FROM @t
)
DELETE CTE
WHERE rn > 1
SELECT * FROM @t

Delete duplicate records from SQL Server 2012 table with identity

I am trying to replicate a scenario where I need to delete all duplicate rows from a table except one. But all rows have a unique identity column.
For making things easier, I created a small test table student and the script is as below.
create table student
(
id int,
rollno int,
name varchar(50),
course varchar(50)
)
GO
insert into student values(1,1335592,'john','biology')
insert into student values(2,1335592,'john','biology')
insert into student values(3,1335592,'john','biology')
insert into student values(4,1335592,'john','biology')
insert into student values(5,1335593,'peter','biology')
insert into student values(6,1335593,'peter','biology')
insert into student values(7,1335593,'peter','biology')
GO
select * from student
This will generate the table as below.
id rollno name course
1 1335592 john biology
2 1335592 john biology
3 1335592 john biology
4 1335592 john biology
5 1335593 peter biology
6 1335593 peter biology
7 1335593 peter biology
I would like to keep the records with ID '1' and '5' in the result set and delete everything else. Is there any way to do this?
All help will be greatly appreciated.
Thanks
Shammas
Use CTE.
Query
;with cte as
(
select rn=row_number() over
(
partition by rollno,name,course
order by id
),*
from student
)
delete from cte
where rn > 1;
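Before running the DELETE, the same CTE can be used with a SELECT to preview which rows would be removed. A minimal sketch against the student table from the question:

```sql
-- Preview only: lists the rows the DELETE above would remove, without changing data.
;WITH cte AS
(
    SELECT id, rollno, name, course,
           ROW_NUMBER() OVER
           (
               PARTITION BY rollno, name, course
               ORDER BY id
           ) AS rn
    FROM student
)
SELECT id, rollno, name, course
FROM cte
WHERE rn > 1
ORDER BY id;
-- With the sample data this returns ids 2, 3, 4, 6 and 7.
```

Ordering by id inside the window is what makes the lowest id in each group the one that is kept.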
It is a simple query:
Delete from student
where id not in (select min(id)
from student
group by rollno, name, course)
You can use DENSE_RANK, ROW_NUMBER, or RANK. Because id is unique within each group, all three give the same result here.
Try:
create table student
(id int,
rollno int,
name varchar(50),
course varchar(50)
)
GO
insert into student values(1,1335592,'john','biology')
insert into student values(2,1335592,'john','biology')
insert into student values(3,1335592,'john','biology')
insert into student values(4,1335592,'john','biology')
insert into student values(5,1335593,'peter','biology')
insert into student values(6,1335593,'peter','biology')
insert into student values(7,1335593,'peter','biology')
GO
;with cte as
(
select rn=row_number() over
(
partition by rollno,name,course
order by id
),*
from student
)
select * from cte where rn=1
;with cte2 as
(
select rn=RANK() over
(
partition by rollno,name,course
order by id
),*
from student
)
select * from cte2 where rn=1
;with cte3 as
(
select rn=Dense_RANK() over
(
partition by rollno,name,course
order by id
),*
from student
)
select * from cte3 where rn=1
See Difference between ROW_NUMBER(), RANK() and DENSE_RANK()
DELETE s
FROM student s
JOIN student s2 ON s.course = s2.course
AND s.NAME = s2.NAME
AND s.rollno = s2.rollno
WHERE s2.id < s.id

SQL - Add id to all rows

Let's assume my table is the following:
id | name | country
--------------------
| John | USA
| Mary | USA
| Mike | USA
Can someone help me with a script that adds ids to all names?
Thanks
-- Create a temporary table for the example.
CREATE TABLE #People(Id int, Name nvarchar(10), Country nvarchar(10))
-- Insert data, leaving the Id column NULL.
INSERT INTO #People(Name, Country) SELECT
'John', 'USA' UNION ALL SELECT
'Mary', 'USA' UNION ALL SELECT
'Mike', 'USA';
-- 1. Use Row_Number() to generate an Id.
-- 2. Wrap the query in a common table expression (CTE), which is like an inline view.
-- 3. If the CTE references a single table, we can update the CTE to affect the underlying table.
WITH PeopleCte AS (
SELECT
Id,
Row_Number() OVER (ORDER BY (SELECT NULL)) AS NewId
FROM
#People
)
UPDATE PeopleCte SET Id = NewId
SELECT * FROM #People
DROP TABLE #People
Try this:
update a set a.id = b.newid from [table] a join (select row_number() over (order by (select null)) as newid, name from [table]) b on a.name = b.name
This assumes the names are unique. Change details such as the ordering as needed.
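As an illustration of changing the ordering, the sketch below numbers the rows alphabetically by name so that the assigned ids are repeatable. #PeopleDemo and the alias NewRowId are hypothetical names used only for this example.

```sql
-- Hypothetical demo table with an empty Id column.
CREATE TABLE #PeopleDemo (Id int NULL, Name nvarchar(10), Country nvarchar(10));

INSERT INTO #PeopleDemo (Name, Country) VALUES
    ('John', 'USA'),
    ('Mary', 'USA'),
    ('Mike', 'USA');

-- Number the rows by Name and write the number back through the CTE.
WITH numbered AS (
    SELECT Id,
           ROW_NUMBER() OVER (ORDER BY Name) AS NewRowId
    FROM #PeopleDemo
)
UPDATE numbered SET Id = NewRowId;

-- John = 1, Mary = 2, Mike = 3
SELECT Id, Name, Country
FROM #PeopleDemo
ORDER BY Id;

DROP TABLE #PeopleDemo;
```

With ORDER BY (SELECT NULL) the numbering is arbitrary; an explicit column gives the same ids on every run, provided the ordering column has no ties.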

Sql server union but keep order

Is there a way to union two tables, but keep the rows from the first table appearing first in the result set?
For example:
Table1
name surname
-------------------
John Doe
Bob Marley
Ras Tafari
Table2
name surname
------------------
Lucky Dube
Abby Arnold
I want the result set to look like:
name surname
-------------------
John Doe
Bob Marley
Ras Tafari
Lucky Dube
Abby Arnold
Unfortunately, union somehow reorders the table. Is there a way around this?
Try this :-
Select *
from
(
Select name,surname, 1 as filter
from Table1
Union all
Select name,surname , 2 as filter
from Table2
) AS t
order by filter
The only way to guarantee output order is to use ORDER BY:
SELECT name,surname,1 as rs
FROM table1
UNION ALL
SELECT name,surname,2
FROM table2
ORDER BY rs
If you don't want rs to appear in the final result set, do the UNION as a subquery:
SELECT name,surname
FROM (
SELECT name,surname,1 as rs
FROM table1
UNION ALL
SELECT name,surname,2
FROM table2
) t
ORDER BY rs
;WITH cte as (
SELECT name, surname, 1 as n FROM table1
UNION ALL
SELECT name, surname, 2 as n FROM table2
UNION ALL
SELECT name, surname, 3 as n FROM table3
)
SELECT name, surname
FROM cte
ORDER BY n;
Like this?
CREATE TABLE #Table1 (Names VARCHAR(50))
CREATE TABLE #Table2 (Names VARCHAR(50))
INSERT INTO #Table1
(
Names
)
VALUES
('John Doe'), ('Bob Marley'), ('Ras Tafari')
INSERT INTO #Table2
(
Names
)
VALUES
('Lucky Dube'), ('Abby Arnold')
SELECT ArbSeq = 1, *
FROM #Table1
UNION ALL
SELECT ArbSeq = 2, *
FROM #Table2
ORDER BY ArbSeq
It should be noted that ordering is not guaranteed when not explicitly defined.
If the table features a clustered index, the rows will typically be returned in the index's order - but this is not guaranteed.
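The same applies within each table: a source marker only fixes which table comes first, not the order of the rows inside it. If that order matters too, a second sort key is needed. The sketch below assumes each table has an id column reflecting the desired row order; that column is hypothetical and does not appear in the question.

```sql
-- Assumption: table1 and table2 each have an id column that defines row order.
SELECT name, surname
FROM (
    SELECT name, surname, 1 AS src, id FROM table1
    UNION ALL
    SELECT name, surname, 2 AS src, id FROM table2
) AS t
ORDER BY src, id;   -- table1 rows first, then table2; each in id order
```

Without such a column there is nothing to sort on, and the order of rows within each table remains undefined.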
