delete duplicate rows - sql-server

Does anyone know how I can rewrite the script below to delete duplicate rows with better performance?
DELETE lt1
FROM #listingsTemp lt1
JOIN #listingsTemp lt2 ON lt1.code = lt2.code
WHERE lt1.classification_id > lt2.classification_id
  AND lt1.fap <= lt2.fap

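Delete duplicate rows in a SQL table (note that rowid is an Oracle pseudocolumn; SQL Server has no rowid, so use an identity column there instead):
DELETE FROM table_a
WHERE rowid NOT IN
    (SELECT MIN(rowid) FROM table_a
     GROUP BY column1, column2);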
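Since the question is tagged sql-server, here is a rough sketch of the same "keep one row per group" idea using ROW_NUMBER() and a CTE, which needs no rowid. The names table_a, column1 and column2 are taken from the snippet above; which row of each group survives is arbitrary here, so change the ORDER BY if you care.

```sql
-- SQL Server 2005+ sketch: number the rows inside each duplicate group
-- and delete everything except the first row of each group.
WITH ranked AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY column1, column2   -- columns that define a duplicate
               ORDER BY (SELECT NULL)          -- arbitrary survivor; use a real column to choose
           ) AS rn
    FROM table_a
)
DELETE FROM ranked
WHERE rn > 1;
```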

1 - Create an identity column (ID) on your table (t1).
2 - Do a GROUP BY on your table with your conditions and get the IDs of the duplicated records.
3 - Delete the records from t1 whose IDs are in that set of duplicated IDs.
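The three steps above could look roughly like this in T-SQL. This is only a sketch: t1 comes from the steps, while the column names id, col1 and col2 are made up for illustration.

```sql
-- Step 1: add an identity column (hypothetical name: id).
ALTER TABLE t1 ADD id INT IDENTITY(1,1) NOT NULL;
GO  -- new batch, so the statement below can see the new column

-- Steps 2 and 3: keep the lowest id in each group of duplicates
-- (col1, col2 stand in for whatever defines a duplicate) and delete the rest.
DELETE FROM t1
WHERE id NOT IN (
    SELECT MIN(id)
    FROM t1
    GROUP BY col1, col2
);
```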

Look into BINARY_CHECKSUM. You could use it when creating your temp tables to determine more quickly whether the data is the same. For example, create a new field in both temp tables storing the BINARY_CHECKSUM value, then delete the rows where those fields are equal.

odiseh's answer seems to be valid (+1), but if for some reason you can't alter the structure of the table (for example, because you don't have the code of the applications that use it), you could write a job that runs every night and deletes the duplicates (using Moayad Mardini's code).

Related

Is there a way to delete the duplicate from a table in SQL Server?

I have a table like this:
As you can see, rows 2 and 3 are similar, and row 3 is a useless duplicate. My question is: how can we delete only row 3 while keeping rows 2 and 4?
Like this:
Thanks for your help!
You don't have duplicates. If you had a heap table with identical records, then every value in one or more records would be the same. One means of dealing with this would be to add an identity column. Then the identity column can be used to remove some but not all of the duplicates.
In your case, you want to delete records if another record exists that is similar and perhaps has "better" data. You can use an EXISTS clause to do this. The logic below is not what you want, but it should give you the idea of how to handle this.
DELETE t
FROM MyTable t
WHERE t.BCT IS NULL -- delete only records with no values?
  AND t.BCS IS NULL
  AND EXISTS ( -- another record with a value exists, so this one might not be needed?
    SELECT *
    FROM MyTable x
    WHERE (x.BCT IS NOT NULL OR x.BCS IS NOT NULL)
      AND x.portCode = t.portCode
      AND x.effDate = t.effDate
      AND LEFT(x.issueName, 26) = LEFT(t.issueName, 26)
  )

TSQL - Copy data from one table to another

I'm copying the contents of a table into another identical table. But there are already data in the destination table.
Some data in the destination table has the same code as the source table.
Is it possible to skip the duplicates and not to block the insertion for the rest of the data without it failing?
INSERT INTO [DB2].[dbo].[MAN] ([MAN], [DES])
SELECT [MAN]
,[DES]
FROM [DB1].[dbo].[MAN]
You can use NOT EXISTS :
INSERT INTO [DB2].[dbo].[MAN] ([MAN], [DES])
SELECT M.[MAN], M.[DES]
FROM [DB1].[dbo].[MAN] AS M
WHERE NOT EXISTS (SELECT 1 FROM [DB2].[dbo].[MAN] M1 WHERE M1.COL = M.COL);
Replace M1.COL = M.COL with the actual column(s) that identify duplicate values.
If you have a unique column, you can do it like this:
INSERT INTO [DB2].[dbo].[MAN] ([MAN], [DES])
SELECT [MAN]
,[DES]
FROM [DB1].[dbo].[MAN] WHERE uniqueCol NOT IN (SELECT uniqueCol FROM [DB2].[dbo].[MAN])
Otherwise, combine a few columns to get a unique key and compare on those.
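For the "combine a few columns" case, a sketch with NOT EXISTS might look like the following. It assumes, purely for illustration, that MAN and DES together identify a row; swap in whatever columns really do. NOT EXISTS is also a bit safer than NOT IN here, because NOT IN returns no rows at all if the subquery yields a NULL.

```sql
-- Sketch: skip source rows whose (MAN, DES) pair already exists in the destination.
-- Treating MAN + DES as the identifying key is an assumption.
INSERT INTO [DB2].[dbo].[MAN] ([MAN], [DES])
SELECT s.[MAN], s.[DES]
FROM [DB1].[dbo].[MAN] AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM [DB2].[dbo].[MAN] AS d
    WHERE d.[MAN] = s.[MAN]
      AND d.[DES] = s.[DES]
);
```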

Deleting old records from a very big table based on criteria

I have a table (Table A) that contains 300 million records, I want to do a data retention activity on basis of some criteria. So I want to delete about 200M records of the table.
Concerning the performance, I planned to create a new table (Table-B) with the oldest 10M records from Table-A. Then I can select records from Table-B which matches the criteria and will delete it in Table A.
Extracting 10M records from Table-A and loading into Table-B using SQL Loader takes ~5 hours.
I already created indexes and I use parallel 32 wherever applicable.
What I wanted to know is,
Is there any better way to extract from Table-A and to load it in Table-B.
Is there any better approach other than creating a temp table(Table-B).
DBMS: Oracle 10g, PL/SQL and Shell.
Thanks.
If you want to delete 70% of the records of your table, the best way is to create a new table that contains the remaining 30% of the rows, drop the old table, and rename the new table to the name of the old one. One way to create the new table is a create-table-as-select statement (CTAS), but there are also options that make the impact on the running system much smaller; for example, you can use a materialized view to select the remaining data and then convert the materialized view to a table. The details of the approach depend on the requirements.
This reading and writing is much more efficient than deleting the rows of the old table.
If you delete the rows of the old table, you will probably have to reorganize it afterwards, which also ends up rewriting the remaining 30% of the data.
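A rough sketch of the CTAS route in Oracle is below. The names table_a_keep and created_at, the 12-month cutoff, and the degree of parallelism are all assumptions for illustration. Indexes, constraints (including foreign keys that reference the old table), grants and triggers do not carry over and have to be recreated before the swap.

```sql
-- Oracle sketch: copy only the rows you want to keep, then swap the tables.
-- table_a_keep, created_at and the cutoff are hypothetical.
CREATE TABLE table_a_keep
    NOLOGGING
    PARALLEL 8
AS
SELECT /*+ PARALLEL(t, 8) */ *
FROM table_a t
WHERE t.created_at >= ADD_MONTHS(SYSDATE, -12);  -- the rows to keep

-- Recreate indexes, constraints, grants and triggers on table_a_keep here.

DROP TABLE table_a;
RENAME table_a_keep TO table_a;
```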
Partitioning the table by your criteria may be an option.
Consider a case with the criteria is the month. All January data falls into the Jan partition. All February data falls into the Feb partition...
Then when it comes time to drop all the old January data, you just drop the partition.
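As a sketch of what that could look like with Oracle range partitioning (the table name, column names and partition boundaries below are invented for illustration):

```sql
-- Oracle sketch: one partition per month, keyed on a hypothetical date column.
CREATE TABLE table_a_part (
    id          NUMBER,
    created_at  DATE,
    payload     VARCHAR2(100)
)
PARTITION BY RANGE (created_at) (
    PARTITION p_2013_01 VALUES LESS THAN (TO_DATE('2013-02-01', 'YYYY-MM-DD')),
    PARTITION p_2013_02 VALUES LESS THAN (TO_DATE('2013-03-01', 'YYYY-MM-DD')),
    PARTITION p_max     VALUES LESS THAN (MAXVALUE)
);

-- Removing all January data is then a quick dictionary operation
-- instead of a row-by-row delete.
ALTER TABLE table_a_part DROP PARTITION p_2013_01 UPDATE GLOBAL INDEXES;
```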
Using rowid is the best option, but an inline cursor can help you more.
Insert into table a (select * from table B where <criteria>), then truncate table A.
Is there any better way to extract from Table-A and to load it in Table-B? You can use parallel CTAS: create table-b as select from table-a. You can use compression and parallel query in one step.
Is there any better approach other than creating a temp table (Table-B)? A better approach would be partitioning Table A.
A better approach would probably be to partition Table A, but if that is not possible you can try something fast and simple:
declare
  i pls_integer := 0;
begin
  for r in (
    -- select what you want to move to the second table
    select rowid as rid, col1, col2, col3
    from table_a t
    where t.col < sysdate - 30 -- or other criteria
  )
  loop
    -- insert it into the second table
    -- (no APPEND hint: on 10g it is not applied to a single-row VALUES insert)
    insert into table_b values (r.col1, r.col2, r.col3);
    delete from table_a where rowid = r.rid; -- and delete it from the first
    i := i + 1;
    if i >= 500 then -- check your best commit interval
      commit;
      i := 0;
    end if;
  end loop;
  commit;
end;
In the above example you move your records in small transactions of 500 rows each. You can optimize it using collections and bulk inserts, but I wanted to keep the code simple.
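For reference, the collection-based variant mentioned above might look roughly like this. It is a sketch under the same assumptions as the loop (table_a, table_b, col1..col3 and the date filter are placeholders), using BULK COLLECT with LIMIT and FORALL. Note that committing while the cursor is still open can run into ORA-01555 on a busy system, just like the row-by-row version.

```sql
DECLARE
    CURSOR c_old IS
        SELECT rowid AS rid, col1, col2, col3
        FROM table_a t
        WHERE t.col < SYSDATE - 30;   -- or other criteria

    -- one scalar collection per column
    TYPE t_rid  IS TABLE OF ROWID;
    TYPE t_col1 IS TABLE OF table_a.col1%TYPE;
    TYPE t_col2 IS TABLE OF table_a.col2%TYPE;
    TYPE t_col3 IS TABLE OF table_a.col3%TYPE;

    l_rid  t_rid;
    l_col1 t_col1;
    l_col2 t_col2;
    l_col3 t_col3;
BEGIN
    OPEN c_old;
    LOOP
        -- fetch up to 500 rows per round trip
        FETCH c_old BULK COLLECT INTO l_rid, l_col1, l_col2, l_col3 LIMIT 500;
        EXIT WHEN l_rid.COUNT = 0;

        -- copy the batch to the second table
        FORALL i IN 1 .. l_rid.COUNT
            INSERT INTO table_b VALUES (l_col1(i), l_col2(i), l_col3(i));

        -- and remove the same rows from the first table
        FORALL i IN 1 .. l_rid.COUNT
            DELETE FROM table_a WHERE rowid = l_rid(i);

        COMMIT;   -- one commit per batch
    END LOOP;
    CLOSE c_old;
END;
/
```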
I was missing an index on a column that I was using in the search criteria.
Some indexes were missing on the referenced tables too.
Apart from this, @miracle173's answer is also good, but we have some foreign keys that might cause problems if we used that approach.
+1 to @miracle173

compare 2 tables if match update if not match insert

I have 2 tables with the same structure. Table A holds all transactions, with 3 unique key columns in each record. Table B holds only condition-based records.
I want to compare both tables: if Table B has a matching record, I want to update it, and if Table B does not have a matching record, I want to insert it into Table B.
Can you please suggest the best way to do this, such as SSIS or anything else?
The easiest way is a MERGE statement:
MERGE INTO Table_B
USING Table_A
ON Table_A.ID1 = Table_B.ID1 AND Table_A.ID2 = Table_B.ID2 AND Table_A.ID3 = Table_B.ID3
WHEN MATCHED THEN UPDATE SET A = Table_A.A, B = Table_A.B -- Etcetera...
WHEN NOT MATCHED THEN INSERT (A, B) VALUES (Table_A.A, Table_A.B) -- Etcetera...
WHEN NOT MATCHED BY SOURCE THEN DELETE -- If Necessary...
;
By the way, don't forget the ";" at the end. SQL Server doesn't usually need them, but a MERGE does.
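If you want to see what the MERGE actually did, a variant with aliases and an OUTPUT clause might help. This is only a sketch: ID1..ID3, A and B are the placeholder column names from the statement above, and the insert includes the key columns, which any real insert would need too.

```sql
-- Sketch: same MERGE with table aliases, plus OUTPUT to report each affected row.
MERGE INTO Table_B AS tgt
USING Table_A AS src
   ON src.ID1 = tgt.ID1 AND src.ID2 = tgt.ID2 AND src.ID3 = tgt.ID3
WHEN MATCHED THEN
    UPDATE SET tgt.A = src.A, tgt.B = src.B
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID1, ID2, ID3, A, B)
    VALUES (src.ID1, src.ID2, src.ID3, src.A, src.B)
OUTPUT $action AS merge_action,       -- 'INSERT' or 'UPDATE'
       inserted.ID1, inserted.ID2, inserted.ID3;
```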

Merge query using two tables in SQL server 2012

I am very new to SQL and SQL server, would appreciate any help with the following problem.
I am trying to update a share price table with new prices.
The table has three columns: share code, date, price.
The share code + date = PK
As you can imagine, if you have thousands of share codes and 10 years' data for each, the table can get very big. So I have created a separate table called a share ID table, and use a share ID instead in the first table (I was reliably informed this would speed up the query, as searching by integer is faster than string).
So, to summarise, I have two tables as follows:
Table 1 = Share_code_ID (int), Date, Price
Table 2 = Share_code_ID (int), Share_name (string)
So let's say I want to update the table/s with today's price for share ZZZ. I need to:
Look for the Share_code_ID corresponding to 'ZZZ' in table 2
If it is found, update table 1 with the new price for that date, using the Share_code_ID I just found
If the Share_code_ID is not found, update both tables
Let's ignore for now how the Share_code_ID is generated for a new code, I'll worry about that later.
I'm trying to use a merge query loosely based on the following structure, but have no idea what I am doing:
MERGE INTO [Table 1]
USING (VALUES (1,23-May-2013,1000)) AS SOURCE (Share_code_ID,Date,Price)
{ SEEMS LIKE THERE SHOULD BE AN INNER JOIN HERE OR SOMETHING }
ON Table 2 = 'ZZZ'
WHEN MATCHED THEN UPDATE SET Table 1.Price = 1000
WHEN NOT MATCHED THEN INSERT { TO BOTH TABLES }
Any help would be appreciated.
http://msdn.microsoft.com/library/bb510625(v=sql.100).aspx
You use Table1 as the target table and Table2 as the source table.
You want to take an action when the given ID is not found in Table2, that is, in the source table.
In the documentation, which you have already read, that corresponds to the clause
WHEN NOT MATCHED BY SOURCE ... THEN <merge_matched>
and the latter corresponds to
<merge_matched>::=
{ UPDATE SET <set_clause> | DELETE }
Ergo, you cannot insert into the source table there.
You could use triggers for auto-insertion when you insert something into Table1, but a trigger will not be able to insert the proper Share_name: it just won't know it.
So you have two options, I guess.
1) Write a T-SQL code block; look into stored procedures. I think there is also a construct for executing an anonymous code block in MS SQL, like the EXECUTE BLOCK command in Firebird SQL Server, but I don't know it for sure.
2) Create an updatable SQL VIEW joining Table1 and Table2 to show the most current date, so that when you insert a row into this view, the view's on-insert trigger actually inserts rows into both tables. When you update data in the view, the on-update trigger modifies the data.
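To make option 1 a bit more concrete, here is a sketch of a T-SQL block that could be wrapped in a stored procedure. The table names Share_prices (Table 1) and Share_IDs (Table 2) are invented, and it assumes Share_code_ID in Share_IDs is an IDENTITY column, which the question explicitly left open.

```sql
-- Sketch only. Share_prices = Table 1, Share_IDs = Table 2 (hypothetical names).
DECLARE @share_name VARCHAR(20)    = 'ZZZ',
        @price_date DATE           = '2013-05-23',
        @price      DECIMAL(18, 4) = 1000;
DECLARE @share_id INT;

BEGIN TRANSACTION;

-- Look up the ID for the share code.
SELECT @share_id = Share_code_ID
FROM Share_IDs
WHERE Share_name = @share_name;

-- Not found: register the new share first (assumes an IDENTITY column).
IF @share_id IS NULL
BEGIN
    INSERT INTO Share_IDs (Share_name) VALUES (@share_name);
    SET @share_id = SCOPE_IDENTITY();
END;

-- Update the price for that share and date, or insert it if it is not there yet.
MERGE INTO Share_prices AS tgt
USING (VALUES (@share_id, @price_date, @price)) AS src (Share_code_ID, [Date], Price)
   ON tgt.Share_code_ID = src.Share_code_ID
  AND tgt.[Date] = src.[Date]
WHEN MATCHED THEN
    UPDATE SET tgt.Price = src.Price
WHEN NOT MATCHED THEN
    INSERT (Share_code_ID, [Date], Price)
    VALUES (src.Share_code_ID, src.[Date], src.Price);

COMMIT TRANSACTION;
```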
