How to delete duplicates in SQL table with Primary Key [duplicate] - sql-server

As an example, consider the following table.
+-------+----------+----------+------------+
| ID(PK)| ClientID | ItemType | ItemID |
+-------+----------+----------+------------+
| 1 | 4 | B | 56 |
| 2 | 8 | B | 54 |
| 3 | 276 | B | 57 |
| 4 | 8653 | B | 25 |
| 5 | 3 | B | 55 |
| 6 | 4 | B | 56 |
| 7 | 4 | B | 56 |
| 8 | 276 | B | 57 |
| 9 | 8653 | B | 25 |
+-------+----------+----------+------------+
We have a process that creates duplicates, which we need to delete. In the example above, clients 4, 276, and 8653 should only ever have one ItemType/ItemID combination. How would I delete the extra rows? In this example, I'd need to delete the rows with ID(PK) 6, 7, 8, and 9. This has to happen on a much larger scale, so I can't delete the rows one by one. Is there a query that identifies every ID(PK) that isn't the lowest in its group so I can delete them? I'm picturing a DELETE statement that operates on a subquery, but I'm open to suggestions. I tried generating a row number to identify duplicates, but because the table has a PK, every row is unique, so that hasn't worked for me.
Thank you!
Edit: Here's the expected result
+-------+----------+----------+------------+
| ID(PK)| ClientID | ItemType | ItemID |
+-------+----------+----------+------------+
| 1 | 4 | B | 56 |
| 2 | 8 | B | 54 |
| 3 | 276 | B | 57 |
| 4 | 8653 | B | 25 |
| 5 | 3 | B | 55 |
+-------+----------+----------+------------+

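You can use a CTE that numbers the rows within each duplicate group by ascending ID, and then delete through it every row numbered above 1: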
;WITH ToDelete AS (
SELECT ROW_NUMBER() OVER (PARTITION BY ClientID, ItemType, ItemID
ORDER BY ID) AS rn
FROM mytable
)
DELETE FROM ToDelete
WHERE rn > 1
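For engines that do not allow deleting through a CTE, a portable alternative is to keep the MIN(ID) of each group and delete everything else. The sketch below is not the answer's method but a swapped-in equivalent, run through Python's sqlite3 module on the question's sample rows. The table name mytable comes from the answer; the helper names are hypothetical.

```python
import sqlite3

def build_sample():
    """Create an in-memory copy of the question's sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable ("
        "ID INTEGER PRIMARY KEY, ClientID INTEGER, ItemType TEXT, ItemID INTEGER)"
    )
    rows = [
        (1, 4, "B", 56), (2, 8, "B", 54), (3, 276, "B", 57),
        (4, 8653, "B", 25), (5, 3, "B", 55), (6, 4, "B", 56),
        (7, 4, "B", 56), (8, 276, "B", 57), (9, 8653, "B", 25),
    ]
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    return conn

def delete_duplicates(conn):
    """Keep the lowest ID per (ClientID, ItemType, ItemID); delete the rest."""
    conn.execute(
        """
        DELETE FROM mytable
        WHERE ID NOT IN (
            SELECT MIN(ID)
            FROM mytable
            GROUP BY ClientID, ItemType, ItemID
        )
        """
    )
    conn.commit()

def remaining_ids():
    """Run the cleanup on the sample data and return the surviving IDs."""
    conn = build_sample()
    delete_duplicates(conn)
    return [row[0] for row in conn.execute("SELECT ID FROM mytable ORDER BY ID")]
```

On the sample data, remaining_ids() leaves IDs 1 through 5, which matches the expected result in the question.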


How to check a specific range value is filled in SQL with Window functions?

I am working on a project in which we should evaluate suppliers and in this database I have this table EvaluationGrade:
+------+---------------------+------------+-----------+
| Id | EvaluationMethodId | FromScore | ToScore |
+------+---------------------+------------+-----------+
| 1 | 2 | 1 | 20 |
| 2 | 2 | 21 | 50 |
| 3 | 2 | 51 | 70 |
| 4 | 2 | 71 | 100 |
| 5 | 3 | 1 | 20 |
| 6 | 3 | 31 | 40 |
+------+---------------------+------------+-----------+
This table categorizes scores, and I want to make sure that for EvaluationMethodId = 2 the score ranges cover 1 to 100 completely (as in the sample above).
I am looking for something like this:
+---------------------+------------+
| EvaluationMethodId | Sum |
+---------------------+------------+
| 2 | 100 |
| 3 | 30 |
+---------------------+------------+
This is the way I attempted:
WITH myUpdate
AS (SELECT emg.Id,emg.EvaluationMethodId,
SUM(emg.ToGrade - emg.FromGrade) + 1 AS SumScope
FROM generalsup.EvaluationMethodGrading emg
GROUP BY emg.Id,emg.EvaluationMethodId)
SELECT myUpdate.EvaluationMethodId, SUM(myUpdate.SumScope) AS SumScopeAll
FROM myUpdate
GROUP BY myUpdate.EvaluationMethodId;
But I would like to use a window function that puts less overhead on the server.
Since there is no case of overlaps in the scores, you can do it with group by EvaluationMethodId and sum():
select EvaluationMethodId, sum(ToScore - FromScore + 1) [Sum]
from EvaluationMethodGrading
group by EvaluationMethodId
See the demo.
Results:
> EvaluationMethodId | Sum
> -----------------: | --:
> 2 | 100
> 3 | 30
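As a quick check of the grouped sum, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the same aggregation. The function name and the column alias covered are hypothetical, and the query relies on the stated assumption that ranges do not overlap.

```python
import sqlite3

def covered_scores():
    """Return (EvaluationMethodId, number of scores covered) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EvaluationMethodGrading ("
        "Id INTEGER PRIMARY KEY, EvaluationMethodId INTEGER, "
        "FromScore INTEGER, ToScore INTEGER)"
    )
    rows = [
        (1, 2, 1, 20), (2, 2, 21, 50), (3, 2, 51, 70),
        (4, 2, 71, 100), (5, 3, 1, 20), (6, 3, 31, 40),
    ]
    conn.executemany(
        "INSERT INTO EvaluationMethodGrading VALUES (?, ?, ?, ?)", rows
    )
    # Each range [FromScore, ToScore] is inclusive, hence the + 1.
    return conn.execute(
        """
        SELECT EvaluationMethodId, SUM(ToScore - FromScore + 1) AS covered
        FROM EvaluationMethodGrading
        GROUP BY EvaluationMethodId
        ORDER BY EvaluationMethodId
        """
    ).fetchall()
```

A method covers the full 1 to 100 scale only when its sum is 100; here method 3 falls short because scores 21 to 30 and 41 to 100 are missing.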

Programming In SQLITE

In college I learned PL/SQL, which I used to insert/update data into table programmatically.
So is there any way to do it in SQLITE?
I have one table, book, which has two columns: readPages and currentPage. readPages holds how many pages I read on a given day, and currentPage holds the total pages read up to that day.
Currently I have data for only readPages so I want to calculate currentPage for past days, e.g.
readPages: 19 10 43 20 35 # I have data for 5 days
currentPage: 19 29 72 92 127 # I want to calculate it
This is easy in a programming language, but how can I do it in SQLite, which has nothing like PL/SQL?
The order of the rows can be determined by id or by date.
The problem with the date column is that its format, 'DD-MM', does not sort chronologically when compared as text.
It is better to change it to something like 'YYYY-MM-DD'.
Since your version of SQLite does not allow you to use window functions, you can do what you need with this:
update findYourWhy
set currentPage = coalesce(
(select sum(f.readPage) from findYourWhy f where f.id <= findYourWhy.id),
0
);
If you change the format of the date column, you can also do it with this:
update findYourWhy
set currentPage = coalesce(
(select sum(f.readPage) from findYourWhy f where f.date <= findYourWhy.date),
0
);
See the demo.
CREATE TABLE findYourWhy (
id INTEGER,
date TEXT,
currentPage INTEGER,
readPage INTEGER,
PRIMARY KEY(id)
);
INSERT INTO findYourWhy (id,date,currentPage,readPage) VALUES
(1,'06-05',null,36),
(2,'07-05',null,9),
(3,'08-05',null,12),
(4,'09-05',null,5),
(5,'10-05',null,12),
(6,'11-05',null,13),
(7,'12-05',null,2),
(8,'13-05',null,12),
(9,'14-05',null,3),
(10,'15-05',null,5),
(11,'16-05',null,6),
(12,'17-05',null,7),
(13,'18-05',null,7);
Results:
| id | date | currentPage | readPage |
| --- | ----- | ----------- | -------- |
| 1 | 06-05 | 36 | 36 |
| 2 | 07-05 | 45 | 9 |
| 3 | 08-05 | 57 | 12 |
| 4 | 09-05 | 62 | 5 |
| 5 | 10-05 | 74 | 12 |
| 6 | 11-05 | 87 | 13 |
| 7 | 12-05 | 89 | 2 |
| 8 | 13-05 | 101 | 12 |
| 9 | 14-05 | 104 | 3 |
| 10 | 15-05 | 109 | 5 |
| 11 | 16-05 | 115 | 6 |
| 12 | 17-05 | 122 | 7 |
| 13 | 18-05 | 129 | 7 |
If you're using sqlite 3.25 or newer, something like:
SELECT date, readPages
, sum(readPages) OVER (ORDER BY date) AS total_pages_read
FROM yourTableName
ORDER BY date;
will compute the running total of pages.
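To tie the answers back to the question's own five-day sample, here is a minimal sketch using Python's sqlite3 module. It uses the correlated-subquery UPDATE, so it does not depend on window-function support. The table name book and the column names come from the question; the function name is hypothetical, and the rows are assumed to be ordered by an auto-assigned integer id.

```python
import sqlite3

def running_totals(read_pages):
    """Fill currentPage with the cumulative sum of readPages, ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE book ("
        "id INTEGER PRIMARY KEY, readPages INTEGER, currentPage INTEGER)"
    )
    # id is assigned 1, 2, 3, ... in insertion order.
    conn.executemany(
        "INSERT INTO book (readPages) VALUES (?)", [(n,) for n in read_pages]
    )
    # For each row, sum readPages over all rows up to and including it.
    conn.execute(
        """
        UPDATE book
        SET currentPage = COALESCE(
            (SELECT SUM(b.readPages) FROM book AS b WHERE b.id <= book.id),
            0
        )
        """
    )
    return [row[0] for row in conn.execute("SELECT currentPage FROM book ORDER BY id")]
```

Calling running_totals([19, 10, 43, 20, 35]) yields 19, 29, 72, 92, 127, the currentPage values the question asks for.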

How to get Detailed Explain Plan?

I worked with Management Studio in the past and remember that its explain/query plan was descriptive. It showed:
1) The order in which statements will be fired
2) The number of rows returned by each statement
I am using "explain plan" in Oracle SQL Developer, but I don't see these features. Is there any other good free tool?
Order in which statements will be fired
Adrian Billington has created an "XPlan Utility", to extend the output of DBMS_XPLAN to include the execution order of the steps. The following output shows the difference between the default output and that produced by Adrian's XPlan Utility.
For example,
EXPLAIN PLAN FOR
SELECT *
FROM emp e, dept d
WHERE e.deptno = d.deptno
AND e.ename = 'SMITH';
SET LINESIZE 130
-- Default Output
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 3 (0)| 00:00:53 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 58 | 3 (0)| 00:00:53 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 38 | 2 (0)| 00:00:35 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:18 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("E"."ENAME"='SMITH')
4 - access("E"."DEPTNO"="D"."DEPTNO")
18 rows selected.
SQL>
Let's see the extended plan to see the order of steps. See the ORD column:
-- XPlan Utility output
@xplan.display.sql
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | | 6 | SELECT STATEMENT | | 1 | 58 | 3 (0)| 00:00:53 |
| 1 | 0 | 5 | NESTED LOOPS | | | | | |
| 2 | 1 | 3 | NESTED LOOPS | | 1 | 58 | 3 (0)| 00:00:53 |
|* 3 | 2 | 1 | TABLE ACCESS FULL | EMP | 1 | 38 | 2 (0)| 00:00:35 |
|* 4 | 2 | 2 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
| 5 | 1 | 4 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:18 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("E"."ENAME"='SMITH')
4 - access("E"."DEPTNO"="D"."DEPTNO")
About
------
- XPlan v1.2 by Adrian Billington (http://www.oracle-developer.net)
18 rows selected.
SQL>
Number of rows returned by each statement
In SQL Developer, the explain plan window has the cardinality column which shows the number of rows.
In SQL*Plus, DBMS_XPLAN displays the plan in a readable format. The Rows column shows the estimated number of rows for each step.
See "How to create and display explain plan in SQL*Plus" for a few good examples and usage.

Add one with same foreign key at row sql server

I have a problem in SQL Server.
How can I get a running number per foreign key in a single SELECT from a table?
Example:
I have one table such as
-----------------
| id | pid | desc |
-----------------
| 1 | 1 | a |
| 2 | 1 | b |
| 3 | 1 | c |
| 4 | 2 | d |
| 5 | 2 | e |
| 6 | 2 | f |
| 7 | 2 | g |
| 8 | 3 | h |
| 9 | 3 | i |
| 10 | 1 | j |
| 11 | 1 | k |
-----------------
I want to get result as below
------------------------
| id | pid | desc | rec |
------------------------
| 1 | 1 | a | 1 |
| 2 | 1 | b | 2 |
| 3 | 1 | c | 3 |
| 4 | 2 | d | 1 |
| 5 | 2 | e | 2 |
| 6 | 2 | f | 3 |
| 7 | 2 | g | 4 |
| 8 | 3 | h | 1 |
| 9 | 3 | i | 2 |
| 10 | 1 | j | 4 |
| 11 | 1 | k | 5 |
------------------------
In the tables above, the foreign key column ('pid') has the values 1 to 3 spread across different rows.
I tried to get a running number within each 'pid' value.
I haven't found any way to do this.
Can I do that? Can someone help me? I am still a newbie at SQL Server.
Try this
SELECT
id,
pid,
[desc],
Row_Number() OVER (PARTITION BY pid ORDER BY id) AS rec
FROM <yourtable>
ORDER BY id
You can use a ranking function in SQL Server 2005+ to accomplish that.
Here is your query:
Select Row_Number() over (partition by pid order by id) as rec , * from Table
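The per-pid numbering can be checked on the question's sample rows with a small sketch. It runs the same ROW_NUMBER() expression through Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first release with window functions). The table name items and the function name are hypothetical, and desc is quoted because it is a keyword.

```python
import sqlite3

def number_rows():
    """Return (id, pid, desc, rec) rows, with rec restarting at 1 for each pid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE items (id INTEGER PRIMARY KEY, pid INTEGER, "desc" TEXT)'
    )
    pids = [1, 1, 1, 2, 2, 2, 2, 3, 3, 1, 1]
    rows = [(i, pid, d) for i, (pid, d) in enumerate(zip(pids, "abcdefghijk"), start=1)]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    # PARTITION BY restarts the counter per pid; ORDER BY id fixes the sequence.
    return conn.execute(
        """
        SELECT id, pid, "desc",
               ROW_NUMBER() OVER (PARTITION BY pid ORDER BY id) AS rec
        FROM items
        ORDER BY id
        """
    ).fetchall()
```

Rows 10 and 11 get rec 4 and 5 because they continue the pid = 1 partition, even though they are not adjacent to rows 1 to 3.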

SQL Server. Join two tables in a view, take rows from one and turn into columns [duplicate]

This question already has answers here:
Efficiently convert rows to columns in sql server
(5 answers)
Closed 8 years ago.
I'm pretty new to SQL Server so don't really know what I'm doing with this. I have two tables, which might look like this:
table 1
| ID | customer | Date |
| 1 | company1 | 01/08/2014 |
| 2 | company2 | 10/08/2014 |
| 3 | company3 | 25/08/2014 |
table 2
| ID | Status | Days |
| 1 | New | 6 |
| 1 | In Work | 25 |
| 2 | New | 17 |
| 3 | New | 14 |
| 3 | In Work | 72 |
| 3 | Complete | 25 |
What I need to do is join based on the ID, and create new columns to show how long each ID has been in each status. Every time an order goes to a new status, a new line is added and the number of days is counted as in the 2nd table above. What I need to create from this, should look like this:
| ID | customer | Date | New | In Work | Complete |
| 1 | company1 | 01/08/2014 | 6 | 25 | |
| 2 | company2 | 10/08/2014 | 17 | | |
| 3 | company3 | 25/08/2014 | 14 | 72 | 25 |
So what do I need to do to create this?
Thanks for any help, as I say I'm pretty new to this.
I would suggest that AHiggins' link is a better candidate to mark this as a dupe rather than the one that's actually been selected because his link involves a join.
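WITH [TimeTable] AS (
    -- Join the order header to its status rows
    SELECT
        T1.ID,
        T1.customer, -- needed for the customer column in the expected output
        T1.[Date],
        T2.[Status] AS [Status],
        T2.[Days]
    FROM
        dbo.Table1 T1
        INNER JOIN dbo.Table2 T2
            ON T2.ID = T1.ID
)
SELECT *
FROM
    [TimeTable]
    -- Statuses are listed in the column order of the expected output
    PIVOT (MAX([Days]) FOR [Status] IN ([New], [In Work], [Complete])) TT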
;
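Where PIVOT is not available, the same reshaping can be done with conditional aggregation. The sketch below is a swapped-in alternative rather than the answer's PIVOT query. It runs on the question's sample data through Python's sqlite3 module; the table names Table1 and Table2 follow the answer, and the function name is hypothetical.

```python
import sqlite3

def pivot_status_days():
    """Return one row per order with Days spread into New / In Work / Complete."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, customer TEXT, Date TEXT)")
    conn.execute("CREATE TABLE Table2 (ID INTEGER, Status TEXT, Days INTEGER)")
    conn.executemany(
        "INSERT INTO Table1 VALUES (?, ?, ?)",
        [(1, "company1", "01/08/2014"),
         (2, "company2", "10/08/2014"),
         (3, "company3", "25/08/2014")],
    )
    conn.executemany(
        "INSERT INTO Table2 VALUES (?, ?, ?)",
        [(1, "New", 6), (1, "In Work", 25), (2, "New", 17),
         (3, "New", 14), (3, "In Work", 72), (3, "Complete", 25)],
    )
    # CASE without ELSE yields NULL, so a missing status stays empty (None).
    return conn.execute(
        """
        SELECT t1.ID, t1.customer, t1.Date,
               MAX(CASE WHEN t2.Status = 'New'      THEN t2.Days END) AS "New",
               MAX(CASE WHEN t2.Status = 'In Work'  THEN t2.Days END) AS "In Work",
               MAX(CASE WHEN t2.Status = 'Complete' THEN t2.Days END) AS "Complete"
        FROM Table1 AS t1
        INNER JOIN Table2 AS t2 ON t2.ID = t1.ID
        GROUP BY t1.ID, t1.customer, t1.Date
        ORDER BY t1.ID
        """
    ).fetchall()
```

Like the PIVOT version, this needs the list of statuses to be known in advance; a status that is not named in the query is silently ignored.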
