sql query to delete only one duplicate row - sql-server

I have a table with some duplicate rows in it. I want to delete only one row from each set of duplicates.
For example, if I have 9 duplicate rows, only one row should be deleted, leaving 8 rows.
Example:
date calling called duration timestamp
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
From the above data, only one row should be deleted, leaving 3 rows.
2012-06-19 10:22:45.000 165 218 155 1.9 100
2012-06-19 10:22:45.000 165 218 155 1.9 100
2012-06-19 10:22:45.000 165 218 155 1.9 100
From the above data, only one row should be deleted, leaving 2 rows.
How can I do this?

This solution allows you to delete one row from each set of duplicates (rather than just handling a single block of duplicates at a time):
;WITH x AS
(
SELECT [date], rn = ROW_NUMBER() OVER (PARTITION BY
[date], calling, called, duration, [timestamp]
ORDER BY [date])
FROM dbo.UnspecifiedTableName
)
DELETE x WHERE rn = 2;
As an aside, both [date] and [timestamp] are terrible choices for column names...

For SQL Server 2005+ you can do the following:
;WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY [date], calling, called, duration, [timestamp] ORDER BY (SELECT NULL)) RN -- window ORDER BY does not accept an integer index such as 1
FROM YourTable
)
DELETE FROM CTE
WHERE RN = 2

Do you have a primary key on the table?
What makes a row a duplicate? Same time? same date? all columns being the same?
If you have a primary key, you can use TOP to select a single record and delete just that row:
Delete from [tablename] where id in (select top 1 id from [tablename] where [clause])

If the order of these rows doesn't matter, SQL Server has a DELETE TOP form:
DELETE TOP (numberOfRowsToDelete) FROM db.tablename WHERE {condition for ex id = 5};

Since I don't have the schema, here is a possible solution in steps:
Apply a row number to a select of all columns.
Group by those columns and delete the row with min(rownumber) in each group.
Edit:
The row number is assigned in an inner query, so it increments across all rows. In the outer query I group the inner query's rows and select min(rownumber) for each group. Since each group is made up of duplicated rows, I then remove the row with min(rownumber) from each group.
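The steps above might be sketched as follows in T-SQL. This is only a sketch under assumptions: the table name dbo.CallLog is hypothetical, and the column list is taken from the question. The row number is ordered by the same columns that are used for grouping, so each group of duplicates occupies the same range of row numbers each time the CTE is evaluated.

```sql
-- Hypothetical table name; the columns follow the question's example.
WITH numbered AS
(
    SELECT [date], calling, called, duration, [timestamp],
           -- Ordering by the grouping columns keeps each group's row numbers contiguous.
           rn = ROW_NUMBER() OVER (ORDER BY [date], calling, called, duration, [timestamp])
    FROM dbo.CallLog
)
DELETE FROM numbered
WHERE rn IN
(
    SELECT MIN(rn)
    FROM numbered
    GROUP BY [date], calling, called, duration, [timestamp]
    HAVING COUNT(*) > 1   -- leave rows that have no duplicate untouched
);
```

The HAVING clause is an addition to the steps as described: without it, rows that appear only once would also be deleted, since they are the minimum of their own group.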

Using LIMIT 1 deletes only one row that matches your DELETE query. Note that LIMIT is MySQL syntax and is not available in SQL Server:
DELETE FROM `table_name` WHERE `column_name`='value' LIMIT 1;
BEFORE:
+----+-------------+
| id | column_name |
+----+-------------+
|  1 | value       |
|  2 | value       |
|  3 | value       |
|  4 | value       |
+----+-------------+
AFTER:
+----+-------------+
| id | column_name |
+----+-------------+
|  1 | value       |
|  2 | value       |
|  3 | value       |
+----+-------------+

Related

Grouping rows to minimise deviation

I have a Employee Wages table like this, with their EmpID and their wages.
EmpId | Wages
================
101 | 1280
102 | 1600
103 | 1400
104 | 1401
105 | 1430
106 | 1300
I need to write a stored procedure in SQL Server that groups the employees by their wages, so that people with similar salaries are grouped together and the deviation within each group is as small as possible.
There are no other conditions or rules mentioned.
The output should look like this
EmpId | Wages | Group
=======================
101 | 1280 | 1
106 | 1300 | 1
103 | 1400 | 2
104 | 1401 | 2
105 | 1430 | 2
102 | 1600 | 3
You can use a query like the following:
SELECT EmpId, Wages,
DENSE_RANK() OVER (ORDER BY CAST(Wages - t.min_wage AS INT) / 100) AS grp
FROM mytable
CROSS JOIN (SELECT MIN(Wages) AS min_wage FROM mytable) AS t
The query calculates the distance of each wage from the minimum wage and then uses integer division by 100 in order to place records in slices. So all records that have a deviation that is between 0 - 99 off the minimum wage are placed in the first slice. The second slice contains records off by 100 - 199 from the minimum wage, etc.
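To see the slices concretely, the sketch below runs the same expression over the sample wages from the question and exposes the intermediate slice number. The inline VALUES table and the column name slice are only there for illustration.

```sql
-- Sample data from the question, supplied inline for illustration.
WITH mytable (EmpId, Wages) AS
(
    SELECT EmpId, Wages
    FROM (VALUES (101, 1280), (102, 1600), (103, 1400),
                 (104, 1401), (105, 1430), (106, 1300)) AS v (EmpId, Wages)
)
SELECT EmpId, Wages,
       (Wages - t.min_wage) / 100 AS slice,   -- integer division: 0-99 -> 0, 100-199 -> 1, ...
       DENSE_RANK() OVER (ORDER BY (Wages - t.min_wage) / 100) AS grp
FROM mytable
CROSS JOIN (SELECT MIN(Wages) AS min_wage FROM mytable) AS t
ORDER BY Wages;
```

With a minimum wage of 1280, the wages 1280 and 1300 land in slice 0, the wages 1400 to 1430 in slice 1, and 1600 in slice 3; DENSE_RANK then numbers the occupied slices 1, 2, 3. The slice boundaries are fixed relative to the minimum, so two wages that are close together but straddle a boundary (say 1379 and 1380) would end up in different groups.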
For a deviation of +-30 you can do the following:
DECLARE @Tbl TABLE (EmpId INT, Wages INT)
INSERT INTO @Tbl
VALUES
(99, 99),
(100, 101),
(101, 1280),
(102, 1600),
(103, 1400),
(104, 1401),
(105, 1430),
(106, 1300)
;WITH CTE AS ( SELECT *, ROW_NUMBER() OVER (ORDER BY Wages) AS RowId FROM @Tbl )
SELECT
A.EmpId ,
A.Wages ,
DENSE_RANK() OVER (ORDER BY MIN(B.RowId)) [Group]
FROM
CTE A CROSS JOIN CTE B
WHERE
ABS(B.Wages - A.Wages) BETWEEN 0 AND 30 -- Here +-30
GROUP BY A.EmpId, A.Wages
ORDER BY A.Wages
Result:
EmpId Wages Group
----------- ----------- --------------------
99 99 1
100 101 1
101 1280 2
106 1300 2
103 1400 3
104 1401 3
105 1430 3
102 1600 4

How to add a calculated column of certain rows in SQL

I want to add a calculated (persisted) column that holds the total for all rows in the same group, such as the sales order below. How would you do this in SQL Server?
SalesOrder Amount Total(calculated)
100 10 25
100 15 25
101 20 45
101 25 45
102 30 65
102 35 65
The best mechanism for storing pre-calculated aggregates that are maintained automatically is an indexed view; this is not possible with a persisted computed column. (You could use a scalar UDF in a computed column to calculate the result, but this can't be persisted, and such computed columns are generally bad for performance because they force RBAR (row-by-agonizing-row) evaluation and block parallelism.)
CREATE VIEW dbo.AggregatedSales
WITH SCHEMABINDING
AS
SELECT SalesOrder,
SUM(Amount) AS Total, -- Amount must be NOT NULL (or wrapped in ISNULL) for the view to be indexable
COUNT_BIG(*) AS RowCnt -- required in an indexed view that uses GROUP BY
FROM dbo.YourTable
GROUP BY SalesOrder
GO
CREATE UNIQUE CLUSTERED INDEX UIX ON dbo.AggregatedSales(SalesOrder)
Then the aggregates will be pre calculated and stored in the view. Your queries will need to join on the view. You may need to use the NOEXPAND hint to be sure that the pre calculated aggregates are in fact used and they aren't recalculated at runtime.
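A query that reads the pre-computed totals might look like this. It is a sketch that assumes the dbo.YourTable and dbo.AggregatedSales names used above; NOEXPAND is the table hint that asks SQL Server to read the view's index directly rather than expanding the view definition.

```sql
SELECT t.SalesOrder,
       t.Amount,
       a.Total
FROM dbo.YourTable AS t
JOIN dbo.AggregatedSales AS a WITH (NOEXPAND)   -- read the indexed view as stored
    ON a.SalesOrder = t.SalesOrder;
```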
For SQL Server 2012:
CREATE TABLE #t (saleOrder int , amount int)
INSERT INTO #t VALUES
(100,10)
,(100,15)
,(101,20)
,(101,25)
,(102,30)
,(102,35)
SELECT *
,SUM(amount) OVER (PARTITION BY saleorder) as [total]
FROM #t
Result :
saleOrder | amount | total
==========================
100 | 10 | 25
100 | 15 | 25
101 | 20 | 45
101 | 25 | 45
102 | 30 | 65
102 | 35 | 65

How to remove the duplicate records in select query over clause

I have a Transactions table in SQL Server as follows.
UserId | TransactionDate | Amount
1 | 2015-04-01 | 0
1 | 2015-05-02 | 5000
1 | 2015-09-07 | 1000
1 | 2015-10-01 | -4000
1 | 2015-10-02 | -700
1 | 2015-10-03 | 252
1 | 2015-10-03 | 260
1 | 2015-10-04 | 1545
1 | 2015-10-05 | 1445
1 | 2015-10-06 | -2000
I want to query this table to get the available balance on any particular date, so I used a window function for that.
SELECT TransactionDate,
SUM(Amount) OVER (PARTITION BY UserId ORDER BY TransactionDate ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM Transactions
But because the Transactions table has two entries for 2015-10-03, the result repeats that date. When several rows share the same date, I expect only the last record for that date, with the available balance summed up.
Current output
TransactionDate AvailableBalance
2015-04-01 | 0
2015-05-02 | 5000
2015-09-07 | 6000
2015-10-01 | 2000
2015-10-02 | 1300
2015-10-03 | 1552
2015-10-03 | 1804
2015-10-04 | 3349
2015-10-05 | 4794
2015-10-06 | 2794
Expected: I want to remove below record from the above result set.
2015-10-03 | 1552
You can aggregate with SUM per date before applying the window function:
WITH cte AS
(
SELECT TransactionDate, UserId, SUM(Amount) AS Amount
FROM Transactions
GROUP BY TransactionDate, UserId
)
SELECT TransactionDate,
SUM(Amount) OVER (PARTITION BY UserId ORDER BY TransactionDate ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AvailableBalance
FROM cte
Use RANGE instead of ROWS.
SELECT
TransactionDate,
SUM(Amount) OVER (
PARTITION BY UserId
ORDER BY TransactionDate
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AvailableBalance
FROM Transactions;
This variant produces a different result set from the one originally requested, but it may be useful in some cases. It returns the same number of rows as the Transactions table, so it still returns two rows for 2015-10-03, but AvailableBalance is 1804 on both of them.
I just wanted to highlight that the RANGE option exists. If you really need one row per day, then grouping by day first, as in the answer by @lad2025, is the way to go.

SQL Query to fill missing gaps across time and get last non-null value

I have the following table in my database:
Month|Year | Value
1 |2013 | 100
4 |2013 | 101
8 |2013 | 102
2 |2014 | 103
4 |2014 | 104
How can I fill in "missing" rows from the data, so that if I query from 2013-03 through 2014-03, I would get:
Month|Year | Value
3 |2013 | 100
4 |2013 | 101
5 |2013 | 101
6 |2013 | 101
7 |2013 | 101
8 |2013 | 102
9 |2013 | 102
10 |2013 | 102
11 |2013 | 102
12 |2013 | 102
1 |2014 | 102
2 |2014 | 103
3 |2014 | 103
As you can see I want to repeat the previous Value for a missing row.
I have created a SQL Fiddle of this solution for you to play with.
Essentially, it creates a work table of months and cross joins it with all the years in your data set. This produces a complete list of all months for all years. I then left join the test data from your example (a table named TEST; see the test schema below) back onto this list, which gives a complete list with values for the months that have them. The next issue was using the last available value when the current month doesn't have one. For that, I used a correlated sub-query: tblValues is joined back to itself on the maximum Rank of an earlier row that has a value. This gives a complete result set!
If you want to filter by year/month, you can add a WHERE clause just before the final ORDER BY.
Enjoy!
Test Schema
CREATE TABLE TEST( Month tinyint, Year int, Value int)
INSERT INTO TEST(Month, Year, Value)
VALUES
(1,2013,100),
(4,2013,101),
(8,2013,102),
(2,2014,103),
(4,2014,104)
Query
DECLARE @Months Table(Month tinyint)
Insert into @Months(Month)Values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
With tblValues as (
select Rank() Over (ORDER BY y.Year, m.Month) as [Rank],
m.Month,
y.Year,
t.Value
from @Months m
CROSS JOIN ( Select Distinct Year from Test ) y
LEFT JOIN Test t on t.Month = m.Month and t.Year = y.Year
)
Select t.Month, t.Year, COALESCE(t.Value, t1.Value) as Value
from tblValues t
left join tblValues t1 on t1.Rank = (
Select Max(tmax.Rank)
From tblValues tmax
Where tmax.Rank < t.Rank AND tmax.Value is not null)
Order by t.Year, t.Month
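As a sketch of the filtering step mentioned above, the range 2013-03 through 2014-03 from the question can be expressed by combining year and month into a single number. The filter sits on the outer query only, so the look-back for the last non-null value can still reach rows before the start of the range (January 2013 here). This assumes the TEST table from the test schema; the alias rnk and the Year * 100 + Month encoding are choices made for this illustration.

```sql
DECLARE @Months TABLE ([Month] tinyint);
INSERT INTO @Months ([Month])
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);

WITH tblValues AS
(
    -- One row per month of every year present in TEST, with Value where it exists.
    SELECT RANK() OVER (ORDER BY y.[Year], m.[Month]) AS rnk,
           m.[Month], y.[Year], t.Value
    FROM @Months AS m
    CROSS JOIN (SELECT DISTINCT [Year] FROM TEST) AS y
    LEFT JOIN TEST AS t ON t.[Month] = m.[Month] AND t.[Year] = y.[Year]
)
SELECT t.[Month], t.[Year], COALESCE(t.Value, t1.Value) AS Value
FROM tblValues AS t
LEFT JOIN tblValues AS t1
    ON t1.rnk = (SELECT MAX(tmax.rnk)          -- closest earlier row that has a value
                 FROM tblValues AS tmax
                 WHERE tmax.rnk < t.rnk AND tmax.Value IS NOT NULL)
WHERE t.[Year] * 100 + t.[Month] BETWEEN 201303 AND 201403   -- 2013-03 through 2014-03
ORDER BY t.[Year], t.[Month];
```

Because the list of years comes from the data itself, a year with no rows at all in TEST would not appear in the output; covering such gaps would need a separate list of years.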

Query 1 field using 2 field for where clause

I have a table like this in PostgreSQL:
Column | Type | Modifiers
---------------+-----------------------------+-----------
id | smallint | not null
merchant_id | smallint | not null
batch_no | smallint | not null
I have a query like this:
select merchant_id , max(batch_no) from batch group by merchant_id
It returns values like this:
merchant_id | max
-------------------+------
14 | 593
45 | 1
34 | 3
46 | 1
25 | 326
27 | 61
17 | 4
How can I get the id of each of these rows? What query can I use to get a result that includes the id for the data above?
This query works with any version of PostgreSQL, even before there were window functions (PostgreSQL 8.3 or earlier):
SELECT b.id, b.merchant_id, b.batch_no
FROM batch b
JOIN (
SELECT merchant_id, max(batch_no) AS batch_no
FROM batch
GROUP BY merchant_id
) bmax USING (merchant_id, batch_no)
If batch_no is not unique per merchant_id, you may get multiple rows per merchant_id.
With PostgreSQL 8.4 or later you can use the window function first_value():
SELECT DISTINCT
merchant_id
, first_value(batch_no) OVER w
, first_value(id) OVER w
FROM batch
WINDOW w AS (PARTITION BY merchant_id ORDER BY batch_no DESC, id)
This yields a single row per merchant_id even if batch_no is not unique. In that case the smallest id (for the biggest batch_no per merchant_id) is selected, because I additionally sort the window by id.
I use DISTINCT here, because it is applied after the window function (as opposed to GROUP BY).
