Using EXISTS in SQL to find duplicates, is there a cleaner way? - sql-server

See the script below, which finds duplicates in a SQL Server database. Is there a cleaner way?
select itemnum
from matusetrans a
where exists (select null
from matusetrans b
where a.itemnum = b.itemnum
and a.actualdate = b.actualdate
and a.matusetransid != b.matusetransid
and (a.rotassetnum = b.rotassetnum
or (a.rotassetnum is null and b.rotassetnum is null))
and a.quantity = b.quantity)
group by itemnum

You could try:
SELECT itemnum
FROM matusetrans
GROUP BY [ColumnNames]
HAVING
COUNT(*) > 1
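Applied to the columns that the EXISTS query compares, the GROUP BY idea could look like the sketch below. It creates a hypothetical scratch table, demo_matusetrans, with made-up rows, so run it in a test database. GROUP BY puts NULLs in the same group, which matches the original "rotassetnum is null and b.rotassetnum is null" check; for the other columns it is slightly more permissive than "=", which never matches NULLs. SELECT DISTINCT stands in for the original outer "group by itemnum", because one item can have several duplicate groups.

```sql
-- Hypothetical scratch table mirroring the columns compared in the EXISTS query.
CREATE TABLE demo_matusetrans (
    matusetransid INT PRIMARY KEY,
    itemnum       VARCHAR(20),
    actualdate    DATE,
    rotassetnum   VARCHAR(20),
    quantity      DECIMAL(10, 2)
);

INSERT INTO demo_matusetrans (matusetransid, itemnum, actualdate, rotassetnum, quantity) VALUES
    (1, 'A100', '2015-03-01', NULL,  5),
    (2, 'A100', '2015-03-01', NULL,  5),  -- duplicate of row 1 (NULL rotassetnum groups with NULL)
    (3, 'B200', '2015-03-02', 'R-1', 2),
    (4, 'B200', '2015-03-02', 'R-1', 2),  -- duplicate of row 3
    (5, 'B200', '2015-03-02', 'R-2', 2),  -- different rotassetnum, not a duplicate
    (6, 'C300', '2015-03-03', NULL,  1);  -- unique

-- One group per combination of compared columns; a group with more than one
-- row means at least two transactions share all of those values.
-- DISTINCT reports each item once even if it has several duplicate groups.
SELECT DISTINCT itemnum
FROM demo_matusetrans
GROUP BY itemnum, actualdate, rotassetnum, quantity
HAVING COUNT(*) > 1;
-- Returns A100 and B200
```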

Assuming you want to find duplicate itemnum values in the table, please use the query below:
SELECT itemnum
FROM matusetrans
GROUP BY [ItemNum]
HAVING COUNT(ItemNum) > 1
If you group by every column and use HAVING COUNT(*) > 1, all rows may look distinct when there is a datetime column, such as an order timestamp, whose value generally varies per record.
Thanks,
Sree

Another possibility (but not necessarily "cleaner") might be:
WITH cte AS(
SELECT columns, ROW_NUMBER() OVER (PARTITION BY columns ORDER by columns) AS RowIdx
FROM matusetrans
)
SELECT *
FROM cte
WHERE RowIdx > 1
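A filled-in version of the ROW_NUMBER idea, using a hypothetical scratch table demo_dupes with made-up rows, might look like this. The CTE deliberately has no GROUP BY: grouping by the partition columns would collapse each set of duplicates into one row, so RowIdx would never exceed 1.

```sql
-- Hypothetical scratch table; rows 1/2 and 3/4 are duplicates on the compared columns.
CREATE TABLE demo_dupes (
    matusetransid INT PRIMARY KEY,
    itemnum       VARCHAR(20),
    actualdate    DATE,
    rotassetnum   VARCHAR(20),
    quantity      DECIMAL(10, 2)
);

INSERT INTO demo_dupes (matusetransid, itemnum, actualdate, rotassetnum, quantity) VALUES
    (1, 'A100', '2015-03-01', NULL,  5),
    (2, 'A100', '2015-03-01', NULL,  5),
    (3, 'B200', '2015-03-02', 'R-1', 2),
    (4, 'B200', '2015-03-02', 'R-1', 2),
    (5, 'C300', '2015-03-03', NULL,  1);

-- Number the rows inside each group of identical values (no GROUP BY here).
WITH cte AS (
    SELECT matusetransid, itemnum, actualdate, rotassetnum, quantity,
           ROW_NUMBER() OVER (
               PARTITION BY itemnum, actualdate, rotassetnum, quantity
               ORDER BY matusetransid
           ) AS RowIdx
    FROM demo_dupes
)
-- Every row after the first in its group is an extra copy (here: ids 2 and 4).
SELECT matusetransid, itemnum
FROM cte
WHERE RowIdx > 1;
```

In SQL Server, replacing the final SELECT with DELETE FROM cte WHERE RowIdx > 1 removes the extra copies and keeps the first row of each group; deleting through a CTE is SQL Server syntax and may not work on other databases.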

Related

SQL Server: Count distinct occurrences in one field by value in another

Currently I'm writing two queries to count distinct occurrences of fieldOne for each possible value of fieldTwo. How can I do this in one query? Thanks
select
count(*) from(select distinct(fieldOne) from myTable where fieldTwo= 'valueOne')x
select
count(*) from(select distinct(fieldOne) from myTable where fieldTwo = 'valueTwo') y
Try using a CASE expression:
SELECT COUNT(DISTINCT CASE WHEN FIELDTWO= 'VALUEONE' THEN FIELDONE END) X ,
COUNT(DISTINCT CASE WHEN FIELDTWO= 'VALUETWO' THEN FIELDONE END)Y
FROM MYTABLE
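A self-contained run of the CASE approach, with a hypothetical demo_counts table and made-up values, shows why it works: the CASE yields fieldOne only for the matching fieldTwo and NULL otherwise, and COUNT(DISTINCT ...) skips NULLs.

```sql
-- Hypothetical sample table; fieldOne repeats within each fieldTwo value.
CREATE TABLE demo_counts (fieldOne VARCHAR(10), fieldTwo VARCHAR(10));

INSERT INTO demo_counts (fieldOne, fieldTwo) VALUES
    ('a', 'valueOne'), ('a', 'valueOne'), ('b', 'valueOne'),
    ('a', 'valueTwo'), ('c', 'valueTwo'), ('d', 'valueTwo');

-- Each CASE keeps fieldOne only for one fieldTwo value; other rows become NULL
-- and are ignored by COUNT(DISTINCT ...), so both counts come from one scan.
SELECT COUNT(DISTINCT CASE WHEN fieldTwo = 'valueOne' THEN fieldOne END) AS X,
       COUNT(DISTINCT CASE WHEN fieldTwo = 'valueTwo' THEN fieldOne END) AS Y
FROM demo_counts;
-- X = 2 (a, b), Y = 3 (a, c, d)
```

With ANSI_WARNINGS on, SQL Server may also print "Warning: Null value is eliminated by an aggregate or other SET operation." for this query; that is expected, since the NULLs are discarded on purpose.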
This can be done with cross apply to remove the need to know the possible values in fieldTwo:
select twos.FieldTwo, count(1)
from (select distinct fieldTwo from MyTable) twos
cross apply (select distinct t.fieldOne
from MyTable t
where t.fieldTwo = twos.FieldTwo) ones
group by twos.FieldTwo
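If one row per fieldTwo value is enough, a plain GROUP BY with COUNT(DISTINCT ...) is a simpler and more portable alternative to CROSS APPLY. This is a different technique from the answer above, sketched with a hypothetical demo_pairs table:

```sql
-- Hypothetical sample data; a third fieldTwo value shows no hard-coded list is needed.
CREATE TABLE demo_pairs (fieldOne VARCHAR(10), fieldTwo VARCHAR(10));

INSERT INTO demo_pairs (fieldOne, fieldTwo) VALUES
    ('a', 'valueOne'), ('a', 'valueOne'), ('b', 'valueOne'),
    ('a', 'valueTwo'), ('c', 'valueTwo'), ('d', 'valueTwo'),
    ('e', 'valueThree');

-- One output row per fieldTwo value with its number of distinct fieldOne values.
SELECT fieldTwo, COUNT(DISTINCT fieldOne) AS distinctOnes
FROM demo_pairs
GROUP BY fieldTwo
ORDER BY fieldTwo;
-- valueOne -> 2, valueThree -> 1, valueTwo -> 3
```

The two queries differ only around NULLs: COUNT(DISTINCT fieldOne) does not count a NULL fieldOne, and GROUP BY returns a row for a NULL fieldTwo, whereas the CROSS APPLY version counts a NULL fieldOne as one value and drops a NULL fieldTwo, because t.fieldTwo = twos.FieldTwo is never true for NULL.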

Deleting duplicates in a time series

I have a large set of measurements, taken every 1 millisecond, stored in a SQL Server 2012 table. Whenever 3 or more consecutive rows have the same value, I would like to delete the middle duplicates. The highlighted values in this image of sample data are the ones I want to delete. Is there a way to do this with a SQL query?
You can do this using a CTE and ROW_NUMBER:
WITH CteGroup AS(
SELECT *,
grp = ROW_NUMBER() OVER(ORDER BY MS) - ROW_NUMBER() OVER(PARTITION BY Value ORDER BY MS)
FROM YourTable
),
CteFinal AS(
SELECT *,
RN_FIRST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS),
RN_LAST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS DESC)
FROM CteGroup
)
DELETE
FROM CteFinal
WHERE
RN_FIRST > 1
AND RN_LAST > 1
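The grp column is the usual gaps-and-islands trick: within a run of consecutive equal values, the overall row number and the per-value row number both grow by one per row, so their difference stays constant for the whole run. The sketch below applies the same logic without deleting through a CTE (SQL Server syntax) by collecting the ms keys to delete in a subquery. It assumes ms is unique; demo_series and its values are hypothetical.

```sql
-- Hypothetical one-row-per-millisecond series.
CREATE TABLE demo_series (ms INT PRIMARY KEY, value INT);

INSERT INTO demo_series (ms, value) VALUES
    (1, 5), (2, 5), (3, 5), (4, 5),  -- run of four: ms 2 and 3 are middle rows
    (5, 7), (6, 7),                  -- run of two: nothing in the middle
    (7, 5),                          -- single row
    (8, 8), (9, 8), (10, 8);         -- run of three: ms 9 is the middle row

-- grp = overall position minus position among rows with the same value;
-- it is constant within each run. Partitioning by (grp, value) isolates runs,
-- and rows that are neither first nor last in their run are deleted.
DELETE FROM demo_series
WHERE ms IN (
    SELECT ms
    FROM (
        SELECT ms,
               ROW_NUMBER() OVER (PARTITION BY grp, value ORDER BY ms)      AS rn_first,
               ROW_NUMBER() OVER (PARTITION BY grp, value ORDER BY ms DESC) AS rn_last
        FROM (
            SELECT ms, value,
                   ROW_NUMBER() OVER (ORDER BY ms)
                 - ROW_NUMBER() OVER (PARTITION BY value ORDER BY ms) AS grp
            FROM demo_series
        ) g
    ) f
    WHERE rn_first > 1 AND rn_last > 1
);
-- Remaining ms: 1, 4, 5, 6, 7, 8, 10
```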
I'm sure there must be a more efficient way to do this, but you could join the table to itself twice to find the previous and next value in the list, and then delete all of the entries where all three values are the same.
DELETE FROM tbl
WHERE ms IN
(
SELECT T.ms
FROM tbl T
INNER JOIN tbl T1 ON T.ms = T1.ms + 1
INNER JOIN tbl T2 ON T.ms = T2.ms - 1
WHERE T.value = T1.value AND T.value = T2.value
)
If the table is really big, I can see this blowing up tempdb, though.
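Since the question targets SQL Server 2012, LAG and LEAD are also available. They read the previous and next rows in ms order directly, so, unlike the ms + 1 and ms - 1 joins, they need no self-join and do not require the timestamps to be exactly 1 ms apart. This is a swapped-in technique rather than part of the answer above, sketched with a hypothetical demo_ts table:

```sql
-- Hypothetical series; note the gap between ms 4 and ms 6.
CREATE TABLE demo_ts (ms INT PRIMARY KEY, value INT);

INSERT INTO demo_ts (ms, value) VALUES
    (1, 3), (2, 3), (3, 3), (4, 3),  -- ms 2, 3 and 4 sit between equal neighbours
    (6, 3),                          -- last row of the run of 3s
    (7, 9), (8, 4), (9, 4);          -- no run of three here

DELETE FROM demo_ts
WHERE ms IN (
    SELECT ms
    FROM (
        SELECT ms, value,
               LAG(value)  OVER (ORDER BY ms) AS prev_value,  -- NULL for the first row
               LEAD(value) OVER (ORDER BY ms) AS next_value   -- NULL for the last row
        FROM demo_ts
    ) n
    -- A comparison with NULL is never true, so the first and last rows are kept.
    WHERE value = prev_value AND value = next_value
);
-- Remaining ms: 1, 6, 7, 8, 9 (ms 4 is removed even though ms 5 does not exist)
```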
Yes, there is:
select value
from YourTable
group by value

SQL Server: How to select the first 10 records from a table without using the TOP keyword

There is a table which contains 50 records. I want to select the first 10 records without using the TOP keyword.
In SQL Server 2012+ you can use OFFSET ... FETCH
SELECT *
FROM YourTable
ORDER BY YourColumn ASC
OFFSET 0 ROWS
FETCH FIRST 10 ROWS ONLY
You can use ROW_NUMBER and Common Table Expression to query any range of data.
USE AdventureWorks2012;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNumber
FROM Sales.SalesOrderHeader
)
SELECT SalesOrderID, OrderDate, RowNumber
FROM OrderedOrders
WHERE RowNumber <= 10 -- other conditions: RowNumber between 50 and 60
Refer to the ROW_NUMBER documentation for details.
Although it's probably the same thing internally, you can use
set rowcount 10
and then run the query. Run set rowcount 0 afterwards to remove the limit.
I guess you can try something like this (it assumes Id is unique):
SELECT t.Id, t.Name FROM YourTable t
WHERE 10 > (SELECT count(*) FROM YourTable t2 WHERE t.id > t2.id)
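How the correlated count works: for each row, the subquery counts the rows with a smaller Id, and only rows with fewer than ten such predecessors pass. A sketch with a hypothetical twelve-row demo_people table; it assumes Id is unique (duplicate Ids could let more than ten rows through) and runs one count per row, so it scales poorly on large tables.

```sql
-- Hypothetical table; Ids are deliberately not contiguous.
CREATE TABLE demo_people (Id INT PRIMARY KEY, Name VARCHAR(20));

INSERT INTO demo_people (Id, Name) VALUES
    (10, 'n01'), (20, 'n02'), (30, 'n03'), (40, 'n04'), (50, 'n05'), (60, 'n06'),
    (70, 'n07'), (80, 'n08'), (90, 'n09'), (100, 'n10'), (110, 'n11'), (120, 'n12');

-- Keep a row when fewer than 10 rows have a smaller Id, i.e. it is among the
-- 10 smallest Ids.
SELECT t.Id, t.Name
FROM demo_people t
WHERE 10 > (SELECT COUNT(*) FROM demo_people t2 WHERE t.Id > t2.Id)
ORDER BY t.Id;
-- Returns Ids 10 through 100
```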
You can use ROW_NUMBER. Let's say your table contains columns ID and Name. In that case you can use a query like this:
SELECT t.Id, t.Name
FROM (
SELECT ID, Name,
ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber
FROM TableName
) t
WHERE RowNumber <= 10

SQL Server show only rows (all columns) with a distinct column name

I have a query which produces several results. I have concatenated several columns into 1 as an ID. I only want to show the rows where the ID is unique.
The below image is an example of my table:
As you can see the ID is repeated a few times. How can I construct a query to show only the 3 unique rows?
Nesting this query and using distinct(RowID) shows the three rows, but then I cannot show the rest of the columns.
Any ideas welcome. Thank you!
Use distinct in the select for all columns
Query:
select distinct RowID, OrderNum, cDescription, Thickness, UllTimberThickness, Width, UllTimberWidth, Length
from YourTable
Use GROUP BY:
SELECT RowID, OrderNum, cDescription, Thickness, etcetera ...
FROM dbo.TableName
GROUP BY RowID, OrderNum, cDescription, Thickness, etcetera ...
( etcetera is a placeholder for the rest of your columns )
Try this:
SELECT *
FROM mytable t1
WHERE
(SELECT COUNT(*) FROM mytable t2
where t2.id = t1.id) = 1
This way, you get all rows whose id is unique. I wrote the query like this because I don't know whether rows with the same ID also match in their other fields, since you built that field from several pieces of information. If all the columns are the same for equal IDs, you can use DISTINCT, as Vasanth Sundaralingam advised in his post.
I'm a little confused by what "get the three unique rows" means.
If you mean rows that are unique, use count(*) as a window function:
select *
from (select t.*,
count(*) over (partition by id) as cnt
from t
) t
where cnt = 1;
If you mean one example of each row, use row_number():
select *
from (select t.*,
row_number() over (partition by id order by (select NULL)) as seqnum
from t
) t
where seqnum = 1;
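To see the count(*) window-function version in action, here is a sketch with a hypothetical demo_rows table standing in for the concatenated-ID result. Because COUNT(*) OVER does not collapse rows, every column stays available in the outer query.

```sql
-- Hypothetical stand-in for the result set with a concatenated id.
CREATE TABLE demo_rows (id VARCHAR(20), OrderNum INT, Thickness INT);

INSERT INTO demo_rows (id, OrderNum, Thickness) VALUES
    ('A-1', 100, 25), ('A-1', 100, 25),  -- repeated id
    ('B-2', 101, 30),                    -- unique
    ('C-3', 102, 40), ('C-3', 103, 40),  -- repeated id, other columns differ
    ('D-4', 104, 50);                    -- unique

-- cnt is the number of rows sharing each row's id, attached to every row.
SELECT id, OrderNum, Thickness
FROM (
    SELECT r.*, COUNT(*) OVER (PARTITION BY id) AS cnt
    FROM demo_rows r
) t
WHERE cnt = 1;
-- Returns the B-2 and D-4 rows
```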

select top 1 with a group by

I have two columns:
namecode name
050125 chris
050125 tof
050125 tof
050130 chris
050131 tof
I want to group by namecode, and return only the name with the most number of occurrences. In this instance, the result would be
050125 tof
050130 chris
050131 tof
This is with SQL Server 2000
I usually use ROW_NUMBER() to achieve this. Not sure how it performs against various data sets, but we haven't had any performance issues as a result of using ROW_NUMBER.
The PARTITION BY clause specifies which value to "group" the row numbers by, and the ORDER BY clause specifies how the records within each "group" should be sorted. So partition the data set by NameCode, and get all records with a Row Number of 1 (that is, the first record in each partition, ordered by the ORDER BY clause).
SELECT
i.NameCode,
i.Name
FROM
(
SELECT
RowNumber = ROW_NUMBER() OVER (PARTITION BY t.NameCode ORDER BY COUNT(*) DESC, t.Name),
t.NameCode,
t.Name
FROM
MyTable t
GROUP BY
t.NameCode,
t.Name
) i
WHERE
i.RowNumber = 1;
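A sketch of the count-then-rank approach on the sample data from the question, using a hypothetical demo_names table: each (namecode, name) pair is counted first, then ROW_NUMBER ranks the names within a namecode by that count. Note that ROW_NUMBER was introduced in SQL Server 2005, so this will not run on the SQL Server 2000 mentioned in the question; the answers below that avoid window functions are options there.

```sql
-- Hypothetical table holding the sample rows from the question.
CREATE TABLE demo_names (namecode VARCHAR(6), name VARCHAR(20));

INSERT INTO demo_names (namecode, name) VALUES
    ('050125', 'chris'), ('050125', 'tof'), ('050125', 'tof'),
    ('050130', 'chris'), ('050131', 'tof');

-- Count each (namecode, name) pair, then rank names inside each namecode by
-- that count; ties are broken alphabetically by name.
SELECT namecode, name
FROM (
    SELECT namecode, name,
           ROW_NUMBER() OVER (PARTITION BY namecode
                              ORDER BY COUNT(*) DESC, name) AS rn
    FROM demo_names
    GROUP BY namecode, name
) ranked
WHERE rn = 1
ORDER BY namecode;
-- 050125 tof, 050130 chris, 050131 tof
```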
select distinct o.namecode
, (
select top 1 i.name
from myTable i
where i.namecode = o.namecode
group by i.name
order by count(*) desc
) as name
from myTable o
SELECT max_table.namecode, count_table2.name
FROM
(SELECT namecode, MAX(count_name) AS max_count
FROM
(SELECT namecode, name, COUNT(name) AS count_name
FROM mytable
GROUP BY namecode, name) AS count_table1
GROUP BY namecode) AS max_table
INNER JOIN
(SELECT namecode, COUNT(name) AS count_name, name
FROM mytable
GROUP BY namecode, name) count_table2
ON max_table.namecode = count_table2.namecode AND
count_table2.count_name = max_table.max_count
I did not try but this should work,
select top 1 t2.* from (
select namecode, count(*) count from temp
group by namecode) t1 join temp t2 on t1.namecode = t2.namecode
order by t1.count desc
Here are two examples you could use. The temp table version was more efficient than the view, but that was measured on a small data sample, so you would want to check your own statistics.
--Creating A View
GO
CREATE VIEW StateStoreSales AS
SELECT t.state,t.stor_id,t.stor_name,SUM(s.qty) 'TotalSales'
,ROW_NUMBER() OVER (PARTITION BY t.state ORDER BY SUM(s.qty) DESC) AS 'Rank'
FROM [dbo].[sales] s
JOIN [dbo].[stores] t ON (s.stor_id = t.stor_id)
GROUP BY t.state,t.stor_id,t.stor_name
GO
SELECT * FROM StateStoreSales
WHERE Rank <= 1
ORDER BY TotalSales Desc
DROP VIEW StateStoreSales
---Using a Temp Table
SELECT t.state,t.stor_id,t.stor_name,SUM(s.qty) 'TotalSales'
,ROW_NUMBER() OVER (PARTITION BY t.state ORDER BY SUM(s.qty) DESC) AS 'Rank' INTO #TEMP
FROM [dbo].[sales] s
JOIN [dbo].[stores] t ON (s.stor_id = t.stor_id)
GROUP BY t.state,t.stor_id,t.stor_name
SELECT * FROM #TEMP
WHERE Rank <= 1
ORDER BY TotalSales Desc
DROP TABLE #TEMP
