delete duplicate tuples except for one specific tuple - database

I want to delete the duplicate tuples (same from and same to values) from a table, but I need to keep the tuple with the minimum object_id (object_id is the primary key).
So here are columns:
from | to | time | distance | object_id
I can see the correct number of tuples that will be deleted by executing
select [from],[to],count(*)
FROM table
where [object_id] NOT IN(
SELECT min([object_id])
FROM table
group by [from],[to]
having count(*) > 1)
group by [from],[to]
having count(*) > 1
but first I want to see the object_id values that are counted by the SQL above.

You could try with this (untested)...
;WITH temp AS (
SELECT [from_id], [to_id], [object_id] = min([object_id])
FROM table
group by [from_id],[to_id]
having count(*) > 1)
SELECT
t2.[from_id],
t2.[to_id],
t.[object_id]
FROM
table t
join temp t2
on t2.[from_id] = t.[from_id]
AND t2.[to_id] = t.[to_id]
AND t2.[object_id] != t.[object_id]
EDIT:
The CTE temp yields every duplicated from/to pair together with its minimum object_id, which is the row you would like to keep.
SELECT [from_id], [to_id], [object_id] = min([object_id])
FROM table
group by [from_id],[to_id]
having count(*) > 1
The rows you would like to remove are those with the same from/to pair but a different object_id. The last select should output exactly those records.
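Once the listed object_id values look right, the delete itself can be sketched as below. This is a minimal, untested sketch: the table name dbo.Routes is hypothetical, and the column names follow the question ([from], [to], object_id) rather than the from_id/to_id used in the answer.

```sql
-- dbo.Routes is a hypothetical table name; substitute your own.
-- Rows are numbered within each ([from], [to]) pair, lowest object_id first.
WITH ranked AS (
    SELECT object_id,
           ROW_NUMBER() OVER (PARTITION BY [from], [to]
                              ORDER BY object_id) AS rn
    FROM dbo.Routes
)
-- rn = 1 is the row with the minimum object_id in its pair, so it is kept.
DELETE FROM ranked
WHERE rn > 1;
```

Replacing the DELETE with SELECT object_id FROM ranked WHERE rn > 1 previews the same rows without changing anything, and running the delete inside a transaction makes it easy to roll back.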

Related

Total Count of column value in Sql Server

I have two tables, TableA and TableB, each with a "filename" column that holds the same values in both tables.
Only the number of occurrences of each value is different.
e.g.
TableA
id | filename_TableA
01 | file1
02 | file1
03 | file2
04 | file2
05 | file3
06 | file4
TableB
id | filename_TableB
01 | file1
02 | file1
03 | file1
04 | file2
05 | file2
06 | file3
07 | file3
08 | file4
09 | file4
I need to generate a SQL query that shows each distinct filename with its
count, plus the total count of distinct filenames,
like this:
Using select count(distinct filename_TableA) as totalCount from TableA
gives the total count of filenames, but I am not able to write the SQL query that produces the result above.
Tried for single table:
select
filename_TableA,
count(filename_TableA)as filecount_TableA,
totalCount = (
select count(distinct filename_TableA) from TableA
)
from TableA
group by filename_TableA
Always try to break your problem into smaller parts!
Your question consists of two parts:
Get distinct files and counts from tableA
Get distinct files and counts from tableB
We write queries:
1.
SELECT filename_TableA, COUNT( * ) AS filecount_TableA
FROM TableA
GROUP BY filename_TableA
2.
SELECT filename_TableB, COUNT( * ) AS filecount_TableB
FROM TableB
GROUP BY filename_TableB
Check that the results of each individual query are correct.
Then we combine the queries:
SELECT filename_TableA, filename_TableB, filecount_TableA, filecount_TableB,
ISNULL( filecount_TableA, 0 ) + ISNULL( filecount_TableB, 0 ) AS totalCount,
COUNT(*) OVER() AS UniqueFileCount
FROM
( SELECT filename_TableA, COUNT( * ) AS filecount_TableA
FROM TableA
GROUP BY filename_TableA ) AS A
FULL OUTER JOIN
( SELECT filename_TableB, COUNT( * ) AS filecount_TableB
FROM TableB
GROUP BY filename_TableB ) AS B
ON A.filename_TableA = B.filename_TableB
Note: I have used FULL OUTER JOIN to cover scenarios where a file name appears in one table but not the other.
If you do not have such a scenario, i.e. each file name appears at least once in both tables, you can use INNER JOIN instead, which is likely to be faster.
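With FULL OUTER JOIN, a file that exists in only one table shows NULL in the other table's filename column. If a single filename column is preferred, one option is to wrap the two columns in COALESCE. A minimal sketch, reusing the table and column names from the question:

```sql
-- One filename column regardless of which table the file appears in.
SELECT COALESCE(A.filename_TableA, B.filename_TableB) AS filename,
       ISNULL(A.filecount_TableA, 0) AS filecount_TableA,
       ISNULL(B.filecount_TableB, 0) AS filecount_TableB,
       ISNULL(A.filecount_TableA, 0) + ISNULL(B.filecount_TableB, 0) AS totalCount,
       COUNT(*) OVER () AS UniqueFileCount   -- number of distinct file names overall
FROM ( SELECT filename_TableA, COUNT(*) AS filecount_TableA
       FROM TableA
       GROUP BY filename_TableA ) AS A
FULL OUTER JOIN
     ( SELECT filename_TableB, COUNT(*) AS filecount_TableB
       FROM TableB
       GROUP BY filename_TableB ) AS B
  ON A.filename_TableA = B.filename_TableB
ORDER BY filename;
```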

TSQL : Find PAIR Sequence in a table

I have the following table in T-SQL (there are other columns too, but no identity column or primary key column):
Oid Cid
1 a
1 b
2 f
3 c
4 f
5 a
5 b
6 f
6 g
7 f
In the example above, I would like to highlight that the following Oids are duplicates when the Cid column values are looked at as "PAIRS":
Oid:
1 (1 matches Oid: 5)
2 (2 matches Oid: 4 and 7)
Please note that the match for Oid 2 does not include Oid 6, since the pair for Oid 6 also contains the letter 'g'.
Is it possible to write a query, without using a WHILE loop, that highlights the Oids as above, along with how many other matches exist in the database?
I am trying to find patterns within the dataset relating to these two columns. Thank you in advance.
Here is a worked example - see comments for explanation:
--First set up your data in a temp table
create table #oidcid (Oid int, Cid char(1));
insert into #oidcid values
(1,'a'),
(1,'b'),
(2,'f'),
(3,'c'),
(4,'f'),
(5,'a'),
(5,'b'),
(6,'f'),
(6,'g'),
(7,'f');
--This cte gets a table with all of the cids in order, for each oid
with cte as (
select distinct Oid, (select Cid + ',' from #oidcid i2
where i2.Oid = i.Oid order by Cid
for xml path('')) Cids
from #oidcid i
)
select Oid, cte.Cids
from cte
inner join (
-- Here we get just the lists of cids that appear more than once
select Cids, Count(Oid) as OidCount
from cte group by Cids
having Count(Oid) > 1 ) as gcte on cte.Cids = gcte.Cids
-- And when we list them, we are showing the oids with duplicate cids next to each other
Order by cte.Cids
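The question also asks how many other matches each Oid has. One way to sketch that is to add a windowed count over the concatenated Cid list. This assumes the #oidcid temp table from the example above already exists and is filled; the names counted and OtherMatches are introduced here for illustration only.

```sql
WITH cte AS (
    -- One row per Oid with its ordered, comma-terminated list of Cids
    SELECT DISTINCT i.Oid,
           (SELECT i2.Cid + ','
            FROM #oidcid AS i2
            WHERE i2.Oid = i.Oid
            ORDER BY i2.Cid
            FOR XML PATH('')) AS Cids
    FROM #oidcid AS i
),
counted AS (
    -- Every Oid sharing a Cids list counts itself too, hence the - 1
    SELECT Oid, Cids,
           COUNT(*) OVER (PARTITION BY Cids) - 1 AS OtherMatches
    FROM cte
)
SELECT Oid, Cids, OtherMatches
FROM counted
WHERE OtherMatches > 0
ORDER BY Cids, Oid;
```

For the sample data this should list Oids 1 and 5 with one other match each, and Oids 2, 4 and 7 with two other matches each; Oid 6 is left out because its list is 'f,g,'.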
select o1.Cid, o1.Oid, o2.Oid
, count(*) over (partition by o1.Cid) + 1 as [cnt]
from table o1
join table o2
on o1.Cid = o2.Cid
and o1.Oid < o2.Oid
order by o1.Cid, o1.Oid, o2.Oid
Maybe like this then:
WITH CTE AS
(
SELECT Cid, oid
,ROW_NUMBER() OVER (PARTITION BY cid ORDER BY cid) AS RN
,SUM(1) OVER (PARTITION BY oid) AS maxRow2
,SUM(1) OVER (PARTITION BY cid) AS maxRow
FROM oid
)
SELECT * FROM CTE WHERE maxRow != 1 AND maxRow2 = 1
ORDER BY oid

combining groups in sql containing substrings

I apologize in advance for not explaining this very well.
I have a sql database with some data like this:
column1 | groups
3323052 | 3323052,3324794,3324795
3324794 | 3323052,3324794
3324794 | 3324794
3324794 | 3324794,3763369
3353586 | 3353586
3763369 | 3324794,3763369
I want to combine groups so that if a number is in two groups, the groups will combine and the number will only show up once in the list.
For example, the final result would look like this:
groups
3323052,3324794,3324795,3763369
3353586
I have been googling around without much luck. Any help is greatly appreciated.
Thanks.
So you want to recursively replace each item in the groups column with the values found in other rows that have that item in column1? You can do it at least this way:
1. Split the data into rows, so there is just a column1 -> group item relation
2. Fetch the values that can be used as root nodes; my approach takes the smallest value because your data contains a cycle (3323052 -> 3324794 -> 3323052)
3. Recursively fetch all the values that can be found in the hierarchy under these root nodes
4. Put it back together into the original format
This example uses DelimitedSplit8k by Jeff Moden:
-- Step 1:
select distinct
d.column1,
convert(int, s.Item) as item
into #tmp
from
data d
cross apply DelimitedSplit8k(d.groups, ',') s
-- Step 2:
select distinct
column1
into #root
from #tmp t1
where not exists
(select 1 from #tmp t2 where t2.item = t1.column1 and t2.item > t2.column1)
-- Step 3:
;with CTE (root, parent, child) as (
select r.column1, r.column1, r.column1 from #root r
union all
select C.root, t.column1, t.item
from CTE C join #tmp t on t.column1 = C.child and t.item > C.parent
)
select distinct * into #results from CTE
-- Step 4:
SELECT r.column1, STUFF((SELECT distinct ', ' + convert(varchar(50), r2.child)
FROM #results r2
WHERE r2.root = r.column1
ORDER BY ', ' + convert(varchar(50), r2.child)
FOR XML PATH(N'')), 1, 2, '') as groups
FROM #root r
GROUP BY column1
ORDER BY column1
Result:
column1 groups
3323052 3323052, 3324794, 3324795, 3763369
3353586 3353586
I used temp tables to be sure each of the steps is executed just once, but I believe it would be possible to do the whole thing with a single select, using CTEs instead of temp tables.
You can test this in SQL Fiddle
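If installing a user-defined splitter is not an option, Step 1 can probably be written with the built-in STRING_SPLIT function instead. This is a sketch under assumptions: it needs SQL Server 2016 or later with database compatibility level 130 or higher, and it assumes the source table is named data as in the answer above.

```sql
-- Step 1 with the built-in STRING_SPLIT, which returns one row per
-- list element in a column named value.
SELECT DISTINCT
       d.column1,
       CONVERT(int, s.value) AS item
INTO #tmp
FROM data AS d
CROSS APPLY STRING_SPLIT(d.groups, ',') AS s;
```

The remaining steps only read #tmp, so they should not need to change.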

Need to return all columns from a table when using GROUP BY

I have a table let's say it has four columns
Id, Name, Cell_no, Cat_id.
I need to return all columns of the rows whose count of Cat_id is greater than 1.
The grouping should be done on Cell_no and Name.
What I have done so far:
select Cell_no, COUNT(Cat_id)
from TableName
group by Cell_no
having COUNT(Cat_id) > 1
But what I need is something like this:
select *
from TableName
group by Cell_no
having COUNT(Cat_id) > 1
Pratik's answer is good, but rather than using the IN operator (which in SQL Server only works with a single column), you will need to JOIN back to the result set like this:
SELECT t.*
FROM tableName t
INNER JOIN
(SELECT Cell_no, Name
FROM TableName
GROUP BY Cell_no , Name
HAVING COUNT(Cat_id) > 1) filter
ON t.Cell_no = filter.Cell_no AND t.Name = filter.Name
You just need to modify your query as below:
select * from tableName where (Cell_no, Name) in (
select Cell_no, Name from TableName
Group by Cell_no , Name
having COUNT(Cat_id) > 1
)
As stated in the question, you want to group by Cell_no and Name. If so, you need to change both the GROUP BY columns and the select list of your query, as shown above.
This version requires only one pass over the data:
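SELECT *
FROM (SELECT a.*
-- partition by both grouping columns named in the question
,COUNT(cat_id) OVER (PARTITION BY cell_no, name)
AS count_cat_id_not_null
FROM TableName a) counted -- SQL Server requires an alias on the derived table
WHERE count_cat_id_not_null > 1;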

SQL Server: Joining in rows via. comma separated field

I'm trying to extract some data from a third party system which uses an SQL Server database. The DB structure looks something like this:
Order
OrderID OrderNumber
1 OX101
2 OX102
OrderItem
OrderItemID OrderID OptionCodes
1 1 12,14,15
2 1 14
3 2 15
Option
OptionID Description
12 Batteries
14 Gift wrap
15 Case
[etc.]
What I want is one row per order item that includes a concatenated field with each option description. So something like this:
OrderItemID OrderNumber Options
1 OX101 Batteries\nGift Wrap\nCase
2 OX101 Gift Wrap
3 OX102 Case
Of course this is complicated by the fact that the options are a comma separated string field instead of a proper lookup table. So I need to split this up by comma in order to join in the options table, and then concat the result back into one field.
At first I tried creating a function which splits out the option data by comma and returns this as a table. Although I was able to join the result of this function with the options table, I wasn't able to pass the OptionCodes column to the function in the join, as it only seemed to work with declared variables or hard-coded values.
Can someone point me in the right direction?
I would use a splitting function (here's an example) to get individual values and keep them in a CTE. Then you can join the CTE to your table called "Option".
SELECT * INTO #Order
FROM (
SELECT 1 OrderID, 'OX101' OrderNumber UNION SELECT 2, 'OX102'
) X;
SELECT * INTO #OrderItem
FROM (
SELECT 1 OrderItemID, 1 OrderID, '12,14,15' OptionCodes
UNION
SELECT 2, 1, '14'
UNION
SELECT 3, 2, '15'
) X;
SELECT * INTO #Option
FROM (
SELECT 12 OptionID, 'Batteries' Description
UNION
SELECT 14, 'Gift Wrap'
UNION
SELECT 15, 'Case'
) X;
WITH N AS (
SELECT I.OrderID, I.OrderItemID, X.items OptionCode
FROM #OrderItem I CROSS APPLY dbo.Split(OptionCodes, ',') X
)
SELECT Q.OrderItemID, Q.OrderNumber,
CONVERT(NVarChar(1000), (
SELECT T.Description + ','
FROM N INNER JOIN #Option T ON N.OptionCode = T.OptionID
WHERE N.OrderItemID = Q.OrderItemID
FOR XML PATH(''))
) Options
FROM (
SELECT N.OrderItemID, O.OrderNumber
FROM #Order O INNER JOIN N ON O.OrderID = N.OrderID
GROUP BY N.OrderItemID, O.OrderNumber) Q
DROP TABLE #Order;
DROP TABLE #OrderItem;
DROP TABLE #Option;
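On newer versions the same result can probably be reached without a custom dbo.Split function or FOR XML PATH. The following sketch assumes SQL Server 2017 or later (for STRING_AGG; STRING_SPLIT needs 2016 or later) and the three temp tables from the example above, so it has to run before the DROP TABLE statements.

```sql
-- Split each OptionCodes list, look up the descriptions, and join them
-- back together with a line feed (CHAR(10)) as the separator.
SELECT i.OrderItemID,
       o.OrderNumber,
       STRING_AGG(opt.Description, CHAR(10))
           WITHIN GROUP (ORDER BY opt.OptionID) AS Options
FROM #OrderItem AS i
JOIN #Order AS o
  ON o.OrderID = i.OrderID
CROSS APPLY STRING_SPLIT(i.OptionCodes, ',') AS s
JOIN #Option AS opt
  ON opt.OptionID = CONVERT(int, s.value)
GROUP BY i.OrderItemID, o.OrderNumber
ORDER BY i.OrderItemID;
```

CROSS APPLY is what allows a column such as OptionCodes to be passed to a table-valued function row by row, which is the part the question could not get working with a plain join.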

Resources