SQL Server 2012 - Looking for duplicates with differences - sql-server

In SQL Server 2012, I have a table like this:
Id | AccountID | Accession | Status
----------------------------------------
1 | 1234567 | ABCD | F
2 | 1234567 | ABCD | F
3 | 2345678 | BCDE | F
4 | 8765432 | BCDE | F
5 | 3456789 | CDEF | F
6 | 9876543 | CDEF | A
I need to find rows that have the same Accession and a Status of "F", but a different AccountID.
I need a query that would return:
Id | AccountID | Accession | Status
----------------------------------------
3 | 2345678 | BCDE | F
4 | 8765432 | BCDE | F
1 and 2 wouldn't be returned because they have the same AccountID. 5 and 6 wouldn't be returned because the status on 6 is "A" and not "F".

You could do something like this.
;WITH NonDupAccountIDs AS
(
-- one row per AccountID/Accession pair with Status 'F'
SELECT AccountID,Accession, Status
FROM MyTable
WHERE Status = 'F'
GROUP BY AccountID,Accession, Status
)
,DupAccessions AS
(
-- accessions shared by more than one distinct AccountID
SELECT Accession
FROM MyTable
WHERE Status = 'F'
GROUP BY Accession
HAVING COUNT(DISTINCT AccountID) > 1
)
select a.AccountID, a.Accession, a.Status
FROM NonDupAccountIDs a
INNER JOIN DupAccessions b
ON a.Accession = b.Accession

Another alternative
Declare @Table table (id int,AccountID varchar(25),Accession varchar(25),Status varchar(25))
Insert into @Table (id , AccountID , Accession , Status) values
(1, 1234567,'ABCD','F'),
(2, 1234567,'ABCD','F'),
(3, 2345678,'BCDE','F'),
(4, 8765432,'BCDE','F'),
(5, 3456789,'CDEF','F'),
(6, 9876543,'CDEF','A')
Select A.*
from @Table A
Join (
Select Accession
From @Table
Where Status='F'
Group By Accession
Having Min(Accession)=Max(Accession)
and count(Distinct AccountID)>1
) B on a.Accession=B.Accession
Returns
id AccountID Accession Status
3 2345678 BCDE F
4 8765432 BCDE F
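If you want to try the join-on-aggregate idea without a SQL Server instance, here is a minimal sketch using Python's standard sqlite3 module. The table name my_table and the extra outer Status filter are my own assumptions for illustration, not part of the answers above; the core is the same COUNT(DISTINCT AccountID) check per Accession.

```python
import sqlite3

# In-memory database with the sample rows from the question.
# The table name "my_table" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (Id INTEGER, AccountID TEXT, Accession TEXT, Status TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "1234567", "ABCD", "F"),
        (2, "1234567", "ABCD", "F"),
        (3, "2345678", "BCDE", "F"),
        (4, "8765432", "BCDE", "F"),
        (5, "3456789", "CDEF", "F"),
        (6, "9876543", "CDEF", "A"),
    ],
)

query = """
SELECT a.Id, a.AccountID, a.Accession, a.Status
FROM my_table AS a
JOIN (
    -- accessions that have more than one distinct account among 'F' rows
    SELECT Accession
    FROM my_table
    WHERE Status = 'F'
    GROUP BY Accession
    HAVING COUNT(DISTINCT AccountID) > 1
) AS b ON a.Accession = b.Accession
WHERE a.Status = 'F'   -- assumption: only 'F' rows should be listed
ORDER BY a.Id
"""

dup_rows = conn.execute(query).fetchall()
for row in dup_rows:
    print(row)
```

The outer WHERE only matters when an accession that qualifies also has non-"F" rows; with the sample data it makes no difference.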

This works as well. If there are multiple sets of duplicates, it returns only the row with the highest ID from each set.
John Cappelletti's solution is also a good one; it returns all duplicated rows whenever there is any mismatch.
I had to add some more data to see what would happen. You should decide how you want to treat these cases.
select
max(ID) ID,AccountID, Accession
from p where Status = 'F'
group by AccountID, Accession
having
(select count(Accession) from (select max(ID) ID,AccountID, Accession from p where Status = 'F' group by AccountID, Accession) f where f.accession = p.accession)>1
;

SELECT t2.Id, t1.AccountID, t1.Accession, t1.Status
FROM TABLE_NAME t2
INNER JOIN (
SELECT AccountID, Accession, Status
FROM TABLE_NAME
GROUP BY Status, Accession, AccountID
) t1
ON t1.AccountID = t2.AccountID
Might need to play with this but should get you close. Remember to replace TABLE_NAME with your table.

Related

sql - match two of the same values in different column positions

I am looking to join two different tables on the id, and need to extract unique names out of each table; if one table has a certain name but the other doesn't, there should be one value and one null. This should be vice versa as well.
With joins, the current output looks like this:
id name_1 name_2
1 max steph
1 max john
1 john chris
1 john chris
1 chris steph
1 chris null
1 null max
1 null null
1 tony john
1 tony max
expected output:
id name_1 name_2
1 max max
1 john john
1 chris chris
1 null steph
1 tony null
current sql:
select
table1.id,
table1.name as name_1,
table2.name as name_2
from table1
left join table2
on table1.id = table2.id
(snowflake)
SELECT
NVL(d1.id, d2.id) as id,
d1.name as name_1,
d2.name as name_2
FROM (
SELECT DISTINCT id,name FROM table1
) AS d1
FULL OUTER JOIN (
SELECT DISTINCT id,name FROM table2
) AS d2
ON d1.id = d2.id AND d1.name = d2.name
ORDER BY 1, (d1.name,d2.name)
This takes the distinct id,name pairs from both tables, then full outer joins those sets of values. Thus if the id,name pair is in both tables, the rows match. And if they don't match, they are still kept.
So with these CTEs providing the fake data:
WITH table1(id,name) AS (
select * from values (1,'aa'),(1,'ab'),(2,'ba')
), table2(id,name) AS (
select * from values (1,'aa'),(1,'ac'),(2,'ba'),(2,'bb')
)
ID | NAME_1 | NAME_2
--------------------
1 | aa | aa
1 | ab | null
1 | null | ac
2 | ba | ba
2 | null | bb
The following can be used for this:
with cte as
(
select distinct t1.id,name_1 from t1)
select distinct ifnull(t2.id,cte.id) id,
cte.name_1,
t2.name_2
from t2 full outer join cte
ON cte.id=t2.id
and cte.name_1 = t2.name_2
order by cte.name_1;
+----+--------+--------+
| ID | NAME_1 | NAME_2 |
|----+--------+--------|
| 1 | chris | chris |
| 1 | john | john |
| 1 | max | max |
| 1 | tony | NULL |
| 1 | NULL | steph |
+----+--------+--------+
Add a WHERE clause.
select
table1.id,
table1.name as name_1,
table2.name as name_2
from table1
left join table2
on table1.id = table2.id
WHERE table1.name = table2.name
OR table1.name is null
OR table2.name is null
If you just need a list of unique names
select distinct name from table1
union
select distinct name from table2
Simeon's answer is the way to go, since Snowflake supports full outer joins. But for those of you who use a relational database that lacks support for full outer joins and have the same issue, this approach can be an alternative:
select id,
if(instr(group_concat(tb), 1), name, NULL) name_1,
if(instr(group_concat(tb), 2), name, NULL) name_2
from(
select id, name, 1 tb from table1
union
select id, name, 2 tb from table2
) a
group by id, name
order by name
The result:
| id | name_1 | name_2 |
| --- | ------ | ------ |
| 1 | chris | chris |
| 1 | john | john |
| 1 | max | max |
| 1 | null | steph |
| 1 | tony | null |
Fake data:
CREATE TABLE table1 (
id int(11),
name varchar(50)
);
CREATE TABLE table2 (
id int(11),
name varchar(50)
);
INSERT INTO table1 VALUES
(1, 'max'),
(1, 'john'),
(1, 'chris'),
(1, 'tony');
INSERT INTO table2 VALUES
(1, 'steph'),
(1, 'john'),
(1, 'chris'),
(1, 'max');
And a dbfiddle: https://www.db-fiddle.com/f/gQ4U7hu2S2EyFEtZrapqdu/6
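The same union-then-group idea can also be written with conditional aggregation instead of group_concat/instr. Below is a minimal sketch using Python's standard sqlite3 module with the fake data above; the MAX(CASE ...) form is my substitution, not the original answer's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, name TEXT);
CREATE TABLE table2 (id INTEGER, name TEXT);
INSERT INTO table1 VALUES (1,'max'),(1,'john'),(1,'chris'),(1,'tony');
INSERT INTO table2 VALUES (1,'steph'),(1,'john'),(1,'chris'),(1,'max');
""")

query = """
SELECT id,
       -- name shows up in a column only if that source table contains it
       MAX(CASE WHEN tb = 1 THEN name END) AS name_1,
       MAX(CASE WHEN tb = 2 THEN name END) AS name_2
FROM (
    SELECT id, name, 1 AS tb FROM table1
    UNION
    SELECT id, name, 2 AS tb FROM table2
) AS a
GROUP BY id, name
ORDER BY name
"""

name_rows = conn.execute(query).fetchall()
for row in name_rows:
    print(row)
```

Tagging each row with its source table (tb) before the union is what lets a single GROUP BY stand in for the full outer join.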

Displaying data in a different way

I have a log table that looks like this:
ProductId | OldDescription | NewDescription | OldTagId | NewTagId |
-------------------------------------------------------------------
12345 | description1 | description2 | 1 | 5 |
and I want to display it this way:
ProductId | ChangeId | OldVal | NewVal |
----------------------------------------------------------
12345 | 1 | description1 | description2 |
12345 | 2 | 1 | 5 |
Where the data in the ChangeId corresponds to the type of the value changed (Description, TagId)
How could I approach this?
Thank you
Just another option via CROSS APPLY
Example
Declare @YourTable Table ([ProductId] varchar(50),[OldDescription] varchar(50),[NewDescription] varchar(50),[OldTagId] int,[NewTagId] int)
Insert Into @YourTable Values
(12345,'description1','description2',1,5)
Select ProductID
,B.*
From @YourTable A
Cross Apply ( values (1,[OldDescription],[NewDescription])
,(2,left([OldTagId],25),left([NewTagId],25))
) B(ChangeID,OldVal,NewVal)
Returns
ProductID ChangeID OldVal NewVal
12345 1 description1 description2
12345 2 1 5
Just for fun:
I saw the comment about 30 columns. If performance is NOT essential, here is an option that will dynamically pivot your data without actually using dynamic SQL
Select *
From (
Select ProductID
,C.*
From @YourTable A
Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
Cross Apply (
Select Item = left(xAttr.value('local-name(.)', 'varchar(100)'),3)+'Val'
,Value = xAttr.value('.','varchar(100)')
,ChangeID = ((row_number() over (order by (select null)) - 1 ) / 2)+1
From XMLData.nodes('//@*') xNode(xAttr)
Where xAttr.value('local-name(.)','varchar(100)') not in ('ProductID','Other','ColumnsToExclude')
) C
) src
Pivot ( max(Value) for Item in ([OldVal],[NewVal]) ) pvt
See if something like that works for you:
SELECT ProductId,
1 AS ChangeId,
OldDescription AS OldVal,
NewDescription AS NewVal
FROM log
UNION ALL
SELECT ProductId,
2 AS ChangeId,
-- cast so both branches share a character type
CAST(OldTagId AS varchar(50)) AS OldVal,
CAST(NewTagId AS varchar(50)) AS NewVal
FROM log
ORDER BY ProductId,
ChangeId
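With a UNION-style unpivot, the description and tag columns end up in the same OldVal/NewVal columns, so they need a common type. Here is a minimal sketch using Python's standard sqlite3 module; the table name change_log is a hypothetical stand-in for the log table in the question, and the explicit CAST is my assumption about how to keep the types aligned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "change_log" is a hypothetical name for the question's log table.
conn.execute(
    "CREATE TABLE change_log ("
    "ProductId INTEGER, OldDescription TEXT, NewDescription TEXT, "
    "OldTagId INTEGER, NewTagId INTEGER)"
)
conn.execute(
    "INSERT INTO change_log VALUES (12345, 'description1', 'description2', 1, 5)"
)

query = """
SELECT ProductId, 1 AS ChangeId, OldDescription AS OldVal, NewDescription AS NewVal
FROM change_log
UNION ALL
-- tag ids are cast to text so OldVal/NewVal hold one type throughout
SELECT ProductId, 2, CAST(OldTagId AS TEXT), CAST(NewTagId AS TEXT)
FROM change_log
ORDER BY ProductId, ChangeId
"""

change_rows = conn.execute(query).fetchall()
for row in change_rows:
    print(row)
```

Each additional old/new column pair would become one more SELECT branch with its own ChangeId.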

SQL Server split field and create addition rows with values from main row

I have a table with rows and in one field there are values like this A,B,C
Table 'Mytable':
|ID | Date | MyValue | SplitID |
|1 | 2019-12-17 | A | |
|2 | 2019-12-15 | A,B | |
|3 | 2019-12-16 | B,C | |
Result should be:
|1 | 2019-12-17 | A | 1 |
|2 | 2019-12-15 | A | 2 |
|4 | 2019-12-15 | B | 2 |
|3 | 2019-12-16 | B | 3 |
|5 | 2019-12-16 | C | 3 |
(Sorry, I could not find HOW to format a table in the Stackoverflow help)
I tried an inline table function that splits the field MyValue into multiple lines, but could not pass my rows with
charindex(',',[MyValue])>0
from MyTable as input lines.
The code is this:
ALTER function [dbo].[fncSplitString](@input Varchar(max), @Splitter Varchar(99), @ID int)
returns table as
Return
with tmp (DataItem, ix, ID) as
( select LTRIM(@input) , CHARINDEX('',@Input), @ID --Recu. start, ignored val to get the types right
union all
select LTRIM(Substring(@input, ix+1,ix2-ix-1)), ix2, @ID
from (Select *, CHARINDEX(@Splitter,@Input+@Splitter,ix+1) ix2 from tmp) x where ix2<>0
) select DataItem,ID from tmp where ix<>0
Thanks for help
Michael
You can try the following query.
Create table #Temp(
Id int,
DateField Date,
MyValue Varchar(10),
SplitID int
)
CREATE FUNCTION [dbo].[SplitPra] (@Value VARCHAR(MAX), @delimiter CHAR)
RETURNS @DataResult TABLE([Position] TINYINT IDENTITY(1,1),[Value] NVARCHAR(128))
AS
BEGIN
DECLARE @XML xml = N'<r><![CDATA[' + REPLACE(@Value, @delimiter, ']]></r><r><![CDATA[') + ']]></r>'
INSERT INTO @DataResult ([Value])
SELECT RTRIM(LTRIM(T.c.value('.', 'NVARCHAR(128)')))
FROM @XML.nodes('//r') T(c)
RETURN
END
insert into #Temp Values(1, '2019-12-17', 'A', NULL),(2, '2019-12-15', 'A,B', NULL), (3, '2019-12-16', 'B,C', NULL)
Select
#Temp.Id, DateField, b.Value as MyValue, b.Id as SplitValue
from #Temp inner join (
select
Id, f.*
from
#Temp u
cross apply [dbo].[SplitPra](u.MyValue, ',') f
)b on #Temp.Id = b.Id
Drop table #Temp
This will give an output as shown below.
Id DateField MyValue SplitValue
---------------------------------
1 2019-12-17 A 1
2 2019-12-15 A 2
2 2019-12-15 B 2
3 2019-12-16 B 3
3 2019-12-16 C 3
I found this solution; I hope it will work for you. I didn't use your function to solve this problem. Instead, I used CROSS APPLY with string_split. You can find the query below:
-- Creating Test Table
CREATE TABLE #Test
(
ID int,
Date date,
MyValue nvarchar(max),
SplitID int
);
GO
-- Inserting data into test table
INSERT INTO #Test VALUES (1, '2019-12-17', 'A', NULL);
INSERT INTO #Test VALUES (2, '2019-12-15', 'A,B', NULL);
INSERT INTO #Test VALUES (3, '2019-12-16', 'B,C', NULL);
GO
-- Select query
SELECT
*,
(SELECT ID FROM #Test t1 WHERE t.Date = t1.date) AS SplitID
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY ID) AS ID,
Date,
substring(A.value,1,
CASE WHEN charindex(',',rtrim(ltrim(A.value))) = 0 then LEN(A.value)
ELSE charindex(',',rtrim(ltrim(A.value))) -1 end) as MyValue
FROM #Test
CROSS APPLY string_split (MyValue, ',') A) AS T
ORDER BY MyValue ASC;
The result should look like this:
ID Date MyValue SplitID
1 2019-12-17 A 1
2 2019-12-15 A 2
3 2019-12-15 B 2
4 2019-12-16 B 3
5 2019-12-16 C 3
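On engines without a built-in split function, the same row expansion can be done with a recursive CTE that peels one item off the list per step. The sketch below uses Python's standard sqlite3 module rather than SQL Server; the table name mytable, the column name entry_date, and the split_id alias are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names modelled on the question's sample.
conn.execute("CREATE TABLE mytable (id INTEGER, entry_date TEXT, myvalue TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "2019-12-17", "A"), (2, "2019-12-15", "A,B"), (3, "2019-12-16", "B,C")],
)

query = """
WITH RECURSIVE split(id, entry_date, item, rest) AS (
    -- seed: nothing extracted yet, full list with a trailing comma
    SELECT id, entry_date, '', myvalue || ',' FROM mytable
    UNION ALL
    -- step: take the text before the first comma, keep the remainder
    SELECT id, entry_date,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT id AS split_id, entry_date, trim(item) AS myvalue
FROM split
WHERE item <> ''
ORDER BY id, myvalue
"""

split_rows = conn.execute(query).fetchall()
for row in split_rows:
    print(row)
```

Here split_id carries the ID of the original row, matching the SplitID column asked for in the question; new row IDs could be assigned afterwards with ROW_NUMBER().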

How to fractionally split a column while doing a inner join in SQL Server

I wanted to basically split a column:
Table A has 3 columns: Id, Name, Number
Table b has 3 columns: Id, Name, School
I join on the basis of Id.
Suppose that the number is 100, the Name might have multiple schools (suppose 3), so I want to equally split 100 by 3 and do the join.
Sample Final Table
Id Name School Number
---------------------------
1 ABC A 33.33
1 ABC B 33.33
1 ABC C 33.33
You can get the count for each id using count(*) over(partition by a.Id) and divide the number by that.
test setup: rextester: http://rextester.com/JQK48793
create table a (id int, name char(3), number decimal(9,2))
insert into a
values (1,'ABC',100.0)
create table b (id int, name char(3), school char(1))
insert into b values
(1,'ABC','A')
,(1,'ABC','B')
,(1,'ABC','C')
query:
select
a.Id
, a.Name
, b.School
, Number = (a.Number+.0) / count(*) over (partition by a.Id)
from a
inner join b
on a.Id = b.Id
results:
+----+------+--------+------------------+
| Id | Name | School | Number |
+----+------+--------+------------------+
| 1 | ABC | A | 33,3333333333333 |
| 1 | ABC | B | 33,3333333333333 |
| 1 | ABC | C | 33,3333333333333 |
+----+------+--------+------------------+
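For anyone who wants to check the arithmetic locally, here is a minimal sketch of the same count(*) over (partition by ...) division using Python's standard sqlite3 module. It assumes a SQLite build with window function support (3.25 or later); rounding to two decimals is done in Python only for display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, name TEXT, number REAL);
CREATE TABLE b (id INTEGER, name TEXT, school TEXT);
INSERT INTO a VALUES (1, 'ABC', 100.0);
INSERT INTO b VALUES (1, 'ABC', 'A'), (1, 'ABC', 'B'), (1, 'ABC', 'C');
""")

query = """
SELECT a.id, a.name, b.school,
       -- divide by the number of joined rows that share the same id
       a.number / COUNT(*) OVER (PARTITION BY a.id) AS number
FROM a
JOIN b ON a.id = b.id
ORDER BY b.school
"""

# Round in Python so the printed values match the 33.33 in the question.
share_rows = [(i, n, s, round(x, 2)) for i, n, s, x in conn.execute(query)]
for row in share_rows:
    print(row)
```

Because the window count is taken after the join, it reflects the number of schools per id without a separate subquery.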

Select rows based on count of child table

I have three entities: department, employee, and report. A department has many employees, each of whom has many reports. I want to select the one employee in each department who has the most reports. I have no idea how to even start this query. This question seems very similar, but I can't figure out how to manipulate those answers for what I want.
I have full access to the entire system, so I can make any changes necessary. In the event of a tie, it's safe to arbitrarily pick one of the results.
Department:
ID | Name
----|------
1 | DeptA
2 | DeptB
3 | DeptC
4 | DeptD
Employee:
ID | Name | DeptID
----|------|--------
1 | Joe | 1
2 | John | 1
3 | Emma | 2
4 | Jack | 3
5 | Sven | 3
6 | Axel | 4
7 | Brad | 4
8 | Jane | 4
Report:
ID | EmployeeID
----|------------
1 | 1
2 | 2
3 | 3
4 | 5
5 | 6
6 | 6
7 | 8
Desired result (assuming I queried names only):
Joe OR John (either is acceptable)
Emma
Sven
Axel
How to start this query? Well, get the information about each employee, the department, and the number of reports:
select e.name, e.deptid, count(*) as numreports
from employee e join
reports r
on e.id = r.employeeid
group by e.name, e.deptid;
Now you just want the largest count in each department. I would suggest row_number() or rank() depending on how you want to handle ties:
select er.*
from (select e.name, e.deptid, count(*) as numreports,
row_number() over (partition by e.deptid order by count(*) desc) as seqnum
from employee e join
reports r
on e.id = r.employeeid
group by e.name, e.deptid
) er
where seqnum = 1;
If you want the department name instead of number, you can join that in as well.
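To see the row_number() approach run against the question's sample data, here is a minimal sketch using Python's standard sqlite3 module (SQLite 3.25 or later for window functions). Two things are my own assumptions: the counts are computed in a separate CTE before ranking, and the employee id is added as a tie-breaker so the arbitrary pick is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER, name TEXT, deptid INTEGER);
CREATE TABLE report (id INTEGER, employeeid INTEGER);
INSERT INTO employee VALUES
    (1,'Joe',1),(2,'John',1),(3,'Emma',2),(4,'Jack',3),
    (5,'Sven',3),(6,'Axel',4),(7,'Brad',4),(8,'Jane',4);
INSERT INTO report VALUES (1,1),(2,2),(3,3),(4,5),(5,6),(6,6),(7,8);
""")

query = """
WITH counts AS (
    -- number of reports per employee (employees without reports drop out)
    SELECT e.id, e.name, e.deptid, COUNT(*) AS numreports
    FROM employee AS e
    JOIN report AS r ON e.id = r.employeeid
    GROUP BY e.id, e.name, e.deptid
),
ranked AS (
    -- rank employees inside each department; id breaks ties deterministically
    SELECT counts.*,
           ROW_NUMBER() OVER (
               PARTITION BY deptid ORDER BY numreports DESC, id
           ) AS seqnum
    FROM counts
)
SELECT deptid, name, numreports
FROM ranked
WHERE seqnum = 1
ORDER BY deptid
"""

top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```

Swapping ROW_NUMBER() for RANK() would return both Joe and John for department 1 instead of picking one.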
Based on your question, the schema would be:
SELECT * into #Department FROM(
select 1 ID,'DEPTA' NAME
UNION ALL
select 2,'DEPTB'
UNION ALL
select 3,'DEPTC'
UNION ALL
select 4,'DEPTD')TAB
SELECT * INTO #Employee FROM (
SELECT 1 ID ,'Joe' Name , 1 DeptID
UNION ALL
SELECT 2 , 'John' , 1
UNION ALL
SELECT 3 , 'Emma' ,2
UNION ALL
SELECT 4 ,'Jack' , 3
UNION ALL
SELECT 5 ,'Sven' , 3
UNION ALL
SELECT 6 , 'Axel' , 4
UNION ALL
SELECT 7 ,'Brad' , 4
UNION ALL
SELECT 8 ,'Jane' , 4)AS A
SELECT * INTO #Report FROM(
SELECT 1 ID ,1 EmployeeID
UNION ALL
SELECT 2, 2
UNION ALL
SELECT 3 ,3
UNION ALL
SELECT 4, 5
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 6
UNION ALL
SELECT 7, 8
UNION ALL
SELECT 8, 8
UNION ALL
SELECT 9, 8
)AS A
Then apply DENSE_RANK() to rank employees within each department by their number of reports:
;WITH CTE AS(
select DEP.ID DEP_ID, DEP.NAME DEP,EMP.ID EMP_ID, EMP.Name EMP
,DENSE_RANK() OVER(PARTITION BY DEP.ID ORDER BY COUNT(REP.ID) DESC) REP_RANK
,COUNT(REP.ID) NO_OF_REP FROM #Department DEP
inner join #Employee emp on emp.deptid=dep.id
inner join #report rep on rep.EmployeeID=emp.id
GROUP BY DEP.ID, DEP.NAME ,EMP.ID, EMP.Name
)
SELECT DEP, EMP, NO_OF_REP FROM CTE WHERE REP_RANK=1
In DEPTA, both Joe and John will be picked, because each has 1 report, which is the maximum count in DEPTA.
And the result will be
+-------+------+-----------+
| DEP | EMP | NO_OF_REP |
+-------+------+-----------+
| DEPTA | Joe | 1 |
| DEPTA | John | 1 |
| DEPTB | Emma | 1 |
| DEPTC | Sven | 1 |
| DEPTD | Jane | 3 |
+-------+------+-----------+
Please try the code below:
SELECT D.NAME
FROM (
SELECT C.NAME, RANK() OVER (
PARTITION BY C.DEPTID ORDER BY C.COUNTS DESC
) RNK
FROM (
SELECT EMPID, NAME, COUNT(EMPID) AS COUNTS, DEPTID
FROM DBO.REPORT AS A
JOIN DBO.EMPLO AS B ON A.EMPID = B.ID
GROUP BY EMPID, NAME, DEPTID
) AS C
) AS D
WHERE D.RNK = 1
