Finding duplicate records with different IDs within a table - sql-server

I have a table as below:
ID   Product#  Service#  ServiceDate
1    100       122       2017-01-02
2    100       124       2017-03-02
3    122       133       2017-04-02
100  100       122       2017-05-02
I need to find the records that have the same product# and service# but different IDs. For this, I wrote the code below:
Select *
FROM MyTable as M1 Inner join
MyTable as M2 on
M1.Product#=M2.Product# and M1.Service#=M2.Service# and M1.ID!=M2.ID
However, I get duplicate results as such:
ID Product# Service# ServiceDate ID Product# Service# ServiceDate
1 100 122 2017-01-02 100 100 122 2017-05-02
100 100 122 2017-05-02 1 100 122 2017-01-02
Any idea how to eliminate these duplicate rows? I need to see a result as such:
ID Product# Service# ServiceDate ID Product# Service# ServiceDate
1 100 122 2017-01-02 100 100 122 2017-05-02

Try the following:
Select *
FROM MyTable as M1
Inner join MyTable as M2 on M1.Product#=M2.Product# and M1.Service#=M2.Service# and M1.ID!=M2.ID
where m1.id < m2.id
Explanation: Your query returns each pair twice, once in each order. Requiring one ID to be less than the other keeps exactly one row per pair, so you get every unique combination once. The M1.ID!=M2.ID condition then becomes redundant, because M1.ID < M2.ID already implies it.
Bonus: For fun, I tried to add one more duplicate row to your sample data set, and it worked just as expected.
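To see the effect of the ID comparison on a third duplicate, here is a minimal sketch that runs the same self-join through Python's built-in sqlite3 module as a stand-in for SQL Server. The column names ProductNo and ServiceNo and the extra row with ID 200 are assumptions made for the illustration; they are not part of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER, ProductNo INTEGER, ServiceNo INTEGER, ServiceDate TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, 100, 122, "2017-01-02"),
        (2, 100, 124, "2017-03-02"),
        (3, 122, 133, "2017-04-02"),
        (100, 100, 122, "2017-05-02"),
        (200, 100, 122, "2017-06-02"),  # hypothetical extra duplicate
    ],
)

# M1.ID < M2.ID keeps each unordered pair exactly once
pairs = conn.execute(
    """
    SELECT M1.ID, M2.ID
    FROM MyTable AS M1
    INNER JOIN MyTable AS M2
        ON M1.ProductNo = M2.ProductNo
       AND M1.ServiceNo = M2.ServiceNo
       AND M1.ID < M2.ID
    ORDER BY M1.ID, M2.ID
    """
).fetchall()

print(pairs)
```

With three rows sharing the same product and service, the join returns three pairs (every combination of two), not six.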

If you're wanting to return just two rows without the duplicate columns, replace
Select *
with
Select M1.*

Related

SQL: insert total count row after joining table

The first step is to join staff and customer together. The second step is to count the distinct product_id values per staff member. My goal is to add a total (sum) row at the bottom of the result table.
Thanks.
staff
staff_ID Name cust_id
1 Tom 101
1 Tom 101
1 Tom 105
2 Peter 102
2 Peter 104
3 Billy 103
customer
cust_id product_id
101 A1
102 A2
103 A3
104 A4
105 A5
My work:
SELECT a.staff_name, COUNT(DISTINCT b.product_id)
FROM (SELECT DISTINCT staff_id, staff_name, cust_id
FROM staff) a
LEFT JOIN customer b ON a.cust_id = b.cust_id
GROUP BY a.staff_name
What I want is to add a Total row below the counts.
Name count
Tom 2
Peter 2
Billy 1
Total 5
Update:
Regarding the "Total", as @MatBailie correctly pointed out in the comments:
The aggregate of multiple COUNT(DISTINCT) rows CAN NOT be guaranteed to be summable. If two staff members share the same product_id the summary value will be LESS THAN the sum of its members.
So for this sample data set:
cust_id  product_id
101      A1
102      A2
103      A3
104      A4  <== Same product
105      A5
105      A4  <== Same product
Using GROUP BY ROLLUP yields a "Total" value of 5:
SELECT COALESCE(a.staff_name, 'Total') AS Staff_Name
, COUNT(DISTINCT b.product_id) AS [Count]
FROM staff a LEFT JOIN customer b ON a.cust_id=b.cust_id
GROUP BY ROLLUP (a.staff_name);
Results:
Staff_Name  Count
Billy       1
Peter       2
Tom         3
Total       5 **
Whereas calculating a simple sum of the per-staff counts yields a "Total" value of 6. So just be aware of the difference.
Staff_Name  Count
Billy       1
Peter       2
Tom         3
Total       6 **
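The gap between the two totals can be reproduced with a short sketch. It uses Python's sqlite3 module as a stand-in, and because SQLite has no GROUP BY ROLLUP, the overall distinct count is computed with a separate query (an assumption of this sketch, not the approach used above). The data is the modified sample set in which product A4 appears under two customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staff (staff_id INTEGER, staff_name TEXT, cust_id INTEGER);
    CREATE TABLE customer (cust_id INTEGER, product_id TEXT);
    INSERT INTO staff VALUES (1,'Tom',101),(1,'Tom',101),(1,'Tom',105),
                             (2,'Peter',102),(2,'Peter',104),(3,'Billy',103);
    INSERT INTO customer VALUES (101,'A1'),(102,'A2'),(103,'A3'),
                                (104,'A4'),(105,'A5'),(105,'A4');
    """
)

# Distinct products per staff member
per_staff = dict(
    conn.execute(
        """
        SELECT s.staff_name, COUNT(DISTINCT c.product_id)
        FROM staff s LEFT JOIN customer c ON s.cust_id = c.cust_id
        GROUP BY s.staff_name
        """
    ).fetchall()
)

# Distinct products overall (what the ROLLUP total row reports)
distinct_total = conn.execute(
    """
    SELECT COUNT(DISTINCT c.product_id)
    FROM staff s LEFT JOIN customer c ON s.cust_id = c.cust_id
    """
).fetchone()[0]

# Plain sum of the per-staff counts
sum_total = sum(per_staff.values())

print(per_staff, distinct_total, sum_total)
```

Product A4 is counted once for Tom and once for Peter, so the sum of the per-staff counts exceeds the overall distinct count by one.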
Original (Wrong Answer):
I can't remember where I saw this answer, but assuming Staff_Name is never null, you could use GROUP BY ROLLUP to obtain the total. That said, calculating grand totals is usually more of a front-end job.
SELECT COALESCE(a.staff_name, 'Total') AS Staff_Name
, COUNT(DISTINCT b.product_id) AS [Count]
FROM staff a LEFT JOIN customer b ON a.cust_id=b.cust_id
GROUP BY ROLLUP (a.staff_name);
Try this one:
SELECT s.staff_name, COUNT(DISTINCT b.product_id) AS [Count],
SUM(COUNT(DISTINCT b.product_id)) OVER () AS Total -- sum of the per-staff counts, repeated on every row
FROM staff s
INNER JOIN customer b ON b.cust_id = s.cust_id
GROUP BY s.staff_name

Find two rows of a column belongs to same row of another column

I have a table from which I need to find the list of subjects that have students from the same department, without using a subquery or a join.
I tried using HAVING with a count of departments, but it does not produce the expected output.
SELECT A.Subject,
B.StudentID,
B.DEPT
FROM AUTHOR A , ACADEMIC B
WHERE A.StudentID = B.StudentID
GROUP BY B.DEPT,
A.Subject,
B.StudentID
Gives me the table output
Subject StudentID DEPT
1 100 100
1 101 100
2 102 100
3 103 100
3 104 100
I expect the output to give me the subject that has studentID from same department without using subquery or JOIN.

SQL server select statement to select the ids of a duplicated entries of another column

Consider the table 'Table1' as below
main_id main_item_id
-------- ---------
1 101
1 102
2 105
2 105
3 105
3 106
4 101
4 101
4 102
I need to fetch main_id 2 and 4, as they have duplicate main_item_id values, from among 1 million other records.
Thanks in advance.
This will select all unique main_id values that have 2 or more identical main_item_id values:
SELECT DISTINCT T.main_id
FROM YourTable T
GROUP BY T.main_id
, T.Main_item_id
HAVING COUNT(1) > 1
Use a GROUP BY clause to check for duplicates:
SELECT main_id, main_item_id
FROM Table1
GROUP BY main_id, main_item_id
HAVING count(*) > 1
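As a quick check of the grouping approach against the sample rows from the question, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server. Only the engine differs; the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (main_id INTEGER, main_item_id INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, 101), (1, 102), (2, 105), (2, 105), (3, 105),
     (3, 106), (4, 101), (4, 101), (4, 102)],
)

# Group by both columns; a group with more than one row is a repeated
# (main_id, main_item_id) pair. DISTINCT collapses several repeated
# pairs under the same main_id into a single result row.
dup_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT main_id
        FROM Table1
        GROUP BY main_id, main_item_id
        HAVING COUNT(*) > 1
        ORDER BY main_id
        """
    )
]

print(dup_ids)
```

On a table with millions of rows, an index on (main_id, main_item_id) would likely help the grouping, though that depends on the engine and the data.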

How would you write a T-SQL query that supported event study analysis

I'm trying to create a table that will support a simple event study analysis, but I'm not sure how best to approach this.
I'd like to create a table with the following columns: Customer, Date, Time on website, Outcome. I'm testing the premise that the outcome for a particular customer on any given day is a function of the time spent on the website on the current day as well as on the preceding five site visits. I'm envisioning a table similar to this:
I'm hoping to write a T-SQL query that will produce an output like this:
Given this objective, here are my questions:
Assuming this is indeed possible, how should I structure my table to accomplish this objective? Is there a need for a column that refers to the prior visit? Do I need to add an index to a particular column?
Would this be considered a recursive query?
Given the appropriate table structure, what would the query look like?
Is it possible to structure the query with a variable that determines the number of prior periods to include in addition to the current period (for example, if I want to compare 5 periods to 3 periods)?
I'm not sure I understand the analytic value of your matrix, but here is one approach:
Declare @Table table (id int,VisitDate date,VisitTime int,Outcome varchar(25))
Insert Into @Table (id,VisitDate,VisitTime,Outcome) values
(123,'2015-12-01',100,'P'),
(123,'2016-01-01',101,'P'),
(123,'2016-02-01',102,'N'),
(123,'2016-03-01',100,'P'),
(123,'2016-04-01', 99,'N'),
(123,'2016-04-09', 98,'P'),
(123,'2016-05-09', 99,'P'),
(123,'2016-05-14',100,'N'),
(123,'2016-06-13', 99,'P'),
(123,'2016-06-15', 98,'P')
-- Rows are ordered newest first, so Lead(...,n) looks n visits back in time; 0 is returned when no such visit exists
Select *
,T0 = VisitTime
,T1 = Lead(VisitTime,1,0) over(Partition By ID Order By ID,VisitDate Desc)
,T2 = Lead(VisitTime,2,0) over(Partition By ID Order By ID,VisitDate Desc)
,T3 = Lead(VisitTime,3,0) over(Partition By ID Order By ID,VisitDate Desc)
,T4 = Lead(VisitTime,4,0) over(Partition By ID Order By ID,VisitDate Desc)
,T5 = Lead(VisitTime,5,0) over(Partition By ID Order By ID,VisitDate Desc)
From @Table
Order By ID,VisitDate Desc
Returns
id   VisitDate   VisitTime  Outcome  T0   T1   T2   T3   T4   T5
123  2016-06-15  98         P        98   99   100  99   98   99
123  2016-06-13  99         P        99   100  99   98   99   100
123  2016-05-14  100        N        100  99   98   99   100  102
123  2016-05-09  99         P        99   98   99   100  102  101
123  2016-04-09  98         P        98   99   100  102  101  100
123  2016-04-01  99         N        99   100  102  101  100  0
123  2016-03-01  100        P        100  102  101  100  0    0
123  2016-02-01  102        N        102  101  100  0    0    0
123  2016-01-01  101        P        101  100  0    0    0    0
123  2015-12-01  100        P        100  0    0    0    0    0
With fixed columns you can do it like this with lag:
select
time,
-- order by date ascending so that lag() looks back at earlier visits
lag(time, 1) over (partition by customer order by date),
lag(time, 2) over (partition by customer order by date),
lag(time, 3) over (partition by customer order by date),
lag(time, 4) over (partition by customer order by date)
from
yourtable
If you need dynamic columns, then you'll have to build it using dynamic SQL.
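One way to picture the dynamic approach is to generate the column list in application code and then run the resulting statement. The sketch below does this in Python against sqlite3 as a stand-in for SQL Server (SQLite 3.25 or later is assumed for window functions). The helper build_lag_query and the table name visits are hypothetical names introduced for the illustration; in T-SQL the equivalent would be assembling the string and executing it with sp_executesql.

```python
import sqlite3


def build_lag_query(n_prior: int) -> str:
    """Build a SELECT with one LAG column per prior visit (hypothetical helper)."""
    if not isinstance(n_prior, int) or n_prior < 0:
        raise ValueError("n_prior must be a non-negative integer")
    cols = ["VisitTime AS T0"]
    for k in range(1, n_prior + 1):
        # n_prior is validated as an int, so formatting it into the SQL is safe
        cols.append(
            f"LAG(VisitTime, {k}, 0) OVER (PARTITION BY id ORDER BY VisitDate) AS T{k}"
        )
    return (
        "SELECT id, VisitDate, "
        + ", ".join(cols)
        + " FROM visits ORDER BY id, VisitDate DESC"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER, VisitDate TEXT, VisitTime INTEGER, Outcome TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [
        (123, "2015-12-01", 100, "P"), (123, "2016-01-01", 101, "P"),
        (123, "2016-02-01", 102, "N"), (123, "2016-03-01", 100, "P"),
        (123, "2016-04-01", 99, "N"), (123, "2016-04-09", 98, "P"),
        (123, "2016-05-09", 99, "P"), (123, "2016-05-14", 100, "N"),
        (123, "2016-06-13", 99, "P"), (123, "2016-06-15", 98, "P"),
    ],
)

rows_3 = conn.execute(build_lag_query(3)).fetchall()  # current visit plus 3 prior
rows_5 = conn.execute(build_lag_query(5)).fetchall()  # current visit plus 5 prior

print(rows_3[0])
print(rows_5[0])
```

Changing the argument switches between, say, a 3-period and a 5-period comparison without editing the query by hand.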

Can I select data ordered by a field a way that it values would be different in consecutive rows?

I have a table of products with a manufacturer field, and I need to extract the data so that two rows with the same manufacturer id are not next to each other.
For example:
id prod_id manf_id
1 100 300
2 101 300
3 102 400
4 103 400
5 103 500
So that the result would look something like:
1 100 300
3 102 400
5 103 500
2 101 300
4 103 400
It doesn't matter too much in the example above if there are sequences where an id has the same neighbour on both sides (300-400-300), but it would be more interesting to see more complex logic in which a single id has only one neighbour of each type (300-400-500).
If such an ordering cannot be applied, show the data with the same consecutive values (300-300-300).
Something like this.
SELECT ROW_NUMBER() OVER (PARTITION BY manf_id ORDER BY id) AS rn, *
FROM Yourtable
ORDER BY rn,id
Try this.
;with cte as (
select id,prod_id,manf_id,ROW_NUMBER() over(partition by manf_id order by id) as row_no
from products
)
select id,prod_id,manf_id from cte
order by row_no
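Both answers rely on the same idea: number the rows within each manufacturer, then sort by that number so the manufacturers are dealt out round by round. Here is a minimal sketch of it on the sample data, using Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is assumed for ROW_NUMBER). The id tie-breaker in the final ORDER BY is an addition of this sketch to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, prod_id INTEGER, manf_id INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 100, 300), (2, 101, 300), (3, 102, 400), (4, 103, 400), (5, 103, 500)],
)

# rn = 1 for each manufacturer's first row, 2 for its second, and so on.
# Sorting by rn first interleaves the manufacturers round by round.
ordered = conn.execute(
    """
    SELECT id, prod_id, manf_id
    FROM (
        SELECT id, prod_id, manf_id,
               ROW_NUMBER() OVER (PARTITION BY manf_id ORDER BY id) AS rn
        FROM products
    )
    ORDER BY rn, id
    """
).fetchall()

manf_sequence = [row[2] for row in ordered]
print(ordered)
print(manf_sequence)
```

This is not a general guarantee: when one manufacturer has many more rows than the others, the later rounds contain only that manufacturer and its rows end up adjacent, which corresponds to the 300-300-300 fallback described in the question.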
