Select different values in column from the same table ID in SQL - sql-server

I have searched a lot but couldn't find anything related to my problem. I have a dataset like this:
Column1 Column2
A B
A B
A C
X B
X B
Y C
Y B
T A
T A
T A
I can get the distinct values of Column1 with the total number of occurrences, but what I actually want is to remove the rows whose Column1 value always has the same Column2 value. When I run the query, the result should look like this:
Column1 Column2
A B
A B
A C
Y C
Y B
As shown above, only A and Y have more than one distinct value in Column2. How do I query this? I am using SQL Server 2014.

You want to count the occurrences of each pairing before you do anything else:
SELECT ColumnA, ColumnB, count(*)
FROM [source]
GROUP BY ColumnA, ColumnB
That will give you a list of each pairing and how often it occurs. Next you want to count how many pairings each value in ColumnA has, and cut out the ones with only one option:
SELECT ColumnA, count(*) AS PairCount
FROM
(
SELECT ColumnA, ColumnB, count(*) AS Occurrences
FROM [source]
GROUP BY ColumnA, ColumnB
) AS pairs
GROUP BY ColumnA
HAVING count(*) > 1
That will give you a list of ColumnA values that you're looking for. From there you want to look for those values of ColumnA in your original data with a WHERE .. IN statement:
SELECT ColumnA, ColumnB
FROM [source]
WHERE ColumnA IN
(
SELECT ColumnA
FROM
(
SELECT ColumnA, ColumnB, count(*) AS Occurrences
FROM [source]
GROUP BY ColumnA, ColumnB
) AS pairs
GROUP BY ColumnA
HAVING count(*) > 1
)

COUNT(DISTINCT...) might work with a CTE:
; WITH CTE AS (
SELECT Column1
FROM [my_table]
GROUP BY Column1
HAVING COUNT(DISTINCT Column2) > 1
)
SELECT t.*
FROM [my_table] t
JOIN CTE ON CTE.Column1 = t.Column1;
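As a quick check of the COUNT(DISTINCT ...) approach, here is a minimal sketch that runs the same query against the sample data from the question. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for SQL Server, and `my_table` is the placeholder table name from the answer above, so treat it as an illustration rather than a SQL Server test.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption);
# my_table is the placeholder table name used in the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (Column1 TEXT, Column2 TEXT)")
rows = [("A", "B"), ("A", "B"), ("A", "C"), ("X", "B"), ("X", "B"),
        ("Y", "C"), ("Y", "B"), ("T", "A"), ("T", "A"), ("T", "A")]
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

# Keep only the Column1 values that have more than one distinct Column2,
# then join back to return every original row for those values.
query = """
WITH CTE AS (
    SELECT Column1
    FROM my_table
    GROUP BY Column1
    HAVING COUNT(DISTINCT Column2) > 1
)
SELECT t.Column1, t.Column2
FROM my_table t
JOIN CTE ON CTE.Column1 = t.Column1
ORDER BY t.rowid
"""
varying_rows = conn.execute(query).fetchall()
print(varying_rows)
```

Only the A and Y rows come back, because X and T each map to a single Column2 value.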

Related

Snowflake - is it possible to "merge" array output to table query

I have two inputs:
a. Table A with 3 columns (a, b, c) and 7 rows
b. Table B with an array column d; the array has 7 values.
Is there a way to "merge" Table A and Table B in a query, so that the first row of A and the first value of B's array are printed in the same row?
Input:
For example:
Table A- Column 1
a
b
c
Table A- Column 2
d
e
f
Table b - column 1 (and only)
k
r
j
OUTPUT
should be three columns:
two first columns : column a, b (from table a)
third column - which is column 1 from table 2
You can use Flatten to convert your array into a table. Then you just need to define ordering and how to join them together.
Here's an example, where I use a ROW_NUMBER() to provide an ordering. And since I added it to both tableA and the flattened tableB, it can also be used to join the records together.
If you have IDs for ordering or joining, then that might be slightly cleaner, but with the simple columns provided in the example, you need to do something like this to line up the array values to the table rows.
with tA as (
select col1, col2,
ROW_NUMBER() OVER(ORDER BY col1) rnum
from tableA
)
,tB as (
select
x.value::string col1,
ROW_NUMBER() OVER(ORDER BY x.index) rnum
from tableB ,
lateral flatten(input => array) x
)
SELECT a.col1, a.col2, b.col1
FROM tA a
JOIN tB b
ON a.rnum = b.rnum;
There is no "first row" in a database; there is only a first row with respect to an order you place on the data. Thus there are a number of ways to get just the "first row" of table A and table B.
One is to select only the first rows. I will use CTEs, but it can be done in a sub-select as well, via QUALIFY and ROW_NUMBER; to make it stable I will order by the selected output column. To join the two tables I will use a CROSS JOIN, and since each CTE returns only one row, this gives just one output row. But this does not feel like what you want.
WITH first_from_table_a AS (
SELECT column1, column2
FROM table_a
QUALIFY ROW_NUMBER() OVER (ORDER BY column1) = 1
), first_from_table_b AS (
SELECT column1
FROM table_b
QUALIFY ROW_NUMBER() OVER (ORDER BY column1) = 1
)
SELECT a.column1, a.column2, b.column1
FROM first_from_table_a AS a
CROSS JOIN first_from_table_b AS b
The problem with this is that it doesn't scale if there are other things you want to do this over.
FIRST_VALUE is a function that could also help if you join your data on some other schema and want to choose a value from a larger set, but really the problem needs to be clarified more.
Another way to consider your question is to use the same ROW_NUMBER idea, and join on those, thus:
WITH first_from_table_a AS (
SELECT column1,
column2,
ROW_NUMBER() OVER (ORDER BY column1) AS rn
FROM table_a
), first_from_table_b AS (
SELECT column1,
ROW_NUMBER() OVER (ORDER BY column1) AS rn
FROM table_b
)
SELECT a.column1, a.column2, b.column1
FROM first_from_table_a AS a
JOIN first_from_table_b AS b
ON a.rn = b.rn
ORDER BY a.rn
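To see the row-number join in action, here is a small sketch using the sample values from the question. It assumes SQLite (through Python's sqlite3 module, version 3.25 or later for window functions) in place of Snowflake, and it adds a hypothetical `idx` column to `table_b` to play the role of the array position, since a table has no inherent row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (column1 TEXT, column2 TEXT)")
# idx is a hypothetical column holding each value's position in the array.
conn.execute("CREATE TABLE table_b (idx INTEGER, column1 TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)",
                 [("a", "d"), ("b", "e"), ("c", "f")])
conn.executemany("INSERT INTO table_b VALUES (?, ?)",
                 [(0, "k"), (1, "r"), (2, "j")])

# Number both sides by an explicit order, then join on that number.
query = """
WITH a AS (
    SELECT column1, column2,
           ROW_NUMBER() OVER (ORDER BY column1) AS rn
    FROM table_a
), b AS (
    SELECT column1,
           ROW_NUMBER() OVER (ORDER BY idx) AS rn
    FROM table_b
)
SELECT a.column1, a.column2, b.column1
FROM a
JOIN b ON a.rn = b.rn
ORDER BY a.rn
"""
merged = conn.execute(query).fetchall()
print(merged)
```

Ordering table_b by its values instead of by position would pair the rows differently (j, k, r), which is why the ordering column matters.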

How to test against a list of items in an if statement

I have a large table (130 columns). It is a monthly dataset that is separated by month (Jan, Feb, Mar, ...). Every month I get a small set of duplicate rows. I would like to remove one of the rows; it does not matter which row is deleted.
This query seems to work ok when I only select the ID that I want to filter the dups on, but when I select everything "*" from the table I end up with all of the rows, dups included. My goal is to filter out the dups and insert the result set into a new table.
SELECT DISTINCT a.[ID]
FROM MonthlyLoan a
JOIN (SELECT COUNT(*) as Count, b.[ID]
FROM MonthlyLoan b
GROUP BY b.[ID])
AS b ON a.[ID] = b.[ID]
WHERE b.Count > 1
and effectiveDate = '01/31/2017'
Any help will be appreciated.
This will show you all duplicates per ID:
;WITH Duplicates AS
(
SELECT ID,
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM MonthlyLoan
)
SELECT ID,
rn
FROM Duplicates
WHERE rn > 1
Alternatively, you can filter on rn = 2 to find only the first duplicate of each ID.
Since your ID is duplicated (a duplicated ID!), all you need is to use the HAVING clause in your aggregate.
See the example below.
declare @tableA as table
(
ID int not null
)
insert into @tableA
values
(1),(2),(2),(3),(3),(3),(4),(5)
select ID, COUNT(*) as [Count]
from @tableA
group by ID
having COUNT(*) > 1
Result:
ID Count
----------- -----------
2 2
3 3
To insert the result into a #Temporary Table:
select ID, COUNT(*) as [Count]
into #temp
from @tableA
group by ID
having COUNT(*) > 1
select * from #temp
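The original goal was to copy the data into a new table with one row per ID. A minimal sketch of that step, combining the ROW_NUMBER() idea above with a filter on rn = 1, might look like the following. It assumes SQLite (Python's sqlite3, version 3.25 or later) instead of SQL Server, a hypothetical three-column version of MonthlyLoan, and a hypothetical target table named MonthlyLoanDeduped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column stand-in for the 130-column MonthlyLoan table.
conn.execute(
    "CREATE TABLE MonthlyLoan (ID INTEGER, effectiveDate TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO MonthlyLoan VALUES (?, ?, ?)",
    [(1, "2017-01-31", 100.0),
     (2, "2017-01-31", 200.0),
     (2, "2017-01-31", 200.0),   # duplicate of ID 2
     (3, "2017-01-31", 300.0)],
)

# Number the rows within each ID and keep only the first row of each group.
conn.execute("""
    CREATE TABLE MonthlyLoanDeduped AS
    SELECT ID, effectiveDate, balance
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS rn
        FROM MonthlyLoan
        WHERE effectiveDate = '2017-01-31'
    )
    WHERE rn = 1
""")
deduped_ids = [r[0] for r in
               conn.execute("SELECT ID FROM MonthlyLoanDeduped ORDER BY ID")]
print(deduped_ids)
```

In SQL Server the same shape would typically be written with SELECT ... INTO and an aliased derived table or CTE.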

Oracle: Delete Duplicate Rows, Comparison Excluding ID Column [duplicate]

I'm testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.
How can I delete all duplicate rows and leave only one of them?
Use the rowid pseudocolumn.
DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);
Where column1, column2, and column3 make up the identifying key for each record. You might list all your columns.
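The MIN(rowid) pattern can be tried out without an Oracle instance. The sketch below assumes SQLite (through Python's sqlite3 module), whose integer rowid is only an analogue of Oracle's ROWID pseudocolumn, and reuses the placeholder names your_table and column1..column3 from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (column1 TEXT, column2 TEXT, column3 TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "b", "c"), ("a", "b", "c"),
     ("x", "y", "z"), ("x", "y", "z"),
     ("p", "q", "r")],
)

# Keep the lowest rowid of every (column1, column2, column3) group
# and delete everything else.
deleted = conn.execute("""
    DELETE FROM your_table
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM your_table
        GROUP BY column1, column2, column3
    )
""").rowcount
remaining = conn.execute(
    "SELECT column1, column2, column3 FROM your_table ORDER BY rowid"
).fetchall()
print(deleted, remaining)
```

Three of the six rows are removed, leaving one row per distinct combination.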
From Ask Tom
delete from t
where rowid IN ( select rid
from (select rowid rid,
row_number() over (partition by
companyid, agentid, class , status, terminationdate
order by rowid) rn
from t)
where rn <> 1);
(fixed the missing parenthesis)
From DevX.com:
DELETE FROM our_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM our_table
GROUP BY column1, column2, column3...) ;
Where column1, column2, etc. is the key you want to use.
DELETE FROM tablename a
WHERE a.ROWID > ANY (SELECT b.ROWID
FROM tablename b
WHERE a.fieldname = b.fieldname
AND a.fieldname2 = b.fieldname2)
Solution 1)
delete from emp
where rowid not in
(select max(rowid) from emp group by empno);
Solution 2)
delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);
Solution 3)
delete from emp e1
where rowid not in
(select max(rowid) from emp e2
where e1.empno = e2.empno );
create table t2 as select distinct * from t1;
You should do a small pl/sql block using a cursor for loop and delete the rows you don't want to keep. For instance:
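declare
prev_var my_table.var1%TYPE;
begin
for t in (select rowid as rid, var1 from my_table order by var1) loop
-- if the previous value equals the current value, delete the row; otherwise keep going
if t.var1 = prev_var then
delete from my_table where rowid = t.rid;
end if;
prev_var := t.var1;
end loop;
end;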
To select the duplicates only the query format can be:
SELECT GroupFunction(column1), GroupFunction(column2),...,
COUNT(column1), column1, column2...
FROM our_table
GROUP BY column1, column2, column3...
HAVING COUNT(column1) > 1
So the correct query as per other suggestion is:
DELETE FROM tablename a
WHERE a.ROWID > ANY (SELECT b.ROWID
FROM tablename b
WHERE a.fieldname = b.fieldname
AND a.fieldname2 = b.fieldname2
AND ....so on.. to identify the duplicate rows....)
This query keeps the row with the lowest ROWID for the criteria chosen in the WHERE clause (often, but not necessarily, the oldest record).
create table abcd(id number(10),name varchar2(20))
insert into abcd values(1,'abc')
insert into abcd values(2,'pqr')
insert into abcd values(3,'xyz')
insert into abcd values(1,'abc')
insert into abcd values(2,'pqr')
insert into abcd values(3,'xyz')
select * from abcd
id Name
1 abc
2 pqr
3 xyz
1 abc
2 pqr
3 xyz
Delete the duplicate records but keep one distinct record in the table:
DELETE
FROM abcd a
WHERE ROWID > (SELECT MIN(ROWID) FROM abcd b
WHERE b.id=a.id
);
Running the above query deletes 3 rows:
select * from abcd
id Name
1 abc
2 pqr
3 xyz
This blog post was really helpful for general cases:
If the rows are fully duplicated (all values in all columns can have copies) there are no columns to use! But to keep one you still need a unique identifier for each row in each group.
Fortunately, Oracle already has something you can use. The rowid.
All rows in Oracle have a rowid. This is a physical locator. That is, it states where on disk Oracle stores the row. This is unique to each row. So you can use this value to identify and remove copies. To do this, replace min() with min(rowid) in the uncorrelated delete:
delete films
where rowid not in (
select min(rowid)
from films
group by title, uk_release_date
)
The Fastest way for really big tables
Create exception table with structure below:
exceptions_table
ROW_ID ROWID
OWNER VARCHAR2(30)
TABLE_NAME VARCHAR2(30)
CONSTRAINT VARCHAR2(30)
Try to create a unique constraint or primary key that the duplicates will violate. You will get an error message because you have duplicates, and the exceptions table will contain the rowids of the duplicate rows.
alter table add constraint
unique --or primary key
(dupfield1,dupfield2) exceptions into exceptions_table;
Join your table with exceptions_table by rowid and delete dups
delete original_dups where rowid in (select ROW_ID from exceptions_table);
If the number of rows to delete is large, create a new table (with all grants and indexes) by anti-joining with exceptions_table on rowid, then rename the original table to original_dups and rename new_table_with_no_dups to the original table name:
create table new_table_with_no_dups AS (
select field1, field2 ........
from original_dups t1
where not exists ( select null from exceptions_table T2 where t1.rowid = t2.row_id )
)
Using rowid-
delete from emp
where rowid not in
(select max(rowid) from emp group by empno);
Using self join-
delete from emp e1
where rowid not in
(select max(rowid) from emp e2
where e1.empno = e2.empno );
Solution 4)
delete from emp where rowid in
(
select rid from
(
select rowid rid,
dense_rank() over(partition by empno order by rowid
) rn
from emp
)
where rn > 1
);
Solution 5)
delete from emp where rowid in
(
select rid from
(
select rowid rid,rank() over (partition by emp_id order by rowid)rn from emp
)
where rn > 1
);
DELETE from table_name where rowid not in (select min(rowid) FROM table_name group by column_name);
and you can also delete duplicate records in another way
DELETE from table_name a where rowid > (select min(rowid) FROM table_name b where a.column=b.column);
DELETE FROM tableName WHERE ROWID NOT IN (SELECT MIN (ROWID) FROM tableName GROUP BY columnname);
delete from dept
where rowid in (
select rowid
from dept
minus
select max(rowid)
from dept
group by DEPTNO, DNAME, LOC
);
For best performance, here is what I wrote :
(see execution plan)
DELETE FROM your_table
WHERE rowid IN
(select t1.rowid from your_table t1
LEFT OUTER JOIN (
SELECT MIN(rowid) as rid, column1,column2, column3
FROM your_table
GROUP BY column1, column2, column3
) co1 ON (t1.rowid = co1.rid)
WHERE co1.rid IS NULL
);
Check below scripts -
1.
Create table test(id int,sal int);
2.
insert into test values(1,100);
insert into test values(1,100);
insert into test values(2,200);
insert into test values(2,200);
insert into test values(3,300);
insert into test values(3,300);
commit;
3.
select * from test;
You will see here 6-records.
4.run below query -
delete from
test
where rowid in
(select rid from
(select
rowid rid,
row_number()
over
(partition by id order by sal) dup
from test)
where dup > 1);
select * from test;
You will see that duplicate records have been deleted.
Hope this solves your query.
Thanks :)
I didn't see any answers that use common table expressions and window functions.
This is what I find easiest to work with.
DELETE FROM
YourTable
WHERE
ROWID IN
(WITH Duplicates
AS (SELECT
ROWID RID,
ROW_NUMBER()
OVER(
PARTITION BY First_Name, Last_Name, Birth_Date
ORDER BY ROWID)
AS RN,
SUM(1)
OVER(
PARTITION BY First_Name, Last_Name, Birth_Date
ORDER BY ROWID ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
AS CNT
FROM
YourTable
WHERE
Load_Date IS NULL)
SELECT
RID
FROM
duplicates
WHERE
RN > 1);
Some things to note:
1) We are only checking for duplication on the fields in the partition clause.
2) If you have some reason to pick one duplicate over the others, you can use an ORDER BY clause so that the row you want to keep gets row_number() = 1.
3) You can change the number of duplicates preserved by changing the final where clause to "WHERE RN > N" with N >= 1 (I was thinking N = 0 would delete all rows that have duplicates, but it would just delete all rows).
4) I added the SUM window column to the CTE query, which tags each row with the number of rows in its group. So to select rows with duplicates, including the first item, use "WHERE cnt > 1".
solution :
delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);
create or replace procedure delete_duplicate_enq as
cursor c1 is
select *
from enquiry;
begin
for z in c1 loop
delete enquiry
where enquiry.enquiryno = z.enquiryno
and rowid > any
(select rowid
from enquiry
where enquiry.enquiryno = z.enquiryno);
end loop;
end delete_duplicate_enq;
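This is similar to the top answer but gives me a much better explain plan. Note that it deletes only one row per duplicate group on each run, so it has to be repeated if a group can contain more than two copies: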
delete from your_table
where rowid in (
select max(rowid)
from your_table
group by column1, column2, column3
having count(*) > 1
);
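One property of the MAX(rowid) ... HAVING COUNT(*) > 1 form is that each execution removes at most one row per duplicate group. The sketch below illustrates this by running the statement in a loop until nothing is deleted. It assumes SQLite (Python's sqlite3) as a stand-in for Oracle and a hypothetical single-column version of your_table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column table; 'x' appears three times, 'y' once.
conn.execute("CREATE TABLE your_table (column1 TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?)",
                 [("x",), ("x",), ("x",), ("y",)])

delete_sql = """
    DELETE FROM your_table
    WHERE rowid IN (
        SELECT MAX(rowid)
        FROM your_table
        GROUP BY column1
        HAVING COUNT(*) > 1
    )
"""

# Repeat the delete until a pass removes no rows.
deleted_per_pass = []
while True:
    n = conn.execute(delete_sql).rowcount
    deleted_per_pass.append(n)
    if n == 0:
        break

remaining = conn.execute(
    "SELECT column1 FROM your_table ORDER BY rowid"
).fetchall()
print(deleted_per_pass, remaining)
```

With three copies of a value, two passes are needed before the third pass finds nothing left to delete.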

Why doesn't this union give me two columns?

I want two columns in the output of the join. I only get one, the storeID. The StoreComponentID is not there.
If you want two columns, you need to declare two columns:
SELECT column1, NULL as column2 -- even when Table1 doesn't have column2
FROM Table1
UNION
SELECT NULL as column1, column2 -- even when Table2 doesn't have column1
FROM Table2
Now, if you want some kind of side-by-side merge:
WITH idA as (
SELECT StoreComponentID,
ROW_NUMBER() OVER (ORDER BY StoreComponentID) as rn
FROM StoreComponent
), idB as (
SELECT StoreID,
ROW_NUMBER() OVER (ORDER BY StoreID) as rn
FROM Store
)
SELECT idA.StoreComponentID,
idB.StoreID
FROM idA
FULL JOIN idB
ON idA.rn = idB.rn
I figured out a simple solution:
select S.storeid as sID, SC.storecomponentid as SCID from tstore as S, tstorecomponent as SC

How to join sequential numbers to unrelated data (SQL Server)

This question is a followup to a previous question I had about discovering unused sequential number ranges without having to resort to cursors (Working with sequential numbers in SQL Server 2005 without cursors). I'm using SQL Server 2005.
What I need to do with those numbers is to assign those numbers to records in a table. I just can't seem to come up with a way to actually relate the numbers table with the records that need those numbers.
One possible solution that came to mind was insert the records in a temp table using an identity and using the beginning of the number range as an identity seed. The only problem with this approach is that if there are gaps in the number sequence then I'll end up with duplicate control numbers.
This is what my tables look like (overly simplified):
Numbers table:
Number
-------
102314
102315
102319
102320
102324
102329
Data table:
CustomerId PaymentAmt ControlNumber
---------- ---------- -------------
1001 4502.01 NULL
1002 890.00 NULL
9830 902923.34 NULL
I need a way to end up with:
CustomerId PaymentAmt ControlNumber
---------- ---------- -------------
1001 4502.01 102314
1002 890.00 102315
9830 902923.34 102319
Is this possible without having to use cursors? The reason I'm avoiding cursors is that our current implementation uses them, and since it's so slow (8 minutes over 12,000 records) I was looking for alternatives.
Note: Thanks to all who posted answers. All of them were great, I had to pick the one that seemed easier to implement and easiest to maintain for whomever comes after me. Much appreciated.
Try this:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY CustomerId) Corr
FROM DataTable
)
UPDATE CTE
SET CTE.ControlNumber = B.Number
FROM CTE
JOIN ( SELECT Number, ROW_NUMBER() OVER(ORDER BY Number) Corr
FROM NumberTable) B
ON CTE.Corr = B.Corr
Building on Martin's code from the linked question, you could give all rows without a control number a row number. Then give all unused numbers a row number. Join the two sets together, and you get a unique number per row:
DECLARE @StartRange int, @EndRange int
SET @StartRange = 790123401
SET @EndRange = 790123450;
; WITH YourTable(ControlNumber, CustomerId) AS
(
SELECT 790123401, 1000
UNION ALL SELECT 790123402, 1001
UNION ALL SELECT 790123403, 1002
UNION ALL SELECT 790123406, 1003
UNION ALL SELECT NULL, 1004
UNION ALL SELECT NULL, 1005
UNION ALL SELECT NULL, 1006
)
, YourTableNumbered(rn, ControlNumber, CustomerId) AS
(
select row_number() over (
partition by IsNull(ControlNumber, -1)
order by ControlNumber)
, *
from YourTable
)
, Nums(N) AS
(
SELECT @StartRange
UNION ALL
SELECT N+1
FROM Nums
WHERE N < @EndRange
)
)
, UnusedNums(rn, N) as
(
select row_number() over (order by Nums.N)
, Nums.N
from Nums
where not exists
(
select *
from YourTable yt
where yt.ControlNumber = Nums.N
)
)
select ytn.CustomerId
, IsNull(ytn.ControlNumber, un.N)
from YourTableNumbered ytn
left join
UnusedNums un
on un.rn = ytn.rn
OPTION (MAXRECURSION 0)
All you need is a deterministic order in data table. If you have that, you can use ROW_NUMBER() as a join condition:
with cte as (
select row_number() over (order by CustomerId) as [row_number],
ControlNumber
from [Data Table]
where ControlNumber is null),
nte as (
select row_number() over (order by Number) as [row_number],
Number
from [Numbers])
update cte
set ControlNumber = Number
from cte
join nte on nte.[row_number] = cte.[row_number];
If you need it to be concurrency-proof, it does get more complex.
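The ROW_NUMBER() pairing can be checked against the sample data from the question. This sketch assumes SQLite (Python's sqlite3, version 3.25 or later) rather than SQL Server 2005, uses the hypothetical table names Numbers and DataTable, and applies the update from Python with executemany instead of a single UPDATE ... FROM statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Numbers (Number INTEGER)")
conn.execute(
    "CREATE TABLE DataTable "
    "(CustomerId INTEGER, PaymentAmt REAL, ControlNumber INTEGER)"
)
conn.executemany(
    "INSERT INTO Numbers VALUES (?)",
    [(102314,), (102315,), (102319,), (102320,), (102324,), (102329,)],
)
conn.executemany(
    "INSERT INTO DataTable VALUES (?, ?, NULL)",
    [(1001, 4502.01), (1002, 890.00), (9830, 902923.34)],
)

# Pair the n-th unnumbered customer with the n-th available number.
pairs = conn.execute("""
    WITH d AS (
        SELECT CustomerId,
               ROW_NUMBER() OVER (ORDER BY CustomerId) AS rn
        FROM DataTable
        WHERE ControlNumber IS NULL
    ),
    n AS (
        SELECT Number,
               ROW_NUMBER() OVER (ORDER BY Number) AS rn
        FROM Numbers
    )
    SELECT n.Number, d.CustomerId
    FROM d
    JOIN n ON n.rn = d.rn
""").fetchall()

# Write the assigned numbers back to the data table.
conn.executemany(
    "UPDATE DataTable SET ControlNumber = ? WHERE CustomerId = ?", pairs
)
assigned = conn.execute(
    "SELECT CustomerId, ControlNumber FROM DataTable ORDER BY CustomerId"
).fetchall()
print(assigned)
```

Gaps in the number sequence do not matter here, because the join is on the position within each ordered set and not on the number values themselves.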
EDITED: added code to remove used values from @Number, via the OUTPUT clause of the UPDATE and a DELETE
Try using ROW_NUMBER() to join them:
DECLARE @Number table (Value int)
INSERT @Number VALUES (102314)
INSERT @Number VALUES (102315)
INSERT @Number VALUES (102319)
INSERT @Number VALUES (102320)
INSERT @Number VALUES (102324)
INSERT @Number VALUES (102329)
DECLARE @Data table (CustomerId int, PaymentAmt numeric(10,2),ControlNumber int)
INSERT @Data VALUES (1001, 4502.01 ,NULL)
INSERT @Data VALUES (1002, 890.00 ,NULL)
INSERT @Data VALUES (9830, 902923.34 ,NULL)
DECLARE @Used table (Value int)
;WITH RowNumber AS
(
SELECT Value,ROW_NUMBER() OVER(ORDER BY Value) AS RowNumber FROM @Number
)
,RowData AS
(
SELECT CustomerId,ROW_NUMBER() OVER(ORDER BY CustomerId) AS RowNumber, ControlNumber FROM @Data WHERE ControlNumber IS NULL
)
UPDATE d
SET ControlNumber=r.Value
OUTPUT r.Value INTO @Used
FROM RowData d
INNER JOIN RowNumber r ON d.RowNumber=r.RowNumber
DELETE @Number WHERE Value IN (SELECT Value FROM @Used)
SELECT * FROM @Data
SELECT * FROM @Number
OUTPUT:
CustomerId PaymentAmt ControlNumber
----------- --------------------------------------- -------------
1001 4502.01 102314
1002 890.00 102315
9830 902923.34 102319
(3 row(s) affected)
Value
-----------
102320
102324
102329
(3 row(s) affected)
You'll need something to join the two tables together. Some data value that you can match between the two tables.
I'm assuming there's more to your numbers table than just one column of numbers. If there's anything in there that you can match to your data table you can get away with an update.
How are you updating the data table using cursors?
