T-SQL Remove "duplicate/non-interesting" data rows

I have the following set of data (sample given):
ID Status Code Type ModDate
1234 1 1 AB 1995-04-01
1234 1 1 CD 1998-08-31
1234 1 1 AB 2003-08-31
1234 1 NULL AB 2008-11-08
1234 1 2 AB 2013-11-09
1234 1 1 EF 2013-11-18
...
As this data has to be viewed on a timeline, I want to read only the following rows from the database, since only the Type changes are of interest:
ID Status Code Type ModDate
1234 1 1 AB 1995-04-01
1234 1 1 CD 1998-08-31
1234 1 1 AB 2003-08-31
1234 1 1 EF 2013-11-18
...
How can this be done? I tried to partition the data and assign row numbers, but it gives me headaches because all rows of the same Type end up in one group.
SELECT
ID, Status, Code, Type, ModDate,
MIN(ModDate) OVER (PARTITION BY ID, Type) MinModDate,
MAX(ModDate) OVER (PARTITION BY ID, Type) MaxModDate,
ROW_NUMBER() OVER (PARTITION BY ID, Type ORDER BY ModDate) RowNumber
FROM Data
Output:
ID Status Code Type ModDate MinModDate MaxModDate RowNumber
1234 1 1 AB 1995-04-01 1995-04-01 2013-11-09 1
1234 1 1 CD 1998-08-31 1998-08-31 1998-08-31 1
1234 1 1 AB 2003-08-31 1995-04-01 2013-11-09 2
1234 1 NULL AB 2008-11-08 1995-04-01 2013-11-09 3
1234 1 2 AB 2013-11-09 1995-04-01 2013-11-09 4
1234 1 1 EF 2013-11-18 2013-11-18 2013-11-18 1
...
Output expected:
ID Status Code Type ModDate MinModDate MaxModDate RowNumber
1234 1 1 AB 1995-04-01 1995-04-01 2013-11-09 1
1234 1 1 CD 1998-08-31 1998-08-31 1998-08-31 1
1234 1 1 AB 2003-08-31 1995-04-01 2013-11-09 1
1234 1 NULL AB 2008-11-08 1995-04-01 2013-11-09 2
1234 1 2 AB 2013-11-09 1995-04-01 2013-11-09 3
1234 1 1 EF 2013-11-18 2013-11-18 2013-11-18 1
...
Can this be achieved easily without using cursors?

Since you are using SQL Server 2012, this should work:
SELECT ID, Status, Code, Type, ModDate FROM
(
SELECT
ID, Status, Code, Type, ModDate,
-- previous Type for the same ID in date order (NULL on the first row)
LAG(Type, 1) OVER (PARTITION BY ID ORDER BY ModDate) prevtype
FROM Data
) t WHERE Type <> ISNULL(prevtype, '')
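
The LAG idea can be tried outside SQL Server. The following is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module, assuming a bundled SQLite of version 3.25 or later (needed for window functions). The in-memory table and the PARTITION BY ID clause are assumptions made for the illustration, and the NULL check is spelled out instead of using ISNULL.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (1234, 1, 1, "AB", "1995-04-01"),
    (1234, 1, 1, "CD", "1998-08-31"),
    (1234, 1, 1, "AB", "2003-08-31"),
    (1234, 1, None, "AB", "2008-11-08"),
    (1234, 1, 2, "AB", "2013-11-09"),
    (1234, 1, 1, "EF", "2013-11-18"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Data (ID INT, Status INT, Code INT, Type TEXT, ModDate TEXT)"
)
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?, ?)", rows)

# Keep a row only when its Type differs from the previous row of the same ID.
query = """
SELECT ID, Status, Code, Type, ModDate
FROM (
    SELECT ID, Status, Code, Type, ModDate,
           LAG(Type) OVER (PARTITION BY ID ORDER BY ModDate) AS PrevType
    FROM Data
) AS t
WHERE PrevType IS NULL OR Type <> PrevType
ORDER BY ID, ModDate
"""
type_changes = conn.execute(query).fetchall()
for row in type_changes:
    print(row)
```

With the sample data this keeps the 1995 AB row, the CD row, the 2003 AB row and the EF row, which is the result asked for in the question.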

Partitioning the data is the right idea; you just need to partition by Type, since that is the only change of interest. You also need ROW_NUMBER() so that you can filter for the rows you want. Here's an updated query.
;WITH cte AS
(
SELECT ID, [Status], Code, [Type], ModDate
-- number the rows within each Type, oldest first
,rn = ROW_NUMBER() OVER (PARTITION BY [Type] ORDER BY ModDate)
FROM #data
)
SELECT ID, [Status], Code, [Type], ModDate
FROM cte
WHERE rn = 1 -- keeps only the earliest row of each Type
ORDER BY ModDate, [Type]

If I understood correctly, you just need to wrap your original SQL:
SELECT ID, Status, Code, Type, ModDate FROM
(
SELECT
ID, Status, Code, Type, ModDate,
MIN(ModDate) OVER (PARTITION BY ID, Type) MinModDate,
MAX(ModDate) OVER (PARTITION BY ID, Type) MaxModDate,
ROW_NUMBER() OVER (PARTITION BY ID, Type ORDER BY ModDate) RowNumber
FROM Data
) t
WHERE ModDate=MinModDate
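
The "Output expected" table in the question restarts RowNumber every time the Type changes, not once per Type. One common way to get that numbering is the "gaps and islands" trick: the difference between two row numbers is constant within each consecutive run. Below is a minimal sketch of it, run through Python's sqlite3 module (assuming SQLite 3.25 or later); the names RunKey and marked are made up for the illustration.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (1234, 1, 1, "AB", "1995-04-01"),
    (1234, 1, 1, "CD", "1998-08-31"),
    (1234, 1, 1, "AB", "2003-08-31"),
    (1234, 1, None, "AB", "2008-11-08"),
    (1234, 1, 2, "AB", "2013-11-09"),
    (1234, 1, 1, "EF", "2013-11-18"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Data (ID INT, Status INT, Code INT, Type TEXT, ModDate TEXT)"
)
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?, ?)", rows)

# RunKey stays constant while Type does not change between consecutive rows,
# so (ID, Type, RunKey) identifies one uninterrupted run of a Type.
query = """
WITH marked AS (
    SELECT ID, Status, Code, Type, ModDate,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ModDate)
         - ROW_NUMBER() OVER (PARTITION BY ID, Type ORDER BY ModDate) AS RunKey
    FROM Data
)
SELECT ID, Status, Code, Type, ModDate,
       ROW_NUMBER() OVER (PARTITION BY ID, Type, RunKey ORDER BY ModDate) AS RowNumber
FROM marked
ORDER BY ID, ModDate
"""
numbered = conn.execute(query).fetchall()

# Rows that start a run are the Type changes the question is after.
run_starts = [row for row in numbered if row[5] == 1]
for row in numbered:
    print(row)
```

Filtering on RowNumber = 1 then leaves one row per run, including the second AB run that starts in 2003.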

Related

SQL: loop over every entry in a table and create a new one with rowname, Columname and entry

I have a SQL Server table called "tb_Object" that looks like this:
ObjektID Name ID1 ID2 ID3
----------------------------------------
1 O1 1 2.30 0.002
2 O2 2 3.40 0.004
3 O3 1 2.10 0.200
...
I would like to convert it into a table that looks like this:
PK_ID ObjektID Name Value ColName
-----------------------------------------
1 1 O1 1 ID1
2 2 O2 2 ID1
3 3 O3 1 ID1
...
110 1 O1 2.30 ID2
111 2 O2 3.40 ID2
112 3 O3 2.10 ID2
...
220 1 O1 0.002 ID3
221 2 O2 0.004 ID3
222 3 O3 0.200 ID3
...
I know that this is pretty easy in Python, but I have no clue how to do it in SQL/T-SQL, especially because I can't use indexing. Does anyone know a smart way to handle this?

If the number of columns is not too high, you could use UNION to turn the table into the desired format:
;with cte as
(select 1 as ObjektID,'O1' as Name,1 as ID1,2.3 as ID2,0.002 as ID3 UNION
select 2 as ObjektID,'O2' as Name,2 as ID1,3.4 as ID2,0.004 as ID3 UNION
select 3 as ObjektID,'O3' as Name,1 as ID1,2.1 as ID2,0.200 as ID3)
select row_number() over(order by Objektid,Name,ColName) as PK_ID,
a.*
from
(select ObjektID,Name,'ID1' as ColName,ID1 as Value from cte
UNION
select ObjektID,Name,'ID2' as ColName,ID2 as Value from cte
UNION
select ObjektID,Name,'ID3' as ColName,ID3 as Value from cte) a
Hope this helps.
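Edit: Another way to do this is UNPIVOT. The unpivoted columns must all have the same data type, so cast them first:
SELECT row_number() over(order by ObjektID,ColumnName) as PK_ID,
ObjektID, Name, ColumnName, Value
FROM
(select ObjektID,Name,
cast(ID1 as decimal(10,3)) as ID1,
cast(ID2 as decimal(10,3)) as ID2,
cast(ID3 as decimal(10,3)) as ID3
from cte) p
UNPIVOT
(Value FOR ColumnName IN
(ID1, ID2, ID3)) as unpvt;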
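
For experimenting with the reshaping step, here is a minimal sketch of the UNION-style approach run through Python's sqlite3 module (assuming SQLite 3.25 or later for ROW_NUMBER). SQLite has no UNPIVOT, so only the UNION ALL variant is shown. The REAL column types and the ordering by ColName first (to match the layout in the question) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_Object (ObjektID INT, Name TEXT, ID1 REAL, ID2 REAL, ID3 REAL)"
)
conn.executemany(
    "INSERT INTO tb_Object VALUES (?, ?, ?, ?, ?)",
    [(1, "O1", 1, 2.30, 0.002), (2, "O2", 2, 3.40, 0.004), (3, "O3", 1, 2.10, 0.200)],
)

# One SELECT per source column; UNION ALL keeps every row, even if two
# objects happen to share the same name and value.
query = """
SELECT ROW_NUMBER() OVER (ORDER BY ColName, ObjektID) AS PK_ID,
       ObjektID, Name, Value, ColName
FROM (
    SELECT ObjektID, Name, ID1 AS Value, 'ID1' AS ColName FROM tb_Object
    UNION ALL
    SELECT ObjektID, Name, ID2, 'ID2' FROM tb_Object
    UNION ALL
    SELECT ObjektID, Name, ID3, 'ID3' FROM tb_Object
) AS unpivoted
ORDER BY PK_ID
"""
long_rows = conn.execute(query).fetchall()
for row in long_rows:
    print(row)
```

Each source row yields one output row per ID column, so the three sample rows become nine.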

Order by most duplicate record first

create table tbl_dup
(
name varchar(100)
);
insert into tbl_dup values('Arsel Rous'),('Oram Rock'),('Oram Rock'),('Brown Twor'),
('John Mak'),('Mak Dee'),('Smith Will'),('Mak Dee'),
('John Mak'),('Oram Rock'),('John Mak'),('Oram Rock');
Query: I want the records with the most duplicates to appear first in the result set.
select *
from
(
select name,row_number() over(partition by name order by name) rn
from tbl_dup
) a
order by name,rn;
Getting:
name rn
--------------
Arsel Rous 1
Brown Twor 1
John Mak 1
John Mak 2
John Mak 3
Mak Dee 1
Mak Dee 2
Oram Rock 1
Oram Rock 2
Oram Rock 3
Oram Rock 4
Smith Will 1
Expected Result:
name rn
---------------
Oram Rock 1
Oram Rock 2
Oram Rock 3
Oram Rock 4
John Mak 1
John Mak 2
John Mak 3
Mak Dee 1
Mak Dee 2
Arsel Rous 1
Brown Twor 1

Try using COUNT as an analytic function in the ORDER BY clause (name is added as a tie-breaker so that names with the same count do not interleave):
SELECT name, ROW_NUMBER() OVER (PARTITION by name ORDER BY name) rn
FROM tbl_dup
ORDER BY COUNT(*) OVER (PARTITION BY name) DESC, name, rn;

You need to order first by the number of occurrences, descending, and then by row number, ascending. Ordering by name in between keeps names with the same count from interleaving.
select name,rn
from
(
select name,row_number() over(partition by name order by name) rn ,
count(*) over(partition by name) ct
from tbl_dup
) a
order by ct desc, name, rn asc
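
To check the ordering against the sample data, here is a minimal sketch that runs the count-based ordering through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). Sorting by name between the count and the row number is an assumption added so that names with equal counts stay grouped.

```python
import sqlite3

# Names copied from the question's INSERT statement.
names = [
    "Arsel Rous", "Oram Rock", "Oram Rock", "Brown Twor",
    "John Mak", "Mak Dee", "Smith Will", "Mak Dee",
    "John Mak", "Oram Rock", "John Mak", "Oram Rock",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_dup (name TEXT)")
conn.executemany("INSERT INTO tbl_dup VALUES (?)", [(n,) for n in names])

# ct is the number of rows sharing a name; the most frequent names sort first.
query = """
SELECT name, rn
FROM (
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS rn,
           COUNT(*) OVER (PARTITION BY name) AS ct
    FROM tbl_dup
) AS a
ORDER BY ct DESC, name, rn
"""
ordered = conn.execute(query).fetchall()
for row in ordered:
    print(row)
```

Oram Rock (4 rows) comes first, followed by John Mak (3), Mak Dee (2), and then the names that occur once.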

SQL Server : LEFT JOIN duplicated row

Table: DataTable1
ID SKU QTY
-----------------------
1 AAAA 1
2 BBBB 1
3 CCCC 1
4 CCCC 1
Table: DataTable2
ID assign_id SKU
-----------------------------
123 99 AAAA
124 99 CCCC
Is there any way to get a result like this?
Thank you in advance.
ID SKU QTY AssignID
-----------------------------------
1 AAAA 1 99
2 BBBB 1 NULL
3 CCCC 1 99
4 CCCC 1 NULL

Not sure, but this is probably what you need:
with DataTable1(ID , SKU , QTY ) as(
select 1 ,'AAAA', 1 union all
select 2 ,'BBBB', 1 union all
select 3 ,'CCCC', 1 union all
select 4 ,'CCCC', 1
),
DataTable2(ID , assign_id , SKU) as (
select 123, 99 ,'AAAA' union all
select 124 , 99 ,'CCCC'
)
-- only the first row of each SKU (lowest ID) keeps the assign_id
select t.ID, t.SKU, t.QTY, case when rn = 1 then assign_id else null end as assign_id
from (
select DataTable1.*, DataTable2.assign_id, row_number() over(partition by DataTable1.SKU order by DataTable1.ID) as rn
from DataTable1
left join DataTable2
on DataTable1.SKU = DataTable2.SKU
) t
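
As a quick check of this approach, here is a minimal sketch run through Python's sqlite3 module (assuming SQLite 3.25 or later for ROW_NUMBER). It uses the SKU values from the question and a plain equality join, which is an assumption about how the two tables are meant to match; the aliases d1 and d2 are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataTable1 (ID INT, SKU TEXT, QTY INT)")
conn.execute("CREATE TABLE DataTable2 (ID INT, assign_id INT, SKU TEXT)")
conn.executemany(
    "INSERT INTO DataTable1 VALUES (?, ?, ?)",
    [(1, "AAAA", 1), (2, "BBBB", 1), (3, "CCCC", 1), (4, "CCCC", 1)],
)
conn.executemany(
    "INSERT INTO DataTable2 VALUES (?, ?, ?)",
    [(123, 99, "AAAA"), (124, 99, "CCCC")],
)

# Number the rows of each SKU by ID and show assign_id only on the first one.
query = """
SELECT t.ID, t.SKU, t.QTY,
       CASE WHEN t.rn = 1 THEN t.assign_id END AS AssignID
FROM (
    SELECT d1.ID, d1.SKU, d1.QTY, d2.assign_id,
           ROW_NUMBER() OVER (PARTITION BY d1.SKU ORDER BY d1.ID) AS rn
    FROM DataTable1 AS d1
    LEFT JOIN DataTable2 AS d2 ON d2.SKU = d1.SKU
) AS t
ORDER BY t.ID
"""
assigned = conn.execute(query).fetchall()
for row in assigned:
    print(row)
```

Note that this only blanks out repeated matches; if DataTable2 could hold several rows for one SKU, the join itself would still multiply rows and a different rule would be needed.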

how to use distinct on multiple columns

I have 6 fields:
f1,f2,f3,f4,f5,f6
Only fields 4 to 6 vary. I want a single row as the result, based on field 1.
E.g.:
name , age, policy_no, proposer_code, entry_date , status
-----------------------------------------------------------------------------
aaa 18 100002 101 20-06-2016 A
aaa 18 100002 101 21-06-2016 B
aaa 18 100002 101 22-06-2016 c
aaa 18 100002 101 24-06-2016 H
aaa 18 100002 101 26-06-2016 p
I want only the last row, based on proposer code, because it has the most recent entry date.

If I understand correctly, you just want to use row_number() like this:
select t.*
from (select t.*,
row_number() over (partition by name order by entry_date desc) as seqnum
from t
) t
where seqnum = 1;
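
The "latest row per group" pattern can be tried with Python's sqlite3 module (assuming SQLite 3.25 or later for ROW_NUMBER). This is a minimal sketch: the table name policy is made up, the partition is on proposer_code as the question describes, and the dates are rewritten as ISO yyyy-mm-dd text so that they sort correctly as strings, which the dd-mm-yyyy format in the question would not do in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy (name TEXT, age INT, policy_no INT, "
    "proposer_code INT, entry_date TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("aaa", 18, 100002, 101, "2016-06-20", "A"),
        ("aaa", 18, 100002, 101, "2016-06-21", "B"),
        ("aaa", 18, 100002, 101, "2016-06-22", "c"),
        ("aaa", 18, 100002, 101, "2016-06-24", "H"),
        ("aaa", 18, 100002, 101, "2016-06-26", "p"),
    ],
)

# seqnum = 1 marks the newest entry_date within each proposer_code.
query = """
SELECT name, age, policy_no, proposer_code, entry_date, status
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY proposer_code
                              ORDER BY entry_date DESC) AS seqnum
    FROM policy AS p
) AS ranked
WHERE seqnum = 1
"""
latest = conn.execute(query).fetchall()
print(latest)
```

If the real column is a proper date type, no conversion is needed and the ORDER BY works as written.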

In Oracle, you can use the query below to get this result set.
select name ,
age,
policy_no,
proposer_code,
entry_date ,
status
from (
select name ,
age,
policy_no,
proposer_code,
entry_date ,
status,
rank()over(partition by name ,age,policy_no, proposer_code order by entry_date desc) rnk
from test
group by name , age, policy_no, proposer_code ,entry_date , status ) a
where a.rnk = 1;

ORACLE Join table to select multiple record in 1 row

I have 2 tables:
TableA: main table with 1 record for each ID
TableB: can contain multiple records for each ID
I want to select everything from TableA and TableB by ID, with one line per ID in the end result.
Example
Sample Records
From TableA
ID Name Address
--- ---- ----------
1 Jack 123 blahST
2 John 234 blahAVE
321 Sam 2123 blahWay
From TableB
ID AccNO
-- -------------
1 12345
1 345345
1 443453
2 99999
3 88888
3 77777
End Result should be like
Select TableA.ID,
TableA.Name,
TableA.Address,
TableB.ID,
TableB.AccNO1,
TableB.AccNO2, --if it exist
TableB.AccNO3, --if it exist
TableB.AccNO4, --if it exist MAX
From TableA Full Outer Join TableB on TableB.ID = TableA.ID
TableA.ID TableA.Name TableA.Address TableB.ID TableB.AccNO1 TableB.AccNO2 TableB.AccNO3 TableB.AccNO4
--------- ----------- -------------- --------- ------------ ------------- ------------- -------------
1 Jack 123 Blahst 1 12345 345345 443453
2 John 234 BlahAVE 2 99999
3 Sam 2123 Blahway 3 88888 7777777

Here are a couple of ways of doing it - one using the PIVOT keyword available in 11g and above, the other using the old-style pivot using MAX() and group by.
Method 1:
with tablea as (select 1 id, 'Jack' name, '123 blahST' address from dual union all
select 2 id, 'John' name, '234 blahAVE' address from dual union all
select 3 id, 'Sam' name, '2123 blahWay' address from dual),
tableb as (select 1 id, 12345 accno from dual union all
select 1 id, 345345 accno from dual union all
select 1 id, 443453 accno from dual union all
select 2 id, 99999 accno from dual union all
select 3 id, 88888 accno from dual union all
select 3 id, 77777 accno from dual),
b_res as (select *
from (select id,
accno,
row_number() over (partition by id order by accno) rn
from tableb)
pivot (max(accno)
for rn in (1 as accno1,
2 as accno2,
3 as accno3,
4 as accno4,
5 as accno5)))
select ta.id ta_id,
ta.name ta_name,
ta.address ta_address,
tb.id tb_id,
tb.accno1 tb_accno1,
tb.accno2 tb_accno2,
tb.accno3 tb_accno3,
tb.accno4 tb_accno4,
tb.accno5 tb_accno5
from tablea ta
inner join b_res tb on ta.id = tb.id;
TA_ID TA_NAME TA_ADDRESS TB_ID TB_ACCNO1 TB_ACCNO2 TB_ACCNO3 TB_ACCNO4 TB_ACCNO5
---------- ------- ------------ ---------- ---------- ---------- ---------- ---------- ----------
1 Jack 123 blahST 1 12345 345345 443453
2 John 234 blahAVE 2 99999
3 Sam 2123 blahWay 3 77777 88888
Method 2:
with tablea as (select 1 id, 'Jack' name, '123 blahST' address from dual union all
select 2 id, 'John' name, '234 blahAVE' address from dual union all
select 3 id, 'Sam' name, '2123 blahWay' address from dual),
tableb as (select 1 id, 12345 accno from dual union all
select 1 id, 345345 accno from dual union all
select 1 id, 443453 accno from dual union all
select 2 id, 99999 accno from dual union all
select 3 id, 88888 accno from dual union all
select 3 id, 77777 accno from dual),
b_res as (select id,
max(case when rn = 1 then accno end) accno1,
max(case when rn = 2 then accno end) accno2,
max(case when rn = 3 then accno end) accno3,
max(case when rn = 4 then accno end) accno4,
max(case when rn = 5 then accno end) accno5
from (select id,
accno,
row_number() over (partition by id order by accno) rn
from tableb)
group by id)
select ta.id ta_id,
ta.name ta_name,
ta.address ta_address,
tb.id tb_id,
tb.accno1 tb_accno1,
tb.accno2 tb_accno2,
tb.accno3 tb_accno3,
tb.accno4 tb_accno4,
tb.accno5 tb_accno5
from tablea ta
inner join b_res tb on ta.id = tb.id;
TA_ID TA_NAME TA_ADDRESS TB_ID TB_ACCNO1 TB_ACCNO2 TB_ACCNO3 TB_ACCNO4 TB_ACCNO5
---------- ------- ------------ ---------- ---------- ---------- ---------- ---------- ----------
1 Jack 123 blahST 1 12345 345345 443453
2 John 234 blahAVE 2 99999
3 Sam 2123 blahWay 3 77777 88888
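
The conditional-aggregation pivot from Method 2 is not Oracle-specific. Here is a minimal sketch of it run through Python's sqlite3 module (assuming SQLite 3.25 or later for ROW_NUMBER), limited to the four account columns the question asks for. The CTE name numbered is made up, Sam is given ID 3 as in the answer's test data, and a LEFT JOIN is used as an assumption so that a TableA row without accounts would still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ID INT, Name TEXT, Address TEXT)")
conn.execute("CREATE TABLE TableB (ID INT, AccNO INT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [(1, "Jack", "123 blahST"), (2, "John", "234 blahAVE"), (3, "Sam", "2123 blahWay")],
)
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [(1, 12345), (1, 345345), (1, 443453), (2, 99999), (3, 88888), (3, 77777)],
)

# Number each ID's accounts, then spread them into columns with MAX(CASE ...).
query = """
WITH numbered AS (
    SELECT ID, AccNO,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AccNO) AS rn
    FROM TableB
),
b_res AS (
    SELECT ID,
           MAX(CASE WHEN rn = 1 THEN AccNO END) AS AccNO1,
           MAX(CASE WHEN rn = 2 THEN AccNO END) AS AccNO2,
           MAX(CASE WHEN rn = 3 THEN AccNO END) AS AccNO3,
           MAX(CASE WHEN rn = 4 THEN AccNO END) AS AccNO4
    FROM numbered
    GROUP BY ID
)
SELECT a.ID, a.Name, a.Address, b.AccNO1, b.AccNO2, b.AccNO3, b.AccNO4
FROM TableA AS a
LEFT JOIN b_res AS b ON b.ID = a.ID
ORDER BY a.ID
"""
one_line_per_id = conn.execute(query).fetchall()
for row in one_line_per_id:
    print(row)
```

Accounts beyond the fourth per ID are silently dropped by this shape, so the number of CASE columns has to match the maximum you expect.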
