Remove partially duplicated rows in SQL Server Express

I've got a table with several columns, with the date at the end. It looks like:
Col1 | Col2 | Col3 | date
-----+------+------+------------
x    | y    | z    | 2022-10-01
x    | y    | z    | 2022-10-10
a    | b    | c    | 2022-10-01
a    | b    | c    | 2022-10-10
w    | u    | c    | 2022-10-15
What I'm trying to do is remove duplicates based on the first three columns, keeping the row with the latest date in column 4.
Tried to list it with:
Select col1, col2, col3, count(*) as counter
from database
group by col1, col2, col3, date
having count (*) > 1;
It's not working because it counts each row separately, including rows that differ only in their date. I haven't found any other clues.
Expected output is:
Col1 | Col2 | Col3 | Date
-----+------+------+------------
x    | y    | z    | 2022-10-10
a    | b    | c    | 2022-10-10
w    | u    | c    | 2022-10-15

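You can use a common table expression and ROW_NUMBER to achieve this:
WITH cte
AS
(
-- ORDER BY ... DESC gives rn = 1 to the newest row in each (col1, col2, col3) group
SELECT col1, col2, col3, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY [date] DESC) AS rn
-- [database] is the table name from the question; DATABASE is a reserved word, so it needs brackets
FROM [database]
)
DELETE FROM cte
WHERE rn > 1;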

Just aggregate by the first 3 columns and take the max of the fourth:
SELECT col1, col2, col3, MAX(date) AS date
FROM yourTable
GROUP BY col1, col2, col3;
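As a rough way to try both ideas without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The table name `t` and the sample rows are assumptions modeled on the question (with the third group written as a, b, c in both rows so it matches the expected output). SQLite cannot delete through a CTE the way T-SQL can, so the delete is matched by rowid instead; window functions need SQLite 3.25 or later.

```python
import sqlite3

# SQLite stands in for SQL Server here; table name "t" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("x", "y", "z", "2022-10-01"),
        ("x", "y", "z", "2022-10-10"),
        ("a", "b", "c", "2022-10-01"),
        ("a", "b", "c", "2022-10-10"),
        ("w", "u", "c", "2022-10-15"),
    ],
)

# Approach 1 (read-only): aggregate and take the latest date per group.
# ISO-formatted dates stored as text sort chronologically.
grouped = conn.execute(
    "SELECT col1, col2, col3, MAX(date) FROM t "
    "GROUP BY col1, col2, col3 ORDER BY col1"
).fetchall()

# Approach 2 (destructive): delete every row that is not the newest in its group.
conn.execute(
    """
    DELETE FROM t WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY col1, col2, col3 ORDER BY date DESC
                   ) AS rn
            FROM t
        ) WHERE rn > 1
    )
    """
)
remaining = conn.execute("SELECT * FROM t ORDER BY col1").fetchall()

print(grouped)
print(remaining)
```

Both lists should contain the same three rows; the difference is that the first query leaves the table untouched while the second actually removes the older duplicates.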

Related

Error while creating pivot on SQL Server

I am getting an error on the last line while creating a pivot table on SQL Server.
Following is the code:
SELECT
COL1, 'X'
FROM
(SELECT COL1, COL2
FROM TABLE_X
WHERE COL3 = 'B' AND COL4 = 'Activation') AS SourceTable
PIVOT
(COUNT(COL1)
FOR COL2 IN ('X')
) AS PivotTable
Error:
Incorrect syntax near 'X'.
Thanks in advance.

Column COL1 will not exist in the pivot result because it is the aggregated column.
You can change this example to just:
SELECT
*
FROM
(SELECT COL1, COL2
FROM TABLE_X
WHERE COL3 = 'B' AND COL4 = 'Activation') AS SourceTable
PIVOT
(COUNT(COL1)
FOR COL2 IN ([X]) -- put the values in square brackets instead of single quote
) AS PivotTable
You should then get back a single column named X.

How to join more than two tables using sql without Common Columns

Example:
Table 1:
Col1        Col2
----------- -------------
A           1
B           2
D           3
E           4
Table 2:
Col3        Col4
----------- -------------
A           7
E           9
Z           5
Table 3:
Col5        Col6
----------- -------------
Y           8
Expected output:
Col1 Col2 Col3 Col4 Col5 Col6
---- ---- ---- ---- ---- ----
A    1    A    7    Y    8
B    2    E    9    NULL NULL
D    3    Z    5    NULL NULL
E    4    NULL NULL NULL NULL
I want the output as shown above.
If there are three tables with columns Col1 and Col2, Col3 and Col4, and Col5 and Col6, then the expected output should have the columns Col1, Col2, Col3, Col4, Col5, Col6, without any real join condition. Table2 should simply sit to the right of Table1, and Table3 to the right of Table2. If the numbers of rows don't match, NULL values fill the empty space.
I know the solution for two tables, but I need SQL query syntax for n tables.
-Thanks in Advance.

Although I can't quite see why you'd want to do this, I've attempted to answer the question as an exercise for my own learning!
Thanks to @user2864740 for the idea of using ROW_NUMBER to synthesize a common value between each table!
Here's the query:
SELECT
col1,
col2,
col3,
col4,
col5,
col6
FROM
(SELECT ROW_NUMBER() OVER(ORDER BY col1) AS Row,
col1, col2
FROM table1) T1
FULL OUTER JOIN
(SELECT ROW_NUMBER() OVER(ORDER BY col3) AS Row,
col3, col4
FROM table2) T2 ON T1.Row = T2.Row
FULL OUTER JOIN
(SELECT ROW_NUMBER() OVER(ORDER BY col5) AS Row,
col5, col6
-- COALESCE keeps the match even when table1 has no row at this position
FROM table3) T3 ON COALESCE(T1.Row, T2.Row) = T3.Row
And a SQL Fiddle demonstrating it: http://sqlfiddle.com/#!3/6c2db/6

When there is no real join condition, you have to make one up; in this case, use ROW_NUMBER and join on that. Note that the left joins below cut off table2 and table3 rows if those tables have more rows than table1; to avoid that, change the left joins to full outer joins. The simplest way to do this is:
Select Col1, Col2, Col3, Col4, Col5, Col6
From
(select Col1, Col2, ROW_NUMBER() Over (order by Col1,Col2) AS Rnk1
From table1 ) t1
left join
(select Col3, Col4, ROW_NUMBER() Over (order by Col3,Col4) AS Rnk2
From table2 ) t2
On t1.rnk1 = t2.rnk2
left join
(select Col5, Col6, ROW_NUMBER() Over (order by Col5,Col6) AS Rnk3
From table3 ) t3
On t1.rnk1 = t3.rnk3
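For the "n tables" part of the question, the positional pairing behind these ROW_NUMBER joins can also be sketched outside the database. The Python below is only an illustration of the idea, not a SQL solution: the helper name `side_by_side` is made up, each table is assumed to be a list of equal-width tuples, and rows are ordered by plain sorting (which plays the role of the ORDER BY inside ROW_NUMBER).

```python
from itertools import zip_longest


def side_by_side(*tables):
    """Pair the i-th row of every table; pad exhausted tables with None."""
    # Remember each table's column count so missing rows can be padded.
    widths = [len(t[0]) if t else 0 for t in tables]
    # Sorting gives each row its position, like ROW_NUMBER() OVER (ORDER BY ...).
    ordered = [sorted(t) for t in tables]
    result = []
    for group in zip_longest(*ordered):
        row = []
        for part, width in zip(group, widths):
            row.extend(part if part is not None else (None,) * width)
        result.append(tuple(row))
    return result


table1 = [("A", 1), ("B", 2), ("D", 3), ("E", 4)]
table2 = [("A", 7), ("E", 9), ("Z", 5)]
table3 = [("Y", 8)]

combined = side_by_side(table1, table2, table3)
for row in combined:
    print(row)
```

Because `zip_longest` runs until the longest input is exhausted, this behaves like the full outer join variant: no table's rows are cut off, whichever table happens to be the longest.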

Removing duplicate combinations from result set in SQL Server

I have a table with two columns with data like this:
1,2
1,3
1,4
2,1
2,2
3,1
I want to select just unique combinations, so out of those I would end up with:
1,2
1,3
1,4
2,2
because 1,2 is the same combination as 2,1, and so on.
How would I go about that in a SQL statement?
In reality, my table has a third column, and I want to add a WHERE clause based on that third column so that only matching rows are considered.

SELECT * FROM (
SELECT
CASE WHEN Col1 <= Col2 THEN Col1 ELSE Col2 END AS Col1,
CASE WHEN Col1 <= Col2 THEN Col2 ELSE Col1 END AS Col2
FROM
MyTable
) Ordered
GROUP BY
Col1, Col2
You could do it without the subquery by GROUPing on the CASE expressions, but it's longer to read.
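To check the normalize-then-group idea against the question's data, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The aliases `Lo` and `Hi` are illustrative names chosen here, not part of the original answer; a filter on the third column mentioned in the question would go in a WHERE clause inside the inner SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Col1 INTEGER, Col2 INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1)],
)

# Put the smaller value first so that (2, 1) and (1, 2) become the same pair,
# then collapse identical pairs with GROUP BY.
pairs = conn.execute(
    """
    SELECT Lo, Hi FROM (
        SELECT CASE WHEN Col1 <= Col2 THEN Col1 ELSE Col2 END AS Lo,
               CASE WHEN Col1 <= Col2 THEN Col2 ELSE Col1 END AS Hi
        FROM MyTable
    ) AS Ordered
    GROUP BY Lo, Hi
    ORDER BY Lo, Hi
    """
).fetchall()

print(pairs)
```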

Another way to achieve the same thing:
SELECT a, b
FROM tableX
WHERE a <= b
AND (other conditions)
UNION
SELECT b, a
FROM tableX
WHERE a > b
AND (other conditions) ;
This variation may be different (regarding efficiency), depending on the indexes you have:
SELECT *
FROM
( SELECT a, b
FROM tableX
WHERE (other conditions)
UNION
SELECT b, a
FROM tableX
WHERE (other conditions)
) AS tmp
WHERE a <= b ;

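You can try something like:
select distinct col1, col2 from your_table
where col1 <= col2
or col2 + '-' + col1 not in (select col1 + '-' + col2 from your_table)
Note that you have to concatenate the fields, and how you do that depends on the column type (col1 + '-' + col2 works well with char and varchar types). The col1 <= col2 test keeps one row of each mirrored pair; without it, both 1,2 and 2,1 would be filtered out.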

How about:
SELECT
COL1, COL2, COUNT(*)
FROM
Your_Table
GROUP BY
COL1, COL2

Is there a T-SQL shortcut for getting the max values of two columns

What I mean is, say you have a table like:
Col1 Col2
---- ----
1 1
1 9
2 1
2 3
4 1
4 2
I want to get Col1=4 and Col2=2, because Col1 has precedence. In other words, I want the largest value of Col1 and, for that value, the largest value of Col2, in a minimal T-SQL expression. It's almost like saying:
SELECT TOP 1 Col1, Col2
FROM MyTable
ORDER BY Col1 DESC, Col2 DESC
but doing this in such a way that the Col1, Col2 values are usable within another query.

There isn't really anything like MAX(Col1, Col2). If you wanted to simulate MAX ... GROUP BY X, you could use:
WITH T AS
(
SELECT Col1,
Col2,
X,
ROW_NUMBER () OVER (PARTITION BY X ORDER BY Col1 DESC, Col2 DESC) AS RN
FROM MyTable
)
SELECT Col1,
Col2,
X
FROM T
WHERE RN = 1;

Like this? One row:
SELECT ...
FROM
SomeTable S
JOIN
(
SELECT TOP 1 Col1, Col2
FROM MyTable
ORDER BY Col1 DESC, Col2 DESC
) foo ON S.Col1 = foo.Col1
Or per outer row?
SELECT ...
FROM
SomeTable S
CROSS APPLY
(
SELECT TOP 1 Col2
FROM MyTable M
WHERE S.somecol = M.SomeCol
ORDER BY Col2 DESC
) foo
Or joined to the maximum Col2 for each Col1 (a plain JOIN, since CROSS APPLY takes no ON clause):
SELECT ...
FROM
SomeTable S
JOIN
(
SELECT Col1, MAX(Col2) AS MaxCol2
FROM MyTable M
GROUP BY Col1
) foo ON S.Col1 = foo.Col1

WITH t(Col1,maxCol1,maxCol2) AS (
SELECT
Col1,
MAX(Col1) OVER(),
MAX(Col2) OVER(PARTITION BY Col1)
FROM MyTable
)
SELECT TOP 1 maxCol1,maxCol2 FROM t WHERE Col1 = maxCol1
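To sanity-check the "largest Col1, then largest Col2 for it" requirement on the question's data, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server (an assumption for the sake of a runnable example). SQLite has no TOP, so LIMIT 1 plays that role; the ROW_NUMBER form should carry over to T-SQL largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Col1 INTEGER, Col2 INTEGER)")
data = [(1, 1), (1, 9), (2, 1), (2, 3), (4, 1), (4, 2)]
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", data)

# Sort by Col1 first and Col2 second, both descending, and keep the first row.
top_row = conn.execute(
    "SELECT Col1, Col2 FROM MyTable ORDER BY Col1 DESC, Col2 DESC LIMIT 1"
).fetchone()

# The same row picked with ROW_NUMBER, which is usable as a derived table
# inside a larger query.
ranked_row = conn.execute(
    """
    SELECT Col1, Col2 FROM (
        SELECT Col1, Col2,
               ROW_NUMBER() OVER (ORDER BY Col1 DESC, Col2 DESC) AS rn
        FROM MyTable
    ) WHERE rn = 1
    """
).fetchone()

# Python compares tuples element by element, which is the same precedence rule.
python_max = max(data)

print(top_row, ranked_row, python_max)
```

Note that both sort keys need DESC; with only Col2 descending, the query would return the row with the smallest Col1 instead.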

How to filter rows by values of one column?

I need to get several columns from an SQL query. Then I have to filter the result by the distinct values of one column, but the output must contain all columns, not only the one whose values have to be distinct. Can anybody help me? An ORDER BY clause is not an answer for me.
A,B,C,D
E,F,G,H
I,J,C,L
M,N,Z,H
Above is a sample of the output rows. Look at the 3rd column, and assume we don't know how many rows there are. I need to select only rows that have a distinct value in the 3rd column (C, G, Z), so one of the two "C" rows has to be filtered out.

I've arbitrarily chosen to use col1 to break ties on col3. You can adjust the order by portion of the partition to suit your needs.
/* Set up test data */
declare @test table (
col1 char(1),
col2 char(1),
col3 char(1),
col4 char(1)
)
insert into @test
(col1, col2, col3, col4)
select 'A','B','C','D' union all
select 'E','F','G','H' union all
select 'I','J','C','L' union all
select 'M','N','Z','H'
/* Here's the query */
;with cteRowNumber as (
select col1, col2, col3, col4,
row_number() over (partition by col3 order by col1) as RowNumber
from @test
)
select col1, col2, col3, col4
from cteRowNumber
where RowNumber = 1
Returns:
col1 col2 col3 col4
----------------------------
A    B    C    D
E    F    G    H
M    N    Z    H
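The same one-row-per-value filter can be tried without SQL Server. The sketch below uses Python's sqlite3 module as a stand-in; the table name `test` is an assumption, since SQLite has no table variables. As in the answer, col1 breaks ties between rows that share a col3 value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        ("A", "B", "C", "D"),
        ("E", "F", "G", "H"),
        ("I", "J", "C", "L"),
        ("M", "N", "Z", "H"),
    ],
)

# Number the rows inside each col3 group and keep only the first of each group.
first_per_col3 = conn.execute(
    """
    SELECT col1, col2, col3, col4 FROM (
        SELECT col1, col2, col3, col4,
               ROW_NUMBER() OVER (PARTITION BY col3 ORDER BY col1) AS rn
        FROM test
    ) WHERE rn = 1
    ORDER BY col1
    """
).fetchall()

print(first_per_col3)
```

Changing the ORDER BY inside the OVER clause (for example to col1 DESC) is what decides which of the two "C" rows survives.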

ROLLUP or CUBE could be helpful for your problem, since they can aggregate (i.e. subtotal) data based on the GROUP BY and still return the individual rows.
