SubSelect Top 1 OR Left Join - sql-server

I have a SELECT with subselects that use TOP 1 and a WHERE clause.
I tried to optimize the query by rewriting the subselects as LEFT JOINs, but the query took longer. Is the subselect version better in this case? I can't post the whole query because it is too long and confidential, but I will try to recreate the important part below:
Subselect version:
SELECT
(select top 1 colId FROM table1 WHERE col1 = b.Id and col2 = 3 Order by 1) Id3,
(select top 1 colId FROM table1 WHERE col1 = b.Id and col2 = 5 Order by 1) Id5,
(select top 1 colId FROM table1 WHERE col1 = b.Id and col2 = 7 Order by 1) Id7
FROM table2 b
Attempt with LEFT JOIN:
SELECT
t1.colId AS Id3,
t2.colId AS Id5,
t3.colId AS Id7
FROM table2 b
LEFT JOIN (
select colId, col1 FROM table1 WHERE col2 = 3
) t1 ON t1.col1 = b.Id
LEFT JOIN (
select colId, col1 FROM table1 WHERE col2 = 5
) t2 ON t2.col1 = b.Id
LEFT JOIN (
select colId, col1 FROM table1 WHERE col2 = 7
) t3 ON t3.col1 = b.Id
Is there a better way to do this? And why does the LEFT JOIN version take longer?

You can use ROW_NUMBER:
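;WITH cte AS
(
SELECT a.col1, a.col2, a.colId,
rn = ROW_NUMBER() OVER (PARTITION BY a.col1, a.col2 ORDER BY a.colId)
FROM table1 a
WHERE a.col2 IN (3, 5, 7)
)
SELECT c3.colId AS Id3,
c5.colId AS Id5,
c7.colId AS Id7
FROM table2 b
LEFT JOIN cte c3 ON c3.col1 = b.Id AND c3.col2 = 3 AND c3.rn = 1
LEFT JOIN cte c5 ON c5.col1 = b.Id AND c5.col2 = 5 AND c5.rn = 1
LEFT JOIN cte c7 ON c7.col1 = b.Id AND c7.col2 = 7 AND c7.rn = 1
This numbers the rows of each (col1, col2) pair by colId, so rn = 1 is the row that TOP 1 ... ORDER BY colId would return, and the CTE only keeps the col2 values 3, 5 and 7. Each LEFT JOIN then picks the first row for one col2 value, giving one output row per row of table2, just like the subselects.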
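A minimal sketch for checking this, using Python's built-in sqlite3 module against an in-memory SQLite database rather than SQL Server (SQLite has no TOP, so ORDER BY colId LIMIT 1 stands in for TOP 1 ... ORDER BY 1). The rows are invented for illustration. It compares the three correlated subselects with a ROW_NUMBER rewrite partitioned by (col1, col2), and also runs plain LEFT JOINs with no "first row" rule, which return one output row per combination of matches:

```python
import sqlite3

# Invented sample data: table2 holds parent ids, table1 the candidate rows.
SETUP = """
CREATE TABLE table2 (Id INTEGER PRIMARY KEY);
CREATE TABLE table1 (colId INTEGER, col1 INTEGER, col2 INTEGER);
INSERT INTO table2 (Id) VALUES (1), (2);
INSERT INTO table1 (colId, col1, col2) VALUES
    (10, 1, 3), (11, 1, 3),
    (20, 1, 5),
    (30, 2, 7), (31, 2, 7);
"""

# Correlated subselects; ORDER BY colId LIMIT 1 plays the role of TOP 1.
SUBSELECT_SQL = """
SELECT b.Id,
  (SELECT colId FROM table1 WHERE col1 = b.Id AND col2 = 3 ORDER BY colId LIMIT 1),
  (SELECT colId FROM table1 WHERE col1 = b.Id AND col2 = 5 ORDER BY colId LIMIT 1),
  (SELECT colId FROM table1 WHERE col1 = b.Id AND col2 = 7 ORDER BY colId LIMIT 1)
FROM table2 b
ORDER BY b.Id
"""

# rn = 1 marks the lowest colId per (col1, col2); one LEFT JOIN per col2 value.
ROWNUM_SQL = """
WITH cte AS (
  SELECT col1, col2, colId,
         ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY colId) AS rn
  FROM table1
  WHERE col2 IN (3, 5, 7)
)
SELECT b.Id, c3.colId, c5.colId, c7.colId
FROM table2 b
LEFT JOIN cte c3 ON c3.col1 = b.Id AND c3.col2 = 3 AND c3.rn = 1
LEFT JOIN cte c5 ON c5.col1 = b.Id AND c5.col2 = 5 AND c5.rn = 1
LEFT JOIN cte c7 ON c7.col1 = b.Id AND c7.col2 = 7 AND c7.rn = 1
ORDER BY b.Id
"""

# Plain LEFT JOINs without a "first row" rule: every match produces a row.
PLAIN_JOIN_SQL = """
SELECT b.Id, t1.colId, t2.colId, t3.colId
FROM table2 b
LEFT JOIN table1 t1 ON t1.col1 = b.Id AND t1.col2 = 3
LEFT JOIN table1 t2 ON t2.col1 = b.Id AND t2.col2 = 5
LEFT JOIN table1 t3 ON t3.col1 = b.Id AND t3.col2 = 7
"""


def run(sql):
    """Create a fresh in-memory database, load the sample rows, run one query."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(run(SUBSELECT_SQL))
    print(run(ROWNUM_SQL))
    print(len(run(PLAIN_JOIN_SQL)))
```

In this data the plain join returns 4 rows for 2 parent ids. That row multiplication is one plausible reason a LEFT JOIN rewrite can be slower than the subselects, and it is why the join needs a TOP 1 replacement (ROW_NUMBER here, or OUTER APPLY in SQL Server) before it is equivalent.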

Related

Query in getting multiple duplicate rows in SQL Server

I have two tables, Table1 and Table2, from which I want to get the total count of duplicate rows:
Expected output:
Query tested:
SELECT
t1.name,
t1.duplicates,
ISNULL(t2.active, 0) AS active,
ISNULL(t3.inactive, 0) AS inactive
FROM
(SELECT
t1.name, COUNT(*) AS duplicates
FROM
(SELECT c.name
FROM table1 c
INNER JOIN table2 AS cd ON cd.id = c.id) t1
GROUP BY
name
HAVING
COUNT(*) > 1) t1
LEFT JOIN
(SELECT c.name, COUNT(*) AS active
FROM table1 c
WHERE name IN (SELECT c.name FROM table1 c GROUP BY c.name)
AND status = 'Active'
GROUP BY name) t2 ON t1.name = t2.name
LEFT JOIN
(SELECT c.name, COUNT(*) AS inactive
FROM table1 c
WHERE name IN (SELECT c.name FROM table1 c GROUP BY c.name)
AND status = 'InActive'
GROUP BY name) t3 ON t1.name = t3.name
ORDER BY
name
It still returns duplicate rows, and I can't get the id and creator columns.
If you don't mind a subquery and a LEFT JOIN, I'd suggest the following query:
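select b.mainid,
b.name,
b.active_count,
b.inactive_count,
b.duplicate_count,
count(c.creator) as creator_count
from
(select min(a.mainid) as mainid, -- SQL Server requires an aggregate for columns not in GROUP BY
a.name,
sum(case when a.status = 'active'
then 1 else 0 end) as active_count,
sum(case when a.status = 'inactive'
then 1 else 0 end) as inactive_count,
count(a.name) as duplicate_count
from table1 as a
group by a.name
having count(a.name) > 1) as b
left join table2 as c
on b.mainid = c.mainid
group by b.mainid, b.name, b.active_count, b.inactive_count, b.duplicate_count
having count(c.creator) > 1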
This avoids forcing a direct join between the two tables: first derive the counts available from Table1, then join the result to Table2 to get the creator count.
SQL Fiddle: http://sqlfiddle.com/#!9/4daa19e/28
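The core of this answer is conditional aggregation: SUM(CASE ...) counts each status in a single pass over table1. Below is a minimal sketch with Python's sqlite3 module, assuming a hypothetical table1(mainid, name, status) and invented rows, since the question's data is not shown. SQLite compares strings case-sensitively by default, so the statuses are spelled exactly as in the question ('Active', 'InActive'); SQL Server's default collations are case-insensitive.

```python
import sqlite3

# Invented rows for a hypothetical table1(mainid, name, status).
ROWS = [
    (1, "alpha", "Active"),
    (2, "alpha", "InActive"),
    (3, "alpha", "Active"),
    (4, "beta", "Active"),
    (5, "gamma", "Active"),
    (6, "gamma", "InActive"),
]

# One pass over table1: CASE inside SUM counts each status separately,
# and HAVING keeps only names that occur more than once.
COUNTS_SQL = """
SELECT name,
       COUNT(*) AS duplicate_count,
       SUM(CASE WHEN status = 'Active' THEN 1 ELSE 0 END) AS active_count,
       SUM(CASE WHEN status = 'InActive' THEN 1 ELSE 0 END) AS inactive_count
FROM table1
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY name
"""


def duplicate_counts(rows):
    """Load rows into an in-memory table1 and return the per-name counts."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (mainid INTEGER, name TEXT, status TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(COUNTS_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in duplicate_counts(ROWS):
        print(row)
```

"beta" appears only once, so HAVING drops it; the remaining rows carry the total, active and inactive counts side by side.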

Combining subselects in SQL query

How can I simplify this query by combining the subselects?
SELECT *
FROM table
WHERE id1 IN (SELECT id1 FROM table WHERE [keyid] = 123)
AND id2 IN (SELECT [id2] FROM table WHERE [keyid] = 123)
I naively tried:
SELECT *
FROM table
WHERE id1 = t.id1
AND id2 = t.id2
IN (SELECT id1, id2 FROM table WHERE keyid = 123) AS t
There is no real need to rewrite your query; it is fine as-is. You could rewrite the subqueries using EXISTS:
SELECT t1.*
FROM yourTable t1
WHERE
EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.id1 = t1.id1 AND t2.keyid = 123) AND
EXISTS (SELECT 1 FROM yourTable t3 WHERE t3.id2 = t1.id2 AND t3.keyid = 123);
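EXISTS lets SQL Server stop searching as soon as it finds a single match, which might improve performance over your current version. In practice the optimizer often compiles an IN (subquery) into the same semi-join, so compare the execution plans before assuming a difference.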
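A quick way to confirm that the EXISTS rewrite returns the same rows as the two IN clauses is to run both on the same data. This sketch uses Python's sqlite3 module and SQLite rather than SQL Server, with a hypothetical yourTable(keyid, id1, id2) and invented rows:

```python
import sqlite3

# Invented rows for a hypothetical yourTable(keyid, id1, id2).
ROWS = [(123, 1, 10), (123, 2, 20), (999, 1, 20), (999, 3, 10), (999, 2, 30)]

# The question's form: two independent IN (subquery) checks.
IN_SQL = """
SELECT keyid, id1, id2 FROM yourTable
WHERE id1 IN (SELECT id1 FROM yourTable WHERE keyid = 123)
  AND id2 IN (SELECT id2 FROM yourTable WHERE keyid = 123)
ORDER BY keyid, id1, id2
"""

# The same checks written as correlated EXISTS subqueries.
EXISTS_SQL = """
SELECT t1.keyid, t1.id1, t1.id2 FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.id1 = t1.id1 AND t2.keyid = 123)
  AND EXISTS (SELECT 1 FROM yourTable t3 WHERE t3.id2 = t1.id2 AND t3.keyid = 123)
ORDER BY t1.keyid, t1.id1, t1.id2
"""


def run(sql, rows=ROWS):
    """Load rows into an in-memory yourTable and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourTable (keyid INTEGER, id1 INTEGER, id2 INTEGER)")
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(run(IN_SQL))
    print(run(EXISTS_SQL))
```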
If you wanted to rewrite using a series of self joins, here is what you could try:
SELECT DISTINCT t1.*
FROM yourTable t1
INNER JOIN yourTable t2 ON t2.id1 = t1.id1
INNER JOIN yourTable t3 ON t3.id2 = t1.id2
WHERE t2.keyid = 123 AND t3.keyid = 123;
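From your table names, I assume both IN clauses refer to the same table and the same key field. Note that the join below is stricter than the original: it requires id1 and id2 to come from the same keyid = 123 row, whereas the two IN clauses check each column independently.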
SELECT t1.* FROM Table AS t1
INNER JOIN
(
SELECT Id1, Id2 FROM Table WHERE keyid = 123
) as t2
ON t1.id1 = t2.id1 AND t1.id2 = t2.id2
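The pair join is not a drop-in replacement for the two IN clauses. This sketch (Python's sqlite3 module, SQLite instead of SQL Server, a hypothetical yourTable(keyid, id1, id2) with invented rows) includes a row whose id1 and id2 each appear under keyid = 123, but never together in one row:

```python
import sqlite3

# Invented rows: (999, 1, 20) combines id1 from one keyid = 123 row
# with id2 from another keyid = 123 row.
ROWS = [(123, 1, 10), (123, 2, 20), (999, 1, 20)]

# Each column is checked against the keyid = 123 rows on its own.
IN_SQL = """
SELECT keyid, id1, id2 FROM yourTable
WHERE id1 IN (SELECT id1 FROM yourTable WHERE keyid = 123)
  AND id2 IN (SELECT id2 FROM yourTable WHERE keyid = 123)
ORDER BY keyid, id1, id2
"""

# id1 and id2 must both match the SAME keyid = 123 row.
PAIR_SQL = """
SELECT t1.keyid, t1.id1, t1.id2 FROM yourTable AS t1
INNER JOIN (SELECT id1, id2 FROM yourTable WHERE keyid = 123) AS t2
  ON t1.id1 = t2.id1 AND t1.id2 = t2.id2
ORDER BY t1.keyid, t1.id1, t1.id2
"""


def run(sql, rows=ROWS):
    """Load rows into an in-memory yourTable and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourTable (keyid INTEGER, id1 INTEGER, id2 INTEGER)")
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(run(IN_SQL))
    print(run(PAIR_SQL))
```

Row (999, 1, 20) passes both IN checks but is dropped by the join, so the join is only a valid rewrite if the same-row requirement is what you actually want.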

SQL: Delete only one row if join returns multiple matches

I have an SQL table that looks as follows:
col1 col2
a b
b a
c d
d c
f g
As you can see, some rows are inverted copies of each other: row 1 holds a and b, and row 2 holds the same two values the other way round.
I now want to delete one row of each such pair. I do not care which side of the pair is deleted, so either rows 1 and 3 or rows 2 and 4 should be deleted.
The result should look as follows:
col1 col2
a b
c d
f g
or
col1 col2
b a
d c
f g
I achieved this with the following query, which creates two derived columns holding the values in sorted order and then applies a GROUP BY, but I assume there is a nicer-looking solution.
DELETE t1
FROM testtable t1
INNER JOIN (
SELECT CASE WHEN col1 < col2 THEN col1 ELSE col2 END AS first,
CASE WHEN col1 < col2 THEN col2 ELSE col1 END AS second
FROM testtable
GROUP BY CASE WHEN col1 < col2 THEN col1 ELSE col2 END, CASE WHEN col1 < col2 THEN col2 ELSE col1 END
) t2 ON t2.first = t1.col1 AND t2.second = t1.col2
I think you can simplify your query by adding conditions to the join:
DELETE T1
FROM testtable T1
INNER JOIN testtable T2 ON T1.col1 = T2.col2 AND T1.col2 = T2.col1 AND T1.col1 > T1.col2
You can use EXISTS and NOT EXISTS:
select t.*
from testtable t
where exists (select 1
from testtable t1
where t1.col1 = t.col2 and t1.col2 = t.col1 and t1.col1 > t.col1
) or
not exists (select 1
from testtable t1
where t1.col1 = t.col2 and t1.col2 = t.col1 and t1.col1 < t.col1
);
If you want to delete the unwanted rows, you can do:
delete t
from testtable t
where not exists (select 1
from testtable t1
where t1.col1 = t.col2 and t1.col2 = t.col1 and t1.col1 > t.col1
) and
exists (select 1
from testtable t1
where t1.col1 = t.col2 and t1.col2 = t.col1 and t1.col1 < t.col1
);
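The SELECT form of this answer can be checked with Python's sqlite3 module against SQLite (not SQL Server). This sketch compares both columns of the reversed pair (t1.col1 = t.col2 and t1.col2 = t.col1); matching on t1.col1 = t.col2 alone would, for example, treat (c, a) as paired with (a, b) even though (a, c) does not exist. The second test input is invented to show that case:

```python
import sqlite3

# The question's sample rows.
SAMPLE = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("f", "g")]

KEEP_SQL = """
SELECT t.col1, t.col2
FROM testtable t
WHERE EXISTS (SELECT 1 FROM testtable t1
              WHERE t1.col1 = t.col2 AND t1.col2 = t.col1
                AND t1.col1 > t.col1)
   OR NOT EXISTS (SELECT 1 FROM testtable t1
                  WHERE t1.col1 = t.col2 AND t1.col2 = t.col1
                    AND t1.col1 < t.col1)
ORDER BY t.col1, t.col2
"""


def keep_one_per_pair(rows):
    """Return the rows that survive: one side of each reversed pair plus unpaired rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testtable (col1 TEXT, col2 TEXT)")
    con.executemany("INSERT INTO testtable VALUES (?, ?)", rows)
    result = con.execute(KEEP_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(keep_one_per_pair(SAMPLE))
    print(keep_one_per_pair([("c", "a"), ("a", "b")]))
```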
Assuming no actual duplicates, I would do:
delete t from testtable t
where col1 > col2 and
exists (select 1
from testtable t2
where t2.col1 = t.col2 and t2.col2 = t.col1
);
That is, delete the rows where col1 > col2 but only if the "paired" row already exists in the table.
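A sketch of this delete run through Python's sqlite3 module on the question's sample rows. SQLite does not support T-SQL's DELETE t FROM testtable t form, so the outer row is referenced by its table name inside the correlated subquery:

```python
import sqlite3

# The question's sample rows.
SAMPLE = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("f", "g")]


def delete_reversed_duplicates(rows):
    """Delete the col1 > col2 side of each reversed pair and return what is left."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testtable (col1 TEXT, col2 TEXT)")
    con.executemany("INSERT INTO testtable VALUES (?, ?)", rows)
    # Inside the subquery, t2 is the candidate partner row and
    # testtable.col1/col2 refer to the row being considered for deletion.
    con.execute("""
        DELETE FROM testtable
        WHERE col1 > col2
          AND EXISTS (SELECT 1 FROM testtable t2
                      WHERE t2.col1 = testtable.col2
                        AND t2.col2 = testtable.col1)
    """)
    remaining = con.execute(
        "SELECT col1, col2 FROM testtable ORDER BY col1, col2").fetchall()
    con.close()
    return remaining


if __name__ == "__main__":
    print(delete_reversed_duplicates(SAMPLE))
```

Only rows with col1 > col2 are ever deleted, and their partners all have col1 < col2, so the order in which rows are processed does not change the result.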

Using where condition in sql query

I have an SQL query like this:
Select col1, (select abc from table2 where def=1) as col2
From Table1 inner join table3 on Table1.id = table3.id
Where col2 = 4
The problem is that the WHERE condition doesn't work. I get this error:
Invalid column name 'col2'
Kindly help me fix this SQL query.
Thanks in advance
You can define the value in a CROSS APPLY and then reference it in both the SELECT list and the WHERE clause:
SELECT col1,
col2
FROM Table1
INNER JOIN table3
ON Table1.id = table3.id
CROSS APPLY (SELECT abc
FROM table2
WHERE def = 1) C(col2)
WHERE col2 = 4
Using a CTE (Common Table Expression):
WITH SubQuery (Col2) AS (
SELECT
ABC
FROM
table2
WHERE
def = 1
)
SELECT
T.Col1,
S.Col2
FROM
Table1 T
INNER JOIN table3 t3
ON T.id = t3.id
CROSS JOIN SubQuery S
WHERE
S.Col2 = 4
That said, I agree with the first comment: the query makes little sense as written, because the subquery is not correlated (joined) to the rest of the query.
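The error occurs because SQL Server logically evaluates the WHERE clause before the SELECT list, so the alias col2 does not exist yet when WHERE runs; a CTE, derived table or CROSS APPLY defines the value earlier. This sketch runs the CTE form (written with a CROSS JOIN) through Python's sqlite3 module with invented rows, so it only checks that the query returns what you would expect, not SQL Server's error:

```python
import sqlite3

# Invented rows; table and column names follow the question.
SETUP = """
CREATE TABLE Table1 (id INTEGER, col1 TEXT);
CREATE TABLE table2 (abc INTEGER, def INTEGER);
CREATE TABLE table3 (id INTEGER);
INSERT INTO Table1 VALUES (1, 'x'), (2, 'y');
INSERT INTO table3 VALUES (1), (2);
INSERT INTO table2 VALUES (4, 1), (9, 2);
"""

# The CTE names the subquery's column col2, so WHERE can filter on it.
CTE_SQL = """
WITH SubQuery (col2) AS (
    SELECT abc FROM table2 WHERE def = 1
)
SELECT T.col1, S.col2
FROM Table1 AS T
INNER JOIN table3 AS t3 ON T.id = t3.id
CROSS JOIN SubQuery AS S
WHERE S.col2 = 4
ORDER BY T.col1
"""


def run(sql):
    """Load the sample tables into memory and run one query."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(run(CTE_SQL))
```

Because SubQuery is not correlated, its single value is attached to every joined row, which is the point the comment above makes.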

Limited T-SQL Join

This should be simple enough, but somehow my brain stopped working.
I have two related tables:
Table 1:
ID (PK), Value1
Table 2:
BatchID, Table1ID (FK to Table 1 ID), Value2
Example data:
Table 1:
ID Value1
1 A
2 B
Table 2:
BatchID Table1ID Value2
1 1 100
2 1 101
3 1 102
1 2 200
2 2 201
Now, for each record in Table 1, I'd like to join the matching record in Table 2, but only the most recent one (BatchID is sequential). The result for the example above would be:
Table1.ID Table1.Value1 Table2.Value2
1 A 102
2 B 201
The problem is simple: how do I limit the join result from Table2? There are similar questions on SO, but I can't find one like mine. Here's one for MySQL that looks similar:
LIMITing an SQL JOIN
I'm open to any approach, although speed is still the main priority since it will be a big dataset.
WITH Latest AS (
SELECT Table1ID
,MAX(BatchID) AS BatchID
FROM Table2
GROUP BY Table1ID
)
SELECT *
FROM Table1
INNER JOIN Latest
ON Latest.Table1ID = Table1.ID
INNER JOIN Table2
ON Table2.BatchID = Latest.BatchID
AND Table2.Table1ID = Latest.Table1ID
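BatchID values repeat across Table1 IDs in the sample data, so the final join has to match on Table1ID as well as BatchID. This sketch loads the question's sample rows into an in-memory SQLite database (Python's sqlite3 module, not SQL Server) and runs the Latest join with and without the Table1ID condition:

```python
import sqlite3

# The question's sample data.
SETUP = """
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Value1 TEXT);
CREATE TABLE Table2 (BatchID INTEGER, Table1ID INTEGER, Value2 INTEGER);
INSERT INTO Table1 VALUES (1, 'A'), (2, 'B');
INSERT INTO Table2 VALUES (1, 1, 100), (2, 1, 101), (3, 1, 102),
                          (1, 2, 200), (2, 2, 201);
"""

# {extra} is either empty or the additional Table1ID join condition.
LATEST_SQL = """
WITH Latest AS (
    SELECT Table1ID, MAX(BatchID) AS BatchID
    FROM Table2
    GROUP BY Table1ID
)
SELECT Table1.ID, Table1.Value1, Table2.Value2
FROM Table1
INNER JOIN Latest ON Latest.Table1ID = Table1.ID
INNER JOIN Table2 ON Table2.BatchID = Latest.BatchID {extra}
ORDER BY Table1.ID, Table2.Value2
"""


def latest(match_table1id):
    """Run the Latest join, optionally also matching Table2.Table1ID."""
    extra = "AND Table2.Table1ID = Latest.Table1ID" if match_table1id else ""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(LATEST_SQL.format(extra=extra)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(latest(True))
    print(latest(False))
```

Without the Table1ID condition, ID 2 (latest batch 2) also picks up ID 1's batch-2 row, so B appears twice.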
SELECT id, value1, value2
FROM (
SELECT t1.id, t1.value1, t2.value2, ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.BatchID DESC) AS rn
FROM table1 t1
JOIN table2 t2
ON t2.table1id = t1.id
) q
WHERE rn = 1
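Try the following. Note that it returns the largest Value2, which matches the most recent batch only when Value2 grows with BatchID, as it does in the sample data: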
select t1.*,t2.Value2
from(
select Table1ID,max(Value2) as Value2
from [Table 2]
group by Table1ID) t2
join [Table 1] t1 on t2.Table1ID = t1.id
Either use GROUP BY, or use a WHERE clause that filters on the most recent batch:
SELECT * FROM Table1 a
INNER JOIN Table2 b ON (a.id = b.Table1ID)
WHERE NOT EXISTS(
SELECT 1 FROM Table2 c WHERE c.Table1ID = a.id AND c.BatchID > b.BatchID
)
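This sketch (Python's sqlite3 module with SQLite, not SQL Server) runs the ROW_NUMBER, NOT EXISTS and MAX(Value2) answers on the question's sample data, then again with one hypothetical extra row, (4, 1, 50), where the newest batch has a smaller Value2:

```python
import sqlite3

TABLE1 = [(1, "A"), (2, "B")]
# The question's sample rows for Table2: (BatchID, Table1ID, Value2).
SAMPLE = [(1, 1, 100), (2, 1, 101), (3, 1, 102), (1, 2, 200), (2, 2, 201)]

QUERIES = {
    # Latest batch per ID via ROW_NUMBER.
    "row_number": """
        SELECT id, value1, value2 FROM (
            SELECT t1.ID AS id, t1.Value1 AS value1, t2.Value2 AS value2,
                   ROW_NUMBER() OVER (PARTITION BY t1.ID
                                      ORDER BY t2.BatchID DESC) AS rn
            FROM Table1 t1 JOIN Table2 t2 ON t2.Table1ID = t1.ID
        ) q WHERE rn = 1 ORDER BY id""",
    # Latest batch per ID via NOT EXISTS (no later batch for the same ID).
    "not_exists": """
        SELECT a.ID, a.Value1, b.Value2
        FROM Table1 a JOIN Table2 b ON a.ID = b.Table1ID
        WHERE NOT EXISTS (SELECT 1 FROM Table2 c
                          WHERE c.Table1ID = a.ID AND c.BatchID > b.BatchID)
        ORDER BY a.ID""",
    # Largest Value2 per ID: equals the latest batch only if Value2 grows with BatchID.
    "max_value2": """
        SELECT t1.ID, t1.Value1, t2.Value2
        FROM (SELECT Table1ID, MAX(Value2) AS Value2
              FROM Table2 GROUP BY Table1ID) t2
        JOIN Table1 t1 ON t2.Table1ID = t1.ID
        ORDER BY t1.ID""",
}


def run_all(table2_rows):
    """Load both tables into memory and return each query's result by name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Value1 TEXT)")
    con.execute("CREATE TABLE Table2 (BatchID INTEGER, Table1ID INTEGER, Value2 INTEGER)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?)", TABLE1)
    con.executemany("INSERT INTO Table2 VALUES (?, ?, ?)", table2_rows)
    results = {name: con.execute(sql).fetchall() for name, sql in QUERIES.items()}
    con.close()
    return results


if __name__ == "__main__":
    print(run_all(SAMPLE))
    print(run_all(SAMPLE + [(4, 1, 50)]))
```

All three agree on the sample data, but MAX(Value2) returns the largest value rather than the latest batch once Value2 stops growing with BatchID.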
