MSSQL - Adding condition on the where and on clause performance - sql-server

Consider the following two queries:
select *
from
table1 t1
left join
table2 t2
on t1.Id = t2.t1Id and (t1.Status = 1 or t2.Id is not null)
And this one
select *
from
table1 t1
left join
table2 t2
on t1.Id = t2.t1Id
where
t1.Status = 1 or t2.Id is not null
The first one runs in 2 seconds. The second one in 2 minutes. Shouldn't the execution plan be the same?

The query plans are different because the queries (and results) are different.
You're using a LEFT JOIN, so the first query returns every row of table1, with NULLs in the table2 columns where there is no match in table2.
The second query filters after the join, so an unmatched row survives only if t1.Status = 1.
If it were an INNER JOIN, they would essentially be the same query.
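The difference is easy to reproduce on a handful of rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only to keep the example self-contained; the join semantics shown are standard SQL), and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (Id integer primary key, Status integer);
    create table table2 (Id integer primary key, t1Id integer);
    -- Rows 3 and 4 of table1 have no match in table2; only row 4 has Status = 1.
    insert into table1 values (1, 1), (2, 0), (3, 0), (4, 1);
    insert into table2 values (10, 1), (11, 2);
""")

# Extra condition inside ON: every table1 row is preserved.
on_rows = conn.execute("""
    select t1.Id, t2.Id
    from table1 t1
    left join table2 t2
      on t1.Id = t2.t1Id and (t1.Status = 1 or t2.Id is not null)
    order by t1.Id
""").fetchall()

# Same condition in WHERE: applied after the join, so it can remove rows.
where_rows = conn.execute("""
    select t1.Id, t2.Id
    from table1 t1
    left join table2 t2 on t1.Id = t2.t1Id
    where t1.Status = 1 or t2.Id is not null
    order by t1.Id
""").fetchall()

print(on_rows)     # [(1, 10), (2, 11), (3, None), (4, None)]
print(where_rows)  # [(1, 10), (2, 11), (4, None)]
```

Row 3 (unmatched, Status = 0) appears only in the first result, which is why the optimizer cannot treat the two queries as interchangeable.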

The query below returns all rows of table1, with the table2 columns filled in wherever the ON clause condition matches.
select * from table1 t1
left join table2 t2
on t1.Id = t2.t1Id and (t1.Status = 1 or t2.Id is not null)
The next query joins the two tables on the ON clause, and the additional WHERE clause then filters the joined rows again.
select * from
table1 t1
left join table2 t2 on t1.Id = t2.t1Id
where t1.Status = 1 or t2.Id is not null
Here, even though we used a LEFT JOIN, the WHERE clause makes it behave much like an INNER JOIN: an unmatched table1 row is kept only when t1.Status = 1.
So the two queries produce different result sets. The execution plans also differ, which results in different execution times.

The best way to deal with an OR is to eliminate it (if possible) or break it into smaller queries. Breaking a short and simple query into a longer, more drawn-out query may not seem elegant, but when dealing with OR problems, it is often the best choice:
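select *
from table1 t1
left join table2 t2 on t1.Id = t2.t1Id
where t1.Status = 1
union all
select *
from table1 t1
left join table2 t2 on t1.Id = t2.t1Id
where t2.Id is not null
-- skip rows the first branch already returned, or UNION ALL would duplicate them
and (t1.Status <> 1 or t1.Status is null)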
You can read more in this article:
https://www.sqlshack.com/query-optimization-techniques-in-sql-server-tips-and-tricks/
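One thing to check when splitting an OR into UNION ALL branches: a row that satisfies both conditions comes back from both branches, so the second branch has to exclude what the first one already returned. The sketch below checks this on made-up rows, using Python's sqlite3 module as a stand-in for SQL Server (an assumption for the sake of a runnable example; it says nothing about SQL Server's plans or timings).

```python
import sqlite3

# SQLite stands in for SQL Server; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (Id integer primary key, Status integer);
    create table table2 (Id integer primary key, t1Id integer);
    insert into table1 values (1, 1), (2, 0), (3, 0), (4, 1), (5, null);
    insert into table2 values (10, 1), (11, 2), (12, 5);
""")

base = """
    select t1.Id, t2.Id
    from table1 t1
    left join table2 t2 on t1.Id = t2.t1Id
"""

# Original form with OR in the WHERE clause.
with_or = conn.execute(
    base + " where t1.Status = 1 or t2.Id is not null order by 1, 2"
).fetchall()

# Naive split: row (1, 10) satisfies both branches and is returned twice.
naive = conn.execute(
    base + " where t1.Status = 1 union all "
    + base + " where t2.Id is not null order by 1, 2"
).fetchall()

# Split with the second branch excluding rows already covered by the first.
split = conn.execute(
    base + " where t1.Status = 1 union all "
    + base + " where t2.Id is not null"
    + " and (t1.Status <> 1 or t1.Status is null) order by 1, 2"
).fetchall()

print(with_or)     # [(1, 10), (2, 11), (4, None), (5, 12)]
print(len(naive))  # 5
print(split == with_or)  # True
```

The `t1.Status is null` test matters because `t1.Status <> 1` is not true for NULL, so without it row 5 would be lost.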

Related

Why do these queries return different results?

Both of these SQL Server queries should return the same count, but they return different values: 8219 and 7876.
A left join should return all rows from the left table (8219).
What could be the reason for the lower result (7876)?
select count(*)
from t1 left join t2 on t1.id=t2.id
where t2.[date]='20191001'
-- returns 7876
select count(*)
from t1 left join t2 on t1.id=t2.id
-- returns 8219
select count(*)
from t2
where [date]<>'20191001' or [date] is null
-- returns 0
Your first query has a LEFT JOIN, but the WHERE clause is turning it into an inner join, because it filters out all NULL values.
The correct solution is to move the condition to the ON clause:
select count(*)
from t1 left join
t2
on t1.id = t2.id and t2.[date] = '20191001';
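The effect can be reproduced on a three-row table. This is a minimal sketch with hypothetical data, run through Python's sqlite3 module as a stand-in for SQL Server (assumed only so the example is runnable as-is).

```python
import sqlite3

# SQLite stands in for SQL Server; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (id integer primary key);
    create table t2 (id integer primary key, [date] text);
    insert into t1 values (1), (2), (3);          -- id 3 has no row in t2
    insert into t2 values (1, '20191001'), (2, '20191001');
""")

# Filter in WHERE: the unmatched row has t2.[date] = NULL and is dropped.
where_count = conn.execute("""
    select count(*)
    from t1 left join t2 on t1.id = t2.id
    where t2.[date] = '20191001'
""").fetchone()[0]

# Filter in ON: the unmatched row is kept, padded with NULLs.
on_count = conn.execute("""
    select count(*)
    from t1 left join t2
      on t1.id = t2.id and t2.[date] = '20191001'
""").fetchone()[0]

print(where_count, on_count)  # 2 3
```

With every t2 row carrying the same date, as in the question, the gap between the two counts is exactly the number of t1 rows that have no match in t2.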

Using inner join to reduce results. Do I need to reference new table anywhere beyond the join statement?

I'm wondering if I can just do an inner join as kind of a where clause by itself. Or if I use a field from the joined table in my where clause, if it's redundant.
select * from T1 inner join T2 on T1.id = T2.id where T2.z is not null
Is the "T2.z is not null" part redundant if all I want returned are records in T1 where the same id exists in T2?
For one thing, select * from t1 inner join t2 [...] will not return records in t1 - it will return all the columns of t1 and t2. You could fix that by selecting specifically the columns in t1 - don't select *.
Then, if there are many rows in t2 with the same t2.id, matching a given t1.id, you will get a whole bunch of rows in the result for that one row in the input t1. So you will not always "reduce" the result set.
It seems what you want can be achieved with the in operator, something like
select * from t1 where t1.id in (select id from t2);
This is equivalent to the following modification of your query. You do not need a where clause for this to work:
select t1.* from t1 inner join (select distinct id from t2) b on t1.id = b.id;
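A small check of the three forms side by side, assuming made-up rows in which t2 holds two rows for id 1. Python's sqlite3 module is used here as a stand-in for SQL Server, purely so the sketch runs on its own.

```python
import sqlite3

# SQLite stands in for SQL Server; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (id integer primary key, name text);
    create table t2 (id integer, z text);
    insert into t1 values (1, 'a'), (2, 'b'), (3, 'c');
    insert into t2 values (1, 'x'), (1, 'y'), (2, 'w');  -- id 1 appears twice
""")

# Plain inner join: the t1 row with id 1 is repeated once per match in t2.
joined = conn.execute(
    "select t1.* from t1 inner join t2 on t1.id = t2.id order by t1.id"
).fetchall()

# IN: each t1 row is returned at most once.
with_in = conn.execute(
    "select * from t1 where t1.id in (select id from t2) order by t1.id"
).fetchall()

# Join against the distinct ids: same result as IN.
with_distinct = conn.execute("""
    select t1.* from t1
    inner join (select distinct id from t2) b on t1.id = b.id
    order by t1.id
""").fetchall()

print(joined)         # [(1, 'a'), (1, 'a'), (2, 'b')]
print(with_in)        # [(1, 'a'), (2, 'b')]
print(with_distinct)  # [(1, 'a'), (2, 'b')]
```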
In the following query,
select t1.* from T1 inner join T2 on T1.id = T2.id where T2.z is NOT null
The WHERE condition is redundant, assuming that T2.Z is a NOT NULL column.
That would leave you with this:
select t1.* from T1 inner join T2 on T1.id = T2.id
This is a little odd because, in a normally designed database, either T1.id or T2.id would be the primary key of its table.
If T1.id is the primary key of T1, then your query is going to return duplicates -- each T1 row will be repeated once for each child that exists in T2.
If T2.id is the primary key of T2, then you should not need to join to T2 at all, because every possible T1.id value must exist in T2.id, because of the FOREIGN KEY relationship that (should) exist. In that case, you could have written:
select t1.* from T1 WHERE T1.id is not null;
So, the answer to your question is that you do not need to reference the tables outside of the join condition in order for the join to be applied. But something seems a little off about the approach.

T-SQL Join on columns OR fixed value

I'm trying to figure out some basic rules in T-SQL.
What I'm trying to achieve here is to get only the records from Table1 which have a match in Table2 - AND - all records from Table1 where the 'Valid' column has a value of 1 (=true).
Previously I've done this with two selects and a UNION like this:
SELECT T1.*
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.ID = T2.ID
UNION
SELECT T1.*
FROM Table1 T1
WHERE T1.Valid = 1
But isn't there any other way than using multiple selects and UNION to achieve this?
While fiddling, I did the following code bit, which however only works if there's exactly one match in Table2 (otherwise it'll multiply the records by the number of matches in T2).
SELECT T1.*
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.ID = T2.ID
OR T1.Valid = 1
What would be the best way to achieve my goal in terms of performance?
Also please don't hold back on the comments, possible flaws, or explanations of how and why another solution might be better.
Assuming that T1.ID and T2.ID are unique or primary keys, this one should do.
If there are duplicates you may have to write SELECT DISTINCT T1.*, because the UNION operator in the original returns only distinct rows:
SELECT T1.*
FROM Table1 T1
WHERE T1.ID IN ( SELECT T2.ID FROM Table2 T2 WHERE T2.ID IS NOT NULL)
OR T1.Valid = 1
or
SELECT T1.*
FROM Table1 T1
LEFT JOIN Table2 T2 ON T1.ID = T2.ID
WHERE T2.ID IS NOT NULL OR T1.Valid = 1
But I think the execution plan will be the same in the end.
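To compare the variants from this question on actual rows, here is a minimal sketch with hypothetical data in which Table2.ID is unique. It runs on Python's sqlite3 module as a stand-in for SQL Server (an assumption for runnability), so it only checks result sets and says nothing about execution plans.

```python
import sqlite3

# SQLite stands in for SQL Server; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Table1 (ID integer primary key, Valid integer);
    create table Table2 (ID integer primary key);
    insert into Table1 values (1, 0), (2, 1), (3, 0), (4, 1);
    insert into Table2 values (1), (2);
""")

# The original two-select UNION.
union_rows = conn.execute("""
    SELECT T1.* FROM Table1 T1 INNER JOIN Table2 T2 ON T1.ID = T2.ID
    UNION
    SELECT T1.* FROM Table1 T1 WHERE T1.Valid = 1
    ORDER BY 1
""").fetchall()

# Single select with IN ... OR.
in_rows = conn.execute("""
    SELECT T1.* FROM Table1 T1
    WHERE T1.ID IN (SELECT T2.ID FROM Table2 T2) OR T1.Valid = 1
    ORDER BY T1.ID
""").fetchall()

# OR inside the join condition: a Valid = 1 row pairs with every Table2 row.
or_join_rows = conn.execute("""
    SELECT T1.* FROM Table1 T1
    INNER JOIN Table2 T2 ON T1.ID = T2.ID OR T1.Valid = 1
""").fetchall()

print(union_rows)         # [(1, 0), (2, 1), (4, 1)]
print(in_rows)            # [(1, 0), (2, 1), (4, 1)]
print(len(or_join_rows))  # 5
```

The last query shows the multiplication described in the question: each row with Valid = 1 is returned once per row of Table2, whether or not its ID matches.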

SQL Server nesting subquery performance?

I am trying to evaluate and see what the best practice is to accomplish the following. I need to run 2 queries against the same tables, only with different criteria, and then join them together on a unique ID. In the end, what I want is all the records that exist in Query A but not in Query B.
SELECT *
FROM (
Select
t1.ID from t1
Left Join t2
Left join t3
where
t1.field1 IN ('criteria 1', 'criteria 2')) QueryA
RIGHT JOIN
(
Select
t1.ID from t1
Left Join t2
Left join t3
where
t1.field1 IN ('criteria 3', 'criteria 4')) QueryB
ON QueryA.ID = QueryB.ID
WHERE QueryA is Null
QueryA and QueryB each return about 2500 records, and each takes less than 10 seconds. The whole query returns about 30 records and takes over 3 minutes.
Thank you so much.
Try using SQL's "except." Do you need the joins to t2 and t3? It looks like you're only filtering on and returning t1 columns.
Select ID from t1 where t1.field1 in ('criteria 1', 'criteria 2')
except
Select ID from t1 where t1.field1 in ('criteria 3', 'criteria 4')
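As a sanity check, EXCEPT can be compared against the outer-join-and-filter form of "in A but not in B" on a few rows. The data below is made up, and it assumes ID is not unique in t1 (otherwise the two criteria sets could never share an ID). Python's sqlite3 module stands in for SQL Server so the sketch is runnable.

```python
import sqlite3

# SQLite stands in for SQL Server; rows are hypothetical and ID repeats on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (ID integer, field1 text);
    insert into t1 values
        (1, 'criteria 1'), (1, 'criteria 3'),
        (2, 'criteria 2'),
        (3, 'criteria 4'),
        (4, 'criteria 1'), (4, 'criteria 1');
""")

# IDs matching the first criteria set, minus IDs matching the second.
except_ids = conn.execute("""
    select ID from t1 where field1 in ('criteria 1', 'criteria 2')
    except
    select ID from t1 where field1 in ('criteria 3', 'criteria 4')
    order by 1
""").fetchall()

# The same question phrased as an outer join that keeps only unmatched rows.
anti_join_ids = conn.execute("""
    select distinct a.ID
    from (select ID from t1 where field1 in ('criteria 1', 'criteria 2')) a
    left join (select ID from t1 where field1 in ('criteria 3', 'criteria 4')) b
      on a.ID = b.ID
    where b.ID is null
    order by a.ID
""").fetchall()

print(except_ids)     # [(2,), (4,)]
print(anti_join_ids)  # [(2,), (4,)]
```

Note that EXCEPT also removes duplicates (ID 4 appears once), which the join form only does because of the explicit DISTINCT.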

Optimize CASE Test in SQL Server

I'm wondering if there's any way to optimize the following SELECT query. (Note: I typed this when writing my question for nonexistent tables and I might not have the correct syntax.)
The goal is, if Table2 contains any related rows I want to set the value of the third column to the number of related rows in Table2. Otherwise, if Table3 contains any related rows I want to set the column to the number of related rows in Table3. Otherwise, I want to set the column value to 0.
SELECT Id, Title,
CASE
WHEN EXISTS (SELECT * FROM Table2 t2 WHERE t2.RelatedId = Table1.Id) THEN
(SELECT COUNT(1) FROM Table2 t2 WHERE t2.RelatedId = Table1.Id)
WHEN EXISTS (SELECT * FROM Table3 t3 WHERE t3.RelatedId = Table1.Id) THEN
(SELECT COUNT(1) FROM Table3 t3 WHERE t3.RelatedId = Table1.Id)
ELSE 0
END AS RelatedCount
FROM Table1
I don't like the fact that I'm basically performing the same query twice (in two cases). Is there any way to do what I want while only performing the query once?
Note that this is part of a much larger query with multiple JOINs and UNIONs so it's not easy to take a completely different approach.
This query should perform much better. You are not just performing the same query twice; because these are correlated subqueries, they may be evaluated once for every row of Table1.
SELECT Id, Title,
coalesce(t2.Count, t3.Count, 0) AS RelatedCount
FROM Table1 t1
left outer join (
SELECT RelatedId, count(*) as Count
FROM Table2
group by RelatedId
) t2 on t1.Id = t2.RelatedId
left outer join (
SELECT RelatedId, count(*) as Count
FROM Table3
group by RelatedId
) t3 on t1.Id = t3.RelatedId
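Before swapping one form for the other, it is worth confirming that both return the same counts. The sketch below does that on made-up rows, using Python's sqlite3 module as a stand-in for SQL Server (an assumption for runnability; it checks results only, not performance). The count alias is named Cnt here purely as an illustrative choice.

```python
import sqlite3

# SQLite stands in for SQL Server; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Table1 (Id integer primary key, Title text);
    create table Table2 (RelatedId integer);
    create table Table3 (RelatedId integer);
    insert into Table1 values (1, 'A'), (2, 'B'), (3, 'C');
    insert into Table2 values (1), (1);            -- only Id 1 has Table2 rows
    insert into Table3 values (1), (2), (2), (2);  -- Ids 1 and 2 have Table3 rows
""")

# Original form: correlated EXISTS / COUNT subqueries inside CASE.
correlated = conn.execute("""
    SELECT Id, Title,
      CASE
        WHEN EXISTS (SELECT * FROM Table2 t2 WHERE t2.RelatedId = Table1.Id) THEN
          (SELECT COUNT(1) FROM Table2 t2 WHERE t2.RelatedId = Table1.Id)
        WHEN EXISTS (SELECT * FROM Table3 t3 WHERE t3.RelatedId = Table1.Id) THEN
          (SELECT COUNT(1) FROM Table3 t3 WHERE t3.RelatedId = Table1.Id)
        ELSE 0
      END AS RelatedCount
    FROM Table1
    ORDER BY Id
""").fetchall()

# Rewritten form: aggregate each table once, then left join and coalesce.
joined = conn.execute("""
    SELECT t1.Id, t1.Title, coalesce(t2.Cnt, t3.Cnt, 0) AS RelatedCount
    FROM Table1 t1
    left outer join (
      SELECT RelatedId, count(*) AS Cnt FROM Table2 GROUP BY RelatedId
    ) t2 ON t1.Id = t2.RelatedId
    left outer join (
      SELECT RelatedId, count(*) AS Cnt FROM Table3 GROUP BY RelatedId
    ) t3 ON t1.Id = t3.RelatedId
    ORDER BY t1.Id
""").fetchall()

print(correlated)  # [(1, 'A', 2), (2, 'B', 3), (3, 'C', 0)]
print(joined)      # [(1, 'A', 2), (2, 'B', 3), (3, 'C', 0)]
```

Row 1 takes its count from Table2 even though Table3 also has a match, because coalesce picks the first non-NULL argument, mirroring the order of the CASE branches.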

Resources