How to represent aggregated null values as 0 in sqlserver [duplicate] - sql-server

I feel like I was always taught to use LEFT JOINs and I often see them mixed with INNERs to accomplish the same type of query throughout several pieces of code that are supposed to do the same thing on different pages. Here goes:
SELECT ac.reac, pt.pt_name, soc.soc_name, pt.pt_soc_code
FROM
AECounts ac
INNER JOIN 1_low_level_term llt on ac.reac = llt.llt_name
LEFT JOIN 1_pref_term pt ON llt.pt_code = pt.pt_code
LEFT JOIN 1_soc_term soc ON pt.pt_soc_code = soc.soc_code
LIMIT 100,10000
That's the one I am working on.
I also see a lot of queries like this:
SELECT COUNT(DISTINCT p.`case`) as count
FROM FDA_CaseReports cr
INNER JOIN ae_indi i ON i.isr = cr.isr
LEFT JOIN ae_case_profile p ON cr.isr = p.isr
It seems like the LEFT JOIN here may as well be an INNER JOIN. Is there any catch?

Is there any catch? Yes there is -- left joins are a form of outer join, while inner joins are a form of, well, inner join.
Here are some examples that show the difference. We'll start with the base data:
mysql> select * from j1;
+----+------------+
| id | thing |
+----+------------+
| 1 | hi |
| 2 | hello |
| 3 | guten tag |
| 4 | ciao |
| 5 | buongiorno |
+----+------------+
mysql> select * from j2;
+----+-----------+
| id | thing |
+----+-----------+
| 1 | bye |
| 3 | tschau |
| 4 | au revoir |
| 6 | so long |
| 7 | tschuessi |
+----+-----------+
And here we'll see the difference between an inner join and a left join:
mysql> select * from j1 inner join j2 on j1.id = j2.id;
+----+-----------+----+-----------+
| id | thing | id | thing |
+----+-----------+----+-----------+
| 1 | hi | 1 | bye |
| 3 | guten tag | 3 | tschau |
| 4 | ciao | 4 | au revoir |
+----+-----------+----+-----------+
Hmm, 3 rows.
mysql> select * from j1 left join j2 on j1.id = j2.id;
+----+------------+------+-----------+
| id | thing | id | thing |
+----+------------+------+-----------+
| 1 | hi | 1 | bye |
| 2 | hello | NULL | NULL |
| 3 | guten tag | 3 | tschau |
| 4 | ciao | 4 | au revoir |
| 5 | buongiorno | NULL | NULL |
+----+------------+------+-----------+
Wow, 5 rows! What happened?
Outer joins such as left join preserve rows that don't match -- so rows with id 2 and 5 are preserved by the left join query. The remaining columns are filled in with NULL.
In other words, left and inner joins are not interchangeable.
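The j1/j2 comparison above can be reproduced with a short script. This is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database instead of MySQL (an assumption made only so the example is self-contained); the table contents are copied from the listings above.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL session shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE j1 (id INTEGER, thing TEXT);
    CREATE TABLE j2 (id INTEGER, thing TEXT);
    INSERT INTO j1 VALUES (1, 'hi'), (2, 'hello'), (3, 'guten tag'),
                          (4, 'ciao'), (5, 'buongiorno');
    INSERT INTO j2 VALUES (1, 'bye'), (3, 'tschau'), (4, 'au revoir'),
                          (6, 'so long'), (7, 'tschuessi');
""")

inner_rows = conn.execute(
    "SELECT j1.id, j1.thing, j2.id, j2.thing "
    "FROM j1 INNER JOIN j2 ON j1.id = j2.id ORDER BY j1.id"
).fetchall()

left_rows = conn.execute(
    "SELECT j1.id, j1.thing, j2.id, j2.thing "
    "FROM j1 LEFT JOIN j2 ON j1.id = j2.id ORDER BY j1.id"
).fetchall()

# Rows kept only by the left join: their j2 columns are NULL (None in Python).
unmatched = [row for row in left_rows if row[2] is None]

print(len(inner_rows), len(left_rows))
print(unmatched)
```

The inner join returns only the ids present in both tables, while the left join additionally keeps the j1 rows with ids 2 and 5.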

Here's a rough answer, that is sort of how I think about joins. Hoping this will be more helpful than a very precise answer due to the aforementioned math issues... ;-)
Inner joins narrow down the set of rows returned. Outer joins (left or right) don't drop rows from the preserved table; they just "pick up" additional columns where possible (the row count can still grow if a row has more than one match).
In your first example, the result will be rows from AECounts that match the conditions specified to the 1_low_level_term table. Then for those rows, it tries to join to 1_pref_term and 1_soc_term. But if there's no match, the rows remain and the joined in columns are null.

An INNER JOIN will only return the rows where there are matching values in both tables, whereas a LEFT JOIN will return ALL the rows from the LEFT table even if there is no matching row in the RIGHT table
A quick example
TableA
ID Value
1 TableA.Value1
2 TableA.Value2
3 TableA.Value3
TableB
ID Value
2 TableB.ValueB
3 TableB.ValueC
An INNER JOIN produces:
SELECT a.ID,a.Value,b.ID,b.Value
FROM TableA a INNER JOIN TableB b ON b.ID = a.ID
a.ID a.Value b.ID b.Value
2 TableA.Value2 2 TableB.ValueB
3 TableA.Value3 3 TableB.ValueC
A LEFT JOIN produces:
SELECT a.ID,a.Value,b.ID,b.Value
FROM TableA a LEFT JOIN TableB b ON b.ID = a.ID
a.ID a.Value b.ID b.Value
1 TableA.Value1 NULL NULL
2 TableA.Value2 2 TableB.ValueB
3 TableA.Value3 3 TableB.ValueC
As you can see, the LEFT JOIN includes the row from TableA where ID = 1 even though there's no matching row in TableB where ID = 1, whereas the INNER JOIN excludes the row specifically because there's no matching row in TableB
HTH

Use an inner join when you want only the results that appear in both tables and match the join condition.
Use a left join when you want all the results from Table A, but if Table B has data relevant to some of Table A's records, then you also want to use that data in the same query.
Use a full join when you want all the results from both Tables.

For newbies, because it helped me when I was one: an INNER JOIN is always a subset of a LEFT or RIGHT JOIN, and all of these are always subsets of a FULL JOIN. It helped me understand the basic idea.

Related

SQL (SSMS) join all records from one table and second table but exclude 'duplicates' from second

I'm having an issue where I want all records in Table B and any non-matching records in Table A, but it's bringing back the matching records in Table A. There is another left join to an additional table which is brought in for reference only.
I'm using SSMS v18.
So ID will be on Table A and Table B. There will be multiple records of this ID on A and B but I don't want the duplicate records if date/time and ID is the same in Table A and in Table B.
e.g. - I've simplified the query I'm using below.
Select
a.id,
a.datetime,
a.emp_id,
c.team_id
From
table_a as a
Left Join
table_b as b On a.id = b.id
And a.datetime <> b.datetime
Left Join
table_c as c On a.emp_id = c.emp_id
As there aren't any NULLs, I don't think I can use that. I don't believe a full outer join will return what I need.
Is there a way to solve this? A union query solution will not work, as Table A and Table B do not have the same columns/column names.
Please let me know if more information is required.
EDIT - Additional
Apologies but now there's been a change of requirement where I now need to remove the matching records rather than remove just the duplicates. Is there a way around this?
Additional - Data Examples
Table A:
+----+------------------+--------+
| Id | Datetime | emp_id |
+----+------------------+--------+
| 1 | 20/04/2021 10:30 | a |
| 1 | 20/04/2021 11:15 | a |
| 2 | 21/04/2021 12:10 | b |
| 2 | 21/04/2021 13:20 | b |
| 2 | 22/04/2021 15:30 | c |
| 3 | 23/04/2021 09:45 | d |
| 4 | 23/04/2021 14:35 | e |
+----+------------------+--------+
Table B:
+----+------------------+-------------+
| Id | Datetime | other_field |
+----+------------------+-------------+
| 1 | 20/04/2021 10:30 | x |
| 2 | 21/04/2021 13:20 | y |
| 4 | 23/04/2021 14:35 | z |
+----+------------------+-------------+
Desired Output:
+----+------------------+--------+---------+
| Id | Datetime | emp_id | team_id |
+----+------------------+--------+---------+
| 1 | 20/04/2021 11:15 | a | team_01 |
| 2 | 21/04/2021 12:10 | b | team_02 |
| 2 | 22/04/2021 15:30 | c | team_01 |
| 3 | 23/04/2021 09:45 | d | team_02 |
+----+------------------+--------+---------+
So the duplicate ID & Datetime in Table B does not show in final output (regardless of any other fields)
You seem to need a right join instead of a left join. A left join will bring back all rows in table A, and all rows in table B which match the condition which you provided. You seem to want all in table B, which requires a right join.
I know some developers who have an aversion to right joins. If you feel that way, you can simply switch the order of the tables in your query so that table B is listed first and left joined to table A. I feel that the first solution is the easier one, though you need to be comfortable with it.
Here are my solutions, listed in the order in which I mentioned above.
Select
a.id
,a.datetime
,a.emp_id
,c.team_id
From
table_a as a
RIGHT Join -- here is my change
table_b as b On a.id = b.id
And a.datetime <> b.datetime
Left Join
table_c as c On a.emp_id = c.emp_id;
/*solution II*/
Select
a.id
,a.datetime
,a.emp_id
,c.team_id
From
table_b as b
Left Join
table_a as a On a.id = b.id
And a.datetime <> b.datetime
Left Join
table_c as c On a.emp_id = c.emp_id;
/*Updated solution, based on the comments (requirements seem to have changed)*/
Select
a.id
,a.datetime
,a.emp_id
,c.team_id
From
table_b as b
Left Join
table_a as a On a.id = b.id
Left Join
table_c as c On a.emp_id = c.emp_id
WHERE (a.datetime <> b.datetime OR b.datetime IS NULL);
Explanation of the updated solution: there was nothing to take into account the rows which would not match, hence the OR in the WHERE clause.
Please see Microsoft documentation on joins below.
https://learn.microsoft.com/en-us/sql/relational-databases/performance/joins?view=sql-server-ver15#:~:text=Joins%20indicate%20how%20SQL%20Server,be%20used%20for%20the%20join.
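As an alternative to the queries above, the "Desired Output" table in the question can also be read as an anti-join: keep the rows of Table A that have no row in Table B with the same ID and Datetime. The following is a rough sketch of that reading, not the answer's own method. It runs on SQLite through Python's sqlite3 module rather than SQL Server, stores the datetimes as plain text, and fills table_c with the emp_id/team_id pairs implied by the desired output; all three are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, datetime TEXT, emp_id TEXT);
    CREATE TABLE table_b (id INTEGER, datetime TEXT, other_field TEXT);
    CREATE TABLE table_c (emp_id TEXT, team_id TEXT);

    INSERT INTO table_a VALUES
        (1, '20/04/2021 10:30', 'a'), (1, '20/04/2021 11:15', 'a'),
        (2, '21/04/2021 12:10', 'b'), (2, '21/04/2021 13:20', 'b'),
        (2, '22/04/2021 15:30', 'c'), (3, '23/04/2021 09:45', 'd'),
        (4, '23/04/2021 14:35', 'e');
    INSERT INTO table_b VALUES
        (1, '20/04/2021 10:30', 'x'), (2, '21/04/2021 13:20', 'y'),
        (4, '23/04/2021 14:35', 'z');
    -- Assumed contents: team ids inferred from the question's desired output.
    INSERT INTO table_c VALUES
        ('a', 'team_01'), ('b', 'team_02'), ('c', 'team_01'), ('d', 'team_02');
""")

# Anti-join: match on BOTH id and datetime, then keep only the rows of
# table_a for which no partner row in table_b was found (b.id IS NULL).
rows = conn.execute("""
    SELECT a.id, a.datetime, a.emp_id, c.team_id
    FROM table_a AS a
    LEFT JOIN table_b AS b
           ON a.id = b.id AND a.datetime = b.datetime
    LEFT JOIN table_c AS c
           ON a.emp_id = c.emp_id
    WHERE b.id IS NULL
    ORDER BY a.id, a.datetime
""").fetchall()

for row in rows:
    print(row)
```

The WHERE b.id IS NULL test is what makes this an anti-join: it only works because the equality conditions sit in the ON clause, so unmatched rows of table_a survive with NULLs on the table_b side.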

Combine two queries with different 'FROM' tables but similar 'JOIN' tables

I have two queries that I'm trying to combine into one result set.
Query 1:
SELECT t1.evalID, t2.[Order], COUNT(t2.StepID) AS 'Total Categories'
FROM Evals t1
JOIN Steps t2 ON t1.TemplateID = t2.TemplateID
JOIN GradingCats t3 ON t2.StepID = t3.StepID
GROUP BY t1.EvalID, t2.[Order]
ORDER BY t2.[Order]
Query 2:
SELECT t4.EvaluatorID, t6.StepID, t6.[Order], COUNT(t4.Grade) AS 'Grades Entered'
FROM Grading t4
JOIN GradingCats t5 ON t4.GradingCatID = t5.GradingCatID
JOIN Steps t6 ON t5.StepID = t6.StepID
GROUP BY t6.StepID, t4.EvaluatorID, t6.[Order]
My end goal is to locate which steps of an evaluation have missing grades.
edit (sample data):
Query #1
|---------------------|------------------|---------------------|
| evalID | Order | Total Categories |
|---------------------|------------------|---------------------|
| 81 | 01.00 | 17 |
|---------------------|------------------|---------------------|
| 81 | 02.00 | 17 |
|---------------------|------------------|---------------------|
| 81 | 03.00 | 17 |
|---------------------|------------------|---------------------|
Query #2
|---------------------|------------------|---------------------|------------------|
| evaluatorID | Step | Order | Grades Entered |
|---------------------|------------------|---------------------|------------------|
| 1178 | 609 | 01.00 | 2 |
|---------------------|------------------|---------------------|------------------|
| 1178 | 615 | 02.00 | 3 |
|---------------------|------------------|---------------------|------------------|
| 9441 | 609 | 01.00 | 17 |
|---------------------|------------------|---------------------|------------------|
| 9441 | 609 | 02.00 | 17 |
|---------------------|------------------|---------------------|------------------|
| 9441 | 609 | 03.00 | 17 |
|---------------------|------------------|---------------------|------------------|
Starting with the first query which shows all the steps associated with an EVAL, you can LEFT OUTER JOIN the second query, and the steps that are NULL on the right side of the query will be the ones that are missing grades.
In order to do this, there must be some way in your tables to link Grading to Evals. This column is not evident from the code you posted, but I will assume it is there. Maybe it's through GradingCats.
In shortened pseudo-code, just to show what I mean:
SELECT ...
FROM Evals e
INNER JOIN Steps s ON e.TemplateID = s.TemplateID
LEFT OUTER JOIN Grading g ON g.EvalID = e.EvalID --use whatever means you have to show which Eval a Grade is from
LEFT OUTER JOIN Steps gs ON {join to Grading through GradingCats as in your second query}
WHERE gs.StepID IS NULL
In analyzing the result of this query, all the Steps of every Eval will be in s.StepID, and when the same row has a NULL for gs.StepID, that means that step did not get a grade.
Note that you won't want to do any GROUP BY in this query, since you want a row-level analysis.
A coworker (with more knowledge of the data than I) slightly modified my query:
SELECT query1.stepID, Categories, Graded
FROM
(
SELECT rs.stepid, COUNT(c.category) AS 'Categories'
FROM Evals e
JOIN RunScriptSteps rs ON e.TemplateID = rs.TemplateID
JOIN GradingCats c ON rs.StepID = c.StepID
WHERE EvalID = *(someNumber)*
GROUP BY rs.stepid
)AS query1
LEFT JOIN
(
SELECT s.StepID, COUNT(Grade) AS 'Graded'
FROM Grading g
JOIN GradingCats c ON g.GradingCatID = c.GradingCatID
JOIN Steps s ON c.StepID = s.StepID
WHERE EvalID = *(someNumber)*
GROUP BY s.stepid
) AS query2
ON query1.stepid = query2.stepid
ORDER BY stepid ASC

Many-to-many that shows columns with null

I'm having some difficulty trying to figure how to adjust my query. I'm not very good at SQL queries as it's not my forte. Anyway, I'm not sure what I'm doing wrong. Here's my table setup.
ID | Customer
---+-------------
1 | John
2 | Jane
3 | Steve
ID | Assets
---+-------------
1 | RealEstate
2 | Currency
3 | Stocks
CustomerID | AssetConfigurationId | Status
-----------+----------------------+-------
1 | 1 | E
1 | 2 | F
1 | 3 | X
2 | 3 | X
And if I query customer = 3, I want to get the following
AssetConfigurationId | Status
---------------------+------------
1 | null
2 | null
3 | X
Currently have this. I'm trying to understand how I can use left join to show all the assets and just have the values of the statuses to null for a specific customer. Right now it only shows the 3rd row. Trying to do this in a SQL Server stored procedure so that my .net application can get a list of the assets already and I'll just modify the statuses when it comes to converting them to objects.
select
ac.Id,
r.Status
from
assets ac
left join
assets_ref r on r.AssetConfigurationId = ac.Id
where
r.CustomerID = 3
Move your WHERE condition into a subquery (derived table):
select
ac.Id,
r.Status
from assets ac
left join
(select * from assets_ref where CustomerID = 3) r
on r.AssetConfigurationId = ac.Id;
You can use multiple conditions in JOINs:
select
ac.Id,
r.Status
from assets ac
left join assets_ref r
on r.AssetConfigurationId = ac.Id
and CustomerID = 3;
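To see why moving the customer filter out of WHERE matters, here is a small sketch using Python's sqlite3 module (an assumption; the question targets SQL Server) and the sample tables from the question. In the data as printed, the only customer with a single row for asset 3 is CustomerID 2, so the sketch filters on 2 rather than 3; the variable name customer_id is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assets (Id INTEGER, Assets TEXT);
    CREATE TABLE assets_ref (CustomerID INTEGER,
                             AssetConfigurationId INTEGER,
                             Status TEXT);
    INSERT INTO assets VALUES (1, 'RealEstate'), (2, 'Currency'), (3, 'Stocks');
    INSERT INTO assets_ref VALUES (1, 1, 'E'), (1, 2, 'F'), (1, 3, 'X'), (2, 3, 'X');
""")

customer_id = 2  # assumption: the customer that owns the lone (3, 'X') row

# Filter in WHERE: NULL-extended rows fail the test, so they disappear.
where_rows = conn.execute("""
    SELECT ac.Id, r.Status
    FROM assets ac
    LEFT JOIN assets_ref r ON r.AssetConfigurationId = ac.Id
    WHERE r.CustomerID = ?
    ORDER BY ac.Id
""", (customer_id,)).fetchall()

# Filter in ON: every asset is kept; Status is NULL where nothing matched.
on_rows = conn.execute("""
    SELECT ac.Id, r.Status
    FROM assets ac
    LEFT JOIN assets_ref r
           ON r.AssetConfigurationId = ac.Id
          AND r.CustomerID = ?
    ORDER BY ac.Id
""", (customer_id,)).fetchall()

print(where_rows)
print(on_rows)
```

The WHERE version behaves like an inner join, which is the "only shows the 3rd row" symptom from the question; the ON version returns one row per asset.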

SQL Server - performance for filtering using LEFT JOIN ON condition versus WHERE condition [duplicate]

After reading it, this is not a duplicate of Explicit vs Implicit SQL Joins.
The answer may be related (or even the same) but the question is different.
What is the difference and what should go in each?
If I understand the theory correctly, the query optimizer should be able to use both interchangeably.
They are not the same thing.
Consider these queries:
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345
and
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
AND Orders.ID = 12345
The first will return an order and its lines, if any, for order number 12345.
The second will return all orders, but only order 12345 will have any lines associated with it.
With an INNER JOIN, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.
1. Does not matter for inner joins
2. Matters for outer joins
   a. WHERE clause: After joining. Records will be filtered after the join has taken place.
   b. ON clause: Before joining. Records (from the right table) will be filtered before joining. This may end up as NULL in the result (since it is an OUTER join).
Example: Consider the below tables:
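documents:
| id | name      |
|----|-----------|
| 1  | Document1 |
| 2  | Document2 |
| 3  | Document3 |
| 4  | Document4 |
| 5  | Document5 |
downloads:
| id | document_id | username |
|----|-------------|----------|
| 1  | 1           | sandeep  |
| 2  | 1           | simi     |
| 3  | 2           | sandeep  |
| 4  | 2           | reya     |
| 5  | 3           | simi     |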
a) Inside WHERE clause:
SELECT documents.name, downloads.id
FROM documents
LEFT OUTER JOIN downloads
ON documents.id = downloads.document_id
WHERE username = 'sandeep'
For above query the intermediate join table will look like this.
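| id (from documents) | name      | id (from downloads) | document_id | username |
|---------------------|-----------|---------------------|-------------|----------|
| 1                   | Document1 | 1                   | 1           | sandeep  |
| 1                   | Document1 | 2                   | 1           | simi     |
| 2                   | Document2 | 3                   | 2           | sandeep  |
| 2                   | Document2 | 4                   | 2           | reya     |
| 3                   | Document3 | 5                   | 3           | simi     |
| 4                   | Document4 | NULL                | NULL        | NULL     |
| 5                   | Document5 | NULL                | NULL        | NULL     |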
After applying the WHERE clause and selecting the listed attributes, the result will be:
| name      | id |
|-----------|----|
| Document1 | 1  |
| Document2 | 3  |
b) Inside JOIN clause
SELECT documents.name, downloads.id
FROM documents
LEFT OUTER JOIN downloads
ON documents.id = downloads.document_id
AND username = 'sandeep'
For above query the intermediate join table will look like this.
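| id (from documents) | name      | id (from downloads) | document_id | username |
|---------------------|-----------|---------------------|-------------|----------|
| 1                   | Document1 | 1                   | 1           | sandeep  |
| 2                   | Document2 | 3                   | 2           | sandeep  |
| 3                   | Document3 | NULL                | NULL        | NULL     |
| 4                   | Document4 | NULL                | NULL        | NULL     |
| 5                   | Document5 | NULL                | NULL        | NULL     |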
Notice how the rows in documents that did not match both the conditions are populated with NULL values.
After Selecting the listed attributes, the result will be:
| name      | id   |
|-----------|------|
| Document1 | 1    |
| Document2 | 3    |
| Document3 | NULL |
| Document4 | NULL |
| Document5 | NULL |
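The two documents/downloads queries above can be checked with a short script. This is a sketch using Python's built-in sqlite3 module (an assumption; any SQL engine with LEFT OUTER JOIN should behave the same way for this case), with the table contents taken from the example and an ORDER BY added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER, name TEXT);
    CREATE TABLE downloads (id INTEGER, document_id INTEGER, username TEXT);
    INSERT INTO documents VALUES
        (1, 'Document1'), (2, 'Document2'), (3, 'Document3'),
        (4, 'Document4'), (5, 'Document5');
    INSERT INTO downloads VALUES
        (1, 1, 'sandeep'), (2, 1, 'simi'), (3, 2, 'sandeep'),
        (4, 2, 'reya'), (5, 3, 'simi');
""")

# a) Condition in WHERE: applied after the join, so NULL-extended rows are dropped.
where_result = conn.execute("""
    SELECT documents.name, downloads.id
    FROM documents
    LEFT OUTER JOIN downloads
        ON documents.id = downloads.document_id
    WHERE username = 'sandeep'
    ORDER BY documents.id
""").fetchall()

# b) Condition in ON: applied while joining, so every document is preserved.
on_result = conn.execute("""
    SELECT documents.name, downloads.id
    FROM documents
    LEFT OUTER JOIN downloads
        ON documents.id = downloads.document_id
        AND username = 'sandeep'
    ORDER BY documents.id
""").fetchall()

print(where_result)
print(on_result)
```

Both results match the tables shown above: two rows for the WHERE variant and five rows, three of them with a NULL download id, for the ON variant.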
On INNER JOINs they are interchangeable, and the optimizer will rearrange them at will.
On OUTER JOINs, they are not necessarily interchangeable, depending on which side of the join they depend on.
I put them in either place depending on the readability.
The way I do it is:
Always put the join conditions in the ON clause if you are doing an INNER JOIN. So, do not add any WHERE conditions to the ON clause, put them in the WHERE clause.
If you are doing a LEFT JOIN, add any WHERE conditions to the ON clause for the table in the right side of the join. This is a must, because adding a WHERE clause that references the right side of the join will convert the join to an INNER JOIN.
The exception is when you are looking for the records that are not in a particular table. You would add the reference to a unique identifier (that is never NULL) in the table on the right side of the join to the WHERE clause this way: WHERE t2.idfield IS NULL. So, the only time you should reference a table on the right side of the join in the WHERE clause is to find those records which are not in the table.
On an inner join, they mean the same thing. However you will get different results in an outer join depending on if you put the join condition in the WHERE vs the ON clause. Take a look at this related question and this answer (by me).
I think it makes the most sense to be in the habit of always putting the join condition in the ON clause (unless it is an outer join and you actually do want it in the where clause) as it makes it clearer to anyone reading your query what conditions the tables are being joined on, and also it helps prevent the WHERE clause from being dozens of lines long.
Short answer
It depends on whether the JOIN type is INNER or OUTER.
For INNER JOIN the answer is yes since an INNER JOIN statement can be rewritten as a CROSS JOIN with a WHERE clause matching the same condition you used in the ON clause of the INNER JOIN query.
However, this only applies to INNER JOIN, not for OUTER JOIN.
Long answer
Considering we have the following post and post_comment tables:
The post has the following records:
| id | title |
|----|-----------|
| 1 | Java |
| 2 | Hibernate |
| 3 | JPA |
and the post_comment has the following three rows:
| id | review | post_id |
|----|-----------|---------|
| 1 | Good | 1 |
| 2 | Excellent | 1 |
| 3 | Awesome | 2 |
SQL INNER JOIN
The SQL JOIN clause allows you to associate rows that belong to different tables. For instance, a CROSS JOIN will create a Cartesian Product containing all possible combinations of rows between the two joining tables.
While the CROSS JOIN is useful in certain scenarios, most of the time, you want to join tables based on a specific condition. And, that's where INNER JOIN comes into play.
The SQL INNER JOIN allows us to filter the Cartesian Product of joining two tables based on a condition that is specified via the ON clause.
SQL INNER JOIN - ON "always true" condition
If you provide an "always true" condition, the INNER JOIN will not filter the joined records, and the result set will contain the Cartesian Product of the two joining tables.
For instance, if we execute the following SQL INNER JOIN query:
SELECT
p.id AS "p.id",
pc.id AS "pc.id"
FROM post p
INNER JOIN post_comment pc ON 1 = 1
We will get all combinations of post and post_comment records:
| p.id | pc.id |
|---------|------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
So, if the ON clause condition is "always true", the INNER JOIN is simply equivalent to a CROSS JOIN query:
SELECT
p.id AS "p.id",
pc.id AS "pc.id"
FROM post p
CROSS JOIN post_comment pc
WHERE 1 = 1
ORDER BY p.id, pc.id
SQL INNER JOIN - ON "always false" condition
On the other hand, if the ON clause condition is "always false", then all the joined records are going to be filtered out and the result set will be empty.
So, if we execute the following SQL INNER JOIN query:
SELECT
p.id AS "p.id",
pc.id AS "pc.id"
FROM post p
INNER JOIN post_comment pc ON 1 = 0
ORDER BY p.id, pc.id
We won't get any result back:
| p.id | pc.id |
|---------|------------|
That's because the query above is equivalent to the following CROSS JOIN query:
SELECT
p.id AS "p.id",
pc.id AS "pc.id"
FROM post p
CROSS JOIN post_comment pc
WHERE 1 = 0
ORDER BY p.id, pc.id
SQL INNER JOIN - ON clause using the Foreign Key and Primary Key columns
The most common ON clause condition is the one that matches the Foreign Key column in the child table with the Primary Key column in the parent table, as illustrated by the following query:
SELECT
p.id AS "p.id",
pc.post_id AS "pc.post_id",
pc.id AS "pc.id",
p.title AS "p.title",
pc.review AS "pc.review"
FROM post p
INNER JOIN post_comment pc ON pc.post_id = p.id
ORDER BY p.id, pc.id
When executing the above SQL INNER JOIN query, we get the following result set:
| p.id | pc.post_id | pc.id | p.title | pc.review |
|---------|------------|------------|------------|-----------|
| 1 | 1 | 1 | Java | Good |
| 1 | 1 | 2 | Java | Excellent |
| 2 | 2 | 3 | Hibernate | Awesome |
So, only the records that match the ON clause condition are included in the query result set. In our case, the result set contains all the post along with their post_comment records. The post rows that have no associated post_comment are excluded since they can not satisfy the ON Clause condition.
Again, the above SQL INNER JOIN query is equivalent to the following CROSS JOIN query:
SELECT
p.id AS "p.id",
pc.post_id AS "pc.post_id",
pc.id AS "pc.id",
p.title AS "p.title",
pc.review AS "pc.review"
FROM post p, post_comment pc
WHERE pc.post_id = p.id
The Cartesian product of the two tables is shown below. Only the rows where pc.post_id equals p.id (the first, second, and sixth) satisfy the WHERE clause, and only these records are going to be included in the result set. That's the best way to visualize how the INNER JOIN clause works.
| p.id | pc.post_id | pc.id | p.title | pc.review |
|------|------------|-------|-----------|-----------|
| 1 | 1 | 1 | Java | Good |
| 1 | 1 | 2 | Java | Excellent |
| 1 | 2 | 3 | Java | Awesome |
| 2 | 1 | 1 | Hibernate | Good |
| 2 | 1 | 2 | Hibernate | Excellent |
| 2 | 2 | 3 | Hibernate | Awesome |
| 3 | 1 | 1 | JPA | Good |
| 3 | 1 | 2 | JPA | Excellent |
| 3 | 2 | 3 | JPA | Awesome |
Conclusion
An INNER JOIN statement can be rewritten as a CROSS JOIN with a WHERE clause matching the same condition you used in the ON clause of the INNER JOIN query.
Note that this only applies to INNER JOIN, not to OUTER JOIN.
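The equivalence between INNER JOIN ... ON and CROSS JOIN ... WHERE can be checked directly on the post and post_comment rows listed earlier. The sketch below uses Python's sqlite3 module as the engine (an assumption; the original answer does not name a database) and compares the two result sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER, title TEXT);
    CREATE TABLE post_comment (id INTEGER, review TEXT, post_id INTEGER);
    INSERT INTO post VALUES (1, 'Java'), (2, 'Hibernate'), (3, 'JPA');
    INSERT INTO post_comment VALUES
        (1, 'Good', 1), (2, 'Excellent', 1), (3, 'Awesome', 2);
""")

# Size of the unfiltered Cartesian product (3 posts x 3 comments).
cartesian_count = conn.execute(
    "SELECT COUNT(*) FROM post p CROSS JOIN post_comment pc"
).fetchone()[0]

inner_rows = conn.execute("""
    SELECT p.id, pc.post_id, pc.id, p.title, pc.review
    FROM post p
    INNER JOIN post_comment pc ON pc.post_id = p.id
    ORDER BY p.id, pc.id
""").fetchall()

cross_rows = conn.execute("""
    SELECT p.id, pc.post_id, pc.id, p.title, pc.review
    FROM post p
    CROSS JOIN post_comment pc
    WHERE pc.post_id = p.id
    ORDER BY p.id, pc.id
""").fetchall()

print(cartesian_count)
print(inner_rows == cross_rows)
```

Of the nine combinations in the Cartesian product, the same three rows survive in both formulations.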
Let's consider those tables :
A
id | SomeData
B
id | id_A | SomeOtherData
id_A being a foreign key to table A
Writing this query:
SELECT *
FROM A
LEFT JOIN B
ON A.id = B.id_A;
Will provide this result :
/ : part of the result
                                       B
                      +---------------------------------+
           A          |                                 |
+---------------------+-------+                         |
|/////////////////////|///////|                         |
|/////////////////////|///////|                         |
|/////////////////////|///////|                         |
|/////////////////////|///////|                         |
|/////////////////////+-------+-------------------------+
|/////////////////////////////|
+-----------------------------+
Rows that are in A but not in B get NULL values for B's columns.
Now, let's consider a specific value of B.id_A, and highlight it in the previous result:
/ : part of the result
* : part of the result with the specific B.id_A
                                       B
                      +---------------------------------+
           A          |                                 |
+---------------------+-------+                         |
|/////////////////////|///////|                         |
|/////////////////////|///////|                         |
|/////////////////////+---+///|                         |
|/////////////////////|***|///|                         |
|/////////////////////+---+---+-------------------------+
|/////////////////////////////|
+-----------------------------+
Writing this query:
SELECT *
FROM A
LEFT JOIN B
ON A.id = B.id_A
AND B.id_A = SpecificPart;
Will provide this result :
/ : part of the result
* : part of the result with the specific B.id_A
                                       B
                      +---------------------------------+
           A          |                                 |
+---------------------+-------+                         |
|/////////////////////|       |                         |
|/////////////////////|       |                         |
|/////////////////////+---+   |                         |
|/////////////////////|***|   |                         |
|/////////////////////+---+---+-------------------------+
|/////////////////////////////|
+-----------------------------+
This is because the extra ON condition keeps only the B rows where B.id_A = SpecificPart; the other rows of A are still returned, with NULLs for B's columns.
Now, let's change the query to this :
SELECT *
FROM A
LEFT JOIN B
ON A.id = B.id_A
WHERE B.id_A = SpecificPart;
The result is now :
/ : part of the result
* : part of the result with the specific B.id_A
                                       B
                      +---------------------------------+
           A          |                                 |
+---------------------+-------+                         |
|                     |       |                         |
|                     |       |                         |
|                     +---+   |                         |
|                     |***|   |                         |
|                     +---+---+-------------------------+
|                             |
+-----------------------------+
This is because the whole result is filtered against B.id_A = SpecificPart, which also removes the rows where B.id_A IS NULL, that is, the rows of A that have no match in B.
There is a great difference between the WHERE clause and the ON clause when it comes to left joins.
Here is an example:
mysql> desc t1;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | int(11) | NO | | NULL | |
| fid | int(11) | NO | | NULL | |
| v | varchar(20) | NO | | NULL | |
+-------+-------------+------+-----+---------+-------+
Here fid is the id of table t2.
mysql> desc t2;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | int(11) | NO | | NULL | |
| v | varchar(10) | NO | | NULL | |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
Query on "on clause" :
mysql> SELECT * FROM `t1` left join t2 on fid = t2.id AND t1.v = 'K'
-> ;
+----+-----+---+------+------+
| id | fid | v | id | v |
+----+-----+---+------+------+
| 1 | 1 | H | NULL | NULL |
| 2 | 1 | B | NULL | NULL |
| 3 | 2 | H | NULL | NULL |
| 4 | 7 | K | NULL | NULL |
| 5 | 5 | L | NULL | NULL |
+----+-----+---+------+------+
5 rows in set (0.00 sec)
Query on "where clause":
mysql> SELECT * FROM `t1` left join t2 on fid = t2.id where t1.v = 'K';
+----+-----+---+------+------+
| id | fid | v | id | v |
+----+-----+---+------+------+
| 4 | 7 | K | NULL | NULL |
+----+-----+---+------+------+
1 row in set (0.00 sec)
It is clear that:
the first query (condition in the ON clause) returns every row from t1, but only rows with t1.v = 'K' can have an associated row from t2.
The second query (condition in the WHERE clause) returns only the t1 rows with t1.v = 'K', along with their dependent row from t2, if any.
In terms of the optimizer, it shouldn't make a difference whether you define your join clauses with ON or WHERE.
However, IMHO, it's much clearer to use the ON clause when performing joins. That way you have a specific section of your query that dictates how the join is handled, rather than having it intermixed with the rest of the WHERE clauses.
Are you trying to join data or filter data?
For readability it makes the most sense to isolate these use cases to ON and WHERE respectively.
join data in ON
filter data in WHERE
It can become very difficult to read a query where the JOIN condition and a filtering condition exist in the WHERE clause.
Performance-wise you should not see a difference, though different types of SQL sometimes handle query planning differently, so it can be worth trying ¯\_(ツ)_/¯ (Do be aware of caching affecting the query speed.)
Also, as others have noted, if you use an outer join you will get different results if you place the filter condition in the ON clause, because it only affects one of the tables.
I wrote a more in depth post about this here:
https://dataschool.com/learn/difference-between-where-and-on-in-sql
I think this distinction can best be explained via the logical order of operations in SQL, which is, simplified:
FROM (including joins)
WHERE
GROUP BY
Aggregations
HAVING
WINDOW
SELECT
DISTINCT
UNION, INTERSECT, EXCEPT
ORDER BY
OFFSET
FETCH
Joins are not a clause of the select statement, but an operator inside of FROM. As such, all ON clauses belonging to the corresponding JOIN operator have "already happened" logically by the time logical processing reaches the WHERE clause. This means that in the case of a LEFT JOIN, for example, the outer join's semantics have already happened by the time the WHERE clause is applied.
I've explained the following example more in depth in this blog post. When running this query:
SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id)
FROM actor a
LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
WHERE film_id < 10
GROUP BY a.actor_id, a.first_name, a.last_name
ORDER BY count(fa.film_id) ASC;
The LEFT JOIN doesn't really have any useful effect, because even if an actor did not play in a film, the actor will be filtered, as its FILM_ID will be NULL and the WHERE clause will filter such a row. The result is something like:
ACTOR_ID FIRST_NAME LAST_NAME COUNT
--------------------------------------
194 MERYL ALLEN 1
198 MARY KEITEL 1
30 SANDRA PECK 1
85 MINNIE ZELLWEGER 1
123 JULIANNE DENCH 1
I.e. just as if we inner joined the two tables. If we move the filter predicate into the ON clause, it now becomes a criterion for the outer join:
SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id)
FROM actor a
LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
AND film_id < 10
GROUP BY a.actor_id, a.first_name, a.last_name
ORDER BY count(fa.film_id) ASC;
Meaning the result will contain actors without any films, or without any films with FILM_ID < 10
ACTOR_ID FIRST_NAME LAST_NAME COUNT
-----------------------------------------
3 ED CHASE 0
4 JENNIFER DAVIS 0
5 JOHNNY LOLLOBRIGIDA 0
6 BETTE NICHOLSON 0
...
1 PENELOPE GUINESS 1
200 THORA TEMPLE 1
2 NICK WAHLBERG 1
198 MARY KEITEL 1
In short
Always put your predicate where it makes most sense, logically.
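The effect on an aggregate can be shown with a tiny data set. The following is a sketch on SQLite through Python's sqlite3 module; the three actors and their film ids are made up for illustration (they are not the full sample database used above). COUNT(fa.film_id) ignores NULLs, so an actor preserved by the outer join without a qualifying film is reported with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    -- Illustrative rows only; film ids are invented for this sketch.
    INSERT INTO actor VALUES
        (1, 'PENELOPE', 'GUINESS'), (2, 'NICK', 'WAHLBERG'), (3, 'ED', 'CHASE');
    INSERT INTO film_actor VALUES (1, 1), (1, 23), (2, 3), (3, 17);
""")

# Predicate in WHERE: actors without a film below 10 are filtered out entirely.
where_counts = conn.execute("""
    SELECT a.actor_id, COUNT(fa.film_id)
    FROM actor a
    LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
    WHERE fa.film_id < 10
    GROUP BY a.actor_id
    ORDER BY a.actor_id
""").fetchall()

# Predicate in ON: every actor is kept; COUNT skips the NULL film_id and yields 0.
on_counts = conn.execute("""
    SELECT a.actor_id, COUNT(fa.film_id)
    FROM actor a
    LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
                           AND fa.film_id < 10
    GROUP BY a.actor_id
    ORDER BY a.actor_id
""").fetchall()

print(where_counts)
print(on_counts)
```

Counting a column from the right-hand table (rather than COUNT(*)) is what turns the NULL-extended row into a 0 instead of a 1.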
In SQL, the WHERE and ON clauses both specify conditions, but the major difference between them is that the WHERE clause is used in SELECT/UPDATE statements to filter rows, whereas the ON clause is used in joins, where it checks whether the records in the source and target tables match before the tables are joined.
For Example: - 'WHERE'
SELECT * FROM employee WHERE employee_id=101
For Example: - 'ON'
There are two tables employee and employee_details, the matching columns are employee_id.
SELECT * FROM employee
INNER JOIN employee_details
ON employee.employee_id = employee_details.employee_id
Hope I have answered your Question.
Revert for any clarifications.
I think it's an effect of the order in which the join is evaluated.
In the first LEFT JOIN query, SQL does the left join first and then applies the WHERE filter.
In the second query, the Orders.ID = 12345 condition is evaluated first, as part of the join.
For an inner join, WHERE and ON can be used interchangeably. In fact, it's possible to use ON in a correlated subquery. For example:
update mytable
set myscore=100
where exists (
select 1 from table1
inner join table2
on (table2.key = mytable.key)
inner join table3
on (table3.key = table2.key and table3.key = table1.key)
...
)
This is (IMHO) utterly confusing to a human, and it's very easy to forget to link table1 to anything (because the "driver" table doesn't have an "on" clause), but it's legal.
For better performance, tables should have an indexed column to use for joins.
So if the column you put the condition on is not one of those indexed columns, then I suspect it is better to keep it in WHERE.
That way you join using the indexed columns, and after the join you run the condition on the non-indexed column.
Normally, filtering is processed in the WHERE clause once the two tables have already been joined. It’s possible, though that you might want to filter one or both of the tables before joining them.
i.e, the where clause applies to the whole result set whereas the on clause only applies to the join in question.
They are equivalent, literally.
In most open-source databases (the most notable examples being MySQL and PostgreSQL), query planning is a variant of the classic algorithm from Access Path Selection in a Relational Database Management System (Selinger et al, 1979). In this approach, the conditions are of two types:
conditions referring to a single table (used for filtering)
conditions referring to two tables (treated as join conditions, regardless of where they appear)
In MySQL in particular, you can see for yourself, by tracing the optimizer, that the join .. on conditions are replaced during parsing by the equivalent where conditions. A similar thing happens in PostgreSQL (though there's no way to see it through a log; you have to read the source description).
Anyway, the main point is that the difference between the two syntax variants is lost during the parsing/query-rewriting phase; it does not even reach the query planning and execution phase. So there's no question about whether they are equivalent in terms of performance: they become identical long before they reach the execution phase.
You can use explain to verify that they produce identical plans. E.g., in Postgres, the plan will contain a join clause even if you didn't use the join..on syntax anywhere.
Oracle and SQL Server are not open source but, as far as I know, they are based on equivalence rules (similar to those in relational algebra), and they also produce identical execution plans in both cases.
Obviously, the two syntax styles are not equivalent for outer joins; for those you have to use the join ... on syntax.
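As a rough check of the inner-join equivalence described above, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is swapped in only because it runs without a server; it is not one of the engines discussed in this answer. The tables a and b and their contents are made up for illustration.

```python
import sqlite3

# Hypothetical tables a and b, each with a column c to join on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, c INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, c INTEGER);
    INSERT INTO a (c) VALUES (1), (2), (3);
    INSERT INTO b (c) VALUES (2), (3), (4);
""")

on_style = "SELECT a.c, b.c FROM a INNER JOIN b ON a.c = b.c ORDER BY a.c"
where_style = "SELECT a.c, b.c FROM a, b WHERE a.c = b.c ORDER BY a.c"

on_rows = conn.execute(on_style).fetchall()
where_rows = conn.execute(where_style).fetchall()


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


on_plan = plan(on_style)
where_plan = plan(where_style)

print(on_rows, where_rows)
print(on_plan)
print(where_plan)
```

Both queries return the same rows here; the two printed plans can be compared by eye. The exact plan text depends on the SQLite version, so it is not shown.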
Regarding your question,
For an inner join, 'on' and 'where' give the same result, as long as your server accepts the syntax:
select * from a inner join b on a.c = b.c
and
select * from a inner join b where a.c = b.c
Not all interpreters understand the 'where' form, so it should probably be avoided. And of course the 'on' clause is clearer.
a. WHERE clause: records are filtered after joining.
b. ON clause: records (from the right table) are filtered before joining.
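The two points above can be seen with a LEFT JOIN, where the placement of the filter changes the result. This is a minimal sketch using Python's sqlite3 module; the Orders and OrderLines tables, the Item column, and the sample rows are assumptions made up for the illustration.

```python
import sqlite3

# Hypothetical sample data: two orders, one line each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE OrderLines (OrderID INTEGER, Item TEXT);
    INSERT INTO Orders VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO OrderLines VALUES (1, 'pen'), (2, 'ink');
""")

# Filter in ON: applied while matching right-table rows,
# so every Orders row survives (unmatched ones get NULLs).
on_rows = conn.execute("""
    SELECT Orders.ID, OrderLines.Item
    FROM Orders
    LEFT JOIN OrderLines
      ON Orders.ID = OrderLines.OrderID AND OrderLines.Item = 'pen'
    ORDER BY Orders.ID
""").fetchall()

# Filter in WHERE: applied to the joined result,
# so order 2 (whose only line is 'ink') disappears entirely.
where_rows = conn.execute("""
    SELECT Orders.ID, OrderLines.Item
    FROM Orders
    LEFT JOIN OrderLines ON Orders.ID = OrderLines.OrderID
    WHERE OrderLines.Item = 'pen'
    ORDER BY Orders.ID
""").fetchall()

print(on_rows)     # [(1, 'pen'), (2, None)]
print(where_rows)  # [(1, 'pen')]
```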
To add onto Joel Coehoorn's response, I'll add some sqlite-specific optimization info (other SQL flavors may behave differently). In the original example, the LEFT JOINs have a different outcome depending on whether you use JOIN ON ... WHERE or JOIN ON ... AND. Here is a slightly modified example to illustrate:
SELECT *
FROM Orders
LEFT JOIN OrderLines ON Orders.ID = OrderLines.OrderID
WHERE Orders.Username = OrderLines.Username
versus
SELECT *
FROM Orders
LEFT JOIN OrderLines ON Orders.ID = OrderLines.OrderID
AND Orders.Username = OrderLines.Username
Now, the original answer states that if you use a plain inner join instead of a left join, the outcome of both queries will be the same, but the execution plan will differ. I recently realized that the semantic difference between the two is that the former forces the query optimizer to use the index associated with the ON clause, while the latter allows the optimizer to choose any index within the ON ... AND clauses, depending on what it thinks will work best.
Occasionally, the optimizer will guess wrong and you'll want to force a certain execution plan. In this case, let's say that the SQLite optimizer wrongly concludes that the fastest way to perform this join would be to use the index on Orders.Username, when you know from empirical testing that the index on Orders.ID would deliver your query faster.
In this case, the former JOIN ON ... WHERE syntax essentially allows you to force the primary join operation to occur on the ID parameter, with secondary filtering on Username performed only after the main join is complete. In contrast, the JOIN ON ... AND syntax allows the optimizer to pick whether to use the index on Orders.ID or Orders.Username, and there is the theoretical possibility that it picks the one that ends up slower.
It matters.
For instance, this is the filter when you use a WHERE clause at the end:
where cat.category is null or cat.category <> 'OTHER'
and this is the filter when you use an AND clause on the join:
cat.category <> 'OTHER' or cat.category is null
When the condition is part of the join, the filtered-out values come back as NULL.
This is my solution.
SELECT song_ID,songs.fullname, singers.fullname
FROM music JOIN songs ON songs.ID = music.song_ID
JOIN singers ON singers.ID = music.singer_ID
GROUP BY songs.fullname
You must have the GROUP BY to get it to work.
Hope this helps.

How should I join these five tables & SUM multiple columns from multiple tables

I have a database with 5 tables that hold related data.
It looks something like this:
The table "associate_payin_ad" stores the date of registration & the annexure id. Physically, an annexure is just a piece of paper which can have zero or more "Payin" or "Associate" entries.
Also, the 'payin' & 'associate' tables have multiple modes of payment (like cash, cheque, bdcash, bdcheque) for the [amount] & [payment] columns. There are separate tables for bycash, bycheque, bybdcash & bybdcheque; I have shown just the 'bycash' tables.
Suppose the tables are filled with the following data:
[associate_payin_ad] Table:
adid | date_register | annexure_id
1 | 05/12/2011 | 1
2 | 05/12/2011 | 2
3 | 06/12/2011 | 1
4 | 07/12/2011 | 1
[payin] Table:
fid | amount | adid
1 | 10000 | 1 [this entry was made on 05/12/2011 in annexure no 1]
2 | 10000 | 1 [this entry was made on 05/12/2011 in annexure no 1]
3 | 40000 | 2 [this entry was made on 05/12/2011 in annexure no 2]
4 | 10000 | 4 [this entry was made on 07/12/2011 in annexure no 1]
[payin_bycash] Table:
fid | bycash
1 | 10000
2 | 10000
3 | 40000
4 | 10000
[associate] Table:
aid | payment | adid
1 | 200 | 1 [this entry was made on 05/12/2011 in annexure no 1]
2 | 200 | 3 [this entry was made on 06/12/2011 in annexure no 1]
[associate_bycash] Table:
aid | bycashajf
1 | 200
2 | 200
I need the SUM of [payin_bycash.bycash] & [associate_bycash.bycashajf] per date for a particular date range (e.g. 05/12/2011 to 07/12/2011), like this:
date_register | amount
05/12/2011 | 60200
06/12/2011 | 200
07/12/2011 | 10000
I have been running around in circles since yesterday trying to figure out the appropriate query. The best I could come up with is this, but in vain:
SELECT apad.date_register,
SUM(ISNULL(pica.cash_in_hand, 0)) + SUM(ISNULL(aca.bycashajf, 0)) AS amount
FROM associate_payin_ad AS apad LEFT OUTER JOIN
payin AS pi ON apad.adid = pi.adid INNER JOIN
payin_bycash AS pica ON pi.fid = pica.fid
LEFT OUTER JOIN associate AS asso ON apad.adid = asso.adid INNER JOIN
associate_bycash AS aca ON asso.aid = aca.aid
WHERE (apad.date_register BETWEEN #date_initial AND #date_final)
GROUP BY apad.date_register
The above query returns just this:
date_register | amount
05/12/2011 | 20400
What am I doing wrong?
Thanks in advance.
You can't mix inner and outer joins like that. When you use a left outer join, rows from the left-hand table that have no match are returned with NULLs in the right-hand table's columns, so that all rows from the left-hand table are kept. However, if you then join the right-hand table to another table using an INNER join, those NULL rows are filtered out, because NULL never matches a row in the other table.
In your case, this is happening when you join to payin. You'll get a row for adid=3, but that row is filtered out when you then join to payin_bycash, as adid=3 doesn't exist in payin. The same problem applies to your join to associate.
The best way to get around this problem is to left join to a subquery (or you could do it with a CTE). Try this:
SELECT apad.date_register,
SUM(ISNULL(pi.cash_in_hand, 0)) + SUM(ISNULL(asso.bycashajf, 0)) AS amount
FROM associate_payin_ad AS apad
LEFT OUTER JOIN
(
-- the subquery must expose adid for the outer ON clause; summing per adid
-- here keeps the two joins from multiplying each other's rows
SELECT payin.adid, SUM(payin_bycash.cash_in_hand) AS cash_in_hand
FROM payin
INNER JOIN payin_bycash ON payin.fid = payin_bycash.fid
GROUP BY payin.adid
) pi ON apad.adid = pi.adid
LEFT OUTER JOIN
(
SELECT associate.adid, SUM(associate_bycash.bycashajf) AS bycashajf
FROM associate
INNER JOIN associate_bycash ON associate.aid = associate_bycash.aid
GROUP BY associate.adid
) asso ON apad.adid = asso.adid
WHERE (apad.date_register BETWEEN #date_initial AND #date_final)
GROUP BY apad.date_register
Also, have a read of this: http://weblogs.sqlteam.com/jeffs/archive/2007/10/11/mixing-inner-outer-joins-sql.aspx
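To see the effect on the question's sample data, here is a minimal sketch using Python's sqlite3 module rather than SQL Server. Several things are assumptions for the illustration: COALESCE stands in for ISNULL, dates are read as day/month/year and stored as ISO strings, the payin_bycash column is named bycash as in the sample table (the queries call it cash_in_hand), and the subqueries sum per adid before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE associate_payin_ad (adid INTEGER, date_register TEXT, annexure_id INTEGER);
    CREATE TABLE payin (fid INTEGER, amount INTEGER, adid INTEGER);
    CREATE TABLE payin_bycash (fid INTEGER, bycash INTEGER);
    CREATE TABLE associate (aid INTEGER, payment INTEGER, adid INTEGER);
    CREATE TABLE associate_bycash (aid INTEGER, bycashajf INTEGER);
    INSERT INTO associate_payin_ad VALUES
        (1, '2011-12-05', 1), (2, '2011-12-05', 2),
        (3, '2011-12-06', 1), (4, '2011-12-07', 1);
    INSERT INTO payin VALUES (1, 10000, 1), (2, 10000, 1), (3, 40000, 2), (4, 10000, 4);
    INSERT INTO payin_bycash VALUES (1, 10000), (2, 10000), (3, 40000), (4, 10000);
    INSERT INTO associate VALUES (1, 200, 1), (2, 200, 3);
    INSERT INTO associate_bycash VALUES (1, 200), (2, 200);
""")
date_range = ("2011-12-05", "2011-12-07")

# The question's join chain: each INNER JOIN discards the NULL rows
# that the preceding LEFT JOIN had preserved.
mixed_rows = conn.execute("""
    SELECT apad.date_register,
           SUM(COALESCE(pica.bycash, 0)) + SUM(COALESCE(aca.bycashajf, 0)) AS amount
    FROM associate_payin_ad AS apad
    LEFT JOIN payin AS pi ON apad.adid = pi.adid
    INNER JOIN payin_bycash AS pica ON pi.fid = pica.fid
    LEFT JOIN associate AS asso ON apad.adid = asso.adid
    INNER JOIN associate_bycash AS aca ON asso.aid = aca.aid
    WHERE apad.date_register BETWEEN ? AND ?
    GROUP BY apad.date_register
    ORDER BY apad.date_register
""", date_range).fetchall()

# LEFT JOIN to subqueries that already hold one summed row per adid.
fixed_rows = conn.execute("""
    SELECT apad.date_register,
           SUM(COALESCE(pi.cash, 0)) + SUM(COALESCE(asso.cash, 0)) AS amount
    FROM associate_payin_ad AS apad
    LEFT JOIN (
        SELECT payin.adid, SUM(payin_bycash.bycash) AS cash
        FROM payin
        INNER JOIN payin_bycash ON payin.fid = payin_bycash.fid
        GROUP BY payin.adid
    ) AS pi ON apad.adid = pi.adid
    LEFT JOIN (
        SELECT associate.adid, SUM(associate_bycash.bycashajf) AS cash
        FROM associate
        INNER JOIN associate_bycash ON associate.aid = associate_bycash.aid
        GROUP BY associate.adid
    ) AS asso ON apad.adid = asso.adid
    WHERE apad.date_register BETWEEN ? AND ?
    GROUP BY apad.date_register
    ORDER BY apad.date_register
""", date_range).fetchall()

print(mixed_rows)  # [('2011-12-05', 20400)]
print(fixed_rows)  # [('2011-12-05', 60200), ('2011-12-06', 200), ('2011-12-07', 10000)]
```

The first result matches the single row reported in the question; the second matches the totals the question asks for.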

Resources