Left outer join funny results [duplicate] - sql-server

This question already has answers here:
Why and when a LEFT JOIN with condition in WHERE clause is not equivalent to the same LEFT JOIN in ON? [duplicate]
Closed 9 years ago.
Please take a look at the two queries below, which use a left outer join, and tell me why their results differ.
Query 1 returns 1489 rows:
SELECT distinct a.GMS_MATERIALNUMBER,a.MATERIAL_DESCRIPTION, b.LDMC
FROM [AP_GDC2_PREPARATION_TEST].[dbo].[GDM_AUTOPULL] a
left outer join [AP_GDC2_STAGING_TEST].[dbo].[CFS_DIS_LDMC] b on
a.GMS_MATERIALNUMBER = b.GMS_MATERIALNUMBER and b.SAP_COMPANY_CODE = '1715'
and a.CFS_ORGANIZATION_CODE like 'rd_kr'
Query 2 returns only 295 rows, the same number of rows I get from a simple select * from a where CFS_ORGANIZATION_CODE like 'rd_kr':
SELECT distinct a.GMS_MATERIALNUMBER,a.MATERIAL_DESCRIPTION, b.LDMC
FROM [AP_GDC2_PREPARATION_TEST].[dbo].[GDM_AUTOPULL] a
left outer join [AP_GDC2_STAGING_TEST].[dbo].[CFS_DIS_LDMC] b on
a.GMS_MATERIALNUMBER = b.GMS_MATERIALNUMBER and b.SAP_COMPANY_CODE = '1715'
where a.CFS_ORGANIZATION_CODE like 'rd_kr'
Basically, query 2 gives the result I want, but my question is: why doesn't query 1 work? How exactly does SQL Server handle the ON clause of a left outer join behind the scenes?
Cheers

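The two queries really are different.
In the first query, the extra conditions in the ON clause are applied as part of the join: they filter which rows of the right-hand table can match, but a LEFT JOIN still returns every row of the left-hand table. A condition on the left table, such as a.CFS_ORGANIZATION_CODE like 'rd_kr', therefore only decides whether b's columns are filled in or left NULL; it removes no rows from a.
The second query applies its WHERE clause to the joined result, after the join is done, so it does remove rows.
Here's an example: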
Table1
ID Name
1 Stack
2 Over
3 Flow
Table2
T1_ID Score
1 10
2 20
3 30
Your first query corresponds to this:
SELECT a.*, b.Score
FROM Table1 a
LEFT JOIN Table2 b
ON a.ID = b.T1_ID AND
b.Score >= 20
In other words, the records of Table2 are filtered by score before the tables are joined, so the only records that can be joined to Table1 are:
T1_ID Score
2 20
3 30
T1_ID 1 is left out because its Score is only 10. The result of the query is:
ID Name Score
1 Stack NULL
2 Over 20
3 Flow 30
The second query is different:
SELECT a.*, b.Score
FROM Table1 a
LEFT JOIN Table2 b
ON a.ID = b.T1_ID
WHERE b.Score >= 20
It joins the records first, keeping every row of Table1 whether or not it has a matching row in Table2, so the intermediate result is:
ID Name Score
1 Stack 10
2 Over 20
3 Flow 30
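Then the filter b.Score >= 20 is applied. Because a NULL Score never satisfies b.Score >= 20, a WHERE condition on the right-hand table also discards unmatched rows, which makes the LEFT JOIN behave like an INNER JOIN. The final result is: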
ID Name Score
2 Over 20
3 Flow 30
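Since the SQLFiddle links are not available here, the two queries can be reproduced locally. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for SQL Server; the LEFT JOIN and ON/WHERE semantics it shows are standard SQL, and the tables and data are the Table1/Table2 example above.

```python
import sqlite3

# In-memory database holding the Table1/Table2 sample data from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Name TEXT);
    CREATE TABLE Table2 (T1_ID INTEGER, Score INTEGER);
    INSERT INTO Table1 VALUES (1, 'Stack'), (2, 'Over'), (3, 'Flow');
    INSERT INTO Table2 VALUES (1, 10), (2, 20), (3, 30);
""")

# Filter in ON: it only decides which Table2 rows match; every Table1 row survives.
on_rows = conn.execute("""
    SELECT a.ID, a.Name, b.Score
    FROM Table1 a
    LEFT JOIN Table2 b ON a.ID = b.T1_ID AND b.Score >= 20
    ORDER BY a.ID
""").fetchall()

# Filter in WHERE: applied after the join, so the row with Score 10 is dropped.
where_rows = conn.execute("""
    SELECT a.ID, a.Name, b.Score
    FROM Table1 a
    LEFT JOIN Table2 b ON a.ID = b.T1_ID
    WHERE b.Score >= 20
    ORDER BY a.ID
""").fetchall()

print(on_rows)     # [(1, 'Stack', None), (2, 'Over', 20), (3, 'Flow', 30)]
print(where_rows)  # [(2, 'Over', 20), (3, 'Flow', 30)]
```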

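The difference arises because you used a LEFT JOIN.
A LEFT JOIN returns every row from your first (left) table plus the matching rows from your second table. Conditions in the ON clause, including the one on a.CFS_ORGANIZATION_CODE, only decide what counts as a match; they never remove rows from the left table.
In the second query you join first, and then the WHERE clause reduces the result.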
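The asker's exact situation, a condition on the left table placed in ON versus WHERE, can be sketched the same way. This assumes Python's sqlite3 as a stand-in for SQL Server, and the tables autopull and ldmc are hypothetical, heavily reduced versions of GDM_AUTOPULL and CFS_DIS_LDMC with invented rows.

```python
import sqlite3

# Hypothetical, reduced stand-ins for GDM_AUTOPULL (a) and CFS_DIS_LDMC (b).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE autopull (material TEXT, org_code TEXT);
    CREATE TABLE ldmc (material TEXT, company_code TEXT, ldmc TEXT);
    INSERT INTO autopull VALUES ('M1', 'rd_kr'), ('M2', 'rd_us'), ('M3', 'rd_de');
    INSERT INTO ldmc VALUES ('M1', '1715', 'X'), ('M2', '1715', 'Y');
""")

# Query 1 pattern: the test on a's column sits in ON, so it only decides
# whether b matches; all three autopull rows come back.
in_on = conn.execute("""
    SELECT a.material, b.ldmc
    FROM autopull a
    LEFT JOIN ldmc b
      ON a.material = b.material AND b.company_code = '1715'
     AND a.org_code LIKE 'rd_kr'
    ORDER BY a.material
""").fetchall()

# Query 2 pattern: the same test in WHERE removes rows after the join.
in_where = conn.execute("""
    SELECT a.material, b.ldmc
    FROM autopull a
    LEFT JOIN ldmc b
      ON a.material = b.material AND b.company_code = '1715'
    WHERE a.org_code LIKE 'rd_kr'
    ORDER BY a.material
""").fetchall()

print(in_on)     # [('M1', 'X'), ('M2', None), ('M3', None)]
print(in_where)  # [('M1', 'X')]
```

Note that M2 has a matching ldmc row but still gets NULL in the first query, because its org_code fails the ON condition.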

Related

SQL Server joining multiple CTE

I have four common table expressions (CTEs), each containing two columns
(RowNumber, AccountNumber), but the number of records in each CTE varies depending on the query parameters. The purpose is to keep all non-NULL account numbers at the top of each CTE's column after joining.
I am joining the four CTEs with FULL OUTER JOIN on RowNumber. The problem is that the sequence of AccountNumber values is not continuous: in some cases NULL values appear between account numbers. I want all non-NULL values kept together at the top, with the NULLs after them. The number of AccountNumber values in each CTE is always different.
SELECT
ISNULL(Cte_FirstYear.AccountNumber,'') as FirstYear,
ISNULL(Cte_SecondYear.AccountNumber,'') as SecondYear,
ISNULL(cte_ThirdYear.AccountNumber,'') as ThirdYear,
ISNULL(cte_FourthYear.AccountNumber,'') as FourthYear
FROM cte_ThirdYear
FULL OUTER JOIN cte_FirstYear
on cte_ThirdYear.RowNumber = cte_FirstYear.RowNumber
FULL OUTER JOIN Cte_SecondYear
on cte_ThirdYear.RowNumber = Cte_SecondYear.RowNumber
FULL OUTER JOIN cte_FourthYear
on cte_ThirdYear.RowNumber = cte_FourthYear.RowNumber
Here is the output I am getting:
FirstYear SecondYear ThirdYear FourthYear
1 2 3 4
5 6 7 1
9 NULL NULL
NULL
9 9
10 NULL
Here is the expected output:
FirstYear SecondYear ThirdYear FourthYear
1 2 3 4
5 6 7 1
9 9 9
10
Based on the explanation by @Donnie in the linked answer:
A cross join produces a cartesian product between the two tables, returning all possible combinations of all rows. It has no on clause because you're just joining everything to everything.
A full outer join is a combination of a left outer and right outer join. It returns all rows in both tables that match the query's where clause, and in cases where the on condition can't be satisfied for those rows it puts null values in for the unpopulated fields.
You can add conditions like these to ignore NULL row numbers:
on cte_ThirdYear.RowNumber = cte_FirstYear.RowNumber and cte_ThirdYear.RowNumber is not null and cte_FirstYear.RowNumber is not null
See this PDF: http://stevestedman.com/wp-content/uploads/TSqlJoinTypePoster1.pdf
I created another CTE that takes the largest record count among the four CTEs, generates RowNumber values from 1 to N (N being that maximum count), and joins all four CTEs to it using LEFT JOIN.
Here is how I modified the query to achieve the result:
Cte_Max(RowNumber) AS (
SELECT TOP
(
select max(c) from
(
select count(*) c from cte_FirstYear
UNION
select count(*) c from Cte_SecondYear
UNION
select count(*) c from cte_ThirdYear
UNION
select count(*) c from cte_FourthYear
) x
)
ROW_NUMBER() OVER (ORDER BY c1.id asc) as RowNumber
FROM syscolumns AS c1
CROSS JOIN syscolumns AS c2
)
select
ISNULL(cte_FirstYear.AccountNumber,'') as FirstYear,
ISNULL(Cte_SecondYear.AccountNumber,'') as SecondYear,
ISNULL(cte_ThirdYear.AccountNumber,'') as ThirdYear,
ISNULL(cte_FourthYear.AccountNumber,'') as FourthYear
from Cte_Max
LEFT join cte_FirstYear
on
Cte_Max.RowNumber=cte_FirstYear.RowNumber
LEFT join
Cte_SecondYear
on
Cte_Max.RowNumber=Cte_SecondYear.RowNumber
LEFT join
cte_ThirdYear
on
Cte_Max.RowNumber =cte_ThirdYear.RowNumber
LEFT join
cte_FourthYear
on
Cte_Max.RowNumber =cte_FourthYear.RowNumber
ORDER BY Cte_Max.RowNumber
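The "numbers table plus LEFT JOINs" idea can be tried out in isolation. This sketch assumes Python's sqlite3 as a stand-in for SQL Server, so it swaps the syscolumns CROSS JOIN for a recursive CTE to generate 1..N; the table names first_year through fourth_year and their account numbers are hypothetical, chosen to mirror the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-year lists; RowNumber is 1..n within each list, as in the question.
years = {
    "first_year": ["1", "5", "9", "10"],
    "second_year": ["2", "6", "9"],
    "third_year": ["3", "7", "9"],
    "fourth_year": ["4", "1"],
}
for table, accounts in years.items():
    conn.execute(f"CREATE TABLE {table} (RowNumber INTEGER, AccountNumber TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        [(i, acct) for i, acct in enumerate(accounts, start=1)],
    )

rows = conn.execute("""
    WITH RECURSIVE
    max_count(n) AS (
        SELECT MAX(c) FROM (
            SELECT COUNT(*) AS c FROM first_year
            UNION ALL SELECT COUNT(*) FROM second_year
            UNION ALL SELECT COUNT(*) FROM third_year
            UNION ALL SELECT COUNT(*) FROM fourth_year
        ) AS counts
    ),
    -- Row numbers 1..N, replacing the syscolumns CROSS JOIN trick.
    cte_max(RowNumber) AS (
        SELECT 1 FROM max_count WHERE n >= 1
        UNION ALL
        SELECT RowNumber + 1 FROM cte_max, max_count WHERE RowNumber < n
    )
    SELECT IFNULL(f.AccountNumber, ''), IFNULL(s.AccountNumber, ''),
           IFNULL(t.AccountNumber, ''), IFNULL(o.AccountNumber, '')
    FROM cte_max m
    LEFT JOIN first_year  f ON f.RowNumber = m.RowNumber
    LEFT JOIN second_year s ON s.RowNumber = m.RowNumber
    LEFT JOIN third_year  t ON t.RowNumber = m.RowNumber
    LEFT JOIN fourth_year o ON o.RowNumber = m.RowNumber
    ORDER BY m.RowNumber
""").fetchall()

for row in rows:
    print(row)
```

Because every CTE is joined to the same complete list of row numbers, no list depends on another list having a row at that position, which is what caused the gaps with the chained FULL OUTER JOINs on cte_ThirdYear.RowNumber.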

Selecting Max with Lots of Other Items

Sorry for the poor title; I wasn't sure how to describe my problem. I've written a query that returns about 23,000 records. Many of those records contain similar information, and I want to select only the records with the maximum value of the field dbo.tblMsgsOnAir_Type8.fldBuddyLinkSigStrength. I've tried grouping by all of the other selected columns, but it doesn't appear to work correctly. I don't fully understand SQL, especially MAX and GROUP BY. I can write simple MAX queries when I only need to select one thing, but I don't understand how it works when I also want to select a bunch of other data. Below is the query.
SELECT
dbo.tblmeterinfo.fldMeterSerialNumber AS "MOP_FNP_Meter",
dbo.tblMsgsOnAir_Type8.fldRBuddyId AS "MOP_FNP_FNID",
dbo.TBLMETERMAINT.fldmeterid AS "Meter_ID_Helped",
dbo.tblMsgsOnAir_Type8.fldCBuddyId AS "FNID_Helped",
dbo.fn_dt(dbo.tblMsgsOnAir_Type8.fldRBuddyToi) AS "TOI",
dbo.tblMsgsOnAir_Type8.fldBuddyLinkSigStrength AS "Sig_Str",
dbo.TBLSAWN_CIS_INFO.SML AS "Buddy_SML",
dbo.TBLMETERLIST.fldaddress AS "Buddy_Address",
dbo.TBLSAWNGISCOORD.X_COORD AS "X_Coord",
dbo.TBLSAWNGISCOORD.Y_COORD AS "Y_Coord"
FROM dbo.tblMsgsOnAir_Type8
LEFT OUTER JOIN dbo.TBLMETERLIST
ON (dbo.TBLMETERLIST.FLDREPID = dbo.tblMsgsOnAir_Type8.fldCBuddyId)
LEFT OUTER JOIN dbo.TBLMETERMAINT
ON (dbo.TBLMETERMAINT.FLDREPID = dbo.tblMsgsOnAir_Type8.fldCBuddyID)
LEFT OUTER JOIN dbo.TBLSAWN_CIS_INFO
ON (dbo.TBLSAWN_CIS_INFO.FLDREPID = dbo.tblMsgsOnAir_Type8.fldCBuddyId)
LEFT OUTER JOIN dbo.TBLSAWNGISCOORD
ON (dbo.TBLSAWNGISCOORD.SRV_MAP_LOC = dbo.TBLSAWN_CIS_INFO.SML)
LEFT OUTER JOIN dbo.tblmeterinfo
ON (dbo.tblmeterinfo.fldRepId = dbo.tblMsgsOnAir_Type8.fldRBuddyId)
WHERE dbo.tblMsgsOnAir_Type8.fldRBuddyId IN (SELECT
dbo.tblSAWN_FNPmap.Repid
FROM dbo.tblSAWN_FNPmap)
AND dbo.TBLMETERMAINT.fldmeterid IS NOT NULL
The query below is simple and does what I want, but it doesn't return all of the other fields. It returns only 617 records. I would like the query above to return those 617 records but include all of the other information I've selected.
SELECT
dbo.TBLMETERMAINT.fldmeterid AS "Meter_ID_Helped",
MAX(dbo.tblMsgsOnAir_Type8.fldBuddyLinkSigStrength) AS "Max_Sig"
FROM dbo.tblMsgsOnAir_Type8
LEFT OUTER JOIN dbo.TBLMETERMAINT
ON (dbo.TBLMETERMAINT.FLDREPID = dbo.tblMsgsOnAir_Type8.fldCBuddyID)
WHERE dbo.tblMsgsOnAir_Type8.fldRBuddyId IN (SELECT
dbo.tblSAWN_FNPmap.Repid
FROM dbo.tblSAWN_FNPmap)
AND dbo.TBLMETERMAINT.fldmeterid IS NOT NULL
GROUP BY dbo.TBLMETERMAINT.fldmeterid
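ROW_NUMBER() is probably the answer. You can use it to find the best record in each group of rows, with the groups defined by whatever columns you partition on. Something like:
select *
from (select ...,
             row_number() over (partition by id order by fldBuddyLinkSigStrength desc) as rn
      from ....) t
where t.rn = 1
SQL Server assigns a row number within each group: rows are partitioned by id (in this case), and the row with the highest strength gets 1, the next gets 2, and so on. Window functions such as ROW_NUMBER() cannot be used directly in a WHERE clause, which is why the rn = 1 filter sits in an outer query.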
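Here is a runnable sketch of that greatest-per-group pattern, assuming Python's sqlite3 as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The readings table and its rows are hypothetical; they only illustrate keeping extra columns alongside the strongest signal per meter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: several signal readings per meter, plus extra columns to keep.
conn.executescript("""
    CREATE TABLE readings (meter_id INTEGER, buddy_id TEXT, sig_strength INTEGER, address TEXT);
    INSERT INTO readings VALUES
        (100, 'A', 40, '1 Main St'),
        (100, 'B', 75, '2 Main St'),
        (100, 'C', 60, '3 Main St'),
        (200, 'D', 55, '9 Oak Ave'),
        (200, 'E', 30, '8 Oak Ave');
""")

# Number the rows inside each meter_id, strongest signal first, then keep rn = 1.
best = conn.execute("""
    SELECT meter_id, buddy_id, sig_strength, address
    FROM (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY sig_strength DESC) AS rn
        FROM readings r
    ) AS ranked
    WHERE rn = 1
    ORDER BY meter_id
""").fetchall()
print(best)  # [(100, 'B', 75, '2 Main St'), (200, 'D', 55, '9 Oak Ave')]
```

If two rows tie for the maximum, ROW_NUMBER() keeps just one of them arbitrarily; RANK() would keep both.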
If you are getting duplicates, have you tried using SELECT DISTINCT?
Basically, MAX returns the highest value in each group.
So if you have a table:
ID | VALUE
1 | 10
1 | 7
1 | 9
2 | 6
2 | 8
And run:
SELECT ID, MAX(VALUE)
FROM [TABLE]
GROUP BY ID
You'll get the maximum value per ID:
ID | VALUE
1 | 10
2 | 8
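If you want the maximum without collapsing the result to one row per group, do the grouping in a subquery and join it back:
SELECT t.ID, t.VALUE, m.MAX_VALUE -- plus any other columns you need
FROM [TABLE] t
JOIN (SELECT ID, MAX(VALUE) AS MAX_VALUE FROM [TABLE] GROUP BY ID) AS m ON m.ID = t.ID
-- add WHERE t.VALUE = m.MAX_VALUE to keep only the rows that hold the maximum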
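Both the plain GROUP BY and the join-back version can be checked with the ID/VALUE sample above. This sketch assumes Python's sqlite3 as a stand-in for SQL Server, and scores is a hypothetical table name (TABLE itself is a reserved word). It also applies the WHERE t.VALUE = m.MAX_VALUE filter so that only the maximum rows remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The ID/VALUE sample from the answer; "scores" is a hypothetical table name.
conn.executescript("""
    CREATE TABLE scores (id INTEGER, value INTEGER);
    INSERT INTO scores VALUES (1, 10), (1, 7), (1, 9), (2, 6), (2, 8);
""")

# Plain GROUP BY: one row per id, but no other columns can come along.
grouped = conn.execute(
    "SELECT id, MAX(value) FROM scores GROUP BY id ORDER BY id"
).fetchall()

# Join each row back to its group's maximum, then keep only the rows that hold it.
max_rows = conn.execute("""
    SELECT t.id, t.value, m.max_value
    FROM scores t
    JOIN (SELECT id, MAX(value) AS max_value FROM scores GROUP BY id) AS m
      ON m.id = t.id
    WHERE t.value = m.max_value
    ORDER BY t.id
""").fetchall()

print(grouped)   # [(1, 10), (2, 8)]
print(max_rows)  # [(1, 10, 10), (2, 8, 8)]
```

In the joined version, any other columns of scores could be added to the outer SELECT list without changing the grouping.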
Without knowing your table structures in more detail I can't be sure this is the best way, but here's something that should work. Use the 2nd query as the left side of a left join, to pick up the extra columns:
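select a.* -- plus the extra columns you need from the joined tables
from (<your 2nd query>) a
left join dbo.TBLMETERMAINT m on m.fldmeterid = a.Meter_ID_Helped
left join dbo.tblMsgsOnAir_Type8 t8
  on t8.fldCBuddyID = m.FLDREPID and t8.fldBuddyLinkSigStrength = a.Max_Sig
left join dbo.TBLMETERLIST on (dbo.TBLMETERLIST.FLDREPID = t8.fldCBuddyId)
left join <next table> ...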
and so on. You'll also have to left join on dbo.tblMsgsOnAir_Type8 in order to pick up the columns in that table, so that's one additional left join beyond what your first query does. By the way, it's a good idea to post code here laid out so it's readable; it makes it a lot easier for others to understand.

Rank Based on two tables

I need some help calculating a rank from two tables.
Suppose I have two tables, table1 and table2.
table1 contains:
Disease value
A 20
B 10
C 35
table2 contains:
Diseaselist Othervalue
A 20
B 10
D 35
E 20
I want to check whether each disease in table1 (for example, A) is present in table2: if it is, it should get a higher rank, otherwise a lower rank. Here, C in table1 has a higher value than A, but it is not present in table2, so it should rank below both A and B.
Please suggest how I can accomplish this.
Regards,
Ratan
You can join the two tables with a LEFT JOIN, and order the rows with a CASE expression:
SELECT a.Disease, a.Value
FROM Table1 a
LEFT JOIN Table2 b
ON a.Disease = b.DiseaseList
ORDER BY CASE WHEN b.DiseaseList IS NULL THEN 1 ELSE 0 END,
a.Value DESC
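The ordering can be checked against the sample data. This sketch assumes Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25 or later for the window function), and it adds an explicit DiseaseRank column with ROW_NUMBER() over the same CASE-based ordering; that column is an addition to the answer, not part of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Disease TEXT, Value INTEGER);
    CREATE TABLE Table2 (DiseaseList TEXT, OtherValue INTEGER);
    INSERT INTO Table1 VALUES ('A', 20), ('B', 10), ('C', 35);
    INSERT INTO Table2 VALUES ('A', 20), ('B', 10), ('D', 35), ('E', 20);
""")

# 0 for diseases found in Table2 and 1 for missing ones, so found rows rank
# first; ties within each bucket are broken by Value, highest first.
ranked = conn.execute("""
    SELECT a.Disease, a.Value,
           ROW_NUMBER() OVER (
               ORDER BY CASE WHEN b.DiseaseList IS NULL THEN 1 ELSE 0 END, a.Value DESC
           ) AS DiseaseRank
    FROM Table1 a
    LEFT JOIN Table2 b ON a.Disease = b.DiseaseList
    ORDER BY DiseaseRank
""").fetchall()
print(ranked)  # [('A', 20, 1), ('B', 10, 2), ('C', 35, 3)]
```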

SQL Server: use a subquery to find values in one table that are not in another

I am comparing an original table in SQL Server with an update table, trying to find how many "First Numbers" have changed (they do change in this system). However, this query seems to return "First Numbers" that are equal in both tables. What am I doing wrong?
select *
from
tblBlue
where
Exists (Select 'x'
From tblRed
Where tblRed.FirstNumber != tblBlue.FirstNumber
and tblRed.ID = tblBlue.ID)
Example data:
tblRed
ID FirstNumber
1 10
2 20
3 30
4 40
tblBlue
ID FirstNumber
1 5
2 20
3 35
4 40
I would expect to see:
1 5
3 35
Your query should work (see the example at SQL Fiddle). Could you post example data for which it returns the wrong results?
A slightly clearer way to write it:
select *
from tblBlue new
join tblRed old
on new.ID = old.ID
where new.FirstNumber <> old.FirstNumber
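Running the question's EXISTS query and the join rewrite against the posted sample data confirms that both return the expected rows. This sketch assumes Python's sqlite3 as a stand-in for SQL Server; the aliases blue and red are used instead of new and old purely to stay clear of trigger-related names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblRed (ID INTEGER, FirstNumber INTEGER);
    CREATE TABLE tblBlue (ID INTEGER, FirstNumber INTEGER);
    INSERT INTO tblRed VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO tblBlue VALUES (1, 5), (2, 20), (3, 35), (4, 40);
""")

# The question's correlated EXISTS query.
exists_rows = conn.execute("""
    SELECT * FROM tblBlue
    WHERE EXISTS (SELECT 'x' FROM tblRed
                  WHERE tblRed.FirstNumber != tblBlue.FirstNumber
                    AND tblRed.ID = tblBlue.ID)
    ORDER BY ID
""").fetchall()

# The inner-join rewrite: pair rows by ID, keep pairs whose numbers differ.
join_rows = conn.execute("""
    SELECT blue.ID, blue.FirstNumber
    FROM tblBlue blue
    JOIN tblRed red ON blue.ID = red.ID
    WHERE blue.FirstNumber <> red.FirstNumber
    ORDER BY blue.ID
""").fetchall()

print(exists_rows)  # [(1, 5), (3, 35)]
print(join_rows)    # [(1, 5), (3, 35)]
```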
An easier solution is to use a LEFT JOIN:
SELECT r.*
FROM tblRed r
LEFT JOIN tblBlue b ON b.ID = r.ID AND b.FirstNumber = r.FirstNumber
WHERE b.ID IS NULL
This returns the records in tblRed that satisfy one of two conditions: 1) the ID isn't found in tblBlue at all, i.e. it is a new record, or 2) the ID is found, but the number has changed. If both the ID and the FirstNumber are the same, b.ID will not be NULL, so the row counts as a match and is excluded from the result set of differing values. To list the values from tblBlue instead (5 and 35 in the example), swap the roles of the two tables.
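The anti-join can be run in both directions. This sketch assumes Python's sqlite3 as a stand-in for SQL Server; unmatched() is a hypothetical helper, and the tblRed row with ID 5 is an extra, invented row added to show condition 1 (an ID missing from the other table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblRed (ID INTEGER, FirstNumber INTEGER);
    CREATE TABLE tblBlue (ID INTEGER, FirstNumber INTEGER);
    -- ID 5 is a hypothetical extra row that exists only in tblRed.
    INSERT INTO tblRed VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);
    INSERT INTO tblBlue VALUES (1, 5), (2, 20), (3, 35), (4, 40);
""")

def unmatched(left, right):
    """Rows of `left` with no row in `right` having the same ID and FirstNumber."""
    return conn.execute(f"""
        SELECT l.ID, l.FirstNumber
        FROM {left} l
        LEFT JOIN {right} r ON r.ID = l.ID AND r.FirstNumber = l.FirstNumber
        WHERE r.ID IS NULL
        ORDER BY l.ID
    """).fetchall()

print(unmatched("tblRed", "tblBlue"))  # [(1, 10), (3, 30), (5, 50)]
print(unmatched("tblBlue", "tblRed"))  # [(1, 5), (3, 35)]
```

The table names are interpolated into the SQL string only because they are fixed literals here; user-supplied names would need validating first.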

SQL Server - ISNULL not working on Update Query

Rather than have a NULL in a column, I want a 0 to be present.
Given the following two tables:
TABLE1
ClientID OrderCount
1 NULL
2 NULL
3 NULL
4 NULL
Table2
ClientID OrderCount
1 2
3 4
4 6
NOTE: The OrderCount column in both tables is INT datatype.
UPDATE TABLE1
SET OrderCount = ISNULL(TABLE2.OrderCount,0)
FROM TABLE1
INNER JOIN TABLE2 ON TABLE2.ClientID = TABLE1.CLIENTID
When I look at table1, I see this:
ClientID OrderCount
1 2
2 NULL
3 4
4 6
So, I thought to myself - "Obviously, I should be using NULLIF and not ISNULL", so I reversed them. Same result.
What am I doing wrong here? How do I get a 0 rather than a NULL in the column?
You need a LEFT JOIN rather than an INNER JOIN. The records that don't have a matching ClientID are not even being touched by your query.
You are using an INNER JOIN, but there is no ClientID 2 in TABLE2, so your result set won't include a row for client 2. Replace it with a LEFT JOIN.
Your INNER JOIN is filtering out the rows that have no match in TABLE2.
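The effect of visiting every TABLE1 row can be sketched as follows. This assumes Python's sqlite3 as a stand-in for SQL Server; because UPDATE ... FROM only arrived in recent SQLite versions, it swaps the LEFT JOIN for a correlated subquery, which likewise yields NULL for clients missing from TABLE2. SQLite's COALESCE stands in for T-SQL's ISNULL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (ClientID INTEGER, OrderCount INTEGER);
    CREATE TABLE TABLE2 (ClientID INTEGER, OrderCount INTEGER);
    INSERT INTO TABLE1 VALUES (1, NULL), (2, NULL), (3, NULL), (4, NULL);
    INSERT INTO TABLE2 VALUES (1, 2), (3, 4), (4, 6);
""")

# Every TABLE1 row is updated; clients missing from TABLE2 get NULL from the
# subquery, which COALESCE turns into 0 (ISNULL plays this role in T-SQL).
conn.execute("""
    UPDATE TABLE1
    SET OrderCount = COALESCE(
        (SELECT TABLE2.OrderCount FROM TABLE2 WHERE TABLE2.ClientID = TABLE1.ClientID),
        0)
""")
rows = conn.execute("SELECT ClientID, OrderCount FROM TABLE1 ORDER BY ClientID").fetchall()
print(rows)  # [(1, 2), (2, 0), (3, 4), (4, 6)]
```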
