INNER And OUTER JOIN On Multiple Nullable Columns - sql-server

The query we are currently testing gives doubtable results, where Inner JOIN + LEFT OUTER JOIN != B Count.
The comparaison columns used in both tables are Nullable, but our goal is clear:
In case of INNER JOIN or intersection, Nulls should not be used for comparaison.
In case of LEFT OUTER JOIN, if both Bs are Nulls, should be included as not found in A, otherwise, we should compaire only the Not-Nulls.
Intesection
SELECT b.c1,b.c2
FROM B as b
Inner JOIN A as a
ON (( b.c1= a.c1 and not a.c1 is null)
OR (b.c2=a.c2 and not a.c2 is null))
The previous code gives duplicate values when one of the columns is null!
In B not in A
SELECT b.c1,b.c2
FROM B as b
LEFT OUTER JOIN A as a
ON (( b.c1= a.c1 and not a.c1 is null)
OR (b.c2=a.c2 and not a.c2 is null))
WHERE a.id IS NULL
-- filter duplicates
GROUP BY b.c1,b.c2
I guess the problem lies in the ON part where we join tables.
Thank you in advance

You could run this with sub-queries rather than a join if you prefer.
SELECT b.c1,b.c2
FROM B as b
WHERE
b.c1 IN (SELECT c1 from a WHERE c1 IS NOT NULL)
OR (b.c2 IN (SELECT c2 from a WHERE c2 IS NOT NULL);

Related

Snowflake Inner join with same table

I am trying to bring this query in Snowflake. But, getting huge numbers with the last 3 inner joins which has same tables with different conditions.
select count(*) from table2; --5
select count(*) from table_3;--2824134
select count(*) from table1;--478015
Original Query:
select * from
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 Zero on Zero.ID_I = r.id and Zero.time <= d_tbl.starttime and Zero.typeid in (7,19)
inner join table_3 first on first.ID_I = r.id and first.time <= Zero.time and first.typeid in (8,9)
inner join table_3 second on second.ID_I = r.id and second.time >= d_tbl.endtime and second.typeid in (8,9)
where d_tbl.mode = 0;
I tried breaking the queries into 3 parts.
create temp table tb1 as
select *
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number ;
create temp table tb2 as
select ID_I , time as time as Zero_time,time as first_time,time as second_time
from table_3
where typeid in (8,9,7,19)
Note: saving the time column with different names for reference.
create temp table final_table as
select * from tb1 r
inner join tb2
on tb2.ID_I = r.id
where tb2.Zero_time <= r.starttime
and tb2.first_time <= Zero.time
and tb2.second_time >= r.endtime
Basically, I am trying to break the conditions in the joins to different parts.
This same logic has to be applied for different tables and do a union all for final table values.
Please help if this would work or let me know if this shall be handled with a better approach that executes faster.
TIA.
Try and convert following -
select * from
from table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 Zero on Zero.ID_I = r.id
AND Zero.time <= d_tbl.starttime and Zero.typeid in (7,19)
inner join table_3 first on first.ID_I = r.id
AND first.time <= Zero.time and first.typeid in (8,9)
inner join table_3 second on second.ID_I = r.id
AND second.time >= d_tbl.endtime and second.typeid in (8,9)
where d_tbl.mode = 0;
To something like below -
select whatever-columns,
case when t3.time <= d_tbl.starttime
AND Zero.typeid in (7,19) then t3.time_1 end as zero_time,
case when t3.time <= zero_time
AND Zero.typeid in (8,9) then t3.time_1 end as first_time, --- snowflake allows to select/reference previous column
case when t3.time >= d_tbl.endtime
AND second.typeid in (8,9)then t3.time_1 end as second_time
from
table1 d_tbl
inner join table2 r on r.number = d_tbl.number
inner join table_3 t3 on t3.ID_I = r.id
where d_tbl.mode = 0;
This will help to reduce data-set being searched by avoiding multiple inner join on same table that has most records.

SQL to find which table has diag_code filled, and then look it up in lookup_table

I have a query and a diag_code is either in one table (UM_SERVICE) or the other (LOS), but I can't join both tables to get diag_code that isn't null, that I can think of. Does this look ok for finding if diag_code is in one of the tables and lookup table? It's possible to have both LOS and UM_SERVICE have a diag code on different rows, and they could be different, and both or one could be in the lookup table. I'm not seeing anything in internet search.
Here's a simplified stored procedure:
SELECT distinct
c.id
,uc.id
,c.person_id
FROM dbo.CASES c
INNER JOIN dbo.UM_CASE uc with (NOLOCK) ON uc.case_id = c.id
LEFT JOIN dbo.UM_SERVICE sv (NOLOCK) ON sv.case_id = omc.case_id
LEFT JOIN dbo.UM_SERVICE_CERT usc on usc.service_id = sv.id
LEFT JOIN dbo.LOS S WITH (NOLOCK) ON S.case_id = UC.case_id
LEFT JOIN dbo.LOS_EXTENSION SC WITH (NOLOCK) ON SC.los_id = S.id
INNER JOIN dbo.PERSON op with (NOLOCK) on op.id = c.Person_id
WHERE
(sv.diag_code is not null and c.case_id = sv.case_id
or
s.diag_code is not null and c.case_id = s.case_id)
and
(sv.diag_code is not null and sv.diag_code in (select diag_code from TABLE_LOOKUP)
or
s.diag_code is not null and s.diag_code in (select diag_code from TABLE_LOOKUP)
Table setups like this:
CASES
id person_id
UM_CASE
case_id
LOS
case_id id
LOS_EXTENSION
los_id
Person
id cid
UM_SERVICE
case_id diag_code
UM_SERVICE_CERT
service_id id
TABLE_LOOKUP
diag_code
Since you have two different searches being run, it is going to be much easier to write/read by writing the searches individually and then bringing your two results sets together using the UNION operator. The UNION will eliminate duplicates across the two result sets in a similar manner to what your usage of SELECT DISTINCT is doing for a single result set.
Like so:
/*first part of union performs seach using filter on dbo.UM_SERVICE*/
SELECT
c.id
,uc.id
,c.person_id
FROM
dbo.CASES AS c
INNER JOIN dbo.UM_CASE AS uc ON uc.case_id=c.id
LEFT JOIN dbo.UM_SERVICE AS sv ON sv.case_id = omc.case_id
LEFT JOIN dbo.UM_SERVICE_CERT AS usc on usc.service_id=sv.id
LEFT JOIN dbo.LOS AS S ON S.case_id = UC.case_id
LEFT JOIN dbo.LOS_EXTENSION AS SC ON SC.los_id= S.id
INNER JOIN dbo.PERSON AS op on op.id=c.Person_id
WHERE
sv.diag_code in (select diag_code from TABLE_LOOKUP) /*will eliminate null values in sv.diag_code*/
UNION /*deduplicate result sets*/
/*second part of union performs search using filter on dbo.LOS*/
SELECT
c.id
,uc.id
,c.person_id
FROM
dbo.CASES AS c
INNER JOIN dbo.UM_CASE AS uc ON uc.case_id=c.id
LEFT JOIN dbo.UM_SERVICE AS sv ON sv.case_id = omc.case_id
LEFT JOIN dbo.UM_SERVICE_CERT AS usc on usc.service_id=sv.id
LEFT JOIN dbo.LOS AS S ON S.case_id = UC.case_id
LEFT JOIN dbo.LOS_EXTENSION AS SC ON SC.los_id= S.id
INNER JOIN dbo.PERSON AS op on op.id=c.Person_id
WHERE
s.diag_code in (select diag_code from TABLE_LOOKUP); /*will eliminate null values in s.diag_code*/

Multiple Nested Inner Joins: not all records are shown

I have difficulty joining two tables that look like the following:
The main table PMEOBJECT which has a unique key named OBJECTID and
has in total 12768 rows.
Then I want to join PMEOBJECTVALIDITY on it which has an n:1 relationship with PMEOBJECT, since it has more rows,
because it saves the changes over time of PMEOBJECT (i.e. when a certain object is not
valid anymore), this one has 12789 rows (meaning only 21 objects
changed over time). However, I only want to have the current last
VALIDFROM date shown in the query. This all works fine.
Then the trouble starts when I want to join PMEOBJECTDIMENSION, which has an
n:1 relationship with PMEOBJECTVALIDITY and has 36737 rows in total.
SELECT
PMEOBJECT.OBJECTID
,PMEOBJECTVALIDITY.VALIDFROM
,PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECT
LEFT JOIN PMEOBJECTVALIDITY
ON PMEOBJECTVALIDITY.OBJECTID = PMEOBJECT.OBJECTID
AND PMEOBJECTVALIDITY.DATAAREAID = PMEOBJECT.DATAAREAID
INNER JOIN(
SELECT
OBJECTID,
MAX(VALIDFROM) AS NEWFROMDATE,
MAX(VALIDTO) AS NEWTODATE
FROM PMEOBJECTVALIDITY B
GROUP BY OBJECTID
) B
ON PMEOBJECTVALIDITY.OBJECTID = B.OBJECTID
AND PMEOBJECTVALIDITY.VALIDFROM = B.NEWFROMDATE
LEFT JOIN PMEOBJECTDIMENSION
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND PMEOBJECTDIMENSION.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID
INNER JOIN(
SELECT
OBJECTVALIDITYID,
MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
Results in query per step:
SELECT PMEOBJECT: 12768 rows
LEFT JOIN PMEVALIDITY: 12789 rows
INNER JOIN PMEVALIDITY: 12768 rows
LEFT JOIN PMEOBJECTDIMENSION: 36737 rows
INNER JOIN PMEOBJECTDIMENSION: 12729 rows
I want the end result again to have the same 12768 rows, I don't want any ObjectId to be left out.
What am I missing here?
Kind regards,
Igor
Following might help:
from PMEOBJECTDIMENSION onwards:
LEFT JOIN (SELECT PMEOBJECTDIMENSION.OBJECTVALIDITYID, PMEOBJECTDIMENSION.DATAAREAID
FROM PMEOBJECTDIMENSION
INNER JOIN(SELECT OBJECTVALIDITYID, MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
)X
ON X.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND X.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID
and select the distinct records if duplicates present.
The INNER JOINs are filtering out records- what you want is that the LEFT JOIN table (PMEOBJECTVALIDITY and PMEOBJECTDIMENSION) should only include records that have at least a match on the INNER JOIN queries (alias B and C). You can accomplish this with by nesting the INNER JOIN with the LEFT JOIN, generally done as follows:
SELECT *
FROM A
LEFT JOIN B
INNER JOIN C
ON B.ID = C.BID
ON A.ID = B.AID
Now B is INNER JOINed on C and will only contain records that have a match in C, but will preserve the LEFT JOIN not remove any records from A.
In your case, you can simply move the ON clause from the LEFT JOIN to the end of the following INNER JOIN.
SELECT
PMEOBJECT.OBJECTID
,PMEOBJECTVALIDITY.VALIDFROM
,PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECT
LEFT JOIN PMEOBJECTVALIDITY
INNER JOIN(
SELECT
OBJECTID,
MAX(VALIDFROM) AS NEWFROMDATE,
MAX(VALIDTO) AS NEWTODATE
FROM PMEOBJECTVALIDITY B
GROUP BY OBJECTID
) B
ON PMEOBJECTVALIDITY.OBJECTID = B.OBJECTID
AND PMEOBJECTVALIDITY.VALIDFROM = B.NEWFROMDATE
ON PMEOBJECTVALIDITY.OBJECTID = PMEOBJECT.OBJECTID
AND PMEOBJECTVALIDITY.DATAAREAID = PMEOBJECT.DATAAREAID --here it is!
LEFT JOIN PMEOBJECTDIMENSION
INNER JOIN(
SELECT
OBJECTVALIDITYID,
MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND PMEOBJECTDIMENSION.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID --I'm here

Cascade of outer joins

As a schematic example, I have 3 tables that I desire to join, A,B,C where A to B is joined via an outer join and B to C is potentially joined via an inner join. In this constellation, I have to write two outer joins to get data if the first join does not have a match A-B in a line:
SELECT [fields] FROM
A
LEFT OUTER JOIN
B ON [a.field]=[b.field]
LEFT OUTER JOIN
C ON [b.field]=[c.field]
It seems to me logically that I have to write the second statement as an outer join. However I'm curious if there is a possiblity to set brackets for the join scope to signal that the second join should only used if the first inner join has found matching data for A-B. Something like:
SELECT [fields] FROM
A
(LEFT OUTER JOIN
B ON [a.field]=[b.field]
INNER JOIN
C ON [b.field]=[c.field]
)
I have played around a little but not found a possiblity to set brackets. The only way I have found to make this working is with a sub-query. Is this the only way to go?
Actually there is a syntax for that case.
SELECT fields
FROM A
LEFT OUTER JOIN (
B INNER JOIN C ON b.field = c.field
) ON a.field = b.field
The parentheses are optional, and the result is the same without them, being equivalent to the result of
SELECT fields
FROM A
LEFT OUTER JOIN B ON a.field = b.field
LEFT OUTER JOIN C ON b.field = c.field
You could perform it as follows
SELECT Fields
FROM TableA a
LEFT OUTER JOIN (SELECT Fields
FROM TableB b
INNER JOIN TableC c ON b.Field = c.Field) x on a.Field = x.Fi
eld
Not too sure if there would be any performance benefit to this without testing it out though.
The 2nd way would be with a subquery as per Jon Bridges' answer
However, they are the same semantically.
A CTE could be used if you have a complex subquery
;WITH BjoinC AS
(
SELECT Fields
FROM TableB b
INNER JOIN
TableC c ON b.Field = c.Field
)
SELECT [fields] FROM
A
LEFT OUTER JOIN
BjoinC ON ...
What About something like:
SELECT [fields]
FROM A
LEFT JOIN ( SELECT DISTINCT [fields]
FROM B
LEFT JOIN C ON b.field = c.field
) on a.field = b.field

using (+) after where statement in sql server

How can i convert following plsql to tsql. B.STATUS(+)=1 doesnt work in tsql.
Select * from A,B where A.ID=B.ID(+)
WHERE B.STATUS(+)=1
this doesnt return rows in mssql because it doesnt understand B.STATUS is optional
Select * from A LEFT JOIN B ON A.ID=B.ID
WHERE B.STATUS=1
An OUTER JOIN changes to an INNER JOIN when a condition is applied to the outer table in the WHERE clause. In the ON clause, it stays as OUTER.
You need to push the predicate "in"
Select * from A LEFT JOIN B ON A.ID = B.ID AND B.STATUS=1
OR
Select * from
A LEFT JOIN
(SELECT * FROM B WHERE B.STATUS=1) B1 ON A.ID = B1.ID
Just make it an OR "(b.status = 1 OR b.status IS NULL)"

Resources