How to join on two fields where one could be null

How to join on two fields where one could be null - sql-server

This query may seem basic, but i'm at a fairly basic level.
So here is my data - Sorry about the formatting, i've tried following the help but the table formatting is obviously not working for me (Can someone please advise?):
Table 1
ID |Country
---| -------
1 | UK
1 | IE
1 | US
2 | UK
2 | FR
Table 2
ID |Country
---| -------
1 | UK
1 | IE
2 | UK
The result i want is this
Table 1----- | ----Table 2
ID |Country |-----ID |Country
---| ------- |--------|--------
1 | UK | 1 | UK
1 | IE | 1 | IE
1 | US | 1 | NULL
2 | UK | 2 | UK
2 | FR | 2 | NULL
But more specifically i want to identify the NULL's so that i get this result
Table 1----- | ----Table 2
ID |Country |-----ID |Country
---| ------- |--------|--------
1 | US | 1 | NULL
2 | FR | 2 | NULL
The code i have used so far is:
select *
from table1 t1
left outer join table2 t2 on t1.id = t2.id and t1.country = t2.country
where t1.id is not null
and t2.country is null

Try this
select t1.id, t1.country, isnull(t2.id, t1.id) AS T2_ID, t2.country
from table1 t1
left outer join table2 t2 on t1.id = t2.upc and t1.country = t2.country
if you want to only show the ones where you have nulls in t2, you can add
where t2.id is null
But if you want to show all the records, just leave it without the WHERE condition

You were close, you just need to use isnull() or coalesce().
select
t1.id
, t1.country
, t2_Id = isnull(t2.id,t1.id)
, t2_country = t2.country
from table1 t1
left outer join table2 t2 on t1.id = t2.id and t1.country = t2.country
where t1.id is not null
--and t2.country is null
rextester demo: http://rextester.com/XCNH52338
returns:
+----+---------+-------+------------+
| id | country | t2_Id | t2_country |
+----+---------+-------+------------+
| 1 | UK | 1 | UK |
| 1 | IE | 1 | IE |
| 1 | US | 1 | NULL |
| 2 | UK | 2 | UK |
| 2 | FR | 2 | NULL |
+----+---------+-------+------------+
with the additional filter of t2.country is null
returns:
+----+---------+-------+------------+
| id | country | t2_Id | t2_country |
+----+---------+-------+------------+
| 1 | US | 1 | NULL |
| 2 | FR | 2 | NULL |
+----+---------+-------+------------+
The main difference between the two is that coalesce() can support more than 2 parameters, and it selects the first one that is not null. More differences between the two are answered here.
coalesce() is standard ANSI sql, so it is available in most RDBMS. isnull() is specific to sql server.
Reference:
isnull() - msdn
coalesce() - msdn
coalesce vs isnull - Itzik Ben-Gan

Related

SQL Sub rows - UNION vs JOIN

i would like to join 2 table but not duplicating all data.
try to be more clear
I have table_A
| ID | Description | Total
| 1. | Test a. | 10
| 2. | Test B. | 8
and my total is 18
Table_B
|ID| Site
|1 | Site A
|1 | Site B
|2 | Site C
If i do a left
Select a.ID,a.Description,b.Site,a.Total from table_a as a
left outer join table_b as b on a.id =b.it
i get
| ID | Description | Site | Total
| 1. | Test a. | Site A|. 10
| 1. | Test a. | Site B|. 10
| 2. | Test B. | Site C| 8
so my total became 28
i would like to get something like
| a.ID |b.ID| Description | Site | Total
| 1. | | Test a. | |. 10
| 1. | 1 | | Site A|
| 1. | 1 | | Site B|
| 2. | | Test B. | | 8
| 2. | 2 | | Site C|
so i can have it in excel and create a group by into the row

I think this is what you want. There are a few methods of doing this, one is using a UNION to get the 2 datasets:
SELECT aID,
bID,
description,
Site,
Total
FROM (SELECT a.ID AS aID,
b.ID AS bID,
a.description,
b.Site,
NULL AS Total
FROM dbo.TableA a
JOIN dbo.TableB b ON A.ID = B.ID
UNION ALL
SELECT a.ID AS aID,
NULL,
a.description,
NULL,
a.Total
FROM dbo.TableA a) U
ORDER BY aID,
bID,
Site;
db<>fiddle

How to find difference between two tables in MSSQL

I have got two tables 'Customer'.
The first one:
ID | UserID | Date
1. | 1 | 2018-05-01
2. | 1 | 2018-05-02
The second one:
ID | UserID | Date
1. | 1 | 2018-05-01
2. | 1 | 2018-05-02
3. | 1 | 2018-05-03
So, as you can see in the second table, there is one row more.
I have written so far this code:
;with cte_table1 as (
select UserID, count(id) cnt from db1.Customer group by UserID
),
cte_table2 as (
select UserID, count(id) cnt from db2.Customer group by UserID
)
select * from cte_table1 t1
join cte_table2 t2 on t2.UserID = t1.UserID
where t1.cnt <> t2.cnt
and this gives me expected result:
UserID | cnt | UserID | cnt
1 | 2 | 1 | 3
And so far, everything is fine. The thing is, these two tables have many rows and I'd like to have result with dates, where cnt does not match.
In other words, I'd like to have something like this:
UserID | cnt | Date | UserID | cnt | Date
1 | 2 | 2018-05-01 | 1 | 3 | 2018-05-01
1 | 2 | 2018-05-02 | 1 | 3 | 2018-05-01
1 | 2 | NULL | 1 | 3 | 2018-05-03
The best soulution would be resultset where both cte's are joined to give this:
UserID | cnt | Date | UserID | cnt | Date
1 | 2 | 2018-05-01 | 1 | 3 | 2018-05-01
1 | 2 | 2018-05-02 | 1 | 3 | 2018-05-01
1 | 2 | NULL | 1 | 3 | 2018-05-03
1 | 2 | 2018-05-30 | 1 | 3 | NULL

You should do a FULL OUTER JOIN query like below
Select
C1.UserID,
C1.cnt,
C1.Date,
C2.UserID,
C2.cnt,
C2.Date
from
db1.Customer C1
FULL OUTER JOIN
db2.Customer C2
on C1.UserId=C2.UserId and C1.date=C2.Date

SQL server: complex left join query

there are 2 tables:
people
+------------+--------------+------+
| name | place | pid |
+------------+--------------+------+
| Mr John | place1 | 1 |
| Miss Smith | place2 | 2 |
+------------+--------------+------+
places
+------+------+----------------------+
| pid | owner| address |
+------+------+----------------------+
| 1 | 1 | address1 |
| 1 | null | address2 |
| 2 | null | address3 |
| 2 | null | address4 |
| 2 | null | address5 |
+------+------+----------------------+
I am looking for a query which will return:
people (complex left join) places on people.pid = places.pid
Mr John | place1 | 1 | 1 | 1 | address1
Miss Smith | place2 | 2 | 2 | null | address3
Miss Smith | place2 | 2 | 2 | null | address4
Miss Smith | place2 | 2 | 2 | null | address5
In words a join on pid but if there is a non null owner value for the specific person then get only that row, if there is not a non null owner value then get all the rows for the specific person. Using left join because I need also the people with pid = null

One strategy is to pre process the places table in a CTE to identify which pid group of records have at least one non NULL owner value. Such pid records need to all be included in the join. On the other hand, if a pid group has at least one non NULL owner, then we will only include non NULL matches in the join.
WITH cte AS (
SELECT pid, owner, address,
SUM(CASE WHEN owner IS NOT NULL THEN 1 ELSE 0 END) OVER
(PARTITION BY pid) AS non_null_cnt
FROM places
)
SELECT t1.name, t1.place, t1.pid, t2.owner, t2.address
FROM people t1
LEFT JOIN cte t2
ON t1.pid = t2.pid AND
(t2.owner IS NOT NULL OR t2.non_null_cnt = 0)
ORDER BY t1.pid;
Demo

;WITH CTE as
(
SELECT count(owner)over(partition by pid) mo,*
FROM places
)
SELECT *
FROM people p
LEFT JOIN CTE
ON
p.pid = CTE.pid
and (mo = 0
or owner is not null)

SQL Server join query optimization with group by

Just for knowledge, I want to know that, can the below given query be achieve by any other feasible way like using group by.
SELECT
GROUPMAS.GRPCODE, GROUPMAS.GRPNAME,
GRPDTLS.ACCODE, GRPDTLS.ACNAME, GRPDTLS.DOA "ADMISSION DATE",
LOANMAST.LOANCODE, LOANMAST.VCHDATE "LOAN SANCTION DATE",
LOANMAST.LANAMT,
(SELECT SUM(RECPDTLS.INSTAMT)
FROM RECPDTLS
WHERE LOANCODE = LOANMAST.LOANCODE
AND RECPDTLS.VCHDATE <= '2009-03-31') AS REPAYMENT,
(SELECT SUM(RECPDTLS.INTAMT)
FROM RECPDTLS
WHERE LOANCODE = LOANMAST.LOANCODE
AND RECPDTLS.VCHDATE <= '2009-03-31') AS INTREST,
(SELECT MAX(RECPDTLS.VCHDATE)
FROM RECPDTLS
WHERE LOANCODE = LOANMAST.LOANCODE
AND RECPDTLS.VCHDATE <= '2009-03-31') AS "LAST PAYMENT ON"
FROM
GROUPMAS
JOIN
GRPDTLS ON (GROUPMAS.GRPCODE = GRPDTLS.GRPCODE AND GRPDTLS.DOA <= '2009-03-31')
JOIN
LOANMAST ON (GRPDTLS.GRPCODE = LOANMAST.GRPCODE AND GRPDTLS.ACCODE = LOANMAST.ACCODE AND LOANMAST.VCHDATE <= '2009-03-31')
Table GROUPMAS structure
GRPCODE | GRPNAME
--------| -------
1 | A
2 | B
Table GRPDTLS structure
GRPCODE | ACCODE | ACNAME | DOA
--------|--------|--------|-----
1 | 1 | name1A | 2007-07-05
1 | 2 | name2A | 2008-07-05
2 | 1 | name1B | 2007-07-06
2 | 2 | name2B | 2007-07-05
Table LOANMAST structure
LOANCODE | GRPCODE | ACCODE | VCHDATE | LANAMT
---------|---------|--------|--------- |--------
1 | 1 | 2 |2009-01-06|2000
2 | 2 | 1 |2008-09-06|5000
Table RECPDTLS structure
TXNNO | LOANCODE | INSTAMT | INTAMT | VCHDATE
------|----------|---------|--------|---------
1 | 1 | 200 | 0 | 2009-02-06
2 | 1 | 200 | 10 | 2009-03-06
3 | 2 | 500 | 0 | 2008-10-06
4 | 2 | 1500 | 50 | 2009-03-28
5 | 2 | 500 | 0 | 2010-03-28
It will output something like this
GRPCODE | GRPNAME | ACCODE | ACNAME | ADMISSION DATE | LOANCODE | LOAN SANCTION DATE | LANAMT | REPAYMENT | INTREST | LAST PAYMENT ON
--------| --------| -------| ------ | ---------------| -------- | ------------------ | -------| ----------| ------- | --------------
1 | A | 2 | name2A | 2008-07-05 | 1 |2009-01-06 | 2000 | 400 | 10 | 2009-03-06
2 | B | 1 | name1B | 2007-07-06 | 2 |2008-09-06 | 5000 | 2000 | 50 | 2009-03-28
Thanks for the help.

You can replace the sub queries in your select statement with LEFT OUTER OR INNER JOIN depending on requirements. If all LOANCODE records will have matching RECPDTLS records then use INNER JOIN else use LEFT OUTER JOIN. Keep your aggregate functions the same.
...
Repayment=SUM(RECPDTLS.INSTAMT),
Interest=SUM(RECPDTLS.INTAMT),
LastPaymentOn=MAX(RECPDTLS.VCHDATE)
...
LEFT OUTER/INNER JOIN RECPDTLS ON RECPDTLS.LOANCODE = LOANMAST.LOANCODE AND Repayment.VCHDATE <= #HighDate
...
GROUP BY
GROUPMAS.GRPCODE, GROUPMAS.GRPNAME,
GRPDTLS.ACCODE, GRPDTLS.ACNAME, GRPDTLS.DOA,
LOANMAST.LOANCODE, LOANMAST.VCHDATE,
LOANMAST.LANAMT
You will need to run the query analyzer to see the efficiency gain between the old and the new queries.
NOTE : As I said above, be sure to use LET OUTER JOIN if the LOANCODE is not required to have a RECPDTLS as an INNER JOIN will only return matches in both tables.

You can use CTE to simplify the request :
;WITH LOANMASTAGG AS
(
SELECT SUM(r.INSTAMT) REPAYMENT, SUM(r.INTAMT) INTREST, MAX(r.VCHDATE) [LAST PAYMENT ON], l.LOANCODE, l.VCHDATE, l.LANAMT, l.ACCODE, l.GRPCODE
FROM #RECPDTLS r
INNER JOIN #LOANMAST l ON r.LOANCODE = l.LOANCODE
WHERE l.VCHDATE <= '2009-03-31'
GROUP BY l.LOANCODE, l.VCHDATE, l.LANAMT, l.ACCODE, l.GRPCODE
)
SELECT
g.GRPCODE,
g.GRPNAME,
gl.ACCODE,
gl.ACNAME,
gl.DOA "ADMISSION DATE",
la.LOANCODE,
la.VCHDATE "LOAN SANCTION DATE",
la.LANAMT,
la.REPAYMENT AS REPAYMENT,
la.INTREST AS INTREST,
la.[LAST PAYMENT ON] "LAST PAYMENT ON"
FROM LOANMASTAGG la
INNER JOIN #GRPDTLS gl ON gl.GRPCODE = la.GRPCODE AND gl.ACCODE = la.ACCODE
INNER JOIN #GROUPMAS g ON (g.GRPCODE = gl.GRPCODE)
WHERE gl.DOA <= '2009-03-31'

Where to use Outer Apply

MASTER TABLE
x------x--------------------x
| Id | Name |
x------x--------------------x
| 1 | A |
| 2 | B |
| 3 | C |
x------x--------------------x
DETAILS TABLE
x------x--------------------x-------x
| Id | PERIOD | QTY |
x------x--------------------x-------x
| 1 | 2014-01-13 | 10 |
| 1 | 2014-01-11 | 15 |
| 1 | 2014-01-12 | 20 |
| 2 | 2014-01-06 | 30 |
| 2 | 2014-01-08 | 40 |
x------x--------------------x-------x
I am getting the same results when LEFT JOIN and OUTER APPLY is used.
LEFT JOIN
SELECT T1.ID,T1.NAME,T2.PERIOD,T2.QTY
FROM MASTER T1
LEFT JOIN DETAILS T2 ON T1.ID=T2.ID
OUTER APPLY
SELECT T1.ID,T1.NAME,TAB.PERIOD,TAB.QTY
FROM MASTER T1
OUTER APPLY
(
SELECT ID,PERIOD,QTY
FROM DETAILS T2
WHERE T1.ID=T2.ID
)TAB
Where should I use LEFT JOIN AND where should I use OUTER APPLY

A LEFT JOIN should be replaced with OUTER APPLY in the following situations.
1. If we want to join two tables based on TOP n results
Consider if we need to select Id and Name from Master and last two dates for each Id from Details table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
LEFT JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
which forms the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | NULL | NULL |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
This will bring wrong results ie, it will bring only latest two dates data from Details table irrespective of Id even though we join with Id. So the proper solution is using OUTER APPLY.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
OUTER APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
Here is the working : In LEFT JOIN , TOP 2 dates will be joined to the MASTER only after executing the query inside derived table D. In OUTER APPLY, it uses joining WHERE M.ID=D.ID inside the OUTER APPLY, so that each ID in Master will be joined with TOP 2 dates which will bring the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
2. When we need LEFT JOIN functionality using functions.
OUTER APPLY can be used as a replacement with LEFT JOIN when we need to get result from Master table and a function.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
OUTER APPLY dbo.FnGetQty(M.ID) C
And the function goes here.
CREATE FUNCTION FnGetQty
(
#Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=#Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
3. Retain NULL values when unpivoting
Consider you have the below table
x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x
When you use UNPIVOT to bring FROMDATE AND TODATE to one column, it will eliminate NULL values by default.
SELECT ID,DATES
FROM MYTABLE
UNPIVOT (DATES FOR COLS IN (FROMDATE,TODATE)) P
which generates the below result. Note that we have missed the record of Id number 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
x------x-------------x
In such cases an APPLY can be used(either CROSS APPLY or OUTER APPLY, which is interchangeable).
SELECT DISTINCT ID,DATES
FROM MYTABLE
OUTER APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)
which forms the following result and retains Id where its value is 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x

In your example queries the results are indeed the same.
But OUTER APPLY can do more: For each outer row you can produce an arbitrary inner result set. For example you can join the TOP 1 ORDER BY ... row. A LEFT JOIN can't do that.
The computation of the inner result set can reference outer columns (like your example did).
OUTER APPLY is strictly more powerful than LEFT JOIN. This is easy to see because each LEFT JOIN can be rewritten to an OUTER APPLY just like you did. It's syntax is more verbose, though.