SQL best alternative to a LEFT JOIN and WHERE statement? - sql-server

Basically, I have two tables: Customer and Purchase. My problem is that the Purchase table is very large and causes performance issues, and I'm trying to keep my code organized into relevant CTEs.
I'm trying to pull all the purchase records for customers who purchased a Guitar Type A or who have no purchases.
I want to filter out any customer who didn't buy a Guitar Type A, but still keep customers who didn't buy anything.
Here's my code:
WITH Purchases AS
(
SELECT
CustID
, GuitarType
, PurchaseDate
FROM
Purchase
WHERE
GuitarType = 'A'
)
,
RelevantCustomers AS
(
SELECT
C.CustId
, C.CustType
FROM
Customer AS C
)
SELECT
p.CustId
, p.GuitarType
, p.PurchaseDate
FROM
Purchases AS p
INNER JOIN
RelevantCustomers AS rc ON p.CustId = rc.CustId
Customer:
+--------+-------------+----------+
| CustId | CreatedDate | CustType |
+--------+-------------+----------+
| 1 | 01/01/2017 | A |
+--------+-------------+----------+
| 2 | 01/01/2018 | B |
+--------+-------------+----------+
| 4 | 01/01/2018 | C |
+--------+-------------+----------+
Purchase
+----------+--------------+------------+
| GuitarId | PurchaseDate | GuitarType |
+----------+--------------+------------+
| 1 | 04/01/2018 | A |
+----------+--------------+------------+
| 1 | 05/01/2018 | A |
+----------+--------------+------------+
| 1 | 06/01/2018 | C |
+----------+--------------+------------+
| 2 | 06/01/2018 | A |
+----------+--------------+------------+
| 2 | 06/01/2018 | B |
+----------+--------------+------------+
| 2 | 06/01/2018 | A |
+----------+--------------+------------+
If I use an INNER JOIN, it only returns those who bought Guitar Type A. If I use a LEFT JOIN, it includes all customers.
One alternative is to move the "WHERE GuitarType = 'A'" filter down to the final WHERE clause and do a LEFT JOIN, but this makes my code less organized and could cause performance issues.

This might do it
SELECT rc.Custid, p.GuitarType, p.PurchaseDate
FROM RelevantCustomers rc
LEFT JOIN Purchases p
ON p.CustId = rc.CustId
LEFT JOIN Purchases pn
ON pn.CustId = rc.CustId
AND p.GuitarType != 'A'
WHERE (p.GuitarType = 'A' OR p.CustID IS NULL)
and pn.CustID is null

You appear to want:
SELECT rc.Custid, p.GuitarType, p.PurchaseDate
FROM RelevantCustomers rc LEFT JOIN
Purchases p
ON p.CustId = rc.CustId
WHERE p.GuitarType = 'A' OR p.GuitarType IS NULL;
For performance, you want an index on Purchases(CustId).
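To see how the LEFT JOIN plus `OR ... IS NULL` filter behaves, here is a minimal sketch run against an in-memory SQLite database through Python's `sqlite3` module rather than SQL Server. The rows are made up for illustration (customer 1 bought types A and C, customer 2 bought only type B, customer 4 bought nothing), and the query joins the raw Purchase table directly instead of going through the CTEs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustId INTEGER PRIMARY KEY, CustType TEXT);
CREATE TABLE Purchase (CustId INTEGER, PurchaseDate TEXT, GuitarType TEXT);
-- Index on the join column, as suggested in the answer above.
CREATE INDEX IX_Purchase_CustId ON Purchase (CustId);

-- Illustrative rows only, not the data from the question.
INSERT INTO Customer VALUES (1, 'A'), (2, 'B'), (4, 'C');
INSERT INTO Purchase VALUES
    (1, '2018-04-01', 'A'),
    (1, '2018-06-01', 'C'),
    (2, '2018-06-01', 'B');
""")

rows = conn.execute("""
SELECT c.CustId, p.GuitarType, p.PurchaseDate
FROM Customer AS c
LEFT JOIN Purchase AS p
       ON p.CustId = c.CustId
-- Keep type A purchases, plus customers with no purchase rows at all.
WHERE p.GuitarType = 'A' OR p.GuitarType IS NULL
ORDER BY c.CustId, p.PurchaseDate
""").fetchall()

for row in rows:
    print(row)
```

Customer 2 drops out because its only joined row has GuitarType 'B', which is neither 'A' nor NULL. Customer 4 stays because the LEFT JOIN produced a single all-NULL purchase row for it.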

Related

SQL Sub rows - UNION vs JOIN

I would like to join two tables without duplicating all the data.
Let me try to be clearer.
I have table_A
| ID | Description | Total
| 1. | Test a. | 10
| 2. | Test B. | 8
and my total is 18
Table_B
|ID| Site
|1 | Site A
|1 | Site B
|2 | Site C
If I do a left join
Select a.ID,a.Description,b.Site,a.Total from table_a as a
left outer join table_b as b on a.id = b.id
I get
| ID | Description | Site | Total
| 1. | Test a. | Site A|. 10
| 1. | Test a. | Site B|. 10
| 2. | Test B. | Site C| 8
so my total becomes 28.
I would like to get something like
| a.ID |b.ID| Description | Site | Total
| 1. | | Test a. | |. 10
| 1. | 1 | | Site A|
| 1. | 1 | | Site B|
| 2. | | Test B. | | 8
| 2. | 2 | | Site C|
so I can load it into Excel and group the rows there.
I think this is what you want. There are a few methods of doing this, one is using a UNION to get the 2 datasets:
SELECT aID,
bID,
description,
Site,
Total
FROM (SELECT a.ID AS aID,
b.ID AS bID,
a.description,
b.Site,
NULL AS Total
FROM dbo.TableA a
JOIN dbo.TableB b ON A.ID = B.ID
UNION ALL
SELECT a.ID AS aID,
NULL,
a.description,
NULL,
a.Total
FROM dbo.TableA a) U
ORDER BY aID,
bID,
Site;
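As a quick check of the UNION ALL approach, the sketch below runs the same shape of query on the sample rows from the question, using an in-memory SQLite database via Python's `sqlite3` module as a stand-in for SQL Server. The table names `table_a` and `table_b` follow the question; the `dbo.` schema prefix is dropped because SQLite has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (ID INTEGER, Description TEXT, Total INTEGER);
CREATE TABLE table_b (ID INTEGER, Site TEXT);
INSERT INTO table_a VALUES (1, 'Test a', 10), (2, 'Test B', 8);
INSERT INTO table_b VALUES (1, 'Site A'), (1, 'Site B'), (2, 'Site C');
""")

rows = conn.execute("""
SELECT aID, bID, Description, Site, Total
FROM (
    -- Detail rows: one per site, with no Total so nothing is double counted.
    SELECT a.ID AS aID, b.ID AS bID, a.Description, b.Site, NULL AS Total
    FROM table_a AS a
    JOIN table_b AS b ON a.ID = b.ID
    UNION ALL
    -- Header rows: one per table_a row, carrying the Total exactly once.
    SELECT a.ID, NULL, a.Description, NULL, a.Total
    FROM table_a AS a
) AS U
ORDER BY aID, bID, Site
""").fetchall()

grand_total = sum(r[4] for r in rows if r[4] is not None)
for row in rows:
    print(row)
print("grand total:", grand_total)
```

Because NULL sorts first in ascending order in both SQLite and SQL Server, each header row lands directly above its site rows, and summing the Total column gives 18 rather than 28.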

SQL server: complex left join query

there are 2 tables:
people
+------------+--------------+------+
| name | place | pid |
+------------+--------------+------+
| Mr John | place1 | 1 |
| Miss Smith | place2 | 2 |
+------------+--------------+------+
places
+------+------+----------------------+
| pid | owner| address |
+------+------+----------------------+
| 1 | 1 | address1 |
| 1 | null | address2 |
| 2 | null | address3 |
| 2 | null | address4 |
| 2 | null | address5 |
+------+------+----------------------+
I am looking for a query which will return:
people (complex left join) places on people.pid = places.pid
Mr John | place1 | 1 | 1 | 1 | address1
Miss Smith | place2 | 2 | 2 | null | address3
Miss Smith | place2 | 2 | 2 | null | address4
Miss Smith | place2 | 2 | 2 | null | address5
In words: a join on pid, but if there is a non-null owner value for a specific person, then get only that row; if there is no non-null owner value, then get all the rows for that person. I am using a left join because I also need the people with pid = null.
One strategy is to pre-process the places table in a CTE to count, for each pid group, how many records have a non-NULL owner value. If a pid group has no non-NULL owner at all, then all of its records are included in the join. On the other hand, if a pid group has at least one non-NULL owner, then we only include the non-NULL matches in the join.
WITH cte AS (
SELECT pid, owner, address,
SUM(CASE WHEN owner IS NOT NULL THEN 1 ELSE 0 END) OVER
(PARTITION BY pid) AS non_null_cnt
FROM places
)
SELECT t1.name, t1.place, t1.pid, t2.owner, t2.address
FROM people t1
LEFT JOIN cte t2
ON t1.pid = t2.pid AND
(t2.owner IS NOT NULL OR t2.non_null_cnt = 0)
ORDER BY t1.pid;
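The windowed-count query can be tried out with the sketch below, which uses an in-memory SQLite database through Python's `sqlite3` module (window functions need SQLite 3.25 or later) instead of SQL Server. The people and places rows are the ones from the question; the extra person 'Dr Who' with a NULL pid is a made-up row added only to show that the left join keeps such people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (name TEXT, place TEXT, pid INTEGER);
CREATE TABLE places (pid INTEGER, owner INTEGER, address TEXT);
INSERT INTO people VALUES
    ('Mr John', 'place1', 1),
    ('Miss Smith', 'place2', 2),
    ('Dr Who', 'place3', NULL);      -- hypothetical row, not in the question
INSERT INTO places VALUES
    (1, 1, 'address1'), (1, NULL, 'address2'),
    (2, NULL, 'address3'), (2, NULL, 'address4'), (2, NULL, 'address5');
""")

rows = conn.execute("""
WITH cte AS (
    SELECT pid, owner, address,
           -- Number of non-NULL owners within each pid group.
           SUM(CASE WHEN owner IS NOT NULL THEN 1 ELSE 0 END)
               OVER (PARTITION BY pid) AS non_null_cnt
    FROM places
)
SELECT t1.name, t1.place, t1.pid, t2.owner, t2.address
FROM people AS t1
LEFT JOIN cte AS t2
       ON t1.pid = t2.pid
      AND (t2.owner IS NOT NULL OR t2.non_null_cnt = 0)
ORDER BY t1.pid, t2.address
""").fetchall()

for row in rows:
    print(row)
```

Mr John gets only address1 because his pid group contains an owner, while Miss Smith gets all three addresses because her group has none.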
;WITH CTE as
(
SELECT count(owner)over(partition by pid) mo,*
FROM places
)
SELECT *
FROM people p
LEFT JOIN CTE
ON
p.pid = CTE.pid
and (mo = 0
or owner is not null)

Get all categories with number of associated records with where clause

So I have two tables:
Categories
-------------------
| Id | Name |
-------------------
| 1 | Category1 |
-------------------
| 2 | Category2 |
-------------------
| 3 | Category3 |
-------------------
Products
--------------------------------------------
| Id | CategoryId | Name | CreatedDate |
--------------------------------------------
| 1 | 1 | Product1 | 2017-05-05 |
--------------------------------------------
| 1 | 1 | Product2 | 2017-05-06 |
--------------------------------------------
| 2 | 2 | Product3 | 2017-12-21 |
--------------------------------------------
I need a query to select all categories along with the number of products for each for a specific time range in which those products were created (CreatedDate).
What I currently have is this:
SELECT c.[Name], COUNT(p.[Id]) AS ProductCount
FROM Categories AS c
LEFT JOIN Products AS p ON p.[CategoryId] = c.[Id]
WHERE p.[CreatedDate] BETWEEN '2017-05-01' AND '2017-06-01'
GROUP BY c.[Name]
My issue is that I'm not seeing Category2 and Category3 in the results set because they don't pass the WHERE clause. I want to see all categories in the results set.
Put the where condition in the left join clause
SELECT c.[Name], COUNT(p.[Id]) AS ProductCount
FROM Categories AS c
LEFT JOIN Products AS p ON p.[CategoryId] = c.[Id]
AND p.[CreatedDate] BETWEEN '2017-05-01' AND '2017-06-01'
GROUP BY c.[Name]
This way it is applied to the join only and not to the complete result set.
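The difference between filtering in WHERE and filtering in the join condition can be seen side by side in this sketch. It runs both versions of the query on the sample rows from the question, using an in-memory SQLite database via Python's `sqlite3` module in place of SQL Server; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (Id INTEGER, Name TEXT);
CREATE TABLE Products (Id INTEGER, CategoryId INTEGER, Name TEXT, CreatedDate TEXT);
INSERT INTO Categories VALUES (1, 'Category1'), (2, 'Category2'), (3, 'Category3');
INSERT INTO Products VALUES
    (1, 1, 'Product1', '2017-05-05'),
    (1, 1, 'Product2', '2017-05-06'),
    (2, 2, 'Product3', '2017-12-21');
""")

# Date filter in WHERE: rows with a NULL CreatedDate are discarded after
# the join, so the LEFT JOIN effectively becomes an INNER JOIN.
where_rows = conn.execute("""
SELECT c.Name, COUNT(p.Id) AS ProductCount
FROM Categories AS c
LEFT JOIN Products AS p ON p.CategoryId = c.Id
WHERE p.CreatedDate BETWEEN '2017-05-01' AND '2017-06-01'
GROUP BY c.Name
ORDER BY c.Name
""").fetchall()

# Date filter in ON: it only limits which products match, so every
# category survives and COUNT(p.Id) is 0 where nothing matched.
on_rows = conn.execute("""
SELECT c.Name, COUNT(p.Id) AS ProductCount
FROM Categories AS c
LEFT JOIN Products AS p
       ON p.CategoryId = c.Id
      AND p.CreatedDate BETWEEN '2017-05-01' AND '2017-06-01'
GROUP BY c.Name
ORDER BY c.Name
""").fetchall()

print(where_rows)
print(on_rows)
```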

SQL Server Aggregation Using Partitioning

I have been having trouble writing a SQL query to roll up multiple balances based on a shared ID group and display the balance against the product with a flag of N. I imagine I need to use a partition function or a MAX function to do this.
The desired results are in the table below, underneath the sample dataset. Would anyone have a fix for this?
Basically, I need to group everything by ID and, where there is a record with a flag of N, roll the balances up to that record. If there is no record with a flag of N, we just aggregate by Pdct_type_c.
SELECT
Client,
SUM(Limit) Limit,
SUM(Balance) Balance,
SUM(Exposure) Exposure,
MAX(CASE WHEN Flag = 'N' THEN Pdct_type_c ELSE NULL END) Pdct_type_c,
ID
FROM Table
GROUP BY Client, ID
SAMPLE DATASET
Client | Limit | Balance | Exposure | Pdct_type_c | Flag | ID
--------------------------------------------------------------------------------
John | 60,000,000.00| - | 5,000,000| DERIV | N | 2
John | - | 1,000,000.00 | - | FX | y | 2
John | - | 2,000,000.00 | - | IC | y | 2
John | 1,000,000.00 | 3,000,000.00 | - | DCO | y | 3
John | 1,000,000.00 | 3,000,000.00 | - | DCO | y | 3
CURRENT RESULTS
Client | Limit | Balance | Exposure | Pdct_type_c | Flag | ID
--------------------------------------------------------------------------------
John | 60,000,000.00| 3,000,000.00 | 5,000,000| DERIV | N | 2
John | 2,000,000.00 | 6,000,000.00 | - | NULL | Y | 3
DESIRED RESULTS
Client | Limit | Balance | Exposure | Pdct_type_c | Flag | ID
--------------------------------------------------------------------------------
John | 60,000,000.00| 3,000,000.00 | 5,000,000| DERIV | N | 2
John | 2,000,000.00 | 6,000,000.00 | - | DCO | Y | 3
It is entirely possible that this is doable with a windowing function. However here is the old fashioned way of doing it
This shows us just the records that have an N entry
SELECT
ID,
MIN(Pdct_type_c) Pdct_type_c
FROM Table
WHERE Flag = 'N'
GROUP BY ID
This outer joins to it to decide what to group on
SELECT
T.Client,
SUM(T.Limit) Limit,
SUM(T.Balance) Balance,
SUM(T.Exposure) Exposure,
ISNULL(N.Pdct_type_c, T.Pdct_type_c) Pdct_type_c,
CASE WHEN N.Pdct_type_c IS NULL THEN T.Flag ELSE 'N' END Flag,
T.ID
FROM Table T
LEFT OUTER JOIN
(
SELECT
ID,
MIN(Pdct_type_c) Pdct_type_c
FROM Table
WHERE Flag = 'N'
GROUP BY ID
) N
ON T.ID = N.ID
GROUP BY T.Client, T.ID,
ISNULL(N.Pdct_type_c, T.Pdct_type_c),
CASE WHEN N.Pdct_type_c IS NULL THEN T.Flag ELSE 'N' END
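Here is a sketch of the outer-join roll-up run on the sample dataset, using an in-memory SQLite database via Python's `sqlite3` module rather than SQL Server. Several things are assumptions made for the sake of a runnable example: the table is called `Exposures` (the question's placeholder `Table` is a reserved word), the Limit column is renamed `LimitAmt` for the same reason, the '-' entries are loaded as 0, the flag is stored as upper-case 'Y', and `COALESCE` stands in for `ISNULL` because SQLite has no two-argument `ISNULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Exposures and LimitAmt are hypothetical names, see the note above.
CREATE TABLE Exposures (
    Client TEXT, LimitAmt INTEGER, Balance INTEGER, Exposure INTEGER,
    Pdct_type_c TEXT, Flag TEXT, ID INTEGER
);
INSERT INTO Exposures VALUES
    ('John', 60000000,       0, 5000000, 'DERIV', 'N', 2),
    ('John',        0, 1000000,       0, 'FX',    'Y', 2),
    ('John',        0, 2000000,       0, 'IC',    'Y', 2),
    ('John',  1000000, 3000000,       0, 'DCO',   'Y', 3),
    ('John',  1000000, 3000000,       0, 'DCO',   'Y', 3);
""")

rows = conn.execute("""
SELECT T.Client,
       SUM(T.LimitAmt) AS LimitAmt,
       SUM(T.Balance)  AS Balance,
       SUM(T.Exposure) AS Exposure,
       -- Use the N-flagged product for the whole ID when one exists.
       COALESCE(N.Pdct_type_c, T.Pdct_type_c) AS Pdct_type_c,
       CASE WHEN N.Pdct_type_c IS NULL THEN T.Flag ELSE 'N' END AS Flag,
       T.ID
FROM Exposures AS T
LEFT OUTER JOIN (
    SELECT ID, MIN(Pdct_type_c) AS Pdct_type_c
    FROM Exposures
    WHERE Flag = 'N'
    GROUP BY ID
) AS N ON T.ID = N.ID
GROUP BY T.Client, T.ID,
         COALESCE(N.Pdct_type_c, T.Pdct_type_c),
         CASE WHEN N.Pdct_type_c IS NULL THEN T.Flag ELSE 'N' END
ORDER BY T.ID
""").fetchall()

for row in rows:
    print(row)
```

ID 2 collapses onto its DERIV record, and ID 3, which has no N record, is aggregated by its own product type DCO.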

SQL Server join query optimization with group by

Just for my own knowledge, I want to know whether the query below can be written in some other feasible way, such as with GROUP BY.
SELECT
GROUPMAS.GRPCODE, GROUPMAS.GRPNAME,
GRPDTLS.ACCODE, GRPDTLS.ACNAME, GRPDTLS.DOA "ADMISSION DATE",
LOANMAST.LOANCODE, LOANMAST.VCHDATE "LOAN SANCTION DATE",
LOANMAST.LANAMT,
(SELECT SUM(RECPDTLS.INSTAMT)
FROM RECPDTLS
WHERE LOANCODE = LOANMAST.LOANCODE
AND RECPDTLS.VCHDATE <= '2009-03-31') AS REPAYMENT,
(SELECT SUM(RECPDTLS.INTAMT)
FROM RECPDTLS
WHERE LOANCODE = LOANMAST.LOANCODE
AND RECPDTLS.VCHDATE <= '2009-03-31') AS INTREST,
(SELECT MAX(RECPDTLS.VCHDATE)
FROM RECPDTLS
WHERE LOANCODE = LOANMAST.LOANCODE
AND RECPDTLS.VCHDATE <= '2009-03-31') AS "LAST PAYMENT ON"
FROM
GROUPMAS
JOIN
GRPDTLS ON (GROUPMAS.GRPCODE = GRPDTLS.GRPCODE AND GRPDTLS.DOA <= '2009-03-31')
JOIN
LOANMAST ON (GRPDTLS.GRPCODE = LOANMAST.GRPCODE AND GRPDTLS.ACCODE = LOANMAST.ACCODE AND LOANMAST.VCHDATE <= '2009-03-31')
Table GROUPMAS structure
GRPCODE | GRPNAME
--------| -------
1 | A
2 | B
Table GRPDTLS structure
GRPCODE | ACCODE | ACNAME | DOA
--------|--------|--------|-----
1 | 1 | name1A | 2007-07-05
1 | 2 | name2A | 2008-07-05
2 | 1 | name1B | 2007-07-06
2 | 2 | name2B | 2007-07-05
Table LOANMAST structure
LOANCODE | GRPCODE | ACCODE | VCHDATE | LANAMT
---------|---------|--------|--------- |--------
1 | 1 | 2 |2009-01-06|2000
2 | 2 | 1 |2008-09-06|5000
Table RECPDTLS structure
TXNNO | LOANCODE | INSTAMT | INTAMT | VCHDATE
------|----------|---------|--------|---------
1 | 1 | 200 | 0 | 2009-02-06
2 | 1 | 200 | 10 | 2009-03-06
3 | 2 | 500 | 0 | 2008-10-06
4 | 2 | 1500 | 50 | 2009-03-28
5 | 2 | 500 | 0 | 2010-03-28
It will output something like this
GRPCODE | GRPNAME | ACCODE | ACNAME | ADMISSION DATE | LOANCODE | LOAN SANCTION DATE | LANAMT | REPAYMENT | INTREST | LAST PAYMENT ON
--------| --------| -------| ------ | ---------------| -------- | ------------------ | -------| ----------| ------- | --------------
1 | A | 2 | name2A | 2008-07-05 | 1 |2009-01-06 | 2000 | 400 | 10 | 2009-03-06
2 | B | 1 | name1B | 2007-07-06 | 2 |2008-09-06 | 5000 | 2000 | 50 | 2009-03-28
Thanks for the help.
You can replace the subqueries in your SELECT statement with a LEFT OUTER JOIN or an INNER JOIN, depending on requirements. If every LOANCODE record has matching RECPDTLS records, use INNER JOIN; otherwise use LEFT OUTER JOIN. Keep your aggregate functions the same.
...
Repayment=SUM(RECPDTLS.INSTAMT),
Interest=SUM(RECPDTLS.INTAMT),
LastPaymentOn=MAX(RECPDTLS.VCHDATE)
...
LEFT OUTER/INNER JOIN RECPDTLS ON RECPDTLS.LOANCODE = LOANMAST.LOANCODE AND RECPDTLS.VCHDATE <= @HighDate
...
GROUP BY
GROUPMAS.GRPCODE, GROUPMAS.GRPNAME,
GRPDTLS.ACCODE, GRPDTLS.ACNAME, GRPDTLS.DOA,
LOANMAST.LOANCODE, LOANMAST.VCHDATE,
LOANMAST.LANAMT
You will need to run the query analyzer to compare the efficiency of the old and the new queries.
NOTE: As I said above, be sure to use LEFT OUTER JOIN if a LOANCODE is not required to have RECPDTLS records, because an INNER JOIN only returns rows that match in both tables.
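For reference, this is one way the join plus GROUP BY version could look once written out in full. It is a sketch under assumptions, not the answerer's exact query: it runs on the sample rows from the question in an in-memory SQLite database via Python's `sqlite3` module, uses short table aliases, chooses LEFT JOIN for RECPDTLS, and passes the cut-off date as a named parameter `:high`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GROUPMAS (GRPCODE INTEGER, GRPNAME TEXT);
CREATE TABLE GRPDTLS  (GRPCODE INTEGER, ACCODE INTEGER, ACNAME TEXT, DOA TEXT);
CREATE TABLE LOANMAST (LOANCODE INTEGER, GRPCODE INTEGER, ACCODE INTEGER,
                       VCHDATE TEXT, LANAMT INTEGER);
CREATE TABLE RECPDTLS (TXNNO INTEGER, LOANCODE INTEGER, INSTAMT INTEGER,
                       INTAMT INTEGER, VCHDATE TEXT);
INSERT INTO GROUPMAS VALUES (1, 'A'), (2, 'B');
INSERT INTO GRPDTLS VALUES
    (1, 1, 'name1A', '2007-07-05'), (1, 2, 'name2A', '2008-07-05'),
    (2, 1, 'name1B', '2007-07-06'), (2, 2, 'name2B', '2007-07-05');
INSERT INTO LOANMAST VALUES
    (1, 1, 2, '2009-01-06', 2000), (2, 2, 1, '2008-09-06', 5000);
INSERT INTO RECPDTLS VALUES
    (1, 1, 200, 0, '2009-02-06'), (2, 1, 200, 10, '2009-03-06'),
    (3, 2, 500, 0, '2008-10-06'), (4, 2, 1500, 50, '2009-03-28'),
    (5, 2, 500, 0, '2010-03-28');
""")

rows = conn.execute("""
SELECT g.GRPCODE, g.GRPNAME, d.ACCODE, d.ACNAME, d.DOA,
       l.LOANCODE, l.VCHDATE, l.LANAMT,
       SUM(r.INSTAMT) AS REPAYMENT,
       SUM(r.INTAMT)  AS INTREST,
       MAX(r.VCHDATE) AS LAST_PAYMENT_ON
FROM GROUPMAS AS g
JOIN GRPDTLS  AS d ON g.GRPCODE = d.GRPCODE AND d.DOA <= :high
JOIN LOANMAST AS l ON d.GRPCODE = l.GRPCODE AND d.ACCODE = l.ACCODE
                  AND l.VCHDATE <= :high
-- The payment date filter sits in ON so loans without payments are kept.
LEFT JOIN RECPDTLS AS r ON r.LOANCODE = l.LOANCODE AND r.VCHDATE <= :high
GROUP BY g.GRPCODE, g.GRPNAME, d.ACCODE, d.ACNAME, d.DOA,
         l.LOANCODE, l.VCHDATE, l.LANAMT
ORDER BY g.GRPCODE
""", {"high": "2009-03-31"}).fetchall()

for row in rows:
    print(row)
```

The payment dated 2010-03-28 falls after the cut-off, so loan 2 sums to 2000 rather than 2500, matching the expected output in the question.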
You can use a CTE to simplify the query:
;WITH LOANMASTAGG AS
(
SELECT SUM(r.INSTAMT) REPAYMENT, SUM(r.INTAMT) INTREST, MAX(r.VCHDATE) [LAST PAYMENT ON], l.LOANCODE, l.VCHDATE, l.LANAMT, l.ACCODE, l.GRPCODE
FROM #RECPDTLS r
INNER JOIN #LOANMAST l ON r.LOANCODE = l.LOANCODE
WHERE l.VCHDATE <= '2009-03-31'
AND r.VCHDATE <= '2009-03-31'
GROUP BY l.LOANCODE, l.VCHDATE, l.LANAMT, l.ACCODE, l.GRPCODE
)
SELECT
g.GRPCODE,
g.GRPNAME,
gl.ACCODE,
gl.ACNAME,
gl.DOA "ADMISSION DATE",
la.LOANCODE,
la.VCHDATE "LOAN SANCTION DATE",
la.LANAMT,
la.REPAYMENT AS REPAYMENT,
la.INTREST AS INTREST,
la.[LAST PAYMENT ON] "LAST PAYMENT ON"
FROM LOANMASTAGG la
INNER JOIN #GRPDTLS gl ON gl.GRPCODE = la.GRPCODE AND gl.ACCODE = la.ACCODE
INNER JOIN #GROUPMAS g ON (g.GRPCODE = gl.GRPCODE)
WHERE gl.DOA <= '2009-03-31'
