Where to use Outer Apply - sql-server

MASTER TABLE
x------x--------------------x
| Id | Name |
x------x--------------------x
| 1 | A |
| 2 | B |
| 3 | C |
x------x--------------------x
DETAILS TABLE
x------x--------------------x-------x
| Id | PERIOD | QTY |
x------x--------------------x-------x
| 1 | 2014-01-13 | 10 |
| 1 | 2014-01-11 | 15 |
| 1 | 2014-01-12 | 20 |
| 2 | 2014-01-06 | 30 |
| 2 | 2014-01-08 | 40 |
x------x--------------------x-------x
I am getting the same results when LEFT JOIN and OUTER APPLY is used.
LEFT JOIN
SELECT T1.ID,T1.NAME,T2.PERIOD,T2.QTY
FROM MASTER T1
LEFT JOIN DETAILS T2 ON T1.ID=T2.ID
OUTER APPLY
SELECT T1.ID,T1.NAME,TAB.PERIOD,TAB.QTY
FROM MASTER T1
OUTER APPLY
(
SELECT ID,PERIOD,QTY
FROM DETAILS T2
WHERE T1.ID=T2.ID
)TAB
Where should I use LEFT JOIN AND where should I use OUTER APPLY

A LEFT JOIN should be replaced with OUTER APPLY in the following situations.
1. If we want to join two tables based on TOP n results
Consider if we need to select Id and Name from Master and last two dates for each Id from Details table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
LEFT JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
which forms the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | NULL | NULL |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
This will bring wrong results ie, it will bring only latest two dates data from Details table irrespective of Id even though we join with Id. So the proper solution is using OUTER APPLY.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
OUTER APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
Here is the working : In LEFT JOIN , TOP 2 dates will be joined to the MASTER only after executing the query inside derived table D. In OUTER APPLY, it uses joining WHERE M.ID=D.ID inside the OUTER APPLY, so that each ID in Master will be joined with TOP 2 dates which will bring the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
2. When we need LEFT JOIN functionality using functions.
OUTER APPLY can be used as a replacement with LEFT JOIN when we need to get result from Master table and a function.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
OUTER APPLY dbo.FnGetQty(M.ID) C
And the function goes here.
CREATE FUNCTION FnGetQty
(
#Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=#Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
3. Retain NULL values when unpivoting
Consider you have the below table
x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x
When you use UNPIVOT to bring FROMDATE AND TODATE to one column, it will eliminate NULL values by default.
SELECT ID,DATES
FROM MYTABLE
UNPIVOT (DATES FOR COLS IN (FROMDATE,TODATE)) P
which generates the below result. Note that we have missed the record of Id number 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
x------x-------------x
In such cases an APPLY can be used(either CROSS APPLY or OUTER APPLY, which is interchangeable).
SELECT DISTINCT ID,DATES
FROM MYTABLE
OUTER APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)
which forms the following result and retains Id where its value is 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x

In your example queries the results are indeed the same.
But OUTER APPLY can do more: For each outer row you can produce an arbitrary inner result set. For example you can join the TOP 1 ORDER BY ... row. A LEFT JOIN can't do that.
The computation of the inner result set can reference outer columns (like your example did).
OUTER APPLY is strictly more powerful than LEFT JOIN. This is easy to see because each LEFT JOIN can be rewritten to an OUTER APPLY just like you did. It's syntax is more verbose, though.

Related

Results of join listed in rows vs additional columns?

I have 2 tables, with the same exact fields and fields names. i am trying to inner join them but im having some difficulty determining how i can get my results in my desired format.
I know i can do select a.customer, a.id, a.date, a.line, a.product, b.customer, b.id, b.date, b.line, b.product but instead of having my A data and B data on the same row, id like for them to be on seperate rows.
I have 2 tables, with the same exact fields and fields names, i am trying to inner join them so that unique line becomes a row.
Table A:
|customer| id | Date | line | Product|
|--------|-----|---------|------|--------|
| 445678 | 123 | 1/1/22 | 10 | 88975 |
| 853652 | 456 | 1/10/22 | 5 | 55876 |
| 845689 | 789 | 1/25/22 | 1 | 45587 |
TABLE B:
|customer| id | Date | line | Product|
|--------|-----|---------|------|--------|
| 445678 | 489 | 1/1/22 | 1 | 87574 |
| 853652 | 853 | 1/10/22 | 12 | 45678 |
| 587435 | 157 | 2/12/22 | 3 | 25896 |
DESIRED RESULTS:
|customer| id | Date | line | Product|
|--------|-----|---------|------|--------|
| 445678 | 123 | 1/1/22 | 10 | 88975 |
| 445678 | 489 | 1/1/22 | 1 | 87574 |
| 853652 | 456 | 1/10/22 | 5 | 55876 |
| 853652 | 853 | 1/10/22 | 12 | 45678 |
my query:
select a.customer, a.id, a.date, a.line, a.product
from data1 a
inner join data2 b
on a.date = b.date
and a.customer = b.customer

Divide selected value by count(*)

I have a Microsoft SQL Server with the following tables:
Projects
BookedHours (with fk_Project = Projects.ID)
Products
ProjectsToProducts (n:m with fk_Projects = Projects.ID and fk_Products = Products.ID)
I now want to select how many hours are booked to which product per month. The problem is, that one project can have multiple products (that's why I need the n:m table).
If I do the following, it will count the hours twice if a project has two products.
SELECT
P.ID AS fk_Product, MONTH(B.Datum) AS Monat, SUM(B.Hours) AS Stunden
FROM
tbl_BookedHours AS B
INNER JOIN
tbl_Projects AS M on B.fk_Project = M.ID
INNER JOIN
tbl_ProjectProduct AS PP ON PP.fk_Project = M.ID
INNER JOIN
tbl_Products AS P ON PP.fk_Product = P.ID
WHERE
YEAR(B.Datum) = 2020
GROUP BY
P.ID, MONTH(B.Datum)
ORDER BY
P.ID, MONTH(B.Datum)
I can get the number of products for each project with this SQL:
SELECT fk_Project, COUNT(*) AS Cnt
FROM tbl_ProjectProduct
GROUP By fk_MainProject
But how can I now divide the hours for each project by its individual factor and add it all up per product and month?
I could do it in my C# program or I could use a cursor and iterate through all projects, but I think there should be an more elegant way.
Edit with sample data:
|----------------| |----------------| |------------------------------|
| tbl_Projects | | tbl_Products | | tbl_ProjectProduct |
|----------------| |----------------| |------------------------------|
| ID | Name | | ID | Name | | ID | fk_Project | fk_Product |
|----+-----------| |----+-----------| |------------------------------|
| 1 | Project 1 | | 1 | Product 1 | | 1 | 1 | 1 |
| 2 | Project 2 | | 2 | Product 2 | | 2 | 1 | 2 |
| 3 | Project 3 | | 3 | Product 3 | | 3 | 2 | 1 |
| 4 | Project 4 | | 4 | Product 4 | | 4 | 3 | 3 |
|----------------| |----------------| | 5 | 4 | 1 |
| 6 | 4 | 2 |
| 7 | 4 | 4 |
|------------------------------|
|--------------------------------------|
| tbl_BookedHours |
|--------------------------------------|
| ID | fk_Project | Hours | Date |
|--------------------------------------|
| 1 | 1 | 10 | 2020-01-15 |
| 2 | 1 | 20 | 2020-01-20 |
| 3 | 2 | 10 | 2020-01-15 |
| 4 | 3 | 30 | 2020-01-18 |
| 5 | 2 | 20 | 2020-01-20 |
| 6 | 4 | 30 | 2020-01-25 |
| 7 | 1 | 10 | 2020-02-15 |
| 8 | 1 | 20 | 2020-02-20 |
| 9 | 2 | 10 | 2020-02-15 |
| 10 | 3 | 30 | 2020-03-18 |
| 11 | 2 | 20 | 2020-03-20 |
| 12 | 4 | 30 | 2020-03-25 |
|--------------------------------------|
The Result should be:
|----------------------------|
| fk_Product | Month | Hours |
|----------------------------|
| 1 | 1 | 55 |
| 2 | 1 | 25 |
| 3 | 1 | 30 |
| 4 | 1 | 10 |
| 1 | 2 | 25 |
| 2 | 2 | 15 |
| 1 | 3 | 30 |
| 2 | 3 | 10 |
| 3 | 3 | 30 |
| 4 | 3 | 10 |
|----------------------------|
For example booking Nr. 1 has to be divided by 2 (because Project 1 has two products) and one half of amount added to Product 1 and the other to Product 2 (Both in January). Booking Nr. 4 should not be divided, because Project 3 only has one product. Booking Numer 12 for example has to be divided by 3.
So that in total the Hours in the end add up to the same total.
I hope it's clearer now.
*** EDIT 2***
DECLARE #tbl_Projects TABLE (ID INT, [Name] VARCHAR(MAX))
INSERT INTO #tbl_Projects VALUES
(1,'Project 1'),
(2,'Project 2'),
(3,'Project 3'),
(4,'Project 4')
DECLARE #tbl_Products TABLE (ID INT, [Name] VARCHAR(MAX))
INSERT INTO #tbl_Products VALUES
(1,'Product 1'),
(2,'Product 2'),
(3,'Product 3'),
(4,'Product 4')
DECLARE #tbl_ProjectProduct TABLE (ID INT, fk_Project int, fk_Product int)
INSERT INTO #tbl_ProjectProduct VALUES
(1,1,1),
(2,1,2),
(3,2,1),
(4,3,3),
(5,4,1),
(6,4,2),
(7,4,4)
DECLARE #tbl_BookedHours TABLE (ID INT, fk_Project int, Hours int, [Date] Date)
INSERT INTO #tbl_BookedHours VALUES
(1,1,10,'2020-01-15'),
(2,1,20,'2020-01-20'),
(3,2,10,'2020-01-15'),
(4,3,30,'2020-01-18'),
(5,2,20,'2020-01-20'),
(6,4,30,'2020-01-25'),
(7,1,10,'2020-02-15'),
(8,1,20,'2020-02-20'),
(9,2,10,'2020-02-15'),
(10,3,30,'2020-03-18'),
(11,2,20,'2020-03-20'),
(12,4,30,'2020-03-25')
SELECT P.ID AS fk_Product, MONTH(B.Date) AS Month, SUM(B.Hours) AS SumHours
FROM #tbl_BookedHours AS B INNER JOIN #tbl_Projects AS M on B.fk_Project = M.ID
INNER JOIN #tbl_ProjectProduct AS PP ON PP.fk_Project = M.ID
INNER JOIN #tbl_Products AS P ON PP.fk_Product = P.ID
GROUP BY P.ID,MONTH(B.Date)
ORDER BY P.ID, MONTH(B.Date)
This gives me the wrong result, because it Counts the hours for both products:
| fk_Product | Month | SumHours |
|-------------------------------|
| 1 | 1 | 90 |
| 1 | 2 | 40 |
| 1 | 3 | 50 |
| 2 | 1 | 60 |
| 2 | 2 | 30 |
| 2 | 3 | 30 |
| 3 | 1 | 30 |
| 3 | 3 | 30 |
| 4 | 1 | 30 |
| 4 | 3 | 30 |
|-------------------------------|
Consider the following query. I modified your table variables to temp tables so it was easier to debug.
;WITH CTE AS
(
SELECT fk_Project, count(fk_Product) CNT
FROM #tbl_ProjectProduct
GROUP BY fk_Project
)
,CTE2 AS
(
SELECT t1.Date, t2.fk_Project, Hours/CNT NewHours
FROM #tbl_BookedHours t1
INNER JOIN CTE t2 on t1.fk_Project = t2.fk_Project
)
SELECT t4.ID fk_Product, MONTH(date) MN, SUM(NewHours) HRS
FROM CTE2 t1
INNER JOIN #tbl_Projects t2 on t1.fk_Project = t2.id
INNER JOIN #tbl_ProjectProduct t3 on t3.fk_Project = t2.ID
INNER JOIN #tbl_Products t4 on t4.ID = t3.fk_Product
GROUP BY t4.ID,MONTH(date)

Sql server- joining with a temp table does not return the right values

I am not sure why my query isn't returning the right values when I join a temp table with an actual table. I am using SQL Server 2008 as my database.
I have the below query
WITH TradeHistory AS (
SELECT
tran_num,
version_number,
row_creation AS last_update,
tran_status,
ROW_NUMBER() OVER(PARTITION BY tran_num ORDER BY tran_num DESC, row_creation DESC, version_number DESC) AS RowNum
FROM tran_history_view (NOLOCK)
where
row_creation BETWEEN '2018-01-22 17:02:47.083' AND '2019-01-23 19:02:47.083'
AND tran_status IN (3,5)
AND update_type NOT IN (15,52,24)
AND deal_tracking_num= 10738416
)
select ab.tran_num as ab_tran_num, ab.version_number as ab_version_num, hist.version_number as hist_version_num, ab.deal_tracking_num as ab_track_num,
ab.tran_status as ab_tran_status, hist.tran_status as hist_tran_status from
TradeHistory hist join ab_tran ab on ab.tran_num = hist.tran_num
and hist.RowNum =1
I am not sure why my join does not pick the matching row for tran_num and version_num? I get a different version_number and tran_status from tran_history table. If I join on version_number across the two tables, I don't get any rows (as expected based on the values in the final output).
My 'TradeHistory' temp table returns the below values
+----------+----------------+-------------------------+-------------+--------+
| tran_num | version_number | last_update | tran_status | RowNum |
+----------+----------------+-------------------------+-------------+--------+
| 10738416 | 2 | 2019-01-23 16:02:09.760 | 3 | 1 |
| 10738416 | 1 | 2019-01-23 16:01:51.803 | 3 | 2 |
| 10738422 | 4 | 2019-01-23 16:02:30.600 | 3 | 1 |
| 10738422 | 3 | 2019-01-23 16:02:30.243 | 3 | 2 |
| 10738422 | 2 | 2019-01-23 16:02:09.973 | 3 | 3 |
+----------+----------------+-------------------------+-------------+--------+
Result of output joining the select query with the temp table is as below. The version_number and tran_number from the two tables are different and I don't understand why? Can someone please explain?
+---------------+-------------+----------------+--------------+--------------+----------------+------------------+
| hist_tran_num | ab_tran_num | ab_version_num | hist_ver_num | ab_track_num | ab_tran_status | hist_tran_status |
+---------------+-------------+----------------+--------------+--------------+----------------+------------------+
| 10738416 | 10738416 | 4 | 2 | 10738416 | 10 | 3 |
| 10738422 | 10738422 | 6 | 4 | 10738416 | 10 | 3 |
+---------------+-------------+----------------+--------------+--------------+----------------+------------------+

Return one row when multiples are returned

Firstly apologies for this as a number of similar posts have been posted but I just can't seem to return what I would like
My data returns
desc | date | taken | result | text | notes | page | group | q | answer | value | state | time |
------------------------------------------------------------------------------
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Asess1 | 20170101 | John | 5 | Injury | xxx | Page1 | Assess11 | 1 | 1234567 | 1 | 1 | 0 |
Code as follows
select t.desc,a.date,a.taken,a.result,a.text,a.notes,d.page,d.group,d.q,d.answer,d.value,d.state,d.timeSpanSeconds
from cc_clientAssessments a
inner join cs_assessmentData d on a.assessmentId=d.assessment
inner join cs_clients c on c.person=a.residentId
inner join cs_facilities f on f.guid=a.facilityId
inner join cs_assessmentTypes t on t.assessmentTypeId=a.assessmentTypeId
where c.surname='smith'
and f.name='home'
and t.description ='injury'
and a.dateTaken='2017-05-28 00:00:00.000'
and d.questionName='1'
and d.answer='1234567'
order by t.desc, a.date desc,d.page,d.group,d.q
any help would be great.
One (or more) of your joins is causing this duplication, because you have not been specific enough in your join criteria.
As other commenters have said, remove all your joins and then add them back in one by one until you start to see duplicates. Using select * you can see what additional data is being pulled back and therefore what additional filters you need to include in that join. Once you have no duplication, add in the next join and repeat the whole process.
This is the sensible way to resolve this as nine times out of ten you can stop this duplication with more specific join criteria, which will also ensure your query is processing less data and will therefore be more efficient.
Although the elegant solution is to fix your joins so they don't result in as many rows, a very quick fix would be to just use distinct to eliminate duplicates and convert your text field to an string, so it can be compared.
select distinct t.desc,a.date,a.taken,a.result,substring(a.text,1,512) as text,a.notes,d.page,d.group,d.q,d.answer,d.value,d.state,d.timeSpanSeconds
from cc_clientAssessments a
inner join cs_assessmentData d on a.assessmentId=d.assessment
inner join cs_clients c on c.person=a.residentId
inner join cs_facilities f on f.guid=a.facilityId
inner join cs_assessmentTypes t on t.assessmentTypeId=a.assessmentTypeId
where c.surname='smith'
and f.name='home'
and t.description ='injury'
and a.dateTaken='2017-05-28 00:00:00.000'
and d.questionName='1'
and d.answer='1234567'
order by t.desc, a.date desc,d.page,d.group,d.q

Finding the max and min date values in SQL Server tables

I have two tables:
A lookup table (tabOne):
KEY | Group | Name | Desc | Val_Key
----------------------------------------
1 | a | NameA | DescA | 10
2 | b | NameB | DescB | 20
3 | c | NameC | DescC | 30
4 | d | NameD | DescD | 40
5 | e | NameE | DescE | 50
6 | f | NameF | DescF | 60
A second table containing readings (tabTwo):
KEY | Date | Reading | Val_Key
----------------------------------------
1 | Date | Read | 10
2 | Date | Read | 20
3 | Date | Read | 40
4 | Date | Read | 40
5 | Date | Read | 30
6 | Date | Read | 20
7 | Date | Read | 40
8 | Date | Read | 20
9 | Date | Read | 10
10 | Date | Read | 20
11 | Date | Read | 50
12 | Date | Read | 60
What I need to do is join tabTwo with TabOne and create a column with the newest Reading and a column with the oldest reading for each item in the group column of TabOne.
At the end of the day I want a table that look as follow:
KEY | Group | Name | Desc | Val_Key | LastReading | FirstReading |
-------------------------------------------------------------------------
1 | a | NameA | DescA | 10 | | |
2 | b | NameB | DescB | 20 | | |
3 | c | NameC | DescC | 30 | | |
4 | d | NameD | DescD | 40 | | |
5 | e | NameE | DescE | 50 | | |
6 | f | NameF | DescF | 60 | | |
Thanks!
Freddie
If this is Sql Server 2005 or newer, outer apply will help:
select TabOne.*,
last.Reading LastReading,
first.Reading FirstReading
from TabOne
outer apply
(
select top 1
Reading
from TabTwo
where TabTwo.Val_Key = TabOne.val_Key
order by TabTwo.Date desc
) last
outer apply
(
select top 1
Reading
from TabTwo
where TabTwo.Val_Key = TabOne.val_Key
order by TabTwo.Date asc
) first
Live test is # Sql Fiddle.
#Nikola Markovinović's solution can be made more universally applicable if the subqueries are moved directly to the main query's SELECT clause, which is possible each of them retrieves only one value and is, therefore, valid as a scalar expression:
SELECT
t1.[KEY],
t1.[Group],
t1.Name,
t1.[Desc],
t1.Val_Key,
(
SELECT TOP 1 Reading
FROM TabTwo
WHERE Val_Key = t1.Val_Key
ORDER BY Date DESC
) AS LastReading,
(
SELECT TOP 1 Reading
FROM TabTwo
WHERE Val_Key = t1.Val_Key
ORDER BY Date ASC
) AS FirstReading
FROM TabOne t1
If you needed e.g. dates along the way, you would probably have to stick to Nikola's solution. There is an alternative to it, but it's more cumbersome (albeit more standard too): it would involve grouping TabTwo's data by Val_Key to get earliest/latest dates per Val_Key, then joining back to TabTwo to access entire rows corresponding to the found dates to finally pull the necessary columns, and ultimately joining both result sets to TabOne to get the final column set.

Resources