Optimize multiple joins with table functions

I would like to join to the same table function several times, with different input variables, in the same query. In my case, this turns out to be much slower than selecting from the table function separately into table variables and joining those.
How can I avoid table variables and still have a fast query?
For example, we have a SQL query like
SELECT P.ProjectName, A.Number, B.Number
FROM Project AS P
LEFT JOIN dbo.fn_ProjectNumber(@dateA) AS A
ON P.ProjectID = A.ProjectID
LEFT JOIN dbo.fn_ProjectNumber(@dateB) AS B
ON P.ProjectID = B.ProjectID
but it is much slower than selecting from the function separately into table variables and then joining later, for example:
INSERT INTO @tempA
SELECT P.ProjectID, A.Number
FROM Project AS P
LEFT JOIN dbo.fn_ProjectNumber(@dateA) AS A
ON P.ProjectID = A.ProjectID
INSERT INTO @tempB
SELECT P.ProjectID, B.Number
FROM Project AS P
LEFT JOIN dbo.fn_ProjectNumber(@dateB) AS B
ON P.ProjectID = B.ProjectID
SELECT P.ProjectName, A.Number, B.Number
FROM Project AS P
LEFT JOIN @tempA AS A
ON P.ProjectID = A.ProjectID
LEFT JOIN @tempB AS B
ON P.ProjectID = B.ProjectID
What could be the cause of this? Is there a way I can get a fast query and avoid table variables?
More details:
This is only an example similar to what I'm doing, but the function fn_ProjectNumber(@date datetime) would contain something like joins between four tables...

Try fixing the join; you refer to the wrong alias in the second LEFT JOIN:
ORIGINAL:
SELECT P.ProjectName, A.Number, B.Number
FROM Project AS P
LEFT JOIN dbo.fn_ProjectNumber(@dateA) AS A
ON P.ProjectID = A.ProjectID
LEFT JOIN dbo.fn_ProjectNumber(@dateB) AS B
ON P.ProjectID = A.ProjectID
FIXED:
SELECT P.ProjectName, A.Number, B.Number
FROM Project AS P
LEFT JOIN dbo.fn_ProjectNumber(@dateA) AS A
ON P.ProjectID = A.ProjectID
LEFT JOIN dbo.fn_ProjectNumber(@dateB) AS B
ON P.ProjectID = B.ProjectID --<<<<<
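
To see what the wrong alias does to the row count, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for SQL Server. The `numbers` table is a hypothetical substitute for the output of fn_ProjectNumber, and all data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, project_name TEXT);
    -- Hypothetical stand-in for the rows returned by fn_ProjectNumber(@date)
    CREATE TABLE numbers (project_id INTEGER, as_of TEXT, number INTEGER);
    INSERT INTO project VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
    INSERT INTO numbers VALUES
        (1, 'A', 10), (2, 'A', 20), (3, 'A', 30),
        (1, 'B', 11), (2, 'B', 21), (3, 'B', 31);
""")

# The only difference between the two runs is the alias in the last ON clause.
QUERY = """
    SELECT p.project_name, a.number, b.number
    FROM project AS p
    LEFT JOIN (SELECT * FROM numbers WHERE as_of = 'A') AS a
           ON p.project_id = a.project_id
    LEFT JOIN (SELECT * FROM numbers WHERE as_of = 'B') AS b
           ON p.project_id = {alias}.project_id
"""

wrong_rows = conn.execute(QUERY.format(alias="a")).fetchall()
fixed_rows = conn.execute(QUERY.format(alias="b")).fetchall()

print(len(wrong_rows))  # condition never restricts b, so every b row pairs with every project
print(len(fixed_rows))  # one row per project
```

With the wrong alias the second ON condition does not constrain B at all, so the join degenerates into a cross product, which is both incorrect and slow on real data.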

Is there any particular reason you're trying to avoid table variables? They can be a good optimisation technique and don't leave any temp objects to clean up.
Anyway, if you don't want to do it that way you could always try
SELECT
P.ProjectID, A.Number, B.Number
FROM
Project AS P
LEFT JOIN
(SELECT P.ProjectID, A.Number
FROM Project AS P
LEFT JOIN dbo.fn_ProjectNumber(@dateA) AS A
ON P.ProjectID = A.ProjectID
) AS A
ON P.ProjectID = A.ProjectID
LEFT JOIN
(SELECT P.ProjectID, B.Number
FROM Project AS P
LEFT JOIN dbo.fn_ProjectNumber(@dateB) AS B
ON P.ProjectID = B.ProjectID
) AS B
ON P.ProjectID = B.ProjectID

The joins are slow because, in your example query, you are calling each function once for each row in table Project. It's faster with the temp tables because you're only calling the function once.
One way to avoid temp tables would be to use CTEs (common table expressions; they're not just for recursion, and they are available in SQL Server 2005 and up). The general syntax would be something like:
WITH cteTempName (<list of columns>)
as (<your table function call>)
SELECT <your query here, with "cteTempName" appearing as just another table to select from>
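
As a rough illustration of that shape, the sketch below runs a two-CTE query in Python against an in-memory SQLite database. SQLite has no equivalent of the table function here, so the hypothetical `project_number` table filtered by date stands in for fn_ProjectNumber(@date); the names `cte_a`, `cte_b`, and the sample data are assumptions. It shows the query structure only. A CTE is not guaranteed to be materialized, so whether this is faster in SQL Server should be checked against the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, project_name TEXT);
    -- Hypothetical stand-in for what fn_ProjectNumber would compute per date
    CREATE TABLE project_number (project_id INTEGER, as_of TEXT, number INTEGER);
    INSERT INTO project VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO project_number VALUES
        (1, '2020-01-01', 5), (1, '2020-02-01', 7), (2, '2020-02-01', 9);
""")

# One CTE per input date; each then appears as just another table in the joins.
rows = conn.execute("""
    WITH cte_a AS (SELECT project_id, number FROM project_number WHERE as_of = :date_a),
         cte_b AS (SELECT project_id, number FROM project_number WHERE as_of = :date_b)
    SELECT p.project_name, a.number, b.number
    FROM project AS p
    LEFT JOIN cte_a AS a ON p.project_id = a.project_id
    LEFT JOIN cte_b AS b ON p.project_id = b.project_id
    ORDER BY p.project_id
""", {"date_a": "2020-01-01", "date_b": "2020-02-01"}).fetchall()

print(rows)
```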

Maybe the joins are slower because you haven't defined relations between the tables that are joined together?
I don't know much about performance of queries in SQL Server, but defining the relations will improve the performance of joins.

Related

SQL to find which table has diag_code filled, and then look it up in lookup_table

I have a query where a diag_code is in either one table (UM_SERVICE) or the other (LOS), but I can't think of a way to join both tables and get the diag_code that isn't null. Does this look OK for finding whether diag_code is in one of the tables and in the lookup table? It's possible for both LOS and UM_SERVICE to have a diag code on different rows; they could be different, and both or only one could be in the lookup table. I'm not finding anything in an internet search.
Here's a simplified stored procedure:
SELECT distinct
c.id
,uc.id
,c.person_id
FROM dbo.CASES c
INNER JOIN dbo.UM_CASE uc with (NOLOCK) ON uc.case_id = c.id
LEFT JOIN dbo.UM_SERVICE sv WITH (NOLOCK) ON sv.case_id = uc.case_id
LEFT JOIN dbo.UM_SERVICE_CERT usc on usc.service_id = sv.id
LEFT JOIN dbo.LOS S WITH (NOLOCK) ON S.case_id = UC.case_id
LEFT JOIN dbo.LOS_EXTENSION SC WITH (NOLOCK) ON SC.los_id = S.id
INNER JOIN dbo.PERSON op with (NOLOCK) on op.id = c.Person_id
WHERE
(sv.diag_code is not null and c.id = sv.case_id
or
s.diag_code is not null and c.id = s.case_id)
and
(sv.diag_code is not null and sv.diag_code in (select diag_code from TABLE_LOOKUP)
or
s.diag_code is not null and s.diag_code in (select diag_code from TABLE_LOOKUP))
Table setups like this:
CASES
id person_id
UM_CASE
case_id
LOS
case_id id
LOS_EXTENSION
los_id
Person
id cid
UM_SERVICE
case_id diag_code
UM_SERVICE_CERT
service_id id
TABLE_LOOKUP
diag_code

Since you have two different searches being run, it is going to be much easier to write and read if you write the searches individually and then bring the two result sets together using the UNION operator. UNION eliminates duplicates across the two result sets, in a similar manner to what SELECT DISTINCT does for a single result set.
Like so:
/*first part of union performs search using filter on dbo.UM_SERVICE*/
SELECT
c.id
,uc.id
,c.person_id
FROM
dbo.CASES AS c
INNER JOIN dbo.UM_CASE AS uc ON uc.case_id=c.id
LEFT JOIN dbo.UM_SERVICE AS sv ON sv.case_id = uc.case_id
LEFT JOIN dbo.UM_SERVICE_CERT AS usc on usc.service_id=sv.id
LEFT JOIN dbo.LOS AS S ON S.case_id = UC.case_id
LEFT JOIN dbo.LOS_EXTENSION AS SC ON SC.los_id= S.id
INNER JOIN dbo.PERSON AS op on op.id=c.Person_id
WHERE
sv.diag_code in (select diag_code from TABLE_LOOKUP) /*will eliminate null values in sv.diag_code*/
UNION /*deduplicate result sets*/
/*second part of union performs search using filter on dbo.LOS*/
SELECT
c.id
,uc.id
,c.person_id
FROM
dbo.CASES AS c
INNER JOIN dbo.UM_CASE AS uc ON uc.case_id=c.id
LEFT JOIN dbo.UM_SERVICE AS sv ON sv.case_id = uc.case_id
LEFT JOIN dbo.UM_SERVICE_CERT AS usc on usc.service_id=sv.id
LEFT JOIN dbo.LOS AS S ON S.case_id = UC.case_id
LEFT JOIN dbo.LOS_EXTENSION AS SC ON SC.los_id= S.id
INNER JOIN dbo.PERSON AS op on op.id=c.Person_id
WHERE
s.diag_code in (select diag_code from TABLE_LOOKUP); /*will eliminate null values in s.diag_code*/
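
The behaviour of the UNION approach can be checked on a tiny data set. The following is a minimal sketch in Python with an in-memory SQLite database, not SQL Server; the schema is reduced to the columns that matter and the sample rows are invented. Plain inner joins are used because the WHERE filter discards unmatched rows anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (id INTEGER PRIMARY KEY, person_id INTEGER);
    CREATE TABLE um_service (case_id INTEGER, diag_code TEXT);
    CREATE TABLE los (case_id INTEGER, diag_code TEXT);
    CREATE TABLE table_lookup (diag_code TEXT);
    INSERT INTO cases VALUES (1, 100), (2, 200), (3, 300), (4, 400);
    -- case 1 has codes in both tables, case 2 only in los,
    -- case 3 has a code that is not in the lookup, case 4 has no code
    INSERT INTO um_service VALUES (1, 'D1'), (2, NULL), (3, 'ZZ');
    INSERT INTO los VALUES (1, 'D2'), (2, 'D2'), (4, NULL);
    INSERT INTO table_lookup VALUES ('D1'), ('D2');
""")

rows = conn.execute("""
    SELECT c.id, c.person_id
    FROM cases AS c
    JOIN um_service AS sv ON sv.case_id = c.id
    WHERE sv.diag_code IN (SELECT diag_code FROM table_lookup)  -- NULLs never match
    UNION  -- removes the duplicate for case 1
    SELECT c.id, c.person_id
    FROM cases AS c
    JOIN los AS s ON s.case_id = c.id
    WHERE s.diag_code IN (SELECT diag_code FROM table_lookup)
    ORDER BY 1
""").fetchall()

print(rows)
```

Case 1 qualifies through both branches but appears once, and cases whose only codes are NULL or absent from the lookup table are left out.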

Multiple Nested Inner Joins: not all records are shown

I have difficulty joining tables that look like the following.
The main table, PMEOBJECT, has a unique key named OBJECTID and 12768 rows in total.
I want to join PMEOBJECTVALIDITY to it. This table has an n:1 relationship with PMEOBJECT because it stores the changes to PMEOBJECT over time (i.e. when a certain object is not valid anymore). It has 12789 rows, meaning only 21 objects changed over time. However, I only want the latest VALIDFROM date shown in the query. This all works fine.
The trouble starts when I want to join PMEOBJECTDIMENSION, which has an n:1 relationship with PMEOBJECTVALIDITY and has 36737 rows in total.
SELECT
PMEOBJECT.OBJECTID
,PMEOBJECTVALIDITY.VALIDFROM
,PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECT
LEFT JOIN PMEOBJECTVALIDITY
ON PMEOBJECTVALIDITY.OBJECTID = PMEOBJECT.OBJECTID
AND PMEOBJECTVALIDITY.DATAAREAID = PMEOBJECT.DATAAREAID
INNER JOIN(
SELECT
OBJECTID,
MAX(VALIDFROM) AS NEWFROMDATE,
MAX(VALIDTO) AS NEWTODATE
FROM PMEOBJECTVALIDITY B
GROUP BY OBJECTID
) B
ON PMEOBJECTVALIDITY.OBJECTID = B.OBJECTID
AND PMEOBJECTVALIDITY.VALIDFROM = B.NEWFROMDATE
LEFT JOIN PMEOBJECTDIMENSION
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND PMEOBJECTDIMENSION.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID
INNER JOIN(
SELECT
OBJECTVALIDITYID,
MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
Results in query per step:
SELECT PMEOBJECT: 12768 rows
LEFT JOIN PMEVALIDITY: 12789 rows
INNER JOIN PMEVALIDITY: 12768 rows
LEFT JOIN PMEOBJECTDIMENSION: 36737 rows
INNER JOIN PMEOBJECTDIMENSION: 12729 rows
I want the end result to have the same 12768 rows again; I don't want any OBJECTID to be left out.
What am I missing here?
Kind regards,
Igor

The following might help. Replace everything from the PMEOBJECTDIMENSION join onwards with:
LEFT JOIN (SELECT PMEOBJECTDIMENSION.OBJECTVALIDITYID, PMEOBJECTDIMENSION.DATAAREAID, PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECTDIMENSION
INNER JOIN(SELECT OBJECTVALIDITYID, MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
)X
ON X.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND X.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID
The outer SELECT list then needs X.DIMENSION2_ instead of PMEOBJECTDIMENSION.DIMENSION2_. Select distinct records if duplicates are present.

The INNER JOINs are filtering out records. What you want is for the LEFT JOINed tables (PMEOBJECTVALIDITY and PMEOBJECTDIMENSION) to include only records that have a match in the INNER JOIN subqueries (aliases B and C). You can accomplish this by nesting the INNER JOIN inside the LEFT JOIN, generally done as follows:
SELECT *
FROM A
LEFT JOIN B
INNER JOIN C
ON B.ID = C.BID
ON A.ID = B.AID
Now B is INNER JOINed to C and will only contain records that have a match in C, but the LEFT JOIN is preserved and no records are removed from A.
In your case, you can simply move the ON clause from the LEFT JOIN to the end of the following INNER JOIN.
SELECT
PMEOBJECT.OBJECTID
,PMEOBJECTVALIDITY.VALIDFROM
,PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECT
LEFT JOIN PMEOBJECTVALIDITY
INNER JOIN(
SELECT
OBJECTID,
MAX(VALIDFROM) AS NEWFROMDATE,
MAX(VALIDTO) AS NEWTODATE
FROM PMEOBJECTVALIDITY B
GROUP BY OBJECTID
) B
ON PMEOBJECTVALIDITY.OBJECTID = B.OBJECTID
AND PMEOBJECTVALIDITY.VALIDFROM = B.NEWFROMDATE
ON PMEOBJECTVALIDITY.OBJECTID = PMEOBJECT.OBJECTID
AND PMEOBJECTVALIDITY.DATAAREAID = PMEOBJECT.DATAAREAID --here it is!
LEFT JOIN PMEOBJECTDIMENSION
INNER JOIN(
SELECT
OBJECTVALIDITYID,
MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND PMEOBJECTDIMENSION.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID --I'm here
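
The effect can be reproduced on a reduced, hypothetical schema. The sketch below uses Python with an in-memory SQLite database rather than SQL Server, and it expresses the nesting as a derived table (alias `x`) instead of the deferred ON clause shown above, which is an equivalent way to apply the INNER JOIN before the LEFT JOIN. Table names, columns, and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obj (object_id INTEGER PRIMARY KEY);
    CREATE TABLE validity (rec_id INTEGER PRIMARY KEY, object_id INTEGER, valid_from TEXT);
    INSERT INTO obj VALUES (1), (2), (3);
    -- Object 1 has two validity rows, object 2 has one, object 3 has none.
    INSERT INTO validity VALUES
        (10, 1, '2019-01-01'), (11, 1, '2020-01-01'), (20, 2, '2018-06-01');
""")

# Latest valid_from per object, as in the question's subquery B.
LATEST = "SELECT object_id, MAX(valid_from) AS new_from FROM validity GROUP BY object_id"

# Chained: the INNER JOIN runs after the LEFT JOIN and discards its NULL rows.
chained = conn.execute(f"""
    SELECT o.object_id, v.valid_from
    FROM obj AS o
    LEFT JOIN validity AS v ON v.object_id = o.object_id
    INNER JOIN ({LATEST}) AS b
            ON b.object_id = v.object_id AND b.new_from = v.valid_from
    ORDER BY o.object_id
""").fetchall()

# Nested: the INNER JOIN filters validity first, then the LEFT JOIN keeps every object.
nested = conn.execute(f"""
    SELECT o.object_id, x.valid_from
    FROM obj AS o
    LEFT JOIN (
        SELECT v.object_id, v.valid_from
        FROM validity AS v
        INNER JOIN ({LATEST}) AS b
                ON b.object_id = v.object_id AND b.new_from = v.valid_from
    ) AS x ON x.object_id = o.object_id
    ORDER BY o.object_id
""").fetchall()

print(chained)
print(nested)
```

In the chained version object 3 disappears; in the nested version it is kept with a NULL date, which is the behaviour the question asks for.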

join all at once vs creating temp tables

I have two queries:
1)
select a.*
,b.*
,c.*
into final_table
from table_a as a
left join table_b as b
on a.id_1 = b.id_1
left join table_c as c
on a.id_2 = c.id_2;
2)
select a.*
,b.*
into #temp_1
from table_a as a
left join table_b as b
on a.id_1 = b.id_1;
select t.*
,c.*
into final_table
from #temp_1 as t
left join table_c as c
on t.id_2 = c.id_2;
I observe that the second query runs significantly faster on SQL Server 2012. Is it by accident, or is there a good reason for it?

Ambiguous left joins in MS Access

I want to convert the following query from T-SQL
SELECT
*
FROM
A LEFT JOIN
B ON A.field1 = B.field1 LEFT JOIN
C ON C.field1 = A.field2 AND
C.field2 = B.field2
to Jet SQL. MS Access does not accept ambiguous joins like this. How can I do that? I can't put the second comparison in the WHERE clause, because my scenario is that I am selecting records that do not exist in C:
How to select all records from one table that do not exist in another table?
Now, how do you do that in MS Access? Thanks in advance for your time and expertise.

You need a derived table to make this work in MS Access:
SELECT *
FROM (
SELECT A.Field1, A.Field2 As A2, B.Field2
FROM A
LEFT JOIN B ON A.field1 = B.field1) AS x
LEFT JOIN C ON x.A2 = C.field1 AND x.field2= C.field2
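
The derived-table shape, combined with the "rows that do not exist in C" filter from the question, can be tried out as follows. This is only a sketch in Python with an in-memory SQLite database; Jet SQL may need different bracketing, and the alias `B2` plus the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (field1 INTEGER, field2 INTEGER);
    CREATE TABLE B (field1 INTEGER, field2 INTEGER);
    CREATE TABLE C (field1 INTEGER, field2 INTEGER);
    INSERT INTO A VALUES (1, 100), (2, 200);
    INSERT INTO B VALUES (1, 7), (2, 8);
    INSERT INTO C VALUES (100, 7);  -- matches only the first A/B pair
""")

# A and B are joined inside the derived table x, so the join to C
# compares against a single source and is no longer ambiguous.
missing = conn.execute("""
    SELECT x.field1, x.A2, x.B2
    FROM (
        SELECT A.field1, A.field2 AS A2, B.field2 AS B2
        FROM A
        LEFT JOIN B ON A.field1 = B.field1
    ) AS x
    LEFT JOIN C ON x.A2 = C.field1 AND x.B2 = C.field2
    WHERE C.field1 IS NULL  -- keep only rows with no partner in C
""").fetchall()

print(missing)
```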

From the Help topic "LEFT JOIN, RIGHT JOIN Operations":
You can link multiple ON clauses. See the discussion of clause linking
in the INNER JOIN topic to see how this is done.
You can also link several ON clauses in a JOIN statement, using the
following syntax:
SELECT fields
FROM table1
INNER JOIN table2 ON table1.field1 compopr table2.field1
AND ON table1.field2 compopr table2.field2)
OR ON table1.field3 compopr table2.field3)];
But this works (it seems there is an error in the Help):
SELECT *
FROM A
LEFT JOIN B ON A.field1 = B.field1
LEFT JOIN C ON (C.field1 = A.field2 AND C.field2 = B.field2)

Cascade of outer joins

As a schematic example, I have 3 tables that I desire to join, A,B,C where A to B is joined via an outer join and B to C is potentially joined via an inner join. In this constellation, I have to write two outer joins to get data if the first join does not have a match A-B in a line:
SELECT [fields] FROM
A
LEFT OUTER JOIN
B ON [a.field]=[b.field]
LEFT OUTER JOIN
C ON [b.field]=[c.field]
It seems logical to me that I have to write the second join as an outer join. However, I'm curious whether there is a possibility to set brackets for the join scope, to signal that the second join should only be used if the first join has found matching data for A-B. Something like:
SELECT [fields] FROM
A
(LEFT OUTER JOIN
B ON [a.field]=[b.field]
INNER JOIN
C ON [b.field]=[c.field]
)
I have played around a little but have not found a possibility to set brackets. The only way I have found to make this work is with a sub-query. Is this the only way to go?

Actually there is a syntax for that case.
SELECT fields
FROM A
LEFT OUTER JOIN (
B INNER JOIN C ON b.field = c.field
) ON a.field = b.field
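The parentheses are optional, and the result is the same without them. It is not quite equivalent to the chained form below, though: there, a B row without a match in C is still returned (with NULLs for C), whereas the nested form drops that B row, so A is padded with NULLs for both B and C if it has no other match: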
SELECT fields
FROM A
LEFT OUTER JOIN B ON a.field = b.field
LEFT OUTER JOIN C ON b.field = c.field
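
A quick way to compare the nested form with two chained outer joins is to run both on a tiny data set. The sketch below does so in Python with an in-memory SQLite database; the nested join is written as a derived table (alias `bc`), and the column names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER, b_key INTEGER);
    CREATE TABLE c (b_key INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10), (2, 20);
    INSERT INTO c VALUES (10);  -- only b_key 10 has a partner in c
""")

# Nested: b is inner-joined to c first, then outer-joined to a.
nested = conn.execute("""
    SELECT a.id, bc.b_key, bc.c_key
    FROM a
    LEFT OUTER JOIN (
        SELECT b.a_id, b.b_key, c.b_key AS c_key
        FROM b INNER JOIN c ON b.b_key = c.b_key
    ) AS bc ON a.id = bc.a_id
    ORDER BY a.id
""").fetchall()

# Chained: two independent outer joins.
chained = conn.execute("""
    SELECT a.id, b.b_key, c.b_key
    FROM a
    LEFT OUTER JOIN b ON a.id = b.a_id
    LEFT OUTER JOIN c ON b.b_key = c.b_key
    ORDER BY a.id
""").fetchall()

print(nested)
print(chained)
```

For the second row of `a`, the nested form returns NULL for the B column because its B row has no partner in C, while the chained form still returns the B value.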

You could perform it as follows
SELECT Fields
FROM TableA a
LEFT OUTER JOIN (SELECT Fields
FROM TableB b
INNER JOIN TableC c ON b.Field = c.Field) x on a.Field = x.Field
Not too sure if there would be any performance benefit to this without testing it out though.

The 2nd way would be with a subquery as per Jon Bridges' answer
However, they are the same semantically.
A CTE could be used if you have a complex subquery
;WITH BjoinC AS
(
SELECT Fields
FROM TableB b
INNER JOIN
TableC c ON b.Field = c.Field
)
SELECT [fields] FROM
A
LEFT OUTER JOIN
BjoinC ON ...

What about something like:
SELECT [fields]
FROM A
LEFT JOIN ( SELECT DISTINCT [fields]
FROM B
LEFT JOIN C ON b.field = c.field
) AS x ON a.field = x.field
