SQL Server conditional join techniques - sql-server

I have repeatedly run into situations where I need to pull data from one table or another depending on a parameter passed to a stored procedure. For example, suppose we need to read from either an archived table or an online table and then join the result to a number of other tables. I can think of three ways to accomplish this:
1. Use an IF condition to store the result in a temp table, then join the temp table to the other tables.
2. Use an IF condition to read from either the archive table or the online table and join the other tables. The entire query is duplicated except for the archive/online table reference.
3. Use a UNION subquery.
Query for Approach 1
create table #archivedOrOnline (Id int);
declare @archivedData as bit = 1;
if (@archivedData = 1)
begin
insert into #archivedOrOnline
select
at.Id
from
dbo.ArchivedTable at
end
else
begin
insert into #archivedOrOnline
select
ot.Id
from
dbo.OnlineTable ot
end
select
*
from
#archivedOrOnline ao
inner join dbo.AnotherTable at on ao.Id = at.Id;
-- Lots more joins and subqueries irrespective of @archivedData
Query for Approach 2
declare @archivedData as bit = 1;
if (@archivedData = 1)
begin
select
*
from
dbo.ArchivedTable at
inner join dbo.AnotherTable another on at.Id = another.Id
-- Lots more joins and subqueries irrespective of @archivedData
end
else
begin
select
*
from
dbo.OnlineTable ot
inner join dbo.AnotherTable at on ot.Id = at.Id
-- Lots more joins and subqueries irrespective of @archivedData
end
Query for Approach 3
declare @archivedData as bit = 1;
select
*
from
(
select
ot.Id
from
dbo.OnlineTable ot
where
@archivedData = 0
union
select
at.Id
from
dbo.ArchivedTable at
where
@archivedData = 1
) archiveOrOnline
inner join dbo.AnotherTable at on at.Id = archiveOrOnline.Id;
-- Lots more joins and subqueries irrespective of @archivedData
Basically, I am asking which approach to choose, or whether there is a better one. Approach 2 involves a lot of duplicated code; the other two approaches avoid it. I have the query plans, but my ability to make decisions based on a query plan is limited. I usually go with the approach that removes code duplication and only switch to another if there is a performance issue.

Your approach 3 can work fine. You should definitely use UNION ALL rather than UNION, though, so that SQL Server does not add operations to remove duplicates from the tables.
For the best chance of success with approach 3, add an OPTION (RECOMPILE) hint so that SQL Server simplifies out the unneeded table reference at compile time, at the expense of recompiling the statement on each execution.
If the query is executed too frequently to make that attractive, you may still get an acceptable plan without the hint: filters with startup predicates can ensure that only the relevant table is accessed at run time. However, this more generic plan may suffer from poor cardinality estimates, may limit the optimisations available, and may end up worse than the plan for option 2.
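A minimal sketch of approach 3 with both suggestions applied. The table names are taken from the question; the column list, the aliases and the single extra join are placeholders for the real query, and @archivedData would normally be a stored procedure parameter.

```sql
-- Sketch only: table names come from the question; everything else is a placeholder.
DECLARE @archivedData bit = 1;

SELECT archiveOrOnline.Id
FROM (
    SELECT ot.Id
    FROM dbo.OnlineTable AS ot
    WHERE @archivedData = 0
    UNION ALL                          -- no duplicate-removal step, unlike UNION
    SELECT art.Id
    FROM dbo.ArchivedTable AS art
    WHERE @archivedData = 1
) AS archiveOrOnline
INNER JOIN dbo.AnotherTable AS another
    ON another.Id = archiveOrOnline.Id
-- The hint compiles the statement on every execution, so the optimizer can
-- treat @archivedData as a known value and drop the branch that cannot return rows.
OPTION (RECOMPILE);
```

Comparing the actual execution plans with and without the hint should show whether the unused table is still referenced.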

If you don't mind extra unused columns in your results, you can represent such "IF"s with additional join conditions.
SELECT stuff
FROM MainTable AS m
LEFT JOIN ArchiveTable AS a ON @archivedData = 1 AND m.id = a.id
LEFT JOIN OnlineTable AS o ON @archivedData <> 1 AND m.id = o.id
;
If the Archive and Online tables have the same fields, you can even avoid extra result fields with select expressions like COALESCE(a.field1, o.field1) AS field1
If later joins depend on values from ArchiveTable or OnlineTable, this can be simplified by performing these core joins in a subquery (at least some COALESCEs will still be necessary, though):
SELECT stuff
FROM (
SELECT m.stuff, a.stuff, o.stuff
, COALESCE(a.field1, o.field1) AS xValue
, COALESCE(a.field2, o.field2) AS yValue
, COALESCE(a.field3, o.field3) AS zValue
FROM MainTable AS m
LEFT JOIN ArchiveTable AS a ON @archivedData = 1 AND m.id = a.id
LEFT JOIN OnlineTable AS o ON @archivedData <> 1 AND m.id = o.id
) AS coreQuery
INNER JOIN xTable AS x ON x.something = coreQuery.xValue
INNER JOIN yTable AS y ON y.something = coreQuery.yValue
INNER JOIN zTable AS z ON z.something = coreQuery.zValue
;
If there are criteria narrowing down the MainTable rows to be used, the WHERE clause for them should go inside the subquery to minimize the amount of Archive/Online data carried out of it.
If the Archive/Online table is actually the "main" table, the question's option 3 should work, but I would suggest putting any filtering criteria relevant to those tables in their UNIONed subqueries.
If there are no filtering criteria on whatever table is "main", I would consider just maintaining two queries (or building one dynamically) so that the subqueries these approaches require are not needed and cannot interfere with index use.
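For the "building one dynamically" option, a minimal sketch could look like the following. The table names are the question's; the query body is a placeholder, and the source table is chosen from two fixed literals so that no user input is ever concatenated into the statement.

```sql
-- Sketch only: table names come from the question; the joined query is a placeholder.
DECLARE @archivedData bit = 1;

-- Choose the source table from a fixed set of literals, never from user input.
DECLARE @source nvarchar(300) =
    CASE WHEN @archivedData = 1
         THEN N'dbo.ArchivedTable'
         ELSE N'dbo.OnlineTable'
    END;

DECLARE @sql nvarchar(max) = N'
SELECT src.Id
FROM ' + @source + N' AS src
INNER JOIN dbo.AnotherTable AS another
    ON another.Id = src.Id;';

EXEC sys.sp_executesql @sql;
```

Because the two generated statements have different text, each one gets its own cached plan, which is the point of keeping them separate.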

Related

Joining Results of Stored Procedure with Temporary Table

I have two tables that I have joined together. I'd like to join the result of the joined table with the results of a stored procedure that has two variables.
I'm not sure whether or not I should create two temporary tables or another function, so I'm a little lost on where I should even start and what the easiest method would be.
Below is my first join.
SELECT *
FROM dbo.Users a WITH (NOLOCK)
JOIN Company b ON a.email = b.email
Below is my stored procedure, all it does is split one column into more rows. Split is another function. I would like to use an inner join.
SELECT a.*, b.*
FROM [dbo].[Menu] a
CROSS APPLY dbo.Split(SalesPersons, ',') b
WHERE ID = @ID AND Date = @Date
The easiest way to do this, assuming that the output from the stored procedure is deterministic, would be to load the output of the stored procedure into a temp table and then join to it.
-- The columns must match the procedure's result set; ID and COL2 are placeholders
CREATE TABLE #tmp
(
ID INT NOT NULL,
COL2 INT NOT NULL
);
INSERT INTO #tmp
EXEC sproc_YourSproc 'Params';
SELECT *
FROM dbo.Users u
INNER JOIN dbo.Company c ON u.email = c.email
INNER JOIN #tmp t ON t.ID = c.ID
That being said, as Martin Smith said above, you probably want to move that logic into the stored procedure if possible.
Also, please don't use (NOLOCK). It doesn't really help the way most people think it does, and it can cause some really nasty results (reading rows twice, ghost records, etc.).
If you need to be able to perform reads without causing read/write contention, I would investigate using more optimistic isolation levels, find ways to optimize the read performance to reduce possible congestion, or find indexing strategies that would make it possible to satisfy reads without locking the table itself.
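As a rough illustration of the "more optimistic isolation levels" mentioned above, SQL Server offers two row-versioning options. This is only a sketch: both settings need ALTER permission on the database and add version-store overhead, so they should be tested before being enabled anywhere important. The query reuses the question's Users/Company join with an assumed column list.

```sql
-- Sketch only: evaluate the version-store overhead before enabling either option.

-- Option A: READ COMMITTED reads use row versions instead of shared locks.
-- (Changing this setting needs the database to have no other active connections.)
ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON;

-- Option B: allow sessions to opt in to SNAPSHOT isolation explicitly.
ALTER DATABASE CURRENT SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT u.email
    FROM dbo.Users AS u
    INNER JOIN dbo.Company AS c ON u.email = c.email;
COMMIT TRANSACTION;
```

With either option, readers see a consistent committed version of each row rather than uncommitted data, which is the main thing NOLOCK gives up.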

Too many parameter values slowing down query

I have a query that runs fairly fast under normal circumstances. But it is running very slow (at least 20 minutes in SSMS) due to how many values are in the filter.
Here's the generic version of it, and you can see that one part is filtering by over 8,000 values, making it run slow.
SELECT DISTINCT
column
FROM
table_a a
JOIN
table_b b ON (a.KEY = b.KEY)
WHERE
a.date BETWEEN @Start AND @End
AND b.ID IN (... over 8,000 values)
AND b.place IN ( ... 20 values)
ORDER BY
a.column ASC
It's to the point where it's too slow to use in the production application.
Does anyone know how to fix this, or optimize the query?
To make a query fast, you need indexes.
You need a separate index on each of the following columns: a.KEY, b.KEY, a.date, b.ID, b.place.
As gotqn wrote before, putting your 8,000 items into a temp table and inner joining to it will make the query faster too, but without an index on the other side of the join it will still be slow.
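The indexes described above could be created roughly as follows. table_a, table_b and their columns are the question's placeholders, the index names are made up, and whether single-column indexes or a composite index serve this query better is something to check against the actual execution plan.

```sql
-- Sketch only: object names are the question's placeholders; index names are invented.
-- KEY is a reserved word and date is a type name, so both are bracketed.
CREATE INDEX IX_table_a_KEY   ON table_a ([KEY]);
CREATE INDEX IX_table_a_date  ON table_a ([date]);
CREATE INDEX IX_table_b_KEY   ON table_b ([KEY]);
CREATE INDEX IX_table_b_ID    ON table_b (ID);
CREATE INDEX IX_table_b_place ON table_b (place);
```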
What you need is to put the filtering values in a temporary table, then use that table to apply the filter with an INNER JOIN instead of WHERE ... IN. For example:
IF OBJECT_ID('tempdb..#FilterDataSource') IS NOT NULL
BEGIN;
DROP TABLE #FilterDataSource;
END;
CREATE TABLE #FilterDataSource
(
[ID] INT PRIMARY KEY
);
INSERT INTO #FilterDataSource ([ID])
SELECT ... ; -- you need to split the values into rows here
SELECT DISTINCT column
FROM table_a a
INNER JOIN table_b b
ON (a.KEY = b.KEY)
INNER JOIN #FilterDataSource FS
ON b.id = FS.ID
WHERE a.date BETWEEN @Start AND @End
AND b.place IN ( ... 20 values)
ORDER BY a.column ASC;
A few important notes:
We are using a temporary table in order to allow parallel execution plans to be used.
If you have a fast way of splitting the values (for example, a CLR function), you can join to the function itself.
It is not good to use IN with many values: SQL Server is not always able to build the execution plan, which may lead to timeouts or internal errors - you can find more information here
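One possible way to do the "split values" step is the built-in STRING_SPLIT function. This is a sketch under assumptions: @IdList is a hypothetical parameter holding the comma-separated IDs, and STRING_SPLIT requires SQL Server 2016 or later with database compatibility level 130 or higher.

```sql
-- Sketch only: @IdList is a hypothetical parameter carrying the comma-separated IDs.
DECLARE @IdList nvarchar(max) = N'101,102,103,102';

CREATE TABLE #FilterDataSource ([ID] int PRIMARY KEY);

-- DISTINCT and the IS NOT NULL filter keep the primary key happy when the
-- list contains repeated or non-numeric entries.
INSERT INTO #FilterDataSource ([ID])
SELECT DISTINCT TRY_CONVERT(int, s.value)
FROM STRING_SPLIT(@IdList, N',') AS s
WHERE TRY_CONVERT(int, s.value) IS NOT NULL;

SELECT [ID] FROM #FilterDataSource ORDER BY [ID];   -- returns 101, 102, 103
```

For very large lists, a table-valued parameter is another way to pass the values in without any string splitting.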

Join with Or Condition

Is there a more efficient way to write this? I'm not sure this is the best way to implement this.
select *
from stat.UniqueTeams uTeam
Left Join stat.Matches match
on match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id
Using OR in a JOIN condition is bad practice, because SQL Server often cannot use indexes effectively for it.
A better way is to use two SELECTs with UNION:
SELECT *
FROM stat.UniqueTeams uTeam
LEFT JOIN stat.Matches match
ON match.AwayTeam = uTeam.Id
UNION
SELECT *
FROM stat.UniqueTeams uTeam
LEFT JOIN stat.Matches match
ON match.HomeTeam = uTeam.id
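One caveat worth checking before adopting this rewrite: a UNION of two LEFT JOINs does not necessarily return the same rows as the original OR join, because each branch adds its own NULL row for teams that only match in the other branch. The following sketch uses made-up temp tables and sample data to show the difference.

```sql
-- Sketch only: temp tables and sample rows are invented for illustration.
CREATE TABLE #Teams   (Id int PRIMARY KEY);
CREATE TABLE #Matches (Id int PRIMARY KEY, HomeTeam int, AwayTeam int);

INSERT INTO #Teams (Id) VALUES (1), (2), (3);
INSERT INTO #Matches (Id, HomeTeam, AwayTeam) VALUES (10, 1, 2);

-- Original OR join: 3 rows -> (1, 10), (2, 10), (3, NULL)
SELECT t.Id AS TeamId, m.Id AS MatchId
FROM #Teams AS t
LEFT JOIN #Matches AS m
    ON m.AwayTeam = t.Id OR m.HomeTeam = t.Id;

-- UNION of two LEFT JOINs: 5 rows, with extra (1, NULL) and (2, NULL)
SELECT t.Id AS TeamId, m.Id AS MatchId
FROM #Teams AS t
LEFT JOIN #Matches AS m ON m.AwayTeam = t.Id
UNION
SELECT t.Id AS TeamId, m.Id AS MatchId
FROM #Teams AS t
LEFT JOIN #Matches AS m ON m.HomeTeam = t.Id;

DROP TABLE #Matches;
DROP TABLE #Teams;
```

If the extra NULL rows matter, one option is to UNION the two sides of the Matches table first and then left join the teams to that combined set.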
Things to note when using LEFT JOIN in a query:
1) A left join can introduce NULLs, which can be a performance issue because NULLs are treated separately by the server engine.
2) The table being joined as nullable should not be bulky; otherwise the join will be costly to execute (performance and resources).
3) Try to join on columns that are already indexed. If you need to join on columns that are not, it is better to build indexes for them first.
In your case you have two columns from the same table to be left joined to another table. A good approach here is to build a single set that exposes the required data under one column, as shown below:
; WITH Match AS
(
-- Select all required columns and alias the key column(s) as shown below
SELECT match1.*, match1.AwayTeam AS TeamId FROM stat.Matches match1
UNION
SELECT match2.*, match2.HomeTeam AS TeamId FROM stat.Matches match2
)
SELECT
*
FROM
stat.UniqueTeams uTeam
OUTER APPLY (SELECT * FROM Match WHERE Match.TeamId = uTeam.Id) AS teamMatch
I have used OUTER APPLY, which gives results similar to a LEFT OUTER JOIN but is evaluated differently: the right-hand side is evaluated per outer row, like a table-valued function, which can perform better in your case.
My answer is not quite to the point, but I found this question while looking for an "or" condition for an inner join, so it may be useful to the next seeker.
We can use the legacy syntax for the inner join case:
select *
from stat.UniqueTeams uTeam, stat.Matches match
where match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id
Note: this query performs badly (a cross join first, then a filter), but it works with many conditions and is suitable for researching dirty data (for example, t1.id=t2.id or t1.name=t2.name).

How can I perform a conditional join in mssql?

I want to join a table to one of two possible tables, depending on data. Here's an attempt that did not work, but gets the idea across, I hope. Also, this is a mocked up example that may not be very realistic, so don't get too hung up on the idea this is representing real students and classes.
SELECT *
FROM
student
INNER JOIN class ON class.student_id = student.student_id
CASE
WHEN class.complete=0
THEN RIGHT OUTER JOIN report ON report.label_id = inprogress.class_id
WHEN class.complete=1
THEN RIGHT OUTER JOIN report ON report.label_id = completed.class_id
END
Any ideas?
You have two join conditions, and if either is true you want the join to happen. That's a boolean OR operation.
You simply need to:
RIGHT OUTER JOIN report ON (CONDITION1) OR (CONDITION2)
Let's unravel that a moment though, what is condition 1 and what is condition 2?
WHEN class.complete=0
THEN RIGHT OUTER JOIN report ON report.label_id = inprogress.class_id
WHEN class.complete=1
THEN RIGHT OUTER JOIN report ON report.label_id = completed.class_id
Here you're putting together two conditions on each of your condition 1 and 2, so your condition 1 is:
class.complete = 0 AND report.label_id = inprogress.class_id
and your condition 2 is
class.complete = 1 AND report.label_id = completed.class_id
So the completed SQL should be something like this (untested, off the top of my head):
RIGHT OUTER JOIN report ON (
class.complete = 0 AND report.label_id = inprogress.class_id
) OR (
class.complete = 1 AND report.label_id = completed.class_id
)
Worth mentioning:
I haven't run the above join, but I know from experience that its execution plan will be absolutely abominable. That won't matter if performance isn't important or your data set is small. If performance does matter, I strongly suggest you post a broader description of what you want, and we can talk about a better approach to getting your particular data set that won't perform so terribly. I would personally write a join like the one above only as a last resort, or if I was hacking something truly irrelevant.
Try this (untested):
SELECT *
FROM student S
INNER JOIN class c ON c.student_id = S.student_id
left outer join report ir on ir.label_id = inprogress.class_id AND c.complete=0
left outer join report cr on cr.label_id = completed.class_id AND c.complete=1
If you want to join to either of 2 tables with reasonable performance, write a stored procedure for each path (one that joins table 1 to table A, one that joins table 1 to table B). Make a third stored procedure that calls either the 1-A stored procedure or the 1-B stored procedure. This way an efficient query plan will be performed in each case without having to make it recompile on each call or generate a strange query plan.
In your case, you actually want some records from both of the tables you might join to. In that case, you want to union the results together rather than pick one or the other (and you can combine them in one sproc if you want to, it shouldn't hurt the query plan). If you are sure the records won't duplicate between the two queries (it seems like they wouldn't), then as usual use UNION ALL for performance.
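The three-procedure arrangement described above might be sketched like this. All procedure, table and column names are hypothetical, CREATE OR ALTER needs SQL Server 2016 SP1 or later, and GO is the batch separator used by SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- Sketch only: every object name here is hypothetical.
CREATE OR ALTER PROCEDURE dbo.GetRows_FromA
AS
BEGIN
    SET NOCOUNT ON;
    SELECT t1.Id, a.Payload
    FROM dbo.Table1 AS t1
    INNER JOIN dbo.TableA AS a ON a.Id = t1.Id;
END;
GO

CREATE OR ALTER PROCEDURE dbo.GetRows_FromB
AS
BEGIN
    SET NOCOUNT ON;
    SELECT t1.Id, b.Payload
    FROM dbo.Table1 AS t1
    INNER JOIN dbo.TableB AS b ON b.Id = t1.Id;
END;
GO

-- Dispatcher: each inner procedure keeps its own cached plan.
CREATE OR ALTER PROCEDURE dbo.GetRows
    @useA bit
AS
BEGIN
    SET NOCOUNT ON;
    IF @useA = 1
        EXEC dbo.GetRows_FromA;
    ELSE
        EXEC dbo.GetRows_FromB;
END;
GO

EXEC dbo.GetRows @useA = 1;
```

The cost of this design is the duplicated query text in the two inner procedures, which is the same trade-off as keeping two separate queries.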

SQL Query execution shortcut OR logic?

I have three tables:
SmallTable
(id int, flag1 bit, flag2 bit)
JoinTable
(SmallTableID int, BigTableID int)
BigTable
(id int, text1 nvarchar(100), otherstuff...)
SmallTable has, at most, a few dozen records. BigTable has a few million, and is actually a view that UNIONS a table in this database with a table in another database on the same server.
Here's the join logic:
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
Average joined size is a few thousand results. Everything shown is indexed.
For most SmallTable records, flag1 and flag2 are set to 1, so there's really no need to even access the index on BigTable.text1, but SQL Server does anyway, leading to a costly Index Scan and Nested Loops join.
Is there a better way to hint to SQL Server that, if flag1 and flag2 are both set to 1, it shouldn't even bother looking at text1?
Actually, if I can avoid the join to BigTable completely in these cases (JoinTable is managed, so this wouldn't create an issue), that would make this key query even faster.
SQL Boolean evaluation does NOT guarantee operator short-circuit. See On SQL Server boolean operator short-circuit for a clear example showing how assuming operator short circuit can lead to correctness issues and run-time errors.
On the other hand the very example in my link shows what does work for SQL Server: providing an access path that SQL can use. So, as with all SQL performance problems and questions, the real problem is not in the way the SQL text is expressed, but in the design of your storage. Ie. what indexes has the query optimizer at its disposal to satisfy your query?
Unfortunately, I don't believe SQL Server will short-circuit conditions like that.
So I'd suggest doing two queries and UNIONing them together: the first query with the s.flag1=1 and s.flag2=1 WHERE conditions, and the second query doing the join on to BigTable with the s.flag1<>1 or s.flag2<>1 conditions.
This article on the matter is worth a read, and includes the bottom line:
...SQL Server does not do short-circuiting like it is done in other programming languages and there's nothing you can do to force it to.
Update:
This article is also an interesting read and contains some good links on this topic, including a technet chat with the development manager for the SQL Server Query Processor team which briefly mentions that the optimizer does allow short-circuit evaluation. The overall impression I get from various articles is "yes, the optimizer can spot the opportunity to short circuit but you shouldn't rely on it and you can't force it". Hence, I think the UNION approach may be your best bet. If it's not coming up with a plan that takes advantage of an opportunity to short cut, that would be down to the cost-based optimizer thinking it's found a reasonable plan that does not do it (this would be down to indexes, statistics etc).
It's not elegant, but it should work...
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1 = 1 and s.flag2 = 1) OR
(
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
)
SQL Server usually grabs the subquery hint (though it's free to discard it):
SELECT *
FROM (
SELECT * FROM SmallTable where flag1 <> 1 or flag2 <> 1
) s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
...
No idea if this will be faster without test data... but it sounds like it might
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1) AND (s.flag2=1)
UNION ALL
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
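-- rows not already returned by the first branch (assumes the flags are never NULL)
NOT (s.flag1=1 AND s.flag2=1)
AND (s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')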
Please let me know what happens
Also, you might be able to speed this up by returning just a unique id from this query and then using that result to get all the rest of the data.
edit
something like this?
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1) AND (s.flag2=1)
UNION ALL
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE NOT (s.flag1=1 AND s.flag2=1)
AND EXISTS
(SELECT 1 FROM BigTable b2
WHERE b2.ID = b.ID
AND (s.flag1=1 OR b2.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b2.text1 <> 'value1')
)
Hope this works - careful of shortcut logic in case statements around aggregates but...
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE 1=case when (s.flag1 = 1 and s.flag2 = 1) then 1
when (
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
) then 1
else 0 end

Resources