SQL Server - JOIN ON conditional statement - sql-server

I have a specific problem where the JOIN ON can be based on one of two statements:
LEFT JOIN acc_seminar.t_Seminar_Gebühr semg ON
CASE
WHEN @Bool = 1
THEN ss1.TNOrder = semg.SemG_TN OR (ss1.TNOrder > @MaxTN AND semg.SemG_TN = @MaxTN)
ELSE
semg.SemG_TN = 1
END
As you can see, if a variable has a value equal to 1 then it should left join on one statement or join on the other if the variable value is not equal to 1.
As far as Googling tells me, something like this is not possible in SQL because CASE returns a value, not a statement. How could I change this to make it logically work as shown above?

CASE is an expression in T-SQL; there is no Case Statement in the language.
What you need here is just "normal" Boolean Logic:
LEFT JOIN acc_seminar.t_Seminar_Gebühr semg ON (@Bool = 1
    AND (ss1.TNOrder = semg.SemG_TN
        OR (ss1.TNOrder > @MaxTN AND semg.SemG_TN = @MaxTN)))
    OR (@Bool <> 1 AND semg.SemG_TN = 1)
Note that joins like this can perform badly, because the plan cached for one value of the variable may be reused for the other. Dynamic SQL avoids reusing the wrong cached plan by producing a separate statement for each case, while OPTION (RECOMPILE) avoids it by compiling a fresh plan on every execution. Personally, I prefer the dynamic approach.
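As a rough sketch of the dynamic approach, assuming the variables are written as @Bool and @MaxTN, and using dbo.SeminarSeats as a hypothetical stand-in for whatever table ss1 aliases in the real query:

```sql
DECLARE @Bool  bit = 1,
        @MaxTN int = 5;   -- example values only

-- Build just the join predicate that applies to the current value of @Bool.
DECLARE @join nvarchar(max) =
    CASE WHEN @Bool = 1
         THEN N'ss1.TNOrder = semg.SemG_TN
             OR (ss1.TNOrder > @MaxTN AND semg.SemG_TN = @MaxTN)'
         ELSE N'semg.SemG_TN = 1'
    END;

DECLARE @sql nvarchar(max) = N'
SELECT ss1.TNOrder, semg.SemG_TN
FROM dbo.SeminarSeats AS ss1          -- hypothetical source table
LEFT JOIN acc_seminar.t_Seminar_Gebühr AS semg
    ON ' + @join + N';';

-- @MaxTN stays a real parameter, so no values are concatenated into the text.
EXEC sys.sp_executesql @sql, N'@MaxTN int', @MaxTN = @MaxTN;
```

Each branch yields a different statement text, so each gets its own cached plan. The static alternative is to keep the OR-based join and append OPTION (RECOMPILE) to the query.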

Related

Why case..when get a table scan ? how to workarround

When I use CASE .. WHEN .. END I get an index scan, which is less efficient than an index seek.
I have complex business rules that require the CASE. Is there any workaround?
Query A:
select * from [dbo].[Mobile]
where((
CASE
WHEN ([MobileNumber] = (LTRIM(RTRIM('987654321')))) THEN 1
END
) = 1)
This query gets an index scan and 199 logical reads.
Query B:
select * from [dbo].[Mobile]
where ([MobileNumber] = (LTRIM(RTRIM('987654321'))))
This query gets an index seek and 122 logical reads.
For the table
CREATE TABLE #T(X CHAR(1) PRIMARY KEY);
And the query
SELECT *
FROM #T
WHERE CASE WHEN X = 'A' THEN 1 ELSE 0 END = 1;
It is apparent without that much thought that the only circumstances in which the CASE expression evaluates to 1 are when X = 'A' and that the query has the same semantics as
SELECT *
FROM #T
WHERE X = 'A';
However the first query will get a scan and the second one a seek.
The SQL Server optimiser will try all sorts of relational transformations on queries but will not even attempt to rearrange expressions such as CASE WHEN X = 'A' THEN 1 ELSE 0 END = 1 to express it as an X = expression so it can perform an index seek on it.
It is up to the query writer to write their queries in such a way that they are sargable.
There is no workaround to get an index seek on column MobileNumber with your existing CASE predicate. You just need to express the condition differently (as in your example B).
Potentially you could create a computed column with the CASE expression and index that - and you could then see an index seek on the new column. However this is unlikely to be useful to you as I assume in reality the mobile number 987654321 is dynamic and not something to be hardcoded into a column used by an index.
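A minimal sketch of that computed-column idea, using a hypothetical table dbo.T_Demo modelled on the #T example above (the column name IsA and the index name are also made up for illustration):

```sql
-- Hypothetical demo table: the CASE expression becomes a computed column.
CREATE TABLE dbo.T_Demo
(
    X   char(1) PRIMARY KEY,
    IsA AS CASE WHEN X = 'A' THEN 1 ELSE 0 END
);

-- The expression is deterministic and precise, so the column can be indexed.
CREATE INDEX IX_T_Demo_IsA ON dbo.T_Demo (IsA);

-- Filtering on the computed column gives the optimizer an index it can seek on.
SELECT X
FROM dbo.T_Demo
WHERE IsA = 1;
```

This only helps when the compared value ('A' here) is fixed at design time, which is the limitation described above.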
After cleaning up and fixing your code, you have a WHERE which is a boolean expression based around a CASE.
As mentioned by @MartinSmith, there is simply no way SQL Server will re-arrange this. It does not do the kind of dynamic slicing that would allow it to re-arrange the first query into the second version.
select *
from [dbo].[Mobile]
where
CASE
WHEN [MobileNumber] = LTRIM(RTRIM('987654321'))
THEN 1
END
= 1
You may ask: the second version also has an expression in it, why does this not also get a scan?
select *
from [dbo].[Mobile]
where [MobileNumber] = LTRIM(RTRIM('987654321'))
The reason is that SQL Server recognizes LTRIM(RTRIM('987654321')) as a deterministic constant expression: it does not change depending on runtime settings, nor on the result of in-row calculations.
Therefore, it can calculate the value once at compile time. The query effectively becomes this under the hood, which can use an index on MobileNumber.
select *
from [dbo].[Mobile]
where [MobileNumber] = '987654321'
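To make the distinction concrete, here is a small contrast against the same [dbo].[Mobile] table. It is only a sketch and assumes an index exists on MobileNumber:

```sql
-- Function applied to the literal only: it is evaluated once, and the
-- predicate stays a plain column = constant comparison (seek is possible).
SELECT *
FROM [dbo].[Mobile]
WHERE [MobileNumber] = LTRIM(RTRIM(' 987654321 '));

-- Function applied to the column: the value must be computed per row,
-- so the index on MobileNumber cannot be used for a seek.
SELECT *
FROM [dbo].[Mobile]
WHERE LTRIM(RTRIM([MobileNumber])) = '987654321';
```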

Is there a way to compare more than one value of dataset against a single value of another dataset in left outer join(Flink)

I am trying to find out a way to check whether two values of a dataset can be checked against one value of another dataset using Flink Left Outer Join?
final DataSet<type> finalDataSet = dataSet1
.leftOuterJoin(dataSet2)
.where("value1")
.equalTo("value2")
.with(new FunctionNameToBeImplemented())
.name("StepName");
This works fine for a one to one check.
Can there be a way to do something similar:
final DataSet<type> finalDataSet = dataSet1
.leftOuterJoin(dataSet2)
.where(["value1","value2"]) // List of values
.contains("value2")
.with(new FunctionNameToBeImplemented())
.name("StepName");
I expect the output to check value1 and then value2 and if any (or both) matches, pass it to the function "FunctionNameToBeImplemented()" for further processing.
Outer joins in Flink's DataSet API are strictly equality joins.
You can implement your use case with two separate joins and union the results. To avoid duplicates, one of the join functions should check whether the other condition applies as well and only produce a result if it does not.
left  -\
        > JOIN(l.val1 == r.val2)[emit result] ----------------------\
right -/                                                             \
                                                                      > UNION
left  -\                                                             /
        > JOIN(l.val2 == r.val2)[emit result if l.val1 != r.val2] --/
right -/

Getting a warning about null values being eliminated from a SET, but there is no null value

I know what this warning means, however in this case there are definitely no null values.
The result also appears to be correct, but the warning makes me curious (and concerned) since it means there's something in play that I don't understand.
The warning is:
Warning: Null value is eliminated by an aggregate or other SET operation.
Here's the statement:
UPDATE Contacts
SET IsActiveCampaignClient = 1,
NeedsActiveCampaignSync = 1
FROM (
SELECT dbo.Contacts.ID
FROM dbo.Contacts
LEFT OUTER JOIN dbo.Order_Batches ON dbo.Order_Batches.EnteredByContactID = dbo.Contacts.ID
GROUP BY dbo.Contacts.ID
HAVING (COUNT(dbo.Order_Batches.ID) > 0)
) i
WHERE i.ID = Contacts.ID
As you can see, this is an UPDATE statement where I needed to do some join logic, and the COUNT() statement must be the problem. But if I take that nested SELECT out and run it by itself, there are no nulls in the result:
ID
37
39
52
54
79
81
I assume there's something about how this nested select works that I don't understand. I've tried looking at execution plans and pulling this apart in various ways to reveal a null value or some other problem. I've tried making tweaks to the statement to try and get a null to appear - no luck.
So, to be clear, I would like to understand why this message occurs, but only when the query is nested inside an UPDATE statement's FROM clause.
You don't need aggregation if all you are checking is that the contact has entered at least one order batch. Use EXISTS instead.
UPDATE c
SET IsActiveCampaignClient = 1,
    NeedsActiveCampaignSync = 1
FROM Contacts c
WHERE EXISTS (
    SELECT 1 -- making sure there is at least one order batch
    FROM Order_Batches o
    WHERE o.EnteredByContactID = c.ID
)
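One likely source of the warning (an assumption, since the answer above does not address it): the NULLs are not in the query's output but in its input. The LEFT OUTER JOIN produces a NULL Order_Batches.ID for every contact without batches, and COUNT(column) skips those NULLs, which is what the message reports. A small sketch with made-up table variables and data:

```sql
-- Hypothetical sample data: contact 2 has no order batches.
DECLARE @Contacts TABLE (ID int PRIMARY KEY);
DECLARE @Order_Batches TABLE (ID int PRIMARY KEY, EnteredByContactID int);

INSERT @Contacts VALUES (1), (2);
INSERT @Order_Batches VALUES (10, 1);

-- Contact 2 is padded with o.ID = NULL by the outer join. COUNT(o.ID)
-- ignores that NULL, which typically raises the warning with ANSI_WARNINGS ON.
SELECT c.ID, COUNT(o.ID) AS BatchCount
FROM @Contacts c
LEFT OUTER JOIN @Order_Batches o ON o.EnteredByContactID = c.ID
GROUP BY c.ID;

-- No aggregate is involved here, so there is nothing to warn about.
SELECT c.ID
FROM @Contacts c
WHERE EXISTS (SELECT 1 FROM @Order_Batches o WHERE o.EnteredByContactID = c.ID);
```

Whether the warning actually appears can depend on the plan the optimizer chooses, which may be why it shows up in one form of the statement and not another.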

Using SQL to find entries that were originally X and later changed to Y

I recently started using SQL for work and don't have much experience of it so I'm sorry if this is a ridiculous question.
I'm looking for an entry that was originally listed as X but was then later changed to Y, I figure that a nested sub query is the way to go but the one I'm trying doesn't seem to use the nested bit.
Here is the code I'm trying
SELECT *
FROM [HOME].[dba].[ARCHIVE]
where FRIE like 'AR8%'
and RESULT = 'X'
and EXISTS(SELECT FRIE, RESULT
FROM [HOME].[dba].[ARCHIVE]
where RESULT = 'Y');
Everything as far as the EXISTS works but afterwards it just ignores the nested query
Your query doesn't have the same WHERE clause in the EXISTS portion. I think this will work for you:
SELECT *
FROM [HOME].[dba].[ARCHIVE]
WHERE FRIE like 'AR8%'
AND RESULT = 'X'
AND EXISTS(SELECT TOP 1 1
FROM [HOME].[dba].[ARCHIVE]
where FRIE like 'AR8%' AND RESULT = 'Y');
I'd recommend using an INNER JOIN to a subquery rather than using an EXISTS statement. Something like this:
SELECT *
FROM [HOME].[dba].[ARCHIVE] a
INNER JOIN (SELECT FRIE
FROM [HOME].[dba].[ARCHIVE]
WHERE RESULT = 'Y') t1 ON a.FRIE = t1.FRIE
WHERE
a.FRIE like 'AR8%'
and a.RESULT = 'X'
That would return all rows from ARCHIVE with a RESULT of X for which there is also a row with the same FRIE and a RESULT of Y.
Hopefully that helps.
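For comparison, the same check can be sketched as a correlated EXISTS. This assumes FRIE identifies the entry, as the join above does. Nothing in the question shows a date or sequence column, so "later" is not enforced here:

```sql
SELECT a.*
FROM [HOME].[dba].[ARCHIVE] AS a
WHERE a.FRIE LIKE 'AR8%'
  AND a.RESULT = 'X'
  AND EXISTS (SELECT 1
              FROM [HOME].[dba].[ARCHIVE] AS b
              WHERE b.FRIE = a.FRIE      -- ties the subquery to the outer row
                AND b.RESULT = 'Y');
```

Unlike the join, this returns each X row at most once, even when several Y rows share the same FRIE.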

Querying all records are true in sql-server - is casting expensive performance wise

I have a table with a column of bit values. I want to write a function that returns true if all records of an associated item are true.
One way I found of doing it is:
Select @Ret = CAST(MIN(CAST(IsCapped as tinyInt)) As Bit)
from ContractCover cc
Inner join ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
AND cc.IsActive = 1
return @ret
But is the casting to int to get the minimum expensive? Should I instead just be querying based on say:
(count(Id) where IsCapped = 0 > 0) returning false rather than doing the multiple casts?
In the execution plan it doesn't seem like calling this function is heavy in the execution (but I'm not too familiar with analysing query plans - it just seems to have the same % cost as another section of the stored proc of like 2%).
Edit - when I execute the stored proc which calls the function and look at the execution plan - the part where it calls the function has a query cost (relative to the batch) : 1% which is comparable to other sections of the stored proc. Unless I'm looking at the wrong thing :)
Thanks!!
I would do this with an EXISTS check, as it can stop as soon as it finds one record where IsCapped = 0, whereas your query will always read all the data.
CREATE FUNCTION dbo.fn_are_contracts_capped(@ContractVersionId int)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @return_value bit
    -- Any active, uncapped row means "not all capped".
    IF EXISTS(
        SELECT 1
        FROM dbo.ContractCover cc
        JOIN dbo.ContractRiskVersion crv
            ON cc.ContractRiskId = crv.ContractRiskId
        WHERE crv.ContractVersionId = @ContractVersionId
            AND cc.IsActive = 1
            AND IsCapped = 0)
    BEGIN
        SET @return_value = 0
    END
    ELSE
    BEGIN
        SET @return_value = 1
    END
    RETURN @return_value
END
Compared to the IO required to read the data, the cast will not add a lot of overhead.
Edit: wrapped code in a scalar function.
Casting in the SELECT would be CPU and memory bound. Not sure how much in this case--under normal circumstances we usually try to optimize for IO first, and then worry about CPU and memory second. So I don't have a definite answer for you there.
That said, the problem with this particular solution is that it won't short-circuit. SQL Server will read out all rows where ContractVersionId = @ContractVersionId and IsActive = 1, convert IsCapped to an integer type, and take the minimum, when really you can quit as soon as you find a single row where IsCapped = 0. It won't matter much if ContractVersionId is highly selective and only returns a very small fraction of the table, or if most rows are capped. But if ContractVersionId is not highly selective, or if a high percentage of the rows are uncapped, then you are asking SQL Server to do too much work.
Second consideration is that scalar-valued functions are a notorious performance drag in SQL Server. It is better to create as an in-line table function if possible, eg:
create function AreAllCapped(@ContractVersionId int)
returns table as return (
    select
        ContractVersionId = @ContractVersionId
        , AreAllCapped = case when exists (
            select *
            from ContractCover cc
            join ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
            where crv.ContractVersionId = @ContractVersionId
                and cc.IsActive = 1
                and IsCapped = 0
        )
        then 0 else 1 end
)
Which you then can call using CROSS APPLY in the FROM clause (assuming SQL 2005 or later).
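A call might look roughly like this. dbo.ContractVersion is a hypothetical driving table, and the function is assumed to have been created in the dbo schema:

```sql
SELECT cv.ContractVersionId,
       ac.AreAllCapped
FROM dbo.ContractVersion AS cv                      -- hypothetical table
CROSS APPLY dbo.AreAllCapped(cv.ContractVersionId) AS ac;
```

Because the function is an inline table-valued function, its body can be expanded into the calling query's plan rather than invoked once per row.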
Final note: taking the count where IsCapped = 0 has similar problems. It's like the difference between Any() and Count() in LINQ, if you are familiar. Any() will short-circuit, Count() has to actually count all the elements. SELECT COUNT(*) ... WHERE IsCapped = 0 still has to count all the rows, even though a single row is all you need to move on.
Of course, a bit column can't be passed as an argument to aggregate functions such as MIN, MAX or SUM (and thus, if it needs to be passed, you have to cast it to an integer type first), but bit columns can be sorted by. Your query, therefore, could be rewritten like this:
SELECT TOP 1 @Ret = IsCapped
FROM ContractCover cc
INNER JOIN ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
AND cc.IsActive = 1
ORDER BY IsCapped;
Note that in this particular query it is assumed that IsCapped can't be NULL. If it can, you'll need to add an additional filter to the WHERE clause:
AND IsCapped IS NOT NULL
Unless, of course, you would actually prefer to return NULL instead of 0, if any.
As for the cost of casting, I don't really have anything to add to what has already been said by Filip and Peter. I do find it a nuisance that bit data require casting before aggregating, but that's never something of a primary concern.
