WHERE clause in SQL ignoring OR - sql-server

I have a stored procedure that calculates the number of documents I have that are not in Complete (1000) or Canceled (1100) status. When I use just one of those conditions it counts correctly, but once I add the OR it simply grabs everything, ignoring the logic. There must be some fundamental thing I'm missing with SQL here.
SELECT
[PartnerCoId] as DisplayID
,[PartnerCompanyName] as DisplayName
,Count(*) as DocumentTotal
FROM vwDocuments
where coid = @inputCoid and DocumentType = 'order'
and Status <> 1100
and PartnerCoId <> @inputCoid
group by
[PartnerCoId]
,[PartnerCompanyName]
union all
SELECT [CoId] as DisplayID
,[CompanyName] as DisplayName
,Count(*) as DocumentTotal
FROM vwDocuments
where PartnerCoId = @inputCoid and DocumentType = 'order'
and Status <> 1100
and CoId <> @inputCoid
group by [CoId]
,[CompanyName]
order by [DisplayName]
This will return the number of documents not in canceled status. If I change the 1100 to 1000 it returns the number of documents not in complete status. Once I update the query to:
and (Status <> 1100 or Status <> 1000)
It breaks the logic.
Thoughts? I have tried quite a number of different combinations of query logic and cannot straighten this out.

Rather than wrestle with boolean logic, use NOT IN:
and Status not in (1100, 1000)
It's easier to read and understand because it's practically English, and because it's all one predicate you don't need parentheses around it either.

If I understand you correctly, you want everything where Status is neither 1100 nor 1000.
If so, then you need this:
and (Status <> 1100 and Status <> 1000)
If you use OR, a Status of 1100 passes the test because it is <> 1000 (and likewise a Status of 1000 passes because it is <> 1100).
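To see the difference concretely, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. These comparisons are standard SQL, so the counts should be the same on SQL Server. The Documents table and its rows are hypothetical.

```python
import sqlite3

def count_open_documents(predicate: str) -> int:
    """Count rows of a hypothetical Documents table that match a WHERE predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Documents (Id INTEGER PRIMARY KEY, Status INTEGER)")
    # Hypothetical data: two open documents, one complete (1000), one canceled (1100).
    conn.executemany(
        "INSERT INTO Documents (Status) VALUES (?)",
        [(10,), (20,), (1000,), (1100,)],
    )
    # Predicate is interpolated only because this is a fixed demo, not user input.
    (count,) = conn.execute(f"SELECT COUNT(*) FROM Documents WHERE {predicate}").fetchone()
    conn.close()
    return count

# OR: every status differs from at least one of 1000/1100, so every row matches.
print(count_open_documents("Status <> 1100 OR Status <> 1000"))   # 4
# AND and NOT IN both exclude the complete and the canceled rows.
print(count_open_documents("Status <> 1100 AND Status <> 1000"))  # 2
print(count_open_documents("Status NOT IN (1100, 1000)"))         # 2
```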

I think your logic may be a little bit off. Saying (NOT complete OR NOT cancelled) will return all of your documents, because every row satisfies this criterion.
Take a document with a status of 1100. It will be returned by the query because one side of the OR (Status <> 1000) evaluates to TRUE.
Try replacing that line with
AND Status NOT IN (1000,1100)
This should return only documents that are neither completed nor cancelled. Hope this helps!

Let's simplify and think about your logic.
DECLARE @i INT = 1;
IF (@i <> 1 OR @i <> 2)
PRINT 'true';
ELSE
PRINT 'false';
Can you provide a value for @i that generates false? I can't think of one. Think about it:
if @i = 1, then @i <> 1 is false, but @i <> 2 is true. Since only one of these conditions has to be true for the whole condition to be true, the result is true.
if @i = 2, then @i <> 1 is true, but @i <> 2 is false. Since only one of these conditions has to be true for the whole condition to be true, the result is true.
if @i is any value other than 1 or 2, then both conditions are true.
As the other answers demonstrate, the way to fix this is to change OR to AND, or use NOT IN (though I like NOT IN less because when you are checking a column, and the column is NULLable, the results surprise and confuse most people - and I prefer to program consistently instead of having to be aware of the cases where something works and the cases where it really, really doesn't).
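For reference, here is a minimal sqlite3 sketch of how NULLs interact with both forms. SQLite is a stand-in here; SQL Server with the default ANSI_NULLS ON setting evaluates these predicates the same way. The table and rows are hypothetical. A NULL Status is dropped by both forms. The classic NOT IN surprise is a NULL inside the list (or subquery), which makes the predicate UNKNOWN for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (Id INTEGER PRIMARY KEY, Status INTEGER)")
# Hypothetical rows: Id 1 is open, Id 2 is complete, Id 3 has an unknown (NULL) status.
conn.executemany("INSERT INTO Documents (Status) VALUES (?)", [(10,), (1000,), (None,)])

def ids(where: str) -> list:
    """Return the Ids matching a fixed demo predicate, in Id order."""
    return [row[0] for row in conn.execute(f"SELECT Id FROM Documents WHERE {where} ORDER BY Id")]

# NULL <> 1000 is UNKNOWN, not TRUE, so the NULL row is excluded by both forms.
and_form = ids("Status <> 1100 AND Status <> 1000")
not_in_form = ids("Status NOT IN (1100, 1000)")
# A NULL in the NOT IN list makes the predicate UNKNOWN for every row.
null_in_list = ids("Status NOT IN (1100, NULL)")
print(and_form, not_in_form, null_in_list)  # [1] [1] []
```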

Related

What is the default value of a variable in T-SQL?

In the code below, if I run it with "SET @LossScenarioID = NULL" uncommented, it breaks out of the loop, but with that line commented out it goes into an infinite loop. What is the value of the variable when we declare it, and does declaring it again (as we do here inside the loop) reset its value? What is the default value of a variable in SQL?
DECLARE @LossScenario AS TABLE
(
LossScenarioId INT
, IsProcessed BIT
)
INSERT INTO @LossScenario
(
LossScenarioId
, IsProcessed
)
Values ( 220, 0)
, (221, 0)
WHILE 1=1
BEGIN
DECLARE @LossScenarioID INT
--SET @LossScenarioID = NULL
SELECT TOP 1 @LossScenarioID = LossScenarioId
FROM @LossScenario
WHERE IsProcessed = 0
IF @LossScenarioID IS NULL
BEGIN
BREAK
END
UPDATE @LossScenario
SET IsProcessed = 1
WHERE LossScenarioId = @LossScenarioID
END
A variable is NULL when it is first declared, but the DECLARE inside the loop does not re-initialize it. So the 2nd time round the loop, the @LossScenarioID variable does NOT automatically get reset to NULL. And if the TOP 1 query finds no matching rows, that does NOT reset the variable to NULL either. No assignment to the variable takes place, so it retains the previous ID it held. That is why the loop never ends.
This is expected, although I have seen this trip people up before! So you should definitely manually re-initialize the variable to NULL at the start of each iteration.
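The behaviour can be modelled in a short Python sketch, with sqlite3 standing in for SQL Server. T-SQL variable semantics are imitated by hand: the variable is assigned only when a row comes back. A max_iterations guard (an addition, not part of the original script) lets the buggy version stop.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE LossScenario (LossScenarioId INTEGER, IsProcessed INTEGER)")
    conn.executemany("INSERT INTO LossScenario VALUES (?, ?)", [(220, 0), (221, 0)])
    return conn

def run_loop(conn: sqlite3.Connection, reset_each_iteration: bool, max_iterations: int = 10) -> int:
    """Mimic the T-SQL WHILE loop and return how many iterations ran."""
    loss_scenario_id = None  # a freshly declared T-SQL variable starts as NULL
    iterations = 0
    while iterations < max_iterations:  # guard so the buggy variant terminates
        iterations += 1
        if reset_each_iteration:
            loss_scenario_id = None  # equivalent of SET @LossScenarioID = NULL
        row = conn.execute(
            "SELECT LossScenarioId FROM LossScenario WHERE IsProcessed = 0 LIMIT 1"
        ).fetchone()
        # SELECT @var = col assigns only when a row is returned;
        # on an empty result the variable keeps its previous value.
        if row is not None:
            loss_scenario_id = row[0]
        if loss_scenario_id is None:
            break
        conn.execute(
            "UPDATE LossScenario SET IsProcessed = 1 WHERE LossScenarioId = ?",
            (loss_scenario_id,),
        )
    return iterations

print(run_loop(make_db(), reset_each_iteration=True))   # 3: two rows processed, then BREAK
print(run_loop(make_db(), reset_each_iteration=False))  # 10: stuck on the last ID until the guard
```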
On a wider note, RBAR (Row-By-Agonizing-Row) operations should generally be avoided whenever possible in favour of set-based approaches. You'll get better performance and a more scalable solution, and you'll avoid traps like this one.
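As a set-based illustration, assume the only per-row work is flipping IsProcessed, as in the question. The whole loop then collapses into a single UPDATE. Below is a minimal sqlite3 sketch with SQLite standing in for SQL Server. Real per-row processing would first need to be expressible as a set operation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LossScenario (LossScenarioId INTEGER, IsProcessed INTEGER)")
conn.executemany("INSERT INTO LossScenario VALUES (?, ?)", [(220, 0), (221, 0)])

# One statement marks every unprocessed row; there is no loop variable to reset.
cursor = conn.execute("UPDATE LossScenario SET IsProcessed = 1 WHERE IsProcessed = 0")
print(cursor.rowcount)  # 2
remaining = conn.execute("SELECT COUNT(*) FROM LossScenario WHERE IsProcessed = 0").fetchone()[0]
print(remaining)  # 0
```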

CASE Statement SQL: Priority in cases?

I have a general question about using a CASE expression in SQL (Server 2008) when more than one of the WHEN conditions is true but each would produce a different flag.
This is a hypothetical example, but it may carry over to applying checks across multiple columns to classify data in rows. The output of the code below depends on how the cases are ordered, as both are true.
DECLARE @TESTSTRING varchar(5)
SET @TESTSTRING = 'hello'
SELECT CASE
WHEN @TESTSTRING = 'hello' THEN '0'
WHEN @TESTSTRING <> 'hi' THEN '1'
ELSE 'N/A'
END AS [Output]
In general, would it be considered bad practice to create flags in this way? Would a WHERE, OR statement be better?
The WHEN clauses of a CASE expression are guaranteed to be evaluated in the order they are written, and the result of the first one that is TRUE is used. So, for your example, the value 0 would be returned.
This is clearly described in the documentation:
Searched CASE expression:
Evaluates, in the order specified, Boolean_expression for each WHEN clause.
Returns result_expression of the first Boolean_expression that evaluates to TRUE.
If no Boolean_expression evaluates to TRUE, the Database Engine returns the else_result_expression if an ELSE clause is specified, or
a NULL value if no ELSE clause is specified.
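Here is a quick check of the first-match rule, as a minimal sketch with SQLite (via Python's sqlite3) standing in for SQL Server. Searched CASE is standard SQL, so the same ordering applies. The classify helper is hypothetical and wraps the question's expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def classify(test_string: str) -> str:
    """Run the question's searched CASE against a single value."""
    (output,) = conn.execute(
        """
        SELECT CASE
                 WHEN ? = 'hello' THEN '0'
                 WHEN ? <> 'hi'   THEN '1'
                 ELSE 'N/A'
               END
        """,
        (test_string, test_string),
    ).fetchone()
    return output

print(classify("hello"))  # 0   (both WHENs are true; the first one wins)
print(classify("hey"))    # 1
print(classify("hi"))     # N/A
```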
As for whether this is good or bad practice, I would lean on the side of neutrality. This is ANSI behavior so you can depend on it, and in some cases it is quite useful:
select (case when val < 10 then 'Less than 10'
when val < 100 then 'Between 10 and 100'
when val < 1000 then 'Between 100 and 1000'
else 'More than 1000' -- or NULL
end) as MyGroup
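The banding pattern works because each later WHEN only sees values that the earlier ones rejected. Here is a minimal sqlite3 sketch, using SQLite as a stand-in and a hypothetical Readings table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Readings (val INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO Readings VALUES (?)", [(5,), (10,), (99,), (100,), (2500,)])

rows = conn.execute(
    """
    SELECT val,
           CASE WHEN val < 10   THEN 'Less than 10'
                WHEN val < 100  THEN 'Between 10 and 100'
                WHEN val < 1000 THEN 'Between 100 and 1000'
                ELSE 'More than 1000'
           END AS MyGroup
    FROM Readings
    ORDER BY val
    """
).fetchall()
for val, group in rows:
    print(val, group)  # e.g. 10 Between 10 and 100: val < 10 already failed
```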
To add to this: SQL stops at the first WHEN clause that is TRUE and ignores the rest of the CASE expression. Example:
SELECT
CASE
WHEN 3 = 3 THEN 3
WHEN 4 = 4 THEN 4
ELSE NULL
END AS test
This statement returns 3 because that is the first WHEN clause that evaluates to TRUE, even though the following condition is also TRUE.

Why does SUM(...) on an empty recordset return NULL instead of 0?

I understand why null + 1 or (1 + null) returns null: null means "unknown value", and if a value is unknown, its successor is unknown as well. The same is true for most other operations involving null.[*]
However, I don't understand why the following happens:
SELECT SUM(someNotNullableIntegerField) FROM someTable WHERE 1=0
This query returns null. Why? There are no unknown values involved here! The WHERE clause returns zero records, and the sum of an empty set of values is 0.[**] Note that the set is not unknown, it is known to be empty.
I know that I can work around this behaviour by using ISNULL or COALESCE, but I'm trying to understand why this behaviour, which appears counter-intuitive to me, was chosen.
Any insights as to why this makes sense?
[*] with some notable exceptions such as null OR true, where obviously true is the right result since the unknown value simply does not matter.
[**] just like the product of an empty set of values is 1. Mathematically speaking, if I were to extend $(\mathbb{Z}, +)$ to $(\mathbb{Z} \cup \{\text{null}\}, +)$, the obvious choice for the identity element would still be 0, not null, since $x + 0 = x$ but $x + \text{null} = \text{null}$.
The ANSI-SQL-Standard defines the result of the SUM of an empty set as NULL. Why they did this, I cannot tell, but at least the behavior should be consistent across all database engines.
Reference: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt on page 126:
b) If AVG, MAX, MIN, or SUM is specified, then
Case:
i) If TXA is empty, then the result is the null value.
TXA is the operative resultset from the selected column.
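The standard's rule is easy to observe, together with the usual COALESCE workaround. Below is a minimal sqlite3 sketch with SQLite standing in for SQL Server. COUNT is shown for contrast, because it returns 0 rather than NULL on an empty set. The table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (someNotNullableIntegerField INTEGER NOT NULL)")
conn.executemany("INSERT INTO someTable VALUES (?)", [(1,), (2,)])

# WHERE 1 = 0 filters out every row, so the aggregates see an empty set.
total, count = conn.execute(
    "SELECT SUM(someNotNullableIntegerField), COUNT(*) FROM someTable WHERE 1 = 0"
).fetchone()
print(total, count)  # None 0

# COALESCE maps the NULL from an empty SUM back to 0.
(coalesced,) = conn.execute(
    "SELECT COALESCE(SUM(someNotNullableIntegerField), 0) FROM someTable WHERE 1 = 0"
).fetchone()
print(coalesced)  # 0
```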
Whether the table is empty or contains only NULL values, you will get NULL as output from SUM (and from AVG, MIN and MAX). Aggregate functions ignore NULLs, so in both cases there is nothing to aggregate. You can consider this as by design for SQL Server.
Example 1
CREATE TABLE testSUMNulls
(
ID TINYINT
)
GO
INSERT INTO testSUMNulls (ID) VALUES (NULL),(NULL),(NULL),(NULL)
SELECT SUM(ID) FROM testSUMNulls
Example 2
CREATE TABLE testSumEmptyTable
(
ID TINYINT
)
GO
SELECT SUM(ID) Sums FROM testSumEmptyTable
In both examples you will get NULL as output.
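The two examples can be run side by side in a minimal sqlite3 sketch. SQLite stands in for SQL Server, and INTEGER replaces TINYINT. COUNT(ID) and COUNT(*) are included to show that the two tables do differ, even though SUM cannot tell them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testSUMNulls (ID INTEGER)")
conn.executemany("INSERT INTO testSUMNulls (ID) VALUES (?)", [(None,)] * 4)
conn.execute("CREATE TABLE testSumEmptyTable (ID INTEGER)")

for table in ("testSUMNulls", "testSumEmptyTable"):
    sum_id, count_id, count_rows = conn.execute(
        f"SELECT SUM(ID), COUNT(ID), COUNT(*) FROM {table}"
    ).fetchone()
    print(table, sum_id, count_id, count_rows)
# testSUMNulls None 0 4
# testSumEmptyTable None 0 0
```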

Querying whether all records are true in sql-server - is casting expensive performance-wise?

I have a table with a column of bit values. I want to write a function that returns true if all records of an associated item are true.
One way I found of doing it is:
Select @Ret = CAST(MIN(CAST(IsCapped as tinyint)) As Bit)
from ContractCover cc
Inner join ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
AND cc.IsActive = 1
return @Ret
But is the casting to int to get the minimum expensive? Should I instead just be querying based on say:
(count(Id) where IsCapped = 0 > 0) returning false rather than doing the multiple casts?
In the execution plan it doesn't seem like calling this function is heavy in the execution (but I'm not too familiar with analysing query plans; it just seems to have about the same % cost as another section of the stored proc, around 2%).
Edit - when I execute the stored proc which calls the function and look at the execution plan - the part where it calls the function has a query cost (relative to the batch) : 1% which is comparable to other sections of the stored proc. Unless I'm looking at the wrong thing :)
Thanks!!
I would do this with EXISTS, as it can stop the moment it finds one record where IsCapped = 0, whereas your query always reads all of the matching data.
CREATE FUNCTION dbo.fn_are_contracts_capped(@ContractVersionId int)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
DECLARE @return_value bit
IF EXISTS(
SELECT 1
FROM dbo.ContractCover cc
JOIN dbo.ContractRiskVersion crv
ON cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
AND cc.IsActive = 1
AND IsCapped = 0)
BEGIN
SET @return_value = 0
END
ELSE
BEGIN
SET @return_value = 1
END
RETURN @return_value
END
Compared to the IO required to read the data, the cast will not add a lot of overhead.
Edit: wrapped code in a scalar function.
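The EXISTS logic can be exercised outside SQL Server with a minimal sqlite3 sketch. SQLite has no bit type, so plain integers stand in for bit, and the schema is reduced to the columns the query touches. make_db and are_contracts_capped are hypothetical helpers, not the answer's function.

```python
import sqlite3

def make_db(cover_rows):
    """Build a tiny hypothetical schema: one risk (1) on contract version 42."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ContractRiskVersion (ContractRiskId INTEGER, ContractVersionId INTEGER);
        CREATE TABLE ContractCover (ContractRiskId INTEGER, IsActive INTEGER, IsCapped INTEGER);
        """
    )
    conn.execute("INSERT INTO ContractRiskVersion VALUES (1, 42)")
    conn.executemany("INSERT INTO ContractCover VALUES (1, ?, ?)", cover_rows)
    return conn

def are_contracts_capped(conn, contract_version_id):
    """Return 0 as soon as one active, uncapped cover exists; otherwise 1."""
    (result,) = conn.execute(
        """
        SELECT CASE WHEN EXISTS (
                   SELECT 1
                   FROM ContractCover cc
                   JOIN ContractRiskVersion crv ON cc.ContractRiskId = crv.ContractRiskId
                   WHERE crv.ContractVersionId = ?
                     AND cc.IsActive = 1
                     AND cc.IsCapped = 0)
               THEN 0 ELSE 1 END
        """,
        (contract_version_id,),
    ).fetchone()
    return result

print(are_contracts_capped(make_db([(1, 1), (1, 1)]), 42))  # 1: all active covers capped
print(are_contracts_capped(make_db([(1, 1), (1, 0)]), 42))  # 0: one active cover uncapped
print(are_contracts_capped(make_db([(0, 0), (1, 1)]), 42))  # 1: the uncapped cover is inactive
```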
Casting in the SELECT would be CPU and memory bound. I'm not sure how much in this case; under normal circumstances we usually try to optimize for IO first, and then worry about CPU and memory second. So I don't have a definite answer for you there.
That said, the problem with this particular solution is that it won't short-circuit. SQL Server will read all rows where ContractVersionId = @ContractVersionId and IsActive = 1, convert IsCapped to an INT, and take the min, when really you can quit as soon as you find a single row where IsCapped = 0. It won't matter much if ContractVersionId is highly selective and only returns a very small fraction of the table, or if most rows are capped. But if ContractVersionId is not highly selective, or if a high percentage of the rows are uncapped, then you are asking SQL Server to do too much work.
The second consideration is that scalar-valued functions are a notorious performance drag in SQL Server. It is better to create it as an inline table-valued function if possible, e.g.:
create function AreAllCapped(@ContractVersionId int)
returns table as return (
select
ContractVersionId = @ContractVersionId
, AreAllCapped = case when exists (
select *
from ContractCover cc
join ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
where crv.ContractVersionId = @ContractVersionId
and cc.IsActive = 1
and IsCapped = 0
)
then 0 else 1 end
)
Which you then can call using CROSS APPLY in the FROM clause (assuming SQL 2005 or later).
Final note: taking the count where IsCapped = 0 has similar problems. It's like the difference between Any() and Count() in LINQ, if you are familiar. Any() will short-circuit, Count() has to actually count all the elements. SELECT COUNT(*) ... WHERE IsCapped = 0 still has to count all the rows, even though a single row is all you need to move on.
Of course, a bit column can't be passed as an argument to aggregate functions such as MIN, MAX or SUM (so if it needs to be aggregated, you have to cast it to an integer first), but bit columns can be sorted. Your query could therefore be rewritten like this:
SELECT TOP 1 @Ret = IsCapped
FROM ContractCover cc
INNER JOIN ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
AND cc.IsActive = 1
ORDER BY IsCapped;
Note that in this particular query it is assumed that IsCapped can't be NULL. If it can, you'll need to add an additional filter to the WHERE clause:
AND IsCapped IS NOT NULL
Unless, of course, you would actually prefer to return NULL instead of 0, if any.
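The three approaches agree while matching rows exist, but they differ when none do. Below is a minimal sqlite3 sketch, with SQLite as a stand-in and a simplified hypothetical Covers table in place of the join. SQLite needs no cast because it has no bit type. For the TOP 1 form, None models a T-SQL variable that was never assigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Covers (VersionId INTEGER, IsCapped INTEGER)")  # hypothetical
conn.executemany("INSERT INTO Covers VALUES (?, ?)", [(1, 1), (1, 1), (2, 1), (2, 0)])

def min_approach(version_id):
    # MIN over an empty set is NULL.
    return conn.execute(
        "SELECT MIN(IsCapped) FROM Covers WHERE VersionId = ?", (version_id,)
    ).fetchone()[0]

def top1_approach(version_id):
    row = conn.execute(
        "SELECT IsCapped FROM Covers WHERE VersionId = ? ORDER BY IsCapped LIMIT 1",
        (version_id,),
    ).fetchone()
    return None if row is None else row[0]  # no row: @Ret is never assigned

def exists_approach(version_id):
    # No uncapped row (including no rows at all) counts as "all capped".
    return conn.execute(
        "SELECT CASE WHEN EXISTS (SELECT 1 FROM Covers WHERE VersionId = ? AND IsCapped = 0) "
        "THEN 0 ELSE 1 END",
        (version_id,),
    ).fetchone()[0]

for version_id in (1, 2, 3):  # version 3 has no rows at all
    print(version_id, min_approach(version_id), top1_approach(version_id), exists_approach(version_id))
# 1 1 1 1
# 2 0 0 0
# 3 None None 1
```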
As for the cost of casting, I don't really have anything to add to what has already been said by Filip and Peter. I do find it a nuisance that bit data require casting before aggregating, but that's never something of a primary concern.

Check Constraints and Case Statement

I need help with this check constraint, I get the following error message: "Msg 102, Level 15, State 1, Line 14
Incorrect syntax near '='."
Or maybe the question I should ask is whether this is possible using a check constraint at all.
What I am trying to achieve is: if InformationRestricted is true, InformationNotRestricted cannot be true, and none of InformationRestrictedFromLevel1, InformationRestrictedFromLevel2, InformationRestrictedFromLevel3, InformationRestrictedFromLevel4 or InformationRestrictedFromLevel5 can be true.
I am not trying to assign values to the columns, just trying to ensure the values of the columns = 0 (i.e. false) if InformationRestricted is True
Here is the script:
CREATE TABLE EmployeeData
(FirstName varchar(50),
LastName varchar(50),
Age int,
Address varchar(100),
InformationRestricted bit,
InformationNotRestricted bit,
InformationRestrictedFromLevel1 bit,
InformationRestrictedFromLevel2 bit,
InformationRestrictedFromLevel3 bit,
InformationRestrictedFromLevel4 bit,
InformationRestrictedFromLevel5 bit);
ALTER TABLE EmployeeData ADD CONSTRAINT ck_EmployeeData
CHECK (CASE WHEN InformationRestricted = 1 THEN InformationNotRestricted = 0 --InformationRestricted is true, InformationNotRestricted is false
AND( InformationRestrictedFromLevel1 = 0 --is false
OR InformationRestrictedFromLevel2 = 0 --is false
OR InformationRestrictedFromLevel3 = 0 --is false
OR InformationRestrictedFromLevel4 = 0 --is false
OR InformationRestrictedFromLevel5 = 0)); --is false
A CASE expression is something that returns a value of a particular data type (the type to be determined by the various datatypes of each THEN clause).
SQL Server doesn't have a boolean data type, so a THEN clause can't return the result of a comparison operation.
Try adding additional comparisons into WHEN clauses, and having the THENs return either 1 or 0, if you want to allow or disallow the outcome (respectively). Then compare the overall result to 1.
I can't parse out the sense of your condition entirely, but something like:
CHECK(CASE WHEN InformationRestricted = 1 THEN
CASE WHEN InformationNotRestricted = 0 AND
(InformationRestrictedFromLevel1 = 0 --is false
OR InformationRestrictedFromLevel2 = 0 --is false
OR InformationRestrictedFromLevel3 = 0 --is false
OR InformationRestrictedFromLevel4 = 0 --is false
OR InformationRestrictedFromLevel5 = 0)
THEN 1
ELSE 0
END
--Other conditions?
END = 1)
My confusion is that I'd have thought you'd want to check that one and only one of the InformationRestrictedFromXXX columns is 1. In fact, from the general description (without knowing more about your problem domain), I'd probably have just created a column InformationRestrictionLevel of type int, with 0 meaning unrestricted and higher values indicating the level it's restricted from.
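Here is a runnable illustration of the "CASE returns 1 or 0, then compare to 1" pattern, as a minimal sqlite3 sketch. SQLite stands in for SQL Server and enforces CHECK constraints the same way. The schema is cut down to two level columns for brevity. The levels are combined with AND, following the question's stated requirement that none of them may be true; this assumes that requirement is the intended one. (A SQL Server CHECK also accepts a plain boolean condition, so the CASE wrapper is one option rather than a requirement.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EmployeeData (
        InformationRestricted           INTEGER,
        InformationNotRestricted        INTEGER,
        InformationRestrictedFromLevel1 INTEGER,
        InformationRestrictedFromLevel2 INTEGER,
        CONSTRAINT ck_EmployeeData CHECK (
            CASE WHEN InformationRestricted = 1 THEN
                     CASE WHEN InformationNotRestricted = 0
                               AND InformationRestrictedFromLevel1 = 0
                               AND InformationRestrictedFromLevel2 = 0
                          THEN 1 ELSE 0 END
                 ELSE 1  -- the rule only applies to restricted rows
            END = 1)
    )
    """
)

def try_insert(values) -> bool:
    """Return True if the row satisfies the CHECK constraint, False if rejected."""
    try:
        conn.execute("INSERT INTO EmployeeData VALUES (?, ?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert((1, 0, 0, 0)))  # True: restricted, every other flag false
print(try_insert((1, 0, 1, 0)))  # False: restricted but a level flag is set
print(try_insert((0, 1, 1, 0)))  # True: not restricted, so the rule does not apply
```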
Looks like you're not closing the case with end. The basic format of a check constraint using case is:
check(case when <condition> then 1 else 0 end = 1)
If you nest multiple cases, be sure to match the number of cases with the number of ends:
check
(
1 =
case
when <condition> then
case
when <condition> then 1
else 0
end
else 0
end
)
Formatting all elements of the same case with the same indentation can be a big help.
