What is the default value of a variable in T-SQL? - sql-server

In the code below, if I run it with "SET @LossScenarioID = NULL" uncommented, the loop breaks, but with that line commented out it loops forever. What is the value of a variable when it is declared, and is that value reset when the variable is declared again, as happens here inside the loop? What is the default value of a variable in SQL?
DECLARE @LossScenario AS TABLE
(
    LossScenarioId INT
  , IsProcessed BIT
)
INSERT INTO @LossScenario
(
    LossScenarioId
  , IsProcessed
)
VALUES (220, 0)
     , (221, 0)
WHILE 1=1
BEGIN
    DECLARE @LossScenarioID INT
    --SET @LossScenarioID = NULL
    SELECT TOP 1 @LossScenarioID = LossScenarioId
    FROM @LossScenario
    WHERE IsProcessed = 0
    IF @LossScenarioID IS NULL
    BEGIN
        BREAK
    END
    UPDATE @LossScenario
    SET IsProcessed = 1
    WHERE LossScenarioId = @LossScenarioID
END

The second time round the loop, the @LossScenarioID variable does NOT automatically get re-initialized to NULL. And, if the TOP 1 query finds no matching rows, that does NOT re-initialize the variable to NULL either. No assignment to the variable takes place, so it retains the previous ID it held, which is why the loop never ends.
This is expected, although I have seen this trip people up before! So you should definitely manually re-initialize the variable to NULL at the start of each iteration.
On a wider note, RBAR (Row-By-Agonizing-Row) operations should generally be avoided whenever possible, in favour of set-based approaches. You'll find you get better performance, a more scalable solution, and avoid certain traps like this.
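As a minimal sketch of both points, assuming the same two sample rows as in the question: the loop below resets the variable at the top of every iteration so that it terminates, and the last statement shows a set-based equivalent that needs no loop at all.

```sql
DECLARE @LossScenario TABLE (LossScenarioId INT, IsProcessed BIT);
INSERT INTO @LossScenario (LossScenarioId, IsProcessed)
VALUES (220, 0), (221, 0);

DECLARE @LossScenarioID INT;   -- declared once; starts out as NULL

WHILE 1 = 1
BEGIN
    -- Explicit reset: if the SELECT below finds no row, it assigns nothing,
    -- so without this line the variable would keep the previous ID.
    SET @LossScenarioID = NULL;

    SELECT TOP (1) @LossScenarioID = LossScenarioId
    FROM @LossScenario
    WHERE IsProcessed = 0;

    IF @LossScenarioID IS NULL
        BREAK;

    UPDATE @LossScenario
    SET IsProcessed = 1
    WHERE LossScenarioId = @LossScenarioID;
END;

-- Set-based equivalent of the whole loop (reset the flags first so that
-- this statement has something to do in the same batch).
UPDATE @LossScenario SET IsProcessed = 0;
UPDATE @LossScenario SET IsProcessed = 1 WHERE IsProcessed = 0;
```

The single UPDATE touches every unprocessed row in one statement, so there is no loop variable left to go stale.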

Related

SQL Server CHOOSE() function behaving unexpectedly with RAND() function

I've encountered an interesting SQL Server behaviour while trying to generate random values in T-SQL using the RAND and CHOOSE functions.
My goal was to try to return one of two given values using RAND() as the RNG. Pretty easy, right?
For those of you who don't know it, the CHOOSE function accepts an index number (int) along with a collection of values and returns the value at the specified index. Pretty straightforward.
At first attempt my SQL looked like this:
select choose(ceiling((rand()*2)) ,'a','b')
To my surprise, this expression returned one of three values: null, 'a' or 'b'. Since I didn't expect the null value, I started digging. The RAND() function returns a float in the range from 0 (included) to 1 (excluded). Since I'm multiplying it by 2, it should return values anywhere in the range from 0 (included) to 2 (excluded). Therefore, after applying the CEILING function, the final value should be one of: 0, 1, 2. After realising that, I extended the value list by 'c' to check whether that would perhaps be returned. I also checked the docs page of CEILING and learnt that:
Return values have the same type as numeric_expression.
I had assumed the CEILING function returned int, but in this case that would mean the value is implicitly cast to int before being used in CHOOSE, which sure enough is stated on the docs page:
If the provided index value has a numeric data type other than int,
then the value is implicitly converted to an integer.
Just in case I added an explicit cast. My SQL query looks like this now:
select choose(cast(ceiling((rand()*2)) as int) ,'a','b','c')
However, the result set didn't change. To check which values cause the problem I tried generating the value beforehand and selecting it alongside the CHOOSE result. It looked like this:
declare @int int = cast(ceiling((rand()*2)) as int)
select @int,choose( @int,'a','b','c')
Interestingly enough, now the result set changed to (1,a), (2,b), which was my original goal. After delving deeper into the CHOOSE docs page and some testing, I learned that NULL is returned in one of two cases:
Given index is a null
Given index is out of range
In this case that would mean that index value when generated inside the SELECT statement is either 0 or above 2/3 (I'm assuming that negative numbers are not possible here and CHOOSE function indexes from 1). As I've stated before 0 should be one of possibilities of:
ceiling((rand()*2))
but for some reason it's never 0 (at least when I tried it more than a million times, like this):
set nocount on
declare @test table(ceiling_rand int)
declare @counter int = 0
while @counter<1000000
begin
    insert into @test
    select ceiling((rand()*2))
    set @counter=@counter+1
end
select distinct ceiling_rand from @test
Therefore I assume that the value generated in the SELECT is greater than 2/3 or NULL. Why would it be like this only when generated in the SELECT statement? Perhaps the order of resolving CAST, CEILING or RAND inside SELECT is different than it would seem? It's true I've only tried it a limited number of times, but at this point the chances of it being a statistical fluctuation are extremely small. Is it somehow a floating-point error? I truly am stumped and looking forward to any explanation.
TL;DR: When generating a random number inside a SELECT statement, the set of possible result values is different than when it's generated before the SELECT statement.
Cheers,
NFSU
EDIT: Formatting
You can see what's going on if you look at the execution plan.
SET SHOWPLAN_TEXT ON
GO
SELECT (select choose(ceiling((rand()*2)) ,'a','b'))
Returns
|--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(2) THEN 'b' ELSE NULL END END)))
The CHOOSE is expanded out to
SELECT CASE
WHEN ceiling(( rand() * 2 )) = 1 THEN 'a'
ELSE
CASE
WHEN ceiling(( rand() * 2 )) = 2 THEN 'b'
ELSE NULL
END
END
and rand() is referenced twice. Each evaluation can return a different result.
You will get the same problem with the below rewrite being expanded out too
SELECT CASE ceiling(( rand() * 2 ))
WHEN 1 THEN 'a'
WHEN 2 THEN 'b'
END
Avoid CASE for this and any of its variants.
One method would be
SELECT JSON_VALUE ( '["a", "b"]' , CONCAT('$[', FLOOR(rand()*2) ,']') )

WHERE clause in SQL ignoring OR

I have a stored procedure that calculates the number of documents I have that are not in complete (1000) or canceled (1100) status. When I have just one of those conditions it counts the number correctly, but once I add the OR, it simply grabs everything, ignoring the logic entirely. There must be some fundamental thing I'm missing with SQL here.
SELECT
    [PartnerCoId] as DisplayID
    ,[PartnerCompanyName] as DisplayName
    ,Count(*) as DocumentTotal
FROM vwDocuments
where coid = @inputCoid and DocumentType = 'order'
    and Status <> 1100
    and PartnerCoId <> @inputCoid
group by
    [PartnerCoId]
    ,[PartnerCompanyName]
union all
SELECT [CoId] as DisplayID
    ,[CompanyName] as DisplayName
    ,Count(*) as DocumentTotal
FROM vwDocuments
where PartnerCoId = @inputCoid and DocumentType = 'order'
    and Status <> 1100
    and CoId <> @inputCoid
group by [CoId]
    ,[CompanyName]
order by [DisplayName]
This will return the number of documents not in canceled status. If I change the 1100 to 1000 it returns the number of documents not in complete status. Once I update the query to:
and (Status <> 1100 or Status <> 1000)
It breaks the logic.
Thoughts? I have tried quite a number of different combinations of query logic and cannot straighten this out.
Rather than wrestle with boolean logic, use not in:
and Status not in (1100, 1000)
It's easier to read and understand, because it's practically English, and because it's all in one statement you don't need brackets around it either.
If I understand you correctly, you want everything where Status is neither 1100 nor 1000.
If so, then you need this:
and (Status <> 1100 and Status <> 1000)
If you use or, then a Status of 1100 will pass the test because it is <> 1000.
I think your logic may be a little bit off. Saying (NOT complete OR NOT cancelled) will return all of your documents, because every row satisfies at least one of the two conditions.
Take a document with status of 1100. This will be returned in the query because it evaluates to TRUE on half of the OR statement.
Try replacing that line with
AND Status NOT IN (1000,1100)
This should return only documents that are neither completed nor cancelled. Hope this helps!
Let's simplify and think about your logic.
DECLARE @i INT = 1;
IF (@i <> 1 OR @i <> 2)
    PRINT 'true';
ELSE
    PRINT 'false';
Can you provide a value for @i that generates false? I can't think of one. Think about it:
if @i = 1, then @i <> 1 is false, but @i <> 2 is true. Since only one of these conditions has to be true for the whole condition to be true, the result is true.
if @i = 2, then @i <> 1 is true, but @i <> 2 is false. Since only one of these conditions has to be true for the whole condition to be true, the result is true.
if @i is any other value outside of 1 and 2, then both conditions are true.
As the other answers demonstrate, the way to fix this is to change OR to AND, or use NOT IN (though I like NOT IN less because when you are checking a column, and the column is NULLable, the results surprise and confuse most people - and I prefer to program consistently instead of having to be aware of the cases where something works and the cases where it really, really doesn't).
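The NULL caveat about NOT IN mentioned above can be sketched as follows. This is an illustration only, assuming the default ANSI_NULLS ON setting; the table variables @Docs and @Excluded and their rows are hypothetical and not taken from the question.

```sql
-- Hypothetical sample data: one document has no status at all.
DECLARE @Docs TABLE (Id INT, Status INT NULL);
INSERT INTO @Docs (Id, Status)
VALUES (1, 1000), (2, 1100), (3, 500), (4, NULL);

-- Literal list: both forms return only Id 3. The NULL row (Id 4) is
-- dropped in both cases because the comparison evaluates to UNKNOWN.
SELECT Id FROM @Docs WHERE Status NOT IN (1000, 1100);
SELECT Id FROM @Docs WHERE Status <> 1000 AND Status <> 1100;

-- The surprise: a NULL inside the list being compared against.
DECLARE @Excluded TABLE (Status INT NULL);
INSERT INTO @Excluded (Status) VALUES (1000), (NULL);

-- Returns no rows at all, since "Status <> NULL" is never TRUE.
SELECT Id FROM @Docs
WHERE Status NOT IN (SELECT Status FROM @Excluded);

-- NOT EXISTS is unaffected by the NULL and returns Ids 2, 3 and 4.
SELECT d.Id FROM @Docs AS d
WHERE NOT EXISTS (SELECT 1 FROM @Excluded AS e WHERE e.Status = d.Status);
```

With a fixed list of non-NULL literals, as in the question, NOT IN and the AND form behave the same; the trap only appears once a NULL can end up in the list.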

Querying all records are true in sql-server - is casting expensive performance wise

I have a table with a column of bit values. I want to write a function that returns true if all records of an associated item are true.
One way I found of doing it is:
Select @Ret = CAST(MIN(CAST(IsCapped as tinyInt)) As Bit)
from ContractCover cc
Inner join ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
AND cc.IsActive = 1
return @Ret
But is the casting to int to get the minimum expensive? Should I instead just be querying based on say:
(count(Id) where IsCapped = 0 > 0) returning false rather than doing the multiple casts?
In the execution plan it doesn't seem like calling this function is heavy in the execution (but I'm not too familiar with analysing query plans - it just seems to have the same % cost as another section of the stored proc of like 2%).
Edit - when I execute the stored proc which calls the function and look at the execution plan - the part where it calls the function has a query cost (relative to the batch) : 1% which is comparable to other sections of the stored proc. Unless I'm looking at the wrong thing :)
Thanks!!
I would do this with an EXISTS check, as it will jump out of the query the moment it finds one record where IsCapped = 0, whereas your query will always read all the data.
CREATE FUNCTION dbo.fn_are_contracts_capped(@ContractVersionId int)
RETURNS bit
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @return_value bit
    IF EXISTS(
        SELECT 1
        FROM dbo.ContractCover cc
        JOIN dbo.ContractRiskVersion crv
            ON cc.ContractRiskId = crv.ContractRiskId
        WHERE crv.ContractVersionId = @ContractVersionId
        AND cc.IsActive = 1
        AND IsCapped = 0)
    BEGIN
        SET @return_value = 0
    END
    ELSE
    BEGIN
        SET @return_value = 1
    END
    RETURN @return_value
END
Compared to the IO required to read the data, the cast will not add a lot of overhead.
Edit: wrapped code in a scalar function.
Casting in the SELECT would be CPU and memory bound. Not sure how much in this case--under normal circumstances we usually try to optimize for IO first, and then worry about CPU and memory second. So I don't have a definite answer for you there.
That said, the problem with this particular solution is that it won't short-circuit. SQL Server will read all rows where ContractVersionId = @ContractVersionId and IsActive = 1, convert IsCapped to a tinyint, and take the minimum, when really you can quit as soon as you find a single row where IsCapped = 0. It won't matter much if ContractVersionId is highly selective and only returns a very small fraction of the table, or if most rows are capped. But if ContractVersionId is not highly selective, or if a high percentage of the rows are uncapped, then you are asking SQL Server to do too much work.
Second consideration is that scalar-valued functions are a notorious performance drag in SQL Server. It is better to create as an in-line table function if possible, eg:
create function AreAllCapped(@ContractVersionId int)
returns table as return (
    select
        ContractVersionId = @ContractVersionId
      , AreAllCapped = case when exists (
            select *
            from ContractCover cc
            join ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
            where crv.ContractVersionId = @ContractVersionId
              and cc.IsActive = 1
              and IsCapped = 0
        )
        then 0 else 1 end
)
Which you then can call using CROSS APPLY in the FROM clause (assuming SQL 2005 or later).
Final note: taking the count where IsCapped = 0 has similar problems. It's like the difference between Any() and Count() in LINQ, if you are familiar. Any() will short-circuit, Count() has to actually count all the elements. SELECT COUNT(*) ... WHERE IsCapped = 0 still has to count all the rows, even though a single row is all you need to move on.
It is true that a bit column can't be passed as an argument to an aggregate function such as MIN or MAX (so, if it needs to be passed, you have to cast it to an integer type first), but bit columns can be sorted by. Your query, therefore, could be rewritten like this:
SELECT TOP 1 @Ret = IsCapped
FROM ContractCover cc
INNER JOIN ContractRiskVersion crv on cc.ContractRiskId = crv.ContractRiskId
WHERE crv.ContractVersionId = @ContractVersionId
AND cc.IsActive = 1
ORDER BY IsCapped;
Note that in this particular query it is assumed that IsCapped can't be NULL. If it can, you'll need to add an additional filter to the WHERE clause:
AND IsCapped IS NOT NULL
Unless, of course, you would actually prefer to return NULL instead of 0, if any.
As for the cost of casting, I don't really have anything to add to what has already been said by Filip and Peter. I do find it a nuisance that bit data require casting before aggregating, but that's never something of a primary concern.

Is there an Alternate for Where is Null using Where = Null?

If not, is there an alternate way to switch through SELECT statements using a CASE or IF/THEN identifier WITHOUT putting the statement in a scalar variable first?
Is there a way to format this without using IS and using an = sign for it to work?
SELECT ID FROM TABLE WHERE ID = Null
No. NULL isn't a value. Think of NULL as a condition, with IS NULL and IS NOT NULL testing for that condition.
In this example you can test for the actual value, or for the lack of a value represented by the condition:
WHERE
    (X IS NULL OR X = @X)
or
WHERE
    (@X IS NULL OR X = @X)
Or test for your definite conditions first:
WHERE
CASE X
WHEN 1 THEN
WHEN 2 THEN
ELSE -- includes NULL
END = ...
Your question is abstract so hard to give a more precise answer.
For example, are you having problems with NOT IN and NULL? If so, use NOT EXISTS.
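To make the first point of this answer concrete, here is a small sketch comparing `= NULL` with `IS NULL`. It assumes the default ANSI_NULLS ON setting; the table variable @T and its rows are hypothetical.

```sql
-- Hypothetical sample data: one real ID and one missing ID.
DECLARE @T TABLE (ID INT NULL);
INSERT INTO @T (ID) VALUES (1), (NULL);

-- "ID = NULL" evaluates to UNKNOWN for every row, so nothing matches.
SELECT COUNT(*) AS MatchesWithEquals
FROM @T
WHERE ID = NULL;      -- 0

-- IS NULL tests for the condition itself and finds the row.
SELECT COUNT(*) AS MatchesWithIsNull
FROM @T
WHERE ID IS NULL;     -- 1
```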

How to manage NULL values with numeric fields in cursor?

How can I manage NULL values in numeric fields returned by a cursor's SELECT statement, so that arithmetic operations are handled efficiently?
Don't use cursors.
If you must (really?), you can use the ISNULL function:
SELECT ISNULL(fieldname, 0)
will give you a "0" (zero) instead of NULL.
ISNULL(value, replacement)
will replace value with replacement if value IS NULL
http://msdn.microsoft.com/en-us/library/ms184325.aspx
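A short sketch of why the replacement matters for arithmetic: any expression with a NULL operand yields NULL, so the substitution has to happen before the calculation. The variable names @Price and @Discount are hypothetical.

```sql
-- Hypothetical values: a price and a discount that was never set.
DECLARE @Price DECIMAL(10, 2) = 10.00,
        @Discount DECIMAL(10, 2) = NULL;

SELECT
    @Price - @Discount              AS RawResult,    -- NULL: the NULL propagates
    @Price - ISNULL(@Discount, 0)   AS WithIsNull,   -- the price, unchanged
    @Price - COALESCE(@Discount, 0) AS WithCoalesce; -- same value; COALESCE is the ANSI-standard form
```

The same expressions work unchanged on a column in a query or on a variable filled by FETCH.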
Assuming you cannot avoid a cursor in the first place, I don't understand why a NULL would be handled much differently in a variable than it would be in a query: there is ISNULL, COALESCE, CASE WHEN etc.
One interesting thing:
DECLARE @v as int -- initialized to NULL
{ -- loop through a cursor
FETCH NEXT INTO @v
}
You won't necessarily be able to distinguish an uninitialized @v from a @v that was set to NULL by the last row fetched.

Resources