SQL Cast Mystery - sql-server

I have a real mystery with the T-SQL below. As it is, it works with either the DATAP.Private=1, or the cast as int on Right(CRS,1). That is, if I uncomment the DATAP.Private=1, I get the error Conversion failed when converting the varchar value 'M' to data type int, and if I then remove that cast, the query works again. With the cast in place, the query only works without the Private=1.
I cannot for the life of me see how the Private=1 can add anything to the result set that will cause the error, unless Private is ever 'M', but Private is a bit field!
SELECT
cast(Right(CRS,1) as int) AS Company
, cast(PerNr as int) AS PN
, Round(Sum(Cost),2) AS Total_Cost
FROM
DATAP
LEFT JOIN BU_Summary ON DATAP.BU=BU_Summary.BU
WHERE
DATAP.Extension Is Not Null
--And DATAP.Private=1
And Left(CRS,2)='SB'
And DATAP.PerNr Between '1' And '9A'
and Right(CRS,1) <> 'm'
GROUP BY
cast(Right(CRS,1) as int)
, cast(PerNr as int)
ORDER BY
cast(PerNr as int)

I've seen something like this in the past. It's possible the DATAP.Private = 1 clause is generating a query plan that is performing the CRS cast before the Right(CRS,1) <> 'm' filter is applied.
It sure shouldn't do that, but I've had similar problems in T-SQL I've written, particularly when views are involved.
You might be able to reorder the elements to get the query to work, or select uncast data values into a temporary table or table variable and then select and cast from there in a separate statement as a stopgap.
If you check the execution plan it might shed some light about what is being calculated where. This might give you more ideas as to what you might change.
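The temporary-table stopgap suggested above might look like the following minimal sketch. It assumes the tables and columns from the question; the names #Filtered, CompanyChar and PerNrChar are made up for illustration. The first statement applies the filters without casting anything, and the second casts only the rows that survived them.

```sql
-- Step 1: filter first, keeping the raw varchar values (no casts yet).
-- #Filtered, CompanyChar and PerNrChar are illustrative names, not from the question.
SELECT
    RIGHT(CRS, 1) AS CompanyChar,
    PerNr         AS PerNrChar,
    Cost
INTO #Filtered
FROM DATAP
LEFT JOIN BU_Summary ON DATAP.BU = BU_Summary.BU
WHERE DATAP.Extension IS NOT NULL
  AND DATAP.Private = 1
  AND LEFT(CRS, 2) = 'SB'
  AND DATAP.PerNr BETWEEN '1' AND '9A'
  AND RIGHT(CRS, 1) <> 'm';

-- Step 2: cast in a separate statement, so the optimizer cannot move
-- the casts ahead of the filters above.
SELECT
    CAST(CompanyChar AS int) AS Company,
    CAST(PerNrChar AS int)   AS PN,
    ROUND(SUM(Cost), 2)      AS Total_Cost
FROM #Filtered
GROUP BY CAST(CompanyChar AS int), CAST(PerNrChar AS int)
ORDER BY PN;

DROP TABLE #Filtered;
```

Any non-numeric values that pass the filters (a PerNr of '9A', for instance) would still fail in step 2. On SQL Server 2012 or later, TRY_CAST(... AS int) is another option: it returns NULL instead of raising an error when a value cannot be converted.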

Just a guess, but it may be that when Private = 1, PerNr cannot be anything but a castable number in your data. As the query is written, PerNr can equal '9A' (or whatever else), which would break the cast in the GROUP BY and ORDER BY clauses.

CAST('9A' AS int) fails when I tested it. It looks like you're doing unnecessary CASTS for grouping and sorting. Particularly in the GROUP BY, that will at least kill any chance for optimizing.

Related

SQL SUBSTRING & PATINDEX of varying lengths

SQL Server 2017.
Given the following 3 records with field of type nvarchar(250) called fileString:
_318_CA_DCA_2020_12_11-01_00_01_VM6.log
_319_CA_DCA_2020_12_12-01_VM17.log
_333_KF_DCA01_00_01_VM232.log
I would want to return:
VM6
VM17
VM232
Attempted thus far with:
SELECT
SUBSTRING(fileString, PATINDEX('%VM[0-9]%', fileString), 3)
FROM dbo.Table
But of course that only returns VM and 1 number.
How would I define the parameter for number of characters when it varies?
EDIT: to pre-emptively answer a question that may come up: yes, the VM pattern will always be followed immediately by .log and nothing else. But even if I took that approach and worked backwards, I still don't understand how to define the number of characters to take when the number varies.
Here is one way:
DECLARE @test TABLE( fileString varchar(500))
INSERT INTO @test VALUES
('_318_CA_DCA_2020_12_11-01_00_01_VM6.log')
,('_319_CA_DCA_2020_12_12-01_00_01_VM17.log')
,('_333_KF_DCA_2020_12_15-01_00_01_VM232.log')
-- 5 is the length of file extension + 1 which is always the same size '.log'
SELECT
REVERSE(SUBSTRING(REVERSE(fileString),5,CHARINDEX('_',REVERSE(fileString))-5))
FROM @test AS t
This will dynamically grab the length and location of the last _ and remove the .log.
It is not the most efficient approach. If you are able to write a CLR function using C# and import it into SQL Server, that will be much more efficient. Or you can use this as a starting point and tweak it as needed.
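To see why the REVERSE approach copes with a varying number of digits, it can help to look at the intermediate values for one file name. This is only an illustrative breakdown; the variable @s and the column aliases are made up here.

```sql
DECLARE @s varchar(500) = '_318_CA_DCA_2020_12_11-01_00_01_VM6.log';

SELECT
    -- The reversed string starts with 'gol.6MV_'
    REVERSE(@s) AS reversed,
    -- 8: the first '_' in the reversed string is the last '_' in the original
    CHARINDEX('_', REVERSE(@s)) AS first_underscore,
    -- Skip the 4 characters of 'gol.' (start at 5) and stop before the '_': '6MV'
    SUBSTRING(REVERSE(@s), 5, CHARINDEX('_', REVERSE(@s)) - 5) AS reversed_token,
    -- Reverse again to restore the original order: 'VM6'
    REVERSE(SUBSTRING(REVERSE(@s), 5, CHARINDEX('_', REVERSE(@s)) - 5)) AS token;
```

Because the length is computed from the position of the underscore, a longer token such as VM232 simply moves that position further out.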
You can remove the variable and replace it with your table like below
DECLARE @TESTVariable as varchar(500)
Set @TESTVariable = '_318_CA_DCA_2020_12_11-01_00_01_VM6adf.log'
SELECT REPLACE(SUBSTRING(@TESTVariable, PATINDEX('%VM[0-9]%', @TESTVariable), PATINDEX('%[_]%', REVERSE(@TESTVariable))), '.log', '')
select *,
part = REPLACE(SUBSTRING(filestring, PATINDEX('%VM[0-9]%', filestring), PATINDEX('%[_]%', REVERSE(filestring))), '.log', '')
from table
Your lengths are consistent at the beginning. So get away from patindex and use substring to crop out the beginning. Then just replace the '.log' with an empty string at the end.
select *,
part = replace(substring(filestring,33,255),'.log','')
from table;
Edit:
Okay, from your edit you show differing prefix portions. Then patindex is in fact correct. Here's my solution, which is not better or worse than the other answers but differs with respect to the fact that it avoids reverse and delegates the patindex computation to a cross apply section. You may find it a bit more readable.
select filestring,
part = replace(substring(filestring, ap.vmIx, 255),'.log','')
from table
cross apply (select
vmIx = patindex('%_vm%', filestring) + 1
) ap
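Another way to answer the "how many characters" question directly is to compute the length as the distance between the start of the VM token and the .log that follows it. The sketch below is not taken from any of the answers above; it assumes, as the question states, that .log always follows the VM token. The table variable @files and the alias vmStart are illustrative names.

```sql
DECLARE @files TABLE (fileString nvarchar(250));
INSERT INTO @files VALUES
    (N'_318_CA_DCA_2020_12_11-01_00_01_VM6.log'),
    (N'_319_CA_DCA_2020_12_12-01_VM17.log'),
    (N'_333_KF_DCA01_00_01_VM232.log');

SELECT
    f.fileString,
    -- Length = position of '.log' (searched from the VM token onward) minus the token start
    SUBSTRING(f.fileString, p.vmStart,
              CHARINDEX('.log', f.fileString, p.vmStart) - p.vmStart) AS part
FROM @files AS f
CROSS APPLY (SELECT PATINDEX('%VM[0-9]%', f.fileString) AS vmStart) AS p
WHERE p.vmStart > 0;  -- skip rows with no VM token at all
```

For the three sample rows this returns VM6, VM17 and VM232. If a row had a VM token but no .log after it, the computed length would be negative and SUBSTRING would raise an error, so the assumption matters.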

SSRS: SUM values if contending with NULLS

In SSMS, this is easy with an ISNULL function. So in SSRS, how would I go about getting a SUM value from two columns where one of them has NULLS?
Here's an example of the data:
So if a pro_rate value exists, I want to use it in the aggregation, but if it is NULL, then I need to pull from the cost column. Here's my most recent attempt at the expression:
=iif(isnothing(sum(Fields!pro_rate.Value), sum(Fields!cost.Value), sum(Fields!pro_rate.Value)))
In SQL, I would simply write sum(isnull(pro_rate, cost)) (to also make clear what my result should be), but I keep getting the dreaded #Error in my report.
UPDATE 1: I altered my query a bit to the following and got the #Error.
=sum(iif(isnothing(Fields!pro_rate.Value), Fields!cost.Value, Fields!pro_rate.Value))
After looking at your data and the expressions being used, I came to the conclusion that you are seeing that error message due to a conversion error. It looks like your cost field is an INT datatype as the values all seem to be rounded off to the nearest integer and the pro_rate field appears to be a DECIMAL datatype. This appears to cause a problem when you are attempting to SUM an integer into a decimal without the proper level of precision.
To test this, I created the following simple dataset:
CREATE TABLE #temp (A VARCHAR(2), B INT, C DECIMAL(4,2))
INSERT INTO #temp (A,B,C)
VALUES ('A', '5', '1.7'), ('B', '10', '2.6'), ('C', '9', NULL)
SELECT * FROM #temp
With this data, I was able to test and confirm that my theory was correct. You should be able to fix your #Error by putting a CDec function around the cost field in the expression provided by Derrick Moeller.
It looks to me like you want to evaluate each row individually and you have your order of operations incorrect.
=sum(iif(isnothing(Fields!pro_rate.Value), Fields!cost.Value, Fields!pro_rate.Value))
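For comparison, the row-by-row fallback that this expression is meant to reproduce can be checked in T-SQL. The sketch below uses a made-up table variable @rates with the same three test rows as the dataset above, with cost as an int and pro_rate as a decimal (the data types the answer infers, not confirmed ones). The explicit CAST plays the role that CDec plays in the report expression.

```sql
DECLARE @rates TABLE (cost int, pro_rate decimal(4,2));
INSERT INTO @rates VALUES (5, 1.7), (10, 2.6), (9, NULL);

-- Per row: use pro_rate when present, otherwise fall back to cost.
SELECT SUM(ISNULL(pro_rate, CAST(cost AS decimal(10,2)))) AS total  -- 1.70 + 2.60 + 9.00 = 13.30
FROM @rates;
```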

SQL Conversion failed when converting the varchar value '*' to data type int

Aside: Please note that this is not a duplicate of the countless other string-to-int issues in SQL. This is a specific, cryptic error message (with the asterisk) that hides another problem.
Today I came across an issue in my SQL that took me all day to solve. I was passing a data table parameter into my stored procedure and then inserting part of it into another table. The code used was similar to the following:
INSERT INTO tblUsers
(UserId,ProjectId)
VALUES ((SELECT CAST(UL.UserId AS int) FROM @UserList AS UL),@ProjectId)
I didn't seem to get the error message when using test data, only when making the call from the dev system.
What is the issue and how should the code look?
I would bet that (SELECT CAST(UL.UserId AS int) FROM @UserList AS UL) returns more than 1 row and your test scenario had only 1 row. But that may be just me.
Anyway, the way the code should look is:
INSERT INTO tblUsers (UserId,ProjectId)
SELECT CAST(UL.UserId AS int),
@ProjectId
FROM @UserList AS UL
After some time trawling through Google and similar places, I have determined that this SQL code is wrong. I believe the correct code is something more like this:
INSERT INTO tblUsers
(UserId,ProjectId)
SELECT CAST(UL.UserId AS int) ,@ProjectId
FROM @UserList AS UL
The issue is that the other way of doing it attempts to insert one record with the data of all of the rows in the table parameter. I needed to remove the VALUES clause. Also, to add other data, I can simply include it in the SELECT list as I would in any other query.
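The difference between the two forms can be reproduced with a small self-contained sketch. Table variables stand in here for tblUsers and the table-valued parameter, and the sample values are made up.

```sql
DECLARE @UserList TABLE (UserId varchar(10));   -- stands in for the table parameter
DECLARE @tblUsers TABLE (UserId int, ProjectId int);
DECLARE @ProjectId int = 42;                    -- arbitrary sample value

INSERT INTO @UserList VALUES ('1'), ('2'), ('3');

-- INSERT ... SELECT inserts one row per row of @UserList.
INSERT INTO @tblUsers (UserId, ProjectId)
SELECT CAST(UL.UserId AS int), @ProjectId
FROM @UserList AS UL;

SELECT UserId, ProjectId FROM @tblUsers;        -- three rows

-- The VALUES form expects the subquery to yield a single value, so with
-- three rows in @UserList it fails with "Subquery returned more than 1 value":
-- INSERT INTO @tblUsers (UserId, ProjectId)
-- VALUES ((SELECT CAST(UL.UserId AS int) FROM @UserList AS UL), @ProjectId);
```

With only one row in @UserList the VALUES form succeeds, which would fit the observation that the problem did not show up with test data.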

Is Sql Server's ISNULL() function lazy/short-circuited?

Is ISNULL() a lazy function?
That is, if I code something like the following:
SELECT ISNULL(MYFIELD, getMyFunction()) FROM MYTABLE
will it always evaluate getMyFunction() or will it only evaluate it in the case where MYFIELD is actually null?
This works fine
declare @X int
set @X = 1
select isnull(@X, 1/0)
But introducing an aggregate makes it fail, proving that the second argument can sometimes be evaluated before the first.
declare @X int
set @X = 1
select isnull(@X, min(1/0))
It's whichever it thinks will work best.
Now it's functionally lazy, which is the important thing. E.g. if col1 is a varchar which will always contain a number when col2 is null, then
isnull(col2, cast(col1 as int))
Will work.
However, it's not specified whether it will try the cast before or simultaneously with the null-check and eat the error if col2 isn't null, or if it will only try the cast at all if col2 is null.
At the very least, we would expect it to obtain col1 in any case because a single scan of a table obtaining 2 values is going to be faster than two scans obtaining one each.
The same SQL commands can be executed in very different ways, because the instructions we give are turned into lower-level operations based on knowledge of the indices and statistics about the tables.
For that reason, in terms of performance, the answer is "when it seems like it would be a good idea it is, otherwise it isn't".
In terms of observed behaviour, it is lazy.
Edit: Mikael Eriksson's answer shows that there are cases that may indeed error due to not being lazy. I'll stick by my answer here in terms of the performance impact, but his is vital in terms of correctness impact across at least some cases.
Judging from the different behavior of
SELECT ISNULL(1, 1/0)
SELECT ISNULL(NULL, 1/0)
the first SELECT returns 1, while the second raises this error:
Msg 8134, Level 16, State 1, Line 4
Divide by zero error encountered.
This "lazy" feature you are referring to is in fact called "short-circuiting"
And it does NOT always work especially if you have a udf in the ISNULL expression.
Check this article where tests were run to prove it:
Short-circuiting (mainly in VB.Net and SQL Server)
T-SQL is a declarative language, so it cannot control the algorithm used to get the results; it just declares what results it needs. It is up to the query engine/optimizer to figure out a cost-effective plan. And in SQL Server, the optimizer uses "contradiction detection", which will never guarantee a left-to-right evaluation as you would assume in procedural languages.
For your example, did a quick test:
Created the scalar-valued UDF to invoke the Divide by zero error:
CREATE FUNCTION getMyFunction
( @MyValue INT )
RETURNS INT
AS
BEGIN
RETURN (1/0)
END
GO
Running the below query did not give me a Divide by zero error encountered error.
DECLARE @test INT
SET @test = 1
SET @test = ISNULL(@test, (dbo.getMyFunction(1)))
SELECT @test
Changing the SET to the statement below did give me the Divide by zero error encountered error (it introduces a SELECT inside ISNULL):
SET @test = ISNULL(@test, (SELECT dbo.getMyFunction(1)))
But with values instead of variables, it never gave me the error.
SELECT ISNULL(1, (dbo.getMyFunction(1)))
SELECT ISNULL(1, (SELECT dbo.getMyFunction(1)))
So unless you really figure out how the optimizer is evaluating these expressions for all permutations, it would be safe to not rely on the short-circuit capabilities of T-SQL.
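When the fallback must not run unless it is needed, one option in procedural code is to move the decision out of the expression and into control flow. The sketch below reuses dbo.getMyFunction from the test above and assumes it already exists; it is one possible pattern, not something the answers here prescribe, and it only applies to variables, not to set-based queries over a table.

```sql
DECLARE @test int = 1;

-- IF is a control-flow statement: the SET below runs only when @test IS NULL,
-- so the function is not invoked for a non-NULL value.
IF @test IS NULL
    SET @test = dbo.getMyFunction(1);

SELECT @test;  -- 1
```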

NULL vs empty string

What is the difference between the queries below, and how do they work?
SELECT * FROM some_table WHERE col IS NOT NULL
&
SELECT * FROM some_table WHERE col <> ''
Regards,
Mubarak
NULL is a special marker rather than a value; it means the absence of a value.
An empty string, on the other hand, is an actual string value that happens to be empty.
The two are different.
For example, suppose you have a name field in a table and its default is NULL. When no value is specified for it, it will be NULL. But if you specify a real name or an empty string, it won't be NULL; it will contain that name or the empty string instead.
NULL is the absence of value, and usually indicates something meaningful, such as unknown or not (yet) determined. For example, if I start a project today, the StartDate is 2012-02-25. If I don't know how long the project is going to take, what should the EndDate be? I might have some idea what the ProjectedEndDate may be, but I would set the EndDate to NULL, and update it when the project is complete.
'' is a zero-length (or "empty") string. It is not technically the absence of data, since it might actually be meaningful. For example, if I don't have a middle name, depending on your data model '' might make more sense than NULL since the latter implies unknown but '' can imply that it is known that I don't have one. NULL can be used the same way of course, but then it is difficult to decipher whether it is not known whether it exists, or known that it does not exist. A lot of standards have dedicated values for things where it might not be known - for example Gender has I believe 9 different character codes so that if M or F are not specified, we always know exactly why (unknown, unspecified, transgender, etc). Also think of the case where HeartRate is NULL - is it because there was no pulse, or because we haven't taken it yet?
They are not the same, though unfortunately many people treat them the same. If your column allows NULL it means that you know in advance that sometimes you may not know this information. If you are not treating them as the same thing, then your queries would differ. For example if col does not allow NULL, your first query will always return all results in the table, since none of them can be NULL. However NOT NULL still allows an empty string to be entered unless you have also set up a check constraint to prevent zero-length strings also.
Allowing both for the same column is usually a bit confusing for someone trying to understand the data model, though I believe in most cases a NOT NULL constraint is not combined with a LEN(col)>0 check constraint. The problem if both are allowed is that it is difficult to know what it means if the column is NULL or the column is "empty" - they could mean the same thing, but they may not - and this will vary from shop to shop.
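The combination of NOT NULL and a length check mentioned above can be sketched as follows. The table #Person and its column are made-up names for illustration.

```sql
CREATE TABLE #Person (
    -- NOT NULL rejects missing values; the CHECK rejects zero-length strings.
    MiddleName varchar(50) NOT NULL CHECK (LEN(MiddleName) > 0)
);

INSERT INTO #Person (MiddleName) VALUES ('Ann');    -- succeeds

-- Each of these would fail if uncommented:
-- INSERT INTO #Person (MiddleName) VALUES (NULL);  -- violates NOT NULL
-- INSERT INTO #Person (MiddleName) VALUES ('');    -- violates the CHECK constraint

DROP TABLE #Person;
```

Since LEN ignores trailing spaces, this check also rejects strings that consist only of spaces.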
Another key point is that NULL compared to anything (at least by default in SQL Server*) evaluates to unknown, and a WHERE clause treats unknown the same as false. As an example, these queries all return 0:
DECLARE @x TABLE(i INT);
INSERT @x VALUES(NULL);
SELECT COUNT(*) FROM @x WHERE i = 1;
SELECT COUNT(*) FROM @x WHERE i <> 1;
SELECT COUNT(*) FROM @x WHERE i <= 3;
SELECT COUNT(*) FROM @x WHERE i > 3;
SELECT COUNT(*) FROM @x WHERE i IN (1,2,3);
SELECT COUNT(*) FROM @x WHERE i NOT IN (1,2,3);
Since the comparisons in the where clause always evaluate to unknown, they always come back false, so no rows ever meet the criteria and all counts come back as 0.
In addition, the answers to this question on dba.stackexchange might be useful:
https://dba.stackexchange.com/questions/5222/why-shouldnt-we-allow-nulls
* You can change this by using SET ANSI_NULLS OFF - however this is not advised both because it provides non-standard behavior and because this "feature" has been deprecated since SQL Server 2005 and will become a no-op in a future version of SQL Server. But you can play with the query above and see that the NOT IN behaves differently with SET ANSI_NULLS OFF.
NULL means the value is missing, but '' means the value is there and is just an empty string.
So the first query selects all rows where the col value is not missing, and the second one selects those rows where col does not equal the empty string.
Update
For further information, I suggest you read this article:
https://sqlserverfast.com/blog/hugo/2007/07/null-the-databases-black-hole/
Select * from table where col IS NOT NULL would return results that are excluded from Select * from table where col <> '', because an empty string is also NOT NULL.
https://data.stackexchange.com/stackoverflow/query/62491/http-stackoverflow-com-questions-9444638-null-vs-empty-in-sql-server
SET NOCOUNT ON;
DECLARE @tbl AS TABLE (value varchar(50) NULL, description varchar(50) NOT NULL);
INSERT INTO @tbl VALUES (NULL, 'A Null'), ('', 'Empty String'), ('Some Text', 'A non-empty string');
SELECT * FROM @tbl;
SELECT * FROM @tbl WHERE value IS NOT NULL;
SELECT * FROM @tbl WHERE value <> '';
Note that in the display you cannot distinguish between NULL and '' - this is only an artifact of how the grid and text client display the data, but the data in the set is stored differently for NULL and ''.
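Since the grid output hides the difference, counting the rows each predicate accepts makes it visible. This is a small sketch using a made-up table variable with the same three kinds of value.

```sql
DECLARE @tbl TABLE (value varchar(50) NULL);
INSERT INTO @tbl VALUES (NULL), (''), ('Some Text');

SELECT
    -- '' and 'Some Text' are both NOT NULL
    SUM(CASE WHEN value IS NOT NULL THEN 1 ELSE 0 END) AS not_null_rows,   -- 2
    -- NULL <> '' is unknown and '' <> '' is false, so only 'Some Text' counts
    SUM(CASE WHEN value <> '' THEN 1 ELSE 0 END) AS non_empty_rows,        -- 1
    -- To treat NULL and '' alike, test for both explicitly
    SUM(CASE WHEN value IS NULL OR value = '' THEN 1 ELSE 0 END) AS null_or_empty_rows  -- 2
FROM @tbl;
```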
As stated in other answers, NULL means 'no value' while empty string '' means just that - empty string. You can think of fields that allow NULLs as optional fields - they can be ignored and value for them may just not be provided.
Imagine an application where a respondent selects their title (Mr, Mrs, Miss, Dr), but you do not require them to select any of those and they can leave it blank. In this case you would put NULL into the relevant database field.
The distinction between NULL and an empty string may not be obvious, because they can both mean 'no value' if you decide so. It is entirely up to you, but using NULL would be better, mainly because it is a special case for databases, which are designed to handle NULLs quickly and efficiently (much faster than strings). If you use it instead of an empty string, your queries will be faster and more reliable.

Resources