T-SQL ORDER BY ignores " '-' + ... " but not " '+' + ... " - sql-server

So I recently encountered a weird bug when comparing two values.
My values ranged from -1 to 2.
Sometimes it thought that -1 was bigger than 0. The solution was easy: the column was apparently set to varchar(50) instead of int.
But this made me wonder why it happened. Even if the column was varchar(50), '-' should have a lower character value than '0' (the char value for '-' is 45 and for '0' it is 48).
I ran some tests and, as far as I can tell, '-' is the only character that ORDER BY doesn't care about.
Example:
SELECT
A.x
FROM
(
VALUES
('-5'), ('-4'), ('-3'), ('-2'), ('-1'),
('0'), ('1'), ('2'), ('3'), ('4'), ('5')
) A(x)
ORDER BY
A.x;
SELECT
B.x
FROM
(
VALUES
('+5'), ('+4'), ('+3'), ('+2'), ('+1'),
('0'), ('1'), ('2'), ('3'), ('4'), ('5')
) B(x)
ORDER BY
B.x
Result:
Result of A
0
1
-1
2
-2
3
-3
4
-4
5
-5
Result of B
+1
+2
+3
+4
+5
0
1
2
3
4
5
(+ has a charvalue of 43)
The '+' ordering feels right, but the '-' ordering seems... wrong.
Does anyone know why it is like this?
Additional info
Server version: 12.0.4213
Collation: Finnish_Swedish_CI_AS
No clue what else could skew the result. Ask if you need more information.

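Found out why.
TL;DR: Non-Unicode and Unicode data can sort '-' differently, depending on the collation. Unicode sorting rules use a "word sort" that ignores the hyphen, and with a Windows collation such as Finnish_Swedish_CI_AS those rules are applied to varchar data as well.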
"A SQL collation's rules for sorting non-Unicode data are incompatible
with any sort routine that is provided by the Microsoft Windows
operating system; however, the sorting of Unicode data is compatible
with a particular version of the Windows sorting rules. Because the
comparison rules for non-Unicode and Unicode data are different, when
you use a SQL collation you might see different results for
comparisons of the same characters, depending on the underlying data
type. For example, if you are using the SQL collation
"SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less
than the string 'ab' because the hyphen ("-") is sorted as a separate
character that comes before "b". However, if you convert these strings
to Unicode and you perform the same comparison, the Unicode string
N'a-c' is considered to be greater than N'ab' because the Unicode
sorting rules use a "word sort" that ignores the hyphen."
Source: https://support.microsoft.com/en-us/kb/322112
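If a strict character-code ordering is what you expected, one option is to apply a binary collation in the ORDER BY. This is only a sketch: the choice of Latin1_General_BIN2 is my assumption and not part of the original question. Under a binary collation '-' (45) sorts before '0' (48), so the rows starting with '-' should come first.

```sql
-- Sketch: force code-point ordering with a binary collation.
-- Latin1_General_BIN2 is an assumed choice, not the asker's setup.
SELECT
    A.x
FROM
    (
        VALUES
            ('-5'), ('-4'), ('-3'), ('-2'), ('-1'),
            ('0'), ('1'), ('2'), ('3'), ('4'), ('5')
    ) A(x)
ORDER BY
    A.x COLLATE Latin1_General_BIN2;  -- compares by character code, hyphen included
```

This is still a string sort ('-1' comes before '-5'), so storing the values as int remains the proper fix for numeric data.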

Related

SQL String are same but case equals method returns false

I am using SQL to compare two columns and return TRUE/FALSE if they are equal.
In some cases, the two columns contain exactly the same string (no spaces or anything) but I am still getting false.
What may the reason for this be?
I am using this code:
CASE WHEN column1 = column2 THEN 0 ELSE 1 END AS [check]
The values are different despite the displayed value.
Using T-SQL, run a query like this to see the exact difference in the underlying raw values:
SELECT
column1
, CAST(column1 AS varbinary(MAX)) AS column1Binary
, column2
, CAST(column2 AS varbinary(MAX)) AS column2Binary
FROM dbo.YourTable;
This will reveal underlying differences like tabs or subtle character differences.
In fact, a likely explanation for what you are seeing is that one or both of the strings have leading and/or trailing whitespace. On SQL Server you may try:
CASE WHEN LTRIM(column1) = LTRIM(column2) THEN 0 ELSE 1 END AS [check]
If the above does not detect the problematic records, then try checking the length:
CASE WHEN LEN(column1) = LEN(column2) THEN 0 ELSE 1 END AS [check2]
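One caveat with the checks above, sketched here with made-up variables: SQL Server pads the shorter string with spaces when comparing with =, and LEN does not count trailing spaces, so neither check exposes trailing spaces. DATALENGTH does, and other whitespace such as a tab still makes the strings unequal.

```sql
-- Hypothetical values for illustration only.
DECLARE @a varchar(20) = 'abc';
DECLARE @b varchar(20) = 'abc  ';          -- two trailing spaces
DECLARE @c varchar(20) = 'abc' + CHAR(9);  -- trailing tab

SELECT
    CASE WHEN @a = @b THEN 'equal' ELSE 'different' END AS a_vs_b,  -- 'equal': trailing spaces are ignored by =
    CASE WHEN @a = @c THEN 'equal' ELSE 'different' END AS a_vs_c,  -- 'different': a tab is not a space
    LEN(@b)        AS len_b,    -- 3, trailing spaces are not counted
    DATALENGTH(@b) AS bytes_b,  -- 5, trailing spaces are counted
    LEN(@c)        AS len_c;    -- 4, the tab is counted
```

So if the varbinary output shows extra bytes at the end, comparing DATALENGTH values is a more reliable follow-up than comparing LEN values.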

Why is '-' equal to 0 (zero) in SQL?

When you run the following query in SQL Server Management Studio, the result will be 1.
SELECT
CASE WHEN '-' = 0 THEN
1
ELSE
0
END
That scares me a bit, because I have to check for a 0 value numerous times, and it seems such a check is vulnerable to matching the value '-'.
You're looking at it the wrong way around.
'-' is a string, so it is implicitly cast to an integer value when compared with an integer:
select cast('-' as int) -- outputs 0
To make sure that you are actually comparing a value to the string '0', make your comparison like this instead:
select case when '-' = '0' then 1 else 0 end
In general, you're asking for trouble when you compare values of different data types, since implicit conversions happen behind the scenes, so avoid it at all costs.
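To make the difference concrete, here is a small sketch using a hypothetical variable @v. The third column is one possible guard, assuming that only non-negative digit strings should count as numbers; it is not the only way to do it.

```sql
-- @v is a made-up variable standing in for the column being checked.
DECLARE @v varchar(10) = '-';

SELECT
    CASE WHEN @v = 0   THEN 1 ELSE 0 END AS implicit_compare,  -- 1: @v is converted to int first
    CASE WHEN @v = '0' THEN 1 ELSE 0 END AS string_compare,    -- 0: plain string comparison
    CASE
        -- Only cast when the value is non-empty and contains digits only.
        WHEN @v <> '' AND @v NOT LIKE '%[^0-9]%'
            THEN CASE WHEN CAST(@v AS int) = 0 THEN 1 ELSE 0 END
        ELSE 0
    END AS digits_only_compare;                                -- 0: '-' is rejected before the cast
```

Note that the guard still lets very long digit strings through, and those would overflow int.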

Overcoming SQL's lack of short-circuit of AND conditions

I'm processing an audit log, and want to ignore entries where a NULL value is changed to zero (or remains NULL). The Old and New values are held in NVARCHAR fields regardless of the type of the fields being logged. In order to CAST a new value to decimal, to determine if it's zero, I need to restrict to cases where ISNUMERIC of the field returns 1.
I've got it to work with this strange bit of SQL - but feel sure there must be a better way.
WHERE MessageDesc LIKE 'poitem%'
AND NOT(OldValue IS NULL AND 0.0 =
CASE
WHEN ISNUMERIC(NewValue) = 1 THEN CAST(NewValue AS DECIMAL(18,4))
WHEN NewValue IS NULL THEN 0.0
ELSE 1.0
END)
Any suggestions?
SQL Server 2012 added a TRY_CONVERT function, which returns NULL if the value cannot be cast to the given type. http://technet.microsoft.com/en-us/library/hh230993.aspx
WHERE NOT (OldValue is Null AND
(NewValue is null OR try_convert(float, NewValue) = 0.0)
)
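A quick way to see how TRY_CONVERT behaves before relying on it in the WHERE clause is to run it over a few sample values. The values below are made up for illustration, and the query assumes SQL Server 2012 or later.

```sql
-- Requires SQL Server 2012 or later. The sample values are hypothetical.
SELECT
    v.NewValue,
    TRY_CONVERT(float, v.NewValue)         AS AsFloat,    -- NULL when the value is not convertible
    TRY_CONVERT(decimal(18,4), v.NewValue) AS AsDecimal   -- may differ from float, e.g. for scientific notation
FROM
    (
        VALUES (N'0'), (N'0.00'), (N'12.5'), (N'abc'), (N'1e4'), (NULL)
    ) v(NewValue);
```

Comparing the two columns shows which target type accepts which inputs, which helps decide between float and decimal for the audit-log check.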
If using a version prior to 2012, check out Damien_The_Unbeliever's answer here: Check if a varchar is a number (TSQL) ...based on Aaron's comment this will not work in all cases.
Since you are using SQL 2008, it appears that a combination of ISNUMERIC and a modified version of Damien's answer from the link above will work. The current solution in your question would have problems with values like '.', '-', currency symbols ($, etc.), and scientific notation like '1e4'.
Try this for SQL 2008 (here is a SQLFiddle with test cases: http://sqlfiddle.com/#!3/fc838/3 ).
Note: this solution will not convert text values to numeric if the text has commas (ex: 1,000) or accounting notation with parens (ex: using "(1)" to represent "-1"), because SQL Server will throw an error when trying to cast to decimal.
WHERE t.OldValue is null
AND
(
t.NewValue is null
OR
0.0 =
case
when isnumeric(t.NewValue) = 1
--Contains only characters that are numeric, negative sign, or decimal place.
--This is b/c isnumeric will return true for currency symbols, scientific notation, or '.'
and not (t.NewValue like '%[^0-9.\-\+]%' escape '\')
--Not other single char values where isnumeric returns true.
and t.NewValue not in ( '.', '-', '+')
then cast(t.NewValue as decimal(18,4))
else null --can't convert to a decimal type
end
)
Avoid ISNUMERIC() since it is problematic with '.'.
-- Dot comes up as valid numeric
select
ISNUMERIC('.') as test1,
ISNUMERIC('1A') as test2,
ISNUMERIC('1') as test3,
ISNUMERIC('A1') as test4
-- Results window (text)
test1       test2       test3       test4
----------- ----------- ----------- -----------
1           0           1           0
Use COALESCE() instead.
WHERE MessageDesc LIKE 'poitem%'
AND
NOT (OldValue IS NULL AND
CAST(COALESCE(NewValue, '0') AS DECIMAL(18,4)) = 0)

How to determine the field value which can not convert to (decimal, float,int) in SQL Server

I have a SQL Server database.
One field has values which are like
ID VALUE
1 NEGATIF
2 11.4
3 0.2
4 A RH(+)
5 -----
6 >>>>>
7 5.6<
8 -13.9
I want to CONVERT the VALUE field to decimal, for the convertible values only, of course.
What kind of SQL statement can do this?
How can I find out which value raises an error during conversion?
PS: I think WHERE VALUE LIKE '[a-z]' could solve this, but how can I add more filters like [-+ ()]?
Plain ISNUMERIC is rubbish
Empty string, +, - and . are all valid
So is +. etc
1e-3 is valid for float but not decimal (unless you CAST to float then to decimal)
For a particularly cryptic but failsafe solution, append e0 or .0e0 then use ISNUMERIC
SELECT
ISNUMERIC(MyCOl + 'e0'), --decimal check
ISNUMERIC(MyCOl + '.0e0') --integer check
So
SELECT
ID, VALUE,
CAST(
CASE WHEN ISNUMERIC(VALUE + 'e0') = 1 THEN VALUE ELSE NULL END
AS decimal(38, 10)
) AS ConvertedVALUE
FROM
Mytable
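To see what the appended-suffix checks accept, you can run them side by side with plain ISNUMERIC over the sample values from the question. This is only a sketch: the inline VALUES list and the column name val are stand-ins I introduced, and the last three rows are extra test inputs that are not in the question.

```sql
-- The VALUES list stands in for Mytable; val stands in for the VALUE column.
SELECT
    t.val,
    ISNUMERIC(t.val)          AS plain_check,
    ISNUMERIC(t.val + 'e0')   AS decimal_check,
    ISNUMERIC(t.val + '.0e0') AS integer_check
FROM
    (
        VALUES
            ('NEGATIF'), ('11.4'), ('0.2'), ('A RH(+)'),
            ('-----'), ('>>>>>'), ('5.6<'), ('-13.9'),
            ('.'), ('-'), ('1e-3')   -- extra inputs, not from the question
    ) t(val);
```

Rows where plain_check is 1 but decimal_check is 0 are the ones that plain ISNUMERIC would wrongly let through to the CAST.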

SQL Server: sort a column numerically if possible, otherwise alpha

I am working with a table that comes from an external source, and cannot be "cleaned". There is a column which is an nvarchar(20) and contains an integer about 95% of the time, but occasionally contains an alphabetic value. I want to use something like
select * from sch.tbl order by cast(shouldBeANumber as integer)
but this throws an error on the odd "3A" or "D" or "SUPERCEDED" value.
Is there a way to say "sort it like a number if you can, otherwise just sort by string"? I know there is some sloppiness in that statement, but that is basically what I want.
Let's say, for example, the values were
7,1,5A,SUPERCEDED,2,5,SECTION
I would be happy if these were sorted in any of the following ways (because I really only need to work with the numeric ones)
1,2,5,7,5A,SECTION,SUPERCEDED
1,2,5,5A,7,SECTION,SUPERCEDED
SECTION,SUPERCEDED,1,2,5,5A,7
5A,SECTION,SUPERCEDED,1,2,5,7
"I really only need to work with the numeric ones"
This will give you only the numeric ones, sorted properly:
SELECT
*
FROM YourTable
WHERE ISNUMERIC(YourColumn)=1
ORDER BY YourColumn
select
*
from
sch.tbl
order by
case isnumeric(shouldBeANumber)
when 1 then cast(shouldBeANumber as integer)
else 0
end
Provided that your numbers are not more than 100 characters long:
WITH chars AS
(
SELECT 1 AS c
UNION ALL
SELECT c + 1
FROM chars
WHERE c <= 99
),
rows AS
(
SELECT '1,2,5,7,5A,SECTION,SUPERCEDED' AS mynum
UNION ALL
SELECT '1,2,5,5A,7,SECTION,SUPERCEDED'
UNION ALL
SELECT 'SECTION,SUPERCEDED,1,2,5,5A,7'
UNION ALL
SELECT '5A,SECTION,SUPERCEDED,1,2,5,7'
)
SELECT rows.*
FROM rows
ORDER BY
(
SELECT SUBSTRING(mynum, c, 1) AS [text()]
FROM chars
WHERE SUBSTRING(mynum, c, 1) BETWEEN '0' AND '9'
FOR XML PATH('')
) DESC
SELECT
(CASE ISNUMERIC(shouldBeANumber)
WHEN 1 THEN
-- left-pad numeric values with zeros to a fixed width of 8
RIGHT(CONCAT('00000000', shouldBeANumber), 8)
ELSE
shouldBeANumber
END) AS stringSortSafeAlpha
FROM sch.tbl
ORDER BY
stringSortSafeAlpha
This will add leading zeros to all shouldBeANumber values that truly are numbers and leave all remaining values alone. This way, when you sort, you can use an alpha sort but still get the correct values (with an alpha sort, "100" would be less than "50", but if you change "50" to "050", it works fine). Note, for this example, I added 8 leading zeros, but you only need enough leading zeros to cover the largest possible integer in your column.
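On SQL Server 2012 or later, another way to express "sort it like a number if you can, otherwise by string" is TRY_CONVERT in the ORDER BY. This is a sketch rather than any of the answers above: the inline VALUES list stands in for sch.tbl, and the value '10' is added by me to show that the sort is numeric and not alphabetic.

```sql
-- Requires SQL Server 2012 or later. The VALUES list stands in for sch.tbl.
SELECT
    t.shouldBeANumber
FROM
    (
        VALUES (N'7'), (N'1'), (N'5A'), (N'SUPERCEDED'),
               (N'2'), (N'5'), (N'SECTION'), (N'10')
    ) t(shouldBeANumber)
ORDER BY
    CASE WHEN TRY_CONVERT(int, t.shouldBeANumber) IS NULL THEN 1 ELSE 0 END,  -- integers first
    TRY_CONVERT(int, t.shouldBeANumber),                                      -- numeric order: 1, 2, 5, 7, 10
    t.shouldBeANumber;                                                        -- the rest by string
```

The integer rows come out as 1, 2, 5, 7, 10, followed by the non-integer values in the column's collation order, which matches the first of the acceptable orderings listed in the question.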
