Differences between Excel and SQL sorting - sql-server

Programs used:
SQL Server 2000, Excel 2003
We have a table in our database called Samples. Using the following query...
SELECT [Sample], [Flag] FROM Samples
ORDER BY [Sample]
... we get the following results:
Sample Flag
---------- ----
12-ABC-345 1
123-45-AB 0
679-ADC-12 1
When the user has the same data in an Excel spreadsheet, and sorts by the Sample column, they get the following sort order:
Sample Flag
---------- ----
123-45-AB 0
12-ABC-345 1
679-ADC-12 1
Out of curiosity, why is there a discrepancy between the sort in SQL and Excel (other than "because it's Microsoft")?
Is there a way in SQL to sort the Sample column the same way Excel does, or vice versa?

SQL Server's sort order is determined by the database, table, or column collation. By default, this is a standard lexicographical string sort (the character code for the hyphen is numerically lower than the character code for 1). According to this Microsoft link, however, Excel ignores hyphens and apostrophes when sorting, except for tie-breaking. There's no collation that does this specifically (that I'm aware of), so you'll have to fake it.
To achieve the same result in SQL Server, you'd need to do:
SELECT [Sample], [Flag] FROM Samples
ORDER BY REPLACE(REPLACE([Sample], '-', ''), '''', ''),
-- CHARINDEX takes the string to find first, then the string to search
(CASE WHEN CHARINDEX('-', [Sample]) > 0 THEN 1 ELSE 0 END) +
(CASE WHEN CHARINDEX('''', [Sample]) > 0 THEN 1 ELSE 0 END) ASC
This orders the results by the string as if all hyphens and apostrophes were removed, then by a computed value that yields 1 for any value containing a hyphen or an apostrophe, 2 for any value containing both, and 0 for a value containing neither. This expression causes any value that contains a hyphen and/or apostrophe to sort after a value that is otherwise equivalent, just like Excel.
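To see the same two-part sort key outside the database, here is a minimal Python sketch. The function name excel_like_key is made up for illustration, and the Excel behaviour is only approximated from the description above (hyphens and apostrophes ignored, then used as a tie-breaker).

```python
def excel_like_key(value: str):
    """Sort key approximating the Excel behaviour described above."""
    # Primary key: the text with hyphens and apostrophes removed.
    stripped = value.replace("-", "").replace("'", "")
    # Tie-breaker: 0 if neither character is present, 1 for one, 2 for both,
    # so punctuated values sort after otherwise-identical plain ones.
    penalty = ("-" in value) + ("'" in value)
    return (stripped, penalty)


samples = ["12-ABC-345", "123-45-AB", "679-ADC-12"]

# Plain code-point comparison: '-' (45) sorts before '3' (51).
plain_order = sorted(samples)
# Hyphens ignored: '12345AB' sorts before '12ABC345'.
excel_order = sorted(samples, key=excel_like_key)

print(plain_order)
print(excel_order)
```

The plain sort reproduces the order from the SQL query in the question, and the keyed sort reproduces the order the user saw in Excel.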

I personally consider SQL Server sorting order correct and I'd intervene on Excel, as it's the one following an "unusual" method (at least, from my experience).
Here's an explanation of how Excel sorts alphanumeric data, and how to fix it: How to correctly sort alphanumeric data in Excel.

Related

SQL Server full text is not working if there is one word in the column

I have SQL Server 2016 Developer Edition.
I set up a full-text index on a column. With LIKE I get all rows containing 'AB', but the same 'AB' does not work with CONTAINS.
SELECT *
FROM metal_master
WHERE CONTAINS (description, ' "*ab*" ')
SELECT *
FROM metal_master
WHERE description LIKE '%AB%'
In the first case I got 1,378 records; in the second case I got 15,945 records.
When I checked, every result from CONTAINS has more than one word. CONTAINS does not match rows where the column holds a single word, whereas LIKE does.

SQL Server Determining Hard Coded Date as Larger When It's Not?

An old employee left a massive query behind that I've been debugging and it appears that the issue has come down to SQL Server itself determining a comparison differently than what I would have expected.
I have a table with a column col1 containing the value 20191215 as a datetime.
The part in question is similar to the following:
select case when col1 > '01/01/2020' then 1 else 0 end
This statement is returning 1, suggesting that '12/15/2019' is larger than '01/01/2020'.
I do not need help correcting the query; I have already changed it to avoid the comparison the previous employee used. I am simply curious why SQL Server evaluates it the way I have described.
I understand that this is not the typical way SQL Server stores dates. Could the issue simply be the formatting of the dates?
Current SQL Server version is: SQL Server 2014 SP3 CU3.
SQL Fiddle link that shows the same results
Please note that the link does not contain an exact replica of my case
Edit: Included additional info relevant to actual query.
It is a string comparison not a date comparison:
select case when '12/15/2019' > '01/01/2020' then 1 else 0 end
vs
select case when CAST('12/15/2019' AS DATE) > CAST('01/01/2020' AS DATE) then 1 else 0 end
db<>fiddle demo
"I am simply curious as to why SQL Server would evaluate this as I have described."
'12/15/2019' is a string literal. SQL Server does not know you want it treated as a date unless you say so explicitly.
"I have a table with a column col1 containing the value 20191215"
If you are comparing against a column, then the column's data type matters, and the data type precedence rules decide how the comparison is done.
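The difference between the two comparisons is not specific to SQL Server. A small Python sketch shows the same effect, assuming the literals are meant as month/day/year:

```python
from datetime import datetime

left, right = "12/15/2019", "01/01/2020"

# Compared as text, the first differing character decides: "1" > "0".
as_strings = left > right

# Compared as dates (assumed format: month/day/year), 2019 precedes 2020.
fmt = "%m/%d/%Y"
as_dates = datetime.strptime(left, fmt) > datetime.strptime(right, fmt)

print(as_strings, as_dates)
```

The string comparison says the 2019 value is larger, while the date comparison says it is not, which matches the two CASE expressions above.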

SQL Server Convert Data

I'm having trouble cleaning a database because SQL Server doesn't differentiate '2¹59' from '2159', but when I try to convert the value to INT it obviously returns an error.
In this case I need to replace every non-numeric value with NULL.
Can someone help please? (I'm using Sql Server 2008)
Starting with SQL Server 2012 there is a new function called TRY_PARSE.
If you use it, any value that cannot be parsed as an int automatically becomes NULL.
select TRY_PARSE('2¹59' as int)
The output of the above query will be NULL.
You can use a different collation to change the way the strings are compared:
select
case when N'2¹59' = N'2159' collate Latin1_General_BIN then 1 else 0 end
This will select 0 as you'd expect.
More importantly, since MS SQL understands unicode properly, you can do this:
select cast(N'2¹59' as varchar)
which will give you '2159' - properly replacing the "broken" digits.
If you have no other option, you could also build a helper table to handle indexing the string (just a single column with numbers 1..1000 for example), and do something like this:
exists
(
select 1 from [Numbers]
where
[Numbers].[Index] < len([Value]) + 1
and
unicode(substring([Value], [Numbers].[Index], 1)) > 127
)
Needless to say, this is going to be rather slow. For simple integers, though, this can work as a decent validation. For example, simply use (unicode(substring([Value], [Numbers].[Index], 1)) not between 48 and 57) and ([Numbers].[Index] <> 1 or substring([Value], 1, 1) <> '-'), so that a leading minus sign in the first position is not flagged.
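The per-character check is easy to prototype outside the database. Below is a minimal Python sketch of the same idea: accept only code points 48 to 57, with an optional leading minus sign, and return None (standing in for NULL) otherwise. The function name to_int_or_none is hypothetical.

```python
def to_int_or_none(value: str):
    """Return the integer if value is an optional '-' plus ASCII digits, else None."""
    digits = value[1:] if value.startswith("-") else value
    # Mirror the code-point test: every character must be in 48..57 ('0'..'9').
    if digits and all(48 <= ord(ch) <= 57 for ch in digits):
        return int(value)
    return None


print(to_int_or_none("2159"))
print(to_int_or_none("2¹59"))
print(to_int_or_none("-42"))
```

The explicit code-point range matters here: Python's str.isdigit() treats the superscript '¹' as a digit, so it would not catch the broken value on its own.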

SQL Server select-where statement issue

I am using SQL Server 2008 Enterprise on Windows Server 2008 Enterprise. I have a question about T-SQL in SQL Server 2008. A select-where statement can take two different forms:
(1) select where foo between [some value] and [some other value],
(2) select where foo >= [some value] and foo <= [some other value].
I am not sure whether between-and is always the same as using the <= and >= signs.
BTW, are they always the same, even for different types of data (e.g. comparing numeric values, comparing string values)? I would appreciate it if anyone could point to documentation showing whether they are always the same, so that I can learn more from it.
Thanks in advance,
George
Yes, they are always the same. The entry in Books Online for BETWEEN says:
"BETWEEN returns TRUE if the value of test_expression is greater than or equal to the value of begin_expression and less than or equal to the value of end_expression."
You can see this easily by looking at the execution plans: BETWEEN doesn't even appear in the text. It has been replaced with >= and <=, and no distinction is made between the two forms.
SELECT * FROM master.dbo.spt_values
WHERE number between 1 and 3 /*Numeric*/
SELECT * FROM master.dbo.spt_values
WHERE name between 'a' and 'b' /*String*/
select * from sys.objects
WHERE create_date between GETDATE() and GETDATE()+100 /*Date*/
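If you don't have a SQL Server instance handy, the equivalence can be tried out with a quick Python sketch. Note the assumptions: it uses an in-memory SQLite database as a stand-in (not SQL Server), and the table t and its rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER, s TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(0, "a"), (1, "ab"), (2, "b"), (3, "ba"), (4, "c")],
)


def rows(where, params):
    """Run the same SELECT with a different WHERE clause."""
    sql = f"SELECT n, s FROM t WHERE {where} ORDER BY n"
    return conn.execute(sql, params).fetchall()


# Numeric comparison: both forms include the endpoints 1 and 3.
numeric_between = rows("n BETWEEN ? AND ?", (1, 3))
numeric_ops = rows("n >= ? AND n <= ?", (1, 3))

# String comparison: 'ba' sorts after 'b', so it is excluded by both forms.
string_between = rows("s BETWEEN ? AND ?", ("a", "b"))
string_ops = rows("s >= ? AND s <= ?", ("a", "b"))

print(numeric_between == numeric_ops, string_between == string_ops)
```

Both pairs return identical rows, with the endpoints included in each case.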

Get a number from a sql string range

I have a column of data that contains a percentage range as a string that I'd like to convert to a number so I can do easy comparisons.
Possible values in the string:
'<5%'
'5-10%'
'10-15%'
...
'95-100%'
I'd like to convert this in my select where clause to just the first number, 5, 10, 15, etc. so that I can compare that value to a passed in "at least this" value.
I've tried a bunch of variations on substring, charindex, convert, and replace, but I still can't seem to get something that works in all combinations.
Any ideas?
Try this,
SELECT substring(replace(interest , '<',''), patindex('%[0-9]%',replace(interest , '<','')), patindex('%[^0-9]%',replace(interest, '<',''))-1) FROM table1
Tested at my end and it works, it's only my first try so you might be able to optimise it.
@Martin: Your solution works.
Here is another I came up with, based on inspiration from @mercutio:
select cast(replace(replace(replace(interest,'<',''),'%',''),'-','.0') as numeric) test
from table1 where interest is not null
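For checking the expected results of either query against the sample values, a short Python sketch of the same "first number in the label" idea may help. The function name first_number is hypothetical, and the labels are the ones listed in the question.

```python
import re


def first_number(label: str):
    """Return the first run of ASCII digits in a label, or None if there is none."""
    match = re.search(r"[0-9]+", label)
    return int(match.group()) if match else None


labels = ["<5%", "5-10%", "10-15%", "95-100%"]
firsts = [first_number(label) for label in labels]

# "At least this" filter: keep labels whose first number is >= 10.
at_least_10 = [label for label in labels if first_number(label) >= 10]

print(firsts)
print(at_least_10)
```

As in both SQL versions, '<5%' yields 5 here, so it compares the same as '5-10%' unless it is treated as a special case.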
You can convert char data to other char types (convert char(10) to varchar(10)), but a string such as '5-10%' cannot be converted directly to an integer; CAST or CONVERT fails on the non-numeric characters.
I don't know if this works in SQL Server, but within MySQL, you can use several tricks to convert character data into numbers. Examples from your sample data:
"<5%" => 0
"5-10%" => 5
"95-100%" => 95
Now, obviously this fails your first test, but some clever string replacements at the start of the string would be enough to get it working.
One example of converting character data into numbers:
SELECT "5-10%" + 0 AS foo ...
Might not work in SQL Server, but future searches may help the odd MySQL user :-D
You can do this in SQL Server with a cursor. If you can create a CLR function to pull out number groupings, that will help. It's possible in T-SQL; it will just be ugly.
Create the cursor to loop over the list.
Find the first number. If there is only one number group in there, return it. Otherwise find the second number grouping.
If only the first grouping is returned and it's the first item in the list, set it as the upper bound.
If only the first grouping is returned and it's the last item in the list, set it as the lower bound.
Otherwise set the first grouping as the lower bound and the second grouping as the upper bound.
Then write the resulting values back to a table.
The issue you are having is a symptom of not keeping the data atomic. In this case it looks purely unintentional (Legacy) but here is a link about it.
To design yourself out of this create a range_lookup table:
Create table rangeLookup(
rangeID int -- or rangeCD or not at all
,rangeLabel varchar(50)
,LowValue int--real or whatever
,HighValue int
)
To hack yourself out of it, here are some pseudo steps. This will be a deeply nested mess.
Normalize your input by replacing all the stray characters.
replace(replace(rangeLabel,'%',''),'<','')
--This will entail many nested replace statements.
Add a CASE and CHARINDEX to look for a space; if there is none, you have your number,
else use SUBSTRING to take everything before the first ' '.
--These steps are wrapped around the previous step.
It's complicated, but for the test cases you provided, this works. Just replace @Test with the column you are looking in from your table.
DECLARE @Test varchar(10)
SET @Test = '<5%'
--SET @Test = '5-10%'
--SET @Test = '10-15%'
--SET @Test = '95-100%'
SELECT
    CASE WHEN SUBSTRING(@Test, 1, 1) = '<'
        THEN 0
        ELSE CONVERT(integer, SUBSTRING(@Test, 1, CHARINDEX('-', @Test) - 1))
    END AS LowerBound,
    CASE WHEN SUBSTRING(@Test, 1, 1) = '<'
        THEN CONVERT(integer, SUBSTRING(@Test, 2, CHARINDEX('%', @Test) - 2))
        ELSE CONVERT(integer, SUBSTRING(@Test, CHARINDEX('-', @Test) + 1, CHARINDEX('%', @Test) - CHARINDEX('-', @Test) - 1))
    END AS UpperBound
You'd probably be much better off storing two values in two fields instead of <5% and 5-10%. Instead of storing <5%, you would store 0 and 5, and instead of 5-10% you'd end up with 5 and 10. That gives you two columns, one called lowerbound and one called upperbound, and then you just check value >= lowerbound AND value < upperbound.
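A one-off migration from the label strings to the two columns could be prototyped along these lines. This is only a sketch in Python with hypothetical function names, and it assumes every label is either '<N%' or 'A-B%' as in the question.

```python
def parse_range(label: str):
    """Split a label like '<5%' or '5-10%' into (lowerbound, upperbound)."""
    body = label.rstrip("%")
    if body.startswith("<"):
        # '<5%' means everything below 5, so the lower bound is 0.
        return 0, int(body[1:])
    low, high = body.split("-")
    return int(low), int(high)


def in_range(value, label: str) -> bool:
    """Apply the check from above: value >= lowerbound AND value < upperbound."""
    low, high = parse_range(label)
    return low <= value < high


print(parse_range("<5%"), parse_range("5-10%"), parse_range("95-100%"))
print(in_range(5, "5-10%"), in_range(10, "5-10%"))
```

With a half-open check like this, a value of exactly 10 falls into '10-15%' rather than '5-10%', so adjacent ranges never overlap.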
