SQL SERVER - Understanding how MIN(text) works

I'm doing a little digging and looking for an explanation of how SQL Server evaluates MIN(Varchar).
I found this remark in BOL: MIN finds the lowest value in the collating sequence defined in the underlying database
So if I have a table that has one column with the following values:
Data
AA
AB
AC
Doing a SELECT MIN(DATA) returns AA. I just want to understand why, and understand the BOL a little better.
Thanks!

It's determined by the collation (sort order). For most cultures the collation order is the same as the alphabetical order in the English alphabet so:
'AA' < 'AB'
'AA' < 'AC'
'AB' < 'AC'
Therefore 'AA' is the minimum value. For other cultures this may not hold. For example a Danish collation would return 'AB' as the minimum because 'AA' > 'AB'. This is because 'AA' is treated as equivalent to 'Å' which is the last letter in the Danish alphabet.
SELECT MIN(s COLLATE Danish_Norwegian_CI_AS) FROM table1;
min_s
AB
To get an "ordinary" sort order use the Latin1_General_Bin collation:
SELECT MIN(s COLLATE Latin1_General_Bin) FROM table1;
min_s
AA
To reproduce this result you can create this test table:
CREATE TABLE table1 (s varchar(100));
INSERT INTO table1 (s) VALUES ('AA'), ('AB'), ('AC');
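The dependence of MIN on the collation can also be reproduced outside SQL Server. The following is a minimal sketch in Python using the standard-library sqlite3 module, so it exercises SQLite's built-in BINARY and NOCASE collations, not SQL Server's Danish or Latin1 collations. The table and column names follow the test table above; the helper name min_under_collations and the mixed-case sample values are assumptions made for illustration.

```python
import sqlite3


def min_under_collations(values):
    """Return (MIN under BINARY, MIN under NOCASE) for the given strings.

    Illustration only: SQLite collations stand in for SQL Server collations.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (s TEXT)")
        conn.executemany(
            "INSERT INTO table1 (s) VALUES (?)", [(v,) for v in values]
        )
        # BINARY compares raw bytes, so uppercase ASCII sorts before lowercase.
        binary_min = conn.execute(
            "SELECT MIN(s COLLATE BINARY) FROM table1"
        ).fetchone()[0]
        # NOCASE folds ASCII case before comparing.
        nocase_min = conn.execute(
            "SELECT MIN(s COLLATE NOCASE) FROM table1"
        ).fetchone()[0]
        return binary_min, nocase_min
    finally:
        conn.close()
```

With the values 'a', 'B', 'c' the two collations disagree about the minimum, while for 'AA', 'AB', 'AC' they agree, which mirrors why the original question only shows a difference under a collation such as the Danish one.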

No, MIN is used in a SELECT statement that scans more than one row. It takes a column as an argument, and returns the "lowest" value (again, according to the collation sequence) found in that column.
Used without a GROUP BY clause, the result set will have a single row, and the value of MIN will be the lowest value found in that column. Used with a GROUP BY clause, the result set will have one row for each group and the value of MIN will be the lowest value in that column for any row in the group.

min(x), where x is a char (string) type -- char(), varchar(), nchar(), nvarchar() -- finds the lowest value in the group, based on SQL's string comparison rules:
if two strings differ in length, the shorter is padded with SP characters (spaces) to the length of the longer.
comparison proceeds left-to-right, character by character, according to the rules of the collation sequence in use.
when ordering, the value NULL sorts lower than any non-null value (the ISO/ANSI SQL standard says that it is an implementation choice as to whether NULL collates lower or higher than any non-null value). The min() aggregate itself skips NULLs.
So, if you have a table
create table foo
(
myString varchar(16) not null ,
)
then running the query
select min(myString) from foo
will give you the same result set as if you executed
set rowcount 1
select myString
from foo
order by myString
set rowcount 0
You are basically ordering the set in ascending sequence and selecting the first value. MAX(), of course, gives you the inverse, ordering the set in descending sequence and selecting the first value.
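The comparison rules listed in this answer can be modelled in a few lines of Python. This is only a sketch: plain code-point order stands in for a real collation, None stands in for NULL, and the names padded_key and sql_style_min are made up for the example.

```python
def padded_key(value, width):
    """Pad the shorter string with spaces so both sides have equal length."""
    return value.ljust(width)


def sql_style_min(values):
    """Model of min() over strings: skip NULLs, pad, compare left to right.

    Assumption: code-point order replaces the collation sequence.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # min() over no non-NULL values yields NULL
    width = max(len(v) for v in non_null)
    # Equivalent to sorting ascending and taking the first value.
    return min(non_null, key=lambda v: padded_key(v, width))
```

Because of the padding step, 'AB' and 'AB  ' produce the same key, which is why trailing spaces do not affect which value is the minimum in this model.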

Related

Snowflake Ordering on Text seems Incorrect

I am new to snowflake and have noticed the ordering on text columns does not behave as expected.
Take this simple example:
select *
from ( values ('ab'), ('aBc'), ('acd') ) t(col1)
order by col1
Expected order: ab, aBc, acd
Actual order: aBc, ab, acd
Am I missing something?
Thank you.

You could use a COLLATE specification directly in the order by clause.
COLLATE lets you specify the following configuration settings to be used when comparing values:
Locale
Case-sensitivity
Accent-sensitivity
Punctuation-sensitivity
First-letter preference
Case-conversion
Space-trimming
The following example uses the English locale (en) and a case-insensitive (ci) collation:
select *
from ( values ('ab'), ('aBc'), ('acd'), ('Z') ) t(col1)
order by collate(col1, 'en-ci');
Result returned:
ab
aBc
acd
Z

According to documentation:
All data is sorted according to the numeric byte value of each character in the ASCII table. UTF-8 encoding is supported.
In the ASCII table, B comes before b.
It's weird the order by didn't consider the string length in the ordering.

The default ordering will also sort "Z" before "a" because it comes first in ASCII / Unicode order:
select *
from ( values ('ab'), ('aBc'), ('acd'), ('Z') ) t(col1)
order by col1
To sort without case sensitivity, you can use the upper or lower function.
select *
from ( values ('ab'), ('aBc'), ('acd'), ('Z') ) t(col1)
order by upper(col1)
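The difference between the two orderings is easy to check in Python, whose default string comparison is also by code point. This is a sketch of the idea, not of Snowflake's implementation; the function names are invented for the example.

```python
def sort_by_code_point(values):
    """Default ordering: compare code points, so 'B' (66) sorts before 'b' (98)."""
    return sorted(values)


def sort_case_insensitive(values):
    """Rough analogue of ORDER BY upper(col1): compare upper-cased copies."""
    return sorted(values, key=str.upper)
```

For the values 'ab', 'aBc', 'acd', 'Z', the first function returns Z, aBc, ab, acd and the second returns ab, aBc, acd, Z, matching the orders discussed in this thread.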

Is there a good expression for the maximum character value in an SQL collation?

I'm building a simple SQL report that assembles possible product titles from several tables and displays unnamed products last. We have a setup that allows individual locations to override the product title from a master product table, and I thought I could do something like
SELECT a.ProdCode, COALESCE(a.ProdNameOverride, m.ProdName, '') AS ProdName
FROM ProdInventory a
INNER JOIN MasterProdTable m ON a.ProdCode = m.ProdCode
WHERE a.ProdLocation = @ReportProdLocation
ORDER BY COALESCE(a.ProdNameOverride, m.ProdName, char(255)) ASC, a.ProdCode ASC
because, hey, char(255) has to sort after all of the other possible ASCII characters, right?
Well, no. It's a diacritical Y, which in the standard (SQL_Latin1_General_CP1_CI_AS) collation gets sorted before Z.
I eventually just resorted to brute force, finding that char(254) sorted after the conventional alphanumerics, but that got me curious - is there a reliable way to assign something "the last possible value in the relevant collation"?

If you want to find the last character in a varchar collation, you can just create a table with all possible characters and sort it. eg:
declare @chars table
(
CodePoint binary(1) primary key,
Character char(1) collate Arabic_CI_AI_KS_WS
)
declare @codePoint binary(1) = 0x0
while (@codePoint < 255)
begin
insert into @chars(CodePoint,Character)
values (@codePoint, cast(@codePoint as char(1)));
set @codePoint += 1;
end
Select *
from @chars
order by Character
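The same "enumerate every character and sort" idea can be sketched in Python. Note the assumptions: the key function below is a crude accent- and case-folding model, not the weights of SQL_Latin1_General_CP1_CI_AS or any other SQL Server collation, and Latin-1 is used as the single-byte code page. It is only meant to show why the highest code point need not sort last.

```python
import unicodedata


def accent_case_insensitive_key(ch):
    """Crude collation model: strip combining marks, fold case, then tie-break."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The original character is the tie-breaker, so the order stays deterministic.
    return (base.lower(), ch)


def sorted_single_byte_chars():
    """All 256 Latin-1 characters, ordered by the model key above."""
    chars = [bytes([code]).decode("latin-1") for code in range(256)]
    return sorted(chars, key=accent_case_insensitive_key)
```

Under this model chr(255), a y with diaeresis, is keyed as 'y' and therefore sorts before 'z', while chr(254) has no such decomposition and sorts after 'z'.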

Even if it were reliable, sort by what you actually want last.
This report is assuming nobody could ever enter a product override that began with the last possible character. You can't be sure of that, even if char(254) doesn't show up on conventional keyboards.
If you want products to appear last when they have a NULL override and main product name, construct a sort using CASE WHEN around that condition, as:
SELECT a.ProdCode, COALESCE(a.ProdNameOverride, m.ProdName, '') AS ProdName
FROM ProdInventory a
INNER JOIN MasterProdTable m ON a.ProdCode = m.ProdCode
WHERE a.ProdLocation = @ReportProdLocation
ORDER BY CASE WHEN a.ProdNameOverride IS NULL AND m.ProdName IS NULL THEN 1 ELSE 0 END ASC,
COALESCE(a.ProdNameOverride, m.ProdName) ASC, a.ProdCode ASC
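The same "sort on the condition first, then on the name" approach can be expressed as a composite sort key. The Python sketch below assumes each row is a (ProdCode, ProdNameOverride, ProdName) tuple with None for NULL; the function names are hypothetical.

```python
def report_sort_key(row):
    """Key mirroring the CASE WHEN ... / COALESCE(...) / ProdCode ordering."""
    code, override, master_name = row
    # COALESCE(override, master_name): first non-NULL value, else NULL.
    name = override if override is not None else master_name
    # False sorts before True, so unnamed products always land last.
    return (name is None, name or "", code)


def sort_report(rows):
    """Return rows ordered with named products first, unnamed ones last."""
    return sorted(rows, key=report_sort_key)
```

Because the NULL check is its own leading key component, a product whose name starts with chr(255) or any other high character still sorts ahead of the unnamed ones; no sentinel value is needed.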

Order by string returns wrong order

I am using a column which is named ItemCode. ItemCode is Varchar(50) type.
Here is my query
Select * from Inventory order by ItemCode
My result now looks like
ItemCode-1
ItemCode-10
ItemCode-2
ItemCode-20
And so on.
How can I order my string as the example below?
ItemCode-1
ItemCode-2
ItemCode-10
ItemCode-20
Should I convert my column to a number? I should also mention that some values contain no number.

You could order by the numbers as
SELECT Str
FROM
(
VALUES
('ItemCode-1'),
('ItemCode-10'),
('ItemCode-2'),
('ItemCode-20')
) T(Str)
ORDER BY CAST(RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)) AS INT)
Note: Since you tagged your Q with SQL Server 2008 tag, you should upgrade as soon as possible because it's out of support.
UPDATE:
Since you didn't provide good sample data, I'm just guessing.
Here is another way that may fit your requirements
SELECT Str
FROM
(
VALUES
('ItemCode-1'),
('ItemCode-10'),
('ItemCode-2'),
('ItemCode-20'),
('Item-Code')
) T(Str)
ORDER BY CASE WHEN Str LIKE '%[0-9]' THEN CAST(RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)) AS INT) ELSE 0 END

This is expected behavior, based on this:
"that is lexicographic sorting, which basically means the language
treats the variables as strings and compares character by character"
You need to use something like this:
ORDER BY
CASE WHEN ItemCode like '%[0-9]%'
THEN Replicate('0', 100 - Len(ItemCode)) + ItemCode
ELSE ItemCode END
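A more general way to get 1, 2, 10, 20 is a "natural sort" key that splits each value into text and number chunks and compares the number chunks numerically. The Python sketch below shows that idea; it is not a translation of either SQL answer, and the name natural_key is made up.

```python
import re


def natural_key(text):
    """Split into digit and non-digit runs; compare digit runs as integers."""
    parts = [p for p in re.split(r"([0-9]+)", text) if p]
    key = []
    for part in parts:
        if "0" <= part[0] <= "9":
            key.append((0, int(part), ""))  # numeric chunk
        else:
            key.append((1, 0, part))        # text chunk
    return key
```

Usage: sorted(item_codes, key=natural_key). Every key element has the same (int, int, str) shape, so values with and without numbers can be compared without type errors.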

SQL Server ORDER BY Multiple values in case statement

I am trying to do an order by with several different columns. The first column has several conditions.
The base order was something like
SELECT *
FROM Table T
ORDER BY T.A, T.B, T.C
Most of the time column A is an int. Sometimes A is an int with a letter appended at the end. I want the order by to be by the number portion. I was able to achieve that by modifying the query to the following, which has been working for months now.
SELECT *
FROM Table T
ORDER BY
CASE WHEN ISNUMERIC(T.A) = 0 THEN CAST(SUBSTRING(T.A, 1, PATINDEX('%[^0-9]%', T.A) - 1) AS INT)
ELSE CAST (T.A AS INT) end
, T.B, T.C
Recently a new requirement came up which allows the value of "OFFLINE" to exist in column A.
I want to modify the ORDER BY to keep the same logic as before with the exception of all records with "OFFLINE" are at the end.

First, I am very sorry about your requirements; it makes absolutely no sense that bizarrely mixed values are being stored in a varchar column.
Second try this
CASE WHEN T.A = 'OFFLINE'
THEN 2147483647 --This is the maximum signed int value
ELSE
(CASE WHEN ISNUMERIC(T.A) = 0
THEN CAST(SUBSTRING(T.A, 1, PATINDEX('%[^0-9]%', T.A) - 1) AS INT)
ELSE CAST (T.A AS INT)
END)
END
Show that to whoever is in charge of your data model, and politely ask them to store meaningful numeric data in a numeric column.
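For comparison, here is the same ordering rule as a Python sort key. It is a sketch under the assumptions stated in the question (values are an integer, an integer followed by a letter, or the literal "OFFLINE"); the function name column_a_key is hypothetical. Instead of the maximum int sentinel it uses a separate leading flag, so a genuine value of 2147483647 would still sort before OFFLINE.

```python
import re


def column_a_key(value):
    """Sort by leading integer; rows equal to 'OFFLINE' go last."""
    if value == "OFFLINE":
        return (1, 0)
    match = re.match(r"[0-9]+", value)
    if match is None:
        # Anything else is outside the stated assumptions.
        raise ValueError(f"unexpected value: {value!r}")
    return (0, int(match.group()))
```

Usage: sorted(values, key=column_a_key). Values with the same number portion, such as "2" and "2B", compare equal here, so further columns (B and C in the question) would be needed as tie-breakers.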

Maintain ordering of characters if there is no id (SQL Server 2005)

I have the following
Chars
A
C
W
B
J
M
How can I insert some sequential numbers so that after insertion of the numbers the order of characters will not change?
I mean if I use row_number(), the output Character order is changing like
select
ROW_NUMBER() over(order by chars) as id,
t.* from #t t
Output:
id chars
1 A
2 B
3 C
4 J
5 M
6 W
My desired expectation is
id chars
1 A
2 C
3 W
4 B
5 J
6 M
Also, I cannot use any identity field like id int identity because I am in the middle of a query and I need to maintain an inner join for achieving something.
I hope I do make myself clear.
Please help.
Thanks in advance

There is no implicit ordering of rows in SQL. If some ordering is desired, be it order in which items were inserted or any other order, it must be supported by a user-defined column.
In other words, the SQL standard doesn't require the SQL implementations to maintain any order. On the other hand the ORDER BY clause in a SELECT statement can be used to specify the desired order, but such ordering is supported by the values in a particular (again, user defined) column.
This user-defined column may well be an auto-incremented column to which SQL assigns incremental (or otherwise) values, and this may be what you need.
Maybe something like...
CREATE TABLE myTable
(
InsertID smallint IDENTITY(1,1),
OneChar CHAR(1),
SomeOtherField VARCHAR(20)
-- ... etc.
)
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('A', 'Alpha')
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('W', 'Whiskey')
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('B', 'Bravo')
-- ... etc.
SELECT OneChar
FROM myTable
ORDER BY InsertID
'A'
'W'
'B'
--...
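The auto-increment approach can be tried end to end with Python's standard-library sqlite3 module. This is a sketch against SQLite, where INTEGER PRIMARY KEY AUTOINCREMENT plays the role of IDENTITY(1,1); it is not SQL Server syntax, and the helper name number_in_insert_order is an assumption.

```python
import sqlite3


def number_in_insert_order(chars):
    """Insert values one by one and read them back ordered by the generated id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE myTable ("
            "InsertID INTEGER PRIMARY KEY AUTOINCREMENT, "
            "OneChar TEXT)"
        )
        for ch in chars:
            conn.execute("INSERT INTO myTable (OneChar) VALUES (?)", (ch,))
        # The explicit ORDER BY is what guarantees the order of the result.
        return conn.execute(
            "SELECT InsertID, OneChar FROM myTable ORDER BY InsertID"
        ).fetchall()
    finally:
        conn.close()
```

Calling it with the characters from the question (A, C, W, B, J, M) returns them numbered 1 to 6 in the order they were inserted, because the order is carried by the stored InsertID values and not by any implicit row order.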