Snowflake Ordering on Text seems Incorrect - snowflake-cloud-data-platform

I am new to Snowflake and have noticed that ordering on text columns does not behave as expected.
Take this simple example:
select *
from ( values ('ab'), ('aBc'), ('acd') ) t(col1)
order by col1
Expected order: ab, aBc, acd
Actual order: aBc, ab, acd
Am I missing something?
Thank you.

You could use a COLLATE specification directly in the ORDER BY clause.
COLLATE lets you specify the following settings to be used when comparing values:
Locale
Case-sensitivity
Accent-sensitivity
Punctuation-sensitivity
First-letter preference
Case-conversion
Space-trimming
The following example uses the English locale (en) and case-insensitive (ci) collation:
select *
from ( values ('ab'), ('aBc'), ('acd'), ('Z') ) t(col1)
order by collate(col1, 'en-ci');
Result returned:
ab
aBc
acd
Z
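To see what a case-insensitive comparison does to this data outside Snowflake, here is a minimal Python sketch. It assumes that sorting on a case-folded key (str.casefold) is a fair stand-in for the 'ci' part of the specification; it does not reproduce Snowflake's locale-aware 'en' rules.

```python
# Sketch only: approximate the 'ci' (case-insensitive) part of 'en-ci' by
# sorting on a case-folded key. This is not Snowflake's locale-aware collation.
values = ["ab", "aBc", "acd", "Z"]


def case_insensitive_key(s: str) -> str:
    # casefold() is an aggressive lowercase, so "aBc" and "abc" get the same key
    return s.casefold()


ordered = sorted(values, key=case_insensitive_key)
print(ordered)  # ['ab', 'aBc', 'acd', 'Z']
```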

According to the documentation:
All data is sorted according to the numeric byte value of each character in the ASCII table. UTF-8 encoding is supported.
In the ASCII table, B comes before b.
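Note that ORDER BY does not look at string length: strings are compared character by character, and the first differing character decides. 'aBc' and 'ab' differ at the second character, where 'B' comes before 'b', so 'aBc' sorts first.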
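A quick way to check the byte-value explanation is to sort the same strings in Python, which compares str values by Unicode code point (the same as byte order for ASCII text). This only illustrates the rule; it is not Snowflake itself.

```python
values = ["ab", "aBc", "acd"]

# Default string comparison is by code point, which for ASCII equals byte order.
print(sorted(values))  # ['aBc', 'ab', 'acd']

# The first differing character decides, not the length:
# "ab" and "aBc" differ at index 1, and 'B' (66) < 'b' (98).
print(ord("B"), ord("b"))  # 66 98

# Comparing the UTF-8 encoded bytes gives the same order for this data.
by_bytes = sorted(values, key=lambda s: s.encode("utf-8"))
print(by_bytes)  # ['aBc', 'ab', 'acd']
```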

This will also sort "Z" before "a", because "Z" comes first in ASCII / Unicode order:
select *
from ( values ('ab'), ('aBc'), ('acd'), ('Z') ) t(col1)
order by col1
To sort case-insensitively, order by UPPER (or LOWER) of the column:
select *
from ( values ('ab'), ('aBc'), ('acd'), ('Z') ) t(col1)
order by upper(col1)
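Both queries can be tried locally with Python's built-in sqlite3 module, used here only as a stand-in engine: SQLite's default BINARY collation also compares raw bytes, and its UPPER() folds ASCII letters. The values are declared in a CTE with a column list because SQLite names VALUES columns column1, column2, and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

query = """
WITH t(col1) AS (VALUES ('ab'), ('aBc'), ('acd'), ('Z'))
SELECT col1 FROM t ORDER BY {order}
"""


def run(order_expr: str) -> list:
    # Run the query with the given ORDER BY expression and return the column values.
    return [row[0] for row in conn.execute(query.format(order=order_expr))]


binary_order = run("col1")        # default BINARY collation: byte order
upper_order = run("UPPER(col1)")  # compare upper-cased copies instead
conn.close()

print(binary_order)  # ['Z', 'aBc', 'ab', 'acd']
print(upper_order)   # ['ab', 'aBc', 'acd', 'Z']
```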

Related

Order by string returns wrong order

I am using a column named ItemCode, of type VARCHAR(50).
Here is my query:
Select * from Inventory order by ItemCode
My result currently looks like this:
ItemCode-1
ItemCode-10
ItemCode-2
ItemCode-20
And so on.
How can I order my string as the example below?
ItemCode-1
ItemCode-2
ItemCode-10
ItemCode-20
Should I convert my column to a number? Note that some values contain no number at all.
You could order by the numeric part:
SELECT Str
FROM
(
VALUES
('ItemCode-1'),
('ItemCode-10'),
('ItemCode-2'),
('ItemCode-20')
) T(Str)
ORDER BY CAST(RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)) AS INT)
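For comparison, the same idea as a short Python sketch (numeric_suffix is a hypothetical helper mirroring the RIGHT/CHARINDEX expression): take everything after the first '-' and sort on it as an integer. Like the CAST, int() fails on values without a numeric suffix, which is what the UPDATE in this answer addresses.

```python
codes = ["ItemCode-1", "ItemCode-10", "ItemCode-2", "ItemCode-20"]


def numeric_suffix(code: str) -> int:
    # Same idea as RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)):
    # everything after the first '-', converted to an integer.
    return int(code[code.index("-") + 1:])


ordered = sorted(codes, key=numeric_suffix)
print(ordered)  # ['ItemCode-1', 'ItemCode-2', 'ItemCode-10', 'ItemCode-20']
```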
Note: since you tagged your question with SQL Server 2008, you should upgrade as soon as possible, because that version is out of support.
UPDATE:
Since you didn't provide good sample data, I'm just guessing.
Here is another approach that may fit your requirements:
SELECT Str
FROM
(
VALUES
('ItemCode-1'),
('ItemCode-10'),
('ItemCode-2'),
('ItemCode-20'),
('Item-Code')
) T(Str)
ORDER BY CASE WHEN Str LIKE '%[0-9]' THEN CAST(RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)) AS INT) ELSE 0 END
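A Python sketch of the CASE expression on the same sample data (sort_key is a hypothetical name): values that end in a digit are keyed on their numeric suffix, everything else gets 0, so 'Item-Code' sorts first.

```python
codes = ["ItemCode-1", "ItemCode-10", "ItemCode-2", "ItemCode-20", "Item-Code"]


def sort_key(code: str) -> int:
    # Mirrors CASE WHEN Str LIKE '%[0-9]' ... ELSE 0 END:
    # only values ending in an ASCII digit get a numeric key.
    if code and code[-1] in "0123456789":
        return int(code[code.index("-") + 1:])
    return 0


ordered = sorted(codes, key=sort_key)
print(ordered)
# ['Item-Code', 'ItemCode-1', 'ItemCode-2', 'ItemCode-10', 'ItemCode-20']
```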
This is expected behavior: it is lexicographic sorting, which basically means the values are treated as strings and compared character by character.
You need to use something like this:
ORDER BY
CASE WHEN ItemCode like '%[0-9]%'
THEN Replicate('0', 100 - Len(ItemCode)) + ItemCode
ELSE ItemCode END
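The padding trick can be sketched in Python as well (padded_key is a hypothetical name). For values like these, the effect is that shorter digit-containing values sort first and equal-length ones compare character by character, so it yields numeric order when the values share a prefix such as ItemCode-.

```python
codes = ["ItemCode-1", "ItemCode-10", "ItemCode-2", "ItemCode-20"]


def padded_key(code: str) -> str:
    # Mirrors Replicate('0', 100 - Len(ItemCode)) + ItemCode:
    # left-pad values that contain a digit with zeros to 100 characters.
    if any(ch in "0123456789" for ch in code):
        return code.rjust(100, "0")
    return code


ordered = sorted(codes, key=padded_key)
print(ordered)  # ['ItemCode-1', 'ItemCode-2', 'ItemCode-10', 'ItemCode-20']
```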

SQL Server - collation - difference between Latin1_General_CI_AS and Latin1_General_CS_AS

How can I see the difference between
SELECT *
FROM (VALUES ('A'),('B'),('Y'),('Z'), ('a'),('b'),('y'),('z')) V(C)
ORDER BY C COLLATE Latin1_General_CS_AS
and
SELECT *
FROM (VALUES ('A'),('B'),('Y'),('Z'), ('a'),('b'),('y'),('z')) V(C)
ORDER BY C COLLATE Latin1_General_CI_AS
? For this set of characters there is no difference.
Cheers
Bartosz
Add an integer ID column to your set of values, and order by that after ordering by C.
SELECT *
FROM (VALUES (1,'a'),(2,'b'),(3,'y'),(4,'z'),(5,'A'),(6,'B'),(7,'Y'),(8,'Z'),(9,'a'),(10,'b'),(11,'y'),(12,'z')) V(ID,C)
ORDER BY C COLLATE Latin1_General_CS_AS,ID
SELECT *
FROM (VALUES (1,'a'),(2,'b'),(3,'y'),(4,'z'),(5,'A'),(6,'B'),(7,'Y'),(8,'Z'),(9,'a'),(10,'b'),(11,'y'),(12,'z')) V(ID,C)
ORDER BY C COLLATE Latin1_General_CI_AS,ID
For the first table, which is case-sensitive, 'a' <> 'A', so they are treated separately. The ordering puts lowercase a first and orders those rows by ID (1, 9), followed by uppercase A.
ID C
1 a
9 a
5 A
In the second table, 'a' = 'A', so they are treated as one group, and the three a (or A) values are ordered together by ID:
ID C
1 a
5 A
9 a
And this pattern continues for b, y and z.
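The two groupings can be emulated in Python for ASCII letters (cs_key and ci_key are hypothetical helpers). This assumes, as described above, that the case-sensitive collation places lowercase before uppercase for the same letter; it is not a reimplementation of Latin1_General_CS_AS or Latin1_General_CI_AS.

```python
rows = [(1, "a"), (2, "b"), (3, "y"), (4, "z"), (5, "A"), (6, "B"),
        (7, "Y"), (8, "Z"), (9, "a"), (10, "b"), (11, "y"), (12, "z")]


def cs_key(row):
    row_id, c = row
    # Case-sensitive: same letter grouped, lowercase before uppercase,
    # then ID breaks ties between identical values.
    return (c.lower(), c.isupper(), row_id)


def ci_key(row):
    row_id, c = row
    # Case-insensitive: 'a' and 'A' are equal, so only ID separates them.
    return (c.lower(), row_id)


cs = sorted(rows, key=cs_key)
ci = sorted(rows, key=ci_key)
print(cs[:3])  # [(1, 'a'), (9, 'a'), (5, 'A')]
print(ci[:3])  # [(1, 'a'), (5, 'A'), (9, 'a')]
```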

A query to get me specific alphabetic order for sybase

I have a table called telephone_contacts that contains two columns:
telephone_contacts (
Name varchar(100),
Numbers number(20)
)
The Name column contains about 20,000 rows.
I want to filter the names by their first letter: one query that returns only names starting with A, B, C, D, E, F or G, and another that returns only names starting with the last six letters (U, V, W, X, Y, Z).
For example, suppose the Name column contains the following data:
Abe, car, night, range, chicken, zoo, whatsapp, facebook, viber, Adu, aramt, Bike, Male, dog, egg
The query for A to G should then return
abe, car, chicken, facebook, adu, aramt, bike, dog, egg
and ignore the rest.
In Oracle I can do it like this; how do I do it in Sybase?
SELECT * FROM user_tab_cols WHERE SUBSTR(UPPER(table_name),1,1) BETWEEN 'A' and 'Q'
SELECT * FROM user_tab_cols WHERE SUBSTR(UPPER(table_name),1,1) BETWEEN 'P' and 'Z'
In Sybase you can use a character range such as [A-G] in a LIKE pattern (a regex-style range, although LIKE is not a full regular expression). Assuming your server is case-insensitive, you can do the following:
SELECT * FROM telephone_contacts WHERE name LIKE "[A-G]%"
SELECT * FROM telephone_contacts WHERE name LIKE "[U-Z]%"
or, with a range comparison (% is only a wildcard inside LIKE, so use an exclusive upper bound instead):
SELECT * FROM telephone_contacts WHERE name >= "A" AND name < "H"
SELECT * FROM telephone_contacts WHERE name >= "U"
If you find that your server is case-sensitive, you can do what another answer suggests and use upper(name).
It's even simpler:
select * from yourtable where upper(name) like "[A-Q]%"
select * from yourtable where upper(name) like "[P-Z]%"
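Here is a minimal Python sketch of the first-letter filter on the sample names from the question, assuming case-insensitive matching (starts_in_range is a hypothetical helper). The last lines show why % cannot act as a wildcard outside LIKE: in an ordinary comparison it is just the character with code 37, which sorts before letters.

```python
names = ["Abe", "car", "night", "range", "chicken", "zoo", "whatsapp",
         "facebook", "viber", "Adu", "aramt", "Bike", "Male", "dog", "egg"]


def starts_in_range(name: str, low: str, high: str) -> bool:
    # Similar in spirit to upper(name) LIKE "[low-high]%":
    # test only the first character, case-insensitively.
    return bool(name) and low <= name[0].upper() <= high


first = [n for n in names if starts_in_range(n, "A", "G")]
last = [n for n in names if starts_in_range(n, "U", "Z")]
print(first)  # ['Abe', 'car', 'chicken', 'facebook', 'Adu', 'aramt', 'Bike', 'dog', 'egg']
print(last)   # ['zoo', 'whatsapp', 'viber']

# Outside LIKE, '%' is a plain character, so "garden" sorts after "g%".
print("garden" > "g%")  # True
```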

SQL SERVER - Understanding how MIN(text) works

I'm doing a little digging and looking for an explanation of how SQL Server evaluates MIN(Varchar).
I found this remark in BOL: MIN finds the lowest value in the collating sequence defined in the underlying database
So if I have a table that has one row with the following values:
Data
AA
AB
AC
Doing a SELECT MIN(DATA) returns AA. I just want to understand the why behind this and understand the BOL a little better.
Thanks!
It's determined by the collation (sort order). For most cultures the collation order is the same as the alphabetical order of the English alphabet, so:
'AA' < 'AB'
'AA' < 'AC'
'AB' < 'AC'
Therefore 'AA' is the minimum value. For other cultures this may not hold. For example a Danish collation would return 'AB' as the minimum because 'AA' > 'AB'. This is because 'AA' is treated as equivalent to 'Å' which is the last letter in the Danish alphabet.
SELECT MIN(s COLLATE Danish_Norwegian_CI_AS) FROM table1;
min_s
AB
To get an "ordinary" sort order, use the Latin1_General_Bin collation:
SELECT MIN(s COLLATE Latin1_General_Bin) FROM table1;
min_s
AA
To reproduce this result you can create this test table:
CREATE TABLE table1 (s varchar(100));
INSERT INTO table1 (s) VALUES ('AA'), ('AB'), ('AC');
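To make the collation point concrete without SQL Server, here is a toy Python sketch. The binary case is just min() on code points; the Danish case is an approximation under a loud assumption (danish_like_key is hypothetical) that only maps "AA" to "Å", which is enough to reproduce the example but nothing like a full Danish collation.

```python
values = ["AA", "AB", "AC"]

# Binary / code-point order, comparable to Latin1_General_BIN: 'AA' is smallest.
print(min(values))  # AA


def danish_like_key(s: str) -> str:
    # Toy rule only: treat "AA" as the single letter 'Å' (U+00C5),
    # whose code point sorts after 'Z'. Real Danish collations do far more.
    return s.replace("AA", "\u00c5")


print(min(values, key=danish_like_key))  # AB
```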
No, MIN is used in a SELECT statement that scans more than one row. It takes a column as an argument and returns the "lowest" value (again, according to the collation sequence) found in that column.
Used without a GROUP BY clause, the result set will have a single row, and the value of MIN will be the lowest value found in that column. Used with a GROUP BY clause, the result set will have one row for each group and the value of MIN will be the lowest value in that column for any row in the group.
MIN(x), where x is a character (string) type such as char, varchar, nchar or nvarchar, finds the lowest value in the group based on SQL's string comparison rules:
if two strings differ in length, the shorter is padded with SP characters (spaces) to the length of the longer.
comparison proceeds left-to-right, character by character, according to the rule of the collation sequence in use.
in ordering, SQL Server sorts NULL lower than any non-null value (the ISO/ANSI SQL standard leaves it to the implementation whether NULL sorts before or after non-null values); aggregates such as MIN ignore NULLs altogether.
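A small Python sketch of these rules under a binary (code-point) collation; sql_compare and sql_min are hypothetical helpers, not SQL Server APIs. It pads with spaces before comparing and, as SQL aggregates do, sql_min skips NULLs (None).

```python
from functools import cmp_to_key
from typing import List, Optional


def sql_compare(a: str, b: str) -> int:
    # Rule 1: pad the shorter string with spaces to the length of the longer.
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    # Rule 2: compare left to right; Python's str comparison does this by code point.
    return (a > b) - (a < b)


def sql_min(values: List[Optional[str]]) -> Optional[str]:
    # Aggregates skip NULLs; if nothing is left, the result is NULL (None).
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return min(non_null, key=cmp_to_key(sql_compare))


print(sql_compare("abc", "abc  "))        # 0  (trailing spaces do not matter)
print(sql_compare("ab", "abc"))           # -1 ("ab " vs "abc": ' ' < 'c')
print(sql_min(["AC", None, "AA", "AB"]))  # AA
print(sql_min([None, None]))              # None
```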
So, if you have a table
create table foo
(
myString varchar(16) not null
)
then running the query
select min(myString) from foo
will give you the same result set as if you executed
set rowcount 1
select myString
from foo
order by myString
set rowcount 0
You are basically ordering the set in ascending sequence and selecting the first value. MAX(), of course, does the inverse: it orders the set in descending sequence and selects the first value.
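The equivalence can be checked with Python's sqlite3 module as a stand-in engine (SQLite has no SET ROWCOUNT, so LIMIT 1 takes its place; the three rows are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (myString VARCHAR(16) NOT NULL)")
conn.executemany("INSERT INTO foo (myString) VALUES (?)",
                 [("AC",), ("AA",), ("AB",)])

min_value = conn.execute("SELECT MIN(myString) FROM foo").fetchone()[0]
# LIMIT 1 plays the role of SET ROWCOUNT 1 here.
first_value = conn.execute(
    "SELECT myString FROM foo ORDER BY myString LIMIT 1").fetchone()[0]

max_value = conn.execute("SELECT MAX(myString) FROM foo").fetchone()[0]
last_value = conn.execute(
    "SELECT myString FROM foo ORDER BY myString DESC LIMIT 1").fetchone()[0]
conn.close()

print(min_value, first_value)  # AA AA
print(max_value, last_value)   # AC AC
```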

Maintain ordering of characters if there is no id (SQL Server 2005)

I have the following
Chars
A
C
W
B
J
M
How can I insert some sequential numbers so that after insertion of the numbers the order of characters will not change?
For example, if I use ROW_NUMBER(), the order of the characters changes:
select
ROW_NUMBER() over(order by chars) as id,
t.* from #t t
Output:
id chars
1 A
2 B
3 C
4 J
5 M
6 W
My desired expectation is
id chars
1 A
2 C
3 W
4 B
5 J
6 M
Also, I cannot use an identity column like id int identity, because I am in the middle of a query and need to keep an inner join to achieve something.
I hope I have made myself clear.
Please help.
Thanks in advance
There is no implicit ordering of rows in SQL. If some ordering is desired, be it order in which items were inserted or any other order, it must be supported by a user-defined column.
In other words, the SQL standard doesn't require the SQL implementations to maintain any order. On the other hand the ORDER BY clause in a SELECT statement can be used to specify the desired order, but such ordering is supported by the values in a particular (again, user defined) column.
This user-defined column may well be an auto-incremented (IDENTITY) column, to which SQL Server assigns incremental (or otherwise) values, and this may be what you need.
Maybe something like...
CREATE TABLE myTable
(
InsertID smallint IDENTITY(1,1),
OneChar CHAR(1),
SomeOtherField VARCHAR(20)
-- ... etc.
)
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('A', 'Alpha')
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('W', 'Whiskey')
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('B', 'Bravo')
-- ... etc.
SELECT OneChar
FROM myTable
ORDER BY InsertId
'A'
'W'
'B'
--...
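The same pattern can be tried with Python's sqlite3 module, assuming SQLite's INTEGER PRIMARY KEY as a stand-in for IDENTITY(1,1): on a fresh table it numbers rows 1, 2, 3, ... in insertion order, and ordering by that column returns the characters exactly as inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY stands in for IDENTITY(1,1); on a fresh table
# SQLite assigns 1, 2, 3, ... as rows are inserted.
conn.execute("CREATE TABLE myTable (InsertID INTEGER PRIMARY KEY, OneChar CHAR(1))")
for ch in ["A", "C", "W", "B", "J", "M"]:
    conn.execute("INSERT INTO myTable (OneChar) VALUES (?)", (ch,))

rows = conn.execute(
    "SELECT InsertID, OneChar FROM myTable ORDER BY InsertID").fetchall()
conn.close()

print(rows)  # [(1, 'A'), (2, 'C'), (3, 'W'), (4, 'B'), (5, 'J'), (6, 'M')]
```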
