Stackoverflow supports table markdown. For example, to display a table like this:
N_NATIONKEY
N_NAME
N_REGIONKEY
0
ALGERIA
0
1
ARGENTINA
1
2
BRAZIL
1
3
CANADA
1
4
EGYPT
4
You can write code like this:
|N_NATIONKEY|N_NAME|N_REGIONKEY|
|---:|:---|---:|
|0|ALGERIA|0|
|1|ARGENTINA|1|
|2|BRAZIL|1|
|3|CANADA|1|
|4|EGYPT|4|
It would save a lot of time to generate the Stackoverflow table markdown automatically when running Snowflake queries.
The following stored procedure accepts either a query string or a query ID (it will auto-detect which it is) and returns the table results as Stackoverflow table markdown. It will automatically align numbers and dates to the right, strings, arrays, and objects to the left, and other types default to centered. It supports any query you can pass to it. It may be a good idea to use $$ to terminate the string passed into the procedure in case the SQL contains single quotes. You can create the procedure and test it using this script:
create or replace procedure MARKDOWN("queryOrQueryId" string)
returns string
language javascript
execute as caller
as
$$
const MAX_ROWS = 50; // Set the maximum row count to fetch. Tables in markdown larger than this become hard to read.
var [rs, i, c, row, props] = [null, 0, 0, 0, {}];
if (!queryOrQueryId || queryOrQueryId == 0){
queryOrQueryId = `select * from table(result_scan(last_query_id())) limit ${MAX_ROWS}`;
}
queryOrQueryId = queryOrQueryId.trim();
if (isUUID(queryOrQueryId)){
rs = snowflake.execute({sqlText:`select * from table(result_scan('${queryOrQueryId}')) limit ${MAX_ROWS}`});
} else {
rs = snowflake.execute({sqlText:`${queryOrQueryId}`});
}
props.columnCount = rs.getColumnCount();
for(i = 1; i <= props.columnCount; i++){
props["col" + i + "Name"] = rs.getColumnName(i);
props["col" + i + "Type"] = rs.getColumnType(i);
}
var table = getHeader(props);
while(rs.next()){
row = "|";
for(c = 1; c <= props.columnCount; c++){
row += escapeMarkup(rs.getColumnValueAsString(c)) + "|";
}
table += "\n" + row;
}
return table;
//------ End main function. Start of helper functions.
function escapeMarkup(s){
s = s.replace(/\\/g, "\\\\");
s = s.replaceAll('|', '\\|');
s = s.replace(/\s+/g, " ");
return s;
}
function getHeader(props){
s = "|";
for (var i = 1; i <= props.columnCount; i++){
s += props["col" + i + "Name"] + "|";
}
s += "\n";
for (var i = 1; i <= props.columnCount; i++){
switch(props["col" + i + "Type"]) {
case 'number':
s += '|---:';
break;
case 'string':
s += '|:---';
break;
case 'date':
s += '|---:';
break;
case 'json':
s += '|:---';
break;
default:
s += '|:---:';
}
}
return s + "|";
}
function isUUID(str){
const regexExp = /^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$/gi;
return regexExp.test(str);
}
$$;
-- Usage type 1, a simple query:
call stackoverflow_table($$ select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5 $$);
-- Usage type 2, a query ID:
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
call stackoverflow_table($quid);
Edit: Based on Fieldy's helpful feedback, I modified the procedure code to allow passing null or 0 or a blank string '' as the parameter. This will use the last query ID and is a helpful shortcut. It also adds a constant to the code that will limit the returns to a set number of rows. This limit will be applied when using query IDs (or sending null, '', or 0, which uses the last query ID). The limit is not applied when the input parameter is the text of a query to run to avoid syntax errors if there's already a limit applied, etc.
Greg Pavlik's Javascript Stored Procedure solution made me wonder if this would be any easier with the new Python language support in Stored Procedures. This is currently a public-preview feature.
The Python Snowpark API supports returning a result as a Pandas dataframe, and Pandas supports returning a dataframe in Markdown format, via the tabulate package. Here's the stored procedure.
CREATE OR REPLACE PROCEDURE markdown_table(query_id VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python','pandas','tabulate', 'regex')
HANDLER = 'markdown_table'
EXECUTE AS CALLER
AS $$
import pandas as pd
import tabulate
import regex
def markdown_table(session, queryOrQueryId = None):
# Validate UUID
if(queryOrQueryId is None):
pandas_result = session.sql("""Select * from table(result_scan(last_query_id()))""").to_pandas()
elif(bool(regex.match("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", queryOrQueryId))):
pandas_result = session.sql(f"""select * from table(result_scan('{queryOrQueryId}'))""").to_pandas()
else:
pandas_result = session.sql(queryOrQueryId).to_pandas()
return pandas_result.to_markdown()
$$;
Which you can use as follows:
-- Usage type 1, use the result from the query ran immediately proceeding the Store-Procedure Call
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
call markdown_table(NULL);
-- Usage type 2, pass in a query_id
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
select $quid;
call markdown_table($quid);
-- Usage type 3, provide a Query string to the Store-Procedure Call
call markdown_table('select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5');
The table can also be
N_NATIONKEY|N_NAME|N_REGIONKEY
--|--|--
0|ALGERIA|0
1|ARGENTINA|1
2|BRAZIL|1
3|CANADA|1
4|EGYPT|4
giving, so it can be a simpler solution
N_NATIONKEY
N_NAME
N_REGIONKEY
0
ALGERIA
0
1
ARGENTINA
1
2
BRAZIL
1
3
CANADA
1
4
EGYPT
4
I grab the result table and use notepad++ and replace tab \t with pipe space | and then insert by hand the header marker line. I sometime replace the empty null results with the text null to make the results make more sense. the form you use with the start/end pipes gets around the need for that.
DBeaver IDE supports "data export as markdown" and "advanced copy as markdown" out-of-the-box:
Output:
|R_REGIONKEY|R_NAME|R_COMMENT|
|-----------|------|---------|
|0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
|1|AMERICA|hs use ironic, even requests. s|
|2|ASIA|ges. thinly even pinto beans ca|
|3|EUROPE|ly final courts cajole furiously final excuse|
|4|MIDDLE EAST|uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl|
It is rendered as:
R_REGIONKEY
R_NAME
R_COMMENT
0
AFRICA
lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1
AMERICA
hs use ironic, even requests. s
2
ASIA
ges. thinly even pinto beans ca
3
EUROPE
ly final courts cajole furiously final excuse
4
MIDDLE EAST
uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
I would like to get all the type names of a user seperated in commas and included in single quotes. The problem I have is that &apos ; character is displayed as output instead of '.
Trial 1
SELECT LISTAGG(TYPE_NAME, ''',''') WITHIN GROUP (ORDER BY TYPE_NAME)
FROM ALL_TYPES
WHERE OWNER = 'USER1';
ORA-01489: result of string concatenation is too long
01489. 00000 - "result of string concatenation is too long"
*Cause: String concatenation result IS more THAN THE maximum SIZE.
*Action: Make sure that the result is less than the maximum size.
Trial 2
SELECT '''' || RTRIM(XMLAGG(XMLELEMENT(E,TYPE_NAME,q'$','$' ).EXTRACT('//text()')
ORDER BY TYPE_NAME).GetClobVal(),q'$','$') AS LIST
FROM ALL_TYPES
WHERE OWNER = 'USER1';
&apos ;TYPE1&apos ;,&apos ;TYPE2&apos ;, ............... ,'TYPE3&apos ;,&apos ;
Trial 3
SELECT
dbms_xmlgen.CONVERT(XMLAGG(XMLELEMENT(E,TYPE_NAME,''',''').EXTRACT('//text()')
ORDER BY TYPE_NAME).GetClobVal())
AS LIST
FROM ALL_TYPES
WHERE OWNER = 'USER1';
TYPE1& ;apos ;,& ;apos ;TYPE2& ;apos ;, ......... ,& ;apos ;TYPE3& ;apos ;,& ;apos ;
I don;t want to call replace function and then make substring as follow
With tbla as (
SELECT REPLACE('''' || RTRIM(XMLAGG(XMLELEMENT(E,TYPE_NAME,q'$','$' ).EXTRACT('//text()')
ORDER BY TYPE_NAME).GetClobVal(),q'$','$'),''',''') AS LIST
FROM ALL_TYPES
WHERE OWNER = 'USER1')
select SUBSTR(list, 1, LENGTH(list) - 2)
from tbla;
Is there any other way ?
use dbms_xmlgen.convert(col, 1) to prevent escaping.
According to Official docs, the second param flag is:
flag
The flag setting; ENTITY_ENCODE (default) for encode, and
ENTITY_DECODE for decode.
ENTITY_DECODE - 1
ENTITY_ENCODE - 0 default
Try this:
select
''''||substr(s, 1, length(s) - 2) list
from (
select
dbms_xmlgen.convert(xmlagg(xmlelement(e,type_name,''',''')
order by type_name).extract('//text()').getclobval(), 1) s
from all_types
where owner = 'USER1'
);
Tested the similar code below with 100000 rows:
with t (s) as (
select level
from dual
connect by level < 100000
)
select
''''||substr(s, 1, length(s) - 2)
from (
select
dbms_xmlgen.convert(xmlagg(XMLELEMENT(E,s,''',''') order by s desc).extract('//text()').getClobVal(), 1) s
from t);
I have the created the table below
CREATE TABLE geom (g GEOMETRY);
and have inserted many rows, example below:
INSERT INTO geom (g)
VALUES(PolygonFromText('POLYGON((
9.190586853 45.464518970,
9.190602686 45.463993916,
9.191572471 45.464001929,
9.191613325 45.463884676,
9.192136130 45.463880767,
9.192111509 45.464095594,
9.192427961 45.464117804,
9.192417811 45.464112862,
9.192509035 45.464225851,
9.192493139 45.464371079,
9.192448471 45.464439002,
9.192387444 45.464477861,
9.192051402 45.464483037,
9.192012814 45.464643592,
9.191640825 45.464647090,
9.191622331 45.464506215,
9.190586853 45.464518970))')
);
Now I want to search all the data and return the entries where a lat / long I have falls withn any of the polygons.
How can this be done using mysql? or is anyone aware of any links that will point me in the right direction?
MySQL as of v5.1 only supports operations on the minimum bounding rectangles (MBR). While there is a "Contains" function which would do what you need, it is not fully implemented and falls back to using MBRContains
From the relevant manual page
Currently, MySQL does not implement
these functions according to the
specification. Those that are
implemented return the same result as
the corresponding MBR-based functions.
This includes functions in the
following list other than Distance()
and Related().
These functions may be implemented in
future releases with full support for
spatial analysis, not just MBR-based
support.
What you could do is let MySQL give you an approximate result based on MBR, and then post process it to perform a more accurate test. Alternatively, switch to PostGIS!
(Update May 2012 - thanks Mike Toews)
MySQL 5.6.1+ offers functions which use object shapes rather than MBR
MySQL originally implemented these functions such that they used
object bounding rectangles and returned the same result as the
corresponding MBR-based functions. As of MySQL 5.6.1, corresponding
versions are available that use precise object shapes. These versions
are named with an ST_ prefix. For example, Contains() uses object
bounding rectangles, whereas ST_Contains() uses object shapes.
If you cannot change dbs to one that has spatial operators implemented correctly like PostgreSQL's PostGIS extension http://postgis.refractions.net/ , you can solve this problem using a two-part approach.
First let MySQL give you an bounding box pre-filtering result based on the bounding box (that is what it does by default) using their intersects operator (http://dev.mysql.com/doc/refman/5.1/en/functions-that-test-spatial-relationships-between-geometries.html#function_intersects).
If queries are slow, make sure that you have an index on your geometry field first.
Then hydrate the original geometry that you used in your query into a geometry object of GIS geometry library like GEOS (http://trac.osgeo.org/geos/) (C++ based, although it also has bindings for different languages like Python), Shapely (http://trac.gispython.org/lab/wiki/Shapely), OGR ( or the Java Topology Suite (JTS) http://www.vividsolutions.com/jts/jtshome.htm).
Test each of the geometries that you get back from your query result using the appropriate operator like within or intersects. Any of these libraries will give you a boolean result.
Personally, I would look at the samples for OGR since it has a big community that is ready to help.
Oh yeah, and sorry for putting the links like that... I guess since I am "new" I can only post one link (?)
The function given in this post on the MySQL forums works perfectly for me.
It's not very quick and you have to ensure the parameter 'mp' is the same type as the spatial column you are using (I used ogr2ogr to import an Ordnance Survey shapefile into MySQL, so had to change it from 'MULTIPOLYGON' to 'GEOMETRY')
I have rewritten function that was given in previous post by #danherd, so it can work with real multipolygons which consist from more that one polygon. For those of you who
still keep using old MySql version it should help.
Here it is:
DELIMITER //
CREATE FUNCTION GISWithin(pt POINT, mp MULTIPOLYGON) RETURNS INT(1) DETERMINISTIC
BEGIN
DECLARE str_big, str, xy LONGTEXT;
DECLARE x, y, p1x, p1y, p2x, p2y, m, xinters DECIMAL(16, 13) DEFAULT 0;
DECLARE counter INT DEFAULT 0;
DECLARE p, pb, pe, sb, se, ct DECIMAL(16, 0) DEFAULT 0;
SELECT MBRWithin(pt, mp) INTO p;
IF p != 1 OR ISNULL(p) THEN
return p;
END IF;
SELECT X(pt), Y(pt), ASTEXT(mp) INTO x, y, str_big;
SET str_big = REPLACE(str_big, 'MULTIPOLYGON(((','');
SET str_big = REPLACE(str_big, ')))', '');
SET str_big = REPLACE(str_big, ')),((', '|');
SET str_big = CONCAT(str_big, '|');
SET sb = 1;
SET se = LOCATE('|', str_big);
SET str = SUBSTRING(str_big, sb, se - sb);
WHILE se > 0 DO
SET ct = ct + 1;
SET str = SUBSTRING(str_big, sb, se - sb);
SET pb = 1;
SET pe = LOCATE(',', str);
SET xy = SUBSTRING(str, pb, pe - pb);
SET p = INSTR(xy, ' ');
SET p1x = SUBSTRING(xy, 1, p - 1);
SET p1y = SUBSTRING(xy, p + 1);
SET str = CONCAT(str, xy, ',');
WHILE pe > 0 DO
SET xy = SUBSTRING(str, pb, pe - pb);
SET p = INSTR(xy, ' ');
SET p2x = SUBSTRING(xy, 1, p - 1);
SET p2y = SUBSTRING(xy, p + 1);
IF p1y < p2y THEN SET m = p1y; ELSE SET m = p2y; END IF;
IF y > m THEN
IF p1y > p2y THEN SET m = p1y; ELSE SET m = p2y; END IF;
IF y <= m THEN
IF p1x > p2x THEN SET m = p1x; ELSE SET m = p2x; END IF;
IF x <= m THEN
IF p1y != p2y THEN
SET xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x;
END IF;
IF p1x = p2x OR x <= xinters THEN
SET counter = counter + 1;
END IF;
END IF;
END IF;
END IF;
SET p1x = p2x;
SET p1y = p2y;
SET pb = pe + 1;
SET pe = LOCATE(',', str, pb);
END WHILE;
SET sb = se + 1;
SET se = LOCATE('|', str_big, sb);
END WHILE;
RETURN counter % 2;
END
DELIMITER ;
We have a large database on which we have DB side pagination. This is quick, returning a page of 50 rows from millions of records in a small fraction of a second.
Users can define their own sort, basically choosing what column to sort by. Columns are dynamic - some have numeric values, some dates and some text.
While most sort as expected text sorts in a dumb way. Well, I say dumb, it makes sense to computers, but frustrates users.
For instance, sorting by a string record id gives something like:
rec1
rec10
rec14
rec2
rec20
rec3
rec4
...and so on.
I want this to take account of the number, so:
rec1
rec2
rec3
rec4
rec10
rec14
rec20
I can't control the input (otherwise I'd just format in leading 000s) and I can't rely on a single format - some are things like "{alpha code}-{dept code}-{rec id}".
I know a few ways to do this in C#, but can't pull down all the records to sort them, as that would be to slow.
Does anyone know a way to quickly apply a natural sort in Sql server?
We're using:
ROW_NUMBER() over (order by {field name} asc)
And then we're paging by that.
We can add triggers, although we wouldn't. All their input is parametrised and the like, but I can't change the format - if they put in "rec2" and "rec10" they expect them to be returned just like that, and in natural order.
We have valid user input that follows different formats for different clients.
One might go rec1, rec2, rec3, ... rec100, rec101
While another might go: grp1rec1, grp1rec2, ... grp20rec300, grp20rec301
When I say we can't control the input I mean that we can't force users to change these standards - they have a value like grp1rec1 and I can't reformat it as grp01rec001, as that would be changing something used for lookups and linking to external systems.
These formats vary a lot, but are often mixtures of letters and numbers.
Sorting these in C# is easy - just break it up into { "grp", 20, "rec", 301 } and then compare sequence values in turn.
However there may be millions of records and the data is paged, I need the sort to be done on the SQL server.
SQL server sorts by value, not comparison - in C# I can split the values out to compare, but in SQL I need some logic that (very quickly) gets a single value that consistently sorts.
#moebius - your answer might work, but it does feel like an ugly compromise to add a sort-key for all these text values.
order by LEN(value), value
Not perfect, but works well in a lot of cases.
Most of the SQL-based solutions I have seen break when the data gets complex enough (e.g. more than one or two numbers in it). Initially I tried implementing a NaturalSort function in T-SQL that met my requirements (among other things, handles an arbitrary number of numbers within the string), but the performance was way too slow.
Ultimately, I wrote a scalar CLR function in C# to allow for a natural sort, and even with unoptimized code the performance calling it from SQL Server is blindingly fast. It has the following characteristics:
will sort the first 1,000 characters or so correctly (easily modified in code or made into a parameter)
properly sorts decimals, so 123.333 comes before 123.45
because of above, will likely NOT sort things like IP addresses correctly; if you wish different behaviour, modify the code
supports sorting a string with an arbitrary number of numbers within it
will correctly sort numbers up to 25 digits long (easily modified in code or made into a parameter)
The code is here:
using System;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
public class UDF
{
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic=true)]
public static SqlString Naturalize(string val)
{
if (String.IsNullOrEmpty(val))
return val;
while(val.Contains(" "))
val = val.Replace(" ", " ");
const int maxLength = 1000;
const int padLength = 25;
bool inNumber = false;
bool isDecimal = false;
int numStart = 0;
int numLength = 0;
int length = val.Length < maxLength ? val.Length : maxLength;
//TODO: optimize this so that we exit for loop once sb.ToString() >= maxLength
var sb = new StringBuilder();
for (var i = 0; i < length; i++)
{
int charCode = (int)val[i];
if (charCode >= 48 && charCode <= 57)
{
if (!inNumber)
{
numStart = i;
numLength = 1;
inNumber = true;
continue;
}
numLength++;
continue;
}
if (inNumber)
{
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
inNumber = false;
}
isDecimal = (charCode == 46);
sb.Append(val[i]);
}
if (inNumber)
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
var ret = sb.ToString();
if (ret.Length > maxLength)
return ret.Substring(0, maxLength);
return ret;
}
static string PadNumber(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}
To register this so that you can call it from SQL Server, run the following commands in Query Analyzer:
CREATE ASSEMBLY SqlServerClr FROM 'SqlServerClr.dll' --put the full path to DLL here
go
CREATE FUNCTION Naturalize(#val as nvarchar(max)) RETURNS nvarchar(1000)
EXTERNAL NAME SqlServerClr.UDF.Naturalize
go
Then, you can use it like so:
select *
from MyTable
order by dbo.Naturalize(MyTextField)
Note: If you get an error in SQL Server along the lines of Execution of user code in the .NET Framework is disabled. Enable "clr enabled" configuration option., follow the instructions here to enable it. Make sure you consider the security implications before doing so. If you are not the db admin, make sure you discuss this with your admin before making any changes to the server configuration.
Note2: This code does not properly support internationalization (e.g., assumes the decimal marker is ".", is not optimized for speed, etc. Suggestions on improving it are welcome!
Edit: Renamed the function to Naturalize instead of NaturalSort, since it does not do any actual sorting.
I know this is an old question but I just came across it and since it's not got an accepted answer.
I have always used ways similar to this:
SELECT [Column] FROM [Table]
ORDER BY RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))), 1000)
The only common times that this has issues is if your column won't cast to a VARCHAR(MAX), or if LEN([Column]) > 1000 (but you can change that 1000 to something else if you want), but you can use this rough idea for what you need.
Also this is much worse performance than normal ORDER BY [Column], but it does give you the result asked for in the OP.
Edit: Just to further clarify, this the above will not work if you have decimal values such as having 1, 1.15 and 1.5, (they will sort as {1, 1.5, 1.15}) as that is not what is asked for in the OP, but that can easily be done by:
SELECT [Column] FROM [Table]
ORDER BY REPLACE(RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))) + REPLICATE('0', 100 - CHARINDEX('.', REVERSE(LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX))))), 1)), 1000), '.', '0')
Result: {1, 1.15, 1.5}
And still all entirely within SQL. This will not sort IP addresses because you're now getting into very specific number combinations as opposed to simple text + number.
RedFilter's answer is great for reasonably sized datasets where indexing is not critical, however if you want an index, several tweaks are required.
First, mark the function as not doing any data access and being deterministic and precise:
[SqlFunction(DataAccess = DataAccessKind.None,
SystemDataAccess = SystemDataAccessKind.None,
IsDeterministic = true, IsPrecise = true)]
Next, MSSQL has a 900 byte limit on the index key size, so if the naturalized value is the only value in the index, it must be at most 450 characters long. If the index includes multiple columns, the return value must be even smaller. Two changes:
CREATE FUNCTION Naturalize(#str AS nvarchar(max)) RETURNS nvarchar(450)
EXTERNAL NAME ClrExtensions.Util.Naturalize
and in the C# code:
const int maxLength = 450;
Finally, you will need to add a computed column to your table, and it must be persisted (because MSSQL cannot prove that Naturalize is deterministic and precise), which means the naturalized value is actually stored in the table but is still maintained automatically:
ALTER TABLE YourTable ADD nameNaturalized AS dbo.Naturalize(name) PERSISTED
You can now create the index!
CREATE INDEX idx_YourTable_n ON YourTable (nameNaturalized)
I've also made a couple of changes to RedFilter's code: using chars for clarity, incorporating duplicate space removal into the main loop, exiting once the result is longer than the limit, setting maximum length without substring etc. Here's the result:
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
public static class Util
{
[SqlFunction(DataAccess = DataAccessKind.None, SystemDataAccess = SystemDataAccessKind.None, IsDeterministic = true, IsPrecise = true)]
public static SqlString Naturalize(string str)
{
if (string.IsNullOrEmpty(str))
return str;
const int maxLength = 450;
const int padLength = 15;
bool isDecimal = false;
bool wasSpace = false;
int numStart = 0;
int numLength = 0;
var sb = new StringBuilder();
for (var i = 0; i < str.Length; i++)
{
char c = str[i];
if (c >= '0' && c <= '9')
{
if (numLength == 0)
numStart = i;
numLength++;
}
else
{
if (numLength > 0)
{
sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
numLength = 0;
}
if (c != ' ' || !wasSpace)
sb.Append(c);
isDecimal = c == '.';
if (sb.Length > maxLength)
break;
}
wasSpace = c == ' ';
}
if (numLength > 0)
sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
if (sb.Length > maxLength)
sb.Length = maxLength;
return sb.ToString();
}
private static string pad(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}
Here's a solution written for SQL 2000. It can probably be improved for newer SQL versions.
/**
* Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
*
* #author Alexandre Potvin Latreille (plalx)
* #param {nvarchar(4000)} string The formatted string.
* #param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
* #param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
*
* #return {nvarchar(4000)} A string for natural sorting.
* Example of use:
*
* SELECT Name FROM TableA ORDER BY Name
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1-1.
* 2. A1-1. 2. A1.
* 3. R1 --> 3. R1
* 4. R11 4. R11
* 5. R2 5. R2
*
*
* As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
* We can use this function to fix this.
*
* SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1.
* 2. A1-1. 2. A1-1.
* 3. R1 --> 3. R1
* 4. R11 4. R2
* 5. R2 5. R11
*/
ALTER FUNCTION [dbo].[udf_NaturalSortFormat](
#string nvarchar(4000),
#numberLength int = 10,
#sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
DECLARE #sortString varchar(4000),
#numStartIndex int,
#numEndIndex int,
#padLength int,
#totalPadLength int,
#i int,
#sameOrderCharsLen int;
SELECT
#totalPadLength = 0,
#string = RTRIM(LTRIM(#string)),
#sortString = #string,
#numStartIndex = PATINDEX('%[0-9]%', #string),
#numEndIndex = 0,
#i = 1,
#sameOrderCharsLen = LEN(#sameOrderChars);
-- Replace all char that have the same order by a space.
WHILE (#i <= #sameOrderCharsLen)
BEGIN
SET #sortString = REPLACE(#sortString, SUBSTRING(#sameOrderChars, #i, 1), ' ');
SET #i = #i + 1;
END
-- Pad numbers with zeros.
WHILE (#numStartIndex <> 0)
BEGIN
SET #numStartIndex = #numStartIndex + #numEndIndex;
SET #numEndIndex = #numStartIndex;
WHILE(PATINDEX('[0-9]', SUBSTRING(#string, #numEndIndex, 1)) = 1)
BEGIN
SET #numEndIndex = #numEndIndex + 1;
END
SET #numEndIndex = #numEndIndex - 1;
SET #padLength = #numberLength - (#numEndIndex + 1 - #numStartIndex);
IF #padLength < 0
BEGIN
SET #padLength = 0;
END
SET #sortString = STUFF(
#sortString,
#numStartIndex + #totalPadLength,
0,
REPLICATE('0', #padLength)
);
SET #totalPadLength = #totalPadLength + #padLength;
SET #numStartIndex = PATINDEX('%[0-9]%', RIGHT(#string, LEN(#string) - #numEndIndex));
END
RETURN #sortString;
END
I know this is a bit old at this point, but in my search for a better solution, I came across this question. I'm currently using a function to order by. It works fine for my purpose of sorting records which are named with mixed alpha numeric ('item 1', 'item 10', 'item 2', etc)
CREATE FUNCTION [dbo].[fnMixSort]
(
#ColValue NVARCHAR(255)
)
RETURNS NVARCHAR(1000)
AS
BEGIN
DECLARE #p1 NVARCHAR(255),
#p2 NVARCHAR(255),
#p3 NVARCHAR(255),
#p4 NVARCHAR(255),
#Index TINYINT
IF #ColValue LIKE '[a-z]%'
SELECT #Index = PATINDEX('%[0-9]%', #ColValue),
#p1 = LEFT(CASE WHEN #Index = 0 THEN #ColValue ELSE LEFT(#ColValue, #Index - 1) END + REPLICATE(' ', 255), 255),
#ColValue = CASE WHEN #Index = 0 THEN '' ELSE SUBSTRING(#ColValue, #Index, 255) END
ELSE
SELECT #p1 = REPLICATE(' ', 255)
SELECT #Index = PATINDEX('%[^0-9]%', #ColValue)
IF #Index = 0
SELECT #p2 = RIGHT(REPLICATE(' ', 255) + #ColValue, 255),
#ColValue = ''
ELSE
SELECT #p2 = RIGHT(REPLICATE(' ', 255) + LEFT(#ColValue, #Index - 1), 255),
#ColValue = SUBSTRING(#ColValue, #Index, 255)
SELECT #Index = PATINDEX('%[0-9,a-z]%', #ColValue)
IF #Index = 0
SELECT #p3 = REPLICATE(' ', 255)
ELSE
SELECT #p3 = LEFT(REPLICATE(' ', 255) + LEFT(#ColValue, #Index - 1), 255),
#ColValue = SUBSTRING(#ColValue, #Index, 255)
IF PATINDEX('%[^0-9]%', #ColValue) = 0
SELECT #p4 = RIGHT(REPLICATE(' ', 255) + #ColValue, 255)
ELSE
SELECT #p4 = LEFT(#ColValue + REPLICATE(' ', 255), 255)
RETURN #p1 + #p2 + #p3 + #p4
END
Then call
select item_name from my_table order by fnMixSort(item_name)
It easily triples the processing time for a simple data read, so it may not be the perfect solution.
Here is an other solution that I like:
http://www.dreamchain.com/sql-and-alpha-numeric-sort-order/
It's not Microsoft SQL, but since I ended up here when I was searching for a solution for Postgres, I thought adding this here would help others.
EDIT: Here is the code, in case the link goes away.
CREATE or REPLACE FUNCTION pad_numbers(text) RETURNS text AS $$
SELECT regexp_replace(regexp_replace(regexp_replace(regexp_replace(($1 collate "C"),
E'(^|\\D)(\\d{1,3}($|\\D))', E'\\1000\\2', 'g'),
E'(^|\\D)(\\d{4,6}($|\\D))', E'\\1000\\2', 'g'),
E'(^|\\D)(\\d{7}($|\\D))', E'\\100\\2', 'g'),
E'(^|\\D)(\\d{8}($|\\D))', E'\\10\\2', 'g');
$$ LANGUAGE SQL;
"C" is the default collation in postgresql; you may specify any collation you desire, or remove the collation statement if you can be certain your table columns will never have a nondeterministic collation assigned.
usage:
SELECT * FROM wtf w
WHERE TRUE
ORDER BY pad_numbers(w.my_alphanumeric_field)
For the following varchar data:
BR1
BR2
External Location
IR1
IR2
IR3
IR4
IR5
IR6
IR7
IR8
IR9
IR10
IR11
IR12
IR13
IR14
IR16
IR17
IR15
VCR
This worked best for me:
ORDER BY substring(fieldName, 1, 1), LEN(fieldName)
If you're having trouble loading the data from the DB to sort in C#, then I'm sure you'll be disappointed with any approach at doing it programmatically in the DB. When the server is going to sort, it's got to calculate the "perceived" order just as you would have -- every time.
I'd suggest that you add an additional column to store the preprocessed sortable string, using some C# method, when the data is first inserted. You might try to convert the numerics into fixed-width ranges, for example, so "xyz1" would turn into "xyz00000001". Then you could use normal SQL Server sorting.
At the risk of tooting my own horn, I wrote a CodeProject article implementing the problem as posed in the CodingHorror article. Feel free to steal from my code.
Simply you sort by
ORDER BY
cast (substring(name,(PATINDEX('%[0-9]%',name)),len(name))as int)
##
I've just read a article somewhere about such a topic. The key point is: you only need the integer value to sort data, while the 'rec' string belongs to the UI. You could split the information in two fields, say alpha and num, sort by alpha and num (separately) and then showing a string composed by alpha + num. You could use a computed column to compose the string, or a view.
Hope it helps
You can use the following code to resolve the problem:
Select *,
substring(Cote,1,len(Cote) - Len(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1)))alpha,
CAST(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1) AS INT)intv
FROM Documents
left outer join Sites ON Sites.IDSite = Documents.IDSite
Order BY alpha, intv
regards,
rabihkahaleh#hotmail.com
I'm fashionably late to the party as usual. Nevertheless, here is my attempt at an answer that seems to work well (I would say that). It assumes text with digits at the end, like in the original example data.
First a function that won't end up winning a "pretty SQL" competition anytime soon.
CREATE FUNCTION udfAlphaNumericSortHelper (
#string varchar(max)
)
RETURNS #results TABLE (
txt varchar(max),
num float
)
AS
BEGIN
DECLARE #txt varchar(max) = #string
DECLARE #numStr varchar(max) = ''
DECLARE #num float = 0
DECLARE #lastChar varchar(1) = ''
set #lastChar = RIGHT(#txt, 1)
WHILE #lastChar <> '' and #lastChar is not null
BEGIN
IF ISNUMERIC(#lastChar) = 1
BEGIN
set #numStr = #lastChar + #numStr
set #txt = Substring(#txt, 0, len(#txt))
set #lastChar = RIGHT(#txt, 1)
END
ELSE
BEGIN
set #lastChar = null
END
END
SET #num = CAST(#numStr as float)
INSERT INTO #results select #txt, #num
RETURN;
END
Then call it like below:
declare #str nvarchar(250) = 'sox,fox,jen1,Jen0,jen15,jen02,jen0004,fox00,rec1,rec10,jen3,rec14,rec2,rec20,rec3,rec4,zip1,zip1.32,zip1.33,zip1.3,TT0001,TT01,TT002'
SELECT tbl.value --, sorter.txt, sorter.num
FROM STRING_SPLIT(#str, ',') as tbl
CROSS APPLY dbo.udfAlphaNumericSortHelper(value) as sorter
ORDER BY sorter.txt, sorter.num, len(tbl.value)
With results:
fox
fox00
Jen0
jen1
jen02
jen3
jen0004
jen15
rec1
rec2
rec3
rec4
rec10
rec14
rec20
sox
TT01
TT0001
TT002
zip1
zip1.3
zip1.32
zip1.33
I still don't understand (probably because of my poor English).
You could try:
ROW_NUMBER() OVER (ORDER BY dbo.human_sort(field_name) ASC)
But it won't work for millions of records.
That why I suggested to use trigger which fills separate column with human value.
Moreover:
built-in T-SQL functions are really
slow and Microsoft suggest to use
.NET functions instead.
human value is constant so there is no point calculating it each time
when query runs.