BigQuery - Random numbers repeating when generated inside arrays - arrays

I made a BigQuery query which involves generating an array of random numbers for each row. I use the random numbers to decide which elements to include from an array that exists in my source table.
I had a lot of trouble getting the arrays of random numbers to not repeat themselves over every single row. I found a workaround, but is this expected behavior? I'll post two "methods" (one with desired results, one with bad results) below. Note that both methods work fine if you don't use an array, but just generate a single random number.
Method 1 (BAD Results):
SELECT
(
SELECT
ARRAY(
SELECT AS STRUCT
RAND() AS random
FROM UNNEST(GENERATE_ARRAY(0, 10, 1)) AS _time
) AS random_for_times
)
FROM UNNEST(GENERATE_ARRAY(0, 10, 1))
Method 2 (GOOD results):
SELECT
(
SELECT
ARRAY(
SELECT AS STRUCT
RAND() AS random
FROM UNNEST(GENERATE_ARRAY(0, 10, 1)) AS _time
) AS random_for_times
FROM (SELECT NULL FROM UNNEST([0]))
)
FROM UNNEST(GENERATE_ARRAY(0, 10, 1))
Example Results - Method 1 (BAD):
Row 1
0.5431173080158003
0.5585452983410205
...
Row 2
0.5431173080158003
0.5585452983410205
...
Example Results - Method 2 (GOOD):
Row 1
0.49639706531271377
0.1604380522058521
...
Row 2
0.7971869432989377
0.9815667330115473
...
EDIT: See below for some alternative examples that are similar, after Yun Zhang's theory about subqueries. Your solution was useful for the problem I posted, but note that there are still some cases I am finding baffling. Also, although I agree that you are probably correct about the subqueries being tied to the problem: shouldn't a subquery (especially one without a FROM clause) be less likely to have its results re-used than selecting a "normal" value? People talk about performance issues with subqueries sometimes, because they are supposedly calculated one time for each row, even if the results may be the same.
Do you agree that this seems like it may be a bug?
The below examples show that it is not necessarily even creating an array of randoms that is the problem -- even performing a sub-select that just happens to have an unrelated array in it can cause problems with RAND(). The problem goes away by eliminating the sub-select, by choosing just the random value from the sub-select, or by including a value inside the array which varies by row. Weird !!!
BAD
SELECT
(SELECT AS STRUCT RAND() AS r, ARRAY(SELECT 1) AS a)
FROM UNNEST(GENERATE_ARRAY(0, 5, 1)) AS u
FIX #1 - No subselect
SELECT
STRUCT(RAND() AS r, ARRAY(SELECT 1) AS a)
FROM UNNEST(GENERATE_ARRAY(0, 5, 1)) AS u
FIX #2 - Select only r
SELECT
(SELECT AS STRUCT RAND() AS r, ARRAY(SELECT 1) AS a).r
FROM UNNEST(GENERATE_ARRAY(0, 5, 1)) AS u
Fix #3 - Array contains "u"
SELECT
(SELECT AS STRUCT RAND() AS r, ARRAY(SELECT u) AS a).r
FROM UNNEST(GENERATE_ARRAY(0, 5, 1)) AS u

Haven't understood why first query didn't work but I have a simpler version that works for you:
SELECT (
SELECT array_agg(RAND()) AS random
FROM UNNEST(GENERATE_ARRAY(0, 10, 1)) AS _time
) AS random_for_times
FROM UNNEST(GENERATE_ARRAY(0, 10, 1))
Update: I later realized that the problem is with ARRAY(subquery), as long as you can avoid using it for your case (like in my query above), you should be fine.

Related

How does the outer WHERE clause affect the way nested query is executed?

Let's say I have a table lines
b | a
-----------
17 7000
17 0
18 6000
18 0
19 5000
19 2500
I want to get positive values of a function: (a1 - a2) \ (b2 - b1) for all elements in cartesian product of lines with different b's. (If you are interested this will result in intersections of lines y1 = b1*x + a1 and y2 = b2*x + a2)
I wrote query1 for that cause
SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0
It throws a "division by zero" error. I tried the same query without the WHERE clause (query2) and it works just fine
SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
as well as the variation with the defined SQL function (query3)
CREATE FUNCTION get_point(#a1 DECIMAL(18, 4), #a2 DECIMAL(18, 4), #b1 INT, #b2 INT)
RETURNS DECIMAL(18, 4)
WITH EXECUTE AS CALLER
AS
BEGIN
RETURN (SELECT (#a1 - #a2) / (#b2 - #b1))
END
GO
SELECT temp.point FROM
(SELECT DISTINCT dbo.get_point(l1.a, l2.a, l1.b, l2.b) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0
I have an intuitive assumption that the outer SELECT shouldn't affect the way nested SELECT is executed (or at least shouldn't break it). Even if it is not true that wouldn't explain why query3 works when query1 doesn't.
Could someone explain the principle behind this? That would be much appreciated.
If you want to guarantee that the query will always work, you'd need to wrap your calculation in something like a case statement
case when l2.b - l1.b = 0
then null
else (l1.a - l2.a) / (l2.b - l1.b)
end
Technically, the optimizer is perfectly free to evaluate conditions in whatever order it expects will be more efficient. The optimizer is free to evaluate the division before the where clause that filters out rows where the divisor would be 0. It is also free to evaluate the where clause first. Your different queries have different query plans which result in different behavior.
Realistically, though, even though a particular query might have a "good" query plan today, there is no guarantee that the optimizer won't decide in a day, a month, or a year to change the query plan to something that would throw a division by 0 error. I suppose you could decide to use a bunch of hints/ plan guides to force a particular plan with a particular behavior to be used. But that tends to be the sort of thing that bites you in the hind quarters later. Wrapping the calculation in a case (or otherwise preventing the division by 0 error) will be much safer and easier to explain to the next developer.

SQL case in select query

I have following query
Select
a,b,c,
case totalcount
when 0 then 0
else abcd/totalcount
end AS 'd',
case totalcount
when 0 then 0
else defg/totalcount
end AS 'e'
From
Table1
In this query, I have same case statement in select query...Can i make it into single select statement.
"totalcount" is some value... abcd and defg are two column values from table1. If totalcount is zero, i dont want to divide, else I wish to divide to get an average value of abcd and defg.
No, since you seem to be requiring two output columns in your desired resultset... Each Case statement can generate at most one output value.
It doesn't matter how many columns the case statement needs to access to do it's job, or how many different values it has to select from (how many When... Then... expressions), it just matters how many columns you are trying to generate in the final resultset.
At first view, I would say no, since you seem to use the exact same condition leading to different results.
Besides, this looks odd to me, why would you need two SELECT CASE for only one condition? This makes no sense.
Could you be more specific or give a real-world example of what you're trying to ask, with "real" data so that we might better answer your question?
EDIT #1
Given that:
Yes...it's two different fields....and the valud i need is calculated one...which is Value = a/b. But i need to check if b is zero or not
I would still answer no, as if they are two different fields, and you want both results in your result set, then you will need to write two CASE statements, one for each of the fields you need to verify whether it is zero-valued or not. Keep in mind that one CASE statement is equivalent to one single column only. So, if you need to check for a zero value, you are required to check for both independently.
By the way, sorry for my misunderstanding, that must be a language barrier, unfortunately. =) But my requirement for "real" data example holds, since this might light us all up with some other related solutions, we never know! =)
EDIT #2
Considering you have to check for a value of 0, I would perhaps rewrite my code as follows, for the sake of readability:
select a, b, c
, case when condition1 <> 0 then abcd / condition1 else 0 end as Result1
, case when condition2 <> 0 then defg / condition2 else 0 end as Result2
from Table1
In my opinion, it is leaner and swifter in the code, and it is easier to understand what is the intention. But after all, the result would be the same! No offense made here! =)
EDIT #3
After having read your question edit:
"totalcount" is some value... abcd and defg are two column values from table1. If totalcount is zero, i dont want to divide, else I wish to divide to get an average value of abcd and defg.
I say, if it is the average that you're after, why not just use the AVG function which will deal with it internaly, without having to care about zero values?
select a, b, c
, AVG(abcd) as Result1
, AVG(defg) as Result2
from Table1
Plus, considering having no record, the average can only be 0!
I don't know about your database engine, but if it is Oracle or SQL Server, and I think DB2 as well, they support this function.
You need to elaborate more - what is the condition? With what you have entered so far, it looks like you are just using the result of the expression, making the CASE statement redundant.
For example, see my inline comments:
SELECT a,b,c,
case condition
when 0 then 0 --this is unnecessary
else abcd/condition, --this is just using the result of the condition?
case condition
when 0 then 0 --this is unnecessary
else defg/condition --this is just using the result of the condition?
from Table1
so you could just refactor this as:
SELECT a
,b
,c
,expression
,expression
from Table1
if you actually meant the else abcd/condition as evaluate another condition or use a default value then you need to expand on that so we can answer accurately.
EDIT: if you are looking to avoid a divide by zero and you only want to evaluate condition once, then you need to do it outside of the SELECT and use the variable inside the SELECT. I wouldn't be concerned about the performance impact of evaluating it multiple times, if the condition doesn't change then its value should be cached by the query execution engine. Although it could look ugly and unmaintainable, in which case you could possibly factor condition into a function.
If all you're trying to do is avoid dividing by zero, and if it's not critical that an attempt to divide by zero actually produces zero, then you can do the following:
Select
a,b,c,
abcd / NULLIF(totalcount,0) AS 'd',
defg / NULLIF(totalcount,0) AS 'e'
From
Table1
This transforms an attempt to divide by zero into an attempt to divide by NULL, which always produces NULL.
If you need zero and you know that abcd and defg will never be NULL, then you can wrap the whole thing in an ISNULL(...,0) to force it to be zero. If abcd or defg might legitimately be null and you need the zero output if totalcount is zero, then I don't know any other option than using a CASE statement, sorry.
Moreover, since it appears totalcount is an expression, for maintainability you could compute it in a subquery, such as:
Select
a,b,c,
abcd / NULLIF(totalcount,0) AS 'd',
defg / NULLIF(totalcount,0) AS 'e'
From
(
Select
a,b,c,
abcd,
defg,
(... expression goes here ...) AS totalcount
From
Table1
) Data
But if it's a simple expression this may be overkill.
Technically this is possible. I'm not suggesting it's a good idea though!
;with Table1 as
(
select 1 as a, 2 as b, 3 as c, 0 as condition, 1234 as abcd, 4567 AS defg UNION ALL
select 1 as a, 2 as b, 3 as c, 1 as condition, 1234.0 as abcd, 4567 AS defg
)
,
cte AS
(
Select
a,b,c,
case condition
when 0 then CAST(CAST(0 as decimal(18, 6)) as binary(9)) + CAST(CAST(0 as decimal(18, 6)) as binary(9))
else CAST(CAST(abcd/condition as decimal(18, 6)) as binary(9)) + CAST(CAST(defg/condition as decimal(18, 6)) as binary(9))
end AS d
From
Table1
)
Select
a,
b,
c,
cast(substring(d,1,9) as decimal(18, 6)) as d,
cast(substring(d,10,9) as decimal(18, 6)) as e
From cte

"if, then, else" in SQLite

Without using custom functions, is it possible in SQLite to do the following. I have two tables, which are linked via common id numbers. In the second table, there are two variables. What I would like to do is be able to return a list of results, consisting of: the row id, and NULL if all instances of those two variables (and there may be more than two) are NULL, 1 if they are all 0 and 2 if one or more is 1.
What I have right now is as follows:
SELECT
a.aid,
(SELECT count(*) from W3S19 b WHERE a.aid=b.aid) as num,
(SELECT count(*) FROM W3S19 c WHERE a.aid=c.aid AND H110 IS NULL AND H112 IS NULL) as num_null,
(SELECT count(*) FROM W3S19 d WHERE a.aid=d.aid AND (H110=1 or H112=1)) AS num_yes
FROM W3 a
So what this requires is to step through each result as follows (rough Python pseudocode):
if row['num_yes'] > 0:
out[aid] = 2
elif row['num_null'] == row['num']:
out[aid] = 'NULL'
else:
out[aid] = 1
Is there an easier way? Thanks!
Use CASE...WHEN, e.g.
CASE x WHEN w1 THEN r1 WHEN w2 THEN r2 ELSE r3 END
Read more from SQLite syntax manual (go to section "The CASE expression").
There's another way, for numeric values, which might be easier for certain specific cases.
It's based on the fact that boolean values is 1 or 0, "if condition" gives a boolean result:
(this will work only for "or" condition, depends on the usage)
SELECT (w1=TRUE)*r1 + (w2=TRUE)*r2 + ...
of course #evan's answer is the general-purpose, correct answer

How do I generate a random number for each row in a T-SQL select?

I need a different random number for each row in my table. The following seemingly obvious code uses the same random value for each row.
SELECT table_name, RAND() magic_number
FROM information_schema.tables
I'd like to get an INT or a FLOAT out of this. The rest of the story is I'm going to use this random number to create a random date offset from a known date, e.g. 1-14 days offset from a start date.
This is for Microsoft SQL Server 2000.
Take a look at SQL Server - Set based random numbers which has a very detailed explanation.
To summarize, the following code generates a random number between 0 and 13 inclusive with a uniform distribution:
ABS(CHECKSUM(NewId())) % 14
To change your range, just change the number at the end of the expression. Be extra careful if you need a range that includes both positive and negative numbers. If you do it wrong, it's possible to double-count the number 0.
A small warning for the math nuts in the room: there is a very slight bias in this code. CHECKSUM() results in numbers that are uniform across the entire range of the sql Int datatype, or at least as near so as my (the editor) testing can show. However, there will be some bias when CHECKSUM() produces a number at the very top end of that range. Any time you get a number between the maximum possible integer and the last exact multiple of the size of your desired range (14 in this case) before that maximum integer, those results are favored over the remaining portion of your range that cannot be produced from that last multiple of 14.
As an example, imagine the entire range of the Int type is only 19. 19 is the largest possible integer you can hold. When CHECKSUM() results in 14-19, these correspond to results 0-5. Those numbers would be heavily favored over 6-13, because CHECKSUM() is twice as likely to generate them. It's easier to demonstrate this visually. Below is the entire possible set of results for our imaginary integer range:
Checksum Integer: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Range Result: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 0 1 2 3 4 5
You can see here that there are more chances to produce some numbers than others: bias. Thankfully, the actual range of the Int type is much larger... so much so that in most cases the bias is nearly undetectable. However, it is something to be aware of if you ever find yourself doing this for serious security code.
When called multiple times in a single batch, rand() returns the same number.
I'd suggest using convert(varbinary,newid()) as the seed argument:
SELECT table_name, 1.0 + floor(14 * RAND(convert(varbinary, newid()))) magic_number
FROM information_schema.tables
newid() is guaranteed to return a different value each time it's called, even within the same batch, so using it as a seed will prompt rand() to give a different value each time.
Edited to get a random whole number from 1 to 14.
RAND(CHECKSUM(NEWID()))
The above will generate a (pseudo-) random number between 0 and 1, exclusive. If used in a select, because the seed value changes for each row, it will generate a new random number for each row (it is not guaranteed to generate a unique number per row however).
Example when combined with an upper limit of 10 (produces numbers 1 - 10):
CAST(RAND(CHECKSUM(NEWID())) * 10 as INT) + 1
Transact-SQL Documentation:
CAST(): https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql
RAND(): http://msdn.microsoft.com/en-us/library/ms177610.aspx
CHECKSUM(): http://msdn.microsoft.com/en-us/library/ms189788.aspx
NEWID(): https://learn.microsoft.com/en-us/sql/t-sql/functions/newid-transact-sql
Random number generation between 1000 and 9999 inclusive:
FLOOR(RAND(CHECKSUM(NEWID()))*(9999-1000+1)+1000)
"+1" - to include upper bound values(9999 for previous example)
Answering the old question, but this answer has not been provided previously, and hopefully this will be useful for someone finding this results through a search engine.
With SQL Server 2008, a new function has been introduced, CRYPT_GEN_RANDOM(8), which uses CryptoAPI to produce a cryptographically strong random number, returned as VARBINARY(8000). Here's the documentation page: https://learn.microsoft.com/en-us/sql/t-sql/functions/crypt-gen-random-transact-sql
So to get a random number, you can simply call the function and cast it to the necessary type:
select CAST(CRYPT_GEN_RANDOM(8) AS bigint)
or to get a float between -1 and +1, you could do something like this:
select CAST(CRYPT_GEN_RANDOM(8) AS bigint) % 1000000000 / 1000000000.0
The Rand() function will generate the same random number, if used in a table SELECT query. Same applies if you use a seed to the Rand function. An alternative way to do it, is using this:
SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS [RandomNumber]
Got the information from here, which explains the problem very well.
Do you have an integer value in each row that you could pass as a seed to the RAND function?
To get an integer between 1 and 14 I believe this would work:
FLOOR( RAND(<yourseed>) * 14) + 1
If you need to preserve your seed so that it generates the "same" random data every time, you can do the following:
1. Create a view that returns select rand()
if object_id('cr_sample_randView') is not null
begin
drop view cr_sample_randView
end
go
create view cr_sample_randView
as
select rand() as random_number
go
2. Create a UDF that selects the value from the view.
if object_id('cr_sample_fnPerRowRand') is not null
begin
drop function cr_sample_fnPerRowRand
end
go
create function cr_sample_fnPerRowRand()
returns float
as
begin
declare #returnValue float
select #returnValue = random_number from cr_sample_randView
return #returnValue
end
go
3. Before selecting your data, seed the rand() function, and then use the UDF in your select statement.
select rand(200); -- see the rand() function
with cte(id) as
(select row_number() over(order by object_id) from sys.all_objects)
select
id,
dbo.cr_sample_fnPerRowRand()
from cte
where id <= 1000 -- limit the results to 1000 random numbers
select round(rand(checksum(newid()))*(10)+20,2)
Here the random number will come in between 20 and 30.
round will give two decimal place maximum.
If you want negative numbers you can do it with
select round(rand(checksum(newid()))*(10)-60,2)
Then the min value will be -60 and max will be -50.
try using a seed value in the RAND(seedInt). RAND() will only execute once per statement that is why you see the same number each time.
If you don't need it to be an integer, but any random unique identifier, you can use newid()
SELECT table_name, newid() magic_number
FROM information_schema.tables
You would need to call RAND() for each row. Here is a good example
https://web.archive.org/web/20090216200320/http://dotnet.org.za/calmyourself/archive/2007/04/13/sql-rand-trap-same-value-per-row.aspx
The problem I sometimes have with the selected "Answer" is that the distribution isn't always even. If you need a very even distribution of random 1 - 14 among lots of rows, you can do something like this (my database has 511 tables, so this works. If you have less rows than you do random number span, this does not work well):
SELECT table_name, ntile(14) over(order by newId()) randomNumber
FROM information_schema.tables
This kind of does the opposite of normal random solutions in the sense that it keeps the numbers sequenced and randomizes the other column.
Remember, I have 511 tables in my database (which is pertinent only b/c we're selecting from the information_schema). If I take the previous query and put it into a temp table #X, and then run this query on the resulting data:
select randomNumber, count(*) ct from #X
group by randomNumber
I get this result, showing me that my random number is VERY evenly distributed among the many rows:
It's as easy as:
DECLARE #rv FLOAT;
SELECT #rv = rand();
And this will put a random number between 0-99 into a table:
CREATE TABLE R
(
Number int
)
DECLARE #rv FLOAT;
SELECT #rv = rand();
INSERT INTO dbo.R
(Number)
values((#rv * 100));
SELECT * FROM R
select ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) as [Randomizer]
has always worked for me
Use newid()
select newid()
or possibly this
select binary_checksum(newid())
If you want to generate a random number between 1 and 14 inclusive.
SELECT CONVERT(int, RAND() * (14 - 1) + 1)
OR
SELECT ABS(CHECKSUM(NewId())) % (14 -1) + 1
DROP VIEW IF EXISTS vwGetNewNumber;
GO
Create View vwGetNewNumber
as
Select CAST(RAND(CHECKSUM(NEWID())) * 62 as INT) + 1 as NextID,
'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'as alpha_num;
---------------CTDE_GENERATE_PUBLIC_KEY -----------------
DROP FUNCTION IF EXISTS CTDE_GENERATE_PUBLIC_KEY;
GO
create function CTDE_GENERATE_PUBLIC_KEY()
RETURNS NVARCHAR(32)
AS
BEGIN
DECLARE #private_key NVARCHAR(32);
set #private_key = dbo.CTDE_GENERATE_32_BIT_KEY();
return #private_key;
END;
go
---------------CTDE_GENERATE_32_BIT_KEY -----------------
DROP FUNCTION IF EXISTS CTDE_GENERATE_32_BIT_KEY;
GO
CREATE function CTDE_GENERATE_32_BIT_KEY()
RETURNS NVARCHAR(32)
AS
BEGIN
DECLARE #public_key NVARCHAR(32);
DECLARE #alpha_num NVARCHAR(62);
DECLARE #start_index INT = 0;
DECLARE #i INT = 0;
select top 1 #alpha_num = alpha_num from vwGetNewNumber;
WHILE #i < 32
BEGIN
select top 1 #start_index = NextID from vwGetNewNumber;
set #public_key = concat (substring(#alpha_num,#start_index,1),#public_key);
set #i = #i + 1;
END;
return #public_key;
END;
select dbo.CTDE_GENERATE_PUBLIC_KEY() public_key;
Update my_table set my_field = CEILING((RAND(CAST(NEWID() AS varbinary)) * 10))
Number between 1 and 10.
Try this:
SELECT RAND(convert(varbinary, newid()))*(b-a)+a magic_number
Where a is the lower number and b is the upper number
If you need a specific number of random number you can use recursive CTE:
;WITH A AS (
SELECT 1 X, RAND() R
UNION ALL
SELECT X + 1, RAND(R*100000) --Change the seed
FROM A
WHERE X < 1000 --How many random numbers you need
)
SELECT
X
, RAND_BETWEEN_1_AND_14 = FLOOR(R * 14 + 1)
FROM A
OPTION (MAXRECURSION 0) --If you need more than 100 numbers

SQL Server: sort a column numerically if possible, otherwise alpha

I am working with a table that comes from an external source, and cannot be "cleaned". There is a column which an nvarchar(20) and contains an integer about 95% of the time, but occasionally contains an alpha. I want to use something like
select * from sch.tbl order by cast(shouldBeANumber as integer)
but this throws an error on the odd "3A" or "D" or "SUPERCEDED" value.
Is there a way to say "sort it like a number if you can, otherwise just sort by string"? I know there is some sloppiness in that statement, but that is basically what I want.
Lets say for example the values were
7,1,5A,SUPERCEDED,2,5,SECTION
I would be happy if these were sorted in any of the following ways (because I really only need to work with the numeric ones)
1,2,5,7,5A,SECTION,SUPERCEDED
1,2,5,5A,7,SECTION,SUPERCEDED
SECTION,SUPERCEDED,1,2,5,5A,7
5A,SECTION,SUPERCEDED,1,2,5,7
I really only need to work with the
numeric ones
this will give you only the numeric ones, sorted properly:
SELECT
*
FROM YourTable
WHERE ISNUMERIC(YourColumn)=1
ORDER BY YourColumn
select
*
from
sch.tbl
order by
case isnumeric(shouldBeANumber)
when 1 then cast(shouldBeANumber as integer)
else 0
end
Provided that your numbers are not more than 100 characters long:
WITH chars AS
(
SELECT 1 AS c
UNION ALL
SELECT c + 1
FROM chars
WHERE c <= 99
),
rows AS
(
SELECT '1,2,5,7,5A,SECTION,SUPERCEDED' AS mynum
UNION ALL
SELECT '1,2,5,5A,7,SECTION,SUPERCEDED'
UNION ALL
SELECT 'SECTION,SUPERCEDED,1,2,5,5A,7'
UNION ALL
SELECT '5A,SECTION,SUPERCEDED,1,2,5,7'
)
SELECT rows.*
FROM rows
ORDER BY
(
SELECT SUBSTRING(mynum, c, 1) AS [text()]
FROM chars
WHERE SUBSTRING(mynum, c, 1) BETWEEN '0' AND '9'
FOR XML PATH('')
) DESC
SELECT
(CASE ISNUMERIC(shouldBeANumber)
WHEN 1 THEN
RIGHT(CONCAT('00000000',shouldBeANumber), 8)
ELSE
shouoldBeANumber) AS stringSortSafeAlpha
ORDEER BY
stringSortSafeAlpha
This will add leading zeros to all shouldBeANumber values that truly are numbers and leave all remaining values alone. This way, when you sort, you can use an alpha sort but still get the correct values (with an alpha sort, "100" would be less than "50", but if you change "50" to "050", it works fine). Note, for this example, I added 8 leading zeros, but you only need enough leading zeros to cover the largest possible integer in your column.

Resources