Natural (human alpha-numeric) sort in Microsoft SQL 2005 - sql-server

We have a large database on which we have DB-side pagination. This is quick, returning a page of 50 rows from millions of records in a small fraction of a second.
Users can define their own sort, basically choosing what column to sort by. Columns are dynamic - some have numeric values, some dates and some text.
While most sort as expected, text sorts in a dumb way. Well, I say dumb; it makes sense to computers, but frustrates users.
For instance, sorting by a string record id gives something like:
rec1
rec10
rec14
rec2
rec20
rec3
rec4
...and so on.
I want this to take account of the number, so:
rec1
rec2
rec3
rec4
rec10
rec14
rec20
I can't control the input (otherwise I'd just format in leading 000s) and I can't rely on a single format - some are things like "{alpha code}-{dept code}-{rec id}".
I know a few ways to do this in C#, but I can't pull down all the records to sort them, as that would be too slow.
Does anyone know a way to quickly apply a natural sort in SQL Server?
We're using:
ROW_NUMBER() over (order by {field name} asc)
And then we're paging by that.
We can add triggers, although we wouldn't. All their input is parametrised and the like, but I can't change the format - if they put in "rec2" and "rec10" they expect them to be returned just like that, and in natural order.
We have valid user input that follows different formats for different clients.
One might go rec1, rec2, rec3, ... rec100, rec101
While another might go: grp1rec1, grp1rec2, ... grp20rec300, grp20rec301
When I say we can't control the input I mean that we can't force users to change these standards - they have a value like grp1rec1 and I can't reformat it as grp01rec001, as that would be changing something used for lookups and linking to external systems.
These formats vary a lot, but are often mixtures of letters and numbers.
Sorting these in C# is easy - just break it up into { "grp", 20, "rec", 301 } and then compare sequence values in turn.
However there may be millions of records and the data is paged, I need the sort to be done on the SQL server.
SQL server sorts by value, not comparison - in C# I can split the values out to compare, but in SQL I need some logic that (very quickly) gets a single value that consistently sorts.
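The split-and-compare approach described above can be sketched in a few lines. Python is used here only as a neutral illustration (the question's own implementation would be C#), and the helper name natural_key is made up for the example. Note that the result is a tuple rather than a single value, which is exactly the problem the question runs into: ORDER BY needs one scalar per row.

```python
import re


def natural_key(value: str) -> tuple:
    """Split a string into alternating text and integer chunks.

    natural_key("grp20rec301") == ("grp", 20, "rec", 301, "")
    Text chunks always sit at even positions and numbers at odd positions,
    so two keys only ever compare str with str and int with int.
    """
    # The capturing group makes re.split keep the digit runs in the output.
    parts = re.split(r"([0-9]+)", value)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


values = ["grp20rec301", "grp1rec2", "grp20rec300", "grp1rec1", "grp2rec5"]
print(sorted(values, key=natural_key))
```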
@moebius - your answer might work, but it does feel like an ugly compromise to add a sort-key for all these text values.

order by LEN(value), value
Not perfect, but works well in a lot of cases.
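To see both why this works and where it stops working, here is a quick Python emulation of that ORDER BY. It is a sketch that assumes values without trailing spaces, because T-SQL's LEN() ignores them and Python's len() does not.

```python
def len_then_value_key(value: str) -> tuple:
    # Mirrors ORDER BY LEN(value), value: shorter strings first, then plain text order.
    return (len(value), value)


same_prefix = ["rec1", "rec10", "rec14", "rec2", "rec20", "rec3", "rec4"]
print(sorted(same_prefix, key=len_then_value_key))
# -> ['grp2rec1', 'grp1rec10'], although a natural sort puts grp1rec10 first
print(sorted(["grp1rec10", "grp2rec1"], key=len_then_value_key))
```

So it holds up as long as the values share the same text prefix and contain a single number, which is the "not perfect" part.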

Most of the SQL-based solutions I have seen break when the data gets complex enough (e.g. more than one or two numbers in it). Initially I tried implementing a NaturalSort function in T-SQL that met my requirements (among other things, handles an arbitrary number of numbers within the string), but the performance was way too slow.
Ultimately, I wrote a scalar CLR function in C# to allow for a natural sort, and even with unoptimized code the performance calling it from SQL Server is blindingly fast. It has the following characteristics:
will sort the first 1,000 characters or so correctly (easily modified in code or made into a parameter)
properly sorts decimals, so 123.333 comes before 123.45
because of above, will likely NOT sort things like IP addresses correctly; if you wish different behaviour, modify the code
supports sorting a string with an arbitrary number of numbers within it
will correctly sort numbers up to 25 digits long (easily modified in code or made into a parameter)
The code is here:
using System;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
public class UDF
{
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic=true)]
public static SqlString Naturalize(string val)
{
if (String.IsNullOrEmpty(val))
return val;
// Collapse every run of spaces into a single space (the search string is two spaces)
while(val.Contains("  "))
val = val.Replace("  ", " ");
const int maxLength = 1000;
const int padLength = 25;
bool inNumber = false;
bool isDecimal = false;
int numStart = 0;
int numLength = 0;
int length = val.Length < maxLength ? val.Length : maxLength;
//TODO: optimize this so that we exit for loop once sb.ToString() >= maxLength
var sb = new StringBuilder();
for (var i = 0; i < length; i++)
{
int charCode = (int)val[i];
if (charCode >= 48 && charCode <= 57)
{
if (!inNumber)
{
numStart = i;
numLength = 1;
inNumber = true;
continue;
}
numLength++;
continue;
}
if (inNumber)
{
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
inNumber = false;
}
isDecimal = (charCode == 46);
sb.Append(val[i]);
}
if (inNumber)
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
var ret = sb.ToString();
if (ret.Length > maxLength)
return ret.Substring(0, maxLength);
return ret;
}
static string PadNumber(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}
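If you want to check what keys Naturalize produces without deploying the assembly, a line-by-line Python port is handy. It is an illustration rather than the author's code: the names are hypothetical, and it assumes the space-collapsing loop is meant to replace runs of two spaces with one.

```python
MAX_LENGTH = 1000  # same limits as the C# constants maxLength and padLength
PAD_LENGTH = 25


def pad_number(num: str, is_decimal: bool, pad_length: int) -> str:
    # Digits that follow a '.' are padded on the right (so 123.333 < 123.45);
    # all other digit runs are padded on the left (so 2 < 10).
    return num.ljust(pad_length, "0") if is_decimal else num.rjust(pad_length, "0")


def naturalize(val):
    if not val:  # None and "" come back unchanged, like String.IsNullOrEmpty
        return val
    while "  " in val:  # collapse runs of spaces into one
        val = val.replace("  ", " ")
    length = min(len(val), MAX_LENGTH)
    out = []
    in_number = False
    is_decimal = False
    num_start = 0
    for i in range(length):
        ch = val[i]
        if "0" <= ch <= "9":
            if not in_number:
                num_start = i
                in_number = True
            continue
        if in_number:
            out.append(pad_number(val[num_start:i], is_decimal, PAD_LENGTH))
            in_number = False
        is_decimal = ch == "."  # the next digit run is a fractional part
        out.append(ch)
    if in_number:
        out.append(pad_number(val[num_start:length], is_decimal, PAD_LENGTH))
    return "".join(out)[:MAX_LENGTH]


print(sorted(["rec10", "rec2", "rec1", "123.45", "123.333"], key=naturalize))
```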
To register this so that you can call it from SQL Server, run the following commands in Query Analyzer:
CREATE ASSEMBLY SqlServerClr FROM 'SqlServerClr.dll' --put the full path to DLL here
go
CREATE FUNCTION Naturalize(@val as nvarchar(max)) RETURNS nvarchar(1000)
EXTERNAL NAME SqlServerClr.UDF.Naturalize
go
Then, you can use it like so:
select *
from MyTable
order by dbo.Naturalize(MyTextField)
Note: If you get an error in SQL Server along the lines of "Execution of user code in the .NET Framework is disabled. Enable "clr enabled" configuration option.", you need to enable CLR integration first (for example, EXEC sp_configure 'clr enabled', 1; followed by RECONFIGURE;). Make sure you consider the security implications before doing so. If you are not the DB admin, make sure you discuss this with your admin before making any changes to the server configuration.
Note2: This code does not properly support internationalization (e.g., it assumes the decimal marker is "."), is not optimized for speed, etc. Suggestions on improving it are welcome!
Edit: Renamed the function to Naturalize instead of NaturalSort, since it does not do any actual sorting.

I know this is an old question, but I just came across it, and it doesn't have an accepted answer.
I have always used ways similar to this:
SELECT [Column] FROM [Table]
ORDER BY RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))), 1000)
The only common cases where this causes issues are when your column won't cast to VARCHAR(MAX), or when LEN([Column]) > 1000 (you can change that 1000 to something else if you want), but you can adapt this rough idea to what you need.
Also, this performs much worse than a plain ORDER BY [Column], but it does give the result asked for in the OP.
Edit: To clarify further, the above will not work if you have decimal values such as 1, 1.15 and 1.5 (they will sort as {1, 1.5, 1.15}); that is not what the OP asked for, but it can easily be handled by:
SELECT [Column] FROM [Table]
ORDER BY REPLACE(RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))) + REPLICATE('0', 100 - CHARINDEX('.', REVERSE(LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX))))), 1)), 1000), '.', '0')
Result: {1, 1.15, 1.5}
And still all entirely within SQL. This will not sort IP addresses because you're now getting into very specific number combinations as opposed to simple text + number.

RedFilter's answer is great for reasonably sized datasets where indexing is not critical; however, if you want an index, several tweaks are required.
First, mark the function as not doing any data access and being deterministic and precise:
[SqlFunction(DataAccess = DataAccessKind.None,
SystemDataAccess = SystemDataAccessKind.None,
IsDeterministic = true, IsPrecise = true)]
Next, MSSQL has a 900-byte limit on the index key size (SQL Server 2016 and later allow 1,700 bytes for nonclustered indexes), and nvarchar uses 2 bytes per character, so if the naturalized value is the only value in the index, it must be at most 450 characters long. If the index includes multiple columns, the return value must be even smaller. Two changes:
CREATE FUNCTION Naturalize(@str AS nvarchar(max)) RETURNS nvarchar(450)
EXTERNAL NAME ClrExtensions.Util.Naturalize
and in the C# code:
const int maxLength = 450;
Finally, you will need to add a computed column to your table, and it must be persisted (because MSSQL cannot prove that Naturalize is deterministic and precise), which means the naturalized value is actually stored in the table but is still maintained automatically:
ALTER TABLE YourTable ADD nameNaturalized AS dbo.Naturalize(name) PERSISTED
You can now create the index!
CREATE INDEX idx_YourTable_n ON YourTable (nameNaturalized)
I've also made a couple of changes to RedFilter's code: using chars for clarity, incorporating duplicate space removal into the main loop, exiting once the result is longer than the limit, setting maximum length without substring etc. Here's the result:
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
public static class Util
{
[SqlFunction(DataAccess = DataAccessKind.None, SystemDataAccess = SystemDataAccessKind.None, IsDeterministic = true, IsPrecise = true)]
public static SqlString Naturalize(string str)
{
if (string.IsNullOrEmpty(str))
return str;
const int maxLength = 450;
const int padLength = 15;
bool isDecimal = false;
bool wasSpace = false;
int numStart = 0;
int numLength = 0;
var sb = new StringBuilder();
for (var i = 0; i < str.Length; i++)
{
char c = str[i];
if (c >= '0' && c <= '9')
{
if (numLength == 0)
numStart = i;
numLength++;
}
else
{
if (numLength > 0)
{
sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
numLength = 0;
}
if (c != ' ' || !wasSpace)
sb.Append(c);
isDecimal = c == '.';
if (sb.Length > maxLength)
break;
}
wasSpace = c == ' ';
}
if (numLength > 0)
sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
if (sb.Length > maxLength)
sb.Length = maxLength;
return sb.ToString();
}
private static string pad(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}

Here's a solution written for SQL 2000. It can probably be improved for newer SQL versions.
/**
* Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
*
* @author Alexandre Potvin Latreille (plalx)
* @param {nvarchar(4000)} string The string to format.
* @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
* @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
*
* @return {nvarchar(4000)} A string for natural sorting.
* Example of use:
*
* SELECT Name FROM TableA ORDER BY Name
*   TableA (unordered)              TableA (ordered)
*   ------------                    ------------
*   ID  Name                        ID  Name
*   1.  A1.                         1.  A1-1.
*   2.  A1-1.                       2.  A1.
*   3.  R1          -->             3.  R1
*   4.  R11                         4.  R11
*   5.  R2                          5.  R2
*
*
* As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
* We can use this function to fix this.
*
* SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
*   TableA (unordered)              TableA (ordered)
*   ------------                    ------------
*   ID  Name                        ID  Name
*   1.  A1.                         1.  A1.
*   2.  A1-1.                       2.  A1-1.
*   3.  R1          -->             3.  R1
*   4.  R11                         4.  R2
*   5.  R2                          5.  R11
*/
CREATE FUNCTION [dbo].[udf_NaturalSortFormat](
@string nvarchar(4000),
@numberLength int = 10,
@sameOrderChars char(50) = ''
)
RETURNS nvarchar(4000)
AS
BEGIN
DECLARE @sortString nvarchar(4000),
@numStartIndex int,
@numEndIndex int,
@padLength int,
@totalPadLength int,
@i int,
@sameOrderCharsLen int;
SELECT
@totalPadLength = 0,
@string = RTRIM(LTRIM(@string)),
@sortString = @string,
@numStartIndex = PATINDEX('%[0-9]%', @string),
@numEndIndex = 0,
@i = 1,
@sameOrderCharsLen = LEN(@sameOrderChars);
-- Replace all char that have the same order by a space.
WHILE (@i <= @sameOrderCharsLen)
BEGIN
SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
SET @i = @i + 1;
END
-- Pad numbers with zeros.
WHILE (@numStartIndex <> 0)
BEGIN
SET @numStartIndex = @numStartIndex + @numEndIndex;
SET @numEndIndex = @numStartIndex;
WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
BEGIN
SET @numEndIndex = @numEndIndex + 1;
END
SET @numEndIndex = @numEndIndex - 1;
SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);
IF @padLength < 0
BEGIN
SET @padLength = 0;
END
SET @sortString = STUFF(
@sortString,
@numStartIndex + @totalPadLength,
0,
REPLICATE('0', @padLength)
);
SET @totalPadLength = @totalPadLength + @padLength;
SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
END
RETURN @sortString;
END
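The function's behaviour can be checked outside SQL Server with a small Python emulation. It is an approximation under stated assumptions: it reproduces the space substitution and the zero padding, and it imitates SQL Server's trailing-space-insensitive string comparison by trimming trailing spaces. The name natural_sort_format is hypothetical. It reproduces the example in the header comment.

```python
import re


def natural_sort_format(string: str, number_length: int = 10, same_order_chars: str = "") -> str:
    # LTRIM/RTRIM only remove spaces, so strip(" ") rather than strip().
    s = string.strip(" ")
    # Every "same order" character becomes a space, as in the REPLACE loop.
    for ch in same_order_chars:
        s = s.replace(ch, " ")
    # Left-pad each digit run to number_length; longer runs are left as they are.
    s = re.sub(r"[0-9]+", lambda m: m.group(0).rjust(number_length, "0"), s)
    # SQL Server ignores trailing spaces when comparing strings; drop them here
    # so that Python's comparison behaves the same way for this data.
    return s.rstrip(" ")


names = ["A1.", "A1-1.", "R1", "R11", "R2"]
print(sorted(names))  # plain ORDER BY Name
print(sorted(names, key=lambda n: natural_sort_format(n, same_order_chars=".-")))
```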

I know this is a bit old at this point, but in my search for a better solution, I came across this question. I'm currently using a function to order by. It works fine for my purpose of sorting records which are named with mixed alpha numeric ('item 1', 'item 10', 'item 2', etc)
CREATE FUNCTION [dbo].[fnMixSort]
(
@ColValue NVARCHAR(255)
)
RETURNS NVARCHAR(1000)
AS
BEGIN
DECLARE @p1 NVARCHAR(255),
@p2 NVARCHAR(255),
@p3 NVARCHAR(255),
@p4 NVARCHAR(255),
@Index TINYINT
IF @ColValue LIKE '[a-z]%'
SELECT @Index = PATINDEX('%[0-9]%', @ColValue),
@p1 = LEFT(CASE WHEN @Index = 0 THEN @ColValue ELSE LEFT(@ColValue, @Index - 1) END + REPLICATE(' ', 255), 255),
@ColValue = CASE WHEN @Index = 0 THEN '' ELSE SUBSTRING(@ColValue, @Index, 255) END
ELSE
SELECT @p1 = REPLICATE(' ', 255)
SELECT @Index = PATINDEX('%[^0-9]%', @ColValue)
IF @Index = 0
SELECT @p2 = RIGHT(REPLICATE(' ', 255) + @ColValue, 255),
@ColValue = ''
ELSE
SELECT @p2 = RIGHT(REPLICATE(' ', 255) + LEFT(@ColValue, @Index - 1), 255),
@ColValue = SUBSTRING(@ColValue, @Index, 255)
SELECT @Index = PATINDEX('%[0-9,a-z]%', @ColValue)
IF @Index = 0
SELECT @p3 = REPLICATE(' ', 255)
ELSE
SELECT @p3 = LEFT(REPLICATE(' ', 255) + LEFT(@ColValue, @Index - 1), 255),
@ColValue = SUBSTRING(@ColValue, @Index, 255)
IF PATINDEX('%[^0-9]%', @ColValue) = 0
SELECT @p4 = RIGHT(REPLICATE(' ', 255) + @ColValue, 255)
ELSE
SELECT @p4 = LEFT(@ColValue + REPLICATE(' ', 255), 255)
RETURN @p1 + @p2 + @p3 + @p4
END
Then call
select item_name from my_table order by dbo.fnMixSort(item_name)
It easily triples the processing time for a simple data read, so it may not be the perfect solution.

Here is another solution that I like:
http://www.dreamchain.com/sql-and-alpha-numeric-sort-order/
It's not Microsoft SQL Server, but since I ended up here while searching for a solution for Postgres, I thought adding it here would help others.
EDIT: Here is the code, in case the link goes away.
CREATE or REPLACE FUNCTION pad_numbers(text) RETURNS text AS $$
SELECT regexp_replace(regexp_replace(regexp_replace(regexp_replace(($1 collate "C"),
E'(^|\\D)(\\d{1,3}($|\\D))', E'\\1000\\2', 'g'),
E'(^|\\D)(\\d{4,6}($|\\D))', E'\\1000\\2', 'g'),
E'(^|\\D)(\\d{7}($|\\D))', E'\\100\\2', 'g'),
E'(^|\\D)(\\d{8}($|\\D))', E'\\10\\2', 'g');
$$ LANGUAGE SQL;
"C" is the default collation in postgresql; you may specify any collation you desire, or remove the collation statement if you can be certain your table columns will never have a nondeterministic collation assigned.
usage:
SELECT * FROM wtf w
WHERE TRUE
ORDER BY pad_numbers(w.my_alphanumeric_field)
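To check what the PostgreSQL function does to a value, the same four passes can be transcribed into Python's re module. This is a sketch for experimentation, not the original: Python spells the back-references \g<1> and \g<2> so that "\1000" is not read as a single group number. Going by the patterns, the passes together pad every run of 1 to 8 digits to exactly 9 digits, so the approach assumes no number in the data is longer than that.

```python
import re

# Each pass prefixes zeros to digit runs of particular lengths; applied in this
# order they bring every run of 1 to 8 digits up to exactly 9 digits.
PASSES = [
    (r"(^|\D)(\d{1,3}($|\D))", r"\g<1>000\g<2>"),
    (r"(^|\D)(\d{4,6}($|\D))", r"\g<1>000\g<2>"),
    (r"(^|\D)(\d{7}($|\D))", r"\g<1>00\g<2>"),
    (r"(^|\D)(\d{8}($|\D))", r"\g<1>0\g<2>"),
]


def pad_numbers(text: str) -> str:
    for pattern, repl in PASSES:
        # re.ASCII restricts \d and \D to the ASCII digits 0-9.
        text = re.sub(pattern, repl, text, flags=re.ASCII)
    return text


print(pad_numbers("rec1"), pad_numbers("rec12345678"))
print(sorted(["rec10", "rec2", "rec1"], key=pad_numbers))
# A single separator character between two numbers is consumed by the match
# for the first number, so the second number is never padded:
print(pad_numbers("1a2"))
```

The last line shows a limitation worth knowing about: values like 1a2 keep their second number unpadded in this port, and the PostgreSQL regexp_replace with the 'g' flag also uses non-overlapping matches.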

For the following varchar data:
BR1
BR2
External Location
IR1
IR2
IR3
IR4
IR5
IR6
IR7
IR8
IR9
IR10
IR11
IR12
IR13
IR14
IR16
IR17
IR15
VCR
This worked best for me:
ORDER BY substring(fieldName, 1, 1), LEN(fieldName)
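Sketched in Python, that ORDER BY becomes the key below. One caveat: rows that share both the first character and the length (IR1 to IR9, for example) come back in no guaranteed order unless something breaks the tie, so the sketch adds the value itself as a third sort key. That third key is an addition, not part of the answer above.

```python
def first_char_then_length_key(field_name: str) -> tuple:
    # ORDER BY SUBSTRING(fieldName, 1, 1), LEN(fieldName), fieldName
    # The third element is an added tie-breaker; it is not in the answer.
    return (field_name[:1], len(field_name), field_name)


data = ["BR1", "BR2", "External Location", "IR1", "IR2", "IR3", "IR4", "IR5",
        "IR6", "IR7", "IR8", "IR9", "IR10", "IR11", "IR12", "IR13", "IR14",
        "IR16", "IR17", "IR15", "VCR"]
print(sorted(data, key=first_char_then_length_key))
```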

If you're having trouble loading the data from the DB to sort in C#, then I'm sure you'll be disappointed with any approach at doing it programmatically in the DB. When the server is going to sort, it's got to calculate the "perceived" order just as you would have -- every time.
I'd suggest that you add an additional column to store the preprocessed sortable string, using some C# method, when the data is first inserted. You might try to convert the numerics into fixed-width ranges, for example, so "xyz1" would turn into "xyz00000001". Then you could use normal SQL Server sorting.
At the risk of tooting my own horn, I wrote a CodeProject article implementing the problem as posed in the CodingHorror article. Feel free to steal from my code.

Simply sort by
ORDER BY
cast (substring(name,(PATINDEX('%[0-9]%',name)),len(name))as int)

I've just read an article somewhere about this topic. The key point is: you only need the integer value to sort the data, while the 'rec' string belongs to the UI. You could split the information into two fields, say alpha and num, sort by alpha and num (separately) and then show a string composed of alpha + num. You could use a computed column or a view to compose the string.
Hope it helps

You can use the following code to resolve the problem:
Select *,
substring(Cote,1,len(Cote) - Len(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1)))alpha,
CAST(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1) AS INT)intv
FROM Documents
left outer join Sites ON Sites.IDSite = Documents.IDSite
Order BY alpha, intv
regards,
rabihkahaleh@hotmail.com
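The same alpha/intv split, expressed as a Python sort key for illustration (the name alpha_int_key is hypothetical). Like the query, it assumes everything after the first digit is a plain integer, so a value such as grp1rec1 fails rather than sorting.

```python
import re


def alpha_int_key(cote: str) -> tuple:
    # alpha = text before the first digit, intv = the remainder cast to int,
    # like the SUBSTRING/PATINDEX/CAST expressions in the query above.
    alpha, rest = re.match(r"([^0-9]*)(.*)", cote).groups()
    # int() raises ValueError for a remainder such as "1rec1",
    # much as CAST(... AS INT) fails in SQL Server.
    return (alpha, int(rest))


print(sorted(["rec10", "abc3", "rec2", "rec1"], key=alpha_int_key))
```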

I'm fashionably late to the party as usual. Nevertheless, here is my attempt at an answer that seems to work well (I would say that). It assumes text with digits at the end, like in the original example data.
First a function that won't end up winning a "pretty SQL" competition anytime soon.
CREATE FUNCTION udfAlphaNumericSortHelper (
@string varchar(max)
)
RETURNS @results TABLE (
txt varchar(max),
num float
)
AS
BEGIN
DECLARE @txt varchar(max) = @string
DECLARE @numStr varchar(max) = ''
DECLARE @num float = 0
DECLARE @lastChar varchar(1) = ''
set @lastChar = RIGHT(@txt, 1)
WHILE @lastChar <> '' and @lastChar is not null
BEGIN
IF ISNUMERIC(@lastChar) = 1
BEGIN
set @numStr = @lastChar + @numStr
set @txt = Substring(@txt, 0, len(@txt))
set @lastChar = RIGHT(@txt, 1)
END
ELSE
BEGIN
set @lastChar = null
END
END
SET @num = CAST(@numStr as float)
INSERT INTO @results select @txt, @num
RETURN;
END
Then call it like below (STRING_SPLIT requires SQL Server 2016 or later and is only used here to build sample rows):
declare @str nvarchar(250) = 'sox,fox,jen1,Jen0,jen15,jen02,jen0004,fox00,rec1,rec10,jen3,rec14,rec2,rec20,rec3,rec4,zip1,zip1.32,zip1.33,zip1.3,TT0001,TT01,TT002'
SELECT tbl.value --, sorter.txt, sorter.num
FROM STRING_SPLIT(@str, ',') as tbl
CROSS APPLY dbo.udfAlphaNumericSortHelper(value) as sorter
ORDER BY sorter.txt, sorter.num, len(tbl.value)
With results:
fox
fox00
Jen0
jen1
jen02
jen3
jen0004
jen15
rec1
rec2
rec3
rec4
rec10
rec14
rec20
sox
TT01
TT0001
TT002
zip1
zip1.3
zip1.32
zip1.33
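For anyone who wants to experiment outside SQL Server, here is a Python approximation of udfAlphaNumericSortHelper plus the ORDER BY above. It is a sketch under two stated assumptions: only digits and '.' are treated as numeric (ISNUMERIC() accepts a few more single characters, such as '+', '-' and '$'), and lower() stands in for a case-insensitive collation. With the sample string from the answer it reproduces the results listed above.

```python
def alnum_sort_helper(value: str) -> tuple:
    # Peel characters off the end while they look numeric, like the WHILE loop.
    # Simplification: only digits and '.' count as numeric here.
    txt, num_str = value, ""
    while txt and txt[-1] in "0123456789.":
        num_str = txt[-1] + num_str
        txt = txt[:-1]
    num = float(num_str) if num_str else 0.0  # CAST('' AS float) gives 0 in SQL Server
    return txt, num


def sort_key(value: str) -> tuple:
    txt, num = alnum_sort_helper(value)
    # ORDER BY sorter.txt, sorter.num, LEN(value); lower() stands in for a
    # case-insensitive collation (an assumption about the server).
    return (txt.lower(), num, len(value))


sample = ("sox,fox,jen1,Jen0,jen15,jen02,jen0004,fox00,rec1,rec10,jen3,rec14,"
          "rec2,rec20,rec3,rec4,zip1,zip1.32,zip1.33,zip1.3,TT0001,TT01,TT002")
print("\n".join(sorted(sample.split(","), key=sort_key)))
```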

I still don't understand (probably because of my poor English).
You could try:
ROW_NUMBER() OVER (ORDER BY dbo.human_sort(field_name) ASC)
But it won't work for millions of records.
That's why I suggested using a trigger that fills a separate column with the human value.
Moreover:
functions written in T-SQL are really slow for this kind of string processing, and Microsoft suggests using .NET (CLR) functions instead.
The human value is constant, so there is no point calculating it each time the query runs.

Related

SQL: how to do an IF check for date range parameters

I am writing a stored procedure that takes in 4 parameters: confirmation_number, payment_amount, start_range, end_range.
The parameters are optional, so I am doing a check in this fashion for the confirmation_number, and the payment_amount parameters:
IF (@s_Confirmation_Number IS NOT NULL)
SET @SQL = @SQL + ' AND pd.TransactionNumber = @s_Confirmation_Number'
IF (@d_Payment_Amount IS NOT NULL)
SET @SQL = @SQL + ' AND pd.PaymentAmount = @d_Payment_Amount'
I would like to ask for help because I am not sure what the best method is to check the date range parameters.
If someone could give me an example, or several, of how this is best achieved, it would be great.
Thank you in advance.
UPDATE (after receiving some great help):
This is what I have so far. I am following scsimon's recommendation, but I am not sure about the dates; I got the idea from another post I found and played around with it a bit. Would you mind looking at it and telling me what you all think?
Many thanks.
@s_Confirmation_Number NVARCHAR(50) = NULL
, @d_Payment_Amount DECIMAL(18, 2) = NULL
, @d_Start_Range DATE = NULL
, @d_End_Range DATE = NULL
...
....
WHERE
ph.SourceType = @s_Source_Type
AND ((pd.TransConfirmID = @s_Confirmation_Number) OR @s_Confirmation_Number IS NULL)
AND ((pd.PaymentAmount = @d_Payment_Amount) OR @d_Payment_Amount IS NULL)
AND (((NULLIF(@d_Start_Range, '') IS NULL) OR CAST(pd.CreatedDate AS DATE) >= @d_Start_Range)
AND ((NULLIF(@d_End_Range, '') IS NULL) OR CAST(pd.CreatedDate AS DATE) <= @d_End_Range))
(The parameter sourceType is a hard-coded value)
This is called a catch-all or kitchen-sink query. It is usually written like this:
create procedure myProc
(@Payment_Amount int = null
,@Confirmation_Number nvarchar(50) = null -- data type assumed from the question
,@start_range datetime
,@end_range datetime)
as
select ...
from ...
where
(pd.TransactionNumber = @Confirmation_Number or @Confirmation_Number is null)
and (pd.PaymentAmount = @Payment_Amount or @Payment_Amount is null)
The NULL on the two parameters gives them a default of NULL and makes them "optional". The WHERE clause evaluates this to return only rows where your user input matches the column value, or all rows when no user input was supplied (i.e. the parameter IS NULL). You can use this with the date parameters as well. Just pay close attention to your parentheses: they matter a lot here because we are mixing AND and OR logic.
Aaron Bertrand has blogged extensively on this.
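The behaviour of the catch-all predicate, including the date parameters the question asks about, can be modelled in plain Python. This is only a sketch of the logic: the column names come from the question, the sample rows are made up, and the inclusive >= / <= date checks mirror the question's CAST(pd.CreatedDate AS DATE) comparisons.

```python
from datetime import date


def matches(row, confirmation_number=None, payment_amount=None,
            start_range=None, end_range=None):
    # Each clause mirrors "(column = @param OR @param IS NULL)": a None
    # parameter switches its filter off instead of filtering everything out.
    return ((confirmation_number is None or row["TransactionNumber"] == confirmation_number)
            and (payment_amount is None or row["PaymentAmount"] == payment_amount)
            and (start_range is None or row["CreatedDate"] >= start_range)
            and (end_range is None or row["CreatedDate"] <= end_range))


rows = [  # hypothetical sample data
    {"TransactionNumber": "A1", "PaymentAmount": 10.00, "CreatedDate": date(2020, 1, 1)},
    {"TransactionNumber": "B2", "PaymentAmount": 20.00, "CreatedDate": date(2020, 2, 1)},
]
print([r["TransactionNumber"] for r in rows if matches(r, start_range=date(2020, 1, 15))])
```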
I do it like this:
WHERE
COALESCE(@s_Confirmation_Number,pd.TransactionNumber) = pd.TransactionNumber AND
COALESCE(@d_Payment_Amount,pd.PaymentAmount) = pd.PaymentAmount
If a parameter has a value, the column is checked against that value; if the parameter is NULL, COALESCE substitutes the column itself, so the condition always matches (as long as the column value is not NULL).
I've found that using COALESCE is faster and clearer than IF control statements or using OR in the WHERE clause.
There is another way.
However, when I tested it, I found that scsimon's query is faster than mine.
AND (CASE
WHEN @Confirmation_Number is not null
THEN (CASE
WHEN pd.TransactionNumber = @Confirmation_Number
THEN 1
ELSE 0
END)
ELSE 1
END = 1)

SQL Server float comparison in stored procedure

Unfortunately, I have to compare float columns between two tables. I've read up on casts, converts and comparing against a small difference, and tried them all.
The strange part is, this only fails when I'm executing a stored procedure. If I cut-and-paste the body of the stored procedure into a SSMS window, it works just great.
Sample SQL:
set @newEnvRiskLevel = -1
select
@newEnvRiskLevel = rl.RiskLevelId
from
LookupTypes lt
inner join
RiskLevels rl on lt.LookupTypeId = rl.RiskLevelTypeFk
where
lt.Code = 'RISK_LEVEL_ENVIRONMENTAL'
and convert(numeric(1, 0), rl.RiskFactor) = @newEnvScore
set @errorCode = @@ERROR
if (@newEnvRiskLevel = -1 or @errorCode != 0)
begin
print 'newEnvScore = ' + cast(@newEnvScore as varchar) + ' and risk level = ' + cast(isnull(@newEnvRiskLevel, -1) as varchar)
print 'ERROR finding environmental risk level for code ' + @itemCode + ', skipping record'
set @recordsErrored = @recordsErrored + 1
goto NEXTREC
end
My @newEnvScore variable is also a float converted to numeric(1, 0). I've verified that there are only 0, 1, 2, and 3 for values in the RiskFactor column, and (via debug) that @newEnvScore has a value of 2. I've also verified that my query has a row with code = 'RISK_LEVEL_ENVIRONMENTAL' and RiskFactor = 2.
I've verified via debug that failure is due to @newEnvRiskLevel staying at -1 and that @errorCode is 0.
I've also tried cast to both decimal and int, convert to int, and "rl.RiskFactor - @newEnvScore < 1" in my where clause, none of which set newEnvRiskLevel.
As I say, it's only when running this as a stored procedure that failure happens, which is the part I really don't understand. I'd expect SQL Server to be deterministic, whether the SQL is running the body of a stored procedure, or running the exact same SQL in a SSMS tab.
It is unfortunate that you post neither your stored procedure nor a complete script. It is difficult to diagnose a problem without a useful demonstration. But I see the use of "goto", which is concerning in many ways. I also see the use of a select statement to assign a local variable, which is often a problem because the developer might assume an assignment always occurs. To demonstrate (with a bonus at the end):
set nocount on;
declare @risk smallint;
declare @risklevels table (risklevel float primary key, code varchar(10));
insert @risklevels(risklevel, code) values (1, 'test'), (2, 'test'), (-5, 'test');
-- here is your assignment logic. Notice that @risk is
-- never changed because there are no matching rows.
set @risk = 0;
select @risk = risklevel from @risklevels where code = 'zork';
select @risk;
-- here is a better IMO way to make the assignment. Note that
-- @risk is set to NULL when there are no matching rows.
set @risk = -1;
set @risk = (select risklevel from @risklevels where code = 'zork');
select @risk;
-- and a last misconception. What value is @risk set to? and why?
set @risk = -1;
select @risk = risklevel from @risklevels where code = 'test';
select @risk;
Whether this is the source of your problem (or contributes to it) I can't say. But it is a possibility. And storing integers in a floating point datatype is just a problem generally. Even if you cannot change your table, you can change your local variables and force the use of a more appropriate datatype. So perhaps that is another change you should consider.
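The difference the demonstration points at can be modelled in a few lines of Python. This is a simplified model, not SQL Server itself: the order of the rows passed in stands in for whatever order the engine happens to process them in, which is not guaranteed. The exception name is hypothetical; in SQL Server the multi-row case raises error 512.

```python
class SubqueryReturnedMoreThanOneValue(Exception):
    """Hypothetical stand-in for SQL Server error 512."""


def select_assign(current, rows):
    # SELECT @v = col FROM ...: one assignment per row, so no rows leaves @v
    # unchanged and several rows leave the value from the row processed last.
    for value in rows:
        current = value
    return current


def set_assign(rows):
    # SET @v = (SELECT col FROM ...): NULL (None) when no row matches, and an
    # error when the subquery produces more than one row.
    rows = list(rows)
    if len(rows) > 1:
        raise SubqueryReturnedMoreThanOneValue("Subquery returned more than 1 value.")
    return rows[0] if rows else None


print(select_assign(0, []))                 # 0: the variable is never touched
print(set_assign([]))                       # None
print(select_assign(-1, [-5.0, 1.0, 2.0]))  # 2.0 here; SQL Server guarantees no order
```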

LINQ to SQL anomaly

I'm trying to learn LINQ to SQL. I've run into something I just don't get. Here's the LINQ program (vb.net):
Imports System.IO
Module Module1
Sub Main()
Dim crs = New DataClasses1DataContext()
Dim sw As New StringWriter()
crs.Log = sw
Dim reports = From report In crs.CRS_Report_Masters
Group report By report_id = report.Report_ID Into grouped = Group
Select New With {
.reportId = report_id,
.two = grouped.Sum(
Function(row) row.active_report * row.Report_ID)
}
For Each report In reports
Console.WriteLine("{0} {1}", report.reportId, report.two)
Next
MsgBox(sw.GetStringBuilder().ToString())
End Sub
End Module
Here's the SQL it produces:
SELECT SUM([t1].[value]) AS [two], [t1].[Report_ID] AS [reportId] FROM (
SELECT (-(CONVERT(Float,[t0].[active_report]))) *
(CONVERT(Float,CONVERT(Float,[t0].[Report_ID]))) AS [value], [t0].[Report_ID]
FROM [dbo].[CRS_Report_Master] AS [t0]
) AS [t1] GROUP BY [t1].[Report_ID]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.30319.1
What I don't get is why there is a minus sign in the SQL before the math in the parentheses. I didn't specify that in the LINQ query.
If I recall correctly, VB.NET treats -1 as Boolean True, while SQL treats +1 as Boolean True. Therefore the minus sign is necessary for VB.NET to properly interpret the active_report field.
So, now I see what's going on. The LINQ to SQL definition of the column corresponding to a SQL bit is a .net Boolean. Fair enough. When doing math using it, however, the compiler negates the value as described by ekolis. I did not expect that -- in fact I think it's an error, since I do not get the results I expect. e.g. if active_report=1 and report_id=123, I expect to get "123" but I'm getting "-123". I modified the query like this:
Dim reports = (From report In crs.CRS_Report_Masters
Group report By report_id = report.Report_ID Into grouped = Group
Select New With {
.reportId = report_id,
.two = grouped.Sum(
Function(row) If(row.active_report, 1, 0) * CInt(row.Report_ID))
})
which changed the generated SQL to:
SELECT SUM([t1].[value]) AS [two], [t1].[Report_ID] AS [reportId] FROM (
SELECT (
(CASE
WHEN (COALESCE([t0].[active_report],@p0)) = 1 THEN @p1
ELSE @p2
END)) * (CONVERT(Int,[t0].[Report_ID])) AS [value], [t0].[Report_ID]
FROM [dbo].[CRS_Report_Master] AS [t0]
) AS [t1] GROUP BY [t1].[Report_ID]
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [0]
-- @p1: Input Int (Size = -1; Prec = 0; Scale = 0) [1]
-- @p2: Input Int (Size = -1; Prec = 0; Scale = 0) [0]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.30319.1
Makes sense, I suppose, though it's a roundabout way to get the desired result. In T-SQL, a simple SELECT [BOOLEAN]*[INTEGER] will achieve the same result, I believe. Lesson learned: Don't trust LINQ to always do the right thing. Check its work!

how to encrypt the password column

I have a user table in SQL Server 2008 R2. Nothing in it is encrypted yet, but I would like to at least encrypt the passwords until the app that will handle this better is ready. Can I do this, and how? I want to encrypt the existing passwords manually.
You can encrypt columns using SQL Server (see http://msdn.microsoft.com/en-us/library/ms179331.aspx for a walk-through).
You can also use a key issued by the server itself.
The risk of doing this is that if you had to do data recovery and move the DB to a different server, it would be impossible to decrypt the column (passwords would have to be reset).
Note: password hashing is not meant for 2-way encryption (where a rogue dba can decrypt it). It is meant for hashing it in a way that allows validation without trivially showing the password to anyone. A low or even moderate level of collisions is in some ways desirable so that it allows the password through (and unfortunately other variants) but with collisions you can never tell what the real password actually was.
A simple implementation would be to run HashBytes over the password. You compare the (hash of) password provided to the hash stored. Unless someone has a rainbow table ready, they will not be able to find the original password.
INSERT INTO <tbl> (..., passwd) values (...., HashBytes('SHA1', @password))
When validating passwords, you take the hash of the password provided
SELECT HashBytes('SHA1', @password);
and compare it against the stored hash.
You actually don't want to encrypt it, but rather use a hash function on it, unless there is a strong requirement to get back the unencrypted password.
We can create some simple SQL functions to encrypt and decrypt the password column used by your web page:
Code: Encryption
CREATE FUNCTION [dbo].[ENCRYPT]
(
@DB_ROLE_PASSWORD VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE
@STR_LEN NUMERIC(10),
@ENCRYPTED_PASSWORD VARCHAR(100),
@TRIAL_CHARACTER VARCHAR(1),
@TRIAL_NUMBER NUMERIC(4)
SET @ENCRYPTED_PASSWORD = NULL
SET @STR_LEN =LEN(@DB_ROLE_PASSWORD)
DECLARE
@I INT
SET @I = 1
DECLARE
@LOOP$BOUND INT
SET @LOOP$BOUND = @STR_LEN
WHILE @I <= @LOOP$BOUND
BEGIN
/* * SSMA WARNING MESSAGES: * O2SS0273: ORACLE SUBSTR FUNCTION AND SQL SERVER SUBSTRING FUNCTION MAY GIVE DIFFERENT RESULTS. */
SET @TRIAL_CHARACTER = SUBSTRING(@DB_ROLE_PASSWORD, @I, 1)
SET @TRIAL_NUMBER = ASCII(@TRIAL_CHARACTER)
IF (@TRIAL_NUMBER % 2) = 0
SET @TRIAL_NUMBER = @TRIAL_NUMBER - 6
ELSE
SET @TRIAL_NUMBER = @TRIAL_NUMBER - 8
SET @TRIAL_CHARACTER = CHAR(CAST(@TRIAL_NUMBER + @I AS INT))
SET @ENCRYPTED_PASSWORD = ISNULL(@ENCRYPTED_PASSWORD, '') + ISNULL(@TRIAL_CHARACTER, '')
SET @I = @I + 1
END
RETURN @ENCRYPTED_PASSWORD
END
Code:Decryption
```sql
CREATE FUNCTION [dbo].[DECRYPT]
(
    @DB_ROLE_PASSWORD VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE
        @STR_LEN NUMERIC(10),
        @DECRYPTED_PASSWORD VARCHAR(100),
        @TRIAL_CHARACTER VARCHAR(1),
        @TRIAL_NUMBER NUMERIC(4),
        @CHECK_CHARACTER VARCHAR(1),
        @V_DB_ROLE_PASSWORD VARCHAR(100)
    SET @V_DB_ROLE_PASSWORD = @DB_ROLE_PASSWORD
    SET @DECRYPTED_PASSWORD = NULL
    SET @STR_LEN = LEN(@V_DB_ROLE_PASSWORD)
    DECLARE
        @I INT
    SET @I = 1
    DECLARE
        @LOOP$BOUND INT
    SET @LOOP$BOUND = @STR_LEN
    -- Undo ENCRYPT: subtract the 1-based position, then add back 6 (even) or 8 (odd).
    -- The parity is preserved because ENCRYPT only subtracted an even number.
    WHILE @I <= @LOOP$BOUND
    BEGIN
        /*
         * SSMA WARNING MESSAGES:
         * O2SS0273: ORACLE SUBSTR FUNCTION AND SQL SERVER SUBSTRING FUNCTION MAY GIVE DIFFERENT RESULTS.
         */
        SET @TRIAL_CHARACTER = SUBSTRING(@V_DB_ROLE_PASSWORD, @I, 1)
        SET @TRIAL_NUMBER = ASCII(@TRIAL_CHARACTER) - @I
        IF (@TRIAL_NUMBER % 2) = 0
            SET @TRIAL_NUMBER = @TRIAL_NUMBER + 6
            /*-IE EVEN*/
        ELSE
            SET @TRIAL_NUMBER = @TRIAL_NUMBER + 8
            /*-IE ODD*/
        SET @DECRYPTED_PASSWORD = ISNULL(@DECRYPTED_PASSWORD, '') + ISNULL(CHAR(CAST(@TRIAL_NUMBER AS INT)), '')
        SET @I = @I + 1
    END
    RETURN @DECRYPTED_PASSWORD
END
```
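The two functions above apply a reversible per-character shift: even character codes are lowered by 6, odd ones by 8, and the character's 1-based position is added. A minimal Python port (the names `encode` and `decode` are illustrative, not part of the original answer) makes the round trip easy to check. It assumes printable ASCII input, just as the SQL version effectively does.

```python
def encode(password: str) -> str:
    """Port of dbo.ENCRYPT: shift each character by -6 (even code) or -8 (odd code),
    then add its 1-based position. Assumes printable ASCII input."""
    out = []
    for i, ch in enumerate(password, start=1):
        n = ord(ch)
        n = n - 6 if n % 2 == 0 else n - 8
        out.append(chr(n + i))
    return "".join(out)


def decode(encoded: str) -> str:
    """Port of dbo.DECRYPT: remove the position offset, then undo the parity-dependent shift."""
    out = []
    for i, ch in enumerate(encoded, start=1):
        n = ord(ch) - i
        # Subtracting 6 or 8 never changes parity, so n's parity tells us which shift was used.
        n = n + 6 if n % 2 == 0 else n + 8
        out.append(chr(n))
    return "".join(out)


print(encode("ab"))                 # Z^
print(decode(encode("Secret123")))  # Secret123
```

Because anyone who has these functions can reverse the stored value, this scheme only hides the password from casual inspection. Hashing, discussed below, is the usual alternative when you only need to verify a password.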
Encryption & Decryption examples can be found here:
http://msdn.microsoft.com/en-us/library/ms179331.aspx
Hashing example can be found here:
http://msdn.microsoft.com/en-us/library/ms174415.aspx
You should not encrypt passwords if your only task is to verify that the password the user entered is correct. You should hash them instead. You could use any algorithm to hash them, but I recommend using MD5 because it is very secure.[1] :)
For example:
```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public class PasswordEncoder // hypothetical containing class
{
    public string EncodePassword(string originalPassword)
    {
        // Create an MD5 instance (disposed automatically), get the password bytes and compute the hash
        using (MD5 md5 = MD5.Create())
        {
            // UTF-8 gives the same bytes on every machine (ASCIIEncoding.Default is the system code page on .NET Framework)
            byte[] originalBytes = Encoding.UTF8.GetBytes(originalPassword);
            byte[] encodedBytes = md5.ComputeHash(originalBytes);
            // Convert the encoded bytes to a 'readable' string of hyphen-separated hex pairs
            return BitConverter.ToString(encodedBytes);
        }
    }
}
```
[1] Edit (not original answer author): MD5 is considered insecure for password storage, and more robust algorithms should be used. Research the currently recommended algorithms at the time you read this. This post might be a good starting point.
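As one illustration of a more robust approach than a bare MD5 hash, here is a minimal Python sketch that uses a random per-password salt and PBKDF2-HMAC-SHA256 from the standard library. The iteration count and the `salt$hash` storage format are assumptions made for this example, not recommendations taken from the original answer.

```python
import hashlib
import hmac
import os

# Assumed work factor for PBKDF2-HMAC-SHA256; check current guidance before relying on it.
ITERATIONS = 600_000


def hash_password(password: str) -> str:
    """Hash a password with a fresh random 16-byte salt; returns 'salt_hex$hash_hex'."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    salt_hex, hash_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS)
    return hmac.compare_digest(digest, bytes.fromhex(hash_hex))


stored = hash_password("correct horse")
print(verify_password("correct horse", stored))  # True
print(verify_password("wrong guess", stored))    # False
```

Only the salt and the derived hash are stored, so the password itself cannot be recovered from the database. Dedicated password-hashing schemes such as bcrypt, scrypt or Argon2 are common alternatives to PBKDF2.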

How to do hit-highlighting of results from a SQL Server full-text query

We have a web application that uses SQL Server 2008 as the database. Our users are able to do full-text searches on particular columns in the database. SQL Server's full-text functionality does not seem to provide support for hit highlighting. Do we need to build this ourselves or is there perhaps some library or knowledge around on how to do this?
BTW, the application is written in C#, so a .NET solution would be ideal, but it is not necessary because we could translate.
Expanding on Ishmael's idea: it's not the final solution, but I think it's a good way to start.
First, we need to get the list of words that the full-text engine would match:
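```sql
declare @SearchPattern nvarchar(1000) = 'FORMSOF (INFLECTIONAL, " ' + @SearchString + ' ")'
declare @SearchWords table (Word varchar(100), Expansion_type int)
-- Distinct terms (including inflectional forms) the full-text engine would match.
-- Arguments: 1033 = English LCID, 0 = system stoplist, 0 = accent-insensitive.
insert into @SearchWords
select distinct display_term, expansion_type
from sys.dm_fts_parser(@SearchPattern, 1033, 0, 0)
where special_term = 'Exact Match'
```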
There is already quite a lot one can expand on. For example, the search pattern is quite basic, and there are probably better ways to filter out the words you don't need, but at least it gives you a list of stem words etc. that would be matched by full-text search.
After you get the results you need, you can use RegEx to parse through the result set (or preferably only a subset to speed it up, although I haven't yet figured out a good way to do so). For this I simply use two WHILE loops and a bunch of temporary tables and variables:
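```sql
-- @PrelimResults ([UID], Text), @TempSearchWords (Word), @CurrID, @CurrWord, @Text and
-- @FirstSearchWord are assumed to be declared (and populated) earlier in the batch.
declare @FinalResults table ([UID] int, Text nvarchar(max)) -- assumed column types
while (select COUNT(*) from @PrelimResults) > 0
begin
    select top 1 @CurrID = [UID], @Text = Text from @PrelimResults
    -- Start the summary at the sentence containing the first search word (max. 300 characters)
    declare @TextLength int = LEN(@Text)
    declare @IndexOfDot int = CHARINDEX('.', REVERSE(@Text), @TextLength - dbo.RegExIndexOf(@Text, '\b' + @FirstSearchWord + '\b') + 1)
    set @Text = SUBSTRING(@Text, case @IndexOfDot when 0 then 0 else @TextLength - @IndexOfDot + 3 end, 300)
    -- Refill the word list for every row, because the inner loop empties it
    delete from @TempSearchWords
    insert into @TempSearchWords (Word) select Word from @SearchWords
    while (select COUNT(*) from @TempSearchWords) > 0
    begin
        select top 1 @CurrWord = Word from @TempSearchWords
        set @Text = dbo.RegExReplace(@Text, '\b' + @CurrWord + '\b', '<b>' + SUBSTRING(@Text, dbo.RegExIndexOf(@Text, '\b' + @CurrWord + '\b'), LEN(@CurrWord) + 1) + '</b>')
        delete from @TempSearchWords where Word = @CurrWord
    end
    -- Store the highlighted summary rather than the untouched original row
    insert into @FinalResults ([UID], Text) values (@CurrID, @Text)
    delete from @PrelimResults where [UID] = @CurrID
end
```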
Several notes:
1. Nested WHILE loops probably aren't the most efficient way of doing it, but nothing else comes to mind. Using cursors would essentially be the same thing, wouldn't it?
2. @FirstSearchWord here refers to the first occurrence in the text of one of the original search words, so the text you are replacing is only going to be in the summary. Again, it's quite a basic method; some sort of text-cluster-finding algorithm would probably be handy.
3. To get RegEx in the first place, you need CLR user-defined functions.
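The same two steps (cut the text back to the sentence containing the first hit, then wrap whole-word matches in `<b>` tags) can also be done outside the database, for example in the application layer. This is a minimal Python sketch, not the answer's method: `make_snippet` is a hypothetical helper, the 300-character window mirrors the SQL above, and `words` stands in for the terms returned by `sys.dm_fts_parser`.

```python
import re


def make_snippet(text: str, words: list[str], length: int = 300) -> str:
    """Cut `text` back to the sentence holding the first whole-word hit, keep at most
    `length` characters, and wrap every whole-word hit in <b>...</b>."""
    words = [w for w in words if w]  # ignore empty terms
    if not words:
        return text[:length]
    # One case-insensitive, whole-word alternation of all (regex-escaped) search words.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    start = 0
    first = pattern.search(text)
    if first:
        # Like the CHARINDEX/REVERSE trick: begin after the last '.' before the first hit.
        dot = text.rfind(".", 0, first.start())
        if dot != -1:
            start = dot + 1
    snippet = text[start:start + length].lstrip()
    # \g<0> re-inserts the matched text, so each hit keeps its own casing.
    return pattern.sub(r"<b>\g<0></b>", snippet)


print(make_snippet("Intro text. Running shoes are great. Runs daily.", ["running", "runs"]))
# <b>Running</b> shoes are great. <b>Runs</b> daily.
```

Using one compiled pattern replaces the nested loop and the per-row refill of the word list, and each hit keeps its own casing, whereas the SQL version builds a single replacement string from the first match of each word.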
It looks like you could parse the output of the new SQL Server 2008 dynamic management function sys.dm_fts_parser and use regex, but I haven't looked at it too closely.
You might be missing the point of the database in this instance. Its job is to return the data that satisfies the conditions you gave it. I think you will want to implement the highlighting in your web control, probably using regex.
Here is something a quick search would reveal.
http://www.dotnetjunkies.com/PrintContent.aspx?type=article&id=195E323C-78F3-4884-A5AA-3A1081AC3B35
Some details:
```vbscript
' Hungarian identifiers: search_kiemeles = the search terms to highlight, hirdetes = the ad text being displayed
search_kiemeles = replace(lcase(search), """", "") ' lower-case the search and strip double quotes
do while not rs.eof ' The search result loop
    hirdetes = rs("hirdetes")
    ' Give back all the search words in an array; non-English characters are needed too
    data = RegExpValueA("([A-Za-zöüóőúéáűíÖÜÓŐÚÉÁŰÍ0-9]+)", search_kiemeles)
    For i = 0 to Ubound(data, 1)
        hirdetes = RegExpReplace(hirdetes, "(" & NoAccentRE(data(i)) & ")", "<em>$1</em>")
    Next
    response.write hirdetes
    rs.movenext
Loop
```
...
Functions
```vbscript
' All matches to an array
Function RegExpValueA(patrn, strng)
    Dim regEx
    Set regEx = New RegExp ' Create a regular expression.
    regEx.IgnoreCase = True ' Set case insensitivity.
    regEx.Global = True ' Find all matches, not just the first one.
    Dim Match, Matches, RetStr
    Dim data()
    Dim count
    count = 0
    Redim data(-1) ' VBScript Ubound array bug workaround (empty array)
    if isnull(strng) or strng = "" then
        RegExpValueA = data
        exit function
    end if
    regEx.Pattern = patrn ' Set pattern.
    Set Matches = regEx.Execute(strng) ' Execute search.
    For Each Match in Matches ' Iterate Matches collection.
        count = count + 1
        Redim Preserve data(count - 1)
        data(count - 1) = Match.Value
    Next
    set regEx = nothing
    RegExpValueA = data
End Function

' Replace non-English chars: turn each vowel, plain or accented, into a character class.
' "§" is a temporary placeholder so the accented letters inside "[aá]" etc. are not replaced again.
Function NoAccentRE(accent_string)
    NoAccentRE = accent_string
    NoAccentRE = Replace(NoAccentRE, "a", "§")
    NoAccentRE = Replace(NoAccentRE, "á", "§")
    NoAccentRE = Replace(NoAccentRE, "§", "[aá]")
    NoAccentRE = Replace(NoAccentRE, "e", "§")
    NoAccentRE = Replace(NoAccentRE, "é", "§")
    NoAccentRE = Replace(NoAccentRE, "§", "[eé]")
    NoAccentRE = Replace(NoAccentRE, "i", "§")
    NoAccentRE = Replace(NoAccentRE, "í", "§")
    NoAccentRE = Replace(NoAccentRE, "§", "[ií]")
    NoAccentRE = Replace(NoAccentRE, "o", "§")
    NoAccentRE = Replace(NoAccentRE, "ó", "§")
    NoAccentRE = Replace(NoAccentRE, "ö", "§")
    NoAccentRE = Replace(NoAccentRE, "ő", "§")
    NoAccentRE = Replace(NoAccentRE, "§", "[oóöő]")
    NoAccentRE = Replace(NoAccentRE, "u", "§")
    NoAccentRE = Replace(NoAccentRE, "ú", "§")
    NoAccentRE = Replace(NoAccentRE, "ü", "§")
    NoAccentRE = Replace(NoAccentRE, "ű", "§")
    NoAccentRE = Replace(NoAccentRE, "§", "[uúüű]")
end function
```
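The VBScript answer does three things. It pulls the search words out with a character-class regex (`RegExpValueA`), turns each word into an accent-insensitive pattern (`NoAccentRE`), and wraps the matches in `<em>`. A rough Python equivalent follows as a sketch. The accent groups are copied from `NoAccentRE` (Hungarian vowels only), `highlight` and `no_accent_re` are hypothetical names, and non-vowel characters are passed through `re.escape` as an extra precaution the original does not take.

```python
import re

# Accent groups copied from NoAccentRE: each vowel (plain or accented) matches its whole group.
ACCENT_GROUPS = ["aá", "eé", "ií", "oóöő", "uúüű"]
CHAR_CLASS = {ch: "[" + group + "]" for group in ACCENT_GROUPS for ch in group}

# Same character class RegExpValueA is called with: letters, Hungarian accented letters, digits.
WORD_RE = re.compile(r"[A-Za-zöüóőúéáűíÖÜÓŐÚÉÁŰÍ0-9]+")


def no_accent_re(word: str) -> str:
    """Build an accent-insensitive regex for one lower-cased search word."""
    return "".join(CHAR_CLASS.get(ch, re.escape(ch)) for ch in word)


def highlight(text: str, search: str) -> str:
    """Wrap every accent- and case-insensitive occurrence of each search word in <em>...</em>."""
    # Mirrors search_kiemeles: lower-case the search and strip double quotes.
    for word in WORD_RE.findall(search.lower().replace('"', "")):
        text = re.sub("(" + no_accent_re(word) + ")", r"<em>\1</em>", text, flags=re.IGNORECASE)
    return text


print(highlight("Friss kávé és tea", "KAVE"))  # Friss <em>kávé</em> és tea
```

Like the original, this applies one replacement per word in sequence, so a later search word such as `em` could match inside tags inserted for an earlier word. A single combined pattern, as in a one-pass `re.sub`, would avoid that.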

Resources