Detect UNICODE characters that are not ASCII in table - sql-server

I have the following table:
Select
name,
address,
description
from dbo.users
I would like to search all this table for any characters that are UNICODE but not ASCII. Is this possible?

You can find non-ASCII characters quite simply:
SELECT NAME, ADDRESS, DESCRIPTION
FROM DBO.USERS
WHERE NAME != CAST(NAME AS VARCHAR(4000))
OR ADDRESS != CAST(ADDRESS AS VARCHAR(4000))
OR DESCRIPTION != CAST(DESCRIPTION AS VARCHAR(4000))

If you want to determine whether an NVARCHAR / NCHAR / NTEXT column contains any characters that cannot be converted to VARCHAR, you need to convert to VARCHAR using the _BIN2 variation of the collation being used for that particular column. For example, if a particular column is using Albanian_100_CI_AS, then you would specify Albanian_100_BIN2 for the test.

The reason for using a _BIN2 collation is that non-binary collations will only find instances where there is at least one character that has no mapping at all in the code page and is thus converted into ?. They do not catch instances where a character has no direct mapping into the code page but does have a "best fit" mapping. For example, the superscript 2 character, ², has a direct mapping in code page 1252, so definitely no problem there. On the other hand, it has no direct mapping in code page 1250 (used by the Albanian collations), but it does have a "best fit" mapping which converts it into a regular 2. The problem with the non-binary collation is that 2 will equate to ², so it won't register as a row that can't convert to VARCHAR. For example:
SELECT CONVERT(VARCHAR(MAX), N'²' COLLATE French_100_CI_AS); -- Code Page 1252
-- ²
SELECT CONVERT(VARCHAR(MAX), N'²' COLLATE Albanian_100_CI_AS); -- Code Page 1250
-- 2
SELECT CONVERT(VARCHAR(MAX), N'²' COLLATE Albanian_100_CI_AS)
WHERE N'²' <> CONVERT(NVARCHAR(MAX),
CONVERT(VARCHAR(MAX), N'²' COLLATE Albanian_100_CI_AS));
-- (no rows returned)
SELECT CONVERT(VARCHAR(MAX), N'²' COLLATE Albanian_100_BIN2)
WHERE N'²' <> CONVERT(NVARCHAR(MAX),
CONVERT(VARCHAR(MAX), N'²' COLLATE Albanian_100_BIN2));
-- 2
Ideally you would convert back to NVARCHAR explicitly for the code to be clear on what it's doing, though not doing this will still implicitly convert back to NVARCHAR, so the behavior is the same either way.
Please note that only MAX types are used. Do not use NVARCHAR(4000) or VARCHAR(4000), or else you might get false positives due to truncation of data in NVARCHAR(MAX) columns.
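A minimal sketch of that truncation false positive (the 5,000-character value is an assumption for illustration):
DECLARE @big NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N'a'), 5000);
-- The inner CONVERT silently truncates to 4,000 characters, so the comparison
-- flags a difference even though every character is plain ASCII:
SELECT CASE WHEN @big <> CONVERT(NVARCHAR(MAX), CONVERT(VARCHAR(4000), @big))
            THEN 'false positive' ELSE 'equal' END AS Result;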
So, in terms of the example code in the question, the query would be (assuming that a Latin1_General collation is being used):
SELECT usr.*
FROM dbo.[users] usr
WHERE usr.[name] <> CONVERT(NVARCHAR(MAX),
CONVERT(VARCHAR(MAX), usr.[name] COLLATE Latin1_General_100_BIN2))
OR usr.[address] <> CONVERT(NVARCHAR(MAX),
CONVERT(VARCHAR(MAX), usr.[address] COLLATE Latin1_General_100_BIN2))
OR usr.[description] <> CONVERT(NVARCHAR(MAX),
CONVERT(VARCHAR(MAX), usr.[description] COLLATE Latin1_General_100_BIN2));

There doesn't seem to be a built-in function for this as far as I can tell. A brute-force approach is to pass each character to ASCII() and then pass the result to CHAR() and check whether it returns '?', which would mean the character is out of range. You can write a UDF with the code below as a reference, but I should think it is a very inefficient solution:
declare @i int = 1
declare @x nvarchar(10) = N'vsdǣf'
declare @result nvarchar(100) = N''
while (@i <= len(@x)) -- <= so the last character is checked as well
begin
    -- ASCII() returns 63 ('?') for characters it cannot map (note: a literal '?' is flagged too)
    if char(ascii(substring(@x, @i, 1))) = '?'
    begin
        set @result = @result + substring(@x, @i, 1)
    end
    set @i = @i + 1
end
select @result

Related

Convert utf-8 encoded varbinary(max) data to nvarchar(max) string

Is there a simple way to convert a utf-8 encoded varbinary(max) column to varchar(max) in T-SQL? Something like CONVERT(varchar(max), [MyDataColumn]). Best would be a solution that does not need custom functions.
Currently, I convert the data on the client side, but this has the downside that correct filtering and sorting is not as efficient as when done server-side.
XML trick
The following solution should work for any encoding.
There is a tricky way of doing exactly what the OP asks. Edit: I found the same method discussed on SO (SQL - UTF-8 to varchar/nvarchar Encoding issue).
The process goes like this:
SELECT
    CAST(
        '<?xml version=''1.0'' encoding=''utf-8''?><![CDATA[' --start CDATA
        + REPLACE(
            LB.LongBinary,
            ']]>', --we only need to escape ]]>, which ends a CDATA section
            ']]]]><![CDATA[>' --we simply split it into two CDATA sections
        ) + ']]>' AS XML --finish CDATA
    ).value('.', 'nvarchar(max)')
FROM dbo.MyTable AS LB; --the table holding the utf-8 VARBINARY(MAX) column (names assumed)
Why it works: varbinary and varchar are the same string of bits, only the interpretation differs, so the resulting xml truly is a utf8-encoded bitstream, and the xml interpreter is then able to reconstruct the correct utf8-encoded characters.
BEWARE the 'nvarchar(max)' in the value function. If you used varchar, it would destroy multi-byte characters (depending on your collation).
BEWARE 2: XML cannot handle some characters, e.g. 0x02. When your string contains such characters, this trick will fail.
Database trick (SQL Server 2019 and newer)
This is simple. Create another database with a UTF8 collation as the default. Create a function that converts VARBINARY to VARCHAR; the returned VARCHAR will have the UTF8 collation of that database.
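A minimal sketch of such a function, assuming the database's default collation is a UTF8 one (the name matches the FUNCTIONS_ONLY.dbo.VarBinaryToUTF8 call used in the benchmark further down):
-- Created inside a database whose default collation is, e.g., Latin1_General_100_CI_AS_SC_UTF8
CREATE FUNCTION dbo.VarBinaryToUTF8 (@bin VARBINARY(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    -- The result carries the database's UTF8 default collation,
    -- so the raw bytes are interpreted as utf-8.
    RETURN CAST(@bin AS VARCHAR(MAX));
END;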
Insert trick (SQL Server 2019 and newer)
This is another simple trick. Create a table with one VARCHAR column COLLATE ...UTF8. Insert the VARBINARY data into this table; it gets saved correctly as UTF8 VARCHAR. It is sad that memory-optimized tables cannot use UTF8 collations...
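A minimal sketch of the insert trick (the staging table name and the collation choice are assumptions):
CREATE TABLE #Utf8Stage (Txt VARCHAR(MAX) COLLATE Latin1_General_100_CI_AS_SC_UTF8);
INSERT INTO #Utf8Stage (Txt)
SELECT MyDataColumn FROM dbo.MyTable; -- the VARBINARY(MAX) bytes land as UTF8 text
SELECT CONVERT(NVARCHAR(MAX), Txt) FROM #Utf8Stage; -- and convert cleanly to NVARCHAR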
Alter table trick (SQL Server 2019 and newer)
(don't use this, it is unnecessary; see the Insert trick above)
I was trying to come up with an approach using SQL Server 2019's UTF8 collations and I have found one possible method so far that should be faster than the XML trick (see below).
Create a temporary table with a varbinary column.
Insert the varbinary values into the table.
Alter the column to varchar with a UTF8 collation.
drop table if exists
#bin,
#utf8;
create table #utf8 (UTF8 VARCHAR(MAX) COLLATE Czech_100_CI_AI_SC_UTF8);
create table #bin (BIN VARBINARY(MAX));
insert into #utf8 (UTF8) values ('Žluťoučký kůň říčně pěl ďábelské ódy za svitu měsíce.');
insert into #bin (BIN) select CAST(UTF8 AS varbinary(max)) from #utf8;
select * from #utf8; --here you can see the utf8 string is stored correctly and that
select BIN, CAST(BIN AS VARCHAR(MAX)) from #bin; --utf8 binary is converted into gibberish
alter table #bin alter column BIN varchar(max) collate Czech_100_CI_AI_SC_UTF8;
select * from #bin; --voilà, correctly converted varchar
alter table #bin alter column BIN nvarchar(max);
select * from #bin; --finally, correctly converted nvarchar
Speed difference
The Database trick together with the Insert trick are the fastest ones.
The XML trick is slower.
The Alter table trick is stupid, don't do it. It loses out when you convert lots of short texts at once (the altered table is large).
The test:
the first string contains one replace for the XML trick
the second string is plain ASCII with no replaces for the XML trick
@TextLengthMultiplier determines the length of the converted text
@TextAmount determines how many of them will be converted at once
------------------
--TEST SETUP
--DECLARE @LongText NVARCHAR(MAX) = N'český jazyk, Tiếng Việt, русский язык, 漢語, ]]>';
--DECLARE @LongText NVARCHAR(MAX) = N'JUST ASCII, for LOLZ------------------------------------------------------';
DECLARE
    @TextLengthMultiplier INTEGER = 100000,
    @TextAmount INTEGER = 10;
---------------------
-- TECHNICALITIES
DECLARE
    @StartCDATA DATETIME2(7), @EndCDATA DATETIME2(7),
    @StartTable DATETIME2(7), @EndTable DATETIME2(7),
    @StartDB DATETIME2(7), @EndDB DATETIME2(7),
    @StartInsert DATETIME2(7), @EndInsert DATETIME2(7);
drop table if exists
    #longTexts,
    #longBinaries,
    #CDATAConverts,
    #DBConverts,
    #InsertConverts;
CREATE TABLE #longTexts (LongText VARCHAR (MAX) COLLATE Czech_100_CI_AI_SC_UTF8 NOT NULL);
CREATE TABLE #longBinaries (LongBinary VARBINARY(MAX) NOT NULL);
CREATE TABLE #CDATAConverts (LongText VARCHAR (MAX) COLLATE Czech_100_CI_AI_SC_UTF8 NOT NULL);
CREATE TABLE #DBConverts (LongText VARCHAR (MAX) COLLATE Czech_100_CI_AI_SC_UTF8 NOT NULL);
CREATE TABLE #InsertConverts (LongText VARCHAR (MAX) COLLATE Czech_100_CI_AI_SC_UTF8 NOT NULL);
insert into #longTexts --make the long text longer
    (LongText)
select
    REPLICATE(@LongText, @TextLengthMultiplier)
from
    TESTES.dbo.Numbers --use a while loop if you don't have a numbers table
WHERE
    Number BETWEEN 1 AND @TextAmount; --make more of them
insert into #longBinaries (LongBinary) select CAST(LongText AS varbinary(max)) from #longTexts;
--sanity check...
SELECT TOP(1) * FROM #longTexts;
------------------------------
--MEASURE CDATA--
SET @StartCDATA = SYSDATETIME();
INSERT INTO #CDATAConverts
(
    LongText
)
SELECT
    CAST(
        '<?xml version=''1.0'' encoding=''utf-8''?><![CDATA['
        + REPLACE(
            LB.LongBinary,
            ']]>',
            ']]]]><![CDATA[>'
        ) + ']]>' AS XML
    ).value('.', 'nvarchar(max)')
FROM
    #longBinaries AS LB;
SET @EndCDATA = SYSDATETIME();
--------------------------------------------
--MEASURE ALTER TABLE--
SET @StartTable = SYSDATETIME();
DROP TABLE IF EXISTS #AlterConverts;
CREATE TABLE #AlterConverts (UTF8 VARBINARY(MAX));
INSERT INTO #AlterConverts
(
    UTF8
)
SELECT
    LB.LongBinary
FROM
    #longBinaries AS LB;
ALTER TABLE #AlterConverts ALTER COLUMN UTF8 VARCHAR(MAX) COLLATE Czech_100_CI_AI_SC_UTF8;
--ALTER TABLE #AlterConverts ALTER COLUMN UTF8 NVARCHAR(MAX);
SET @EndTable = SYSDATETIME();
--------------------------------------------
--MEASURE DB--
SET @StartDB = SYSDATETIME();
INSERT INTO #DBConverts
(
    LongText
)
SELECT
    FUNCTIONS_ONLY.dbo.VarBinaryToUTF8(LB.LongBinary)
FROM
    #longBinaries AS LB;
SET @EndDB = SYSDATETIME();
--------------------------------------------
--MEASURE Insert--
SET @StartInsert = SYSDATETIME();
INSERT INTO #InsertConverts
(
    LongText
)
SELECT
    LB.LongBinary
FROM
    #longBinaries AS LB;
SET @EndInsert = SYSDATETIME();
--------------------------------------------
-- RESULTS
SELECT
    DATEDIFF(MILLISECOND, @StartCDATA, @EndCDATA) AS CDATA_MS,
    DATEDIFF(MILLISECOND, @StartTable, @EndTable) AS ALTER_MS,
    DATEDIFF(MILLISECOND, @StartDB, @EndDB) AS DB_MS,
    DATEDIFF(MILLISECOND, @StartInsert, @EndInsert) AS Insert_MS;
SELECT TOP(1) '#CDATAConverts ', * FROM #CDATAConverts;
SELECT TOP(1) '#DBConverts ', * FROM #DBConverts;
SELECT TOP(1) '#InsertConverts', * FROM #InsertConverts;
SELECT TOP(1) '#AlterConverts ', * FROM #AlterConverts;
SQL Server does not know UTF-8 (at least in all versions you can use productively). There is limited support starting with v2014 SP2 (and some details about the supported versions)
when reading a utf-8 encoded file from disk via BCP (the same goes for writing content to disk).
Important to know:
VARCHAR(x) is not utf-8. It is 1-byte-encoded extended ASCII, using a code page (living in the collation) as the character map.
NVARCHAR(x) is not utf-16 (but very close to it; it's ucs-2). This is a 2-byte-encoded string covering almost any known character (but exceptions exist).
utf-8 will use 1 byte for plain latin characters, but 2 or even more bytes to encode other charsets.
A VARBINARY(x) will hold the utf-8 as a meaningless chain of bytes.
A simple CAST or CONVERT will not work: VARCHAR will take each single byte as a character. For sure this is not the result you would expect. NVARCHAR would take each chunk of 2 bytes as one character. Again, not the thing you need.
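A small sketch of what that looks like (the byte values are an assumed example, the utf-8 encoding of N'ůň'):
DECLARE @utf8 VARBINARY(MAX) = 0xC5AFC588; -- utf-8 bytes of N'ůň' (example assumption)
SELECT CAST(@utf8 AS VARCHAR(MAX));  -- four characters of code-page gibberish: each byte read as one character
SELECT CAST(@utf8 AS NVARCHAR(MAX)); -- two characters of gibberish: bytes paired into UTF-16 code units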
You might try to write this out to a file and read it back with BCP (v2014 SP2 or higher). But the better chance I see for you is a CLR function.
You can use the following to write a string into a VARBINARY field:
Encoding.Unicode.GetBytes(Item.VALUE)
Then use the following to retrieve the data back as a string:
public string ReadCString(byte[] cString)
{
    // A single zero byte occurs inside ordinary UTF-16 data (e.g. 'A' is 0x41 0x00),
    // so it is not a reliable terminator; decode the whole buffer.
    return System.Text.Encoding.Unicode.GetString(cString);
}

What is the encode(<columnName>, 'escape') PostgreSQL equivalent in SQL Server?

In the same vein as this question, what is the equivalent in SQL Server to the following Postgres statement?
select encode(some_field, 'escape') from only some_table
As you were told already, SQL Server is not the best with such issues.
The most important advice for avoiding such issues is: use the appropriate data type to store your values. Storing binary data as a HEX string runs against this best practice. But there are some workarounds:
I use the HEX string taken from the linked question:
DECLARE @str VARCHAR(100)='0x61736461640061736461736400';
--here I use dynamically created SQL to get the HEX string as a real binary:
DECLARE @convBin VARBINARY(MAX);
DECLARE @cmd NVARCHAR(MAX)=N'SELECT @bin=' + @str;
EXEC sp_executeSql @cmd
    ,N'@bin VARBINARY(MAX) OUTPUT'
    ,@bin=@convBin OUTPUT;
--This real binary can be converted to a VARCHAR(MAX).
--Be aware that in this case the input contains 00, as this is an array.
--It is possible to split the input at the 00s, but this is going too far...
SELECT @convBin AS HexStringAsRealBinary
    ,CAST(@convBin AS VARCHAR(MAX)) AS CastedToString; --You will see the first "asda" only
--If your HEX string is not longer than 10 bytes there is an undocumented function:
--You'll see that the final AA is cut away, while a shorter string would be filled with zeros.
SELECT sys.fn_cdc_hexstrtobin('0x00112233445566778899AA')
SELECT CAST(sys.fn_cdc_hexstrtobin(@str) AS VARCHAR(100));
UPDATE: An inlinable approach
The following recursive CTE will read the HEX-string character by character.
Furthermore it will group the result and return two rows in this case.
This solution is very specific to the given input.
DECLARE @str VARCHAR(100)='0x61736461640061736461736400';
WITH recCTE AS
(
SELECT 1 AS position
,1 AS GroupingKey
,SUBSTRING(@str,3,2) AS HEXCode
,CHAR(SUBSTRING(sys.fn_cdc_hexstrtobin('0x' + SUBSTRING(@str,3,2)),1,1)) AS TheLetter
UNION ALL
SELECT r.position+1
,r.GroupingKey + CASE WHEN SUBSTRING(@str,2+(r.position)*2+1,2)='00' THEN 1 ELSE 0 END
,SUBSTRING(@str,2+(r.position)*2+1,2)
,CHAR(SUBSTRING(sys.fn_cdc_hexstrtobin('0x' + SUBSTRING(@str,2+(r.position)*2+1,2)),1,1)) AS TheLetter
FROM recCTE r
WHERE position<LEN(@str)/2
)
SELECT r.GroupingKey
,(
SELECT x.TheLetter AS [*]
FROM recCTE x
WHERE x.GroupingKey=r.GroupingKey
AND x.HEXCode<>'00'
AND LEN(x.HEXCode)>0
ORDER BY x.position
FOR XML PATH(''),TYPE
).value('.','varchar(max)')
FROM recCTE r
GROUP BY r.GroupingKey;
The result
1 asdad
2 asdasd
Hint: Starting with SQL Server 2017 there is STRING_AGG(), which would reduce the final SELECT...
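For illustration, a sketch of that final SELECT rewritten with STRING_AGG; the inline sample rows are assumptions so the snippet runs standalone, and in the real query you would keep the recCTE from above instead:
WITH recCTE AS
(   -- hypothetical stand-in rows with the same shape as the recursive CTE above
    SELECT * FROM (VALUES
        (1, 1, '61', 'a'), (2, 1, '73', 's'), (3, 1, '00', NULL),
        (4, 2, '64', 'd'), (5, 2, '61', 'a')
    ) v(position, GroupingKey, HEXCode, TheLetter)
)
SELECT r.GroupingKey
      ,STRING_AGG(CASE WHEN r.HEXCode <> '00' THEN r.TheLetter END, '')
           WITHIN GROUP (ORDER BY r.position) AS TheText
FROM recCTE AS r
GROUP BY r.GroupingKey;
-- STRING_AGG skips the NULLs produced by the CASE, mirroring the x.HEXCode<>'00' filter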
If you need this functionality, it's going to be up to you to implement it. Assuming you just need the escape variant, you can try to implement it as a T-SQL UDF. But pulling strings apart, working character by character and building up a new string just isn't a T-SQL strength. You'd be looking at a WHILE loop to count over the byte length of the input, SUBSTRING to extract the individual bytes, and CHAR to directly convert the bytes that don't need to be octal encoded.[1]
If you're going to start down this route (and especially if you want to support the other formats), I'd be looking at using the CLR support in SQL Server, to create the function in a .NET language (C# usually preferred) and use the richer string manipulation functionality there.
Both of the above assume that what you're really wanting is to replicate the escape format of encode. If you just want "take this binary data and give me a safe string to represent it", just use CONVERT to get the binary hex encoded.
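For that last option, a minimal sketch (column and table names are from the question; style 2 is the variant without the 0x prefix):
SELECT CONVERT(VARCHAR(MAX), some_field, 1) FROM some_table; -- '0x61736461...' with the 0x prefix
SELECT CONVERT(VARCHAR(MAX), some_field, 2) FROM some_table; -- '61736461...' hex digits only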
[1] Here's my attempt at it. I'd suggest a lot of testing and tweaking before you use it in anger:
create function Postgresql_encode_escape (@input varbinary(max))
returns varchar(max)
as
begin
    -- note: PostgreSQL's escape format octal-encodes only zero bytes and
    -- high-bit-set bytes (and doubles backslashes); this attempt also
    -- octal-encodes control characters
    declare @i int
    declare @len int
    declare @out varchar(max)
    declare @chr int
    select @i = 1, @out = '', @len = DATALENGTH(@input)
    while @i <= @len
    begin
        set @chr = SUBSTRING(@input, @i, 1)
        if @chr > 31 and @chr < 128
        begin
            set @out = @out + CHAR(@chr)
        end
        else
        begin
            -- build the three-digit octal escape, e.g. \000
            set @out = @out + '\' +
                RIGHT('000' + CONVERT(varchar(3),
                    (@chr / 64)*100 +
                    ((@chr / 8)%8)*10 +
                    (@chr % 8))
                ,3)
        end
        set @i = @i + 1
    end
    return @out
end

SQL Server - error converting data type nvarchar to float

I am inserting from table A into table B. The problematic column looks like -$25.2. I first replaced the $ and tried the insert. I got this error:
Error converting data type nvarchar to float.
I then checked by
SELECT *
FROM B
WHERE ISNUMERIC([Col Name]) <> 1
and no results were returned.
This is odd. It is supposed to return something.
What should I check next?
I also tried something like
CAST(REPLACE([Col Name], '-$', '') AS FLOAT)
Try using this
DECLARE @Text nvarchar(100)
SET @Text = '-$1234.567'
SET @Text = Replace(@Text, '$', '')
Select CONVERT(float, @Text) AS ColumnValue
Select ABS(CONVERT(float, @Text)) AS ColumnValue
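As an aside, this sketch shows why the ISNUMERIC check in the question returned no rows even though the insert failed (ISNUMERIC accepts currency symbols that CONVERT to FLOAT does not):
SELECT ISNUMERIC('-$25.2');        -- 1: currency-formatted strings count as numeric
SELECT CONVERT(money, '-$25.2');   -- the money type parses the $ sign: -25.20
--SELECT CONVERT(float, '-$25.2'); -- this, however, raises the conversion error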
While the 'money' data type isn't great for doing calculations, in this scenario you can use it as an intermediary.
declare @a nvarchar(10)
set @a = '-$25.2'
select
    @a,
    cast(cast(@a as money) as float)
Only use this though if your data only goes to a max of 4 decimal places, otherwise you will lose precision in the conversion.

How can I search for a sequence of bytes in SQL Server varbinary(max) field?

I am trying to write a query on SQL Server 2012 that will return varbinary(max) columns that contain a specified byte sequence. I am able to do that with a query that converts the varbinary field to varchar and uses LIKE:
SELECT * FROM foo
WHERE CONVERT(varchar(max), myvarbincolumn) LIKE
'%' + CONVERT(varchar(max), 0x626C6168) + '%'
where "0x626C6168" is my target byte sequence. Unfortunately, this works only if the field does not contain any bytes with the value zero (0x00) and those are very common in my data. Is there a different approach I can take that will work with values that contain zero-valued bytes?
If you use a binary collation it should work.
WITH foo(myvarbincolumn) AS
(
SELECT 0x00626C616800
)
SELECT *
FROM foo
WHERE CONVERT(VARCHAR(max), myvarbincolumn) COLLATE Latin1_General_100_BIN2
LIKE '%' + CONVERT(VARCHAR(max), 0x626C6168) + '%'
You might need (say) Latin1_General_BIN if on an older version of SQL Server.
Unfortunately the solution proposed by Martin has a flaw.
In case the binary sequence in the search key contains any 0x25 byte, it will be translated to the % character (according to the ASCII table).
This character is then interpreted as a wildcard in the like clause, causing many unwanted results to show up.
-- A table with a binary column:
DECLARE @foo TABLE(BinCol VARBINARY(MAX));
INSERT INTO @foo (BinCol) VALUES (0x001125), (0x000011), (0x001100), (0x110000);
-- The search key:
DECLARE @key VARBINARY(MAX) = 0x1125; -- 0x25 is '%' in the ASCII table!
-- This returns ALL values from the table, because of the wildcard in the search key:
SELECT * FROM @foo WHERE
    CONVERT(VARCHAR(max), BinCol) COLLATE Latin1_General_100_BIN2
    LIKE ('%' + CONVERT(VARCHAR(max), @key) + '%');
To fix this issue, use the search clause below:
-- This returns just the correct value -> 0x001125
SELECT * FROM @foo WHERE
    CHARINDEX
    (
        CONVERT(VARCHAR(max), @key),
        CONVERT(VARCHAR(max), BinCol) COLLATE Latin1_General_100_BIN2
    ) > 0;
I just discovered this very simple query.
SELECT * FROM foo
WHERE CONVERT(varchar(max), myvarbincolumn,2) LIKE '%626C6168%'
The characters 0x aren't added to the left of the converted result for style 2.
https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-ver16#binary-styles

SQL Server: converting varchar to INT

I am stuck on converting a varchar column UserID to INT. I know, please don't ask why this UserID column was not created as INT initially, long story.
So I tried this, but it doesn't work and gives me an error:
select CAST(userID AS int) from audit
Error:
Conversion failed when converting the varchar value
'1581............................................................................................................................' to data type int.
I did select len(userID) from audit and it returns 128 characters, which are not spaces.
I tried to detect the ASCII values of the characters trailing after the ID number, and the ASCII value = 0.
I have also tried LTRIM, RTRIM, and replacing char(0) with '', but it does not work.
The only way it works is when I take a fixed number of characters, like below, but UserID is not always 4 characters:
select CAST(LEFT(userID, 4) AS int) from audit
You could try updating the table to get rid of these characters:
UPDATE dbo.[audit]
SET UserID = REPLACE(UserID, CHAR(0), '')
WHERE CHARINDEX(CHAR(0), UserID) > 0;
But then you'll also need to fix whatever is putting this bad data into the table in the first place. In the meantime perhaps try:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), ''))
FROM dbo.[audit];
But that is not a long term solution. Fix the data (and the data type while you're at it). If you can't fix the data type immediately, then you can quickly find the culprit by adding a check constraint:
ALTER TABLE dbo.[audit]
ADD CONSTRAINT do_not_allow_stupid_data
CHECK (CHARINDEX(CHAR(0), UserID) = 0);
EDIT
Ok, so that is definitely a 4-digit integer followed by six instances of CHAR(0). And the workaround I posted definitely works for me:
DECLARE @foo TABLE(UserID VARCHAR(32));
INSERT @foo SELECT 0x31353831000000000000;
-- this succeeds:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), '')) FROM @foo;
-- this fails:
SELECT CONVERT(INT, UserID) FROM @foo;
Please confirm that this code on its own (well, the first SELECT, anyway) works for you. If it does then the error you are getting is from a different non-numeric character in a different row (and if it doesn't then perhaps you have a build where a particular bug hasn't been fixed). To try and narrow it down you can take random values from the following query and then loop through the characters:
SELECT UserID, CONVERT(VARBINARY(32), UserID)
FROM dbo.[audit]
WHERE UserID LIKE '%[^0-9]%';
So take a random row, and then paste the output into a query like this:
DECLARE @x VARCHAR(32), @i INT;
SET @x = CONVERT(VARCHAR(32), 0x...); -- paste the value here
SET @i = 1;
WHILE @i <= LEN(@x)
BEGIN
    PRINT RTRIM(@i) + ' = ' + RTRIM(ASCII(SUBSTRING(@x, @i, 1)))
    SET @i = @i + 1;
END
This may take some trial and error before you encounter a row that fails for some other reason than CHAR(0) - since you can't really filter out the rows that contain CHAR(0) because they could contain CHAR(0) and CHAR(something else). For all we know you have values in the table like:
SELECT '15' + CHAR(9) + '23' + CHAR(0);
...which also can't be converted to an integer, whether you've replaced CHAR(0) or not.
I know you don't want to hear it, but I am really glad this is painful for people, because now they have more war stories to push back when people make very poor decisions about data types.
This question has got 91,000 views, so perhaps many people are looking for a more generic solution to the issue in the title, "error converting varchar to INT".
If you are on SQL Server 2012+ one way of handling this invalid data is to use TRY_CAST
SELECT TRY_CAST (userID AS INT)
FROM audit
On previous versions you could use
SELECT CASE
WHEN ISNUMERIC(RTRIM(userID) + '.0e0') = 1
AND LEN(userID) <= 11
THEN CAST(userID AS INT)
END
FROM audit
Both return NULL if the value cannot be cast.
In the specific case that you have in your question with known bad values I would use the following however.
CAST(REPLACE(userID COLLATE Latin1_General_Bin, CHAR(0),'') AS INT)
Trying to replace the null character is often problematic except if using a binary collation.
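For example, against the sample value shown earlier in this thread:
SELECT CAST(REPLACE(CONVERT(VARCHAR(32), 0x31353831000000000000) COLLATE Latin1_General_Bin,
                    CHAR(0), '') AS INT); -- 1581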
This is more for someone searching for a result than for the original poster. This worked for me...
declare @value varchar(max) = 'sad';
select sum(cast(iif(isnumeric(@value) = 1, @value, 0) as bigint));
returns 0
declare @value varchar(max) = '3';
select sum(cast(iif(isnumeric(@value) = 1, @value, 0) as bigint));
returns 3
I would try trimming the number to see what you get:
select len(rtrim(ltrim(userid))) from audit
if that returns the correct value then just do:
select convert(int, rtrim(ltrim(userid))) from audit
if that doesn't return the correct value then I would do a replace to remove the empty space:
select convert(int, replace(userid, char(0), '')) from audit
This is how I solved the problem in my case:
First of all, I made sure the column I need to convert to an integer doesn't contain any spaces:
update data set col1 = TRIM(col1)
I also checked whether the column only contains numeric digits.
You can check it by:
select * from data where col1 like '%[^0-9]%' order by col1
If any nonnumeric values are present, you can save them to another table and remove them from the table you are working on:
select * into nonnumeric_data from data where col1 like '%[^0-9]%'
delete from data where col1 like '%[^0-9]%'
The problems with my data were the cases above. So after fixing them, I added a bigint column and filled it from the varchar column:
alter table data add int_col1 bigint
update data set int_col1 = CAST(col1 AS bigint)
This worked for me; I hope you find it useful as well.
