Does SQL Server expect numbers to be specified with Latin digits, e.g.:
0123456789
Is it valid to give SQL Server digits from other scripts?
Rosetta Stone:
Latin: 0123456789
Arabic: ٠١٢٣٤٥٦٧٨٩
Bengali: ০১২৩৪৫৬৭৮৯
I know that the client (ADO) will convert 8-bit strings to 16-bit Unicode strings using the current culture. But the client is also converting numbers to strings using the current culture, e.g.:
SELECT * FROM Inventory
WHERE Quantity > ২৩৪,৭৮
This throws SQL Server into fits.
I know that the server/database has its own code page and locale, but those apply to strings.
Will SQL Server interpret numbers using the active (or per-login) locale, or must all numeric values be specified with Latin digits?
From what I can tell, T-SQL requires Latin digits, with the decimal point written as a period (.).
Neither ISNUMERIC() nor CAST() accepts these digits, so a numeric constant written with them would not work either.
Allowing a client to pass non-Latin digits sounds dangerously promiscuous. (I'm not sure what path your data travels, but there seems to be a potential for SQL injection if users' localized input isn't validated as numeric.)
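One way to follow that advice is to normalize localized digits on the client, before the value ever reaches T-SQL. Below is a minimal Python sketch. It assumes the only non-digit characters allowed are an optional leading sign and a single decimal separator, which here is the comma from the Bengali example. `to_invariant_number` is a hypothetical helper, not part of ADO or any driver API.

```python
import unicodedata


def to_invariant_number(text: str, decimal_sep: str = ",") -> str:
    """Hypothetical helper: rewrite text with ASCII digits and '.' as decimal point.

    Assumes no grouping separators. Raises ValueError for anything that is not
    a leading sign, a decimal digit in some script, or one decimal separator.
    """
    out = []
    seen_sep = False
    for i, ch in enumerate(text):
        if ch == decimal_sep and not seen_sep:
            out.append(".")
            seen_sep = True
        elif ch in "+-" and i == 0:
            out.append(ch)
        else:
            # unicodedata.decimal() maps any Unicode decimal digit (Bengali,
            # Arabic-Indic, ...) to its value and raises ValueError otherwise.
            out.append(str(unicodedata.decimal(ch)))
    return "".join(out)


print(to_invariant_number("২৩৪,৭৮"))  # 234.78
```

Python's own float() happens to accept non-ASCII decimal digits, but T-SQL does not. For that reason the conversion belongs on the client. Sending the result as a typed parameter, rather than splicing it into the SQL text, also avoids the injection risk mentioned above.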
Related
I'm doing some tests with SQL Server 2017.
I'm trying to store arbitrary Unicode code points in an NVARCHAR column.
I've tried different collations.
I have no problem with common characters in the Basic Multilingual Plane (BMP) of Unicode.
For more exotic symbols, for example if I try to store the "𝌹" character (U+1D339), the following happens:
If I do it within Management Studio, I only see the infamous square symbol. But Management Studio has the proper font since I can paste it in the query editor.
If I send the text from Visual Studio, the value I see in Management Studio is "??", and that is also what I get back in Visual Studio after running a query.
My understanding is, for non-supplementary character collations, characters outside the UCS-2 subset shouldn't be interpreted correctly because NCHAR fields are limited to 2 bytes.
But I tried Latin1_General_100_CS_AS_KS_WS_SC at both the database level and the column level, and it doesn't seem to work either.
Any ideas?
Thanks
I can't reproduce any data loss or encoding issue. I can reproduce a square that becomes 𝌹 when copied. It's probably caused by the font used to display results in the SSMS grid or the Visual Studio debugger windows.
SQL Server and Windows have used UTF-16 for some time now, not UCS-2. Few fonts support the full Unicode range, though.
When I tried this in SSMS:
create table #tc(name nvarchar(20));
insert into #tc values (N'𝌹');
select name,len(name),DATALENGTH(name) from #tc;
I saw a square, 2 and 4 in the grid. This means the character was stored properly and took 4 bytes. When I copied those results to SO, though, I saw:
name (No column name) (No column name)
𝌹 2 4
When I used Results to Text, I got the actual character:
name
-------------------- ----------- -----------
𝌹 2 4
The correct character is there, but the SSMS grid's font can't display it.
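The 2 and the 4 in that output follow directly from UTF-16. 𝌹 (U+1D339) lies outside the BMP, so it is stored as two 16-bit code units. The short Python check below assumes that LEN under a collation that is not supplementary-character-aware counts UTF-16 code units, which matches the output above.

```python
ch = "\U0001D339"  # 𝌹, outside the Basic Multilingual Plane

utf16 = ch.encode("utf-16-le")  # NVARCHAR data is stored as UTF-16LE
byte_count = len(utf16)         # comparable to DATALENGTH(name)
code_units = byte_count // 2    # comparable to LEN(name) without an _SC collation

print(byte_count, code_units)  # 4 2
print(len(ch))                 # 1 code point, as an _SC collation would count it
```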
Update
As Dan Guzman noted, the font can be changed from Tools --> Options --> Environment --> Fonts and Colors --> Show settings for: --> Grid Results. The default font is Microsoft Sans Serif, a small font (855KB) used as the default font on Windows. It contains "only" 3000 glyphs. Chinese characters aren't included, which is why squares are displayed.
Chinese computers use SimSun as the default, though, whose file is 17.1MB. They wouldn't have any problem displaying Chinese characters.
I'm trying to store arbitrary Unicode code points in an nvarchar column. I've tried different collations. I have no problem with common characters in the BMP plane of Unicode.
Collations have nothing to do with what code points you can store in an NVARCHAR / NCHAR / NTEXT (deprecated) column, variable, or literal. Those datatypes can store all 1,114,112 Unicode code points (even though most haven't been mapped to a character yet).
if I try to store the 𝌹 character (U+1D339), ... within Management Studio, I only see the infamous square symbol. But Management Studio has the proper font since I can paste it in the query editor.
As others have explained already, this is merely a font issue. A font can hold at most 65,535 glyphs, so you might need multiple fonts to cover all of the characters you are trying to use. I prefer Code2003, which you can find on FontSpace.com.
If I send the text from Visual Studio, the value I see in Management Studio is '??'
This should be due to forgetting to prefix the string literal with an upper-case "N" ;-).
SELECT '𝌹' AS [Oops], N'𝌹' AS [No Oops];
-- ?? 𝌹
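You get two question marks rather than one because a supplementary character is two UTF-16 code units, and neither half of a surrogate pair exists in any 8-bit code page. Each half therefore becomes "?". The Python sketch below only simulates that conversion. It assumes a unit-by-unit mapping into Windows-1252 and ignores the best-fit mappings SQL Server may also apply. `to_varchar_sim` is a hypothetical name.

```python
def to_varchar_sim(text: str, codepage: str = "cp1252") -> str:
    """Hypothetical simulation: convert text to an 8-bit code page per UTF-16 code unit."""
    data = text.encode("utf-16-le")
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")
        if 0xD800 <= unit <= 0xDFFF:
            # Each half of a surrogate pair has no mapping in any 8-bit code page.
            out.append("?")
            continue
        try:
            chr(unit).encode(codepage)
            out.append(chr(unit))
        except UnicodeEncodeError:
            # Character not in the code page (best-fit mappings are ignored here).
            out.append("?")
    return "".join(out)


print(to_varchar_sim("\U0001D339"))  # ??
print(to_varchar_sim("abc"))         # abc
```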
My understanding is, for non-supplementary character collations, characters outside the UCS-2 subset shouldn't be interpreted correctly because NCHAR fields are limited to 2 bytes.
The Supplementary Character-Aware (SCA) collations, those ending with _SC or with _140_ in their names, do support supplementary characters. BUT, "support" only means that the built-in functions handle the surrogate pair as a single supplementary code point instead of a pair of surrogate code points. Support for sorting and comparing supplementary characters actually started in SQL Server 2005 with the introduction of the version 90 collations.
All code units in UCS-2 and UTF-16 are 16 bits / 2 bytes. Supplementary characters are merely two of those 2-byte code units. Hence, being able to store supplementary characters should have been available back in SQL Server 7.0 when NVARCHAR was introduced. Even though no supplementary characters were defined until years later (after SQL Server 2000 was released), the NVARCHAR types were still capable of storing and retrieving them. I don't have SQL Server 7.0 to test with, but I have confirmed this on SQL Server 2000.
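Those "two 2-byte code units" come from the fixed UTF-16 surrogate formula, so storing them never depended on the collation. Here is a minimal Python sketch of the encoding and decoding steps. The function names are illustrative only.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a code point into UTF-16 code units."""
    if cp < 0x10000:
        return (cp,)  # BMP: a single code unit
    v = cp - 0x10000  # 20-bit offset into the supplementary planes
    return (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF))


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


pair = to_surrogates(0x1D339)       # 𝌹
print([hex(u) for u in pair])       # ['0xd834', '0xdf39']
print(hex(from_surrogates(*pair)))  # 0x1d339
```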
For more info, please see:
How Many Bytes Per Character in SQL Server: a Completely Complete Guide
Collations Info
I have some data which I believe is Unicode, and I'm seeing what happens when I store it in a database column of type VARCHAR(MAX).
And here's the source, from the file which is UTF-8...
looking for that ‘X’ and • 3 large bedrooms with 2 ensuites and • Main bedroom with ensuite & surround with plantation shutters
and using the Visual Studio debugger:
=> so 2x apostrophes and 2x bullets.
I thought SQL Server can only store Unicode if the column is of type NVARCHAR?
I'm assuming my source data is not Unicode, and therefore I totally suck at all this Unicode/UTF-8 stuff :(
I thought SQL Server can only store Unicode if the column is of type NVARCHAR?
That's correct. As far as I can guess from your example, it is not storing Unicode. Probably it is storing bytes encoded in Windows code page 1252, which would be the default encoding for a Western install of SQL Server.
Code page 1252 happens to include mappings for characters ‘, ’ and •, so those characters can be safely stored. But step outside that limited repertoire and you'll start losing characters.
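You can confirm this from the code page itself. The Python sketch below assumes the database's code page is Windows-1252 (Python codec name `cp1252`). Each character from the question maps to a single byte. A character outside the repertoire cannot be encoded and comes back as "?" when replacement is allowed. The Turkish "ş" is used here purely for illustration.

```python
text = "‘X’ •"

encoded = text.encode("cp1252")  # succeeds: every character has a CP1252 byte
print(encoded)  # b'\x91X\x92 \x95'

# A character outside CP1252 is lost; "replace" substitutes '?' (0x3F).
print("ş".encode("cp1252", errors="replace"))  # b'?'
```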
Any idea why the Turkish Lira symbol (₺) is replaced by a question mark when I insert it into a table in the database?
This is not a font issue. This is a Unicode (UTF-16) vs 8-bit Code Page character set issue (i.e. NVARCHAR vs VARCHAR). The character you are trying to use does not exist in the particular Code Page indicated by the default Collation of the DB in which you are executing this query. The Code Page used by the DB's default Collation is relevant here since your string literal is not prefixed with an upper-case "N". If it was, then the string would be interpreted as being Unicode and no conversion would take place. But since you are passing in a non-Unicode string, it will be forced into the current DB's default Collation's Code Page as the query is parsed. Any characters not available in that Code Page, and not having a Best-fit mapping, get turned into "?".
You can run the following to see for yourself:
SELECT '₺';
PRINT '₺';
It both prints AND displays in the results grid as ?
If you want to see what character SQL Server thinks it is, run the following:
SELECT ASCII('₺');
And it will return: 63
If you want to see what character has an ASCII value of 63, run this:
SELECT CHAR(63);
And it will return: ?
Now run this:
SELECT N'₺';
PRINT N'₺';
This will both print and display in the results grid correctly.
To see what character value the symbol really is, run the following:
SELECT UNICODE(N'₺'), UNICODE('₺');
This will return: 8378 and 63
But isn't 63 the question mark? Yes. That is because not prefixing the string literal '₺' with a capital "N" tells SQL Server that it is VARCHAR and so it gets translated to the default unknown character.
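You can reproduce the same two numbers outside SQL Server. This Python sketch assumes a Windows-1252 database code page. `ord()` plays the role of UNICODE(N'₺'). Encoding with replacement mimics what happens to the unprefixed literal.

```python
lira = "\u20ba"  # ₺ TURKISH LIRA SIGN

print(ord(lira))  # 8378, like UNICODE(N'₺')

# An unprefixed literal is forced into the code page first; CP1252 has no ₺.
as_varchar = lira.encode("cp1252", errors="replace")
print(as_varchar, as_varchar[0])  # b'?' 63, like UNICODE('₺') and ASCII('₺')
```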
Now, if you were to execute this VARCHAR version in a DB that had a Collation tied to a Code Page that had this character, then it would work even when not prefixing the string literal with an upper-case "N". However, at the moment, I cannot find any Code Page used within SQL Server that supports this character. So, it might be a Unicode-only character, at least as far as SQL Server is concerned.
The way to fix this is:
Change the datatype of the field to NVARCHAR (a comment on the question says the field is currently VARCHAR). If the field is VARCHAR, then even if you use the N prefix on the string, the character will still be stored as ?, unless the Code Page specified by the column's Collation supports this character. But again, I think this might be a Unicode-only character.
Change your INSERT statement to prefix the string literal with a capital "N": (73, 4, N'(3) ₺'). Even if you change the field to NVARCHAR, if you don't prefix the string with N, SQL Server will translate the character to ? first and then insert the ?. This is because the query gets parsed before it gets executed, and parsing (for non-Unicode string literals and variables) is done in the Code Page of the DB's default Collation.
Probably for the same reason my browser isn't displaying it in the title for this question: It isn't in the application's character set (or maybe not supported by the font).
In this case, my browser shows some numbers in a box (denoting the character code).
SQL Server is translating it to a known character instead.
Ensure you're storing it in a field that supports the character in its character set (I think UTF-8 is sufficient).
I'm facing a problem in a package that imports some data from a MySQL table into an Oracle table and an MS SQL Server table. It works well from MySQL to SQL Server, but I get an error when I import into Oracle.
The table I want to import contains an attribute (unitPrice) of data type DT_R8.
The destination data type for Oracle is DT_NUMERIC, as you can see in the capture.
I added a conversion step to convert the unitPrice data from DT_R8 to DT_NUMERIC.
It doesn't work; I get the following error.
I found the details of the error:
An ORA-01722 ("invalid number") error occurs when an attempt is made to convert a character string into a number, and the string cannot be converted into a valid number. Valid numbers contain the digits '0' through '9', with possibly one decimal point, a sign (+ or -) at the beginning or end of the string, or an 'E' or 'e' (if it is a floating point number in scientific notation). All other characters are forbidden.
However, I don't know how to fix it.
EDIT: I added a component to redirect rows/errors to an Excel file.
The following screenshot shows the result of the process, including errors:
Browsing the 3000 rows recorded, it seems the process accepts only integer values, not reals: if the price is 10 it's OK, but if it's 10,5 it fails.
Any idea how to solve this issue?
Your NLS environment does not match the expected one. By default (for example with NLS_TERRITORY set to AMERICA), Oracle assumes that "," is the grouping character and "." is the decimal separator. Make sure that your session uses the correct value for the NLS_NUMERIC_CHARACTERS parameter.
See Setting Up a Globalization Support Environment for documentation.
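The symptom in the question (10 loads, 10,5 fails) follows from the validity rules quoted there once the two sides disagree on the decimal character. The Python sketch below only imitates that check. `oracle_like_to_number` is a hypothetical helper that accepts an optional leading sign, ASCII digits and at most one decimal character. The exponent form and trailing signs are omitted. On the Oracle side, the setting itself is changed with a statement such as ALTER SESSION SET NLS_NUMERIC_CHARACTERS = ',.', where the decimal character comes first and the group separator second.

```python
import re


def oracle_like_to_number(text: str, decimal_char: str = ".") -> float:
    """Hypothetical imitation of Oracle's implicit string-to-number check.

    Accepts an optional leading sign, ASCII digits, and at most one
    decimal_char; anything else raises ValueError (cf. ORA-01722).
    Scientific notation is not handled in this sketch.
    """
    d = re.escape(decimal_char)
    pattern = rf"[+-]?(?:[0-9]+(?:{d}[0-9]*)?|{d}[0-9]+)"
    if re.fullmatch(pattern, text) is None:
        raise ValueError(f"invalid number: {text!r}")
    return float(text.replace(decimal_char, "."))


print(oracle_like_to_number("10"))                      # 10.0
print(oracle_like_to_number("10,5", decimal_char=","))  # 10.5
try:
    oracle_like_to_number("10,5")  # session expects '.', value uses ','
except ValueError as exc:
    print(exc)  # invalid number: '10,5'
```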
I am trying to insert ≤ and ≥ into a symbol table where the column is of type nvarchar.
Is this possible or are these symbols not allowed in SQL Server?
To make it work, prefix the string with N:
create table symboltable
(
val nvarchar(10)
);
insert into symboltable values (N'≤'), (N'≥');
select *
from symboltable;
Further Reading:
You must precede all Unicode strings with a prefix N when you deal with Unicode string constants in SQL Server
Why do some SQL strings have an 'N' prefix?
To add to Gonzalo's answer, both the string literal and the field need to support Unicode characters.
String Literal
Per Marc Gravell's answer on What does N' stands for in a SQL script ?:
'abcd' is a literal for a [var]char string, occupying 4 bytes of memory, and using whatever code page the SQL Server is configured for.
N'abcd' is a literal for a n[var]char string, occupying 8 bytes of memory, and using UTF-16.
The N prefix stands for "National" (from the NATIONAL CHARACTER type in the SQL-92 standard) and is used for representing Unicode characters. Without it, any Unicode characters in a basic string literal are first encoded into SQL Server's "code page".
Aside: You can check your code page with the following SQL:
SELECT DATABASEPROPERTYEX('dbName', 'Collation') AS dbCollation;
SELECT COLLATIONPROPERTY( 'SQL_Latin1_General_CP1_CI_AS' , 'CodePage' ) AS [CodePage];
The default is Windows-1252, an 8-bit code page that can represent at most 256 characters.
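Python's codecs can check both the storage sizes quoted above and the limits of Windows-1252. This sketch assumes the database code page is 1252. 'abcd' needs 4 bytes in the code page and 8 bytes in UTF-16. Python's cp1252 codec leaves five of the 256 byte values undefined. Neither '≥' nor '≤' is in the repertoire, which is why the N prefix is required.

```python
def cp1252_can_encode(ch: str) -> bool:
    """Return True if ch has a byte in Windows-1252."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(len("abcd".encode("cp1252")))     # 4, like 'abcd'
print(len("abcd".encode("utf-16-le")))  # 8, like N'abcd'

defined = []
for b in range(256):
    try:
        defined.append(bytes([b]).decode("cp1252"))
    except UnicodeDecodeError:
        pass  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Python's cp1252
print(len(defined))  # 251

print(cp1252_can_encode("≥"), cp1252_can_encode("≤"))  # False False
```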
Field Type
Once the values can be passed, they also need to be stored in a column of a Unicode type, for example:
nchar
nvarchar
ntext (deprecated)
Further Reading:
Why do we need to put N before strings in Microsoft SQL Server?
What is the meaning of the prefix N in T-SQL statements?
You must precede all Unicode strings with a prefix N when you deal with Unicode string constants in SQL Server
Why do some SQL strings have an 'N' prefix?