Why is the Turkish Lira symbol ₺ replaced with ? in a SQL Server 2008 database - sql-server

Any idea why the Turkish Lira symbol is replaced by a question mark when I insert it into a table in the database?

This is not a font issue. This is a Unicode (UTF-16) vs 8-bit Code Page character set issue (i.e. NVARCHAR vs VARCHAR). The character you are trying to use does not exist in the particular Code Page indicated by the default Collation of the DB in which you are executing this query. The Code Page used by the DB's default Collation is relevant here since your string literal is not prefixed with an upper-case "N". If it was, then the string would be interpreted as being Unicode and no conversion would take place. But since you are passing in a non-Unicode string, it will be forced into the current DB's default Collation's Code Page as the query is parsed. Any characters not available in that Code Page, and not having a Best-fit mapping, get turned into "?".
You can run the following to see for yourself:
SELECT '₺';
PRINT '₺';
It both prints AND displays in the results grid as ?
If you want to see what character SQL Server thinks it is, run the following:
SELECT ASCII('₺');
And it will return: 63
If you want to see what character has an ASCII value of 63, run this:
SELECT CHAR(63);
And it will return: ?
Now run this:
SELECT N'₺';
PRINT N'₺';
This will both print and display in the results grid correctly.
To see what character value the symbol really is, run the following:
SELECT UNICODE(N'₺'), UNICODE('₺');
This will return: 8378 and 63
But isn't 63 the question mark? Yes. That is because not prefixing the string literal '₺' with a capital "N" tells SQL Server that it is VARCHAR and so it gets translated to the default unknown character.
Now, if you were to execute this VARCHAR version in a DB that had a Collation tied to a Code Page that had this character, then it would work even without prefixing the string literal with an upper-case "N". However, at the moment, I cannot find any Code Page used within SQL Server that supports this character. So, it might be a Unicode-only character, at least as far as SQL Server is concerned.
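To sanity-check the claim that no 8-bit Code Page carries this character, here is a rough emulation in Python (not SQL Server itself). It assumes Python's cpNNNN codecs are a fair stand-in for the Windows Code Pages behind SQL Server Collations, and the list of Code Pages is my own pick for illustration. Note that errors="replace" only substitutes "?"; it does not model Best-fit mappings.

```python
# Emulation outside SQL Server: Python codecs stand in for Windows code pages.
LIRA = "\u20ba"  # the Turkish Lira sign

# Single-byte Windows code pages (an assumed, illustrative list).
CODE_PAGES = ["cp874", "cp1250", "cp1251", "cp1252", "cp1253",
              "cp1254", "cp1255", "cp1256", "cp1257", "cp1258"]


def can_encode(ch: str, code_page: str) -> bool:
    """Return True if the code page has a byte for this character."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Which of the listed code pages can represent the character at all?
supporting = [cp for cp in CODE_PAGES if can_encode(LIRA, cp)]

# Rough analogue of SELECT '₺' under a Turkish (code page 1254) collation.
as_varchar = LIRA.encode("cp1254", errors="replace")

print(ord(LIRA))                  # 8378, same as UNICODE(N'₺')
print(supporting)                 # []
print(as_varchar, as_varchar[0])  # b'?' 63, same as ASCII('₺')
```

Even the Turkish Code Page (1254) has no slot for the symbol, which fits the "Unicode-only" guess above.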
The way to fix this is:
1. Change the datatype of the field to NVARCHAR (I see in a comment on the question that the field is currently VARCHAR). If the field is VARCHAR then even if you use the N prefix on the string, the character will still get stored as ?, unless the Code Page specified by the Collation of the column supports this character, but again, I think this might be a Unicode-only character.
2. Change your INSERT statement to prefix the string field with a capital "N": (73, 4, N'(3) ₺'). Even if you change the field to NVARCHAR, if you don't prefix the string with N then SQL Server will translate the character to ? first and then insert the ?. This is because the query gets parsed before it gets executed, and parsing (for non-Unicode string literals and variables) is done in the Code Page of the DB's default Collation.
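Why both changes are needed can be sketched with a small Python model (again an emulation, not SQL Server). The helper names to_code_page and simulate_insert are hypothetical, and the model assumes the database default Collation and the column Collation both use Code Page 1252.

```python
def to_code_page(text: str, code_page: str = "cp1252") -> str:
    """Lossy round trip through an 8-bit code page; unmappable chars become '?'."""
    return text.encode(code_page, errors="replace").decode(code_page)


def simulate_insert(text: str, n_prefix: bool, nvarchar_column: bool) -> str:
    """Model the two conversion points: literal parsing, then column storage."""
    # Parse time: a literal without N is VARCHAR in the database's code page.
    literal = text if n_prefix else to_code_page(text)
    # Storage time: a VARCHAR column forces the value into its own code page.
    return literal if nvarchar_column else to_code_page(literal)


value = "(3) \u20ba"  # the string from the INSERT, with the Lira sign
outcomes = {
    (n_prefix, nvarchar_column): simulate_insert(value, n_prefix, nvarchar_column)
    for n_prefix in (False, True)
    for nvarchar_column in (False, True)
}

for (n_prefix, nvarchar_column), stored in outcomes.items():
    literal = "N'...'" if n_prefix else "'...'"
    column = "NVARCHAR" if nvarchar_column else "VARCHAR"
    print(f"{literal:7} into {column:8} -> {stored}")
```

Only the combination of an N-prefixed literal and an NVARCHAR column keeps the symbol; the other three store "(3) ?".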

Probably for the same reason my browser isn't displaying it in the title for this question: It isn't in the application's character set (or maybe not supported by the font).
In this case, my browser shows some numbers in a box (denoting the character code).
SQL-server is translating it to a known character instead.
Ensure you're storing it in a field that supports the character in its character set (in SQL Server 2008 that means an NVARCHAR column; UTF-8 collations for VARCHAR only arrived in SQL Server 2019)

Related

Encoding error reading Greek characters string from SQL database

I have a search form (with method GET) with only one text field named “search_field”. When a user submits the form, the characters typed by the user are appended to the URL. For example, if the user types "blablabla", the generated URL will be something like this:
results.asp?search_field=blablabla
In my MSSQL 2012 database I have a table named “Products” with a column named “kodikos” in it.
I want to display all the records where the column “kodikos” contains the typed characters. My SQL select statement is the following:
"SELECT * FROM dbo.Products WHERE dbo.Products.kodikos LIKE '%' + ? + '%' "
(the question mark is the “search_field” parameter that contains the characters typed by the user).
All of the above works perfectly and I am getting the correct results. The problem that I am facing is with Greek characters. For example, when the user types “fff” my code works perfectly and finds all the records containing the characters “fff”. It also works perfectly with numbers. But if the user types the Greek characters “φφφ” I am not getting any results, even though there are a lot of records containing “φφφ”. The problem is that the Greek characters are not recognized at all.
For your information:
On my local PC, with the same SQL Server version, the Greek characters are recognized correctly by my code, because my regional settings are set to Greek. But the same code on the hosting server in the US does not recognize them.
All of my pages have UTF-8 encoding.
Does anyone have an idea how to solve this issue?
SQL Server knows two encodings natively:
2-byte-unicode (in most cases NVARCHAR)
extended ASCII in connection with a collation (in most cases VARCHAR)
I assume that the language you are calling this from uses 2-byte-unicode for normal strings. This is pretty usual today...
I assume that your column Products.kodikos is of type NVARCHAR (2-byte-unicode). In this case it should help to force your search string to be 2-byte-unicode too. Try
LIKE N'%' + CAST(? AS NVARCHAR(MAX)) + N'%'
If your column is not 2-byte encoded, it might help to use COLLATE to force your search string to a code page that knows your special characters.
If you pass this string into a SQL Server routine as-is, you should make sure that the accepting parameter is 2-byte-unicode too.
You have to make sure your search string is two-byte encoded using the N'' notation...
For instance, the following query uses a string that is two byte encoded:
SELECT * FROM dbo.Products WHERE dbo.Products.kodikos LIKE N'%φφφ%'
But this query uses a string that is not two byte encoded (you won't get any results):
SELECT * FROM dbo.Products WHERE dbo.Products.kodikos LIKE '%φφφ%'

SQL Server: encoding of string constants in SQL

I have a problem with encoding of string constants in queries to NVARCHAR field in SQL Server v12.0.2. I need to use national characters (all in the same single code page e.g. cyrillic WIN1251) in queries without N prefix.
Is it possible?
Example:
CREATE TABLE TEST (VALUE NVARCHAR(100) COLLATE Cyrillic_General_CI_AS);
INSERT INTO TEST VALUES (N'привет мир');
INSERT INTO TEST VALUES ('привет мир');
SELECT * FROM TEST;
This will return two rows:
| привет мир |
| ?????? ??? |
So the first insert works correctly, and I expected the second to do the same because the TEST.VALUE column is collated as Cyrillic_General_CI_AS. But it looks like string constants with national characters ignore the field collation and use a code page from somewhere else.
I realize that in this case I won't be able to use characters from more than one code page, or languages that don't fit a 1-byte encoding, but that is fine for me. The other option is to modify all queries to use the N prefix before string constants, but that is not possible.
Without the N prefix, the string is converted to the default code page of the database, not that of the column you're inserting into (see MSDN for details).
So either you should change database collation to Cyrillic_General_CI_AS, or find all the string constants and insert N prefix.
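The difference between the two options can be sketched in Python as an emulation (not SQL Server itself). It assumes the database default collation maps to code page 1252, which is my guess for this setup, and that Cyrillic_General_CI_AS maps to code page 1251.

```python
text = "привет мир"

# Literal without N, database default collation assumed to be Latin1 (cp1252):
# every Cyrillic letter is unmappable and becomes '?', the space survives.
via_latin1 = text.encode("cp1252", errors="replace").decode("cp1252")

# Same literal if the database default collation were Cyrillic (cp1251):
# all characters exist in the code page, so the round trip is lossless.
via_cyrillic = text.encode("cp1251").decode("cp1251")

print(via_latin1)    # ?????? ???
print(via_cyrillic)  # привет мир
```

The column collation never enters the picture here, because the damage happens when the literal is parsed, before it reaches the column.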

Unable to return query Thai data

I have a table with NVARCHAR(255) columns that contain both Thai and English text data.
In SSMS I can query the table and return all the rows easy enough. But if I then query specifically for one of the Thai results it returns no rows.
SELECT TOP 1000 [Province]
,[District]
,[SubDistrict]
,[Branch ]
FROM [THDocuworldRego].[dbo].[allDistricsBranches]
Returns
| Province | District | SubDistrict | Branch |
| อุตรดิตถ์ | ลับแล | ศรีพนมมาศ | Northern |
| Bangkok | Khlong Toei | Khlong Tan | SSS1 |
But this query:
SELECT [Province]
,[District]
,[SubDistrict]
,[Branch ]
FROM [THDocuworldRego].[dbo].[allDistricsBranches]
where [Province] LIKE 'อุตรดิตถ์'
Returns no rows.
What do I need to do to get the expected results?
The collation set is Latin1_General_CI_AS.
The data is displayed and inserted with no errors just can't search.
Two problems:
The string being passed into the LIKE clause is VARCHAR due to not being prefixed with a capital "N". For example:
SELECT 'อุตรดิตถ์' AS [VARCHAR], N'อุตรดิตถ์' AS [NVARCHAR]
-- ?????????    อุตรดิตถ์
What is happening here is that when SQL Server is parsing the query batch, it needs to determine the exact type and value of all literals / constants. So it figures out that 12 is an INT and 12.0 is a NUMERIC, etc. It knows that N'ดิ' is NVARCHAR, which is an all-inclusive character set, so it takes the value as is. BUT, as noted before, 'ดิ' is VARCHAR, which is an 8-bit encoding, which means that the character set is controlled by a Code Page. For string literals and variables / parameters, the Code Page used for VARCHAR data is the Database's default Collation. If there are characters in the string that are not available on the Code Page used by the Database's default Collation, they are either converted to a "best fit" mapping, if such a mapping exists, else they become the default replacement character: ?.
Technically speaking, since the Database's default Collation controls string literals (and variables), and since there is a Code Page for "Thai" (available in Windows Collations), then it would be possible to have a VARCHAR string containing Thai characters (meaning: 'ดิ', without the "N" prefix, would work). But that would require changing the Database's default Collation, and that is A LOT more work than simply prefixing the string literal with "N".
For an in-depth look at this behavior, please see my two-part series:
Which Collation is Used to Convert NVARCHAR to VARCHAR in a WHERE Condition? (Part A of 2: “Duck”)
Which Collation is Used to Convert NVARCHAR to VARCHAR in a WHERE Condition? (Part B of 2: “Rabbit”)
You need to add the wildcard characters to both ends:
N'%อุตรดิตถ์%'
The end result will look like:
WHERE [Province] LIKE N'%อุตรดิตถ์%'
EDIT:
I just edited the question to format the "results" to be more readable. It now appears that the following might also work (since no wildcards are being used in the LIKE predicate in the question):
WHERE [Province] = N'อุตรดิตถ์'
EDIT 2:
A string (i.e. something inside of single-quotes) is VARCHAR if there is no "N" prefixed to the string literal. It doesn't matter what the destination datatype is (e.g. an NVARCHAR(255) column). The issue here is the datatype of the source data, and that source is a string literal. And unlike a string in .NET, SQL Server handles 'string' as an 8-bit encoding (VARCHAR; ASCII values 0 - 127 same across all Code Pages, Extended ASCII values 128 - 255 determined by the Code Page, and potentially 2-byte sequences for Double-Byte Character Sets) and N'string' as UTF-16 Little Endian (NVARCHAR; Unicode character set, 2-byte sequences for BMP characters 0 - 65535, two 2-byte sequences for Code Points above 65535). Using 'string' is the same as passing in a VARCHAR variable. For example:
DECLARE @ASCII VARCHAR(20);
SET @ASCII = N'อุตรดิตถ์';
SELECT @ASCII AS [ImplicitlyConverted]
-- ?????????
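The byte counts described above can be checked with a short Python emulation (not SQL Server itself). It assumes Python's utf-16-le codec matches NVARCHAR storage and that cp874 and cp1252 stand in for the Thai and Latin1 Code Pages; the emoji is just an arbitrary character above Code Point 65535 that I picked for illustration.

```python
# The province name from the question, spelled as escapes to avoid ambiguity.
thai = "\u0e2d\u0e38\u0e15\u0e23\u0e14\u0e34\u0e15\u0e16\u0e4c"
emoji = "\U0001F600"  # a supplementary character (Code Point above 65535)

# NVARCHAR analogue: UTF-16 Little Endian.
utf16_thai = thai.encode("utf-16-le")
utf16_emoji = emoji.encode("utf-16-le")

# VARCHAR analogue: one byte per character in a Thai code page,
# or '?' for everything in a Latin1 code page.
thai_cp874 = thai.encode("cp874")
thai_cp1252 = thai.encode("cp1252", errors="replace")

print(len(thai), len(utf16_thai))  # 9 18 (2 bytes per BMP character)
print(len(utf16_emoji))            # 4 (two 2-byte sequences)
print(len(thai_cp874))             # 9
print(thai_cp1252)                 # b'?????????'
```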
Could be a number of things!
First off, print out the value of the column and your query string in hex.
SELECT convert(varbinary(20), Province) as stored, convert(varbinary(20), 'อุตรดิตถ์') as query from allDistricsBranches;
This should give you some insight to the problem. I think the most likely cause is the ั, ิ, characters being typed in the wrong sequence. They are displayed as part of the main letter but are stored internally as separate characters.
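To illustrate the "separate characters" point, here is a small Python sketch (an emulation, not SQL Server). The mistyped variant is hypothetical: I simply swap two adjacent code points to show that the same set of characters in a different order is a different string.

```python
# The province name from the question, one escape per stored code point.
province = "\u0e2d\u0e38\u0e15\u0e23\u0e14\u0e34\u0e15\u0e16\u0e4c"

# Each vowel or tone mark is its own code point, stored after its consonant.
code_points = [f"U+{ord(ch):04X}" for ch in province]

# Hex of the UTF-16 LE bytes, comparable to varbinary output for NVARCHAR data.
hex_le = province.encode("utf-16-le").hex()

# Hypothetical typo: the second and third code points entered in reverse order.
chars = list(province)
chars[1], chars[2] = chars[2], chars[1]
mistyped = "".join(chars)

same_characters = sorted(province) == sorted(mistyped)

print(code_points)
print(hex_le)
print(same_characters, province == mistyped)  # True False
```

If the hex of the stored value and of the search string differ in this way, an equality or LIKE comparison will not match even though both use the same characters.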

insert special character in my sql server database

ANSWER:
Sorry about this sort of question, guys. I assumed it wouldn't work if I entered the special character directly into the string in my query, but it does. So all you need to do is locate the special character, copy it, and paste it into your query, and it works :)
QUESTION CHANGED:
I want to insert the standard registered trademark symbol (®) into the database using a direct query and have it read back correctly. How can I do this?
PREVIOUS QUESTION:
How can I enter a special character into a VARCHAR column in SQL Server... ® (there is also a line below this symbol which I am unable to paste here) so that it is read correctly?
Also, I am unable to find the character sequence for that symbol. Is there any place where I can look it up?
The symbol is the standard ® symbol which hangs at the top, with a line below it just like an underscore.
Thanks
EDIT 1: I am talking about a direct query to the database.
You can use this T-SQL query:
INSERT INTO dbo.YourTable(UnicodeCol)
VALUES(nchar(0x00AE))
® is the Unicode character with code 0x00AE
But of course - since this is a Unicode character, the column you're inserting into must be of type NVARCHAR (not VARCHAR)
You can convert it to Unicode NCR format before you store it in the database, or just encode it with the related functions of the language you are using, like JavaScript's encodeURIComponent or PHP's urlencode.
You can use 'N' ahead of data.
This query might be helpful to you.
insert into product_master(product_name) values(N'कंप्यूटर')

Do I have to use the prefix N in the "insert into" statement for unicode?

Like:
insert into table (col) values (N'multilingual unicode strings')
I'm using SQL Server 2008 and I already use nVarChar as the column data type.
You need the N'' syntax only if the string contains characters which are not inside the default code page. "Best practice" is to have N'' whenever you insert into an nvarchar or ntext column.
Yes, you do if you have unicode characters in the strings.
From books online (http://msdn.microsoft.com/en-us/library/ms191313.aspx)...
"Unicode string constants that appear in code executed on the server, such as in stored procedures and triggers, must be preceded by the capital letter N. This is true even if the column being referenced is already defined as Unicode. Without the N prefix, the string is converted to the default code page of the database. This may not recognize certain characters. The requirement to use the N prefix applies to both string constants that originate on the server and those sent from the client."
It is preferable for compatibility's sake.
Best practice is to use parameterisation in which case you don't need the N prefix.
