French Characters - sql-server

I need to do an update with French characters in MS SQL Server. The problem is that I don't know where I can find a conversion list. For example, I identified that the sequence Ã‰ means É.
Where can I find the complete list of corresponding symbols for each French character?
Thank you.

I don't think "É" is encoded as "Ã‰"; that would be a very strange way to encode the character, using other special characters.
It seems rather the opposite:
I've noticed that "É" is sometimes shown as "Ã‰". It's a character encoding error: the encoding used to read the text does not match the one used to write it.
If you are trying to recover the right French characters from the wrongly displayed ones, I don't know whether such a list exists. You would first need to know which encoding mismatch occurred before you could find a list of the wrong displays.
So my suggested solution is this: find the proper character encoding. Changing the encoding will normally fix the display.
For French characters, the usual choices are iso-8859-1 and utf-8. Once you have the right encoding, you can find a conversion list that maps each character code to its displayed character.
For instance, this list: http://www.fileformat.info/info/charset/ISO-8859-1/list.htm
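To make the idea of a "list of wrong displays" concrete, here is a minimal Python sketch. It assumes one specific mismatch, namely UTF-8 text read as Windows-1252 (cp1252), which is a common cause but not necessarily the one in the question. The names FRENCH_CHARS, mojibake and repair are made up for this illustration.

```python
# Sketch: build a "wrong display -> intended character" table for French letters.
# Assumption: the text was written as UTF-8 but read as Windows-1252 (cp1252).
FRENCH_CHARS = "àâçéèêëîïôùûÀÇÉÈÊ"  # illustrative subset, not a complete list


def mojibake(ch: str) -> str:
    """Return how a character looks when its UTF-8 bytes are read as cp1252."""
    return ch.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo the misread by reversing the two steps."""
    return text.encode("cp1252").decode("utf-8")


# Map each wrongly displayed sequence back to the character it stands for.
conversion_table = {mojibake(ch): ch for ch in FRENCH_CHARS}

for wrong, right in conversion_table.items():
    print(f"{wrong} -> {right}")
```

If the damaged values follow this pattern, a table like this could drive a series of REPLACE calls in an UPDATE, although fixing the encoding at the source is usually the cleaner route.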

I've seen something similar when an nvarchar -> varchar conversion went wrong.
Especially when concatenating and doing this:
e.FirstName + ' ' + e.LastName
Instead of doing:
e.FirstName + N' ' + e.LastName
Doing a cast(nvarchar_field AS varchar(50)) - or the opposite, as required.
Bring everything to NVARCHAR, as this supports Unicode (SQL Server stores it as UTF-16).
Perhaps the table itself needs to be modified if the fields are varchar instead of nvarchar.

Related

Values shifting to end of line because of encoding

I'm trying to create a .CSV file from a SQL statement in SSIS. The file includes names with special characters, like Latin and German letters.
I'm using Unicode, but there is one word in Arabic that keeps jumping to the end of the line instead of staying where it belongs.
I tried replacing special characters with replace on char(10), char(13) etc., but it didn't help.
I've also tried using UTF-8 encoding, but I still need to mark Unicode because of the other Latin letters.
First of all, you should use UTF-8 encoding. Then make sure you are storing the data in an nvarchar data type.

Encoding error reading Greek characters string from SQL database

I have a search form (with method GET) with only one text field named “search_field”. When a user submits the form, the characters typed by the user are appended to the URL. For example, if the user types "blablabla", the generated URL will be something like this:
results.asp?search_field=blablabla
In my MSSQL 2012 database I have a table named “Products” with a column named “kodikos” in it.
I want to display all the records from the column “kodikos” containing the typed characters. My SQL select statement is the following:
"SELECT * FROM dbo.Products WHERE dbo.Products.kodikos LIKE '%' + ? + '%' "
(the question mark is the “search_field” parameter that contains the characters typed by the user).
All of the above works perfectly and I am getting the correct results. The problem I am facing is with Greek characters. For example, when the user types “fff”, my code works perfectly and finds all the records containing the characters “fff”. It also works perfectly with numbers. But if the user types the Greek characters “φφφ”, I am not getting any results, even though there are a lot of records with “φφφ”. The Greek characters are not recognized at all.
For your information:
On my local PC, with the same SQL Server version, the Greek characters are recognized correctly with my code, because my regional settings are set to Greek. But the same code on the hosting server in the US does not recognize them.
All of my pages have UTF-8 encoding.
Does anyone have an idea how to solve this issue?
SQL Server knows two encodings natively:
2-byte-unicode (in most cases NVARCHAR)
extended ASCII in connection with a collation (in most cases VARCHAR)
I assume that the language you are calling this from uses 2-byte-unicode for normal strings. This is pretty usual today.
I assume that your column Products.kodikos is of type NVARCHAR (2-byte-unicode). In this case it should help to force your search string to be 2-byte-unicode too. Try
LIKE N'%' + CAST(? AS NVARCHAR(MAX)) + N'%'
If your column is not 2-byte encoded, it might help to use COLLATE so that your search string is interpreted with a code page that contains your special characters.
If you pass this string into a SQL Server routine as-is, you should make sure that the accepting parameter is 2-byte-unicode too.
You have to make sure your search string is two-byte encoded, using the N'' notation.
For instance, the following query uses a string that is two byte encoded:
SELECT * FROM dbo.Products WHERE dbo.Products.kodikos LIKE N'%φφφ%'
But this query uses a string that is not two byte encoded (you won't get any results):
SELECT * FROM dbo.Products WHERE dbo.Products.kodikos LIKE '%φφφ%'
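Why the second query finds nothing can be imitated outside the database. The Python sketch below is only a simplified model: it assumes the US server's default code page is Windows-1252 and that characters missing from that code page are replaced by "?". The helper name as_varchar_literal and the constant SERVER_CODE_PAGE are made up for this illustration.

```python
# Simplified model of what happens to a literal without the N prefix.
# Assumption: the server's default code page is Windows-1252 (a Latin code page).
SERVER_CODE_PAGE = "cp1252"


def as_varchar_literal(text: str, code_page: str = SERVER_CODE_PAGE) -> str:
    """Squeeze text through a single-byte code page; unmappable characters become '?'."""
    return text.encode(code_page, errors="replace").decode(code_page)


print(as_varchar_literal("%φφφ%"))            # Greek letters are lost on a Latin code page
print(as_varchar_literal("%fff%"))            # plain ASCII survives
print(as_varchar_literal("%φφφ%", "cp1253"))  # a Greek code page keeps the letters
```

Under these assumptions the pattern that reaches the LIKE comparison is '%???%', which would explain why the same code works on a machine with Greek regional settings and fails on the US host.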

Unicode issue in sql server, some characters cannot be saved in a varchar field

I'm dealing with Unicode in my DB. I have a data field defined as varchar(max),
and I'm preventing users from saving unsupported characters in this field, like "≤" for example (all Unicode above U+00FF).
While doing so, I found that some characters, if saved in this field, are displayed as "?", so I assumed that all Unicode characters above U+00FF would be displayed like this. But then I found that U+201B, which is "‛", is displayed as "?", while the next character, U+201C, which is "“", is displayed as "“".
Can someone please explain why that is?
Update: Sorry if I was not clear, but I do not want to convert to nvarchar, I want to keep my field as varchar.
What I need to understand is why a character like "‛" is displayed as "?" in a "varchar" field while the next unicode character "“" is displayed properly?
If you want to store Unicode characters, you should use an nvarchar type, not varchar.
You need to change your data type to nvarchar, which will hold any Unicode character, whereas varchar is restricted to an 8-bit code page.
For more information, read the accepted answer in the link below.
Difference between varchar and nvarchar
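As for why "‛" and "“" behave differently: an 8-bit code page can contain a few characters whose Unicode code points lie above U+00FF. The Python sketch below checks this for Windows-1252, assuming that this is the code page behind the column's collation (the question does not say which collation is in use). The helper name fits_code_page is made up for this illustration.

```python
# Assumption: the varchar column uses a collation based on Windows-1252.
def fits_code_page(ch: str, code_page: str = "cp1252") -> bool:
    """Return True if the character has a byte value in the given code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


for ch in ("\u00ff", "\u201b", "\u201c", "\u2264"):
    status = "stored as-is" if fits_code_page(ch) else "not in code page"
    print(f"U+{ord(ch):04X} {ch} {status}")
```

In Windows-1252 the left double quotation mark U+201C has its own byte (0x93), while U+201B and U+2264 have none, so a cut-off at U+00FF does not describe what a varchar column can actually hold.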

insert special character in my sql server database

ANSWER:
Sorry about this sort of question, guys. I assumed that it wouldn't work if I entered the special character directly into the string in my query, but it does. So all you need to do is locate the special character, copy it and paste it into your query, and it works :)
folks,
QUESTION CHANGED:
I want to enter a special character into the database, the standard registered trademark symbol (®), using a direct query, and have it read back correctly. How can I do this?
PREVIOUS QUESTION:
How can I enter a special character into a varchar column in SQL Server, namely ® (there is also a line below this symbol which I am unable to paste here), so that it is read correctly?
Also, I am unable to find the character sequence for that symbol. Is there any place where I can look it up?
The symbol is the standard ® symbol, raised like a superscript, with a line below it just like an underscore.
Thanks
EDIT 1: I am talking about a direct query to the database.
You can use this T-SQL query:
INSERT INTO dbo.YourTable(UnicodeCol)
VALUES(nchar(0x00AE))
® is the Unicode character with code 0x00AE
But of course, since this is a Unicode character, the column you're inserting into should be of type NVARCHAR. A VARCHAR column only works if its code page contains the character (® happens to be in Windows-1252).
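If you need to look up the code for some other symbol, a few lines of Python can print the matching NCHAR() expression. This is only a sketch: the helper name nchar_expr is made up, and it only covers characters in the Basic Multilingual Plane (code points up to 0xFFFF).

```python
def nchar_expr(ch: str) -> str:
    """Build a T-SQL NCHAR() expression for a single BMP character."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("this sketch only handles code points up to 0xFFFF")
    return f"NCHAR(0x{code_point:04X})"


print(nchar_expr("®"))
print(nchar_expr("É"))
```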
You can convert it to Unicode NCR format before you store it in the database, or just encode it with the related functions of the language you are using, like JavaScript's encodeURIComponent or PHP's urlencode.
You can put the N prefix in front of the string literal.
This query might be helpful to you.
insert into product_master(product_name) values(N'कंप्यूटर')

SQL Server BULK INSERT - Escaping reserved characters

There's very little documentation available about escaping characters in SQL Server BULK INSERT files.
The documentation for BULK INSERT says the statement only has two formatting options: FIELDTERMINATOR and ROWTERMINATOR, however it doesn't say how you're meant to escape those characters if they appear in a row's field value.
For example, if I have this table:
CREATE TABLE People ( name varchar(MAX), notes varchar(MAX) )
and this single row of data:
"Foo, \Bar", "he has a\r\nvery strange name\r\nlol"
...what would its corresponding bulk insert file look like? This wouldn't work, for obvious reasons:
Foo,\Bar,he has a
very strange name
lol
SQL Server says it supports \r and \n, but doesn't say whether backslashes escape themselves, nor does it mention field value delimiting (e.g. with double quotes, or escaping double quotes), so I'm a little perplexed in this area.
I worked around this issue by using \0 as a row separator and \t as a field separator, as neither character appeared in a field value and both are supported as separators by BULK INSERT.
I am surprised MSSQL doesn't offer more flexibility when it comes to import/export. It wouldn't take too much effort to build a first-class CSV/TSV parser.
For the next person to search:
I used "\0\t" as a field separator, and "\0\n" for the end-of-line separator on the last field. Use of "\0\r\n" would also be acceptable if you wish to pretend that the files have DOS EOL conventions.
For those unfamiliar with the backslash notation, \0 is CHAR(0), \t is CHAR(9), \n is CHAR(10) and \r is CHAR(13). Replace the CHAR() function with whatever your language offers to convert a number to the corresponding character.
With this combination, all instances of \t and \n (and \r) become acceptable characters in the data. After all, the weakness of the bulk upload system is that tabs and newlines are often legitimate characters in text strings, whereas other low-ASCII characters like CHAR(0), CHAR(1) and CHAR(2) do not normally occur in text; in UTF-8 these byte values never appear as part of a multi-byte character.
The only character you cannot have in your data is \0, UNLESS you can guarantee it will never be followed by \t or \n (or \r).
If your language has problems with \0 in strings (depending on how you code, you may still be able to avoid them), AND you know that your data won't contain CHAR(1) or CHAR(2) (i.e. no binary), then use those characters instead. Those low characters are only going to be found when you are trying to store arbitrary binary data in strings.
Note also that you will find bytes 0, 1, 2 in UTF-16, UCS-2 and UTF-32 (aka UCS-4) - BUT - the 2 or 4 byte wide representation of CHAR(0, 1 or 2) is still acceptable and distinct from any legal unicode text. Just make sure you select the correct codepage setting in the format file to suit your choice of a UTF or UCS variant.
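The separator scheme described above can be sketched in Python as follows. The function names to_bulk_text and from_bulk_text are made up for this illustration, and the sketch takes the stricter route of rejecting any \0 in the data instead of checking what follows it. The reader function only imitates how a loader would split on the terminators; it is not SQL Server's parser.

```python
FIELD_SEP = "\0\t"  # CHAR(0) + CHAR(9) between fields
ROW_SEP = "\0\n"    # CHAR(0) + CHAR(10) after the last field of a row


def to_bulk_text(rows):
    """Serialize rows so that tabs and newlines inside values stay intact."""
    for row in rows:
        for value in row:
            if "\0" in value:
                # Stricter than necessary, but keeps the sketch simple.
                raise ValueError("values must not contain CHAR(0)")
    return "".join(FIELD_SEP.join(row) + ROW_SEP for row in rows)


def from_bulk_text(text):
    """Split the text back into rows, the way a terminator-based loader would."""
    return [chunk.split(FIELD_SEP) for chunk in text.split(ROW_SEP)[:-1]]


rows = [["Foo, \\Bar", "he has a\r\nvery strange name\r\nlol"]]
payload = to_bulk_text(rows)
print(repr(payload))
print(from_bulk_text(payload) == rows)
```

Per the description above, the matching statement would use '\0\t' as FIELDTERMINATOR and '\0\n' as ROWTERMINATOR. When writing the payload to disk from Python, open the file with newline="" so the \n and \r\n sequences are not translated.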
A bulk insert needs the same fields and field count in every row. Your example is a little rough, as it's not structured data. As for the characters, it will interpret them literally, without any escape processing (your string will be stored exactly as it appears in the file).
As for the double quotes enclosing each field, you will just have to make them part of the field and row terminators. So now you should have:
Fieldterminator = '","',
Rowterminator = '"\n'
Does that make sense? Then after the bulk insert you'll need to take out the prefix double quote with something like:
Update yourtable
set yourfirstcolumn = right(yourfirstcolumn, len(yourfirstcolumn) - 1)
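To see why only the first column keeps a stray quote, here is a small Python imitation of splitting on those two terminators. It is a sketch of the idea, not of BULK INSERT itself; the function names split_quoted and strip_leading_quote and the sample data are made up for this illustration.

```python
FIELD_TERMINATOR = '","'
ROW_TERMINATOR = '"\n'


def split_quoted(text):
    """Split text on the row terminator, then each row on the field terminator."""
    return [
        line.split(FIELD_TERMINATOR)
        for line in text.split(ROW_TERMINATOR)
        if line  # skip the empty piece after the final row terminator
    ]


def strip_leading_quote(rows):
    """Drop the opening quote left on the first field (the UPDATE step above)."""
    return [[row[0][1:]] + row[1:] for row in rows]


data = '"Foo, \\Bar","strange name"\n"Baz","plain"\n'
raw = split_quoted(data)
print(raw)
print(strip_leading_quote(raw))
```

Every quote except the very first one in a row is consumed by a terminator, which is why a single clean-up of the first column is enough. Note that this approach cannot represent a value that itself contains '","' or a quote followed by a newline.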

Resources