Encoding issue: £ pound symbol appearing as <?> symbol - database

My database field is set to utf8_general_ci and my websites encoding is utf8.
The £ symbol is coming up as a black diamond with a question mark through the center.
I tried changing it to £ in the database and it just outputted £
I tried a string replace:
$row['Information'] = str_replace("£", "£", $row['Information']);
Nothing seems to work, any ideas?

I tried changing it to £ in the database
Don't. The database should contain raw text, never HTML-encoded content. The time to HTML-encode (using htmlspecialchars()) is when you insert some raw text into HTML at the output templating stage, and not before. Even if you got this to work, you'd only have fixed one character; the other 107025 non-ASCII characters would still break.
Clearly there is a mismatch of encodings here; you must ensure you use the same encoding (preferably UTF-8) everywhere, in particular:
the encoding you've saved the PHP file in, if it contains any non-ASCII characters;
the charset declared on the output page (by Content-Type <meta> or header(), preferably both; if you only use a <meta> to set it and the server is incorrectly configured it may set its own charset overriding yours);
the encoding of the column in the database (each column has its own collation, so just setting it on the table is ineffective);
the encoding used by PHP to talk to MySQL. This should be set using mysql_set_charset.
Unfortunately, none of these settings default to UTF-8.

Before communicating with your database, you need to send the query :
SET NAMES 'UTF-8'
It tells the database to use utf8 encoding for all queries on this connection.

Related

How does SQL Server store these Unicode characters into a column that is VARCHAR(MAX) and not NVARCHAR(MAX)

I have some data which I believe is Unicode and seeing what happens when I store it into my database column which is of VARCHAR(MAX) datatype.
And here's the source, from the file which is UTF-8...
looking for that ‘X’ and • 3 large bedrooms with 2 ensuites and • Main bedroom with ensuite & surround with plantation shutters`
and using the Visual Studio debugger:
=> so 2x apostrophes and 2x bullets.
I thought SQL Server can only store Unicode if the column is of type NVARCHAR?
I'm assuming my source data is not Unicode and therefore, I totally suck at all this Unicode/UTF-8 stuff :(
I thought SQL Server can only store Unicode if the column is of type NVARCHAR?
That's correct. As far as I can guess from your example, it is not storing Unicode. Probably it is storing bytes encoded in Windows code page 1252, which would be the default encoding for a Western install of SQL Server.
Code page 1252 happens to include mappings for characters ‘, ’ and •, so those characters can be safely stored. But step outside that limited repertoire and you'll start losing characters.

Why is Turkish Lira symbol ₺ replaced with ? in SQL server 2008 database

Any idea why the Turkish Lira symbol is replaced by a question mark when I insert it in a table in the database. See the image below
This is not a font issue. This is a Unicode (UTF-16) vs 8-bit Code Page character set issue (i.e. NVARCHAR vs VARCHAR). The character you are trying to use does not exist in the particular Code Page indicated by the default Collation of the DB in which you are executing this query. The Code Page used by the DB's default Collation is relevant here since your string literal is not prefixed with an upper-case "N". If it was, then the string would be interpreted as being Unicode and no conversion would take place. But since you are passing in a non-Unicode string, it will be forced into the current DB's default Collation's Code Page as the query is parsed. Any characters not available in that Code Page, and not having a Best-fit mapping, get turned into "?".
You can run the following to see for yourself:
SELECT '₺';
PRINT '₺';
It both prints AND displays in the results grid as ?
If you want to see what character SQL Server thinks it is, run the following:
SELECT ASCII('₺');
And it will return: 63
If you want to see what character has an ASCII value of 63, run this:
SELECT CHAR(63);
And it will return: ?
Now run this:
SELECT N'₺';
PRINT N'₺';
This will both print and display in the results grid correctly.
To see what character value the symbol really is, run the following:
SELECT UNICODE(N'₺'), UNICODE('₺');
This will return: 8378 and 63
But isn't 63 the question mark? Yes. That is because not prefixing the string literal '₺' with a capital "N" tells SQL Server that it is VARCHAR and so it gets translated to the default unknown character.
Now, if you were to execute this VARCHAR version in a DB that had a Collation tied to a Code Page that had this character, then it would work even when not prefixing the string literal with an upper-case "N". However, at the moment, I cannot find any Code Page used within SQL Server that supports this character. So, it might be a Unicode-only character, at least at far as SQL Server is concerned.
The way to fix this is:
Change the datatype of the field to NVARCHAR (I see in a comment on the question that the field is currently VARCHAR). If the field is VARCHAR then even if you use the N prefix on the string, the character will still get stored as ?, unless the Code Page specified by the Collation of the column supports this character, but again, I think this might be a Unicode-only character.
Change your INSERT statement to prefix the string field with a capital "N": (73, 4, N'(3) ₺'). Even if you change the field to NVARCHAR, if you don't prefix the string with N then SQL Server will translate the character to ? first and then insert the ?. This is because the query gets parsed before it gets executed, and parsing (for non-Unicode string literals and variables) is done in the Code Page of the DB's default Collation
Probably for the same reason my browser isn't displaying it in the title for this question: It isn't in the application's character set (or maybe not supported by the font).
In this case, my browser shows some numbers in a box (denoting the character code).
SQL-server is translating it to a known character instead.
Ensure you're storing it in a field that supports the character in it's character set (I think UTF-8 is sufficient)

PostgreSQL: unable to save special character (regional language) in blob

I am using PostgreSQL 9.0 and am trying to store a bytea file which contains certain special characters (regional language characters - UTF8 encoded). But I am not able to store the data as input by the user.
For example :
what I get in request while debugging:
<sp_first_name_gu name="sp_first_name_gu" value="ઍયેઍ"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
This is what is stored in DB:
<sp_first_name_gu name="sp_first_name_gu" value="\340\252\215\340\252\257\340\253\207\340\252\215"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
Note the difference in value tag. With this issue I am not able to retrieve the proper text input by the user.
Please suggest what do I need to do?
PS: My DB is UTF8 encoded.
The value is stored correctly, but is escaped into octal escape sequences upon retrieval.
To fix that - change the settings of the DB driver or chose different different encoding/escaping for bytea.
Or just use proper field types for the XML data - like varchar or XML.
Your string \340\252\215\340\252\257\340\253\207\340\252\215 is exactly ઍયેઍ in octal encoding, so postgres stores your data correctly. PostgreSQL escapes all non printable characters, for more details see postgresql documentation, especially section 8.4.2

cakephp encoding from database

I have problem with encoding characters from database. I am using Postgres with win1250 encoding, but whatever I put in core.php (right now I have this line of code):
Configure::write('App.encoding', 'iso-8859-1');
sometimes it give me some strange letters from database, for example È indstead of Č. Is there anything that I can do to get correct encoding.
NOTE: I can't edit or change anything to database.
I think all you need to do is declaring the right encoding option in your database connection configuration, as described at http://book.cakephp.org/2.0/en/development/configuration.html#database-configuration (scroll a bit).
Look at this particular paragraph:
encoding
Indicates the character set to use when sending SQL statements to the server. This defaults to the database’s default encoding for all databases other than DB2. If you wish to use UTF-8 encoding with mysql/mysqli connections you must use ‘utf8’ without the hyphen.
I had the same issue (with French and Spanish names) in a previous project and I only had to add the following to my $default connection, in the app/Config/database.php configuration file:
'encoding' => 'utf8'
Maybe you need the utf8 connection or the iso-8859-1 you mentionned.
win1250 encoding is similar to iso-8859-2 (see http://en.wikipedia.org/wiki/Windows-1250), so you might want to try that instead of iso-8859-1.

SQL Server Bulk Import With Format File of UTF-8 Data

I have been referring to the following page:
http://msdn.microsoft.com/en-us/library/ms178129.aspx
I simply want to bulk import some data from a file that has Unicode characters. I have tried encoding the actual data file in UC-2, UTF-8, etc but nothing works. I have also modified the format file to use SQLNCHAR, but still it doesn't work and gives error:
Bulk load data conversion error (truncation) for row 1, column 1
I think it has to do with this statement from the above link:
For a format file to work with a Unicode character data file, all the
input fields must be Unicode text strings (that is, either fixed-size
or character-terminated Unicode strings).
What exactly does this mean? I thought this means every character string needs to be a fixed 2 bytes, which encoding the file in UCS-2 should handle???
This blog post was really helpful and solved my problem:
http://blogs.msdn.com/b/joaol/archive/2008/11/27/bulk-insert-using-unicode-data-files.aspx
Something else to note - a Java class was generating the data file. In order for the above solution to work, the data file needed to be encoded in UTF-16LE, which can be set in the constructor of OutputStreamWriter (for example).
In SQL Server 2012 I imported a .csv file saved with Notepad++ enconded in UCS-2 with special spanish characters

Resources