CakePHP encoding from database

I have a problem with the encoding of characters coming from the database. I am using Postgres with WIN1250 encoding, but whatever I put in core.php (right now I have this line of code):
Configure::write('App.encoding', 'iso-8859-1');
it sometimes gives me strange letters from the database, for example È instead of Č. Is there anything I can do to get the correct encoding?
NOTE: I can't edit or change anything in the database.

I think all you need to do is declare the right encoding option in your database connection configuration, as described at http://book.cakephp.org/2.0/en/development/configuration.html#database-configuration (scroll down a bit).
Look at this particular paragraph:
encoding
Indicates the character set to use when sending SQL statements to the server. This defaults to the database’s default encoding for all databases other than DB2. If you wish to use UTF-8 encoding with mysql/mysqli connections you must use ‘utf8’ without the hyphen.
I had the same issue (with French and Spanish names) in a previous project, and I only had to add the following to my $default connection in the app/Config/database.php configuration file:
'encoding' => 'utf8'
Maybe you need the utf8 connection, or the iso-8859-1 you mentioned.

win1250 encoding is similar to iso-8859-2 (see http://en.wikipedia.org/wiki/Windows-1250), so you might want to try that instead of iso-8859-1.
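To see why Č turns into È, it helps to look at the raw byte. The sketch below uses Python purely for illustration (the same byte-level reasoning applies to PHP); the helper name decode_byte_as is hypothetical. In Windows-1250, and also in ISO-8859-2, the byte 0xC8 is Č, while in ISO-8859-1 the same byte is È, so reading WIN1250 data with an iso-8859-1 setting swaps the letter.

```python
def decode_byte_as(raw: bytes, codec: str) -> str:
    """Interpret raw database bytes using the given character encoding."""
    return raw.decode(codec)


# The single byte stored for "Č" in a WIN1250 (Windows-1250) database
raw = "Č".encode("cp1250")

for codec in ("cp1250", "iso-8859-2", "iso-8859-1"):
    # Decode the same byte under each candidate encoding
    print(codec, hex(raw[0]), decode_byte_as(raw, codec))

# The two "similar" encodings are not identical: Š uses different bytes
print(hex("Š".encode("cp1250")[0]), hex("Š".encode("iso-8859-2")[0]))
```

Because Windows-1250 and ISO-8859-2 disagree on some letters (Š is 0x8A in one and 0xA9 in the other), the exact name of the database encoding is the safer choice whenever the setting accepts it.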

Related

PostgreSQL: unable to save special character (regional language) in blob

I am using PostgreSQL 9.0 and am trying to store a file in a bytea column; the file contains certain special characters (regional-language characters, UTF-8 encoded). However, I am not able to store the data exactly as the user entered it.
For example, this is what I get in the request while debugging:
<sp_first_name_gu name="sp_first_name_gu" value="ઍયેઍ"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
This is what is stored in the DB:
<sp_first_name_gu name="sp_first_name_gu" value="\340\252\215\340\252\257\340\253\207\340\252\215"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
Note the difference in the value attribute. Because of this, I am not able to retrieve the text the user entered.
What do I need to do?
PS: My DB is UTF8 encoded.
The value is stored correctly, but it is escaped into octal escape sequences on retrieval.
To fix that, change the settings of the DB driver or choose a different encoding/escaping for bytea.
Or just use proper field types for the XML data, like varchar or xml.
Your string \340\252\215\340\252\257\340\253\207\340\252\215 is exactly the UTF-8 encoding of ઍયેઍ written as octal escapes, so Postgres stores your data correctly. PostgreSQL escapes all non-printable characters; for more details, see the PostgreSQL documentation, especially section 8.4.2.
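To confirm that nothing was lost, the escaped text can be turned back into bytes and decoded as UTF-8. This is a minimal Python sketch of the bytea "escape" output format (a backslash plus three octal digits per byte, a doubled backslash for a literal backslash); the function name decode_bytea_escape is hypothetical, and in practice the database driver should do this decoding for you.

```python
def decode_bytea_escape(text: str) -> bytes:
    """Turn PostgreSQL escape-format bytea output back into raw bytes.

    A backslash followed by three octal digits encodes one byte, and a
    doubled backslash encodes a literal backslash; every other character
    stands for its own (printable ASCII) byte value.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if text[i + 1] == "\\":
                out.append(0x5C)  # escaped backslash
                i += 2
            else:
                out.append(int(text[i + 1:i + 4], 8))  # \nnn octal byte
                i += 4
        else:
            out.append(ord(text[i]))  # printable ASCII passes through as-is
            i += 1
    return bytes(out)


stored = r"\340\252\215\340\252\257\340\253\207\340\252\215"
print(decode_bytea_escape(stored).decode("utf-8"))
```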

Manual import into SQL Server 2000 of tab delimited text file does not format international characters

I have searched for this specific solution, and while I have found similar questions, I have not found one that solves my issue. I am manually importing a tab-delimited text file of data that contains international characters in some fields.
This is one such character: Exhibit Hall C–D
It's either an em dash or an en dash between the C and D. It copies and pastes fine, but when the data is imported into SQL Server 2000, it ends up looking like this:
Exhibit Hall Câ€“D
The field is nvarchar and, as I said, I am doing the import manually through Enterprise Manager. Any ideas on how to solve this?
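One likely explanation, assuming the text file was saved as UTF-8 and the importer read it with the Windows-1252 code page used by SQL_Latin1_General_CP1 collations: the three UTF-8 bytes of the en dash are read as three separate characters. The Python sketch below only reproduces that misreading and its reversal; it is not part of the import process itself.

```python
original = "Exhibit Hall C\u2013D"  # en dash, U+2013

# Bytes actually written to a UTF-8 encoded text file
utf8_bytes = original.encode("utf-8")

# What an importer sees if it assumes the Windows-1252 ("ANSI") code page
misread = utf8_bytes.decode("cp1252")
print(misread)

# Reversing the wrong step recovers the original text
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == original)
```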
The problem is that the encoding of the import file does not match what SQL Server expects. The following approach worked for me in SQL Server 2000, importing into a database with the default collation (SQL_Latin1_General_CP1_CI_AS):
1. Open the .csv/.tsv file with the free text editor Notepad++ and make sure that special characters look normal to start with (if not, try Encoding|Encode in...).
2. Select Encoding|Convert to UCS-2 Little Endian.
3. Save it as a new .csv/.tsv file.
4. In SQL Server Enterprise Manager, in the DTS Import/Export Wizard, choose the new file as the data source (source type: Text File).
5. If it is not detected automatically, choose File type: Unicode (in the preview on this page, the Unicode characters will still look like black blocks).
6. On the next page, Specify Column Delimiter, choose the correct delimiter. Once it is chosen, Unicode characters should appear correctly in the Preview pane.
7. Complete the import wizard.
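If there are many files, the Notepad++ conversion could be scripted instead. This is a minimal Python sketch, assuming the source file is UTF-8 (with or without a BOM); the function name convert_to_utf16le and the file names in the comment are hypothetical. For characters in the Basic Multilingual Plane, such as these dashes, UCS-2 and UTF-16 produce identical bytes, and the leading byte order mark is what lets the wizard recognise the file as Unicode.

```python
import codecs
from pathlib import Path


def convert_to_utf16le(src: Path, dst: Path, src_encoding: str = "utf-8-sig") -> None:
    """Re-encode a delimited text file as UTF-16 little-endian with a BOM.

    The bytes are decoded directly (not via read_text) so that CRLF line
    endings are preserved exactly; "utf-8-sig" also strips an existing BOM.
    """
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))


# Example (hypothetical paths):
# convert_to_utf16le(Path("sessions.tsv"), Path("sessions_unicode.tsv"))
```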
I would try using the bcp utility (http://technet.microsoft.com/en-us/library/ms162802(v=sql.90).aspx) with the -w parameter, which performs the bulk copy using Unicode characters.
You may also want to check the text encoding of the input file.

How to read Arabic characters from varchar datatype?

I have an old system that uses the varchar data type in its database to store Arabic names. The names now appear in the database like this:
"ãíÓÇÁ ÇáãÈíÖíä"
Now I am building a new system using VB.NET. How can I read these names so that they appear as Arabic characters?
I should also point out that, although the old system stores the data as shown above, it displays the characters correctly.
How can I display them properly in the new system and in SQL Server Management Studio?
Have you tried nvarchar? You may find some useful information at the link below:
When must we use NVARCHAR/NCHAR instead of VARCHAR/CHAR in SQL Server?
I faced the same problem, and I solved it in two steps:
1. Change the data type of the column in the DB to nvarchar.
2. Use an encoding conversion to turn the data back into Arabic.
I used the following function:
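using System.Text;

// Repairs Arabic text stored as Windows-1256 bytes but read back as Latin-1 characters
private string GetDataWithArabic(string srcData)
{
    // Latin-1 maps each character back to exactly one byte, recovering the raw bytes
    Encoding iso = Encoding.GetEncoding("iso-8859-1");
    // Name the Arabic code page explicitly; Encoding.Default only works
    // when the machine's ANSI code page happens to be 1256
    Encoding arabic = Encoding.GetEncoding("windows-1256");
    byte[] rawBytes = iso.GetBytes(srcData);
    return arabic.GetString(rawBytes);
}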
But make sure you apply this method only once to the DB data; applying it twice will corrupt the data.
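The same repair, and the reason a second pass is destructive, can be sketched in Python. This assumes (as the function above does once the code page is named) that the old system wrote Windows-1256 bytes; repair_arabic is a hypothetical name. After the first pass the text contains Arabic letters that have no Latin-1 byte, so a lenient encoder replaces each one with "?" and the original is gone.

```python
def repair_arabic(mojibake: str) -> str:
    """Undo a Windows-1256 -> Latin-1 misreading (assumed code pages)."""
    # Each Latin-1 character maps back to exactly one byte...
    raw = mojibake.encode("latin-1")
    # ...which is then decoded with the Arabic Windows code page.
    return raw.decode("cp1256")


stored = "ãíÓÇÁ ÇáãÈíÖíä"
fixed = repair_arabic(stored)
print(fixed)

# A second pass with a lenient encoder (errors="replace") turns every
# Arabic letter into "?", destroying the text.
twice = fixed.encode("latin-1", errors="replace").decode("cp1256")
print(twice)
```

With Python's strict default, calling repair_arabic(fixed) a second time raises UnicodeEncodeError instead of silently producing question marks, which is arguably the safer failure.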
I think your answer is here: "Storing and retrieving non-English characters" http://aalamrangi.wordpress.com/2012/05/13/storing-and-retrieving-non-english-unicode-characters-hindi-czech-arabic-etc-in-sql-server/

Automatic character encoding handling in Perl / DBI / DBD::ODBC

I'm using Perl with DBI / DBD::ODBC to retrieve data from an SQL Server database, and have some issues with character encoding.
The database has a default collation of SQL_Latin1_General_CP1_CI_AS, so data in varchar columns is encoded in Microsoft's version of Latin-1, AKA windows-1252.
There doesn't seem to be a way to handle this transparently in DBI/DBD::ODBC. I get data back still encoded as windows-1252; for instance, € “ ” are encoded as bytes 0x80, 0x93 and 0x94. When I write those to a UTF-8 encoded XML file without decoding them first, they are written as Unicode characters U+0080, U+0093 and U+0094 instead of U+20AC, U+201C and U+201D, which is obviously not correct.
My current workaround is to call $val = Encode::decode('windows-1252', $val) on every column after every fetch. This works, but hardly seems like the proper way to do this.
Isn't there a way to tell DBI or DBD::ODBC to do this conversion for me?
I'm using ActivePerl (5.12.2 Build 1202), with DBI (1.616) and DBD::ODBC (1.29) provided by ActivePerl and updated with ppm; running on the same server that hosts the database (SQL Server 2008 R2).
My connection string is:
dbi:ODBC:Driver={SQL Server Native Client 10.0};Server=localhost;Database=$DB_NAME;Trusted_Connection=yes;
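The symptom in the question comes from treating each byte as the code point with the same number (Latin-1 semantics), which is effectively what happens when undecoded bytes are written through a UTF-8 output layer. The Python sketch below is only an illustration of that difference, not of DBI itself.

```python
raw = b"\x80\x93\x94"  # bytes returned for the varchar value '€“”'

# Latin-1 maps each byte straight to the code point with the same number,
# which produces U+0080, U+0093 and U+0094.
wrong = raw.decode("latin-1")

# Decoding as Windows-1252 gives the intended characters.
right = raw.decode("cp1252")

print([f"U+{ord(c):04X}" for c in wrong])
print([f"U+{ord(c):04X}" for c in right])
```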
DBD::ODBC (and the ODBC API) does not know the character set of the underlying column, so DBD::ODBC cannot do anything with the 8-bit data returned; it can only return it as is, and you need to know what it is and decode it. If you bind the columns as SQL_WCHAR/SQL_WVARCHAR, the driver/SQL Server should translate the characters to UCS-2, and DBD::ODBC should see the columns as SQL_WCHAR/SQL_WVARCHAR. When DBD::ODBC is built in Unicode mode, SQL_WCHAR columns are treated as UCS-2, decoded and re-encoded in UTF-8, and Perl should see them as Unicode characters.
You need to set SQL_WCHAR as the bind type after bind_columns, because bind types are not sticky the way parameter types are.
If you want to keep reading your varchar data, which is Windows-1252, as bytes, then currently you have no choice but to decode it yourself. I'm not in a rush to add something to DBD::ODBC to do this for you, since this is the first time anyone has mentioned it to me. You might want to look at DBI callbacks, since decoding the returned data might be more easily done in those (say, on the fetch method).
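The design idea behind the callback suggestion is to decode in one place rather than after every fetch. As a language-neutral illustration only (this is Python, not the DBI callback API), a wrapper around fetched rows might look like this; decode_rows and the sample rows are hypothetical.

```python
from typing import Any, Iterable, Iterator, Sequence


def decode_rows(rows: Iterable[Sequence[Any]], encoding: str = "cp1252") -> Iterator[tuple]:
    """Yield each fetched row with every bytes column decoded once.

    Non-bytes values (numbers, None, already-decoded text) pass through
    unchanged, so the decoding lives in a single wrapper.
    """
    for row in rows:
        yield tuple(
            v.decode(encoding) if isinstance(v, (bytes, bytearray)) else v
            for v in row
        )


# Stand-in for rows returned by a database cursor (hypothetical data)
fetched = [(1, b"\x80 price", None), (2, b"\x93quoted\x94", 3.5)]
for row in decode_rows(fetched):
    print(row)
```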
You might also want to investigate the "Perform Translation for character data" setting in newer SQL Server ODBC Drivers although I have little experience with it myself.

Encoding issue: £ pound symbol appearing as <?> symbol

My database field is set to utf8_general_ci and my website's encoding is UTF-8.
The £ symbol is coming up as a black diamond with a question mark through the center.
I tried changing it to &pound; in the database, and it just output &pound;
I tried a string replace:
$row['Information'] = str_replace("£", "&pound;", $row['Information']);
Nothing seems to work, any ideas?
I tried changing it to &pound; in the database
Don't. The database should contain raw text, never HTML-encoded content. The time to HTML-encode (using htmlspecialchars()) is when you insert some raw text into HTML at the output templating stage, and not before. Even if you got this to work, you'd only have fixed one character; the other 107025 non-ASCII characters would still break.
Clearly there is a mismatch of encodings here; you must ensure you use the same encoding (preferably UTF-8) everywhere, in particular:
the encoding you've saved the PHP file in, if it contains any non-ASCII characters;
the charset declared on the output page (by Content-Type <meta> or header(), preferably both; if you only use a <meta> to set it and the server is incorrectly configured it may set its own charset overriding yours);
the encoding of the column in the database (each column has its own collation, so just setting it on the table is ineffective);
the encoding used by PHP to talk to MySQL. This should be set using mysql_set_charset.
Unfortunately, none of these settings default to UTF-8.
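Why a mismatch produces the black diamond specifically: a common case, assumed here, is a connection that sends Latin-1 (MySQL's old default) to a page declared as UTF-8. The Python sketch below only shows the bytes involved.

```python
pound = "\u00a3"  # £

utf8 = pound.encode("utf-8")      # b'\xc2\xa3': what a UTF-8 page expects
latin1 = pound.encode("latin-1")  # b'\xa3': what a Latin-1 connection sends

# A browser decoding the page as UTF-8 cannot make sense of the lone 0xA3
# byte and shows U+FFFD, the "black diamond with a question mark".
shown = latin1.decode("utf-8", errors="replace")
print(utf8, latin1, shown == "\ufffd")
```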
Before communicating with your database, you need to send the query:
SET NAMES 'utf8'
It tells the database to use utf8 encoding for all queries on this connection.
