Google Cloud SQL Second Generation utf8mb4 encoding - google-app-engine

We are using Google Cloud Sql Second Generation with our AppEngine application.
However, today we discovered a problem: we cannot insert emoji characters into our database because we cannot change some server flags to the utf8mb4 character encoding.
We've changed the character_set_server flag to utf8mb4, but it wasn't enough.
We have to change the
character_set_system
character_set_client
collation_connection
flags to utf8mb4 as well, but the Second Generation instance does not allow the root user to change those flags.
What can we do to solve this problem?
Does anyone have any idea about that?
Thanks

You have to set character_set_server to utf8mb4, change the columns you need to utf8mb4, and create a new Cloud SQL 2nd gen instance with the new flag (!!). Basically, setting the flag on an existing instance and just restarting (tested with 5.7) is not enough (is this a bug? I did not find it in the docs). Any encoding-related connection parameters are not needed and should be removed. The collation will be the default collation for utf8mb4, which is perfect for me (and probably most cases), even without setting anything.

We had the exact same problem. Setting character_set_server to utf8mb4 wasn't enough. We could insert emojis through MySQL Workbench, but not through our application.
In our case, this problem went away after we started a new instance running MySQL 5.7 instead of 5.6. So my hypothesis is that in 5.7, but not in 5.6, changing the character_set_server flag lets Google Cloud SQL change those other flags you mention, or some other relevant setting.
Of course if you are already running 5.7, this does not apply to you.

Run SHOW CREATE TABLE -- it will probably show that the column(s) are CHARACTER SET utf8. That needs to be fixed with:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

For me, going to the App Engine Console -> SQL, editing character_set_server to utf8mb4, and restarting the DB did work!

I have an old Java project with a Second Generation database, and emoji were working fine without anything extra in the connection string. Just two things:
set the character_set_server flag to utf8mb4,
and create the database with utf8mb4.
(Skip to Finally if you don't want to read it all.) Now I have this problem in Python and nothing is working. I have to solve it, so I will write down what I have found.
I have tried the following (none of it worked; it's just what I tried):
1 Removed the flag, restarted the instance, added the flag again, and restarted again.
2 Set ?charset=utf8 in the connection string; the library returned the error: Invalid utf8 character string: 'F09F98'
3 Set ?charset=utf8mb4; the library wrote the value to the database, but instead of the emoji there was ???. So if the library recognizes utf8mb4 and writes it, the problem is not in the connection from the library but in the database.
4 Ran:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'utf8mb4' -> this is set from the Google Console
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'
UPDATE comment set body="😎" where id=1;
Invalid utf8 character string: '\xF0\x9F\x98\x8E' 0,045 sec
SET NAMES utf8mb4;
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8mb4'
'character_set_connection', 'utf8mb4'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8mb4'
'character_set_server', 'utf8mb4'
'character_set_system', 'utf8'
'collation_connection', 'utf8mb4_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'
UPDATE comment set body="😎" where id=1;
SUCCESS
So the problem is in one of those flags.
5 I closed the current connection and reopened my client so that these variables were set back to utf8. First I changed character_set_results and character_set_client so that I could see the correct result in my client (MySQL Workbench). I ran the update statement again without success: still ??? in the field. After changing character_set_connection to utf8mb4 and updating the field again, this time there was an emoji in the table. But why character_set_connection? As the tests above show, the connection from the library is already utf8mb4. So at this point I don't understand where to set my connection charset to utf8mb4 so that things start to work.
6 I tried creating a new Cloud SQL instance with the charset flag, creating the database with utf8mb4, and the table with utf8mb4 (although tables are created with the default database charset anyway), and the insert statement still didn't work. So the only thing I could think of was that charset=utf8mb4 is not working in the connection string. But it wasn't that: I tried removing the charset from the connection string and got the same error as before, when using only the utf8 charset in the connection string.
So what is left, I don't know.
7 I tried using an instance with HDD instead of SSD.
8 Tried connecting via the Google Cloud Shell and inserting a row via that console.
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'body' at row 1
Interestingly, the Cloud Shell even shows in SHOW CREATE TABLE that the default encoding for this table is utf8mb4. So the Cloud Shell, just like MySQL Workbench, is connecting with utf8 as the default.
Finally
Things worked by using db.session.execute("SET NAMES 'utf8mb4'") before inserting into the database (in Python), and using ?charset=utf8mb4 only locally. The real problem when testing something like this can be the method you use to check the result in the database. MySQL Workbench was always connecting with the utf8 encoding by default (you can check this with the SHOW ... command above). So the first thing to do is to switch the connection in MySQL Workbench (or your client) using SET NAMES 'utf8mb4'. The tests above show that the Google Cloud Shell was connected with utf8 by default as well. I searched the internet and found that they cannot use utf8mb4 as the default because they are waiting for utf8mb4 to become the new standard connection charset in MySQL, at which point it would simply be named 'utf8'. Also, there is no way to make MySQL Workbench run with utf8mb4 automatically after connecting; you have to do this yourself.
Whether the problem can also occur when reading from the database, I'm about to test now.
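Here is a minimal sketch of the approach described under Finally, assuming Flask-SQLAlchemy with a PyMySQL driver; the connection URL, the comment table and the column names are illustrative, not from the original post.

# Sketch only: assumes Flask-SQLAlchemy and PyMySQL; names are placeholders.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import text

app = Flask(__name__)
# Locally, forcing the charset in the URL also helped; in production the
# plain Cloud SQL connection string was used without the charset parameter.
app.config["SQLALCHEMY_DATABASE_URI"] = (
    "mysql+pymysql://user:password@127.0.0.1/mydb?charset=utf8mb4"
)
db = SQLAlchemy(app)

def update_comment_body(comment_id, body):
    # The workaround from the answer above: force the session's client,
    # connection and results character sets to utf8mb4 before writing emoji.
    db.session.execute(text("SET NAMES 'utf8mb4'"))
    db.session.execute(
        text("UPDATE comment SET body = :body WHERE id = :id"),
        {"body": body, "id": comment_id},
    )
    db.session.commit()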

Add this to the database options in settings.py (Django):
'OPTIONS': {'charset': 'utf8mb4'}
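For context, here is a hedged sketch of where that option sits in a full DATABASES entry; the engine, database name, credentials and Cloud SQL socket path are placeholders, not from the original answer.

# settings.py -- sketch only; host, names and credentials are placeholders.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': 'mypassword',
        'HOST': '/cloudsql/my-project:my-region:my-instance',  # Cloud SQL socket (placeholder)
        'OPTIONS': {'charset': 'utf8mb4'},  # make the MySQL client connect with utf8mb4
    }
}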
Many thanks to: Unable to use utf8mb4 character set with CloudSQL on AppEngine Python

Related

IBM DB2 values displayed as utf-8 text

Once I connect to the database (DB2) to check the values in the tables, if they contain special chars I see their raw UTF-8 text value instead.
I expected to see the correct: Tükörfúrógép.
I am still able to handle the value properly, but is there any configuration in the db that I am missing to display the value properly when checking the table?
More Info:
Connected to the DB with IntelliJ and also tried with DbVisualizer.
The following JDBC connection string was used in IntelliJ:
jdbc:db2://(...)?characterEncoding=UTF-8;
Tried both with the characterEncoding and without, getting the same results.
DB Version: v11 LUW
JDBC: com.ibm.db2.jcc -- db2jcc4 -- Version 10.5
Encoding being used: UTF-8
db2 "select char(value,10), char(name,10) from sysibmadm.dbcfg where
name like 'code%'"
1 2
---------- ---------- 1208 codepage UTF-8 codeset
2 record(s) selected.
UPDATE 1:
I was able to directly insert values with special chars into the database, so I'm starting to think this is not a missing DB2 configuration but maybe a JDBC or other related issue.
You should have the following hex string representation for the string Tükörfúrógép in a UTF-8 database:
54C3BC6BC3B67266C3BA72C3B367C3A970
But you have the following instead, with repeating garbage bytes:
54C383C2BC6BC383C2B67266C383C2BA72C383C2B367C383C2A970
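A plausible way such doubled bytes arise is UTF-8 data being decoded as Latin-1 somewhere in the chain and then re-encoded as UTF-8 on insert. This small Python sketch (purely illustrative, not part of the original answer) reproduces exactly the garbage sequence above:

# Illustrative sketch: UTF-8 bytes wrongly decoded as Latin-1, then re-encoded.
original = "Tükörfúrógép"

correct = original.encode("utf-8").hex().upper()
# 54C3BC6BC3B67266C3BA72C3B367C3A970  (the expected representation)

mojibake = (
    original.encode("utf-8")   # proper UTF-8 bytes
    .decode("latin-1")         # misinterpreted as Latin-1 somewhere in the pipeline
    .encode("utf-8")           # re-encoded as UTF-8 on insert
    .hex()
    .upper()
)
# 54C383C2BC6BC383C2B67266C383C2BA72C383C2B367C383C2A970  (the garbage above)

print(correct)
print(mojibake)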
You may try to manually remove such a byte sequence with the following statements, but it's better to understand the root cause of how this garbage got into the column.
VALUES REPLACE (x'54C383C2BC6BC383C2B67266C383C2BA72C383C2B367C383C2A970', x'83C2', '');
SELECT REPLACE (TOWN, x'83C2', '') FROM ...;

ColdFusion 9.01 -> Lucee 5.3.3.62 and <cfinsert> / <cfupdate>

I’ve inherited a big application which is running on CF 9.01.
I’m in the process of porting it to Lucee 5.3.3.62, but have some problems with <cfinsert> and <cfupdate>.
I know that I should replace them with <cfquery>, but this application has ~1000 source files (!!), and replacing all those tags is currently not feasible for timing reasons.
Lucee is throwing errors like:
“An object or column name is missing or empty. For SELECT INTO statements, verify each column has a name. For other statements, look for empty alias names. Aliases defined as "" or [] are not allowed. Change the alias to a valid name.”
At first, I thought there were problems with date fields, because Lucee handles them differently than CF 9.01, but this is not the case.
So, I created a test table (on MS-SQL Server 2008R2):
CREATE TABLE [dbo].[LuceeTest01](
[Field1] [nvarchar](50) NULL,
[Field2] [nvarchar](50) NULL ) ON [PRIMARY]
In Lucee, I’m using as the datasource: Microsoft SQL Server (Vendor Microsoft), called “one”.
This is my test application:
<cfset Form.Field1 = "Field1">
<cfset Form.Field2 = "Field2">
<cfoutput>
<cfinsert datasource="one"
tablename="LuceeTest01"
formfields="Field1, Field2">
</cfoutput>
When I run this, I get the same error. Any idea why?
Full trace here: https://justpaste.it/6k0hw
Thanks!
EDIT1:
Curious. I tried using “jTDS Type 4 JDBC Driver for MS SQL Server and Sybase” as datasource driver, and now the error is:
The database name component of the object qualifier must be the name
of the current database.
This traces back to this statement:
{call []..sp_columns 'LuceeTest01', '', '', 'null', 3}
When I try this in the Microsoft SQL Server Management Studio, I get the same error.
However, when I specify the database name (‘one’ as third argument), no error in MS SQL SMS.
EXEC sp_columns 'LuceeTest01', '', 'one', 'null', 3
Shouldn’t Lucee take this argument from the datasource configuration or something?
EDIT2:
As suggested by #Redtopia, when "tableowner" and "tablequalifier" are specified, it works for the jTDS driver. Will use this as a workaround.
Updated sample code:
<cfset Form.Field1 = "Field1">
<cfset Form.Field2 = "Field2">
<cfinsert datasource="onecfc"
tableowner="dbo"
tablename="LuceeTest01"
tablequalifier="one"
formfields="Field1,Field2">
EDIT3:
Bug filed here: https://luceeserver.atlassian.net/browse/LDEV-2566
I personally would refactor CFINSERT into queryExecute and write a plain INSERT INTO SQL statement. I wish we would completely remove support for cfinsert.
Consider using
<cfscript>
Form.Field1 = "Field1";
Form.Field2 = "Field2";
// Don't forget to setup datasource in application.cfc
QueryExecute("
INSERT INTO LuceeTest01 (Field1, Field2)
VALUES (?, ?)
",
[form.field1, form.field2]
);
</cfscript>
I am 99% confident that this is a Lucee / JDK / JDBC Driver bug and not a fault in your config.
Source:
I initially suspected some low-hanging fruit such as your leading whitespace in ' Field2'. Then I saw your comment showing that you had tried with that trimmed and your Edit1 with the different error when using a different DB Driver. So I set to work trying to reproduce your issue.
On Lucee 5.2.4.37 and MS SQL Server 2016, armed with your sample code and two new datasources - one each for the jTDS (MS SQL and Sybase) driver and the Microsoft SQL Server (JDBC4 - Vendor Microsoft) driver - I was unable to reproduce either issue with either driver. Even when selectively taking away various DB permissions and changing the default DB for the SQL user, I was still only able to force different (expected) errors, not your error.
As soon as I applied the admin update to Lucee 5.3.3.62 and re-ran the tests, boom, I hit both of your errors with the respective datasources, with no other change in DB permissions, datasource config or sample code.
Good luck convincing the Lucee guys that this anecdotal evidence is proof of a bug, but give me a shout if you need an extra voice. Whilst I don't use cfinsert/cfupdate in my own code, I have in the recent past been in the position of supporting a legacy CF application of similar sounding size and nature and empathise with the logistical challenges surrounding refactoring or modernising it!
Edit:
I tried the tablequalifier suggestion from #Redtopia in a comment above. Adding just the tablequalifier attribute did not work for me with either DB driver.
Using both tablequalifier="dbname" and tableowner="dbo" still didn't work for me with the MS SQL Server driver, but it does seem to work for the jTDS driver, so it's a possible workaround, though it means changing every occurrence of the tag. Ideally the Lucee guys will be able to fix the bug from their end, or identify which Java update broke it if Lucee itself didn't.

saved as utf-8; still displays?

I've saved all my .php files as UTF-8, and everything works fine. Whenever I enter certain characters, such as å and ë, it works just fine. However, when the data is retrieved from the database through a mysqli query and then put into the $page variable, which is eventually echoed at the end of the document, it displays a ? where there should be a ë. When I make an empty page with only the db connect and the query, and echo nothing but the table from the database that contains this character, it works just fine.
My question is: how can I make sure it shows the ë and å and such, and not a ??
Your database is probably configured "wrong". When the fields holding your to-be-echoed text are not configured with the proper charset (utf-8), the data you retrieve via PHP isn't either.
So configure your database properly and re-insert the data after that.
Also see How to make MySQL handle UTF-8 properly.

cakephp encoding from database

I have a problem with encoding characters from the database. I am using Postgres with win1250 encoding, but whatever I put in core.php (right now I have this line of code):
Configure::write('App.encoding', 'iso-8859-1');
sometimes it gives me strange letters from the database, for example È instead of Č. Is there anything I can do to get the correct encoding?
NOTE: I can't edit or change anything in the database.
I think all you need to do is declare the right encoding option in your database connection configuration, as described at http://book.cakephp.org/2.0/en/development/configuration.html#database-configuration (scroll a bit).
Look at this particular paragraph:
encoding
Indicates the character set to use when sending SQL statements to the server. This defaults to the database’s default encoding for all databases other than DB2. If you wish to use UTF-8 encoding with mysql/mysqli connections you must use ‘utf8’ without the hyphen.
I had the same issue (with French and Spanish names) in a previous project and I only had to add the following to my $default connection, in the app/Config/database.php configuration file:
'encoding' => 'utf8'
Maybe you need the utf8 connection, or the iso-8859-1 you mentioned.
win1250 encoding is similar to iso-8859-2 (see http://en.wikipedia.org/wiki/Windows-1250), so you might want to try that instead of iso-8859-1.

Automatic character encoding handling in Perl / DBI / DBD::ODBC

I'm using Perl with DBI / DBD::ODBC to retrieve data from an SQL Server database, and have some issues with character encoding.
The database has a default collation of SQL_Latin1_General_CP1_CI_AS, so data in varchar columns is encoded in Microsoft's version of Latin-1, AKA windows-1252.
There doesn't seem to be a way to handle this transparently in DBI/DBD::ODBC. I get data back still encoded as windows-1252; for instance, €, “ and ” come back as bytes 0x80, 0x93 and 0x94. When I write those to a UTF-8 encoded XML file without decoding them first, they are written as Unicode characters 0x80, 0x93 and 0x94 instead of 0x20AC, 0x201C, 0x201D, which is obviously not correct.
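To make the byte-to-code-point mapping concrete, here is a small illustrative sketch (in Python rather than Perl, purely to show the encodings involved; it is not part of the original question):

# Illustrative only: how the same three bytes map to code points under
# Windows-1252 versus a naive byte-to-code-point read.
raw = bytes([0x80, 0x93, 0x94])           # the bytes stored in the varchar columns

decoded = raw.decode("windows-1252")
print([hex(ord(c)) for c in decoded])     # ['0x20ac', '0x201c', '0x201d'] -- € “ ”, correct

naive = raw.decode("latin-1")             # equivalent to writing the bytes out undecoded
print([hex(ord(c)) for c in naive])       # ['0x80', '0x93', '0x94'] -- the wrong characters in the XML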
My current workaround is to call $val = Encode::decode('windows-1252', $val) on every column after every fetch. This works, but hardly seems like the proper way to do this.
Isn't there a way to tell DBI or DBD::ODBC to do this conversion for me?
I'm using ActivePerl (5.12.2 Build 1202), with DBI (1.616) and DBD::ODBC (1.29) provided by ActivePerl and updated with ppm; running on the same server that hosts the database (SQL Server 2008 R2).
My connection string is:
dbi:ODBC:Driver={SQL Server Native Client 10.0};Server=localhost;Database=$DB_NAME;Trusted_Connection=yes;
Thanks in advance.
DBD::ODBC (and the ODBC API) does not know the character set of the underlying column, so DBD::ODBC cannot do anything with 8-bit data returned; it can only return it as it is, and you need to know what it is and decode it. If you bind the columns as SQL_WCHAR/SQL_WVARCHAR, the driver/SQL Server should translate the characters to UCS2, and DBD::ODBC should see the columns as SQL_WCHAR/SQL_WVARCHAR. When DBD::ODBC is built in Unicode mode, SQL_WCHAR columns are treated as UCS2, decoded and re-encoded in UTF-8, and Perl should see them as Unicode characters.
You need to set SQL_WCHAR as the bind type after bind_columns, as bind types are not sticky like parameter types.
If you want to continue reading your varchar data, which is windows-1252, as bytes, then currently you have no choice but to decode it. I'm not in a rush to add something to DBD::ODBC to do this for you, since this is the first time anyone has mentioned it to me. You might want to look at DBI callbacks, as decoding the returned data might be more easily done there (say, in the fetch method).
You might also want to investigate the "Perform Translation for character data" setting in newer SQL Server ODBC Drivers although I have little experience with it myself.
