IBM DB2 values displayed as UTF-8 text

When I connect to the database (DB2) and check the values in the tables, columns containing special characters are displayed garbled, showing something like Tükörfúrógép.
I expected to see the correct value instead: Tükörfúrógép.
I am still able to handle the value properly in my application, but is there any configuration in the DB that I am missing to display the value properly when checking the table?
More Info:
Connected to the DB with IntelliJ and also tried with DbVisualizer.
The following JDBC connection string was used in IntelliJ:
jdbc:db2://(...)?characterEncoding=UTF-8;
Tried both with and without characterEncoding, getting the same results.
DB Version: v11 LUW
JDBC: com.ibm.db2.jcc -- db2jcc4 -- Version 10.5
Encoding being used: UTF-8
db2 "select char(value,10), char(name,10) from sysibmadm.dbcfg where
name like 'code%'"
1 2
---------- ---------- 1208 codepage UTF-8 codeset
2 record(s) selected.
UPDATE 1:
I was able to insert values with special characters directly into the database, so I am starting to think this is not a missing DB2 configuration but perhaps a JDBC or other client-side issue.

You should have the following HEX string representation for the string Tükörfúrógép in a UTF-8 database:
54C3BC6BC3B67266C3BA72C3B367C3A970.
But you have the following instead with repeating garbage symbols:
54C383C2BC6BC383C2B67266C383C2BA72C383C2B367C383C2A970
You may try to manually remove this byte sequence with the following statements, but it is better to understand the root cause of the garbage appearing in this column.
VALUES REPLACE (x'54C383C2BC6BC383C2B67266C383C2BA72C383C2B367C383C2A970', x'83C2', '');
SELECT REPLACE (TOWN, x'83C2', '') FROM ...;
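That byte pattern is the classic sign of double-encoded UTF-8: the UTF-8 bytes of the text were read as Latin-1 somewhere between the client and the database and then encoded to UTF-8 a second time. A minimal Python sketch (using only the example value from the question) that reproduces both hex strings and undoes the damage:

# Encoded once (correct): 54C3BC6B...
print("Tükörfúrógép".encode("utf-8").hex().upper())

# UTF-8 bytes mis-read as Latin-1 and encoded again: 54C383C2BC...
print("Tükörfúrógép".encode("utf-8").decode("latin-1").encode("utf-8").hex().upper())

# Undoing the double encoding in client code (the counterpart of the
# x'83C2' REPLACE trick above, for these particular characters)
broken = bytes.fromhex("54C383C2BC6BC383C2B67266C383C2BA72C383C2B367C383C2A970")
print(broken.decode("utf-8").encode("latin-1").decode("utf-8"))  # Tükörfúrógép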

Related

How to insert XML into SQL Server when it contains escaped invalid characters

I'm trying to insert some XML into a SQL Server database table which uses column type XML.
This works fine most of the time, but one user submitted some XML with the character with hex value 3, and SQL Server gave the error "hexadecimal value 0x03, is an invalid character."
Now I want to check, and remove, any invalid XML characters before doing the insert, and there are various articles suggesting how invalid XML characters can be replaced using regex or something similar.
However, the problem for me is that the user submitted the XML document with the invalid character escaped, i.e. as "&#x3;", and none of the methods I've found will detect this. This is also why the error was not detected earlier: it's only when inserting it into the SQL database that the problem occurs.
Has anyone written a function that will check for all escaped invalid XML characters? I suppose the character above could have been written as "&#x03;" or "&#3;", or lots of other ways, so it's quite hard to catch them all.
Thanks in advance for any help you can offer.
You could try importing the XML into a temporary varchar(max) variable or table column, use REPLACE to strip out the offending characters, and then insert the cleansed string into the destination, casting it to XML.
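Since the character arrives already escaped, another option is to scan the text for numeric character references that point at code points XML 1.0 forbids, before the insert. A minimal Python sketch (function name and the control-character-only coverage are assumptions, not the original poster's code):

import re

# Matches numeric character references such as &#3; or &#x03;
_CHAR_REF = re.compile(r"&#(?:x(?P<hex>[0-9A-Fa-f]+)|(?P<dec>[0-9]+));")

def strip_invalid_xml_refs(text):
    """Remove references to code points that are not allowed in XML 1.0."""
    def repl(match):
        code = int(match.group("hex"), 16) if match.group("hex") else int(match.group("dec"))
        # XML 1.0 allows tab, LF, CR and anything from 0x20 upwards
        # (ranges above 0xFFFD are not policed by this sketch).
        if code < 0x20 and code not in (0x09, 0x0A, 0x0D):
            return ""
        return match.group(0)
    return _CHAR_REF.sub(repl, text)

print(strip_invalid_xml_refs("<name>abc&#x3;def</name>"))  # -> <name>abcdef</name>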

Pentaho insert operation error due to a collation

I have a Pentaho transformation which executes a procedure from a database and then insert the rows into another database.
The source database is a SQL Server 2008 R2 database, not UTF-8 charset, Latin_general_ci, and the destination database is PostgreSQL with UTF-8 charset.
If I execute the ETL it throws an error when it attempts to insert the following statement:
INSERT INTO aux (name, account, id, state) VALUES ( 'UAN 5 BAR ','01082082R','UY903847JDNF','BAJA')
As you can see, the name value field contains some unknown characters. Pentaho shows them as square symbols, exactly 6 of them. If I copy the insert statement from the log, the row breaks at this point, so I understand it is a line break or something like that.
I have solved this for other rows, where the hidden characters were different, but in this case I cannot. Furthermore, I would like to find a solution that covers all possible charset problems.
Anyone knows how to solve that?
For the other hidden characters I solved it by applying cast(name as binary), but that does not work for this case.
EDIT
The value 'UAN 5 BAR ' has unknown characters just after the word BAR, up to the closing quote character '.
By unknown I mean strange characters that I cannot identify.
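To find out what those hidden characters actually are, it helps to look at the value byte by byte rather than through a GUI. A minimal Python sketch (the value below is a placeholder; in practice the row would be read from the source database):

import unicodedata

value = "UAN 5 BAR \x00\x00\r\n\x00\x00"  # placeholder value with hidden control characters

# Show the code point and Unicode category of every character
for ch in value:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {ch!r}")

# Drop everything in the control/format categories before inserting into PostgreSQL
cleaned = "".join(ch for ch in value if not unicodedata.category(ch).startswith("C"))
print(repr(cleaned))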

Google Cloud Sql Second Generation Utf8Mb4 Encoding

We are using Google Cloud Sql Second Generation with our AppEngine application.
However, today we discovered a problem: we cannot insert emoji characters into our database because we cannot change some server flags to the utf8mb4 character encoding.
We've changed the
character_set_server flag to utf8mb4, but it wasn't enough.
We also have to change the
character_set_system
character_set_client
collation_connection
flags to utf8mb4, but the second generation DB does not allow the root user to change those flags.
What can we do to solve this problem?
Does anyone have any idea about that?
Thanks
You have to set character_set_server to utf8mb4, change the columns you need to utf8mb4 and create a new Cloud SQL 2nd gen instance with the new flag (!!). Basically, setting the flag on an existing instance and just restarting (tested with 5.7) will not be enough (is this a bug? I did not find it in the docs). Any encoding related connection parameters are not needed and should be removed. The collation will be the standard collation for utf8mb4 which is perfect for me (and probably most cases), even without setting anything.
We had the exact same problem. Setting character_set_server to utf8mb4 wasn't enough. We could insert emojis through MySQL Workbench, but not through our application.
In our case, this problem went away after we started a new instance running MySQL 5.7 instead of 5.6. So my hypothesis is that in 5.7, but not in 5.6, changing the character_set_server flag lets Google Cloud SQL change those other flags you mention, or some other relevant setting.
Of course if you are already running 5.7, this does not apply to you.
SHOW CREATE TABLE -- that will probably say that the column(s) are CHARACTER SET utf8. That needs to be fixed with
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
For me, going to the AppEngine Console -> SQL, editing character_set_server to utf8mb4 and restarting the DB did work!
I have an old Java project with a second generation database and emoji were working fine, without anything else in the connection string. Just two things:
set the character_set_server flag to utf8mb4,
and create the database with utf8mb4.
(Skip to Finally if you don't want to read it all.) Now I have this problem in Python and nothing is working. I have to solve this, so I will write down what I have found.
I have tried the following (none of it worked; it is just what I have tried):
1 Removed the flag, restarted the instance, added the flag again, restarted again.
2 I have set ?charset=utf8 in the connection string and the library returned error: Invalid utf8 character string: 'F09F98'
3 I have set ?charset=utf8mb4 and the library wrote the value to the database, but instead of the emoji there was ???. So if the library recognizes utf8mb4 and writes it, then the problem is not in the connection from the library, but in the database.
4 I have run
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'utf8mb4' -> this is set from the Google Console
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'
UPDATE comment set body="😎" where id=1;
Invalid utf8 character string: '\xF0\x9F\x98\x8E' 0,045 sec
SET NAMES utf8mb4;
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8mb4'
'character_set_connection', 'utf8mb4'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8mb4'
'character_set_server', 'utf8mb4'
'character_set_system', 'utf8'
'collation_connection', 'utf8mb4_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'
UPDATE comment set body="😎" where id=1;
SUCCESS
So the problem is in one of those flags.
5 I closed the current connection and reopened my client so that these variables were set back to utf8. First I changed character_set_results and character_set_client so that I could see the correct result in my client (MySQL Workbench). I ran the update statement again without success, still getting ??? in the field. After changing character_set_connection to utf8mb4 and updating the field again, this time I had the emoji in the table. But why character_set_connection? As the tests above show, the connection from the library is already utf8mb4. So at this point I don't understand where to set my connection charset to utf8mb4 so that things start to work.
6 I have tried to create a new Cloud SQL instance with the charset flag, created the database with utf8mb4, and created the table with utf8mb4 (although tables are created with the default database charset anyway), and the insert statement didn't work again. So the only thing I could think of was that charset=utf8mb4 was not working in the connection string. But it wasn't that: I tried removing the charset from the connection string and got the same error as before, when using only the utf8 charset in the connection string.
So what is left, I don't know.
7 I have tried to use an instance with HDD, not SSD.
8 Tried to connect via the Google Cloud Shell and to insert a row via their console.
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'body' at row 1
Interestingly, the Cloud Shell shows, even in SHOW CREATE TABLE, that the default encoding for this table is utf8mb4. So the Cloud Shell, just like MySQL Workbench, is connecting with utf8 as the default.
Finally
Things worked after using db.session.execute("SET NAMES 'utf8mb4'") before inserting into the database (in Python), and using ?charset=utf8mb4 only locally. The real problem when testing something like this can be the method you use to check the result in the database. MySQL Workbench always connects with the utf8 encoding as default (you can check this using the SHOW command above). So the first thing to do is to switch the connection in MySQL Workbench (or your client) using SET NAMES 'utf8mb4'. The tests above show that the Google Cloud Shell connects with utf8 by default as well. I searched the internet and found that they cannot use utf8mb4 as the default because they are waiting for utf8mb4 to become the new standard connection charset in MySQL, at which point it would be named 'utf8'. Also, there is no way to make MySQL Workbench use utf8mb4 automatically after connecting; you have to do this yourself.
Whether the problem can also occur when reading from the database, I am about to test now.
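For reference, with a driver that lets you set the connection charset directly, the SET NAMES step is handled on connect. A minimal pymysql sketch (host, credentials, database and table are placeholders):

import pymysql

conn = pymysql.connect(
    host="127.0.0.1",        # e.g. the Cloud SQL proxy endpoint
    user="appuser",
    password="secret",
    database="appdb",
    charset="utf8mb4",       # the driver sets the connection charset to utf8mb4
)
with conn.cursor() as cur:
    cur.execute("UPDATE comment SET body = %s WHERE id = %s", ("😎", 1))
conn.commit()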
Add this in settings.py:
'OPTIONS': {'charset': 'utf8mb4'}
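In a Django project that option goes under the database entry in DATABASES; a minimal sketch, with engine, name and credentials as placeholders:

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'appdb',
        'USER': 'appuser',
        'PASSWORD': 'secret',
        'HOST': '127.0.0.1',
        'OPTIONS': {'charset': 'utf8mb4'},
    }
}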
Many thanks to: Unable to use utf8mb4 character set with CloudSQL on AppEngine Python

SSIS - How to convert real values for Oracle?

I'm facing a problem in a package that imports some data from a MySQL table to an Oracle table and an MS SQL Server table. It works well from MySQL to SQL Server; however, I get an error when I want to import to Oracle.
The table I want to import contains an attribute (unitPrice) of data type DT_R8.
The destination data type for Oracle is DT_NUMERIC, as you can see in the capture.
I added a conversion step to convert the unitPrice data from DT_R8 to DT_NUMERIC.
It doesn't work, I get the following error.
I found the detail of the error :
An ORA-01722 ("invalid number") error occurs when an attempt is made to convert a character string into a number, and the string cannot be converted into a valid number. Valid numbers contain the digits '0' through '9', with possibly one decimal point, a sign (+ or -) at the beginning or end of the string, or an 'E' or 'e' (if it is a floating point number in scientific notation). All other characters are forbidden.
However, I don't know how to fix.
EDIT : I added a component to redirect rows/errors to an Excel file.
The following screenshot shows the result of the process, including errors:
Browsing the 3000 rows recorded, it seems the process accepts only integer values, not real ones. So if the price is equal to 10 it's OK, but if it's 10,5 it fails.
Any idea to solve this issue ?
Your NLS environment does not match the expected one. By default, Oracle assumes that "," is the grouping character and "." is the decimal separator. Make sure that your session uses the correct value for the NLS_NUMERIC_CHARACTERS parameter.
See Setting Up a Globalization Support Environment in the documentation.
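For reference, the session setting can be inspected and overridden like this; a minimal python-oracledb sketch (connection details are placeholders, and the SSIS package would need the equivalent on its own Oracle connection):

import oracledb  # assumption: the python-oracledb driver is available

conn = oracledb.connect(user="etl_user", password="secret", dsn="dbhost/orclpdb1")
cur = conn.cursor()

# What the session currently uses as decimal and group separators
cur.execute(
    "SELECT value FROM nls_session_parameters "
    "WHERE parameter = 'NLS_NUMERIC_CHARACTERS'"
)
print(cur.fetchone())

# Force '.' as the decimal separator and ',' as the group separator, so '10.5' parses
cur.execute("ALTER SESSION SET NLS_NUMERIC_CHARACTERS = '.,'")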

PostgreSQL: unable to save special character (regional language) in blob

I am using PostgreSQL 9.0 and am trying to store a bytea file which contains certain special characters (regional language characters, UTF-8 encoded). But I am not able to store the data as entered by the user.
For example :
what I get in request while debugging:
<sp_first_name_gu name="sp_first_name_gu" value="ઍયેઍ"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
This is what is stored in DB:
<sp_first_name_gu name="sp_first_name_gu" value="\340\252\215\340\252\257\340\253\207\340\252\215"></sp_first_name_gu><sp_first_name name="sp_first_name" value="aaa"></sp_first_name>
Note the difference in value tag. With this issue I am not able to retrieve the proper text input by the user.
Please suggest what I need to do.
PS: My DB is UTF8 encoded.
The value is stored correctly, but is escaped into octal escape sequences upon retrieval.
To fix that, change the settings of the DB driver or choose a different encoding/escaping for bytea.
Or just use proper field types for the XML data, like varchar or XML.
Your string \340\252\215\340\252\257\340\253\207\340\252\215 is exactly ઍયેઍ in octal encoding, so Postgres stores your data correctly. PostgreSQL escapes all non-printable characters; for more details see the PostgreSQL documentation, especially section 8.4.2.
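To confirm that the stored bytes really are the original text, you can decode the octal-escaped output yourself; a minimal Python sketch using the exact value from the question:

# The octal escapes in the bytea output are just the UTF-8 bytes of the text
stored = b"\340\252\215\340\252\257\340\253\207\340\252\215"
print(stored.decode("utf-8"))  # -> ઍયેઍ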
