Saved as UTF-8; still displays "?" - database

I've saved all my .php files as UTF-8, and everything works fine: whenever I enter certain characters, such as å and ë, they display correctly. However, when the data is retrieved from the database through a mysqli query and put into the $page variable, which is eventually echoed at the end of the document, it displays a ? where there should be a ë. When I make an empty page with only the DB connect and the query, and echo nothing but the table from the database that contains this character, it works just fine.
My question is: how can I make sure it shows the ë and å and such, instead of a ?

Your database is probably configured "wrong". If the columns holding your to-be-echoed text are not configured with the proper character set (UTF-8), the data you retrieve via PHP won't be either.
So, configure your database properly and insert the data there after that.
Also see How to make MySQL handle UTF-8 properly.
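A minimal SQL sketch of what "configure properly" usually means in practice. The pages table and content column are made-up names; plain utf8 also covers ë and å, while utf8mb4 covers all of Unicode:
-- Check how the table and its text columns are currently stored (hypothetical table name):
SHOW CREATE TABLE pages;
-- Convert the table and its text columns to UTF-8:
ALTER TABLE pages CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- And make sure the connection itself uses UTF-8 before reading or writing
-- (with mysqli this is what $mysqli->set_charset('utf8mb4') does):
SET NAMES utf8mb4;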

Related

"?" character in MSSQL DB getting replaced with (capital A with grave accennt) when displayed by ASP script

I'm attempting to provide support for a legacy ASP/MSSQL web application - I wasn't involved in the development of the software (the company that built it no longer exists) and I'm not the admin of the server where it's hosted; I just manage the hosting for the owners of the site via a reseller account. I'm also not an ASP developer (more of a PHP guy), and am not that familiar with it beyond the basics - updating DB connection strings after server migrations, etc.
The issue is that the site in question stores the content of individual pages in an MSSQL database, and much of the content includes links. Almost all of the internal links on the site are formatted like "main.asp?123" (with "123" being the ID of a database row). The problem is, starting sometime in the last 8 months or so*, something caused the links in the DB content to show up as "main.aspÀ123" instead - in other words, the "?" character is being replaced by the "À" character (capital A with grave accent). Which, of course, breaks all of those links. Note that Stack Overflow won't allow me to include that character in the post title, because it seems to think that it indicates I'm posting in Spanish...?
(*unfortunately I don't know the timing beyond that, the site owners didn't know when the issue started occurring, so all I have to go by is an archive.org snapshot from last October, where it was working)
I attempted to manually change the "?" character in one of the relevant DB records to "&#63;" (the HTML entity for the question mark), but that didn't make any difference. I also checked the character encoding of the HTML code used to display the content, but that doesn't seem to be the cause either - the same ASP files contain hard-coded links to some of the same pages (formatted exactly the same way), and those work correctly: the "?" doesn't get replaced.
I've also connected to the database directly with the MSSQL Management Studio Express application, but couldn't find any charset/character encoding options for either the database or the table.
And I've tried contacting the hosting provider, but they (M247 UK, in case anyone is curious) have been laughably unhelpful. The responses from them have been along the lines of "durrrrrr, we checked a totally different link that wasn't actually the one that you clearly described AND highlighted in a screenshot, and it works when we check the wrong link, so the problem must be resolved, right?" Suffice it to say, I wouldn't recommend them - the site used to be a customer of RedFox hosting, and the quality of customer service has dropped off substantially since M247 bought them.
Any suggestions? If this were PHP/MySQL, I'd probably start by creating a small test script that did nothing but fetch one of the relevant records and display its contents, to narrow down the issue - but I'm not familiar enough with ASP to do that here, at least not without a fair amount of googling (and most of the info I can find is specific to ASP.net instead).
Edit: the thread suggested as a solution appears to be for character encoding issues when writing to MSSQL, not reading from it - and I've tried the solutions suggested in that thread, none make any difference.
Looks like you're converting from UNICODE to ASCII somewhere along the line...
Have a look at this to get a quick demo of what happens. In particular, pay attention to the ascii derived from the int, versus the ascii derived from unicode...
SELECT
    t.n,
    ascii_char = CHAR(t.n),
    unicode_char = NCHAR(t.n),
    unicode_to_ascii = CONVERT(varchar(10), NCHAR(t.n))
FROM (
    SELECT TOP (1024)
        n = ROW_NUMBER() OVER (ORDER BY ao.object_id)
    FROM
        sys.all_objects ao
) t
WHERE 1 = 1
    --AND CONVERT(varchar(10), NCHAR(t.n)) = 'À'
;
I found a workaround that appears to do the trick: I was previously trying to replace the ? in the code with &#63;, which didn't work. BUT it seems to work if I use &quest; instead.
One thing to note: it seems I was originally incorrect in thinking that the issue was only affecting content being read/displayed from the MSSQL DB. It looks like the same problem was also occurring with static content being "echo'd" by code in the ASP scripts (I'm more of a PHP guy, not sure what the correct term for ASP's equivalent of echo is). The links that were hardcoded as static HTML (rather than HTML being dynamically output by ASP) were unaffected. Changing the ? to &quest; worked for those ones too (the hardest part was tracking down the file I needed to edit).
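If the goal is instead to repair the stored links themselves, a one-off REPLACE over the content column is the usual approach. This is only a sketch: the PageContent table and Body column are made-up names, and the column would need to be (n)varchar rather than text for REPLACE to work directly.
-- Hypothetical table/column names; back up the table before running anything like this.
UPDATE PageContent
SET Body = REPLACE(Body, N'main.aspÀ', N'main.asp?')
WHERE Body LIKE N'%main.aspÀ%';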

Google Cloud Sql Second Generation Utf8Mb4 Encoding

We are using Google Cloud SQL Second Generation with our AppEngine application.
However, today we discovered a problem: we cannot insert emoji characters into our database because we cannot change some server flags to the utf8mb4 character encoding.
We've changed the character_set_server flag to utf8mb4, but it wasn't enough.
We have to change:
character_set_system
character_set_client
collation_connection
flags to utf8mb4 as well, but the second generation DB does not allow the root user to change those flags.
What can we do to solve this problem?
Does anyone have any idea about that?
Thanks
You have to set character_set_server to utf8mb4, change the columns you need to utf8mb4 and create a new Cloud SQL 2nd gen instance with the new flag (!!). Basically, setting the flag on an existing instance and just restarting (tested with 5.7) will not be enough (is this a bug? I did not find it in the docs). Any encoding related connection parameters are not needed and should be removed. The collation will be the standard collation for utf8mb4 which is perfect for me (and probably most cases), even without setting anything.
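A sketch of the column change mentioned above, reusing the comment table and body column that appear in the tests further down this thread, and assuming body is a TEXT column:
ALTER TABLE comment
    MODIFY body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;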
We had the exact same problem. Setting character_set_server to utf8mb4 wasn't enough. We could insert emojis through MySQL Workbench, but not through our application.
In our case, this problem went away after we started a new instance running MySQL 5.7 instead of 5.6. So my hypothesis is that in 5.7, but not in 5.6, changing the character_set_server flag lets Google Cloud SQL change those other flags you mention, or some other relevant setting.
Of course if you are already running 5.7, this does not apply to you.
SHOW CREATE TABLE -- that will probably say that the column(s) are CHARACTER SET utf8. That needs to be fixed with
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
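If you also want newly created tables to pick up the right default, the database itself can be converted the same way (mydb is a placeholder name):
ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;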
For me, I've found that using the AppEngine Console -> SQL to edit character_set_server to utf8mb4 and restarting the DB does work!
I have an old Java project with a second generation database, and emoji were working fine without anything else in the connection string. Just two things:
to set character_set_server flag to utf8mb4,
and to create the database with utf8mb4.
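A sketch of the second point, with a made-up database name:
CREATE DATABASE myapp CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;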
(Skip to Finally if you don't want to read it all.) Now I have this problem in Python and nothing is working. I have to solve this, so I will write down what I have found.
I have tried the following (none of it worked; it is just what I have tried):
1 Removed the flag, restarted the instance, added the flag again, restarted again.
2 Set ?charset=utf8 in the connection string; the library returned the error: Invalid utf8 character string: 'F09F98'
3 Set ?charset=utf8mb4; the library wrote the value to the database, but instead of the emoji there was ??? . So if the library recognizes utf8mb4 and writes it, then the problem is not in the connection from the library, but in the database.
4 I have run
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'utf8mb4' -> this is set from the Google Console
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'
UPDATE comment set body="😎" where id=1;
Invalid utf8 character string: '\xF0\x9F\x98\x8E' 0,045 sec
SET NAMES utf8mb4;
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8mb4'
'character_set_connection', 'utf8mb4'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8mb4'
'character_set_server', 'utf8mb4'
'character_set_system', 'utf8'
'collation_connection', 'utf8mb4_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'
UPDATE comment set body="😎" where id=1;
SUCCESS
So the problem is in one of those flags.
5 I closed the current connection and reopened my client again, so these variables were set back to utf8. First I changed character_set_results and character_set_client so that I could see the correct result in my client (MySQL Workbench). I ran the update statement again without success, and the field still showed ??? . After changing character_set_connection to utf8mb4 and updating the field again, this time I had the emoji in the table. But why character_set_connection? As the tests above show, the connection from the library is already utf8mb4. So at this point I don't understand where to set my connection charset to utf8mb4 so that things start to work.
6 I tried to create a new Cloud SQL instance with the charset flag, created the database with utf8mb4, and the table with utf8mb4 (although tables are created with the default database charset anyway), and the insert statement still didn't work. So the only thing I could think of was that charset=utf8mb4 is not working in the connection string. But it wasn't that: I tried removing the charset from the connection string and got the same error as before, when using only the utf8 charset in the connection string.
So what is left, I don't know.
7 I tried to use an instance with HDD, not SSD.
8 Tried to connect via the Google Cloud shell and insert a row via their console.
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'body' at row 1
Interestingly, the cloud shell even shows in 'show create table' that the default encoding for this table is utf8mb4. So the cloud shell, just like MySQL Workbench, is connecting with utf8 as the default.
Finally
Things worked using db.session.execute("SET NAMES 'utf8mb4'") before inserting into the database (in Python), and using ?charset=utf8mb4 only locally. The real problem when testing something like this can be the method you use to check the result in the database. MySQL Workbench was always connecting with utf8 as the default encoding (you can check this with the SHOW command above), so the first thing to do is to switch the connection in MySQL Workbench (or your client) using SET NAMES 'utf8mb4'. The tests above show that the Google Cloud shell was connected with utf8 by default as well. I searched the internet and found that they cannot use utf8mb4 as the default because they are waiting for utf8mb4 to become the new standard connection charset in MySQL, at which point it would simply be named 'utf8'. Also, there is no way to make MySQL Workbench run with utf8mb4 automatically after connecting; you have to do this yourself.
As for whether the problem can also occur when reading from the database: I'm about to test that now.
Add this in settings.py,
'OPTIONS': {'charset': 'utf8mb4'}
Many thanks to: Unable to use utf8mb4 character set with CloudSQL on AppEngine Python

Sql Server - Encoding issue, replace strange characters

After importing some data into a SQL Server 2014 database, I realized that there are some fields in which German characters such as ü, ß, ä and ö have been replaced with some weird characters. For example:
München should be München
ChiemgaustraÃe should be Chiemgaustraße
Königstr should be Königstr
I would like to replace these characters with the right German letter. Ex.
ü -> ü
à - > ß
ö -> ö
However, when I run queries like the following to try to identify which rows have these characters, the queries return 0 rows.
select address
from Directory
where street like N'%ChiemgaustraÃe 50%'
select address
from Directory
where street like N'%ü%'
Is there a query I can run to identify and replace these characters?
I must clarify that most of the data was imported correctly, in fact I believe the strange characters were already part of the original data.
Also, I think I can export the data to a text file, replace the characters and re-import, but I was wondering if there is a way to do it directly in sql.
Thanks in advance for the help.
I couldn't get it fixed using only SQL.
FutbolFa's suggestion worked for the most part, but there were a couple of symbols, in particular "Ã", that weren't picked up by any query I tried. I ended up exporting the data to a text file and replacing the symbols there. Then I just re-imported the info.
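For reference, the kind of query being discussed usually looks something like the sketch below, using the Directory table and street column from the question and assuming street is an (n)varchar column. Each REPLACE maps one garbled sequence back to the intended letter, and rows that don't contain those sequences are left unchanged:
-- Identify affected rows first (extend the OR list as needed):
SELECT street FROM Directory
WHERE street LIKE N'%ü%' OR street LIKE N'%ö%';

-- Then fix them in place; chain one REPLACE per garbled sequence (ä, ß, ... follow the same pattern):
UPDATE Directory
SET street = REPLACE(REPLACE(street, N'ü', N'ü'), N'ö', N'ö');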

SQL Server not respecting padding

When I save a string with blank spaces at the end, the string is being saved without the blank spaces.
In my development environment it works fine, but in production it doesn't.
Is there a configuration that can be done to force the blank spaces? It must be a configuration made via code.
As said by Damien, I will store data in its most natural representation and will format later.
Thanks!
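One SQL Server setting that genuinely controls this behaviour is ANSI_PADDING, which is captured per column at CREATE TABLE time; whether it explains the dev/production difference here is only a guess, but a minimal sketch of its effect (table names are made up):
-- With ANSI_PADDING OFF, trailing blanks are trimmed from varchar values on insert:
SET ANSI_PADDING OFF;
CREATE TABLE pad_off (val varchar(20));
INSERT INTO pad_off VALUES ('abc   ');
SELECT DATALENGTH(val) FROM pad_off;   -- 3: the trailing spaces are gone

-- With ANSI_PADDING ON (the default and recommended setting), they are kept:
SET ANSI_PADDING ON;
CREATE TABLE pad_on (val varchar(20));
INSERT INTO pad_on VALUES ('abc   ');
SELECT DATALENGTH(val) FROM pad_on;    -- 6: the trailing spaces are preserved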

cakephp encoding from database

I have a problem with encoding characters from the database. I am using Postgres with win1250 encoding, but whatever I put in core.php (right now I have this line of code):
Configure::write('App.encoding', 'iso-8859-1');
it sometimes gives me strange letters from the database, for example È instead of Č. Is there anything I can do to get the correct encoding?
NOTE: I can't edit or change anything in the database.
I think all you need to do is declare the right encoding option in your database connection configuration, as described at http://book.cakephp.org/2.0/en/development/configuration.html#database-configuration (scroll a bit).
Look at this particular paragraph:
encoding
Indicates the character set to use when sending SQL statements to the server. This defaults to the database’s default encoding for all databases other than DB2. If you wish to use UTF-8 encoding with mysql/mysqli connections you must use ‘utf8’ without the hyphen.
I had the same issue (with French and Spanish names) in a previous project and I only had to add the following to my $default connection, in the app/Config/database.php configuration file:
'encoding' => 'utf8'
Maybe you need the utf8 connection or the iso-8859-1 you mentioned.
win1250 encoding is similar to iso-8859-2 (see http://en.wikipedia.org/wiki/Windows-1250), so you might want to try that instead of iso-8859-1.
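Under the hood, CakePHP's 'encoding' option just sets the client encoding on the Postgres connection, and Postgres then converts the win1250 data to that encoding on the way out. A quick way to test the effect directly, in psql or any SQL console against the same database (some_table and its name column are placeholders):
SHOW client_encoding;             -- what this connection currently asks for
SET client_encoding TO 'UTF8';    -- server-side win1250 data is converted to UTF-8 on output
-- or, matching the suggestion above:
-- SET client_encoding TO 'LATIN2';
SELECT name FROM some_table;      -- hypothetical table, just to eyeball the characters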
