Problem with encoding and terminals with Special Characters

I'm writing a program and, as I progressed, the file encoding changed to UTF-8, which created a problem for me. I'm Brazilian and I have some phrases in Portuguese with special characters that are in the ASCII table, but having to revise every printf and every phrase or word to see whether it has a special character is madness in 700 lines of code. I'm short on time, so I tried changing the encoding to ISO-8859-1, UNICODE and WINDOWS-1252, but the moment I build or save the file it returns to UTF-8. I tried changing the locale with setlocale(LC_ALL,"pt_BR.utf8") and other values, but nothing happens. I thought the Code::Blocks terminal was broken, so I made a new test file to see whether the WINDOWS-1252 encoding worked. Does anyone have any idea that could help, or will I have to fix it character by character?
I'm using the default Code::Blocks terminal, cb_console_runner.
Isn't UTF-8 an encoding whose bytes are incompatible with special characters? Because the default in UNICODE is 16 bytes, or am I wrong?
EDIT:
#include <stdio.h>
#include <locale.h>
int main(){
    printf("%s", setlocale(LC_ALL,"pt_BR.utf8"));
}
returned: (NULL)
#include <stdio.h>
#include <locale.h>
int main(){
    printf("%s", setlocale(LC_ALL,""));
}
returned: Portuguese_Brazil.1252
I looked through previous questions on the Portuguese Stack Overflow, but none of them helped at all; some say it is the encoding, others say it is the terminal.

So, yesterday I talked to my professor and we both agreed that it was the encoding, but he had an idea: I opened my code in Dev-C++ and we noticed that the file was "corrupted" at the special letters I mentioned. I think that changed when the file was saved as UTF-8.

Related

My windows C program does not print japanese characters

#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main() {
    setlocale(LC_ALL, "");
    wchar_t test = L'づ';
    printf("%ls", L"\x3065");
    printf("%lc", test);
    return 0;
}
The expected output is づづ, but these two printf calls do not print anything. What can I do to solve this problem?

printf is a narrow string function, and unless you have requested UTF-8 in your manifest and are running on an appropriate version of Windows 10, it is not going to print Unicode correctly in all cases.
Use wprintf to print wide strings. Depending on the C runtime library, you might need to call _setmode(_fileno(stdout), _O_U16TEXT); first before printing.
Even if your program does everything correctly it might still not work in the console. Using the new Windows Terminal should work. The older console might just display squares. This is a console/font limitation. Copy the squares to the clipboard and paste in Wordpad to see that your program actually worked correctly.
See also:
Myth busting in the console

Can I change the text color through sprintf in C?

I'm new to C and I came across this code and it was confusing me:
sprintf(banner1, "\e[37╔═╗\e[37┌─┐\e[37┌┐┌\e[37┌─┐\e[37┌─┐\e[37┌─┐\e[37┌─┐\e[37m\r\n");
sprintf(banner2, "\e[37╠═╝\e[37├─┤\e[37│││\e[37│ ┬\e[37├─┤\e[37├┤\e[37 ├─┤\e[37m\r\n");
sprintf(banner3, "\e[37╩ \e[37┴ ┴┘\e[37└┘\e[37└─┘\e[37┴ ┴\e[37└─┘\e[37┴ ┴\e[37m\r\n");
I was confused because I don't know what \e[37 and \r\n mean. And can I change the colors?

This looks like an attempt to use ANSI terminal color escapes and Unicode box drawing characters to write the word "PANGAEA" in a large, stylized, colorful manner. I'm guessing it's part of a retro-style BBS or MUD system, intended to be interacted with over telnet or ssh. It doesn't work, because whoever wrote it made a bunch of mistakes. Here's a corrected, self-contained program:
#include <stdio.h>
int main(void)
{
    printf("\e[31m╔═╗\e[32m┌─┐ \e[33m┌┐┌\e[34m┌─┐\e[35m┌─┐\e[36m┌─┐\e[37m┌─┐\e[0m\n");
    printf("\e[31m╠═╝\e[32m├─┤ \e[33m│││\e[34m│ ┬\e[35m├─┤\e[36m├┤ \e[37m├─┤\e[0m\n");
    printf("\e[31m╩ \e[32m┴ ┴┘\e[33m┘└┘\e[34m└─┘\e[35m┴ ┴\e[36m└─┘\e[37m┴ ┴\e[0m\n");
    return 0;
}
The mistakes were: using \r\n instead of plain \n, leaving out the m at the end of each and every escape sequence, and a number of typos in the actual letters (missing spaces and the like).
I deliberately changed sprintf(bannerN, ... to printf to make it a self-contained program instead of a fragment of a larger system, and changed the color codes used for each letter to make it a more interesting demo. When I run this program on my computer, each letter is printed in a different color.
The program will only work on your computer if your terminal emulator supports both ANSI color escapes and printing UTF-8 with no special ceremony. Most Unix-style operating systems nowadays support both by default; I don't know about Windows.

What locale LC_CTYPE is used for Windows unicode console app?

While converting a multi-byte console application to Unicode, I ran up against a weird problem where _tcprintf and WriteConsole worked fine but _tprintf was printing the wrong characters...
I've traced it back to using setlocale(LC_ALL, "C"), which makes LC_CTYPE assume 1-byte characters according to the MS docs:
The C locale assumes that all char data types are 1 byte and that their value is always less than 256.
However, I want to keep "C" for everything except the LC_CTYPE but I don't know what to use?
I thought the whole point of using UTF16 is that all the characters are available and things would print properly no matter the code page or locale.
Although it also appears setting the console output to UTF-8 (65001) (SetConsoleCP which of course is separate from the locale) in a Unicode app and outputting UTF16 also has problems displaying the correct characters.
Anyway, does anyone know what value I should be using the LC_CTYPE for UTF16 on Windows Unicode Console Application? Maybe it's as easy as setlocale( LC_CTYPE, "" ); ? TIA!!

Use _setmode() to set the file translation mode to _O_U16TEXT:
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
int main(void)
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"ελληνικά\n");
}

Text file apostrophe trouble

I'm attempting to read from a Project Gutenberg text file and count the total number of words. I'm currently overshooting because words with apostrophes are double counted. However the apostrophe character from the text file doesn't match the ASCII character 39, i.e. '\'', so my is_word function is working incorrectly. Any suggestion as to what that character actually is?
Note: When I go through and manually replace the apostrophes in vim, the word counter works fine.
link to text file: http://www.gutenberg.org/ebooks/1342

This isn't a complete answer, but if you do
#include <wchar.h>
#include <locale.h>
and then
setlocale(LC_ALL, "en_US.UTF-8");
and then call getwchar() or getwc(fp) instead of getchar/getc, and then check for the value 8217 as well as '\'', you might be able to get it all to work.
(It works for me. YMMV. Depending on your OS, you might have to use a locale string other than "en_US.UTF-8".)
(And if this does work, welcome to the wonderful world of internationalization. Having gone down this road, there are several other issues you'll have to pay attention to if you want your code to work properly under all circumstances and in all locales.)

C Programming - ascii for windows "unknown" characters

I'm programming on Windows, but in my C console some characters (like é, à, ã) are not displayed correctly. I would like to know how I can make Windows interpret those characters as Unicode or UTF-8 in the console.
I would be glad for some enlightenment.
Thank you very much.

By console do you mean cmd.exe? It doesn't handle Unicode well, but you can get it to display "ANSI" characters by changing the display font to Lucida Console and changing the code page from "OEM" to "ANSI." By the choice of characters you seem to be Western European, so try giving this command before running your application:
chcp 1252
If you want to try your luck with UTF-8 output use chcp 65001 instead.

Although I completely agree with Joni's answer, I think one detail can be added:
Since Telmo Vaz asked how to solve this problem for C programs, we can consider the alternative of running the command from inside the code with system():
#include <stdlib.h> // To use the function system();
#include <stdio.h>
int main(void) {
    system("CHCP 1252");
    printf("Now accents are right: áéíüñÇ \n");
    return 0;
}
EDIT It is a good idea to do some experiments with codepages. Check the following table for information (under Windows):
Windows Codepages
