Wide character and Locale - c

#1
#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main()
{
setlocale(LC_CTYPE,"C");
wprintf(L"大\n");
return 0;
}
//result : ?
#2
#include <stdio.h>
#include <locale.h>
int main()
{
setlocale(LC_CTYPE,"C");
printf("大\n");
return 0;
}
//result : 大
The only difference between #1 and #2 is the printing function.
I expected that if a wide character cannot be printed in a given locale, a multibyte character should not be printable in that locale either.
Why is the multibyte string printed (#2), whereas the wide character string is not (#1)?
I know that if the locale is not "C", the wide character prints fine, but why? What exactly does the locale do?
+) I thought multibyte character encoding was locale-dependent, but the multibyte character prints fine regardless of locale. How does the computer determine the multibyte character encoding?

If you work with the Windows console and want to print wide strings, use the _setmode function to change the default translation mode of stdout to Unicode.
For example:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <fcntl.h>
#include <io.h>
int main()
{
setlocale(LC_CTYPE,"C");
_setmode(_fileno(stdout), _O_U16TEXT);
wprintf(L"大\n");
return 0;
}
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170
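As for why #2 prints while #1 does not, here is a minimal sketch, assuming a hosted C library that provides wcrtomb (the helper name encoded_length is made up for illustration). printf copies the bytes of a narrow string literal to the stream unchanged, so the locale never gets a say. wprintf has to convert every wide character to multibyte form, as if by wcrtomb, and that conversion follows LC_CTYPE.

```c
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: how many bytes does wc occupy in the multibyte
   encoding of the named locale? Returns (size_t)-1 if the locale is not
   available or the character cannot be represented in it.
   Side effect: changes the program's LC_CTYPE locale. */
size_t encoded_length(const char *locale_name, wchar_t wc)
{
    char buf[MB_LEN_MAX];
    mbstate_t state;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;
    memset(&state, 0, sizeof state);   /* start from the initial state */
    return wcrtomb(buf, wc, &state);   /* (size_t)-1 on encoding error */
}
```

In the "C" locale, encoded_length("C", L'\u5927') should report failure, which is the same conversion wprintf gives up on. In a UTF-8 locale such as "en_US.UTF-8" (if it is installed) the result should be 3, the length of 大 in UTF-8.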

Related

How can I print filled / unfilled square character using printf in c?

I am trying to print "□" and "■" using c.
I tried printf("%c", (char)254u); but it didn't work.
Any help? Thank you!
I do not know what (char)254u is meant to be in your code. First set the locale to a UTF-8 one, then just printf the character. That is it.
#include <locale.h>
#include <stdio.h>
int main()
{
setlocale(LC_CTYPE, "en_US.UTF-8");
printf("%lc", L'□'); /* %lc expects a wide character, so use an L'' literal */
return 0;
}
You can also print it directly, like this:
#include <stdio.h>
int main()
{
printf("■");
return 0;
}
You can print Unicode characters using _setmode, which sets the file translation mode.
#include <fcntl.h>
#include <stdio.h>
#include <io.h>
int main(void) {
_setmode(_fileno(stdout), _O_U16TEXT);
wprintf(L"\x25A0\x25A1\n");
return 0;
}
output
■□
As other answers have mentioned, you need to set a locale that uses the UTF-8 encoding defined by the Unicode Standard. Then you can print the character with %lc by passing its Unicode code point: U+25A0 for ■ and U+25A1 for □ (254 is U+00FE, þ, not a square). Here is a minimal code example:
#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main() {
setlocale(LC_CTYPE, "en_US.UTF-8"); // Select a UTF-8 locale
printf("%lc\n", (wint_t)0x25A0); // '%lc' takes a wide character (wint_t) instead of a char
return 0;
}
If you want to store it in a variable, use a wchar_t. On platforms where wchar_t holds Unicode code points (glibc, for example), the number maps to its Unicode symbol. This answer provides more detail.
wchar_t x = 0x25A1;
printf("%lc\n", x);
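Note that %lc can fail: when the current locale cannot represent the character, printf and friends return a negative value instead of printing it. A small sketch that surfaces this (format_code_point is a hypothetical helper, and it assumes wchar_t values are Unicode code points, as on glibc):

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write the multibyte form of cp into out, using the
   current LC_CTYPE locale. Returns the number of bytes needed, or a
   negative value if the locale cannot encode the character. */
int format_code_point(char *out, size_t size, wint_t cp)
{
    return snprintf(out, size, "%lc", cp);
}
```

Call setlocale(LC_CTYPE, "") or select a specific UTF-8 locale first. In the default "C" locale, format_code_point(buf, sizeof buf, 0x25A0) is expected to fail rather than produce ■.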

Why doesn't putchar() output the copyright symbol while printf() does?

I want to print the copyright symbol, but putchar() just cuts off the most significant byte of the character, which results in an unprintable character.
I am using Ubuntu MATE, and my locale is en_US.UTF-8.
What I know is that the UTF-8 encoding of © is 0xc2a9. When I try putchar('©' - 0x70) it prints 9, which has the hex value 0x39. Add 0x70 to that and you get 0xa9, which is the least significant byte of 0xc2a9.
#include <stdio.h>
main()
{
printf("©\n");
putchar('©');
putchar('\n');
}
I expect the output to be:
©
©
rather than:
©
�
The putchar function takes an int argument and converts it to an unsigned char before writing it, so it outputs exactly one byte per call. That means you can't pass it a multibyte character.
You need to call putchar twice, once for each byte of the character's UTF-8 encoding.
putchar(0xc2);
putchar(0xa9);
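Where those two bytes come from: 0xc2 0xa9 is the UTF-8 encoding of code point U+00A9. A sketch of the encoding rules, in case you want to derive the bytes for other characters (utf8_encode is a hypothetical helper, not a library function):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out (room for
   4 bytes plus a terminating 0). Returns the number of bytes written,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[5])
{
    size_t n;

    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                  /* surrogates cannot be encoded */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;
    }
    out[n] = 0;
    return n;
}
```

For U+00A9 this yields the bytes 0xC2 0xA9. Writing them one at a time with putchar on a UTF-8 terminal shows ©.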
You could try the wide version: putwchar
Edit: That was actually more difficult than I thought. Here's what I needed to make it work:
#include <locale.h>
#include <wchar.h>
#include <stdio.h>
int main() {
setlocale(LC_ALL, "");
putwchar(L'©');
return 0;
}
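One thing that can make this harder than expected: a stream becomes byte-oriented or wide-oriented on its first I/O call, and the C standard does not allow byte functions (printf, putchar) and wide functions (wprintf, putwchar) to be mixed on the same stream afterwards. A small sketch for checking a stream's orientation with fwide (orientation_name is a hypothetical helper):

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: report a stream's current orientation without
   changing it (fwide with mode 0 only queries). */
const char *orientation_name(FILE *stream)
{
    int mode = fwide(stream, 0);

    if (mode > 0)
        return "wide";
    if (mode < 0)
        return "byte";
    return "unset";
}
```

Calling orientation_name(stdout) before and after the first printf or putwchar shows which family the stream has been committed to.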

Program makes a beep sound even though it doesn't contain \a anywhere

My computer made a beep sound even though I did not add \a to my code. Why?
Program:
#include <stdio.h>
#include <limits.h>
#include <float.h>
#include <stdlib.h>
#define START_CHAR ' '
#define END_CHAR 'DEL'
int main(void)
{ /* This code prints characters on keyboard.*/
/* declaration */
int char_code;
for (char_code=(int)START_CHAR; char_code<=(int)END_CHAR; char_code=char_code+1)
printf("%c", (char)char_code);
printf("\n");
return(0);
}
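'DEL' is not a single character. It is a multi-character constant whose value is implementation-defined, and here it ends up being equal to 4474188. Since you have char_code defined as an int, the loop goes from 32 (the ASCII code for a space) to 4474188. The cast to char in practice keeps only the low byte, so the loop cycles through the full character set many times. That includes character 7 (BEL, the same as \a), which is what makes the beep.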
You should be using 0x7F instead.
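To see how often the loop hits the bell character, here is a sketch that counts it. count_bel is a hypothetical helper; it uses unsigned char instead of the original (char) cast so the wrap-around is well defined, and it takes 4474188 as a plain number because the value of 'DEL' is implementation-defined.

```c
/* Hypothetical helper: count how many values in [start, end] turn into the
   BEL control character (code 7, i.e. '\a') once reduced to one byte. */
long count_bel(long start, long end)
{
    long hits = 0;

    for (long code = start; code <= end; code++) {
        if ((unsigned char)code == '\a')
            hits++;
    }
    return hits;
}
```

count_bel(32, 4474188) comes out at 17477, while count_bel(32, 0x7F) is 0, so the corrected loop never emits a bell.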

Why Unicode characters are not displayed properly in terminal with GCC?

I've written a small C program:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main() {
wprintf(L"%s\n", setlocale(LC_ALL, "C.UTF-8"));
wchar_t chr = L'┐';
wprintf(L"%c\n", chr);
}
Why doesn't this print the character ┐ ?
Instead it prints gibberish.
I've checked:
tried compiling without setlocale, same result
the terminal itself can print the character, I can copy-paste it to terminal from text-editor, it's gnome-terminal on Ubuntu
GCC version is 4.8.2
wprintf is a version of printf which takes a wide string as its format string, but otherwise behaves just the same: %c is still treated as char, not wchar_t. So instead you need to use %lc to format a wide character. And since your strings are ASCII you may as well use printf. For example:
#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int main() {
printf("%s\n", setlocale(LC_ALL, "C.UTF-8"));
wchar_t chr = L'┐';
printf("%lc\n", chr);
}
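One caveat with printing setlocale's result directly: setlocale returns NULL when the requested locale is not installed, and passing NULL to %s is undefined behavior. A defensive sketch (select_locale is a hypothetical helper):

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to select the named locale for all categories
   and return a printable description of the outcome (never NULL). */
const char *select_locale(const char *name)
{
    const char *result = setlocale(LC_ALL, name);

    return result != NULL ? result : "(locale not available)";
}
```

With this, printf("%s\n", select_locale("C.UTF-8")) is safe even on systems that lack C.UTF-8, and the fallback text tells you why the %lc output that follows may fail.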

Output unicode wchar_t character

Just trying to output this unicode character ☒ in C using MinGW. I first put it on a buffer using swprintf, and then write it to the stdout using wprintf.
#include <stdio.h>
int main(int argc, char **argv)
{
wchar_t buffer[50];
wchar_t c = L'☒';
swprintf(buffer, L"The character is: %c.", c);
wprintf(buffer);
return 0;
}
The output under Windows 8 is:
The character is: .
Other characters such as Ɣ don't work either.
What am I doing wrong?
You're using %c, but %c is for char, even when you use it from wprintf(). Use %lc, because the parameter is a wchar_t.
swprintf(buffer, L"The character is: %lc.", c);
This kind of error should normally be caught by compiler warnings, but that doesn't always happen. In particular, catching this error is tricky because %c takes an int and %lc takes a wint_t, both integer types rather than char and wchar_t, so the same argument can look acceptable for either (the difference is how they interpret the value).
To output Unicode (or to be more precise UTF-16LE) to the Windows console, you have to change the file translation mode to _O_U16TEXT or _O_WTEXT. The latter one includes the BOM which isn't of interest in this case.
The file translation mode can be changed with _setmode. But it takes a file descriptor (abbreviated fd) and not a FILE *! You can get the corresponding fd from a FILE * with _fileno.
Here's an example that should work with MinGW and its variants, and also with various Visual Studio versions.
#define _CRT_NON_CONFORMING_SWPRINTFS
#include <stdio.h>
#include <io.h>
#include <fcntl.h>
int
main(void)
{
wchar_t buffer[50];
wchar_t c = L'Ɣ';
_setmode(_fileno(stdout), _O_U16TEXT);
swprintf(buffer, L"The character is: %lc.", c);
wprintf(buffer);
return 0;
}
This works for me:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(int argc, char **argv)
{
wchar_t buffer[50];
wchar_t c = L'☒';
if (!setlocale(LC_CTYPE, "")) {
fprintf(stderr, "Cannot set locale\n");
return 1;
}
swprintf(buffer, sizeof buffer / sizeof buffer[0], L"The character is %lc.", c);
wprintf(buffer);
return 0;
}
What I changed:
I added the wchar.h include required by the use of swprintf
I added the buffer size, counted in wide characters rather than bytes, as the second argument of swprintf, as required by standard C
I changed the %c conversion specification to %lc
I changed the locale using setlocale
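The size passed to swprintf is a count of wide characters, not bytes, and the call returns a negative value when the buffer is too small. A sketch with a checkable return value, assuming a C99-conforming swprintf (describe_char is a hypothetical helper):

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: format the message into out, which has room for
   count wide characters including the terminator. Returns the number of
   wide characters written, or a negative value on failure. */
int describe_char(wchar_t *out, size_t count, wchar_t c)
{
    return swprintf(out, count, L"The character is: %lc.", (wint_t)c);
}
```

With wchar_t buffer[50], pass sizeof buffer / sizeof buffer[0] (that is, 50) as count. The message needs 20 wide characters plus the terminator, so a count of 5 makes the call fail instead of overflowing the buffer.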
This FAQ explains how to use Unicode / wide characters in MinGW:
https://sourceforge.net/p/mingw-w64/wiki2/Unicode%20apps/
