Why can printf() display é (\u00E9 in UTF-16) when putwchar() can't?
And what is the right syntax to get putwchar() to display é correctly?
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main() {
wint_t wc = L'\u00E9';
setlocale(LC_CTYPE, "fr_FR.utf8");
printf("%C\n", wc);
putwchar((wchar_t)wc);
putchar('\n');
return 0;
}
Environment
OS : openSUSE Leap 42.1
compiler : gcc version 4.8.5 (SUSE Linux)
Terminal : Terminator
Terminal encoding : UTF-8
Shell : zsh
CPU : x86_64
Shell env :
env | grep LC && env | grep LANG
LC_CTYPE=fr_FR.utf8
LANG=fr_FR.UTF-8
GDM_LANG=fr_FR.utf8
Edit
in :
wint_t wc = L'\u00E9';
setlocale(LC_CTYPE, "");
out:
C3 A9 0A E9 0A
in:
wint_t wc = L'\xc3a9';
setlocale(LC_CTYPE, "");
out:
EC 8E A9 0A A9 0A
You cannot mix wide-character and byte input/output functions on the same stream (printf is a byte output function, regardless of whether its format includes conversions for wide characters). Once set, the orientation of a stream can only be reset with freopen, and that has to be done again before calling the byte-oriented putchar function.
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main() {
wint_t wc = L'\u00E9';
setlocale(LC_CTYPE, "");
printf("%lc\n", wc);
freopen(NULL, "w", stdout);
putwchar((wchar_t)wc);
freopen(NULL, "w", stdout);
putchar('\n');
return 0;
}
The fact that the orientation can only be changed by reopening the stream indicates that this is not meant to be done casually; most programs should use only one kind of output: either wprintf/putwchar, or printf/putchar, using printf with %lc or wctomb when you need to print a wide character.
The problem is your setlocale() call failed. If you check the result you'll see that.
if( !setlocale(LC_CTYPE, "fr_FR.utf8") ) {
printf("Failed to set locale\n");
return 1;
}
The likely cause is that fr_FR.utf8 is not the name under which the locale is installed on your system. Try the form used by LANG instead: fr_FR.UTF-8.
if( !setlocale(LC_CTYPE, "fr_FR.UTF-8") ) {
printf("Failed to set locale\n");
return 1;
}
The available locale names are whatever is installed on your system, probably under /usr/share/locale/; you can list them with locale -a.
It's rare you want to hard code a locale. Usually you want to use whatever is specified by the environment. To do this, pass in "" as the locale and the program will figure it out.
if( !setlocale(LC_CTYPE, "") ) {
printf("Failed to set locale\n");
return 1;
}
In a C program on Windows 10, I need to print the word TYCHÊ on the screen, but I cannot print the letter Ê (hex code: \xCA):
#include <stdlib.h>
#include <stdio.h>
char *Word;
int main(int argc, char* argv[]){
Word = "TYCH\xCA";
printf("%s", Word);
}
What's wrong?
Windows is a pain when it comes to printing Unicode text, but the following should work with all modern compilers (MSVC 19 or later, g++ 9 or greater) on all modern Windows systems (Windows 10 or greater), in both Windows Console and Windows Terminal:
#include <iostream>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
std::cout << "TYCHÊ" << "\n";
}
Make sure your compiler takes UTF-8 as the input character set. For MSVC 19 you need a flag. I think it is the default for later versions, but I am unsure on that point:
cl /EHsc /W4 /Ox /std:c++17 /utf-8 example.cpp
g++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 example.cpp
EDIT: Dangit, I misread the language tag again. :-(
Here’s some C:
#include <stdio.h>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
printf( "%s\n", "TYCHÊ" );
return 0;
}
You can try this line:
printf("%s%c", Word, 0x2580 + 82);
It can print your Ê.
I used CLion to resolve it; in another IDE it may not give the same result.
In the Windows Command Line you should choose the Code Page 65001:
CHCP 65001
If you want to silently do that directly from the source code:
system("CHCP 65001 > NUL");
In the C source code you should use the <locale.h> standard header.
#include <locale.h>
At the beginning of your program execution you can write:
setlocale(LC_ALL, "");
The empty string "" selects the default encoding of the underlying system (which you previously chose to be Unicode).
However, this answer of mine is just a patch, not a solution.
At most, it will help you print the French characters.
Handling encodings in the Windows command line is not straightforward.
See, for example: Command Line and UTF-8 issues
How might one go about printing an em dash in C?
One of these: —
Whenever I do: printf("—") I just get a ù in the terminal.
Thank you.
EDIT: The following code is supposed to print an Xs-and-Os (tic-tac-toe) grid with em dashes for the horizontal lines.
#include <stdio.h>
int main(void)
{
char grid[3][3] = {{'a', 'a', 'a'}, {'a', 'a', 'a'}, {'a', 'a', 'a'}};
int i, j;
for (i = 0; i < 3; i++) {
for (j = 0; j < 3; j++) {
if (j != 0)
{
printf("|");
}
printf(" %c ", grid[i][j]);
}
if (i != 2)
{
printf("\n——————————————\n");
}
}
return 0;
}
Output: (The "ù"s should be "—"s)
a | a | a
ùùùùùùùùùùùù
a | a | a
ùùùùùùùùùùùù
a | a | a
EDIT: I'm on Windows 10 x64 using Codeblocks 16.01 with C11.
EDIT: I was informed of box characters and the question has morphed into how to print those, hence the title and tag change.
In standard C, you use wide characters and wide strings:
#include <stdlib.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void)
{
setlocale(LC_ALL, "");
fwide(stdout, 1);
wprintf(L"🞨🞩🞪🞫🞬🞭🞮 🞉🞈🞇🞆🞅\n");
wprintf(L" │ │ \n");
wprintf(L"───┼───┼───\n");
wprintf(L" │ │ \n");
wprintf(L"───┼───┼───\n");
wprintf(L" │ │ \n");
return EXIT_SUCCESS;
}
You can use wide character constants like L'┼'; their conversion specifier for printf() and wprintf() functions is %lc. Similarly, a wide string constant has an L prefix, and its conversion specifier is %ls.
Unfortunately, you are limited to the mangled version of C Microsoft provides, so it may or may not work for you.
The above code does not work in Windows, because Microsoft does not want it to. See Microsoft documentation on setlocale() for details:
The set of available locale names, languages, country/region codes, and code pages includes all those supported by the Windows NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8.
In other words, Microsoft's C localization is limited to code pages of at most two bytes per character, and specifically excludes any Unicode locales. This is, however, purely part of Microsoft's EEE strategy to bind you, a budding developer, to Microsoft's own walled garden, so that you will not write actual portable C code (or, horror of horrors, avail yourself of POSIX C), but are mentally locked to the Microsoft model. You see, you can use _setmode() to enable Unicode output.
As I do not use Windows at all myself, I cannot verify whether the following Windows-specific workarounds actually work or not, but it is worth trying. (Do report your findings in a comment, Windows users, please, so I can fix/include this part of this answer.)
#include <stdlib.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
static int set_wide_stream(FILE *stream)
{
return _setmode(_fileno(stream), _O_U16TEXT);
}
#else
static int set_wide_stream(FILE *stream)
{
return fwide(stream, 1);
}
#endif
int main(void)
{
setlocale(LC_ALL, "");
/* After this call, you must use wprintf(),
fwprintf(), fputws(), putwc(), fputwc()
-- i.e. only wide print/scan functions
with this stream.
You can print a narrow string using e.g.
wprintf(L"%s\n", "Hello, world!");
*/
set_wide_stream(stdout);
/* These may not work in Windows, because
the code points are 0x1F785 .. 0x1F7AE
and Windows is probably limited to
Unicode 0x0000 .. 0xFFFF */
wprintf(L"🞨🞩🞪🞫🞬🞭🞮 🞉🞈🞇🞆🞅\n");
/* These are from the Box Drawing Unicode block,
U+2500 ─, U+2502 │, and U+253C ┼,
and should work everywhere. */
wprintf(L" │ │ \n");
wprintf(L"───┼───┼───\n");
wprintf(L" │ │ \n");
wprintf(L"───┼───┼───\n");
wprintf(L" │ │ \n");
return EXIT_SUCCESS;
}
On a 32-bit Ubuntu machine, starting with JDK 1.7.0, I'm unable to print wide characters.
Here is my code:
JNIFoo.java
public class JNIFoo {
public native void nativeFoo();
static {
System.loadLibrary("foo");
}
public void print () {
nativeFoo();
System.out.println("The end");
}
public static void main(String[] args) {
(new JNIFoo()).print();
return;
}
}
foo.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <jni.h>
#include "JNIFoo.h"
JNIEXPORT void JNICALL Java_JNIFoo_nativeFoo (JNIEnv *env, jobject obj)
{
fwprintf(stdout, L"using fWprintf\n");
fflush(stdout);
}
Then I'm executing the following commands:
javac JNIFoo.java
javah -jni JNIFoo
gcc -shared -fpic -o libfoo.so -I/path/to/jdk/include -I/path/to/jdk/include/linux foo.c
Here is the result, depending on the JDK used to execute the program:
jdk1.6.0_45/bin/java -Djava.library.path=/path/to/jni_test JNIFoo
using fWprintf
The end
jdk1.7.0/bin/java -Djava.library.path=/path/to/jni_test JNIFoo
The end
jdk1.8.0_25/bin/java -Djava.library.path=/path/to/jni_test JNIFoo
The end
As you can see, with JDK 1.7 and JDK 1.8, the fwprintf call has no effect!
So my question is: what am I missing to be able to use wide characters with JDK 1.7 (and 1.8)?
Note: if I call fprintf instead of fwprintf, there is no problem; everything is printed correctly.
Edit
Based on the comment of James, I created a main.c file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <wchar.h>
#include "JNIFoo.h"
int main(int argc, char* argv[])
{
fwprintf(stdout, L"In the main\n");
Java_JNIFoo_nativeFoo(NULL, NULL);
return 0;
}
Then I compile it like this:
gcc -Wall -L/path/to/jni_test -I/path/to/jdk1.8.0_25/include -I/path/to/jdk1.8.0_25/include/linux main.c -o main -lfoo
And set LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/path/to/jni_test
And it is working correctly:
In the main
using fWprintf
So the problem may not come from C.
Note: it's working correctly on a 64-bit machine.
I have a similar problem on Linux Mint 32-bit.
You must not mix printing of narrow and wide characters to the same stream. The C standard defines the concept of stream orientation, whereby an I/O stream can be wide-oriented or byte-oriented (the wide-character I/O functions and stream orientation were added by the 1995 amendment to C90 and carried into C99). From C99 §7.19.2/4–5:
4) Each stream has an orientation. After a stream is associated with an external file, but before any operations are performed on it, the stream is without orientation. Once a wide character input/output function has been applied to a stream without orientation, the stream becomes a wide-oriented stream. Similarly, once a byte input/output function has been applied to a stream without orientation, the stream becomes a byte-oriented stream. Only a call to the freopen function or the fwide function can otherwise alter the orientation of a stream. (A successful call to freopen removes any orientation.)233)
5) Byte input/output functions shall not be applied to a wide-oriented stream and wide character input/output functions shall not be applied to a byte-oriented stream. [...]
233) The three predefined streams stdin, stdout, and stderr are unoriented at program startup.
The C99 standard leaves mixing narrow- and wide-character functions as undefined behavior. In practice, the GNU C library says "There are no diagnostics issued. The application behavior will simply be strange or the application will simply crash. The fwide function can help avoiding this." (source)
Since the JRE is in charge of program startup, it's in charge of the stdin, stdout, and stderr streams and therefore also their orientations. Your JNI code is a guest in its house, don't go changing its carpets. In practice, this means you have to deal with whatever stream orientation you're given, which you can detect with the fwide(3) function. If you want to print wide characters to a byte-oriented stream, too bad. You'll need to work around that by convincing the JRE to use wide-oriented streams, or convert your wide characters to UTF-8, or something else.
For example, this code should work in all cases:
JNIEXPORT void JNICALL Java_JNIFoo_nativeFoo (JNIEnv *env, jobject obj)
{
if (fwide(stdout, 0) >= 0) {
// The stream is wide-oriented or unoriented, so it's safe to print wide
// characters
fwprintf(stdout, L"using fWprintf\n");
} else {
// The stream is byte-oriented. Convert to UTF-8 (and hopefully the
// terminal, or wherever stdout is going, can handle that).
// convert_to_utf8() is a placeholder for your own conversion routine.
char *utf8_string = convert_to_utf8(L"my wide string");
printf("%s", utf8_string);
}
fflush(stdout);
}
@dalf,
The problem is outside the JDK. It's in the 32-bit version of glibc.
Please try to reproduce it on your machine:
I) Create 3 files:
---foo.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
void foo() {
fwprintf(stdout, L"using fWprintf\n");
fflush(stdout);
}
----main.c:
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int main(int argc, char **argv) {
void *handle;
void (*foo)();
char *error;
handle = dlopen("libfoo.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror();
*(void **) (&foo) = dlsym(handle, "foo");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(EXIT_FAILURE);
}
(*foo)();
dlclose(handle);
exit(EXIT_SUCCESS);
}
----mapfile:
SomethingPrivate {
local:
*;
};
II) Run these commands:
$ gcc -m32 -shared -fpic -o libfoo.so foo.c
$ gcc -m32 -Xlinker -version-script=mapfile -o main main.c -ldl
$ export LD_LIBRARY_PATH="."
$ ./main
and see what it prints.
How can I assign a non-ASCII character to a wide char and print it to the console? The code below doesn't work:
#include <stdio.h>
int main(void)
{
wchar_t wc = L'ć';
printf("%lc\n", wc);
printf("%ld\n", wc);
return 0;
}
Output:
263
Press [Enter] to close the terminal ...
I'm using MinGW GCC on Windows 7.
You should use wprintf to print wide characters and wide-character strings:
wprintf(L"%lc\n", wc);
I think your calls to printf() fail with an «Illegal byte sequence» error returned in errno, at least that is what happens here on MacOS X with the above example code (and also if using wprintf() instead of printf()). For me it works when I call setlocale(LC_ALL, ""); before the call to printf() so that it stops using the C locale by default:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main(void)
{
wchar_t wc = L'ć';
setlocale(LC_ALL, "");
printf("%lc\n", wc);
return 0;
}
It is unclear what platform/compiler you are on, so YMMV.
Use wprintf(L"%lc\n", wc); and you will get your desired output.
I am testing a C program in the windows terminal. I mocked up a quick example of the section I am having issues with. The example is as follows:
$ cat test.c
#include <stdio.h>
#include <stdlib.h>
int main() {
char var[6];
scanf("%s", var);
int i=0;
while(var[i] != '\0') {
printf("%x ", var[i]);
i++;
}
return 0;
}
When I use a string with "normal" characters such as "aa", the output is "61 61" as expected (hex 61 is the letter "a"). When I try to input special characters such as í (0xA1 or U+00ED), I get the following output:
$ ./a.exe
í
ffffffc3 ffffffad
The UTF-8 codepage at http://www.utf8-chartable.de/ shows that the backwards 'i' is in fact 0xc3ad. How can I copy and paste this character as 0xA1, as I really want to input 0xA1 into the terminal, not 0xc3ad? I am copy and pasting this from "charmap". I even tried saving a text file in ANSI with the character and copying and pasting but I still get 0xc3ad. Please assist me.
EDIT: Running the same on a mac also gives me c3ad.