I have difficulties with putwchar() in C

Certainly, my problem is not new, so I apologize if my error is simply too stupid.
I just wanted to become familiar with putwchar and simply wrote the following little piece of code:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    char *locale = setlocale(LC_ALL, "");
    printf("Locale: %s\n", locale);
    //setlocale(LC_CTYPE, "de_DE.utf8");
    wchar_t hello[] = L"Registered Trademark: ®®\nEuro sign: €€\nBritisch Pound: ££\nYen: ¥¥\nGerman Umlauts: äöüßÄÖÜ\n";
    int index = 0;
    while (hello[index] != L'\0') {
        //printf("putwchar returns: %d\n", putwchar(hello[index++]));
        putwchar(hello[index++]);
    }
}
Now, the output is simply:
Locale: de_DE.UTF-8
Registered Trademark: ��
Euro sign: ��
Britisch Pound: ��
Yen: ��
German Umlauts: �������
[1]+ Fertig gedit versuch.c
None of the non-ASCII chars appeared on the screen.
As you can see from the commented-out line (which I disabled because I know I must not mix putwchar and printf in the same program), putwchar returned the proper Unicode code point for each character I wanted to print. So, at least to my understanding, the call is supposed to work.
The C source is encoded in UTF-8:
$ file versuch.c
versuch.c: C source, UTF-8 Unicode text
My system is Ubuntu Linux 20.04.5.
Compiler: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1).
I would greatly appreciate any advice on this one.
As stated above: I simply expected the trademark sign, yen, € and the umlauts äöüßÄÖÜ to appear.

You shouldn't mix normal and wide output on the same stream.
I get the expected output if I change this early print:
printf ("Locale: %s\n", locale);
into a wide print:
wprintf(L"Locale: %s\n", locale);
Then the subsequent putwchar() calls write the expected characters.
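For reference, a minimal sketch of the whole program with only this change applied (my own shortened assembly, not verbatim from either post): stdout becomes wide-oriented before any other output is written.
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    char *locale = setlocale(LC_ALL, "");

    /* Wide output first, so stdout takes on wide orientation. */
    wprintf(L"Locale: %s\n", locale);

    wchar_t hello[] = L"Registered Trademark: ®\nEuro sign: €\nGerman Umlauts: äöüßÄÖÜ\n";
    for (int index = 0; hello[index] != L'\0'; index++)
        putwchar(hello[index]);

    return 0;
}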

You cannot mix narrow and wide I/O on the same stream (C11 7.21.2). If you want putwchar, you cannot use printf on that stream. Start with wprintf instead (note the wide format string):
wprintf (L"Locale: %s\n", locale);

You can simply print those wide characters as shown below:
wprintf(L"Registered Trade Mark: %ls\n", L"®®");
wprintf(L"Euro Sign: %ls\n", L"€€");
wprintf(L"British Pound: %ls\n", L"££");
wprintf(L"Yen: %ls\n", L"¥¥");
wprintf(L"German Umlauts: %ls\n", L"äöüßÄÖÜ");
Please refer to:
https://stackoverflow.com/a/37587933/2805824
https://stackoverflow.com/a/7696033/2805824

Related

Let ncurses handle UTF-8 characters via the terminal

Ncurses has a whole family of print functions for wide characters: https://linux.die.net/man/3/mvwaddnwstr
The problem with those is that they depend on glibc, so if some UTF-8 character has not yet been added to glibc it won't be printed properly. An example is: ✅
The solution I can see is to let ncurses hand the character straight to the terminal. What I mean by that is having some sort of print function that accepts a raw UTF-8 byte string and lets the terminal find the correct glyph, as the snippet below does:
#include <unistd.h>

int main()
{
    /* Write the three raw UTF-8 bytes of U+2705 (✅) directly to stdout. */
    write(1, "\xe2\x9c\x85", 3);
}
Is this possible with ncurses?

Step into standard wstdio functions in Visual Studio 2019

I want to find out why, with the new setlocale(LC_ALL, ".utf8") feature, the standard function fgetwc() can't read '\u2013' (EN DASH) from a UTF-8 text file and instead returns WEOF, and maybe find a workaround.
I disabled "Only my code" and enabled symbol downloading for C:\WINDOWS\SysWOW64\ucrtbased.dll that contains fgetwc
However, when I try to step into that function it cannot find fgetwc.cpp.
These two locations don't contain that file and I can't find any other place:
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\crt\src\
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\crt\src\
This is my test program:
#include <stdio.h>
#include <locale.h>
#include <wchar.h>
#include <stdlib.h>

int main()
{
    wint_t wc; // = L'\u2013';
    FILE* file;
    printf("%s\n", setlocale(LC_ALL, ".utf8"));
    file = fopen("test.txt", "r");
    wc = fgetwc(file);
    // ffff '?' 0 0
    fprintf(stdout, "%04x '%lc' %d %d\n", wc, wc, ferror(file), feof(file));
    return 0;
}
It prints ffff instead of 2013. ferror() and feof() return false.
test.txt:
–
It's encoded as E2 80 93
For reading the UTF-8 file, optionally drop the setlocale call, and replace the fopen line with:
file = fopen("test.txt", "r, ccs=utf-8");
The fopen documentation states:
ccs=encoding -- Specifies the encoded character set to use (one of UTF-8, UTF-16LE, or UNICODE) for this file. Leave unspecified if you want ANSI encoding.
This appears to imply that the ccs=UTF-8 encoding must be specified explicitly in order to read a file as UTF-8 text.
Though, on the other hand, "ANSI" used to mean either the active codepage, or the system default locale. With the recent support in Windows 10 1903 and later for UTF-8 as an active codepage, it would be expected that "ANSI encoding" be the same as "UTF-8 encoding" when the current locale is UTF-8. However, that does not seem to be the case with the current implementation of the UCRT.
For writing the wide char, #include <io.h> and <fcntl.h>, and replace the fprintf line with:
_setmode(_fileno(stdout), _O_U16TEXT);
wprintf(L"%04x '%wc' %d %d\n", wc, wc, ferror(file), feof(file));
The printf documentation states:
wprintf is a wide-character version of printf; format is a wide-character string. wprintf and printf behave identically if the stream is opened in ANSI mode. printf does not currently support output into a UNICODE stream.
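Putting both changes together, here is a minimal sketch of the adjusted test program (my own combination of the two edits above, relying on the MSVC-specific "ccs=" open-mode flag and _setmode):
#include <stdio.h>
#include <wchar.h>
#include <io.h>
#include <fcntl.h>

int main(void)
{
    // Open the file in UTF-8 translation mode (MSVC-specific "ccs=" flag).
    FILE* file = fopen("test.txt", "r, ccs=UTF-8");
    if (file == NULL)
        return 1;

    wint_t wc = fgetwc(file);

    // Switch stdout to UTF-16 text mode so the wide character is not mangled.
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"%04x '%lc' %d %d\n", wc, wc, ferror(file), feof(file));

    fclose(file);
    return 0;
}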

Linux, field_buffer does not provide a UTF-8 string

In a C program for Linux, with ncursesw and form, I need to read the string stored in a field, with support for UTF-8 characters. When ASCII only is used, it is pretty simple, because the string is stored as an array of char:
char *dest;
...
dest = field_buffer(field[0], 0);
If I try to type a non-ASCII UTF-8 character into the field with this code, the character does not appear and is not handled. In this answer it is suggested to use ncursesw for UTF-8. But with the following code (written following this guide)
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <locale.h>

int main()
{
    ...
    setlocale(LC_ALL, "");
    ...
    initscr();
    wchar_t *dest;
    ...
    dest = field_buffer(field[0], 0);
}
the compiler produces a warning:
warning: assignment from incompatible pointer type [enabled by default]
dest = field_buffer(field[0], 0);
^
How can I obtain an array of wchar_t from the field?
ncursesw uses get_wch instead of getch, so which function does it use instead of field_buffer()? I couldn't find it by googling.
The program is compiled in a system with the following locale:
$ locale
LANG=it_IT.UTF-8
LANGUAGE=
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=
It supports and uses UTF-8 as a default. With a locale like this, when the ncursesw environment is used, the C program should be able to save UTF-8 characters into a char array.
In order to set up ncursesw correctly, it is very important to follow all the steps of the mentioned guide. In particular, the program should include the headers
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <stdio.h>
#include <locale.h>
The program should be compiled as
gcc -o executable_file source_file.c -lncursesw -lformw
and the program should contain
setlocale(LC_ALL, "");
before initscr();. With all these conditions satisfied, the string can be saved into a normal char array, just as if ncurses and ASCII were used instead of ncursesw and UTF-8. As specified by John Bollinger in the comments, the function field_buffer can only return a char *, so it is pointless to use any other data type such as wchar_t.
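If a wchar_t version of the text is still needed afterwards, the multibyte (UTF-8) string returned by field_buffer can be converted with the standard mbstowcs function. A minimal sketch (my addition, assuming setlocale(LC_ALL, "") has already been called; the helper name buffer_to_wide is made up for illustration):
#include <stdlib.h>
#include <wchar.h>

/* Convert the multibyte buffer returned by field_buffer() into a
 * newly allocated wide-character string; returns NULL on failure. */
wchar_t *buffer_to_wide(const char *mb)
{
    size_t len = mbstowcs(NULL, mb, 0);    /* length in wide characters */
    if (len == (size_t)-1)
        return NULL;                       /* invalid multibyte sequence */

    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    if (wide != NULL)
        mbstowcs(wide, mb, len + 1);
    return wide;
}
The caller would use it as wide = buffer_to_wide(field_buffer(field[0], 0)); and free the result when done, keeping in mind that field_buffer pads the string with trailing blanks.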

Odd strtok behavior

I've been trying to use strtok in order to write a polynomial differentiation program, but it seems to be behaving oddly. At this point I've told it to stop at the characters ' ', [, ], (, and ). But for some reason, when passed input such as "Hello[]", it returns "Hello\n".
Is there anything wrong with my code here? All the polynomial string contains is the text "Hello[]".
void differentiate(char* polynomial)
{
    char current[10];
    char output[100];
    strncpy(current, strtok(polynomial, " []()/\n"), 10);
    printf("%s", current);
} // differentiate()
EDIT: It appears to be an issue related to the shell, and it would also appear not to be a newline after all: when I use bash it does not occur, but when I use fish, I get the following:
I've never seen this kind of thing before, does anyone have any advice? Is this just a quirk of fish?
I converted your code into this SSCCE (Short, Self-Contained, Correct Example):
#include <string.h>
#include <stdio.h>

static
void differentiate(char* polynomial)
{
    char current[10];
    strncpy(current, strtok(polynomial, " []()/\n"), 10);
    printf("<<%s>>\n", current);
}

int main(void)
{
    char string[] = "Hello[]";
    printf("Before: <<%s>>\n", string);
    differentiate(string);
    printf("After: <<%s>>\n", string);
    return 0;
}
Actual output:
Before: <<Hello[]>>
<<Hello>>
After: <<Hello>>
I was testing with GCC 4.8.1 on Mac OS X 10.8.4, but I got the same result with the Apple-supplied GCC (i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)) and clang (Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)).
You should justify your assertion that you got a newline out of strtok() by adapting this test and showing the output. Note how the code uses the << and >> to surround the string it is printing; if there's a newline in there, it will show up inside the double angle brackets.
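One further aside (my note, not part of the original answer): strncpy does not null-terminate current when the token is 10 characters or longer, and strtok can return NULL when the string holds only delimiters. A slightly more defensive variant of the function could look like this:
#include <stdio.h>
#include <string.h>

static void differentiate(char *polynomial)
{
    char current[10];
    const char *token = strtok(polynomial, " []()/\n");  /* may be NULL */

    if (token != NULL) {
        /* Copy at most 9 characters and terminate explicitly,
         * since strncpy does not terminate a full buffer. */
        strncpy(current, token, sizeof(current) - 1);
        current[sizeof(current) - 1] = '\0';
        printf("<<%s>>\n", current);
    }
}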

Accepting single-byte special characters in the Windows terminal (testing on Cygwin)

I am testing a C program in the Windows terminal. I mocked up a quick example of the section I am having issues with. The example is as follows:
$ cat test.c
#include <stdio.h>
#include <stdlib.h>

int main() {
    char var[6];
    scanf("%s", var);
    int i = 0;
    while (var[i] != '\0') {
        printf("%x ", var[i]);
        i++;
    }
    return 0;
}
When I use a string with "normal" characters such as "aa", the output is as expected: "61 61" (hex 61 is the letter "a"). When I try to input special characters such as í (0xA1 or U+00ED), I get the following output:
$ ./a.exe
í
ffffffc3 ffffffad
The UTF-8 chart at http://www.utf8-chartable.de/ shows that í is in fact encoded as 0xC3 0xAD. How can I copy and paste this character as 0xA1? I really want to input 0xA1 into the terminal, not 0xC3 0xAD. I am copying and pasting it from "charmap". I even tried saving a text file in ANSI with the character and copying and pasting it, but I still get 0xC3 0xAD. Please assist me.
EDIT: Running the same on a Mac also gives me C3 AD.
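One detail independent of the copy-and-paste question (my observation, not from the original post): the ffffff prefixes come from char being signed on this platform, so each byte is sign-extended when promoted for printf. Casting to unsigned char makes the dump show the plain UTF-8 bytes:
#include <stdio.h>

int main(void) {
    char var[6];
    if (scanf("%5s", var) != 1)   /* width limited to avoid overflowing var */
        return 1;

    int i = 0;
    while (var[i] != '\0') {
        /* Cast so %x prints c3 ad rather than ffffffc3 ffffffad. */
        printf("%02x ", (unsigned char)var[i]);
        i++;
    }
    return 0;
}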
