Linux, field_buffer does not provide a UTF-8 string


In a C program for Linux, with ncursesw and form, I need to read the string stored in a field, with support for UTF-8 characters. When ASCII only is used, it is pretty simple, because the string is stored as an array of char:
char *dest;
...
dest = field_buffer(field[0], 0);
If I type a non-ASCII UTF-8 character into the field with this code, the character does not appear and is not handled. In this answer about UTF-8 it is suggested to use ncursesw. But with the following code (written following this guide)
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <locale.h>
int main()
{
...
setlocale(LC_ALL, "");
...
initscr();
wchar_t *dest;
...
dest = field_buffer(field[0], 0);
}
the compiler produces a warning:
warning: assignment from incompatible pointer type [enabled by default]
dest = field_buffer(field[0], 0);
^
How to obtain from the field an array of wchar_t?
ncursesw uses get_wch instead of getch, so which function does it use instead of field_buffer()? I couldn't find it by googling.

The program is compiled in a system with the following locale:
$ locale
LANG=it_IT.UTF-8
LANGUAGE=
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=
It supports and uses UTF-8 as a default. With a locale like this, when the ncursesw environment is used, the C program should be able to save UTF-8 characters into a char array.
In order to correctly set up ncursesw it is very important to follow all the steps of the mentioned guide. In particular, the program should have the header
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <stdio.h>
#include <locale.h>
The program should be compiled as
gcc -o executable_file source_file.c -lncursesw -lformw
and the program should contain
setlocale(LC_ALL, "");
before initscr();. With all these conditions satisfied, the string can be saved into a normal char array, just as if ncurses and ASCII were used instead of ncursesw and UTF-8. As John Bollinger pointed out in the comments, field_buffer can only return a char *, so there is no point in assigning its result to any other type such as wchar_t *.

Related

Latin Capital Letter 'E' with Circumflex (Ê)

In a C program in Windows 10, I should print the word TYCHÊ on the screen, but I cannot print the letter Ê (Hex code: \xCA):
#include <stdlib.h>
#include <stdio.h>
char *Word;
int main(int argc, char* argv[]){
Word = "TYCH\xCA";
printf("%s", Word);
}
What's wrong?

Windows is a pain when it comes to printing Unicode text, but the following should work with all modern compilers (MSVC 19 or later, g++ 9 or greater) on all modern Windows systems (Windows 10 or greater), in both Windows Console and Windows Terminal:
#include <iostream>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
std::cout << "TYCHÊ" << "\n";
}
Make sure your compiler takes UTF-8 as the input character set. For MSVC 19 you need a flag. I think it is the default for later versions, but I am unsure on that point:
cl /EHsc /W4 /Ox /std:c++17 /utf-8 example.cpp
g++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 example.cpp
EDIT: Dangit, I misread the language tag again. :-(
Here’s some C:
#include <stdio.h>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
printf( "%s\n", "TYCHÊ" );
return 0;
}
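A note on why the original "\xCA" stops working once the console expects UTF-8: 0xCA is the single-byte Latin-1 (and Windows-1252) code for Ê, whereas UTF-8 encodes U+00CA as the two bytes 0xC3 0x8A. The sketch below shows the encoding rule. utf8_encode is a hypothetical helper written for illustration, not a Windows or standard library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-8 encoding of code point cp into
 * out and returns the number of bytes written, or 0 if cp is a surrogate
 * or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xCA, buf) fills buf with 0xC3 0x8A, so with SetConsoleOutputCP(CP_UTF8) in effect the word could also be spelled "TYCH\xC3\x8A" in a source file that is not saved as UTF-8.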

You can try with this line
printf("%s%c", Word, 0x2580 + 82);
This can print your Ê.
I used CLion to test it; with another IDE it may not give the same result.

In the Windows Command Line you should choose the Code Page 65001:
CHCP 65001
If you want to silently do that directly from the source code:
system("CHCP 65001 > NUL");
In the C source code you should use the <locale.h> standard header.
#include <locale.h>
At the beginning of your program execution you can write:
setlocale(LC_ALL, "");
The empty string "" selects the default encoding of the underlying system (which you previously chose to be Unicode).
However, this answer of mine is just a patch, not a solution.
It will help you print the French characters, at most.
Handling encodings in the Windows command line is not straightforward.
See, for example: Command Line and UTF-8 issues

Let nCurses handle UTF8 character by terminal

Ncurses has a whole family of print functions for wide characters: https://linux.die.net/man/3/mvwaddnwstr
The problem with those is that they depend on glibc, so if some UTF-8 character has not yet been added to glibc it won't be printed properly. An example is: ✅
The solution I can see is to let ncurses leave the display of the character to the terminal. What I mean is a print function that accepts the raw UTF-8 bytes and lets the terminal find the correct font, as the snippet below does:
#include <iostream>
#include <unistd.h>
int main()
{
write(1, "\xe2\x9c\x85", 3); // the UTF-8 sequence is 3 bytes long
}
Is it possible with nCurses?

Defining _POSIX_C_SOURCE as 2 causes error when changing code page on Windows CMD with MinGW GCC

I've been writing a Linux program that writes non-English characters to the terminal, and I've recently been porting it to Windows. I've run into an issue when changing the code page and the font of the terminal: having the symbolic constant _POSIX_C_SOURCE defined beforehand seems to change the behavior of the code and makes it incapable of printing non-English characters properly. For reference, this is my code:
#include <windows.h>
#include <stdio.h>
int main()
{
SetConsoleCP(CP_UTF8);
SetConsoleOutputCP(CP_UTF8);
HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
CONSOLE_FONT_INFOEX cfie;
ZeroMemory(&cfie, sizeof(cfie));
cfie.cbSize = sizeof(cfie);
lstrcpyW(cfie.FaceName, L"Lucida Console");
SetCurrentConsoleFontEx(hStdOut, 0, &cfie);
printf("Ћирилични текст\n");
return 0;
}
This is what the program prints out depending on whether I do or don't define the constant in a command line argument while compiling.
C:\Users\User\Desktop>gcc test.c
C:\Users\User\Desktop>a.exe
Ћириличан текст
C:\Users\User\Desktop>gcc -D_POSIX_C_SOURCE=2 test.c
C:\Users\User\Desktop>a.exe
������������������ ����������

This is because, when POSIX compliance is in effect, output to standard output is written literally byte by byte. A different implementation is used inside the printf function.

The wcscoll function, is marked as poisoned, what do I do?

On Mac OS X 10.6.8 I can't compile code that uses the wchar_t functions from the standard library until I have resolved this.
The wcscoll function, together with a bunch of others:
inttypes.h:#pragma GCC poison wcstoimax wcstoumax
stdlib.h:#pragma GCC poison mbstowcs mbtowc wcstombs wctomb
wchar.h:#pragma GCC poison fgetws fputwc fputws fwprintf fwscanf mbrtowc mbsnrtowcs mbsrtowcs putwc putwchar swprintf swscanf vfwprintf vfwscanf vswprintf vswscanf vwprintf vwscanf wcrtomb wcscat wcschr wcscmp wcscoll wcscpy wcscspn wcsftime wcsftime wcslcat wcslcpy wcslen wcsncat wcsncmp wcsncpy wcsnrtombs wcspbrk wcsrchr wcsrtombs wcsspn wcsstr wcstod wcstof wcstok wcstol wcstold wcstoll wcstoul wcstoull wcswidth wcsxfrm wcwidth wmemchr wmemcmp wmemcpy wmemmove wmemset wprintf wscanf
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <locale.h>
#include <stdlib.h>
#include <errno.h>
int main(void)
{
wchar_t pwcs1[3]={L"ØL"}, pwcs2[3]={L"Ål"};
int n; /* wcscoll returns int; an unsigned size_t could never be negative */
(void)setlocale(LC_ALL, "");
/* set it to zero for checking errors on wcscoll */
errno = 0;
/*
** Let pwcs1 and pwcs2 be two wide character strings to
** compare.
*/
/* n = wcscmp(pwcs1, pwcs2); */
n = wcscoll(pwcs1, pwcs2);
/*
** If errno is set then it indicates some
** collation error.
*/
if (n < 0 ) {
printf("%s\n","Øl mindre en Ål" );
} else if (n == 0) {
printf("%s\n","Øl lik Ål" );
} else {
printf("%s\n","Øl større en Ål" );
}
if(errno != 0){
/* error has occurred... handle error ...*/
}
}
How do I resolve this?
I am a little reluctant to mess with the standard library. But I guess I could compile the GNU C library if Apple doesn't have a fix for it? Or are there any other suitable alternatives among libraries for handling wide characters (UTF-8)?
I am porting something ancient, so I really need to use ncurses, and in order to use ncurses, I need wide characters! :)
Edit: The standard include path should, as I understand it, be /usr/include. I have been through the include directories of the SDKs I have, and a grep through the header files there reveals the same poison pragmas, as did the latest tarball from http://opensource.apple.com/tarballs/Libc/
Edit++
In hindsight, those pragmas are there for a reason, and I was looking for alternatives. So right now I am trying to build glibc, which I just downloaded; I have inspected its headers, and they have no "GCC poison" pragmas.
Having read up a little on the configure file of glibc, I guess that isn't an easy option. I guess I'll have to dissect something that works with UTF-8 and uses ncurses on Mac OS X to figure out how.
It might be that I am just overlooking an easy solution. But ncurses falls back on 7-bit ASCII, and that is my problem. My goal is to render language-specific UTF-8 characters while using ncurses. I need to be able to sort: since the format is "proprietary" with indexing, forking out a system call to sort the records is not an option. I also need to know how many code points there are in a string for field editing, that is, for inserting and removing characters from the display with ncurses.
Thanks!

So far the ICU library looks promising: I think I will pursue a solution with it. As far as I know, it ships with Mac OS X. http://icu-project.org/apiref/icu4c/
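For the narrower need of knowing how many code points a UTF-8 string holds, none of the poisoned wide-character functions is required. The following is a minimal sketch, assuming the input is valid UTF-8; utf8_codepoints is a hypothetical name, not an ICU or libc function. Note that it counts code points, not display columns or user-perceived characters.

```c
#include <stddef.h>

/* Hypothetical helper: counts the code points in a NUL-terminated UTF-8
 * string. Every code point starts with exactly one byte that is not a
 * continuation byte (10xxxxxx), so only those bytes are counted.
 * Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For instance, the string "Øl" occupies three bytes in UTF-8 but yields a count of 2.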

Adding Unicode/UTF8 chars to a ncurses display in C

I'm attempting to add wchar_t Unicode characters to an ncurses display in C.
I have an array:
wchar_t characters[]={L'\uE030', L'\uE029'}; // containing 2 thai letters, for example
And I later try to add a wchar_t from the array to the ncurses display with:
add_wch(characters[0]);
To provide a bit more info, doing this with ASCII works ok, using:
char characters[]={'A', 'B'};
// and later...
addch(characters[0]);
To setup the locale, I add the include...
#include <locale.h>
// in main()
setlocale(LC_CTYPE,"C-UTF-8");
The ncurses include is:
#include <ncurses.h>
Compiling with :
(edit: added c99 standard, for universal char name support.)
gcc -o ncursesutf8 ncursesutf8.c -lm -lncurses -Wall -std=c99
I get the following compilation warning (of course the executable will fail):
ncursesutf8.c:48: warning: implicit declaration of function ‘add_wch’
I've tried just using addch, which appears to be macro'ed to work with wchar_t, but when I do that the Unicode chars do not show up; they show as ASCII chars instead.
Any thoughts?
I am using OS X Snow Leopard, 10.6.6
Edit: removed error on wchar_t [] assignment to use L'\u0E30' instead of L"\u0E30" etc.
I've also updated the compiler settings to use C99 (to add universal char name support). both changes do not fix the problem.
Still no answers on this, does anyone know how to do Unicode ncurses addchar (add_wchar?) ?! Help!

The wide character support is handled by ncursesw. Depending on your distro, ncurses may or may not point there (seemingly not in yours).
Try using -lncursesw instead of -lncurses.
Also, for the locale, try calling setlocale(LC_ALL, "")

This is not 2 characters:
wchar_t characters[]={L"\uE030", L"\uE029"};
You're trying to initialize wchar_t (integer) values with pointers, which should result in an error from the compiler. Either use:
wchar_t characters[]={L'\uE030', L'\uE029'};
or
wchar_t characters[]=L"\uE030\uE029";

cchar_t is defined as:
typedef struct {
attr_t attr;
wchar_t chars[CCHARW_MAX];
} cchar_t;
so you might try:
int add_wchar(int c)
{
cchar_t t = {
0, // .attr
{c, 0} // not sure how .chars works, so best guess
};
return add_wch(&t); // add_wch takes a pointer to a cchar_t
}
not at all tested, but should work.

Did you define _XOPEN_SOURCE_EXTENDED before including the ncurses header?
