Adding Unicode/UTF8 chars to a ncurses display in C

I'm attempting to add wchar_t Unicode characters to an ncurses display in C.
I have an array:
wchar_t characters[]={L'\u0E30', L'\u0E29'}; // containing 2 Thai letters, for example
And I later try to add a wchar_t from the array to the ncurses display with:
add_wch(characters[0]);
To provide a bit more info, doing this with ASCII works ok, using:
char characters[]={'A', 'B'};
// and later...
addch(characters[0]);
To set up the locale, I add the include...
#include <locale.h>
// in main()
setlocale(LC_CTYPE,"C-UTF-8");
The ncurses include is:
#include <ncurses.h>
Compiling with :
(edit: added c99 standard, for universal char name support.)
gcc -o ncursesutf8 ncursesutf8.c -lm -lncurses -Wall -std=c99
I get the following compilation warning (of course the executable will fail):
ncursesutf8.c:48: warning: implicit declaration of function ‘add_wch’
I've tried just using addch, which appears to be macro'ed to work with wchar_t, but when I do that the Unicode chars do not show up; they show as ASCII chars instead.
Any thoughts?
I am using OS X Snow Leopard, 10.6.6
Edit: removed error on wchar_t [] assignment to use L'\u0E30' instead of L"\u0E30" etc.
I've also updated the compiler settings to use C99 (to add universal char name support). Neither change fixes the problem.
Still no answers on this. Does anyone know how to do Unicode ncurses addchar (add_wchar?)? Help!

The wide character support is handled by ncursesw. Depending on your distro, ncurses may or may not point there (seemingly not in yours).
Try using -lncursesw instead of -lncurses.
Also, for the locale, try calling setlocale(LC_ALL, "")

This is not 2 characters:
wchar_t characters[]={L"\uE030", L"\uE029"};
You're trying to initialize wchar_t (integer) values with pointers, which should result in an error from the compiler. Either use:
wchar_t characters[]={L'\uE030', L'\uE029'};
or
wchar_t characters[]=L"\uE030\uE029";

cchar_t is defined as:
typedef struct {
attr_t attr;
wchar_t chars[CCHARW_MAX];
} cchar_t;
so you might try:
int add_wchar(int c)
{
cchar_t t = {
0, // .attr
{c, 0} // not sure how .chars works, so best guess
};
return add_wch(&t); // add_wch takes a pointer to the cchar_t
}
not at all tested, but should work.

Did you define _XOPEN_SOURCE_EXTENDED before including the ncurses header?

Related

Defining _POSIX_C_SOURCE as 2 causes error when changing code page on Windows CMD with MinGW GCC

I've been writing a Linux program that's meant to write non-English characters to the terminal, and I've recently been porting it to Windows. I've run into an issue when changing the code page and the font of the terminal: having the symbolic constant _POSIX_C_SOURCE defined beforehand seems to change the behavior of the code and makes it incapable of properly printing non-English characters. For reference, this is my code:
#include <windows.h>
#include <stdio.h>
int main()
{
SetConsoleCP(CP_UTF8);
SetConsoleOutputCP(CP_UTF8);
HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
CONSOLE_FONT_INFOEX cfie;
ZeroMemory(&cfie, sizeof(cfie));
cfie.cbSize = sizeof(cfie);
lstrcpyW(cfie.FaceName, L"Lucida Console");
SetCurrentConsoleFontEx(hStdOut, 0, &cfie);
printf("Ћирилични текст\n");
return 0;
}
This is what the program prints out depending on whether I do or don't define the constant in a command line argument while compiling.
C:\Users\User\Desktop>gcc test.c
C:\Users\User\Desktop>a.exe
Ћирилични текст
C:\Users\User\Desktop>gcc -D_POSIX_C_SOURCE=2 test.c
C:\Users\User\Desktop>a.exe
������������������ ����������
This is because, when POSIX compliance is in effect, output to standard output is done literally byte by byte: printf uses a different internal implementation in that mode.

using c11 standard with clang for use of strcpy_s

I'm running OS X Sierra and trying to compile a C program that uses strcpy_s. My installed clang compiler is using the C99 standard, but from what I've read, strcpy_s requires C11.
Here's the code I'm trying to compile
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char source[] = "Test string";
char destination[50];
if(strcpy_s(destination, sizeof(destination), source))
printf("string copied - %s",destination);
return 0;
}
And here's the command I'm using to compile
$ clang copytest.c -o copytest
copytest.c:11:5: warning: implicit declaration of function 'strcpy_s' is invalid in C99 [-Wimplicit-function-declaration]
if(strcpy_s(copied_string, sizeof(copied_string), source))
^
1 warning generated.
Undefined symbols for architecture x86_64:
"_strcpy_s", referenced from:
_main in copytest-e1e05a.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
I've tried compiling with the standard flag...
clang -std=c11 copytest.c -o copytest
but I get the same exact "invalid in c99" warning. I've also tried compiling with gcc instead, and I still get the same c99 warning.
I tried upgrading via homebrew which shows the following
Warning: gcc 9.2.0 is already installed and up-to-date
I have clang version 9.0.0
$ clang -v
Apple LLVM version 9.0.0 (clang-900.0.39.2)
My xcode version is Xcode 9.2, which from everything I've read should come with c11 support.
Am I doing something wrong with the compiling, or is my code itself incorrect? This is the only similar question I found on here, but it didn't even have an answer. Thanks
The _s functions are an optional component of the 2011 C standard (Annex K), and, to the best of my knowledge, they have never been implemented as an integrated part of any C library. Portable code cannot rely on their availability. (Microsoft's C compilers for Windows implement an overlapping set of functions with the same names but different semantics (and sometimes even a different argument list), and at least one bolt-on implementation does exist. See this old answer, and the much longer question and answer it links to, for more detail.)
Also, the _s functions do not solve the problem that they were intended to solve (unsafe string handling); it is necessary to put actual thought into a proper fix for each use of strcpy, instead of globally search-and-replacing strcpy with strcpy_s, etc., as was the hope of the authors of Annex K. If you do put appropriate amounts of thought into a proper fix, you won't need any of the _s functions to implement it. For instance, here's a fixed version of your example program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char source[] = "Test string";
char destination[50];
size_t srclen = strlen(source);
if (srclen + 1 > sizeof destination) {
fprintf(stderr, "string too long to copy - %zu bytes, need %zu\n",
sizeof destination, srclen + 1);
return 1;
} else {
memcpy(destination, source, srclen + 1);
printf("string copied - %s\n", destination);
return 0;
}
}
And here's an even better version:
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* declares strdup */
int main(int argc, char **argv)
{
if (argc != 2) {
fprintf(stderr, "usage: ./test 'message of arbitrary length'\n");
return 1;
}
char *destination = strdup(argv[1]);
if (!destination) {
perror("strdup");
return 1;
}
printf("string copied - '%s'\n", destination);
free(destination);
return 0;
}
Therefore: Never use any of the _s functions. If you need to write a program that compiles on Windows with no warnings, put #define _CRT_SECURE_NO_WARNINGS 1 at the top of each file to make MSVC stop giving you bad advice.
If all, or even most, programmers wrote the suggested solutions above all the time, then these functions wouldn't be needed. We have a lot of evidence that many programmers do not write such careful code, going back to Spaf's notes on the Robert T Morris finger worm in the late 1980's.
You also would prefer not to have to duplicate 10 lines of code for every call site of strcpy. That leads to unreadable code. Moreover, what zwol suggests is really just an implementation of the function he claims we don't need. A good programmer would take that, stick it in a header, and name it something helpful, maybe checked_strcpy? Or even strcpy_s?
The second suggested implementation, which is purportedly better, is not: it causes an allocation when we might already have a buffer. Allocations are expensive, and using this approach everywhere would be bad for performance. It also introduces new complexity, because now we'd have to free every duplicated string; imagine doing that with repeated calls to strcat.
There is a fairly nicely done cross-platform implementation here:
https://github.com/intel/safestringlib
I'm also not sure whether this is actually any different, but worth taking a look - https://github.com/coruus/safeclib

Check if a system implements a function

I'm creating a cross-system application. It uses, for example, the function itoa, which is implemented on some systems but not all. If I simply provide my own itoa implementation, compilation fails on systems that already declare it:
header.h:115:13: error: conflicting types for 'itoa'
extern void itoa(int, char[]);
In file included from header.h:2:0,
from file.c:2:0,
c:\path\to\mingw\include\stdlib.h:631:40: note: previous declaration of 'itoa' was here
_CRTIMP __cdecl __MINGW_NOTHROW char* itoa (int, char*, int);
I know I can check if macros are predefined and define them if not:
#ifndef _SOME_MACRO
#define _SOME_MACRO 45
#endif
Is there a way to check if a C function is pre-implemented, and if not, implement it? Or to simply un-implement a function?
Given you have already written your own implementation of itoa(), I would recommend that you rename it and use it everywhere. At least you are sure you will get the same behavior on all platforms, and avoid the linking issue.
Don't forget to explain your choice in the comments of your code...
I assume you are using GCC, as I can see MinGW in your path... there's one way the GNU linker can take care of this for you. So you don't know whether there is an itoa implementation or not. Try this:
Create a new file (without any headers) called my_itoa.c:
char *itoa (int, char *, int);
char *my_itoa (int a, char *b, int c)
{
return itoa(a, b, c);
}
Now create another file, impl_itoa.c. Here, write the implementation of itoa but mark it as a weak symbol:
char* __attribute__ ((weak)) itoa(int a, char *b, int c)
{
// implementation here
}
Compile all of the files, with impl_itoa.c at the end.
This way, if itoa is not available in the standard library, this one will be linked. You can be confident about it compiling whether or not it's available.
Ajay Brahmakshatriya's suggestion is a good one, but unfortunately MinGW didn't support weak definitions last I checked (see https://groups.google.com/forum/#!topic/mingwusers/44B4QMPo8lQ, for instance).
However, I believe weak references do work in MinGW. Take this minimal example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
__attribute__ ((weak)) char* itoa (int, char*, int);
char* my_itoa (int a, char* b, int c)
{
if(itoa != NULL) {
return itoa(a, b, c);
} else {
// toy implementation for demo purposes
// replace with your own implementation
strcpy(b, "no itoa");
return b;
}
}
int main()
{
char *str = malloc((sizeof(int)*3+1));
my_itoa(10, str, 10);
printf("str: %s\n", str);
return 0;
}
If the system provides an itoa implementation, that should be used and the output would be
str: 10
Otherwise, you'll get
str: no itoa
There are two really important related points worth making here along the "don't do it like this" lines:
Don't use itoa because it's not safe.
Don't use itoa because it's not a standard function, and there are good standard functions (such as snprintf) which are available to do what you want.
But, putting all this aside for one moment, I want to introduce you to autoconf, part of the GNU build system. autoconf is part of a very comprehensive, very portable set of tools which aim to make it easier to write code which can be built successfully on a wide range of target systems. Some would argue that autoconf is too complex a system to solve just the one problem you pose with just one library function, but as any program grows, it's likely to face more hurdles like this, and getting autoconf set up for your program now will put you in a much stronger position for the future.
Start with a file called Makefile.in which contains:
CFLAGS=--ansi --pedantic -Wall -W
program: program.o
program.o: program.c
clean:
rm -f program.o program
and a file called configure.ac which contains:
AC_PREREQ([2.69])
AC_INIT(program, 1.0)
AC_CONFIG_SRCDIR([program.c])
AC_CONFIG_HEADERS([config.h])
# Checks for programs.
AC_PROG_CC
# Checks for library functions.
AH_TEMPLATE([HAVE_ITOA], [Set to 1 if function itoa() is available.])
AC_CHECK_FUNC([itoa],
[AC_DEFINE([HAVE_ITOA], [1])]
)
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
and a file called program.c which contains:
#include <stdio.h>
#include "config.h"
#ifndef HAVE_ITOA
/*
* WARNING: This code is for demonstration purposes only. Your
* implementation must have a way of ensuring that the size of the string
* produced does not overflow the buffer provided.
*/
void itoa(int n, char* p) {
sprintf(p, "%d", n);
}
#endif
int main(void) {
char buffer[100];
itoa(10, buffer);
printf("Result: %s\n", buffer);
return 0;
}
Now run the following commands in turn:
autoheader: This generates a new file called config.h.in which we'll need later.
autoconf: This generates a configuration script called configure.
./configure: This runs some tests, including checking that you have a working C compiler and, because we've asked it to, whether an itoa function is available. It writes its results into the file config.h for later.
make: This compiles and links the program.
./program: This finally runs the program.
During the ./configure step, you'll see quite a lot of output, including something like:
checking for itoa... no
In this case, you'll see that the config.h file contains the following lines:
/* Set to 1 if function itoa() is available. */
/* #undef HAVE_ITOA */
Alternatively, if you do have itoa available, you'll see:
checking for itoa... yes
and this in config.h:
/* Set to 1 if function itoa() is available. */
#define HAVE_ITOA 1
You'll see that the program can now read the config.h header and choose to define itoa if it's not present.
Yes, it's a long way round to solve your problem, but you've now started using a very powerful tool which can help you in a great number of ways.
Good luck!

Linux, field_buffer does not provide a UTF-8 string

In a C program for Linux, with ncursesw and form, I need to read the string stored in a field, with support for UTF-8 characters. When ASCII only is used, it is pretty simple, because the string is stored as an array of char:
char *dest;
...
dest = field_buffer(field[0], 0);
If I try to type a non-ASCII UTF-8 character in the field with this code, the character does not appear and is not handled. This answer on UTF-8 suggests using ncursesw. But with the following code (written following this guide)
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <locale.h>
int main()
{
...
setlocale(LC_ALL, "");
...
initscr();
wchar_t *dest;
...
dest = field_buffer(field[0], 0);
}
the compiler produces a warning:
warning: assignment from incompatible pointer type [enabled by default]
dest = field_buffer(field[0], 0);
^
How to obtain from the field an array of wchar_t?
ncursesw uses get_wch instead of getch, so which function does it use instead of field_buffer()? I couldn't find it by googling.
The program is compiled in a system with the following locale:
$ locale
LANG=it_IT.UTF-8
LANGUAGE=
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=
It supports and uses UTF-8 as a default. With a locale like this, when the ncursesw environment is used, the C program should be able to save UTF-8 characters into a char array.
In order to correctly set up ncursesw it is very important to follow all the steps of the mentioned guide. In particular, the program should have the header
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <stdio.h>
#include <locale.h>
The program should be compiled as
gcc -o executable_file source_file.c -lncursesw -lformw
and the program should contain
setlocale(LC_ALL, "");
before initscr();. With all these conditions satisfied, the string can be saved into a normal char array, as if ncurses and ASCII were used instead of ncursesw and UTF-8. As specified by John Bollinger in the comments, the function field_buffer can only return a char *, so there is no point in using any other data type such as wchar_t.

The wcscoll function, is marked as poisoned, what do I do?

On Mac OS X 10.6.8 I can't compile code using wchar_t functions from the standard library until I have resolved this.
The wcscoll function, together with a bunch of others, is poisoned in the system headers:
inttypes.h:#pragma GCC poison wcstoimax wcstoumax
stdlib.h:#pragma GCC poison mbstowcs mbtowc wcstombs wctomb
wchar.h:#pragma GCC poison fgetws fputwc fputws fwprintf fwscanf mbrtowc mbsnrtowcs mbsrtowcs putwc putwchar swprintf swscanf vfwprintf vfwscanf vswprintf vswscanf vwprintf vwscanf wcrtomb wcscat wcschr wcscmp wcscoll wcscpy wcscspn wcsftime wcsftime wcslcat wcslcpy wcslen wcsncat wcsncmp wcsncpy wcsnrtombs wcspbrk wcsrchr wcsrtombs wcsspn wcsstr wcstod wcstof wcstok wcstol wcstold wcstoll wcstoul wcstoull wcswidth wcsxfrm wcwidth wmemchr wmemcmp wmemcpy wmemmove wmemset wprintf wscanf
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <locale.h>
#include <stdlib.h>
#include <errno.h>
int main(void)
{
wchar_t pwcs1[3]={L"Øl"}, pwcs2[3]={L"Ål"};
int n; /* wcscoll returns int; an unsigned size_t could never be negative */
(void)setlocale(LC_ALL, "");
/* set it to zero for checking errors on wcscoll */
errno = 0;
/*
** Let pwcs1 and pwcs2 be two wide character strings to
** compare.
*/
/* n = wcscmp(pwcs1, pwcs2); */
n = wcscoll(pwcs1, pwcs2);
/*
** If errno is set then it indicates some
** collation error.
*/
if (n < 0 ) {
printf("%s\n","Øl mindre en Ål" );
} else if (n == 0) {
printf("%s\n","Øl lik Ål" );
} else {
printf("%s\n","Øl større en Ål" );
}
if(errno != 0){
/* error has occurred... handle error ...*/
}
}
How do I resolve this?
I am a little bit reluctant to mess with the standard library. But I guess I could compile the GNU C library if Apple doesn't have a fix for it? Or are there any other suitable alternatives among libraries for handling wide characters (UTF-8)?
I am porting something ancient, so I really need to use ncurses, and in order to use ncurses, I need wide characters! :)
Edit: The standard include path should, as I understand it, be /usr/include. I have been through the include directories of the SDKs I have, and a grep through the header files there reveals the same poison pragmas, as did the latest tarball from http://opensource.apple.com/tarballs/Libc/
Edit++
In hindsight, those pragmas are there for a reason, and I was looking for alternatives, so right now I am trying to build glibc. I just downloaded it and inspected the headers, which are without any "GCC poison" pragmas.
Having read up a little on the configure file of glibc, I guess that isn't an easy option. I guess I'll have to dissect something that works with UTF-8 and uses ncurses on Mac OS X to figure out how.
It might be that I am just overlooking an easy solution. But ncurses falls back on 7-bit ASCII, and that is my problem. My goal is to render language-specific UTF-8 characters while using ncurses. I need to be able to sort, since the format is "proprietary" with indexing; forking out a system call to sort records is not an option. I also need to know how many code points are in a string, for field editing and for insertion and removal of characters from the display with ncurses.
Thanks!
So far the ICU library looks promising: I think I will pursue a solution with ICU, which as far as I know is shipped with Mac OS X. http://icu-project.org/apiref/icu4c/
