Write Unicode with write() function - c

I am doing an exercise where I need to write Unicode on the terminal,
using only write() in <unistd.h>.
I can't use :
putchar
setlocale
printf (in fact the exercise is reproducing printf function)
Any "low level" advice on how to perform that?

As Chris wrote in the comments, you need a terminal (e.g. xterm on Linux) that understands Unicode, and then you just write the encoded bytes to it. By default xterm understands UTF-8 and is configured so that this code will give you a UTF-8 smiley face (☺).
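#include <unistd.h>
/* UTF-8 encoding of U+263A; unsigned char so that values above 0x7F fit */
unsigned char happy[] = { 0xe2, 0x98, 0xba };
int main(void)
{
    write(1, happy, sizeof happy); /* file descriptor 1 is standard output */
    return 0;
}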
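Since the exercise is a printf reimplementation, the bytes usually cannot be hard-coded; they have to be computed from a code point. The following is a minimal sketch of that step, assuming the terminal expects UTF-8. The names utf8_encode and put_codepoint are hypothetical helpers invented for this illustration, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into out[4].
   Returns the number of bytes produced, or 0 if cp is not encodable. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}

/* Hypothetical helper: emit one code point using only write().
   Returns 0 on success, -1 on an invalid code point or a short write. */
int put_codepoint(int fd, uint32_t cp)
{
    unsigned char buf[4];
    size_t n = utf8_encode(cp, buf);
    if (n == 0)
        return -1;
    return write(fd, buf, n) == (ssize_t)n ? 0 : -1;
}
```

Calling put_codepoint(1, 0x263A) sends the same three bytes as the happy[] array above.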

Related

Latin Capital Letter 'E' with Circumflex (Ê)

In a C program in Windows 10, I should print the word TYCHÊ on the screen, but I cannot print the letter Ê (Hex code: \xCA):
#include <stdlib.h>
#include <stdio.h>
char *Word;
int main(int argc, char* argv[]){
Word = "TYCH\xCA";
printf("%s", Word);
}
What's wrong?
Windows is a pain when it comes to printing Unicode text, but the following should work with all modern compilers (MSVC 19 or later, g++ 9 or greater) on all modern Windows systems (Windows 10 or greater), in both Windows Console and Windows Terminal:
#include <iostream>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
std::cout << "TYCHÊ" << "\n";
}
Make sure your compiler takes UTF-8 as the input character set. For MSVC 19 you need a flag. I think it is the default for later versions, but I am unsure on that point:
cl /EHsc /W4 /Ox /std:c++17 /utf-8 example.cpp
g++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 example.cpp
EDIT: Dangit, I misread the language tag again. :-(
Here’s some C:
#include <stdio.h>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
printf( "%s\n", "TYCHÊ" );
return 0;
}
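If the source file cannot be saved as UTF-8, the string from the question ("TYCH\xCA") can be converted at run time instead. This is a rough sketch, assuming the input bytes are ISO-8859-1 (Windows-1252 agrees with it for 0xCA but differs in the range 0x80 to 0x9F). The function name latin1_to_utf8 is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to UTF-8.
   out must have room for 2 * strlen(in) + 1 bytes.
   Returns the number of bytes written, not counting the terminating NUL. */
size_t latin1_to_utf8(const char *in, char *out)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;
        if (c < 0x80) {
            out[n++] = (char)c;                   /* ASCII is unchanged */
        } else {
            out[n++] = (char)(0xC0 | (c >> 6));   /* lead byte: 0xC2 or 0xC3 */
            out[n++] = (char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    out[n] = '\0';
    return n;
}
```

With this, the byte 0xCA becomes the two bytes 0xC3 0x8A, which a console set to code page 65001 should display as Ê.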
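You can try this line:
printf("%s%c", Word, 0x2580 + 82);
It can print your Ê. (%c keeps only the low byte of the argument, 0xD2, which is Ê in OEM code page 850, so the result depends on the console's code page.)
I used CLion to solve it; another IDE may not give the same result.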
In the Windows Command Line you should choose the Code Page 65001:
CHCP 65001
If you want to silently do that directly from the source code:
system("CHCP 65001 > NUL");
In the C source code you should use the <locale.h> standard header.
#include <locale.h>
At the beginning of your program execution you can write:
setlocale(LC_ALL, "");
The empty string "" selects the default encoding of the underlying system (which you previously chose to be Unicode).
However, this answer of mine is just a patch, not a solution.
It will help you print the French characters, at most.
Handling encodings in the Windows command line is not straightforward.
See, for example: Command Line and UTF-8 issues

Let nCurses handle UTF8 character by terminal

Ncurses has a whole family of print functions for wide characters: https://linux.die.net/man/3/mvwaddnwstr
The problem with those is that they depend on glibc, so a UTF-8 character that has not yet been added to glibc won't be printed properly; an example is ✅.
The solution I can see is to let the terminal handle displaying the character. What I mean is a print function that accepts the raw UTF-8 bytes and lets the terminal find the correct glyph, as the snippet below does:
#include <iostream>
#include <unistd.h>
int main()
{
write(1, "\xe2\x9c\x85", 3); /* U+2705 is 3 bytes in UTF-8; a length of 9 would read past the literal */
}
Is it possible with ncurses?

Assigning non-ASCII characters to wide char and printing with printf

How can I assign non-ASCII characters to a wide char and print it to the console? The code below doesn't work:
#include <stdio.h>
int main(void)
{
wchar_t wc = L'ć';
printf("%lc\n", wc);
printf("%ld\n", wc);
return 0;
}
Output:
263
Press [Enter] to close the terminal ...
I'm using MinGW GCC on Windows 7.
You should use wprintf to print wide characters and wide-character strings; the standard conversion specifier for a wchar_t argument is %lc:
wprintf(L"%lc\n", wc);
I think your calls to printf() fail with an «Illegal byte sequence» error returned in errno, at least that is what happens here on MacOS X with the above example code (and also if using wprintf() instead of printf()). For me it works when I call setlocale(LC_ALL, ""); before the call to printf() so that it stops using the C locale by default:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main(void)
{
wchar_t wc = L'ć';
setlocale(LC_ALL, "");
printf("%lc\n", wc);
return 0;
}
It is unclear what platform/compiler you are on, so YMMV.
Use wprintf(L"%lc\n", wc); (note the wide format string) and you will get your desired output.
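To see why the locale matters, it can help to do by hand the conversion that printf("%lc") has to perform: turning the wide character into the locale's multibyte encoding. The sketch below uses the standard wcrtomb function; the wrapper name wide_to_multibyte is hypothetical. In the default "C" locale this conversion commonly fails for a character like ć, which is consistent with the «Illegal byte sequence» error described above, while after a successful setlocale(LC_ALL, "") on a UTF-8 system it would be expected to yield two bytes.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert one wide character to the current locale's
   multibyte encoding. out must hold at least MB_LEN_MAX bytes.
   Returns the number of bytes stored, or -1 if the locale cannot
   represent wc (wcrtomb then sets errno to EILSEQ). */
int wide_to_multibyte(wchar_t wc, char *out)
{
    mbstate_t state;
    size_t n;
    assert(out != NULL);
    memset(&state, 0, sizeof state);   /* start from the initial shift state */
    n = wcrtomb(out, wc, &state);
    return n == (size_t)-1 ? -1 : (int)n;
}
```

A caller would check for -1 before writing the bytes, rather than assuming that every wchar_t value can be printed.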

The wcscoll function, is marked as poisoned, what do I do?

On Mac OS X 10.6.8 I can't compile code that uses the wchar_t functions from the standard library until I have resolved this.
The wcscoll function, together with a bunch of others, is poisoned in the system headers:
inttypes.h:#pragma GCC poison wcstoimax wcstoumax
stdlib.h:#pragma GCC poison mbstowcs mbtowc wcstombs wctomb
wchar.h:#pragma GCC poison fgetws fputwc fputws fwprintf fwscanf mbrtowc mbsnrtowcs mbsrtowcs putwc putwchar swprintf swscanf vfwprintf vfwscanf vswprintf vswscanf vwprintf vwscanf wcrtomb wcscat wcschr wcscmp wcscoll wcscpy wcscspn wcsftime wcsftime wcslcat wcslcpy wcslen wcsncat wcsncmp wcsncpy wcsnrtombs wcspbrk wcsrchr wcsrtombs wcsspn wcsstr wcstod wcstof wcstok wcstol wcstold wcstoll wcstoul wcstoull wcswidth wcsxfrm wcwidth wmemchr wmemcmp wmemcpy wmemmove wmemset wprintf wscanf
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <locale.h>
#include <stdlib.h>
#include <errno.h> /* errno must come from this header, not from a manual extern declaration */
int main(void)
{
wchar_t pwcs1[3]={L"ØL"}, pwcs2[3]={L"Ål"};
int n; /* wcscoll returns int; an unsigned size_t could never be negative */
(void)setlocale(LC_ALL, "");
/* set it to zero for checking errors on wcscoll */
errno = 0;
/*
** Let pwcs1 and pwcs2 be two wide character strings to
** compare.
*/
/* n = wcscmp(pwcs1, pwcs2); */
n = wcscoll(pwcs1, pwcs2);
/*
** If errno is set then it indicates some
** collation error.
*/
if (n < 0 ) {
printf("%s\n","Øl mindre en Ål" );
} else if (n == 0) {
printf("%s\n","Øl lik Ål" );
} else {
printf("%s\n","Øl større en Ål" );
}
if(errno != 0){
/* error has occurred... handle error ...*/
}
}
How do I resolve this?
I am a little reluctant to mess with the standard library, but I guess I could compile the GNU C library if Apple doesn't have a fix for it. Or are there any other suitable libraries for handling wide characters (UTF-8)?
I am porting something ancient, so I really need to use ncurses, and in order to use ncurses, I need wide characters! :)
Edit: The standard include path should, as I understand it, be /usr/include. I have been through the include directories of the SDKs I have, and a grep through the header files there reveals the same poison pragmas, as did the latest tarball from http://opensource.apple.com/tarballs/Libc/
Edit++
In hindsight, those pragmas are there for a reason, and I was looking for alternatives, so right now I am trying to build glibc. I have just downloaded it and inspected the headers, which contain no "GCC poison" pragmas.
Having read up a little on glibc's configure file, I guess that isn't an easy option. I will probably have to dissect something that works with UTF-8 and uses ncurses on Mac OS X to figure out how.
It might be that I am just overlooking an easy solution. But ncurses falls back on 7-bit ASCII, and that is my problem. My goal is to render language-specific UTF-8 characters while using ncurses. I need to be able to sort, since the format is "proprietary" with indexing; forking out a system call to sort records is not an option. I also need to know how many code points are in a string, for field editing and for insertion and removal of characters from the display with ncurses.
Thanks!
So far the ICU library looks promising: I think I will pursue a solution with ICU, which as far as I know ships with Mac OS X. http://icu-project.org/apiref/icu4c/
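For the code point count mentioned above, the wide-character functions are not strictly required if the data is kept as UTF-8 bytes. Here is a minimal sketch, assuming the input is valid UTF-8; the function name utf8_count_codepoints is hypothetical. It counts code points only, which is not the same as user-perceived characters or display columns (combining marks and double-width characters would need more work, for example with ICU).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated string,
   assuming it is valid UTF-8. Every byte that is not a continuation byte
   (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_count_codepoints(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, "Øl" is three bytes in UTF-8 (0xC3 0x98 followed by 'l') but two code points.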

Greek letters in Windows Console

I'm writing a program in C and I want to have Greek characters in the menu when I run it in cmd.exe. Someone said that in order to include Greek characters you have to use a printf that goes something like this:
printf(charset:IS0-1089:uffe);
but they weren't sure.
Does anyone know how to do that?
Assuming Windows, you can:
set your console font to a Unicode TrueType font
emit the data using an "ANSI" mechanism
This code prints γειά σου:
#include <stdio.h>
#include <windows.h>
int main() {
SetConsoleOutputCP(1253); //"ANSI" Greek
printf("\xE3\xE5\xE9\xDC \xF3\xEF\xF5");
return 0;
}
The hex codes represent γειά σου when encoded as windows-1253. If you use an editor that saves data as windows-1253, you can use literals instead. An alternative would be to use either OEM 737 (that really is a DOS encoding) or use Unicode.
I used SetConsoleOutputCP to set the console code page, but you could type the command chcp 1253 prior to running the program instead.
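To illustrate how the windows-1253 bytes above relate to the Unicode alternative, here is a rough sketch that converts such a string to UTF-8. It assumes only ASCII and the letter range 0xC1 to 0xFE (excluding the unassigned byte 0xD2) need handling, where the code point is the byte value plus 0x2D0; any other byte is replaced with '?'. The function name greek1253_to_utf8 is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a NUL-terminated windows-1253 string to UTF-8.
   Assumption: only ASCII and the Greek letter bytes 0xC1..0xFE (except the
   unassigned 0xD2) are mapped; other bytes become '?'.
   out must have room for 2 * strlen(in) + 1 bytes.
   Returns the number of bytes written, not counting the terminating NUL. */
size_t greek1253_to_utf8(const char *in, char *out)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;
        if (c < 0x80) {
            out[n++] = (char)c;                    /* ASCII is unchanged */
        } else if (c >= 0xC1 && c <= 0xFE && c != 0xD2) {
            unsigned cp = c + 0x2D0u;              /* U+0391 .. U+03CE */
            out[n++] = (char)(0xC0 | (cp >> 6));   /* lead byte: 0xCE or 0xCF */
            out[n++] = (char)(0x80 | (cp & 0x3F)); /* continuation byte */
        } else {
            out[n++] = '?';                        /* not handled by this sketch */
        }
    }
    out[n] = '\0';
    return n;
}
```

The converted bytes would then be printed with the console set to code page 65001 rather than 1253.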
You can print a Unicode character by using printf like this:
printf("\u0220\n");
This will print Ƞ.
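I think this might only work if your console supports Greek. Probably what you want to do is to map characters to the Greek letters by their numeric codes (the values below are Unicode code points in decimal rather than ASCII). The source is for C#, but the same idea applies in C.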
913 to 936 = upper case Greek letters
945 to 968 = lower case Greek letters
Read more at Suite101: Working with the Greek Alphabet and C#: How to Display ASCII Codes Correctly when Creating a C# Application | Suite101.com at this link.
One way to do this is to print a wide string. Unfortunately, Windows needs a bit of non-standard setup to make this work. This code does that setup inside #if blocks.
#include <locale.h>
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
/* This has been reported not to autodetect correctly on tdm-gcc. */
#ifndef MS_STDLIB_BUGS // Allow overriding the autodetection.
# if ( _WIN32 || _WIN64 )
# define MS_STDLIB_BUGS 1
# else
# define MS_STDLIB_BUGS 0
# endif
#endif
#if MS_STDLIB_BUGS
# include <io.h>
# include <fcntl.h>
#endif
void init_locale(void)
// Does magic so that wprintf() can work.
{
// Constant for fwide().
static const int wide_oriented = 1;
#if MS_STDLIB_BUGS
// Windows needs a little non-standard magic.
static const char locale_name[] = ".1200";
_setmode( _fileno(stdout), _O_WTEXT );
#else
// The correct locale name may vary by OS, e.g., "en_US.utf8".
static const char locale_name[] = "";
#endif
setlocale( LC_ALL, locale_name );
fwide( stdout, wide_oriented );
}
int main(void)
{
init_locale();
wprintf(L"μουσάων Ἑλικωνιάδων ἀρχώμεθ᾽\n");
return EXIT_SUCCESS;
}
This has to be saved as UTF-8 with a BOM in order for older versions of Visual Studio to read it properly. Your console also has to be set to a monospaced Unicode font, such as Lucida Console, to display it properly. To mix wide strings in with ASCII strings, the standard defines the %ls and %lc format specifiers to printf(), although I’ve found these don’t work everywhere.
An alternative is to set the console to UTF-8 mode (On Windows, do this with chcp 65001.) and then print the UTF-8 string with printf(u8"μουσάων Ἑλικωνιάδων ἀρχώμεθ᾽\n");. UTF-8 is a second-class citizen on Windows, but that usually works. Try to run that without setting the code page first, though, and you will get garbage.

Resources