Write Unicode with write() function

Write Unicode with write() function - c

I am doing an exercise where I need to write Unicode on the terminal,
using only write() in <unistd.h>.
I can't use :
putchar
setlocale
printf (in fact the exercise is reproducing printf function)
Any "low level" advice on how to perform that?

As Chris wrote in the comments, you need a terminal (e.g. like xterm on Linux) that understands the Unicode and then you just write them. So by default xterm understands UTF8 and is set to a codepage such that this code will give you a UTF8 Smiley Face (☺).
#include <stdio.h>
#include <unistd.h>
char happy[] = { 0xe2, 0x98, 0xba }; /* U+263A */
int main()
{
write(1, happy, 3);
return 0;
}

Related

Latin Capital Letter 'E' with Circumflex (Ê)

In a C program in Windows 10, I should print the word TYCHÊ on the screen, but I cannot print the letter Ê (Hex code: \xCA):
#include <stdlib.h>
#include <stdio.h>
char *Word;
int main(int argc, char* argv[]){
Word = "TYCH\xCA";
printf("%s", Word);
}
What's wrong?

Windows is a pain when it comes to printing Unicode text, but the following should work with all modern compilers (MSVC 19 or later, g++ 9 or greater) on all modern Windows systems (Windows 10 or greater), in both Windows Console and Windows Terminal:
#include <iostream>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
std::cout << "TYCHÊ" << "\n";
}
Make sure your compiler takes UTF-8 as the input character set. For MSVC 19 you need a flag. I think it is the default for later versions, but I am unsure on that point:
cl /EHsc /W4 /Ox /std:c++17 /utf-8 example.cpp
g++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 example.cpp
EDIT: Dangit, I misread the language tag again. :-(
Here’s some C:
#include <stdio.h>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
printf( "%s\n", "TYCHÊ" );
return 0;
}

You can try with this line
printf("%s%c", Word, 0x2580 + 82);
this can print your Ê.
I used CLion for resolve it, on another IDE it may not give the same result.

In the Windows Command Line you should choose the Code Page 65001:
CHCP 65001
If you want to silently do that directly from the source code:
system("CHCP 65001 > NUL");
In the C source code you should use the <locale.h> standard header.
#include <locale.h>
At the beginning of your program execution you can write:
setlocale(LC_ALL, "");
The empty string "" initializes to the default encoding of the underlying system (that you previously choose to be Unicode).
However, this answer of mine is just a patch, not a solution.
It will help you to print the french characters, at most.
Handling encoding in Windows command line is not straight.
See, for example: Command Line and UTF-8 issues

Let nCurses handle UTF8 character by terminal

Ncurses have whole family of print functions for wide characters: https://linux.die.net/man/3/mvwaddnwstr
The problem with those is that it depends glibc, so if some UTF-8 character were yet not added to glibc those won't be printed properly example is: ✅
The solution I can see is to let ncurses handle displaying the character by "terminal", what I mean by that is that if we would have some sort of print function that would accept hex UTF-8 string and let terminal find correct font like below snippet do:
#include <iostream>
#include <unistd.h>
int main()
{
write(1, "\xe2\x9c\x85", 9);
}
Is is possible with nCurses?

Assigning non-ASCII characters to wide char and printing with printf

How can I assign non-ASCII characters to a wide char and print it to the console? This code down doesn't work:
#include <stdio.h>
int main(void)
{
wchar_t wc = L'ć';
printf("%lc\n", wc);
printf("%ld\n", wc);
return 0;
}
Output:
263
Press [Enter] to close the terminal ...
I'm using MinGW GCC on Windows 7.

You should use wprintf to print wide-character strings:
wprintf(L"%c\n", wc);

I think your calls to printf() fail with an «Illegal byte sequence» error returned in errno, at least that is what happens here on MacOS X with the above example code (and also if using wprintf() instead of printf()). For me it works when I call setlocale(LC_ALL, ""); before the call to printf() so that it stops using the C locale by default:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main(void)
{
wchar_t wc = L'ć';
setlocale(LC_ALL, "");
printf("%lc\n", wc);
return 0;
}
It is unclear what platform/compiler you are on, so YMMV.

use wprintf("%lc\n" ,wc); and you will get your desired output

The wcscoll function, is marked as poisoned, what do I do?

On Mac Os X 10.6.8 I can't compile code using wchar_t functions from the standard library until I have resolved this.
The wcscoll function, together with a bunch of others:
inttypes.h:#pragma GCC poison wcstoimax wcstoumax
stdlib.h:#pragma GCC poison mbstowcs mbtowc wcstombs wctomb
wchar.h:#pragma GCC poison fgetws fputwc fputws fwprintf fwscanf mbrtowc mbsnrtowcs >mbsrtowcs putwc putwchar swprintf swscanf vfwprintf vfwscanf vswprintf vswscanf vwprintf >vwscanf wcrtomb wcscat wcschr wcscmp wcscoll wcscpy wcscspn wcsftime wcsftime wcslcat >wcslcpy wcslen wcsncat wcsncmp wcsncpy wcsnrtombs wcspbrk wcsrchr wcsrtombs wcsspn wcsstr >wcstod wcstof wcstok wcstol wcstold wcstoll wcstoul wcstoull wcswidth wcsxfrm wcwidth >wmemchr wmemcmp wmemcpy wmemmove wmemset wprintf wscanf
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <locale.h>
#include <stdlib.h>
extern int errno;
int main(void)
{
wchar_t pwcs1[3]={L"ØL"}, pwcs2[3]={L"Ål"};
size_t n;
(void)setlocale(LC_ALL, "");
/* set it to zero for checking errors on wcscoll */
errno = 0;
/*
** Let pwcs1 and pwcs2 be two wide character strings to
** compare.
*/
/* n = wcscmp(pwcs1, pwcs2); */
n = wcscoll(pwcs1, pwcs2);
/*
** If errno is set then it indicates some
** collation error.
*/
if (n < 0 ) {
printf("%s\n","Øl mindre en Ål" );
} else if (n == 0) {
printf("%s\n","Øl lik Ål" );
} else {
printf("%s\n","Øl større en Ål" );
}
if(errno != 0){
/* error has occurred... handle error ...*/
}
}
How do I resolve this?
I am a little bit reluctant to mess with the standard library. But I guess I maybe can compile the GNU C library, if Apple hasn't a fix for it? Or is there any other suitable alternatives amongst libraries for handling wide characters (Utf-8).
I am porting something ancient, so I really need to use ncurses, and in order to use ncurses, I need wide characters! :)
Edit: The standard includepath should, as I have understood it be /usr/include. I have been through the include directories of the SDK's I have, and a grep through the header files there reveals the same poison pragma's, as did the latest tarball from http://opensource.apple.com/tarballs/Libc/
Edit++
Hindsightly, those pragmas are there for a reason, and I was looking for alternatives, so right now, I am trying to build glibc, just downloaded, and I have inspected the headers, which are without any "GCC poison" pragmas.
Having read up a little bit, in the configure file of glibc, I guess that isn't an easy option. I guess I'll have to dissect something that works with utf-8 and uses ncurses on mac osX to figure out how.
It might be that I am just overlooking an easy solution. But ncurses falls back on 7-bit ascii, and that is my problem. My goal is to render utf-8 language specific characters, while using ncurses. I need to be able to sort since the format is "propritary" with indexing, forking out a system call to sort records is no option. I also need to be able to know how many codepoints that are in a string of some kind for field-editing, insertion and removal of characters from the display with ncurses.
Thanks!

So far it seems that the ICU library looks promising: I think I will pursue a solution with the ICU library, that as far as I know are shipped with Mac Os X. http://icu-project.org/apiref/icu4c/

Greek letters in Windows Concole

I'm writting a program in C and I want to have Greek characters in the menu when I run it in cmd.exe . Someone said that in order to include Greek characters you have to use a printf that goes something like this:
printf(charset:IS0-1089:uffe);
but they weren't sure.
Does anyone know how to do that?

Assuming Windows, you can:
set your console font to a Unicode TrueType font:
emit the data using an "ANSI" mechanism
This code prints γειά σου:
#include "windows.h"
int main() {
SetConsoleOutputCP(1253); //"ANSI" Greek
printf("\xE3\xE5\xE9\xDC \xF3\xEF\xF5");
return 0;
}
The hex codes represent γειά σου when encoded as windows-1253. If you use an editor that saves data as windows-1253, you can use literals instead. An alternative would be to use either OEM 737 (that really is a DOS encoding) or use Unicode.
I used SetConsoleOutputCP to set the console code page, but you could type the command chcp 1253 prior to running the program instead.

you can print a unicode char characters by using printf like this :
printf("\u0220\n");
this will print Ƞ

I think this might only work if your console supports Greek. Probably what you want to do is to map characters to the Greek, but using ASCII. For C# but same idea in C.
913 to 936 = upper case Greek letters
945 to 968 = lower case Greek letters
Read more at Suite101: Working with the Greek Alphabet and C#: How to Display ASCII Codes Correctly when Creating a C# Application | Suite101.com at this link.

One way to do this is to print a wide string. Unfortunately, Windows needs a bit of non-standard setup to make this work. This code does that setup inside #if blocks.
#include <locale.h>
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
/* This has been reported not to autodetect correctly on tdm-gcc. */
#ifndef MS_STDLIB_BUGS // Allow overriding the autodetection.
# if ( _WIN32 || _WIN64 )
# define MS_STDLIB_BUGS 1
# else
# define MS_STDLIB_BUGS 0
# endif
#endif
#if MS_STDLIB_BUGS
# include <io.h>
# include <fcntl.h>
#endif
void init_locale(void)
// Does magic so that wprintf() can work.
{
// Constant for fwide().
static const int wide_oriented = 1;
#if MS_STDLIB_BUGS
// Windows needs a little non-standard magic.
static const char locale_name[] = ".1200";
_setmode( _fileno(stdout), _O_WTEXT );
#else
// The correct locale name may vary by OS, e.g., "en_US.utf8".
static const char locale_name[] = "";
#endif
setlocale( LC_ALL, locale_name );
fwide( stdout, wide_oriented );
}
int main(void)
{
init_locale();
wprintf(L"μουσάων Ἑλικωνιάδων ἀρχώμεθ᾽\n");
return EXIT_SUCCESS;
}
This has to be saved as UTF-8 with a BOM in order for older versions of Visual Studio to read it properly. Your console also has to be set to a monospaced Unicode font, such as Lucida Console, to display it properly. To mix wide strings in with ASCII strings, the standard defines the %ls and %lc format specifiers to printf(), although I’ve found these don’t work everywhere.
An alternative is to set the console to UTF-8 mode (On Windows, do this with chcp 65001.) and then print the UTF-8 string with printf(u8"μουσάων Ἑλικωνιάδων ἀρχώμεθ᾽\n");. UTF-8 is a second-class citizen on Windows, but that usually works. Try to run that without setting the code page first, though, and you will get garbage.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Write Unicode with write() function - c

I am doing an exercise where I need to write Unicode on the terminal, using only write() in <unistd.h>. I can't use : putchar setlocale printf (in fact the exercise is reproducing printf function) Any "low level" advice on how to perform that?

Related

Latin Capital Letter 'E' with Circumflex (Ê)

Let nCurses handle UTF8 character by terminal

Assigning non-ASCII characters to wide char and printing with printf

The wcscoll function, is marked as poisoned, what do I do?

Greek letters in Windows Concole

Categories

Resources