I have this assignment in C, which requires some Greek sentences to be printed in the terminal.
In the code template that is given to us there is this line of code:
system("chcp 1253>nul");
This is supposed to print the Greek characters.
In my Ubuntu Terminal I see:
�������� ����� �� ����� ����� ��� �������� ���� ������
So, how can I print Greek characters in my terminal?
This is supported out of the box in most Linuxes. The only thing one must do is use
setlocale(LC_ALL, "");
at the beginning of the program. This relies on UTF-8 being the default encoding of users' locales on those systems. The standard says that this call switches to the user's current locale. Without it, the program runs in the "C" locale, which may or may not support national characters.
By default gcc interprets the source code as encoded in UTF-8. Compile-time options exist to change that, but it is recommended to keep everything in UTF-8 on Linux. Sources that come from Windows are probably not encoded in UTF-8 and need to be recoded. Use the iconv utility for that. If the source is associated with a particular legacy code page, try that code page name as the source encoding. For the template in the question, which selects code page 1253, that would be CP1253.
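If you would rather recode text from inside a C program than with the command-line utility, the POSIX iconv(3) interface performs the same conversion. A minimal sketch, assuming the input is in code page 1253 and that your iconv implementation knows it under the name "CP1253"; recode_to_utf8 is a made-up helper name, not a library function:

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated string `in` from the encoding named `from`
 * (for example "CP1253") to UTF-8, writing at most `outsize` bytes
 * including the terminator. Returns 0 on success, -1 on failure.
 * recode_to_utf8 is a hypothetical helper, not a library function. */
int recode_to_utf8(const char *from, const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;                  /* encoding name unknown to this iconv */

    char  *src      = (char *)in;   /* iconv() takes a non-const pointer */
    size_t src_left = strlen(in);
    char  *dst      = out;
    size_t dst_left = outsize - 1;  /* reserve room for the terminator */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                  /* invalid input or buffer too small */

    *dst = '\0';
    return 0;
}
```

For example, recode_to_utf8("CP1253", "\xE3\xE5\xE9\xDC", buf, sizeof buf) converts the four windows-1253 bytes of γειά into their two-byte UTF-8 forms. On systems where iconv is not part of the C library you may need to link with -liconv.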
A C program (conforming to ISO C99 or later, or POSIX.1 or later) that inputs or outputs non-ASCII text should use wide strings, wide I/O, and localization.
For example:
#include <stdlib.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void)
{
/* Tell the C library to use the current locale settings. */
setlocale(LC_ALL, "");
/* Standard output is used with the wide I/O functions. */
fwide(stdout, 1);
/* Print some Greek text. */
wprintf(L"Γειά σου Κόσμε!\n");
return EXIT_SUCCESS;
}
Note that wide string literals are written using L"..." whereas normal (ASCII or narrow) string literals are written as "...". Similarly, wide character constants (of type wchar_t) are written with the L prefix; for example, L'€'.
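To make the difference between the two kinds of literal concrete, here is a small sketch. It assumes the source file is saved as UTF-8 and compiled with GCC's default settings; narrow_length and wide_length are illustrative names only:

```c
#include <string.h>
#include <wchar.h>

/* Length of the narrow literal in bytes. With a UTF-8 source and
 * execution character set, each of these Greek letters takes two bytes. */
size_t narrow_length(void)
{
    return strlen("σου");
}

/* Length of the wide literal in wchar_t elements: one per letter. */
size_t wide_length(void)
{
    return wcslen(L"σου");
}
```

The narrow version counts bytes and the wide version counts characters, which is why code that indexes, truncates or pads non-ASCII text is usually easier to get right with wide strings.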
When compiling, you do need to tell the compiler what character set the source code uses. In Linux, GCC uses the locale settings, but also provides an option -finput-charset=windows-1252 to change it to Windows Western European, for example.
Rather than fiddle with the flags, I recommend you write a Bash helper script, say to-utf8:
#!/bin/bash
if [ $# -lt 2 ] || [ ":$1" = ":-h" ] || [ ":$1" = ":--help" ]; then
printf '\n'
printf 'Usage: %s [ -h | --help ]\n' "$0"
printf ' %s windows-1252 file.c [ ... ]\n' "$0"
printf '\n'
exit 0
fi
charset="$1"
shift 1
Work=$(mktemp) || exit 1
trap "rm -f '$Work'" EXIT
for src in "$@" ; do
iconv -f "$charset" -t UTF-8//TRANSLIT "$src" > "$Work" || exit $?
sed -e 's|\r$||' "$Work" > "$src" || exit $?
printf '%s: Converted successfully.\n' "$src"
done
exit 0
If you want, you can install that system-wide using
sudo install -o 0 -g 0 -m 0755 to-utf8 /usr/bin/
The first command-line parameter is the source character set (use iconv --list to see them all), followed by a list of files to fix.
The script creates an automatically deleted temporary file. The iconv line converts the character set of each file to UTF-8, saving the result into the temporary file. The sed line changes any CRLF (\r\n) newlines to LF (\n), overwriting the contents of the original file.
(Rather than using a second temporary file to hold the result, sed writes its output directly to the original file, which means the original file keeps its owner and group intact.)
I have a .echo_colors file containing some variables for colors in the following format:
export red="\033[0;31m"
This works fine with echo -e, but I want to use these environment variables in C code. I'm getting the variable via getenv and printing it with printf:
#include <stdlib.h>
#include <stdio.h>
int main(){
char* color = getenv("red");
printf("%s", color);
printf("THIS SHOULD BE IN RED\n");
return 0;
}
with this program, i get
\033[0;31mTHIS SHOULD BE IN RED
The string is just being printed and not interpreted as a color code. printf("\033[0;31m") works and prints output in red as i want to. Any ideas for what to do to correct this problem?
Bash doesn't interpret \033 as "ESC" by default, as evident from hex-dumping the variable, but as "backslash, zero, three, three":
bash-3.2$ export red="\033[0;31m"
bash-3.2$ echo $red | xxd
00000000: 5c30 3333 5b30 3b33 316d 0a \033[0;31m.
You'll need to use a different Bash syntax to export the variable to have it interpret the escape sequence (instead of echo -e doing it):
export red=$'\033[0;31m'
i.e.
bash-3.2$ export red=$'\033[0;31m'
bash-3.2$ echo $red | xxd
00000000: 1b5b 303b 3331 6d0a .[0;31m.
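If you cannot change how the variable is exported, the C program can also expand the escape itself, which is roughly what echo -e does for you. A minimal sketch that handles only a backslash followed by one to three octal digits; unescape_octal is a hypothetical helper, not a standard function:

```c
#include <stddef.h>

/* Copy `in` to `out`, replacing each backslash followed by one to three
 * octal digits (such as \033) with the single byte it denotes. All other
 * characters, including other backslash sequences, are copied unchanged.
 * Returns 0 on success, -1 if `out` (of `outsize` bytes) is too small.
 * unescape_octal is a hypothetical helper, not a standard function. */
int unescape_octal(const char *in, char *out, size_t outsize)
{
    size_t o = 0;

    while (*in != '\0') {
        unsigned value = 0;
        int digits = 0;

        if (in[0] == '\\') {
            /* Collect up to three octal digits after the backslash. */
            while (digits < 3 && in[1 + digits] >= '0' && in[1 + digits] <= '7') {
                value = value * 8 + (unsigned)(in[1 + digits] - '0');
                digits++;
            }
        }

        if (o + 1 >= outsize)
            return -1;              /* no room for this byte plus the NUL */

        if (digits > 0) {
            out[o++] = (char)value; /* emit the decoded byte */
            in += 1 + digits;
        } else {
            out[o++] = *in++;       /* ordinary character */
        }
    }

    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return 0;
}
```

In the program from the question you would check that getenv("red") is not NULL, pass the result through unescape_octal into a local buffer, and print that buffer instead of the raw variable.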
Alternatively, type a literal escape character, ^[ (Control-left-bracket), into the export command in your shell script, as in
$ export red="<ctrl-[>[0;31m"
Ctrl-[ produces the escape character itself. When typing it at a terminal you may first need to press Ctrl-V so that it is inserted literally rather than interpreted as an editing key.
If you do
$ echo "<ctrl-[>[31mTHIS SHOULD BE IN RED"
THIS SHOULD BE IN RED (in red letters)
$ _
you will see the effect.
I've been writing shell scripts (to make things easier to use) but this time i wanted to do it in C, and one command that i used the most is xxd -r, to "patch" a binary file.
Example:
echo "0000050: 2034" | xxd -r - my_binary_file
My question is : is there a way to do something similar in C ?
(I hope my question is clear)
Portably, you could use fopen (with mode "r+b", which opens the existing file for update without truncating it; "w" or "wb" would erase its contents), fseek, and fwrite.
Or, if you prefer POSIX style, open, lseek, and write.
On Win32, the posix equivalents are CreateFile, SetFilePointer, and WriteFile
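The standard-library route can be sketched as follows. This assumes the file already exists and must keep the rest of its contents, hence mode "r+b"; patch_file is a hypothetical helper name:

```c
#include <stdio.h>

/* Overwrite `len` bytes at byte offset `offset` in an existing file,
 * leaving everything else untouched. "r+b" opens the file for update
 * in binary mode without truncating it.
 * Returns 0 on success, -1 on failure.
 * patch_file is a hypothetical helper name. */
int patch_file(const char *path, long offset, const void *data, size_t len)
{
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;

    int ok = fseek(fp, offset, SEEK_SET) == 0
          && fwrite(data, 1, len, fp) == len;

    /* fclose() flushes the buffer, so its result matters too. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

The xxd line from the question, which writes the bytes 0x20 0x34 at offset 0x50, would then correspond to patch_file("my_binary_file", 0x50, "\x20\x34", 2).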
Well you could still use your command and call it in C code using system() function.
system("echo \"0000050: 2034\" | xxd -r - my_binary_file");
NOTE: you can build the above string dynamically, with the file name and parameters, using the sprintf() family of functions and then pass it to system(), as shown below.
#include <stdio.h>
#include <stdlib.h>
int main(void){
char acBuffer[512]; // Size this for the longest command you expect
// snprintf always NUL-terminates and never writes past the buffer
snprintf(acBuffer, sizeof(acBuffer), "echo \"%s\" | xxd -r - %s", "0000050: 2034", "YourBinaryFile");
system(acBuffer); // Check the return value if you need to detect failure
return 0;
}
In a C program for Linux, with ncursesw and form, I need to read the string stored in a field, with support for UTF-8 characters. When ASCII only is used, it is pretty simple, because the string is stored as an array of char:
char *dest;
...
dest = field_buffer(field[0], 0);
If I try to type a UTF-8 and non-ASCII character in the field with this code the character does not appear and it is not handled. In this answer for UTF-8 it is suggested to use ncursesw. But with the following code (written following this guide)
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <locale.h>
int main()
{
...
setlocale(LC_ALL, "");
...
initscr();
wchar_t *dest;
...
dest = field_buffer(field[0], 0);
}
the compiler produces a warning:
warning: assignment from incompatible pointer type [enabled by default]
dest = field_buffer(field[0], 0);
^
How to obtain from the field an array of wchar_t?
ncursesw uses get_wch instead of getch, so which function does it use instead of field_buffer()? I couldn't find it by googling.
The program is compiled in a system with the following locale:
$ locale
LANG=it_IT.UTF-8
LANGUAGE=
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=
It supports and uses UTF-8 as a default. With a locale like this, when the ncursesw environment is used, the C program should be able to save UTF-8 characters into a char array.
In order to correctly set up ncursesw it is very important to follow all the steps of the mentioned guide. In particular, the program should have the header
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <stdio.h>
#include <locale.h>
The program should be compiled as
gcc -o executable_file source_file.c -lncursesw -lformw
and the program should contain
setlocale(LC_ALL, "");
before initscr();. With all these conditions satisfied, the string can be saved into a normal char array, as if ncurses and ASCII were used instead of ncursesw and UTF-8. As John Bollinger pointed out in the comments, field_buffer can only return a char *, so there is no point in assigning its result to any other type such as wchar_t *.
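If you do need wchar_t afterwards, one option is to convert the char * that field_buffer returns yourself. A sketch using mbstowcs, assuming setlocale(LC_ALL, "") has already selected a UTF-8 locale. to_wide is a hypothetical helper, not part of ncurses or the form library, and calling mbstowcs with a NULL destination to measure the length is POSIX behaviour rather than plain ISO C:

```c
#include <stdlib.h>
#include <wchar.h>

/* Convert a multibyte string (UTF-8 under a UTF-8 locale) into a newly
 * allocated wide string. The caller must free() the result.
 * Returns NULL if the input is not valid in the current locale or if
 * memory runs out. to_wide is a hypothetical helper, not part of ncurses. */
wchar_t *to_wide(const char *mb)
{
    /* With a NULL destination, POSIX mbstowcs returns the number of
     * wide characters needed, not counting the terminator. */
    size_t n = mbstowcs(NULL, mb, 0);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    mbstowcs(w, mb, n + 1);     /* converts and NUL-terminates */
    return w;
}
```

In the program above this would be called as something like wchar_t *dest = to_wide(field_buffer(field[0], 0)); followed by free(dest) once the wide string is no longer needed.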
Is btowc(3) locale-dependent? I thought that with LANG=en_US.iso88591 it would return some European chars for bytes between 128 and 255, but it returns WEOF.
$ printf '\xFF\n' | iconv -f iso88591
ÿ
$ LANG=en_US.iso88591 ./a.out
255 -1
_
#include <stdio.h>
#include <wchar.h>
int main(void) {
int i = 0xFF;
printf("%d %d\n", i, (int)btowc(i));
}
On my system anyway, going:
#include <locale.h>
//...
setlocale(LC_CTYPE, "en_US.iso88591");
causes the output to be 255 255. So this indicates that it does seem to be locale-dependent, although the C standard doesn't explicitly say that it is, as far as I can see. (It says that the mbs* function family are locale-dependent , but doesn't say so for btowc).
Your post looks as though you expect the LANG environment variable to select the locale at program startup. That variable may affect how gcc reads your source files, but on its own it has no run-time effect: the C standard says that every program starts up in the "C" locale. The environment is consulted only when the program calls setlocale with an empty string as the locale name, e.g. setlocale(LC_CTYPE, "").
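Putting the two answers together, the locale has to be selected explicitly before btowc is called. A minimal sketch; btowc_in_locale is a hypothetical helper name, and which locale names are accepted depends on what is installed on your system:

```c
#include <locale.h>
#include <wchar.h>

/* Select the character-type locale `name` (an empty string means "take
 * it from the environment", i.e. LANG / LC_CTYPE / LC_ALL), then convert
 * one byte with btowc(). Returns WEOF if the locale cannot be selected
 * or if the byte is not a complete character in that locale.
 * btowc_in_locale is a hypothetical helper name. */
wint_t btowc_in_locale(const char *name, int byte)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return WEOF;
    return btowc(byte);
}
```

Calling btowc_in_locale("", 0xFF) with LANG=en_US.iso88591 in the environment is the run-time behaviour the question was expecting; the result still depends on that locale actually being available.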
I'm writing a program in C and I want to have Greek characters in the menu when I run it in cmd.exe. Someone said that in order to include Greek characters you have to use a printf that goes something like this:
printf(charset:IS0-1089:uffe);
but they weren't sure.
Does anyone know how to do that?
Assuming Windows, you can:
set your console font to a Unicode TrueType font:
emit the data using an "ANSI" mechanism
This code prints γειά σου:
#include <stdio.h>
#include <windows.h>
int main(void) {
SetConsoleOutputCP(1253); // "ANSI" Greek (windows-1253)
printf("\xE3\xE5\xE9\xDC \xF3\xEF\xF5");
return 0;
}
The hex codes represent γειά σου when encoded as windows-1253. If you use an editor that saves data as windows-1253, you can use literals instead. An alternative would be to use either OEM 737 (that really is a DOS encoding) or use Unicode.
I used SetConsoleOutputCP to set the console code page, but you could type the command chcp 1253 prior to running the program instead.
You can print a Unicode character by using printf like this:
printf("\u0220\n");
This will print Ƞ
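I think this might only work if your console supports Greek. Probably what you want to do is to map characters to their Greek code values. The article below is for C#, but the same idea applies in C. Note that these values are Unicode code points rather than ASCII codes:
913 to 937 = upper case Greek letters (930 is unassigned)
945 to 969 = lower case Greek letters (962 is the final sigma, ς)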
Read more at Suite101: Working with the Greek Alphabet and C#: How to Display ASCII Codes Correctly when Creating a C# Application | Suite101.com at this link.
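In C, turning one of those numeric codes into something a UTF-8 console can display means encoding it as UTF-8 bytes. A minimal sketch of that encoding, assuming the terminal expects UTF-8; utf8_encode is a hypothetical helper name:

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into `out`, which must hold at
 * least 5 bytes, and NUL-terminate it. Returns the number of bytes
 * written before the terminator, or 0 for surrogates and for values
 * above U+10FFFF. utf8_encode is a hypothetical helper name. */
size_t utf8_encode(unsigned long cp, char *out)
{
    size_t n;

    if (cp < 0x80) {                        /* 1 byte: plain ASCII */
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {                /* 2 bytes: includes Greek */
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {              /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF) { /* surrogates are not characters */
            out[0] = '\0';
            return 0;
        }
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {            /* 4 bytes */
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        out[0] = '\0';
        return 0;
    }

    out[n] = '\0';
    return n;
}
```

For instance, utf8_encode(945, buf) stores the two bytes 0xCE 0xB1, the UTF-8 form of α, so a loop from 945 upwards that prints buf with printf("%s", buf) would walk through the lower case alphabet on a UTF-8 terminal.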
One way to do this is to print a wide string. Unfortunately, Windows needs a bit of non-standard setup to make this work. This code does that setup inside #if blocks.
#include <locale.h>
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
/* This has been reported not to autodetect correctly on tdm-gcc. */
#ifndef MS_STDLIB_BUGS // Allow overriding the autodetection.
# if ( _WIN32 || _WIN64 )
# define MS_STDLIB_BUGS 1
# else
# define MS_STDLIB_BUGS 0
# endif
#endif
#if MS_STDLIB_BUGS
# include <io.h>
# include <fcntl.h>
#endif
void init_locale(void)
// Does magic so that wprintf() can work.
{
// Constant for fwide().
static const int wide_oriented = 1;
#if MS_STDLIB_BUGS
// Windows needs a little non-standard magic.
static const char locale_name[] = ".1200";
_setmode( _fileno(stdout), _O_WTEXT );
#else
// The correct locale name may vary by OS, e.g., "en_US.utf8".
static const char locale_name[] = "";
#endif
setlocale( LC_ALL, locale_name );
fwide( stdout, wide_oriented );
}
int main(void)
{
init_locale();
wprintf(L"μουσάων Ἑλικωνιάδων ἀρχώμεθ᾽\n");
return EXIT_SUCCESS;
}
This has to be saved as UTF-8 with a BOM in order for older versions of Visual Studio to read it properly. Your console also has to be set to a monospaced Unicode font, such as Lucida Console, to display it properly. To mix wide strings in with ASCII strings, the standard defines the %ls and %lc format specifiers to printf(), although I’ve found these don’t work everywhere.
An alternative is to set the console to UTF-8 mode (On Windows, do this with chcp 65001.) and then print the UTF-8 string with printf(u8"μουσάων Ἑλικωνιάδων ἀρχώμεθ᾽\n");. UTF-8 is a second-class citizen on Windows, but that usually works. Try to run that without setting the code page first, though, and you will get garbage.