Print Unicode characters in C, using ncurses

Print Unicode characters in C, using ncurses - c

I have to draw a box in C, using ncurses;
First, I have defined some values for simplicity:
#define RB "\e(0\x6a\e(B" (ASCII 188,Right bottom, for example)
I have compiled with gcc, over Ubuntu, with -finput-charset=UTF-8 flag.
But, if I try to print with addstr or printw, I get the hexa code.
What I`m doing wrong?

ncurses defines the values ACS_HLINE, ACS_VLINE, ACS_ULCORNER, ACS_URCORNER, ACS_LLCORNER and ACS_LRCORNER. You can use those constants in addch and friends, which should result in your seeing the expected box characters. (There's lots more ACS characters; you'll find a complete list in man addch.)
ncurses needs to know what it is drawing because it needs to know exactly where the cursor is all the time. Outputting console control sequences is not a good idea; if ncurses knows how to handle the sequence, it has its own abstraction for the feature and you should use that abstraction. The ACS ("alternate character set") defines are one of those abstractions.

A few issues:
if your program writes something like "\e(0\x6a\e(B" using addstr, then ncurses (any curses implementation) will translate the individual characters to printable form as described in the addch manual page.
ncurses supports line-drawing for commonly-used pseudo-graphics using symbols (such as ACS_HLINE) which are predefined characters with the A_ALTCHARSET attribute combined. You can read about those in the Line Graphics section of the addch manual page.
the code 0x6a is ASCII j, which (given a VT100-style mapping) would be the lower left corner. The curses symbol for that is ACS_LRCORNER.
you cannot write the line-drawing characters with addstr; instead addch, addchstr are useful. There are also functions oriented to line-drawing (see box and friends).
running in Ubuntu, your locale encoding is probably UTF-8. To make your program work properly, it should initialize the locale as described in the Initialization section of the ncurses manual page. In particular:
setlocale(LC_ALL, "");
Also, your program should link against the ncursesw library (-lncursesw) to use UTF-8, rather than just ncurses (-lncurses).
when compiling on Ubuntu, to use the proper header definitions, you should define _GNU_SOURCE.

BTW, maybe I'm probably arriving somewhat late to the party but I'll give you some insight that might or not shed some light and skills for your "box drawing" needs.
As of 2020 I'm involved in a funny project on my own mixing Swift + Ncurses (under OSX for now, but thinking about mixing it with linux). Apparently it works flawlessly.
The thing is, as I'm using Swift, internally it all reduces to "importing .h and .c" files from some Darwin.ncurses library the MacOS Xcode/runtime offers.
That means (I hope) my newly acquired skills might be useful for you because apparently we're using the very same .h and .c files for our ncurses needs. (or at least they should be really similar)
Said that:
As of now, I "ignored" ACS_corner chars (I can't find them under swift/Xcode/Darwin.ncurses runtime !!!) in favour of pure UTF "corner chars", which also exist in the unicode pointspace, look:
https://en.wikipedia.org/wiki/Box-drawing_character
What does it mean? Whenever I want to use some drawing box chars around I just copy&paste pure UTF-8 chars into my strings, and I send these very strings onto addstr.
Why does it work? Because as someone also answered above, before initializing ncurses with initscr(), I just claimed "I want a proper locale support" in the form of a setlocale(LC_ALL, ""); line.
What did I achieve? Apparently pure magic. And very comfortable one, as I just copy paste box chars inside my normal strings. At least under Darwin.ncurses/OSX Mojave I'm getting, not only "bounding box chars", but also full UTF8 support.
Try the "setlocale(LC_ALL, ""); initscr();" approach and tell us if "drawing boxes" works also for you under a pure C environment just using UTF8 bounding box chars.
Greetings and happy ncursing!

Related

save and clear terminal window in linux [duplicate]

I'm having a hard time even googling this, because I don't know the right keywords. Some command-line apps (such as vi and less) take over the whole console screen and present an interactive interface to the user. Upon exiting such an app, the screen is returned to the state it was in before the app was launched. I want to write a program that behaves in this fashion, but again, I don't even know what this is called, so I can't find any documentation for how it's accomplished.
So, my question is threefold:
What keywords can I use to find documentation on this?
If you are so inclined, links to such documentation would be helpful.
Lastly, can I accomplish this in a scripting language like Ruby, or even bash? I have no problem with C, but the environment I work in is more amenable to interpreted languages.

As said in some comments, you are looking for ncurses. The Linux Documentation Project have a very good HOWTO on ncurses for C that I used myself to start on it
https://tldp.org/HOWTO/NCURSES-Programming-HOWTO/

The feature you are describing is the alternate screen buffer. I think that [N]Curses will enable this by default. There are certainly curses bindings for Ruby, Python, and other scripting languages.

you can even access ncurses in bash by using the tput program. The whole ncurses library (like curses before it) works by sending escape sequences to the terminal. The xterm program emulates a vt100 terminal (and also a Tektronic terminal) and there were various combinations of characters which would move the cursor, clear the screen, draw various characters etc. These would generally start with an escape character, hence the name: escape sequence. You also sometimes see these escape sequences in people's PS1 shell variables with the \e to provide the escape character; often used to colour the prompt or set the window title.
tput refers to the terminfo database to figure out what the escape sequences are to perform the functions you've asked it to do.
see the manual page, type:
man 5 terminfo
for more details

How does ncurses output non-ascii characters?

I'd like to know how ncurses (a c library) manages to put characters like ├, despite them not (to the best of my knowledge) being part of ASCII.
I would have assumed it was just drawing them pixel by pixel, but you can copy/paste them out of the terminal (in MacOS).

ncurses puts characters such as ├ on the screen by assuming that your locale environment variables (LC_ALL and/or LC_CTYPE) match the terminal on which you are displaying. The environment variables indicate the encoding (e.g., UTF-8). There are other encodings and terminals which support those encodings, but generally speaking you'll mostly see UTF-8. If the environment and terminal cooperate, things "just work":
at startup, ncurses checks for the locale which a program has initialized, via setlocale, and determines if that uses UTF-8. It uses that information later.
when a program adds character strings, e.g., using addstr, ncurses uses the character-type information (set as a side-effect of calling setlocale), and uses standard C library functions for combining sequences of bytes which make up a multi-byte character, and converting those into wide characters. It stores those wide characters internally, and
when writing to the terminal, ncurses reverses the process, converting from wide characters to use the encoding assumed to be supported by the terminal (assuming that your locale environment matches the terminal).
However —
The character indicated ├ happens to be a special case. That is one of the graphic characters used for line-drawing, which predate Unicode and UTF-8. curses has names for these graphic characters, making it simple to refer to them, e.g., ACS_LTEE (the ├ is a left-tee):
Before UTF-8 came along to complicate things, developers came up with a scheme using a table of these graphic characters by adapting the escape sequences used for the VT100 (late 1970s) and the AT&T 4410 and 5410 terminals (apparently the early 1980s since the latter were in use by 1984) for drawing their graphic characters.
AT&T SystemV curses provided support for these graphic characters from the mid-1980s. BSD curses never did that...
Unicode (roughly 1990 and later) provided most of the same glyphs using a different encoding. There are a few omissions (the most noticeable are the scan lines above/below the one used for horizontal lines), but once UTF-8 got into use in the early 2000s, it was logical to extend ncurses to use these characters.
ncurses looks at the locale settings, but prefers using the terminal description for these graphic characters except for cases where that is known to not work — and will assume that the terminal can display the Unicode equivalents for these characters if the terminal is assumed to use UTF-8. It uses a table for this purpose (SystemV curses and its successor X/Open Curses didn't do any of this — NetBSD curses adapted the table from ncurses sometime after 2010).
Further reading:
NCURSES_NO_UTF8_ACS
Line Graphics (in curs_addch(3x))
Line Graphics (in curs_add_wch(3x))

There is more than one version of ncurses, for more than one platform, and if you really want to know, check the source. However, none of them would draw a character pixel-by-pixel; that isn’t something a library running inside a terminal emulator does.
Modern versions of the C standard library, POSIX and ncurses all support writing wide characters to the console and conversion between wide and multibyte strings. Today, wide characters are normally UTF-16 or UTF-32 and multibyte strings are normally UTF-8. You can see the documentation for <wchar.h> and ncursesw for more information.
Note that C11 does have support for UTF-8 literals, through the u8 prefix.
A program that’s concerned about portability with systems where the local multibyte encoding is something other than UTF-8 can use another library such as the C++ standard library or ICU to convert between UTF-8 and wide-character strings, then display those with curses.
You might need to #define _XOPEN_SOURCE 700, or the appropriate value for the version of the standard you are targeting, and with some versions of the libraries, also #define _XOPEN_SOURCE_EXTENDED 1, to get your system libraries to let you use functions such as addwstr().
However, many programs might simply send strings of char encoded in UTF-8 to the console and assume it can handle them. I don’t recommend this approach, but it works on most Linux systems in 2017.

Why use `code' for embracing code in a comment?

I just read some glibc 2.22 source code (the source file at /sysdeps/posix/readdir.c) and came across this comment:
/* The only version of `struct dirent*' that lacks `d_reclen' is fixed-size. */
(Newline removed.)
The weird emphasis of the type and identifier bugs me. Why not use just single-quotes or des accents graves? Is there some specific reason behind this? Might it be some character set conversion mistake?
I also searched the glibc style guide but didn't found anything concerning code formatting in comments.

Convention.
As you no doubt know, comments are ignored by the C compiler. They make no difference, but the developer who wrote that comment probably preferred their appearance to plain single quotes.
ASCII
Using non-ASCII characters (unicode) in source code is a relatively new practice (moreso when English-authored source code is concerned), and there are still compatibility issues remaining in many programming language implementations. Unicode in program input/output is a different thing entirely (and that isn't perfect either). In program source code, unicode characters are still quite uncommon, and I doubt we'll see them make much of an appearance in older code like the POSIX header files for some time, yet.
Source code filters
There are some source code filters, such as document generation packages like the the well-known Javadoc, that look for specific comment strings, such as /** to open a comment. Some of these programs may treat your `quoted strings' specially, but that quoting convention is older than most (all?) of the source code filters that might give them special treatment, so that's probably not it.
Backticks for command substutution
There is a strong convention in many scripting languages (as well as StackExchange markdown!) to use backticks (``) to execute commands and include the output, such as in shell scripts:
echo "The current directory is `pwd`"
Which would output something like:
The current directory is /home/type_outcast
This may be part of the reason behind the convention, however I believe Cristoph has a point as well, about the quotes being balanced, similar to properly typeset opening and closing quotation marks.
So, again, and in a word: `convention'.

This goes back to early computer fonts, where backtick and apostrophe were displayed as mirror images. In fact, early versions of the ASCII standard blessed this usage.
Paraphrased from RFC 20, which is easier to get at than the actual standards like USAS X3.4-1968:
Column/Row Symbol Name
2/7 ' Apostrophe (Closing Single Quotation Mark Acute Accent)
6/0 ` Grave Accent (Opening Single Quotation Mark)
This heritage can also be seen in tools like troff, m4 and TeX, which also used this quoting style originally.
Note that syntactically, there is a benefit to having different opening and closing marks: they can be nested properly.

Access Keystrokes in C

I am trying to access keystrokes in C. I can access alphanumeric keys. How can I access Control, Shift and Alt key?
Plus I read somewhere that sometimes while entering text in console, OS masks backspace key. I would like to know where user pressed backspace key. It's not same as knowing when '\n' was pressed.
GNU C. Ubuntu 11.

Dietrich Epp answered in a comment: use ncurses library.
See also this question
And you might make an X11 client graphical application; in that case use a graphical toolkit library like GTK or Qt
If you want to make a console application, use ncurses or perhaps readline
And your question, when taken literally, has no sense: the strict C standard don't know what a key or a keystroke is (the only I/O operations mentioned in the standard are related to <stdio.h> thru FILE). This is why most people uses additional libraries and standards (in addition of those required by ISO C), eg. Posix...

The simple answer is "you can't", at least not easily or without downloading third party libraries.
Most C programs shouldn't have to know anything about the keyboard or the screen. Standard C is only concerned with reading from and writing to files (the keyboard and screen being special-case files).
Assuming you have a good reason for wanting to access the keyboard directly, you should be looking at the ncurses library (http://www.gnu.org/software/ncurses/ncurses.html). Ncurses knows how many different (virtual) terminals and keyboards work, and it presents a uniform interface to them. It lets you paint the screen and create a substitute graphical interface using only blocks of text.
Since you use Ubuntu, try running the "aptitude" command to see a good example of what ncurses can do.

More Control Over C Standard Output

I can't seem to find the right way to ask the almighty google...
In such programs as a command-line progress bar, the output buffer seems to be directly manipulated. It can't print a character to a terminal in any place it wants. How is such control over a program's output controlled in standard C? Is there a special library that I can look up?

look at curses
it`s a lib for unix/linux

If you just want a progress bar, you can just print a single 'X' for every 2% completion. This should fill 50 characters on a line.
If you want something more fancy, on Linux you can try the classic "curses" library, or if you just want a dialog box, you could try the library that the Debian install utilities use, but I forget its name.

you can do the progress bar with \r check this
for doing more advanced stuff you can use ncurses

It's not part of standard C. These things work by writing some special character sequences that are recognized by the terminal emulator which takes care of the cursor positioning and stuff.

The big gorilla here is the ncurses library, but you can do a lot of cool stuff with less learning curve. Try using \r to move to beginning of line and using simple control sequences to clear to end of line, turn bold on and off, etcetera. The tput(1) command is invaluable. For example, I wrote a simple application the does highlighted text, and to turn highlighting on and off I just called the commands tput smso and tput rmso. You can capture the results using C with popen(3); using a shell it is even easier.

You could use ANSI escape coding to control the termincal output. This is how a lot of MUD games do their output.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight