I'm here looking at some C source code and I've found this:
fprintf(stderr, _("Try `%s --help' for more information.\n"), command);
I had already seen the underscore when I had a look at wxWidgets, and I read it's used for internationalization. I found it really horrible (the least intuitive name ever), but I thought it was just another weird wxWidgets convention.
Now I find it again in some Alsa source. Does anyone know where it comes from?
It comes from GNU gettext, a package designed to ease the internationalization process. _() is conventionally a macro (or function) wrapping gettext(): at runtime it replaces the given string with a translation in the system's language, if one is available (i.e. if a .mo file for that language was shipped with the program).
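For illustration, here is a minimal sketch of how the pieces fit together; the domain name "myprog" and the locale directory are made up for the example:

#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* The usual convention: _() is just shorthand for gettext() */
#define _(str) gettext(str)

int main(void)
{
    setlocale(LC_ALL, "");                         /* honor the user's locale */
    bindtextdomain("myprog", "/usr/share/locale"); /* where the .mo catalogs live */
    textdomain("myprog");                          /* select this program's catalog */

    /* Printed translated if a matching .mo file exists, verbatim otherwise */
    printf(_("Hello, world!\n"));
    return 0;
}

The companion tool xgettext can then be told to treat _ as a keyword (--keyword=_) when extracting translatable strings from the source.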
It comes from gettext. Originally, "internationalization" was deemed too long to type every time you needed a string internationalized. So programmers created the shortcut i18n (because there are 18 letters between the 'i' and the 'n' in "internationalization"), and you may see source code out there that uses it. Apparently i18n was still too long, though, so now it's just an underscore.
That would be from gettext.
I just read some glibc 2.22 source code (the source file at /sysdeps/posix/readdir.c) and came across this comment:
/* The only version of `struct dirent*' that lacks `d_reclen' is fixed-size. */
(Newline removed.)
The weird emphasis of the type and identifier bugs me. Why not just use single quotes, or grave accents, on both sides? Is there some specific reason behind this? Might it be some character-set conversion mistake?
I also searched the glibc style guide but didn't find anything concerning code formatting in comments.
Convention.
As you no doubt know, comments are ignored by the C compiler. They make no difference, but the developer who wrote that comment probably preferred their appearance to plain single quotes.
ASCII
Using non-ASCII (Unicode) characters in source code is a relatively new practice (more so where English-authored source code is concerned), and there are still compatibility issues remaining in many programming language implementations. Unicode in program input/output is a different thing entirely (and that isn't perfect either). In program source code, Unicode characters are still quite uncommon, and I doubt we'll see them make much of an appearance in older code like the POSIX header files for some time yet.
Source code filters
There are some source code filters, such as document generation packages like the well-known Javadoc, that look for specific comment strings, such as /** to open a comment. Some of these programs may treat your `quoted strings' specially, but that quoting convention is older than most (all?) of the source code filters that might give them special treatment, so that's probably not it.
Backticks for command substitution
There is a strong convention in many scripting languages (as well as StackExchange markdown!) to use backticks (``) to execute commands and include the output, such as in shell scripts:
echo "The current directory is `pwd`"
Which would output something like:
The current directory is /home/type_outcast
This may be part of the reason behind the convention; however, I believe Christoph has a point as well about the quotes being balanced, similar to properly typeset opening and closing quotation marks.
So, again, and in a word: `convention'.
This goes back to early computer fonts, where backtick and apostrophe were displayed as mirror images. In fact, early versions of the ASCII standard blessed this usage.
Paraphrased from RFC 20, which is easier to get at than the actual standards like USAS X3.4-1968:
Column/Row  Symbol  Name
2/7         '       Apostrophe (Closing Single Quotation Mark; Acute Accent)
6/0         `       Grave Accent (Opening Single Quotation Mark)
This heritage can also be seen in tools like troff, m4 and TeX, which also used this quoting style originally.
Note that syntactically there is a benefit to having distinct opening and closing marks: they can be nested properly. For example, `an `inner' quote' is unambiguous in a way that 'an 'inner' quote' is not.
I have to draw a box in C, using ncurses.
First, I have defined some values for simplicity:
#define RB "\e(0\x6a\e(B"   /* ASCII 188, right-bottom corner, for example */
I compiled with gcc on Ubuntu, with the -finput-charset=UTF-8 flag.
But if I try to print it with addstr or printw, I just get the hex code printed.
What am I doing wrong?
ncurses defines the values ACS_HLINE, ACS_VLINE, ACS_ULCORNER, ACS_URCORNER, ACS_LLCORNER and ACS_LRCORNER. You can use those constants in addch and friends, which should result in your seeing the expected box characters. (There are lots more ACS characters; you'll find a complete list in man addch.)
ncurses needs to know what it is drawing because it needs to know exactly where the cursor is all the time. Outputting console control sequences is not a good idea; if ncurses knows how to handle the sequence, it has its own abstraction for the feature and you should use that abstraction. The ACS ("alternate character set") defines are one of those abstractions.
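As a minimal sketch of that abstraction (the coordinates and sizes here are arbitrary), a box drawn from the ACS constants might look like this:

/* build: gcc acs_box.c -lncurses */
#include <ncurses.h>

int main(void)
{
    int top = 2, left = 4, height = 8, width = 20;

    initscr();

    /* corners */
    mvaddch(top,          left,         ACS_ULCORNER);
    mvaddch(top,          left + width, ACS_URCORNER);
    mvaddch(top + height, left,         ACS_LLCORNER);
    mvaddch(top + height, left + width, ACS_LRCORNER);

    /* edges */
    mvhline(top,          left + 1, ACS_HLINE, width - 1);
    mvhline(top + height, left + 1, ACS_HLINE, width - 1);
    mvvline(top + 1, left,         ACS_VLINE, height - 1);
    mvvline(top + 1, left + width, ACS_VLINE, height - 1);

    refresh();
    getch();    /* wait for a key before tearing down */
    endwin();
    return 0;
}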
A few issues:
if your program writes something like "\e(0\x6a\e(B" using addstr, then ncurses (any curses implementation) will translate the individual characters to printable form as described in the addch manual page.
ncurses supports line drawing for commonly used pseudo-graphics using symbols (such as ACS_HLINE), which are predefined characters combined with the A_ALTCHARSET attribute. You can read about those in the Line Graphics section of the addch manual page.
the code 0x6a is ASCII j, which (given a VT100-style mapping) would be the lower left corner. The curses symbol for that is ACS_LRCORNER.
you cannot write the line-drawing characters with addstr; instead, addch and addchstr are useful. There are also functions oriented to line drawing (see box and friends); a sketch using box() follows this list.
running in Ubuntu, your locale encoding is probably UTF-8. To make your program work properly, it should initialize the locale as described in the Initialization section of the ncurses manual page. In particular:
setlocale(LC_ALL, "");
Also, your program should link against the ncursesw library (-lncursesw) to use UTF-8, rather than just ncurses (-lncurses).
when compiling on Ubuntu, to use the proper header definitions, you should define _GNU_SOURCE.
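Putting several of those points together, here is a rough sketch (the window size and position are arbitrary) that initializes the locale and lets box() draw the frame:

/* build: gcc -D_GNU_SOURCE boxed.c -lncursesw */
#include <locale.h>
#include <ncurses.h>

int main(void)
{
    setlocale(LC_ALL, "");   /* must come before initscr() in a UTF-8 locale */
    initscr();
    refresh();               /* flush stdscr once so the window isn't overwritten */

    WINDOW *win = newwin(8, 20, 2, 4);   /* height, width, starty, startx */
    box(win, 0, 0);          /* 0, 0 = use the default ACS line-drawing characters */
    wrefresh(win);

    wgetch(win);             /* wait for a key */
    delwin(win);
    endwin();
    return 0;
}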
BTW, I'm probably arriving somewhat late to the party, but I'll give you some insight that might or might not shed some light on your "box drawing" needs.
As of 2020 I'm involved in a fun project of my own mixing Swift + ncurses (under OSX for now, but thinking about mixing it with Linux). Apparently it works flawlessly.
The thing is, as I'm using Swift, internally it all reduces to "importing .h and .c files" from the Darwin.ncurses library that the macOS Xcode/runtime offers.
That means (I hope) my newly acquired skills might be useful for you, because apparently we're using the very same .h and .c files for our ncurses needs (or at least they should be really similar).
That said:
As of now, I've "ignored" the ACS corner chars (I can't find them under the Swift/Xcode/Darwin.ncurses runtime!!!) in favour of pure UTF-8 "corner chars", which also exist as Unicode code points; look:
https://en.wikipedia.org/wiki/Box-drawing_character
What does it mean? Whenever I want some box-drawing chars, I just copy & paste pure UTF-8 chars into my strings and send those very strings to addstr.
Why does it work? Because, as someone also answered above, before initializing ncurses with initscr() I claim "I want proper locale support" in the form of a setlocale(LC_ALL, ""); line.
What did I achieve? Apparently pure magic, and very comfortable magic at that, as I just copy & paste box chars into my normal strings. At least under Darwin.ncurses/OSX Mojave I'm getting not only "bounding box chars" but also full UTF-8 support.
Try the setlocale(LC_ALL, ""); initscr(); approach and tell us if "drawing boxes" also works for you in a pure C environment using just UTF-8 box-drawing chars.
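In pure C terms the equivalent experiment, assuming a UTF-8 locale and linking against ncursesw, would be something like:

/* build: gcc utf8_box.c -lncursesw   (requires a UTF-8 locale) */
#include <locale.h>
#include <ncurses.h>

int main(void)
{
    setlocale(LC_ALL, "");   /* claim locale support before initscr() */
    initscr();

    /* Unicode box-drawing characters pasted straight into the string literals */
    mvaddstr(2, 4, "┌──────────────────┐");
    mvaddstr(3, 4, "│                  │");
    mvaddstr(4, 4, "└──────────────────┘");

    refresh();
    getch();
    endwin();
    return 0;
}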
Greetings and happy ncursing!
I have a .so library, and I want to replace a string hardcoded inside it with another, longer one. Is it possible?
If you have the source and can recompile the library: fine, go for it.
If you mean via a hex editor or similar: Very dangerous to try.
Adding one char might work, depending on how much padding is applied (possibly none, so even adding one char may break it). The more you add, the more likely it is to fail.
Assuming "without source", I think the real answer is "No".
If the variable has a symbol and is always looked up by it, you could LD_PRELOAD a small library that just exports that symbol.
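As a sketch of that trick (the symbol name error_message is purely hypothetical, and this only helps if the program really resolves the string through the dynamic symbol table):

/* preload.c: build with  gcc -shared -fPIC preload.c -o preload.so
 *            run with    LD_PRELOAD=./preload.so ./program
 * The dynamic linker resolves error_message to this definition before
 * the original library's, so the longer string is used without ever
 * touching the .so on disk. */
const char error_message[] = "a replacement string that can be as long as you like";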
Alternatively, as a one-shot technique, you could load the program under gdb and set the variable there (gdb will implicitly malloc space in the target process for the new string).
Yes, you can, but it isn't easy. (I've done similar things using each of the approaches below).
Option 1: Sometimes, the string that follows the target string isn't used, or isn't very commonly used. For example, it might be an error message string. In that case, you can just write over it, and hope that whatever used that string isn't going to break if it sees something else (namely, the tail end of your new string).
Option 2: You can use a disassembler (like IDA) to locate uses of the string in the program, and rewrite those to point at a new region of the binary. Then you can write your new string in that new area. This isn't nearly as bad as you might think, especially if you have a good disassembler that can show you references to the data section.
GHC insists that the module name must equal the file name. But if they are the same, why does a Haskell compiler need both? It seems redundant to me. Is this just a language design mistake?
Besides the inconvenience, it also raises a problem: if I want to use two libraries that accidentally have the same top-level module name, I cannot disambiguate simply by renaming the folder of one of them. What is the idiomatic solution to this problem?
The Haskell language specification doesn't talk about files. It only talks about modules and their syntax. So there's clearly no language design mistake.
The GHC compiler (and many others) chose to follow a pattern of one module per file, and searching for modules in files with matching names. Seems like a decent strategy to me. Otherwise you'd need to provide the compiler with some mapping from module name to file name or an explicit list of every file in use.
I would say that one of the big reasons is that you don't always want the module name to be the file's path appended with its name. This is the same as with Java, C#, and many other languages that prefer an explicit namespace declaration in the source code; explicit is better than implicit in many cases. It gives the programmer maximum control over their file names without tying the module namespace to the file system.
Imagine that I was a Japanese Haskell programmer, and my OS used Japanese characters for file names. I can write my source code using Japanese characters where possible, but I also want to export an API that uses ASCII characters. If module name and filename had to be identical, this would be impossible and would make it very difficult for people in other countries to use my library.
And as @chi has pointed out, if you have two packages with conflicting module names (a very rare occurrence in my experience), you can always use package-qualified imports.
The Haskell language specification requires that each module start with a module header, and it does not mention files; it leaves implementing compilers total freedom regarding files. So the Haskell language lacks the ability to express where the files containing modules are. Because of this, some compilers (including the most important one, GHC) use a simple solution: the name of the module must match the path from an include directory to the file. This introduced the redundancy.
To avoid the redundancy, compilers could drop the specification's requirement to start each module with a header. However, they chose not to, simply for the sake of conforming to the specification. Perhaps a GHC language extension could do this, but currently there is no such extension.
So the problem is a language design mistake, and lives on as legacy.
To combat possible name collisions among independent libraries, the GHC extension PackageImports (package-qualified imports) seems the best way: with it enabled, you can write import "some-package" Some.Module to pin down which package a module comes from.