Why use `code' for quoting code in a comment? - c

I just read some glibc 2.22 source code (the source file at /sysdeps/posix/readdir.c) and came across this comment:
/* The only version of `struct dirent*' that lacks `d_reclen' is fixed-size. */
(Newline removed.)
The odd emphasis around the type and identifier bugs me. Why not just use plain single quotes, or backticks (grave accents) on both sides? Is there some specific reason behind this? Might it be a character set conversion mistake?
I also searched the glibc style guide but didn't find anything concerning code formatting in comments.

Convention.
As you no doubt know, comments are ignored by the C compiler, so they make no difference to the build; the developer who wrote that comment probably just preferred the appearance of `this' to plain single quotes.
ASCII
Using non-ASCII (Unicode) characters in source code is a relatively new practice (more so where English-authored source code is concerned), and there are still compatibility issues remaining in many programming language implementations. Unicode in program input/output is a different thing entirely (and that isn't perfect either). In program source code, Unicode characters are still quite uncommon, and I doubt we'll see them make much of an appearance in older code like the POSIX header files for some time yet.
Source code filters
There are some source code filters, such as document generation packages like the well-known Javadoc, that look for specific comment strings, such as /** to open a comment. Some of these programs may treat your `quoted strings' specially, but that quoting convention is older than most (all?) of the source code filters that might give them special treatment, so that's probably not it.
Backticks for command substitution
There is a strong convention in many scripting languages (as well as StackExchange markdown!) to use backticks (``) to execute commands and include the output, such as in shell scripts:
echo "The current directory is `pwd`"
Which would output something like:
The current directory is /home/type_outcast
This may be part of the reason behind the convention; however, I believe Christoph also has a point about the quotes being balanced, similar to properly typeset opening and closing quotation marks.
So, again, and in a word: `convention'.

This goes back to early computer fonts, where backtick and apostrophe were displayed as mirror images. In fact, early versions of the ASCII standard blessed this usage.
Paraphrased from RFC 20, which is easier to get at than the actual standards like USAS X3.4-1968:
Column/Row  Symbol  Name
2/7         '       Apostrophe (Closing Single Quotation Mark; Acute Accent)
6/0         `       Grave Accent (Opening Single Quotation Mark)
This heritage can also be seen in tools like troff, m4 and TeX, which originally used this quoting style as well.
Note that syntactically, there is a benefit to having different opening and closing marks: they can be nested properly.
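m4, mentioned above, still quotes this way by default, and the distinct delimiters are exactly what make nesting work:

`outer `inner' back to outer'

With the same mark on both sides ('outer 'inner' back to outer') there is no way to tell whether the third quote closes the inner quotation or the outer one.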

Related

C placing cursor in a console - explanation?

I have been trying to find documentation explaining the use of escape sequences, but have not been successful.
For instance, I know that I can use
printf("%c[%d;%df",0x1B, y, x);
to place the cursor at a certain position in the console.
But where would I find an explanation of this and other escape sequences? As I said, I've been looking around the internet; there are a lot of articles explaining that you can use escape sequences for various things, but I haven't found one with a list of the available functions.
It would be great if anyone knew where I can find this. Thanks for all answers!
Update after some answers:
I am aware of the Wikipedia page. It mentions the above possibility, for example, but it isn't really explained in the table of CSIs.
What I am looking for is something like
ESC[<l>;<c>f => move cursor to line "l" and column "c"
ESC[<l>;<c>H => move cursor to line "l" and column "c"
and an explanation of the other ESC sequences...
I am not looking for printf formatting options (but thanks anyway for all the answers).
where I would find an explanation for this and other escape sequences
Wikipedia has a quite extensive list: https://en.wikipedia.org/wiki/ANSI_escape_code . The standard is ECMA-48 (and it's horrible to read), but it's old, and I think some newer escape sequences exist "in the wild" beyond it.
but not found one with a list of available functions.
There is no complete list; the closest you can get is ECMA-48. Every terminal (well, nowadays, terminal emulator) supports a different subset of ANSI escape sequences, and the set isn't fixed: developers add support for new escape sequences, and terminals sometimes implement their own private sequences. There are endless terminals, emulators and versions of them. The terminfo database was created precisely to deal with these compatibility differences between terminals.
As a general rule, the escape sequences differ for each terminal type. In the past, each terminal brand used (and published) its own set of escape sequences, and they were in general incompatible.
With time, DEC (Digital Equipment Corporation) imposed their set for several reasons:
Their terminals were the most widespread and popular ones (vt100, vt200, vt220, vt420, etc.).
All their models shared the same specification.
The PDP-11, and later the VAX, were mainly sold with these terminals.
For these reasons, the escape sequences of DEC terminals became a standard, and slowly all software adapted to them.
At the same time, some software tools started to offer full-screen interfaces and had to address the problem of driving different terminals. This resulted, in Unix environments, in a library (curses) that let almost any terminal type with an addressable cursor and display features work with almost any application. Curses was written to support vi(1), but it has since been used successfully in many other programs.
Escape sequences became standardized, and the standard (ANSI X3.64, ISO 6429) became a de facto standard in almost any application that was not designed with the curses library. This standard covers just a subset of the full set of escapes that DEC terminals implement (mainly because the sequences to multiplex several sessions on the same terminal are a patented, unpublished set of commands, protected by copyright rules).
ECMA has also standardized escape sequences, as noted in another answer to this question.
But if you actually want to be completely terminal agnostic, you had better use a curses-like library (e.g. ncurses, which is also open source) to cope with the large database of terminals that have different, incompatible escape sequences. For example, Hewlett Packard terminals use a completely different language of escape codes, so the escape sequences for HP terminals are completely different from the DEC ones.
Look at the ANSI escape code Wikipedia page for a medium-to-full list of these escapes, and for links to further documentation of them.
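That said, if you only need terminals that honor ANSI/ECMA-48 sequences, the CUP (cursor position) sequence from the question is enough. A minimal sketch in C (assuming an ANSI-capable terminal; nothing here consults terminfo):

#include <stdio.h>

/* CUP: ESC [ row ; col H moves the cursor to a 1-based position */
static void move_cursor(int row, int col)
{
    printf("\x1b[%d;%dH", row, col);
}

int main(void)
{
    printf("\x1b[2J");    /* ED: erase the whole display */
    move_cursor(5, 10);
    printf("Hello from row 5, column 10\n");
    return 0;
}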

Print Unicode characters in C, using ncurses

I have to draw a box in C, using ncurses.
First, I have defined some values for simplicity:
#define RB "\e(0\x6a\e(B" (ASCII 188,Right bottom, for example)
I have compiled with gcc on Ubuntu, with the -finput-charset=UTF-8 flag.
But if I try to print with addstr or printw, I get the hex code.
What am I doing wrong?
ncurses defines the values ACS_HLINE, ACS_VLINE, ACS_ULCORNER, ACS_URCORNER, ACS_LLCORNER and ACS_LRCORNER. You can use these constants in addch and friends, which should result in your seeing the expected box characters. (There are lots more ACS characters; you'll find a complete list in man addch.)
ncurses needs to know what it is drawing because it needs to know exactly where the cursor is all the time. Outputting console control sequences is not a good idea; if ncurses knows how to handle the sequence, it has its own abstraction for the feature and you should use that abstraction. The ACS ("alternate character set") defines are one of those abstractions.
A few issues:
if your program writes something like "\e(0\x6a\e(B" using addstr, then ncurses (any curses implementation) will translate the individual characters to printable form as described in the addch manual page.
ncurses supports line-drawing for commonly-used pseudo-graphics using symbols (such as ACS_HLINE) which are predefined characters with the A_ALTCHARSET attribute combined. You can read about those in the Line Graphics section of the addch manual page.
the code 0x6a is ASCII j, which (given a VT100-style mapping) would be the lower-right corner. The curses symbol for that is ACS_LRCORNER.
you cannot write the line-drawing characters with addstr; instead, addch and addchstr are useful. There are also functions oriented toward line drawing (see box and friends).
running on Ubuntu, your locale encoding is probably UTF-8. To make your program work properly, it should initialize the locale as described in the Initialization section of the ncurses manual page, in particular:
setlocale(LC_ALL, "");
Also, your program should link against the ncursesw library (-lncursesw) to use UTF-8, rather than just ncurses (-lncurses).
when compiling on Ubuntu, to use the proper header definitions, you should define _GNU_SOURCE.
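Putting those pieces together, a minimal sketch (my file and window names; assumes Ubuntu with the wide ncurses development package installed, compiled with gcc box.c -lncursesw):

#define _GNU_SOURCE
#include <locale.h>
#include <ncurses.h>

int main(void)
{
    setlocale(LC_ALL, "");      /* honor the UTF-8 locale */
    initscr();
    refresh();                  /* paint stdscr before drawing the window */

    WINDOW *w = newwin(10, 30, 2, 4);
    box(w, 0, 0);               /* border drawn with ACS_* line-drawing chars,
                                   including ACS_LRCORNER at the bottom right */
    mvwprintw(w, 1, 1, "drawn with ACS characters");
    wrefresh(w);

    wgetch(w);                  /* wait for a key */
    endwin();
    return 0;
}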
BTW, I'm probably arriving somewhat late to the party, but I'll share some insight that may or may not shed some light on your "box drawing" needs.
As of 2020 I'm involved in a fun project of my own mixing Swift + ncurses (under OSX for now, but thinking about bringing it to Linux). Apparently it works flawlessly.
The thing is, as I'm using Swift, internally it all reduces to importing .h and .c files from the Darwin.ncurses library that the macOS Xcode/runtime offers.
That means (I hope) my newly acquired skills might be useful to you, because apparently we're using the very same .h and .c files for our ncurses needs (or at least they should be really similar).
That said:
As of now, I "ignored" ACS_corner chars (I can't find them under swift/Xcode/Darwin.ncurses runtime !!!) in favour of pure UTF "corner chars", which also exist in the unicode pointspace, look:
https://en.wikipedia.org/wiki/Box-drawing_character
What does it mean? Whenever I want some box-drawing chars, I just copy & paste pure UTF-8 chars into my strings and send those very strings to addstr.
Why does it work? Because, as someone also answered above, before initializing ncurses with initscr() I claim "I want proper locale support" in the form of a setlocale(LC_ALL, ""); line.
What did I achieve? Apparently pure magic, and a very comfortable kind of magic, as I just copy-paste box chars inside my normal strings. At least under Darwin.ncurses on OSX Mojave I get not only "bounding box chars" but full UTF-8 support.
Try the setlocale(LC_ALL, ""); initscr(); approach and tell us whether "drawing boxes" also works for you in a pure C environment using just UTF-8 bounding box chars.
Greetings and happy ncursing!
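For the record, the same recipe in plain C looks something like this (a sketch assuming a UTF-8 locale and the wide library, i.e. compile with gcc utfbox.c -lncursesw; the file name is made up):

#include <locale.h>
#include <ncurses.h>

int main(void)
{
    setlocale(LC_ALL, "");   /* without this, multibyte output comes out garbled */
    initscr();

    /* UTF-8 box-drawing characters pasted straight into ordinary strings */
    mvaddstr(2, 4, "┌──────────┐");
    mvaddstr(3, 4, "│  hello   │");
    mvaddstr(4, 4, "└──────────┘");

    refresh();
    getch();    /* wait for a key before tearing the screen down */
    endwin();
    return 0;
}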

Can comments/identifiers impact code performance/operability?

Today I was presented with a weird fact (or not).
It was said:
"It is disallowed to write long, descriptive identifier names, and forbidden to write comments, for Linux drivers written in ANSI C."
When I asked "WTF? Why?" I was told it causes performance issues and errors of that sort...
Not many details there.
I am surprised, but have to ask...
Can this be real?
Knowing that comments are stripped out by the preprocessor,
and that identifiers are converted to addresses either way...
So... can it cause problems?
Well, ANSI C is a standard, and a standard is something that everyone must follow (by "everyone" I mean compiler designers and programmers, if they decide to support it).
The ANSI C standard only guarantees that the first 6 characters of external identifiers are significant (and yes, exported identifiers are stored as-is as symbols in the symbol table, not just as addresses), and 31 characters for non-external identifiers. Longer names are allowed; strictly portable code just cannot rely on the extra characters being distinguished.
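For illustration (a contrived sketch of the C89 rule, not a real-world failure), a strictly conforming C89 implementation is allowed to treat these two externals as the same symbol, because only the first six characters, "read_s", are guaranteed significant:

int read_sensor_left(void);   /* may collide under minimal C89 rules... */
int read_sensor_right(void);  /* ...since both begin with "read_s" */

Modern toolchains keep the full names, so this is a portability footnote rather than a practical limit today.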
On commenting: apart from some obvious pitfalls, like accidentally swallowing code with a multi-line comment, I recommend you read the Coding Style article for kernel developers, which explains what kinds of comments are discouraged.
Absolutely not. Whatever identifiers you use in your code, the compiler translates them to symbols.
Also, all comments are discarded by the preprocessor before compilation.
The only effect of comments is to help you understand the code more quickly.
The only performance impact comments can have is on compile time, and I would say it is negligible unless you write whole books as comments.
Identifier names are translated to symbols, so there is likewise at most a compile-time impact, which again is negligible. Identifier names might hit a maximum length limit, but to be honest, I have never encountered a problem because of too-long identifier names.
No. The first step of compilation is to preprocess your source code, which removes comments and does other tricks like expanding macros.
Identifiers are often translated into pointers (to symbol table entries).
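A quick way to convince yourself (a sketch assuming gcc on a Unix-like system; the file names are made up): compile two versions of the same function, one tersely named and bare, one verbosely named and heavily commented, and compare the generated assembly.

/* verbose.c */
/* This comment describes, at great length, that the function below
   adds two integers. None of this text survives the preprocessor. */
int add_two_integers_with_a_very_long_name(int first_operand,
                                           int second_operand)
{
    return first_operand + second_operand;
}

Compiling both with gcc -O2 -S and diffing the resulting .s files shows identical code; only the symbol names differ.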

unwanted output of printf

I am working on legacy source code for data computation.
In order to debug a few error conditions, I added the following printf to the code:
printf("What???!!!!....\n");
The logs were kept in a file, and I was searching it for the string "What???!!!!....", but I never found it, because the output actually came out as:
What?|!!!....
I have already wasted a lot of time because of this unwanted output.
Can someone please help me identify the reason for this?
The output is related to trigraphs: the string ??! corresponds to |.
Check your makefile for -trigraphs.
And make sure to use more sensible prints from now on :-)
In the olden days, keyboards didn't necessarily include all of the characters required to write C programs. To let people without the right keyboard program anyway, C compilers accepted trigraphs and digraphs: uncommon two- or three-character combinations that translate directly to a possibly absent character. Here is the list of digraphs and trigraphs for C:
http://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C
??! is in the list, and it translates to | in the preprocessor.
One way to fix this is shown in the article I linked above: escape one of the question marks with a \, or close the string and reopen it between the question marks. This is likely your best choice, given that you're working with legacy code.
Often, you can also disable trigraphs with compiler switches. Consult your documentation for the details.
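With gcc, for instance (an assumption about your toolchain; check your own compiler's manual), trigraphs are ignored in the default GNU dialects and are only processed in strict ISO modes or on request:

gcc -Wall -c legacy.c       (-Wall includes -Wtrigraphs, warning where a trigraph would change meaning)
gcc -std=c89 -c legacy.c    (strict ISO modes imply -trigraphs, so ??! becomes |)
gcc -trigraphs -c legacy.c  (force trigraph processing in any dialect)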
Trigraphs (3-character sequences) and digraphs (2-character sequences) were added to C to help people enter characters that lie outside the ISO 646 character set when their keyboard doesn't provide them.
Here's a paragraph from the Digraphs and trigraphs Wikipedia page which states it clearly:
The basic character set of the C programming language is a subset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. This can pose a problem for writing source code when the encoding (and possibly keyboard) being used does not support any of these nine characters. The ANSI C committee invented trigraphs as a way of entering source code using keyboards that support any version of the ISO 646 character set.
To print all three question marks literally, you can escape the last one, or use string concatenation:
printf("What??\?!!!!....\n");
printf("What??" "?!!!!....\n);

How do you compare two files containing C code based on code structure, not merely textual differences?

I have two files containing C code which I wish to compare. I'm looking for a utility which will construct a syntax tree for each file, and compare the syntax trees, instead of merely comparing the text of the files. This way minor differences in formatting and style will be ignored. It would be nice to even be able to tell the comparison tool to ignore differences such as variable names, etc.
Correct me if I'm wrong, but diff doesn't have this capability. I'm a Ubuntu user. Thanks!
Our SD Smart Differencer does exactly what you want. It uses compiler-quality parsers to read the source code and build ASTs for the two files you select, then compares the trees guided by the syntax, so it doesn't get confused by whitespace, layout or comments. Because it normalizes the values of constants, it isn't confused by a change of radix or by how you expressed escape sequences!
The deltas are reported at the level of language constructs (variable, expression, statement, declaration, function, ...) in terms of programmer intent (delete, insert, copy, move), complete with determining that an identifier has been renamed consistently throughout a changed block.
The SmartDifferencer has versions available for C (in a number of dialects; if you parse with compiler accuracy, the language dialect matters) as well as for C++, Java, C#, JavaScript, COBOL, Python and many other languages.
If you want to understand how a set of files is related, our SD CloneDR will accept a very large set of files and tell you what they have in common. It finds code that has been copy-paste-edited across the entire set; you don't have to tell it what to look for, it finds it automatically. Using ASTs (as above), it isn't fooled by whitespace changes or renamed identifiers. There is a bunch of sample clone-detection reports for various languages at the web site.
There is a program called codeCompare from devart (http://www.devart.com/codecompare/benefits.html#cc) that includes the following feature (I know it is not exactly what you asked for, but it can probably be used for this).
The feature is called "Structure Comparison"
This functionality allows you to compare different file revisions by the presence of structural blocks (classes, fields, methods). That way, different versions of the same file are compared independently of where those blocks happen to sit.
Structure comparison can be applied to the following languages:
C#
C++
Visual Basic
JavaScript
(I know it does not include C, but maybe you can solve the problem with the C++ mode.)
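If a dedicated tool is overkill, a rough approximation (a sketch assuming clang-format and diff are installed; it cancels out formatting differences, though not renamed identifiers) is to run both files through the same pretty-printer and compare the results:

clang-format old.c > old.norm.c
clang-format new.c > new.norm.c
diff -u old.norm.c new.norm.c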
