Does any C standard header include ASCII aliases?

It would help to avoid doing things like:
#define ESC (27)
#define DEL (127)
Edit: Looking for either a C standard header or a POSIX C header with this.

Sadly, no. C is described abstractly in terms of a source character set and an execution character set, both of which may vary between implementations. The characters C requires are not the complete set offered by ASCII, and C does not assume ASCII at all. In fact, some national variants of ISO 646, the international 7-bit standard that ASCII belongs to, don't even have '#' (the British variant puts '£' in that position).
Searching for "posix character set" turned this up. http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap06.html


What is the meaning of char foo(|10|) in C?

I'm a very experienced C programmer, but recently I came across some code on a mainframe with an unusual local variable declaration. It's in a simple C function that declares this variable, strcpy()s and strcat()s two strings into it, and then tries an fopen().
char foo(|10|);
This code is very old. Possibly even K&R C old. I'm wondering if this is some obscure compiler extension or an adaptation to a keyboard that doesn't have [] or something like that.
Anyone know if this declaration is 'special'?
This is a standard z/OS mainframe. I'm not sure which compiler is used.
It seems to be an early or non-standard form of digraph. The code was probably written using EBCDIC instead of ASCII, and EBCDIC doesn't have [ ] characters (at least not in all code pages).
I found the manual for SAS/C, a C compiler apparently meant for System/370. On page 2-10 (page 42 of the pdf) you can see they list (| |) as "alternate forms" for [ ].
(Though apparently | is not in all the code pages either; but maybe it was in a code page that was more commonly used? I don't know.)
Standard C also has digraphs (added in the 1995 amendment) and trigraphs (since C89) to solve the same problem, but they use <: :> as the digraphs and ??( ??) as the trigraphs for [ ].

Why use `code' for embracing code in a comment?

I just read some glibc 2.22 source code (the source file at /sysdeps/posix/readdir.c) and came across this comment:
/* The only version of `struct dirent*' that lacks `d_reclen' is fixed-size. */
(Newline removed.)
The weird emphasis of the type and identifier bugs me. Why not just use single quotes on both sides, or grave accents on both sides? Is there some specific reason behind this? Might it be a character set conversion mistake?
I also searched the glibc style guide but didn't find anything concerning code formatting in comments.
Convention.
As you no doubt know, comments are ignored by the C compiler. They make no difference, but the developer who wrote that comment probably preferred their appearance to plain single quotes.
ASCII
Using non-ASCII (Unicode) characters in source code is a relatively new practice (more so where English-authored source code is concerned), and compatibility issues remain in many programming language implementations. Unicode in program input/output is a different thing entirely (and that isn't perfect either). In program source code, Unicode characters are still quite uncommon, and I doubt we'll see them make much of an appearance in older code like the POSIX header files for some time yet.
Source code filters
There are some source code filters, such as documentation generators like the well-known Javadoc, that look for specific comment strings, such as /** to open a comment. Some of these programs may treat your `quoted strings' specially, but that quoting convention is older than most (all?) of the source code filters that might give them special treatment, so that's probably not it.
Backticks for command substitution
There is a strong convention in many scripting languages (as well as StackExchange markdown!) to use backticks (``) to execute commands and include the output, such as in shell scripts:
echo "The current directory is `pwd`"
Which would output something like:
The current directory is /home/type_outcast
This may be part of the reason behind the convention; however, I believe Cristoph has a point as well about the quotes being balanced, similar to properly typeset opening and closing quotation marks.
So, again, and in a word: `convention'.
This goes back to early computer fonts, where backtick and apostrophe were displayed as mirror images. In fact, early versions of the ASCII standard blessed this usage.
Paraphrased from RFC 20, which is easier to get at than the actual standards like USAS X3.4-1968:
Column/Row   Symbol   Name
2/7          '        Apostrophe (Closing Single Quotation Mark Acute Accent)
6/0          `        Grave Accent (Opening Single Quotation Mark)
This heritage can also be seen in tools like troff, m4 and TeX, which also used this quoting style originally.
Note that syntactically, there is a benefit to having different opening and closing marks: they can be nested properly.

Converting string in host character encoding to Unicode in C

Is there a way to portably (that is, conforming to the C standard) convert strings in the host character encoding to an array of Unicode code points? I'm working on some data serialization software, and I've got a problem because while I need to send UTF-8 over the wire, the C standard doesn't guarantee the ASCII encoding, so converting a string in the host character encoding can be a nontrivial task.
Is there a library that takes care of this kind of stuff for me? Is there a function hidden in the C standard library that can do something like this?
The C11 standard, ISO/IEC 9899:2011, has a new header <uchar.h> with rudimentary facilities to help. It is described in section §7.28 Unicode utilities <uchar.h>.
There are two pairs of functions defined:
c16rtomb() and mbrtoc16(): using type char16_t, aka uint_least16_t.
c32rtomb() and mbrtoc32(): using type char32_t, aka uint_least32_t.
The r in the name is for 'restartable'; the functions are intended to be called iteratively. The mbrtoc{16,32}() pair convert from a multibyte code set (hence the mb) to either char16_t or char32_t. The c{16,32}rtomb() pair convert from either char16_t or char32_t to a multibyte character sequence.
I'm not sure whether they'll do what you want. The <uchar.h> header and hence the functions are not available on Mac OS X 10.9.1 with either the Apple-provided clang or with the 'home-built' GCC 4.8.2, so I've not had a chance to investigate them. The header does appear to be available on Linux (Ubuntu 13.10) with GCC 4.8.1.
I think ICU is likely a better choice; it is, however, a rather large library (but that is because it does a thorough job of supporting Unicode and different locales in general).

unwanted output of printf

I am working on legacy source code that computes data.
To debug a few error conditions, I added the following printf to the code:
printf("What???!!!!....\n");
The logs were written to a file, and I was searching for the string "What???!!!!....", but I never found it because the output was coming out as:
What??|!!!....
I have already wasted a lot of time because of this unwanted output.
Can someone please help me to identify the reason for this?
The output is caused by a trigraph: the sequence ??! is replaced by |.
Check your makefile for -trigraphs.
And make sure to use more sensible prints from now on :-)
In the olden days, keyboards didn't necessarily include all of the characters required to write C programs. To let programmers without the right keyboard write C, standard C introduced trigraphs and digraphs: uncommon two- or three-character combinations that translate directly to a possibly absent character. Here is a list of the digraphs and trigraphs for C:
http://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C
??! is in the list, and it translates to | in the preprocessor.
One way to fix this is described in the article I linked above: escape one of the question marks with a \, or close the string and reopen it between the question marks. This is likely your best choice, given that you're working with legacy code.
Often, you can also disable digraphs and trigraphs with compiler switches. Consult your documentation for those details.
Trigraphs (three-character sequences) and digraphs (two-character sequences) were added to C to help people type characters that lie outside the ISO 646 invariant character set on keyboards that don't have them.
Here's a paragraph from the Wikipedia page on digraphs and trigraphs that explains it clearly:
The basic character set of the C programming language is a subset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. This can pose a problem for writing source code when the encoding (and possibly keyboard) being used does not support any of these nine characters. The ANSI C committee invented trigraphs as a way of entering source code using keyboards that support any version of the ISO 646 character set.
To print those two question marks, you can escape the 2nd one, or use string concatenation:
printf("What??\?!!!!....\n");
printf("What??" "?!!!!....\n");

C ANSI Escape Code

How can I do cursor control with ANSI escape sequences in Turbo C? Here is my code, but it doesn't work yet in Turbo C:
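#include <stdio.h>
#include <conio.h>  /* getche(): Turbo C / DOS console I/O */

int main(void)
{
    while (getche() != '.')
        printf("\x1B[B");  /* ESC [ B: move cursor down one line */
    return 0;
}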
Apart from the possibility that that output may be line buffered (meaning nothing may appear until you send a newline), you should probably also ensure that ANSI.SYS is loaded, since it's the device driver responsible for interpreting those sequences.
But I'm wondering why you're doing this. From memory (admittedly pretty faded memory), Turbo C has functions for this sort of thing, such as gotoxy() and clrscr().
One way to output the escape character with printf() is:
printf("%c[B", 0x1b);
But usually (I don't know Turbo C), there are libraries for doing terminal-related stuff in a portable way.
