I´ve seen this in many popular C-Projects e.g the Go language and nowhere i can find some information about it. I think it is a kind of namespacing but i thought C doesn´t support it.
e.g
void runtime·memhash(uintptr*, uintptr, void*);
Thanks.
· is not a part of the "basic execution character set", and thus is not a standard C operator.
However, it does appear that the C standard allows it as an implementation-defined identifier character. It has no special meaning; it's just another character.
Related
So I'm completely new to programming. I currently study computer science and have just read the first 200 pages of my programming book, but there's one thing I cannot seem to see the difference between and which havn't been clearly specified in the book and that's reserved words vs. standard identifiers - how can I see from code if it's one or the other.
I know the reserved words are some that cannot be changed, while the standard indentifiers can (though not recommended according to my book). The problem is while my book says reserved words are always in pure lowercase like,
(int, void, double, return)
it kinda seems to be the very same for standard indentifier like,
(printf, scanf)
so how do I know when it is what, or do I have to learn all the reserved words from the ANSI C, which is the current language we are trying to learn, (or whatever future language I might work with) to know when it is when?
First off, you'll have to learn the rules for each language you learn as it is one of the areas that varies between languages. There's no universal rule about what's what.
Second, in C, you need to know the list of keywords; that seems to be what you're referring to as 'reserved words'. Those are important; they're immutable; they can't be abused because the compiler won't let you. You can't use int as a variable name; it is always a type.
Third, the C preprocessor can be abused to hijack anything; if you compile with #define double int in effect, you get what you deserve, but there's nothing much to stop you doing that.
Fourth, the only predefined variable name is __func__, the name of the current function.
Fifth, names such as printf() are defined by the standard library, but the standard library has to be implemented by someone using a C compiler; ask the maintainers of the GNU C library. For a discussion of many of the ideas behind the treaty between the standard and the compiler writers, and between the compiler writers and the programmers using a compiler, see the excellent book The Standard C Library by P J Plauger from 1992. Yes, it is old and the modern standard C library is somewhat bigger than the one from C90, but the background information is still valid and very helpful.
Reserved words are part of the language's syntax. C without int is not C, but something else. They are built into the language and are not and cannot be defined anywhere in terms of this particular language.
For example, if is a reserved keyword. You can't redefine it and even if you could, how would you do this in terms of the C language? You could do that in assembly, though.
The standard library functions you're talking about are ordinary functions that have been included into the standard library, nothing more. They are defined in terms of the language's syntax. Also, you can redefine these functions, although it's not advised to do so as this may lead to all sorts of bugs and unexpected behavior. Yet it's perfectly valid to write:
int puts(const char *msg) {
printf("This has been monkey-patched!\n");
return -1;
}
You'd get a warning that'd complain about the redefinition of a standard library function, but this code is valid anyway.
Now, imagine reimplementing return:
unknown_type return(unknown_type stuff) {
// what to do here???
}
I was looking at the runtime.c file in the go runtime at
/usr/local/go/src/pkg/runtime
and saw the following function definitions:
void
runtime∕pprof·runtime_cyclesPerSecond(int64 res)
{...}
and
int64
runtime·tickspersecond(void)
{...}
and there are a lot of declarations like
void runtime·hashinit(void);
in the runtime.h.
I haven't seen this C syntax before (specially the one with the slash seems odd).
Is this part of std C or some plan9 dialect?
It's Go's special internal syntax for Go package paths. For example,
runtime∕pprof·runtime_cyclesPerSecond
is function runtime_cyclesPerSecond in package path runtime∕pprof.
The '∕' character is the Unicode division slash character, which separates path elements. The '·' character is the Unicode middle dot character, which separates the package path and the function.
∕ and · and friends are merely random Unicode characters that someone decided to put in function names. Obscure Unicode characters (edit: that are listed in Annex D of the C99 standard (pages 452-453 of this PDF); see also here) are just as legal in C identifiers as A or 7 (in your average Unicode-capable compiler, anyway).
Char| Hex| Octal|Decimal|Windows Alt-code
----+------+------+-------+----------------
∕ |0x2215|021025| 8725| (null)
· | 0xB7| 0267| 183| Alt+0183
Putting characters that look like operators but aren't (U+2215 ∕, in particular, resembles U+2F / (division) far too closely) in function names can be a confusing practice, so I would personally advise against it. Obviously someone on the Go team decided that whatever reasons they had for including them in function names outweighed the potential for confusion.
(Edit: It should be noted that U+2215 ∕ isn't expressly permitted by Annex D. As discussed here, this may be an extension.)
I know there are several language extensions added in the GNU C compiler (aka gcc).
I can read something about that here.
What I'm looking for is deeper and wider documentation about those topics.
For example I'd like to read more about _Static_assert(), typeof and the likes.
Maybe it's just my fault, but I cannot find such an official documentation. Any hint? TIA!
The answer is http://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html and you're not finding about static assertions because it's not an extension of the C language, it's a core, built-in, standardized part of the language and described in the language international standards. In this case, refer to the C specification:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
See section 6.7.10 Static assertions, in particular paragraph 3:
"The constant expression shall be an integer constant expression. If the value of the
constant expression compares unequal to 0, the declaration has no effect. Otherwise, the
constraint is violated and the implementation shall produce a diagnostic message that
includes the text of the string literal, except that characters not in the basic source
character set are not required to appear in the message."
Here: http://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html.
Use Google to search inside gnu.org. Found it by typing this search in Google: c extensions site:gnu.org.
I saw a line of C that looked like this:
!ErrorHasOccured() ??!??! HandleError();
It compiled correctly and seems to run ok. It seems like it's checking if an error has occurred, and if it has, it handles it. But I'm not really sure what it's actually doing or how it's doing it. It does look like the programmer is trying express their feelings about errors.
I have never seen the ??!??! before in any programming language, and I can't find documentation for it anywhere. (Google doesn't help with search terms like ??!??!). What does it do and how does the code sample work?
??! is a trigraph that translates to |. So it says:
!ErrorHasOccured() || HandleError();
which, due to short circuiting, is equivalent to:
if (ErrorHasOccured())
HandleError();
Guru of the Week (deals with C++ but relevant here), where I picked this up.
Possible origin of trigraphs or as #DwB points out in the comments it's more likely due to EBCDIC being difficult (again). This discussion on the IBM developerworks board seems to support that theory.
From ISO/IEC 9899:1999 §5.2.1.1, footnote 12 (h/t #Random832):
The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as
described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.
Well, why this exists in general is probably different than why it exists in your example.
It all started half a century ago with repurposing hardcopy communication terminals as computer user interfaces. In the initial Unix and C era that was the ASR-33 Teletype.
This device was slow (10 cps) and noisy and ugly and its view of the ASCII character set ended at 0x5f, so it had (look closely at the pic) none of the keys:
{ | } ~
The trigraphs were defined to fix a specific problem. The idea was that C programs could use the ASCII subset found on the ASR-33 and in other environments missing the high ASCII values.
Your example is actually two of ??!, each meaning |, so the result is ||.
However, people writing C code almost by definition had modern equipment,1 so my guess is: someone showing off or amusing themself, leaving a kind of Easter egg in the code for you to find.
It sure worked, it led to a wildly popular SO question.
ASR-33 Teletype
1. For that matter, the trigraphs were invented by the ANSI committee, which first met after C become a runaway success, so none of the original C code or coders would have used them.
It's a C trigraph. ??! is |, so ??!??! is the operator ||
As already stated ??!??! is essentially two trigraphs (??! and ??! again) mushed together that get replaced-translated to ||, i.e the logical OR, by the preprocessor.
The following table containing every trigraph should help disambiguate alternate trigraph combinations:
Trigraph Replaces
??( [
??) ]
??< {
??> }
??/ \
??' ^
??= #
??! |
??- ~
Source: C: A Reference Manual 5th Edition
So a trigraph that looks like ??(??) will eventually map to [], ??(??)??(??) will get replaced by [][] and so on, you get the idea.
Since trigraphs are substituted during preprocessing you could use cpp to get a view of the output yourself, using a silly trigr.c program:
void main(){ const char *s = "??!??!"; }
and processing it with:
cpp -trigraphs trigr.c
You'll get a console output of
void main(){ const char *s = "||"; }
As you can notice, the option -trigraphs must be specified or else cpp will issue a warning; this indicates how trigraphs are a thing of the past and of no modern value other than confusing people who might bump into them.
As for the rationale behind the introduction of trigraphs, it is better understood when looking at the history section of ISO/IEC 646:
ISO/IEC 646 and its predecessor ASCII (ANSI X3.4) largely endorsed existing practice regarding character encodings in the telecommunications industry.
As ASCII did not provide a number of characters needed for languages other than English, a number of national variants were made that substituted some less-used characters with needed ones.
(emphasis mine)
So, in essence, some needed characters (those for which a trigraph exists) were replaced in certain national variants. This leads to the alternate representation using trigraphs comprised of characters that other variants still had around.
While cruising through my white book the other day, I noticed in the list of C keywords.
entry is one of the keywords on that list.
It is reserved for future use. Thinking back to my Fortran days, there was a function of some sort that used an entry statement to make a second argument signature, or entry point into a function.
Is this what entry was originally intended to be used for? or something completely different?
What is the story on the entry keyword?
I had no idea, so I googled to find something about this. This is what I found.
First, it was included as a reserved keyword.
Q: What was the entry keyword mentioned in K&R1?
A: It was reserved to allow functions with multiple, differently-named entry points, but it has been withdrawn.
(From http://archives.devshed.com/forums/c-c-134/c-programming-faqs-371017.html.)
It was never standardized; some compilers used it, in a very personal way.
It was later declared obsolete, I guess.
In FORTRAN, "ENTRY" could declare a second entry point into a subroutine. It was a structured programming nightware, and fortunately C decided not to adopt it.
The entry keyword came from PL/I and allowed multiple entry points into a function. The keyword was implemented by some compilers but was never standardized.
To complement the accepted answer 'entry' is mentioned in K&R1:
2.3 Keywords
The following identifiers are reserved for use as keywords, and may not be used otherwise
int extern else
char register for
float typedef do
double static while
struct goto switch
union return case
long sizeof default
short break entry
unsigned continue
auto if
and here:
The entry keyword is not currently implemented by any compiler but is
reserved for future use. Some implementations also reserve the words 'fortran'
and 'asm'.
Then in the Rationale for the ANSI C language (C89) it is mentioned here:
3.1.1 Keyword
[...]
The keywords 'entry' 'fortran', and 'asm' have not been included since they were either never used, or are not portable. Uses of 'fortran' and 'asm' as keywords are not as common extensions.