I saw a line of C that looked like this:
!ErrorHasOccured() ??!??! HandleError();
It compiled correctly and seems to run ok. It seems like it's checking if an error has occurred, and if it has, it handles it. But I'm not really sure what it's actually doing or how it's doing it. It does look like the programmer is trying express their feelings about errors.
I have never seen the ??!??! before in any programming language, and I can't find documentation for it anywhere. (Google doesn't help with search terms like ??!??!). What does it do and how does the code sample work?
??! is a trigraph that translates to |. So it says:
!ErrorHasOccured() || HandleError();
which, due to short circuiting, is equivalent to:
if (ErrorHasOccured())
HandleError();
Guru of the Week (deals with C++ but relevant here), where I picked this up.
Possible origin of trigraphs or as #DwB points out in the comments it's more likely due to EBCDIC being difficult (again). This discussion on the IBM developerworks board seems to support that theory.
From ISO/IEC 9899:1999 §5.2.1.1, footnote 12 (h/t #Random832):
The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as
described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.
Well, why this exists in general is probably different than why it exists in your example.
It all started half a century ago with repurposing hardcopy communication terminals as computer user interfaces. In the initial Unix and C era that was the ASR-33 Teletype.
This device was slow (10 cps) and noisy and ugly and its view of the ASCII character set ended at 0x5f, so it had (look closely at the pic) none of the keys:
{ | } ~
The trigraphs were defined to fix a specific problem. The idea was that C programs could use the ASCII subset found on the ASR-33 and in other environments missing the high ASCII values.
Your example is actually two of ??!, each meaning |, so the result is ||.
However, people writing C code almost by definition had modern equipment,1 so my guess is: someone showing off or amusing themself, leaving a kind of Easter egg in the code for you to find.
It sure worked, it led to a wildly popular SO question.
ASR-33 Teletype
1. For that matter, the trigraphs were invented by the ANSI committee, which first met after C become a runaway success, so none of the original C code or coders would have used them.
It's a C trigraph. ??! is |, so ??!??! is the operator ||
As already stated ??!??! is essentially two trigraphs (??! and ??! again) mushed together that get replaced-translated to ||, i.e the logical OR, by the preprocessor.
The following table containing every trigraph should help disambiguate alternate trigraph combinations:
Trigraph Replaces
??( [
??) ]
??< {
??> }
??/ \
??' ^
??= #
??! |
??- ~
Source: C: A Reference Manual 5th Edition
So a trigraph that looks like ??(??) will eventually map to [], ??(??)??(??) will get replaced by [][] and so on, you get the idea.
Since trigraphs are substituted during preprocessing you could use cpp to get a view of the output yourself, using a silly trigr.c program:
void main(){ const char *s = "??!??!"; }
and processing it with:
cpp -trigraphs trigr.c
You'll get a console output of
void main(){ const char *s = "||"; }
As you can notice, the option -trigraphs must be specified or else cpp will issue a warning; this indicates how trigraphs are a thing of the past and of no modern value other than confusing people who might bump into them.
As for the rationale behind the introduction of trigraphs, it is better understood when looking at the history section of ISO/IEC 646:
ISO/IEC 646 and its predecessor ASCII (ANSI X3.4) largely endorsed existing practice regarding character encodings in the telecommunications industry.
As ASCII did not provide a number of characters needed for languages other than English, a number of national variants were made that substituted some less-used characters with needed ones.
(emphasis mine)
So, in essence, some needed characters (those for which a trigraph exists) were replaced in certain national variants. This leads to the alternate representation using trigraphs comprised of characters that other variants still had around.
Related
I was under the impression that you could only start variable names with letters and _, however while testing around, I also found out that you can start variable names with $, like so:
Code
#include <stdio.h>
int main() {
int myvar=13;
int $var=42;
printf("%d\n", myvar);
printf("%d\n", $var);
}
Output
13
42
According to this resource, it says that you can't start variable names with $ in C, which is wrong (at least when compiled using my gcc version, Apple LLVM version 10.0.1 (clang-1001.0.46.4)). Other resources that I found online also seem to suggest that variables can't start with $, which is why I'm confused.
Do these articles just fail to mention this nuance, and if so, why is this a feature of C?
In the C 2018 standard, clause 6.4.2, paragraph 1 allows implementations to allow additional characters in identifiers.
It defines an identifier to be an identifier-nondigit character followed by any number of identifier-nondigit or digit characters. It defines digit to be “0“ to “9”, and it defines the identifier-nondigit characters to be:
a nondigit, which is one of underscore, “a” to “z”, or “A” to “Z”,
a universal-character-name, or
other implementation-defined characters.
Thus, implementations may define other characters that are allowed in identifiers.
The characters included as universal-character-name are those listed in ranges in Annex D of the C standard.
The resource you link to is wrong in several places:
Variable names in C are made up of letters (upper and lower case) and digits.
This is false; identifiers may include underscores and the above universal characters in every conforming implementation and other characters in implementations that permit them.
$ not allowed -- only letters, and _
This is incorrect. The C standard does not require an implementation to allow “$”, but it does not disallow an implementation from allowing it. “$” is allowed by some implementations and not others. It can be said not to be a part of strictly conforming C programs, but it may be a part of conforming C programs.
This answers your question:
In GNU C, you may normally use dollar signs in identifier names. This is because many traditional C implementations allow such identifiers. However, dollar signs in identifiers are not supported on a few target machines, typically because the target assembler does not allow them.
This is allowed in GCC and LLVM because many traditional C implementations allow identifiers like this.
One such reason is that VMS commonly uses these, where a lot of system library routines have names like SYS$SOMETHING.
Here's a link to the GCC docs describing this:
https://gcc.gnu.org/onlinedocs/gcc/Dollar-Signs.html
Depends on the dialect of C and the options selected. Historically some Cs supported $ to be compatible with existing libraries when C was new. You may need to use a command line option to enable $ or another to turn if of if strictly conforming C is valuable to you.
A spot of history: in my early years I got into enough mainframe rooms to know that $ is one of what IBM mainframes called "national characters" of $,#, and # that could show up in identifiers of programming languages like PL/1 and mainframe assembler. This worked down to some mainframe spin-offs, such as the IBM 1130. It looked to me like early impact printers using pieces of shaped slugs to print with, and CRT terminals, could swap out these characters to meet the national needs of foreign customers. The IBM 1403 printer had many "print chains" to choose from for different human languages and technical purposes.
Some non-IBM identifiers picked up on at least some of these characters. GNU C, VMS, and JavaScript kept "$". "$" is the only character of old that seems to have survived to this day, even as an option, in most languages. The odd thing is back on early IBM days the underscore was invalid for identifier names.
TL;DR: it's the assembler not the compiler
Ok, so I did some research into this. It's not really allowed, but what excludes it as the assembly pass. Trying to do the following fails:
#include <stdio.h>
extern int $func();
int main() {
int myvar=13;
int $var=42;
printf("%d\n", myvar);
printf("%d\n", $var);
$func();
}
joshua#nova:/tmp$ gcc -c test.c
/tmp/ccg7zLVB.s: Assembler messages:
/tmp/ccg7zLVB.s:31: Error: operand type mismatch for `call'
joshua#nova:/tmp$
I pulled K&R C version 2 (this covers ANSI C) off my shelf and it says "Identifiers are a sequence of letters and digits. The first character must be a letter; the underscore _ character counts as a letter. Upper and lower case letters are different. Identifiers may have any length ... [obsolete verbiage omitted]."
This reference as clearly aged; and almost everybody accepts high-unicode as letters. What's going on is the back-end assembler sees symbols bytewise and every byte with the high bit set counts as a letter. If you're crazy enough to use shift-jis outside of string literals, chaos can ensue; but otherwise this tends to work well enough.
I accessed a draft of C18 which says identifier-nondigit: nondigit ; nondigit ; universal-character-name other-implementation-defined-characters. Therefore, implementations are allowed to permit additional characters.
For universal-character-name, we have a restriction: "A universal character name shall not specify a character whose short identifier is less than 00A0
other than 0024 ( $ ), 0040 ( # ), or 0060 (‘), nor one in the range D800 through DFFF inclusive."
The following code still chokes at the assembly pass as expected:
#include <stdio.h>
extern int \U00000024func();
int main()
{
return \U00000024func();
}
The following code builds:
#include <stdio.h>
extern int func\U00000024();
int main()
{
return func\U00000024();
}
I am trying to calculate the Greatest Common Denominator of two integers.
C Code:
#include <stdio.h>
int gcd(int x, int y);
int main()
{
int m,n,temp;
printf("Enter two integers: \n");
scanf("%d%d",&m,&n);
printf("GCD of %d & %d is = %d",m,n,gcd(m,n));
return 0;
}
int gcd(int x, int y)
{
int i,j,temp1,temp2;
for(i =1; i <= (x<y ? x:y); i++)
{
temp1 = x%i;
temp2 = y%i;
if(temp1 ==0 and temp2 == 0)
j = i;
}
return j;
}
In the if statement, note the logical operator. It is and not && (by mistake). The code works without any warning or error.
Is there an and operator in C? I am using orwellDev-C++ 5.4.2 (in c99 mode).
&& and and are alternate tokens and are functionally same, from section 2.6 Alternative tokens from the C++ draft standard:
Alternative Primary
and &&
Is one of the entries in the Table 2 - Alternative tokens and it says in subsection 2:
In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling. The set of alternative tokens is defined in Table 2.
As Potatoswatter points out, using and will most likely confuse most people, so it is probably better to stick with &&.
Important to note that in Visual Studio is not complaint in C++ and apparently does not plan to be.
Edit
I am adding a C specific answer since this was originally an answer to a C++ question but was merged I am adding the relevant quote from the C99 draft standard which is section 7.9 Alternative spellings <iso646.h> paragraph 1 says:
The header defines the following eleven macros (on the left) that expand
to the corresponding tokens (on the right):
and includes this line as well as several others:
and &&
We can also find a good reference here.
Update
Looking at your latest code update, I am not sure that you are really compiling in C mode, the release notes for OrwellDev 5.4.2 say it is using GCC 4.7.2. I can not get this to build in either gcc-4.7 nor gcc-4.8 using -x c to put into C language mode, see the live code here. Although if you comment the gcc line and use g++ it builds ok. It also builds ok under gcc if you uncomment #include <iso646.h>
Check out the page here iso646.h
This header defines 11 macro's that are the text equivalents of some common operators.
and is one of the defines.
Note that I can only test this for a C++ compiler so I'm not certain if you can use this with a strict C compiler.
EDIT I've just tested it with a C compiler here and it does work.
and is just an alternative token for &&.
We can easily quote the standard here :
2.6 Alternative tokens [lex.digraph]
In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling. The set of alternative tokens is defined in Table 2.
In table 2 :
Alternative | Primary
and | &&
But I suggest you to use &&. People used to C/C++ may get confused by and...
Since it is merged now, we are talking also about C, you can check this page ciso646 defining the alternatives tokens.
This header defines 11 macro constants with alternative spellings for those C++ operators not supported by the ISO646 standard character set.
From the C99 draft standard :
7.9 Alternative spellings <iso646.h>
The header defines the following eleven macros (on the left) that expand
to the corresponding tokens (on the right):
and &&
Basically and is just the text version of && in c.
You do however need to #include <iso646.h>. or it isn't going to compile.
You can read more here:
http://msdn.microsoft.com/en-us/library/c6s3h5a7%28v=vs.80%29.aspx
If the code in your question compiles without errors, either you're not really compiling in C99 mode or (less likely) your compiler is buggy. Or the code is incomplete, and there's a #include <iso646.h> that you haven't shown us.
Most likely you're actually invoking your compiler in C++ mode. To test this, try adding a declaration like:
int class;
A C compiler will accept this; a C++ compiler will reject it as a syntax error, since class is a keyword. (This may be a bit more reliable than testing the __cplusplus macro; a misconfigured development system could conceivably invoke a C++ compiler with the preprocessor in C mode.)
In C99, the header <iso646.h> defines 11 macros that provide alternative spellings for certain operators. One of these is
#define and &&
So you can write
if(temp1 ==0 and temp2 == 0)
in C only if you have a #include <iso646.h>; otherwise it's a syntax error.
<iso646.h> was added to the language by the 1995 amendment to the 1990 ISO C standard, so you don't even need a C99-compliant compiler to use it.
In C++, the header is unnecessary; the same tokens defined as macros by C's <iso646.h> are built-in alternative spellings. (They're defined in the same section of the C++ standard, 2.6 [lex.digraph], as the digraphs, but a footnote clarifies that the term "digraph" doesn't apply to lexical keywords like and.) As the C++ standard says:
In all respects of the language, each alternative token behaves the
same, respectively, as its primary token, except for its spelling.
You could use #include <ciso646> in a C++ program, but there's no point in doing so (though it will affect the behavior of #ifdef and).
I actually wouldn't advise using the alternative tokens, either in C or in C++, unless you really need to (say, in the very rare case where you're on a system where you can't easily enter the & character). Though they're more readable to non-programmers, they're likely to be less readable to someone with a decent knowledge of the C and/or C++ language -- as demonstrated by the fact that you had to ask this question.
It is compiling to you because I think you included iso646.h(ciso646.h) header file.
According to it and is identical to &&. If you don't include that it gives compiler error.
The and operator is the text equivalent of && Ref- AND Operator
The or operator is the text equivalent of || Ref.- OR Operator
So resA and resB are identical.
&& and and are synonyms and mean Logical AND in C++. For more info check Logical Operators in C++ and Operator Synonyms in C++.
I was looking at the runtime.c file in the go runtime at
/usr/local/go/src/pkg/runtime
and saw the following function definitions:
void
runtime∕pprof·runtime_cyclesPerSecond(int64 res)
{...}
and
int64
runtime·tickspersecond(void)
{...}
and there are a lot of declarations like
void runtime·hashinit(void);
in the runtime.h.
I haven't seen this C syntax before (specially the one with the slash seems odd).
Is this part of std C or some plan9 dialect?
It's Go's special internal syntax for Go package paths. For example,
runtime∕pprof·runtime_cyclesPerSecond
is function runtime_cyclesPerSecond in package path runtime∕pprof.
The '∕' character is the Unicode division slash character, which separates path elements. The '·' character is the Unicode middle dot character, which separates the package path and the function.
∕ and · and friends are merely random Unicode characters that someone decided to put in function names. Obscure Unicode characters (edit: that are listed in Annex D of the C99 standard (pages 452-453 of this PDF); see also here) are just as legal in C identifiers as A or 7 (in your average Unicode-capable compiler, anyway).
Char| Hex| Octal|Decimal|Windows Alt-code
----+------+------+-------+----------------
∕ |0x2215|021025| 8725| (null)
· | 0xB7| 0267| 183| Alt+0183
Putting characters that look like operators but aren't (U+2215 ∕, in particular, resembles U+2F / (division) far too closely) in function names can be a confusing practice, so I would personally advise against it. Obviously someone on the Go team decided that whatever reasons they had for including them in function names outweighed the potential for confusion.
(Edit: It should be noted that U+2215 ∕ isn't expressly permitted by Annex D. As discussed here, this may be an extension.)
I´ve seen this in many popular C-Projects e.g the Go language and nowhere i can find some information about it. I think it is a kind of namespacing but i thought C doesn´t support it.
e.g
void runtime·memhash(uintptr*, uintptr, void*);
Thanks.
· is not a part of the "basic execution character set", and thus is not a standard C operator.
However, it does appear that the C standard allows it as an implementation-defined identifier character. It has no special meaning; it's just another character.
What was is it the original reason to use trigraph sequence of some chars to become other chars in ansi C like:
??=define arraycheck(a, b) a??(b??) ??!??! b??(a??)
becomes
#define arraycheck(a, b) a[b] || b[a]
Short answer: keyboards/character encodings that didn't include such graphs.
From wikipedia:
The basic character set of the C programming language is a superset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. This can pose a problem for writing source code when the keyboard being used does not support any of these nine characters. The ANSI C committee invented trigraphs as a way of entering source code using keyboards that support any version of the ISO 646 character set.
http://en.wikipedia.org/wiki/Digraphs_and_trigraphs
Some old keyboards didn't have specific characters on them, so the language worked around it by letting you use trigraphs instead.