How is the following code giving the output as -99?

Given below is a C code snippet with str initialized to "face":
char *str = "face";
printf("%d\n", -2[str]);

First, we need to parse the expression in question. There are two operators in -2[str]: the subscript operator [] and the unary minus operator -. The subscript operator has higher precedence than unary minus, so printf prints the negation of 2[str]*.
There are many Q&As explaining that 2[str] is the same as str[2], so I am not going to repeat the explanation; you can read about it in this Q&A.
Finally, the value of str[2] is 'c', which has the code 99 on your system. Negation is applied to that value, which is how -99 gets printed.
* Note that - is not part of an integer constant, because in C integer constants do not include a sign.

The code in the question is:
char *str = "face";
printf("%d\n", -2[str]);
Let's be clear: this is horrid, and anyone writing that code should be made to rewrite it.
There are two parts to the confusion when approaching this:
1. Why is a[i] == i[a]?
2. How is -2[str] evaluated?
The linked question covers (1) extensively. Read it.
To address the second part, consider an alternative program:
#include <stdio.h>
int main(void)
{
    char data[] = "XYZface";
    char *str = &data[3];
    printf("[%s] %d %d %d (%c)\n", str, -2[str], -(2[str]), (-2)[str], (-2)[str]);
    return 0;
}
This outputs:
[face] -99 -99 89 (Y)
Why? The -2[str] notation is equivalent to -str[2] (you have read the linked Q&A, haven't you?) and not str[-2], because there are no negative literal numbers.
Read C11 §6.4.4.1 Integer constants: there are no minus signs in there. When you write -2, you have a unary minus operator and a literal 2. Mostly, that's the same as negative two, but not when mixed with a higher priority operator such as subscripting. The §6.5.2 Postfix operators such as subscripting have higher priority than the §6.5.3 Unary operators such as negation.
Let's also be clear: there is no undefined behaviour in the question's code (or mine, I trust). Technically, the value for letter 'c' (+99) is implementation-defined, but there are few extant systems where the integer value of 'c' is not 99 (but see EBCDIC for a code set where the answer would be different).

Let's dissect:
-2[str]
is
-(2[str])
because of operator precedence. Note that -2 is not an integer literal here: 2 is, and it can receive the unary operator -, but before that happens, the [] operator is applied.
Next step is
-(str[2])
Because (a well-known, if curious, fact) a[i]==i[a].
-('c')
str[2] is a char, so the unary minus first promotes it to int. The result is a negative int whose absolute value is the ASCII value of 'c', and the %d format string prints it as such.
-(99)
-99
(This is of course a compilation of know-how from several commenters: Jonathan Leffler, StoryTeller and a little by myself.)

As explained in the comments the code is working like this:
-(2[str]) => -(*(2 + str)) => -str[2]
str[2] is 'c', whose ASCII value is 99, so the output is -99.
Thanks to StoryTeller for clearing this up.

Related

Understanding syntax puzzle in c

In an upcoming exam in C, we have one question that gives you extra credit.
The question is always related to tricky syntax of various printing types.
Overall, I understood all the questions I have gone through, but two questions in particular had me puzzled:
What is the output of the following program?
#include <stdio.h>
int main(void){
    printf ("%c", '&'&'&');
}
answer: &
What is the output of the following program?
#include <stdio.h>
#include <string.h>
int main(void){
    printf("%c",strcmp("***","**")*'*');
}
answer: *
As you can see the questions are quite similar.
My question is, why is this the output?
Regarding the first question: I understand that a character is, logic-wise, always TRUE and that AND-ing TRUE with TRUE gives you TRUE (or 1) as well, but why would it convert 1 to '&' rather than the character with code 1 in the ASCII table? (Notice the required print of %c and not %d.)
Regarding the second question: I understand that strcmp returns an int according to the value that 'appears first in the dictionary', and in this example would result in 1, but why would multiplying it by the char '*' (again, logic-wise equal to 1) convert (1*1=1) to the char '*'?
For the first question the expression is '&' & '&', where & is a bitwise AND operator (not a logical operator). With bitwise AND the result of x & x is x, so the result in this case is just the character '&'.
For the second question, assuming the result of the call to strcmp() is 1, you can then simplify the expression to 1 * '*', which is just '*'. (Note that, as @rici mentions in the comments above, the result of strcmp is not guaranteed to be 1 in this case, only that it will be an integer > 0, so you should not rely on this behaviour, and the question is therefore a bad question.)
'&' is a constant of type int. '&'&'&' has the same value and type as '&' since a & a is a for any int a. So the output is equivalent to printf ("%c", '&');.
The analysis of the second snippet is more difficult. The result of strcmp is a positive number, and that is multiplied by '*' (which must be a positive number for any encoding supported by C). That's an int, but its value depends on the implementation (subject to the encoding on your platform and your platform's implementation of strcmp). The %c conversion then converts that int argument to unsigned char before writing it, so if the product is too big to fit, it wraps around modulo UCHAR_MAX + 1 and the printed character need not be '*'.

Increment madness [duplicate]

#include <stdio.h>
int main ()
{
    int a = 5,b = 2;
    printf("%d",a+++++b);
    return 0;
}
This code gives the following error:
error: lvalue required as increment operand
But if I put spaces around the operators, as in a++ + ++b, then it works fine.
#include <stdio.h>
int main ()
{
    int a = 5,b = 2;
    printf("%d",a++ + ++b);
    return 0;
}
What does the error mean in the first example?
Compilers are written in stages. The first stage is called the lexer and turns characters into a symbolic structure. So "++" becomes something like an enum SYMBOL_PLUSPLUS. Later, the parser stage turns this into an abstract syntax tree, but it can't change the symbols. You can affect the lexer by inserting spaces (which end symbols unless they are in quotes).
Normal lexers are greedy (with some exceptions), so your code is being interpreted as
a++ ++ +b
The input to the parser is a stream of symbols, so your code would be something like:
[ SYMBOL_NAME(name = "a"),
SYMBOL_PLUS_PLUS,
SYMBOL_PLUS_PLUS,
SYMBOL_PLUS,
SYMBOL_NAME(name = "b")
]
Which the parser thinks is syntactically incorrect. (EDIT based on comments: it is semantically incorrect, because you cannot apply ++ to an rvalue, which is what a++ yields.)
a+++b
is
a++ +b
Which is ok. So are your other examples.
printf("%d",a+++++b); is interpreted as (a++)++ + b according to the maximal munch rule.
++ (postfix) doesn't evaluate to an lvalue, but it requires its operand to be an lvalue.
6.4/4 says:
the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token.
The lexer uses what's generally called a "maximum munch" algorithm to create tokens. That means as it's reading characters in, it keeps reading characters until it encounters something that can't be part of the same token as what it already has (e.g., if it's been reading digits so what it has is a number, and it encounters an A, it knows that can't be part of the number, so it stops and leaves the A in the input buffer to use as the beginning of the next token). It then returns that token to the parser.
In this case, that means a+++++b gets lexed as a ++ ++ + b. Since the first post-increment yields an rvalue, the second can't be applied to it, and the compiler gives an error.
Just FWIW, in C++ you can overload operator++ to yield an lvalue, which allows this to work. For example:
struct bad_code {
    bad_code &operator++(int) {
        return *this;
    }
    int operator+(bad_code const &other) {
        return 1;
    }
};
int main() {
    bad_code a, b;
    int c = a+++++b;
    return 0;
}
This compiles and runs (though it does nothing) with the C++ compilers I have handy (VC++, g++, Comeau).
This exact example is covered in the draft C99 standard (same details in C11), section 6.4 Lexical elements, paragraph 4, which says:
If the input stream has been parsed into preprocessing tokens up to a
given character, the next preprocessing token is the longest sequence
of characters that could constitute a preprocessing token. [...]
This is also known as the maximal munch rule, which is used in lexical analysis to avoid ambiguities and works by taking as many characters as it can to form a valid token.
The paragraph also has two examples; the second one is an exact match for your question and is as follows:
EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y, which
violates a constraint on increment operators, even though the parse x
++ + ++ y might yield a correct expression.
which tells us that:
a+++++b
will be parsed as:
a ++ ++ + b
which violates the constraints on post increment since the result of the first post increment is an rvalue and post increment requires an lvalue. This is covered in section 6.5.2.4 Postfix increment and decrement operators which says (emphasis mine):
The operand of the postfix increment or decrement operator shall have
qualified or unqualified real or pointer type and shall be a
modifiable lvalue.
and
The result of the postfix ++ operator is the value of the operand.
The book C++ Gotchas also covers this case in Gotcha #17, Maximal Munch Problems; the problem is the same in C++, and the book gives some further examples. It explains that when dealing with the following set of characters:
->*
the lexical analyzer can do one of three things:
Treat it as three tokens: -, > and *
Treat it as two tokens: -> and *
Treat it as one token: ->*
The maximal munch rule allows it to avoid these ambiguities. The author points out that, in the C++ context, it:
solves many more problems than it causes, but in two common
situations, it’s an annoyance.
The first example would be templates whose template arguments are also templates (which was solved in C++11), for example:
list<vector<string>> lovos; // error!
^^
Which interprets the closing angle brackets as the shift operator, and so a space is required to disambiguate:
list< vector<string> > lovos;
^
The second case involves default arguments for pointers, for example:
void process( const char *= 0 ); // error!
^^
would be interpreted as the *= assignment operator; the solution in this case is to name the parameters in the declaration.
Your compiler desperately tries to parse a+++++b, and interprets it as (a++)++ +b. Now, the result of the post-increment (a++) is not an lvalue, i.e. it can't be post-incremented again.
Please don't ever write such code in production quality programs. Think about the poor fellow coming after you who needs to interpret your code.
(a++)++ +b
a++ returns the previous value, an rvalue. You can't increment this.
Because it causes undefined behaviour.
Which one is it?
c = (a++)++ + b
c = (a) + ++(++b)
c = (a++) + (++b)
Yeah, neither you nor the compiler knows it.
EDIT:
The real reason is the one as said by the others:
It gets interpreted as (a++)++ + b.
But postfix increment requires an lvalue (an expression that designates an object, such as a named variable), and (a++) returns an rvalue, which cannot be incremented, thus leading to the error message you get.
Thanks to the others for pointing this out.
I think the compiler sees it as
c = ((a++)++)+b
++ has to have as an operand a value that can be modified. a is a value that can be modified. a++ however is an 'rvalue', it cannot be modified.
By the way the error I see on GCC C is the same, but differently-worded: lvalue required as increment operand.
Follow this precedence order:
1. ++ (pre-increment)
2. + - (addition or subtraction)
3. x + y: add both results
int a = 5, b = 2;
printf("%d", a++ + ++b); // a++ yields 5 (post-increment), ++b yields 3 (pre-increment)
return 0;                // so it prints 5+3=8

Why is there no sign character in the syntax of constants?

Why doesn't the standard include a sign character in the syntax of constants?
It mentions only digits; a sign character appears only in exponents.
The standard does not bother with the sign in front of numeric literals because it would be redundant.
The syntax already captures the sign as part of unary plus + and unary minus - operators. When you write
int a = -4;
the syntax of the right-hand side could be adequately described as a unary minus - expression with the operand of 4. This is the approach that the standard takes.
If - were a part of the constant -2 then 4-2 would be a syntax error (since a token is always the longest possible sequence of characters). Also, the semantics of -2147483648 and - 2147483648 would be different (the first one would be an int and the second one a long, assuming int is 32 bits and long is longer). Both of those things would be confusing.
If the - is always an operator, the semantics of -2147483648 are sometimes a little unexpected, but the more common x-1 works as expected. So that's how most programming languages, including C, work.

Semantics of `printf("…") || printf("…") || printf("…")`

I'm wondering what the following statement will print in C?
printf("hello\n") || (printf("goodbye\n") || printf("world\n"));
I'm usually accustomed to using "cout" to print something in C. Also I'm confused about the pipe and double pipe operators used this way. Thank you!
First, cout is a C++ invention, never made it back to C, and never will.
Next, printf returns the number of printed characters, so the first call returns non-zero.
As || is short-circuiting boolean-or, none of the following printf-calls will be done.
(| is bitwise-or, and thus not short-circuiting. Added because you are talking about single pipes and @Leeor linked such a question.)
End result: hello\n is printed: 5 characters plus a newline (the newline may be translated, as stdout is a text stream; on Unix-like systems that translation is the identity).
7.21.6.3 The printf function
Synopsis
#include <stdio.h>
int printf(const char * restrict format, ...);
Description
2 The printf function is equivalent to fprintf with the argument stdout interposed
before the arguments to printf.
Returns
3 The printf function returns the number of characters transmitted, or a negative value if
an output or encoding error occurred.
6.5.12 Bitwise inclusive OR operator
Synopsis
[...]
Constraints
2 Each of the operands shall have integer type.
Semantics
3 The usual arithmetic conversions are performed on the operands.
4 The result of the | operator is the bitwise inclusive OR of the operands (that is, each bit in
the result is set if and only if at least one of the corresponding bits in the converted
operands is set).
6.5.14 Logical OR operator
Synopsis
[...]
Constraints
2 Each of the operands shall have scalar type.
Semantics
3 The || operator shall yield 1 if either of its operands compare unequal to 0; otherwise, it
yields 0. The result has type int.
4 Unlike the bitwise | operator, the || operator guarantees left-to-right evaluation; if the
second operand is evaluated, there is a sequence point between the evaluations of the first
and second operands. If the first operand compares unequal to 0, the second operand is
not evaluated.
It will print:
hello\n (that is, hello and a newline, not the literal "\n".)
printf returns the number of characters printed to the console. The || is a short-circuiting "or", which means: do the first thing, then if the first thing returned "false", do the next thing. At the end, return whether any of the things you did returned "true".
In C, an int is considered "true" if it is any value other than 0, and all three of those printf calls print more than 0 characters, so it will run the first one, which returns (a value logically equivalent to) true, so it will stop evaluating that expression and go on to the next statement.
Of course, there's no reason to ever write code like that... there are sometimes reasons to use short-circuiting boolean operators with functions that have side-effects (like printing to the console), but I can't think of a reason you'd ever need to short-circuit where the functions you were calling were being passed constants and you always knew exactly what result you would get from them.
Also, yes, as written there's a compile error because of an extra open-parenthesis before your second printf. But ignoring that.
It prints "hello" only!
http://www.compileonline.com/compile_c_online.php
#include <stdio.h>
#include <string.h>
int main(void)
{
    printf("hello\n") || (printf("goodbye\n") || printf("world\n"));
}
