Understanding syntax puzzle in c - c

In an upcoming exam in C, there is one question that gives you extra credit.
The question is always related to tricky syntax involving various printing types.
Overall, I understood all the questions I have gone through, but two questions in particular had me puzzled:
What is the output of the following program?
#include <stdio.h>
void main(){
printf ("%c", '&'&'&');
}
answer: &
What is the output of the following program?
#include <stdio.h>
#include <string.h>
void main(){
printf("%c",strcmp("***","**")*'*');
}
answer: *
As you can see the questions are quite similar.
My question is, why is this the output?
Regarding the first question: I understand that a character is, logic-wise, always TRUE and that AND-ing TRUE with TRUE gives you TRUE (or 1) as well, but why would it convert 1 to '&', why not the char equivalent of 1 from the ASCII-table? (notice the required print of %c and not %d)
Regarding the second question: I understand that strcmp returns an int according to the value that 'appears first in the dictionary' and in this example would result in 1 but why multiplying it with the char '*' (again, logic-wise equals to 1) would result in converting (1*1=1) to char '*'?

For the first question the expression is '&' & '&', where & is a bitwise AND operator (not a logical operator). With bitwise AND the result of x & x is x, so the result in this case is just the character '&'.
For the second question, assuming the result of the call to strcmp() is 1, you can then simplify the expression to 1 * '*', which is just '*'. (Note that, as @rici mentions in the comments above, the result of strcmp is not guaranteed to be 1 in this case, only that it will be an integer > 0, so you should not rely on this behaviour, and the question is therefore a bad one.)

'&' is a constant of type int. '&'&'&' has the same value and type as '&' since a & a is a for any int a. So the output is equivalent to printf ("%c", '&');.
The analysis of the second snippet is more difficult. The result of strcmp is a positive number, and that is multiplied by '*' (which must be a positive number for any encoding supported by C). The product is an int, but its value depends on your platform: on the encoding, and on your platform's implementation of strcmp. For %c, printf converts the int argument to unsigned char before writing it, so if the product is too big to fit, it wraps around modulo UCHAR_MAX + 1, regardless of whether plain char is signed or unsigned. The character printed is '*' when strcmp happens to return 1, and may be something else otherwise.

Related

Operator "<<=": What does it mean?

I need help solving this problem in my mind, so if anyone had a similar problem it would help me.
Here's my code:
char c=0xAB;
printf("01:%x\n", c<<2);
printf("02:%x\n", c<<=2);
printf("03:%x\n", c<<=2);
Why does the program print:
01:fffffeac
02:ffffffac
03:ffffffb0
What I expected to print, that is, what I got on paper is:
01:fffffeac
02:fffffeac
03:fffffab0
Clearly I don't understand what the <<= operator does; I thought it meant c = c << 2.
If anyone can clarify this, I would be grateful.
You're correct in thinking that
c <<= 2
is equivalent to
c = c << 2
But you have to remember that c is a single byte (on almost all systems), it can only contain eight bits, while a value like 0xeac requires 12 bits.
When the value 0xeac is assigned back to c then the value will be truncated and the top bits will simply be ignored, leaving you with 0xac (which when promoted to an int becomes 0xffffffac).
<<= means shift and assign. It's the compound assignment version of c = c << 2;.
There are several problems here:
char c=0xAB; is not guaranteed to give a positive result, since char could be an 8 bit signed type. See Is char signed or unsigned by default?. In which case 0xAB will get translated to a negative number in an implementation-defined way. Avoid this bug by always using uint8_t when dealing with raw binary bytes.
c<<2 is subject to implicit type promotion rules: specifically, c will get promoted to a signed int. If the previous issue occurred and your char got a negative value, the promoted operand is now a negative int.
Left-shifting negative values in C invokes undefined behavior - it is always a bug. Shifting signed operands in general is almost never correct.
%x isn't a suitable format specifier to print the int you ended up with, nor is it suitable for char.
As for how to fix the code, it depends on what you wish to achieve. A common recommendation is to cast to uint32_t before shifting.

How is the following code giving the output as -99?

Given below is a C code snippet with str initialized to "face":
char *str = "face";
printf("%d\n", -2[str]);
First, we need to parse the expression in question: We have two operators in -2[str] - a subscript operator [] and a unary minus operator -. Subscript operator has higher precedence than unary minus, so printf prints a negation of 2[str]*.
There are many Q&As explaining that 2[str] is the same as str[2], so I am not going to repeat the explanation; you can read about it in this Q&A.
Finally, the value of str[2] is 'c', which represents code of 99 on your system. Negation is applied to that value, so that is how -99 gets printed.
* Note that - is not part of an integer constant, because in C integer constants do not include sign.
The code in the question is:
char *str = "face";
printf("%d\n", -2[str]);
Let's be clear: this is horrid, and anyone writing that code should be made to rewrite it.
There are two parts to the confusion when approaching this:
1. Why is a[i] == i[a]?
2. How is -2[str] evaluated?
The linked question covers (1) extensively. Read it.
To address the second part, consider an alternative program:
#include <stdio.h>
int main(void)
{
char data[] = "XYZface";
char *str = &data[3];
printf("[%s] %d %d %d (%c)\n", str, -2[str], -(2[str]), (-2)[str], (-2)[str]);
return 0;
}
This outputs:
[face] -99 -99 89 (Y)
Why? The -2[str] notation is equivalent to -str[2] (you have read the linked Q&A, haven't you?) and not str[-2], because there are no negative literal numbers.
Read C11 §6.4.4.1 Integer constants: there are no minus signs in there. When you write -2, you have a unary minus operator and a literal 2. Mostly, that's the same as negative two, but not when mixed with a higher priority operator such as subscripting. The §6.5.2 Postfix operators such as subscripting have higher priority than the §6.5.3 Unary operators such as negation.
Let's also be clear: there is no undefined behaviour in the question's code (or mine, I trust). Technically, the value for letter 'c' (+99) is implementation-defined, but there are few extant systems where the integer value of 'c' is not 99 (but see EBCDIC for a code set where the answer would be different).
Let's dissect:
-2[str]
is
-(2[str])
because of operator precedence. Note that the -2 is not directly an integer literal; 2 is and it can receive the unary operator -, but before that happens, the [] operator is applied.
Next step is
-(str[2])
Because (a well known if curious fact) a[i]==i[a].
-('c')
Because of the format string %d, this is seen as a negative int, with the absolute value of the ASCII value of 'c'.
-(99)
-99
(This is of course a compilation of know-how by several commenters:
Jonathan Leffler, StoryTeller and a little by myself.)
As explained in the comments the code is working like this:
-(2[str]) => -(*(2 + str)) => -str[2]
str[2] is 'c', whose ASCII value is 99, so the output is -99.
Thanks, storyteller, for clearing this up.

Why EOF coincides with valid char value? [duplicate]

This question already has answers here:
What is EOF in the C programming language?
As said in comments to the answer to this question: Why gcc does not produce type mismatch warning for int and char?
both -1 and 255 are 0xFF as 8 bit HEX number on any current CPU.
But EOF is equal to -1. This is a contradiction, because the value of EOF must not coincide with any valid 8-bit character. This example demonstrates it:
#include <stdio.h>
int main(void)
{
char c = 255;
if (c == EOF) printf("oops\n");
return 0;
}
On my machine it prints oops.
How can this contradiction be explained?
When you compare an int value to a char value, the char value is promoted to an int value. This promotion is automatic and part of the C language specification (see e.g. this "Usual arithmetic conversions" reference, especially point 4). Sure the compiler could give a warning about it, but why should it if it's a valid language construct?
There's also the problem with the signedness of char which is implementation defined. If char is unsigned, then your condition would be false.
Also if you read just about any reference for functions reading characters from files (for example this one for fgetc and getc) you will see that they return an int and not a char, precisely for the reasons mentioned above.

What is the reason for the double negation -(-n)?

I'm going through some legacy code and I've seen something like
char n = 65;
char str[1024];
sprintf(str, "%d", -(-n));
Why has the author (no longer present) written -(-n) rather than just n? Wouldn't --n suffice?
The first thing to note is that --n actually decreases n by 1 and evaluates to the new value, with the type char; so it does something very different to -(-n). Don't change the code to that!
-n performs a unary negation of n and is also an expression of type int due to the type promotion rules of C. The further negation sets it back to the original value but with the type int retained.
So -(-n) is actually a verbose way of writing +n, which is often thought to be a no-op but in this case converts the type of n to an int.
I suspect the author is guarding themselves against errant refactoring and they were worried about mismatching the type of the argument with the format specifier %d.
But in this particular case it does not matter: sprintf will automatically promote the char type to an int, so it's perfectly safe to write
sprintf(str, "%d", n);
Do also consider reducing the size of the str buffer if that's "real" code, and consider using the safer snprintf variant.
(As a final remark note that a double negation can yield signed integral type overflow, so do use with caution.)

Why sizeof('c') is returning 4 instead of 1? [duplicate]

Possible Duplicate:
Why are C character literals ints instead of chars?
http://ideone.com/lHYY8
#include <stdio.h>
int main(void)
{
/* sizeof yields a size_t, which needs %zu rather than %d */
printf("%zu %zu\n", sizeof('c'), sizeof(char));
return 0;
}
Why does sizeof('c') return 4 instead of 1?
Because in C character constants have the type int, not char. So sizeof('c') == sizeof(int). Refer to this C FAQ
Perhaps surprisingly, character constants in C are of type int, so
sizeof('a') is sizeof(int) (though this is another area where C++
differs).
One (possibly even more extreme) oddity that also somewhat justifies this is the fact that character literals are not limited to a single character.
Try this:
printf("%d\n", 'xy');
This is sometimes useful when dealing with e.g. binary file formats that use 32-bit "chunk" identifiers, such as PNG. You can do things like this:
const int chunk = read_chunk_from_file(...);
if(chunk == 'IHDR')
process_image_header(...);
There might be portability issues with code like this, though, since the value of a multi-character literal is implementation-defined. Of course, the above snippet also assumes that read_chunk_from_file() magically does the right thing to transform the big-endian 32-bit value found in the PNG file into something that matches the value of the corresponding multi-character character literal.
The following is the famous line from the famous C book - The C programming Language by Kernighan & Ritchie with respect to a character written between single quotes.
A character written between single quotes represents an integer value equal to the numerical value of the character in the machine's character set.
So sizeof('a') is equivalent to sizeof(int)
And this question is a duplicate of why sizeof('a') is 4 in C?
cnicutar is completely right, of course. I just wanted to add the reason for this. If you look at functions like fgetc, you'll notice that they also return an int. That's because a char can represent any character from 0x00 to 0xFF, but an additional value is needed in order to represent EOF. So functions that return a character from input or a file often return an int, which can be compared with EOF. EOF is usually defined to be -1, but it can be any negative int, so it never matches a valid character.

Resources