Explanation of output of program - c

Can anyone explain why this program prints 4 1 instead of 4 2?
Shouldn't pre increment operator which has higher precedence get executed first and print 4 2?
#include <stdio.h>
int main() {
int a=1;
printf ("%ld %d",sizeof(++a),a);
return 0;
}

Although you've already gotten several answers, I want to provide one more, because your question actually contained three separate misunderstandings, and I want to touch on all of them.
First of all, sizeof is a special operator which, by definition, does not evaluate its argument (that is, whatever subexpression it's taking the size of). So sizeof(++a) does not increment a. And sizeof(x = 5) would not assign 5 to x. And sizeof(printf("Hello!")) would not print "Hello".
Second, if we got rid of the sizeof, and simply wrote
printf("%d %d", ++a, a);
we would not be able to use precedence to figure out the behavior. Precedence is an important concept, but in general it does not help you figure out the behavior of confusing operations involving ++.
Finally, the perhaps surprising answer is that if you write
printf("%d %d", ++a, a);
it is not possible to figure out what it will do at all. It's basically undefined. (Specifically: in any function call like printf("%d %d", x, y) it's unspecified which order the arguments get evaluated in, so you don't know whether x or y gets evaluated first -- although there is an order. But then, when one of them is a and one of them is ++a, you have a situation where a is both being modified and having its value used, so there's no way to know whether the old or the new value gets used, and this makes the expression undefined. See this question for more on this issue.)
P.S. One more issue I forgot to mention, as noted by @Vlad from Moscow: %ld is not a reliable way to print the result of sizeof, which is a value of type size_t. You should use %zu if you can, or %u after casting to (unsigned) if you can't.

From the C Standard (6.5.3.4 The sizeof and alignof operators)
2 The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. The size is
determined from the type of the operand. The result is an integer. If
the type of the operand is a variable length array type, the operand
is evaluated; otherwise, the operand is not evaluated and the result
is an integer constant.
So the task of the sizeof operator is to determine the type of the expression used as an operand and then knowing the type of its operand to return the size of an object of the type. If the operand is not a variable length array then the expression used as an operand is not evaluated and the value returned by the sizeof operator is calculated at compile-time.
Thus this call
printf ("%ld %d",sizeof(++a),a);
is equivalent to the call
printf ("%ld %d",sizeof( int ),a);
and in your system sizeof( int ) is equal to 4.
So it does not matter what expression is used (except using a variable length array the size of which is calculated at run time) as an operand. It is the type of the expression that is important. For example you could even write
printf ( "%zu %d\n", sizeof( ( ++a, ++a, ++a, ++a, ++a ) ), a );
and get the same result.
Note that you should use the conversion specifier zu, which is meant for values of type size_t, instead of ld, which is meant for signed long values. That is, you need to write
printf ("%zu %d",sizeof(++a),a);

According to the C99 standard,
the sizeof operator only takes into account the type of the operand, which may be an expression or the name of a type (i.e. int, double, float etc.), and not the value obtained by evaluating the expression.
Hence, the operand of the sizeof operator is not evaluated (unless its type is a variable length array type).

Related

Why in C does the function sizeof() output the size of right most operand when more than one operands are passed separated by comma?

I have the following code in C:
#include <stdio.h>
void main() {
printf("%d %d\n",sizeof(5),sizeof(5,5));
printf("%d %d\n",sizeof(5),sizeof(5.0,5));
printf("%d %d\n",sizeof(5),sizeof(5,5.0));
}
And I get the output:
4 4
4 4
4 8
I understand that sizeof(5) would return me the size of integer and sizeof(5.0) would return the size of a double, but why does it give the size of the rightmost operand in case more than one arguments are passed separated by comma? Why not the first argument or the collective size of all the arguments?
I am compiling online using OnlineGDB.com compiler for C.
Thanks for your help.
The simple reason is: because sizeof is not a function! It is an operator that takes an expression (or a parenthesized type name) on its right. Syntactically, an expression operand is handled much like the expression after return. The parentheses are required only around a type name; around an expression they are optional and are added by programmers for clarity:
sizeof(foo); //no surprise, take the size of the variable/object
sizeof foo; //same as above, the parentheses are not needed
sizeof(void*); //you cannot pass a type to a function, but you can pass it to the sizeof operator
//sizeof void*; //does not compile: a type name operand must be parenthesized
typedef char arrayType[20];
arrayType* bar; //pointer to an array
sizeof(*bar); //you cannot pass an array to a function, but you can pass it to the sizeof operator
sizeof*bar; //same as above
//compare to the behavior of `return`:
return foo; //no surprise
return(foo); //same as above, any expression may be enclosed in parentheses
So, what happens when you say sizeof(5, 5.0)? Well, since sizeof is an operator, the parentheses are not a function call, but rather interpreted like the parentheses in 1*(2 + 3) == 5. In both cases, the ( follows an operator, and is thus not interpreted as a function call. As such, the comma does not separate function call arguments (because there is no function call), rather it's interpreted as the comma operator. And the comma operator is defined to evaluate both its operands, and then return the value of the last operand. The operator nature of the sizeof dictates how the expression on its right is parsed.
Because the comma operator associates left to right, and its result has the type and value of its right operand.
Only the rightmost expression is used and the rest are discarded (although their side effects matter for sequencing).
Therefore,
sizeof(5.0,5) is equivalent to sizeof(5)
and
sizeof(5,5.0) is equivalent to sizeof(5.0)

Why is the printf statement in the code below printing a value rather than a garbage value?

int main(){
int array[] = {10,20,30,40,50} ;
printf("%d\n",-2[array -2]);
return 0 ;
}
Can anyone explain how -2[array-2] is working and Why are [ ] used here?
This was a question in my assignment it gives the output " -10 " but I don't understand why?
Technically speaking, this invokes undefined behaviour. Quoting C11, chapter §6.5.6
If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. [....]
So, (array-2) is undefined behavior.
However, on most compilers the +2 and -2 in the indexing will likely cancel out [2[a] is the same as a[2], which is the same as *(a+2); thus 2[a-2] is *((2)+(a-2))], leaving only *(a), that is a[0], to be evaluated.
Then, check the operator precedence
-2[array -2] is effectively the same as -(array[0]). So, the result is the value of array[0], negated.
This is an unfortunate example for instruction, because it implies it's okay to do some incorrect things that often work in practice.
The technically correct answer is that the program has Undefined Behavior, so any result is possible, including printing -10, printing a different number, printing something different or nothing at all, failing to run, crashing, and/or doing something entirely unrelated.
The undefined behavior comes up from evaluating the subexpression array -2. array decays from its array type to a pointer to the first element. array -2 would point at the element which comes two positions before that, but there is no such element (and it's not the "one-past-the-end" special rule), so evaluating that is a problem no matter what context it appears in.
(C11 6.5.6/8 says)
When an expression that has integer type is added to or subtracted from a pointer, .... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Now the technically incorrect answer the instructor is probably looking for is what actually happens on most implementations:
Even though array -2 is outside the actual array, it evaluates to some address which is 2*sizeof(int) bytes before the address where the array's data starts. It's invalid to dereference that address since we don't know that there actually is any int there, but we're not going to.
Looking at the larger expression -2[array -2], the [] operator has higher precedence than the unary - operator, so it means -(2[array -2]) and not (-2)[array -2]. A[B] is defined to mean the same as *((A)+(B)). It's customary to have A be a pointer value and B be an integer value, but it's also legal to use them reversed like we're doing here. So these are equivalent:
-2[array -2]
-(2[array -2])
-(*(2 + (array - 2)))
-(*(array))
The last step acts like we would expect: Adding two to the address value of array - 2 is 2*sizeof(int) bytes after that value, which gets us back to the address of the first array element. So *(array) dereferences that address, giving 10, and -(*(array)) negates that value, giving -10. The program prints -10.
You should never count on things like this, even if you observe it "works" on your system and compiler. Since the language guarantees nothing about what will happen, the code might not work if you make slight changes which seem they shouldn't be related, or on a different system, a different compiler, a different version of the same compiler, or using the same system and compiler on a different day.
Here is how -2[array-2] is evaluated:
First, note that -2[array-2] is parsed as - (2[array-2]). The subscript operator, [...] has higher precedence than the unary - operator. We often think of constants like -2 as single numbers, but it is in fact a - operator applied to a 2.
In array-2, array is automatically converted to a pointer to its first element, so it points to array[0].
Then array-2 attempts to calculate a pointer to two elements before the first element of the array. The resulting behavior is not defined by the C standard because C 2018 6.5.6 8 says that only arithmetic that points to array members and the end of the array is defined.
For illustration only, suppose we are using a C implementation that extends the C standard by defining pointers to use a flat address space and permit arbitrary pointer arithmetic. Then array-2 points two elements before the array.
Then 2[array-2] uses the fact that the C standard defines E1[E2] to be *((E1)+(E2)). That is, the subscript operator is implemented by adding the two things and applying *. Thus, it does not matter which expression is E1 and which is E2. E1+E2 is the same as E2+E1. So 2[array-2] is *(2 + (array-2)). Adding 2 moves the pointer from two elements before the array back to the start of the array. Then applying * produces the element at that location, which is 10.
Finally, applying - gives −10. (Recall that this conclusion is only achieved using our supposition that the C implementation supports a flat address space. You cannot use this in general C code.)
This code invokes undefined behavior and can print anything, including -10.
C17 6.5.2.1 Array subscripting states:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))
Meaning array[n] is equivalent to *((array) + (n)) and that's how the compiler evaluates subscripting. This allows us to write silly obfuscation like n[array] as 100% equivalent to array[n]. Because *((n) + (array)) is equivalent to *((array) + (n)). As explained here:
With arrays, why is it the case that a[5] == 5[a]?
Looking at the expression -2[array -2] specifically:
[array -2] and [array - 2] are naturally equivalent. In this case the former is just sloppy style purposely used for the sake of obfuscating the code.
Operator precedence tells us to first consider [].
Thus the expression is equivalent to -*( (2) + (array - 2) )
Note that the first - is not part of the integer constant 2. C does not support negative integer constants 1); the - is actually the unary minus operator.
Unary minus has lower precedence than [], so the 2 in -2[ "binds" to the [.
The sub-expression (array - 2) is evaluated individually and invokes undefined behavior, as per C17 6.5.6/8:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. /--/ If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Speculatively, one potential form of undefined behavior could be that a compiler decides to replace the whole expression (2) + (array - 2) with array, in which case the whole expression would end up as -*array and prints -10.
There are no guarantees of this and therefore the code is bad. If you were given the assignment to explain why the code prints -10, your teacher is incompetent. Not only is it meaningless/harmful to study obfuscation as part of C studies, it is harmful to rely on undefined behavior or expect it to give a certain result.
1) C rather supports negative integer constant expressions. -2 is an integer constant expression, where 2 is an integer constant of type int.

How does the sizeof operator behave in the code snippet below?

Please explain the output of the code snippet below:
int *a="";
char *b=NULL;
float *c='\0' ;
printf(" %d",sizeof(a[1])); // prints 4
printf(" %d",sizeof(b[1])); // prints 1
printf(" %d",sizeof(c[1])); // prints 4
The compiler interprets a[1] as *(a+1). So a holds some address, and it steps 4 bytes ahead, where there will be some garbage value. So how is the output 4 bytes? Even if I use a[0] it still prints 4, although it is an empty string. How come its size is 4 bytes?
Here we are finding the size of the variable the pointer points to. So the size of a[1] means the size of *(a+1). Now a has the address of a string constant, which is an empty string. After I add 1 to that address it moves 4 bytes ahead and is at some new address. How do we know the size of the value there? It could be an integer, a character, a float, anything. So how do we reach a conclusion?
The sizeof operator does not evaluate its operand, except in one case.
From the C Standard (6.5.3.4 The sizeof and alignof operators)
2 The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. The size is
determined from the type of the operand. The result is an integer.
If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the
result is an integer constant.
In this code snippet
int *a="";
char *b=NULL;
float *c='\0' ;
printf(" %d",sizeof(a[1])); // prints 4
printf(" %d",sizeof(b[1])); // prints 1
printf(" %d",sizeof(c[1])); // prints 4
the type of the expression a[1] is int, the type of the expression b[1] is char and the type of the expression c[1] is float.
So the printf calls output correspondingly 4, 1, 4.
However the format specifiers in the calls are specified incorrectly. Instead of "%d" there must be "%zu" because the type of the value returned by the sizeof operator is size_t.
From the same section of the C Standard
5 The value of the result of both operators is implementation-defined,
and its type (an unsigned integer type) is size_t, defined in
<stddef.h> (and other headers).
This is all done statically, i.e. no dereferencing is happening at runtime. This is how the sizeof operator works, unless you use variable-length arrays (VLAs), then it must do work at runtime.
Which is why you can get away with sizeof:ing through a NULL pointer, and other things.
You should still be getting trouble for
int *a = "";
which makes no sense. I really dislike the c initializer too, but at least that makes sense.
sizeof operator happens at compilation (except for VLA's). It is looking at the type of an expression, not the actual data so even something like this will work:
sizeof(((float *)NULL)[1])
and give you the size of a float. Which on your system is 4 bytes.
Even though this looks super bad, it is all well defined, since no dereference ever actually occurs. This is all operations on type information at compile time.
sizeof() is based on the data type, so whilst it's getting the sizes outside the bounds of memory allocated to your variables, it doesn't matter as it's worked out at compile time rather than run time.

Why doesn't sizeof work as expected?

#include <stdio.h>
int main(void)
{
printf("%d", sizeof (getchar()) );
}
What I expect is,
1. Type input.
2. Read input and return input value.
3. Evaluate sizeof value.
4. Print the sizeof value.
But the first step never happens.
Why doesn't the first step happen?
The sizeof operator does not evaluate its operand unless its type is a variable length array type: It looks at the type and returns the size. This is perfectly safe:
char *ptr = NULL; // NULL pointer!
printf("%zu", sizeof *ptr);
It will return 1, since it does not have to evaluate the expression to know the answer.
What I expect is, 1. Type input. 2. Read input and return input value. 3. Evaluate sizeof value 4. Print the sizeof value.
But the first step never happens. Why doesn't the first step happen?
Because, with a very few exceptions, the sizeof operator does not evaluate its operand. Your usage is not one of the exceptions. Not evaluating getchar() means getchar() is not called.
In any event, I'm not sure what you expect from your code. Even if getchar() were called, the result always has the same type (int), which does not depend on the input.
Do also pay attention to @P.P.'s comments. Your printf() format does not match the type of the data being printed, size_t. As he observes, the printf() call has undefined behavior as a result.
In C, the sizeof operator is evaluated at run time only for variable length arrays (VLAs). In all other cases the operator does not evaluate its operand. It deduces the type of the expression and returns the size of an object of the deduced type.
Because getchar() return type is an int, not a char. sizeof(int) is 4 on your platform.
Also, you should use %zu to print size_t values. Using incorrect format specifier is technically undefined behaviour.

How does the code 'printf("%d %d %d\n",(a,b,c));' work?

Consider:
#include "stdio.h"
int main()
{
int a=3, b=4, c=5;
printf("%d %d %d\n", (a, b, c));
}
And the result is:
5 2280760 2281472
What is the explanation?
You have caused an undefined behavior:
This - (a,b,c) is evaluated as the last parameter, e.g. c, so the first printed number is 5 (c = 5) the other two are uninitialized parameters.
It doesn't work. It produces undefined behavior.
You supplied three format specifiers to printf and provided only one variadic argument, since in C (a,b,c) is an expression that evaluates to the value of c (read about the comma operator).
Since the number of arguments does not match the number of format specifiers, the behavior is undefined.
There are only two arguments to your printf call: "%d %d %d" and the result of evaluating (a,b,c).
The result of (a,b,c) is just the last item in the list: c, which is 5. That's passed to printf, which displays 5 for the first %d.
Since there are no more arguments, the remaining %d's just display whatever garbage is sitting on the call stack, resulting in the strange values you see.
I think you are missing the comma expression. A comma expression contains two operands of any type separated by a comma and has left-to-right associativity. The left operand is fully evaluated, possibly producing side effects, and its value, if there is one, is discarded. The right operand is then evaluated. The type and value of the result of a comma expression are those of its right operand, after the usual unary conversions.
The result of a comma expression is not an lvalue.
That is why you get the above result.
As IBM states in their documentation:
The primary use of the comma operator is to produce side effects in the following situations:
Calling a function
Entering or repeating an iteration loop
Testing a condition
Other situations where a side effect is required but the result of the expression is not immediately needed
MByD said:
This - (a,b,c) is evaluated as the last parameter, e.g. c, so the first printed number is 5 (c = 5) the other two are uninitialized parameters.
But a and b are not uninitialized (try the code below):
int a=3,b=4,c=5;
printf("%d %d %d\n",a,(a,b,c),b);
Because of the parentheses, (a,b,c) is treated as one argument, and since the comma operator is evaluated from left to right (as user null pointer said), its value is that of c.
The above code outputs 3 5 4.

Resources