Dereferencing an uninitialized pointer to pass into sizeof() - c

In a recent post, I realised that when allocating a structure variable, passing the dereferenced pointer to sizeof() is deemed better practice than passing the structure type. This is basically because the former is more resilient to code changes than the latter.
This suggests that, in the following code, method 1 is deemed better practice than method 2.
typedef struct X_ {
int x;
int y;
int z;
} X;
int main() {
X* obj1 = malloc(sizeof(*obj1)); // ----> method 1
X* obj2 = malloc(sizeof(X)); // ----> method 2
return 0;
}
The question is, how valid is it to dereference obj1 in method 1 ? Inside malloc, obj1 is still unconstructed/uninitialized memory which suggests that dereferencing of obj1 happening inside sizeof() shouldn't be valid.
Let me guess at what makes method 1 valid: is it because sizeof() is a compile-time operation, so the compiler effectively translates the dereference of obj1 into method 2?
Could someone please elaborate on the technical validity of this, with reference to the relevant C standards?

When the operand of sizeof is not a variable length array, it is an unevaluated expression. So this expression
sizeof(*obj1)
is well-formed.
From the C Standard (6.5.3.4 The sizeof and alignof operators)
2 The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. The size is
determined from the type of the operand. The result is an integer.
If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the
result is an integer constant
As for which way of specifying the malloc argument is best
X* obj1 = malloc(sizeof(*obj1)); // ----> method 1
X* obj2 = malloc(sizeof(X)); // ----> method 2
if the type X is visible at the point where malloc is used, as in this case
X* obj1 = malloc(sizeof(*obj1)); // ----> method 1
then this approach is preferable.
However, if the type is not visible, as for example in
obj1 = malloc(sizeof(*obj1)); // ----> method 1
then I prefer to specify the type explicitly, like
obj1 = malloc(sizeof( X ));
Otherwise, a code snippet such as
p = malloc( sizeof( *p ) );
q = malloc( sizeof( *q ) );
does not give the reader enough information. The reader has to scroll back and forth through the source to find the declarations of p and q and determine their types.

The question is, how valid is it to dereference obj1 in method 1?
It's 100% valid. You could also write it without the parentheses: sizeof *obj1.
From N1570 ISO/IEC 9899:201x §6.5.3.4 The sizeof and _Alignof operators
2 -
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
In fact, one can say it's the preferable method: if you change the type of the object for some reason, it's easy to forget to change the sizeof argument as well, and using the dereferenced pointer avoids this potential silent error.

Related

Is pointer = &buffer[0] redundant?

I was recently reading through some source code and read the following at the beginning of a function:
char buffer[ 1000 ];
char *pointer;
pointer = &buffer[0];
I guess I don't understand this. Why not just write:
pointer = buffer;
Is there some secret meaning I am missing here?
Some people may find it easier to understand depending on the occasion.
Someone might say that when you use pointer = buffer; you intend to use the pointer as the buffer,
while if you use pointer = &buffer[0]; you intend to use the pointer as a pointer or an item of the buffer.
It just happens that those 2 cases point to the same address.
Both expressions give the same result value. So in your given case it is mainly a question of preferred style.
But there is a difference if you use the expressions in, for example, a function call. A static code analysis tool should complain about
memcpy(&buffer[0], src, 2 * sizeof(buffer[0]));
because you state that you are writing two elements into one array element. But the tool should not complain about
memcpy(&buffer, src, 2 * sizeof(buffer[0]));
or
memcpy(buffer, src, 2 * sizeof(buffer[0]));
because you now say that you want to write into the complete array.
Relevant parts in the standard:
6.3.2.1 Lvalues, arrays, and function designators
3 Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. [...]
6.5.3.2 Address and indirection operators
Semantics 3 The unary & operator yields the address of its operand. If the operand has type “type”, the result has type “pointer to type”. [...] Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary* that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a+ operator. Otherwise, the result is a pointer to the object or function designated by its operand.
They're absolutely the same. I also prefer the simpler version
pointer = array; // implicit conversion from array to address of its 1st element
pointer = &array[0]; // explicitly set pointer to the address of array's 1st element
In some cases, depending on how you're going to use the pointer, the explicit version may be more self-documenting.
From the C Standard, 6.5.2.1:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
So,
&buffer[0]
can be written as
&(*(buffer + 0))
Note that the operator & is used to get the address and the operator * is used for dereferencing. These operators cancel the effect of each other when used one after another. So, it is equivalent to
(buffer + 0)
which is nothing but
buffer
So, &buffer[0] is equivalent to buffer.

C assignment expression in argument to sizeof [duplicate]

Here is the code compiled in dev c++ windows:
#include <stdio.h>
int main() {
int x = 5;
printf("%d and ", sizeof(x++)); // note 1
printf("%d\n", x); // note 2
return 0;
}
I expect x to be 6 after executing note 1. However, the output is:
4 and 5
Can anyone explain why x does not increment after note 1?
From the C99 Standard (the emphasis is mine)
6.5.3.4/2
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
sizeof is a compile-time operator, so at the time of compilation sizeof and its operand get replaced by the result value. The operand is not evaluated (except when it is a variable length array) at all; only the type of the result matters.
short func(short x) { // this function never gets called !!
printf("%d", x); // this print never happens
return x;
}
int main() {
printf("%zu", sizeof(func(3))); // all that matters to sizeof is the
// return type of the function; size_t is printed with %zu.
return 0;
}
Output:
2
as short occupies 2 bytes on my machine.
Changing the return type of the function to double:
double func(short x) {
// rest all same
will give 8 as output.
sizeof(foo) tries really hard to discover the size of an expression at compile time:
6.5.3.4:
The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of
the operand. The result is an integer. If the type of the operand is a variable length array
type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an
integer constant.
In short: for variable length arrays, sizeof is evaluated at run time. (Note: Variable Length Arrays are a specific feature -- not arrays allocated with malloc(3).) Otherwise, only the type of the expression is computed, and that at compile time.
sizeof is a compile-time builtin operator and is not a function. This becomes very clear in the cases where you can use it without the parentheses:
(sizeof x) //this also works
Except for variable length arrays sizeof does not evaluate its arguments. We can see this from the draft C99 standard section 6.5.3.4 The sizeof operator paragraph 2 which says:
The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of
the operand. The result is an integer. If the type of the operand is a variable length array
type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an
integer constant.
A comment (now removed) asked whether something like this would evaluate at run time:
sizeof( char[x++] ) ;
and indeed it would. Something like this would also work:
sizeof( char[func()] ) ;
since they are both variable length arrays. Although, I don't see much practical use in either one.
Note, variable length arrays are covered in the draft C99 standard section 6.7.5.2 Array declarators paragraph 4:
[...] If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type.
Update
In C11 the answer changes for the VLA case: in certain cases it is unspecified whether the size expression is evaluated or not. Section 6.7.6.2 Array declarators says:
[...]Where a size expression is part of the operand of a sizeof
operator and changing the value of the size expression would not
affect the result of the operator, it is unspecified whether or not
the size expression is evaluated.
For example, in a case like this:
sizeof( int (*)[x++] )
As the operand of sizeof operator is not evaluated, you can do this:
int f(); //no definition, which means we cannot call it
int main(void) {
printf("%zu", sizeof(f()) ); //no linker error
return 0;
}
Online demo : http://ideone.com/S8e2Y
That is, you don't need to define the function f if it is used in sizeof only. This technique is mostly used in C++ template metaprogramming, as even in C++, the operand of sizeof is not evaluated.
Why does this work? It works because the sizeof operator doesn't operate on the value; instead it operates on the type of the expression. So when you write sizeof(f()), it operates on the type of the expression f(), which is nothing but the return type of the function f. The return type is always the same, no matter what value the function would return if it were actually executed.
In C++, you can even do this:
struct A
{
A(); //no definition, which means we cannot create instance!
int f(); //no definition, which means we cannot call it
};
int main() {
std::cout << sizeof(A().f())<< std::endl;
return 0;
}
It looks as though, inside sizeof, I'm first creating an instance of A by writing A(), and then calling the function f on that instance by writing A().f(), but no such thing happens.
Demo : http://ideone.com/egPMi
Here is another topic which explains some other interesting properties of sizeof:
sizeof taking two arguments
Execution cannot happen during compilation, so ++i/i++ will not happen. Likewise, sizeof(foo()) will not execute the function, but it still yields the size of its return type.
sizeof runs at compile-time, but x++ can only be evaluated at run-time. To solve this, the C++ standard dictates that the operand of sizeof is not evaluated. The C Standard says:
If the type of the operand [of sizeof] is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
This line here:
printf("%d and ", sizeof(x++)); // note 1
causes UB: %d expects an argument of type int, not size_t. Once you have UB, the behavior is undefined, including the bytes written to stdout.
If you fix that by replacing %d with %zu, or by casting the value to int (but not both), x would still not be incremented, but that is a different problem and should be asked in a different question.
The sizeof operator gives the size of the operand's data type only; it does not evaluate the expression inside it.

Is the operand of `sizeof` evaluated with a VLA?

An argument in the comments section of this answer prompted me to ask this question.
In the following code, bar points to a variable length array, so the sizeof is determined at runtime instead of compile time.
int foo = 100;
double (*bar)[foo];
The argument was about whether or not using sizeof evaluates its operand when the operand is a variable length array, making sizeof(*bar) undefined behavior when bar is not initialized.
Is it undefined behavior to use sizeof(*bar) because I'm dereferencing an uninitialized pointer? Is the operand of sizeof actually evaluated when the type is a variable length array, or does it just determine its type (how sizeof usually works)?
Edit: Everyone seems to be quoting this passage from the C11 draft. Does anyone know if this is the wording in the official standard?
Yes, this causes undefined behaviour.
In N1570 6.5.3.4/2 we have:
The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Now we have the question: is the type of *bar a variable length array type?
Since bar is declared as pointer to VLA, dereferencing it should yield a VLA. (But I do not see concrete text specifying whether or not it does).
Note: Further discussion could be had here, perhaps it could be argued that *bar has type double[100] which is not a VLA.
Supposing we agree that the type of *bar is actually a VLA type, then in sizeof *bar, the expression *bar is evaluated.
bar is indeterminate at this point. Now looking at 6.3.2.1/1:
if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
Since bar does not point to an object (by virtue of being indeterminate), evaluating *bar causes undefined behaviour.
Two other answers have already quoted N1570 6.5.3.4p2:
The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. The size is
determined from the type of the operand. The result is an integer. If
the type of the operand is a variable length array type, the operand
is evaluated; otherwise, the operand is not evaluated and the result
is an integer constant.
According to that paragraph from the standard, yes, the operand of sizeof is evaluated.
I'm going to argue that this is a defect in the standard; something is evaluated at run time, but the operand is not.
Let's consider a simpler example:
int len = 100;
double vla[len];
printf("sizeof vla = %zu\n", sizeof vla);
According to the standard, sizeof vla evaluates the expression vla. But what does that mean?
In most contexts, evaluating an array expression yields the address of the initial element -- but the sizeof operator is an explicit exception to that. We might assume that evaluating vla means accessing the values of its elements, which has undefined behavior since those elements have not been initialized. But there is no other context in which evaluation of an array expression accesses the values of its elements, and absolutely no need to do so in this case. (Correction: If a string literal is used to initialize an array object, the values of the elements are evaluated.)
When the declaration of vla is executed, the compiler will create some anonymous metadata to hold the length of the array (it has to, since assigning a new value to len after vla is defined and allocated doesn't change the length of vla). All that has to be done to determine sizeof vla is to multiply that stored value by sizeof (double) (or just to retrieve the stored value if it stores the size in bytes).
sizeof can also be applied to a parenthesized type name:
int len = 100;
printf("sizeof (double[len]) = %zu\n", sizeof (double[len]));
According to the standard, the sizeof expression evaluates the type. What does that mean? Clearly it has to evaluate the current value of len. Another example:
size_t func(void);
printf("sizeof (double[func()]) = %zu\n", sizeof (double[func()]));
Here the type name includes a function call. Evaluating the sizeof expression must call the function.
But in all of these cases, there's no actual need to evaluate the elements of the array object (if there is one), and no point in doing so.
sizeof applied to anything other than a VLA can be evaluated at compile time. The difference when sizeof is applied to a VLA (either an object or a type) is that something has to be evaluated at run time. But the thing that has to be evaluated is not the operand of sizeof; it's just whatever is needed to determine the size of the operand, which is never the operand itself.
The standard says that the operand of sizeof is evaluated if that operand is of variable length array type. That's a defect in the standard.
Getting back to the example in the question:
int foo = 100;
double (*bar)[foo] = NULL;
printf("sizeof *bar = %zu\n", sizeof *bar);
I've added an initialization to NULL to make it even clearer that dereferencing bar has undefined behavior.
*bar is of type double[foo], which is a VLA type. In principle, *bar is evaluated, which would have undefined behavior since bar is uninitialized. But again, there is no need to dereference bar. The compiler will generate some code when it processes the type double[foo], including saving the value of foo (or foo * sizeof (double)) in an anonymous variable. All it has to do to evaluate sizeof *bar is to retrieve the value of that anonymous variable. And if the standard were updated to define the semantics of sizeof consistently, it would be clear that evaluating sizeof *bar is well defined and yields 100 * sizeof (double) without having to dereference bar.
Indeed the Standard seems to imply that behaviour be undefined:
re-quoting N1570 6.5.3.4/2:
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
I think the wording from the Standard is confusing: the operand is evaluated does not mean that *bar will be evaluated. Evaluating *bar does not in any way help compute its size. sizeof(*bar) does need to be computed at run time, but the code generated for this has no need to dereference bar, it will more likely retrieve the size information from a hidden variable holding the result of the size computation at the time of bar's instantiation.

Sizeof() of pointer pointing to NULL

Following is my code:
#include <stdio.h>
struct abc
{
char a;
int b;
};
int main()
{
struct abc *abcp = NULL;
printf("%d", sizeof(*abcp)); //Prints 8
/* printf("%d",*abcp); //Causes program to hang or terminate */
return 0;
}
I understand that the size of struct is 8 due to structure padding. However, why is sizeof() of '*abcp' giving a value when 'abcp' is assigned NULL? 'abcp' when assigned NULL means that it is not pointing anywhere right? But, why I am getting an output for the above code?
sizeof is an operator, not a function.
You would be reminded of this if you dropped the pointless parentheses, and just wrote it:
printf("%zu", sizeof *abcp);
This also uses the C99-proper way to format a value of type size_t, which is %zu.
It works since the compiler computes the size at compile-time, without ever following (dereferencing) the pointer of course (since the pointer doesn't yet exist; the program isn't running).
sizeof is not a function and it doesn't evaluate its argument. Instead it deduces the type of *abcp, at compile time, and reports the size of that. Since abcp is a struct abc*, the type of *abcp is struct abc regardless of where abcp points.
From the C99 Standard
6.5.3.4/2
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Given Type *var, the following expressions are equivalent:
sizeof(Type)
sizeof(*var)
sizeof(var[N]) // for any constant integer expression N
sizeof(var[n]) // for any variable integer expression n
Each one of these expressions is resolved into a constant value during compilation.
So the value of var during runtime has no effect on either one of these expressions.
sizeof simply yields the size of the type of *abcp, which is struct abc, and that struct occupies 8 bytes on your machine. It doesn't matter whether the address stored in the pointer is valid or not (NULL is usually an invalid address).
