I am trying to understand the effect of the following in C:
int func(int arg) {
if (arg == 0) {
double *d = malloc(...);
}
//...
}
My understanding is:
Regardless of the value of arg, stack space will be made for the pointer d when func is invoked
d is only initialised, i.e. malloc called, if arg == 0
d can only be accessed inside the if block; trying to access it outside will generate a compile error - even though the stack space for d is allocated regardless.
So, it is equivalent to the following except for the scoping rules that prevent access outside the if block:
int func(int arg) {
double *d;
if (arg == 0) {
d = malloc(...);
}
//...
}
Is this correct? I am compiling with icc's default settings, which seem to be std=gnu89.
The lifetime of the object denoted by d starts at the beginning of the block in which it is declared (which might be prior to the declaration), not necessarily at the beginning of the function. In practice, compilers may choose to allocate space for all variables at function entry; GCC, for example, compiles both versions of func to identical assembly. With only a few automatic variables in a function, it's likely that they are all placed in registers and no stack space is used for them at all.
Initialization happens at the point where the initializer appears. All this is subject to the as-if rule (as always): in this case, GCC doesn't generate any call to malloc when optimizing (and thereby removes the memory leak), because a compiler is allowed to "know" what standard library functions do. If this weren't a library function and its definition were not known to the compiler, the call would be guaranteed to occur exactly when the initializer is reached.
Using an undeclared identifier (or one that has gone out of scope) is a syntax error, and thus caught at compile time. The lifetime of the denoted object (with automatic storage duration) ends with the enclosing block; any attempt to refer to it afterwards (through a pointer which used to point to the object) is undefined behaviour, no diagnostic required.
In the second code snippet, it's not only syntactically possible to use d after the if block, it's also defined to access the denoted object.
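A minimal sketch of the second form, to show what the wider scope buys you: because d is declared at function level, the code after the if block can still release the allocation. The size passed to malloc (sizeof *d), the stored value and the return value are assumptions made up for illustration, since the original elides the malloc argument.

```c
#include <stdlib.h>

/* Sketch only: the size passed to malloc and the return value are
   assumptions, since the original code elides them. */
int func(int arg) {
    double *d = NULL;   /* in scope for the whole function body */
    int allocated = 0;

    if (arg == 0) {
        d = malloc(sizeof *d);   /* runs only when arg == 0 */
        if (d != NULL) {
            *d = 1.5;
            allocated = 1;
        }
    }

    /* d is still visible here, so the allocation can be released;
       free(NULL) is a no-op when nothing was allocated. */
    free(d);
    return allocated;
}
```

With the declaration inside the if block, the free would have to move inside the block as well, because d could not be named after the closing brace.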
To illustrate the difference between the scope of an identifier and the lifetime of the denoted object, this is valid C99 (and C11) code:
void foo(void) {
int *p = 0;
again:
if(p) {
printf("%d\n", *p); /* n is not in scope here, but the object exists */
*p = 0;
}
int n = 42;
printf("%d\n", n);
if(!p) {
p = &n;
goto again;
}
}
The output is 42 three times: when the initializer is reached the second time, n is re-initialized to 42 (and does not stay 0).
Such questions don't arise for C89 (where a label cannot be above a declaration); in GNU89, mixed declarations and code are allowed, though it's not clear to me from the documentation whether the C99 lifetime rules are guaranteed to be honoured.
This code is undefined (in all C standards):
void foo(void) {
int *p = 0;
for(int i=0; i<2; ++i) {
int n = 42;
if(p) { /* (*) */
printf("%d\n", *p);
}
p = &n;
}
}
In the second iteration, p refers to the n of the first iteration, after its lifetime has ended, though both n likely reside at the same storage location and 42 is printed. NB: the behaviour is undefined as soon as (*) is reached the second time, because reading an invalid pointer value is undefined, not only the indirection in the printf call.
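For contrast, a sketch of a well-defined variant: if n is declared outside the loop body, there is a single object whose lifetime spans all iterations, so p stays valid. The function name sum_previous and the values are made up for illustration.

```c
#include <stddef.h>

int sum_previous(void) {
    int n = 0;        /* one object, alive until sum_previous returns */
    int *p = NULL;
    int total = 0;

    for (int i = 0; i < 3; ++i) {
        if (p) {
            total += *p;   /* reads the value stored in the previous iteration */
        }
        n = 42 + i;
        p = &n;
    }
    return total;     /* 42 + 43 */
}
```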
I'm new to C and cannot understand the result of the following code. I use goto to jump over the declarations of the array int a[N] and of int x. Although x is never initialized to 10, I can still access these variables.
#include <stdlib.h>
#include <stdio.h>
#define N 4
void printArray(int a[], int length) {
for (int i = 0; i < length; i++) {
printf("%d, ", a[i]);
}
printf("\n");
}
int main(void) {
goto done;
int a[N];
int x=10;
printf("x=%d\n", x);
done:
for (int i = 0; i < N; i++) {
a[i] = i;
}
printArray(a, N);
printf("x=%d\n", x);
return EXIT_SUCCESS;
}
Result:
0, 1, 2, 3
x=0
My question:
Why can I access variables whose declarations have been jumped over? How are variables declared in C? It seems that variable declarations are not run line by line.
Aside from "Variable Length Arrays (VLAs)", an automatic variable's "lifetime extends from entry into the block with which it is associated until execution of that block ends in any way." (§6.2.4) The initialization (if any) occurs a bit later, when the program passes through the declaration. It's legal to jump over the declaration if the program does not depend on the initialization of the variable. But regardless of whether the initialization will eventually happen, the variable exists from the moment your program enters the block, however that is done. (Jumping into a block from outside is also legal, and may also prevent initialization. But as soon as you're in the block, the variable exists.)
If the program attempts to read the value of an uninitialised variable, it receives an indeterminate value. (Most compilers attempt to detect the possibility that this might happen, but you'll need to enable warnings in order to see the report.)
The consequence of "hoisting" the lifetime of a variable to its enclosing block is that there is a portion of the program in which the variable exists but is not visible (since its scope starts where it is defined.) If you save the address of the variable and then jump back into this region of the program, you will be able to access the value of the variable through the pointer, which shows that scope and lifetime are distinct.
If the variable is a VLA, then its lifetime starts at the declaration and the compiler will not allow you to jump over the declaration. VLAs cannot be initialised, so you must assign a value to every element in a VLA which you intend to access. Not all compilers implement VLAs. Your example does not show a VLA, since N is a macro which expands to an integer constant.
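As a small sketch of the rule for ordinary (non-VLA) variables, with a made-up function name: the goto skips the initializer, so x exists after the label but holds an indeterminate value until something is assigned to it. Assigning before the first read keeps the program well-defined.

```c
int read_after_jump(void) {
    goto done;

    int x = 10;   /* never executed: the initializer is jumped over */

done:
    /* x exists here because its lifetime began on entry to the block,
       but its value is indeterminate, so assign before reading. */
    x = 7;
    return x;
}
```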
For objective info about the C standard, see rici's answer.
But I see you are asking how this behavior is possible from a C program, and remarking that:
It seems that variable declarations are not run line by line.
In fact, most computer languages are not run line by line. There is almost always some kind of multi-line parsing step that happens beforehand. For example, even the Bash shell language is processed multiple lines at a time. When Bash finds a while loop, it seems to do extra parsing to make sure the done command is found before it runs any of the code in the while loop. If you don't believe me, try running this Bash script:
while [ 1 ]
do
echo hi
sleep 1
# file terminates before the expected "done" command that ends the loop
Similarly, a C compiler is capable of parsing your code, checking its syntax, and checking that your program is well-formed before it even executes a single line of your code inside any of your functions. When the compiler sees that you are using a or x, it knows what those things are because you declared them above, even if the execution of the program never passed through those lines.
One way that you could imagine that a compiler might work is that (after checking the validity of your code) it moves all the local variable declarations to the top of the function. So your main function might look like this in the internal representation in the compiler:
int main(void) {
int a[N];
int x;
int i;
goto done;
x = 10;
printf("x=%d\n", x);
done:
for (i = 0; i < N; i++) {
a[i] = i;
}
printArray(a, N);
printf("x=%d\n", x);
return EXIT_SUCCESS;
}
This could actually be a useful transformation that would help the compiler generate code for your function because it would make it easy for the compiler to allocate memory on the stack for all your local variables when the function starts.
If inside a function I have the following variables:
{
int a;
};
{
int b;
};
Will then &a be equal to &b?
The ISO-C-standard answer is that we cannot tell. Since a and b are not in the same scope, we cannot evaluate an expression which contains both &a and &b. Moreover, the following trick to get around that is undefined behavior:
int *pa, *pb;
{
int a;
pa = &a;
}
{
int b;
pb = &b;
}
// these pointer values are indeterminate; using them is
// undefined behavior.
pa == pb;
But yes, compilers can and do reduce the storage for local variables in that way. It can be important if some of those variables are large-ish arrays. Though formally undefined behavior, the pa == pb comparison will de facto work in just about any compiler, making it possible to investigate the issue, though the best way to be sure is to obtain an assembly language listing of the generated code and read that.
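If you want to investigate without relying on indeterminate pointer values, one possible sketch is to convert each address to uintptr_t while the object is still alive and compare the integers afterwards. This is an assumption-laden illustration: uintptr_t is an optional type, the result of the conversion is implementation-defined, and whether the two addresses match depends on the compiler and its options. The function name is made up.

```c
#include <stdint.h>

/* Returns 1 if the compiler placed a and b at the same address, else 0. */
int blocks_share_storage(void) {
    uintptr_t addr_a, addr_b;
    {
        int a = 1;
        addr_a = (uintptr_t)(void *)&a;   /* converted while a is alive */
    }
    {
        int b = 2;
        addr_b = (uintptr_t)(void *)&b;   /* converted while b is alive */
    }
    /* Only integers are compared here; no pointer is used after the
       lifetime of its object has ended. */
    return addr_a == addr_b;
}
```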
Suppose you have some DEBUG_PRINT macro which expands to a block of code that declares a local char buf[512] array. If that is used numerous times in a function, it would be poor to have that many repetitions of the buffer reserved in the stack frame. The same remarks apply to inline functions.
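A sketch of what such a macro might look like; the macro body, the format string and add_with_tracing are assumptions for illustration, not taken from a real code base. Each expansion introduces its own block with its own buf, and since the three blocks never overlap in lifetime, a compiler is free to give them the same 512 bytes.

```c
#include <stdio.h>

/* Hypothetical macro: every expansion opens a block that declares buf. */
#define DEBUG_PRINT(label, value)                                  \
    do {                                                           \
        char buf[512];                                             \
        snprintf(buf, sizeof buf, "%s = %d\n", (label), (value));  \
        fputs(buf, stderr);                                        \
    } while (0)

int add_with_tracing(int a, int b) {
    DEBUG_PRINT("a", a);       /* first buf: lives only inside this block */
    DEBUG_PRINT("b", b);       /* second buf: may reuse the same storage */
    int sum = a + b;
    DEBUG_PRINT("sum", sum);   /* third buf */
    return sum;
}
```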
Is it allowed to take the address of an object on the right-hand side of its definition, as happens in foo() below:
typedef struct { char x[100]; } chars;
chars make(void *p) {
printf("p = %p\n", p);
chars c;
return c;
}
void foo(void) {
chars b = make(&b);
}
If it is allowed, is there any restriction on its use, e.g., is printing it OK, can I compare it to another pointer, etc?
In practice it seems to compile on the compilers I tested, with the expected behavior most of the time (but not always), but that's far from a guarantee.
To answer the question in the title, with your code sample in mind, yes it may. The C standard says as much in §6.2.4:
The lifetime of an object is the portion of program execution during
which storage is guaranteed to be reserved for it. An object exists,
has a constant address, and retains its last-stored value throughout
its lifetime.
For such an object that does not have a variable length array type,
its lifetime extends from entry into the block with which it is
associated until execution of that block ends in any way.
So yes, you may take the address of a variable from the point of declaration, because the object has the address at this point and it's in scope. A condensed example of this is the following:
void *p = &p;
It serves very little purpose, but is perfectly valid.
As for your second question, what you can do with it: I can mostly say that I wouldn't use that address to access the object until initialization is complete, because the order of evaluation of the expressions in an initializer is left unspecified (§6.7.9). You can easily find your foot shot off.
One place where this does come up is when defining all sorts of tabular data structures that need to be self-referential. For instance:
typedef struct tab_row {
// Useful data
struct tab_row *p_next;
} row;
row table[3] = {
[1] = { /*Data 1*/, &table[0] },
[2] = { /*Data 2*/, &table[1] },
[0] = { /*Data 0*/, &table[2] },
};
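A runnable sketch of the same idea, with a made-up int payload standing in for the useful data. At file scope, &table[0] and friends are address constants, so they are allowed in the initializer; walk_table is a hypothetical helper that follows the links once around.

```c
typedef struct tab_row {
    int data;                  /* stand-in for the useful data */
    struct tab_row *p_next;
} row;

/* Each row points at another element of the table being defined. */
static row table[3] = {
    [1] = { 10, &table[0] },
    [2] = { 20, &table[1] },
    [0] = {  0, &table[2] },
};

/* Follows p_next three times starting from table[0] and sums the data. */
int walk_table(void) {
    const row *r = &table[0];
    int sum = 0;
    for (int i = 0; i < 3; ++i) {
        sum += r->data;
        r = r->p_next;
    }
    /* After three hops the walk is back at the start: 0 -> 2 -> 1 -> 0. */
    return (r == &table[0]) ? sum : -1;
}
```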
6.2.1 Scopes of identifiers
Structure, union, and enumeration tags have scope that begins just after the appearance of
the tag in a type specifier that declares the tag. Each enumeration constant has scope that
begins just after the appearance of its defining enumerator in an enumerator list. Any
other identifier has scope that begins just after the completion of its declarator.
In
chars b = make(&b);
// ^^
the declarator is b, so it is in scope in its own initializer.
6.2.4 Storage durations of objects
For such an [automatic] object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way.
So in
{ // X
chars b = make(&b);
}
the lifetime of b starts at X, so by the time the initializer executes, it is both alive and in scope.
As far as I can tell, this is effectively identical to
{
chars b;
b = make(&b);
}
There's no reason you couldn't use &b there.
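A sketch of a use that is fine: here make only stores the pointer value and never reads through it, so it does not matter that b has no value yet. The file-scope variable seen, the initializer "hello" and the function address_matches are hypothetical additions for the demonstration.

```c
#include <stddef.h>

typedef struct { char x[100]; } chars;

static const void *seen = NULL;   /* hypothetical: remembers the argument */

chars make(void *p) {
    seen = p;                     /* stores the address, does not dereference it */
    chars c = { "hello" };
    return c;
}

/* Returns 1 if make received the address of b and b was initialized. */
int address_matches(void) {
    chars b = make(&b);           /* b is in scope and alive in its own initializer */
    return seen == (const void *)&b && b.x[0] == 'h';
}
```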
The question has already been answered, but for reference, it doesn't make much sense. This is how you would write the code:
typedef struct { char x[100]; } chars;
chars make (void) {
chars c;
/* init c */
return c;
}
void foo(void) {
chars b = make();
}
Or perhaps preferably, in the case of an ADT or similar, return a pointer to a malloc'ed object. Passing structs by value is usually not a good idea.
This is a theoretical question, I know how to do this unambiguously, but I got curious and dug into the standard and I need a second pair of standards lawyer eyes.
Let's start with two structs and one init function:
struct foo {
int a;
};
struct bar {
struct foo *f;
};
struct bar *
init_bar(struct foo *f)
{
struct bar *b = malloc(sizeof *b);
if (!b)
return NULL;
b->f = f;
return b;
}
We now have a sloppy programmer who doesn't check return values:
void
x(void)
{
struct bar *b;
b = init_bar(&((struct foo){ .a = 42 }));
b->f->a++;
free(b);
}
From my reading of the standard there's nothing wrong here other than potentially dereferencing a NULL pointer. Modifying struct foo through the pointer in struct bar should be legal because the lifetime of the compound literal sent into init_bar is the block where it's contained, which is the whole function x.
But now we have a more careful programmer:
void
y(void)
{
struct bar *b;
if ((b = init_bar(&((struct foo){ .a = 42 }))) == NULL)
err(1, "couldn't allocate b");
b->f->a++;
free(b);
}
Code does the same thing, right? So it should work too. But more careful reading of the C11 standard is leading me to believe that this leads to undefined behavior. (emphasis in quotes mine)
6.5.2.5 Compound literals
5 The value of the compound literal is that of an unnamed object initialized by the
initializer list. If the compound literal occurs outside the body of a function, the object
has static storage duration; otherwise, it has automatic storage duration associated with
the enclosing block.
6.8.4 Selection statements
3 A selection statement is a block whose scope is a strict subset of the scope of its
enclosing block. Each associated substatement is also a block whose scope is a strict
subset of the scope of the selection statement.
Am I reading this right? Does the fact that the if is a block mean that the lifetime of the compound literal is just the if statement?
(In case anyone wonders about where this contrived example came from, in real code init_bar is actually pthread_create and the thread is joined before the function returns, but I didn't want to muddy the waters by involving threads).
The second part of the Standard you quoted (6.8.4 Selection statements) says this. In code:
{//scope 1
if( ... )//scope 2
{
}//end scope 2
}//end scope 1
Scope 2 is entirely inside scope 1. Note that a selection statement in this case is the entire if statement, not just the brackets:
if( ... ){ ... }
Anything defined in that statement is in scope 2. Therefore, as shown in your third example, the lifetime of the compound literal, which is declared in scope 2, ends at the closing if bracket (end scope 2), so that example will cause undefined behavior if the function returns non-NULL (or NULL if err() doesn't terminate the program).
Note that I used brackets in the if statement, even though the third example doesn't use them. That part of the example is equivalent to this (6.8.2 Compound statement):
if ((b = init_bar(&((struct foo){ .a = 42 }))) == NULL)
{
err(1, "couldn't allocate b");
}
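One way to sidestep the question, sketched under the assumption that a named object is acceptable: declare the struct foo in the function's outermost block, so its lifetime clearly covers every later use. init_bar is repeated from the question; y_fixed and its return convention are made up, and it returns -1 instead of calling err() to stay within standard C.

```c
#include <stdlib.h>

struct foo { int a; };
struct bar { struct foo *f; };

struct bar *init_bar(struct foo *f) {
    struct bar *b = malloc(sizeof *b);
    if (!b)
        return NULL;
    b->f = f;
    return b;
}

/* Returns the incremented value, or -1 if the allocation failed. */
int y_fixed(void) {
    struct foo f = { .a = 42 };   /* lifetime: the whole body of y_fixed */
    struct bar *b = init_bar(&f);
    if (b == NULL)
        return -1;
    b->f->a++;                    /* f is still alive here */
    int result = f.a;
    free(b);
    return result;
}
```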
Does a static variable really exist for the whole program execution?
I know this code snippet makes no sense, but I'm asking myself: as I understand the C99 standard, if execution enters the body of the if statement, it means I was never dereferencing an object outside its lifetime, because by luck I was dereferencing the address where the static object will be (or already is). So is this not undefined behavior as long as the if condition is true?
Or does the lifetime of a static object only begin on its first appearance?
#define UTHOPICALMATCH (int *) 0xBCAA1400
int *foo (void);
int main(int argc, char** argv)
{
int * iPtr = UTHOPICALMATCH;
*iPtr = 5;
if (foo() == UTHOPICALMATCH)
{
printf ("It's still defined behavior!!!\r\n"); // is this true?
/*...*/
return 0;
}
return -1;
}
int *foo (void)
{
static int si;
return &si;
}
EDIT:
C99 6.2.4, paragraph 3, says:
An object whose identifier is declared with external or internal linkage, or with the
storage-class specifier static has static storage duration. Its lifetime is the entire
execution of the program and its stored value is initialized only once, prior to program
startup.
So I'm not asking about its lifetime after foo() has been called; I'm asking whether this means it is valid even before foo() is called.
I am really confused as to what you are asking.
static int * siPtr;
return siPtr;
This means that since siPtr is static, it's initialized to NULL. Because you never modify it, it remains NULL throughout the lifetime of the program. (Yes, it does exist even after foo() has returned.)
int * iPtr = UTHOPICALMATCH;
*iPtr = 5;
I don't see what you are trying to do here. UTHOPICALMATCH seems a random hard-coded address, are you sure it's valid?
if (foo() == UTHOPICALMATCH)
printf ("It's still defined behavior!!!\r\n"); // is this true?
It only is if UTHOPICALMATCH is a valid pointer, because then you are just comparing two pointers for equality. Otherwise the behavior is undefined, but that fact has nothing to do with siPtr being static.
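A small sketch of what static storage duration guarantees, using the foo() from the question; static_persists is a made-up helper. The object si exists, already zero-initialized, before foo() is ever called, and every call returns the same address.

```c
int *foo(void) {
    static int si;    /* zero-initialized before program startup */
    return &si;
}

/* Returns 1 if si behaves as a single object that outlives each call. */
int static_persists(void) {
    int *first = foo();
    int initial = *first;   /* 0 unless a previous caller changed it */
    *first = 5;
    int *second = foo();
    int ok = (first == second) && (*second == 5);
    *second = initial;      /* restore so repeated calls see the same state */
    return ok && initial == 0;
}
```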