#include <stdio.h>

int main()
{
    int x = 23;
    int y = 24;
    int z = 25;
    int asar = 26;
    int a[5];
    int p = 15;
    int q = 16;
    int r = 17;

    a[11] = 56;
    a[10] = 57;
    a[9] = 58;
    a[8] = 59;
    a[7] = 60;
    a[6] = 61;
    a[5] = 62;

    printf("\t x=%d, y=%d, z=%d,asar=%d, p=%d, q=%d, r=%d \n", x, y, z, asar, p, q, r);
    return 0;
}
I deliberately wrote past the bounds of the array, which causes undefined behavior, but I found that the out-of-bounds values ended up in the other variables in a regular sequence, starting from the highest index:
the value at index 11 got copied into x (first declaration)
the value at index 10 got copied into y (second declaration)
the value at index 9 got copied into z (third declaration)
the value at index 8 got copied into asar (fourth declaration)
the value at index 7 got copied into p (fifth declaration)
the value at index 6 got copied into q (sixth declaration)
the value at index 5 got copied into r (seventh declaration)
There are seven variables other than a, and I exceeded the bound of a by exactly seven elements (the last valid index is 4, and 4 + 7 = 11). The output was:
x=56, y=57, z=58,asar=59, p=60, q=61, r=62
Is there any logic behind this? I assumed the variables are laid out on the stack because there are seven variables besides a and the seven out-of-bounds values landed in them one after another; this has held in every case I tried where the number of extra variables equals the amount by which the bound was exceeded.
Is there a logical explanation for this, or is the question worthless?
While auto variables are almost always stored on a stack, this isn't enforced by the C standard.
For such an object [with automatic storage duration] that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
[6.2.4 Storage durations of objects, C11]
The data structure that best fits this description is indeed a stack, but it is not mandated by the standard, and there is no specific order in which the variables must be stored. CPUs usually require data to be aligned a certain way to access it quickly, and the compiler may (and will) reorder variables to reduce wasted addresses. Adding a char somewhere in the code may change the internal order of the variables.
Moreover, trying to access one object through another object's address is undefined behaviour. Even if you get "lucky" and get it to work the way you want, it may break later for some seemingly unrelated reason. For example, by changing the optimization level, the compiler may delete or merge certain variables, or place them in registers that you cannot access. Exploiting undefined behaviour is never a good idea.
If you need to ensure a specific order for your data, you can use a struct, whose order is guaranteed.
As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.
[6.7.2.1 Structure and union specifiers, C99]
Accessing members through a pointer to the struct object is valid, but the compiler can insert padding between members; the offsetof() macro is the correct way to find where a member actually lives. (Related: http://www.catb.org/esr/structure-packing/)
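As a minimal sketch of that point (the struct and member names here are invented for illustration), members appear in declaration order and offsetof() reveals any padding the compiler inserted:

#include <stdio.h>
#include <stddef.h>

struct demo {           /* hypothetical struct, purely for illustration */
    char c;             /* 1 byte, likely followed by padding */
    int  i;             /* typically aligned to 4 bytes */
    char tail;
};

int main(void)
{
    /* offsetof() reports where each member really starts, padding included */
    printf("c at %zu, i at %zu, tail at %zu, total size %zu\n",
           offsetof(struct demo, c),
           offsetof(struct demo, i),
           offsetof(struct demo, tail),
           sizeof(struct demo));
    return 0;
}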
Related
If I define a variable somewhere in the middle of a function (not at the beginning), will space for that variable be allocated as soon as the function is entered, or only when execution reaches the declaration statement?
If allocation happens in declaration order, does returning early (as below) save some overhead?
like so:
if (!(Size && Packs))
{
    ret = false;
    return SendAck(ret);
}

uint8_t *pSrc = (uint8_t *)pRcv->data;
uint8_t crc = 0;
The Standard C Computing Model
The C standard specifies memory reservation requirements in terms of storage duration and lifetime. Objects declared inside functions without extern, static, or _Thread_local have automatic storage duration. (Other storage durations, not discussed here, are static, thread, allocated, and temporary.) This includes parameters of functions, since they are declared inside the function declaration.
Each declaration inside a function has an associated block. Blocks are groups of statements (sometimes just a single statement). A compound statement bracketed with { and } is a block, each selection statement (if, switch) and each loop statement (for, while, do) is a block, and each substatement of those statements is a block. The block associated with a declaration inside a function is the innermost block it is in. For a function parameter, its associated block is the compound statement that defines the function.
For an automatic object that is not a variable length array, its lifetime starts when execution enters the block it is in, and it ends when execution of the block ends. (Calling a function suspends execution of the block but does not end it.) So in:
{
Label:
    foo();
    int A;
}
A exists as soon as execution reaches Label, because execution of A’s block has started.
This means that, as soon as the block is entered, all automatic objects in it other than variable length arrays should have memory reserved for them.
This fact is generally of little use, as there is no way to refer to A at Label. However, if we do this:
{
    int i = 0;
    int *p;
Label:
    if (0 < i)
        *p += foo();
    int A = 0;
    p = &A;
    if (++i < 3)
        goto Label;
    bar(A);
}
then we can use A at Label after the first iteration because p points to it. We could imagine motivation for code like this could arise in a loop that needs to treat its first iteration specially. However, I have never seen it used in practice.
For an automatic object that is a variable length array, its lifetime starts when execution reaches its declaration and ends when execution leaves the scope of the declaration. So with this code:
int N = baz();
{
    int i = 0;
Label:
    foo();
    int A[N];
    if (++i < 3)
        goto Label;
}
A does not exist at Label. Its lifetime begins each time execution reaches the declaration int A[N]; and ends, in the first few iterations, when the goto Label; transfers execution out of the scope of the declaration or, in the last iteration, when execution of the block ends.
Practical Implementation
In general-purpose C implementations, automatic objects are implemented with a hardware stack. A region of memory is set aside to be used as a stack, and a particular processor register, called the stack pointer, keeps track of how much is in use by recording the address of the current “top” of stack. (For historic reasons, stacks usually start at high addresses and grow toward lower addresses, so the logical top of a stack is at the bottom of its used addresses.)
Because the lifetimes of automatic objects other than variable length arrays start when execution of their associated blocks begins, a compiler could implement this by adjusting the stack pointer whenever entering or ending a block. This can provide memory efficiency in at least two ways. Consider this code:
if (foo(0))
{
    int A[100];
    bar(A, x, 0);
}
if (foo(1))
{
    int B[100];
    bar(B, x, 1);
}
if (foo(2))
{
    int C[1000];
    bar(C, x, 2);
}
Because A and B do not exist at the same time, the compiler does not have to reserve memory for both of them when the function starts. It can adjust the stack pointer when each block is entered and ended. And for the large array C, the space might never be reserved at all; a compiler could choose to allocate space for 1000 int only if the block is actually entered.
However, I do not think GCC and Clang take advantage of this. In practice, I think they generally figure out the maximum space that will be needed at any one time in the function, allocate that much space on the stack, and use it throughout the function. This does include optimizations like using the same space for A and B, since they are never in use at the same time, but it does not include optimizing for the possibility that C is never used. However, I could be wrong; I have not checked on this compiler behavior lately.
In the cases of variable length arrays, the compiler generally cannot plan the memory use in advance, since it does not know the array size. So, for a variable length array, space has to be reserved for it on the stack when its declaration is reached.
Additionally, note that the compiler does not have to implement the computing model the C standard uses literally. It can make any optimizations that get the same observable behavior. This means that, for example, if it can tell part of an array is not used, it does not have to allocate memory for that at all. This means the answer to your question, “… will space be allocated to the defined variable first or will it be allocated when it runs to the defined statement?”, is that a compiler designer may choose either method, as long as the observable behavior of the resulting program matches what the C standard specifies.
The observable behavior includes data written to files, input/output interactions, and accesses to volatile objects.
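As a hedged illustration of what "observable behavior" allows (the function names below are invented), a compiler may omit the storage for buffer entirely in the first function, because nothing observable depends on it, but it cannot omit the volatile accesses in the second:

int no_observable_use(void)
{
    int buffer[1000];       /* written but never read; a compiler may reserve no stack space for it */
    buffer[0] = 42;
    return 7;               /* the result does not depend on buffer */
}

int observable_use(void)
{
    volatile int flag = 0;  /* accesses to volatile objects are observable behavior */
    flag = 1;               /* must actually be performed */
    return flag;            /* must actually read the object */
}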
Let's assume that I have a for loop, and a very large struct as a stack variable:
for (int x = 0; x < 10; x++)
{
    MY_STRUCT structVar = {0};
    …code using structVar…
}
Will every compiler actually zero out the struct at the start of every loop? Or do I need to use memset to zero it out?
This is a very large struct and I want to allocate it on the stack, and I need to make sure every member of it is zeroed out at the start of every iteration. So do I need to use memset?
I can manually inspect the executable that I compile, but I need to make sure if there is any standard for this, or it just depends on the compiler.
Note that this code does compile. I am using Visual Studio.
Will every compiler actually zero out the struct at the start of every loop?
Any compiler that conforms to the C Standard will do this. From this Draft C11 Standard (bold emphasis mine):
6.8 Statements and blocks
…
3 A block allows a set of declarations and statements to be grouped into one syntactic unit. The initializers of objects that have automatic storage duration, and the variable length array declarators of ordinary identifiers with block scope, are evaluated and the values are stored in the objects (including storing an indeterminate value in objects without an initializer) each time the declaration is reached in the order of execution, as if it were a statement, and within each declaration in the order that declarators appear.
In the case of a for or while loop, a declaration/initializer inside the loop's scope block is reached repeatedly on each and every iteration of the loop.
See 6.2.4, paragraph 6:
If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
Will every compiler actually zero out the struct at the start of every loop?
Yes, or it will produce machine code with equivalent functionality ("observable behavior") as if you had performed a zero-out.
As long as you initialize one single member in the struct, then the rest of them will get set to zero/null ("as if they had static storage duration"). Similarly, any padding bytes added to the struct by the compiler will get set to zero. This is guaranteed by the C standard ISO:9899:2018 6.7.9 §10, §19 and §21.
Generally, where the zero-out actually occurs in the resulting executable depends on how the data is used. If you, for example, zero the struct at the beginning of the loop body, then write to various members and print it all at the end of the loop body, the compiler doesn't have many choices other than to zero out everything on every iteration of the loop. Example:
for (int x = 0; x < 10; x++)
{
    MY_STRUCT structVar = {0};
    ...
    structVar.foo = a;
    structVar.bar = b;
    printf("%d %d\n", structVar.foo, structVar.bar);
}
On the other hand, the compiler might in this case be smart enough to realize that the struct is just a pointless middle man and replace this all with the equivalent printf("%d %d\n", a, b);, meaning that the struct would be removed entirely from the machine code.
Overall, discussing optimizations like this can't be done without a specific use-case, compiler and target system.
Or do I need to use memset to zero it out?
No. MY_STRUCT structVar = {0}; is functionally 100% equivalent to memset(&structVar, 0, sizeof structVar);.
This is a very large struct and I want to allocate it on the stack
That's a different matter than initialization. It is indeed unwise to allocate large objects on the stack. In that case consider replacing it with for example this:
MY_STRUCT *structVar = malloc(sizeof *structVar);

for (int x = 0; x < 10; x++)
{
    memset(structVar, 0, sizeof *structVar);
    ...
}

free(structVar);
Variables of automatic storage duration defined within the body of a loop are (notionally) recreated on each iteration of the loop and have indeterminate values on each iteration if not initialized.
Regarding use of the {0} initializer for an object with automatic storage duration, note the following:
As per 6.7.9/19 and 6.7.9/21, elements of the object that have no explicit initializer will be initialized implicitly the same as object of static storage duration. As per 6.7.9/10, those elements will be initialized to value zero or a null pointer as appropriate, and any padding will be initialized to zero bits.
Using memset(ptr, 0, size) sets size bytes from address ptr onwards to 0, but note the following:
For some unusual execution environments, a pointer object with all bytes zero might not represent a null pointer value (so might not compare equal to 0).
For some unusual execution environments, a floating point object with all bytes zero might not represent a valid floating point value or might not compare equal to 0.0.
In summary, using the {0} initializer is the most portable way to set all elements of the object to compare equal to 0, and to set all padding bits or bytes to 0. Using memset instead is generally OK except for some weird execution environments.
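As a minimal sketch of the two approaches (the struct and its members are placeholders, not taken from the question):

#include <string.h>

struct my_struct { int count; double value; char name[16]; };

void demo(void)
{
    struct my_struct a = {0};  /* every member set to 0/0.0/"" as described above */

    struct my_struct b;
    memset(&b, 0, sizeof b);   /* every byte set to 0; equivalent on mainstream platforms,
                                  but see the caveats above about pointers and floating
                                  point on unusual execution environments */
    (void)a;
    (void)b;
}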
C code :
int a;
printf("\n\t %d",a); // It'll print some garbage value;
So how are these garbage values assigned to uninitialized variables behind the curtains in C?
Does it mean C first allocates memory to variable 'a' and then what ever there is at that memory location becomes value of 'a'? or something else?
Does it mean C first allocates memory to variable 'a' and then what ever there is at that memory location becomes value of 'a'?
Exactly!
Basically, C doesn't do anything you don't tell it to. That's both its strength and its weakness.
Does it mean C first allocates memory to variable 'a' and then what ever there is at that memory location becomes value of 'a'? or something else?
Correct. It is worth mentioning that the "allocation" of automatic variables such as int a is virtually nonexistent, since those variables are stored on the stack or in a CPU register. For variables stored on the stack, "allocation" is performed when the function is called, and boils down to an instruction that moves the stack pointer by a fixed offset calculated at compile time (the combined storage of all local variables used by the function, rounded to proper alignment).
The initial value of variables assigned to CPU registers is the previous contents of the register. Because of this difference (register vs. memory) it sometimes happens that programs that worked correctly when compiled without optimization start breaking when compiled with optimization turned on. The uninitialized variables, previously pointing to the location that happened to be zero-initialized, now contain values from previous uses of the same register.
Initially, memory holds some unknown values, also called garbage values.
Whenever we declare a variable, some memory is reserved for it according to the data type we specified in the declaration. The initial contents of that memory are unknown; if we then initialize the variable with some value, our value replaces whatever was at that memory location.
int a;
When a variable is declared, memory is allocated for it, but the variable has not been assigned, which means a is not initialized. Whatever happens to be in that memory is called a garbage value.
For example:
int a, b;
b=10;
printf("%d",b);
return 0;
Here a is only declared but never assigned or initialized, so whatever it happens to hold is called a garbage value.
Does it mean C first allocates memory to variable 'a' and then what ever there is at that memory location becomes value of 'a'?
No, it does not mean that.
When an object is not initialized, the C standard does not provide any plan for how its value is determined. Not only that, the program is not required to behave as if the object has any fixed value. It can vary as if the memory reserved for it were not held in any fixed state, just fluctuating.
Here are the specific C 2018 rules about that:
The so-called “value” of an uninitialized object is indeterminate, per 6.2.4 6 for objects with automatic storage duration without variable length array type, 6.2.4 7 for those with variable length array type, 7.22.3.4 2 for objects allocated with malloc, 7.22.3.5 2 for additional space allocated by realloc, and 7.22.3.1 2 for aligned_alloc. Other objects, such as those with static storage duration, are initialized.
Per 3.19.2, an indeterminate value is “either an unspecified value or a trap representation”.
Per 3.19.3, an unspecified value is a “valid value of the relevant type where this document imposes no requirements on which value is chosen in any instance”.
This means that in each instance an object is used, the C standard does not impose any requirements on which value is used for it. It is not required to be the same as a previous use. The program may behave as if it does not hold any fixed value. When it is used multiple times, the program may act as if it has a different value each time. For example, the C standard would allow printf("%d %d %d\n", a, a, a); to print “34 -10200773 2147483204”.
A way this can happen is that, while attempting to compile the code int a; printf("%d %d %d\n", a, a, a);, the compiler has nowhere to get a from, because it has never been given any fixed value. So, instead of generating useless instructions to move data from uninitialized memory to where the arguments are passed, the compiler generates nothing. Then printf gets called, and the registers or stack locations where the arguments are passed contain whatever data they had from earlier. And that may well be three different values, which printf prints. So it looks to an observer of the output as if a had three different values in printf("%d %d %d\n", a, a, a);.
(In addition, using the value of an uninitialized object with automatic storage duration that has not had its address taken is explicitly undefined behavior, because 6.3.2.1 2, about converting an object to its value, says “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.” So, when there is an object meeting these criteria, using its value can break the program completely; it might not only have different values at different times, but the program might abort, go down a branch different from what you expected, not call printf when there is a printf in the source code, and so on.)
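A hedged sketch of that distinction (the variable names are arbitrary): an uninitialized automatic variable whose address is never taken may not be read at all, while one whose address has been taken merely holds an indeterminate value, which is still not useful but is a different category of problem:

void f(void)
{
    int a;            /* address never taken: reading a is undefined behavior (6.3.2.1 2) */
    int b;            /* address taken below: b merely holds an indeterminate value */
    int *p = &b;

    /* printf("%d\n", a);   undefined behavior */
    /* printf("%d\n", *p);  uses an indeterminate value; still not meaningful */

    (void)p;          /* silence the unused-variable warning */
}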
What I found when I investigated this years back is this:
The answer is that a garbage value is leftover data from a previous program.
When a program that stores values in variables ends, the OS only releases the memory and makes it available to other programs; it does not automatically flush the data at those locations.
So, when you declare an uninitialized variable, some available memory is assigned to it, and because you have not stored any value, the leftover value at that location is not overwritten and becomes the garbage value for that variable, even though it is not relevant.
This can also be demonstrated with a small program that uses pointers to access the location of a variable from a previous run.
As we know, local variables have local scope and lifetime. Consider the following code:
int* abc()
{
    int m;
    return(&m);
}

void main()
{
    int* p = abc();
    *p = 32;
}
This gives me a warning that a function returns the address of a local variable.
I see this as justification:
Local variable m is deallocated once abc() completes, so we are dereferencing an invalid memory location in the main function.
However, consider the following code:
int* abc()
{
    int m;
    return(&m);
    int p = 9;
}

void main()
{
    int* p = abc();
    *p = 32;
}
Here I get the same warning. But I would have guessed that m still retains its lifetime at the point of the return. What is happening? Please explain the warning. Is my justification wrong?
First, notice that int p=9; will never be reached, so your two versions are functionally identical. The program will allocate memory for m and return the address of that memory; any code below the return statement is unreachable.
Second, the local variable m is not actually de-allocated after the function returns. Rather, the program considers the memory free space. That space might be used for another purpose, or it might stay unused and forever hold its old value. Because you have no guarantee about what happens to the memory once the abc() function exits, you should not attempt to access or modify it in any way.
As soon as the return keyword is encountered, control passes back to the caller and the called function goes out of scope; all of its local variables are popped off the stack. So the last statement in your second example is inconsequential, and the warning is justified.
Logically, m no longer exists when you return from the function, and any reference to it is invalid once the function exits.
Physically, the picture is a bit more complicated. The memory cells that m occupied are certainly still there, and if you access those cells before anything else has a chance to write to them, they'll contain the value that was written to them in the function, so under the right circumstances it's possible for you to read what was stored in m through p after abc has returned. Do not rely on this behavior being repeatable; it is a coding error.
From the language standard (C99):
6.2.4 Storage durations of objects
...
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address,25) and retains its last-stored value throughout its lifetime.26) If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.

25) The term ‘‘constant address’’ means that two pointers to the object constructed at possibly different times will compare equal. The address may be different during two different executions of the same program.

26) In the case of a volatile object, the last store need not be explicit in the program.
Emphasis mine. Basically, you're doing something that the language definition explicitly calls out as undefined behavior, meaning the compiler is free to handle that situation any way it wants to. It can issue a diagnostic (which your compiler is doing), it can translate the code without issuing a diagnostic, it can halt translation at that point, etc.
The only way you can make m still valid memory (keeping the maximum resemblance with your code) when you exit the function, is to prepend it with the static keyword
int* abc()
{
    static int m;
    m = 42;
    return &m;
}
Anything after a return is a "dead branch" that will never be executed.
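Note that static has its own drawbacks (one shared object, not reentrant). Two other common patterns, sketched here with invented function names, are to let the caller supply the storage or to allocate it dynamically:

#include <stdlib.h>

/* caller supplies the storage */
void abc_fill(int *out)
{
    *out = 42;
}

/* dynamic allocation: the caller owns the memory and must free() it */
int *abc_alloc(void)
{
    int *m = malloc(sizeof *m);
    if (m != NULL)
        *m = 42;
    return m;
}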
m is only visible locally. You could instead declare it as int *m, point it at dynamically allocated memory, and return that pointer directly.
When declaring an array in C like this:
int array[10];
What is the initial value of the integers?? I'm getting different results with different compilers and I want to know if it has something to do with the compiler, or the OS.
If the array is declared in a function, then the value is undefined. int x[10]; in a function means: take the ownership of 10-int-size area of memory without doing any initialization. If the array is declared as a global one or as static in a function, then all elements are initialized to zero if they aren't initialized already.
As required by the standard, all global and function-static variables are automatically initialised to 0. Automatic variables are not initialised.
int a[10]; // global - all elements are initialised to 0

void foo(void) {
    int b[10];        // automatic storage - contain junk
    static int c[10]; // static - initialised to 0
}
However, it is good practice to always manually initialise function variables, regardless of storage class. To set all array elements to 0, you only need to initialise the first array element to 0 explicitly; the omitted elements will be set to 0 automatically:
int b[10] = {0};
Why are function locals (auto storage class) not initialized when everything else is?
C is close to the hardware; that's its greatest strength and its biggest danger. The reason auto storage class objects have random initial values is because they are allocated on the stack, and a design decision was made not to automatically clear these (partly because they would need to be cleared on every function call).
On the other hand, the non-auto objects only have to be cleared once. Plus, the OS has to clear allocated pages for security reasons anyway. So the design decision here was to specify zero initialization. Why isn't security an issue with the stack, too? Actually it is cleared, at first. The junk you see is from earlier instances of your own program's call frames and the library code they called.
The end result is fast, memory-efficient code. All the advantages of assembly with none of the pain. Before dmr invented C, "HLL"s like Basic and entire OS kernels were really, literally, implemented as giant assembler programs. (With certain exceptions at places like IBM.)
According to the C standard, 6.7.8 (note 10):
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
So it depends on the compiler. With MSVC, debug builds will initialize automatic variables with 0xcc, whereas non-debug builds will not initialize those variables at all.
A C variable declaration just tells the compiler to set aside and name an area of memory for you. For automatic variables, also known as stack variables, the values in that memory are not changed from what they were before. Global and static variables are set to zero when the program starts.
Some compilers in unoptimized debug mode set automatic variables to zero. However, it has become common in newer compilers to set the values to a known bad value so that the programmer does not unknowingly write code that depends on a zero being set.
In order to ask the compiler to set an array to zero for you, you can write it as:
int array[10] = {0};
Better yet is to set the array with the values it should have. That is more efficient and avoids writing into the array twice.
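For instance (the values are arbitrary):

int array[10] = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 };   /* each element gets its real value in one step */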
In most recent compilers (e.g. gcc/VC++), the remaining members of partially initialized local arrays/structures are implicitly initialized to zero (int), NULL (char/char string), or 0.000000 (float/double).
Apart from local array/structure data as above, static (global/local) and file-scope objects also have the same property.
int a[5] = {0, 1, 2};
printf("%d %d %d\n", *a, *(a+2), *(a+4));

struct s1
{
    int i1;
    int i2;
    int i3;
    char c;
    char str[5];
};

struct s1 s11 = {1};
printf("%d %d %d %c %s\n", s11.i1, s11.i2, s11.i3, s11.c, s11.str);

if (!s11.c)
    printf("s11.c is null\n");
if (!*(s11.str))
    printf("s11.str is null\n");
In gcc/vc++, the output is:
0 2 0
1 0 0
s11.c is null
s11.str is null
(In the second line, the %c prints a null character and the %s prints an empty string, so nothing visible appears for them.)
Text from http://www.cplusplus.com/doc/tutorial/arrays/
SUMMARY:
Initializing arrays. When declaring a regular array of local scope (within a function, for example), if we do not specify otherwise, its elements will not be initialized to any value by default, so their content will be undetermined until we store some value in them. The elements of global and static arrays, on the other hand, are automatically initialized with their default values, which for all fundamental types means they are filled with zeros.

In both cases, local and global, when we declare an array, we have the possibility to assign initial values to each one of its elements by enclosing the values in braces { }.
For example:
int billy [5] = { 16, 2, 77, 40, 12071 };
The relevant sections from the C standard (emphasis mine):
5.1.2 Execution environments
All objects with static storage duration shall be initialized (set to their initial values) before program startup.
6.2.4 Storage durations of objects
An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static has static storage duration.
6.2.5 Types
Array and structure types are collectively called aggregate types.
6.7.8 Initialization
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these rules.
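A small hedged illustration of these rules (the names are invented): a static object with no initializer ends up with every member zeroed according to its type:

#include <stdio.h>

struct rec { int n; double d; char *p; int arr[3]; };

static struct rec r;   /* static storage duration, no explicit initializer */

int main(void)
{
    /* n is 0, d is 0.0, p is a null pointer, arr is all zeros */
    printf("%d %f %p %d\n", r.n, r.d, (void *)r.p, r.arr[0]);
    return 0;
}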
It depends on where your array is declared.
If it is a global or static array, it will be placed in the .bss section, which means it will be zero-initialized at program startup by the C runtime.
If it is a local array inside a function, it will be located on the stack and its initial values are not known.
If the array is declared inside a function, its elements have indeterminate values, but if the array is declared as a global, or is static inside the function, its elements have a default value of 0.