When declaring an array in C like this:
int array[10];
What is the initial value of the integers? I'm getting different results with different compilers, and I want to know whether it has something to do with the compiler or the OS.
If the array is declared in a function, then its values are indeterminate. int x[10]; in a function means: take ownership of an area of memory big enough for 10 ints without doing any initialization. If the array is declared as a global or as static in a function, then all elements are initialized to zero unless they are explicitly initialized.
As set by the standard, all global and function-static variables are automatically initialised to 0. Automatic variables are not initialised.
int a[10]; // global - all elements are initialised to 0
void foo(void) {
int b[10]; // automatic storage - contain junk
static int c[10]; // static - initialised to 0
}
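A minimal sketch of the guarantee above, assuming a hosted C99 or later compiler. The helper names all_zero and statics_start_zeroed are hypothetical. Only the static arrays are inspected, because reading an uninitialized automatic array would use indeterminate values.

```c
#include <stddef.h>

int a[10];                     /* file scope: static storage, so all zeros */

/* Returns 1 if every element of arr[0..n-1] is zero, else 0. */
static int all_zero(const int *arr, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (arr[i] != 0)
            return 0;
    return 1;
}

/* Checks the guarantee for file-scope and block-scope static arrays.
   An automatic `int b[10];` is deliberately not inspected: reading it
   before writing it would use indeterminate values. */
int statics_start_zeroed(void)
{
    static int c[10];          /* block-scope static: also all zeros */
    return all_zero(a, 10) && all_zero(c, 10);
}
```

Calling statics_start_zeroed() from main should return 1 on any conforming implementation; nothing comparable can be asserted for an automatic array.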
However, it is good practice to always initialise function variables manually, regardless of their storage class. To set all array elements to 0, you just need to initialise the first element to 0; the omitted elements will be set to 0 automatically:
int b[10] = {0};
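To make the "omitted elements are set to 0" rule concrete, here is a small sketch (C99 or later assumed; count_zeros and the two wrapper functions are hypothetical names) that counts the zero elements after a {0} initializer and after a partial initializer.

```c
#include <stddef.h>

/* Counts the zero elements in arr[0..n-1]. */
size_t count_zeros(const int *arr, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; ++i)
        if (arr[i] == 0)
            ++zeros;
    return zeros;
}

size_t zeros_after_brace_zero(void)
{
    int b[10] = {0};           /* b[0] explicitly 0, b[1]..b[9] implicitly 0 */
    return count_zeros(b, 10);
}

size_t zeros_after_partial_init(void)
{
    int c[10] = {1, 2};        /* c[0] = 1, c[1] = 2, c[2]..c[9] implicitly 0 */
    return count_zeros(c, 10);
}
```

The same rule is what makes {0} work: it is simply a partial initializer whose one explicit value also happens to be zero.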
Why are function locals (auto storage class) not initialized when everything else is?
C is close to the hardware; that's its greatest strength and its biggest danger. Auto storage class objects have unpredictable initial values because they are allocated on the stack, and a design decision was made not to clear them automatically (partly because they would need to be cleared on every function call).
On the other hand, the non-auto objects only have to be cleared once. Plus, the OS has to clear allocated pages for security reasons anyway. So the design decision here was to specify zero initialization. Why isn't security an issue with the stack, too? Actually it is cleared, at first. The junk you see is from earlier instances of your own program's call frames and the library code they called.
The end result is fast, memory-efficient code. All the advantages of assembly with none of the pain. Before dmr invented C, "HLL"s like Basic and entire OS kernels were really, literally, implemented as giant assembler programs. (With certain exceptions at places like IBM.)
According to the C standard, 6.7.8 paragraph 10:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
So it depends on the compiler. With MSVC, debug builds will initialize automatic variables with 0xcc, whereas non-debug builds will not initialize those variables at all.
A C variable declaration just tells the compiler to set aside and name an area of memory for you. For automatic variables, also known as stack variables, the values in that memory are not changed from what they were before. Global and static variables are set to zero when the program starts.
Some compilers in unoptimized debug mode set automatic variables to zero. However, it has become common in newer compilers to set the values to a known bad value so that the programmer does not unknowingly write code that depends on a zero being set.
In order to ask the compiler to set an array to zero for you, you can write it as:
int array[10] = {0};
Better yet is to set the array with the values it should have. That is more efficient and avoids writing into the array twice.
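Partially initialized arrays and structures, including local ones, have their remaining members initialized as if they had static storage duration. This is guaranteed by the C standard, not just by particular compilers such as gcc or vc++: integers become 0, chars become '\0' (so a char array holds an empty string), floating-point members become 0.0, and pointers become null pointers.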
The same applies to objects with static storage duration (global, or static inside a function), which are zero-initialized even when they have no initializer at all.
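#include <stdio.h>

struct s1
{
    int i1;
    int i2;
    int i3;
    char c;
    char str[5];
};

int main(void)
{
    int a[5] = {0, 1, 2};          /* a[3] and a[4] are implicitly 0 */
    printf("%d %d %d\n", *a, *(a+2), *(a+4));

    struct s1 s11 = {1};           /* i2, i3, c and str are implicitly zeroed */
    /* c is printed with %d: %c would emit an invisible NUL character */
    printf("%d %d %d %d \"%s\"\n", s11.i1, s11.i2, s11.i3, s11.c, s11.str);
    if (!s11.c)
        printf("s11.c is null\n");
    if (!*(s11.str))
        printf("s11.str is null\n");
    return 0;
}
The output (guaranteed by the standard, not only by gcc/vc++) is:
0 2 0
1 0 0 0 ""
s11.c is null
s11.str is null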
Text from http://www.cplusplus.com/doc/tutorial/arrays/
SUMMARY:
Initializing arrays. When declaring a regular array of local scope (within a function, for example), if we do not specify otherwise, its elements will not be initialized to any value by default, so their content will be undetermined until we store some value in them. The elements of global and static arrays, on the other hand, are automatically initialized with their default values, which for all fundamental types this means they are filled with zeros.
In both cases, local and global, when we declare an array, we have the possibility to assign initial values to each one of its elements by enclosing the values in braces { }. For example:
int billy [5] = { 16, 2, 77, 40, 12071 };
The relevant sections from the C standard (emphasis mine):
5.1.2 Execution environments
All objects with static storage duration shall be initialized (set to their initial values) before program startup.
6.2.4 Storage durations of objects
An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static has static storage duration.
6.2.5 Types
Array and structure types are collectively called aggregate types.
6.7.8 Initialization
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these rules.
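The four rules can be checked against a few static objects. This is a sketch under stated assumptions: the struct and union types below are made up for illustration, and static_defaults_hold is a hypothetical helper.

```c
#include <stddef.h>

struct inner { int n; double d; };
struct outer { char *p; struct inner in; int arr[3]; };
union u { int first; double second; };

static struct outer o;   /* no initializer: static storage duration */
static union u un;       /* no initializer: static storage duration */

/* Returns 1 if the static objects above hold the values the rules require. */
int static_defaults_hold(void)
{
    return o.p == NULL                  /* pointer: null pointer */
        && o.in.n == 0                  /* arithmetic: zero */
        && o.in.d == 0.0                /* applied recursively inside the nested aggregate */
        && o.arr[0] == 0 && o.arr[2] == 0
        && un.first == 0;               /* union: first named member initialized */
}
```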
It depends on the location of your array.
If it is a global/static array without an initializer, it will typically be placed in the .bss section, which means it will be zero-initialized at startup, either by the C runtime's startup code or by the loader.
If it is a local array inside a function, it will typically be located on the stack, and its initial value is indeterminate.
If the array is declared inside a function, its values are indeterminate; if it is declared as a global, or as static inside the function, its elements have the default value 0.
#include<stdio.h>
int main()
{
int x = 23;
int y = 24;
int z = 25;
int asar= 26;
int a[5];
int p = 15;
int q = 16;
int r = 17;
a[11]=56;
a[10]=57;
a[9]=58;
a[8]=59;
a[7]=60;
a[6]=61;
a[5]=62;
printf("\t x=%d, y=%d, z=%d,asar=%d, p=%d, q=%d, r=%d \n", x, y, z, asar, p, q, r);
return 0;
}
I tried writing past the bounds of the array, which causes undefined behavior, but I found that all the out-of-bounds values were copied into the other variables in a sequence, starting from the highest index:
index 11 got copied into x (first declaration)
index 10 got copied into y (second declaration)
index 9 got copied into z (third declaration)
index 8 got copied into asar (fourth declaration)
index 7 got copied into p (fifth declaration)
index 6 got copied into q (sixth declaration)
index 5 got copied into r (seventh declaration)
There are 7 variables besides a, and I exceeded the bounds of a by exactly 7 (4 + 7 = 11). The output was:
x=56, y=57, z=58,asar=59, p=60, q=61, r=62
Is there any logic behind this?
Don't be surprised that I assumed the variables are allocated on the stack: there are 7 variables besides a, and the 7 out-of-bounds values were copied into them one after another. For me this holds every time the number of extra variables equals the number of elements written out of bounds.
Is there a logical explanation for this, or is the question pointless?
While auto variables are almost always stored on a stack, this isn't enforced by the C standard.
For such an object [with automatic storage duration] that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
[6.2.4 Storage durations of objects, C11]
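The "new instance each time" sentence can be seen with a recursive function. In this sketch (sum_levels is a hypothetical name), every active call keeps its own local, so the value a caller stored is still there after the nested calls return.

```c
/* Each active call has its own automatic `local`; the recursive call
   creates a new instance rather than overwriting the caller's copy. */
int sum_levels(int n)
{
    int local = n;               /* initialized each time the declaration is reached */
    if (n == 0)
        return local;
    int below = sum_levels(n - 1);
    return local + below;        /* local is still n here */
}
```

For example, sum_levels(4) adds the four separate locals 4, 3, 2, 1 and the final 0, giving 10.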
The data structure that best fits the description is indeed a stack, but it is not defined in the standard, and there is no specified order in which the variables are stored. CPUs usually require data to be aligned in a certain way to access it quickly, and the compiler may (and will) reorder variables to reduce unused addresses. Adding a char somewhere in the code may change the internal order of the variables.
Moreover, trying to access an object through another object's address is undefined behaviour. Even if you get "lucky" and it works the way you want, it may break later for seemingly unrelated reasons. For example, when you change the optimization level, the compiler may delete or merge certain variables, or place them in registers that you cannot access. Exploiting undefined behaviour is never a good idea.
If you need to ensure a specific order for your data, you can use a struct, whose member order is guaranteed.
As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.
[6.7.2.1 Structure and union specifiers, C99]
Accessing members through the struct (with . or ->) is always valid, but the compiler can insert padding between members, so if you compute a member's address from the struct's base address yourself, use the offsetof() macro rather than adding up member sizes. (Related: http://www.catb.org/esr/structure-packing/)
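A sketch of both points, assuming a made-up struct record: member offsets increase in declaration order, and offsetof() gives the correct byte offset even when padding is present. value_via_offset and members_in_order are hypothetical helpers.

```c
#include <stddef.h>

/* Hypothetical struct: members are laid out in declaration order, but the
   compiler may insert padding after `tag` so that `value` is aligned. */
struct record {
    char tag;
    int  value;
    char flag;
};

/* Reads `value` starting from the struct's base address; offsetof accounts
   for any padding, whereas adding sizeof(char) would not. */
int value_via_offset(const struct record *r)
{
    const unsigned char *base = (const unsigned char *)r;
    const int *vp = (const int *)(base + offsetof(struct record, value));
    return *vp;
}

/* Member order is guaranteed: later members have larger offsets. */
int members_in_order(void)
{
    return offsetof(struct record, tag) < offsetof(struct record, value)
        && offsetof(struct record, value) < offsetof(struct record, flag);
}
```

In ordinary code r->value is simpler and equally correct; the offsetof route only matters when you work with raw byte addresses.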
Let's assume that I have a for loop, and a very large struct as a stack variable:
for (int x=0 ; x <10; x++)
{
MY_STRUCT structVar = {0};
…code using structVar…
}
Will every compiler actually zero out the struct at the start of every loop? Or do I need to use memset to zero it out?
This is a very large struct and I want to allocate it on the stack, and I need to make sure every member of it is zeroed out at the start of every iteration. So do I need to use memset?
I can manually inspect the executable I compile, but I need to know whether there is a standard for this, or whether it just depends on the compiler.
Note that this code does compile. I am using Visual Studio.
Will every compiler actually zero out the struct at the start of every loop?
Any compiler that conforms to the C Standard will do this. From this Draft C11 Standard (bold emphasis mine):
6.8 Statements and blocks
…
3 A block allows a set of declarations and statements to be grouped into one syntactic unit. The initializers of objects that have automatic storage duration, and the variable length array declarators of ordinary identifiers with block scope, are evaluated and the values are stored in the objects (including storing an indeterminate value in objects without an initializer) each time the declaration is reached in the order of execution, as if it were a statement, and within each declaration in the order that declarators appear.
In the case of a for or while loop, a declaration/initializer inside the loop's scope block is reached repeatedly on each and every iteration of the loop.
See 6.2.4, paragraph 6:
If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
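A quick way to observe this rule, sketched with a hypothetical stand-in for MY_STRUCT (the question does not show its members): the loop dirties the struct in every iteration, yet each new iteration sees it zeroed again.

```c
/* Hypothetical stand-in for the question's MY_STRUCT. */
typedef struct {
    int  counter;
    char name[16];
} MY_STRUCT;

/* Returns how many iterations observed a fully zeroed struct on entry.
   `= {0}` runs every time the declaration is reached, so the writes at the
   end of one iteration never leak into the next. */
int iterations_starting_zeroed(int iterations)
{
    int zeroed = 0;
    for (int x = 0; x < iterations; x++) {
        MY_STRUCT structVar = {0};
        if (structVar.counter == 0 && structVar.name[0] == '\0')
            ++zeroed;
        structVar.counter += x + 1;    /* dirty it before the next iteration */
        structVar.name[0] = 'A';
    }
    return zeroed;
}
```

With a conforming compiler iterations_starting_zeroed(10) returns 10; an optimizer may avoid physically clearing memory, but the observable result must be the same.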
Will every compiler actually zero out the struct at the start of every loop?
Yes, or it will produce machine code with equivalent functionality ("observable behavior") as if you had performed a zero-out.
As long as you explicitly initialize at least one member of the struct, the rest of them will be set to zero/null ("as if they had static storage duration"). Similarly, any padding bytes added to the struct by the compiler will be set to zero. This is guaranteed by the C standard ISO/IEC 9899:2018 6.7.9 §10, §19 and §21.
Generally, where the zero-out actually happens in the resulting executable depends on how the data is used. If, for example, you zero the struct at the beginning of the loop body, then write to various members and print it all at the end of the loop body, the compiler doesn't have much choice but to zero out everything on each iteration of the loop. Example:
for (int x=0 ; x <10; x++)
{
MY_STRUCT structVar = {0};
...
structVar.foo = a;
structVar.bar = b;
printf("%d %d\n", structVar.foo, structVar.bar);
}
On the other hand, the compiler might in this case be smart enough to realize that the struct is just a pointless middle man and replace this all with the equivalent printf("%d %d\n", a, b);, meaning that the struct would be removed entirely from the machine code.
Overall, discussing optimizations like this can't be done without a specific use-case, compiler and target system.
Or do I need to use memset to zero it out?
No. MY_STRUCT structVar = {0}; is functionally equivalent to memset(&structVar, 0, sizeof structVar); on mainstream platforms.
This is a very large struct and I want to allocate it on the stack
That's a different matter from initialization. It is indeed unwise to allocate large objects on the stack. In that case, consider replacing it with something like this:
MY_STRUCT* structVar = malloc(sizeof *structVar);
if (structVar != NULL)  /* malloc can fail */
{
    for (int x=0 ; x <10; x++)
    {
        memset(structVar, 0, sizeof *structVar);
        ...
    }
    free(structVar);
}
For the variables (of automatic storage class) defined within the body of a loop, the variables are (notionally) recreated for each iteration of the loop and have indeterminate values for each iteration if not initialized.
Regarding use of the {0} initializer for an object with automatic storage duration, note the following:
As per 6.7.9/19 and 6.7.9/21, elements of the object that have no explicit initializer will be initialized implicitly the same as objects of static storage duration. As per 6.7.9/10, those elements will be initialized to value zero or a null pointer as appropriate, and any padding will be initialized to zero bits.
Using memset(ptr, 0, size) sets size bytes from address ptr onwards to 0, but note the following:
For some unusual execution environments, a pointer object with all bytes zero might not represent a null pointer value (so might not compare equal to 0).
For some unusual execution environments, a floating point object with all bytes zero might not represent a valid floating point value or might not compare equal to 0.0.
In summary, using the {0} initializer is the most portable way to set all elements of the object to compare equal to 0, and to set all padding bits or bytes to 0. Using memset instead is generally OK except for some weird execution environments.
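One portable way to re-zero an existing object (for example the malloc'd struct from the previous answer) without relying on all-bits-zero is to assign from a zero-initialized static object. This technique is not taken from the answers above; it is one common alternative, sketched here with hypothetical MY_STRUCT members and a hypothetical reset helper.

```c
#include <stddef.h>

/* Hypothetical struct mixing the member types the caveats above mention. */
typedef struct {
    int    count;
    double ratio;
    char  *label;
} MY_STRUCT;

/* A static object without an initializer is zero-initialized member by
   member: 0, 0.0 and a null pointer, whatever their bit patterns are. */
static const MY_STRUCT zero_struct;

/* Resets *s by struct assignment instead of memset. */
void reset(MY_STRUCT *s)
{
    *s = zero_struct;
}
```

Because zero_struct has static storage duration, its pointer member is a real null pointer and its double is 0.0 even on the unusual platforms mentioned above. For a very large struct this still copies the whole object, much like memset does.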
Why is the value of
int array[10];
undefined when declared in a function and is 0-initialized when declared as static?
I have been reading an answer to this question, and it is clear that
[the expression int array[10];] in a function means: take the ownership of 10-int-size area of memory without doing any initialization. If the array is declared as a global one or as static in a function, then all elements are initialized to zero if they aren't initialized already.
Question: why this behaviour? Do the compiler programmers decide that (for a particular reason)? Can a particular compiler used do the things differently?
Why I am asking: I would like to make my code portable across architectures and compilers. I know I can always initialize the declared array to ensure that, but then I lose precious time on this operation alone. So, which is the right decision?
An automatic int array[10]; isn't implicitly zeroed because the zeroing takes time and you might not need it zeroed. Additionally, you'd pay the cost not just once but each time control ran past the initialized variable.
A static/global int array[10]; is implicitly zeroed because statics/globals are allocated at load time. The memory will be fresh from the OS and if the OS is security conscious at all, the memory will have been zeroed already. Otherwise the loading code (the OS or a dynamic linker) will have to zero them (because the C standard requires it), but it should be able to do it in one call to memset for all globals/statics, which is considerably more efficient than zeroing each static/global variable at a time.
This initialization is done once. Even statics inside functions are initialized just once, even if they have nonzero initializers (e.g., static int x = 42;). This is why C requires the initializer of a static to be a constant expression.
Since the loadtime zeroing of all globals/statics is either OS-guaranteed or efficiently implementable, it might as well be standard-guaranteed and thereby make programmers' lives easier.
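A minimal sketch contrasting the two cases (next_value and next_value_auto are hypothetical names): the static is initialized once before program startup and keeps its value between calls, while the automatic variable is re-initialized each time its declaration is reached.

```c
/* Static: initialized to 42 once, before program startup; the initializer
   must be a constant expression. Each call sees the previous call's value. */
int next_value(void)
{
    static int x = 42;
    return ++x;
}

/* Automatic: initialized to 42 again on every call. */
int next_value_auto(void)
{
    int y = 42;
    return ++y;
}
```

The first two calls to next_value() return 43 and then 44, whereas next_value_auto() returns 43 every time.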
The values are not undefined but indeterminate, and it behaves this way because the standard says so.
Section 6.7.9p10 of the C standard regarding initialization states:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
So for any variable defined at file scope or declared static, you can safely assume the values are zero-initialized. For non-static variables declared inside a function or block, you cannot make any assumptions about their values if they are uninitialized.
As for why, global/static variables are initialized at program startup or even at compile time, while locals have to be initialized each time they come into scope and doing so would take time.
The reason for not defining the initial value of the variables in stack-allocated/local variables is efficiency. The C Standard expects your program to allocate your array and later fill it:
int array[10];
for (i = 0; i < 10; ++i)
array[i] = i * 42;
In this case, any initialization would be pointless, so the C Standard wants to avoid it.
If your program needs these values initialized to zero, you can do it explicitly:
int array[10] = {0}; // initialize to zero so the accumulation below works
while (condition)
{
... // some code
for (i = 0; i < 10; ++i)
array[i] += other_array[i];
}
It is your decision whether to initialize or not, because you are supposed to know how your program behaves. This decision will be different for different arrays.
However, this decision does not depend on the compiler: standard-compliant compilers all follow the same rule. One little detail regarding portability: if you don't initialize your array and still see all zeros in it with a particular compiler, don't be fooled. The values are still indeterminate; you cannot rely on them being 0.
Some other languages decided that zero initialization is cheap enough to do even if it's superfluous, and its advantage (safety) outweighs its disadvantage (performance). In C, performance is more important, so it decided otherwise.
The C philosophy is to a) always trust the programmer and b) prioritize execution speed over programmer convenience. C assumes that the programmer is in the best position to know whether an array (or any other auto variable) needs to be initialized to a specific value, and if so, is smart enough to write the code to do it themselves. Otherwise it won't waste the CPU cycles.
Same thing for bounds checking on array accesses, same thing for NULL checks on pointer dereferences, etc.
This is simultaneously C's greatest strength (fast code with a small footprint) and greatest weakness (lots of manual labor to make code safe and secure).
If I have a function that is called A LOT of times, and that function needs an array of 16 pointers that will be updated with new pointers every time it's called, is this the right way to declare this array?
char** readUserInput() {
static char* cmds[16];
...
}
Will this array be initialized once?
Yes, static variables are only initialized once. By declaring the variable cmds you are declaring an array of 16 char*s. The array's elements are initialized to null pointers, and the array will never be initialized again.
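Take this code as an example:
#include <stdio.h>
int main(void) {
    static char *cmds[5];   /* static: every element starts as a null pointer */
    for (int i = 0; i < 5; ++i)
        printf("%s ", cmds[i] == NULL ? "NULL" : "set");
    return 0;
}
It prints:
NULL NULL NULL NULL NULL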
It is declared only once regardless of whether it has the static storage-class specifier. Don't confuse declaration with lifetime.
If the real question is "will there be only one instance of cmds, and will its content persist between calls?", then yes. It is declared with the static storage-class specifier. Per §6.2.4p3 of the C11 standard:
"... Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup."
Static variables are only initialized once, and the static keyword is useful for giving a variable a lifetime spanning the entire program while limiting its scope.
It is unclear whether you have to return that array from the hot function. If yes, it has to be static or has to be passed in as an argument. If not, there is no difference in "speed" between a static and an automatic array, because your reuse scheme needs some (possibly no-op) re-initialization before each call anyway, regardless of what the very first initial value was.
The drawback of static storage is that the code becomes non-reentrant. Passing the array in as an argument would be a more correct solution.
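A sketch of the "pass it as an argument" approach, with the parsing left out because the question elides it. fill_cmds, MAX_CMDS and the tokens parameter are hypothetical; the caller owns the array, so there is no shared static state.

```c
#include <stddef.h>

#define MAX_CMDS 16   /* the 16 pointers from the question */

/* Hypothetical reentrant replacement for readUserInput(): the caller passes
   the storage in. Real input parsing is omitted; this sketch copies the
   given tokens and NULL-terminates the list. Returns the number stored. */
size_t fill_cmds(char *cmds[], size_t max_cmds, char *tokens[], size_t ntokens)
{
    size_t n = 0;
    while (n + 1 < max_cmds && n < ntokens) {   /* leave room for the terminator */
        cmds[n] = tokens[n];
        ++n;
    }
    if (max_cmds > 0)
        cmds[n] = NULL;   /* stale pointers from earlier calls are never read */
    return n;
}
```

Each caller declares its own char *cmds[MAX_CMDS]; (automatic storage is fine for 16 pointers), so concurrent or recursive callers never share the array.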
I have the following code, and I don't really understand which variables in test_function are stored in the stack segment.
In the book it says "The memory for these variables is in the stack segment", so I presume that happens when the variables are actually initialized to a value. Right?
void test_function(int a, int b, int c, int d) {
int flag; //is it this
char buffer[10];// and this
//or
flag = 31337; //this and
buffer[0] = 'A'; //this. Or all of it?
}
int main() {
test_function(1, 2, 3, 4);
}
The various C standards do not refer to a stack; what they do talk about is storage duration, of which C99 defines three kinds (static, automatic, and allocated; C11 adds thread storage duration). In this case flag and buffer have automatic storage duration. On the most common systems, objects with automatic storage duration are allocated on the stack, but you cannot assume that universally.
The lifetime of automatic objects starts when execution enters their block and ends when it leaves that block; in this case the block is the entire body of test_function. So, assuming there is a stack, in most implementations I have seen, space for buffer and flag is reserved on the stack when the function is entered (assuming no optimization of any sort).
Since flag and buffer are not initialized explicitly, their initial values are indeterminate; you need to assign to them before reading them.
For completeness' sake, the storage durations are covered in the C99 draft standard, section 6.2.4 Storage durations of objects; paragraph 1 says:
An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated. Allocated storage is described in 7.20.3.
Lifetime for automatic objects is covered in paragraph 5, which says:
For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. [...]
flag, buffer, and a, b, c, d will typically be on the stack (although on many ABIs the parameters are passed in registers, and the compiler may remove all of this code as dead code since none of it is used).