I have the following code, and I don't really understand which parts of the variables in test_function are stored in the stack segment.
In the book it says "The memory for these variables is in the stack segment", so I presume that happens when the variables are actually initialized to a value. Right?
void test_function(int a, int b, int c, int d) {
    int flag; //is it this
    char buffer[10];// and this
    //or
    flag = 31337; //this and
    buffer[0] = 'A'; //this. Or all of it?
}
int main() {
    test_function(1, 2, 3, 4);
}
The various C standards do not refer to a stack; what they do talk about is storage duration, of which there are three kinds (static, automatic, and allocated). In this case flag and buffer have automatic storage duration. On most common systems, objects with automatic storage duration are allocated on the stack, but you cannot assume that universally.
The lifetime of an automatic object starts when you enter the block it is associated with and ends when you leave that block; here the block is the entire body of test_function. So, assuming there is a stack and no optimization of any sort, in most situations I have seen, space for flag and buffer is allocated on the stack when you enter the function.
Objects with automatic storage duration are not initialized implicitly, so their initial values are indeterminate; you need to assign to them before reading them.
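A minimal sketch of this point, reusing the question's flag and buffer inside a hypothetical function (the name storage_before_assignment and the pointers pf and pb are assumptions for illustration). It takes the addresses of both objects before anything is assigned to them, which only works because their storage already exists from the declaration onward:

```c
/* Hypothetical helper, not from the question: returns 1 when writes made
   through pointers taken before any assignment are visible afterwards. */
int storage_before_assignment(void)
{
    int flag;         /* storage exists here; the value is indeterminate */
    char buffer[10];  /* ten chars of storage, contents indeterminate */

    int *pf = &flag;   /* taking the address needs no prior assignment */
    char *pb = buffer;

    *pf = 31337;       /* the first write gives flag a determinate value */
    pb[0] = 'A';

    return flag == 31337 && buffer[0] == 'A';
}
```

The assignments only give the objects a value; they do not create the storage.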
For completeness' sake, the various storage durations are covered in the C99 draft standard, section 6.2.4 Storage durations of objects, paragraph 1, which says (emphasis mine):
An object has a storage duration that determines its lifetime. There are three storage
durations: static, automatic, and allocated. Allocated storage is described in 7.20.3.
Lifetime for automatic objects is covered in paragraph 5, which says:
For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way.[...]
flag, buffer, and a, b, c, d will be on the stack (although the compiler may remove all of this code as dead code, since none of it is used).
I know that the heap is used for dynamic memory allocation; otherwise the stack is used.
I have read "Difference between static memory allocation and dynamic memory allocation".
I know the difference, but my confusion is about their lifetimes.
First of all, stacks and heaps are implementation details (the words "stack" and "heap" do not appear anywhere in the C language standard). Instead, the standard talks about storage durations for objects (Section 6.2.4).
As of C2011, there are four storage durations: static, automatic, thread, and allocated.
Objects with static storage duration have lifetimes (see note 1) that extend over the lifetime of the program. That is, memory is set aside for them when the program is loaded, and that memory is released when the program exits. Objects declared at file scope (outside of any function) or with the static keyword have static storage duration. Storage for static objects is usually allocated from within the binary image itself (for ELF, this would include the .data, .rodata, and .bss sections); that is, something other than a stack or heap.
Objects with automatic storage duration have lifetimes that extend from the entry of the block in which they're created until the block exits (see note 2). If the block is entered recursively, a new object is created. Objects declared within a block without the static keyword have automatic storage duration. Objects with automatic storage duration are usually allocated from a runtime hardware stack, although not all architectures have a stack.
Objects with thread storage duration have lifetimes that extend over the execution of the thread for which they were created. Objects declared with the _Thread_local keyword have thread storage duration. I think thread-local objects are allocated in the same way as auto variables, but that may be wrong; I've never used C2011 native threading, so I can't say for sure.
Objects with allocated storage duration have lifetimes that extend from the time they are allocated with malloc, calloc, or realloc until they are explicitly deallocated with a call to free. Objects with allocated storage duration are usually allocated from the heap (although not all architectures will have a heap as such). Where things get confusing is distinguishing the allocated object from the object that points to it. Given the following code:
#include <stdlib.h>

int *foo( void )
{
    int *bar = malloc( sizeof *bar * 10 );
    // do stuff with bar
    return bar;
}
void bletch( void )
{
    int *blurga = foo();
    // do stuff with blurga
    free( blurga );
}
We've allocated three objects. In the function foo, we allocate a pointer object (referred to by the variable bar) with automatic storage duration; its lifetime is the lifetime of the function foo. In the function bletch, we allocate another pointer object (referred to by the variable blurga) with automatic storage duration; its lifetime extends over the lifetime of the function bletch.
The third object is a buffer large enough to hold 10 int objects. Its lifetime extends from the malloc call in foo to the free call in bletch; its lifetime is not tied to the lifetime of any function or block.
1. The lifetime of an object is the time within a program's execution that storage is guaranteed to be reserved for that object. Note that the lifetime of an object is distinct from the scope of the identifier that refers to that object. Even though memory for the object may be allocated at block entry, the scope of the identifier that refers to it may be more limited.
Assume the following code:
#include <stdio.h>

void foo()
{
    printf( "entered foo\n" );
    int i = 0;
    while ( i < 10 )
        printf( "%d\n", i++ );
}
The scope of the variable i extends from the end of its declaration until the end of the block; however, the lifetime of the integer object i refers to extends from block entry until block exit.
2. In practice, most compilers will set aside storage for all block-scope variables at function entry, even though some may be local to a block within the function. However, it's best to assume that the lifetime of an auto object only extends to the block in which it is contained.
Stack variables have local scope, meaning they are only valid to de-reference within the pair of {} where they were declared, while a dynamically allocated object is valid until your program calls free() on it.
A more correct name would be local variables, since local variables may also end up allocated in CPU registers and not always on the stack. Formally, they are called variables with automatic storage duration in the C standard, meaning that the compiler automatically decides which is the best place to allocate them at.
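A short sketch of that difference in lifetimes, assuming a hypothetical helper make_value (the name is not from the answers above). The pointer p is a local variable and disappears when the function returns, but the object it points to stays valid until it is freed:

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap object that outlives this call,
   or NULL if the allocation fails. The caller must free() the result. */
int *make_value(int v)
{
    int *p = malloc(sizeof *p);  /* p itself is automatic; *p is allocated */

    if (p != NULL)
        *p = v;

    return p;  /* fine: the pointed-to object lives until free() */
}
```

A caller would write int *q = make_value(7); and later free(q);. Returning the address of a local variable instead would hand back a pointer to an object whose lifetime has already ended.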
I have the following C code:
//declared at the beginning of the CAStar.c file:
int TERRAIN_PASSABLE = 1;
int TERRAIN_IMPASSABLE = 0;
int TERRAIN_SOME_WHAT_PASSABLE = 2;
I've noticed that for any of these variables, if they have a non-zero value, they are reported by the "nm" command as type "D" (initialized):
_TERRAIN_PASSABLE |00000008| D |
_TERRAIN_SOME_WHAT_PASSABLE|00000004| D |
However, those initialized to 0 are reported as "B" (uninitialized):
_TERRAIN_IMPASSABLE |00000000| B |
Why the difference between "initialized with 0" and "initialized with something else but 0" ?
This is more or less about how BSS works and how it is used. B means that the variable will be placed in the BSS section (and you are right, it is the uninitialized data section). D means that the symbol is placed in the initialized data section.
Read for example this article to know bit more about how BSS works and what it is used for.
Most likely these variables are declared at file scope, giving them static storage duration.
All variables with static storage duration are, for optimization purposes, sorted into two categories by the compiler/linker: initialized to 0 or initialized to something else. Variables initialized to zero are placed in a memory segment usually referred to as .bss, while those that are initialized to another value are placed in .data.
The reason for this is that .bss variables can be initialized much faster if they are allocated in adjacent memory. Basically they would be initialized with a single memset. Also, it will reduce the amount of ROM needed. Related question with details.
EDIT
The reason .bss variables end up listed as uninitialized is likely that there is a rule in the C language (C11 6.7.9/10) stating that all static storage duration variables that aren't initialized explicitly by the programmer (they are "uninitialized") shall be initialized to zero.
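A small sketch of the rule, reusing two of the question's variable names. TERRAIN_UNSET and terrain_defaults_ok are hypothetical additions for illustration, and the section comments describe what a typical ELF toolchain tends to do, which is an assumption rather than something the C standard specifies:

```c
/* File-scope objects: all have static storage duration. */
int TERRAIN_PASSABLE = 1;    /* nonzero initializer: usually .data */
int TERRAIN_IMPASSABLE = 0;  /* explicit zero: usually .bss */
int TERRAIN_UNSET;           /* hypothetical: no initializer, so zero too */

/* Returns 1 when both zero cases read as 0 and the nonzero one as 1. */
int terrain_defaults_ok(void)
{
    return TERRAIN_PASSABLE == 1
        && TERRAIN_IMPASSABLE == 0
        && TERRAIN_UNSET == 0;
}
```

From the program's point of view the explicit zero and the missing initializer are indistinguishable, which is why a toolchain is free to treat them the same way.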
i) static int a, b, c;
ii) int a; int b; int c;
I am not sure how memory will be allocated for these two kinds of declaration. And if the declarations are different, how much memory is allocated for each?
static int a,b,c;
will allocate three ints (probably 32 bits, or 4 bytes, each) in the DATA section of your program (on typical toolchains, its zero-initialized part, .bss). They will be there for as long as your program runs.
int a; int b; int c;
will, if declared inside a function, allocate three ints on the STACK. They will be gone when they go out of scope.
There is no difference between the size of memory for
static int a,b,c;
int a;int b;int c;
Differences occur in the lifetime, location, scope & initialization.
Lifetime: Were these declared globally, both sets of a,b,c would exist for the lifetime of the program. Were they both in a function, the static ones would exist for the program lifetime, but the others would exist only for the duration of the function call. Further, should the function be called recursively or re-entrantly, multiple sets of the non-static a,b,c would exist.
Location: A common arrangement, though not required by C, is to have a DATA section and a STACK section of memory. Global variables tend to go in DATA, as do function-level static ones. The non-static version of a,b,c in a function would typically go on the STACK.
Scope: Simple view: Functionally declared variables (static or not) are scoped within the function. Global variables declared static have file scope. Global variables not declared static have the scope of the entire program.
Initialization: follows along the same track as lifetime. Globally declared a,b,c, static or not, are both initialized at program start. If a,b,c are in a function, only static ones are initialized (at program start). Functional non-static a,b,c are not initialized.
Optimization may affect location, especially for the function-level non-static a,b,c, which could readily be kept in registers. Optimization may also determine that a variable is not used and optimize it out, so that it takes 0 bytes.
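The lifetime and initialization differences can be sketched with two hypothetical helpers (next_static and next_automatic are names invented for this example):

```c
/* A static local is zero-initialized once and keeps its value. */
int next_static(void)
{
    static int count;  /* static storage duration, starts at 0 */
    return ++count;
}

/* An automatic local is a fresh object on every call. */
int next_automatic(void)
{
    int count = 0;     /* must be initialized explicitly */
    return ++count;
}
```

Successive calls to next_static return 1, 2, 3, and so on, while every call to next_automatic returns 1.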
Variables that are defined as static will be allocated in the data segment at compile time. The same is true for global variables even though they are not static. Non-static variables defined within a block are allocated on the stack when the block is entered at runtime and are deallocated when the block is exited.
The amount of memory allocated is implementation dependent. The standard requires that an int be large enough to hold a 16-bit (2-byte) value, but it can be larger. Most compilers you are likely to use nowadays use 32-bit ints.
If we assume that the 2nd declaration is inside a function, then, as Bard and Nashant already said, these will be allocated in different memory sections (OS and compiler dependent).
But although each variable will be the same size, they CAN consume different amounts of memory. If the function (from the 2nd declaration) is called recursively, for example, there will be multiple instances of the variables from the 2nd declaration.
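A sketch of the recursion point, assuming a hypothetical function distinct_locals (not from the answers above). Each active call has its own instance of local, so the innermost call can compare the address of its own object with that of its caller's:

```c
#include <stddef.h>  /* NULL, used by callers for the outermost call */

/* Hypothetical helper: recurses `depth` levels and returns 1 if the
   innermost call's local is a different object from its caller's. */
int distinct_locals(int depth, const int *callers_local)
{
    int local = depth;  /* one instance per active call */

    if (depth == 0)
        return callers_local != &local;

    return distinct_locals(depth - 1, &local);
}
```

Calling distinct_locals(3, NULL) keeps four instances of local alive at the deepest point, whereas a static local would exist only once however deep the recursion goes.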
By considering that the memory is divided into four segments: data, heap, stack, and code, where do global variables, static variables, constant data types, local variables (defined and declared in functions), variables (in main function), pointers, and dynamically allocated space (using malloc and calloc) get stored in memory?
I think they would be allocated as follows:
Global variables -------> data
Static variables -------> data
Constant data types -----> code
Local variables (declared and defined in functions) --------> stack
Variables declared and defined in main function -----> heap
Pointers (for example, char *arr, int *arr) -------> heap
Dynamically allocated space (using malloc and calloc) --------> stack
I am referring to these variables only from the C perspective.
Please correct me if I am wrong as I am new to C.
You got some of these right, but whoever wrote the questions tricked you on at least one question:
global variables -------> data (correct)
static variables -------> data (correct)
constant data types -----> code and/or data. Consider string literals for a situation when a constant itself would be stored in the data segment, and references to it would be embedded in the code
local variables(declared and defined in functions) --------> stack (correct)
variables declared and defined in main function -----> also stack, not heap (the teacher was trying to trick you)
pointers(ex: char *arr, int *arr) -------> data or stack, not necessarily heap, depending on the context. C lets you declare a global or a static pointer, in which case the pointer itself would end up in the data segment.
dynamically allocated space(using malloc, calloc, realloc) --------> heap, not stack
It is worth mentioning that the C standard does not use the word "stack"; officially, such variables have "automatic storage duration".
For those future visitors who may be interested in knowing about those memory segments, I am writing important points about 5 memory segments in C:
Some heads up:
Whenever a C program is executed, some memory is allocated in RAM for the program's execution. This memory is used for storing the program's executable code (binary data), program variables, etc. The memory segments below describe how it is divided:
Typically there are three types of variables:
Local variables (also called as automatic variables in C)
Global variables
Static variables
You can have global static or local static variables, but the above three are the parent types.
5 Memory Segments in C:
1. Code Segment
The code segment, also referred to as the text segment, is the area of memory which contains the program's executable code.
The code segment is often read-only to avoid the risk of being overwritten by programming bugs such as buffer overflows.
The code segment does not contain program variables like local variable (also called as automatic variables in C), global variables, etc.
Based on the C implementation, the code segment can also contain read-only string literals. For example, when you do printf("Hello, world") then string "Hello, world" gets created in the code/text segment. You can verify this using size command in Linux OS.
Data Segment
The data segment is divided in the below two parts and typically lies below the heap area or in some implementations above the stack, but the data segment never lies between the heap and stack area.
2. Uninitialized data segment
This segment is also known as bss.
This is the portion of memory which contains:
Uninitialized global variables (including pointer variables)
Uninitialized constant global variables.
Uninitialized local static variables.
Any global or static local variable which is not initialized will be stored in the uninitialized data segment
For example: global variable int globalVar; or static local variable static int localStatic; will be stored in the uninitialized data segment.
If you declare a global variable and initialize it to 0 or NULL, it will still go to the uninitialized data segment (bss).
3. Initialized data segment
This segment stores:
Initialized global variables (including pointer variables)
Initialized constant global variables.
Initialized local static variables.
For example: global variable int globalVar = 1; or static local variable static int localStatic = 1; will be stored in initialized data segment.
This segment can be further classified into initialized read-only area and initialized read-write area. Initialized constant global variables will go in the initialized read-only area while variables whose values can be modified at runtime will go in the initialized read-write area.
The size of this segment is determined by the size of the values in the program's source code, and does not change at run time.
4. Stack Segment
The stack segment is used to store the data created for each function call (the function could be main or a user-defined function), such as:
Local variables of the function (including pointer variables)
Arguments passed to function
Return address
Variables stored in the stack will be removed as soon as the function execution finishes.
5. Heap Segment
This segment is to support dynamic memory allocation. If the programmer wants to allocate some memory dynamically then in C it is done using the malloc, calloc, or realloc methods.
For example, after int *ptr = malloc(sizeof(int) * 2), eight bytes (assuming a 4-byte int) will be allocated on the heap, and the address of that memory will be returned and stored in the ptr variable. The ptr variable will be on either the stack or in the data segment, depending on the way it is declared/used.
Corrected your wrong sentences
constant data types -----> code //wrong
local constant variables -----> stack
initialized global constant variable -----> data segment
uninitialized global constant variable -----> bss
variables declared and defined in main function -----> heap //wrong
variables declared and defined in main function -----> stack
pointers(ex:char *arr,int *arr) -------> heap //wrong
dynamically allocated space(using malloc,calloc) --------> stack //wrong
pointers(ex:char *arr,int *arr) -------> the pointer variable itself will be on the stack.
Consider that you are allocating n bytes of memory dynamically (using malloc or calloc) and then making a pointer variable point to it. Those n bytes of memory are on the heap, while the pointer variable requires 4 bytes (8 bytes on a 64-bit machine), which will be on the stack, to store the starting address of the n-byte chunk.
Note: Pointer variables can point to memory in any segment.
int x = 10;
void func()
{
    int a = 0;
    int *p = &a; //Now it's pointing to memory on the stack
    int *p2 = &x; //Now it's pointing to memory in the data segment
    char *name = "ashok"; //Now it's pointing to the constant string literal
                          //which is actually present in the text segment.
    char *name2 = malloc(10); //Now it's pointing to memory on the heap
    ...
}
dynamically allocated space(using malloc,calloc) --------> heap
A popular desktop architecture divides a process's virtual memory in several segments:
Text segment: contains the executable code. The instruction pointer takes values in this range.
Data segment: contains global variables (i.e. objects with static storage duration). Subdivided into initialized data, read-only data (such as string constants), and uninitialized data ("BSS").
Stack segment: contains the dynamic memory for the program, i.e. the free store ("heap") and the local stack frames for all the threads. Traditionally the C stack and C heap used to grow into the stack segment from opposite ends, but I believe that practice has been abandoned because it is too unsafe.
A C program typically puts objects with static storage duration into the data segment, dynamically allocated objects on the free store, and automatic objects on the call stack of the thread in which it lives.
On other platforms, such as old x86 real mode or on embedded devices, things can obviously be radically different.
I am referring to these variables only from the C perspective.
From the perspective of the C language, all that matters is extent, scope, linkage, and access; exactly how items are mapped to different memory segments is up to the individual implementation, and that will vary. The language standard doesn't talk about memory segments at all. Most modern architectures act mostly the same way; block-scope variables and function arguments will be allocated from the stack, file-scope and static variables will be allocated from a data or code segment, dynamic memory will be allocated from a heap, some constant data will be stored in read-only segments, etc.
One thing to keep in mind about storage is the as-if rule. The compiler is not required to put a variable in a specific place; instead it can place it wherever it pleases, as long as the compiled program behaves as if it were run in the abstract C machine according to the rules of the abstract C machine. This applies to all storage durations. For example:
a variable that is not accessed at all can be eliminated completely - it has no storage... anywhere. Example - see how there is 42 in the generated assembly code but no sign of 404.
a variable with automatic storage duration that does not have its address taken need not be stored in memory at all. An example would be a loop variable.
a variable that is const or effectively const need not be in memory. Example - the compiler can prove that foo is effectively const and inlines its use into the code. bar has external linkage and the compiler cannot prove that it would not be changed outside the current module, hence it is not inlined.
an object allocated with malloc need not reside in memory allocated from heap! Example - notice how the code does not have a call to malloc and neither is the value 42 ever stored in memory, it is kept in a register!
thus an object that has been allocated by malloc and the reference is lost without deallocating the object with free need not leak memory...
the object allocated by malloc need not be within the heap below the program break (sbrk(0)) on Unixen...
pointers(ex:char *arr,int *arr) -------> heap
Nope, they can be on the stack or in the data segment. They can point anywhere.
Variables/automatic variables ---> stack section
Dynamically allocated variables ---> heap section
Initialised global variables -> data section
Uninitialised global variables -> data section (bss)
Static variables -> data section
String constants -> text section/code section
Functions -> text section/code section
Text code -> text section/code section
Registers -> CPU registers
Command line inputs -> environmental/command line section
Environmental variables -> environmental/command line section
Linux minimal runnable examples with disassembly analysis
Since this is an implementation detail not specified by standards, let's just have a look at what the compiler is doing on a particular implementation.
In this answer, I will either link to specific answers that do the analysis, or provide the analysis directly here, and summarize all results here.
All of those are in various Ubuntu / GCC versions, and the outcomes are likely pretty stable across versions, but if we find any variations let's specify more precise versions.
Local variable inside a function
Be it main or any other function:
void f(void) {
int my_local_var;
}
As shown at: What does <value optimized out> mean in gdb?
-O0: stack
-O3: registers if they don't spill, stack otherwise
For motivation on why the stack exists see: What is the function of the push / pop instructions used on registers in x86 assembly?
Global variables and static function variables
/* BSS */
int my_global_implicit;
int my_global_implicit_explicit_0 = 0;
/* DATA */
int my_global_implicit_explicit_1 = 1;
void f(void) {
/* BSS */
static int my_static_local_var_implicit;
static int my_static_local_var_explicit_0 = 0;
/* DATA */
static int my_static_local_var_explicit_1 = 1;
}
if initialized to 0 or not initialized (and therefore implicitly initialized to 0): .bss section, see also: Why is the .bss segment required?
otherwise: .data section
char * and char c[]
As shown at: Where are static variables stored in C and C++?
void f(void) {
/* RODATA / TEXT */
char *a = "abc";
/* Stack. */
char b[] = "abc";
char c[] = {'a', 'b', 'c', '\0'};
}
TODO will very large string literals also be put on the stack? Or .data? Or does compilation fail?
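A sketch of the practical difference between the pointer and the array forms, assuming a hypothetical helper array_copy_is_writable (the name is invented for this example). The array holds its own copy of the characters and may be modified; the literal behind the pointer must not be written to:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the array copy can be modified
   while the literal it was initialized from is left untouched. */
int array_copy_is_writable(void)
{
    const char *a = "abc";  /* pointer to the literal; never write through it */
    char b[] = "abc";       /* automatic array holding its own copy */

    b[0] = 'x';             /* fine: modifies only the local array */

    return strcmp(b, "xbc") == 0
        && strcmp(a, "abc") == 0
        && sizeof b == 4;   /* three characters plus the terminating '\0' */
}
```

Declaring the pointer as const char * is a design choice here: it lets the compiler reject accidental writes to the literal, which would otherwise be undefined behavior.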
Function arguments
void f(int i, int j);
Must go through the relevant calling convention, e.g.: https://en.wikipedia.org/wiki/X86_calling_conventions for X86, which specifies either specific registers or stack locations for each variable.
Then as shown at What does <value optimized out> mean in gdb?, -O0 then slurps everything into the stack, while -O3 tries to use registers as much as possible.
If the function gets inlined however, they are treated just like regular locals.
const
I believe that it makes no difference because you can typecast it away.
Conversely, if the compiler is able to determine that some data is never written to, it could in theory place it in .rodata even if not const.
TODO analysis.
Pointers
They are variables (that contain addresses, which are numbers), so same as all the rest :-)
malloc
The question does not make much sense for malloc, since malloc is a function, and in:
int *i = malloc(sizeof(int));
i is a variable that contains an address, so it falls under the above case.
As for how malloc works internally, when you call it the Linux kernel marks certain addresses as writable in its internal data structures, and when they are first touched by the program, a fault happens and the kernel enables the page tables, which lets the access happen without a segfault: How does x86 paging work?
Note however that this is basically exactly what the exec syscall does under the hood when you try to run an executable: it marks pages it wants to load to, and writes the program there, see also: How does kernel get an executable binary file running under linux? Except that exec has some extra limitations on where to load to (e.g. if the code is not relocatable).
The exact syscall used for malloc is mmap in modern 2020 implementations, and in the past brk was used: Does malloc() use brk() or mmap()?
Dynamic libraries
Basically get mmaped to memory: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710
Environment variables and main's argv
Above initial stack: https://unix.stackexchange.com/questions/75939/where-is-the-environment-string-actual-stored TODO why not in .data?
When declaring an array in C like this:
int array[10];
What is the initial value of the integers?? I'm getting different results with different compilers and I want to know if it has something to do with the compiler, or the OS.
If the array is declared in a function, then its values are indeterminate. int x[10]; in a function means: take ownership of a 10-int-sized area of memory without doing any initialization. If the array is declared as a global one or as static in a function, then all elements are initialized to zero unless they are explicitly initialized.
As set by the standard, all global and function static variables are automatically initialised to 0. Automatic variables are not initialised.
int a[10]; // global - all elements are initialised to 0
void foo(void) {
int b[10]; // automatic storage - contain junk
static int c[10]; // static - initialised to 0
}
However, it is good practice to always initialise function variables manually, regardless of their storage class. To set all array elements to 0 you just need to initialise the first array element to 0; the omitted elements will be set to 0 automatically:
int b[10] = {0};
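The same rule applies to any partial initializer, not just {0}. A minimal sketch, assuming a hypothetical helper sum_partial that initializes only the first two elements and sums the whole array:

```c
#include <stddef.h>

/* Hypothetical helper: sums a partially initialized automatic array. */
int sum_partial(void)
{
    int b[10] = {1, 2};  /* b[2] through b[9] are set to 0 */
    int sum = 0;

    for (size_t i = 0; i < sizeof b / sizeof b[0]; i++)
        sum += b[i];

    return sum;  /* 1 + 2 + eight zeros */
}
```

Without any initializer at all, reading the elements of b would use indeterminate values.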
Why are function locals (auto storage class) not initialized when everything else is?
C is close to the hardware; that's its greatest strength and its biggest danger. The reason auto storage class objects have random initial values is because they are allocated on the stack, and a design decision was made not to automatically clear these (partly because they would need to be cleared on every function call).
On the other hand, the non-auto objects only have to be cleared once. Plus, the OS has to clear allocated pages for security reasons anyway. So the design decision here was to specify zero initialization. Why isn't security an issue with the stack, too? Actually it is cleared, at first. The junk you see is from earlier instances of your own program's call frames and the library code they called.
The end result is fast, memory-efficient code. All the advantages of assembly with none of the pain. Before dmr invented C, "HLL"s like Basic and entire OS kernels were really, literally, implemented as giant assembler programs. (With certain exceptions at places like IBM.)
According to the C standard, 6.7.8 (paragraph 10):
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
So it depends on the compiler. With MSVC, debug builds will initialize automatic variables with 0xcc, whereas non-debug builds will not initialize those variables at all.
A C variable declaration just tells the compiler to set aside and name an area of memory for you. For automatic variables, also known as stack variables, the values in that memory are not changed from what they were before. Global and static variables are set to zero when the program starts.
Some compilers in unoptimized debug mode set automatic variables to zero. However, it has become common in newer compilers to set the values to a known bad value so that the programmer does not unknowingly write code that depends on a zero being set.
In order to ask the compiler to set an array to zero for you, you can write it as:
int array[10] = {0};
Better yet is to set the array with the values it should have. That is more efficient and avoids writing into the array twice.
In recent compilers (e.g. gcc/vc++), and as the C standard requires, the remaining members of a partially initialized local array or structure are initialized to zero: 0 for int, '\0' for char and char strings, 0.000000 for float/double.
Apart from local arrays and structures as above, static (global/local) and global objects have the same property.
int a[5] = {0,1,2};
printf("%d %d %d\n",*a, *(a+2), *(a+4));
struct s1
{
int i1;
int i2;
int i3;
char c;
char str[5];
};
struct s1 s11 = {1};
printf("%d %d %d %c %s\n",s11.i1,s11.i2, s11.i3, s11.c, s11.str);
if(!s11.c)
printf("s11.c is null\n");
if(!*(s11.str))
printf("s11.str is null\n");
In gcc/vc++, the output should be (the %c and %s conversions show nothing visible, because s11.c is '\0' and s11.str is empty):
0 2 0
1 0 0
s11.c is null
s11.str is null
Text from http://www.cplusplus.com/doc/tutorial/arrays/
SUMMARY:
Initializing arrays. When declaring a regular array of local scope (within a function, for example), if we do not specify otherwise, its elements will not be initialized to any value by default, so their content will be undetermined until we store some value in them. The elements of global and static arrays, on the other hand, are automatically initialized with their default values, which for all fundamental types this means they are filled with zeros.
In both cases, local and global, when we declare an array, we have the possibility to assign initial values to each one of its elements by enclosing the values in braces { }. For example:
int billy [5] = { 16, 2, 77, 40, 12071 };
The relevant sections from the C standard (emphasis mine):
5.1.2 Execution environments
All objects with static storage duration shall be initialized (set to their initial values) before program startup.
6.2.4 Storage durations of objects
An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static has static storage duration.
6.2.5 Types
Array and structure types are collectively called aggregate types.
6.7.8 Initialization
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these rules.
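These rules can be sketched with a static aggregate that mixes the cases listed above. The struct record, the object default_record, and the function static_defaults_ok are hypothetical names chosen for illustration:

```c
#include <stddef.h>

/* Hypothetical aggregate mixing arithmetic, pointer, and array members. */
struct record {
    int id;
    double weight;
    const char *name;
    int tags[3];
};

/* Static storage duration, no explicit initializer. */
static struct record default_record;

/* Returns 1 if every member received its default initial value. */
int static_defaults_ok(void)
{
    return default_record.id == 0
        && default_record.weight == 0.0
        && default_record.name == NULL
        && default_record.tags[0] == 0
        && default_record.tags[2] == 0;
}
```

Declaring the same struct as a plain local variable without an initializer would leave all of these members indeterminate.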
It depends on the location of your array.
If it is a global/static array, it will be part of the bss section, which means it will be zero-initialized at run time by the C startup code.
If it is a local array inside a function, it will be located on the stack and its initial value is not known.
If the array is declared inside a function, its values are indeterminate; but if the array is declared as a global one, or is static inside the function, its elements have a default value of 0.