C pointer array allocation - arrays

When are variables in main() allocated? Especially how much memory is allocated for the pointer to arrays arr and 2d in the following:
int main()
{
float a, b;
int *b;
float *(arr)[6];
float *(2d)[5][5];
}
Are these considered auto, global, or static?

All these variables are automatic: global variables need to be declared outside the scope of a function; static variables need to have a static modifier.
The exact size is system-dependent. You can find out by printing sizeof(arr), sizeof(b), etc.
The exact time of allocation of automatic variables is compiler-dependent: some of them are allocated upon entering the function, some are upon entering a block where they are used, and some may be optimized out, and not allocated at all.

Memory for all the local variables declared inside a function will be allocated during run time, before executing a function. For each function an activation record will be created in the stack of the process memory, which will contains all the local variables. Once the function execution is completed activation record will be poped out.
All the variables declared inside a function are consider as auto only, unless it is explicitly declared as static or register. Variables declared outside a function will be considered as global.
If a variable is declared inside or outside a function as static, means memory allocation will be done at compile time itself, which will be in data segment(or bss).
All the pointer variable is going to store some virtual address(of variable of any type or function). So size of a pointer variable is 4 bytes in case of 32 bit machine and 8 bytes in case of 64 bit machine.

Related

Memory location of global and local structs [duplicate]

By considering that the memory is divided into four segments: data, heap, stack, and code, where do global variables, static variables, constant data types, local variables (defined and declared in functions), variables (in main function), pointers, and dynamically allocated space (using malloc and calloc) get stored in memory?
I think they would be allocated as follows:
Global variables -------> data
Static variables -------> data
Constant data types -----> code
Local variables (declared and defined in functions) --------> stack
Variables declared and defined in main function -----> heap
Pointers (for example, char *arr, int *arr) -------> heap
Dynamically allocated space (using malloc and calloc) --------> stack
I am referring to these variables only from the C perspective.
Please correct me if I am wrong as I am new to C.
You got some of these right, but whoever wrote the questions tricked you on at least one question:
global variables -------> data (correct)
static variables -------> data (correct)
constant data types -----> code and/or data. Consider string literals for a situation when a constant itself would be stored in the data segment, and references to it would be embedded in the code
local variables(declared and defined in functions) --------> stack (correct)
variables declared and defined in main function -----> heap also stack (the teacher was trying to trick you)
pointers(ex: char *arr, int *arr) -------> heap data or stack, depending on the context. C lets you declare a global or a static pointer, in which case the pointer itself would end up in the data segment.
dynamically allocated space(using malloc, calloc, realloc) --------> stack heap
It is worth mentioning that "stack" is officially called "automatic storage class".
For those future visitors who may be interested in knowing about those memory segments, I am writing important points about 5 memory segments in C:
Some heads up:
Whenever a C program is executed some memory is allocated in the RAM for the program execution. This memory is used for storing the frequently executed code (binary data), program variables, etc. The below memory segments talks about the same:
Typically there are three types of variables:
Local variables (also called as automatic variables in C)
Global variables
Static variables
You can have global static or local static variables, but the above three are the parent types.
5 Memory Segments in C:
1. Code Segment
The code segment, also referred as the text segment, is the area of memory which contains the frequently executed code.
The code segment is often read-only to avoid risk of getting overridden by programming bugs like buffer-overflow, etc.
The code segment does not contain program variables like local variable (also called as automatic variables in C), global variables, etc.
Based on the C implementation, the code segment can also contain read-only string literals. For example, when you do printf("Hello, world") then string "Hello, world" gets created in the code/text segment. You can verify this using size command in Linux OS.
Further reading
Data Segment
The data segment is divided in the below two parts and typically lies below the heap area or in some implementations above the stack, but the data segment never lies between the heap and stack area.
2. Uninitialized data segment
This segment is also known as bss.
This is the portion of memory which contains:
Uninitialized global variables (including pointer variables)
Uninitialized constant global variables.
Uninitialized local static variables.
Any global or static local variable which is not initialized will be stored in the uninitialized data segment
For example: global variable int globalVar; or static local variable static int localStatic; will be stored in the uninitialized data segment.
If you declare a global variable and initialize it as 0 or NULL then still it would go to uninitialized data segment or bss.
Further reading
3. Initialized data segment
This segment stores:
Initialized global variables (including pointer variables)
Initialized constant global variables.
Initialized local static variables.
For example: global variable int globalVar = 1; or static local variable static int localStatic = 1; will be stored in initialized data segment.
This segment can be further classified into initialized read-only area and initialized read-write area. Initialized constant global variables will go in the initialized read-only area while variables whose values can be modified at runtime will go in the initialized read-write area.
The size of this segment is determined by the size of the values in the program's source code, and does not change at run time.
Further reading
4. Stack Segment
Stack segment is used to store variables which are created inside functions (function could be main function or user-defined function), variable like
Local variables of the function (including pointer variables)
Arguments passed to function
Return address
Variables stored in the stack will be removed as soon as the function execution finishes.
Further reading
5. Heap Segment
This segment is to support dynamic memory allocation. If the programmer wants to allocate some memory dynamically then in C it is done using the malloc, calloc, or realloc methods.
For example, when int* prt = malloc(sizeof(int) * 2) then eight bytes will be allocated in heap and memory address of that location will be returned and stored in ptr variable. The ptr variable will be on either the stack or data segment depending on the way it is declared/used.
Further reading
Corrected your wrong sentences
constant data types -----> code //wrong
local constant variables -----> stack
initialized global constant variable -----> data segment
uninitialized global constant variable -----> bss
variables declared and defined in main function -----> heap //wrong
variables declared and defined in main function -----> stack
pointers(ex:char *arr,int *arr) -------> heap //wrong
dynamically allocated space(using malloc,calloc) --------> stack //wrong
pointers(ex:char *arr,int *arr) -------> size of that pointer variable will be in stack.
Consider that you are allocating memory of n bytes (using malloc or calloc) dynamically and then making pointer variable to point it. Now that n bytes of memory are in heap and the pointer variable requries 4 bytes (if 64 bit machine 8 bytes) which will be in stack to store the starting pointer of the n bytes of memory chunk.
Note : Pointer variables can point the memory of any segment.
int x = 10;
void func()
{
int a = 0;
int *p = &a: //Now its pointing the memory of stack
int *p2 = &x; //Now its pointing the memory of data segment
chat *name = "ashok" //Now its pointing the constant string literal
//which is actually present in text segment.
char *name2 = malloc(10); //Now its pointing memory in heap
...
}
dynamically allocated space(using malloc,calloc) --------> heap
A popular desktop architecture divides a process's virtual memory in several segments:
Text segment: contains the executable code. The instruction pointer takes values in this range.
Data segment: contains global variables (i.e. objects with static linkage). Subdivided in read-only data (such as string constants) and uninitialized data ("BSS").
Stack segment: contains the dynamic memory for the program, i.e. the free store ("heap") and the local stack frames for all the threads. Traditionally the C stack and C heap used to grow into the stack segment from opposite ends, but I believe that practice has been abandoned because it is too unsafe.
A C program typically puts objects with static storage duration into the data segment, dynamically allocated objects on the free store, and automatic objects on the call stack of the thread in which it lives.
On other platforms, such as old x86 real mode or on embedded devices, things can obviously be radically different.
I am referring to these variables only from the C perspective.
From the perspective of the C language, all that matters is extent, scope, linkage, and access; exactly how items are mapped to different memory segments is up to the individual implementation, and that will vary. The language standard doesn't talk about memory segments at all. Most modern architectures act mostly the same way; block-scope variables and function arguments will be allocated from the stack, file-scope and static variables will be allocated from a data or code segment, dynamic memory will be allocated from a heap, some constant data will be stored in read-only segments, etc.
One thing one needs to keep in mind about the storage is the as-if rule. The compiler is not required to put a variable in a specific place - instead it can place it wherever it pleases for as long as the compiled program behaves as if it were run in the abstract C machine according to the rules of the abstract C machine. This applies to all storage durations. For example:
a variable that is not accessed all can be eliminated completely - it has no storage... anywhere. Example - see how there is 42 in the generated assembly code but no sign of 404.
a variable with automatic storage duration that does not have its address taken need not be stored in memory at all. An example would be a loop variable.
a variable that is const or effectively const need not be in memory. Example - the compiler can prove that foo is effectively const and inlines its use into the code. bar has external linkage and the compiler cannot prove that it would not be changed outside the current module, hence it is not inlined.
an object allocated with malloc need not reside in memory allocated from heap! Example - notice how the code does not have a call to malloc and neither is the value 42 ever stored in memory, it is kept in a register!
thus an object that has been allocated by malloc and the reference is lost without deallocating the object with free need not leak memory...
the object allocated by malloc need not be within the heap below the program break (sbrk(0)) on Unixen...
pointers(ex:char *arr,int *arr) -------> heap
Nope, they can be on the stack or in the data segment. They can point anywhere.
Variables/automatic variables ---> stack section
Dynamically allocated variables ---> heap section
Initialised global variables -> data section
Uninitialised global variables -> data section (bss)
Static variables -> data section
String constants -> text section/code section
Functions -> text section/code section
Text code -> text section/code section
Registers -> CPU registers
Command line inputs -> environmental/command line section
Environmental variables -> environmental/command line section
Linux minimal runnable examples with disassembly analysis
Since this is an implementation detail not specified by standards, let's just have a look at what the compiler is doing on a particular implementation.
In this answer, I will either link to specific answers that do the analysis, or provide the analysis directly here, and summarize all results here.
All of those are in various Ubuntu / GCC versions, and the outcomes are likely pretty stable across versions, but if we find any variations let's specify more precise versions.
Local variable inside a function
Be it main or any other function:
void f(void) {
int my_local_var;
}
As shown at: What does <value optimized out> mean in gdb?
-O0: stack
-O3: registers if they don't spill, stack otherwise
For motivation on why the stack exists see: What is the function of the push / pop instructions used on registers in x86 assembly?
Global variables and static function variables
/* BSS */
int my_global_implicit;
int my_global_implicit_explicit_0 = 0;
/* DATA */
int my_global_implicit_explicit_1 = 1;
void f(void) {
/* BSS */
static int my_static_local_var_implicit;
static int my_static_local_var_explicit_0 = 0;
/* DATA */
static int my_static_local_var_explicit_1 = 1;
}
if initialized to 0 or not initialized (and therefore implicitly initialized to 0): .bss section, see also: Why is the .bss segment required?
otherwise: .data section
char * and char c[]
As shown at: Where are static variables stored in C and C++?
void f(void) {
/* RODATA / TEXT */
char *a = "abc";
/* Stack. */
char b[] = "abc";
char c[] = {'a', 'b', 'c', '\0'};
}
TODO will very large string literals also be put on the stack? Or .data? Or does compilation fail?
Function arguments
void f(int i, int j);
Must go through the relevant calling convention, e.g.: https://en.wikipedia.org/wiki/X86_calling_conventions for X86, which specifies either specific registers or stack locations for each variable.
Then as shown at What does <value optimized out> mean in gdb?, -O0 then slurps everything into the stack, while -O3 tries to use registers as much as possible.
If the function gets inlined however, they are treated just like regular locals.
const
I believe that it makes no difference because you can typecast it away.
Conversely, if the compiler is able to determine that some data is never written to, it could in theory place it in .rodata even if not const.
TODO analysis.
Pointers
They are variables (that contain addresses, which are numbers), so same as all the rest :-)
malloc
The question does not make much sense for malloc, since malloc is a function, and in:
int *i = malloc(sizeof(int));
*i is a variable that contains an address, so it falls on the above case.
As for how malloc works internally, when you call it the Linux kernel marks certain addresses as writable on its internal data structures, and when they are touched by the program initially, a fault happens and the kernel enables the page tables, which lets the access happen without segfaul: How does x86 paging work?
Note however that this is basically exactly what the exec syscall does under the hood when you try to run an executable: it marks pages it wants to load to, and writes the program there, see also: How does kernel get an executable binary file running under linux? Except that exec has some extra limitations on where to load to (e.g. is the code is not relocatable).
The exact syscall used for malloc is mmap in modern 2020 implementations, and in the past brk was used: Does malloc() use brk() or mmap()?
Dynamic libraries
Basically get mmaped to memory: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710
envinroment variables and main's argv
Above initial stack: https://unix.stackexchange.com/questions/75939/where-is-the-environment-string-actual-stored TODO why not in .data?

Pass by reference and use of malloc

I'm a beginner to the C programming language. I sort of understand the general definition of stack memory, heap memory, malloc, pointers, and memory addresses. But I'm a little overwhelmed on understanding when to use each technique in practice and the difference between them.
I've written three small programs to serve as examples. They all do the same thing, and I'd like a little commentary and explanation about what the difference between them is. I do realize that's a naive programming question, but I'm hoping to connect some basic dots here.
Program 1:
void B (int* worthRef) {
/* worthRef is a pointer to the
netWorth variable allocated
on the stack in A.
*/
*worthRef = *worthRef + 1;
}
void A() {
int netWorth = 20;
B(&netWorth);
printf("%d", netWorth); // Prints 21
}
int main() {
A();
}
Program 2:
int B (int worthRef) {
/* worthRef is now a local variable. If
I return it, will it get destroyed
once B finishes execution?
*/
worthRef = worthRef + 1;
return (worthRef);
}
void A() {
int netWorth = 20;
int result = B(netWorth);
printf("%d", result); // Also prints 21
}
int main() {
A();
}
Program 3:
void B (int* worthRef) {
/* worthRef is a pointer to the
netWorth variable allocated on
the heap.
*/
*worthRef = *worthRef + 1;
}
void A() {
int *netWorth = (int *) malloc(sizeof(int));
*netWorth = 20;
B(netWorth);
printf("%d", *netWorth); // Also prints 21
free(netWorth);
}
int main() {
A();
}
Please check my understanding:
Program 1 allocates memory on the stack for the variable netWorth, and uses a pointer to this stack memory address to directly modify the variable netWorth. This is an example of pass by reference. No copy of the netWorth variable is made.
Program 2 calls B(), which creates a locally stored copy of the value netWorth on its stack memory, increments this local copy, then returns it back to A() as result. This is an example of pass by value.
Does the local copy of worthRef get destroyed when it's returned?
Program 3 allocates memory on the heap for the variable netWorth, variable, and uses a pointer to this heap memory address to directly modify the variable netWorth. This is an example of pass by reference. No copy of the netWorth variable is made.
My main point of confusion is between Program 1 and Program 3. Both are passing pointers around; it's just that one is passing a pointer to a stack variable versus one passing a pointer to a heap variable, right? But in this situation, why do I even need the heap? I just want to have a single function to change a single value, directly, which I can do just fine without malloc.
The heap allows the programmer to choose the lifetime of the variable, right? In what circumstances would the programmer want to just keep a variable around (e.g. netWorth in this case)? Why not just make it a global variable in that case?
I sort of understand the general definition of stack memory, heap
memory, malloc, pointers, and memory addresses... I'd like a little
commentary and explanation about what the difference between them
is...
<= OK...
Program 1 allocates memory on the stack for the variable netWorth, and
uses a pointer to this stack memory address to directly modify the
variable netWorth. This is an example of pass by reference.
<= Absolutely correct!
Q: Program 2 ... Does the local copy of worthRef get destroyed when it's returned?
A: int netWorth exists only within the scope of A(), whicn included the invocation of B().
Program 1 and Program 3 ... one is passing a pointer to a stack variable versus one passing a pointer to a heap variable.
Q: But in this situation, why do I even need the heap?
A: You don't. It's perfectly OK (arguably preferable) to simply take addressof (&) int, as you did in Program 1.
Q: The heap allows the programmer to choose the lifetime of the variable, right?
A: Yes, that's one aspect of allocating memory dynamically. You are correct.
Q: Why not just make it a global variable in that case?
A: Yes, that's another alternative.
The answer to any question "Why choose one design alternative over another?" is usually "It depends".
For example, maybe you can't just declare everything "local variable" because you're environment happens to have a very small, limited stack. It happens :)
In general,
If you can declare a local variable instead of allocating heap, you generally should.
If you can avoid declaring a global variable, you generally should.
Checking your understanding:
Program 1 allocates memory on within the function stack frame of A() for the variable netWorth, and passes the address of netWorth as a pointer to this stack memory address to function B() allowing B() to directly modify the value for the variable netWorth stored at that memory address. This is an example of pass by reference passing the 'address of' a variable by value. (there is no pass by reference in C -- it's all pass by value) No copy of the netWorth variable is made.
Program 2 calls B(), which creates a locally stored copy of the value netWorth on its stack memory, increments this local copy, then returns the integer back to A() as result. (a function can always return its own type) This is an example of pass by value (because there is only pass by value in C).
(Does the local copy of worthRef get destroyed when it's returned? - answer yes, but since a function can always return its own type, it can return the int value to A() [which is handled by the calling convention for the platform])
Program 3 allocates memory on the heap for the variable netWorth, and uses a pointer to this heap memory address to directly modify the variable netWorth. While "stack/heap" are commonly used terms, C has no concept of stack or heap, the distiction is that variables are either declared with Automatic Storage Duration which is limited to the scope within which they are declared --or-- when you allocate with malloc/calloc/realloc, the block of memory has Allocated Storage Duration which is good for the life of the program or until the block of memory is freed. (there are also static and thread local storage durations not relevant here) See: C11 Standard - 6.2.4 Storage durations of objects This is an example of pass by reference. (It's all pass by Value!) No copy of the netWorth variable is made. A pointer to the address originally returned by malloc is used throughout to reference this memory location.

Dynamically Allocated Array vs. Automating Declared Array with Global Scope (C language)

What is the difference between declaring an array "dynamically",
[ie. using realloc() or malloc(), etc... ]
vs
declaring an array within main() with Global scope?,
eg.
int main()
{
int array[10];
return 0;
}
I am learning, and at the moment it feels that there is not much differnce between
declaring a variable (array, whatever) -with Global scope,
when compared to a
dynamically allocated variable (array, whatever) -AND never calling free() on it AND allowing it to be 'destoryed' when the program ends'
What are the consequences of either option?
EDIT
Thank you for your responses.
Global scope should have been 'local scope' -local to main()
When you declare an array like int arr[10] in a function, the space for the array is allocated on the stack. The memory will be freed when your function exits.
When you declare an array or any other data structure using malloc() or realloc(), you allocated the space on the heap and the memory will only be freed afer the program exits. So when the program is running, you are responsible for freeing it using free() after you no longer want to use it. If you don't free it and make your array pointer point to something else, you will create a memory leak. However, your computer will always be able to retrieve all the program's used memory after the program ends because of virtual memory.
As kaylum said in comment below your question, the array in your second example does not have global scope. Its scope is limited to main(), and it is inaccessible in other scopes unless main() explicitly makes it available (e.g. passes it by argument to another function).
Dynamic memory allocation means that the programmer explicitly allocates memory when needed, and explicitly releases it when no longer needed. Because of that, the amount of memory allocated can be determined at run time (e.g. calculated from user input). Also, if the programmer forgets to release the memory, or reallocates it inappropriately, memory can be leaked (still allocated by the program, but not accessible by the program). For example;
/* within a function */
char *p = malloc(100);
p = malloc(200);
free(p);
leaks 100 bytes, every time this code is executed, because the result of the first malloc() call is never released, and it is then inaccessible to the program because its value is not stored anywhere.
Your second example is actually an array of automatic storage duration. As far as your program is concerned, it only exists until the end of the scope in which it is created. In your case, as main() returns, the array will cease to exist.
An example of an array with global scope is
int array[10];
void f() {array[0] = 42;}
int main()
{
array[0] = 10;
f();
/* array[0] will be 42 here */
}
The difference is that this array exists and is accessible to every function that has visibility of the declaration, within the same compilation unit.
One other important difference is that global arrays are (usually) zero initialised - a global array of int will have all elements zero. A dynamically allocated array will not have elements initialised (unless created with calloc(), which does initialise to zero). Similarly, an automatic array will not have elements initialised. It is undefined behaviour to access the value of something (including an array element) that is uninitialised.
So
#include <stdio.h>
int array[10];
int main()
{
int *array2;
int array3[10];
array2 = malloc(10*sizeof(*array2));
printf("%d\n", array[0]); /* okay - will print 0 */
printf("%d\n", array2[0]); /* undefined behaviour. array2[0] is uninitialised */
printf("%d\n", array3[0]); /* undefined behaviour. array3[0] uninitialised */
return 0;
}
Obviously the way to avoid undefined behaviour is to initialise array elements to something valid before trying to access their value (e.g. printing them out, in the example above).

Is there any difference between the size of memory allocated for the following types of declaration:

i) static int a, b, c;
ii) int a; int b; int c;
I am not sure as to how will the memory be allocated for these types of declaration. And if these declarations are different then how much memory is allocated for each declaration?
static int a,b,c;
will allocate three ints (probably 32bits each, or 4 bytes) in the DATA section of your program. They will always be there as long as your program runs.
int a; int b; int c;
will allocate three ints on the STACK. They will be gone when they go out of scope.
There is no difference between the size of memory for
static int a,b,c;
int a;int b;int c;
Differences occur in the lifetime, location, scope & initialization.
Lifetime: Were these were declare globally, both a,b,c sets would exist for the lifetime of the program. Were they both in a function, the static ones would exist for the program lifetime, but the other would exist only for the duration of the function. Further, should the function be called recursively or re-entrant, multiple sets of the non-static a,b,c, would exists.
Location: A common, thought not required by C, is to have a DATA section and STACK section of memory. Global variables tend to go in DATA as well as functional static ones. The non-static version of a,b,c in a function would typically go on in STACK.
Scope: Simple view: Functionally declared variables (static or not) are scoped within the function. Global variables declared static have file scope. Global variables not declared static have the scope of the entire program.
Initialization: follows along the same track as lifetime. Globally declared a,b,c, static or not, are both initialized at program start. If a,b,c are in a function, only static ones are initialized (at program start). Functional non-static a,b,c are not initialized.
Optimization may affect location, especially for the functional non-static a,b,c which could readily be saved in registers. Optimization may also determine that the variable is not used and optimizing it out, thus taking 0 bytes.
The variables that are defined as static will allocated in data segment at compile time. The same is true for global variables even though they are not static. Non-static variables defined within a block are allocated on the stack when the block is entered at runtime and are deallocted when th block is exited.
The amount of memory allocated is implementation dependent. The standard requires that an int is large enough to hold a 16-bit (2 byte) valued, but is can be larger. Most compilers you are likely to use now adays use 32-bit ints.
If we assume that 2nd declaration is inside function, than As Bard and Nashant already said these will be allocated in different memory sections (OS and compilers dependent).
But though variable size will be of the same size, they CAN consume different amount of memory. If function (from 2nd declaration) is called recursively for example, there will be multiple instances of variables from 2nd declaration.

Where in memory are my variables stored in C?

By considering that the memory is divided into four segments: data, heap, stack, and code, where do global variables, static variables, constant data types, local variables (defined and declared in functions), variables (in main function), pointers, and dynamically allocated space (using malloc and calloc) get stored in memory?
I think they would be allocated as follows:
Global variables -------> data
Static variables -------> data
Constant data types -----> code
Local variables (declared and defined in functions) --------> stack
Variables declared and defined in main function -----> heap
Pointers (for example, char *arr, int *arr) -------> heap
Dynamically allocated space (using malloc and calloc) --------> stack
I am referring to these variables only from the C perspective.
Please correct me if I am wrong as I am new to C.
You got some of these right, but whoever wrote the questions tricked you on at least one question:
global variables -------> data (correct)
static variables -------> data (correct)
constant data types -----> code and/or data. Consider string literals for a situation when a constant itself would be stored in the data segment, and references to it would be embedded in the code
local variables(declared and defined in functions) --------> stack (correct)
variables declared and defined in main function -----> heap also stack (the teacher was trying to trick you)
pointers(ex: char *arr, int *arr) -------> heap data or stack, depending on the context. C lets you declare a global or a static pointer, in which case the pointer itself would end up in the data segment.
dynamically allocated space(using malloc, calloc, realloc) --------> stack heap
It is worth mentioning that "stack" is officially called "automatic storage class".
For those future visitors who may be interested in knowing about those memory segments, I am writing important points about 5 memory segments in C:
Some heads up:
Whenever a C program is executed some memory is allocated in the RAM for the program execution. This memory is used for storing the frequently executed code (binary data), program variables, etc. The below memory segments talks about the same:
Typically there are three types of variables:
Local variables (also called as automatic variables in C)
Global variables
Static variables
You can have global static or local static variables, but the above three are the parent types.
5 Memory Segments in C:
1. Code Segment
The code segment, also referred as the text segment, is the area of memory which contains the frequently executed code.
The code segment is often read-only to avoid risk of getting overridden by programming bugs like buffer-overflow, etc.
The code segment does not contain program variables like local variable (also called as automatic variables in C), global variables, etc.
Based on the C implementation, the code segment can also contain read-only string literals. For example, when you do printf("Hello, world") then string "Hello, world" gets created in the code/text segment. You can verify this using size command in Linux OS.
Further reading
Data Segment
The data segment is divided in the below two parts and typically lies below the heap area or in some implementations above the stack, but the data segment never lies between the heap and stack area.
2. Uninitialized data segment
This segment is also known as bss.
This is the portion of memory which contains:
Uninitialized global variables (including pointer variables)
Uninitialized constant global variables.
Uninitialized local static variables.
Any global or static local variable which is not initialized will be stored in the uninitialized data segment
For example: global variable int globalVar; or static local variable static int localStatic; will be stored in the uninitialized data segment.
If you declare a global variable and initialize it as 0 or NULL then still it would go to uninitialized data segment or bss.
Further reading
3. Initialized data segment
This segment stores:
Initialized global variables (including pointer variables)
Initialized constant global variables.
Initialized local static variables.
For example: global variable int globalVar = 1; or static local variable static int localStatic = 1; will be stored in initialized data segment.
This segment can be further classified into initialized read-only area and initialized read-write area. Initialized constant global variables will go in the initialized read-only area while variables whose values can be modified at runtime will go in the initialized read-write area.
The size of this segment is determined by the size of the values in the program's source code, and does not change at run time.
Further reading
4. Stack Segment
Stack segment is used to store variables which are created inside functions (function could be main function or user-defined function), variable like
Local variables of the function (including pointer variables)
Arguments passed to function
Return address
Variables stored in the stack will be removed as soon as the function execution finishes.
Further reading
5. Heap Segment
This segment is to support dynamic memory allocation. If the programmer wants to allocate some memory dynamically then in C it is done using the malloc, calloc, or realloc methods.
For example, when int* prt = malloc(sizeof(int) * 2) then eight bytes will be allocated in heap and memory address of that location will be returned and stored in ptr variable. The ptr variable will be on either the stack or data segment depending on the way it is declared/used.
Further reading
Corrected your wrong sentences
constant data types -----> code //wrong
local constant variables -----> stack
initialized global constant variable -----> data segment
uninitialized global constant variable -----> bss
variables declared and defined in main function -----> heap //wrong
variables declared and defined in main function -----> stack
pointers(ex:char *arr,int *arr) -------> heap //wrong
dynamically allocated space(using malloc,calloc) --------> stack //wrong
pointers(ex:char *arr,int *arr) -------> size of that pointer variable will be in stack.
Consider that you are allocating memory of n bytes (using malloc or calloc) dynamically and then making pointer variable to point it. Now that n bytes of memory are in heap and the pointer variable requries 4 bytes (if 64 bit machine 8 bytes) which will be in stack to store the starting pointer of the n bytes of memory chunk.
Note : Pointer variables can point the memory of any segment.
int x = 10;
void func()
{
int a = 0;
int *p = &a: //Now its pointing the memory of stack
int *p2 = &x; //Now its pointing the memory of data segment
chat *name = "ashok" //Now its pointing the constant string literal
//which is actually present in text segment.
char *name2 = malloc(10); //Now its pointing memory in heap
...
}
dynamically allocated space(using malloc,calloc) --------> heap
A popular desktop architecture divides a process's virtual memory in several segments:
Text segment: contains the executable code. The instruction pointer takes values in this range.
Data segment: contains global variables (i.e. objects with static linkage). Subdivided in read-only data (such as string constants) and uninitialized data ("BSS").
Stack segment: contains the dynamic memory for the program, i.e. the free store ("heap") and the local stack frames for all the threads. Traditionally the C stack and C heap used to grow into the stack segment from opposite ends, but I believe that practice has been abandoned because it is too unsafe.
A C program typically puts objects with static storage duration into the data segment, dynamically allocated objects on the free store, and automatic objects on the call stack of the thread in which it lives.
On other platforms, such as old x86 real mode or on embedded devices, things can obviously be radically different.
I am referring to these variables only from the C perspective.
From the perspective of the C language, all that matters is extent, scope, linkage, and access; exactly how items are mapped to different memory segments is up to the individual implementation, and that will vary. The language standard doesn't talk about memory segments at all. Most modern architectures act mostly the same way; block-scope variables and function arguments will be allocated from the stack, file-scope and static variables will be allocated from a data or code segment, dynamic memory will be allocated from a heap, some constant data will be stored in read-only segments, etc.
One thing one needs to keep in mind about the storage is the as-if rule. The compiler is not required to put a variable in a specific place - instead it can place it wherever it pleases for as long as the compiled program behaves as if it were run in the abstract C machine according to the rules of the abstract C machine. This applies to all storage durations. For example:
a variable that is not accessed all can be eliminated completely - it has no storage... anywhere. Example - see how there is 42 in the generated assembly code but no sign of 404.
a variable with automatic storage duration that does not have its address taken need not be stored in memory at all. An example would be a loop variable.
a variable that is const or effectively const need not be in memory. Example - the compiler can prove that foo is effectively const and inlines its use into the code. bar has external linkage and the compiler cannot prove that it would not be changed outside the current module, hence it is not inlined.
an object allocated with malloc need not reside in memory allocated from heap! Example - notice how the code does not have a call to malloc and neither is the value 42 ever stored in memory, it is kept in a register!
thus an object that has been allocated by malloc and the reference is lost without deallocating the object with free need not leak memory...
the object allocated by malloc need not be within the heap below the program break (sbrk(0)) on Unixen...
pointers(ex:char *arr,int *arr) -------> heap
Nope, they can be on the stack or in the data segment. They can point anywhere.
Variables/automatic variables ---> stack section
Dynamically allocated variables ---> heap section
Initialised global variables -> data section
Uninitialised global variables -> data section (bss)
Static variables -> data section
String constants -> text section/code section
Functions -> text section/code section
Text code -> text section/code section
Registers -> CPU registers
Command line inputs -> environmental/command line section
Environmental variables -> environmental/command line section
Linux minimal runnable examples with disassembly analysis
Since this is an implementation detail not specified by standards, let's just have a look at what the compiler is doing on a particular implementation.
In this answer, I will either link to specific answers that do the analysis, or provide the analysis directly here, and summarize all results here.
All of those are in various Ubuntu / GCC versions, and the outcomes are likely pretty stable across versions, but if we find any variations let's specify more precise versions.
Local variable inside a function
Be it main or any other function:
void f(void) {
int my_local_var;
}
As shown at: What does <value optimized out> mean in gdb?
-O0: stack
-O3: registers if they don't spill, stack otherwise
For motivation on why the stack exists see: What is the function of the push / pop instructions used on registers in x86 assembly?
Global variables and static function variables
/* BSS */
int my_global_implicit;
int my_global_implicit_explicit_0 = 0;
/* DATA */
int my_global_implicit_explicit_1 = 1;
void f(void) {
/* BSS */
static int my_static_local_var_implicit;
static int my_static_local_var_explicit_0 = 0;
/* DATA */
static int my_static_local_var_explicit_1 = 1;
}
if initialized to 0 or not initialized (and therefore implicitly initialized to 0): .bss section, see also: Why is the .bss segment required?
otherwise: .data section
char * and char c[]
As shown at: Where are static variables stored in C and C++?
void f(void) {
/* RODATA / TEXT */
char *a = "abc";
/* Stack. */
char b[] = "abc";
char c[] = {'a', 'b', 'c', '\0'};
}
TODO will very large string literals also be put on the stack? Or .data? Or does compilation fail?
Function arguments
void f(int i, int j);
Must go through the relevant calling convention, e.g.: https://en.wikipedia.org/wiki/X86_calling_conventions for X86, which specifies either specific registers or stack locations for each variable.
Then as shown at What does <value optimized out> mean in gdb?, -O0 then slurps everything into the stack, while -O3 tries to use registers as much as possible.
If the function gets inlined however, they are treated just like regular locals.
const
I believe that it makes no difference because you can typecast it away.
Conversely, if the compiler is able to determine that some data is never written to, it could in theory place it in .rodata even if not const.
TODO analysis.
Pointers
They are variables (that contain addresses, which are numbers), so same as all the rest :-)
malloc
The question does not make much sense for malloc, since malloc is a function, and in:
int *i = malloc(sizeof(int));
*i is a variable that contains an address, so it falls on the above case.
As for how malloc works internally, when you call it the Linux kernel marks certain addresses as writable on its internal data structures, and when they are touched by the program initially, a fault happens and the kernel enables the page tables, which lets the access happen without segfaul: How does x86 paging work?
Note however that this is basically exactly what the exec syscall does under the hood when you try to run an executable: it marks pages it wants to load to, and writes the program there, see also: How does kernel get an executable binary file running under linux? Except that exec has some extra limitations on where to load to (e.g. is the code is not relocatable).
The exact syscall used for malloc is mmap in modern 2020 implementations, and in the past brk was used: Does malloc() use brk() or mmap()?
Dynamic libraries
Basically get mmaped to memory: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710
envinroment variables and main's argv
Above initial stack: https://unix.stackexchange.com/questions/75939/where-is-the-environment-string-actual-stored TODO why not in .data?

Resources