How is memory allocated in an array? - c

Suppose we have an array int arr[size] and a pointer to an integer int *i. We can assign a set of sequential blocks of memory to the *i using malloc(sizeof(int) * size). However my question is how is memory allocated in int arr[size]. The assigning of memory blocks according to my guess in this case is done by the compiler. Does the compiler implicitly makes a malloc() or similar function call? And how is the dynamic memory allocation done by a compiler when we write something like int arr[] = "You guys are a great help".

There's 4 different cases.
a) Global variables and static local variables. For these, the array is typically built into the executable program's file (which is typically split into sections - one for code, one for initialized read-only data, one for initialized data, and one for "initialized to zero"/uninitialized data). Mostly, if the array is initialized (e.g. int foo[44] = { 1, 2, 3 };) then it'll be put into the normal read/write data section (.data) and if it's not initialized (e.g. int foo[44];) it'll be put in the "initialized to zero"/uninitialized data section (.bss) to reduce the file's size. Allocating space in (e.g.) a .data section isn't much different to allocating space for code in an executable file's .text section. In this case the memory can not be freed.
b) Local variables (inside a function, like void foo(void) { int bar[44]; }). In this case the compiler will typically create space on the stack for it (and it won't be "initialized to zero"). If it's explicitly initialized then the compiler will generate code to initialize it, either using instructions (if the array is small), or by copying data from a hidden copy in a data section (where the hidden copy is allocated in the same way it would've been for a global variable). In this case the memory can not be explicitly freed but will be implicitly freed (if/when you return from the function).
c) Thread local storage (e.g. __thread int foo[]). In this case the compiler typically allocates space in thread local storage; and if the array is initialized it will also create a hidden copy in a data section (where the hidden copy is allocated in the same way it would've been for a global variable). In this case the memory can not be freed.
d) Dynamically allocated arrays (e.g. int *myPointer = malloc(sizeof(int) * 44);). For these, the compiler doesn't allocate space anywhere (the run-time library does). This is the only case where you can explicitly free the memory (e.g. free(myPointer);).
Note 1: A lot of the above (everywhere I've used "typically") is implementation dependent and a compiler may do something very different; either because "do it different" makes more sense for the compiler's output, or because the compiler can optimize properly.
Note 2: Sadly most compilers can't optimize properly. It's hard to get them to use a .rodata section when arrays are not modified, even if you use const. For cases where there's a hidden array used for initialization it's hard to get them to analyze the contents and find more complex patterns (they can/will do this badly for simple patterns like int foo[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}; }). If you have a global array like int hugeArray[12345678] = {4, 5, 6} they won't put the array in the .bss section and initialize it to reduce file size even when it would be trivial to do so, etc).

Related

Memory location of global and local structs [duplicate]

By considering that the memory is divided into four segments: data, heap, stack, and code, where do global variables, static variables, constant data types, local variables (defined and declared in functions), variables (in main function), pointers, and dynamically allocated space (using malloc and calloc) get stored in memory?
I think they would be allocated as follows:
Global variables -------> data
Static variables -------> data
Constant data types -----> code
Local variables (declared and defined in functions) --------> stack
Variables declared and defined in main function -----> heap
Pointers (for example, char *arr, int *arr) -------> heap
Dynamically allocated space (using malloc and calloc) --------> stack
I am referring to these variables only from the C perspective.
Please correct me if I am wrong as I am new to C.
You got some of these right, but whoever wrote the questions tricked you on at least one question:
global variables -------> data (correct)
static variables -------> data (correct)
constant data types -----> code and/or data. Consider string literals for a situation when a constant itself would be stored in the data segment, and references to it would be embedded in the code
local variables(declared and defined in functions) --------> stack (correct)
variables declared and defined in main function -----> heap also stack (the teacher was trying to trick you)
pointers(ex: char *arr, int *arr) -------> heap data or stack, depending on the context. C lets you declare a global or a static pointer, in which case the pointer itself would end up in the data segment.
dynamically allocated space(using malloc, calloc, realloc) --------> stack heap
It is worth mentioning that "stack" is officially called "automatic storage class".
For those future visitors who may be interested in knowing about those memory segments, I am writing important points about 5 memory segments in C:
Some heads up:
Whenever a C program is executed some memory is allocated in the RAM for the program execution. This memory is used for storing the frequently executed code (binary data), program variables, etc. The below memory segments talks about the same:
Typically there are three types of variables:
Local variables (also called as automatic variables in C)
Global variables
Static variables
You can have global static or local static variables, but the above three are the parent types.
5 Memory Segments in C:
1. Code Segment
The code segment, also referred as the text segment, is the area of memory which contains the frequently executed code.
The code segment is often read-only to avoid risk of getting overridden by programming bugs like buffer-overflow, etc.
The code segment does not contain program variables like local variable (also called as automatic variables in C), global variables, etc.
Based on the C implementation, the code segment can also contain read-only string literals. For example, when you do printf("Hello, world") then string "Hello, world" gets created in the code/text segment. You can verify this using size command in Linux OS.
Further reading
Data Segment
The data segment is divided in the below two parts and typically lies below the heap area or in some implementations above the stack, but the data segment never lies between the heap and stack area.
2. Uninitialized data segment
This segment is also known as bss.
This is the portion of memory which contains:
Uninitialized global variables (including pointer variables)
Uninitialized constant global variables.
Uninitialized local static variables.
Any global or static local variable which is not initialized will be stored in the uninitialized data segment
For example: global variable int globalVar; or static local variable static int localStatic; will be stored in the uninitialized data segment.
If you declare a global variable and initialize it as 0 or NULL then still it would go to uninitialized data segment or bss.
Further reading
3. Initialized data segment
This segment stores:
Initialized global variables (including pointer variables)
Initialized constant global variables.
Initialized local static variables.
For example: global variable int globalVar = 1; or static local variable static int localStatic = 1; will be stored in initialized data segment.
This segment can be further classified into initialized read-only area and initialized read-write area. Initialized constant global variables will go in the initialized read-only area while variables whose values can be modified at runtime will go in the initialized read-write area.
The size of this segment is determined by the size of the values in the program's source code, and does not change at run time.
Further reading
4. Stack Segment
Stack segment is used to store variables which are created inside functions (function could be main function or user-defined function), variable like
Local variables of the function (including pointer variables)
Arguments passed to function
Return address
Variables stored in the stack will be removed as soon as the function execution finishes.
Further reading
5. Heap Segment
This segment is to support dynamic memory allocation. If the programmer wants to allocate some memory dynamically then in C it is done using the malloc, calloc, or realloc methods.
For example, when int* prt = malloc(sizeof(int) * 2) then eight bytes will be allocated in heap and memory address of that location will be returned and stored in ptr variable. The ptr variable will be on either the stack or data segment depending on the way it is declared/used.
Further reading
Corrected your wrong sentences
constant data types -----> code //wrong
local constant variables -----> stack
initialized global constant variable -----> data segment
uninitialized global constant variable -----> bss
variables declared and defined in main function -----> heap //wrong
variables declared and defined in main function -----> stack
pointers(ex:char *arr,int *arr) -------> heap //wrong
dynamically allocated space(using malloc,calloc) --------> stack //wrong
pointers(ex:char *arr,int *arr) -------> size of that pointer variable will be in stack.
Consider that you are allocating memory of n bytes (using malloc or calloc) dynamically and then making pointer variable to point it. Now that n bytes of memory are in heap and the pointer variable requries 4 bytes (if 64 bit machine 8 bytes) which will be in stack to store the starting pointer of the n bytes of memory chunk.
Note : Pointer variables can point the memory of any segment.
int x = 10;
void func()
{
int a = 0;
int *p = &a: //Now its pointing the memory of stack
int *p2 = &x; //Now its pointing the memory of data segment
chat *name = "ashok" //Now its pointing the constant string literal
//which is actually present in text segment.
char *name2 = malloc(10); //Now its pointing memory in heap
...
}
dynamically allocated space(using malloc,calloc) --------> heap
A popular desktop architecture divides a process's virtual memory in several segments:
Text segment: contains the executable code. The instruction pointer takes values in this range.
Data segment: contains global variables (i.e. objects with static linkage). Subdivided in read-only data (such as string constants) and uninitialized data ("BSS").
Stack segment: contains the dynamic memory for the program, i.e. the free store ("heap") and the local stack frames for all the threads. Traditionally the C stack and C heap used to grow into the stack segment from opposite ends, but I believe that practice has been abandoned because it is too unsafe.
A C program typically puts objects with static storage duration into the data segment, dynamically allocated objects on the free store, and automatic objects on the call stack of the thread in which it lives.
On other platforms, such as old x86 real mode or on embedded devices, things can obviously be radically different.
I am referring to these variables only from the C perspective.
From the perspective of the C language, all that matters is extent, scope, linkage, and access; exactly how items are mapped to different memory segments is up to the individual implementation, and that will vary. The language standard doesn't talk about memory segments at all. Most modern architectures act mostly the same way; block-scope variables and function arguments will be allocated from the stack, file-scope and static variables will be allocated from a data or code segment, dynamic memory will be allocated from a heap, some constant data will be stored in read-only segments, etc.
One thing one needs to keep in mind about the storage is the as-if rule. The compiler is not required to put a variable in a specific place - instead it can place it wherever it pleases for as long as the compiled program behaves as if it were run in the abstract C machine according to the rules of the abstract C machine. This applies to all storage durations. For example:
a variable that is not accessed all can be eliminated completely - it has no storage... anywhere. Example - see how there is 42 in the generated assembly code but no sign of 404.
a variable with automatic storage duration that does not have its address taken need not be stored in memory at all. An example would be a loop variable.
a variable that is const or effectively const need not be in memory. Example - the compiler can prove that foo is effectively const and inlines its use into the code. bar has external linkage and the compiler cannot prove that it would not be changed outside the current module, hence it is not inlined.
an object allocated with malloc need not reside in memory allocated from heap! Example - notice how the code does not have a call to malloc and neither is the value 42 ever stored in memory, it is kept in a register!
thus an object that has been allocated by malloc and the reference is lost without deallocating the object with free need not leak memory...
the object allocated by malloc need not be within the heap below the program break (sbrk(0)) on Unixen...
pointers(ex:char *arr,int *arr) -------> heap
Nope, they can be on the stack or in the data segment. They can point anywhere.
Variables/automatic variables ---> stack section
Dynamically allocated variables ---> heap section
Initialised global variables -> data section
Uninitialised global variables -> data section (bss)
Static variables -> data section
String constants -> text section/code section
Functions -> text section/code section
Text code -> text section/code section
Registers -> CPU registers
Command line inputs -> environmental/command line section
Environmental variables -> environmental/command line section
Linux minimal runnable examples with disassembly analysis
Since this is an implementation detail not specified by standards, let's just have a look at what the compiler is doing on a particular implementation.
In this answer, I will either link to specific answers that do the analysis, or provide the analysis directly here, and summarize all results here.
All of those are in various Ubuntu / GCC versions, and the outcomes are likely pretty stable across versions, but if we find any variations let's specify more precise versions.
Local variable inside a function
Be it main or any other function:
void f(void) {
int my_local_var;
}
As shown at: What does <value optimized out> mean in gdb?
-O0: stack
-O3: registers if they don't spill, stack otherwise
For motivation on why the stack exists see: What is the function of the push / pop instructions used on registers in x86 assembly?
Global variables and static function variables
/* BSS */
int my_global_implicit;
int my_global_implicit_explicit_0 = 0;
/* DATA */
int my_global_implicit_explicit_1 = 1;
void f(void) {
/* BSS */
static int my_static_local_var_implicit;
static int my_static_local_var_explicit_0 = 0;
/* DATA */
static int my_static_local_var_explicit_1 = 1;
}
if initialized to 0 or not initialized (and therefore implicitly initialized to 0): .bss section, see also: Why is the .bss segment required?
otherwise: .data section
char * and char c[]
As shown at: Where are static variables stored in C and C++?
void f(void) {
/* RODATA / TEXT */
char *a = "abc";
/* Stack. */
char b[] = "abc";
char c[] = {'a', 'b', 'c', '\0'};
}
TODO will very large string literals also be put on the stack? Or .data? Or does compilation fail?
Function arguments
void f(int i, int j);
Must go through the relevant calling convention, e.g.: https://en.wikipedia.org/wiki/X86_calling_conventions for X86, which specifies either specific registers or stack locations for each variable.
Then as shown at What does <value optimized out> mean in gdb?, -O0 then slurps everything into the stack, while -O3 tries to use registers as much as possible.
If the function gets inlined however, they are treated just like regular locals.
const
I believe that it makes no difference because you can typecast it away.
Conversely, if the compiler is able to determine that some data is never written to, it could in theory place it in .rodata even if not const.
TODO analysis.
Pointers
They are variables (that contain addresses, which are numbers), so same as all the rest :-)
malloc
The question does not make much sense for malloc, since malloc is a function, and in:
int *i = malloc(sizeof(int));
*i is a variable that contains an address, so it falls on the above case.
As for how malloc works internally, when you call it the Linux kernel marks certain addresses as writable on its internal data structures, and when they are touched by the program initially, a fault happens and the kernel enables the page tables, which lets the access happen without segfaul: How does x86 paging work?
Note however that this is basically exactly what the exec syscall does under the hood when you try to run an executable: it marks pages it wants to load to, and writes the program there, see also: How does kernel get an executable binary file running under linux? Except that exec has some extra limitations on where to load to (e.g. is the code is not relocatable).
The exact syscall used for malloc is mmap in modern 2020 implementations, and in the past brk was used: Does malloc() use brk() or mmap()?
Dynamic libraries
Basically get mmaped to memory: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710
envinroment variables and main's argv
Above initial stack: https://unix.stackexchange.com/questions/75939/where-is-the-environment-string-actual-stored TODO why not in .data?

Variables and calls in a C program and its corresponding location in the Linux process address space

I'm currently learning the Linux process address space and I'm not sure where these C variables correspond in the process address space.
I know that when a function is called, a new frame is created, it'll contain local variables and other function calls etc..
What I am not sure about is the pointers that are in the frame:
I have this function:
int main(){
char *pointer1 = NULL;
char *pointer2 = (void *)0xDDDDDDDD;
pointer1 = malloc(80);
strcpy(pointer1, "Testing..");
return(0);
}
When main is called, a new frame is created.
Variables are initialized.
What I am not sure about these are the pointers, where does:
*pointer1 correspond to in the process address space - data or text section?
*pointer2 correspond to in the process address space - data or text section?
Does NULL and 0xDDDDDDDD belong to data or text section?
since pointer1 = malloc(80), does it belong to the stack section?
First of all it should be noted that the C specification doesn't actually require local variables to be stored on a stack, it doesn't specify location of automatic variables at all.
With that said, the storage for the variables pointer1 and pointer2 themselves will most likely be put on a stack by the compiler. Memory for them will be part of the stack-frame created by the compiler when the main function is called.
To continue, on modern PC-like systems a pointer is really nothing more than a simple unsigned integer, and its value is the address where it points. The values you use for the initialization (NULL and 0xDDDDDDDD) are simply plain integer values. The initialization is done just the same as for a plain int variable. And as such, the values used for initialization doesn't really exists as "data", instead they could be encoded directly in the machine code, and as such will be stored in the "text" (code) segment.
Lastly for the dynamic allocation, it doesn't change where pointer1 is stored. What is does it simply assigning a new value to pointer1. The memory being allocated is on the "heap" which is separate from any program section (i.e. it's neither in the code, data or stack segments).
As some programmer dude just said, the C spec does not state a region where automatic variables must be placed. But it is usual for compilers to grow the stack to accommodate them there. However, they might end on the .data region, and they will if they were, e.g., defined as static char *pointer1 instead.
The initialization values may or may not exist in a program region either. In your case, since the type of values is int, most architectures will inline the initialization as appropriate machine instructions instead, if instructions with appropriate inline operators are available. In x86_64, for example, a single mov/movq operation will be issued to put the 0 (NULL) or the other int in the appropriate memory location on the stack.
However, variables initialized with global scope, such as static char string[40] = "Hello world" or other initialized global variables end up on the .data region and take up space in there. Compilers may place declared, but undefined, global scoped variables on the .bss region instead.
The question since pointer1 = malloc(80), does it belong to the stack section? is ill-defined, because it comprises two things.
The value pointer1 is a value that will be saved at &pointer1. An address which, given the above consideration, the compiler may have put on the stack.
The result of malloc(80) is a value that refers to a region on the heap, a different region, dynamically allocated outside the mapped program space.
On Linux, the result of calling malloc may even create a new NULL-backed memory region (that is, a transient region that is not permanently stored on a file; although it could be swapped by the kernel).
In essence, you could think of how malloc(80) behaves, as something like (not taking free() into consideration, so this is an oversimplification):
int space_left = 0; void *last_mapping = NULL;
void *malloc(int req) {
void *result;
if (space_left < req) {
last_mapping = mmap(NULL, MALLOC_CHUNK_LENGTH, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
space_left = MALLOC_CHUNK_LENGTH;
}
space_left -= req;
result = last_mapping;
last_mapping += req;
return result;
}
The huge difference between calling malloc and mmap with MAP_PRIVATE is that mmap is a Linux System Call, which must make a kernel context switch, allocate a new memory map and reset the MMU layer for every memory chunk allocated, while malloc can be more intelligent and use a single big region as "heap" and manage the different malloc's and free's in userspace after the heap initialization (until the heap runs out of space, where it might have to manage multiple heaps).
Last section of your doubts i.e. "since pointer1 = malloc(80), does it belong to the stack section? " , I can tell you
In C, dynamic memory is allocated from the heap using some standard library functions. The two key dynamic memory functions are malloc() and free().
The malloc() function takes a single parameter, which is the size of the requested memory area in bytes. It returns a pointer to the allocated memory. If the allocation fails, it returns NULL. The prototype for the standard library function is like this:
void *malloc(size_t size);
The free() function takes the pointer returned by malloc() and de-allocates the memory. No indication of success or failure is returned. The function prototype is like this:
void free(void *pointer);
You can refer the doc
https://www.design-reuse.com/articles/25090/dynamic-memory-allocation-fragmentation-c.html

I'm little bit confused that whether automatic memory allocation takes place during run time or compile time

I know that the memory is allocated at compile time to auto variables like int a; and are stored in stack but in case of a variable array whose input is taken from the user, for eg
#include<stdio.h>
main()
{
int n;
printf("enter the size of array");
scanf("%d",&n);
int a[n];
.......
}
the memory is allocated at run time. So my question is, is the automatic allocation is case dependent or not. THANKS
In your example, it is unclear where "a" is defined. So, I'll take a stab at answering this by making assumptions on that.
If the array is declared as a global array, it resides in the bss segment, and memory is allocated as the segments are loaded into memory.
If the array is on the stack, and the size of the array is known at compile-time, the stack pointer is moved to allocate space for the array. You can see this if you disassemble the code.
If the array is on the stack, but space is allocated based on an argument to the function you have a VLA(variable length array). These are commonly converted to "alloca" calls by the compiler. In this case the stack pointer is just moved to allocated "n" bytes on the stack.
If the array is on the heap, the allocations are performed by the heap allocator in use.
The code that handles the automatic allocation is created at compile-time. The actual allocation takes place in run-time. You'll have machine code such as "push variable on stack" or "put variable in register", but this code is of course doing nothing until the program is executed. All stack allocations are done in run-time. They may or may not be of a deterministic nature.
In the case of a VLA, the instruction "move stack pointer n steps" is created at compile time, but the variable n is set in run-time and the stack pointer is then moved accordingly, to allocate memory.
The only kind of allocation that takes place at compile-time is allocation of objects with static storage duration - meaning allocation of file scope variables and static variables. Room for these are reserved in the data segments usually named .data and .bss on most systems.
Examples can be found here.

Allocating an Array of Structs without Malloc?

I have a struct defined in this way.
typedef struct COUNTRY {
char Code[3];
char Country[30];
int Population;
float Expectancy;
struct Country *Pointer;
} COUNTRY;
I have seen an array of structs allocated like this:
COUNTRY *countries = calloc(128, sizeof(COUNTRY));
or maybe like this:
COUNTRY *countries = malloc(128 * sizeof(COUNTRY));
But what does this do:
COUNTRY countries[128] = {};
Because I am still able to write to each entries' fields in all cases. Is the third option just bad form? It seems better to me because you can put that line up with the rest of your variable declarations outside of main(). Otherwise, you can only calloc() or malloc() inside of main() or other function.
Am I doing something wrong?
This:
COUNTRY countries[128];
simply defines an object whose type is "array of 128 COUNTRY elements".
The = {} is an initializer -- but empty initializers are illegal in C (I think gcc supports them as an extension). A portable alternative is:
COUNTRY countries[128] = { 0 };
which initializes all members of all elements to zero (0 for integers, \0' for characters, 0.0 for floating-point, NULL for pointers, and recursively for sub-elements). But since you specified the number of elements in the array (as 128), the initializer has no effect on how the array object is allocated.
If the declaration occurs inside a function definition, the array object has automatic storage duration, which means that it ceases to exist when execution reaches the end of the enclosing block. Such objects are commonly allocated on "the stack".
If it occurs outside any function definition (at file scope) or if it has the keyword static, then it has static storage duration, which means that it continues to exist for the entire execution of the program.
Objects allocated with malloc or calloc have allocated storage duration, which means that they continue to exist until they're explicitly deallocated by a call to free(). Such objects are commonly allocated on "the heap". (I'm ignoring realloc(), which complicates the description a bit.)
The first two statements will allocate array of structs on the heap, while the last one will initialize the array of structs on the stack.
It is not a bad form, it is just a matter where you want your data to be stored - on the stack ( freed automatically when your variable goes out of the scope, stack usually have significantly smaller size then heap, so you could overflow it if you place big data structures there), or on the heap (lifetime of data is not related to the scope, you need to manually free your memory).
It seems better to me because you can put that line up with the rest of your variable declarations outside of main().
If you need statically allocated object with the lifetime of the program, use this approach, there's nothing wrong with it. Please note that in this particular case, variable is not stored on the stack, but in the .data segment of your program (check this question for more details: How are global variables stored?).
The last form is 'stack allocated' or 'statically allocated'. Like calloc, all of the fields will be zeroed out.
Inside a function it is 'stack allocated' and that memory will go away when the function returns.
Outside any function, at file scope, it is statically allocated and a global piece of memory allocated before main() starts.
malloc/calloc are used when you don't know how many you need at compile time. For example in a linked list, you need to allocate/deallocate nodes on the fly. When you use an array, you the know exactly how many you need at compile time.
What also differs is where the memory is taken from. If you declare an array in a function, the memory will be taken from the stack. In the case of malloc/calloc, the memory is set aside in the heap.
= {};
is GNU C extension and is the same as:
= {0};

Where in memory are my variables stored in C?

By considering that the memory is divided into four segments: data, heap, stack, and code, where do global variables, static variables, constant data types, local variables (defined and declared in functions), variables (in main function), pointers, and dynamically allocated space (using malloc and calloc) get stored in memory?
I think they would be allocated as follows:
Global variables -------> data
Static variables -------> data
Constant data types -----> code
Local variables (declared and defined in functions) --------> stack
Variables declared and defined in main function -----> heap
Pointers (for example, char *arr, int *arr) -------> heap
Dynamically allocated space (using malloc and calloc) --------> stack
I am referring to these variables only from the C perspective.
Please correct me if I am wrong as I am new to C.
You got some of these right, but whoever wrote the questions tricked you on at least one question:
global variables -------> data (correct)
static variables -------> data (correct)
constant data types -----> code and/or data. Consider string literals for a situation when a constant itself would be stored in the data segment, and references to it would be embedded in the code
local variables(declared and defined in functions) --------> stack (correct)
variables declared and defined in main function -----> heap also stack (the teacher was trying to trick you)
pointers(ex: char *arr, int *arr) -------> heap data or stack, depending on the context. C lets you declare a global or a static pointer, in which case the pointer itself would end up in the data segment.
dynamically allocated space(using malloc, calloc, realloc) --------> stack heap
It is worth mentioning that "stack" is officially called "automatic storage class".
For those future visitors who may be interested in knowing about those memory segments, I am writing important points about 5 memory segments in C:
Some heads up:
Whenever a C program is executed some memory is allocated in the RAM for the program execution. This memory is used for storing the frequently executed code (binary data), program variables, etc. The below memory segments talks about the same:
Typically there are three types of variables:
Local variables (also called as automatic variables in C)
Global variables
Static variables
You can have global static or local static variables, but the above three are the parent types.
5 Memory Segments in C:
1. Code Segment
The code segment, also referred as the text segment, is the area of memory which contains the frequently executed code.
The code segment is often read-only to avoid risk of getting overridden by programming bugs like buffer-overflow, etc.
The code segment does not contain program variables like local variable (also called as automatic variables in C), global variables, etc.
Based on the C implementation, the code segment can also contain read-only string literals. For example, when you do printf("Hello, world") then string "Hello, world" gets created in the code/text segment. You can verify this using size command in Linux OS.
Further reading
Data Segment
The data segment is divided in the below two parts and typically lies below the heap area or in some implementations above the stack, but the data segment never lies between the heap and stack area.
2. Uninitialized data segment
This segment is also known as bss.
This is the portion of memory which contains:
Uninitialized global variables (including pointer variables)
Uninitialized constant global variables.
Uninitialized local static variables.
Any global or static local variable which is not initialized will be stored in the uninitialized data segment
For example: global variable int globalVar; or static local variable static int localStatic; will be stored in the uninitialized data segment.
If you declare a global variable and initialize it as 0 or NULL then still it would go to uninitialized data segment or bss.
Further reading
3. Initialized data segment
This segment stores:
Initialized global variables (including pointer variables)
Initialized constant global variables.
Initialized local static variables.
For example: global variable int globalVar = 1; or static local variable static int localStatic = 1; will be stored in initialized data segment.
This segment can be further classified into initialized read-only area and initialized read-write area. Initialized constant global variables will go in the initialized read-only area while variables whose values can be modified at runtime will go in the initialized read-write area.
The size of this segment is determined by the size of the values in the program's source code, and does not change at run time.
Further reading
4. Stack Segment
Stack segment is used to store variables which are created inside functions (function could be main function or user-defined function), variable like
Local variables of the function (including pointer variables)
Arguments passed to function
Return address
Variables stored in the stack will be removed as soon as the function execution finishes.
Further reading
5. Heap Segment
This segment is to support dynamic memory allocation. If the programmer wants to allocate some memory dynamically then in C it is done using the malloc, calloc, or realloc methods.
For example, when int* prt = malloc(sizeof(int) * 2) then eight bytes will be allocated in heap and memory address of that location will be returned and stored in ptr variable. The ptr variable will be on either the stack or data segment depending on the way it is declared/used.
Further reading
Corrected your wrong sentences
constant data types -----> code //wrong
local constant variables -----> stack
initialized global constant variable -----> data segment
uninitialized global constant variable -----> bss
variables declared and defined in main function -----> heap //wrong
variables declared and defined in main function -----> stack
pointers(ex:char *arr,int *arr) -------> heap //wrong
dynamically allocated space(using malloc,calloc) --------> stack //wrong
pointers(ex:char *arr,int *arr) -------> size of that pointer variable will be in stack.
Consider that you are allocating memory of n bytes (using malloc or calloc) dynamically and then making pointer variable to point it. Now that n bytes of memory are in heap and the pointer variable requries 4 bytes (if 64 bit machine 8 bytes) which will be in stack to store the starting pointer of the n bytes of memory chunk.
Note : Pointer variables can point the memory of any segment.
int x = 10;
void func()
{
int a = 0;
int *p = &a: //Now its pointing the memory of stack
int *p2 = &x; //Now its pointing the memory of data segment
chat *name = "ashok" //Now its pointing the constant string literal
//which is actually present in text segment.
char *name2 = malloc(10); //Now its pointing memory in heap
...
}
dynamically allocated space(using malloc,calloc) --------> heap
A popular desktop architecture divides a process's virtual memory in several segments:
Text segment: contains the executable code. The instruction pointer takes values in this range.
Data segment: contains global variables (i.e. objects with static linkage). Subdivided in read-only data (such as string constants) and uninitialized data ("BSS").
Stack segment: contains the dynamic memory for the program, i.e. the free store ("heap") and the local stack frames for all the threads. Traditionally the C stack and C heap used to grow into the stack segment from opposite ends, but I believe that practice has been abandoned because it is too unsafe.
A C program typically puts objects with static storage duration into the data segment, dynamically allocated objects on the free store, and automatic objects on the call stack of the thread in which it lives.
On other platforms, such as old x86 real mode or on embedded devices, things can obviously be radically different.
I am referring to these variables only from the C perspective.
From the perspective of the C language, all that matters is extent, scope, linkage, and access; exactly how items are mapped to different memory segments is up to the individual implementation, and that will vary. The language standard doesn't talk about memory segments at all. Most modern architectures act mostly the same way; block-scope variables and function arguments will be allocated from the stack, file-scope and static variables will be allocated from a data or code segment, dynamic memory will be allocated from a heap, some constant data will be stored in read-only segments, etc.
One thing one needs to keep in mind about the storage is the as-if rule. The compiler is not required to put a variable in a specific place - instead it can place it wherever it pleases for as long as the compiled program behaves as if it were run in the abstract C machine according to the rules of the abstract C machine. This applies to all storage durations. For example:
a variable that is not accessed all can be eliminated completely - it has no storage... anywhere. Example - see how there is 42 in the generated assembly code but no sign of 404.
a variable with automatic storage duration that does not have its address taken need not be stored in memory at all. An example would be a loop variable.
a variable that is const or effectively const need not be in memory. Example - the compiler can prove that foo is effectively const and inlines its use into the code. bar has external linkage and the compiler cannot prove that it would not be changed outside the current module, hence it is not inlined.
an object allocated with malloc need not reside in memory allocated from heap! Example - notice how the code does not have a call to malloc and neither is the value 42 ever stored in memory, it is kept in a register!
thus an object that has been allocated by malloc and the reference is lost without deallocating the object with free need not leak memory...
the object allocated by malloc need not be within the heap below the program break (sbrk(0)) on Unixen...
pointers(ex:char *arr,int *arr) -------> heap
Nope, they can be on the stack or in the data segment. They can point anywhere.
Variables/automatic variables ---> stack section
Dynamically allocated variables ---> heap section
Initialised global variables -> data section
Uninitialised global variables -> data section (bss)
Static variables -> data section
String constants -> text section/code section
Functions -> text section/code section
Text code -> text section/code section
Registers -> CPU registers
Command line inputs -> environmental/command line section
Environmental variables -> environmental/command line section
Linux minimal runnable examples with disassembly analysis
Since this is an implementation detail not specified by standards, let's just have a look at what the compiler is doing on a particular implementation.
In this answer, I will either link to specific answers that do the analysis, or provide the analysis directly here, and summarize all results here.
All of those are in various Ubuntu / GCC versions, and the outcomes are likely pretty stable across versions, but if we find any variations let's specify more precise versions.
Local variable inside a function
Be it main or any other function:
void f(void) {
int my_local_var;
}
As shown at: What does <value optimized out> mean in gdb?
-O0: stack
-O3: registers if they don't spill, stack otherwise
For motivation on why the stack exists see: What is the function of the push / pop instructions used on registers in x86 assembly?
Global variables and static function variables
/* BSS */
int my_global_implicit;
int my_global_implicit_explicit_0 = 0;
/* DATA */
int my_global_implicit_explicit_1 = 1;
void f(void) {
/* BSS */
static int my_static_local_var_implicit;
static int my_static_local_var_explicit_0 = 0;
/* DATA */
static int my_static_local_var_explicit_1 = 1;
}
if initialized to 0 or not initialized (and therefore implicitly initialized to 0): .bss section, see also: Why is the .bss segment required?
otherwise: .data section
char * and char c[]
As shown at: Where are static variables stored in C and C++?
void f(void) {
/* RODATA / TEXT */
char *a = "abc";
/* Stack. */
char b[] = "abc";
char c[] = {'a', 'b', 'c', '\0'};
}
TODO will very large string literals also be put on the stack? Or .data? Or does compilation fail?
Function arguments
void f(int i, int j);
Must go through the relevant calling convention, e.g.: https://en.wikipedia.org/wiki/X86_calling_conventions for X86, which specifies either specific registers or stack locations for each variable.
Then as shown at What does <value optimized out> mean in gdb?, -O0 then slurps everything into the stack, while -O3 tries to use registers as much as possible.
If the function gets inlined however, they are treated just like regular locals.
const
I believe that it makes no difference because you can typecast it away.
Conversely, if the compiler is able to determine that some data is never written to, it could in theory place it in .rodata even if not const.
TODO analysis.
Pointers
They are variables (that contain addresses, which are numbers), so same as all the rest :-)
malloc
The question does not make much sense for malloc, since malloc is a function, and in:
int *i = malloc(sizeof(int));
*i is a variable that contains an address, so it falls on the above case.
As for how malloc works internally, when you call it the Linux kernel marks certain addresses as writable on its internal data structures, and when they are touched by the program initially, a fault happens and the kernel enables the page tables, which lets the access happen without segfaul: How does x86 paging work?
Note however that this is basically exactly what the exec syscall does under the hood when you try to run an executable: it marks pages it wants to load to, and writes the program there, see also: How does kernel get an executable binary file running under linux? Except that exec has some extra limitations on where to load to (e.g. is the code is not relocatable).
The exact syscall used for malloc is mmap in modern 2020 implementations, and in the past brk was used: Does malloc() use brk() or mmap()?
Dynamic libraries
Basically get mmaped to memory: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710
envinroment variables and main's argv
Above initial stack: https://unix.stackexchange.com/questions/75939/where-is-the-environment-string-actual-stored TODO why not in .data?

Resources