Physical memory location of uninitialised memory location in C program? - c

I have read that uninitialized global variables in C occupy the .bss section of memory. Also, .bss is just a placeholder and doesn't occupy any space in the object file.
My question is: once an uninitialised global variable is assigned a value, where will it be stored in physical memory?
For example:
int a[100];
int main()
{
a[10] = 25;
}
In the above program, where will the memory location be allocated?

Where global variables are stored is implementation-defined; the C standard does not say where they should be placed.
The C standard does not even mention a BSS segment or a data segment; it only defines the behavior such variables must exhibit.

I think your misunderstanding is thinking that BSS is "permanently zero" memory. It's just a section of the program load mapping that's implicitly zero and thus avoids having any physical storage on disk, but otherwise it's a standard private writable mapping, and takes on physical existence as soon as it's written to.
If you're thinking about it moving, perhaps you're confusing virtual and physical addresses. The virtual address of an object in C never changes, and the physical address is never visible to you and should never matter.
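As a minimal sketch of the point above, the array from the question can be checked to be all zeros before the first write, and its address is the same before and after the write. The helper names all_zero and write_keeps_address are made up for illustration, and the section placement mentioned in the comments is an assumption about typical ELF toolchains, not something the C standard promises.

```c
#include <assert.h>
#include <stddef.h>

int a[100]; /* no initializer: zero-initialized; typically placed in .bss */

/* Returns 1 if every element of a is currently zero. */
int all_zero(void)
{
    for (size_t i = 0; i < sizeof a / sizeof a[0]; i++) {
        if (a[i] != 0)
            return 0;
    }
    return 1;
}

/* Writes a[10] and returns 1 if its (virtual) address is unchanged. */
int write_keeps_address(void)
{
    int *before = &a[10];
    a[10] = 25; /* on a paged OS, physical memory is typically attached on demand */
    int *after = &a[10];
    assert(a[10] == 25);
    return before == after;
}
```

Calling all_zero() first and write_keeps_address() second should both return 1; the physical page behind a[10] is never visible from this code.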

Related

Location of a dereferenced uninitialized pointer in memory?

I have this code example in c from an introductory Embedded system course quiz :
#include <stdlib.h>
#include <stdint.h>
//cross-compiled for MSP432 with cortex-m0plus
int main() {
int * l2;
return 0;
}
I want to know the memory segment, sub-segment, permissions and lifetime of *l2 in memory.
What I understand is that the pointer l2 is going to be allocated in the stack sub-segment first; then, because it's uninitialized, it's going to get a garbage value, which in this case is whatever value happens to be on the stack. I assumed it was in .text or .const with a static lifetime, and none of these answers were right, so am I missing something here?
Edit:
After I passed the quiz without solving this point correctly, the solution table says it's in the heap with indefinite lifetime. What I got from this answer is that, because a pointer itself is stored on the stack and the object it points to is uninitialized (it's not auto or static), it's stored in the heap... I guess?
It depends on the implementation.
Usually as it is local automatic variable it will be located on the stack. Its lifetime is the same as lifetime of the main function. It can be only accessed from the main function.
But in real life, as you do not do anything with it, it will simply be removed by the compiler as not needed, even if you compile with no optimizations: https://godbolt.org/z/1Y6W5j . In this case its location is "nowhere".
Objects can also be kept in registers and never be placed in memory: https://godbolt.org/z/8nWxxz
Most modern C implementations place code in the .text segment, initialized variables with static storage duration in the .data segment, uninitialized variables with static storage duration in the .bss segment, and read-only data in the .rodata segment. Your program may have plenty of other memory segments, as there are many options. You can also define your own segments and place objects there.
Stack and heap location are 100% implementation defined.
The value stored in l2 is indeterminate - it can even be a trap representation. The l2 object itself has auto storage duration and its lifetime is limited to the lifetime of the enclosing function. What that translates into in terms of memory segment depends on the specific implementation.
You can’t say anything about the value of *l2, unless your specific implementation documents exactly how uninitialized pointers are handled.
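To make the storage durations above concrete, here is a small sketch. The names s_ptr, static_pointer_is_null and heap_roundtrip are hypothetical; l2 mirrors the quiz variable but is initialized here, since reading an uninitialized pointer would not be a valid example. The pointer object l2 has automatic storage duration, while the object it points to only exists between malloc and free.

```c
#include <assert.h>
#include <stdlib.h>

static int *s_ptr; /* static storage duration: starts out as a null pointer */

/* Returns 1 if the static pointer was zero-initialized to NULL. */
int static_pointer_is_null(void)
{
    return s_ptr == NULL;
}

/* Stores value in allocated storage and reads it back through l2.
 * Returns 0 on success, -1 if malloc fails. */
int heap_roundtrip(int value, int *out)
{
    assert(out != NULL);
    int *l2 = malloc(sizeof *l2); /* l2: automatic; *l2: allocated storage */
    if (l2 == NULL)
        return -1;
    *l2 = value;
    *out = *l2;
    free(l2); /* the allocated object's lifetime ends here; l2's ends at return */
    return 0;
}
```

Only the pointee comes from the allocator (usually "the heap"); where l2 itself lives, stack or register, is still up to the implementation.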

difference between stack segment and uninitialized data segment

I was trying to get a hand over the memory allocation in c.
According to the following link, the stack and the uninitialized data segment are different and the uninitialized data of the local function goes to the uninitialized data segment.
If that is the case then what is stored in the stack segment in case of a code with uninitialized local variables? Is it empty?
I would not recommend reading "geeksforgeeks" tutorials. You have some misconceptions.
What they call "uninitialized data", the .bss segment, is in fact a store for variables of static storage duration that are zero-initialized. Including any such variable which is explicitly initialized to value zero.
An explanation of static storage duration and the different common segments, with examples, can be found here.
Only variables with static storage duration end up in .bss and .data. Local variables always end up on the stack, or in CPU registers, no matter if they are initialized or not.
(Please note that none of this is specified by the ISO C standard, but rather by industry de facto standards.)
the uninitialized data of the local function goes to the uninitialized data segment.
Well, that is not entirely true.
Read carefully, (from the same link, emphasis mine)
[...] uninitialized data starts at the end of the data segment and contains all global variables and static variables that are initialized to zero or do not have explicit initialization in source code. [...]
So, variables with automatic storage still reside in the stack segment, irrespective of whether they are initialized or not.
That said, a word of caution: this is "a typical memory representation", not a universal one. The C standard does not mandate a stack segment (or any other), for that matter.
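The difference between static and automatic storage duration can be observed directly, whatever segment the implementation picks. A minimal sketch, with made-up function names:

```c
#include <assert.h>

/* calls has static storage duration: zero-initialized once (typically in
 * .bss) and preserved across calls. */
int next_static(void)
{
    static int calls;
    return ++calls;
}

/* calls has automatic storage duration: a fresh object on every call
 * (typically on the stack or in a register). */
int next_auto(void)
{
    int calls = 0;
    return ++calls;
}

/* Exercises both counters twice, starting from a fresh program state. */
void check_counters(void)
{
    assert(next_static() == 1);
    assert(next_static() == 2);
    assert(next_auto() == 1);
    assert(next_auto() == 1);
}
```

Note that the local in next_auto has to be initialized explicitly; unlike the static one, it gets no implicit zero.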

How are the different segments like heap, stack, text related to the physical memory?

When a C program is compiled, an object file (ELF) is created. The object file contains different sections such as bss, data, text and other segments. I understood that these sections of the ELF are part of the virtual memory address space. Am I right? Please correct me if I am wrong.
Also, there will be a virtual memory and page table associated with the compiled program. The page table maps the virtual memory addresses present in the ELF to real physical memory addresses when the program is loaded. Is my understanding correct?
I read that in the created ELF file, the bss section just keeps references to the uninitialised global variables. Does "uninitialised global variable" here mean a variable that is not initialised at its declaration?
Also, I read that local variables are allocated space at run time (i.e., on the stack). Then how are they referenced in the object file?
If the program has a particular section of code that allocates memory dynamically, how are those variables referenced in the object file?
I am confused about whether these different segments of the object file (like text, rodata, data, bss, stack and heap) are part of the physical memory (RAM), where all programs are executed.
But I feel that my understanding is wrong. How are these different segments related to the physical memory when a process or a program is in execution?
1. Correct, the ELF file lays out the absolute or relative locations in the virtual address space of a process that the operating system should copy the ELF file contents into. (The bss is just a location and a size; since it's supposed to be all zeros, there is no need to actually have the zeros in the ELF file.) Note that locations can be absolute (like virtual address 0x100000) or relative (like 4096 bytes after the end of text).
2. The virtual memory definition (which is kept in page tables and maps virtual addresses to physical addresses) is not associated with a compiled program, but with a "process" (or "task" or whatever your OS calls it) that represents a running instance of that program. For example, a single ELF file can be loaded into two different processes, at different virtual addresses (if the ELF file is relocatable).
3. The programming language you're using defines which uninitialized state goes in the bss, and which gets explicitly initialized. Note that the bss does not contain "references" to these variables, it is the storage backing those variables.
4. Stack variables are referenced implicitly from the generated code. There is nothing explicit about them (or even the stack) in the ELF file.
5. Like stack references, heap references are implicit in the generated code in the ELF file. (They're all stored in memory created by changing the virtual address space via a call to sbrk or its equivalent.)
The ELF file explains to an OS how to set up a virtual address space for an instance of a program. The different sections describe different needs. For example, ".rodata" says "I'd like to store read-only data" (as opposed to executable code). The ".text" section means executable code. The "bss" is a region used to store state that should be zeroed by the OS. The virtual address space means the program can (optionally) rely on things being where it expects when it starts up. (For example, if it asks for the .bss to be at address 0x4000, then either the OS will refuse to start it, or it will be there.)
Note that these virtual addresses are mapped to physical addresses by the page tables managed by the OS. The instance of the ELF file doesn't need to know any of the details involved in which physical pages are used.
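One way to see this from inside a process is to print the address of one object per region. This is only a sketch: print_layout, g_data and g_bss are hypothetical names, the region labels reflect typical toolchain behaviour rather than a guarantee, and every address printed is a virtual address, not a physical one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int g_data = 42; /* initialized: typically .data */
int g_bss;       /* no initializer: typically .bss */

/* Prints one virtual address per region.
 * Returns the number of lines printed, or -1 if malloc fails. */
int print_layout(void)
{
    int local = 0;                    /* typically on the stack */
    int *heap = malloc(sizeof *heap); /* typically on the heap */
    if (heap == NULL)
        return -1;

    int n = 0;
    printf("rodata: %p\n", (void *)"literal"); n++;
    printf("data:   %p\n", (void *)&g_data);   n++;
    printf("bss:    %p\n", (void *)&g_bss);    n++;
    printf("stack:  %p\n", (void *)&local);    n++;
    printf("heap:   %p\n", (void *)heap);      n++;

    free(heap);
    assert(n == 5);
    return n;
}
```

Running it twice on a system with address space layout randomization will usually print different values, which is another hint that these are per-process virtual addresses.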
I am not sure if 1, 2 and 3 are correct but I can explain 4 and 5.
4: They are referenced by offset from the top of the stack. When executing a function, the top of the stack is moved to allocate space for local variables. The compiler determines the order of local variables on the stack, so it knows the offset of each variable from the top of the stack.
Stack in physical memory is positioned upside down. Beginning of stack usually has highest memory address available. As programs runs and allocates space for local variables the address of the top of the stack decrements (and can potentially lead to stack overflow - overlapping with segments on lower addresses :-) )
5: Using pointers - Address of dynamically allocated variable is stored in (local) variable. This corresponds to using pointers in C.
I have found nice explanation here: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/mem.html
All the addresses of the different sections (.text, .bss, .data, etc.) you see when you inspect an ELF with the size command:
$ size -A -x my_elf_binary
are virtual addresses. The MMU with the operating system performs the translation from the virtual addresses to the RAM physical addresses.
If you want to know these things, learn about the OS, with source code (www.kernel.org) if possible.
You need to realize that the OS kernel is actually running the CPU and managing the memory resource. And C code is just a lightweight script to drive the OS and to run only simple operations with registers.
Virtual memory and physical memory are about the CPU's MMU (with the TLB caching page table entries) letting a user-space process use virtually contiguous memory.
So the actual physical memory mapped to contiguous virtual memory can be scattered anywhere in RAM.
A compiled program doesn't know about the TLB or physical memory addresses. They are managed in OS kernel space.
BSS is a section that the OS prepares as zero-filled memory, because its variables were not initialized in the C/C++ source code and were therefore marked as bss by the compiler/linker.
For the stack, the OS prepares only a small amount of memory at first. Every time a function call is made, the stack pointer is pushed down so that there is more space to place the local variables, and it is popped back when the function returns.
When the initial small amount of memory is full and the stack reaches its bottom, a page fault exception occurs, the OS kernel maps new physical memory to the virtual address, and the user process can continue working.
No magic. In object code, every operation done to the pointer returned from malloc is handled as offsets to the register value returned from malloc function call.
Actually malloc is doing quite complex things. There are various implementations (jemalloc/ptmalloc/dlmalloc/googlemalloc/...) for improving dynamic allocations, but actually they are all getting new memory region from the OS using sbrk or mmap(/dev/zero), which is called anonymous memory.
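The "offsets from the pointer returned by malloc" remark can be illustrated in plain C, where p[i] is by definition *(p + i). A minimal sketch with a made-up helper name; it says nothing about how a particular allocator obtains its memory from the OS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates n ints, stores 0..n-1, and sums them via explicit pointer offsets.
 * Returns the sum, or -1 if n is 0 or malloc fails. */
long sum_via_offsets(size_t n)
{
    if (n == 0)
        return -1;
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        p[i] = (int)i;
        assert(&p[i] == p + i); /* indexing is base address plus offset */
        sum += *(p + i);
    }
    free(p);
    return sum;
}
```

For example, sum_via_offsets(5) adds 0 + 1 + 2 + 3 + 4.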
Just do a man on the command readelf to find out the starting addresses of the different segments of your program.
Regarding the first question you are absolutely right. Since most of today's systems use run-time binding it is only during execution that the actual physical addresses are known. Moreover, it's the compiler and the loader that divide the program into different segments after linking the different libraries during compile and load time. Hence, the virtual addresses.
Coming to the second question it is at the run-time due to runtime binding. The third question is true. All uninitialized global variables and static variables go into BSS. Also note the special case: they go into BSS even if they are initialized to 0.
4. If you look at the assembler code generated by gcc, you can see that memory for local variables is allocated on the stack with push instructions or by changing the value of the ESP register. They are then initialized with mov or a similar instruction.

How do global variables contribute to the size of the executable?

Does having global variables increase the size of the executable? If yes how? Does it increase only the data section size or also the text section size?
If I have a global variable and initialization as below:
char g_glbarr[1024] = {"jhgdasdghaKJSDGksgJKASDGHKDGAJKsdghkajdgaDGKAjdghaJKSDGHAjksdghJKDG"};
Now, does this add 1024 to the data section and the size of the initialization string to the text section?
If, instead of allocating space for this array statically, I malloc it and then do a memcpy, will only the data section size shrink, or will the text section size shrink as well?
Yes, it does. Basically compilers store them in the data segment. Sometimes if you use a constant char array in your code (like printf("<1024 char array goes here");) it will go to the data segment (AFAIK some old compilers /Borland?/ may store it in the text segment). You can force the compiler to put a global variable in a custom section (for VC++ it was #pragma data_seg(<segment name>)).
Dynamic memory allocation doesn't affect data/text segments, since it allocates memory in the heap.
The answer is implementation-dependent, but for sane implementations this is how it works for variables with static storage duration (global or otherwise):
Whenever the variable is initialized, the whole initialized value of the object will be stored in the executable file. This is true even if only the initial part of it is explicitly initialized (the rest is implicitly zero).
If the variable is constant and initialized, it will be in the "text" segment, or equivalent. Some systems (modern ELF-based, maybe Windows too?) have a separate "rodata" segment for read-only data to allow it to be marked non-executable, separate from program code.
Non-constant initialized variables will be in the "data" segment in the executable, which is mapped into memory in copy-on-write mode by the operating system when the program is loaded.
Uninitialized variables (which are implicitly zero as per the standard) will have no storage reserved in the executable itself, but a size and offset in the "bss" segment, which is created at program load-time by the operating system.
Such uninitialized variables may be created in a separate read-only "bss"-like segment if they're const-qualified.
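The "rest is implicitly zero" rule is easy to check at run time. In this sketch the names g_init, g_zero, init_size and tail_is_zero are invented for illustration, and the section names in the comments are assumptions about a typical ELF toolchain; comparing the output of the size command with and without the initializer would show the file-size effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

char g_init[1024] = {"abc"}; /* initialized: typically all 1024 bytes in .data */
char g_zero[1024];           /* no initializer: typically only a size in .bss */

/* Size of the initialized array object itself (an array, not a pointer). */
size_t init_size(void)
{
    return sizeof g_init;
}

/* Returns 1 if every byte after the explicit initializer is zero. */
int tail_is_zero(void)
{
    size_t used = strlen(g_init) + 1; /* "abc" plus its terminator */
    assert(used <= sizeof g_init);
    for (size_t i = used; i < sizeof g_init; i++) {
        if (g_init[i] != 0)
            return 0;
    }
    return 1;
}
```

Here sizeof g_init is 1024 regardless of how short the string is, which is why a partly initialized array still costs its full size in the executable on such implementations.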
I am not speaking as an expert, but I would guess that simply having that epic string literal in your program would increase the size of your executable. What you do with that string literal doesn't matter, because it has to be stored somewhere.
Why does it matter which "section" of the executable is increased? This isn't a rhetorical question!
The answer is slightly implementation sensitive, but in general, no. Your g_glbarr is really a pointer to char, or an address. The string itself will be put into the data section with constant strings, and g_glbarr will become a symbol for the address of the string at compile time. You don't end up allocating space for the pointer and the compiler simply resolves the address at link time.
Update
@Jay, it's sorta kinda the same. The integers (usually) just are in-line: the compiler will come as close as it can to just putting the constant in the code, because that's such a common case that most normal architectures have a straightforward way of doing it from immediate data. The string constants will still be in some read-only data section. So when you make something like:
// warning: I haven't compiled this and wouldn't normally
// do it quite this way so I'm not positive this is
// completely grammatical C
struct X {int a; char * b; } x = { 1, "Hello" } ;
the 1 becomes "immediate" data, the "Hello" is allocated in read-only data somewhere, and the compiler will just generate something that allocates a piece of read-write data that looks something like
x:
x.a: WORD 1
x.b WORD #STR42
where STR42 is a symbolic name for the location of the string "Hello" in memory. Then when everything is linked together, the #STR42 is replaced with the actual virtual address of the string in memory.
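A compilable variant of the struct example, as a rough sketch: the member b is declared const char * here (the original used char *) so the literal cannot be written through it by accident, and x_holds_pointer is a made-up helper. It shows that x stores only a pointer, while the characters of "Hello" live elsewhere.

```c
#include <assert.h>
#include <string.h>

struct X { int a; const char *b; };

/* x itself: typically .data; the characters of "Hello": typically .rodata */
static struct X x = { 1, "Hello" };

/* Returns 1 if x.b is a pointer-sized member whose target reads "Hello". */
int x_holds_pointer(void)
{
    assert(x.a == 1);
    return sizeof x.b == sizeof(const char *) && strcmp(x.b, "Hello") == 0;
}
```

The address stored in x.b is filled in at link or load time, which matches the STR42 placeholder in the pseudo-assembly above.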

Where are constant variables stored in C?

I wonder where constant variables are stored. Is it in the same memory area as global variables? Or is it on the stack?
How they are stored is an implementation detail (depends on the compiler).
For example, in the GCC compiler, on most machines, read-only variables, constants, and jump tables are placed in the text section.
Depending on the data segmentation that a particular processor follows, we have five segments:
Code segment - stores only code (ROM)
Data segment - stores initialised global and static variables
Stack segment - stores all the local variables and other information such as function return addresses
Heap segment - all dynamic allocations happen here
BSS (or Block Started by Symbol) segment - stores uninitialised global and static variables
Note that the difference between the data and BSS segments is that the former stores initialised global and static variables and the latter stores uninitialised ones.
Now, Why am I talking about the data segmentation when I must be just telling where are the constant variables stored... there's a reason to it...
Every segment has a write protected region where all the constants are stored.
For example:
If I have a const int which is local variable, then it is stored in the write protected region of stack segment.
If I have a global that is initialised const var, then it is stored in the data segment.
If I have an uninitialised const var, then it is stored in the BSS segment...
To summarize, "const" is just a data QUALIFIER, which means that first the compiler has to decide which segment the variable has to be stored and then if the variable is a const, then it qualifies to be stored in the write protected region of that particular segment.
Consider the code:
void totherfunc(const int *p); /* assumed to be defined elsewhere */
const int i = 0;
static const int k = 99;
int function(void)
{
const int j = 37;
totherfunc(&j);
totherfunc(&i);
//totherfunc(&k);
return(j+3);
}
Generally, i can be stored in the text segment (it's a read-only variable with a fixed value). If it is not in the text segment, it will be stored beside the global variables. Given that it is initialized to zero, it might be in the 'bss' section (where zeroed variables are usually allocated) or in the 'data' section (where initialized variables are usually allocated).
If the compiler is convinced the k is unused (which it could be since it is local to a single file), it might not appear in the object code at all. If the call to totherfunc() that references k was not commented out, then k would have to be allocated an address somewhere - it would likely be in the same segment as i.
The constant (if it is a constant, is it still a variable?) j will most probably appear on the stack of a conventional C implementation. (If you were asking in the comp.std.c news group, someone would mention that the standard doesn't say that automatic variables appear on the stack; fortunately, SO isn't comp.std.c!)
Note that I forced the variables to appear because I passed them by reference - presumably to a function expecting a pointer to a constant integer. If the addresses were never taken, then j and k could be optimized out of the code altogether. To remove i, the compiler would have to know all the source code for the entire program - it is accessible in other translation units (source files), and so cannot as readily be removed. Doubly not if the program indulges in dynamic loading of shared libraries - one of those libraries might rely on that global variable.
(Stylistically - the variables i and j should have longer, more meaningful names; this is only an example!)
Depends on your compiler, your system capabilities, your configuration while compiling.
gcc puts read-only constants in the .text section, unless instructed otherwise.
Usually they are stored in read-only data section (while global variables' section has write permissions). So, trying to modify constant by taking its address may result in access violation aka segfault.
But it depends on your hardware, OS and compiler really.
Of course not, because:
1) The bss segment stores uninitialized variables, and it has two types:
(I) Large static and global variables that are non-constant and uninitialized are stored in the .BSS section.
(II) Small static and global variables that are non-constant and uninitialized are stored in the .SBSS section, which is included in the .BSS segment.
2) The data segment stores initialized variables, and it has three types:
(I) Large static and global variables that are initialized and non-constant are stored in the .DATA section.
(II) Small static and global variables that are non-constant and initialized are stored in the .SDATA1 section.
(III) Small static and global variables that are constant, whether initialized or not, are stored in the .SDATA2 section.
"Small" and "large" as used above depend on the compiler; for example, small may mean less than 8 bytes and large may mean 8 bytes or more.
But my doubt is: where are local constants stored?
This is mostly an educated guess, but I'd say that constants are usually stored in the actual CPU instructions of your compiled program, as immediate data. So in other words, most instructions include space for the address to get data from, but if it's a constant, the space can hold the value itself.
This is specific to Win32 systems.
It's compiler dependent, but please be aware that it may not even be stored at all, since the compiler may optimize it away and place its value directly into the expressions that use it.
I added this code to a program and compiled it with gcc for ARM Cortex-M4; check the difference in memory usage.
Without const:
int someConst[1000] = {0};
With const:
const int someConst[1000] = {0};
Global and constant are two completely separated keywords. You can have one or the other, none or both.
Where your variable, then, is stored in memory depends on the configuration. Read up a bit on the heap and the stack, that will give you some knowledge to ask more (and if I may, better and more specific) questions.
It may not be stored at all.
Consider some code like this:
/* ISO C's <math.h> does not define PI, so it is defined here */
#define PI 3.14159265358979323846
double toRadian(int degree){
return degree*PI*2/360.0;
}
This enables the programmer to gather the idea of what is going on, but the compiler can optimize away some of that, and most compilers do, by evaluating constant expressions at compile time, which means that the value PI may not be in the resulting program at all.
Just as an add-on: as you know, it is during the linking process that the memory layout of the final executable is decided. There is one more section called COMMON, in which the common symbols from different input files are placed. This common section actually falls under the .bss section.
Some constants aren't even stored.
Consider the following code:
int x = foo();
x *= 2;
Chances are that the compiler will turn the multiplication into x = x+x; as that reduces the need to load the number 2 from memory.
I checked on an x86_64 GNU/Linux system. By dereferencing a pointer to a 'const' variable, the value can be changed (although doing so is undefined behavior). I used objdump and didn't find the 'const' variable in the text segment; the 'const' variable was stored on the stack.
'const' is a type qualifier in C that the compiler enforces: it reports an error when it comes across a statement that directly modifies a 'const' variable.
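A short sketch of what const does and does not promise. All names here are hypothetical, and the placement comments describe common behaviour only. Reading a const object through a pointer-to-const is always fine; a direct assignment is rejected at compile time; writing through a cast-away pointer would be undefined behavior, so the sketch does not attempt it.

```c
#include <assert.h>

static const int k_limit = 99; /* static + const: typically .rodata, if emitted */

/* Reading through a pointer-to-const is always allowed. */
int read_const(const int *p)
{
    return *p;
}

/* Returns the file-scope constant via its address. */
int limit(void)
{
    return read_const(&k_limit);
}

/* Uses a local const: typically on the stack or in a register. */
int local_const_plus(int n)
{
    const int j = 37;
    /* j = 38; */ /* would not compile: assignment of read-only variable */
    assert(j == 37);
    return read_const(&j) + n;
}
```

Because j is a local, it may well end up on the stack as the answer above observed, while k_limit may land in a read-only section or disappear entirely once its value is folded into the code.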

Resources