As we know, .bss contains uninitialized variables. If, in C code, the programmer initializes the variables before using them, then .bss doesn't need to be zeroed before executing C code.
Am I right?
Thanks
In C, any variable with static storage duration that has no explicit initializer is required by the spec to be initialized to 0 (C99 Section 6.7.8 Initialization, paragraph 10):
If an object that has static storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these rules.
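The four rules can be checked with a small sketch. All names below are hypothetical. None of these file-scope objects has an initializer, so each one starts with the value its type calls for.

```c
#include <stddef.h>

/* No explicit initializers: C99 6.7.8p10 fixes each starting value. */
static int *zero_ptr;          /* pointer type: null pointer */
static double zero_real;       /* arithmetic type: 0.0 */
static unsigned zero_count;    /* arithmetic type: 0u */

/* Aggregate: every member is zero-initialized recursively. */
static struct point { int x; int y; char *label; } origin;

/* Union: only the first named member (as_long) is zero-initialized. */
static union number { long as_long; double as_double; } first_member;

/* Returns 1 if every object holds the value the rules promise. */
int statics_start_at_zero(void)
{
    return zero_ptr == NULL
        && zero_real == 0.0
        && zero_count == 0u
        && origin.x == 0 && origin.y == 0 && origin.label == NULL
        && first_member.as_long == 0L;
}
```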
Some program loaders will fill the whole section with zeroes to start with, and others will fill it 'on demand' as a performance improvement. So while you are technically correct that the .bss section may not really contain all zeroes when the C code starts executing, it logically does. In any case, assuming you have a standard-compliant toolchain, you can think of it as being all zero.
Any variables that are initialized to non-zero values will never end up in the .bss section; they are handled in the .data or .rodata sections, depending on their particular characteristics.
The ELF specification says:
.bss: This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.
It therefore follows that a C global variable with a non-zero initializer cannot be put into the .bss section and will have to go into the .data section. The .data section contains the initial values for all the variables assigned to it.
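A sketch of the difference, assuming a typical ELF toolchain such as GCC on Linux (section placement is an implementation detail, not something the C standard requires). The array without an initializer would normally land in .bss and take no file space, while the array with a non-zero initializer goes into .data, so all of its contents are stored in the executable. Comparing `size` or `readelf -S` output for builds with and without `initialized_table` is one way to observe this. The names and the length are hypothetical.

```c
/* Hypothetical size; large enough to stand out in `size` output. */
#define TABLE_LEN 4096

int zeroed_table[TABLE_LEN];             /* no initializer: typically .bss (SHT_NOBITS) */
int initialized_table[TABLE_LEN] = {1};  /* non-zero initializer: typically .data */

/* Reads both tables so neither can be discarded as unused. */
long sum_tables(void)
{
    long total = 0;
    for (int k = 0; k < TABLE_LEN; k++)
        total += zeroed_table[k] + initialized_table[k];
    return total;
}
```

Only element 0 of `initialized_table` is 1; the remaining elements are zero, yet the whole array still typically occupies file space in .data.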
That depends on where the variable is in the code. For instance, if you're talking about a local variable in main() or any other function, then the variable is placed on the stack (unless you use a modifying keyword such as static). If your variable is global AND uninitialized, then it should be kept in .bss. Note that compiler optimization and so forth may change things around a bit. If you want to know for sure, use readelf to inspect an ELF binary on Linux.
It seems like you might be confused about the mechanism by which the .bss section ends up zero-initialized. The code that you compile doesn't have to explicitly initialize the region to zero, because when the operating system first allocates a new page of memory to a process, the OS makes sure that the page is zero-initialized. This is done for security reasons, so that a process can't go looking for secrets that were left in memory when other processes exited.
I was trying to get a handle on memory allocation in C.
According to the following link, the stack and the uninitialized data segment are different and the uninitialized data of the local function goes to the uninitialized data segment.
If that is the case then what is stored in the stack segment in case of a code with uninitialized local variables? Is it empty?
I would not recommend reading "geeksforgeeks" tutorials. You have some misconceptions.
What they call "uninitialized data", the .bss segment, is in fact a store for variables of static storage duration that are zero-initialized. Including any such variable which is explicitly initialized to value zero.
An explanation of static storage duration and the different common segments, with examples, can be found here.
Only variables with static storage duration end up in .bss and .data. Local variables always end up on the stack, or in CPU registers, no matter if they are initialized or not.
(Please note that none of this is specified by the ISO C standard, but rather by industry de facto standards.)
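A minimal sketch of the distinction, with a hypothetical function name. `calls` has static storage duration, so on typical toolchains it lives in .bss and keeps its value between calls. `scratch` is automatic, so a fresh object is created (on the stack or in a register) on every call and must be initialized each time.

```c
/* Returns how many times it has been called. */
int count_calls(void)
{
    static int calls;   /* static storage duration: zero at startup, typically in .bss */
    int scratch = 0;    /* automatic: a new object on every call */

    scratch += 1;       /* always 1 here, because scratch restarts at 0 on each call */
    calls += scratch;
    return calls;
}
```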
the uninitialized data of the local function goes to the uninitialized data segment.
Well, that is not entirely true.
Read carefully, (from the same link, emphasis mine)
[...] uninitialized data starts at the end of the data segment and contains all global variables and static variables that are initialized to zero or do not have explicit initialization in source code. [...]
So variables with automatic storage duration still reside in the stack segment, regardless of whether they are initialized or not.
That said, a word of caution: this is "a typical memory representation", not a universal one. The C standard does not mandate a stack segment (or any other segment), for that matter.
When I declare variables outside main, the compiler stores them in a peculiar way.
int i=1,j=1;
void main(void)
{
printf("%d\n%d",&i,&j);
}
If i and j are both uninitialized, both equal to 0, or both equal to some positive value, then they are stored at contiguous addresses in memory, whereas if i = 0 and j is some positive integer, their addresses are separated by a fairly large distance.
The problem is that when they are stored at contiguous addresses, it can cause real performance issues like false sharing (have a look here). I've learned that to prevent this, there should be some space between the variables' addresses, which is automatically provided when i = 0 and j is any positive value.
Now, what I want to understand is:
Why does the compiler store the variables at non-contiguous addresses only when one is initialized to 0 and the other to a positive value, and
How can I intentionally do what the compiler is doing automatically, i.e., allocate variables at widely separated addresses?
(Using devcpp gcc 4.9.2)
Assuming you meant printf("%p, %p\n",(void *)&i,(void *)&j);, note the following:
It is not mandated by C specs to allocate variables in contiguous memory.
Often, globals initialized with 0 are kept in the BSS section (which is part of the data segment) to save binary size. Other globals are kept in the rest of the data segment. (This is an implementation detail, not mandated by the C spec.)
How can I intentionally do what compiler is doing automatically?
This is a compiler-specific question, and your compiler's documentation should contain an answer to it.
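One way to request separation explicitly, assuming a C11 compiler (for GCC 4.9, `-std=c11`) and a 64-byte cache line (common on x86, but an assumption, so check your target), is `_Alignas`. It does not depend on the .bss/.data split: each variable is aligned to the start of a cache line. Alignment fixes where each object starts, not the order the linker chooses, but two distinct 64-byte-aligned `int`s can never share a 64-byte line. The names are hypothetical.

```c
#include <stdint.h>

/* Assumed cache-line size; 64 bytes is typical on x86 but not universal. */
#define CACHE_LINE 64

/* Each counter starts on its own cache line, so writes to one
   cannot cause false sharing with the other. */
_Alignas(CACHE_LINE) int counter_a = 1;
_Alignas(CACHE_LINE) int counter_b = 1;

/* Returns 1 if both counters start on a cache-line boundary. */
int counters_are_line_aligned(void)
{
    return (uintptr_t)&counter_a % CACHE_LINE == 0
        && (uintptr_t)&counter_b % CACHE_LINE == 0;
}
```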
One problem there,
printf("%d\n%d",&i,&j);
invokes undefined behavior. So, the outputs cannot be justified in any way. You need to use %p format specifier and cast the corresponding argument to (void *) to print a pointer.
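A corrected version of the question's print statement, wrapped in a hypothetical helper so it can be called from `main` (which should be declared `int main(void)` rather than `void main(void)`):

```c
#include <stdio.h>

int i = 1, j = 1;

/* %p expects a void *, so each address is converted explicitly. */
void print_addresses(void)
{
    printf("%p\n%p\n", (void *)&i, (void *)&j);
}
```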
That said, the C standard neither imposes any constraints nor provides any guidelines on where and how variables are stored in memory. It's up to the compiler implementation to decide how to place different variables in memory. You need to check the documentation of the compiler in use to find out the rules your compiler follows.
To elaborate in a generic way, an object file consists of many segments, like
Header (descriptive and control information)
Code segment ("text segment", executable code)
Data segment (initialized static variables)
Read-only data segment (rodata, initialized static constants)
BSS segment (uninitialized static data, both variables and constants)
External definitions and references for linking
Relocation information
Dynamic linking information
Debugging information
and it's up to the compiler to decide the address space (range/value) to be used for each segment.
As per the rules,
Global variables (i.e., having static storage duration) left uninitialized and initialized with 0 are placed in .bss segment.
Variables initialized with a non-zero value are placed in the .data segment
so, it's fair enough to say that the addresses of two variables pertaining to two different segments will not be contiguous.
Now, your observation checks out.
If both i and j are not initialized or equals 0 or equals some positive values then they are stored at continuous address spaces in memory
Yes: then they all go to either .bss or .data, and the compiler usually chooses to place them one after another.
whereas if i=0 and j = some +ve integer then their addresses are separated by fairly large distance.
This also holds true, both the variables are now placed in different segments.
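To repeat the experiment without undefined behavior, the addresses can be converted to `uintptr_t` before subtracting, because subtracting pointers to two different objects directly is undefined (the integer values themselves are implementation-defined). This sketch assumes a typical GCC/ELF toolchain and uses hypothetical names. The distances depend entirely on the compiler and linker, so any numbers it prints are an observation, not a guarantee.

```c
#include <stdint.h>
#include <stdio.h>

int bss_a, bss_b;           /* zero-initialized: usually placed together in .bss */
int data_a = 1, data_b = 1; /* non-zero initializers: usually placed together in .data */

/* Byte distance between two addresses, computed on integers. */
static long long distance(const void *p, const void *q)
{
    return (long long)(uintptr_t)q - (long long)(uintptr_t)p;
}

void report_layout(void)
{
    printf(".bss  pair: %lld bytes apart\n", distance(&bss_a, &bss_b));
    printf(".data pair: %lld bytes apart\n", distance(&data_a, &data_b));
    printf("across    : %lld bytes apart\n", distance(&bss_a, &data_a));
}
```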
In C/C++, why are globals and static variables initialized to default values?
Why not leave it with just garbage values? Are there any special reasons for this?
Security: leaving memory alone would leak information from other processes or the kernel.
Efficiency: the values are useless until initialized to something, and it's more efficient to zero them in a block with unrolled loops. The OS can even zero freelist pages when the system is otherwise idle, rather than when some client or user is waiting for the program to start.
Reproducibility: leaving the values alone would make program behavior non-repeatable, making bugs really hard to find.
Elegance: it's cleaner if programs can start from 0 without having to clutter the code with default initializers.
One might then wonder why the auto storage class does start as garbage. The answer is two-fold:
It doesn't, in a sense. The very first stack frame page at each level (i.e., every new page added to the stack) does receive zero values. The "garbage", or "uninitialized" values that subsequent function instances at the same stack level see are really the previous values left by other method instances of your own program and its library.
There might be a quadratic (or whatever) runtime performance penalty associated with initializing auto (function locals) to anything. A function might not use any or all of a large array, say, on any given call, and it could be invoked thousands or millions of times. The initialization of statics and globals, OTOH, only needs to happen once.
Because with the proper cooperation of the OS, 0 initializing statics and globals can be implemented with no runtime overhead.
Section 6.7.8 Initialization of C99 standard (n1256) answers this question:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules;
— if it is a union, the first named member is initialized (recursively) according to these rules.
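Because the quoted paragraph leaves automatic objects indeterminate, the usual way to give a local the same all-zero starting state is an explicit initializer. With `= {0}`, every element not listed is initialized implicitly the same as an object with static storage duration (C99 6.7.8p21). A small sketch with a hypothetical function name:

```c
/* Sums a local array that starts out all-zero via an explicit initializer. */
int sum_of_fresh_buffer(void)
{
    int buf[16] = {0};   /* element 0 set explicitly; the rest follow the static rules */
    int total = 0;       /* without "= 0", total would be indeterminate */

    for (int k = 0; k < 16; k++)
        total += buf[k];
    return total;
}
```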
Think about it: in the static realm you can't always tell for sure that something is indeed initialized, or that main has started. In C++ there is also a static initialization phase and a dynamic initialization phase; the static one comes first, followed by the dynamic one, where order matters.
If you didn't have zeroing out of statics, then you would be completely unable to tell in this phase for sure whether anything was initialized AT ALL. In short, the C++ world would fly apart, and basic things like singletons (or any sort of dynamic static init) would simply cease to work.
The answer with the bulletpoints is enthusiastic but a bit silly. Those could all apply to nonstatic allocation but that isn't done (well, sometimes but not usually).
In C, statically-allocated objects without an explicit initializer are initialized to zero (for arithmetic types) or a null pointer (for pointer types). Implementations of C typically represent zero values and null pointer values using a bit pattern consisting solely of zero-valued bits (though this is not required by the C standard). Hence, the bss section typically includes all uninitialized variables declared at file scope (i.e., outside of any function) as well as uninitialized local variables declared with the static keyword.
Source: Wikipedia
Possible Duplicate:
Why global and static variables are initialized to their default values?
What is the technical reason this happens? And is it supported by the standard across all platforms? Is it possible that certain implementations may return undefined variables if static variables aren't explicitly initialized?
It is required by the standard (§6.7.8/10).
There's no technical reason it would have to be this way, but it's been that way for long enough that the standard committee made it a requirement.
Leaving out this requirement would make working with static variables somewhat more difficult in many (most?) cases. In particular, you often have some one-time initialization to do, and need a dependable starting state so you know whether a particular variable has been initialized yet or not. For example:
#include <stddef.h>  /* for NULL */
int foo(void) {
    static int *ptr;  /* guaranteed to start as a null pointer */
    if (NULL == ptr) {
        // initialize it
    }
    return 0;
}
If ptr could contain an arbitrary value at startup, you'd have to explicitly initialize it to NULL to be able to recognize whether you'd done your one-time initialization yet or not.
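Filling in the fragment, a minimal one-time-initialization sketch might look like the following. `get_table` and its contents are hypothetical. The point is that `table` is guaranteed to start as a null pointer, so the first call can tell that setup has not happened yet. This version is not thread-safe; concurrent first calls would need a lock, or C11 `call_once` where `<threads.h>` is available.

```c
#include <stdlib.h>

/* Returns a lazily built table of squares, or NULL if allocation fails. */
const int *get_table(void)
{
    static int *table;    /* static storage duration: starts as a null pointer */

    if (table == NULL) {  /* first call: nothing has been set up yet */
        int *fresh = malloc(10 * sizeof *fresh);
        if (fresh == NULL)
            return NULL;
        for (int k = 0; k < 10; k++)
            fresh[k] = k * k;
        table = fresh;
    }
    return table;
}
```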
Yes, it's because it's in the standard; but really, it's because it's free. Static variables look just like global variables to the generated object code. They're allocated in .bss or .data and initialized at load time along with all your constants and other globals. Since .data is copied straight from your executable and .bss is simply zero-filled by the loader, they're initialized to a value known at compile time for free. The value that was chosen is 0.
Of course there is no arguing that it is in the C standards. So expect a compliant compiler to behave that way.
The technical reason behind why it was done might be rooted in how the C startup code works. There are usually several memory segments the linker has to put compiler output into including a code (text) segment, a block storage segment, and an initialized variable segment.
Non-static function variables don't have physical storage until the scope of the function is created at runtime so the linker doesn't do anything with those.
Program code of course goes in the code (or text) segment but so do the values used to initialize global and static variables. Initialized variables themselves (i.e. their addresses) go in the initialized memory segment. Uninitialized global and static variables go in the block storage (bss) segment.
When the program is loaded at execution time, a small piece of code creates the C runtime environment. In ROM-based systems, it copies the values of initialized variables from the code (text) segment to their actual addresses in RAM. Disk-based systems, which load the whole program into RAM, can load the initial values directly to their final RAM addresses.
The CRT (C runtime) also zeroes out the bss which contains all the global and static variables that have no initializers. This was probably done as a precaution against uninitialized data. It is a relatively straightforward block fill operation because all the global and static variables have been crammed together into one address segment.
Of course floats and doubles may require special handling because their 0.0 value may not be all zero bits if the floating format is not IEEE 754.
Note that since automatic variables don't exist at program load time, they can't be initialized by the runtime startup code.
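A simplified model of that startup step, written as an ordinary function so it can run on a hosted system. On a real ROM-based target, the pointers would come from symbols defined in the linker script (their names vary by toolchain, so the parameters here are hypothetical stand-ins), and the routine would run before `main`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the initial values of .data from their load address (ROM) to their
   run address (RAM), then zero-fills .bss, as C startup code typically does. */
void crt_init_model(uint32_t *data_ram, const uint32_t *data_rom, size_t data_words,
                    uint32_t *bss, size_t bss_words)
{
    for (size_t k = 0; k < data_words; k++)
        data_ram[k] = data_rom[k];

    /* One block fill covers every zero-initialized static, because the
       linker has gathered them into a single contiguous region. */
    memset(bss, 0, bss_words * sizeof *bss);
}
```

The all-bits-zero fill yields 0.0 for floating-point objects only on formats such as IEEE 754, which is the caveat mentioned above.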
Mostly because the static variables are grouped together in one block by the linker, so it's really easy to just memset() the whole block to 0 on startup. I do not believe that is required by the C or C++ Standards.
There is discussion about this here:
First of all in ISO C (ANSI C), all static and global variables must be initialized before the program starts. If the programmer didn't do this explicitly, then the compiler must set them to zero. If the compiler doesn't do this, it doesn't follow ISO C. Exactly how the variables are initialized is however unspecified by the standard.
Take a look here: 6.2.4(3) and 6.7.8(10).
Suppose you were writing a C compiler. You expect that some static variables are going to have initial values, so those values must appear somewhere in the executable file that your compiler is going to create. Now when the output program is run, the entire executable file is loaded into memory. Part of the initialization of the program is to create the static variables, so all those initial values must be copied to their final static variable destinations.
Or do they? Once the program starts, the initial values of the variables are not needed anymore. Can't the variables themselves be located within the executable code itself? Then there is no need to copy the values over. The static variables could live within a block that was in the original executable file, and no initialization at all has to be done for them.
If that is the case, then why would you want to make a special case for uninitialized static variables? Why not just put a bunch of zeros in the executable file to represent the uninitialized static variables? That would trade some space for a little time and a lot less complexity.
I don't know if any C compiler actually behaves in this way, but I suspect the option of doing things this way might have driven the design of the language.
I wonder where constant variables are stored. Is it in the same memory area as global variables? Or is it on the stack?
How they are stored is an implementation detail (depends on the compiler).
For example, in the GCC compiler, on most machines, read-only variables, constants, and jump tables are placed in the text section.
Depending on the data segmentation that a particular processor follows, we have five segments:
Code segment - stores only code (often in ROM)
Data segment - stores initialized global and static variables
Stack segment - stores all the local variables and other information such as function return addresses
Heap segment - all dynamic allocations happen here
BSS (Block Started by Symbol) segment - stores uninitialized global and static variables
Note that the difference between the data and BSS segments is that the former stores initialized global and static variables and the latter stores UNinitialized ones.
Now, why am I talking about data segmentation when I should just be telling you where constant variables are stored? There's a reason for it.
Every segment has a write protected region where all the constants are stored.
For example:
If I have a const int which is local variable, then it is stored in the write protected region of stack segment.
If I have a global that is initialised const var, then it is stored in the data segment.
If I have an uninitialised const var, then it is stored in the BSS segment...
To summarize, "const" is just a data QUALIFIER, which means that first the compiler has to decide which segment the variable has to be stored and then if the variable is a const, then it qualifies to be stored in the write protected region of that particular segment.
Consider the code:
extern void totherfunc(const int *p);  /* hypothetical prototype: defined elsewhere */
const int i = 0;
static const int k = 99;
int function(void)
{
    const int j = 37;
    totherfunc(&j);
    totherfunc(&i);
    //totherfunc(&k);
    return(j+3);
}
Generally, i can be stored in the text segment (it's a read-only variable with a fixed value). If it is not in the text segment, it will be stored beside the global variables. Given that it is initialized to zero, it might be in the 'bss' section (where zeroed variables are usually allocated) or in the 'data' section (where initialized variables are usually allocated).
If the compiler is convinced the k is unused (which it could be since it is local to a single file), it might not appear in the object code at all. If the call to totherfunc() that references k was not commented out, then k would have to be allocated an address somewhere - it would likely be in the same segment as i.
The constant (if it is a constant, is it still a variable?) j will most probably appear on the stack of a conventional C implementation. (If you were asking in the comp.std.c news group, someone would mention that the standard doesn't say that automatic variables appear on the stack; fortunately, SO isn't comp.std.c!)
Note that I forced the variables to appear because I passed them by reference - presumably to a function expecting a pointer to a constant integer. If the addresses were never taken, then j and k could be optimized out of the code altogether. To remove i, the compiler would have to know all the source code for the entire program - it is accessible in other translation units (source files), and so cannot as readily be removed. Doubly not if the program indulges in dynamic loading of shared libraries - one of those libraries might rely on that global variable.
(Stylistically - the variables i and j should have longer, more meaningful names; this is only an example!)
Depends on your compiler, your system capabilities, your configuration while compiling.
gcc puts read-only constants in the .text section, unless instructed otherwise.
Usually they are stored in a read-only data section (while the global variables' section has write permission). So trying to modify a constant through its address may result in an access violation, aka a segfault.
But it depends on your hardware, OS and compiler really.
Of course not, because:
1) The BSS segment stores non-initialized variables, and it comes in two types:
(I) Large static and global, non-constant, non-initialized variables are stored in the .bss section.
(II) Small static and global, non-constant, non-initialized variables are stored in the .sbss section, which is included in the BSS segment.
2) The data segment stores initialized variables, and it has three types:
(I) Large static and global, initialized, non-constant variables are stored in the .data section.
(II) Small static and global, non-constant, initialized variables are stored in the .sdata1 section.
(III) Small static and global, constant, initialized OR non-initialized variables are stored in the .sdata2 section.
"Small" and "large" above depend on the compiler; for example, small means less than 8 bytes and large means 8 bytes or more.
But my question is: where are local constants stored?
This is mostly an educated guess, but I'd say that constants are usually stored in the actual CPU instructions of your compiled program, as immediate data. So in other words, most instructions include space for the address to get data from, but if it's a constant, the space can hold the value itself.
This is specific to Win32 systems.
It's compiler-dependent, but be aware that it may not even be stored at all, since the compiler can optimize it away and insert its value directly into the expressions that use it.
Add this code to a program, compile with gcc for an ARM Cortex-M4, and check the difference in memory usage.
Without const:
int someConst[1000] = {0};
With const:
const int someConst[1000] = {0};
Global and constant are two completely separated keywords. You can have one or the other, none or both.
Where your variable, then, is stored in memory depends on the configuration. Read up a bit on the heap and the stack, that will give you some knowledge to ask more (and if I may, better and more specific) questions.
It may not be stored at all.
Consider some code like this:
#define PI 3.14159265358979323846  /* ISO C's <math.h> defines no PI constant */
double toRadian(int degree){
    return degree*PI*2/360.0;
}
This enables the programmer to gather the idea of what is going on, but the compiler can optimize away some of that, and most compilers do, by evaluating constant expressions at compile time, which means that the value PI may not be in the resulting program at all.
Just as an add-on: as you know, the memory layout of the final executable is decided during the linking process. There is one more section, called COMMON, in which the common symbols from different input files are placed. This COMMON section actually falls under the .bss section.
Some constants aren't even stored.
Consider the following code:
int x = foo();
x *= 2;
Chances are that the compiler will turn the multiplication into x = x+x; as that reduces the need to load the number 2 from memory.
I checked on an x86_64 GNU/Linux system. By dereferencing a pointer to a 'const' variable, the value can be changed. I used objdump and didn't find the 'const' variable in the text segment; the 'const' variable is stored on the stack.
'const' is a compiler directive in C: the compiler issues an error when it comes across a statement that changes a 'const' variable.