Peculiar memory allocation of global variables by c compiler

Peculiar memory allocation of global variables by c compiler - c

When I am declaring some variable outside main then compile stores them in some peculiar way.
int i=1,j=1;
void main(void)
{
printf("%d\n%d",&i,&j);
}
If both i and j are not initialized or equals 0 or equals some positive values then they are stored at continuous address spaces in memory whereas if i=0 and j = some +ve integer then their addresses are separated by fairly large distance.
The problem with is when they are stored on contiguous address spaces it causes some real performance issues like false sharing (have a look here). I've learned that to prevent this, there should be some space between variable's addresses which is automatically provided when i=0 and j=any +ve value.
Now, what I want to understand is:
Why the compiler stores variables to noncontinuous addresses only when one initialized to 0 and other initialized to positive values, and
How can I intentionally do what compiler is doing automatically i.e allocating variables to fairly separated address space.
(Using devcpp gcc 4.9.2)

Assuming you meant printf("%p, %p\n",(void *)&i,(void *)&j);, note the following:
It is not mandated by C specs to allocate variables in contiguous memory.
Often globals initialized with 0 are kept in BSS section (which is a part of data section) to save binary size. Other globals are kept in rest of the data section. (Depends on implementation detail, not mandated by C specs)
How can I intentionally do what compiler is doing automatically?
This is compiler specific question and your compiler documentation should possibly contain an answer to this.

One problem there,
printf("%d\n%d",&i,&j);
invokes undefined behavior. So, the outputs cannot be justified in any way. You need to use %p format specifier and cast the corresponding argument to (void *) to print a pointer.
That said, C standard does neither impose any constraints nor provide any guideline on where and how the variables will be stored in memory. It's up to the compiler implementation to decide how to place different variables in memory. You need to check the documentation of the compiler in use to find out the rules your compiler is following.
To elaborate in a generic way, an object file consists of many segments, like
Header (descriptive and control information)
Code segment ("text segment", executable code)
Data segment (initialized static variables)
Read-only data segment (rodata, initialized static constants)
BSS segment (uninitialized static data, both variables and constants)
External definitions and references for linking
Relocation information
Dynamic linking information
Debugging information
and it's up to the compiler to decide the address space (range/value) to be used for each segment.
As per the rules,
Global variables (i.e., having static storage duration) left uninitialized and initialized with 0 are placed in .bss segment.
Variables initialized with a non-zero value are placed in the .data segment
so, it's fair enough to say that the addresses of two variables pertaining to two different segments will not be contiguous.
Now, your observation checks out.
If both i and j are not initialized or equals 0 or equals some positive values then they are stored at continuous address spaces in memory
yes, then all of them go to either .bss or .data and compiler choose to place them one after another, usually.
whereas if i=0 and j = some +ve integer then their addresses are separated by fairly large distance.
This also holds true, both the variables are now placed in different segments.

Related

Organization of Virtual Memory in C

For each of the following, where does it appear to be stored in memory, and in what order: global variables, local variables, static local variables, function parameters, global constants, local constants, the functions themselves (and is main a special case?), dynamically allocated variables.
How will I evaluate this experimentally,i.e., using C code?
I know that
global variables -- data
static variables -- data
constant data types -- code
local variables(declared and defined in functions) -- stack
variables declared and defined in main function -- stack
pointers(ex: char *arr,int *arr) -- data or stack
dynamically allocated space(using malloc,calloc) -- heap

You could write some code to create all of the above, and then print out their addresses. For example:
void func(int a) {
int i = 0;
printf("local i address is %x\n", &i);
printf("parameter a address is %x\n", &a);
}
printf("func address is %x\n", (void *) &func);
note the function address is a bit tricky, you have to cast it a void* and when you take the address of a function you omit the (). Compare memory addresses and you will start to get a picture or where things are. Normally text (instructions) are at the bottom (closest to 0x0000) the heap is in the middle, and the stack starts at the top and grows down.

In theory
Pointers are no different from other variables as far as memory location is concerned.
Local variables and parameters might be allocated on the stack or directly in registers.
constant strings will be stored in a special data section, but basically the same kind of location as data.
numerical constants themselves will not be stored anywhere, they will be put into other variables or translated directly into CPU instructions.
for instance int a = 5; will store the constant 5 into the variable a (the actual memory is tied to the variable, not the constant), but a *= 5 will generate the code necessary to multiply a by the constant 5.
main is just a function like any other as far as memory location is concerned. A local main variable is no different from any other local variable, main code is located somewhere in code section like any other function, argc and argv are just parameters like any others (they are provided by the startup code that calls the main), etc.
code generation
Now if you want to see where the compiler and runtime put all these things, a possibility is to write a small program that defines a few of each, and ask the compiler to produce an assembly listing. You will then see how each element is stored.
For heap data, you will see calls to malloc, which is responsible for interfacing with the dynamic memory allocator.
For stack data, you will see strange references to stack pointers (the ebp register on x86 architectures), that will both be used for parameters and (automatic) local variables.
For global/static data, you will see labels named after your variables.
Constant strings will probably be labelled with an awful name, but you will notice they all go into a section (usually named bss) that will be linked next to data.
runtime addresses
Alternatively, you can run this program and ask it to print the addresses of each element. This, however, will not show you the register usage.
If you use a variable address, you will force the compiler to put it into memory, while it could have kept it into a register otherwise.
Note also that the memory organization is compiler and system dependent. The same code compiled with gcc and MSVC may have completely different addresses and elements in a completely different order.
Code optimizer is likely to do strange things too, so I advise to compile your sample code with all optimizations disabled first.
Looking at what the compiler does to gain size and/or speed might be interesting though.

How do global variables contribute to the size of the executable?

Does having global variables increase the size of the executable? If yes how? Does it increase only the data section size or also the text section size?
If I have a global variable and initialization as below:
char g_glbarr[1024] = {"jhgdasdghaKJSDGksgJKASDGHKDGAJKsdghkajdgaDGKAjdghaJKSDGHAjksdghJKDG"};
Now, does this add 1024 to data section and the size of the initilization string to text section?
If instead if allocating space for this array statically, if I malloc it, and then do a memcpy, only the data section size will reduce or the text section size also will reduce?

Yes, it does. Basically compilers store them to data segment. Sometimes if you use a constant char array in you code (like printf("<1024 char array goes here");) it will go to data segment (AFAIK some old compilers /Borland?/ may store it in the text segment). You can force the compiler to put a global variable in a custom section (for VC++ it was #pragma data_seg(<segment name>)).
Dynamic memory allocation doesn't affect data/text segments, since it allocates memory in the heap.

The answer is implementation-dependent, but for sane implementations this is how it works for variables with static storage duration (global or otherwise):
Whenever the variable is initialized, the whole initialized value of the object will be stored in the executable file. This is true even if only the initial part of it is explicitly initialized (the rest is implicitly zero).
If the variable is constant and initialized, it will be in the "text" segment, or equivalent. Some systems (modern ELF-based, maybe Windows too?) have a separate "rodata" segment for read-only data to allow it to be marked non-executable, separate from program code.
Non-constant initialized variables will be in the "data" segment in the executable, which is mapped into memory in copy-on-write mode by the operating system when the program is loaded.
Uninitialized variables (which are implicitly zero as per the standard) will have no storage reserved in the executable itself, but a size and offset in the "bss" segment, which is created at program load-time by the operating system.
Such uninitialized variables may be created in a separate read-only "bss"-like segment if they're const-qualified.

I am not speaking as an expert, but I would guess that simply having that epic string literal in your program would increase the size of your executable. What you do with that string literal doesn't matter, because it has to be stored somewhere.
Why does it matter which "section" of the executable is increased? This isn't a rhetorical question!

The answer is slightly implementation sensitive, but in general, no. Your g_glbarr is really a pointer to char, or an address. The string itself will be put into the data section with constant strings, and g_glbarr will become a symbol for the address of the string at compile time. You don't end up allocating space for the pointer and the compiler simply resolves the address at link time.
Update
#Jay, it's sorta kinda the same. The integers (usually) just are in-line: the compiler will come as close as it can to just putting the constant in the code, because that's such a common case that most normal architectures have a straightforward way of doing it from immediate data. The string constants will still be in some read-only data section. So when you make something like:
// warning: I haven't compiled this and wouldn't normally
// do it quite this way so I'm not positive this is
// completely grammatical C
struct X {int a; char * b; } x = { 1, "Hello" } ;
the 1 becomes "immediate" data, the "Hello" is allocated in read-only data somewhere, and the compiler will just generate something that allocates a piece of read-write data that looks something like
x:
x.a: WORD 1
x.b WORD #STR42
where STR42 is a symbolic name for the location of the string "Hello" in memory. Then when everything is linked together, the #STR42 is replaced with the actual virtual address of the string in memory.

how about .bss section not zero initialized

as we know .bss contains un-initialized variables. if in c code, programer initialize the variables before using them. then .bss is not necessary to be zero before executing C code.
Am I right?
Thanks

In C code, any variable with static storage duration is defined to be initialized to 0 by the spec (Section 6.7.8 Initialization, paragraph 10):
If an object that has static storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these rules.
Some program loaders will fill the whole section with zeroes to start with, and others will fill it 'on demand' as a perfomance improvement. So while you are technically correct that the .bss section may not really contain all zeroes when the C code starts executing, it logically does. In any case, assuming you have a standard compliant toolchain, you can think of it as being all zero.
Any variables that are initialized to non-zero values will never end up in the .bss section; they are handled in the .data or .rodata sections, depending on their particular characteristics.

The ELF specification says:
.bss This section holds uninitialized
data that contribute to the program’s
memory image. By definition, the
system initializes the data with zeros
when the program begins to run. The
section occupies no file space, as
indicated by the section type,
SHT_NOBITS.
It therefore follows that a C global variable which has a value assigned to it cannot be put into the .bss section and will have to go into the .data section. The .data section contains the initial valued for all the variables assigned to it.

That depends on where the variable is in code. For instance if you're talking about a local variable in main() or any other function, then variables are pushed onto the stack (unless you use other modifying keywords). If your variable is global AND uninitialized then it should be kept in .bss. Note that compiler optimization and so forth may change things around a bit. If you want to know for sure use readelf to investigate an ELF binary on linux.

It seems like you might be confused about the mechanism by which the .bss section ends up zero initialized. The code that you compile doesn'thave to explicitly initialize the region to zero because when the operating system first allocates a new page of memory to a process the OS makes sure that the page is zero initialized. This is done for security reasons, so that a process can't go looking for secrets that were left in memory when other processes exited.

Why are static variables auto-initialized to zero? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why global and static variables are initialized to their default values?
What is the technical reason this happens? And is it supported by the standard across all platforms? Is it possible that certain implementations may return undefined variables if static variables aren't explicitly initialized?

It is required by the standard (§6.7.8/10).
There's no technical reason it would have to be this way, but it's been that way for long enough that the standard committee made it a requirement.
Leaving out this requirement would make working with static variables somewhat more difficult in many (most?) cases. In particular, you often have some one-time initialization to do, and need a dependable starting state so you know whether a particular variable has been initialized yet or not. For example:
int foo() {
static int *ptr;
if (NULL == ptr)
// initialize it
}
If ptr could contain an arbitrary value at startup, you'd have to explicitly initialize it to NULL to be able to recognize whether you'd done your one-time initialization yet or not.

Yes, it's because it's in the standard; but really, it's because it's free. Static variables look just like global variables to the generated object code. They're allocated in .bss and initialized at load time along with all your constants and other globals. Since the section of memory where they live is just copied straight from your executable, they're initialized to a value known at compile-time for free. The value that was chosen is 0.

Of course there is no arguing that it is in the C standards. So expect a compliant compiler to behave that way.
The technical reason behind why it was done might be rooted in how the C startup code works. There are usually several memory segments the linker has to put compiler output into including a code (text) segment, a block storage segment, and an initialized variable segment.
Non-static function variables don't have physical storage until the scope of the function is created at runtime so the linker doesn't do anything with those.
Program code of course goes in the code (or text) segment but so do the values used to initialize global and static variables. Initialized variables themselves (i.e. their addresses) go in the initialized memory segment. Uninitialized global and static variables go in the block storage (bss) segment.
When the program is loaded at execution time, a small piece of code creates the C runtime environment. In ROM based systems it will copy the value of initialized variables from the code (text) segment into their respective actual addresses in RAM. RAM (i.e. disk) based systems can load the initial values directly to the final RAM addresses.
The CRT (C runtime) also zeroes out the bss which contains all the global and static variables that have no initializers. This was probably done as a precaution against uninitialized data. It is a relatively straightforward block fill operation because all the global and static variables have been crammed together into one address segment.
Of course floats and doubles may require special handling because their 0.0 value may not be all zero bits if the floating format is not IEEE 754.
Note that since autovariables don't exist at program load time they can't be initialized by the runtime startup code.

Mostly because the static variables are grouped together in one block by the linker, so it's real easy to just memset() the whole block to 0 on startup. I to not believe that is required by the C or C++ Standards.

There is discussion about this here:
First of all in ISO C (ANSI C), all static and global variables must be initialized before the program starts. If the programmer didn't do this explicitly, then the compiler must set them to zero. If the compiler doesn't do this, it doesn't follow ISO C. Exactly how the variables are initialized is however unspecified by the standard.

Take a look : here 6.2.4(3) and 6.7.8 (10)

Suppose you were writing a C compiler. You expect that some static variables are going to have initial values, so those values must appear somewhere in the executable file that your compiler is going to create. Now when the output program is run, the entire executable file is loaded into memory. Part of the initialization of the program is to create the static variables, so all those initial values must be copied to their final static variable destinations.
Or do they? Once the program starts, the initial values of the variables are not needed anymore. Can't the variables themselves be located within the executable code itself? Then there is no need to copy the values over. The static variables could live within a block that was in the original executable file, and no initialization at all has to be done for them.
If that is the case, then why would you want to make a special case for uninitialized static variables? Why not just put a bunch of zeros in the executable file to represent the uninitialized static variables? That would trade some space for a little time and a lot less complexity.
I don't know if any C compiler actually behaves in this way, but I suspect the option of doing things this way might have driven the design of the language.

Where are constant variables stored in C?

I wonder where constant variables are stored. Is it in the same memory area as global variables? Or is it on the stack?

How they are stored is an implementation detail (depends on the compiler).
For example, in the GCC compiler, on most machines, read-only variables, constants, and jump tables are placed in the text section.

Depending on the data segmentation that a particular processor follows, we have five segments:
Code Segment - Stores only code, ROM
BSS (or Block Started by Symbol) Data segment - Stores initialised global and static variables
Stack segment - stores all the local variables and other informations regarding function return address etc
Heap segment - all dynamic allocations happens here
Data BSS (or Block Started by Symbol) segment - stores uninitialised global and static variables
Note that the difference between the data and BSS segments is that the former stores initialized global and static variables and the later stores UNinitialised ones.
Now, Why am I talking about the data segmentation when I must be just telling where are the constant variables stored... there's a reason to it...
Every segment has a write protected region where all the constants are stored.
For example:
If I have a const int which is local variable, then it is stored in the write protected region of stack segment.
If I have a global that is initialised const var, then it is stored in the data segment.
If I have an uninitialised const var, then it is stored in the BSS segment...
To summarize, "const" is just a data QUALIFIER, which means that first the compiler has to decide which segment the variable has to be stored and then if the variable is a const, then it qualifies to be stored in the write protected region of that particular segment.

Consider the code:
const int i = 0;
static const int k = 99;
int function(void)
{
const int j = 37;
totherfunc(&j);
totherfunc(&i);
//totherfunc(&k);
return(j+3);
}
Generally, i can be stored in the text segment (it's a read-only variable with a fixed value). If it is not in the text segment, it will be stored beside the global variables. Given that it is initialized to zero, it might be in the 'bss' section (where zeroed variables are usually allocated) or in the 'data' section (where initialized variables are usually allocated).
If the compiler is convinced the k is unused (which it could be since it is local to a single file), it might not appear in the object code at all. If the call to totherfunc() that references k was not commented out, then k would have to be allocated an address somewhere - it would likely be in the same segment as i.
The constant (if it is a constant, is it still a variable?) j will most probably appear on the stack of a conventional C implementation. (If you were asking in the comp.std.c news group, someone would mention that the standard doesn't say that automatic variables appear on the stack; fortunately, SO isn't comp.std.c!)
Note that I forced the variables to appear because I passed them by reference - presumably to a function expecting a pointer to a constant integer. If the addresses were never taken, then j and k could be optimized out of the code altogether. To remove i, the compiler would have to know all the source code for the entire program - it is accessible in other translation units (source files), and so cannot as readily be removed. Doubly not if the program indulges in dynamic loading of shared libraries - one of those libraries might rely on that global variable.
(Stylistically - the variables i and j should have longer, more meaningful names; this is only an example!)

Depends on your compiler, your system capabilities, your configuration while compiling.
gcc puts read-only constants on the .text section, unless instructed otherwise.

Usually they are stored in read-only data section (while global variables' section has write permissions). So, trying to modify constant by taking its address may result in access violation aka segfault.
But it depends on your hardware, OS and compiler really.

offcourse not , because
1) bss segment stored non inilized variables it obviously another type is there.
(I) large static and global and non constants and non initilaized variables it stored .BSS section.
(II) second thing small static and global variables and non constants and non initilaized variables stored in .SBSS section this included in .BSS segment.
2) data segment is initlaized variables it has 3 types ,
(I) large static and global and initlaized and non constants variables its stord in .DATA section.
(II) small static and global and non constant and initilaized variables its stord in .SDATA1 sectiion.
(III) small static and global and constant and initilaized OR non initilaized variables its stord in .SDATA2 sectiion.
i mention above small and large means depents upon complier for example small means < than 8 bytes and large means > than 8 bytes and equal values.
but my doubt is local constant are where it will stroe??????

This is mostly an educated guess, but I'd say that constants are usually stored in the actual CPU instructions of your compiled program, as immediate data. So in other words, most instructions include space for the address to get data from, but if it's a constant, the space can hold the value itself.

This is specific to Win32 systems.

It's compiler dependence but please aware that it may not be even fully stored. Since the compiler just needs to optimize it and adds the value of it directly into the expression that uses it.
I add this code in a program and compile with gcc for arm cortex m4, check the difference in the memory usage.
Without const:
int someConst[1000] = {0};
With const:
const int someConst[1000] = {0};

Global and constant are two completely separated keywords. You can have one or the other, none or both.
Where your variable, then, is stored in memory depends on the configuration. Read up a bit on the heap and the stack, that will give you some knowledge to ask more (and if I may, better and more specific) questions.

It may not be stored at all.
Consider some code like this:
#import<math.h>//import PI
double toRadian(int degree){
return degree*PI*2/360.0;
}
This enables the programmer to gather the idea of what is going on, but the compiler can optimize away some of that, and most compilers do, by evaluating constant expressions at compile time, which means that the value PI may not be in the resulting program at all.

Just as an an add on ,as you know that its during linking process the memory lay out of the final executable is decided .There is one more section called COMMON at which the common symbols from different input files are placed.This common section actually falls under the .bss section.

Some constants aren't even stored.
Consider the following code:
int x = foo();
x *= 2;
Chances are that the compiler will turn the multiplication into x = x+x; as that reduces the need to load the number 2 from memory.

I checked on x86_64 GNU/Linux system. By dereferencing the pointer to 'const' variable, the value can be changed. I used objdump. Didn't find 'const' variable in text segment. 'const' variable is stored on stack.
'const' is a compiler directive in "C". The compiler throws error when it comes across a statement changing 'const' variable.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight