Set .data segment size in C, dynamically

Is there a means by which to manipulate the .data segment size in C without increasing compile size of the binary (i.e. setting the size without setting any variables within)?

Linux programs have two data sections: ".data" and ".bss". The ".data" section is used for variables with a non-zero initial value (static int x = 5), while the ".bss" section is used for variables that start at 0 (static int x). Adding data to ".data" makes the binary grow by the space needed to hold the initial values.
Consider using the ".bss" section instead, which has little impact on object size.
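As a minimal sketch of that advice, assuming a typical GCC or Clang ELF toolchain on Linux (the names scratch, scratch_size, scratch_is_zeroed and initial_x are made up for illustration): a large buffer with no initializer is reserved at load time without its bytes being stored in the file, while a non-zero initializer has to be stored.

```c
#include <stddef.h>

/* No initializer: the buffer is zero-initialized, so a typical ELF toolchain
   records only its size (1 MiB here) in .bss instead of storing the bytes. */
static unsigned char scratch[1024 * 1024];

/* Non-zero initializer: the value 5 has to be stored in .data. */
static int x = 5;

/* Size reserved for the buffer at run time, in bytes. */
size_t scratch_size(void)
{
    return sizeof scratch;
}

/* Returns 1 if every byte of the buffer is zero, 0 otherwise. */
int scratch_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof scratch; i++) {
        if (scratch[i] != 0)
            return 0;
    }
    return 1;
}

/* Returns the initial value that was loaded from the file. */
int initial_x(void)
{
    return x;
}
```

Running size on the resulting binary should show the buffer counted under bss rather than data, though the exact numbers depend on the toolchain.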

Related

Why does initializing an array with a one and zeros make the executable file so big?

If I compile the following program int array[5000]={0}; int main(){}, the output file size is much smaller than if I do int array[5000]={1}; int main(){}, which initializes the first element with a one and the rest with zeros, so why is there such a big difference on the file size?
Your array is a static global variable.
If it is declared as initialized with zeros only, it can be allocated in a special segment of memory, which is created during the process startup and initialized with zeros.
On the other hand, if it is declared as containing anything non-zero, its initial value must be stored inside the program's file, so that when the operating system prepares the program in memory for being run, it can allocate an appropriate segment of data and fill it with the defined initial values.
See https://en.wikipedia.org/wiki/Data_segment for DATA and BSS segments.
When you don't initialize a global (or static) variable, it gets allocated in an output segment called .bss, which is all zeros, so its contents don't need to be written to the output file. If you set even a single bit to something other than zero, the variable has to go into the initialized data segment (.data), which is written to the output file because its contents must be spelled out. Conversely, if you explicitly initialize it to zeros, the compiler realizes that the initialization coincides with that of an uninitialized variable and stores the array in the .bss segment too, avoiding growth of the final file.
For the .data segment, all of its contents are saved in the executable file, while for the .bss segment only its size is stored, as the kernel can allocate a zero-filled segment for it when it is loaded into memory.
On Unix systems, the data segment is set up by taking the full size of the data segments (.data plus .bss), but only the .data contents are copied from the file at load time. The rest is always filled with zeros by the kernel. This speeds up loading the program into memory and makes the executable smaller.
so why is there such a big difference on the file size?
Essentially, it's because the compiler/linker/executable loader aren't good at optimizing.
If a statically allocated array is full of zeros (or uninitialized) the compiler puts it in a special section (".bss") with everything else that's zeros (or uninitialized); and because the program loader knows the entire section is full of zeros none of the data is stored in the file itself.
If a statically allocated array isn't full of zeros; then the compiler puts it in a different section (".data") and all of the data gets included in the file (even when it's "almost but not quite full of zeros").
Ideally; the compiler/tools would be able to detect simple cases (e.g. an array that is initialized with one non-zero value that is almost but not quite full of zeros) and put the array in the ".bss" so it costs nothing, but then generate a small amount of start-up code to correct it (e.g. set the first element in the array) before any of your code executes.
As a work-around, (if the array isn't read-only) you could do the same optimization yourself (leave the array full of zeros, and put an array[0] = 1; at the start of your main()).
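A minimal sketch of that work-around, assuming the array is writable; init_array is a made-up helper name and ARRAY_LEN is introduced here only for readability:

```c
#define ARRAY_LEN 5000

/* No initializer, so the array is all zeros and can be placed in .bss. */
int array[ARRAY_LEN];

/* Hypothetical helper: call it once at the start of main(),
   before anything reads the array. */
void init_array(void)
{
    array[0] = 1;
}
```

After init_array() has run, the array holds the same values as int array[5000]={1};, but the 5000 elements no longer have to be stored in the file.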
From .bss [BSS in C]
An implementation may also assign statically-allocated variables and constants initialized with a value consisting solely of zero-valued bits to the BSS section.
The size that BSS will require at runtime is recorded in the object file, but BSS (unlike the data segment) doesn't take up any actual space in the object file.
For program int array[5000]={0}; int main(){}
data and bss size:
# size a.out
   text    data     bss     dec     hex filename
   1040     484   20032   21556    5434 a.out
executable size:
# ls -l a.out
-rwxr-xr-x. 1 root root 6338 Sep 7 17:05 a.out
For program int array[5000]={1}; int main(){}
data and bss size:
# size a.out
   text    data     bss     dec     hex filename
   1040   20512      16   21568    5440 a.out
executable size:
# ls -l a.out
-rwxr-xr-x. 1 root root 26362 Sep 7 17:24 a.out
The output shown above is from Linux platform.
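The numbers quoted above can be checked with a little arithmetic. The sketch below only restates the figures from the two listings; the function names are made up, and reading the few bytes beyond the array size as padding or alignment is an assumption, not something the listings prove.

```c
#include <stddef.h>

enum { ELEMENTS = 5000 };

/* Bytes the array needs in memory: 20000 where int is 4 bytes wide. */
size_t array_bytes(void)
{
    return ELEMENTS * sizeof(int);
}

/* Growth of the "data" column between the two size listings. */
long data_growth(void)
{
    return 20512L - 484L;
}

/* Shrinkage of the "bss" column between the two size listings. */
long bss_shrinkage(void)
{
    return 20032L - 16L;
}

/* Growth of the executable between the two ls -l listings. */
long file_growth(void)
{
    return 26362L - 6338L;
}
```

All three differences land within a few dozen bytes of the 20000-byte array, which is what the explanations above predict: the array moves from bss to data, and the file grows by roughly its size.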

Why do we need .bss segment? [duplicate]

What I know is that global and static variables are stored in the .data segment, and uninitialized data are in the .bss segment. What I don't understand is why do we have dedicated segment for uninitialized variables? If an uninitialized variable has a value assigned at run time, does the variable exist still in the .bss segment only?
In the following program, a is in the .data segment, and b is in the .bss segment; is that correct? Kindly correct me if my understanding is wrong.
#include <stdio.h>
#include <stdlib.h>
int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9};
int b[20]; /* Uninitialized, so in the .bss and will not occupy space for 20 * sizeof (int) */
int main ()
{
;
}
Also, consider following program,
#include <stdio.h>
#include <stdlib.h>
int var[10]; /* Uninitialized so in .bss */
int main ()
{
    var[0] = 20; /* Initialized; where will this 'var' be? */
}
The reason is to reduce program size. Imagine that your C program runs on an embedded system, where the code and all constants are saved in true ROM (flash memory). In such systems, an initial "copy-down" must be executed to set all static storage duration objects before main() is called. It will typically look like this pseudocode:
for(i=0; i<all_explicitly_initialized_objects; i++)
{
    .data[i] = init_value[i];
}
memset(.bss, 0, all_implicitly_initialized_objects);
Where .data and .bss are stored in RAM, but init_value is stored in ROM. If it had been one segment, then the ROM would have to be filled up with a lot of zeroes, increasing ROM size significantly.
RAM-based executables work similarly, though of course they have no true ROM.
Also, memset is likely some very efficient inline assembler, meaning that the startup copy-down can be executed faster.
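The pseudocode above can be simulated in plain C. This is only a sketch: real startup code works on addresses supplied by the linker script, whereas here ordinary arrays stand in for ROM and RAM, and the names ram_data, ram_bss, copy_down and the sizes are made up for illustration.

```c
#include <string.h>

enum { DATA_WORDS = 4, BSS_WORDS = 8 };

/* Stands in for the initial values kept in ROM (hypothetical contents). */
const int init_value[DATA_WORDS] = { 1, 2, 3, 4 };

/* Stand in for the RAM regions that back .data and .bss. */
int ram_data[DATA_WORDS];
int ram_bss[BSS_WORDS];

/* Simulated copy-down: runs before main() on a real embedded target. */
void copy_down(void)
{
    /* Explicitly initialized objects: copy their values from ROM to RAM. */
    memcpy(ram_data, init_value, sizeof ram_data);

    /* Implicitly initialized objects: nothing to copy, just clear the RAM. */
    memset(ram_bss, 0, sizeof ram_bss);
}
```

Only init_value costs ROM space here; the BSS region is described by its size alone, which is the saving the answer describes.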
The .bss segment is an optimization. The entire .bss segment is described by a single number, probably 4 bytes or 8 bytes, that gives its size in the running process, whereas the .data section is as big as the sum of sizes of the initialized variables. Thus, the .bss makes the executables smaller and quicker to load. Otherwise, the variables could be in the .data segment with explicit initialization to zeroes; the program would be hard-pressed to tell the difference. (In detail, the address of the objects in .bss would probably be different from the address if it was in the .data segment.)
In the first program, a would be in the .data segment and b would be in the .bss segment of the executable. Once the program is loaded, the distinction becomes immaterial. At run time, b occupies 20 * sizeof(int) bytes.
In the second program, var is allocated space and the assignment in main() modifies that space. It so happens that the space for var was described in the .bss segment rather than the .data segment, but that doesn't affect the way the program behaves when running.
From Assembly Language Step-by-Step: Programming with Linux by Jeff Duntemann, regarding the .data section:
The .data section contains data definitions of initialized data items. Initialized
data is data that has a value before the program begins running. These values
are part of the executable file. They are loaded into memory when the
executable file is loaded into memory for execution.
The important thing to remember about the .data section is that the
more initialized data items you define, the larger the executable file
will be, and the longer it will take to load it from disk into memory
when you run it.
and the .bss section:
Not all data items need to have values before the program begins running.
When you’re reading data from a disk file, for example, you need to have a
place for the data to go after it comes in from disk. Data buffers like that are
defined in the .bss section of your program. You set aside some number of
bytes for a buffer and give the buffer a name, but you don’t say what values
are to be present in the buffer.
There’s a crucial difference between data items defined in the .data
section and data items defined in the .bss section: data items in the
.data section add to the size of your executable file. Data items in
the .bss section do not. A buffer that takes up 16,000 bytes (or more,
sometimes much more) can be defined in .bss and add almost nothing
(about 50 bytes for the description) to the executable file size.
Well, first of all, those variables in your example aren't uninitialized; C specifies that static variables not otherwise initialized are initialized to 0.
So the reason for .bss is to have smaller executables, saving space and allowing faster loading of the program, as the loader can just allocate a bunch of zeroes instead of having to copy the data from disk.
When running the program, the program loader will load .data and .bss into memory. Writes into objects residing in .data or .bss thus only go to memory, they are not flushed to the binary on disk at any point.
The System V ABI 4.1 (1997) (AKA ELF specification) also contains the answer:
.bss This section holds uninitialized data that contribute to the
program’s memory image. By definition, the system initializes the
data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.
This says that the section name .bss is reserved and has special effects; in particular, it occupies no file space, hence the advantage over .data.
The downside is of course that all bytes must be set to 0 when the OS puts them in memory, which is more restrictive, but it is a common use case and works fine for uninitialized variables.
The SHT_NOBITS section type documentation repeats that affirmation:
sh_size This member gives the section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size
bytes in the file. A section of type SHT_NOBITS may have a non-zero
size, but it occupies no space in the file.
The C standard says nothing about sections, but we can easily verify where the variable is stored in Linux with objdump and readelf, and conclude that uninitialized globals are in fact stored in the .bss. See for example this answer: What happens to a declared, uninitialized variable in C?
The wikipedia article .bss provides a nice historical explanation, given that the term is from the mid-1950's (yippee my birthday;-).
Back in the day, every bit was precious, so any method for signalling reserved empty space, was useful. This (.bss) is the one that has stuck.
.data sections are for space that is not empty, rather it will have (your) defined values entered into it.

Peculiar memory allocation of global variables by c compiler

When I declare some variables outside main, the compiler stores them in a peculiar way.
int i=1,j=1;
void main(void)
{
printf("%d\n%d",&i,&j);
}
If both i and j are uninitialized, or both equal 0, or both equal some positive value, then they are stored at contiguous addresses in memory, whereas if i = 0 and j = some positive integer, then their addresses are separated by a fairly large distance.
The problem is that when they are stored at contiguous addresses, it causes real performance issues like false sharing (have a look here). I've learned that to prevent this, there should be some space between the variables' addresses, which is provided automatically when i = 0 and j = any positive value.
Now, what I want to understand is:
Why the compiler stores variables to noncontinuous addresses only when one initialized to 0 and other initialized to positive values, and
How can I intentionally do what compiler is doing automatically i.e allocating variables to fairly separated address space.
(Using devcpp gcc 4.9.2)
Assuming you meant printf("%p, %p\n",(void *)&i,(void *)&j);, note the following:
It is not mandated by C specs to allocate variables in contiguous memory.
Often globals initialized with 0 are kept in BSS section (which is a part of data section) to save binary size. Other globals are kept in rest of the data section. (Depends on implementation detail, not mandated by C specs)
How can I intentionally do what compiler is doing automatically?
This is compiler specific question and your compiler documentation should possibly contain an answer to this.
One problem there,
printf("%d\n%d",&i,&j);
invokes undefined behavior. So, the outputs cannot be justified in any way. You need to use %p format specifier and cast the corresponding argument to (void *) to print a pointer.
That said, the C standard neither imposes any constraints nor provides any guidelines on where and how variables are stored in memory. It's up to the compiler implementation to decide how to place different variables in memory. You need to check the documentation of the compiler in use to find out the rules your compiler is following.
To elaborate in a generic way, an object file consists of many segments, like
Header (descriptive and control information)
Code segment ("text segment", executable code)
Data segment (initialized static variables)
Read-only data segment (rodata, initialized static constants)
BSS segment (uninitialized static data, both variables and constants)
External definitions and references for linking
Relocation information
Dynamic linking information
Debugging information
and it's up to the compiler to decide the address space (range/value) to be used for each segment.
As per the rules,
Global variables (i.e., having static storage duration) that are left uninitialized or initialized with 0 are placed in the .bss segment.
Variables initialized with a non-zero value are placed in the .data segment
so, it's fair enough to say that the addresses of two variables pertaining to two different segments will not be contiguous.
Now, your observation checks out.
If both i and j are not initialized or equals 0 or equals some positive values then they are stored at continuous address spaces in memory
Yes, then all of them go to either .bss or .data, and the compiler usually chooses to place them one after another.
whereas if i=0 and j = some +ve integer then their addresses are separated by fairly large distance.
This also holds true, both the variables are now placed in different segments.
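For the second part of the question (separating the variables on purpose), relying on the .bss/.data split is fragile. One portable option, offered here as a sketch rather than as what any particular compiler documents, is the C11 _Alignas specifier. The 64-byte cache line size is an assumption about the target CPU, and on_separate_lines is a made-up helper name. It needs a compiler in C11 mode.

```c
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: the target uses 64-byte cache lines */

/* Each variable starts on its own cache-line boundary, whatever its initializer. */
_Alignas(CACHE_LINE) int i = 1;
_Alignas(CACHE_LINE) int j = 1;

/* Returns 1 if both variables start on a cache-line boundary. Two distinct
   objects that each start on a boundary are at least CACHE_LINE bytes apart. */
int on_separate_lines(void)
{
    uintptr_t a = (uintptr_t)&i;
    uintptr_t b = (uintptr_t)&j;

    return a % CACHE_LINE == 0 && b % CACHE_LINE == 0 && a != b;
}
```

This keeps i and j off the same line, but the linker may still place other variables in the rest of either line; wrapping each variable in a struct padded to CACHE_LINE bytes would rule that out as well.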

Memory layout of a c program

I am reading this article http://www.geeksforgeeks.org/memory-layout-of-c-program/,
it said " Uninitialized variable stored in bss", "Initialized variable stored in Data segment"
My question is why we need to have 2 separate segments for variables? 1. BSS 2. Data segment?
Why not just put everything into 1 segment?
BSS takes up no space in the program image. It just indicates how large the BSS section is and the runtime will set that memory to zero.
The data section is filled with the initial values for the variables so it takes space in the program image file.
To my knowledge, uninitialized variables (in .bss) are (or should be) zeroed out when entering the program. Initialized variables (.data) get a specific value.
This means that in the executable of your program (stored on disk), the .data segment must be included byte for byte (since each variable has a potentially different value). The .bss, however, need not be saved byte for byte. One only needs to know the size to reserve in memory when loading the executable. The program knows the offset of each variable in .bss.
To zero out all the uninitialized variables, a few assembler instructions will do (for x86: rep stosw with some register settings, for instance).
Conclusion: loading and initialization time for a large .data segment is a lot worse than for a large .bss segment, since .data must be loaded from disk, while .bss only has to be reserved on the fly with very few CPU instructions.

Where are constant variables stored in C?

I wonder where constant variables are stored. Is it in the same memory area as global variables? Or is it on the stack?
How they are stored is an implementation detail (depends on the compiler).
For example, in the GCC compiler, on most machines, read-only variables, constants, and jump tables are placed in the text section.
Depending on the data segmentation that a particular processor follows, we have five segments:
Code segment - stores only code, ROM
Data segment - stores initialised global and static variables
Stack segment - stores all the local variables and other information such as function return addresses
Heap segment - all dynamic allocations happen here
BSS (or Block Started by Symbol) segment - stores uninitialised global and static variables
Note that the difference between the data and BSS segments is that the former stores initialised global and static variables and the latter stores uninitialised ones.
Now, Why am I talking about the data segmentation when I must be just telling where are the constant variables stored... there's a reason to it...
Every segment has a write protected region where all the constants are stored.
For example:
If I have a const int which is local variable, then it is stored in the write protected region of stack segment.
If I have a global that is initialised const var, then it is stored in the data segment.
If I have an uninitialised const var, then it is stored in the BSS segment...
To summarize, "const" is just a data QUALIFIER, which means that first the compiler has to decide which segment the variable has to be stored and then if the variable is a const, then it qualifies to be stored in the write protected region of that particular segment.
Consider the code:
const int i = 0;
static const int k = 99;
int function(void)
{
    const int j = 37;
    totherfunc(&j);
    totherfunc(&i);
    //totherfunc(&k);
    return(j+3);
}
Generally, i can be stored in the text segment (it's a read-only variable with a fixed value). If it is not in the text segment, it will be stored beside the global variables. Given that it is initialized to zero, it might be in the 'bss' section (where zeroed variables are usually allocated) or in the 'data' section (where initialized variables are usually allocated).
If the compiler is convinced the k is unused (which it could be since it is local to a single file), it might not appear in the object code at all. If the call to totherfunc() that references k was not commented out, then k would have to be allocated an address somewhere - it would likely be in the same segment as i.
The constant (if it is a constant, is it still a variable?) j will most probably appear on the stack of a conventional C implementation. (If you were asking in the comp.std.c news group, someone would mention that the standard doesn't say that automatic variables appear on the stack; fortunately, SO isn't comp.std.c!)
Note that I forced the variables to appear because I passed them by reference - presumably to a function expecting a pointer to a constant integer. If the addresses were never taken, then j and k could be optimized out of the code altogether. To remove i, the compiler would have to know all the source code for the entire program - it is accessible in other translation units (source files), and so cannot as readily be removed. Doubly not if the program indulges in dynamic loading of shared libraries - one of those libraries might rely on that global variable.
(Stylistically - the variables i and j should have longer, more meaningful names; this is only an example!)
Depends on your compiler, your system capabilities, your configuration while compiling.
gcc puts read-only constants on the .text section, unless instructed otherwise.
Usually they are stored in read-only data section (while global variables' section has write permissions). So, trying to modify constant by taking its address may result in access violation aka segfault.
But it depends on your hardware, OS and compiler really.
Of course not, because:
1) The BSS segment stores uninitialized variables, and it has two kinds of section:
(I) Large static and global variables that are non-constant and uninitialized are stored in the .BSS section.
(II) Small static and global variables that are non-constant and uninitialized are stored in the .SBSS section, which is included in the BSS segment.
2) The data segment stores initialized variables, and it has three kinds of section:
(I) Large static and global variables that are initialized and non-constant are stored in the .DATA section.
(II) Small static and global variables that are non-constant and initialized are stored in the .SDATA1 section.
(III) Small static and global variables that are constant, whether initialized or not, are stored in the .SDATA2 section.
What I call small and large above depends on the compiler; for example, small may mean less than 8 bytes and large may mean 8 bytes or more.
But my doubt is: where are local constants stored?
This is mostly an educated guess, but I'd say that constants are usually stored in the actual CPU instructions of your compiled program, as immediate data. So in other words, most instructions include space for the address to get data from, but if it's a constant, the space can hold the value itself.
This is specific to Win32 systems.
It is compiler dependent, but please be aware that the constant may not even be stored at all, since the compiler can optimize it away and put its value directly into the expressions that use it.
I added this code to a program and compiled it with gcc for an ARM Cortex-M4; check the difference in memory usage.
Without const:
int someConst[1000] = {0};
With const:
const int someConst[1000] = {0};
Global and constant are two completely separated keywords. You can have one or the other, none or both.
Where your variable, then, is stored in memory depends on the configuration. Read up a bit on the heap and the stack, that will give you some knowledge to ask more (and if I may, better and more specific) questions.
It may not be stored at all.
Consider some code like this:
#define PI 3.14159265358979323846 /* ISO C's <math.h> defines no PI; M_PI is a POSIX extension */
double toRadian(int degree){
    return degree*PI*2/360.0;
}
This enables the programmer to gather the idea of what is going on, but the compiler can optimize away some of that, and most compilers do, by evaluating constant expressions at compile time, which means that the value PI may not be in the resulting program at all.
Just as an add-on: as you know, the memory layout of the final executable is decided during linking. There is one more section called COMMON, in which the common symbols from the different input files are placed. This COMMON section actually falls under the .bss section.
Some constants aren't even stored.
Consider the following code:
int x = foo();
x *= 2;
Chances are that the compiler will turn the multiplication into x = x+x; as that reduces the need to load the number 2 from memory.
I checked on an x86_64 GNU/Linux system. By dereferencing a pointer to the 'const' variable, the value can be changed. I used objdump and didn't find the 'const' variable in the text segment; the 'const' variable is stored on the stack.
'const' is a type qualifier in C that the compiler enforces: it reports an error when it comes across a statement that changes a 'const' variable.

Resources