Why does initializing an array with a one and zeros make the executable file so big? - c

If I compile the following program, int array[5000]={0}; int main(){}, the output file is much smaller than if I use int array[5000]={1}; int main(){}, which initializes the first element to one and the rest to zeros. Why is there such a big difference in file size?

Your array is a static global variable.
If it is declared as initialized with zeros only, it can be allocated in a special segment of memory, which is created during the process startup and initialized with zeros.
OTOH, if it is declared as containing anything non-zero, its initial value must be stored inside the program's file, so that when the operating system prepares the program in memory to be run, it can allocate an appropriate data segment and fill it with the defined initial values.
See https://en.wikipedia.org/wiki/Data_segment for DATA and BSS segments.

When you don't initialize a global (or static) variable, it gets allocated in an output section called .bss, which is all zeros, so its contents don't need to be written to the output file. If even a single bit differs from zero, the variable has to go into the initialized data section (.data), which is written to the output file, since its contents must be spelled out. This means that even if you explicitly initialize a variable to zeros, the compiler recognizes that the initialization coincides with that of an uninitialized variable and stores the array in the .bss section as well, avoiding the growth of the final file.
For the .data section, all of its contents are saved in the executable file, while for the .bss section only its size is stored, since the kernel can allocate a zero-filled segment for it when the program is loaded into memory.
On Unix systems, data segment initialization works by taking the full size of the data segments (.data plus .bss), but only the .data part is copied into the segment at load time. The rest is always filled with zeros by the kernel, by default. This speeds up loading the program into memory and keeps the executable smaller.

so why is there such a big difference on the file size?
Essentially, it's because the compiler/linker/executable loader aren't good at optimizing.
If a statically allocated array is full of zeros (or uninitialized) the compiler puts it in a special section (".bss") with everything else that's zeros (or uninitialized); and because the program loader knows the entire section is full of zeros none of the data is stored in the file itself.
If a statically allocated array isn't full of zeros; then the compiler puts it in a different section (".data") and all of the data gets included in the file (even when it's "almost but not quite full of zeros").
Ideally, the compiler/tools would be able to detect simple cases (e.g. an array that is almost but not quite full of zeros, initialized with one non-zero value), put the array in ".bss" so it costs nothing, and then generate a small amount of start-up code to correct it (e.g. set the first element of the array) before any of your code executes.
As a work-around (if the array isn't read-only), you could do the same optimization yourself: leave the array full of zeros and put an array[0] = 1; at the start of your main(), as sketched below.
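For example, a minimal sketch of that work-around (assuming the array is writable and nothing depends on its value before main() runs):

#include <stdio.h>

/* Left zero-initialized, so it can live in .bss and adds almost nothing
   to the executable file. */
int array[5000];

int main(void)
{
    array[0] = 1;   /* apply the single non-zero value before anything else uses the array */
    printf("%d %d\n", array[0], array[1]);
    return 0;
}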

From .bss [BSS in C]
An implementation may also assign statically-allocated variables and constants initialized with a value consisting solely of zero-valued bits to the BSS section.
The size that BSS will require at runtime is recorded in the object file, but BSS (unlike the data segment) doesn't take up any actual space in the object file.
For program int array[5000]={0}; int main(){}
data and bss size:
# size a.out
text data bss dec hex filename
1040 484 20032 21556 5434 a.out
executable size:
# ls -l a.out
-rwxr-xr-x. 1 root root 6338 Sep 7 17:05 a.out
For program int array[5000]={1}; int main(){}
data and bss size:
# size a.out
text data bss dec hex filename
1040 20512 16 21568 5440 a.out
executable size:
# ls -l a.out
-rwxr-xr-x. 1 root root 26362 Sep 7 17:24 a.out
The output shown above is from Linux platform.

Related

Why do we need .bss segment? [duplicate]

What I know is that global and static variables are stored in the .data segment, and uninitialized data is in the .bss segment. What I don't understand is why we have a dedicated segment for uninitialized variables. If an uninitialized variable has a value assigned at run time, does the variable still exist only in the .bss segment?
In the following program, a is in the .data segment, and b is in the .bss segment; is that correct? Kindly correct me if my understanding is wrong.
#include <stdio.h>
#include <stdlib.h>
int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9};
int b[20]; /* Uninitialized, so in the .bss and will not occupy space for 20 * sizeof (int) */
int main ()
{
;
}
Also, consider following program,
#include <stdio.h>
#include <stdlib.h>
int var[10]; /* Uninitialized so in .bss */
int main ()
{
var[0] = 20; /* Initialized; where will this 'var' be? */
}
The reason is to reduce program size. Imagine that your C program runs on an embedded system, where the code and all constants are saved in true ROM (flash memory). In such systems, an initial "copy-down" must be executed to set all static storage duration objects before main() is called. It typically goes something like this pseudocode:
for (i = 0; i < all_explicitly_initialized_objects; i++)
{
    .data[i] = init_value[i];
}

memset(.bss, 0, all_implicitly_initialized_objects);
Where .data and .bss are stored in RAM, but init_value is stored in ROM. If there had been only one segment, the ROM would have had to be filled with a lot of zeros, increasing ROM size significantly.
RAM-based executables work similarly, though of course they have no true ROM.
Also, memset is likely implemented as very efficient inline assembly, meaning the startup copy-down can execute faster.
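For illustration, here is a hedged C-level sketch of such a copy-down; the __data_*/__bss_* symbols are hypothetical linker-script-provided names, and real startup code is toolchain-specific:

#include <string.h>

/* Hypothetical symbols defined by the linker script; real names vary by toolchain. */
extern unsigned char __data_start[], __data_end[];   /* .data region in RAM          */
extern const unsigned char __data_load[];            /* its initial values in ROM    */
extern unsigned char __bss_start[], __bss_end[];     /* .bss region in RAM           */

void startup_copy_down(void)
{
    /* Copy initial values of explicitly initialized objects from ROM to RAM. */
    memcpy(__data_start, __data_load, (size_t)(__data_end - __data_start));

    /* Zero-fill the implicitly (zero-)initialized objects. */
    memset(__bss_start, 0, (size_t)(__bss_end - __bss_start));
}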
The .bss segment is an optimization. The entire .bss segment is described by a single number, probably 4 bytes or 8 bytes, that gives its size in the running process, whereas the .data section is as big as the sum of sizes of the initialized variables. Thus, the .bss makes the executables smaller and quicker to load. Otherwise, the variables could be in the .data segment with explicit initialization to zeroes; the program would be hard-pressed to tell the difference. (In detail, the address of the objects in .bss would probably be different from the address if it was in the .data segment.)
In the first program, a would be in the .data segment and b would be in the .bss segment of the executable. Once the program is loaded, the distinction becomes immaterial. At run time, b occupies 20 * sizeof(int) bytes.
In the second program, var is allocated space and the assignment in main() modifies that space. It so happens that the space for var was described in the .bss segment rather than the .data segment, but that doesn't affect the way the program behaves when running.
From Assembly Language Step-by-Step: Programming with Linux by Jeff Duntemann, regarding the .data section:
The .data section contains data definitions of initialized data items. Initialized
data is data that has a value before the program begins running. These values
are part of the executable file. They are loaded into memory when the
executable file is loaded into memory for execution.
The important thing to remember about the .data section is that the
more initialized data items you define, the larger the executable file
will be, and the longer it will take to load it from disk into memory
when you run it.
and the .bss section:
Not all data items need to have values before the program begins running.
When you’re reading data from a disk file, for example, you need to have a
place for the data to go after it comes in from disk. Data buffers like that are
defined in the .bss section of your program. You set aside some number of
bytes for a buffer and give the buffer a name, but you don’t say what values
are to be present in the buffer.
There’s a crucial difference between data items defined in the .data
section and data items defined in the .bss section: data items in the
.data section add to the size of your executable file. Data items in
the .bss section do not. A buffer that takes up 16,000 bytes (or more,
sometimes much more) can be defined in .bss and add almost nothing
(about 50 bytes for the description) to the executable file size.
Well, first of all, those variables in your example aren't uninitialized; C specifies that static variables not otherwise initialized are initialized to 0.
So the reason for .bss is to have smaller executables, saving space and allowing faster loading of the program, as the loader can just allocate a bunch of zeroes instead of having to copy the data from disk.
When running the program, the program loader will load .data and .bss into memory. Writes into objects residing in .data or .bss thus only go to memory, they are not flushed to the binary on disk at any point.
The System V ABI 4.1 (1997) (AKA ELF specification) also contains the answer:
.bss This section holds uninitialized data that contribute to the
program’s memory image. By definition, the system initializes the
data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.
This says that the section name .bss is reserved and has special semantics; in particular, it occupies no file space, hence the advantage over .data.
The downside, of course, is that all bytes must be set to 0 when the OS puts them into memory, which is more restrictive, but this is a common use case and works fine for uninitialized variables.
The documentation of the SHT_NOBITS section type repeats that statement:
sh_size This member gives the section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size
bytes in the file. A section of type SHT_NOBITS may have a non-zero
size, but it occupies no space in the file.
The C standard says nothing about sections, but we can easily verify where the variable is stored in Linux with objdump and readelf, and conclude that uninitialized globals are in fact stored in the .bss. See for example this answer: What happens to a declared, uninitialized variable in C?
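As a small test case (the placement noted in the comments is what typical Linux/GCC toolchains do; you can confirm it with readelf -s or objdump -t on the resulting binary):

/* Zero-initialized and uninitialized globals: typical Linux/GCC toolchains
   place both in .bss. */
int zero_init[100] = {0};
int no_init[100];

/* Non-zero initializer: typically placed in .data. */
int one_init[100] = {1};

int main(void)
{
    return 0;
}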
The wikipedia article .bss provides a nice historical explanation, given that the term is from the mid-1950s (yippee my birthday;-).
Back in the day, every bit was precious, so any method for signalling reserved empty space was useful. This (.bss) is the one that has stuck.
.data sections are for space that is not empty; rather, it will have (your) defined values placed into it.

static int arr[10] memory address always ends in 060

I have a c program that looks like this
main.c
#include <stdio.h>
#define SOME_VAR 10
static int heap[SOME_VAR];
int main(void) {
printf("%p", heap);
return 0;
}
and outputs this when I run the compiled program a few times
0x58aa7c49060
0x56555644060
0x2f8d1f8e060
0x92f58280060
0x59551c53060
0xd474ed6e060
0x767c4561060
0xf515aeda060
0xbe62367e060
Why does it always end in 060? And is the array stored on the heap?
Edit: I am on Linux and I have ASLR on. I compiled the program using gcc
The addresses differ because of ASLR (address space layout randomization). With ASLR, the binary can be mapped at different locations in the virtual address space.
The variable heap is, in contrast to its name, not located on the heap but in the .bss. Its offset within the mapping is therefore constant.
Pages are mapped at page granularity, which is 4096 bytes (hex: 0x1000) on many platforms. This is why the last three hex digits of the address are the same.
If you did the same with a stack variable, the address could vary even in the last digits on some platforms (namely Linux with recent kernels), because the stack is not only mapped somewhere else but also receives a random offset at startup.
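A minimal sketch to observe this, assuming Linux with ASLR enabled; run it several times and compare the low digits of the two addresses:

#include <stdio.h>

static int heap[10];   /* static storage: fixed offset within its page */

int main(void)
{
    int local;         /* stack storage: on recent Linux kernels even the
                          low digits may change between runs */
    printf("static: %p\n", (void *)heap);
    printf("stack : %p\n", (void *)&local);
    return 0;
}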
If you are using Windows, the reason is the PE structure.
Your heap variable is stored in the .data section of the file, and its address is calculated relative to the start of that section. Each section is loaded at its own address, but the starting address is a multiple of the page size. Because you have no other variables, heap is probably at the start of the .data section, so its address will be a multiple of the allocation granularity.
For example, looking at the section table of the compiled Windows version of your code: the .text section is where your compiled code is, and .data contains your heap variable. When your PE is loaded into memory, sections are loaded at different addresses, returned by VirtualAlloc(), which are multiples of the page size. But the address of each variable is relative to the start of its section, which now begins on a page boundary, so you will always see a fixed value in the lower digits. Since the offset of heap from the start of the section depends on the compiler, compile options, and so on, different compilers will print different numbers for the same code, but for a given build the printed low digits are fixed.
When I compiled the code, I noticed heap was placed 0x8B0 bytes after the start of the .data section. So every time I run this code, my address ends in 0x8B0.
The compiler happened to put heap at offset 0x60 bytes in a data segment it has, possibly because the compiler has some other stuff in the first 0x60 bytes, such as data used by the code that starts the main routine. That is why you see “060”; it is just where it happened to be, and there is no great significance to it.
Address space layout randomization changes the base address(es) used for various parts of program memory, but it always does so in units of 0x1000 bytes (because this avoids causing problems with alignment and other issues). So you see the addresses fluctuate by multiples of 0x1000, but the last three digits do not change.
The definition static int heap[SOME_VAR]; defines heap with static storage duration. Typical C implementations store it in a general data section, not in the heap. The “heap” is a misnomer for memory that is used for dynamic allocation. (It is a misnomer because malloc implementations may use a variety of data structures and algorithms, not limited to heaps. They may even use multiple methods in one implementation.)
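A short sketch of that distinction (illustrative only; the actual addresses and regions are implementation-dependent):

#include <stdio.h>
#include <stdlib.h>

static int not_the_heap[10];   /* static storage duration, placed in .bss/.data */

int main(void)
{
    int *dynamic = malloc(10 * sizeof *dynamic);   /* this is dynamic allocation */

    printf("static storage : %p\n", (void *)not_the_heap);
    printf("dynamic storage: %p\n", (void *)dynamic);

    free(dynamic);
    return 0;
}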

Set .data segment size in C, dynamically

Is there a means by which to manipulate the .data segment size in C without increasing the compiled size of the binary (i.e. setting the size without setting any variables within)?
Linux programs have two data sections: .data and .bss. The .data section is used for variables with an initial value (static int x = 5), while .bss is used for variables that start out as 0 (static int x). Adding data to .data consumes space in the binary to hold the initial values.
Consider going for the ".bss" section, which will have little impact on object size.
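For example, a sketch of that approach, assuming a typical Linux/GCC toolchain; the size command should show the 4 MB counted under bss rather than data:

/* Reserved but not explicitly initialized: typically placed in .bss, so
   only its size is recorded in the binary. */
static unsigned char big_buffer[4 * 1024 * 1024];

/* By contrast, giving it any non-zero initializer would move it into
   .data and add roughly 4 MB to the file. */

int main(void)
{
    big_buffer[0] = 1;   /* still ordinary writable memory at run time */
    return 0;
}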

Why does GCC not assign the static variable when it is initialized to 0

I initialize a static variable to 0, but when I look at the assembly code, I find that only memory is allocated for the variable; the value is not assigned.
When I initialize the static variable to another number, I can see that the memory is assigned a value.
I guess GCC assumes the memory will be initialized to 0 by the OS before we use it.
The GCC option I use is "gcc -m32 -fno-stack-protector -c -o"
When I initialize the static variable to 0, the c code and the assembly code:
static int temp_front=0;
.local temp_front.1909
.comm temp_front.1909,4,4
When I initialize it to other numbers, the code is:
static int temp_front=1;
.align 4
.type temp_front.1909, #object
.size temp_front.1909, 4
temp_front.1909:
.long 1
TL;DR: GCC knows the BSS is guaranteed to be zero-initialized on the platform it's targeting, so it puts zero-initialized static data there.
Big picture
The program loader of most modern operating systems gets two different sizes for each part of the program, like the data part. The first size it gets is the size of data stored in the executable file (like a PE/COFF .EXE file on Windows or an ELF executable on Linux), while the second size is the size of the data part in memory while the program is running.
If the data size for the running program is bigger than the amount of data stored in the executable file, the remaining part of the data section is filled with bytes containing zero. In your program, the .comm line tells the linker to reserve 4 bytes without initializing them, so that the OS zero-initializes them on start.
What does gcc do?
gcc (or any other C compiler) allocates zero-initialized variables with static storage duration in the .bss section. Everything allocated in that section will be zero-initialized on program startup. For the allocation, it uses the .comm directive, which just specifies the size (4 bytes).
You can see the size of the main section types (code, data, bss) using the size command. If you initialize the variable with one, it is included in a data section, and occupies 4 bytes there. If you initialize it with zero (or not at all), it is instead allocated in the .bss section.
What does ld do?
ld merges all data-type sections of all object files (even those from static libraries) into one data section, followed by all .bss-type sections. The executable output contains a simplified view for the operating system's program loader. For ELF files, this is the "program header". You can take a look at it using objdump -p for any format, or readelf for ELF files.
The program header contains entries of different types. Among them are a couple of entries of type PT_LOAD describing the "segments" to be loaded by the operating system. One of these PT_LOAD entries is for the data area (where the .data section is linked). It contains a field called p_filesz that specifies how many bytes of initialized variables are provided in the ELF file, and a field called p_memsz telling the loader how much address space should be reserved. Which sections get merged into which PT_LOAD entries differs between linkers and depends on command-line options, but generally you will find a PT_LOAD entry describing a region that is readable and writeable but not executable, with a p_filesz value smaller than its p_memsz (potentially zero if there is only a .bss and no .data section). p_filesz is the size of all read+write data sections, whereas p_memsz is bigger in order to also provide space for zero-initialized variables.
The amount by which p_memsz exceeds p_filesz is the sum of all .bss sections linked into the executable. (The values might be off a bit due to alignment to pages or disk blocks.)
See chapter 5 in the System V ABI specification, especially pages 5-2 and 5-3 for a description of the program header entries.
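As a rough illustration, here is a sketch that prints p_filesz and p_memsz of each PT_LOAD entry of the running executable; it assumes Linux, a 64-bit ELF, and the /proc/self/exe pseudo-file, with error handling kept minimal:

#include <stdio.h>
#include <elf.h>

int main(void)
{
    FILE *f = fopen("/proc/self/exe", "rb");
    if (!f)
        return 1;

    /* Read the ELF header to locate the program header table. */
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1)
        return 1;

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        fseek(f, (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize), SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1)
            return 1;

        /* PT_LOAD entries are the segments mapped by the program loader. */
        if (ph.p_type == PT_LOAD)
            printf("PT_LOAD: p_filesz=%llu p_memsz=%llu\n",
                   (unsigned long long)ph.p_filesz,
                   (unsigned long long)ph.p_memsz);
    }

    fclose(f);
    return 0;
}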
What does the operating system do?
The Linux kernel (or another ELF-compliant kernel) iterates over all entries in the program header. For each entry of type PT_LOAD, it allocates virtual address space. It associates the beginning of that address space with the corresponding region in the executable file, and if the space is writeable, it enables copy-on-write.
If p_memsz exceeds p_filesz, the kernel arranges the remaining address space to be completely zeroed out. So the variable that got allocated in the .bss section by gcc ends up in the "tail" of the read-write PT_LOAD entry in the ELF file, and the kernel provides the zero.
Any whole pages that have no backing data can start out copy-on-write mapped to a shared physical page of zeros.
Why does GCC not assign ...
Most modern OSs will automatically zero-initialize the BSS section.
On such an OS, an "uninitialized" variable is identical to a variable that is initialized to zero.
However, there is one difference: The data of uninitialized variables are not stored in the resulting object and executable files; the data of initialized variables is.
This means that "real" zero-initialized variables may lead to a larger file size compared to uninitialized variables.
For this reason the compiler prefers to treat variables that are really zero-initialized as "uninitialized" ones.
The GCC option I use is ...
Of course there are also operating systems which do not automatically initialize "uninitialized" memory to zero.
As far as I remember, Windows 95 is an example of this.
If you want to compile for such an operating system, you may use the GCC command line option -fno-zero-initialized-in-bss. This command line option forces GCC to "really" zero-initialize variables that are zero-initialized.
I just compiled your code with that command line option; the output looks like this:
.data
.align 4
.type temp_front, #object
.size temp_front, 4
temp_front:
.zero 4
Even on Windows 95 there's no point in emitting zero-initialization code in every compiled module. The Win95 program loader (or even MS-DOS) may not initialize the BSS section, but the crt0 init module (linked into every compiled C/C++ program, and which eventually calls main() or the DllEntry point) can do it directly in one fast operation for the whole BSS section, whose size is already in the program header and can also be recorded in a static preinitialized variable whose value is computed by the linker. There's no need to change the way each module is compiled with gcc.
However, automatic variables (local variables allocated on the stack) are trickier: the compiler cannot know whether a variable will be initialized if its first use is by reference, as an argument to a non-inlined function (which may be in another module compiled separately, or linked from an external library or DLL) that is supposed to fill it.
GCC only knows about assignments made explicitly in the function itself, but if the variable is only ever used by reference, GCC can now forcibly preinitialize it to zero so it does not keep whatever sensitive value was left on the stack. In that case some zero-fill code is added to the compiled function's preamble for those local variables, which helps prevent some data leaks (such a leak is generally unlikely when the variable is a simple type, but when it is a whole structure, many fields may be left in a random state by the subcall).
C11 says that code relying on the initialization of such auto variables has undefined behavior. But GCC can help close the security risk: this is allowed by C11 because forced zeroing is better than leaving a random value, and both behaviors conform to "undefined" behavior; zero is just as acceptable as a randomly leaked value.
Some security-conscious functions also avoid leaving sensitive data behind when returning: they explicitly clear the variables they no longer need so they are not exposed after the function returns (notably when returning from privileged to unprivileged code). That is good practice, but it is independent of the forced initialization of auto variables passed by reference to subcalls before they are initialized. And GCC is smart enough not to forcibly initialize those auto variables when there is explicit code that assigns them a value, so the impact is minimal. The feature can be disabled in GCC for apps that want micro-optimizations for performance, but either way it does not add to the BSS size, and the image size grows by less than 0.1% for the Linux kernel, only because of the few bytes of code compiled into the few functions that benefit from this security fix.
And this has no effect on "uninitialized" static variables, which GCC puts in the BSS section, to be cleared by the OS program loader or by the program's small crt0 init module.
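As a hypothetical illustration of the by-reference pattern described above (whether the compiler zero-fills c first depends on the compiler version and options):

struct config {
    int  flags;
    char name[32];
};

/* Defined in another translation unit; the compiler cannot see
   which fields it actually writes. */
void fill_config(struct config *out);

int use_config(void)
{
    struct config c;     /* not explicitly initialized here */
    fill_config(&c);     /* first use is by reference       */
    return c.flags;      /* any field not set by fill_config would
                            otherwise leak old stack contents */
}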

Memory layout of a c program

I am reading this article http://www.geeksforgeeks.org/memory-layout-of-c-program/,
it said " Uninitialized variable stored in bss", "Initialized variable stored in Data segment"
My question is why we need to have 2 separate segments for variables? 1. BSS 2. Data segment?
Why not just put everything into 1 segment?
BSS takes up no space in the program image. It just indicates how large the BSS section is and the runtime will set that memory to zero.
The data section is filled with the initial values for the variables so it takes space in the program image file.
To my knowledge, uninitialized variables (in .bss) are (or should be) zeroed out when the program starts. Initialized variables (.data) get a specific value.
This means that in the executable of your program (stored on disk), the .data segment must be included byte for byte (since each variable has a potentially different value). The .bss, however, does not have to be saved byte for byte; one only needs to know the size to reserve in memory when loading the executable. The program knows the offset of each variable in .bss.
To zero out all the uninitialized variables, a few assembler instructions will do (for x86: rep stosw with appropriate register settings, for instance).
Conclusion: loading and initialization time for .data is a lot worse than for large .bss segments, since .data must be loaded from disk, whereas .bss only has to be reserved on the fly with very few CPU instructions.

Resources