Make ARM Fromelf Output Binary With Zero Initialized Data - c

I have some data that are zero initialized.
I have allocated an execution region for them in the scatter file.
Some_Execution_Region +0
{
stuff.o (+RO, +RW, +RI)
}
But I don't see any segments of zero initialized data in the resulting binary after using fromelf to convert it from the axf file.
The binary file stops right before where the zero initialized data should start.
So the question is how I can make fromelf generate empty region for the zero initialized data in the binary file.
I've looked up on the ARM site and have had no luck. I only found out some option to disable zero initialized data. (Doesn't this mean since I'm not using that option, I should get my zero initialized data in my binary?)
I currently just run fromelf.exe --bin --output=binary.bin elffile.axf, which doesnt generate the zero data.

There's no need to actually store all of the zeros in the binary. If you are using C then the default is to zero-initialize static variables that are not explicitly initialized to some other value. The C runtime code doesn't copy zeros from the executable to RAM, it just gets the beginning and ending addresses of the part of RAM that needs to be zeroed. The typical way of telling the linker that variables must be zeroed at startup is to put them in the .bss (blanked static storage) segment, but you shouldn't need to do this explicitly unless you are writing assembly code.

Related

where is it documented that global array in C, compiled by gcc, is initialized like "copy-on-write"?

For this C code:
foobar.c:
static int array[256];
int main() {
return 0;
}
the array is initialized to all 0's, by the C standard. However, when I compile
gcc -S foobar.c
this produces the assembly code foobar.s that I can inspect, and nowhere in foobar.s, is there any initialization of contents of the array.
Hence I reason, that the contents are not initialized, only when an element of the array is inspected, is it initialized, kind of like "copy-on-write" mechanism for fork.
Is my reasoning correct? If so, is this a documented feature, and if so where can I find that documentation?
There's kind of a lot of levels here. This answer addresses Linux in particular, but the same concepts are likely to apply on other systems, possibly with different names.
The compiler requires that the object be "zero initialized". In other words, when a memory read instruction is executed with an address in that range, the value that it reads must be zero. As you say, this is necessary to achieve the behavior dictated by the C standard.
The compiler accomplishes this by asking the assembler to fill the space with zeros, one way or another. It may use the .space or .zero directive which implicitly requests this. It will also place the object in a section with the special name .bss (the reasons for this name are historical). If you look further up in the assembly output, you should see a directive like .bss or .section .bss. The assembler and linker promises that this entire section will be (somehow) initialized to zero. This is documented:
The bss section is used for local common variable storage. You may allocate address space in the bss section, but you may not dictate data to load into it before your program executes. When your program starts running, all the contents of the bss section are zeroed bytes.
Okay, so now what do the assembler and linker do to make it happen? Well, an ELF executable file has a segment header, which specifies how and where code and data from the file should be mapped into the program's memory. (Please note that the use of the word "segment" here has nothing to do with the x86 memory segmentation model or segment registers, and is only vaguely related to the term "segmentation fault".) The size of the segment, and the amount of data to be mapped, are specified separately. If the size is greater, then all remaining bytes are to be initialized to zero. This is also documented in the above-linked man page:
PT_LOAD
The array element specifies a loadable segment,
described by p_filesz and p_memsz. The bytes
from the file are mapped to the beginning of the
memory segment. If the segment's memory size
p_memsz is larger than the file size p_filesz,
the "extra" bytes are defined to hold the value
0 and to follow the segment's initialized area.
So the linker ensures that the ELF executable contains such a segment, and that all objects in the .bss section are in this segment, but not within the part that is mapped to the file.
Once all this is done, then the observable behavior is guaranteed: as above, when an instruction attempts to read from this object before it has been written, the value it reads will be zero.
Now as to how that behavior is ensured at runtime: that is the job of the kernel. It could do it by pre-allocating actual physical memory for that range of virtual addresses, and filling it with zeros. Or by an "allocate on demand" method, like what you describe, by leaving those pages unmapped in the CPU's page tables. Then any access to those pages by the application will cause a page fault, which will be handled by the kernel, which will allocate zero-filled physical memory at that time, and then restart the faulting instruction. This is completely transparent to the application. It just sees that the read instruction got the value zero. If there was a page fault, then it just seems to the application like the read instruction took a long time to execute.
The kernel normally uses the "on demand" method, because it is more efficient in case not all of the "zero initialized" memory is actually used. But this is not going to be documented as guaranteed behavior; it is an implementation detail. An application programmer need not care, and in fact must not care, how it works under the hood. If the Linux kernel maintainers decide tomorrow to switch everything to the pre-allocate method, every application will work exactly as it did before, just maybe a little faster or slower.

Why does GCC not assign the static variable when it is initialized to 0

I initialize a static variable to 0, but when I see the assembly code, I find that only memory is allocated to the variable. The value is not assigned
And when I initialize the static variable to other numbers, I can find that the memory is assigned a value.
I guess whether GCC thinks the memory should be initialized to 0 by OS before we use the memory.
The GCC option I use is "gcc -m32 -fno-stack-protector -c -o"
When I initialize the static variable to 0, the c code and the assembly code:
static int temp_front=0;
.local temp_front.1909
.comm temp_front.1909,4,4
When I initialize it to other numbers, the code is:
static int temp_front=1;
.align 4
.type temp_front.1909, #object
.size temp_front.1909, 4
temp_front.1909:
.long 1
TL:DR: GCC knows the BSS is guaranteed to be zero-initialized on the platform it's targeting so it puts zero-initialized static data there.
Big picture
The program loader of most modern operating systems gets two different sizes for each part of the program, like the data part. The first size it gets is the size of data stored in the executable file (like a PE/COFF .EXE file on Windows or an ELF executable on Linux), while the second size is the size of the data part in memory while the program is running.
If the data size for the running program is bigger than the amount of data stored in the executable file, the remaining part of the data section is filled with bytes containing zero. In your program, the .comm line tells the linker to reserve 4 bytes without initializing them, so that the OS zero-initializes them on start.
What does gcc do?
gcc (or any other C compiler) allocates zero-initialized variables with static storage duration in the .bss section. Everything allocated in that section will be zero-initialized on program startup. For allocation, it uses the comm directive, and it just specifies the size (4 bytes).
You can see the size of the main section types (code, data, bss) using the size command. If you initialize the variable with one, it is included in a data section, and occupies 4 bytes there. If you initialize it with zero (or not at all), it is instead allocated in the .bss section.
What does ld do?
ld merges all data-type section of all object files (even those from static libraries) into one data section, followed by all .bss-type sections. The executable output contains a simplified view for the operating system's program loader. For ELF files, this is the "program header". You can take a look at it using objdump -p for any format, or readelf for ELF files.
The program headers contain of entries of different type. Among them are a couple of entries with the type PT_LOAD describing the "segments" to be loaded by the operating system. One of these PT_LOAD entries is for the data area (where the .data section is linked). It contains an entry called p_filesz that specifies how many bytes for initialized variables are provided in the ELF file, and an entry called p_memsz telling the loader how much space in the address space should be reserved. The details on which sections get merged into what PT_LOAD entries differ between linkers and depend on command line options, but generally you will find a PT_LOAD entry that describes a region that is both readable and writeable, but not executable, and has a p_filesz value that is smaller than the p_memsz entry (potentially zero if there's only a .bss, no .data section). p_filesz is the size of all read+write data sections, whereas p_memsz is bigger to also provide space for zero-initialized variables.
The amount p_memsz exceeds p_filesz is the sum of all .bss sections linked into the executable. (The values might be off a bit due to alignment to pages or disk blocks)
See chapter 5 in the System V ABI specification, especially pages 5-2 and 5-3 for a description of the program header entries.
What does the operating system do?
The Linux kernel (or another ELF-compliant kernel) iterates over all entries in the program header. For each entry containing the type PT_LOAD it allocates virtual address space. It associates the beginning of that address space with the corresponding region in the executable file, and if the space is writeable, it enables copy-on-write.
If p_memsz exceeds p_filesz, the kernel arranges the remaining address space to be completely zeroed out. So the variable that got allocated in the .bss section by gcc ends up in the "tail" of the read-write PT_LOAD entry in the ELF file, and the kernel provides the zero.
Any whole pages that have no backing data can start out copy-on-write mapped to a shared physical page of zeros.
Why does GCC not assign ...
Most modern OSs will automatically zero-initialize the BSS section.
Using such an OS an "uninitialized" variable is identical to a variable that is initialized to zero.
However, there is one difference: The data of uninitialized variables are not stored in the resulting object and executable files; the data of initialized variables is.
This means that "real" zero-initialized variables may lead to a larger file size compared to uninitialized variables.
For this reason the compiler prefers using "uninitialized" variables if variables are really zero-initialized.
The GCC option I use is ...
Of course there are also operating systems which do not automatically initialize "uninitialized" memory to zero.
As far as I remember Windows 95 is an example for this.
If you want to compile for such an operating system, you may use the GCC command line option -fno-zero-initialized-in-bss. This command line option forces GCC to "really" zero-initialize variables that are zero-initialized.
I just compiled your code with that command line option; the output looks like this:
.data
.align 4
.type temp_front, #object
.size temp_front, 4
temp_front:
.zero 4
There's no point even in Windows 95 to make zero-initialisation in code of every compiled module. May be the Win95 program loader (or even MS-DOS) does not initialize the bss section, but the "ctr0" init module (linked in every comppiled C/C++ program, and that will finally call main() or the DllEntry point, can do that directly in a fast operation for the whole BSS section, whose size is already on the program header and that can also be determined in a static preinitialized variable whose value is computed by the linker, and there's no need to change the way each module is compiled with gcc.
However there are more difficulties about automatic variables (local variables allocated on the stack): the compiler does not know if the variable will be initialized if its first use is by reference in a call parameter (to a non-inlined function, which may be in another module compiled separately or linked from an external library or DLL), supposed to fill it.
GCC only knows when the variable is explicitly assigned in the function itself, but if it gets used by reference only, GCC can now fircibly preinitialize it to zero to prevent it to keep a sensitive value left on the stack. In that case this adds some zero-fill code in the compiled function preamble for these local variables, and this helps prevent some data leaks (generally such leak is unlikely when the varaible is a simple type, but when it is a whole structure, many fields may be left in random state by the subcall.
C11 indicates that such code assuming initialization of auto variables has "undefined" behavior. But GCC will help close the security risk: this is allowed by C11 because this forced zeroing is better to leaving random value and both behaviors are conforming to the "undefined" behavior: zero is as well acceptable as a randomly leaked value.
Some secure functions also avoid leaving senstive data when returning, they explicitly clear the variables they no longer need to avoid expose them after these function return (and notably when they return from a privilege code to an unprivileged one): this is a good practice, but it is independant of the forced initilization of auto variables used by references in subcalls before they were initialized. And GCC is smart enough to not forcibly initialize these auto varaibles when there's explicit code that assign them an explicit value. So the impact is minimal. This feature may be disabled in GCC for those apps that want microoptimizations in terms of performance, but in both cases this does not add to the BSS size, and the image size just grows by only <0.1% for the Linux kernel only because of the few bytes of code compiled in a few functions that benefit of this security fix.
And this has no effect on "uninitialized" static variables, that GCC puts in the BSS section, cleared by the program loader of the OS, or by the small program's crt0 init module.

Why is there no content for the .bss section in an object (ELF) file?

This question confused me a lot. As far as I know, .bss section is for saving data that initialized but not used yet. But I don't understand what 'content' here mean and why there is no content here?
Thanks for any helps!
The quick response is: Well, there's no content to fill the .bss with, so there's no sense in putting any data on the executable in relation to that section. Only the positions of the variables are stored, but that belongs to another ELF section.
.bss section is where your program has all the uninitialized variables (by default all initialized to zero) The linker only needs to know the actual size of this region and the actual variable positions, but not the values, because its contents are obvious, independently of the nature or the distribution of the variables put there.
When your program is loaded, the kernel normally assigns a read-only segment for the unmodifiable text of the program (.text section) and also puts in that segment the contents of the initialized const variables (.rodata section) so in case yo attempt to modify something there, you get an exception. Then comes the initialized data section with the initial values of all the initialized variables of your program (.data section) and the uninitialized ones (.bss section)
The data segment (look how I call different a section and a load segment) is given more space, the sum of .data and .bss sections, to hold all the variables (both are included, so that's the reason it uses its length) but while the contents of the .data section have to be filled from the file, the contents of the .bss section don't, because all are zeroed by the operating system, before allowing the user process to access the allocated segment. That's not true for small systems, where the operating system doesn't fill the data with zeros... but there, the compiler adds some code to zero all the .bss segment, so again, there's no need to copy any data from the executable file.
The historic (and main) reason for this behaviour is that the pages the kernel assigns that have to be loaded with your program, are cleared to zero for security reasons (so you cannot luckily get a page full of other users' passwords, or other sensible information) so there's no reason to fill it with zeros again and nothing has to be copied there, there's no reason to put anything on the executable file. The pages the kernel maintains normally are zeroed only when they are going to be given to a user, but maintain (as they are designed for that purpose) the information until they are overwritten.
There's no content in the BSS (Block started By Symbol) section because it would be wasted storage. The contents of the BSS is all zeros and it is cleared by the startup code before main is called. Think of the BSS as a run-length compressed block of bytes. All you need to know to uncompress that block is the value (0) and the length, which is stored in the ELF entry for the BSS.
Your notion of "data that [is] initialized but not used yet" is a bit off. Consider that all sections in an ELF file are somehow "not used yet". The text segment may or may not become used (it may contain dead/unreachable code). The data segment may or may not be used at all (you can define objects never used by code).

How to reserve a fixed flash section for data?

I need to store some large chunks of data in flash memory, where it will be read often and occasionally be rewritten using SPM. I already figured out how to use pointers to __flash and pgm_read_byte to access it, how not to omit the const (despite my writing to it), how to actually access the array in a loop so that it doesn't get completely optimised away (after inlining), but I don't really understand how to declare my array.
const uint8_t persistent_data[1024] __attribute__(( aligned(SPM_PAGESIZE),
section("mycustomdata") )) = {};
works about fine, except that I do not want to initialise it. When programming my device (an Arduino ATmega328P), I want this section to be keept so that it retains the data previously written by the application. The above does zero-initialise it, and my hex file contains zeroes that the programmer happily uses to overwrite my data.
Using the __flash modifier instead of __attribute__(( section("…") )) does about the same here, except that it places the array elsewhere and I don't have any control about where it is put. It still does this when I use __flash and omit the initialisation (though I get a "uninitialized variable 'persistent_data' put into program memory area [-Wuninitialized]" warning).
Now I am trying to omit the initialiser:
const uint8_t persistent_data[1024] __attribute__(( aligned(SPM_PAGESIZE),
section("mycustomdata") ));
and get rather unexpected results. The sections data from the .lss output shows
Idx Name Size VMA LMA File off Algn
…
1 mycustomdata 00000480 00800480 000055e2 00005700 2**7
CONTENTS, ALLOC, LOAD, DATA
2 .text 00005280 00000000 00000000 000000d4 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
This does put all the initialisation zeroes in the hex file at the load memory address 55E2 (instead of omitting them), while the virtual memory address (which the variable persistent_data points to) refers to 0480 - in the middle of the code from the text section!
(I also tried to omit the const, and to omit the const and the initialiser, which both had the same effect as omitting only the initialiser).
I am at a loss. Do I need to use extern maybe? (Any attempt at doing so ended up with a "undefined reference to persistent_data" error). Do I need to use a linker script?
How do I make persistent_data refer to a location is program memory that is not used by any other data, and have the compiler not emit any initialisation data for that location in the hex file?
You don't seem to realize that you actually need two versions of your hex file - one that is suitable for a "new" installation on a new (or worse: re-used, thus with random flash content) chip that initializes the flash section to make sure there is no arbitrary data in there that might be interpreted, and another one used to update a pre-programmed chip that misses this section in order to keep data already modified by your users. So, you are going to need the version that initializes this section anyhow.
The simplest way to achieve this is like your first example, initialize the data to build the "naked chip" version of your code, and produce the "update" version by simply removing this initialized section from the object file with objcopy (assumed you use a GNU toolchain). See the -R option of this tool.
Also, make sure this data section is located at a fixed address - you don't want it to move every time you change something in your code.
I would rather try and use EEPROM if available than go through the hassle of reprogramming.

why .bss explicitly initialize global variable to zero?

I am generating mips disassembly in order to simulating it. I need to have big data to work on it but I don't want to have big assembly files so I wanted to work on a big uninitialized array (and then possibly initialize it in my simulator...). So I need this array to be global. And global variables seem to be put on the .bss section to be initialized when the page is actually accessed.
The problem is in my binary the array is in the .bss section, but is explicitly filled with zero...This is not the behaviour expected if I understood correctly what I have found on internet...Is there a way for saying to the compiler (or linker, or loader...I don't understand well which one do what for that) to not really put zero in this array ?
Or alternatively, can we have an option while compiling, or a C instruction for saying we don't want this array for being initialized with 0 ? (I tried to change the array section with attribute but it is still initialized with 0).
By the way, I am generating my disassembly file with objdump, and it normally skip blocks of zeroes, but I really need the other blocks of zeroes to be disassembled, so I using the "-z" option.
What I really don't understand is that everywhere I looked, it was said that .bss section didn't really put zero in the binary file...
The data for the .bss section isn't stored in the compiled object files because, well, there is no data—the compiler puts variables in that segment precisely because they should be zero-initialized.
When the OS loads the executable, it just looks at the size of the .bss segment, allocates that much memory, and zero-initializes it for you. By not storing that data in the executable file, it reduces loading times.
If you want data to be initialized with certain data, then give it an initializer in your code. The compiler will then put it in the .data segment (initialized data) instead of .bss (uninitialized data). When the OS then loads the executable, it will allocate the memory for the data and then copy it in from the executable. This takes extra I/O, but your data is explicitly initialized how you want it.
Alternatively, you could leave the data stay in the .bss segment and then initialize it yourself at runtime. If the data is quick and easy to generate at runtime, it might be faster to recompute it at startup rather then read it off of disk. But those situations are probably rare.
I suspect that using the -z option is causing objdump to show you zeroes for the .bss, even though the zeroes are not actually in your binary. Try using od -t x4 to get a simple hexadecimal dump of what is really in the binary. If od shows you blocks of zeroes, then they really are in the binary.

Resources