Why does .bss explicitly initialize global variables to zero? - c

I am generating MIPS disassembly in order to simulate it. I need a large data set to work on, but I don't want large assembly files, so I wanted to use a big uninitialized array (and possibly initialize it in my simulator). That means the array has to be global. Global variables seem to be placed in the .bss section, which is zero-filled when the page is actually accessed.
The problem is that in my binary the array is in the .bss section but is explicitly filled with zeroes. That is not the behaviour I expected from what I found on the internet. Is there a way to tell the compiler (or linker, or loader; I don't really understand which one does what here) not to actually put zeroes in this array?
Alternatively, is there a compiler option or a C construct that says the array should not be initialized with 0? (I tried changing the array's section with an attribute, but it is still initialized with 0.)
By the way, I am generating my disassembly file with objdump. It normally skips blocks of zeroes, but I really need the other blocks of zeroes to be disassembled, so I am using the "-z" option.
What I really don't understand is that everywhere I looked, it was said that the .bss section doesn't actually put zeroes in the binary file.

The data for the .bss section isn't stored in the compiled object files because, well, there is no data—the compiler puts variables in that segment precisely because they should be zero-initialized.
When the OS loads the executable, it just looks at the size of the .bss segment, allocates that much memory, and zero-initializes it for you. By not storing that data in the executable file, it reduces loading times.
If you want data to be initialized with certain data, then give it an initializer in your code. The compiler will then put it in the .data segment (initialized data) instead of .bss (uninitialized data). When the OS then loads the executable, it will allocate the memory for the data and then copy it in from the executable. This takes extra I/O, but your data is explicitly initialized how you want it.
Alternatively, you could leave the data in the .bss segment and then initialize it yourself at runtime. If the data is quick and easy to generate at runtime, it might be faster to recompute it at startup rather than read it off disk. But those situations are probably rare.
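As a rough sketch of the options described in this answer: the names big_zeroed, small_table and fill_at_runtime are made up for illustration, and the section placement mentioned in the comments is what GCC-style toolchains typically do, not something the C standard guarantees.

```c
#include <stddef.h>
#include <stdint.h>

#define BIG_WORDS 4096u

/* No initializer: the C language zero-initializes static storage, and
   toolchains typically place this array in .bss (a size, but no bytes,
   in the file). */
uint32_t big_zeroed[BIG_WORDS];

/* Non-zero initializer: typically placed in .data, so these bytes are
   stored in the executable and copied in at load time. */
uint32_t small_table[4] = { 1u, 2u, 3u, 5u };

/* Hypothetical helper: fill the .bss array at startup instead of storing
   its contents in the file. The pattern (odd numbers) is arbitrary. */
void fill_at_runtime(uint32_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = (uint32_t)(i * 2u + 1u);
    }
}
```

Calling fill_at_runtime(big_zeroed, BIG_WORDS) early in the program gives the array its contents without adding them to the file. Inspecting the section sizes of the compiled object (for example with readelf -S) should show whether your toolchain really places the two arrays as the comments assume.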

I suspect that using the -z option is causing objdump to show you zeroes for the .bss, even though the zeroes are not actually in your binary. Try using od -t x4 to get a simple hexadecimal dump of what is really in the binary. If od shows you blocks of zeroes, then they really are in the binary.

Related

Why is there no content for the .bss section in an object (ELF) file?

This question confused me a lot. As far as I know, the .bss section is for saving data that is initialized but not used yet. But I don't understand what 'content' means here, and why there is no content.
Thanks for any help!
The quick response is: Well, there's no content to fill the .bss with, so there's no sense in putting any data on the executable in relation to that section. Only the positions of the variables are stored, but that belongs to another ELF section.
The .bss section is where your program keeps all its uninitialized variables (which are all initialized to zero by default). The linker only needs to know the size of this region and the positions of the variables in it, not their values, because the contents are the same regardless of which variables are placed there or how they are laid out.
When your program is loaded, the kernel normally assigns a read-only segment for the unmodifiable text of the program (.text section) and also puts the contents of the initialized const variables (.rodata section) in that segment, so if you attempt to modify something there, you get an exception. Then comes the data segment, holding the initial values of all the initialized variables of your program (.data section) followed by the uninitialized ones (.bss section).
The data segment (note that I distinguish between a section and a load segment) is given enough space for the sum of the .data and .bss sections, so that it can hold all the variables. But while the contents of the .data section have to be filled in from the file, the contents of the .bss section don't, because the operating system zeroes them before allowing the user process to access the allocated segment. That's not true for small systems, where the operating system doesn't fill the memory with zeros; there, the compiler adds some startup code to zero the whole .bss region, so again there's no need to copy any data from the executable file.
The historic (and main) reason for this behaviour is that the pages the kernel assigns to your program are cleared to zero for security reasons (so you cannot, by luck, get a page full of other users' passwords or other sensitive information). There's therefore no reason to fill them with zeros again, nothing has to be copied there, and there's no reason to put anything in the executable file. Pages that the kernel keeps for itself are normally zeroed only when they are about to be handed to a user process; otherwise they retain their contents until they are overwritten.
There's no content in the BSS (Block Started by Symbol) section because it would be wasted storage. The contents of the BSS are all zeros, and it is cleared by the startup code before main is called. Think of the BSS as a run-length compressed block of bytes. All you need to know to uncompress that block is the value (0) and the length, which is stored in the ELF entry for the BSS.
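The run-length analogy can be made concrete with a small sketch. The struct zero_run and the function expand_zero_run are hypothetical names invented here; they only mimic the idea that a length plus a fill value is enough to rebuild the block, and are not how any real loader is written.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for what a section header records about .bss:
   only a length (and, implicitly, the fill value 0), no payload bytes. */
struct zero_run {
    size_t length;        /* bytes the section occupies in memory */
    unsigned char value;  /* fill value; always 0 for .bss */
};

/* "Decompress" the run into freshly allocated memory.
   Returns NULL if allocation fails; the caller frees the result. */
unsigned char *expand_zero_run(const struct zero_run *run)
{
    unsigned char *mem = malloc(run->length);
    if (mem != NULL) {
        memset(mem, run->value, run->length);
    }
    return mem;
}
```

The descriptor stays a few bytes long no matter how large length is, which is the whole point of not storing the zeros in the file.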
Your notion of "data that [is] initialized but not used yet" is a bit off. Consider that all sections in an ELF file are somehow "not used yet". The text segment may or may not become used (it may contain dead/unreachable code). The data segment may or may not be used at all (you can define objects never used by code).

How to reserve a fixed flash section for data?

I need to store some large chunks of data in flash memory, where it will be read often and occasionally be rewritten using SPM. I already figured out how to use pointers to __flash and pgm_read_byte to access it, how not to omit the const (despite my writing to it), how to actually access the array in a loop so that it doesn't get completely optimised away (after inlining), but I don't really understand how to declare my array.
const uint8_t persistent_data[1024] __attribute__(( aligned(SPM_PAGESIZE),
section("mycustomdata") )) = {};
works more or less fine, except that I do not want to initialise it. When programming my device (an Arduino ATmega328P), I want this section to be kept so that it retains the data previously written by the application. The above does zero-initialise it, and my hex file contains zeroes that the programmer happily uses to overwrite my data.
Using the __flash modifier instead of __attribute__(( section("…") )) does about the same here, except that it places the array elsewhere and I don't have any control over where it is put. It still does this when I use __flash and omit the initialisation (though I get an "uninitialized variable 'persistent_data' put into program memory area [-Wuninitialized]" warning).
Now I am trying to omit the initialiser:
const uint8_t persistent_data[1024] __attribute__(( aligned(SPM_PAGESIZE),
section("mycustomdata") ));
and get rather unexpected results. The sections data from the .lss output shows
Idx Name          Size      VMA       LMA       File off  Algn
  …
  1 mycustomdata  00000480  00800480  000055e2  00005700  2**7
                  CONTENTS, ALLOC, LOAD, DATA
  2 .text         00005280  00000000  00000000  000000d4  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
This does put all the initialisation zeroes in the hex file at the load memory address 55E2 (instead of omitting them), while the virtual memory address (which the variable persistent_data points to) refers to 0480 - in the middle of the code from the text section!
(I also tried to omit the const, and to omit the const and the initialiser, which both had the same effect as omitting only the initialiser).
I am at a loss. Do I need to use extern maybe? (Any attempt at doing so ended up with an "undefined reference to persistent_data" error.) Do I need to use a linker script?
How do I make persistent_data refer to a location in program memory that is not used by any other data, and have the compiler not emit any initialisation data for that location in the hex file?
You don't seem to realize that you actually need two versions of your hex file - one that is suitable for a "new" installation on a new (or worse: re-used, thus with random flash content) chip that initializes the flash section to make sure there is no arbitrary data in there that might be interpreted, and another one used to update a pre-programmed chip that misses this section in order to keep data already modified by your users. So, you are going to need the version that initializes this section anyhow.
The simplest way to achieve this is like your first example: initialize the data to build the "naked chip" version of your code, and produce the "update" version by simply removing this initialized section from the object file with objcopy (assuming you use a GNU toolchain). See the -R option of this tool.
Also, make sure this data section is located at a fixed address - you don't want it to move every time you change something in your code.
I would rather try and use EEPROM if available than go through the hassle of reprogramming.

Make ARM Fromelf Output Binary With Zero Initialized Data

I have some data that are zero initialized.
I have allocated an execution region for them in the scatter file.
Some_Execution_Region +0
{
stuff.o (+RO, +RW, +RI)
}
But I don't see any segments of zero initialized data in the resulting binary after using fromelf to convert it from the axf file.
The binary file stops right before where the zero initialized data should start.
So the question is how I can make fromelf generate empty region for the zero initialized data in the binary file.
I've looked up on the ARM site and have had no luck. I only found out some option to disable zero initialized data. (Doesn't this mean since I'm not using that option, I should get my zero initialized data in my binary?)
I currently just run fromelf.exe --bin --output=binary.bin elffile.axf, which doesn't generate the zero data.
There's no need to actually store all of the zeros in the binary. If you are using C, static variables that are not explicitly initialized to some other value are zero-initialized by default. The C runtime code doesn't copy zeros from the executable to RAM; it just gets the beginning and ending addresses of the part of RAM that needs to be zeroed. The typical way of telling the linker that variables must be zeroed at startup is to put them in the .bss segment (the name historically stands for "block started by symbol"), but you shouldn't need to do this explicitly unless you are writing assembly code.
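A minimal sketch of that zeroing step, assuming the runtime is handed the two addresses. On a real target they would come from symbols defined by the linker script or scatter file; those symbol names differ between toolchains, so the hypothetical function zero_region below simply takes them as parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero every byte in [start, end). In real startup code, start and end
   would be the addresses of toolchain-defined symbols marking the
   zero-initialized region; here they are plain parameters. */
void zero_region(uint8_t *start, const uint8_t *end)
{
    uint8_t *p = start;
    while (p != end) {
        *p++ = 0u;
    }
}
```

Nothing is read from the image here: the only inputs are two addresses, which is why the zeros never need to appear in the binary file.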

Location of variables in C

I'm trying to understand how C allocates memory to global variables.
I'm working on a simple Kernel. So far it can't do much more than print to screen and enable interrupts. I'm now working on a basic physical memory manager.
My memory manager is a bitmap that sets a 1 or 0 if memory is allocated or available. I need to add the memory that my Kernel is using to the bitmap as 'allocated', so nothing overwrites it.
I can easily find out the start of the Kernel, as it's statically loaded to 0x100000. Figuring out the length shouldn't be too difficult either. The part I'm not sure about is where global variables are put in memory?
Let's say my Kernel is 12K, I can then allocate these 3x 4K blocks of memory to it for protection. Do I need to allocate more to cover the variables it uses? Or are the variables part of that 12K?
Thank you for your help, I hope I am making enough sense.
have a look at
http://www.geeksforgeeks.org/archives/14268
your globals mostly are in the BSS
As the previous answer says, most variables are stored in the .bss section, but they can also be stored in the .data or .rodata section, depending on whether you gave the global variables a non-zero initializer or declared them const. After compiling you can use readelf -S kernel.bin to see exactly how much space each section will use. The .bss section only occupies memory once the binary is loaded and does not take any space on disk. This means that your compiled kernel binary will be smaller than the amount of memory it will later use when brought into memory (usually by grub).
A simple way to figure out exactly how much data your kernel will use, besides using readelf, is to place the .bss section inside the .data section within your linker script. The kernel binary will then be the same size on disk and in memory (or actually a bit smaller in memory, since not all sections are copied by grub), but then at least you know the minimum amount of memory you need to allocate.
I'd recommend using a custom linker script (assuming you use gcc): it makes the layout of kernel sections explicit and customizable (to read more about linker scripts, read info ld). You can see an example of my OS's linker script here.
To see the default linker script use -v/--verbose option of ld.
Global variables are mostly located in the .data.* and .rodata.* sections; variables initialized with 0 go in .bss.
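To tie these answers back to the bitmap in the question, here is a hedged sketch of reserving the kernel's pages once the end of .bss is known. The functions pages_spanned and mark_range_used are invented for illustration, and the end address is assumed to come from a symbol your linker script places after .bss; it is passed in as a plain number here.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Number of pages needed to cover the byte range [start, end). */
size_t pages_spanned(uintptr_t start, uintptr_t end)
{
    uintptr_t first = start / PAGE_SIZE;
    uintptr_t last = (end + PAGE_SIZE - 1u) / PAGE_SIZE;  /* round up */
    return (size_t)(last - first);
}

/* Mark every page touched by [start, end) as allocated (bit = 1).
   The bitmap must be large enough to hold the highest page index. */
void mark_range_used(uint8_t *bitmap, uintptr_t start, uintptr_t end)
{
    uintptr_t first = start / PAGE_SIZE;
    size_t count = pages_spanned(start, end);
    for (size_t i = 0; i < count; i++) {
        size_t page = (size_t)first + i;
        bitmap[page / 8u] |= (uint8_t)(1u << (page % 8u));
    }
}
```

With a kernel loaded at 0x100000 whose file image is 12K, pages_spanned reports 3 pages if the range ends at the file size. If the range is extended by, say, 5000 bytes of .bss (a made-up figure), it reports 5, which is the difference the answers above are warning about.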

Is there a way to know where global and static variables reside inside the data segment (.data + .bss)?

I want to dump all global and static variables to a file and load them back on the next program invocation. A solution I thought of is to dump the .data segment to a file. But the .data segment on a 32-bit machine spans a 2^32 address space (4GB). In which part of this address space do the variables reside? How do I know which part of the .data segment I should dump?
And when loading the dumped file, I guess that since the variables are referenced by offset in the data segment, it will be safe to just memcpy the whole dump to the alleged starting point of the "variables area". Please correct me if I am wrong.
EDIT
A good start is this question.
Your problem is how to find the beginning and the end of the data segment. I am not sure how to do this, but I could give you a couple of ideas.
If all your data are relatively self-contained, (they are declared within the same module, not in separate modules,) you might be able to declare them within some kind of structure, so the beginning will be the address of the structure, and the end will be some variable that you will declare right after the structure. If I remember well, MASM had a "RECORD" directive or something like that which you could use to group variables together.
Alternatively, you may be able to declare two additional modules, one with a variable called "beginning" and another with a variable called "end", and make sure that the first gets linked before anything else, and the second gets linked after everything else. This way, these variables might actually end up marking the beginning and the end of the data segment. But I am not sure about this, I am just giving you a pointer.
One thing to remember is that your data will inevitably contain pointers, so saving and loading all your data will only work if the OS under which you are running can guarantee that your program will always be loaded in the same address. If not, forget it. But if you can have this guarantee, then yes, loading the data should work. You should not even need a memcpy, just set the buffer for the read operation to be the beginning of the data segment.
The state of an entire program can be very complicated, and will not only involve variables but values in registers. You'll almost certainly be better off keeping track of what data you want to store and then storing it to a file yourself. This can be relatively painless with the right setup and encapsulation. Then when you resume the application, read in the program state and resume.
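One possible shape for that approach is sketched below. The struct app_state and its fields are purely hypothetical; the point is only that the saved state is a chosen set of plain values, written as text, with no pointers that would depend on where the program is loaded.

```c
#include <stdio.h>

/* Hypothetical program state: only plain values, no pointers, so the
   saved file does not depend on the program's load address. */
struct app_state {
    int level;
    long score;
    unsigned lives;
};

/* Write the state as one line of text. Returns 0 on success, -1 on failure. */
int save_state(FILE *out, const struct app_state *s)
{
    return fprintf(out, "%d %ld %u\n", s->level, s->score, s->lives) < 0 ? -1 : 0;
}

/* Read the state back. Returns 0 on success, -1 if the input is malformed. */
int load_state(FILE *in, struct app_state *s)
{
    return fscanf(in, "%d %ld %u", &s->level, &s->score, &s->lives) == 3 ? 0 : -1;
}
```

At shutdown the program would call save_state on a file it opened for writing, and at the next start it would call load_state before resuming. A text format also survives recompilation, which a raw dump of the data segment generally would not.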
Assuming you are using GNU tools (gcc, binutils), look at the linker scripts that embedded developers use, such as GBA developers and microcontroller developers working from ROM (yagarto or devkit-arm, for example). In the linker script they surround the segments of interest with variables that they can use elsewhere in their code. For ROM-based software, for example, you specify the data segment with a "ram AT rom" or "rom AT ram" placement in the linker script, meaning: link as if the data segment is in RAM at this address, but also link the data itself into ROM at that address. The boot code then copies the .data segment from ROM to RAM using these variables. I don't see why you couldn't do the same thing: have the compiler/linker tools tell you where things are, then at run time use those variables to grab the data from memory and save it somewhere to hibernate or shut down, and later restore that data from wherever it was saved. The variables you use to perform the restore should of course not be part of the .data segment, or you will trash the variables you are using to restore the segment.
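A sketch of the save and restore half of that idea, under the assumption that your linker script already gives you two addresses bracketing the region. The symbol names depend entirely on the script, so the hypothetical functions save_region and restore_region take the bounds as parameters rather than naming any symbols.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the bytes in [start, end) to out. Returns 0 on success, -1 on failure. */
int save_region(FILE *out, const unsigned char *start, const unsigned char *end)
{
    size_t len = (size_t)(end - start);
    return fwrite(start, 1, len, out) == len ? 0 : -1;
}

/* Read bytes from in back into [start, end). Returns 0 on success, -1 on failure. */
int restore_region(FILE *in, unsigned char *start, unsigned char *end)
{
    size_t len = (size_t)(end - start);
    return fread(start, 1, len, in) == len ? 0 : -1;
}
```

As the answer warns, anything these functions rely on while restoring (the FILE handle, the bounds themselves) has to live outside the region being overwritten, and the approach only makes sense if the region lands at the same addresses on every run.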
In response to your header question, on Windows, the location and size of the data and bss segments can be obtained from the in-memory PE header. How that is laid out and how to parse it is documented in this specification:
http://msdn.microsoft.com/en-us/windows/hardware/gg463119
I do not believe there is a guarantee that you will get the same sequence of variables on every execution, so the offsets may point to the wrong content.
