memory allocation in data/bss/heap and stack - c

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>
int global_var;
int global_initialized_var=5;
void function(){
int stack_var;
printf("The function's stack_var is at address 0x%08x\n", &stack_var);
}
int main(){
int stack_var;
static int static_initialized_var = 5;
static int static_var;
int *heap_var_ptr;
heap_var_ptr = (int *) malloc(4);
// Next variables will be at data segment
printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var);
printf("static_initialized_var is at address 0x%08x\n\n", &static_initialized_var);
// These will be in the bss segment
printf("static_var is at address 0x%08x\n", &static_var);
printf("global_var is at address 0x%08x\n", &global_var);
// This will be in heap segment
printf("heap_var is at address 0x%08x\n\n", heap_var_ptr);
// These will be in stack segment
printf("stack_var is at address 0x%08x\n", &stack_var);
function();
}
I am getting back the following:
# ./memory_segments
global_initialized_var is at address 0x0804a018
static_initialized_var is at address 0x0804a01c
static_var is at address 0x0804a028
global_var is at address 0x0804a02c
heap_var is at address 0x09285008
stack_var is at address 0xbf809fbc
The function's stack_var is at address 0xbf809f8c
The first two variables are initialized (one static, one global), so they should be in the .data segment, whereas the other two, static_var and global_var, should be in the .bss segment. The addresses I am getting seem to imply that all four are in the same memory region. As a blind guess, I would say this is the .bss segment.
So my question is: am I right? And if I am, how can I find out where the limits of these regions (.bss, .data, etc.) are, i.e. where each one starts and ends?

Assuming you are compiling with something like gcc memaddr.c -g -o memaddr, you can use objdump -h to display size and address of your sections:
$ objdump -h memaddr | grep -e 'Size' -e '\.data' -e '\.bss'
Idx Name Size VMA LMA File off Algn
23 .data 00000018 0000000000601018 0000000000601018 00001018 2**3
24 .bss 00000018 0000000000601030 0000000000601030 00001030 2**3
$
Also you can use objdump -t to display addresses and sections your symbols belong in:
$ objdump -t memaddr | grep "_var"
000000000060102c l O .data 0000000000000004 static_initialized_var.2049
0000000000601040 l O .bss 0000000000000004 static_var.2050
0000000000601044 g O .bss 0000000000000004 global_var
0000000000601028 g O .data 0000000000000004 global_initialized_var
$
So we can see that the .data and .bss sections are fairly small and happen to lie next to each other, so it is not surprising the .data and .bss addresses are so close.

Related

Function address nearly the same as other variables addresses [duplicate]

Why are function addresses nearly the same as the address of static global variables or dynamically allocated variables? Here is the code for demonstration:
#include <stdio.h>
#include <stdlib.h>
int global_var;
int global_var1;
int global_var2;
static int st_var = 3;
void func()
{
return;
}
int main(void)
{
int x;
int* x_m = malloc(sizeof(int));
printf("Malloc: %p\n", x_m);
printf("Local: %p\n", &x);
printf("Function: %p\n", &func);
printf("Global: %p\n", &global_var);
printf("Global: %p\n", &global_var1);
printf("Global: %p\n", &global_var2);
printf("Static: %p\n", &st_var);
free(x_m);
return 0;
}
Output:
Malloc: 0x55bede9ce2a0
Local: 0x7ffdbc67b25c
Function: 0x55bede7151a9
Global: 0x55bede718024
Global: 0x55bede718030
Global: 0x55bede718020
Static: 0x55bede718010
Can somebody explain this? I thought that only global and static variables are stored in the .bss segment.
This is because, usually, the .text section (containing function code) and the .bss section of an ELF executable are mapped "relatively near" each other.
You can check this with readelf:
$ gcc prog.c
$ readelf -S a.out
There are 29 section headers, starting at offset 0x1ac0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[14] .text PROGBITS 00000000000007e0 000007e0
0000000000000302 0000000000000000 AX 0 0 16
...
[24] .bss NOBITS 0000000000201010 00001010
0000000000000010 0000000000000000 WA 0 0 8
...
You can see from above from the "Address" field of .text and .bss that they will be loaded 0x201010-0x7e0 = 0x200830 bytes apart in virtual memory when the program runs.
In any case, this does not mean that your code is in the .bss section or that your variables are in the .text section. They are in two different yet "relatively near" sections.
The distance between the two is arbitrary: the ELF specification dictates no real minimum or maximum. You could write your own linker script to place them farther apart if you really want.

Objcopy symbols are mixed or invalid in executable

As a simple example of my problem, let's say we have two data arrays to embed into an executable to be used in a C program: chars and shorts. These data arrays are stored on disk as chars.raw and shorts.raw.
Using objcopy I can create object files that contain the data.
objcopy --input binary --output elf64-x86-64 chars.raw char_data.o
objcopy --input binary --output elf64-x86-64 shorts.raw short_data.o
objdump shows that the data is correctly stored and exported as _binary_chars_raw_start, end, and size.
$ objdump -x char_data.o
char_data.o: file format elf64-x86-64
char_data.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 0000000e 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_chars_raw_start
000000000000000e g .data 0000000000000000 _binary_chars_raw_end
000000000000000e g *ABS* 0000000000000000 _binary_chars_raw_size
(Similar output for short_data.o)
However, when I link these object files with my code into an executable, I run into problems. For example:
#include <stdio.h>
extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;
extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;
int main(int argc, char **argv) {
printf("%ld == %ld\n", _binary_chars_raw_end - _binary_chars_raw_start, _binary_chars_raw_size / sizeof(char));
printf("%ld == %ld\n", _binary_shorts_raw_end - _binary_shorts_raw_start, _binary_shorts_raw_size / sizeof(short));
}
(compiled with gcc main.c char_data.o short_data.o -o main) prints
14 == 196608
7 == 98304
on my computer. The size _binary_chars_raw_size (and short) is not correct and I don't know why.
Similarly, if the _starts or _ends are used to initialize anything, then they may not even be located near each other in the executable (_end - _start is not equal to the size, and may even be negative).
What am I doing wrong?
The lines:
extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;
extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;
These symbols are not variables that hold values. They are labels that the linker places at the beginning and end of the region, so the addresses of these symbols are the start and end of the region. The same holds for _size: it is an absolute symbol whose address is the size. Do:
#include <stdint.h>
#include <stdio.h>
extern char _binary_chars_raw_start;
extern char _binary_chars_raw_end;
extern char _binary_chars_raw_size;
int main(void) {
// print ptrdiff_t with %td
printf("%td == %d\n",
// the __difference in addresses__ of these variables
&_binary_chars_raw_end - &_binary_chars_raw_start,
// the size is the __address__ of the _size symbol, not its contents
(int)(intptr_t)&_binary_chars_raw_size);
// note: also print size_t, like the result of `sizeof(..)`, with %zu
return 0;
}
Edit: _size is also an address, not a value, so take its address as well.

Cannot assign address of variable defined in linker script

I found a solution, although I don't understand what went wrong. Here is the original question. The solution is at the end.
I am following this Raspberry PI OS tutorial with a few tweaks. As the title says, one assignment appears to fail.
Here is my C code:
extern int32_t __end;
static int32_t *arena;
void init() {
arena = &__end;
assert(0 != arena); // fails
...
The assert triggers! Surely the address shouldn't be 0. __end is declared in my linker script:
ENTRY(_start)
SECTIONS
{
/* Starts at LOADER_ADDR. 0x8000 is a convention. */
. = 0x8000;
__start = .;
.text : {
*(.text)
}
.rodata : { *(.rodata) }
.data : { *(.data) }
/* Define __bss_start and __bss_end for boot.s to set to 0 */
__bss_start=.;
.bss : { *(.bss) }
__bss_end=.;
/* First usable address for the allocator */
. = ALIGN(4);
__end = .;
}
Investigating in GDB (running it in QEMU):
Thread 1 hit Breakpoint 1, init () at os.c:75
75 arena = &__end;
(gdb) p &__end
$1 = (int32_t *) 0x9440
(gdb) p arena
$2 = (int32_t *) 0x0
(gdb) n
76 assert(0 != arena);
(gdb) p arena
$3 = (int32_t *) 0x0
GDB can find __end but my program cannot?
Here are a few other things I tried:
the tutorial's code works without an issue (implying that QEMU and the ARM compiler are working)
the assertion still fails when running without GDB (implying GDB is not the issue)
I am able to assign 0xccc to arena (implying arena is not the issue)
I am not able to assign &__end to a local variable (implying &__end is the issue).
As requested in the comments, this is how I tried to assign to a local variable:
void* arena2 = (void*)&__end;
assert(0 != arena2);
The assertion fails. In GDB:
Thread 1 hit Breakpoint 1, mem_init () at mem.c:77
77 void* arena2 = (void*)&__end;
(gdb) p arena2
$1 = (void *) 0x13
(gdb) p &__end
$2 = (int32_t *) 0x94a4
(gdb) n
78 assert(0 != arena2);
(gdb) p arena2
$3 = (void *) 0x0
(gdb) p &__end
$4 = (int32_t *) 0x94a4
assert(0 != &__end); succeeds (implying &__end is not the issue?)
N.B. This version of assert is not the same as the one in assert.h, but I don't think it causes the problem. It just checks a condition, prints the condition, and goes to a breakpoint. I can reproduce the issue in GDB with the assert commented out.
N.B.2. I previously included the ARM assembly of the C code in case there was a compiler bug
My solution is to edit the linker script to:
ENTRY(_start)
SECTIONS
{
/* Starts at LOADER_ADDR. 0x8000 is a convention. */
. = 0x8000;
__start = .;
.text : {
*(.text)
}
. = ALIGN(4096);
.rodata : { *(.rodata) }
. = ALIGN(4096);
.data : { *(.data) }
. = ALIGN(4096);
/* Define __bss_start and __bss_end for boot.s to set to 0 */
__bss_start = .;
.bss : { *(.bss) }
. = ALIGN(4096);
__bss_end = .;
/* First usable address for the allocator */
. = ALIGN(4096);
__end = .;
}
I don't understand why the additional ALIGNs are important.
The problem you're having here is because the "clear the BSS" loop in boot.S is also clearing some of the compiler-generated data in the ELF file that the C code is using at runtime. Notably, it is accidentally zeroing out the GOT (global offset table) which is in the .got ELF section and which is where the actual address of the __end label has been placed by the linker. So the linker correctly fills in the address in the ELF file, but then the boot.S code zeroes it, and when you try to read it from C then you get zero rather than what you were expecting.
Adding all that alignment in the linker script is probably working around this by coincidentally causing the GOT to not be in the area that gets zeroed.
You can see where the linker has put things by using 'objdump -x myos.elf'. In my test case based on the tutorial you link I see a SYMBOL TABLE which includes among other entries:
000080d4 l .bss 00000004 arena
00000000 l df *ABS* 00000000
000080c8 l O .got.plt 00000000 _GLOBAL_OFFSET_TABLE_
000080d8 g .bss 00000000 __bss_end
0000800c g F .text 00000060 kernel_main
00008000 g .text 00000000 __start
0000806c g .text.boot 00000000 _start
000080d8 g .bss 00000000 __end
00008000 g F .text 0000000c panic
000080c4 g .text.boot 00000000 __bss_start
So you can see that the linker script has set __bss_start to 0x80c4 and __bss_end to 0x80d8, which is a pity because the GOT is at 0x80c4/0x80c8. I think what has happened here is that because you didn't specify explicitly in your linker script where to put the .got and .got.plt sections, the linker has decided to put them after the __bss_start assignment and before the .bss section, so they get covered by the zeroing code.
You can see what the ELF file contents of the .got are with 'objdump --disassemble-all myos.elf', which among other things includes:
Disassembly of section .got:
000080c4 <.got>:
80c4: 000080d8 ldrdeq r8, [r0], -r8 ; <UNPREDICTABLE>
so you can see we have one GOT table entry, whose contents are the address 0x80d8 which is the __end value we want. When the boot.S code zeroes this out your C code reads a 0 rather than the constant it was expecting.
You should probably ensure that the bss start/end are at least 16-byte aligned, because the boot.S code works via a loop that clears 16 bytes at a time, but I think that if you fix your linker script to explicitly put the .got and .got.plt sections somewhere then you'll find you don't need the 4K alignments everywhere.
FWIW, I diagnosed this using: (1) the QEMU "-d in_asm,cpu,exec,int,unimp,guest_errors -singlestep" options to get a dump of register state and instruction execution and (2) objdump of the ELF file to figure out what the compiler's generated code was actually doing. I had a suspicion this was going to turn out to be either "accidentally zeroed data we shouldn't have" or "failed to include in the image or otherwise initialize data we should have" kind of bug, and so it turned out.
Oh, and the reason GDB was printing the right value for __end when your code wasn't was that GDB could just look directly in the debug/symbol info in the ELF file for the answer; it wasn't doing it by going via the in-memory GOT.

Where does a C compiler place the addresses of globals, statics and string literals?

First I read that the addresses are in .data and that .text holds string literals (plus machine code, I suppose). Then, in another article, someone said this has changed and string literals no longer live in .text but in .rodata instead (which matches the output of my clang compiler). But the contents of .data do not match the address I print with printf in my C program.
Assume this C program:
#include <stdio.h>
static int a;
int main()
{
printf("my address = %p\n", (void *)&a);
return 0;
}
output of this C program:
$ ./a.out
my address = 0x804a01c
And then contents of .data section:
$ objdump -s -j .data a.out
a.out: file format elf32-i386
Contents of section .data:
804a00c 00000000 00000000
There's no 0x804a01c in these contents. Where does the address live?
First I read that the addresses are in .data and that .text holds string literals (plus machine code, I suppose). Then, in another article, someone said this has changed and string literals no longer live in .text but in .rodata instead
It's up to the compiler to decide where it wants to put string literals (which are not machine code).
Most modern compilers will put string literals into .rodata section, which is usually linked into the first PT_LOAD segment, together with .text, .ctors and other read-only sections.
There's no 0x804a01c in these contents. Where does the address live?
In .bss. If you want a to reside in .data, you need to initialize it. E.g.
static int a = 42;
Could for example, a string literal be put in .rodata and its address into .data?
Sure:
cat t.c
const char string_literal[] = "abcdefgh"; // in .rodata
const char *p_string_literal = string_literal; // in .data
int main() { return 0; }
gcc -m32 t.c
readelf -x.rodata a.out
Hex dump of section '.rodata':
0x08048488 03000000 01000200 61626364 65666768 ........abcdefgh
0x08048498 00 .
readelf -x.data a.out
Hex dump of section '.data':
0x0804a008 00000000 00000000 90840408 ............
Note: the address of string_literal -- 0x08048490 is spelled "backwards" in .data because x86 is little-endian.
Variables which have static storage duration, i.e., static and global variables, are allocated in the data segment or the bss segment depending on whether they are zero-initialized (bss segment) or not (data segment).
Uninitialized static data is always 0 initialized by default. Therefore,
static int a;
is default-initialized to 0 and it goes in the bss segment. String literals are read-only data and are normally stored in a read-only section such as .rodata, which is typically loaded as part of the same segment as the program text.

If a global variable is initialized to 0, will it go to BSS?

All initialized global/static variables go to the initialized data section.
All uninitialized global/static variables go to the uninitialized data section (BSS). The variables in BSS get the value 0 at program load time.
If a global variable is explicitly initialized to zero (int myglobal = 0), where will that variable be stored?
The compiler is free to put such a variable into either bss or data. For example, GCC has a special option controlling this behavior:
-fno-zero-initialized-in-bss
If the target supports a BSS section, GCC by default puts variables that are initialized to zero into BSS. This
can save space in the resulting code. This option turns off this
behavior because some programs explicitly rely on variables going to
the data section. E.g., so that the resulting executable can find the
beginning of that section and/or make assumptions based on that.
The default is -fzero-initialized-in-bss.
Tried with the following example (test.c file):
int put_me_somewhere = 0;
int main(int argc, char* argv[]) { return 0; }
Compiling with no options (implicitly -fzero-initialized-in-bss):
$ touch test.c && make test && objdump -x test | grep put_me_somewhere
cc test.c -o test
0000000000601028 g O .bss 0000000000000004 put_me_somewhere
Compiling with -fno-zero-initialized-in-bss option:
$ touch test.c && make test CFLAGS=-fno-zero-initialized-in-bss && objdump -x test | grep put_me_somewhere
cc -fno-zero-initialized-in-bss test.c -o test
0000000000601018 g O .data 0000000000000004 put_me_somewhere
It's easy enough to test for a specific compiler:
$ cat bss.c
int global_no_value;
int global_initialized = 0;
int main(int argc, char* argv[]) {
return 0;
}
$ make bss
cc bss.c -o bss
$ readelf -s bss | grep global_
32: 0000000000400420 0 FUNC LOCAL DEFAULT 13 __do_global_dtors_aux
40: 0000000000400570 0 FUNC LOCAL DEFAULT 13 __do_global_ctors_aux
55: 0000000000601028 4 OBJECT GLOBAL DEFAULT 25 global_initialized
60: 000000000060102c 4 OBJECT GLOBAL DEFAULT 25 global_no_value
We're looking for the location of 0000000000601028 and 000000000060102c:
$ readelf -S bss
There are 30 section headers, starting at offset 0x1170:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[24] .data PROGBITS 0000000000601008 00001008
0000000000000010 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000601018 00001018
0000000000000018 0000000000000000 WA 0 0 8
It looks like both values are stored in the .bss section on my system: gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4).
The behavior depends on the C implementation. The variable may end up in either .data or .bss. To increase the chances that it does not end up in .data, taking up redundant space, it's better not to explicitly initialize it to 0, since an object of static storage duration is set to 0 anyway.

Resources