Function address nearly the same as other variables addresses [duplicate] - c

This question already has answers here:
Possible to know section of memory a variable is located?
(2 answers)
Closed 2 years ago.
Why are function addresses nearly the same as the address of static global variables or dynamically allocated variables? Here is the code for demonstration:
#include <stdio.h>
#include <stdlib.h>
int global_var;
int global_var1;
int global_var2;
static int st_var = 3;
void func()
{
return;
}
int main(void)
{
int x;
int* x_m = malloc(sizeof(int));
printf("Malloc: %p\n", x_m);
printf("Local: %p\n", &x);
printf("Function: %p\n", &func);
printf("Global: %p\n", &global_var);
printf("Global: %p\n", &global_var1);
printf("Global: %p\n", &global_var2);
printf("Static: %p\n", &st_var);
free(x_m);
return 0;
}
Output:
Malloc: 0x55bede9ce2a0
Local: 0x7ffdbc67b25c
Function: 0x55bede7151a9
Global: 0x55bede718024
Global: 0x55bede718030
Global: 0x55bede718020
Static: 0x55bede718010
Can somebody explain this? Because I thought that just global and static variables are stored into the .bss segment.

This is because, usually, the .text section (containing function code) and the .bss section of an ELF executable are mapped "relatively near" each other.
You can check this with readelf:
$ gcc prog.c
$ readelf -S a.out
There are 29 section headers, starting at offset 0x1ac0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[14] .text PROGBITS 00000000000007e0 000007e0
0000000000000302 0000000000000000 AX 0 0 16
...
[24] .bss NOBITS 0000000000201010 00001010
0000000000000010 0000000000000000 WA 0 0 8
...
You can see from above from the "Address" field of .text and .bss that they will be loaded 0x201010-0x7e0 = 0x200830 bytes apart in virtual memory when the program runs.
In any case, this does not mean that your code is in the .bss section or that your variables are in the .text section. They are in two different yet "relatively near" sections.
The distance between the two is arbitrary, there is no real minimum or maximum requirement dictated by the ELF specification. You could write your own linker script to place them farther away if you really want.

Related

Objcopy symbols are mixed or invalid in executable

As a simple example of my problem, let's say we have two data arrays to embed into an executable to be used in a C program: chars and shorts. These data arrays are stored on disk as chars.raw and shorts.raw.
Using objcopy I can create object files that contain the data.
objcopy --input binary --output elf64-x86-64 chars.raw char_data.o
objcopy --input binary --output elf64-x86-64 shorts.raw short_data.o
objdump shows that the data is correctly stored and exported as _binary_chars_raw_start, end, and size.
$ objdump -x char_data.o
char_data.o: file format elf64-x86-64
char_data.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 0000000e 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_chars_raw_start
000000000000000e g .data 0000000000000000 _binary_chars_raw_end
000000000000000e g *ABS* 0000000000000000 _binary_chars_raw_size
(Similar output for short_data.o)
However, when I link these object files with my code into an executable, I run into problems. For example:
#include <stdio.h>
extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;
extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;
int main(int argc, char **argv) {
printf("%ld == %ld\n", _binary_chars_raw_end - _binary_chars_raw_start, _binary_chars_raw_size / sizeof(char));
printf("%ld == %ld\n", _binary_shorts_raw_end - _binary_shorts_raw_start, _binary_shorts_raw_size / sizeof(short));
}
(compiled with gcc main.c char_data.o short_data.o -o main) prints
14 == 196608
7 == 98304
on my computer. The size _binary_chars_raw_size (and short) is not correct and I don't know why.
Similarly, if the _starts or _ends are used to initialize anything, then they may not even be located near each other in the executable (_end - _start is not equal to the size, and may even be negative).
What am I doing wrong?
The lines:
extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;
extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;
They are not variables themselves. They are variables that are placed themselves at the beginning and end of the region. So the addresses of these variables are the start and end of the region. Do:
#include <stdio.h>
extern char _binary_chars_raw_start;
extern char _binary_chars_raw_end;
extern char _binary_chars_raw_size;
// print ptrdiff_t with %td
printf("%td == %d\n",
// the __difference in addresses__ of these variables
&_binary_chars_raw_end - &_binary_chars_raw_start,
(int)&_binary_chars_raw_size);
// note: alsoo print size_t like result of `sizeof(..)` with %zu
#edit _size is also a pointer

Advantage of different data section in c [duplicate]

When checking the disassembly of the object file through the readelf, I see the data and the bss segments contain the same offset address.
The data section will contain the initialized global and static variables. BSS will contain un-initialized global and static variables.
1 #include<stdio.h>
2
3 static void display(int i, int* ptr);
4
5 int main(){
6 int x = 5;
7 int* xptr = &x;
8 printf("\n In main() program! \n");
9 printf("\n x address : 0x%x x value : %d \n",(unsigned int)&x,x);
10 printf("\n xptr points to : 0x%x xptr value : %d \n",(unsigned int)xptr,*xptr);
11 display(x,xptr);
12 return 0;
13 }
14
15 void display(int y,int* yptr){
16 char var[7] = "ABCDEF";
17 printf("\n In display() function \n");
18 printf("\n y value : %d y address : 0x%x \n",y,(unsigned int)&y);
19 printf("\n yptr points to : 0x%x yptr value : %d \n",(unsigned int)yptr,*yptr);
20 }
output:
SSS:~$ size a.out
text data bss dec hex filename
1311 260 8 1579 62b a.out
Here in the above program I don't have any un-initialized data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes?
Also when I disassemble the object file,
EDITED :
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000110 0000cf 00 A 0 0 4
data, rodata and bss has the same offset address. Does it mean the rodata, data and bss refer to the same address?
Do Data section, rodata section and the bss section contain the data values in the same address, if so how to distinguish the data section, bss section and rodata section?
The .bss section is guaranteed to be all zeros when the program is loaded into memory. So any global data that is uninitialized, or initialized to zero is placed in the .bss section. For example:
static int g_myGlobal = 0; // <--- in .bss section
The nice part about this is, the .bss section data doesn't have to be included in the ELF file on disk (ie. there isn't a whole region of zeros in the file for the .bss section). Instead, the loader knows from the section headers how much to allocate for the .bss section, and simply zero it out before handing control over to your program.
Notice the readelf output:
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
.data is marked as PROGBITS. That means there are "bits" of program data in the ELF file that the loader needs to read out into memory for you. .bss on the other hand is marked NOBITS, meaning there's nothing in the file that needs to be read into memory as part of the load.
Example:
// bss.c
static int g_myGlobal = 0;
int main(int argc, char** argv)
{
return 0;
}
Compile it with $ gcc -m32 -Xlinker -Map=bss.map -o bss bss.c
Look at the section headers with $ readelf -S bss
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
:
[13] .text PROGBITS 080482d0 0002d0 000174 00 AX 0 0 16
:
[24] .data PROGBITS 0804964c 00064c 000004 00 WA 0 0 4
[25] .bss NOBITS 08049650 000650 000008 00 WA 0 0 4
:
Now we look for our variable in the symbol table: $ readelf -s bss | grep g_myGlobal
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
Note that g_myGlobal is shown to be a part of section 25. If we look back in the section headers, we see that 25 is .bss.
To answer your real question:
Here in the above program I dont have any un-intialised data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes ?
Continuing with my example, we look for any symbol in section 25:
$ readelf -s bss | grep 25
9: 0804825c 0 SECTION LOCAL DEFAULT 9
25: 08049650 0 SECTION LOCAL DEFAULT 25
32: 08049650 1 OBJECT LOCAL DEFAULT 25 completed.5745
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
The third column is the size. We see our expected 4-byte g_myGlobal, and this 1-byte completed.5745. This is probably a function-static variable from somewhere in the C runtime initialization - remember, a lot of "stuff" happens before main() is ever called.
4+1=5 bytes. However, if we look back at the .bss section header, we see the last column Al is 4. That is the section alignment, meaning this section, when loaded, will always be a multiple of 4 bytes. The next multiple up from 5 is 8, and that's why the .bss section is 8 bytes.
Additionally We can look at the map file generated by the linker to see what object files got placed where in the final output.
.bss 0x0000000008049650 0x8
*(.dynbss)
.dynbss 0x0000000000000000 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
*(.bss .bss.* .gnu.linkonce.b.*)
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crti.o
.bss 0x0000000008049650 0x1 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtbegin.o
.bss 0x0000000008049654 0x4 /tmp/ccKF6q1g.o
.bss 0x0000000008049658 0x0 /usr/lib/libc_nonshared.a(elf-init.oS)
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtend.o
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crtn.o
Again, the third column is the size.
We see 4 bytes of .bss came from /tmp/ccKF6q1g.o. In this trivial example, we know that is the temporary object file from the compilation of our bss.c file. The other 1 byte came from crtbegin.o, which is part of the C runtime.
Finally, since we know that this 1 byte mystery bss variable is from crtbegin.o, and it's named completed.xxxx, it's real name is completed and it's probably a static inside some function. Looking at crtstuff.c we find the culprit: a static _Bool completed inside of __do_global_dtors_aux().
By definition, the bss segment takes some place in memory (when the program starts) but don't need any disk space. You need to define some variable to get it filled, so try
int bigvar_in_bss[16300];
int var_in_data[5] = {1,2,3,4,5};
Your simple program might not have any data in .bss, and shared libraries (like libc.so) may have "their own .bss"
File offsets and memory addresses are not easily related.
Read more about the ELF specification, also use /proc/ (eg cat /proc/self/maps would display the address space of the cat process running that command).
Read also proc(5)

Where are place address of globals, static and string literals by a compiler C?

First I read the address are in .data and .text hold string literals (plus machine code I suppose) after in some other article someone said it's changed and lo longer string literals live in .text but .rodata instead of(it's true my clang compiler output). But the .data contents mistmatch the address I printf in my C program.
Assume this C program:
static int a;
int main()
{
printf("my address = %p\n", &a);
return 0;
}
output of this C program:
$ ./a.out
my address = 0x804a01c
And then contents of .data section:
$ objdump -s -j .data a.out
a.out: file format elf32-i386
Contents of section .data:
804a00c 00000000 00000000
There's no 0x804a01c in this contents. Where does the address lave in?
First I read the address are in .data and .text hold string literals (plus machine code I suppose) after in some other article someone said it's changed and lo longer string literals live in .text but .rodata instead of
It's up to the compiler to decide where it wants to put string literals (which are not machine code).
Most modern compilers will put string literals into .rodata section, which is usually linked into the first PT_LOAD segment, together with .text, .ctors and other read-only sections.
There's no 0x804a01c in this contents. Where does the address lave in?
In .bss. If you want a to reside in .data, you need to initialize it. E.g.
static int a = 42;
Could for example, a string literal be put in .rodata and its address into .data?
Sure:
cat t.c
const char string_literal[] = "abcdefgh"; // in .rodata
const char *p_string_literal = string_literal; // in .data
int main() { return 0; }
gcc -m32 t.c
readelf -x.rodata a.out
Hex dump of section '.rodata':
0x08048488 03000000 01000200 61626364 65666768 ........abcdefgh
0x08048498 00
.
readelf -x.data a.out
Hex dump of section '.data':
0x0804a008 00000000 00000000 90840408 ............
Note: the address of string_literal -- 0x08048490 is spelled "backwards" in .data because x86 is little-endian.
Variables which have static storage allocation, i.e., static and global variables are allocated in the data segment or bss segment depending on whether they are 0 initialized (bss segment) or not (data segment).
Uninitialized static data is always 0 initialized by default. Therefore,
static int a;
is default-initialized to 0 and it goes in the bss segment. String literals are read-only data and are normally stored in the text segment.

memory allocation in data/bss/heap and stack

I have the following piece of code:
#include <stdio.h>
int global_var;
int global_initialized_var=5;
void function(){
int stack_var;
printf("The function's stack_var is at address 0x%08x\n", &stack_var);
}
int main(){
int stack_var;
static int static_initialized_var = 5;
static int static_var;
int *heap_var_ptr;
heap_var_ptr = (int *) malloc(4);
// Next variables will be at data segment
printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var);
printf("static_initialized_var is at address 0x%08x\n\n", &static_initialized_var);
// These will be in the bss segment
printf("static_var is at address 0x%08x\n", &static_var);
printf("global_var is at address 0x%08x\n", &global_var);
// This will be in heap segment
printf("heap_var is at address 0x%08x\n\n", heap_var_ptr);
// These will be in stack segment
printf("stack_var is at address 0x%08x\n", &stack_var);
function();
}
I am getting back the following:
# ./memory_segments
global_initialized_var is at address 0x0804a018
static_initialized_var is at address 0x0804a01c
static_var is at address 0x0804a028
global_var is at address 0x0804a02c
heap_var is at address 0x09285008
stack_var is at address 0xbf809fbc
The function's stack_var is at address 0xbf809f8c
It is supposed that the first 2 variables because they are initialized static and global should be in the .data segment where the other 2 static_var and global_var should be in .bss segment. The addresses that I am getting I think imply that both of them are in the same memory region. I would do a blind guess and I would say that this is the .bss segment.
Anyway the question is the following, am I right ?? And if I am how is it possible to find out where are the "limits" of these regions (bss, data, etc) or from where they are starting etc.
Assuming you are compiling with something like gcc memaddr.c -g -o memaddr, you can use objdump -h to display size and address of your sections:
$ objdump -h memaddr | grep -e 'Size' -e '\.data' -e '\.bss'
Idx Name Size VMA LMA File off Algn
23 .data 00000018 0000000000601018 0000000000601018 00001018 2**3
24 .bss 00000018 0000000000601030 0000000000601030 00001030 2**3
$
Also you can use objdump -t to display addresses and sections your symbols belong in:
$ objdump -t memaddr | grep "_var"
000000000060102c l O .data 0000000000000004 static_initialized_var.2049
0000000000601040 l O .bss 0000000000000004 static_var.2050
0000000000601044 g O .bss 0000000000000004 global_var
0000000000601028 g O .data 0000000000000004 global_initialized_var
$
So we can see that the .data and .bss sections are fairly small and happen to lie next to each other, so it is not surprising the .data and .bss addresses are so close.

Difference between data section and the bss section in C

When checking the disassembly of the object file through the readelf, I see the data and the bss segments contain the same offset address.
The data section will contain the initialized global and static variables. BSS will contain un-initialized global and static variables.
1 #include<stdio.h>
2
3 static void display(int i, int* ptr);
4
5 int main(){
6 int x = 5;
7 int* xptr = &x;
8 printf("\n In main() program! \n");
9 printf("\n x address : 0x%x x value : %d \n",(unsigned int)&x,x);
10 printf("\n xptr points to : 0x%x xptr value : %d \n",(unsigned int)xptr,*xptr);
11 display(x,xptr);
12 return 0;
13 }
14
15 void display(int y,int* yptr){
16 char var[7] = "ABCDEF";
17 printf("\n In display() function \n");
18 printf("\n y value : %d y address : 0x%x \n",y,(unsigned int)&y);
19 printf("\n yptr points to : 0x%x yptr value : %d \n",(unsigned int)yptr,*yptr);
20 }
output:
SSS:~$ size a.out
text data bss dec hex filename
1311 260 8 1579 62b a.out
Here in the above program I don't have any un-initialized data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes?
Also when I disassemble the object file,
EDITED :
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000110 0000cf 00 A 0 0 4
data, rodata and bss has the same offset address. Does it mean the rodata, data and bss refer to the same address?
Do Data section, rodata section and the bss section contain the data values in the same address, if so how to distinguish the data section, bss section and rodata section?
The .bss section is guaranteed to be all zeros when the program is loaded into memory. So any global data that is uninitialized, or initialized to zero is placed in the .bss section. For example:
static int g_myGlobal = 0; // <--- in .bss section
The nice part about this is, the .bss section data doesn't have to be included in the ELF file on disk (ie. there isn't a whole region of zeros in the file for the .bss section). Instead, the loader knows from the section headers how much to allocate for the .bss section, and simply zero it out before handing control over to your program.
Notice the readelf output:
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
.data is marked as PROGBITS. That means there are "bits" of program data in the ELF file that the loader needs to read out into memory for you. .bss on the other hand is marked NOBITS, meaning there's nothing in the file that needs to be read into memory as part of the load.
Example:
// bss.c
static int g_myGlobal = 0;
int main(int argc, char** argv)
{
return 0;
}
Compile it with $ gcc -m32 -Xlinker -Map=bss.map -o bss bss.c
Look at the section headers with $ readelf -S bss
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
:
[13] .text PROGBITS 080482d0 0002d0 000174 00 AX 0 0 16
:
[24] .data PROGBITS 0804964c 00064c 000004 00 WA 0 0 4
[25] .bss NOBITS 08049650 000650 000008 00 WA 0 0 4
:
Now we look for our variable in the symbol table: $ readelf -s bss | grep g_myGlobal
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
Note that g_myGlobal is shown to be a part of section 25. If we look back in the section headers, we see that 25 is .bss.
To answer your real question:
Here in the above program I dont have any un-intialised data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes ?
Continuing with my example, we look for any symbol in section 25:
$ readelf -s bss | grep 25
9: 0804825c 0 SECTION LOCAL DEFAULT 9
25: 08049650 0 SECTION LOCAL DEFAULT 25
32: 08049650 1 OBJECT LOCAL DEFAULT 25 completed.5745
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
The third column is the size. We see our expected 4-byte g_myGlobal, and this 1-byte completed.5745. This is probably a function-static variable from somewhere in the C runtime initialization - remember, a lot of "stuff" happens before main() is ever called.
4+1=5 bytes. However, if we look back at the .bss section header, we see the last column Al is 4. That is the section alignment, meaning this section, when loaded, will always be a multiple of 4 bytes. The next multiple up from 5 is 8, and that's why the .bss section is 8 bytes.
Additionally We can look at the map file generated by the linker to see what object files got placed where in the final output.
.bss 0x0000000008049650 0x8
*(.dynbss)
.dynbss 0x0000000000000000 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
*(.bss .bss.* .gnu.linkonce.b.*)
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crti.o
.bss 0x0000000008049650 0x1 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtbegin.o
.bss 0x0000000008049654 0x4 /tmp/ccKF6q1g.o
.bss 0x0000000008049658 0x0 /usr/lib/libc_nonshared.a(elf-init.oS)
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtend.o
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crtn.o
Again, the third column is the size.
We see 4 bytes of .bss came from /tmp/ccKF6q1g.o. In this trivial example, we know that is the temporary object file from the compilation of our bss.c file. The other 1 byte came from crtbegin.o, which is part of the C runtime.
Finally, since we know that this 1 byte mystery bss variable is from crtbegin.o, and it's named completed.xxxx, it's real name is completed and it's probably a static inside some function. Looking at crtstuff.c we find the culprit: a static _Bool completed inside of __do_global_dtors_aux().
By definition, the bss segment takes some place in memory (when the program starts) but don't need any disk space. You need to define some variable to get it filled, so try
int bigvar_in_bss[16300];
int var_in_data[5] = {1,2,3,4,5};
Your simple program might not have any data in .bss, and shared libraries (like libc.so) may have "their own .bss"
File offsets and memory addresses are not easily related.
Read more about the ELF specification, also use /proc/ (eg cat /proc/self/maps would display the address space of the cat process running that command).
Read also proc(5)

Resources