How is the alignment of .data and .bss determined - c

The alignment of .data and .bss is sometimes 4 bytes and sometimes it is 32 bytes.
Example 1: As per the last column in below output the alignment of bss and data is 32 bytes
bash-3.00$ readelf --sections libmodel.so
There are 39 section headers, starting at offset 0x1908a63c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[25] .data PROGBITS 01e221e0 1e211e0 26ca54 00 WA 0 0 32
[26] .bss NOBITS 0208ec40 208dc34 374178 00 WA 0 0 32
...
Example 2: As per the below output the alignment os .data and .bss is 4 bytes
bash-3.00$ readelf --sections ./a.out
There are 28 section headers, starting at offset 0x78c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[22] .data PROGBITS 0804956c 00056c 000034 00 WA 0 0 4
[23] .bss NOBITS 080495a0 0005a0 000004 00 WA 0 0 4
...
what determines the alignment for .bss and .data? Why is it sometimes 4 bytes and at other times 32 bytes?

I think those different align values were determined by respective ld scripts.
For libmodel.so: section .data align=16, section .bss align=16
For a.out: section .data align=4, section .bss align=4

Why is it sometimes 4 bytes and at other times 32 bytes?
Once an executable is loaded into memory, it is addressed via process virtual addresses. The alignment constraints are due to virtual address addressing. For example, if you look at the man page for elf, check out the description for sh_addralign. This is the reason why different elf objects specify different alignment requirements. You can experiment by changing the source for a.out to include a double and see if the alignment changes.
Note: This applies to memory alignment only. There is an alignment restriction for the actual on-disk file as well. Why? I am thinking this would be to help ease of mapping file data to in-core structures post reading the file in. Others can correct me if I am wrong here.
Update: Would like to clarify one issue. The virtual address alignment is necessary only because of the underlying memory access granularity enforced by the chipset. So, the same program compiled for different architectures may lead to diff alignment restrictions.

There's a really nice book, that will give you a great understanding of not only your question, but everything related to it.
The website of the book is here
Chapter n. 7 is the one you are looking for.
I don't know if you want a straight answer, or a more sophisticated one, but I hope it helps.

For bss the alignment is decided using size of data types, if it is not aligned then linker first align the variable address where the bss start can be placed.

Related

Where in object file does the code of function "main" starts?

I have an object file of a C program which prints hello world, just for the question.
I am trying to understand using readelf utility or gdb or hexedit(I can't figure which tool is a correct one) where in the file does the code of function "main" starts.
I know using readelf that symbol _start & main occurs and the address where it is mapped in a virtual memory. Moreover, I also know what the size of .text section and the of coruse where entry point specified, i.e the address which the same of text section.
The question is - Where in the file does the code of function "main" starts? I tought that is the entry point and the offset of the text section but how I understand it the sections data, bss, rodata should be ran before main and it appears after section text in readelf.
Also I tought we should sum the size all the lines till main in symbol table, but I am not sure at all if it is correct.
Additional question which follow up this one is if I want to replace main function with NOP instrcutres or plant one ret instruction in my object file. how can I know the offset where I can do it using hexedit.
So, let's go through it step by step.
Start with this C file:
#include <stdio.h>
void printit()
{
puts("Hello world!");
}
int main(void)
{
printit();
return 0;
}
As the comments look like you are on x86, compile it as 32-bit non-PIE executable like this:
$ gcc -m32 -no-pie -o test test.c
The -m32 option is needed, because I am working at a x86-64 machine. As you already know, you can get the virtual memory address of main using readelf, objdump or nm, for example like this:
$ nm test | grep -w main
0804918d T main
Obviously, 804918d can not be an offset in the file that is just 15 kB big. You need to find the mapping between virtual memory addresses and file offsets. In a typical ELF file, the mapping is included twice. Once in a detailed form for linkers (as object files are also ELF files) and debuggers, and a second time in a condensed form that is used by the kernel for loading programs. The detailed form is the list of sections, consisting of section headers, and you can view it like this (the output is shortened a bit, to make the answer more readable):
$ readelf --section-headers test
There are 29 section headers, starting at offset 0x3748:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[...]
[11] .init PROGBITS 08049000 001000 000020 00 AX 0 0 4
[12] .plt PROGBITS 08049020 001020 000030 04 AX 0 0 16
[13] .text PROGBITS 08049050 001050 0001c1 00 AX 0 0 16
[14] .fini PROGBITS 08049214 001214 000014 00 AX 0 0 4
[15] .rodata PROGBITS 0804a000 002000 000015 00 A 0 0 4
[...]
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
Here you find that the .text section starts at (virtual) address 08049050 and has a size of 1c1 bytes, so it ends at address 08049211. The address of main, 804918d is in this range, so you know main is a member of the text section. If you subtract the base of the text section from the address of main, you find that main is 13d bytes into the text section. The section listing also contains the file offset where the data for the text section starts. It's 1050, so the first byte of main is at offset 0x1050 + 0x13d == 0x118d.
You can do the same calculation using program headers:
$ readelf --program-headers test
[...]
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00160 0x00160 R 0x4
INTERP 0x000194 0x08048194 0x08048194 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x002e8 0x002e8 R 0x1000
LOAD 0x001000 0x08049000 0x08049000 0x00228 0x00228 R E 0x1000
LOAD 0x002000 0x0804a000 0x0804a000 0x0019c 0x0019c R 0x1000
LOAD 0x002f0c 0x0804bf0c 0x0804bf0c 0x00110 0x00114 RW 0x1000
[...]
The second load line tells you that the area 08049000 (VirtAddr) to 08049228 (VirtAddr + MemSiz) is readable and executable, and loaded from offset 1000 in the file. So again you can calculate that the address of main is 18d bytes into this load area, so it has to reside at offset 0x118d inside the executable. Let's test that:
$ ./test
Hello world!
$ echo -ne '\xc3' | dd of=test conv=notrunc bs=1 count=1 seek=$((0x118d))
1+0 records in
1+0 records out
1 byte copied, 0.0116672 s, 0.1 kB/s
$ ./test
$
Overwriting the first byte of main with 0xc3, the opcode for return (near) on x86, causes the program to not output anything anymore.
_start normally belongs to a module ( a *.o file) that is fixed (it is called differently on different systems, but a common name is crt0.o which is written in assembler.) That fixed code prepares the stack (normally the arguments and the environment are stored in the initial stack segment by the execve(2) system call) the mission of crt0.s is to prepare the initial C stack frame and call main(). Once main() ends, it is responsible of getting the return value from main and calling all the atexit() handlers to finish calling the _exit(2) system call.
The linking of crt0.o is normally transparent due to the fact that you always call the compiler to do the linking itself, so you normally don't have to add crt0.o as the first object module, but the compiler knows (lately, all this stuff has grown considerably, since we depend on architecture and ABIs to pass parameters between functions)
If you execute the compiler with the -v option, you'll get the exact command line it uses to call the linker and you'll get the secrets of the final memory map your program has on its first stages.

Advantage of different data section in c [duplicate]

When checking the disassembly of the object file through the readelf, I see the data and the bss segments contain the same offset address.
The data section will contain the initialized global and static variables. BSS will contain un-initialized global and static variables.
1 #include<stdio.h>
2
3 static void display(int i, int* ptr);
4
5 int main(){
6 int x = 5;
7 int* xptr = &x;
8 printf("\n In main() program! \n");
9 printf("\n x address : 0x%x x value : %d \n",(unsigned int)&x,x);
10 printf("\n xptr points to : 0x%x xptr value : %d \n",(unsigned int)xptr,*xptr);
11 display(x,xptr);
12 return 0;
13 }
14
15 void display(int y,int* yptr){
16 char var[7] = "ABCDEF";
17 printf("\n In display() function \n");
18 printf("\n y value : %d y address : 0x%x \n",y,(unsigned int)&y);
19 printf("\n yptr points to : 0x%x yptr value : %d \n",(unsigned int)yptr,*yptr);
20 }
output:
SSS:~$ size a.out
text data bss dec hex filename
1311 260 8 1579 62b a.out
Here in the above program I don't have any un-initialized data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes?
Also when I disassemble the object file,
EDITED :
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000110 0000cf 00 A 0 0 4
data, rodata and bss has the same offset address. Does it mean the rodata, data and bss refer to the same address?
Do Data section, rodata section and the bss section contain the data values in the same address, if so how to distinguish the data section, bss section and rodata section?
The .bss section is guaranteed to be all zeros when the program is loaded into memory. So any global data that is uninitialized, or initialized to zero is placed in the .bss section. For example:
static int g_myGlobal = 0; // <--- in .bss section
The nice part about this is, the .bss section data doesn't have to be included in the ELF file on disk (ie. there isn't a whole region of zeros in the file for the .bss section). Instead, the loader knows from the section headers how much to allocate for the .bss section, and simply zero it out before handing control over to your program.
Notice the readelf output:
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
.data is marked as PROGBITS. That means there are "bits" of program data in the ELF file that the loader needs to read out into memory for you. .bss on the other hand is marked NOBITS, meaning there's nothing in the file that needs to be read into memory as part of the load.
Example:
// bss.c
static int g_myGlobal = 0;
int main(int argc, char** argv)
{
return 0;
}
Compile it with $ gcc -m32 -Xlinker -Map=bss.map -o bss bss.c
Look at the section headers with $ readelf -S bss
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
:
[13] .text PROGBITS 080482d0 0002d0 000174 00 AX 0 0 16
:
[24] .data PROGBITS 0804964c 00064c 000004 00 WA 0 0 4
[25] .bss NOBITS 08049650 000650 000008 00 WA 0 0 4
:
Now we look for our variable in the symbol table: $ readelf -s bss | grep g_myGlobal
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
Note that g_myGlobal is shown to be a part of section 25. If we look back in the section headers, we see that 25 is .bss.
To answer your real question:
Here in the above program I dont have any un-intialised data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes ?
Continuing with my example, we look for any symbol in section 25:
$ readelf -s bss | grep 25
9: 0804825c 0 SECTION LOCAL DEFAULT 9
25: 08049650 0 SECTION LOCAL DEFAULT 25
32: 08049650 1 OBJECT LOCAL DEFAULT 25 completed.5745
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
The third column is the size. We see our expected 4-byte g_myGlobal, and this 1-byte completed.5745. This is probably a function-static variable from somewhere in the C runtime initialization - remember, a lot of "stuff" happens before main() is ever called.
4+1=5 bytes. However, if we look back at the .bss section header, we see the last column Al is 4. That is the section alignment, meaning this section, when loaded, will always be a multiple of 4 bytes. The next multiple up from 5 is 8, and that's why the .bss section is 8 bytes.
Additionally We can look at the map file generated by the linker to see what object files got placed where in the final output.
.bss 0x0000000008049650 0x8
*(.dynbss)
.dynbss 0x0000000000000000 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
*(.bss .bss.* .gnu.linkonce.b.*)
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crti.o
.bss 0x0000000008049650 0x1 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtbegin.o
.bss 0x0000000008049654 0x4 /tmp/ccKF6q1g.o
.bss 0x0000000008049658 0x0 /usr/lib/libc_nonshared.a(elf-init.oS)
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtend.o
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crtn.o
Again, the third column is the size.
We see 4 bytes of .bss came from /tmp/ccKF6q1g.o. In this trivial example, we know that is the temporary object file from the compilation of our bss.c file. The other 1 byte came from crtbegin.o, which is part of the C runtime.
Finally, since we know that this 1 byte mystery bss variable is from crtbegin.o, and it's named completed.xxxx, it's real name is completed and it's probably a static inside some function. Looking at crtstuff.c we find the culprit: a static _Bool completed inside of __do_global_dtors_aux().
By definition, the bss segment takes some place in memory (when the program starts) but don't need any disk space. You need to define some variable to get it filled, so try
int bigvar_in_bss[16300];
int var_in_data[5] = {1,2,3,4,5};
Your simple program might not have any data in .bss, and shared libraries (like libc.so) may have "their own .bss"
File offsets and memory addresses are not easily related.
Read more about the ELF specification, also use /proc/ (eg cat /proc/self/maps would display the address space of the cat process running that command).
Read also proc(5)

Why does objcopy exclude one section but not another?

Background
I'm attempting to utilize a special section of SRAM on my STM32 device which is located at address 0x40006000. One way of doing this which I saw in ST's example code was just to simply create pointers whose value happened to live inside that section of RAM. What I'm trying to do is get the linker to manage static allocations in that section for me.
Basically, I'm going from something like this:
static uint16_t *buffer0 = ((uint16_t *)0x40006000);
static uint16_t *buffer1 = ((uint16_t *)0x40006080);
To something like this (which I think is far less breakable and not as hacky, though not as portable):
#define _PMA __attribute__((section(".pma"), aligned(2)))
static uint16_t _PMA buffer0[64];
static uint16_t _PMA buffer1[64];
In order to accomplish this, I've modified my linker script to have a new memory called "PMA" located at 0x40006000 and I have located the ".pma" section inside it as follows:
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 64K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 20K
PMA (xrw) : ORIGIN = 0x40006000, LENGTH = 1024 /* This is the memory I added */
}
SECTIONS
{
.text
{
..blah blah blah..
} > FLASH
...more sections, like rodata and init_array..
/* Initialized data goes into RAM, load LMA copy after code */
.data
{
..blah blah blah with some linker symbols to denote the start and end..
} >RAM AT> FLASH
.bss
{
..blah blah blah..
} >RAM
.pma /* My new section! */
{
_pma_start = .;
. = ALIGN(2);
*(.pma)
*(.pma*)
} >PMA
}
What Happens
So this seems all fine and dandy, my stuff compiles and the map shows me that buffer0 and buffer1 are indeed placed at 0x40006000 and 0x40006080. Here is the output of the last bit of my makefile:
arm-none-eabi-gcc obj/usb_desc.o obj/usb_application.o obj/osc.o obj/usb.o obj/main.o obj/system_stm32f1xx.o obj/queue.o obj/list.o obj/heap_1.o obj/port.o obj/tasks.o obj/timers.o obj/startup_stm32f103x6.o -TSTM32F103X8_FLASH.ld -mthumb -mcpu=cortex-m3 --specs=nosys.specs -Wl,-Map,bin/blink.map -o bin/blink.elf
arm-none-eabi-objdump -D bin/blink.elf > bin/blink.lst
arm-none-eabi-size --format=SysV bin/blink.elf
bin/blink.elf :
section size addr
.isr_vector 268 134217728
.text 13504 134218000
.rodata 44 134231504
.ARM 8 134231548
.init_array 8 134231556
.fini_array 4 134231564
.data 1264 536870912
.jcr 4 536872176
.bss 1348 536872180
._user_heap_stack 1536 536873528
.pma 256 1073766400
.ARM.attributes 41 0
.debug_info 26748 0
.debug_abbrev 5331 0
.debug_aranges 368 0
.debug_line 5274 0
.debug_str 8123 0
.comment 29 0
.debug_frame 4988 0
Total 69146
arm-none-eabi-objcopy -R .stack -O binary bin/blink.elf bin/blink.bin
I see that .pma has 256 bytes used, just as I expected. The address looks correct as well. Now, when I ls the bin directory I'm greeted by this:
-rwxr-xr-x 1 kevin users 939548800 Nov 2 00:04 blink.bin*
-rwxr-xr-x 1 kevin users 221528 Nov 2 00:04 blink.elf*
That bin file is what I'm loading onto my chip's flash via openocd. It is an image of the flash starting at 0x08000000.
Sidenote: I actually attempted to load that bin file onto my chip before I had realized how huge it was...that didn't work obviously.
Here's what I get when I remove the PMA section:
arm-none-eabi-size --format=SysV bin/blink.elf
bin/blink.elf :
section size addr
.isr_vector 268 134217728
.text 13504 134218000
.rodata 44 134231504
.ARM 8 134231548
.init_array 8 134231556
.fini_array 4 134231564
.data 1392 536870912
.jcr 4 536872304
.bss 1348 536872308
._user_heap_stack 1536 536873656
.ARM.attributes 41 0
.debug_info 26748 0
.debug_abbrev 5331 0
.debug_aranges 368 0
.debug_line 5274 0
.debug_str 8123 0
.comment 29 0
.debug_frame 4988 0
Total 69018
And the file size is exactly as I would expect:
-rwxr-xr-x 1 kevin users 15236 Nov 2 00:09 blink.bin
-rwxr-xr-x 1 kevin users 198132 Nov 2 00:09 blink.elf
Question
As I understand it, what is going on here is that objcopy has just copied everything from 0x08000000 to 0x400060FF into that bin file. That is obviously not what I wanted to happen. I expected that it would just copy the flash.
Now, obviously I can just say -R .pma on my objcopy command and everything will be happy. However, what I'm curious about is how objcopy knows not to copy .data into the binary image. I noticed that running objcopy -R .data has the exact same result as running objcopy without that. Same thing with .bss. This tells me that it isn't the AT command in the linker script (which is the only real difference I can see between .data and .bss)
What can I do to make my .pma section behave the same way as .data or .bss from objcopy's perspective? Is there something interesting going on with .data/.bss in the intermediate elf file I'm using perhaps (see above for the linker command generating the elf file)?
Having defined your section as ".pma" most probably gave it the type "PROGBITS" (check with readelf), which indicates a section to be loaded on the target.
What you want/need is to define a section that doesn't have to be loaded on the target, like the ".bss" section, which has the type "NOBITS".
I frequently use the following section definition to avoid having certain buffers into the ".bss" section (as this slows down the startup phase due to the zero-initalization of the ".bss" section):
static uint8_t uart1_buffer_rx[4096] __attribute__((section(".noinit,\"aw\",%nobits#")));
I don't remember why I used the name ".noinit", but this sections appears after the ".bss" section.
In your case, it will probably help to add the "aw" and "nobits" flags after the ".pma" section declaration.

Interesting binary dump of executable file

For some reason I made simple program in C to output binary representation of given input:
int main()
{
char c;
while(read(0,&c,1) > 0)
{
unsigned char cmp = 128;
while(cmp)
{
if(c & cmp)
write(1,"1",1);
else
write(1,"0",1);
cmp >>= 1;
}
}
return 0;
}
After compilation:
$ gcc bindump.c -o bindump
I made simple test to check if program is able to print binary:
$ cat bindump | ./bindump | fold -b100 | nl
Output is following: http://pastebin.com/u7SasKDJ
I suspected the output to look like random series of ones and zeroes. However, output partially seems to be quite more interesting. For example take a look at the output between line 171 and 357. I wonder why there are lots of zeros in compare to other sections of executable ?
My architecture is:
$ lscpu
Architecture: i686
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 28
Stepping: 10
CPU MHz: 1000.000
BogoMIPS: 3325.21
Virtualization: VT-x
L1d cache: 24K
L1i cache: 32K
L2 cache: 512K
When you compile a program into an executable on Linux (and a number of other unix systems), it is written in the ELF format. The ELF format has a number of sections, which you can examine with readelf or objdump:
readelf -a bindump | less
For example, section .text contains CPU instructions, .data global variables, .bss uninitialized global variables (it is actually empty in the ELF file itself, but is created in the main memory when the program is executed), .plt and .got which are jump tables, debugging information, etc.
Btw. it is much more convenient to examine the binary content of files with hexdump:
hexdump -C bindata | less
There you can see that starting with offset 0x850 (approx. line 171 in your dump) there is a lot of zeros, and you can also see the ASCII representation on the right.
Let us look at which sections correspond to the block of your interest between 0x850 and 0x1160 (the field Off – offset in the file is important here):
> readelf -a bindata
...
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[28] .shstrtab STRTAB 00000000 00074c 000106 00 0 0 1
[29] .symtab SYMTAB 00000000 000d2c 000440 10 30 45 4
...
You can examine the content of an individual section with -x:
> readelf -x .symtab bindump | less
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 34810408 00000000 03000100 ....4...........
0x00000020 00000000 48810408 00000000 03000200 ....H...........
0x00000030 00000000 68810408 00000000 03000300 ....h...........
0x00000040 00000000 8c810408 00000000 03000400 ................
0x00000050 00000000 b8810408 00000000 03000500 ................
0x00000060 00000000 d8810408 00000000 03000600 ................
You would see that there are many zeros. The section is composed of 18-byte values (= one line in the -x output) defining symbols. From readelf -a you can see that it has 68 entries, and first 27 of them (excl. the very first one) are of type SECTION:
Symbol table '.symtab' contains 68 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 08048134 0 SECTION LOCAL DEFAULT 1
2: 08048148 0 SECTION LOCAL DEFAULT 2
3: 08048168 0 SECTION LOCAL DEFAULT 3
4: 0804818c 0 SECTION LOCAL DEFAULT 4
...
According to the specification (page 1-18), each entry has the following format:
typedef struct {
Elf32_Word st_name;
Elf32_Addr st_value;
Elf32_Word st_size;
unsigned char st_info;
unsigned char st_other;
Elf32_Half st_shndx;
} Elf32_Sym;
Without going into too much detail here, I think what matters here is that st_name and st_size are both zeros for these SECTION entries. Both are 32-bit numbers, which means lots of zeros in this particular section.
This is not really a programming question, but however...
A binary normally consists of different sections: code, data, debugging info, etc. Since these sections contents differ by type, I would pretty much expect them to look different.
I.e. the symbol table consists of address offsets in your binary. If I read your lspci correctly, you are on a 32-bit system. That means Each offset has four bytes, and given the size of your program, in most cases two of those bytes will be zero. And there are more effects like this.
You didn't strip your program, that means there's still lots of information (symbol table etc.) present in the binary. Try stripping the binary and have a look at it again.

Difference between data section and the bss section in C

When checking the disassembly of the object file through the readelf, I see the data and the bss segments contain the same offset address.
The data section will contain the initialized global and static variables. BSS will contain un-initialized global and static variables.
1 #include<stdio.h>
2
3 static void display(int i, int* ptr);
4
5 int main(){
6 int x = 5;
7 int* xptr = &x;
8 printf("\n In main() program! \n");
9 printf("\n x address : 0x%x x value : %d \n",(unsigned int)&x,x);
10 printf("\n xptr points to : 0x%x xptr value : %d \n",(unsigned int)xptr,*xptr);
11 display(x,xptr);
12 return 0;
13 }
14
15 void display(int y,int* yptr){
16 char var[7] = "ABCDEF";
17 printf("\n In display() function \n");
18 printf("\n y value : %d y address : 0x%x \n",y,(unsigned int)&y);
19 printf("\n yptr points to : 0x%x yptr value : %d \n",(unsigned int)yptr,*yptr);
20 }
output:
SSS:~$ size a.out
text data bss dec hex filename
1311 260 8 1579 62b a.out
Here in the above program I don't have any un-initialized data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes?
Also when I disassemble the object file,
EDITED :
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000110 0000cf 00 A 0 0 4
data, rodata and bss has the same offset address. Does it mean the rodata, data and bss refer to the same address?
Do Data section, rodata section and the bss section contain the data values in the same address, if so how to distinguish the data section, bss section and rodata section?
The .bss section is guaranteed to be all zeros when the program is loaded into memory. So any global data that is uninitialized, or initialized to zero is placed in the .bss section. For example:
static int g_myGlobal = 0; // <--- in .bss section
The nice part about this is, the .bss section data doesn't have to be included in the ELF file on disk (ie. there isn't a whole region of zeros in the file for the .bss section). Instead, the loader knows from the section headers how much to allocate for the .bss section, and simply zero it out before handing control over to your program.
Notice the readelf output:
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
.data is marked as PROGBITS. That means there are "bits" of program data in the ELF file that the loader needs to read out into memory for you. .bss on the other hand is marked NOBITS, meaning there's nothing in the file that needs to be read into memory as part of the load.
Example:
// bss.c
static int g_myGlobal = 0;
int main(int argc, char** argv)
{
return 0;
}
Compile it with $ gcc -m32 -Xlinker -Map=bss.map -o bss bss.c
Look at the section headers with $ readelf -S bss
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
:
[13] .text PROGBITS 080482d0 0002d0 000174 00 AX 0 0 16
:
[24] .data PROGBITS 0804964c 00064c 000004 00 WA 0 0 4
[25] .bss NOBITS 08049650 000650 000008 00 WA 0 0 4
:
Now we look for our variable in the symbol table: $ readelf -s bss | grep g_myGlobal
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
Note that g_myGlobal is shown to be a part of section 25. If we look back in the section headers, we see that 25 is .bss.
To answer your real question:
Here in the above program I dont have any un-intialised data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes ?
Continuing with my example, we look for any symbol in section 25:
$ readelf -s bss | grep 25
9: 0804825c 0 SECTION LOCAL DEFAULT 9
25: 08049650 0 SECTION LOCAL DEFAULT 25
32: 08049650 1 OBJECT LOCAL DEFAULT 25 completed.5745
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
The third column is the size. We see our expected 4-byte g_myGlobal, and this 1-byte completed.5745. This is probably a function-static variable from somewhere in the C runtime initialization - remember, a lot of "stuff" happens before main() is ever called.
4+1=5 bytes. However, if we look back at the .bss section header, we see the last column Al is 4. That is the section alignment, meaning this section, when loaded, will always be a multiple of 4 bytes. The next multiple up from 5 is 8, and that's why the .bss section is 8 bytes.
Additionally We can look at the map file generated by the linker to see what object files got placed where in the final output.
.bss 0x0000000008049650 0x8
*(.dynbss)
.dynbss 0x0000000000000000 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
*(.bss .bss.* .gnu.linkonce.b.*)
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crti.o
.bss 0x0000000008049650 0x1 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtbegin.o
.bss 0x0000000008049654 0x4 /tmp/ccKF6q1g.o
.bss 0x0000000008049658 0x0 /usr/lib/libc_nonshared.a(elf-init.oS)
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtend.o
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crtn.o
Again, the third column is the size.
We see 4 bytes of .bss came from /tmp/ccKF6q1g.o. In this trivial example, we know that is the temporary object file from the compilation of our bss.c file. The other 1 byte came from crtbegin.o, which is part of the C runtime.
Finally, since we know that this 1 byte mystery bss variable is from crtbegin.o, and it's named completed.xxxx, it's real name is completed and it's probably a static inside some function. Looking at crtstuff.c we find the culprit: a static _Bool completed inside of __do_global_dtors_aux().
By definition, the bss segment takes some place in memory (when the program starts) but don't need any disk space. You need to define some variable to get it filled, so try
int bigvar_in_bss[16300];
int var_in_data[5] = {1,2,3,4,5};
Your simple program might not have any data in .bss, and shared libraries (like libc.so) may have "their own .bss"
File offsets and memory addresses are not easily related.
Read more about the ELF specification, also use /proc/ (eg cat /proc/self/maps would display the address space of the cat process running that command).
Read also proc(5)

Resources