I'm trying to write a script that parses .map files generated by GCC and help me figure out what the footprints of various libraries are in memory. (Github repository of what I have so far)
I'm trying to understand what the following means / how it should be read :
.debug_line 0x00000000 0x67b1
*(.debug_line .debug_line.* .debug_line_end)
.debug_line 0x00000000 0x105 /opt/ti/gcc/bin/../lib/gcc/msp430-elf/4.9.1/../../../../msp430-elf/lib/crt0.o
.debug_line 0x00000105 0x4b5 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
.debug_line.text.bc_printf.constprop.1
0x000005ba 0x23 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
.debug_line.text.deferred_exec
0x000005dd 0x60 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
.debug_line.text.startup.main
0x0000063d 0xfc CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
.debug_line_end
0x00000739 0x0 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
I've been assuming the first line, which is not indented, is some sort of section heading and that the address (0x0000000) somehow relates to where the contents of the section should be written. 0x00000000 somehow results in the contents being discarded.
The indented lines (those which start with .debug_line) also have an address. These addresses, when the symbol is going to be written to the binary, are absolute.
.text 0x000048d8 0x3312
0x000048d8 . = ALIGN (0x2)
*(.lower.text.* .lower.text)
0x000048d8 . = ALIGN (0x2)
*(.text .stub .text.* .gnu.linkonce.t.* .text:*)
.text 0x000048d8 0xa4 /opt/ti/gcc/bin/../lib/gcc/msp430-elf/4.9.1/crtbegin.o
.text.bc_printf.constprop.1
0x0000497c 0x1a CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
How does the address in the section heading, or whatever the first line should be called, translate to the address in the actual symbols and so forth? i.e., How would 0x00000000 result in the symbols being discarded (even though the addresses they contain may indeed be valid writeable locations otherwise)
What does the indented line beginning with a *, i.e. *(.debug_line .debug_line.* .debug_line_end) mean? I'm fairly certain this line is critical to defining the treatment of the second last line in the first snippet, since .debug_line_end isn't defined all by itself in a way similar to how .debug_line is in the first line of the snippet. I can't seem to find any references to the structure of the file, though.
With those answers, I'd like to be able to understand how .eh_frame in the following snippet is resolved. I can post the full map file if it's necessary, but I suspect all the information needed is somehow encoded within the snippet, and at that within the top and bottom few lines only.
.rodata 0x00004400 0x464
0x00004400 . = ALIGN (0x2)
*(.plt)
0x00004400 . = ALIGN (0x2)
*(.lower.rodata.* .lower.rodata)
0x00004400 . = ALIGN (0x2)
*(.rodata .rodata.* .gnu.linkonce.r.* .const .const:*)
.rodata 0x00004400 0x11 CMakeFiles/firmware-msp430f5529.elf.dir/main.c.obj
*fill* 0x00004411 0x1
.rodata.tUsbRequestList
0x00004412 0x15c ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x00004412 tUsbRequestList
.rodata.stUsbHandle
0x0000456e 0x40 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x0000456e stUsbHandle
.rodata.report_desc_HID0
0x000045ae 0x24 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x000045ae report_desc_HID0
.rodata.abromStringDescriptor
0x000045d2 0x146 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x000045d2 abromStringDescriptor
.rodata.abromConfigurationDescriptorGroup
0x00004718 0xef ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x00004718 abromConfigurationDescriptorGroup
.rodata.abromDeviceDescriptor
0x00004807 0x12 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x00004807 abromDeviceDescriptor
*fill* 0x00004819 0x1
.rodata.report_desc_size
0x0000481a 0x2 ../peripherals/USB_config/libusb-config-msp430f5529.a(descriptors.c.obj)
0x0000481a report_desc_size
.rodata 0x0000481c 0x11 ../subsystem/libcontrol-iface-msp430f5529.a(control_iface.c.obj)
*fill* 0x0000482d 0x1
.rodata.GPIO_PORT_TO_BASE
0x0000482e 0x1c ../peripherals/driverlib/MSP430F5xx_6xx/libdriverlib-msp430f5529.a(gpio.c.obj)
.rodata 0x0000484a 0x7 ../lib/libprintf-msp430f5529.a(printf.c.obj)
*(.rodata1)
*(.eh_frame_hdr)
*(.eh_frame)
*fill* 0x00004851 0x3
.eh_frame 0x00004854 0x0 /opt/ti/gcc/bin/../lib/gcc/msp430-elf/4.9.1/crtbegin.o
Related
I have an object file of a C program which prints hello world, just for the question.
I am trying to understand using readelf utility or gdb or hexedit(I can't figure which tool is a correct one) where in the file does the code of function "main" starts.
I know using readelf that symbol _start & main occurs and the address where it is mapped in a virtual memory. Moreover, I also know what the size of .text section and the of coruse where entry point specified, i.e the address which the same of text section.
The question is - Where in the file does the code of function "main" starts? I tought that is the entry point and the offset of the text section but how I understand it the sections data, bss, rodata should be ran before main and it appears after section text in readelf.
Also I tought we should sum the size all the lines till main in symbol table, but I am not sure at all if it is correct.
Additional question which follow up this one is if I want to replace main function with NOP instrcutres or plant one ret instruction in my object file. how can I know the offset where I can do it using hexedit.
So, let's go through it step by step.
Start with this C file:
#include <stdio.h>
void printit()
{
puts("Hello world!");
}
int main(void)
{
printit();
return 0;
}
As the comments look like you are on x86, compile it as 32-bit non-PIE executable like this:
$ gcc -m32 -no-pie -o test test.c
The -m32 option is needed, because I am working at a x86-64 machine. As you already know, you can get the virtual memory address of main using readelf, objdump or nm, for example like this:
$ nm test | grep -w main
0804918d T main
Obviously, 804918d can not be an offset in the file that is just 15 kB big. You need to find the mapping between virtual memory addresses and file offsets. In a typical ELF file, the mapping is included twice. Once in a detailed form for linkers (as object files are also ELF files) and debuggers, and a second time in a condensed form that is used by the kernel for loading programs. The detailed form is the list of sections, consisting of section headers, and you can view it like this (the output is shortened a bit, to make the answer more readable):
$ readelf --section-headers test
There are 29 section headers, starting at offset 0x3748:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[...]
[11] .init PROGBITS 08049000 001000 000020 00 AX 0 0 4
[12] .plt PROGBITS 08049020 001020 000030 04 AX 0 0 16
[13] .text PROGBITS 08049050 001050 0001c1 00 AX 0 0 16
[14] .fini PROGBITS 08049214 001214 000014 00 AX 0 0 4
[15] .rodata PROGBITS 0804a000 002000 000015 00 A 0 0 4
[...]
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
Here you find that the .text section starts at (virtual) address 08049050 and has a size of 1c1 bytes, so it ends at address 08049211. The address of main, 804918d is in this range, so you know main is a member of the text section. If you subtract the base of the text section from the address of main, you find that main is 13d bytes into the text section. The section listing also contains the file offset where the data for the text section starts. It's 1050, so the first byte of main is at offset 0x1050 + 0x13d == 0x118d.
You can do the same calculation using program headers:
$ readelf --program-headers test
[...]
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00160 0x00160 R 0x4
INTERP 0x000194 0x08048194 0x08048194 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x002e8 0x002e8 R 0x1000
LOAD 0x001000 0x08049000 0x08049000 0x00228 0x00228 R E 0x1000
LOAD 0x002000 0x0804a000 0x0804a000 0x0019c 0x0019c R 0x1000
LOAD 0x002f0c 0x0804bf0c 0x0804bf0c 0x00110 0x00114 RW 0x1000
[...]
The second load line tells you that the area 08049000 (VirtAddr) to 08049228 (VirtAddr + MemSiz) is readable and executable, and loaded from offset 1000 in the file. So again you can calculate that the address of main is 18d bytes into this load area, so it has to reside at offset 0x118d inside the executable. Let's test that:
$ ./test
Hello world!
$ echo -ne '\xc3' | dd of=test conv=notrunc bs=1 count=1 seek=$((0x118d))
1+0 records in
1+0 records out
1 byte copied, 0.0116672 s, 0.1 kB/s
$ ./test
$
Overwriting the first byte of main with 0xc3, the opcode for return (near) on x86, causes the program to not output anything anymore.
_start normally belongs to a module ( a *.o file) that is fixed (it is called differently on different systems, but a common name is crt0.o which is written in assembler.) That fixed code prepares the stack (normally the arguments and the environment are stored in the initial stack segment by the execve(2) system call) the mission of crt0.s is to prepare the initial C stack frame and call main(). Once main() ends, it is responsible of getting the return value from main and calling all the atexit() handlers to finish calling the _exit(2) system call.
The linking of crt0.o is normally transparent due to the fact that you always call the compiler to do the linking itself, so you normally don't have to add crt0.o as the first object module, but the compiler knows (lately, all this stuff has grown considerably, since we depend on architecture and ABIs to pass parameters between functions)
If you execute the compiler with the -v option, you'll get the exact command line it uses to call the linker and you'll get the secrets of the final memory map your program has on its first stages.
I found a solution, although I don't understand what went wrong. Here is the original question. The solution is at the end.
I am following this Raspberry PI OS tutorial with a few tweaks. As the title says, one assignment appears to fail.
Here is my C code:
extern int32_t __end;
static int32_t *arena;
void init() {
arena = &__end;
assert(0 != arena); // fails
...
The assert triggers! Surely the address shouldn't be 0. __end is declared in my linker script:
ENTRY(_start)
SECTIONS
{
/* Starts at LOADER_ADDR. 0x8000 is a convention. */
. = 0x8000;
__start = .;
.text : {
*(.text)
}
.rodata : { *(.rodata) }
.data : { *(.data) }
/* Define __bss_start and __bss_end for boot.s to set to 0 */
__bss_start=.;
.bss : { *(.bss) }
__bss_end=.;
/* First usable address for the allocator */
. = ALIGN(4);
__end = .;
}
Investigating in GDB (running it in QEMU):
Thread 1 hit Breakpoint 1, init () at os.c:75
75 arena = &__end;
(gdb) p &__end
$1 = (int32_t *) 0x9440
(gdb) p arena
$2 = (int32_t *) 0x0
(gdb) n
76 assert(0 != arena);
(gdb) p arena
$3 = (int32_t *) 0x0
GDB can find __end but my program cannot?
Here are a few other things I tried:
the tutorial's code works without an issue (implying that QEMU and the ARM compiler are working)
the assertion still fails when running without GDB (implying GDB is not the issue)
I am able to assign 0xccc to arena (implying arena is not the issue)
I am not able to assign &__end to a local variable (implying &__end is the issue).
As requested in the comments, this is how I tried to assign to a local variable:
void* arena2 = (void*)&__end;
assert(0 != arena2);
The assertion fails. In GDB:
Thread 1 hit Breakpoint 1, mem_init () at mem.c:77
77 void* arena2 = (void*)&__end;
(gdb) p arena2
$1 = (void *) 0x13
(gdb) p &__end
$2 = (int32_t *) 0x94a4
(gdb) n
78 assert(0 != arena2);
(gdb) p arena2
$3 = (void *) 0x0
(gdb) p &__end
$4 = (int32_t *) 0x94a4
assert(0 != &__end); succeeds (implying &__end is not the issue?)
N.B. This version of assert is not the same as the one in assert.h, but I don't think it causes the problem. It just checks a condition, prints the condition, and goes to a breakpoint. I can reproduce the issue in GDB with the assert commented out.
N.B.2. I previously included the ARM assembly of the C code in case there was a compiler bug
My solution is to edit the linker script to:
ENTRY(_start)
SECTIONS
{
/* Starts at LOADER_ADDR. 0x8000 is a convention. */
. = 0x8000;
__start = .;
.text : {
*(.text)
}
. = ALIGN(4096);
.rodata : { *(.rodata) }
. = ALIGN(4096);
.data : { *(.data) }
. = ALIGN(4096);
/* Define __bss_start and __bss_end for boot.s to set to 0 */
__bss_start = .;
.bss : { *(.bss) }
. = ALIGN(4096);
__bss_end = .;
/* First usable address for the allocator */
. = ALIGN(4096);
__end = .;
}
I don't understand why the additional ALIGNs are important.
The problem you're having here is because the "clear the BSS" loop in boot.S is also clearing some of the compiler-generated data in the ELF file that the C code is using at runtime. Notably, it is accidentally zeroing out the GOT (global offset table) which is in the .got ELF section and which is where the actual address of the __end label has been placed by the linker. So the linker correctly fills in the address in the ELF file, but then the boot.S code zeroes it, and when you try to read it from C then you get zero rather than what you were expecting.
Adding all that alignment in the linker script is probably working around this by coincidentally causing the GOT to not be in the area that gets zeroed.
You can see where the linker has put things by using 'objdump -x myos.elf'. In my test case based on the tutorial you link I see a SYMBOL TABLE which includes among other entries:
000080d4 l .bss 00000004 arena
00000000 l df *ABS* 00000000
000080c8 l O .got.plt 00000000 _GLOBAL_OFFSET_TABLE_
000080d8 g .bss 00000000 __bss_end
0000800c g F .text 00000060 kernel_main
00008000 g .text 00000000 __start
0000806c g .text.boot 00000000 _start
000080d8 g .bss 00000000 __end
00008000 g F .text 0000000c panic
000080c4 g .text.boot 00000000 __bss_start
So you can see that the linker script has set __bss_start to 0x80c4 and __bss_end to 0x80d8, which is a pity because the GOT is at 0x80c4/0x80c8. I think what has happened here is that because you didn't specify explicitly in your linker script where to put the .got and .got.plt sections, the linker has decided to put them after the __bss_start assignment and before the .bss section, so they get covered by the zeroing code.
You can see what the ELF file contents of the .got are with 'objdump --disassemble-all myos.elf', which among other things includes:
Disassembly of section .got:
000080c4 <.got>:
80c4: 000080d8 ldrdeq r8, [r0], -r8 ; <UNPREDICTABLE>
so you can see we have one GOT table entry, whose contents are the address 0x80d8 which is the __end value we want. When the boot.S code zeroes this out your C code reads a 0 rather than the constant it was expecting.
You should probably ensure that the bss start/end are at least 16-aligned, because the boot.S code works via a loop that clears 16 bytes at a time, but I think that if you fix your linker script to explicitly put the .got and .got.plt sections somewhere then you'll find you don't need the 4K alignments everywhere.
FWIW, I diagnosed this using: (1) the QEMU "-d in_asm,cpu,exec,int,unimp,guest_errors -singlestep" options to get a dump of register state and instruction execution and (2) objdump of the ELF file to figure out what the compiler's generated code was actually doing. I had a suspicion this was going to turn out to be either "accidentally zeroed data we shouldn't have" or "failed to include in the image or otherwise initialize data we should have" kind of bug, and so it turned out.
Oh, and the reason GDB was printing the right value for __end when your code wasn't was that GDB could just look directly in the debug/symbol info in the ELF file for the answer; it wasn't doing it by going via the in-memory GOT.
I'm trying to access the "rendezvous structure" (struct r_debug *)
in order to find the link map of a process.
But I keep running into invalid adresses and I really can't figure out
what's going on.
Here's how I go on trying to find it:
1. Get the AT_PHDR value from the auxiliary vector
2. Go through the program headers until I find the PT_DYNAMIC segment
3. Try to access the vaddr of that segment (PT_DYNAMIC) to get the dynamic tags
4. Iterate through the dynamic tags until I find DT_DEBUG. If I get here I should be done
The issue is I can't get past step 3 because the vaddr of the PT_DYNAMIC
segment always points to an invalid address.
What am I doing wrong ? Do I need to find the relocation of the vaddr ?
I have looked at the LLDB sources but I can't figure out how they got the address.
UPDATE: #EmployedRussian was right, I was looking at a position-independent executable.
His solution to calculate the relocation worked wonderfully.
What am I doing wrong ?
Most likely you are looking at position-independent executable. If your readelf -Wl a.out looks like this:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0001f8 0x0001f8 R 0x8
INTERP 0x000238 0x0000000000000238 0x0000000000000238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x016d28 0x016d28 R E 0x200000
LOAD 0x017250 0x0000000000217250 0x0000000000217250 0x0010d0 0x001290 RW 0x200000
DYNAMIC 0x017df8 0x0000000000217df8 0x0000000000217df8 0x0001e0 0x0001e0 RW 0x8
then you need to adjust Phdr_pt_dynamic.p_vaddr by the executable relocation address (the key is that the first Phdr_pt_load.p_vaddr == 0).
You can find this relocation address as the delta between AT_PHDR value in the aux vector and Phdr_pt_phdr.p_vaddr.
(Above I use Phdr_xxx as a shorthand for Phdr[j] with .p_type == xxx).
You are also doing it in much more complicated way than you have to: the address of the dynamic array is trivially available as _DYNAMIC[]. See this answer.
Background
I'm attempting to utilize a special section of SRAM on my STM32 device which is located at address 0x40006000. One way of doing this which I saw in ST's example code was just to simply create pointers whose value happened to live inside that section of RAM. What I'm trying to do is get the linker to manage static allocations in that section for me.
Basically, I'm going from something like this:
static uint16_t *buffer0 = ((uint16_t *)0x40006000);
static uint16_t *buffer1 = ((uint16_t *)0x40006080);
To something like this (which I think is far less breakable and not as hacky, though not as portable):
#define _PMA __attribute__((section(".pma"), aligned(2)))
static uint16_t _PMA buffer0[64];
static uint16_t _PMA buffer1[64];
In order to accomplish this, I've modified my linker script to have a new memory called "PMA" located at 0x40006000 and I have located the ".pma" section inside it as follows:
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 64K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 20K
PMA (xrw) : ORIGIN = 0x40006000, LENGTH = 1024 /* This is the memory I added */
}
SECTIONS
{
.text
{
..blah blah blah..
} > FLASH
...more sections, like rodata and init_array..
/* Initialized data goes into RAM, load LMA copy after code */
.data
{
..blah blah blah with some linker symbols to denote the start and end..
} >RAM AT> FLASH
.bss
{
..blah blah blah..
} >RAM
.pma /* My new section! */
{
_pma_start = .;
. = ALIGN(2);
*(.pma)
*(.pma*)
} >PMA
}
What Happens
So this seems all fine and dandy, my stuff compiles and the map shows me that buffer0 and buffer1 are indeed placed at 0x40006000 and 0x40006080. Here is the output of the last bit of my makefile:
arm-none-eabi-gcc obj/usb_desc.o obj/usb_application.o obj/osc.o obj/usb.o obj/main.o obj/system_stm32f1xx.o obj/queue.o obj/list.o obj/heap_1.o obj/port.o obj/tasks.o obj/timers.o obj/startup_stm32f103x6.o -TSTM32F103X8_FLASH.ld -mthumb -mcpu=cortex-m3 --specs=nosys.specs -Wl,-Map,bin/blink.map -o bin/blink.elf
arm-none-eabi-objdump -D bin/blink.elf > bin/blink.lst
arm-none-eabi-size --format=SysV bin/blink.elf
bin/blink.elf :
section size addr
.isr_vector 268 134217728
.text 13504 134218000
.rodata 44 134231504
.ARM 8 134231548
.init_array 8 134231556
.fini_array 4 134231564
.data 1264 536870912
.jcr 4 536872176
.bss 1348 536872180
._user_heap_stack 1536 536873528
.pma 256 1073766400
.ARM.attributes 41 0
.debug_info 26748 0
.debug_abbrev 5331 0
.debug_aranges 368 0
.debug_line 5274 0
.debug_str 8123 0
.comment 29 0
.debug_frame 4988 0
Total 69146
arm-none-eabi-objcopy -R .stack -O binary bin/blink.elf bin/blink.bin
I see that .pma has 256 bytes used, just as I expected. The address looks correct as well. Now, when I ls the bin directory I'm greeted by this:
-rwxr-xr-x 1 kevin users 939548800 Nov 2 00:04 blink.bin*
-rwxr-xr-x 1 kevin users 221528 Nov 2 00:04 blink.elf*
That bin file is what I'm loading onto my chip's flash via openocd. It is an image of the flash starting at 0x08000000.
Sidenote: I actually attempted to load that bin file onto my chip before I had realized how huge it was...that didn't work obviously.
Here's what I get when I remove the PMA section:
arm-none-eabi-size --format=SysV bin/blink.elf
bin/blink.elf :
section size addr
.isr_vector 268 134217728
.text 13504 134218000
.rodata 44 134231504
.ARM 8 134231548
.init_array 8 134231556
.fini_array 4 134231564
.data 1392 536870912
.jcr 4 536872304
.bss 1348 536872308
._user_heap_stack 1536 536873656
.ARM.attributes 41 0
.debug_info 26748 0
.debug_abbrev 5331 0
.debug_aranges 368 0
.debug_line 5274 0
.debug_str 8123 0
.comment 29 0
.debug_frame 4988 0
Total 69018
And the file size is exactly as I would expect:
-rwxr-xr-x 1 kevin users 15236 Nov 2 00:09 blink.bin
-rwxr-xr-x 1 kevin users 198132 Nov 2 00:09 blink.elf
Question
As I understand it, what is going on here is that objcopy has just copied everything from 0x08000000 to 0x400060FF into that bin file. That is obviously not what I wanted to happen. I expected that it would just copy the flash.
Now, obviously I can just say -R .pma on my objcopy command and everything will be happy. However, what I'm curious about is how objcopy knows not to copy .data into the binary image. I noticed that running objcopy -R .data has the exact same result as running objcopy without that. Same thing with .bss. This tells me that it isn't the AT command in the linker script (which is the only real difference I can see between .data and .bss)
What can I do to make my .pma section behave the same way as .data or .bss from objcopy's perspective? Is there something interesting going on with .data/.bss in the intermediate elf file I'm using perhaps (see above for the linker command generating the elf file)?
Having defined your section as ".pma" most probably gave it the type "PROGBITS" (check with readelf), which indicates a section to be loaded on the target.
What you want/need is to define a section that doesn't have to be loaded on the target, like the ".bss" section, which has the type "NOBITS".
I frequently use the following section definition to avoid having certain buffers into the ".bss" section (as this slows down the startup phase due to the zero-initalization of the ".bss" section):
static uint8_t uart1_buffer_rx[4096] __attribute__((section(".noinit,\"aw\",%nobits#")));
I don't remember why I used the name ".noinit", but this sections appears after the ".bss" section.
In your case, it will probably help to add the "aw" and "nobits" flags after the ".pma" section declaration.
The alignment of .data and .bss is sometimes 4 bytes and sometimes it is 32 bytes.
Example 1: As per the last column in below output the alignment of bss and data is 32 bytes
bash-3.00$ readelf --sections libmodel.so
There are 39 section headers, starting at offset 0x1908a63c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[25] .data PROGBITS 01e221e0 1e211e0 26ca54 00 WA 0 0 32
[26] .bss NOBITS 0208ec40 208dc34 374178 00 WA 0 0 32
...
Example 2: As per the below output the alignment os .data and .bss is 4 bytes
bash-3.00$ readelf --sections ./a.out
There are 28 section headers, starting at offset 0x78c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[22] .data PROGBITS 0804956c 00056c 000034 00 WA 0 0 4
[23] .bss NOBITS 080495a0 0005a0 000004 00 WA 0 0 4
...
what determines the alignment for .bss and .data? Why is it sometimes 4 bytes and at other times 32 bytes?
I think those different align values were determined by respective ld scripts.
For libmodel.so: section .data align=16, section .bss align=16
For a.out: section .data align=4, section .bss align=4
Why is it sometimes 4 bytes and at other times 32 bytes?
Once an executable is loaded into memory, it is addressed via process virtual addresses. The alignment constraints are due to virtual address addressing. For example, if you look at the man page for elf, check out the description for sh_addralign. This is the reason why different elf objects specify different alignment requirements. You can experiment by changing the source for a.out to include a double and see if the alignment changes.
Note: This applies to memory alignment only. There is an alignment restriction for the actual on-disk file as well. Why? I am thinking this would be to help ease of mapping file data to in-core structures post reading the file in. Others can correct me if I am wrong here.
Update: Would like to clarify one issue. The virtual address alignment is necessary only because of the underlying memory access granularity enforced by the chipset. So, the same program compiled for different architectures may lead to diff alignment restrictions.
There's a really nice book, that will give you a great understanding of not only your question, but everything related to it.
The website of the book is here
Chapter n. 7 is the one you are looking for.
I don't know if you want a straight answer, or a more sophisticated one, but I hope it helps.
For bss the alignment is decided using size of data types, if it is not aligned then linker first align the variable address where the bss start can be placed.