Interesting binary dump of executable file

Interesting binary dump of executable file - c

For some reason I made simple program in C to output binary representation of given input:
int main()
{
char c;
while(read(0,&c,1) > 0)
{
unsigned char cmp = 128;
while(cmp)
{
if(c & cmp)
write(1,"1",1);
else
write(1,"0",1);
cmp >>= 1;
}
}
return 0;
}
After compilation:
$ gcc bindump.c -o bindump
I made simple test to check if program is able to print binary:
$ cat bindump | ./bindump | fold -b100 | nl
Output is following: http://pastebin.com/u7SasKDJ
I suspected the output to look like random series of ones and zeroes. However, output partially seems to be quite more interesting. For example take a look at the output between line 171 and 357. I wonder why there are lots of zeros in compare to other sections of executable ?
My architecture is:
$ lscpu
Architecture: i686
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 28
Stepping: 10
CPU MHz: 1000.000
BogoMIPS: 3325.21
Virtualization: VT-x
L1d cache: 24K
L1i cache: 32K
L2 cache: 512K

When you compile a program into an executable on Linux (and a number of other unix systems), it is written in the ELF format. The ELF format has a number of sections, which you can examine with readelf or objdump:
readelf -a bindump | less
For example, section .text contains CPU instructions, .data global variables, .bss uninitialized global variables (it is actually empty in the ELF file itself, but is created in the main memory when the program is executed), .plt and .got which are jump tables, debugging information, etc.
Btw. it is much more convenient to examine the binary content of files with hexdump:
hexdump -C bindata | less
There you can see that starting with offset 0x850 (approx. line 171 in your dump) there is a lot of zeros, and you can also see the ASCII representation on the right.
Let us look at which sections correspond to the block of your interest between 0x850 and 0x1160 (the field Off – offset in the file is important here):
> readelf -a bindata
...
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[28] .shstrtab STRTAB 00000000 00074c 000106 00 0 0 1
[29] .symtab SYMTAB 00000000 000d2c 000440 10 30 45 4
...
You can examine the content of an individual section with -x:
> readelf -x .symtab bindump | less
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 34810408 00000000 03000100 ....4...........
0x00000020 00000000 48810408 00000000 03000200 ....H...........
0x00000030 00000000 68810408 00000000 03000300 ....h...........
0x00000040 00000000 8c810408 00000000 03000400 ................
0x00000050 00000000 b8810408 00000000 03000500 ................
0x00000060 00000000 d8810408 00000000 03000600 ................
You would see that there are many zeros. The section is composed of 18-byte values (= one line in the -x output) defining symbols. From readelf -a you can see that it has 68 entries, and first 27 of them (excl. the very first one) are of type SECTION:
Symbol table '.symtab' contains 68 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 08048134 0 SECTION LOCAL DEFAULT 1
2: 08048148 0 SECTION LOCAL DEFAULT 2
3: 08048168 0 SECTION LOCAL DEFAULT 3
4: 0804818c 0 SECTION LOCAL DEFAULT 4
...
According to the specification (page 1-18), each entry has the following format:
typedef struct {
Elf32_Word st_name;
Elf32_Addr st_value;
Elf32_Word st_size;
unsigned char st_info;
unsigned char st_other;
Elf32_Half st_shndx;
} Elf32_Sym;
Without going into too much detail here, I think what matters here is that st_name and st_size are both zeros for these SECTION entries. Both are 32-bit numbers, which means lots of zeros in this particular section.

This is not really a programming question, but however...
A binary normally consists of different sections: code, data, debugging info, etc. Since these sections contents differ by type, I would pretty much expect them to look different.
I.e. the symbol table consists of address offsets in your binary. If I read your lspci correctly, you are on a 32-bit system. That means Each offset has four bytes, and given the size of your program, in most cases two of those bytes will be zero. And there are more effects like this.
You didn't strip your program, that means there's still lots of information (symbol table etc.) present in the binary. Try stripping the binary and have a look at it again.

Related

How the virtual address of the process memory is calculated?

I'm write the following program to examine process memory layout:
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>
#define CHAR_LEN 255
char filepath[CHAR_LEN];
char line[CHAR_LEN];
char address[CHAR_LEN];
char perms[CHAR_LEN];
char offset[CHAR_LEN];
char dev[CHAR_LEN];
char inode[CHAR_LEN];
char pathname[CHAR_LEN];
int main() {
printf("Hello world.\n");
sprintf(filepath, "/proc/%u/maps", (unsigned)getpid());
FILE *f = fopen(filepath, "r");
printf("%-32s %-8s %-10s %-8s %-10s %s\n", "address", "perms", "offset",
"dev", "inode", "pathname");
while (fgets(line, sizeof(line), f) != NULL) {
sscanf(line, "%s%s%s%s%s%s", address, perms, offset, dev, inode, pathname);
printf("%-32s %-8s %-10s %-8s %-10s %s\n", address, perms, offset, dev,
inode, pathname);
}
fclose(f);
return 0;
}
I compile the program as gcc -static -O0 -g -std=gnu11 -o test_helloworld_memory_map test_helloworld_memory_map.c -lpthread. I first run readelf -l test_helloworld_memory_map and obtain:
Elf file type is EXEC (Executable file)
Entry point 0x400890
There are 6 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000c9e2e 0x00000000000c9e2e R E 200000
LOAD 0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
0x0000000000001c98 0x0000000000003db0 RW 200000
NOTE 0x0000000000000190 0x0000000000400190 0x0000000000400190
0x0000000000000044 0x0000000000000044 R 4
TLS 0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
0x0000000000000020 0x0000000000000050 R 8
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
0x0000000000000148 0x0000000000000148 R 1
Section to Segment mapping:
Segment Sections...
00 .note.ABI-tag .note.gnu.build-id .rela.plt .init .plt .text __libc_freeres_fn __libc_thread_freeres_fn .fini .rodata __libc_subfreeres __libc_atexit .stapsdt.base __libc_thread_subfreeres .eh_frame .gcc_except_table
01 .tdata .init_array .fini_array .jcr .data.rel.ro .got .got.plt .data .bss __libc_freeres_ptrs
02 .note.ABI-tag .note.gnu.build-id
03 .tdata .tbss
04
05 .tdata .init_array .fini_array .jcr .data.rel.ro .got
Then, I run the program and obtain:
address perms offset dev inode pathname
00400000-004ca000 r-xp 00000000 fd:01 12551992 /home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map
006c9000-006cc000 rw-p 000c9000 fd:01 12551992 /home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map
006cc000-006ce000 rw-p 00000000 00:00 0 /home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map
018ac000-018cf000 rw-p 00000000 00:00 0 [heap]
7ffc2845c000-7ffc2847d000 rw-p 00000000 00:00 0 [stack]
7ffc28561000-7ffc28563000 r--p 00000000 00:00 0 [vvar]
7ffc28563000-7ffc28565000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
I'm confused about why the virtual address of memory segment is different from one shown in "/proc/[pid]/maps". For example, the virtual address of the 2nd memory segment is 0xc9eb8 shown by readelf but in the process memory, it is calculated to 0x6c9000. How's this calculation is done?
I know the linker specifies 0x400000 as the starting address of the first memory segment and process memory shows address aligned to the page size (4K) (e.g., 0xc9e2e is aligned to 0xca000 plus 0x400000). I think this has something to do with "Align" column shown by readelf. However, reading ELF header makes me confuse:
p_align This member holds the value to which the segments are
aligned in memory and in the file. Loadable process seg‐
ments must have congruent values for p_vaddr and p_offset,
modulo the page size. Values of zero and one mean no
alignment is required. Otherwise, p_align should be a pos‐
itive, integral power of two, and p_vaddr should equal
p_offset, modulo p_align.
In specific, what does the last sentence means?:
Otherwise, p_align should be a positive, integral power of two, and p_vaddr should equal p_offset, modulo p_align.
What's the calculation formula it is talking about?
Thanks much!

CPU address mapping has a "page" granularity, 4K is still a very common page size. /proc/$pid/maps shows you the OS mappings, it doesn't show you what addresses the process actually cares about inside the mapped ranges. Your process only cares about what starts at offset eb8 into the first mapped page, but the CPU (and hence the OS that's controlling it for you) can't be bothered to map down to byte granularity, and the linker knows it, so it sets up the disk file with cpu-page-sized blocks.

It means that for other than loadable segments, i.e. those without LOAD, the last n bits in the offset must match the last n in virtual address; and the value of the p_align field is the 1 << n.
For example, the stack says it can be placed anywhere, just that the address needs to be 16-aligned.
For loadable they need to be at least page-aligned. Take the second one from your example:
Offset VirtAddr
LOAD 0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
0x0000000000001c98 0x0000000000003db0 RW 200000
Given page size of 4096, the last 12 bits of the offset must be the same as the last 12 bits of the virtual address. This is because a dynamic linker usually uses mmapto map the pages directly from the file into memory, and this can be only page-granular. So in fact the dynamic linker did map the first part of this range from the file.
006c9000-006cc000 rw-p 000c9000 fd:01 12551992
/home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map
Further see that the file size is less than virtual size - the rest of the data will be zero mapped in the other mapping:
006cc000-006ce000 rw-p 00000000 00:00 0
/home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map
If you read the bytes at 0x00000000006c9000 - 0x00000000006c9eb7 you should see the exact same bytes as those at 0x00000000004c9000 - 0x00000000006c9eb7, this is because the data segment and code segment come right after each other in the file without padding - this saves lots of disk space and actually helps saving ram too because the executable takes less space in the block device caches!

Advantage of different data section in c [duplicate]

When checking the disassembly of the object file through the readelf, I see the data and the bss segments contain the same offset address.
The data section will contain the initialized global and static variables. BSS will contain un-initialized global and static variables.
1 #include<stdio.h>
2
3 static void display(int i, int* ptr);
4
5 int main(){
6 int x = 5;
7 int* xptr = &x;
8 printf("\n In main() program! \n");
9 printf("\n x address : 0x%x x value : %d \n",(unsigned int)&x,x);
10 printf("\n xptr points to : 0x%x xptr value : %d \n",(unsigned int)xptr,*xptr);
11 display(x,xptr);
12 return 0;
13 }
14
15 void display(int y,int* yptr){
16 char var[7] = "ABCDEF";
17 printf("\n In display() function \n");
18 printf("\n y value : %d y address : 0x%x \n",y,(unsigned int)&y);
19 printf("\n yptr points to : 0x%x yptr value : %d \n",(unsigned int)yptr,*yptr);
20 }
output:
SSS:~$ size a.out
text data bss dec hex filename
1311 260 8 1579 62b a.out
Here in the above program I don't have any un-initialized data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes?
Also when I disassemble the object file,
EDITED :
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000110 0000cf 00 A 0 0 4
data, rodata and bss has the same offset address. Does it mean the rodata, data and bss refer to the same address?
Do Data section, rodata section and the bss section contain the data values in the same address, if so how to distinguish the data section, bss section and rodata section?

The .bss section is guaranteed to be all zeros when the program is loaded into memory. So any global data that is uninitialized, or initialized to zero is placed in the .bss section. For example:
static int g_myGlobal = 0; // <--- in .bss section
The nice part about this is, the .bss section data doesn't have to be included in the ELF file on disk (ie. there isn't a whole region of zeros in the file for the .bss section). Instead, the loader knows from the section headers how much to allocate for the .bss section, and simply zero it out before handing control over to your program.
Notice the readelf output:
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
.data is marked as PROGBITS. That means there are "bits" of program data in the ELF file that the loader needs to read out into memory for you. .bss on the other hand is marked NOBITS, meaning there's nothing in the file that needs to be read into memory as part of the load.
Example:
// bss.c
static int g_myGlobal = 0;
int main(int argc, char** argv)
{
return 0;
}
Compile it with $ gcc -m32 -Xlinker -Map=bss.map -o bss bss.c
Look at the section headers with $ readelf -S bss
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
:
[13] .text PROGBITS 080482d0 0002d0 000174 00 AX 0 0 16
:
[24] .data PROGBITS 0804964c 00064c 000004 00 WA 0 0 4
[25] .bss NOBITS 08049650 000650 000008 00 WA 0 0 4
:
Now we look for our variable in the symbol table: $ readelf -s bss | grep g_myGlobal
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
Note that g_myGlobal is shown to be a part of section 25. If we look back in the section headers, we see that 25 is .bss.
To answer your real question:
Here in the above program I dont have any un-intialised data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes ?
Continuing with my example, we look for any symbol in section 25:
$ readelf -s bss | grep 25
9: 0804825c 0 SECTION LOCAL DEFAULT 9
25: 08049650 0 SECTION LOCAL DEFAULT 25
32: 08049650 1 OBJECT LOCAL DEFAULT 25 completed.5745
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
The third column is the size. We see our expected 4-byte g_myGlobal, and this 1-byte completed.5745. This is probably a function-static variable from somewhere in the C runtime initialization - remember, a lot of "stuff" happens before main() is ever called.
4+1=5 bytes. However, if we look back at the .bss section header, we see the last column Al is 4. That is the section alignment, meaning this section, when loaded, will always be a multiple of 4 bytes. The next multiple up from 5 is 8, and that's why the .bss section is 8 bytes.
Additionally We can look at the map file generated by the linker to see what object files got placed where in the final output.
.bss 0x0000000008049650 0x8
*(.dynbss)
.dynbss 0x0000000000000000 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
*(.bss .bss.* .gnu.linkonce.b.*)
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crti.o
.bss 0x0000000008049650 0x1 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtbegin.o
.bss 0x0000000008049654 0x4 /tmp/ccKF6q1g.o
.bss 0x0000000008049658 0x0 /usr/lib/libc_nonshared.a(elf-init.oS)
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtend.o
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crtn.o
Again, the third column is the size.
We see 4 bytes of .bss came from /tmp/ccKF6q1g.o. In this trivial example, we know that is the temporary object file from the compilation of our bss.c file. The other 1 byte came from crtbegin.o, which is part of the C runtime.
Finally, since we know that this 1 byte mystery bss variable is from crtbegin.o, and it's named completed.xxxx, it's real name is completed and it's probably a static inside some function. Looking at crtstuff.c we find the culprit: a static _Bool completed inside of __do_global_dtors_aux().

By definition, the bss segment takes some place in memory (when the program starts) but don't need any disk space. You need to define some variable to get it filled, so try
int bigvar_in_bss[16300];
int var_in_data[5] = {1,2,3,4,5};
Your simple program might not have any data in .bss, and shared libraries (like libc.so) may have "their own .bss"
File offsets and memory addresses are not easily related.
Read more about the ELF specification, also use /proc/ (eg cat /proc/self/maps would display the address space of the cat process running that command).
Read also proc(5)

ebpf - sections names

Is it mandatory to have unique names for every program section in bpf program? For instance, this program compiles fine with llvm-5.0 :
...
SEC("sockops")
int bpf1(struct bpf_sock_ops *sk_ops)
{
return 1;
}
SEC("sockops")
int bpf2(struct bpf_sock_ops *sk_ops)
{
return 1;
}
SEC("sockops")
int bpf3(struct bpf_sock_ops *sk_ops)
{
return 1;
}
SEC("sockops")
int bpf_main(struct bpf_sock_ops *sk_ops)
{
__u32 port = bpf_ntohl(sk_ops->remote_port);
switch (port) {
case 5000:
bpf_tail_call(sk_ops, &jmp_table, 1);
break;
case 6000:
bpf_tail_call(sk_ops, &jmp_table, 2);
break;
case 7000:
bpf_tail_call(sk_ops, &jmp_table, 3);
break;
}
sk_ops->reply = 0;
return 1;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;
However, llvm-objdump reports only one program section:
$ llvm-objdump-5.0 -section-headers bpf_main.o
bpf_main.o: file format ELF64-BPF
Sections:
Idx Name Size Address Type
0 00000000 0000000000000000
1 .strtab 000000a5 0000000000000000
2 .text 00000000 0000000000000000 TEXT DATA
3 sockops 000001f8 0000000000000000 TEXT DATA
4 .relsockops 00000030 0000000000000000
5 maps 0000001c 0000000000000000 DATA
6 .rodata.str1.16 00000021 0000000000000000 DATA
7 .rodata.str1.1 0000000e 0000000000000000 DATA
8 license 00000004 0000000000000000 DATA
9 version 00000004 0000000000000000 DATA
10 .eh_frame 00000090 0000000000000000 DATA
11 .rel.eh_frame 00000040 0000000000000000
12 .symtab 00000138 0000000000000000
Is there a compiler option to give at least a warning?
UPDATE - so I understand that sections provided in the same bpf file must have unique names. Is it true in cases when bpf programs reside in different files and compiled independently, so that they could be loaded and call one another, e.g. tail calls?

Apparently the compiler concatenates everything in the same section (I simplified bpf_main to have it compile and to test).
$ readelf -x sockops bpf_prog.o
Hex dump of section 'sockops':
0x00000000 b7000000 01000000 95000000 00000000 ................
0x00000010 b7000000 01000000 95000000 00000000 ................
0x00000020 b7000000 01000000 95000000 00000000 ................
0x00000030 b7000000 00000000 95000000 00000000 ...............
From here, I cannot see how you intend to later retrieve the individual programs to load them properly. Usually, user space tools get one program from a section, and attempt to load them by passing the instructions to the kernel through the bpf() syscall; here, whatever program you use will probably try to load the concatenated four programs at once.
Is there a particular reason why you would like to have all programs in sections with the same name?
I have no idea for the warning in LLVM on such a thing. I guess not: people may have reason to put different stuff in a section. The case here is specific to BPF, and I do not think people usually try to put several programs in a same section. But this is just a guess, I may be wrong.
Regarding your update:
I do not believe using similar sections names in different object files is an issue. As long as your user space tool is able to retrieve the programs from object files and to perform relocation, it should work fine. The kernel does not see any section name anyway. What it takes from the attr argument is an array of instructions:
struct { /* anonymous struct used by BPF_PROG_LOAD command */
__u32 prog_type; /* one of enum bpf_prog_type */
__u32 insn_cnt;
__aligned_u64 insns;
__aligned_u64 license;
__u32 log_level; /* verbosity level of verifier */
__u32 log_size; /* size of user buffer */
__aligned_u64 log_buf; /* user supplied buffer */
__u32 kern_version; /* checked when prog_type=kprobe */
__u32 prog_flags;
char prog_name[BPF_OBJ_NAME_LEN];
__u32 prog_ifindex; /* ifindex of netdev to prep for */
};
(from include/uapi/linux/bpf.h, note the insns attribute). For tail calls your user space code also needs to create the specific maps holding referenced to the programs you want to jump to (more bpf() calls). But there again, it's all the responsibility of user space to extract the information from the ELF files. Libbpf should be able to handle it correctly I think, but I have not tried it.
A side note to finish: I don't know what your use case is exactly, but since you are trying to compile your code in different object files and to use calls, you might be interested to learn that eBPF now supports “function calls”, i.e. you can define several (non-inline) functions, possibly in several ELF files, and call them in your program. More information in the cover letter. Again, I didn't have the time to experiment with it so far.

How is the alignment of .data and .bss determined

The alignment of .data and .bss is sometimes 4 bytes and sometimes it is 32 bytes.
Example 1: As per the last column in below output the alignment of bss and data is 32 bytes
bash-3.00$ readelf --sections libmodel.so
There are 39 section headers, starting at offset 0x1908a63c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[25] .data PROGBITS 01e221e0 1e211e0 26ca54 00 WA 0 0 32
[26] .bss NOBITS 0208ec40 208dc34 374178 00 WA 0 0 32
...
Example 2: As per the below output the alignment os .data and .bss is 4 bytes
bash-3.00$ readelf --sections ./a.out
There are 28 section headers, starting at offset 0x78c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[22] .data PROGBITS 0804956c 00056c 000034 00 WA 0 0 4
[23] .bss NOBITS 080495a0 0005a0 000004 00 WA 0 0 4
...
what determines the alignment for .bss and .data? Why is it sometimes 4 bytes and at other times 32 bytes?

I think those different align values were determined by respective ld scripts.
For libmodel.so: section .data align=16, section .bss align=16
For a.out: section .data align=4, section .bss align=4

Why is it sometimes 4 bytes and at other times 32 bytes?
Once an executable is loaded into memory, it is addressed via process virtual addresses. The alignment constraints are due to virtual address addressing. For example, if you look at the man page for elf, check out the description for sh_addralign. This is the reason why different elf objects specify different alignment requirements. You can experiment by changing the source for a.out to include a double and see if the alignment changes.
Note: This applies to memory alignment only. There is an alignment restriction for the actual on-disk file as well. Why? I am thinking this would be to help ease of mapping file data to in-core structures post reading the file in. Others can correct me if I am wrong here.
Update: Would like to clarify one issue. The virtual address alignment is necessary only because of the underlying memory access granularity enforced by the chipset. So, the same program compiled for different architectures may lead to diff alignment restrictions.

There's a really nice book, that will give you a great understanding of not only your question, but everything related to it.
The website of the book is here
Chapter n. 7 is the one you are looking for.
I don't know if you want a straight answer, or a more sophisticated one, but I hope it helps.

For bss the alignment is decided using size of data types, if it is not aligned then linker first align the variable address where the bss start can be placed.

Difference between data section and the bss section in C

When checking the disassembly of the object file through the readelf, I see the data and the bss segments contain the same offset address.
The data section will contain the initialized global and static variables. BSS will contain un-initialized global and static variables.
1 #include<stdio.h>
2
3 static void display(int i, int* ptr);
4
5 int main(){
6 int x = 5;
7 int* xptr = &x;
8 printf("\n In main() program! \n");
9 printf("\n x address : 0x%x x value : %d \n",(unsigned int)&x,x);
10 printf("\n xptr points to : 0x%x xptr value : %d \n",(unsigned int)xptr,*xptr);
11 display(x,xptr);
12 return 0;
13 }
14
15 void display(int y,int* yptr){
16 char var[7] = "ABCDEF";
17 printf("\n In display() function \n");
18 printf("\n y value : %d y address : 0x%x \n",y,(unsigned int)&y);
19 printf("\n yptr points to : 0x%x yptr value : %d \n",(unsigned int)yptr,*yptr);
20 }
output:
SSS:~$ size a.out
text data bss dec hex filename
1311 260 8 1579 62b a.out
Here in the above program I don't have any un-initialized data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes?
Also when I disassemble the object file,
EDITED :
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000110 0000cf 00 A 0 0 4
data, rodata and bss has the same offset address. Does it mean the rodata, data and bss refer to the same address?
Do Data section, rodata section and the bss section contain the data values in the same address, if so how to distinguish the data section, bss section and rodata section?

The .bss section is guaranteed to be all zeros when the program is loaded into memory. So any global data that is uninitialized, or initialized to zero is placed in the .bss section. For example:
static int g_myGlobal = 0; // <--- in .bss section
The nice part about this is, the .bss section data doesn't have to be included in the ELF file on disk (ie. there isn't a whole region of zeros in the file for the .bss section). Instead, the loader knows from the section headers how much to allocate for the .bss section, and simply zero it out before handing control over to your program.
Notice the readelf output:
[ 3] .data PROGBITS 00000000 000110 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000110 000000 00 WA 0 0 4
.data is marked as PROGBITS. That means there are "bits" of program data in the ELF file that the loader needs to read out into memory for you. .bss on the other hand is marked NOBITS, meaning there's nothing in the file that needs to be read into memory as part of the load.
Example:
// bss.c
static int g_myGlobal = 0;
int main(int argc, char** argv)
{
return 0;
}
Compile it with $ gcc -m32 -Xlinker -Map=bss.map -o bss bss.c
Look at the section headers with $ readelf -S bss
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
:
[13] .text PROGBITS 080482d0 0002d0 000174 00 AX 0 0 16
:
[24] .data PROGBITS 0804964c 00064c 000004 00 WA 0 0 4
[25] .bss NOBITS 08049650 000650 000008 00 WA 0 0 4
:
Now we look for our variable in the symbol table: $ readelf -s bss | grep g_myGlobal
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
Note that g_myGlobal is shown to be a part of section 25. If we look back in the section headers, we see that 25 is .bss.
To answer your real question:
Here in the above program I dont have any un-intialised data but the BSS has occupied 8 bytes. Why does it occupy 8 bytes ?
Continuing with my example, we look for any symbol in section 25:
$ readelf -s bss | grep 25
9: 0804825c 0 SECTION LOCAL DEFAULT 9
25: 08049650 0 SECTION LOCAL DEFAULT 25
32: 08049650 1 OBJECT LOCAL DEFAULT 25 completed.5745
37: 08049654 4 OBJECT LOCAL DEFAULT 25 g_myGlobal
The third column is the size. We see our expected 4-byte g_myGlobal, and this 1-byte completed.5745. This is probably a function-static variable from somewhere in the C runtime initialization - remember, a lot of "stuff" happens before main() is ever called.
4+1=5 bytes. However, if we look back at the .bss section header, we see the last column Al is 4. That is the section alignment, meaning this section, when loaded, will always be a multiple of 4 bytes. The next multiple up from 5 is 8, and that's why the .bss section is 8 bytes.
Additionally We can look at the map file generated by the linker to see what object files got placed where in the final output.
.bss 0x0000000008049650 0x8
*(.dynbss)
.dynbss 0x0000000000000000 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
*(.bss .bss.* .gnu.linkonce.b.*)
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crt1.o
.bss 0x0000000008049650 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crti.o
.bss 0x0000000008049650 0x1 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtbegin.o
.bss 0x0000000008049654 0x4 /tmp/ccKF6q1g.o
.bss 0x0000000008049658 0x0 /usr/lib/libc_nonshared.a(elf-init.oS)
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/32/crtend.o
.bss 0x0000000008049658 0x0 /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../lib/crtn.o
Again, the third column is the size.
We see 4 bytes of .bss came from /tmp/ccKF6q1g.o. In this trivial example, we know that is the temporary object file from the compilation of our bss.c file. The other 1 byte came from crtbegin.o, which is part of the C runtime.
Finally, since we know that this 1 byte mystery bss variable is from crtbegin.o, and it's named completed.xxxx, it's real name is completed and it's probably a static inside some function. Looking at crtstuff.c we find the culprit: a static _Bool completed inside of __do_global_dtors_aux().

By definition, the bss segment takes some place in memory (when the program starts) but don't need any disk space. You need to define some variable to get it filled, so try
int bigvar_in_bss[16300];
int var_in_data[5] = {1,2,3,4,5};
Your simple program might not have any data in .bss, and shared libraries (like libc.so) may have "their own .bss"
File offsets and memory addresses are not easily related.
Read more about the ELF specification, also use /proc/ (eg cat /proc/self/maps would display the address space of the cat process running that command).
Read also proc(5)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight