Understanding size command for data bss segment in C

I'm getting unexpected output from the size command.
AFAIK, initialized global and static variables are stored in the data segment, while uninitialized ones and those initialized to 0 are stored in the bss segment.
printf("%zu", sizeof(int)); gives an int size of 4. However, the bss and data segments do not grow by 4 accordingly.
#include <stdio.h>
int main()
{
return 0;
}
C:\Program Files (x86)\Dev-Cpp\MinGW64\bin>size memory-layout.exe
text data bss dec hex filename
10044 2292 2512 14848 3a00 memory-layout.exe
#include <stdio.h>
int g; //uninitialised global variable so, stored in bss segment
int main()
{
return 0;
}
C:\Program Files (x86)\Dev-Cpp\MinGW64\bin>size memory-layout.exe
text data bss dec hex filename
10044 2292 2528 14864 3a10 memory-layout.exe
Why did bss increase by 16 (2528 - 2512) instead of 4 in the code above?
#include <stdio.h>
int g=0; //initialised to 0 so, stored in bss segment
int main()
{
return 0;
}
C:\Program Files (x86)\Dev-Cpp\MinGW64\bin>size memory-layout.exe
text data bss dec hex filename
10044 2292 2512 14848 3a00 memory-layout.exe
There is no increase in bss despite the global variable. Why is that?
#include <stdio.h>
int main()
{ static int g; //should be on bss segment
return 0;
}
C:\Program Files (x86)\Dev-Cpp\MinGW64\bin>size memory-layout.exe
text data bss dec hex filename
10044 2292 2512 14848 3a00 memory-layout.exe
There is no increase in the bss segment despite the static variable. Why?
I have one more question: what does dec represent here?

The first thing to consider is memory alignment. Variables and sections can be padded to make them sit on address boundaries. In the second example you are seeing an increase of 16 from the first, which suggests padding for 16-byte boundaries (2512 / 16 = 157, 2528 / 16 = 158). This is entirely implementation dependent.
As far as C is concerned, the second example differs from the third because int g without an initializer is only a tentative definition: the compiler cannot know whether another file defines g with an initializer (where it could be any value). It leaves a reference for the linker to deal with instead, which may lead to differences in padding.
In the third example, g is explicitly defined and set to 0, so the compiler knows to put this in the BSS section.
It's possible to demonstrate this with the generated assembly from my system:
with int g (no BSS section is defined in this case)
.comm g,4,4
This is an instruction for the linker to deal with the symbol, as the compiler cannot fully determine what to do with it.
with int g = 0
.bss
.align 4
.type g, #object
.size g, 4
g:
.zero 4
Here the compiler knows exactly what to do and so defines a BSS section for the symbol.
In my case, the linker resolves these identically. Both are placed in the BSS section at the same address, and so there is no difference in BSS size. You can examine the layout with a utility like nm.
nm -n file2 file3 | grep g$
000000000060103c B g
000000000060103c B g
i.e. on this system g is at the same address. Alternatively, with a debugger:
(gdb) info symbol 0x60103c
g in section .bss of /tmp/file2
Note also that in the final example the variable can be optimised out, since it is never used and, being a block-scope static, is not visible outside main.
As for dec, it is simply the sum of the sections in decimal.

This is from gcc on linux:
No Variable
text data bss dec hex filename
915 248 8 1171 493 none.out
Uninitialized Global
text data bss dec hex filename
915 248 12 1175 497 u_g.out
Initialized Global to 123
text data bss dec hex filename
915 252 8 1175 497 i_g.out
Initialized Local to 124
text data bss dec hex filename
915 252 8 1175 497 i_l.out
Initialized Global to 0
text data bss dec hex filename
915 248 12 1175 497 i_g_0.out
Initialized Local to 0
text data bss dec hex filename
915 248 12 1175 497 i_l_0.out
This is from mingw64 on Windows:
No Variable
text data bss dec hex filename
3173 1976 448 5597 15dd none.out
Uninitialized Global
text data bss dec hex filename
3173 1976 464 5613 15ed u_g.out
Initialized Global to 123
text data bss dec hex filename
3173 1976 448 5597 15dd i_g.out
Initialized Local to 124
text data bss dec hex filename
3173 1976 448 5597 15dd i_l.out
Initialized Global to 0
text data bss dec hex filename
3173 1976 480 5629 15fd i_g_0.out
Initialized Local to 0
text data bss dec hex filename
3173 1976 480 5629 15fd i_l_0.out
So although I don't have a final answer to the question (this wouldn't fit in a comment), the results make me suspect the executable file format of Windows and/or MinGW (i.e. not gcc).

BSS only contains static and global values which are not explicitly initialized. Even though you are explicitly initializing it to the same value to which it would be initialized if it were not initialized explicitly, the fact of explicit initialization means it doesn't belong in bss.

Related

GCC array optimization

I have a process with main() only and a lookup table that is mostly empty:
int arr[10] = {0, 0, 0, 0, 1, 0, 0, 1, 0, 0};
When I put this array in global area outside of main() and compile with gcc -O2 file.c, I get the following executable:
bash# size a.out
text data bss dec hex filename
1135 616 8 1759 6df a.out
When I put this array inside main() function and compile with gcc -O2 file.c, I get the following executable:
bash# size a.out
text data bss dec hex filename
1135 560 8 1703 6a7 a.out
Then, I change the size of the array to 10000, without modifying the contents, and run the test again. This time the results are:
Outside main():
bash# size a.out
text data bss dec hex filename
1135 40576 8 41719 a2f7 a.out
Inside main():
bash# size a.out
text data bss dec hex filename
1135 560 8 1703 6a7 a.out
Why is the optimization not working when the array is in the global area?
Is there a way to keep a large, mostly empty lookup table in the global area and still have it optimized?
/*have it start empty so it can go into .bss*/
int arr[10000];
//__attribute__((constructor))
void arr__init(void)
{
//set the ones
arr[4]=1; arr[7]=1;
}
int main()
{
//call the initializer
//(or uncomment the constructor attr to have it called before main automatically (nonstandard))
arr__init();
return arr[4]+arr[7]+arr[2];
}
size call on the object file:
text data bss dec hex filename
148 0 40000 40148 9cd4 a.out

Why different size of memory gets allocated to integer in BSS and in Data segment? [duplicate]

Please go through the following program -
#include <stdio.h>
void main()
{
}
Memory allocated for each segment is as follows(by using size command on Unix)-
text data bss dec hex filename
1040 484 16 1540 604 try
After declaration of global variable-
#include <stdio.h>
int i;
void main()
{
}
Memory allocated for each segment is as follows(by using size command on Unix)
Here variable 'i' has received memory in BSS(previously it was 16 and now it is 24)-
text data bss dec hex filename
1040 484 24 1548 60c try
After declaration of global variable and initializing it with 10-
#include <stdio.h>
int i=10;
void main()
{
}
Memory allocated for each segment is as follows(by using size command on Unix)
Here variable 'i' has received memory in data segment(previously it was 484 and now it is 488)-
text data bss dec hex filename
1040 488 16 1544 608 try
My question is: why did the global variable 'i' get 8 bytes of memory when it was stored in BSS but 4 bytes when it was stored in the data segment?
Why is there a difference in the memory allocated to an integer in BSS and in the data segment?
Why did the global variable 'i' get 8 bytes of memory when it was stored in BSS but 4 bytes when it was stored in the data segment?
First, why 4 bytes in data segment?
As many folks have already answered, the .data segment contains any global or static variables that are initialized beforehand. An int is 4 bytes in size, and that is reflected in the data segment size when you have a global int i=10; in your program.
Now, why 8 bytes in .bss segment?
You are observing this behavior because of the default linker script of the GNU linker (GNU ld). The GNU ld documentation describes linker scripts in detail.
While linking, GNU ld uses the default linker script.
The default linker script specifies the alignment for the .bss segment.
If you want to see the default linker script, you can do so with this command:
gcc -Wl,-verbose main.c
The output of this gcc command will contain following statement:
using internal linker script:
==================================================
// The content between these two lines is the default linker script
==================================================
In the default linker script, you can find the .bss section:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
/* Align here to ensure that the .bss section occupies space up to
_end. Align after .bss to ensure correct alignment even if the
.bss section disappears because there are no input sections.
FIXME: Why do we need it? When there is no .bss section, we don't
pad the .data section. */
. = ALIGN(. != 0 ? 64 / 8 : 1);
}
Here, you can see . = ALIGN(. != 0 ? 64 / 8 : 1); which indicates the default alignment as 8 bytes.
The program:
#include <stdio.h>
int i;
void main()
{
}
When built with the default linker script, 'i' gets 8 bytes of memory in BSS because of the 8-byte alignment:
# size a.out
text data bss dec hex filename
1040 484 24 1548 60c a.out
[bss = 24 bytes (16 + 8)]
The GNU linker lets you pass your own linker script, in which case it uses that script to build the target instead of the default one.
To try this, you can copy the content of the default linker script into a file and use this command to pass your linker script to GNU ld:
gcc -Xlinker -T my_linker_script main.c
Since you can supply your own linker script, you can make changes in it and see how the behavior changes.
In the .bss section, change . = ALIGN(. != 0 ? 64 / 8 : 1); to . = ALIGN(. != 0 ? 32 / 8 : 1);. This changes the alignment from 8 bytes to 4 bytes. Now build your target using the linker script with this change.
The output is:
# size a.out
text data bss dec hex filename
1040 484 20 1544 608 a.out
Here you can see that the bss size is 20 bytes (16 + 4) because of the 4-byte alignment.
Hope this answers your question.

Ambiguous behaviour of .bss segment in C program

I wrote the simple C program (test.c) below:-
#include<stdio.h>
int main()
{
return 0;
}
and executed the following to understand size changes in the .bss segment.
gcc test.c -o test
size test
The output came out as:-
text data bss dec hex filename
1115 552 8 1675 68b test
I didn't declare anything globally or of static scope. So please explain why the bss segment size is of 8 bytes.
I made the following change:-
#include<stdio.h>
int x; //declared global variable
int main()
{
return 0;
}
But to my surprise, the output was same as previous:-
text data bss dec hex filename
1115 552 8 1675 68b test
Please explain.
I then initialized the global:-
#include<stdio.h>
int x=67; //initialized global variable
int main()
{
return 0;
}
The data segment size increased as expected, but I didn't expect the size of bss segment to reduce to 4 (on the contrary to 8 when nothing was declared). Please explain.
text data bss dec hex filename
1115 556 4 1675 68b test
I also tried the commands objdump and nm, and they too showed variable x occupying .bss (in the 2nd case). However, the size command shows no change in bss size.
I followed the procedure according to:
http://codingfox.com/10-7-memory-segments-code-data-bss/
where the outputs are coming perfectly as expected.
When you compile a simple main program you are also linking startup code.
This code is responsible, among other things, for initializing bss.
That startup code is what "uses" the 8 bytes you are seeing in the .bss section.
You can strip that code using -nostartfiles gcc option:
-nostartfiles
Do not use the standard system startup files when linking. The standard system libraries are used normally, unless -nostdlib or -nodefaultlibs is used
To make a test use the following code
#include<stdio.h>
int _start()
{
return 0;
}
and compile it with
gcc -nostartfiles test.c
You'll see .bss set to 0
text data bss dec hex filename
206 224 0 430 1ae test
Your first two snippets are identical since you aren't using the variable x.
Try this
#include<stdio.h>
volatile int x;
int main()
{
x = 1;
return 0;
}
and you should see a change in .bss size.
Please note that those 4/8 bytes are something inside the start-up code. What it is and why it varies in size isn't possible to tell without digging into all the details of mentioned start-up code.

Why the int type takes up 8 bytes in BSS section but 4 bytes in DATA section

I am trying to learn the structure of executable files of C program. My environment is GCC and 64bit Intel processor.
Consider the following C code a.cc.
#include <cstdlib>
#include <cstdio>
int x;
int main(){
printf("%zu\n", sizeof(x));
return 10;
}
The size -o a shows
text data bss dec hex filename
1134 552 8 1694 69e a
After I added another initialized global variable y.
int y=10;
The size a shows (where a is the name of the executable file from a.cc)
text data bss dec hex filename
1134 556 12 1702 6a6 a
As we know, the BSS section stores the size of uninitialized global variables and DATA stores initialized ones.
Why does int take up 8 bytes in BSS? The sizeof(x) in my code shows that the int actually takes up 4 bytes.
The int y=10 added 4 bytes to DATA, which makes sense since int should take 4 bytes. But why does it add 4 bytes to BSS?
The difference between two size commands stays the same after deleting the two lines #include ....
Update:
I think my understanding of BSS is wrong. It may not store the uninitialized global variables. As the Wikipedia says "The size that BSS will require at runtime is recorded in the object file, but BSS (unlike the data segment) doesn't take up any actual space in the object file." For example, even the one line C code int main(){} has bss 8.
Does the 8 or 16 of BSS come from alignment?
It doesn't; it takes up 4 bytes regardless of which segment it's in. You can use the nm tool (from the GNU binutils package) with the -S argument to get the names and sizes of all of the symbols in the object file. You're likely seeing secondary effects of the compiler including or not including certain other symbols for whatever reasons.
For example:
$ cat a1.c
int x;
$ cat a2.c
int x = 1;
$ gcc -c a1.c a2.c
$ nm -S a1.o a2.o
a1.o:
0000000000000004 0000000000000004 C x
a2.o:
0000000000000000 0000000000000004 D x
One object file has a 4-byte object named x recorded as a common symbol (C), i.e. uninitialized data, while the other object file has a 4-byte object named x in the initialized data segment (D).

Memory map of C program with no global and local variables

I wrote a basic program:
#include<stdio.h>
int main(void)
{
return 0;
}
and checked its size with:
gcc -Wall test1.c
size a.out
text data bss dec hex filename
988 260 8 1256 4e8 a.out
Just out of curiosity: I did not declare any variable, global or local, initialized or uninitialized, so why are data and bss shown as 260 and 8 respectively?
Is this for the stack pointer and other variables required for code execution?

Resources