GCC array optimization - c

I have a program that consists only of main() and a lookup table that is mostly empty:
int arr[10] = {0, 0, 0, 0, 1, 0, 0, 1, 0, 0};
When I put this array in the global area, outside of main(), and compile with gcc -O2 file.c, I get the following executable:
bash# size a.out
text data bss dec hex filename
1135 616 8 1759 6df a.out
When I put this array inside the main() function and compile with gcc -O2 file.c, I get the following executable:
bash# size a.out
text data bss dec hex filename
1135 560 8 1703 6a7 a.out
Then, I change the size of the array to 10000, without modifying the contents, and run the test again. This time the results are:
Outside main():
bash# size a.out
text data bss dec hex filename
1135 40576 8 41719 a2f7 a.out
Inside main():
bash# size a.out
text data bss dec hex filename
1135 560 8 1703 6a7 a.out
Why is the optimization not working when the array is in the global area?
Is there a way to keep a large, mostly empty lookup table in the global area and still have it optimized?

/* have it start empty so it can go into .bss */
int arr[10000];
//__attribute__((constructor))
void arr__init(void)
{
//set the ones
arr[4]=1; arr[7]=1;
}
int main()
{
//call the initializer
//(or uncomment the constructor attr to have it called before main automatically (nonstandard))
arr__init();
return arr[4]+arr[7]+arr[2];
}
size call on the object file:
text data bss dec hex filename
148 0 40000 40148 9cd4 a.out
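One way to package the run-time initialization shown above is sketched here: the nonzero entries live in a small constant list and are copied into a zero-initialized array on first use, so the large array can stay out of the data segment. The names table_get, nonzero_entries, and TABLE_SIZE are invented for this illustration, and the lazy initialization is not thread-safe.

```c
#include <stddef.h>

#define TABLE_SIZE 10000

/* One nonzero slot of the sparse table (hypothetical layout). */
struct entry {
    size_t index;
    int value;
};

/* Only the nonzero entries are stored as initialized data. */
static const struct entry nonzero_entries[] = { {4, 1}, {7, 1} };

/* No initializer: zero-filled at startup, typically placed in .bss. */
static int table[TABLE_SIZE];
static int table_ready;

/* Return table[i], filling in the nonzero entries on the first call.
   Out-of-range indices return 0. */
int table_get(size_t i)
{
    if (!table_ready) {
        size_t count = sizeof nonzero_entries / sizeof nonzero_entries[0];
        for (size_t k = 0; k < count; k++)
            table[nonzero_entries[k].index] = nonzero_entries[k].value;
        table_ready = 1;
    }
    return i < TABLE_SIZE ? table[i] : 0;
}
```

Calling table_get(4) + table_get(7) + table_get(2) from main() reproduces the return value of the program above without a separate initialization call.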

Related

Why has the .bss segment not increased when variables are added?

Recently, I learned that the .bss segment stores uninitialized data. However, when I try a small program like the one below and run the size(1) command in a terminal, the .bss segment does not change, even if I add some global variables. Am I misunderstanding something?
jameschu#aspire-e5-573g:~$ cat test.c
#include <stdio.h>
int main(void)
{
printf("hello world\n");
return 0;
}
jameschu#aspire-e5-573g:~$ gcc -c test.c
jameschu#aspire-e5-573g:~$ size test.o
text data bss dec hex filename
89 0 0 89 59 test.o
jameschu#aspire-e5-573g:~$ cat test.c
#include <stdio.h>
int a1;
int a2;
int a3;
int main(void)
{
printf("hello world\n");
return 0;
}
jameschu#aspire-e5-573g:~$ gcc -c test.c
jameschu#aspire-e5-573g:~$ size test.o
text data bss dec hex filename
89 0 0 89 59 test.o
This is because of the way global variables work.
The problem being solved is that it is possible to declare a global variable, without initializing it, in several .c files and not get a duplicate symbol error. That is, every uninitialized global declaration works like a weak declaration that can be treated as external if no other declaration contains an initializer.
How is this implemented by the compiler? Easy:
when compiling, instead of adding that variable to the bss segment, the compiler adds it to the COMMON segment.
when linking, the linker merges all the COMMON variables with the same name and discards any that is already defined in another section. The remaining ones are moved to the bss of the executable.
And that is why you don't see your variables in the bss of the object file, but you do in the executable file.
You can check the contents of the object file's sections using a more modern alternative to size, such as objdump -x, and note how the variables are placed in *COM*.
It is worth noting that if you declare your global variable as static you are saying that the variable belongs to that compilation unit, so the COMMON is not used and you get the behavior you expect:
int a;
int b;
static int c;
$ size test.o
text data bss dec hex filename
91 0 4 95 5f test.o
Initializing to 0 will get a similar result.
int a;
int b;
int c = 0;
$ size test.o
text data bss dec hex filename
91 0 4 95 5f test.o
However initializing to anything other than 0 will move that variable to data:
int a;
int b = 1;
int c = 0;
$ size test.o
text data bss dec hex filename
91 4 4 99 63 test.o
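To see all of these cases side by side, a file like the following can be compiled with gcc -c and inspected with nm or objdump -x. It is a sketch with invented variable names, and the section named in each comment is what GCC typically does on ELF targets, not something the C standard guarantees. GCC 10 and later default to -fno-common, in which case the tentative definition goes straight into .bss in the object file; passing -fcommon should reproduce the *COM* placement described above.

```c
/* placement.c: where GCC typically puts each kind of global on ELF targets. */

int tentative;          /* tentative definition: *COM* with -fcommon, .bss with -fno-common */
static int file_local;  /* internal linkage, zero by default: .bss */
int zeroed = 0;         /* explicitly zero: .bss under GCC's default settings */
int nonzero = 1;        /* nonzero initializer: .data */

/* Read every variable so that an unoptimized build keeps all of them;
   with optimization enabled, the never-written static may still be removed. */
int sum_all(void)
{
    return tentative + file_local + zeroed + nonzero;
}
```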

Ambiguous behaviour of .bss segment in C program

I wrote the simple C program (test.c) below:-
#include<stdio.h>
int main()
{
return 0;
}
and executed the following to understand size changes in the .bss segment.
gcc test.c -o test
size test
The output came out as:-
text data bss dec hex filename
1115 552 8 1675 68b test
I didn't declare anything globally or of static scope. So please explain why the bss segment size is of 8 bytes.
I made the following change:-
#include<stdio.h>
int x; //declared global variable
int main()
{
return 0;
}
But to my surprise, the output was the same as before:-
text data bss dec hex filename
1115 552 8 1675 68b test
Please explain.
I then initialized the global:-
#include<stdio.h>
int x=67; //initialized global variable
int main()
{
return 0;
}
The data segment size increased as expected, but I didn't expect the size of the bss segment to drop to 4 (as opposed to 8 when nothing was declared). Please explain.
text data bss dec hex filename
1115 556 4 1675 68b test
I also tried the commands objdump and nm, and they too showed variable x occupying .bss (in the 2nd case). However, the size command shows no change in bss size.
I followed the procedure according to:
http://codingfox.com/10-7-memory-segments-code-data-bss/
where the outputs are coming perfectly as expected.
When you compile a simple main program you are also linking startup code.
This code is responsible, among other things, for initializing bss.
That startup code is what "uses" the 8 bytes you are seeing in the .bss section.
You can leave that code out using the -nostartfiles gcc option:
-nostartfiles
Do not use the standard system startup files when linking. The standard system libraries are used normally, unless -nostdlib or -nodefaultlibs is used
To make a test use the following code
#include<stdio.h>
int _start()
{
return 0;
}
and compile it with
gcc -nostartfiles test.c
You'll see .bss set to 0
text data bss dec hex filename
206 224 0 430 1ae test
Your first two snippets are identical since you aren't using the variable x.
Try this
#include<stdio.h>
volatile int x;
int main()
{
x = 1;
return 0;
}
and you should see a change in .bss size.
Please note that those 4/8 bytes belong to something inside the start-up code. What it is and why it varies in size can't be determined without digging into the details of that start-up code.

understanding size command for data bss segment in C

I'm getting unexpected output from the size command.
As far as I know, initialized global and static variables are stored in the data segment, while global/static variables that are uninitialized or initialized to 0 are stored in the bss segment.
printf("%d",sizeof(int)); gives an int size of 4. However, the bss and data segments do not grow by 4 accordingly.
#include <stdio.h>
int main()
{
return 0;
}
C:\Program Files (x86)\Dev-Cpp\MinGW64\bin>size memory-layout.exe
text data bss dec hex filename
10044 2292 2512 14848 3a00 memory-layout.exe
#include <stdio.h>
int g; //uninitialised global variable so, stored in bss segment
int main()
{
return 0;
}
C:\Program Files (x86)\Dev-Cpp\MinGW64\bin>size memory-layout.exe
text data bss dec hex filename
10044 2292 2528 14864 3a10 memory-layout.exe
Why did bss increase by 16 (2528 - 2512) instead of 4 in the code above?
#include <stdio.h>
int g=0; //initialised to 0 so, stored in bss segment
int main()
{
return 0;
}
C:\Program Files (x86)\Dev-Cpp\MinGW64\bin>size memory-layout.exe
text data bss dec hex filename
10044 2292 2512 14848 3a00 memory-layout.exe
There is no increase in bss in spite of using a global variable. Why is that?
#include <stdio.h>
int main()
{ static int g; //should be on bss segment
return 0;
}
C:\Program Files (x86)\Dev-Cpp\MinGW64\bin>size memory-layout.ex
text data bss dec hex filename
10044 2292 2512 14848 3a00 memory-layout.exe
There is no increase in the bss segment in spite of using a static variable. Why?
I have one more question: what does dec represent here?
The first thing to consider is memory alignment. Variables and sections can be padded to make them sit on address boundaries. In the second example you are seeing an increase of 16 from the first, which suggests padding for 16-byte boundaries (2512 / 16 = 157, 2528 / 16 = 158). This is entirely implementation dependent.
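The arithmetic can be checked with a small helper that rounds a size up to the next multiple of an alignment. This is only a sketch of the rounding rule: the 16-byte alignment is inferred from the numbers in the question rather than taken from any linker documentation, and align_up is a made-up name.

```c
#include <stddef.h>

/* Round size up to the next multiple of align.
   Assumes align is a nonzero power of two. */
size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

Under that assumption, align_up(2512 + 4, 16) evaluates to 2528, which matches the bss size reported after adding one 4-byte int.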
As far as C is concerned, the second example differs from the third because the compiler cannot know if int g is a definition or just a declaration for an integer defined in another file (where it could be any value). It leaves a reference for the linker to deal with instead, which may lead to differences in padding.
In the third example, g is explicitly defined and set to 0, so the compiler knows to put this in the BSS section.
It's possible to demonstrate this with the generated assembly from my system:
with int g (no BSS section is defined in this case)
.comm g,4,4
This is an instruction for the linker to deal with the symbol, as the compiler cannot fully determine what to do with it.
with int g = 0
.bss
.align 4
.type g, #object
.size g, 4
g:
.zero 4
Here the compiler knows exactly what to do and so defines a BSS section for the symbol.
In my case, the linker resolves these identically. Both are placed in the BSS section at the same address, and so there is no difference in BSS size. You can examine the layout with a utility like nm.
nm -n file2 file3 | grep g$
000000000060103c B g
000000000060103c B g
i.e. on this system g is at the same address. Alternatively, with a debugger:
(gdb) info symbol 0x60103c
g in section .bss of /tmp/file2
Note also that in the final example the variable can be optimised out, since it has internal linkage.
As for dec, it is simply the sum of the sections in decimal.
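As a quick check of that statement, the sketch below recomputes the dec and hex columns from the other three. size_total and format_totals are hypothetical helpers written for this illustration; they are not part of binutils.

```c
#include <stdio.h>

/* Sum of the text, data, and bss columns, as shown in the dec column. */
unsigned long size_total(unsigned long text, unsigned long data, unsigned long bss)
{
    return text + data + bss;
}

/* Write the dec and hex columns for one row into buf.
   Returns the value returned by snprintf. */
int format_totals(char *buf, size_t n,
                  unsigned long text, unsigned long data, unsigned long bss)
{
    unsigned long total = size_total(text, data, bss);
    return snprintf(buf, n, "%lu %lx", total, total);
}
```

For the first row in the question, size_total(10044, 2292, 2512) gives 14848, which is 3a00 in hexadecimal.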
This is from gcc on linux:
No Variable
text data bss dec hex filename
915 248 8 1171 493 none.out
Uninitialized Global
text data bss dec hex filename
915 248 12 1175 497 u_g.out
Initialized Global to 123
text data bss dec hex filename
915 252 8 1175 497 i_g.out
Initialized Local to 124
text data bss dec hex filename
915 252 8 1175 497 i_l.out
Initialized Global to 0
text data bss dec hex filename
915 248 12 1175 497 i_g_0.out
Initialized Local to 0
text data bss dec hex filename
915 248 12 1175 497 i_l_0.out
This is from mingw64 on Windows:
No Variable
text data bss dec hex filename
3173 1976 448 5597 15dd none.out
Uninitialized Global
text data bss dec hex filename
3173 1976 464 5613 15ed u_g.out
Initialized Global to 123
text data bss dec hex filename
3173 1976 448 5597 15dd i_g.out
Initialized Local to 124
text data bss dec hex filename
3173 1976 448 5597 15dd i_l.out
Initialized Global to 0
text data bss dec hex filename
3173 1976 480 5629 15fd i_g_0.out
Initialized Local to 0
text data bss dec hex filename
3173 1976 480 5629 15fd i_l_0.out
So although I don't have a final answer to the question (this wouldn't fit in a comment), the results make me suspect the Windows executable file format and/or MinGW (i.e. not gcc).
BSS only contains static and global values which are not explicitly initialized. Even though you are explicitly initializing it to the same value to which it would be initialized if it were not initialized explicitly, the fact of explicit initialization means it doesn't belong in bss.

Why the int type takes up 8 bytes in BSS section but 4 bytes in DATA section

I am trying to learn the structure of the executable files of C programs. My environment is GCC and a 64-bit Intel processor.
Consider the following code, a.cc.
#include <cstdlib>
#include <cstdio>
int x;
int main(){
printf("%zu\n", sizeof(x)); // sizeof yields size_t, so use %zu rather than %d
return 10;
}
The size -o a shows
text data bss dec hex filename
1134 552 8 1694 69e a
After I added another initialized global variable y.
int y=10;
The size a shows (where a is the name of the executable file from a.cc)
text data bss dec hex filename
1134 556 12 1702 6a6 a
As we know, the BSS section accounts for the size of uninitialized global variables and DATA stores initialized ones.
Why does int take up 8 bytes in BSS? The sizeof(x) in my code shows that an int actually takes up 4 bytes.
The int y=10 added 4 bytes to DATA, which makes sense since an int should take 4 bytes. But why does it also add 4 bytes to BSS?
The difference between two size commands stays the same after deleting the two lines #include ....
Update:
I think my understanding of BSS is wrong. It may not store the uninitialized global variables. As the Wikipedia says "The size that BSS will require at runtime is recorded in the object file, but BSS (unlike the data segment) doesn't take up any actual space in the object file." For example, even the one line C code int main(){} has bss 8.
Does the 8 or 16 of BSS come from alignment?
It doesn't; it takes up 4 bytes regardless of which segment it's in. You can use the nm tool (from the GNU binutils package) with the -S argument to get the names and sizes of all of the symbols in the object file. You're likely seeing secondary effects of the compiler including or not including certain other symbols for whatever reasons.
For example:
$ cat a1.c
int x;
$ cat a2.c
int x = 1;
$ gcc -c a1.c a2.c
$ nm -S a1.o a2.o
a1.o:
0000000000000004 0000000000000004 C x
a2.o:
0000000000000000 0000000000000004 D x
One object file has a 4-byte object named x in the uninitialized data segment (C), while the other object file has a 4-byte object named x in the initialized data segment (D).

Memory map of C program with no global and local variables

I wrote a basic program:
#include<stdio.h>
int main(void)
{
return 0;
}
and check its size as
gcc -Wall test1.c
size a.out
text data bss dec hex filename
988 260 8 1256 4e8 a.out
Just for my own knowledge: I did not declare any variable, global or local, initialized or uninitialized, so why are data and bss shown as 260 and 8 respectively?
Is this for the stack pointer and other variables required for code execution?
