I know that on ELF platforms, __attribute__((constructor)) uses the .ctors ELF section. Now I realized that the function attribute works with GCC on MinGW as well and I'm wondering how it is implemented.
For MinGW targets (and other COFF targets, such as Cygwin), the compiler simply emits the address of each constructor function into the .ctors COFF section:
$ cat c1.c
void c1() {
}
$ x86_64-w64-mingw32-gcc -c c1.c
$ objdump -x c1.o | grep ctors
# nothing
$ cat c1.c
__attribute__((constructor)) void c1() {
}
$ x86_64-w64-mingw32-gcc -c c1.c
$ objdump -x c1.o | grep ctors
5 .ctors 00000008 0000000000000000 0000000000000000 00000150 2**3
The GNU ld linker (for MinGW targets) is then configured, via its default linker script, to combine these sections into the regular .text section, with the __CTOR_LIST__ symbol marking the start of the list and a zero entry terminating it. (The .rdata section would probably be a clearer choice, since these are just addresses of functions, not CPU instructions, but for some reason .text is used. In fact, the LLVM LLD linker targeting MinGW places them in .rdata.)
LD linker:
$ x86_64-w64-mingw32-ld --verbose
...
.text ... {
...
__CTOR_LIST__ = .;
LONG (-1); LONG (-1);
KEEP (*(.ctors));
KEEP (*(.ctor));
KEEP (*(SORT_BY_NAME(.ctors.*)));
LONG (0); LONG (0);
...
...
}
It is then up to the C runtime library to run these constructors during initialization, using the __CTOR_LIST__ symbol.
From mingw-w64 C runtime:
extern func_ptr __CTOR_LIST__[];
void __do_global_ctors (void)
{
// finds the last (zero terminated) item
...
// then runs from last to first:
for (i = nptrs; i >= 1; i--)
{
__CTOR_LIST__[i] ();
}
...
}
(The Cygwin runtime does this in a very similar way.)
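To make the list walk concrete, here is a minimal, self-contained sketch of the same idea. It does not touch the real linker-provided __CTOR_LIST__: the table fake_ctor_list, the functions ctor_a and ctor_b, and run_ctor_list are hypothetical names invented for illustration, and the layout (a -1 header entry, the entries, a zero terminator) simply mirrors the linker script shown above.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*func_ptr)(void);

/* Records the order in which the stand-in constructors ran. */
int call_order[2];
int call_count;

void ctor_a(void) { call_order[call_count++] = 1; }
void ctor_b(void) { call_order[call_count++] = 2; }

/* Hypothetical stand-in for the table the linker builds:
   a -1 header entry, the constructor addresses, a zero terminator. */
func_ptr fake_ctor_list[] = {
    (func_ptr)(intptr_t)(-1),
    ctor_a,
    ctor_b,
    NULL
};

/* Counts the entries after the header, then calls them from last to first.
   Returns the number of constructors that were run. */
size_t run_ctor_list(func_ptr *list)
{
    size_t nptrs = 0;

    while (list[nptrs + 1] != NULL)
        nptrs++;

    for (size_t i = nptrs; i >= 1; i--)
        list[i]();

    return nptrs;
}
```

Calling run_ctor_list(fake_ctor_list) runs ctor_b first and ctor_a second, which matches the last-to-first order of the runtime loop quoted above.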
This can also be seen in the debugger:
$ echo $MSYSTEM
MINGW64
$ cat c11.c
#include <stdio.h>
__attribute__((constructor))
void i1() {
puts("i 1");
}
int main() {
puts("main");
return 0;
}
$ gcc c11.c -o c11
$ gdb ./c11.exe
(gdb) b i1
(gdb) r
(gdb) bt
#0 0x00007ff603591548 in i1 ()
#1 0x00007ff6035915f2 in __do_global_ctors () at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:44
#2 0x00007ff60359164f in __main () at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:58
#3 0x00007ff60359139b in __tmainCRTStartup () at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:313
#4 0x00007ff6035914f6 in mainCRTStartup () at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:202
(gdb)
Note that in some environments (neither MinGW nor Linux), running these constructors is instead the responsibility of GCC, specifically of the crtbegin.o and crtend.o objects that form the static part of its compiler runtime libgcc, and not of the C runtime.
Also, for comparison: on ELF targets (like Linux), GCC used to use a mechanism similar to the one described above for MinGW (ELF .ctors sections, although the rest was a bit different), but since GCC 4.7 (released in 2012) it uses a slightly different mechanism (the ELF .init_array section).
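Whichever mechanism the toolchain uses underneath (.ctors or .init_array), the source-level behaviour is the same: the function runs before main. A minimal sketch of that, assuming GCC or a compatible compiler; the names initialized_before_main, mark_initialized and was_initialized are made up for this example:

```c
/* Set by the constructor before main() is entered. */
int initialized_before_main;

/* The compiler records this function's address in .ctors or .init_array,
   depending on the target, and the startup code calls it before main(). */
__attribute__((constructor))
void mark_initialized(void)
{
    initialized_before_main = 1;
}

/* Returns 1 once the constructor has run, 0 otherwise. */
int was_initialized(void)
{
    return initialized_before_main;
}
```

Any code reached from main can call was_initialized() and should see 1, without main ever calling mark_initialized itself.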
I want to remove unused functions from my code when compiling. I wrote some test code (main.c):
#include <stdio.h>
const char *get1();
int main()
{
puts( get1() );
}
and getall.c:
const char *get1()
{
return "s97symmqdn-1";
}
const char *get2()
{
return "s97symmqdn-2";
}
const char *get3()
{
return "s97symmqdn-3";
}
Makefile
test1 :
rm -f a.out *.o *.a
gcc -ffunction-sections -fdata-sections -c main.c getall.c
ar cr libgetall.a getall.o
gcc -Wl,--gc-sections main.o -L. -lgetall
After running make test1 && objdump --sym a.out | grep get, I find only the following two lines of output:
0000000000000000 l df *ABS* 0000000000000000 getall.c
0000000000400535 g F .text 000000000000000b get1
I guess get2 and get3 were removed. But when I open a.out in vim, I find that s97symmqdn-1, s97symmqdn-2 and s97symmqdn-3 all exist.
Are the functions get2 and get3 really removed? How can I remove the strings s97symmqdn-2 and s97symmqdn-3? Thank you for your reply.
My system is CentOS 7 and the gcc version is 4.8.5.
The compilation options -ffunction-sections -fdata-sections and linkage option --gc-sections
are working correctly in your example. Your static library is superfluous, so it can
be simplified to:
$ gcc -ffunction-sections -fdata-sections -c main.c getall.c
$ gcc -Wl,--gc-sections main.o getall.o -Wl,-Map=mapfile
in which I'm also asking for the linker's mapfile.
The unused functions get2 and get3 are absent from the executable:
$ nm a.out | grep get
0000000000000657 T get1
and the mapfile shows that the unused function-sections .text.get2 and .text.get3 in which get2 and get3 are
respectively defined were discarded in the linkage:
mapfile (1)
...
Discarded input sections
...
.text.get2 0x0000000000000000 0xd getall.o
.text.get3 0x0000000000000000 0xd getall.o
...
Nevertheless, as you found, all three of the string literals "s97symmqdn-(1|2|3)"
are in the program:
$ strings a.out | egrep 's97symmqdn-(1|2|3)'
s97symmqdn-1
s97symmqdn-2
s97symmqdn-3
That is because -fdata-sections applies just to the same data objects that
__attribute__ ((__section__("name"))) applies to [1], i.e. to the definitions
of variables that have static storage duration. It is not applied to anonymous string literals like your
"s97symmqdn-(1|2|3)". They are all just placed in the .rodata section as usual,
and there we find them:
$ objdump -s -j .rodata a.out
a.out: file format elf64-x86-64
Contents of section .rodata:
06ed 73393773 796d6d71 646e2d31 00733937 s97symmqdn-1.s97
06fd 73796d6d 71646e2d 32007339 3773796d symmqdn-2.s97sym
070d 6d71646e 2d3300 mqdn-3.
--gc-sections does not allow the linker to discard .rodata from the program
because it is not an unused section: it contains "s97symmqdn-1", which is referenced
in the program by get1, as well as the unreferenced strings "s97symmqdn-2"
and "s97symmqdn-3".
Fix
To get these three string literals separated into distinct data sections, you
need to assign them to distinct named objects, e.g.
getall.c (2)
const char *get1()
{
static const char s[] = "s97symmqdn-1";
return s;
}
const char *get2()
{
static const char s[] = "s97symmqdn-2";
return s;
}
const char *get3()
{
static const char s[] = "s97symmqdn-3";
return s;
}
If we recompile and relink with that change, we see:
mapfile (2)
...
Discarded input sections
...
.text.get2 0x0000000000000000 0xd getall.o
.text.get3 0x0000000000000000 0xd getall.o
.rodata.s.1797
0x0000000000000000 0xd getall.o
.rodata.s.1800
0x0000000000000000 0xd getall.o
...
Now there are two new discarded data-sections, which contain
the two string literals we don't need, as we can see in the object file:
$ objdump -s -j .rodata.s.1797 getall.o
getall.o: file format elf64-x86-64
Contents of section .rodata.s.1797:
0000 73393773 796d6d71 646e2d32 00 s97symmqdn-2.
and:
$ objdump -s -j .rodata.s.1800 getall.o
getall.o: file format elf64-x86-64
Contents of section .rodata.s.1800:
0000 73393773 796d6d71 646e2d33 00 s97symmqdn-3.
Only the referenced string "s97symmqdn-1" now appears anywhere in the program:
$ strings a.out | egrep 's97symmqdn-(1|2|3)'
s97symmqdn-1
and it is the only string in the program's .rodata:
$ objdump -s -j .rodata a.out
a.out: file format elf64-x86-64
Contents of section .rodata:
06f0 73393773 796d6d71 646e2d31 00 s97symmqdn-1.
[1] Likewise, -ffunction-sections has the same effect as qualifying the
definition of every function foo with __attribute__ ((__section__(".text.foo"))).
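As a rough illustration of that equivalence, the sketch below places one function and one named string object in their own sections by hand, which is approximately what -ffunction-sections and -fdata-sections do automatically. It assumes GCC on an ELF target; the section names .text.get1 and .rodata.s1 and the object name s1 are chosen here for illustration (the compiler generates its own names, as the mapfile shows).

```c
/* Named object with static storage duration: it can be given its own
   section, unlike an anonymous string literal. */
static const char s1[] __attribute__((__section__(".rodata.s1"))) = "s97symmqdn-1";

/* The function gets its own section too, so the linker can keep or
   discard it independently of other functions. */
__attribute__((__section__(".text.get1")))
const char *get1(void)
{
    return s1;
}
```

With every function and string laid out this way, --gc-sections can discard an unused pair without touching the ones still referenced.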
I have a function in my C code that is called implicitly and is being discarded by the linker. How can I prevent this?
I'm compiling with gcc and the linker flag -gc-sections, and I don't want to exclude the whole file from the flag. I tried the attributes "used" and "externally_visible", and neither worked.
void __attribute__((section(".mySec"), nomicromips, used)) func(){
...
}
In the map file I can see that the function was compiled but not linked. Am I using it wrong? Is there another way to do it?
You are misunderstanding the used attribute
used
This attribute, attached to a function, means that code must be emitted for the function even if it appears that the function is not referenced...
i.e. the compiler must emit the function definition even if the function appears
to be unreferenced. The compiler will never conclude that a function is unreferenced
if it has external linkage. So in this program:
main1.c
static void foo(void){}
int main(void)
{
return 0;
}
compiled with:
$ gcc -c -O1 main1.c
No definition of foo is emitted at all:
$ nm main1.o
0000000000000000 T main
because foo is not referenced in the translation unit, is not external,
and so may be optimised out.
But in this program:
main2.c
static void __attribute__((used)) foo(void){}
int main(void)
{
return 0;
}
__attribute__((used)) compels the compiler to emit the local definition:
$ gcc -c -O1 main2.c
$ nm main2.o
0000000000000000 t foo
0000000000000001 T main
But this does nothing to inhibit the linker from discarding a section
in which foo is defined, in the presence of -gc-sections, even if foo is external, if that section is unused:
main3.c
void foo(void){}
int main(void)
{
return 0;
}
Compile with function-sections:
$ gcc -c -ffunction-sections -O1 main3.c
The global definition of foo is in the object file:
$ nm main3.o
0000000000000000 T foo
0000000000000000 T main
But after linking:
$ gcc -Wl,-gc-sections,-Map=mapfile main3.o
foo is not defined in the program:
$ nm a.out | grep foo; echo Done
Done
And the function-section defining foo was discarded:
mapfile
...
...
Discarded input sections
...
...
.text.foo 0x0000000000000000 0x1 main3.o
...
...
As per Eric Postpischil's comment, to force the linker to retain
an apparently unused function-section you must tell it to assume that the program
references the unused function, with linker option {-u|--undefined} foo:
main4.c
void __attribute__((section(".mySec"))) foo(void){}
int main(void)
{
return 0;
}
If you don't tell it that:
$ gcc -c main4.c
$ gcc -Wl,-gc-sections main4.o
$ nm a.out | grep foo; echo Done
Done
foo is not defined in the program. If you do tell it that:
$ gcc -c main4.c
$ gcc -Wl,-gc-sections,--undefined=foo main4.o
$ nm a.out | grep foo; echo Done
0000000000001191 T foo
Done
it is defined. There's no use for attribute used.
Apart from -u, already mentioned, here are two other ways to keep the symbol using GCC.
Create a reference to it without calling it
This approach does not require messing with linker scripts, which means it will work for hosted programs and libraries using the operating system's default linker script.
However it varies with compiler optimization settings and may not be very portable.
For example, in GCC 7.3.1 with LD 2.31.1, you can keep a function without actually calling it, by calling another function on its address, or branching on a pointer to its address.
bool function_exists(void *address) {
return (address != NULL);
}
// Somewhere reachable from main
assert(function_exists(foo));
assert(foo != NULL); // Won't work, GCC optimises out the constant expression
assert(&foo != NULL); // works on GCC 7.3.1 but not GCC 10.2.1
Another way is to create a struct containing function pointers, then you can group them all together and just check the address of the struct. I use this a lot for interrupt handlers.
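A minimal sketch of that struct-of-pointers idea follows. The handler names (timer_isr, uart_isr), the struct handler_table and the helper object_exists are hypothetical, invented for this example. Whether the table survives --gc-sections still depends on the compiler version and optimization level, as noted above.

```c
#include <stddef.h>

typedef void (*handler_fn)(void);

/* Counters so the example handlers have an observable effect. */
int timer_ticks;
int uart_events;

void timer_isr(void) { timer_ticks++; }
void uart_isr(void)  { uart_events++; }

/* Grouping the pointers means one reference to the table keeps
   every handler listed in it reachable. */
struct handler_table {
    handler_fn timer;
    handler_fn uart;
};

const struct handler_table handlers = { timer_isr, uart_isr };

/* Checks an address without calling anything through it. */
int object_exists(const void *address)
{
    return address != NULL;
}
```

Somewhere reachable from main, a single assert(object_exists(&handlers)); then references all the handlers at once.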
Modify the linker script to keep the section
If you are developing a hosted program or a library, then it's pretty tricky to change the linker script.
Even if you do, it's not very portable; for example, gcc on OSX does not actually use the GNU linker, since OSX uses the Mach-O format instead of ELF.
Your code already shows a custom section though, so it's possible you are working on an embedded system and can easily modify the linker script.
SECTIONS {
    /* ... */
    .mySec : {
        KEEP(*(.mySec));
    }
}
I am compiling the below code with "-nostdlib". My understanding was that arm-none-eabi-gcc will not use the _start in "crt0.o" but it will use the user defined _start. For this I was expecting to create a start.S file and put the _start symbol.
But if I compile the below shown code without the _start symbol defined from my side, I am not getting any warning. I was expecting "warning: cannot find entry symbol _start;"
Questions:
1) Why am I not getting the warning ? From where did GCC get the _start symbol ?
2) If gcc got the _start symbol from a file from somewhere, could you let me know how to ask GCC to use the _start from my start.S file ?
$ cat test.c
int main()
{
volatile int i=0;
i = i+1;
return 0;
}
$ cat linker.ld
MEMORY
{
ram : ORIGIN = 0x8000, LENGTH = 20K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
$ arm-none-eabi-gcc -Wall -Werror -O2 -mfpu=neon-vfpv4 -mfloat-abi=hard -march=armv7-a -mtune=cortex-a7 -nostdlib -T linker.ld test.c -o test.o
$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 224288]
Compile and link with arm-none-eabi-gcc -v -Wall -Werror -O2.... to understand what the compiler is doing (and which crt0 it is using; that crt0 probably has a _start calling your main, also _start might be the default entry point for your linker)
Notice that -nostdlib is related to the (lack of) C standard library; perhaps you want to compile in a freestanding environment (see this), then use -ffreestanding (and in that case main has no particular meaning, you need to define your starting function[s], and no standard C functions like malloc or printf are available except perhaps setjmp).
Read the C99 standard n1256 draft. It explains what freestanding means in §5.1.2.1.
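As a small illustration of what "no standard C functions" means in practice, freestanding code typically brings its own helpers. The sketch below uses only <stddef.h>, one of the few headers a freestanding implementation is required to provide. The names fs_memset and clear_region are hypothetical; a startup routine might use something like clear_region to zero .bss before calling the program's real entry function.

```c
#include <stddef.h>  /* available even in a freestanding implementation */

/* Hand-written replacement for memset, since the C library version
   may not be linked in. */
void *fs_memset(void *dst, int value, size_t count)
{
    unsigned char *p = dst;

    while (count--)
        *p++ = (unsigned char)value;

    return dst;
}

/* Zeroes the bytes in [start, end), e.g. a .bss region whose bounds
   come from linker-script symbols (not shown here). */
void clear_region(unsigned char *start, unsigned char *end)
{
    fs_memset(start, 0, (size_t)(end - start));
}
```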
I need to make the .text segment of an ELF executable writable.
The program I need to modify is written in C and I can compile it. Any ideas?
Thanks a lot.
For the answer below, I'm going to use this test program:
#include <stdio.h>
#include <stdlib.h>
int
main (int argc, char **argv)
{
printf ("Hello world\n");
void *m = main;
*((char *) m) = 0;
exit (0);
}
Compile with:
$ gcc -g -o test test.c
As expected:
$ gdb test
...
(gdb) run
Starting program: /home/amb/so/test
Hello world
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005a2 in main (argc=1, argv=0x7fffffffe628) at test.c:9
9 *((char *)m) = 0;
(gdb)
The obvious route here is to use the -Wl flag to gcc to pass -N (aka --omagic) to the linker, i.e. gcc ... -Wl,--omagic ..., though this may have other undesirable results (e.g. disabling shared libraries). From the man page:
-N
--omagic
Set the text and data sections to be readable and writable. Also, do not page-align the
data segment, and disable linking against shared libraries. If the output format
supports Unix style magic numbers, mark the output as "OMAGIC". Note: Although a
writable text section is allowed for PE-COFF targets, it does not conform to the format
specification published by Microsoft.
Let's give that a go:
$ gcc --static -g -Wl,--omagic -o test test.c
$ ./test
Hello world
$
That works fine, but you've lost dynamic library support.
To keep dynamic library support, and retain a writable text segment, you should be able to use:
objcopy --writable-text ...
From the man page:
--writable-text
Mark the output text as writable. This option isn't meaningful for all object file
formats.
This ought to work, but doesn't, as objdump will verify. So here's a solution that gets a bit further than --writable-text which as OP has stated in the comments does not appear to do what it says on the tin^Wmanpage.
Let's see how the sections are marked:
$ gcc -g -o test test.c
$ objdump -h test | fgrep -A1 .text
12 .text 00000192 0000000000400490 0000000000400490 00000490 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
Now let's get rid of that READONLY flag:
$ objcopy --set-section-flags .text=contents,alloc,load,code test test1
$ objdump -h test1 | fgrep -A1 .text
12 .text 00000192 0000000000400490 0000000000400490 00000490 2**4
CONTENTS, ALLOC, LOAD, CODE
and now READONLY has gone, as requested.
But:
$ gdb test1
...
(gdb) run
Starting program: /home/amb/so/test1
Hello world
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005a2 in main (argc=1, argv=0x7fffffffe628) at test.c:9
9 *((char *)m) = 0;
(gdb)
I suspect the issue here is that something other than the ELF section flags determines whether the section is read-only once it is actually loaded. That is probably why people are suggesting you use mprotect. Sorry not to have been more help.
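For completeness, the mprotect route could look roughly like the sketch below. It assumes a POSIX system such as Linux; the helper name make_writable is made up for this example. mprotect requires a page-aligned start address, so the helper rounds the address down to a page boundary first. Systems that enforce a strict write-xor-execute policy may refuse the request.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Makes the pages covering [addr, addr + len) readable, writable and
   executable. Returns 0 on success and -1 on failure, like mprotect. */
int make_writable(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return -1;

    /* Round the start down to the containing page boundary. */
    uintptr_t mask  = ~((uintptr_t)page - 1);
    uintptr_t start = (uintptr_t)addr & mask;
    uintptr_t end   = (uintptr_t)addr + len;

    return mprotect((void *)start, (size_t)(end - start),
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

In the test program above, one would call make_writable(m, 1) and check the result before the store through m.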
I have followed some tutorials on the web and created my own kernel. It boots successfully with GRUB under QEMU. But I have the problem described in this SO question, and I cannot solve it. I could use the workaround described there, but I also need global variables, which would make the job easier, and I do not understand what I should change in the linker script to properly use global variables and inline strings.
main.c
struct grub_signature {
unsigned int magic;
unsigned int flags;
unsigned int checksum;
};
#define GRUB_MAGIC 0x1BADB002
#define GRUB_FLAGS 0x0
#define GRUB_CHECKSUM (-1 * (GRUB_MAGIC + GRUB_FLAGS))
struct grub_signature gs __attribute__ ((section (".grub_sig"))) =
{ GRUB_MAGIC, GRUB_FLAGS, GRUB_CHECKSUM };
void putc(unsigned int pos, char c){
char* video = (char*)0xB8000;
video[2 * pos ] = c;
video[2 * pos + 1] = 0x3F;
}
void puts(char* str){
int i = 0;
while(*str){
putc(i++, *(str++));
}
}
void main (void)
{
char txt[] = "MyOS";
puts("where is this text"); // does not work, puts(txt) works.
while(1){};
}
Makefile:
CC = gcc
LD = ld
CFLAGS = -Wall -nostdlib -ffreestanding -m32 -g
LDFLAGS = -T linker.ld -nostdlib -n -melf_i386
SRC = main.c
OBJ = ${SRC:.c=.o}
all: kernel
.c.o:
	@echo CC $<
	@${CC} -c ${CFLAGS} $<
kernel: ${OBJ} linker.ld
	@echo CC -c -o $@
	@${LD} ${LDFLAGS} -o kernel ${OBJ}
clean:
	@echo cleaning
	@rm -f ${OBJ} kernel
.PHONY: all
linker.ld
OUTPUT_FORMAT("elf32-i386")
ENTRY(main)
SECTIONS
{
.grub_sig 0xC0100000 : AT(0x100000)
{
*(.grub_sig)
}
.text :
{
*(.text)
}
.data :
{
*(.data)void main (void)
}
.bss :
{
*(.bss)
}
/DISCARD/ :
{
*(.comment)
*(.eh_frame)
}
}
What works:
void main (void)
{
char txt[] = "MyOS";
puts(txt);
while(1) {}
}
What does not work:
1)
char txt[] = "MyOS";
void main (void)
{
puts(txt);
while(1) {}
}
2)
void main (void)
{
puts("MyOS");
while(1) {}
}
Output of assembly: (external link, because it is a little long) http://hastebin.com/gidebefuga.pl
If you look at objdump -h output, you'll see that virtual and linear addresses do not match for any of the sections. If you look at objdump -d output, you'll see that the addresses are all in the 0xC0100000 range.
However, you do not provide any addressing information in the multiboot header structure; you only provide the minimum three fields. Instead, the boot loader will pick a good address (1M on x86, i.e. 0x00100000, for both virtual and linear addresses), and load the code there.
One might think that that kind of discrepancy should cause the kernel to not run at all, but it just happens that the code generated by the above main.c does not use the addresses for anything except read-only constants. In particular, GCC generates jumps and calls that use relative addresses (signed offsets relative to the address of the next instruction on x86), so the code still runs.
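The arithmetic behind that can be sketched as follows. An x86 near call (opcode E8) is 5 bytes long and stores a signed 32-bit displacement relative to the address of the next instruction. The function name call_target is invented for this example.

```c
#include <stdint.h>

/* Target of an x86 near call (opcode E8): the address just past the
   5-byte instruction plus the signed 32-bit displacement. */
uint32_t call_target(uint32_t insn_addr, int32_t rel32)
{
    return insn_addr + 5u + (uint32_t)rel32;
}
```

For the same displacement, a call located 0x100 bytes into the image lands at the same offset within the image whether the image sits at 0xC0100000 or at 0x00100000. Absolute addresses, such as the address of a string literal or a global variable, are baked in at link time and do not shift along with the code.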
There are two solutions, first one trivial.
Most bootloaders on x86 load the image at the smallest allowed virtual and linear address, 1M (= 0x00100000 = 1048576). Therefore, if you tell your linker script to use both virtual and linear addresses starting at 0x00100000, i.e.
.grub_sig 0x00100000 : AT(0x100000)
{
*(.grub_sig)
}
your kernel will Just Work. I have verified this fixes the issue you are having, after removing the extra void main(void) from your linker script, of course. To be specific, I constructed a 33 MB virtual disk, containing one ext2 partition, installed grub2 on it (using 1.99-21ubuntu3.10) and the above kernel, and ran the image successfully under qemu-kvm 1.0 (1.0+noroms-0ubuntu14.11).
The second option is to set bit 16 in the multiboot flags, and supply the five additional words necessary to tell the bootloader where the code expects to be resident. However, 0xC0100000 will not work (at least grub2 will just freak out and reboot), whereas something like 0x00200000 does work fine. This is because multiboot is really designed to use virtual == linear addresses, and there may be other stuff already present at the highest addresses (similar to why addresses below 1M are avoided).
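A sketch of what such an extended header could look like in C follows. The field order follows the Multiboot (version 1) header layout; the struct name, the macro names and the helper make_header are made up for this example, and the helper assumes the header is the very first thing in the loaded image, so that header_addr equals load_addr.

```c
#include <stdint.h>

#define MB_MAGIC          0x1BADB002u
#define MB_FLAG_ADDRESSES (1u << 16)   /* bit 16: address fields are valid */

struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;       /* magic + flags + checksum == 0 (mod 2^32) */
    uint32_t header_addr;    /* where this header itself is loaded */
    uint32_t load_addr;      /* start of the loaded image */
    uint32_t load_end_addr;  /* end of the initialized data */
    uint32_t bss_end_addr;   /* end of .bss */
    uint32_t entry_addr;     /* address to jump to */
};

/* Fills in a header; assumes the header sits at the start of the image. */
struct multiboot_header make_header(uint32_t load_addr, uint32_t load_end,
                                    uint32_t bss_end, uint32_t entry)
{
    struct multiboot_header h;

    h.magic         = MB_MAGIC;
    h.flags         = MB_FLAG_ADDRESSES;
    h.checksum      = 0u - (h.magic + h.flags);
    h.header_addr   = load_addr;
    h.load_addr     = load_addr;
    h.load_end_addr = load_end;
    h.bss_end_addr  = bss_end;
    h.entry_addr    = entry;
    return h;
}
```

In a real kernel the addresses would normally come from linker-script symbols rather than from literal values, and the struct would be placed in the .grub_sig section as in the question.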
Note that the boot loader does not provide you with a stack, so it's a bit of a surprise the code works at all.
I personally recommend you use a simple assembler file to construct the signature, and reserve some stack space. For example, start.asm simplified from here,
BITS 32
EXTERN main
GLOBAL start
SECTION .grub_sig
signature:
MAGIC equ 0x1BADB002
FLAGS equ 0
dd MAGIC, FLAGS, -(MAGIC+FLAGS)
SECTION .text
start:
mov esp, _sys_stack ; End of stack area
call main
jmp $ ; Infinite loop
SECTION .bss
resb 16384 ; reserve 16384 bytes for stack
_sys_stack: ; end of stack
compile using
nasm -f elf start.asm -o start.o
and modify your linker script to use start instead of main as the entry point,
ENTRY(start)
Remove the multiboot stuff from your main.c, then compile and link to kernel using e.g.
gcc -Wall -nostdlib -ffreestanding -fno-stack-protector -O3 -fomit-frame-pointer -m32 -c main.c -o main.o
ld -T linker.ld -nostdlib -n -melf_i386 start.o main.o -o kernel
and you have a good start to work on your own kernel.
Questions? Comments?