Let's say I have the following assembly program:
.globl _start
_start:
mov $1, %eax
int $0x80
And I assemble/link it with:
$ as file.s
$ ld a.out -o a
This runs fine and returns a status code of 0 to Linux. However, when I remove the line .globl _start I get the following warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400078
What does 0000000000400078 mean? And also, if ld expects the _start symbol on entry, why is it even necessary to declare .globl _start ?
However, when I remove the line .globl _start ...
The .globl line means that the name _start is "visible" outside the file file.s. If you remove that line, the name _start is only for use inside the file file.s and in a larger program (containing multiple files) you could even use the name _start in multiple files.
(This is similar to static variables in C/C++: If you generate assembler code from C or C++, the difference between real global variables and static variables is that there is a .globl line for the global variables and no .globl line for static variables. And if you are familiar with C, you know that static variables cannot be used in other files.)
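To make the comparison concrete, here is a small C sketch (the variable and function names are made up for illustration). If you compile it with gcc -S -O0, the assembler output should contain a .globl line for global_counter and sum_counters but none for file_local_counter; the exact output depends on the compiler version and optimization level.

```c
/* External linkage: the compiler emits a ".globl global_counter" line. */
int global_counter = 1;

/* Internal linkage: no ".globl" line, so other files cannot refer to it. */
static int file_local_counter = 2;

/* Functions have external linkage by default, like a label with ".globl". */
int sum_counters(void)
{
    return global_counter + file_local_counter;
}
```

Another file could declare "extern int global_counter;" and link against this one, but an "extern int file_local_counter;" would fail at link time with an undefined reference.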
The linker (ld) counts as "outside the file" too: it cannot use the name _start as the entry point if the name is local to file.s.
What does 0000000000400078 mean?
0x400078 is the address of the first byte of your program's code (the start of the .text section). If no symbol named _start is found, ld assumes that the program starts at that first byte.
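As for why the number is exactly 0x400078: assuming the traditional layout, where a static non-PIE x86-64 executable is loaded at 0x400000 and the ELF header plus a single program header are mapped in front of the code, the address is just the sum of those sizes. The helper name below is hypothetical, and newer versions of ld may lay the file out differently (for example starting the code at 0x401000), so treat this as a sketch of the arithmetic only.

```c
#include <stdint.h>

/* Sizes fixed by the ELF64 file format. */
enum {
    ELF64_HEADER_SIZE = 64,         /* sizeof(Elf64_Ehdr) */
    ELF64_PROGRAM_HEADER_SIZE = 56  /* sizeof(Elf64_Phdr) */
};

/* Hypothetical helper: address of the first code byte when the headers
 * are mapped directly in front of the code (an assumption, see above). */
uint64_t first_code_address(uint64_t load_base, unsigned program_header_count)
{
    return load_base + ELF64_HEADER_SIZE
         + (uint64_t)program_header_count * ELF64_PROGRAM_HEADER_SIZE;
}
```

With a load base of 0x400000 and one program header this gives 0x400000 + 0x40 + 0x38 = 0x400078, the address from the warning.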
... why is it even necessary to declare .globl _start?
It is not guaranteed that _start is located at the first byte of your program.
Counterexample:
.globl _start
write_stdout:
mov $4, %eax
mov $1, %ebx
int $0x80
ret
exit:
mov $1, %eax
mov $0, %ebx
int $0x80
jmp exit
_start:
mov $text, %ecx
mov $(textend-text), %edx
call write_stdout
mov $text2, %ecx
mov $(textend2-text2), %edx
call write_stdout
call exit
text:
.ascii "Hello\n"
textend:
text2:
.ascii "World\n"
textend2:
If you remove the .globl line, ld will not be able to find the _start label and will assume that your program starts at the first byte, which is the write_stdout: line!
... and if you have multiple .s files in a larger program (or even a combination of .s, .c and .cc), you don't have control over which code ends up at the first byte of your program!
The problem
I'm currently reading this book and am working through the chapter about dynamic linking, which contains the following code:
link_example.s
.globl main
.section .data
output:
.ascii "Yeet\n\0"
.section .text
main:
enter $0, $0
movq stdout, %rdi
movq $output, %rsi
call fprintf
movq $0, %rax
leave
ret
Now according to the book, I need to compile it as follows to link the C-library dynamically:
gcc -rdynamic link_example.s -o link_example
But I'm getting the following error message:
/usr/bin/ld: /tmp/cchUlvqS.o: relocation R_X86_64_32S against symbol `stdout@@GLIBC_2.2.5' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
What am I doing wrong?
What have you tried?
Adding -fPIE flag
I tried what the compiler suggested by adding the -fPIE flag:
gcc -rdynamic -fPIE link_example.s -o link_example
but I'm still getting the same error again.
Searching similar posts
I found a similar post which said that I just need to use the -shared flag:
gcc -shared link_example.s -o link_example
but this gives me:
/usr/bin/ld: /tmp/ccxktZan.o: relocation R_X86_64_32S against symbol `stdout@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
and if I add the -fPIC flag:
gcc -shared -fPIC link_example.s -o link_example
then I'm getting this:
/usr/bin/ld: /tmp/ccKIQ9sl.o: relocation R_X86_64_32S against symbol `stdout@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
Let me show you how to fix up the assembly language from your book so it works with your compiler's default settings.
As the comments on the question say, the problem is that your compiler defaults to generating position-independent executables. This means the addresses of stdout, fprintf, and output are not known at link time, so the linker is unable to "relocate" the instructions that refer to them.
What is known at link time, however, are the offsets between the addresses of these things and the program counter. That means that if you write the assembly a little differently, it will work. Like this:
.globl main
.section .data
output:
.ascii "Yeet\n\0"
.section .text
main:
enter $0, $0
movq stdout(%rip), %rdi
leaq output(%rip), %rsi
call fprintf@PLT
movq $0, %rax
leave
ret
Notice that the change is a little different for each of the three.
mov stdout, %rdi becomes mov stdout(%rip), %rdi -- just a different addressing mode for the same instruction. Loading from memory at the fixed address stdout becomes loading from memory at the fixed displacement stdout from the RIP register (aka the program counter).
Loading the fixed address output with mov $output, %rsi, on the other hand, becomes lea output(%rip), %rsi. I would suggest you think of this one as always having been a load-effective-address operation, but the old code, with the executable at a fixed address, was able to express that operation with move-immediate instead of an actual lea instruction.
Finally, call fprintf becomes call fprintf@PLT. This tells the linker that the call needs to go through the procedure linkage table -- your book should explain what this is and why it's needed.
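If you want to check these addressing modes against what your own compiler produces, one option is to write the same call in C and look at the output of gcc -S. The sketch below is my own C equivalent, not something from the book; the function name is hypothetical, and it returns fprintf's result (the number of characters written) instead of 0 so that it is easy to test. With a compiler that defaults to PIE, the generated assembly should load stdout and the string RIP-relative and call fprintf through the PLT, although the exact instructions depend on the compiler version and flags.

```c
#include <stdio.h>

/* Hypothetical C counterpart of main in link_example.s.
 * Returns the number of characters fprintf wrote. */
int print_yeet(void)
{
    return fprintf(stdout, "Yeet\n");
}
```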
Incidentally, I see several other problems with this assembly language, of which the most important are:
The string "Yeet\n\0" belongs in the read-only data section.
The x86-64 ABI says that variadic functions like fprintf need to be told the number of floating point arguments they are receiving, by setting al (the low byte of eax) appropriately.
enter and leave are unnecessary on x86-64. (Also, enter is a painfully slow microcoded instruction that shouldn't be used at all.)
I would have written something like this instead:
.section .rodata, "a", @progbits
.output:
.string "Yeet\n"
.section .text, "ax", @progbits
.globl main
.type main, @function
main:
sub $8, %rsp
mov stdout(%rip), %rdi
lea .output(%rip), %rsi
xor %eax, %eax
call fprintf@PLT
xor %eax, %eax
add $8, %rsp
ret
(You need to subtract 8 from %rsp at the beginning of the function, and add it back afterward, because the ABI says that %rsp must always be a multiple of 16 at the point of a call instruction -- which means that it's not a multiple of 16 on entry to any function, but instead %rsp mod 16 is 8, because call pushes eight bytes (the return address). You were getting this for free as a side effect of the enter and leave, but take those out and you have to do it by hand.)
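The alignment arithmetic can be sketched in C. This only models the numbers, not real stack manipulation; the function names and the example stack address in the usage note are made up.

```c
#include <stdint.h>

/* A call instruction pushes an 8-byte return address. */
uint64_t rsp_after_call(uint64_t rsp)
{
    return rsp - 8;
}

/* The "sub $8, %rsp" at the top of main moves %rsp down by another 8. */
uint64_t rsp_at_inner_call(uint64_t rsp_on_entry)
{
    return rsp_on_entry - 8;
}

/* %rsp modulo 16; 0 means the stack is 16-byte aligned. */
unsigned rsp_mod_16(uint64_t rsp)
{
    return (unsigned)(rsp % 16);
}
```

If the caller's %rsp is a multiple of 16 at its call instruction, say 0x7ffc12345670, then rsp_mod_16 is 8 on entry to main and 0 again after the sub, which is what the call to fprintf requires.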
The following is sample code for the GCC variable attribute extension:
#include<stdio.h>
int main(void){
int sam __attribute__((unused))= 10;
int p = sam+1;
printf("\n%d" , p);
}
For the assembly code of the above program, generated using:
gcc -S sample.c
the .s file doesn't contain the variable sam, whereas the output of the program is "11", which is correct.
So does the compiler completely neglect the unused variable and not output it in the executable? If so, why is the output of the program correct? Can anyone explain how the unused and used variable attributes work in gcc?
Thanks
So does the compiler neglect completely the unused variable
Depends what you mean by "neglect". The compiler optimizes the code but doesn't completely ignore the variable; otherwise it couldn't calculate the result.
not output it in the executable?
Yes.
If so why is the output of program correct?
Because this is what a compiler does: it generates programs whose output is what the programming language describes. Source code does not map 1:1 to assembly; it is a description of the behavior of a program. In theory, as long as the observable behavior of the compiled program matches what the source code describes, the compiler can generate any assembly instructions it wants.
You may want to research the terms side effect and the as-if rule in the context of C programming language.
Can anyone explain the working of unused and used variable attributes in gcc.
There is no better explanation than the one in the GCC documentation about Variable Attributes:
unused
This attribute, attached to a variable, means that the variable is meant to be possibly unused. GCC does not produce a warning for this variable.
used
This attribute, attached to a variable with static storage, means that the variable must be emitted even if it appears that the variable is not referenced.
When applied to a static data member of a C++ class template, the attribute also means that the member is instantiated if the class itself is instantiated.
The attribute unused is there to silence the compiler's -Wunused-* warnings.
The attribute used is meant for variables (but I think it also works on functions); it makes the compiler emit the variable into the assembly code even if it is not used anywhere.
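A minimal sketch of both attributes side by side (the names build_marker, scratch and add_one_to_sam are made up for illustration):

```c
/* Static storage plus "used": emitted into the object file even though
 * nothing in this file refers to it. */
static int build_marker __attribute__((used)) = 42;

int add_one_to_sam(void)
{
    /* Never read; the attribute suppresses the unused-variable warning. */
    int scratch __attribute__((unused)) = 0;

    /* "unused" only means "possibly unused"; using the variable is fine. */
    int sam __attribute__((unused)) = 10;

    return sam + 1;
}
```

Compiling this with gcc -Wall -S should give no warnings, and build_marker should show up in the .s file even with optimization enabled, while sam and scratch need not appear as separate variables at all.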
Try compiling with no optimisation:
gcc -O0 sample.c -S -masm=intel
Generated assembly code:
.file "sample.c"
.intel_syntax noprefix
.text
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "\12%d\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
push rbp
.seh_pushreg rbp
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 48
.seh_stackalloc 48
.seh_endprologue
call __main
mov DWORD PTR -4[rbp], 10
mov eax, DWORD PTR -4[rbp]
add eax, 1
mov DWORD PTR -8[rbp], eax
mov eax, DWORD PTR -8[rbp]
mov edx, eax
lea rcx, .LC0[rip]
call printf
mov eax, 0
add rsp, 48
pop rbp
ret
.seh_endproc
.ident "GCC: (GNU) 10.2.0"
.def printf; .scl 2; .type 32; .endef
I'm just trying to load the value of array_words[0] into eax:
.text
.data
# define an array of 3 words
array_words: .word 1, 2, 3
.globl main
main:
# assign array_words[0] to eax
mov $0, %edi
lea array_words(,%edi,4), %eax
But when I run this, I keep getting a segfault.
Could someone please point out what I did wrong here?
It seems the label main is in the .data section.
This leads to a segmentation fault on systems that don't allow executing code in the .data section. (Most modern systems map .data with read + write but not exec permission.)
Program code should be in the .text section. (Read + exec)
Surprisingly, on GNU/Linux systems, hand-written asm often results in an executable .data unless you're careful to avoid that, so this is often not the real problem: See Why data and stack segments are executable? But putting code in .text where it belongs can make some debugging tools work better.
Also you need to ret from main or call exit (or make an _exit system call) so execution doesn't fall off the end of main into whatever bytes come next. See What happens if there is no exit system call in an assembly program?
You need to properly terminate your program, e.g. on Linux x86_64 by calling the sys_exit system call:
...
main:
# assign array_words[0] to eax
mov $0, %edi
lea array_words(,%edi,4), %eax
mov $60, %rax # System-call "sys_exit"
mov $0, %rdi # exit code 0
syscall
Otherwise program execution continues with whatever memory contents follow your last instruction, which are most likely invalid instructions (or even unmapped memory).
I'm getting "no such instruction" errors when compiling a .s file with this command:
$ gcc -s -o scall scall.s
scall.s: Assembler messages:
scall.s:2: Error: no such instruction: `section '
scall.s:4: Error: no such instruction: `global _start'
scall.s:7: Error: unsupported instruction `mov'
scall.s:8: Error: unsupported instruction `mov'
scall.s:11: Error: operand size mismatch for `int'
scall.s:13: Error: no such instruction: `section .data'
scall.s:15: Error: no such instruction: `msglength .word 12'
Here is the code of the file:
section .text
global _start
_start:
mov 4,%eax
mov 1,%ebx
mov $message,%ecx
mov $msglength,%edx
int $0x80
section .data
message: .ascii "Hello world!"
msglength .word 12
How can I get rid of the errors?
I think the following code will assemble. (gcc can assemble .s and .S files and links them with the C library by default; as only assembles, so nothing is linked against the C library and you link with ld yourself.)
as:
.section .text
.global _start
_start:
mov $4,%eax
mov $1,%ebx
mov $message,%ecx
movzwl msglength,%edx   # msglength is a 16-bit .word, so zero-extend it into %edx
int $0x80
mov $1, %eax
mov $0, %ebx
int $0x80
.section .data
message: .ascii "Hello world!"
msglength: .word 12
gcc:
.section .text
.global main
main:
mov $4,%eax
mov $1,%ebx
mov $message,%ecx
movzwl msglength,%edx   # msglength is a 16-bit .word, so zero-extend it into %edx
int $0x80
mov $1, %eax
mov $0, %ebx
int $0x80
.section .data
message: .ascii "Hello world!"
msglength: .word 12
Correct it as follows and compile it with the -c option: gcc -c test.s -o test
.text
_start:
.global main
main:
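mov $4,%eax             # sys_write (immediates need a $ prefix)
mov $1,%ebx             # file descriptor 1: stdout
mov $message,%ecx       # address of the message
movzwl msglength,%edx   # length: zero-extend the 16-bit value stored at msglength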
int $0x80
.data
message: .ascii "Hello world!"
msglength: .word 12
I am writing a simple program in assembler on Linux x86_64 (GAS syntax). I have to read a number that is encoded in binary and saved in a text file. So, I have my text file "data.txt" (it's in the same directory as my source file), and below is the most important fragment of my code:
SYS_WRITE = 4
EXIT_SUCCESS = 0
SYS_READ = 3
SYS_OPEN = 5
.data
BIN_LEN = 24
.comm BIN, BIN_LEN
BIN: .space BIN_LEN, 0
.text
PATH: .ascii "data.txt\0"
.global _start
_start:
mov $SYS_OPEN, %eax # open
mov $PATH, %ebx # path
mov $0, %ecx # read only
mov $0666, %edx # mode
int $0x80 # call (open file)
mov $SYS_READ, %eax # reading
mov $3, %ebx # descriptor
mov $BIN, %ecx # buffer
mov $BIN_LEN, %edx # buffer size
int $0x80 # call (read line from file)
After calling the second syscall, the %eax register should contain the number of read bytes.
In my file "data.txt" I have "10101", but when I debug my program with gdb, it shows that there is -11 in %eax, so there was some kind of error. But I am sure that "10101" was loaded into the buffer (BIN), because when I display what the buffer contains, the number from the file is there. I need the number of bytes read for the rest of the algorithm. I have no idea why %eax contains an error code instead of the number of bytes loaded into the buffer. I wonder if it may be connected with making the syscall with 32-bit registers, but in all other cases it works properly.
Please, help me.
I entered your code and built it on my x64 machine running Fedora 20, using the 32-bit options of as and ld to assemble and link it, and it ran perfectly, placing 0x18 into the %eax register after the syscall. If you solved the problem, I would like to know what caused it and how you fixed it.
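For reading a value like the -11 from the question: on Linux, a raw system call reports failure by returning the negated errno value, a number in the range -4095 to -1. The helpers below are a small sketch of that convention with made-up names. They do not explain why the read failed, only how to interpret the number in %eax.

```c
/* Hypothetical helper: on Linux, a raw syscall return value in
 * [-4095, -1] means failure and is the negated errno. */
int raw_syscall_failed(long ret)
{
    return ret < 0 && ret >= -4095;
}

/* Returns the errno value for a failed call, or 0 if the call succeeded. */
int raw_syscall_errno(long ret)
{
    return raw_syscall_failed(ret) ? (int)-ret : 0;
}
```

For a return value of -11 this yields errno 11, which is EAGAIN on x86 Linux; a non-negative value such as 0x18 is simply the number of bytes read.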
cheers