Assembly error when compiling with GCC

I'm getting "no such instruction" errors when compiling a .s file with this command:
$ gcc -s -o scall scall.s
scall.s: Assembler messages:
scall.s:2: Error: no such instruction: `section '
scall.s:4: Error: no such instruction: `global _start'
scall.s:7: Error: unsupported instruction `mov'
scall.s:8: Error: unsupported instruction `mov'
scall.s:11: Error: operand size mismatch for `int'
scall.s:13: Error: no such instruction: `section .data'
scall.s:15: Error: no such instruction: `msglength .word 12'
Here is the code of the file:
section .text
global _start
_start:
mov 4,%eax
mov 1,%ebx
mov $message,%ecx
mov $msglength,%edx
int $0x80
section .data
message: .ascii "Hello world!"
msglength .word 12
How can I get rid of the errors?

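I think the following code will assemble. Compared with your file, the directives have a leading dot (.section, .global), immediate values have a $ prefix ($4, $1), the msglength label has a colon, msglength is loaded as a value (no $) rather than as an address, and an exit system call is added at the end. gcc can assemble .s and .S files and links them with the C library by default, so the entry point is main; as only assembles and does not link with the C library, so the entry point is _start.
With as: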
.section .text
.global _start
_start:
mov $4,%eax
mov $1,%ebx
mov $message,%ecx
mov msglength,%edx
int $0x80
mov $1, %eax
mov $0, %ebx
int $0x80
.section .data
message: .ascii "Hello world!"
msglength: .long 12
gcc:
.section .text
.global main
main:
mov $4,%eax
mov $1,%ebx
mov $message,%ecx
mov msglength,%edx
int $0x80
mov $1, %eax
mov $0, %ebx
int $0x80
.section .data
message: .ascii "Hello world!"
msglength: .long 12

Correct it as follows and assemble it with the -c option (assemble only, no linking):
gcc -c test.s -o test
.text
_start:
.global main
main:
mov $4,%eax
mov $1,%ebx
mov $message,%ecx
mov msglength,%edx
int $0x80
.data
message: .ascii "Hello world!"
msglength: .long 12

Related

Default value of _start

Let's say I have the following assembly program:
.globl _start
_start:
mov $1, %eax
int $0x80
And I assemble/link it with:
$ as file.s
$ ld a.out -o a
This will run fine, and return a status code of 0 to Linux. However, when I remove the line .globl _start I get the following error:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400078
What does 0000000000400078 mean? And also, if ld expects the _start symbol on entry, why is it even necessary to declare .globl _start ?
However, when I remove the line .globl _start ...
The .globl line means that the name _start is "visible" outside the file file.s. If you remove that line, the name _start is only for use inside the file file.s and in a larger program (containing multiple files) you could even use the name _start in multiple files.
(This is similar to static variables in C/C++: If you generate assembler code from C or C++, the difference between real global variables and static variables is that there is a .globl line for the global variables and no .globl line for static variables. And if you are familiar with C, you know that static variables cannot be used in other files.)
The linker (ld) is also not able to use the name _start if it can be used inside the file only.
What does 0000000000400078 mean?
0x400078 is the address of the first byte of your program's code (the start of the .text section). ld falls back to that address as the entry point if no global symbol named _start is found.
... why is it even necessary to declare .globl _start?
It is not guaranteed that _start is located at the first byte of your program.
Counterexample:
.globl _start
write_stdout:
mov $4, %eax
mov $1, %ebx
int $0x80
ret
exit:
mov $1, %eax
mov $0, %ebx
int $0x80
jmp exit
_start:
mov $text, %ecx
mov $(textend-text), %edx
call write_stdout
mov $text2, %ecx
mov $(textend2-text2), %edx
call write_stdout
call exit
text:
.ascii "Hello\n"
textend:
text2:
.ascii "World\n"
textend2:
If you remove the .globl line, ld will not be able to find the _start: label and will assume that your program starts at the first byte, which is the write_stdout: line!
... and if you have multiple .s files in a larger program (or even a combination of .s, .c and .cc), you have no control over which code is located at the first byte of your program!

Linking a compiled assembly and C file with ld

I have compiled these programs:
BITS 16
extern _main
start:
mov ax, 07C0h
add ax, 288
mov ss, ax
mov sp, 4096
mov ax, 07C0h
mov ds, ax
mov si, text_string
call print_string
jmp $
text_string db 'Calling Main Script'
call _main
print_string:
mov ah, 0Eh
.repeat:
lodsb
cmp al, 0
je .done
int 10h
jmp .repeat
.done:
ret
times 510-($-$$) db 0
dw 0xAA55
and this as a test just to try linking them
int main()
{
return 0;
}
both compile completely fine on their own using:
gcc -Wall -m32 main.c
nasm -f elf bootloader.asm
however I cannot link them using:
ld bootloader.o main.o -lc -I /lib/ld-linux.so.2
and I get this error:
ld: i386 architecture of input file `bootloader.o' is incompatible with i386:x86-64 output
ld: i386 architecture of input file `main.o' is incompatible with i386:x86-64 output
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
ld: bootloader.o: file class ELFCLASS32 incompatible with ELFCLASS64
ld: final link failed: file in wrong format
Any help would be great thanks
GCC links dynamically against libc by default, so if you want to link manually with ld and avoid the dynamic loader, make the ELF executable static by passing the -static flag. The architecture errors appear because ld defaults to x86-64 output on a 64-bit system; -m elf_i386 selects 32-bit output to match the object files.
gcc -o <filename> <filename>.c -static -Wall -m32
Then link with:
ld -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o <filename> -lc <filename>.o
Since an assembler like NASM produces stand-alone code (no libc dependency), I guess you can also make a dynamically linked ELF executable that uses libc by passing the -dynamic-linker flag to ld.
For example:
x86
nasm -f elf32 -o <filename>.o <filename>.asm
ld -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o <filename> -lc <filename>.o
x86_64
nasm -f elf64 -o <filename>.o <filename>.asm
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o <filename> -lc <filename>.o
If you just want to do some simple assembly programming on your PC, don't actually need 16-bit code, and don't want to dive into bootloaders and OS development, you can get started much more easily by writing 32-bit (IA32) or 64-bit (AMD64) application code. Instead of BIOS interrupts, you'd use (Linux) system calls.
An example "hello world" for i386 would be:
.section .text._start
.global _start
.type _start, %function
_start:
mov $4, %eax
mov $1, %ebx
mov $message, %ecx
mov $14, %edx
int $0x80
mov $1, %eax
xor %ebx, %ebx
int $0x80
.section .rodata.message
.type message, %object
message:
.ascii "Hello, World!\n"
Assemble, link and execute via
as --32 test32.S -o test32.o && ld -m elf_i386 test32.o -o test32 && ./test32
The same thing for AMD64:
.section .text._start
.global _start
.type _start, %function
_start:
mov $1, %rax
mov $1, %rdi
mov $message, %rsi
mov $14, %rdx
syscall
mov $0x3c, %rax
xor %rdi, %rdi
syscall
.section .rodata.message
.type message, %object
message:
.ascii "Hello, World!\n"
Assemble, link and execute via
as --64 test64.S -o test64.o && ld -m elf_x86_64 test64.o -o test64 && ./test64
Just for fun, the same thing for ARM (32bit):
.syntax unified
.arch armv6
.arm
.section .text._start
.global _start
.type _start, %function
_start:
movs r7, #4
movs r0, #1
ldr r1, =#message
movs r2, #14
svc #0
movs r7, #1
movs r0, #0
svc #0
.ltorg
.section .rodata.message
.type message, %object
message:
.ascii "Hello, World!\n"
Assemble, link and execute via (e.g. on a Raspberry PI or Beaglebone):
as testarm.S -o testarm.o && ld testarm.o -o testarm && ./testarm

Non-used Reservated Stack in Intel x86 Assembly

I am just beginning to learn Intel's x86 assembly code and compiled this simple "hello world" C program (without the cfi additions for simplicity):
#include <stdio.h>
int main(int argc, char* argv[]) {
printf("hello world!");
return 0;
}
The following x86 code came out:
.file "helloworld.c"
.intel_syntax noprefix
.section .rodata
.LC0:
.string "hello world!"
.text
.globl main
.type main, @function
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR -4[rbp], edi
mov QWORD PTR -16[rbp], rsi
lea rdi, .LC0[rip]
mov eax, 0
call printf@PLT
mov eax, 0
leave
ret
.size main, .-main
.ident "GCC: (Debian 7.2.0-19) 7.2.0"
.section .note.GNU-stack,"",@progbits
The question: Why are those 16 bytes for local variables reserved on the stack but aren't used in any way? The program even does the same, without those lines, so for which reason were they created?

Understanding GCC inline assembly with a Hello World program

A friend helped me come up with the following code to use inline assembly in GCC on a 64-bit Windows machine:
int main() {
char* str = "Hello World";
int ret;
asm volatile(
"call puts"
: "=a" (ret), "+c" (str)
:
: "rdx", "rdi", "rsi", "r8", "r9", "r10", "r11");
return 0;
}
After compiling with -S -masm=intel (I prefer Intel syntax), I get this assembly code:
.file "hello.c"
.intel_syntax noprefix
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "Hello World\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
push rbp
.seh_pushreg rbp
push rdi
.seh_pushreg rdi
push rsi
.seh_pushreg rsi
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 48
.seh_stackalloc 48
.seh_endprologue
call __main
lea rax, .LC0[rip]
mov QWORD PTR -8[rbp], rax
mov rax, QWORD PTR -8[rbp]
mov rcx, rax
/APP
# 7 "hello.c" 1
call puts
# 0 "" 2
/NO_APP
mov DWORD PTR -12[rbp], eax
mov QWORD PTR -8[rbp], rcx
mov eax, 0
add rsp, 48
pop rsi
pop rdi
pop rbp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev1, Built by MinGW-W64 project) 4.9.2"
It works, but it sure looks messy with what appears to be superfluous code. Then again, my last experience with assembly was with the 65816 back in the 80s, and it wasn't inline. Anyway, I cleaned up the code, and the following accomplishes the exact same thing, as far as I can tell:
.intel_syntax noprefix
.data:
.ascii "Hello World\0"
.text
.globl main
main:
sub rsp, 48
lea rax, .data[rip]
mov rcx, rax
call puts
mov eax, 0
add rsp, 48
ret
Much simpler. What's all that extra stuff GCC added?
Edit: Not a duplicate because in addition to the structured exception handling, I'm also asking about the callee-saved registers, the call to __main, the explicit size directives, and the APP/NO_APP section.

How to use lea instruction in a subroutine using GAS

I'm trying to convert some NASM code to GAS. I can't make the lea instruction work.
Here's my original code and this completely works:
section .bss
arr resb 10
section .text
global _start:
_start:
push arr
call getInput
...
getInput:
mov esi, 0
mov ebp, [esp+4]
loop:
...
mov eax, 3
mov ebx, 0
lea ecx, [ebp+esi]
mov edx, 2
int 80h
...
And here's the GAS counterpart I'm trying to write:
.data
arr: .space 10
.text
.globl _start
_start:
push arr
call getInput
...
getInput:
movl $0, %esi
movl 4(%esp,1), %ebp
loop:
...
movl $3, %eax
movl $0, %ebx
leal (%ebp,%esi), %ecx
movl $2, %edx
int $0x80
I've been searching for hours on how to properly do it but I can't find a tutorial on this matter. It produces a segmentation fault when I run it. Please help me.
P.S. I use these commands to compile and link (thanks to someone here who answered my previous question):
as --32 -o sample.o sample.s
ld -m elf_i386 -o sample sample.o

Resources