Linker error using program counter on Apple Silicon

Following the HelloSilicon tutorial on ARM assembly for Apple Silicon, I decided to play around with the general-purpose registers and with the program counter.
When feeding the source to the linker with any of the registers ("R1", "R2", ...) or "PC" ("R15"), I got the following linker error:
% ld -o cicle cicle.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64
Undefined symbols for architecture arm64:
"PC", referenced from:
_start in cicle.o
ld: symbol(s) not found for architecture arm64
The source is:
.global _start // Provide program starting address to linker
.align 2 // Make sure everything is aligned properly
// Setup the parameters to print hello world
// and then call the Kernel to do it.
_start:
label:
mov X0, #1 // 1 = StdOut
adr X1, helloworld // string to print
mov X2, #13 // length of our string
mov X16, #4 // Unix write system call
svc #0x80 // Call kernel to output the string
b PC //Expecting endless loop
//b PC-6 //Expecting endless "Hello World" loop
// Setup the parameters to exit the program
// and then call the kernel to do it.
mov X0, #0 // Use 0 return code
mov X16, #1 // System call number 1 terminates this program
svc #0x80 // Call kernel to terminate the program
helloworld: .ascii "Hello World!\n"
I understand that this problem can be solved when coding in Xcode 12+ by changing the Bundle to Mach-O in the build settings. However, passing the bundle argument -b to ld only produces the warning "ld: warning: option -b is obsolete and being ignored".

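When feeding the source to the linker with any of the registers ("R1", "R2"... ) or "PC" ("R15") I got the following linker error:
These registers don't exist; you're reading the manual for the wrong architecture. You need ARMv8 (AArch64), not ARMv7. AArch64 names its general-purpose registers X0 to X30 (W0 to W30 for the 32-bit views), and the program counter is not a named register.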
As for this:
b PC //Expecting endless loop
//b PC-6 //Expecting endless "Hello World" loop
PC is not a thing here. The assembler treats it as just an identifier like any other, such as _start, which is why the linker reports it as an undefined symbol. If you want to refer to the current instruction, use a dot (.):
b . // infinite loop
b .+8 // skip the next instruction
Note that any arithmetic is unscaled: offsets are in bytes, not instructions. If you want to jump 6 instructions back, you'll have to write b .-(4*6). If you attempted to assemble b .-6, you'd get an error, because b can only encode offsets that are multiples of 4 bytes.
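The byte arithmetic above can be sketched in C. This is only an illustration: branch_byte_offset and b_can_encode are hypothetical helper names, and the range check assumes the usual A64 encoding of b (a signed 26-bit word offset, so roughly +/-128 MiB).

```c
#include <stdbool.h>
#include <stdint.h>

/* Every A64 instruction is 4 bytes long. */
enum { A64_INSN_BYTES = 4 };

/* Hypothetical helper: the byte displacement to write after "." in order
 * to move `instructions` instructions forward (positive) or back (negative). */
int64_t branch_byte_offset(int64_t instructions)
{
    return instructions * A64_INSN_BYTES;
}

/* Hypothetical helper: true if `byte_offset` is a displacement that `b`
 * can encode, i.e. a multiple of 4 inside the assumed +/-128 MiB range. */
bool b_can_encode(int64_t byte_offset)
{
    const int64_t limit = INT64_C(128) * 1024 * 1024;
    return byte_offset % A64_INSN_BYTES == 0
        && byte_offset >= -limit
        && byte_offset < limit;
}
```

With these helpers, six instructions back gives branch_byte_offset(-6) == -24, which matches b .-(4*6), while b_can_encode(-6) is false, which matches the assembler rejecting b .-6.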

Related

How to fix "Infinite Loop error on jumping to C code from bootloader"

I am trying to run C code as my operating system kernel in order to study how operating systems work. I am stuck on an infinite loop when the bootloader jumps to my C code. How can I prevent this error?
Although my bootloader works correctly, the problem appears when it jumps to the kernel code, which is written in C and built as a .COM program. The dummy code keeps printing a character again and again, although it should run only once. It seems as if the main code is being called repeatedly. Here is the code for the startpoint.asm assembly header and the bootmain.cpp file.
Here is the code for startpoint.asm, which comes first in the link order so that the code is invoked automatically (written in MASM).
Note: The code is loaded at the address 2000H:0000H.
;------------------------------------------------------------
.286 ; CPU type
;------------------------------------------------------------
.model TINY ; memory model
;---------------------- EXTERNS -----------------------------
extrn _BootMain:near ; prototype of C func
;------------------------------------------------------------
;------------------------------------------------------------
.code
main:
jmp short start ; go to start
nop
;----------------------- CODE SEGMENT -----------------------
start:
cli
mov ax,cs ; Setup segment registers
mov ds,ax ; Make DS correct
mov es,ax ; Make ES correct
mov ss,ax ; Make SS correct
mov bp,2000h
mov sp,2000h ; Setup a stack
sti
; start the program
call _BootMain
ret
END main ; End of prog
Code for bootmain.cpp
extern "C" void BootMain()
{
__asm
{
mov ah,0EH
mov al,'G'
int 10H
}
return;
}
The compiling and linker commands are as follows:
Code to compile bootmain.cpp:
CL.EXE /AT /G2 /Gs /Gx /c /Zl bootmain.cpp
Code to compile startpoint.asm:
ML.EXE /AT /c startpoint.asm
Code to link them both (In preserved order):
LINK.EXE /T /NOD startPoint.obj bootmain.obj
Expected output:
G
Actual Output:
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
Take a closer look at the end of start.
start is never called: it is jumped to directly, and it sets up the stack itself. When _BootMain returns, the stack is empty, so the ret at the end of start pops garbage from just above the top of the stack and jumps to it. If that memory contains zeroes, ret jumps to offset 0 of the segment, which is where main sits (the code is loaded at 2000H:0000H), so the program runs again and prints another G.
You need to set up something specific to happen after _BootMain returns. If you just want the system to hang after executing _BootMain, put an infinite loop (e.g. jmp $ in MASM syntax) at the end of start instead of the erroneous ret.
Alternatively, consider having your bootloader set up the stack itself and call the COM executable. When that returns, the bootloader can take appropriate action.

Unable to find entry point _start GCC [duplicate]

I'm trying to compile and run the following C program, which has no main() function. I compiled it with the following command:
gcc -nostartfiles nomain.c
The linker gives this warning:
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400340
OK, no problem. Then I ran the executable (a.out): both printf statements print successfully, and then I get a segmentation fault.
So my question is: why is there a segmentation fault after the print statements execute successfully?
My code:
#include <stdio.h>
void nomain()
{
printf("Hello World...\n");
printf("Successfully run without main...\n");
}
Output:
Hello World...
Successfully run without main...
Segmentation fault (core dumped)
Note:
Here, the -nostartfiles gcc flag prevents the compiler from using the standard startup files when linking.
Let's have a look at the generated assembly of your program:
.LC0:
.string "Hello World..."
.LC1:
.string "Successfully run without main..."
nomain:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
mov edi, OFFSET FLAT:.LC1
call puts
nop
pop rbp
ret
Note the ret statement. Your program's entry point ends up being nomain, and all is fine with that. But ret pops whatever is at the top of the stack and jumps to it, and nothing ever called nomain, so there is no return address there (on Linux x86-64 the top of the stack at process entry holds argc). The jump goes to an invalid address, and a segmentation fault follows.
A quick solution would be to call exit() at the end of your program (and assuming C11 we might as well mark the function as _Noreturn):
#include <stdio.h>
#include <stdlib.h>
_Noreturn void nomain(void)
{
printf("Hello World...\n");
printf("Successfully run without main...\n");
exit(0);
}
In fact, now your function behaves pretty much like a regular main function, since after returning from main, the exit function is called with main's return value.
In C, when a function is called, the stack is populated in this order:
The arguments,
Return address,
Local variables, --> top of the stack
When main() is the start point, the startup code calls it, so a return address is in place. Here the linker falls back to whatever code comes first as the entry point, in this case the function containing the printf calls, and nothing has called it.
The program therefore has no return address on the stack. On return it treats whatever happens to be at that location as the return address, but it is not one, and hence it crashes.

Linux syscalls and errno

Context: I am trying to write a small C program with inline asm that should run under Linux on an x86_64 system and is compiled with gcc, in order to better understand how syscalls work under Linux.
My question is: How are error numbers returned from a syscall (e.g. write) in this environment? I understand that when I use a library such as glibc, it takes care of saving the resulting error code in the global errno variable. But where is the error number stored when I call a syscall directly through inline assembler? Will it be stored inside a separate register, or will it be encoded in %rax?
Let's take the write syscall on Linux as an example:
When I call write, I find 0xfffffffffffffff2 in %rax after the syscall returns. Do I need to somehow extract the error code from that?
If I have the error code number, where should I look to identify the actual error that occurred? Let's say I get the number 5 returned: which header file do I need to consult to find the corresponding symbolic error name?
I am calling the write syscall like this:
asm ("mov $1,%%rax;"
"mov $1,%%rdi;"
"mov %1,%%rsi;"
"mov %2,%%rdx;"
"syscall;"
"mov %%rax,%0;"
: "=r" (result)
: "r" (msg), "r" (len)
: "%rdx", "%rsi", "%rax", "%rdi", "%rcx", "%r11", "memory" /* EDIT: this is needed or else the registers will be overwritten; the syscall instruction itself also clobbers rcx and r11 */
);
with result, msg and len defined like so:
long result = 0;
char* msg = "Hello World\n";
long len = 12;
The Linux syscall convention encodes both the possible error code and the result of a successful call in the single return value. It is the wrappers in glibc and other C libraries that set errno to the error code returned by the underlying syscall and then return -1. Taking write as an example, the kernel does its error processing similar to this:
ssize_t write(int fd, ...) {
if (fd is not valid)
return -EBADF;
return do_write(...);
}
So as you can see, the error code is just in the return value, and depending on the semantics, there is always a way to check whether the syscall succeeded by comparing the result to a value that is not possible for a successful operation. For most syscalls, like write, that means checking whether it is negative.
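A minimal C sketch of that check, assuming the usual Linux rule that raw return values from -4095 to -1 are reserved for errors (the kernel's MAX_ERRNO is 4095). The helper names syscall_failed and syscall_result are hypothetical; the second one only mimics what a libc wrapper does and is not glibc's actual code.

```c
#include <stdbool.h>

/* Assumption: Linux reserves raw return values -4095..-1 for errors. */
enum { SYSCALL_MAX_ERRNO = 4095 };

/* Hypothetical helper: true if `raw` (the value left in rax after the
 * syscall instruction) signals a failure. */
bool syscall_failed(long raw)
{
    return raw < 0 && raw >= -SYSCALL_MAX_ERRNO;
}

/* Hypothetical helper in the style of a libc wrapper: on failure, store
 * the positive error number in *err and return -1; otherwise return the
 * raw value unchanged and leave *err alone. */
long syscall_result(long raw, int *err)
{
    if (syscall_failed(raw)) {
        *err = (int)-raw;
        return -1;
    }
    return raw;
}
```

Passing the result variable from the inline asm through syscall_result gives the familiar "-1 plus error number" interface without involving errno.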
Architecture Calling Conventions
As you already guessed, you can't rely on errno, because it is set by the C library (glibc) and not by the kernel. On x86_64 the information you want is in rax. The man page syscall(2) (man 2 syscall) has the following explanation:
Architecture calling conventions
Every architecture has its own way of invoking and passing arguments
to the kernel. The details for various architectures are listed in
the two tables below.
The first table lists the instruction used to transition to kernel
mode (which might not be the fastest or best way to transition to the
kernel, so you might have to refer to vdso(7)), the register used to
indicate the system call number, the register used to return the
system call result, and the register used to signal an error.
arch/ABI    instruction           syscall #  retval  error    Notes
────────────────────────────────────────────────────────────────────
alpha       callsys               v0         a0      a3       [1]
arc         trap0                 r8         r0      -
arm/OABI    swi NR                -          a1      -        [2]
arm/EABI    swi 0x0               r7         r0      -
arm64       svc #0                x8         x0      -
blackfin    excpt 0x0             P0         R0      -
i386        int $0x80             eax        eax     -
ia64        break 0x100000        r15        r8      r10      [1]
m68k        trap #0               d0         d0      -
microblaze  brki r14,8            r12        r3      -
mips        syscall               v0         v0      a3       [1]
nios2       trap                  r2         r2      r7
parisc      ble 0x100(%sr2, %r0)  r20        r28     -
powerpc     sc                    r0         r3      r0       [1]
s390        svc 0                 r1         r2      -        [3]
s390x       svc 0                 r1         r2      -        [3]
superh      trap #0x17            r3         r0      -        [4]
sparc/32    t 0x10                g1         o0      psr/csr  [1]
sparc/64    t 0x6d                g1         o0      psr/csr  [1]
tile        swint1                R10        R00     R01      [1]
x86_64      syscall               rax        rax     -        [5]
x32         syscall               rax        rax     -        [5]
xtensa      syscall               a2         a2      -
And note number [5]:
[5] The x32 ABI uses the same instruction as the x86_64 ABI and
is used on the same processors. To differentiate between
them, the bit mask __X32_SYSCALL_BIT is bitwise-ORed into the
system call number for system calls under the x32 ABI. Both
system call tables are available though, so setting the bit
is not a hard requirement.
(In that man page, a table showing how to pass arguments to system calls follows. It's an interesting read.)
How are error numbers returned from a syscall (e.g. write) in this environment?:
Check the rax register for the return value.
On Linux, a failed system call made with the syscall instruction returns the value -errno in the rax register. In your case 0 - 0xfffffffffffffff2 == 0xE, which is 14, so your errno is 14.
How do you find out what errno 14 means? Search for "Linux error code table" or look in errno.h and you'll find the answer.
Take a look here:
http://www.virtsync.com/c-error-codes-include-errno
According to that table, 14 is EFAULT, which means "Bad address".
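The same lookup can be done in code instead of a table. The sketch below assumes the raw register value of a failed call is available as a 64-bit unsigned integer; errno_from_rax and describe_rax_error are hypothetical helper names, while strerror is the standard C library function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: recover the positive error number from the raw
 * 64-bit register value of a failed call by two's-complement negation,
 * i.e. the 0 - rax computation described above. */
int errno_from_rax(uint64_t rax)
{
    return (int)(UINT64_C(0) - rax);
}

/* Hypothetical helper: the C library's description of that error number. */
const char *describe_rax_error(uint64_t rax)
{
    return strerror(errno_from_rax(rax));
}
```

For the value in the question, errno_from_rax(0xfffffffffffffff2) is 14, and on a glibc system describe_rax_error should give the "Bad address" text that goes with EFAULT.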
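IIRC some x86-64 systems signal an error from a syscall by setting the carry bit, and eax then contains the errno code. That is the BSD and macOS convention, though; Linux on x86-64 does not use the carry flag and returns -errno in rax instead.
I would suggest studying the lower layers of the source code of a libc implementation such as musl-libc.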

Running linked C object files (compiled from C and x86-64 assembly) on Cygwin shell, no output?

I'm trying to deal with my assembly language assignment...
There are two files, hello.c and world.asm. The professor asked us to compile them with gcc and nasm and link the object code together.
I can do it fine under 64-bit Ubuntu 12.10 with the native gcc and nasm.
But things go wrong when I try the same on 64-bit Windows 8 via Cygwin 1.7. First I tried gcc, but the -m64 option doesn't work, and since the professor asked us to generate 64-bit code, I googled and found a package called mingw-w64, whose compiler x86_64-w64-mingw32-gcc accepts -m64. With it I can compile the files to mainhello.o and world.o and link them into main.out, but when I type "./main.out" and wait for the "Hello world", nothing happens: no output, no error message.
I'm a new user and so can't post an image, sorry about that; here is the screenshot of what happens in the Cygwin shell:
I'm a newbie at all this. I know I can do the assignment under Ubuntu, but I'm curious about what's going on here.
Thank you guys
hello.c
//Purpose: Demonstrate outputting integer data using the format specifiers of C.
//
//Compile this source file: gcc -c -Wall -m64 -o mainhello.o hello.c
//Link this object file with all other object files:
//gcc -m64 -o main.out mainhello.o world.o
//Execute in 64-bit protected mode: ./main.out
//
#include <stdio.h>
#include <stdint.h> //For C99 compatibility
extern unsigned long int sayhello();
int main(int argc, char* argv[])
{unsigned long int result = -999;
printf("%s\n\n","The main C program will now call the X86-64 subprogram.");
result = sayhello();
printf("%s\n","The subprogram has returned control to main.");
printf("%s%lu\n","The return code is ",result);
printf("%s\n","Bye");
return result;
}
world.asm
;Purpose: Output the famous Hello World message.
;Assemble: nasm -f elf64 -l world.lis -o world.o world.asm
;===== Begin code area
extern printf ;This function will be linked into the executable by the linker
global sayhello
segment .data ;Place initialized data in this segment
welcome db "Hello World", 10, 0
specifierforstringdata db "%s", 10, 0
segment .bss
segment .text
sayhello:
;Output the famous message
mov qword rax, 0
mov rdi, specifierforstringdata
mov rsi, welcome
call printf
;Prepare to exit from this function
mov qword rax, 0
ret;
;===== End of function sayhello
The version below is adapted to the Windows x64 calling convention: the first two arguments go in rcx and rdx instead of rdi and rsi, and the caller reserves 32 bytes of shadow space on the stack before the call.
;Purpose: Output the famous Hello World message.
;Assemble: nasm -f win64 -o world.o world.asm
;===== Begin code area
extern _printf ;This function will be linked into the executable by the linker
global _sayhello
segment .data ;Place initialized data in this segment
welcome db "Hello World", 0
specifierforstringdata db "%s", 10, 0
segment .text
_sayhello:
;Output the famous message
sub rsp, 40 ; shadow space and stack alignment
mov rcx, specifierforstringdata
mov rdx, welcome
call _printf
add rsp, 40 ; clean up stack
;Prepare to exit from this function
mov qword rax, 0
ret
;===== End of function sayhello
When linked together with the C wrapper in the question and run in a plain cmd window, it works fine.
Depending on your toolchain you might need to remove the leading underscores from symbols.

Undefined reference to multiply

I'm trying to call a C function from assembly. This is my code:
C:
int multiply(int what)
{
return what * 2;
}
ASM:
extern multiply
start:
mov eax, 10
push eax
call multiply
jmp $
;empty
times 510-($-$$) db 0
dw 0xAA55
I'm compiling the C code to ELF with gcc (MinGW) and the ASM code with NASM. Both compile without any problems, but when I try to link them into a .bin file with:
gcc -o test.bin work.o test.o
I get an "undefined reference to multiply" error.
Does anybody know how to call a C function from ASM code, compile it and link it into a working .bin file? Please help.
Try prefixing multiply with '_':
extern _multiply
Works for me in this simple example:
global _main
extern _printf
section .data
text db "291 is the best!", 10, 0
strformat db "%s", 0
section .code
_main
push dword text
push dword strformat
call _printf
add esp, 8
ret
Try "global multiply" instead of "extern multiply" in your .asm file. You shouldn't need the underscore for ELF (I don't think), but you can get NASM to automagically add an underscore to anything "extern" or "global" by adding "--prefix _" to NASM's command line.
Edit: I take that back, "extern" is correct. You seem not to have a "main". Try adding "-nostartfiles" (a single hyphen) to gcc's command line.
Best,
Frank
