Make a small mach-o executable with C

For curiosity's sake, I'm trying to see how small I can make a C program with a minimum of assembly language. I want to see if I can make a simple OpenGL demo (i.e. demo scene) using OpenGL and GLUT linked dynamically, without the standard library. However, I'm running into trouble with the most basic stuff.
I've created a test main.c file that contains
void newStart() {
//Do stuff here...
asm("movl $1, %eax;"
"xorl %ebx, %ebx;"
"int $0x80;");
}
and I'm making it with
gcc main.c -nostdlib -e newStart -o min
using the '-e' option as recommended by this StackOverflow question. I get the following error when I try to compile it:
ld: warning: symbol dyld_stub_binder not found, normally in libSystem.dylib
ld: entry point (newStart) undefined. for architecture x86_64
I'm running OS X 10.7 (Lion). Can anyone help me out?

For newStart(), the corresponding symbol is _newStart. You should use that for the -e option:
gcc main.c -nostdlib -e _newStart -o min
See this Stack Overflow question about why underscores are prepended to (extern) function names: Why do C compilers prepend underscores to external names?
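As a rough illustration of the prefixing rule, the sketch below builds the linker-level name for a C identifier. It relies on __USER_LABEL_PREFIX__, a macro predefined by GCC and Clang that expands to whatever prefix the target adds to C names. The helper names label_prefix and linker_symbol are made up for this example.

```c
#include <stdio.h>

/* Two-step stringification so the macro is expanded before it is quoted. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Prefix the toolchain adds to C identifiers: "_" on Mach-O targets,
   empty on ELF targets such as Linux. (GCC/Clang predefined macro.) */
const char *label_prefix(void) {
    return STR(__USER_LABEL_PREFIX__);
}

/* Write the linker-level name of the C identifier c_name into buf.
   Returns buf, or NULL if buf is too small. Hypothetical helper. */
const char *linker_symbol(const char *c_name, char *buf, size_t size) {
    int n = snprintf(buf, size, "%s%s", label_prefix(), c_name);
    return (n >= 0 && (size_t)n < size) ? buf : NULL;
}
```

On OS X, calling linker_symbol("newStart", buf, sizeof buf) would be expected to produce _newStart, which is the name to pass to -e.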

Related

OCaml: Issues linking C and OCaml

I am able to wrap C code and access it from the OCaml interpreter, but cannot build a binary! I'm 98% sure it is some linking problem, but can't find the tools to explore the linkage.
Getting even to this point was a chore, (endless quantities of Error: The external function is not available messages) so I'll document everything I did.
A 'system' file stuff.c
#include <stdio.h>
int fun(int z) // Emulate a "real" subroutine
{
printf("duuude whoa z=%d\n", z);
return 42;
}
Compile above as
cc -fPIC -c stuff.c
ld -shared -o libstuff.so stuff.o
An OCaml wrapper around above, in ocmstuff.c:
#include <caml/mlvalues.h>
CAMLprim value yofun(value z) {
return Val_long(fun(Long_val(z)));
}
Build above as
cc -fPIC -c ocmstuff.c
ld -shared -o dllostuff.so ocmstuff.o -L . -lstuff -lc -rpath .
Yes, the rpath really is needed, else the next steps suffer. (Edit: If you don't use rpath, you'll need to use LD_LIBRARY_PATH=. instead. For the final 'production' version, you'd change the rpath to the actual library path, or do ld.so.conf trickery or install into 'standard' locations, or tell your users about LD_LIBRARY_PATH. This is just like what you'd do for any other system. The rpath solution seems to be the most stable and reliable solution.)
Next, a module declaration, stored in fapi.mli
module Fapi : sig
external ofun : int -> int = "yofun" ;;
end
Build above as:
ocamlc -a -o fapi.cma -intf fapi.mli -dllib -lostuff
Does it work? Yes it does:
$ rlwrap ocaml fapi.cma
OCaml version 4.11.1
open Fapi ;;
Fapi.ofun 33 ;;
duuude whoa z=33
- : int = 42
#
So the wrapper works fine. Now let's compile with it. Here's myprog.ml:
open Fapi ;;
Fapi.ofun 33 ;;
Compile it:
ocamlc -c myprog.ml
ocamlc -o myprog myprog.cmo fapi.cma
The very last command spews:
File "_none_", line 1:
Error: Required module `Fapi' is unavailable
I am 98% sure the above error is due to some silly linking error, but I cannot track it down. Why do I think this? Well, here's a related problem that provides a hint.
$ rlwrap ocaml
open Fapi ;;
# Fapi.ofun 33 ;;
Error: The external function `yofun' is not available
#
Well, that's odd. It clearly must have found fapi.cma because that is the only way it can know about yofun. But somehow, it doesn't know it needs to dig into dllostuff.so for that. Or possibly dllostuff.so is failing to correctly link/load libstuff.so? Or maybe libc.so to get printf? I'm pretty sure it's one of these last few, but I just can't get it to work, and don't have the tools to debug it. (nm and ldd -r look healthy. Are there some similar tools for the assorted cma, cmo, cmi, cmx files?)

Interfacing with C is much easier if you use dune. You don't need to know the low-level details; it is all handled for you.
Now, back to your example. This is definitely not how OCaml users are interfacing with C, but if you really want to learn about it here are a few notes.
The reason why you have the error is that:
you specified the modules in the wrong order: it should be topological, not reverse topological, i.e., a dependency comes before the modules that depend on it
you do not have the .ml file (the -intf option means something very different)
The reason why the last snippet doesn't work is that you're not loading the library. The ocaml binary obviously doesn't have any fapi units linked into it, so you have to load it explicitly, either with the #load directive or by passing it on the command line.
Also the following line is not necessary,
ld -shared -o dllostuff.so ocmstuff.o -L . -lstuff -lc -rpath .
First of all, there is no need to link a stub file into a shared library. It is counterproductive and doesn't really bring you anything. Second, passing -rpath . will render the end executable unusable, unless the shared objects are stored in the same folder as the executable. Just remove this.
Just to complete your exercise, here is how it could be built and run. First, let's fix the stub file. We need the ml file and we also need to remove an extra module definition,
$ cat fapi.{ml,mli}
external ofun : int -> int = "yofun" ;;
external ofun : int -> int = "yofun" ;;
Yes, they are the same. The mli file is not really needed here, but let's keep it for the sake of completeness.
The way you build the pure C part is fine; as long as you get a relocatable .so file, it works.
Now to build ocmstuff.c (which we conventionally call the stubs) you just need to do,
ocamlc -c ocmstuff.c
Don't turn it into a shared library, don't do anything else with it. Now let's build the fapi library,
ocamlc -c fapi.mli
ocamlc -c fapi.ml
Now let's build the library that contains both OCaml and C code,
ocamlmklib -o fapi fapi.cmo ocmstuff.o -lstuff -L.
Now we can finally build the executable,
ocamlc -c myprog.ml
LD_LIBRARY_PATH=. ocamlc -o myprog fapi.cma myprog.cmo
and run it,
LD_LIBRARY_PATH=. ./myprog
duuude whoa z=33
Notice that we have to use the LD_LIBRARY_PATH to tell the system dynamic loader where to look for the external dependency libstuff.so. You can, of course, use rpath to specify its location (pass it to ocamlmklib via -ccopt) but in general it is assumed that the external library is installed at some location that the system loader knows.
Again, unless you're developing your own build system, please use dune or oasis for building OCaml programs. These systems will handle all low-level details in the best possible way.
P.S. It is also worth mentioning that you're not building a binary, but a bytecode executable. For binaries, you will have to use the ocamlopt compiler. And this would be a completely different story. Again, dune is the solution.

There is a lot to take in here, but these lines are suspicious:
ocamlc -c myprog.ml
ocamlc -o myprog myprog.cmo fapi.cma
OCaml expects modules in topologically sorted order, with a module appearing on the command line before the modules that refer to it.
So it would seem the last line should be this:
ocamlc -o myprog fapi.cma myprog.cmo
I hope this helps, it's just a quick response.

The answer provided by ivg works. It also provides enough hints to retrofit the original question to get the correct behavior. The changes to the original recipe are:
Create fapi.mli and fapi.ml which both have the same content: external ofun : int -> int = "yofun" ;;
Compile both of the above with ocamlc -c. The mli must be compiled first: it yields an interface file cmi, which is needed before the ml file can be compiled into its object file cmo.
The name dllostuff.so was wrong: it must be dllfapi.so to maintain naming consistency.
Build the cma archive/library as ocamlc -a -o fapi.cma fapi.cmo -dllib -lfapi
That's it! Other than these, the original instructions work. The answer from ivg suggests using
ocamlmklib -o fapi fapi.cmo ocmstuff.o -L. -lstuff
instead of
ld -shared -o dllfapi.so ocmstuff.o -L. -lstuff
Either of these works. The primary difference is that ocamlmklib also creates a static-linked library libfapi.a. Other than that, it creates the dllfapi.so as before. (That version also contains a motley assortment of typical gcc symbols, for handling exceptions, library ctors, etc. It's not clear why these are needed here, since they'll show up sooner or later anyway.)

Why is clang removing an underscore from a function declared as 'extern "C"'?

I'm watching a video in an attempt to better understand object files. The presenter uses the following as an example of a program that produces a very simple object file:
extern "C" void _start() {
asm("mov $60, %eax\n"
"mov $24567837, %edi\n"
"syscall\n");
}
The program is compiled via
clang++ -c step0.cpp -O1 -o step0.o
and linked via
ld -static step0.o -o step0
I get this error message when trying to link:
Undefined symbols for architecture x86_64:
"start", referenced from:
-u command line option
(maybe you meant: __start)
ld: symbol(s) not found for inferred architecture x86_64
I don't pass the -u command line option, so I'm not sure why I'm getting that error message.

clang isn't removing an underscore, it's adding an underscore. Your program is actually exporting a __start symbol, but ld expects you to have a start symbol for your entry point, i.e. ld runs with -u start by default for your architecture.
You could disable this check in ld with -U start (which suppresses the error from the start symbol being undefined) or via -undefined suppress (which suppresses all undefined symbol errors). However, you will end up with an executable that does not have an entry point for your architecture, so the program won't actually work.
Instead of suppressing the error, I suggest controlling the symbol that clang chooses directly. You can tell clang what symbol to generate by using a standalone asm declaration:
void _start() asm ("start");
Make sure this standalone declaration is separate from the function definition.
You can read more about controlling the symbols generated by gcc here: https://stackoverflow.com/a/1035937/12928775
Also, as was pointed out in a comment to a similar answer, you will most likely want to use __attribute__((naked)) on the function definition to prevent clang from generating a stack frame on entry. See: https://stackoverflow.com/a/60311490/12928775
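A minimal sketch of the standalone asm declaration, written in plain C for GCC or Clang and shown in a normal hosted build rather than as a real entry point. The names entry_point and demo_entry are placeholders chosen for this example. The asm label fixes the exact symbol name, so no underscore is prepended.

```c
/* Standalone declaration: emit this function under the exact symbol name
   "demo_entry" (placeholder name), bypassing the usual underscore prefix. */
int entry_point(void) __asm__("demo_entry");

/* The definition is separate and carries no asm label of its own. */
int entry_point(void) {
    return 42;
}
```

C code still calls the function as entry_point(); only the name seen by the linker changes. Running nm on the resulting object file should show whether the symbol came out as demo_entry.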

relocation R_X86_64_32S against `.text' can not be used when making a shared object

I am compiling a static library, which leverages some inline assembly code.
I notice that when I use labels for the jmp instruction:
int foo(){
asm volatile
(
"mov 0x60(%r8),%r11d\n\t"
"jmp *S_401a70\n\t"
...
"S_401a70: xor %rax, %rax\n\t"
...
)
}
and compile the code into a static library with the following flags:
-Wl,--no-undefined -nostdlib -nodefaultlibs -nostartfiles -L$(SOME_LIBRARY_PATH) \
-Wl,--whole-archive -l$(SOME_Library_Name) -Wl,--no-whole-archive \
-Wl,-Bstatic -Wl,-Bsymbolic -Wl,--no-undefined \
-Wl,-pie,-eenclave_entry -Wl,--export-dynamic \
-Wl,--defsym,__ImageBase=0
I would get some errors like:
/usr/bin/ld: Enclave/libtest.o: relocation R_X86_64_32S against `.text' can not be used when making a shared object; recompile with -fPIC
However, since I am compiling a static library, I don't think -fPIC would make sense. I tried it anyway, but it doesn't help at all.
This seems like an issue with the gcc assembly extension, but I am not sure. Could anyone shed some light on this? Thank you!

It is not a tool issue. First of all, -fPIC affects only the code the compiler generates from C. It makes the generated code avoid absolute addresses for the data and code it refers to, and avoid relying on its own address in memory (this is a somewhat simplified explanation). It does nothing for inline assembly, because that code is written by the programmer. If the assembly uses absolute addresses or otherwise depends on where it is loaded, the compiler can't help.
P.S. You may build a static library even from position-dependent code, but the linker won't accept it if someone tries to link it into a shared library, since the resulting shared library must be position-independent.
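As a hedged sketch of what position-independent inline assembly could look like, the function below jumps to a label with a direct jmp, which is encoded relative to the program counter and needs no absolute relocation. It assumes the jump in the question was meant as a plain jump to the label. In AT&T syntax, jmp *S_401a70 is an indirect jump through memory at that absolute address. The function name jump_demo and the non-x86-64 fallback are inventions for this example.

```c
/* Returns 0. On x86-64 the value is produced inside the asm after a
   PC-relative jump over one instruction. */
long jump_demo(void) {
#if defined(__x86_64__)
    long out;
    __asm__ volatile(
        "jmp 1f\n\t"        /* direct jump to local label 1, PC-relative */
        "mov $1, %0\n"      /* skipped by the jump */
        "1:\n\t"
        "xor %0, %0\n\t"    /* out = 0 */
        : "=r"(out)
        :
        : "cc");
    return out;
#else
    return 0; /* other architectures: nothing to demonstrate */
#endif
}
```

The numeric local label (1: with 1f) also avoids duplicate-symbol errors if the compiler emits the asm more than once, for instance after inlining.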

Trying to understand the main function with GCC and Windows

They say that main() is a function like any other function, but "marked" as an entry point inside the binary, an entry point that the operating system may find (Don't know how) and start the program from there. So, I'm trying to find out more about this function. What have I done? I created a simple .C file with this code inside:
int main(int argc, char **argv) {
return (0);
}
I saved the file, installed the GCC compiler (in Windows, MingW environment) and created a batch file like this:
gcc -c test.c -nostartfiles -nodefaultlibs -nostdlib -nostdinc -o test.o
gcc -o test.exe -nostartfiles -nodefaultlibs -nostdlib -nostdinc -s -O2 test.o
#%comspec%
I did this to obtain a very simplistic compiler and linker, no library, no header, just the compiler. So, the compiling goes well but the linking stops with this error:
test.c:(.text+0xa): undefined reference to '___main'
collect2.exe: error: ld returned 1 exit status
I thought that the main function is exported by the linker but I believed that you didn't need any library with additional information about it. But it looks like it does. In my case I supposed that it must be the standard GCC library, so I downloaded the source code of it and opened this file: libgcc2.c
Now, I don't know if that is the file where the main function is constructed to be linked by GCC. In fact, I don't understand how the main function is used by GCC. Why does the linker need the gcc standard libraries? To know what about main? I hope this has made my question quite specific and clear. Thanks!

When gcc puts together all object files (test.o) and libraries to form a binary it also prepends a small object (usually crt0.o or crt1.o), which is responsible for calling your main(). You can see what gcc is doing, when you add -v on the command line:
$ gcc -v -o test.exe test.o
crt0/crt1 does some setup and then calls into main. But the linker is finally responsible for building the executable according to the OS. With -v you can see also an option for the target system. In my case it's for Linux 64 bit: -m elf_x86_64. For your system this will be something like -m windows or -m mingw.

The error happens because you use these two options: -nodefaultlibs -nostdlib
These tell GCC that it should not link your code against libc.a/c.lib which contains the code which really calls main(). In a nutshell, every OS is slightly different and most of them don't care about C and main(). Each has their own special way to start a process and most of them are not compatible with the C API.
So the solution of the C developers was to put "glue code" into the C standard library libc.a. It contains the interface which the OS expects, creates the standard C environment (setting up the memory allocation structures so malloc() maps onto the OS's memory management functions, setting up stdio, etc.) and eventually calls main().
For C developers, this means they get a libc.a for their OS (along with the compiler binaries) and they don't need to care about how the setup works.
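Another source of confusion is the name in the error. On targets that prepend an underscore to C names, such as 32-bit MinGW, the symbol for main() is _main (one underscore), so the ___main in your error is the symbol for __main. That is an internal helper from GCC's support library (libgcc), which GCC-compiled code calls at the start of main() to run global constructors; it is not your main() and it does not call it.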
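To see that the runtime really does work before main() starts, here is a small sketch using the constructor attribute, a GCC and Clang extension. The names ran_before_main and early_setup are made up for this example. In a normal hosted build, the startup machinery calls the marked function before main() is entered.

```c
/* Set by the constructor below; stays 0 if nothing runs it. */
int ran_before_main = 0;

/* GCC and Clang arrange for functions with this attribute to be called
   by the startup machinery before main() is entered. */
__attribute__((constructor))
static void early_setup(void) {
    ran_before_main = 1;
}
```

A main() that inspects ran_before_main should find it already set to 1. If the startup files and support libraries are left out of the link, nothing is left to call early_setup.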

Linking a C program directly with ld fails with undefined reference to `__libc_csu_fini`

I'm trying to compile a C program under Linux. However, out of curiosity, I'm trying to execute some steps by hand: I use:
the gcc frontend to produce assembler code
then run the GNU assembler to get an object file
and then link it with the C runtime to get a working executable.
Now I'm stuck with the linking part.
The program is a very basic "Hello world":
#include <stdio.h>
int main() {
printf("Hello\n");
return 0;
}
I use the following command to produce the assembly code:
gcc hello.c -S -masm=intel
I'm telling gcc to quit after compiling and dump the assembly code with Intel syntax.
Then I use the GNU assembler to produce the object file:
as -o hello.o hello.s
Then I try using ld to produce the final executable:
ld hello.o /usr/lib/libc.so /usr/lib/crt1.o -o hello
But I keep getting the following error message:
/usr/lib/crt1.o: In function `_start':
(.text+0xc): undefined reference to `__libc_csu_fini'
/usr/lib/crt1.o: In function `_start':
(.text+0x11): undefined reference to `__libc_csu_init'
The symbols __libc_csu_fini/init seem to be a part of glibc, but I can't find them anywhere! I tried linking against libc statically (against /usr/lib/libc.a) with the same result.
What could the problem be?

/usr/lib/libc.so is a linker script which tells the linker to pull in the shared library /lib/libc.so.6, and a non-shared portion, /usr/lib/libc_nonshared.a.
__libc_csu_init and __libc_csu_fini come from /usr/lib/libc_nonshared.a. They're not being found because references to symbols in non-shared libraries need to appear before the archive that defines them on the linker line. In your case, /usr/lib/crt1.o (which references them) appears after /usr/lib/libc.so (which pulls them in), so it doesn't work.
Fixing the order on the link line will get you a bit further, but then you'll probably get a new problem, where __libc_csu_init and __libc_csu_fini (which are now found) can't find _init and _fini. In order to call C library functions, you should also link /usr/lib/crti.o (after crt1.o but before the C library) and /usr/lib/crtn.o (after the C library), which contain initialisation and finalisation code.
Adding those should give you a successfully linked executable. It still won't work, because it uses the dynamically linked C library without specifying what the dynamic linker is. You'll need to tell the linker that as well, with something like -dynamic-linker /lib/ld-linux.so.2 (for 32-bit x86 at least; the name of the standard dynamic linker varies across platforms).
If you do all that (essentially as per Rob's answer), you'll get something that works in simple cases. But you may come across further problems with more complex code, as GCC provides some of its own library routines which may be needed if your code uses certain features. These will be buried somewhere deep inside the GCC installation directories...
You can see what gcc is doing by running it with either the -v option (which will show you the commands it invokes as it runs), or the -### option (which just prints the commands it would run, with all of the arguments quoted, but doesn't actually run anything). The output will be confusing unless you know that it usually invokes ld indirectly via one of its own components, collect2 (which is used to glue in C++ constructor calls at the right point).

I found another post which contained a clue: -dynamic-linker /lib/ld-linux.so.2.
Try this:
$ gcc hello.c -S -masm=intel
$ as -o hello.o hello.s
$ ld -o hello -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o hello.o -lc /usr/lib/crtn.o
$ ./hello
hello, world
$

Assuming that a normal invocation of gcc -o hello hello.c produces a working build, run this command:
gcc --verbose -o hello hello.c
and gcc will tell you how it's linking things. That should give you a good idea of everything that you might need to account for in your link step.

In Ubuntu 14.04 (GCC 4.8), the minimal linking command is:
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
/usr/lib/x86_64-linux-gnu/crt1.o \
/usr/lib/x86_64-linux-gnu/crti.o \
-L/usr/lib/gcc/x86_64-linux-gnu/4.8/ \
-lc -lgcc -lgcc_s \
hello.o \
/usr/lib/x86_64-linux-gnu/crtn.o
Although they may not be necessary, you should also link to -lgcc and -lgcc_s, since GCC may emit calls to functions present in those libraries for operations which your hardware does not implement natively, e.g. long long int operations on 32-bit. See also: Do I really need libgcc?
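As an illustration of the kind of code that can pull in libgcc, consider a plain 64-bit division. The function name div_u64 is made up for this sketch. On a 32-bit x86 target, GCC typically compiles the division into a call to the libgcc helper __udivdi3 rather than inline instructions; on x86-64 the hardware handles it directly.

```c
#include <stdint.h>

/* On targets with no native 64-bit divide (32-bit x86, for example),
   GCC typically turns this into a call to __udivdi3 from libgcc. */
uint64_t div_u64(uint64_t numerator, uint64_t denominator) {
    return numerator / denominator;
}
```

Compiling this for a 32-bit target and linking by hand without -lgcc would be expected to fail with an undefined reference to __udivdi3.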
I had to add:
-L/usr/lib/gcc/x86_64-linux-gnu/4.8/ \
because the default linker script does not include that directory, and that is where libgcc.a was located.
As mentioned by Michael Burr, you can find the paths with gcc -v. More precisely, you need:
gcc -v hello_world.c |& grep 'collect2' | tr ' ' '\n'

This is how I fixed it on ubuntu 11.10:
apt-get remove libc-dev
Say yes to remove all the packages but copy the list to reinstall after.
apt-get install libc-dev

If you're running a 64-bit OS, your glibc(-devel) may be broken. By looking at this and this you can find these 3 possible solutions:
add lib64 to LD_LIBRARY_PATH
link with -lc_nonshared
reinstall glibc-devel

Since you are doing the link process by hand, you are forgetting to link the C run time initializer, or whatever it is called.
To avoid getting into the specifics of where and what you should link for your platform, after getting your Intel asm file, use gcc to generate (compile and link) your executable.
simply doing gcc hello.c -o hello should work.

Try this:
$ echo 'main(){puts("ok");}' > hello.c
$ gcc -c hello.c -o hello.o
$ ld hello.o -o hello.exe /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtn.o \
-dynamic-linker /lib/ld-linux.so.2 -lc
$ ./hello.exe
ok
The /usr/lib/crt*.o paths apply when glibc is configured with --prefix=/usr.
