How to compile glibc for use without an operating system - linker

I would like to compile the functions of glibc to an object file which will then be linked to a program which I am running on a computer without any operating system. Some functions, such as open, I want to just fail with ENOSYS. Other functions I will write myself, such as putchar, and then have glibc use those functions in it's own (like printf). I also want to use functions that don't need a file system or process management system or anything like that, such as strlen. How can I do this?

Most C libraries rely on the kernel heavily, so it's not reasonable to port 'em. But since most of it doesn't need to be implemented, you can get away easily with a few prototypes for stubs and gcc builtins.
You can implement stubs easily using weak symbols:
#define STUB __attribute__((weak, alias("__stub"))) int
#define STUB_PTR __attribute__((weak, alias("__stub0"))) void *
int __stub();
void *__stub0();
Then defining the prototypes becomes trivial:
STUB read(int, void*, int);
STUB printf(const char *, ...);
STUB_PTR mmap(void*, int, int, int, int, int);
And the actual functions could be:
int __stub()
{
errno = ENOSYS;
return -1;
}
void *__stub0()
{
errno = ENOSYS;
return NULL;
}
If you need some non-trivial function, like printf, take it from uClibc instead of glibc (or some other smaller implementation).

Related

How to call a user-defined syscall as a function

Lets say I have a user-defined syscall: foo (with code number 500).
To call it, I simply write in a C file:
syscall(SYS_code, args);
How can I call it using just foo(args)?
You cannot. Not unless you first convince the kernel developers that your syscall is worth being added, then it gets added, and then userspace libraries such as the standard C library (glibc on most Linux distributions) decide to implement a wrapper for it like they do for most of the syscalls.
In other words: since the above is impossible, all you can do is define the wrapper function yourself in your own program.
#define SYS_foo 500
long foo(int a, char *b) {
return syscall(SYS_foo, a, b);
}

Does the standard "Function Calling Sequence" described in Sys V ABI specs (both i386 and AMD64) apply to the static C functions?

In computer software, an application binary interface (ABI) is an interface between two binary program modules; often, one of these modules is a library or operating system facility, and the other is a program that is being run by a user.
An ABI defines how data structures or computational routines are accessed in machine code, which is a low-level, hardware-dependent format; in contrast, an API defines this access in source code, which is a relatively high-level, relatively hardware-independent, often human-readable format. A common aspect of an ABI is the calling convention, which determines how data is provided as input to or read as output from computational routines; examples are the x86 calling conventions.
-- https://en.wikipedia.org/wiki/Application_binary_interface
I am sure that the standard "Function Calling Sequence" described in Sys V ABI specs (both i386 and AMD64) constraints the calling of those extern functions in a C library, but does it constraints the calling of those static functions too?
Here is an example:
$cat abi.c
#include<stdio.h>
typedef void (*ret_function_t)(int,int);
ret_function_t gl_fp = NULL;
static void prnt(int i, int j){
printf("hi from static prnt:%d:%d\n", i, j);
}
void api_1(int i){
gl_fp = prnt;
printf("hi from extern api_1:%d\n", i);
}
ret_function_t api_2(void){
return gl_fp;
}
$cat abi_main.c
#include<stdio.h>
typedef void (*ret_function_t)(int,int);
extern void api_1(int i);
extern ret_function_t api_2(void);
int main(){
api_1(1111);
api_2()(2222, 3333);
}
$gcc abi_main.c abi.c -o abi_test
$./abi_test
hi from extern api_1:1111
hi from static prnt:2222:3333
The function calling sequence (including registers usage, stack frame, parameters passing, variable arguments...) details are defined in the Sys V ABI when abi_main.c call the api_1 and api_2 since they are extern, but what about the calling of the static function prnt which been defined in abi.c? Does it belong to the ABI standard or to the compiler to decide?
Yes, they do apply. Static functions are just plain functions with traslation-unit visibility. The ABI is a compiler generation task, C standard deliberately says nothing about it. It becomes clear when removing the static word from your code. The reasoning is the same. The drawback with this approach is that compiler cannot check the linkage right (caller-callee), but only its type (void (*ret_function_t)(int,int);) at compile time, since you are the one who links at runtime. So, it is not recommended.
What happens is that your compiler will generate code for any calling function, following some ABI, lets call it ABI-a. And it will generate code for
a function being called according to some other ABI, lets say ABI-b. If ABI-a == ABI-b, that always work, and this is the case if you compile both files with the same ABI.
For example, this works if prnt function were located at address 0x12345678:
ret_function_t gl_fp = (ret_function_t)0x12345678;
It also works as long as there is a function with the right arguments at 0x12345678. As you can see, the function cannot be inlined because the compiler does not know which function definition will end up in that memory spot, there could be many.

Check if a system implements a function

I'm creating a cross-system application. It uses, for example, the function itoa, which is implemented on some systems but not all. If I simply provide my own itoa implementation:
header.h:115:13: error: conflicting types for 'itoa'
extern void itoa(int, char[]);
In file included from header.h:2:0,
from file.c:2:0,
c:\path\to\mingw\include\stdlib.h:631:40: note: previous declaration of 'itoa' was here
_CRTIMP __cdecl __MINGW_NOTHROW char* itoa (int, char*, int);
I know I can check if macros are predefined and define them if not:
#ifndef _SOME_MACRO
#define _SOME_MACRO 45
#endif
Is there a way to check if a C function is pre-implemented, and if not, implement it? Or to simply un-implement a function?
Given you have already written your own implementation of itoa(), I would recommend that you rename it and use it everywhere. At least you are sure you will get the same behavior on all platforms, and avoid the linking issue.
Don't forget to explain your choice in the comments of your code...
I assume you are using GCC, as I can see MinGW in your path... there's one way the GNU linker can take care of this for you. So you don't know whether there is an itoa implementation or not. Try this:
Create a new file (without any headers) called my_itoa.c:
char *itoa (int, char *, int);
char *my_itoa (int a, char *b, int c)
{
return itoa(a, b, c);
}
Now create another file, impl_itoa.c. Here, write the implementation of itoa but add a weak alias:
char* __attribute__ ((weak)) itoa(int a, char *b, int c)
{
// implementation here
}
Compile all of the files, with impl_itoa.c at the end.
This way, if itoa is not available in the standard library, this one will be linked. You can be confident about it compiling whether or not it's available.
Ajay Brahmakshatriya's suggestion is a good one, but unfortunately MinGW doesn't support weak definition last I checked (see https://groups.google.com/forum/#!topic/mingwusers/44B4QMPo8lQ, for instance).
However, I believe weak references do work in MinGW. Take this minimal example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
__attribute__ ((weak)) char* itoa (int, char*, int);
char* my_itoa (int a, char* b, int c)
{
if(itoa != NULL) {
return itoa(a, b, c);
} else {
// toy implementation for demo purposes
// replace with your own implementation
strcpy(b, "no itoa");
return b;
}
}
int main()
{
char *str = malloc((sizeof(int)*3+1));
my_itoa(10, str, 10);
printf("str: %s\n", str);
return 0;
}
If the system provides an itoa implementation, that should be used and the output would be
str: 10
Otherwise, you'll get
str: no itoa
There are two really important related points worth making here along the "don't do it like this" lines:
Don't use atoi because it's not safe.
Don't use atoi because it's not a standard function, and there are good standard functions (such as snprintf) which are available to do what you want.
But, putting all this aside for one moment, I want to introduce you to autoconf, part of the GNU build system. autoconf is part of a very comprehensive, very portable set of tools which aim to make it easier to write code which can be built successfully on a wide range of target systems. Some would argue that autoconf is too complex a system to solve just the one problem you pose with just one library function, but as any program grows, it's likely to face more hurdles like this, and getting autoconf set up for your program now will put you in a much stronger position for the future.
Start with a file called Makefile.in which contains:
CFLAGS=--ansi --pedantic -Wall -W
program: program.o
program.o: program.c
clean:
rm -f program.o program
and a file called configure.ac which contains:
AC_PREREQ([2.69])
AC_INIT(program, 1.0)
AC_CONFIG_SRCDIR([program.c])
AC_CONFIG_HEADERS([config.h])
# Checks for programs.
AC_PROG_CC
# Checks for library functions.
AH_TEMPLATE([HAVE_ITOA], [Set to 1 if function atoi() is available.])
AC_CHECK_FUNC([itoa],
[AC_DEFINE([HAVE_ITOA], [1])]
)
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
and a file called program.c which contains:
#include <stdio.h>
#include "config.h"
#ifndef HAVE_ITOA
/*
* WARNING: This code is for demonstration purposes only. Your
* implementation must have a way of ensuring that the size of the string
* produced does not overflow the buffer provided.
*/
void itoa(int n, char* p) {
sprintf(p, "%d", n);
}
#endif
int main(void) {
char buffer[100];
itoa(10, buffer);
printf("Result: %s\n", buffer);
return 0;
}
Now run the following commands in turn:
autoheader: This generates a new file called config.h.in which we'll need later.
autoconf: This generates a configuration script called configure
./configure: This runs some tests, including checking that you have a working C compiler and, because we've asked it to, whether an itoa function is available. It writes its results into the file config.h for later.
make: This compiles and links the program.
./program: This finally runs the program.
During the ./configure step, you'll see quite a lot of output, including something like:
checking for itoa... no
In this case, you'll see that the config.h find contains the following lines:
/* Set to 1 if function atoi() is available. */
/* #undef HAVE_ITOA */
Alternatively, if you do have atoi available, you'll see:
checking for itoa... yes
and this in config.h:
/* Set to 1 if function atoi() is available. */
#define HAVE_ITOA 1
You'll see that the program can now read the config.h header and choose to define itoa if it's not present.
Yes, it's a long way round to solve your problem, but you've now started using a very powerful tool which can help you in a great number of ways.
Good luck!

Is there a way to overwrite the malloc/free function in C?

Is there a way to hook the malloc/free function call from a C application it self?
malloc() and free() are defined in the standard library; when linking code, the linker will search the library only for symbols that are not already resolved by eailier encountered object code, and object files generated from compilation are always linked before any libraries.
So you can override any library function simply by defining it in your own code, ensuring that it has the correct signature (same name, same number and types of parameters and same return type).
Yes you can. Here's an example program. It compiles and builds with gcc 4.8.2 but does not do anything useful since the implementations are not functional.
#include <stdlib.h>
int main()
{
int* ip = malloc(sizeof(int));
double* dp = malloc(sizeof(double));
free(ip);
free(dp);
}
void* malloc(size_t s)
{
return NULL;
}
void free(void* p)
{
}
Not sure if this counts as "overwriting', but you can effectively change the behavior of code that calls malloc and free by using a macro:
#define malloc(x) my_malloc(x)
#define free(x) my_free(x)
void * my_malloc(size_t nbytes)
{
/* Do your magic here! */
}
void my_free(void *p)
{
/* Do your magic here! */
}
int main(void)
{
int *p = malloc(sizeof(int) * 4); /* calls my_malloc */
free(p); /* calls my_free */
}
You may need LD_PRELOAD mechanism to replace malloc and free.
As many mentioned already, this is very platform specific. Most "portable" way is described in an accepted answer to this question. A port to non-posix platforms requires finding an appropriate replacement to dlsym.
Since you mention Linux/gcc, hooks for malloc would probably serve you the best.
Depending on the platform you are using, you may be able to remove the default malloc/free from the library and add your own using the linker or librarian tools. I'd suggest you only do this in a private area and make sure you can't corrupt the original library.
On the Windows platform there is a Detour library. It basically patches any given function on the assembler level. This allows interception of any C library or OS call, like CreateThread, HeapAlloc, etc. I used this library for overriding memory allocation functions in a working application.
This library is Windows specific. On other platforms most likely there are similar libraries.
C does not provide function overloading. So you cannot override.

Overriding C library functions, calling original

I am a bit puzzled on how and why this code works as it does. I have not actually encountered this in any project I've worked on, and I have not even thought of doing it myself.
override_getline.c:
#include <stdio.h>
#define OVERRIDE_GETLINE
#ifdef OVERRIDE_GETLINE
ssize_t getline(char **lineptr, size_t *n, FILE *stream)
{
printf("getline &lineptr=%p &n=%p &stream=%p\n", lineptr, n, stream);
return -1; // note: errno has undefined value
}
#endif
main.c:
#include <stdio.h>
int main()
{
char *buf = NULL;
size_t len = 0;
printf("Hello World! %zd\n", getline(&buf, &len, stdin));
return 0;
}
And finally, example compile and run command:
gcc main.c override_getline.c && ./a.out
With the OVERRIDE_GETLINE define, the custom function gets called, and if it is commented out, normal library function gets called, and both work as expected.
Questions
What is the correct term for this? "Overriding", "shadowing", something else?
Is this gcc-specific, or POSIX, or ANSI C, or even undefined in all?
Does it make any difference if function is ANSI C function or (like here) a POSIX function?
Where does the overriding function get called? By other .o files in the same linking, at least, and I presume .a files added to link command too. How about static or dynamic libs added with -l command line option of linker?
If it is possible, how do I call the library version of getline from the overriden getline?
The linker will search the files you provide on the command line first for symbols, before it searches in libraries. This means that as soon as it sees that getline has been defined, it will no longer look for another getline symbol. This is how linkers works on all platforms.
This of course has implications for your fifth point, in that there is no possibility to call the "original" getline, as your function is the original from the point of view of the linker.
For the fifth point, you may want to look at e.g. this old answer.
There's no standard way to have two functions of the same name in your program, but with some UNIX-like implementations (notably GNU libc) you might be able to get away with this:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
ssize_t getline(char **lineptr, size_t *n, FILE *stream)
{
ssize_t (*realfunc)(char**, size_t *, FILE*) =
(ssize_t(*)(char**, size_t *, FILE*))(dlsym (RTLD_NEXT, "getline"));
return realfunc(lineptr, n, stream);
}
You will need to link with -ldl for this.
What is happening here is that you are relying on the behaviour of the linker. The linker finds your implementation of getline before it sees the version in the standard library, so it links to your routine. So in effect you are overriding the function via the mechanism of link order. Of course other linkers may behave differently, and I believe the gcc linker may even complain about duplicate symbols if you specify appropriate command line switches.
In order to be able to call both your custom routine and the library routine you would typically resort to macros, e.g.
#ifdef OVERRIDE_GETLINE
#define GETLINE(l, n, s) my_getline(l, n, s)
#else
#define GETLINE(l, n, s) getline(l, n, s)
#endif
#ifdef OVERRIDE_GETLINE
ssize_t my_getline(char **lineptr, size_t *n, FILE *stream)
{
// ...
return getline(lineptr, n, stream);
}
#endif
Note that this requires your code to call getline as GETLINE, which is rather ugly.
What you see is expected behaviour if you linking with shared libraries. Linker will just assign it to your function, as it was first. It will also be correctly called from any other external libraries functions, - because linker will make your function exportable when it will scan linking libraries.
But - if you, say, have no external libraries that links to your function (so it isn't marked exportable, and isn't inserted to symbol table), and then dlopen() some library that want to use it during runtime - it will not find required function. Furthermore, if you first dlopen(RTLD_NOW|RTLD_GLOBAL) original library, every subsequent dlopen()'d library will use this library code, not yours. Your code (or any libraries that you've linked with during compilation phase, not runtime) will still stick with your function, no matter what.

Resources