Taking input without using libc - c

How would I take input or print output without using libc? I can use system calls, but didn't know if that would help.

There is no platform-independent way to do this. In fact, the whole point of having libc is to have a common interface to a set of functionality that most systems provide, but do so in fundamentally different ways.
Your best option would probably be to consult the documentation for whatever system you are currently using. You could look up your OS's set of interrupts and then try using the asm keyword to write assembly instructions that tell the OS to read input or display output. You could look into libraries provided by the OS for doing input and output on file descriptors, then use those functions instead. Or, you could look at process creation libraries, then spawn off a process to read or write data from the console, where the second program uses libc. None of these are guaranteed to be at all portable, though.
Hope this helps!

I stepped through write(2) with gdb to get an idea of how the system call ABI works.
Anyway, no libc at all. Note that without special tricks, the cc(1) compiler/linker front-end will still link you with libc, but you won't be using it for anything. The C runtime start-up code will make some libc calls, but this program won't.
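long mywrite(int fd, const void *b, unsigned long c) {
    long ret;
    /* x86-64 Linux: rax = 1 (write); arguments go in rdi, rsi, rdx */
    asm volatile("syscall" : "=a"(ret) : "a"(1L), "D"((long)fd), "S"(b), "d"(c)
                 : "rcx", "r11", "memory");
    return ret;
}
int main(void) { const char *s = "Hello world.\n"; return mywrite(1, s, 13), 0; }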

Related

Getting printf in assembly with only system calls?

I am looking to understand the printf() statement at the assembly level. However, most assembly programs call an external print function whose dependency is met by some other object file that the linker adds on. I would like to know what is inside that print function in terms of system calls and very basic assembly code. I want a piece of assembly code for printf where the only external calls are the system calls. I'm thinking of something like a disassembled object file. Where can I get something like that?
I would suggest instead to stay first at the C level, and study the source code of some existing C standard library free software implementation on Linux. Look into the source code of musl-libc or of GNU libc (a.k.a. glibc). You'll understand that several intermediate (usually internal) functions are useful between printf and the basic system calls (listed in syscalls(2) ...). Use also strace(1) on a sample C program doing printf (e.g. the usual hello-world example).
In particular, musl-libc has a very readable stdio/printf.c implementation, but you'll need to follow several other C functions there before reaching the write(2) syscall. Notice that some buffering is involved. See also setvbuf(3) & fflush(3). Several answers (e.g. this and that one) explain the chain between functions like printf and system calls (up to kernel code).
I want a piece of assembly code where the only external calls are the system calls, for printf
If you want exactly that, you might start from musl-libc's stdio/printf.c and keep adding source files from musl-libc until no undefined external symbols remain. Compile all of them with gcc -flto -O2, and perhaps also -S. You will probably end up with a significant part of musl-libc in object (or assembly) form, because printf may call malloc and many other functions. I'm not sure it is worth the pain.
You could also statically link your libc (e.g. libc.a). Then the linker will link only the static library members needed by printf (and any other function you are calling).
To be picky, system calls are not actually external calls (your libc write function is actually a tiny wrapper around the raw system call). You could make them using SYSENTER machine instructions (but using vdso(7) is preferable: more portable, and perhaps quicker), and you don't even need a valid stack pointer (on x86_64) to make a system call.
You can write Linux user-level programs without even using the libc; the bones implementation of Scheme is such a program (and you'll find others).
The function printf() is in the standard C library, so it is linked into your program and not copied into it. Dynamically linked libraries save memory because you don't have the exact same code copied in resident memory for every program that uses it.
Think about what printf() does. Interpreting the formatted string and generating the correct output is fairly complex. The series of functions that printf() belongs to also buffers the output. You probably don't really want to re-implement all of this in assembly. The standard C library is omnipresent, and probably available for you.
Maybe you're looking for write(2), which is the system call for unbuffered writes of just bytes to a file descriptor. You'd have to generate the string to print beforehand and format it yourself. (See also open(2) for opening files.)
To disassemble a binary, you can use objdump:
objdump -d binary
where binary is some compiled binary. This gives opcodes and human readable instructions. You probably want to redirect to a file and read elsewhere.
You can disassemble the standard C binary on your system and try to interpret it if you want (strongly not recommended). The problem is that it will be far too complex to understand. Things like printf() were written in C, then compiled and assembled. You can't (within a reasonable number of decades) restore the high level structure from the assembly of a compiled (non-trivial) program. If you really want to try this, good luck.
An easier thing to do is to look at the C source code for printf() itself. The real work is actually done in vfprintf() which is in stdio-common/vfprintf.c of the GNU C library source code.

C Linux Tracing all function calls including function inside library

I have a program like
#include <stdlib.h>
int main(void)
{
char *ptr = malloc(2);
free(ptr);
return 0;
}
So I just want to trace all the function calls happening inside the program, down to the system calls,
like
malloc
|____ libc( sme_fn)
|
|__sme_system_call
Could you please tell some way to get this ?
The calls your program makes come in two flavors:
Calls directly to the operating system ("open", "close", "fork", "exec", "exit", etc.)
Standard C runtime functions for the platform ("printf()", "malloc()", "free()", etc.)
You can view the former with "strace".
You can view (at least calls into) the latter with gdb.
You can look at the complete implementation, and all internals, directly from the source code:
Gnu C Library
Linux kernel
Finally, if you're having issues with "malloc()", "valgrind" is (one of several) very, very useful tools to consider.
If you're using gcc, compile with -pg and then use the gprof command.
Or, if you're on Linux, you can use oprofile to do something similar without recompiling.
Both tools should give you call graphs, which is what you're looking for.

What is the need for C startup routine?

Quoting from one of the unix programming books,
When a C program is executed by the kernel, by one of the exec functions, a special start-up routine is called before the main function is called. The executable program file specifies this routine as the starting address for the program; this is set up by the link editor when it is invoked by the C compiler. This start-up routine takes values from the kernel (the command-line arguments and the environment) and sets things up so that the main function is called as shown earlier.
Why do we need a middleman start-up routine? The exec function could have called the main function straight away, and the kernel could have passed the command-line arguments and environment directly to main. Why do we need the start-up routine in between?
Because C has no concept of "plug in". So if you want to use, say, malloc() someone has to initialize the necessary data structures. The C programmers were lazy and didn't want to have to write code like this all the time:
main() {
initialize_malloc();
initialize_stdio();
initialize_...();
initialize_...();
initialize_...();
initialize_...();
initialize_...();
... oh wow, can we start already? ...
}
So the C compiler figures out what needs to be done, generates the necessary code and sets up everything so you can start with your code right away.
The start-up routine initializes the CRT (i.e. creates the CRT heap so that malloc/free work, initializes standard I/O streams, etc.); in case of C++ it also calls the globals' constructors. There may be other system-specific setup, you should check the sources of your run-time library for more details.
Calling main() is a C thing, while calling _start() is a kernel thing, indicated by the entry point in the binary format header. (for clarity: the kernel doesn't want or need to know that we call it _start)
If you would have a non-C binary, you might not have a main() function, you might not even have the concept of a "function" at all.
So the actual question would be: why doesn't a compiler give the address of main() as a starting point? That's because typical libc implementations want to do some initializations before really starting the program, see the other answers for that.
Edit: as an example, you can change the entry point like this:
$ cat entrypoint.c
#include <stdio.h>
#include <stdlib.h>
int blabla() { printf("Yes it works!\n"); exit(0); }
int main() { printf("not called\n"); }
$ gcc entrypoint.c -e blabla
$ ./a.out
Yes it works!
It is also important to know that an application program executes in user mode, and any system call it makes switches the CPU into privileged (kernel) mode. This improves OS security by preventing user code from running kernel-level code directly. So when a call to printf eventually issues write(2), the CPU traps into the kernel, runs the kernel code, then returns to user mode and back to your application.
The CRT is what lets you use the languages you want on Windows and Linux. It provides some very fundamental bootstrapping into the OS and gives you the feature set you develop against.

Is there a way to access debug symbols at run time?

Here's some example code to give an idea of what I want.
int regular_function(void)
{
int x,y,z;
/** do some stuff **/
my_api_call();
return x;
}
...
void my_api_call(void)
{
char* caller = get_caller_file();
int line = get_caller_line();
printf("I was called from %s:%d\n", caller, line);
}
Is there a way to implement the get_caller_file() and get_caller_line()? I've seen/used tricks like #defineing my_api_call as a function call passing in the __FILE__ and __LINE__ macros. I was wondering if there was a way to access that information (assuming it's present) at run time instead of compile time? Wouldn't something like Valgrind have to do something like this in order to get the information it returns?
If you have compiled your binary with debug symbols, you may access it using special libraries, like libdwarf for DWARF debug format.
This is highly environment-specific. In most Windows and Linux implementations where debug symbols are provided, the tool vendor provides or documents a way of doing that. For a better answer, provide implementation specifics.
Debugging symbols, if available, need to be stored somewhere so the debugger can get at them. They may or may not be stored in the executable file itself.
You may or may not know the executable file name (argv[0] is not required to have the full path of the program name, or indeed have any useful information in it - see here for details).
Even if you could locate the debugging symbols, you would have to decode them to try and figure out where you were called from.
And your code may be optimised to the point where the information is useless.
That's the long answer. The short answer is that you should probably rely on passing in __FILE__ and __LINE__ as you have been. It's far more portable and reliable.

LINUX: Is it possible to write a working program that does not rely on the libc library?

I wonder if I could write a program in the C-programming language that is executable, albeit not using a single library call, e.g. not even exit()?
If so, it obviously wouldn't depend on libraries (libc, ld-linux) at all.
I suspect you could write such a thing, but it would need to have an endless loop at the end, because you can't ask the operating system to exit your process. And you couldn't do anything useful.
Well start with compiling an ELF program, look into the ELF spec and craft together the header, the program segments and the other parts you need for a program. The kernel would load your code and jump to some initial address. You could place an endless loop there. But without knowing some assembler, that's hopeless from the start on anyway.
The start.S file as used by glibc may be useful as a start point. Try to change it so that you can assemble a stand-alone executable out of it. That start.S file is the entry point of all ELF applications, and is the one that calls __libc_start_main which in turn calls main. You just change it so it fits your needs.
Ok, that was theoretical. But now, what practical use does that have?
Answer to the Updated Question
Well. There is a library called libgloss that provides a minimal interface for programs that are meant to run on embedded systems. The newlib C library uses that one as its system-call interface. The general idea is that libgloss is the layer between the C library and the operating system. As such, it also contains the startup files that the operating system jumps into. Both these libraries are part of the GNU binutils project. I've used them to do the interface for another OS and another processor, but there does not seem to be a libgloss port for Linux, so if you call system calls, you will have to do it on your own, as others already stated.
It is absolutely possible to write such programs in the C programming language. The Linux kernel is a good example of such a program. User programs are possible too. What is minimally required, though, is a runtime library (if you want to do any serious stuff). It would contain really basic functions like memcpy, basic macros and so on. The C Standard has a special conformance mode called freestanding, which requires only a very limited set of functionality, suitable also for kernels. Actually, I have no clue about x86 assembler, but I've tried my luck with a very simple C program:
/* gcc -nostdlib start.c */
int main(int, char**, char**);
void _start(int args)
{
/* we do not care about arguments for main. start.S in
* glibc documents how the kernel passes them though.
*/
int c = main(0,0,0);
/* do the system-call for exit. */
asm("movl %0,%%ebx\n" /* first argument */
"movl $1,%%eax\n" /* syscall 1 */
"int $0x80" /* fire interrupt */
: : "r"(c) :"%eax", "%ebx");
}
int main(int argc, char** argv, char** env) {
/* yeah here we can do some stuff */
return 42;
}
We're happy, it actually compiles and runs :)
Yes, it is possible, however you will have to make system calls and set up your entry point manually.
Example of a minimal program with entry point:
.globl _start
.text
_start:
xorl %eax,%eax
incl %eax
movb $42, %bl
int $0x80
Or in plain C (no exit):
void __attribute__((noreturn)) _start() {
while(1);
}
Compiled with:
gcc -nostdlib -o example example.s
gcc -nostdlib -o example example.c
In pure C? As others have said you still need a way to make syscalls, so you might need to drop down to inline asm for that. That said, if using gcc check out -ffreestanding.
You'd need a way to prevent the C compiler from generating code that depends on libc, which with gcc can be done with -fno-hosted. And you'd need one assembly language routine to implement syscall(2). These are not hard to write if you can get suitable OS documentation. After that you'd be off to the races.
Well, you would need to use some system calls to load all its information into memory, so I doubt it.
And you would almost have to use exit(), just because of the way that Linux works.
Yes you can, but it's pretty tricky.
There is essentially absolutely no point.
You can statically link a program, but then the appropriate pieces of the C library are included in its binary (so it doesn't have any dependencies).
You can completely do without the C library, in which case you need to make system calls using the appropriate low-level interface, which is architecture dependent, and not necessarily int 0x80.
If your goal is making a very small self-contained binary, you might be better off static-linking against something like uclibc.

Resources