How do I make syscalls from my C program

How do I make system calls from my C program? For example, how do I call the following function? What headers would I have to include?
asmlinkage long sys_exit(int error_code);

You would normally call the C library wrappers for system calls (for example, open() and read() are just wrappers). The wrappers are friendlier to use.
As an alternative to doing the work yourself in assembly, you can try the syscall(2) function from glibc. This function makes a system call without a dedicated wrapper, and is especially useful when invoking a system call that has no wrapper function. This way you only need to provide the symbolic constant for the system call number, and I think it is also more portable than coding the system call in assembler instructions.
Example from the doc:
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <signal.h>
int
main(int argc, char *argv[])
{
pid_t tid;
tid = syscall(SYS_gettid);
/* tgkill takes (tgid, tid, sig); this sends SIGHUP to the calling thread */
syscall(SYS_tgkill, getpid(), tid, SIGHUP);
}
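The man page example ends by signalling itself, so it is awkward to experiment with. As a gentler illustration of the same syscall(2) interface, here is a minimal sketch that shows how errors are reported: syscall() returns -1 and stores the error code in errno. The helper name close_via_syscall is hypothetical and exists only for this illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: closes fd through syscall(2) instead of close().
 * Returns 0 on success, or the errno value reported for the failed call. */
static int close_via_syscall(int fd)
{
    if (syscall(SYS_close, fd) == -1)
        return errno;   /* syscall() already stored the error code */
    return 0;
}
```

Calling close_via_syscall(-1) should report EBADF, the same error close(-1) would produce.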

asm volatile(
"xorq %%rdi, %%rdi;" /* exit status (error_code) = 0 */
"movq $60, %%rax;" /* syscall number, see asm/unistd_64.h */
"syscall;"
::: "rdi", "rax"
);
You can't call it from pure C; you need to invoke it from assembly, as any wrapper such as glibc's would.
Another way is using int 80h, but that's rather outdated.
In rdi you put error_code (0 in this case), while rax holds the number that identifies the system call. The numbers are listed in /usr/include/asm/unistd.h, which in turn points you to the 32-bit or 64-bit version.
#define __NR_exit 60
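The exit call above never returns, so it hides two details that matter for other system calls: the result comes back in rax, and the syscall instruction overwrites rcx and r11. A minimal sketch of a one-argument wrapper for x86-64 Linux follows. The name raw_syscall1 is hypothetical, and the fallback to glibc's syscall() on other architectures is an assumption added so the code still builds there.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a one-argument system call.
 * On x86-64 Linux the number goes in rax, the first argument in rdi,
 * and the result comes back in rax (a negative errno value on failure). */
static long raw_syscall1(long nr, long arg1)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)               /* result in rax */
                     : "a"(nr), "D"(arg1)      /* rax = number, rdi = arg1 */
                     : "rcx", "r11", "memory"  /* clobbered by syscall */);
    return ret;
#else
    /* Assumption: fall back to the libc wrapper elsewhere. */
    return syscall(nr, arg1);
#endif
}
```

With this helper, raw_syscall1(SYS_exit, 0) would correspond to the assembly shown above, and raw_syscall1(SYS_umask, 022) returns the previous file mode mask.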

Related

Why do some Linux system calls not have a wrapper, but are documented as if they do?

Let's look at the gettid system call as an example:
http://man7.org/linux/man-pages/man2/gettid.2.html
I know gettid is not implemented in libc and I need to make a system call directly in order to use it (syscall(SYS_gettid)). I have verified this myself with this C code:
#include <stdio.h>
#include <sys/types.h>
int main(){
pid_t a = gettid();
return 0;
}
which doesn't link and gives this warning when compiling: warning: implicit declaration of function 'gettid'; did you mean 'getline'.
Now my question is, why has the Linux documentation documented it as if this function actually exists?
SYNOPSIS
#include <sys/types.h>
pid_t gettid(void);
They have no example of how to make a direct system call and instead they have the above code snippet which doesn't exist and can't be used. Is there something I'm missing?
The syscall doesn't have a wrapper in the GNU C library (before 2.30); this is just a prototype of how the function would look if it did.
As noted in the man page:
NOTES
Glibc does not provide a wrapper for this system call; call it using syscall(2).
Here's an example of the gettid wrapper:
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
pid_t gettid(void)
{
pid_t tid = (pid_t)syscall(SYS_gettid);
return tid;
}
As you can see, this is the same prototype as described in the man-page. The prototype in the man-page is just for reference, so you can create a wrapper around the system call if you (or the libc developers) so choose.
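Since glibc 2.30 and later declare gettid() themselves, one way to keep a hand-written wrapper independent of the library version is to give it a different name. The sketch below assumes that approach; my_gettid is a hypothetical name, not something glibc provides.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper with its own name, so it does not depend on
 * whether the installed glibc already declares gettid(). */
static pid_t my_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In a single-threaded program the thread ID equals the process ID, so my_gettid() and getpid() should return the same value in the main thread.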
If you're just starting to learn C, I suggest you stop trying to understand system calls and their wrappers in the C library until you have more experience in the language. The difference will then be clear.

Correct inline assembly code for sys_uname

I have written inline assembly code for the system call sys_uname, but it doesn't seem to be correct.
#include <sys/utsname.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscalls.h>
#include <string.h>
struct utsname stroj;
__asm__ volatile("pushl $0;"
"pushl %%ebx;"
"pushl $0;"
"int $0x80;"
:
: "a" (SYS_uname), "b" (&stroj)
);
//uname(&stroj); -> when I do this it works, but again, I want to use inline assembly
write(1, stroj.nodename, strlen(stroj.nodename)); // 1 = stdout
Is there some glaring problem that I am not addressing? This write prints out nothing, literally "".
This answer assumes there is a reason why you wish to use system calls directly rather than through C library functions.
A version of the inline assembly that would be correct could look like:
#include <sys/utsname.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <string.h>
#include <unistd.h>
/* SYS_uname has the value 164 */
/* #define SYS_uname 164 */
#define SYS_uname SYS_freebsd4_uname
int main()
{
u_int syscallnum = SYS_uname;
struct utsname stroj;
asm("push %[stroj]\n\t"
"push %%eax\n\t" /* Required dummy value for int 0x80 */
"int $0x80\n\t"
"add $8, %%esp" /* 2*4 bytes removed from stack */
: "+a"(syscallnum) /* error code also returned in syscallnum */
: [stroj]"r"(&stroj)
: "memory");
write(1, stroj.nodename, strlen(stroj.nodename));
return 0;
}
With FreeBSD 32-bit system calls, the parameters are pushed on the stack in reverse order. A dummy value (any value) has to be pushed on the stack before issuing int $0x80. You need to adjust the stack pointer ESP after the system call. Any registers that may change need to be dealt with as well. int $0x80 will return an error code in EAX. The code above returns that value back in the syscallnum variable. If you modify a register in inline assembly and don't let GCC know, it can cause undefined behaviour that is often hard to hunt down.
If you pass addresses via registers, you will need to add memory operands (even if they are dummies) to specify that the data at the pointer in the registers is being read and/or written. Alternatively, you can specify the memory clobber, which may be easier to understand although it is a more brute-force approach.
GCC's inline assembly is powerful but difficult to get right and can cause unexpected behaviour if you get it wrong. You should only use inline assembly as a last resort. FreeBSD has a syscall function that can be used to call most system calls.
You could have written the inline assembly above as:
asm(
"push %[stroj]\n\t"
"push %%eax\n\t" /* Required dummy value for int 0x80 */
"int $0x80\n\t"
"add $8, %%esp" /* 2*4 bytes removed from stack */
: "+a"(syscallnum), /* error code also returned in syscallnum */
"=m"(stroj)
: [stroj]"r"(&stroj));
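For comparison only, here is a sketch of the same dummy-memory-operand idea on x86-64 Linux rather than 32-bit FreeBSD. It is not part of the FreeBSD answer: the helper name raw_uname is hypothetical, and the fallback to syscall() on other architectures is an assumption. The "=m"(*buf) output tells GCC that the structure behind the pointer in rdi is written by the kernel, so no blanket memory clobber is needed.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Hypothetical helper: call uname on Linux without the libc wrapper.
 * Returns 0 on success; on x86-64 a failure is a negative errno value. */
static long raw_uname(struct utsname *buf)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret), "=m"(*buf)          /* kernel writes *buf */
                     : "a"((long)SYS_uname), "D"(buf) /* rax = nr, rdi = buf */
                     : "rcx", "r11");                 /* clobbered by syscall */
    return ret;
#else
    /* Assumption: use the generic libc entry point elsewhere. */
    return syscall(SYS_uname, buf);
#endif
}
```

After a successful call, buf->nodename holds the same host name that uname(2) would report.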
FreeBSD 2+ doesn't support obsolete SYS_uname
If you try to run the code above you will discover it doesn't return anything. If you use the program TRUSS with a command like truss ./progname you should see something like this in the output:
obs_uname(0xffffc6f8,0x0,0x0,0x0,0x0,0x0) ERR#78 'Function not implemented'
This is because FreeBSD 2+ doesn't support the SYS_uname system call, which is now considered obsolete. FreeBSD's libc uname makes calls to SYS___sysctl to populate the fields of the utsname structure. From the command line you can query the nodename using:
sysctl kern.hostname
You can call sysctl through a system call this way:
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sysctl.h>
#define OSNAME_MAX_LEN 256
/* SYS___sysctl has the value 202 */
/* #define SYS___sysctl 202 */
int main(void)
{
char osname[OSNAME_MAX_LEN];
size_t osnamelen = sizeof(osname) - 1;
int name[] = {CTL_KERN, KERN_HOSTNAME};
u_int namelen = sizeof(name) / sizeof(name[0]);
char * old = osname;
size_t * oldlenp = &osnamelen;
u_int syscallnum = SYS___sysctl;
asm("push %[newlen]\n\t"
"push %[new]\n\t"
"push %[oldlenp]\n\t"
"push %[old]\n\t"
"push %[namelen]\n\t"
"push %[name]\n\t"
"push %%eax\n\t" /* Required dummy value */
"int $0x80\n\t"
"add $28, %%esp" /* 7*4=28 bytes to remove from stack */
: "+a"(syscallnum) /* error code also returned in syscallnum */
: [name]"r"(name), [namelen]"r"(namelen),
[old]"r"(old) , [oldlenp]"r"(oldlenp),
[new]"i"(NULL), [newlen]"i"(0)
: "memory");
if (syscallnum) {
return EXIT_FAILURE;
}
osname[osnamelen]='\0'; /* Ensure the OS Name is Null terminated */
printf("This machine's node name is %s\n", osname);
return EXIT_SUCCESS;
}
When inline assembly adjusts ESP (push etc) it can cause memory operands generated by GCC and passed via a constraint to point at the wrong memory locations. This is especially true if any of the data is placed on the stack. To avoid this problem it is easiest to pass the addresses via registers.
Using the syscall function rather than inline assembly it could have also been written this way:
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sysctl.h>
#define OSNAME_MAX_LEN 256
/* SYS___sysctl has the value 202 */
/* #define SYS___sysctl 202 */
int main(void)
{
char osname[OSNAME_MAX_LEN];
size_t osnamelen = sizeof(osname) - 1;
int name[] = {CTL_KERN, KERN_HOSTNAME};
u_int namelen = sizeof(name) / sizeof(name[0]);
char * old = osname;
size_t * oldlenp = &osnamelen;
if (syscall(SYS___sysctl, name, namelen, old, oldlenp, NULL, 0) == -1) {
perror("sysctl");
return EXIT_FAILURE;
}
osname[osnamelen]='\0'; /* Ensure the OS Name is Null terminated */
printf("This machine's node name is %s\n", osname);
return EXIT_SUCCESS;
}

How to implement another variation of clone(2) syscall in linux kernel?

I'm trying to create another version of the clone(2) syscall (in kernel space) that creates a clone of a user process with some additional parameters. This system call will do exactly the same job as clone(2), but I want to pass one additional parameter to the kernel from user space. However, when I look at glibc's code,
it seems that the parameters are not all passed in the same order as in the user's call to clone()
int clone(int (*fn)(void *), void *child_stack,
int flags, void *arg, ...
/* pid_t *ptid, void *newtls, pid_t *ctid */ );
rather, some of them are handled by glibc's code itself. I searched the internet to learn how glibc's clone() works but couldn't find any good documentation.
Can anyone please explain:
How does glibc handle clone()?
The parameters of the syscall in the kernel are not exactly the same as those of clone in glibc, so how are these variations handled?
How does glibc handle clone()?
Via arch-specific assembly wrappers. For i386, see sysdeps/unix/sysv/linux/i386/clone.S in the glibc sources; for x86-64, see sysdeps/unix/sysv/linux/x86-64/clone.S, and so on.
The normal syscall wrappers are not sufficient, because it is up to the userspace to switch stacks. The above assembly files have pretty informative comments as to what actually needs to be done in userspace in addition to the syscall.
The parameters of the syscall in the kernel are not exactly the same as those of clone in glibc, so how are these variations handled?
C library functions that map to a syscall are wrapper functions.
Consider, for example, the POSIX.1 write() C library low-level I/O function, and the Linux write() syscall. The parameters are basically the same, as are the error conditions, but the error return values differ. The C library function returns -1 with errno set if an error occurs, whereas the Linux syscall returns negative error codes (which basically match errno values).
If you look at e.g. sysdeps/unix/sysv/linux/x86_64/sysdep.h, you can see that the basic syscall wrapper for Linux on x86-64 boils down to
# define INLINE_SYSCALL(name, nr, args...) \
({ \
unsigned long int resultvar = INTERNAL_SYSCALL (name, , nr, args); \
if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (resultvar, ))) \
{ \
__set_errno (INTERNAL_SYSCALL_ERRNO (resultvar, )); \
resultvar = (unsigned long int) -1; \
} \
(long int) resultvar; })
which just calls the actual syscall, then checks if the syscall return value indicated an error; and if it does, changes the result to -1 and sets errno accordingly. It's just funny-looking, because it relies on GCC extensions to make it behave as a single statement.
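Stripped of the macro machinery, the conversion that INLINE_SYSCALL performs can be sketched as an ordinary function. The name to_libc_result is hypothetical. The range check reflects the Linux convention that raw return values from -4095 to -1 are error codes.

```c
#include <errno.h>

/* Hypothetical helper mirroring the wrapper logic: turn a raw kernel
 * return value into the libc convention (-1 with errno set). */
static long to_libc_result(long raw)
{
    /* Linux reserves -4095..-1 for negated errno values. */
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;   /* success: pass the value through unchanged */
}
```

For example, a raw result of -EBADF becomes -1 with errno set to EBADF, while a raw result of 7 stays 7.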
Let's say you added a new syscall to Linux, say
SYSCALL_DEFINE2(splork, unsigned long, arg1, void *, arg2);
and, for whatever reasons, you wish to expose it to userspace as
int splork(void *arg2, unsigned long arg1);
No problem! All you need is to provide a minimal header file,
#ifndef _SPLORK_H
#define _SPLORK_H
#define _GNU_SOURCE
#include <unistd.h> /* declares syscall() */
#include <sys/syscall.h>
#include <errno.h>
#ifndef __NR_splork
#if defined(__x86_64__)
#define __NR_splork /* syscall number on x86-64 */
#else
#if defined(__i386)
#define __NR_splork /* syscall number on i386 */
#endif
#endif
#endif /* !__NR_splork */
#ifdef __NR_splork
#ifndef SYS_splork
#define SYS_splork __NR_splork
#endif
int splork(void *arg2, unsigned long arg1)
{
long retval;
/* glibc's syscall() has already turned a kernel error into -1
and stored the error code in errno, so errno is left alone here. */
retval = syscall(__NR_splork, (long)arg1, (void *)arg2);
if (retval == -1)
return -1;
return (int)retval;
}
#else
#undef SYS_splork
int splork(void *arg2, unsigned long arg1)
{
/* Note: For backward compatibility, we might wish to use
*(__errno_location()) = ENOTSUP;
here. */
errno = ENOTSUP;
return -1;
}
#endif
#endif /* _SPLORK_H */
The SYS_splork and __NR_splork are preprocessor macros defining the syscall number for the new syscall. Since the syscall number is likely not (yet?) included in the official kernel sources and headers, the above header file explicitly declares it for each supported architecture. For architectures where it is not supported, the splork() function will always return -1 with errno == ENOTSUP.
Note, however, that Linux syscalls are limited to 6 parameters. If your kernel function needs more, you need to pack the parameters into a structure, pass the address of that structure to the kernel, and use copy_from_user() to copy the values to the same structure in-kernel.
In all Linux architectures, pointers and long are of the same size (int may be smaller than pointer), so I recommend you use either long or fixed-size types in such structures to pass data to/from the kernel.
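On the user-space side, the structure-passing approach might look like the sketch below. Everything in it is hypothetical: the struct name splork_args, its three fields, and the helper make_splork_args are illustrations, not part of any real interface. It uses fixed-size fields so the layout is the same for 32-bit and 64-bit callers; the kernel side would copy the structure in with copy_from_user() as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block: fixed-size fields only, so the layout
 * does not depend on the sizes of int, long, or pointers. */
struct splork_args {
    uint64_t flags;
    uint64_t buffer;   /* user-space pointer stored as an integer */
    uint64_t length;   /* size of the buffer in bytes */
};

/* Hypothetical helper that fills the block before it is handed to the
 * kernel, e.g. via syscall(SYS_splork, &args). */
static struct splork_args make_splork_args(void *buf, size_t len,
                                           unsigned long flags)
{
    struct splork_args args;
    args.flags  = (uint64_t)flags;
    args.buffer = (uint64_t)(uintptr_t)buf;
    args.length = (uint64_t)len;
    return args;
}
```

With three 64-bit fields the structure is 24 bytes with no padding, which keeps the kernel-side copy straightforward.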
It is possible to use the clone syscall with almost no assembler.
The problem is not the stack switching, which is done by the kernel as part of the system call, but more likely the glibc syscall() wrapper.
Using these bare-bones wrappers instead:
long _x64_syscall0(long n) {
long ret;
__asm__ __volatile__("syscall" : "=a"(ret) : "a"(n) : "rcx", "r11", "memory");
return ret;
}
long _x64_syscall5(long n, long a1, long a2, long a3, long a4, long a5) {
long ret;
register long r10 __asm__("r10") = a4;
register long r8 __asm__("r8") = a5;
__asm__ __volatile__("syscall"
: "=a"(ret)
: "a"(n), "D"(a1), "S"(a2), "d"(a3), "r"(r10), "r"(r8)
: "rcx", "r11", "memory");
return ret;
}
A sketch of clone usage would look like:
int ret = _x64_syscall5(56 /* clone */ , CLONE_VM | SIGCHLD,
(long long)new_stack, 0, 0, 0);
if (ret == 0) {
// we are the child
ChildFunc();
_x64_syscall0(60 /* exit */ );
} else {
// we are the parent
}

Sleep | warning implicit declaration of function `sleep'?

I am learning C.
In this program
I use the sleep function to slow down a countdown.
My textbook doesn't specify a library I should include to use the sleep function.
So I use it without including any special library for it and it works.
But it gives me this warning message in codeblocks.
I tried to include <windows.h> but still the same warning message appears.
warning D:\Project\C language\trial8\trial8.c|19|warning: implicit
declaration of function `sleep'|
And here is my code.
#include <stdio.h>
int main()
{
int start;
do
{
printf("Please enter the number to start\n");
printf("the countdown (1 to 100):");
scanf("%d",&start);
}
while(start<1 || start>100);
do
{
printf("T-minus %d\n",start);
start--;
sleep(3000);
}
while(start>0);
printf("Zero!\n Go!\n");
return(0);
}
I want to know what the warning message means. How important is it? Is there anything that I should do about it? Note that the program works anyway.
The issue is in the libraries (header files):
on Windows:
#include <windows.h> and Sleep(1000); => 1000 milliseconds
on Linux:
#include <unistd.h> and sleep(1); => 1 second
The function sleep is not part of the C programming language, so the C compiler needs a declaration/prototype of it to know the number of arguments, their data types, and the return type of the function. When it doesn't find one, it creates an implicit declaration of that function, which assumes the function returns int.
On Linux, sleep has a prototype in <unistd.h>, and on Windows there is another function, Sleep, which has a prototype in <windows.h> or <synchapi.h>.
You can always get away without including the header if you explicitly supply the prototype of the function before using it. This is useful when you need only a few functions from a header file.
The prototype of the Sleep function on Windows is:
VOID WINAPI Sleep(_In_ DWORD dwMilliseconds);
Remember, it is always good practice to supply the prototype of the function being used, either by including the appropriate header file or by writing it explicitly. If you don't supply it, the compiler will usually just issue a warning and make an assumption, which in most cases will not be what you want. It is better to include the header file, as the API might change in future versions of the library.
Windows doesn't have the sleep function. Instead, it has Sleep, which takes the number of milliseconds to sleep:
VOID WINAPI Sleep(
_In_ DWORD dwMilliseconds
);
You'll need to either #include <windows.h> or #include <synchapi.h>, depending on the version of Windows you're running. See MSDN for more details.
Update in 2022:
As stated on the Linux man page, we need to include unistd.h. This works on Linux and other POSIX systems, but the header is not available with every Windows toolchain.
#include <stdio.h>
#include <unistd.h>
int main()
{
sleep(1); /* sleep for 1 second*/
printf("END\n");
return 0;
}
To make it more cross-platform, try this:
#ifdef _WIN32
#include <Windows.h>
#else
#include <unistd.h>
#endif
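The conditional include only solves half of the problem, because the two functions differ in name and in units: Sleep() takes milliseconds, sleep() takes seconds. A minimal sketch of a helper that hides both differences follows; the name sleep_seconds is hypothetical.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: pause for the given number of seconds.
 * Returns 0 if the full time elapsed. */
static unsigned int sleep_seconds(unsigned int seconds)
{
#ifdef _WIN32
    Sleep(seconds * 1000u);   /* Sleep() takes milliseconds, returns nothing */
    return 0;
#else
    return sleep(seconds);    /* sleep() returns the seconds left if interrupted */
#endif
}
```

In the countdown program from the question, sleep_seconds(3) would pause for three seconds on either platform, whereas sleep(3000) on a POSIX system asks for 3000 seconds.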

How to load Linux kernel modules from C code?

I have an application that has both two external kernel modules and a userspace daemon. I want to load the modules from the daemon code, written in C, at startup, and unload them on clean exit. Can I load them in a cleaner way than doing system("modprobe module"); and unload them using the corresponding rmmod?
init_module / delete_module minimal runnable example
Tested on a QEMU + Buildroot VM and Ubuntu 16.04 host with a simple parameter printer module.
We use the init_module / finit_module and delete_module Linux system calls.
The Linux kernel offers two system calls for module insertion:
init_module
finit_module
and:
man init_module
documents that:
The finit_module() system call is like init_module(), but reads the module to be loaded from the file descriptor fd. It is useful when the authenticity of a kernel module can be determined from its location in the filesystem; in cases where that is possible, the overhead of using cryptographically signed modules to determine the authenticity of a module can be avoided. The param_values argument is as for init_module().
finit_module is newer and was added only in Linux v3.8. More rationale: https://lwn.net/Articles/519010/
glibc does not seem to provide a C wrapper for them, so we just create our own with syscall.
insmod.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#define init_module(module_image, len, param_values) syscall(__NR_init_module, module_image, len, param_values)
#define finit_module(fd, param_values, flags) syscall(__NR_finit_module, fd, param_values, flags)
int main(int argc, char **argv) {
const char *params;
int fd, use_finit;
size_t image_size;
struct stat st;
void *image;
/* CLI handling. */
if (argc < 2) {
puts("Usage: ./prog mymodule.ko [args=\"\" [use_finit=0]]");
return EXIT_FAILURE;
}
if (argc < 3) {
params = "";
} else {
params = argv[2];
}
if (argc < 4) {
use_finit = 0;
} else {
use_finit = (argv[3][0] != '0');
}
/* Action. */
fd = open(argv[1], O_RDONLY);
if (fd < 0) {
perror("open");
return EXIT_FAILURE;
}
if (use_finit) {
puts("finit");
if (finit_module(fd, params, 0) != 0) {
perror("finit_module");
return EXIT_FAILURE;
}
close(fd);
} else {
puts("init");
fstat(fd, &st);
image_size = st.st_size;
image = malloc(image_size);
read(fd, image, image_size);
close(fd);
if (init_module(image, image_size, params) != 0) {
perror("init_module");
return EXIT_FAILURE;
}
free(image);
}
return EXIT_SUCCESS;
}
GitHub upstream.
rmmod.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#define delete_module(name, flags) syscall(__NR_delete_module, name, flags)
int main(int argc, char **argv) {
if (argc != 2) {
puts("Usage ./prog mymodule");
return EXIT_FAILURE;
}
if (delete_module(argv[1], O_NONBLOCK) != 0) {
perror("delete_module");
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
GitHub upstream.
Busybox source interpretation
Busybox provides insmod, and since it is designed for minimalism, we can try to deduce how it is done from there.
On version 1.24.2, the entry point is at modutils/insmod.c function insmod_main.
The IF_FEATURE_2_4_MODULES is optional support for older Linux kernel 2.4 modules, so we can just ignore it for now.
That just forwards to modutils.c function bb_init_module.
bb_init_module attempts two things:
mmap the file to memory through try_to_mmap_module.
This always sets image_size to the size of the .ko file as a side effect.
if that fails, malloc the file to memory with xmalloc_open_zipped_read_close.
This function optionally unzips the file first if it is a zip, and just mallocs it otherwise.
I don't understand why this zipping business is done, since we can't even rely on it because the try_to_mmap_module does not seem to unzip things.
Finally comes the call:
init_module(image, image_size, options);
where image is the executable that was put into memory, and options are just "" if we call insmod file.elf without further arguments.
init_module is provided above by:
#ifdef __UCLIBC__
extern int init_module(void *module, unsigned long len, const char *options);
extern int delete_module(const char *module, unsigned int flags);
#else
# include <sys/syscall.h>
# define init_module(mod, len, opts) syscall(__NR_init_module, mod, len, opts)
# define delete_module(mod, flags) syscall(__NR_delete_module, mod, flags)
#endif
uClibc is an embedded libc implementation, and it seems to provide init_module.
If it is not present, I think glibc is assumed, but as man init_module says:
The init_module() system call is not supported by glibc. No declaration is provided in glibc headers, but, through a quirk of history, glibc does export an ABI for this system call. Therefore, in order to employ this system call, it is sufficient to manually declare the interface in your code; alternatively, you can invoke the system call using syscall(2).
BusyBox wisely follows that advice and uses syscall, which glibc provides, and which offers a C API for system calls.
insmod/rmmod use the functions init_module and delete_module to do this, which also have a man-page available. They both declare the functions as extern instead of including a header, but the man-page says they should be in <linux/module.h>.
I'd recommend against the use of system() in any daemon code that runs with root permissions as it's relatively easy to exploit from a security standpoint. modprobe and rmmod are, indeed, the right tools for the job. However, it'd be a bit cleaner and much more secure to use an explicit fork() + exec() to invoke them.
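A minimal sketch of the fork() + exec() approach is shown here. The helper name run_command is hypothetical. It takes an absolute program path in argv[0] and uses execv(), so neither a shell nor a PATH lookup is involved.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments and wait.
 * Returns the child's exit status, or -1 if it could not be run to a
 * normal exit. */
static int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);   /* only returns on failure */
        _exit(127);
    }
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)     /* retry if interrupted by a signal */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A daemon could then call it with something like {"/sbin/modprobe", "mymodule", NULL} at startup and {"/sbin/rmmod", "mymodule", NULL} on exit. Both paths and the module name are assumptions here and vary between distributions.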
I'm not sure there's a cleaner way than system.
But for sure, if you want to load/unload the modules from your userspace daemon, then you force yourself to run the daemon as root*, which may not be considered as secure.
*: or you can add the explicit commands in the sudoers file, but this will be a nightmare to manage when deploying your application.
You can perform the same tasks that modprobe and Co. do, but I doubt that could be characterized as cleaner.

Resources