I'm trying to understand the relationship between the C system call API, the syscall assembler instruction, and the exception mechanism (interrupts) used to switch contexts between processes. There's a lot to study on my own, so please bear with me.
Is my understanding correct that C system calls are implemented by the compiler as syscalls with the respective code in assembly, which in turn are implemented by the OS as an exception mechanism (interrupts)?
So the call to the write function in the following C code:
#include <unistd.h>
int main(void)
{
write(2, "There was an error writing to standard out\n", 44);
return 0;
}
Is compiled to assembly as a syscall instruction:
mov eax,4 ; system call number (sys_write)
syscall
And the instruction, in turn, is implemented by the OS as an exception mechanism (interrupt)?
TL;DR
The syscall instruction itself acts like a glorified jump: it's a hardware-supported way to efficiently and safely jump from unprivileged user-space into the kernel.
The syscall instruction jumps to a kernel entry-point that dispatches the call.
Before x86_64 two other mechanisms were used: the int instruction and the sysenter instruction.
They have different entry-points (still present today in 32-bit kernels, and 64-bit kernels that can run 32-bit user-space programs).
The former uses the x86 interrupt machinery and can be confused with exception dispatching (which also uses the interrupt machinery).
However, exceptions are spurious events while int is used to generate a software interrupt, again, a glorified jump.
The C language doesn't concern itself with system calls; it relies on the C runtime to perform all the interactions with the environment of the future program.
The C runtime implements the above-mentioned interactions through an environment specific mechanism.
There could be various layers of software abstractions but in the end the OS APIs get called.
The term API is used to denote a contract; strictly speaking, using an API doesn't require invoking a piece of kernel code (the trend is to implement non-critical functions in user space to limit the amount of exploitable code). Here we are only interested in the subset of the API that requires a privilege switch.
Under Linux, the kernel exposes a set of services accessible from user space; these entry-points are called system calls.
Under Windows, the kernel services (that are accessed with the same mechanism of the Linux analogues) are considered private in the sense that they are not required to be stable across versions.
A set of DLL/EXE exported functions is used as entry-points instead (e.g. in ntoskrnl.exe, hal.dll, kernel32.dll, user32.dll); these in turn use the kernel services through a (private) system call.
Note that under Linux, most system calls have a POSIX wrapper around them, so it's possible to use these wrappers, which are ordinary C functions, to invoke a system call.
The underlying ABI is different, and so is the error reporting; the wrapper translates between the two worlds.
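To make that translation concrete, here is a minimal, hypothetical sketch of what such a wrapper could look like on x86-64 Linux with GCC inline assembly (my_write is an invented name, not the actual glibc code; the real implementation differs):
/* Hypothetical sketch: the kernel returns -errno on failure, and the
   wrapper translates that into the POSIX convention of -1 plus errno. */
#include <errno.h>
#include <sys/syscall.h>   /* SYS_write */
#include <sys/types.h>     /* ssize_t, size_t */

static ssize_t my_write(int fd, const void *buf, size_t count)
{
    long ret;
    /* Syscall number in rax, arguments in rdi, rsi, rdx;
       the syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "0" ((long)SYS_write), "D" ((long)fd), "S" (buf), "d" (count)
                      : "rcx", "r11", "memory");
    if (ret < 0) {           /* kernel ABI: negative errno value */
        errno = (int)-ret;   /* POSIX ABI: return -1 and set errno */
        return -1;
    }
    return (ssize_t)ret;
}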
The C runtime calls the OS APIs: in the case of Linux the system calls are used directly, because they are public (in the sense that they are stable across versions), while on Windows the usual DLLs, like kernel32.dll, are marked as dependencies and used.
We have now reduced the problem to the point where a user-mode program, be it part of the C runtime (Linux) or part of an API DLL (Windows), needs to invoke code in the kernel.
The x86 architecture historically offered different ways to do so, for example, a call gate.
Another way is through the int instruction; it has a few advantages:
It is what the BIOS and the DOS did in their times.
In real mode, using an int instruction is convenient because a vector number (e.g. 21h) is easier to remember than a far address (e.g. 0f000h:0fff0h).
It saves the flags.
It is easy to set up (setting up an ISR is relatively easy).
With the modernization of the architecture this mechanism turned out to have a big disadvantage: it is slow.
Before the introduction of the sysenter (note, sysenter not syscall) instruction there was no faster alternative (a call gate would be equally slow).
With the advent of the Pentium Pro/II[1] a new pair of instructions sysenter and sysexit were introduced to make system calls faster.
Linux started using them in version 2.5, and they are still used today on 32-bit systems, I believe.
I won't explain the whole mechanism of the sysenter instruction and the companion vDSO necessary to use it; suffice it to say that it was faster than the int mechanism (I can't find the article from Andy Glew where he says that sysenter turned out to be slow on the Pentium III, and I don't know how it performs nowadays).
With the advent of x86-64, the AMD response to sysenter, i.e. the syscall/sysret pair, became the de facto way to switch from user mode to kernel mode.
This is due to the fact that syscall is actually fast and very simple (it copies rip and rflags into rcx and r11 respectively, masks rflags and jumps to an address set in IA32_LSTAR).
64-bit versions of both Linux and Windows use syscall.
To recap, control can be given to the kernel through three mechanisms:
Software interrupts.
This was int 80h for 32-bit Linux (pre 2.5) and int 2eh for 32-bit Windows.
Via sysenter.
Used by 32-bit versions of Linux since 2.5.
Via syscall.
Used by 64-bit versions of Linux and Windows.
Here is a nice page that puts all of this in better shape.
The C runtime is usually a static library, thus pre-compiled, that uses one of the three methods above.
The syscall instruction transfers control to a kernel entry-point (see entry_64.s) directly.
It is an instruction that just does that; it is not implemented by the OS, it is used by the OS.
The term exception is overloaded in CS: C++ has exceptions, and so do Java and C#.
The OS can have a language-agnostic exception-trapping mechanism (under Windows it was once called SEH and has since been rewritten).
The CPU also has exceptions.
I believe we are talking about the last meaning.
Exceptions are dispatched through interrupts; they are a kind of interrupt.
It goes without saying that while exceptions are synchronous (they happen at specific, replayable points), they are "unwanted": they are exceptional, in the sense that programmers tend to avoid them, and when they do happen it is due to either a bug, an unhandled corner case, or a bad situation.
They are thus not used to transfer control to the kernel (though they could be).
Software interrupts (which are synchronous too) were used instead; the mechanism is almost exactly the same (exceptions can have an error code pushed on the kernel stack), but the semantics are different.
We never dereferenced a null pointer, accessed an unmapped page, or the like to invoke a system call; we used the int instruction instead.
Is my understanding correct that C system calls are implemented by the compiler as syscalls with the respective code in assembly […]?
No.
The C compiler handles system calls the same way that it handles calls to any other function:
# write(2, "There was an error writing to standard out\n", 44);
mov $44, %edx
lea .LC0(%rip), %rsi    # address of the string
mov $2, %edi
call write
The implementation of these functions in libc (your system's C library) will probably contain a syscall instruction, or whatever the equivalent is on your system's architecture.
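If you want to see that boundary from C without writing assembly, glibc also provides the generic syscall(2) wrapper, which invokes a system call by number. A small sketch, assuming x86-64 Linux where SYS_write is 1:
#include <unistd.h>
#include <sys/syscall.h>   /* SYS_write */

int main(void)
{
    const char msg[] = "There was an error writing to standard out\n";
    /* Same effect as write(2, msg, 44), but dispatched by syscall number. */
    syscall(SYS_write, 2, msg, sizeof msg - 1);
    return 0;
}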
EDIT
Yes, the C application calls a C library function; buried in the C library implementation is a system-specific call or set of calls, which use an architecture-specific way to reach the operating system, which has an exception/interrupt handler set up to deal with these system calls. It doesn't actually have to be architecture-specific; it could simply jump/call to a well-known address, but with the modern desire for security and protection modes, a simple call won't have those added features, even though it would still be functionally correct.
How the library is implemented is implementation defined. And how the compiler connects your code to that library, at link time or run time, has a number of possible combinations; there is no one way it can or needs to happen, so it is implementation defined as well. So long as it is functionally correct and doesn't conflict with the C standard, it can work.
With operating systems like Windows and Linux and others on our phones and tablets, there is a strong desire to isolate applications from the system so they cannot do damage in various ways, so protection is desired, and you need an architecture-specific way to make a function call into the operating system that is not a normal call, since it switches modes. If the architecture offers more than one way to do this, the operating system can choose one or more of those ways as part of its design.
A "software interrupt" is one common way: as with hardware interrupts, most solutions include a table of handler addresses; by extending that table, some of the vectors can be tied to a software-created "interrupt" (triggered by hitting a special instruction rather than by a signal changing state on an input) that goes through the same steps: stop, save some state, call the vector, and so on.
Not a direct answer to the question but this might interest you (I don't have enough karma to comment) - it explains all the user space execution (including glibc and how it does syscalls) in detail:
http://www.maizure.org/projects/printf/index.html
You'll probably be interested in particular in 'Step 8 - Final string written to standard output':
And what does __libc_write look like...?
000000000040f9c0 <__libc_write>:
40f9c0: 83 3d c5 bb 2a 00 00 cmpl $0x0,0x2abbc5(%rip) # 6bb58c <__libc_multiple_threads>
40f9c7: 75 14 jne 40f9dd <__write_nocancel+0x14>
000000000040f9c9 <__write_nocancel>:
40f9c9: b8 01 00 00 00 mov $0x1,%eax
40f9ce: 0f 05 syscall
...cut...
Write simply checks the threading state and, assuming all is well, moves the write syscall number (1) into EAX and enters the kernel.
Some notes:
x86-64 Linux write syscall is 1, old x86 was 4
rdi refers to stdout
rsi points to the string
rdx is the string size count
Note that this was for the author's x86-64 Linux system.
For x86, this provides some help:
http://www.tldp.org/LDP/khg/HyperNews/get/syscall/syscall86.html
Under Linux the execution of a system call is invoked by a maskable interrupt or exception class transfer, caused by the instruction int 0x80. We use vector 0x80 to transfer control to the kernel. This interrupt vector is initialized during system startup, along with other important vectors like the system clock vector.
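As an illustration of that legacy path, here is a hedged sketch of invoking write through int 0x80 with GCC inline assembly; it assumes a 32-bit x86 Linux build (gcc -m32), where sys_write is number 4 and the arguments go in ebx, ecx, and edx (write_int80 is an invented helper name):
#include <stddef.h>

/* Sketch only: legacy i386 ABI, not for 64-bit code. */
static long write_int80(int fd, const void *buf, size_t count)
{
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a" (ret)                                  /* eax = 4 (sys_write) */
                      : "0" (4L), "b" ((long)fd), "c" (buf), "d" (count)
                      : "memory");
    return ret;   /* negative errno on failure, just as with syscall */
}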
But as a general answer for a Linux kernel:
Is my understanding correct that C system calls are implemented by the compiler as syscalls with the respective code in assembly, which in turn are implemented by the OS as an exception mechanism (interrupts)?
Yes
I am creating an application on an STM32 microcontroller. I am using a library stack, and my application runs on top of that stack. I have two questions:
How can I detect and handle stack overflow at runtime? I don't know how much memory that library is using.
How can I detect and handle stack overflow at runtime if I am developing the code from scratch? I read somewhere that we have to maintain some count for each declaration. Is this the correct way, or is there a standard way to find it?
If you are limited to your device and no sophisticated tools are available, you could at least try "the old way": a simple stack guard may help. Somewhere in your code (depending on the tools you use), there must be a definition of the stack area, something similar to:
.equ stacksize, 1024
stack: .space stacksize,0
(GNU as syntax; yours might be different)
with your device's startup code somewhere initializing the stack register to the top address of the stack area.
A stack guard would then just add a "magic number" to the stack top and bottom:
.equ stackmagic,0xaffeaffe
.equ stacksize, 1024
stacktop: .int stackmagic
stack: .space stacksize,0
stackbottom: .int stackmagic
with some code at least periodically checking (e.g. in a timer interrupt routine or, if available, in your debugger) whether the stackmagic values are still there.
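For example, the check itself could look roughly like this in C (a sketch only: the symbol names match the assembly above, stack_guard_ok is an invented helper, and you may need to adapt the extern declarations to your toolchain's symbol naming):
#include <stdint.h>
#include <stdbool.h>

#define STACKMAGIC 0xaffeaffeu

extern volatile uint32_t stacktop;     /* magic word at one end of the stack area */
extern volatile uint32_t stackbottom;  /* magic word at the other end */

static bool stack_guard_ok(void)
{
    return stacktop == STACKMAGIC && stackbottom == STACKMAGIC;
}

/* In a timer interrupt routine, for example:
   if (!stack_guard_ok()) { log it, halt, or reset the device }   */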
If your software is not tiny, I would first try to debug most of it on your laptop, desktop, or tablet (perhaps running Linux, because it has good tools and very standards-compliant compilers, similar to the cross-compiler you are using). Then you can profit from tools like valgrind or GCC compilation options like -Wall -Wextra -g -fsanitize=address, etc.
You might store an approximation of the top of the stack at the start of your main function (e.g. by declaring a global int *start_top_of_stack; and then doing int i = 0; start_top_of_stack = &i; near the beginning of main). You could then have some local int j = 0; in several functions and check at the start of them that the distance between &j and start_top_of_stack is not too big.
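A minimal sketch of that trick (MAX_EXPECTED_STACK is an invented threshold, and the code assumes a downward-growing stack; adjust both to your target):
#include <stdlib.h>

#define MAX_EXPECTED_STACK 900   /* bytes; tune to your real stack size */

static char *start_top_of_stack;

void check_stack_depth(void)
{
    char here = 0;
    /* On a downward-growing stack, &here is below start_top_of_stack. */
    if ((size_t)(start_top_of_stack - &here) > MAX_EXPECTED_STACK)
        abort();   /* or log, reset the device, etc. */
}

int main(void)
{
    char i = 0;
    start_top_of_stack = &i;   /* approximate top of the stack */
    /* ... call check_stack_depth() at the start of deep functions ... */
    return 0;
}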
But there is no silver bullet. I am just suggesting a trick.
If your application is critical to the point of accepting costly development efforts, you could use some formal methods and static source-code analysis tools (e.g. Frama-C, or make your own using MELT). If you are cross-compiling with a recent GCC, you might want to use -Wstack-usage=len and/or -fstack-usage to check that every call frame is not too big, or to compute the required stack depth manually.
I am working on driver code which is causing stack overrun issues and memory corruption. Presently, running the module gives an "Exception stack" and the stack trace looks corrupted.
The module had compiler warnings. The warnings were resolved with the GCC option -Wframe-larger-than=len.
The issue is possibly being caused by excessive inlining, lots of function arguments, and a large number of nested functions. I need to continue testing and refactoring the code. Is it possible to make any modifications to the kernel to increase the stack size? Also, how would you go about debugging such issues?
Though your module may compile (with warnings) under -Wframe-larger-than=len, it can still cause a stack overrun and could corrupt the in-core data structures, leading the system to an inconsistent state.
The Linux kernel stack size was limited to 8 KiB (in kernel versions before 3.18) and is now 16 KiB (in versions 3.18 and later). Due to a number of issues with virtio and qemu-kvm, a recent commit extended the kernel stack to 16 KiB.
Now if you want to increase the stack size to 32 KiB, you would need to recompile the kernel after making the following change in the kernel source file arch/x86/include/asm/page_64_types.h:
// for 32K stack
- #define THREAD_SIZE_ORDER 2
+ #define THREAD_SIZE_ORDER 3
A recent commit in Linux kernel version 3.18 shows the kernel stack size already being increased to 16K, which should be enough in most cases.
"
commit 6538b8ea886e472f4431db8ca1d60478f838d14b
Author: Minchan Kim <minchan#kernel.org>
Date: Wed May 28 15:53:59 2014 +0900
x86_64: expand kernel stack to 16K
"
Refer to LWN: [RFC 2/2] x86_64: expand kernel stack to 16K
As for debugging such issues, there is no one-line answer, but here are some tips I can share. Use dump_stack() within your module to get a stack trace in the syslog, which really helps in debugging stack-related issues.
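For instance, a minimal sketch of a module that dumps the current stack trace from its init function (assuming an ordinary out-of-tree module build; the module name is invented):
#include <linux/module.h>
#include <linux/kernel.h>

static int __init stackdbg_init(void)
{
    pr_info("stackdbg: dumping current stack trace\n");
    dump_stack();   /* prints the current call chain to the kernel log */
    return 0;
}

static void __exit stackdbg_exit(void)
{
}

module_init(stackdbg_init);
module_exit(stackdbg_exit);
MODULE_LICENSE("GPL");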
Use debugfs, turn on the stack depth checking functions with:
# mount -t debugfs nodev /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
and regularly capture the output of the following files:
# cat /sys/kernel/debug/tracing/stack_max_size
# cat /sys/kernel/debug/tracing/stack_trace
The above files will report the highest stack usage when the module is loaded and tested.
Leave the below command running:
while true; do date; cat /sys/kernel/debug/tracing/stack_max_size;
cat /sys/kernel/debug/tracing/stack_trace; echo ======; sleep 30; done
If you see the stack_max_size value exceeding maybe ~14000 bytes (for the 16 KiB stack version of the kernel), then the stack trace would be worth capturing and looking into further. You may also want to set up the crash tool to capture a vmcore file in case of panics.
I am using Visual Studio 2013 to compile C code for the first time. The code runs perfectly OK with 2D arrays of size 64*64 (there are a few arrays in my program), but if I increase the array size to 128*128 it does not run (though it compiles correctly). Instead it gives the message ".exe has stopped working". My machine has 4 GB of RAM, and the same program runs with 128*128 arrays if I run the code from Linux.
Let me provide some more details: I have run the same code from Linux using the Intel C Compiler (non-commercial version) on the same machine. But due to some problem I am now constrained to work from a Windows environment. I searched and have installed two C compilers: (1) Visual Studio 2013 and (2) Borland C. Both work well with a small array, but the moment I increase the array size, Visual Studio gives the message ".exe has stopped working". I compile the program using cl from the "Developer Command Prompt for VS2013".
I feel the problem is with stack size.
In the detailed explanation linked below I have seen the command ulimit used in a Linux environment to increase the stack size. I remember using it a few years ago.
I feel we are close to the solution, but my problem with Windows (and VS 2013) persists, as I failed to execute dumpbin /headers executable_file or editbin /STACK:size. Actually, I feel I do not know how to execute them. I tried to execute them from the "Developer Command Prompt for VS2013" as well as via Run (Windows Start button -> search "run" -> Run). I request you kindly to provide more details if possible.
I searched and found this website and think the solution can be found here.
Please help. I want to run this using Visual Studio 2013 from Windows.
It seems that the reason behind this is a stack overflow. The issue can be resolved by increasing the stack size.
In Visual Studio you can do this with the linker option /STACK:reserve[,commit]. Read the MSDN article.
For a more detailed explanation:
Under Windows, the stack size information is contained in the executable file. It can be set during compilation in Visual Studio C++.
Alternatively, Microsoft provides a program, editbin.exe, which can change the executable file directly. Here are more details:
Windows (during compilation):
Select Project->Settings.
Select Link page.
Select Category to Output.
Type your preferred stack size in the Reserve: field under Stack allocations, e.g. 32768 (decimal) or 0x20000 (hexadecimal).
Windows (to modify the executable file):
There are two programs included in Microsoft Visual Studio, dumpbin.exe and editbin.exe. Run dumpbin /headers executable_file, and you can see the size of stack reserve information in optional header values. Run editbin /STACK:size to change the default stack size.
Visual Studio does not work?
Although I don't regard VS as a valid development tool, I highly doubt that it would cause your problem.
128 * 128 is 16384. If you have too little stack space (on Windows, it's 1 MB by default if I'm not mistaken) and you define an array of large enough structs (64 bytes each, to be precise, since 16384 * 64 bytes is exactly 1 MB), then this can easily cause a stack overflow, since automatic arrays are typically (although not necessarily) allocated on the stack.
It sounds like you're trying to declare large arrays on the stack. Stack memory is typically limited, and you're overflowing it.
You can fix this by giving your array static storage duration
static BigStruct arr[128][128];
or by dynamically allocating memory for it
BigStruct (*arr)[128] = malloc(sizeof(*arr) * 128);
// use arr
free(arr);
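For the original 128*128 case, a complete sketch might look like this (assuming the elements are doubles; adjust the element type to match your program):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* 128 rows of 128 doubles, allocated on the heap instead of the stack */
    double (*a)[128] = malloc(sizeof(*a) * 128);
    if (a == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    a[127][127] = 1.0;   /* use it exactly like double a[128][128] */
    printf("%f\n", a[127][127]);
    free(a);
    return 0;
}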
This question is about Mac OS Classic, which has been obsolete for several years now. I hope someone still knows something about it!
I've been building a PEF executable parser for the past few weeks and I've plugged a PowerPC interpreter into it. With a good dose of wizardry, I would expect to be able to run (to some extent) some Mac OS 9 programs under Mac OS X. In fact, I'm now ready to begin testing with small applications.
To help me with that, I have installed an old version of Mac OS inside SheepShaver and downloaded the (now free) MPW Tools [1], and I built a "hello world" MPW tool (just your classic puts("Hello World!") C program, except compiled for Mac OS 9).
When built, this generates a program with a code section and a data section. I expected that I would be able to just jump to the main symbol of the executable (as specified in the header of the loader section), but I hit a big surprise: the compiler placed the main symbol inside the data section.
Obviously, there's no executable code in the data section.
Going back to the Mac OS Runtime Architectures document (published in 1997, surprisingly still up on Apple's website), I found out that this is totally legal:
Using the Main Symbol as a Data Structure
As mentioned before, the main symbol does not have to point to a routine, but can point to a block of data instead. You can use this fact to good effect with plug-ins, where the block of data referenced by the main symbol can contain essential information about the plug-in. Using the main symbol in this fashion has several advantages:
The Code Fragment Manager returns the address of the main symbol when you programmatically prepare a fragment, so you do not need to call FindSymbol.
You do not have to reserve and document the specific name of an export for your plug-in.
However, not having a specific symbol name means that the plug-in’s purpose is not quite as obvious. A plug-in can store its name, icon, or information about its symbols in the main symbol data structure. Storing symbolic information in this fashion eliminates the need for multiple FindSymbol calls.
My conclusion, therefore, is that MPW tools run as plugins inside the MPW shell, and that the executable's main symbol points to some data structure that should tell it how to start.
But that still doesn't help me figure out what's in that data structure, and just looking at its hex dump has not been very instructive (I have an idea where the compiler put the __start address for this particular program, but that's definitely not enough to make a generic MPW shell "replacement"). And obviously, most valuable information sources on this topic seem to have disappeared with Mac OS 9 in 2004.
So, what is the format of the data structure pointed by the main symbol of a MPW tool?
[1] Apparently, Apple very recently pulled the plug on the FTP server that I got the MPW Tools from, so it is probably not available anymore (though a Google search for "MPW_GM.img.bin" does find some alternatives).
As it turns out, it's not too complicated. That "data structure" is simply a transition vector.
I didn't realize it right away because of bugs in my implementation of the relocation virtual machine that made these two pointers look like garbage.
Transition vectors are structures that contain (in this order) an entry point (4 bytes) and a "table of contents" offset (4 bytes). This offset should be loaded into register r2 before executing the code pointed to by the entry point.
(The Mac OS Classic runtime only uses the first 8 bytes of a transition vector, but they can technically be of any size. The address of the transition vector is always passed in r12 so the callee may access any additional information it would need.)
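In C terms, the layout described above could be sketched like this (the struct and field names are my own; only the layout, an entry point followed by a TOC value to load into r2, comes from the runtime architecture document, and both fields are 4 bytes on 32-bit PowerPC):
#include <stdint.h>

struct transition_vector {
    uint32_t entry_point;   /* address of the code to execute */
    uint32_t toc;           /* value to load into r2 before jumping */
    /* Optionally followed by extra, callee-defined data; r12 holds the
       address of the whole vector at call time, so the callee can find it. */
};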