It's found in almost every language, and I've used it most of the time.
I don't know its internals, and I wonder how it really works.
How does it work at the native level at runtime of any language?
For example: if a stack overflow or divide by zero occurs inside try, how does catch prevent the program from crashing?
The phrase "at the native level at runtime of any language" is an oxymoron. All native parts of exception handling are platform dependent, not language dependent. Some parts of exception handling are even hardware dependent (divide by zero is always a hardware exception, for instance).
In the specific case of divide by zero on .NET, on Windows, on x86, it goes something like this:
Your application tries to divide by zero.
The CPU saves some application state and executes the code located at the "Divide Error" address in the trap table (which happens to be the zeroth element of the trap table).
The trap handler code (which is part of the Windows Kernel) triggers mechanisms to eventually (in the executive) raise an SEH exception for divide by zero which will be propagated into the Object Manager, then into the .NET runtime.
The .NET runtime code in mscoree.dll gets the divide by zero as an HRESULT COR_E_DIVIDEBYZERO from a COM object.
.NET converts the HRESULT into a System.DivideByZeroException.
Your code sees the exception as a glorified long jump to the "closest" enclosing catch block, or finally block.
You either handle the exception, or it gets propagated out of your code and then your application crashes.
In general, you can think of exceptions as long jumps that carry a pointer to some Thread-local state information (the exception). The target of the long jump is usually known at compile time.
Not every language has exception handling built in, either. C, for instance, does not have structured exception handling.
When an exception is thrown and control passes from a try block to a handler, the run time calls destructors for all automatic objects constructed since the beginning of the try block. This process is called stack unwinding.
I'm working on a framework in C and want to implement exceptions for it. I'm using longjmp with setjmp, but even on x64 machines longjmp only passes an int.
I've created a class (essentially a struct with a vptr) that represents an exception, but to throw it I need to pass a pointer to this structure. A pointer is an unsigned long long value (qword) on x64 machines and an unsigned int (dword) on x86, so I need a qword to come out of setjmp to handle the error.
Are there implementations of longjmp and setjmp that can pass a qword?
Or maybe I could write my own longjmp, but for that I would need the original source code.
You can enclose your jmp_buf-typed variable in a larger structure, possibly larger by just sizeof(void*). Then just before calling longjmp() you can store the pointer in that extra space. There's no need to try to squeeze a pointer into an int.
Example:
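#include <stdio.h>
#include <setjmp.h>

/* jmp_buf plus extra room for the exception pointer. */
struct jmp_buf_struct
{
    jmp_buf jb;              /* must stay the first member; see the cast in may_throw() */
    void* volatile exc_ptr;  /* volatile: it is changed between setjmp() and longjmp() */
};

void may_throw(jmp_buf jb)
{
    /* jb points at the start of the enclosing struct, so cast back to reach exc_ptr. */
    struct jmp_buf_struct* jbs_ptr = (struct jmp_buf_struct*)jb;
    jbs_ptr->exc_ptr = "Exception message!";
    longjmp(jb, 1);
}

int main(void)
{
    struct jmp_buf_struct jbs;
    if (setjmp(jbs.jb))
    {
        /* Reached via longjmp(): the pointer travels in the struct, not in the int. */
        printf("Threw %p = \"%s\".\n", jbs.exc_ptr, (char*)jbs.exc_ptr);
    }
    else
    {
        may_throw(jbs.jb);
        puts("Didn't throw.");
    }
    return 0;
}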
Output:
Threw 0x55638ebc78c4 = "Exception message!".
If you want to be portable, you can pass an array index instead: an index into an array where you store all the 64-bit pointers (or simply an array of pointers to structures holding information about what to do on each exception).
How you populate such an array is another question. Of course, you don't need to populate it with every instance that could become an exception, only with the ones you have try-ed and are able to catch. But you will probably need more than just a pointer, since you have to deal with the runtime case in which the same exception is caught in multiple active places on your stack.
Once you have solved that problem and understood its nature, you can probably even use a short int for the index.
From the comments, I see you say that a global variable is not suitable because of multithreading concerns. First, you can have it global in the context of the thread (like the errno variable, for example). That is what the void * passed to the routine the thread executes (and returned once it finishes) is for: you can keep the data there, as global data private to the thread.
Second, you want to do rather strange things from the point of view of C, such as manipulating the stack in the unusual ways these functions do, and I don't believe you know completely how the internals of setjmp()/longjmp() work. The setjmp()/longjmp() API was written a long time ago (around 50 years ago now), in the days of the old V6 Unix code, to cope with error processing in unknown Unix device drivers, which is a very controlled and simple environment. Put simply, using longjmp() is far more complicated (and strongly discouraged, even by its authors, K & R) than switching to a different language, like the suggested C++, that fully supports exceptions in its core. This recommendation is not mine; it was suggested in the comments to your question.
Third, if you use setjmp() and longjmp() you also need to know that both use the calling thread's stack: setjmp() records the stack pointer and the place to return to, and longjmp() restores them. So you have to make sure, for example, that a longjmp() in a signal handler runs on the same thread that made the setjmp() call. If it does not, you can severely damage the stack of the thread executing the signal handler (the one that was interrupted by the signal). The reason is that the interrupted thread will switch its stack to that of the thread that did the setjmp(), and both threads will then be executing code on the same stack at different points. This goes back to the time both functions were implemented: there was only a PDP computer available, not the multiple CPUs/cores that are common today, so there was only one stack. You have to be especially careful here, because normally the thread that raises the exception is the same one that catches it, but this can be false for asynchronous traps such as signal processing.
By the way, what you are doing is very interesting, and it will let you learn how a language internally implements complex behaviours like exception processing. I applaud your courage in trying this kind of thing, and don't hesitate to ask: if you need a mentor in C++, I'll be available.
Just don't give up!!
During a discussion today I learned that VxWorks and LynxOS have checks that tell you whether the address you assign to a pointer is in a valid range. This is the first time I am hearing about this. For example, suppose I assign int *i=&variable;.
I should get a warning or error which says that in my application I cannot assign the address value to the integer.
When I do a NULL check, I am only checking the address 0x00000000. But the address might be 0x00000001, which is also invalid if it is in an unmapped area and might not be accessible. Is anyone aware of something similar for Linux, or can anyone explain how it's done in VxWorks or LynxOS?
Any ideas??
The function you seek in VxWorks is called vxMemProbe.
Basically the vxMemProbe libraries insert special exception handling code to catch a page fault or bus error. The vxMemProbe function is used to check if the address is valid for read or write. It also allows you to test if the particular address is accessible with a given data width (8,16,32,64 bits) and alignment.
The underlying mechanism of vxMemProbe is tied to the specific architecture's exception handling mechanisms. The vxMemProbe libraries insert code into the exception handlers. When you probe an address that triggers an exception, the handler checks whether vxMemProbe triggered it. If so, the handler restores the processor state from before the exception and returns execution to where vxMemProbe was called, while also returning a value via the architecture's calling conventions.
In general you can't do what you want, as explained in Felix Palmen's answer.
I should get a warning or error which says that in my application I cannot assign the address value to the integer.
Statically and reliably detecting all pointer faults is impossible (because it could be proven equivalent to solving the halting problem). BTW you might consider using static program analysis tools like Frama-C.
On Linux, in principle, you might test at runtime whether a given address is valid in your virtual address space by using /proc/, e.g. by parsing the /proc/self/maps textual pseudo-file (to understand what I mean, try cat /proc/$$/maps in a terminal, then cat /proc/self/maps). See proc(5). In practice I don't recommend doing that often (it would probably be too slow), and of course it is not a builtin function of the compiler (you have to code it yourself). BTW, be aware of ASLR.
However, there are tools to help detect (some of) the faulty address uses, in particular valgrind and the address sanitizer facility, read about instrumentation options of GCC and try to compile with -fsanitize=address ...
Don't forget to compile your code with all warnings and debug info, so use gcc -Wall -Wextra -g to compile it.
BTW, if you store the address of some local variable in a global pointer and dereference that pointer after the local variable has gone out of scope, you still have undefined behavior (even if your code doesn't crash, because you usually dereference some random address on your call stack) and you should be very scared. UB should always be avoided.
There are several misconceptions here:
From the perspective of the language C, there's only one pointer value that's guaranteed to be invalid, and this is NULL. For other values, it depends on the context. A pointer is valid when it points to an object that is currently alive. (Note that this is trivially true in your int *i = &variable example, as this is only valid syntax when there is a variable accessible from your current scope)
NULL does not necessarily mean a value with all bits zero. This is the most common case, but there can be platforms that use a different bit pattern for the NULL pointer. It's even allowed by the C standard that pointers of different types have different representations for NULL. Still, converting 0 to a pointer type is guaranteed to result in the NULL pointer for this type.
I don't know what exactly you're referring to in VxWorks, but of course Linux checks memory accesses. If a process tries to access an address that's not mapped in the virtual address space, this process is sent a SIGSEGV signal, which causes immediate abnormal program termination (Segmentation fault).
Since, for example, glBufferData can report a GL_OUT_OF_MEMORY error, I expected glTexImage to do so too, but it doesn't.
Presumably it is possible to run out of texture memory, so how do I detect the event?
Any OpenGL function can theoretically result in a GL_OUT_OF_MEMORY error, if as a side effect of some process, memory needs to be allocated but cannot. As stated by GL 4.4 core profile, section 2.3:
The Specification attempts to explicitly describe these implicit error conditions (with the exception of OUT_OF_MEMORY) wherever they apply
So the error descriptions don't have to say that GL_OUT_OF_MEMORY can happen. It always can. Though it is odd that they're inconsistent about it, specifically calling out the possibility in certain cases but not others.
This is not something most people would probably use, but it just came to mind and was bugging me.
Is it possible to have some machine code in say, a c-string, and then cast its address to a function pointer and then use it to run that machine code?
In theory you can, per Carl Norum. This is called "self-modifying code."
In practice what will usually stop you is the operating system. Most of the major modern operating systems are designed to make a distinction between "readable", "readwriteable", and "executable" memory. When this kind of OS kernel loads a program, it puts the code into a special "executable" page which is marked read-only, so that a user application cannot modify it; at the same time, trying to GOTO an address that is not in an "executable" page will also cause a fault exception. This is for security purposes, because many kinds of malware and viruses and other hacks depend upon making the program jump into modified memory. For example, a hacker might feed an app data that causes some function to write malicious code into the stack, and then run it.
But at heart, what the operating system itself does to load a program is exactly what you describe -- it loads code into memory, flags the memory as executable, and jumps into it.
In the embedded hardware world, there may not be an OS to get in your way, and so some platforms use this pretty regularly. On the PlayStation 2 I used to do this all the time -- if there was some code that was specific to, say, the desert level, and used nowhere else, I wouldn't keep it in memory all the time -- instead I'd load it along with the desert level, and fix up my function pointers to the right executable. When the user left the level, I'd dump that code from memory, set all those function pointers to an exception handler, and load the code for the next level into the same space.
Yes, you can absolutely do that. There's nothing stopping you unless your system or compiler prevents it somehow (if you have a Harvard architecture, for example). Just make sure your 'data' consists of valid instructions before you jump, or you risk disaster.
It is not possible even to attempt doing something like this legally in C language, since there's no legal way to make a function pointer to point to "data". Function pointers in C language can only be initialized/assigned from other function pointers, even if you use an explicit conversion. If you violate this rule, the behavior is undefined.
It is also possible to initialize a function pointer from an integer (by using an explicit conversion) with implementation-defined results (as opposed to undefined results in other cases). However, an attempt to execute the "data" by making a call through a pointer obtained in such a way still leads to undefined behavior.
If you are willing to ignore the fact that the behavior is undefined, then the actual manifestations of that undefined behavior will look different on different platforms. On some platforms it might even appear to "work".
One could also imagine a superoptimizer doing this to test small assembler sequences against the specifications of the function it optimizes.
I've posted a question about validating a pointer's accessibility. The conclusion was either to use IsBadReadPtr to check the pointer, or SEH to catch the exception (and preferably to use neither, and debug the application, but that's not the issue here).
IsBadReadPtr is said to be bad because, among other reasons, it would try to read the pointer, and would catch any exception. It might catch a stack guard page exception, and thus prevent it from reaching the memory manager, which should have enlarged the stack.
If I use SEH and catch only EXCEPTION_ACCESS_VIOLATION exceptions, would this create the same problem?
Another thing: What are the implications of using SEH?
This article suggests that "the compiler can’t perform flow analysis in code protected by SEH". How about if I call a function inside __try block. Would the compiler not optimize the called function at all?
If I use SEH and catch only EXCEPTION_ACCESS_VIOLATION exceptions, would this create the same problem?
I think it would. A workaround might be to probe the stack[s] of any thread[s] you know and care about, before you start calling IsBadReadPtr (by "probe the stack" I mean to deliberately touch every memory page in the stack, to ensure every page is pre-allocated).
Would the compiler not optimize the called function at all?
If the function is not inlined, I would expect the compiler to apply the usual optimizations (optimization of the function wouldn't be affected by where the function is called from).