I'm writing a small library that takes a FILE * pointer as input.
If this FILE * pointer is invalid and using it leads to a segfault, is it more correct to handle the signal, set errno, and exit gracefully; or to do nothing and let the caller's installed signal handler fire, if he has one?
The prevailing wisdom seems to be "libraries should never cause a crash." But my thinking is that, since this particular signal is certainly the caller's fault, I shouldn't attempt to hide that information from him. He may have his own handler installed to react to the problem in his own way. The same information CAN be retrieved with errno, but the default disposition for SIGSEGV was set for a good reason, and passing the signal up respects this philosophy by either forcing the caller to handle his errors, or by crashing and protecting him from further damage.
Would you agree with this analysis, or do you see some compelling reason to handle SIGSEGV in this situation?
Taking over signal handlers is not a library's business; I'd say it's somewhat offensive of a library unless explicitly asked for. To minimize crashes, a library may validate its input to some extent. Beyond that: garbage in, garbage out.
The prevailing wisdom seems to be "libraries should never cause a crash."
I don't know where you got that from - if they pass an invalid pointer, you should crash. Any library will.
I would consider it reasonable to check for the special case of a NULL pointer. But beyond that, if they pass junk, they violated the function's contract and they get a crash.
This is a subjective question, and possibly not fit for SO, but I will present my opinion:
Think about it this way: If you have a function that takes a nul-terminated char * string and is documented as such, and the caller passes a string without the nul terminator, should you catch the signal and slap the caller on the wrist? Or should you let it crash and make the bad programmer using your API fix his/her code?
If your code takes a FILE * pointer, and your documentation says "pass any open FILE *", and they pass a closed or invalidated FILE * object, they've broken the contract. Checking for this case would slow down the code of people who properly use your library to accommodate people who don't, whereas letting it crash will keep the code as fast as possible for the people who read the documentation and write good code.
Do you expect someone who passes an invalid FILE * pointer to check for and correctly handle an error? Or are they more likely to blindly carry on, causing another crash later, in which case handling this crash may just disguise the error?
Kernels shouldn't crash if you feed them a bad pointer, but libraries probably should. That doesn't mean you should do no error checking; a good program dies immediately in the face of unreasonably bad data. I'd much rather a library call bail with assert(f != NULL) than to just trundle on and eventually dereference the NULL pointer.
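For instance, a minimal sketch of that early bail-out at a library entry point (the function name is hypothetical):

#include <assert.h>
#include <stdio.h>

/* Hypothetical library entry point: bail immediately on a NULL FILE *
   instead of trundling on and dereferencing it later. */
size_t lib_count_lines(FILE *f)
{
    assert(f != NULL);   /* dies right here, with file and line info */
    /* ... the actual work ... */
    return 0;
}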
Sorry, but people who say a library should crash are just being lazy (perhaps to save thinking time, as well as development effort). Libraries are collections of functions. Library code should not "just crash" any more than other functions in your software should "just crash".
Granted, libraries may have some issues around how to pass errors across the API boundary, if multiple languages or (relatively) exotic language features like exceptions would normally be involved, but there's nothing TOO special about that. Really, it's just part of the burden of writing libraries, as opposed to in-application code.
Except where you really can't justify the overhead, every interface between systems should implement sanity checking, or better, design by contract, to prevent security issues, as well as bugs.
There are a number of ways to handle this. What you should probably do, in order of preference, is one of the following:
Use a language that supports exceptions (or better, design by contract) within libraries, and throw an exception on or allow the contract to fail.
Provide an error handling signal/slot or hook/callback mechanism, and call any registered handlers (see the sketch after this list). Require that, when your library is initialised, at least one error handler is registered.
Support returning some error code in every function that could possibly fail, for any reason. But this is the old, relatively insane way of doing things from C (as opposed to C++) days.
Set some global "an error has occurred" flag, and allow clearing that flag before calls. This is also old, and completely insane, mostly because it moves the error-status maintenance burden to the caller, AND is unsafe when it comes to threading.
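As an illustration of the second option, here is a minimal sketch of such a callback mechanism in C (all names are hypothetical):

#include <stdlib.h>

/* Hypothetical error-callback mechanism for a library. */
typedef void (*lib_error_fn)(int code, const char *msg);

static lib_error_fn lib_error_handler = NULL;

void lib_set_error_handler(lib_error_fn fn)
{
    lib_error_handler = fn;
}

/* Called internally whenever a precondition fails. */
void lib_report_error(int code, const char *msg)
{
    if (lib_error_handler)
        lib_error_handler(code, msg);
    else
        abort();   /* no handler registered: fail fast */
}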
Suppose I have the following line in my code:
struct info *pinfo = malloc(sizeof(struct info));
Usually there is another line of code like this one:
if (!pinfo)
    <handle this error>
But is it really worth it? Especially if the object is so small that the code generated to check it might need more memory than the object itself.
It's true that running out of memory is rare, especially for little test programs that are only allocating tens of bytes of memory, especially on modern systems that have many gigabytes of memory available.
Yet malloc failures are very common, especially for little test programs.
malloc can fail for two reasons:
There's not enough memory to allocate.
malloc detects that the memory-allocation heap is messed up, perhaps because you did something wrong with one of your previous memory allocations.
Now, it turns out that #2 happens all the time.
And, it turns out that #1 is pretty common, too, although not because there's not enough memory to satisfy the allocation the programmer meant to do, but because the programmer accidentally passed a preposterously huge number to malloc, accidentally asking for more memory than there is in the known universe.
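For instance, this contrived sketch (the variable names are made up) shows how an innocent-looking line can ask for more memory than the machine has:

#include <stdlib.h>

void example(void)
{
    int n = -1;                       /* bug: the count was never set */
    int *p = malloc(n * sizeof(int)); /* -1 converts to a huge size_t, so this
                                         asks for nearly the whole address
                                         space and malloc returns NULL */
    (void)p;                          /* using p unchecked would now crash */
}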
So, yes, it turns out that checking for malloc failure is a really good idea, even though it seems like malloc "can't fail".
The other thing to think about is, what if you take the shortcut and don't check for malloc failure? If you sail along and use the null pointer that malloc gave you instead, that'll cause your program to immediately crash, and that'll alert you to your problem just as well as an "out of memory" message would have, without your having to wear your fingers to the bone typing if(!pinfo) and fprintf(stderr, "out of memory\n"), right?
Well, no.
Depending on what your program accidentally does with the null pointer, it's possible it won't crash right away. Anyway, the crash you get, with a message like "Segmentation violation - core dumped" doesn't tell you much, doesn't tell you where your problem is. You can get segmentation violations for all sorts of reasons (especially in little test programs, especially if you're a beginner not quite sure what you're doing). You can spend hours in a futile effort to figure out why your program is crashing, without realizing it's because malloc is returning a null pointer. So, definitely, you should always check for malloc failure, even in the tiniest test programs.
Deciding which errors to test for, versus those that "can't happen" or for whatever reason aren't worth catching, is a hard problem in general. It can take a fair amount of experience to know what is and isn't worth checking for. But, truly, anybody who's programmed in C for very long can tell you emphatically: malloc failure is definitely worth checking for.
If your program is calling malloc all over the place, checking each and every call can be a real nuisance. So a popular strategy is to use a malloc wrapper:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void *my_malloc(size_t n)
{
    void *ret = malloc(n);
    if (ret == NULL) {
        /* on most systems malloc sets errno (e.g. to ENOMEM) on failure */
        fprintf(stderr, "malloc failed (%s)\n", strerror(errno));
        exit(1);
    }
    return ret;
}
There are three ways of thinking about this function:
Whenever you have some processing that you're doing repetitively, all over the place (in this case, checking for malloc failure), see if you can move it off to (centralize it in) a single function, like this.
Unlike malloc, my_malloc can't fail. It never returns a null pointer. It's almost magic. You can call it whenever and wherever you want, and you never have to check its return value. It lets you pretend that you never have to worry about running out of memory (which was sort of the goal all along).
Like any magical result, my_malloc's benefit — that it never seems to fail — comes at a price. If the underlying malloc fails, my_malloc summarily exits (since it can't return in that case), meaning that the rest of your program doesn't get a chance to clean up. If the program were, say, a text editor, and whenever it had a little error it printed "out of memory" and then basically threw away the file the user had been editing for the last hour, the user might not be too pleased. So you can't use the simple my_malloc trick in production programs that might lose data. But it's a huge convenience for programs that don't have to worry about that sort of thing.
If malloc fails then chances are the system is out of memory or it's something else your program can't handle. It should abort immediately and at most log some diagnostics. Not handling NULL from malloc will make you end up in undefined behavior land. One might argue that having to abort because of a failure of malloc is already catastrophic but just letting it exhibit UB falls under a worse category.
But what if the malloc fails? You will dereference the NULL pointer, which is UB (undefined behaviour) and your program will (probably) fail!
Sometimes code which checks the correctness of the data is longer than the code which does something with it :).
This is very simple: if you don't check for NULL you might end up with a runtime error. Checking for NULL will help you avoid an unexpected crash of the program and gracefully handle the error case.
If you just want to quickly test some algorithm, then fine, but know it can fail. For example run it in the debugger.
When you include it in your Real World Program, then add all the error checking and handling needed.
I'm making a data structures and algorithms library in C for learning purposes (so this doesn't necessarily have to be bullet-proof), and I'm wondering how void functions should handle errors on preconditions. If I have a function for destroying a list as follows:
void List_destroy(List* list) {
/*
...
free()'ing pointers in the list. Nothing to return.
...
*/
}
This has a precondition that list != NULL; otherwise the function will blow up in the caller's face with a segfault.
So as far as I can tell I have a few options: one, I throw in an assert() statement to check the precondition, but that means the function would still blow up in the caller's face (which, as far as I have been told, is a big no-no when it comes to libraries), though at least I could provide an error message; or two, I check the precondition, and if it fails I jump to an error block and just return;, silently chugging along, but then the caller doesn't know the List* was NULL.
Neither of these options seem particularly appealing. Moreover, implementing a return value for a simple destroy() function seems like it should be unnecessary.
EDIT: Thank you everyone. I settled on implementing (in all my basic list functions, actually) consistent behavior for NULL List* pointers being passed to the functions. All the functions jump to an error block and exit(1) as well as report an error message to stderr along the lines of "Cannot destroy NULL list." (or push, or pop, or whatever). I reasoned that there's really no sensible reason why a caller should be passing NULL List* pointers anyway, and if they didn't know they were then by all means I should probably let them know.
Destructors (in the abstract sense, not the C++ sense) should indeed never fail, no matter what. Consistent with this, free is specified to return without doing anything if passed a null pointer. Therefore, I would consider it reasonable for your List_destroy to do the same.
However, a prompt crash would also be reasonable, because in general the expectation is that C library functions crash when handed invalid pointers. If you take this option, you should crash by going ahead and dereferencing the pointer and letting the kernel fire a SIGSEGV, not by assert, because assert has a different crash signature.
Absolutely do not change the function signature so that it can potentially return a failure code. That is the mistake made by the authors of close() for which we are still paying 40 years later.
Generally, you have several options if a constraint of one of your functions is violated:
Do nothing, successfully
Return some value indicating failure (or set something pointed-to by an argument to some error code)
Crash randomly (i.e. introduce undefined behaviour)
Crash reliably (i.e. use assert or call abort or exit or the like)
Where (but this is my personal opinion) this is a good rule of thumb:
the first option is the right choice if you think it's OK to not obey the constraints (i.e. they aren't real constraints); a good example of this is free.
the second option is the right choice, if the caller can't know in advance if the call will succeed; a good example is fopen.
the third and fourth options are a good choice if the former two don't apply. A good example is memcpy. I prefer the use of assert (a form of the fourth option) because it enables both: crashing reliably if someone is unwilling to read your documentation, and undefined behaviour for people who do read it (they will prevent it by obeying your constraints), depending on whether they compile with NDEBUG defined or not. Dereferencing a pointer argument can serve as an assert, because it will make your program crash (which is the right thing; people not reading your documentation should crash as early as possible) if these people pass an invalid pointer.
So, in your case, I would make it similar to free and would succeed without doing anything.
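Concretely, a minimal sketch (assuming the List type from the question):

void List_destroy(List *list)
{
    if (list == NULL)
        return;   /* like free(NULL): succeed without doing anything */
    /* ... free the list's nodes as before ... */
}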
HTH
If you wish not to return any value from the function, then it is a good idea to have one more argument for an error code.
void List_destroy(List* list, int* ErrCode) {
    *ErrCode = ...
}
Edit:
Changed & to * as question is tagged for C.
I would say that simply returning in case the list is NULL would make sense, as this would indicate that the list is empty (not an error condition). If the list is an invalid pointer, you can't detect that; let the kernel handle it for you by giving a segfault, and let the programmer fix it.
In a codebase I reviewed, I found the following idiom.
void notify(struct actor_t act) {
write(act.pipe, "M", 1);
}
// thread A sending data to thread B
void send(byte *data) {
global.data = data;
notify(threadB);
}
// in thread B event loop
read(this.sock, &cmd, 1);
switch (cmd) {
case 'M': use_data(global.data);break;
...
}
"Hold it", I said to the author, a senior member of my team, "there's no memory barrier here! You don't guarantee that global.data will be flushed from the cache to main memory. If thread A and thread B will run in two different processors - this scheme might fail".
The senior programmer grinned, and explained slowly, as if explaining to his five-year-old boy how to tie his shoelaces: "Listen, young boy, we've seen many thread-related bugs here, in high-load testing and in real clients", he paused to scratch his longish beard, "but we've never had a bug with this idiom".
"But, it says in the book..."
"Quiet!", he hushed me promptly, "Maybe theoretically, it's not guaranteed, but in practice, the fact you used a function call is effectively a memory barrier. The compiler will not reorder the instruction global.data = data, since it can't know if anyone using it in the function call, and the x86 architecture will ensure that the other CPUs will see this piece of global data by the time thread B reads the command from the pipe. Rest assured, we have ample real world problems to worry about. We don't need to invest extra effort in bogus theoretical problems.
"Rest assured my boy, in time you'll understand to separate the real problem from the I-need-to-get-a-PhD non-problems."
Is he correct? Is that really a non-issue in practice (say x86, x64 and ARM)?
It's against everything I learned, but he does have a long beard and a really smart look!
Extra points if you can show me a piece of code proving him wrong!
Memory barriers aren't just to prevent instruction reordering. Even if instructions aren't reordered it can still cause problems with cache coherence. As for the reordering - it depends on your compiler and settings. ICC is particularly aggressive with reordering. MSVC w/ whole program optimization can be, too.
If your shared data variable is declared as volatile, even though it's not in the spec, most compilers will generate a memory barrier around reads and writes of the variable and prevent reordering. This is not the correct way of using volatile, nor what it was meant for.
(If I had any votes left, I'd +1 your question for the narration.)
In practice, a function call is a compiler barrier, meaning that the compiler will not move global memory accesses past the call. A caveat to this is functions which the compiler knows something about, e.g. builtins, inlined functions (keep in mind IPO!) etc.
So a processor memory barrier (in addition to a compiler barrier) is in theory needed to make this work. However, since you're calling read and write which are syscalls that change the global state, I'm quite sure that the kernel issues memory barriers somewhere in the implementation of those. There is no such guarantee though, so in theory you need the barriers.
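If you'd rather not rely on that, making both barriers explicit is cheap. A minimal sketch using C11 atomics (pre-C11 compilers have equivalents, such as GCC's asm volatile("" ::: "memory") for the compiler-only case):

#include <stdatomic.h>

void publish(void)
{
    /* Compiler barrier only: stops the compiler from moving memory
       accesses across this point; emits no CPU instruction. */
    atomic_signal_fence(memory_order_seq_cst);

    /* Full barrier: additionally orders the accesses as seen by
       other CPUs (an mfence or equivalent on x86). */
    atomic_thread_fence(memory_order_seq_cst);
}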
The basic rule is: the compiler must make the global state appear to be exactly as you coded it, but if it can prove that a given function doesn't use global variables then it can implement the algorithm any way it chooses.
The upshot is that traditional compilers always treated functions in another compilation unit as a memory barrier because they couldn't see inside those functions. Increasingly, modern compilers are growing "whole program" or "link time" optimization strategies which break down these barriers and will cause poorly written code to fail, even though it's been working fine for years.
If the function in question is in a shared library then the compiler won't be able to see inside it, but if the function is one defined by the C standard then it doesn't need to -- it already knows what the function does -- so you have to be careful of those also. Note that a compiler will not recognise a kernel call for what it is, but the very act of inserting something that the compiler can't recognise (inline assembler, or a function call to an assembler file) will create a memory barrier in itself.
In your case, notify will either be a black box the compiler can't see inside (a library function) or else it will contain a recognisable memory barrier, so you are most likely safe.
In practice, you have to write very bad code to fall over this.
In practice, he's correct and a memory barrier is implied in this specific case.
But the point is that if its presence is "debatable", the code is already too complex and unclear.
Really guys, use a mutex or other proper constructs. It's the only safe way to deal with threads and to write maintainable code.
And maybe you'll see other errors, like that the code is unpredictable if send() is called more than once.
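For instance, here is a minimal sketch of the same hand-off done with a pthread mutex (function and variable names are made up; the lock/unlock pairs imply all the barriers you need):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static void *shared_data;           /* stands in for global.data */

/* thread A */
void send_data(void *data)
{
    pthread_mutex_lock(&lock);      /* acquire implies a barrier */
    shared_data = data;
    pthread_mutex_unlock(&lock);    /* release implies a barrier */
    /* ... notify thread B through the pipe as before ... */
}

/* thread B, after reading the 'M' command byte */
void *receive_data(void)
{
    void *data;
    pthread_mutex_lock(&lock);
    data = shared_data;
    pthread_mutex_unlock(&lock);
    return data;
}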
Some time ago I downloaded some source code from the Internet. There were several malloc calls, and after them there was no check for NULL. As far as I know you need to check for NULL after calling malloc.
Is there a good reason for somebody not check for NULL after calling malloc? Am I missing something?
As Jens Gustedt mentioned in a comment, by the time malloc() returns an error your program is likely to be in a heap of trouble already. Does it make sense to put in a bunch of error handling code to handle the situation, when the program is likely not going to be able to do much of anything anyway? For many programs the answer might be 'no', for others it might be very important to do something appropriate.
You can try allocating your memory through a simple 'malloc-or-die' wrapper function that guarantees that the allocation succeeds or the program will terminate:
#include <stdlib.h>

void* m_malloc(size_t size)
{
    void* p;
    // make sure a size request of 0 doesn't trigger
    // an error situation needlessly
    if (size == 0) size = 1;
    p = malloc(size);
    if (!p) {
        // attempt to log the error or whatever
        abort();
    }
    return p;
}
One problem that you then run into is that there's not much you can reliably do except maybe terminate the program. Even logging the problem is likely to require some memory allocation, so the logging facility will probably have its own problems (unless your allocation failure is due to trying to allocate an unreasonably large block of memory).
You might try to solve that issue by allocating a 'fail-safe' block early in your program that can be freed when you need to log the problem (I think there are quite a few programs that use this strategy). But how much work you are willing to put into this kind of error handling depends on your specific needs. If your program needs to ensure that something of significant complexity is done when malloc() returns an error, you'll need to have corresponding safeguards to make sure you can do those things in a very low-memory situation. Generally this means additional complexity, and it may not always be worth the effort.
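A minimal sketch of that fail-safe-block strategy (the reserve size and the names are arbitrary):

#include <stdio.h>
#include <stdlib.h>

static void *emergency_block;       /* reserved at program startup */

void init_oom_reserve(void)
{
    emergency_block = malloc(16 * 1024);
}

void handle_oom(void)
{
    /* Release the reserve so the logging below has memory to work with. */
    free(emergency_block);
    emergency_block = NULL;
    fprintf(stderr, "out of memory\n");
    abort();
}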
People don't check because they're lazy, it makes their code uglier, and they don't want to figure out how to recover from errors everywhere.
I've heard a few programmers say, "If I can't malloc a block the system is going to crash soon anyway because VM is full, so why should I bother checking?"
I disagree. You should check for errors, even if it means just logging the error and calling exit() or throwing an exception. While we were trending towards systems with huge disks and always-on paged memory, the industry has flipped and now we have smartphones and tablets with limited RAM and no on-demand paging. Plus even on the desktop our datasets have grown so much that sometimes malloc will fail.
If you don't want to add extra lines of code everywhere, just write your own malloc replacement that calls malloc and checks for errors and use it instead of malloc.
They just don't care about unexpected crashes!
When you do a malloc, it's very likely you are going to store something there immediately. So if you don't check for NULL, the program may crash subsequently when trying to store something there.
This is unlikely in small programs, where malloc rarely fails when asked for a small amount of memory, and so doesn't return NULL.
But it's usually good practice to check malloc's return for NULL even in small programs, in my opinion.
If you need more memory and malloc cannot give you more, can you do anything about it?
I guess exit gracefully.
But if you exit anyway, I guess they think it doesn't really matter how you exit (might as well crash and avoid what they see as the "overhead" of checking for NULL).
Perhaps the functionality was such that they didn't have any need for cleanup code?
I don't agree, though. You should check for NULL on malloc's return.
This is not something most people would probably use, but it just came to mind and was bugging me.
Is it possible to have some machine code in say, a c-string, and then cast its address to a function pointer and then use it to run that machine code?
In theory you can, per Carl Norum. This is called "self-modifying code."
In practice what will usually stop you is the operating system. Most of the major modern operating systems are designed to make a distinction between "readable", "readwriteable", and "executable" memory. When this kind of OS kernel loads a program, it puts the code into a special "executable" page which is marked read-only, so that a user application cannot modify it; at the same time, trying to GOTO an address that is not in an "executable" page will also cause a fault exception. This is for security purposes, because many kinds of malware and viruses and other hacks depend upon making the program jump into modified memory. For example, a hacker might feed an app data that causes some function to write malicious code into the stack, and then run it.
But at heart, what the operating system itself does to load a program is exactly what you describe -- it loads code into memory, flags the memory as executable, and jumps into it.
In the embedded hardware world, there may not be an OS to get in your way, and so some platforms use this pretty regularly. On the PlayStation 2 I used to do this all the time -- if there was some code that was specific to, say, the desert level, and used nowhere else, I wouldn't keep it in memory all the time -- instead I'd load it along with the desert level, and fix up my function pointers to the right executable. When the user left the level, I'd dump that code from memory, set all those function pointers to an exception handler, and load the code for the next level into the same space.
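On a POSIX system you can cooperate with those protections rather than fight them, by asking the OS for memory that is explicitly executable. A minimal sketch for x86-64 Linux (the byte sequence encodes mov eax, 42; ret; note that some hardened systems refuse writable-and-executable mappings, in which case you'd mprotect in two steps):

#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for: mov eax, 42; ret */
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* Ask the OS for a page that is readable, writable and executable. */
    void *buf = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    memcpy(buf, code, sizeof code);

    /* The cast through a function pointer is not sanctioned by the C
       standard (see the answer below), but works on this platform. */
    int (*fn)(void) = (int (*)(void))buf;
    return fn() == 42 ? 0 : 1;
}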
Yes, you can absolutely do that. There's nothing stopping you unless your system or compiler prevent it somehow (like you have a Harvard architecture, for example). Just make sure your 'data' is valid instructions before you jump, or you risk disaster.
It is not possible even to attempt doing something like this legally in the C language, since there's no legal way to make a function pointer point to "data". Function pointers in C can only be initialized/assigned from other function pointers, even if you use an explicit conversion. If you violate this rule, the behavior is undefined.
It is also possible to initialize a function pointer from an integer (by using an explicit conversion) with implementation-defined results (as opposed to undefined results in other cases). However, an attempt to execute the "data" by making a call through a pointer obtained in such a way still leads to undefined behavior.
If you are willing to ignore the fact that the behavior is undefined, then the actual manifestations of that undefined behavior will look differently on different platforms. On some platform it might even appear to "work".
One could also imagine a superoptimizer doing this to test small assembler sequences against the specification of the function it optimizes.