The valgrind quickstart page mentions:
Try to make your program so clean that Memcheck reports no errors. Once you achieve this state, it is much easier to see when changes to the program cause Memcheck to report new errors. Experience from several years of Memcheck use shows that it is possible to make even huge programs run Memcheck-clean. For example, large parts of KDE, OpenOffice.org and Firefox are Memcheck-clean, or very close to it.
This block left me a little perplexed. Given how the C standard works, I would assume that most (if not all) practices that produce memcheck errors invoke undefined behavior, and should therefore be avoided like the plague.
However, the last sentence in the quoted block implies there are in fact "famous" programs that run in production with memcheck errors. After reading this, I thought I'd put this to test and I tried running VLC with valgrind, getting a bunch of memcheck errors right after starting it.
This led me to this question: are there ever good reasons not to eliminate such errors from a program in production? Is there ever anything to be gained from releasing a program that contains such errors and, if so, how do the developers keep it safe, given that a program containing such errors can, to my knowledge, act unpredictably, so that no assumptions can be made about its behavior in general? If so, can you provide real-world examples of cases in which the program is better off running with those errors than without?
There has been a case where fixing the errors reported by Valgrind actually led to a security flaw; see e.g. https://research.swtch.com/openssl . The uninitialised memory was used intentionally, to increase entropy by mixing in some random bytes. The fix led to more predictable random numbers and so weakened security.
In case of VLC, feel free to investigate ;-)
One instance is when you are deliberately writing non-portable code to take advantage of system-specific optimizations. Your code might be undefined behavior with respect to the C standard, but you happen to know that your target implementation does define the behavior in a way that you want.
A famous example is optimized strlen implementations such as those discussed at vectorized strlen getting away with reading unallocated memory. You can design such algorithms more efficiently if they are allowed to potentially read past the terminating null byte of the string. This is blatant UB for standard C, since this might be past the end of the array containing the string. But on a typical real-life machine (say for instance x86 Linux), you know what will actually happen: if the read touches an unmapped page, you will get SIGSEGV, and otherwise the read will succeed and give you whatever bytes happen to be in that region of memory. So if your algorithm checks alignment to avoid crossing page boundaries unnecessarily, it may still be perfectly safe for x86 Linux. (Of course you should use appropriate ifdef's to ensure that such code isn't used on systems where you can't guarantee its safety.)
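The alignment check mentioned above can be sketched as follows. This is only an illustration, assuming a 4096-byte page; the name read_stays_in_page and the ASSUMED_PAGE_SIZE constant are made up for this example, and real code would query the page size from the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. Real code would ask the
   system, e.g. with sysconf(_SC_PAGESIZE) on POSIX. */
#define ASSUMED_PAGE_SIZE 4096u

/* Returns 1 if a read of `width` bytes starting at address `addr`
   stays inside a single page, 0 if it would cross into the next one. */
int read_stays_in_page(uintptr_t addr, size_t width)
{
    assert(width > 0 && width <= ASSUMED_PAGE_SIZE);
    size_t offset = (size_t)(addr % ASSUMED_PAGE_SIZE);
    return offset + width <= ASSUMED_PAGE_SIZE;
}
```

A wide-read strlen would take the fast path only when this check passes for the next chunk, and fall back to byte-at-a-time reads near a page boundary. The over-read itself is still undefined behavior as far as the C standard is concerned.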
Another instance, more relevant to memcheck, might be if you happen to know that your system's malloc implementation always rounds up allocation requests to, say, multiples of 32 bytes. If you have allocated a buffer with malloc(33) but now find that you need 20 more bytes, you could save yourself the overhead of realloc() because you know that you were actually given 64 bytes to play with.
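On glibc you do not have to guess the rounding: malloc_usable_size() from <malloc.h> reports how many bytes the allocator actually set aside. The sketch below assumes glibc (the function is not ISO C or POSIX), and usable_bytes_for is a hypothetical helper name. The man page describes this function as mainly a debugging and introspection aid, so treat it as a way to inspect the slack rather than a licence to rely on it.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size: glibc-specific */
#include <stdlib.h>

/* Allocates `requested` bytes, asks the allocator how many bytes are
   really usable in that block, frees it, and returns that number.
   Returns 0 if the allocation fails. */
size_t usable_bytes_for(size_t requested)
{
    assert(requested > 0);
    void *p = malloc(requested);
    if (p == NULL)
        return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable;
}
```

The only thing that is certain is that the result is at least the requested size; the exact amount of slack depends on the allocator and its version.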
Memcheck is not perfect. Some known limitations that can raise its false positive rate:
the shadow-bit propagation rules are approximations chosen to keep overhead down, which costs some precision
imprecise representation of flag registers
code compiled at higher optimization levels
From the Memcheck paper (published at USENIX 2005), though things may well have changed since then:
A system such as Memcheck cannot simultaneously be free of false
negatives and false positives, since that would be equivalent to
solving the Halting Problem. Our design attempts to almost completely
avoid false negatives and to minimise false positives. Experience in
practice shows this to be mostly successful. Even so, user feedback
over the past two years reveals an interesting fact: many users have
an (often unstated) expectation that Memcheck should not report any
false positives at all, no matter how strange the code being checked
is.
We believe this to be unrealistic. A better expectation is to accept
that false positives are rare but inevitable. Therefore it will
occasionally necessary to add dummy initialisations to code to make
Memcheck be quiet. This may lead to code which is slightly more
conservative than it strictly needs to be, but at least it gives a
stronger assurance that it really doesn't make use of any undefined
values.
A worthy aim is to achieve Memcheck-cleanness, so that new errors are
immediately apparent. This is no different from fixing source code to
remove all compiler warnings, even ones which are obviously harmless.
Many large programs now do run Memcheck-clean, or very nearly so. In
the authors' personal experience, recent Mozilla releases come close
to that, as do cleaned-up versions of the OpenOffice.org-680
development branch, and much of the KDE desktop environment. So this
is an achievable goal.
Finally, we would observe that the most effective use of Memcheck
comes not only from ad-hoc debugging, but also when routinely used on
applications running their automatic regression test suites. Such
suites tend to exercise dark corners of implementations, thereby
increasing their Memcheck-tested code coverage.
Here's a section on avoiding false positives:
Memcheck has a very low false positive rate. However, a few hand-coded assembly sequences, and a few very
rare compiler-generated idioms can cause false positives.
You can track down the origin of an uninitialised value with the --track-origins=yes option, which may help you see what's going on.
If a piece of code is running in a context that would never cause uninitialized storage to contain confidential information that it must not leak, some algorithms may benefit from a guarantee that reading uninitialized storage will have no side effects beyond yielding likely-meaningless values. For example, if it's necessary to quickly set up a hash map, which will often have only a few items placed in it before it's torn down, but might sometimes have many items, a useful approach is to have an array which holds data items in the order they were added, along with a hash table that maps hash values to storage slot numbers. If the number of items stored into the table is N, an item's hash is H, and attempting to access hashTable[H] is guaranteed to yield a value I that will either be the number stored there, if any, or else an arbitrary number, then one of three things will happen:
I might be greater than or equal to N. In that case, the table does not contain a value with a hash of H.
I might be less than N, but items[I].hash != H. In that case, the table does not contain a value with a hash of H.
I might be less than N, and items[I].hash == H. In that case, the table rather obviously contains at least one value (the one in slot I) with a hash of H.
Note that if the uninitialized hash table could contain confidential data, an adversary who can trigger hashing requests may be able to use timing attacks to gain some information about its contents. The only situations where the value read from a hash table slot could affect any aspect of function behavior other than execution time, however, would be those in which the hash table slot had been written.
To put things another way, the hash table would contain a mixture of initialized entries that would need to be read correctly, and meaningless uninitialized entries whose contents could not observably affect program behavior, but the code might not be able to determine whether the contents of an entry might affect program behavior until after it had read it.
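The three cases above can be sketched as follows. This is a simplified illustration, not a full hash map: it assumes hashes are already reduced to the table range and that at most one item exists per hash value, and all names (sparse_map, slot_of, and so on) are invented for the example. To keep the sketch well-defined in standard C, the slot table is filled with arbitrary bytes as a stand-in for "uninitialized"; the point is that lookups never depend on what those bytes are.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SLOTS 64u
#define MAX_ITEMS   16u

struct item { unsigned hash; int value; };

struct sparse_map {
    unsigned slot_of[TABLE_SLOTS]; /* hash -> index into items; may hold garbage */
    struct item items[MAX_ITEMS];  /* dense, in insertion order */
    unsigned count;                /* N in the text */
};

void sparse_map_init(struct sparse_map *m)
{
    /* Stand-in for leaving the table uninitialized: arbitrary bytes. */
    memset(m->slot_of, 0xA5, sizeof m->slot_of);
    m->count = 0;
}

/* Returns the item with hash h, or NULL if there is none. */
struct item *sparse_map_find(struct sparse_map *m, unsigned h)
{
    assert(h < TABLE_SLOTS);
    unsigned i = m->slot_of[h];            /* I in the text: real or arbitrary */
    if (i < m->count && m->items[i].hash == h)
        return &m->items[i];               /* case 3: slot was really written */
    return NULL;                           /* cases 1 and 2: value was garbage */
}

/* Inserts or updates; returns 0 on success, -1 if the map is full. */
int sparse_map_put(struct sparse_map *m, unsigned h, int value)
{
    struct item *found = sparse_map_find(m, h);
    if (found != NULL) {
        found->value = value;
        return 0;
    }
    if (m->count == MAX_ITEMS)
        return -1;
    m->items[m->count].hash = h;
    m->items[m->count].value = value;
    m->slot_of[h] = m->count;
    m->count++;
    return 0;
}
```

Tearing the map down is just setting count back to 0; stale slot entries are then rejected by the same two checks, which is what makes setup and reset cheap.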
For a program to read uninitialized data when it's expecting to read initialized data would be a bug, and since most places where a program would attempt to read data would be expecting initialized data, most attempts to read uninitialized data would be bugs. If a language included a construct to explicitly request that an implementation either read data if it had been written, or otherwise yield some arbitrary value with no side effects, it would make sense to regard attempts to read uninitialized data without such a construct as a defect. In a language without such a construct, however, the only way to avoid warnings about reading uninitialized data would be to forgo some useful algorithms that could otherwise benefit from the aforementioned guarantee.
My experience of posts concerning Valgrind on Stack Overflow is that there is often a misplaced sense of overconfidence, a lack of understanding of what the compiler and Valgrind are doing, or both [neither of these observations is aimed at the OP]. Ignoring errors for either of these reasons is a recipe for disaster.
Memcheck false positives are quite rare. I've used Valgrind for many years and I can count the types of false positives that I've encountered on one hand. That said, there is an ongoing battle between the Valgrind developers and the code that optimising compilers emit. For instance see this link (if anyone is interested, there are plenty of other good presentations about Valgrind on the FOSDEM web site). In general, the problem is that optimizing compilers can make changes so long as there is no observable difference in the behaviour. Valgrind has baked-in assumptions about how executables work, and if a new compiler optimization steps outside of those assumptions, false positives can result.
False negatives usually mean that Valgrind has not correctly encapsulated some behaviour. Usually this will be a bug in Valgrind.
What Valgrind won't be able to tell you is how serious the error is. For instance, you may have a printf that is passed a pointer to a character array that contains some uninitialized bytes but is always nul terminated. Valgrind will detect an error, and at runtime you might get some random rubbish on the screen, which may be harmless.
One example that I've come across where a fix is probably not worth the effort is the use of the putenv function. If you need to put a dynamically allocated string into the environment then freeing that memory is a pain. You either need to save the pointer somewhere or save a flag that indicates that the env var has been set, and then call some cleanup function before your executable terminates. All that just for a leak of around 10-20 bytes.
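A minimal sketch of that putenv pattern, assuming a POSIX system; export_setting is a hypothetical helper name. The string handed to putenv becomes part of the environment, so it must stay allocated, and a leak checker can report it when the program exits.

```c
#define _XOPEN_SOURCE 700   /* for putenv */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "name=value" on the heap and installs it with putenv.
   Returns 0 on success, -1 on failure. */
int export_setting(const char *name, const char *value)
{
    size_t len = strlen(name) + 1 + strlen(value) + 1;
    char *entry = malloc(len);
    if (entry == NULL)
        return -1;
    snprintf(entry, len, "%s=%s", name, value);
    if (putenv(entry) != 0) {
        free(entry);
        return -1;
    }
    /* Deliberately not freed: the environment now points at `entry`.
       Freeing it later means remembering this pointer until exit. */
    return 0;
}
```

Where it fits, setenv() avoids the question by copying the string, at the cost of leaving the memory management to the C library.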
My advice is
Aim for zero errors in your code. If you allow large numbers of errors then the only way to tell if you introduce new errors is to use scripts that filter the errors and compare them with some reference state.
Make sure that you understand the errors that Valgrind generates. Fix them if you can.
Use suppression files. Use them sparingly for errors in third party libraries that you cannot fix, harmless errors for which the fix is worse than the error and any false positives.
Use the -s/-v Valgrind options and remove unused suppressions when you can (this will probably require some scripting).
Consider the following two lines of C:
int a[1] = {0};
a[1] = 0;
The second line makes a write access somewhere in the memory where it should not. Sometimes such programs will give a segfault during the execution, and sometimes not, depending on the environment I suppose, and maybe other things.
I wonder if there is a way to force, as much as possible, such programs to segfault (by compiling them in a special way for instance, or execute them in some virtual machine, I don't know).
This is for pedagogic purpose.
According to the C language standard these kinds of accesses are undefined behaviour and the compiler and runtime are not obliged to make them segfault (though they obviously do sometimes).
For pedagogical purposes you can have a look at the address sanitizers in popular compilers like GCC (-fsanitize=address in https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html) and Clang (https://clang.llvm.org/docs/AddressSanitizer.html).
In simple terms these options cause the compiler to instrument memory accesses with extra logic to catch out-of-bounds memory accesses and produce a user-visible error (though not exactly a segfault message), allowing users to spot such errors and fix them.
This might be what you are looking for.
Valgrind on Linux, stack guards for most compilers, debug options for your selected runtime (e.g. Application Verifier on Windows): there are plenty of options.
In your example the overflow is on the stack, which will always require the compiler to emit the appropriate guards. For dynamic memory allocations it's either up to the used C/C++ runtime library or a custom wrapper inside your application to catch this.
Tools like valgrind catch heap-based buffer overflows as they happen, because they actually execute the code in a VM.
Compiler assisted options work with canaries which are placed in front and back of the buffer, and which are typically checked again when the buffer is released. Options from the address sanitizer family may also add additional checks to all accesses on fields of a fixed size, but this won't work if raw pointers are involved.
Debug options for the runtime typically only provide a very rough granularity. Often they work by simply placing each allocation in a dedicated page in a non-contiguous address space. Accessing the gaps in between the pages is then an instant error. However, minor buffer overflows that stay within the allocation's own page are typically not detected immediately.
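The dedicated-page idea can be sketched by hand on a POSIX system. This is an illustration only, not how any particular debug runtime is implemented; guarded_alloc and overflow_faults are made-up names. The buffer is placed so that it ends exactly where an inaccessible page begins, so even a one-byte overflow faults immediately. The demonstration runs the overflow in a child process so the parent can observe the signal.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Returns `size` bytes whose end touches an inaccessible guard page,
   or NULL on failure. Limited to one page and never freed: sketch only. */
char *guarded_alloc(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0 || size > (size_t)page)
        return NULL;
    char *base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + page, (size_t)page, PROT_NONE) != 0) {
        munmap(base, 2 * (size_t)page);
        return NULL;
    }
    return base + page - size;   /* buffer ends where the guard page starts */
}

/* Writes one byte past a 16-byte guarded buffer in a child process.
   Returns 1 if the child was killed by SIGSEGV (or SIGBUS, which some
   systems raise instead), 0 if it survived, -1 on setup failure. */
int overflow_faults(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile char *buf = guarded_alloc(16);
        if (buf == NULL)
            _exit(2);
        buf[15] = 0;   /* last valid byte: fine */
        buf[16] = 0;   /* one past the end: lands in the guard page */
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) &&
           (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}
```

Placing the buffer flush against the guard page can leave its start misaligned for some types. Allocators that round the start up for alignment leave a few bytes of slack before the guard page, which is one reason small overflows can slip through.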
Finally there is also static code analysis which all modern compilers support to some extent, which can easily detect at least trivial mistakes like the one in your example.
None of these options is able to catch all possible errors though. The C language gives you plenty of options to achieve undefined behavior which none of these tools can detect.
I just saw the following line in some code:
#define ERR_FATAL( str, a, b, c ) {while(1) {*(unsigned int *)0 = 0xdeadbeef;} }
I know that 0xdeadbeef means error, but what does it mean to put this value at address 0?
What does address 0 represent?
The address 0x0 is recognized by the compiler as a NULL pointer, which is an implementation-defined invalid memory address. On some systems it is quite literally the first addressable location in system memory, but on others it's just a placeholder in the code for some generic invalid memory address. From the point of view of the C code we don't know what address that will be, only that it's invalid to access it. Essentially, this code snippet is trying to write the value 0xdeadbeef to an "illegal" memory address. The value itself is some hex that spells out "dead beef", hence indicating that the program is dead beef (i.e. there is a problem); if you aren't a native English speaker I can see how this might not be so clear :). The idea is that this will trigger a segmentation fault or similar, with the intention of informing you that there is a problem by immediately terminating the program with no cleanup or other operations performed in the interim (the macro name ERR_FATAL hints at that). Most operating systems don't give programs direct access to all the memory in the system, and the code presumes that the operating system won't let you directly access memory located at address 0x0. Given that you tagged this question with the linux tag, this is the behavior you will see (because Linux will not allow an access to that memory address). Note that if you are working on something like an embedded system where there's no such guarantee, this could cause a bunch of problems, as you might be overwriting something important.
Note that there are better ways to report problems than this, ones that don't depend on certain types of undefined behavior causing certain side effects. Using something like assert is likely to be a better choice. If you want to terminate the program, abort() is a better choice, as it is in the standard library and does exactly what you want. See the answer from ComicSansMS for more about why this is preferable.
Putting this value (or any value for that matter) in address 0 is supposed to terminate the program immediately.
A fatal error indicates an error situation so severe that the program cannot safely continue execution without risking further data corruption.
In particular, no pending cleanup operations are to be executed, as they could have an undesired effect. Think of flushing a buffer of corrupted data to the filesystem. Instead, the program is to terminate immediately and allow further examination of the situation via a core dump or an attached debugger.
Note that C already provides a standard library function for this purpose: abort(). Calling abort would be preferable for this purpose for a number of reasons:
Writing to address 0 is not guaranteed to terminate the program. This unnecessarily restricts portability of the code and might have devastating consequences in case the code actually gets recompiled and executed on a platform where writing to address 0 results in an actual memory store operation.
Calling abort() is more understandable to someone reading the code. Your question proves that many developers will not understand what the code in question is supposed to do. While the name of the macro and the value deadbeef give some hints, it is still unnecessarily obscure. Also, note that the name of the macro will not be visible when looking at the disassembled code in a debugger.
Calling abort() signals intent more clearly. This is not only true for the code itself, but also for the observable behavior of the binary. Assuming the operation executes as intended on a Unix machine, you would get a SIGSEGV as a result, which is a signal indicating memory corruption. abort() on the other hand causes SIGABRT which indicates an abnormal program termination. Unless the reason for the fatal error was indeed a memory corruption, throwing SIGSEGV in this case obscures why the program is failing and might be misleading to a developer trying to hunt down the error. This is particularly delicate when you think that the signal might not be caught by a debugger, but by an automated signal handler, which might then invoke unfitting code for error handling.
Therefore, if the sole intent of the macro is to signal a fatal error (as the name suggests), calling abort would be a better implementation.
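A sketch of what an abort()-based version could look like. The original macro ignores its arguments, so what str, a, b and c were meant to carry is an assumption here: str is treated as a plain message and the rest are still ignored. The fatal_raises_sigabrt helper is made up for the example and assumes a POSIX system; it runs the macro in a child process to show the observable result.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: `str` is a human-readable message; a, b and c are
   ignored, as they are in the original macro. */
#define ERR_FATAL(str, a, b, c)                    \
    do {                                           \
        (void)(a); (void)(b); (void)(c);           \
        fprintf(stderr, "fatal: %s\n", (str));     \
        abort();                                   \
    } while (0)

/* Returns 1 if a child that hits ERR_FATAL is terminated by SIGABRT,
   0 if it ended some other way, -1 on setup failure. */
int fatal_raises_sigabrt(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ERR_FATAL("demo failure", 0, 0, 0);   /* does not return */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

The do { ... } while (0) wrapper makes the macro behave like a single statement, and the child ends with SIGABRT rather than SIGSEGV, which matches the intent described above.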
Dereferencing an invalid pointer (which is what the above code is trying to do) results in Undefined Behavior. As such the system could do anything - it could crash, it could do nothing, or demons can fly out of your nose. This page is an excellent page on what every C programmer should know about Undefined Behavior.
Thus the above code is the wrong way to cause a crash. As a commenter pointed out, it would be better to call abort().
I'm writing a small library that takes a FILE * pointer as input.
If I immediately check this FILE * pointer and find it leads to a segfault, is it more correct to handle the signal, set errno, and exit gracefully; or to do nothing and use the caller's installed signal handler, if he has one?
The prevailing wisdom seems to be "libraries should never cause a crash." But my thinking is that, since this particular signal is certainly the caller's fault, I shouldn't attempt to hide that information from him. He may have his own handler installed to react to the problem in his own way. The same information CAN be retrieved with errno, but the default disposition for SIGSEGV was set for a good reason, and passing the signal up respects this philosophy by either forcing the caller to handle his errors, or by crashing and protecting him from further damage.
Would you agree with this analysis, or do you see some compelling reason to handle SIGSEGV in this situation?
Taking over signal handlers is not a library's business; I'd say it's somewhat offensive for a library to do so unless explicitly asked. To minimize crashes, a library may validate its input to a certain extent. Beyond that: garbage in, garbage out.
The prevailing wisdom seems to be "libraries should never cause a crash."
I don't know where you got that from - if they pass an invalid pointer, you should crash. Any library will.
I would consider it reasonable to check for the special case of a NULL pointer. But beyond that, if they pass junk, they violated the function's contract and they get a crash.
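A minimal sketch of that policy, with a made-up library function lib_count_lines: NULL is the one invalid pointer that can be detected cheaply and reliably, so it is reported through the return value and errno. Any other bad FILE * is left to crash. Reporting NULL through errno rather than asserting is a design choice for this example, not something the answers above prescribe.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical library entry point: counts newline characters from the
   current position of `f` to end of file.
   Returns the count, or -1 with errno set to EINVAL if `f` is NULL.
   Any other invalid FILE * violates the contract and is not checked. */
long lib_count_lines(FILE *f)
{
    if (f == NULL) {
        errno = EINVAL;
        return -1;
    }
    long lines = 0;
    int ch;
    while ((ch = getc(f)) != EOF) {
        if (ch == '\n')
            lines++;
    }
    return lines;
}
```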
This is a subjective question, and possibly not fit for SO, but I will present my opinion:
Think about it this way: If you have a function that takes a nul-terminated char * string and is documented as such, and the caller passes a string without the nul terminator, should you catch the signal and slap the caller on the wrist? Or should you let it crash and make the bad programmer using your API fix his/her code?
If your code takes a FILE * pointer, and your documentation says "pass any open FILE *", and they pass a closed or invalidated FILE * object, they've broken the contract. Checking for this case would slow down the code of people who properly use your library to accommodate people who don't, whereas letting it crash will keep the code as fast as possible for the people who read the documentation and write good code.
Do you expect someone who passes an invalid FILE * pointer to check for and correctly handle an error? Or are they more likely to blindly carry on, causing another crash later, in which case handling this crash may just disguise the error?
Kernels shouldn't crash if you feed them a bad pointer, but libraries probably should. That doesn't mean you should do no error checking; a good program dies immediately in the face of unreasonably bad data. I'd much rather a library call bail with assert(f != NULL) than to just trundle on and eventually dereference the NULL pointer.
Sorry, but people who say a library should crash are just being lazy (perhaps in consideration time, as well as development efforts). Libraries are collections of functions. Library code should not "just crash" any more than other functions in your software should "just crash".
Granted, libraries may have some issues around how to pass errors across the API boundary, if multiple languages or (relatively) exotic language features like exceptions would normally be involved, but there's nothing TOO special about that. Really, it's just part of the burden of writing libraries, as opposed to in-application code.
Except where you really can't justify the overhead, every interface between systems should implement sanity checking, or better, design by contract, to prevent security issues, as well as bugs.
There are a number of ways to handle this. What you should probably do, in order of preference, is one of:
Use a language that supports exceptions (or better, design by contract) within libraries, and throw an exception on or allow the contract to fail.
Provide an error handling signal/slot or hook/callback mechanism, and call any registered handlers. Require that, when your library is initialised, at least one error handler is registered.
Support returning some error code in every function that could possibly fail, for any reason. But this is the old, relatively insane way of doing things from C (as opposed to C++) days.
Set some global "an error has occurred" flag, and allow clearing that flag before calls. This is also old, and completely insane, mostly because it moves the error status maintenance burden to the caller, AND is unsafe when it comes to threading.
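For a library that has to stay in C, the second and third options can be combined. The sketch below is one possible shape, with every name (lib_status, lib_set_error_hook, lib_digit_value, count_errors) invented for illustration: each function returns a status code, and a registered hook is also called when something fails. The hook is stored in a plain global here, so registration is not thread-safe as written.

```c
#include <stddef.h>

enum lib_status { LIB_OK = 0, LIB_ERR_NULL_ARG = 1, LIB_ERR_NOT_DIGIT = 2 };

typedef void (*lib_error_hook)(enum lib_status code, const char *message,
                               void *user_data);

static lib_error_hook g_hook;   /* not thread-safe: sketch only */
static void *g_hook_data;

void lib_set_error_hook(lib_error_hook hook, void *user_data)
{
    g_hook = hook;
    g_hook_data = user_data;
}

/* Notifies the registered hook, if any, and hands the code back so
   callers can write `return report(...)`. */
static enum lib_status report(enum lib_status code, const char *message)
{
    if (g_hook != NULL)
        g_hook(code, message, g_hook_data);
    return code;
}

/* Converts a decimal digit character to its value. */
enum lib_status lib_digit_value(char c, int *out)
{
    if (out == NULL)
        return report(LIB_ERR_NULL_ARG, "out must not be NULL");
    if (c < '0' || c > '9')
        return report(LIB_ERR_NOT_DIGIT, "not a decimal digit");
    *out = c - '0';
    return LIB_OK;
}

/* Example hook a caller might register: counts errors via user_data. */
void count_errors(enum lib_status code, const char *message, void *user_data)
{
    (void)code;
    (void)message;
    ++*(int *)user_data;
}
```

A caller would register the hook once, for example lib_set_error_hook(count_errors, &counter), and can still check each return value where it matters.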
This is not something most people would probably use, but it just came to mind and was bugging me.
Is it possible to have some machine code in say, a c-string, and then cast its address to a function pointer and then use it to run that machine code?
In theory you can, per Carl Norum. This is called "self-modifying code."
In practice what will usually stop you is the operating system. Most of the major modern operating systems are designed to make a distinction between "readable", "readwriteable", and "executable" memory. When this kind of OS kernel loads a program, it puts the code into a special "executable" page which is marked read-only, so that a user application cannot modify it; at the same time, trying to GOTO an address that is not in an "executable" page will also cause a fault exception. This is for security purposes, because many kinds of malware and viruses and other hacks depend upon making the program jump into modified memory. For example, a hacker might feed an app data that causes some function to write malicious code into the stack, and then run it.
But at heart, what the operating system itself does to load a program is exactly what you describe -- it loads code into memory, flags the memory as executable, and jumps into it.
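That load, flag and jump sequence can be sketched for one specific platform. The example assumes x86-64 and a POSIX system that permits mmap and mprotect with PROT_EXEC; on anything else the function just returns -1. The name run_generated_code is made up, and converting a data pointer to a function pointer is outside ISO C, although POSIX platforms support it.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Copies a tiny machine-code function into fresh memory, marks it
   executable and calls it. Returns the function's result (42), or -1
   if the platform is unsupported or the mapping cannot be set up. */
int run_generated_code(void)
{
#if defined(__x86_64__)
    /* x86-64 machine code for:  mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    size_t len = sizeof code;

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, len);

    /* Flip the page from writable to executable before jumping to it. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Not ISO C: data pointer converted to function pointer. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();

    munmap(mem, len);
    return result;
#else
    return -1;
#endif
}
```

Writing first and only then switching the page to executable keeps the memory from ever being writable and executable at the same time, which some systems refuse outright.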
In the embedded hardware world, there may not be an OS to get in your way, and so some platforms use this pretty regularly. On the PlayStation 2 I used to do this all the time -- if there was some code that was specific to, say, the desert level, and used nowhere else, I wouldn't keep it in memory all the time -- instead I'd load it along with the desert level, and fix up my function pointers to the right executable. When the user left the level, I'd dump that code from memory, set all those function pointers to an exception handler, and load the code for the next level into the same space.
Yes, you can absolutely do that. There's nothing stopping you unless your system or compiler prevent it somehow (like you have a Harvard architecture, for example). Just make sure your 'data' is valid instructions before you jump, or you risk disaster.
It is not possible even to attempt doing something like this legally in C language, since there's no legal way to make a function pointer to point to "data". Function pointers in C language can only be initialized/assigned from other function pointers, even if you use an explicit conversion. If you violate this rule, the behavior is undefined.
It is also possible to initialize a function pointer from an integer (by using an explicit conversion) with implementation-defined results (as opposed to undefined results in other cases). However, an attempt to execute the "data" by making a call through a pointer obtained in such a way still leads to undefined behavior.
If you are willing to ignore the fact that the behavior is undefined, then the actual manifestations of that undefined behavior will look different on different platforms. On some platforms it might even appear to "work".
One could also imagine a superoptimizer doing this to test small assembler sequences against the specifications of the function it optimizes.