String overflow detection in C

We are using DevPartner's BoundsChecker to detect memory leak issues. It does a wonderful job, though it does not find string overflows like the following:
char szTest [1] = "";
for (i = 0; i < 100; i ++) {
strcat (szTest, "hi");
}
Question 1: Is there any way I can make BoundsChecker detect this?
Question 2: Is there any other tool that can detect such issues?

I tried it in my devpartner (msvc6.6) (devpartner 7.2.0.372)
I confirm your observed behavior.
I get an access violation after about 63 passes of the loop.
What does Compuware have to say about the issue?
CppCheck will detect this issue.

One option is to simply ban the use of string functions that don't have information about the destination buffer. A set of macros like the following in a universally included header can be helpful:
#define strcpy strcpy_is_banned_use_strlcpy
#define strcat strcat_is_banned_use_strlcat
#define strncpy strncpy_is_banned_use_strlcpy
#define strncat strncat_is_banned_use_strlcat
#define sprintf sprintf_is_banned_use_snprintf
So any attempted use of the 'banned' routines will result in a compiler or linker error (depending on whether the compiler accepts the undeclared name) that also tells you what you should use instead. MSVC has done something similar that can be controlled using macros like _CRT_SECURE_NO_DEPRECATE.
The drawback to this technique is that if you have a large set of existing code, it can be a huge chore to get things moved over to using the new, safer routines. It can drive you crazy until you've gotten rid of the functions considered dangerous.
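Since strlcat is not part of standard C, here is a minimal sketch of what a size-aware replacement could look like. The name bounded_strcat and its return convention are assumptions made for illustration, not an existing library API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function): append src to dst only if
 * the result, including the terminating NUL, fits in dst_size bytes.
 * Returns 0 on success, -1 if it would not fit; dst is then left unchanged. */
int bounded_strcat(char *dst, size_t dst_size, const char *src)
{
    const char *end;
    size_t used, need;

    assert(dst != NULL && src != NULL);

    /* Locate the existing terminator without reading past the buffer. */
    end = memchr(dst, '\0', dst_size);
    if (end == NULL)
        return -1;              /* dst is not NUL-terminated within bounds */

    used = (size_t)(end - dst);
    need = strlen(src);
    if (need >= dst_size - used)
        return -1;              /* no room for src plus the terminator */

    memcpy(dst + used, src, need + 1);  /* copies the terminator too */
    return 0;
}
```

With the szTest[1] buffer from the question, bounded_strcat(szTest, sizeof szTest, "hi") would refuse the very first append instead of writing past the array.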

valgrind will detect writing past dynamically allocated data, but I don't think it can do so for automatic arrays like in your example. If you are using strcat, strcpy, etc., you have to make sure that the destination is big enough.
Edit: I was right about valgrind, but there is some hope:
Unfortunately, Memcheck doesn't do bounds checking on static or stack arrays. We'd like to, but it's just not possible to do in a reasonable way that fits with how Memcheck works. Sorry.
However, the experimental tool Ptrcheck can detect errors like this. Run Valgrind with the --tool=exp-ptrcheck option to try it, but beware that it is not as robust as Memcheck.
I haven't used Ptrcheck.

You may find that your compiler can help. For example, in Visual Studio 2008, check the project properties - C/C++ - Code Generation page. There's a "Buffer Security Check" option.
My guess would be that it reserves a bit of extra memory and writes a known sequence in there. If that sequence gets modified, it assumes a buffer overrun. I'm not sure, though - I remember reading this somewhere, but I don't remember for certain if it was about VC++.

Given that you've tagged this C++, why use a pointer to char at all?
std::stringstream test;
std::fill_n(std::ostream_iterator<std::string>(test), 100, "hi");

If you enable the /RTCs compiler switch, it may help catch problems like this. With this switch on, the test caused an access violation when running the strcat only one time.
Another useful utility that helps with problems like this (more heap-oriented than stack but extremely helpful) is application verifier. It is free and can catch a lot of problems related to heap overflow.

An alternative: our Memory Safety Checker.
I think it will handle this case.

The problem was that by default, the API Validation subsystem is not enabled, and the messages you were interested in come from there.
I can't speak for older versions of BoundsChecker, but version 10.5 has no particular problems with this test. It reports the correct results and BoundsChecker itself does not crash. The test application does, however, because this particular test case completely corrupts the call stack that led to the function where the test code was, and as soon as that function terminated, the application did too.
The results: 100 messages about write overrun to a local variable, and 99 messages about the destination string not being null terminated. Technically, that second message is not right, but BoundsChecker only searches for the null termination within the bounds of the destination string itself, and after the first strcat call, it no longer contains a zero byte within its bounds.
Disclaimer: I work for MicroFocus as a developer working on BoundsChecker.

Related

Why can I use a char pointer without malloc?

I've programmed something similar and I'm wondering why it works...
char* produceAString(void){
char* myString;
while(somethingIsGoingOn){
//fill myString with a random amount of chars
}
return myString;
}
The theory tells me that I should use malloc to allocate space, when I'm using pointers. But in this case I don't know how much space I need for myString, therefore I just skipped it.
But why does this work? Is it just bad code, which luckily worked for me, or is there something special behind char pointers?
It worked due to pure chance. It might not work the next time you try it. Uninitialized pointers can point anywhere in memory. Writing to them can cause an instant access violation, or a problem that will manifest later, or nothing at all.
This is generally bad code, yes. Also, whatever compiler you use is probably not very smart, or has its warnings turned off, since compilers usually report an error or at least a warning such as "variable used uninitialized", which is exactly what is happening here.
You are in (bad) luck: when the code runs, the pointer holds garbage and somehow the OS allows the write (or read). Are you perhaps running in debug mode?
My personal experience is that in some cases it is predictable what the OS will do, but you should never rely on it. For example, if you build with MinGW in debug mode, uninitialized values usually follow a pattern or are zero; in a release build they are usually completely random junk.
Since a pointer "points to a memory location", it must point to a valid location, either another variable or space allocated at run time (malloc). You are doing neither, so you are reading and writing a random memory block, and by some black magic the app doesn't crash. Are you running on Windows? Windows 2000 or XP? As far as I know those are not as restrictive as Windows has been since Vista; I remember doing something similar under Windows XP, and nothing happened when it was supposed to crash.
So, in general, allocate or point to a memory block before you use the pointer. If you don't know how much memory you need, use realloc, or work out a strategy with the smallest footprint for your specific case.
One way to see what C actually does is to change this line
char* myString;
into
char* myString=(char*)0;
and break on that line with a debugger and watch the myString variable. Before the initialization it will hold junk; once initialized it will be 0, and the rest of your code will then fail with an access violation because you point "nowhere".
The normal operation would be
char* myString=(char*)malloc(125); // whatever amount you want
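When the final length is not known in advance, as in the question, the realloc approach mentioned above could look roughly like this. It is only a sketch: the struct dynstr type, the dynstr_push name, and the starting capacity of 16 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable string; start from {NULL, 0, 0}. */
struct dynstr {
    char  *data;   /* NUL-terminated once anything has been appended */
    size_t len;    /* characters stored, excluding the terminator */
    size_t cap;    /* bytes allocated */
};

/* Append one character, doubling the allocation when it is full.
 * Returns 0 on success, -1 if realloc fails (the old contents stay valid). */
int dynstr_push(struct dynstr *s, char c)
{
    assert(s != NULL);
    if (s->len + 2 > s->cap) {          /* need room for c and the NUL */
        size_t new_cap = s->cap ? s->cap * 2 : 16;
        char *p = realloc(s->data, new_cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }
    s->data[s->len++] = c;
    s->data[s->len] = '\0';
    return 0;
}
```

The caller owns s.data and has to free() it once the string is no longer needed.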

Tools and methods to identify/prevent static buffer overruns

Are there any tools or methods that can identify buffer overruns in statically defined arrays (ie. char[1234] rather than malloc(1234))?
I spent most of yesterday tracking down crashes and odd behaviour which ultimately turned out to be caused by the following line:
// ensure string is nul terminated due to stupid snprintf
error_msg[error_msg_len] = '\0';
This index obviously caused a write beyond the bounds of the array. This led to the clobbering of a pointer variable, leading to unexpected behaviour with that pointer later on.
The three things that come to mind that could help alleviate such problems are:
Code review
This wasn't done, but I'm working on that.
valgrind
I often use valgrind during development to detect memory problems but it does not deal with static arrays. In the above instance it only showed me the symptoms such as the invalid free() of the clobbered pointer.
-fstack-protector-all
In the past I have used -fstack-protector-all to detect overruns like the above but for some odd reason it didn't flag anything in this instance.
So can anyone offer any ideas on how I could identify such overruns? Either by improving on the above list or something completely new.
EDIT: Some of the answers so far have mentioned commercial products that are fairly expensive. At this stage I don't think I could convince the powers that be to buy such a tool so I'd like to restrict tools to cheap/free. Yes, you get what you pay for but some improvement is better than none.
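For the specific off-by-one in the question, one way to keep the terminating index in a single place is a small formatting wrapper. This is a sketch only: format_error is a hypothetical name, and it assumes the buffer size is passed in rather than a separately tracked length.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: format into dst and guarantee NUL termination.
 * Returns 1 if the message was truncated or formatting failed, else 0. */
int format_error(char *dst, size_t dst_size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(dst != NULL && dst_size > 0 && fmt != NULL);

    va_start(ap, fmt);
    n = vsnprintf(dst, dst_size, fmt, ap);
    va_end(ap);

    /* The last valid index is dst_size - 1, never dst_size. C99 vsnprintf
     * already terminates here; the store covers non-conforming variants. */
    dst[dst_size - 1] = '\0';

    return n < 0 || (size_t)n >= dst_size;
}
```

Called as format_error(error_msg, sizeof error_msg, "...", ...), the terminator can never land outside the array, provided error_msg is a true array and not a pointer.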
Static analyzer tools are able to detect some buffer overflows.
For example with this code:
char bla[1024];
int i;
for (i = 0; i <= 1024; i++)
bla[i] = 0;
Here is what PC-Lint / flexelint reports:
tst.c 9 Warning 661: Possible access of out-of-bounds pointer (1 beyond end of data) by operator '[' [Reference: file tst.c: lines 8, 9]
Have you tried the experimental Valgrind tool "SGCheck: an experimental stack and global array overrun detector" as opposed to the default "memcheck" tool?
http://valgrind.org/docs/manual/sg-manual.html
I haven't tried it myself but it appears to cover some of the types of bugs you are interested in.
Obviously, Valgrind does dynamic rather than static analysis which is a whole other discussion in itself.
Our CheckPointer tool is a dynamic analysis tool that will catch out-of-bounds errors on arrays no matter where they are.
CheckPointer catches many things Valgrind cannot, e.g., references outside of any struct or struct field regardless of where allocated, including static arrays as in the problem shown by OP. Valgrind can't detect such overruns, because it has no clue about the actual shape of the data being manipulated; this requires an understanding of the programming language, e.g., C. Valgrind can only detect references outside of allocated memory. CheckPointer can do this because it understands C intimately; it uses a full compiler-style C front end to collect type and critical size information.
It presently is available for C, but not yet for C++.

How does the run-time detect buffer overflow?

{
char bufBef[32];
char buf[8];
char bufAfter[32];
sprintf(buf,"AAAAAAA\0");
buf[8]='\0';
printf("%s\n",buf);
}
On Windows 7, I compiled the program with Visual Studio 2008 as a debug project. The 3 buffers are adjacent. I found their addresses with a debugger, as follows:
bufBef 0x001afa50
buf 0x001afa40
bufAfter 0x001afa18
The statement "buf[8]='\0'" writes to an address outside buf. When I ran the program, the operating system reported "Debug Error: Run-Time Check Failure #2 - Stack around the variable 'buf' was corrupted."
Then I compiled it as a release project. It ran quietly, with no error report raised.
My question is: how does the run-time detect buffer overflow?
What you see is the effect of the /RTCs switch.
John Robbins' book Debugging Applications for Microsoft .NET and Microsoft Windows talks about this in depth.
Relevant excerpt:
Fortunately for us, Microsoft extended the /RTCs switch to also do overrun and underrun checking of all multibyte local variables such as arrays. It does this by adding four bytes to the front and end of those arrays and checking them at the end of the function to ensure those extra bytes are still set to 0xCC.
Note that this switch only works in an unoptimized build (debug build).
In general, you don't. You should write defensive code that does the proper checks to ensure that it never overruns a buffer.
The debug runtime adds a large number of checks to help find this sort of bug (and all sorts of other common memory-related bugs); these checks are often very expensive, so they are only included in debug builds or when running attached to a debugger. They also can't detect every possible error, so they aren't foolproof; they are just debugging aids.
The Wikipedia article on Electric Fence explains how buffer overruns are caught, and why you should not use such mechanisms in production code.
Typically, the run-time will detect overflows like that by allocating some extra space between the variables, and filling that space with a known bit pattern. After your code runs, it looks at the bit pattern in that space. Since it's outside any variable, it should retain the same bit pattern. If the content has changed, you wrote somewhere you shouldn't have.
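The guard-pattern idea can be imitated by hand. The following is only a sketch of the principle, not how any particular compiler implements it: struct guarded_buf, the 4-byte guard length, and the function names are assumptions, and the 0xCC fill value is borrowed from the /RTCs excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD_BYTE 0xCC   /* assumed fill pattern, as in the /RTCs excerpt */
#define GUARD_LEN  4

/* Hypothetical layout: a buffer with guard bytes on both sides. */
struct guarded_buf {
    unsigned char before[GUARD_LEN];
    char          buf[8];
    unsigned char after[GUARD_LEN];
};

/* Fill both guard regions with the pattern and clear the buffer. */
void guarded_init(struct guarded_buf *g)
{
    assert(g != NULL);
    memset(g->before, GUARD_BYTE, sizeof g->before);
    memset(g->buf, 0, sizeof g->buf);
    memset(g->after, GUARD_BYTE, sizeof g->after);
}

/* Returns 1 if both guard regions still hold the pattern, 0 otherwise. */
int guarded_intact(const struct guarded_buf *g)
{
    size_t i;

    assert(g != NULL);
    for (i = 0; i < GUARD_LEN; i++) {
        if (g->before[i] != GUARD_BYTE || g->after[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

Calling guarded_intact at the end of the function that uses the buffer mirrors the check the debug run-time performs on function exit. Like the real mechanism, it only notices writes that actually land on a guard byte.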
The three buffers are not adjacent. The difference between the start of buf and the start of bufBef (the following item on the stack) is 16 bytes, but buf is only 8 bytes long.
The 8 bytes in between is presumably filled with an 8 byte "canary" value. When the runtime detects that the canary has been changed by your wild write, it raises the error you have seen.
(Your write to buf[8] writes to address 0x001afa48, which is in between buf and bufBef).
In debug mode the compiler inserts additional range checks for operations.
You need to understand the stack structure. The compiler usually places extra guard bytes containing a random cookie around arrays; if the value doesn't match at the end of the function, there was an overflow.
Well 0x001afa50 - 0x001afa40 = 0x10 = 16, and 0x001afa40 - 0x001afa18 = 0x28 = 40, so there's some space between the buffers for it to leave some known dummy data. If that's changed by the time the function ends, it knows you went beyond the end of the buffer. I'm just speculating -- they may have done it another way, but that seems one possibility.
C explicitly permits you to over-run (and under-run) your buffers, at your own peril.
There is no short-n-simple way to detect at run-time (in release builds) buffer overflows.
You're looking for a language different from C. Some languages define the behavior of every possible program, defining specific error behavior for doing things that are "wrong". C on the other hand leaves the behavior of "wrong" code undefined, which means it's up to the programmer to ensure that he/she never uses the language in ways that result in undefined behavior. Some implementations are debugging-oriented or have debugging modes that assist you in finding errors, which you absolutely need to fix before deploying the code in release/production use.

Need help with buffer overrun

I've got a buffer overrun I absolutely can't seem to figure out (in C). First of all, it only happens maybe 10% of the time or so. The data that it is pulling from the DB each time doesn't seem to be all that much different between executions... at least not different enough for me to find any discernible pattern as to when it happens. The exact message from Visual Studio is this:
A buffer overrun has occurred in hub.exe which has corrupted the program's internal state. Press Break to debug the program or Continue to terminate the program.
For more details please see Help topic 'How to debug Buffer Overrun Issues'.
If I debug, I find that it is broken in __report_gsfailure() which I'm pretty sure is from the /GS flag on the compiler and also signifies that this is an overrun on the stack rather than the heap. I can also see the function it threw this on as it was leaving, but I can't see anything in there that would cause this behavior, the function has also existed for a long time (10+ years, albeit with some minor modifications) and as far as I know, this has never happened.
I'd post the code of the function, but it's decently long and references a lot of proprietary functions/variables/etc.
I'm basically just looking for either some idea of what I should be looking for that I haven't or perhaps some tools that may help. Unfortunately, nearly every tool I've found only helps with debugging overruns on the heap, and unless I'm mistaken, this is on the stack. Thanks in advance.
You could try putting some local variables on either end of the buffer, or even sentinels into the (slightly expanded) buffer itself, and trigger a breakpoint if those values aren't what you think they should be. Obviously, using a pattern that is not likely in the data would be a good idea.
While it won't help you in Windows, Valgrind is by far the best tool for detecting bad memory behavior.
If you are debugging the stack, you need to get down to low-level tools: place a canary in the stack frame (perhaps a buffer filled with something like 0xA5) around any potential suspects. Run the program in a debugger and see which canaries no longer have the right size and contents. You will gobble up a large chunk of stack doing this, but it may help you spot exactly what is occurring.
One thing I have done in the past to help narrow down a mystery bug like this was to create a variable with global visibility named checkpoint. Inside the culprit function, I set checkpoint = 0; as the very first line. Then, I added ++checkpoint; statements before and after function calls or memory operations that I even remotely suspected might be able to cause an out-of-bounds memory reference (plus peppering the rest of the code so that I had a checkpoint at least every 10 lines or so). When your program crashes, the value of checkpoint will narrow down the range you need to focus on to a handful of lines of code. This may be a bit overkill, I do this sort of thing on embedded systems (where tools like valgrind can't be used) but it should still be useful.
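A minimal sketch of that checkpoint technique follows. The CHECKPOINT macro and suspect_function are hypothetical stand-ins; the real function would be the one reported by the debugger.

```c
#include <assert.h>
#include <string.h>

/* Globally visible so a debugger or crash dump can read it. */
volatile int checkpoint;

#define CHECKPOINT() (++checkpoint)

/* Hypothetical stand-in for the suspect function. */
void suspect_function(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    checkpoint = 0;
    CHECKPOINT();                      /* 1: before the copy */
    strncpy(dst, src, dst_size - 1);
    CHECKPOINT();                      /* 2: after the copy */
    dst[dst_size - 1] = '\0';
    CHECKPOINT();                      /* 3: after termination */
}
```

If the program dies inside the function, the value of checkpoint in the debugger shows the last marker that was reached. The volatile qualifier is there so the compiler does not optimize the increments away.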
Wrap it in an exception handler and dump out useful information when it occurs.
Does this program recurse at all? If so, I'd check there to ensure you don't have an infinite recursion bug. If you can't see it manually, sometimes you can catch it in the debugger by pausing frequently and observing the stack.

Strange code crash problem?

I have an MSVC 6.0 workspace, which has all C code.
The code is being run without any optimization switch, i.e. with option O0, and in debug mode.
This code was obtained from a 3rd party. It executes as desired as it is.
But when I add some printf statements in certain functions for debugging, and then execute the code, it crashes.
I suspect it to be some kind of code/data overflow across a memory-page/memory-segment or something alike. But the code does not have any memory map specifier, or linker command file mentioning the segments/memory map etc.
How do I narrow down the cause, and the fix for this quirky issue?
On Linux, I like valgrind. Here is a Stack Overflow thread for valgrind-like tools on Windows.
You could try to determine where the crash happens, by looking at the stack trace in Visual Studio. You should be able to see what is the sequence of function calls that eventually leads to the crash, and this may give you a hint as to what's wrong.
It is also possible that the printf() alone causes the crash. A possible cause, though not too likely on Windows, is a too-small stack that is overflowed by the call to printf().
Use CString::GetBuffer when printing CString objects with printf.
There could be a mismatch between wide-character and normal strings.
printf("%s", str.GetBuffer(0));
str.ReleaseBuffer();
In general when trying to deal with a crash, your first port of call should be the debugger.
Used correctly, this will enable you to narrow down your problem to a specific line of code and, hopefully, give you a view of the runtime memory at the moment of the crash. This will allow you to see the immediate cause of the crash.

Resources