GCC how to detect stack buffer overflow - c

Since there is an option -fstack-protector-strong in gcc to detect stack smashing. However, it can not always detect stack buffer overflow. For the first function func, when I input a 10 char more string, the program does not always crash. My question is where there is a way to detect stack buffer overflow.
void func()
{
char array[10];
gets(array);
}
void func2()
{
char buffer[10];
int n = sprintf(buffer, "%s", "abcdefghpapeas");
printf("aaaa [%d], [%s]\n", n, buffer);
}
int main ()
{
func();
func2();
}

Overflows on the stack are either hard to detect or very expensive to detect - chose your poison.
In a nutshell, when you have this:
char a,b;
char *ptr=&a;
ptr[1] = 0;
then this is technically legal: There is space allocated on the stack which belongs to the function. It's just very dangerous.
So the solution might be to add a gap between a and b and fill that with a pattern. But, well, some people actually write code as above. So your compiler needs to detect that.
Alternatively, we could create a bit-map of all bytes that your code has really allocated and then instrument all the code to check against this map. Very safe, pretty slow, bloats your memory usage. On the positive side, there are tools to help with this (like Valgrind).
See where I'm going?
Conclusion: In C, there is no good way to automatically detect many memory problems because the language and the API is often too sloppy. The solution is to move code into helper functions that check their parameters rigorously, always to the right thing and have good unit test coverage.
Always use snprintf() versions of functions if you have a choice. If old code uses the unsafe versions, change it.

You can use a tool called Valgrind
http://valgrind.org/

My question is where there is a way to detect stack buffer overflow...
void func()
{
char array[10];
gets(array);
}
void func2()
{
char buffer[10];
int n = sprintf(buffer, "%s", "abcdefghpapeas");
printf("aaaa [%d], [%s]\n", n, buffer);
}
Because you are using GCC, you can use FORTIFY_SOURCES.
FORTIFY_SOURCE uses "safer" variants of high risk functions like memcpy, strcpy and gets. The compiler uses the safer variants when it can deduce the destination buffer size. If the copy would exceed the destination buffer size, then the program calls abort(). If the compiler cannot deduce the destination buffer size, then the "safer" variants are not used.
To disable FORTIFY_SOURCE for testing, you should compile the program with -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.
The C Standard has "safer" functions via ISO/IEC TR 24731-1, Bounds Checking Interfaces. On conforming platforms, you can simply call gets_s and sprintf_s. They offer consistent behavior (like always ensuring a string is NULL terminated) and consistent return values (like 0 on success or an errno_t).
Unfortunately, gcc and glibc does not conform to the C Standard. Ulrich Drepper (one of the glibc maintainers) called bounds checking interfaces "horribly inefficient BSD crap", and they were never added. Hopefully it will change in the future.

First of all Do Not Use gets. By now almost everyone knows the all the security and reliability problems that can occur with gets. But it's included here for historical reasons as well because it's a very good example of bad programming.
Let's look at all the problems with the code:
// Really bad code
char line[100];
gets(line);
Because gets does not do bounds checking a string longer than 100 characters will overwrite memory. If you're lucky the program will just crash Or it might exhibit strange behavior.
The gets function is so bad that the GNU gcc linker issues a warning whenever it's used.
/tmp/ccI5WJ5m.o(.text+0x24): In function `main':
: warning: the `gets' function is dangerous and should not be used.
Protect array accesses with assert
C/C++ does not do bound checking.
for example:
int data[10]
i = 20
data[20] = 100 //Memory Corruption
Use the assert function for above code
#include<assert.h>
int data[10];
i=20
assert((i >= 0) && (i < sizeof(data) / sizeof(data[0]))); // throws
data[i] = 100
Array overflows are one of the most common programming errors and are extremely frustrating to try and locate. This code doesn't eliminate them, but it does cause buggy code to abort early in a way that makes the problem tremendously easier to find.
And use snprintf(buffer, sizeof(buffer), "%s", "abcdefghpapeas") and some tools like valgrind or GDB.
Hope this helps you..

Related

Do I really need malloc?

I understand that malloc is used to dynamically allocate memory. In my code, I have the following function that I sometimes call:
int memory_get_log(unsigned char day, unsigned char date, unsigned char month){
char fileName[11];
unsigned long readItems, itemsToRead;
F_FILE *file;
sprintf(fileName, "%s_%u%u%u%s", "LOG", day, date, month, ".bin");
file = f_open(fileName , "r");
itemsToRead = f_filelength( fileName );
//unsigned char *fileData = (unsigned char *) malloc(itemsToRead);
unsigned char fileData[itemsToRead]; //here I am not using malloc
readItems = f_read(fileData, 1, itemsToRead, file);
transmit_data(fileData, itemsToRead);
f_close(file);
return 0;
}
As you may see, the number of items I read from the file can be different each time. The line
unsigned char fileData[itemsToRead]; is used to read these variable sized files. I can see that I am allocating memory dynamically in some way. This function works fine. Do I really need to use malloc here?
Is there anything wrong with the way I am declaring this array?
TL;DR
If you don't know what you're doing, use malloc or a fixed size array in all situations. VLA:s are not necessary at all. And do note that VLA:s cannot be static nor global.
Do I really need to use malloc here?
Yes. You're reading a file. They are typically way bigger than what's suitable for a VLA. They should only be used for small arrays. If at all.
Long version
Is there anything wrong with the way I am declaring this array?
It depends. VLA:s was removed as a mandatory component from C11, so strictly speaking, you are using compiler extensions, thus reducing the portability. In the future, VLA:s might (The chance is probably extremely low) get removed from your compiler. Maybe you also want to recompile the code on a compiler without support for VLA:s. The risk analysis about this is up to you. But I might mention that the same is true for alloca. Although commonly available, it's not required by the standard.
Another problem is if the allocation fails. If you're using malloc, you have a chance to recover from this, but if you're only going to do something like this:
unsigned char *fileData = malloc(itemsToRead);
if(!fileData)
exit(EXIT_FAILURE);
That is, just exit on failure and not trying to recover, then it does not really matter. At least not from a pure recovery point of view.
But also, although the C standard does not impose any requirement that VLA:s end up on the stack or heap, as far as I know it's pretty common to put them on the stack. This means that the risk of failing the allocation due to insufficient available memory is much, much higher. On Linux, the stack is usually 8MB and on Windows 1MB. In almost all cases, the available heap is much higher. The declaration char arr[n] is basically the same as char *arr = alloca(n) with the exception of how the sizeof operator works.
While I can understand that you might want to use the sizeof operator on a VLA sometimes, I find it very hard to find a real need for it. Afterall, the size can never change, and the size is known when you do the allocation. So instead of:
int arr[n];
...
for(int i=0; i<sizeof(arr), ...
Just do:
const size_t size = n;
int arr[size];
...
for(int i=0; i<size; ...
VLA:s are not a replacement for malloc. They are a replacement for alloca. If you don't want to change a malloc to an alloca, then you should not change to a VLA either.
Also, in many situations where a VLA would seem to bee a good idea, it is ALSO a good idea to check if the size is below a certain limit, like this:
int foo(size_t n)
{
if(n > LIMIT) { /* Handle error */ }
int arr[n];
/* Code */
}
That would work, but compare it to this:
int foo(size_t n)
{
int *arr = malloc(n*sizeof(*arr));
if(!arr) { /* Handle error */ }
/* Code */
free(arr);
}
You did not really make things that much easier. It's still an error check, so the only thing you really got rid of was the free call. I might also add that it's a MUCH higher risk that a VLA allocation fails due to the size being too big. So if you KNOW that the size is small, the check is not necessary, but then again, if you KNOW that it is small, just use a regular array that will fit what you need.
However, I will not deny that there are some advantages of VLA:s. You can read about them here. But IMO, while they have those advantages they are not worth it. Whenever you find VLA:s useful, I would say that you should at least consider switching to another language.
Also, one advantage of VLA:s (and also alloca) is that they are typically faster than malloc. So if you have performance issues, you might want to switch to alloca instead of malloc. A malloc call involves asking the operating system (or something similar) for a piece of memory. The operating system then searches for that and returns a pointer if it finds it. An alloca call, on the other hand, is typically just implemented by changing the stack pointer in one single cpu instruction.
There are many things to consider, but I would avoid using VLA:s. If you ask me, the biggest risk with them is that since they are so easy to use, people become careless with them. For those few cases where I find them suitable, I would use alloca instead, because then I don't hide the dangers.
Short summary
VLA:s are not required by C11 and later, so strictly speaking, you're relying on compiler extensions. However, the same is true for alloca. So if this is a very big concern, use fixed arrays if you don't want to use malloc.
VLA:s are syntactic sugar (Not 100% correct, especially when dealing with multidimensional arrays) for alloca and not malloc. So don't use them instead of malloc. With the exception of how sizeof work on a VLA, they offer absolutely no benefit at all except for a somewhat simpler declaration.
VLA:s are (usually) stored on the stack while allocations done by malloc are (usually) stored on the heap, so a big allocation has a much higher risk to fail.
You cannot check if a VLA allocation failed or not, so it can be a good idea to check if the size is too big in advance. But then we have an error check just as we do with checking if malloc returned NULL.
A VLA cannot be global nor static. The static part alone will likely not cause any problems whatsoever, but if you want a global array, then you're forced to use malloc or a fixed size array.
This function works fine.
No it does not. It has undefined behavior. As pointed out by Jonathan Leffler in comments, the array fileName is too short. It would need to be at least 12 bytes to include the \0-terminator. You can make this a bit safer by changing to:
snprintf(fileName,
sizeof(fileName),
"%s_%u%u%u%s",
"LOG", day, date, month, ".bin");
In this case, the problem with the too small array would manifest itself by creating a file with extension .bi instead of .bin which is a better bug than undefined behavior, which is the current case.
You also have no error checks in your code. I would rewrite it like this. And for those who thinks that goto is bad, well, it usually is, but error handling is both practical and universally accepted among experienced C coders. Another common use is breaking out of nested loops, but that's not applicable here.
int memory_get_log(unsigned char day, unsigned char date, unsigned char month){
char fileName[12];
unsigned long readItems, itemsToRead;
int ret = 0;
F_FILE *file;
snprintf(fileName,
sizeof(fileName),
"%s_%u%u%u%s", "LOG",
day, date, month, ".bin");
file = f_open(fileName , "r");
if(!file) {
ret = 1;
goto END;
}
itemsToRead = f_filelength( fileName );
unsigned char *fileData = malloc(itemsToRead);
if(!fileData) {
ret=2;
goto CLOSE_FILE;
}
readItems = f_read(fileData, 1, itemsToRead, file);
// Maybe not necessary. I don't know. It's up to you.
if(readItems != itemsToRead) {
ret=3;
goto FREE;
}
// Assuming transmit_data have some kind of error check
if(!transmit_data(fileData, itemsToRead)) {
ret=4;
}
FREE:
free(fileData);
CLOSE_FILE:
f_close(file);
END:
return ret;
}
If a function only returns 0, then it's pointless to return anything. Declare it as void instead. Now I used the return value to make it possible for the caller to detect errors and the type of error.
firstly, the line 'unsigned char fileData[itemsToRead]' asks for memory on stack, and this will be terrible mistake if file size is big. You should use 'malloc' to ask memory on heap.
secondly, if file size is really big enough, you should coside to use virsual memory or dynamic load such as 'fseek' method.

How to call a void function with a pointer parameter?

I don't understant how I can call this function correctly in main.
I tried calling like this: X *p1=malloc(sizeof(X));read(&p1);
struct x
{ int c;
char n[250];
char u[150];
};
typedef struct x X;
void myread(X *p);
void myread(X *p)
{
scanf("%d",&p->c);
fgets(p->n,sizeof(p->n),stdin);
fgets(p->u,sizeof(p->u),stdin);
}
void main(){
X *p1;
p1=malloc(sizeof(X));
myread(p1);
}
The function as declared in your question (void read(X *p)) needs a pointer to X as parameter.
Your declaration of (X *p1) ensures that it is a pointer to X.
Your initialisation (p1=malloc(sizeof(X));) makes it a pointer to a suitably sized piece of useable memory.
That's it.
You then however use that appropriate pointer with the "take the address operator" &, which results in a pointer-to-pointer to X.
Calling the function with p1 as parameter would look like
read(p1);
I.e. just give the pointer to X as it is.
The function looks like it is supposed to fill the malloced space with values.
How that works and how it could be improved is not part of your question.
Don't use gets. It is dangerous and obsolete. Use fgets instead. Replace gets(p->n) with fgets(p->n, sizeof(p->n), stdin); and so on.
Don't name your function read (since that is the name of a POSIX standard function). Replace it by another name, e.g. myread.
You probably want to do:
X *p1=malloc(sizeof(X));
in your main. Then, you need to check that malloc succeeded:
if (!p1) { perror("malloc p1"); exit(EXIT_FAILURE); }
Beware that malloc gives (on success) some uninitialized memory zone. You may want to clear it using memset(p1, 0, sizeof(p1)), or you could use p1 = calloc(1, sizeof(X)) instead.
At last you can pass it to your myread:
myread(p1);
Don't forget to call free(p1) (e.g. near the end of your main) to avoid a memory leak.
Learn to use valgrind, it catches many memory related bugs.
Of course you need to carefully read the documentation of every standard function that you use. For example, fgets(3) documents that you need to #include <stdio.h>, and that call to fgets can fail (and your code needs to check that, see also errno(3) & perror(3)...). Likewise, malloc(3) wants #include <stdlib.h> and should be checked. And scanf(3) can fail too, and needs to be checked.
You should compile with all warnings and debug info (gcc -Wall -Wextra -g with GCC), improve your code to get no warnings, and you should use the gdb debugger; you may want to use GCC sanitizers (e.g. instrumentation options like -fsanitize=address, -fsanitize=undefined and others),
Beware of undefined behavior (UB). It is really scary.
PS. I hope that you are using Linux with gcc, since it is a very developer friendly system. If not, adapt my answer to your operating system and compiler and debugging tools.

Does fscanf allocate memory and place a NUL byte at the end of a string?

If I am correct, doing something like:
char *line;
Then I must allocate some memory and assign it to line, is that right? If I am right, the question is the following:
In a line like
while (fscanf(fp,"%[^\n]", line) == 1) { ... }
without assigning any memory to line I am still getting the correct lines and the correct strlen counts on such lines.
So, does fscanf assign that memory for me and it also places the '\0'?
I saw no mention about these 2 things on the spec for fscanf.
The POSIX scanf() family of functions will allocate memory if you use the m (a on some older pre-POSIX versions) format modifier. Note: when fscanf allocates, it expects a char ** pointer. (see man scanf) E.g.:
while(fscanf(fp,"%m[^\n]", &line) == 1) { ... }
I would also suggest consuming the newline with "%m[^\n]%*c". I agree with the other answers that suggest using line-oriented input instead of fscanf. (e.g. getline -- see: Basile's answer)
See the C FAQ:
Q: I just tried the code
char *p;
strcpy(p, "abc");
and it worked. How? Why didn't it crash?
A: You got lucky, I guess. The memory randomly pointed to by the uninitialized pointer p happened to be writable by you, and apparently was not already in use for anything vital. See also question 11.35.
And, here is a longer explanation, and another longer explanation.
To read entire lines with a recent C library on POSIX systems, you should use getline(3). It allocates (and reallocates) the buffer holding the line as needed. See the example on the man page.
If you have a non-POSIX system without getline you might use fgets(3) but then you have to take the pain to allocate the line itself, test that you did not read a full newline terminated line, and repeat. Then you need to pre-allocate some line buffer (using e.g. malloc) before calling fgets (and you might realloc it if a line does not fit and call fgets again). Something like:
//// UNTESTED CODE
size_t linsiz=64;
char* linbuf= malloc(linsiz);
if (!linbuf) { perror("malloc"); exit(EXIT_FAILURE); };
memset(linbuf, 0, sizeof(linbuf));
bool gotentireline= false;
char* linptr= linbuf;
do {
if (!fgets(linptr, linbuf+linsiz-linptr-1, stdin))
break;
if (strchr(linptr, '\n'))
gotentireline= true;
else {
size_t newlinsiz= 3*linsiz/2+16;
char* newlinbuf= malloc(newlinsiz);
int oldlen= strlen(linbuf);
if (!newlinbuf) { perror("malloc"); exit(EXIT_FAILURE); };
memset (newlinbuf, 0, newlinsiz); // might be not needed
strncpy(newlinbuf, linbuf, linsiz);
free (linbuf);
linbuf= newlinbuf;
linsiz= newlinsiz;
linptr= newlinbuf+oldlen;
);
} while(!gotentireline);
/// here, use the linbuf, and free it later
A general rule would be to always initialize pointers (e.g. declare char *line=NULL; in your case), and always test for failure of malloc, calloc, realloc). Also, compile with all warnings and debug info (gcc -Wall -Wextra -g). It could have give a useful warning to you.
BTW, I love to clear every dynamically allocated memory, even when it is not very useful (because the behavior is then more deterministic, making programs easier to debug).
Some systems also have valgrind to help detecting memory leaks, buffer overflows, etc..
line is uninitialized and doen't point to any valid memory location so what you see is undefined behavior.
You need to allocate memory for your pointer before writing something to it.
PS: If you are trying to read the whole line then fgets() is a better option.Note that fgets() comes with a newline character .
Nope. You're just getting lucky undefined behavior. It's completely possible that an hour from now, the program will segfault instead of run as expected or that a different compiler or even the same compiler on a different machine will produce a program that behaves differently. You can't expect things that aren't specified to happen, and if you do, you can't expect it to be any form of reliable.

C Stack-Allocated String Scope

For straight C and GCC, why doesn't the pointed-to string get corrupted here?
#include <stdio.h>
int main(int argc, char *argv[])
{
char* str_ptr = NULL;
{
//local to this scope-block
char str[4]={0};
sprintf(str, "AGH");
str_ptr = str;
}
printf("str_ptr: %s\n", str_ptr);
getchar();
return 0;
}
|----OUTPUT-----|
str_ptr: AGH
|--------------------|
Here's a link to the above code compiled and executed using an online compiler.
I understand that if str was a string literal, str would be stored in the bss ( essentially as a static ), but sprintf(ing) to a stack-allocated buffer, I thought the string buffer would be purely stack-based ( and thus the address meaningless after leaving the scope block )? I understand that it may take additional stack allocations to over-write the memory at the given address, but even using a recursive function until a stack-overflow occurred, I was unable to corrupt the string pointed to by str_ptr.
FYI I am doing my testing in a VS2008 C project, although GCC seems to exhibit the same behavior.
While nasal lizards are a popular part of C folklore, code whose behaviour is undefined can actually exhibit any behaviour at all, including magically resuscitating variables whose lifetime has expired. The fact that code with undefined behaviour can appear to "work" should neither be surprising nor an excuse to neglect correcting it. Generally, unless you're in the business of writing compilers, it's not very useful to examine the precise nature of undefined behaviour in any given environment, especially as it might be different after you blink.
In this particular case, the explanation is simple, but it's still undefined behaviour, so the following explanation cannot be relied upon at all. It might at any time be replaced with reptilian emissions.
Generally speaking, C compilers will make each function's stack frame a fixed size, rather than expanding and contracting as control flow enters and leaves internal blocks. Unless called functions are inlined, their stack frames will not overlap with the stack frame of the caller.
So, in certain C compilers with certain sets of compile options and except for particular phases of the moon, the character array str will not be overwritten by the call to printf, even though the variable's lifetime has expired.
Most likely the compiler does some sort of simple optimizations resulting in the string still being in the same place on the stack. In other words, the compiler allows the stack to grow to store 'str'. But it doesn't shrink the stack in the scope of main, because it is not required to do so.
If you really want to see the result of saving the address of variables on the stack, call a function.
#include <stdio.h>
char * str_ptr = NULL;
void onstack(void)
{
char str[4] = {0};
sprintf(str,"AGH");
str_ptr = str;
}
int main(int argc, char *argv[])
{
onstack();
int x = 0x61626364;
printf("str_ptr: %s\n", str_ptr);
printf("x:%i\n",x);
getchar();
return 0;
}
With gcc -O0 -std=c99 strcorrupt.c I get random output on the first printf. It will vary from machine to machine and architecture to architecture.

Why does it NOT give a segmentation violation?

The code below is said to give a segmentation violation:
#include <stdio.h>
#include <string.h>
void function(char *str) {
char buffer[16];
strcpy(buffer,str);
}
int main() {
char large_string[256];
int i;
for( i = 0; i < 255; i++)
large_string[i] = 'A';
function(large_string);
return 1;
}
It's compiled and run like this:
gcc -Wall -Wextra hw.cpp && a.exe
But there is nothing output.
NOTE
The above code indeed overwrites the ret address and so on if you really understand what's going underneath.
The ret address will be 0x41414141 to be specific.
Important
This requires profound knowledge of stack
You're just getting lucky. There's no reason that code has to generate a segmentation fault (or any other kind of error). It's still probably a bad idea, though. You can probably get it to fail by increasing the size of large_string.
Probably in your implementation buffer is immediately below large_string on the stack. So when the call to strcpy overflows buffer, it's just writing most of the way into large_string without doing any particular damage. It will write at least 255 bytes, but whether it writes more depends what's above large_string (and the uninitialised value of the last byte of large_string). It seems to have stopped before doing any damage or segfaulting.
By fluke, the return address of the call to function isn't being trashed. Either it's below buffer on the stack or it's in a register, or maybe the function is inlined, I can't remember what no optimisation does. If you can't be bothered to check the disassembly, I can't either ;-). So you're returning and exiting without problems.
Whoever said that code would give a segfault probably isn't reliable. It results in undefined behaviour. On this occasion, the behaviour was to output nothing and exit.
[Edit: I checked on my compiler (GCC on cygwin), and for this code it is using the standard x86 calling convention and entry/exit code. And it does segfault.]
You're compiling a .cpp (c++) program by invoking gcc (instead of g++)... not sure if this is the cause, but on a linux system (it appears your running on windows due to the default .exe output) it throws the following error when trying to compile as you have stated:
/tmp/ccSZCCBR.o:(.eh_frame+0x12): undefined reference to `__gxx_personality_v0'
collect2: ld returned 1 exit status
Its UB ( undefined behavior).
Strcpy might have copied more bytes into memory pointed by buffer and it might not cause problem at that moment.
It's undefined behavior, which means that anything can happen. The program can even appear to work correctly.
It seem that you just happen to not overwrite any parts of memory that are still needed by the rest of the (short) program (or are out of the programs address space/write protected/...), so nothing special happens. At least nothing that would lead to any output.
There's a zero byte on the stack somewhere that stops the strcpy() and there's enough room on the stack not to hit protected page. Try printing out strlen(buffer) in that function. In any case the result is undefined behavior.
Get into habit of using strlcpy(3) family of functions.
You can test this in other ways:
#include <stdlib.h>
int main() {
int *a=(int *)malloc(10*sizeof(int));
int i;
for (i=0;i<1000000; i++) a[i] = i;
return 0;
}
In my machine, this causes SIGSEGV only at around i = 37000! (tested by inspecting the core with gdb).
To guard against these problems, test your programs using a malloc debugger... and use lots of mallocs, since there are no memory debugging libraries that I know of that can look into static memory. Example: Electric Fence
gcc -g -Wall docore.c -o c -lefence
And now the SIGSEGV is triggered as soon as i=10, as would be expected.
As everyone says, your program has undefined behaviour. In fact your program has more bugs than you thought it did, but after it's already undefined it doesn't get any further undefined.
Here's my guess about why there was no output. You didn't completely disable optimization. The compiler saw that the code in function() doesn't have any defined effect on the rest of the program. The compiler optimized out the call to function().
Odds are that the long string is, in fact, terminated by the zero byte in i. Assuming that the variables in main are laid out in the order they are declared -- which isn't required by anything in the language spec that I know of but seems likely in practice -- then large_string would be first in memory, followed by i. The loop sets i to 0 and counts up to 255. Whether i is stored big-endian or little-endian, either way it has a zero byte in it. So in traversing large_string, at either byte 256 or 257 you'll hit a null byte.
Beyond that, I'd have to study the generated code to figure out why this didn't blow. As you seem to indicate, I'd expect that the copy to buffer would overwrite the return address from the strcpy, so when it tried to return you'd be going into deep space some where and would quickly blow up on something.
But as others say, "undefined" means "unpredictable".
There may be anything in your 'char buffer[16]', including \0. strcpy copies till it finds first \0 - thus not going above your boundary of 16 characters.

Resources