Running a little internal CTF to teach people some computer security basics and I've run into a strange behavior. The following is the handle function of a forking TCP server. It is just a cute little buffer overflow demonstration (taken from CSAW CTF).
When testing, I only ever bothered sending it 4097 bytes worth of data, because that will successfully overflow into the backdoor variable. However, many of the participants decided to try to send exactly 4099 bytes and this doesn't actually work. I'm not entirely sure why.
In GDB, recving 4099 bytes works just fine, but otherwise it does not. I've spent a good amount of time debugging this now, as I'd like a good explanation for everybody as to why the service behaved as it did. Is it some sort of quirk with the recv() call or am I doing something fundamentally wrong here?
void handle(int fd)
{
    int backdoor = 0;
    char attack[4096];
    send(fd, greeting, strlen(greeting), 0);
    sleep(3);
    recv(fd, attack, 0x1003, 0);
    if (backdoor)
    {
        dup2(fd, 0); dup2(fd, 1); dup2(fd, 2);
        char* argv[] = {"/bin/cat", "flag", NULL};
        execve(argv[0], argv, NULL);
        exit(0);
    }
    send(fd, nope, strlen(nope), 0);
}
Edit
The executable was compiled with:
clang -o backdoor backdoor.c -O0 -fno-stack-protector
I did not use different optimization settings for debugging / the live executable. I can run the following command:
python -c "print 'A'*4099" | nc <ip> <port>
and this will not work. I then attach to the running process via GDB (setting a breakpoint directly after the recv call) and run the above command again and it does work. I have repeated this multiple times with some variations, yet the same results.
Could it be something to do with the way the OS queues excess bytes sent to the socket? When I send 4099 bytes with the above command, I am actually sending 4100 (Python's print appends a newline implicitly). This means the trailing newline gets truncated by recv and is left behind for the next call to recv to clean up. I still can't figure out how GDB could influence this at all, but it's just a theory.
... am I doing something fundamentally wrong here?
Yes, you are expecting undefined behaviour to be predictable. It isn't.
If you compile that function with gcc, using -O3, then you'll get a warning about exceeding the size of the receive buffer (but of course you already knew that); but you'll also get a binary which does not actually bother to check backdoor. If you use clang, you don't get the warning, but you get a binary which doesn't even allocate space for backdoor.
The reason is clear: modifying backdoor through a "backdoor" is undefined behaviour, and the compiler is under no obligation to do anything you might consider logical or predictable in the face of undefined behaviour. In particular, it's allowed to assume that the undefined behaviour never happens. Since no valid program could mutate backdoor, the compiler is allowed to assume that backdoor never gets mutated, and hence it can ditch the code inside the if block as unreachable.
You don't mention how you're compiling this program, but if you're compiling without optimization to use gdb and with optimisation when you don't plan to use gdb, then you should not be surprised that undefined behaviour is handled differently. On the other hand, even if you are compiling the program with the same compiler and options in both cases, you still shouldn't be surprised, since undefined behaviour is, as it says, undefined.
Declaring backdoor as volatile might prevent the optimization. Although that's hardly the point, is it?
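As a hedged sketch (not part of the original challenge code), this is roughly what the declaration would look like with volatile applied; the overflow is still undefined behaviour, but the compiler must now actually re-read backdoor from memory, so the if block tends to survive optimization:
void handle(int fd)
{
    volatile int backdoor = 0;   /* volatile: the compiler must reload this from memory */
    char attack[4096];
    /* ... greeting, sleep and the oversized recv exactly as in the question ... */
    recv(fd, attack, 0x1003, 0); /* still writes past attack; still undefined behaviour */
    if (backdoor)                /* the check is no longer assumed to be dead code */
    {
        /* ... */
    }
}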
Note: I'm using gcc version 4.8.1 and clang version 3.4. Different versions (and even different builds) might have different results.
Related
Consider this demo programme:
#include <string.h>
#include <unistd.h>
typedef struct {
    int a;
    int b;
    int c;
} mystruct;

int main() {
    int TOO_BIG = getpagesize();
    int SIZE = sizeof(mystruct);
    mystruct foo = {
        123, 323, 232
    };
    mystruct bar;
    memset(&bar, 0, SIZE);
    memcpy(&bar, &foo, TOO_BIG);
}
I compile this two ways:
gcc -O2 -o buffer -Wall buffer.c
gcc -g -o buffer_debug -Wall buffer.c
i.e. the first time with optimizations enabled, the second time with debug flags and no optimization.
The first thing to notice is that there are no warnings when compiling, despite getpagesize returning a value that will cause a buffer overflow in memcpy.
Secondly, running the first programme produces:
*** buffer overflow detected ***: terminated
Aborted (core dumped)
whereas the second produces
*** stack smashing detected ***: terminated
Aborted (core dumped)
or, and you'll have to believe me here since I can't reproduce this with the demo programme, sometimes no warning at all. The programme doesn't even interrupt, it runs as normal. This was a behaviour I encountered with some more complex code, which made it difficult to debug until I realised that there was a buffer overflow happening.
My question is: why are there two different behaviours with different build flags? And why does this sometimes execute with no errors when built as a debug build, but always errors when built with optimizations?
..I can't reproduce this with the demo program, sometimes no warning at all...
The rules on undefined behavior are very broad; there is no requirement for the compiler to issue any warnings for a program that exhibits this behavior.
why are there two different behaviours with different build flags? And why does this sometimes execute with no errors when built as a debug build, but always errors when built with optimizations?
Compiler optimizations tend to optimize away unused variables. If I compile your code with optimizations enabled, I don't get a segmentation fault; looking at the assembly (link above), you'll notice that the problematic variables are optimized away and memcpy doesn't even get called, so there is no reason for it not to run successfully, and the program exits with success code 0. If I don't optimize it, the undefined behavior manifests itself and the program exits with code 139, the classic segmentation fault exit code.
As you can see these results are different from yours and that is one of the features of undefined behavior, different compilers, systems or even compiler versions can behave in a completely different way.
Accessing memory behind what's been allocated is undefined behavior, which means the compiler is allowed to do anything. When there are no optimizations, the compiler may try to guess and do something reasonable. When optimizations are turned on, the compiler may take advantage of the fact that any behavior is allowed to do something that runs faster.
The first thing to notice is that there are no warnings when compiling, despite getpagesize returning a value that will cause buffer overflow with memcpy.
That is the programmer's responsibility to fix, not the compiler's. You'll be very lucky if a compiler manages to find potential buffer overflows for you. Its job is to check that your code is valid C and then translate it to machine code.
If you want a tool that catches bugs, they are called static analysers, and that's a different type of program. To some extent, static analysis might be integrated into a compiler as a feature. There is one for clang, but most static analysers are commercial tools and not open source.
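As a hedged example (assuming Clang's static analyzer is installed), it can be driven by prefixing the ordinary build command with scan-build, which then reports the issues it finds while the code is compiled:
scan-build gcc -O2 -o buffer -Wall buffer.c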
Secondly, running the first programme produces: ... whereas the second produces
Undefined behavior simply means there is no defined behavior (see: What is undefined behavior and how does it work?). Meaning there's not likely anything to learn from examining the results, no interesting mystery to solve. In one case it apparently accessed forbidden memory, in the other case it mangled a poor little "stack canary". The difference will be related to different memory layouts. Who cares - bugs are bugs. Focus on why the bug happened (you already know!), instead of trying to make sense of the undefined results.
Now when I run your code with optimizations actually enabled for real (gcc -O2 on an x86 Linux), the compiler gives me
main:
subq $8, %rsp
call getpagesize
xorl %eax, %eax
addq $8, %rsp
ret
With optimizations actually enabled, it didn't even bother calling memcpy & friends because there are no side effects and the variables aren't used, so they can be safely removed from the executable.
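As a hedged tweak (not the asker's original code), giving the copy an observable result keeps it from being removed as dead code, so the overflow typically survives -O2 and can still be triggered; it is, of course, still undefined behaviour and only useful as a demonstration:
#include <stdio.h>
#include <string.h>
#include <unistd.h>

typedef struct { int a; int b; int c; } mystruct;

int main() {
    int TOO_BIG = getpagesize();
    mystruct foo = { 123, 323, 232 };
    mystruct bar;
    memcpy(&bar, &foo, TOO_BIG);  /* overflowing copy can no longer be discarded... */
    printf("%d\n", bar.a);        /* ...because its result is now observable */
    return 0;
}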
I have an embedded project that requires at some point that I write to address 0. So naturally I try:
*(int*)0 = 0 ;
But at optimisation level 2 or higher, the gcc compiler rubs its hands and says, in effect, "That is undefined behaviour! I can do what I like! Bwahaha!" and emits an invalid instruction to the code stream!
Here is my source file:
void f (void)
{
    *(int*)0 = 0 ;
}
and here is the output listing:
.file "bug.c"
.text
.p2align 4,,15
.globl _f
.def _f; .scl 2; .type 32; .endef
_f:
LFB0:
.cfi_startproc
movl $0, 0
ud2 <-- Invalid instruction!
.cfi_endproc
LFE0:
.ident "GCC: (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 7.3.0"
My question is: Why would anybody do this? What possible benefit could accrue from sabotaging code like this? Surely the obvious course of action is to issue a warning and carry on compiling?
I know the compiler is allowed to do this, I just wonder about the motivation of the compiler writer. It cost me two days and four engineering samples to track this down, so I'm a little peeved.
Edited to add: I have worked around this by using assembly language. So I'm not looking for solutions. I'm just curious why anybody would think this compiler behaviour was a good idea.
(Disclaimer: I'm not an expert on GCC internals, and this is more of a "post hoc" attempt to explain its behavior. But maybe it will be helpful.)
the gcc compiler rubs its hands and says, in effect, "That is undefined behaviour! I can do what I like! Bwahaha!" and emits an invalid instruction to the code stream!
I won't deny that there are cases where GCC does more or less that, but here there's a little more going on, and there is some method to its madness.
As I understand it, GCC isn't treating the null dereference as totally undefined here; it is making some assumptions about what it does. Its handling of null dereferences is controlled by a flag called -fdelete-null-pointer-checks, which is probably enabled by default when you turn on optimizations. From the manual:
-fdelete-null-pointer-checks
Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. This option enables simple constant folding optimizations at all optimization levels. In addition, other optimization passes in GCC use this flag to control global dataflow analyses that eliminate useless checks for null pointers; these assume that a memory access to address zero always results in a trap, so that if a pointer is checked after it has already been dereferenced, it cannot be null.
Note however that in some environments this assumption is not true. Use -fno-delete-null-pointer-checks to disable this optimization for programs that depend on that behavior.
This option is enabled by default on most targets. On Nios II ELF, it defaults to off. On AVR, CR16, and MSP430, this option is completely disabled.
Passes that use the dataflow information are enabled independently at different optimization levels.
So, if you are intending to actually access address 0, or if for some other reason your code will go on executing after the dereference, then you want to disable this with -fno-delete-null-pointer-checks. That will achieve the "carry on compiling" part of what you want. It will not give you warnings, however, presumably under the assumption that such dereferences are intentional.
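For instance, a hedged example using the same bug.c as above:
gcc -O2 -fno-delete-null-pointer-checks -c bug.c
should keep the store to address 0 and emit the code following it normally, because the compiler no longer assumes the dereference must trap.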
But under default options, why are you seeing the generated code that you do, with the undefined instruction, and why isn't there a warning? I would guess that GCC's logic is running as follows:
Because -fdelete-null-pointer-checks is in effect, the compiler assumes that execution will not continue past the null dereference, but instead will trap. How the trap will be handled, it doesn't know: maybe program termination, maybe a signal or exception handler, maybe a longjmp up the stack. The null dereference itself is emitted as requested, perhaps under the assumption that you are intentionally exercising your trap handler. But either way, whatever code comes after the null dereference is now unreachable.
So now it does what any reasonable optimizing compiler does with unreachable code: it doesn't emit it. In your case, that's nothing but a ret, but whatever it is, as far as GCC is concerned it would just be wasted bytes of memory, and should be omitted.
You might think you should get a warning here, but GCC has a longstanding design decision not to warn about unreachable code, on the grounds that such warnings tended to be inconsistent and the false positives would do more harm than good. See for instance https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html.
However, as a safety feature, GCC emits an undefined instruction (ud2 on x86) in place of the omitted unreachable code. The idea, I believe, is that just in case execution somehow does continue past the null dereference, it is better for the program to die, than to go off into the weeds and try to execute whatever memory contents happen to come next. (And indeed this can happen even on systems that do unmap the zero page; for instance, if you do struct huge *p = NULL; p->x = 0;, GCC understands this as a null dereference, even though p->x may not be on the zero page at all, and could conceivably be located at an accessible address.)
There is a warning flag, -Wnull-dereference, that will trigger a warning on your blatant null dereference. However, it only works if -fdelete-null-pointer-checks is enabled.
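For example, a hedged invocation that should produce that warning (optimization keeps -fdelete-null-pointer-checks on by default, so the warning can fire):
gcc -O2 -Wnull-dereference -c bug.c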
When would GCC's behavior be useful? Here's an example, maybe contrived, but it might get the idea across. Imagine your program has some allocation function that might fail:
struct foo *p = get_foo();
// do other stuff for a while
if (!p) {
    // 5000 lines of elaborate backup plan in case we can't get a foo
}
frob(p->bar);
Now imagine that you redesign get_foo() so that it can't fail. You forget to take out your "backup plan" code, but you go ahead and use the returned object right away:
struct foo *p = get_foo();
frob(p->bar);
// do other stuff for a while
if (!p) {
    // 5000 lines of elaborate backup plan in case we can't get a foo
}
The compiler doesn't know, a priori, that get_foo() will always return a valid pointer. But it can see that you've dereferenced it, and thus can assume that execution will only continue past that point if the pointer was not null. Therefore, it can tell that the elaborate backup plan is unreachable and should be omitted, which will save you a lot of bloat in your binary.
Incidentally, a word about the situation with clang: although, as Eric Postpischil points out, you do get a warning, what you don't get is an actual load from address 0; clang omits it and just emits ud2. This is what "doing whatever it likes" would really look like, and if you were hoping to exercise your page zero trap handler, you are out of luck.
In describing Undefined Behavior, the Standard refers to it as resulting "upon use of a nonportable or erroneous program construct or of erroneous data", and the authors of the Standard clarify their intentions in the published Rationale: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." The question of when to extend the language in such fashion, treating various forms of UB as non-portable but correct, was left as a Quality of Implementation issue outside the Standard's jurisdiction.
The maintainers of clang and gcc take the view that the phrase "non-portable or erroneous" should be interpreted as synonymous with "erroneous", since the Standard would not forbid such an interpretation. If a compiler will never be used to process non-portable programs, and will never be fed erroneous data, such an interpretation will sometimes allow it to process some strictly conforming programs, fed exclusively valid data, more quickly than would otherwise be possible, at the expense of making it less suitable for other purposes. I personally would view the range of programs that a compiler can usefully process reasonably efficiently as a much better metric of quality than the efficiency with which a compiler can process strictly conforming programs, but people who are using compilers for different purposes may have different views about what would make a compiler more or less useful for those purposes.
I'm aware that in C you may write beyond the end of allocated memory, and that instead of crashing this just leads to undefined behaviour, but somehow after testing many times, even with loops, and other variables, the output is always exactly as expected.
Specifically, I've been writing an integer beyond the bounds of malloc(1), like so:
int *x = malloc(1);
*x = 123456789;
It's small enough to fit in 4 bytes (my compiler warns me that it will overflow if it's too large, which makes sense), but still clearly larger than one byte; however, it still somehow works. I haven't been able to run a single test that didn't either work in a very "defined"-looking manner or segfault immediately. Such tests include repeatedly recompiling and running the program and outputting the value of x, trying to write over it with a giant array, and trying to write over it with an array of length 0, going beyond its boundaries.
After seeing this, I immediately went and tried to edit a string literal, which should be read-only. But somehow, it worked, and seemed consistent also.
Can someone recommend a test I may use to demonstrate undefined behaviour? Is my compiler (Mingw64 on Windows 10) somehow doing something to make up for my perceived stupidity? Where are the nasal demons?
The term "Undefined Behavior" embodies two different concepts: actions whose behavior isn't specified by anything, and actions whose behavior isn't specified by the C Standard, but is specified by many implementations. While some people, including the maintainers of some compilers, refuse to acknowledge the existence of the second category, the authors of the Standard described it explicitly:
Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.
On most implementations, your program would be an example of the first kind. Implementations will typically, for their own convenience, pad small allocation requests up to a certain minimum size, and will also pad larger allocation requests if needed to make them a multiple of a certain size. They generally do not document this behavior, however. Your code should only be expected to behave meaningfully on an implementation which documents the behavior of malloc in sufficient detail to guarantee that the requisite amount of space will be available; on such an implementation, your code would invoke UB of the second type.
Many kinds of tasks would be impossible or impractical without exploiting the second kind of UB, but such exploitation generally requires disabling certain compiler optimizations and diagnostic features. I can't think of any reason why code that wanted space for 4 bytes would only malloc one, unless it was designed to test the behavior of an allocator which would use the storage immediately past the end of an allocation for a particular documented purpose.
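To the question's request for a test that demonstrates the problem reliably: a hedged suggestion, assuming a toolchain with AddressSanitizer support (gcc or clang on Linux, rather than Mingw64), is to compile the snippet with -fsanitize=address. The sanitizer instruments heap accesses and will typically abort with a heap-buffer-overflow report at the offending store instead of silently appearing to work:
gcc -g -fsanitize=address demo.c -o demo   # demo.c is a hypothetical file holding the malloc(1) snippet
./demo                                     # should abort with an AddressSanitizer report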
One of the trademarks of undefined behavior is that the same code can behave differently on different compilers or with different compiler settings.
Given this code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int *x = malloc(1);
    x[100000] = 123456789;
    return 0;
}
If I compile this on my local machine with -O0 and run it, the code segfaults. If I compile with -O3, it doesn't.
[dbush@centos72 ~]$ gcc -O0 -Wall -Wextra -o x1 x1.c
[dbush@centos72 ~]$ ./x1
Segmentation fault (core dumped)
[dbush@centos72 ~]$ gcc -O3 -Wall -Wextra -o x1 x1.c
[dbush@centos72 ~]$ ./x1
[dbush@centos72 ~]$
Of course, this is just on my machine. Yours may do something entirely different.
I know that when you do certain things in a C program, the results are undefined. However, the compiler should not be generating invalid (machine) code, right? It would be reasonable if the code did the wrong thing, or if the code generated a segfault or something...
Is this supposed to happen according to the compiler spec, or is it a bug in the compiler?
Here's the (simple) program I'm using:
int main() {
    char *ptr = 0;
    *(ptr) = 0;
}
I'm compiling with -O3. That shouldn't generate invalid hardware instructions though, right? With -O0, I get a segfault when I run the code. That seems a lot more sane.
Edit: It's generating a ud2 instruction...
The ud2 instruction is a "valid instruction": it stands for Undefined Instruction and generates an invalid opcode exception. clang, and apparently gcc, can generate this code when a program invokes undefined behavior.
From the clang link above the rationale is explained as follows:
Stores to null and calls through null pointers are turned into a __builtin_trap() call (which turns into a trapping instruction like "ud2" on x86). These happen all of the time in optimized code (as the result of other transformations like inlining and constant propagation) and we used to just delete the blocks that contained them because they were "obviously unreachable".
While (from a pedantic language lawyer standpoint) this is strictly true, we quickly learned that people do occasionally dereference null pointers, and having the code execution just fall into the top of the next function makes it very difficult to understand the problem. From the performance angle, the most important aspect of exposing these is to squash downstream code. Because of this, clang turns these into a runtime trap: if one of these is actually dynamically reached, the program stops immediately and can be debugged. The drawback of doing this is that we slightly bloat code by having these operations and having the conditions that control their predicates.
At the end of the day, once you are invoking undefined behavior, the behavior of your program is unpredictable. The philosophy here is that it is probably better to crash hard, give the developer an indication that something is seriously wrong, and allow them to debug from the right point, than to produce a program that seems to work but is actually broken.
As Ruslan notes, it is "valid" in the sense that it guaranteed to raise an invalid opcode exception as opposed to other unused sequences which may in the future become valid.
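As a small hedged illustration of the mechanism described in the quote (the function name is made up for the example), __builtin_trap() is a real GCC/Clang builtin, and calling it directly produces the same kind of trapping instruction that the optimizer substitutes for an "unreachable" null store:
void crash_here(void)
{
    __builtin_trap();   /* compiles to ud2 on x86; raises an invalid opcode exception if reached */
}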
I'm writing a server-client application in C sharing some information. The server works in a two-thread mode, with the main thread waiting for input and a side thread responding to clients' requests. The client works like that too, but it waits for user input (from stdin) and, if it receives a proper command, it sends a request to the server and waits for a response. The wait is done in a side thread processing responses. While this all seems fine and works on an Ubuntu-based distro (I use Ultimate Edition 2.7), it crashes on some other distribution. Here's what happens.
The server works flawlessly as it is, but the client suffers from glibc detected crashes (I hope I typed it correctly). When it receives a response, it parses its structure, which contains:
a) header,
b) some static identifiers,
c) data section containing length and the data itself.
What happens is:
a) client receives packet,
b) client checks its size (at least sizeof(header) + sizeof(static_data) + sizeof(length) + data, where the data is as long as length says),
c) creates structure - conversion from buffer of chars to the structures above,
d) creates some other structures storing those structures.
The structure is interpreted correctly. I tested it by sending a 'fixed' structure to the client through the server's interface and by printing the original, the sent data, and the received information. So this is not the problem. Everything is fine up to point c).
For point d), I work on the buffer used to receive incoming packets (a max size is specified and the buffer is of that size). To store the structures, I copy the data out of that buffer. I do it by:
a) allocating new memory of correct size
b) copying data.
I am not here to discuss the method. It's all fine as long as it works. But as I said, it does not on the other distro. It fails on malloc allocating memory in point a). And it fails on everything. My guess was that it could be a problem with thread-safety of malloc and / or printf on the other distro, but the problem with that theory is that the main thread mostly idles in the scanf( .. ) call.
Back to the topic: it fails on anything:
char* buffer = (char*)malloc(fixed_size * sizeof(char))
STRUCT_T* st = (STRUCT_T*)malloc(sizeof(STRUCT_T)) and so on. No matter what I try to allocate, it always throws the glibc detected error and always points to the malloc call (even if it's calloc).
It makes me wonder what's wrong with it, and that is the question of this thread. It looks a bit like I am overflowing 'memory space', but I doubt it, as it always happens on the first received response. I'd be grateful for any help and can post more details if needed. Side threads are joinable.
Options with which I compile:
CC = gcc
CFLAGS = -Wall -ansi --pedantic -c -O0 -g -std=c99 -pthread
$(CC) $(CFLAGS) server.c -o server.o
gcc server.o $(OBJECTS) -o server -pthread -lm
and includes in client.c file:
sys/fcntl.h
netdb.h
errno.h
stdio.h
unistd.h
stdlib.h
string.h
time.h
pthread.h
math.h
I'm not a newbie with C and Linux, but I mostly work on Windows and C++, so this is rather disturbing. And as I said, it works just fine on the distro I use, but it does not on some other, even though the buffer is parsed correctly.
Thanks in advance.
When malloc crashes, it's usually because you have previously stepped on the data it uses to manage itself (and free). It's difficult or impossible to diagnose at the point of the crash because the problem really happened at some earlier time. As has already been suggested, the best way to catch where that earlier memory overwrite occurred is to run your program through a tool like valgrind, purify, insure++, etc. It will inform you if you overwrite something that you shouldn't. valgrind is free software and is likely to be installed already. It can be as simple as sticking the word valgrind in front of everything else on your client's invocation string.
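A hedged example of such an invocation, assuming the client binary is called ./client (substitute your real binary and its usual arguments); memcheck is valgrind's default tool and flags invalid reads and writes at the moment they happen, with a stack trace:
valgrind ./client <its usual arguments>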