Recently, I wrote the following buggy C code:
#include <stdio.h>

struct IpAddr {
    unsigned char a, b, c, d;
};

struct IpAddr ipv4_from_str (const char * str) {
    struct IpAddr res = {0};
    sscanf(str, "%d.%d.%d.%d", &res.a, &res.b, &res.c, &res.d);
    return res;
}

int main (int argc, char ** argv) {
    struct IpAddr ip = ipv4_from_str("192.168.1.1");
    printf("Read in ip: %d.%d.%d.%d\n", ip.a, ip.b, ip.c, ip.d);
    return 0;
}
The bug is that I use %d in sscanf while supplying pointers to the 1-byte-wide unsigned char fields. %d expects a pointer to a (typically 4-byte) int, so each conversion writes past its 1-byte target, producing an out-of-bounds write. An out-of-bounds write is definitely UB, and the program does crash.
My confusion is in the non-constant nature of the bug. Over 1,000 runs, the program segfaults before the print statement 50% of the time, and segfaults after the statement the other 50% of the time. I don't understand why this can change. What is the difference between two invocations of the program? I was under the impression that the memory layout of the stack is consistent, and small test programs I've written seem to confirm this. Is that not the case?
I'm using gcc v11.3.0 on Debian bookworm, kernel 5.14.16-1, and I compiled without any flags set.
Here is the assembly output from my compiler for reference.
Undefined behavior means that anything can happen, even inconsistent results.
In practice, this inconsistency is most likely due to Address Space Layout Randomization. Depending on how the data is located in memory, the out-of-bounds accesses may or may not access unallocated memory or overwrite a critical pointer.
See also Why don't I get a segmentation fault when I write beyond the end of an array?
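For reference, a minimal sketch of a fix for the code in the question: %hhu is the sscanf conversion for unsigned char (since C99), so each field writes exactly one byte, and checking sscanf's return value also catches malformed input.

#include <stdio.h>

struct IpAddr {
    unsigned char a, b, c, d;
};

struct IpAddr ipv4_from_str(const char *str)
{
    struct IpAddr res = {0};
    /* %hhu matches a pointer to unsigned char, so no out-of-bounds write */
    int n = sscanf(str, "%hhu.%hhu.%hhu.%hhu", &res.a, &res.b, &res.c, &res.d);
    if (n != 4) {
        fprintf(stderr, "could not parse \"%s\" as an IPv4 address\n", str);
    }
    return res;
}

int main(void)
{
    struct IpAddr ip = ipv4_from_str("192.168.1.1");
    printf("Read in ip: %d.%d.%d.%d\n", ip.a, ip.b, ip.c, ip.d);
    return 0;
}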
I know I could just take the function's address directly, but I want to understand what's going on in the following code, which produces a segfault.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int return0()
{
    return 0;
}

int main()
{
    int (*r0c)(void) = malloc(100);
    memcpy(r0c, return0, 100);
    printf("Address of r0c is: %x\n", r0c);
    printf("copied is: %d\n", (*r0c)());
    return 0;
}
Here's my mental model of what I thought should work.
The process owns the memory allocated to r0c. We are copying the data from the data segment corresponding to return0, and the copy is successful.
I thought that dereferencing a function pointer is the same as calling the data segment that the function pointer points to. If that's the case, then the instruction pointer should move to the data segment corresponding to r0c, which will contain the instructions for function return0. The binary code corresponding to return0 doesn't contain any jumps or function calls that would depend on the address of return0, so it should just return 0 and restore the instruction pointer... 100 bytes is certainly enough for the function, and the 0xc3 (ret) is well within the bounds of r0c (it is at byte 11).
So why the segmentation fault? Is this a misunderstanding of the semantics of C's function pointers or is there some security feature that prevents self-modifying code that I'm unaware of?
The memory pages used by malloc to allocate memory are not marked as executable. You can't copy code to the heap and expect it to run.
If you want to do something like that, you have to go deeper into the operating system and change the page protections yourself, marking the pages as executable (with mprotect on POSIX systems or VirtualProtect on Windows). Depending on the platform and its security policy, you may need special privileges or settings to do so.
And it's really dangerous. If you do this in a program you distribute and have some kind of bug that lets an attacker write to those executable pages, the attacker can run arbitrary code with your program's privileges and potentially take control of the computer.
There are also other problems with your code: pointers to functions might not convert cleanly to object pointers on all platforms, and it's very hard (not to mention non-standard) to determine the size of a function, so the 100 in the memcpy is just a guess. You also print the pointer incorrectly in your code example: use the "%p" format to print a void *, casting the pointer to void * as needed.
Also, when you declare a function like int fun(), that's not the same as declaring a function that takes no arguments. If you want to declare a function that takes no arguments, you should explicitly use void, as in int fun(void).
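A minimal sketch of that distinction under pre-C23 rules (the function names here are mine): with an empty parameter list the compiler has no prototype to check calls against, while (void) gives it one.

#include <stdio.h>

int no_proto() { return 0; }      /* (): parameters left unspecified (pre-C23) */
int with_void(void) { return 0; } /* (void): explicitly takes no arguments */

int main(void)
{
    no_proto(1, 2, 3);   /* accepted: nothing to check against (behaviour is undefined) */
    /* with_void(1); */  /* would not compile: the prototype says no arguments */
    printf("%d %d\n", no_proto(), with_void());
    return 0;
}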
The standard says:
The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.
[C2011, 7.24.2.1/2; emphasis added]
In the standard's terminology, functions are not "objects". The standard does not define behavior for the case where the source pointer points to a function, therefore such a memcpy() call produces undefined behavior.
Additionally, the pointer returned by malloc() is an object pointer. C does not provide for direct conversion of object pointers to function pointers, and it does not provide for objects to be called as functions. It is possible to convert between object pointer and function pointer by means of an intermediate integer value, but the effect of doing so is at minimum doubly implementation-defined. Under some circumstances it is undefined.
As in other cases, UB can turn out to be precisely the behavior you hoped for, but it is not safe to rely on that. In this particular case, other answers present good reasons to not expect to get the behavior you hoped for.
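As a hedged illustration of the intermediate-integer route mentioned above: everything here compiles on common implementations, but each conversion is implementation-defined at best, and calling fp would of course still be undefined, since the allocation contains no valid machine code.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *obj = malloc(100);                 /* object pointer from malloc */
    if (obj == NULL)
        return 1;

    uintptr_t bits = (uintptr_t)obj;         /* object pointer -> integer */
    int (*fp)(void) = (int (*)(void))bits;   /* integer -> function pointer */

    /* Forming fp this way compiles, but calling it would still be
       undefined: the allocation holds no executable code. */
    printf("fp now holds %p\n", (void *)(uintptr_t)fp);

    free(obj);
    return 0;
}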
As was said in some comments, you need to make the data executable. This requires communicating with the OS to change protections on the data. On Linux, this is the system call int mprotect(void* addr, size_t len, int prot) (see http://man7.org/linux/man-pages/man2/mprotect.2.html).
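For completeness, here is a hedged Linux-only sketch of the same idea using mmap and mprotect. The 100-byte guess at the function's size and the whole business of copying a function's bytes are assumptions that ISO C does not sanction; treat this as an experiment, not a technique to rely on.

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int return0(void)
{
    return 0;
}

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);

    /* mprotect() wants a page-aligned region, so map a fresh anonymous
       page instead of using malloc(). */
    void *mem = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Copying a function's machine code like this is not defined by the
       C standard; 100 bytes is only a guess at the size of return0(). */
    memcpy(mem, (void *)return0, 100);

    /* Flip the page to read + execute before calling into it. */
    if (mprotect(mem, (size_t)pagesize, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect");
        return 1;
    }

    int (*r0c)(void) = (int (*)(void))mem;
    printf("copied is: %d\n", r0c());
    return 0;
}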
Here is a Windows solution using VirtualProtect.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#ifdef _WIN32
#include <Windows.h>
#endif

int return0()
{
    return 0;
}

int main()
{
    int (*r0c)(void) = malloc(100);
    memcpy((void*) r0c, (void*) return0, 100);
    printf("Address of r0c is: %p\n", (void*) r0c);

#ifdef _WIN32
    long unsigned int out_protect;
    if(!VirtualProtect((void*) r0c, 100, PAGE_EXECUTE_READWRITE, &out_protect)){
        puts("Failed to mark r0c as executable");
        exit(1);
    }
#endif

    printf("copied is: %d\n", (*r0c)());
    return 0;
}
And it works.
malloc returns a pointer to an allocated block of memory (100 bytes in your case). That memory is uninitialized; even assuming it could be executed by the CPU, for your code to work you would have to fill those 100 bytes with the machine instructions that implement the function (if it can indeed be held in 100 bytes). But as has been pointed out, your allocation is on the heap, not in the text (program) segment, and I don't think it can be executed as instructions. Perhaps this would achieve what you want:
#include <stdio.h>

int return0()
{
    return 0;
}

typedef int (*r0c)(void);

int main(void)
{
    r0c pf = return0;
    printf("Address of r0c is: %p\n", (void *) pf);
    printf("copied is: %d\n", pf());
    return 0;
}
I am playing with simple buffer overflows, and I found the following compiler behaviour quite interesting:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void func(char *arg1) {
    int authenticated = 0;
    char buffer[4];
    strcpy(buffer, arg1);
    if(authenticated) {
        printf("HACKED !\n");
    } else {
        printf("POOR !\n");
    }
    return;
}

int main() {
    char* mystr = "abcdefghijkl";
    func(mystr);
    printf("THANK YOU!\n");
    return 0;
}
What makes me wonder is that I need to pass a 13-byte string (12 characters plus the terminator) as arg1, not a 5-byte one, in order to overwrite the authenticated variable.
GDB confirms that:
(gdb) print &authenticated
$1 = (int *) 0x7fffffffe75c
(gdb) print &buffer
$33 = (char (*)[4]) 0x7fffffffe750
The difference between the two addresses is 12 bytes.
Why isn't the compiler's layout more compact in this case?
When I refactor the function, the difference changes, but why isn't it always 4, which seems like the tightest possible layout?
Thank You
The distance between the variables is larger than you expect because they are being aligned to optimize performance. Some operations require that the memory location of a variable be a whole multiple of some number (usually the variable's size). So, for example, an 8-byte double could be placed at location 0x1000 in memory, or 0x1008, but not 0x1004. Here's how your stack ends up looking (absent optimization etc.), with the numbers indicating the offset from the base of the stack frame:
-16: char[] buffer (4 bytes)
-12: padding (8 bytes)
-4: int authenticated (4 bytes)
The int is understandably aligned at 4 bytes, but why is the char buffer being aligned at 16 bytes? In order to be able to exploit SSE instructions for string operations; these require 16-byte memory alignment. Compiling the program with SSE disabled (-mno-sse with gcc) resulted in this layout:
-8: char[] buffer (4 bytes)
-4: int authenticated (4 bytes)
So that confirms that the extra padding was due to SSE.
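If you want to observe the same layout without gdb, a small sketch like the following (entirely my own; the address arithmetic is only meaningful as a diagnostic) prints the two addresses and their distance. The exact numbers depend on compiler, flags, and architecture.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int authenticated = 0;
    char buffer[4] = {0};

    /* Print where the two locals landed and how far apart they are. */
    printf("&buffer        = %p\n", (void *)buffer);
    printf("&authenticated = %p\n", (void *)&authenticated);
    printf("distance       = %ld bytes\n",
           (long)((uintptr_t)&authenticated - (uintptr_t)buffer));
    return 0;
}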
You are invoking undefined behaviour. Anything can happen.
An optimising compiler will notice that buffer is not used after the strcpy, so the strcpy operation can be removed. It cannot have any detectable side effect without undefined behaviour.
An optimising (or non-optimising) compiler will notice that "authenticated" is always 0 and never changed unless there is undefined behaviour, and a compiler can always assume that there is no undefined behaviour. So it is absolutely fine to always print "POOR !\n".
So any conclusions you try to draw from your experiment, in the presence of undefined behaviour, are 100% unwarranted.
I have this piece of C code running on a development box with ASLR enabled. It returns a char pointer (char *) to a calling function, but somehow a few bytes of the returned pointer value are getting changed; printf output below:
kerb_selftkt_cache is 0x00007f0b8e7fc120
cache_str from get_self_ticket_cache 0xffffffff8e7fc120
The char pointer 0x00007f0b8e7fc120 is returned to another function, where it shows up as 0xffffffff8e7fc120. It differs from the original pointer in the upper four bytes (0xffffffff instead of 0x00007f0b), while the lower four bytes (8e7fc120) are the same. Any idea what might be going on, and how I can fix it? The code is running on 64-bit Linux on an Intel Xeon. It comes from an existing proprietary library, so I can't share the exact code, but the logic looks something like this:
#include <stdio.h>

typedef struct mystr {
    int num;
    char addr[10];
} mystr;

static mystr m1;

char *get_addr() {
    return m1.addr;
}

void myprint() {
    printf("mystr m1 address %p\n", &m1);
    printf("mystr m1 addr %p\n", m1.addr);
}

int main (int argc, char *argv[]) {
    char *retadd;
    myprint();
    retadd = get_addr();
    printf("ret address %p\n", retadd);
    return 0;
}
retadd and m1.addr are different when ASLR is turned on.
My guess is that the function takes an int or something else only 4 bytes wide, and the argument then gets cast to a pointer type, which sign-extends it. The compiler (gcc?) should warn you about this even without flags like -Wall, but maybe you have some weird macros that obfuscate it.
Alternatively, if by "passing" you mean that you return the pointer (as opposed to passing it as an argument), that could easily be explained by C defaulting to int as the return type when a function declaration is missing. In that case, make sure the function is declared (with a prototype) in every file that calls it.
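A hedged two-file reconstruction of that failure mode (the file names and the gcc -std=gnu89 build are my assumptions; modern compilers reject implicit declarations by default): without a prototype in scope, the caller assumes get_addr() returns int, so the upper 32 bits of the pointer are discarded and the value is sign-extended back to 64 bits, giving exactly the 0xffffffff... pattern above. Build with something like: gcc -std=gnu89 lib.c main.c

/* ---- lib.c ---- */
static char addr[10];

char *get_addr(void)
{
    return addr;
}

/* ---- main.c (note: no declaration of get_addr in scope) ---- */
#include <stdio.h>

int main(void)
{
    /* implicit int return: the pointer is truncated to 32 bits,
       then sign-extended when converted back to a pointer */
    char *retadd = (char *)get_addr();
    printf("ret address %p\n", (void *)retadd);
    return 0;
}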
For straight C and GCC, why doesn't the pointed-to string get corrupted here?
#include <stdio.h>

int main(int argc, char *argv[])
{
    char* str_ptr = NULL;
    {
        // local to this scope block
        char str[4] = {0};
        sprintf(str, "AGH");
        str_ptr = str;
    }
    printf("str_ptr: %s\n", str_ptr);
    getchar();
    return 0;
}
|----OUTPUT-----|
str_ptr: AGH
|--------------------|
Here's a link to the above code compiled and executed using an online compiler.
I understand that if str pointed to a string literal, the literal would be stored in a static data segment, but since I sprintf into a stack-allocated buffer, I thought the string would be purely stack-based (and thus the address meaningless after leaving the scope block). I understand that it may take additional stack allocations to overwrite the memory at that address, but even using a recursive function until a stack overflow occurred, I was unable to corrupt the string pointed to by str_ptr.
FYI I am doing my testing in a VS2008 C project, although GCC seems to exhibit the same behavior.
While nasal lizards are a popular part of C folklore, code whose behaviour is undefined can actually exhibit any behaviour at all, including magically resuscitating variables whose lifetime has expired. The fact that code with undefined behaviour can appear to "work" should neither be surprising nor an excuse to neglect correcting it. Generally, unless you're in the business of writing compilers, it's not very useful to examine the precise nature of undefined behaviour in any given environment, especially as it might be different after you blink.
In this particular case, the explanation is simple, but it's still undefined behaviour, so the following explanation cannot be relied upon at all. It might at any time be replaced with reptilian emissions.
Generally speaking, C compilers will make each function's stack frame a fixed size, rather than expanding and contracting as control flow enters and leaves internal blocks. Unless called functions are inlined, their stack frames will not overlap with the stack frame of the caller.
So, in certain C compilers with certain sets of compile options and except for particular phases of the moon, the character array str will not be overwritten by the call to printf, even though the variable's lifetime has expired.
Most likely the compiler does some sort of simple optimization that results in the string still being in the same place on the stack. In other words, the compiler allows the stack to grow to store 'str', but it doesn't shrink the stack again within the scope of main, because it is not required to do so.
If you really want to see the result of saving the address of variables on the stack, call a function.
#include <stdio.h>

char * str_ptr = NULL;

void onstack(void)
{
    char str[4] = {0};
    sprintf(str, "AGH");
    str_ptr = str;
}

int main(int argc, char *argv[])
{
    onstack();
    int x = 0x61626364;
    printf("str_ptr: %s\n", str_ptr);
    printf("x:%i\n", x);
    getchar();
    return 0;
}
With gcc -O0 -std=c99 strcorrupt.c I get random output on the first printf. It will vary from machine to machine and architecture to architecture.
The problem is very obvious, so I will just show you some code :)
#include <stdio.h>
#include <string.h>

char *test1()
{
    static char c;
    char *p;
    p = &c;
    printf("p[%08x] : %s\n", (unsigned int)p, p);
    return p;
}

void *test2()
{
    static char i;
    char *buf;
    int counter = 0;
    for(buf = (char *)&i; ; counter += 8)
    {
        memset(buf + counter, 0xff, 8);
        printf("write %d bytes to static area!\n", counter);
    }
}

int main()
{
    char *p;
    p = test1();
    strcpy(p, "lol i asd");
    p = test1();
    strcpy(p, "sunus again!");
    p = test1();
    strcpy(p, "sunus again! i am hacking this!!asdfffffffffffffffffffffffffffffffffffffffffffffffffffffffffff");
    p = test1();
    test2();
    return 0;
}
First I wrote test1().
As you can see, I expected those strcpy calls to cause a segmentation fault, because they are obviously writing to memory I never allocated. I know some basics about static variables, but this is just strange to me.
Then I wrote test2().
Finally, it caused a segmentation fault, but only after writing almost 4 KB.
So, how can I prevent this kind of error (static variable overflow) from happening?
And why can I write to those static memory areas at all?
I know that they are neither on the stack nor on the heap.
PS: Maybe I am not describing my problem clearly.
I have several years of C programming experience; I know what would happen if these variables were not static.
Making them static seems to change almost everything, and I want to know why.
Undefined behaviour is just that - undefined. It might look like it's working, it might crash, it might steal your lunch money. Just don't do it.
A segmentation fault only occurs when you cross into a page your process cannot write to; pages are typically 4 KiB, so with luck you can write a full 4 KiB past the variable before it happens, and if the next page is already mapped, even that isn't guaranteed. GCC's stack protector can sometimes help, but not in this case (the variable is not on the stack). Valgrind could help too. None of these are guaranteed; you had better take care of it yourself.
In C, you cannot avoid the potential for overwriting memory that you did not allocate.
As to the manner of the failure... When, where and how your application crashes or otherwise misbehaves depends entirely on your compiler, the compiler flags, and the random state memory was in before your program started executing. You are accessing memory you are not supposed to, and it's completely undefined in C what impact that access will have on the operating environment.
Languages that manage memory (e.g. Java, C#) do that for you, but of course there is a cost to the bounds checking.
You can certainly use memory management libraries (replacements for malloc/free/new/delete) that will attempt to detect improper memory management.
You avoid the problem by knowing how big your memory areas are and by not writing outside the boundaries of those areas. There's no other reliable way to deal with the issue.
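A hedged sketch of that discipline applied to the kind of code in the question (the buffer size and the helper function are my own invention): keep the size next to the buffer and use length-limited calls instead of strcpy, rejecting anything that does not fit.

#include <stdio.h>
#include <string.h>

static char name[16];   /* the static area we own: exactly 16 bytes */

static int set_name(const char *src)
{
    /* refuse input that would not fit, rather than writing past the end */
    if (strlen(src) >= sizeof name)
        return -1;
    snprintf(name, sizeof name, "%s", src);
    return 0;
}

int main(void)
{
    if (set_name("sunus again! i am hacking this!!") != 0)
        puts("rejected: too long for the buffer");
    if (set_name("lol i asd") == 0)
        printf("stored: %s\n", name);
    return 0;
}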