i'm in serious trouble with a heap/stack corruption. To be able to set a data breakpoint and find the root of the problem, i want to take two core dumps using gdb and then compare them.
First one when i think the heap and stack are still ok, and a second one shortly before my program crashes.
How can i compare those dumps?
Information about my project:
using gcc 5.x
Plugin for a legacy, 3rd-party-program with RT-support. No sources available for the project (for me).
Legacy Project is C, My Plugin is C++.
Other things i tried:
Using address sanitizers -> won't work because the legacy program wont start with them.
Using undefined behavior sanitizers -> same
Figuring out what memory gets corrupted for data breakpoint -> no success, because the corrupted memory does not belong to my code.
Ran Valgrind -> no errors around my code.
Thank you for your help
Independent from your underlying motivation, I'd like to get into your question. You ask how the difference between two core dumps can be identified. This is going to be lengthy, but will hopefully give you your answer.
A core dump is represented by an ELF file that contains metadata and a specific set of memory regions (on Linux, this can be controlled via /proc/[pid]/coredump_filter) that were mapped into the given process at the time of dump creation.
The obvious way to compare the dumps would be to compare a hex-representation:
$ diff -u <(hexdump -C dump1) <(hexdump -C dump2)
--- /dev/fd/63 2020-05-17 10:01:40.370524170 +0000
+++ /dev/fd/62 2020-05-17 10:01:40.370524170 +0000
## -90,8 +90,9 ##
000005e0 00 00 00 00 00 00 00 00 00 00 00 00 80 1f 00 00 |................|
000005f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The result is rarely useful because you're missing the context. More specifically, there's no straightforward way to get from the offset of a value change in the file to the offset corresponding to the process virtual memory address space.
So, more context if needed. The optimal output would be a list of VM addresses including before and after values.
Before we can get on that, we need a test scenario that loosely resembles yours. The following application includes a use-after-free memory issue that does not lead to a segmentation fault at first (a new allocation with the same size hides the issue). The idea here is to create a core dump using gdb (generate) during each phase based on break points triggered by the code:
dump1: Correct state
dump2: Incorrect state, no segmentation fault
dump3: Segmentation fault
The code:
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
int **g_state;
int main()
{
int value = 1;
g_state = malloc(sizeof(int*));
*g_state = &value;
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
printf("no corruption\n");
raise(SIGTRAP);
free(g_state);
char **unrelated = malloc(sizeof(int*));
*unrelated = "val";
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
printf("use-after-free hidden by new allocation (invalid value)\n");
raise(SIGTRAP);
printf("use-after-free (segfault)\n");
free(unrelated);
int *unrelated2 = malloc(sizeof(intptr_t));
*unrelated2 = 1;
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
return 0;
}
Now, the dumps can be generated:
Starting program: test
state: 1
no corruption
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump1
Saved corefile dump1
(gdb) cont
Continuing.
state: 7102838
use-after-free hidden by new allocation (invalid value)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump2
Saved corefile dump2
(gdb) cont
Continuing.
use-after-free (segfault)
Program received signal SIGSEGV, Segmentation fault.
main () at test.c:31
31 printf("state: %d\n", **g_state);
(gdb) generate dump3
Saved corefile dump3
A quick manual inspection shows the relevant differences:
# dump1
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x7fffffffe2bc
# dump2
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x4008c1
# dump3
$2 = (int **) 0x602260
(gdb) print *g_state
$3 = (int *) 0x1
Based on that output, we can clearly see that *g_state changed but is still a valid pointer in dump2. In dump3, the pointer becomes invalid. Of course, we'd like to automate this comparison.
Knowing that a core dump is an ELF file, we can simply parse it and generate a diff ourselves. What we'll do:
Open a dump
Identify PROGBITS sections of the dump
Remember the data and address information
Repeat the process with the second dump
Compare the two data sets and print the diff
Based on elf.h, it's relatively easy to parse ELF files. I created a sample implementation that compares two dumps and prints a diff that is similar to comparing two hexdump outputs using diff. The sample makes some assumptions (x86_64, mappings either match in terms of address and size or they only exist in dump1 or dump2), omits most error handling and always chooses a simple implementation approach for the sake of brevity.
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#define MAX_MAPPINGS 1024
struct dump
{
char *base;
Elf64_Shdr *mappings[MAX_MAPPINGS];
};
unsigned readdump(const char *path, struct dump *dump)
{
unsigned count = 0;
int fd = open(path, O_RDONLY);
if (fd != -1) {
struct stat stat;
fstat(fd, &stat);
dump->base = mmap(NULL, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
Elf64_Ehdr *header = (Elf64_Ehdr *)dump->base;
Elf64_Shdr *secs = (Elf64_Shdr*)(dump->base+header->e_shoff);
for (unsigned secinx = 0; secinx < header->e_shnum; secinx++) {
if (secs[secinx].sh_type == SHT_PROGBITS) {
if (count == MAX_MAPPINGS) {
count = 0;
break;
}
dump->mappings[count] = &secs[secinx];
count++;
}
}
dump->mappings[count] = NULL;
}
return count;
}
#define DIFFWINDOW 16
void printsection(struct dump *dump, Elf64_Shdr *sec, const char mode,
unsigned offset, unsigned sizelimit)
{
unsigned char *data = (unsigned char *)(dump->base+sec->sh_offset);
uintptr_t addr = sec->sh_addr+offset;
unsigned size = sec->sh_size;
data += offset;
if (sizelimit) {
size = sizelimit;
}
unsigned start = 0;
for (unsigned i = 0; i < size; i++) {
if (i%DIFFWINDOW == 0) {
printf("%c%016x ", mode, addr+i);
start = i;
}
printf(" %02x", data[i]);
if ((i+1)%DIFFWINDOW == 0 || i + 1 == size) {
printf(" [");
for (unsigned j = start; j <= i; j++) {
putchar((data[j] >= 32 && data[j] < 127)?data[j]:'.');
}
printf("]\n");
}
addr++;
}
}
void printdiff(struct dump *dump1, Elf64_Shdr *sec1,
struct dump *dump2, Elf64_Shdr *sec2)
{
unsigned char *data1 = (unsigned char *)(dump1->base+sec1->sh_offset);
unsigned char *data2 = (unsigned char *)(dump2->base+sec2->sh_offset);
unsigned difffound = 0;
unsigned start = 0;
for (unsigned i = 0; i < sec1->sh_size; i++) {
if (i%DIFFWINDOW == 0) {
start = i;
difffound = 0;
}
if (!difffound && data1[i] != data2[i]) {
difffound = 1;
}
if ((i+1)%DIFFWINDOW == 0 || i + 1 == sec1->sh_size) {
if (difffound) {
printsection(dump1, sec1, '-', start, DIFFWINDOW);
printsection(dump2, sec2, '+', start, DIFFWINDOW);
}
}
}
}
int main(int argc, char **argv)
{
if (argc != 3) {
fprintf(stderr, "Usage: compare DUMP1 DUMP2\n");
return 1;
}
struct dump dump1;
struct dump dump2;
if (readdump(argv[1], &dump1) == 0 ||
readdump(argv[2], &dump2) == 0) {
fprintf(stderr, "Failed to read dumps\n");
return 1;
}
unsigned sinx1 = 0;
unsigned sinx2 = 0;
while (dump1.mappings[sinx1] || dump2.mappings[sinx2]) {
Elf64_Shdr *sec1 = dump1.mappings[sinx1];
Elf64_Shdr *sec2 = dump2.mappings[sinx2];
if (sec1 && sec2) {
if (sec1->sh_addr == sec2->sh_addr) {
// in both
printdiff(&dump1, sec1, &dump2, sec2);
sinx1++;
sinx2++;
}
else if (sec1->sh_addr < sec2->sh_addr) {
// in 1, not 2
printsection(&dump1, sec1, '-', 0, 0);
sinx1++;
}
else {
// in 2, not 1
printsection(&dump2, sec2, '+', 0, 0);
sinx2++;
}
}
else if (sec1) {
// in 1, not 2
printsection(&dump1, sec1, '-', 0, 0);
sinx1++;
}
else {
// in 2, not 1
printsection(&dump2, sec2, '+', 0, 0);
sinx2++;
}
}
return 0;
}
With the sample implementation, we can re-evaluate our scenario above. A except from the first diff:
$ ./compare dump1 dump2
-0000000000601020 86 05 40 00 00 00 00 00 50 3e a8 f7 ff 7f 00 00 [..#.....P>......]
+0000000000601020 00 6f a9 f7 ff 7f 00 00 50 3e a8 f7 ff 7f 00 00 [.o......P>......]
-0000000000602260 bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..#.............]
-0000000000602280 6e 6f 20 63 6f 72 72 75 70 74 69 6f 6e 0a 00 00 [no corruption...]
+0000000000602280 75 73 65 2d 61 66 74 65 72 2d 66 72 65 65 20 68 [use-after-free h]
-0000000000602290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602290 69 64 64 65 6e 20 62 79 20 6e 65 77 20 61 6c 6c [idden by new all]
The diff shows that *gstate (address 0x602260) was changed from 0x7fffffffe2bc to 0x4008c1:
-0000000000602260 bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..#.............]
The second diff with only the relevant offset:
$ ./compare dump1 dump2
-0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..#.............]
+0000000000602260 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
The diff shows that *gstate (address 0x602260) was changed from 0x4008c1 to 0x1.
There you have it, a core dump diff. Now, whether or not that can prove to be useful in your scenario depends on various factors, one being the timeframe between the two dumps and the activity that takes place within that window. A large diff will possibly be difficult to analyze, so the aim must be to minimize its size by choosing the diff window carefully.
The more context you have, the easier the analysis will turn out to be. For example, the relevant scope of the diff could be reduced by limiting the diff to addresses of the .data and .bss sections of the library in question if changes in there are relevant to your situation.
Another approach to reduce the scope: excluding changes to memory that is not referenced by the library. The relationship between arbitrary heap allocations and specific libraries is not immediately apparent. Based on the the addresses of changes in your initial diff, you could search for pointers in the .data and .bss sections of the library right in the diff implementation. This does not take every possible reference into account (most notably indirect references from other allocations, register and stack references of library-owned threads), but it's a start.
Related
i encounter this error when trying to run the following function in leetcode. but when ran on VSC with my own main function it works perfectly fine.
Line 15: Char 28: runtime error: store to address 0x602000000070 with insufficient space for an object of type 'char' [solution.c]
0x602000000070: note: pointer points here
01 00 80 67 be 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
char * longestCommonPrefix(char ** strs, int strsSize)
{
int z = 0;
int i = 0;
char* outputa = malloc(sizeof *outputa * z);
for (i = 0; i < *strs[i]; i++)
{
for (z = 0; z < *strs[i*z]; z++)
{
if (strs[i+1][z] == strs[i][z] && strs[i+2][z] == strs[i][z])
{
outputa[z] = strs[i][z];
continue;
}
break;
}
break;
}
return outputa;
}
ive tried to implement a while loop to check if strs[i][z] != '0', but that just gets me a heap buffer overflow error. ive also tried rearranging how i wrote the malloc variable, with many people on stackoverflow giving different recommendations, all of which just land me on a heap buffer overflow error, or an insufficient space for type 'char' error.
Objective
I want to open a file multiple times, but each fd should be only allowed to write to a specific range.
Background
I have an EEPROM which contains multiple "partitions", where each partition holds different information.
I want to avoid that one partition overflows to a different one, or one partition can read other information and misinterpret them.
My Problem
I wanted to use fcntl along with F_OFD_SETLK so that I can lock a specific range on each opened fd.
Locking works as intended and trying to lock an already locked range will result in EAGAIN, which is expected.
What is not so obvious for me is, that I can write to a range that is locked by a different fd.
Question
Is it possible to lock a certain range in a file so that it is not writeable by a different opened fd?
If not, is there a different way to achieve my goal?
Code:
Link to onlinegdb: https://onlinegdb.com/ewE767rbu
#define _GNU_SOURCE 1
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
static void ex(const char *what)
{
perror(what);
exit(1);
}
static void lock_range(int fd, off_t start, off_t len)
{
struct flock ff = {
.l_type = F_WRLCK,
.l_whence = SEEK_SET,
.l_start = start,
.l_len = len,
};
if (fcntl(fd, F_OFD_SETLK, &ff) < 0)
perror("fcntl");
}
static void write_at(int fd, const char *str, off_t offset)
{
if (pwrite(fd, str, strlen(str), offset) < 0)
perror("pwrite");
}
int main()
{
int firstfd = open("/tmp/abc.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
if (firstfd < 0)
ex("open");
if (ftruncate(firstfd, 0x1000) < 0)
ex("ftruncate");
lock_range(firstfd, 0, 0x800);
lock_range(firstfd, 0, 0x800); // check if I can aquire the lock multiple times
int secondfd = open("/tmp/abc.txt", O_RDWR);
if (secondfd < 0)
ex("open");
lock_range(secondfd, 0, 0x800); // this one fails on purpose
lock_range(secondfd, 0x800, 0);
lock_range(firstfd, 0x801, 1); // and this one fails on purpose
write_at(firstfd, "hallo", 0);
write_at(firstfd, "hallo", 0x900); // this should fail, but doesn't
write_at(secondfd, "welt", 0); // this should fail, but doesn't
write_at(secondfd, "welt", 0x900);
close(firstfd);
close(secondfd);
system("hexdump -C /tmp/abc.txt"); // just for visualization
}
Output:
fcntl: Resource temporarily unavailable
fcntl: Resource temporarily unavailable
00000000 77 65 6c 74 6f 00 00 00 00 00 00 00 00 00 00 00 |welto...........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000900 77 65 6c 74 6f 00 00 00 00 00 00 00 00 00 00 00 |welto...........|
00000910 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000
Please note welto which is hallo overriden by welt. I expected hallo at 0x0 and welt at 0x900.
Locks come in two flavours: mandatory locks, and advisory locks. These are advisory locks. This means they prevent others from obtaining a lock. Period. They don't prevent writes or any other form of modification.
If not, is there a different way to achieve my goal?
Don't ignore failures to obtain a lock.
I'm writing a little eBPF program to set a flag whenever the tracepoint consume_skb is triggered. When I try to load it however, the BPF verifier complains, saying R1 type=inv expected=map_ptr when trying to access the map. I tried to follow the samples given in the linux kernel as closely as possible, but I am failing at even reading anything from the map.
eBPF program
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u32);
__uint(max_entries, 1);
} my_map SEC(".maps");
SEC("tracepoint/skb/consume_skb")
int bpf_sys(void *ctx)
{
int index = 0;
int *value;
value = bpf_map_lookup_elem(&my_map, &index);
if (!value)
{
return 1;
}
return 0;
}
char _license[] SEC("license") = "GPL";
loader, until point of failure
int main()
{
int bfd;
unsigned char buf[1024] = {};
struct bpf_insn *insn;
union bpf_attr attr = {};
unsigned char log_buf[4096] = {};
int pfd;
int n;
bfd = open("./test_bpf", O_RDONLY);
if (bfd < 0)
{
printf("open eBPF program error: %s\n", strerror(errno));
exit(-1);
}
n = read(bfd, buf, 1024);
close(bfd);
insn = (struct bpf_insn*)buf;
attr.prog_type = BPF_PROG_TYPE_TRACEPOINT;
attr.insns = (unsigned long)insn;
attr.insn_cnt = n / sizeof(struct bpf_insn);
attr.license = (unsigned long)"GPL";
attr.log_size = sizeof(log_buf);
attr.log_buf = (unsigned long)log_buf;
attr.log_level = 1;
pfd = syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
if (pfd < 0)
{
printf("bpf syscall error: %s\n", strerror(errno));
printf("log_buf = %s\n", log_buf);
exit(-1);
}
...
llvm output
Disassembly of section tracepoint/skb/consume_skb:
0000000000000000 bpf_sys:
; {
0: b7 01 00 00 00 00 00 00 r1 = 0
; int index = 0;
1: 63 1a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r1
2: bf a2 00 00 00 00 00 00 r2 = r10
3: 07 02 00 00 fc ff ff ff r2 += -4
; value = bpf_map_lookup_elem(&my_map, &index);
4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
6: 85 00 00 00 01 00 00 00 call 1
7: bf 01 00 00 00 00 00 00 r1 = r0
8: b7 00 00 00 01 00 00 00 r0 = 1
; if (!value)
9: 15 01 01 00 00 00 00 00 if r1 == 0 goto +1 <LBB0_2>
10: b7 00 00 00 00 00 00 00 r0 = 0
0000000000000058 LBB0_2:
; }
11: 95 00 00 00 00 00 00 00 exit
Error when loading
bpf syscall error: Permission denied
log_buf = 0: (b7) r1 = 0
1: (63) *(u32 *)(r10 -4) = r1
last_idx 1 first_idx 0
regs=2 stack=0 before 0: (b7) r1 = 0
2: (bf) r2 = r10
3: (07) r2 += -4
4: (18) r1 = 0x0
6: (85) call bpf_map_lookup_elem#1
R1 type=inv expected=map_ptr
processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Also, is there a better way to debug BPF error messages?
Issues
Parsing the bytecode
Your loader does not work, at least not directly on the object file compiled from your C code. It just reads from the file, does not parse for an ELF section where the bytecode would be contained. Apparently you're getting the verifier output for the correct eBPF bytecode, though; so I suspect you've been using it in a different way, or with some intermediate step.
Setting up the maps
With regards to your issue, your loader also misses the creation of a map and the ELF relocation of the FD for that map into the bytecode instructions, which would be why you get your error. Typically, when you load a program, several steps occur including:
The loader program (often via libbpf) creates all eBPF maps needed by the eBPF program
Alternatively, it reuses some existing maps. In both cases (existing or new maps), it retrieves a file descriptor for each map (e.g. via their pinned path for existing maps, or returned by the syscall when creating new maps).
Then it inserts this file descriptor into the eBPF bytecode, as an argument to the helper functions used for map access.
At last the loader program loads the updated bytecode into the kernel.
The bytecode goes through the verifier, where the file descriptors are replaced with pointers to the map areas in the kernel memory space.
Your loader is not creating maps (or retrieving FDs for existing, compatible maps), and does not update the bytecode accordingly. So the eBPF program sent to the kernel doesn't say what map it tries to access: we just see, from the verifier logs, 4: (18) r1 = 0x0, where at this stage it should be a memory pointer, such as 4: (18) r1 = 0xffff8ca2c29e3200.
Workarounds
You can obviously extend your loader application to handle maps and ELF relocation. But here are a few other suggestions.
Adding new programs directly amongst the kernel samples is not always ideal. The build system for these samples is rather involved, and relies on a number of header inclusions that are not really standard outside of the kernel repository. On older versions it would also not rely on libbpf but would reimplement most of the loading components, although I think this has improved.
So I would suggest building outside of the kernel repository, if this is not what you have been doing already. In particular, if you're working with C, you should have a look at how libbpf can help handling (loading, attaching, etc.) eBPF programs and other objects.
One of the best starting point would also be libbpf-boostrap, which provides a framework with templates for building eBPF programs and corresponding user space applications very quickly. I would definitely recommend looking at it.
Alternatively, you should be able to load your program with e.g. bpftool prog load <object_file.o> /sys/fs/bpf/<some_path>, although you can't attach it to tracepoints with bpftool only.
I'm trying to write a simple socket filter eBPF program that can access the socket buffer data.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#define SEC(NAME) __attribute__((section(NAME), used))
SEC("socket_filter")
int myprog(struct __sk_buff *skb) {
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
struct ethhdr *eth = data;
if ((void*)eth + sizeof(*eth) > data_end)
return 0;
return 1;
}
And I'm compiling using clang:
clang -I./ -I/usr/include/x86_64-linux-gnu/asm \
-I/usr/include/x86_64-linux-gnu/ -O2 -target bpf -c test.c -o test.elf
However when I try to load the program I get the following verifier error:
invalid bpf_context access off=80 size=4
My understanding of this error is that it should be thrown when you try to access context data that hasn't been checked to be within data_end, however my code does do that:
Here is the instructions for my program
0000000000000000 packet_counter:
0: 61 12 50 00 00 00 00 00 r2 = *(u32 *)(r1 + 80)
1: 61 11 4c 00 00 00 00 00 r1 = *(u32 *)(r1 + 76)
2: 07 01 00 00 0e 00 00 00 r1 += 14
3: b7 00 00 00 01 00 00 00 r0 = 1
4: 3d 12 01 00 00 00 00 00 if r2 >= r1 goto +1 <LBB0_2>
5: b7 00 00 00 00 00 00 00 r0 = 0
which would imply that the error is being caused by reading the pointer to data_end? However it only happens if I don't try to check the bounds later.
This is because your BPF program is a “socket filter”, and that such programs are not allowed to do direct packet access (see sk_filter_is_valid_access(), where we return false on trying to read skb->data or skb->data_end for example). I do not know the specific reason why it is not available, although I suspect this would be a security precaution as socket filter programs may be available to unprivileged users.
Your program loads just fine as a TC classifier, for example (bpftool prog load foo.o /sys/fs/bpf/foo type classifier -- By the way thanks for the standalone working reproducer, much appreciated!).
If you want to access data for a socket filter, you can still use the bpf_skb_load_bytes() (or bpf_skb_store_bytes()) helper, which automatically does the check on length. Something like this:
#include <linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
static void *(*bpf_skb_load_bytes)(const struct __sk_buff *, __u32,
void *, __u32) =
(void *) BPF_FUNC_skb_load_bytes;
SEC("socket_filter")
int myprog(struct __sk_buff *skb)
{
__u32 foo;
if (bpf_skb_load_bytes(skb, 0, &foo, sizeof(foo)))
return 0;
if (foo == 3)
return 0;
return 1;
}
Regarding your last comment:
However it only happens if I don't try to check the bounds later.
I suspect clang compiles out the assignments for data and data_end if you do not use them in your code, so they are no longer present and no longer a problem for the verifier.
I have a simple problem in c which may be solved using GDB, but I am not able to solved it.
We have a main() function which calls another function, say A(). When function A() executes and returns, instead of returning to main() it goes to another function, say B().
I don't know what to do in A() so that return address will change.
Assuming, the OP wants to force a return from A() to B() instead of to main() from where A() was called before...
I always believed to know how this might happen but never tried by myself. So, I couldn't resist to fiddle a bit.
Manipulation of return can hardly be done portable as it exploits facts of the generated code which may depend on compiler version, compiler settings, platform, and whatever.
At first, I tried to find out some details about coliru which I planned to use for fiddling:
#include <stdio.h>
int main()
{
printf("sizeof (void*): %d\n", sizeof (void*));
printf("sizeof (void*) == sizeof (void(*)()): %s\n",
sizeof (void*) == sizeof (void(*)()) ? "yes" : "no");
return 0;
}
Output:
gcc (GCC) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
sizeof (void*): 8
sizeof (void*) == sizeof (void(*)()): yes
Live Demo on coliru
Next, I made a minimal sample to get an impression about the code which will be generated:
Source code:
#include <stdio.h>
void B()
{
puts("in B()");
}
void A()
{
puts("in A()");
}
int main()
{
puts("call A():");
A();
return 0;
}
Compiled with x86-64 gcc 8.2 and -O0:
.LC0:
.string "in B()"
B:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
nop
pop rbp
ret
.LC1:
.string "in A()"
A:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC1
call puts
nop
pop rbp
ret
.LC2:
.string "call A():"
main:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC2
call puts
mov eax, 0
call A
mov eax, 0
pop rbp
ret
Live Explore on godbolt
On Intel x86/x64:
call stores the return address on stack before jumping to the given address
ret pops the return address from stack into PC reg. again.
(Other CPUs might do this differently.)
Additionally, the
push rbp
mov rbp, rsp
is interesting as push stores something on stack as well while rsp is the register with current stack top address and rbp its companion which is usually used for relative addressing of local variables.
So, a local variable (which is addressed relative to rbp – if not optimized) might have a fix offset to the return address on stack.
So, I added some code to the first sample to come in touch:
#include <stdio.h>
typedef unsigned char byte;
void B()
{
puts("in B()");
}
void A()
{
puts("in A()");
char buffer[8] = { 0x00, 0xde, 0xad, 0xbe, 0xef, 0x4a, 0x11, 0x00 };
byte *pI = (byte*)buffer;
// dump some bytes from stack
for (int i = 0; i < 64; ++i) {
if (!(i % 8)) printf("%p: (+%2d)", pI + i, i);
printf(" %02x", pI[i]);
if (i % 8 == 7) putchar('\n');
}
}
int main()
{
printf("&main(): %p, &A(): %p, &B(): %p\n", (void*)&main, (void*)&A, (void*)&B);
puts("call A():");
A();
return 0;
}
Output:
&main(): 0x400613, &A(): 0x400553, &B(): 0x400542
call A():
in A()
0x7ffcdedc9738: (+ 0) 00 de ad be ef 4a 11 00
0x7ffcdedc9740: (+ 8) 38 97 dc de fc 7f 00 00
0x7ffcdedc9748: (+16) 60 97 dc de 14 00 00 00
0x7ffcdedc9750: (+24) 60 97 dc de fc 7f 00 00
0x7ffcdedc9758: (+32) 49 06 40 00 00 00 00 00
0x7ffcdedc9760: (+40) 50 06 40 00 00 00 00 00
0x7ffcdedc9768: (+48) 30 48 4a f3 3e 7f 00 00
0x7ffcdedc9770: (+56) 00 00 00 00 00 00 00 00
Live Demo on coliru
This is what I read from this:
0x7ffcdedc9738: (+ 0) 00 de ad be ef 4a 11 00 # local var. buffer
0x7ffcdedc9740: (+ 8) 38 97 dc de fc 7f 00 00 # local var. pI (with address of buffer)
0x7ffcdedc9748: (+16) 60 97 dc de 14 00 00 00 # local var. i (4 bytes)
0x7ffcdedc9750: (+24) 60 97 dc de fc 7f 00 00 # pushed rbp
0x7ffcdedc9758: (+32) 49 06 40 00 00 00 00 00 # 0x400649 <- Aha!
0x400649 is an address which is slightly higher than the address of main() (0x400613). Considering, that there was some code in main() prior the call of A() this makes perfectly sense.
So, if I want to manipulate the return address this has to happen at pI + 32:
#include <stdio.h>
#include <stdlib.h>
typedef unsigned char byte;
void B()
{
puts("in B()");
exit(0);
}
void A()
{
puts("in A()");
char buffer[8] = { 0x00, 0xde, 0xad, 0xbe, 0xef, 0x4a, 0x11, 0x00 };
byte *pI = (byte*)buffer;
// dump some bytes from stack
for (int i = 0; i < 64; ++i) {
if (!(i % 8)) printf("%p: (+%2d)", pI + i, i);
printf(" %02x", pI[i]);
if (i % 8 == 7) putchar('\n');
}
printf("Possible candidate for ret address: %p\n", *(void**)(pI + 32));
*(void**)(pI + 32) = (byte*)&B;
}
int main()
{
printf("&main(): %p, &A(): %p, &B(): %p\n", (void*)&main, (void*)&A, (void*)&B);
puts("call A():");
A();
return 0;
}
I.e. I "patch" the address of function B() as the return address into the stack.
Output:
&main(): 0x400696, &A(): 0x4005aa, &B(): 0x400592
call A():
in A()
0x7fffe0eb0858: (+ 0) 00 de ad be ef 4a 11 00
0x7fffe0eb0860: (+ 8) 58 08 eb e0 ff 7f 00 00
0x7fffe0eb0868: (+16) 80 08 eb e0 14 00 00 00
0x7fffe0eb0870: (+24) 80 08 eb e0 ff 7f 00 00
0x7fffe0eb0878: (+32) cc 06 40 00 00 00 00 00
0x7fffe0eb0880: (+40) e0 06 40 00 00 00 00 00
0x7fffe0eb0888: (+48) 30 c8 41 84 42 7f 00 00
0x7fffe0eb0890: (+56) 00 00 00 00 00 00 00 00
Possible candidate for ret address: 0x4006cc
in B()
Live Demo on coliru
Et voilà: in B().
Instead of assigning the address directly, the same could be achieved by storing a string with at least 40 chars into buffer (only 8 chars capacity):
#include <stdio.h>
#include <stdlib.h>
typedef unsigned char byte;
void B()
{
puts("in B()");
exit(0);
}
void A()
{
puts("in A()");
char buffer[8] = { 0x00, 0xde, 0xad, 0xbe, 0xef, 0x4a, 0x11, 0x00 };
byte *pI = (byte*)buffer;
// dump some bytes from stack
for (int i = 0; i < 64; ++i) {
if (!(i % 8)) printf("%p: (+%2d)", pI + i, i);
printf(" %02x", pI[i]);
if (i % 8 == 7) putchar('\n');
}
// provoke buffer overflow vulnerability
printf("Input: "); fflush(stdout);
fgets(buffer, 40, stdin); // <- intentionally wrong use
// show result
putchar('\n');
}
int main()
{
printf("&main(): %p, &A(): %p, &B(): %p\n", (void*)&main, (void*)&A, (void*)&B);
puts("call A():");
A();
return 0;
}
Compiled and executed with:
$ gcc -std=c11 -O0 main.c
$ echo -e " \xa2\x06\x40\0\0\0\0\0" | ./a.out
To input the exact sequence of bytes by keyboard might be a bit difficult. Copy/paste might work. I used echo and redirection to keep things simple.
Output:
&main(): 0x4007ba, &A(): 0x4006ba, &B(): 0x4006a2
call A():
in A()
0x7ffd1700bac8: (+ 0) 00 de ad be ef 4a 11 00
0x7ffd1700bad0: (+ 8) c8 ba 00 17 fd 7f 00 00
0x7ffd1700bad8: (+16) f0 ba 00 17 14 00 00 00
0x7ffd1700bae0: (+24) f0 ba 00 17 fd 7f 00 00
0x7ffd1700bae8: (+32) f0 07 40 00 00 00 00 00
0x7ffd1700baf0: (+40) 00 08 40 00 00 00 00 00
0x7ffd1700baf8: (+48) 30 48 37 0f 5b 7f 00 00
0x7ffd1700bb00: (+56) 00 00 00 00 00 00 00 00
Input:
in B()
Live Demo on coliru
Please, note that the input of 32 spaces (to align the return address "\xa2\x06\x40\0\0\0\0\0" to the intended offset) "destroys" all the internals of A() which are stored in this range. This might have fatal consequences for the stability of the process but, eventually, it's intact enough to reach B() and report that to console.