Buffer overflow, stack pointer manipulation using GDB - c

I have a simple problem in C which may be solvable using GDB, but I am not able to solve it.
We have a main() function which calls another function, say A(). When function A() executes and returns, instead of returning to main() it goes to another function, say B().
I don't know what to do in A() so that the return address changes.

Assuming the OP wants to force a return from A() to B() instead of to main(), from where A() was called...
I always believed I knew how this might happen but never tried it myself. So, I couldn't resist fiddling a bit.
Manipulating the return address can hardly be done portably, as it exploits details of the generated code which may depend on compiler version, compiler settings, platform, and so on.
At first, I tried to find out some details about coliru, which I planned to use for fiddling:
#include <stdio.h>
int main()
{
printf("sizeof (void*): %zu\n", sizeof (void*));
printf("sizeof (void*) == sizeof (void(*)()): %s\n",
sizeof (void*) == sizeof (void(*)()) ? "yes" : "no");
return 0;
}
Output:
gcc (GCC) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
sizeof (void*): 8
sizeof (void*) == sizeof (void(*)()): yes
Live Demo on coliru
Next, I made a minimal sample to get an impression about the code which will be generated:
Source code:
#include <stdio.h>
void B()
{
puts("in B()");
}
void A()
{
puts("in A()");
}
int main()
{
puts("call A():");
A();
return 0;
}
Compiled with x86-64 gcc 8.2 and -O0:
.LC0:
.string "in B()"
B:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
nop
pop rbp
ret
.LC1:
.string "in A()"
A:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC1
call puts
nop
pop rbp
ret
.LC2:
.string "call A():"
main:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC2
call puts
mov eax, 0
call A
mov eax, 0
pop rbp
ret
Live Explore on godbolt
On Intel x86/x64:
call stores the return address on the stack before jumping to the given address
ret pops the return address from the stack back into the instruction pointer (rip).
(Other CPUs might do this differently.)
Additionally, the
push rbp
mov rbp, rsp
is interesting, as push stores something on the stack as well, while rsp is the register holding the current stack top address and rbp is its companion, which is usually used for relative addressing of local variables.
So, a local variable (which is addressed relative to rbp, if not optimized) might have a fixed offset to the return address on the stack.
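The relationship between rbp and the return address can also be checked without dumping raw stack bytes. The following is a minimal sketch, assuming a GCC-compatible compiler (the builtins __builtin_frame_address and __builtin_return_address are GCC/Clang extensions), x86-64, and a frame pointer that has not been optimized away. The function names my_return_address and show_frame are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the address this function will return to,
 * as reported by the compiler builtin (GCC/Clang extension). */
__attribute__((noinline))
void *my_return_address(void)
{
    return __builtin_return_address(0);
}

/* Prints the frame pointer, the stack slot right above it, and the return
 * address reported by the compiler.
 * Assumption: with the usual "push rbp; mov rbp, rsp" prologue, the return
 * address sits one pointer-sized slot above the saved rbp. */
__attribute__((noinline))
void show_frame(void)
{
    void **frame = __builtin_frame_address(0);
    void *ret = __builtin_return_address(0);
    assert(frame != NULL);
    printf("frame pointer (rbp):       %p\n", (void *)frame);
    printf("slot above saved rbp:      %p\n", frame[1]);
    printf("compiler-reported return:  %p\n", ret);
}
```

Calling show_frame() from main() at -O0 should print the same value on the last two lines. If it does not, the assumption about the frame layout does not hold for that compiler or platform.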
So, I added some code to the first sample to get a closer look:
#include <stdio.h>
typedef unsigned char byte;
void B()
{
puts("in B()");
}
void A()
{
puts("in A()");
char buffer[8] = { 0x00, 0xde, 0xad, 0xbe, 0xef, 0x4a, 0x11, 0x00 };
byte *pI = (byte*)buffer;
// dump some bytes from stack
for (int i = 0; i < 64; ++i) {
if (!(i % 8)) printf("%p: (+%2d)", pI + i, i);
printf(" %02x", pI[i]);
if (i % 8 == 7) putchar('\n');
}
}
int main()
{
printf("&main(): %p, &A(): %p, &B(): %p\n", (void*)&main, (void*)&A, (void*)&B);
puts("call A():");
A();
return 0;
}
Output:
&main(): 0x400613, &A(): 0x400553, &B(): 0x400542
call A():
in A()
0x7ffcdedc9738: (+ 0) 00 de ad be ef 4a 11 00
0x7ffcdedc9740: (+ 8) 38 97 dc de fc 7f 00 00
0x7ffcdedc9748: (+16) 60 97 dc de 14 00 00 00
0x7ffcdedc9750: (+24) 60 97 dc de fc 7f 00 00
0x7ffcdedc9758: (+32) 49 06 40 00 00 00 00 00
0x7ffcdedc9760: (+40) 50 06 40 00 00 00 00 00
0x7ffcdedc9768: (+48) 30 48 4a f3 3e 7f 00 00
0x7ffcdedc9770: (+56) 00 00 00 00 00 00 00 00
Live Demo on coliru
This is what I read from this:
0x7ffcdedc9738: (+ 0) 00 de ad be ef 4a 11 00 # local var. buffer
0x7ffcdedc9740: (+ 8) 38 97 dc de fc 7f 00 00 # local var. pI (with address of buffer)
0x7ffcdedc9748: (+16) 60 97 dc de 14 00 00 00 # local var. i (4 bytes)
0x7ffcdedc9750: (+24) 60 97 dc de fc 7f 00 00 # pushed rbp
0x7ffcdedc9758: (+32) 49 06 40 00 00 00 00 00 # 0x400649 <- Aha!
0x400649 is an address slightly higher than the address of main() (0x400613). Considering that there is some code in main() prior to the call of A(), this makes perfect sense.
So, if I want to manipulate the return address this has to happen at pI + 32:
#include <stdio.h>
#include <stdlib.h>
typedef unsigned char byte;
void B()
{
puts("in B()");
exit(0);
}
void A()
{
puts("in A()");
char buffer[8] = { 0x00, 0xde, 0xad, 0xbe, 0xef, 0x4a, 0x11, 0x00 };
byte *pI = (byte*)buffer;
// dump some bytes from stack
for (int i = 0; i < 64; ++i) {
if (!(i % 8)) printf("%p: (+%2d)", pI + i, i);
printf(" %02x", pI[i]);
if (i % 8 == 7) putchar('\n');
}
printf("Possible candidate for ret address: %p\n", *(void**)(pI + 32));
*(void**)(pI + 32) = (byte*)&B;
}
int main()
{
printf("&main(): %p, &A(): %p, &B(): %p\n", (void*)&main, (void*)&A, (void*)&B);
puts("call A():");
A();
return 0;
}
I.e. I "patch" the address of function B() as the return address into the stack.
Output:
&main(): 0x400696, &A(): 0x4005aa, &B(): 0x400592
call A():
in A()
0x7fffe0eb0858: (+ 0) 00 de ad be ef 4a 11 00
0x7fffe0eb0860: (+ 8) 58 08 eb e0 ff 7f 00 00
0x7fffe0eb0868: (+16) 80 08 eb e0 14 00 00 00
0x7fffe0eb0870: (+24) 80 08 eb e0 ff 7f 00 00
0x7fffe0eb0878: (+32) cc 06 40 00 00 00 00 00
0x7fffe0eb0880: (+40) e0 06 40 00 00 00 00 00
0x7fffe0eb0888: (+48) 30 c8 41 84 42 7f 00 00
0x7fffe0eb0890: (+56) 00 00 00 00 00 00 00 00
Possible candidate for ret address: 0x4006cc
in B()
Live Demo on coliru
Et voilà: in B().
Instead of assigning the address directly, the same could be achieved by storing a string with at least 40 chars into buffer (only 8 chars capacity):
#include <stdio.h>
#include <stdlib.h>
typedef unsigned char byte;
void B()
{
puts("in B()");
exit(0);
}
void A()
{
puts("in A()");
char buffer[8] = { 0x00, 0xde, 0xad, 0xbe, 0xef, 0x4a, 0x11, 0x00 };
byte *pI = (byte*)buffer;
// dump some bytes from stack
for (int i = 0; i < 64; ++i) {
if (!(i % 8)) printf("%p: (+%2d)", pI + i, i);
printf(" %02x", pI[i]);
if (i % 8 == 7) putchar('\n');
}
// provoke buffer overflow vulnerability
printf("Input: "); fflush(stdout);
fgets(buffer, 40, stdin); // <- intentionally wrong use
// show result
putchar('\n');
}
int main()
{
printf("&main(): %p, &A(): %p, &B(): %p\n", (void*)&main, (void*)&A, (void*)&B);
puts("call A():");
A();
return 0;
}
Compiled and executed with:
$ gcc -std=c11 -O0 main.c
$ echo -e "                                \xa2\x06\x40\0\0\0\0\0" | ./a.out
Typing the exact sequence of bytes on a keyboard might be a bit difficult. Copy/paste might work. I used echo and a pipe to keep things simple.
Output:
&main(): 0x4007ba, &A(): 0x4006ba, &B(): 0x4006a2
call A():
in A()
0x7ffd1700bac8: (+ 0) 00 de ad be ef 4a 11 00
0x7ffd1700bad0: (+ 8) c8 ba 00 17 fd 7f 00 00
0x7ffd1700bad8: (+16) f0 ba 00 17 14 00 00 00
0x7ffd1700bae0: (+24) f0 ba 00 17 fd 7f 00 00
0x7ffd1700bae8: (+32) f0 07 40 00 00 00 00 00
0x7ffd1700baf0: (+40) 00 08 40 00 00 00 00 00
0x7ffd1700baf8: (+48) 30 48 37 0f 5b 7f 00 00
0x7ffd1700bb00: (+56) 00 00 00 00 00 00 00 00
Input:
in B()
Live Demo on coliru
Please note that the input of 32 spaces (to align the return address "\xa2\x06\x40\0\0\0\0\0" to the intended offset) "destroys" all the internals of A() which are stored in this range. This might have fatal consequences for the stability of the process but, in the end, it stays intact enough to reach B() and report that to the console.
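The byte layout of that input (padding up to the return address slot, then the 8 address bytes in little-endian order) can be sketched in C as well. This is only an illustration, assuming x86-64 and the offset of 32 observed in the dump above. The helper name build_payload is hypothetical, and the address of B() has to be taken from the actual run, since it differs from build to build.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills `out` with `pad_len` spaces followed by the
 * 8 bytes of `addr` in little-endian order (the way x86-64 stores
 * pointers in memory).
 * Returns the number of bytes written, or 0 if `cap` is too small. */
size_t build_payload(unsigned char *out, size_t cap, size_t pad_len, uint64_t addr)
{
    assert(out != NULL);
    if (pad_len > cap || cap - pad_len < 8)
        return 0;
    memset(out, ' ', pad_len);
    for (size_t i = 0; i < 8; ++i)
        out[pad_len + i] = (unsigned char)(addr >> (8 * i));
    return pad_len + 8;
}
```

With pad_len = 32 and addr = 0x4006a2 this yields the same 40 bytes as the echo command above. Whether the overwrite actually redirects the return still depends on the build: options such as stack protectors or position-independent executables are likely to change the outcome.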

Related

GCC produces an empty _start function

I'm running GCC Alpine 9.3.0 in Docker Desktop Community 2.4.0.0 (Docker engine 19.03.13). Every now and then, GCC will build an executable that throws a segmentation fault. The segmentation fault occurs at the very beginning of the program's execution. Whenever this happens, I just recompile the code without changing anything in it or my Makefile, and that fixes the issue.
Examining the executable with objdump --disassemble --disassemble-zeroes --full-contents I noticed that whenever I get a segmentation fault the _start and _start_c functions are empty:
0000000000001068 <_start>:
1068: 00 00 add %al,(%rax)
106a: 00 00 add %al,(%rax)
106c: 00 00 add %al,(%rax)
106e: 00 00 add %al,(%rax)
1070: 00 00 add %al,(%rax)
1072: 00 00 add %al,(%rax)
1074: 00 00 add %al,(%rax)
1076: 00 00 add %al,(%rax)
1078: 00 00 add %al,(%rax)
107a: 00 00 add %al,(%rax)
107c: 00 00 add %al,(%rax)
000000000000107e <_start_c>:
107e: 00 00 add %al,(%rax)
1080: 00 00 add %al,(%rax)
1082: 00 00 add %al,(%rax)
1084: 00 00 add %al,(%rax)
1086: 00 00 add %al,(%rax)
1088: 00 00 add %al,(%rax)
108a: 00 00 add %al,(%rax)
108c: 00 00 add %al,(%rax)
108e: 00 00 add %al,(%rax)
1090: 00 00 add %al,(%rax)
1092: 00 00 add %al,(%rax)
1094: 00 00 add %al,(%rax)
1096: 00 00 add %al,(%rax)
1098: 00 00 add %al,(%rax)
109a: 00 00 add %al,(%rax)
109c: 00 00 add %al,(%rax)
109e: 00 00 add %al,(%rax)
10a0: 00 00 add %al,(%rax)
Compare that to the assembly when the executable does work:
0000000000001068 <_start>:
1068: 48 31 ed xor %rbp,%rbp
106b: 48 89 e7 mov %rsp,%rdi
106e: 48 8d 35 b3 2d 00 00 lea 0x2db3(%rip),%rsi # 3e28 <_DYNAMIC>
1075: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
1079: e8 00 00 00 00 callq 107e <_start_c>
000000000000107e <_start_c>:
107e: 48 8b 37 mov (%rdi),%rsi
1081: 48 8d 57 08 lea 0x8(%rdi),%rdx
1085: 45 31 c9 xor %r9d,%r9d
1088: 4c 8d 05 d1 03 00 00 lea 0x3d1(%rip),%r8 # 1460 <_fini>
108f: 48 8d 0d 6a ff ff ff lea -0x96(%rip),%rcx # 1000 <_init>
1096: 48 8d 3d d7 01 00 00 lea 0x1d7(%rip),%rdi # 1274 <main>
109d: e9 9e ff ff ff jmpq 1040 <__libc_start_main@plt>
Questions
What might be causing this? Is there a way I could prevent it?
Update
I tried to compile the same code on my Mac (without using Docker). I ran a script that compiled and ran the code 1000 times and it worked every single time. If I try to compile the code in Docker it fails after the fourth or fifth run (sometimes it even failed during the first try).
I also tried to compile the same code using an Ubuntu 18.04 container with glibc 2.27 installed, but I observed the same issues as I did in Alpine, so it doesn't seem to be a musl- or Alpine-specific problem.
The problem does not seem to be related to GCC; I used clang 6.0.0-1ubuntu2 inside the Ubuntu Docker container and observed the same issue.
To rule out any hardware issues I used an AWS Ubuntu 20.04.1 virtual machine to run Docker 19.03.13 on top of it and compile the same code. Here I didn't observe any issues. Therefore, it seems that the issue is somehow limited to my Macbook running Docker, but I haven't been able to prove that it's a hardware issue.
Minimal reproducible example
I tried to reduce the code to a simpler example, but for some reason the problem then no longer appears (or at least I haven't seen it). So this is the smallest example that (sometimes) results in a segmentation fault:
ex1.c
#include <stdio.h>
void DoNothing(int x) {
printf("I'm inside %s\n", __FUNCTION__);
printf("%d is stored at %p\n", x, &x);
x += 1;
printf("x is now %d\n", x);
}
void DoSomething(int* p) {
printf("I'm inside %s\n", __FUNCTION__);
printf("%d is stored at %p\n", *p, p);
*p += 1;
printf("x is now %d\n", *p);
}
int main()
{
int x = 101;
int* p = &x;
printf("%d is stored at address %p\n", x, &x);
printf("%d is stored at address %p\n", *p, p);
printf("%d is stored at address %p\n"
" which is the same as %p\n", x, p, &x);
printf("%d is stored at %p\n", x, &x);
DoNothing(x);
printf("but here in %s, x is %d\n", __FUNCTION__, x);
DoSomething(p);
printf("now back in %s, x is %d\n", __FUNCTION__, x);
x = 101;
int** q = &p;
printf("x equals %d == %d == %d\n", x, *p, **q);
printf("x is at %p == %p == %p\n", &x, p, *q);
printf("p equals %p == %p == %p\n", &x, p, *q);
printf("p is at %p == %p\n", &p, q);
printf("q equals %p == %p\n", &p, q);
printf("q is at %p\n", &q);
return 0;
}
Makefile
.PHONY: clean
CFLAGS=-Wall -Werror -Wextra -std=c99 -g
all:
@mkdir -p build
gcc $(CFLAGS) -o build/ex ex1.c
clean:
rm -f build/*
Dockerfile
FROM alpine:3.12.0
RUN apk add \
cmake=3.17.2-r0 \
g++=9.3.0-r2 \
gcc=9.3.0-r2 \
gdb=9.2-r0 \
libc-dev=0.7.2-r3 \
make=4.3-r0
run.sh
docker run \
--rm \
-i \
-t \
-v $(pwd):/home \
-w '/home' \
ccompiler:$VERSION
To trigger the error, I use this script that compiles the code up to 1000 times:
keep_building.sh
#!/bin/bash
for i in {1..1000};
do
echo $i
make
./build/sample
if [ $? -ne 0 ];
then
echo $i
exit 1
fi
#sleep 5
done
The file you posted is badly corrupted. Not just _start, but the whole program headers table, needed to load and execute the ELF file, has been overwritten with zeros. I strongly suspect you have bad RAM, an overclocked CPU, or some other hardware fault causing this kind of corruption; there is not likely any software-level explanation for it.
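To catch such a corrupted build before running it, the build script could check that the program header table is not all zeros. The following is a rough sketch, assuming a 64-bit ELF file already read into memory and a Linux system that provides <elf.h>. The function names all_zero and phdr_table_intact are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if every byte in [p, p + n) is zero. */
static int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical check: returns 1 if the image has the ELF magic and a
 * program header table containing non-zero bytes, 0 if the table is
 * missing, out of range, or zeroed, and -1 if the image is not a
 * plausible 64-bit ELF file at all. */
int phdr_table_intact(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;
    assert(image != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);  /* copy to avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    size_t table = (size_t)eh.e_phentsize * eh.e_phnum;
    if (table == 0 || eh.e_phoff > len || table > len - eh.e_phoff)
        return 0;
    return !all_zero(image + eh.e_phoff, table);
}
```

This only detects the symptom; it says nothing about why the bytes were zeroed.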

Compare 2 GDB-Core Dumps

I'm in serious trouble with a heap/stack corruption. To be able to set a data breakpoint and find the root of the problem, I want to take two core dumps using gdb and then compare them.
The first one when I think the heap and stack are still OK, and the second one shortly before my program crashes.
How can I compare those dumps?
Information about my project:
using gcc 5.x
Plugin for a legacy, 3rd-party-program with RT-support. No sources available for the project (for me).
Legacy Project is C, My Plugin is C++.
Other things i tried:
Using address sanitizers -> won't work because the legacy program won't start with them.
Using undefined behavior sanitizers -> same
Figuring out what memory gets corrupted for a data breakpoint -> no success, because the corrupted memory does not belong to my code.
Ran Valgrind -> no errors around my code.
Thank you for your help
Independent of your underlying motivation, I'd like to address your question as asked. You ask how the difference between two core dumps can be identified. This is going to be lengthy, but will hopefully give you your answer.
A core dump is represented by an ELF file that contains metadata and a specific set of memory regions (on Linux, this can be controlled via /proc/[pid]/coredump_filter) that were mapped into the given process at the time of dump creation.
The obvious way to compare the dumps would be to compare a hex-representation:
$ diff -u <(hexdump -C dump1) <(hexdump -C dump2)
--- /dev/fd/63 2020-05-17 10:01:40.370524170 +0000
+++ /dev/fd/62 2020-05-17 10:01:40.370524170 +0000
@@ -90,8 +90,9 @@
000005e0 00 00 00 00 00 00 00 00 00 00 00 00 80 1f 00 00 |................|
000005f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The result is rarely useful because you're missing the context. More specifically, there's no straightforward way to get from the offset of a value change in the file to the offset corresponding to the process virtual memory address space.
So, more context is needed. The optimal output would be a list of VM addresses including before and after values.
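One way to recover that context for a single file offset is to look it up in the PT_LOAD program headers of the core file, which record the file offset (p_offset), the virtual address (p_vaddr) and the dumped size (p_filesz) of each memory region. The following is a minimal sketch, assuming a 64-bit ELF core and a program header array that has already been read. The function name file_offset_to_vaddr is made up for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translates a file offset inside a core dump to the
 * virtual address it was dumped from, using the PT_LOAD program headers.
 * Returns 1 and stores the address in *vaddr on success, or 0 if the
 * offset is not backed by any PT_LOAD segment. */
int file_offset_to_vaddr(const Elf64_Phdr *phdrs, size_t count,
                         uint64_t offset, uint64_t *vaddr)
{
    assert(phdrs != NULL && vaddr != NULL);
    for (size_t i = 0; i < count; ++i) {
        const Elf64_Phdr *ph = &phdrs[i];
        if (ph->p_type != PT_LOAD)
            continue;  /* notes and other segments carry no process memory */
        if (offset >= ph->p_offset && offset - ph->p_offset < ph->p_filesz) {
            *vaddr = ph->p_vaddr + (offset - ph->p_offset);
            return 1;
        }
    }
    return 0;
}
```

Applied to every changed line of the hexdump diff, this would give the address list described above, although doing the comparison per memory region is more convenient.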
Before we can get on that, we need a test scenario that loosely resembles yours. The following application includes a use-after-free memory issue that does not lead to a segmentation fault at first (a new allocation with the same size hides the issue). The idea here is to create a core dump using gdb (generate) during each phase based on break points triggered by the code:
dump1: Correct state
dump2: Incorrect state, no segmentation fault
dump3: Segmentation fault
The code:
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
int **g_state;
int main()
{
int value = 1;
g_state = malloc(sizeof(int*));
*g_state = &value;
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
printf("no corruption\n");
raise(SIGTRAP);
free(g_state);
char **unrelated = malloc(sizeof(int*));
*unrelated = "val";
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
printf("use-after-free hidden by new allocation (invalid value)\n");
raise(SIGTRAP);
printf("use-after-free (segfault)\n");
free(unrelated);
int *unrelated2 = malloc(sizeof(intptr_t));
*unrelated2 = 1;
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
return 0;
}
Now, the dumps can be generated:
Starting program: test
state: 1
no corruption
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump1
Saved corefile dump1
(gdb) cont
Continuing.
state: 7102838
use-after-free hidden by new allocation (invalid value)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump2
Saved corefile dump2
(gdb) cont
Continuing.
use-after-free (segfault)
Program received signal SIGSEGV, Segmentation fault.
main () at test.c:31
31 printf("state: %d\n", **g_state);
(gdb) generate dump3
Saved corefile dump3
A quick manual inspection shows the relevant differences:
# dump1
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x7fffffffe2bc
# dump2
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x4008c1
# dump3
$2 = (int **) 0x602260
(gdb) print *g_state
$3 = (int *) 0x1
Based on that output, we can clearly see that *g_state changed but is still a valid pointer in dump2. In dump3, the pointer becomes invalid. Of course, we'd like to automate this comparison.
Knowing that a core dump is an ELF file, we can simply parse it and generate a diff ourselves. What we'll do:
Open a dump
Identify PROGBITS sections of the dump
Remember the data and address information
Repeat the process with the second dump
Compare the two data sets and print the diff
Based on elf.h, it's relatively easy to parse ELF files. I created a sample implementation that compares two dumps and prints a diff that is similar to comparing two hexdump outputs using diff. The sample makes some assumptions (x86_64, mappings either match in terms of address and size or they only exist in dump1 or dump2), omits most error handling and always chooses a simple implementation approach for the sake of brevity.
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#define MAX_MAPPINGS 1024
struct dump
{
char *base;
Elf64_Shdr *mappings[MAX_MAPPINGS];
};
unsigned readdump(const char *path, struct dump *dump)
{
unsigned count = 0;
int fd = open(path, O_RDONLY);
if (fd != -1) {
struct stat stat;
fstat(fd, &stat);
dump->base = mmap(NULL, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
Elf64_Ehdr *header = (Elf64_Ehdr *)dump->base;
Elf64_Shdr *secs = (Elf64_Shdr*)(dump->base+header->e_shoff);
for (unsigned secinx = 0; secinx < header->e_shnum; secinx++) {
if (secs[secinx].sh_type == SHT_PROGBITS) {
if (count == MAX_MAPPINGS) {
count = 0;
break;
}
dump->mappings[count] = &secs[secinx];
count++;
}
}
dump->mappings[count] = NULL;
}
return count;
}
#define DIFFWINDOW 16
void printsection(struct dump *dump, Elf64_Shdr *sec, const char mode,
unsigned offset, unsigned sizelimit)
{
unsigned char *data = (unsigned char *)(dump->base+sec->sh_offset);
uintptr_t addr = sec->sh_addr+offset;
unsigned size = sec->sh_size;
data += offset;
if (sizelimit) {
size = sizelimit;
}
unsigned start = 0;
for (unsigned i = 0; i < size; i++) {
if (i%DIFFWINDOW == 0) {
printf("%c%016lx ", mode, (unsigned long)(addr+i)); // address of this output line
start = i;
}
printf(" %02x", data[i]);
if ((i+1)%DIFFWINDOW == 0 || i + 1 == size) {
printf(" [");
for (unsigned j = start; j <= i; j++) {
putchar((data[j] >= 32 && data[j] < 127)?data[j]:'.');
}
printf("]\n");
}
// addr stays fixed: addr+i above already advances with i (an extra addr++ here would count twice)
}
}
void printdiff(struct dump *dump1, Elf64_Shdr *sec1,
struct dump *dump2, Elf64_Shdr *sec2)
{
unsigned char *data1 = (unsigned char *)(dump1->base+sec1->sh_offset);
unsigned char *data2 = (unsigned char *)(dump2->base+sec2->sh_offset);
unsigned difffound = 0;
unsigned start = 0;
for (unsigned i = 0; i < sec1->sh_size; i++) {
if (i%DIFFWINDOW == 0) {
start = i;
difffound = 0;
}
if (!difffound && data1[i] != data2[i]) {
difffound = 1;
}
if ((i+1)%DIFFWINDOW == 0 || i + 1 == sec1->sh_size) {
if (difffound) {
printsection(dump1, sec1, '-', start, DIFFWINDOW);
printsection(dump2, sec2, '+', start, DIFFWINDOW);
}
}
}
}
int main(int argc, char **argv)
{
if (argc != 3) {
fprintf(stderr, "Usage: compare DUMP1 DUMP2\n");
return 1;
}
struct dump dump1;
struct dump dump2;
if (readdump(argv[1], &dump1) == 0 ||
readdump(argv[2], &dump2) == 0) {
fprintf(stderr, "Failed to read dumps\n");
return 1;
}
unsigned sinx1 = 0;
unsigned sinx2 = 0;
while (dump1.mappings[sinx1] || dump2.mappings[sinx2]) {
Elf64_Shdr *sec1 = dump1.mappings[sinx1];
Elf64_Shdr *sec2 = dump2.mappings[sinx2];
if (sec1 && sec2) {
if (sec1->sh_addr == sec2->sh_addr) {
// in both
printdiff(&dump1, sec1, &dump2, sec2);
sinx1++;
sinx2++;
}
else if (sec1->sh_addr < sec2->sh_addr) {
// in 1, not 2
printsection(&dump1, sec1, '-', 0, 0);
sinx1++;
}
else {
// in 2, not 1
printsection(&dump2, sec2, '+', 0, 0);
sinx2++;
}
}
else if (sec1) {
// in 1, not 2
printsection(&dump1, sec1, '-', 0, 0);
sinx1++;
}
else {
// in 2, not 1
printsection(&dump2, sec2, '+', 0, 0);
sinx2++;
}
}
return 0;
}
With the sample implementation, we can re-evaluate our scenario above. An excerpt from the first diff:
$ ./compare dump1 dump2
-0000000000601020 86 05 40 00 00 00 00 00 50 3e a8 f7 ff 7f 00 00 [..@.....P>......]
+0000000000601020 00 6f a9 f7 ff 7f 00 00 50 3e a8 f7 ff 7f 00 00 [.o......P>......]
-0000000000602260 bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]
-0000000000602280 6e 6f 20 63 6f 72 72 75 70 74 69 6f 6e 0a 00 00 [no corruption...]
+0000000000602280 75 73 65 2d 61 66 74 65 72 2d 66 72 65 65 20 68 [use-after-free h]
-0000000000602290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602290 69 64 64 65 6e 20 62 79 20 6e 65 77 20 61 6c 6c [idden by new all]
The diff shows that *g_state (address 0x602260) was changed from 0x7fffffffe2bc to 0x4008c1:
-0000000000602260 bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]
The second diff with only the relevant offset:
$ ./compare dump2 dump3
-0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]
+0000000000602260 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
The diff shows that *g_state (address 0x602260) was changed from 0x4008c1 to 0x1.
There you have it, a core dump diff. Now, whether or not that can prove to be useful in your scenario depends on various factors, one being the timeframe between the two dumps and the activity that takes place within that window. A large diff will possibly be difficult to analyze, so the aim must be to minimize its size by choosing the diff window carefully.
The more context you have, the easier the analysis will turn out to be. For example, the relevant scope of the diff could be reduced by limiting the diff to addresses of the .data and .bss sections of the library in question if changes in there are relevant to your situation.
Another approach to reduce the scope: excluding changes to memory that is not referenced by the library. The relationship between arbitrary heap allocations and specific libraries is not immediately apparent. Based on the addresses of changes in your initial diff, you could search for pointers in the .data and .bss sections of the library right in the diff implementation. This does not take every possible reference into account (most notably indirect references from other allocations, and register and stack references of library-owned threads), but it's a start.
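Such a pointer search could be sketched as follows, assuming a 64-bit target, a dump with the same byte order as the machine running the tool, and that the bytes of the library's .data or .bss section have already been located. The function name count_pointers_into is hypothetical and not part of the sample implementation above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: counts the 8-byte words in `region` (stepping in
 * 8-byte strides from its start) whose value lies in [lo, hi), i.e. that
 * look like pointers into that address range. */
size_t count_pointers_into(const unsigned char *region, size_t len,
                           uint64_t lo, uint64_t hi)
{
    size_t hits = 0;
    assert(region != NULL);
    for (size_t off = 0; off + sizeof(uint64_t) <= len; off += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, region + off, sizeof word);  /* alignment-safe load */
        if (word >= lo && word < hi)
            hits++;
    }
    return hits;
}
```

Here lo and hi would be the bounds of a changed region from the diff. A hit is only a candidate: any integer that happens to fall into the range is counted as well.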

Memory alignment and padding — difference between 32 and 64 bits [duplicate]

I would like to understand the results obtained with "gcc -m32" and "gcc -m64" for the following small program:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main() {
struct MixedData
{
char Data1;
short Data2;
int Data3;
char Data4;
};
struct X {
char c;
uint64_t x;
};
printf("size of struct MixedData = %zu\n", sizeof(struct MixedData));
printf("size of struct X = %zu\n", sizeof(struct X));
printf("size of uint64_t = %zu\n", sizeof(uint64_t));
return 0;
}
With "gcc -m32", the output is:
size of struct MixedData = 12
size of struct X = 12
size of uint64_t = 8
Is size of struct X equal to 12 because compiler sets the following padding?
struct X {
char c; // 1 byte
char d[3]; // 3 bytes
uint64_t x; // 8 bytes
};
If this is the case, what's the size of a single word with 32 bits compilation (4 bytes?)? If it is equal to 4 bytes, this would be consistent because 12 is a multiple of 4.
Now concerning the size of MixedData with "gcc -m32", I get "size of struct MixedData = 12". I don't understand this value because I read that the total size of a structure has to be a multiple of the size of its biggest member. For example, in MixedData the biggest member is int Data3 with sizeof(Data3) = 4 bytes; why don't we get the following padding instead:
struct MixedData
{
char Data1; // 1 byte
char Datatemp1[3]; // 3 bytes
short Data2; // 2 bytes
short Data2temp; // 2 bytes
int Data3; // 4 bytes
char Data4; // 1 byte
char Data4temp[3]; // 3 bytes
};
So the total size of struct MixedData would be equal to 16 bytes and not 12 bytes like I get.
Can anyone see what's wrong about these 2 interpretations?
A similar issue is about "gcc -m64" compilation; the output is:
size of struct MixedData = 12
size of struct X = 16
size of uint64_t = 8
The size of struct X (16 bytes) seems to be consistent because I think that compiler in 64 bits mode sets the following padding:
struct X {
char c; // 1 byte
char d[7]; // 7 bytes
uint64_t x; // 8 bytes
};
But I don't understand the size of struct MixedData (12 bytes). I don't know how the compiler sets the padding in this case, because 12 is not a multiple of the memory word in 64-bit mode (supposing this one is equal to 8 bytes). Could you tell me the padding generated by "gcc -m64" in this last case (for struct MixedData)?
This one is a curiosity
struct
{
char Data1;
short Data2;
int Data3;
char Data4;
} x;
unsigned fun ( void )
{
x.Data1=1;
x.Data2=2;
x.Data3=3;
x.Data4=4;
return(sizeof(x));
}
Compiling then disassembling
64
0000000000000000 <fun>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c6 05 00 00 00 00 01 movb $0x1,0x0(%rip) # b <fun+0xb>
b: 66 c7 05 00 00 00 00 movw $0x2,0x0(%rip) # 14 <fun+0x14>
12: 02 00
14: c7 05 00 00 00 00 03 movl $0x3,0x0(%rip) # 1e <fun+0x1e>
1b: 00 00 00
1e: c6 05 00 00 00 00 04 movb $0x4,0x0(%rip) # 25 <fun+0x25>
25: b8 0c 00 00 00 mov $0xc,%eax
2a: 5d pop %rbp
2b: c3 retq
32
00000000 <fun>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: c6 05 00 00 00 00 01 movb $0x1,0x0
a: 66 c7 05 02 00 00 00 movw $0x2,0x2
11: 02 00
13: c7 05 04 00 00 00 03 movl $0x3,0x4
1a: 00 00 00
1d: c6 05 08 00 00 00 04 movb $0x4,0x8
24: b8 0c 00 00 00 mov $0xc,%eax
29: 5d pop %ebp
2a: c3 ret
Understand that -m32 and -m64 are perhaps poorly described: one basically targets the 32-bit processor with 32-bit registers (ebx, eax, ax, ah, but not rbx, rax) and the other the 64-bit processor with 64-bit registers (rbx, ebx, bx, bh, bl).
There doesn't have to be a connection between the size or construction of structs and the instruction set chosen.
The interesting thing here is that the members add up to 1+2+4+1 = 8 bytes, so they could have done it in 8 bytes. Now, they probably wanted the int aligned, so that would pad it by a byte, and perhaps they wanted the whole thing aligned on a 32-bit boundary, adding 3 more, so that is probably what happened. The 32-bit code makes this a bit clearer: not only did they align the int, they also aligned the short. So they pad between Data1 and Data2 to align Data2 on a 16-bit boundary, which then leaves Data3 aligned on a 32-bit boundary, and Data4 is a byte so it can't be unaligned. The end is padded so that the next thing (for example the next element in an array of these structs) stays aligned.
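The layout can also be read off directly with offsetof and _Alignof instead of from the disassembly. The following is a small sketch, assuming a C11 compiler and the usual sizes of 2 bytes for short and 4 bytes for int. The function name print_mixeddata_layout is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct MixedData {
    char  Data1;
    short Data2;
    int   Data3;
    char  Data4;
};

/* The size of a struct is always a multiple of its own alignment,
 * which is what forces the tail padding after Data4. */
static_assert(sizeof(struct MixedData) % _Alignof(struct MixedData) == 0,
              "struct size must be a multiple of its alignment");

/* Prints where each member lands and how the whole struct is aligned. */
void print_mixeddata_layout(void)
{
    printf("Data1 at %zu, Data2 at %zu, Data3 at %zu, Data4 at %zu\n",
           offsetof(struct MixedData, Data1),
           offsetof(struct MixedData, Data2),
           offsetof(struct MixedData, Data3),
           offsetof(struct MixedData, Data4));
    printf("sizeof = %zu, alignment = %zu\n",
           sizeof(struct MixedData), _Alignof(struct MixedData));
}
```

The relevant quantity is the strictest alignment among the members, not the size of the biggest member or the machine word. That is also why struct X from the question can be 12 bytes with -m32: as far as I know, the 32-bit x86 System V ABI aligns an 8-byte integer inside a struct to only 4 bytes.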
The 64-bit code looks broken; perhaps they leave it to the linker to patch that one up.
00000000004004d6 <fun>:
4004d6: 55 push %rbp
4004d7: 48 89 e5 mov %rsp,%rbp
4004da: c6 05 57 0b 20 00 01 movb $0x1,0x200b57(%rip) # 601038 <x>
4004e1: 66 c7 05 50 0b 20 00 movw $0x2,0x200b50(%rip) # 60103a <x+0x2>
4004e8: 02 00
4004ea: c7 05 48 0b 20 00 03 movl $0x3,0x200b48(%rip) # 60103c <x+0x4>
4004f1: 00 00 00
4004f4: c6 05 45 0b 20 00 04 movb $0x4,0x200b45(%rip) # 601040 <x+0x8>
4004fb: b8 0c 00 00 00 mov $0xc,%eax
400500: 5d pop %rbp
400501: c3 retq
Ahh, I see, yes, that is what they were doing. And that is what they did: align both Data2 and Data3. I guess I should have made it generate the address of the whole struct...
struct
{
char Data1;
short Data2;
int Data3;
char Data4;
} x;
unsigned fun ( void )
{
unsigned long long z;
z=(unsigned long long)&x;
x.Data1=1;
x.Data2=2;
x.Data3=3;
x.Data4=4;
return(sizeof(x));
}
int main ( void )
{
fun();
}
producing
00000000004004d6 <fun>:
4004d6: 55 push %rbp
4004d7: 48 89 e5 mov %rsp,%rbp
4004da: 48 c7 45 f8 38 10 60 movq $0x601038,-0x8(%rbp)
4004e1: 00
4004e2: c6 05 4f 0b 20 00 01 movb $0x1,0x200b4f(%rip) # 601038 <x>
4004e9: 66 c7 05 48 0b 20 00 movw $0x2,0x200b48(%rip) # 60103a <x+0x2>
4004f0: 02 00
4004f2: c7 05 40 0b 20 00 03 movl $0x3,0x200b40(%rip) # 60103c <x+0x4>
4004f9: 00 00 00
4004fc: c6 05 3d 0b 20 00 04 movb $0x4,0x200b3d(%rip) # 601040 <x+0x8>
400503: b8 0c 00 00 00 mov $0xc,%eax
400508: 5d pop %rbp
400509: c3 retq
confirming the base address 0x601038.
The struct is not tied to the instruction set. Change to this
struct
{
char Data1;
short Data2;
int Data3;
char Data4;
} __attribute__((packed)) x;
unsigned fun ( void )
{
unsigned long long z;
z=(unsigned long long)&x;
x.Data1=1;
x.Data2=2;
x.Data3=3;
x.Data4=4;
return(sizeof(x));
}
int main ( void )
{
fun();
}
and we get this
00000000004004d6 <fun>:
4004d6: 55 push %rbp
4004d7: 48 89 e5 mov %rsp,%rbp
4004da: 48 c7 45 f8 38 10 60 movq $0x601038,-0x8(%rbp)
4004e1: 00
4004e2: c6 05 4f 0b 20 00 01 movb $0x1,0x200b4f(%rip) # 601038 <x>
4004e9: 66 c7 05 47 0b 20 00 movw $0x2,0x200b47(%rip) # 601039 <x+0x1>
4004f0: 02 00
4004f2: c7 05 3f 0b 20 00 03 movl $0x3,0x200b3f(%rip) # 60103b <x+0x3>
4004f9: 00 00 00
4004fc: c6 05 3c 0b 20 00 04 movb $0x4,0x200b3c(%rip) # 60103f <x+0x7>
400503: b8 08 00 00 00 mov $0x8,%eax
400508: 5d pop %rbp
400509: c3 retq
the size of the struct is now 8 bytes, and they generated unaligned accesses.
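The two layouts can be compared side by side without looking at the generated code. The following is a sketch, assuming GCC or Clang (__attribute__((packed)) is a compiler extension) and the usual sizes of 2 bytes for short and 4 bytes for int. The struct names Natural and Packed and the helper read_data3 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Default layout: members are padded to their natural alignment. */
struct Natural { char Data1; short Data2; int Data3; char Data4; };

/* Packed layout: no padding, so Data2 and Data3 end up unaligned. */
struct Packed  { char Data1; short Data2; int Data3; char Data4; } __attribute__((packed));

/* Reads Data3 out of a packed struct by copying its bytes, which avoids
 * forming an int pointer to an unaligned address. */
int read_data3(const struct Packed *p)
{
    int value;
    assert(p != NULL);
    memcpy(&value, (const unsigned char *)p + offsetof(struct Packed, Data3),
           sizeof value);
    return value;
}
```

Accessing p->Data3 directly also works, because the compiler knows the member is packed. The memcpy form matters once the address of the member would otherwise be passed around as a plain int pointer, which can fault on targets that do not tolerate unaligned loads.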

Is GCC's option -O2 breaking this small program or do I have undefined behavior [duplicate]

I found this problem in a very large application and have made an SSCCE from it. I don't know whether the code has undefined behavior or whether -O2 breaks it.
When compiling it with gcc a.c -o a.exe -O2 -Wall -Wextra -Werror it prints 5.
But it prints 25 when compiling without -O2 (e.g. -O1) or when uncommenting one of the 2 commented lines (to prevent inlining).
#include <stdio.h>
#include <stdlib.h>
// __attribute__((noinline))
int f(int* todos, int input) {
int* cur = todos-1; // fixes the ++ at the beginning of the loop
int result = input;
while(1) {
cur++;
int ch = *cur;
// printf("(%i)\n", ch);
switch(ch) {
case 0:;
goto end;
case 1:;
result = result*result;
break;
}
}
end:
return result;
}
int main() {
int todos[] = { 1, 0}; // 1:square, 0:end
int input = 5;
int result = f(todos, input);
printf("=%i\n", result);
printf("end\n");
return 0;
}
Is GCC's option -O2 breaking this small program or do I have undefined behavior somewhere?
int* cur = todos-1;
invokes undefined behavior. todos - 1 is an invalid pointer address.
Emphasis mine:
(C99, 6.5.6p8) "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
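A version of the loop that never forms a pointer before the start of the array could look like the following sketch. It keeps the logic of f() from the question (0 ends, 1 squares, anything else is skipped) but starts the cursor at todos and advances it at the end of each iteration. The name apply_todos is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same logic as f() from the question, without the todos-1 start value:
 * the cursor only ever points at elements of the array. */
int apply_todos(const int *todos, int input)
{
    int result = input;
    assert(todos != NULL);
    for (const int *cur = todos; *cur != 0; ++cur) {  /* 0: end */
        if (*cur == 1)
            result = result * result;                 /* 1: square */
    }
    return result;
}
```

Called with { 1, 0 } and 5 as in main() from the question, this returns 25 regardless of the optimization level, since no undefined behavior is involved.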
In supplement to @ouah's answer, this explains what the compiler is doing.
Generated assembler for reference:
400450: 48 83 ec 18 sub $0x18,%rsp
400454: be 05 00 00 00 mov $0x5,%esi
400459: 48 8d 44 24 fc lea -0x4(%rsp),%rax
40045e: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
400465: 00
400466: 48 83 c0 04 add $0x4,%rax
40046a: 8b 10 mov (%rax),%edx
However if I add a printf in main():
400450: 48 83 ec 18 sub $0x18,%rsp
400454: bf 84 06 40 00 mov $0x400684,%edi
400459: 31 c0 xor %eax,%eax
40045b: 48 89 e6 mov %rsp,%rsi
40045e: c7 04 24 01 00 00 00 movl $0x1,(%rsp)
400465: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
40046c: 00
40046d: e8 ae ff ff ff callq 400420 <printf@plt>
400472: 48 8d 44 24 fc lea -0x4(%rsp),%rax
400477: be 05 00 00 00 mov $0x5,%esi
40047c: 48 83 c0 04 add $0x4,%rax
400480: 8b 10 mov (%rax),%edx
Specifically (in the printf version), these two instructions populate the todos array:
40045e: c7 04 24 01 00 00 00 movl $0x1,(%rsp)
400465: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
This is conspicuously missing from the non-printf version, which for some reason only assigns the second element:
40045e: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)

Addressing of instruction pointer in Mac OS X x86-64

I wanted to understand a little more about assembly and wrote a little example:
#include <stdio.h>
#include <math.h>
void f() {
unsigned char i[4];
i[0] = 5;
i[1] = 6;
i[2] = 7;
i[3] = 8;
int j = 0;
for(j=0; j < 20; j++)
printf("%02X\n", i[j]);
}
int main() {
int i[5];
i[0] = 3;
i[1] = 3;
i[2] = 3;
i[3] = 3;
i[4] = 3;
f();
return 0;
}
My goal was to see the actual return address that the callq in main() pushes onto the stack when it calls f().
I used gdb to disassemble main() and got the following
Dump of assembler code for function main:
0x0000000100000eb0 <main+0>: push %rbp
0x0000000100000eb1 <main+1>: mov %rsp,%rbp
0x0000000100000eb4 <main+4>: sub $0x20,%rsp
0x0000000100000eb8 <main+8>: movl $0x3,-0x1c(%rbp)
0x0000000100000ebf <main+15>: movl $0x3,-0x18(%rbp)
0x0000000100000ec6 <main+22>: movl $0x3,-0x14(%rbp)
0x0000000100000ecd <main+29>: movl $0x3,-0x10(%rbp)
0x0000000100000ed4 <main+36>: movl $0x3,-0xc(%rbp)
0x0000000100000edb <main+43>: callq 0x100000e40 <f>
0x0000000100000ee0 <main+48>: movl $0x0,-0x8(%rbp)
0x0000000100000ee7 <main+55>: mov -0x8(%rbp),%eax
0x0000000100000eea <main+58>: mov %eax,-0x4(%rbp)
0x0000000100000eed <main+61>: mov -0x4(%rbp),%eax
0x0000000100000ef0 <main+64>: add $0x20,%rsp
0x0000000100000ef4 <main+68>: pop %rbp
0x0000000100000ef5 <main+69>: retq
So I was expecting the saved return address to be 0x0000000100000ee0, as this is the address of the next instruction after callq. When I run my program I get (I grouped these in groups of 4 so you can read them better):
05 06 07 08
40 1B 08 56
FF 7F 00 00
E0 EE B7 09
01 00 00 00
00 00 00 00
03 00 00 00
03 00 00 00
03 00 00 00
03 00 00 00
OK, so I can see the 5,6,7,8 that I wrote into my local variable in f(), and I can see the local variables of main(), those 4-byte integers that have been set to 3. After 5,6,7,8 (this is a 64-bit system) I would have expected the next 8 bytes to encode the previous value of the %rbp register, and THEN the next 8 bytes to contain the return address. So the return address should be
E0
EE
B7
09
01
00
00
00
Now when I compare this to the 0x0000000100000ee0 that I am expecting from gdb, I can see the 00000001 in the last 4 bytes and I can see the e0 from 00000ee0 in the very first byte. But why am I not getting exactly what I am expecting? I thought about byte-ordering (Mac OS X is little endian I believe), but that would not explain what I see here, from what I understood.
Any input is welcome,
Thank you guys,
Christoph
Try this program and run it multiple times.
#include <stdio.h>
int
main(int argc, char **argv)
{
int foo;
printf("%p %p\n", (void *)main, (void *)&foo);
return 0;
}
I'm pretty sure that you'll get different addresses every time. Mac OS X uses position-independent executables, and the stack is placed at a different address on each run too. This is a security feature (address space layout randomization).
If you run your program in gdb, you'll probably get what you expect since gdb disables the randomization to make debugging easier.

Resources