segfault during write to the realloc'd area - c

I have a very frustrating problem. My application runs on a few machines flawlessly for a month.
However, there is one machine on which my application crashes nearly every day because of segfault.
It always crashes at the same instruction address:
segfault at 7fec33ef36a8 ip 000000000041c16d sp 00007fec50a55c80 error 6 in myapp[400000+f8000]
This address points to memcpy call.
Below, there is an excerpt #1 from my app:
....
uint32_t size = messageSize - sizeof(uint64_t) + 1;
stack->trcData = (char*)Realloc(stack->trcData,(stack->trcSize + size + sizeof(uint32_t)));
char* buffer = stack->trcData + stack->trcSize;
uint32_t n_size = htonl(size);
memcpy(buffer,&n_size,sizeof(uint32_t)); /* ip 000000000041c16d points here*/
buffer += sizeof(uint32_t);
....
stack->trcSize += size + sizeof(uint32_t);
....
where stack is a structure:
struct Stack{
char* trcData;
uint32_t trcSize;
/* ... some other elements */
};
and Realloc is a realloc wrapper:
#define Realloc(x,y) _Realloc((x),(y),__LINE__)
void* _Realloc(void* ptr,size_t size,int line){
void *tmp = realloc(ptr,size);
if(tmp == NULL){
fprintf(stderr,"R%i: Out of memory: trying to allocate: %lu.\n",line,size);
exit(EXIT_FAILURE);
}
return tmp;
}
messageSize is of uint32_t type and its value is always greater than 44 bytes. The code #1 runs in a loop. stack->trcData is just a buffer which collects some data until some condition is fulfilled. stack->trcData is always initialized to NULL. The application is compiled with gcc with optimization -O3 enabled. When I run it in gdb, of course it did not crash, as I expected;)
I ran out of ideas why myapp crashes during memcpy call. Realloc returns with no error, so I guess it allocated enough space and I can write to this area. Valgrind
valgrind --leak-check=full --track-origins=yes --show-reachable=yes myapp
shows absolutely no invalid reads/writes.
Is it possible that on this particular machine the memory itself is corrupted and it causes these often crashes? Or maybe I corrupt memory somewhere else in myapp, but if this is the case, why it does not crash earlier, when the invalid write is made?
Thanks in advance for any help.
Assembly piece:
41c164: 00
41c165: 48 01 d0 add %rdx,%rax
41c168: 44 89 ea mov %r13d,%edx
41c16b: 0f ca bswap %edx
41c16d: 89 10 mov %edx,(%rax)
41c16f: 0f b6 94 24 47 10 00 movzbl 0x1047(%rsp),%edx
41c176: 00
I'm not sure whether this information is relevant but all the machines, my application runs on successfully, have Intel processors whilst the one causing the problem has AMD.

Here is the cause of my problem.
The point is that at some loop step stack->trcSize + size exceeds UINT32_MAX. That means Realloc in fact shrinks stc->trcData. Next, I define buffer which now is far behind the allocated area. Hence, when I write to buffer I get segfault.
I've checked it and it was indeed the cause.

Related

Generating functions at runtime in C

I would like to generate a function at runtime in C. And by this I mean I would essentially like to allocate some memory, point at it and execute it via function pointer. I realize this is a very complex topic and my question is naïve. I also realize there are some very robust libraries out there that do this (e.g. nanojit).
But I would like to learn the technique, starting with the basics. Could someone knowledgeable give me a very simple example in C?
EDIT: The answer below is great but here is the same example for Windows:
#include <Windows.h>
#define MEMSIZE 100*1024*1024
typedef void (*func_t)(void);
int main() {
HANDLE proc = GetCurrentProcess();
LPVOID p = VirtualAlloc(
NULL,
MEMSIZE,
MEM_RESERVE|MEM_COMMIT,
PAGE_EXECUTE_READWRITE);
func_t func = (func_t)p;
PDWORD code = (PDWORD)p;
code[0] = 0xC3; // ret
if(FlushInstructionCache(
proc,
NULL,
0))
{
func();
}
CloseHandle(proc);
VirtualFree(p, 0, MEM_RELEASE);
return 0;
}
As said previously by other posters, you'll need to know your platform pretty well.
Ignoring the issue of casting a object pointer to a function pointer being, technically, UB, here's an example that works for x86/x64 OS X (and possibly Linux too). All the generated code does is return to the caller.
#include <unistd.h>
#include <sys/mman.h>
typedef void (*func_t)(void);
int main() {
/*
* Get a RWX bit of memory.
* We can't just use malloc because the memory it returns might not
* be executable.
*/
unsigned char *code = mmap(NULL, getpagesize(),
PROT_READ|PROT_EXEC|PROT_WRITE,
MAP_SHARED|MAP_ANON, 0, 0);
/* Technically undefined behaviour */
func_t func = (func_t) code;
code[0] = 0xC3; /* x86 'ret' instruction */
func();
return 0;
}
Obviously, this will be different across different platforms but it outlines the basics needed: get executable section of memory, write instructions, execute instructions.
This requires you to know your platform. For instance, what is the C calling convention on your platform? Where are parameters stored? What register holds the return value? What registers must be saved and restored? Once you know that, you can essentially write some C code that assembles code into a block of memory, then cast that memory into a function pointer (though this is technically forbidden in ANSI C, and will not work depending if your platform marks some pages of memory as non-executable aka NX bit).
The simple way to go about this is simply to write some code, compile it, then disassemble it and look at what bytes correspond to which instructions. You can write some C code that fills allocated memory with that collection of bytes and then casts it to a function pointer of the appropriate type and executes.
It's probably best to start by reading the calling conventions for your architecture and compiler. Then learn to write assembly that can be called from C (i.e., follows the calling convention).
If you have tools, they can help you get some things right easier. For example, instead of trying to design the right function prologue/epilogue, I can just code this in C:
int foo(void* Data)
{
return (Data != 0);
}
Then (MicrosoftC under Windows) feed it to "cl /Fa /c foo.c". Then I can look at "foo.asm":
_Data$ = 8
; Line 2
push ebp
mov ebp, esp
; Line 3
xor eax, eax
cmp DWORD PTR _Data$[ebp], 0
setne al
; Line 4
pop ebp
ret 0
I could also use "dumpbin /all foo.obj" to see that the exact bytes of the function were:
00000000: 55 8B EC 33 C0 83 7D 08 00 0F 95 C0 5D C3
Just saves me some time getting the bytes exactly right...

Process Coredumped but does not look like illegal reference in a multithreaded program

Backtrace of the coredump:
#0 0x0000000000416228 in add_to_epoll (struct_fd=0x18d32760, lno=7901) at lbi.c:7092
#1 0x0000000000418b54 in connect_fc (struct_fd=0x18d32760, type=2) at lbi.c:7901
#2 0x0000000000418660 in poll_fc (arg=0x0) at lbi.c:7686
#3 0x00000030926064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003091ed3c2d in clone () from /lib64/libc.so.6
Code Snippet:
#define unExp(x) __builtin_expect((x),0)
...
7087 int add_to_epoll( struct fdStruct * struct_fd, int lno)
7088 {
7089 struct epoll_event ev;
7090 ev.events = EPOLLIN | EPOLLET | EPOLLPRI | EPOLLERR ;
7091 ev.data.fd = fd_st->fd;
7092 if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd, EPOLL_CTL_ADD, struct_fd->fd,&ev) == -1))
7093 {
7094 perror("client FD ADD to epoll error:");
7095 return -1;
7096 }
7097 else
7098 {
...
7109 }
7110 return 1;
7111 }
Disassembly of the offending line. I am not good at interpreting assembly code but have tried my best:
if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd, EPOLL_CTL_ADD, stuct_fd->fd,&ev) == -1))
416210: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax // Storing struct_fd->fd
416214: 8b 10 mov (%rax),%edx // to EDX
416216: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax // Storing struct_fd->Hdr->info->epollfd
41621a: 48 8b 80 e8 01 00 00 mov 0x1e8(%rax),%rax // to EDI which failed
416221: 48 8b 80 58 01 00 00 mov 0x158(%rax),%rax // while trying to offset members of the structure
416228: 8b 78 5c mov 0x5c(%rax),%edi // <--- failed here since Reg AX is 0x0
41622b: 48 8d 4d e0 lea 0xffffffffffffffe0(%rbp),%rcx
41622f: be 01 00 00 00 mov $0x1,%esi
416234: e8 b7 e1 fe ff callq 4043f0 <epoll_ctl#plt>
416239: 83 f8 ff cmp $0xffffffffffffffff,%eax
41623c: 0f 94 c0 sete %al
41623f: 0f b6 c0 movzbl %al,%eax
416242: 48 85 c0 test %rax,%rax
416245: 74 5e je 4162a5 <add_to_epoll+0xc9>
Printing out Registers and struct member values:
(gdb) i r $rax
rax 0x0 0
(gdb) p struct_fd
$3 = (struct fdStruct *) 0x18d32760
(gdb) p struct_fd->Hdr
$4 = (StHdr *) 0x3b990f30
(gdb) p struct_fd->Hdr->info
$5 = (struct Info *) 0x3b95b410 // Strangely, this is NOT NULL. Inconsistent with assembly dump.
(gdb) p ev
$6 = {events = 2147483659, data = {ptr = 0x573dc648000003d6, fd = 982, u32 = 982, u64= 6286398667419026390}}
Please let me know if my dis-assembly interpretation is OK. And if yes, would like to understand why gdb not showing NULL when it is printing out the structure members.
OR if the analysis is not perfect would like to know the actual reason of coredump. Please let me know if you need more info.
Thanks
---- The following part has been added Later ----
The proxy is a multithreaded program. Doing more digging came to know that when the problem occurs the following two thread were running in parallel. And when I avoid the two functions to run parallely the problem never occurs. But, the thing is I cannot explain how this behavior results into the original problematic scene:
Thread 1:
------------------------------------------------------------
int new_connection() {
...
struct_fd->Hdr->info=NULL; /* (line 1) */
...
<some code>
...
struct_fd->Hdr->info=Golbal_InFo_Ptr; /* (line 2) */ // This is a malloced memory, once allocated never freed
...
...
}
------------------------------------------------------------
Thread 2 executing add_to_epoll():
------------------------------------------------------------
int add_to_epoll( struct fdStruct * struct_fd, int lno)
{
...
if (unExp(epoll_ctl(struct_fd->Hdr->info->epollfd,...) /* (line 3) */
...
}
------------------------------------------------------------
In the above snippets if execution is done in the order,
LIne 1,
Line 3,
Line 2,
the scene can occur. What I expect is whenever an illegal reference is encountered it should dump immediately without trying to execute LINE 3 which makes it NON NULL.
It is a definite behavior because till now I have got around 12 coredumps of the same problem, all showing the exact same thing.
It is clear that struct_fd->Hdr->info is NULL, as Per Johansson already answered.
However, GDB thinks that it is not. How could that be?
One common way this happens, is when
you change the layout of struct fdStruct, struct StHdr (or both),
and
you neglect to rebuild all objects that use these definitions
The disassembly shows that offsetof(struct fdStruct, Hdr) == 0x1e8 and offsetof(struct StHdr, info) == 0x158. See what GDB prints for the following:
(gdb) print/x (char*)&struct_fd->Hdr - (char*)struct_fd
(gdb) print/x (char*)&struct_fd->Hdr->info - (char*)struct_fd->Hdr
I bet it would print something other than 0x1e8 and 0x158.
If that's the case, make clean && make may fix the problem.
Update:
(gdb) print/x (char*)&struct_fd->Hdr - (char*)struct_fd
$1 = 0x1e8
(gdb) print/x (char*)&struct_fd->Hdr->info - (char*)struct_fd->Hdr
$3 = 0x158
This proves that GDB's idea of how objects are laid out in memory matches compiled code.
We still don't know whether GDB's idea of the value of struct_fd matches reality. What do these commands print?
(gdb) print struct_fd
(gdb) x/gx $rbp-40
They should produce the same value (0x18d32760). Assuming they do, the only other explanation I can think of is that you have multiple threads accessing struct_fd, and the other thread overwrites the value that used to be NULL with the new value.
I just noticed your update to the question ;-)
What I expect is whenever an illegal reference is encountered it should dump immediately without trying to execute LINE 3 which makes it NON NULL.
Your expectation is incorrect: on any modern CPU, you have multiple cores, and your threads are executing simultaneously. That is, you have this code (time goes down along Y axis):
char *p; // global
Time CPU0 CPU1
0 p = NULL
1 if (*p) p = malloc(1)
2 *p = 'a';
...
At T1, CPU0 traps into the OS, but CPU1 continues. Eventually, the OS processes hardware trap, and dumps memory state at that time. On CPU1, hundreds of instructions may have executed after T1. The clocks between CPU0 and CPU1 aren't even synchronized, they don't necessarily go in lock-step.
Moral of the story: don't access global variables from multiple threads without proper locking.
The C line part of the disassembly does not match the one in the original code. But clearly
struct_fd->Hdr->info
is NULL. gdb shouldn't have a problem printing that, but it does sometimes get confused when the code is compiles with -O2 or higher.

Writing a Trampoline Function

I have managed to overwrite the first few bytes of a function in memory and detour it to my own function. I'm now having problems creating a trampoline function to bounce control back over to the real function.
This is a second part to my question here.
BYTE *buf = (BYTE*)VirtualAlloc(buf, 12, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
void (*ptr)(void) = (void (*)(void))buf;
vm_t* VM_Create( const char *module, intptr_t (*systemCalls)(intptr_t *), vmInterpret_t interpret )
{
MessageBox(NULL, L"Oh Snap! VM_Create Hooked!", L"Success!", MB_OK);
ptr();
return NULL;//control should never get this far
}
void Hook_VM_Create(void)
{
DWORD dwBackup;
VirtualProtect((void*)0x00477C3E, 7, PAGE_EXECUTE_READWRITE, &dwBackup);
//save the original bytes
memset(buf, 0x90, sizeof(buf));
memcpy(buf, (void*)0x00477C3E, 7);
//finish populating the buffer with the jump instructions to the original functions
BYTE *jmp2 = (BYTE*)malloc(5);
int32_t offset2 = ((int32_t)0x00477C3E+7) - ((int32_t)&buf+12);
memset((void*)jmp2, 0xE9, 1);
memcpy((void*)(jmp2+1), &offset2, sizeof(offset2));
memcpy((void*)(buf+7), jmp2, 5);
VirtualProtect((void*)0x00477C3E, 7, PAGE_EXECUTE_READ, &dwBackup);
}
0x00477C3E is the address of the function that has been overwritten. The asm for the original function are saved to buf before i over write them. Then my 5 byte jmp instruction is added to buf to return to the rest of the original function.
The problem arises when ptr() is called, the program crashes. When debugging the site it crashes at does not look like my ptr() function, however double checking my offset calculation looks correct.
NOTE: superfluous code is omitted to make reading through everything easier
EDIT: This is what the ptr() function looks like in ollydbg
0FFB0000 55 PUSH EBP
0FFB0001 57 PUSH EDI
0FFB0002 56 PUSH ESI
0FFB0003 53 PUSH EBX
0FFB0004 83EC 0C SUB ESP,0C
0FFB0007 -E9 F1484EFD JMP 0D4948FD
So it would appear as though my offset calculation is wrong.
So your buf[] ends up containing 2 things:
7 first bytes of the original instruction
jmp
Then you transfer control to buf. Is it guaranteed that the first 7 bytes contain only whole instructions? If not, you can crash while or after executing the last, incomplete instruction beginning in those 7 bytes.
Is it guaranteed that the instructions in those 7 bytes do not do any EIP-relative calculations (this includes instructions with EIP-relative addressing such as jumps and calls primarily)? If not, continuation in the original function won't work properly and probably will end up crashing the program.
Does the original function take any parameters? If it does, simply doing ptr(); will make that original code work with garbage taken from the registers and/or stack (depends on the calling convention) and can crash.
EDIT: One more thing. Use buf+12 instead of &buf+12. In your code buf is a pointer, not array.

Buffer Overflow Attack

I'm trying to execute a very simple buffer overflow attack. I'm pretty much a newbie to this. So, if this question is stupid, please excuse me :-)
The code:
#include<stdio.h>
#include<stdlib.h>
int i, n;
void confused(int i)
{
printf("**Who called me? Why am I here?? *** %x\n ", i);
}
void shell_call(char *c)
{
printf(" ***Now calling \"%s\" shell command *** \n", c);
system(c);
}
void victim_func()
{
int a[4];
printf("Enter n: "); scanf("%d",&n);
printf("~~~~~~~~~~~~~ values and address of n locations ~~~~~~~~~~");
for (i = 0;i <n ;i++)
printf ("\n a[%d] = %x, address = %x", i, a[i], &a[i]);
printf("\nEnter %d HEX Values \n", n);
// Buffer Overflow vulnerability HERE!
for (i=0;i<n;i++) scanf("%x",&a[i]);
printf("Done reading junk numbers\n");
}
int main()
{
victim_func();
printf(“\n done”);
return 0;
}
When I use objdump to get the function addresses, I have the following:
main(): 0x804854d
Address of main() where printf() is called: 0x8048563
victim_func(): 0x8048455
confused(): 0x8048414
Now, what I want is to jump to the function 'confused()' from victim_func() by overflowing the buffer there, and overwriting the return address to the address of confused(). And I want to return back from confused() to the printf() statement in main, and exit normally. So, I provide the following input
Enter n: 7
Enter 7 HEX values:
1
2
3
4
5
8048414 (This is to jump to confused)
8048563 (this is to jump to printf() in main)
Although, the program prints "Done" from that printf statement, it is jumping back to victim_func() and prints "Enter n:"
What am I doing wrong? Any help would be greatly appreciated!
PS: I'm not sure if I have put the question right. Please let me know, if any more information is needed.
A buffer overflow attack is a lot more complex than this. First of all you need to understand assembler in order to perform this. After you disassemble the program and function you want to target you need to determine the stack layout when it's executing that function.
Here's a sample of a buffer overflow it's using visual studio but principle is the same.
#include "stdafx.h"
#include <math.h>
volatile double test;
double function3()
{
test++;
return exp(test);
}
double function2()
{
return log(test);
}
double function1()
{
int a[5] = {0};
a[7] = (int)&function3;
return exp(function2());
}
int _tmain(int argc, _TCHAR* argv[])
{
double a = function1();
test = a;
return a;
}
Thanks to disassembly we know that a in function1 is allocated before where the function saved the stack frame pointer. The value after that one is the return address where function1 should go to if it is finished.
00401090 55 push ebp <- we save the stack pointer
00401091 8B EC mov ebp,esp
00401093 83 EC 1C sub esp,1Ch <- save space to allocate a[5]
00401096 B8 CC CC CC CC mov eax,0CCCCCCCCh
0040109B 89 45 E4 mov dword ptr [ebp-1Ch],eax <- crt debug init a[5]
0040109E 89 45 E8 mov dword ptr [ebp-18h],eax
004010A1 89 45 EC mov dword ptr [ebp-14h],eax
004010A4 89 45 F0 mov dword ptr [ebp-10h],eax
004010A7 89 45 F4 mov dword ptr [ebp-0Ch],eax
004010AA 89 45 F8 mov dword ptr [ebp-8],eax
004010AD 89 45 FC mov dword ptr [ebp-4],eax
From this we can conclude if we overwrite a[7] with a different address, the function will return not to main but with whatever address we wrote in a[7].
Hope this helps.
Now, what I want is to jump to the function 'confused()' from victim_func() by overflowing the buffer there, and overwriting the return address to the address of confused()...
On modern Linux platforms, you will also need to ensure two security features are turned off for testing. First in NX-stacks, and second is Stack Protectors.
To turn off NX-Stacks, use -Wl,z,execstack (as opposed to -Wl,z,noexecstack). To turn off Stack Protectors, use -fno-stack-protector (as opposed to -fstack-protector or -fstack-protector-all).
There's a third protection you might need to turn off. That protection is FORTIFY_SOURCE. FORTIFY_SOURCE uses "safer" variants of high risk functions like memcpy and strcpy. The compiler uses the safer variants when it can deduce the destination buffer size. If the copy would exceed the destination buffer size, then the program calls abort(). To disable FORTIFY_SOURCE, compile the program with -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.
The security features are turned on by default because there's been so many problems in the past. In general, its a good thing because it stops many problems (like the one you are experimenting with).
First of all it seems to me that you shouldn't enter the number 5 in your sample input. Your array is declared a[4] thus has elements indexed 0-3 - so your attack input seems wrong to me.
It also seems to me that your program assumes several thing about the architecture:
sizof(int)==sizeof(memory address)
The direction of growth and mechanism of the environments stack implementation
If one of these assumptions is untrue it's never going to work.
This seems like a very hard work work assignment.
There are easier examples of buffer overflow attacks than changing the control flow of the code. For example you might be able to overwrite another piece of data which is supposed to be protected from the user (such as a security setting)
You didn't show us the output of the program with the addresses of a[i]. I suspect that the compiler is doing something like aligning the data on the stack to 16. It could be much further to the return address than you expect.

Buffer overflow in C

I'm attempting to write a simple buffer overflow using C on Mac OS X 10.6 64-bit. Here's the concept:
void function() {
char buffer[64];
buffer[offset] += 7; // i'm not sure how large offset needs to be, or if
// 7 is correct.
}
int main() {
int x = 0;
function();
x += 1;
printf("%d\n", x); // the idea is to modify the return address so that
// the x += 1 expression is not executed and 0 gets
// printed
return 0;
}
Here's part of main's assembler dump:
...
0x0000000100000ebe <main+30>: callq 0x100000e30 <function>
0x0000000100000ec3 <main+35>: movl $0x1,-0x8(%rbp)
0x0000000100000eca <main+42>: mov -0x8(%rbp),%esi
0x0000000100000ecd <main+45>: xor %al,%al
0x0000000100000ecf <main+47>: lea 0x56(%rip),%rdi # 0x100000f2c
0x0000000100000ed6 <main+54>: callq 0x100000ef4 <dyld_stub_printf>
...
I want to jump over the movl instruction, which would mean I'd need to increment the return address by 42 - 35 = 7 (correct?). Now I need to know where the return address is stored so I can calculate the correct offset.
I have tried searching for the correct value manually, but either 1 gets printed or I get abort trap – is there maybe some kind of buffer overflow protection going on?
Using an offset of 88 works on my machine. I used Nemo's approach of finding out the return address.
This 32-bit example illustrates how you can figure it out, see below for 64-bit:
#include <stdio.h>
void function() {
char buffer[64];
char *p;
asm("lea 4(%%ebp),%0" : "=r" (p)); // loads address of return address
printf("%d\n", p - buffer); // computes offset
buffer[p - buffer] += 9; // 9 from disassembling main
}
int main() {
volatile int x = 7;
function();
x++;
printf("x = %d\n", x); // prints 7, not 8
}
On my system the offset is 76. That's the 64 bytes of the buffer (remember, the stack grows down, so the start of the buffer is far from the return address) plus whatever other detritus is in between.
Obviously if you are attacking an existing program you can't expect it to compute the answer for you, but I think this illustrates the principle.
(Also, we are lucky that +9 does not carry out into another byte. Otherwise the single byte increment would not set the return address how we expected. This example may break if you get unlucky with the return address within main)
I overlooked the 64-bitness of the original question somehow. The equivalent for x86-64 is 8(%rbp) because pointers are 8 bytes long. In that case my test build happens to produce an offset of 104. In the code above substitute 8(%%rbp) using the double %% to get a single % in the output assembly. This is described in this ABI document. Search for 8(%rbp).
There is a complaint in the comments that 4(%ebp) is just as magic as 76 or any other arbitrary number. In fact the meaning of the register %ebp (also called the "frame pointer") and its relationship to the location of the return address on the stack is standardized. One illustration I quickly Googled is here. That article uses the terminology "base pointer". If you wanted to exploit buffer overflows on other architectures it would require similarly detailed knowledge of the calling conventions of that CPU.
Roddy is right that you need to operate on pointer-sized values.
I would start by reading values in your exploit function (and printing them) rather than writing them. As you crawl past the end of your array, you should start to see values from the stack. Before long you should find the return address and be able to line it up with your disassembler dump.
Disassemble function() and see what it looks like.
Offset needs to be negative positive, maybe 64+8, as it's a 64-bit address. Also, you should do the '+7' on a pointer-sized object, not on a char. Otherwise if the two addresses cross a 256-byte boundary you will have exploited your exploit....
You might try running your code in a debugger, stepping each assembly line at a time, and examining the stack's memory space as well as registers.
I always like to operate on nice data types, like this one:
struct stackframe {
char *sf_bp;
char *sf_return_address;
};
void function() {
/* the following code is dirty. */
char *dummy;
dummy = (char *)&dummy;
struct stackframe *stackframe = dummy + 24; /* try multiples of 4 here. */
/* here starts the beautiful code. */
stackframe->sf_return_address += 7;
}
Using this code, you can easily check with the debugger whether the value in stackframe->sf_return_address matches your expectations.

Resources