Obtaining Source Code From C Code Compiled with Debugging Options [closed]

I have heard that it is possible to obtain the source code from an executable if it was compiled with debugging (-g) enabled. Is this true? If so, how would one go about doing it?

You can't restore the source code from a binary executable.
You can use a decompiler such as REC Studio or Boomerang to convert the disassembled binary into C code, but the result won't look anything like the code you originally compiled. It will be more like assembly written in C syntax, and for a complicated application it probably won't even compile. Debugging symbols can help, but not much: a lot of information is lost during compilation and can't be restored.

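You can use the objdump utility.
objdump has a '-S' option that interleaves the source code with the corresponding assembly. Note that the debug info records only file names and line numbers, so objdump reads the source text from the original files; it cannot show the source if those files are not available. This also helps when debugging crashes (segmentation faults).
As an example, below are a code snippet and its objdump output.
Only the main() part of the dump is shown; the full output contains other info too.
Steps:
Suppose your code is in the file temp.c.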
gcc -g temp.c -o temp
objdump -DS temp > temp.dump
#include <stdio.h>
int main()
{
int a, *b;
a = 0;
b = 0;
printf("a=%d *b=%d", a, *b);
return 0;
}
0804841d <main>:
#include <stdio.h>
int main()
{
804841d: 55 push %ebp
804841e: 89 e5 mov %esp,%ebp
8048420: 83 e4 f0 and $0xfffffff0,%esp
8048423: 83 ec 20 sub $0x20,%esp
int a, *b;
a = 0;
8048426: c7 44 24 18 00 00 00 movl $0x0,0x18(%esp)
804842d: 00
b = 0;
804842e: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
8048435: 00
printf("a=%d *b=%d", a, *b);
8048436: 8b 44 24 1c mov 0x1c(%esp),%eax
804843a: 8b 00 mov (%eax),%eax
804843c: 89 44 24 08 mov %eax,0x8(%esp)
8048440: 8b 44 24 18 mov 0x18(%esp),%eax
8048444: 89 44 24 04 mov %eax,0x4(%esp)
8048448: c7 04 24 f0 84 04 08 movl $0x80484f0,(%esp)
804844f: e8 9c fe ff ff call 80482f0 <printf@plt>
return 0;
8048454: b8 00 00 00 00 mov $0x0,%eax
}
8048459: c9 leave
804845a: c3 ret
804845b: 66 90 xchg %ax,%ax
804845d: 66 90 xchg %ax,%ax
804845f: 90 nop

Segmentation fault inline jmp [duplicate]

This question already has answers here:
Segmentation fault in C and infinite loop - self calling main function
(4 answers)
Is there a limit of stack size of a process in linux
(4 answers)
GCC Inline Assembly: Jump to label outside block
(2 answers)
How does linux know when to allocate more pages to a call stack?
(1 answer)
How is Stack memory allocated when using 'push' or 'sub' x86 instructions?
(2 answers)
Closed 1 year ago.
I was playing with inline assembly, and I've noticed something strange. I've written a program which calls a wrapper function of jmp and executes in loop:
#include <stdint.h>
void asm_jmp(void* address)
{
__asm__("jmp\t*%%rax"
:
:"a" (address)
:);
}
int main()
{
asm_jmp(&main + 4);
}
I've opened it with gdb and noticed that it iterates hundreds and hundreds of times before raising a segmentation fault. Maybe I'm missing something, but I don't see what in this program could cause it to segfault.
Initially I thought that calling asm_jmp in a loop exhausted the stack, since each call pushes a return address onto the stack but there is never a return to free the space occupied by that address. Is this the problem, or is there something else?
Here is the assembly obtained with objdump:
0000000000001119 <asm_jmp>:
1119: 55 push %rbp
111a: 48 89 e5 mov %rsp,%rbp
111d: 48 89 7d f8 mov %rdi,-0x8(%rbp)
1121: 48 8b 45 f8 mov -0x8(%rbp),%rax
1125: ff e0 jmp *%rax
1127: 90 nop
1128: 5d pop %rbp
1129: c3 ret
000000000000112a <main>:
112a: 55 push %rbp
112b: 48 89 e5 mov %rsp,%rbp
112e: 48 8d 05 0d 00 00 00 lea 0xd(%rip),%rax # 1142 <main+0x18>
1135: 48 89 c7 mov %rax,%rdi
1138: e8 dc ff ff ff call 1119 <asm_jmp>
113d: b8 00 00 00 00 mov $0x0,%eax
1142: 5d pop %rbp
1143: c3 ret
1144: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
114b: 00 00 00
114e: 66 90 xchg %ax,%ax

Why does 32 bit compiler and 64 bit compiler makes such a difference with my code? [duplicate]

This question already has answers here:
How dangerous is it to access an array out of bounds?
(12 answers)
Closed 3 years ago.
Excuse my bad English.
I have written a few lines that return the max, min, and sum of all values, and arrange the values in ascending order, when five integers are input.
While writing, I mistakenly wrote 'num[4]' when declaring the int array, although I needed room for 5 integers.
When I compiled with the TDM-GCC 4.9.2 64-bit release, it worked without any problem. When I noticed the mistake and switched to the TDM-GCC 4.9.2 32-bit release, it did not.
This is my whole code:
#include<stdio.h>
int main()
{
int num[4],i,j,k,a,b,c,m,number,sum=0;
printf("This program returns max, min, sum of all values, and arranges all values in ascending order when five integers are input.\n");
printf("Please enter five integers.\n");
for(i=0;i<5;i++)
{
printf("Enter #%d\n",i+1);
scanf("%d",&num[i]);
}
//arrange all values
for(j=0;j<5;j++)
{
for(k=j+1;k<5;k++)
{
if(num[j]>num[k])
{
number=num[j];
num[j]=num[k];
num[k]=number;
}
}
}
//find maximum value
int max=num[0];
for(a=1;a<5;a++)
{
if(max<num[a])
{
max=num[a];
}
}
//find minimum value
int min=num[0];
for(b=1;b<5;b++)
{
if(min>num[b])
{
min=num[b];
}
}
//find sum of all values
for(c=0;c<5;c++)
{
sum=sum+num[c];
}
printf("Max Value : %d\n",max);//print max
printf("Min Value : %d\n",min);//print min
printf("Sum : %d\n",sum); //print sum
printf("In ascending order : "); //print all values in ascending order
for(m=0;m<5;m++)
{
printf("%d ",num[m]);
}
}
I am new to C and all kinds of programming, and don't know how to search these kind of problems. I know my way of asking like this here is very inappropriate, and I sincerely apologize to people who are irritated by these types of questioning posts. But this is my best try, so please don't blame, but I'm willing to accept any kind of advice or tips.
Thank you.

When allocating locals on the stack, GCC (and probably Clang) rounds the frame up to the stack alignment of the target, which can leave unused padding bytes next to an array.
For 32-bit targets the layout is usually tighter, with less padding.
So when you compiled your program for 64-bit, the out-of-bounds element num[4] most likely landed in padding instead of overwriting something important. That's why the 64-bit build appeared to work.
To see this in action, we'll create a test file.
void func() {
int n[4];
int b = 11;
for (int i = 0; i < 4; i++) {
n[i] = b;
}
}
And we'll compile it for 32-bit and 64-bit.
gcc -g -c -m64 test.c -o test_64.o
gcc -g -c -m32 test.c -o test_32.o
And now we'll print the disassembly for each.
objdump -S test_64.o >test_64_dis.txt
objdump -S test_32.o >test_32_dis.txt
Here's the contents of the 64-bit version.
test_64.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <func>:
void func() {
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 83 ec 30 sub $0x30,%rsp
c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
13: 00 00
15: 48 89 45 f8 mov %rax,-0x8(%rbp)
19: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1b: c7 45 dc 0b 00 00 00 movl $0xb,-0x24(%rbp)
for (int i = 0; i < 4; i++) {
22: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
29: eb 10 jmp 3b <func+0x3b>
n[i] = b;
2b: 8b 45 d8 mov -0x28(%rbp),%eax
2e: 48 98 cltq
30: 8b 55 dc mov -0x24(%rbp),%edx
33: 89 54 85 e0 mov %edx,-0x20(%rbp,%rax,4)
for (int i = 0; i < 4; i++) {
37: 83 45 d8 01 addl $0x1,-0x28(%rbp)
3b: 83 7d d8 03 cmpl $0x3,-0x28(%rbp)
3f: 7e ea jle 2b <func+0x2b>
}
}
41: 90 nop
42: 48 8b 45 f8 mov -0x8(%rbp),%rax
46: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
4d: 00 00
4f: 74 05 je 56 <func+0x56>
51: e8 00 00 00 00 callq 56 <func+0x56>
56: c9 leaveq
57: c3 retq
Here's the 32-bit version.
test_32.o: file format elf32-i386
Disassembly of section .text:
00000000 <func>:
void func() {
0: f3 0f 1e fb endbr32
4: 55 push %ebp
5: 89 e5 mov %esp,%ebp
7: 83 ec 28 sub $0x28,%esp
a: e8 fc ff ff ff call b <func+0xb>
f: 05 01 00 00 00 add $0x1,%eax
14: 65 a1 14 00 00 00 mov %gs:0x14,%eax
1a: 89 45 f4 mov %eax,-0xc(%ebp)
1d: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1f: c7 45 e0 0b 00 00 00 movl $0xb,-0x20(%ebp)
for (int i = 0; i < 4; i++) {
26: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%ebp)
2d: eb 0e jmp 3d <func+0x3d>
n[i] = b;
2f: 8b 45 dc mov -0x24(%ebp),%eax
32: 8b 55 e0 mov -0x20(%ebp),%edx
35: 89 54 85 e4 mov %edx,-0x1c(%ebp,%eax,4)
for (int i = 0; i < 4; i++) {
39: 83 45 dc 01 addl $0x1,-0x24(%ebp)
3d: 83 7d dc 03 cmpl $0x3,-0x24(%ebp)
41: 7e ec jle 2f <func+0x2f>
}
}
43: 90 nop
44: 8b 45 f4 mov -0xc(%ebp),%eax
47: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
4e: 74 05 je 55 <func+0x55>
50: e8 fc ff ff ff call 51 <func+0x51>
55: c9 leave
56: c3 ret
Disassembly of section .text.__x86.get_pc_thunk.ax:
00000000 <__x86.get_pc_thunk.ax>:
0: 8b 04 24 mov (%esp),%eax
3: c3 ret
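Look at the offsets right after the variable declarations. In the 64-bit build, n starts at -0x20(%rbp) and its four elements end at -0x10(%rbp), while the stack protector value is stored at -0x8(%rbp), so there are 8 unused bytes after the array and a write to n[4] would land in them. In the 32-bit build, n starts at -0x1c(%ebp) and ends at -0xc(%ebp), which is exactly where the stack protector value is stored, so a write to n[4] would overwrite it.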
Regarding the advice/tips you asked for, a good starting point would be to enable all compiler warnings and treat them as errors. In GCC and Clang, you'd use the flags -Wall -Wextra -Werror -Wfatal-errors.
I wouldn't recommend this if you're using the MSVC compiler, though, which often issues warnings about declarations from the header files it's distributed with.

Other answers cover what might actually be happening by analyzing the generated assembly, but the really relevant explanation is: indexing out of array bounds is Undefined Behavior in C. And that's kinda the end of the story.
UB means the code is "allowed" to do anything by the C standard. It could do a different thing every time it is run. It could do what you want it to do with no ill effects. It might do what you want, but then something completely unrelated behaves in a funny way. The compiler, the operating system, or even the phase of the moon could make a difference. Or not.
It is generally not useful to think about what actually happens with Undefined Behavior at the C level. You can of course produce the assembly output of a particular compilation and inspect what it does, but that is the result of that one compilation. A new compilation might change things (even if you just do a new build at a different time, because the value of the __TIME__ macro depends on the time...).
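In practice the fix is simply to make the array as large as the loops assume. Here is a minimal sketch, assuming a constant COUNT and a helper summarize() that are not in the original post (the original did everything inline in main with a literal 5):

```c
#include <stddef.h>

#define COUNT 5  /* hypothetical name: one constant for the array size and every loop bound */

/* Hypothetical helper: sorts values[0..n-1] in ascending order in place and
   reports max, min and sum through the out-parameters. Requires n >= 1. */
void summarize(int *values, size_t n, int *max, int *min, int *sum)
{
    size_t j, k;

    /* same exchange sort as in the question */
    for (j = 0; j < n; j++) {
        for (k = j + 1; k < n; k++) {
            if (values[j] > values[k]) {
                int tmp = values[j];
                values[j] = values[k];
                values[k] = tmp;
            }
        }
    }

    /* after sorting, the extremes are at the two ends */
    *min = values[0];
    *max = values[n - 1];

    *sum = 0;
    for (j = 0; j < n; j++) {
        *sum += values[j];
    }
}
```

In main you would then declare int num[COUNT], read COUNT values with scanf, and call summarize(num, COUNT, &max, &min, &sum), so the declaration and the loops can no longer disagree about the size.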

How to read assembly code output from objdump [closed]

I have a C code that swaps two numbers.
#include<stdio.h>
void swap(int,int);
void main( )
{
int n1,n2;
printf("Enter the two numbers to be swapped\n");
scanf("%d%d",&n1,&n2);
printf("\nThe values of n1 and n2 in the main function before calling the swap function are n1=%d n2=%d",n1,n2);
swap(n1,n2);
printf("\nThe values of n1 and n2 in the main function after calling the swap function are n1=%d n2=%d",n1,n2);}
void swap(int n1,int n2)
{
int temp;
temp=n1;
n1=n2;
n2=temp;
printf("\nThe values of n1 and n2 in the swap function after swapping are n1=%d n2=%d",n1,n2);
}
I have disassembled it using objdump and been trying to find out how the swap operation happens in machine level. I think this is the swap function.
000006b4 <swap>:
6b4: 55 push %ebp
6b5: 89 e5 mov %esp,%ebp
6b7: 53 push %ebx
6b8: 83 ec 14 sub $0x14,%esp
6bb: e8 37 00 00 00 call 6f7 <__x86.get_pc_thunk.ax>
6c0: 05 0c 19 00 00 add $0x190c,%eax
6c5: 8b 55 08 mov 0x8(%ebp),%edx
6c8: 89 55 f4 mov %edx,-0xc(%ebp)
6cb: 8b 55 0c mov 0xc(%ebp),%edx
6ce: 89 55 08 mov %edx,0x8(%ebp)
6d1: 8b 55 f4 mov -0xc(%ebp),%edx
6d4: 89 55 0c mov %edx,0xc(%ebp)
6d7: 83 ec 04 sub $0x4,%esp
6da: ff 75 0c pushl 0xc(%ebp)
6dd: ff 75 08 pushl 0x8(%ebp)
6e0: 8d 90 c0 e8 ff ff lea -0x1740(%eax),%edx
6e6: 52 push %edx
6e7: 89 c3 mov %eax,%ebx
6e9: e8 72 fd ff ff call 460 <printf@plt>
6ee: 83 c4 10 add $0x10,%esp
6f1: 90 nop
6f2: 8b 5d fc mov -0x4(%ebp),%ebx
6f5: c9 leave
6f6: c3 ret
I want to know how the swap operation happens inside the registers. I know it has to be something like this:
push eax
mov eax, ebx
pop ebx
But I can't see anything similar to this. Since I'm new to these things, can someone please help me understand how this is happening? Full output of the objdump is here.

To get started with the assembly language you can check the following link:
http://patshaughnessy.net/2016/11/26/learning-to-read-x86-assembly-language
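For the listing in the question specifically: the compiler did not use a push/pop pair. The code appears to have been built without optimization, so n1, n2 and temp live in stack slots and each value is moved through %edx. Below is a sketch of that mapping, with the instructions from the posted listing as comments. The name swap_by_value and the shortened printf text are illustrative, and swap_by_pointer is a hypothetical addition that is not part of the original program, shown only for contrast.

```c
#include <stdio.h>

/* Same logic as swap() in the question. The comments give the instructions
   from the posted listing that implement each statement:
   0x8(%ebp) is n1, 0xc(%ebp) is n2, -0xc(%ebp) is temp. */
void swap_by_value(int n1, int n2)
{
    int temp;
    temp = n1;  /* mov 0x8(%ebp),%edx  ; mov %edx,-0xc(%ebp) */
    n1 = n2;    /* mov 0xc(%ebp),%edx  ; mov %edx,0x8(%ebp)  */
    n2 = temp;  /* mov -0xc(%ebp),%edx ; mov %edx,0xc(%ebp)  */
    printf("in swap: n1=%d n2=%d\n", n1, n2);
}

/* Hypothetical variant: swaps the caller's variables through pointers. */
void swap_by_pointer(int *n1, int *n2)
{
    int temp = *n1;
    *n1 = *n2;
    *n2 = temp;
}
```

swap_by_value only changes its own copies of the arguments (the slots at 0x8(%ebp) and 0xc(%ebp) belong to that one call), which is why main still prints the original values afterwards.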

Trouble replicating a stack buffer overflow exploit

I am having trouble replicating the stack buffer overflow example given by OWASP here.
Here is my attempt:
$ cat test.c
#include <stdio.h>
#include <string.h>
void doit(void)
{
char buf[8];
gets(buf);
printf("%s\n", buf);
}
int main(void)
{
printf("So... The End...\n");
doit();
printf("or... maybe not?\n");
return 0;
}
$ gcc test.c -o test -fno-stack-protector -ggdb
$ objdump -d test # omitted irrelevant parts i think
000000000040054c <doit>:
40054c: 55 push %rbp
40054d: 48 89 e5 mov %rsp,%rbp
400550: 48 83 ec 10 sub $0x10,%rsp
400554: 48 8d 45 f0 lea -0x10(%rbp),%rax
400558: 48 89 c7 mov %rax,%rdi
40055b: e8 d0 fe ff ff callq 400430 <gets@plt>
400560: 48 8d 45 f0 lea -0x10(%rbp),%rax
400564: 48 89 c7 mov %rax,%rdi
400567: e8 a4 fe ff ff callq 400410 <puts@plt>
40056c: c9 leaveq
40056d: c3 retq
000000000040056e <main>:
40056e: 55 push %rbp
40056f: 48 89 e5 mov %rsp,%rbp
400572: bf 4c 06 40 00 mov $0x40064c,%edi
400577: e8 94 fe ff ff callq 400410 <puts@plt>
40057c: e8 cb ff ff ff callq 40054c <doit>
400581: bf 5d 06 40 00 mov $0x40065d,%edi
400586: e8 85 fe ff ff callq 400410 <puts@plt>
40058b: b8 00 00 00 00 mov $0x0,%eax
400590: 5d pop %rbp
400591: c3 retq # this is where i took my overflow value from
400592: 90 nop
400593: 90 nop
400594: 90 nop
400595: 90 nop
400596: 90 nop
400597: 90 nop
400598: 90 nop
400599: 90 nop
40059a: 90 nop
40059b: 90 nop
40059c: 90 nop
40059d: 90 nop
40059e: 90 nop
40059f: 90 nop
$ perl -e 'print "A"x12 ."\x91\x05\x40"' | ./test
So... The End...
AAAAAAAAAAAA▒#
or... maybe not? # this shouldn't be outputted
Why isn't this working? I'm assuming that the memory address that I am supposed to insert is the retq from <main>.
My goal is to figure out how to do a stack buffer overflow that calls a function elsewhere in the program. Any help is much appreciated. :)

I'm using Windows & MSVC but you should get the idea.
Consider the following code:
#include <stdio.h>
void someFunc()
{
puts("wow, we should never get here :|");
}
// MSVC inlines this otherwise
void __declspec(noinline) doit(void)
{
char buf[8];
gets(buf);
printf("%s\n", buf);
}
int main(void)
{
printf("So... The End...\n");
doit();
printf("or... maybe not?\n");
return 0;
}
(Note: I had to compile it with /OPT:NOREF to force MSVC not to remove "unused" code and /GS- to turn off stack checks)
Now, let's open it in my favorite disassembler:
We'd like to exploit the gets vulnerability so the execution jumps to someFunc. We can see that its address is 001D1000, so if we can write enough bytes past the buffer to overwrite the return address, we'll be good. Let's take a look at the stack when gets is called:
As we can see, there's 8 bytes of our stack allocated buffer (buf), 4 bytes of some stuff (actually the PUSHed EBP), and the return address. Thus, we need to write 12 bytes of whatever and then our 4 byte return address (001D1000) to "hijack" the execution flow. Let's do just that - we'll prepare an input file with the bytes we need using a hex editor:
And indeed, when we run the program with that input, we get this:
After it prints that line, it will crash with an access violation since there was some garbage on the stack. However, there's nothing stopping you from carefully analyzing the code and preparing such bytes in your input that the program will appear to function as normal (we could overwrite the next bytes with the address of ExitProcess, so that someFunc would jump there).
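The input described above is just filler up to the saved return address, followed by the target address in little-endian byte order. Here is a minimal sketch of building such an input for a toy program like this one. build_input and its parameters are hypothetical names, and the filler lengths come from reading the listings, not from anything the compiler guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes `filler` bytes of 'A' followed by the low
   `addr_size` bytes of `addr` in little-endian order.
   Returns the number of bytes written, or 0 if the request does not fit. */
size_t build_input(unsigned char *out, size_t out_size,
                   size_t filler, uint64_t addr, size_t addr_size)
{
    size_t i;

    if (addr_size > sizeof addr || filler > out_size ||
        addr_size > out_size - filler) {
        return 0;
    }

    memset(out, 'A', filler);
    for (i = 0; i < addr_size; i++) {
        /* least significant byte first */
        out[filler + i] = (unsigned char)(addr >> (8 * i));
    }
    return filler + addr_size;
}
```

For the MSVC build above, that would be build_input(buf, sizeof buf, 12, 0x001D1000, 4): 8 bytes of buf, 4 bytes of saved EBP, then the address. In the gcc listing from the question, buf sits at -0x10(%rbp), so the return address is 16 + 8 = 24 bytes past the start of buf rather than 12, which would explain why the 12-byte attempt had no visible effect.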

objdump of binary with debug info produces mangled output

I often notice severely mangled output with mixed assembly and C instructions in the output of objdump -S. This seems to happen only for binaries built with debug info. Is there any way to fix this?
To illustrate the issue i have written a simple program :
/* test.c */
#include <stdio.h>
int main()
{
static int i = 0;
while(i < 0x1000000) {
i++;
}
return 0;
}
The above program was built with/without debug info as follows :
$ gcc test.c -o test-release
$ gcc test.c -g -o test-debug
Disassembling the test-release binary works fine.
$ objdump -S test-release
produces the following clear and concise snippet for the main() function.
080483b4 <main>:
80483b4: 55 push %ebp
80483b5: 89 e5 mov %esp,%ebp
80483b7: eb 0d jmp 80483c6 <main+0x12>
80483b9: a1 18 a0 04 08 mov 0x804a018,%eax
80483be: 83 c0 01 add $0x1,%eax
80483c1: a3 18 a0 04 08 mov %eax,0x804a018
80483c6: a1 18 a0 04 08 mov 0x804a018,%eax
80483cb: 3d ff ff ff 00 cmp $0xffffff,%eax
80483d0: 7e e7 jle 80483b9 <main+0x5>
80483d2: b8 00 00 00 00 mov $0x0,%eax
80483d7: 5d pop %ebp
80483d8: c3 ret
But $ objdump -S test-debug
produces the following mangled snippet for the same main() function.
080483b4 <main>:
#include <stdio.h>
int main()
{
80483b4: 55 push %ebp
80483b5: 89 e5 mov %esp,%ebp
static int i = 0;
while(i < 0x1000000) {
80483b7: eb 0d jmp 80483c6 <main+0x12>
i++;
80483b9: a1 18 a0 04 08 mov 0x804a018,%eax
80483be: 83 c0 01 add $0x1,%eax
80483c1: a3 18 a0 04 08 mov %eax,0x804a018
int main()
{
static int i = 0;
while(i < 0x1000000) {
80483c6: a1 18 a0 04 08 mov 0x804a018,%eax
80483cb: 3d ff ff ff 00 cmp $0xffffff,%eax
80483d0: 7e e7 jle 80483b9 <main+0x5>
i++;
}
return 0;
80483d2: b8 00 00 00 00 mov $0x0,%eax
}
80483d7: 5d pop %ebp
80483d8: c3 ret
I do understand that as the debug binary contains additional symbol info, the C code is displayed interlaced with the assembly instructions. But this makes it a tad difficult to follow the flow of code.
Is there any way to instruct objdump to output pure assembly and not interlace debug symbols into the output even if encountered in a binary?

Use -d instead of -S. objdump is doing exactly what you are telling it to. The -S option implies -d but also displays the C source if debugging information is available.