This question already has answers here:
Using GCC to produce readable assembly?
(11 answers)
Closed 9 years ago.
gcc -S test.c converts the c code into assembly. What I need is a per-instruction translator. I mean I need a way through which I can know that this set of assembly instructions correspond to this c statement and so on NOT whole c code to whole assembly code. Any idea? Thanks in advance
This can be done with objdump -S which tries to interpret the debugging info - provided it was compiled with -g or equivalent. For example, for the program:
int main(void)
{
int x = 42;
int y = 24;
return x + y;
}
It does:
00000000 <main>:
int main(void)
{
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 10 sub $0x10,%esp
int x = 42;
6: c7 45 fc 2a 00 00 00 movl $0x2a,-0x4(%ebp)
int y = 24;
d: c7 45 f8 18 00 00 00 movl $0x18,-0x8(%ebp)
return x + y;
14: 8b 45 f8 mov -0x8(%ebp),%eax
17: 8b 55 fc mov -0x4(%ebp),%edx
1a: 01 d0 add %edx,%eax
}
Related
This question already has answers here:
How to get c code to execute hex machine code?
(7 answers)
Closed 2 years ago.
#include<stdio.h>
#include<stdlib.h>
char code[] ="\x52\x56\x57\x50\xB8\x41\x00\x00\x00\x50\xB8\x01\x00\x00\x00\xBF\x01\x00\x00\x00\x48\x89\xE6\xBA\x01\x00\x00\x00\x0F\x05\x58\x58\x5F\x5E\x5A\xC3";
int main(){
void(*func)() = (void (*)())code;
(*func)();
return 0 ;
}
what I have here is a string which stores a binary code to print a character ('A') , I pass it as a function pointer to func , than i try to execute it .
the output :
$ ./test
Segmentation fault (core dumped)
this is the assembly code :
0: 52 push rdx
1: 56 push rsi
2: 57 push rdi
3: 50 push rax
4: b8 41 00 00 00 mov eax,0x41
9: 50 push rax
a: b8 01 00 00 00 mov eax,0x1
f: bf 01 00 00 00 mov edi,0x1
14: 48 89 e6 mov rsi,rsp
17: ba 01 00 00 00 mov edx,0x1
1c: 0f 05 syscall
1e: 58 pop rax
1f: 58 pop rax
20: 5f pop rdi
21: 5e pop rsi
22: 5a pop rdx
23: c3 ret
As pointed out by #WeatherVane, you need permission to execute a data segment, mprotect can help:
#include <stdio.h>
#include <sys/mman.h>
#include <stdint.h>
static char code[] = "\x52\x56\x57\x50\xB8\x41\x00\x00\x00\x50\xB8\x01\x00\x00"
"\x00\xBF\x01\x00\x00\x00\x48\x89\xE6\xBA\x01\x00\x00\x00"
"\x0F\x05\x58\x58\x5F\x5E\x5A\xC3";
int main(void)
{
uintptr_t page = (uintptr_t)code & -4095ULL;
mprotect((void *)page, 4096, PROT_READ | PROT_EXEC | PROT_WRITE);
void (*func)(void) = (void (*)(void))code;
func();
return 0;
}
You can see the result (A) in godbolt
This question already has answers here:
How dangerous is it to access an array out of bounds?
(12 answers)
Closed 3 years ago.
Excuse my bad English.
I have written down some lines to return max, min, sum of all values, and arrange all values in ascending order when five integers are input.
While writing, I mistakenly wrote 'num[4]' when I declared a INT array when I needed to put in 5 integers.
But as I compiled with TDM-GCC 4.9.2 64-bit release, it worked without any problem. As soon as I realized and changed to TDM-GCC 4.9.2 32-bit release, it did not.
This is my whole code;
#include<stdio.h>
int main()
{
int num[4],i,j,k,a,b,c,m,number,sum=0;
printf("This program returns max, min, sum of all values, and arranges all values in ascending order when five integers are input.\n");
printf("Please enter five integers.\n");
for(i=0;i<5;i++)
{
printf("Enter #%d\n",i+1);
scanf("%d",&num[i]);
}
//arrange all values
for(j=0;j<5;j++)
{
for(k=j+1;k<5;k++)
{
if(num[j]>num[k])
{
number=num[j];
num[j]=num[k];
num[k]=number;
}
}
}
//find maximum value
int max=num[0];
for(a=1;a<5;a++)
{
if(max<num[a])
{
max=num[a];
}
}
//find minimum value
int min=num[0];
for(b=1;b<5;b++)
{
if(min>num[b])
{
min=num[b];
}
}
//find sum of all values
for(c=0;c<5;c++)
{
sum=sum+num[c];
}
printf("Max Value : %d\n",max);//print max
printf("Min Value : %d\n",min);//print min
printf("Sum : %d\n",sum); //print sum
printf("In ascending order : "); //print all values in ascending order
for(m=0;m<5;m++)
{
printf("%d ",num[m]);
}
}
I am new to C and all kinds of programming, and don't know how to search these kind of problems. I know my way of asking like this here is very inappropriate, and I sincerely apologize to people who are irritated by these types of questioning posts. But this is my best try, so please don't blame, but I'm willing to accept any kind of advice or tips.
Thank you.
When allocating on the stack, GCC targeting 64-bit (and probably Clang) will align stack allocations to 8 bytes.
For 32-bit targets, it's only going to use 4 bytes of padding.
So when you compiled your program for 64-bit, an extra four bytes was used to pad the stack. That's why when you accessed that last integer, it didn't segfault.
To see this in action, we'll create a test file.
void test_func() {
int n[4];
int b = 11;
for (int i = 0; i < 4; i++) {
n[i] = b;
}
}
And we'll compile it for 32-bit and 64-bit.
gcc -g -c -m64 test.c -o test_64.o
gcc -g -c -m32 test.c -o test_32.o
And now we'll print the disassembly for each.
objdump -S test_64.o >test_64_dis.txt
objdump -S test_32.o >test_32_dis.txt
Here's the contents of the 64-bit version.
test_64.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <func>:
void func() {
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 83 ec 30 sub $0x30,%rsp
c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
13: 00 00
15: 48 89 45 f8 mov %rax,-0x8(%rbp)
19: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1b: c7 45 dc 0b 00 00 00 movl $0xb,-0x24(%rbp)
for (int i = 0; i < 4; i++) {
22: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
29: eb 10 jmp 3b <func+0x3b>
n[i] = b;
2b: 8b 45 d8 mov -0x28(%rbp),%eax
2e: 48 98 cltq
30: 8b 55 dc mov -0x24(%rbp),%edx
33: 89 54 85 e0 mov %edx,-0x20(%rbp,%rax,4)
for (int i = 0; i < 4; i++) {
37: 83 45 d8 01 addl $0x1,-0x28(%rbp)
3b: 83 7d d8 03 cmpl $0x3,-0x28(%rbp)
3f: 7e ea jle 2b <func+0x2b>
}
}
41: 90 nop
42: 48 8b 45 f8 mov -0x8(%rbp),%rax
46: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
4d: 00 00
4f: 74 05 je 56 <func+0x56>
51: e8 00 00 00 00 callq 56 <func+0x56>
56: c9 leaveq
57: c3 retq
Here's the 32-bit version.
test_32.o: file format elf32-i386
Disassembly of section .text:
00000000 <func>:
void func() {
0: f3 0f 1e fb endbr32
4: 55 push %ebp
5: 89 e5 mov %esp,%ebp
7: 83 ec 28 sub $0x28,%esp
a: e8 fc ff ff ff call b <func+0xb>
f: 05 01 00 00 00 add $0x1,%eax
14: 65 a1 14 00 00 00 mov %gs:0x14,%eax
1a: 89 45 f4 mov %eax,-0xc(%ebp)
1d: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1f: c7 45 e0 0b 00 00 00 movl $0xb,-0x20(%ebp)
for (int i = 0; i < 4; i++) {
26: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%ebp)
2d: eb 0e jmp 3d <func+0x3d>
n[i] = b;
2f: 8b 45 dc mov -0x24(%ebp),%eax
32: 8b 55 e0 mov -0x20(%ebp),%edx
35: 89 54 85 e4 mov %edx,-0x1c(%ebp,%eax,4)
for (int i = 0; i < 4; i++) {
39: 83 45 dc 01 addl $0x1,-0x24(%ebp)
3d: 83 7d dc 03 cmpl $0x3,-0x24(%ebp)
41: 7e ec jle 2f <func+0x2f>
}
}
43: 90 nop
44: 8b 45 f4 mov -0xc(%ebp),%eax
47: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
4e: 74 05 je 55 <func+0x55>
50: e8 fc ff ff ff call 51 <func+0x51>
55: c9 leave
56: c3 ret
Disassembly of section .text.__x86.get_pc_thunk.ax:
00000000 <__x86.get_pc_thunk.ax>:
0: 8b 04 24 mov (%esp),%eax
3: c3 ret
You can see the compiler is generating 24 bytes and then 20 bytes respectively, if you look right after the variable declarations.
Regarding advice/tips you asked for, a good starting point would be to enable all compiler warnings and treat them as errors. In GCC and Clang, you'd use the -Wall -Wextra -Werror -Wfatal-errors.
I wouldn't recommend this if you're using the MSVC compiler, though, which often issues warnings about declarations from the header files it's distributed with.
Other answers cover what might he actually happening, by analyzing the generated assembly, but the really relevant explanation is: Indexing out of array bounds is Undefined Behavior in C. And that's kinda the end of story.
UB means, the code is "allowed" to do anything by C standard. It could do different thing every time it is run. It could do what you want it to do with no ill effects. It might do what you want, but then something completely unrelated behaves in a funny way. Compiler, operating system, or even phase of the moon could make a difference. Or not.
It is generally not useful to think about what actually happens with Undefined Behavior at C level. You can of course produce the assembly output of a particular compilation, and inspect what it does, but that is result of that one compilation. A new compilation might change things (even if you just do new build at different time, because value of __TIME__ macro depends on time...).
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I have heard that it is possible to obtain the source code from an executable if it was compiled with debugging (-g) enabled. Is this true? If so, how would one go about doing it?
You can't restore source code from binary executable.
You can use decompiler like REC Studio or Boomerang to convert dissassembled binary into c code, but this code won't be anything like initial code you compiled. It would be more like assembly written in C syntax. If your application is complicated, it probably won't be able to compile. Debugging symbols can help, but not a lot. Many information is lost during compilation and can't be restored.
You can use objdump utility.
objdump has option '-S' which also provides source code and respective assembly code. This also helps in debugging crashes (segmentation faults).
e.g. I have copied the code snippet and its objdump (just for example).
Only the main() of objdump is copied, it contains other info too.
Steps to do :
Suppose you wrote code in file temp.c .
gcc -g temp.c -o temp
objdump -DS temp > temp.dump
#include <stdio.h>
int main()
{
int a, *b;
a = 0;
b = 0;
printf("a=%d *b=%d", a, *b);
return 0;
}
0804841d <main>:
#include <stdio.h>
int main()
{
804841d: 55 push %ebp
804841e: 89 e5 mov %esp,%ebp
8048420: 83 e4 f0 and $0xfffffff0,%esp
8048423: 83 ec 20 sub $0x20,%esp
int a, *b;
a = 0;
8048426: c7 44 24 18 00 00 00 movl $0x0,0x18(%esp)
804842d: 00
b = 0;
804842e: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
8048435: 00
printf("a=%d *b=%d", a, *b);
8048436: 8b 44 24 1c mov 0x1c(%esp),%eax
804843a: 8b 00 mov (%eax),%eax
804843c: 89 44 24 08 mov %eax,0x8(%esp)
8048440: 8b 44 24 18 mov 0x18(%esp),%eax
8048444: 89 44 24 04 mov %eax,0x4(%esp)
8048448: c7 04 24 f0 84 04 08 movl $0x80484f0,(%esp)
804844f: e8 9c fe ff ff call 80482f0 <printf#plt>
return 0;
8048454: b8 00 00 00 00 mov $0x0,%eax
}
8048459: c9 leave
804845a: c3 ret
804845b: 66 90 xchg %ax,%ax
804845d: 66 90 xchg %ax,%ax
804845f: 90 nop
I have gone through the walkthrough about smashing the stack. Both the one http://insecure.org/stf/smashstack.html here and one I found on here Trying to smash the stack. I understand what is suppose to be happening, but I can't get it to work properly.
This is just like the other scenarios. I need to skip x=1 and print 0 as the value of x.
I compile with:
gcc file.c
The original code :
void function(){
char buffer[8];
}
void main(){
int x;
x = 0;
function();
x = 1;
printf("%d\n", x);
}
When I run
objdump -dS a.out
I get
0000000000400530 <function>:
400530: 55 push %rbp
400531: 48 89 e5 mov %rsp,%rbp
400534: 5d pop %rbp
400535: c3 retq
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 48 83 ec 20 sub $0x20,%rsp
40053e: 89 7d ec mov %edi,-0x14(%rbp)
400541: 48 89 75 e0 mov %rsi,-0x20(%rbp)
400545: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
40054c: b8 00 00 00 00 mov $0x0,%eax
400551: e8 da ff ff ff callq 400530 <function>
400556: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
40055d: 8b 45 fc mov -0x4(%rbp),%eax
400560: 89 c6 mov %eax,%esi
400562: bf 10 06 40 00 mov $0x400610,%edi
400567: b8 00 00 00 00 mov $0x0,%eax
40056c: e8 9f fe ff ff callq 400410 <printf#plt>
400571: c9 leaveq
400572: c3 retq
400573: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40057a: 00 00 00
40057d: 0f 1f 00 nopl (%rax)
In the function I need to figure out how many bytes the return address is beyond the start of the buffer. I am not sure about this value. But since there are 6 bytes from the beginnig of the function to the return; would I add 7 bytes to the buffer?
Then I need to skip the instruction
x=1;
And since that instruction is 7 bytes long. Would I add 7 to return pointer?
Something like this?
void function(){
char buffer[8];
int *ret = buffer + 7;
(*ret) += 7;
}
void main(){
int x;
x = 0;
function();
x = 1;
printf("%d\n", x);
}
This throws the warning:
warning: initialization from incompatible pointer type [enabled by default]
int *ret = buffer1 + 5;
^
And the output is 1. What am I doing wrong? And can you explain how to do it right and why it is the correct way?
Thank you.
Try the function below, I wrote it for 32-bit compiler try using (-m32 gcc flag) or with a little effort you can make it work with your 64-bit compiler (Note that in your objdump listing you got 7 bytes offset between call to function and the next instruction so use 7 instead of 8.
void function(void)
{
unsigned long *x;
/* &x will more likely be at -4(ebp) */
/* Adding 1 (+4) gets us to stored ebp */
/* Adding 2 (+8) gets us to stored return address */
x = (unsigned long *)(&x + 2);
/* This is the tricky part */
/* TODO: On my 32-bit compiler gap between call to function
and the next instruction is 8 */
*x += 8;
}
We know that automatic variables are created on the stack - so taking the address of an automatic variable yields a pointer into the stack. When you call a void function, its return address is pushed onto the stack and the size of that address depends on your platform (4 or 8 bytes normally). So if you pass the address of an automatic variable to a function and then write over the memory before that address, you will damage the return address and smash the stack. Here is an example:
#include <stdlib.h>
#include <stdio.h>
static void f(int *p)
{
p[0] = 0x30303030;
p[1] = 0x31313131;
*(p - 1) = 0x35353535;
*(p - 2) = 0x36363636;
}
int main()
{
int a = 0x41424344;
int b = 0x45464748;
int c = 0x494a4b5c;
f(&b);
printf("%08x %08x %08x\n", a, b, c);
return 0;
}
I compiled this on linux with 'gcc -g' and ran under gdb and got this:
Program received signal SIGSEGV, Segmentation fault.
0x000000000040056a in f (p=0x7fffffffde74) at smash.c:10
10 }
(gdb) bt
#0 0x000000000040056a in f (p=0x7fffffffde74) at smash.c:10
#1 0x3636363600400594 in ?? ()
#2 0x3030303035353535 in ?? ()
#3 0x494a4b5c31313131 in ?? ()
#4 0x0000000000000000 in ?? ()
(gdb)
As you can see, the parent function addresses now contain some of my magic numbers. I ran this on 64 bit linux, so really I should have used 64 bit ints to fully overwrite the return address - as it is I left the lower word untouched.
I often notice severely mangled output with mixed assembly and C instructions in the output of objdump -S. This seems to happen only for binaries built with debug info. Is there any way to fix this?
To illustrate the issue i have written a simple program :
/* test.c */
#include <stdio.h>
int main()
{
static int i = 0;
while(i < 0x1000000) {
i++;
}
return 0;
}
The above program was built with/without debug info as follows :
$ gcc test.c -o test-release
$ gcc test.c -g -o test-debug
Disassembling the test-release binary works fine.
$ objdump -S test-release
produces the following clear and concise snippet for the main() function.
080483b4 <main>:
80483b4: 55 push %ebp
80483b5: 89 e5 mov %esp,%ebp
80483b7: eb 0d jmp 80483c6 <main+0x12>
80483b9: a1 18 a0 04 08 mov 0x804a018,%eax
80483be: 83 c0 01 add $0x1,%eax
80483c1: a3 18 a0 04 08 mov %eax,0x804a018
80483c6: a1 18 a0 04 08 mov 0x804a018,%eax
80483cb: 3d ff ff ff 00 cmp $0xffffff,%eax
80483d0: 7e e7 jle 80483b9 <main+0x5>
80483d2: b8 00 00 00 00 mov $0x0,%eax
80483d7: 5d pop %ebp
80483d8: c3 ret
But $ objdump -S test-debug
produces the following mangled snippet for the same main() function.
080483b4 <main>:
#include <stdio.h>
int main()
{
80483b4: 55 push %ebp
80483b5: 89 e5 mov %esp,%ebp
static int i = 0;
while(i < 0x1000000) {
80483b7: eb 0d jmp 80483c6 <main+0x12>
i++;
80483b9: a1 18 a0 04 08 mov 0x804a018,%eax
80483be: 83 c0 01 add $0x1,%eax
80483c1: a3 18 a0 04 08 mov %eax,0x804a018
int main()
{
static int i = 0;
while(i < 0x1000000) {
80483c6: a1 18 a0 04 08 mov 0x804a018,%eax
80483cb: 3d ff ff ff 00 cmp $0xffffff,%eax
80483d0: 7e e7 jle 80483b9 <main+0x5>
i++;
}
return 0;
80483d2: b8 00 00 00 00 mov $0x0,%eax
}
80483d7: 5d pop %ebp
80483d8: c3 ret
I do understand that as the debug binary contains additional symbol info, the C code is displayed interlaced with the assembly instructions. But this makes it a tad difficult to follow the flow of code.
Is there any way to instruct objdump to output pure assembly and not interlace debug symbols into the output even if encountered in a binary?
Use -d instead of -S. objdump is doing exactly what you are telling it to. The -S option implies -d but also displays the C source if debugging information is available.