objdump of binary with debug info produces mangled output - c

I often notice severely mangled output with mixed assembly and C instructions in the output of objdump -S. This seems to happen only for binaries built with debug info. Is there any way to fix this?
To illustrate the issue i have written a simple program :
/* test.c */
#include <stdio.h>
int main()
{
static int i = 0;
while(i < 0x1000000) {
i++;
}
return 0;
}
The above program was built with/without debug info as follows :
$ gcc test.c -o test-release
$ gcc test.c -g -o test-debug
Disassembling the test-release binary works fine.
$ objdump -S test-release
produces the following clear and concise snippet for the main() function.
080483b4 <main>:
80483b4: 55 push %ebp
80483b5: 89 e5 mov %esp,%ebp
80483b7: eb 0d jmp 80483c6 <main+0x12>
80483b9: a1 18 a0 04 08 mov 0x804a018,%eax
80483be: 83 c0 01 add $0x1,%eax
80483c1: a3 18 a0 04 08 mov %eax,0x804a018
80483c6: a1 18 a0 04 08 mov 0x804a018,%eax
80483cb: 3d ff ff ff 00 cmp $0xffffff,%eax
80483d0: 7e e7 jle 80483b9 <main+0x5>
80483d2: b8 00 00 00 00 mov $0x0,%eax
80483d7: 5d pop %ebp
80483d8: c3 ret
But $ objdump -S test-debug
produces the following mangled snippet for the same main() function.
080483b4 <main>:
#include <stdio.h>
int main()
{
80483b4: 55 push %ebp
80483b5: 89 e5 mov %esp,%ebp
static int i = 0;
while(i < 0x1000000) {
80483b7: eb 0d jmp 80483c6 <main+0x12>
i++;
80483b9: a1 18 a0 04 08 mov 0x804a018,%eax
80483be: 83 c0 01 add $0x1,%eax
80483c1: a3 18 a0 04 08 mov %eax,0x804a018
int main()
{
static int i = 0;
while(i < 0x1000000) {
80483c6: a1 18 a0 04 08 mov 0x804a018,%eax
80483cb: 3d ff ff ff 00 cmp $0xffffff,%eax
80483d0: 7e e7 jle 80483b9 <main+0x5>
i++;
}
return 0;
80483d2: b8 00 00 00 00 mov $0x0,%eax
}
80483d7: 5d pop %ebp
80483d8: c3 ret
I do understand that as the debug binary contains additional symbol info, the C code is displayed interlaced with the assembly instructions. But this makes it a tad difficult to follow the flow of code.
Is there any way to instruct objdump to output pure assembly and not interlace debug symbols into the output even if encountered in a binary?

Use -d instead of -S. objdump is doing exactly what you are telling it to. The -S option implies -d but also displays the C source if debugging information is available.

Related

How to add a custom label in the compiled binary

I want to add a custom label in my compiled C code which I will be using later. For example, the C code might look like:
#include <stdio.h>
#include <stdbool.h>
int main()
{
volatile bool x = false;
volatile int y = 0;
asm(".MyLabel");
if(x)
{
y = 1;
}
return 0;
}
I want to add a label before the if(x) statement. As shown above, I tried to do this by inserting a label via inline assembly. I am compile my code using:
$ gcc -O3 -g -march=native -pedantic
However, when I look at the assembly generated using objdump, I get:
Disassembly of section .text:
0000000000001020 <main>:
1020: c6 44 24 fb 00 movb $0x0,-0x5(%rsp)
1025: 31 c0 xor %eax,%eax
1027: c7 44 24 fc 00 00 00 movl $0x0,-0x4(%rsp)
102e: 00
102f: c3 ret
0000000000001030 <_start>:
1030: f3 0f 1e fa endbr64
1034: 31 ed xor %ebp,%ebp
1036: 49 89 d1 mov %rdx,%r9
1039: 5e pop %rsi
103a: 48 89 e2 mov %rsp,%rdx
103d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
1041: 50 push %rax
1042: 54 push %rsp
...
As you can see, in the main section of the assembly, there is no label called MyLabel. How can I force gcc to insert the label before the if condition?

Why does 32 bit compiler and 64 bit compiler makes such a difference with my code? [duplicate]

This question already has answers here:
How dangerous is it to access an array out of bounds?
(12 answers)
Closed 3 years ago.
Excuse my bad English.
I have written down some lines to return max, min, sum of all values, and arrange all values in ascending order when five integers are input.
While writing, I mistakenly wrote 'num[4]' when I declared a INT array when I needed to put in 5 integers.
But as I compiled with TDM-GCC 4.9.2 64-bit release, it worked without any problem. As soon as I realized and changed to TDM-GCC 4.9.2 32-bit release, it did not.
This is my whole code;
#include<stdio.h>
int main()
{
int num[4],i,j,k,a,b,c,m,number,sum=0;
printf("This program returns max, min, sum of all values, and arranges all values in ascending order when five integers are input.\n");
printf("Please enter five integers.\n");
for(i=0;i<5;i++)
{
printf("Enter #%d\n",i+1);
scanf("%d",&num[i]);
}
//arrange all values
for(j=0;j<5;j++)
{
for(k=j+1;k<5;k++)
{
if(num[j]>num[k])
{
number=num[j];
num[j]=num[k];
num[k]=number;
}
}
}
//find maximum value
int max=num[0];
for(a=1;a<5;a++)
{
if(max<num[a])
{
max=num[a];
}
}
//find minimum value
int min=num[0];
for(b=1;b<5;b++)
{
if(min>num[b])
{
min=num[b];
}
}
//find sum of all values
for(c=0;c<5;c++)
{
sum=sum+num[c];
}
printf("Max Value : %d\n",max);//print max
printf("Min Value : %d\n",min);//print min
printf("Sum : %d\n",sum); //print sum
printf("In ascending order : "); //print all values in ascending order
for(m=0;m<5;m++)
{
printf("%d ",num[m]);
}
}
I am new to C and all kinds of programming, and don't know how to search these kind of problems. I know my way of asking like this here is very inappropriate, and I sincerely apologize to people who are irritated by these types of questioning posts. But this is my best try, so please don't blame, but I'm willing to accept any kind of advice or tips.
Thank you.
When allocating on the stack, GCC targeting 64-bit (and probably Clang) will align stack allocations to 8 bytes.
For 32-bit targets, it's only going to use 4 bytes of padding.
So when you compiled your program for 64-bit, an extra four bytes was used to pad the stack. That's why when you accessed that last integer, it didn't segfault.
To see this in action, we'll create a test file.
void test_func() {
int n[4];
int b = 11;
for (int i = 0; i < 4; i++) {
n[i] = b;
}
}
And we'll compile it for 32-bit and 64-bit.
gcc -g -c -m64 test.c -o test_64.o
gcc -g -c -m32 test.c -o test_32.o
And now we'll print the disassembly for each.
objdump -S test_64.o >test_64_dis.txt
objdump -S test_32.o >test_32_dis.txt
Here's the contents of the 64-bit version.
test_64.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <func>:
void func() {
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 83 ec 30 sub $0x30,%rsp
c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
13: 00 00
15: 48 89 45 f8 mov %rax,-0x8(%rbp)
19: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1b: c7 45 dc 0b 00 00 00 movl $0xb,-0x24(%rbp)
for (int i = 0; i < 4; i++) {
22: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
29: eb 10 jmp 3b <func+0x3b>
n[i] = b;
2b: 8b 45 d8 mov -0x28(%rbp),%eax
2e: 48 98 cltq
30: 8b 55 dc mov -0x24(%rbp),%edx
33: 89 54 85 e0 mov %edx,-0x20(%rbp,%rax,4)
for (int i = 0; i < 4; i++) {
37: 83 45 d8 01 addl $0x1,-0x28(%rbp)
3b: 83 7d d8 03 cmpl $0x3,-0x28(%rbp)
3f: 7e ea jle 2b <func+0x2b>
}
}
41: 90 nop
42: 48 8b 45 f8 mov -0x8(%rbp),%rax
46: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
4d: 00 00
4f: 74 05 je 56 <func+0x56>
51: e8 00 00 00 00 callq 56 <func+0x56>
56: c9 leaveq
57: c3 retq
Here's the 32-bit version.
test_32.o: file format elf32-i386
Disassembly of section .text:
00000000 <func>:
void func() {
0: f3 0f 1e fb endbr32
4: 55 push %ebp
5: 89 e5 mov %esp,%ebp
7: 83 ec 28 sub $0x28,%esp
a: e8 fc ff ff ff call b <func+0xb>
f: 05 01 00 00 00 add $0x1,%eax
14: 65 a1 14 00 00 00 mov %gs:0x14,%eax
1a: 89 45 f4 mov %eax,-0xc(%ebp)
1d: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1f: c7 45 e0 0b 00 00 00 movl $0xb,-0x20(%ebp)
for (int i = 0; i < 4; i++) {
26: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%ebp)
2d: eb 0e jmp 3d <func+0x3d>
n[i] = b;
2f: 8b 45 dc mov -0x24(%ebp),%eax
32: 8b 55 e0 mov -0x20(%ebp),%edx
35: 89 54 85 e4 mov %edx,-0x1c(%ebp,%eax,4)
for (int i = 0; i < 4; i++) {
39: 83 45 dc 01 addl $0x1,-0x24(%ebp)
3d: 83 7d dc 03 cmpl $0x3,-0x24(%ebp)
41: 7e ec jle 2f <func+0x2f>
}
}
43: 90 nop
44: 8b 45 f4 mov -0xc(%ebp),%eax
47: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
4e: 74 05 je 55 <func+0x55>
50: e8 fc ff ff ff call 51 <func+0x51>
55: c9 leave
56: c3 ret
Disassembly of section .text.__x86.get_pc_thunk.ax:
00000000 <__x86.get_pc_thunk.ax>:
0: 8b 04 24 mov (%esp),%eax
3: c3 ret
You can see the compiler is generating 24 bytes and then 20 bytes respectively, if you look right after the variable declarations.
Regarding advice/tips you asked for, a good starting point would be to enable all compiler warnings and treat them as errors. In GCC and Clang, you'd use the -Wall -Wextra -Werror -Wfatal-errors.
I wouldn't recommend this if you're using the MSVC compiler, though, which often issues warnings about declarations from the header files it's distributed with.
Other answers cover what might he actually happening, by analyzing the generated assembly, but the really relevant explanation is: Indexing out of array bounds is Undefined Behavior in C. And that's kinda the end of story.
UB means, the code is "allowed" to do anything by C standard. It could do different thing every time it is run. It could do what you want it to do with no ill effects. It might do what you want, but then something completely unrelated behaves in a funny way. Compiler, operating system, or even phase of the moon could make a difference. Or not.
It is generally not useful to think about what actually happens with Undefined Behavior at C level. You can of course produce the assembly output of a particular compilation, and inspect what it does, but that is result of that one compilation. A new compilation might change things (even if you just do new build at different time, because value of __TIME__ macro depends on time...).

GCC optimizer generating error in nostdlib code

I have the following code:
void cp(void *a, const void *b, int n) {
for (int i = 0; i < n; ++i) {
((char *) a)[i] = ((const char *) b)[i];
}
}
void _start(void) {
char buf[20];
const char m[] = "123456789012345";
cp(buf, m, 15);
register int rax __asm__ ("rax") = 60; // exit
register int rdi __asm__ ("rdi") = 0; // status
__asm__ volatile (
"syscall" :: "r" (rax), "r" (rdi) : "cc", "rcx", "r11"
);
__builtin_unreachable();
}
If I compile it with gcc -nostdlib -O1 "./a.c" -o "./a", I get a functioning program, but if I compile it with -O2, I get a program that generates a segmentation fault.
This is the generated code with -O1:
0000000000001000 <cp>:
1000: b8 00 00 00 00 mov $0x0,%eax
1005: 0f b6 14 06 movzbl (%rsi,%rax,1),%edx
1009: 88 14 07 mov %dl,(%rdi,%rax,1)
100c: 48 83 c0 01 add $0x1,%rax
1010: 48 83 f8 0f cmp $0xf,%rax
1014: 75 ef jne 1005 <cp+0x5>
1016: c3 retq
0000000000001017 <_start>:
1017: 48 83 ec 30 sub $0x30,%rsp
101b: 48 b8 31 32 33 34 35 movabs $0x3837363534333231,%rax
1022: 36 37 38
1025: 48 ba 39 30 31 32 33 movabs $0x35343332313039,%rdx
102c: 34 35 00
102f: 48 89 04 24 mov %rax,(%rsp)
1033: 48 89 54 24 08 mov %rdx,0x8(%rsp)
1038: 48 89 e6 mov %rsp,%rsi
103b: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi
1040: ba 0f 00 00 00 mov $0xf,%edx
1045: e8 b6 ff ff ff callq 1000 <cp>
104a: b8 3c 00 00 00 mov $0x3c,%eax
104f: bf 00 00 00 00 mov $0x0,%edi
1054: 0f 05 syscall
And this is the generated code with -O2:
0000000000001000 <cp>:
1000: 31 c0 xor %eax,%eax
1002: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1008: 0f b6 14 06 movzbl (%rsi,%rax,1),%edx
100c: 88 14 07 mov %dl,(%rdi,%rax,1)
100f: 48 83 c0 01 add $0x1,%rax
1013: 48 83 f8 0f cmp $0xf,%rax
1017: 75 ef jne 1008 <cp+0x8>
1019: c3 retq
101a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000001020 <_start>:
1020: 48 8d 44 24 d8 lea -0x28(%rsp),%rax
1025: 48 8d 54 24 c9 lea -0x37(%rsp),%rdx
102a: b9 31 00 00 00 mov $0x31,%ecx
102f: 66 0f 6f 05 c9 0f 00 movdqa 0xfc9(%rip),%xmm0 # 2000 <_start+0xfe0>
1036: 00
1037: 48 8d 70 0f lea 0xf(%rax),%rsi
103b: 0f 29 44 24 c8 movaps %xmm0,-0x38(%rsp)
1040: eb 0d jmp 104f <_start+0x2f>
1042: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1048: 0f b6 0a movzbl (%rdx),%ecx
104b: 48 83 c2 01 add $0x1,%rdx
104f: 88 08 mov %cl,(%rax)
1051: 48 83 c0 01 add $0x1,%rax
1055: 48 39 f0 cmp %rsi,%rax
1058: 75 ee jne 1048 <_start+0x28>
105a: b8 3c 00 00 00 mov $0x3c,%eax
105f: 31 ff xor %edi,%edi
1061: 0f 05 syscall
The crash happens at 103b, instruction movaps %xmm0,-0x38(%rsp).
I noticed that if m contains less than 15 characters, then the generated code is different and the crash does not happen.
What am I doing wrong?
_start is not a function. It's not called by anything, and on entry the stack is 16-byte aligned, not (as the ABI requires) 8 bytes away from 16-byte alignment.
(The ABI requires 16-byte alignment before a call, and call pushes an 8-byte return address. So on function entry RSP-8 and RSP+8 are 16-byte aligned.)
At -O2 GCC uses alignment-required 16-byte instructions to implement the copy done by cp(), copying the "123456789012345" from static storage to the stack.
At -O1, GCC just uses two mov r64, imm64 instructions to get bytes into integer regs for 8-byte stores. These don't require alignment.
Workarounds
Just write a main in C like a normal person if you want everything to work.
Or if you're trying to microbenchmark something light-weight in asm, you can use gcc -nostdlib -O3 -mincoming-stack-boundary=3 (docs) to tell GCC that functions can't assume they're called with more than 8-byte alignment. Unlike -mpreferred-stack-boundary=3, this will still align by 16 before making further calls. So if you have other non-leaf functions, you might want to just use an attribute on your hacky C _start() instead of affecting the whole file.
A worse, more hacky way would be to try putting
asm("push %rax"); at the very top of _start to modify RSP by 8, where GCC hopefully runs it before doing anything else with the stack. GNU C Basic asm statements are implicitly volatile so you don't need asm volatile, although that wouldn't hurt.
You're 100% on your own and responsible for correctly tricking the compiler by using inline asm that works for whatever optimization level you're using.
Another safer way would be write your own light-weight _start that calls main:
// at global scope:
asm(
".globl _start \n"
"_start: \n"
" mov (%rsp), %rdi \n" // argc
" lea 8(%rsp), %rsi \n" // argv
" lea 8(%rsi, %rdi, 8), %rdx \n" // envp
" call main \n"
// NOT DONE: stdio cleanup or other atexit stuff
// DO NOT USE WITH GLIBC; use libc's CRT code if you use libc
" mov %eax, %edi \n"
" mov $231, %eax \n"
" syscall" // exit_group( main() )
);
int main(int argc, char**argv, char**envp) {
... your code here
return 0;
}
If you didn't want main to return, you could just pop %rdi; mov %rsp, %rsi ; jmp main to give it argc and argv without a return address.
Then main can exit via inline asm, or by calling exit() or _exit() if you link libc. (But if you link libc, you should usually use its _start.)
See also: How Get arguments value using inline assembly in C without Glibc? for other hand-rolled _start versions; this is pretty much like #zwol's there.

Overriding a standard function behaves unexpectedly

[Update] 2016.07.02
main.c
#include <stdio.h>
#include <string.h>
size_t strlen(const char *str) {
printf("%s\n", str);
return 99;
}
int main() {
const char *str = "AAA";
size_t a = strlen(str);
strlen(str);
size_t b = strlen("BBB");
return 0;
}
The expected output is
AAA
AAA
BBB
Compile with gcc -O0 -o main main.c :
AAA
Compile with gcc -O3 -o main main.c
AAA
AAA
The corresponding asm code with flag -O0
000000000040057c <main>:
40057c: 55 push %rbp
40057d: 48 89 e5 mov %rsp,%rbp
400580: 48 83 ec 20 sub $0x20,%rsp
400584: 48 c7 45 e8 34 06 40 movq $0x400634,-0x18(%rbp)
40058b: 00
40058c: 48 8b 45 e8 mov -0x18(%rbp),%rax
400590: 48 89 c7 mov %rax,%rdi
400593: e8 c5 ff ff ff callq 40055d <strlen>
400598: 48 89 45 f0 mov %rax,-0x10(%rbp)
40059c: 48 c7 45 f8 03 00 00 movq $0x3,-0x8(%rbp)
4005a3: 00
4005a4: b8 00 00 00 00 mov $0x0,%eax
4005a9: c9 leaveq
4005aa: c3 retq
4005ab: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
and with -O3 :
0000000000400470 <main>:
400470: 48 83 ec 08 sub $0x8,%rsp
400474: bf 24 06 40 00 mov $0x400624,%edi
400479: e8 c2 ff ff ff callq 400440 <puts#plt>
40047e: bf 24 06 40 00 mov $0x400624,%edi
400483: e8 b8 ff ff ff callq 400440 <puts#plt>
400488: 31 c0 xor %eax,%eax
40048a: 48 83 c4 08 add $0x8,%rsp
40048e: c3 retq
With the flag -O0, why is the second and third call to strlen did not call the user defined strlen ?
And with -O3, why is the third call to strlen been optimized out?
GCC recognizes strlen() has a builtin and replaced it builtins. Hence, your version of strlen() isn't called.
I compiled your code with -fno-builtin which disables the builtins and I get the Log in strlen output twice from the 1st and 3rd strlen() calls. Probably the second strlen() gets optimized away as it's return value isn't used. This probably happens before GCC recognizes that it can't use the builtin for strlen(). Otherwise, it can't optimize away 2nd strlen() call because it has the side effect of printing the message.
Similarly, if store the result of 2nd strlen() call with something like:
size_t b = strlen(str); // call 2
then I see, "Log in strlen()" getting printed 3 times.
If I compile with -O3 (either with or without -fno-builtin), I get no output at all because, as said before, GCC optimizes away the whole
program.
This is not an issue with GCC because redefining a standard function is technically undefined behaviour and hence GCC has the liberty handle it in anyway.
Probably build in function enabled by default. If you wise call user define function, then change return type of strlen function.So,
in strlen_adder.h
int strlen(const char*);
in strlen_adder.c
int strlen(const char *s)
Also remove #include<string.h> in main.c file.

gcc function parameter alignment on stack frame

I have this test.c on my Ubuntu14.04 x86_64 system.
void foo(int a, long b, int c) {
}
int main() {
foo(0x1, 0x2, 0x3);
}
I compiled this with gcc --no-stack-protector -g test.c -o test and got the assembly code with objdump -dS test -j .text
00000000004004ed <_Z3fooili>:
void foo(int a, long b, int c) {
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: 89 7d fc mov %edi,-0x4(%rbp)
4004f4: 48 89 75 f0 mov %rsi,-0x10(%rbp)
4004f8: 89 55 f8 mov %edx,-0x8(%rbp) // !!Attention here!!
}
4004fb: 5d pop %rbp
4004fc: c3 retq
00000000004004fd <main>:
int main() {
4004fd: 55 push %rbp
4004fe: 48 89 e5 mov %rsp,%rbp
foo(0x1, 0x2, 0x3);
400501: ba 03 00 00 00 mov $0x3,%edx
400506: be 02 00 00 00 mov $0x2,%esi
40050b: bf 01 00 00 00 mov $0x1,%edi
400510: e8 d8 ff ff ff callq 4004ed <_Z3fooili>
}
400515: b8 00 00 00 00 mov $0x0,%eax
40051a: 5d pop %rbp
40051b: c3 retq
40051c: 0f 1f 40 00 nopl 0x0(%rax)
I know that the function parameters should be pushed to stack from right to left in sequence. So I was expecting this
void foo(int a, long b, int c) {
push %rbp
mov %rsp,%rbp
mov %edi,-0x4(%rbp)
mov %rsi,-0x10(%rbp)
mov %edx,-0x14(%rbp) // c should be push on stack after b, not after a
But gcc seemed clever enough to push parameter c(0x3) right after a(0x1) to save the four bytes which should be reserved for data alignment of b(0x2). Can someone please explain this and show me some documentation on why gcc did this?
The parameters are passed in registers - edi, esi, edx (then rcx, r8, r9 and only then pushed on stack) - just what the Linux amd64 calling convention mandates.
What you see in your function is just how the compiler saves them upon entry when compiling with -O0, so they're in memory where a debugger can modify them. It is free to do it in any way it wants, and it cleverly does this space optimization.
The only reason it does this is that gcc -O0 always spills/reloads all C variables between C statements to support modifying variables and jumping between lines in a function with a debugger.
All this would be optimized out in release build in the end.

Resources