Why is Clang's assembler output conditionally jumping to an unconditional jump? - c

While debugging an issue with a program crashing when dereferencing a mangled pointer, I ran lldb and disassembled the crashing function. Perusing the disassembled code, I noticed this odd-looking choice of instructions:
0x100002b06 <+86>: cmpl $0x0, %eax
0x100002b09 <+89>: je 0x100002b14
0x100002b0f <+95>: jmp 0x10000330e
0x100002b14 <+100>: jmp 0x100002c1d
I would expect the code to look like this instead:
0x100002b06 <+86>: cmpl $0x0, %eax
0x100002b09 <+89>: je 0x100002c1d
0x100002b0f <+95>: jmp 0x10000330e
I'm curious as to why Clang made this choice. Is it some sort of branch prediction optimization since this is a NULL pointer check that's very unlikely to match?
Edit: This is the originating C code, specifically the line with the NULL pointer check.
traverse = travdone_head;
while (1) {
    if (traverse == NULL) nullptr("grokdir() traverse");
    /* Don't re-traverse directories we've already seen */
    if (inode == traverse->inode && device == traverse->device) {

-O0 is for:
Reduce compilation time and make debugging produce the expected results. This is the default.
It could be interesting to compare with the corresponding source code.
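One way to explore this is to build a reduced test case at -O0 and again with optimization enabled; the snippet below is my own hypothetical reproducer, not the original jdupes source. At -O0 Clang lays out basic blocks in roughly source order, which is where chains like a conditional jump to an unconditional jump come from; at -O1 and above its CFG simplification and jump threading typically fold such a chain into a single conditional jump to the final target.
/* reproduce.c -- hypothetical reduced example, not the original source.
 * Compare the branch layout of:
 *     clang -O0 -S reproduce.c
 *     clang -O1 -S reproduce.c
 */
#include <stddef.h>

struct node { struct node *next; int inode; };

extern struct node *head;
void report_null(const char *msg);   /* stand-in for nullptr(); assumed not to return */

int already_seen(int inode)
{
    struct node *traverse = head;
    while (1) {
        if (traverse == NULL) report_null("traverse");
        if (traverse->inode == inode)
            return 1;                 /* match found */
        traverse = traverse->next;
    }
}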

Why <puts> is called? [duplicate]

This question already has answers here:
Compiler changes printf to puts
(2 answers)
Closed last year.
I'm trying to see the disassembled binary of a simple C program in gdb.
C program:
#include <stdio.h>

int main(){
    int i = 2;
    if (i == 0){
        printf("YES, it's 0!\n");
    }else{
        printf("NO");
    }
    return 0;
}
The disassembled instructions:
0x0000000100401080 <+0>: push rbp
0x0000000100401081 <+1>: mov rbp,rsp
0x0000000100401084 <+4>: sub rsp,0x30
0x0000000100401088 <+8>: call 0x1004010e0 <__main>
0x000000010040108d <+13>: mov DWORD PTR [rbp-0x4],0x2
0x0000000100401094 <+20>: cmp DWORD PTR [rbp-0x4],0x0
0x0000000100401098 <+24>: jne 0x1004010ab <main+43>
0x000000010040109a <+26>: lea rax,[rip+0x1f5f] # 0x100403000
0x00000001004010a1 <+33>: mov rcx,rax
0x00000001004010a4 <+36>: call 0x100401100 <puts>
0x00000001004010a9 <+41>: jmp 0x1004010ba <main+58>
0x00000001004010ab <+43>: lea rax,[rip+0x1f5b] # 0x10040300d
0x00000001004010b2 <+50>: mov rcx,rax
0x00000001004010b5 <+53>: call 0x1004010f0 <printf>
0x00000001004010ba <+58>: mov eax,0x0
0x00000001004010bf <+63>: add rsp,0x30
0x00000001004010c3 <+67>: pop rbp
0x00000001004010c4 <+68>: ret
0x00000001004010c5 <+69>: nop
And I suppose the instruction,
0x00000001004010a4 <+36>: call 0x100401100 <puts>
points to
printf("YES, it's 0!\n");
Now let us assume it is; then my doubt is: why is <puts> called here, but <printf> called at 0x00000001004010b5 <+53>: call 0x1004010f0 <printf>?
Using the semantics defined in the C Standard, printf("YES, it's 0!\n") produces the same output as puts("YES, it's 0!"), which may be more efficient as the string does not need to be analysed for replacements.
Since the return value is not used (printf returns the number of characters written, whereas puts only guarantees a nonnegative value on success), the compiler can replace the printf call with the equivalent call to puts.
This type of optimisation was likely introduced as a way to reduce the executable size for the classic K&R program hello.c. Replacing the printf with puts avoids linking the printf code, which is substantially larger than that of puts. In your case, this optimisation is counterproductive since both puts and printf end up linked anyway; moreover, modern systems use dynamic linking, so it is no longer meaningful to try to reduce executable size this way.
You can play with compiler settings on this Godbolt compiler explorer page to observe compiler behavior:
Even with -O0, gcc performs the printf / puts substitution, but clang does not; both compilers generate code for both calls and do not optimize away the test if (i == 0), which is fine with optimisations disabled. I suspect the gcc team could not resist biasing size benchmarks even with optimisations disabled.
With -O1 and beyond, both compilers only generate code for the else branch, calling printf.
If you change the second string to just "N", printf is converted to a call to putchar, yet another optimisation.
It's an optimization.
Calling printf with a format string that has no format specifiers and a trailing newline is equivalent to calling puts with the same string with the trailing newline removed.
Since printf has a lot of logic for handling format specifiers while puts just writes the given string followed by a newline, the latter will be faster. So in the case of the first call to printf, the compiler sees this equivalence and makes the appropriate substitution.
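As a quick illustration of that equivalence (my own example, not from the question), the two calls below produce identical output, and an optimizing compiler is free to emit the first one as the second:
#include <stdio.h>

int main(void)
{
    /* No conversion specifiers and a trailing newline: a compiler may
     * legally lower this call... */
    printf("YES, it's 0!\n");

    /* ...to this one; puts() appends the newline itself. */
    puts("YES, it's 0!");

    return 0;
}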

Interpreting this line in Assembly language?

Below are the first 5 lines of a disassembled C program that I am trying to reverse engineer back into its C code for the purpose of better learning assembly language. At the beginning of this code I see it makes room on the stack and immediately calls
0x000000000040054e <+8>: mov %fs:0x28,%rax
I am confused about what this line does, and what in the corresponding C program might be causing it. The only time I have seen this line so far is when a different function within a C program is called, but this time it is not followed by any callq instruction, so I am not so sure... Any ideas what else could be in this C program to produce this instruction?
0x0000000000400546 <+0>: push %rbp
0x0000000000400547 <+1>: mov %rsp,%rbp
0x000000000040054a <+4>: sub $0x40,%rsp
0x000000000040054e <+8>: mov %fs:0x28,%rax
0x0000000000400557 <+17>: mov %rax,-0x8(%rbp)
0x000000000040055b <+21>: xor %eax,%eax
0x000000000040055d <+23>: movl $0x17,-0x30(%rbp)
...
I know this is there to provide some form of stack protection against buffer overflow attacks; I just need to know what C code would prompt this protection if not a call to a separate function.
As you say, this is code used to defend against buffer overflows. The compiler generates this "stack canary check" for functions that have local variables that might be buffers that could be overflowed. Note the instructions immediately above and below the line you are asking about:
sub $0x40, %rsp
mov %fs:0x28, %rax
mov %rax, -0x8(%rbp)
xor %eax, %eax
The sub allocates 64 bytes of space on the stack, which is enough room for at least one small array. Then a secret value is copied from %fs:0x28 to the top of that space, just below the previous frame pointer and the return address, and then it is erased from the register file.
The body of the function does something with arrays; if it writes sufficiently far past the end of an array, it will overwrite the secret value. At the end of the function, there will be code along the lines of
mov -0x8(%rbp), %rax
xor %fs:0x28, %rax
jne 1f
mov %rbp, %rsp
pop %rbp
ret
1:
call __stack_chk_fail # does not return
This verifies that the secret value is unchanged, and crashes the program if it has changed. The idea is that someone trying to exploit a simple buffer overflow vulnerability, like you have when you use gets, won't be able to change the return address without also modifying the secret value.
The compiler has several different heuristics, selectable with command line options, for deciding when it is necessary to generate stack-canary protection code.
You can't write C code corresponding to this assembly language yourself, because it uses the unusual %fs:nnnn addressing mode; the stack-canary code intentionally uses an addressing mode that no other code generation relies on, to make it as difficult as possible for the adversary to learn the secret value.
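For reference, a function along the following lines (my own sketch, not the asker's program) is the kind of C code that makes GCC emit this canary sequence when building with -fstack-protector or -fstack-protector-strong:
#include <stdio.h>

/* A local char array is the classic trigger for stack-protector code.
 * Build with:  gcc -fstack-protector -S canary.c
 * and look for the %fs:0x28 load in the prologue and the check in the epilogue. */
void read_name(void)
{
    char name[32];                     /* buffer that could be overflowed */
    if (fgets(name, sizeof name, stdin))
        printf("Hello, %s", name);
}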

How to associate assembly code to exact line in C program?

Here is an example found via an assembly website. This is the C code:
int main()
{
    int a = 5;
    int b = a + 6;
    return 0;
}
Here is the associated assembly code:
(gdb) disassemble
Dump of assembler code for function main:
0x0000000100000f50 <main+0>: push %rbp
0x0000000100000f51 <main+1>: mov %rsp,%rbp
0x0000000100000f54 <main+4>: mov $0x0,%eax
0x0000000100000f59 <main+9>: movl $0x0,-0x4(%rbp)
0x0000000100000f60 <main+16>: movl $0x5,-0x8(%rbp)
0x0000000100000f67 <main+23>: mov -0x8(%rbp),%ecx
0x0000000100000f6a <main+26>: add $0x6,%ecx
0x0000000100000f70 <main+32>: mov %ecx,-0xc(%rbp)
0x0000000100000f73 <main+35>: pop %rbp
0x0000000100000f74 <main+36>: retq
End of assembler dump.
I can safely assume that this line of assembly code:
0x0000000100000f6a <main+26>: add $0x6,%ecx
correlates to this line of C:
int b = a + 6;
But is there a way to extract which lines of assembly are associated with a specific line of C code?
In this small sample it's not too difficult, but in larger programs and when debugging a larger amount of code it gets a bit cumbersome.
But is there a way to extract which lines of assembly are associated with a specific line of C code?
Yes, in principle - your compiler can probably do it (GCC option -fverbose-asm, for example). Alternatively, objdump -lSd or similar will disassemble a program or object file with source and line number annotations where available.
In general though, for a large optimized program, this can be very hard to follow.
Even with perfect annotation, you'll see the same source line mentioned multiple times as expressions and statements are split up, interleaved and reordered, and some instructions associated with multiple source expressions.
In this case, you just need to think about the relationship between your source and the assembly, but it takes some effort.
One of the best tools I've found for this is Matthew Godbolt's Compiler Explorer.
It features multiple compiler toolchains, auto-recompiles, and it immediately shows the assembly output with colored lines to show the corresponding line of source code.
First, you need to compile the program so that its object file keeps information about the source code, via the -gdwarf or -g flag (or both). Next, if you want to debug, it is important to tell the compiler to avoid optimizations; otherwise it is difficult to see the correspondence between code and assembly.
gcc -gdwarf -g3 -O0 prog.c -o out
Next, tell the disassembler to output the source code. The --source flag implies disassembly (-d).
objdump --source out
@Useless is right. Anyway, a trick to find where a piece of C has ended up in the machine code is to inject markers into it; for instance,
#define ASM_MARK do { __asm__ __volatile__("nop; nop; nop;\n\t" :::); } while (0)

int main()
{
    int a = 5;
    ASM_MARK;
    int b = a + 6;
    ASM_MARK;
    return 0;
}
You will see:
main:
pushq %rbp
movq %rsp, %rbp
movl $5, -4(%rbp)
nop; nop; nop;
movl -4(%rbp), %eax
addl $6, %eax
movl %eax, -8(%rbp)
nop; nop; nop;
movl $0, %eax
popq %rbp
ret
You need to use the __volatile__ keyword or an equivalent to tell the compiler not to interfere; this is often compiler-specific (notice the __), as standard C does not provide this kind of syntax.

Does GCC generate suboptimal code for static branch prediction?

In my university course, I heard that, by convention, it is better to place the more probable case in the if branch rather than in the else branch, which may help the static branch predictor. For instance:
if (check_collision(player, enemy)) { // very unlikely to be true
    doA();
} else {
    doB();
}
may be rewritten as:
if (!check_collision(player, enemy)) {
    doB();
} else {
    doA();
}
I found a blog post Branch Patterns, Using GCC, which explains this phenomenon in more detail:
Forward branches are generated for if statements. The rationale for
making them not likely to be taken is that the processor can take
advantage of the fact that instructions following the branch
instruction may already be placed in the instruction buffer inside the
Instruction Unit.
Next to it, it says (emphasis mine):
When writing an if-else statement, always make the "then" block more
likely to be executed than the else block, so the processor can take
advantage of instructions already placed in the instruction fetch
buffer.
Finally, there is an article written by Intel, Branch and Loop Reorganization to Prevent Mispredicts, which summarizes this with two rules:
Static branch prediction is used when there is no data collected by the
microprocessor when it encounters a branch, which is typically the
first time a branch is encountered. The rules are simple:
A forward branch defaults to not taken
A backward branch defaults to taken
In order to effectively write your code to take advantage of these
rules, when writing if-else or switch statements, check the most
common cases first and work progressively down to the least common.
As I understand it, the idea is that a pipelined CPU may keep following instructions from the instruction cache without breaking its flow by jumping to another address within the code segment. I am aware, though, that this may be largely oversimplified in the case of modern CPU microarchitectures.
However, it looks like GCC doesn't respect these rules. Given the code:
extern void foo();
extern void bar();

int some_func(int n)
{
    if (n) {
        foo();
    } else {
        bar();
    }
    return 0;
}
it generates (version 6.3.0 with -O3 -mtune=intel):
some_func:
lea rsp, [rsp-8]
xor eax, eax
test edi, edi
jne .L6 ; here, forward branch if (n) is (conditionally) taken
call bar
xor eax, eax
lea rsp, [rsp+8]
ret
.L6:
call foo
xor eax, eax
lea rsp, [rsp+8]
ret
The only way that I found to force the desired behavior is to rewrite the if condition using __builtin_expect as follows:
if (__builtin_expect(n, 1)) { // force n condition to be treated as true
so the assembly code would become:
some_func:
lea rsp, [rsp-8]
xor eax, eax
test edi, edi
je .L2 ; here, the unlikely bar() case sits behind the forward branch (default not taken)
call foo
xor eax, eax
lea rsp, [rsp+8]
ret
.L2:
call bar
xor eax, eax
lea rsp, [rsp+8]
ret
The short answer: no, it is not.
GCC performs a metric ton of non-trivial optimizations, and one of them is guessing branch probabilities based on the control flow graph.
According to the GCC manual:
-fno-guess-branch-probability
Do not guess branch probabilities using
heuristics.
GCC uses heuristics to guess branch probabilities if they are not
provided by profiling feedback (-fprofile-arcs). These heuristics are
based on the control flow graph. If some branch probabilities are
specified by __builtin_expect, then the heuristics are used to guess
branch probabilities for the rest of the control flow graph, taking
the __builtin_expect info into account. The interactions between the
heuristics and __builtin_expect can be complex, and in some cases, it
may be useful to disable the heuristics so that the effects of
__builtin_expect are easier to understand.
-freorder-blocks may swap branches as well.
Also, as the OP mentioned, the behavior can be overridden with __builtin_expect.
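For completeness, here is how that hint is usually packaged (a common GCC/Clang idiom, not something from the original post); the !! normalizes the condition to 0 or 1 before passing it to __builtin_expect:
extern void foo(void);
extern void bar(void);

/* Conventional likely/unlikely wrappers around __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int some_func(int n)
{
    if (likely(n)) {      /* hint: the foo() path is the hot one */
        foo();
    } else {
        bar();
    }
    return 0;
}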
Proof
Look at the following listing.
#include <stdio.h>

void doA() { printf("A\n"); }
void doB() { printf("B\n"); }

int check_collision(void* a, void* b)
{ return a == b; }

void some_func (void* player, void* enemy) {
    if (check_collision(player, enemy)) {
        doA();
    } else {
        doB();
    }
}

int main() {
    // warming up gcc statistic
    some_func((void*)0x1, NULL);
    some_func((void*)0x2, NULL);
    some_func((void*)0x3, NULL);
    some_func((void*)0x4, NULL);
    some_func((void*)0x5, NULL);
    some_func(NULL, NULL);
    return 0;
}
It is obvious that check_collision will return 0 most of the time. So, the doB() branch is likely and GCC can guess this:
gcc -O main.c -o opt.a
objdump -d opt.a
The asm of some_func is:
sub $0x8,%rsp
cmp %rsi,%rdi
je 6c6 <some_func+0x18>
mov $0x0,%eax
callq 68f <doB>
add $0x8,%rsp
retq
mov $0x0,%eax
callq 67a <doA>
jmp 6c1 <some_func+0x13>
But, of course, we can prevent GCC from being too smart:
gcc -fno-guess-branch-probability main.c -o non-opt.a
objdump -d non-opt.a
And we will get:
push %rbp
mov %rsp,%rbp
sub $0x10,%rsp
mov %rdi,-0x8(%rbp)
mov %rsi,-0x10(%rbp)
mov -0x10(%rbp),%rdx
mov -0x8(%rbp),%rax
mov %rdx,%rsi
mov %rax,%rdi
callq 6a0 <check_collision>
test %eax,%eax
je 6ef <some_func+0x33>
mov $0x0,%eax
callq 67a <doA>
jmp 6f9 <some_func+0x3d>
mov $0x0,%eax
callq 68d <doB>
nop
leaveq
retq
So GCC will leave branches in source order.
I used gcc 7.1.1 for those tests.
I Think That You've Found A "Bug"
The funny thing is that optimization for space and no optimization are the only cases in which the "optimal" instruction code is generated: gcc -S [-O0 | -Os] source.c
some_func:
LFB0:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
cmpl $0, 8(%ebp)
je L2
call _foo
jmp L3
L2:
call _bar
L3:
movl $0, %eax
# Or, for -Os:
# xorl %eax, %eax
leave
ret
My point is that ...
some_func:
LFB0:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
cmpl $0, 8(%ebp)
je L2
call _foo
... up to and through the call to foo, everything is "optimal" in the traditional sense, regardless of the exit strategy.
Optimality is ultimately determined by the processor, of course.

x86-64 Assembly "cmovge" to C code

While I shouldn't list out the entire 4-line sample I'm given (since this is a homework question), I'm confused about how this should be read and translated into C.
cmovge %edi, %eax
What I understand so far is that the instruction is a conditional move for when the result is >=. It's comparing the first parameter of a function %edi to the integer register %eax (which was assigned the other parameter value %esi in the previous line of assembly code). However, I don't understand its result.
My problem is interpreting the optimized code. It doesn't manipulate the stack, and I'm not sure how to write this in C (or at least which gcc switch I could use to generate the same result when compiling).
Could someone please give a few small examples of how the cmovge instruction might translate into C code? If it doesn't make sense as its own line of code, feel free to make something up with it.
This is in x86-64 assembly through a virtualized Linux operating system (CentOS 7).
I'm probably giving you the whole solution here:
int
doit(int a, int b) {
    return a >= b ? a : b;
}
With gcc -O3 -masm=intel becomes:
doit:
.LFB0:
.cfi_startproc
cmp edi, esi
mov eax, esi
cmovge eax, edi
ret
.cfi_endproc
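Reading that listing back: eax starts out holding b (mov eax, esi) and is overwritten with a only if the cmp found a >= b, which is exactly the ternary expression above. With -O0 you would instead see a compare and a conditional jump; -O1 and higher typically enable this kind of if-conversion. As a further illustration (my own example, not part of the homework sample), conditional updates like the one in this loop are also commonly lowered to cmovge when compiled with optimization:
/* Running maximum over an array: with -O1/-O2, GCC and Clang typically
 * emit cmp/cmovge for the conditional update instead of a branch. */
int array_max(const int *v, int n)
{
    int best = v[0];
    for (int i = 1; i < n; i++) {
        if (v[i] >= best)   /* conditional-move candidate */
            best = v[i];
    }
    return best;
}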

Resources