Jump to a label from inline assembly to C

I have a piece of code written in assembly, and at some points in it I want to jump to a label in C. Here is a shortened version that still shows the problem:
#include <stdio.h>
#define JE asm volatile("jmp end");
int main(){
printf("hi\n");
JE
printf("Invisible\n");
end:
printf("Visible\n");
return 0;
}
This code compiles, but there is no end label in the disassembly.
If I rename the label from end to anything else (say l1, in both the asm code (jmp l1) and the C code), the linker reports:
main.c:(.text+0x6b): undefined reference to `l1'
collect2: error: ld returned 1 exit status
Makefile:2: recipe for target 'main' failed
make: *** [main] Error 1
I have tried various names (different lengths, upper and lower case, etc.), and it seems to link only with the label end. Even with end, I get a segmentation fault, because there is no end label in the disassembled code.
Compiled with: gcc -O0 main.c -o main
Disassembled code:
000000000000063a <main>:
63a: 55 push %rbp
63b: 48 89 e5 mov %rsp,%rbp
63e: 48 8d 3d af 00 00 00 lea 0xaf(%rip),%rdi # 6f4 <_IO_stdin_used+0x4>
645: e8 c6 fe ff ff callq 510 <puts@plt>
64a: e9 c9 09 20 00 jmpq 201018 <_end> # there is no _end label!
64f: 48 8d 3d a1 00 00 00 lea 0xa1(%rip),%rdi # 6f7 <_IO_stdin_used+0x7>
656: e8 b5 fe ff ff callq 510 <puts@plt>
65b: 48 8d 3d 9f 00 00 00 lea 0x9f(%rip),%rdi # 701 <_IO_stdin_used+0x11>
662: e8 a9 fe ff ff callq 510 <puts@plt>
667: b8 00 00 00 00 mov $0x0,%eax
66c: 5d pop %rbp
66d: c3 retq
66e: 66 90 xchg %ax,%ax
So, the questions are:
Am I doing something wrong? I have seen this kind of jump (from assembly to C) in other code. I can provide example links.
Why can't the compiler/linker find l1 when it can find end?

This is what asm goto is for; see "GCC Inline Assembly: Jump to label outside block".
Note that defining a label inside another asm statement will sometimes work (e.g. with optimization disabled) but IS NOT SAFE.
asm("end:"); // BROKEN; NEVER USE
// except for toy experiments to look at compiler output
GNU C does not define the behaviour of jumping from one asm statement to another without asm goto. The compiler is allowed to assume that execution comes out the end of an asm statement and e.g. put a store after it.
The C end: label within a given function won't just have the asm symbol name of end or _end: - that wouldn't make sense because separate C functions are each allowed to have their own end: label. It could be something like main.end but it turns out GCC and clang just use their usual autonumbered labels like .L123.
Then how does this code work: https://github.com/IAIK/transientfail/blob/master/pocs/spectre/PHT/sa_oop/main.c
It doesn't; the end label that asm volatile("je end"); references is in the .data section and happens to be defined by the compiler or linker to mark the end of that section.
asm volatile("je end") has no connection to the C label in that function.
I commented out some of the code in other functions to get it to compile without the "cacheutils.h" header but that didn't affect that part of the oop() function; see https://godbolt.org/z/jabYu3 for disassembly of the linked executable with JE_4k changed to JE_16 so it's not huge. It's disassembly of a linked executable so you can see the numeric address of je 6010f0 <_end> while the oop function itself starts at 4006e0 and ends at 400750. (So it doesn't contain the branch target).
If this happens to work for Spectre exploits, that's because apparently the branch is never actually taken.

Related

Minimal 64-bit Windows executable crashes with tail-call optimization enabled by gcc

I'm trying to create a minimal 64-bit Windows executable to better understand how the Windows executable format works.
I wrote very basic assembly and C code as follows.
hi.s
section .text
hi:
db "hi", 0
global sayHi
align 16
sayHi:
lea rax, [rel hi]
ret
start.c
extern int puts();
extern const char *sayHi();
void start() {
puts(sayHi());
}
compiled with,
nasm -fwin64 hi.s
gcc -c -ostart.obj -O3 -fno-optimize-sibling-calls start.c
# I will explain the flag
and linked with,
golink /fo r.exe /console start.obj hi.obj msvcrt.dll
# create a console application `r.exe`
# the default entry point is `start`
The program runs fine and prints hi, but note the gcc flag -fno-optimize-sibling-calls. That flag disables tail-call optimizations so that the program always allocates stack space and calls a function. Without the flag, the program crashes.
This is the disassembled result without tail-call optimization. Not sure why gcc put a nop there, but otherwise it's very simple and runs fine.
0000000000401000 <.text>:
401000: 48 83 ec 28 sub rsp,0x28
401004: e8 27 00 00 00 call 0x401030 # sayHi
401009: 48 89 c1 mov rcx,rax
40100c: e8 ff 2f 00 00 call 0x404010 # puts
401011: 90 nop
401012: 48 83 c4 28 add rsp,0x28
401016: c3 ret
...
401020: 68 69 00 90 90 push 0xffffffff90900069 # "hi"
...
401030: 48 8d 05 e9 ff ff ff lea rax,[rip+0xffffffffffffffe9] # 0x401020
401037: c3 ret
This is the result with tail-call optimization enabled, in which case the program crashes.
0000000000401000 <.text>:
401000: 48 83 ec 28 sub rsp,0x28
401004: e8 27 00 00 00 call 0x401030 # sayHi
401009: 48 89 c1 mov rcx,rax
40100c: 48 83 c4 28 add rsp,0x28
401010: e9 eb 2f 00 00 jmp 0x404000 # puts
...
401020: 68 69 00 90 90 push 0xffffffff90900069 # "hi"
...
401030: 48 8d 05 e9 ff ff ff lea rax,[rip+0xffffffffffffffe9] # 0x401020
401037: c3 ret
Now the program doesn't allocate stack space before puts and simply does a jmp instead of call.
I investigated further to see where exactly it jumps when calling puts.
In the no-tail-call case, the called address 0x404010 in the .idata section has the instruction jmp QWORD PTR [rip+0xffffffffffffffea] # 0x404000, and 0x404000 seems to contain the address to puts.
However in the tail-call case, the called address 0x404000 has 54 40 00 00 which is no meaningful instruction. The debugger says the program segfaults at 0x404003, so I'm pretty sure the program chokes trying to execute a garbage instruction.
I must be doing something wrong, but I'm not sure what. Could you explain why the tail-call case fails and how to get it to work?

The problem was golink not handling tail calls correctly. I searched for a while for a way to make GNU ld link the program with the same options I had given to golink.
You can create a console-mode Windows executable with GNU ld using this command.
ld -o... --subsystem=console object-files...
--subsystem console or -subsystem=console mean the same thing. Use --subsystem=windows to create a GUI application.
GNU ld also handles Windows dll files, so in this case, simply giving ld a copy of msvcrt.dll from the system folder worked.

Why assembly code is different for simple C program with different gcc version?

I'm learning the basics of assembly and C programming.
I compiled the following simple C program:
#include <stdio.h>
int main()
{
int a;
int b;
a = 10;
b = 88;
return 0;
}
Compiled with following command,
gcc -ggdb -fno-stack-protector test.c -o test
The disassembled code for above program with gcc version 4.4.7 is:
55 push %ebp
89 e5 mov %esp,%ebp
83 ec 10 sub $0x10,%esp
c7 45 f8 0a 00 00 00 movl $0xa,-0x8(%ebp)
c7 45 fc 58 00 00 00 movl $0x58,-0x4(%ebp)
b8 00 00 00 00 mov $0x0,%eax
c9 leave
c3 ret
90 nop
However disassembled code for same program with gcc version 4.3.3 is:
8d 4c 23 04 lea 0x4(%esp), %ecx
83 e4 f0 and $0xfffffff0, %esp
55 push -0x4(%ecx)
89 e5 mov %esp,%ebp
51 push %ecx
83 ec 10 sub $0x10,%esp
c7 45 f4 0a 00 00 00 00 movl $0xa, -0xc(%ebp)
c7 45 f8 58 00 00 00 00 movl $0x58, -0x8(%ebp)
b8 00 00 00 00 mov $0x0, %eax
83 c4 10 add $0x10,%esp
59 pop %ecx
5d pop %ebp
8d 61 fc lea -0x4(%ecx),%esp
c3 ret
Why is there a difference in the assembly code?
As you can see in the second listing, why is %ecx pushed on the stack?
What is the significance of and $0xfffffff0, %esp?
Note: the OS is the same.

Compilers are not required to produce identical assembly code for the same source code. The C standard allows a compiler to optimize the code as it sees fit, as long as the observable behaviour is the same. So different compilers, or different versions of one compiler, may generate different assembly code.
For your code, GCC 6.2 with -O3 generates just:
xor eax, eax
ret
because your code essentially does nothing. So, it's reduced to a simple return statement.
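To see the two stores survive optimization, one option (a sketch, not part of the original program; the function name store_two_values is hypothetical) is to declare the variables volatile. Accesses to volatile objects count as observable behaviour, so the compiler has to keep them even at -O3:

```c
/* Hypothetical variant of the question's main body.
   volatile forces every read and write of a and b to be emitted,
   so the optimizer cannot reduce the function to a bare return. */
int store_two_values(void) {
    volatile int a;
    volatile int b;

    a = 10;
    b = 88;

    /* Reading the values back also has to happen at run time. */
    return a + b;
}
```

Compiling this with gcc -O3 -S and comparing against the non-volatile version is a quick way to see which instructions exist only because the compiler was required to keep them.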

To give you an idea of how many ways exist to create valid code for a particular task, I thought this example might help.
From time to time there are size-coding competitions, obviously targeting assembly programmers, as a compiler can't compete with hand-written assembly at this level at all.
The competition tasks are fairly trivial, to keep the entry level and total effort reasonable, with precise input and output specifications (down to single-byte or pixel perfection).
So you have an almost trivial, exactly specified task, human-produced code (at the moment still outperforming compilers for trivial tasks), and a single simple rule, "minimal size", as the goal.
By your logic, it is absolutely clear that every competitor should produce the same result.
The real-world answer to this is, for example:
Hugi Size Coding Competition Series - Compo29 - Random Maze Builder
12 entries, size of code (in bytes): 122, 122, 128, 135, 136, 137, 147, ... 278 (!).
And I bet the first two entries, both 122 bytes, are probably quite different (I'm too lazy to actually check them).
Now, producing valid machine code from a high-level programming language by machine (a compiler) is a much more complex task. Compilers can't compete with humans in reasoning; most of "how good the code produced by a C++ compiler is" stems from the C++ language itself being defined quite close to machine code (easy to compile) and from raw CPU power allowing compilers to work through thousands of variants of a particular code path, searching for a near-optimal solution mostly by brute force.
Still, the numerical "reasoning" behind optimizers is state of the art in its own way, reaching the point where humans can't match it, just as humans can't achieve the efficiency of compilers within reasonable effort when compiling a full-sized application.
At this point, reasoning about some debug code differing in a few helper prologue/epilogue instructions... Even if you found a difference in optimized code, and the difference were "obvious" to a human, it would still be quite a feat that the compiler can produce at least that, as the compiler has to apply universal rules to specific code without truly understanding the context of the task.
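As for the and $0xfffffff0, %esp that the question asks about: it clears the low four bits of the stack pointer, which rounds it down to a multiple of 16. The surrounding lea 0x4(%esp),%ecx, push %ecx and final lea -0x4(%ecx),%esp in the 4.3.3 listing appear to exist so that the original stack pointer can be restored afterwards. The masking step alone can be sketched in C (the helper name align_down_16 is made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper mirroring `and $0xfffffff0, %esp`:
   clearing the low 4 bits rounds a 32-bit address down
   to the nearest 16-byte boundary. */
uint32_t align_down_16(uint32_t sp) {
    return sp & UINT32_C(0xfffffff0);
}
```

For example, align_down_16(0xbffff6ec) gives 0xbffff6e0, and an address that is already a multiple of 16 is returned unchanged.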

Understanding assembly .long directive

In Secure Programming Cookbook for C and C++ by John Viega I came across the following statement:
asm("value_stored: \n"
".long 0xFFFFFFFF \n"
);
I do not really understand the use of the .long directive in assembly, but here it is used to embed a precalculated value in the executable. Can I somehow force the position of these bytes in the executable? I tried putting it at the end of main (thinking that this way it would be at the end of the .text section), but I got a segmentation fault. Putting it outside main works.

Even at the end of main, the inline assembler sequence is emitted into the instruction stream, so the CPU will try to execute the data as code. In my environment objdump -d foo.o shows:
00000000004004b4 <main>:
4004b4: 55 push %rbp
4004b5: 48 89 e5 mov %rsp,%rbp
00000000004004b8 <value>:
4004b8: ff (bad)
4004b9: ff (bad)
4004ba: ff (bad)
4004bb: ff (bad)
4004bc: b8 01 00 00 00 mov $0x1,%eax
4004c1: 5d pop %rbp
4004c2: c3 retq
This can be mitigated by jumping over it:
asm("jmp 1f\n"
"value: .long 0xffffffff\n"
"1:");
References of the form Nf or Nb refer to the nearest local numeric label N: in the forward or backward direction, respectively.
Another option is to place the variable in a named section, which the linker script can order as the last section in either .text or .data.
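A related way to keep the value out of the instruction stream is to emit it from a file-scope asm statement into a data section. The following is a sketch under assumptions: an ELF target such as Linux with the GNU assembler, where C symbol names carry no underscore prefix. The accessor read_value_stored is hypothetical and only there to show the value being read back from C:

```c
#include <stdint.h>

/* File-scope asm is assembled outside any function, so the bytes are
   never executed. .pushsection/.popsection switch to .rodata and back
   without disturbing the section the compiler is currently using. */
__asm__(".pushsection .rodata\n"
        ".balign 4\n"
        ".globl value_stored\n"
        "value_stored:\n"
        ".long 0xFFFFFFFF\n"
        ".popsection\n");

/* Declare the assembly-defined symbol so C code can refer to it. */
extern const uint32_t value_stored;

/* Hypothetical accessor that reads the embedded value. */
uint32_t read_value_stored(void) {
    return value_stored;
}
```

Swapping .rodata for a custom section name would give a linker script something specific to position, which is the approach described above.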

C: Var and Function have same name -- a bug of ld?

In my project (https://github.com/zzt93/os-lab1), I ran into a global variable that has the same name as a function, yet compiling it produces no error or warning, which causes a bug.
A simple program that almost reproduces the problem:
//a.c
#include <stdio.h>
struct {
int t;
int *s;
} empty, full;
int main(){
printf("full is at %p", &full);
printf("empty is at %p", &empty);
empty.t = 1;
return 0;
}
//b.c
int empty() {
return 1;
}
Compiling them with gcc -o res.out -Wall -g -Wextra a.c b.c
produces only warnings like these (note: in my project, it does not even produce an error):
/usr/bin/ld: Warning: alignment 1 of symbol empty in /tmp/ccq70SCM.o is smaller than 16 in /tmp/ccVCOeWq.o
/usr/bin/ld: Warning: size of symbol empty changed from 16 in /tmp/ccVCOeWq.o to 11 in /tmp/ccq70SCM.o
/usr/bin/ld: Warning: type of symbol empty changed from 1 to 2 in /tmp/ccq70SCM.o
It seems the linker treats struct empty and function empty as the same symbol.
Disassembling it, you can clearly see that the linker uses the address of function empty rather than that of struct empty, so running res.out causes a segmentation fault.
40054e: be 73 05 40 00 mov $0x400573,%esi
400553: bf 12 06 40 00 mov $0x400612,%edi
400558: b8 00 00 00 00 mov $0x0,%eax
40055d: e8 ae fe ff ff callq 400410 <printf@plt>
400562: c7 05 07 00 00 00 01 movl $0x1,0x7(%rip) # 400573 <empty>
400569: 00 00 00
40056c: b8 00 00 00 00 mov $0x0,%eax
400571: 5d pop %rbp
400572: c3 retq
0000000000400573 <empty>:
400573: 55 push %rbp
400574: 48 89 e5 mov %rsp,%rbp
400577: b8 01 00 00 00 mov $0x1,%eax
40057c: 5d pop %rbp
40057d: c3 retq
40057e: 66 90 xchg %ax,%ax
Question:
Why does the linker choose the function rather than the struct? Am I right to think of it as a bug?
Why does adding static to the declaration of the struct prevent this error? I understand that static makes the variable invisible outside this file, but note that adding static to struct empty, not to function empty, solves the problem.
Edit:
Strangely enough, in the symbol table of res.out there is only one empty:
Name Value Class Type Size Line Section
empty |0000000000400573| T | FUNC|000000000000000b| |.text
I am using
gcc version 4.9.2

Adding static prevents the error because static, when applied to functions or global variables, makes the symbol not be exported to the linker - in simple words, it makes it "private" to that file.
If you don't use static, the linker will see both definitions, but the types don't match. However, since compilation is applied file by file, the linker has no way to know the correct type of a variable - it must trust that you did your job and didn't lie.
This is why header files are important: they make sure that the types match across different files.
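The static fix can be sketched as follows (the accessor set_empty is hypothetical and only gives the example something to call). A plausible reason the function wins without static is that, with the defaults of GCC versions of that era, the uninitialized struct is emitted as a common symbol, which the linker merges into the function's real definition; treat that as an assumption about the toolchain rather than something shown in the question.

```c
/* a.c (sketch): static gives empty and full internal linkage, so their
   names never reach the linker and cannot collide with the function
   empty() defined in b.c. */
static struct {
    int t;
    int *s;
} empty, full;

/* Hypothetical accessor: writes through the struct the same way the
   question's main() does, then reads the value back. */
int set_empty(void) {
    empty.t = 1;
    full.s = &empty.t;
    return *full.s;
}
```

With this change the store to empty.t goes to the struct's own storage in this file, not to the address of the function in .text.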

Generating simple shell binary code to be copied to the stack for stack overflow

I am trying to implement the buffer overflow attack, but I need to generate instruction code in binary so I can put it on the stack to be executed. My problem is that the instructions that I am getting now have jumps to different parts of a program which becomes hard to put on the stack. So I have this simple piece of code (not the code to be exploited) that I want to put on the stack to spawn a new shell.
#include <stdio.h>
#include <unistd.h>
int main( ) {
char *buf[2];
buf[0] = "/bin/bash";
buf[1] = NULL;
execve(buf[0], buf, NULL);
}
The code is being compiled with gcc with the following flags:
CFLAGS = -Wall -Wextra -g -fno-stack-protector -m32 -z execstack
LDFLAGS = -fno-stack-protector -m32 -z execstack
Finally using objdump -d -S, I get the following code (parts of it) in hex:
....
....
08048320 <execve@plt>:
8048320: ff 25 08 a0 04 08 jmp *0x804a008
8048326: 68 10 00 00 00 push $0x10
804832b: e9 c0 ff ff ff jmp 80482f0 <_init+0x3c>
....
....
int main( ) {
80483e4: 55 push %ebp
80483e5: 89 e5 mov %esp,%ebp
80483e7: 83 e4 f0 and $0xfffffff0,%esp
80483ea: 83 ec 20 sub $0x20,%esp
char *buf[2];
buf[0] = "/bin/bash";
80483ed: c7 44 24 18 f0 84 04 movl $0x80484f0,0x18(%esp)
80483f4: 08
buf[1] = NULL;
80483f5: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
80483fc: 00
execve(buf[0], buf, NULL);
80483fd: 8b 44 24 18 mov 0x18(%esp),%eax
8048401: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
8048408: 00
8048409: 8d 54 24 18 lea 0x18(%esp),%edx
804840d: 89 54 24 04 mov %edx,0x4(%esp)
8048411: 89 04 24 mov %eax,(%esp)
8048414: e8 07 ff ff ff call 8048320 <execve@plt>
}
8048419: c9 leave
804841a: c3 ret
804841b: 90 nop
As you can see this code is hard to copy onto the stack. execve jumps to a different part of the assembly code to be executed. Is there a way to nicely get a program which can be put compactly on the stack without too much space and branches being used?

If you want a clean assembly code without coding it yourself, do the following:
Move your code to a separate function
Pass the -O0 compilation flag to gcc in order to prevent optimizations
Compile just an object file: use gcc -c [input file] -o [output file]
Follow these steps, and use the assembly code generated from objdump.
Please keep in mind, you have an external dependency for execve:
8048414: e8 07 ff ff ff call 8048320 <execve@plt>
so you must either include its implementation explicitly and remove the call, or know in advance that the process you want to attack has this function in its address space, and modify the call address to match the process execve address.

Welcome to Stack Overflow (pun intended).
This is an unusual request. Are you trying to get hired by the NSA?
Compilers don't structure assembly code in a very human friendly form, obviously. And their idea of optimization might be for performance rather than for compactness. Therefore, you might consider hand-coding it in assembler, using the compiler output as a guideline to achieve the effect you're going for.
What you have there may not be the best representation of what the compiler can do to inform your investigation in any case. Put the code into a function other than main, so you can just get the minimal stack setup and argument handling necessary, and try compiling it with all the different optimization levels to see what it does.
You're probably getting some extra setup overhead for putting it in main(), because it's the main entry point to your program and has to interface with libc and the OS (just guessing), and may be setting things up for the program's operational context in general (which you could presume would have been done for whichever executable you were inserting the code into so it would be redundant).
