What is the behavior of these linkers? - c

From this question, I've seen a funny code which compile (although with warnings) and produce a segmentation fault (gcc 4.4.4; clang 2.8):
main;
If we expand it, here is the result:
int main = 0;
So what is the linker's behavior here?

The linker's behavior is that it defines a symbol called main in either the program's data or BSS segment. It is 4 bytes long and initialized to 0. Ordinarily, it creates a symbol in the program's code segment (typically called .text) with the executable code for the main function.
The C runtime starts up at a fixed entry point (typically called _start), initializes a bunch of stuff (e.g. sets up the program's arguments), and calls the main function. When main is executable code, this is all fine and dandy, but if it's instead 4 zero bytes, the program will transfer control to those zero bytes and try to execute them.
Typically, the data and BSS segments are marked as non-executable, so when you try to execute code there, the processor will raise an exception, which the OS will interpret and then terminate your program with a signal. If somehow the segment it's in is executable, then it will try to execute the machine instructions defined by 00 00 00 00. In x86 and x86-64, that's an illegal instruction, so you'd also get a SIGILL signal in POSIX OSes.

Under my system (CentOS 6.3), main is placed into the BSS and contains all 0's, hence the crash:
Program received signal SIGSEGV, Segmentation fault.
0x00000000006007f0 in main ()
(gdb) where
#0 0x00000000006007f0 in main ()
(gdb) l
"main" is not a function
(gdb) disass 0x6007f0
Dump of assembler code for function main:
=> 0x00000000006007f0 <+0>: add %al,(%rax)
0x00000000006007f2 <+2>: add %al,(%rax)
End of assembler dump.
(gdb) info symbol &main
main in section .bss of /home/ajd/tmp/x
(gdb) x/16b 0x6007f0
0x6007f0 <main>: 0 0 0 0 0 0 0 0
0x6007f8: 0 0 0 0 0 0 0 0
(gdb)

The symbol main is expected to be a function, not an integer. However, the linker doesn't much care about the type of main; the symbol is defined. If the symbol main is not a function with one of the prescribed signatures, then you invoke undefined behaviour.
The start-up code then calls the 'function', which is actually an address in the data segment of the program. It goes wrong because the 'code' stored at that address is invalid — the first 4 bytes are likely to be zeros; what comes later is anyone's guess. The data segment may be marked non-executable, in which case trying to execute the data will trigger a crash on that account.
When you invoke undefined behaviour, anything can happen. A crash is quite sensible here.

Related

What is in the address of main?

A simple piece of code like this
#include<stdio.h>
int main()
{
return 0;
}
check the value in "&main" with gdb,I got 0xe5894855, I wonder what's this?
(gdb) x/x &main
0x401550 <main>: 0xe5894855
(gdb)
(gdb) x/x &main
0x401550 <main>: 0xe5894855
(gdb)
0xe5894855 is hex opcodes of the first instructions in main, but since you used x/x now gdb is displaying it as just a hex number and is backwards due to x86-64 being little-endian. 55 is the opcode for push rbp and the first instruction of main. Use x/i &main to view the instructions.
check the value in "&main" with gdb,I got 0xe5894855, I wonder what's this?
The C expression &main evaluates to a pointer to (function) main.
The gdb command
x/x &main
prints (eXamines) the value stored at the address expressed by &main, in hexadecimal format (/x). The result in your case is 0xe5894855, but the C language does not specify the significance of that value. In fact, C does not define any strictly conforming way even to read it from inside the program.
In practice, that value probably represents the first four bytes of the function's machine code, interpreted as a four-byte unsigned integer in native byte order. But that depends on implementation details both of GDB of the C implementation involved.
Ok so the 0x401550 is the address of main() and the hex goo to the right is the "contents" of that address, which doesn't make much sense since it's code stored there, not data.
To explain what that hex goo is coming from, we can toy around with some artificial examples:
#include <stdio.h>
int main (void)
{
printf("%llx\n", (unsigned long long)&main);
}
Running this code on gcc x86_64, I get 401040 which is the address of main() on my particular system (this time). Then upon modifying the example into some ugly hard coding:
#include <stdio.h>
int main (void)
{
printf("%llx\n", (unsigned long long)&main);
printf("%.8x\n", *(unsigned int*)0x401040);
}
(Please note that accessing absolute addresses of program code memory like this is dirty hacking. It is very questionable practice and some systems might toss out an hardware exception if you attempt it.)
I get
401040
08ec8348
The gibberish second line is something similar to what gdb would give: the raw op codes for the instructions stored there.
(That is, it's actually a program that prints out the machine code used for printing out the machine code... and now my head hurts...)
Upon disassembly and generating a binary of the executable, then viewing numerical op codes with annotated assembly, I get:
main:
48 83 ec 08
401040 sub rsp,0x8
Where the 48 83 ec 08 is the raw machine code, including the instruction sub with its parameters (x86 assembler isn't exactly my forte, but I believe 48 is "REX prefix" and 83 is the op code for sub). Upon attempting to print this as if it was integer data rather than machine code, it got tossed around according to x86 little endian ordering from 48 83 ec 08 to 08 ec 83 48. And that's the hex gibberish 08ec8348 from before.

32 bit shellcode causes a segmentation fault when trying to push "/bin//sh" to the stack

I'm following the opensecuritytraining course "exploits 1". Currently I'm trying to exploit a simple c program with some shellcode on a 32 bit linux system using a buffer overflow. The c program:
void main(int argc, char **argv)
{
char buf[64];
strcpy(buf,argv[1]);
}
I compiled the program using the command "tcc -g -o basic_vuln basic_vuln.c". Then, I programmed the following shellcode.
section .text
global _start
_start:
xor eax, eax
xor ebx, ebx
xor ecx, ecx
xor edx, edx
mov al, 11
push ebx
push 0x68732f2f
push 0x6e69622f
mov ebx, esp
int 0x80
I compiled it by typing "nasm -f elf shell.asm; ld -o shell shell.o". When I try to execute "shell" on it's own, it works and I get a shell. Next, I disassembled the program with objdump, wrote a perl file which prints the opcodes, and then redirected the output of said perl file along with 39 nop instructions before the shellcode to a file called "shellcode", so the payload is now 64 bytes long, filling the buffer. Then, I opened the c program in gdb, and picked an address in the middle of the nop sled, which will be the new return address (0xbffff540). I appended the address to the "shellcode" file, along with 4 bytes to overwrite the saved frame pointer. The shellcode looks like this:
Now, when I try to run this shellcode in gdb in the c program, it causes a segmentation fault at address 0xbffff575, which points at a certain point in my shellcode, 0x62, which is the character "b" in "/bin/sh". What could cause this?
Here's my stack frame, confirming that the return address I choose does return to the middle of the nop sled.
The course does provide shellcode that does work in gdb in the c program:
After main returns into your shellcode, ESP will probably be pointing just above that buffer. And EIP is pointing to the start of it; that's what returning into it means.
A couple push instructions may modify the machine code at the end of the buffer, leading to a SIGILL with EIP pointing at a byte you just pushed.
Probably the easiest fix is add esp, -128 to go all the way past your buffer. Or sub esp, -128 to go higher up the stack. (-128 is the largest magnitude 8-bit immediate you can use, avoiding introducing zeros in the machine code with sub esp, 128 or 1024. If you wanted to move the stack farther, you could of course construct a larger value in a register.)
I didn't test this guess, but you can confirm it in GDB by single-stepping into your shellcode with si from the end of main to step by instructions.
Use disas after each instruction to see disassembly. Or use layout reg. See the bottom of https://stackoverflow.com/tags/x86/info for more GDB debugging tips.
The given solution is more complicated because it apparently sets up an actual argv array instead of just passing NULL pointers for char **argv and char **envp. (Which on Linux is treated the same as valid pointers to empty NULL-terminated arrays: http://man7.org/linux/man-pages/man2/execve.2.html#NOTES).
But the key difference is that it uses jmp/call/pop to get a pointer to a string already in memory. That's only one stack slot not three. (The end of its payload before the return address is data, not instructions, but it would fail in a different way if it did too many pushes and overwrote the string instead of just storing a 0 terminator. The call jumps backwards before the pushed return address actually modifies the buffer, but if it did overwrite anything near the end it would still break.)
#Margaret looked into this in more detail, and spotted that it's only the 3rd push that breaks anything. That makes sense: the first 2 are presumably overwriting the part of the payload that contained the new return address and the saved EBP value. And it just so happened that the compiler put main's buffer contiguous with that.
If you actually used tcc not gcc, that's probably not a surprise. GCC would have aligned it by 16 and probably for one reason or another left a gap between the buffer and the top of the stack frame.

What is the simplest standard conform way to produce a Segfault in C?

I think the question says it all. An example covering most standards from C89 to C11 would be helpful. I though of this one, but I guess it is just undefined behaviour:
#include <stdio.h>
int main( int argc, char* argv[] )
{
const char *s = NULL;
printf( "%c\n", s[0] );
return 0;
}
EDIT:
As some votes requested clarification: I wanted to have a program with an usual programming error (the simplest I could think of was an segfault), that is guaranteed (by standard) to abort. This is a bit different to the minimal segfault question, which don't care about this insurance.
raise() can be used to raise a segfault:
raise(SIGSEGV);
A segmentation fault is an implementation defined behavior. The standard does not define how the implementation should deal with undefined behavior and in fact the implementation could optimize out undefined behavior and still be compliant. To be clear, implementation defined behavior is behavior which is not specified by the standard but the implementation should document. Undefined behavior is code that is non-portable or erroneous and whose behavior is unpredictable and therefore can not be relied on.
If we look at the C99 draft standard §3.4.3 undefined behavior which comes under the Terms, definitions and symbols section in paragraph 1 it says (emphasis mine going forward):
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
and in paragraph 2 says:
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
If, on the other hand, you simply want a method defined in the standard that will cause a segmentation fault on most Unix-like systems then raise(SIGSEGV) should accomplish that goal. Although, strictly speaking, SIGSEGV is defined as follows:
SIGSEGV an invalid access to storage
and §7.14 Signal handling <signal.h> says:
An implementation need not generate any of these signals, except as a result of explicit calls to the raise function. Additional signals and pointers to undeclarable functions, with macro definitions beginning, respectively, with the letters SIG and an uppercase letter or with SIG_ and an uppercase letter,219) may also be specified by the implementation. The complete set of signals, their semantics, and their default handling is implementation-defined; all signal numbers shall be positive.
The standard only mentions undefined behavior. It knows nothing about memory segmentation. Also note that the code that produces the error is not standard-conformant. Your code cannot invoke undefined behavior and be standard conformant at the same time.
Nonetheless, the shortest way to produce a segmentation fault on architectures that do generate such faults would be:
int main()
{
*(int*)0 = 0;
}
Why is this sure to produce a segfault? Because access to memory address 0 is always trapped by the system; it can never be a valid access (at least not by userspace code.)
Note of course that not all architectures work the same way. On some of them, the above could not crash at all, but rather produce other kinds of errors. Or the statement could be perfectly fine, even, and memory location 0 is accessible just fine. Which is one of the reasons why the standard doesn't actually define what happens.
A correct program doesn't produce a segfault. And you cannot describe deterministic behaviour of an incorrect program.
A "segmentation fault" is a thing that an x86 CPU does. You get it by attempting to reference memory in an incorrect way. It can also refer to a situation where memory access causes a page fault (i.e. trying to access memory that's not loaded into the page tables) and the OS decides that you had no right to request that memory. To trigger those conditions, you need to program directly for your OS and your hardware. It is nothing that is specified by the C language.
If we assume we are not raising a signal calling raise, segmentation fault is likely to come from undefined behavior. Undefined behavior is undefined and a compiler is free to refuse to translate so no answer with undefined is guaranteed to fail on all implementations. Moreover a program which invokes undefined behavior is an erroneous program.
But this one is the shortest I can get that segfault on my system:
main(){main();}
(I compile with gcc and -std=c89 -O0).
And by the way, does this program really invokes undefined bevahior?
main;
That's it.
Really.
Essentially, what this does is it defines main as a variable.
In C, variables and functions are both symbols -- pointers in memory, so the compiler does not distinguish them, and this code does not throw an error.
However, the problem rests in how the system runs executables. In a nutshell, the C standard requires that all C executables have an environment-preparing entrypoint built into them, which basically boils down to "call main".
In this particular case, however, main is a variable, so it is placed in a non-executable section of memory called .bss, intended for variables (as opposed to .text for the code). Trying to execute code in .bss violates its specific segmentation, so the system throws a segmentation fault.
To illustrate, here's (part of) an objdump of the resulting file:
# (unimportant)
Disassembly of section .text:
0000000000001020 <_start>:
1020: f3 0f 1e fa endbr64
1024: 31 ed xor %ebp,%ebp
1026: 49 89 d1 mov %rdx,%r9
1029: 5e pop %rsi
102a: 48 89 e2 mov %rsp,%rdx
102d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
1031: 50 push %rax
1032: 54 push %rsp
1033: 4c 8d 05 56 01 00 00 lea 0x156(%rip),%r8 # 1190 <__libc_csu_fini>
103a: 48 8d 0d df 00 00 00 lea 0xdf(%rip),%rcx # 1120 <__libc_csu_init>
# This is where the program should call main
1041: 48 8d 3d e4 2f 00 00 lea 0x2fe4(%rip),%rdi # 402c <main>
1048: ff 15 92 2f 00 00 callq *0x2f92(%rip) # 3fe0 <__libc_start_main#GLIBC_2.2.5>
104e: f4 hlt
104f: 90 nop
# (nice things we still don't care about)
Disassembly of section .data:
0000000000004018 <__data_start>:
...
0000000000004020 <__dso_handle>:
4020: 20 40 00 and %al,0x0(%rax)
4023: 00 00 add %al,(%rax)
4025: 00 00 add %al,(%rax)
...
Disassembly of section .bss:
0000000000004028 <__bss_start>:
4028: 00 00 add %al,(%rax)
...
# main is in .bss (variables) instead of .text (code)
000000000000402c <main>:
402c: 00 00 add %al,(%rax)
...
# aaand that's it!
PS: This won't work if you compile to a flat executable. Instead, you will cause undefined behaviour.
On some platforms, a standard-conforming C program can fail with a segmentation fault if it requests too many resources from the system. For instance, allocating a large object with malloc can appear to succeed, but later, when the object is accessed, it will crash.
Note that such a program is not strictly conforming; programs which meet that definition have to stay within each of the minimum implementation limits.
A standard-conforming C program cannot produce a segmentation fault otherwise, because the only other ways are via undefined behavior.
The SIGSEGV signal can be raised explicitly, but there is no SIGSEGV symbol in the standard C library.
(In this answer, "standard-conforming" means: "Uses only the features described in some version of the ISO C standard, avoiding unspecified, implementation-defined or undefined behavior, but not necessarily confined to the minimum implementation limits.")
The simplest form considering the smallest number of characters is:
++*(int*)0;
Most of the answers to this question are talking around the key point, which is: The C standard does not include the concept of a segmentation fault. (Since C99 it includes the signal number SIGSEGV, but it does not define any circumstance where that signal is delivered, other than raise(SIGSEGV), which as discussed in other answers doesn't count.)
Therefore, there is no "strictly conforming" program (i.e. program that uses only constructs whose behavior is fully defined by the C standard, alone) that is guaranteed to cause a segmentation fault.
Segmentation faults are defined by a different standard, POSIX. This program is guaranteed to provoke either a segmentation fault, or the functionally equivalent "bus error" (SIGBUS), on any system that is fully conforming with POSIX.1-2008 including the Memory Protection and Advanced Realtime options, provided that the calls to sysconf, posix_memalign and mprotect succeed. My reading of C99 is that this program has implementation-defined (not undefined!) behavior considering only that standard, and therefore it is conforming but not strictly conforming.
#define _XOPEN_SOURCE 700
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
int main(void)
{
size_t pagesize = sysconf(_SC_PAGESIZE);
if (pagesize == (size_t)-1) {
fprintf(stderr, "sysconf: %s\n", strerror(errno));
return 1;
}
void *page;
int err = posix_memalign(&page, pagesize, pagesize);
if (err || !page) {
fprintf(stderr, "posix_memalign: %s\n", strerror(err));
return 1;
}
if (mprotect(page, pagesize, PROT_NONE)) {
fprintf(stderr, "mprotect: %s\n", strerror(errno));
return 1;
}
*(long *)page = 0xDEADBEEF;
return 0;
}
It's hard to define a method to segmentation fault a program on undefined platforms. A segmentation fault is a loose term that is not defined for all platforms (eg. simple small computers).
Considering only the operating systems that support processes, processes can receive notification that a segmentation fault occurred.
Further, limiting operating systems to 'unix like' OSes, a reliable method for a process to receive a SIGSEGV signal is kill(getpid(),SIGSEGV)
As is the case in most cross platform problems, each platform may (an usually does) have a different definition of seg-faulting.
But to be practical, current mac, lin and win OSes will segfault on
*(int*)0 = 0;
Further, it's not bad behaviour to cause a segfault. Some implementations of assert() cause a SIGSEGV signal which might produce a core file. Very useful when you need to autopsy.
What's worse than causing a segfault is hiding it:
try
{
anyfunc();
}
catch (...)
{
printf("?\n");
}
which hides the origin of an error and all you've got to go on is:
?
.
Here's another way I haven't seen mentioned here:
int main() {
void (*f)(void);
f();
}
In this case f is an uninitialized function pointer, which causes a segmentation fault when you try to call it.

How can I know how the exit() function work?

I wrote a small program to find how the exit() function works in Linux.
#include <unistd.h>
int main()
{
exit(0);
}
And then I compiled the program with gcc.
gcc -o example -g -static example.c
In gdb, when I set a breakpoint, I got these lines.
Dump of assembler code for function exit:
0x080495a0 <+0>: sub $0x1c,%esp
0x080495a3 <+3>: mov 0x20(%esp),%eax
0x080495a7 <+7>: movl $0x1,0x8(%esp)
0x080495af <+15>: movl $0x80d602c,0x4(%esp)
0x080495b7 <+23>: mov %eax,(%esp)
0x080495ba <+26>: call 0x80494b0 <__run_exit_handlers>
End of assembler dump.
(gdb) b 0x080495a3
Function "0x080495a3" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (0x080495a3) pending.
(gdb) run
Starting program: /home/jack/Documents/overflow/example
[Inferior 1 (process 2299) exited normally]
The program does not stop at the breakpoint. Why? I use -static to compile the program, why does the breakpoint pend until the library loads into the memory?
You're asking gdb to break on a function called 0x080495a3. You'll need to use b *0x080495a3 instead.
(gdb) help break
Set breakpoint at specified line or function.
break [LOCATION] [thread THREADNUM] [if CONDITION]
LOCATION may be a line number, function name, or "*" and an address.
As the help says, The * tells gdb it's an address you want to break on.
From your example:
Function "0x080495a3" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (0x080495a3) pending.
The "pending" means that the breakpoint is waiting until a function called 0x080495a3 is loaded from a shared library.
You might also be interested in break-range:
(gdb) help break-range
Set a breakpoint for an address range.
break-range START-LOCATION, END-LOCATION
where START-LOCATION and END-LOCATION can be one of the following:
LINENUM, for that line in the current file,
FILE:LINENUM, for that line in that file,
+OFFSET, for that number of lines after the current line
or the start of the range
FUNCTION, for the first line in that function,
FILE:FUNCTION, to distinguish among like-named static functions.
*ADDRESS, for the instruction at that address.
The breakpoint will stop execution of the inferior whenever it executes
an instruction at any address within the [START-LOCATION, END-LOCATION]
range (including START-LOCATION and END-LOCATION).
It looks like that you're trying to set a breakpoint at a function named 0x080495a3. Instead try b *0x080495a3 to indicate to GDB that you want to break at a specific address.
0x080495a3 is an address of the line on which you are willing to apply break point. But the format for gdb is b (function name or line number). So You have 2 ways to do this.
1) do an l after your gdb session has started. It will list you the code in C. And then apply a break point using the line number else
2) if you want to use the address, use b *0x080495a3 way to set a break point.
This way you will be able to halt at line
0x080495a3 <+3>: mov 0x20(%esp),%eax

How to get c code to execute hex machine code?

I want a simple C method to be able to run hex bytecode on a Linux 64 bit machine. Here's the C program that I have:
char code[] = "\x48\x31\xc0";
#include <stdio.h>
int main(int argc, char **argv)
{
int (*func) ();
func = (int (*)()) code;
(int)(*func)();
printf("%s\n","DONE");
}
The code that I am trying to run ("\x48\x31\xc0") I obtained by writting this simple assembly program (it's not supposed to really do anything)
.text
.globl _start
_start:
xorq %rax, %rax
and then compiling and objdump-ing it to obtain the bytecode.
However, when I run my C program I get a segmentation fault. Any ideas?
Machine code has to be in an executable page. Your char code[] is in the read+write data section, without exec permission, so the code cannot be executed from there.
Here is a simple example of allocating an executable page with mmap:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main ()
{
char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi]
0xC3 // ret
};
int (*sum) (int, int) = NULL;
// allocate executable buffer
sum = mmap (0, sizeof(code), PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
// copy code to buffer
memcpy (sum, code, sizeof(code));
// doesn't actually flush cache on x86, but ensure memcpy isn't
// optimized away as a dead store.
__builtin___clear_cache (sum, sum + sizeof(sum)); // GNU C
// run code
int a = 2;
int b = 3;
int c = sum (a, b);
printf ("%d + %d = %d\n", a, b, c);
}
See another answer on this question for details about __builtin___clear_cache.
Until recent Linux kernel versions (sometime before 5.4), you could simply compile with gcc -z execstack - that would make all pages executable, including read-only data (.rodata), and read-write data (.data) where char code[] = "..." goes.
Now -z execstack only applies to the actual stack, so it currently works only for non-const local arrays. i.e. move char code[] = ... into main.
See Linux default behavior against `.data` section for the kernel change, and Unexpected exec permission from mmap when assembly files included in the project for the old behaviour: enabling Linux's READ_IMPLIES_EXEC process for that program. (In Linux 5.4, that Q&A shows you'd only get READ_IMPLIES_EXEC for a missing PT_GNU_STACK, like a really old binary; modern GCC -z execstack would set PT_GNU_STACK = RWX metadata in the executable, which Linux 5.4 would handle as making only the stack itself executable. At some point before that, PT_GNU_STACK = RWX did result in READ_IMPLIES_EXEC.)
The other option is to make system calls at runtime to copy into an executable page, or change permissions on the page it's in. That's still more complicated than using a local array to get GCC to copy code into executable stack memory.
(I don't know if there's an easy way to enable READ_IMPLIES_EXEC under modern kernels. Having no GNU-stack attribute at all in an ELF binary does that for 32-bit code, but not 64-bit.)
Yet another option is __attribute__((section(".text"))) const char code[] = ...;
Working example: https://godbolt.org/z/draGeh.
If you need the array to be writeable, e.g. for shellcode that inserts some zeros into strings, you could maybe link with ld -N. But probably best to use -z execstack and a local array.
Two problems in the question:
exec permission on the page, because you used an array that will go in the noexec read+write .data section.
your machine code doesn't end with a ret instruction so even if it did run, execution would fall into whatever was next in memory instead of returning.
And BTW, the REX prefix is totally redundant. "\x31\xc0" xor eax,eax has exactly the same effect as xor rax,rax.
You need the page containing the machine code to have execute permission. x86-64 page tables have a separate bit for execute separate from read permission, unlike legacy 386 page tables.
The easiest way to get static arrays to be in read+exec memory was to compile with gcc -z execstack. (Used to make the stack and other sections executable, now only the stack).
Until recently (2018 or 2019), the standard toolchain (binutils ld) would put section .rodata into the same ELF segment as .text, so they'd both have read+exec permission. Thus using const char code[] = "..."; was sufficient for executing manually-specified bytes as data, without execstack.
But on my Arch Linux system with GNU ld (GNU Binutils) 2.31.1, that's no longer the case. readelf -a shows that the .rodata section went into an ELF segment with .eh_frame_hdr and .eh_frame, and it only has Read permission. .text goes in a segment with Read + Exec, and .data goes in a segment with Read + Write (along with the .got and .got.plt). (What's the difference of section and segment in ELF file format)
I assume this change is to make ROP and Spectre attacks harder by not having read-only data in executable pages where sequences of useful bytes could be used as "gadgets" that end with the bytes for a ret or jmp reg instruction.
// TODO: use char code[] = {...} inside main, with -z execstack, for current Linux
// Broken on recent Linux, used to work without execstack.
#include <stdio.h>
// can be non-const if you use gcc -z execstack. static is also optional
static const char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi] // retval = a+b;
0xC3 // ret
};
static const char ret0_code[] = "\x31\xc0\xc3"; // xor eax,eax ; ret
// the compiler will append a 0 byte to terminate the C string,
// but that's fine. It's after the ret.
int main () {
// void* cast is easier to type than a cast to function pointer,
// and in C can be assigned to any other pointer type. (not C++)
int (*sum) (int, int) = (void*)code;
int (*ret0)(void) = (void*)ret0_code;
// run code
int c = sum (2, 3);
return ret0();
}
On older Linux systems: gcc -O3 shellcode.c && ./a.out (Works because of const on global/static arrays)
On Linux before 5.5 (or so) gcc -O3 -z execstack shellcode.c && ./a.out (works because of -zexecstack regardless of where your machine code is stored). Fun fact: gcc allows -zexecstack with no space, but clang only accepts clang -z execstack.
These also work on Windows, where read-only data goes in .rdata instead of .rodata.
The compiler-generated main looks like this (from objdump -drwC -Mintel). You can run it inside gdb and set breakpoints on code and ret0_code
(I actually used gcc -no-pie -O3 -zexecstack shellcode.c hence the addresses near 401000
0000000000401020 <main>:
401020: 48 83 ec 08 sub rsp,0x8 # stack aligned by 16 before a call
401024: be 03 00 00 00 mov esi,0x3
401029: bf 02 00 00 00 mov edi,0x2 # 2 args
40102e: e8 d5 0f 00 00 call 402008 <code> # note the target address in the next page
401033: 48 83 c4 08 add rsp,0x8
401037: e9 c8 0f 00 00 jmp 402004 <ret0_code> # optimized tailcall
Or use system calls to modify page permissions
Instead of compiling with gcc -zexecstack, you can instead use mmap(PROT_EXEC) to allocate new executable pages, or mprotect(PROT_EXEC) to change existing pages to executable. (Including pages holding static data.) You also typically want at least PROT_READ and sometimes PROT_WRITE, of course.
Using mprotect on a static array means you're still executing the code from a known location, maybe making it easier to set a breakpoint on it.
On Windows you can use VirtualAlloc or VirtualProtect.
Telling the compiler that data is executed as code
Normally compilers like GCC assume that data and code are separate. This is like type-based strict aliasing, but even using char* doesn't make it well-defined to store into a buffer and then call that buffer as a function pointer.
In GNU C, you also need to use __builtin___clear_cache(buf, buf + len) after writing machine code bytes to a buffer, because the optimizer doesn't treat dereferencing a function pointer as reading bytes from that address. Dead-store elimination can remove the stores of machine code bytes into a buffer, if the compiler proves that the store isn't read as data by anything. https://codegolf.stackexchange.com/questions/160100/the-repetitive-byte-counter/160236#160236 and https://godbolt.org/g/pGXn3B has an example where gcc really does do this optimization, because gcc "knows about" malloc.
(And on non-x86 architectures where I-cache isn't coherent with D-cache, it actually will do any necessary cache syncing. On x86 it's purely a compile-time optimization blocker and doesn't expand to any instructions itself.)
Re: the weird name with three underscores: It's the usual __builtin_name pattern, but name is __clear_cache.
My edit on #AntoineMathys's answer added this.
In practice GCC/clang don't "know about" mmap(MAP_ANONYMOUS) the way they know about malloc. So in practice the optimizer will assume that the memcpy into the buffer might be read as data by the non-inline function call through the function pointer, even without __builtin___clear_cache(). (Unless you declared the function type as __attribute__((const)).)
On x86, where I-cache is coherent with data caches, having the stores happen in asm before the call is sufficient for correctness. On other ISAs, __builtin___clear_cache() will actually emit special instructions as well as ensuring the right compile-time ordering.
It's good practice to include it when copying code into a buffer because it doesn't cost performance, and stops hypothetical future compilers from breaking your code. (e.g. if they do understand that mmap(MAP_ANONYMOUS) gives newly-allocated anonymous memory that nothing else has a pointer to, just like malloc.)
With current GCC, I was able to provoke GCC into really doing an optimization we don't want by using __attribute__((const)) to tell the optimizer sum() is a pure function (that only reads its args, not global memory). GCC then knows sum() can't read the result of the memcpy as data.
With another memcpy into the same buffer after the call, GCC does dead-store elimination into just the 2nd store after the call. This results in no store before the first call so it executes the 00 00 add [rax], al bytes, segfaulting.
// demo of a problem on x86 when not using __builtin___clear_cache
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main ()
{
char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi]
0xC3 // ret
};
__attribute__((const)) int (*sum) (int, int) = NULL;
// copy code to executable buffer
sum = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON,-1,0);
memcpy (sum, code, sizeof(code));
//__builtin___clear_cache(sum, sum + sizeof(code));
int c = sum (2, 3);
//printf ("%d + %d = %d\n", a, b, c);
memcpy(sum, (char[]){0x31, 0xc0, 0xc3, 0}, 4); // xor-zero eax, ret, padding for a dword store
//__builtin___clear_cache(sum, sum + 4);
return sum(2,3);
}
Compiled on the Godbolt compiler explorer with GCC9.2 -O3
main:
push rbx
xor r9d, r9d
mov r8d, -1
mov ecx, 34
mov edx, 7
mov esi, 4
xor edi, edi
sub rsp, 16
call mmap
mov esi, 3
mov edi, 2
mov rbx, rax
call rax # call before store
mov DWORD PTR [rbx], 12828721 # 0xC3C031 = xor-zero eax, ret
add rsp, 16
pop rbx
ret # no 2nd call, CSEd away because const and same args
Passing different args would have gotten another call reg, but even with __builtin___clear_cache the two sum(2,3) calls can CSE. __attribute__((const)) doesn't respect changes to the machine code of a function. Don't do it. It's safe if you're going to JIT the function once and then call many times, though.
Uncommenting the first __clear_cache results in
mov DWORD PTR [rax], -1019804531 # lea; ret
call rax
mov DWORD PTR [rbx], 12828721 # xor-zero; ret
... still CSE and use the RAX return value
The first store is there because of __clear_cache and the sum(2,3) call. (Removing the first sum(2,3) call does let dead-store elimination happen across the __clear_cache.)
The second store is there because the side-effect on the buffer returned by mmap is assumed to be important, and that's the final value main leaves.
Godbolt's ./a.out option to run the program still seems to always fail (exit status of 255); maybe it sandboxes JITing? It works on my desktop with __clear_cache and crashes without.
mprotect on a page holding existing C variables.
You can also give a single existing page read+write+exec permission. This is an alternative to compiling with -z execstack
You don't need __clear_cache on a page holding read-only C variables because there's no store to optimize away. You would still need it for initializing a local buffer (on the stack). Otherwise GCC will optimize away the initializer for this private buffer that a non-inline function call definitely doesn't have a pointer to. (Escape analysis). It doesn't consider the possibility that the buffer might hold the machine code for the function unless you tell it that via __builtin___clear_cache.
#include <stdio.h>
#include <sys/mman.h>
#include <stdint.h>
// can be non-const if you want, we're using mprotect
static const char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi] // retval = a+b;
0xC3 // ret
};
static const char ret0_code[] = "\x31\xc0\xc3";
int main () {
// void* cast is easier to type than a cast to function pointer,
// and in C can be assigned to any other pointer type. (not C++)
int (*sum) (int, int) = (void*)code;
int (*ret0)(void) = (void*)ret0_code;
// hard-coding x86's 4k page size for simplicity.
// also assume that `code` doesn't span a page boundary and that ret0_code is in the same page.
uintptr_t page = (uintptr_t)code & -4095ULL; // round down
mprotect((void*)page, 4096, PROT_READ|PROT_EXEC|PROT_WRITE); // +write in case the page holds any writeable C vars that would crash later code.
// run code
int c = sum (2, 3);
return ret0();
}
I used PROT_READ|PROT_EXEC|PROT_WRITE in this example so it works regardless of where your variable is. If it was a local on the stack and you left out PROT_WRITE, call would fail after making the stack read only when it tried to push a return address.
Also, PROT_WRITE lets you test shellcode that self-modifies, e.g. to edit zeros into its own machine code, or other bytes it was avoiding.
$ gcc -O3 shellcode.c # without -z execstack
$ ./a.out
$ echo $?
0
$ strace ./a.out
...
mprotect(0x55605aa3f000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
exit_group(0) = ?
+++ exited with 0 +++
If I comment out the mprotect, it does segfault with recent versions of GNU Binutils ld which no longer put read-only constant data into the same ELF segment as the .text section.
If I did something like ret0_code[2] = 0xc3;, I would need __builtin___clear_cache(ret0_code+2, ret0_code+2) after that to make sure the store wasn't optimized away, but if I don't modify the static arrays then it's not needed after mprotect. It is needed after mmap+memcpy or manual stores, because we want to execute bytes that have been written in C (with memcpy).
You need to include the assembly in-line via a special compiler directive so that it'll properly end up in a code segment. See this guide, for example: http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
Your machine code may be all right, but your CPU objects.
Modern CPUs manage memory in segments. In normal operation, the operating system loads a new program into a program-text segment and sets up a stack in a data segment. The operating system tells the CPU never to run code in a data segment. Your code is in code[], in a data segment. Thus the segfault.
This will take some effort.
Your code variable is stored in the .data section of your executable:
$ readelf -p .data exploit
String dump of section '.data':
[ 10] H1À
H1À is the value of your variable.
The .data section is not executable:
$ readelf -S exploit
There are 30 section headers, starting at offset 0x1150:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[...]
[24] .data PROGBITS 0000000000601010 00001010
0000000000000014 0000000000000000 WA 0 0 8
All 64-bit processors I'm familiar with support non-executable pages natively in the pagetables. Most newer 32-bit processors (the ones that support PAE) provide enough extra space in their pagetables for the operating system to emulate hardware non-executable pages. You'll need to run either an ancient OS or an ancient processor to get a .data section marked executable.
Because these are just flags in the executable, you ought to be able to set the X flag through some other mechanism, but I don't know how to do so. And your OS might not even let you have pages that are both writable and executable.
You may need to set the page executable before you may call it.
On MS-Windows, see the VirtualProtect -function.
URL: http://msdn.microsoft.com/en-us/library/windows/desktop/aa366898%28v=vs.85%29.aspx
Sorry, I couldn't follow above examples which are complicated.
So, I created an elegant solution for executing hex code from C.
Basically, you could use asm and .word keywords to place your instructions in hex format.
See below example:
asm volatile(".rept 1024\n"
CNOP
".endr\n");
where CNOP is defined as below:
#define ".word 0x00010001 \n"
Basically, c.nop instruction was not supported by my current assembler. So, I defined CNOP as the hex equivalent of c.nop with proper syntax and used inside asm, with which I was aware of.
.rept <NUM> .endr will basically, repeat the instruction NUM times.
This solution is working and verified.

Resources