I am testing an assembler I am writing that generates x86 instructions. I would like to do something like this to test whether the instructions work:
#include <stdio.h>
unsigned char code[2] = {0xc9, 0xc3};
int main() {
    void (*foo)();
    /* cast needed: &code is a pointer to an array, not a function pointer */
    foo = (void (*)()) code;
    foo();
    return 0;
}
However, it seems that OS X is preventing this due to DEP. Is there a way to either (a) disable DEP for this program or (b) enter the bytes in another format such that I can jump to them?
If you just need to test, try this instead, it's magic...
const unsigned char code[2] = {0xc9, 0xc3};
^^^^^
The const keyword causes the compiler to place it in the const section (warning! this is an implementation detail!), which is in the same segment as the text section. The entire segment should be executable. It is probably more portable to do it this way:
__attribute__((section("__TEXT,__text")))  /* Mach-O (OS X) spelling; on ELF the section is ".text" */
const unsigned char code[2] = {0xc9, 0xc3};
And you can always do it in an assembly file,
.text
.globl code
code:
.byte 0xc9
.byte 0xc3
However: If you want to change the code at runtime, you need to use mprotect. By default, there are no mappings in memory with both write and execute permissions.
Here is an example:
#include <stdlib.h>
#include <sys/mman.h>
#include <err.h>
#include <stdint.h>
int main(int argc, char *argv[])
{
    unsigned char *p = malloc(4);
    int r;
    if (p == NULL)
        err(1, "malloc");
    // This is x86_64 code
    p[0] = 0x8d;
    p[1] = 0x47;
    p[2] = 0x01;
    p[3] = 0xc3;
    // This is hackish, and in production you should do better.
    // Casting 4095 to uintptr_t is actually necessary on 64-bit.
    r = mprotect((void *) ((uintptr_t) p & ~(uintptr_t) 4095), 4096,
                 PROT_READ | PROT_WRITE | PROT_EXEC);
    if (r)
        err(1, "mprotect");
    // f(x) = x + 1
    int (*f)(int) = (int (*)(int)) p;
    return f(1);
}
The mprotect specification states that its behavior is unspecified if the memory was not originally mapped with mmap, but you're testing, not shipping, so just know that it works just fine on OS X because the OS X malloc uses mmap behind the scenes (exclusively, I think).
I don't know about DEP on OS X, but another thing you could do is malloc() the memory you write the code to and then jump into that malloc'ed area. On older Linux systems without NX enforcement this memory was not exec-protected; on current systems the heap is normally non-executable, so a JIT typically gets its memory from mmap() or changes the permissions with mprotect() first.
I have the following assembly function (shown with objdump already)
0000000000000000 <add>:
0: b8 06 00 00 00 mov $0x6,%eax
5: c3 retq
Now in C I made the following code:
#include <stdio.h>
typedef int (*funcp) (int x);
unsigned char foo[] = {0xb8,0x06,0x00,0x00,0x00,0xc3};
int main(void)
{
    int i;
    funcp f = (funcp)foo;
    i = (*f);
    printf("exit = %d\n", i);
    return 0;
}
In the global variable foo I put the machine code of my assembly function and tried to execute it, but it does not return 6 as expected.
How can I execute functions from their memory addresses? Also, where can I read more on the subject?
Note: sometimes I get a Segmentation fault (core dumped) error.
The NX flag might be your 'friend' here. Parts of memory that are never meant to be executed as machine code can be marked No-eXecute; see https://en.wikipedia.org/wiki/NX_bit . Whether this feature is on or off depends on the architecture, the operating system and its settings, and even BIOS settings.
If NX is applied to the data section of your program, code placed there will not run. You will need to mmap() a piece of memory with PROT_EXEC set, copy the data in, then run it.
For the following, I changed the bytes to amd64 code (a function that adds 1 to its argument). With the mmap() copy, it works. Calling foo directly fails (on my machine with NX active).
(Code without error checking, freeing of memory, etc.)
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
typedef int (*funcp) (int x);
unsigned char foo[] = {0x8d,0x47,0x01,0xc3};
//unsigned char foo[] = {0xc3,0xb8,0x06,0x00,0x00,0x00,0xc3};
int main(void)
{
    int i;
    void *mem2;
    mem2 = mmap(0, 4096, PROT_WRITE|PROT_READ|PROT_EXEC,
                MAP_PRIVATE|MAP_ANONYMOUS|MAP_EXECUTABLE, -1, 0);
    memcpy(mem2, foo, sizeof(foo));
    funcp f = (funcp)mem2;
    i = f(42);
    printf("exit = %d\n", i);
    return 0;
}
I am calling two functions on my char* s = "pratik" as:
User code:
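#include <stdio.h>
#include <string.h>
#include <zlib.h>
int main()
{
    const char *s = "pratik";
    /* crc32() takes a const Bytef * and returns an unsigned long (uLong) */
    printf("%lx\n", crc32(0x80000000, (const Bytef *) s, strlen(s)));
    return 0;
}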
Output:
66fa3c99
Kernel code:
#include <linux/crc32.h>
int main()
{
    char *s = "pratik";
    u32 checksum = crc32(0x80000000, s, strlen(s));
    printk("\nChecksum --> %x", checksum);
    return checksum;
}
Output:
Checksum --> d7389d3a
Why are the values of the checksums on the same strings different?
It appears that someone was disturbed by the fact that the standard Ethernet (PKZIP, ITU V.42 etc. etc.) CRC-32 does a pre- and post-exclusive-or with 0xffffffff. So the version in the Linux kernel leaves that out, and expects the application to do that. Go figure.
Anyway, you can get the same result as the (correct) zlib crc32(), using the (non-standard) Linux crc32() instead, thusly:
crc_final = crc32(crc_initial ^ 0xffffffff, buf, len) ^ 0xffffffff;
In fact, that exact same code would allow you to duplicate the Linux crc32() using the zlib crc32() as well.
I need to statically calculate the address of the first page that contains the text segment of an ELF file, in order to use mprotect() and make the text segment writable.
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
..
[14] .text PROGBITS 08048380 000380 0002e0 00 AX 0 0 128
Any ideas?
How about this program, which compiles normally and does not crash.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
extern char __executable_start;
extern char __etext;
int
main (int argc, char **argv)
{
  int pagesize = sysconf (_SC_PAGE_SIZE);
  char *start =
    (char *) (((uintptr_t) & __executable_start) & ~(pagesize - 1));
  char *end =
    (char *) (((uintptr_t) & __etext + pagesize - 1) & ~(pagesize - 1));
  mprotect (start, end - start, PROT_READ | PROT_WRITE | PROT_EXEC);
  printf ("Hello world\n");
  void *m = main;
  *((char *) m) = 0;
  exit (0);
}
I've used __executable_start and __etext, but you might be better off seeing whether you can get the following symbols to work, which are at least documented in man pages:
NAME
`etext`, `edata`, `end` - end of program segments
SYNOPSIS
extern etext;
extern edata;
extern end;
DESCRIPTION
The addresses of these symbols indicate the end of various program segments:
`etext` This is the first address past the end of the text segment (the program
code).
`edata` This is the first address past the end of the initialized data segment.
`end` This is the first address past the end of the uninitialized data
segment (also known as the BSS segment).
CONFORMING TO
Although these symbols have long been provided on most UNIX systems, they are
not standardized; use with caution.
I wrote some sample code on x86_64 that tries to execute code from a dynamically malloc'ed buffer.
It fails with:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000601010 in ?? ()
0x0000000000601010 is the address of bin. Can someone tell me why? Thanks!
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
volatile int sum(int a,int b)
{
    return a+b;
}
int main(int argc, char **argv)
{
    char* bin = NULL;
    unsigned int len = 0;
    int ret = 0;
    /*code_str is the compiled code for function sum.*/
    char code_str[] ={0x55,0x48,0x89,0xe5,0x89,0x7d,0xfc,0x89,
                      0x75,0xf8,0x8b,0x45,0xf8,0x03,0x45,0xfc,0xc9,0xc3};
    len = sizeof(code_str)/sizeof(char);
    bin = (char*)malloc(len);
    memcpy(bin,code_str,len);
    mprotect(bin,len , PROT_EXEC | PROT_READ | PROT_WRITE);
    asm volatile ("mov $0x2,%%esi \n\t"
                  "mov $0x8,%%edi \n\t"
                  "mov %1,%%rbx \n\t"
                  "call *%%rbx "
                  :"=a"(ret)
                  :"g"(bin)
                  :"%rbx","%esi","%edi");
    printf("sum:%d\n",ret);
    return 0;
}
Never do such tricks without checking the return of system functions. My man page for mprotect says in particular:
POSIX says that the behavior of mprotect() is unspecified if it
is applied to a region of memory that was not obtained via mmap(2).
so don't do that with malloced buffers.
Also:
The buffer size is just sizeof(code_str), there is no reason to divide by sizeof(char) (which is guaranteed to be 1, but that doesn't make it correct).
There's no need to cast the return of malloc (nor mmap if you change it to that).
The correct type for code_str is unsigned char and not char.
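The problem is that the address passed to mprotect must be a multiple of PAGESIZE; otherwise mprotect returns -1 with errno set to EINVAL (invalid argument).
/* bin must come from an allocation of at least len + PAGESIZE - 1 bytes,
   otherwise the rounded-up pointer runs past the end of the buffer.
   Casting through int would truncate the pointer on x86_64; use uintptr_t. */
bin = (char *)(((uintptr_t) bin + PAGESIZE - 1) & ~((uintptr_t) PAGESIZE - 1)); // added
memcpy(bin, code_str, len);
if (mprotect(bin, len, PROT_EXEC | PROT_READ | PROT_WRITE) == -1)
{
    printf("mprotect error: %d\n", errno);
    return 0;
}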
When I use gdb to debug a program written in C, the disassemble command shows the instructions and their addresses in the code segment. Is it possible to know those memory addresses at runtime? I am using Ubuntu. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
#include <stdlib.h>
void myfunction(void); /* defined elsewhere */
int main(int argc,char *argv[]){
    myfunction();
    exit(0);
}
Now I would like to have the address of myfunction() in the code segment when I run my program.
Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (Windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on Windows is similar) that then jumps to the function.
If you know the function name before the program runs, simply use
void * addr = myfunction;
If the function name is only given at run time, I once wrote a function that looks up the symbol address dynamically using the bfd library. Here is the x86_64 code; in the example you can get the address via find_symbol("a.out", "myfunction").
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Returns the value of symname in filename, or -1 on failure. */
long find_symbol(char *filename, char *symname)
{
    bfd *ibfd;
    asymbol **symtab;
    long nsize, nsyms, i;
    long addr = -1;
    symbol_info syminfo;
    char **matching;
    bfd_init();
    ibfd = bfd_openr(filename, NULL);
    if (ibfd == NULL) {
        printf("bfd_openr error\n");
        return -1;
    }
    if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
        printf("format_matches\n");
        bfd_close(ibfd);
        return -1;
    }
    nsize = bfd_get_symtab_upper_bound (ibfd);
    symtab = nsize > 0 ? malloc(nsize) : NULL;
    if (symtab == NULL) {
        printf("no symbol table\n");
        bfd_close(ibfd);
        return -1;
    }
    nsyms = bfd_canonicalize_symtab(ibfd, symtab);
    for (i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i]->name, symname) == 0) {
            bfd_symbol_info(symtab[i], &syminfo);
            addr = (long) syminfo.value;
            break;
        }
    }
    free(symtab);
    bfd_close(ibfd);
    if (addr == -1)
        printf("cannot find symbol\n");
    return addr;
}
To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
    const int sz = 15;
    void *buf[sz];
    // get at most sz entries
    int n = backtrace(buf, sz);
    // output them right to stderr
    backtrace_symbols_fd(buf, n, fileno(stderr));
    // but if you want to output the strings yourself
    // you may use char ** backtrace_symbols (void *const *buffer, int size)
    write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
    if (n <= 0)
        trace_pom();
    else TransferFunds(n-1);
}
void TransferFunds(int n)
{
    DepositMoney(n);
}
int main()
{
    DepositMoney(3);
    return 0;
}
Compiled with:
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset can only be obtained on systems that use the ELF
binary format for programs and libraries. On other systems, only the hexadecimal return
address will be present. Also, you may need to pass additional flags to the linker to
make the function names available to the program. (For example, on systems using GNU
ld, you must pass -rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
About a comment in an answer (getting the address of an instruction), you can use this very ugly trick
#include <setjmp.h>
void function() {
    printf("in function\n");
    printf("%d\n",__LINE__);
    printf("exiting function\n");
}
int main() {
    jmp_buf env;
    int i;
    printf("in main\n");
    printf("%d\n",__LINE__);
    printf("calling function\n");
    setjmp(env);
    for (i=0; i < 18; ++i) {
        printf("%p\n",env[i]);
    }
    function();
    printf("in main again\n");
    printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful as it looks machine dependent, so triple check my word. This is the output
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!